CN114666653A - Subtitle display method and device for music segments and readable storage medium - Google Patents


Info

Publication number
CN114666653A
Authority
CN
China
Prior art keywords: song, audio, playing, segment, music
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210290074.8A
Other languages
Chinese (zh)
Inventor
陈颖 (Chen Ying)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Music Entertainment Technology Shenzhen Co Ltd
Original Assignee
Tencent Music Entertainment Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Music Entertainment Technology Shenzhen Co Ltd
Priority to CN202210290074.8A
Publication of CN114666653A

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 - Processing of audio elementary streams
    • H04N21/4394 - Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 - Detection of presence or absence of voice signals
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 - Detection of presence or absence of voice signals
    • G10L25/87 - Detection of discrete points within a voice signal
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431 - Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 - End-user applications
    • H04N21/488 - Data services, e.g. news ticker
    • H04N21/4884 - Data services, e.g. news ticker for displaying subtitles

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses a subtitle display method and device for music segments, and a computer-readable storage medium. The audio to be identified is segmented into a plurality of audio slices, and audio fingerprint features are extracted from each slice. The fingerprint features are matched against an audio fingerprint database built from different songs to obtain the playing positions of the song playing segments in the audio to be identified and the song information corresponding to each segment. Within the song playing segments, the playing positions of pure music segments, in which only the song plays, are located, so that the corresponding song information can be displayed while a pure music segment plays. The method can therefore locate pure music segments in the audio to be recognized, display the corresponding song information during their playback, and avoid the subtitle errors that smart-subtitle transcription produces for pure music, thereby improving subtitle accuracy and bringing a better experience to users.

Description

Subtitle display method and device for music segments and readable storage medium
Technical Field
The present application relates to the field of smart subtitles, and in particular, to a method and an apparatus for displaying subtitles for music pieces, and a computer-readable storage medium.
Background
Smart-subtitle technology displays the speech content of audio and video works as text. At present, in fields such as long-form audio and podcasts, it is commonly used to transcribe the host's speech in a work, which can greatly improve the user's understanding of the work. However, a work may contain not only the host's speech but also many interspersed songs. As shown in fig. 1, these songs appear in the work as background music or pure music segments; for foreign-language songs, or songs without sung lyrics, smart-subtitle technology is prone to incorrect text recognition, which reduces subtitle accuracy and degrades the user experience.
Therefore, how to solve the above technical problem is an issue to be addressed by those skilled in the art.
Disclosure of Invention
The application aims to provide a subtitle display method for music segments, a device, and a computer-readable storage medium that can locate the playing position of a pure music segment in the audio to be identified and display the corresponding song information while that segment plays, avoiding the subtitle errors introduced by smart-subtitle transcription of pure music, thereby improving subtitle accuracy and bringing a better experience to the user.
In order to solve the above technical problem, the present application provides a method for displaying subtitles of a music piece, including:
segmenting audio to be identified into a plurality of audio slices, and extracting audio fingerprint characteristics of each audio slice;
matching the audio fingerprint characteristics from an audio fingerprint database established based on different songs to obtain the playing position of a song playing segment in the audio to be identified and song information corresponding to the song playing segment;
and positioning the playing position of a pure music segment which only plays the song in the song playing segments so as to display corresponding song information when the pure music segment plays.
Optionally, matching the audio fingerprint features from an audio fingerprint library established based on different songs to obtain a play position of a song play segment in the audio to be identified and song information corresponding to the song play segment, including:
matching the audio fingerprint characteristics of each audio slice from the audio fingerprint database to obtain a successfully matched target audio slice and song information corresponding to the target audio slice;
integrating the target audio slices to obtain a song playing segment in the audio to be identified and a playing position of the song playing segment;
and integrating the song information corresponding to each target audio slice in the song playing segment to obtain the song information corresponding to the song playing segment.
Optionally, the song information includes a song ID and a song segment;
the steps of integrating the target audio slices to obtain a song playing segment in the audio to be identified and a playing position of the song playing segment, and integrating the song information corresponding to each target audio slice in the song playing segment to obtain the song information corresponding to the song playing segment, include:
integrating adjacent target audio slices corresponding to the same song ID to obtain a song playing segment corresponding to the same song ID in the audio to be identified and a playing position of the song playing segment;
and integrating the song segments corresponding to the target audio slices in the song playing segments to obtain the song segments corresponding to the song playing segments.
Optionally, before integrating the target audio slices that are adjacent and correspond to the same song ID, the method for displaying subtitles of the music piece further includes:
when adjacent target audio slices correspond to different song IDs, judging whether the songs corresponding to the adjacent target audio slices belong to different versions of the same song;
and if so, determining a first song ID with higher popularity and a second song ID with lower popularity from the song IDs corresponding to the adjacent target audio slices, and updating the second song ID to the first song ID.
Optionally, the song information includes song segments;
locating the playing position of a pure music segment, in which only the song plays, within the song playing segments comprises:
segmenting the song playing segment into a plurality of song playing slices;
acquiring reference audio fingerprint data of a song segment corresponding to a target song playing slice in the audio fingerprint database; wherein the target song playing slice is any one of the song playing slices;
obtaining the total number of the same audio fingerprint data in the actual audio fingerprint data of the target song playing slice and the reference audio fingerprint data;
if the total number is larger than a preset number threshold value, determining that the target song playing slice belongs to a pure music segment;
and integrating the target song playing slices belonging to the pure music segments to obtain the pure music segments in the song playing segments and the playing positions of the pure music segments.
Optionally, the subtitle display method for a music piece further includes:
if the total number is not greater than the preset number threshold, determining that the target song playing slice belongs to a background music segment in which the host speaks while the song plays;
and integrating the target song playing slices belonging to background music segments to obtain the background music segments in the song playing segment and the playing positions of the background music segments.
Optionally, the subtitle display method for a music piece further includes:
and performing audio recognition on the background music segment so as to display corresponding text information when the background music segment is played.
In order to solve the above technical problem, the present application further provides a subtitle display apparatus for a music piece, including:
a memory for storing a computer program;
a processor for implementing the steps of any one of the above-mentioned music piece subtitle display methods when executing the computer program.
In order to solve the above technical problem, the present application further provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of any one of the above methods for displaying subtitles of music pieces.
The application provides a subtitle display method for music segments: the audio to be identified is segmented into a plurality of audio slices, and audio fingerprint features are extracted from each slice; the fingerprint features are matched against an audio fingerprint database built from different songs to obtain the playing positions of the song playing segments in the audio to be identified and the song information corresponding to each segment; and the playing positions of pure music segments, in which only the song plays, are located within the song playing segments, so that the corresponding song information is displayed when a pure music segment plays. The method can thus locate pure music segments in the audio to be recognized, display the corresponding song information during their playback, and avoid the subtitle errors that smart-subtitle transcription produces for pure music, thereby improving subtitle accuracy and bringing a better experience to users.
The application also provides a subtitle display device of the music segment and a computer readable storage medium, which have the same beneficial effects as the subtitle display method.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed for the prior art and the embodiments are briefly described below. The drawings described below show only some embodiments of the present application; other drawings can be derived from them by those skilled in the art without creative effort.
FIG. 1 is a schematic diagram of a prior art work;
fig. 2 is a flowchart of a subtitle displaying method for a music piece according to an embodiment of the present application;
fig. 3 is a schematic diagram of audio slicing according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an audio-associated song provided by an embodiment of the present application;
fig. 5 is a schematic diagram of segment matching between song 1 and segment A using the LCS algorithm according to an embodiment of the present application;
FIG. 6 is a time chart of Song 1 against background music pieces and pure music pieces according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a subtitle display apparatus for a music piece according to an embodiment of the present application.
Detailed Description
The core of the application is to provide a subtitle display method for music segments, a device, and a computer-readable storage medium that can locate the playing position of a pure music segment in the audio to be identified and display the corresponding song information while that segment plays, avoiding the subtitle errors introduced by smart-subtitle transcription of pure music, thereby improving subtitle accuracy and bringing a better experience to users.
To make the objects, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present application; all other embodiments obtained by those skilled in the art without creative effort shall fall within the protection scope of the present application.
Referring to fig. 2, fig. 2 is a flowchart illustrating a subtitle displaying method for a music piece according to an embodiment of the present disclosure.
The subtitle display method for the music piece comprises the following steps:
step S1: and segmenting the audio to be identified into a plurality of audio slices, and extracting the audio fingerprint characteristics of each audio slice.
Specifically, the audio to be identified is segmented into a plurality of audio slices based on a preset first slice length and a preset first slice interval density. The first slice length determines the length of each audio slice; the first slice interval density determines the interval between adjacent audio slices. For example, with the first slice length set to 6 s and, to make the playing time more accurate, the first slice interval set to one slice every 2 s, the first audio slice covers seconds 1 to 6 of the audio to be recognized and the second audio slice covers seconds 3 to 8, as shown in fig. 3.
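The overlapping slicing described above can be sketched as follows. The function name, signature, and the idea of returning (start time, samples) pairs are illustrative assumptions; the defaults mirror the patent's example values (6 s slices taken every 2 s):

```python
def slice_audio(samples, sr, slice_len_s=6.0, hop_s=2.0):
    """Cut a mono sample array into fixed-length, overlapping slices.

    slice_len_s and hop_s follow the patent's example (6 s slices every
    2 s); all names here are illustrative, not from the patent.
    """
    slice_len = int(slice_len_s * sr)
    hop = int(hop_s * sr)
    slices = []
    # One slice per hop; each slice records its start time in seconds.
    for start in range(0, max(len(samples) - slice_len + 1, 1), hop):
        slices.append((start / sr, samples[start:start + slice_len]))
    return slices
```

With a 12-sample signal at 1 Hz, 6-sample slices taken every 2 samples yield four overlapping slices starting at 0, 2, 4, and 6 seconds.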
On this basis, the audio fingerprint features of each audio slice are extracted using audio fingerprint technology, which distills the unique digital characteristics of a piece of audio into compact identifiers through a specific algorithm. Extracting fingerprints slice by slice yields the audio fingerprint features of the entire audio to be identified.
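As a rough illustration of what an audio fingerprint is, the toy sketch below hashes the strongest spectral peaks of consecutive STFT frames. This is one well-known family of approaches (Shazam-style landmark hashing is the classic example); the patent does not specify its fingerprint algorithm, and every name and parameter here is an assumption:

```python
import numpy as np

def fingerprint_features(slice_samples, n_fft=1024, peaks_per_frame=2):
    """Toy spectral-peak fingerprint: hash the strongest frequency bins
    of each windowed FFT frame. Real systems are far more robust; this
    only illustrates turning a slice into compact, matchable identifiers.
    """
    hop = n_fft // 2
    x = np.asarray(slice_samples, dtype=float)
    hashes = set()
    for i in range(0, len(x) - n_fft, hop):
        frame = x[i:i + n_fft] * np.hanning(n_fft)   # reduce leakage
        mag = np.abs(np.fft.rfft(frame))
        peaks = np.argsort(mag)[-peaks_per_frame:]   # strongest bins
        hashes.add(tuple(sorted(int(p) for p in peaks)))
    return hashes
```

Two identical slices yield identical fingerprint sets, while a slice with different spectral content yields a different set, which is what makes library matching possible.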
Step S2: and matching audio fingerprint characteristics from an audio fingerprint database established based on different songs to obtain the playing position of the song playing segment in the audio to be identified and song information corresponding to the song playing segment.
Specifically, the application establishes an audio fingerprint library consisting of audio fingerprint characteristics of a plurality of songs (such as millions or even tens of millions of songs) in advance. Based on this, the audio fingerprint characteristics of the plurality of audio slices after the audio to be identified is segmented are matched from the audio fingerprint database, the playing position of the song playing segment in the audio to be identified can be obtained, and song information corresponding to the song playing segment can also be obtained.
Step S3: and positioning the playing position of the pure music segment only played by the song in the song playing segment so as to display corresponding song information when the pure music segment is played.
Specifically, the song playing segments of the audio to be identified fall into two cases: 1) segments in which the host speaks while a song plays, called background music segments; and 2) segments in which only the song plays, called pure music segments. The purpose of this application is to display the corresponding song information when a pure music segment plays; that is, pure music segments are not passed through smart-subtitle transcription, avoiding the subtitle errors it would introduce. The technical means adopted is to locate, within the song playing segments, the playing positions of the pure music segments in which only the song plays. Since the song information corresponding to the song playing segments of the audio to be identified is known, the song information corresponding to the pure music segments within them can be derived from it, so that the corresponding song information is displayed when a pure music segment of the audio is played.
In this way, the present application can locate the playing position of a pure music segment in the audio to be recognized, display the corresponding song information when the pure music segment plays, and avoid the subtitle errors that smart-subtitle transcription produces for pure music, thereby improving subtitle accuracy and bringing a better experience to users.
On the basis of the above-described embodiment:
as an optional embodiment, matching audio fingerprint features from an audio fingerprint library established based on different songs to obtain a playing position of a song playing segment in an audio to be identified and song information corresponding to the song playing segment includes:
matching the audio fingerprint characteristics of each audio slice from an audio fingerprint library to obtain a target audio slice which is successfully matched and song information corresponding to the target audio slice;
integrating the target audio slices to obtain a song playing segment in the audio to be identified and a playing position of the song playing segment;
and integrating the song information corresponding to each target audio slice in the song playing segment to obtain the song information corresponding to the song playing segment.
Specifically, when the audio fingerprint features of an audio slice are matched against the audio fingerprint library, two outcomes are possible. If the features of the slice successfully match the fingerprint features of some song in the library, the slice belongs to a song playing segment of the audio to be identified; in this case, the song information corresponding to the slice (such as the song ID (Identity) and the song segment) can be determined from the matched song's fingerprint features in the library. If no successful match is found, the slice does not belong to a song playing segment of the audio to be identified.
On this basis, matching the audio fingerprint features of every audio slice against the library yields the successfully matched target audio slices and their corresponding song information. Since the target audio slices belong to song playing segments of the audio to be identified, integrating them yields the song playing segments and their playing positions, and integrating their song information yields the song information corresponding to each song playing segment.
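The integration step can be sketched as follows, assuming the matching stage has already produced a time-sorted list of (slice start time, matched song ID) pairs; the data shapes and names are illustrative, not from the patent:

```python
def integrate_matches(matched, slice_len_s=6.0):
    """Merge matched slices into song playing segments.

    `matched` is a time-sorted list of (start_time, song_id) pairs for
    slices whose fingerprints were found in the library. Adjacent,
    overlapping slices with the same song ID are merged into one
    segment; names and shapes here are illustrative assumptions.
    """
    segments = []
    for start, song_id in matched:
        if (segments and segments[-1]["song_id"] == song_id
                and start <= segments[-1]["end"]):
            segments[-1]["end"] = start + slice_len_s  # extend segment
        else:
            segments.append({"song_id": song_id,
                             "start": start,
                             "end": start + slice_len_s})
    return segments
```

For example, slices matched to song A at 0 s, 2 s, and 4 s merge into one segment spanning 0 to 10 s, while a later run of slices matched to song B forms a second segment.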
As an alternative embodiment, the song information includes a song ID and a song segment;
the steps of integrating the target audio slices to obtain a song playing segment in the audio to be identified and a playing position of the song playing segment, and integrating the song information corresponding to each target audio slice in the song playing segment to obtain the song information corresponding to the song playing segment, include:
integrating adjacent target audio slices corresponding to the same song ID to obtain a song playing segment corresponding to the same song ID in the audio to be identified and a playing position of the song playing segment;
and integrating the song segments corresponding to the target audio slices in the song playing segments to obtain the song segments corresponding to the song playing segments.
Specifically, the song information of the present application includes a song ID and a song segment. The song ID identifies which song is playing; the song segment gives the song's start and end playing positions.
It can be understood that target audio slices corresponding to the same song ID correspond to the same song. Integrating adjacent target audio slices with the same song ID therefore yields a song playing segment corresponding to that song ID in the audio to be identified, together with its playing position. Likewise, integrating the song segments corresponding to the target audio slices within that song playing segment yields the song segment corresponding to the song playing segment.
As shown in fig. 4, the long audio to be recognized includes two song playing segments: segment A in the time period T1 to T2 and segment B in the time period T3 to T4. Segment A is associated with the time period s1 to s2 of song 1, i.e. while segment A of the long audio plays, the song information of song 1 for s1 to s2 is displayed synchronously; segment B is associated with the time period s3 to s4 of song 2, i.e. while segment B plays, the song information of song 2 for s3 to s4 is displayed synchronously.
Based on the above, the application can also store each song playing segment of the audio to be identified, the song ID corresponding to each song playing segment and the song segment into the song list of the audio to be identified, so as to display the song information by referring to the song list when the audio to be identified is played.
As an alternative embodiment, before integrating the target audio slices that are adjacent and correspond to the same song ID, the method for displaying subtitles of a music piece further includes:
when the adjacent target audio slices correspond to different song IDs, judging whether the songs corresponding to the adjacent target audio slices belong to different versions of the same song;
and if so, determining a first song ID with higher popularity and a second song ID with lower popularity from the song IDs corresponding to the adjacent target audio slices, and updating the second song ID to the first song ID.
Further, since different versions of the same song have different song IDs, judging whether adjacent target audio slices correspond to the same song purely by song ID is unreliable. Therefore, before integrating adjacent target audio slices with the same song ID, it is first judged whether the adjacent slices correspond to different song IDs. If they share a song ID, their IDs are left unchanged. If the IDs differ, it is judged from those IDs whether the corresponding songs are different versions of the same song; if not, the IDs are again left unchanged. If they are different versions of the same song, a first song ID with higher popularity and a second song ID with lower popularity are determined from the IDs of the adjacent slices, and the second song ID is updated to the first, i.e. the adjacent slices are unified under the more popular song ID. The subsequent judgment of whether adjacent target audio slices correspond to the same song by song ID then becomes trustworthy.
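A minimal sketch of this version-unification step, under the assumption that a `same_song` predicate (are two IDs versions of one song?) and a popularity lookup (the patent's "heat") are available from the song library; both are hypothetical helpers:

```python
def unify_versions(slice_ids, same_song, popularity):
    """Rewrite song IDs so different versions of one song share the ID
    of the more popular version.

    `slice_ids` is the list of song IDs in playback order; `same_song`
    and `popularity` are assumed library-provided helpers. All names
    are illustrative, not from the patent.
    """
    out = list(slice_ids)
    for i in range(1, len(out)):
        a, b = out[i - 1], out[i]
        if a != b and same_song(a, b):
            # Keep the more popular ID for both adjacent slices.
            keep = a if popularity[a] >= popularity[b] else b
            out[i - 1] = out[i] = keep
    return out
```

After this pass, adjacent slices that were really the same song under two version IDs carry one ID, so the segment-merging step can treat them as a single song playing segment.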
As an alternative embodiment, the song information includes song pieces;
the method for positioning the playing position of the pure music segment only played by the song in the song playing segments comprises the following steps:
segmenting the song playing segment into a plurality of song playing slices;
acquiring reference audio fingerprint data of a song segment corresponding to a target song playing slice in an audio fingerprint library; the target song playing slice is any song playing slice;
obtaining the total number of the same audio fingerprint data in the actual audio fingerprint data of the target song playing slice and the reference audio fingerprint data;
if the total number is larger than a preset number threshold value, determining that the target song playing slice belongs to a pure music segment;
and integrating the target song playing slices belonging to the pure music segments to obtain the pure music segments in the song playing segments and the playing positions of the pure music segments.
Specifically, the process of locating the playing position of a pure music segment within a song playing segment of the audio to be identified is as follows. 1) The song playing segment is cut into a plurality of song playing slices, specifically based on a preset second slice length and a preset second slice interval density; the second slice length determines the length of each song playing slice, and the second slice interval density determines the interval between adjacent song playing slices. For fine-grained timing, the second slice length may be set to 0.5 s with one slice taken every 0.1 s. 2) Since the song information (including the song segment) corresponding to the song playing segment is known, the song segment corresponding to any song playing slice can be determined from it, and the reference audio fingerprint data of that song segment is obtained from the audio fingerprint library. 3) Based on the principle that a pure music segment matches the original song segment more closely than a background music segment does, the total number of identical audio fingerprint data shared by the actual audio fingerprint data of the first song playing slice and the reference audio fingerprint data of its corresponding song segment is obtained. For example, if the actual audio fingerprint data of the first song playing slice is ABCD and the reference audio fingerprint data is ABDE, the total number of identical audio fingerprint data is 3.
If the number of common audio fingerprint data is greater than a preset number threshold, the first song playing slice is determined to belong to a pure music segment. Similarly, the number of common audio fingerprint data between the actual audio fingerprint data of the second song playing slice and the reference audio fingerprint data of its corresponding song segment is obtained; if that number is greater than the preset number threshold, the second song playing slice is also determined to belong to a pure music segment. Repeating these steps determines whether each song playing slice (referred to as a target song playing slice) belongs to a pure music segment. 4) All target song playing slices belonging to pure music segments are then integrated to obtain the pure music segments within the song playing segment and their playing positions.
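The common-fingerprint counting and threshold test described above can be sketched as follows. The function names, the fingerprint representation (single letters, as in the ABCD/ABDE example), and the threshold value are all illustrative assumptions; the patent does not fix a fingerprint format.

```python
from collections import Counter

def common_fingerprint_count(actual, reference):
    """Count audio fingerprint data present in both sequences
    (multiset intersection, so repeated fingerprints are respected)."""
    return sum((Counter(actual) & Counter(reference)).values())

def is_pure_music(actual, reference, threshold=2):
    # threshold stands in for the patent's "preset number threshold"
    return common_fingerprint_count(actual, reference) > threshold

# The worked example: ABCD vs ABDE share A, B and D.
print(common_fingerprint_count(list("ABCD"), list("ABDE")))  # 3
print(is_pure_music(list("ABCD"), list("ABDE"), threshold=2))  # True
```

With a threshold of 2, a slice sharing 3 fingerprints with its reference song segment is classified as pure music; a slice sharing 2 or fewer would fall through to the background-music case described below.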
More specifically, the present application uses the LCS (Longest Common Subsequence) algorithm to perform slice-level matching between a song playing segment and its song segment, so as to precisely locate the playing position of a pure music segment. As shown in fig. 5, the LCS length over the corresponding time stamps between the first song playing slice of segment A and song 1 is obtained (this LCS length serves as the count of common audio fingerprint data between the actual audio fingerprint data of the slice and the reference audio fingerprint data of its corresponding song segment). Looping in this way, it can be determined whether each song playing slice of segment A belongs to a pure music segment, and all slices of segment A belonging to pure music segments are integrated to obtain the pure music segment within segment A and its playing position.
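The LCS length used for this slice matching is the classic dynamic-programming computation. A minimal sketch, assuming the fingerprint data of a slice can be treated as a comparable sequence:

```python
def lcs_length(a, b):
    """O(len(a) * len(b)) dynamic-programming longest-common-subsequence length."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

# For the ABCD/ABDE example the LCS is "ABD":
print(lcs_length("ABCD", "ABDE"))  # 3
```

Unlike a bare set intersection, the LCS preserves the temporal order of fingerprints, which is why it suits matching against the time-stamped reference sequence of a song segment.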
As an alternative embodiment, the method for displaying subtitles of a music piece further comprises:
if the total number is not greater than the preset number threshold, determining that the target song playing slice belongs to a background music segment in which anchor speech and song playback occur simultaneously;
and integrating the target song playing slices belonging to background music segments to obtain the background music segments in the song playing segment and the playing positions of the background music segments.
Further, it can be understood that when the number of common audio fingerprint data between the actual audio fingerprint data of the first song playing slice and the reference audio fingerprint data of its corresponding song segment is obtained, if that number is not greater than the preset number threshold, the first song playing slice is determined to belong to a background music segment. Looping in this way determines whether each target song playing slice belongs to a background music segment, and all target song playing slices belonging to background music segments are integrated to obtain the background music segments within the song playing segment and their playing positions.
In this way, the specific playing positions of the pure music segments and the background music segments are finally obtained, with an accuracy that can reach the millisecond level. As shown in fig. 6, the specific time range of each segment in the playback of song 1 can be identified.
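The integration step, which merges adjacent or overlapping slices classified alike into contiguous segments with start and end positions, can be sketched as follows (illustrative only; slice boundaries are assumed to be `(start, end)` pairs in seconds):

```python
def integrate_slices(classified_slices):
    """Merge overlapping or touching (start, end) slices into contiguous
    segments, yielding each segment's playing position."""
    segments = []
    for start, end in sorted(classified_slices):
        if segments and start <= segments[-1][1] + 1e-9:
            # Overlaps or touches the previous segment: extend it.
            segments[-1][1] = max(segments[-1][1], end)
        else:
            segments.append([start, end])
    return [tuple(s) for s in segments]

# Three overlapping pure-music slices plus one isolated slice:
slices = [(0.0, 0.5), (0.1, 0.6), (0.2, 0.7), (3.0, 3.5)]
print(integrate_slices(slices))  # [(0.0, 0.7), (3.0, 3.5)]
```

Because the input slices overlap at 0.1 s hops, the merged boundaries inherit that fine granularity, consistent with the millisecond-level accuracy claimed above.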
As an alternative embodiment, the method for displaying subtitles of a music piece further includes:
and performing audio recognition on the background music segment so as to display corresponding text information when the background music segment is played.
Furthermore, audio recognition can be performed on the background music segments of the audio to be identified, so that the corresponding text information is displayed while a background music segment is played, helping the user better understand the audio content.
In summary, the present application first uses a fuzzy matching algorithm (the algorithm that associates audio slices with songs) to identify the list of songs contained in the audio to be identified, and then uses a fine matching algorithm (the slice-level LCS matching algorithm) to locate the playing positions of pure music segments within the identified songs. In this way, the playing positions of pure anchor-speech segments (only anchor speech), background music segments (anchor speech over song playback) and pure music segments (only song playback) in the audio to be identified can all be accurately located, and the song information corresponding to each pure music segment can be displayed. This solves the problem of erroneous subtitles being produced for pure music segments by intelligent subtitle translation, and improves the user experience.
Referring to fig. 7, fig. 7 is a schematic structural diagram of a subtitle display apparatus for a music piece according to an embodiment of the present application.
The caption display device for a music piece includes:
a memory 100 for storing a computer program;
a processor 200 for implementing the steps of any one of the above-mentioned subtitle display methods for a music piece when executing the computer program.
For introduction of the subtitle display apparatus provided by the present application, please refer to the above-mentioned embodiment of the subtitle display method, which is not described herein again.
The application also provides a computer readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of any one of the above-mentioned music piece subtitle display methods.
For the introduction of the storage medium provided in the present application, please refer to the above-mentioned embodiment of the subtitle display method, which is not described herein again.
It is further noted that, in the present specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

1. A subtitle display method for a music piece is characterized by comprising the following steps:
segmenting audio to be identified into a plurality of audio slices, and extracting audio fingerprint characteristics of each audio slice;
matching the audio fingerprint characteristics from an audio fingerprint database established based on different songs to obtain the playing position of a song playing segment in the audio to be identified and song information corresponding to the song playing segment;
and positioning the playing position of a pure music segment only played by the song in the song playing segments so as to display corresponding song information when the pure music segment is played.
2. The method for displaying subtitles of music segments according to claim 1, wherein the step of matching the audio fingerprint features from an audio fingerprint library established based on different songs to obtain the playing position of a song playing segment in the audio to be identified and song information corresponding to the song playing segment comprises the steps of:
matching the audio fingerprint characteristics of each audio slice from the audio fingerprint library to obtain a target audio slice which is successfully matched and song information corresponding to the target audio slice;
integrating the target audio slices to obtain a song playing segment in the audio to be identified and a playing position of the song playing segment;
and integrating the song information corresponding to each target audio slice in the song playing segment to obtain the song information corresponding to the song playing segment.
3. The method for displaying subtitles for music tracks according to claim 2, wherein the song information comprises a song ID and a song track;
integrating the target audio slices to obtain a song playing segment in the audio to be identified and a playing position of the song playing segment; integrating song information corresponding to each target audio slice in the song playing segment to obtain song information corresponding to the song playing segment, wherein the song information comprises:
integrating adjacent target audio slices corresponding to the same song ID to obtain a song playing segment corresponding to the same song ID in the audio to be identified and a playing position of the song playing segment;
and integrating the song segments corresponding to the target audio slices in the song playing segments to obtain the song segments corresponding to the song playing segments.
4. The method of claim 3, wherein prior to integrating the target audio slices that are adjacent and correspond to the same song ID, the method further comprises:
when the adjacent target audio slices correspond to different song IDs, judging whether the songs corresponding to the adjacent target audio slices belong to different versions of the same song or not;
and if so, determining a first song ID with higher popularity and a second song ID with lower popularity from the song IDs corresponding to the adjacent target audio slices, and updating the second song ID to the first song ID.
5. The subtitle display method for a music piece according to any one of claims 1 to 4, wherein the song information includes a song piece;
positioning the playing position of the pure music segment only played by the song in the song playing segments, including:
segmenting the song playing segments into a plurality of song playing segments;
acquiring reference audio fingerprint data of a song segment corresponding to a target song playing slice in the audio fingerprint database; wherein the target song playing slice is any one of the song playing slices;
obtaining the total number of the same audio fingerprint data in the actual audio fingerprint data of the target song playing slice and the reference audio fingerprint data;
if the total number is larger than a preset number threshold value, determining that the target song playing slice belongs to a pure music segment;
and integrating the target song playing slices belonging to the pure music segments to obtain the pure music segments in the song playing segments and the playing positions of the pure music segments.
6. The method of displaying subtitles for a piece of music according to claim 5, wherein the method of displaying subtitles for a piece of music further comprises:
if the total number is not greater than the preset number threshold, determining that the target song playing slice belongs to a background music segment in which anchor speech and song playback occur simultaneously;
and integrating the target song playing slices belonging to background music segments to obtain the background music segments in the song playing segment and the playing positions of the background music segments.
7. The method for displaying subtitles for a piece of music according to claim 6, wherein the method for displaying subtitles for a piece of music further comprises:
and performing audio recognition on the background music segment so as to display corresponding text information when the background music segment is played.
8. A subtitle display apparatus for a piece of music, comprising:
a memory for storing a computer program;
processor for implementing the steps of a method for subtitle displaying of a piece of music according to any of claims 1 to 7 when executing said computer program.
9. A computer-readable storage medium, characterized in that a computer program is stored thereon, which computer program, when being executed by a processor, carries out the steps of the method for subtitle display of a piece of music according to any one of claims 1 to 7.
CN202210290074.8A 2022-03-23 2022-03-23 Subtitle display method and device for music segments and readable storage medium Pending CN114666653A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210290074.8A CN114666653A (en) 2022-03-23 2022-03-23 Subtitle display method and device for music segments and readable storage medium


Publications (1)

Publication Number Publication Date
CN114666653A true CN114666653A (en) 2022-06-24

Family

ID=82030532

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210290074.8A Pending CN114666653A (en) 2022-03-23 2022-03-23 Subtitle display method and device for music segments and readable storage medium

Country Status (1)

Country Link
CN (1) CN114666653A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100262909A1 (en) * 2009-04-10 2010-10-14 Cyberlink Corp. Method of Displaying Music Information in Multimedia Playback and Related Electronic Device
US20140335834A1 (en) * 2013-05-12 2014-11-13 Harry E Emerson, III Discovery of music artist and title by a smart phone provisioned to listen to itself
CN104598502A (en) * 2014-04-22 2015-05-06 腾讯科技(北京)有限公司 Method, device and system for obtaining background music information in played video
CN105808780A (en) * 2016-03-31 2016-07-27 广州酷狗计算机科技有限公司 Song recognition method and device
CN105868397A (en) * 2016-04-19 2016-08-17 腾讯科技(深圳)有限公司 Method and device for determining song
CN105913845A (en) * 2016-04-26 2016-08-31 惠州Tcl移动通信有限公司 Mobile terminal voice recognition and subtitle generation method and system and mobile terminal
CN107124623A (en) * 2017-05-12 2017-09-01 腾讯科技(深圳)有限公司 The transmission method and device of music file information
CN114141250A (en) * 2021-12-03 2022-03-04 广州酷狗计算机科技有限公司 Lyric recognition method and device, electronic equipment and readable storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination