CN111368136A - Song identification method and device, electronic equipment and storage medium - Google Patents

Song identification method and device, electronic equipment and storage medium

Info

Publication number
CN111368136A
CN111368136A
Authority
CN
China
Prior art keywords
original
target
song
lyric text
text
Prior art date
Legal status
Pending
Application number
CN202010244457.2A
Other languages
Chinese (zh)
Inventor
牛闯
Current Assignee
Reach Best Technology Co Ltd
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Reach Best Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Reach Best Technology Co Ltd filed Critical Reach Best Technology Co Ltd
Priority to CN202010244457.2A
Publication of CN111368136A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 - Information retrieval of video data
    • G06F16/73 - Querying
    • G06F16/735 - Filtering based on additional data, e.g. user or group profiles
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 - Information retrieval of video data
    • G06F16/78 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 - Retrieval using metadata automatically derived from the content
    • G06F16/7834 - Retrieval using metadata automatically derived from the content, using audio features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 - Information retrieval of video data
    • G06F16/78 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 - Retrieval using metadata automatically derived from the content
    • G06F16/7844 - Retrieval using metadata automatically derived from the content, using original textual content or text extracted from visual content or transcript of audio data
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 - Details of database functions independent of the retrieved data types
    • G06F16/903 - Querying
    • G06F16/90335 - Query processing
    • G06F16/90344 - Query processing by using string matching techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure relates to a song identification method and device, an electronic device, and a storage medium, and belongs to the field of computer technology. The method includes: extracting a target lyric text of a target song from a target video; matching the target lyric text with an original lyric text of at least one original song in an original song database; and when the matching degree between the target lyric text and the original lyric text of any original song is greater than a first preset matching degree, determining the original song whose matching degree is greater than the first preset matching degree as the original version of the target song. This expands the ways in which a target song in a target video can be identified; because the original version is identified from the target lyric text, identification accuracy is improved.

Description

Song identification method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a song recognition method and apparatus, an electronic device, and a storage medium.
Background
With people's rising standard of living, more and more users entertain themselves by listening to songs, but the same song can exist in several different versions, such as an original version and a cover version. It is therefore desirable to provide a method of identifying the original version of a song.
In the related art, the audio fingerprint of each original song is obtained in advance; the audio fingerprint of the target song is then obtained, the original song whose audio fingerprint is the same as that of the target song is queried, and the queried original song is the original version of the target song.
However, this identification method can only identify songs with the same or a similar melody. Even for the same song, if the melody of the target song changes, the audio fingerprints of the target song and the original song will differ, the original song corresponding to the target song cannot be found, and identification accuracy is low.
Disclosure of Invention
The present disclosure provides a song identification method, apparatus, electronic device and storage medium, which can identify an original version of a target song in a target video and improve identification accuracy.
According to a first aspect of embodiments of the present disclosure, there is provided a song identification method, the method including:
extracting a target lyric text of a target song from a target video;
matching the target lyric text with an original lyric text of at least one original song in an original song database;
and when the matching degree of the target lyric text and the original lyric text of any original song is greater than a first preset matching degree, determining the original song with the matching degree greater than the first preset matching degree as the original version of the target song.
In one possible implementation, the extracting, from the target video, target lyric text of the target song includes:
extracting audio information of a target song in the target video;
and converting the audio information into the target lyric text by adopting an audio recognition technology.
In another possible implementation manner, the extracting, from the target video, a target lyric text of the target song includes:
acquiring at least one video frame in the target video;
and identifying text in the at least one video frame, and taking the text in the at least one video frame as the target lyric text.
In another possible implementation manner, the matching the target lyric text with the original lyric text of at least one original song in an original song database includes:
converting the target lyric text into a target lyric code, wherein the target lyric code is used for representing the pronunciation of the target lyric text;
matching the target lyric code with an original lyric code of at least one original lyric text;
when the matching degree of the target lyric text and the original lyric text of any original song is greater than a first preset matching degree, determining that the original song with the matching degree greater than the first preset matching degree is the original version of the target song, including:
and when the matching degree of the target lyric code and the original lyric code of any original lyric text is greater than a first preset matching degree, determining that the original song corresponding to the original lyric text with the matching degree greater than the first preset matching degree is the original version of the target song.
In another possible implementation manner, the matching the target lyric text with the original lyric text of at least one original song in an original song database includes:
for any one of the at least one original song, matching each character in the target lyric text with each character in an original lyric text of the original song;
and determining the number of matched characters of the target lyric text and the original lyric text as the matching degree of the target lyric text and the original lyric text.
In another possible implementation, the target lyric text comprises a plurality of target lyric text segments, and the original lyric text of the at least one original song comprises a plurality of original lyric text segments; the matching the target lyric text with the original lyric text of at least one original song in an original song database comprises:
for any one of the at least one original song, matching each target lyric text segment in the target lyric text with each original lyric text segment in the original lyric text of the original song;
and determining the matching degree of each target lyric text segment in the target lyric text and each original lyric text segment in the original lyric text of the original song.
In another possible implementation manner, the matching, for any one of the at least one original song, each target lyric text segment in the target lyric text with each original lyric text segment in the original lyric text of the original song includes:
for any one of the at least one original song, matching each character in each target lyric text segment in the target lyric text with each character in each original lyric text segment in the original lyric text of the original song;
and determining the number of matched characters of the target lyric text segment and the original lyric text segment as the matching degree of the target lyric text segment and the original lyric text segment.
In another possible implementation manner, the matching the target lyric text with the original lyric text of at least one original song in an original song database includes:
and determining the number of matched text segments of the target lyric text and the original lyric text as the matching degree of the target lyric text and the original lyric text.
In another possible implementation, the method further includes:
and when the target lyric text comprises a plurality of target lyric text segments, performing de-duplication processing on the target lyric text segments.
According to a second aspect of embodiments of the present disclosure, there is provided a song recognition apparatus, the apparatus including:
the extraction unit is used for extracting a target lyric text of a target song from a target video;
the matching unit is used for matching the target lyric text with the original lyric text of at least one original song in an original song database;
and the determining unit is used for determining the original song with the matching degree higher than the first preset matching degree as the original version of the target song when the matching degree of the target lyric text and the original lyric text of any original song is higher than the first preset matching degree.
In one possible implementation, the extraction unit includes:
the extraction subunit is used for extracting the audio information of the target song in the target video;
and the conversion subunit is used for converting the audio information into the target lyric text by adopting an audio recognition technology.
In another possible implementation manner, the extraction unit includes:
the first acquisition subunit is used for acquiring at least one video frame in the target video;
and the identification subunit is used for identifying the text in the at least one video frame and taking the text in the at least one video frame as the target lyric text.
In another possible implementation manner, the matching unit is configured to convert the target lyric text into a target lyric code, where the target lyric code is used to represent the pronunciation of the target lyric text;
the matching unit is also used for matching the target lyric codes with original lyric codes of at least one original lyric text;
the determining unit is used for determining the original song corresponding to the original lyric text with the matching degree larger than a first preset matching degree as the original version of the target song when the matching degree of the target lyric code and the original lyric code of any original lyric text is larger than the first preset matching degree.
In another possible implementation manner, the matching unit includes:
a matching subunit, configured to match, for any original song of the at least one original song, each character in the target lyric text with each character in an original lyric text of the original song;
and the determining subunit is used for determining the number of the matched characters of the target lyric text and the original lyric text as the matching degree of the target lyric text and the original lyric text.
In another possible implementation manner, the target lyric text comprises a plurality of target lyric text segments, and the matching unit comprises:
a matching subunit, configured to match, for any original song of the at least one original song, each target lyric text segment in the target lyric text with each original lyric text segment in the original lyric text of the original song;
and the determining subunit is used for determining the matching degree between each target lyric text segment in the target lyric text and each original lyric text segment in the original lyric text of the original song.
In another possible implementation manner, the matching subunit is configured to, for any original song of the at least one original song, match each character in each target lyric text segment in the target lyric text with each character in each original lyric text segment in the original lyric text of the original song;
the matching subunit is further configured to determine the number of matching characters of the target lyric text segment and the original lyric text segment as the matching degree of the target lyric text segment and the original lyric text segment.
In another possible implementation manner, the matching unit is configured to determine the number of matching text segments of the target lyric text and the original lyric text as a matching degree of the target lyric text and the original lyric text.
In another possible implementation manner, the apparatus further includes:
and the de-duplication unit is used for performing de-duplication processing on the plurality of target lyric text segments when the target lyric text comprises a plurality of target lyric text segments.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:
one or more processors;
volatile or non-volatile memory for storing instructions executable by the one or more processors;
wherein the one or more processors are configured to perform the song recognition method of the first aspect.
According to a fourth aspect provided by embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium, wherein instructions of the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the song recognition method according to the first aspect.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product, wherein the instructions of the computer program product, when executed by a processor of an electronic device, enable the electronic device to perform the song recognition method according to the first aspect.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
the method, the device, the electronic equipment and the storage medium provided by the embodiment of the application extract a target lyric text of a target song from a target video, match the target lyric text with an original lyric text of at least one original song in an original song database, and determine that the original song with the matching degree larger than a first preset matching degree is an original version of the target song when the matching degree of the target lyric text and the original lyric text of any original song is larger than the first preset matching degree. The method for identifying the target song in the target video is expanded, the original version of the target song is identified by adopting the target lyric text, and the identification accuracy is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a flow diagram illustrating a song identification method according to an exemplary embodiment.
Fig. 2 is a flow diagram illustrating a song identification method according to an example embodiment.
Fig. 3 is a flow diagram illustrating a song identification method according to an example embodiment.
Fig. 4 is a schematic diagram illustrating a structure of a song recognition apparatus according to an exemplary embodiment.
Fig. 5 is a schematic diagram illustrating a structure of another song recognition apparatus according to an exemplary embodiment.
Fig. 6 is a block diagram illustrating a terminal according to an example embodiment.
Fig. 7 is a schematic diagram illustrating a configuration of a server according to an example embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The embodiment of the disclosure provides a song identification method, which can extract a target lyric text of a target song from a target video, match the target lyric text with an original lyric text of at least one original song in an original song database, and determine an original song with a matching degree greater than a first preset matching degree as an original version of the target song when the matching degree of the target lyric text and the original lyric text of any original song is greater than the first preset matching degree.
For example, the method provided by the embodiment of the present disclosure is applied to a scenario of identifying the original song corresponding to a song in a video. After any user uploads a video in a video application, the method provided by the embodiment of the present disclosure can identify the original song corresponding to the song in the video, thereby determining that the original song is the original version of the song in the video, that is, determining which song the song in the video is a cover of.
In addition, after the original song is determined, if the original song is under individual copyright, uploading of the video in the video application may be restricted; or, if the original song is highly popular, the number of times the video is recommended may be limited, and so on.
The song identification method provided by the embodiment of the disclosure is applied to electronic equipment, and the electronic equipment can comprise a terminal and can also comprise a server.
When the electronic equipment comprises a terminal, the terminal is used for extracting a target lyric text of a target song from a target video, matching the target lyric text with an original lyric text of at least one original song in an original song database, and when the matching degree of the target lyric text and the original lyric text of any original song is greater than a first preset matching degree, determining that the original song is an original version of the target song.
Or when the electronic equipment comprises a terminal and a server, the terminal is used for sending the target video to the server, the server is used for extracting the target lyric text of the target song from the target video, matching the target lyric text with the original lyric text of at least one original song in the original song database, and when the matching degree of the target lyric text and the original lyric text of any original song is greater than a first preset matching degree, determining the original song with the matching degree greater than the first preset matching degree as the original version of the target song.
The terminal can be any of various terminals such as a mobile phone, a tablet computer, or a personal computer, and the server can be a single server, a server cluster composed of multiple servers, or a cloud computing service center.
Fig. 1 is a flow diagram illustrating a song recognition method according to an exemplary embodiment, referring to fig. 1, the method including:
in step 101, a target lyric text of a target song is extracted from a target video.
In step 102, the target lyrics text is matched with the original lyrics text of at least one original song in the original song database.
In step 103, when the matching degree of the target lyric text and the original lyric text of any original song is greater than a first preset matching degree, the original song with the matching degree greater than the first preset matching degree is determined to be the original version of the target song.
The method provided by the embodiment of the present disclosure extracts a target lyric text of a target song from a target video, matches the target lyric text with an original lyric text of at least one original song in an original song database, and, when the matching degree between the target lyric text and the original lyric text of any original song is greater than a first preset matching degree, determines the original song whose matching degree is greater than the first preset matching degree as the original version of the target song. This expands the ways in which a target song in a target video can be identified; because the original version is identified from the target lyric text, identification accuracy is improved.
In one possible implementation, extracting target lyric text of a target song from a target video includes:
extracting audio information of a target song in a target video;
and converting the audio information into a target lyric text by adopting an audio recognition technology.
In another possible implementation manner, extracting target lyric text of a target song from a target video includes:
acquiring at least one video frame in a target video;
and identifying text in at least one video frame, and taking the text in at least one video frame as the target lyric text.
In another possible implementation, matching the target lyric text with the original lyric text of at least one original song in the original song database includes:
converting the target lyric text into a target lyric code, wherein the target lyric code is used for expressing the pronunciation of the target lyric text;
matching the target lyric code with at least one original lyric code of an original lyric text;
when the matching degree of the target lyric text and the original lyric text of any original song is greater than a first preset matching degree, determining the original song with the matching degree greater than the first preset matching degree as the original version of the target song includes:
and when the matching degree of the target lyric code and the original lyric code of any original lyric text is greater than a first preset matching degree, determining the original song corresponding to the original lyric text with the matching degree greater than the first preset matching degree as the original version of the target song.
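The pronunciation-code matching described above can be sketched as follows. The character-to-pronunciation table is a tiny hand-made stand-in (an assumption for illustration) for a real grapheme-to-pinyin dictionary such as one built with a library like pypinyin; the function names are likewise illustrative.

```python
# Toy pronunciation table: real systems would use a full Chinese
# character-to-pinyin dictionary. Note that the homophones "好" and
# "号" share the code "hao".
TOY_PINYIN = {"你": "ni", "好": "hao", "号": "hao", "世": "shi", "界": "jie"}

def to_lyric_code(text):
    """Map each character to its pronunciation code; unknown characters
    fall back to the character itself."""
    return [TOY_PINYIN.get(ch, ch) for ch in text]

def code_matching_degree(target_text, original_text):
    """Count positions where the pronunciation codes agree."""
    target_code = to_lyric_code(target_text)
    original_code = to_lyric_code(original_text)
    return sum(1 for a, b in zip(target_code, original_code) if a == b)

# "你号世界" (a homophone mistranscription of "你好世界") still fully
# matches by pronunciation, which is the point of coding the lyrics.
print(code_matching_degree("你号世界", "你好世界"))  # 4
```

Comparing codes rather than raw characters tolerates homophone errors introduced by audio recognition, which is why the embodiment converts the lyric text before matching.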
In another possible implementation, matching the target lyric text with the original lyric text of at least one original song in the original song database includes:
for any one of the at least one original song, matching each character in the target lyric text with each character in the original lyric text of the original song;
and determining the number of the matched characters of the target lyric text and the original lyric text as the matching degree of the target lyric text and the original lyric text.
In another possible implementation, the target lyric text comprises a plurality of target lyric text segments, and the original lyric text of the at least one original song comprises a plurality of original lyric text segments; matching the target lyric text with an original lyric text of at least one original song in an original song database comprises:
for any one of the at least one original song, matching each target lyric text segment in the target lyric text with each original lyric text segment in the original lyric text of the original song;
and determining the matching degree between each target lyric text segment in the target lyric text and each original lyric text segment in the original lyric text of the original song.
In another possible implementation, for any one of the at least one original song, matching each target lyric text segment in the target lyric text with each original lyric text segment in the original lyric text of the original song comprises:
for any one of the at least one original song, matching each character in each target lyric text segment in the target lyric text with each character in each original lyric text segment in the original lyric text of the original song;
and determining the number of matched characters of the target lyric text segment and the original lyric text segment as the matching degree of the target lyric text segment and the original lyric text segment.
In another possible implementation, matching the target lyric text with the original lyric text of at least one original song in the original song database includes:
and determining the number of matched text segments of the target lyric text and the original lyric text as the matching degree of the target lyric text and the original lyric text.
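The segment-count matching above can be sketched as follows. The per-segment rule (fraction of positionally equal characters) and the 0.6 threshold are assumptions standing in for the preset per-segment matching degree; the disclosure does not fix these values.

```python
def segment_matching_degree(target_segments, original_segments,
                            segment_threshold=0.6):
    """Count how many target lyric text segments match some original
    lyric text segment; this count is the text-level matching degree.

    Two segments are considered matched when the fraction of positionally
    equal characters exceeds segment_threshold (an illustrative stand-in
    for the per-segment preset matching degree).
    """
    def char_ratio(a, b):
        if not a or not b:
            return 0.0
        hits = sum(1 for x, y in zip(a, b) if x == y)
        return hits / max(len(a), len(b))

    matched = 0
    for seg in target_segments:
        if any(char_ratio(seg, orig) > segment_threshold
               for orig in original_segments):
            matched += 1
    return matched

print(segment_matching_degree(["hello world", "la la la"],
                              ["hello world", "something else"]))  # 1
```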
In another possible implementation, the method further includes:
and when the target lyric text comprises a plurality of target lyric text segments, performing de-duplication processing on the target lyric text segments.
Fig. 2 is a flowchart illustrating a song recognition method, see fig. 2, applied in an electronic device, according to an exemplary embodiment, the method including:
in step 201, audio information of a target song in a target video is extracted.
Wherein, the target video is any video. For example, the target video may be a song video, a dance video, a lecture video, and so on. The target video includes a target song, and audio information of the target song can be extracted from the target video; the audio information may be a background song in the target video, a song sung by a user in the target video, or a song otherwise present in the target video, and so on.
In addition, the target video can be a video uploaded in any video application, or a video uploaded in other types of applications, and the like.
If the target video includes the target song, the audio information can be extracted from the target video, that is, the target song in the target video is extracted, and the original version of the target song is then identified according to the audio information.
In step 202, audio recognition techniques are used to convert the audio information into the target lyric text.
Wherein the audio recognition technique is used to recognize text in the audio information. For example, audio recognition techniques may include methods based on linguistics and acoustics, stochastic modeling methods, methods using artificial neural networks, probabilistic grammar analysis, and so on.
After the audio information in the target video is obtained, the audio information is converted into a target lyric text by adopting an audio identification technology, and then an original song matched with the target song in an original song database can be identified by adopting a text matching mode.
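The conversion step can be sketched as the following pipeline. The recognizer itself is stubbed out (a real system would plug in an ASR engine here); only the normalization that prepares the recognized text for subsequent matching is shown, and all names are illustrative.

```python
import re

def normalize_lyrics(raw_text):
    """Normalize recognizer output before matching: drop punctuation and
    collapse runs of whitespace so only the lyric characters remain."""
    cleaned = re.sub(r"[^\w\s]", "", raw_text)
    return re.sub(r"\s+", " ", cleaned).strip()

def audio_to_lyric_text(audio_bytes, recognizer):
    """Run a speech recognizer over the extracted audio information and
    normalize the result. `recognizer` is any callable mapping audio
    bytes to text; in practice it would be an ASR engine."""
    return normalize_lyrics(recognizer(audio_bytes))

# Stub recognizer standing in for a real ASR engine.
fake_asr = lambda audio: "Twinkle, twinkle,  little star!"
print(audio_to_lyric_text(b"...", fake_asr))  # Twinkle twinkle little star
```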
In one possible implementation, when the target lyric text includes a plurality of target lyric text segments, de-duplication processing is performed on the plurality of target lyric text segments.
After the audio information of the target song is converted into the target lyric text, the obtained target lyric text may include a plurality of identical target lyric text segments. Matching these segments repeatedly would waste resources, so after the plurality of target lyric text segments is obtained, de-duplication processing is performed on them.
Optionally, identical target lyric text segments are selected from the plurality of target lyric text segments, any one segment of each group of identical target lyric text segments is retained and stored, and the other segments in that group are deleted, thereby completing the de-duplication processing.
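The de-duplication described above (keep one copy of each group of identical segments, delete the rest) can be sketched as an order-preserving filter:

```python
def dedupe_segments(segments):
    """Keep the first occurrence of each lyric text segment (e.g. a
    repeated chorus) and drop later duplicates, preserving order."""
    seen = set()
    unique = []
    for seg in segments:
        if seg not in seen:
            seen.add(seg)
            unique.append(seg)
    return unique

print(dedupe_segments(["verse 1", "chorus", "verse 2", "chorus"]))
# ['verse 1', 'chorus', 'verse 2']
```

Because each retained segment is matched against the database only once, the repeated work the passage warns about is avoided.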
In step 203, the target lyrics text is matched with the original lyrics text of at least one original song in the original song database.
Wherein the original song database comprises at least one original song. The original song is the first released version of a song; after another singer re-records the song, the re-recorded version is a cover song corresponding to the original song. Similarly, when another user re-arranges the accompaniment of the song to form a new song, the new song is also a cover corresponding to the original song; and when another user rewrites the lyrics of the song to form a new song, that new song is likewise a cover corresponding to the original song.
In addition, the original song database also comprises the original lyric text of the at least one original song. After the target lyric text of the target song is obtained, the target lyric text can be matched with the original lyric text of the at least one original song in the original song database, so as to obtain the matching degree between the target lyric text and the at least one original lyric text.
Additionally, in one possible implementation, when the original lyric text includes a plurality of original lyric text pieces, the plurality of original lyric text pieces are de-duplicated. The step of performing de-duplication on the original lyric text segments is similar to the step of performing de-duplication on the target lyric text segments, and is not described herein again.
The method for acquiring the matching degree of the target lyric text and the at least one original lyric text comprises any one of the following steps:
1. For any original song of the at least one original song, each character in the target lyric text is matched with each character in the original lyric text of the original song, and the number of matched characters between the target lyric text and the original lyric text is determined as the matching degree of the target lyric text and the original lyric text.
Wherein, two identical characters are matched characters.
When the matching degree of the target lyric text and the original lyric text is determined, for any original song of the at least one original song, each character in the target lyric text is matched with each character in the original lyric text of the original song. The more characters match, the closer the target lyric text is to the original lyric text. Therefore, the number of matched characters between the target lyric text and the original lyric text of the original song is obtained, and this number is taken as the matching degree of the target lyric text and the original lyric text.
For example, when the target lyric text is "here, most excited tone" and the original lyric text is "here, most excited tone of the song", the number of matching characters of the two texts is 9.
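One way to realize the character-count matching described above is a multiset intersection of the two texts' characters; this interpretation, and the function name, are illustrative rather than mandated by the application:

```python
from collections import Counter

def matching_characters(target_text, original_text):
    """Count characters common to both texts (multiset intersection), as one
    reading of the 'number of matched characters' matching degree."""
    overlap = Counter(target_text) & Counter(original_text)
    return sum(overlap.values())

# Every character of the shorter text also appears in the longer one here,
# so the matching degree equals the length of the shorter text.
print(matching_characters("the most excited tone",
                          "the most excited tone of the song"))
```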
2. The target lyrics text comprises a plurality of target lyrics text segments and the original lyrics text of the original song comprises a plurality of original lyrics text segments.
For any original song of the at least one original song, each target lyric text segment in the target lyric text is matched with each original lyric text segment in the original lyric text of the original song, and the matching degree of the target lyric text and the original lyric text of the original song is determined according to the matching degree of each target lyric text segment and each original lyric text segment.
Optionally, for any original song of the at least one original song, each character in each target lyric text segment of the target lyric text is matched with each character in each original lyric text segment of the original lyric text of the original song, and the number of matched characters between a target lyric text segment and an original lyric text segment is determined as the matching degree of that target lyric text segment and that original lyric text segment.
When a target lyric text segment is matched with an original lyric text segment, each character in the target lyric text segment is matched with each character in the original lyric text segment, the number of matched characters between the two segments is obtained, and this number is determined as the matching degree of the target lyric text segment and the original lyric text segment.
For example, if the target lyric text segment is "the most beautiful sound" and the original lyric text segment is "the most beautiful song", the matching degree of the two segments is determined to be 3 when they are matched.
Optionally, the number of consecutive matching characters in the target lyrics text fragment and the original lyrics text fragment is determined as a degree of matching of the target lyrics text fragment and the original lyrics text fragment.
When the matching degree of the target lyric text segment and the original lyric text segment is determined according to the characters in the two segments, the number of consecutive matching characters between the target lyric text segment and the original lyric text segment is determined as the matching degree of the target lyric text segment and the original lyric text segment.
For example, when 3 consecutive characters match between the target lyric text segment and the original lyric text segment, the matching degree of the two segments is 3; when 3 characters match but are separated by non-matching characters, the matching degree of the two segments is 1.
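The consecutive-match rule above amounts to the length of the longest common substring of the two segments; a sketch (the function name and the dynamic-programming formulation are illustrative):

```python
def consecutive_match_degree(target_seg, original_seg):
    """Length of the longest run of consecutive matching characters
    (longest common substring) between the two lyric text segments."""
    m, n = len(target_seg), len(original_seg)
    best = 0
    # prev[j] = length of the common suffix ending at the previous target
    # character and original_seg[j-1]
    prev = [0] * (n + 1)
    for i in range(1, m + 1):
        cur = [0] * (n + 1)
        for j in range(1, n + 1):
            if target_seg[i - 1] == original_seg[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# Three consecutive matches score 3; the same characters scattered score 1.
print(consecutive_match_degree("abcxx", "abcyy"))  # 3
print(consecutive_match_degree("axbxc", "abc"))    # 1
```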
In the embodiment of the present application, determining the number of consecutive matching characters in the target lyric text segment and the original lyric text segment as the matching degree of the two segments can improve the accuracy of the determined matching degree.
Optionally, when the matching degree between the target lyric text segment and the original lyric text segment is obtained, a first lyric vector of the target lyric text segment and a second lyric vector of the original lyric text segment are obtained, and then the matching degree of the target lyric text segment and the original lyric text segment is determined according to the first lyric vector and the second lyric vector.
The matching degree of the target lyric text segment and the original lyric text segment is represented by the Euclidean distance or the cosine similarity of the first lyric vector and the second lyric vector. The larger the Euclidean distance between the first lyric vector and the second lyric vector, the smaller the matching degree of the target lyric text segment and the original lyric text segment; and the larger the cosine similarity between the first lyric vector and the second lyric vector, the larger the matching degree of the target lyric text segment and the original lyric text segment.
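A minimal sketch of the two vector comparisons, assuming fixed-length lyric vectors have already been produced for both segments (the vector values below are hypothetical):

```python
import math

def euclidean_distance(v1, v2):
    """Smaller distance between lyric vectors means a higher matching degree."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))

def cosine_similarity(v1, v2):
    """Larger similarity between lyric vectors means a higher matching degree."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return dot / (norm1 * norm2)

first = [0.1, 0.9, 0.3]   # hypothetical lyric vector of the target segment
second = [0.2, 0.8, 0.4]  # hypothetical lyric vector of the original segment
print(euclidean_distance(first, second))
print(cosine_similarity(first, second))
```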
3. The number of matched text segments between the target lyric text and the original lyric text is determined as the matching degree of the target lyric text and the original lyric text.
When the matching degree of the target lyric text and the original lyric text is determined, for any original song of the at least one original song, each target lyric text segment in the target lyric text is matched with each original lyric text segment in the original lyric text of the original song, the number of matched text segments between the target lyric text and the original lyric text of the original song is obtained, and this number is taken as the matching degree of the target lyric text and the original lyric text.
Optionally, when the matching degree of the target lyric text segment and the original lyric text segment is greater than a second preset matching degree, the target lyric text segment and the original lyric text segment are determined to be matched text segments.
The second preset matching degree is set by a server, or set by a developer, or set in other ways.
4. The number of matched text segments between the target lyric text segments and the original lyric text segments is acquired, and the ratio of the acquired number to the total number of target lyric text segments in the target lyric text is determined as the matching degree of the target lyric text and the original lyric text.
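The ratio-based matching degree above can be sketched as follows; the degree function plugged in here (shared characters) is illustrative only, whereas the application's embodiments would use one of the character or consecutive-character matching degrees described earlier:

```python
def segment_match_ratio(target_segments, original_segments, threshold, degree_fn):
    """Fraction of target lyric text segments whose best match against any
    original lyric text segment exceeds the second preset matching degree."""
    matched = 0
    for tseg in target_segments:
        best = max(degree_fn(tseg, oseg) for oseg in original_segments)
        if best > threshold:
            matched += 1
    return matched / len(target_segments)

# Illustrative degree function: number of distinct shared characters.
degree = lambda a, b: len(set(a) & set(b))
print(segment_match_ratio(["abcd", "wxyz"], ["abce", "qrst"], 2, degree))  # 0.5
```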
In step 204, when the matching degree of the target lyric text and the original lyric text of any original song is greater than a first preset matching degree, the original song with the matching degree greater than the first preset matching degree is determined to be the original version of the target song.
And when the matching degree of the target lyric text and the original lyric text is greater than a first preset matching degree, determining the original song with the matching degree greater than the first preset matching degree as the original version of the target song.
The first preset matching degree is set by a server, or by a developer, or in other ways.
For example, when the matching degree of the target lyric text in the target song a and the original lyric text of the original song B is 5 and the first preset matching degree is 4, the obtained matching degree is greater than the first preset matching degree, and the original song B is determined to be the original version of the target song a.
It should be noted that the embodiment of the present application only takes matching the target lyric text with the original lyric text as an example. In another embodiment, the target lyric text is converted into a target lyric code, the target lyric code is matched with the original lyric code of the original lyric text, and when the matching degree between the target lyric code and any original lyric code is greater than a first preset matching degree, the original song whose matching degree is greater than the first preset matching degree is determined as the original version of the target song.
Wherein the target lyric codes are used for representing the pronunciation of the target lyric texts, and the original lyric codes are used for representing the pronunciation of the original lyric texts.
Taking a Chinese target lyric text as an example: when the target lyric text is "the most beautiful song" (最美的歌曲), it is converted into the target lyric code "zui mei de ge qu".
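A minimal sketch of the code conversion and code matching, assuming a tiny hand-written character-to-pinyin table (a real system would use a complete pinyin dictionary or library; all names here are illustrative):

```python
# Hypothetical character-to-pinyin table, for illustration only.
PINYIN = {"最": "zui", "美": "mei", "的": "de", "歌": "ge", "曲": "qu"}

def to_lyric_code(text):
    """Convert a lyric text into its pronunciation code (space-joined pinyin)."""
    return " ".join(PINYIN.get(ch, ch) for ch in text)

def code_match_degree(target_code, original_code):
    """Number of position-wise matching syllables between the two codes."""
    return sum(1 for a, b in zip(target_code.split(), original_code.split())
               if a == b)

target_code = to_lyric_code("最美的歌曲")
print(target_code)                                              # zui mei de ge qu
print(code_match_degree(target_code, to_lyric_code("最美的歌")))  # 4
```

Matching on pronunciation codes rather than raw characters lets homophones (different characters with the same reading) still count as matches, which is the expansion of the text range the application describes.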
In addition, the original song database can store original lyric codes of original lyric texts of original songs, the target lyric texts are converted into target lyric codes, then the target lyric codes can be matched with the original lyric codes of the original songs, and when the matching degree of the target lyric codes and the original lyric codes is larger than a first preset matching degree, the original songs with the matching degree larger than the first preset matching degree are determined to be original versions of the target songs.
Or, the original song database does not store the original lyric codes of the original lyric texts of the original songs in advance, the target lyric texts are converted into the target lyric codes, the original lyric texts of the original songs are converted into the original lyric codes, the target lyric codes are matched with the original lyric codes of the original songs, and when the matching degree of the target lyric codes and the original lyric codes is greater than a first preset matching degree, the original songs with the matching degree greater than the first preset matching degree are determined to be the original versions of the target songs.
Wherein, matching the target lyric code with the original lyric code of the original lyric text comprises:
1. For any original song of the at least one original song, the code of each character in the target lyric text is matched with the code of each character in the original lyric text of the original song, and the number of matched codes between the target lyric text and the original lyric text is determined as the matching degree of the target lyric text and the original lyric text.
2. The target lyric text comprises a plurality of target lyric text segments and the original lyric text of the at least one original song comprises a plurality of original lyric text segments.
For any original song of the at least one original song, the code of each target lyric text segment in the target lyric text is matched with the code of each original lyric text segment in the original lyric text of the original song, and the matching degree of the target lyric text and the original lyric text is determined according to the matching degree of the code of each target lyric text segment and the code of each original lyric text segment.
Optionally, for any one of the at least one original song, the code of each character in each target lyric text segment of the target lyric text is matched with the code of each character in each original lyric text segment of the original lyric text of the original song, and the number of matched codes between a target lyric text segment and an original lyric text segment is determined as the matching degree of that target lyric text segment and that original lyric text segment.
Optionally, the number of consecutive matching codes in the target lyrics text fragment and the original lyrics text fragment is determined as the degree of matching of the target lyrics text fragment and the original lyrics text fragment.
When the matching degree of the target lyric text segment and the original lyric text segment is determined according to the codes in the two segments, the number of consecutive matching codes between the target lyric text segment and the original lyric text segment is determined as the matching degree of the target lyric text segment and the original lyric text segment.
In the embodiment of the present application, determining the number of consecutive matching codes in the target lyric text segment and the original lyric text segment as the matching degree of the two segments can improve the accuracy of the determined matching degree.
Optionally, when the matching degree between the target lyric text fragment and the original lyric text fragment is obtained, a first encoding vector of the encoding of the target lyric text fragment and a second encoding vector of the encoding of the original lyric text fragment are obtained, and then the matching degree between the target lyric text fragment and the original lyric text fragment is determined according to the first encoding vector and the second encoding vector.
The matching degree of the target lyric text segment and the original lyric text segment may be represented by the Euclidean distance, the cosine similarity, or the like of the first encoding vector and the second encoding vector.
3. The number of matched code segments between the target lyric text and the original lyric text is determined as the matching degree of the target lyric text and the original lyric text.
When the matching degree of the target lyric text and the original lyric text is determined, for any original song of the at least one original song, the target code segment of each target lyric text segment in the target lyric text is matched with the original code segment of each original lyric text segment in the original lyric text of the original song, the number of matched code segments between the target lyric text and the original lyric text of the original song is obtained, and this number is taken as the matching degree of the target lyric text and the original lyric text.
Optionally, when the matching degree of the target lyric coded segment and the original lyric coded segment is greater than a second preset matching degree, the target lyric coded segment and the original lyric coded segment are determined to be matching coded segments.
The second preset matching degree is set by a server, or set by a developer, or set in other ways.
4. The number of matched code segments between the target lyric code segments and the original lyric code segments is acquired, and the ratio of the acquired number to the total number of target lyric code segments in the target lyric code is determined as the matching degree of the target lyric text and the original lyric text.
According to the method provided by the embodiment of the application, the target lyric text of the target song is extracted from the target video, the target lyric text is matched with the original lyric text of at least one original song in the original song database, and when the matching degree of the target lyric text and the original lyric text of any original song is larger than a first preset matching degree, the original song with the matching degree larger than the first preset matching degree is determined to be the original version of the target song. The method for identifying the target song in the target video is expanded, the original version of the target song is identified by adopting the target lyric text, and the identification accuracy is improved.
In addition, in the method provided by the embodiment of the present application, the target lyric text is converted into a target lyric code, the target lyric code is matched with the original lyric code of the original lyric text, and when the matching degree between the target lyric code and the original lyric code is greater than the first preset matching degree, the original song whose matching degree is greater than the first preset matching degree is determined as the original version of the target song. Because codes are used to represent the pronunciation of the text, the range of matchable text can be expanded; and determining the matching degree of the target lyric text and the original lyric text from the lyric codes can improve the accuracy of text recognition, thereby further improving the accuracy of identifying the original version of the target song.
Fig. 3 is a flowchart illustrating a song recognition method, see fig. 3, applied in an electronic device, according to an exemplary embodiment, the method including:
in step 301, at least one video frame in a target video is acquired.
In step 302, text in at least one video frame is identified, and the text in the at least one video frame is taken as the target lyrics text.
The target video comprises at least one video frame, and the target video may also include text information. At least one video frame in the target video is obtained, and then the text in each of the at least one video frame is recognized; the recognized text is the target lyric text of the target song in the target video.
In one possible implementation, OCR (Optical Character Recognition) technology is used to recognize text in the video frames, or another technology is used to recognize the text included in the video frames.
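Before text recognition, a subset of frames is typically selected rather than every frame; the sampling interval below is a hypothetical design choice, not something specified in the application:

```python
def frame_sample_indices(total_frames, fps, interval_seconds):
    """Indices of video frames to pass to text recognition, taking one
    frame every interval_seconds of video."""
    step = max(1, int(fps * interval_seconds))
    return list(range(0, total_frames, step))

# A 10-second clip at 30 fps, sampled once per second, yields 10 frames.
print(frame_sample_indices(total_frames=300, fps=30, interval_seconds=1))
```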
In step 303, the target lyrics text is matched with the original lyrics text of at least one original song in the original song database.
In step 304, when the matching degree of the target lyric text and the original lyric text of any original song is greater than a first preset matching degree, the original song with the matching degree greater than the first preset matching degree is determined to be the original version of the target song.
Wherein, steps 303-304 are similar to steps 203-204, and are not described herein again.
According to the method provided by the embodiment of the application, the target lyric text of the target song is extracted from the target video, the target lyric text is matched with the original lyric text of at least one original song in the original song database, and when the matching degree of the target lyric text and the original lyric text of any original song is larger than a first preset matching degree, the original song with the matching degree larger than the first preset matching degree is determined to be the original version of the target song. The method for identifying the target song in the target video is expanded, the original version of the target song is identified by adopting the target lyric text, and the identification accuracy is improved.
In addition, the embodiment of fig. 2 is to acquire the target lyric text of the target song by means of audio recognition, and the embodiment of fig. 3 is to acquire the target lyric text of the target song by means of recognizing the text in the video frame. The embodiments of fig. 2 and fig. 3 respectively adopt different ways to obtain the target lyric text of the target song in the target video, and the obtained target lyric text of the target song may also be different.
It should be noted that the present application only takes, as examples, identifying the original version of the target song in the target video by means of the audio recognition of fig. 2 and by means of recognizing text in video frames as in fig. 3. In another embodiment, the embodiments of fig. 2 and fig. 3 may also be combined: steps 201 to 203 of the embodiment of fig. 2 are executed first to obtain a target lyric text; when the matching degree between this target lyric text and the original lyric text of each original song is not greater than the first preset matching degree, steps 301 to 304 of the embodiment of fig. 3 are executed to obtain another target lyric text of the target video, and when the matching degree between the other target lyric text and the original lyric text of any original song is greater than the first preset matching degree, the original song whose matching degree is greater than the first preset matching degree is determined as the original version of the target song. Alternatively, steps 301 to 304 of the embodiment of fig. 3 are executed first to obtain a target lyric text; when the matching degree between this target lyric text and the original lyric text of each original song is not greater than the first preset matching degree, another target lyric text of the target video is obtained by executing steps 201 to 203 of the embodiment of fig. 2, and when the matching degree between the other target lyric text and the original lyric text of any original song is greater than the first preset matching degree, the original song whose matching degree is greater than the first preset matching degree is determined as the original version of the target song.
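The combined fallback flow above can be sketched as follows; the extractor and matcher callables are stubs standing in for the audio-recognition and video-frame-text components, and all names are illustrative:

```python
def identify_original(audio_lyrics_fn, frame_lyrics_fn, match_fn, database, threshold):
    """Try the audio-recognition path first; fall back to recognizing text in
    video frames when no original song exceeds the first preset matching degree."""
    for extract_lyrics in (audio_lyrics_fn, frame_lyrics_fn):
        target_text = extract_lyrics()
        for song, original_text in database.items():
            if match_fn(target_text, original_text) > threshold:
                return song
    return None  # no original version identified by either path

# Stub extractors and a position-wise character matcher for illustration.
database = {"song_b": "the most beautiful song"}
audio_path = lambda: "zzzz"                      # audio path finds no match
frame_path = lambda: "the most beautiful song"   # frame text matches song_b
match = lambda a, b: sum(1 for x, y in zip(a, b) if x == y)
print(identify_original(audio_path, frame_path, match, database, threshold=4))
```

The two paths can be tried in either order, as the embodiment notes; only the tuple of extractors changes.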
Fig. 4 is a schematic diagram illustrating a structure of a song recognition apparatus according to an exemplary embodiment. Referring to fig. 4, the apparatus includes:
an extracting unit 401, configured to extract a target lyric text of a target song from a target video;
a matching unit 402, configured to match the target lyric text with an original lyric text of at least one original song in the original song database;
the determining unit 403 is configured to determine, when a matching degree of the target lyric text and an original lyric text of any original song is greater than a first preset matching degree, that the original song with the matching degree greater than the first preset matching degree is an original version of the target song.
The device provided by the embodiment of the application extracts a target lyric text of a target song from a target video, matches the target lyric text with an original lyric text of at least one original song in an original song database, and determines an original song with a matching degree greater than a first preset matching degree as an original version of the target song when the matching degree of the target lyric text and the original lyric text of any original song is greater than the first preset matching degree. The method for identifying the target song in the target video is expanded, the original version of the target song is identified by adopting the target lyric text, and the identification accuracy is improved.
In one possible implementation, referring to fig. 5, the extraction unit 401 includes:
the extraction sub-unit 4011 is configured to extract audio information of a target song in a target video;
and the conversion sub-unit 4012 is configured to convert the audio information into the target lyric text by using an audio recognition technology.
In another possible implementation, referring to fig. 5, the extraction unit 401 includes:
a first obtaining sub-unit 4013, configured to obtain at least one video frame in the target video;
and the identifying sub-unit 4014 is configured to identify text in at least one video frame, and use the text in the at least one video frame as the target lyric text.
In another possible implementation, the matching unit 402 is configured to convert the target lyric text into a target lyric code, where the target lyric code is used to represent the pronunciation of the target lyric text;
a matching unit 402, further configured to match the target lyric code with an original lyric code of at least one original lyric text;
a determining unit 403, configured to determine, when a matching degree between the target lyric code and an original lyric code of any one of the original lyric texts is greater than a first preset matching degree, that the original song corresponding to the original lyric text having the matching degree greater than the first preset matching degree is an original version of the target song.
In another possible implementation, referring to fig. 5, the matching unit 402 includes:
a matching subunit 4021, configured to match each character in the target lyric text with each character in the original lyric text of the original song, for any one of the at least one original song;
a determining subunit 4022, configured to determine the number of matching characters of the target lyric text and the original lyric text as the matching degree of the target lyric text and the original lyric text.
In another possible implementation, the target lyrics text comprises a plurality of target lyrics text segments, and referring to fig. 5, the matching unit 402 comprises:
a matching subunit 4021, configured to match each target lyric text segment in the target lyric text with each original lyric text segment in the original lyric text of the original song, for any one of the at least one original song;
the determining subunit 4022 is configured to determine a matching degree between each target lyric text segment in the target lyric text and each original lyric text segment in the original lyric text of the original song.
In another possible implementation, the matching subunit 4021 is configured to, for any one of the at least one original song, match each character in each target lyric text fragment of the target lyric text with each character in each original lyric text fragment of the original lyric text of the original song;
the matching subunit 4021 is further configured to determine the number of matching characters between the target lyric text fragment and the original lyric text fragment as the matching degree between the target lyric text fragment and the original lyric text fragment.
In another possible implementation, the matching unit 402 is configured to determine the number of matching text segments of the target lyric text and the original lyric text as the matching degree of the target lyric text and the original lyric text.
In another possible implementation, referring to fig. 5, the apparatus further includes:
a de-duplication unit 404 for performing de-duplication processing on the plurality of target lyrics text pieces when the target lyrics text comprises the plurality of target lyrics text pieces.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 6 is a block diagram illustrating a terminal according to an example embodiment. The terminal 600 may be a portable mobile terminal such as: a smart phone, a tablet computer, an MP3 player (Moving Picture Experts Group Audio Layer III), an MP4 player (Moving Picture Experts Group Audio Layer IV), a notebook computer, or a desktop computer. The terminal 600 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, desktop terminal, etc.
In general, the terminal 600 includes: one or more processors 601 and one or more memories 602.
The processor 601 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on. The processor 601 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 601 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, also called a CPU (Central Processing Unit); a coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 601 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed on the display screen. In some embodiments, the processor 601 may also include an AI (Artificial Intelligence) processor for processing computational operations related to machine learning.
The memory 602 may include one or more computer-readable storage media, which may be non-transitory. The memory 602 may also include volatile memory or non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in the memory 602 is used to store at least one instruction, the at least one instruction being executed by the processor 601 to implement the song recognition methods provided by the method embodiments of the present application.
In some embodiments, the terminal 600 may further optionally include: a peripheral interface 603 and at least one peripheral. The processor 601, memory 602, and peripheral interface 603 may be connected by buses or signal lines. Various peripheral devices may be connected to the peripheral interface 603 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of a radio frequency circuit 604, a touch screen display 605, a camera 606, an audio circuit 607, a positioning component 608, and a power supply 609.
The peripheral interface 603 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 601 and the memory 602. In some embodiments, the processor 601, memory 602, and peripheral interface 603 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 601, the memory 602, and the peripheral interface 603 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 604 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 604 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 604 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 604 comprises: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 604 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: metropolitan area networks, mobile communication networks of various generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 604 may further include NFC (Near Field Communication) related circuits, which are not limited in this application.
The display 605 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 605 is a touch display screen, the display screen 605 also has the ability to capture touch signals on or over its surface. The touch signal may be input to the processor 601 as a control signal for processing. At this point, the display 605 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display 605, provided on the front panel of the terminal 600; in other embodiments, there may be at least two displays 605, respectively disposed on different surfaces of the terminal 600 or in a folded design; in still other embodiments, the display 605 may be a flexible display disposed on a curved or folded surface of the terminal 600. The display 605 may even be arranged in a non-rectangular irregular pattern, i.e., a shaped screen. The display 605 may be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), and the like.
The camera assembly 606 is used to capture images or video. Optionally, the camera assembly 606 includes a front camera and a rear camera. Generally, the front camera is disposed on the front panel of the terminal, and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting, VR (Virtual Reality) shooting, or other fusion shooting functions. In some embodiments, the camera assembly 606 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation at different color temperatures.
The audio circuitry 607 may include a microphone and a speaker. The microphone is used for collecting sound waves of the user and the environment, converting the sound waves into electrical signals, and inputting them to the processor 601 for processing or to the radio frequency circuit 604 to realize voice communication. For the purpose of stereo sound collection or noise reduction, a plurality of microphones may be provided at different portions of the terminal 600. The microphone may also be an array microphone or an omnidirectional pickup microphone. The speaker is used to convert electrical signals from the processor 601 or the radio frequency circuit 604 into sound waves. The speaker may be a traditional membrane speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can not only convert an electrical signal into sound waves audible to humans, but also convert an electrical signal into sound waves inaudible to humans for purposes such as distance measurement. In some embodiments, the audio circuitry 607 may also include a headphone jack.
The positioning component 608 is used to locate the current geographic location of the terminal 600 to implement navigation or LBS (Location Based Service). The positioning component 608 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
The power supply 609 is used to supply power to the various components in the terminal 600. The power supply 609 may use alternating current, direct current, a disposable battery, or a rechargeable battery. When the power supply 609 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also be used to support fast-charge technology.
In some embodiments, the terminal 600 also includes one or more sensors 610. The one or more sensors 610 include, but are not limited to: acceleration sensor 611, gyro sensor 612, pressure sensor 613, fingerprint sensor 614, optical sensor 615, and proximity sensor 616.
The acceleration sensor 611 may detect the magnitude of acceleration in three coordinate axes of the coordinate system established with the terminal 600. For example, the acceleration sensor 611 may be used to detect components of the gravitational acceleration in three coordinate axes. The processor 601 may control the touch screen display 605 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 611. The acceleration sensor 611 may also be used for acquisition of motion data of a game or a user.
The gyro sensor 612 may detect a body direction and a rotation angle of the terminal 600, and the gyro sensor 612 and the acceleration sensor 611 may cooperate to acquire a 3D motion of the user on the terminal 600. The processor 601 may implement the following functions according to the data collected by the gyro sensor 612: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
The pressure sensor 613 may be disposed on a side frame of the terminal 600 and/or on a lower layer of the touch display screen 605. When the pressure sensor 613 is disposed on the side frame of the terminal 600, a user's holding signal of the terminal 600 can be detected, and the processor 601 performs left-right hand recognition or shortcut operation according to the holding signal collected by the pressure sensor 613. When the pressure sensor 613 is disposed at the lower layer of the touch display screen 605, the processor 601 controls the operability control on the UI interface according to the pressure operation of the user on the touch display screen 605. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 614 is used to collect a fingerprint of the user, and the processor 601 identifies the identity of the user according to the fingerprint collected by the fingerprint sensor 614, or the fingerprint sensor 614 identifies the identity of the user according to the collected fingerprint. Upon recognizing that the user's identity is a trusted identity, the processor 601 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, paying, changing settings, and the like. The fingerprint sensor 614 may be disposed on the front, back, or side of the terminal 600. When a physical key or vendor logo is provided on the terminal 600, the fingerprint sensor 614 may be integrated with the physical key or vendor logo.
The optical sensor 615 is used to collect the ambient light intensity. In one embodiment, processor 601 may control the display brightness of touch display 605 based on the ambient light intensity collected by optical sensor 615. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 605 is increased; when the ambient light intensity is low, the display brightness of the touch display screen 605 is turned down. In another embodiment, the processor 601 may also dynamically adjust the shooting parameters of the camera assembly 606 according to the ambient light intensity collected by the optical sensor 615.
The proximity sensor 616, also known as a distance sensor, is typically disposed on the front panel of the terminal 600. The proximity sensor 616 is used to collect the distance between the user and the front surface of the terminal 600. In one embodiment, when the proximity sensor 616 detects that the distance between the user and the front surface of the terminal 600 gradually decreases, the processor 601 controls the touch display 605 to switch from the bright-screen state to the dark-screen state; when the proximity sensor 616 detects that the distance between the user and the front surface of the terminal 600 gradually increases, the processor 601 controls the touch display 605 to switch from the dark-screen state to the bright-screen state.
Those skilled in the art will appreciate that the configuration shown in fig. 6 is not intended to be limiting of terminal 600 and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be used.
Fig. 7 is a schematic structural diagram of a server 700 according to an exemplary embodiment. The server 700 may vary considerably in configuration or performance, and may include one or more processors (CPUs) 701 and one or more memories 702, where the memory 702 stores at least one instruction that is loaded and executed by the processor 701 to implement the song identification method provided by the foregoing method embodiments. Of course, the server may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for performing input and output, and may further include other components for implementing device functions, which are not described herein again.
The server 700 may be used to perform the steps performed by the server in the song recognition method described above.
In an exemplary embodiment, there is also provided a non-transitory computer readable storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the steps performed by a terminal or a server in the song recognition method described above.
In an exemplary embodiment, there is also provided a computer program product, wherein instructions of the computer program product, when executed by a processor of an electronic device, enable the electronic device to perform the steps performed by the terminal or the server in the song recognition method described above.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A song identification method, the method comprising:
extracting a target lyric text of a target song from a target video;
matching the target lyric text with an original lyric text of at least one original song in an original song database;
and when the matching degree of the target lyric text and the original lyric text of any original song is greater than a first preset matching degree, determining the original song with the matching degree greater than the first preset matching degree as the original version of the target song.
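The pipeline of claim 1 can be sketched as follows. This is a minimal illustration only: the lyric database, the threshold value, and the simple shared-character counting used as the matching degree (one possibility consistent with claim 5) are all assumptions for the example, not the claimed implementation.

```python
# Sketch of the claim-1 pipeline: given lyric text extracted from a video,
# match it against an original-song database and return the original version
# when the matching degree exceeds a preset threshold.
# The matching degree here counts shared characters (multiset intersection);
# the database contents and threshold are illustrative assumptions.

from collections import Counter

def matching_degree(target_text: str, original_text: str) -> int:
    """Number of matched characters between the two lyric texts."""
    target_counts = Counter(target_text)
    original_counts = Counter(original_text)
    # & on Counters keeps, for each character, the smaller of the two counts.
    return sum((target_counts & original_counts).values())

def identify_original(target_lyrics: str, database: dict, threshold: int):
    """Return the title of the first original song whose matching degree
    with the target lyrics exceeds the threshold, else None."""
    for title, original_lyrics in database.items():
        if matching_degree(target_lyrics, original_lyrics) > threshold:
            return title
    return None

database = {  # hypothetical original-song database
    "song_a": "twinkle twinkle little star",
    "song_b": "row row row your boat",
}
print(identify_original("twinkle little star", database, threshold=10))
# prints "song_a"
```

A production system would of course use a far more robust matching degree; the point here is only the control flow of extract, match, and compare against a preset threshold.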
2. The method of claim 1, wherein extracting target lyric text of a target song from a target video comprises:
extracting audio information of a target song in the target video;
and converting the audio information into the target lyric text by adopting an audio recognition technology.
3. The method of claim 1, wherein extracting target lyric text of a target song from a target video comprises:
acquiring at least one video frame in the target video;
and identifying text in the at least one video frame, and taking the text in the at least one video frame as the target lyric text.
4. The method of claim 1, wherein matching the target lyric text with an original lyric text of at least one original song in an original song database comprises:
converting the target lyric text into a target lyric code, wherein the target lyric code is used for representing the pronunciation of the target lyric text;
matching the target lyric code with an original lyric code of at least one original lyric text;
when the matching degree of the target lyric text and the original lyric text of any original song is greater than a first preset matching degree, determining that the original song with the matching degree greater than the first preset matching degree is the original version of the target song, including:
and when the matching degree of the target lyric code and the original lyric code of any original lyric text is greater than a first preset matching degree, determining that the original song corresponding to the original lyric text with the matching degree greater than the first preset matching degree is the original version of the target song.
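The pronunciation-code matching of claim 4 can be illustrated as below. The claim does not specify the encoding; the tiny character-to-pronunciation table here is a hypothetical stand-in (a real system might use a full grapheme-to-phoneme mapping such as pinyin for Chinese lyrics). The point is that homophone substitutions in a cover version still match.

```python
# Sketch of claim 4: compare lyric texts by pronunciation codes rather than
# raw characters, so homophone substitutions still match.
# PRONUNCIATION is a toy illustrative table, not a real G2P resource.

PRONUNCIATION = {  # hypothetical table: character -> pronunciation code
    "他": "ta", "她": "ta", "的": "de", "得": "de", "心": "xin",
}

def to_lyric_code(text: str) -> list:
    """Convert lyric text into a sequence of pronunciation codes;
    characters missing from the table fall back to themselves."""
    return [PRONUNCIATION.get(ch, ch) for ch in text]

def code_matching_degree(target: str, original: str) -> int:
    """Count aligned positions where the pronunciation codes agree."""
    t_codes, o_codes = to_lyric_code(target), to_lyric_code(original)
    return sum(1 for a, b in zip(t_codes, o_codes) if a == b)

# "他的心" and "她得心" differ in two characters but share all pronunciations,
# so the code-level matching degree is 3.
print(code_matching_degree("他的心", "她得心"))  # prints 3
```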
5. The method of claim 1, wherein matching the target lyric text with an original lyric text of at least one original song in an original song database comprises:
for any one of the at least one original song, matching each character in the target lyric text with each character in an original lyric text of the original song;
and determining the number of matched characters of the target lyric text and the original lyric text as the matching degree of the target lyric text and the original lyric text.
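One way to realize the character matching of claim 5 is an alignment that counts the characters the two texts have in common in order. The claim does not mandate a particular alignment algorithm; `difflib.SequenceMatcher` from the Python standard library is used here purely as an illustrative choice.

```python
# Sketch of claim 5: align target and original lyric texts character by
# character and take the number of matched characters as the matching degree.
# SequenceMatcher finds matching blocks (a longest-contiguous-match
# heuristic); summing their sizes gives the matched-character count.

from difflib import SequenceMatcher

def character_matching_degree(target: str, original: str) -> int:
    """Number of characters the aligned texts have in common."""
    matcher = SequenceMatcher(None, target, original)
    return sum(block.size for block in matcher.get_matching_blocks())

# "hello world" vs "hello word" share "hello wor" (9 chars) plus "d" (1 char).
print(character_matching_degree("hello world", "hello word"))  # prints 10
```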
6. The method of claim 1, wherein the target lyric text comprises a plurality of target lyric text segments, and wherein the original lyric text of the at least one original song comprises a plurality of original lyric text segments; the matching the target lyric text with the original lyric text of at least one original song in an original song database comprises:
for any one of the at least one original song, matching each target lyric text segment in the target lyric text with each original lyric text segment in the original lyric text of the original song;
and determining the matching degree of each target lyric text segment in the target lyric text and each original lyric text segment in the original lyric text of the original song.
7. The method of claim 6, wherein the matching each target lyric text segment in the target lyric text with each original lyric text segment in an original lyric text of the original song, for any one of the at least one original song, comprises:
for any one of the at least one original song, matching each character in each target lyric text segment in the target lyric text with each character in each original lyric text segment in an original lyric text of the original song;
and determining the number of matched characters of the target lyric text segment and the original lyric text segment as the matching degree of the target lyric text segment and the original lyric text segment.
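The segment-wise matching of claims 6 and 7 can be sketched as below. The segmentation into lines and the position-wise character comparison within each segment pair are assumptions made for the example; the claims only require a per-segment matched-character count.

```python
# Sketch of claims 6-7: split both lyric texts into segments (lines, as an
# assumed segmentation) and compute a matching degree for every pair of
# (target segment, original segment) by counting position-wise matched
# characters within the pair.

def segment_matching(target_text: str, original_text: str):
    """Return a matrix of matched-character counts: one row per target
    segment, one column per original segment."""
    target_segments = target_text.splitlines()
    original_segments = original_text.splitlines()
    return [
        [sum(1 for a, b in zip(t, o) if a == b) for o in original_segments]
        for t in target_segments
    ]

target = "row your boat\ndown the stream"
original = "row row your boat\ngently down the stream"
print(segment_matching(target, original))
```

A downstream step could then, for example, take the best-matching original segment per target segment and aggregate those scores into the overall matching degree used in claim 1.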
8. A song recognition apparatus, characterized in that the apparatus comprises:
the extraction unit is used for extracting a target lyric text of a target song from a target video;
the matching unit is used for matching the target lyric text with the original lyric text of at least one original song in an original song database;
and the determining unit is used for determining the original song with the matching degree higher than the first preset matching degree as the original version of the target song when the matching degree of the target lyric text and the original lyric text of any original song is higher than the first preset matching degree.
9. An electronic device, characterized in that the electronic device comprises:
one or more processors;
a volatile or non-volatile memory for storing instructions executable by the one or more processors;
wherein the one or more processors are configured to perform the song recognition method of any one of claims 1-7.
10. A non-transitory computer-readable storage medium having instructions stored thereon that, when executed by a processor of an electronic device, enable the electronic device to perform the song recognition method of any one of claims 1-7.
CN202010244457.2A — Song identification method and device, electronic equipment and storage medium — priority and filing date 2020-03-31 — status: Pending — published as CN111368136A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010244457.2A CN111368136A (en) 2020-03-31 2020-03-31 Song identification method and device, electronic equipment and storage medium


Publications (1)

Publication number: CN111368136A (en) — Publication date: 2020-07-03


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113628637A (en) * 2021-07-02 2021-11-09 北京达佳互联信息技术有限公司 Audio identification method, device, equipment and storage medium
CN113658594A (en) * 2021-08-16 2021-11-16 北京百度网讯科技有限公司 Lyric recognition method, device, equipment, storage medium and product

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103631802A (en) * 2012-08-24 2014-03-12 腾讯科技(深圳)有限公司 Song information searching method, device and corresponding server
US20140201180A1 (en) * 2012-09-14 2014-07-17 Broadbandtv, Corp. Intelligent Supplemental Search Engine Optimization
CN109785859A (en) * 2019-01-31 2019-05-21 平安科技(深圳)有限公司 The method, apparatus and computer equipment of management music based on speech analysis
CN110035296A (en) * 2019-04-23 2019-07-19 广州酷狗计算机科技有限公司 Display methods, equipment and the readable storage medium storing program for executing of direct broadcasting room song information




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200703