CN115186128A - Comment playing method and device, storage medium and electronic equipment - Google Patents

Comment playing method and device, storage medium and electronic equipment

Info

Publication number
CN115186128A
Authority
CN
China
Prior art keywords
comment
music
data
style
comment data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210806632.1A
Other languages
Chinese (zh)
Inventor
谢劲松
周倩
顾立瑞
王翎雁
成明
陆锋平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Netease Cloud Music Technology Co Ltd
Original Assignee
Hangzhou Netease Cloud Music Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Netease Cloud Music Technology Co Ltd filed Critical Hangzhou Netease Cloud Music Technology Co Ltd
Priority to CN202210806632.1A priority Critical patent/CN115186128A/en
Publication of CN115186128A publication Critical patent/CN115186128A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60: Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/64: Browsing; Visualisation therefor
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60: Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/63: Querying
    • G06F16/635: Filtering based on additional data, e.g. user or group profiles

Abstract

The embodiments of the disclosure relate to the field of internet technology, and in particular to a comment playing method, a comment playing apparatus, a storage medium and an electronic device. The comment playing method comprises the following steps: obtaining comment data of a music object in response to the music object being played; determining the music style of each segment of the music object and the comment style of the comment data; and, when any segment of the music object is played, synchronously displaying comment data whose comment style matches the music style of that segment. With this comment playing scheme, comment data whose comment style matches the music style of the currently played segment can be displayed in sync with that segment, bringing the multi-sensory immersive experience of listening to music while reading matched comment data, thereby effectively improving the user experience.

Description

Comment playing method and device, storage medium and electronic equipment
Technical Field
The embodiments of the disclosure relate to the field of internet technology, and in particular to a comment playing method, a comment playing apparatus, a storage medium and an electronic device.
Background
This section is intended to provide a background or context to the embodiments of the disclosure recited in the claims and the description herein is not admitted to be prior art by inclusion in this section.
Comment data is displayed during music playback in order to enhance the user's immersion while listening and to improve the user experience. However, existing methods for displaying comment data during music playback often simply acquire the comment data and display it as-is, so they cannot effectively enhance the listener's immersion, which harms the user experience.
Disclosure of Invention
The embodiments of the disclosure provide a comment playing method and apparatus, a storage medium and an electronic device, which can synchronously display comment data matched to the currently played music segment, bringing the multi-sensory immersive experience of listening to music while reading matched comment data and thereby effectively improving the user experience.
According to an aspect of the present disclosure, there is provided a comment playing method including: in response to a music object being played, obtaining comment data of the music object; determining the music style of each segment of the music object and the comment style of the comment data; and, when any segment of the music object is played, synchronously displaying comment data whose comment style matches the music style of that segment.
In an exemplary embodiment of the disclosure, synchronously presenting comment data whose comment style matches the music style of the segment includes: obtaining, according to the target music style of the currently played segment, target comment data whose comment style matches the target music style; and controlling the presentation time of each piece of comment content of the target comment data according to the reading speed corresponding to the target music style, an initial reading speed, and the word count of that piece of comment content.
In an exemplary embodiment of the present disclosure, controlling the presentation time of each piece of comment content includes: for comment content whose word count is below a lower threshold, controlling its presentation time according to the reading speed corresponding to the target music style and its word count; for comment content whose word count is above an upper threshold, controlling its presentation time according to the initial reading speed and its word count; and for comment content whose word count lies between the lower and upper thresholds, controlling its presentation time according to the reading speed corresponding to the target music style, the initial reading speed, and its word count.
In an exemplary embodiment of the present disclosure, controlling the presentation time of the comment content includes: determining a first presentation time according to the reading speed corresponding to the target music style and the word count of the comment content; determining a second presentation time according to the initial reading speed and the word count of the comment content; and determining the presentation time of the comment content according to a weighted relation between the first and second presentation times.
In an exemplary embodiment of the present disclosure, the comment playing method further includes: when the client has authorized and can capture the movement speed of the user's eye focus, taking that movement speed as the initial reading speed; and when the eye-focus movement speed cannot be acquired, taking a preset reading speed as the initial reading speed.
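The fallback between a captured eye-focus speed and a preset default described above can be sketched as follows. This is an illustrative example, not part of the patent text; the function name, the default value and the `captured_eye_focus_speed` parameter are assumptions.

```python
# Hypothetical sketch: choose the initial reading speed. The input is None
# when the client has not authorized, or cannot capture, eye-focus movement.

DEFAULT_READING_SPEED = 300.0  # words/minute; assumed placeholder value


def initial_reading_speed(captured_eye_focus_speed=None):
    """Return the eye-focus movement speed if available, else the preset default."""
    if captured_eye_focus_speed is not None:
        return captured_eye_focus_speed
    return DEFAULT_READING_SPEED
```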
In an exemplary embodiment of the present disclosure, obtaining the comment data of the music object includes: filtering, from all comment data of the music object, the comment data related to the attributes of the music object.
In an exemplary embodiment of the present disclosure, filtering out the comment data related to the attributes of the music object includes: inputting each piece of comment data into a named entity recognition model to obtain a named entity recognition result for that piece; and selecting, as the comment data related to the attributes of the music object, the pieces whose recognition results hit an attribute of the music object.
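The filtering step might look like the following minimal sketch, assuming the named entity recognition model is available as a callable that returns the entity strings found in a comment; all names here are illustrative, not from the patent.

```python
# Illustrative sketch of attribute-based comment filtering. `ner_model` is a
# stand-in for the named entity recognition model: a callable that maps a
# comment string to a set of recognized entity strings.


def filter_comments(comments, attributes, ner_model):
    """Keep comments whose recognized entities hit any attribute of the music object."""
    related = []
    for comment in comments:
        entities = ner_model(comment)  # e.g. {"Song Title", "Author Name"}
        if any(entity in attributes for entity in entities):
            related.append(comment)
    return related
```

In practice `ner_model` would be a trained model; here a trivial lexicon lookup is enough to exercise the logic.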
In an exemplary embodiment of the present disclosure, the attribute of the music object includes an identification attribute and a scene attribute of the music object; the comment playing method further comprises the following steps: obtaining a training data set, wherein the training data set comprises a first data set carrying an identification attribute label and a second data set carrying the identification attribute label and a scene attribute label; and training the named entity recognition model based on the first data set and the second data set successively.
In an exemplary embodiment of the present disclosure, the identification attribute includes the song title and/or author name of the music object, and the scene attribute includes the name of a film or television work that uses the music object.
In an exemplary embodiment of the present disclosure, determining the comment style of the comment data includes: inputting each piece of comment data into an emotion recognition model to obtain an emotion recognition result for that piece; and determining the comment style of each piece of comment data according to its emotion recognition result.
In an exemplary embodiment of the present disclosure, determining the music style of each segment of the music object includes: inputting each segment into a music style recognition model to obtain a music style recognition result for that segment; and determining the music style of each segment according to its recognition result.
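Mapping the recognition results to styles could be as simple as a lookup table; the label sets and mappings below are hypothetical examples, not taken from the patent.

```python
# Hypothetical mappings from model outputs to style labels. The emotion and
# music style recognition models are assumed to emit discrete labels.

EMOTION_TO_COMMENT_STYLE = {"joyful": "upbeat", "sad": "melancholic"}
MELODY_TO_MUSIC_STYLE = {"fast_major": "upbeat", "slow_minor": "melancholic"}


def comment_style(emotion_label):
    """Comment style derived from the emotion recognition result."""
    return EMOTION_TO_COMMENT_STYLE.get(emotion_label, "neutral")


def music_style(melody_label):
    """Music style derived from the music style recognition result."""
    return MELODY_TO_MUSIC_STYLE.get(melody_label, "neutral")
```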
According to an aspect of the present disclosure, there is provided a comment playing apparatus including: a comment data acquisition module, configured to obtain comment data of a music object in response to the music object being played; a music style and comment style determination module, configured to determine the music style of each segment of the music object and the comment style of the comment data; and a comment data presentation module, configured to synchronously display, when any segment of the music object is played, comment data whose comment style matches the music style of that segment.
In an exemplary embodiment of the present disclosure, the comment data presentation module includes: a target comment data acquisition module, configured to obtain, according to the target music style of the currently played segment, target comment data whose comment style matches the target music style; and a presentation time control module, configured to control the presentation time of each piece of comment content of the target comment data according to the reading speed corresponding to the target music style, an initial reading speed, and the word count of that piece of comment content.
In an exemplary embodiment of the present disclosure, the presentation time control module includes: a first control module, configured to control, for comment content whose word count is below a lower threshold, its presentation time according to the reading speed corresponding to the target music style and its word count; a second control module, configured to control, for comment content whose word count is above an upper threshold, its presentation time according to the initial reading speed and its word count; and a third control module, configured to control, for comment content whose word count lies between the lower and upper thresholds, its presentation time according to the reading speed corresponding to the target music style, the initial reading speed, and its word count.
In an exemplary embodiment of the present disclosure, the third control module includes: a first presentation time determination module, configured to determine a first presentation time according to the reading speed corresponding to the target music style and the word count of the comment content; a second presentation time determination module, configured to determine a second presentation time according to the initial reading speed and the word count of the comment content; and a presentation time determination module, configured to determine the presentation time of the comment content according to a weighted relation between the first and second presentation times.
In an exemplary embodiment of the present disclosure, the comment playing apparatus further includes: an eye-focus movement speed acquisition module, configured to take the movement speed of the user's eye focus as the initial reading speed when the client has authorized and can capture it; and a default reading speed acquisition module, configured to take a preset reading speed as the initial reading speed when the eye-focus movement speed cannot be acquired.
In an exemplary embodiment of the present disclosure, the comment data acquisition module includes: a comment data filtering module, configured to filter, from all comment data of the music object, the comment data related to the attributes of the music object.
In an exemplary embodiment of the present disclosure, the comment data filtering module includes: a named entity recognition module, configured to input each piece of comment data into a named entity recognition model to obtain a named entity recognition result for that piece; and a recognition result filtering module, configured to select, as the comment data related to the attributes of the music object, the pieces whose recognition results hit an attribute of the music object.
In an exemplary embodiment of the present disclosure, the attribute of the music object includes an identification attribute and a scene attribute of the music object; the comment playing apparatus further includes: the training data acquisition module is used for acquiring a training data set, wherein the training data set comprises a first data set carrying an identification attribute label and a second data set carrying the identification attribute label and a scene attribute label; and the model training module is used for training the named entity recognition model based on the first data set and the second data set in sequence.
In an exemplary embodiment of the present disclosure, the identification attribute includes the song title and/or author name of the music object, and the scene attribute includes the name of a film or television work that uses the music object.
In an exemplary embodiment of the disclosure, the music style and comment style determination module includes: an emotion recognition module, configured to input each piece of comment data into an emotion recognition model to obtain an emotion recognition result for that piece; and a comment style determination module, configured to determine the comment style of each piece of comment data according to its emotion recognition result.
In an exemplary embodiment of the present disclosure, the music style and comment style determination module includes: a music style recognition module, configured to input each segment into a music style recognition model to obtain a music style recognition result for that segment; and a music style determination module, configured to determine the music style of each segment according to its recognition result.
According to an aspect of the present disclosure, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements a comment playing method as described in any of the above embodiments.
According to an aspect of the present disclosure, there is provided an electronic device including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to execute the comment playing method according to any of the above embodiments via execution of the executable instructions.
In response to a music object being played, the comment playing method and apparatus, storage medium and electronic device of the disclosed embodiments obtain the music style of each segment of the music object and the comment style of its comment data, and, when any segment is played, synchronously display comment data whose comment style matches the music style of the currently played segment. With this comment playing scheme, the display of comment data is matched to the playback of the music segment, bringing the multi-sensory immersive experience of listening to music while reading matched comment data and effectively improving the user experience.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
fig. 1 schematically illustrates a flowchart of a comment playing method according to an embodiment of the present disclosure;
FIG. 2 schematically shows a flowchart of synchronously presenting comment data whose comment style matches the music style of a segment, according to one embodiment of the present disclosure;
FIG. 3 schematically shows a flowchart for controlling presentation time of each piece of review content according to one embodiment of the present disclosure;
fig. 4 schematically illustrates a page diagram of synchronously presenting comment data whose comment style matches the music style of the currently played segment, according to an embodiment of the present disclosure;
fig. 5 schematically illustrates a page diagram of synchronously presenting comment data whose comment style matches the music style of the currently played segment, according to still another embodiment of the present disclosure;
fig. 6 schematically shows a module architecture diagram of a comment playing apparatus according to an embodiment of the present disclosure;
FIG. 7 schematically illustrates a schematic diagram of a storage medium according to an embodiment of the present disclosure;
fig. 8 schematically illustrates a block architecture diagram of an electronic device according to an embodiment of the present disclosure.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
The principles and spirit of the present disclosure will be described with reference to a number of exemplary embodiments. It is understood that these embodiments are given solely for the purpose of enabling those skilled in the art to better understand and to practice the present disclosure, and are not intended to limit the scope of the present disclosure in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As will be appreciated by one skilled in the art, embodiments of the present disclosure may be embodied as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the form of: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
In addition, the data involved in the present disclosure may be data authorized by the user or fully authorized by all parties, and the acquisition, transmission and use of such data comply with the relevant national laws and regulations. The embodiments of the present disclosure may be combined with one another.
According to the embodiment of the disclosure, a comment playing method, a comment playing device, a storage medium and an electronic device are provided.
In this document, any number of elements in the drawings is intended to be illustrative and not restrictive, and any nomenclature is used for distinction only and not for any restrictive meaning.
The principles and spirit of the present disclosure are explained in detail below with reference to several representative embodiments of the present disclosure.
Summary of the Invention
At present, methods for displaying comment data during music playback often simply acquire the comment data and display it as-is, so they cannot effectively enhance the user's immersion while listening, which harms the user experience.
For example, during music playback, all comment data of the music may be acquired and displayed one after another, in the order in which the comments were published, as the music plays.
As another example, in some methods, comment data published in real time by other users is acquired and displayed during music playback.
Since comment data posted by users is likely to be irrelevant to the currently played music, these display methods easily result in displayed comment data that has no association with the music being played. The user's immersion therefore cannot be effectively enhanced, and the mismatch between the comments and the music may harm the user experience.
In view of the above, the basic idea of the present disclosure is as follows. A comment playing scheme is provided that can synchronously display comment data whose comment style matches the music style of the currently played music segment, ensuring that the display of comment data matches the playback of the segment and bringing the multi-sensory immersive experience of listening to music while reading matched comment data. In addition, the scheme can further control the playing speed of the comment data according to the reading speed corresponding to the segment's music style and an initial reading speed, so that this immersive experience is improved in a coordinated way and the user experience is further enhanced.
Having described the general principles of the present disclosure, various non-limiting embodiments of the disclosure are described in detail below with reference to the accompanying drawings.
Exemplary method
The comment playing method of the exemplary embodiments of the present disclosure may be applied to various user terminal devices, such as mobile phones and computers. In some scenarios, the comment playing method may be applied to music applications on user terminal devices; when a user uses such a music application, the comment playing method provided by the exemplary embodiments of the present disclosure offers the multi-sensory immersive experience of listening to music while reading matched comment data.
A comment playing method according to an exemplary embodiment of the present disclosure is described below with reference to fig. 1. Referring to fig. 1, the comment playing method of the present embodiment may include the following steps:
s110, responding to playing a music object, obtaining comment data of the music object.
The music object may be vocal or instrumental music, covering various music types such as pop, rock, electronic and classical music.
S120, determining the music style of the segment of the music object and the comment style of the comment data.
The music style of each segment of a music object can be detected and annotated in advance. For example, for each music object in the music object library, the music styles of its segments are detected and annotated beforehand, so that the music style of the corresponding segment can be retrieved quickly when the music object is played. In some scenarios, the music style of the currently playing segment may instead be detected in real time during playback.
Likewise, the comment style of the comment data can be detected and annotated in advance. For example, for each music object, the comment style of each piece of its comment data is detected and annotated beforehand, so that the comment styles of all comment data belonging to the music object can be retrieved quickly when it is played. For comment data published in real time, the comment style can be detected in real time, so that matched comment data published in real time can also be presented.
S130, when any one of the sections of the music object is played, comment data with the comment style matched with the music of the section is synchronously displayed.
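Steps S110 through S130 can be sketched end to end as follows, assuming the segment styles and comment styles have been annotated in advance and are supplied as plain data structures; the function name and the data shapes are illustrative assumptions, not from the patent.

```python
# Minimal sketch of the matching step: given pre-annotated segment styles and
# comment styles, return the comments to show at the current playback position.


def comments_for_segment(segment_styles, comment_styles, position):
    """Return comments whose style matches the style of the segment at `position`.

    segment_styles: list of (start, end, style) tuples covering the music object.
    comment_styles: dict mapping each comment string to its comment style.
    position: current playback position, in the same units as start/end.
    """
    current = next(
        style for start, end, style in segment_styles if start <= position < end
    )
    return [c for c, s in comment_styles.items() if s == current]
```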
Thus, by matching the comment style of comment data to the music style of each segment, the comment playing method of this embodiment synchronously displays, whenever any segment of a music object is played, the comment data whose comment style matches the music style of that segment. The playback of the music segment is thereby matched to the display of the comment data, bringing the multi-sensory immersive experience of listening to music while reading matched comment data and effectively improving the user experience.
Fig. 2 illustrates a process of synchronously presenting comment data whose comment style matches the music style of a segment, according to an embodiment of the present disclosure. Referring to fig. 2, in an exemplary embodiment of the present disclosure, synchronously presenting, when any segment of the music object is played, comment data whose comment style matches the music style of that segment includes the following steps:
and S210, obtaining target comment data of which the comment styles are matched with the target music styles according to the target music styles of the current playing segments.
The target music style is the music style of the currently played music piece, and the target comment data is comment data with a comment style matched with the target music style.
The comment styles and the music styles may be in a one-to-one matching relationship, in which the number of comment style categories equals the number of music style categories and each music style has exactly one matched comment style. A one-to-many or many-to-one matching relationship may also be adopted, in which the numbers of categories differ: each music style may have several matched comment styles, or each comment style may match several music styles. In some cases, a many-to-many matching relationship between comment styles and music styles may also be employed.
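The different matching relationships can be represented as simple lookup tables; the style names and tables below are hypothetical illustrations, not part of the patent.

```python
# Hypothetical matching tables for the relationships described above.

ONE_TO_ONE = {"upbeat": {"upbeat"}}  # each music style has exactly one comment style
ONE_TO_MANY = {"upbeat": {"cheerful", "excited"}}  # one music style, several comment styles


def matches(comment_style, music_style, table):
    """True when `comment_style` is matched to `music_style` under `table`."""
    return comment_style in table.get(music_style, set())
```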
And S220, controlling the display time of each comment content according to the reading speed corresponding to the target music, the initial reading speed and the word number of each comment content of the target comment data.
A corresponding reading speed can be preset for each music style according to its tempo, so that fast-tempo music styles correspond to fast reading speeds and slow-tempo music styles to slow reading speeds. Meanwhile, the playing speed of the target comment data is controlled jointly with the initial reading speed of text reading, so that the presentation time of each piece of comment content in the target comment data takes into account both the reading speed corresponding to the target music style and the initial reading speed.
Fig. 3 illustrates a process of controlling the presentation time of each piece of comment content according to an embodiment of the present disclosure. Referring to fig. 3, in an exemplary embodiment of the disclosure, step S220 of controlling the presentation time of each piece of comment content according to the reading speed corresponding to the target music style, an initial reading speed, and the word count of each piece of comment content of the target comment data specifically includes the following steps:
and S220a, controlling the display time of the comment content according to the reading speed corresponding to the target music and the word number of the comment content for the comment content with the word number smaller than the lower limit threshold value.
The comment contents with the word number smaller than the lower limit threshold value are controlled in playing speed according to the reading speed corresponding to the target music, so that the playing speed of the comment contents can be completely matched with the rhythm of the target music, and meanwhile, no burden is caused on character reading due to the fact that the word number is small.
In some scenarios, the lower threshold of the number of words is, for example, 10 words, and the reading speed corresponding to the music style includes, for example, three types, i.e., a reading speed S1 corresponding to the music style with a fast rhythm (which is a fast rhythm and can be set as needed), a reading speed S2 corresponding to the music style with a medium rhythm (which is a medium rhythm and can be set as needed), and a reading speed S3 corresponding to the music style with a slow rhythm (which is a slow rhythm and can be set as needed), where S1 > S2 > S3. Then, when a music piece is played, for comment contents with less than 10 characters, the comment style of which is matched with the target music piece, the display time of each comment content is determined according to the reading speed Sn (n =1, 2 or 3, the unit of Sn is character/minute, and the specific numerical value can be set as required) corresponding to the target music piece and the character number of each comment content, so that the playing speed of the comment contents with less characters is based on the rhythm speed of the music piece.
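The short-comment case can be sketched as follows; the three tempo classes follow the S1 > S2 > S3 example above, but the numeric speeds and names are assumed placeholders.

```python
# Sketch of the short-comment case (word count below the lower threshold):
# presentation time depends only on the tempo-based reading speed.

TEMPO_READING_SPEED = {"fast": 400.0, "medium": 300.0, "slow": 200.0}  # S1, S2, S3 in words/min (assumed values)


def short_comment_display_time(word_count, target_tempo):
    """Presentation time in minutes = word count / reading speed of the target music style."""
    return word_count / TEMPO_READING_SPEED[target_tempo]
```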
S220b, for comment content whose word count exceeds the upper threshold, controlling the presentation time of the comment content according to the initial reading speed and the word count of the comment content.
Comment content above the upper threshold has too many words to be read comfortably even if its presentation time were determined by the reading speed corresponding to the slowest-tempo music style. For such content, the playing speed should therefore not follow the tempo of the segment's music style but be determined directly from the initial reading speed; otherwise the comment content would burden the reader and harm the user experience.
In some scenarios, the upper word-count threshold is, for example, 50 words, and the initial reading speed is, for example, the average text-reading speed S_AVG. Then, when a segment is played, for comment content of more than 50 words whose comment style matches the target music style of the segment, the reading speed corresponding to the target music style is no longer considered; instead, the presentation time of each piece is determined from the initial reading speed S_AVG (in words per minute; the specific value can be set as needed) and the word count of that piece, so that the playing speed of long comment content is based on the initial reading speed of text reading.
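The long-comment case reduces to dividing the word count by S_AVG; the 50-word threshold follows the example above, and the function name and value are assumptions.

```python
# Sketch of the long-comment case (word count above the upper threshold):
# the tempo-based speed is ignored and only the initial reading speed is used.

UPPER_THRESHOLD = 50  # words; example value from the text


def long_comment_display_time(word_count, s_avg):
    """Presentation time in minutes, based only on the initial reading speed s_avg."""
    assert word_count > UPPER_THRESHOLD, "intended for comments above the upper threshold"
    return word_count / s_avg
```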
And S220c, for comment content whose word count is between the lower threshold and the upper threshold, controlling the display time of the comment content according to the reading speed corresponding to the target music style, the initial reading speed, and the word count of the comment content.
For comment content whose word count is between the lower threshold and the upper threshold, both the reading speed corresponding to the target music style and the initial reading speed of text reading need to be considered, so that the playing speed of the comment content matches the tempo of the target music style without burdening the reader.
In some scenarios, the lower and upper thresholds of the word count are, for example, 10 words and 50 words, respectively. Then, when a music piece is played, for comment content of 10 to 50 words whose comment style matches the target music style, both the reading speed Sn corresponding to the target music style and the initial reading speed S_AVG of text reading are considered, together with the word count of each comment content, to determine its display time, so that the playing speed of comment content of moderate length matches both the tempo of the target music style and the normal speed of text reading.
In an exemplary embodiment of the present disclosure, for comment content whose word count is between the lower threshold and the upper threshold, controlling the display time of the comment content according to the reading speed corresponding to the target music style, the initial reading speed, and the word count of the comment content may include: determining a first display time according to the reading speed corresponding to the target music style and the word count of the comment content; determining a second display time according to the initial reading speed and the word count of the comment content; and determining the display time of the comment content according to the weight relationship between the first display time and the second display time.
The weight relationship between the first display time and the second display time can be set as needed. In an exemplary embodiment, for comment content whose word count is between the lower threshold and the upper threshold, the range may be further subdivided into multiple tiers by word count, with a corresponding weight relationship set for each tier.
For example, taking the lower threshold and the upper threshold as 10 words and 50 words respectively: for comment content of 10 to 30 words, the weight relationship between the first display time and the second display time may be set to 0.5:0.5, i.e., in the playing speed of such comment content, the reading speed corresponding to the target music style and the initial reading speed of text reading each carry 50% of the weight. Then, for a comment content of 25 characters, taking the reading speed corresponding to the target music style as S2 and the initial reading speed as S_AVG, the display time is calculated as: 25/S2 * 0.5 + 25/S_AVG * 0.5.
For comment content of 30 to 50 words, the weight relationship between the first display time and the second display time may be set to 0.3:0.7, i.e., in the playing speed of such comment content, the reading speed corresponding to the target music style and the initial reading speed of text reading carry 30% and 70% of the weight, respectively. Then, for a comment content of 40 characters, taking the reading speed corresponding to the target music style as S3 and the initial reading speed as S_AVG, the display time is calculated as: 40/S3 * 0.3 + 40/S_AVG * 0.7.
In other embodiments, parameters involving specific values, such as the weight relationships between the first display time and the second display time, the subdivision tiers for comment content between the lower and upper thresholds, the lower threshold, and the upper threshold, may be adjusted as needed and are not limited to the above examples.
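The tiered display-time rules above can be sketched as follows. The thresholds, tier boundary, and weights are the illustrative values from this description; the speeds passed in the usage line are hypothetical and, like all of these parameters, can be adjusted as needed.

```python
def display_time_seconds(word_count, tempo_speed, avg_speed,
                         lower=10, upper=50):
    """Display time (seconds) for one comment content.

    tempo_speed: reading speed Sn matched to the segment's music style
                 (words/minute).
    avg_speed:   initial reading speed S_AVG (words/minute).
    """
    if word_count < lower:        # short comments follow the tempo speed only
        minutes = word_count / tempo_speed
    elif word_count > upper:      # long comments follow S_AVG only
        minutes = word_count / avg_speed
    else:                         # weighted blend: 0.5:0.5 up to 30 words,
        w = 0.5 if word_count <= 30 else 0.3   # then 0.3:0.7
        minutes = (word_count / tempo_speed) * w \
                  + (word_count / avg_speed) * (1 - w)
    return minutes * 60

# 25-word comment, hypothetical medium-tempo speed S2 = 400 and S_AVG = 476
t = display_time_seconds(25, 400, 476)
```

The three branches correspond to steps S220a, S220b, and S220c respectively.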
Further, in an exemplary embodiment of the present disclosure, the comment playing method further includes: when the client authorizes and captures the movement speed of the user's eye focus, using that movement speed as the initial reading speed; and when the movement speed of the eye focus cannot be acquired, using a preset reading speed as the initial reading speed.
The movement speed of the eye focus can be acquired by a camera module of the client, i.e., the user terminal device. When the movement speed of the eye focus can be captured, using it as the initial reading speed makes the playing speed of the comment data match the reading speed of the current user exactly. When the movement speed of the eye focus cannot be captured (including the case where the client has not authorized capture, and the case where capture is authorized but ineffective due to hardware conditions, insufficient light, and the like), the average text-reading speed (e.g., 476 words/minute) is used as the initial reading speed, so that the playing speed of the comment data suits the reading speed of most users.
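A minimal sketch of this fallback logic; the capture callable is hypothetical and stands in for the client's camera module, returning a words-per-minute value or None (or raising) when capture is unauthorized or ineffective.

```python
DEFAULT_READING_SPEED = 476  # words/minute, the example average from the text

def initial_reading_speed(capture_eye_focus_speed):
    """Return the initial reading speed for the current user.

    capture_eye_focus_speed: callable returning the eye-focus movement
    speed (words/minute), or None when it cannot be captured.
    """
    speed = None
    try:
        speed = capture_eye_focus_speed()
    except Exception:  # hardware failure, insufficient light, etc.
        pass
    return speed if speed else DEFAULT_READING_SPEED
```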
Fig. 4 shows a schematic diagram of a page synchronously displaying comment data whose comment style matches the music style of the currently played music piece according to an embodiment of the present disclosure. Referring to fig. 4, taking a music application in a user terminal device as an example, when a user enters a related page 400 of the music application and clicks to play a music object 410, the comment playing method deployed in the music application obtains the comment data of the music object 410 in response to the playing, and, following the playing progress of the music object 410, synchronously displays on the related page 400 the comment data whose comment style matches the music style of the currently played piece, for example in the page area corresponding to the hot comment wall 420.
In some scenarios, the music object 410 may also be automatically recommended to the user and automatically played when the user enters the related page 400, depending on the user's listening preferences, song popularity, and the like.
The comment data can be played automatically, with the display time of each piece of comment content determined according to the comment playing method described in the above embodiments. On the hot comment wall 420, a word-count filter can be applied so that only comment content within 60 words is displayed, to ensure the reading experience. The comment data may also support switching by the user through sliding or other preset gestures.
In the related page 400, in addition to the page area corresponding to the music object 410 and the page area corresponding to the hot comment wall 420, other content such as similar song recommendations and song charts may be included; these are not specifically shown in fig. 4.
Fig. 5 shows a schematic diagram of a page synchronously displaying comment data whose comment style matches the music style of the currently played music piece according to still another embodiment of the present disclosure. Referring to fig. 4 and 5, the user may click the hot comment wall 420 to display it full-screen as the comment page 500 shown in fig. 5. In the comment page 500, a progress bar 510 may show the playing progress of each piece of comment content 520 matching the currently played piece. The display time of each piece of comment content 520 depends on its word count and the reading speed corresponding to the target music style of the currently played piece. The user can also switch between the pieces of comment content 520 under the current piece by sliding or the like; when the last comment content of the current piece is reached, continued sliding switches to the next piece of the currently played music object; and when the last comment content of the last piece of the currently played music object is reached, continued sliding switches to another music object.
By synchronously displaying comment data whose comment style matches the music style of the currently played music piece, the user can interact flexibly, achieving a multi-sensory immersive experience of listening to music while watching matched comment data and switching comment data on demand.
In an exemplary embodiment of the present disclosure, the music style may be the melody and the comment style may be the emotion. Thus, comment data whose emotion matches the melody of the currently played music piece can be synchronously displayed as the music object plays. For example: for music pieces with a gentle melody, comment data with a flat emotion is synchronously displayed; for music pieces with a light melody, comment data with a cheerful emotion is synchronously displayed; and for music pieces with an intense melody, comment data with a high emotion is synchronously displayed. Specifically, each melody type is matched with an emotion type, so that when a music piece of any melody type is played, comment data with the matching emotion can be synchronously displayed.
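The melody-to-emotion matching can be sketched as a simple lookup and filter; the label strings below are the example categories from this description, and the (text, emotion) pair shape is an assumption about how recognized comments might be represented.

```python
# Example mapping from melody type (music style) to matching comment emotion
MELODY_TO_EMOTION = {
    "gentle": "flat",
    "light": "cheerful",
    "intense": "high",
}

def matching_comments(segment_melody, comments):
    """Keep only comments whose emotion matches the segment's melody.

    comments: iterable of (text, emotion) pairs, with emotion labels
    assumed to come from the emotion recognition model.
    """
    target = MELODY_TO_EMOTION[segment_melody]
    return [text for text, emotion in comments if emotion == target]
```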
In an exemplary embodiment of the present disclosure, determining a comment style of the comment data may include: inputting each piece of comment data into an emotion recognition model to obtain an emotion recognition result of each piece of comment data; and determining the comment style of each comment data according to the emotion recognition result of each comment data.
The emotion recognition model can adopt a text classification model, the emotion recognition result comprises the probability that the comment data belong to various emotion categories, and the emotion categories are determined according to the training of the emotion recognition model.
Specifically, text classification is a supervised classification task, so the training process of the emotion recognition model requires some pre-labeled emotion classification samples, for example 1000+ pieces of comment data labeled with the emotion category "flat", 1000+ labeled "cheerful", and 1000+ labeled "high". The more accurately the emotion classification samples are labeled, the better the emotion recognition model learns and the more accurate its classification.
The emotion recognition model may be selected from RNN/LSTM-based algorithms, Transformer-architecture-based algorithms, and the BERT algorithm.
RNNs is short for Recurrent Neural Networks; LSTMs is short for Long Short-Term Memory networks. Most language modeling methods are based on RNNs, but a plain RNN suffers from vanishing and exploding gradients and cannot model longer context dependencies. Therefore, RNNs are mostly replaced by LSTMs to capture longer context in a document. Common LSTM-based text classification models include the language representation model ELMo, the universal language model fine-tuning approach ULMFiT, and the like.
The Transformer, also known as an attention neural network, resolves the sequential processing limitation of LSTMs through the attention mechanism: by treating the entire sequence as a whole, it makes parallel training easy to implement.
BERT (Bidirectional Encoder Representations from Transformers) is a method of pre-training language representations. BERT stacks multiple Transformer encoders and is a bidirectional (autoencoding) language model rather than an autoregressive one, which makes it well suited to language understanding tasks such as classification.
In tests of these models on the text classification task of this embodiment, BERT performed best; therefore BERT is adopted as the emotion recognition model and trained to predict the emotion recognition result of input comment data.
BERT training mainly includes a pre-training process and a fine-tuning process. Pre-training, the first stage of BERT training, is done in an unsupervised manner and consists of two main tasks: Masked Language Modeling (MLM) and Next Sentence Prediction (NSP). The specific pre-training procedures for the MLM and NSP tasks are known and will not be described further. In some cases, pre-trained BERT models (e.g., bert-base-uncased, bert-base-chinese) may also be loaded directly.
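As a toy illustration of the MLM objective, masking might look like the sketch below. This is not the actual BERT implementation, which additionally keeps some selected tokens unchanged or replaces them with random tokens; that refinement is omitted here.

```python
import random

def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Replace a random ~mask_rate fraction of tokens with [MASK] and
    record the originals as prediction targets for the model."""
    rng = random.Random(seed)
    masked, targets = list(tokens), {}
    for i in range(len(tokens)):
        if rng.random() < mask_rate:
            targets[i] = masked[i]   # what the model must predict back
            masked[i] = "[MASK]"
    return masked, targets
```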
Fine-tuning refers to adjusting a pre-trained BERT model for a specific text classification task (i.e., the emotion recognition task in this embodiment). The fine-tuning process of this embodiment may also follow known methods and is therefore not described.
In an exemplary embodiment of the present disclosure, the emotion recognition model finally obtained through continuous optimization and adjustment during BERT training can reach an accuracy above 80% on the emotion classification task. The emotion recognition result output by the model is, specifically, the probabilities that the input comment data belongs to the emotion categories "flat", "cheerful", and "high"; for each piece of comment data, the emotion category with the highest probability can be taken as its comment style.
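Taking the highest-probability emotion category as the comment style is a simple argmax; the dictionary shape of the recognition result below is an assumption for illustration.

```python
def comment_style(emotion_probs):
    """Pick the comment style as the highest-probability emotion category.

    emotion_probs: e.g. {"flat": 0.1, "cheerful": 0.7, "high": 0.2},
    the emotion recognition result for one piece of comment data.
    """
    return max(emotion_probs, key=emotion_probs.get)
```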
In an exemplary embodiment of the present disclosure, determining the melody of each piece of the music object may include: inputting each piece into a melody recognition model to obtain a melody recognition result for each piece; and determining the melody of each piece according to its melody recognition result.
The melody recognition model can be constructed based on existing melody recognition algorithms. After the melody recognition model is trained, it can output the probabilities that a music piece belongs to the gentle, light, and intense melody categories respectively; the melody category with the highest probability in the melody recognition result is then taken as the melody of the piece.
In some cases, the identification of the melody of a musical piece may also be done in conjunction with human intervention to more accurately determine the style of each piece.
Further, in an exemplary embodiment of the present disclosure, obtaining comment data of a music object in response to playing of the music object includes: and screening out comment data related to the attribute of the music object from all comment data of the music object.
The screening of the comment data related to the attribute of the music object may include: inputting each piece of comment data into a named entity recognition model to obtain a named entity recognition result of each piece of comment data; and screening out comment data of which the named entity recognition result hits the attribute of the music object as comment data related to the attribute of the music object.
Named Entity Recognition (NER) is used for information extraction: predefined entity categories, such as names, locations, and numbers, can be extracted from unstructured text; in this embodiment, the entity categories relate to the attributes of music objects.
By screening the comment data related to the attributes of the music object, the displayed comment data can reflect those attributes, helping the listening user deepen their understanding of the music object and improving the listening experience.
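The attribute-based screening can be sketched as follows, assuming the named entity recognition model is available as a callable returning (entity, label) pairs, with labels such as SONG, ART, and VIDEO; all names and shapes here are illustrative.

```python
def filter_by_attributes(comments, ner, attributes):
    """Keep comments whose recognized entities hit an attribute of the
    music object (song title, singer name, or a work featuring it).

    ner: callable returning a list of (entity_text, label) pairs for
         one comment.
    attributes: label -> set of attribute values for the music object,
         e.g. {"SONG": {...}, "ART": {...}, "VIDEO": {...}}.
    """
    kept = []
    for text in comments:
        for entity, label in ner(text):
            if entity in attributes.get(label, set()):
                kept.append(text)
                break   # one hit is enough to keep this comment
    return kept
```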
In an exemplary embodiment, the attributes of the music object include an identification attribute and a scene attribute of the music object; the comment playing method further comprises the following steps: obtaining a training data set, wherein the training data set comprises a first data set carrying an identification attribute label and a second data set carrying the identification attribute label and a scene attribute label; training the named entity recognition model based on the first data set and the second data set.
The identification attribute may include the song title and/or the author name of the music object; the song title is the name of the music object, i.e., the name of the song, and the author name is the name of the singer. The scene attribute may specifically include the names of film, television, and variety works that feature the music object, including variety shows, movies, games, and the like.
The following describes the training process of the named entity recognition model.
First, a data set is prepared. Mature general-purpose NER models already exist; common model structures include the Bi-LSTM-CRF (bidirectional LSTM + conditional random field) model, the IDCNN-CRF (dilated convolution + conditional random field) model, the BERT-CRF model, and the like. A good machine learning model cannot do without sufficient high-quality data, and an NER model for a general scenario can be trained and tested on public data sets available online. In a music scenario, however, most entities are song titles, singer names, and the like: they take many forms, are poorly normalized, and often have broad everyday meanings, so they are easily confused with other general terms (for example, the singer "Yes" and the song "Hello"), and a general-scenario data set cannot be used directly. Therefore, facing data scarcity in the music scenario, this embodiment establishes a semi-supervised training procedure that avoids manually labeling large amounts of data while still obtaining an NER model with good performance.
In one embodiment, a general NER model is first migrated to the music scenario with a small amount of data (i.e., the first data set), and then trained iteratively (i.e., iterative training based on the second data set) to improve its performance.
In a particular embodiment, the first data set specifically comprises data carrying song title labels and author name labels. In a specific scenario, 10,000 playlists can be crawled from an internal system of the music application as raw data; 5,000 top singer names and 5,000 top song titles are selected from them to construct a vocabulary and annotated with two labels (the song title label is SONG and the author name label is ART), forming the first data set.
The second data set specifically comprises data carrying labels for works that feature the music. In a specific scenario, the names of 5,000 posts related to film, television, and variety works can be crawled from an external system as raw data; 200,000 work names are selected from them and given the VIDEO label, which together with the first data set forms the second data set.
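Constructing labeled training data from such a vocabulary can be sketched as BIO tagging; this is a simplified illustration of the annotation step, not the actual pipeline, and the example tokens are hypothetical.

```python
def bio_tags(tokens, entity_tokens, label):
    """Label a token sequence in BIO format for one entity type
    (SONG, ART, or VIDEO): B- marks the first token of a match,
    I- marks its continuation, O marks everything else."""
    tags = ["O"] * len(tokens)
    n = len(entity_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == entity_tokens:
            tags[i] = "B-" + label
            for j in range(i + 1, i + n):
                tags[j] = "I-" + label
    return tags
```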
After the data set is prepared, model training is performed. As noted above, mature general-purpose NER models exist; by comparing multiple NER models, the one with the best recognition effect on named entities in music scenarios can be selected.
In one embodiment, multiple NER models including the BERT-CRF model were tested, yielding the following model effect comparison:
(The model effect comparison table appears as an image in the original filing; the conclusions drawn from it are discussed below.)
Each algorithm model in the comparison is established in the art, so its network structure and technical principles are not explained here. According to the comparison, the named entity recognition model built on the BERT-CRF model performs best in music scenarios and can accurately recognize entities in the data such as song titles, singer names, and the names of related film and television works. Other models, such as the more complex FLAT NER model, perform only moderately, because the texts in the data set are usually short (the data mostly comes from playlists and the names of film, television, and variety works, so the context semantics are weak), while the FLAT model depends strongly on context due to its lattice-LSTM design and sliding-attention mechanism, making a good recognition effect hard to obtain in music scenarios. The BERT-CRF model processes the input text directly with BERT, whose pre-training carries a large amount of embedded information, so it classifies short texts well. In addition, an LSTM model also performs well on the test set; however, thanks to its rich pre-training, the BERT model, once familiar with the training set's data distribution, can also achieve good recognition on tasks with different distributions (medium and long texts), whereas the LSTM model can only handle data distributed similarly to the training samples.
Therefore, in one embodiment, a named entity recognition model constructed based on the BERT-CRF model is finally adopted to identify the named entity of each piece of comment data.
In summary, the comment playing method described in the above embodiments of the present disclosure can synchronously display comment data whose comment style matches the music style of the currently played music piece, ensuring that the playing of the music piece matches the display of the comment data and bringing a multi-sensory immersive experience of listening to music while watching matched comment data. In addition, the method can further control the playing speed of the comment data according to the reading speed corresponding to the music piece and the initial reading speed, so that this immersive experience is reinforced and the user experience is further improved.
Exemplary devices
Having introduced the comment playing method of the exemplary embodiment of the present disclosure, a comment playing apparatus of the exemplary embodiment of the present disclosure will be described below with reference to fig. 6.
The comment playing device provided by the embodiment of the present disclosure can be used for implementing the comment playing method described in the corresponding embodiment. The features and principles of the comment playing method described in any of the above embodiments can be applied to the following corresponding comment playing apparatus embodiments. In the following comment playback apparatus embodiment, a description of features and principles that have been already elucidated with respect to comment playback will not be repeated.
Referring to fig. 6, a comment playing apparatus 600 of an exemplary embodiment of the present disclosure may include a comment data acquisition module 610, a tune and comment style determination module 620, and a comment data presentation module 630. The comment data obtaining module 610 is configured to obtain comment data of a music object in response to playing the music object; the music style and comment style determining module 620 is used for determining the music style of the music piece of the music object and the comment style of the comment data; the comment data presentation module 630 is configured to synchronously present comment data having a comment style matching the music style of the section when any section of the music object is played.
Therefore, the comment playing device 600 can synchronously display the comment data with the comment style matched with the music of the music segment according to the currently played music segment, ensures that the music segment is played and matched with the comment data, brings multi-sense immersion experience of listening to music and watching the matched comment data, and effectively improves the use experience of users.
According to an exemplary embodiment of the present disclosure, the comment data presentation module 630 includes: a target comment data acquisition module, configured to acquire target comment data whose comment style matches the target music style of the currently played piece; and a display time control module, configured to control the display time of each piece of comment content according to the target music style and the word count of each piece of comment content in the target comment data.
According to an exemplary embodiment of the present disclosure, the display time control module includes: a first control module, configured to control, for comment content whose word count is less than the lower threshold, its display time according to the reading speed corresponding to the target music style; a second control module, configured to control, for comment content whose word count is greater than the upper threshold, its display time according to an initial reading speed; and a third control module, configured to control, for comment content whose word count is between the lower threshold and the upper threshold, its display time according to the reading speed corresponding to the target music style, the initial reading speed, and the word count of the comment content.
According to an exemplary embodiment of the present disclosure, the third control module includes: a first display time determining module, configured to determine a first display time according to the reading speed corresponding to the target music style and the word count of the comment content; a second display time determining module, configured to determine a second display time according to the initial reading speed and the word count of the comment content; and a display time determining module, configured to determine the display time of the comment content according to the weight relationship between the first display time and the second display time.
According to an exemplary embodiment of the present disclosure, the comment playing apparatus further includes: the human eye focus movement speed acquisition module is used for taking the human eye focus movement speed as an initial reading speed under the condition that the client authorizes and captures the human eye focus movement speed; and the default reading speed acquisition module is used for taking a preset reading speed as an initial reading speed under the condition that the movement speed of the focus of human eyes cannot be acquired.
According to an exemplary embodiment of the present disclosure, the comment data acquisition module 610 includes: and the comment data screening module is used for screening comment data related to the attribute of the music object from all comment data of the music object.
According to an exemplary embodiment of the present disclosure, the comment data filtering module includes: the named entity recognition module is used for inputting each piece of comment data into a named entity recognition model to obtain a named entity recognition result of each piece of comment data; and the recognition result screening module is used for screening the comment data of which the attribute of the music object is hit by the recognition result of the named entity as comment data related to the attribute of the music object.
According to an exemplary embodiment of the present disclosure, the attributes of the music object include an identification attribute and a scene attribute of the music object; the comment playing apparatus further includes: the training data acquisition module is used for acquiring a training data set, wherein the training data set comprises a first data set carrying an identification attribute label and a second data set carrying the identification attribute label and a scene attribute label; and the model training module is used for training the named entity recognition model based on the first data set and the second data set in sequence.
According to an exemplary embodiment of the present disclosure, the identification attribute includes the song title and/or the author name of the music object; the scene attribute includes the names of film, television, and variety works that feature the music object.
According to an exemplary embodiment of the present disclosure, the melody and comment style determination module 620 includes: the emotion recognition module is used for inputting each piece of comment data into an emotion recognition model to obtain an emotion recognition result of each piece of comment data; and the comment style determining module is used for determining the comment style of each piece of comment data according to the emotion recognition result of each piece of comment data.
According to an exemplary embodiment of the present disclosure, the melody and comment style determination module 620 includes: the melody recognition module is used for inputting each segment into a melody recognition model to obtain the melody recognition result of each segment; and the melody determining module is used for determining the melody of each segment according to the melody recognition result of each segment.
The principle and the characteristic of each functional module of the comment playing device of the above embodiment of the present disclosure are the same as those of the above embodiment of the comment playing method described in detail in the present disclosure, and therefore, the specific characteristic and the principle of each functional module can refer to the description of each comment playing method embodiment described above, and the description is not repeated here.
In summary, the comment playing apparatus provided in the exemplary embodiments of the present disclosure can synchronously display comment data whose comment style matches the music style of the currently played music piece, ensuring that the playing of the music piece matches the display of the comment data and bringing a multi-sensory immersive experience of listening to music while watching matched comment data. In addition, the apparatus can further control the playing speed of the comment data according to the reading speed corresponding to the music piece and the initial reading speed, so that this immersive experience is reinforced and the user experience is further improved.
Exemplary storage Medium
Having described the comment playing method and apparatus of the exemplary embodiment of the present disclosure, a storage medium of the exemplary embodiment of the present disclosure will be described next with reference to fig. 7.
Referring to fig. 7, a storage medium 700 for implementing the comment playing method described above according to an embodiment of the present disclosure is shown. The storage medium may be a portable compact disc read-only memory (CD-ROM) containing program code, and may run on a device such as a personal computer. However, the storage medium of the present disclosure is not limited thereto; in this document, a storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.
A storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java or C++, as well as conventional procedural programming languages such as the "C" programming language. The program code may execute entirely on the user's computing device, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the internet using an internet service provider).
Exemplary Electronic Device
Having described the storage medium of the exemplary embodiment of the present disclosure, next, an electronic device of the exemplary embodiment of the present disclosure will be described with reference to fig. 8.
The electronic device 800 shown in fig. 8 is only an example and should not bring any limitations to the functionality and scope of use of the embodiments of the present disclosure.
As shown in fig. 8, the electronic device 800 takes the form of a general-purpose computing device. The components of the electronic device 800 may include, but are not limited to: at least one processing unit 810, at least one storage unit 820, a bus 830 connecting different system components (including the storage unit 820 and the processing unit 810), and a display unit 840.
The storage unit stores program code that can be executed by the processing unit 810, causing the processing unit 810 to perform the steps of the comment playing method according to the various exemplary embodiments of the present disclosure described in the exemplary method section of this specification. For example, the processing unit 810 may perform the steps shown in fig. 1.
The storage unit 820 may include volatile memory units, such as a random access memory unit (RAM) 8201 and/or a cache memory unit 8202, and may further include a read-only memory unit (ROM) 8203.
The storage unit 820 may also include a program/utility 8204 having a set (at least one) of program modules 8205, such program modules 8205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which or some combination thereof may comprise an implementation of a network environment.
Bus 830 may include a data bus, an address bus, and a control bus.
The electronic device 800 may also communicate with one or more external devices 900 (e.g., a keyboard, a pointing device, a Bluetooth device, etc.) via an input/output (I/O) interface 850. The electronic device 800 further comprises a display unit 840, connected to the input/output (I/O) interface 850, for displaying content. The electronic device 800 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the internet) via a network adapter 860. As shown, the network adapter 860 communicates with the other modules of the electronic device 800 via the bus 830. It should be appreciated that, although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 800, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
It should be noted that although several modules or sub-modules of the comment playing apparatus are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, according to embodiments of the present disclosure, the features and functions of two or more units/modules described above may be embodied in a single unit/module. Conversely, the features and functions of one unit/module described above may be further divided and embodied by a plurality of units/modules.
Further, while the operations of the disclosed methods are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in that particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be broken down into multiple steps.
While the spirit and principles of the present disclosure have been described with reference to several particular embodiments, it is to be understood that the present disclosure is not limited to the particular embodiments disclosed, and that the division into aspects is for convenience of description only and does not mean that features in those aspects cannot be combined to advantage. The disclosure is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (10)

1. A comment playing method, comprising:
in response to playing a music object, obtaining comment data of the music object;
determining a music style of each segment of the music object and a comment style of the comment data;
and when any segment of the music object is played, synchronously displaying comment data whose comment style matches the music style of the segment.
2. The comment playing method of claim 1, wherein the synchronously displaying comment data whose comment style matches the music style of the segment comprises:
obtaining target comment data whose comment style matches a target music style, according to the target music style of the currently playing segment;
and controlling the display time of each piece of comment content of the target comment data according to a reading speed corresponding to the target music style, an initial reading speed, and the word count of each piece of comment content.
3. The comment playing method of claim 2, wherein the controlling the display time of each piece of comment content comprises:
for comment content whose word count is less than a lower threshold, controlling the display time of the comment content according to the reading speed corresponding to the target music style and the word count of the comment content;
for comment content whose word count is greater than an upper threshold, controlling the display time of the comment content according to the initial reading speed and the word count of the comment content;
and for comment content whose word count is between the lower threshold and the upper threshold, controlling the display time of the comment content according to the reading speed corresponding to the target music style, the initial reading speed, and the word count of the comment content.
4. The comment playing method of claim 3, wherein the controlling the display time of the comment content comprises:
determining a first display time according to the reading speed corresponding to the target music style and the word count of the comment content;
determining a second display time according to the initial reading speed and the word count of the comment content;
and determining the display time of the comment content according to a weighted combination of the first display time and the second display time.
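The display-time control of claims 3 and 4 can be sketched as below. The threshold values, speeds, and the 50/50 weighting are illustrative assumptions, not values from the disclosure.

```python
# Sketch of the display-time control in claims 3-4. Word-count
# thresholds, speeds, and the weighting are assumed for illustration.

LOWER, UPPER = 5, 30  # word-count thresholds (assumed values)

def display_time(word_count, style_speed, initial_speed, weight=0.5):
    """Seconds to display one piece of comment content.

    style_speed and initial_speed are in words per second.
    """
    t_style = word_count / style_speed      # first display time (claim 4)
    t_initial = word_count / initial_speed  # second display time (claim 4)
    if word_count < LOWER:    # short comment: music-style speed only
        return t_style
    if word_count > UPPER:    # long comment: initial reading speed only
        return t_initial
    # in-between: weighted combination of the two display times
    return weight * t_style + (1 - weight) * t_initial

print(display_time(4, style_speed=2.0, initial_speed=4.0))   # 2.0
print(display_time(40, style_speed=2.0, initial_speed=4.0))  # 10.0
print(display_time(10, style_speed=2.0, initial_speed=4.0))  # 3.75
```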
5. The comment playing method of claim 2, further comprising:
when the client is authorized to capture the movement speed of the user's eye focus, taking the movement speed of the eye focus as the initial reading speed;
and when the movement speed of the eye focus cannot be captured, taking a preset reading speed as the initial reading speed.
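The fallback in claim 5 amounts to a simple conditional; a minimal sketch follows, in which the eye-focus capture function is a hypothetical placeholder and the preset speed is an assumed value.

```python
# Sketch of claim 5: use the captured eye-focus movement speed when the
# client has authorized capture, otherwise fall back to a preset speed.
# capture_eye_speed is a hypothetical placeholder for the client API.

DEFAULT_READING_SPEED = 3.0  # words per second (assumed preset value)

def initial_reading_speed(capture_eye_speed):
    """capture_eye_speed() returns a speed, or None if unavailable."""
    speed = capture_eye_speed()
    return speed if speed is not None else DEFAULT_READING_SPEED

print(initial_reading_speed(lambda: 4.5))   # 4.5 (capture authorized)
print(initial_reading_speed(lambda: None))  # 3.0 (fallback to preset)
```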
6. The comment playing method of claim 1, wherein the obtaining comment data of the music object comprises:
screening out, from all comment data of the music object, comment data related to an attribute of the music object.
7. The comment playing method of claim 6, wherein the screening out comment data related to the attribute of the music object comprises:
inputting each piece of comment data into a named entity recognition model to obtain a named entity recognition result for each piece of comment data;
and selecting, as the comment data related to the attribute of the music object, the comment data whose named entity recognition result hits the attribute of the music object.
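The filtering in claims 6-7 can be sketched as follows. The "NER model" here is a toy keyword matcher standing in for a real named entity recognition model, and the attribute values are invented for illustration.

```python
# Sketch of claims 6-7: keep only comments whose recognized named
# entities hit an attribute of the music object. ner() is a toy
# substring matcher standing in for a real NER model.

def ner(text, known_entities):
    """Toy NER: return the known entities that appear in the text."""
    return {e for e in known_entities if e in text}

def filter_comments(comments, attributes):
    """Keep comments whose recognized entities hit a music attribute."""
    return [c for c in comments if ner(c, attributes)]

attributes = {"singer A", "album B"}  # music-object attributes (assumed)
comments = [
    "singer A's voice is amazing here",
    "this reminds me of my summer",
    "album B was underrated",
]
print(filter_comments(comments, attributes))
```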
8. A comment playing apparatus, characterized by comprising:
a comment data acquisition module, configured to acquire comment data of a music object in response to the music object being played;
a music style and comment style determination module, configured to determine the music style of each segment of the music object and the comment style of the comment data;
and a comment data display module, configured to synchronously display, when any segment of the music object is played, comment data whose comment style matches the music style of the segment.
9. A storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the comment playing method of any one of claims 1 to 7.
10. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the comment playing method of any one of claims 1-7 via execution of the executable instructions.
CN202210806632.1A 2022-07-08 2022-07-08 Comment playing method and device, storage medium and electronic equipment Pending CN115186128A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210806632.1A CN115186128A (en) 2022-07-08 2022-07-08 Comment playing method and device, storage medium and electronic equipment


Publications (1)

Publication Number Publication Date
CN115186128A true CN115186128A (en) 2022-10-14

Family

ID=83517967

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210806632.1A Pending CN115186128A (en) 2022-07-08 2022-07-08 Comment playing method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN115186128A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination