CN117725256A - Song recommendation method, song recommendation device, electronic equipment and storage medium - Google Patents

Song recommendation method, song recommendation device, electronic equipment and storage medium

Info

Publication number
CN117725256A
CN117725256A (application number CN202311733253.5A)
Authority
CN
China
Prior art keywords
song
candidate
text
intention
songs
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311733253.5A
Other languages
Chinese (zh)
Inventor
吴洁
李茂�
尹凯
李涛
顾靖楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202311733253.5A
Publication of CN117725256A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The disclosure provides a song recommendation method, a song recommendation device, electronic equipment and a storage medium, relates to the technical field of artificial intelligence, and particularly relates to the technical field of natural language processing, deep learning and large models. The method comprises the following steps: acquiring a first dialogue text of a client aiming at an avatar; inputting a first dialogue text into a large model, and outputting a requirement recognition result of the first dialogue text by the large model; if the requirement identification result comprises a song recommendation requirement, determining a target song to be recommended based on the first dialogue text; and inputting the identification of the target song into the large model, and outputting a second dialogue text of the avatar aiming at the client by the large model, wherein the second dialogue text carries the identification of the target song.

Description

Song recommendation method, song recommendation device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence technology, and in particular, to the field of natural language processing, deep learning, and large model technology, and more particularly, to a song recommendation method, apparatus, electronic device, storage medium, and computer program product.
Background
At present, with the continuous development of artificial intelligence technology, large models have advantages such as good generalization and are widely applied in fields such as information extraction, text credibility evaluation, and machine translation. Most song recommendation methods in the related art only provide song search and song playing functions and suffer from poor interactivity.
Disclosure of Invention
The present disclosure provides a song recommendation method, apparatus, electronic device, storage medium, and computer program product.
According to a first aspect of the present disclosure, a song recommendation method is provided, including: acquiring a first dialogue text of a client aiming at an avatar; inputting a first dialogue text into a large model, and outputting a requirement recognition result of the first dialogue text by the large model; if the requirement identification result comprises a song recommendation requirement, determining a target song to be recommended based on the first dialogue text; and inputting the identification of the target song into the large model, and outputting a second dialogue text of the avatar aiming at the client by the large model, wherein the second dialogue text carries the identification of the target song.
According to a second aspect of the present disclosure, there is provided a song recommendation apparatus, comprising: the first acquisition module is used for acquiring a first dialogue text aiming at the virtual image of the client; the second acquisition module is used for inputting the first dialogue text into the large model, and outputting a requirement recognition result of the first dialogue text by the large model; the determining module is used for determining a target song to be recommended based on the first dialogue text if the requirement identification result comprises a song recommendation requirement; and the third acquisition module is used for inputting the identification of the target song into the large model, and outputting a second dialogue text of the virtual image for the client by the large model, wherein the second dialogue text carries the identification of the target song.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the song recommendation method set forth in the first aspect above.
According to a fourth aspect of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions for causing the computer to execute the song recommendation method set forth in the first aspect above is provided.
According to a fifth aspect of the present disclosure, a computer program product is presented, comprising a computer program which, when executed by a processor, implements the song recommendation method presented in the first aspect above.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow chart of a song recommendation method according to an embodiment of the disclosure;
FIG. 2 is a flow chart of a song recommendation method according to another embodiment of the present disclosure;
FIG. 3 is a flowchart of a song recommendation method according to another embodiment of the present disclosure;
FIG. 4 is a flowchart of a song recommendation method according to another embodiment of the present disclosure;
FIG. 5 is a flowchart of a song recommendation method according to another embodiment of the present disclosure;
FIG. 6 is a flowchart of a song recommendation method according to another embodiment of the present disclosure;
FIG. 7 is a schematic diagram of a song recommendation apparatus according to an embodiment of the disclosure;
FIG. 8 is a schematic block diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
AI (Artificial Intelligence) is a technical science that studies and develops theories, methods, techniques, and application systems for simulating, extending, and expanding human intelligence. At present, AI technology has the advantages of a high degree of automation, high accuracy, and low cost, and is widely applied.
Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies computer systems, and in particular the software systems therein, that can effectively realize natural language communication.
DL (Deep Learning) is a new research direction in the field of ML (Machine Learning). It learns the inherent rules and representation hierarchies of sample data so that a machine can acquire analysis and learning abilities like a person and can recognize data such as text, images, and sounds. It is widely applied in speech and image recognition.
Large models refer to machine learning models with a large parameter size and complexity; they require a large amount of computing resources and storage space to train and store, and often require distributed computing and special hardware acceleration techniques. A large model has stronger generalization and expression capabilities. Large models include LLMs (Large Language Models). A large language model is a deep learning model trained with large amounts of text data that can generate natural language text or understand the meaning of language text. Large language models can handle various natural language tasks, such as text classification, question answering, and dialogue, and are an important path toward artificial intelligence.
Fig. 1 is a flowchart illustrating a song recommendation method according to an embodiment of the disclosure. As shown in fig. 1, the method includes:
s101, acquiring a first dialogue text of the client aiming at the avatar.
It should be noted that, the execution body of the song recommendation method according to the embodiment of the present disclosure may be a hardware device having data information processing capability and/or software necessary for driving the hardware device to operate. Alternatively, the execution body may include a workstation, a server, a computer, a user terminal, and other intelligent devices. The user terminal comprises, but is not limited to, a mobile phone, a computer, intelligent voice interaction equipment, intelligent household appliances, vehicle-mounted terminals and the like.
It should be noted that the client refers to a client for providing a song recommendation service, and may include, for example, a song playing APP, a chat APP, a mobile phone assistant APP, and the like. The avatar refers to a virtual image displayed on the client. The display position of the avatar on the client is not limited here, and may include, for example, the lower left corner of a display page (such as a dialogue page) of the client, where the dialogue page refers to the page for the dialogue between the user manipulating the client and the avatar. The avatar itself is likewise not limited, and may include, for example, a virtual singer (e.g., Hatsune Miku), a cartoon character, a game character, a mobile phone assistant, and the like.
It should be noted that the first dialogue text is not limited here, and may include text in any language such as Chinese or English. For example, the first dialogue text may include "I want to listen to xx", "I want to listen to the song with lyrics xx", "I want to hear a song", "I want to listen to a happy song", "Sing me the song xx", "I'm not happy", and the like.
In one embodiment, obtaining the first dialog text for the avatar by the client includes obtaining the first dialog text based on operation information of a user manipulating the client. It will be appreciated that a user manipulating a client may log into the client and enter text, voice information, etc. on a dialog page of the client.
In some examples, the user's operation information includes the user's input text, which may be regarded as the first dialog text.
In some examples, the operation information of the user includes voice information of the user, and voice recognition may be performed on the voice information of the user to obtain the first dialog text.
In one embodiment, obtaining the first dialog text for the avatar by the client includes receiving the first dialog text sent by the client and/or listening to the first dialog text.
S102, inputting the first dialogue text into the large model, and outputting a requirement recognition result of the first dialogue text by the large model.
It should be noted that the large model may be implemented by any large model in the related art, which is not limited herein. For example, a Transformer-based model may be used; the Transformer is a neural network model based on a self-attention mechanism. For example, the large model may be a large language model such as BERT (Bidirectional Encoder Representations from Transformers), a pre-trained large language model.
It should be noted that the requirement recognition result is not limited here, and may include, for example, a song recommendation requirement, a non-song recommendation requirement, and the like. The non-song recommendation requirement is likewise not limited, and may include, for example, a knowledge question-and-answer requirement, an image-text dialogue requirement (such as a drawing requirement), a chit-chat requirement, a document analysis requirement, and the like.
In one embodiment, outputting the requirement recognition result of the first dialogue text by the large model includes: matching, by the large model, the first dialogue text with a plurality of first regular expressions in a regular expression library, and if the first dialogue text matches at least one first regular expression successfully, determining that the requirement recognition result includes a song recommendation requirement; and/or
classifying, by the large model, the requirement of the first dialogue text based on the first dialogue text to obtain a requirement category of the first dialogue text, and if the target requirement category of the first dialogue text is a song recommendation requirement, determining that the requirement recognition result includes the song recommendation requirement.
In some examples, the method further includes determining that the requirement identification result includes a non-song recommendation requirement if the first dialog text fails to match each of the first regular expressions and/or if the target requirement category of the first dialog text is a non-song recommendation requirement.
In some examples, classifying the demand of the first dialog text based on the first dialog text by the large model to obtain a demand category of the first dialog text includes obtaining a probability of the first dialog text under the candidate demand category based on the first dialog text by the large model, and taking the candidate demand category as a target demand category of the first dialog text if the probability of the first dialog text under the candidate demand category is greater than a set threshold. It should be noted that the above set threshold is not limited too much, and may include 0.5, for example.
It should be noted that the regular expression library includes a plurality of first regular expressions, and the first regular expressions are part or all of the regular expressions in the regular expression library.
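The two branches above (regular-expression matching and requirement classification with a set threshold) can be combined as in the following minimal Python sketch. The patterns, category names, and the classify_fn placeholder standing in for the large model's classifier are illustrative assumptions rather than part of the disclosure; only the 0.5 threshold comes from the text.

```python
import re

# Illustrative first regular expressions; the actual regular expression library
# is not specified in the disclosure, so these patterns are assumptions.
FIRST_REGEXES = [
    re.compile(r"(listen to|play|sing).*(song|music)"),
    re.compile(r"i want to (listen|hear)"),
]

SET_THRESHOLD = 0.5  # example threshold value mentioned in the text


def recognize_requirement(first_dialog_text: str, classify_fn) -> str:
    """Return 'song_recommendation' or 'non_song_recommendation'.

    classify_fn is a placeholder for the large model's requirement classifier;
    it is assumed to return a mapping from candidate categories to probabilities.
    """
    text = first_dialog_text.lower()

    # Branch 1: match the dialogue text against the first regular expressions.
    if any(pattern.search(text) for pattern in FIRST_REGEXES):
        return "song_recommendation"

    # Branch 2: requirement classification with a set probability threshold.
    probabilities = classify_fn(first_dialog_text)
    for category, probability in probabilities.items():
        if category == "song_recommendation" and probability > SET_THRESHOLD:
            return "song_recommendation"

    return "non_song_recommendation"
```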
And S103, if the requirement recognition result comprises song recommendation requirements, determining target songs to be recommended based on the first dialogue text.
It should be noted that, the number of target songs is at least one, and the determining the target songs to be recommended based on the first dialogue text may be implemented by any song recommendation method in the related art, which is not limited herein too.
In one embodiment, determining the target song to be recommended based on the first dialogue text includes extracting an identification of the target song from the first dialogue text, and taking the candidate song identified by that identification as the target song. It should be noted that the target song is one of the candidate songs, identifications correspond to songs one-to-one, and the identification of a song is not limited here and may include, for example, the song name, a number, and the like.
S104, inputting the identification of the target song into the large model, and outputting a second dialogue text of the virtual image for the client by the large model, wherein the second dialogue text carries the identification of the target song.
It should be noted that the second dialog text is not limited too much, and may include text composed of any language such as chinese and english.
For example, the second dialogue text may include "Sure, Avatar A will sing xx for you, hope you like it (wink). If you want to listen to a song later, just say 'sing me a song' to Avatar A, and Avatar A will always be there for you."
For example, the second dialogue text may include "(Full of confidence) Avatar A will sing you the song xx! Hope you like this song. If you want to listen to a song later, say 'sing me a song' to Avatar A; as long as you call, Avatar A will be there."
Note that Avatar A is the name of the avatar (for example, the name of a virtual singer), and xx in the second dialogue text is the song name of the target song.
In one embodiment, the method further comprises controlling the client to display a second dialog text, for example, controlling the client to display the second dialog text on a dialog page such that a user manipulating the client may see the second dialog text, thereby enabling a dialog interaction between the user and the avatar.
According to the song recommendation method, a first dialogue text of a client aiming at an virtual image is obtained, the first dialogue text is input into a large model, a requirement recognition result of the first dialogue text is output by the large model, if the requirement recognition result comprises song recommendation requirements, a target song to be recommended is determined based on the first dialogue text, the identification of the target song is input into the large model, a second dialogue text of the virtual image aiming at the client is output by the large model, and the second dialogue text carries the identification of the target song. Therefore, the large model can obtain a requirement recognition result based on the first dialogue text of the client aiming at the virtual image, and can determine a target song to be recommended based on the first dialogue text if the requirement recognition result comprises a song recommendation requirement, the large model can generate a second dialogue text of the virtual image aiming at the client based on the identification of the target song, song recommendation can be realized in the dialogue process of the client and the virtual image, namely, song recommendation accompanied by the virtual image is realized, interactivity of song recommendation is improved, and the large model can be utilized to generate the second dialogue text based on the identification of the target song, so that accuracy of the second dialogue text is improved.
In the above embodiment, the steps after the step S104 of outputting the second dialogue text of the avatar for the client by the large model may be further understood with reference to fig. 2, and fig. 2 is a schematic flow chart of a song recommendation method according to another embodiment of the disclosure, as shown in fig. 2, and the method includes:
s201, a first dialogue text of the client aiming at the avatar is acquired.
S202, inputting the first dialogue text into the large model, and outputting a requirement recognition result of the first dialogue text by the large model.
And S203, if the requirement recognition result comprises song recommendation requirements, determining target songs to be recommended based on the first dialogue text.
S204, inputting the identification of the target song into the large model, and outputting a second dialogue text of the virtual image for the client by the large model, wherein the second dialogue text carries the identification of the target song.
The relevant content of steps S201-S204 can be seen in the above embodiments, and will not be described here again.
S205, a playing request of the client for the target song is obtained.
In one embodiment, obtaining the play request of the client for the target song includes receiving the play request of the client for the target song, and/or monitoring the play request of the target song of the client.
In one embodiment, obtaining a play request of the client for the target song includes generating the play request of the client for the target song based on operation information of a user who manipulates the client. It will be appreciated that a user manipulating a client may enter text (e.g., "play xx"), voice information, click icons, etc., on a dialog page of the client.
In some examples, obtaining the play request of the client for the target song includes controlling the client to display a play icon of the target song, and generating the play request of the client for the target song based on a click operation of the play icon of the target song by a user who manipulates the client.
S206, determining target parameters of the avatar based on the target song.
It should be noted that the target parameters are not limited too much, and may include parameters such as motion, expression, gesture, position, skin, and the like. Different target songs may correspond to different target parameters.
In one embodiment, determining the target parameters of the avatar based on the target song includes obtaining a mapping relationship between candidate songs and candidate parameters of the avatar, and determining the target parameters based on the target song and the mapping relationship. For example, the candidate parameters for the target song map may be taken as target parameters.
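A minimal sketch of this mapping-based lookup is given below; the song identifications and parameter names are hypothetical, and the disclosure does not prescribe a concrete data structure.

```python
# Hypothetical mapping between candidate song identifications and candidate avatar
# parameters (action, expression, skin, etc.); the concrete values are assumptions.
CANDIDATE_PARAMETERS = {
    "song_1": {"action": "dance_fast", "expression": "smile", "skin": "stage_outfit"},
    "song_2": {"action": "sway_slow", "expression": "calm", "skin": "casual_outfit"},
}

DEFAULT_PARAMETERS = {"action": "idle", "expression": "neutral", "skin": "default"}


def determine_target_parameters(target_song_id: str) -> dict:
    # Look up the candidate parameters mapped to the target song; fall back to a
    # default configuration if the song has no dedicated mapping.
    return CANDIDATE_PARAMETERS.get(target_song_id, DEFAULT_PARAMETERS)
```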
S207, controlling the client to play the target song.
In one embodiment, controlling the client to play the target song includes sending indication information for indicating to play the target song to the client, and accordingly, the client may receive the indication information and play the target song based on the indication information.
In one embodiment, controlling the client to play the target song includes controlling a song play component in the client to play the target song.
S208, controlling the avatar based on the target parameters in the process of playing the target song.
It should be noted that, based on the target parameter, the control of the avatar may be implemented by any control method of the avatar in the related art, which is not limited herein.
In one embodiment, controlling the avatar based on the target parameters includes controlling an action of the avatar based on action parameters in the target parameters.
In one embodiment, controlling the avatar based on the target parameters includes configuring skin of the avatar based on skin parameters in the target parameters.
According to the song recommendation method, a playing request of a client side for a target song is obtained, target parameters of an avatar are determined based on the target song, the client side is controlled to play the target song, and the avatar is controlled based on the target parameters in the process of playing the target song. Therefore, when the playing request of the client for the target song is obtained, the target parameter of the virtual image can be determined in consideration of the target song, the flexibility of the parameter of the virtual image is improved, and in the process of playing the target song, the virtual image is controlled based on the target parameter, so that the singing effect of the virtual image can be realized.
In the above embodiment, regarding the determination of the target song to be recommended based on the first dialogue text in step S103, it can be further understood with reference to fig. 3, and fig. 3 is a schematic flow chart of a song recommendation method according to another embodiment of the disclosure, as shown in fig. 3, the method includes:
s301, a first dialogue text of the client aiming at the avatar is acquired.
S302, inputting the first dialogue text into the large model, and outputting a requirement recognition result of the first dialogue text by the large model.
The relevant content of steps S301 to S302 can be seen in the above embodiments, and will not be described here again.
S303, if the requirement recognition result comprises song recommendation requirement, song recommendation intention recognition is carried out on the first dialogue text, and an intention recognition result is obtained.
It should be noted that, the intention recognition result is not limited too much, for example, the intention recognition result may include a first kind of intention, a second kind of intention, a third kind of intention, and the like, that is, the intention other than the first kind of intention includes the second kind of intention, the third kind of intention, and the like.
In the embodiment of the disclosure, song recommendation intention recognition is performed on the first dialogue text to obtain an intention recognition result, which includes the following possible embodiments:
Mode 1: text extraction is performed on the first dialogue text to obtain extracted texts; candidate song names of a plurality of candidate songs in a song library and a synonym word library of the candidate song names are obtained; and if at least one extracted text is consistent with a candidate song name, and/or if at least one extracted text exists in the synonym word library of the candidate song names, it is determined that the intention recognition result includes the first kind of intention.
It should be noted that, the text extraction of the first dialog text may be implemented by any text extraction method in the related art, which is not limited herein. The number of the extracted texts is at least one, and the extracted texts are not limited too much, and may include, for example, song names, lyrics, singer names, and the like in the first dialog text. The song library includes a plurality of candidate songs, and the synonym word library of the candidate song names includes at least one synonym of the candidate song names.
For example, the candidate song names include "song name 1", "song name 2", and the synonym word stock of the candidate song names includes "song name 3", "song name 4".
If the first dialogue text is "i want to listen to song name 1", text extraction can be performed on "i want to listen to song name 1" to obtain extraction text "song name 1", and it can be known that extraction text "song name 1" is consistent with candidate song name "song name 1", and it is determined that the intention recognition result includes first kind intention.
If the first dialogue text is "i want to listen to song name 3", text extraction can be performed on "i want to listen to song name 3" to obtain extracted text "song name 3", and it can be known that extracted text "song name 3" exists in the synonym word stock of candidate song names, and it is determined that the intention recognition result comprises first kind of intention.
Thus, if at least one extracted text is consistent with a candidate song name, and/or if at least one extracted text exists in the synonym word stock of the candidate song names, the user has specified the song name of the song to be recommended, and it is determined that the intention recognition result includes the first kind of intention.
Mode 2: text extraction is performed on the first dialogue text to obtain extracted texts; candidate lyrics of a plurality of candidate songs in the song library, a synonym word library of the candidate lyrics, and an emotion word library are obtained; and if at least one extracted text is consistent with candidate lyrics, and/or if at least one extracted text exists in the synonym word library of the candidate lyrics, and/or if at least one extracted text exists in the emotion word library, it is determined that the intention recognition result includes the second kind of intention.
It should be noted that, the synonym word library of the candidate lyrics includes at least one synonym of the candidate lyrics, the emotion word library includes a plurality of emotion words, and the emotion words are not limited too much, for example, may include "happy", "difficult", "active" and the like.
For example, the candidate lyrics include "lyrics 1", "lyrics 2", the synonym word library of the candidate lyrics includes "lyrics 3", "lyrics 4", and the emotion word library includes "not happy", "difficult", "active", "happy".
If the first dialogue text is "i want to listen to the song with lyrics 1", the text extraction can be performed on the "i want to listen to the song with lyrics 1" to obtain an extraction text "lyrics 1", the extraction text "lyrics 1" is known to be consistent with the candidate lyrics "lyrics 1", and the intention recognition result is determined to comprise the second kind of intention.
If the first dialogue text is "i want to listen to the song with lyrics 4", the text extraction can be performed on the "i want to listen to the song with lyrics 4" to obtain an extraction text "lyrics 4", the extraction text "lyrics 4" can be known to exist in the synonym word stock of the candidate lyrics, and the intention recognition result is determined to comprise the second category of intention.
If the first dialogue text is "I'm not happy", text extraction can be performed on "I'm not happy" to obtain the extracted text "not happy"; since this extracted text exists in the emotion word lexicon, it is determined that the intention recognition result includes the second kind of intention. Therefore, when the first dialogue text contains an emotion word, the intention recognition result can be determined to include the second kind of intention, song recommendation can be performed based on the user's actual emotion, songs with a specific emotion can be accurately recommended, and the personalization and intelligence of song recommendation are improved.
If the first dialogue text is "i want to listen to the happy song", text extraction can be performed on the "i want to listen to the happy song" to obtain an extracted text "happy", the extracted text "happy" can be known to exist in the emotion word lexicon, and the intention recognition result is determined to comprise the second kind of intention. Therefore, when the first dialogue text contains emotion words, the intention recognition result can be determined to comprise the second type of intention, song recommendation can be performed based on the appointed emotion of the user, songs with specific emotion can be accurately recommended, and individuation and intelligence of song recommendation are improved.
Therefore, text extraction is carried out on the first dialogue text to obtain extracted texts, if at least one extracted text is consistent with the candidate lyrics, and/or if at least one extracted text exists in a synonym word stock of the candidate lyrics, and/or if at least one extracted text exists in an emotion word stock, at the moment, the user designates lyrics or specific emotion of a song to be recommended, and determines that an intention recognition result comprises a second type of intention, for example, when the first dialogue text comprises emotion words, the intention recognition result comprises the second type of intention can be determined, song recommendation can be carried out based on the true emotion of the user or the designated emotion of the user, songs of the specific emotion can be accurately recommended, and individuation and intelligence of song recommendation are improved.
Mode 3: the first dialogue text is matched with a plurality of candidate regular expressions in the regular expression library, and if the first dialogue text matches at least one candidate regular expression successfully, it is determined that the intention recognition result includes the third kind of intention.
It should be noted that the regular expression library includes a plurality of candidate regular expressions, and the candidate regular expressions are part or all of the regular expressions in the regular expression library.
For example, the regular expression library includes candidate regular expressions 1, 2.
If the first dialogue text is "sing me a song", and "sing me a song" matches candidate regular expression 1 successfully, it is determined that the intention recognition result includes the third kind of intention.
If the first dialogue text is "sing me a song by singer B", and "sing me a song by singer B" matches candidate regular expression 2 successfully, it is determined that the intention recognition result includes the third kind of intention.
If the first dialogue text is "can you sing songs by singer B", and "can you sing songs by singer B" matches candidate regular expression 2 successfully, it is determined that the intention recognition result includes the third kind of intention.
Thus, if the first dialogue text matches at least one candidate regular expression successfully, the user has not explicitly specified a song to be recommended, which indicates that the user wants to listen to a random song, and it is determined that the intention recognition result includes the third kind of intention.
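The three modes can be summarized with the following Python sketch. The lexicons, regular expressions, and the simplistic extract_texts helper are assumptions for illustration; in practice the text extraction and the word libraries would come from the song library and related resources described above.

```python
import re

# Illustrative lexicons and regular expressions; the real song library, synonym
# word libraries, emotion word library and regular expression library are assumed.
CANDIDATE_SONG_NAMES = {"song name 1", "song name 2"}
SONG_NAME_SYNONYMS = {"song name 3", "song name 4"}
CANDIDATE_LYRICS = {"lyrics 1", "lyrics 2"}
LYRIC_SYNONYMS = {"lyrics 3", "lyrics 4"}
EMOTION_WORDS = {"not happy", "sad", "happy", "active"}
CANDIDATE_REGEXES = [re.compile(r"sing me a song"), re.compile(r"sing .* by singer")]


def extract_texts(first_dialog_text: str) -> set:
    # Placeholder text extraction: simply collects known vocabulary items that
    # appear verbatim in the dialogue text (song names, lyrics, emotion words).
    tokens = set()
    for vocab in (CANDIDATE_SONG_NAMES, SONG_NAME_SYNONYMS, CANDIDATE_LYRICS,
                  LYRIC_SYNONYMS, EMOTION_WORDS):
        tokens |= {item for item in vocab if item in first_dialog_text}
    return tokens


def recognize_intents(first_dialog_text: str) -> set:
    extracted = extract_texts(first_dialog_text)
    intents = set()
    # Mode 1: the user specified a song name (or a synonym of one).
    if extracted & (CANDIDATE_SONG_NAMES | SONG_NAME_SYNONYMS):
        intents.add("first_kind")
    # Mode 2: the user specified lyrics, a lyric synonym, or an emotion word.
    if extracted & (CANDIDATE_LYRICS | LYRIC_SYNONYMS | EMOTION_WORDS):
        intents.add("second_kind")
    # Mode 3: the dialogue text matches a candidate regular expression.
    if any(pattern.search(first_dialog_text) for pattern in CANDIDATE_REGEXES):
        intents.add("third_kind")
    return intents
```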
S304, determining a first song to be recommended corresponding to the intention recognition result according to the recommendation strategy corresponding to the intention recognition result.
It should be noted that, the number of the first songs corresponding to one intention recognition result is at least one, different intention recognition results may correspond to different first songs, or may correspond to the same first song, different intention recognition results may correspond to different recommendation strategies, and the recommendation strategies may be implemented by any song recommendation strategy in the related art, which is not limited too much.
S305, determining a target song based on the first song corresponding to the at least one intention recognition result.
In one embodiment, determining the target song based on the first song corresponding to the at least one intent recognition result includes randomly screening the target song from the first song corresponding to the at least one intent recognition result.
In one embodiment, determining the target song based on the first song corresponding to the at least one intention recognition result includes: if the intention recognition result includes only the first type of intention, or includes both the first type of intention and a non-first type of intention, taking the first song corresponding to the first type of intention as the target song; and if the intention recognition result includes only a non-first type of intention, taking the first song corresponding to the non-first type of intention as the target song. Thus, when the intention recognition result includes both the first type of intention and a non-first type of intention, the first song corresponding to the first type of intention is preferentially taken as the target song; when the intention recognition result includes only the first type of intention, the first song corresponding to the first type of intention is taken as the target song; and when the intention recognition result includes only a non-first type of intention, the first song corresponding to the non-first type of intention is taken as the target song.
In one embodiment, determining the target song based on the first song corresponding to the at least one intention recognition result includes obtaining a score corresponding to the first song, sorting the plurality of first songs according to the score from high to low, and taking the first songs of the first N sorted songs as the target song, wherein N is a positive integer. Thus, the first song with the higher score can be regarded as the target song.
It should be noted that, different songs may correspond to different scores, and the score corresponding to the first song may be obtained by using any method for scoring a song in the related art, which is not limited herein too, for example, the score corresponding to the first song may be obtained, and the related content of the score of the candidate song may be obtained in the following embodiments, which is not described herein again.
Note that N is not limited to a large amount, and may be 1, for example.
According to the song recommendation method, song recommendation intention recognition is conducted on the first dialogue text, intention recognition results are obtained, first songs to be recommended corresponding to the intention recognition results are determined according to recommendation strategies corresponding to the intention recognition results, and target songs are determined based on the first songs corresponding to at least one intention recognition result. Therefore, the first songs corresponding to the intention recognition results can be determined by considering the recommendation strategy corresponding to a certain intention recognition result, the flexibility of the first songs corresponding to the intention recognition results can be realized, the target songs can be determined by considering the first songs corresponding to at least one intention recognition result, and the accuracy of song recommendation is improved.
In the above embodiment, regarding the determining the first song to be recommended corresponding to the intention recognition result according to the recommendation policy corresponding to the intention recognition result in step S304, as can be further understood with reference to fig. 4, fig. 4 is a schematic flow chart of a song recommendation method according to another embodiment of the disclosure, as shown in fig. 4, the method includes:
s401, a first dialogue text of the client aiming at the avatar is acquired.
S402, inputting the first dialogue text into the large model, and outputting a requirement recognition result of the first dialogue text by the large model.
S403, if the requirement recognition result comprises song recommendation requirement, song recommendation intention recognition is carried out on the first dialogue text, and an intention recognition result is obtained.
The relevant content of steps S401 to S403 can be seen in the above embodiments, and will not be described here again.
S404, if the intention recognition result is the first intention, acquiring the text similarity between the first dialogue text and the candidate song names of a plurality of candidate songs in the song library.
It should be noted that the text similarity is not limited too much, for example, the value range of the text similarity is 0 to 1. The text similarity between the first dialogue text and the candidate song names can be obtained by adopting any text similarity obtaining method in the related technology, and the text similarity corresponding to different candidate song names can be the same or different, and the text similarity is not limited too much.
And S405, determining a first song corresponding to the first type intention from the plurality of candidate songs based on the text similarity.
In one embodiment, determining a first song corresponding to the first type of intent from among the plurality of candidate songs based on the text similarity includes sorting the plurality of candidate songs from high to low according to the text similarity, and taking the first M candidate songs as the first song corresponding to the first type of intent, wherein M is a positive integer. It should be noted that M is not limited to a large amount, and may be 1, for example.
For example, candidate song names include "song name 1", "song name 2", m=1.
If the first dialogue text is "i want to listen to song name 1", the text similarity between "i want to listen to song name 1" and the candidate song name "song name 1" is 0.8, and the text similarity between "i want to listen to song name 1" and the candidate song name "song name 2" is 0.2, and the candidate song corresponding to the candidate song name "song name 1" can be used as the first song corresponding to the first type of intention.
In one embodiment, determining a first song corresponding to the first type of intent from a plurality of candidate songs based on the text similarity includes obtaining scores corresponding to the candidate songs based on the text similarity corresponding to the candidate songs, sorting the plurality of candidate songs from high to low according to the scores, and taking the first M candidate songs as the first song corresponding to the first type of intent, wherein M is a positive integer. Therefore, the text similarity corresponding to the candidate songs can be considered, the score corresponding to the candidate songs is obtained, and the candidate songs with higher scores are used as the first songs corresponding to the first type of intention.
In some examples, obtaining the score corresponding to the candidate song based on the text similarity corresponding to the candidate song includes obtaining a second product of the text similarity corresponding to the candidate song and a second weight as the score corresponding to the candidate song. It should be noted that the second weight is not limited too much, and may be 10000, for example.
For example, the candidate song names include "song name 1", "song name 2", m=1, and the second weight is 10000.
If the first dialogue text is "I want to listen to song name 1", the text similarity between "I want to listen to song name 1" and the candidate song name "song name 1" may be 0.8, giving a score of 0.8 x 10000 = 8000 for the candidate song corresponding to "song name 1", and the text similarity between "I want to listen to song name 1" and the candidate song name "song name 2" may be 0.2, giving a score of 0.2 x 10000 = 2000 for the candidate song corresponding to "song name 2". The candidate song corresponding to "song name 1", which has the higher score, is taken as the first song corresponding to the first type of intention.
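A minimal sketch of this first-type-intention recommendation strategy is shown below, assuming a placeholder text-similarity function and the example second weight of 10000 from the text.

```python
SECOND_WEIGHT = 10000  # example second weight mentioned in the text


def text_similarity(a: str, b: str) -> float:
    # Placeholder similarity in [0, 1]; any text-similarity method may be used.
    # Character overlap is used here purely for illustration.
    if not a or not b:
        return 0.0
    overlap = len(set(a) & set(b))
    return overlap / max(len(set(a)), len(set(b)))


def first_songs_for_first_type_intent(first_dialog_text: str,
                                      candidate_song_names: list,
                                      m: int = 1) -> list:
    # Score each candidate song as text similarity x SECOND_WEIGHT and keep the
    # top-M candidates as the first songs corresponding to the first type of intention.
    scored = [
        (name, text_similarity(first_dialog_text, name) * SECOND_WEIGHT)
        for name in candidate_song_names
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in scored[:m]]
```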
S406, determining a target song based on the first song corresponding to the at least one intention recognition result.
The relevant content of step S406 may be referred to the above embodiments, and will not be described herein.
According to the song recommendation method, if the intention recognition result is the first type intention, text similarity between the first dialogue text and candidate song names of a plurality of candidate songs in a song library is obtained, and based on the text similarity, a first song corresponding to the first type intention is determined from the plurality of candidate songs. Thus, when the intention recognition result is the first type intention, that is, the song name of the song to be recommended is specified by the user, the first song corresponding to the first type intention can be determined from the plurality of candidate songs in consideration of the text similarity between the first dialogue text and the candidate song names.
In the above embodiment, regarding the determining the first song to be recommended corresponding to the intention recognition result according to the recommendation policy corresponding to the intention recognition result in step S304, as can be further understood with reference to fig. 5, fig. 5 is a schematic flow chart of a song recommendation method according to another embodiment of the disclosure, as shown in fig. 5, the method includes:
s501, a first dialogue text of the client aiming at the avatar is acquired.
S502, inputting the first dialogue text into the large model, and outputting a requirement recognition result of the first dialogue text by the large model.
S503, if the requirement recognition result includes song recommendation requirement, song recommendation intention recognition is carried out on the first dialogue text, and an intention recognition result is obtained.
And S504, if the intention recognition result is not the first type intention, acquiring the text similarity between the first dialogue text and the candidate song names of a plurality of candidate songs in the song library.
The relevant content of steps S501-S504 can be seen in the above embodiments, and will not be described here again.
S505, obtaining a first feature similarity between the feature of the first dialogue text and the feature of the candidate lyrics of the plurality of candidate songs in the song library.
It should be noted that, the features of the first dialog text and the features of the candidate lyrics are not limited too much, and may include word vectors, for example. The first feature similarity is not limited too much, and may include, for example, cosine similarity, euclidean distance, and the like, for example, the value range of the first feature similarity is 0 to 1. The first feature similarity between the feature of the first dialog text and the feature of the candidate lyrics may be obtained by any feature similarity obtaining method in the related art, and the first feature similarities corresponding to different candidate songs may be the same or different, which is not limited herein too.
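For instance, when word vectors are used as features, the first feature similarity can be computed as a cosine similarity, as in the sketch below; clamping negative values to 0 is an assumption made here so that the result stays in the stated 0-to-1 range.

```python
import math


def cosine_similarity(vec_a: list, vec_b: list) -> float:
    # Cosine similarity between two feature vectors, e.g. the word vector of the
    # first dialogue text and the word vector of candidate lyrics.
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    # Clamp to [0, 1] to match the value range described in the text.
    return max(0.0, dot / (norm_a * norm_b))
```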
S506, determining a first song which is not corresponding to the first type intention from the candidate songs based on the text similarity and the first feature similarity.
In one embodiment, determining a first song corresponding to a non-first type of intent from a plurality of candidate songs based on the text similarity and the first feature similarity includes obtaining scores corresponding to the candidate songs based on the text similarity and the first feature similarity, sorting the plurality of candidate songs according to the scores from high to low, and taking the first M candidate songs as the first songs corresponding to the non-first type of intent.
In some examples, obtaining the score corresponding to the candidate song based on the text similarity and the first feature similarity includes obtaining a third product of the first feature similarity corresponding to the candidate song and the first weight, obtaining a second product of the text similarity corresponding to the candidate song and the second weight, wherein the second weight is greater than the first weight, and obtaining a sum of the third product and the second product as the score corresponding to the candidate song.
It should be noted that the first weight and the second weight are not limited too much, for example, the first weight may be 1, and the second weight may be 10000.
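A minimal sketch of this scoring rule, using the example weights mentioned above (first weight 1, second weight 10000); the tuple layout of the candidate list is an assumption for illustration.

```python
FIRST_WEIGHT = 1       # example first weight from the text
SECOND_WEIGHT = 10000  # example second weight from the text; greater than the first weight


def candidate_song_score(text_sim: float, first_feature_sim: float) -> float:
    # Score for a candidate song under a non-first type of intention: the sum of
    # (first feature similarity x first weight) and (text similarity x second weight).
    return first_feature_sim * FIRST_WEIGHT + text_sim * SECOND_WEIGHT


def first_songs_for_non_first_type_intent(candidates: list, m: int = 1) -> list:
    # candidates: list of (song_id, text_similarity, first_feature_similarity) tuples.
    ranked = sorted(
        candidates,
        key=lambda c: candidate_song_score(c[1], c[2]),
        reverse=True,
    )
    return [song_id for song_id, _, _ in ranked[:m]]
```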
S507, determining a target song based on the first song corresponding to the at least one intention recognition result.
The relevant content of step S507 may be referred to the above embodiments, and will not be described herein.
According to the song recommendation method, if an intention recognition result is not the first type intention, text similarity between a first dialogue text and candidate song names of a plurality of candidate songs in a song library is obtained, first feature similarity between features of the first dialogue text and features of candidate lyrics of the plurality of candidate songs in the song library is obtained, and a first song which is not corresponding to the first type intention is determined from the plurality of candidate songs based on the text similarity and the first feature similarity. Therefore, when the intention recognition result is not the first type intention, namely the song name of the song to be recommended is not specified by the user, the text similarity between the first dialogue text and the candidate song name and the first feature similarity between the feature of the first dialogue text and the feature of the candidate lyrics can be comprehensively considered, and the first song which is not corresponding to the first type intention is determined from the plurality of candidate songs.
In the above embodiment, regarding the determination of the first song corresponding to the non-first type intention from the plurality of candidate songs based on the text similarity and the first feature similarity in step S506, it may be further understood with reference to fig. 6, and fig. 6 is a schematic flow chart of a song recommendation method according to another embodiment of the disclosure, as shown in fig. 6, and the method includes:
S601, a first dialogue text of the client aiming at the avatar is acquired.
S602, inputting the first dialogue text into the large model, and outputting a requirement recognition result of the first dialogue text by the large model.
And S603, if the requirement recognition result comprises song recommendation requirement, performing song recommendation intention recognition on the first dialogue text to obtain an intention recognition result.
S604, if the intention recognition result is not the first intention, acquiring the text similarity between the first dialogue text and the candidate song names of a plurality of candidate songs in the song library.
S605, obtaining first feature similarity between the features of the first dialogue text and the features of the candidate lyrics of the plurality of candidate songs in the song library.
The relevant content of steps S601-S605 can be seen in the above embodiments, and will not be described here again.
S606, obtaining second feature similarity between the features of the first dialogue text and the features of the candidate song names of the candidate songs in the song library.
It should be noted that the features of the candidate song names are not limited too much, and may include word vectors, for example. The second feature similarity may refer to the content related to the first feature similarity in the above embodiment, which is not described herein. The second feature similarities corresponding to different candidate songs may or may not be the same, and are not limited in this regard.
S607, obtaining a third feature similarity between the feature of the first dialogue text and the feature of the candidate attribute of the plurality of candidate songs in the song library.
It should be noted that the candidate attributes of the song are not limited too much, and may include classical, popular, rock, lyrics, wounded, cheerful, etc. The features of the candidate attributes are not overly limited and may include word vectors, for example. The third feature similarity may refer to the content related to the first feature similarity in the above embodiment, which is not described herein. The third feature similarities corresponding to different candidate songs may or may not be the same, and are not limited in this regard.
S608, determining a first song corresponding to the non-first type intention from the plurality of candidate songs based on the text similarity, the first feature similarity, the second feature similarity, and the third feature similarity.
In one embodiment, determining a first song corresponding to the non-first type of intent from the plurality of candidate songs based on the text similarity, the first feature similarity, the second feature similarity, and the third feature similarity includes obtaining scores corresponding to the candidate songs based on the text similarity, the first feature similarity, the second feature similarity, and the third feature similarity, sorting the plurality of candidate songs from high score to low score, and taking the M candidate songs before sorting as the first song corresponding to the non-first type of intent.
In some examples, the score corresponding to the candidate song is obtained based on the text similarity, the first feature similarity, the second feature similarity, and the third feature similarity, including weighting and summing the text similarity, the first feature similarity, the second feature similarity, and the third feature similarity to obtain the score corresponding to the candidate song.
In one embodiment, determining the first song corresponding to the non-first type of intention from the plurality of candidate songs based on the text similarity, the first feature similarity, the second feature similarity, and the third feature similarity includes: taking the maximum of the first, second, and third feature similarities corresponding to a candidate song as the maximum feature similarity of that candidate song; obtaining the score of the candidate song based on its maximum feature similarity and its text similarity; if the maximum feature similarity of the candidate song is greater than a first set threshold, and/or the text similarity of the candidate song is greater than a second set threshold, and/or the score of the candidate song is greater than a third set threshold, adding the candidate song to a song set; and screening the first song corresponding to the non-first type of intention from the song set. In this way, the maximum feature similarity and the text similarity of each candidate song can be considered together to obtain its score, candidate songs with a higher maximum feature similarity and/or a higher text similarity and/or a higher score are added to the song set, and the first song corresponding to the non-first type of intention is screened from the song set.
It should be noted that the song set includes a plurality of candidate songs.
In some examples, the score corresponding to the candidate song is obtained based on the maximum feature similarity corresponding to the candidate song and the text similarity corresponding to the candidate song, including obtaining an average of the maximum feature similarity corresponding to the candidate song and the text similarity corresponding to the candidate song, and obtaining the score corresponding to the candidate song based on the average. For example, the product of the average value and the set coefficient is taken as the score corresponding to the candidate song.
In some examples, obtaining the score corresponding to the candidate song based on the maximum feature similarity corresponding to the candidate song and the text similarity corresponding to the candidate song includes obtaining a first product of the maximum feature similarity corresponding to the candidate song and a first weight, obtaining a second product of the text similarity corresponding to the candidate song and a second weight, wherein the second weight is greater than the first weight, and obtaining a sum of the first product and the second product as the score corresponding to the candidate song. Thus, the second weight corresponding to the text similarity is larger than the first weight corresponding to the maximum feature similarity, that is, the text similarity has a larger influence on the score corresponding to the candidate song, so that the first song which is not corresponding to the first type of intention is determined mainly in consideration of the text similarity.
In some examples, selecting the first song corresponding to the non-first type of intention from the song set includes randomly selecting the first song corresponding to the non-first type of intention from the song set.
In some examples, selecting the first song corresponding to the non-first type of intention from the song set includes: if the intention recognition result is a second type of intention, sorting the candidate songs in the song set by score from high to low and taking the top M candidate songs as the first songs corresponding to the second type of intention, where M is a positive integer; and if the intention recognition result is a third type of intention, randomly screening the first song corresponding to the third type of intention out of the song set. In this way, when the intention recognition result is the second type of intention, that is, the user has specified the lyrics or a specific emotion of the song to be recommended, the candidate songs with higher scores in the song set can be taken as the first songs corresponding to the second type of intention; when the intention recognition result is the third type of intention, that is, the user has not explicitly specified the song to be recommended, the first song corresponding to the third type of intention can be randomly screened out of the song set.
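Continuing the same sketch, the selection of the first song from the song set according to the intention type might look as follows; the intention labels and the value of M are placeholders, not terms defined by the present disclosure.

import random

def select_first_songs(song_set, intention_type, m=3):
    # song_set: list of (song_id, score) pairs produced by build_song_set above.
    # Second type of intention: take the top-M candidates by score.
    # Third type of intention: randomly screen out one candidate.
    if intention_type == "second":
        ranked = sorted(song_set, key=lambda item: item[1], reverse=True)
        return [song_id for song_id, _ in ranked[:m]]
    if intention_type == "third":
        return [random.choice(song_set)[0]] if song_set else []
    raise ValueError("unexpected intention type")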
S609, determining a target song based on the first song corresponding to the at least one intention recognition result.
The relevant content of step S609 can be seen in the above embodiment, and will not be described here again.
According to the song recommendation method, a second feature similarity between the features of the first dialogue text and the features of the candidate song names of the plurality of candidate songs in the song library is obtained, a third feature similarity between the features of the first dialogue text and the features of the candidate attributes of the plurality of candidate songs in the song library is obtained, and the first song corresponding to the non-first type of intention is determined from the plurality of candidate songs based on the text similarity, the first feature similarity, the second feature similarity, and the third feature similarity. Thus, when the intention recognition result is not the first type of intention, that is, the user has not specified the name of the song to be recommended, the text similarity between the first dialogue text and the candidate song names, the first feature similarity between the features of the first dialogue text and the features of the candidate lyrics, the second feature similarity between the features of the first dialogue text and the features of the candidate song names, and the third feature similarity between the features of the first dialogue text and the features of the candidate attributes can be comprehensively considered to determine the first song corresponding to the non-first type of intention from the plurality of candidate songs.
On the basis of any of the above embodiments, the method further includes: if the target song does not exist in the song library, outputting, by the large model, a fourth dialogue text of the avatar for the client, wherein the fourth dialogue text is used for guiding the user operating the client to listen to candidate songs in the song library. For example, the fourth dialogue text may include: "Sorry, no eligible song was found. Avatar A has many original songs; if you would like to hear Avatar A sing, you can say 'sing me a song' to Avatar A, and Avatar A will sing for you!" Therefore, when the target song does not exist in the song library, the large model generates a fourth dialogue text of the avatar for the client to guide the user operating the client to listen to candidate songs in the song library, so that song recommendation continues for the user and the user experience is improved.
In one embodiment, the method further comprises controlling the client to display the fourth dialogue text.
On the basis of any one of the above embodiments, the method further includes: if the requirement recognition result includes a non-song recommendation requirement, generating, by the large model, a third dialogue text of the avatar for the client based on the first dialogue text. Therefore, when the requirement recognition result includes a non-song recommendation requirement, the large model can generate a third dialogue text of the avatar for the client based on the first dialogue text, which is suitable for application scenarios such as knowledge question answering, picture-text dialogue, and casual chat.
It should be noted that the third dialogue text is not particularly limited here, and may include text in any language, such as Chinese or English.
In one embodiment, the method further comprises generating, by the large model, a dialogue image based on the first dialogue text if the requirement recognition result includes a picture-text dialogue requirement.
In one embodiment, the method further comprises controlling the client to display the third dialogue text and/or the dialogue image.
For example, if the first dialog text is a "network structure of a Transformer", the requirement recognition result of the first dialog text includes a knowledge question-and-answer requirement, and the third dialog text may include a "network structure of a Transformer" mainly composed of two parts: an encoder and a decoder.
For example, if the first dialog text is "picture for me drawing a cat", the requirement recognition result of the first dialog text includes a graphic dialog requirement, and the third dialog text may include "me drawn, welcome feedback and advice to me, and help me progress quickly. You can fully describe your needs to continue drawing, such as: help me draw a penguin surfing in water.
For example, if the first dialog text is "i have little distraction recently", the requirement recognition result of the first dialog text includes boring requirements, and the third dialog text may include "what is you feel uncomfortable? Can share with avatar a? Avatar a will listen to your trouble here and provide some help and advice to you.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and other handling of the user's personal information comply with the provisions of relevant laws and regulations and do not violate public order and good customs.
According to an embodiment of the present disclosure, the present disclosure further provides a song recommendation apparatus, which is configured to implement the song recommendation method described above.
Fig. 7 is a block diagram of a song recommendation apparatus according to an embodiment of the present disclosure.
As shown in fig. 7, the song recommendation apparatus 700 includes: a first acquisition module 701, a second acquisition module 702, a determination module 703, and a third acquisition module 704.
A first obtaining module 701, configured to obtain a first dialogue text of a client for an avatar;
a second obtaining module 702, configured to input a first dialog text into a large model, and output, by the large model, a requirement recognition result of the first dialog text;
a determining module 703, configured to determine, based on the first dialogue text, a target song to be recommended if the requirement identification result includes a song recommendation requirement;
a third obtaining module 704, configured to input the identifier of the target song to the large model, and output, by the large model, a second dialogue text of the avatar for the client, where the second dialogue text carries the identifier of the target song.
In one embodiment of the present disclosure, the apparatus further comprises: the control module is used for: acquiring a playing request of the client for the target song; determining target parameters of the avatar based on the target song; controlling the client to play the target song; and controlling the avatar based on the target parameters in the process of playing the target song.
In one embodiment of the present disclosure, the determining module 703 is further configured to: performing song recommendation intention recognition on the first dialogue text to obtain an intention recognition result; determining a first song to be recommended corresponding to the intention recognition result according to a recommendation strategy corresponding to the intention recognition result; and determining the target song based on at least one first song corresponding to the intention recognition result.
In one embodiment of the present disclosure, the determining module 703 is further configured to: if the intention recognition result only comprises first-class intention, or if the intention recognition result comprises first-class intention and non-first-class intention, taking a first song corresponding to the first-class intention as the target song; and if the intention recognition result only comprises the non-first intention, taking the first song corresponding to the non-first intention as the target song.
In one embodiment of the present disclosure, the determining module 703 is further configured to: obtaining the score corresponding to the first song; and sequencing the plurality of first songs according to the score from high to low, and taking the first songs with the top N of the sequence as the target songs, wherein N is a positive integer.
In one embodiment of the present disclosure, the determining module 703 is further configured to: if the intention recognition result is a first type intention, acquiring text similarity between the first dialogue text and candidate song names of a plurality of candidate songs in a song library; and determining a first song corresponding to the first type of intention from a plurality of candidate songs based on the text similarity.
In one embodiment of the present disclosure, the determining module 703 is further configured to: obtaining the score corresponding to the candidate song based on the text similarity corresponding to the candidate song; and sorting the candidate songs according to the score from high to low, and taking M candidate songs before sorting as first songs corresponding to the first type of intention, wherein M is a positive integer.
In one embodiment of the present disclosure, the determining module 703 is further configured to: if the intention recognition result is not the first type intention, acquiring text similarity between the first dialogue text and candidate song names of a plurality of candidate songs in a song library; acquiring a first feature similarity between the feature of the first dialogue text and the feature of the candidate lyrics of a plurality of candidate songs in the song library; and determining a first song corresponding to the non-first type intention from a plurality of candidate songs based on the text similarity and the first feature similarity.
In one embodiment of the present disclosure, the determining module 703 is further configured to: acquiring second feature similarity between the features of the first dialogue text and the features of candidate song names of a plurality of candidate songs in the song library; obtaining a third feature similarity between the features of the first dialogue text and the features of the candidate attributes of the plurality of candidate songs in the song library; and determining a first song corresponding to the non-first type intention from a plurality of candidate songs based on the text similarity, the first feature similarity, the second feature similarity and the third feature similarity.
In one embodiment of the present disclosure, the determining module 703 is further configured to: obtaining the maximum value of the first feature similarity, the second feature similarity and the third feature similarity corresponding to the candidate song as the maximum feature similarity corresponding to the candidate song; obtaining a score corresponding to the candidate song based on the maximum feature similarity corresponding to the candidate song and the text similarity corresponding to the candidate song; if the maximum feature similarity corresponding to the candidate song is greater than a first set threshold, and/or if the text similarity corresponding to the candidate song is greater than a second set threshold, and/or if the score corresponding to the candidate song is greater than a third set threshold, adding the candidate song to a song set; and screening the first songs which are not corresponding to the first type of intention from the song set.
In one embodiment of the present disclosure, the determining module 703 is further configured to: obtaining a first product of the maximum feature similarity corresponding to the candidate song and a first weight; obtaining a second product of the text similarity corresponding to the candidate song and a second weight, wherein the second weight is greater than the first weight; and obtaining the sum value of the first product and the second product as the score corresponding to the candidate song.
In one embodiment of the present disclosure, the determining module 703 is further configured to: if the intention recognition result is a second type intention, sorting a plurality of candidate songs in the song set according to the score from high to low, and taking M candidate songs before sorting as first songs corresponding to the second type intention, wherein M is a positive integer; and if the intention recognition result is a third type intention, randomly screening out a first song corresponding to the third type intention from the song set.
In one embodiment of the present disclosure, the determining module 703 is further configured to: extracting the text of the first dialogue text to obtain an extracted text; obtaining candidate song names of a plurality of candidate songs in a song library and a synonym word library of the candidate song names; and if at least one extracted text is consistent with the candidate song names, and/or if at least one extracted text exists in a synonym word stock of the candidate song names, determining that the intention recognition result comprises a first type of intention.
In one embodiment of the present disclosure, the determining module 703 is further configured to: extracting the text of the first dialogue text to obtain an extracted text; acquiring candidate lyrics of a plurality of candidate songs in a song library, a synonym word library of the candidate lyrics and an emotion word library; and if at least one extracted text is consistent with the candidate lyrics, and/or if at least one extracted text exists in a synonym word bank of the candidate lyrics, and/or if at least one extracted text exists in the emotion word bank, determining that the intention recognition result comprises a second type of intention.
In one embodiment of the present disclosure, the determining module 703 is further configured to: matching the first dialogue text with a plurality of candidate regular expressions in a regular expression library; and if the first dialogue text is successfully matched with at least one candidate regular expression, determining that the intention recognition result comprises a third type of intention.
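A rule-based sketch of the three intention checks performed by the determining module is given below; every lexicon, word list, and regular expression is a placeholder supplied by the caller and is not defined by the present disclosure.

import re

def recognize_intentions(first_dialog_text, extracted_texts,
                         song_names, name_synonyms,
                         lyrics, lyric_synonyms, emotion_words,
                         regex_patterns):
    intentions = set()
    # First type: an extracted text matches a candidate song name or a synonym of one.
    if any(t in song_names or t in name_synonyms for t in extracted_texts):
        intentions.add("first")
    # Second type: an extracted text matches candidate lyrics, a lyric synonym,
    # or a word in the emotion lexicon.
    if any(t in lyrics or t in lyric_synonyms or t in emotion_words
           for t in extracted_texts):
        intentions.add("second")
    # Third type: the first dialogue text matches at least one candidate regular expression.
    if any(re.search(p, first_dialog_text) for p in regex_patterns):
        intentions.add("third")
    return intentions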
In one embodiment of the present disclosure, the apparatus further comprises: a fourth acquisition module, configured to: and if the requirement identification result comprises a non-song recommendation requirement, generating a third dialogue text of the avatar aiming at the client by the large model based on the first dialogue text.
According to the song recommendation device, a first dialogue text of a client aiming at an avatar is obtained, the first dialogue text is input to a large model, a requirement recognition result of the first dialogue text is output by the large model, if the requirement recognition result comprises song recommendation requirements, a target song to be recommended is determined based on the first dialogue text, the identification of the target song is input to the large model, a second dialogue text of the avatar aiming at the client is output by the large model, and the second dialogue text carries the identification of the target song. Therefore, the large model can obtain a requirement recognition result based on the first dialogue text of the client aiming at the virtual image, and can determine a target song to be recommended based on the first dialogue text if the requirement recognition result comprises a song recommendation requirement, the large model can generate a second dialogue text of the virtual image aiming at the client based on the identification of the target song, song recommendation can be realized in the dialogue process of the client and the virtual image, namely, song recommendation accompanied by the virtual image is realized, interactivity of song recommendation is improved, and the large model can be utilized to generate the second dialogue text based on the identification of the target song, so that accuracy of the second dialogue text is improved.
According to embodiments of the present disclosure, the present disclosure also proposes an electronic device, a readable storage medium and a computer program product.
FIG. 8 illustrates a schematic block diagram of an example electronic device that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the electronic device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the electronic device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Various components in the electronic device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard, a mouse, etc.; an output unit 807, such as various types of displays, speakers, etc.; a storage unit 808, such as a magnetic disk, an optical disk, etc.; and a communication unit 809, such as a network card, a modem, a wireless communication transceiver, etc. The communication unit 809 allows the electronic device 800 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
The computing unit 801 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 801 performs the respective methods and processes described above, such as a song recommendation method. For example, in some embodiments, the song recommendation method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the song recommendation method described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the song recommendation method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs, where the one or more computer programs may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
According to an embodiment of the present disclosure, there is also provided a computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the steps of the song recommendation method according to the above-described embodiments of the present disclosure.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the disclosed aspects are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (35)

1. A song recommendation method, comprising:
acquiring a first dialogue text of a client aiming at an avatar;
inputting a first dialogue text into a large model, and outputting a requirement recognition result of the first dialogue text by the large model;
If the requirement identification result comprises a song recommendation requirement, determining a target song to be recommended based on the first dialogue text;
and inputting the identification of the target song into the large model, and outputting a second dialogue text of the avatar aiming at the client by the large model, wherein the second dialogue text carries the identification of the target song.
2. The method of claim 1, wherein the method further comprises:
acquiring a playing request of the client for the target song;
determining target parameters of the avatar based on the target song;
controlling the client to play the target song;
and controlling the avatar based on the target parameters in the process of playing the target song.
3. The method of claim 1, wherein the determining a target song to be recommended based on the first dialog text comprises:
performing song recommendation intention recognition on the first dialogue text to obtain an intention recognition result;
determining a first song to be recommended corresponding to the intention recognition result according to a recommendation strategy corresponding to the intention recognition result;
And determining the target song based on at least one first song corresponding to the intention recognition result.
4. The method of claim 3, wherein the determining the target song based on the first song corresponding to at least one of the intent recognition results comprises:
if the intention recognition result only comprises first-class intention, or if the intention recognition result comprises first-class intention and non-first-class intention, taking a first song corresponding to the first-class intention as the target song;
and if the intention recognition result only comprises the non-first intention, taking the first song corresponding to the non-first intention as the target song.
5. The method of claim 3, wherein the determining the target song based on the first song corresponding to at least one of the intent recognition results comprises:
obtaining the score corresponding to the first song;
and sequencing the plurality of first songs according to the score from high to low, and taking the first songs with the top N of the sequence as the target songs, wherein N is a positive integer.
6. The method of claim 3, wherein the determining, according to the recommendation policy corresponding to the intent recognition result, the first song to be recommended corresponding to the intent recognition result comprises:
If the intention recognition result is a first type intention, acquiring text similarity between the first dialogue text and candidate song names of a plurality of candidate songs in a song library;
and determining a first song corresponding to the first type of intention from a plurality of candidate songs based on the text similarity.
7. The method of claim 6, wherein the determining, based on the text similarity, a first song from a plurality of the candidate songs for which the first type of intent corresponds comprises:
obtaining the score corresponding to the candidate song based on the text similarity corresponding to the candidate song;
and sorting the candidate songs according to the score from high to low, and taking M candidate songs before sorting as first songs corresponding to the first type of intention, wherein M is a positive integer.
8. The method of claim 3, wherein the determining, according to the recommendation policy corresponding to the intent recognition result, the first song to be recommended corresponding to the intent recognition result comprises:
if the intention recognition result is not the first type intention, acquiring text similarity between the first dialogue text and candidate song names of a plurality of candidate songs in a song library;
Acquiring a first feature similarity between the feature of the first dialogue text and the feature of the candidate lyrics of a plurality of candidate songs in the song library;
and determining a first song corresponding to the non-first type intention from a plurality of candidate songs based on the text similarity and the first feature similarity.
9. The method of claim 8, wherein the determining a first song from a plurality of the candidate songs for which the non-first type of intent corresponds based on the text similarity and the first feature similarity comprises:
acquiring second feature similarity between the features of the first dialogue text and the features of candidate song names of a plurality of candidate songs in the song library;
obtaining a third feature similarity between the features of the first dialogue text and the features of the candidate attributes of the plurality of candidate songs in the song library;
and determining a first song corresponding to the non-first type intention from a plurality of candidate songs based on the text similarity, the first feature similarity, the second feature similarity and the third feature similarity.
10. The method of claim 9, wherein the determining a first song from a plurality of the candidate songs that corresponds to the non-first type of intent based on the text similarity, the first feature similarity, the second feature similarity, and the third feature similarity comprises:
Obtaining the maximum value of the first feature similarity, the second feature similarity and the third feature similarity corresponding to the candidate song as the maximum feature similarity corresponding to the candidate song;
obtaining a score corresponding to the candidate song based on the maximum feature similarity corresponding to the candidate song and the text similarity corresponding to the candidate song;
if the maximum feature similarity corresponding to the candidate song is greater than a first set threshold, and/or if the text similarity corresponding to the candidate song is greater than a second set threshold, and/or if the score corresponding to the candidate song is greater than a third set threshold, adding the candidate song to a song set;
and screening the first songs which are not corresponding to the first type of intention from the song set.
11. The method of claim 10, wherein the deriving the score for the candidate song based on the maximum feature similarity for the candidate song and the text similarity for the candidate song comprises:
obtaining a first product of the maximum feature similarity corresponding to the candidate song and a first weight;
obtaining a second product of the text similarity corresponding to the candidate song and a second weight, wherein the second weight is greater than the first weight;
And obtaining the sum value of the first product and the second product as the score corresponding to the candidate song.
12. The method of claim 10, wherein the screening out the first song from the album that corresponds to the non-first type of intent comprises:
if the intention recognition result is a second type intention, sorting a plurality of candidate songs in the song set according to the score from high to low, and taking M candidate songs before sorting as first songs corresponding to the second type intention, wherein M is a positive integer;
and if the intention recognition result is a third type intention, randomly screening out a first song corresponding to the third type intention from the song set.
13. The method according to any one of claims 3-12, wherein the performing song recommendation intent recognition on the first dialog text to obtain an intent recognition result includes:
extracting the text of the first dialogue text to obtain an extracted text;
obtaining candidate song names of a plurality of candidate songs in a song library and a synonym word library of the candidate song names;
and if at least one extracted text is consistent with the candidate song names, and/or if at least one extracted text exists in a synonym word stock of the candidate song names, determining that the intention recognition result comprises a first type of intention.
14. The method according to any one of claims 3-12, wherein the performing song recommendation intent recognition on the first dialog text to obtain an intent recognition result includes:
extracting the text of the first dialogue text to obtain an extracted text;
acquiring candidate lyrics of a plurality of candidate songs in a song library, a synonym word library of the candidate lyrics and an emotion word library;
and if at least one extracted text is consistent with the candidate lyrics, and/or if at least one extracted text exists in a synonym word bank of the candidate lyrics, and/or if at least one extracted text exists in the emotion word bank, determining that the intention recognition result comprises a second type of intention.
15. The method according to any one of claims 3-12, wherein the performing song recommendation intent recognition on the first dialog text to obtain an intent recognition result includes:
matching the first dialogue text with a plurality of candidate regular expressions in a regular expression library;
and if the first dialogue text is successfully matched with at least one candidate regular expression, determining that the intention recognition result comprises a third type of intention.
16. The method of any one of claims 1-12, wherein the method further comprises:
and if the requirement identification result comprises a non-song recommendation requirement, generating a third dialogue text of the avatar aiming at the client by the large model based on the first dialogue text.
17. A song recommendation apparatus comprising:
the first acquisition module is used for acquiring a first dialogue text aiming at the virtual image of the client;
the second acquisition module is used for inputting the first dialogue text into the large model, and outputting a requirement recognition result of the first dialogue text by the large model;
the determining module is used for determining a target song to be recommended based on the first dialogue text if the requirement identification result comprises a song recommendation requirement;
and the third acquisition module is used for inputting the identification of the target song into the large model, and outputting a second dialogue text of the virtual image for the client by the large model, wherein the second dialogue text carries the identification of the target song.
18. The apparatus of claim 17, wherein the apparatus further comprises: the control module is used for:
Acquiring a playing request of the client for the target song;
determining target parameters of the avatar based on the target song;
controlling the client to play the target song;
and controlling the avatar based on the target parameters in the process of playing the target song.
19. The apparatus of claim 17, wherein the means for determining is further configured to:
performing song recommendation intention recognition on the first dialogue text to obtain an intention recognition result;
determining a first song to be recommended corresponding to the intention recognition result according to a recommendation strategy corresponding to the intention recognition result;
and determining the target song based on at least one first song corresponding to the intention recognition result.
20. The apparatus of claim 19, wherein the means for determining is further configured to:
if the intention recognition result only comprises first-class intention, or if the intention recognition result comprises first-class intention and non-first-class intention, taking a first song corresponding to the first-class intention as the target song;
and if the intention recognition result only comprises the non-first intention, taking the first song corresponding to the non-first intention as the target song.
21. The apparatus of claim 19, wherein the means for determining is further configured to:
obtaining the score corresponding to the first song;
and sequencing the plurality of first songs according to the score from high to low, and taking the first songs with the top N of the sequence as the target songs, wherein N is a positive integer.
22. The apparatus of claim 19, wherein the means for determining is further configured to:
if the intention recognition result is a first type intention, acquiring text similarity between the first dialogue text and candidate song names of a plurality of candidate songs in a song library;
and determining a first song corresponding to the first type of intention from a plurality of candidate songs based on the text similarity.
23. The apparatus of claim 22, wherein the means for determining is further configured to:
obtaining the score corresponding to the candidate song based on the text similarity corresponding to the candidate song;
and sorting the candidate songs according to the score from high to low, and taking M candidate songs before sorting as first songs corresponding to the first type of intention, wherein M is a positive integer.
24. The apparatus of claim 19, wherein the means for determining is further configured to:
If the intention recognition result is not the first type intention, acquiring text similarity between the first dialogue text and candidate song names of a plurality of candidate songs in a song library;
acquiring a first feature similarity between the feature of the first dialogue text and the feature of the candidate lyrics of a plurality of candidate songs in the song library;
and determining a first song corresponding to the non-first type intention from a plurality of candidate songs based on the text similarity and the first feature similarity.
25. The apparatus of claim 24, wherein the means for determining is further configured to:
acquiring second feature similarity between the features of the first dialogue text and the features of candidate song names of a plurality of candidate songs in the song library;
obtaining a third feature similarity between the features of the first dialogue text and the features of the candidate attributes of the plurality of candidate songs in the song library;
and determining a first song corresponding to the non-first type intention from a plurality of candidate songs based on the text similarity, the first feature similarity, the second feature similarity and the third feature similarity.
26. The apparatus of claim 25, wherein the means for determining is further configured to:
Obtaining the maximum value of the first feature similarity, the second feature similarity and the third feature similarity corresponding to the candidate song as the maximum feature similarity corresponding to the candidate song;
obtaining a score corresponding to the candidate song based on the maximum feature similarity corresponding to the candidate song and the text similarity corresponding to the candidate song;
if the maximum feature similarity corresponding to the candidate song is greater than a first set threshold, and/or if the text similarity corresponding to the candidate song is greater than a second set threshold, and/or if the score corresponding to the candidate song is greater than a third set threshold, adding the candidate song to a song set;
and screening the first songs which are not corresponding to the first type of intention from the song set.
27. The apparatus of claim 26, wherein the means for determining is further configured to:
obtaining a first product of the maximum feature similarity corresponding to the candidate song and a first weight;
obtaining a second product of the text similarity corresponding to the candidate song and a second weight, wherein the second weight is greater than the first weight;
and obtaining the sum value of the first product and the second product as the score corresponding to the candidate song.
28. The apparatus of claim 26, wherein the means for determining is further configured to:
if the intention recognition result is a second type intention, sorting a plurality of candidate songs in the song set according to the score from high to low, and taking M candidate songs before sorting as first songs corresponding to the second type intention, wherein M is a positive integer;
and if the intention recognition result is a third type intention, randomly screening out a first song corresponding to the third type intention from the song set.
29. The apparatus of any of claims 19-28, wherein the determining module is further to:
extracting the text of the first dialogue text to obtain an extracted text;
obtaining candidate song names of a plurality of candidate songs in a song library and a synonym word library of the candidate song names;
and if at least one extracted text is consistent with the candidate song names, and/or if at least one extracted text exists in a synonym word stock of the candidate song names, determining that the intention recognition result comprises a first type of intention.
30. The apparatus of any of claims 19-28, wherein the determining module is further to:
Extracting the text of the first dialogue text to obtain an extracted text;
acquiring candidate lyrics of a plurality of candidate songs in a song library, a synonym word library of the candidate lyrics and an emotion word library;
and if at least one extracted text is consistent with the candidate lyrics, and/or if at least one extracted text exists in a synonym word bank of the candidate lyrics, and/or if at least one extracted text exists in the emotion word bank, determining that the intention recognition result comprises a second type of intention.
31. The apparatus of any of claims 19-28, wherein the determining module is further to:
matching the first dialogue text with a plurality of candidate regular expressions in a regular expression library;
and if the first dialogue text is successfully matched with at least one candidate regular expression, determining that the intention recognition result comprises a third type of intention.
32. The apparatus of any of claims 17-28, wherein the apparatus further comprises: a fourth acquisition module, configured to:
and if the requirement identification result comprises a non-song recommendation requirement, generating a third dialogue text of the avatar aiming at the client by the large model based on the first dialogue text.
33. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-16.
34. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-16.
35. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-16.