CN112309390A - Information interaction method and device - Google Patents

Information interaction method and device

Info

Publication number
CN112309390A
Authority
CN
China
Prior art keywords
audio
user
information
target user
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010147899.5A
Other languages
Chinese (zh)
Inventor
Inventor not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd filed Critical Beijing ByteDance Network Technology Co Ltd
Priority to CN202010147899.5A
Publication of CN112309390A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/22 Interactive procedures; Man-machine interfaces
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/55 Push-based network services
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225 Feedback of the input speech

Abstract

Embodiments of the disclosure disclose an information interaction method and device. One embodiment of the method comprises: acquiring voice interaction audio generated by a target user; generating feedback information of the voice interaction audio based on the voice interaction audio; determining an information feedback manner of the feedback information based on user portrait information of the target user; and controlling the feedback information to perform information interaction in the manner indicated by the information feedback manner. This embodiment can determine the information feedback manner based on the user portrait and then carry out information interaction, which enriches the information interaction manner and helps prolong the duration of interaction between the target user and the electronic device. When the electronic device is an early education device (such as an early education machine), the learning duration and learning efficiency of the target user can be improved.

Description

Information interaction method and device
Technical Field
Embodiments of the disclosure relate to the field of computer technology, and in particular to an information interaction method and device.
Background
At present, it is increasingly common to use electronic devices such as learning machines and early education machines for learning.
For example, in the field of early childhood education, existing electronic devices such as learning machines and early education machines usually focus only on knowledge teaching and information interaction.
For example, in primary school Chinese learning, to help a child correctly grasp the pronunciation of words and sentences, one approach relies on a teacher or parent to judge the child's pronunciation; the other approach is based on deep learning and mechanically detects whether the child's pronunciation meets the standard of a target audio.
Disclosure of Invention
The disclosure provides an information interaction method and device.
In a first aspect, an embodiment of the present disclosure provides an information interaction method, the method comprising: acquiring voice interaction audio generated by a target user; generating feedback information of the voice interaction audio based on the voice interaction audio; determining an information feedback manner of the feedback information based on user portrait information of the target user; and controlling the feedback information to perform information interaction in the manner indicated by the information feedback manner.
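For illustration only (this sketch is not part of the claimed subject matter, and every identifier in it is a hypothetical placeholder), the four operations of the first aspect can be pictured as the following minimal Python flow, where the acquired voice interaction audio is represented by its recognized text:

    from dataclasses import dataclass

    @dataclass
    class UserPortrait:
        age: int
        gender: str
        favorite_audio: str  # audio selected by the user that has been played for them

    def generate_feedback(recognized_text: str) -> str:
        # Stand-in for the feedback-generation step (e.g. a model or lookup table).
        return f"Feedback for: {recognized_text}"

    def determine_feedback_manner(portrait: UserPortrait) -> str:
        # Stand-in for choosing an image / text / audio feedback manner from the portrait.
        return "audio" if portrait.age < 7 else "text"

    def interact(recognized_text: str, portrait: UserPortrait) -> None:
        feedback = generate_feedback(recognized_text)   # generate feedback information
        manner = determine_feedback_manner(portrait)    # determine the feedback manner
        print(f"[{manner}] {feedback}")                 # interact in the indicated manner

    interact("1 plus 1 equals 2", UserPortrait(age=5, gender="F", favorite_audio="cartoon song"))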
In some embodiments, acquiring voice interaction audio generated by a target user comprises: acquiring voice interaction audio generated by a target user for a target audio; and generating feedback information of the voice interaction audio based on the voice interaction audio comprises: generating feedback information of the voice interaction audio based on the target audio and the voice interaction audio.
In some embodiments, the target audio is audio to be read after, and the voice interaction audio is the user's read-after audio; and generating feedback information of the voice interaction audio based on the target audio and the voice interaction audio comprises: generating feedback information of the user's read-after audio, which indicates whether the audio to be read after and the user's read-after audio match, based on the magnitude relationship between the similarity of the audio to be read after and the user's read-after audio and a preset audio similarity threshold.
In some embodiments, the method further comprises: determining, based on music played for the target user and selected by the target user, target music to play for the target user when a first preset condition is met, wherein the first preset condition comprises that the similarity between the audio to be read after and the user's read-after audio is greater than or equal to the preset audio similarity threshold; and controlling the target music to play in response to the first preset condition being met.
In some embodiments, the target audio is the audio to be replied, and the voice interaction audio is the user reply audio; and generating feedback information of the voice interaction audio based on the target audio and the voice interaction audio comprises: generating feedback information of the user reply audio, which indicates whether the user reply audio matches the audio to be replied, based on whether the user reply audio matches the audio to be replied.
In some embodiments, the method further comprises: determining, based on music played for the target user and selected by the target user, target music to play for the target user when a second preset condition is met, wherein the second preset condition comprises that the user reply audio matches the audio to be replied; and controlling the target music to play in response to the second preset condition being met.
In some embodiments, the method further comprises: determining music to be pushed to the target user based on the user portrait information, wherein the user portrait information comprises at least one of: gender, age, character, and audio selected by the target user that has been played for the target user.
In some embodiments, the user portrait information comprises: gender, age, character, and audio selected by the target user that has been played for the target user; and determining music to be pushed to the target user based on the user portrait information comprises: determining the gender, age and character of the target user and the audio selected by the target user that has been played for the target user, based on interaction information of the target user; determining, from a predetermined music set, a target number of pieces of music for pushing to the target user based on the gender, age and character; and determining the music to be pushed to the target user from the target number of pieces of music based on the audio selected by the target user that has been played for the target user.
In some embodiments, determining music to be pushed to the target user based on the user portrait information comprises: in response to music being pushed to the target user for the first time, determining music already pushed to a user associated with the target user, and using the already-pushed music as the music to be pushed to the target user, wherein the similarity between the user portrait information of the user associated with the target user and the user portrait information of the target user is greater than or equal to a preset user portrait information similarity threshold.
In some embodiments, determining music to be pushed to the target user based on the user portrait information further comprises: in response to music not being pushed to the target user for the first time, determining the music to be pushed to the target user based on interaction information of the target user within a target time period, wherein the target time period takes the moment music was last pushed to the target user as its starting moment and the current moment as its ending moment.
In some embodiments, the information feedback manner comprises at least one of: image feedback mode, text feedback mode, and audio feedback mode.
In a second aspect, an embodiment of the present disclosure provides an information interaction apparatus, including: an acquisition unit configured to acquire voice interaction audio generated by a target user; a generating unit configured to generate feedback information of the voice interaction audio based on the voice interaction audio; a first determination unit configured to determine an information feedback manner of feedback information based on user portrait information of a target user; and the first control unit is configured to control the feedback information to perform information interaction in a manner indicated by the information feedback manner.
In some embodiments, the acquisition unit comprises: an acquisition subunit configured to acquire voice interaction audio generated by the target user for a target audio; and the generating unit comprises: a generating subunit configured to generate feedback information of the voice interaction audio based on the target audio and the voice interaction audio.
In some embodiments, the target audio is audio to be read after, and the voice interaction audio is the user's read-after audio; and the generating subunit comprises: a first generating module configured to generate feedback information of the user's read-after audio, which indicates whether the audio to be read after and the user's read-after audio match, based on the magnitude relationship between the similarity of the audio to be read after and the user's read-after audio and a preset audio similarity threshold.
In some embodiments, the apparatus further comprises: a second determining unit configured to determine, based on music played for the target user and selected by the target user, target music to play for the target user when a first preset condition is met, wherein the first preset condition comprises that the similarity between the audio to be read after and the user's read-after audio is greater than or equal to the preset audio similarity threshold; and a second control unit configured to control the target music to play in response to the first preset condition being met.
In some embodiments, the target audio is the audio to be replied, and the voice interaction audio is the user reply audio; and the generating subunit comprises: a second generating module configured to generate feedback information of the user reply audio, which indicates whether the user reply audio matches the audio to be replied, based on whether the user reply audio matches the audio to be replied.
In some embodiments, the apparatus is further configured to: determine, based on music played for the target user and selected by the target user, target music to play for the target user when a second preset condition is met, wherein the second preset condition comprises that the user reply audio matches the audio to be replied; and control the target music to play in response to the second preset condition being met.
In some embodiments, the apparatus further comprises: a third determining unit configured to determine music to be pushed to the target user based on the user portrait information, wherein the user portrait information comprises at least one of: gender, age, character, and audio selected by the target user that has been played for the target user.
In some embodiments, the user portrait information comprises: gender, age, character, and audio selected by the target user that has been played for the target user; and the third determining unit comprises: a first determining subunit configured to determine the gender, age and character of the target user and the audio selected by the target user that has been played for the target user, based on interaction information of the target user; a second determining subunit configured to determine, from a predetermined music set, a target number of pieces of music for pushing to the target user based on the gender, age and character; and a third determining subunit configured to determine the music to be pushed to the target user from the target number of pieces of music based on the audio selected by the target user that has been played for the target user.
In some embodiments, the third determining unit comprises: a fourth determining subunit configured to, in response to music being pushed to the target user for the first time, determine music already pushed to a user associated with the target user and use the already-pushed music as the music to be pushed to the target user, wherein the similarity between the user portrait information of the user associated with the target user and the user portrait information of the target user is greater than or equal to a preset user portrait information similarity threshold.
In some embodiments, the third determining unit further comprises: a fifth determining subunit configured to, in response to music not being pushed to the target user for the first time, determine the music to be pushed to the target user based on interaction information of the target user within a target time period, wherein the target time period takes the moment music was last pushed to the target user as its starting moment and the current moment as its ending moment.
In some embodiments, the information feedback manner comprises at least one of: image feedback mode, text feedback mode, and audio feedback mode.
In a third aspect, an embodiment of the present disclosure provides an electronic device, comprising: one or more processors; and a storage device on which one or more programs are stored, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of the embodiments of the information interaction method described above.
In a fourth aspect, embodiments of the present disclosure provide a computer-readable medium, on which a computer program is stored, which when executed by a processor implements the method of any one of the embodiments of the information interaction method described above.
According to the information interaction method and device provided by the embodiments of the disclosure, voice interaction audio generated by a target user is first acquired, feedback information of the voice interaction audio is generated based on the voice interaction audio, an information feedback manner of the feedback information is determined based on user portrait information of the target user, and finally the feedback information is controlled to perform information interaction in the manner indicated by the information feedback manner. This enriches the information interaction manner and prolongs the duration of interaction between the target user and the electronic device. When the electronic device is an early education device, the learning duration and learning efficiency of the target user are improved.
Drawings
Other features, objects and advantages of the disclosure will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present disclosure may be applied;
FIG. 2 is a flow diagram of one embodiment of an information interaction method according to the present disclosure;
FIG. 3 is a schematic diagram of an application scenario of an information interaction method according to the present disclosure;
FIG. 4 is a flow diagram of yet another embodiment of an information interaction method according to the present disclosure;
FIG. 5 is a flow diagram of yet another embodiment of an information interaction method according to the present disclosure;
FIG. 6 is a schematic block diagram of one embodiment of an information-interacting device, according to the present disclosure;
FIG. 7 is a schematic block diagram of a computer system suitable for use with an electronic device implementing embodiments of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of an information interaction method or an information interaction apparatus of embodiments of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 over the network 104 to receive or transmit data (e.g., voice interaction audio generated by a target user), etc. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as learning-based software, video playing software, news-based application, image processing-based application, web browser application, shopping-based application, search-based application, instant messaging tool, mailbox client, social platform software, and the like.
The terminal apparatuses 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices including, but not limited to, early education machines, smart phones, tablet computers, laptop portable computers, desktop computers, and the like. When the terminal apparatuses 101, 102, 103 are software, they can be installed in the electronic apparatuses listed above. It may be implemented as a plurality of software or software modules (e.g., software or software modules for providing distributed services) or as a single software or software module (e.g., software with early education functionality). And is not particularly limited herein.
The server 105 may be a server providing various services, such as a background server for generating feedback information of voice interaction audio. Alternatively, after generating the feedback information, the server 105 may feed back the generated feedback information to the terminal apparatuses 101, 102, 103. As an example, the server 105 may be a cloud server.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be further noted that the information interaction method provided by the embodiment of the present disclosure may be executed by a server, may also be executed by a terminal device, and may also be executed by the server and the terminal device in cooperation with each other. Accordingly, each part (for example, each unit, sub-unit, module, sub-module) included in the information interaction device may be entirely disposed in the server, may be entirely disposed in the terminal device, and may be disposed in the server and the terminal device, respectively.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. When the electronic device on which the information interaction method operates does not need to perform data transmission with other electronic devices, the system architecture may only include the electronic device (e.g., the terminal device) on which the information interaction method operates.
With continued reference to FIG. 2, a flow 200 of one embodiment of an information interaction method according to the present disclosure is shown. The information interaction method comprises the following steps:
step 201, acquiring a voice interaction audio generated by a target user.
In this embodiment, an execution subject of the information interaction method (e.g., a server or a terminal device shown in fig. 1) may obtain a voice interaction audio generated by a target user. The target user may be a user interacting with the execution subject. As an example, the execution subject may be an early education machine, or a server for providing support to the early education machine, and the target user may be a child using the early education machine.
The voice interaction audio may be voice audio generated by the target user for interaction. For example, in a read-after scenario, the voice interaction audio may be read-after audio generated by the target user (i.e., user read-after audio); in the questioning mode, the voice interaction audio may be reply audio generated by the target user as described above (i.e., user reply audio). In addition, the voice interaction audio may also be any other audio generated by the target user as described above.
As an example, when the execution subject is a server, after a terminal device in communication connection with the execution subject obtains a voice interaction audio generated by a target user, the execution subject may obtain the voice interaction audio generated by the target user from the terminal device through a wired connection manner or a wireless connection manner, and in this scenario, the terminal device may have a voice obtaining function; when the execution subject is a terminal device, the execution subject may directly obtain the voice interaction audio generated by the target user from the target user.
Step 202, based on the voice interaction audio, generating feedback information of the voice interaction audio.
In this embodiment, the executing entity may generate feedback information of the voice interaction audio based on the voice interaction audio acquired in step 201.
It should be appreciated that in a human-computer interaction scenario, after obtaining user-generated speech (e.g., speech interaction audio), feedback (e.g., feedback information generating the speech interaction audio) needs to be performed based on the user-generated speech.
As an example, the executing agent may execute the step 202 in the following manner:
Input the voice interaction audio obtained in step 201 into a pre-trained feedback information generation model to generate feedback information of the voice interaction audio obtained in step 201.
The feedback information generation model can represent the corresponding relation between the voice interaction audio and the feedback information of the voice interaction audio.
Here, the feedback information generation model may be a convolutional neural network model trained with a machine learning algorithm, or may be a two-dimensional table or database storing voice interaction audio, feedback information of the voice interaction audio, and the association relationship between the two.
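As a rough, non-authoritative illustration of the lookup-table variant mentioned above (the table keys and feedback strings below are invented for the example and are not taken from the patent):

    # Toy lookup-table "feedback information generation model": the key is text recognized
    # from the voice interaction audio, the value is the corresponding feedback information.
    FEEDBACK_TABLE = {
        "1 plus 1 equals 2": "Kid, you're awesome!",
        "1 plus 1 equals 3": "Not quite, let's try again.",
    }

    def generate_feedback(recognized_text: str) -> str:
        # Fall back to a generic prompt when the utterance is not in the table.
        return FEEDBACK_TABLE.get(recognized_text, "Can you say that again?")

    print(generate_feedback("1 plus 1 equals 2"))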
As another example, the executing entity may execute the step 202 in the following manner:
First, the voice interaction audio obtained in step 201 is sent to a preset terminal for the preset terminal to play, and a user of the preset terminal inputs feedback information for the played voice interaction audio.
Then, the information input by the user of the preset terminal for the played voice interaction audio is received and used as the feedback information of the voice interaction audio.
It will be appreciated that prior to playing the audio, the content of the audio to be played (e.g., the feedback information described above) needs to be determined.
Here, the feedback information may be music (e.g., a children's song).
Step 203, determining an information feedback manner of the feedback information based on the user portrait information of the target user.
In this embodiment, the execution subject may determine the information feedback manner of the feedback information (for example, tone information of the audio to be played for the target user) based on the user portrait information of the target user. The user portrait information may include, but is not limited to, at least one of: age, birthplace, occupation, educational background, gender, hobbies, and the like. The tone information is used to indicate the tone of the audio played for the target user.
Here, the user portrait information may be characterized in the form of a matrix, a vector, or the like. The user portrait information may be information obtained by performing feature engineering processing such as dimension reduction and normalization.
As an example, the executing agent may execute the step 203 by:
and inputting the user portrait information of the target user into a pre-trained first determination model to obtain an information feedback mode of feedback information. The first determination model can represent the corresponding relation between the user image information and the information feedback mode.
For example, the first determination model may be a convolutional neural network model trained with a machine learning algorithm, or may be a two-dimensional table or database storing user portrait information and information feedback manners.
As still another example, the executing main body may further execute the step 203 by:
and inputting the user portrait information of the target user and the feedback information obtained in the step 202 into a pre-trained second person determination model to obtain an information feedback mode of the feedback information. The second determination model may represent a corresponding relationship between the user portrait information, the feedback information, and the information feedback manner.
For example, the second determination model may be a convolutional neural network model trained by a machine learning algorithm, or may be a two-dimensional table or a database storing user portrait information, feedback information, and information feedback modes.
In some optional implementations of this embodiment, the user portrait information may include information about audio selected by the target user that has been played for the target user. Thus, the execution subject may also execute step 203 as follows:
and determining an information feedback mode for feeding back information based on the audio which is played for the target user and selected by the target user.
The audio selected by the target user and played for the target user may be music, or audio in a video selected by the target user.
As an example, the tone information of the audio to be played for the target user may indicate: the tone of any one of the multiple audios selected by the target user and played for the target user; it may also indicate the tone of the audio with the longest playing time among those audios; or it may indicate the tone of the audio played for the target user most recently.
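A minimal sketch of these three selection rules, assuming the play history is available as simple records (all field names and values are hypothetical):

    from dataclasses import dataclass

    @dataclass
    class PlayedAudio:
        tone: str              # e.g. the voice of a cartoon character
        play_seconds: float    # accumulated playing time
        last_played_at: float  # timestamp of the most recent play

    def pick_tone(history: list, rule: str = "longest") -> str:
        # Pick the tone for the feedback audio from the user's selected-and-played audio.
        if rule == "longest":   # the audio with the longest playing time
            return max(history, key=lambda a: a.play_seconds).tone
        if rule == "latest":    # the audio played most recently
            return max(history, key=lambda a: a.last_played_at).tone
        return history[0].tone  # "any one" of the played audios

    history = [PlayedAudio("character A", 1200.0, 1000.0),
               PlayedAudio("character B", 300.0, 2000.0)]
    print(pick_tone(history, "longest"))  # character A
    print(pick_tone(history, "latest"))   # character B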
It can be understood that, in this optional implementation manner, the tone of the audio selected by the target user and played for the target user is often a tone the target user prefers. Therefore, this implementation may control the audio of the feedback information to be played with the tone indicated by the tone information, so that the subsequent steps interact with the target user using audio the target user prefers. This enriches the information interaction manner, makes the information easier for the target user to accept during the interaction, and improves the user experience.
In some optional implementations of this embodiment, the information feedback manner includes at least one of the following: image feedback mode, text feedback mode, and audio feedback mode.
The image feedback manner may include at least one of: the style of the image, the content in the image, and so on; for example, the image feedback manner may be a cartoon image. The text feedback manner may include at least one of: the color of the text, the font size of the text, the content of the text, and so on; for example, the text feedback manner may indicate the name of a cartoon character or the cartoon title in a cartoon poster. The audio feedback manner may include at least one of: the tone of the audio, the pitch of the audio, the loudness of the audio, and so on.
As an example, when the feedback information in step 202 is music and the information feedback manner includes the tone of audio, the tone information for playing the music for the target user may be determined based on the user portrait information of the target user and the feedback information (i.e., music) in step 202.
Specifically, if the music library includes three versions of the music with different tones (e.g., the music sung by singer A, by singer B, and by singer C), the execution subject may determine, from the three tone versions, the music to play for the target user, and use the tone information of that music as the tone information of the audio to be played for the target user. It should be appreciated that, before playing audio, the playing form (e.g., the tone) of the audio to be played needs to be determined.
It should be noted that, in the embodiment of the present application, the execution order of steps 201 to 203 is not limited; for example, the execution subject may execute the steps in the order 201, 202, 203, in the order 203, 201, 202, or in the order 201, 203, 202.
Step 204, controlling the feedback information to perform information interaction in the manner indicated by the information feedback manner.
In this embodiment, the executing entity may control the feedback information generated in step 202 to perform information interaction in a manner indicated by the information feedback manner determined in step 203.
As an example, when the information feedback manner includes a feedback manner of an image, the execution subject may control the feedback information to be presented in a feedback manner of the image indicated by the information feedback manner.
As another example, when the information feedback manner includes an audio feedback manner (e.g., tone information), if audio synthesized from the feedback information and the tone indicated by the tone information has already been obtained (e.g., music determined, from music of different tones, for playing for the target user), the execution subject may directly control the synthesized audio to play; if the synthesized audio has not been obtained, the execution subject may first synthesize the feedback information with the tone indicated by the tone information to obtain the synthesized audio, and then control the synthesized audio to play.
Here, when the execution subject is a terminal device, it may directly play the feedback information in the tone indicated by the tone information; when the execution subject is a server, it may send the feedback information and/or the tone information to a terminal device communicatively connected to it so that the terminal device plays the feedback information in the tone indicated by the tone information, or it may send the audio obtained by synthesizing the feedback information and the tone to that terminal device so that the terminal device plays the synthesized audio.
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the information interaction method according to the present embodiment. In the application scenario of fig. 3, the early education machine 302 acquires the voice interaction audio 303 generated by the target user 301 (in the illustration, the text indicated by the voice interaction audio 303 is "1 plus 1 equals 2"). The early education machine 302 then generates feedback information 304 (illustrated as "Kid, you're awesome!") for the voice interaction audio 303 based on the voice interaction audio 303. Then, the early education machine 302 determines an information feedback manner of the feedback information based on the user portrait information of the target user 301. For example, the early education machine 302 may determine the information feedback manner (e.g., tone information indicating the tone of the XX character in the XX cartoon) based on the user portrait information of the target user 301 (e.g., information about the cartoon selected by the target user 301 and played for the target user 301). Finally, the early education machine 302 can control the feedback information to perform information interaction in the manner indicated by the information feedback manner, for example, by playing the audio of the generated feedback information with the tone of the XX character in the XX cartoon.
According to the method provided by the embodiment of the disclosure, the feedback information of the voice interaction audio is generated by acquiring the voice interaction audio generated by the target user, the information feedback mode of the feedback information is determined based on the user portrait information of the target user, and then the feedback information is controlled to perform information interaction in the mode indicated by the information feedback mode, so that the mode of performing voice interaction with the user can be determined according to the user portrait information, thereby enriching the information interaction mode and being beneficial to prolonging the time of interaction between the target user and the electronic equipment. In the case that the electronic device is an early education device (such as an early education machine), the learning duration and the learning efficiency of the target user can be improved.
With further reference to FIG. 4, a flow 400 of yet another embodiment of an information interaction method is shown. The process 400 of the information interaction method includes the following steps:
step 401, acquiring a voice interaction audio generated by a target user for a target audio.
In this embodiment, an execution subject of the information interaction method (e.g., a server or a terminal device shown in fig. 1) may obtain a voice interaction audio generated by a target user for a target audio.
When the execution main body is a terminal device, the target audio may be an audio played by the execution main body; when the execution agent is a server, the target audio may be an audio played by a terminal device communicatively connected to the execution agent. The target audio may be used for audio interaction with a target user.
The target user may be a user who interacts with the execution subject. As an example, the execution subject may be an early education machine, or a server for providing support to the early education machine, and the target user may be a child using the early education machine.
The voice interaction audio may be voice audio generated by the target user for interaction. For example, in a read-after scenario, the voice interaction audio may be read-after audio generated by the target user (i.e., user read-after audio); in the questioning mode, the voice interaction audio may be reply audio generated by the target user as described above (i.e., user reply audio). In addition, the voice interaction audio may also be any audio generated by the target user as described above.
As an example, when the execution subject is a server, after a terminal device in communication connection with the execution subject obtains a voice interaction audio generated by a target user, the execution subject may obtain the voice interaction audio generated by the target user from the terminal device through a wired connection manner or a wireless connection manner, and in this scenario, the terminal device may have a voice obtaining function; when the execution subject is a terminal device, the execution subject may directly obtain the voice interaction audio generated by the target user from the target user.
Step 402, generating feedback information of the voice interaction audio based on the target audio and the voice interaction audio.
In this embodiment, an execution subject of the information interaction method (e.g., a server or a terminal device shown in fig. 1) may generate feedback information of the voice interaction audio based on the target audio and the voice interaction audio.
As an example, the executing agent may execute the step 402 in the following manner:
Input the voice interaction audio and the target audio obtained in step 401 into a pre-trained generation model to generate feedback information of the voice interaction audio obtained in step 401.
The generation model can represent the corresponding relation among the voice interaction audio, the target audio and the feedback information of the voice interaction audio.
Here, the generation model may be a convolutional neural network model trained with a machine learning algorithm, or may be a two-dimensional table or database storing voice interaction audio, target audio, feedback information, and the association relationship among them.
As another example, the executing entity may further execute the step 402 in the following manner:
First, the voice interaction audio and the target audio obtained in step 401 are sent to a preset terminal for the preset terminal to play. After the preset terminal plays the voice interaction audio and the target audio, a user of the preset terminal can input feedback information for the voice interaction audio.
Then, the information input by the user of the preset terminal for the played voice interaction audio and target audio is received and used as the feedback information of the voice interaction audio.
In some optional implementations of this embodiment, the target audio in step 401 may be the audio to be read after, and the voice interaction audio in step 401 may be the user's read-after audio. Thus, the execution subject may execute step 402 as follows:
and generating feedback information of the user follow-up reading audio, which is used for indicating whether the audio to be followed-up read is matched with the user follow-up reading audio or not, based on the similarity between the audio to be followed-up read and the user follow-up reading audio and the size relation between a preset audio similarity threshold value.
Specifically, when the similarity between the audio to be read after and the user's read-after audio is greater than or equal to the preset audio similarity threshold, the execution subject may generate feedback information of the user's read-after audio indicating that the two match; when the similarity is less than the preset audio similarity threshold, the execution subject may generate feedback information of the user's read-after audio indicating that the two do not match.
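A minimal sketch of this threshold comparison, with an illustrative (not prescribed) threshold value and invented feedback strings:

    AUDIO_SIMILARITY_THRESHOLD = 0.8  # the "preset audio similarity threshold"; value is illustrative

    def read_after_feedback(similarity: float) -> str:
        # Map the similarity between the audio to be read after and the user's
        # read-after audio to matched / not-matched feedback information.
        if similarity >= AUDIO_SIMILARITY_THRESHOLD:
            return "Great, your reading matches the target audio."
        return "The reading does not match yet, please try again."

    print(read_after_feedback(0.91))
    print(read_after_feedback(0.42))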
It can be understood that the technical solutions described in the above optional implementation manners may be applied to a read-after mode, so that the read-after duration and the read-after efficiency of the target user may be improved.
In some application scenarios of the above alternative implementation, the execution main body may further perform the following steps:
step one, determining target music for playing for a target user under the condition of meeting a first preset condition based on music played for the target user and selected by the target user. The first preset condition comprises the similarity between the audio to be read and the audio to be read, and the similarity is larger than or equal to a preset audio similarity threshold value.
As an example, the execution subject may randomly select music from music selected by the target user, which is played for the target user, and take the selected music as target music for playing for the target user if the first preset condition is satisfied.
As still another example, the execution main body may further select music having the longest playing time from among pieces of music selected by the target user and played for the target user, and use the selected music as the target music for playing for the target user if the first preset condition is satisfied.
Step two, controlling the target music to play when the first preset condition is met.
Here, when the first preset condition is satisfied and the execution subject is a terminal device, the execution subject may directly play the target music; when the execution subject is a server, it may send the target music to a terminal device communicatively connected to it so that the terminal device plays the target music when the first preset condition is satisfied.
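The two steps above might be sketched as follows (song titles, play times, and the threshold value are invented for the example):

    import random

    AUDIO_SIMILARITY_THRESHOLD = 0.8  # illustrative value for the preset threshold

    def pick_target_music(played_music: list, strategy: str = "longest") -> str:
        # Choose the target music from music the target user selected and had played.
        if strategy == "random":
            return random.choice(played_music)["title"]
        # Default: the music with the longest accumulated playing time.
        return max(played_music, key=lambda m: m["play_seconds"])["title"]

    def maybe_play_reward(similarity: float, played_music: list) -> None:
        # First preset condition: read-after similarity >= threshold.
        if similarity >= AUDIO_SIMILARITY_THRESHOLD:
            print("playing:", pick_target_music(played_music))

    maybe_play_reward(0.92, [{"title": "song A", "play_seconds": 600},
                             {"title": "song B", "play_seconds": 90}])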
It can be understood that, in the above application scenario, the target music to play for the target user when the audio to be read after is similar to the user's read-after audio (for example, when the similarity between them is greater than or equal to the preset audio similarity threshold) may be determined based on the music played for the target user and selected by the target user. Therefore, when the audio to be read after is similar to the user's read-after audio, it may be determined that the target user read after correctly, and the target music is played as a result.
In some optional implementations of this embodiment, the target audio in step 401 is the audio to be replied, and the voice interaction audio in step 401 is the user reply audio. Thus, the execution subject may execute step 402 as follows:
and generating feedback information of the user reply audio, which is used for indicating whether the user reply audio is matched with the audio to be replied or not, based on whether the user reply audio is matched with the audio to be replied or not.
As an example, the execution subject may input the user reply audio and the audio to be replied into a pre-trained matching model to determine whether they match. If the user reply audio matches the audio to be replied, predetermined first feedback information is used as the feedback information of the user reply audio indicating that the user reply audio matches the audio to be replied; if they do not match, predetermined second feedback information is used as the feedback information of the user reply audio indicating that they do not match. The first feedback information and the second feedback information may be preset; for example, the first feedback information may be "Congratulations, you answered correctly" and the second feedback information may be "Sorry, that answer is not correct".
As still another example, the execution subject may input the user reply audio and the audio to be replied into a pre-trained information generation model to generate feedback information of the user reply audio indicating whether the user reply audio matches the audio to be replied. The information generation model can generate such feedback information based on the user reply audio and the audio to be replied. For example, the information generation model may be a convolutional neural network model trained with a machine learning algorithm, or may be a two-dimensional table or database storing user reply audio, audio to be replied, feedback information indicating whether they match, and the correspondence among them.
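A toy stand-in for the matching step (a simple normalized text comparison is used here instead of a pre-trained matching model; the two feedback strings are invented):

    FIRST_FEEDBACK = "Congratulations, you answered correctly!"  # predetermined first feedback information
    SECOND_FEEDBACK = "Sorry, that answer is not correct."       # predetermined second feedback information

    def reply_feedback(user_reply_text: str, expected_answer_text: str) -> str:
        # Matched when the normalized reply equals the normalized expected answer.
        matched = user_reply_text.strip().lower() == expected_answer_text.strip().lower()
        return FIRST_FEEDBACK if matched else SECOND_FEEDBACK

    print(reply_feedback("Four", "four"))
    print(reply_feedback("Five", "four"))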
It can be understood that the technical solution described in the above optional implementation manner can be applied to the question and answer learning mode in the field of early education and the like, so that the learning duration and the learning efficiency of the target user can be improved.
In some application scenarios of the above alternative implementation, the execution main body may further perform the following steps:
the method comprises the first step of determining target music for playing for a target user under the condition that a second preset condition is met on the basis of music played for the target user and selected by the target user. The second preset condition comprises that the user reply audio is matched with the audio to be replied.
As an example, the execution subject may randomly select music from music selected by the target user, which is played for the target user, and take the selected music as target music for playing for the target user if the second preset condition is satisfied.
As still another example, the execution main body may further select music having the longest playing time from among pieces of music selected by the target user and played for the target user, and use the selected music as the target music for playing for the target user if the second preset condition is satisfied.
Step two, controlling the target music to play when the second preset condition is met.
Here, when the second preset condition is satisfied and the execution subject is a terminal device, the execution subject may directly play the target music; when the execution subject is a server, it may send the target music to a terminal device communicatively connected to it so that the terminal device plays the target music when the second preset condition is satisfied.
It can be understood that, in the above application scenario, the target music to play for the target user when the user reply audio matches the audio to be replied (for example, when the user reply audio correctly answers the audio to be replied) may be determined based on the music played for the target user and selected by the target user. Therefore, when the user reply audio matches the audio to be replied, it may be determined that the target user answered correctly, and the target music is played as a result.
Step 403, determining an information feedback manner of the feedback information based on the user portrait information of the target user.
In this embodiment, the execution manner of step 403 may refer to step 203 in the corresponding embodiment of fig. 2, which is not described herein again.
Step 404, controlling the feedback information to perform information interaction in the manner indicated by the information feedback manner.
In this embodiment, the execution manner of step 404 may refer to step 204 in the corresponding embodiment of fig. 2, which is not described herein again.
It should be noted that, besides the above-mentioned contents, the embodiment of the present application may further include the same or similar features and effects as those of the embodiment corresponding to fig. 2 and/or fig. 5, and details are not repeated herein.
As can be seen from fig. 4, the process 400 of the information interaction method in this embodiment may generate feedback information of the voice interaction audio based on the target audio and the voice interaction audio, so as to further enrich the interaction manner and help increase the interaction duration between the target user and the electronic device. In the case that the electronic device is an early education device (such as an early education machine), the learning duration and the learning efficiency of the target user can be improved.
With continuing reference to FIG. 5, a flow 500 of yet another embodiment of a method of information interaction is shown. The process 500 of the information interaction method includes the following steps:
step 501, acquiring a voice interaction audio generated by a target user.
In this embodiment, step 501 is substantially the same as step 201 in the corresponding embodiment of fig. 2, and is not described here again.
Step 502, based on the voice interaction audio, generating feedback information of the voice interaction audio.
In this embodiment, step 502 is substantially the same as step 202 in the corresponding embodiment of fig. 2, and is not described herein again.
Step 503, determining an information feedback manner of the feedback information based on the user portrait information of the target user.
In this embodiment, step 503 is substantially the same as step 203 in the corresponding embodiment of fig. 2, and is not described herein again.
Step 504, controlling the feedback information to perform information interaction in the manner indicated by the information feedback manner.
In this embodiment, step 504 is substantially the same as step 204 in the corresponding embodiment of fig. 2, and is not described here again.
Step 505, determining music to be pushed to the target user based on the user portrait information.
In this embodiment, an execution subject of the information interaction method (for example, a server or a terminal device shown in fig. 1) may determine, based on the user portrait information, music to be pushed to the target user. The user portrait information includes at least one of: gender, age, character, and audio selected by the target user that has been played for the target user.
In some optional implementations of this embodiment, the user portrait information includes: gender, age, character, and audio selected by the target user played for the target user. Thus, the executing agent may execute the step 505 as follows:
firstly, determining the gender, age and character of the target user and the audio selected by the target user played for the target user based on the interactive information of the target user.
The interaction information can be obtained from the target user's daily use and includes queries, browsing records, and the like.
Here, the execution subject may determine the gender, age, and character of the target user and the audio selected by the target user played for the target user by counting and analyzing the interaction information of the target user.
Then, based on gender, age and character, a target number of pieces of music for pushing to the target user is determined from a predetermined music collection.
As an example, the execution subject may select, from a predetermined music set, a target number of pieces of music that have been pushed to a user having at least one attribute of the gender, the age, and the character, and thereby use the selected target number of pieces of music as the target number of pieces of music for pushing to the target user.
As yet another example, the execution subject may also input the gender, age, character, and a predetermined music set into a pre-trained music model to determine a target number of pieces of music for pushing to the target user. The music model can be used to determine, from a music set, a target number of pieces of music to push to a target user. Illustratively, the music model may be a convolutional neural network model trained with a machine learning algorithm.
Finally, the music to be pushed to the target user is determined from the target number of pieces of music based on the audio selected by the target user that has been played for the target user.
As an example, the executing body may select, from the target number of pieces of music, a piece of music with the highest similarity to the audio selected by the target user played for the target user, so as to use the selected piece of music as the piece of music to be pushed for pushing to the target user.
As still another example, the execution main body may also randomly select music from the target number of pieces of music, thereby using the selected music as the music to be pushed to the target user.
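A compressed sketch of this three-step push pipeline (for brevity it keys only on age and character, and uses a toy word-overlap score instead of a learned audio similarity; all names and data are invented):

    from dataclasses import dataclass, field

    @dataclass
    class Track:
        title: str
        min_age: int
        max_age: int
        tags: set = field(default_factory=set)  # e.g. {"lively", "calm"}

    def push_candidates(music_set: list, age: int, character: str, target_number: int) -> list:
        # Step 2: narrow the predetermined music set down to a target number of pieces.
        filtered = [t for t in music_set if t.min_age <= age <= t.max_age and character in t.tags]
        return filtered[:target_number]

    def choose_push(candidates: list, favorite_title: str) -> str:
        # Step 3: pick the candidate closest to the audio the user chose to have played.
        def overlap(t: Track) -> int:
            return len(set(t.title.split()) & set(favorite_title.split()))
        return max(candidates, key=overlap).title

    music_set = [Track("happy counting song", 3, 8, {"lively"}),
                 Track("quiet bedtime song", 2, 6, {"calm"}),
                 Track("lively alphabet song", 3, 7, {"lively"})]
    candidates = push_candidates(music_set, age=5, character="lively", target_number=2)
    print(choose_push(candidates, "counting song"))  # happy counting song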
It can be understood that, in the above optional implementation, a target number of pieces of music are first selected for the target user based on gender, age, and character, and the music to be pushed is then determined from them based on the audio selected by the target user and played for the target user. The push therefore combines objective characteristics of the target user (gender, age, character) with subjective preferences (the audio the target user chose), which improves the pertinence of music pushing.
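The following is a hedged sketch of the three sub-steps of this optional implementation of step 505: filtering a target number of candidate pieces by gender, age, and character, then choosing the candidate most similar to the audio the target user previously selected. The music metadata fields and the use of cosine similarity over feature vectors are assumptions for illustration only; as noted above, a trained music model could replace the rule-based candidate selection.

```python
# Hedged sketch of step 505's optional implementation. The metadata fields,
# feature vectors, and cosine similarity are assumptions, not fixed by the disclosure.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def candidate_music(music_set, gender, age, character, target_number):
    """First sub-step: keep pieces previously pushed to users sharing an attribute."""
    matches = [m for m in music_set
               if gender in m["pushed_to_genders"]
               or age in m["pushed_to_ages"]
               or character in m["pushed_to_characters"]]
    return matches[:target_number]

def music_to_push(candidates, selected_audio_vector):
    """Final sub-step: choose the candidate most similar to the user's chosen audio."""
    return max(candidates,
               key=lambda m: cosine_similarity(m["feature"], selected_audio_vector))

# Toy usage with two illustrative pieces of music.
music_set = [
    {"title": "Counting Song", "feature": [0.9, 0.1],
     "pushed_to_genders": {"F"}, "pushed_to_ages": {4}, "pushed_to_characters": {"lively"}},
    {"title": "Lullaby", "feature": [0.2, 0.8],
     "pushed_to_genders": {"M"}, "pushed_to_ages": {3}, "pushed_to_characters": {"quiet"}},
]
candidates = candidate_music(music_set, gender="F", age=4, character="quiet", target_number=2)
print(music_to_push(candidates, selected_audio_vector=[0.8, 0.2])["title"])
```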
In some optional implementations of this embodiment, the executing body may also perform step 505 in the following manner:
in the case that music to be pushed is being pushed to the target user for the first time, determining the music already pushed to a user associated with the target user, and using the pushed music as the music to be pushed to the target user. Here, the similarity between the user portrait information of the user associated with the target user and the user portrait information of the target user is greater than or equal to a preset user portrait information similarity threshold.
It can be understood that, in the above optional implementation, the music to be pushed to the target user is determined from music already pushed to users associated with the target user, so that targeted pushing is possible even under cold-start conditions, when little interaction history of the target user is available.
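A minimal sketch of this cold-start branch is given below, assuming user portrait information is stored as a small dictionary of attributes and that portrait similarity is measured as the fraction of matching attributes; the 0.6 threshold, the record layout, and all names are illustrative assumptions, not prescribed by this disclosure.

```python
# Sketch of the cold-start branch: reuse music already pushed to users whose
# portraits are similar enough to the target user's. Threshold and fields assumed.

def portrait_similarity(p1: dict, p2: dict) -> float:
    """Fraction of portrait fields on which the two users agree."""
    keys = set(p1) | set(p2)
    return sum(p1.get(k) == p2.get(k) for k in keys) / len(keys) if keys else 0.0

def music_from_associated_users(target_portrait, all_users, threshold=0.6):
    """Collect music already pushed to sufficiently similar ('associated') users."""
    pushed = []
    for user in all_users:
        if portrait_similarity(target_portrait, user["portrait"]) >= threshold:
            pushed.extend(user["pushed_music"])
    return pushed

users = [{"portrait": {"gender": "F", "age": 4, "character": "lively"},
          "pushed_music": ["Counting Song"]}]
print(music_from_associated_users({"gender": "F", "age": 4, "character": "quiet"}, users))
```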
In some application scenarios of the above optional implementation, the executing body may also perform step 505 in the following manner:
in the case that music to be pushed is not being pushed to the target user for the first time, determining the music to be pushed to the target user based on the interaction information of the target user within a target time period. The target time period takes the moment at which music to be pushed was last pushed to the target user as its starting moment and the current moment as its ending moment.
It can be understood that, in the above optional implementation, new interaction information of the target user accumulates over time and is used to determine the music currently to be pushed, so that music pushing adapts as characteristics of the target user, such as age and hobbies, change.
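Below is a small illustrative sketch of this branch, assuming interaction records carry a timestamp and a set of keywords and that music pieces carry tags; the keyword-overlap rule is a stand-in for whatever analysis of the interaction information is actually used.

```python
# Sketch of the non-first-push branch: only interaction records produced since the
# previous push (the "target time period") drive the next recommendation.
# The record layout and the keyword-matching rule are assumptions for illustration.
from datetime import datetime

def interactions_in_target_period(interactions, last_push_time, now=None):
    """Keep records between the previous push moment and the current moment."""
    now = now or datetime.now()
    return [r for r in interactions if last_push_time < r["time"] <= now]

def music_from_recent_interactions(recent_records, music_set):
    """Pick the first piece whose tags overlap the user's recent queries/browsing."""
    recent_terms = {term for r in recent_records for term in r["keywords"]}
    for piece in music_set:
        if recent_terms & set(piece["tags"]):
            return piece
    return music_set[0] if music_set else None
```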
It should be noted that, in addition to the above content, this embodiment may further include the same or similar features and effects as the embodiments corresponding to fig. 2 and/or fig. 4, which are not repeated herein.
As can be seen from fig. 5, the process 500 of the information interaction method in this embodiment determines, based on the user portrait information, music to be pushed to the target user, so that the target user can be motivated with music during the interaction. This enriches the interaction manners and helps to increase the duration of interaction between the target user and the electronic device. In the case that the electronic device is an early education device (such as an early education machine), the learning duration and learning efficiency of the target user can be improved, thereby improving the user experience.
With further reference to fig. 6, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an information interaction apparatus. The apparatus embodiment corresponds to the method embodiment shown in fig. 2 and, in addition to the features described below, may include the same or corresponding features as that method embodiment and produce the same or corresponding effects. The apparatus can be applied to various electronic devices.
As shown in fig. 6, the information interaction apparatus 600 of this embodiment includes: an acquisition unit 601, a generating unit 602, a first determination unit 603, and a first control unit 604. The acquisition unit 601 is configured to acquire voice interaction audio generated by a target user; the generating unit 602 is configured to generate feedback information of the voice interaction audio based on the voice interaction audio; the first determination unit 603 is configured to determine an information feedback manner of the feedback information based on user portrait information of the target user; and the first control unit 604 is configured to control the feedback information to perform information interaction in the manner indicated by the information feedback manner.
In this embodiment, the acquisition unit 601 of the information interaction apparatus 600 may acquire the voice interaction audio generated by the target user.
In this embodiment, the generating unit 602 may generate feedback information of the voice interaction audio based on the voice interaction audio acquired by the acquisition unit 601.
In this embodiment, the first determination unit 603 may determine the information feedback manner of the feedback information based on the user portrait information of the target user.
In this embodiment, the first control unit 604 may control the feedback information generated by the generating unit 602 and perform information interaction in the manner indicated by the information feedback manner.
In some optional implementations of this embodiment, the acquisition unit 601 includes: an acquisition subunit (not shown in the figure) configured to acquire voice interaction audio generated by a target user for a target audio; and the generating unit 602 includes: a generating subunit (not shown in the figure) configured to generate feedback information of the voice interaction audio based on the target audio and the voice interaction audio.
In some optional implementations of this embodiment, the target audio is a to-be-read-after audio, and the voice interaction audio is a user read-after audio; and the generating subunit includes: a first generating module (not shown in the figure) configured to generate feedback information of the user read-after audio, indicating whether the to-be-read-after audio and the user read-after audio match, based on the magnitude relation between the similarity between the to-be-read-after audio and the user read-after audio and a preset audio similarity threshold.
In some optional implementations of this embodiment, the apparatus 600 further includes: a second determining unit (not shown in the figure) configured to determine, based on music selected by the target user and played for the target user, target music for playing for the target user in the case that a first preset condition is met, where the first preset condition includes that the similarity between the to-be-read-after audio and the user read-after audio is greater than or equal to the preset audio similarity threshold; and a second control unit (not shown in the figure) configured to control the target music to be played in response to the first preset condition being met.
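The read-after flow handled by these units can be sketched as follows. String similarity via difflib is only a stand-in for a real acoustic or phonetic comparison, and the 0.75 threshold and the function names are assumptions for illustration.

```python
# Hedged sketch: compare the audio to be read after with the user's follow-up reading,
# emit match/mismatch feedback, and play the target music when the first preset
# condition (similarity >= threshold) is satisfied. Similarity here is a placeholder.
from difflib import SequenceMatcher

AUDIO_SIMILARITY_THRESHOLD = 0.75   # assumed value of the preset threshold

def follow_read_feedback(audio_to_read: str, user_follow_read: str):
    similarity = SequenceMatcher(None, audio_to_read, user_follow_read).ratio()
    matched = similarity >= AUDIO_SIMILARITY_THRESHOLD
    feedback = "Matched, great job!" if matched else "Not quite, please try again."
    return matched, feedback

def maybe_play_target_music(matched: bool, target_music: str) -> None:
    if matched:                                  # first preset condition satisfied
        print(f"Playing target music: {target_music}")

matched, feedback = follow_read_feedback("little star", "littel star")
print(feedback)
maybe_play_target_music(matched, "Twinkle Twinkle Little Star")
```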
In some optional implementation manners of this embodiment, the target audio is an audio to be replied, and the voice interaction audio is an audio replied by the user; and, the generating subunit includes: and a second generating module (not shown in the figure) configured to generate feedback information of the user reply audio indicating whether the user reply audio matches the audio to be replied, based on whether the user reply audio matches the audio to be replied.
In some optional implementations of this embodiment, the apparatus 600 further includes: a unit (not shown in the figure) configured to determine, based on the music selected by the target user and played for the target user, target music for playing for the target user in the case that a second preset condition is met, where the second preset condition includes that the user reply audio matches the audio to be replied; and a unit (not shown in the figure) configured to control the target music to be played in response to the second preset condition being met.
In some optional implementations of this embodiment, the apparatus 600 further includes: a third determining unit (not shown in the figure) configured to determine music to be pushed to the target user based on the user portrait information, where the user portrait information includes at least one of the following: gender, age, character, and audio selected by the target user and played for the target user.
In some optional implementations of this embodiment, the user portrait information includes: gender, age, character, and audio selected by the target user and played for the target user; and the third determining unit includes: a first determining subunit (not shown in the figure) configured to determine the gender, age, and character of the target user, as well as the audio selected by the target user and played for the target user, based on the interaction information of the target user; a second determining subunit (not shown in the figure) configured to determine, based on the gender, age, and character, a target number of pieces of music for pushing to the target user from a predetermined music collection; and a third determining subunit (not shown in the figure) configured to determine the music to be pushed to the target user from the target number of pieces of music, based on the audio selected by the target user and played for the target user.
In some optional implementations of this embodiment, the third determining unit includes: a fourth determining subunit configured to, in response to music to be pushed being pushed to the target user for the first time, determine the music already pushed to users associated with the target user and use that pushed music as the music to be pushed to the target user, where the similarity between the user portrait information of a user associated with the target user and the user portrait information of the target user is greater than or equal to a preset user portrait information similarity threshold.
In some optional implementations of this embodiment, the third determining unit further includes: a fifth determining subunit configured to, in response to music to be pushed not being pushed to the target user for the first time, determine the music to be pushed to the target user based on the interaction information of the target user within a target time period, where the target time period takes the moment at which music to be pushed was last pushed to the target user as its starting moment and the current moment as its ending moment.
In some optional implementations of this embodiment, the information feedback manner includes at least one of the following: image feedback mode, text feedback mode, and audio feedback mode.
In the apparatus provided by the above embodiment of the present disclosure, the acquisition unit 601 acquires the voice interaction audio generated by the target user, the generating unit 602 generates feedback information of the voice interaction audio based on the voice interaction audio, the first determination unit 603 determines an information feedback manner of the feedback information based on the user portrait information of the target user, and the first control unit 604 then controls the feedback information to perform information interaction in the manner indicated by the information feedback manner. In this way, the manner of voice interaction with the user is determined according to the user portrait information, which enriches the information interaction manners and helps to prolong the duration of interaction between the target user and the electronic device. In the case that the electronic device is an early education device (such as an early education machine), the learning duration and the learning efficiency of the target user can be improved.
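As a purely structural illustration, the four units of apparatus 600 might be wired together as in the sketch below; the class name, the callable-based wiring, and the stand-in unit implementations are assumptions, not part of this disclosure.

```python
# Minimal structural sketch of apparatus 600. Names are illustrative only.
class InformationInteractionApparatus:
    def __init__(self, acquire, generate, determine_manner, control):
        self.acquire = acquire                    # acquisition unit 601
        self.generate = generate                  # generating unit 602
        self.determine_manner = determine_manner  # first determination unit 603
        self.control = control                    # first control unit 604

    def interact(self, target_user):
        audio = self.acquire(target_user)
        feedback = self.generate(audio)
        manner = self.determine_manner(target_user)
        self.control(feedback, manner)

# Usage with trivial stand-ins for each unit:
apparatus = InformationInteractionApparatus(
    acquire=lambda user: f"voice audio from {user}",
    generate=lambda audio: f"feedback for '{audio}'",
    determine_manner=lambda user: "audio",
    control=lambda fb, manner: print(f"[{manner}] {fb}"),
)
apparatus.interact("child_user_1")
```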
Referring now to fig. 7, a schematic diagram of an electronic device (e.g., the server or terminal device of fig. 1) 700 suitable for use in implementing embodiments of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a fixed terminal such as a digital TV, a desktop computer, and the like. The terminal device/server shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 7, electronic device 700 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 701 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage device 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data necessary for the operation of the electronic device 700 are also stored. The processing device 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
Generally, the following devices may be connected to the I/O interface 705: input devices 706 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 707 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 708 including, for example, magnetic tape, hard disk, etc.; and a communication device 709. The communication means 709 may allow the electronic device 700 to communicate wirelessly or by wire with other devices to exchange data. While fig. 7 illustrates an electronic device 700 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 7 may represent one device or may represent multiple devices as desired.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via the communication means 709, or may be installed from the storage means 708, or may be installed from the ROM 702. The computer program, when executed by the processing device 701, performs the above-described functions defined in the methods of embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Python, Java, Smalltalk, or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The described units may also be provided in a processor, which may be described as: a processor including an acquisition unit, a generating unit, a first determination unit, and a first control unit. The names of these units do not, in some cases, constitute a limitation of the units themselves; for example, the acquisition unit may also be described as a "unit that acquires voice interaction audio generated by a target user".
As another aspect, the present disclosure also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire voice interaction audio generated by a target user; generate feedback information of the voice interaction audio based on the voice interaction audio; determine an information feedback manner of the feedback information based on user portrait information of the target user; and control the feedback information to perform information interaction in the manner indicated by the information feedback manner.
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the present disclosure is not limited to technical solutions formed by the specific combination of the above features, and also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) features having similar functions disclosed in the present disclosure.

Claims (14)

1. An information interaction method comprises the following steps:
acquiring voice interaction audio generated by a target user;
generating feedback information of the voice interaction audio based on the voice interaction audio;
determining an information feedback manner of the feedback information based on the user portrait information of the target user;
and controlling the feedback information, and performing information interaction in a manner indicated by the information feedback manner.
2. The method of claim 1, wherein the obtaining voice interaction audio generated by a target user comprises:
acquiring voice interaction audio generated by the target user for a target audio; and
the generating feedback information of the voice interaction audio based on the voice interaction audio comprises:
and generating feedback information of the voice interaction audio based on the target audio and the voice interaction audio.
3. The method of claim 2, wherein the target audio is a to-be-read-after audio and the voice interaction audio is a user read-after audio; and
generating feedback information of the voice interaction audio based on the target audio and the voice interaction audio, including:
and generating, based on the magnitude relation between the similarity between the to-be-read-after audio and the user read-after audio and a preset audio similarity threshold, feedback information of the user read-after audio indicating whether the to-be-read-after audio and the user read-after audio match.
4. The method of claim 3, wherein the method further comprises:
determining, based on music selected by the target user and played for the target user, target music for playing for the target user in the case that a first preset condition is met, wherein the first preset condition comprises that the similarity between the to-be-read-after audio and the user read-after audio is greater than or equal to the preset audio similarity threshold;
and in response to the first preset condition being met, controlling the target music to be played.
5. The method of claim 2, wherein the target audio is an audio to be replied, and the voice interaction audio is a user reply audio; and
generating feedback information of the voice interaction audio based on the target audio and the voice interaction audio, including:
and generating, based on whether the user reply audio matches the audio to be replied, feedback information of the user reply audio indicating whether the user reply audio matches the audio to be replied.
6. The method of claim 5, wherein the method further comprises:
determining target music for playing for the target user under the condition that a second preset condition is met based on the music selected by the target user and played for the target user, wherein the second preset condition comprises that the user reply audio is matched with the audio to be replied;
and in response to the second preset condition being met, controlling the target music to be played.
7. The method of claim 1, wherein the method further comprises:
determining music to be pushed to the target user based on the user portrait information, wherein the user portrait information includes at least one of the following: gender, age, character, and audio selected by the target user and played for the target user.
8. The method of claim 7, wherein the user portrait information comprises: gender, age, character, and audio selected by the target user and played for the target user; and
the determining music to be pushed to the target user based on the user portrait information comprises:
determining the gender, age, and character of the target user, as well as the audio selected by the target user and played for the target user, based on the interaction information of the target user;
determining, based on the gender, the age, and the character, a target number of pieces of music for pushing to the target user from a predetermined music collection;
and determining the music to be pushed to the target user from the target number of pieces of music, based on the audio selected by the target user and played for the target user.
9. The method of claim 7, wherein the determining music to be pushed for pushing to the target user based on the user representation information comprises:
in response to music to be pushed being pushed to the target user for the first time, determining music already pushed to a user associated with the target user, and using the pushed music as the music to be pushed to the target user, wherein the similarity between the user portrait information of the user associated with the target user and the user portrait information of the target user is greater than or equal to a preset user portrait information similarity threshold.
10. The method of claim 9, wherein the determining music to be pushed to the target user based on the user portrait information further comprises:
in response to music to be pushed not being pushed to the target user for the first time, determining the music to be pushed to the target user based on the interaction information of the target user within a target time period, wherein the target time period takes the moment at which music to be pushed was last pushed to the target user as a starting moment and the current moment as an ending moment.
11. The method according to one of claims 1 to 10, wherein the information feedback manner comprises at least one of: image feedback mode, text feedback mode, and audio feedback mode.
12. An information interaction device, comprising:
an acquisition unit configured to acquire voice interaction audio generated by a target user;
a generating unit configured to generate feedback information of the voice interaction audio based on the voice interaction audio;
a first determination unit configured to determine an information feedback manner of the feedback information based on user portrait information of the target user;
and the first control unit is configured to control the feedback information to perform information interaction in a manner indicated by the information feedback manner.
13. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-11.
14. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-11.
CN202010147899.5A 2020-03-05 2020-03-05 Information interaction method and device Pending CN112309390A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010147899.5A CN112309390A (en) 2020-03-05 2020-03-05 Information interaction method and device

Publications (1)

Publication Number Publication Date
CN112309390A true CN112309390A (en) 2021-02-02

Family

ID=74336427

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010147899.5A Pending CN112309390A (en) 2020-03-05 2020-03-05 Information interaction method and device

Country Status (1)

Country Link
CN (1) CN112309390A (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104036665A (en) * 2014-06-23 2014-09-10 南宁市第二中学 Infant perceiving device
CN107424450A (en) * 2017-08-07 2017-12-01 英华达(南京)科技有限公司 Pronunciation correction system and method
CN107770380A (en) * 2017-10-25 2018-03-06 百度在线网络技术(北京)有限公司 Information processing method and device
CN108009247A (en) * 2017-11-30 2018-05-08 广州酷狗计算机科技有限公司 Information-pushing method and device
CN108153415A (en) * 2017-12-22 2018-06-12 歌尔科技有限公司 Virtual reality language teaching interaction method and virtual reality device
CN108769198A (en) * 2018-05-29 2018-11-06 百度在线网络技术(北京)有限公司 Method and apparatus for pushed information
CN108766071A (en) * 2018-04-28 2018-11-06 北京猎户星空科技有限公司 A kind of method, apparatus, storage medium and the relevant device of content push and broadcasting
CN108804538A (en) * 2018-05-06 2018-11-13 深圳市保千里电子有限公司 Music based on the period recommends method
CN109191968A (en) * 2018-10-25 2019-01-11 重庆鲁班机器人技术研究院有限公司 Language education robot and more educational robot langue leaning systems
CN110189754A (en) * 2019-05-29 2019-08-30 腾讯科技(深圳)有限公司 Voice interactive method, device, electronic equipment and storage medium
CN110223186A (en) * 2018-05-09 2019-09-10 腾讯科技(深圳)有限公司 User's similarity determines method and information recommendation method
CN110472145A (en) * 2019-07-25 2019-11-19 维沃移动通信有限公司 A kind of content recommendation method and electronic equipment

Similar Documents

Publication Publication Date Title
US11158102B2 (en) Method and apparatus for processing information
WO2022121601A1 (en) Live streaming interaction method and apparatus, and device and medium
EP3095113B1 (en) Digital personal assistant interaction with impersonations and rich multimedia in responses
CN110473525B (en) Method and device for acquiring voice training sample
CN109272984A (en) Method and apparatus for interactive voice
US20150296181A1 (en) Augmenting web conferences via text extracted from audio content
CN112601100A (en) Live broadcast interaction method, device, equipment and medium
WO2022089192A1 (en) Interaction processing method and apparatus, electronic device, and storage medium
CN110969012A (en) Text error correction method and device, storage medium and electronic equipment
CN109801527B (en) Method and apparatus for outputting information
CN112364144B (en) Interaction method, device, equipment and computer readable medium
US11086907B2 (en) Generating stories from segments classified with real-time feedback data
CN110392312A (en) Group chat construction method, system, medium and electronic equipment
CN110413834B (en) Voice comment modification method, system, medium and electronic device
CN111930453A (en) Dictation interaction method and device and electronic equipment
CN112383721B (en) Method, apparatus, device and medium for generating video
CN113850898A (en) Scene rendering method and device, storage medium and electronic equipment
CN117033599A (en) Digital content generation method and related equipment
WO2023134558A1 (en) Interaction method and apparatus, electronic device, storage medium, and program product
US20170018203A1 (en) Systems and methods for teaching pronunciation and/or reading
CN112309390A (en) Information interaction method and device
US20220130413A1 (en) Systems and methods for a computerized interactive voice companion
CN116800988A (en) Video generation method, apparatus, device, storage medium, and program product
CN112287173A (en) Method and apparatus for generating information
CN111415662A (en) Method, apparatus, device and medium for generating video

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination