CN117150071A - Voice comment display method and voice comment processing method - Google Patents

Info

Publication number
CN117150071A
CN117150071A (application CN202311228020.XA)
Authority
CN
China
Prior art keywords
comment
voice
information
comments
voice comment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311228020.XA
Other languages
Chinese (zh)
Inventor
刘仁鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202311228020.XA
Publication of CN117150071A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F 16/63 Querying
    • G06F 16/638 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F 16/64 Browsing; Visualisation therefor
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The disclosure relates to a voice comment display method and a voice comment processing method, and belongs to the technical field of computers. The voice comment display method includes: in response to a comment viewing operation on any resource, displaying a comment display interface of the resource; displaying, in the comment display interface of the resource, text information of a voice comment and a play option of the voice comment, wherein the text information includes text converted from the voice comment; and in response to a play operation on the play option, playing the voice comment as audio. Because voice comments are displayed as text plus a play option, a user can quickly and intuitively see what a voice comment says; if the user is interested in the voice comment, it can be played as audio. Since audio is more expressive and engaging, the interaction effect of comments and the efficiency of human-machine interaction are improved.

Description

Voice comment display method and voice comment processing method
Technical Field
The disclosure relates to the technical field of computers, in particular to a voice comment display method and a voice comment processing method.
Background
With the rapid development of computer technology, users can view works they are interested in (videos, pictures, articles, etc.) through applications installed on terminals. While viewing a work, a user can not only comment on the work but also view comments posted on it by other users.
At present, most comments consist of at least one of text, emoticons, and pictures, but the expressiveness of text, emoticons, and pictures is limited, so the interaction effect of comments is poor and human-machine interaction efficiency is low.
Disclosure of Invention
The present disclosure provides a voice comment display method and a voice comment processing method, which can improve the interaction effect of comments and the efficiency of human-machine interaction. The technical scheme of the present disclosure is as follows:
according to a first aspect of embodiments of the present disclosure, there is provided a method for displaying a voice comment, the method for displaying a voice comment including:
responding to comment viewing operation of any resource, and displaying a comment display interface of the resource;
displaying text information of the voice comments in a comment display interface of the resource, and displaying play options of the voice comments, wherein the text information comprises characters converted from the voice comments;
And responding to the playing operation of the playing option, and playing the voice comment in an audio mode.
In some embodiments, the presenting the play option of the voice comment includes:
and displaying the play options of the voice comments according to a first display style, wherein the first display style is matched with the voice characteristics of the voice comments.
In some embodiments, the speech features include at least one of a first timbre feature for representing gender, a second timbre feature for representing age, an emotional feature, and a content feature.
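As an illustration of how a first display style might be derived from such voice features, the following sketch maps a feature dictionary to concrete style attributes. The feature names, style fields, and mapping rules are all hypothetical; the embodiments only require that the display style match the voice features in some way.

```python
# Hypothetical mapping from voice features to a display style for the
# play option. Feature names ("emotion", "age_group") and the style
# attributes are illustrative, not taken from the disclosure.
def play_option_style(features: dict) -> dict:
    style = {"icon": "speaker.png", "color": "gray"}  # default style
    if features.get("emotion") == "happy":
        style["color"] = "orange"                     # brighter color for happy comments
    if features.get("age_group") == "child":
        style["icon"] = "cartoon_speaker.png"         # playful icon for young voices
    return style

play_option_style({"emotion": "happy", "age_group": "child"})
# → {'icon': 'cartoon_speaker.png', 'color': 'orange'}
```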
In some embodiments, the presenting the play option of the voice comment includes:
and displaying the play options of the voice comments as expression pictures, virtual objects or dynamic pictures.
In some embodiments, the voice comment presentation method further includes:
acquiring picture information corresponding to the voice comment from a server, wherein the picture information is picture information matched with the voice characteristic of the voice comment and acquired by the server;
the displaying the play options of the voice comments comprises the following steps:
and displaying the play options in a picture form based on the picture information.
In some embodiments, the presenting text information of the voice comment includes:
displaying the text information of the voice comment according to a second display style;
the voice comment display method further comprises the following steps:
displaying the text comments according to a third display style in the comment display interface of the resource;
wherein the second display style is different from the third display style.
In some embodiments, the font of the text information of the voice comment is different from the font of the text comment; or,
the font size of the text information of the voice comment is different from the font size of the text comment.
According to a second aspect of the embodiments of the present disclosure, there is provided a voice comment processing method, including:
receiving a voice comment aiming at any resource and sent by a terminal, wherein the voice comment comprises audio information;
performing voice recognition processing on the audio information to obtain text information of the voice comments;
and responding to a comment acquisition request sent by any terminal for the resource, and sending comment information of the resource to the terminal, wherein the comment information comprises text information and picture information of the voice comment so that the terminal can display the text information of the voice comment and display play options of the voice comment based on the picture information.
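The three server-side steps above (receive, recognize, deliver) can be sketched as follows. The data model, the in-memory store, and the stubbed recognizer are assumptions made for illustration; the disclosure prescribes neither a particular ASR engine nor a particular storage scheme.

```python
# Illustrative sketch of the second-aspect server flow. All names are
# hypothetical; recognize_speech() stands in for any ASR engine.
from dataclasses import dataclass

@dataclass
class VoiceComment:
    audio: bytes          # raw audio information uploaded by the terminal
    text: str = ""        # filled in by speech recognition
    picture: str = ""     # picture info used to render the play option

def recognize_speech(audio: bytes) -> str:
    """Placeholder for any speech recognition algorithm or model."""
    return "transcribed text"

comments: dict[str, list[VoiceComment]] = {}   # resource id -> its comments

def receive_voice_comment(resource_id: str, audio: bytes) -> None:
    c = VoiceComment(audio=audio)
    c.text = recognize_speech(c.audio)         # convert audio to text
    c.picture = "default.png"                  # preset picture information
    comments.setdefault(resource_id, []).append(c)

def get_comment_info(resource_id: str) -> list[dict]:
    # On a comment acquisition request, return text + picture info so the
    # terminal can show the transcript plus a play option.
    return [{"text": c.text, "picture": c.picture}
            for c in comments.get(resource_id, [])]
```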
In some embodiments, before the sending the comment information of the resource to the terminal, the voice comment processing method further includes:
acquiring preset picture information as the picture information of the voice comment; or,
and acquiring voice characteristics of the voice comments based on at least one of the audio information and the text information of the voice comments, and acquiring picture information matched with the voice characteristics from a plurality of picture information to serve as the picture information of the voice comments.
In some embodiments, the obtaining the voice feature of the voice comment based on at least one of the audio information and the text information of the voice comment includes at least one of:
performing first tone recognition on the audio information of the voice comment to obtain a first tone characteristic used for representing gender;
performing second tone recognition on the audio information of the voice comment to obtain second tone characteristics used for representing the age group;
carrying out emotion recognition on the audio information of the voice comment to obtain a first emotion feature;
carrying out emotion recognition on the text information of the voice comment to obtain a second emotion feature;
And carrying out keyword matching on the text information of the voice comment based on a plurality of preset keywords to obtain keywords matched with the text information in the keywords, and taking the obtained keywords as the content characteristics of the voice comment.
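The last step, keyword matching against preset keywords, can be sketched as a simple containment check over the recognized text. The keyword list below is invented for illustration; the disclosure does not fix how matching is performed.

```python
# Sketch of the keyword-matching step: preset keywords that occur in
# the recognized text become the comment's content features.
PRESET_KEYWORDS = ["funny", "cute", "music", "food"]   # illustrative list

def content_features(text: str, keywords=PRESET_KEYWORDS) -> list[str]:
    lowered = text.lower()
    return [k for k in keywords if k in lowered]

content_features("Such cute music in this clip")  # → ['cute', 'music']
```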
In some embodiments, the voice comment has a plurality of voice features; the obtaining, from a plurality of pieces of picture information, picture information matching the voice features as the picture information of the voice comment includes:
for any one of the plurality of picture information, determining a matching parameter of each of a plurality of voice features of the voice comment and the picture information;
determining matching parameters of the voice comments and the picture information based on the matching parameters of each voice feature and the picture information;
and taking the picture information with the highest matching parameter with the voice comment as the picture information of the voice comment.
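These three steps amount to scoring every candidate picture against every voice feature, combining the per-feature matching parameters, and keeping the best-scoring picture. In the sketch below the combination is a plain sum and the scoring table is invented; the disclosure leaves both the scoring function and the combination rule open.

```python
# Hypothetical picture selection for a comment with several voice
# features: combine per-feature match scores and take the argmax.
def pick_picture(features, pictures, score):
    best, best_total = None, float("-inf")
    for pic in pictures:
        total = sum(score(f, pic) for f in features)  # combined matching parameter
        if total > best_total:
            best, best_total = pic, total
    return best

# Toy scoring table: (voice feature, picture) -> match strength.
TABLE = {("female", "flower.png"): 2, ("happy", "flower.png"): 3,
         ("female", "robot.png"): 1, ("happy", "robot.png"): 1}
chosen = pick_picture(["female", "happy"], ["flower.png", "robot.png"],
                      lambda f, p: TABLE.get((f, p), 0))
# chosen == "flower.png" (score 5 vs. 2)
```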
According to a third aspect of embodiments of the present disclosure, there is provided a voice comment presentation apparatus including:
a display unit configured to display, in response to a comment viewing operation on any resource, a comment display interface of the resource;
The display unit is further configured to display text information of the voice comments in a comment display interface of the resource and display play options of the voice comments, wherein the text information comprises characters converted from the voice comments;
and a playing unit configured to perform playing of the voice comment in an audio manner in response to a playing operation of the playing option.
In some embodiments, the presenting unit is configured to execute presenting the play option of the voice comment according to a first display style, where the first display style matches the voice feature of the voice comment.
In some embodiments, the speech features include at least one of a first timbre feature for representing gender, a second timbre feature for representing age, an emotional feature, and a content feature.
In some embodiments, the display unit is configured to display the play option of the voice comment as an expression picture, a virtual object, or a dynamic picture.
In some embodiments, the voice comment presentation apparatus further includes:
an acquisition unit configured to perform acquisition of picture information corresponding to the voice comment from a server, the picture information being picture information matching a voice feature of the voice comment acquired by the server;
The display unit is configured to display the play options in a picture form based on the picture information.
In some embodiments, the display unit is configured to display the text information of the voice comment according to a second display style;
the display unit is further configured to execute displaying the text comments according to a third display style in the comment display interface of the resource;
wherein the second display style is different from the third display style.
In some embodiments, the font of the text information of the voice comment is different from the font of the text comment; or,
the font size of the text information of the voice comment is different from the font size of the text comment.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a voice comment processing apparatus including:
a receiving unit configured to receive a voice comment for any resource sent by a terminal, wherein the voice comment includes audio information;
the recognition unit is configured to perform voice recognition processing on the audio information to obtain the text information of the voice comment;
And the sending unit is configured to send comment information of the resource to the terminal in response to receiving a comment acquisition request for the resource sent by any terminal, wherein the comment information comprises text information and picture information of the voice comment so that the terminal can display the text information of the voice comment and display play options of the voice comment based on the picture information.
In some embodiments, the voice comment processing apparatus further includes:
an acquisition unit configured to perform acquisition of preset picture information as the picture information of the voice comment; or,
the acquisition unit is configured to perform acquisition of voice characteristics of the voice comment based on at least one of the audio information and the text information of the voice comment, and acquire picture information matched with the voice characteristics from a plurality of picture information as the picture information of the voice comment.
In some embodiments, the identification unit is configured to perform at least one of:
performing first tone recognition on the audio information of the voice comment to obtain a first tone characteristic used for representing gender;
Performing second tone recognition on the audio information of the voice comment to obtain second tone characteristics used for representing the age group;
carrying out emotion recognition on the audio information of the voice comment to obtain a first emotion feature;
carrying out emotion recognition on the text information of the voice comment to obtain a second emotion feature;
and carrying out keyword matching on the text information of the voice comment based on a plurality of preset keywords to obtain keywords matched with the text information in the keywords, and taking the obtained keywords as the content characteristics of the voice comment.
In some embodiments, the voice comment has a plurality of voice features; the acquisition unit is configured to: determine, for any one of the plurality of pieces of picture information, a matching parameter between each of the plurality of voice features of the voice comment and the picture information; determine a matching parameter between the voice comment and the picture information based on the matching parameter between each voice feature and the picture information; and take the picture information having the highest matching parameter with the voice comment as the picture information of the voice comment.
According to a fifth aspect of embodiments of the present disclosure, there is provided a terminal comprising:
A processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the speech comment presentation method as described in the above aspect.
According to a sixth aspect of embodiments of the present disclosure, there is provided a server comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the method of speech comment processing as described in the above aspect.
According to a seventh aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium. When instructions in the computer-readable storage medium are executed by a processor of a terminal, the terminal is caused to perform the voice comment display method described in the above aspect; alternatively, when the instructions are executed by a processor of a server, the server is caused to perform the voice comment processing method described in the above aspect.
According to an eighth aspect of embodiments of the present disclosure, there is provided a computer program product including a computer program which, when executed by a processor, implements the voice comment display method described in the above aspect, or implements the voice comment processing method described in the above aspect.
In the embodiments of the present disclosure, voice comments are displayed as text plus a play option, so that a user can quickly and intuitively see what a voice comment says; if the user is interested in the voice comment, it can be played as audio. Since audio is more expressive and engaging, the interaction effect of comments can be improved and human-machine interaction efficiency increased.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure and do not constitute an undue limitation on the disclosure.
FIG. 1 is a schematic diagram of an implementation environment, shown in accordance with an exemplary embodiment;
FIG. 2 is a flowchart illustrating a method of speech comment presentation according to an exemplary embodiment;
FIG. 3 is a flowchart illustrating a method of speech comment processing according to an exemplary embodiment;
FIG. 4 is a flowchart illustrating a method of speech comment presentation according to an exemplary embodiment;
FIG. 5 is a schematic diagram of a comment presentation interface shown in accordance with an exemplary embodiment;
FIG. 6 is a flowchart illustrating a method of speech comment processing according to an exemplary embodiment;
FIG. 7 is a schematic diagram illustrating a posting of a voice comment according to an exemplary embodiment;
FIG. 8 is a flowchart illustrating a method of speech comment processing according to an exemplary embodiment;
FIG. 9 is a block diagram of a voice comment presentation apparatus according to an exemplary embodiment;
FIG. 10 is a block diagram of a voice comment presentation apparatus according to an exemplary embodiment;
FIG. 11 is a block diagram of a voice comment processing apparatus according to an exemplary embodiment;
FIG. 12 is a block diagram of a voice comment processing apparatus according to an exemplary embodiment;
fig. 13 is a block diagram illustrating a structure of a terminal according to an exemplary embodiment;
fig. 14 is a block diagram illustrating a structure of a server according to an exemplary embodiment.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the present disclosure, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
The information involved in the present disclosure is information authorized by the user or fully authorized by all parties.
The embodiments of the present disclosure provide a voice comment display method, which is executed by a terminal. In some embodiments, the terminal is a notebook computer, a mobile phone, a tablet computer, or another terminal. The embodiments of the present disclosure also provide a voice comment processing method, which is executed by a server. In some embodiments, the server may be a single server, a server cluster composed of several servers, or a cloud computing service center. Of course, the server may also include other functional servers in order to provide more comprehensive and diverse services.
Fig. 1 is a schematic diagram illustrating an implementation environment according to an exemplary embodiment, and as shown in fig. 1, the implementation environment includes a first terminal 101, a second terminal 102, and a server 103, where the first terminal 101 and the second terminal 102 are connected to the server 103 through a wired or wireless network, respectively.
In some embodiments, the first terminal 101 and the second terminal 102 install thereon a target application served by the server 103, through which the first terminal 101 and the second terminal 102 implement functions such as data transmission, message interaction, and the like. Optionally, the target application is a target application in a terminal operating system or a target application provided for a third party. For example, the target application is a community-like application, a multimedia sharing application, or the like, and has a function of browsing resources, comments, but of course, the target application can also have other functions, such as a shopping function, an instant messaging function, a game function, or the like.
In some embodiments, the first terminal 101 is a terminal that posts comments and the second terminal 102 is a terminal that browses comments.
In some embodiments, the first terminal 101 may issue a voice comment on any resource, the first terminal 101 sends the issued voice comment to the server 103, the server 103 receives the voice comment, and updates comment information of the resource based on the voice comment, so that after the first terminal 101 or the second terminal 102 opens a comment presentation interface of the resource, the voice comment may be viewed.
The voice comment display method and the voice comment processing method provided by the disclosure can be applied to a scene of posting a voice comment. The application scenario of the embodiments of the present disclosure is described below.
For example, short video comment scenes:
A user can browse short videos in a short video application and post a voice comment on a browsed short video. When the short video application displays the voice comment, it does not display it as a voice message but as text, i.e., it displays the text converted from the voice comment; the application also displays a play option for the voice comment, and if the user clicks the play option, the application plays the voice comment as audio.
As another example, a television series comment scene:
A user can watch a television series in a video application or on a video website and post a voice comment on the series being watched. When the video application or video website displays the voice comment, it does not display it as a voice message but as text, i.e., it displays the text converted from the voice comment; the application or website also displays a play option for the voice comment, and if the user clicks the play option, the voice comment is played as audio.
It should be noted that the embodiments of the present disclosure only describe the above two application scenarios as examples and are not limited to them; the voice comment display method and the voice comment processing method provided by the embodiments of the present disclosure can also be applied to other scenarios, for example, article comment scenarios.
Fig. 2 is a flowchart illustrating a method of displaying a voice comment, which is performed by a terminal, as shown in fig. 2, according to an exemplary embodiment, including the following steps.
In step 201, the terminal responds to comment viewing operation on any resource, and displays a comment display interface of the resource.
The resource may be any resource transmitted in the internet, and the resource may be any short video work, television show, movie, article, etc., and the embodiments of the present disclosure do not limit the resource.
The comment viewing operation is an operation for viewing comments. In some embodiments, the terminal displays a comment viewing option of the resource, and the comment viewing operation is a trigger operation on the comment viewing option. In some embodiments, the comment viewing operation is any type of operation that is preset, for example, a click operation, a double click operation, a slide operation, a long press operation, or a combination of any one or more of the operations. The comment viewing operation is not limited by the embodiments of the present disclosure.
The comment display interface of the resource is used for displaying comments posted by users on the resource; it can display voice comments as well as text comments, and can display comments of the current user as well as comments of other users.
In step 202, the terminal displays text information of the voice comment in the comment display interface of the resource, and displays a play option of the voice comment, where the text information includes text converted from the voice comment.
The terminal displaying text information of the voice comment in the comment display interface of the resource means that, in the comment display interface, the voice comment is displayed not as voice but as text, i.e., the text converted from the voice comment is displayed. For example, if a user posts the voice comment "Give it a like", the terminal displays the words "Give it a like" when presenting the voice comment in the comment display interface.
Considering that audio is more expressive, the embodiments of the present disclosure also provide a play option for the voice comment: if, after seeing the text information of a voice comment, the user is interested in it, the user can click the play option to play the audio information of the voice comment.
In some embodiments, the play option of the voice comment may be displayed at a position corresponding to its text information, for example behind, in front of, below, or above the text information; the present disclosure does not limit the relative positional relationship between the text information and the play option.
In step 203, the terminal plays the voice comment in an audio manner in response to a play operation of the play option.
The user posts the voice comment by inputting audio information, so the terminal can acquire the audio information of the voice comment. In response to a play operation on the play option, the terminal plays the voice comment as audio, i.e., plays the audio information of the voice comment.
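The terminal-side flow of Fig. 2 (steps 201 to 203) can be sketched as follows: open the comment interface, render each voice comment as text plus a play option, and play the audio information when the option is triggered. The class and method names are illustrative, not part of the disclosure.

```python
# Hypothetical terminal-side sketch of the Fig. 2 flow.
class CommentUI:
    def __init__(self, comments):
        # comments: list of {"text": ..., "audio": ...} from the server
        self.comments = comments
        self.played = []

    def show_comment_interface(self):
        # steps 201-202: show each comment's text plus a play option
        return [(c["text"], "[play]") for c in self.comments]

    def on_play(self, index):
        # step 203: play the comment's audio information on a play operation
        self.played.append(self.comments[index]["audio"])

ui = CommentUI([{"text": "Give it a like", "audio": b"pcm-bytes"}])
ui.show_comment_interface()   # → [('Give it a like', '[play]')]
ui.on_play(0)                 # "plays" the stored audio information
```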
According to the voice comment display method provided by the embodiments of the present disclosure, voice comments are displayed as text plus a play option, so that a user can quickly and intuitively see what a voice comment says; if the user is interested in the voice comment, it can be played as audio. Since audio is more expressive and engaging, the interaction effect of comments can be improved and human-machine interaction efficiency increased.
Fig. 3 is a flowchart illustrating a method of processing a voice comment, as shown in fig. 3, performed by a server, according to an exemplary embodiment, including the following steps.
In step 301, the server receives a speech comment for any resource sent by the terminal, where the speech comment includes audio information.
It should be noted that the server may receive any type of comment on the resource sent by the terminal, for example, any one or a combination of voice comments, text comments, video comments, expression comments, and the like. The embodiments of the present disclosure only describe, as an example, the process of receiving and processing a voice comment.
In step 302, the server performs a voice recognition process on the audio information to obtain text information of the voice comment.
Voice recognition converts audio information into text information; the server may use any algorithm or any speech recognition model to perform the voice recognition process, which is not limited in the embodiments of the present disclosure.
In step 303, the server responds to the comment acquisition request for the resource sent by any terminal, and sends comment information of the resource to the terminal, wherein the comment information comprises text information and picture information of a voice comment, so that the terminal displays the text information of the voice comment, and displays play options of the voice comment based on the picture information.
When a user wants to view comments of a certain resource, the user can send a comment acquisition request of the resource to a server through a terminal so that the server can send comment information of the resource to the terminal, and the terminal displays the comments of the resource based on the comment information.
In some embodiments, the comment acquisition request is triggered by a comment viewing option. For example, the display interface of the resource comprises a comment viewing option, and the terminal responds to the triggering operation of the comment viewing option and sends a comment acquisition request for the resource to the server. In other embodiments, the comment acquisition request is triggered by performing a preset operation. For example, the terminal sends a comment acquisition request for the resource to the server in response to receiving a preset operation in the presentation interface of the resource.
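The two trigger paths above can be sketched as a single terminal-side event dispatcher. This is a minimal illustration under assumed names (`on_event`, the event strings, and the request dictionary are all hypothetical, not part of the disclosure):

```python
# Hedged sketch of the two comment-request triggers: the terminal
# sends a comment acquisition request either when the comment-viewing
# option is triggered or when a preset operation is received.
def on_event(event: str, resource_id: str, sent: list) -> None:
    """Dispatch terminal UI events; matching events emit one request."""
    if event in ("tap_comment_option", "preset_operation"):
        sent.append({"type": "get_comments", "resource": resource_id})

sent: list = []
on_event("tap_comment_option", "video_42", sent)
on_event("scroll", "video_42", sent)   # unrelated event: no request sent
print(len(sent))  # → 1
```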
In the embodiment of the disclosure, the server converts the audio information of the voice comment into the text information in advance, and when the comment information is issued, the text information and the picture information of the voice comment are issued to the terminal, so that the terminal displays the voice comment in a mode of adding a play option to the text.
According to the voice comment processing method provided by the embodiments of the present disclosure, after a voice comment is received, its audio information is converted into text information; when the comment information is issued, the text information and the picture information of the voice comment are sent to the terminal, so that the terminal displays the voice comment as text accompanied by a play option. The user can thereby conveniently and clearly learn what the voice comment says, and can more quickly and intuitively understand the comment content. If the user is interested in the voice comment, the voice comment can be played as audio; because audio has higher expressiveness and appeal, the interactive effect of comments is improved and human-computer interaction efficiency is improved.
Fig. 4 is a flowchart illustrating a method of displaying a voice comment, which is performed by a terminal, as shown in fig. 4, according to an exemplary embodiment, including the following steps.
In step 401, the terminal responds to comment viewing operation on any resource, and displays a comment display interface of the resource.
The above step 401 is the same as the above step 201, and will not be described in detail here.
In step 402, the terminal displays text comments according to a third display style, displays text information of the voice comments according to a second display style, and displays a play option of the voice comments according to a first display style, wherein the second display style is different from the third display style, and the first display style is matched with voice features of the voice comments.
In the comment display interface of the resource, the terminal displays not only voice comments but also text comments. In the embodiments of the present disclosure, both voice comments and text comments are presented in text form; to help the user distinguish between them, the terminal may display them according to different display styles. That is, the terminal displays the text information of the voice comment according to the second display style and displays the text comment according to the third display style, where the second display style is different from the third display style. Because the display style of the text information of the voice comment differs from that of the text comment, the user can rapidly distinguish voice comments from text comments, improving human-computer interaction efficiency.
In some embodiments, the difference between the display style of the text information of the voice comment and that of the text comment may be embodied in at least one of font, font style, or font size.
Optionally, the font of the text information of the voice comment differs from the font of the text comment. For example, the font of the text information of the voice comment is regular script (Kaiti) and the font of the text comment is Songti. For another example, the font of the text information of the voice comment is an ornamental script and the font of the text comment is Songti.
Optionally, the font style of the text information of the voice comment differs from the font style of the text comment. For example, the font style of the text information of the voice comment includes a diagonal underline, and the font style of the text comment is regular. For another example, the font style of the text information of the voice comment is bold, and the font style of the text comment is regular.
Optionally, the font size of the text information of the voice comment differs from the font size of the text comment. For example, the font size of the text information of the voice comment is size five and the font size of the text comment is small five (Chinese type sizes).
For example, as shown in fig. 5, the text information of the voice comment is displayed in a style with a diagonal underline, and the text comment is displayed in a regular style.
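The style distinction described above can be sketched as a simple lookup on the terminal. The style attribute names and values below are illustrative assumptions; an actual terminal would map them onto its UI toolkit:

```python
# Illustrative sketch of distinct display styles for voice vs. text
# comments (step 402). Attribute names and values are hypothetical.
VOICE_COMMENT_STYLE = {"font": "KaiTi", "style": "diagonal-underline", "size": "five"}
TEXT_COMMENT_STYLE = {"font": "SongTi", "style": "regular", "size": "small-five"}

def style_for(comment_type: str) -> dict:
    """Second display style for voice comments, third for text comments."""
    return VOICE_COMMENT_STYLE if comment_type == "voice" else TEXT_COMMENT_STYLE

print(style_for("voice")["style"])   # → diagonal-underline
print(style_for("text")["style"])    # → regular
```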
In addition, in order to enable the user to quickly and intuitively learn more about a voice comment through its text information and play option, in the embodiments of the present disclosure the display style of the play option matches the voice features of the voice comment. That is, the voice features of the voice comment are visualized through the play option, so the user can gain a more intuitive understanding of the voice comment based on the play option displayed by the terminal.
In some embodiments, the speech features include at least one of a first timbre feature for representing gender, a second timbre feature for representing age, an emotional feature, and a content feature. Because the display style of the play options is matched with the voice characteristics, the user can more quickly and intuitively know at least one of the gender, age bracket, emotion, content and the like of the voice comment publisher based on the play options, and the man-machine interaction efficiency is further improved.
In some embodiments, the play option may be presented as an expression picture, a virtual object, a dynamic picture, or the like. Accordingly, the terminal displaying the play option of the voice comment includes: displaying the play option of the voice comment as an expression picture, a virtual object, or a dynamic picture.
In the embodiment of the disclosure, the display style of the play option is also matched with the voice feature of the voice comment, so that when the play option of the voice comment is displayed as an expression picture, a virtual object or a dynamic picture, the expression picture, the virtual object or the dynamic picture needs to be matched with the voice feature of the voice comment.
For example, when the voice feature includes the first tone feature: if the first tone feature indicates that the gender is female, the play option may be displayed as a female expression picture, a female virtual character, a pink loudspeaker, or a dynamic picture of a girl speaking; if the first tone feature indicates that the gender is male, the play option may be displayed as a male expression picture, a male virtual character, a blue loudspeaker, or a dynamic picture of a boy speaking.
For another example, when the voice feature includes the second tone feature, if the second tone feature indicates that the age group is a child, the play option may be displayed as a child expression picture, a virtual character whose age group is a child, or a dynamic picture of a child speaking.
As another example, when the voice feature includes an emotional feature: if the emotional feature indicates happiness, the play option may be displayed as an expression picture indicating happiness, a virtual character with a happy expression, or a dynamic picture indicating happiness; if the emotional feature indicates anger, the play option may be displayed as an expression picture indicating anger, a virtual character with an angry expression, or a dynamic picture indicating anger.
For another example, when the voice feature includes the content feature, if the content feature is "praise", the play option may be displayed as an expression picture indicating praise, a virtual character performing a praise gesture, or a dynamic picture indicating praise.
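The feature-to-picture examples above can be sketched as a lookup table. The file names and feature keys are hypothetical illustrations, not assets defined by the disclosure:

```python
# Hedged sketch of matching a play-option picture to voice features.
# Picture names and feature keys are illustrative assumptions.
PICTURE_BY_FEATURE = {
    ("gender", "female"): "pink_loudspeaker.png",
    ("gender", "male"): "blue_loudspeaker.png",
    ("age", "child"): "child_speaking.gif",
    ("emotion", "happy"): "happy_face.png",
    ("emotion", "angry"): "angry_face.png",
    ("content", "praise"): "thumbs_up.gif",
}

def pick_picture(features: dict) -> str:
    """Return the first matching picture; fall back to a default icon."""
    for key, value in features.items():
        pic = PICTURE_BY_FEATURE.get((key, value))
        if pic:
            return pic
    return "default_speaker.png"

print(pick_picture({"gender": "female", "emotion": "happy"}))  # → pink_loudspeaker.png
```

A fixed lookup is the simplest policy; the disclosure's fuller scheme (steps 603-604) instead scores each candidate picture against the features and selects the best match.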
Of course, in order to convey the playing nature of the play option, each displayed play option includes a loudspeaker image. For example, when the play option is displayed as an expression picture, the picture content is an expression holding a loudspeaker. As another example, as shown in fig. 5, the play option is presented as a cartoon girl holding a loudspeaker.
In some embodiments, the terminal displays the play option based on picture information obtained from the server. The voice comment display method then further includes: acquiring, from the server, picture information corresponding to the voice comment, the picture information being picture information obtained by the server that matches the voice features of the voice comment. The terminal displaying the play option of the voice comment includes: displaying the play option in picture form based on the picture information. The terminal may obtain the picture information corresponding to the voice comment when pulling the comment display interface from the server in response to the comment viewing operation.
Another point to be noted is that the embodiments of the present disclosure describe the display styles of the text information of the voice comment and of the text comment only by way of the example "displaying the text comment according to the third display style and displaying the text information of the voice comment according to the second display style". In another embodiment, the display style of the text information of the voice comment may be the same as that of the text comment, and the user distinguishes the voice comment from the text comment by whether a play option is present.
Another point to be noted is that the embodiments of the present disclosure describe the display style of the play option only by way of the example "displaying the play option of the voice comment according to the first display style, where the first display style matches the voice features of the voice comment". In another embodiment, the terminal may display the play option according to a fixed display style; the display style of the play option is not limited in the embodiments of the present disclosure.
In step 403, the terminal plays the voice comment in an audio manner in response to a play operation on the play option.
Playing the voice comment as audio means playing the audio information of the voice comment. In some embodiments, when the terminal pulls the comment display interface of the resource from the server in response to the comment viewing operation, the server issues the audio information of the voice comment to the terminal; the terminal then obtains the audio information locally and plays it in response to a play operation on the play option. Of course, the terminal may instead pull the audio information from the server only when the user wants to play the voice comment. That is, in response to a triggering operation on the play option, the terminal sends an audio acquisition request carrying the comment identifier of the voice comment to the server, so that the server issues the audio information of the voice comment based on the comment identifier; the terminal then receives and plays the audio information.
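Both retrieval paths can be sketched in one terminal-side handler: play from a local copy if present, otherwise issue the audio acquisition request. All names here are hypothetical stand-ins for the terminal's actual networking and playback code:

```python
# Sketch of step 403 under the assumption that audio may be cached
# locally or fetched on demand (function names are hypothetical).
AUDIO_CACHE: dict = {}

def fetch_audio_from_server(comment_id: str) -> bytes:
    """Stand-in for the audio acquisition request carrying the comment id."""
    return b"audio-bytes-for-" + comment_id.encode()

def on_play(comment_id: str) -> bytes:
    """Return the audio to play: local copy first, else pull from server."""
    if comment_id not in AUDIO_CACHE:
        AUDIO_CACHE[comment_id] = fetch_audio_from_server(comment_id)
    return AUDIO_CACHE[comment_id]

print(on_play("c1"))  # → b'audio-bytes-for-c1'
```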
According to the voice comment display method provided by the embodiments of the present disclosure, voice comments are displayed as text accompanied by a play option, so that a user can conveniently and clearly learn what a voice comment says, and can quickly and intuitively understand the comment content. If the user is interested in the voice comment, the voice comment can also be played as audio; because audio has higher expressiveness and appeal, the interactive effect of comments is enhanced and human-computer interaction efficiency is improved.
In addition, considering that both voice comments and text comments are displayed in text form and may therefore be hard for the user to tell apart, the terminal displays the text information of the voice comment and the text comment according to different display styles, so that the user can distinguish them more quickly and intuitively, improving human-computer interaction efficiency.
Moreover, when the terminal displays the play option, the display style of the play option matches the voice features of the voice comment. The user can therefore acquire more information about the voice comment more quickly and intuitively through the display style of the play option, improving human-computer interaction efficiency.
Fig. 6 is a flowchart illustrating a voice comment processing method, as shown in fig. 6, performed by a server, according to an exemplary embodiment, including the following steps.
In step 601, the server receives a speech comment for any resource sent by the terminal, where the speech comment includes audio information.
In some embodiments, as shown in fig. 7, the terminal displays a comment display interface of any resource, where the interface includes a voice comment input option 701. The user may long-press the voice comment input option 701 to input a voice comment; recording completes when the user releases the option or the input duration reaches the maximum voice comment duration. The terminal may automatically send the recorded voice comment to the server, or may send it after the user confirms sending. The embodiments of the present disclosure are not limited in this regard.
In step 602, the server performs a voice recognition process on the audio information to obtain text information of the voice comment.
After receiving the voice comments, the server performs voice recognition on the audio information of the voice comments so as to convert the voice comments into characters. The server may perform speech recognition processing on the audio information by using any algorithm or any speech recognition model, which is not limited in the embodiments of the present disclosure.
In step 603, the server obtains a voice feature of the voice comment based on at least one of the audio information and the text information of the voice comment.
In some embodiments, the server obtains the voice characteristics of the voice comment based on at least one of the audio information and the text information of the voice comment, including at least one of:
(1) And carrying out first tone recognition on the audio information of the voice comments to obtain a first tone characteristic used for representing gender.
In some embodiments, the server may perform tone recognition on the audio information of the voice comment through a gender recognition model. Optionally, the gender identification model is used for identifying the gender corresponding to the audio information based on the tone information in the audio information. Optionally, the gender recognition model is trained based on the sample audio information and the sample gender corresponding to the sample audio information.
(2) And carrying out second tone recognition on the audio information of the voice comments to obtain second tone characteristics used for representing the age group.
In some embodiments, the server may perform tone recognition on the audio information of the voice comment through the age-segment recognition model. Optionally, the age group identification model is used for identifying the age group corresponding to the audio information based on tone color information in the audio information. Optionally, the age group identification model is trained based on the sample audio information and a sample age group corresponding to the sample audio information.
(3) And carrying out emotion recognition on the audio information of the voice comments to obtain a first emotion feature.
In some embodiments, the server may emotion-identify the audio information of the voice comment through an emotion-identifying model. Optionally, the emotion recognition model is used for recognizing emotion corresponding to the audio information based on tone information in the audio information. Optionally, the emotion recognition model is trained based on the sample audio information and sample emotion information corresponding to the sample audio information.
(4) And carrying out emotion recognition on the text information of the voice comments to obtain second emotion characteristics.
In some embodiments, the server may perform emotion recognition on the text information of the voice comment through an emotion recognition model. Optionally, the emotion recognition model is used for recognizing emotion corresponding to the text information based on semantic information of the text information. Optionally, the emotion recognition model is trained based on the sample text information and the sample emotion information corresponding to the sample text information.
In other embodiments, the server may perform keyword matching on the text information of the voice comment through a correspondence between the keywords and the emotions, and determine the emotion as the second emotion feature when the text information of the voice comment includes a keyword corresponding to any emotion.
(5) And carrying out keyword matching on the text information of the voice comment based on a plurality of preset keywords to obtain keywords matched with the text information in the keywords, and taking the obtained keywords as the content characteristics of the voice comment.
The preset keywords may include words that express the reviewer's main idea, such as "praise", "awesome", and "great". When the text information includes any keyword, or a word similar to a keyword, that keyword may be used as the content feature of the voice comment.
In step 604, the server acquires picture information matching the voice feature from the plurality of picture information as picture information of the voice comment.
The plurality of pieces of picture information in step 604 may be preset pieces of picture information, and the plurality of pieces of picture information may be pieces of picture information of different expressions, pieces of picture information of different virtual objects, and pieces of picture information of different dynamic pictures, which are not limited in the embodiment of the present disclosure.
As can be seen from the description of step 603, there may be one or a plurality of speech features of the speech comment. In some embodiments, the voice feature of the voice comment is one, and the server obtains, from a plurality of pieces of picture information, picture information matching the voice feature as the picture information of the voice comment, including: the server acquires the matching parameters of each piece of picture information and the voice feature, and takes the picture information with the highest matching parameters as the picture information of the voice comment.
In other embodiments, the speech features of the speech comment are multiple; the server acquires picture information matched with the voice characteristics from a plurality of picture information as picture information of voice comments, and the method comprises the following steps: for any picture information in the plurality of picture information, determining a matching parameter of each voice feature in the plurality of voice features of the voice comment and the picture information; determining matching parameters of the voice comments and the picture information based on the matching parameters of each voice feature and the picture information; and taking the picture information with the highest matching parameter with the voice comment as the picture information of the voice comment.
The matching parameters of the voice feature and the picture information are used for representing the matching degree of the voice feature and the picture information. Similarly, the matching parameters of the voice comment and the picture information are used for indicating the matching degree of the voice comment and the picture information.
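The multi-feature case described above (score each candidate picture against every voice feature, aggregate, take the highest-scoring picture) can be sketched as follows. The `match` scoring function is a stand-in for any real matching model, and summing the per-feature matching parameters is one possible aggregation, assumed here for illustration:

```python
# Hedged sketch of selecting picture information when a voice comment
# has multiple voice features (step 604, multi-feature case).
def match(feature: str, picture: str) -> float:
    """Stand-in matching parameter between one feature and one picture."""
    return 1.0 if feature in picture else 0.0

def best_picture(features: list[str], pictures: list[str]) -> str:
    def comment_score(picture: str) -> float:
        # Aggregate the per-feature matching parameters (here: a sum).
        return sum(match(f, picture) for f in features)
    # The picture with the highest matching parameter for the whole
    # comment is used as the picture information of the voice comment.
    return max(pictures, key=comment_score)

pictures = ["female_happy.png", "male_angry.png", "female_neutral.png"]
print(best_picture(["female", "happy"], pictures))  # → female_happy.png
```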
It should be noted that the embodiments of the present disclosure describe the manner of obtaining picture information only by way of the example in which the server obtains, from a plurality of pieces of picture information, the picture information matching the voice feature as the picture information of the voice comment. In yet another embodiment, the play options of different voice comments are the same, and the voice comment processing method further includes: acquiring preset picture information as the picture information of the voice comment.
In step 605, the server responds to the comment acquisition request for the resource sent by any terminal, and sends comment information of the resource to the terminal, wherein the comment information comprises text information and picture information of the voice comment, so that the terminal displays the text information of the voice comment and displays play options of the voice comment based on the picture information.
Note that, the comment information may or may not include audio information of a voice comment. When the comment information comprises the audio information of the voice comment, the terminal responds to the triggering operation of the playing option of the voice comment, and the audio information of the voice comment can be obtained locally and played. When the comment information does not include the audio information of the voice comment, the terminal responds to the triggering operation of the playing option of the voice comment, and an audio acquisition request can be sent to the server, wherein the audio acquisition request carries the comment identification of the voice comment, so that the server acquires the audio information of the voice comment based on the comment identification and sends the acquired audio information to the terminal. And the terminal receives the audio information and plays the audio information.
Another point to be noted is that the server in the embodiment of the present disclosure may be a server cluster, for example, the server includes: the system comprises a comment server, an audio-to-text server, a comment materialization strategy server and a comment storage server.
For example, as shown in fig. 8, the terminal uploads the audio information of the voice comment to the comment server. After receiving the voice comment, the comment server sends the audio information to the audio-to-text server, which converts it into text information and returns the converted text to the comment server. The comment server then sends at least one of the audio information and the text information to the comment materialization strategy server, which obtains matching picture information based on the received information and returns it to the comment server. The comment server correspondingly stores the audio information, text information, and picture information of the voice comment in the comment storage server, and also sends the text information and picture information to the terminal, so that the terminal updates the comment display interface and displays the voice comment just published by the user.
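The server-cluster flow in fig. 8 can be summarized as the sketch below, where each server is modeled as a function. All names are hypothetical, and the ASR and picture-matching results are fixed stand-ins rather than real model outputs:

```python
# Illustrative sketch of the fig. 8 flow: comment server orchestrates
# the audio-to-text server, the materialization strategy server, and
# the comment storage server. All names/results are hypothetical.
def audio_to_text_server(audio: bytes) -> str:
    return "praise"                      # stand-in ASR result

def materialization_strategy_server(audio: bytes, text: str) -> str:
    return "thumbs_up.gif"               # stand-in picture matching

STORAGE: dict = {}                       # stand-in comment storage server

def comment_server(comment_id: str, audio: bytes) -> dict:
    text = audio_to_text_server(audio)
    picture = materialization_strategy_server(audio, text)
    # Persist audio, text, and picture together for later comment requests.
    STORAGE[comment_id] = {"audio": audio, "text": text, "picture": picture}
    # Text and picture information are returned so the terminal can
    # update its comment display interface immediately.
    return {"text": text, "picture": picture}

print(comment_server("c1", b"\x00"))
```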
According to the voice comment processing method provided by the embodiments of the present disclosure, when the terminal displays a voice comment, it is displayed in text form, so that the user can conveniently and clearly learn what the voice comment says and can quickly and intuitively understand the comment content. If the user is interested in the voice comment, the voice comment can be played as audio; because audio has higher expressiveness and appeal, the interactive effect of comments is improved and human-computer interaction efficiency is improved.
Fig. 9 is a block diagram illustrating a structure of a voice comment presentation apparatus according to an exemplary embodiment.
Referring to fig. 9, the voice comment exhibiting apparatus includes:
a display unit 901 configured to perform a comment display interface that displays any resource in response to a comment viewing operation on the resource;
the display unit 901 is further configured to execute displaying text information of a voice comment in a comment display interface of the resource, and displaying a play option of the voice comment, where the text information includes a text converted from the voice comment;
a playing unit 902 configured to perform playing of the voice comment in an audio manner in response to a playing operation of the playing option.
In some embodiments, the presenting unit 901 is configured to execute presenting the play option of the voice comment according to a first display style, where the first display style matches the voice feature of the voice comment.
In some embodiments, the speech features include at least one of a first timbre feature for representing gender, a second timbre feature for representing age, an emotional feature, and a content feature.
In some embodiments, the display unit 901 is configured to display the play option of the voice comment as an expression picture, a virtual object, or a dynamic picture.
As shown in fig. 10, in some embodiments, the voice comment presentation apparatus further includes:
an obtaining unit 903 configured to obtain, from a server, picture information corresponding to the voice comment, the picture information being picture information that matches a voice feature of the voice comment and is obtained by the server;
the display unit 901 is configured to display the play options in a picture form based on the picture information.
In some embodiments, the display unit 901 is configured to perform displaying the text information of the voice comment according to the second display style;
The display unit 901 is further configured to execute displaying text comments according to a third display style in a comment display interface of the resource;
wherein the second display style is different from the third display style.
In some embodiments, the font of the text information of the voice comment differs from the font of the text comment; or,
the font style of the text information of the voice comment differs from the font style of the text comment; or,
the font size of the text information of the voice comment differs from the font size of the text comment.
The specific manner in which each unit performs the operation has been described in detail in the embodiments of the related method with respect to the voice comment presentation apparatus in the above embodiments, and will not be described in detail here.
Fig. 11 is a block diagram showing a structure of a voice comment processing apparatus according to an exemplary embodiment. Referring to fig. 11, the voice comment processing apparatus includes:
a receiving unit 1101 configured to execute a voice comment for any resource transmitted by a receiving terminal, the voice comment including audio information;
a recognition unit 1102 configured to perform a voice recognition process on the audio information to obtain text information of the voice comment;
The sending unit 1103 is configured to send comment information of the resource to any terminal in response to receiving a comment acquisition request for the resource sent by the terminal, where the comment information includes text information and picture information of the voice comment, so that the terminal displays the text information of the voice comment, and displays a play option of the voice comment based on the picture information.
As shown in fig. 12, in some embodiments, the voice comment processing apparatus further includes:
an acquisition unit 1104 configured to perform acquisition of preset picture information as the picture information of the voice comment; or,
the obtaining unit 1104 is configured to obtain a voice feature of the voice comment based on at least one of the audio information and the text information of the voice comment, and obtain, from a plurality of pieces of picture information, picture information matching the voice feature as the picture information of the voice comment.
In some embodiments, the identification unit 1102 is configured to perform at least one of:
performing first tone recognition on the audio information of the voice comment to obtain a first tone characteristic used for representing gender;
Performing second tone recognition on the audio information of the voice comment to obtain second tone characteristics used for representing the age group;
carrying out emotion recognition on the audio information of the voice comment to obtain a first emotion feature;
carrying out emotion recognition on the text information of the voice comment to obtain a second emotion feature;
and carrying out keyword matching on the text information of the voice comment based on a plurality of preset keywords to obtain keywords matched with the text information in the keywords, and taking the obtained keywords as the content characteristics of the voice comment.
In some embodiments, the speech features of the speech comment are a plurality; the acquiring unit 1104 is configured to perform determining, for any one of the plurality of pieces of picture information, a matching parameter of each of a plurality of voice features of the voice comment and the picture information; determining matching parameters of the voice comments and the picture information based on the matching parameters of each voice feature and the picture information; and taking the picture information with the highest matching parameter with the voice comment as the picture information of the voice comment.
With respect to the voice comment processing apparatus in the above-described embodiment, the specific manner in which each unit performs an operation has been described in detail in the embodiment of the related method, and will not be explained in detail here.
Fig. 13 is a block diagram illustrating a structure of a terminal according to an exemplary embodiment. In some embodiments, the terminal 1300 is a desktop computer, a notebook computer, a tablet computer, a smart phone, or another terminal. The terminal 1300 may also be referred to by other names such as user device, portable terminal, laptop terminal, or desktop terminal.
In general, the terminal 1300 includes: a processor 1301, and a memory 1302.
In some embodiments, the processor 1301 includes one or more processing cores, such as a 4-core processor or an 8-core processor. In some embodiments, the processor 1301 is implemented in hardware in at least one of a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). In some embodiments, the processor 1301 includes a main processor and a coprocessor: the main processor, also referred to as a CPU (Central Processing Unit), is a processor for processing data in an awake state; the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 1301 is integrated with a GPU (Graphics Processing Unit) responsible for rendering the content to be shown on the display screen. In some embodiments, the processor 1301 also includes an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
In some embodiments, the memory 1302 includes one or more computer-readable storage media, which are non-transitory. In some embodiments, the memory 1302 also includes high-speed random access memory and non-volatile memory, such as one or more disk storage devices or flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in the memory 1302 stores executable instructions that are executed by the processor 1301 to implement the voice comment display method provided by the method embodiments of the present disclosure.
In some embodiments, the terminal 1300 optionally further includes a peripheral interface 1303 and at least one peripheral. In some embodiments, the processor 1301, the memory 1302, and the peripheral interface 1303 are connected by a bus or signal line, and each peripheral is connected to the peripheral interface 1303 via a bus, signal line, or circuit board. Specifically, the peripheral includes at least one of a radio frequency circuit 1304, a display screen 1305, a camera assembly 1306, an audio circuit 1307, a positioning assembly 1308, and a power supply 1309.
The peripheral interface 1303 is used to connect at least one I/O (Input/Output)-related peripheral to the processor 1301 and the memory 1302. In some embodiments, the processor 1301, the memory 1302, and the peripheral interface 1303 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1301, the memory 1302, and the peripheral interface 1303 are implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 1304 is used to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. The radio frequency circuit 1304 communicates with a communication network and other communication devices via electromagnetic signals, converting an electrical signal into an electromagnetic signal for transmission or converting a received electromagnetic signal into an electrical signal. In some embodiments, the radio frequency circuit 1304 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so on. In some embodiments, the radio frequency circuit 1304 communicates with other terminals via at least one wireless communication protocol, including but not limited to: the World Wide Web, metropolitan area networks, intranets, mobile communication networks of various generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1304 further includes NFC (Near Field Communication)-related circuits, which is not limited by the present disclosure.
The display screen 1305 is used to display a UI (User Interface). In some embodiments, the UI includes graphics, text, icons, video, and any combination thereof. When the display screen 1305 is a touch display, it also has the ability to capture touch signals at or above its surface. In some embodiments, the touch signal is input to the processor 1301 as a control signal for processing. At this time, the display screen 1305 is also used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there is one display screen 1305, disposed on the front panel of the terminal 1300; in other embodiments, there are at least two display screens 1305, disposed on different surfaces of the terminal 1300 or in a folded design; in still other embodiments, the display screen 1305 is a flexible display disposed on a curved or folded surface of the terminal 1300. The display screen 1305 can even be arranged in an irregular, non-rectangular pattern, i.e., a shaped screen. In some embodiments, the display screen 1305 is prepared using an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode), or the like.
The camera assembly 1306 is used to capture images or video. In some embodiments, the camera assembly 1306 includes a front camera and a rear camera. Typically, the front camera is disposed on the front panel of the terminal and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, or the main camera and the wide-angle camera are fused to realize panoramic shooting, VR (Virtual Reality) shooting, or other fused shooting functions. In some embodiments, the camera assembly 1306 also includes a flash, which is a single color temperature flash in some embodiments and a dual color temperature flash in others. A dual color temperature flash refers to a combination of a warm light flash and a cold light flash, used for light compensation under different color temperatures.
In some embodiments, the audio circuit 1307 includes a microphone and a speaker. The microphone is used to collect sound waves of users and the environment, convert the sound waves into electrical signals, and input them to the processor 1301 for processing or to the radio frequency circuit 1304 for voice communication. For stereo acquisition or noise reduction purposes, in some embodiments there are a plurality of microphones, disposed at different portions of the terminal 1300. In some embodiments, the microphone is an array microphone or an omnidirectional pickup microphone. The speaker is used to convert electrical signals from the processor 1301 or the radio frequency circuit 1304 into sound waves. In some embodiments, the speaker is a conventional thin-film speaker; in others, it is a piezoelectric ceramic speaker. A piezoelectric ceramic speaker can convert an electrical signal not only into sound waves audible to humans but also into sound waves inaudible to humans, for ranging and other purposes. In some embodiments, the audio circuit 1307 also includes a headphone jack.
The positioning component 1308 is used to locate the current geographic location of the terminal 1300 to enable navigation or LBS (Location Based Service). In some embodiments, the positioning component 1308 is based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
The power supply 1309 is used to power the various components in the terminal 1300. In some embodiments, the power supply 1309 is an alternating current supply, a direct current supply, a disposable battery, or a rechargeable battery. When the power supply 1309 includes a rechargeable battery, the rechargeable battery is a wired rechargeable battery, charged through a wired line, or a wireless rechargeable battery, charged through a wireless coil. In some embodiments, the rechargeable battery also supports fast-charging technology.
In some embodiments, terminal 1300 also includes one or more sensors 1310. The one or more sensors 1310 include, but are not limited to: acceleration sensor 1311, gyroscope sensor 1312, pressure sensor 1313, optical sensor 1314, and proximity sensor 1315.
In some embodiments, the acceleration sensor 1311 detects the magnitude of acceleration on the three coordinate axes of the coordinate system established with the terminal 1300. For example, the acceleration sensor 1311 is configured to detect the components of gravitational acceleration on the three coordinate axes. In some embodiments, the processor 1301 controls the display screen 1305 to display the user interface in either a landscape view or a portrait view based on the gravitational acceleration signal acquired by the acceleration sensor 1311. In some embodiments, the acceleration sensor 1311 is also used to collect motion data of a game or a user.
In some embodiments, the gyroscope sensor 1312 detects the body orientation and rotation angle of the terminal 1300, and cooperates with the acceleration sensor 1311 to collect the user's 3D actions on the terminal 1300. Based on the data collected by the gyroscope sensor 1312, the processor 1301 can implement functions such as motion sensing (e.g., changing the UI according to a tilting operation by the user), image stabilization during shooting, game control, and inertial navigation.
In some embodiments, the pressure sensor 1313 is disposed on a side frame of the terminal 1300 and/or below the display screen 1305. When the pressure sensor 1313 is disposed on a side frame of the terminal 1300, it can detect the user's grip signal on the terminal 1300, and the processor 1301 performs left-right hand recognition or quick operations according to the grip signal collected by the pressure sensor 1313. When the pressure sensor 1313 is disposed below the display screen 1305, the processor 1301 controls operability controls on the UI according to the user's pressure operations on the display screen 1305. The operability controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The optical sensor 1314 is used to collect ambient light intensity. In one embodiment, the processor 1301 controls the display brightness of the display screen 1305 based on the ambient light intensity collected by the optical sensor 1314: when the ambient light intensity is high, the display brightness is increased; when the ambient light intensity is low, the display brightness is decreased. In another embodiment, the processor 1301 also dynamically adjusts the shooting parameters of the camera assembly 1306 based on the ambient light intensity collected by the optical sensor 1314.
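As an illustration only, the brightness control described above might look like the following sketch; the lux thresholds and the brightness range are assumed values for demonstration, not taken from the embodiment.

```python
# Hypothetical brightness mapping: thresholds and range are assumptions.

def display_brightness(ambient_lux):
    """Map ambient light intensity (lux) to a display brightness in [0.1, 1.0]."""
    LOW, HIGH = 50.0, 1000.0       # assumed "dim" and "bright" thresholds
    if ambient_lux >= HIGH:        # bright surroundings: turn brightness up
        return 1.0
    if ambient_lux <= LOW:         # dim surroundings: turn brightness down
        return 0.1
    # in between, interpolate linearly between the two extremes
    return 0.1 + 0.9 * (ambient_lux - LOW) / (HIGH - LOW)

print(display_brightness(2000))  # → 1.0
```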
The proximity sensor 1315, also referred to as a distance sensor, is typically provided on the front panel of the terminal 1300 and is used to collect the distance between the user and the front of the terminal 1300. In one embodiment, when the proximity sensor 1315 detects that the distance between the user and the front of the terminal 1300 gradually decreases, the processor 1301 controls the display screen 1305 to switch from the on-screen state to the off-screen state; when the proximity sensor 1315 detects that the distance gradually increases, the processor 1301 controls the display screen 1305 to switch from the off-screen state to the on-screen state.
Those skilled in the art will appreciate that the structure shown in fig. 13 does not limit the terminal 1300, which can include more or fewer components than shown, combine certain components, or employ a different arrangement of components.
Fig. 14 is a schematic structural diagram of a server according to an embodiment of the present disclosure. The server 1400 may vary considerably depending on configuration or performance, and may include one or more processors (Central Processing Units, CPU) 1401 and one or more memories 1402, where at least one computer program is stored in the memory 1402 and is loaded and executed by the processor 1401 to implement the voice comment processing method provided by the above embodiments. Of course, the server may also have a wired or wireless network interface, a keyboard, an input/output interface, and other components for implementing the functions of the device, which are not described here.
In an exemplary embodiment, a computer-readable storage medium is also provided, e.g., a memory comprising instructions executable by a processor of a terminal to perform the voice comment display method or the voice comment processing method in the above method embodiments. In some embodiments, the computer-readable storage medium is a ROM (Read-Only Memory), a RAM (Random Access Memory), a CD-ROM (Compact Disc Read-Only Memory), a magnetic tape, a floppy disk, an optical data storage device, or the like.
In an exemplary embodiment, a computer program product is also provided, comprising a computer program which, when executed by a processor, implements the voice comment display method or the voice comment processing method in the above method embodiments.
In some embodiments, the computer program related to the embodiments of the present disclosure is deployed to be executed on one electronic device, on a plurality of electronic devices located at one site, or on a plurality of electronic devices distributed at a plurality of sites and interconnected by a communication network; a plurality of electronic devices distributed at a plurality of sites and interconnected by a communication network may constitute a blockchain system. The electronic device may be provided as a terminal.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following the general principles thereof and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (17)

1. A voice comment display method, characterized by comprising:
in response to a comment viewing operation on any resource, displaying a comment display interface of the resource;
displaying text information of a voice comment in the comment display interface of the resource, and displaying a play option of the voice comment, wherein the text information comprises characters converted from the voice comment;
and in response to a play operation on the play option, playing the voice comment in an audio manner.
2. The voice comment display method according to claim 1, wherein the displaying a play option of the voice comment comprises:
displaying the play option of the voice comment in a first display style, wherein the first display style matches a voice feature of the voice comment.
3. The method of claim 2, wherein the voice features include at least one of a first tone feature for representing gender, a second tone feature for representing age, an emotional feature, and a content feature.
4. The voice comment display method according to claim 1, wherein the displaying a play option of the voice comment comprises:
displaying the play option of the voice comment as an expression picture, a virtual object, or a dynamic picture.
5. The voice comment display method according to any one of claims 2 to 4, further comprising:
acquiring picture information corresponding to the voice comment from a server, wherein the picture information is picture information that matches the voice feature of the voice comment and is acquired by the server;
wherein the displaying a play option of the voice comment comprises:
displaying the play option in picture form based on the picture information.
6. The voice comment display method according to claim 1, wherein the displaying text information of the voice comment comprises:
displaying the text information of the voice comment in a second display style;
the voice comment display method further comprising:
displaying a text comment in a third display style in the comment display interface of the resource;
wherein the second display style is different from the third display style.
7. The voice comment display method according to claim 6, wherein:
the text information of the voice comment differs from the text comment in font; or,
the text information of the voice comment differs from the text comment in font size.
8. A voice comment processing method, characterized by comprising:
receiving a voice comment for any resource sent by a terminal, wherein the voice comment comprises audio information;
performing voice recognition processing on the audio information to obtain text information of the voice comment;
and in response to receiving a comment acquisition request for the resource sent by any terminal, sending comment information of the resource to the terminal, wherein the comment information comprises the text information and picture information of the voice comment, so that the terminal displays the text information of the voice comment and displays a play option of the voice comment based on the picture information.
9. The voice comment processing method according to claim 8, wherein before the sending comment information of the resource to the terminal, the voice comment processing method further comprises:
acquiring preset picture information as the picture information of the voice comment; or,
acquiring a voice feature of the voice comment based on at least one of the audio information and the text information of the voice comment, and acquiring, from a plurality of pieces of picture information, picture information matching the voice feature as the picture information of the voice comment.
10. The voice comment processing method according to claim 9, wherein the acquiring a voice feature of the voice comment based on at least one of the audio information and the text information of the voice comment comprises at least one of:
performing first tone recognition on the audio information of the voice comment to obtain a first tone feature representing gender;
performing second tone recognition on the audio information of the voice comment to obtain a second tone feature representing an age group;
performing emotion recognition on the audio information of the voice comment to obtain a first emotion feature;
performing emotion recognition on the text information of the voice comment to obtain a second emotion feature;
and performing keyword matching on the text information of the voice comment based on a plurality of preset keywords to obtain, among the preset keywords, keywords matching the text information, the obtained keywords serving as a content feature of the voice comment.
11. The voice comment processing method according to claim 9, wherein there are a plurality of voice features of the voice comment, and the acquiring, from a plurality of pieces of picture information, picture information matching the voice feature as the picture information of the voice comment comprises:
for any one of the plurality of pieces of picture information, determining a matching parameter between each of the plurality of voice features of the voice comment and the picture information;
determining a matching parameter between the voice comment and the picture information based on the matching parameter between each voice feature and the picture information;
and taking the picture information having the highest matching parameter with the voice comment as the picture information of the voice comment.
12. A voice comment display apparatus, characterized by comprising:
a display unit configured to display, in response to a comment viewing operation on any resource, a comment display interface of the resource;
the display unit being further configured to display text information of a voice comment in the comment display interface of the resource and to display a play option of the voice comment, wherein the text information comprises characters converted from the voice comment;
and a playing unit configured to play the voice comment in an audio manner in response to a play operation on the play option.
13. A voice comment processing apparatus, characterized by comprising:
a receiving unit configured to receive a voice comment for any resource sent by a terminal, wherein the voice comment comprises audio information;
a recognition unit configured to perform voice recognition processing on the audio information to obtain text information of the voice comment;
and a sending unit configured to send, in response to receiving a comment acquisition request for the resource sent by any terminal, comment information of the resource to the terminal, wherein the comment information comprises the text information and picture information of the voice comment, so that the terminal displays the text information of the voice comment and displays a play option of the voice comment based on the picture information.
14. A terminal, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the voice comment display method of any one of claims 1 to 7.
15. A server, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the speech comment processing method of any one of claims 8 to 11.
16. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by a processor of a terminal, enable the terminal to perform the voice comment display method of any one of claims 1 to 7; or, when executed by a processor of a server, enable the server to perform the voice comment processing method of any one of claims 8 to 11.
17. A computer program product comprising a computer program which, when executed by a processor, implements the voice comment display method of any one of claims 1 to 7 or the voice comment processing method of any one of claims 8 to 11.
CN202311228020.XA 2023-09-21 2023-09-21 Voice comment display method and voice comment processing method Pending CN117150071A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311228020.XA CN117150071A (en) 2023-09-21 2023-09-21 Voice comment display method and voice comment processing method

Publications (1)

Publication Number Publication Date
CN117150071A true CN117150071A (en) 2023-12-01

Family

ID=88884148

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311228020.XA Pending CN117150071A (en) 2023-09-21 2023-09-21 Voice comment display method and voice comment processing method

Country Status (1)

Country Link
CN (1) CN117150071A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination