CN107516533A

CN107516533A - A kind of session information processing method, device, electronic equipment

Info

Publication number: CN107516533A
Application number: CN201710555403.6A
Authority: CN
Inventors: 周琪
Original assignee: Alibaba Group Holding Ltd
Current assignee: Advanced New Technologies Co Ltd; Advantageous New Technologies Co Ltd
Priority date: 2017-07-10
Filing date: 2017-07-10
Publication date: 2017-12-26
Also published as: TWI671739B; WO2019011185A1; TW201909171A

Abstract

This specification embodiment discloses a kind of session information processing method, device, electronic equipment.Methods described includes：Pending voice messaging, the emotional information of the identification pending voice messaging expression can be obtained.Then corresponding ornamental arrangement information can be transferred according to the information content including the emotional content, and then the display carrier of the pending voice messaging is rendered using the ornamental arrangement information, such as audio-frequency information bar, the audio-frequency information bar after then displaying renders.

Description

Session information processing method and device and electronic equipment

Technical Field

The embodiment of the specification belongs to the technical field of information data processing, and particularly relates to a session information processing method, a session information processing device and electronic equipment.

Background

With the rapid development of the internet and intelligent terminals, the chat interaction modes of users are also diversified. Such as text, voice, video chat, etc. in instant messaging software.

The voice chat is one of the chat modes which are frequently used and important for current users, and mainly carries out information interaction between two parties or more parties by taking audio as an information carrier. Because of the carrier characteristics of the audio information, the language information in the voice chat can not be directly read like text characters, and the information content can be quickly perceived through vision. In the conversation chat, the audio information needs to be played to know the specific information content or meaning, for example, in the conversation interface shown in fig. 1, the language information of the voice chat is presented in a single bar style in the conversation, and the user needs to click and play the voice information to obtain the meaning. For the user, the audio information displayed in the conversation interface is equivalent to a black box, and the user does not know the information content expressed by the audio information at all, and usually needs to play and listen by clicking the audio information. Therefore, a session information processing method with more intuitive voice information content display, higher interest and better user experience is needed.

Disclosure of Invention

Embodiments of the present disclosure provide a method, an apparatus, and an electronic device for processing session information, which can perform additional description or decoration on an audio message body by analyzing keywords or emotional information in an audio, so that the content of the audio message can be visually displayed, thereby improving the interest of chat and improving user experience.

The session information processing method, device and electronic device provided by the embodiments of the present specification are implemented in the following manners:

a method of session information processing, the method comprising:

acquiring voice information to be processed, and identifying information content of the voice information to be processed, wherein the information content at least comprises emotion information expressed by the voice information to be processed;

acquiring decoration configuration information matched with the information content;

rendering the display carrier of the voice information to be processed by utilizing the decoration configuration information;

and displaying the rendered display carrier.

A method of session information processing, the method comprising:

acquiring voice information to be processed, and identifying information content of the voice information to be processed, wherein the information content at least comprises keyword information contained in the voice information to be processed;

and displaying the rendered display carrier.

A method of session information processing, the method comprising:

acquiring voice information to be processed, and identifying the information content of the voice information to be processed, wherein the information content at least comprises one of keyword information contained in the voice information to be processed and emotion information expressed by the voice information to be processed;

and displaying the rendered display carrier.

A method of session information processing, the method comprising:

acquiring session information to be processed, and identifying the information content of the session information to be processed, wherein the information content at least comprises one of keyword information contained in the session information to be processed and emotion information expressed by the session information to be processed;

acquiring decoration configuration information matched with the identified information content;

rendering the display carrier of the session information to be processed by utilizing the decoration configuration information;

and displaying the rendered display carrier.

A session information processing apparatus, the apparatus comprising:

the emotion recognition module is used for acquiring voice information to be processed and recognizing the information content of the voice information to be processed, wherein the information content at least comprises emotion information expressed by the voice information to be processed;

the decoration matching module is used for acquiring decoration configuration information matched with the information content;

the rendering module is used for rendering the display carrier of the voice information to be processed by utilizing the decoration configuration information;

and the effect display module is used for displaying the rendered display carrier.

A session information processing apparatus, the apparatus comprising:

the keyword recognition module is used for acquiring voice information to be processed and recognizing the information content of the voice information to be processed, wherein the information content at least comprises keyword information contained in the voice information to be processed;

A session information processing apparatus, the apparatus comprising:

the information content identification module is used for acquiring voice information to be processed and identifying the information content of the voice information to be processed, wherein the information content at least comprises one of keyword information contained in the voice information to be processed and emotion information expressed by the voice information to be processed;

A session information processing apparatus, the apparatus comprising:

the information content identification module is used for acquiring the session information to be processed and identifying the information content of the session information to be processed, wherein the information content at least comprises one of keyword information contained in the session information to be processed and emotion information expressed by the session information to be processed;

the decoration matching module is used for acquiring decoration configuration information matched with the identified information content;

the rendering module is used for rendering the display carrier of the session information to be processed by utilizing the decoration configuration information;

A session information processing apparatus comprising a processor and a memory for storing processor-executable instructions that when executed by the processor implement:

and displaying the rendered display carrier.

An electronic device comprising at least one processor and a memory storing processor-executable instructions for a user, the instructions when executed by the processor implementing:

and displaying the rendered display carrier.

One or more embodiments of the present specification provide a session information processing method, apparatus, and electronic device, which may acquire to-be-processed voice information and identify emotion information expressed by the to-be-processed voice information. Then, corresponding decoration configuration information can be called according to the information content including the emotional content, and then a display carrier (for example, an audio information strip) of the voice information to be processed is rendered and displayed by using the decoration configuration information. Therefore, the display carrier is displayed after being rendered according to the emotion expressed by the voice information, so that the content of the voice information can be expressed or partially expressed, and the interestingness of the voice information in chatting is increased. By using the implementation scheme provided by the embodiment of the specification, the audio message body can be additionally described or decorated by analyzing the hotwords or the feelings in the audio, so that the content of the audio message can be visually displayed, the chatting interest is improved, and the user experience is improved.

Drawings

In order to more clearly illustrate the embodiments of the present specification or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only some embodiments described in the specification, and for those skilled in the art, other drawings can be obtained according to the drawings without any creative effort.

FIG. 1 is a diagram of a conventional conversation interface showing a voice message body;

fig. 2 is a schematic flowchart of an embodiment of a session information processing method provided in this specification;

FIG. 3 is a schematic diagram illustrating a scenario of an embodiment of a method according to an embodiment of the present disclosure;

FIG. 4 is a schematic diagram of a scenario of another embodiment of a method according to an embodiment of the present disclosure;

FIG. 5 is a schematic diagram illustrating a scenario of another embodiment of a method according to an embodiment of the present disclosure;

FIG. 6 is a schematic method flow diagram of another embodiment of the method provided herein;

FIG. 7 is a schematic process flow diagram in an application scenario of another embodiment of the method provided in the present specification;

FIG. 8 is a schematic method flow diagram of another embodiment of the method provided herein;

FIG. 9 is a block diagram of an embodiment of a session information processing apparatus provided in the present specification;

FIG. 10 is a block diagram of another embodiment of a session information processing apparatus provided in the present specification;

FIG. 11 is a block diagram illustrating an exemplary embodiment of the keyword recognition module in the apparatus provided in the present specification;

FIG. 12 is a block diagram of another embodiment of the apparatus provided herein;

FIG. 13 is a block diagram of another embodiment of the apparatus provided herein;

fig. 14 is a schematic structural diagram of an electronic device provided in this specification.

Detailed Description

In order to make the technical solutions in the present specification better understood, the technical solutions in one or more embodiments of the present specification will be clearly and completely described below with reference to the drawings in one or more embodiments of the present specification, and it is obvious that the described embodiments are only a part of the embodiments of the specification, and not all embodiments. All other embodiments obtained by a person skilled in the art based on one or more embodiments of the present disclosure without any inventive step should fall within the scope of the protection of the embodiments of the present disclosure.

Fig. 2 is a schematic flowchart of an embodiment of a session information processing method provided in this specification. Although the present specification provides the method steps or apparatus structures as shown in the following examples or figures, more or less steps or modules may be included in the method or apparatus structures based on conventional or non-inventive efforts. In the case of steps or structures which do not logically have the necessary cause and effect relationship, the execution order of the steps or the block structure of the apparatus is not limited to the execution order or the block structure shown in the embodiments or the drawings of the present specification. When the described method or module structure is applied to a device, a server or an end product in practice, the method or module structure according to the embodiment or the figures may be executed sequentially or in parallel (for example, in a parallel processor or multi-thread processing environment, or even in an implementation environment including distributed processing and server clustering).

Embodiments provided in the embodiments of the present disclosure may be used in various terminal devices with a session function, such as a mobile intelligent terminal, a vehicle-mounted device, a PC (personal computer), a wearable device, a server of a business system, and the like. The terminal device may be installed or constructed with an instant messaging application, a temporary session, and the like, and the embodiments provided by the embodiments of the present specification may be used in a session application scenario including the terminal device. The embodiment of the specification can identify keywords or emotions of the voice information and then display the voice information on the audio information display carrier. The method specifically comprises the step of judging the emotion of a voice message sender according to a recognition algorithm by analyzing a single piece or multiple pieces of voice messages. And then, secondary rendering can be performed on the style of the display carrier of the audio information according to the emotion information (such as changing the color of a message body, adding an Emoji expression head and the like), so that the chat interestingness and the user experience are greatly improved, and the expression information content of the audio information can be displayed or partially displayed. In a specific embodiment, as shown in fig. 2, in an embodiment of a session information processing method provided in this specification, the method may include:

s2: acquiring voice information to be processed, and identifying information content of the voice information to be processed, wherein the information content at least comprises emotion information expressed by the voice information to be processed.

In this embodiment, after a user inputs a piece of voice information in the session interface, the voice information may be acquired as the to-be-processed voice information for performing session information processing in this embodiment. The voice information input by the user may be further processed by the processing, and then rendered by using the decoration configuration information as described below and then displayed in the session interface, or may be displayed by first displaying a message body of the voice information (which may refer to a display carrier containing the voice information, such as an audio information bar in the session interface), and then rendering the message body and displaying an animation effect.

In an application scenario of this embodiment, after obtaining the voice information to be processed, information content in the voice information to be processed may be identified, where the identified information may at least include emotion information expressed by the voice information to be processed. The recognized emotional information may be specifically represented by a characteristic word or identifier, such as "happy", "sad", "surprised", or the like. The emotion information identification mode in the voice information to be processed can be processed by adopting various algorithms or models. In this embodiment, the analysis processing of the language information may include extracting audio information features, and the emotion of the speaker, such as joy, anger, and the like, may be identified through the identification algorithm. In an embodiment of the method provided in the embodiments of the present specification, the identifying information content of the to-be-processed speech information may include:

s20: and extracting audio characteristic data of the voice information to be processed, inputting the audio characteristic data into a pre-constructed emotion recognition model, and determining the emotion category to which the voice information to be processed belongs through the emotion recognition model.

In a specific implementation manner, feature signals such as speech rate, pitch, frequency and the like can be extracted from digitized audio signals, modeling can be performed on the feature information, model training can be performed through a plurality of samples, and models can be optimized. After the model training is finished, the model can be used as a classifier to classify the emotion reflected by the audio signal of the newly input voice information to be processed and determine the emotion information expressed by the voice information to be processed, such as joy and anger. The emotion recognition model can comprise a selected recognition algorithm or an algorithm which is obtained by training a network model through sample data and can recognize emotion classification to which the audio information belongs. For example, a KNN (K-nearest neighbor) classification algorithm, or a deep/convolutional neural network and its transformed/variant neural network may be selected.

In one or more embodiments provided in this specification, the extracted audio feature data includes at least one of emotion feature data and mood feature data of a living being of the speech information to be processed.

The emotional characteristic data of the creatures can comprise emotions such as anger, sadness, joy and the like, and the tone characteristic data can comprise characteristic data on tone states such as question, statement, question reversal, exclamation and the like. In one embodiment, extracting the audio feature data may include feature data that may reflect emotional aspects of the person, such as reflecting the emotion of the person based on the speed, pitch, frequency, and the like, in a specific example, the speed of the speech is fast, the pitch is high, and the emotion of the speaker is higher, whereas the speed of the speech is slow, the pitch is low, and the emotion of the speaker is lower. In other embodiments, the mood characteristic data may be obtained from a grammatical point of view as the audio characteristic data. E.g. from among the pending speech information, to decide whether the speaker is distressed or exclamated. Of course, the two methods may be combined, and the emotion of the inventor is judged by the extracted tone feature data and the order of the subject and the predicate. The type of the corresponding audio feature data to be extracted can be specifically set according to the application environment or the data processing requirement and the like.

S4: and acquiring decoration configuration information matched with the information content.

In an embodiment of the present specification, a decoration configuration database may be preset, and the decoration configuration database may store decoration configuration information for decorating a display carrier of audio information, such as color change of an audio information strip. The decoration configuration database can be arranged in a local terminal, such as a server local storage, or a client local storage, and can also be stored in a remote server. In this way, after the information content in the voice information to be processed is obtained, the decoration configuration information matched with the identified information content can be obtained from the decoration configuration database.

In an application scenario of this embodiment, the information content may include emotion information. In some embodiments, the identified informational content may include a plurality or variety of emotional information, such as "happy," "liked," "surprised," and the like, that tend to be positive emotional categories. Therefore, in one or more embodiments provided in this specification, the recognition results of the plurality of recognized emotion information may be combined and converted into corresponding request parameters. The corresponding decoration configuration information can be called through unified and normalized request parameters. Specifically, in an embodiment, the obtaining of the decoration configuration information matched with the identified information content may include:

s40: combining the recognition results of the emotion information in the voice information to be processed to generate a request parameter;

s42: and acquiring the decoration configuration information matched based on the request parameters.

As mentioned above, "happy", "liked" and "surprised" may be combined into request parameters including two emotion categories "happy" and "surpride", and then one or two corresponding decoration configuration information may be matched from the decoration configuration database. Of course, in other embodiments, the three emotion category information may be combined into one request parameter "happy" according to the set processing rule, and the decoration configuration information of "happy" may be retrieved according to the request parameter.

The decoration configuration information described in the embodiments of the present specification may include (but is not limited to) a text description, an Emoji expression, a background color, an animation effect, and the like. One emotion information or one request parameter may correspond to one or more decoration configuration information, and a corresponding mapping relationship may be set, for example, the emotion of "happy" may correspond to three decoration configuration information of "happy 1", "happy 2" and "happy 3", and a specific decoration configuration information used may be executed in sequence, or in a random or other defined manner.

Accordingly, in one or more embodiments of the present description, the decoration configuration information may include at least one of:

the method comprises the following steps of describing characters, expressing expressions, changing background colors, playing animation effects at preset positions of the display carrier and deforming animation effects of the display carrier.

By utilizing the decoration configuration information comprising the types, the interestingness of the conversation can be greatly increased, and the display mode of a single voice message bar is changed. And the set decoration configuration information is associated and matched with the expressed emotion information in the voice information, so that the content of the voice information can be displayed or partially displayed through the decoration configuration information, and the user experience is improved.

S6: and rendering the display carrier of the voice information to be processed by utilizing the decoration configuration information.

After the decoration configuration information is obtained, the display carrier can be rendered by using the decoration configuration information. The display carrier may include, for example, an audio information bar, or may include other message carriers used in the conversation interface to indicate that the information content sent by the user is voice information, such as an audio information bar, an audio information frame, an audio information window, an audio symbol, and the like. The audio information window may include a window for a user to play audio information, for example, after the user inputs voice information, the audio information window may be popped up at a designated position of the session interface or separately for playing voice information, and the audio information window may be rendered by using the decoration configuration information. Of course, the display carrier may also be in the form of audio symbols or other forms, such as speaker audio symbols, keys marked with "voice", voice "characters, and the like. Thus, in other embodiments of the method, the display carrier may comprise:

at least one of an audio information bar, an audio information window, an audio symbol.

S8: and displaying the rendered display carrier.

The rendered presentation carrier may be presented in a conversation interface. For example, the voice information bar is rendered by the decoration configuration information to present the animation effect of rainbow change.

In a specific implementation scenario, the animation effect may be set at a preset position of the display carrier to play, as shown in fig. 3, where fig. 3 is a scenario diagram of an embodiment of the method according to the embodiment of the present disclosure. In fig. 3, when it is recognized that the user is "question" based on the voice information input by the user, a short animation of "three question marks" may be played on the horizontal right side of the audio information bar, and each question mark symbol may pop up at predetermined time intervals.

In another implementation scenario, the decoration configuration information may include a morphing animation effect of the display carrier. As shown in fig. 4, fig. 4 is a schematic view of another embodiment of the method according to the embodiment of the present disclosure. In fig. 4, after the user inputs the voice message 1, if the emotion information of the user is "happy", the audio information bar of the voice message 1 input by the user may be set to an animation effect of "rainbow transform" of a multi-color specified frequency transform, so as to present a pleasant atmosphere corresponding to the emotion expressed by the voice message input by the user. Accordingly, if the emotional information is recognized as "sad", the decoration fitting information of "melancholy" in which blue is frequency-shifted slower than purple may be used, as shown in fig. 5. Fig. 5 is a schematic diagram of another embodiment of the method according to the embodiments of the present disclosure.

One or more embodiments of the present specification provide a session information processing method, which may acquire voice information to be processed, and identify emotion information expressed by the voice information to be processed. Then, corresponding decoration configuration information can be called according to the information content including the emotional content, and then a display carrier (for example, an audio information strip) of the voice information to be processed is rendered and displayed by using the decoration configuration information. Therefore, the display carrier is displayed after being rendered according to the emotion expressed by the voice information, so that the content of the voice information can be expressed or partially expressed, and the interestingness of the voice information in chatting is increased. By using the implementation scheme provided by the embodiment of the specification, the audio message body can be additionally described or decorated by analyzing the emotional information in the audio, so that the content of the audio information can be visually displayed, the chatting interest is improved, and the user experience is improved.

In the session information processing method provided in one or more embodiments of the present specification, the content of the voice information may also be identified from a dimension of keyword information included in the voice information, then corresponding decoration configuration information is matched according to the information content including the keyword, and a display carrier of the voice information is rendered and then displayed in a session interface. Specifically, as shown in fig. 6, fig. 6 is a schematic flow chart of another embodiment of the method provided in this specification. In another embodiment of the method provided in the present specification, the method may include:

s120: acquiring voice information to be processed, and identifying information content of the voice information to be processed, wherein the information content at least comprises keyword information contained in the voice information to be processed;

s140: acquiring decoration configuration information matched with the identified information content;

s160: rendering the display carrier of the voice information to be processed by utilizing the decoration configuration information;

s180: and displaying the rendered display carrier.

In the specific processing process, when it is judged that the voice information to be processed is not processed, the voice-to-text processing is performed, and keywords are extracted from the generated text according to rules (such as extraction of network hot words or syntax elements such as a principal and a predicate object). The identifying the information content of the voice information to be processed may include:

converting the voice information to be processed into a text;

and matching the text with keywords in a preset word bank, and extracting the contained keyword information.

The keywords may include words that are screened out in advance (e.g., popular words in the network), and the words may be stored in a predetermined word bank. In an embodiment of the present specification, after performing text conversion on a speech, determining whether a word in the text includes a keyword in the prediction lexicon, and if so, extracting the keyword. The algorithm for extracting the keywords may be implemented by a string search, and specific algorithms may include a full search, a KMP (an improved string matching algorithm proposed by d.e.knuth, j.h.morris and v.r.pratt), a Boyer More (a string matching algorithm), and other classical algorithms.

In another embodiment, the identifying the information content of the to-be-processed voice information may also include:

and converting the voice information to be processed into a text, and extracting key words in the text according to an adopted text grammar rule.

The extracted keywords are identified according to the syntax elements such as the predicate object of the converted text.

Of course, referring to the foregoing implementation of emotion information in speech to be processed based on recognition, the implementation of recognizing keywords after converting speech information into text may also include other embodiments, and in a specific embodiment, the obtaining of decoration configuration information matching the recognized information content may include:

merging the recognition results of the keyword information in the voice information to be processed to generate a request parameter;

and acquiring the decoration configuration information matched based on the request parameters.

And, the obtaining the decoration configuration information matched based on the request parameter may include:

matching the request parameters in a locally preset decoration configuration database to acquire decoration configuration information;

or,

and sending the request parameters to a remote server to obtain decoration configuration information which is obtained by the remote server according to the request parameters.

According to the foregoing description of the embodiments, other embodiments may also be included, such as the kind of the decoration configuration information. The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. Reference may be made to the description of the related embodiments, which is not repeated herein.

One or more embodiments of the present specification provide a session information processing method, which may acquire to-be-processed voice information, convert the to-be-processed voice information into text information, and extract a keyword in the text information. Then, corresponding decoration configuration information can be called according to the information content including the keywords, and then the decoration configuration information is used for displaying the display carrier (such as an audio information strip) of the voice information to be processed after rendering. Therefore, corresponding decoration configuration information is called according to the keywords contained in the voice information to render and display the display carrier, so that the content of the voice information can be expressed or partially expressed, and the interestingness of the voice information in chatting is increased. By using the implementation scheme provided by the embodiment of the specification, the audio message body can be additionally described or decorated by analyzing the keywords in the audio, so that the content of the audio message can be visually displayed, the chatting interest is improved, and the user experience is improved.

In another embodiment of the method provided by the present specification, the above-mentioned implementation based on the keyword information and emotion information in the voice information to be processed may be combined, the information content of the voice information to be processed is identified from at least one of the two dimensions, and then the decoration configuration information is matched to display the rendered information carrier. For example, the keyword information and the expressed emotion information in the speech information to be processed may be recognized at the same time. Accordingly, one or more embodiments of the present specification provide a session information processing method, including:

and displaying the rendered display carrier.

In one or more embodiments of the present specification, when the information content of the speech information to be processed, which includes the keyword information included in the speech information to be processed and the emotion information expressed by the speech information to be processed, is identified, the identification results of the keyword information and the emotion information in the speech information to be processed may be merged to generate the request parameter. Correspondingly, the obtaining of the decoration configuration information matched with the information content includes obtaining the decoration configuration information matched based on the request parameter. As shown in fig. 7, fig. 7 is a schematic processing flow diagram in an application scenario of another embodiment of the method provided in this specification.

Therefore, the recognition combination of the keywords in the voice information to be processed and the recognition result of the emotion information are combined, the recognized information content result can be optimized, and the decoration configuration information of the voice information to be processed can be matched more accurately, so that the display effect is improved, and the user experience is improved.

Further, based on the innovative idea of the embodiments of the present specification, in the session information processing scenario, the data object to be processed may not be limited to voice information, and in other implementation scenarios, information such as text, image, or video may also be included. Based on the above embodiment of the present specification, when processing session information of a text, an image or a video, information content of at least one of keyword information and expressed emotion information in the text, the image or the video can be identified, and then a display carrier of the session information is rendered and displayed in a session interface based on matching of the identified information content with corresponding decoration configuration information. Therefore, the present specification also provides a session information processing method, which may include:

s220: acquiring session information to be processed, and identifying the information content of the session information to be processed, wherein the information content at least comprises one of keyword information contained in the session information to be processed and emotion information expressed by the session information to be processed;

s240: acquiring decoration configuration information matched with the identified information content;

s260: rendering the display carrier of the session information to be processed by utilizing the decoration configuration information;

s280: and displaying the rendered display carrier.

Fig. 8 is a schematic flow chart of another embodiment of the method provided by the present specification. Similarly, the above-mentioned session processing method for other data sources may also include other implementation manners according to the description of the embodiment of the speech information processing method, and the specific implementation manner may refer to the description of the embodiment of the related method, which is not described in detail herein.

The session information processing method provided by the embodiment of the present specification can be used for processing voice information, and the rendered message body can also include text characters, videos, and the like. The implementation scheme for solving the problem is similar to the processing of the voice information, so that the specific implementation of the embodiment of this specification may refer to the implementation of the voice information processing method, and repeated details are not described again.

The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

One or more embodiments of the present specification provide a session information processing method, which may acquire voice information to be processed, and identify emotion information expressed by the voice information to be processed. Then, corresponding decoration configuration information can be called according to the information content including the emotional content, and then a display carrier (for example, an audio information strip) of the voice information to be processed is rendered and displayed by using the decoration configuration information. Therefore, the display carrier is displayed after being rendered according to the emotion expressed by the voice information, so that the content of the voice information can be expressed or partially expressed, and the interestingness of the voice information in chatting is increased. By using the implementation scheme provided by the embodiment of the specification, the audio message body can be additionally described or decorated by analyzing the hotwords or the feelings in the audio, so that the content of the audio message can be visually displayed, the chatting interest is improved, and the user experience is improved.

Based on the session information processing method, one or more embodiments of the present specification further provide a session information processing apparatus. The apparatus may include systems (including distributed systems), software (applications), modules, components, servers, clients, etc. that use the methods described in the embodiments of the present specification in conjunction with any necessary apparatus to implement the hardware. Based on the same innovative conception, embodiments of the present specification provide an apparatus as described in the following embodiments. Since the implementation scheme of the apparatus for solving the problem is similar to that of the method, the specific implementation of the apparatus in the embodiment of the present specification may refer to the implementation of the foregoing method, and repeated details are not repeated. As used hereinafter, the term "unit" or "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated. Specifically, fig. 9 is a schematic block diagram of an embodiment of a session information processing apparatus provided in this specification, and as shown in fig. 9, the apparatus may include:

the emotion recognition module 101 may be configured to acquire to-be-processed voice information, and recognize information content of the to-be-processed voice information, where the information content at least includes emotion information expressed by the to-be-processed voice information;

the decoration matching module 102 may be configured to obtain decoration configuration information matched with the information content;

the rendering module 103 may be configured to render the display carrier of the to-be-processed voice information by using the decoration configuration information;

the effect displaying module 104 may be configured to display the rendered display carrier.

Of course, in other embodiments of the apparatus, described with reference to the foregoing method embodiment, the emotion recognition module 101 may include:

the emotion recognition model unit can be used for extracting audio characteristic data of the voice information to be processed, inputting the audio characteristic data into a pre-constructed emotion recognition model, and determining the emotion category to which the voice information to be processed belongs through the emotion recognition model.

And in another embodiment, the extracted audio feature data comprises at least one of emotional feature data and tone feature data of the living creatures of the voice information to be processed.

In other embodiments, the decoration configuration information may be matched by recognizing a keyword in the voice message. Fig. 10 is a schematic block diagram of another embodiment of a session information processing apparatus provided in this specification, and as shown in fig. 10, the apparatus may include:

the keyword recognition module 201 may be configured to obtain to-be-processed voice information, and recognize information content of the to-be-processed voice information, where the information content at least includes keyword information included in the to-be-processed voice information;

the decoration matching module 202 may be configured to obtain decoration configuration information matched with the information content;

the rendering module 203 may be configured to render the display carrier of the voice information to be processed by using the decoration configuration information;

the effect display module 204 may be configured to display the rendered display carrier.

Of course, in other embodiments of the apparatus, described with reference to the foregoing method embodiment, the emotion recognition module 201 may include:

a text conversion unit 2011, configured to convert the to-be-processed voice information into a text;

the lexicon matching unit 2012 may be configured to match the text with keywords in a preset lexicon, and extract information about the keywords included in the text.

In other embodiments of the session information processing apparatus, the keyword recognition module 201 may include:

the grammar matching unit 2013 may be configured to convert the voice information to be processed into a text, and extract a keyword in the text according to a used text grammar rule.

Of course, in other embodiments of the apparatus, the keyword recognition apparatus may be configured to include the thesaurus matching unit and the grammar matching unit, and recognize the keyword in the speech information by using one or two ways. As shown in fig. 11, fig. 11 is a schematic block diagram of an embodiment of the keyword recognition module in the apparatus provided in this specification.

According to the corresponding method embodiments, the device for recognizing based on keywords or emotion in voice information may further include other embodiments. The apparatus may further include:

the characteristic merging unit can be used for merging the recognition results of the keyword information or the emotion information in the voice information to be processed to generate a request parameter;

accordingly, the decoration matching module may match decoration configuration information based on the request parameters.

In other embodiments, the decoration configuration information may include at least one of:

the method comprises the following steps of describing characters, expressing expressions, changing background colors, playing animation effects at preset positions of the display carrier and deforming animation effects of the display carrier. In other embodiments, the display carrier may include:

audio information bar, audio information window, audio coincidence.

In another embodiment of the device, a processing module for recognizing keywords in the voice message and a processing module for recognizing emotion information expressed by the voice message can be provided. The device can identify the information content of the voice information to be processed based on any one of the modules or by using the two modules. Accordingly, the present specification also provides another embodiment of a session information processing apparatus, which may include:

the information content identification module 301 may be configured to acquire to-be-processed voice information and identify information content of the to-be-processed voice information, where the information content at least includes one of keyword information included in the to-be-processed voice information and emotion information expressed by the to-be-processed voice information;

a decoration matching module 302, configured to obtain decoration configuration information matched with the information content;

the rendering module 303 may be configured to render the display carrier of the to-be-processed voice information by using the decoration configuration information;

the effect displaying module 304 may be configured to display the rendered display carrier.

Fig. 12 is a schematic block diagram of another embodiment of the apparatus provided in the present specification. As shown in fig. 12, in another embodiment of the session information processing apparatus, if a processing manner of recognizing the speech information keyword and the emotion information is used, the apparatus may further include:

a feature merging unit 305, configured to merge recognition results of the keyword information and the emotion information in the to-be-processed voice information to generate a request parameter;

accordingly, the decoration matching module 302 may match decoration configuration information based on the request parameters.

It should be noted that the above-mentioned description of the apparatus according to the method embodiment may also include other embodiments, and specific implementation manners may refer to the description of the related method embodiment, which is not described herein again.

One or more embodiments of the present specification provide a conversation information processing apparatus that can acquire speech information to be processed and recognize emotion information expressed by the speech information to be processed. Then, corresponding decoration configuration information can be called according to the information content including the emotional content, and then a display carrier (for example, an audio information strip) of the voice information to be processed is rendered and displayed by using the decoration configuration information. Therefore, the display carrier is displayed after being rendered according to the emotion expressed by the voice information, so that the content of the voice information can be expressed or partially expressed, and the interestingness of the voice information in chatting is increased. By using the implementation scheme provided by the embodiment of the specification, the audio message body can be additionally described or decorated by analyzing the hotwords or the feelings in the audio, so that the content of the audio message can be visually displayed, the chatting interest is improved, and the user experience is improved.

In another embodiment of the session information processing apparatus, the identified processed information source may further include text, images, videos, and the like, and keywords or emotion information included in the session information such as the text or the videos may be identified and then matched with corresponding decoration configuration information to render a message body. A specific embodiment is shown in fig. 13, and fig. 13 is a schematic block structure diagram of another embodiment of the apparatus provided in this specification, where the apparatus may include:

the information content identification module 401 may be configured to obtain session information to be processed, and identify information content of the session information to be processed, where the information content at least includes one of keyword information included in the session information to be processed and emotion information expressed by the session information to be processed;

a decoration matching module 402, configured to obtain decoration configuration information matching the identified information content;

a rendering module 403, configured to render, by using the decoration configuration information, the display carrier of the to-be-processed session information;

an effect display module 404, configured to display the rendered display carrier.

Similarly, the above-mentioned session processing method for other data sources may further include other implementation manners according to the description of the embodiment of the voice information processing method, and the implementation scheme for solving the problem is similar to the foregoing processing of the voice information, so that specific implementation of one or more embodiments in this specification may refer to implementation of the foregoing voice information processing method, and repeated details are not repeated.

The session information processing method or apparatus provided in the embodiments of the present specification may be implemented in a computer by a processor executing corresponding program instructions, for example, implemented on a PC side using a c + + language of a windows operating system, or implemented on an intelligent terminal using android, iOS system programming languages, or implemented on processing logic based on a quantum computer. In another embodiment of a session information processing apparatus provided in this specification, the apparatus may include a processor and a memory for storing processor-executable instructions, and the processor implements, when executing the instructions:

and displaying the rendered display carrier.

In another embodiment of the apparatus shown, a processor and a memory for storing processor-executable instructions may be included, the processor when executing the instructions implementing:

and displaying the rendered display carrier.

The to-be-processed session information may include at least one of voice information, text information, image information, and video information.

The conversation information processing device provided by the embodiment of the specification can acquire the voice information to be processed and identify emotion information expressed by the voice information to be processed. Then, corresponding decoration configuration information can be called according to the information content including the emotional content, and then a display carrier (for example, an audio information strip) of the voice information to be processed is rendered and displayed by using the decoration configuration information. Therefore, the display carrier is displayed after being rendered according to the emotion expressed by the voice information, so that the content of the voice information can be expressed or partially expressed, and the interestingness of the voice information in chatting is increased. By using the implementation scheme provided by the embodiment of the specification, the audio message body can be additionally described or decorated by analyzing the hotwords or the feelings in the audio, so that the content of the audio message can be visually displayed, the chatting interest is improved, and the user experience is improved.

The method or the device can be used in various electronic devices with conversation functions, such as mobile intelligent terminals, vehicle-mounted devices, Personal Computers (PCs), intelligent wearable devices, servers of business systems and the like, can perform secondary rendering on display carriers for displaying messages, improves chat interestingness and improves user experience. In particular, the present specification provides an electronic device, as shown in fig. 14, which may include at least one processor and a memory storing processor-executable instructions for a user, where the processor executes the instructions to implement:

and displaying the rendered display carrier.

In an embodiment of the electronic device, the to-be-processed session information may include at least one of voice information, text information, image information, and video information. Currently, other types of session information to be processed may also be included according to different data sources, and the details may be determined according to an actual session scenario.

It should be noted that the description of the apparatus or the electronic device described above according to the related method embodiment may also include other embodiments, and specific implementation may refer to the description of the method embodiment, which is not described herein in detail. The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the hardware + program class embodiment, since it is substantially similar to the method embodiment, the description is simple, and the relevant points can be referred to the partial description of the method embodiment.

Although the description of the embodiments refers to the descriptions of data/definitions, acquisition, interaction, calculation, judgment, etc. such as speech-to-text recognition conversion, KMP algorithm keyword search, emotion information recognition using neural networks, decoration configuration information matching manner/exhibition effect, etc., the embodiments of the present description are not limited to those that necessarily conform to industry communication standards, standard computer data processing and storage rules, or the description of one or more embodiments of the present description. Certain industry standards, or implementations modified slightly from those described using custom modes or examples, may also achieve the same, equivalent, or similar, or other, contemplated implementations of the above-described examples. The embodiments using the modified or transformed data acquisition, storage, judgment, processing and the like can still fall within the scope of the alternative embodiments of the embodiments in this specification.

The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous

In the 90 s of the 20 th century, improvements in a technology could clearly distinguish between improvements in hardware (e.g., improvements in circuit structures such as diodes, transistors, switches, etc.) and improvements in software (improvements in process flow). However, as technology advances, many of today's process flow improvements have been seen as direct improvements in hardware circuit architecture. Designers almost always obtain the corresponding hardware circuit structure by programming an improved method flow into the hardware circuit. Thus, it cannot be said that an improvement in the process flow cannot be realized by hardware physical modules. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose Logic functions are determined by programming the Device by a user. A digital system is "integrated" on a PLD by the designer's own programming without requiring the chip manufacturer to design and fabricate application-specific integrated circuit chips. Furthermore, nowadays, instead of manually making an integrated Circuit chip, such Programming is often implemented by "logic compiler" software, which is similar to a software compiler used in program development and writing, but the original code before compiling is also written by a specific Programming Language, which is called Hardware Description Language (HDL), and HDL is not only one but many, such as abel (advanced Boolean Expression Language), ahdl (alternate Language Description Language), traffic, pl (core unified Programming Language), HDCal, JHDL (Java Hardware Description Language), langue, Lola, HDL, laspam, hardsradware (Hardware Description Language), vhjhd (Hardware Description Language), and vhigh-Language, which are currently used in most common. It will also be apparent to those skilled in the art that hardware circuitry that implements the logical method flows can be readily obtained by merely slightly programming the method flows into an integrated circuit using the hardware description languages described above.

The controller may be implemented in any suitable manner, for example, the controller may take the form of, for example, a microprocessor or processor and a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro) processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, and an embedded microcontroller, examples of which include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicone Labs C8051F320, the memory controller may also be implemented as part of the control logic for the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer readable program code, the same functionality can be implemented by logically programming method steps such that the controller is in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller may thus be considered a hardware component, and the means included therein for performing the various functions may also be considered as a structure within the hardware component. Or even means for performing the functions may be regarded as being both a software module for performing the method and a structure within a hardware component.

The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a vehicle-mounted human-computer interaction device, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

Although one or more embodiments of the present description provide method operational steps as described in the embodiments or flowcharts, more or fewer operational steps may be included based on conventional or non-inventive approaches. The order of steps recited in the embodiments is merely one manner of performing the steps in a multitude of orders and does not represent the only order of execution. When an actual apparatus or end product executes, it may execute sequentially or in parallel (e.g., parallel processors or multi-threaded environments, or even distributed data processing environments) according to the method shown in the embodiment or the figures. The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the presence of additional identical or equivalent elements in a process, method, article, or apparatus that comprises the recited elements is not excluded.

For convenience of description, the above devices are described as being divided into various modules by functions, and are described separately. Of course, when implementing one or more of the present description, the functions of each module may be implemented in one or more software and/or hardware, or a module implementing the same function may be implemented by a combination of multiple sub-modules or sub-units, etc. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer readable program code, the same functionality can be implemented by logically programming method steps such that the controller is in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller may therefore be considered as a hardware component, and the means included therein for performing the various functions may also be considered as a structure within the hardware component. Or even means for performing the functions may be regarded as being both a software module for performing the method and a structure within a hardware component.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media, including both non-transitory and non-transitory, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape magnetic disk storage, graphene storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.

As will be appreciated by one skilled in the art, one or more embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

One or more embodiments of the present description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. One or more embodiments of the present specification can also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment. In the description of the specification, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the specification. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.

The above description is merely exemplary of one or more embodiments of the present disclosure and is not intended to limit the scope of one or more embodiments of the present disclosure. Various modifications and alterations to one or more embodiments described herein will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims.

Claims

1. A method of session information processing, the method comprising:

and displaying the rendered display carrier.

2. A session information processing method according to claim 1, wherein said identifying the information content of the voice information to be processed comprises:

and extracting audio characteristic data of the voice information to be processed, inputting the audio characteristic data into a pre-constructed emotion recognition model, and determining the emotion category to which the voice information to be processed belongs through the emotion recognition model.

3. A conversation information processing method according to claim 2, wherein the extracted audio feature data includes at least one of emotion feature data and mood feature data of living creatures of the speech information to be processed.

4. A session information processing method according to any one of claims 1 to 3, wherein said acquiring decoration configuration information matching with the identified information content comprises:

combining the recognition results of the emotion information in the voice information to be processed to generate a request parameter;

5. The session information processing method according to claim 4, wherein said obtaining the decoration configuration information matched based on the request parameter comprises:

or,

6. A session information processing method according to claim 1, wherein the decoration configuration information includes at least one of:

7. A session information processing method according to claim 1, wherein said display carrier comprises:

8. A method of session information processing, the method comprising:

and displaying the rendered display carrier.

9. A session information processing method according to claim 8, wherein said identifying the information content of the voice information to be processed comprises:

converting the voice information to be processed into a text;

10. A session information processing method according to claim 8, wherein said identifying the information content of the voice information to be processed comprises:

and converting the voice information to be processed into a text, and extracting key words in the text according to a used text grammar rule.

11. A session information processing method according to any one of claims 8 to 10, wherein said acquiring decoration configuration information matching the identified information content comprises:

12. The session information processing method according to claim 11, wherein said obtaining the decoration configuration information matched based on the request parameter comprises:

or,

13. A session information processing method according to claim 8, wherein the decoration configuration information includes at least one of:

14. A session information processing method according to claim 8, wherein said display carrier comprises:

audio information bar, audio information window, audio coincidence.

15. A method of session information processing, the method comprising:

and displaying the rendered display carrier.

16. A conversation information processing method according to claim 15, when identifying information contents including keyword information contained in the voice information to be processed, emotion information expressed by the voice information to be processed, said method further comprising:

combining the recognition results of the keyword information and the emotion information in the voice information to be processed to generate a request parameter;

correspondingly, the obtaining of the decoration configuration information matched with the information content includes obtaining the decoration configuration information matched based on the request parameter.

17. A method of session information processing, the method comprising:

and displaying the rendered display carrier.

18. A session information processing apparatus, the apparatus comprising:

19. A session information processing apparatus according to claim 18, said emotion recognition module comprising:

and the emotion recognition model unit is used for extracting the audio characteristic data of the voice information to be processed, inputting the audio characteristic data into a pre-constructed emotion recognition model, and determining the emotion category to which the voice information to be processed belongs through the emotion recognition model.

20. A session information processing apparatus according to claim 19, wherein said extracted audio feature data includes at least one of emotion feature data and mood feature data of a living being of said speech information to be processed.

21. A session information processing apparatus according to any one of claims 18 to 20, said apparatus further comprising:

the characteristic merging unit is used for merging the recognition results of the emotion information in the voice information to be processed to generate a request parameter;

correspondingly, the decoration matching module matches decoration configuration information based on the request parameters.

22. A session information processing apparatus according to any one of claims 18 to 20, wherein the decoration configuration information includes at least one of:

23. A session information processing apparatus according to any one of claims 18 to 20, said presentation carrier comprising:

audio information bar, audio information window, audio coincidence.

24. A session information processing apparatus, the apparatus comprising:

25. A session information processing apparatus according to claim 24, said keyword recognition module comprising:

the text conversion unit is used for converting the voice information to be processed into a text;

and the grammar matching unit is used for converting the voice information to be processed into a text and extracting key words in the text according to a used text grammar rule.

26. A session information processing apparatus according to claim 24, said keyword recognition module comprising:

and the word bank matching unit is used for matching the word text with the keywords in a preset word bank and extracting the contained keyword information.

27. A session information processing apparatus according to any one of claims 24 to 26, said apparatus further comprising:

28. A session information processing apparatus according to any one of claims 24 to 26, wherein the decoration configuration information includes at least one of:

29. A session information processing apparatus according to any one of claims 24 to 26, wherein the presentation carrier comprises:

audio information bar, audio information window, audio coincidence.

30. A session information processing apparatus, the apparatus comprising:

31. A session information processing apparatus according to claim 30, said apparatus further comprising:

the characteristic merging unit is used for merging the recognition results of the keyword information and the emotion information in the voice information to be processed to generate a request parameter;

32. A session information processing apparatus, the apparatus comprising:

33. A session information processing apparatus comprising a processor and a memory for storing processor-executable instructions that when executed by the processor implement:

and displaying the rendered display carrier.

34. A session information processing apparatus as claimed in claim 33, wherein said session information to be processed comprises at least one of voice information, text information, image information and video information.

35. An electronic device comprising at least one processor and a memory storing processor-executable instructions for a user, the instructions when executed by the processor implementing:

and displaying the rendered display carrier.