CN112910761B - Instant messaging method, device, equipment, storage medium and program product


Info

Publication number
CN112910761B
CN112910761B (application CN202110126201.6A)
Authority
CN
China
Prior art keywords
emotion
information
target
audio
chat information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110126201.6A
Other languages
Chinese (zh)
Other versions
CN112910761A (en)
Inventor
沈航
刘俊启
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority: CN202110126201.6A
Publication of CN112910761A
Application granted
Publication of CN112910761B
Legal status: Active

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/04Real-time or near real-time messaging, e.g. instant messaging [IM]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/45Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/48Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/483Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/04Real-time or near real-time messaging, e.g. instant messaging [IM]
    • H04L51/046Interoperability with other network applications or services
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/07User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail characterised by the inclusion of specific contents
    • H04L51/10Multimedia information
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/07User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail characterised by the inclusion of specific contents
    • H04L51/18Commands or executable codes
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/21Monitoring or handling of messages
    • H04L51/216Handling conversation history, e.g. grouping of messages in sessions or threads

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Library & Information Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The application discloses an instant messaging method, apparatus, device, storage medium, and program product, relating to artificial intelligence fields such as computer vision and natural language processing. One embodiment of the method comprises the following steps: in response to receiving a user's selection instruction for an expression image, acquiring chat information; determining a target emotion corresponding to the chat information; determining, according to the target emotion, a target audio corresponding to the expression image and the target emotion from a plurality of preset audio clips; determining information to be transmitted according to the expression image and the target audio; and sending the information to be transmitted to a server so that the server forwards it to a target client. The method ensures information transmission, makes instant messaging more engaging, and clearly conveys the sender's emotion, thereby improving the user experience.

Description

Instant messaging method, device, equipment, storage medium and program product
Technical Field
The present disclosure relates to the field of computer technologies, in particular to artificial intelligence technologies such as computer vision and natural language processing, and specifically to an instant messaging method, apparatus, device, storage medium, and program product.
Background
With the development of the internet, instant messaging (IM) applications on clients have become an indispensable means of information interaction in people's daily lives. During instant messaging, besides text, users can send voice, images, and other kinds of chat information to interact with other users.
Disclosure of Invention
The embodiments of the present application provide an instant messaging method, apparatus, device, storage medium, and program product.
In a first aspect, an embodiment of the present application provides an instant messaging method, including: in response to receiving a user's selection instruction for an expression image, acquiring chat information; determining a target emotion corresponding to the chat information; determining, according to the target emotion, a target audio corresponding to the expression image and the target emotion from a plurality of preset audio clips; determining information to be transmitted according to the expression image and the target audio; and sending the information to be transmitted to a server so that the server transmits it to a target client.
In a second aspect, an embodiment of the present application provides an instant messaging apparatus, including: an information acquisition module configured to acquire chat information in response to receiving a user's selection instruction for an expression image; a first determining module configured to determine a target emotion corresponding to the chat information; a second determining module configured to determine, according to the target emotion, a target audio corresponding to the expression image and the target emotion from a plurality of preset audio clips; a third determining module configured to determine information to be transmitted according to the expression image and the target audio; and an information sending module configured to send the information to be transmitted to a server so that the server transmits it to a target client.
In a third aspect, an embodiment of the present application provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described in the first aspect.
In a fourth aspect, embodiments of the present application provide a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform a method as described in the first aspect.
In a fifth aspect, embodiments of the present application provide a computer program product comprising a computer program which, when executed by a processor, implements the method described in the first aspect.
The instant messaging method, apparatus, device, storage medium, and program product provided by the embodiments of the present application first acquire chat information in response to receiving a user's selection instruction for an expression image; then determine a target emotion corresponding to the chat information; then determine, according to the target emotion, a target audio corresponding to the expression image and the target emotion from a plurality of preset audio clips; then determine information to be transmitted according to the expression image and the target audio; and finally send the information to be transmitted to a server so that the server transmits it to a target client. The embodiments of the present application ensure information transmission, make instant messaging more engaging, clearly convey the emotion of the information sender, and improve the user experience.
It should be understood that the description of this section is not intended to identify key or critical features of the embodiments of the application or to delineate the scope of the application. Other features of the present application will become apparent from the description that follows.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings. The drawings are for better understanding of the present solution and do not constitute a limitation of the present application. Wherein:
FIG. 1 is an exemplary system architecture in which the present application may be applied;
FIG. 2 is a flow chart of one embodiment of an instant messaging method according to the present application;
FIG. 3 is an application scenario diagram of an instant messaging method according to the present application;
FIG. 4 is a schematic diagram illustrating an embodiment of an instant messaging device according to the present application;
fig. 5 is a block diagram of an electronic device for implementing an instant messaging method according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present application to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary system architecture 100 in which embodiments of instant messaging methods and apparatus of the present application may be employed.
As shown in fig. 1, the system architecture 100 may include a client 101, a server 102, and a client 103. The client 101, the server 102, and the client 103 may communicate with one another via various connection types, such as wired links, wireless communication links, or fiber-optic cables.
A user may use the client 101 to interact with the client 103 through the server 102, for example to receive or send information. Various applications may be installed on the clients 101 and 103, such as multiparty interaction applications, artificial intelligence applications, image processing applications, image beautification applications, and instant messaging applications.
The server 102 may be a server providing various services, such as a background server providing support for the client 101 and the client 103. The background server may send the information to be transmitted sent by the client 101 to the client 103.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by a plurality of servers, or as a single server. When the server is software, it may be implemented as a plurality of pieces of software or software modules (for example, to provide distributed services), or as a single piece of software or software module. No specific limitation is imposed here.
In practice, the instant messaging method provided in the embodiments of the present application may be executed by the client 101 or the client 103, and the instant messaging device may also be disposed in the client 101 or the client 103.
It should be understood that the number of clients and servers in fig. 1 is merely illustrative. There may be any number of clients and servers, as desired for an implementation.
With continued reference to fig. 2, a flow 200 of one embodiment of an instant messaging method in accordance with the present application is shown. The instant messaging method comprises the following steps:
step 201, in response to receiving a selection instruction of the user for the emoticon, chat information is acquired.
In this embodiment, the executing body of the instant messaging method (for example, the client 101 shown in fig. 1) may acquire chat information when receiving a user's selection instruction for an expression image on the input interface of an instant messaging application, for example by acquiring the chat information from the chat interface of the instant messaging application. The expression image may be a static or dynamic emoticon and may include text and/or pictures; an expression element is text or a picture within the expression image that can express emotion and convey information. The chat information may be information displayed on the chat interface of the instant messaging application, for example information exchanged between the client 101 and the client 103 before the current moment, or information the client 101 is about to send (i.e., is preparing to send) to the client 103 at the current moment. The chat information may be at least one of a picture, text, and audio.
Here, the selection instruction may be a selection operation of the expression image, for example: the user performs operations such as clicking, double clicking or sliding at a preset position (e.g., an emoticon thumbnail) on the instant messaging application.
Step 202, determining a target emotion corresponding to the chat information.
In this embodiment, when the chat information is a picture, the executing body may perform feature extraction on the chat information to obtain the corresponding target emotion.
In a specific example, the executing body may perform feature extraction on the chat information through a pre-trained image feature extraction model. The image feature extraction model is trained based on a convolutional neural network (CNN) and may be, for example, a VGGNet (Visual Geometry Group network), ResNet (residual neural network), or AlexNet model.
In this embodiment, when the chat information is text, the executing body may first segment the chat information into words; then convert the word segments into corresponding vectors using a word vector model; then splice the vectors of the word segments into a text vector; then extract a semantic feature vector from the text vector using the feature extraction network of a pre-trained text emotion classification model; and finally analyze the semantic feature vector using the classification network of the text emotion classification model to obtain the target emotion corresponding to the chat information.
It should be noted that the text emotion classification model includes a feature extraction network and a classification network, where the feature extraction network is used to extract a lower-dimensional semantic feature vector from the text vector. The text emotion classification model is trained in advance on relevant training sample data labeled with preset emotion categories. The preset emotion categories may be divided according to the business requirements of the actual field and may include, for example, happiness, sadness, anger, liking, surprise, and calm. The trained text emotion classification model can thus output the emotion category corresponding to the text to be processed.
In a specific example, the feature extraction network may be implemented using an attention-based bidirectional gated recurrent unit (Bi-GRU) network.
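The text branch above can be sketched as follows. This is a minimal illustrative stand-in, not the patent's implementation: the word-vector table, pooling step (mean pooling instead of splicing plus a Bi-GRU), classifier weights, and emotion labels are all toy assumptions.

```python
import numpy as np

EMOTIONS = ["happiness", "sadness", "anger"]

# Toy word-vector table (a trained word-vector model is assumed in practice).
WORD_VECTORS = {
    "great":  np.array([ 1.0,  0.1]),
    "day":    np.array([ 0.3,  0.0]),
    "awful":  np.array([-1.0,  0.2]),
    "angry":  np.array([-0.5, -1.0]),
}

# Toy "classification network": one linear layer over the pooled text vector.
W = np.array([[ 1.0,  0.0],   # happiness
              [-1.0,  0.5],   # sadness
              [-0.2, -1.0]])  # anger

def classify_emotion(text: str) -> str:
    tokens = text.lower().split()                       # word segmentation
    vecs = [WORD_VECTORS[t] for t in tokens if t in WORD_VECTORS]
    if not vecs:
        return "neutral"
    text_vec = np.mean(vecs, axis=0)                    # pooled text vector
    scores = W @ text_vec                               # classification network
    return EMOTIONS[int(np.argmax(scores))]

print(classify_emotion("what a great day"))  # -> happiness
```

A real system would replace the mean pooling with the attention-based Bi-GRU feature extraction network described above; only the overall pipeline shape is shown here.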
In this embodiment, when the chat information is audio, the executing body may extract an audio feature vector of an audio segment of the chat information, where the audio segment corresponds to a section of speech in the chat information; then match the audio feature vector of the audio segment against a plurality of emotion feature models, each corresponding to one of a plurality of emotion classifications; and then take the emotion classification corresponding to the best-matching emotion feature model as the target emotion of the audio segment.
The audio feature vector includes one or more of the following audio features: energy features, voiced-frame-count features, pitch frequency features, formant features, harmonic-to-noise-ratio features, and Mel-frequency cepstral coefficient (MFCC) features.
In a specific example, when the emotion feature models are Gaussian mixture models (GMMs), the likelihood probabilities between the audio feature vector of the audio segment and each of the emotion feature models are calculated; then the emotion classification corresponding to the emotion feature model whose likelihood probability exceeds a preset threshold and is the largest is taken as the target emotion of the audio segment.
It should be noted that the above emotion feature models may also take other forms, such as a support vector machine (SVM) model, a K-nearest-neighbor (KNN) model, a hidden Markov model (HMM), or an artificial neural network (ANN) model.
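The GMM matching step can be sketched as follows, under a simplifying assumption: each emotion is modeled by a single isotropic Gaussian (a one-component stand-in for a full GMM), and the means, variance, threshold, and emotion labels are illustrative, not values from the patent.

```python
import numpy as np

# Toy per-emotion models: one Gaussian mean per emotion, shared variance.
# A real system would fit multi-component GMMs on energy/pitch/MFCC features.
MODELS = {
    "happiness": np.array([ 2.0,  2.0]),
    "sadness":   np.array([-2.0, -2.0]),
}
VAR = 1.0
LOG_THRESHOLD = -10.0  # reject clips no model explains well

def log_likelihood(x, mean):
    d = x - mean
    return -0.5 * (d @ d) / VAR - 0.5 * len(x) * np.log(2 * np.pi * VAR)

def match_emotion(feature_vec):
    # Score the audio feature vector against every emotion feature model,
    # then keep the best match only if it clears the preset threshold.
    scores = {e: log_likelihood(feature_vec, m) for e, m in MODELS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > LOG_THRESHOLD else None

print(match_emotion(np.array([1.8, 2.1])))   # near the "happiness" mean
print(match_emotion(np.array([50.0, 50.0]))) # matches nothing well -> None
```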
And 203, determining target audio corresponding to the expression image and the target emotion from a plurality of preset audio according to the target emotion.
In this embodiment, the executing body may match the target emotion against a plurality of preset audio clips one by one, so as to determine the target audio corresponding to the expression image and the target emotion. The audio clips may correspond to different emotions or different emotion grades, and an audio clip may correspond to text contained in the expression image; for example, the audio corresponding to a "haha" expression image is audio of the word "haha". The emotion grades are different levels within each emotion category; for example, a sad emotion may include very sad, slightly sad, and so on, and different emotion grades correspond to different tones of voice, e.g., very sad corresponds to a different tone than slightly sad.
It should be noted that the executing body may establish in advance a correspondence among the plurality of audio clips, the expression images, and the target emotions.
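The pre-established correspondence can be sketched as a simple lookup keyed by (expression image, emotion). The identifiers and file names below are hypothetical placeholders, not artifacts from the patent.

```python
# Hypothetical pre-built mapping: (expression image, target emotion) -> audio clip.
AUDIO_MAP = {
    ("haha_sticker", "happiness"): "haha_cheerful.mp3",
    ("haha_sticker", "sadness"):   "haha_wry.mp3",
    ("cry_sticker",  "sadness"):   "sob_very_sad.mp3",
}

def select_target_audio(expression_image: str, target_emotion: str):
    # Return None when no preset audio matches this image/emotion pair.
    return AUDIO_MAP.get((expression_image, target_emotion))

print(select_target_audio("haha_sticker", "happiness"))  # haha_cheerful.mp3
```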
And 204, determining information to be transmitted according to the expression image and the target audio.
In this embodiment, the executing body may synthesize the expression image and the target audio into a single piece of information to be transmitted, for example by establishing a mapping relationship between the expression image and the target audio; alternatively, the executing body may leave the expression image and the target audio unprocessed and send them to the target client together.
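One way to record the image-audio mapping is a single message payload, as sketched below. The payload schema and field names are assumptions for illustration; the patent does not specify a wire format.

```python
import json

def build_message(expression_image: str, target_audio: str) -> str:
    # Package the expression image and its selected target audio into one
    # piece of information to be transmitted, recording their mapping.
    payload = {
        "type": "emoticon_with_audio",
        "image": expression_image,
        "audio": target_audio,
    }
    return json.dumps(payload)

msg = build_message("haha_sticker", "haha_cheerful.mp3")
print(msg)
```

The server can then forward this payload to the target client unchanged in step 205.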
In step 205, the information to be transmitted is sent to the server, so that the server transmits the information to be transmitted to the target client.
In this embodiment, the executing body may first send the information to be transmitted to the server; after receiving it, the server (e.g., the server 102 shown in fig. 1) transmits the information to be transmitted to the target client (e.g., the client 103 shown in fig. 1).
The instant messaging method provided by the embodiments of the present application first acquires chat information in response to receiving a user's selection instruction for an expression image; then determines a target emotion corresponding to the chat information; then determines, according to the target emotion, a target audio corresponding to the expression image and the target emotion from a plurality of preset audio clips; then determines information to be transmitted according to the expression image and the target audio; and finally sends the information to be transmitted to a server so that the server transmits it to a target client. The embodiments of the present application ensure information transmission, make instant messaging more engaging, clearly convey the emotion of the information sender, and improve the user experience.
In some optional implementations of this embodiment, the plurality of audio clips have a mapping relationship with the expression image, and the emotions corresponding to the audio clips differ.
In this implementation, each of the plurality of audio clips has a mapping relationship with the expression image, and either each clip corresponds to a different emotion, or some of the clips correspond to the same emotion (i.e., the plurality of clips may include audio sharing the same emotion).
In this implementation, because the plurality of audio clips have a mapping relationship with the expression image and correspond to different emotions, the target audio matching both the expression image and the chat information can be accurately selected from the audio clips according to the target emotion.
In some optional implementations of this embodiment, determining, according to the target emotion, the target audio corresponding to the expression image and the target emotion from a plurality of preset audio clips includes: matching, according to the target emotion, the target audio corresponding to the target emotion from among the plurality of audio clips having a mapping relationship with the expression image.
In this implementation, matching is performed, according to the target emotion, among the plurality of audio clips having a mapping relationship with the expression image, thereby determining the target audio corresponding to the target emotion.
In some optional implementations of this embodiment, the chat information includes at least one of: the expression image, historical chat information, and current chat information.
In this implementation, the historical chat information may be chat information exchanged between the client 101 and the client 103 before the current moment, and may include at least one of a picture, text, and audio. The current chat information may be information the user is about to send through the client 101 (i.e., not yet sent to the client 103), for example information entered on the input interface of the instant messaging application, and may likewise include at least one of a picture, text, and audio.
In this implementation, the target emotion corresponding to the chat information can be accurately determined according to at least one of the expression image, the historical chat information, and the current chat information.
In some optional implementations of this embodiment, the chat information includes: the current chat information, and the expression image and/or the historical chat information; and
according to the expression image and the target audio, determining information to be transmitted comprises the following steps:
and determining information to be transmitted according to the expression image, the target audio and the current chat information.
In this implementation, when the chat information includes the current chat information, the expression image, and the historical chat information; or the current chat information and the expression image; or the current chat information and the historical chat information, the executing body may determine the information to be transmitted based on the expression image, the target audio, and the current chat information.
Here, determining the information to be transmitted based on the expression image, the target audio, and the current chat information may include: synthesizing the expression image, the target audio, and the current chat information into a single piece of information to be transmitted, for example by establishing a mapping relationship among the three; alternatively, the executing body may leave them unprocessed and send them to the target client together.
In this implementation, when the chat information includes the current chat information along with the expression image and/or the historical chat information, the information to be transmitted can be accurately determined based on the expression image, the target audio, and the current chat information.
In some optional implementations of this embodiment, if the chat information includes: current chat information, emoticons, and historical chat information; and
determining a target emotion corresponding to the chat information, including: determining a target emotion according to the emotion corresponding to the historical chat information, the preset first weight, the emotion corresponding to the expression image, the preset second weight and the emotion corresponding to the current chat information and the preset third weight.
In this implementation manner, when the chat information includes the current chat information, the expression image and the historical chat information, the weighted summation may be performed according to the emotion corresponding to the historical chat information and the preset first weight, the emotion corresponding to the expression image and the preset second weight, and the emotion corresponding to the current chat information and the preset third weight, so as to obtain the target emotion.
It should be noted that the weights may be set according to the user's usage habits; for example, if the user prefers to send expression images, the weight corresponding to the expression image may be set higher than the weights corresponding to the current chat information and the historical chat information. In addition, compared with the historical chat information, the current chat information better reflects the user's current emotion, so the weight corresponding to the current chat information may be set higher than the weight corresponding to the historical chat information. The weights may also be set by the user.
In this implementation, the target emotion can be accurately determined based on the emotions and weights corresponding to the current chat information, the expression image, and the historical chat information.
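The weighted summation described above can be sketched as follows, assuming each source yields a score per emotion label and the three weights sum to 1. The labels, scores, and weight values are assumptions for illustration, not values taken from the patent.

```python
# Combine per-source emotion scores with the three preset weights and pick
# the label with the highest weighted score as the target emotion.
# (Weights and emotion labels are illustrative assumptions.)
def combine_emotions(history, expression, current, w1=0.2, w2=0.5, w3=0.3):
    totals = {
        label: w1 * history[label] + w2 * expression[label] + w3 * current[label]
        for label in history
    }
    return max(totals, key=totals.get)

target = combine_emotions(
    history={"happy": 0.4, "sad": 0.6},      # emotion from historical chat
    expression={"happy": 0.9, "sad": 0.1},   # emotion from the expression image
    current={"happy": 0.7, "sad": 0.3},      # emotion from current chat text
)
```

With these illustrative scores the expression image's strong "happy" signal dominates, because it carries the largest weight — matching the note above that a user who favors expression images can weight them more heavily.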
In some optional implementations of this embodiment, the instant messaging method further includes: in response to receiving a trigger operation on the expression image, playing the target audio through text-to-speech.
In this implementation, after the information to be transmitted has been sent, a trigger operation may be performed on the expression image to play the target audio through Text-To-Speech (TTS). The trigger operation may include a pressing operation or a sliding operation, for example a single click, a double click, a left slide, or a right slide.
In a specific example, the expression image is "ha", and the audio corresponding to "ha" is played through TTS.
In this implementation, when a trigger operation on the expression image is received, the target audio can be played through TTS, which enhances human-computer interaction, adds interest to the instant messaging process, and improves the user experience.
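A minimal dispatch from trigger operation to playback might look like the following. The recognized gesture set and the injected `speak` callback are assumptions, standing in for a real client's gesture recognizer and TTS engine.

```python
# Sketch of routing a trigger operation on an expression image to TTS
# playback. The gesture names and speak() callback are illustrative.
TRIGGER_OPS = {"single_click", "double_click", "left_slide", "right_slide"}

def on_expression_trigger(op, expression_text, speak):
    if op not in TRIGGER_OPS:
        return False             # not a recognized trigger operation
    speak(expression_text)       # e.g. play the audio for "ha" via a TTS engine
    return True

spoken = []
on_expression_trigger("single_click", "ha", spoken.append)
```

Injecting the `speak` callback keeps the dispatch logic independent of any particular TTS engine, so the same handler works whether playback is local or delegated to a platform service.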
To facilitate understanding, an application scenario in which the instant messaging method of the embodiments of the present application may be implemented is provided below. As shown in fig. 3: in step 301, a first client obtains chat information upon receiving a user's selection instruction for an expression image; in step 302, the first client determines a target emotion corresponding to the chat information; in step 303, the first client determines, from a plurality of preset audios and according to the target emotion, a target audio corresponding to the expression image and the target emotion; in step 304, the first client determines information to be transmitted according to the expression image and the target audio; in step 305, the first client sends the information to be transmitted to the server; in step 306, the server sends the information to be transmitted to a second client; in step 307, when a trigger operation on the expression image is received, the target audio is played through TTS.
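Steps 301 through 306 can be walked through with a toy model in which the server is a plain list acting as a relay. Every name below is illustrative, and the emotion and audio lookups are stubbed with trivial rules rather than the patent's actual algorithms.

```python
# Toy walk-through of the scenario in steps 301-306; all names are assumptions.
def first_client_send(expression_image, chat_text, server):
    # Step 302: determine the target emotion (stubbed with a keyword rule).
    target_emotion = "happy" if "ha" in chat_text else "neutral"
    # Step 303: pick the preset audio mapped to this image and emotion.
    target_audio = f"{expression_image}_{target_emotion}.mp3"
    # Step 304: bundle the image and audio into the information to transmit.
    message = {"image": expression_image, "audio": target_audio}
    # Step 305: send the information to the server.
    server.append(message)
    return message

def server_relay(server, second_client):
    # Step 306: the server forwards the information to the second client.
    second_client.extend(server)
    server.clear()

server, second_client = [], []
first_client_send("grin", "hahaha", server)
server_relay(server, second_client)
# Step 307 would then play second_client[0]["audio"] on a trigger operation.
```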
According to the embodiments of the present application, the emotion of the chat text in the chat information is used to determine the sound corresponding to the emotion of the expression image in the chat information; then, when a trigger operation on the expression image is received, the sound corresponding to the expression image is played. This ensures information transmission between the first client and the second client, adds interest to instant messaging, clearly conveys the emotion of the information sender, and improves the user experience.
With further reference to fig. 4, as an implementation of the method shown in the foregoing drawings, the present application provides an embodiment of an instant messaging device, where the embodiment of the device corresponds to the embodiment of the method shown in fig. 2, and the device may be specifically applied to various electronic devices.
As shown in fig. 4, the instant communication apparatus 400 of the present embodiment may include: an information acquisition module 401 configured to acquire chat information in response to receiving a selection instruction of an expression image from a user; a first determining module 402 configured to determine a target emotion corresponding to the chat information; a second determining module 403 configured to determine a target audio corresponding to the expression image and the target emotion from among a plurality of preset audios according to the target emotion; a third determining module 404 configured to determine information to be transmitted according to the emoticon and the target audio; the information sending module 405 is configured to send the information to be transmitted to the server, so that the server transmits the information to be transmitted to the target client.
In this embodiment, in the instant communication apparatus 400: specific processing of the information obtaining module 401, the first determining module 402, the second determining module 403, the third determining module 404, and the information sending module 405 and technical effects thereof may refer to the relevant descriptions of steps 201 to 205 in the corresponding embodiment of fig. 2, and are not repeated herein. The first determining module 402, the second determining module 403, and the third determining module 404 may be the same module or different modules.
In some optional implementations of this embodiment, the plurality of audios has a mapping relationship with the expression image, and the emotions corresponding to the plurality of audios are different.
In some optional implementations of the present embodiment, the second determining module 403 is further configured to: according to the target emotion, matching target audio corresponding to the target emotion from a plurality of audio having a mapping relation with the expression image.
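The matching performed by the second determining module can be sketched as a lookup over the audios mapped to one expression image, assuming each candidate audio is tagged with a single emotion label. The mapping table below is purely illustrative; the patent does not specify how the mapping is stored.

```python
# Illustrative mapping: each expression image maps to several preset audios,
# each tagged with a different emotion (per the mapping relationship above).
EXPRESSION_AUDIO_MAP = {
    "haha": [("laugh_loud.mp3", "excited"), ("laugh_soft.mp3", "calm")],
}

def match_target_audio(expression_image, target_emotion):
    # Scan the audios mapped to this image for one tagged with the target emotion.
    for audio, emotion in EXPRESSION_AUDIO_MAP.get(expression_image, []):
        if emotion == target_emotion:
            return audio
    return None  # no mapped audio carries the target emotion
```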
In some optional implementations of this embodiment, the chat information includes at least one of: expression image, history chat information and current chat information.
In some optional implementations of this embodiment, if the chat information includes: current chat information, emoticons and/or historical chat information; and
the third determination module 404 is further configured to: and determining information to be transmitted according to the expression image, the target audio and the current chat information.
In some optional implementations of this embodiment, if the chat information includes: current chat information, emoticons, and historical chat information; and
the first determination module 402 is further configured to: determining a target emotion according to the emotion corresponding to the historical chat information, the preset first weight, the emotion corresponding to the expression image, the preset second weight and the emotion corresponding to the current chat information and the preset third weight.
In some optional implementations of this embodiment, the instant messaging device further includes: an audio playing module (not shown in the figure) configured to play the target audio through text-to-speech in response to receiving a trigger operation on the expression image.
According to embodiments of the present application, there is also provided an electronic device, a readable storage medium and a computer program product.
Fig. 5 shows a schematic block diagram of an example electronic device 500 that may be used to implement embodiments of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.
As shown in fig. 5, the apparatus 500 includes a computing unit 501 that can perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM) 502 or a computer program loaded from a storage unit 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the device 500 can also be stored. The computing unit 501, ROM 502, and RAM 503 are connected to each other by a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
Various components in the device 500 are connected to the I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, etc.; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508 such as a magnetic disk, an optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the device 500 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 501 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 501 performs the various methods and processes described above, such as an instant messaging method. For example, in some embodiments, the instant messaging method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into RAM 503 and executed by computing unit 501, one or more steps of the instant messaging method described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the instant messaging method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present application may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this application, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
Artificial intelligence is the discipline of studying how computers simulate certain human thought processes and intelligent behaviors (e.g., learning, reasoning, thinking, planning), and it involves both hardware-level and software-level technologies. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and the like; artificial intelligence software technologies mainly include computer vision technology, speech recognition technology, natural language processing technology, machine learning/deep learning technology, big data processing technology, knowledge graph technology, and the like.
According to the technical scheme of the present application, chat information is first obtained in response to receiving a user's selection instruction for an expression image; a target emotion corresponding to the chat information is then determined; a target audio corresponding to the expression image and the target emotion is then determined from a plurality of preset audios according to the target emotion; information to be transmitted is then determined according to the expression image and the target audio; finally, the information to be transmitted is sent to a server so that the server transmits it to a target client. The embodiments of the present application thus ensure information transmission, add interest to instant messaging, clearly convey the emotion of the information sender, and improve the user experience.
It should be appreciated that steps may be reordered, added, or deleted in the various process flows shown above. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present application can be achieved; no limitation is imposed herein.
The above embodiments do not limit the scope of the application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (8)

1. An instant messaging method comprising:
responding to a received selection instruction of a user for the expression image, and obtaining chat information, wherein the chat information comprises at least one of the expression image, historical chat information and current chat information, and the current chat information is information to be sent by the user at the current moment;
determining a target emotion corresponding to the chat information;
determining target audio corresponding to the expression image and the target emotion from a plurality of preset audio according to the target emotion, wherein the plurality of audio and the expression image have a mapping relation, and the emotions corresponding to the plurality of audio are different;
determining information to be transmitted according to the expression image and the target audio;
the information to be transmitted is sent to a server, so that the server transmits the information to be transmitted to a target client;
the method further comprises the steps of:
playing the target audio through text-to-speech in response to receiving a trigger operation on the expression image;
wherein, according to the target emotion, determining target audio corresponding to the expression image and the target emotion from a plurality of preset audio comprises:
matching a target audio corresponding to the target emotion from a plurality of audios having the mapping relation with the expression image according to the target emotion;
wherein, in response to the chat information including text, the determining the target emotion corresponding to the chat information includes:
word segmentation is carried out on the text;
converting the word segmentation into corresponding vectors by using a word vector model;
splicing vectors corresponding to the segmentation to obtain text vectors;
extracting the text vector by utilizing a feature extraction network in a text emotion classification model obtained by pre-training to obtain a semantic feature vector;
analyzing the semantic feature vector based on a classification network in the text emotion classification model to obtain a target emotion corresponding to the text;
and, in response to the chat information further including audio, the determining the target emotion corresponding to the chat information further includes:
extracting an audio feature vector of the audio;
matching the audio feature vector with a plurality of emotion feature models, wherein the plurality of emotion feature models respectively correspond to one of a plurality of emotion classifications;
and taking the emotion classification corresponding to the emotion feature model with the matching result as the target emotion corresponding to the audio.
2. The method of claim 1, wherein if the chat message comprises: current chat information, emoticons and/or historical chat information; and
the determining information to be transmitted according to the expression image and the target audio includes:
and determining information to be transmitted according to the expression image, the target audio and the current chat information.
3. The method of claim 2, wherein if the chat message comprises: current chat information, emoticons, and historical chat information; and
the determining the target emotion corresponding to the chat information comprises the following steps:
determining the target emotion according to the emotion corresponding to the historical chat information and the preset first weight, the emotion corresponding to the expression image and the preset second weight, and the emotion corresponding to the current chat information and the preset third weight.
4. An instant messaging device comprising:
the information acquisition module is configured to respond to receiving a selection instruction of a user on the expression image, and acquire chat information, wherein the chat information comprises at least one of the expression image, historical chat information and current chat information, and the current chat information is information to be sent by the user at the current moment;
the first determining module is configured to determine a target emotion corresponding to the chat information;
a second determining module configured to determine target audio corresponding to the expression image and the target emotion from a plurality of preset audio according to the target emotion, wherein the plurality of audio and the expression image have a mapping relationship, and the emotions corresponding to the plurality of audio are different;
the third determining module is configured to determine information to be transmitted according to the expression image and the target audio;
the information sending module is configured to send the information to be transmitted to a server so that the server can transmit the information to be transmitted to a target client;
wherein the apparatus further comprises:
an audio playing module configured to play the target audio through text-to-speech in response to receiving a trigger operation on the expression image;
wherein the second determination module is further configured to:
matching a target audio corresponding to the target emotion from a plurality of audios having the mapping relation with the expression image according to the target emotion;
wherein, in response to the chat information including text, the first determination module is further configured to:
word segmentation is carried out on the text;
converting the word segmentation into corresponding vectors by using a word vector model;
splicing vectors corresponding to the segmentation to obtain text vectors;
extracting the text vector by utilizing a feature extraction network in a text emotion classification model obtained by pre-training to obtain a semantic feature vector;
analyzing the semantic feature vector based on a classification network in the text emotion classification model to obtain a target emotion corresponding to the text;
and, in response to the chat information further comprising audio, the first determination module is further configured to:
extracting an audio feature vector of the audio;
matching the audio feature vector with a plurality of emotion feature models, wherein the plurality of emotion feature models respectively correspond to one of a plurality of emotion classifications;
and taking the emotion classification corresponding to the emotion feature model with the matching result as the target emotion corresponding to the audio.
5. The apparatus of claim 4, wherein if the chat message comprises: current chat information, emoticons and/or historical chat information; and
the third determination module is further configured to:
and determining information to be transmitted according to the expression image, the target audio and the current chat information.
6. The apparatus of claim 5, wherein if the chat message comprises: current chat information, emoticons, and historical chat information; and
the first determination module is further configured to:
determining the target emotion according to the emotion corresponding to the historical chat information and the preset first weight, the emotion corresponding to the expression image and the preset second weight, and the emotion corresponding to the current chat information and the preset third weight.
7. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-3.
8. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-3.
CN202110126201.6A 2021-01-29 2021-01-29 Instant messaging method, device, equipment, storage medium and program product Active CN112910761B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110126201.6A CN112910761B (en) 2021-01-29 2021-01-29 Instant messaging method, device, equipment, storage medium and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110126201.6A CN112910761B (en) 2021-01-29 2021-01-29 Instant messaging method, device, equipment, storage medium and program product

Publications (2)

Publication Number Publication Date
CN112910761A CN112910761A (en) 2021-06-04
CN112910761B true CN112910761B (en) 2023-04-21

Family

ID=76121006

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110126201.6A Active CN112910761B (en) 2021-01-29 2021-01-29 Instant messaging method, device, equipment, storage medium and program product

Country Status (1)

Country Link
CN (1) CN112910761B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116137617B (en) * 2021-11-17 2024-03-22 腾讯科技(深圳)有限公司 Expression pack display and associated sound acquisition methods, devices, equipment and storage medium
CN114237395A (en) * 2021-12-14 2022-03-25 北京百度网讯科技有限公司 Information processing method, information processing device, electronic equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105989165A (en) * 2015-03-04 2016-10-05 深圳市腾讯计算机系统有限公司 Method, apparatus and system for playing facial expression information in instant chat tool
CN112016367A (en) * 2019-05-31 2020-12-01 沈阳新松机器人自动化股份有限公司 Emotion recognition system and method and electronic equipment

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI454955B (en) * 2006-12-29 2014-10-01 Nuance Communications Inc An image-based instant message system and method for providing emotions expression
US20090125806A1 (en) * 2007-11-13 2009-05-14 Inventec Corporation Instant message system with personalized object and method thereof
CN109885713A (en) * 2019-01-03 2019-06-14 刘伯涵 Facial expression image recommended method and device based on voice mood identification
CN111835617B (en) * 2019-04-23 2022-12-27 阿里巴巴集团控股有限公司 User head portrait adjusting method and device and electronic equipment
CN110830368B (en) * 2019-11-22 2022-05-06 维沃移动通信有限公司 Instant messaging message sending method and electronic equipment

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105989165A (en) * 2015-03-04 2016-10-05 深圳市腾讯计算机系统有限公司 Method, apparatus and system for playing facial expression information in instant chat tool
CN112016367A (en) * 2019-05-31 2020-12-01 沈阳新松机器人自动化股份有限公司 Emotion recognition system and method and electronic equipment

Also Published As

Publication number Publication date
CN112910761A (en) 2021-06-04

Similar Documents

Publication Publication Date Title
CN109658928B (en) Cloud multi-mode conversation method, device and system for home service robot
CN111966800B (en) Emotion dialogue generation method and device and emotion dialogue model training method and device
CN111428010B (en) Man-machine intelligent question-answering method and device
CN112069309B (en) Information acquisition method, information acquisition device, computer equipment and storage medium
CN112732911A (en) Semantic recognition-based conversational recommendation method, device, equipment and storage medium
CN114416934B (en) Multi-modal dialog generation model training method and device and electronic equipment
CN115309877B (en) Dialogue generation method, dialogue model training method and device
CN114895817B (en) Interactive information processing method, network model training method and device
CN111274372A (en) Method, electronic device, and computer-readable storage medium for human-computer interaction
CN109697978B (en) Method and apparatus for generating a model
CN113450759A (en) Voice generation method, device, electronic equipment and storage medium
CN112910761B (en) Instant messaging method, device, equipment, storage medium and program product
CN113160819B (en) Method, apparatus, device, medium, and product for outputting animation
CN113157874B (en) Method, apparatus, device, medium, and program product for determining user's intention
CN114429767A (en) Video generation method and device, electronic equipment and storage medium
CN114550705A (en) Dialogue recommendation method, model training method, device, equipment and medium
CN111383138B (en) Restaurant data processing method, device, computer equipment and storage medium
CN113468857B (en) Training method and device for style conversion model, electronic equipment and storage medium
CN114547244A (en) Method and apparatus for determining information
CN109101956B (en) Method and apparatus for processing image
CN114494747A (en) Model training method, image processing method, device, electronic device and medium
CN112633004A (en) Text punctuation deletion method and device, electronic equipment and storage medium
CN110688470B (en) Method and apparatus for transmitting information
CN111522937A (en) Method and device for recommending dialect and electronic equipment
CN110781329A (en) Image searching method and device, terminal equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant