CN110493123B

CN110493123B - Instant messaging method, device, equipment and storage medium

Info

Publication number: CN110493123B
Application number: CN201910872132.6A
Authority: CN
Inventors: 刘立强; 何丹; 伍学平; 李辉
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2019-09-16
Filing date: 2019-09-16
Publication date: 2022-06-28
Anticipated expiration: 2039-09-16
Also published as: CN110493123A

Abstract

The application discloses an instant messaging method, device, equipment and storage medium, relating to the technical field of communications. The method comprises: receiving a voice wake-up signal while a user interface of an instant messaging program is displayed; waking up and displaying a voice control interface of the instant messaging program according to the voice wake-up signal; receiving a voice control signal while the voice control interface of the instant messaging program is displayed; and, according to the voice control signal, using conversational voice interaction to perform at least one of sending a message to another contact, reading new messages, and function setting in the instant messaging program. By receiving the voice control signal and using conversational voice interaction, the application executes the corresponding control operation in the instant messaging program, so that the user can obtain voice feedback without touch operation, which improves the efficiency of using the instant messaging program. The conversational interaction mode also improves the accuracy of voice control.

Description

Instant messaging method, device, equipment and storage medium

Technical Field

The present disclosure relates to the field of communications technologies, and in particular, to an instant messaging method, an instant messaging device, an instant messaging apparatus, and a storage medium.

Background

With the development of communication technology, instant messaging programs are used more and more widely. Through an instant messaging program, two or more people can exchange text messages, files, voice and video in real time over a network.

People with poor eyesight need to use the instant messaging program in an accessible mode. In this mode, the user touches the screen of the terminal by hand, and the terminal provides voice feedback for the user's touches and operations, thereby assisting visually impaired users.

When the instant messaging program is used in the accessible mode, the user must perform touch operations to obtain voice feedback, which is inefficient in scenarios where touching the terminal is inconvenient, such as cooking.

Disclosure of Invention

The embodiments of the application provide an instant messaging method, device, equipment and storage medium, which can solve the problem that, when an instant messaging program is used in the accessible mode, the user must perform touch operations to obtain voice feedback, which is inefficient in scenarios where touching the terminal is inconvenient, such as cooking.

The technical scheme is as follows:

According to an aspect of the present application, there is provided an instant messaging method, the method including:

receiving a voice wake-up signal while a user interface of an instant messaging program is displayed;

waking up and displaying a voice control interface of the instant messaging program according to the voice wake-up signal;

receiving a voice control signal while the voice control interface of the instant messaging program is displayed;

and, according to the voice control signal, using conversational voice interaction to perform at least one of sending a message to another contact, reading new messages, and function setting in the instant messaging program.

In an optional embodiment, using conversational voice interaction according to the voice control signal to perform at least one of sending a message to another contact, reading new messages, and function setting in the instant messaging program includes: displaying a first prompt text corresponding to the voice control signal in a first display area on the voice control interface; and playing the conversational response voice corresponding to the voice control signal, and displaying a second prompt text of the conversational response voice in a second display area on the voice control interface; wherein the conversational response voice comprises the execution result of the voice control signal, or candidate prompt information for the next voice control corresponding to the voice control signal.

In an optional embodiment, when the candidate prompt information includes at least two pieces of candidate prompt information, third prompt text corresponding to each of the at least two pieces of candidate prompt information is displayed in a third display area on the voice control interface.

In an optional embodiment, the voice control signal comprises a first voice control signal for contacting a first contact, and playing the conversational response voice corresponding to the voice control signal and displaying the second prompt text of the conversational response voice in the second display area on the voice control interface comprises: when the first contact has a unique match in the address book, playing a conversational response voice indicating that the session interface of the first contact is being opened, and displaying, in the second display area on the voice control interface, a second prompt text indicating that the session interface of the first contact is being opened.

In an optional embodiment, the voice control signal comprises a sixth voice control signal for reading the new message, and playing the conversational response voice corresponding to the voice control signal and displaying the second prompt text of the conversational response voice in the second display area on the voice control interface comprises: playing the conversational response voice for reading the new message, displaying, in the second display area on the voice control interface, a second prompt text of a third contact corresponding to the new message, and displaying, in a third display area on the voice control interface, a third prompt message thumbnail corresponding to the new message.

In an optional embodiment, the voice control signal comprises an eighth voice control signal for sending a voice message to a fourth contact, and playing the conversational response voice corresponding to the voice control signal and displaying the second prompt text of the conversational response voice in the second display area on the voice control interface comprises: playing the conversational response voice for recording, and displaying the second prompt text for recording in the second display area on the voice control interface; and receiving the voice message to be sent to the fourth contact, and displaying, in a third display area on the voice control interface, a third prompt text indicating that recording is in progress.

In another aspect, an instant messaging device is provided, the device comprising a receiving module, a display module and a processing module;

the receiving module is configured to receive a voice wake-up signal while a user interface of an instant messaging program is displayed;

the display module is configured to wake up and display a voice control interface of the instant messaging program according to the voice wake-up signal;

the receiving module is further configured to receive a voice control signal while the voice control interface of the instant messaging program is displayed;

and the processing module is configured to, according to the voice control signal, use conversational voice interaction to perform at least one of sending a message to another contact, reading new messages, and function setting in the instant messaging program.

In another aspect, a computer device is provided, which includes a processor and a memory, where at least one instruction, at least one program, a set of codes, or a set of instructions is stored in the memory, and the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by the processor to implement the instant messaging method as provided in the embodiments of the present application.

In another aspect, a computer-readable storage medium is provided, in which at least one instruction, at least one program, a set of codes, or a set of instructions is stored, and which is loaded and executed by the processor to implement the instant messaging method as provided in the embodiments of the present application.

In another aspect, a computer program product is provided, which when run on a computer causes the computer to perform the instant messaging method as provided in the embodiments of the present application.

The beneficial effects that technical scheme that this application embodiment brought include at least:

By receiving the input voice control signal and using conversational voice interaction, the control operation corresponding to the input voice control signal is executed in the instant messaging program, so the user can obtain voice feedback without touch operation, which improves the efficiency of using the instant messaging program. In addition, the conversational interaction mode handles both message transmission between users and function setting of the instant messaging program, which makes the voice control process clear and improves the accuracy of voice control.

Drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a block diagram of a terminal according to an exemplary embodiment of the present application;

Fig. 2 is a flowchart of an instant messaging method provided in an exemplary embodiment of the present application;

Fig. 3 is a schematic diagram of an instant messaging method provided in an exemplary embodiment of the present application;

Fig. 4 is a flowchart of an instant messaging method provided in an exemplary embodiment of the present application;

Fig. 5 is a schematic diagram of an instant messaging method provided in an exemplary embodiment of the present application;

Fig. 6 is a flowchart of an instant messaging method provided in an exemplary embodiment of the present application;

Fig. 7 is a schematic diagram of an instant messaging method provided in an exemplary embodiment of the present application;

Fig. 8 is a schematic diagram of an instant messaging method provided in an exemplary embodiment of the present application;

Fig. 9 is a flowchart of an instant messaging method provided in an exemplary embodiment of the present application;

Fig. 10 is a schematic diagram of an instant messaging method provided in an exemplary embodiment of the present application;

Fig. 11 is a schematic diagram of an instant messaging method provided in an exemplary embodiment of the present application;

Fig. 12 is a schematic diagram of an instant messaging method provided in an exemplary embodiment of the present application;

Fig. 13 is a flowchart of an instant messaging method provided in an exemplary embodiment of the present application;

Fig. 14 is a schematic diagram of an instant messaging method provided in an exemplary embodiment of the present application;

Fig. 15 is a schematic diagram of an instant messaging method provided in an exemplary embodiment of the present application;

Fig. 16 is a flowchart of an instant messaging method provided in an exemplary embodiment of the present application;

Fig. 17 is a schematic diagram of an instant messaging method provided in an exemplary embodiment of the present application;

Fig. 18 is a schematic diagram of messaging in an instant messaging program provided in an exemplary embodiment of the present application;

Fig. 19 is a schematic diagram of text-to-speech conversion provided in an exemplary embodiment of the present application;

Fig. 20 is a schematic diagram of matching contacts in an address book provided in an exemplary embodiment of the present application;

Fig. 21 is a block diagram of an instant messaging device provided in an exemplary embodiment of the present application;

Fig. 22 is a block diagram of an instant messaging device provided in an exemplary embodiment of the present application;

Fig. 23 is a block diagram of a server provided in an exemplary embodiment of the present application.

Detailed Description

To make the objects, technical solutions and advantages of the present application more clear, the following detailed description of the embodiments of the present application will be made with reference to the accompanying drawings.

First, terms referred to in the embodiments of the present application are briefly described:

Instant messaging: a terminal service that allows two or more people to exchange text messages, files, voice and video in real time over a network. Typical examples include instant messaging software such as mobile QQ, WeChat and WhatsApp.

AIO (All In One): a shared chat window component. In instant messaging software, users take part in many different types of conversations, such as conversations with friends, groups and public accounts. To give users a uniform interactive experience, the software provides a chat window component shared by the different conversations, so that user behavior in the chat window component, such as input and click operations, can be considered consistent.

Fig. 1 shows a block diagram of a terminal 1100 according to an exemplary embodiment of the present application, where an instant messaging program is installed in the terminal 1100. The terminal 1100 may be an electronic device on which application programs are installed and run and which can perform screen projection, such as a smart phone, a tablet computer, an e-book reader or a portable personal computer. The terminal 1100 in the present application may include one or more of the following components: a processor 1110, a memory 1120, and a screen 1130.

The processor 1110 may include one or more processing cores. The processor 1110 connects the various parts of the terminal 1100 through various interfaces and lines, and performs the functions of the terminal 1100 and processes data by running or executing instructions, programs, code sets or instruction sets stored in the memory 1120 and by invoking data stored in the memory 1120. Optionally, the processor 1110 may be implemented in hardware using at least one of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA) and Programmable Logic Array (PLA). The processor 1110 may integrate one or a combination of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem and the like. The CPU mainly handles the operating system, the user interface, application programs and the like; the GPU is responsible for rendering and drawing the content that the screen 1130 needs to display; and the modem handles wireless communication. It can be understood that the modem may also be implemented by a separate communication chip instead of being integrated into the processor 1110.

The memory 1120 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). Optionally, the memory 1120 includes a non-transitory computer-readable medium. The memory 1120 may be used to store instructions, programs, code, code sets or instruction sets. The memory 1120 may include a program storage area and a data storage area. The program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function and an image playing function), instructions for implementing the method embodiments of this application, and the like. The operating system may be the Android system (including systems developed in depth on the basis of Android), the iOS system developed by Apple Inc. (including systems developed in depth on the basis of iOS), or another system. The data storage area may store data created by the terminal 1100 during use (such as a phone book, audio and video data, and chat log data).

The screen 1130 may be a touch display screen, which receives touch operations performed on or near it by the user with any suitable object, such as a finger or a stylus, and displays the user interface of each application. The touch display screen is generally provided on the front panel of the terminal 1100. The touch display screen may be designed as a full screen, a curved screen or a special-shaped screen. It may also be designed as a combination of a full screen and a curved screen, or a combination of a special-shaped screen and a curved screen, which is not limited in this embodiment.

In addition, those skilled in the art will appreciate that the configuration of terminal 1100 illustrated in the above-identified figures is not meant to be limiting with respect to terminal 1100, and that the terminal may include more or fewer components than illustrated, or some components may be combined, or a different arrangement of components. For example, the terminal 1100 further includes a radio frequency circuit, a shooting component, a sensor, an audio circuit, a Wireless Fidelity (WiFi) component, a power supply, a bluetooth component, and other components, which are not described herein again.

Fig. 2 is a flowchart illustrating an instant messaging method according to an exemplary embodiment of the present application. The method is applied to the terminal shown in fig. 1, and comprises the following steps:

Step 201, receiving a voice wake-up signal while a user interface of an instant messaging program is displayed;

The instant messaging program is a program that implements online chat and communication through instant messaging technology. It allows two or more people to exchange text messages, files, voice and video in real time over a network.

Illustratively, instant messaging programs include, but are not limited to, at least one of QQ, WeChat, Feixin, MSN and Momo.

The user interface of the instant messaging program is any user interface belonging to the instant messaging program.

In this application, QQ is used as an example of the instant messaging program.

Optionally, the user interface of the instant messaging program includes, but is not limited to, at least one of a chat list interface, a chat window interface (AIO) for chatting with a friend, an interface presenting friends' updates, and a settings interface of QQ, as shown in (a) of fig. 3.

The voice wake-up signal is a voice signal for starting the conversational voice interaction function of the instant messaging program.

Optionally, the voice wake-up signal can be received while the instant messaging program is running on the terminal shown in fig. 1. Specifically, when the user speaks the voice wake-up signal, it is collected by a microphone of the terminal, provided that the instant messaging program has obtained permission to use the microphone.

Optionally, the voice wake-up signal is a voice signal input by the user that contains a specific wake-up word, for example, a voice signal containing the wake-up word "hello, something". The specific wake-up word is either set by default in the instant messaging program or chosen and set by the user.

Illustratively, in the user interface of the QQ as shown in (a) in fig. 3, the QQ acquires the right to use the microphone, and the QQ receives a voice wake-up signal input by the user from the microphone: "hello, QQ".

It should be noted that when the user inputs not the voice wake-up signal but other voice signals, the conversational voice interaction function of the instant messaging program is not started.
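The wake-up check described above can be sketched as follows. This is a minimal illustration, not the implementation of the application: it assumes the speech has already been transcribed to text, and the names `normalize` and `is_wake_signal` are hypothetical. Only the example wake-up word "hello, QQ" comes from the text.

```python
import re

# Example wake-up word taken from the text; it may also be set by the user.
DEFAULT_WAKE_WORD = "hello, QQ"


def normalize(text: str) -> str:
    """Lowercase the text and drop punctuation and whitespace."""
    return re.sub(r"[\W_]+", "", text.lower())


def is_wake_signal(transcript: str, wake_word: str = DEFAULT_WAKE_WORD) -> bool:
    """Return True if the transcribed speech contains the wake-up word.

    Any other speech leaves the conversational voice interaction
    function switched off (the function returns False).
    """
    target = normalize(wake_word)
    if not target:
        return False  # an empty wake-up word never matches
    return target in normalize(transcript)
```

With this sketch, `is_wake_signal("Hello QQ!")` is true, while an unrelated utterance returns false and does not start the function.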

Step 202, waking up and displaying a voice control interface of the instant messaging program according to the voice wake-up signal;

The voice control interface is an interface that displays the state of the conversational voice interaction of the instant messaging program.

Optionally, the text corresponding to the voice wake-up signal is displayed above the voice control interface by default.

Optionally, a voice control interface of the instant messaging program is displayed on a screen of the terminal in a full screen manner; or, the voice control interface of the instant communication program is displayed on the screen of the terminal in a half screen mode; or, the voice control interface of the instant messaging program is displayed on the screen of the terminal in a floating mode.

Step 203, receiving a voice control signal while the voice control interface of the instant messaging program is displayed;

The voice control signal is a voice signal for controlling the instant messaging program to perform an operation.

Optionally, the text corresponding to the voice control signal is displayed above the voice control interface by default.

Optionally, the content of the voice control signal includes at least one of an operation intention and a target contact. As shown in (b) of fig. 3, after the voice wake-up signal is received, the voice control interface of the instant messaging program is displayed, and it prompts the available voice control signals. The voice control signal "open settings" includes the operation intention "open settings". The voice control signal "send a message to Minuscule" includes the operation intention "send a message to" and the target contact "Minuscule". The voice control signal "make a voice call to Xiaoming" includes the operation intention "make a voice call" and the target contact "Xiaoming".

As shown in (c) of fig. 3, the voice control signal is "send a message to Dimension". The preceding voice input is the voice wake-up signal, and the text corresponding to the voice wake-up signal is displayed, in a smaller font size, above the text corresponding to the voice control signal.
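Splitting a voice control signal into an operation intention and a target contact can be sketched as below. The sketch assumes the signal has already been converted to English text and uses simple regular expressions; the intent labels, `ParsedCommand` and `parse_command` are hypothetical names, and a real system would more likely use a trained language-understanding model.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedCommand:
    intent: str                    # operation intention
    contact: Optional[str] = None  # target contact, if the command names one


# Hypothetical phrase patterns modelled on the example commands in the text.
_PATTERNS = [
    ("open_settings", re.compile(r"^open settings?$", re.IGNORECASE)),
    ("send_message", re.compile(r"^send a message to (?P<contact>.+)$", re.IGNORECASE)),
    ("voice_call", re.compile(r"^make a voice call to (?P<contact>.+)$", re.IGNORECASE)),
]


def parse_command(text: str) -> Optional[ParsedCommand]:
    """Return the intent (and contact, if any), or None if nothing matches."""
    cleaned = text.strip()
    for intent, pattern in _PATTERNS:
        match = pattern.match(cleaned)
        if match:
            contact = match.groupdict().get("contact")
            return ParsedCommand(intent, contact.strip() if contact else None)
    return None
```

A command such as "open settings" carries only an intention, whereas "make a voice call to Xiaoming" carries both an intention and a target contact.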

Step 204, according to the voice control signal, using conversational voice interaction to perform at least one of sending a message to another contact, reading new messages, and function setting in the instant messaging program;

Conversational voice interaction refers to an interaction mode in which a conversational response voice signal is output in response to an input voice signal.

The conversational speech interaction includes: at least one of a single-round of conversational voice interaction and a multi-round of conversational voice interaction.

In summary, in the method provided in this embodiment, by receiving the input voice control signal and using conversational voice interaction, the control operation corresponding to the input voice control signal is executed in the instant messaging program. The user obtains a voice response without touch operation, which frees the user's hands and improves the efficiency of using the instant messaging program.
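The flow of steps 201 to 204 can be pictured as a small state machine. The following is a simplified sketch under stated assumptions: speech arrives as already-transcribed text, the wake-up word must match exactly, and the class `VoiceSession`, its states and its return strings are hypothetical placeholders rather than anything defined by the application.

```python
from enum import Enum, auto


class State(Enum):
    USER_INTERFACE = auto()  # ordinary user interface is displayed (step 201)
    VOICE_CONTROL = auto()   # voice control interface is displayed (steps 202-204)


class VoiceSession:
    def __init__(self, wake_word: str = "hello, QQ") -> None:
        self.wake_word = wake_word.lower()
        self.state = State.USER_INTERFACE
        self.executed = []  # voice control signals handled so far

    def on_speech(self, transcript: str) -> str:
        text = transcript.strip().lower()
        if self.state is State.USER_INTERFACE:
            # Steps 201-202: only the wake-up signal opens the voice control interface.
            if text == self.wake_word:
                self.state = State.VOICE_CONTROL
                return "voice control interface displayed"
            return "ignored"
        # Steps 203-204: any further speech is treated as a voice control signal.
        self.executed.append(text)
        return f"executing: {text}"
```

Speech received before the wake-up signal is ignored, which mirrors the note that other voice signals do not start the conversational voice interaction function.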

In an alternative embodiment based on fig. 2, fig. 4 shows a flowchart of an instant messaging method provided by an exemplary embodiment of the present application. In this embodiment, step 204 in the above embodiment may alternatively be implemented as step 2041 and step 2042, where the method includes:

Step 2041, displaying a first prompt text corresponding to the voice control signal in a first display area on the voice control interface;

Illustratively, after the voice wake-up signal starts the conversational voice interaction function, the instant messaging program in the terminal receives the voice control signal through the microphone. A speech-to-text module in the processor converts the voice content of the voice control signal into text content, namely the first prompt text, which is displayed in the first display area of the voice control interface.

Optionally, the first display area is located at the top of the voice control interface.

Optionally, the voice control signal is used for instructing at least one of message sending, reading new messages and function setting to other contacts in the instant messaging program.

Step 2042, playing the dialogue type response voice corresponding to the voice control signal, and displaying a second prompt text of the dialogue type response voice in a second display area on the voice control interface;

Wherein the conversational response voice comprises: the execution result of the voice control signal, or the candidate prompt information of the next voice control corresponding to the voice control signal.

Illustratively, as shown in (a) of fig. 5, the voice control signal is "send a message to Dimension", and the target contact in the voice control signal is "Dimension". The server matches "Dimension" against the remark names of the friends in the friend list and finds a matching friend, and the conversational response voice is "found, opening conversation …", which is the execution result of the voice control signal.
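The two kinds of content a conversational response voice may carry can be modelled as a small data type. This is only a sketch: the class and field names are hypothetical, and treating the two kinds as mutually exclusive is an assumption made here for illustration, based on the word "or" in the description.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ConversationalResponse:
    speech: str                               # spoken aloud and shown as the second prompt text
    execution_result: Optional[str] = None    # result of executing the voice control signal
    candidate_prompts: Tuple[str, ...] = ()   # candidate prompts for the next voice control

    def __post_init__(self) -> None:
        has_result = self.execution_result is not None
        has_candidates = len(self.candidate_prompts) > 0
        # Assumption: exactly one of the two kinds of content is present.
        if has_result == has_candidates:
            raise ValueError("provide either an execution result or candidate prompts")


# Example modelled on (a) of fig. 5: the response reports an execution result.
opened = ConversationalResponse(
    speech="found, opening conversation",
    execution_result="session_opened",
)
```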

In one example, when the candidate prompt information includes at least two pieces of candidate prompt information, third prompt text corresponding to each of the at least two pieces of candidate prompt information is displayed in a third display area on the voice control interface.

In one example, when receiving the voice control signal, at least one of the following three steps is further included:

displaying a sound wave feedback animation corresponding to the voice control signal in a fourth display area on the voice control interface, wherein the sound wave feedback animation is used for displaying the waveform of the voice control signal;

displaying voice operation prompt information corresponding to the voice control signal in a fifth display area on the voice control interface;

and displaying a sound wave recording feedback animation corresponding to the voice control signal in a sixth display area on the voice control interface, wherein the sound wave recording feedback animation is used for displaying the recording status of the voice control signal.

As shown in (a) of fig. 5, the first prompt text "send a message to Dimension" corresponding to the voice control signal is displayed in the first display area 501, located at the top of the voice control interface. The second display area 502, which displays the second prompt text "found, opening conversation …" of the conversational response voice, is located below the first display area 501. The matched friend is displayed in the third display area 503, located below the second display area 502. The sound wave feedback animation corresponding to the voice control signal, which shows the waveform of the voice control signal, is displayed in the fourth display area 504, located below the third display area 503. The voice operation prompt information "please speak, I am listening" corresponding to the voice control signal is displayed in the fifth display area 505, located below the fourth display area 504. The sound wave recording feedback animation corresponding to the voice control signal, which shows the recording status of the voice control signal, is displayed in the sixth display area, located below the fifth display area 505.
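The top-to-bottom stacking of the display areas can be sketched as an ordered list that is rendered in sequence. The area keys and the `render` function are hypothetical, and the content strings are placeholders based on the example in (a) of fig. 5.

```python
from typing import Dict, List

# Top-to-bottom order of the display areas, as described for (a) of fig. 5.
AREA_ORDER = ["first", "second", "third", "fourth", "fifth", "sixth"]


def render(areas: Dict[str, str]) -> List[str]:
    """Return one line per non-empty display area, in top-to-bottom order."""
    return [f"[{name}] {areas[name]}" for name in AREA_ORDER if areas.get(name)]


screen = render({
    "second": "found, opening conversation",
    "first": "send a message to Dimension",
    "third": "Dimension",
    "fifth": "please speak, I am listening",
})
```

Here the first prompt text is rendered at the top even though it was supplied second, and areas with no content (the fourth and sixth in this call) are simply skipped.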

In summary, the method provided in this embodiment displays, on the voice control interface, the text corresponding to the received voice control signal together with the corresponding response information, including the text corresponding to the conversational response voice, the sound wave feedback animation, the voice operation prompt information and the sound wave recording feedback animation. This lets the user clearly follow the state of the conversational voice interaction, supports visual accessibility, lets users whose hands are freed by this instant messaging method perceive the conversational voice interaction visually, and improves the accuracy and convenience of voice control.

In an alternative embodiment based on fig. 4, fig. 6 shows a flowchart of an instant messaging method provided in an exemplary embodiment of the present application. In this embodiment, the voice control signal is a first voice control signal for contacting the first contact, and step 2042 in the above embodiment may alternatively be implemented as step 2043:

Step 2043, when the first contact has a unique match in the address book, playing a conversational response voice indicating that the session interface of the first contact is being opened, and displaying, in a second display area on the voice control interface, a second prompt text indicating that the session interface of the first contact is being opened;

The first contact is the target contact in the first voice control signal. Optionally, the first contact is a friend or a group.

The first contact having a unique match in the address book means the following: after the voice control signal input by the user is received, the server converts the voice control signal into corresponding text that contains the first contact, and the first contact is matched one by one against all contacts in the address book, yielding a unique matching result.
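The one-by-one matching against the address book can be sketched as below. The sketch assumes exact, case-insensitive comparison against each contact's remark name; the `Contact` type and function names are hypothetical, and the actual matching rule used by the server is not specified in this passage.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Contact:
    contact_id: int
    remark_name: str  # a friend or a group, identified here by its remark name


def find_matches(spoken_name: str, address_book: List[Contact]) -> List[Contact]:
    """Compare the spoken name with every contact in the address book, one by one."""
    target = spoken_name.strip().casefold()
    return [c for c in address_book if c.remark_name.strip().casefold() == target]


def unique_match(spoken_name: str, address_book: List[Contact]) -> Optional[Contact]:
    """Return the contact only when exactly one entry matches; otherwise None."""
    matches = find_matches(spoken_name, address_book)
    return matches[0] if len(matches) == 1 else None
```

When `unique_match` returns a contact, the session interface for that contact can be opened directly; zero or several matches need different handling.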

In one example, contacting the first contact includes: at least one of sending a text message, initiating a voice call, and initiating a video call to the first contact.

Optionally, when contacting the first contact means sending a text message to the first contact, the session interface is the AIO interface with the first contact; when contacting the first contact means initiating a voice call, the session interface is an interface for waiting for the first contact to answer the voice call; when contacting the first contact means initiating a video call, the session interface is an interface for waiting for the first contact to answer the video call.
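The correspondence between the way of contacting the first contact and the session interface that is opened can be expressed as a lookup table. The enum members and interface identifiers below are hypothetical labels chosen for this sketch.

```python
from enum import Enum


class ContactMode(Enum):
    TEXT_MESSAGE = "text_message"
    VOICE_CALL = "voice_call"
    VIDEO_CALL = "video_call"


# Hypothetical identifiers for the three session interfaces described above.
SESSION_INTERFACE = {
    ContactMode.TEXT_MESSAGE: "aio_chat_window",
    ContactMode.VOICE_CALL: "waiting_for_voice_call_answer",
    ContactMode.VIDEO_CALL: "waiting_for_video_call_answer",
}


def session_interface_for(mode: ContactMode) -> str:
    """Return the session interface to open for the given way of contacting."""
    return SESSION_INTERFACE[mode]
```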

As shown in fig. 5, contacting the first contact is sending a text message to the first contact. In fig. 5 (a), the voice control signal is "send message to dimension", the first contact is a dimension, the first contact has unique match in the address book, the dialogue response voice of the session interface of the dimension being opened is played, and the second prompt text of the session interface of the dimension being opened is displayed in the second display area on the voice control interface. In fig. 5 (b), the AIO is opened for transition switching, and the interface is switched from the voice control interface to the AIO. In fig. 5 (c), AIO is entered.

As shown in fig. 7, contacting the first contact is initiating a voice call to the first contact. In fig. 7 (a), the voice control signal is "make a voice call to dimension", the first contact is dimension, the first contact has a unique match in the address book, a dialog response voice that is determined to be the friend is played, and the matching result of the first contact is confirmed again. In fig. 7 (b), after receiving the confirmation voice signal, playing the dialogue-type response voice of the conversation interface of the dimension being opened, and displaying the second prompt text of the conversation interface of the dimension being opened in the second display area on the voice control interface. In fig. 7 (c), an interface waiting for the first contact to answer the voice call is opened for transition switching, and the interface is switched from the voice control interface to an interface waiting for the first contact to answer the voice call. In fig. 7 (d), an interface waiting for the first contact to answer the voice call is entered.

As shown in fig. 8, contacting the first contact is initiating a video call to the first contact. In fig. 8 (a), the voice control signal is "make a video call to dimension" and the first contact is "dimension". The first contact has a unique match in the address book, so a dialogue-type response voice asking the user to confirm that the matched friend is the intended one is played, and the matching result for the first contact is confirmed again. In fig. 8 (b), after the confirmation voice signal is received, a dialogue-type response voice indicating that the session interface of "dimension" is being opened is played, and a second prompt text indicating the same is displayed in the second display area on the voice control interface. In fig. 8 (c), a transition is performed while the interface waiting for the first contact to answer the video call is opened, and the interface switches from the voice control interface to that waiting interface. In fig. 8 (d), the interface waiting for the first contact to answer the video call is entered.

In summary, with the method provided by this embodiment, when the target contact in the voice control signal has a unique matching result in the address book, the corresponding session interface is opened, which improves the efficiency of holding a conversation in scenarios where the instant messaging program is used without touch operation.

In an alternative embodiment based on fig. 4, fig. 9 shows a flowchart of an instant messaging method provided by an exemplary embodiment of the present application. In this embodiment, the voice control signal is a second voice control signal for contacting the second contact, and step 2042 in the above embodiment may alternatively be implemented as step 2044:

Step 2044, when at least two candidate matching contacts exist in the address book for the second contact in the first prompt text, playing a dialogue-type response voice announcing the candidate prompt sequence numbers of the at least two candidate matching contacts, and displaying a second prompt text listing those candidate prompt sequence numbers in the second display area on the voice control interface.

The second contact is a target contact in the second voice control signal. Optionally, the second contact is a friend or a group.

The candidate matching contacts are either a plurality of contacts in the address book that match the second contact, or a plurality of frequently used contacts in the address book. Each candidate matching contact corresponds to one candidate prompt sequence number.

The cases in which the server cannot obtain a unique match for the second contact in the address book, so that candidate matching contacts exist, include: the second contact has multiple matching results in the address book, in which case the second contact corresponds to multiple candidate matching contacts; or the second contact has no matching result in the address book, in which case the frequently used contacts are taken as the candidate matching contacts.
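The three lookup outcomes described above (unique match, multiple matches, no match with a fallback to frequently used contacts) can be sketched as follows. This is a minimal illustration only: the names `MatchOutcome`, `resolve_contact`, and `numbered_prompt` are hypothetical, and the limit of 4 candidates is an assumption taken from the figures, not something the embodiment prescribes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MatchOutcome:
    kind: str              # "unique", "multiple", or "none"
    candidates: List[str]  # contact to open, or candidates to offer to the user


def resolve_contact(matches: List[str], frequent: List[str], limit: int = 4) -> MatchOutcome:
    """Classify an address-book lookup result (hypothetical helper)."""
    if len(matches) == 1:
        # Unique match: the session interface can be opened directly.
        return MatchOutcome("unique", matches)
    if len(matches) >= 2:
        # Several matches: offer them as numbered candidates.
        return MatchOutcome("multiple", matches[:limit])
    # No match: fall back to frequently used contacts as candidates.
    return MatchOutcome("none", frequent[:limit])


def numbered_prompt(candidates: List[str]) -> List[str]:
    """Attach a candidate prompt sequence number (starting at 1) to each contact."""
    return [f"{i}. {name}" for i, name in enumerate(candidates, start=1)]


outcome = resolve_contact(["pico", "weiwei"], frequent=["Didi"])
prompt_lines = numbered_prompt(outcome.candidates)
```

Keeping the fallback inside the same function means the caller only has to branch on `kind` to decide between opening a session and prompting for a sequence number.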

Optionally, contacting the second contact includes: at least one of sending a text message, initiating a voice call, and initiating a video call to the second contact.

In one example, the voice control signal further comprises a third voice control signal for selecting a target candidate prompt sequence number from the at least two candidate prompt sequence numbers. The second contact corresponding to the target candidate prompt sequence number is determined according to the third voice control signal. A dialogue-type response voice indicating that the session interface of the second contact is being opened is then played, and a second prompt text indicating the same is displayed in the second display area on the voice control interface.

As shown in fig. 10, contacting the second contact is sending a text message to the second contact. In fig. 10 (a), the voice control signal is "send message to dimension", and the second contact is "dimension". The second contact has 4 matching results in the address book: candidate prompt sequence number 1 corresponds to the candidate matching contact "pico"; candidate prompt sequence number 2 corresponds to the candidate matching contact "weiwei"; candidate prompt sequence number 3 corresponds to the candidate matching contact "smiles"; and candidate prompt sequence number 4 corresponds to the candidate matching contact "Wenwer classmate". A dialogue-type response voice announcing the candidate prompt sequence numbers of the 4 candidate matching contacts is played, and a second prompt text indicating that 4 friends were found and asking which of them the message should be sent to is displayed in the second display area on the voice control interface.

In fig. 10 (b), a third voice control signal selecting candidate prompt sequence number 2 from the 4 candidate prompt sequence numbers is received. The second contact "weiwei" corresponding to the target candidate prompt sequence number 2 is determined according to the third voice control signal. A dialogue-type response voice indicating that the session interface of "weiwei" is being opened is played, and a second prompt text indicating the same is displayed in the second display area on the voice control interface. In fig. 10 (c), a transition is performed while the AIO is opened, and the interface switches from the voice control interface to the AIO.
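Resolving the target candidate prompt sequence number to a contact amounts to a bounds-checked, 1-based lookup. The sketch below assumes the sequence number has already been extracted from the third voice control signal as an integer; `select_candidate` is a hypothetical name.

```python
from typing import List, Optional


def select_candidate(candidates: List[str], sequence_number: int) -> Optional[str]:
    """Return the contact for a 1-based candidate prompt sequence number, or None if out of range."""
    if 1 <= sequence_number <= len(candidates):
        return candidates[sequence_number - 1]
    return None


candidates = ["pico", "weiwei", "smiles", "Wenwer classmate"]
chosen = select_candidate(candidates, 2)
```

Returning `None` for an out-of-range number leaves it to the caller to decide how to prompt the user again; the embodiment does not specify that behaviour.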

As shown in fig. 11, contacting the second contact is initiating a voice call to the second contact. In fig. 11 (a), the voice control signal is "send message to dimension", and the second contact is "dimension". The second contact has 4 matching results in the address book: candidate prompt sequence number 1 corresponds to the candidate matching contact "pico"; candidate prompt sequence number 2 corresponds to the candidate matching contact "weiwei"; candidate prompt sequence number 3 corresponds to the candidate matching contact "smiles"; and candidate prompt sequence number 4 corresponds to the candidate matching contact "Wenwer classmate". A dialogue-type response voice announcing the candidate prompt sequence numbers of the 4 candidate matching contacts is played, and a second prompt text indicating that 4 friends were found and asking which of them the voice call should be made to is displayed in the second display area on the voice control interface.

In fig. 11 (b), a third voice control signal selecting candidate prompt sequence number 1 from the 4 candidate prompt sequence numbers is received. The second contact "pico" corresponding to the target candidate prompt sequence number 1 is determined according to the third voice control signal, and a second prompt text indicating that the session interface of "pico" is being opened is displayed in the second display area on the voice control interface. In fig. 11 (c), a transition is performed while the interface waiting for the second contact to answer the voice call is opened, and the interface switches from the voice control interface to that waiting interface. In fig. 11 (d), the interface waiting for the second contact to answer the voice call is entered.

In one example, the voice control signal further comprises: a fourth voice control signal for selecting, from the at least two candidate prompt sequence numbers, a target candidate prompt sequence number whose remark is to be modified, and a fifth voice control signal for determining the modified name of the second contact corresponding to the target candidate prompt sequence number. The second contact corresponding to the target candidate prompt sequence number is determined according to the fourth voice control signal, and a dialogue-type response voice for modifying the remark of the second contact is played. The modified name of the second contact is determined according to the fifth voice control signal. A dialogue-type response voice indicating that the session interface of the second contact is being opened is then played, and a second prompt text indicating the same is displayed in the second display area on the voice control interface.

As shown in fig. 12, contacting the second contact is sending a text message to the second contact. In fig. 12 (a), the voice control signal is "send message to dimension", and the second contact is "dimension". The second contact has no matching result in the address book, so the frequently used contacts in the address book are taken as candidate matching contacts: candidate prompt sequence number 1 corresponds to the candidate matching contact "Didi"; candidate prompt sequence number 2 corresponds to the candidate matching contact "withholds the head"; candidate prompt sequence number 3 corresponds to the candidate matching contact "eats fructus Pyri"; and candidate prompt sequence number 4 corresponds to the candidate matching contact "Wenwer classmate". A dialogue-type response voice announcing the candidate prompt sequence numbers of the 4 candidate matching contacts is played, and a second prompt text indicating that "dimension" was not found and asking which of the candidates the message should be sent to is displayed in the second display area on the voice control interface.

In fig. 12 (b), a fourth voice control signal for selecting, from the 4 candidate prompt sequence numbers, a target candidate prompt sequence number whose remark is to be modified is received. The second contact "Didi" corresponding to the target candidate prompt sequence number 1 is determined according to the fourth voice control signal, and a dialogue-type response voice for modifying the remark of the second contact is played.

In fig. 12 (c), a fifth voice control signal for determining the modified name of the second contact corresponding to the target candidate prompt sequence number is received, and the modified name of the second contact is determined according to the fifth voice control signal.

Alternatively, as shown in fig. 12 (d), a fifth input signal for determining the modified name of the second contact corresponding to the target candidate prompt sequence number is received, and the modified name of the second contact is determined according to the fifth input signal.

A dialogue-type response voice indicating that the session interface of the second contact is being opened is then played, and a second prompt text indicating the same is displayed in the second display area on the voice control interface.

In summary, with the method provided in this embodiment, when the second contact in the voice control signal has multiple matching results or no matching result in the address book, the candidate matching contacts are displayed and the session interface of the corresponding second contact is opened according to the voice control signal, which improves the efficiency of holding a conversation in scenarios where the instant messaging program is used without touch operation.

In an alternative embodiment based on fig. 4, fig. 13 shows a flowchart of an instant messaging method provided in an exemplary embodiment of the present application. In this embodiment, the voice control signal is a sixth voice control signal for reading the new message, and step 2042 in the above embodiment may alternatively be implemented as step 2045:

Step 2045, a dialogue-type response voice reading the new message aloud is played, a second prompt text of a third contact corresponding to the new message is displayed in a second display area on the voice control interface, and a third prompt message thumbnail corresponding to the new message is displayed in a third display area on the voice control interface.

The new message is a message that has not yet been read in the instant messaging program.

Optionally, the new message is a text message or a voice message. If the new message is a text message, it is read aloud; if the new message is a voice message, it is played.

The third contact is a target contact in the third voice control signal. Optionally, the third contact is a friend or a group.

The third prompt message thumbnail is a thumbnail of all new messages of the third contact, or a thumbnail of the new message of the third contact that is currently being played.

In one example, a dialogue-type response voice asking whether to reply to the new message is played, and a second prompt text asking whether to reply to the new message is displayed in the second display area on the voice control interface. A seventh voice control signal confirming whether to reply to the new message is received, and at least one of the following is performed: replying to the new message, not replying to the new message, and reading the next new message.

Illustratively, if the seventh voice control signal is "reply", replying to the new message is performed; if the seventh voice control signal is "cancel", the new message is not replied to and the voice control interface currently reading the new message is exited; and if no seventh voice control signal is received, the next new message is read by default.
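The branching on the seventh voice control signal can be sketched as a small decision function. The names `Action` and `decide_after_reading` are hypothetical, the signal is assumed to arrive as already-recognized text (or `None` when nothing was received), and treating unrecognized text like "no signal" is an assumption that the embodiment does not address.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    REPLY = "reply"          # record and send a reply
    EXIT = "exit"            # do not reply, leave the reading interface
    READ_NEXT = "read_next"  # continue with the next new message


def decide_after_reading(seventh_signal: Optional[str]) -> Action:
    """Map the recognized seventh voice control signal to the next action."""
    if seventh_signal is None:
        # No signal received: read the next new message by default.
        return Action.READ_NEXT
    text = seventh_signal.strip().lower()
    if text == "reply":
        return Action.REPLY
    if text == "cancel":
        return Action.EXIT
    # Assumption: anything else is handled like "no signal".
    return Action.READ_NEXT
```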

As shown in fig. 14 (a), the voice control signal is "help me read new message". In fig. 14 (b), the new message "go to zhou jeren's concert at night?" is read aloud, the second prompt text "dimension language" of the third contact corresponding to the new message is displayed in the second display area on the voice control interface, and the third prompt message thumbnail corresponding to the new message is displayed in the third display area on the voice control interface.

As shown in fig. 14 (c), after all the new messages of "dimension" have been read, a dialogue-type response voice asking whether to reply to the new message is played, and a second prompt text asking whether to reply to the new message is displayed in the second display area on the voice control interface. As shown in fig. 14 (d), when the seventh voice control signal confirming the reply is received, a second prompt text indicating that recording is in progress is displayed in the second display area on the voice control interface; after the recording is finished, if a voice control signal confirming the sending is received, the voice reply message is sent.

In fig. 15, the voice control signal is "help me read new message". In fig. 15 (a), the new message "go to jerry in the evening at singing meeting of zhou jeron?" is read aloud, a second prompt text asking whether to reply to the new message is displayed in the second display area on the voice control interface, and the third prompt message thumbnail corresponding to the new message is displayed in the third display area on the voice control interface.

As shown in fig. 15 (b), after the dialogue-type response voice asking whether to reply to the new message is played, if the seventh voice control signal is not received, a second prompt text indicating that the next new message will be read automatically when there is no operation is displayed in the second display area on the voice control interface.

As shown in fig. 15 (c), when the seventh voice control signal is not received, the second prompt text "small hair comes to new message" is displayed in the second display area on the voice control interface, and the third prompt message thumbnail corresponding to the new message is displayed in the third display area on the voice control interface.

In summary, with the method provided in this embodiment, when a new message is received in the instant messaging program, the new message is read aloud in response to the voice control signal and can be replied to by voice control, which improves the efficiency of holding a conversation in scenarios where the instant messaging program is used without touch operation.

In an alternative embodiment based on fig. 4, fig. 16 shows a flowchart of an instant messaging method provided in an exemplary embodiment of the present application. In this embodiment, the voice control signal is an eighth voice control signal for sending a voice message to the fourth contact, and step 2042 in the above embodiment may alternatively be implemented as step 2046:

Step 2046, a dialogue-type response voice indicating that recording is in progress is played, a second prompt text indicating that recording is in progress is displayed in the second display area on the voice control interface, the voice message to be sent to the fourth contact is received, and a third prompt text corresponding to the sent voice message is displayed in the third display area on the voice control interface.

The fourth contact is the target contact in the eighth voice control signal. Optionally, the fourth contact is a friend or a group.

As shown in (a) of fig. 17, the voice control signal is an instruction to send a voice message to "dimension". As shown in (b) of fig. 17, a dialogue-type response voice indicating that recording is in progress is played, a second prompt text indicating that recording is in progress is displayed in the second display area on the voice control interface, and a thumbnail of the chat box with the fourth contact is displayed in the third display area on the voice control interface.

As shown in fig. 17 (c), a 5-second voice message to be sent to the fourth contact is received, and a third prompt text corresponding to the sent voice message is displayed in the third display area on the voice control interface, namely the third prompt text "i am on the road, there is a little traffic congestion, and you wait a little, i arrive immediately".

In summary, with the method provided in this embodiment, when the user wants to send a voice message to a friend in the address book through the instant messaging program, the voice control interface for sending the voice message is opened in response to the voice control signal and the voice message received from the user is sent, which improves the efficiency of holding a conversation in scenarios where the instant messaging program is used without touch operation.

The server of the instant messaging program needs to provide the following services: a voice assistant access service, a voice service, a semantic service, a text-to-speech service, a messaging service, and a relationship chain service.

The voice assistant access service is a service for establishing communication with a client and transmitting and receiving data to realize the function of transmitting and receiving messages.

The voice service refers to a service for receiving voice data and converting the voice data into a voice text.

The semantic service refers to a service that recognizes the content of voice data.

A text-to-speech (TTS) service refers to a service that converts text data into voice data.

Schematically, the text-to-speech method is shown in fig. 19.

Step 1901, performing text analysis on the second prompt text displayed in the second display area to obtain the phone features corresponding to the second prompt text.

The second prompt text is the text information that is displayed in the second display area and gives feedback on the voice control signal.

Illustratively, the second prompt text is "hello" ("ni hao" in Chinese pinyin). The phones are "n, i, h, ao".

Text analysis refers to linguistic analysis of the input text: lexical, grammatical and semantic analysis is performed sentence by sentence to determine the low-level structure of each sentence and the phoneme composition of each word. It includes sentence segmentation, word segmentation, polyphonic-character processing, number processing, abbreviation processing, and the like.

Step 1902, constructing a duration model according to the phone features obtained by text analysis, and expanding the phone features through the duration model.

A phone is a modeling unit used in speech recognition. The phone features of the text are extracted in step 1901.

Optionally, the duration model is used to predict the duration of each phone and takes a frame as its basic unit. The phone features are expanded to the frame level through the duration model.

Illustratively, the second prompt is "hello". The phones are "n, i, h, ao". Wherein, the duration of "n" is 2 frames, the duration of "i" is 1 frame, the duration of "h" is 2 frames, and the duration of "ao" is 2 frames.

Step 1903, constructing an acoustic model according to the expanded phone features to obtain the acoustic features of the second prompt text.

The acoustic model is used to predict the acoustic features at the frame level.

Optionally, the acoustic features include, but are not limited to: fundamental frequency, spectrum, and aperiodic components.

Step 1904, converting the acoustic features into a voice waveform through a vocoder.

A vocoder analyzes the voice signal at the transmitting end, extracts the acoustic features of the voice signal, encodes and encrypts them so that they match the channel, and transmits them to the receiving end through the channel; the receiving end restores the original voice waveform from the received acoustic features.

The voice waveform corresponding to the second prompt text "hello" is obtained through the vocoder and fed back to the user as voice.
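The expansion performed in step 1902 can be illustrated with the "n, i, h, ao" example: each phone is repeated once per predicted frame, so that the acoustic model of step 1903 receives one input per frame. The sketch below is an assumption-laden illustration: the durations are supplied by hand rather than predicted by a trained duration model, and `expand_by_duration` is a hypothetical name.

```python
from typing import List, Tuple


def expand_by_duration(phones: List[str], frames: List[int]) -> List[Tuple[str, int]]:
    """Repeat each phone once per frame, tagging each copy with its position inside the phone."""
    if len(phones) != len(frames):
        raise ValueError("each phone needs exactly one duration")
    expanded: List[Tuple[str, int]] = []
    for phone, n_frames in zip(phones, frames):
        for position in range(n_frames):
            expanded.append((phone, position))
    return expanded


# Durations from the example: n = 2 frames, i = 1, h = 2, ao = 2.
frame_level = expand_by_duration(["n", "i", "h", "ao"], [2, 1, 2, 2])
```

With these durations the four phones become seven frame-level entries, which is the granularity at which the acoustic features are predicted.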

A messaging service refers to a service that sends messages from a client to other clients or receives messages from other clients.

The relationship chain service refers to a service that manages the contacts in a user's relationship chain.

The step of sending the message comprises:

Step 1800, performing wake-up detection for the instant messaging program in the client.

Upon receiving the voice wake-up signal, the voice control interface of the instant messaging program is woken up and displayed.

Optionally, the client receives a voice wake-up signal from the user through a microphone.

Step 1811, after receiving a voice control signal for sending a voice message to contact a, the client sends the voice control signal to the voice assistant access service module in the server.

Step 1812, the voice assistant access service module in the server sends the received voice control signal to the voice service module in the server.

Step 1813, the voice service module in the server obtains the nicknames of the contacts from the relationship chain service module in the server in order to match the contact.

A contact is a person or a group in the address book of the instant messaging program.

The nickname of the target contact a is matched against the nicknames of all contacts in the address book. The matching cases include: a unique matching result is obtained, multiple matching results are obtained, or no matching result is obtained.

Step 1814, the voice service module in the server converts the voice control signal into a voice text and sends the voice text to the semantic service module in the server.

The semantic service module analyzes the meaning contained in the voice text corresponding to the voice control signal.

Step 1815, the semantic service module in the server analyzes and processes the voice text, and sends the resulting instruction and the matching contact information to the voice assistant access service module in the server.

Illustratively, the voice text corresponding to the voice control signal is "send a message to Zhang San"; the voice text is parsed into the instruction "send a message" and the target contact "Zhang San".

Step 1816, the voice assistant access service module in the server sends the received instruction and the matching contact information to the client.

The matching contact is the matching result of the target contact in the voice control signal.

Optionally, there are one or more matching contacts.

Step 1817, the client sends the message to the message service module in the server according to the received instruction and the information of the matched contact.
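The hand-offs in steps 1811 to 1816 can be sketched with in-memory stand-ins for the server modules. This is a minimal illustration under stated assumptions: every class and method name is hypothetical, speech recognition is replaced by decoding UTF-8 bytes, the semantic parser is a single English regular expression, the contact "Li Si" is invented for the example, and results are passed back through return values rather than sent between separate services as in the embodiment.

```python
import re
from typing import List, Optional, Tuple


class RelationChainService:
    """Holds the nicknames in the user's address book (in-memory stand-in)."""

    def __init__(self, nicknames: List[str]) -> None:
        self.nicknames = nicknames


class SemanticService:
    """Toy rule-based parser; the English pattern is an assumption."""

    _SEND = re.compile(r"^send (?:a )?message to (?P<name>.+)$", re.IGNORECASE)

    def parse(self, voice_text: str) -> Tuple[str, Optional[str]]:
        match = self._SEND.match(voice_text.strip())
        if match:
            return "send a message", match.group("name")
        return "unknown", None


class VoiceService:
    def __init__(self, relation_chain: RelationChainService, semantic: SemanticService) -> None:
        self.relation_chain = relation_chain
        self.semantic = semantic

    def recognize(self, voice_signal: bytes) -> str:
        # Placeholder for speech recognition: the "audio" is UTF-8 text here.
        return voice_signal.decode("utf-8")

    def handle(self, voice_signal: bytes) -> Tuple[str, List[str]]:
        nicknames = self.relation_chain.nicknames              # step 1813
        voice_text = self.recognize(voice_signal)              # step 1814
        instruction, target = self.semantic.parse(voice_text)  # step 1815
        matched = [n for n in nicknames if target and n.lower() == target.lower()]
        return instruction, matched


class VoiceAssistantAccessService:
    def __init__(self, voice_service: VoiceService) -> None:
        self.voice_service = voice_service

    def on_voice_control_signal(self, voice_signal: bytes) -> Tuple[str, List[str]]:
        # Steps 1811-1812 forward the signal; step 1816 returns the result to the client.
        return self.voice_service.handle(voice_signal)


access = VoiceAssistantAccessService(
    VoiceService(RelationChainService(["Zhang San", "Li Si"]), SemanticService())
)
instruction, matched = access.on_voice_control_signal(b"send a message to Zhang San")
```

The client would then use `instruction` and `matched` to send the message through the messaging service, as in step 1817.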

The step of receiving a message comprises:

Step 1821, the client receives the message from the messaging service module in the server.

The message is a text message.

Step 1822, the client sends the text content of the message to the voice assistant access service module in the server.

Step 1823, the voice assistant access service module in the server sends the text content to the text-to-speech service module in the server.

Step 1824, the text-to-speech service module in the server converts the text content into voice data and sends the voice data to the voice assistant access service module in the server.

Step 1825, the voice assistant access service module in the server sends the received voice data to the client for playing.

Through these steps, reading the received new message aloud is completed.
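The receive path of steps 1821 to 1825 can be sketched in the same style. All names are hypothetical, the sample message text is invented, and the text-to-speech service is a placeholder that only tags the text instead of synthesizing audio.

```python
from typing import List


class TextToSpeechService:
    def synthesize(self, text: str) -> bytes:
        # Placeholder for real synthesis: tag the text so the flow can be traced.
        return ("AUDIO:" + text).encode("utf-8")


class VoiceAssistantAccessService:
    def __init__(self, tts: TextToSpeechService) -> None:
        self.tts = tts

    def read_aloud(self, text: str) -> bytes:
        # Steps 1823-1824: forward the text to the TTS service and collect the voice data.
        return self.tts.synthesize(text)


class Client:
    def __init__(self, access: VoiceAssistantAccessService) -> None:
        self.access = access
        self.played: List[bytes] = []

    def on_text_message(self, text: str) -> bytes:
        # Step 1821 delivered `text`; step 1822 sends it to the access service.
        audio = self.access.read_aloud(text)
        # Step 1825: the returned voice data is played (recorded in a list here).
        self.played.append(audio)
        return audio


client = Client(VoiceAssistantAccessService(TextToSpeechService()))
audio = client.on_text_message("see you at eight")
```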

Fig. 20 is a schematic diagram illustrating matching contacts in an address book according to an exemplary embodiment of the present application, including:

Step 2001, recognizing the voice control signal, including intent recognition and entity recognition.

Intent recognition identifies at least one of: sending a message to other contacts, reading new messages, and function setting.

Entity recognition identifies the nickname of the target contact, where the target contact is a person or a group. Illustratively, the voice text corresponding to the voice control signal is "send a message to Zhang San", and the target contact is "Zhang San".

Step 2002, extracting the nickname of the recognized target contact and comparing it with the nicknames of the contacts in the user's address book, which specifically comprises step 2003, step 2004 and step 2005.

Step 2003, processing the extracted nickname of the target contact.

The processing includes: normalizing symbols and numbers; if the nickname of the target contact mixes Chinese and English, segmenting it into Chinese and English parts; and dividing the segmented Chinese and English parts according to a dictionary of Chinese initials and finals and a dictionary of English phonetic symbols, respectively, to obtain the pronunciation corresponding to the nickname of the target contact.

Step 2004, phoneme conversion.

A phoneme (phone) is the smallest phonetic unit divided according to the natural attributes of speech; it is analyzed according to the articulatory actions within a syllable, with one action constituting one phoneme. The phone set is the same for both Chinese and English. All pronunciations obtained by the division in step 2003 are unified to Chinese initials and finals according to a defined mapping relation.

Step 2005, similarity measurement.

The similarity between the nickname of each contact in the address book and the extracted nickname of the target contact is computed, and matching is performed according to the similarity.

Illustratively, the extracted nickname of the target contact is "zhang san", and the nickname of the contact with the highest similarity in the address list is "zhang san".

Step 2006, obtaining the result.

The contact with the highest similarity is the matching result.
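Steps 2003 to 2006 can be sketched as follows, with several loudly stated assumptions. The embodiment does not name a similarity measure, so `difflib.SequenceMatcher` over phone sequences is swapped in purely for illustration. The pronunciation dictionaries of steps 2003 and 2004 are omitted, so phone sequences are supplied by hand. Symbol normalization is approximated by Unicode NFKC folding plus dropping characters that are neither CJK ideographs, ASCII letters, nor digits. All function names and the contact "Li Si" are hypothetical.

```python
import re
import unicodedata
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

# Runs of CJK ideographs, ASCII letters, or ASCII digits; everything else is dropped.
_SEGMENT = re.compile(r"[\u4e00-\u9fff]+|[A-Za-z]+|[0-9]+")


def segment_nickname(nickname: str) -> List[Tuple[str, str]]:
    """Step 2003 (partial): normalize, then split into Chinese, English and numeric runs."""
    normalized = unicodedata.normalize("NFKC", nickname)  # folds full-width forms to ASCII
    segments: List[Tuple[str, str]] = []
    for match in _SEGMENT.finditer(normalized):
        token = match.group()
        if token[0].isdigit():
            kind = "num"
        elif token[0].isascii():
            kind = "en"
        else:
            kind = "zh"
        segments.append((kind, token))
    return segments


def similarity(a: List[str], b: List[str]) -> float:
    """Step 2005: similarity of two phone sequences (stand-in measure, see lead-in)."""
    return SequenceMatcher(None, a, b).ratio()


def best_match(target: List[str], contacts: Dict[str, List[str]]) -> Tuple[str, float]:
    """Step 2006: return the contact whose phone sequence is most similar to the target."""
    scored = ((name, similarity(target, phones)) for name, phones in contacts.items())
    return max(scored, key=lambda pair: pair[1])


# Hand-written phone sequences (initials and finals) for two contacts.
contact_phones = {
    "Zhang San": ["zh", "ang", "s", "an"],
    "Li Si": ["l", "i", "s", "i"],
}
name, score = best_match(["zh", "ang", "s", "an"], contact_phones)
```

Comparing phone sequences rather than characters is what allows nicknames that sound alike but are written differently to match, which is the point of the conversion in step 2004.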

Fig. 21 illustrates an instant messaging device provided in an exemplary implementation of the present application, the device comprising: a receiving module 2101, a display module 2102 and a processing module 2103;

a receiving module 2101 configured to receive a voice wake-up signal in the process of displaying the user interface of the instant messaging program;

a display module 2102 configured to wake up a voice control interface displaying an instant messaging program according to the voice wake-up signal;

the receiving module 2101 is configured to receive a voice control signal in the process of displaying the voice control interface of the instant messaging program;

a processing module 2103 configured to perform at least one of messaging to other contacts, reading new messages, and function setting in the instant messaging program using conversational voice interaction based on the voice control signal.

Fig. 22 illustrates an instant messaging device provided in an exemplary implementation of the present application, the device comprising: a receiving module 2101, a display module 2102, a processing module 2103, a playing module 2104;

In one example, the display module 2102 is configured to display a first prompt text corresponding to the voice control signal in a first display area on the voice control interface; a playing module 2104 configured to play the dialogue-type response voice corresponding to the voice control signal; a display module 2102 configured to display a second prompt text for the conversational response voice in a second display area on the voice control interface; wherein the dialogue-type response voice includes: the execution result of the voice control signal, or the candidate prompt information of the next voice control corresponding to the voice control signal.

In one example, the display module 2102 is configured to display a sonic feedback animation corresponding to the voice control signal in a fourth display area on the voice control interface, wherein the sonic feedback animation is used for displaying a waveform of the voice control signal; a display module 2102 configured to display voice operation prompt information corresponding to the voice control signal in a fifth display area on the voice control interface; and the display module 2102 is configured to display sound wave recording feedback animation corresponding to the voice control signal in a sixth display area on the voice control interface, wherein the sound wave recording feedback animation is used for displaying recording conditions of the voice control signal.

In one example, the voice control signal includes: a first voice control signal for contacting a first contact; the playing module 2104 is configured to play a dialogue-type response voice indicating that the session interface of the first contact is being opened when the first contact has a unique match in the address book, and the display module 2102 is configured to display a second prompt text indicating that the session interface of the first contact is being opened in the second display area on the voice control interface.

In one example, the voice control signal includes: a second voice control signal for contacting a second contact; the playing module 2104 is configured to play a dialogue-type response voice announcing the candidate prompt sequence numbers of at least two candidate matching contacts when the at least two candidate matching contacts exist in the address book for the second contact in the first prompt text, and the display module 2102 is configured to display a second prompt text listing the candidate prompt sequence numbers of the at least two candidate matching contacts in the second display area on the voice control interface.

In one example, the voice control signal further comprises: a third voice control signal for selecting a target candidate prompt sequence number from the at least two candidate prompt sequence numbers; the processing module 2103 is configured to determine the second contact corresponding to the target candidate prompt sequence number according to the third voice control signal; the playing module 2104 is configured to play a dialogue-type response voice indicating that the session interface of the second contact is being opened, and the display module 2102 is configured to display a second prompt text indicating that the session interface of the second contact is being opened in the second display area on the voice control interface.

In one example, the voice control signal further comprises: a fourth voice control signal for selecting, from the at least two candidate prompt sequence numbers, a target candidate prompt sequence number whose remark is to be modified, and a fifth voice control signal for determining the modified name of the second contact corresponding to the target candidate prompt sequence number; the processing module 2103 is configured to determine, according to the fourth voice control signal, the second contact corresponding to the target candidate prompt sequence number, and the playing module 2104 is configured to play a dialogue-type response voice for modifying the remark of the second contact; the processing module 2103 is configured to determine the modified name of the second contact according to the fifth voice control signal; the playing module 2104 is configured to play a dialogue-type response voice indicating that the session interface of the second contact is being opened, and the display module 2102 is configured to display a second prompt text indicating that the session interface of the second contact is being opened in the second display area on the voice control interface.

In one example, contacting the second contact includes: at least one of sending a text message, initiating a voice call, and initiating a video call to the second contact.

In one example, the voice control signal includes: a sixth voice control signal for reading the new message; the display module 2102 is configured to display a second prompt text of a third contact corresponding to the new message in a second display area on the voice control interface, and display a third prompt message thumbnail corresponding to the new message in a third display area on the voice control interface.

In one example, the playing module 2104 is configured to play a dialogue-type response voice asking whether to reply to the new message, and the display module 2102 is configured to display a second prompt text asking whether to reply to the new message in a second display area on the voice control interface; the receiving module 2101 is configured to receive a seventh voice control signal confirming whether to reply to the new message, and the processing module 2103 is configured to perform at least one of replying to the new message, not replying to the new message, and reading the next new message.
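As a rough illustration of how the seventh voice control signal might be mapped onto the three outcomes named above, consider the sketch below. The phrase table, the `ReplyAction` enum, and the fallback to "do not reply" for unrecognized input are assumptions; the application does not specify them.

```python
from enum import Enum


class ReplyAction(Enum):
    REPLY = "reply"
    DO_NOT_REPLY = "do_not_reply"
    READ_NEXT = "read_next"


# Hypothetical recognized phrases; a real system would use speech understanding.
_PHRASES = {
    "yes": ReplyAction.REPLY,
    "reply": ReplyAction.REPLY,
    "no": ReplyAction.DO_NOT_REPLY,
    "next": ReplyAction.READ_NEXT,
}


def interpret_reply_signal(recognized_text):
    """Map recognized text of the seventh signal to an action, or None."""
    return _PHRASES.get(recognized_text.strip().lower())


def handle_new_messages(messages, answers):
    """Pair each new message's sender with the action chosen for it.

    messages: list of (contact, text) tuples for the new messages.
    answers: recognized text of the seventh voice control signal per message.
    """
    decisions = []
    for (contact, _text), answer in zip(messages, answers):
        action = interpret_reply_signal(answer)
        if action is None:
            # Assumption: unrecognized input is treated as "do not reply".
            action = ReplyAction.DO_NOT_REPLY
        decisions.append((contact, action))
    return decisions
```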

In one example, the voice control signal includes: an eighth voice control signal for sending a voice message to a fourth contact; the playing module 2104 is configured to play the recorded dialogue-type response voice, the display module 2102 is configured to display the recorded second prompt text in a second display area on the voice control interface, the receiving module 2101 is configured to receive the voice message sent to the fourth contact, and the display module 2102 is configured to display a third prompt text corresponding to the sent voice message in a third display area on the voice control interface.

The application also provides a server, which comprises a processor and a memory, wherein at least one instruction is stored in the memory, and the at least one instruction is loaded and executed by the processor to realize the instant messaging method provided by the above method embodiments. It should be noted that the server may be a server provided in fig. 23 as follows.

Referring to fig. 23, a schematic structural diagram of a server according to an exemplary embodiment of the present application is shown. Specifically, the server 1500 includes a Central Processing Unit (CPU) 1501, a system memory 1504 including a Random Access Memory (RAM) 1502 and a Read Only Memory (ROM) 1503, and a system bus 1505 connecting the system memory 1504 and the central processing unit 1501. The server 1500 also includes a basic input/output system (I/O system) 1506, which facilitates transfer of information between various devices within the computer, and a mass storage device 1507 for storing an operating system 1513, application programs 1514, and other program modules 1515.

The basic input/output system 1506 includes a display 1508 for displaying information and an input device 1509, such as a mouse or a keyboard, for a user to input information. The display 1508 and the input device 1509 are both connected to the central processing unit 1501 via an input/output controller 1510 connected to the system bus 1505. The basic input/output system 1506 may also include the input/output controller 1510 for receiving and processing input from a number of other devices, such as a keyboard, a mouse, or an electronic stylus. Similarly, the input/output controller 1510 also provides output to a display screen, a printer, or another type of output device.

The mass storage device 1507 is connected to the central processing unit 1501 through a mass storage controller (not shown) connected to the system bus 1505. The mass storage device 1507 and its associated computer-readable media provide non-volatile storage for the server 1500. That is, the mass storage device 1507 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM drive.

Without loss of generality, computer readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, DVD, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will appreciate that computer storage media is not limited to the foregoing. The system memory 1504 and mass storage device 1507 described above may be collectively referred to as memory.

The memory stores one or more programs configured to be executed by the one or more central processing units 1501, the one or more programs containing instructions for implementing the instant messaging method described above, and the central processing unit 1501 executes the one or more programs to implement the instant messaging method provided by the various method embodiments described above.

According to various embodiments of the present application, the server 1500 may also operate through a remote computer connected via a network, such as the Internet. That is, the server 1500 may be connected to the network 1512 through the network interface unit 1511 connected to the system bus 1505, or may be connected to other types of networks or remote computer systems (not shown) using the network interface unit 1511.

The memory further comprises one or more programs, the one or more programs are stored in the memory, and the one or more programs comprise steps executed by the server for performing the instant messaging method provided by the embodiment of the invention.

The embodiment of the present application further provides a computer device, where the computer device includes a memory and a processor, where the memory stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded by the processor and implements the instant messaging method.

An embodiment of the present application further provides a computer-readable storage medium, where at least one instruction, at least one program, a code set, or a set of instructions is stored in the computer-readable storage medium, and the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by a processor to implement the instant messaging method.

The present application further provides a computer program product, which when running on a computer, causes the computer to execute the instant messaging method provided by the above method embodiments.

Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, which may be a computer readable storage medium contained in a memory of the above embodiments; or it may be a separate computer-readable storage medium not incorporated into the terminal. The computer readable storage medium has at least one instruction, at least one program, code set, or set of instructions stored therein, which is loaded and executed by a processor to implement the instant messaging method described above.

Optionally, the computer-readable storage medium may include: a Read Only Memory (ROM), a Random Access Memory (RAM), a Solid State Drive (SSD), or an optical disc. The Random Access Memory may include a Resistive Random Access Memory (ReRAM) and a Dynamic Random Access Memory (DRAM). The above-mentioned serial numbers of the embodiments of the present application are merely for description, and do not represent the advantages and disadvantages of the embodiments.

It will be understood by those skilled in the art that all or part of the steps of implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

The present application is intended to cover various modifications, alternatives, and equivalents, which may be included within the spirit and scope of the present application.

Claims

1. An instant messaging method, the method comprising:

receiving a voice wake-up signal in the process of displaying a user interface of an instant messaging program, wherein the voice wake-up signal is used for starting a conversational voice interaction function, and the conversational voice interaction function is a function provided by the instant messaging program;

according to the voice awakening signal, awakening and displaying a voice control interface of the instant messaging program on the user interface based on the conversational voice interaction function;

displaying a first prompt text corresponding to the voice control signal in a first display area on the voice control interface;

playing a dialogue type response voice corresponding to the voice control signal, and displaying a second prompt text of the dialogue type response voice in a second display area on the voice control interface, wherein the dialogue type response voice comprises an execution result of the voice control signal, or candidate prompt information of next voice control corresponding to the voice control signal;

and transitionally switching the voice control interface to a conversation interface indicated by the voice control signal, wherein the conversation interface is used for conversation with other contacts.

2. The method of claim 1, further comprising:

and when the candidate prompt messages comprise at least two candidate prompt messages, displaying third prompt words corresponding to the at least two candidate prompt messages in a third display area on the voice control interface.

3. The method according to claim 1 or 2, wherein upon receiving the voice control signal, the method further comprises at least one of the following three steps:

4. The method of claim 1, wherein the voice control signal comprises: a first voice control signal for contacting a first contact;

the playing of the dialogue type response voice corresponding to the voice control signal and the displaying of the second prompt text of the dialogue type response voice in the second display area on the voice control interface include:

and when the first contact has unique match in the address list, playing a dialogue type response voice for opening the conversation interface of the first contact, and displaying a second prompt text for opening the conversation interface of the first contact in a second display area on the voice control interface.

5. The method of claim 4,

the contacting the first contact comprises: at least one of sending a text message, initiating a voice call, and initiating a video call to the first contact.

6. The method of claim 1, wherein the voice control signal comprises: a second voice control signal for contacting a second contact;

when the second contact in the first prompt text has at least two candidate matching contacts in the address list, the dialogue type response voice of the candidate prompt sequence numbers of the at least two candidate matching contacts is played, and the second prompt text of the candidate prompt sequence numbers of the at least two candidate matching contacts is displayed in a second display area on the voice control interface.

7. The method of claim 6, wherein the voice control signal further comprises: a third voice control signal for selecting a target candidate prompt sequence number from the at least two candidate prompt sequence numbers;

the method further comprises the following steps:

determining the second contact corresponding to the target candidate prompt sequence number according to the third voice control signal;

and playing a dialogue type response voice for opening the conversation interface of the second contact, and displaying a second prompt text for opening the conversation interface of the second contact in a second display area on the voice control interface.

8. The method of claim 6, wherein the voice control signal further comprises: a fourth voice control signal used for selecting a target candidate prompt sequence number from the at least two candidate prompt sequence numbers to carry out remark modification, and a fifth voice control signal used for determining a modified name of the second contact corresponding to the target candidate prompt sequence number;

the method further comprises the following steps:

determining the second contact corresponding to the target candidate prompt sequence number for remark modification according to the fourth voice control signal, and playing a dialogue type response voice for modifying the remark of the second contact;

determining a modified name of the second contact according to the fifth voice control signal;

and playing a dialogue type response voice for opening the conversation interface of the second contact, and displaying a second prompt text for opening the conversation interface of the second contact in a second display area on the voice control interface.

9. The method according to any one of claims 6 to 8,

the contacting the second contact comprises: at least one of sending a text message, initiating a voice call, and initiating a video call to the second contact.

10. The method of claim 1, wherein the voice control signal comprises: a sixth voice control signal for reading the new message;

and playing and reading the dialogue type response voice of the new message, displaying a second prompt text of a third contact corresponding to the new message in a second display area on the voice control interface, and displaying a third prompt message thumbnail corresponding to the new message in a third display area on the voice control interface.

11. The method of claim 10, further comprising:

playing a dialogue type response voice whether to reply the new message or not, and displaying a second prompt text whether to reply the new message or not in a second display area on the voice control interface;

and receiving a seventh voice control signal for confirming whether to reply the new message, and performing at least one of replying to the new message, not replying to the new message and reading the next new message.

12. The method of claim 1, wherein the voice control signal comprises: an eighth voice control signal for sending a voice message to a fourth contact;

playing the recorded dialogue response voice, displaying the recorded second prompt words in a second display area on the voice control interface, receiving the voice message sent to the fourth contact person, and displaying third prompt words corresponding to the sent voice message in a third display area on the voice control interface.

13. An instant messaging device, the device comprising: a receiving module, a display module and a playing module;

the receiving module is configured to receive a voice wake-up signal in a process of displaying a user interface of an instant messaging program, wherein the voice wake-up signal is used for starting a conversational voice interaction function, and the conversational voice interaction function is a function provided by the instant messaging program;

the display module is configured to wake up and display a voice control interface of the instant messaging program on the user interface based on the conversational voice interaction function according to the voice wake-up signal;

the receiving module is configured to receive a voice control signal in the process of displaying the voice control interface of the instant messaging program;

the display module is configured to display a first prompt text corresponding to the voice control signal in a first display area on the voice control interface;

the playing module is configured to play a dialogue-type response voice corresponding to the voice control signal, where the dialogue-type response voice includes an execution result of the voice control signal, or candidate prompt information of next voice control corresponding to the voice control signal;

the display module is configured to display a second prompt text of the conversational response voice in a second display area on the voice control interface;

the display module is configured to switch the voice control interface to a conversation interface indicated by the voice control signal in a transition mode, and the conversation interface is used for conversation with other contacts.

14. A computer device comprising a processor and a memory, wherein at least one program is stored in the memory, and wherein the at least one program is loaded and executed by the processor to implement the instant messaging method according to any one of claims 1 to 12.

15. A computer-readable storage medium, wherein at least one program is stored in the computer-readable storage medium, and the at least one program is loaded and executed by a processor to implement the instant messaging method according to any one of claims 1 to 12.