CN111819830B - Information recording and displaying method and terminal in communication process - Google Patents

Information recording and displaying method and terminal in communication process

Info

Publication number
CN111819830B
CN111819830B (application CN201880090760.2A)
Authority
CN
China
Prior art keywords
voice data
terminal
interface
text
recording
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201880090760.2A
Other languages
Chinese (zh)
Other versions
CN111819830A (en)
Inventor
王骅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Publication of CN111819830A
Application granted granted Critical
Publication of CN111819830B
Active legal status (current)
Anticipated expiration legal status

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/725 Cordless telephones

Abstract

The embodiment of the application discloses a method and a terminal for recording and displaying information in a communication process, relates to the technical field of communication, and can improve the relevance between voice data recorded during voice communication and the call record of that voice communication. The specific scheme is as follows: in the process of carrying out voice communication between a first terminal and a second terminal, the first terminal identifies voice data captured by a microphone of the first terminal; if the text corresponding to the first voice data captured by the microphone matches the preset start word, the first terminal starts to record the voice data of the second terminal; after the voice data recording is finished, the first terminal displays a first interface including a text corresponding to the recorded voice data; the first terminal receives a first operation of a user on the text corresponding to the voice data, and plays the voice data in response to the first operation.

Description

Information recording and displaying method and terminal in communication process
Technical Field
The embodiment of the application relates to the technical field of communication, in particular to a method and a terminal for recording and displaying information in a communication process.
Background
With the development of electronic technology, the functions of electronic terminals (such as mobile phones) are increasing, and the dependence of users on mobile phones is also increasing. For example, a mobile phone can be used as a communication or entertainment tool, and can also include a memo function and a recorder, that is, a user can record some text or picture information in a memo of the mobile phone and record some voice information in the recorder of the mobile phone.
In order to facilitate a user to record information related to the voice communication during the process of using the mobile phone to perform the voice communication, in a general case, a memo and a recorder entry may be integrated into a voice communication interface of the mobile phone (as shown in fig. 1, a memo open button 103 and a recorder open button 102 are included in a voice communication interface 101 of the mobile phone 100), so that the user may directly start the memo or the recorder through the entry during the voice communication, and record information related to the voice communication through the memo or the recorder.
However, after the voice call has ended, if the user wants to view the information recorded by the memo or the recorder, the user needs to start the memo or the recorder on the mobile phone to view the corresponding information. Moreover, when viewing that information, the user may need to look through or listen to a plurality of records to find the information related to the voice call, so the user experience is poor.
Disclosure of Invention
The embodiment of the application provides a method and a terminal for recording and displaying information in a communication process, which can improve the relevance between voice data recorded in the voice communication process and call records of the voice communication.
In a first aspect, an embodiment of the present application provides a method for recording and displaying information during a communication process, which can be applied to a process in which a first terminal and a second terminal perform voice communication. The first terminal can identify voice data captured by a microphone of the first terminal during the voice communication between the first terminal and the second terminal; if the text corresponding to the first voice data captured by the microphone matches the preset start word, the first terminal starts to record the voice data of the second terminal; the voice data of the second terminal is converted from the audio electrical signal received from the second terminal; after the voice data recording is finished, the first terminal displays a first interface comprising a text corresponding to the recorded voice data; the first terminal receives a first operation of a user on the text corresponding to the voice data, and plays the recorded voice data in response to the first operation.
In the embodiment of the application, in the process of voice communication between the first terminal and the second terminal, when a text corresponding to first voice data captured by a microphone of the first terminal matches a preset start-word text, the first terminal can automatically start recording the voice data corresponding to the voice communication. In other words, after receiving the voice command issued by the user (i.e. the first voice data whose text matches the preset start-word text), the first terminal may automatically start recording the voice data corresponding to the voice communication. The first terminal may display a first interface including a text corresponding to the voice data after the voice data recording is finished. That is, the first terminal can intuitively display to the user the text information corresponding to the recorded voice data. Moreover, the first terminal can play the voice data in response to the first operation of the user on the text corresponding to the voice data, which improves the relevance between the text information and the voice data.
In summary, the first terminal may automatically record corresponding voice data in response to a voice command issued by a user during a voice call. Through the scheme, the terminal is more intelligent, the interaction performance between the terminal and a user is improved, and the user experience is improved.
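As an illustration only (not the patent's actual implementation), the start-word/end-word control flow described above can be sketched as a small state machine. The wake phrases and the assumption that an upstream recognizer delivers microphone text are hypothetical placeholders:

```python
# Hypothetical sketch of the wake-word-controlled call recorder described above.
# START_WORD / END_WORD and the text input are illustrative assumptions.
START_WORD = "start recording"
END_WORD = "stop recording"

class CallRecorder:
    def __init__(self):
        self.recording = False
        self.segments = []          # recorded voice data segments of the second terminal

    def on_mic_text(self, text, downlink_audio):
        """Handle one piece of text recognized from the local microphone,
        together with the current downlink (second-terminal) audio chunk."""
        if not self.recording and START_WORD in text:
            self.recording = True   # first voice data matched the start word
        elif self.recording and END_WORD in text:
            self.recording = False  # second voice data matched the end word
        elif self.recording:
            # while active, record the far-end (second terminal) voice data
            self.segments.append(downlink_audio)
```

A call such as `on_mic_text("please start recording", chunk)` flips the state without storing anything; subsequent chunks are stored until the end word arrives.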
With reference to the first aspect, in a possible design manner, the first terminal may automatically display the first interface after the voice data recording is finished. Alternatively, the first terminal may display the first interface in response to the end of the voice communication; the recording of the voice data ends when the voice communication ends, or the first terminal finishes recording the voice data when the voice communication ends. Alternatively, the first terminal may display the first interface in response to a second operation input by the user after the voice data recording is finished. The second operation is used for instructing the first terminal to display a call record interface of the first terminal; in this case the first interface is the call record interface. The call record interface comprises the call record item of the voice communication, which is used for recording the call record information of the voice communication and includes a text corresponding to the recorded voice data. Alternatively, the first terminal may display the first interface in response to a third operation of the user on the call record item of the voice call in the call record interface after the voice data recording is finished. The third operation is used for instructing the first terminal to display a record details interface of the voice communication; in this case the first interface is the record details interface, which is used for displaying the call record information of the voice communication and includes a text corresponding to the recorded voice data.
With reference to the first aspect, in another possible design, the first terminal may record not only the voice data of the second terminal, but also the voice data captured by the microphone of the first terminal. The method of the embodiment of the application may further include: and if the text corresponding to the first voice data captured by the microphone is matched with the preset starting word, the first terminal starts to record the voice data captured by the microphone. In the embodiment of the application, the first terminal can record not only the voice data of the second terminal, but also the voice data captured by the microphone of the first terminal. I.e. the first terminal may record the dialog content of the calling party and the called party.
With reference to the first aspect, in another possible design manner, the text corresponding to the recorded voice data includes at least two pieces of text information. The recorded voice data includes at least two pieces of voice data. The at least two sections of text information correspond to the at least two sections of voice data one to one. The first terminal can receive a first operation of a user on a first text message in at least two text messages; and responding to the first operation, and the first terminal plays the first voice data segment. The first text message is one text message of at least two text messages. The first text information corresponds to a first speech data segment.
The text corresponding to the recorded voice data may include at least two pieces of text information. The first terminal can receive a first operation of a user on any one of the at least two sections of text information, and plays the corresponding voice data section. That is, the user can selectively control the first terminal to play any one of the at least two voice data segments recorded by the first terminal.
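The one-to-one correspondence between text pieces and voice data segments described above can be modeled as paired records; this is a minimal sketch, where `play()` stands in for the terminal's receiver/speaker playback path:

```python
# Illustrative pairing of each transcribed text piece with its voice segment.
class Segment:
    def __init__(self, text, audio):
        self.text = text    # transcribed text information
        self.audio = audio  # the corresponding voice data segment

played = []

def play(audio):
    """Stand-in for routing audio to the receiver or loudspeaker."""
    played.append(audio)

def on_text_tapped(segments, index):
    """First operation: the user taps the index-th text; play its segment."""
    play(segments[index].audio)
```

Because each text piece carries a direct reference to its own audio, tapping any one of the displayed pieces plays exactly that segment.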
With reference to the first aspect, in another possible design, when the first terminal converts the recorded voice data into text information, the converted text information may not completely coincide with the content of the voice data recorded by the first terminal. That is, the converted text information may contain errors. In this case, the first terminal may receive a fourth operation (i.e., a modification operation) of the user on the second text information, the fourth operation being used for modifying the second text information into third text information. The second text information is one of the at least two pieces of text information and corresponds to the second voice data segment. In response to the fourth operation, the first terminal modifies the second text information into the third text information. The user can control the first terminal to play the second voice data segment corresponding to the second text information, compare the played segment with the displayed second text information, and, when they are inconsistent, operate the first terminal to modify the second text information. After the modification, the first terminal may display the third text information on the first interface. The first terminal then receives the first operation of the user on the third text information and plays the second voice data segment (i.e. the voice data segment corresponding to the second text information).
In the embodiment of the application, the first terminal can respond to the modification operation of the user on the text information and replace the text information before modification with the modified text information. Therefore, the user can modify the text information obtained by converting the voice data of the first terminal according to the voice data stored in the first terminal, and can correct errors occurring when the text information is obtained by converting the voice data of the first terminal.
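Correcting a mis-recognized piece of text while keeping its link to the stored audio (the fourth operation above) amounts to an in-place update of the text field only. A minimal sketch, with hypothetical data:

```python
def modify_text(segments, index, new_text):
    """Replace the second text information with the third text information.

    The audio reference is untouched, so tapping the corrected text
    still plays the original second voice data segment.
    """
    segments[index]["text"] = new_text

# Hypothetical example: the ASR produced a misspelling.
segments = [
    {"text": "recieve the package", "audio": b"seg2"},  # ASR error
]
modify_text(segments, 0, "receive the package")
```

The design choice here mirrors the text of the embodiment: only the display text is replaced, and the correspondence to the voice data segment is preserved.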
With reference to the first aspect, in another possible design manner, the text corresponding to the recorded voice data may include at least two pieces of text information. When the first terminal displays the text corresponding to the recorded voice data on the first interface, at least two sections of text information can be displayed on the first interface according to the time sequence of the voice data sections corresponding to the recorded text information and the source information of the voice data sections corresponding to the text information. Wherein the source information is used to indicate that the voice data segment is voice data captured by the microphone or voice data of the second terminal.
The first terminal displays at least two sections of text information on the first interface according to the time sequence of the voice data sections corresponding to the recorded text information and the source information of the voice data sections corresponding to the text information. In this way, the conversation content of the calling party and the called party can be clearly shown to the user.
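Rendering the two-party transcript in time order, labeled by source, might look like the following sketch; the field names (`ts`, `source`) and the "Me"/"Caller" labels are illustrative assumptions, not the patent's interface:

```python
def render_transcript(segments):
    """Sort voice data segments by timestamp and prefix each line with
    its source: microphone audio (local user) vs. second-terminal audio."""
    lines = []
    for seg in sorted(segments, key=lambda s: s["ts"]):
        who = "Me" if seg["source"] == "mic" else "Caller"
        lines.append(f"{who}: {seg['text']}")
    return "\n".join(lines)
```

Ordering by recording time and tagging each piece with its source is what lets the first interface read like a dialogue between the calling and called parties.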
With reference to the first aspect, in another possible design, the first terminal may prompt the user that the first terminal starts recording the voice data when the first terminal starts recording the voice data. Specifically, if a text corresponding to the first voice data captured by the microphone matches a preset start word, the first terminal may send out the first prompt message. The first prompt message is used for prompting the user that the first terminal starts to record the voice data. The first prompt message is a prompt sound or a vibration prompt.
In this embodiment of the application, the first terminal may send the first prompt message to prompt the user that the first terminal starts to record the voice data when the text corresponding to the first voice data matches the preset start word, that is, the first terminal starts to record the voice data. Therefore, the user can know that the first terminal starts to record the voice data through the first prompt message, the direct interaction between the first terminal and the user is increased, the interaction performance between the first terminal and the user is improved, and the user experience is improved.
With reference to the first aspect, in another possible design manner, before the first terminal displays the first interface, the method in the embodiment of the present application may further include: in the process of recording the voice data, if the text corresponding to the second voice data captured by the microphone matches the preset end word, the first terminal stops recording the voice data.
When the text corresponding to the second voice data captured by the microphone of the first terminal matches with the preset end word text, the first terminal may automatically stop recording the voice data. In other words, after receiving the voice command (i.e. the second voice data whose text matches the preset end word text) sent by the user, the first terminal may automatically stop recording the voice data.
In summary, the first terminal may automatically record and stop recording voice data in response to a voice command issued by a user during a voice call. Through the scheme, the terminal is more intelligent, the interaction performance between the terminal and a user is improved, and the user experience is improved.
With reference to the first aspect, in another possible design, when stopping recording the voice data, the first terminal may further prompt the user that the first terminal has stopped recording the voice data. Specifically, in the process of recording the voice data, if a text corresponding to the second voice data captured by the microphone matches the preset end word, the first terminal may send out the second prompt message. The second prompt message is used for prompting the user that the first terminal stops recording the voice data. The second prompt message is a prompt sound or a vibration prompt.
In this embodiment of the application, the first terminal may send the second prompt message when the text corresponding to the second voice data matches the preset end word, that is, when the first terminal stops recording the voice data. Therefore, the user can know through the second prompt message that the first terminal has stopped recording the voice data, which increases direct interaction between the first terminal and the user, improves the interaction performance between them, and improves the user experience.
With reference to the first aspect, in another possible design, in a case where the first terminal plays the voice data using the speaker, when the first prompt information or the second prompt information is a prompt sound, the prompt sound emitted by the first terminal may be captured by the microphone of the first terminal. In this case, when the text corresponding to the first voice data matches the preset start word, if it is determined that the first terminal uses the speaker to play the voice data, the first terminal may perform echo suppression on the voice data collected by the microphone according to the voice data played by the speaker.
Thus, the voice data after echo suppression does not include the above-mentioned prompt sound. Since the voice data sent by the first terminal to the second terminal is the echo-suppressed voice data, it will not include the prompt sound, and the user of the second terminal will not hear it.
Further, after playing the second prompt message, the first terminal may stop performing echo suppression on the voice data collected by the microphone. Therefore, the power consumption generated by the first terminal continuously executing echo suppression can be avoided, and the endurance time of the first terminal is prolonged.
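The gating described above (enable echo suppression when the start word is detected while the speaker is active; disable it after the second prompt has played, to save power) can be sketched as follows. A real acoustic echo canceller is far beyond this snippet; it is reduced here to a naive sample subtraction purely as a placeholder:

```python
class UplinkProcessor:
    """Illustrative gating of echo suppression on the uplink path.

    Suppression runs only while a prompt tone may leak from the
    loudspeaker into the microphone, and is switched off afterwards
    so the terminal does not pay the power cost continuously.
    """

    def __init__(self):
        self.aec_enabled = False

    def on_start_word(self, speaker_active):
        # Start word matched; if voice data is played via the loudspeaker,
        # begin suppressing its echo in the microphone signal.
        if speaker_active:
            self.aec_enabled = True

    def on_second_prompt_played(self):
        # Second prompt done: stop echo suppression to reduce power use.
        self.aec_enabled = False

    def process(self, mic_samples, speaker_samples):
        if not self.aec_enabled:
            return mic_samples
        # Naive subtraction as a stand-in for real echo suppression.
        return [m - s for m, s in zip(mic_samples, speaker_samples)]
```

The uplink sent to the second terminal is whatever `process()` returns, so while suppression is enabled the prompt tone's contribution is removed before transmission.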
With reference to the first aspect, in another possible design, the first terminal may store the at least two voice data segments according to the time sequence in which each voice data segment was recorded and the source information of each voice data segment. The source information is used for indicating that the voice data segment is the voice data of the second terminal, or that the voice data segment is voice data captured by the microphone.
With reference to the first aspect, in another possible design manner, the first interface may include not only the at least two pieces of text information but also at least two player plug-ins. The at least two player plug-ins are used for playing the at least two voice data segments, and the at least two player plug-ins correspond to the at least two pieces of text information one to one.
In a second aspect, an embodiment of the present application provides a terminal, which is a first terminal. The terminal includes: one or more processors, a memory, a touch screen, a microphone, a communication interface, a receiver, and a speaker; the memory, the touch screen, the communication interface and the processor are coupled; the touch screen is used for displaying the image generated by the processor; the microphone is used for capturing voice data; the memory is used for storing computer program code. The computer program code includes computer instructions that, when executed by the processor, cause the processor to: perform voice communication with the second terminal via the communication interface; recognize voice data captured by the microphone; and, if the text corresponding to the first voice data captured by the microphone is identified to match the preset start word, start recording the voice data of the second terminal, wherein the voice data of the second terminal is converted from the audio electrical signal received from the second terminal. The processor is also used for saving the recorded voice data in the memory; the processor is also used for controlling the touch screen to display a first interface after the voice data recording is finished, the first interface comprising a text corresponding to the recorded voice data; the processor is also used for receiving a first operation of a user on the text displayed by the touch screen and controlling the receiver or the loudspeaker to play the voice data in response to the first operation.
With reference to the second aspect, in a possible design manner, the processor is configured to control the touch screen to display the first interface after the recording of the voice data is finished, which includes: the processor is used for automatically controlling the touch screen to display the first interface after the voice data recording is finished; or the processor is used for controlling the touch screen to display the first interface in response to the end of the voice communication, the recording of the voice data ending when the voice communication ends; or the processor is used for controlling the touch screen to display the first interface in response to a second operation input by the user after the voice data recording is finished, where the second operation is used for instructing the terminal to display a call record interface of the terminal, the first interface is the call record interface, the call record interface comprises a call record item of the voice communication, the call record item is used for recording call record information of the voice communication, and the call record item comprises a text corresponding to the recorded voice data; or the processor is configured to control the touch screen to display the first interface in response to a third operation of the user on the call record item of the voice call in the call record interface after the voice data recording is finished, where the third operation is used to instruct the terminal to display a record details interface of the voice communication, the first interface is the record details interface, the record details interface is used to display call record information of the voice communication, and the record details interface includes a text corresponding to the recorded voice data.
With reference to the second aspect, in another possible design manner, the processor is further configured to start recording the voice data captured by the microphone by the first terminal if it is recognized that the text corresponding to the first voice data captured by the microphone matches a preset start word.
With reference to the second aspect, in another possible design manner, the text corresponding to the recorded voice data includes at least two pieces of text information, the voice data includes at least two pieces of voice data, and the at least two pieces of text information correspond to the at least two pieces of voice data one to one. The processor is used for receiving a first operation of a user on a text displayed on the touch screen and controlling a receiver or a loudspeaker to play voice data in response to the first operation, and comprises the following steps: the processor is used for receiving a first operation of a user on first text information displayed by the touch screen; and responding to the first operation, and controlling the receiver or the loudspeaker to play the first voice data segment. The first text information is one of at least two sections of text information; the first text information corresponds to a first speech data segment.
With reference to the second aspect, in another possible design manner, the processor is further configured to receive a fourth operation of the user on the second text information displayed on the touch screen, where the fourth operation is used to modify the second text information into third text information; the second text information is one of the at least two pieces of text information, and the second text information corresponds to the second voice data segment; in response to the fourth operation, modify the second text information into the third text information; and store in the memory the third text information and the correspondence between the third text information and the second voice data segment. The processor is further configured to control the touch screen to display the third text information on the first interface; and to receive a first operation of the user on the third text information displayed on the touch screen and control the receiver or the loudspeaker to play the second voice data segment.
With reference to the second aspect, in another possible design manner, the processor is configured to control the touch screen to display the first interface, and includes: and the processor is used for controlling the touch screen to display at least two sections of text information on the first interface according to the time sequence of the voice data sections corresponding to the recorded text information and the source information of the voice data sections corresponding to the text information. Wherein the source information is used to indicate that the voice data segment is voice data captured by the microphone or voice data of the second terminal.
With reference to the second aspect, in another possible design manner, the processor is further configured to send first prompt information if it is recognized that a text corresponding to the first voice data captured by the microphone matches the preset start word, where the first prompt information is used to prompt the user that the terminal starts recording the voice data, and the first prompt information is a prompt tone or a vibration prompt.
With reference to the second aspect, in another possible design manner, the processor is further configured to stop recording the voice data if it is recognized that a text corresponding to the second voice data captured by the microphone matches the preset end word, before the touch screen is controlled to display the first interface.
With reference to the second aspect, in another possible design manner, the processor is further configured to, in the process of recording voice data, send a second prompt message if it is recognized that a text corresponding to the second voice data captured by the microphone matches the preset end word, where the second prompt message is used to prompt the user that the first terminal stops recording the voice data, and the second prompt message is a prompt sound or a vibration prompt.
With reference to the second aspect, in another possible design manner, the first prompt information and the second prompt information are prompt tones, and the processor is further configured to determine that the speaker plays the voice data when a text corresponding to the first voice data matches a preset start word; carrying out echo suppression on voice data collected by a microphone according to the voice data played by a loudspeaker; and after the receiver or the loudspeaker plays the second prompt message, stopping performing echo suppression on the voice data collected by the microphone.
With reference to the second aspect, in another possible design manner, the memory stores at least two pieces of voice data according to the time sequence of recording each piece of voice data and the source information of each piece of voice data. The source information is used to indicate that the voice data segment is captured by the microphone or the source information is used to indicate that the voice data segment is voice data of the second terminal.
With reference to the second aspect, in another possible design manner, the first interface displayed by the touch screen further includes at least two player plug-ins, where the at least two player plug-ins correspond to the at least two segments of text information one to one, and the at least two player plug-ins correspond to the at least two segments of voice data one to one. And the processor is also used for receiving the click operation of the user on the first player plug-in the at least two player plug-ins and controlling the receiver or the loudspeaker to play the voice data segment corresponding to the first player plug-in.
In a third aspect, an embodiment of the present application provides a computer storage medium, where the computer storage medium includes computer instructions, and when the computer instructions are run on a terminal, the terminal is enabled to execute the method for recording and displaying information in a communication process according to the first aspect and any one of the possible design manners thereof.
In a fourth aspect, an embodiment of the present application provides a computer program product, which when running on a computer, causes the computer to execute the method for recording and displaying information during communication process described in the first aspect and any one of the possible design manners thereof.
In addition, for technical effects brought by the terminal according to the second aspect and any design manner thereof, the computer storage medium according to the third aspect, and the computer program product according to the fourth aspect, reference may be made to the technical effects brought by the first aspect and the different design manners, and details are not described here.
Drawings
Fig. 1 is a first schematic diagram of an example display interface provided in an embodiment of the present application;
fig. 2 is a schematic diagram of a hardware structure of a terminal according to an embodiment of the present disclosure;
fig. 3 is a schematic diagram of an example communication scenario provided in an embodiment of the present application;
fig. 4 is a first flowchart of a method for recording and displaying information in a communication process according to an embodiment of the present application;
fig. 5 is a second schematic diagram of an example display interface provided in an embodiment of the present application;
fig. 6 is a third schematic diagram of an example display interface provided in an embodiment of the present application;
fig. 7 is a fourth schematic diagram of an example display interface provided in an embodiment of the present application;
fig. 8 is a fifth schematic diagram of an example display interface provided in an embodiment of the present application;
fig. 9 is a sixth schematic diagram of an example display interface provided in an embodiment of the present application;
fig. 10A is a second flowchart of a method for recording and displaying information in a communication process according to an embodiment of the present application;
fig. 10B is a third flowchart of a method for recording and displaying information in a communication process according to an embodiment of the present application;
fig. 10C is a seventh schematic diagram of an example display interface provided in an embodiment of the present application;
fig. 11 is an eighth schematic diagram of an example display interface provided in an embodiment of the present application;
fig. 12 is a ninth schematic diagram of an example display interface provided in an embodiment of the present application;
fig. 13 is a tenth schematic diagram of an example display interface provided in an embodiment of the present application;
fig. 14 is a first schematic structural component diagram of a terminal according to an embodiment of the present disclosure;
fig. 15 is a schematic structural diagram of a terminal according to an embodiment of the present application.
Detailed Description
The embodiments of the present application provide a method and a terminal for recording and displaying information in a communication process, which can be applied to a process of voice communication between a first terminal and a second terminal. Specifically, during the voice communication between the first terminal and the second terminal, when a text corresponding to first voice data captured by a microphone of the first terminal matches a preset start word text, the first terminal may automatically start recording the voice data corresponding to the voice communication. The voice data captured by the microphone of the first terminal is voice data uttered by a user of the first terminal. In other words, the first terminal may automatically start recording the voice data corresponding to the voice communication after receiving a voice command (i.e., the first voice data whose text matches the preset start word text) issued by the user of the first terminal.
Of course, when the text corresponding to the second voice data captured by the microphone of the first terminal matches the preset end word text, the first terminal may automatically stop recording the voice data.
In summary, in the process of the voice communication between the first terminal and the second terminal, the first terminal may automatically record and stop recording the voice data in response to the voice command of the user. Through the scheme, the terminal is more intelligent, the interaction performance between the terminal and a user is improved, and the user experience is improved.
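The start/stop behavior described above can be sketched as a small state machine. This is a minimal illustrative sketch, not the patented implementation: the start words are taken from the examples given later in this document, while the end words, class name, and method names are assumptions for illustration; real speech recognition is replaced by pre-recognized text chunks.

```python
# Illustrative sketch of voice-triggered call recording: recording starts when
# recognized text matches a preset start word and stops on a preset end word.
START_WORDS = {"i'm record", "start recording", "start recording now"}
END_WORDS = {"stop recording", "end recording"}  # assumed; not listed in the document

class CallRecorder:
    def __init__(self):
        self.recording = False
        self.segments = []  # voice data (here: text chunks) recorded during the call

    def on_recognized_text(self, text):
        """Feed each chunk of text recognized from the microphone's voice data."""
        normalized = text.strip().lower()
        if not self.recording and normalized in START_WORDS:
            self.recording = True   # first voice data matched a start word text
        elif self.recording and normalized in END_WORDS:
            self.recording = False  # second voice data matched an end word text
        elif self.recording:
            self.segments.append(text)

recorder = CallRecorder()
for chunk in ["hello", "start recording", "meet at 3 pm", "stop recording", "bye"]:
    recorder.on_recognized_text(chunk)
print(recorder.segments)  # only speech between the start and end commands
```

In this sketch the command utterances themselves are not stored, matching the idea that the voice commands control the recording rather than being part of it.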
The terminal in the embodiment of the present application may be a portable computer (e.g., a mobile phone), a notebook computer, a personal computer (PC), a wearable electronic device (e.g., a smart watch), a tablet computer, an augmented reality (AR)/virtual reality (VR) device, a vehicle-mounted computer, or the like; the following embodiments do not specifically limit the specific form of the terminal.
Referring to fig. 2, a block diagram of a terminal 200 according to an embodiment of the present disclosure is shown. The terminal 200 may include a processor 210, an external memory interface 220, an internal memory 221, a Universal Serial Bus (USB) interface 230, a charging management module 240, a power management module 241, a battery 242, an antenna 1, an antenna 2, a radio frequency module 250, a communication module 260, an audio module 270, a speaker 270A, a receiver 270B, a microphone 270C, an earphone interface 270D, a sensor module 280, a key 290, a motor 291, an indicator 292, a camera 293, a display screen 294, a Subscriber Identity Module (SIM) card interface 295, and the like. The sensor module 280 may include a pressure sensor 280A, a gyroscope sensor 280B, an air pressure sensor 280C, a magnetic sensor 280D, an acceleration sensor 280E, a distance sensor 280F, a proximity light sensor 280G, a fingerprint sensor 280H, a temperature sensor 280J, a touch sensor 280K, an ambient light sensor 280L, a bone conduction sensor 280M, and the like.
The structure illustrated in this embodiment of the present application does not constitute a limitation on the terminal 200. The terminal 200 may include more or fewer components than shown, combine certain components, split certain components, or use a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Processor 210 may include one or more processing units. For example, the processor 210 may include an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a memory, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a neural-Network Processing Unit (NPU), etc. The different processing units may be separate devices or may be integrated into one or more processors.
The controller may be a decision maker that directs the components of the terminal 200 to work in concert according to instructions; it is the neural center and command center of the terminal 200. The controller generates an operation control signal according to an instruction operation code and a timing signal, to complete the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 210 for storing instructions and data. In some embodiments, the memory in the processor 210 is a cache that may hold instructions or data that the processor 210 has just used or used cyclically. If the processor 210 needs to use the instructions or data again, it can call them directly from the memory. This avoids repeated access, reduces the waiting time of the processor 210, and thereby improves system efficiency.
In some embodiments, the processor 210 may include an interface. The interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a SIM interface, and/or a USB interface, etc.
The I2C interface is a bidirectional synchronous serial bus that includes a serial data line (SDA) and a serial clock line (SCL). In some embodiments, the processor 210 may include multiple sets of I2C buses. The processor 210 may be coupled to the touch sensor 280K, the charger, the flash, the camera 293, etc. through different I2C bus interfaces. For example, the processor 210 may be coupled to the touch sensor 280K through an I2C interface, so that the processor 210 and the touch sensor 280K communicate through the I2C bus interface to implement the touch function of the terminal 200.
The I2S interface may be used for audio communication. In some embodiments, processor 210 may include multiple sets of I2S buses. Processor 210 may be coupled to audio module 270 via an I2S bus to enable communication between processor 210 and audio module 270. In some embodiments, the audio module 270 may communicate audio signals to the communication module 260 through an I2S interface to enable answering a call through a bluetooth headset.
The PCM interface may also be used for audio communication, sampling, quantizing and encoding analog signals. In some embodiments, audio module 270 and communication module 260 may be coupled by a PCM bus interface. In some embodiments, the audio module 270 may also transmit the audio signal to the communication module 260 through the PCM interface, so as to implement the function of answering a call through the bluetooth headset. Both the I2S interface and the PCM interface may be used for audio communication, with different sampling rates for the two interfaces.
The UART interface is a universal serial data bus used for asynchronous communications. The bus is a bidirectional communication bus. It converts the data to be transmitted between serial communication and parallel communication. In some embodiments, a UART interface is generally used to connect the processor 210 and the communication module 260. For example: the processor 210 communicates with the bluetooth module through the UART interface to implement the bluetooth function. In some embodiments, the audio module 270 may transmit the audio signal to the communication module 260 through the UART interface, so as to realize the function of playing music through the bluetooth headset.
The MIPI interface may be used to connect the processor 210 with peripheral devices such as the display screen 294, the camera 293, and the like. The MIPI interface includes a Camera Serial Interface (CSI), a Display Serial Interface (DSI), and the like. In some embodiments, processor 210 and camera 293 communicate via a CSI interface to implement the capture functionality of terminal 200. The processor 210 and the display screen 294 communicate through the DSI interface to implement a display function of the terminal 200.
The GPIO interface may be configured by software. The GPIO interface may be configured as a control signal or as a data signal. In some embodiments, a GPIO interface may be used to connect processor 210 with camera 293, display 294, communications module 260, audio module 270, sensor module 280, and the like. The GPIO interface may also be configured as an I2C interface, an I2S interface, a UART interface, a MIPI interface, and the like.
The USB interface 230 may be a Mini USB interface, a Micro USB interface, a USB Type-C interface, etc. The USB interface 230 may be used to connect a charger to charge the terminal 200, and may also be used to transmit data between the terminal 200 and peripheral devices. It can also be used to connect earphones and play audio through the earphones, as well as to connect other electronic devices such as AR devices.
The interface connection relationship between the modules in the embodiment of the present application is only schematically illustrated, and does not limit the structure of the terminal 200. The terminal 200 may adopt different interface connection manners or a combination of multiple interface connection manners in the embodiment of the present application.
The charge management module 240 is configured to receive a charging input from a charger. The charger may be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 240 may receive charging input from a wired charger via the USB interface 230. In some wireless charging embodiments, the charging management module 240 may receive a wireless charging input through a wireless charging coil of the terminal 200. The charging management module 240 may also supply power to the terminal 200 through the power management module 241 while charging the battery 242.
The power management module 241 is used to connect the battery 242, the charging management module 240 and the processor 210. The power management module 241 receives the input from the battery 242 and/or the charging management module 240, and supplies power to the processor 210, the internal memory 221, the external memory interface 220, the display 294, the camera 293, and the communication module 260. The power management module 241 may also be used to monitor parameters such as battery capacity, battery cycle count, battery state of health (leakage, impedance), etc. In some embodiments, the power management module 241 may also be disposed in the processor 210. In some embodiments, the power management module 241 and the charging management module 240 may also be disposed in the same device.
The wireless communication function of the terminal 200 may be implemented by the antenna 1, the antenna 2, the rf module 250, the communication module 260, a modem, a baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in terminal 200 may be used to cover a single or multiple communication bands. Different antennas can also be multiplexed to improve the utilization of the antennas. For example: the cellular network antenna may be multiplexed into a wireless local area network diversity antenna. In some embodiments, the antenna may be used in conjunction with a tuning switch.
The radio frequency module 250 may provide a communication processing module including a solution for wireless communication such as 2G/3G/4G/5G, etc. applied to the terminal 200. The rf module 250 may include at least one filter, a switch, a power amplifier, a Low Noise Amplifier (LNA), and the like. The rf module 250 receives electromagnetic waves from the antenna 1, and performs filtering, amplification, and other processing on the received electromagnetic waves, and transmits the electromagnetic waves to the modem for demodulation. The rf module 250 may also amplify the signal modulated by the modem, and convert the signal into electromagnetic wave through the antenna 1 to radiate the electromagnetic wave. In some embodiments, at least some of the functional modules of the rf module 250 may be disposed in the processor 210. In some embodiments, at least some functional blocks of rf module 250 may be disposed in the same device as at least some blocks of processor 210.
The modem may include a modulator and a demodulator. The modulator is used for modulating the low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then passes the demodulated low frequency baseband signal to a baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor. The application processor outputs sound signals through an audio device (not limited to the speaker 270A, the receiver 270B, etc.) or displays images or video through the display screen 294. In some embodiments, the modem may be a stand-alone device. In some embodiments, the modem may be separate from processor 210, in the same device as rf module 250 or other functional modules.
The communication module 260 may provide a communication processing module of a solution for wireless communication applied to the terminal 200, including Wireless Local Area Networks (WLANs) (e.g., wireless fidelity (Wi-Fi) networks), bluetooth (bluetooth, BT), Global Navigation Satellite System (GNSS), Frequency Modulation (FM), Near Field Communication (NFC), Infrared (IR), and the like. The communication module 260 may be one or more devices integrating at least one communication processing module. The communication module 260 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering processing on electromagnetic wave signals, and transmits the processed signals to the processor 210. The communication module 260 may also receive a signal to be transmitted from the processor 210, frequency-modulate and amplify the signal, and convert the signal into electromagnetic waves via the antenna 2 to radiate the electromagnetic waves.
In some embodiments, the antenna 1 of the terminal 200 is coupled to the radio frequency module 250 and the antenna 2 is coupled to the communication module 260, so that the terminal 200 may communicate with networks and other devices via wireless communication technologies. The wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technologies, etc. The GNSS may include a global positioning system (GPS), a global navigation satellite system (GLONASS), a BeiDou navigation satellite system (BDS), a quasi-zenith satellite system (QZSS), and/or a satellite based augmentation system (SBAS).
The terminal 200 implements a display function through the GPU, the display screen 294, and the application processor. The GPU is a microprocessor for image processing, and is connected to the display screen 294 and an application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 210 may include one or more GPUs that execute program instructions to generate or alter display information.
The display screen 294 is used to display images, video, and the like. The display screen 294 includes a display panel. The display panel may adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), and the like. In some embodiments, the terminal 200 may include 1 or N display screens 294, where N is a positive integer greater than 1.
The terminal 200 may implement a shooting function through the ISP, the camera 293, the video codec, the GPU, the display screen, and the application processor.
The ISP is used to process the data fed back by the camera 293. For example, when a photo is taken, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, and the optical signal is converted into an electrical signal; the camera photosensitive element transmits the electrical signal to the ISP for processing, and the ISP converts it into an image visible to the naked eye. The ISP can also perform algorithm optimization on the noise, brightness, and skin color of the image. The ISP can also optimize parameters such as the exposure and color temperature of a shooting scene. In some embodiments, the ISP may be provided in the camera 293.
The camera 293 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image to the photosensitive element. The photosensitive element may be a Charge Coupled Device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The light sensing element converts the optical signal into an electrical signal, which is then passed to the ISP where it is converted into a digital image signal. And the ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into image signal in standard RGB, YUV and other formats. In some embodiments, terminal 200 may include 1 or N cameras 293, N being a positive integer greater than 1.
The digital signal processor is used to process digital signals, and can process digital image signals and other digital signals. For example, when the terminal 200 selects a frequency point, the digital signal processor is used to perform a Fourier transform or the like on the frequency point energy.
Video codecs are used to compress or decompress digital video. The terminal 200 may support one or more video codecs. In this way, the terminal 200 can play or record video in a plurality of encoding formats, such as: moving Picture Experts Group (MPEG) 1, MPEG2, MPEG3, MPEG4, and the like.
The NPU is a neural-network (NN) computing processor that processes input information quickly by using a biological neural network structure, for example, by using a transfer mode between neurons of a human brain, and can also learn by itself continuously. The NPU can implement applications such as intelligent recognition of the terminal 200, for example: image recognition, face recognition, speech recognition, text understanding, and the like.
The external memory interface 220 may be used to connect an external memory card, such as a Micro SD card, to extend the storage capability of the terminal 200. The external memory card communicates with the processor 210 through the external memory interface 220 to implement a data storage function. For example, files such as music, video, etc. are saved in an external memory card.
The internal memory 221 may be used to store computer-executable program code, where the executable program code includes instructions. The processor 210 executes various functional applications and data processing of the terminal 200 by running the instructions stored in the internal memory 221. The memory 221 may include a program storage area and a data storage area. The program storage area may store an operating system, an application program required by at least one function (such as a sound playing function or an image playing function), and the like. The data storage area may store data (such as audio data and a phonebook) created during use of the terminal 200, and the like. In addition, the memory 221 may include a high-speed random access memory, and may further include a nonvolatile memory, such as at least one magnetic disk storage device, a flash memory device, or another nonvolatile solid-state storage device such as a universal flash storage (UFS).
The terminal 200 may implement an audio function through the audio module 270, the speaker 270A, the receiver 270B, the microphone 270C, the earphone interface 270D, and the application processor. Such as music playing, recording, etc.
Audio module 270 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. Audio module 270 may also be used to encode and decode audio signals. In some embodiments, the audio module 270 may be disposed in the processor 210, or some functional modules of the audio module 270 may be disposed in the processor 210.
The speaker 270A, also called a "loudspeaker", is used to convert an audio electrical signal into a sound signal. The terminal 200 can play music or conduct a hands-free call through the speaker 270A.
The receiver 270B, also called "earpiece", is used to convert the electrical audio signal into an acoustic signal. When the terminal 200 receives a call or voice information, it is possible to receive voice by placing the receiver 270B close to the human ear.
The microphone 270C, also referred to as a "mic", is used to convert sound signals into electrical signals. When making a call or sending voice information, the user can input a sound signal into the microphone 270C by speaking with the mouth close to the microphone 270C. The terminal 200 may be provided with at least one microphone 270C. In some embodiments, the terminal 200 may be provided with two microphones 270C, which can implement a noise reduction function in addition to collecting sound signals. In some embodiments, three, four, or more microphones 270C may further be disposed on the terminal 200 to collect sound signals, reduce noise, identify sound sources, implement a directional recording function, and so on.
The headphone interface 270D is used to connect wired headphones. The headphone interface 270D may be the USB interface 230, or may be a 3.5 mm open mobile terminal platform (OMTP) standard interface or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.
The pressure sensor 280A is used to sense a pressure signal, which can be converted into an electrical signal. In some embodiments, the pressure sensor 280A may be disposed on the display screen 294. The pressure sensor 280A can be of a wide variety of types, such as a resistive pressure sensor, an inductive pressure sensor, a capacitive pressure sensor, and the like. The capacitive pressure sensor may be a sensor comprising at least two parallel plates having an electrically conductive material. When a force acts on the pressure sensor, the capacitance between the electrodes changes. The terminal 200 determines the intensity of the pressure according to the change in the capacitance. When a touch operation is applied to the display screen 294, the terminal 200 detects the intensity of the touch operation based on the pressure sensor 280A. The terminal 200 may also calculate the touched position based on the detection signal of the pressure sensor 280A. In some embodiments, the touch operations that are applied to the same touch position but different touch operation intensities may correspond to different operation instructions. For example: and when the touch operation with the touch operation intensity smaller than the first pressure threshold value acts on the short message application icon, executing an instruction for viewing the short message. And when the touch operation with the touch operation intensity larger than or equal to the first pressure threshold value acts on the short message application icon, executing an instruction of newly building the short message.
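The two-threshold behavior described for the short message application icon can be sketched as a simple mapping from touch pressure to instruction. This is an illustrative sketch only: the threshold value, units, and action names are assumptions, not values from the document.

```python
# Sketch: touch operations at the same position but with different pressure
# intensities correspond to different operation instructions.
FIRST_PRESSURE_THRESHOLD = 0.5  # assumed normalized pressure value

def sms_icon_action(pressure):
    """Map the detected touch pressure on the SMS icon to an instruction."""
    if pressure < FIRST_PRESSURE_THRESHOLD:
        return "view_messages"    # lighter press: view the short message
    return "compose_message"      # press at or above the threshold: new short message

print(sms_icon_action(0.2))  # view_messages
print(sms_icon_action(0.8))  # compose_message
```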
The gyro sensor 280B may be used to determine the motion attitude of the terminal 200. In some embodiments, the angular velocity of the terminal 200 about three axes (i.e., the x, y, and z axes) may be determined by the gyro sensor 280B. The gyro sensor 280B may be used for anti-shake during photographing. For example, when the shutter is pressed, the gyro sensor 280B detects the shake angle of the terminal 200, calculates the distance that the lens module needs to compensate for according to the shake angle, and allows the lens to counteract the shake of the terminal 200 through a reverse movement, thereby achieving anti-shake. The gyro sensor 280B may also be used in navigation and somatosensory gaming scenarios.
The air pressure sensor 280C is used to measure air pressure. In some embodiments, the terminal 200 calculates altitude, aiding positioning and navigation, from the barometric pressure value measured by the barometric pressure sensor 280C.
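One common way to calculate altitude from a barometric pressure value is the international barometric formula; the document does not say which formula the terminal uses, so the following is an assumption for illustration.

```python
# Illustrative altitude calculation from a barometric pressure reading,
# using the international barometric formula (troposphere approximation).
def pressure_to_altitude_m(pressure_hpa, sea_level_hpa=1013.25):
    """Return approximate altitude in meters for a pressure in hectopascals."""
    return 44330.0 * (1.0 - (pressure_hpa / sea_level_hpa) ** (1.0 / 5.255))

print(round(pressure_to_altitude_m(1013.25), 1))  # 0.0 at standard sea-level pressure
print(round(pressure_to_altitude_m(900.0)))       # roughly 1 km up
```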
The magnetic sensor 280D includes a Hall sensor. The terminal 200 may detect the opening and closing of a flip holster using the magnetic sensor 280D. In some embodiments, when the terminal 200 is a flip phone, the terminal 200 may detect the opening and closing of the flip cover according to the magnetic sensor 280D, and then set features such as automatic unlocking upon flip-open according to the detected opening or closing state of the holster or the flip cover.
The acceleration sensor 280E may detect the magnitude of acceleration of the terminal 200 in various directions (typically three axes). The magnitude and direction of gravity can be detected when the terminal 200 is stationary. The method can also be used for identifying the terminal posture, and is applied to transverse and vertical screen switching, pedometers and other applications.
The distance sensor 280F is used to measure distance. The terminal 200 may measure distance by infrared or laser. In some embodiments, when taking a picture of a scene, the terminal 200 may use the distance sensor 280F to measure distance for fast focusing.
The proximity light sensor 280G may include, for example, a light-emitting diode (LED) and a light detector such as a photodiode. The light-emitting diode may be an infrared light-emitting diode. Infrared light is emitted outward through the light-emitting diode, and infrared light reflected from nearby objects is detected using the photodiode. When sufficient reflected light is detected, it can be determined that there is an object near the terminal 200; when insufficient reflected light is detected, it can be determined that there is no object near the terminal 200. The terminal 200 can use the proximity light sensor 280G to detect that the user is holding the terminal 200 close to the ear for a call, so as to automatically turn off the screen to save power. The proximity light sensor 280G may also be used in holster mode and pocket mode to automatically unlock and lock the screen.
The ambient light sensor 280L is used to sense the ambient light level. The terminal 200 may adaptively adjust the display screen brightness according to the perceived ambient light brightness. The ambient light sensor 280L may also be used to automatically adjust the white balance when taking a picture. The ambient light sensor 280L may also cooperate with the proximity light sensor 280G to detect whether the terminal 200 is in a pocket to prevent inadvertent contact.
The fingerprint sensor 280H is used to collect a fingerprint. The terminal 200 can utilize the collected fingerprint characteristics to realize fingerprint unlocking, access to an application lock, fingerprint photographing, fingerprint incoming call answering and the like.
The temperature sensor 280J is used to detect temperature. In some embodiments, terminal 200 implements a temperature processing strategy using the temperature detected by temperature sensor 280J. For example, when the temperature reported by the temperature sensor 280J exceeds the threshold, the terminal 200 performs a reduction in the performance of the processor located near the temperature sensor 280J, so as to reduce power consumption and implement thermal protection.
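The temperature-processing strategy above can be sketched as a simple throttling rule. The threshold, scaling rate, and floor value below are illustrative assumptions, not values from the document.

```python
# Sketch: when the temperature reported by the sensor exceeds a threshold,
# reduce the performance of the processor located near the sensor.
TEMP_THRESHOLD_C = 45.0  # assumed threshold in degrees Celsius

def cpu_performance_level(temp_c):
    """Return a performance scale factor in (0, 1] for the nearby processor."""
    if temp_c <= TEMP_THRESHOLD_C:
        return 1.0  # normal operation, no throttling
    # Scale performance down as the temperature rises past the threshold,
    # keeping a minimum level so the system stays responsive.
    return max(0.25, 1.0 - 0.1 * (temp_c - TEMP_THRESHOLD_C))

print(cpu_performance_level(40.0))  # 1.0 — below threshold
print(cpu_performance_level(50.0))  # 0.5 — throttled for thermal protection
```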
The touch sensor 280K is also referred to as a "touch panel" and may be disposed on the display screen 294. It is used to detect a touch operation acting on or near it. The detected touch operation may be passed to the application processor to determine the type of touch event, and a corresponding visual output may be provided via the display screen 294.
The bone conduction sensor 280M may acquire a vibration signal. In some embodiments, the bone conduction sensor 280M may acquire a vibration signal of the human vocal part vibrating the bone mass. The bone conduction sensor 280M may also contact the pulse of the human body to receive the blood pressure pulsation signal. In some embodiments, bone conduction sensor 280M may also be disposed in the headset. The audio module 270 may analyze a voice signal based on the vibration signal of the bone block vibrated by the sound part obtained by the bone conduction sensor 280M, so as to implement a voice function. The application processor can analyze heart rate information based on the blood pressure pulsation signal acquired by the bone conduction sensor 280M, so as to realize a heart rate detection function.
The keys 290 include a power key, volume keys, and the like. The keys 290 may be mechanical keys or touch keys. The terminal 200 receives key input and generates key signal input related to user settings and function control of the terminal 200.
The motor 291 may generate a vibration cue. The motor 291 can be used for both incoming call vibration prompting and touch vibration feedback. For example, touch operations applied to different applications (e.g., photographing, audio playing, etc.) may correspond to different vibration feedback effects. Touch operations applied to different areas of the display screen 294 may also correspond to different vibration feedback effects. Different application scenes (such as time reminding, receiving information, alarm clock, game and the like) can also correspond to different vibration feedback effects. The touch vibration feedback effect may also support customization.
Indicator 292 may be an indicator light that may be used to indicate a state of charge, a change in charge, or may be used to indicate a message, missed call, notification, etc.
The SIM card interface 295 is used to connect a SIM card. The SIM card can be attached to or detached from the terminal 200 by being inserted into or pulled out of the SIM card interface 295. The terminal 200 may support 1 or N SIM card interfaces, where N is a positive integer greater than 1. The SIM card interface 295 may support a Nano SIM card, a Micro SIM card, a standard SIM card, etc. Multiple cards can be inserted into the same SIM card interface 295 at the same time, and the types of the cards may be the same or different. The SIM card interface 295 may also be compatible with different types of SIM cards, as well as with external memory cards. The terminal 200 interacts with the network through the SIM card to implement functions such as calls and data communication. In some embodiments, the terminal 200 employs an eSIM, namely an embedded SIM card, which can be embedded in the terminal 200 and cannot be separated from the terminal 200.
Please refer to fig. 3, which is a schematic diagram of an example communication scenario to which the method for recording and displaying information in a communication process provided in the embodiment of the present application is applied. In the embodiment of the present application, the terminal 200 shown in fig. 3 serves as the first terminal, the terminal 300 serves as the second terminal, and voice communication between the terminal 200 and the terminal 300 is taken as an example to describe the method. As shown in fig. 3, during the voice communication between the terminal 200 and the terminal 300, the user 210 uses the terminal 200 to conduct a voice call with the user 310, who uses the terminal 300.
An embodiment of the present application provides a method for recording information in a communication process. As shown in fig. 4, the method may include S401 to S404:
S401, the terminal 200 performs voice communication with the terminal 300.
S402, the terminal 200 recognizes the voice data captured by the microphone 270C of the terminal 200.
The microphone 270C (also referred to as a "mic") of the terminal 200 may capture voice data around the terminal 200 during the voice communication between the terminal 200 and the terminal 300. The voice data around the terminal 200 may include voice data uttered by the user 210 and ambient noise around the terminal 200. The terminal 200 may convert the voice data captured by the microphone 270C into text and then determine whether the converted text matches a preset start word. If they match, the terminal 200 may start recording the voice data corresponding to the voice communication.
For example, the preset start word may be "let me note that down", "start recording", "start recording now", and the like.
The preset start word in the embodiment of the present application may be a default start word configured in the terminal 200 at the factory. Alternatively, the preset start word may be a custom start word set in the terminal 200 by the user.
In the embodiment of the present application, taking the terminal 200 (i.e., the mobile phone 200) shown in fig. 5 as an example, the process by which the terminal receives a start word set by the user is described:
The mobile phone 200 may receive a user's click operation (e.g., a single-click operation) on the "Settings" application icon on the desktop of the mobile phone 200. In response to the user clicking the "Settings" application icon, the mobile phone 200 may display a settings interface. The settings interface may include an "Airplane mode" option, a "WLAN" option, a "Bluetooth" option, a "Mobile network" option, a "System applications" option, and the like. For the specific functions of the "Airplane mode", "WLAN", "Bluetooth", and "Mobile network" options, refer to the descriptions in the conventional technology; they are not repeated here. The mobile phone 200 may display a system application settings interface in response to the user clicking the "System applications" option. The system application settings interface includes a "Phone" option, a "Contacts" option, a "Messages" option, and the like. The mobile phone 200 may display the phone settings interface 501 shown in (a) of fig. 5 in response to the user clicking the "Phone" option in the system application settings interface. Optionally, the settings interface itself may include a "Phone" option, in which case the mobile phone 200 may display the phone settings interface 501 shown in (a) of fig. 5 in response to the user clicking the "Phone" option in the settings interface.
As shown in (a) of fig. 5, the phone settings interface 501 includes a "Call forwarding" option, a "Call waiting" option, an "Incoming call blocking" option, and a "Call recording" option 502. For the specific functions of the "Call forwarding", "Call waiting", and "Incoming call blocking" options, refer to the descriptions in the conventional technology; they are not repeated here. The mobile phone 200 may display the call recording interface 503 shown in (b) of fig. 5 in response to a click operation on the "Call recording" option 502. The call recording interface 503 includes a "Recording notification" option, an "Automatic recording" option, and a "Voice-controlled recording" option 504. After the "Recording notification" option is turned on, the mobile phone 200 may show a prompt in the notification bar after recording. After the "Automatic recording" option is turned on, the mobile phone 200 can automatically record when a call is connected. After the "Voice-controlled recording" option 504 is turned on, the mobile phone 200 can automatically start recording during a call upon receiving a start word (such as "start recording" or "let me note that down") uttered by the user, and can automatically stop recording after receiving an end word (e.g., "end recording") uttered by the user, after the call ends, or after recording for a preset time (e.g., 1 minute, 2 minutes, or 5 minutes). The mobile phone 200 may display the voice-controlled recording interface 505 shown in (c) of fig. 5 in response to the user clicking the "Voice-controlled recording" option 504. The voice-controlled recording interface 505 includes a voice-controlled recording switch 506, a "Start word" option 507, and an "End word" option 510. The "Start word" option 507 includes the start word 508 currently configured in the mobile phone 200 and a "Custom start word" option 509.
The "End word" option 510 includes the end word 511 currently configured in the mobile phone 200 and a "Custom end word" option 512. Before the user has set a start word and an end word in the mobile phone 200, the default start word "start recording" is shown to the user in the "Start word" option 507, and the default end word "end recording" is shown to the user in the "End word" option 510.
The mobile phone 200 may display the start word customization interface 601 shown in fig. 6 in response to the user clicking the "Custom start word" option. The start word customization interface 601 may include a "Cancel" button 604, a "Confirm" button 605, a start word input box 602, and a start word suggestion 603. The "Cancel" button 604 is used to trigger the mobile phone to cancel the setting of the custom start word and display the voice-controlled recording interface 505 shown in (c) of fig. 5. The start word input box 602 is used to receive a custom start word entered by the user. The "Confirm" button 605 is used to save the custom start word that the user entered in the start word input box 602. The start word suggestion 603 is used to prompt the user about the mobile phone's requirements for a custom start word. Assume that the user enters the custom start word "let me note that down" in the start word input box 602 shown in fig. 6. The mobile phone 200 may set its start word to "let me note that down" in response to a click operation (e.g., a single-click operation) on the "Confirm" button 605 shown in fig. 6.
S403, the terminal 200 determines whether the text corresponding to the first voice data captured by the microphone 270C matches the preset start word.
Specifically, if the text corresponding to the first voice data captured by the microphone 270C matches the preset start word, the terminal 200 executes S404; if it does not match, the terminal 200 continues to execute S402.
S404, the terminal 200 records the voice data of the terminal 300.
In the embodiment of the present application, that the text corresponding to the first voice data matches the preset start word specifically means: the text corresponding to the first voice data contains the preset start word.
Illustratively, in connection with the above example, the preset start word of the terminal 200 is "let me note that down". As shown in fig. 3, during the voice communication between the terminal 200 and the terminal 300, the user 210 utters voice data 1, "Can you provide your address and phone number?". The microphone 270C of the terminal 200 can then capture voice data 1. The terminal 200 can recognize voice data 1, obtain its text "Can you provide your address and phone number?", and determine that this text does not match the preset start word "let me note that down". Later during the voice communication, the user 210 utters voice data 2, "let me note that down" (i.e., the first voice data). The microphone 270C of the terminal 200 may then capture voice data 2. The terminal 200 may recognize voice data 2, obtain its text "let me note that down", and determine that this text matches the preset start word "let me note that down". Upon determining that the text of voice data 2 matches the preset start word, the terminal 200 may start recording the voice data corresponding to the voice communication. As shown in fig. 3, the terminal 200 may start recording the voice data corresponding to the voice communication at time t1.
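The matching rule described above, under which the recognized text merely needs to contain the preset start word rather than equal it, can be sketched in Python. This is a minimal illustration only; the function and variable names below are invented for the sketch and do not come from the patent:

```python
def matches_keyword(recognized_text: str, keyword: str) -> bool:
    """Return True if the recognized text contains the preset keyword.

    Per the matching rule above, the recognized text does not have to
    equal the start word exactly; it only has to include it.
    """
    return keyword in recognized_text.lower()

# Hypothetical transcripts modeled on the example dialogue above.
start_word = "let me note that down"
print(matches_keyword("can you provide your address and phone number?", start_word))  # False
print(matches_keyword("OK, let me note that down", start_word))  # True
```

The same helper would serve for the end word check in S405, since the patent uses the identical containment rule there.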
In the embodiment of the present application, the voice data of the terminal 300 is obtained by converting the audio electrical signal received from the terminal 300.
Optionally, the terminal 200 may also record the voice data captured by the microphone 270C of the terminal 200. If the text corresponding to the first voice data captured by the microphone 270C matches the preset start word, the terminal 200 may perform not only S404 but also S404':
S404', the terminal 200 records the voice data captured by the microphone 270C of the terminal 200.
That is, the voice data recorded by the terminal 200 may include: the voice data captured by the microphone 270C of the terminal 200, and the voice data converted from the audio electrical signal received from the terminal 300.
Specifically, the microphone 270C of the terminal 200 may receive the voice data (i.e., a first sound signal) uttered by the user 210. The terminal 200 may convert the first sound signal into a first audio electrical signal and then transmit the first audio electrical signal to the terminal 300. The terminal 300 may convert the first audio electrical signal from the terminal 200 back into the first sound signal and play it through a receiver (also referred to as an "earpiece") or through a speaker of the terminal 300. The "voice data captured by the microphone 270C" is specifically the first sound signal captured by the microphone.
Likewise, the microphone of the terminal 300 may receive the voice data (i.e., a second sound signal) uttered by the user 310. The terminal 300 may convert the second sound signal into a second audio electrical signal and then transmit the second audio electrical signal to the terminal 200. The terminal 200 may convert the second audio electrical signal from the terminal 300 back into the second sound signal and play it through the receiver 270B (also referred to as an "earpiece") or through the speaker 270A of the terminal 200. The "voice data converted from the audio electrical signal received from the terminal 300" is specifically the second sound signal that the terminal 200 obtains by converting the second audio electrical signal received from the terminal 300.
Optionally, as shown in fig. 4, after S404, the method of the embodiment of the present application may further include S405 and S406:
S405, in the process of recording the voice data, the terminal 200 determines whether the text corresponding to the second voice data captured by the microphone 270C matches a preset end word.
Specifically, if the text corresponding to the second voice data captured by the microphone 270C matches the preset end word, the terminal 200 executes S406; if it does not match, the terminal 200 continues to record the voice data.
S406, the terminal 200 stops recording the voice data.
For example, the preset end word in the embodiment of the present application may be a default end word configured in the terminal 200 at the factory, such as the default end word "end recording" shown in (c) of fig. 5. Alternatively, the preset end word may be a custom end word set in the terminal 200 by the user. For example, the mobile phone 200 may display an end word customization interface in response to the user clicking the "Custom end word" option 512 shown in (c) of fig. 5. The end word customization interface is used to set an end word; its specific contents may refer to the start word customization interface 601 shown in fig. 6. Assume that the user sets the custom end word "I've noted it down" for the mobile phone 200 (i.e., the terminal 200) in the end word customization interface.
For example, the preset end word in the embodiment of the present application may be "end recording", "I've noted it down", "noted", "recording completed", and the like. In the embodiment of the present application, the method is described by taking the preset end word of the terminal 200 as "I've noted it down".
As shown in fig. 3, during the voice communication between the terminal 200 and the terminal 300, after the terminal 200 starts recording the voice data (i.e., from time t1), the user 310 utters voice data 3, "No. 1 Kefa Road, Nanshan District, Shenzhen". The microphone of the terminal 300 may capture voice data 3 and convert voice data 3 (i.e., a sound signal) into audio electrical signal 1. The terminal 300 transmits audio electrical signal 1 to the terminal 200. The terminal 200 converts audio electrical signal 1 from the terminal 300 into a sound signal (i.e., voice data 3) and saves voice data 3. The microphone 270C of the terminal 200 may then capture voice data 4, "Okay", and the terminal 200 records voice data 4. The terminal 200 then receives audio electrical signal 2 transmitted by the terminal 300 and converts audio electrical signal 2 into sound information, i.e., voice data 5, "The phone number is 88776655". The terminal 200 saves voice data 5.
In the embodiment of the present application, that the text corresponding to the second voice data matches the preset end word specifically means: the text corresponding to the second voice data contains the preset end word.
The microphone 270C of the terminal 200 may capture voice data 6, "Good, I've noted it down". The terminal 200 may recognize voice data 6, obtain its text "Good, I've noted it down", and determine that the preset end word "I've noted it down" is included in that text. That is, the terminal 200 may determine that the text of voice data 6 matches the preset end word. Upon determining the match, the terminal 200 can stop recording the voice data corresponding to the voice communication. As shown in fig. 3, the terminal 200 may stop recording the voice data corresponding to the voice communication at time t2.
In this embodiment, during the voice communication between the terminal 200 and the terminal 300, when the text corresponding to the first voice data captured by the microphone 270C of the terminal 200 matches the preset start word, the terminal 200 may automatically start recording the voice data corresponding to the voice communication. In other words, after receiving the voice command issued by the user 210 (i.e., the first voice data whose text matches the preset start word), the terminal 200 may automatically start recording the voice data corresponding to the voice communication.
Also, in the process of recording the voice data, when the text corresponding to the second voice data captured by the microphone 270C of the terminal 200 matches the preset end word, the terminal 200 may automatically stop recording the voice data. In other words, the terminal 200 may automatically stop recording the voice data after receiving the voice command issued by the user 210 (i.e., the second voice data whose text matches the preset end word).
In summary, the terminal 200 may automatically record and stop recording voice data in response to a voice command issued by a user during a voice call. Through the scheme, the terminal is more intelligent, the interaction performance between the terminal and a user is improved, and the user experience is improved.
S405 and S406 in the embodiment of the present application are optional. The terminal 200 may stop recording the voice data when the voice communication ends. Alternatively, the terminal 200 may automatically stop recording after recording the voice data for a preset time (e.g., 1 minute, 2 minutes, or 5 minutes).
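The start/stop behavior described in S402 through S406, together with the optional stop conditions above (call termination or a recording timeout), can be sketched as a small state machine. This is a hedged illustration only; the class and method names are invented for the sketch and are not taken from the patent:

```python
class VoiceControlledRecorder:
    """Toy controller for the recording flow of S402-S406.

    Recording starts when a recognized transcript contains the start
    word, and stops when a transcript contains the end word, when the
    call ends, or when a preset duration has elapsed.
    """

    def __init__(self, start_word, end_word, max_seconds=300):
        self.start_word = start_word
        self.end_word = end_word
        self.max_seconds = max_seconds
        self.recording = False
        self.started_at = None

    def on_transcript(self, text, now):
        text = text.lower()
        if not self.recording and self.start_word in text:
            self.recording, self.started_at = True, now   # S404: start
        elif self.recording and self.end_word in text:
            self.recording = False                        # S406: stop
        # Optional timeout-based stop described after S406.
        if self.recording and now - self.started_at >= self.max_seconds:
            self.recording = False

    def on_call_ended(self):
        # Optional stop when the voice communication ends.
        self.recording = False


rec = VoiceControlledRecorder("let me note that down", "i've noted it down")
rec.on_transcript("can you provide your address?", now=0)
print(rec.recording)  # False: no start word yet
rec.on_transcript("ok, let me note that down", now=5)
print(rec.recording)  # True: start word detected
rec.on_transcript("good, i've noted it down", now=60)
print(rec.recording)  # False: end word detected
```

A real implementation would drive `on_transcript` from the terminal's speech recognizer and gate the actual audio capture, but the transition logic is as above.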
Optionally, the terminal 200 may further prompt the user when it starts recording the voice data corresponding to the voice communication. Specifically, after S403, if the text corresponding to the first voice data captured by the microphone 270C matches the preset start word, the method in the embodiment of the present application may further include S701:
S701, the terminal 200 issues first prompt information. The first prompt information is used to prompt the user that the terminal 200 has started recording the voice data.
For example, the first prompt information in the embodiment of the present application may be a prompt tone or a vibration prompt. The prompt tone may be a monosyllabic tone, such as "ding" or "tick". Alternatively, the prompt tone may be a ring tone of N seconds in length, where N is, for example, 2, 3, or 5. Alternatively, the prompt tone may be a voice announcement, such as "recording started".
In this embodiment, the terminal 200 may issue the first prompt information when the text corresponding to the first voice data matches the preset start word, that is, when the terminal 200 starts recording the voice data. In this way, the user learns from the first prompt information that the terminal 200 has started recording, which increases the direct interaction between the terminal 200 and the user, improves the interaction performance, and improves the user experience.
Optionally, the terminal 200 may further prompt the user when it stops recording the voice data. Specifically, after S405, if the text corresponding to the second voice data captured by the microphone 270C matches the preset end word, the method in the embodiment of the present application may further include S702:
S702, the terminal 200 issues second prompt information. The second prompt information is used to prompt the user that the terminal 200 has stopped recording the voice data.
For example, the second prompt information in the embodiment of the present application may be a prompt tone or a vibration prompt. The prompt tone may be a monosyllabic tone, such as "ding" or "tick". Alternatively, the prompt tone may be a ring tone of N seconds in length, where N is, for example, 2, 3, or 5. Alternatively, the prompt tone may be a voice announcement, such as "recording ended".
The second prompt information in the embodiment of the present application may be the same as or different from the first prompt information. When the two differ, the following three cases are possible:
case (1): the first prompt message is a prompt tone, and the second prompt message is a vibration prompt. Case (2): the first prompt message and the second prompt message are both prompt tones, but the prompt tone corresponding to the first prompt message is different from the prompt tone corresponding to the second prompt message. For example, the cue tone corresponding to the first cue information is "ding", and the cue tone corresponding to the second cue information is "tic"; alternatively, the alert sound corresponding to the first alert information is "start recording", and the alert sound corresponding to the second alert information is "end recording". Case (3): the first prompt message and the second prompt message are both vibration prompts, but the vibration prompts corresponding to the first prompt message and the vibration prompts corresponding to the second prompt message have different vibration modes. For example. The vibration mode of the vibration prompt corresponding to the first prompt message is single vibration, and the vibration mode of the vibration prompt corresponding to the second prompt message is continuous twice vibration.
In this embodiment, the terminal 200 may issue the second prompt information when the text corresponding to the second voice data matches the preset end word, that is, when the terminal 200 stops recording the voice data. In this way, the user learns from the prompt information that the terminal 200 has stopped recording, which increases the direct interaction between the terminal 200 and the user, improves the interaction performance, and improves the user experience.
It is understood that, in the case where the terminal 200 plays the voice data using the speaker 270A, when the first prompt information or the second prompt information is a prompt tone, the tone emitted by the terminal 200 may be captured by the microphone 270C of the terminal 200. In that case, the terminal 200 transmits the captured tone to the terminal 300, so the user 310 may also hear it, which may affect the call experience.
In the embodiment of the present application, to prevent the peer user of the voice communication (i.e., the user 310) from hearing the prompt tone emitted by the terminal 200, the terminal 200 may determine, when the text corresponding to the first voice data captured by the microphone 270C matches the preset start word, whether it is playing the voice data using the speaker 270A or the receiver 270B. In the case where the terminal 200 plays the voice data using the speaker 270A, the terminal 200 may perform echo suppression on the voice data collected by the microphone 270C according to the voice data played by the speaker 270A. The echo-suppressed voice data then no longer contains the prompt tone, and it is this echo-suppressed voice data that the terminal 200 transmits to the terminal 300. The user 310 therefore does not hear the prompt tone.
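The patent does not specify a particular echo-suppression algorithm. As a hedged illustration of the general idea only, subtracting an adaptive estimate of the played-back signal from the microphone signal, here is a toy single-tap least-mean-squares (LMS) canceller in pure Python; all names and parameters are invented for this sketch:

```python
import math

def lms_echo_cancel(mic, reference, mu=0.05):
    """Toy single-tap LMS echo canceller.

    `mic` is the microphone signal (near-end speech plus an echo of
    `reference`, the signal played by the loudspeaker). The adaptive
    weight `w` learns the echo gain; the returned residual approximates
    the microphone signal with the echo removed.
    """
    w = 0.0
    residual = []
    for m, r in zip(mic, reference):
        e = m - w * r          # error = mic minus estimated echo
        residual.append(e)
        w += mu * e * r        # LMS weight update
    return residual

# Synthetic example: a "prompt tone" leaks into the mic at gain 0.5,
# with no near-end speech present.
tone = [math.sin(0.3 * n) for n in range(2000)]
mic = [0.5 * t for t in tone]
out = lms_echo_cancel(mic, tone)
# After adaptation, the residual energy is far below the echo energy.
print(sum(e * e for e in out[-500:]) < 0.01 * sum(m * m for m in mic[-500:]))  # True
```

Production echo cancellers use multi-tap adaptive filters with double-talk detection, but the subtract-the-estimate principle is the same.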
Further, the terminal 200 may stop performing echo suppression on the voice data collected by the microphone 270C after the second prompt information (i.e., the prompt tone) has been played. This avoids the power consumption caused by continuously performing echo suppression and extends the battery life of the terminal 200.
The voice data recorded by the terminal 200 may include one piece of voice data or at least two pieces. For example, in the example corresponding to fig. 3, the terminal 200 can record three pieces of voice data: "No. 1 Kefa Road, Nanshan District, Shenzhen", "Okay", and "The phone number is 88776655".
In this embodiment, the terminal 200 may store the recorded voice data according to the chronological order in which each piece was recorded and the source information of each piece. The source information indicates whether a piece of voice data was captured by the microphone 270C or converted from the audio electrical signal received from the terminal 300.
For example, the chronological order in which the terminal 200 recorded the three pieces of voice data is: voice data 3 "No. 1 Kefa Road, Nanshan District, Shenzhen", voice data 4 "Okay", and voice data 5 "The phone number is 88776655". Voice data 3 and voice data 5 are voice data converted from the audio electrical signals received from the terminal 300; voice data 4 is voice data captured by the microphone 270C of the terminal 200.
For example, the terminal 200 may store the voice data recorded during the voice communication in a table. Assume that the voice communication between the terminal 200 and the terminal 300 shown in fig. 3 took place at 08:06 on August 8, 2018, and lasted 21 minutes. Table 1 shows an example of a voice data table in the embodiment of the present application:
TABLE 1

Phone number: 138****5678        Call time: 08:06, August 8, 2018

  Voice data                                         Source
  Voice data 3 "No. 1 Kefa Road, Nanshan             Other party
    District, Shenzhen"
  Voice data 4 "Okay"                                Local device
  Voice data 5 "The phone number is 88776655"        Other party
As shown in table 1, the voice data table stores the voice data recorded during the call between the terminal 200 and the terminal 300 (phone number 138****5678). In table 1, the voice data recorded by the terminal 200 during the voice call with phone number 138****5678 at 08:06 on August 8, 2018 is stored in the chronological order of recording together with the source of each piece: voice data 3 "No. 1 Kefa Road, Nanshan District, Shenzhen", voice data 4 "Okay", and voice data 5 "The phone number is 88776655". The source of voice data 3 and voice data 5 is the other party; that is, they are voice data converted from the audio electrical signals received from the terminal 300. The source of voice data 4 is the local device; that is, it is voice data captured by the microphone 270C of the terminal 200.
In one implementation, the terminal 200 may segment the voice data it records according to the source of the voice data. Specifically, in the example shown in fig. 3, after the terminal 200 records voice data 3 (converted from the audio electrical signal received from the terminal 300), the microphone 270C of the terminal 200 captures voice data 4. Since the sources of voice data 3 and voice data 4 differ, the terminal 200 saves them as separate segments. After the microphone 270C captures voice data 4, the terminal 200 receives an audio electrical signal from the terminal 300 (the audio electrical signal corresponding to voice data 5). Since the sources of voice data 4 and voice data 5 differ, the terminal 200 likewise saves them as separate segments.
In some application scenarios, during the voice communication between the terminal 200 and the terminal 300, after the user 210 utters a piece of voice data (e.g., voice data a), both the user 210 and the user 310 remain silent for a period of time, and then the user 210 utters another piece of voice data (e.g., voice data b). Alternatively, after the user 310 utters a piece of voice data, both users remain silent for a period of time, and then the user 310 utters another piece. In such scenarios, when segmenting the recorded voice data, the terminal 200 may consider not only the source of the voice data but also the interval between pieces. For example, during the voice communication, the user 210 utters voice data a, which the microphone 270C of the terminal 200 captures. If, within a certain time (e.g., 1 minute, 2 minutes, or 5 minutes) after capturing voice data a, the microphone 270C captures no new voice data and the terminal 200 receives no audio electrical signal from the terminal 300, then when the microphone 270C subsequently captures voice data b, the terminal 200 may save voice data a and voice data b as separate segments. After the microphone 270C captures voice data b, the terminal 200 receives an audio electrical signal from the terminal 300 (the audio electrical signal corresponding to voice data c). Since the sources of voice data b and voice data c differ, the terminal 200 saves them as separate segments as well.
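The two segmentation criteria described above, a change of source or a silence gap exceeding a threshold, can be sketched as follows. This is a hedged illustration; the tuple layout and all names are invented here and are not specified by the patent:

```python
def segment_recording(events, gap_seconds=60):
    """Split recorded voice events into segments.

    `events` is a chronological list of (timestamp, source, text)
    tuples, where source is "local" (captured by microphone 270C) or
    "other" (converted from the peer's audio electrical signal). A new
    segment starts whenever the source changes or the silence gap
    between consecutive events exceeds `gap_seconds`.
    """
    segments = []
    for ts, source, text in events:
        if (not segments
                or segments[-1]["source"] != source
                or ts - segments[-1]["end"] > gap_seconds):
            segments.append({"source": source, "end": ts, "texts": [text]})
        else:
            segments[-1]["texts"].append(text)
            segments[-1]["end"] = ts
    return segments

# Voice data a and b are both local but separated by a long pause;
# voice data c comes from the other party.
events = [(0, "local", "voice data a"),
          (200, "local", "voice data b"),
          (210, "other", "voice data c")]
print(len(segment_recording(events)))  # 3 segments, as in the example above
```

With a shorter pause between a and b, the same function would merge them into one segment, matching the interval-based rule.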
For example, the terminal 200 may store voice data a, voice data b, and voice data c recorded during the voice communication in a table. Table 2 shows an example of such a voice data table in the embodiment of the present application:
TABLE 2

Phone number: 138****5678        Call time: 08:06, August 8, 2018

  Voice data      Source
  Voice data a    Local device
  Voice data b    Local device
  Voice data c    Other party
In table 2, voice data a, voice data b, and voice data c recorded by the terminal 200 during the voice call with phone number 138****5678 at 08:06 on August 8, 2018 are stored in the chronological order of recording, together with the source of each piece: the source of voice data a is the local device, the source of voice data b is the local device, and the source of voice data c is the other party. That is, voice data a and voice data b are voice data captured by the microphone 270C of the terminal 200, and voice data c is voice data converted from the audio electrical signal received from the terminal 300.
The voice data may be stored together with the call record information of the corresponding voice communication; that is, the terminal 200 can store the voice data in the storage area used for call record information. Alternatively, the voice data may be stored in a separate storage area. This is not limited in the embodiment of the present application. If the terminal 200 stores the voice data together with the call record information, then, since the terminal 200 may record voice data in some voice communications but not in others, some call records will include voice data recorded by the terminal 200 while others will not. For example, table 3 shows an example of a call record information table in the embodiment of the present application:
TABLE 3

  Phone number   Call time                Duration   Voice data     Source
  138****5678    08:06, August 8, 2018    21 min     Voice data 3   Other party
                                                     Voice data 4   Local device
                                                     Voice data 5   Other party
  180****1234    10:01, August 8, 2018               NULL           NULL
In the call record information table shown in table 3, the call record information for the voice call with phone number 138****5678 at 08:06 on August 8, 2018 includes voice data 3 to voice data 5 recorded during that call. The terminal 200 did not record voice data during the voice call with phone number 180****1234 at 10:01 on August 8, 2018; therefore, no voice data is stored in the corresponding call record information. As shown in table 3, both the voice data and the source in the call record information for that call may be null (e.g., NULL).
Compared with the voice data table shown in table 1, the call record information table shown in table 3 may include not only the telephone number, the call time, the voice data, and the source information of the voice data corresponding to a voice call, but also information such as the call duration of the voice call.
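The storage scheme described above can be sketched in a few lines of code. This is a hypothetical illustration only: the patent does not specify a schema, so the class names, field names, durations, and masked phone numbers below are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class VoiceSegment:
    audio: bytes   # recorded audio payload (placeholder bytes here)
    source: str    # "local" = captured by the microphone; "peer" = received from the other terminal

@dataclass
class CallRecord:
    phone_number: str
    call_time: str
    duration_s: int                                              # illustrative values below
    segments: List[VoiceSegment] = field(default_factory=list)   # empty list plays the role of NULL

records = [
    CallRecord("138****5678", "2018-08-08 08:06", 180, [
        VoiceSegment(b"...", "peer"),   # e.g. voice data 3
        VoiceSegment(b"...", "local"),  # e.g. voice data 4
        VoiceSegment(b"...", "peer"),   # e.g. voice data 5
    ]),
    CallRecord("180****1234", "2018-08-08 10:01", 60),  # no voice data recorded in this call
]

# A call record item would show a player plug-in only when voice data exists.
has_player = {r.phone_number: bool(r.segments) for r in records}
```

Keeping the segments inside the call record, rather than in a separate store, is what makes it trivial to decide per record whether a player plug-in should be displayed.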
The embodiment of the application provides a method for displaying information in a communication process. After the voice data recording is finished, the terminal 200 displays a first interface including a player plug-in for playing the voice data. The terminal 200 may receive a first operation of the player plug-in by the user and play corresponding voice data in response to the first operation.
In implementation (1), the terminal 200 may automatically display the first interface in response to the end of the voice communication after the end of the recording of the voice data.
Illustratively, in response to the voice communication ending, the terminal 200 may display the first interface 701 shown in fig. 7. The first interface 701 may include player plug-ins for playing the voice data recorded by the terminal 200, for example, the player plug-in 702, the player plug-in 703, and the player plug-in 704 in the first interface 701. The player plug-in 702 is used for playing the voice data 3. The player plug-in 703 is used for playing the voice data 4. The player plug-in 704 is used for playing the voice data 5. Of course, the first interface 701 may also include only one player plug-in, which may be used to play all the voice data recorded by the terminal 200 during the voice communication, such as the voice data 3, the voice data 4, and the voice data 5.
In implementation (2), the terminal 200 may automatically display the first interface after the voice data recording is finished. After the recording of the voice data by the terminal 200 is finished, the terminal 200 may still perform voice communication with the terminal 300. In this case, the terminal 200 displays the first interface during voice communication after the voice data recording is finished.
In implementation (3), after the recording of the voice data is finished, the terminal 200 displays the first interface in response to the user inputting the second operation. The second operation is used to trigger the terminal 200 to display the call record interface of the terminal 200, i.e., the first interface. Wherein the terminal 200 may receive a second operation input by the user. In response to the second operation, the terminal 200 displays a call record interface of the terminal 200. One or more call log entries may be included in the call log interface. One call log entry may correspond to one call log. One call record can record the opposite-end communication number of one voice communication, contact information (such as contact name or remark name), call start or end time, call duration and the like. For example, the call record interface in the embodiment of the present application includes a call record item for the terminal 200 to perform voice communication with the terminal 300. The call log entry is used to record call log information of the terminal 200 in voice communication with the terminal 300, such as the phone number 138 × 5678 of the terminal 300.
For example, the second operation may be a click operation (e.g., a single click operation) by the user on the "phone" icon 802 in the desktop 801 of the mobile phone shown in fig. 8 (a).
In this embodiment, the call record item for the terminal 200 and the terminal 300 to perform voice communication may further include a player plug-in. The player plug-in is used for playing the voice data recorded by the terminal 200.
It is assumed that three call records are stored in the terminal 200, that is, the call records corresponding to the call record item 804, the call record item 805, and the call record item 806 shown in (b) in fig. 8. In response to a click operation of the "phone" icon 802 by the user, the terminal 200 may display a call log interface 803 shown in (b) of fig. 8. The call log interface 803 includes the call log entry 804, the call log entry 805, and the call log entry 806. As shown in table 1, the terminal 200 stores the voice data recorded when the terminal 200 performed voice communication with the terminal corresponding to the telephone number 138 × 5678. Then the call record entry 804 may include a player plug-in 807 for playing the voice data 3, the voice data 4, and the voice data 5 in table 1. The terminal 200 may play the voice data 3, the voice data 4, and the voice data 5 in table 1 in sequence in response to a click operation (e.g., a single-click operation) of the player plug-in 807 by the user. As shown in table 1, the terminal 200 also stores the voice data recorded when the terminal 200 performed voice communication with the terminal corresponding to the telephone number 159 × 7986. Then the call record entry 805 may include a player plug-in 808 for playing the voice data 7 in table 1. The terminal 200 may play the voice data 7 in table 1 in response to a click operation (e.g., a single-click operation) of the player plug-in 808 by the user.
It is understood that the terminal 200 did not record voice data during the voice communication with the terminal corresponding to the telephone number 180 × 1234. Thus, no player plug-in is displayed in the call record item 806.
In the present application, the call record item in the call record interface includes a player plug-in for playing the voice data recorded in the corresponding voice communication process. In this way, the terminal 200 can respond to the click operation of the player plug-in by the user to play the corresponding voice data. In addition, the terminal 200 can show the relationship between the voice data played by the player plug-in and the call record to the user in the call record interface, so that the relevance between the voice data recorded by the terminal 200 and the call record is improved.
In the implementation manner (4), the above-mentioned player plug-in may not be included in the call record item in the call record interface. In this implementation, after the voice data recording is finished, the terminal 200 displays the first interface in response to a third operation of the user on the call record item of the voice call in the call record interface. The third operation is used to trigger the terminal 200 to display a record detail interface, i.e. the first interface, corresponding to the call record item. The record details interface includes at least one player plug-in. The at least one player plug-in corresponds to the at least two sections of voice data one to one. And the at least one player plug-in is displayed in the recording detail interface according to the time sequence of recording the corresponding voice data and the source information of the corresponding voice data.
For example, in response to a user's click operation on the "phone" icon 802 in the cell phone desktop 801 shown in (a) in fig. 8, the terminal 200 may display the call record interface 901 shown in (a) in fig. 9. The call log interface 901 includes a call log item 902, a call log item 903, and a call log item 904. The call record item 902, the call record item 903, and the call record item 904 do not include a player plug-in. However, in response to the third operation of the call record item 902 by the user, the terminal 200 may display a record detail interface 905 shown in (b) in fig. 9. The record details interface 905 includes a player plug-in 906, a player plug-in 907, and a player plug-in 908. The player plug-in 906, the player plug-in 907, and the player plug-in 908 are arranged in the record details interface 905 in the order in which the voice data they play were recorded. In response to a user's single-click operation on the player plug-in 906, the terminal 200 may play the voice data 3 "No. 1 Kefa Road, Nanshan District, Shenzhen" shown in table 1. In response to a single-click operation of the player plug-in 907 by the user, the terminal 200 may play the voice data 4 "okay" shown in table 1. In response to a single-click operation of the player plug-in 908 by the user, the terminal 200 may play the voice data 5 "the telephone number is 88776655" shown in table 1.
In the present application, the record detail interface corresponding to a call record item includes player plug-ins for playing the voice data recorded during the corresponding voice communication. In this way, the terminal 200 can respond to the click operation of a player plug-in by the user to play the corresponding voice data. In addition, the terminal 200 can visually display the relationship between the voice data played by the player plug-ins and the call record to the user, so that the relevance between the voice data recorded by the terminal 200 and the call record is improved.
In the embodiment of the present application, the terminal 200 may convert the recorded voice data into text information, i.e., text of the recorded voice data. The voice data recorded by the terminal 200 may be at least two pieces of voice data. The text of the recorded voice data may include at least two pieces of text information. The at least two sections of text information correspond to the at least two sections of voice data one to one. The terminal 200 may store at least two pieces of text information according to the time sequence of recording the voice data corresponding to the text information and the source information of the voice data corresponding to the text information.
For example, the terminal 200 may store the voice data recorded in the voice communication process and the text information corresponding to each piece of voice data in a table manner. With reference to table 1, as shown in table 4, an example of a speech data and text information table shown in the embodiment of the present application is:
TABLE 4
Telephone number | Call time | Voice data | Source | Text information
138 × 5678 | 08:06 on August 8, 2018 | voice data 3 | opposite party | No. 1 Kefa Road, Nanshan District, Shenzhen
138 × 5678 | 08:06 on August 8, 2018 | voice data 4 | local terminal | okay
138 × 5678 | 08:06 on August 8, 2018 | voice data 5 | opposite party | the telephone number is 88776655
As shown in table 4, the voice data and text information table stores the voice data recorded during the call between the terminal 200 and the terminal 300 (the telephone number is 138 × 5678) and the text information corresponding to each piece of voice data.
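The voice data and text information table can be sketched as an ordered list of rows, each carrying the segment's recording order, source, and recognized text. The row layout and the `seq` field are assumptions for illustration; the example strings follow table 4.

```python
# Each row pairs one voice data segment with its converted text (one-to-one),
# preserving recording order and source information.
voice_text_table = [
    {"seq": 1, "name": "voice data 3", "source": "peer",
     "text": "No. 1 Kefa Road, Nanshan District, Shenzhen"},
    {"seq": 2, "name": "voice data 4", "source": "local",
     "text": "okay"},
    {"seq": 3, "name": "voice data 5", "source": "peer",
     "text": "the telephone number is 88776655"},
]

# The record detail interface can then list the texts in recording order,
# annotated with their sources.
display_lines = [
    f'{row["text"]} ({row["source"]})'
    for row in sorted(voice_text_table, key=lambda r: r["seq"])
]
```

Because text and voice rows are stored one-to-one, the index of a tapped text line is enough to find the voice segment to play.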
As shown in fig. 10A, a method for displaying information in a communication process according to an embodiment of the present application may include S1001-S1002. As shown in fig. 10B, a method for recording and displaying information in a communication process according to an embodiment of the present application may include S401 to S404, S404', and S1001 to S1002.
S1001, after the recording of the voice data is finished, the terminal 200 displays a first interface including a text corresponding to the recorded voice data. The voice data is voice data recorded by the terminal 200 during voice communication between the terminal 200 and the terminal 300.
For the first interface and the method for displaying the first interface by the terminal 200 in response to the first event, reference may be made to the detailed description in the implementation manner (1) to the implementation manner (4), and details of the embodiment of the present application are not described herein again. In this embodiment, the first interface may include text corresponding to the voice data. In this embodiment, the first interface may or may not include a player plug-in.
Wherein, the voice data comprises at least two sections of voice data. For example, the voice data may include the above-described voice data 3, voice data 4, and voice data 5. The text corresponding to the voice data comprises at least two sections of text information. The at least two sections of voice data correspond to the at least two sections of text information one by one. In this embodiment of the present application, the terminal 200 records at least two pieces of voice data, and the manner of storing the at least two pieces of voice data and the at least two pieces of text information may refer to the detailed description in the above embodiments, which is not repeated herein.
Illustratively, the first interface is the above-mentioned record detail interface. The terminal 200 may receive a third operation by the user on the call record item of the voice communication between the terminal 200 and the terminal 300. The third operation is used to trigger the terminal 200 to display the record detail interface corresponding to the call record item. The record details interface includes at least two sections of text information. The at least two sections of text information correspond to the at least two sections of voice data one to one. For example, in response to the third operation of the call record item 902 by the user, the terminal 200 may display the record detail interface 1001 shown in fig. 10C. The record details interface 1001 includes the text information "No. 1 Kefa Road, Nanshan District, Shenzhen" 1002 corresponding to the voice data 3, the text information "okay" 1003 corresponding to the voice data 4, and the text information "the telephone number is 88776655" 1004 corresponding to the voice data 5 shown in table 4. The text information 1002, the text information 1003, and the text information 1004 are arranged in the order in which the corresponding voice data were recorded. Also, the source of each piece of text information is indicated in the record details interface 1001. For example, the text information "No. 1 Kefa Road, Nanshan District, Shenzhen" 1002 comes from the opposite party, i.e., the text information 1002 corresponds to the voice data converted from the audio electrical signal received from the terminal 300. The text information "okay" 1003 comes from the local terminal, that is, the text information 1003 corresponds to the voice data captured by the microphone 270C.
The text information "the telephone number is 88776655" 1004 comes from the opposite party, that is, the text information 1004 corresponds to voice data converted from an audio electrical signal received from the terminal 300.
In this embodiment, the terminal 200 may display text information corresponding to the voice data recorded by the terminal 200 in a recording detail interface of the call record. That is, the terminal 200 can visually display the text information corresponding to the voice data recorded by the terminal 200 and the relationship between the text information and the call record to the user on the call record interface, so that the relevance between the text information and the call record is improved.
S1002, the terminal 200 may receive a first operation of the user on the text corresponding to the voice data, and play the corresponding voice data in response to the first operation.
Specifically, the terminal 200 may further receive a first operation by the user on any one piece of text information (e.g., the first text information) of the at least two pieces of text information in the record details interface. The first operation triggers the terminal 200 to play the voice data segment corresponding to the first text information. For example, the first operation may be any one of a single-click operation, a double-click operation, and a long-press operation on the first text information by the user. In response to the first operation by the user on the first text information, the terminal 200 may play a first voice data segment corresponding to the first text information. The first voice data segment is one of the at least two segments of voice data.
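The first operation can be sketched as a simple index lookup: tapping the i-th text entry plays the i-th voice segment, relying on the one-to-one correspondence described above. The `Player` class is a stand-in for the terminal's audio playback component; all names are illustrative.

```python
class Player:
    """Stand-in for the terminal's audio playback component."""
    def __init__(self):
        self.played = []
    def play(self, segment):
        self.played.append(segment)

# Placeholders for the stored audio segments and their one-to-one texts.
voice_segments = ["segment-3", "segment-4", "segment-5"]
texts = ["address text", "okay", "phone number text"]

def on_text_tapped(index, player):
    # The tapped text's index directly selects the corresponding voice segment.
    player.play(voice_segments[index])

player = Player()
on_text_tapped(0, player)   # first operation on the first text entry
```

No mapping table is needed beyond the shared index, which is why storing the two lists in the same recording order matters.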
Illustratively, the terminal 200 may receive a first operation by the user on the first text information "No. 1 Kefa Road, Nanshan District, Shenzhen" 1002 in the record details interface 1001 shown in fig. 10C. In response to the first operation by the user on the first text information 1002, as shown in fig. 10C, the terminal 200 may play the voice data segment "No. 1 Kefa Road, Nanshan District, Shenzhen".
Optionally, the record details interface may include not only the at least two pieces of text information but also at least two player plug-ins. The at least two player plug-ins correspond to the at least two sections of text information one to one and are used for playing the voice data corresponding to the at least two sections of text information. For example, as shown in fig. 11, the record details interface 1101 may include not only the text information "No. 1 Kefa Road, Nanshan District, Shenzhen" 1102, the text information "okay" 1103, and the text information "the telephone number is 88776655" 1104, but also the player plug-in 1105, the player plug-in 1106, and the player plug-in 1107. The player plug-in 1105 is used to play the voice data "No. 1 Kefa Road, Nanshan District, Shenzhen" corresponding to the text information 1102. The player plug-in 1106 is used to play the voice data "okay" corresponding to the text information 1103. The player plug-in 1107 is used to play the voice data "the telephone number is 88776655" corresponding to the text information 1104.
In some cases, when the terminal 200 converts the voice data recorded by the terminal 200 into text information, the converted text information may not completely match the text of the recorded voice data; that is, the text information converted by the terminal 200 may contain errors. In this embodiment, the terminal 200 may receive a fourth operation (i.e., a modification operation) performed by the user on any one piece of text information (e.g., the second text information) of the at least two pieces of text information, and modify the second text information stored in the terminal 200. For example, the fourth operation may be any one of a single-click operation, a double-click operation, and a long-press operation on the second text information by the user; the fourth operation is different from the first operation described above. The second text information is one of the at least two pieces of text information and corresponds to a second voice data segment of the at least two segments of voice data. Thus, the user can control the terminal 200 to play the second voice data segment corresponding to the second text information and compare the played second voice data segment with the displayed second text information. When the second text information does not match the text of the second voice data segment played by the terminal 200, the user can operate the terminal 200 to modify the second text information.
In response to the fourth operation, the terminal 200 may modify the second text information into third text information and display the third text information on the first interface; upon receiving the first operation by the user on the third text information, the terminal 200 plays the second voice data segment.
Illustratively, as shown in (a) in fig. 12, the record detail interface 1201 may include not only the text information "the telephone number is 8877665" 1202 but also the player plug-in 1203. The player plug-in 1203 is used to play the voice data 5 "the telephone number is 88776655" shown in table 1. After the terminal 200 plays the voice data 5 in response to a single-click operation of the player plug-in 1203 by the user, the user finds that the text information "the telephone number is 8877665" 1202 differs from the voice data 5. At this time, the user can modify the text information 1202 in accordance with the voice data 5.
For example, the terminal 200 may receive a fourth operation (e.g., a double-click operation) by the user on the second text information "the telephone number is 8877665" 1202. In response to the modification operation, the terminal 200 may display a record detail interface 1204 shown in (b) in fig. 12. The record details interface 1204 includes a modification box 1205, in which the second text information is displayed, and a keyboard 1206. As shown in (c) in fig. 12, the user enters the correct text information "the telephone number is 88776655" (i.e., the third text information) in the modification box 1205. In response to a click operation of the "OK" button in the modification box 1205 by the user, the terminal 200 may display a record detail interface 1208 shown in (d) in fig. 12. The record details interface 1208 includes the text information "the telephone number is 88776655" 1207 (i.e., the third text information). Thereafter, upon receiving a first operation by the user on the text information 1207, the terminal 200 may play the second voice data segment (i.e., the voice data segment corresponding to the second text information).
In the embodiment of the present application, the terminal 200 may replace the text information before modification with the modified text information in response to the modification operation of the user on the text information. Thus, the user can modify the text information obtained by converting the voice data by the terminal 200 according to the voice data stored in the terminal 200, and can correct the error occurring when the text information is obtained by converting the voice data by the terminal 200.
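The modification flow amounts to replacing the text of one row while leaving its voice data untouched. The sketch below uses the table-row layout assumed earlier; the example text matches the mis-recognized digit scenario from fig. 12.

```python
# One row of the voice data and text information table; the ASR output lost a digit.
table = [
    {"text": "the telephone number is 8877665", "voice": "voice data 5"},
]

def modify_text(table, index, corrected):
    """Fourth operation: replace the stored (second) text with the user's
    corrected (third) text. The voice data reference is untouched, so the
    first operation on the corrected text still plays the same segment."""
    old = table[index]["text"]
    table[index]["text"] = corrected
    return old

previous = modify_text(table, 0, "the telephone number is 88776655")
```

Returning the previous text is an optional convenience (e.g., for an undo affordance); the patent only requires that the stored text be replaced.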
It will be appreciated that the limited space in a call log entry in the call log interface may not be sufficient to fully display the at least two pieces of text information. In this case, the call log entry may include keywords of the at least two pieces of text information. The terminal 200 may display the at least two pieces of text information and the corresponding player plug-ins in response to a click operation (e.g., a single-click operation) of a keyword by the user. For example, the terminal 200 may display a call record interface 1301 illustrated in (a) in fig. 13. The call log interface 1301 includes a call log entry 1302. The call log entry 1302 includes the keywords "address, phone" 1303. In response to a click operation of the keywords "address, phone" 1303 by the user, the terminal 200 may display the at least two pieces of text information shown in (b) of fig. 13 and their corresponding player plug-ins 1304.
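One way to derive compact keywords such as "address, phone" is to label each text segment with a coarse category and join the unique labels. The patent does not describe how keywords are chosen, so the rule-based labeling below is purely an assumption for demonstration.

```python
def entry_keywords(texts):
    """Illustrative heuristic: map each text segment to a coarse category
    ("address" or "phone") and join the de-duplicated labels in order."""
    labels = []
    for t in texts:
        low = t.lower()
        if "road" in low or "district" in low:
            label = "address"
        elif "telephone" in low or "number" in low:
            label = "phone"
        else:
            continue  # segments like "okay" contribute no keyword
        if label not in labels:
            labels.append(label)
    return ", ".join(labels)

keywords = entry_keywords([
    "No. 1 Kefa Road, Nanshan District, Shenzhen",
    "okay",
    "the telephone number is 88776655",
])
```

In a real implementation this labeling would more plausibly come from the speech recognition or NLU layer; the point here is only that the entry stores short keywords while the full texts stay behind the tap.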
Optionally, in an implementation manner of the embodiment of the present application, the at least two segments of voice data recorded by the terminal 200 only include the voice data converted from the audio electrical signal received from the terminal 300, and do not include the voice data captured by the microphone 270C. In other words, the terminal 200 records only voice data from the other party. The terminal 200 may store the at least two pieces of voice data according to the time sequence of recording each piece of voice data.
Alternatively, in another implementation, the at least two pieces of voice data recorded by the terminal 200 include only the voice data captured by the microphone 270C, and do not include the voice data converted from the electrical audio signal received from the terminal 300. In other words, the terminal 200 records only voice data uttered by the own terminal. The terminal 200 may store the at least two pieces of voice data according to the time sequence of recording each piece of voice data.
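Both single-source implementations reduce to filtering the recorded segments by their source tag while preserving recording order. The segment layout below is the illustrative one used earlier, not an API from the patent.

```python
segments = [
    {"source": "peer",  "name": "voice data 3"},
    {"source": "local", "name": "voice data 4"},
    {"source": "peer",  "name": "voice data 5"},
]

def segments_from(segments, source):
    # Keep only segments from the given source, in recording order.
    return [s for s in segments if s["source"] == source]

peer_only = segments_from(segments, "peer")    # record only the opposite party
local_only = segments_from(segments, "local")  # record only the local user
```

In practice the filtering would happen at recording time (only one monitoring path feeds the recorder), but the stored result is the same as this post-hoc filter.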
In the above two implementation manners, the terminal 200 may display the text information of the voice data or the player plug-in the call record item in the call record interface, and the terminal 200 may display the text information of the voice data or the player plug-in the record detail interface corresponding to the call record item, which may refer to the detailed description in the foregoing embodiments, and details are not repeated here in the embodiments of the present application.
It is understood that, to implement the above functions, the terminal 200 includes corresponding hardware structures and/or software modules for performing each function. Those skilled in the art will readily appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented by hardware or by a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends upon the particular application and design constraints of the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments of the present application.
In the embodiment of the present application, the terminal 200 may be divided into functional modules according to the method example, for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. It should be noted that, in the embodiment of the present application, the division of the module is schematic, and is only one logic function division, and there may be another division manner in actual implementation.
For example, in the case of dividing each functional module by corresponding functions, fig. 14 shows a possible structural diagram of the terminal involved in the above embodiment, and the terminal 1400 includes: a control module 1401, a monitoring module 1402, an automatic speech recognition module 1403, a monitoring module 1404, an echo suppression module 1405, a recording module 1406, a recording module 1407, a text recording module 1408, a playback module 1409, and a display module 1410.
The control module 1401 is configured to control the monitoring module 1402 to monitor the voice data uttered by the local user when the terminal 1400 performs voice communication with another terminal. The monitoring module 1402 is configured to transmit the monitored voice data to the automatic speech recognition module 1403. The automatic speech recognition module 1403 converts the voice data monitored by the monitoring module 1402 into text information.
The control module 1401 is further configured to determine whether the text information recognized by the automatic speech recognition module 1403 (i.e., the text of the voice data monitored by the monitoring module 1402) matches a preset start word. The control module 1401 is further configured to start the monitoring module 1404, the recording module 1406, the recording module 1407, and the text recording module 1408 when the text of the voice data monitored by the monitoring module 1402 matches the preset start word.
The monitoring module 1404 is configured to monitor the voice data from the opposite end and transmit the monitored voice data to the automatic speech recognition module 1403. The automatic speech recognition module 1403 is used for converting the voice data monitored by the monitoring module 1404 into text information. The control module 1401 is further configured to control the automatic speech recognition module 1403 to transmit the converted text information and its source information to the text recording module 1408 when the text of the voice data monitored by the monitoring module 1402 matches the preset start word. The text recording module 1408 is used to record the text information and its source information from the automatic speech recognition module 1403. The recording module 1406 is used for recording the voice data uttered by the local user. The recording module 1407 is used for recording the voice data from the opposite end.
Further, the control module 1401 is configured to turn off the monitoring module 1404, the recording module 1406, the recording module 1407, and the text recording module 1408 when the text of the voice data monitored by the monitoring module 1402 matches a preset end word.
Optionally, the control module 1401 is further configured to control the playback module 1409 to play the second prompt message, such as a prompt tone, described in the foregoing embodiments when the text of the voice data monitored by the monitoring module 1402 matches the preset end word. The control module 1401 is further configured to activate the echo suppression module 1405 in this case. The echo suppression module 1405 is configured to perform echo suppression on the voice data monitored by the monitoring module 1402 according to the prompt tone played by the playback module 1409. The control module 1401 is further configured to close the playback module 1409 after the playback module 1409 plays the second prompt message.
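The control module's start/stop decision can be sketched as a small state machine: recording modules are started when the locally monitored speech, converted to text, contains a preset start word, and stopped when it contains a preset end word. The start and end words below are placeholders; the patent leaves them configurable.

```python
def control_step(state, recognized_text,
                 start_word="start recording", end_word="stop recording"):
    """One decision of the control module on a piece of recognized text."""
    if not state["recording"] and start_word in recognized_text:
        state["recording"] = True    # start monitoring, recording, and text recording modules
    elif state["recording"] and end_word in recognized_text:
        state["recording"] = False   # stop the modules; optionally play a prompt tone
    return state

state = {"recording": False}
control_step(state, "please start recording this address")
recording_after_start = state["recording"]
control_step(state, "ok, stop recording now")
recording_after_end = state["recording"]
```

A production implementation would match whole phrases from the ASR output rather than raw substrings, but the toggle structure is the same.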
The control module 1401 is further configured to control the display module 1410 to display the incoming call reminding interface and the call record interface, which are described in the foregoing embodiments.
Of course, the terminal 1400 includes, but is not limited to, the above-listed unit modules. For example, the terminal 1400 may further include a receiving module and a sending module. The receiving module is used for receiving data or instructions sent by other terminals. The sending module is used for sending data or instructions to other terminals. Moreover, the functions that can be specifically implemented by the above functional units include, but are not limited to, the functions corresponding to the method steps described in the above examples. For detailed descriptions of other units of the terminal 1400, reference may be made to the detailed descriptions of the corresponding method steps, which are not described herein again in this embodiment of the present application.
In the case of an integrated unit, fig. 15 shows a possible structural diagram of the terminal involved in the above-described embodiment. The terminal 1500 includes: processing module 1501, storage module 1502, display module 1503, communication module 1504, and audio module 1505. The processing module 1501 is used for controlling and managing the operation of the terminal 1500. For example, the processing module 1501 may be configured to support the terminal 1500 to perform S402, S403, S404', S405, S406, S701, S702 of the above-described method embodiments, generate the first interface in S1001, "receive the first operation" in S1002, and/or other processes for the techniques described herein. The display module 1503 is used for displaying the image generated by the processing module 1501. For example, the display module 1503 is used to support the terminal 1500 to perform S1001 in the above-described method embodiments, and/or other processes for the techniques described herein. A storage module 1502 is used for storing program codes and data of the terminal. For example, the storage module 1502 is used for storing the voice data recorded by the processor executing S404 and S404' and the text information of the recorded voice data. The communication module 1504 is used to support communication of the terminal 200 with other network entities, such as the terminal 300. For example, the communication module 1504 is used to enable the terminal 1500 to perform S401 in the above-described method embodiments, and/or other processes for the techniques described herein. The audio module 1505 is used for collecting voice data sent by the user of the terminal 200 and playing the voice data. For example, audio module 1505, such as a microphone in audio module 1505, is used to support terminal 1500 in performing "capturing voice data". 
The audio module 1505, such as the speaker and the receiver in the audio module 1505, is used to support the terminal 1500 in performing the operation of "playing voice data". For a detailed description of each unit included in the terminal 1500, reference may be made to the descriptions in the above method embodiments, and details are not repeated here.
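Purely as an illustration of how the module division of the terminal 1500 described above could cooperate (the class and method names below are hypothetical and are not part of the claimed embodiment), a minimal sketch might look like this:

```python
# Hypothetical sketch of the module division of terminal 1500.
# All class/method names are illustrative assumptions, not from the embodiment.

class StorageModule:
    """Storage module 1502: stores recorded voice data and its text information."""
    def __init__(self):
        self.records = []  # list of (voice_data, text) pairs

    def save(self, voice_data, text):
        self.records.append((voice_data, text))


class DisplayModule:
    """Display module 1503: displays interfaces generated by the processing module."""
    def show_first_interface(self, texts):
        # the first interface lists the text corresponding to the recorded voice data
        return {"interface": "first", "texts": texts}


class AudioModule:
    """Audio module 1505: captures and plays voice data."""
    def capture(self):
        return b"voice-bytes"  # stands in for microphone input

    def play(self, voice_data):
        return voice_data      # stands in for speaker/receiver output


class Terminal1500:
    """Processing module 1501: coordinates the other modules."""
    def __init__(self):
        self.storage = StorageModule()
        self.display = DisplayModule()
        self.audio = AudioModule()

    def record_and_show(self, recognized_text):
        voice = self.audio.capture()
        self.storage.save(voice, recognized_text)
        return self.display.show_first_interface(
            [t for _, t in self.storage.records])
```

In this sketch the processing module owns the control flow, while storage, display, and audio remain separately replaceable units, mirroring the functional division described above.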
The processing module 1501 may be a processor or a controller, such as a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with this disclosure. The processor may also be a combination that implements computing functions, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor. The communication module may be a transceiver, a transceiver circuit, a communication interface, or the like. The storage module 1502 may be a memory.
The processing module 1501 is a processor (e.g., the processor 210 shown in fig. 2), and the communication module 1504 includes a radio frequency module (e.g., the radio frequency module 250 shown in fig. 2). The communication module may further include a Wi-Fi module, a Bluetooth module, and the like; communication modules such as the radio frequency module 250, the Wi-Fi module, and the Bluetooth module may be collectively referred to as a communication interface. The storage module 1502 is a memory (e.g., the internal memory 221 shown in fig. 2). The display module 1503 is a touch screen (including the display screen 294 shown in fig. 2, into which a display panel and a touch panel are integrated). The audio module 1505 may include a microphone (e.g., the microphone 270C shown in fig. 2), a speaker (e.g., the speaker 270A shown in fig. 2), a receiver (e.g., the receiver 270B shown in fig. 2), and a headphone interface (e.g., the headphone interface 270D shown in fig. 2). The terminal provided by this embodiment of the present application may be the terminal 200 shown in fig. 2, in which the processor, the communication interface, the touch screen, the memory, the microphone, the receiver, and the speaker may be coupled together by a bus.
An embodiment of the present application further provides a computer storage medium storing computer program code. When the processor executes the computer program code, the terminal performs the relevant method steps in any of fig. 4, fig. 10A, or fig. 10B to implement the methods in the foregoing embodiments.
An embodiment of the present application further provides a computer program product which, when run on a computer, causes the computer to perform the relevant method steps in any of fig. 4, fig. 10A, or fig. 10B to implement the methods in the foregoing embodiments.
The terminal 1400, the terminal 1500, the computer storage medium, and the computer program product provided in the embodiments of the present application are each configured to perform the corresponding methods provided above. Therefore, for the beneficial effects they can achieve, reference may be made to the beneficial effects of the corresponding methods provided above, which are not repeated here.
From the above description of the embodiments, a person skilled in the art can clearly understand that, for convenience and brevity of description, the division into the above functional modules is merely used as an example for illustration. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative. The division into modules or units is merely a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another apparatus, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented as indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be in electrical, mechanical, or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may be one physical unit or a plurality of physical units; that is, they may be located in one place or distributed in a plurality of different places. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or may be implemented in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a readable storage medium. Based on such an understanding, the technical solutions of the embodiments of the present application essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for enabling a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc.
The above description is only a specific implementation of the present application, but the protection scope of the present application is not limited thereto. Any variation or replacement within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
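As a hedged illustration only (not the claimed implementation), the recording flow described in the foregoing embodiments — a preset start word begins recording of voice data, a preset end word stops it, and the recognized text is then shown on a first interface — can be sketched as follows. The wake words, event representation, and function name are all assumptions introduced for this sketch:

```python
# Illustrative sketch of wake-word-controlled call recording.
# Wake words and data structures are hypothetical, not from the claims.

START_WORD = "start recording"
END_WORD = "stop recording"

def process_call(events):
    """events: ordered list of ('mic', recognized_text) for text recognized
    from the local microphone, or ('peer', voice_segment) for voice data
    converted from the audio signal received from the second terminal."""
    recording = False
    recorded = []
    for source, payload in events:
        if source == "mic":
            if not recording and payload == START_WORD:
                recording = True   # a first prompt (tone/vibration) may be issued here
            elif recording and payload == END_WORD:
                recording = False  # a second prompt may be issued here
        elif source == "peer" and recording:
            recorded.append(payload)  # record the peer's voice data segment
    # after recording ends, the first interface lists text per recorded segment
    interface_texts = [f"text of segment {i + 1}" for i in range(len(recorded))]
    return recorded, interface_texts
```

In this sketch only the segments arriving between the start word and the end word are recorded; everything before or after passes through unrecorded, matching the start/stop behavior described above.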

Claims (23)

1. A method for recording and displaying information in a communication process, characterized in that the method is applied to a process of voice communication between a first terminal and a second terminal, and the method comprises:
the first terminal identifies voice data captured by a microphone of the first terminal;
if the text corresponding to first voice data captured by the microphone matches a preset start word, the first terminal starts to record voice data of the second terminal, wherein the voice data of the second terminal is converted from an audio electrical signal received from the second terminal;
in the process of recording the voice data, if the text corresponding to second voice data captured by the microphone matches a preset end word, the first terminal stops recording the voice data;
after the voice data recording is finished, the first terminal displays a first interface; the first interface comprises a text corresponding to the recorded voice data; the recorded voice data comprises voice data of the second terminal recorded by the first terminal or voice data captured by the microphone;
and the first terminal receives a first operation of a user on a text corresponding to the recorded voice data, and plays the recorded voice data in response to the first operation.
2. The method according to claim 1, wherein the displaying, by the first terminal, of a first interface after the recording of the voice data is finished comprises:
after the voice data recording is finished, the first terminal automatically displays the first interface;
alternatively,
in response to the end of the voice communication, the first terminal displays the first interface;
alternatively,
after the voice data recording is finished, in response to a second operation input by a user, the first terminal displays the first interface, wherein the second operation is used for instructing the first terminal to display a call record interface of the first terminal, the first interface is the call record interface, the call record interface comprises a call record item of the voice communication, the call record item is used for recording call record information of the voice communication, and the call record item comprises the text corresponding to the recorded voice data;
alternatively,
after the voice data recording is finished, in response to a third operation performed by a user on a call record item of the voice communication in a call record interface, the first terminal displays the first interface, wherein the third operation is used for instructing the first terminal to display a record detail interface of the voice communication, the first interface is the record detail interface, the record detail interface is used for displaying call record information of the voice communication, and the record detail interface comprises the text corresponding to the recorded voice data.
3. The method of claim 1, further comprising:
if the text corresponding to the first voice data captured by the microphone matches the preset start word, the first terminal starts to record the voice data captured by the microphone.
4. The method according to claim 1, wherein the text corresponding to the recorded voice data comprises at least two pieces of text information, the recorded voice data comprises at least two voice data segments, and the at least two pieces of text information correspond to the at least two voice data segments one to one;
wherein the receiving, by the first terminal, of a first operation performed by a user on the text corresponding to the recorded voice data, and the playing of the voice data in response to the first operation comprise:
the first terminal receives the first operation performed by a user on first text information; and in response to the first operation, the first terminal plays a first voice data segment;
wherein the first text information is one of the at least two pieces of text information, and the first text information corresponds to the first voice data segment.
5. The method of claim 4, further comprising:
the first terminal receives a fourth operation performed by a user on second text information, wherein the fourth operation is used for modifying the second text information into third text information, the second text information is one of the at least two pieces of text information, and the second text information corresponds to a second voice data segment;
in response to the fourth operation, the first terminal modifies the second text information into the third text information;
the first terminal displays the third text information on the first interface;
and the first terminal receives the first operation performed by the user on the third text information and plays the second voice data segment.
6. The method of claim 4, wherein the displaying, by the first terminal, of the first interface comprising the text corresponding to the recorded voice data comprises:
the first terminal displays the at least two pieces of text information on the first interface according to the chronological order in which the voice data segments corresponding to the pieces of text information were recorded and according to the source information of the voice data segment corresponding to each piece of text information;
wherein the source information is used for indicating that the voice data segment is voice data captured by the microphone or voice data of the second terminal.
7. The method according to any one of claims 1-6, further comprising:
and if the text corresponding to the first voice data matches the preset start word, the first terminal issues first prompt information, wherein the first prompt information is used for prompting a user that the first terminal starts to record voice data, and the first prompt information is a prompt tone or a vibration prompt.
8. The method of claim 7, further comprising:
in the process of recording the voice data, if the text corresponding to the second voice data captured by the microphone matches the preset end word, the first terminal issues second prompt information, wherein the second prompt information is used for prompting the user that the first terminal stops recording the voice data, and the second prompt information is a prompt tone or a vibration prompt.
9. The method of claim 8, wherein the first prompt information and the second prompt information are prompt tones, the method further comprising:
when the text corresponding to the first voice data matches the preset start word, the first terminal determines that the first terminal plays voice data by using a loudspeaker;
the first terminal performs echo suppression on the voice data captured by the microphone according to the voice data played by the loudspeaker;
and after the first terminal plays the second prompt information, the first terminal stops performing echo suppression on the voice data captured by the microphone.
10. The method according to any one of claims 4-6, wherein the first terminal stores the at least two voice data segments according to the chronological order in which the voice data segments were recorded and the source information of each voice data segment;
wherein the source information is used for indicating that the voice data segment is captured by the microphone, or the source information is used for indicating that the voice data segment is voice data of the second terminal.
11. The method of any of claims 4-6, wherein the first interface further comprises at least two playing plug-ins;
the at least two playing plug-ins are used for playing the at least two voice data segments, and the at least two playing plug-ins correspond to the at least two pieces of text information one to one.
12. A terminal, characterized in that the terminal is a first terminal, the terminal comprising: one or more processors, a memory, a touch screen, a microphone, a communication interface, a receiver, and a speaker; the memory, the communication interface, and the processor are coupled; the touch screen is used for displaying images generated by the processor; the microphone is used for capturing voice data; the memory is used for storing computer program code comprising computer instructions; and when the processor executes the computer instructions,
the processor is used for performing voice communication with a second terminal through the communication interface; recognizing voice data captured by the microphone; and if it is recognized that the text corresponding to first voice data captured by the microphone matches a preset start word, starting to record voice data of the second terminal, wherein the voice data of the second terminal is converted from an audio electrical signal received from the second terminal; the processor is further used for storing the recorded voice data in the memory;
the processor is further used for controlling the touch screen to display a first interface after the voice data recording is finished; the first interface comprises a text corresponding to the recorded voice data; the recorded voice data comprises voice data of the second terminal recorded by the first terminal or voice data captured by the microphone;
the processor is further used for receiving a first operation performed by a user on the text displayed by the touch screen, and controlling the receiver or the speaker to play the voice data in response to the first operation;
and the processor is further used for stopping recording the voice data if, in the process of recording the voice data, it is recognized that the text corresponding to second voice data captured by the microphone matches a preset end word.
13. The terminal of claim 12, wherein the processor being configured to control the touch screen to display a first interface after the recording of the voice data is finished comprises:
the processor is used for automatically controlling the touch screen to display the first interface after the voice data recording is finished;
alternatively,
the processor is used for controlling the touch screen to display the first interface in response to the end of the voice communication;
alternatively,
the processor is used for controlling the touch screen to display the first interface in response to a second operation input by a user after the recording of the voice data is finished, wherein the second operation is used for instructing the terminal to display a call record interface of the terminal, the first interface is the call record interface, the call record interface comprises a call record item of the voice communication, the call record item is used for recording call record information of the voice communication, and the call record item comprises the text corresponding to the recorded voice data;
alternatively,
the processor is used for controlling the touch screen to display the first interface in response to a third operation performed by a user on a call record item of the voice communication in a call record interface after the voice data recording is finished, wherein the third operation is used for instructing the terminal to display a record detail interface of the voice communication, the first interface is the record detail interface, the record detail interface is used for displaying call record information of the voice communication, and the record detail interface comprises the text corresponding to the recorded voice data.
14. The terminal of claim 12, wherein the processor is further configured to start recording the voice data captured by the microphone if it is recognized that the text corresponding to the first voice data captured by the microphone matches the preset start word.
15. The terminal according to claim 12, wherein the text corresponding to the recorded voice data comprises at least two pieces of text information, the recorded voice data comprises at least two voice data segments, and the at least two pieces of text information correspond to the at least two voice data segments one to one;
wherein the processor being configured to receive a first operation performed by a user on the text displayed on the touch screen, and to control the receiver or the speaker to play the voice data in response to the first operation, comprises:
the processor is used for receiving the first operation performed by a user on first text information displayed by the touch screen; and in response to the first operation, controlling the receiver or the speaker to play a first voice data segment;
wherein the first text information is one of the at least two pieces of text information, and the first text information corresponds to the first voice data segment.
16. The terminal of claim 15, wherein the processor is further configured to: receive a fourth operation performed by a user on second text information displayed on the touch screen, wherein the fourth operation is used for modifying the second text information into third text information, the second text information is one of the at least two pieces of text information, and the second text information corresponds to a second voice data segment; modify the second text information into the third text information in response to the fourth operation; and store, in the memory, the third text information and the correspondence between the third text information and the second voice data segment;
the processor is further configured to control the touch screen to display the third text information on the first interface, receive the first operation performed by the user on the third text information displayed by the touch screen, and control the receiver or the speaker to play the second voice data segment.
17. The terminal of claim 15, wherein the processor being configured to control the touch screen to display the first interface comprises:
the processor is used for controlling the touch screen to display the at least two pieces of text information on the first interface according to the chronological order in which the voice data segments corresponding to the pieces of text information were recorded and according to the source information of the voice data segment corresponding to each piece of text information;
wherein the source information is used for indicating that the voice data segment is voice data captured by the microphone or voice data of the second terminal.
18. The terminal according to any one of claims 12 to 17, wherein the processor is further configured to issue first prompt information if it is recognized that the text corresponding to the first voice data captured by the microphone matches the preset start word, wherein the first prompt information is used for prompting the user that the terminal starts to record voice data, and the first prompt information is a prompt tone or a vibration prompt.
19. The terminal of claim 18, wherein the processor is further configured to issue second prompt information if it is recognized that the text corresponding to second voice data captured by the microphone matches the preset end word in the process of recording the voice data, wherein the second prompt information is used for prompting the user that the first terminal stops recording the voice data, and the second prompt information is a prompt tone or a vibration prompt.
20. The terminal of claim 19, wherein the first prompt information and the second prompt information are prompt tones, and the processor is further configured to: determine that voice data is played by the speaker when the text corresponding to the first voice data matches the preset start word; perform echo suppression on the voice data captured by the microphone according to the voice data played by the speaker; and stop performing echo suppression on the voice data captured by the microphone after the receiver or the speaker plays the second prompt information.
21. The terminal according to any one of claims 15-17, wherein the memory stores the at least two voice data segments according to the chronological order in which the voice data segments were recorded and the source information of each voice data segment;
wherein the source information is used for indicating that the voice data segment is captured by the microphone, or the source information is used for indicating that the voice data segment is voice data of the second terminal.
22. The terminal according to any one of claims 15 to 17, wherein the first interface displayed by the touch screen further comprises at least two playing plug-ins, the at least two playing plug-ins correspond to the at least two pieces of text information one to one, and the at least two playing plug-ins correspond to the at least two voice data segments one to one;
the processor is further configured to receive a click operation performed by a user on a first playing plug-in of the at least two playing plug-ins, and control the receiver or the speaker to play the voice data segment corresponding to the first playing plug-in.
23. A computer storage medium, characterized in that the computer storage medium comprises computer instructions which, when run on a terminal, cause the terminal to perform the method for recording and displaying information in a communication process according to any one of claims 1-11.
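As a hedged illustration of the echo-suppression behavior recited in claims 9 and 20 (the names, the simple subtraction model, and the controller structure below are assumptions for this sketch only, not the claimed implementation): while the speaker plays voice data, the played signal serves as a reference for suppressing echo in the microphone input, and suppression stops once the second prompt has been played.

```python
# Hypothetical sketch of speaker-reference echo suppression (cf. claims 9 and 20).
# The sample-wise subtraction is a toy model, not a real AEC algorithm.

def suppress_echo(mic_samples, speaker_samples, active):
    """Subtract the speaker reference from the microphone signal while
    suppression is active; pass the signal through unchanged otherwise."""
    if not active:
        return list(mic_samples)
    return [m - s for m, s in zip(mic_samples, speaker_samples)]


class EchoController:
    """Tracks when echo suppression should run, per the claimed sequence."""
    def __init__(self):
        self.active = False

    def on_start_word_matched(self, speaker_in_use):
        # suppression begins when the start word matches and the terminal
        # determines that voice data is played through the speaker
        self.active = speaker_in_use

    def on_second_prompt_played(self):
        # suppression stops after the second prompt information is played
        self.active = False
```

In a real terminal the subtraction would be replaced by an adaptive echo canceller; the sketch only shows when the suppression is switched on and off relative to the start word and the second prompt.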
CN201880090760.2A 2018-09-13 2018-09-13 Information recording and displaying method and terminal in communication process Active CN111819830B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/105571 WO2020051852A1 (en) 2018-09-13 2018-09-13 Method for recording and displaying information in communication process, and terminals

Publications (2)

Publication Number Publication Date
CN111819830A CN111819830A (en) 2020-10-23
CN111819830B true CN111819830B (en) 2022-05-17

Family

ID=69777304

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201880090760.2A Active CN111819830B (en) 2018-09-13 2018-09-13 Information recording and displaying method and terminal in communication process

Country Status (2)

Country Link
CN (1) CN111819830B (en)
WO (1) WO2020051852A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114079809A (en) * 2020-08-20 2022-02-22 华为技术有限公司 Terminal and input method and device thereof
CN116087930B (en) * 2022-08-18 2023-10-20 荣耀终端有限公司 Audio ranging method, device, storage medium, and program product

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101211615A (en) * 2006-12-31 2008-07-02 于柏泉 Method, system and apparatus for automatic recording for specific human voice
CN103577965A (en) * 2012-07-30 2014-02-12 腾讯科技(深圳)有限公司 Affair prompting method and device
CN103888581A (en) * 2014-03-28 2014-06-25 深圳市中兴移动通信有限公司 Communication terminal and method for recording communication information thereof
CN104754100A (en) * 2013-12-25 2015-07-01 深圳桑菲消费通信有限公司 Call recording method and device and mobile terminal
CN105657149A (en) * 2015-08-25 2016-06-08 南京酷派软件技术有限公司 Voice communication method and system and communication terminal
CN107591150A (en) * 2017-08-16 2018-01-16 珠海市魅族科技有限公司 Audio recognition method and device, computer installation and computer-readable recording medium

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101277338A (en) * 2007-03-29 2008-10-01 西门子(中国)有限公司 Method for recording downstream voice signal of communication terminal as well as the communication terminal
CN103581400A (en) * 2013-11-26 2014-02-12 广东欧珀移动通信有限公司 Method for storing phone numbers in communication process
CN106385485A (en) * 2016-08-25 2017-02-08 广东欧珀移动通信有限公司 Call recording method, call recording device and mobile terminal
CN106131288A (en) * 2016-08-25 2016-11-16 深圳市金立通信设备有限公司 The recording method of a kind of call-information and terminal
CN106357932A (en) * 2016-11-22 2017-01-25 奇酷互联网络科技(深圳)有限公司 Call information recording method and mobile terminal
CN107040640A (en) * 2017-05-26 2017-08-11 黄晓咏 A kind of communication module for being used to record message
CN107295174B (en) * 2017-06-22 2020-11-10 深圳传音通讯有限公司 Method, device and terminal for recording call content
CN107464557B (en) * 2017-09-11 2021-05-07 Oppo广东移动通信有限公司 Call recording method and device, mobile terminal and storage medium
CN108172247A (en) * 2017-12-22 2018-06-15 北京壹人壹本信息科技有限公司 Record playing method, mobile terminal and the device with store function


Also Published As

Publication number Publication date
WO2020051852A1 (en) 2020-03-19
CN111819830A (en) 2020-10-23

Similar Documents

Publication Publication Date Title
CN109584879B (en) Voice control method and electronic equipment
CN110347269B (en) Empty mouse mode realization method and related equipment
CN111095723B (en) Wireless charging method and electronic equipment
CN111819533B (en) Method for triggering electronic equipment to execute function and electronic equipment
CN110401767B (en) Information processing method and apparatus
CN115344173A (en) Operation method for split screen display and electronic equipment
CN111742539B (en) Voice control command generation method and terminal
CN112671976A (en) Control method of electronic equipment and electronic equipment
CN111742361A (en) Method for updating voice wake-up of voice assistant by terminal and terminal
CN111182140A (en) Motor control method and device, computer readable medium and terminal equipment
CN112445762A (en) File sharing method and equipment for mobile terminal
CN114422340A (en) Log reporting method, electronic device and storage medium
CN111819830B (en) Information recording and displaying method and terminal in communication process
CN111492678B (en) File transmission method and electronic equipment
CN112150778A (en) Environmental sound processing method and related device
CN111142767B (en) User-defined key method and device of folding device and storage medium
CN109285563B (en) Voice data processing method and device in online translation process
CN114120987B (en) Voice wake-up method, electronic equipment and chip system
CN115393676A (en) Gesture control optimization method and device, terminal and storage medium
CN114822525A (en) Voice control method and electronic equipment
CN113867520A (en) Device control method, electronic device, and computer-readable storage medium
CN113905334B (en) Information sharing method and device
CN114205318A (en) Head portrait display method and electronic equipment
CN114363820A (en) Electronic equipment searching method and electronic equipment
CN115291768A (en) Method and device for starting function in application

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant