CN115457957A - Voice information display method and device

Publication number: CN115457957A
Application number: CN202211026843.XA
Authority: CN (China)
Original language: Chinese (zh)
Inventor: 刘红邦
Applicant and assignee: Vivo Mobile Communication Co., Ltd.
Legal status: Pending

Classifications

    • G10L 15/22 - Speech recognition; procedures used during a speech recognition process, e.g. man-machine dialogue
    • G06F 40/30 - Handling natural language data; semantic analysis
    • G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00-G10L 21/00, characterised by the analysis technique using neural networks

Abstract

The application discloses a voice information display method and device, belonging to the field of communication technology. The method comprises: receiving first voice information and second voice information sent by a second electronic device, where the second voice information is associated with the first voice information; and, when a first semantic result corresponding to the first voice information differs from a second semantic result corresponding to the second voice information, displaying the second difference text, that is, the semantically different part of the second semantic result.

Description

Voice information display method and device
Technical Field
The application belongs to the technical field of communication, and particularly relates to a voice information display method and device.
Background
With the rapid development of mobile communication technology, mobile terminals such as smartphones have become indispensable tools in many aspects of daily life. Their functions continue to improve, providing a variety of intelligent services and bringing great convenience to users.
Taking the voice chat function as an example, many applications on current mobile terminals provide it. Communicating with the other party in real time by voice is, on one hand, more effective than text chat and, on the other hand, allows normal communication through voice input when typing is inconvenient. In the prior art, however, a user cannot learn the content of a newly received voice message while recording a voice message, and so must listen to the new message and then record again; the operation is cumbersome and communication efficiency is low.
Disclosure of Invention
Embodiments of the present application aim to provide a voice information display method and device that let a user read a voice message quickly, making replies easier and improving communication efficiency.
In a first aspect, an embodiment of the present application provides a voice information display method applied to a first electronic device. The method includes:
receiving first voice information and second voice information sent by a second electronic device, where the second voice information is associated with the first voice information;
and, when a first semantic result corresponding to the first voice information differs from a second semantic result corresponding to the second voice information, displaying the second difference text, the semantically different part of the second semantic result.
In a second aspect, an embodiment of the present application provides a voice information display apparatus, including:
a receiving module, configured to receive first voice information and second voice information sent by a second electronic device, the second voice information being associated with the first voice information;
and a display module, configured to display the second difference text, the semantically different part of the second semantic result, when the first semantic result corresponding to the first voice information differs from the second semantic result corresponding to the second voice information.
In a third aspect, an embodiment of the present application further provides a voice information display method, which is applied to a second electronic device, and the method includes:
receiving a first voice input, and generating first voice information corresponding to the first voice input;
sending the first voice information to a first electronic device;
receiving a second voice input for the first voice information, and generating second voice information corresponding to the second voice input;
and associating the second voice information with the first voice information and sending it to the first electronic device, so that the first electronic device displays, according to the first and second voice information, the second difference text with semantic difference in the second semantic result corresponding to the second voice information.
In a fourth aspect, an embodiment of the present application further provides a voice information display apparatus, where the apparatus includes:
a second receiving module, configured to receive a first voice input and generate first voice information corresponding to the first voice input;
a second sending module, configured to send the first voice information to a first electronic device;
the second receiving module being further configured to receive a second voice input for the first voice information and generate second voice information corresponding to the second voice input;
and the second sending module being further configured to associate the second voice information with the first voice information and send it to the first electronic device, so that the first electronic device displays, according to the first and second voice information, the second difference text with semantic difference in the second semantic result corresponding to the second voice information.
In a fifth aspect, embodiments of the present application provide an electronic device, which includes a processor and a memory, where the memory stores a program or instructions executable on the processor, and the program or instructions, when executed by the processor, implement the steps of the method as provided in the first aspect or the third aspect.
In a sixth aspect, embodiments of the present application provide a readable storage medium on which a program or instructions are stored, which when executed by a processor, implement the steps of the method as provided in the first or third aspect.
In a seventh aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to execute a program or instructions to implement the method as provided in the first aspect or the third aspect.
In an eighth aspect, embodiments of the present application provide a computer program product, which is stored in a storage medium and executed by at least one processor to implement the method as provided in the first or third aspect.
In the embodiments of the present application, semantic analysis is performed on the associated first and second voice information, so the first and second semantic results can be compared and the second difference text, the semantically different part of the second semantic result, generated and displayed. The user can thus read the second difference text quickly without listening to the voice information, which improves communication efficiency. In particular, while recording a new voice message or editing a new text message, the user can reply effectively based on the second difference text, which avoids or reduces, to some extent, the situation where the user replies based on the first voice information and then has to revise the reply after listening to the second voice information.
Drawings
Fig. 1 is a schematic flowchart of a voice information display method according to an embodiment of the present application;
Fig. 2 is a schematic diagram of a display interface of a voice information display method according to an embodiment of the present application;
Fig. 3 is a partial flowchart of a voice information display method according to an embodiment of the present application;
Fig. 4 is a schematic diagram of a display interface of a voice information display method according to an embodiment of the present application;
Fig. 5 is a partial flowchart of a voice information display method according to an embodiment of the present application;
Fig. 6 is a schematic diagram of a display interface of a voice information display method according to an embodiment of the present application;
Fig. 7 is a partial flowchart of a voice information display method according to an embodiment of the present application;
Fig. 8 is a partial flowchart of a voice information display method according to an embodiment of the present application;
Fig. 9 is a schematic flowchart of a voice information display method according to an embodiment of the present application;
Fig. 10 is a partial flowchart of a voice information display method according to an embodiment of the present application;
Fig. 11 is a schematic structural diagram of a voice information display device according to an embodiment of the present application;
Fig. 12 is a schematic structural diagram of a voice information display device according to another embodiment of the present application;
Fig. 13 is a schematic structural diagram of an electronic device according to still another embodiment of the present application;
Fig. 14 is a hardware configuration diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of the present disclosure.
The terms "first", "second", and the like in the description and claims of the present application are used to distinguish similar elements and do not necessarily describe a particular sequence or chronological order. It should be appreciated that data so termed may be interchanged where appropriate, so that embodiments of the application can be practiced in sequences other than those illustrated or described herein; moreover, "first", "second", and the like are generic labels and do not limit the number of elements; for example, a first element may be one element or more than one. In addition, "and/or" in the specification and claims denotes at least one of the connected objects, and the character "/" generally indicates that the objects before and after it are in an "or" relationship.
The following describes the voice information display method provided in the embodiment of the present application in detail through a specific embodiment and an application scenario thereof with reference to the accompanying drawings.
As shown in fig. 1, an embodiment of the present application provides a voice information display method applied to a first electronic device. The method may include:
S101, receiving first voice information and second voice information sent by a second electronic device, where the second voice information is associated with the first voice information;
in the present application, the electronic device includes, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted terminal, a wearable device, and the like, as long as the electronic device has an instant messaging function. The first electronic device and the second electronic device provided by the application can be electronic devices of the same model or different models held by different users.
The first voice information and the second voice information are both voice information sent by the second electronic device, and the user can associate the first voice information with the second voice information when sending the second voice information. That is, the association relationship between the first voice information and the second voice information is determined by the second electronic device. Recording of the second voice information may be performed by pressing a displayed area of the first voice information, thereby associating the first voice information with the second voice information, for example. Illustratively, the first voice information and the second voice information may be associated by dragging the display area of the second voice information, which has been transmitted to the first electronic device, to the display area of the first voice information. The user can control the second electronic device to realize the association with the first voice information in or after the sending process of the second voice information, and the ways of controlling the second electronic device to realize the association of the first voice information and the second voice information by the user are various and are not described in detail herein.
Optionally, the association relationship between the first voice information and the second voice information may also be determined according to the voice contents of the first voice information and the second voice information, for example, the voice contents are associated with the same topic, the voice contents both include the same keyword, and the like.
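As a minimal sketch of this keyword-based association, and under the assumption of a naive whitespace tokenizer and stopword list (neither is specified by the patent), the check could look like this:

```python
# Minimal sketch of keyword-based association between two voice messages.
# The tokenizer and stopword list are illustrative assumptions; the patent
# only states that shared keywords or a shared topic may associate the
# first and second voice information.

STOPWORDS = {"the", "a", "is", "on", "sorry"}

def keywords(transcript: str) -> set[str]:
    """Naive keyword extraction: every token that is not a stopword."""
    return {tok for tok in transcript.lower().split() if tok not in STOPWORDS}

def are_associated(transcript_a: str, transcript_b: str) -> bool:
    """Treat two messages as associated if they share at least one keyword."""
    return bool(keywords(transcript_a) & keywords(transcript_b))

# Both transcripts contain the keyword "meeting", so they are associated:
print(are_associated("the meeting is on friday", "sorry the meeting is on thursday"))  # True
```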
S102, when the first semantic result corresponding to the first voice information differs from the second semantic result corresponding to the second voice information, displaying the second difference text with semantic difference in the second semantic result.
A semantic result is text, obtained through semantic analysis, that carries the same meaning as the voice information. Semantic analysis can be performed on the first and second voice information through a semantic analysis model to obtain the corresponding first and second semantic results. The semantic analysis model can be obtained by training a neural network or a semantic analysis network. It should be noted that the embodiments of the present application do not limit the implementation of semantic analysis, which is not detailed here.
The obtained semantic results are then compared to identify the texts with semantic differences. The comparison of the first and second semantic results that yields the second difference text can likewise be performed by the semantic analysis model. It should be noted that the embodiments of the present application do not limit the method of obtaining the first and second difference texts; one possible comparison is sketched below.
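To make the comparison concrete, here is a minimal sketch using a word-level diff from Python's standard difflib module; the patent does not prescribe any particular comparison algorithm, so this choice is an illustrative assumption:

```python
# Minimal sketch of extracting the first and second difference texts by
# comparing two semantic results with a word-level diff. The patent leaves
# the comparison method open (it may also be done by the semantic analysis
# model); difflib is used here purely for illustration.
import difflib

def difference_texts(first_result: str, second_result: str) -> tuple[list[str], list[str]]:
    """Return the word spans of each semantic result that do not match the other."""
    a, b = first_result.split(), second_result.split()
    first_diff, second_diff = [], []
    for op, i1, i2, j1, j2 in difflib.SequenceMatcher(a=a, b=b).get_opcodes():
        if op != "equal":
            if i2 > i1:
                first_diff.append(" ".join(a[i1:i2]))
            if j2 > j1:
                second_diff.append(" ".join(b[j1:j2]))
    return first_diff, second_diff

# Example matching the interface of fig. 2:
print(difference_texts("see you on friday", "see you on thursday"))
# (['friday'], ['thursday'])
```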
In some embodiments, to alert the user to the difference between the first and second voice information, the second difference text is displayed together with the first difference text, the semantically different part of the first semantic result. For ease of browsing, a display area may be preset in the display interface for the two difference texts, or a box may pop up in the interface to show them. To help the user tell the first difference text of the first voice information from the second difference text of the second voice information, in some embodiments the first difference text is displayed in the first display area corresponding to the first voice information, and the second difference text in the second display area corresponding to the second voice information.
Referring to fig. 2, a schematic diagram of a display interface of a voice information display method according to an embodiment of the present application: the first voice information is displayed in the first display area 201, whose size is positively correlated with the length of the first voice information, and the user taps the first display area 201 to play the first voice information; the first difference text "friday" is also displayed in the first display area 201. Likewise, the second voice information is displayed in the second display area 202, whose size is positively correlated with the length of the second voice information, and the user taps the second display area 202 to play the second voice information; the second difference text "thursday" is also displayed there. The user can thus quickly tell that "friday" belongs to the first voice information and "thursday" to the second. If, after listening to the first voice message and while recording a new voice message or editing a new text message based on the date "friday" in it, the user receives the second voice message sent by the second electronic device, the user can promptly correct the reply according to the displayed second difference text "thursday" without listening to the second voice message.
In the above embodiment, semantic analysis is performed on the associated first and second voice information, so the first and second semantic results can be compared and the second difference text generated and displayed. The user can thus read the second difference text quickly without listening to the voice information, improving communication efficiency. In particular, while recording a new voice message or editing a new text message, the user can correct the reply effectively based on the second difference text, which avoids or reduces, to some extent, the situation where the user replies based on the first voice information and only then, after listening to the second voice information, has to revise the reply.
Referring to fig. 3, in some embodiments, after S102, the method further includes:
S201, when third voice information replying to the first voice information is recorded, determining, according to the semantic classification to which the second difference text belongs, the semantic sub-result of the same class in the third semantic result corresponding to the third voice information;
S202, replacing the voice data corresponding to the semantic sub-result in the third voice information with the voice data corresponding to the second difference text in the second voice information, to generate fourth voice information.
The third voice information is the reply recorded by the user through the first electronic device in response to the first voice information. The audio data generated by a voice recording operation on the first voice information may serve as the third voice information: for example, recording starts in response to the user pressing the first display area corresponding to the first voice information, and the recorded data is the third voice information replying to the first voice information. Alternatively, the recording received immediately after the first voice information finishes playing may be confirmed as the third voice information. Or, after the third voice information is recorded and an operation input associating it with the first voice information is received, it is confirmed as the voice information replying to the first voice information.
The semantic classification is the class of the second difference text and can be defined by those skilled in the art as needed, for example a contact-name class, a form-of-address class, an amount class, or a time class. The contact-name class may include contact names such as Zhang or Li set in the address book of the first electronic device; the form-of-address class may include dad, mom, and the like; the amount class may include values carrying monetary units or magnitudes such as yuan, kuai, or thousand.
The semantic sub-result is the part of the third semantic result that belongs to the same class as the semantic classification; it may be the whole third semantic result or only part of it. Understandably, when the second difference text belongs to the form-of-address class, the semantic sub-result should also belong to the form-of-address class; when the first and second difference texts belong to the amount class, the semantic sub-result should also belong to the amount class; and so on. A rule-based sketch of such classification follows.
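As an illustration only, a rule-based classifier over the classes named above might look as follows; the word lists and the amount pattern are assumptions, since the patent leaves the concrete classes to those skilled in the art:

```python
# Illustrative rule-based semantic classification over the classes named
# above. The word lists and the amount pattern are assumptions; the patent
# leaves the concrete classes to those skilled in the art.
import re

CONTACT_NAMES = {"zhang", "li"}       # e.g., names from the address book
FORMS_OF_ADDRESS = {"dad", "mom"}
TIME_WORDS = {"monday", "tuesday", "wednesday", "thursday", "friday",
              "saturday", "sunday", "today", "tomorrow"}
AMOUNT_PATTERN = re.compile(r"^\d+(\.\d+)?\s*(yuan|kuai|thousand)$")

def semantic_class(text: str) -> str:
    t = text.lower().strip()
    if t in CONTACT_NAMES:
        return "contact-name"
    if t in FORMS_OF_ADDRESS:
        return "form-of-address"
    if t in TIME_WORDS:
        return "time"
    if AMOUNT_PATTERN.match(t):
        return "amount"
    return "other"

# A semantic sub-result matches a difference text when their classes agree:
print(semantic_class("thursday") == semantic_class("friday"))  # True: both "time"
```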
As those skilled in the art will understand, the third voice information is a segment of audio data A, and the semantic sub-result corresponds to a short sub-segment a of A; likewise, the second voice information is a segment of audio data B, and the second difference text corresponds to a short sub-segment b of B. When the user finishes recording the third voice information, the sub-segment a corresponding to the semantic sub-result in the third voice information is replaced with the sub-segment b corresponding to the second difference text in the second voice information, generating the fourth voice information; the splice is sketched below. The user can then select the appropriate message from the third and fourth voice information and send it to the second electronic device.
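A minimal sketch of this splice, assuming the third-party pydub library and illustrative file names and millisecond offsets:

```python
# Minimal sketch of generating the fourth voice information: sub-segment a
# (the semantic sub-result) inside audio A (third voice information) is
# replaced by sub-segment b (the second difference text) cut from audio B
# (second voice information). File names and millisecond offsets are
# illustrative assumptions.
from pydub import AudioSegment  # third-party: pip install pydub

def splice_replacement(audio_a, a_start_ms, a_end_ms, audio_b, b_start_ms, b_end_ms):
    """Return audio A with [a_start_ms, a_end_ms) replaced by B's [b_start_ms, b_end_ms)."""
    return audio_a[:a_start_ms] + audio_b[b_start_ms:b_end_ms] + audio_a[a_end_ms:]

third = AudioSegment.from_file("third_voice.wav")    # assumed reply containing "friday"
second = AudioSegment.from_file("second_voice.wav")  # assumed message containing "thursday"
fourth = splice_replacement(third, 1200, 1800, second, 900, 1600)
fourth.export("fourth_voice.wav", format="wav")      # the fourth voice information
```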
Because the fourth voice information is generated from the third voice information rather than recorded anew, the step of re-recording after listening to the second voice information merely to correct the audio a corresponding to the semantic sub-result can be omitted, and the user gains a more flexible way to reply with the correct information.
In some embodiments, after S201, the method further comprises:
and displaying the semantic sub-result in a third display area corresponding to the third voice information.
In this embodiment, please refer to fig. 4, a schematic diagram of a display interface of the voice information display method according to an embodiment of the present application. In the display interface, the first voice information is displayed in the first display area 401 together with the first difference text "friday"; the second voice information is displayed in the second display area 402 together with the second difference text "thursday"; and the third voice information is displayed in the third display area 403 together with the semantic sub-result "friday". The first difference text "friday", the second difference text "thursday", and the semantic sub-result "friday" all belong to the time class.
Displaying the semantic sub-result in the third display area makes it easy for the user to confirm the semantically different part of each voice message.
Referring to fig. 5, to further improve communication efficiency, in some embodiments S202 includes:
S501, receiving a first input on the second difference text;
S502, in response to the first input, replacing the voice data corresponding to the semantic sub-result in the third voice information with the voice data corresponding to the second difference text in the second voice information, to generate the fourth voice information.
The first input may be a touch input, such as a tap, a slide, or another touch operation; it may also be an input made through another input device connected to the electronic device, such as a mouse, a remote control, or a keyboard. Optionally, the first input may indicate the second area in which the second difference text is located. Unless otherwise specified, inputs to the electronic device in the following description may likewise be touch operations or operations through such connected input devices.
Illustratively, the first input may be dragging the second difference text to the area where the semantic sub-result is located. Referring to fig. 6, a schematic diagram of a display interface of the voice information display method according to an embodiment of the present application: the first voice information is displayed in the first display area 401 together with the first difference text "friday", the second voice information in the second display area 402 together with the second difference text "thursday", and the third voice information in the third display area 403 together with the semantic sub-result "friday". The first input is a sliding operation from the second difference text "thursday" to the area where the semantic sub-result "friday" is displayed.
Since several replies may follow after the first electronic device receives the first voice information, the same second voice information can correspond to multiple second difference texts; the first input on a particular second difference text determines which voice data the user wants replaced, so that the generated fourth voice information better meets the user's needs.
As those skilled in the art will understand, when multiple second difference texts exist in the second voice information, a first input on a third difference text (one of the multiple second difference texts) is received, and the audio data corresponding to the third difference text replaces the audio data of the semantic sub-result of the same class in the third voice information, so that only part of the semantic sub-results of the third voice information are replaced. For example: the second voice information is audio data B containing two second difference texts, corresponding to the audio "friday" and the audio "playing football" in B; the third voice information is audio data A containing two semantic sub-results of the same classes, corresponding to the audio "thursday" and the audio "playing basketball" in A. Upon a first input on the second difference text "friday", the "thursday" of the third voice information is replaced with "friday"; upon a first input on the second difference text "playing football", the "playing basketball" of the third voice information is replaced with "playing football". This class-based pairing is sketched below.
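A minimal sketch of the pairing, where the tiny two-class classifier is an illustrative stand-in for the semantic classification described above:

```python
# Minimal sketch of pairing a second difference text with the same-class
# semantic sub-result in the third voice information. The two-class
# classifier is an illustrative stand-in for the semantic classification
# described above.

TIME_WORDS = {"thursday", "friday"}

def semantic_class(text: str) -> str:
    return "time" if text.lower() in TIME_WORDS else "other"

def matching_sub_result(difference_text: str, sub_results: list[str]):
    """Return the first semantic sub-result in the same class, if any."""
    cls = semantic_class(difference_text)
    return next((s for s in sub_results if semantic_class(s) == cls), None)

sub_results = ["thursday", "playing basketball"]
print(matching_sub_result("friday", sub_results))           # thursday
print(matching_sub_result("playing football", sub_results)) # playing basketball
```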
Referring to fig. 7, in another embodiment, after S202, the method further includes:
S701, when the third voice information has not been sent to the second electronic device, or when it has been sent and is marked with the read identifier, sending the fourth voice information to the second electronic device.
That the third voice information has not been sent means it has been recorded but not yet transmitted to the second electronic device. In this case the fourth voice information is sent directly, sparing the user a selection step.
That the third voice information is marked with the read identifier means it has already been read by the user of the second electronic device. In this case, sending new voice information through the first electronic device to replace the already-read third voice information on the second electronic device would not reduce the chance of that user replying on the basis of wrong information, and could even cause the user to miss the semantically correct voice information. Therefore, in this embodiment, the voice data corresponding to the second difference text in the second voice information replaces the voice data corresponding to the semantic sub-result in the third voice information to generate the fourth voice information, and this new fourth voice information is sent directly to the second electronic device; its user can then listen to the semantically correct fourth voice information, and the user of the first electronic device is spared re-recording it in full.
Referring to fig. 8, in another embodiment, after S202, the method further includes:
S801, when the third voice information has been sent to the second electronic device and is marked with the unread identifier, associating the fourth voice information with the third voice information;
That the third voice information is marked with the unread identifier means the user of the second electronic device has not yet read it. As those skilled in the art will understand, after that user reads the third voice information, the second electronic device sends a status change instruction to the first electronic device, which then changes the unread identifier of the third voice information to the read identifier; a sketch of such an exchange follows.
S802, fourth voice information is sent to the second electronic device, so that the second electronic device replaces the third voice information with the fourth voice information.
When the user has recorded the third voice information in reply to the first voice information, it can be replaced by generating the new fourth voice information. As before, the third voice information is audio data A whose sub-segment a corresponds to the semantic sub-result, and the second voice information is audio data B whose sub-segment b corresponds to the second difference text. When the third voice information sent by the first electronic device has not yet been read by the user of the second electronic device, sub-segment a is replaced with sub-segment b to generate the new fourth voice information. This spares the user re-recording while still producing the corrected fourth voice information; sending it, associated with the third voice information, to the second electronic device lets that device replace the received third voice information with the fourth, so its user listens directly to the fourth voice information. To some extent, this avoids communication errors caused by the second device's user listening to the third voice information containing wrong content.
As those skilled in the art will understand, when the second electronic device has received the third voice information, still marked with the unread identifier, and then receives the fourth voice information associated with it, the second electronic device can delete the third voice information and display the received fourth voice information.
In some embodiments, to help the user quickly determine the correspondence between each difference text and its voice information, S102 includes:
displaying the second difference text in the second display area corresponding to the second voice information. Of course, the first difference text may likewise be displayed in the first display area corresponding to the first voice information.
In other embodiments, to help the user locate the audio corresponding to each difference text within the voice information, S102 includes:
determining the second position of the voice data corresponding to the second difference text within the second voice information;
and displaying the second difference text in the second area of the second display area according to the second position, where the second display area is the display area corresponding to the second voice information and its second area corresponds to the second position within the second voice information.
As those skilled in the art will understand, the voice length of the second voice information is proportional to the display length of the second display area: the longer the voice, the longer its display area. The second position is the specific position of the voice data of the second difference text within the overall voice length of the second voice information, and the second area corresponding to the second difference text can be calculated from the proportional relation between voice length and display length, so that the second difference text can be displayed in the second area; the arithmetic is sketched below.
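The proportional mapping amounts to simple arithmetic; a sketch, with the pixel width and timestamps as illustrative assumptions:

```python
# Minimal sketch of the proportional mapping described above: the pixel
# region of the second difference text inside the second display area is
# derived from where its voice data sits within the second voice
# information. Pixel width and timestamps are illustrative assumptions.

def text_region(display_width_px: int, voice_len_ms: int,
                seg_start_ms: int, seg_end_ms: int) -> tuple:
    """Map a voice-data interval to an (x_start, x_end) pixel interval."""
    scale = display_width_px / voice_len_ms
    return round(seg_start_ms * scale), round(seg_end_ms * scale)

# A 12 s message shown in a 240 px bubble; "thursday" spans 6.0 s to 7.5 s:
print(text_region(240, 12_000, 6_000, 7_500))  # (120, 150)
```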
When the first difference text is displayed, it is placed in the first area of the first display area in the same way, which is not repeated here.
From the position of the second area within the second display area, the user can roughly determine where the voice data of the second difference text lies within the second voice information. Optionally, the user may tap the second display area to play the second voice information and, while listening, quickly locate that voice data by dragging the progress bar.
Displaying the second difference text in the second area thus helps the user locate each difference text within its voice information and listen to the corresponding audio, to check whether the semantic analysis is accurate. This matters especially when there are multiple first difference texts and multiple second difference texts, and the user needs to listen to and check the first and second voice information repeatedly.
It should be noted that, in the voice information display method provided in the foregoing embodiments, the execution subject may be the first electronic device. The following embodiments provide another voice information display method whose execution subject may be the second electronic device. Referring to fig. 9, this voice information display method includes:
s901, receiving a first voice input, and generating first voice information corresponding to the first voice input;
s902, sending first voice information to first electronic equipment;
s903, receiving a second voice input of the first voice information, and generating second voice information corresponding to the second voice input;
and S904, associating the second voice information with the first voice information, and sending the second voice information to the first electronic device, so that the first electronic device displays a second difference text with semantic difference in a second semantic result corresponding to the second voice information according to the first voice information and the second voice information.
In S901 and S903, the voice inputs may be made through a built-in or external microphone; the second electronic device converts the received analog signal into a digital signal, thereby generating the first voice information corresponding to the first voice input and, likewise, the second voice information corresponding to the second voice input.
After the first and second voice information are sent to the first electronic device, that device can execute S101 to S102 and thereby display, according to the two messages, the second difference text with semantic difference in the second semantic result. For the specific implementation, refer to the embodiments of the voice information display method whose execution subject is the first electronic device; details are not repeated here.
In this embodiment, when the first voice information contains audio with wrong semantics, the user can send the second voice information, associated with the first, to the first electronic device, so that the first electronic device generates and displays the second difference text with semantic difference in the second semantic result; the user of the first electronic device can then read it quickly without listening to the voice information, improving communication efficiency. In particular, while recording a new voice message or editing a new text message, that user can reply effectively based on the first and second difference texts, which avoids or reduces, to some extent, having to supplement the reply after listening to the second voice information when the reply was made on the basis of the first voice information.
Referring to fig. 10, in some embodiments, after the second voice input for the first voice information is received and the corresponding second voice information is generated, the method includes:
s1001, receiving third voice information sent by first electronic equipment;
and S1002, under the condition that the third voice information is marked as the unread mark and fourth voice information associated with the third voice information is received, replacing the received third voice information with the fourth voice information.
Those skilled in the art will understand that the third voice information in S1001 may be that described in S801, and the fourth voice information in S1002 that described in S802. Likewise, after the user of the second electronic device reads the third voice information, both the first and the second electronic device may mark it with the read identifier.
When the second electronic device receives the fourth voice information, it replaces the received third voice information with the fourth, so the user of the second electronic device listens directly to the fourth voice information. To some extent, this avoids communication errors caused by that user listening to the third voice information containing wrong content.
It should be noted that the execution subject of the voice information display method provided in the embodiments of the present application may be a voice information display apparatus. The embodiments below describe the apparatus provided in the embodiments of the present application by taking, as an example, a voice information display apparatus executing the voice information display method.
Fig. 11 is a schematic structural diagram of a voice information display apparatus according to an embodiment of the present application. As shown in fig. 11, the voice information display apparatus may include:
a first receiving module 1101, configured to receive first voice information and second voice information sent by a second electronic device, where the second voice information is associated with the first voice information;
a first display module 1102, configured to display the second difference text with semantic difference in the second semantic result when the first semantic result corresponding to the first voice information differs from the second semantic result corresponding to the second voice information.
In an optional example, the voice information display apparatus may further include:
a first analysis module 1103, configured to, when third voice information replying to the first voice information is recorded, determine, according to the semantic classification to which the second difference text belongs, the semantic sub-result of the same class in the third semantic result corresponding to the third voice information;
and an audio replacement module 1104, configured to replace the voice data corresponding to the semantic sub-result in the third voice information with the voice data corresponding to the second difference text in the second voice information, to generate fourth voice information.
In another optional example, the first display module 1102 is further configured to display the semantic sub-result in the third display area corresponding to the third voice information.
In a further alternative example,
the first receiving module 1101 is further configured to receive the first input on the second difference text;
the audio replacement module 1104 is further configured to, in response to the first input, replace the voice data corresponding to the semantic sub-result in the third voice information with the voice data corresponding to the second difference text in the second voice information, to generate the fourth voice information.
In another optional example, the voice information display apparatus may further include:
a first sending module 1105, configured to send the fourth voice information to the second electronic device when the third voice information has not been sent to the second electronic device, or when it has been sent and is marked with the read identifier.
In another optional example, the audio replacement module 1104 is further configured to associate the fourth voice information with the third voice information when the third voice information has been sent to the second electronic device and is marked with the unread identifier;
the first sending module 1105 is further configured to send the fourth voice information to the second electronic device, so that the second electronic device replaces the third voice information with the fourth voice information.
In another optional example, the first display module 1102 is further configured to determine the second position of the voice data corresponding to the second difference text within the second voice information;
and to display the second difference text in the second area of the second display area according to the second position, where the second display area is the display area corresponding to the second voice information and its second area corresponds to the second position within the second voice information.
Fig. 12 is a schematic structural diagram of a voice information display apparatus according to another embodiment of the present application, and as shown in fig. 12, the voice information display apparatus may include:
a second receiving module 1201, configured to receive the first voice input and generate the first voice information corresponding to the first voice input;
a second sending module 1202, configured to send the first voice message to the first electronic device;
the second receiving module 1201 is further configured to receive a second voice input to the first voice information, and generate second voice information corresponding to the second voice input;
the second sending module 1202 is further configured to associate the second voice information with the first voice information, and send the second voice information to the first electronic device, so that the first electronic device displays, according to the first voice information and the second voice information, a second difference text with semantic difference in a second semantic result corresponding to the second voice information.
In another optional example, the second receiving module 1201 is further configured to receive third voice information sent by the first electronic device;
the voice information display device may include:
the second display module 1203 is configured to replace the third voice information with fourth voice information when the third voice information is marked as an unread flag and the fourth voice information associated with the third voice information is received.
The voice information display apparatus in the embodiments of the present application may be an electronic device, or a component in an electronic device, such as an integrated circuit or a chip. The electronic device may be a terminal or a device other than a terminal, for example a mobile phone, tablet computer, notebook computer, palmtop computer, vehicle-mounted electronic device, mobile internet device (MID), augmented reality (AR)/virtual reality (VR) device, robot, wearable device, ultra-mobile personal computer (UMPC), netbook, or personal digital assistant (PDA); it may also be a server, network attached storage (NAS), personal computer (PC), television (TV), teller machine, self-service machine, and the like. The embodiments of the present application are not specifically limited in this respect.
The voice information display device in the embodiment of the present application may be a device having an operating system. The operating system may be an Android (Android) operating system, an IOS operating system, or other possible operating systems, which is not specifically limited in the embodiments of the present application.
The voice information display device provided in the embodiment of the present application can implement each process implemented by the method embodiment in fig. 1 to 8 or the method embodiment in fig. 9 to 10, and is not described herein again to avoid repetition.
Optionally, as shown in fig. 13, an embodiment of the present application further provides an electronic device 100, including a processor 1301, a memory 1302, and a program or instructions stored in the memory 1302 and executable on the processor 1301. When executed by the processor 1301, the program or instructions implement the processes of the voice information display method embodiments above and achieve the same technical effect; to avoid repetition, details are not repeated here.
It should be noted that the electronic devices in the embodiments of the present application include the mobile electronic device and the non-mobile electronic device described above. The electronic device 100 may be the aforementioned first electronic device or second electronic device.
Fig. 14 is a schematic hardware structure diagram of an electronic device implementing the embodiment of the present application.
The electronic device 100 includes, but is not limited to: a radio frequency unit 141, a network module 142, an audio output unit 143, an input unit 144, a sensor 145, a display unit 146, a user input unit 147, an interface unit 148, a memory 149, and a processor 140.
Those skilled in the art will appreciate that the electronic device 100 may further comprise a power supply (e.g., a battery) for supplying power to the various components; the power supply may be logically connected to the processor 140 via a power management system, which manages charging, discharging, and power consumption. The electronic device structure shown in fig. 14 does not constitute a limitation of the electronic device; the electronic device may include more or fewer components than shown, combine some components, or arrange components differently, which is not repeated here.
The radio frequency unit 141 is configured to receive first voice information and second voice information sent by a second electronic device, where the second voice information is associated with the first voice information;
The processor 140 is configured to display the second difference text with semantic difference in the second semantic result when the first semantic result corresponding to the first voice information differs from the second semantic result corresponding to the second voice information.
Optionally, in the electronic device:
the processor 140 is configured to, when third voice information replying to the first voice information is recorded, determine, according to the semantic classification to which the second difference text belongs, the semantic sub-result of the same class in the third semantic result corresponding to the third voice information;
the processor 140 is configured to replace the voice data corresponding to the semantic sub-result in the third voice information with the voice data corresponding to the second difference text in the second voice information, to generate the fourth voice information.
Optionally, the processor 140 is further configured to display the semantic sub-result in the third display area corresponding to the third voice information.
Optionally, the input unit 144 is configured to receive the first input on the second difference text;
the processor 140 is further configured to, in response to the first input, replace the voice data corresponding to the semantic sub-result in the third voice information with the voice data corresponding to the second difference text in the second voice information, to generate the fourth voice information.
Optionally, in the electronic device:
the radio frequency unit 141 is configured to send the fourth voice information to the second electronic device when the third voice information has not been sent to the second electronic device, or when it has been sent and is marked with the read identifier.
Optionally, the processor 140 is further configured to associate the fourth voice information with the third voice information when the third voice information has been sent to the second electronic device and is marked with the unread identifier;
the radio frequency unit 141 is further configured to send the fourth voice information to the second electronic device, so that the second electronic device replaces the third voice information with the fourth voice information.
Optionally, the processor 140 is further configured to determine the second position of the voice data corresponding to the second difference text within the second voice information;
and to display the second difference text in the second area of the second display area according to the second position, where the second display area is the display area corresponding to the second voice information and its second area corresponds to the second position within the second voice information.
In another electronic device, the input unit 144 is configured to receive the first voice input and generate the first voice information corresponding to the first voice input;
the radio frequency unit 141 is further configured to send the first voice information to the first electronic device;
the input unit 144 is configured to receive the second voice input for the first voice information and generate the second voice information corresponding to the second voice input;
the radio frequency unit 141 is further configured to associate the second voice information with the first voice information and send it to the first electronic device, so that the first electronic device displays, according to the first and second voice information, the second difference text with semantic difference in the second semantic result corresponding to the second voice information.
Optionally, the radio frequency unit 141 receives the third voice information sent by the first electronic device;
the processor 140 is further configured to replace the third voice information with the fourth voice information when the third voice information is marked with the unread identifier and the fourth voice information associated with it is received.
It should be understood that, in the embodiments of the present application, the input unit 144 may include a graphics processing unit (GPU) 1441 and a microphone 1442; the graphics processing unit 1441 processes image data of still pictures or video obtained by an image capturing device (such as a camera) in video capturing mode or image capturing mode. The display unit 146 may include a display panel 1461, which may be configured in the form of a liquid crystal display, an organic light-emitting diode, or the like. The user input unit 147 includes at least one of a touch panel 1471 and other input devices 1472. The touch panel 1471, also referred to as a touch screen, may include a touch detection device and a touch controller. Other input devices 1472 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys), a trackball, a mouse, and a joystick, which are not detailed here.
The memory 149 may be used to store software programs as well as various data. The memory 149 may mainly include a first storage area storing a program or an instruction and a second storage area storing data, where the first storage area may store an operating system, an application program or an instruction required for at least one function (such as a sound playing function or an image playing function), and the like. In addition, the memory 149 may include a volatile memory or a non-volatile memory, or the memory 149 may include both a volatile memory and a non-volatile memory. The non-volatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be a Random Access Memory (RAM), a Static RAM (SRAM), a Dynamic RAM (DRAM), a Synchronous DRAM (SDRAM), a Double Data Rate SDRAM (DDR SDRAM), an Enhanced SDRAM (ESDRAM), a Synchlink DRAM (SLDRAM), or a Direct Rambus RAM (DRRAM). The memory 149 in the embodiments of the present application includes, but is not limited to, these and any other suitable types of memory.
The processor 140 may include one or more processing units. Optionally, the processor 140 integrates an application processor and a modem processor, where the application processor mainly handles operations related to the operating system, the user interface, applications, and the like, and the modem processor mainly handles wireless communication signals, for example, a baseband processor. It can be understood that the modem processor may alternatively not be integrated into the processor 140.
An embodiment of the present application further provides a readable storage medium. The readable storage medium stores a program or an instruction, and when the program or the instruction is executed by a processor, the processes of the foregoing voice information display method embodiments are implemented, with the same technical effects achievable. To avoid repetition, details are not described here again.
The processor is the processor in the electronic device in the foregoing embodiments. The readable storage medium includes a computer-readable storage medium, such as a computer read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
An embodiment of the present application further provides a chip. The chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run a program or an instruction to implement the processes of the foregoing voice information display method embodiments, with the same technical effects achievable.
It should be understood that the chip mentioned in the embodiments of the present application may also be referred to as a system-level chip, a system chip, a chip system, a system-on-chip, or the like.
An embodiment of the present application provides a computer program product. The program product is stored in a storage medium, and the program product is executed by at least one processor to implement the processes of the foregoing voice information display method embodiments, with the same technical effects achievable. To avoid repetition, details are not described here again.
It should be noted that, in this document, the terms "comprise", "comprising", or any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that includes a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without more restrictions, an element preceded by the phrase "comprising a …" does not exclude the presence of another identical element in the process, method, article, or apparatus that includes the element. Further, it should be noted that the scope of the methods and apparatuses in the embodiments of the present application is not limited to performing the functions in the order shown or discussed, and may also include performing the functions in a substantially simultaneous manner or in a reverse order depending on the functions involved. For example, the described methods may be performed in an order different from the described order, and various steps may be added, omitted, or combined. In addition, features described with reference to some examples may be combined in other examples.
Through the description of the foregoing embodiments, a person skilled in the art can clearly understand that the methods of the foregoing embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, and certainly may also be implemented by hardware, but in many cases the former is the preferred implementation. Based on such an understanding, the technical solutions of the present application may be embodied in the form of a computer software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disk) and includes several instructions for enabling a terminal (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present application.
While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A voice information display method, applied to a first electronic device, the method comprising:
receiving first voice information and second voice information sent by a second electronic device, wherein the second voice information is associated with the first voice information;
and in a case that a first semantic result corresponding to the first voice information differs from a second semantic result corresponding to the second voice information, displaying a second difference text having a semantic difference in the second semantic result.
2. The method of claim 1, wherein after the displaying a second difference text having a semantic difference in the second semantic result, the method further comprises:
in a case that third voice information replying to the first voice information has been recorded, determining, according to a semantic classification to which the second difference text belongs, a semantic sub-result of the same category as the semantic classification in a third semantic result corresponding to the third voice information;
and replacing the voice data corresponding to the semantic sub-result in the third voice information with the voice data corresponding to the second difference text in the second voice information, to generate fourth voice information.
3. The method of claim 2, wherein after the determining a semantic sub-result of the same category as the semantic classification in a third semantic result corresponding to the third voice information, the method further comprises:
displaying the semantic sub-result in a third display area corresponding to the third voice information.
4. The method of claim 2, wherein the replacing the voice data corresponding to the semantic sub-result in the third voice information with the voice data corresponding to the second difference text in the second voice information, to generate fourth voice information comprises:
receiving a first input for the second difference text;
and in response to the first input, replacing the voice data corresponding to the semantic sub-result in the third voice information with the voice data corresponding to the second difference text in the second voice information, to generate the fourth voice information.
5. The method of claim 2, wherein after the generating fourth voice information, the method further comprises:
sending the fourth voice information to the second electronic device in a case that the third voice information has not been sent to the second electronic device, or in a case that the third voice information has been sent to the second electronic device and is marked as read.
6. The method of claim 2, wherein after the generating fourth voice information, the method further comprises:
associating the fourth voice information with the third voice information in a case that the third voice information has been sent to the second electronic device and is marked as unread;
and sending the fourth voice information to the second electronic device, so that the second electronic device replaces the third voice information with the fourth voice information.
7. The method of claim 1, wherein the displaying a second difference text having a semantic difference in the second semantic result comprises:
determining a second position, in the second voice information, of the voice data corresponding to the second difference text;
and displaying the second difference text in a second area of a second display area according to the second position, wherein the second display area is a display area corresponding to the second voice information, and the second area of the second display area corresponds to the second position of the second voice information.
8. A voice information display method, applied to a second electronic device, the method comprising:
receiving a first voice input, and generating first voice information corresponding to the first voice input;
sending the first voice information to a first electronic device;
receiving a second voice input for the first voice information, and generating second voice information corresponding to the second voice input;
and associating the second voice information with the first voice information, and sending the second voice information to the first electronic device, so that the first electronic device displays, according to the first voice information and the second voice information, a second difference text having a semantic difference in a second semantic result corresponding to the second voice information.
9. The method of claim 8, wherein after the receiving a second voice input for the first voice information and generating second voice information corresponding to the second voice input, the method further comprises:
receiving third voice information sent by the first electronic device;
and replacing the third voice information with fourth voice information in a case that the third voice information is marked as unread and the fourth voice information associated with the third voice information is received.
10. A voice information display device, comprising:
a first receiving module, configured to receive first voice information and second voice information sent by a second electronic device, wherein the second voice information is associated with the first voice information;
and a first display module, configured to display, in a case that a first semantic result corresponding to the first voice information differs from a second semantic result corresponding to the second voice information, a second difference text having a semantic difference in the second semantic result.
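By way of a non-limiting illustration of the replacement recited in claims 2 and 4 above, the Python sketch below splices the audio of the second difference text from the second voice information into the third voice information to produce the fourth voice information. Fixed-rate PCM audio, precomputed byte offsets aligned to sample boundaries, and all names are assumptions introduced for the example.

```python
# Illustrative sketch only: fixed-rate PCM audio and precomputed byte
# offsets are assumptions; none of these names come from the disclosure.
def splice_fourth_voice(third_pcm: bytes, sub_start: int, sub_end: int,
                        second_pcm: bytes, diff_start: int, diff_end: int) -> bytes:
    """Cut the audio of the stale semantic sub-result out of the third voice
    information and splice in the audio of the second difference text taken
    from the second voice information."""
    if not (0 <= sub_start <= sub_end <= len(third_pcm)
            and 0 <= diff_start <= diff_end <= len(second_pcm)):
        raise ValueError("offsets out of range")
    return third_pcm[:sub_start] + second_pcm[diff_start:diff_end] + third_pcm[sub_end:]
```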
CN202211026843.XA 2022-08-25 2022-08-25 Voice information display method and device Pending CN115457957A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211026843.XA CN115457957A (en) 2022-08-25 2022-08-25 Voice information display method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211026843.XA CN115457957A (en) 2022-08-25 2022-08-25 Voice information display method and device

Publications (1)

Publication Number Publication Date
CN115457957A (en) 2022-12-09

Family

ID=84299672

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211026843.XA Pending CN115457957A (en) 2022-08-25 2022-08-25 Voice information display method and device

Country Status (1)

Country Link
CN (1) CN115457957A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination