CN112637407A

CN112637407A - Voice input method and device and electronic equipment

Info

Publication number: CN112637407A
Application number: CN202011529379.7A
Authority: CN
Inventors: 张孝东
Original assignee: Vivo Mobile Communication Co Ltd
Current assignee: Vivo Mobile Communication Co Ltd
Priority date: 2020-12-22
Filing date: 2020-12-22
Publication date: 2021-04-09
Also published as: WO2022135259A1

Abstract

The application discloses a voice input method, a voice input device, and electronic equipment, and belongs to the technical field of communication. The application can solve the problem that electronic equipment edits messages inefficiently. The method comprises: receiving a first voice message input by a user, and displaying first voice content corresponding to the first voice message; receiving a first input by the user for first content, wherein the first content is content corresponding to target content in the first voice content; and in response to the first input, replacing or deleting a target voice message corresponding to the target content in the first voice message. The embodiments of the application are applied to the process of sending a message by the electronic equipment.

Description

Voice input method and device and electronic equipment

Technical Field

The application belongs to the technical field of communication, and particularly relates to a voice input method and device and electronic equipment.

Background

Generally, when a user communicates with a contact through a chat application in an electronic device, the user can communicate by sending a voice message. Specifically, the user can speak while performing a long-press input on a voice recording control in a chat conversation interface, so that the electronic device records the user's voice content in real time and sends the voice message to the contact after the recording is completed.

However, in the above method, if the content of the voice input is incorrect, the user needs to perform an input to trigger the electronic device to cancel the ongoing recording, record the voice again, and then send an error-free voice message to the contact. The operation is therefore cumbersome and time-consuming, especially when the user inputs a large amount of voice content, so the electronic device edits messages inefficiently.

Disclosure of Invention

The embodiment of the application aims to provide a voice input method, a voice input device and electronic equipment, and can solve the problem that the efficiency of editing a message by the electronic equipment is low.

In order to solve the technical problem, the present application is implemented as follows:

In a first aspect, an embodiment of the present application provides a voice input method, where the method includes: receiving a first voice message input by a user, and displaying first voice content corresponding to the first voice message; receiving a first input by the user for first content, wherein the first content is content corresponding to target content in the first voice content; and in response to the first input, replacing or deleting a target voice message corresponding to the target content in the first voice message.

In a second aspect, an embodiment of the present application provides a voice input device, including a receiving module, a display module, and a processing module. The receiving module is configured to receive a first voice message input by a user. The display module is configured to display first voice content corresponding to the first voice message. The receiving module is further configured to receive a first input by the user for first content, where the first content is content corresponding to target content in the first voice content. The processing module is configured to, in response to the first input received by the receiving module, replace or delete a target voice message corresponding to the target content in the first voice message.

In a third aspect, an embodiment of the present application provides an electronic device, which includes a processor, a memory, and a program or instructions stored on the memory and executable on the processor, and when executed by the processor, the program or instructions implement the steps of the method according to the first aspect.

In a fourth aspect, embodiments of the present application provide a readable storage medium, on which a program or instructions are stored, which when executed by a processor implement the steps of the method according to the first aspect.

In a fifth aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to execute a program or instructions to implement the method according to the first aspect.

In this embodiment of the application, a user may input a first voice message to trigger the electronic device to display first voice content corresponding to the first voice message, so that the user may input first content corresponding to target content in the first voice content, and the electronic device may replace or delete the target voice message corresponding to the target content in the first voice message. When the user inputs the first voice message to be sent, the electronic device can display the first voice content corresponding to the first voice message in real time, so that when the target content in the first voice content is wrong, the user can input the first content corresponding to the target content, and the electronic device can replace or delete the target voice message with the mistake in the first voice message. The user does not need to input the first voice message again, so that the operation of the user can be simplified, the user can visually determine the wrong content and timely modify the wrong content through the displayed first voice content, and the efficiency of editing the message by the electronic equipment is improved.

Drawings

Fig. 1 is one of the schematic diagrams of a voice input method according to an embodiment of the present application;

Fig. 2 is one of the schematic diagrams of an example of an interface of a mobile phone according to an embodiment of the present disclosure;

Fig. 3 is a second schematic diagram of a voice input method according to an embodiment of the present application;

Fig. 4 is a second schematic diagram of an example of an interface of a mobile phone according to an embodiment of the present disclosure;

Fig. 5 is a third schematic diagram of a voice input method according to an embodiment of the present application;

Fig. 6 is a schematic structural diagram of a voice input device according to an embodiment of the present application;

Fig. 7 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The terms "first", "second", and the like in the description and claims of the present application are used to distinguish between similar objects and not necessarily to describe a particular sequence or chronological order. It should be understood that data so used may be interchanged under appropriate circumstances, so that embodiments of the application can be practiced in sequences other than those illustrated or described herein. Objects distinguished by "first", "second", and the like are generally of one type, and the number of objects is not limited; for example, the first object can be one or more than one. In addition, "and/or" in the description and claims means at least one of the connected objects, and the character "/" generally indicates that the preceding and succeeding related objects are in an "or" relationship.

The voice input method provided by the embodiment of the present application is described in detail below with reference to the accompanying drawings through specific embodiments and application scenarios thereof.

In the embodiment of the application, when a user sends a voice message (for example, "I am now on People's East Road; you can come find me and we will have a meal together") to a target contact through a chat application in an electronic device, the user can input the voice message to be sent through the conversation interface corresponding to the target contact. If the user realizes that the voice message to be sent contains wrong content (for example, "People's East Road"), the user can trigger the electronic device to acquire the voice message input by the user and control the voice message to be in a to-be-edited state. The user can then perform an input for the wrong content, that is, input the correct content corresponding to the wrong content (for example, by voice input of "People's West Road"), so that the electronic device replaces the wrong content in the voice message with the correct content according to the user's input, obtains a new voice message (for example, "I am now on People's West Road; you can come find me and we will have a meal together"), and sends the new voice message to the target contact. In this way, the message editing efficiency of the electronic device can be improved.

An embodiment of the present application provides a voice input method, and fig. 1 shows a flowchart of the voice input method provided in the embodiment of the present application, where the method can be applied to an electronic device. As shown in fig. 1, the voice input method provided by the embodiment of the present application may include steps 201 to 203 described below.

Step 201, the electronic device receives a first voice message input by a user, and displays first voice content corresponding to the first voice message.

In the embodiment of the application, under the condition that the user sends the voice message to the contact, the user can input the first voice message to be sent, so that the electronic device can acquire and display first voice content (namely, text content) corresponding to the first voice message input by the user, and then the user can input the first content, so that the electronic device can replace or delete a target voice message corresponding to the target content in the first voice message.

Optionally, in this embodiment of the application, when the electronic device displays a conversation page corresponding to the target contact in the chat application, the user may perform an input on a control for sending voice that is displayed in the conversation page, and then speak to input the first voice message.

Optionally, in this embodiment of the application, a user may perform a long-press input or a click input on a control for sending a voice to trigger the electronic device to be in a voice recording state (i.e., execute a recording function), where the first voice message is a voice input performed during a process in which the user performs the long-press input on the control for sending the voice, or a voice input performed after the user performs the click input on the control for sending the voice.

Optionally, in this embodiment of the application, while the user inputs the first voice message, the electronic device may execute a recording function to record the voice input of the user (that is, the first voice message), acquire first voice content corresponding to the first voice message, and display the first voice content.

When the user's input on the control for sending voice is a long-press input, the long-press input and the voice input (i.e., the first voice message) are performed simultaneously, and the electronic device can acquire the first voice content corresponding to the first voice message. If the user performs voice input without performing the long-press input, the electronic device cannot acquire the first voice content corresponding to the first voice message.
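The long-press gating described above can be illustrated with a minimal sketch. It assumes a hypothetical representation in which audio arrives as timestamped frames and the press is described by a start and end time; the name `frames_captured` and the frame layout are illustrative only and do not come from the application.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: each frame is (timestamp_ms, audio_chunk).
Frame = Tuple[int, bytes]


def frames_captured(
    frames: Sequence[Frame], press_start_ms: int, press_end_ms: int
) -> List[bytes]:
    """Keep only the audio chunks that arrive while the control is held down."""
    return [chunk for t, chunk in frames if press_start_ms <= t < press_end_ms]


frames = [(0, b"a"), (100, b"b"), (200, b"c"), (300, b"d")]
# The control is held from 100 ms up to (not including) 300 ms.
held = frames_captured(frames, 100, 300)
```

Audio spoken outside the press interval is simply never collected, which mirrors the statement that voice input without the long-press input is not acquired.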

Optionally, in this embodiment of the application, when the user starts inputting the first voice message, the electronic device may convert the voice content corresponding to the first voice message into text content according to the acquired voice content corresponding to the first voice message, and display the first voice content (i.e., text content) corresponding to the first voice message at a preset position in the screen.

Optionally, in this embodiment of the application, after the user completes input (i.e., voice input) of the first voice message, the electronic device may directly display a complete first voice content corresponding to the first voice message at a preset position in the screen; or, the electronic device may gradually display the first voice content corresponding to the first voice message at a preset position in the screen (that is, display the corresponding text according to the progress of the voice input of the user) while the user inputs the first voice message, that is, convert the voice content of the first voice message into the text content in real time and display the text content.
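The two display options above (showing the complete text after input ends, or showing it progressively during input) might be sketched as follows. This assumes that some speech recognizer, not specified in the application, delivers recognized text in chunks; `progressive_transcript` and `final_transcript` are hypothetical names.

```python
from typing import Iterable, Iterator


def progressive_transcript(recognized_chunks: Iterable[str]) -> Iterator[str]:
    """Yield the text to show after each newly recognized chunk arrives."""
    shown = ""
    for chunk in recognized_chunks:
        shown += chunk
        yield shown


def final_transcript(recognized_chunks: Iterable[str]) -> str:
    """Return the complete text once, after voice input has finished."""
    return "".join(recognized_chunks)


chunks = ["I am now ", "on People's East Road; ", "come find me."]
# One display update per chunk; the last update equals the complete text.
frames = list(progressive_transcript(chunks))
```

Both options end with the same text on screen; they differ only in when the user first sees it.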

Optionally, in this embodiment of the application, the displayed first voice content corresponding to the first voice message is in an editable state, and the user may perform a selection input on some or all of the first voice content, so that the electronic device edits (for example, modifies or replaces) the selected content.

Optionally, in this embodiment of the application, the electronic device may obtain first voice content corresponding to the first voice message, and control the first voice message to be in a state to be edited.

Optionally, in this embodiment of the application, after the electronic device acquires the first voice content corresponding to the first voice message, the user may perform a slide input while performing the long-press input on the voice input control, sliding to a preset region (for example, the display region in which the first voice content is displayed), so as to trigger the electronic device to control the first voice content to be in the to-be-edited state.

It should be noted that the long-press input and the slide input performed on the voice input control may be a complete input, that is, there is no time interval between the long-press input and the slide input, and the long-press input and the slide input are a complete and continuous input.
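One way to picture the continuous long-press-then-slide input is as a small state machine. The sketch below is illustrative only: the state and event names are hypothetical, and the behavior on release (returning to idle) is an assumption that the application does not spell out here.

```python
from enum import Enum, auto
from typing import Iterable


class State(Enum):
    IDLE = auto()
    RECORDING = auto()
    TO_BE_EDITED = auto()


class Event(Enum):
    LONG_PRESS_START = auto()
    SLIDE_INTO_PRESET_REGION = auto()
    RELEASE = auto()


def next_state(state: State, event: Event) -> State:
    """Advance the gesture state; unrelated events leave the state unchanged."""
    if state is State.IDLE and event is Event.LONG_PRESS_START:
        return State.RECORDING
    if state is State.RECORDING and event is Event.SLIDE_INTO_PRESET_REGION:
        return State.TO_BE_EDITED
    if state is State.RECORDING and event is Event.RELEASE:
        return State.IDLE  # assumption: lifting the finger ends the gesture
    return state


def run(events: Iterable[Event]) -> State:
    """Replay a sequence of events starting from the idle state."""
    state = State.IDLE
    for event in events:
        state = next_state(state, event)
    return state
```

Because the slide is only honored while recording, a slide that follows a release does not reach the to-be-edited state, which matches the requirement that the long press and the slide form one uninterrupted input.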

Optionally, in this embodiment of the application, when the electronic device controls the first voice message to be in the to-be-edited state, the user may, through an input, trigger the electronic device to replace content in the first voice message with other content (that is, the user may trigger the electronic device to modify the wrong content in the first voice message).

Taking a mobile phone as an example of the electronic device: as shown in fig. 2, when the user performs an input on the voice input control, the mobile phone displays a recording interface 10 that includes a text display area 11, and synchronously displays, in the text display area 11, the text content corresponding to the user's voice input (for example, "I am now on People's East Road; you can come find me and we will have a meal together").

Step 202, the electronic device receives a first input by the user for first content.

In an embodiment of the present application, the first content is a content corresponding to a target content in the first voice content.

Optionally, in this embodiment of the application, the user may perform voice input again, so that the electronic device may record the voice content corresponding to the first content; alternatively, the user may make a text input to input the first content.

Step 203, the electronic device responds to the first input, and replaces or deletes the target voice message corresponding to the target content in the first voice message.

Optionally, in this embodiment of the application, the electronic device may replace the target content in the first voice message with the first content; alternatively, the target content in the first voice message is deleted.

Optionally, in this embodiment of the application, the electronic device may perform semantic analysis processing on the first voice message to determine the target content from the first voice message.

Optionally, in this embodiment of the application, the electronic device may cut the target content out of the first voice message, and combine the voice content corresponding to the first content with the remaining first voice message to obtain a new voice message (i.e., the third voice message described below).

It should be noted that the electronic device may delete the voice content corresponding to the target content from the first voice message, and add the voice content corresponding to the first content to a position of the voice content corresponding to the target content in the first voice message, so as to combine to obtain a new voice message.
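The delete-and-insert combination described above can be sketched on raw audio samples. This is a minimal illustration, assuming the voice message is held as a sequence of samples and that the sample range of the target content is already known; `splice_voice` and its parameters are hypothetical names.

```python
from typing import List, Optional, Sequence


def splice_voice(
    first: Sequence[int],
    target_start: int,
    target_end: int,
    replacement: Optional[Sequence[int]] = None,
) -> List[int]:
    """Return a new message with first[target_start:target_end] replaced or deleted.

    Passing no replacement deletes the target range; passing one inserts it
    at the position the target used to occupy.
    """
    if not 0 <= target_start <= target_end <= len(first):
        raise ValueError("target range lies outside the first voice message")
    middle = list(replacement) if replacement is not None else []
    return list(first[:target_start]) + middle + list(first[target_end:])


first_message = [1, 2, 3, 4, 5, 6]
replaced = splice_voice(first_message, 2, 4, [9, 9, 9])  # replace samples 3 and 4
deleted = splice_voice(first_message, 2, 4)              # delete samples 3 and 4
```

The replacement does not need to have the same length as the removed range, so the new voice message may be longer or shorter than the first one.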

Optionally, in this embodiment of the application, when the electronic device displays the first voice content corresponding to the first voice message, the electronic device may replace the target text content in the first voice content with the text content corresponding to the first content according to the input of the user, so as to obtain the replaced first voice content, and update and display the first voice content in the screen.

Optionally, in this embodiment of the application, the electronic device may replace the target content in the first voice message with the first content according to the replaced first voice content, so as to generate a new voice message.

Optionally, in this embodiment of the application, the electronic device may send a new voice message obtained after the replacement to the contact.

The embodiment of the application provides a voice input method, wherein a user can input a first voice message to be sent to trigger an electronic device to display first voice content corresponding to the first voice message, so that the user can input first content corresponding to target content in the first voice content, and the electronic device can replace or delete the target voice message corresponding to the target content in the first voice message. When the user inputs the first voice message to be sent, the electronic device can display the first voice content corresponding to the first voice message in real time, so that when the target content in the first voice content is wrong, the user can input the first content corresponding to the target content, and the electronic device can replace or delete the target voice message with the mistake in the first voice message. The user does not need to input the first voice message again, so that the operation of the user can be simplified, the user can visually determine the wrong content and timely modify the wrong content through the displayed first voice content, and the efficiency of editing the message by the electronic equipment is improved.

Optionally, in this embodiment, with reference to fig. 1, as shown in fig. 3, after the step 201, the voice input method provided in this embodiment may further include a step 301 and a step 302 described below, and the step 202 may specifically be implemented by a step 202a described below, and the step 203 may specifically be implemented by a step 203a described below.

Step 301, the electronic device receives a second input from the user.

In this embodiment of the application, the second input is a selection input of the target content by the user.

In this embodiment of the application, the second input is an input by the user on the target text content (i.e., the target content) in the first voice content.

Optionally, in this embodiment of the application, the user may perform an input on the target content to trigger the electronic device to select the target content, and the electronic device may further mark and display the target content.

Optionally, in this embodiment of the application, the second input may be any one of: a user click input for the target content, a user long press input for the target content, a user double click input for the target content, etc.

Step 302, the electronic device determines a target voice message according to the target content in response to the second input.

Optionally, in this embodiment of the application, the electronic device may mark and display the target content, and determine, from the first voice message, a target voice message corresponding to the target content according to a position of the target content in the first voice content.
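Determining the target voice message from the position of the target content in the displayed text could, for instance, rely on per-word timing information. The sketch below assumes that the recognizer provides start and end times for each word, which the application does not state; `Word` and `target_time_range` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    text: str
    start_ms: int  # assumed: time at which the word starts in the recording
    end_ms: int    # assumed: time at which the word ends in the recording


def target_time_range(words: List[Word], first_idx: int, last_idx: int) -> Tuple[int, int]:
    """Map a selection of displayed words to the time range of the audio."""
    if not 0 <= first_idx <= last_idx < len(words):
        raise IndexError("selection lies outside the displayed text")
    return words[first_idx].start_ms, words[last_idx].end_ms


words = [
    Word("I", 0, 200),
    Word("am", 200, 400),
    Word("on", 400, 600),
    Word("People's", 600, 1100),
    Word("East", 1100, 1400),
    Word("Road", 1400, 1800),
]
# The user selects the fourth through sixth displayed words.
selection = target_time_range(words, 3, 5)
```

The returned range identifies which part of the first voice message is the target voice message to be replaced or deleted.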

Illustratively, with reference to fig. 2, the user may perform an input on the characters "People's East Road" in the text content displayed in the text display area 11 ("I am now on People's East Road; you can come find me and we will have a meal together") to trigger the mobile phone to highlight those characters, so that the mobile phone determines, from the voice message corresponding to the user's voice input, the voice message corresponding to "People's East Road" according to the position of "People's East Road" in the text content.

Step 202a, the electronic device receives a second voice message input by the user.

In an embodiment of the present application, the second voice message is a voice message corresponding to the first content.

Optionally, in this embodiment of the application, after the electronic device displays the first voice content, the user may perform voice input on the first content, so that the electronic device may acquire a second voice message (i.e., a voice message corresponding to the first content) input by the user.

Step 203a, the electronic device, according to the second voice message, replaces the target voice message in the first voice message with the second voice message, or deletes the target voice message in the first voice message, to obtain a third voice message.

Optionally, in this embodiment of the application, the voice content corresponding to the voice input performed by the user at least includes the voice content of the first content.

Optionally, in this embodiment of the application, the voice content corresponding to the voice input performed by the user may further include other content, and the electronic device may obtain the voice content of the first content by performing semantic analysis on the voice content corresponding to the voice input performed by the user.

Optionally, in this embodiment of the application, the electronic device processes the first voice message according to the first content obtained through the semantic analysis, so as to replace or delete the target voice content.

In the embodiment of the application, after the electronic device displays the first voice content corresponding to the first voice message, the user can perform a selection input on the target content in the first voice content, so that the electronic device determines the target voice message in the first voice message according to that input. According to the second voice message, which corresponds to the first content input by the user, the electronic device then replaces the target voice message in the first voice message with the second voice message, or deletes the target voice message in the first voice message, to obtain the third voice message. In this way, the user can accurately determine the content that needs to be replaced or deleted in the first voice message.

Optionally, in this embodiment of the present application, after step 201 described above, the voice input method provided in this embodiment of the present application may further include steps 401 to 403 described below.

Step 401, the electronic device displays a target control.

In this embodiment of the application, the target control is used to edit the first voice message.

Optionally, in this embodiment of the application, when the user starts performing an input on the voice input control (that is, when the electronic device displays the voice input interface), the electronic device may display the target control at a preset position in the screen, so that the user may perform an input on the target control to trigger the electronic device to enter the voice recording state.

Optionally, in this embodiment of the application, when the electronic device displays the target control, the electronic device does not need to display the first voice content corresponding to the first voice message in the screen.

Step 402, the electronic device receives a third input from the user.

In this embodiment of the application, the third input is an input of a target control by a user.

Optionally, in this embodiment of the application, after performing the long-press input on the voice input control, the user may perform a slide input to slide to the position of the target control, so as to trigger the electronic device to enter the voice recording state.

Step 403, the electronic device, in response to the third input, controls the electronic device to be in the voice recording state.

Optionally, in this embodiment of the application, when the electronic device is in the voice recording state, the user may input the first voice message, so that the electronic device records the voice content input by the user.

For example, with reference to fig. 2 and as shown in fig. 4, when the user performs an input on the voice input control, the mobile phone displays the recording interface 10, which includes a target control 12. After completing the input of the first voice message (for example, "I am now on People's East Road; you can come find me and we will have a meal together"), the user can perform an input on the target control 12 to trigger the mobile phone to enter the voice recording state, and can then perform voice input again (for example, "replace People's East Road with People's West Road"), so that the mobile phone replaces the voice content corresponding to "People's East Road" in the first voice message with the voice content corresponding to "People's West Road" according to the user's voice input.

Optionally, in this embodiment of the application, before "replace or delete the target voice message corresponding to the target content in the first voice message" in step 203, the voice input method provided in this embodiment of the application may further include step 404 described below.

Step 404, the electronic device, in response to the first input, performs semantic analysis processing on the second voice message and determines the first content and the target content.

Optionally, in this embodiment of the application, the electronic device may perform semantic analysis processing on the second voice message in an intelligent semantic analysis manner, so as to determine the content to be replaced in the first voice message (i.e., the target content) and the first content that replaces it.

For example, when the user needs to modify "People's East Road" in the first voice message "I am now on People's East Road; you can come find me and we will have a meal together", the content of the user's voice input may be "replace People's East Road with People's West Road", "People's East Road is wrong; it should be People's West Road", or the like, so that the electronic device replaces "People's East Road" in the first voice message with "People's West Road" to obtain the replaced first voice message "I am now on People's West Road; you can come find me and we will have a meal together".
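The application does not specify how the semantic analysis is carried out. As a deliberately simple stand-in, the sketch below uses two regular expressions modeled on the two example phrasings above, applied to the recognized text of the second voice message. The English patterns and the names `parse_correction` and `apply_correction` are assumptions for illustration; a real system would likely use a more robust language-understanding component.

```python
import re
from typing import Optional, Tuple

# Hypothetical English patterns modeled on the two example phrasings.
_PATTERNS = [
    re.compile(r"^replace (?P<target>.+?) with (?P<first>.+)$", re.IGNORECASE),
    re.compile(
        r"^(?P<target>.+?) is wrong[,;]? (?:it )?should be (?P<first>.+)$",
        re.IGNORECASE,
    ),
]


def parse_correction(command: str) -> Optional[Tuple[str, str]]:
    """Return (target content, first content), or None if nothing matches."""
    text = command.strip().rstrip(".")
    for pattern in _PATTERNS:
        match = pattern.match(text)
        if match:
            return match.group("target"), match.group("first")
    return None


def apply_correction(transcript: str, command: str) -> str:
    """Replace the first occurrence of the target content in the displayed text."""
    parsed = parse_correction(command)
    if parsed is None:
        return transcript  # unrecognized command: leave the text unchanged
    target, first = parsed
    return transcript.replace(target, first, 1)
```

This sketch only edits the displayed text; replacing the corresponding audio would additionally require locating the target voice message within the first voice message.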

In the embodiment of the application, after the electronic device controls the first voice message to be in the to-be-edited state, the electronic device can display the target control for editing the first voice message, so that the user can perform an input on the target control to put the electronic device in the voice recording state. The electronic device then performs semantic analysis processing on the user's voice input to accurately determine the first content and the target content, so that the user can flexibly control the electronic device to execute corresponding operations through voice input.

It should be noted that, in the case of executing step 404, the specific step of step 203 may be replaced by "the electronic device replaces or deletes the target voice message corresponding to the target content in the first voice message".

Optionally, in this embodiment, with reference to fig. 1, as shown in fig. 5, after step 203, the voice input method provided in this embodiment may further include steps 501 to 503 described below.

Step 501, the electronic device receives a fourth input from the user.

In this embodiment, the fourth input is an input of a fourth voice message by the user.

Optionally, in this embodiment of the application, after the electronic device replaces the target content in the first voice message with the first content to obtain the third voice message, the user may perform voice input again to input the fourth voice message.

It should be noted that, when the user inputs the fourth voice message, the first voice message and the fourth voice message may be understood as two parts of one complete voice message (i.e., the fifth voice message described below). While inputting the complete voice message, the user may pause the voice input after finishing the first part (i.e., the first voice message) in order to replace the wrong content (i.e., the target content) in the first voice message, and then continue to input the second part (i.e., the fourth voice message) after the replacement is completed.

Step 502, the electronic device responds to the fourth input, and combines the fourth voice message and the third voice message to obtain a fifth voice message.

Optionally, in this embodiment of the application, the electronic device may add the fourth voice message to the third voice message to obtain a complete voice message (i.e., the fifth voice message).

Optionally, in this embodiment of the application, the electronic device may perform voice concatenation processing on the fourth voice message and the third voice message, so as to combine the two voice messages to obtain one voice message.

Step 503, the electronic device sends a fifth voice message including the third voice message.

Optionally, in this embodiment of the application, when the user performs a fourth input, that is, the electronic device receives an input of the fourth voice message from the user, the voice message sent by the electronic device is a fifth voice message including the third voice message; and under the condition that the user does not perform the fourth input, the voice message sent by the electronic equipment is the third voice message.
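The choice between sending the third voice message alone and sending the combined fifth voice message can be sketched as follows, again assuming the messages are held as sequences of audio samples. The function name `message_to_send` is hypothetical, and simple end-to-end concatenation is used as a stand-in for the voice concatenation processing.

```python
from typing import List, Optional, Sequence


def message_to_send(
    third: Sequence[int], fourth: Optional[Sequence[int]] = None
) -> List[int]:
    """Return the fifth message (third followed by fourth) if a fourth
    message was input; otherwise return the third message unchanged."""
    if fourth is None:
        return list(third)
    return list(third) + list(fourth)


third_message = [1, 2, 9, 9, 5]
only_third = message_to_send(third_message)
fifth_message = message_to_send(third_message, [7, 8])
```

The fourth message is appended after the third because it is the part of the complete message that the user speaks after the correction.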

In this embodiment, before the electronic device sends the third voice message, the user may input the fourth voice message, so that the electronic device may combine the fourth voice message and the third voice message to obtain the fifth voice message, and thus the electronic device may send the fifth voice message including the third voice message, which may improve flexibility of sending the voice message by the electronic device.

It should be noted that, in the voice input method provided in the embodiment of the present application, the execution subject may be a voice input device, or a control module in the voice input device for executing the voice input method. In the embodiment of the present application, a voice input device executing the voice input method is taken as an example to describe the voice input device provided in the embodiment of the present application.

Fig. 6 shows a schematic diagram of a possible structure of the voice input device involved in the embodiment of the present application. As shown in fig. 6, the voice input device 70 may include: a receiving module 71, a display module 72 and a processing module 73.

The receiving module 71 is configured to receive a first voice message input by a user. The display module 72 is configured to display first voice content corresponding to the first voice message. The receiving module 71 is further configured to receive a first input of the user on first content, where the first content is content corresponding to target content in the first voice content. The processing module 73 is configured to replace or delete a target voice message corresponding to the target content in the first voice message in response to the first input received by the receiving module 71.
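
The replace-or-delete operation of the processing module can be sketched as follows. This is an illustrative sketch only: it assumes the first voice message is a sequence of PCM samples and that the target voice message has already been located as a half-open sample range `[start, end)`. The function name `edit_voice` and its parameters are hypothetical.

```python
from typing import List, Optional, Sequence


def edit_voice(first: Sequence[int], start: int, end: int,
               second: Optional[Sequence[int]] = None) -> List[int]:
    """Return the edited (third) voice message.

    The target voice message first[start:end] is replaced by `second`
    when a replacement is given, and deleted when `second` is None.
    """
    if not 0 <= start <= end <= len(first):
        raise ValueError("target span is outside the first voice message")
    replacement = list(second) if second is not None else []
    # Keep the audio before and after the target span untouched.
    return list(first[:start]) + replacement + list(first[end:])
```

Because only the target span is touched, the user does not have to record the rest of the first voice message again.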

In one possible implementation manner, the voice input device 70 provided by the embodiment of the present application may further include a determining module. The receiving module 71 is further configured to receive a second input of the user after the display module 72 displays the first voice content corresponding to the first voice message, where the second input is a selection input of the target content by the user. The determining module is configured to determine the target voice message according to the target content in response to the second input received by the receiving module 71. The receiving module 71 is specifically configured to receive a second voice message input by the user, where the second voice message is a voice message corresponding to the first content. The processing module 73 is specifically configured to, according to the second voice message, replace the target voice message in the first voice message with the second voice message or delete the target voice message in the first voice message, so as to obtain a third voice message.
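
One way the determining module could map the selected target content to the target voice message is through word-level timestamps. The sketch below assumes that speech recognition yields a start and end time for each displayed word; this alignment data, the `Word` type, and the helper names are assumptions made for illustration and are not specified in the application.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Word:
    """Hypothetical recognizer output: one displayed word and its timing."""
    text: str
    start_ms: int
    end_ms: int


def target_span(words: List[Word], first_idx: int, last_idx: int) -> Tuple[int, int]:
    """Map a selection of displayed words to the audio time span they cover."""
    if not 0 <= first_idx <= last_idx < len(words):
        raise IndexError("selection is outside the displayed voice content")
    return words[first_idx].start_ms, words[last_idx].end_ms


def ms_to_samples(ms: int, sample_rate: int = 16000) -> int:
    """Convert a time offset in milliseconds to a sample index."""
    return ms * sample_rate // 1000
```

For example, selecting the third word of a three-word message yields the time span of that word, which can then be converted to sample indices and used as the target voice message.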

In one possible implementation manner, the voice input device 70 provided by the embodiment of the present application may further include a control module. The display module 72 is further configured to display a target control after displaying the first voice content corresponding to the first voice message, where the target control is used to edit the first voice message. The receiving module 71 is further configured to receive a third input from the user, where the third input is an input from the user to the target control. The control module is configured to control the voice input device to be in a voice recording state in response to the third input received by the receiving module 71.

In a possible implementation manner, the processing module 73 is further configured to perform semantic analysis processing on the second voice message to determine the first content and the target content before replacing or deleting a target voice message corresponding to the target content in the first voice message.
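
The semantic analysis step can be illustrated with a deliberately simple sketch that works on the recognized text of the second voice message. The English command patterns ("replace X with Y", "delete X") and the function name `parse_edit` are assumptions chosen for illustration; the application does not prescribe a particular analysis technique, and a real system would likely use a more robust language-understanding component.

```python
import re
from typing import Optional, Tuple

# Hypothetical command grammar, used only for illustration.
_REPLACE = re.compile(
    r"^\s*(?:replace|change)\s+(.+?)\s+(?:with|to)\s+(.+?)\s*$", re.IGNORECASE)
_DELETE = re.compile(r"^\s*(?:delete|remove)\s+(.+?)\s*$", re.IGNORECASE)


def parse_edit(transcript: str) -> Optional[Tuple[str, str, Optional[str]]]:
    """Return (action, target_content, first_content), or None if no edit is found.

    For a replacement, first_content is the new content; for a deletion
    it is None.
    """
    match = _REPLACE.match(transcript)
    if match:
        return ("replace", match.group(1), match.group(2))
    match = _DELETE.match(transcript)
    if match:
        return ("delete", match.group(1), None)
    return None
```

The returned target content can then be looked up in the first voice content to locate the target voice message to be replaced or deleted.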

In one possible implementation manner, the voice input device 70 provided by the embodiment of the present application may further include a sending module. The receiving module 71 is further configured to receive a fourth input of the user after the processing module 73 replaces or deletes the target voice message corresponding to the target content in the first voice message, where the fourth input is an input of a fourth voice message by the user. The processing module 73 is further configured to, in response to the fourth input received by the receiving module 71, combine the fourth voice message and the third voice message to obtain a fifth voice message. The sending module is configured to send the fifth voice message including the third voice message.

The voice input device provided by the embodiment of the present application can implement each process implemented by the voice input device in the above method embodiments, and for avoiding repetition, detailed description is not repeated here.

The embodiment of the application provides a voice input device, and when a user inputs a first voice message to be sent, an electronic device can display first voice content corresponding to the first voice message in real time, so that when a target content in the first voice content is incorrect, the user can input the first content corresponding to the target content, and the electronic device can replace or delete the target voice message with the error in the first voice message. The user does not need to input the first voice message again, so that the operation of the user can be simplified, the user can visually determine the wrong content and timely modify the wrong content through the displayed first voice content, and the efficiency of editing the message by the electronic equipment is improved.

The voice input device in the embodiment of the present application may be a device, and may also be a component, an integrated circuit, or a chip in a terminal. The device can be mobile electronic equipment or non-mobile electronic equipment. By way of example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palm top computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a Personal Digital Assistant (PDA), and the like, and the non-mobile electronic device may be a server, a Network Attached Storage (NAS), a Personal Computer (PC), a Television (TV), a teller machine or a self-service machine, and the like, and the embodiments of the present application are not particularly limited.

The voice input device in the embodiment of the present application may be a device having an operating system. The operating system may be an Android operating system, an iOS operating system, or another possible operating system, and embodiments of the present application are not specifically limited in this respect.

Optionally, an electronic device is further provided in this embodiment of the present application, and includes a processor 110, a memory 109, and a program or an instruction stored in the memory 109 and executable on the processor 110, where the program or the instruction is executed by the processor 110 to implement each process of the foregoing voice input method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not described here again.

It should be noted that the electronic device in the embodiment of the present application includes the mobile electronic device and the non-mobile electronic device described above.

Fig. 7 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.

The electronic device 100 includes, but is not limited to: a radio frequency unit 101, a network module 102, an audio output unit 103, an input unit 104, a sensor 105, a display unit 106, a user input unit 107, an interface unit 108, a memory 109, and a processor 110.

Those skilled in the art will appreciate that the electronic device 100 may further comprise a power source (e.g., a battery) for supplying power to various components, and the power source may be logically connected to the processor 110 through a power management system, so as to implement functions of managing charging, discharging, and power consumption through the power management system. The electronic device structure shown in fig. 7 does not constitute a limitation of the electronic device, and the electronic device may include more or less components than those shown, or combine some components, or arrange different components, and thus, the description is omitted here.

Wherein, the user input unit 107 is configured to receive a first voice message input by a user.

A display unit 106, configured to display first voice content corresponding to the first voice message.

And a processor 110, configured to replace or delete a target voice message corresponding to the target content in the first voice message in response to the first input.

The embodiment of the application provides an electronic device, and when a user inputs a first voice message to be sent, the electronic device can display first voice content corresponding to the first voice message in real time, so that when a target content in the first voice content is incorrect, the user can input the first content corresponding to the target content, and the electronic device can replace or delete the target voice message with the error in the first voice message. The user does not need to input the first voice message again, so that the operation of the user can be simplified, the user can visually determine the wrong content and timely modify the wrong content through the displayed first voice content, and the efficiency of editing the message by the electronic equipment is improved.

Optionally, the user input unit 107 is further configured to receive a second input from the user, where the second input is a selection input from the user for the target content in the first voice content.

The processor 110 is further configured to determine a target voice message based on the target content in response to the second input.

The user input unit 107 is specifically configured to receive a second voice message input by the user, where the second voice message is a voice message corresponding to the first content.

The processor 110 is specifically configured to, according to the second voice message, replace the target voice message in the first voice message with the second voice message or delete the target voice message in the first voice message, so as to obtain a third voice message.

The display unit 106 is further configured to display a target control, where the target control is used to edit the first voice message.

The user input unit 107 is further configured to receive a third input from the user, where the third input is an input from the user to the target control.

The processor 110 is further configured to control the electronic device to be in a voice recording state in response to a third input.

The processor 110 is further configured to perform semantic analysis processing on the second voice message to determine the first content and the target content.

The user input unit 107 is further configured to receive a fourth input from the user, where the fourth input is an input of a fourth voice message by the user.

The processor 110 is further configured to, in response to the fourth input, perform a combination process on the fourth voice message and the third voice message to obtain a fifth voice message.

A network module 102, configured to send a fifth voice message including the third voice message.

It should be understood that, in the embodiment of the present application, the input Unit 104 may include a Graphics Processing Unit (GPU) 1041 and a microphone 1042, and the Graphics Processing Unit 1041 processes image data of a still picture or a video obtained by an image capturing device (such as a camera) in a video capturing mode or an image capturing mode. The display unit 106 may include a display panel 1061, and the display panel 1061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like. The user input unit 107 includes a touch panel 1071 and other input devices 1072. The touch panel 1071 is also referred to as a touch screen. The touch panel 1071 may include two parts of a touch detection device and a touch controller. Other input devices 1072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, and a joystick, which are not described in detail herein. The memory 109 may be used to store software programs as well as various data including, but not limited to, application programs and an operating system. The processor 110 may integrate an application processor, which primarily handles operating systems, user interfaces, applications, etc., and a modem processor, which primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 110.

The embodiment of the present application further provides a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or the instruction is executed by a processor, the program or the instruction implements each process of the foregoing voice input method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.

The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer readable storage medium, such as a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and so on.

The embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run a program or an instruction to implement each process of the foregoing voice input method embodiment, and can achieve the same technical effect, and in order to avoid repetition, the details are not repeated here.

It should be understood that the chips mentioned in the embodiments of the present application may also be referred to as a system-level chip, a system chip, a chip system, or a system-on-chip, etc.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Further, it should be noted that the scope of the methods and apparatus of the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed, but may include performing the functions in a substantially simultaneous manner or in a reverse order based on the functions involved. For example, the methods described may be performed in an order different from that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.

While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A method of speech input, the method comprising:

receiving a first voice message input by a user, and displaying first voice content corresponding to the first voice message;

receiving first input of a user on first content, wherein the first content is content corresponding to target content in the first voice content;

and responding to the first input, and replacing or deleting the target voice message corresponding to the target content in the first voice message.

2. The method of claim 1, wherein after displaying the first voice content corresponding to the first voice message, the method further comprises:

receiving a second input of the user, wherein the second input is a selection input of the target content by the user;

in response to the second input, determining the target voice message according to the target content;

the receiving of the first input of the first content by the user comprises:

receiving a second voice message input by a user, wherein the second voice message is a voice message corresponding to the first content;

the replacing or deleting the target voice message corresponding to the target content in the first voice message includes:

replacing the target voice message in the first voice message with the second voice message or deleting the target voice message in the first voice message according to the second voice message, to obtain a third voice message.

3. The method of claim 1, wherein after displaying the first voice content corresponding to the first voice message, the method further comprises:

displaying a target control for editing the first voice message;

receiving a third input of the user, wherein the third input is input of the target control by the user;

and responding to the third input, and controlling the electronic equipment to be in a voice recording state.

4. The method according to claim 2, wherein before the replacing or deleting the target voice message corresponding to the target content in the first voice message, the method further comprises:

and performing semantic analysis processing on the second voice message to determine the first content and the target content.

5. The method according to any one of claims 1 to 4, wherein after the replacing or deleting of the target voice message corresponding to the target content in the first voice message, the method further comprises:

receiving a fourth input of the user, wherein the fourth input is an input of a fourth voice message by the user;

responding to the fourth input, and combining the fourth voice message and the third voice message to obtain a fifth voice message;

transmitting the fifth voice message including the third voice message.

6. A voice input apparatus, characterized in that the voice input apparatus comprises: a receiving module, a display module and a processing module;

the receiving module is used for receiving a first voice message input by a user;

the display module is used for displaying first voice content corresponding to the first voice message;

the receiving module is further configured to receive a first input of a first content by a user, where the first content is a content corresponding to a target content in the first voice content;

the processing module is configured to replace or delete a target voice message corresponding to the target content in the first voice message in response to the first input received by the receiving module.

7. The voice input apparatus according to claim 6, characterized in that the voice input apparatus further comprises: a determination module;

the receiving module is further configured to receive a second input of the user after the display module displays the first voice content corresponding to the first voice message, where the second input is a selection input of the user for the target content;

the determining module is configured to determine the target voice message according to the target content in response to the second input received by the receiving module;

the receiving module is specifically configured to receive a second voice message input by a user, where the second voice message is a voice message corresponding to the first content;

the processing module is specifically configured to, according to the second voice message, replace the target voice message in the first voice message with the second voice message or delete the target voice message in the first voice message, to obtain a third voice message.

8. The voice input apparatus according to claim 6, characterized in that the voice input apparatus further comprises: a control module;

the display module is further configured to display a target control after displaying first voice content corresponding to the first voice message, where the target control is used to edit the first voice message;

the receiving module is further configured to receive a third input of the user, where the third input is an input of the target control by the user;

and the control module is used for responding to the third input received by the receiving module and controlling the voice input device to be in a voice recording state.

9. The voice input device of claim 7, wherein the processing module is further configured to perform semantic analysis processing on the second voice message to determine the first content and the target content before replacing or deleting a target voice message corresponding to the target content in the first voice message.

10. The voice input apparatus according to any one of claims 6 to 9, characterized in that the voice input apparatus further comprises: a sending module;

the receiving module is further configured to receive a fourth input of the user after the processing module replaces or deletes the target voice message corresponding to the target content in the first voice message, where the fourth input is an input of a fourth voice message by the user;

the processing module is further configured to, in response to the fourth input received by the receiving module, perform combined processing on the fourth voice message and the third voice message to obtain a fifth voice message;

the sending module is configured to send the fifth voice message including the third voice message.

11. An electronic device comprising a processor, a memory, and a program or instructions stored on the memory and executable on the processor, the program or instructions when executed by the processor implementing the steps of the voice input method of any of claims 1-5.

12. A readable storage medium, on which a program or instructions are stored, which when executed by a processor implement the steps of the speech input method according to any one of claims 1-5.