WO2022135259A1

WO2022135259A1 - Speech input method and apparatus, and electronic device

Info

Publication number: WO2022135259A1
Application number: PCT/CN2021/138688
Authority: WO
Inventors: Zhang Xiaodong (张孝东)
Original assignee: Vivo Mobile Communication Co., Ltd. (维沃移动通信有限公司)
Priority date: 2020-12-22
Filing date: 2021-12-16
Publication date: 2022-06-30
Also published as: CN112637407A

Abstract

The present application relates to the technical field of communications. Disclosed are a speech input method and apparatus, and an electronic device. The method comprises: receiving a first speech message input by a user and displaying first speech content corresponding to the first speech message; receiving a first input for first content by a user, the first content being content corresponding to target content in the first speech content; and in response to the first input, replacing or deleting a target speech message corresponding to the target content in the first speech message. Embodiments of the present application are applied in a process of sending a message by an electronic device.

Description

Voice input method, device and electronic device

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to Chinese Patent Application No. 202011529379.7 filed in China on December 22, 2020, the entire contents of which are hereby incorporated by reference.

Technical Field

The present application belongs to the field of communication technologies, and in particular relates to a voice input method, device and electronic device.

Background

Usually, when a user communicates with a contact through a chat application on an electronic device, the user can communicate by sending voice messages. Specifically, the user can long-press the voice recording control on the chat dialog interface and speak at the same time, so that the electronic device records the user's voice content in real time and, after the recording is completed, sends the voice message to the contact.

However, with the above method, if the content of the voice input is incorrect, the user has to make an input to cancel the ongoing voice recording and then re-record the voice before sending the correct voice message to the contact. The operation is therefore cumbersome and time-consuming, especially when the voice content is long, and the efficiency with which the electronic device edits messages is low.

SUMMARY OF THE INVENTION

The purpose of the embodiments of the present application is to provide a voice input method, device, and electronic device, which can solve the problem of low efficiency in editing a message by the electronic device.

In order to solve the above technical problems, this application is implemented as follows:

In a first aspect, an embodiment of the present application provides a voice input method, the method including: receiving a first voice message input by a user, and displaying first voice content corresponding to the first voice message; receiving a first input of first content by the user, the first content being content corresponding to target content in the first voice content; and in response to the first input, replacing or deleting a target voice message corresponding to the target content in the first voice message.

In a second aspect, an embodiment of the present application provides a voice input device, which includes: a receiving module, a display module, and a processing module. The receiving module is used for receiving the first voice message input by the user. The display module is used for displaying the first voice content corresponding to the first voice message. The receiving module is further configured to receive a user's first input of the first content, where the first content is content corresponding to the target content in the first voice content. The processing module is configured to replace or delete the target voice message corresponding to the target content in the first voice message in response to the first input received by the receiving module.

In a third aspect, an embodiment of the present application provides an electronic device, the electronic device including a processor, a memory, and a program or instruction stored in the memory and executable on the processor, where the program or instruction, when executed by the processor, implements the steps of the method according to the first aspect.

In a fourth aspect, an embodiment of the present application provides a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or instruction is executed by a processor, the steps of the method according to the first aspect are implemented.

In a fifth aspect, an embodiment of the present application provides a chip, the chip including a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to run a program or an instruction to implement the method according to the first aspect.

In the embodiments of the present application, the user can input the first voice message to trigger the electronic device to display the first voice content corresponding to the first voice message, and can then make an input of the first content corresponding to the target content in the first voice content, so that the electronic device replaces or deletes the target voice message corresponding to the target content in the first voice message. Because the electronic device can display the first voice content in real time while the user inputs the first voice message to be sent, when the target content in the first voice content is incorrect, the user can input the first content corresponding to the target content, and the electronic device replaces or deletes the erroneous target voice message in the first voice message. The user does not need to re-input the first voice message, which simplifies the operation, and the displayed first voice content lets the user see the erroneous content directly and correct it in time, thereby improving the efficiency of message editing by the electronic device.

Description of drawings

FIG. 1 is a first schematic flowchart of a voice input method provided by an embodiment of the present application;

FIG. 2 is a first schematic diagram of an example interface of a mobile phone provided by an embodiment of the present application;

FIG. 3 is a second schematic flowchart of a voice input method provided by an embodiment of the present application;

FIG. 4 is a second schematic diagram of an example interface of a mobile phone provided by an embodiment of the present application;

FIG. 5 is a third schematic flowchart of a voice input method provided by an embodiment of the present application;

FIG. 6 is a schematic structural diagram of a voice input device provided by an embodiment of the present application;

FIG. 7 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are part of the embodiments of the present application, not all of the embodiments. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the present application.

The terms "first", "second" and the like in the description and claims of the present application are used to distinguish similar objects, and are not used to describe a specific order or sequence. It should be understood that the data so used are interchangeable under appropriate circumstances, so that the embodiments of the present application can be practiced in sequences other than those illustrated or described herein. Objects distinguished by "first", "second", etc. are usually of one type, and the number of such objects is not limited; for example, the first object may be one or more than one. In addition, "and/or" in the description and claims indicates at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the associated objects.

The voice input method provided by the embodiments of the present application will be described in detail below with reference to the accompanying drawings through specific embodiments and application scenarios thereof.

In the embodiments of the present application, when the user sends a voice message (for example: "I am now on Renmin East Road, you can come and find me, let's have dinner together") to a target contact through a chat application on the electronic device, the user can input the voice message to be sent on the conversation interface corresponding to the target contact. If the user realizes that the voice message to be sent contains an error (for example, "Renmin East Road"), the user can trigger the electronic device, after it has obtained the voice message input by the user, to put the voice message into a to-be-edited state. The user can then make an input for the erroneous content (that is, input the correct content corresponding to the erroneous content), for example by speaking the correct content ("Renmin West Road"). According to this input, the electronic device replaces the erroneous content in the voice message with the correct content to obtain a new voice message (for example: "I am now on Renmin West Road, you can come and find me, let's have dinner together") and sends the new voice message to the target contact, which improves the efficiency with which the electronic device edits messages.

An embodiment of the present application provides a voice input method. FIG. 1 shows a flowchart of a voice input method provided by an embodiment of the present application, and the method can be applied to an electronic device. As shown in FIG. 1, the voice input method provided by this embodiment of the present application may include the following steps 201 to 203.

Step 201: The electronic device receives the first voice message input by the user, and displays the first voice content corresponding to the first voice message.

In the embodiments of the present application, when sending a voice message to a contact, the user can input the first voice message to be sent, so that the electronic device acquires and displays the first voice content (that is, the text content) corresponding to the first voice message; the user can then input the first content, so that the electronic device replaces or deletes the target voice message corresponding to the target content in the first voice message.

Optionally, in this embodiment of the present application, when the electronic device displays the conversation page corresponding to the target contact in the chat application, the user can make an input on the control for sending voice displayed in the conversation page, and then speak to input the first voice message.

Optionally, in this embodiment of the present application, the user can perform a long-press input or a tap input on the control for sending voice to put the electronic device into a voice recording state (that is, to start the recording function). The first voice message is either spoken while the user long-presses the control for sending voice, or spoken after the user taps that control.

Optionally, in this embodiment of the present application, when the user inputs the first voice message, the electronic device may perform the recording function to record the user's voice input (that is, the first voice message), obtain the first voice content corresponding to the first voice message, and display the first voice content.

It should be noted that when the user's input on the control for sending voice is a long-press input, the long-press input and the user's voice input (that is, the first voice message) are simultaneous: the user speaks while long-pressing, and the electronic device acquires the first voice content corresponding to the first voice message. If the user speaks without performing the long-press input, the electronic device cannot obtain the first voice content corresponding to the first voice message.

Optionally, in this embodiment of the present application, when the user starts to input the first voice message, the electronic device may convert the acquired voice content of the first voice message into text content, and display the first voice content (that is, the text content) corresponding to the first voice message at a preset position on the screen.

Optionally, in this embodiment of the present application, the electronic device may display the complete first voice content corresponding to the first voice message at a preset position on the screen after the user finishes inputting the first voice message (that is, finishes the voice input); or the electronic device may display the first voice content progressively at the preset position while the user is still inputting the first voice message, showing the corresponding text as the voice input proceeds, that is, converting the voice content of the first voice message into text content in real time and displaying it.
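The progressive-display option can be sketched as follows. This is a minimal illustration, assuming a hypothetical speech recognizer that emits partial results (which may still change) and final results (which are committed); `TranscriptView` and its methods are illustrative names, not taken from the application or from any particular speech SDK.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TranscriptView:
    """Hypothetical model of the text display area: committed (final)
    segments plus the latest partial hypothesis from the recognizer."""
    final_segments: List[str] = field(default_factory=list)
    partial: str = ""

    def on_partial(self, text: str) -> None:
        # A new partial result replaces the previous one; nothing is committed yet.
        self.partial = text

    def on_final(self, text: str) -> None:
        # A final result is committed and the pending partial is cleared.
        self.final_segments.append(text)
        self.partial = ""

    def display_text(self) -> str:
        # Text shown on screen right now, in speaking order.
        return " ".join(s for s in self.final_segments + [self.partial] if s)
```

Keeping partial and final results apart lets the screen update as the user speaks while only committed text is later used to locate target content.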

Optionally, in this embodiment of the present application, the displayed first voice content corresponding to the first voice message is editable: the user can select part or all of the first voice content, so that the electronic device edits the selection (for example, modifies or replaces it).

Optionally, in this embodiment of the present application, the electronic device may acquire the first voice content corresponding to the first voice message, and control the first voice message to be in a to-be-edited state.

Optionally, in this embodiment of the present application, after the electronic device obtains the first voice content corresponding to the first voice message, the user can slide while long-pressing the voice input control, moving to a preset area (for example, the area in which the first voice content is displayed), to trigger the electronic device to put the first voice content into the to-be-edited state.

It should be noted that the above long-press input and sliding input on the voice input control may form one input: there is no time interval between the long-press input and the sliding input, which together constitute a single, continuous input.

Optionally, in this embodiment of the present application, when the electronic device has put the first voice message into the to-be-edited state, the user can make an input to trigger the electronic device to replace content in the first voice message with other content (that is, to correct erroneous content in the first voice message).

Exemplarily, a mobile phone is taken as an example of the electronic device. As shown in FIG. 2, when the user makes an input on the voice input control, the mobile phone displays a recording interface 10 that includes a text display area 11, in which the text content corresponding to the user's voice input is displayed synchronously (for example: "I am now on Renmin East Road, you can come and find me, let's have dinner together").

Step 202: The electronic device receives the first input of the first content by the user.

In the embodiment of the present application, the above-mentioned first content is content corresponding to the target content in the first voice content.

Optionally, in this embodiment of the present application, the user can perform voice input again, so that the electronic device can record the voice content corresponding to the first content; or, the user can perform text input to input the first content.

Step 203: In response to the first input, the electronic device replaces or deletes the target voice message corresponding to the target content in the first voice message.

Optionally, in this embodiment of the present application, the electronic device may replace the target content in the first voice message with the first content; or delete the target content in the first voice message.

Optionally, in this embodiment of the present application, the electronic device may perform semantic analysis processing on the first voice message, so as to determine the target content from the first voice message.

Optionally, in this embodiment of the present application, the electronic device may cut the target content out of the first voice message, and combine the voice content corresponding to the first content with the remaining part of the first voice message to obtain a new voice message (that is, the third voice message described below).

It should be noted that the electronic device may delete the voice content corresponding to the target content from the first voice message, and insert the voice content corresponding to the first content at the position in the first voice message where the voice content corresponding to the target content was located, so as to combine them into a new voice message.
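A minimal sketch of this delete-and-insert splice, assuming each voice message is available as a list of mono PCM samples at one shared sample rate and that the sample range of the target segment is already known. `splice_voice` is a hypothetical helper, not part of the application; a production implementation would also have to deal with encoded audio and would likely smooth the joins.

```python
from typing import List, Optional, Sequence


def splice_voice(samples: Sequence[int], start: int, end: int,
                 replacement: Optional[Sequence[int]] = None) -> List[int]:
    """Remove samples[start:end] (the target voice segment) and, if given,
    insert the replacement samples at the same position.
    replacement=None corresponds to the delete case."""
    if not 0 <= start <= end <= len(samples):
        raise ValueError("target segment lies outside the voice message")
    inserted = list(replacement) if replacement is not None else []
    # Audio before the target + replacement (if any) + audio after the target.
    return list(samples[:start]) + inserted + list(samples[end:])
```

For example, `splice_voice([1, 2, 3, 4, 5, 6], 2, 4, [9, 9, 9])` yields `[1, 2, 9, 9, 9, 5, 6]`, and omitting the replacement yields `[1, 2, 5, 6]`.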

Optionally, in this embodiment of the present application, when the electronic device displays the first voice content corresponding to the first voice message, the electronic device may, according to the user's input, replace the target text content in the first voice content with the text content corresponding to the first content to obtain the replaced first voice content, and update the first voice content displayed on the screen accordingly.

Optionally, in this embodiment of the present application, the electronic device may replace the target content in the first voice message with the first content according to the replaced first voice content, so as to generate a new voice message.

Optionally, in this embodiment of the present application, the electronic device may send a new voice message obtained after replacement to the contact.

The embodiments of the present application provide a voice input method in which the user can input a first voice message to be sent to trigger the electronic device to display the first voice content corresponding to the first voice message, and can then input the first content corresponding to the target content in the first voice content, so that the electronic device replaces or deletes the target voice message corresponding to the target content in the first voice message. Because the electronic device can display the first voice content in real time while the user inputs the first voice message, when the target content in the first voice content is incorrect, the user can input the first content corresponding to the target content, and the electronic device replaces or deletes the erroneous target voice message in the first voice message. The user does not need to re-enter the first voice message, which simplifies the operation, and the displayed first voice content lets the user identify the erroneous content directly and correct it in time, thereby improving the efficiency of message editing by the electronic device.

Optionally, in the embodiments of the present application, with reference to FIG. 1 and as shown in FIG. 3, after step 201, the voice input method provided by the embodiments of the present application may further include the following steps 301 and 302; step 202 may be specifically implemented by the following step 202a, and step 203 may be specifically implemented by the following step 203a.

Step 301: The electronic device receives a second input from the user.

In the embodiments of the present application, the second input is the user's selection input of the target content, that is, an input on the target text content (the target content) in the first voice content.

Optionally, in this embodiment of the present application, the user may make an input on the target content to trigger the electronic device to select it, and the target content may then be displayed with a mark (for example, highlighted).

Optionally, in this embodiment of the present application, the above-mentioned second input may be any one of the following: a user's click input on the target content, a user's long-press input on the target content, a user's double-click input on the target content, and the like.

Step 302: In response to the second input, the electronic device determines the target voice message according to the target content.

Optionally, in this embodiment of the present application, the electronic device may mark and display the target content, and determine the target voice message corresponding to the target content from the first voice message according to the position of the target content in the first voice content.

Exemplarily, with reference to FIG. 2, the text display area 11 displays the text content "I am now on Renmin East Road, you can come and find me, let's have dinner together". The user can make an input on the text "Renmin East Road" to trigger the mobile phone to highlight it, and, according to the position of "Renmin East Road" in that text, the mobile phone determines the voice segment corresponding to "Renmin East Road" in the voice message corresponding to the user's voice input.
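One way to go from the position of the selected text to the corresponding voice segment is to use word-level timestamps from the recognizer. The sketch below assumes such timestamps are available and that the displayed text is the recognized words joined by single spaces (true for the English example, not for Chinese text); `Word` and `locate_target` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    text: str
    start_s: float  # word start time within the recording, in seconds
    end_s: float    # word end time within the recording, in seconds


def locate_target(words: List[Word], target: str) -> Tuple[float, float]:
    """Return (start_s, end_s) of the audio covering the first occurrence of
    `target` in the transcript formed by joining the words with single spaces."""
    if not target:
        raise ValueError("target content must not be empty")
    transcript = " ".join(w.text for w in words)
    char_start = transcript.find(target)
    if char_start < 0:
        raise ValueError(f"{target!r} not found in the transcript")
    char_end = char_start + len(target)

    span_start = span_end = None
    offset = 0  # character offset of the current word in the transcript
    for w in words:
        w_begin, w_end = offset, offset + len(w.text)
        # Any word overlapping the selected characters widens the audio span.
        if w_begin < char_end and w_end > char_start:
            if span_start is None:
                span_start = w.start_s
            span_end = w.end_s
        offset = w_end + 1  # skip the joining space
    if span_start is None:
        raise ValueError("target does not cover any recognized word")
    return span_start, span_end
```

Multiplying the returned times by the sample rate gives the sample range of the target voice message within the first voice message.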

Step 202a: The electronic device receives the second voice message input by the user.

In this embodiment of the present application, the above-mentioned second voice message is a voice message corresponding to the first content.

Optionally, in this embodiment of the present application, after the electronic device displays the first voice content, the user can speak the first content, so that the electronic device obtains the second voice message input by the user (that is, the voice message corresponding to the first content).

Step 203a: According to the second voice message, the electronic device replaces the target voice message in the first voice message with the second voice message, or deletes the target voice message from the first voice message, to obtain a third voice message.

Optionally, in this embodiment of the present application, the voice content corresponding to the voice input by the user includes at least the voice content of the first content.

Optionally, in this embodiment of the present application, the voice content corresponding to the user's voice input may further include other content, and the electronic device may obtain the voice content of the first content by performing semantic analysis on the voice content corresponding to the user's voice input.

Optionally, in this embodiment of the present application, the electronic device processes the first voice message according to the first content obtained by semantic analysis, so as to replace or delete the target voice content.

In the embodiments of the present application, after the electronic device displays the first voice content corresponding to the first voice message, the user can make a selection input on the target content in the first voice content, so that the electronic device determines the target voice message in the first voice message according to that input. According to the second voice message corresponding to the first content input by the user, the electronic device then replaces the target voice message in the first voice message with the second voice message, or deletes the target voice message from the first voice message, to obtain the third voice message. In this way, the user can accurately specify the content in the first voice message that needs to be replaced or deleted.

Optionally, in the embodiment of the present application, after the above step 201, the voice input method provided by the embodiment of the present application may further include the following steps 401 to 403.

Step 401: The electronic device displays the target control.

In the embodiment of the present application, the above-mentioned target control is used to edit the first voice message.

Optionally, in this embodiment of the present application, when the user starts making an input on the voice input control (that is, when the electronic device displays the voice input interface), the electronic device may display the target control at a preset position on the screen, so that the user can make an input on the target control to trigger the electronic device to enter a voice recording state.

Optionally, in this embodiment of the present application, when the electronic device displays the target control, the electronic device does not need to display the first voice content corresponding to the first voice message on the screen.

Step 402: The electronic device receives a third input from the user.

In the embodiment of the present application, the above-mentioned third input is the input of the user to the target control.

Optionally, in this embodiment of the present application, the user may perform sliding input after long-pressing the voice input control to slide to the position of the target control, thereby triggering the electronic device to be in a voice recording state.

Step 403: In response to the third input, the electronic device enters the voice recording state.

Optionally, in this embodiment of the present application, when the electronic device is in the voice recording state, the user can perform voice input, and the electronic device records the voice content input by the user.
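The continuous long-press-then-slide gesture (both the slide to the target control in step 402 and the slide to a preset area described for step 201) can be modeled as a small state machine. This is only a sketch: the state names, the region labels `"target_control"` and `"text_area"`, and the rule that a slide counts only while the long press is still held are assumptions drawn from the description, not a specified implementation.

```python
from enum import Enum, auto


class GestureState(Enum):
    IDLE = auto()
    RECORDING_MESSAGE = auto()     # voice control long-pressed: recording the message
    RECORDING_CORRECTION = auto()  # slid onto the target control: recording a correction
    EDITING_CONTENT = auto()       # slid onto the text display area: content to be edited


class VoiceControlGesture:
    """Hypothetical tracker for one continuous touch that starts with a long
    press on the voice input control and may continue as a slide."""

    def __init__(self) -> None:
        self.state = GestureState.IDLE

    def long_press_voice_control(self) -> None:
        if self.state is GestureState.IDLE:
            self.state = GestureState.RECORDING_MESSAGE

    def slide_to(self, region: str) -> None:
        # A slide only counts if it continues the long press without lifting,
        # i.e. while the message is still being recorded.
        if self.state is not GestureState.RECORDING_MESSAGE:
            return
        if region == "target_control":
            self.state = GestureState.RECORDING_CORRECTION
        elif region == "text_area":
            self.state = GestureState.EDITING_CONTENT
```

A slide that is not preceded by the long press leaves the tracker idle, mirroring the requirement that the two inputs form one continuous input.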

Exemplarily, with reference to FIG. 2 and as shown in FIG. 4, when the user makes an input on the voice input control, the mobile phone displays the recording interface 10, which includes the target control 12. After the user finishes inputting the first voice message (for example: "I am now on Renmin East Road, you can come and find me, let's have dinner together"), the user can make an input on the target control 12 to put the mobile phone into the voice recording state and speak again (for example: "Replace Renmin East Road with Renmin West Road"), so that the mobile phone replaces the voice content corresponding to "Renmin East Road" in the first voice message with the voice content corresponding to "Renmin West Road" according to this voice input.

Optionally, in the embodiments of the present application, before "replacing or deleting the target voice message corresponding to the target content in the first voice message" in step 203, the voice input method provided by the embodiments of the present application may further include the following step 404.

Step 404: In response to the first input, the electronic device performs semantic analysis processing on the second voice message to determine the first content and the target content.

Optionally, in this embodiment of the present application, the electronic device may perform semantic analysis on the second voice message by means of intelligent semantic analysis, so as to determine the content to be replaced in the first voice message (that is, the target content) and the content that replaces it (that is, the first content).

Exemplarily, when the user needs to modify "Renmin East Road" in the first voice message "I am now on Renmin East Road, you can come and find me, let's have dinner together", the user's voice input can be "Replace Renmin East Road with Renmin West Road", "Renmin East Road is wrong, it should be Renmin West Road", and so on, so that the electronic device replaces "Renmin East Road" in the first voice message with "Renmin West Road" and obtains the replaced first voice message "I am now on Renmin West Road, you can come and find me, let's have dinner together".
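The application leaves the semantic analysis unspecified. As a deliberately simple stand-in, the sketch below recognizes only the two phrasings from the example using regular expressions and returns (target content, first content). It is not semantic analysis; a real system would need a language-understanding component. `parse_correction` is a hypothetical name.

```python
import re
from typing import Optional, Tuple

# Only the two phrasings from the example above; a real system would use a
# semantic model instead of fixed patterns.
_PATTERNS = [
    re.compile(r"^replace (?P<target>.+?) with (?P<first>.+?)[.!]?$", re.IGNORECASE),
    re.compile(r"^(?P<target>.+?) is wrong,? it should be (?P<first>.+?)[.!]?$",
               re.IGNORECASE),
]


def parse_correction(utterance: str) -> Optional[Tuple[str, str]]:
    """Return (target content, first content) if the utterance matches a
    known correction phrasing, otherwise None."""
    text = utterance.strip()
    for pattern in _PATTERNS:
        match = pattern.match(text)
        if match:
            return match.group("target").strip(), match.group("first").strip()
    return None
```

The returned pair can drive the text update, for example `text.replace(target, first, 1)`, and, through the audio span of the target content, the corresponding voice edit.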

In the embodiments of the present application, after the electronic device puts the first voice message into the to-be-edited state, the electronic device can display a target control for editing the first voice message. The user can make an input on the target control to put the electronic device into the voice recording state, and the electronic device performs semantic analysis on the user's voice input to accurately determine the first content and the target content, so that the user can flexibly control the electronic device to perform the corresponding operation by voice.

It should be noted that, when the above step 404 is executed, the specific step of the above step 203 may be replaced by "the electronic device replaces or deletes the target voice message corresponding to the target content in the first voice message".

Optionally, in the embodiments of the present application, with reference to FIG. 1 and as shown in FIG. 5, after step 203, the voice input method provided by the embodiments of the present application may further include the following steps 501 to 503.

Step 501: The electronic device receives a fourth input from the user.

In this embodiment of the present application, the above-mentioned fourth input is an input of a fourth voice message by the user.

Optionally, in this embodiment of the present application, after the electronic device replaces the target content in the first voice message with the first content to obtain the third voice message, the user can perform voice input again to input the fourth voice message.

It should be noted that when the user inputs the fourth voice message, the first voice message and the fourth voice message can be understood as parts of one complete voice message (that is, the fifth voice message described below). While inputting this complete voice message, the user may, after finishing the first part (that is, the first voice message), pause the voice input to replace the erroneous content (that is, the target content) in the first voice message, and after the replacement is completed, continue with the second part (that is, the fourth voice message).

Step 502: In response to the fourth input, the electronic device performs combined processing on the fourth voice message and the third voice message to obtain a fifth voice message.

Optionally, in this embodiment of the present application, the electronic device may add the fourth voice message to the third voice message to obtain a complete voice message (ie, the fifth voice message).

Optionally, in this embodiment of the present application, the electronic device may perform voice splicing processing on the fourth voice message and the third voice message, so as to combine the two voice messages to obtain one voice message.

Step 503: The electronic device sends a fifth voice message including the third voice message.

Optionally, in this embodiment of the present application, if the user performs the fourth input, that is, if the electronic device receives the user's input of the fourth voice message, the voice message sent by the electronic device is the fifth voice message, which includes the third voice message; if the user does not perform the fourth input, the voice message sent by the electronic device is the third voice message.
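The rule for choosing which message is sent can be sketched as a small function. This assumes voice messages are lists of PCM samples, that `None` stands for "no fourth input", and that the fourth voice message is appended after the third (consistent with the first-part/second-part description in step 501); the name `message_to_send` is hypothetical.

```python
from typing import List, Optional

Samples = List[float]  # assumed representation: mono PCM samples


def message_to_send(third: Samples, fourth: Optional[Samples] = None) -> Samples:
    """Return the voice message to send after an edit.

    Without a fourth input, the edited (third) voice message is sent as-is.
    Otherwise the fourth voice message is appended, so the sent (fifth) voice
    message contains the third one as its leading part.
    """
    if fourth is None:
        return list(third)
    return list(third) + list(fourth)
```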

In this embodiment of the present application, before the electronic device sends the third voice message, the user may input the fourth voice message, so that the electronic device combines the fourth voice message with the third voice message to obtain the fifth voice message and then sends the fifth voice message including the third voice message. This improves the flexibility with which the electronic device sends voice messages.

It should be noted that the voice input method provided by the embodiments of the present application may be executed by a voice input device, or by a control module in the voice input device that executes the voice input method. In the embodiments of the present application, the voice input device provided herein is described by taking as an example a voice input device that executes the voice input method.

FIG. 6 shows a possible schematic structural diagram of the voice input device involved in the embodiments of the present application. As shown in FIG. 6, the voice input device 70 may include a receiving module 71, a display module 72, and a processing module 73.

The receiving module 71 is configured to receive the first voice message input by the user. The display module 72 is configured to display the first voice content corresponding to the first voice message. The receiving module 71 is further configured to receive a user's first input of the first content, where the first content is content corresponding to the target content in the first voice content. The processing module 73 is configured to replace or delete the target voice message corresponding to the target content in the first voice message in response to the first input received by the receiving module 71 .
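The division of work among the three modules can be sketched as below. The sketch assumes a speech recognizer (not shown, and not specified by the application) that returns word-level timestamps, so that each displayed word maps to a span of samples. The class names, `AlignedWord`, and the sample-index representation are illustrative assumptions, not the actual internals of voice input device 70.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AlignedWord:
    """One recognised word and the sample span [start, end) it occupies."""
    text: str
    start: int
    end: int


@dataclass
class ReceivingModule:
    """Stores the first voice message and its (assumed) word alignment."""
    samples: List[float] = field(default_factory=list)
    words: List[AlignedWord] = field(default_factory=list)

    def receive(self, samples: List[float], words: List[AlignedWord]) -> None:
        self.samples = list(samples)
        self.words = list(words)


class DisplayModule:
    """Renders the first voice content as text for the user to check."""

    @staticmethod
    def render(words: List[AlignedWord]) -> str:
        return " ".join(w.text for w in words)


class ProcessingModule:
    """Replaces or deletes the target voice message inside the first one."""

    @staticmethod
    def apply(samples: List[float], start: int, end: int,
              replacement: Optional[List[float]] = None) -> List[float]:
        # replacement=None deletes samples[start:end]; otherwise that span is
        # swapped for the replacement samples (e.g. the second voice message).
        inserted = [] if replacement is None else list(replacement)
        return samples[:start] + inserted + samples[end:]
```

After a replacement of a different length, the timestamps of words after the edited span shift; a fuller implementation would re-align or re-recognise the edited message before further edits.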

In a possible implementation manner, the voice input apparatus 70 provided in this embodiment of the present application may further include a determining module. The receiving module 71 is further configured to receive a second input from the user after the display module 72 displays the first voice content corresponding to the first voice message, where the second input is the user's selection input on the target content. The determining module is configured to determine the target voice message according to the target content in response to the second input received by the receiving module 71. The receiving module 71 is specifically configured to receive a second voice message input by the user, where the second voice message is a voice message corresponding to the first content. The processing module 73 is specifically configured to, according to the second voice message, replace the target voice message in the first voice message with the second voice message, or delete the target voice message from the first voice message, to obtain a third voice message.
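The determining module's job, turning the user's selection on the displayed text into the target voice message, can be sketched as a lookup from character offsets to sample offsets. It assumes the displayed text is the recognised words joined by single spaces and that each word carries a sample span from the recognizer; `AlignedWord` and `target_span` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AlignedWord:
    text: str
    start: int  # first sample of the word
    end: int    # one past the last sample of the word


def target_span(words: List[AlignedWord], sel_start: int, sel_end: int) -> Tuple[int, int]:
    """Map a character selection [sel_start, sel_end) in the displayed text
    (words joined by single spaces) to the sample span of the words it touches."""
    if sel_start >= sel_end:
        raise ValueError("empty selection")
    hit = []
    offset = 0
    for w in words:
        w_from, w_to = offset, offset + len(w.text)
        if w_from < sel_end and sel_start < w_to:  # character ranges overlap
            hit.append(w)
        offset = w_to + 1  # skip the separating space
    if not hit:
        raise ValueError("selection does not cover any word")
    return hit[0].start, hit[-1].end
```

Snapping to whole-word boundaries means a partial selection (for example two letters inside a word) still yields that word's complete audio, which avoids cutting a word in half; whether the device described here snaps to words is not stated in the application.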

In a possible implementation manner, the voice input device 70 provided in this embodiment of the present application may further include: a control module. The display module 72 is further configured to display a target control after displaying the first voice content corresponding to the first voice message, where the target control is used to edit the first voice message. The receiving module 71 is further configured to receive a user's third input, where the third input is the user's input to the target control. The control module is configured to control the voice input device to be in a voice recording state in response to the third input received by the receiving module 71 .
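The interaction with the target control can be sketched as a small state machine. The state names, the `VoiceInputController` class, and the rule that the target control only works while the first voice content is displayed are assumptions made for illustration.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()        # nothing recorded yet
    REVIEWING = auto()   # first voice content displayed, target control visible
    RECORDING = auto()   # voice recording state entered via the target control


class VoiceInputController:
    """Hypothetical controller for the target-control flow."""

    def __init__(self) -> None:
        self.state = State.IDLE

    def first_message_recorded(self) -> None:
        # After the first voice message is received, its text is displayed
        # together with the target control.
        self.state = State.REVIEWING

    def on_target_control_tapped(self) -> None:
        # Third input: only accepted while the first voice content is shown.
        if self.state is not State.REVIEWING:
            raise RuntimeError(f"target control not available in state {self.state.name}")
        self.state = State.RECORDING

    def recording_finished(self) -> None:
        # Return to review so the edited content can be checked again.
        if self.state is State.RECORDING:
            self.state = State.REVIEWING
```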

In a possible implementation manner, the processing module 73 is further configured to, before replacing or deleting the target voice message corresponding to the target content in the first voice message, perform semantic analysis processing on the second voice message to determine the first content and the target content.
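The application does not say how the semantic analysis is performed. As a deliberately simple stand-in, assume the second voice message has already been transcribed and that the user phrases corrections as "change X to Y" or "delete X"; a pattern match then yields the target content X and the first content Y (empty for a deletion). The phrasings, the regular expressions, and `parse_correction` are assumptions for illustration; a real system would more likely rely on a language-understanding model.

```python
import re
from typing import Optional, Tuple

# Assumed correction phrasings; the application does not specify any.
_REPLACE = re.compile(r"^(?:change|replace)\s+(.+?)\s+(?:to|with)\s+(.+)$", re.IGNORECASE)
_DELETE = re.compile(r"^(?:delete|remove)\s+(.+)$", re.IGNORECASE)


def parse_correction(transcript: str) -> Optional[Tuple[str, str]]:
    """Return (target_content, first_content) from a transcribed correction.

    first_content is "" when the user asks for a deletion; None means the
    utterance was not recognised as a correction.
    """
    text = transcript.strip().rstrip(".")
    m = _REPLACE.match(text)
    if m:
        return m.group(1), m.group(2)
    m = _DELETE.match(text)
    if m:
        return m.group(1), ""
    return None
```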

In a possible implementation manner, the voice input apparatus 70 provided in this embodiment of the present application may further include a sending module. The receiving module 71 is further configured to receive a fourth input from the user after the processing module 73 replaces or deletes the target voice message corresponding to the target content in the first voice message, where the fourth input is the user's input of a fourth voice message. The processing module 73 is further configured to combine the fourth voice message and the third voice message in response to the fourth input received by the receiving module 71, to obtain a fifth voice message. The sending module is configured to send the fifth voice message including the third voice message.

The voice input device provided in the embodiments of the present application can implement each process implemented by the voice input device in the foregoing method embodiments. To avoid repetition, the detailed description is not repeated here.

An embodiment of the present application provides a voice input device. When a user inputs a first voice message to be sent, the voice input device can display the first voice content corresponding to the first voice message in real time. Therefore, when the target content in the first voice content is wrong, the user can input the first content corresponding to the target content, so that the voice input device replaces or deletes the erroneous target voice message in the first voice message. The user does not need to re-input the first voice message, which simplifies the user's operation, and the displayed first voice content makes it easy for the user to see the wrong content intuitively and correct it in time, thereby improving the efficiency with which the voice input device edits messages.

The voice input device in this embodiment of the present application may be an apparatus, or may be a component, an integrated circuit, or a chip in a terminal. The apparatus may be a mobile electronic device or a non-mobile electronic device. For example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, an in-vehicle electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA); the non-mobile electronic device may be a server, network attached storage (NAS), a personal computer (PC), a television (TV), a teller machine, a self-service machine, or the like. This is not specifically limited in the embodiments of the present application.

The voice input device in this embodiment of the present application may be a device with an operating system. The operating system may be an Android operating system, an iOS operating system, or another possible operating system, which is not specifically limited in the embodiments of the present application.

Optionally, an embodiment of the present application further provides an electronic device, including a processor 110, a memory 109, and a program or instruction stored in the memory 109 and executable on the processor 110. When the program or instruction is executed by the processor 110, each process of the above voice input method embodiments is implemented, and the same technical effect can be achieved. To avoid repetition, details are not described here.

It should be noted that the electronic devices in the embodiments of the present application include the aforementioned mobile electronic devices and non-mobile electronic devices.

FIG. 7 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.

The electronic device 100 includes but is not limited to components such as a radio frequency unit 101, a network module 102, an audio output unit 103, an input unit 104, a sensor 105, a display unit 106, a user input unit 107, an interface unit 108, a memory 109, and a processor 110.

Those skilled in the art can understand that the electronic device 100 may further include a power supply (such as a battery) that supplies power to the components. The power supply may be logically connected to the processor 110 through a power management system, so that functions such as charging management, discharging management, and power consumption management are implemented through the power management system. The structure of the electronic device shown in FIG. 7 does not constitute a limitation on the electronic device; the electronic device may include more or fewer components than shown, combine some components, or use a different component arrangement, which is not repeated here.

The user input unit 107 is configured to receive the first voice message input by the user.

The display unit 106 is configured to display the first voice content corresponding to the first voice message.

The processor 110 is configured to, in response to the first input, replace or delete the target voice message corresponding to the target content in the first voice message.

An embodiment of the present application provides an electronic device. When a user inputs a first voice message to be sent, the electronic device can display the first voice content corresponding to the first voice message in real time. Therefore, when the target content in the first voice content is wrong, the user can input the first content corresponding to the target content, so that the electronic device replaces or deletes the erroneous target voice message in the first voice message. The user does not need to re-input the first voice message, which simplifies the user's operation, and the displayed first voice content makes it easy for the user to see the wrong content intuitively and correct it in time, thereby improving the efficiency with which the electronic device edits messages.

Optionally, the user input unit 107 is further configured to receive a second input from the user, where the second input is the user's selection input on the target content in the first voice content.

The processor 110 is further configured to, in response to the second input, determine the target voice message according to the target content.

The user input unit 107 is specifically configured to receive a second voice message input by a user, where the second voice message is a voice message corresponding to the first content.

The processor 110 is specifically configured to, according to the second voice message, replace the target voice message in the first voice message with the second voice message, or delete the target voice message from the first voice message, to obtain a third voice message.

The display unit 106 is further configured to display a target control, where the target control is used to edit the first voice message.

The user input unit 107 is further configured to receive a user's third input, where the third input is the user's input to the target control.

The processor 110 is further configured to control the electronic device to be in a voice recording state in response to the third input.

The processor 110 is further configured to perform semantic analysis processing on the second voice message to determine the first content and the target content.

The user input unit 107 is further configured to receive a fourth input from a user, where the fourth input is an input of a fourth voice message by the user.

The processor 110 is further configured to, in response to the fourth input, perform combined processing on the fourth voice message and the third voice message to obtain a fifth voice message.

The network module 102 is configured to send a fifth voice message including a third voice message.

It should be understood that, in this embodiment of the present application, the input unit 104 may include a graphics processing unit (GPU) 1041 and a microphone 1042. The graphics processing unit 1041 processes image data of still pictures or video obtained by an image capture apparatus (such as a camera). The display unit 106 may include a display panel 1061, which may be configured in the form of a liquid crystal display, an organic light-emitting diode display, or the like. The user input unit 107 includes a touch panel 1071 and other input devices 1072. The touch panel 1071, also called a touch screen, may include two parts: a touch detection apparatus and a touch controller. The other input devices 1072 may include, but are not limited to, a physical keyboard, function keys (such as volume control keys and on/off keys), a trackball, a mouse, and a joystick, which are not described herein again. The memory 109 may be configured to store software programs and various data, including but not limited to application programs and an operating system. The processor 110 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may alternatively not be integrated into the processor 110.

An embodiment of the present application further provides a readable storage medium storing a program or an instruction. When the program or instruction is executed by a processor, each process of the above voice input method embodiments is implemented, and the same technical effect can be achieved. To avoid repetition, details are not repeated here.

Wherein, the processor is the processor in the electronic device described in the foregoing embodiments. The readable storage medium includes a computer-readable storage medium, such as a computer read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk or an optical disk, and the like.

An embodiment of the present application further provides a chip. The chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run a program or an instruction to implement each process of the above voice input method embodiments, achieving the same technical effect. To avoid repetition, details are not repeated here.

It should be understood that the chip mentioned in the embodiments of the present application may also be referred to as a system-level chip, a system chip, a chip system, a system-on-a-chip, or the like.

It should be noted that, in this specification, the terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element preceded by "includes a..." does not preclude the presence of additional identical elements in the process, method, article, or apparatus that includes the element. In addition, it should be noted that the scope of the methods and apparatuses in the embodiments of the present application is not limited to performing functions in the order shown or discussed; functions may also be performed substantially simultaneously or in reverse order depending on the functions involved. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. Additionally, features described with reference to some examples may be combined in other examples.

From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the preferred implementation. Based on such an understanding, the technical solutions of the present application essentially, or the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for instructing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present application.

The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the specific embodiments described above, which are merely illustrative rather than restrictive. Inspired by the present application, persons of ordinary skill in the art can make many other forms without departing from the purpose of the present application and the scope of protection of the claims, all of which fall within the protection of the present application.

Claims

A voice input method, the method comprising:

receiving the first voice message input by the user, and displaying the first voice content corresponding to the first voice message;

receiving a user's first input of first content, where the first content is content corresponding to target content in the first voice content;

in response to the first input, replacing or deleting a target voice message corresponding to the target content in the first voice message.
The method according to claim 1, wherein after the displaying the first voice content corresponding to the first voice message, the method further comprises:

receiving a second input from the user, where the second input is the user's selection input on the target content;

determining the target voice message according to the target content in response to the second input;

The receiving the first input of the first content by the user includes:

receiving a second voice message input by a user, where the second voice message is a voice message corresponding to the first content;

The replacing or deleting the target voice message corresponding to the target content in the first voice message includes:

according to the second voice message, replacing the target voice message in the first voice message with the second voice message, or deleting the target voice message from the first voice message, to obtain a third voice message.
The method according to claim 1, wherein after the displaying the first voice content corresponding to the first voice message, the method further comprises:

displaying a target control, where the target control is used to edit the first voice message;

receiving a user's third input, where the third input is the user's input to the target control;

in response to the third input, controlling the electronic device to be in a voice recording state.
The method according to claim 2, wherein before replacing or deleting the target voice message corresponding to the target content in the first voice message, the method further comprises:

performing semantic analysis processing on the second voice message to determine the first content and the target content.
The method according to any one of claims 1 to 4, wherein after replacing or deleting the target voice message corresponding to the target content in the first voice message, the method further comprises:

receiving a fourth input from the user, where the fourth input is the user's input on a fourth voice message;

In response to the fourth input, combining the fourth voice message and the third voice message to obtain a fifth voice message;

sending the fifth voice message including the third voice message.
A voice input device comprising: a receiving module, a display module and a processing module;

The receiving module is configured to receive the first voice message input by the user;

the display module, configured to display the first voice content corresponding to the first voice message;

The receiving module is further configured to receive a user's first input of first content, where the first content is content corresponding to the target content in the first voice content;

The processing module is configured to replace or delete the target voice message corresponding to the target content in the first voice message in response to the first input received by the receiving module.
The voice input device according to claim 6, wherein the voice input device further comprises: a determination module;

The receiving module is further configured to receive a second input from the user after the display module displays the first voice content corresponding to the first voice message, where the second input is the user's selection input on the target content;

the determining module, configured to determine the target voice message according to the target content in response to the second input received by the receiving module;

The receiving module is specifically configured to receive a second voice message input by a user, where the second voice message is a voice message corresponding to the first content;

The processing module is specifically configured to, according to the second voice message, replace the target voice message in the first voice message with the second voice message, or delete the target voice message from the first voice message, to obtain a third voice message.
The voice input device according to claim 6, wherein the voice input device further comprises: a control module;

The display module is also used to display a target control after displaying the first voice content corresponding to the first voice message, and the target control is used to edit the first voice message;

The receiving module is further configured to receive a user's third input, where the third input is the user's input to the target control;

The control module is configured to control the voice input device to be in a voice recording state in response to the third input received by the receiving module.
The voice input device according to claim 7, wherein the processing module is further configured to, before replacing or deleting the target voice message corresponding to the target content in the first voice message, perform semantic analysis processing on the second voice message to determine the first content and the target content.
The voice input device according to any one of claims 6 to 9, wherein the voice input device further comprises: a sending module;

The receiving module is further configured to receive a fourth input from the user after the processing module replaces or deletes the target voice message corresponding to the target content in the first voice message, where the fourth input is the user's input of a fourth voice message;

The processing module is further configured to perform combined processing on the fourth voice message and the third voice message in response to the fourth input received by the receiving module to obtain a fifth voice message;

The sending module is configured to send the fifth voice message including the third voice message.
An electronic device, comprising a processor, a memory, and a program or instruction stored on the memory and executable on the processor, wherein the program or instruction, when executed by the processor, implements the steps of the voice input method according to any one of claims 1 to 5.
A readable storage medium on which programs or instructions are stored, and when the programs or instructions are executed by a processor, implement the steps of the voice input method according to any one of claims 1-5.
A computer software product executed by at least one processor to implement the speech input method of any one of claims 1 to 5.
An electronic device, configured to perform the voice input method according to any one of claims 1 to 5.
A chip, comprising a processor and a communication interface, wherein the communication interface is coupled to the processor, and the processor is configured to run a program or an instruction to implement the voice input method according to any one of claims 1 to 5.