CN116959406A - Voice message processing method and device, electronic equipment and storage medium - Google Patents

Voice message processing method and device, electronic equipment and storage medium

Info

Publication number
CN116959406A
CN116959406A CN202310739963.2A
Authority
CN
China
Prior art keywords
target
voice
insert
input
voice message
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310739963.2A
Other languages
Chinese (zh)
Inventor
严超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd filed Critical Vivo Mobile Communication Co Ltd
Priority to CN202310739963.2A
Publication of CN116959406A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00: Speech synthesis; Text to speech systems
    • G10L 13/02: Methods for producing synthetic speech; Speech synthesisers
    • G10L 13/033: Voice editing, e.g. manipulating the voice of the synthesiser
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00: Speech synthesis; Text to speech systems
    • G10L 13/08: Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 51/00: User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L 51/04: Real-time or near real-time messaging, e.g. instant messaging [IM]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses a voice message processing method, a voice message processing device, electronic equipment and a storage medium, and belongs to the technical field of computers. The method comprises the following steps: receiving a first input in the case of recording a target voice message; displaying a target insert at a first location of a target voice message in response to a first input; the target insert is used for marking a target voice segment corresponding to the first position and a voice editing type of the target voice segment.

Description

Voice message processing method and device, electronic equipment and storage medium
Technical Field
The application belongs to the technical field of computers, and particularly relates to a voice message processing method, a voice message processing device, electronic equipment and a storage medium.
Background
Currently, a user may communicate with other users by sending them voice messages through an electronic device in an instant chat interface. In the related art, after the user finishes recording a voice message, if the user is not satisfied with it, the user can edit the voice message. However, when editing a voice message, the user needs to repeatedly play it back to locate the voice clip to be edited, resulting in low editing efficiency of the voice message.
Disclosure of Invention
The embodiment of the application aims to provide a voice message processing method, a voice message processing device, electronic equipment and a storage medium, which can improve the editing efficiency of voice messages.
In a first aspect, an embodiment of the present application provides a voice message processing method, where the voice message processing method includes: receiving a first input in the case of recording a target voice message; displaying a target insert at a first location of a target voice message in response to a first input; the target insert is used for marking a target voice segment corresponding to the first position and a voice editing type of the target voice segment.
In a second aspect, an embodiment of the present application provides a voice message processing apparatus, including: a receiving module and a display module. And the receiving module is used for receiving a first input under the condition of recording the target voice message. A display module for displaying a target insert at a first location of a target voice message in response to the first input received by the receiving module; the target insert is used for marking a target voice segment corresponding to the first position and a voice editing type of the target voice segment.
In a third aspect, an embodiment of the present application provides an electronic device comprising a processor and a memory storing a program or instructions executable on the processor, which when executed by the processor, implement the steps of the method as described in the first aspect.
In a fourth aspect, embodiments of the present application provide a readable storage medium having stored thereon a program or instructions which when executed by a processor perform the steps of the method according to the first aspect.
In a fifth aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and where the processor is configured to execute a program or instructions to implement a method according to the first aspect.
In a sixth aspect, embodiments of the present application provide a computer program product stored in a storage medium, the program product being executable by at least one processor to implement the method according to the first aspect.
In some embodiments of the present application, in the case of recording a target voice message, the electronic device may respond to the first input to display the target insert at the first location of the target voice message, where, on one hand, the target insert is used to mark the target voice segment corresponding to the first location, so that when the user edits the target voice message, the user may directly locate the voice segment to be edited, and on the other hand, since the target insert is also used to mark the voice editing type of the target voice segment, the user may directly invoke the voice editing function corresponding to the voice editing type, thereby improving the editing efficiency of the voice message.
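The claimed flow can be summarized in a minimal sketch (in Python, with illustrative names; the patent does not specify any implementation): a first input received while a message is still recording places a target insert at a first position, and the insert carries both the position of the target voice segment and its voice editing type.

```python
# Hypothetical sketch of the claimed flow; all names are illustrative.
from dataclasses import dataclass, field

@dataclass
class Insert:
    position_ms: int   # first position on the recording time axis
    edit_type: str     # voice editing type, e.g. "text_annotation" or "clip_edit"

@dataclass
class VoiceMessage:
    recording: bool = True
    inserts: list = field(default_factory=list)

    def handle_first_input(self, position_ms: int, edit_type: str) -> Insert:
        # Inserts are placed while the message is still being recorded.
        if not self.recording:
            raise RuntimeError("recording has ended")
        marker = Insert(position_ms, edit_type)
        self.inserts.append(marker)
        return marker

msg = VoiceMessage()
msg.handle_first_input(3000, "clip_edit")  # first input at the 3 s mark
```

Because the insert already records the segment position and edit type, no playback pass is needed to relocate the segment after recording ends.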
Drawings
FIG. 1 is a flowchart of a voice message processing method according to an embodiment of the present application;
FIG. 2 is a second flowchart of a voice message processing method according to an embodiment of the present application;
FIG. 3 is one example diagram of an editing interface for a target insert provided by an embodiment of the present application;
FIG. 4 is a third flowchart of a voice message processing method according to an embodiment of the present application;
FIG. 5 is one example diagram of a display interface for a target insert provided by an embodiment of the present application;
FIG. 6 is a second exemplary diagram of an editing interface for a target insert provided by an embodiment of the present application;
FIG. 7 is a third exemplary diagram of an editing interface for a target insert provided by an embodiment of the present application;
FIG. 8 is a fourth exemplary diagram of an editing interface for a target insert provided by an embodiment of the present application;
fig. 9 is a schematic structural diagram of a voice message processing apparatus according to an embodiment of the present application;
fig. 10 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application;
fig. 11 is a second schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in some embodiments of the present application will be clearly described below with reference to the accompanying drawings in some embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which are obtained by a person skilled in the art based on the embodiments of the present application, fall within the scope of protection of the present application.
The terms first, second and the like in the description and in the claims, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances such that the embodiments of the application are capable of operation in sequences other than those illustrated or otherwise described herein, and that the objects identified by "first," "second," etc. are generally of a type not limited to the number of objects, for example, the first object may be one or more. Furthermore, in the description and claims, "and/or" means at least one of the connected objects, and the character "/", generally means that the associated object is an "or" relationship.
The voice message processing method provided by the embodiment of the application is described in detail below through specific embodiments and application scenarios thereof with reference to the accompanying drawings.
The voice message processing method provided by the embodiment of the application can be applied to a voice sending scene.
Illustratively, consider the scenario in which user A sends a voice message to user B in an instant messaging chat interface. In the related art, the instant messaging chat interface may include a record-voice-message control and a send-voice-message control. User A may perform a long-press input on the record-voice-message control and speak, so that the electronic device records the voice output by the user; user A may then release the record-voice-message control, so that the electronic device ends the recording, obtains a voice message, and displays the voice message in the instant messaging chat interface. Further, user A may perform an input on the voice message, so that the electronic device displays a voice editing interface corresponding to the voice message, where the voice editing interface includes at least one editing control. In the voice editing interface, user A may perform a long-press input on the voice segment to be edited, and then perform an input on any one of the editing controls, so that the electronic device performs editing processing on the voice message, for example deleting a voice segment, to obtain an edited voice message, which user A may then send to user B. In this flow, the voice segment to be edited can only be located after recording ends, by repeated playback, so editing a voice message is time-consuming and inefficient.
In some embodiments of the present application, during recording of a voice message, the electronic device may, in response to a first input, display a target insert at a first position of the voice message being recorded. The target insert is used to mark the target voice clip corresponding to the first position and the voice editing type of that clip. Therefore, when recording ends, the electronic device can edit the recorded voice message according to the editing type corresponding to the target insert and the insertion position of the target insert in the recorded voice message, to obtain an edited voice message.
The execution body of the voice message processing method provided by the embodiment of the application can be a voice message processing device, and the voice message processing device can be an electronic device or a functional module in the electronic device. The technical solution provided by the embodiment of the present application is described below by taking an electronic device as an example.
An embodiment of the present application provides a method for processing a voice message, and fig. 1 shows a flowchart of a method for processing a voice message provided by an embodiment of the present application. As shown in fig. 1, the voice message processing method provided in the embodiment of the present application may include the following steps 201 and 202.
Step 201, the electronic device receives a first input in case of recording a target voice message.
Alternatively, in some embodiments of the present application, the first input may be any one of the following: click input, slide input, long press input, preset track input, and the like. The specific form of the first input may be determined according to actual use requirements, which is not limited in the embodiments of the application.
It can be understood that the target voice message is a voice message in recording, that is, a voice message being recorded by the electronic device when the electronic device does not end recording.
For example, assuming that the voice message currently being recorded by the electronic device is "Hello", the target voice message is "Hello"; if the electronic device then continues recording "Beijing", the target voice message is "Hello, Beijing".
In some embodiments of the present application, during recording of the target voice message, the electronic device may display a voice recording interface that includes the target voice message information.
It will be appreciated that since the target voice message is in the process of being recorded, the electronic device can display the information of the target voice message in the voice recording interface in real time.
Optionally, in some embodiments of the present application, the target voice message information may include at least one of: duration information of the target voice message, semantic information of the target voice message, and a time axis corresponding to the target voice message.
Optionally, in some embodiments of the present application, before receiving the first input, the electronic device may receive an input from the user on a first recording control, and in response, start recording the target voice message and simultaneously display the voice recording interface.
In some embodiments of the present application, the electronic device may display a voice recording interface during recording the target voice message, so that a user may conveniently and quickly edit the target voice message in the voice recording interface.
Optionally, in some embodiments of the present application, the electronic device may start recording the target voice message by one click input of the recording control by the user, and then the electronic device may receive one click input of the recording control by the user again, and end recording the target voice message.
Step 202, the electronic device displays a target insert in a first location of the target voice message in response to the first input, where the target insert is used to mark a target voice segment corresponding to the first location, and a voice editing type of the target voice segment.
For example, the electronic device may display the target insert at a first location of a timeline in the voice recording interface described above.
It will be appreciated that the first position is determined by the first input.
Alternatively, in some embodiments of the present application, the number of target inserts may be one or more.
Alternatively, in some embodiments of the present application, the number of target speech segments may be one or more.
Optionally, in some embodiments of the present application, the voice edit type includes at least one of: text comment type and voice clip edit type.
Alternatively, in some embodiments of the present application, the target insert may be determined by the electronic device according to semantic information of the target voice message.
For example, the electronic device may analyze the semantics of the target voice message through a target algorithm to obtain semantic information, divide the target voice message into L audio segments according to the semantic information, insert a target insert at the end position of each audio segment, and display the target insert corresponding to each audio segment on the time axis, where L is a positive integer.
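The semantic segmentation step can be sketched as follows (a toy illustration only: the patent leaves the "target algorithm" unspecified, so a sentence-boundary split over a timed transcript stands in for it, and all names are hypothetical):

```python
# Toy stand-in for the unspecified target algorithm: split a timed
# transcript into L audio segments at crude semantic boundaries and
# place an insert at the end position of each segment.
def place_inserts_by_semantics(transcript_with_times):
    """transcript_with_times: list of (word, end_time_ms) pairs."""
    insert_positions = []
    for word, end_ms in transcript_with_times:
        if word.endswith((".", "?", "!")):   # crude semantic boundary
            insert_positions.append(end_ms)  # insert at the segment's end
    return insert_positions

words = [("hello", 400), ("there.", 900), ("meet", 1300),
         ("at", 1500), ("noon.", 2100)]
print(place_inserts_by_semantics(words))  # → [900, 2100]
```

The returned positions would then be rendered as the L target inserts on the time axis.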
Optionally, in some embodiments of the present application, after the electronic device displays the L target inserts on the time axis, the electronic device may adjust the positions of one or more of the L target inserts on the time axis based on user input on those target inserts. At this time, the electronic device may record the operation as a reference parameter and further optimize the target algorithm through machine learning, so as to match the user's personal habits and characteristics.
In some embodiments of the present application, the target algorithm may be any one of the following: artificial intelligence (Artificial Intelligence, AI) algorithms or deep learning neural network algorithms.
In the voice message processing method provided by the application, under the condition of recording the target voice message, the electronic equipment can respond to the first input to display the target insert at the first position of the target voice message, on one hand, the target insert is used for marking the target voice fragment corresponding to the first position, so that when the user edits the target voice message, the user can directly position the voice fragment to be edited, on the other hand, the target insert is also used for marking the voice editing type of the target voice fragment, so that the user can directly call the voice editing function corresponding to the voice editing type, and the editing efficiency of the voice message is improved.
Alternatively, in some embodiments of the present application, as shown in fig. 2 in conjunction with fig. 1, the above step 201 may be specifically implemented by the following steps 201a and 201 b.
In step 201a, when the target voice message is recorded, the electronic device displays the insert control area.
Optionally, in some embodiments of the present application, an insert control area is included in the voice recording interface.
Optionally, in some embodiments of the present application, the insert control area includes at least two insert controls and at least one insert delete control.
Step 201b, the electronic device receives a first input to the insert control area.
In some embodiments of the present application, when the electronic device receives user input to a target insert control of the at least two insert controls, the electronic device may display a target insert corresponding to the target insert control at a first location on a timeline corresponding to the target voice message.
In some embodiments of the application, when the electronic device receives user input to the insert deletion control, the electronic device may delete one or more target inserts displayed on the timeline.
Illustratively, as shown in fig. 3, the above-mentioned insert deletion control includes a cancel control 10 and a delete control 11, and a target insert 12, a target insert 13, and a target insert 14 are displayed on the time axis. The user may make a click input on the target insert 13 and then click the delete control 11, so that the electronic device deletes the target insert 13 from the time axis; alternatively, the user may directly make a click input on the delete control 11, so that the electronic device deletes all the target inserts displayed on the time axis; alternatively, the user may make an input on the cancel control 10, so that the electronic device deletes the most recently displayed target insert on the time axis. It should be noted that fig. 3 only shows one of these solutions.
In some embodiments of the application, while displaying the insert control area, the electronic device may respond to an input on an insert control in the insert control area, or respond to an input on the insert deletion control, and perform different processing accordingly, so that the electronic device can flexibly and intuitively process the target voice message through the insert control area.
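The three deletion behaviours described for the insert control area can be sketched together (illustrative Python, with assumed names; the positions 12, 13, 14 mirror the labels used in the fig. 3 example):

```python
# Sketch of the insert control area's deletion behaviours: cancel the
# most recent insert, delete a selected insert, or delete all inserts.
class Timeline:
    def __init__(self):
        self.inserts = []          # insert positions/labels on the time axis

    def add(self, insert):
        self.inserts.append(insert)

    def cancel_last(self):         # cancel control: undo most recent insert
        if self.inserts:
            self.inserts.pop()

    def delete(self, insert):      # delete control on a selected insert
        self.inserts = [i for i in self.inserts if i != insert]

    def delete_all(self):          # delete control with nothing selected
        self.inserts.clear()

tl = Timeline()
for label in (12, 13, 14):
    tl.add(label)
tl.delete(13)       # click insert 13, then the delete control
tl.cancel_last()    # cancel control removes the most recent insert (14)
```

After these two operations only insert 12 remains on the timeline, matching the per-insert and undo behaviours described above.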
Optionally, in some embodiments of the present application, the above-mentioned insert control area includes at least two insert controls, the at least two insert controls corresponding to different insert types and voice editing types, respectively.
Illustratively, as shown in fig. 4 in conjunction with fig. 2, the above step 201b may be implemented specifically by the following step 201b1, and the above step 202 may be implemented specifically by the following step 202 a.
Step 201b1, the electronic device receives a first input to a target insert control of the at least two insert controls.
Optionally, in some embodiments of the present application, the insert control area further includes an insert control, and the first input may be an input to the target insert control and the insert control.
In step 202a, the electronic device displays, in response to the first input, a target insert corresponding to the target insert control at a first location of the target voice message.
In some embodiments of the present application, the target voice segment corresponding to the first position of the target insert mark is determined according to an insert type corresponding to the target insert control, where a voice edit type of the target insert mark is a voice edit type corresponding to the target insert control.
Optionally, in some embodiments of the present application, the voice edit type corresponding to the target insert control may include at least one of the following: the method comprises the steps of customizing a voice fragment editing type, a preceding voice fragment editing type, a subsequent voice fragment editing type, a custom voice fragment annotation type, a preceding voice fragment annotation type and a subsequent voice fragment annotation type.
Alternatively, in some embodiments of the present application, the first position may be a preset position.
For example, the first input includes a first sub-input and a second sub-input: the user may perform the first sub-input on the target insert control, so that the electronic device determines the target insert corresponding to the target insert control, and then perform the second sub-input on the insert control, so that the electronic device displays the target insert at the end position of the target voice message being recorded.
Optionally, in some embodiments of the application, the first position is determined by a termination input position of the first input.
For example, the first input includes a first sub-input and a second sub-input: the user may perform a long-press input on the target insert control so that it enters a movable state, and then drag the target insert control onto the time axis, so that the electronic device displays the target insert corresponding to the target insert control at the end position of the drag.
Illustratively, as shown in fig. 5, the voice recording interface 15 includes: a recording control 16, a voice message recording timeline 11, a custom voice clip insert control 17, a preamble voice clip insert control 18, a post voice clip insert control 19, a custom voice clip annotation insert control 20, a preamble voice clip annotation insert control 21, and a post voice clip annotation insert control 22. For example, the user may perform a drag input on the custom voice clip insert control 17, so that the electronic device displays the custom voice clip insert 171 on the voice message recording timeline 11.
In some embodiments of the present application, the electronic device may flexibly and intuitively add the target insert through the insert control area, so that flexibility of the electronic device in processing the target voice message is improved.
Alternatively, in some embodiments of the present application, the above step 202 may be specifically implemented by the following steps 202b and 202 c.
Step 202b, the electronic device responds to the first input and determines the first position of the target voice message according to the input time of the first input.
It will be appreciated that the time axis includes at least one timestamp, and the electronic device may determine, based on the first input, the timestamp corresponding to the first input on the voice message recording time axis, where that timestamp corresponds to the first position.
Step 202c, the electronic device displays the target insert at the first position.
In some embodiments of the application, the electronic device may display the target insert on a timestamp corresponding to the first input.
In some embodiments of the present application, the electronic device may determine the first location based on a timestamp on a voice message recording timeline, such that the electronic device may display the target insert at the first location to prompt the user for the location of the target insert in the target voice message.
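Mapping the input time to a timestamp on the recording timeline can be sketched as follows (an assumed nearest-timestamp rule; the patent does not specify how the input time is snapped to the axis):

```python
# Sketch: snap the first input's time to the nearest timestamp on the
# voice-message recording time axis to obtain the first position.
def first_position(input_time_ms, timestamps_ms):
    return min(timestamps_ms, key=lambda t: abs(t - input_time_ms))

stamps = [0, 1000, 2000, 3000]
print(first_position(2340, stamps))  # → 2000
```

The target insert is then displayed on the timestamp that `first_position` returns.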
Optionally, in some embodiments of the present application, the voice message processing method provided by the present application further includes the following step 301.
Step 301, the electronic device responds to the first input, and determines a target voice segment corresponding to the first position of the target insert mark and a voice editing type of the target insert mark according to an input mode of the first input.
In some embodiments of the present application, the electronic device may display a functional area in the voice recording interface in response to the first input, the functional area being for determining a target voice clip corresponding to the first location of the target insert mark and a voice edit type of the target insert mark.
For example, the functional area may integrate multiple editing functions: for the same target insert control, the user sliding left triggers the addition of a preamble control insert, sliding right triggers the addition of a subsequent control insert, sliding up triggers the addition of a text insert, and sliding down triggers the addition of a voice insert.
It will be appreciated that the voice edit type of the target insert marker described above is associated with the target voice segment corresponding to the first location of the target insert marker.
For example, assuming that the voice edit type of the target insert tag is a preamble control insert, the electronic device determines the target voice clip to be a voice clip located within a preset time period before the target insert based on the preamble control voice edit type; assuming that the voice edit type of the target insert tag is a subsequent control insert, the electronic device determines the target voice segment to be a voice segment located within a preset time period after the target insert based on the subsequent control voice edit type.
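The preset-duration rule in this example can be written out directly (illustrative Python; the 2-second preset and 10-second message length are assumed values, not from the source):

```python
# Sketch: derive the target voice segment from the insert's position and
# its voice editing type. A preamble insert marks the preset duration
# before it; a subsequent insert marks the preset duration after it.
def target_segment(insert_ms, edit_type, preset_ms=2000, total_ms=10000):
    if edit_type == "preamble":
        return (max(0, insert_ms - preset_ms), insert_ms)
    if edit_type == "subsequent":
        return (insert_ms, min(total_ms, insert_ms + preset_ms))
    raise ValueError(f"unknown edit type: {edit_type}")

print(target_segment(5000, "preamble"))    # → (3000, 5000)
print(target_segment(5000, "subsequent"))  # → (5000, 7000)
```

Clamping to the message boundaries keeps the segment valid when the insert sits near the start or end of the recording.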
Optionally, in some embodiments of the present application, the target voice segment corresponding to the first location and the voice edit type of the target voice segment are determined according to an insert type of the target insert.
The insert type of the target insert is exemplified by a custom voice clip editing type, a pre-voice clip editing type, a post-voice clip editing type, a custom voice clip annotation type, a pre-voice clip annotation type, and a post-voice clip annotation type.
In some embodiments of the present application, the electronic device may flexibly change the target voice segment corresponding to the first position of the target insert mark and the voice editing type of the target insert mark based on the functional area, so as to improve flexibility of the electronic device in determining the target voice segment corresponding to the target insert and the voice editing type corresponding to the target insert.
Alternatively, in some embodiments of the present application, the above step 202 may be specifically implemented by the following steps 202d and 202 e.
Step 202d, the electronic device determines, in response to the first input, a display parameter of the target insert according to the insert type of the target insert.
Optionally, in some embodiments of the present application, the insert type of the target insert includes at least one of: text annotation type and voice clip editing type.
Illustratively, the text annotation type may include at least one of: custom voice segment annotation type, pre-amble voice segment annotation type, and post-amble voice segment annotation type.
Still further illustratively, the voice clip editing type may include at least one of: the custom voice clip edit type, the pre-amble voice clip edit type, and the post-amble voice clip edit type.
It should be noted that:
Custom voice clip edit type: two inserts are required to mark together; the voice segment between the two inserts is the target voice segment, and the target voice segment is used for editing.
Preamble voice clip edit type: the voice segment of a specified duration before the insert is the target voice segment, and the target voice segment is used for editing.
Subsequent voice clip edit type: the voice segment of a specified duration after the insert is the target voice segment, and the target voice segment is used for editing.
Custom voice segment annotation type: two inserts are required to mark together; the voice segment between the two inserts is the target voice segment, and the target voice segment is used for annotation.
Preamble voice segment annotation type: the voice segment of a specified duration before the insert is the target voice segment, and the target voice segment is used for annotation.
Subsequent voice segment annotation type: the voice segment of a specified duration after the insert is the target voice segment, and the target voice segment is used for annotation.
Optionally, in some embodiments of the present application, the editing may include at least one of: substitutions, deletions and additions.
Alternatively, in some embodiments of the present application, the above-mentioned annotations may include at least one of text annotations and voice annotations.
It will be appreciated that different voice editing types correspond to different display parameters; this correspondence is preset in the electronic device.
Optionally, in some embodiments of the present application, the display parameter includes at least one of: color parameters, text parameters, symbol parameters, and label parameters, etc.
Step 202e, the electronic device displays the target insert in the first position of the target voice message according to the display parameter of the target insert.
In some embodiments of the application, the electronic device may display the target insert according to the display parameters on a voice message recording timeline.
For example, if the display parameter of the insert corresponding to the text annotation type is red, the electronic device may display a red insert on the voice message recording time axis when displaying an insert of the text annotation type; or, if the display parameter of the insert corresponding to the voice clip editing type is green, the electronic device may display a green insert on the voice message recording time axis when displaying an insert of the voice clip editing type.
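The preset correspondence of steps 202d and 202e can be modeled as a simple lookup table. This is a sketch under assumptions: the type keys and the red/green colors merely mirror the example above and are not the patent's actual data structures.

```python
# Hypothetical preset correspondence between insert type and display parameters.
DISPLAY_PARAMS = {
    "text_annotation": {"color": "red"},
    "voice_clip_edit": {"color": "green"},
}

def display_params_for(insert_type):
    """Step 202d: look up the preset display parameters for an insert type.

    Each insert type maps to exactly one set of display parameters, so the
    timeline renderer (step 202e) can draw the insert directly from the result.
    """
    return DISPLAY_PARAMS[insert_type]
```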
In some embodiments of the present application, when the electronic device displays target inserts, target inserts of different voice editing types can be displayed with different display parameters, which avoids user confusion and improves the look and feel of the target inserts.
Optionally, in some embodiments of the present application, the voice message processing method provided by the present application further includes the following steps 401 and 402.
Step 401, the electronic device determines display parameters of the target voice segment according to the type of the target insert.
In some embodiments of the present application, since the type of the insert of one target insert corresponds to one display parameter and the types of inserts of different target inserts correspond to different display parameters, the electronic device may determine the display parameters of the target speech segment through the type of the insert of the target insert.
For example, assuming that the insert type of the target insert is the text annotation type, and the display parameter corresponding to the text annotation type is red, the display parameter of the target speech segment corresponding to the target insert is also red.
Step 402, the electronic device updates and displays the target voice segment according to the display parameter of the target voice segment.
In some embodiments of the present application, after determining the display parameter of the target voice segment, the electronic device may update the display parameter of the target voice segment from a first parameter to a second parameter, and display the target voice segment in the voice recording interface with the second parameter.
Optionally, in some embodiments of the present application, in a case where the target insert is multiple and insert types corresponding to the multiple target inserts are different, the electronic device may determine, based on the insert types corresponding to the target inserts, a display parameter of each target speech segment in the multiple target speech segments, so as to update the display parameter of each target speech segment in the multiple target speech segments.
The display parameter may be a display color, for example.
In an exemplary embodiment, assume that the original display parameter of the target voice segment is black, the voice recording interface includes a first insert of the text annotation type and a second insert of the voice clip editing type, the display color corresponding to the first insert is red, and the display color corresponding to the second insert is green. When the electronic device determines the first voice segment based on the first insert, the electronic device may update the display color of the first voice segment to red; and when the electronic device determines the second voice segment based on the second insert, the electronic device may update the display color of the second voice segment to green.
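Steps 401 and 402 amount to each marked segment inheriting the display color of its insert, with unmarked segments keeping their original color. The following sketch assumes a dict-per-segment representation and the red/green colors from the example; none of these names come from the patent itself.

```python
# Hypothetical insert-type-to-color correspondence (preset in the device).
INSERT_COLOR = {"text_annotation": "red", "voice_clip_edit": "green"}

def update_segment_colors(segments):
    """Steps 401-402: update each target segment's display color in place.

    segments: list of dicts with an 'insert_type' key and a current 'color';
    segments whose type has no preset color keep their original color.
    """
    for seg in segments:
        seg["color"] = INSERT_COLOR.get(seg["insert_type"], seg["color"])
    return segments
```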
Optionally, in some embodiments of the present application, the voice message processing method provided by the present application further includes the following steps 501 and 502.
In step 501, when recording of the target voice message is suspended, the electronic device receives a second input to the target insert.
Alternatively, in some embodiments of the present application, the second input may be the user's click input, long-press input, slide input, or preset-track input on the target insert, which may be specifically determined according to actual use conditions; the embodiments of the present application are not limited thereto.
Step 502, the electronic device, in response to the second input, displays a voice clip editing interface according to the voice editing type marked by the target insert.
In some embodiments of the present application, the electronic device may display different types of voice clip editing interfaces according to the voice editing type marked by the target insert; that is, one voice editing type corresponds to one voice clip editing interface, and different voice clip editing interfaces contain different controls.
In some embodiments of the present application, when recording of the target voice message is suspended, the electronic device may edit the voice editing element and reestablish the association between the edited voice editing element and the target insert, so that when recording of the target voice message finishes, the electronic device may edit the target voice message according to the edited voice editing element to obtain the first voice message.
Optionally, in some embodiments of the present application, the voice message processing method provided by the present application further includes the following steps 601 and 602.
Step 601, the electronic device receives a third input of the target insert.
Alternatively, in some embodiments of the present application, the third input may be the user's click input, long-press input, slide input, or preset-track input on the target insert, which may be specifically determined according to actual use conditions; the embodiments of the present application are not limited thereto.
Step 602, the electronic device, in response to the third input, displays a voice clip editing interface according to the voice editing type marked by the target insert.
In some embodiments of the present application, the voice clip editing interface includes at least one voice clip editing control.
In some embodiments of the present application, the electronic device may, according to the user's input, perform editing processing corresponding to the voice editing type marked by the target insert in the voice clip editing interface.
In some embodiments of the present application, the electronic device may edit the voice editing element and reestablish the association between the edited voice editing element and the target insert, so that when recording of the target voice message finishes, the electronic device may edit the target voice message according to the edited voice editing element to obtain the first voice message.
Optionally, in some embodiments of the present application, the voice editing type marked by the target insert is the text annotation type; the voice message processing method provided by the present application further includes the following steps 701 and 702.
Step 701, the electronic device receives a fourth input to the voice clip editing interface.
In some embodiments of the present application, the voice clip editing interface further includes an annotation control.
Optionally, in some embodiments of the present application, the fourth input may be the user's click input, long-press input, slide input, or preset-track input on the annotation control, which may be specifically determined according to actual use conditions; the embodiments of the present application are not limited thereto.
In step 702, the electronic device responds to the fourth input, and adds annotation information corresponding to the fourth input to the target voice segment.
In some embodiments of the present application, the electronic device may display a text input box corresponding to the annotation control in response to the fourth input, so that the electronic device may receive, in the text input box, the annotation content that the user desires to add.
As shown in fig. 6, the user may click the voice clip between the custom voice clip insert 171 and the custom voice clip insert 172, so that the electronic device may display, in the voice recording interface, a voice clip editing interface 24 corresponding to the custom voice clip. The voice clip editing interface 24 includes a text editing control 22 and a deletion control 23; the user may input on either control, so that the electronic device edits the voice clip according to the editing function corresponding to that control.
Optionally, in some embodiments of the present application, the voice editing type marked by the target insert is the voice clip editing type; the voice message processing method provided by the present application further includes the following steps 801 and 802.
Step 801, the electronic device receives a fifth input to the voice clip editing interface.
In some embodiments of the present application, the above-mentioned voice clip editing interface further includes an update control.
Optionally, in some embodiments of the present application, the fifth input may be the user's click input, long-press input, slide input, or preset-track input on the update control, which may be specifically determined according to actual use conditions; the embodiments of the present application are not limited thereto.
Step 802, the electronic device responds to the fifth input, and updates the target voice segment according to the editing information corresponding to the fifth input.
In some embodiments of the application, the electronic device may replace, delete, etc., the speech content of the target speech segment in response to the fifth input.
As shown in fig. 7, the user may click the preamble voice segment insert 181, so that the electronic device may display, in the voice recording interface, a voice segment editing interface corresponding to the preamble voice segment insert 181. The voice segment editing interface includes a text clipping control and a deletion control; the user may input on either control, so that the electronic device edits the voice segment associated with the preamble voice segment insert 181 according to the editing function corresponding to that control.
As another example, as shown in fig. 8, the user may click the postamble voice segment insert 191, so that the electronic device may display, in the voice recording interface, a voice segment editing interface corresponding to the postamble voice segment insert 191. The voice segment editing interface includes a replacement control and a deletion control; the user may input on either control, so that the electronic device edits the voice segment associated with the postamble voice segment insert 191 according to the editing function corresponding to that control.
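Since one voice editing type corresponds to one editing interface with its own set of controls, the interfaces described for figs. 6-8 can be sketched as a lookup from editing type to control list. The type keys and control names below are illustrative assumptions chosen to mirror the figures, not the patent's identifiers.

```python
# Hypothetical correspondence between the voice editing type marked by an
# insert and the controls its editing interface contains (cf. figs. 6-8).
EDIT_INTERFACE_CONTROLS = {
    "custom_clip":   ["text_edit", "delete"],  # fig. 6: interface 24
    "preamble_clip": ["text_clip", "delete"],  # fig. 7: insert 181
    "postamble_clip": ["replace", "delete"],   # fig. 8: insert 191
}

def controls_for(edit_type):
    """Return the controls shown in the editing interface for this type."""
    return EDIT_INTERFACE_CONTROLS[edit_type]
```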
Optionally, in some embodiments of the present application, the method for processing a voice message provided in the embodiment of the present application further includes the following step 901.
Step 901, displaying the insert control area on a first screen of the electronic device, and displaying the target voice message on a second screen of the electronic device.
For example, the electronic device may display the insert editing interface on a first screen and the timeline on a second screen.
In some embodiments of the present application, in the case where the electronic device does not have a folding screen, the electronic device may display the insert editing interface and the time axis in different screen areas through split screens, respectively; in the case where the electronic device is provided with a folding screen, the electronic device may display the insert editing interface and the time axis on different screens.
It is understood that the first screen may be any screen in the electronic device, and the second screen is a screen other than the first screen in the electronic device.
In some embodiments of the present application, the electronic device may display different contents in the voice recording interface on different screens, thereby improving flexibility of the electronic device in displaying the voice recording interface.
The foregoing explains the interface of the voice message processing method according to the embodiments of the present application; the following details the internal processing of the voice message processing method according to the embodiments of the present application.
In step S201, during the process of recording the target voice message, the electronic device inserts N function identifiers in the target voice message.
In some embodiments of the application, each function identifier indicates an audio editing function, N being a positive integer.
In some embodiments of the present application, the functions and the number corresponding to the N function identifiers may be determined based on the input of the user.
Optionally, in some embodiments of the present application, the function identifier includes: characters or functional data blocks.
In some embodiments of the present application, the character may include at least one of: special characters, Chinese or English characters, or emoticons.
In some embodiments of the present application, the above function data block is used by the electronic device to distinguish, according to the function data block, the insertion position in the target voice message of the target audio clip described below.
In some embodiments of the present application, the above-described audio editing function may be any of the following: text insertion function, voice insertion function, and preamble control function.
It should be noted that the above preamble control function traces back, from the position where the function identifier is inserted, the audio clip within a preset time period before that position.
In some embodiments of the present application, before recording the target voice message, the electronic device may determine an audio editing function corresponding to the function identifier according to the input of the user, and establish an association between the audio editing function and the function identifier, so that the electronic device may determine the corresponding audio editing function according to the function identifier.
For example, a user may determine an audio editing function corresponding to a function identification in a setup application of an electronic device.
In some embodiments of the present application, the audio editing function corresponding to the function identifier may be preset for the electronic device.
In some embodiments of the present application, after determining the audio editing function corresponding to the function identifier, the electronic device may display the function identifier and its corresponding audio editing function. The user may then input on the function identifier, so that the electronic device updates the audio editing function corresponding to the function identifier and establishes an association between the updated audio editing function and the function identifier; the electronic device can then determine the corresponding updated audio editing function from the function identifier.
Step S202, when the recording of the target voice message is finished, the electronic equipment edits the target voice message according to the audio editing function indicated by each function identifier in the target voice message and the insertion position of each function identifier, and generates a first voice message.
In some embodiments of the present application, when recording of the target voice message finishes, the electronic device may traverse each function identifier in the target voice message, determining the insertion position of each function identifier in the target voice message and the audio editing function corresponding to each function identifier. The electronic device may then edit the target audio clip according to the audio editing function indicated by each function identifier and the insertion position of each function identifier, generating the first voice message.
In some embodiments of the present application, the target audio piece is an audio piece determined for a plurality of function identifications.
In some embodiments of the present application, the electronic device may end recording the target voice message according to the user's input.
For example, before recording the target voice message, the electronic device may receive a click input of the first recording control by the user, so that the electronic device may start recording the target voice message, and then, the electronic device may receive a click input of the first recording control by the user again, so as to end recording the target voice message.
In some embodiments of the present application, when recording of the target voice message finishes, the electronic device may edit the target audio clip according to the audio editing function indicated by each function identifier and the insertion position of each function identifier, thereby generating the first voice message.
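The traversal in step S202 can be sketched as follows, assuming a string stands in for the audio and each function identifier is a (position, function_name) pair; the editor callables are illustrative stand-ins, not the patent's actual editing functions.

```python
# Minimal sketch of step S202: walk the inserted function identifiers and
# apply the audio editing function each one indicates.

def apply_edits(message, identifiers, editors):
    """Traverse the function identifiers and apply each indicated edit.

    Edits are applied from the latest position backwards so that earlier
    insertion positions remain valid after each edit.
    """
    for pos, name in sorted(identifiers, reverse=True):
        message = editors[name](message, pos)
    return message

editors = {
    # Insert a text marker at the identifier's position.
    "insert_text": lambda msg, pos: msg[:pos] + "[note]" + msg[pos:],
    # Preamble-control stand-in: trace back and remove the two units
    # immediately before the identifier's position.
    "delete_back": lambda msg, pos: msg[:max(0, pos - 2)] + msg[pos:],
}

result = apply_edits("abcdefgh", [(3, "insert_text"), (6, "delete_back")], editors)
```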
The present application provides a voice message processing method. In the case of recording a target voice message, an electronic device can respond to a first input and display a target insert at a first position of the target voice message. On the one hand, because the target insert is used to mark the target voice segment corresponding to the first position, the user can directly locate the voice segment to be edited when editing the target voice message; on the other hand, because the target insert also marks the voice editing type of the target voice segment, the user can directly invoke the voice editing function corresponding to that voice editing type, thereby improving the efficiency of editing the voice message.
Alternatively, in some embodiments of the present application, the above step S202 may be specifically implemented by the following step S202a.
Step S202a, the electronic device edits the audio clip corresponding to the insertion position of the third function identifier by using the first audio editing element associated with the third function identifier according to the audio editing function indicated by the third function identifier, so as to generate a first voice message.
In some embodiments of the application, the first audio editing element includes at least one of: text, audio, video, images, special symbols.
In some embodiments of the present application, since the third function identifier has an association relationship with the first audio editing element, the electronic device may add the first audio editing element to the audio clip corresponding to the insertion position of the third function identifier for editing processing to generate the first voice message when the electronic device recognizes the third function identifier.
In some embodiments of the present application, before the electronic device adds the first audio editing element to the audio clip corresponding to the insertion position of the third function identifier to generate the first voice message, the electronic device may display an editing control corresponding to the first audio editing element in the voice recording interface. The electronic device may then adjust the content of the first audio editing element according to the user's input on the editing control, and add the adjusted first audio editing element to the audio clip corresponding to the insertion position of the third function identifier for editing processing, generating the first voice message.
For example, assuming that the voice editing function corresponding to the third function identifier is a voice adding function, after the electronic device recognizes the third function identifier, the electronic device may add the target voice (i.e., the first audio editing element) to the audio clip corresponding to the insertion position of the third function identifier for editing processing, so as to generate the first voice message.
In some embodiments of the present application, the target voice may be stored in the electronic device, or recorded by the electronic device in real time.
For example, for the electronic device to record the target voice in real time, it may be understood that the electronic device may display a second recording control in the voice recording interface, and the user may perform long-press input on the second recording control, so that the electronic device may record the voice output by the user, and thus the electronic device may obtain the target voice, and add the target voice to an audio segment corresponding to the insertion position of the third function identifier for editing processing, so as to generate the first voice message.
As another example, assuming that the audio editing function corresponding to the third function identifier is a text adding function, after identifying the third function identifier, the electronic device may display at least one candidate text in the voice recording interface; then, according to the user's input, the electronic device may add a target text among the at least one candidate text to the audio clip corresponding to the insertion position of the third function identifier for editing processing, generating the first voice message. Alternatively, the electronic device may display a text editing control on the voice recording interface, obtain a target text based on the user's input on the text editing control, and add the target text to the audio clip corresponding to the insertion position of the third function identifier for editing processing, generating the first voice message.
It should be noted that the target text is hidden in the sent first voice message; that is, the target text is not displayed on the instant messaging interface of the electronic device when the first voice message is sent. When the electronic device converts the first voice message into a text file, the electronic device can display, on the instant messaging interface, the text corresponding to the first voice message together with the target text, with the target text located at the insertion position of the third function identifier.
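This hidden-text behavior can be sketched as hidden annotations that travel with the message but are only rendered on conversion to a text file, at the insertion position of their function identifier. The representation below (character offsets into the transcript, bracketed rendering) is an assumption for illustration.

```python
# Sketch: render hidden target texts only when converting the voice message
# to a text file; the sent voice message itself shows none of them.

def to_text_file(transcript, hidden_texts):
    """transcript: speech-to-text of the first voice message.
    hidden_texts: list of (char_position, target_text) pairs recorded at
    the insertion positions of the function identifiers."""
    # Insert from the latest position backwards so earlier offsets stay valid.
    for pos, text in sorted(hidden_texts, reverse=True):
        transcript = transcript[:pos] + f"[{text}]" + transcript[pos:]
    return transcript
```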
In some embodiments of the present application, after the electronic device sends the first voice message to the other electronic device, the electronic device may send a prompt message to the other electronic device, where the prompt message is used to prompt a user to whom the other electronic device belongs, the first voice message includes the target text, and the user to which the other electronic device belongs may convert the first voice message into a text file for viewing.
For example, assuming that the audio editing function corresponding to the third function identifier is a video adding function, after identifying the third function identifier, the electronic device may display at least one video to be selected in the voice recording interface, and then according to the input of the user, so that the electronic device may add the target video in the at least one video to be selected to the audio clip corresponding to the insertion position of the third function identifier for editing processing, to generate the first voice message; or the electronic equipment can display the video shooting control on the voice recording interface, and can obtain a target video based on the input of the user to the video shooting control, and the target video is added to the audio clip corresponding to the insertion position of the third function identifier for editing processing to generate the first voice message.
Also for example, during the process of playing the first voice message by the electronic device, the electronic device may display the target video in the target window, cancel the display of the target window after the target video is completely played, and then continue playing the first voice message.
Specifically, the target window may be a popup window.
In some embodiments of the present application, the electronic device may set transparency for the pop-up window, so that the user may view the notification message in the instant messaging interface while viewing the target video.
In some embodiments of the present application, the electronic device may save the link corresponding to the target video in the first voice message, so that when the electronic device acquires the video link in the process of playing the first voice message, the electronic device may automatically jump to the video playing interface according to the link, return to the instant messaging interface after the playing is completed, and continue to play the first voice message.
In some embodiments of the present application, when the electronic device obtains the video link, the electronic device may display the video link to the user, so as to determine whether to play the target video based on the input of the user, if the electronic device determines to play the target video, see the above embodiments, and if the electronic device determines not to play the target video, continue playing the first voice message.
It can be appreciated that, for the above images and special symbols, reference may be made to the embodiments corresponding to the text adding function, and for avoiding repetition, the description is omitted here.
It should be noted that the above method embodiments take a single editing function as an example; in the present application, multiple editing functions may each be added to the target voice message according to the user's input.
In some embodiments of the present application, during the recording process of the target voice message by the electronic device, the electronic device may edit the target voice message being recorded, so that the electronic device may directly obtain the edited first voice message, and the first voice message does not need to be replayed to determine the editing position, thereby improving the flexibility and efficiency of editing voice by the electronic device.
Alternatively, in some embodiments of the present application, "inserting N function identifiers in the target voice message by the electronic device" in the above step S201 may be specifically implemented by the following step S201a.
In step S201a, the electronic device inserts, in response to the first input, a first function identifier at a location corresponding to the first timestamp in the target voice message.
In some embodiments of the present application, the first function identifier corresponds to a first function control, where the first function identifier is at least one of N function identifiers.
In some embodiments of the present application, the voice recording interface may further include: inserting a control; after determining the first function control, the user can input the insertion control, so that the electronic device can insert the first function identifier corresponding to the first function control in the end position of the target voice message in recording.
In some embodiments of the present application, the electronic device may insert the first function identifier at a position corresponding to the first timestamp in the target voice message according to the first input of the user, so that when the recording of the target voice message is finished, the electronic device may directly edit the position corresponding to the first timestamp in the target voice message according to the edit function corresponding to the first function identifier, thereby obtaining the first voice message, and improving efficiency of editing the audio file by the electronic device.
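Step S201a can be sketched as recording a function identifier at the position of the first timestamp while keeping the identifiers time-ordered, so the traversal at the end of recording can proceed in order. The tuple representation and identifier names here are assumptions for illustration.

```python
import bisect

def insert_function_identifier(identifiers, timestamp, function_id):
    """Insert (timestamp, function_id) while keeping the list time-ordered."""
    bisect.insort(identifiers, (timestamp, function_id))
    return identifiers

# Inserts arrive in recording order but are kept sorted by timestamp.
ids = []
insert_function_identifier(ids, 12.5, "text_insert")
insert_function_identifier(ids, 3.2, "preamble_control")
```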
It should be noted that, in the voice message processing method provided by the embodiment of the present application, the execution body may be a voice message processing apparatus, or an electronic device, or may also be a functional module or entity in the electronic device. In some embodiments of the present application, a voice message processing apparatus is described by taking a voice message processing method performed by the voice message processing apparatus as an example.
Fig. 9 shows a schematic diagram of one possible architecture of a voice message processing apparatus involved in some embodiments of the present application. As shown in fig. 9, the voice message processing apparatus 70 may include: a receiving module 71 and a display module 72.
Wherein, the receiving module 71 is configured to receive a first input in a case of recording a target voice message. A display module 72 for displaying a target insert at a first location of a target voice message in response to the first input received by the receiving module 71; the target insert is used for marking a target voice segment corresponding to the first position and a voice editing type of the target voice segment.
In one possible implementation, the display module 72 is specifically configured to display the insert control area in the case of recording the target voice message; the receiving module 71 is specifically configured to receive a first input to the insert control area.
In one possible implementation manner, the insert control area includes at least two insert controls, where the at least two insert controls respectively correspond to different insert types and voice editing types; the receiving module 71 is specifically configured to receive a first input to a target insert control of the at least two insert controls. The display module 72 is specifically configured to display, in response to the first input received by the receiving module 71, a target insert corresponding to the target insert control at a first location of the target voice message; the target voice segment corresponding to the first position of the target insert mark is determined according to the type of the insert corresponding to the target insert control, and the voice editing type of the target insert mark is the voice editing type corresponding to the target insert control.
In one possible implementation manner, the voice message processing apparatus 70 provided in the embodiment of the present application further includes a determining module; and a determining module, configured to determine, in response to the first input received by the receiving module 71, a target voice segment corresponding to the first position of the target insert mark and a voice editing type of the target insert mark according to an input manner of the first input.
In a possible implementation manner, the determining module is further configured to determine, in response to the first input, a first location of the target voice message according to an input time of the first input. The display module 72 is further configured to display the target insert at the first location.
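Determining the first position from the input time of the first input, as described here, amounts to taking the offset of the input's timestamp from the start of the recording. A minimal sketch under that assumption (timestamps in milliseconds; names are illustrative, not from the patent):

```python
def position_from_input_time(recording_start_ms: int, input_time_ms: int) -> int:
    """Map the wall-clock time of the first input to an offset
    within the voice message being recorded."""
    if input_time_ms < recording_start_ms:
        raise ValueError("input occurred before recording started")
    return input_time_ms - recording_start_ms


# An input arriving 4.2 s after recording began marks position 4200 ms.
offset = position_from_input_time(recording_start_ms=1_000_000,
                                  input_time_ms=1_004_200)
```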
In a possible implementation manner, the determining module is specifically configured to determine, in response to the first input received by the receiving module 71, a display parameter of the target insert according to an insert type of the target insert. The display module 72 is specifically configured to display the target insert at the first location of the target voice message according to the display parameter of the target insert.
In a possible implementation, the determining module is further configured to determine a display parameter of the target voice segment according to an insert type of the target insert; the display module 72 is further configured to update the display of the target voice segment according to the display parameter of the target voice segment.
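The display-parameter lookup in the two implementations above is essentially a table keyed by insert type. An illustrative sketch follows — the concrete colors, icons, and shapes are invented placeholders, since the patent does not specify any values:

```python
# Hypothetical display parameters keyed by insert type; the patent does not
# specify concrete values, so everything below is a placeholder.
DISPLAY_PARAMS = {
    "pre_edit":        {"color": "#E05A4E", "icon": "scissors", "shape": "flag"},
    "post_edit":       {"color": "#E05A4E", "icon": "scissors", "shape": "flag"},
    "pre_annotate":    {"color": "#3478F6", "icon": "note",     "shape": "pin"},
    "post_annotate":   {"color": "#3478F6", "icon": "note",     "shape": "pin"},
    "custom_edit":     {"color": "#F5A623", "icon": "scissors", "shape": "bracket"},
    "custom_annotate": {"color": "#F5A623", "icon": "note",     "shape": "bracket"},
}


def display_params_for(insert_type: str) -> dict:
    """Return the display parameters for a target insert of the given type."""
    return DISPLAY_PARAMS[insert_type]
```

Keying the visual treatment on the insert type is what lets the same parameters also restyle the marked voice segment itself, as the implementation above describes.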
In one possible implementation, the target voice segment corresponding to the first position and the voice editing type of the target voice segment are determined according to the insert type of the target insert, where the insert type of the target insert is one of: a custom voice segment editing type, a preamble voice segment editing type, a postamble voice segment editing type, a custom voice segment annotation type, a preamble voice segment annotation type, or a postamble voice segment annotation type.
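The six insert types listed above combine a segment-selection rule (custom, preamble, or postamble) with an action (editing or annotation). One way such a type could determine the marked segment — assuming, purely for illustration, a fixed window for the preamble/postamble cases — is:

```python
from enum import Enum


class InsertType(Enum):
    """The six insert types named in the text (string labels paraphrased)."""
    CUSTOM_EDIT = "custom voice segment editing"
    PRE_EDIT = "preamble voice segment editing"
    POST_EDIT = "postamble voice segment editing"
    CUSTOM_ANNOTATE = "custom voice segment annotation"
    PRE_ANNOTATE = "preamble voice segment annotation"
    POST_ANNOTATE = "postamble voice segment annotation"


def segment_for(insert_type: InsertType, position_ms: int,
                window_ms: int = 2000) -> tuple:
    """Derive the (start, end) of the target voice segment from the insert
    type and the first position.  'Preamble' marks audio before the position,
    'postamble' marks audio after it; 'custom' is left as a zero-length
    placeholder the user refines later.  The window size is an assumption."""
    if insert_type in (InsertType.PRE_EDIT, InsertType.PRE_ANNOTATE):
        return (max(0, position_ms - window_ms), position_ms)
    if insert_type in (InsertType.POST_EDIT, InsertType.POST_ANNOTATE):
        return (position_ms, position_ms + window_ms)
    return (position_ms, position_ms)
```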
In a possible implementation, the receiving module 71 is further configured to receive a second input to the target insert in a case where recording of the target voice message is paused. The display module 72 is further configured to display, in response to the second input received by the receiving module 71, a voice clip editing interface according to the voice editing type of the target insert mark.
In a possible implementation, the receiving module 71 is further configured to receive a third input to the target insert; the display module 72 is specifically configured to display, in response to the third input received by the receiving module 71, a voice clip editing interface according to the voice editing type of the target insert mark.
In one possible implementation, the voice editing type of the target insert mark is a text comment type. The voice message processing apparatus 70 provided by the embodiment of the application further includes an adding module. The receiving module 71 is further configured to receive a fourth input to the voice clip editing interface, and the adding module is configured to add, in response to the fourth input received by the receiving module 71, annotation information corresponding to the fourth input to the target voice segment.
In one possible implementation, the voice editing type of the target insert mark is a voice clip editing type. The voice message processing apparatus 70 provided by the embodiment of the application further includes an updating module. The receiving module 71 is further configured to receive a fifth input to the voice clip editing interface; the updating module is configured to update, in response to the fifth input received by the receiving module 71, the target voice segment according to editing information corresponding to the fifth input.
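Taken together, the last two implementations branch on the voice editing type of the marker: a text comment type attaches annotation information to the segment, while a voice clip editing type updates the segment with editing information. A minimal dispatch sketch (all names and the segment representation are hypothetical):

```python
def handle_editing_input(edit_type: str, segment: dict, payload: str) -> dict:
    """Apply the editing-interface input to the target voice segment
    according to the voice editing type the insert marks."""
    if edit_type == "text_annotation":
        # fourth input: attach annotation information to the segment
        return {**segment,
                "annotations": segment.get("annotations", []) + [payload]}
    if edit_type == "clip_edit":
        # fifth input: update the segment with the editing information
        return {**segment, "pending_edit": payload}
    raise ValueError(f"unknown voice editing type: {edit_type}")


seg = handle_editing_input("text_annotation",
                           {"start_ms": 0, "end_ms": 2000},
                           "check this name")
```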
An embodiment of the application provides a voice message processing apparatus that can, while a target voice message is being recorded, display a target insert at a first position of the target voice message in response to a first input. On one hand, the target insert marks the target voice segment corresponding to the first position, so that when editing the target voice message the user can directly locate the voice segment to be edited; on the other hand, the target insert also marks the voice editing type of the target voice segment, so that the user can directly invoke the voice editing function corresponding to that voice editing type. This improves the editing efficiency of voice messages.
The voice message processing apparatus in some embodiments of the present application may be an electronic device, or may be a component in an electronic device, such as an integrated circuit or a chip. The electronic device may be a terminal, or may be a device other than a terminal. By way of example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted electronic device, a mobile internet device (MID), an augmented reality (AR)/virtual reality (VR) device, a robot, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA); it may also be a server, a network attached storage (NAS), a personal computer (PC), a television (TV), a teller machine, or a self-service machine, etc., which is not limited in the embodiments of the present application.
The voice message processing device in some embodiments of the present application may be a device having an operating system. The operating system may be an Android operating system, an iOS operating system, or other possible operating systems, and the embodiment of the present application is not limited specifically.
The voice message processing device provided by the embodiment of the present application can implement each process implemented by the above method embodiment, and in order to avoid repetition, details are not repeated here.
Optionally, as shown in fig. 10, an embodiment of the present application further provides an electronic device 90 including a processor 91 and a memory 92, where the memory 92 stores a program or instructions executable on the processor 91. When executed by the processor 91, the program or instructions implement the steps of the foregoing voice message processing method embodiment and achieve the same technical effects; to avoid repetition, details are not repeated here.
It should be noted that, in some embodiments of the present application, the electronic device includes the mobile electronic device and the non-mobile electronic device described above.
Fig. 11 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.
The electronic device 100 includes, but is not limited to: radio frequency unit 101, network module 102, audio output unit 103, input unit 104, sensor 105, display unit 106, user input unit 107, interface unit 108, memory 109, and processor 110.
Those skilled in the art will appreciate that the electronic device 100 may further include a power source (e.g., a battery) for powering the various components, and the power source may be logically coupled to the processor 110 via a power management system so as to manage charging, discharging, and power consumption through the power management system. The electronic device structure shown in fig. 11 does not constitute a limitation of the electronic device; the electronic device may include more or fewer components than illustrated, combine some components, or arrange the components differently, which is not described in detail herein.
The user input unit 107 is configured to receive a first input while a target voice message is being recorded. The display unit 106 is configured to display a target insert at a first position of the target voice message in response to the first input; the target insert is used to mark a target voice segment corresponding to the first position and a voice editing type of the target voice segment.
An embodiment of the application provides an electronic device that can, while a target voice message is being recorded, display a target insert at a first position of the target voice message in response to a first input. On one hand, the target insert marks the target voice segment corresponding to the first position, so that when editing the target voice message the user can directly locate the voice segment to be edited; on the other hand, the target insert also marks the voice editing type of the target voice segment, so that the user can directly invoke the voice editing function corresponding to that voice editing type. This improves the editing efficiency of voice messages.
Optionally, in some embodiments of the present application, the display unit 106 is specifically configured to display an insert control area when the target voice message is recorded; the user input unit 107 is specifically configured to receive a first input to the insert control area.
Optionally, in some embodiments of the present application, the insert control area includes at least two insert controls, where the at least two insert controls respectively correspond to different insert types and voice editing types. The user input unit 107 is specifically configured to receive a first input to a target insert control of the at least two insert controls; the display unit 106 is specifically configured to display, in response to the first input, a target insert corresponding to the target insert control at a first position of the target voice message; the target voice segment corresponding to the first position of the target insert mark is determined according to the insert type corresponding to the target insert control, and the voice editing type of the target insert mark is the voice editing type corresponding to the target insert control.
Optionally, in some embodiments of the present application, the processor 110 is further configured to determine, in response to the first input, a target voice segment corresponding to the first position of the target insert mark and a voice edit type of the target insert mark according to an input manner of the first input.
Optionally, in some embodiments of the present application, the processor 110 is specifically configured to determine, in response to the first input, a first location of the target voice message according to an input time of the first input; the display unit 106 is specifically configured to display the target insert at the first position.
Optionally, in some embodiments of the present application, the processor 110 is specifically configured to determine, in response to a first input, a display parameter of the target insert according to an insert type of the target insert; the display unit 106 is specifically configured to display the target insert in the first location of the target voice message according to the display parameter of the target insert.
Optionally, in some embodiments of the present application, the processor 110 is further configured to determine a display parameter of the target voice segment according to an insert type of the target insert. The display unit 106 is further configured to update the display of the target voice segment according to the display parameter of the target voice segment.
Optionally, in some embodiments of the present application, the user input unit 107 is further configured to receive a second input of the target insert in a case where recording of the target voice message is suspended. The display unit 106 is further configured to display a voice clip editing interface according to the voice editing type of the target insert mark in response to the second input.
Optionally, in some embodiments of the present application, the user input unit 107 is further configured to receive a third input of the target insert; the display unit 106 is further configured to display a voice clip editing interface according to the voice editing type of the target insert mark in response to the third input.
Optionally, in some embodiments of the present application, the voice edit type of the target insert tag is a text comment type; the user input unit 107 is further configured to receive a fourth input to the voice clip editing interface; the processor 110 is further configured to add annotation information corresponding to the fourth input to the target speech segment in response to the fourth input.
Optionally, in some embodiments of the present application, the voice edit type of the target insert tag is a voice clip edit type; the user input unit 107 is further configured to receive a fifth input to the voice clip editing interface; the processor 110 is further configured to respond to the fifth input, and update the target speech segment according to editing information corresponding to the fifth input.
The electronic device provided by the embodiment of the application can realize each process realized by the embodiment of the method and can achieve the same technical effect, and in order to avoid repetition, the description is omitted here.
The beneficial effects of the various implementation manners in this embodiment may be specifically referred to the beneficial effects of the corresponding implementation manners in the foregoing method embodiment, and in order to avoid repetition, the description is omitted here.
It should be appreciated that in some embodiments of the application, the input unit 104 may include a graphics processor (Graphics Processing Unit, GPU) 1041 and a microphone 1042, the graphics processor 1041 processing image data of still pictures or video obtained by an image capturing device (e.g., a camera) in a video capturing mode or an image capturing mode. The display unit 106 may include a display panel 1061, and the display panel 1061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like. The user input unit 107 includes at least one of a touch panel 1071 and other input devices 1072. The touch panel 1071 is also referred to as a touch screen. The touch panel 1071 may include two parts of a touch detection device and a touch controller. Other input devices 1072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and so forth, which are not described in detail herein.
Memory 109 may be used to store software programs as well as various data. The memory 109 may mainly include a first memory area storing programs or instructions and a second memory area storing data, where the first memory area may store an operating system and application programs or instructions required for at least one function (such as a sound playing function, an image playing function, etc.). Further, the memory 109 may include volatile memory or nonvolatile memory, or both. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be random access memory (RAM), static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), or direct Rambus RAM (DRRAM). Memory 109 in some embodiments of the application includes, but is not limited to, these and any other suitable types of memory.
Processor 110 may include one or more processing units; optionally, the processor 110 integrates an application processor that primarily processes operations involving an operating system, user interface, application programs, etc., and a modem processor that primarily processes wireless communication signals, such as a baseband processor. It will be appreciated that the modem processor described above may not be integrated into the processor 110.
The embodiment of the application also provides a readable storage medium, on which a program or an instruction is stored, which when executed by a processor, implements each process of the above method embodiment, and can achieve the same technical effects, and in order to avoid repetition, the description is omitted here.
Wherein the processor is a processor in the electronic device described in the above embodiment. The readable storage medium includes computer readable storage medium such as computer readable memory ROM, random access memory RAM, magnetic or optical disk, etc.
The embodiment of the application further provides a chip, which comprises a processor and a communication interface, wherein the communication interface is coupled with the processor, and the processor is used for running programs or instructions to realize the processes of the embodiment of the method, and can achieve the same technical effects, so that repetition is avoided, and the description is omitted here.
It should be understood that the chips referred to in the embodiments of the present application may also be referred to as system-on-chip chips, chip systems, or system-on-chip chips, etc.
Embodiments of the present application provide a computer program product stored in a storage medium, where the program product is executed by at least one processor to implement the processes of the embodiments of the method for processing a voice message, and achieve the same technical effects, and are not described herein in detail to avoid repetition.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element. Furthermore, it should be noted that the scope of the methods and apparatus in the embodiments of the present application is not limited to performing the functions in the order shown or discussed; depending on the functions involved, the functions may also be performed in a substantially simultaneous manner or in the reverse order. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware, though in many cases the former is the preferred embodiment. Based on such an understanding, the technical solution of the present application, or the part that contributes to the prior art, may be embodied in the form of a computer software product stored in a storage medium (e.g., ROM/RAM, magnetic disk, or optical disk), including instructions for causing a terminal (which may be a mobile phone, a computer, a server, a network device, etc.) to perform the methods according to the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive. Many variations may be made by those of ordinary skill in the art without departing from the spirit of the present application and the scope of the claims, and all such variations fall within the protection of the present application.

Claims (15)

1. A method of processing a voice message, the method comprising:
receiving a first input in the case of recording a target voice message;
displaying a target insert at a first location of the target voice message in response to the first input; the target insert is used for marking a target voice segment corresponding to the first position and a voice editing type of the target voice segment.
2. The method of claim 1, wherein receiving the first input in the case of recording the target voice message comprises:
displaying an insert control area under the condition of recording the target voice message;
a first input to the insert control region is received.
3. The method of claim 2, wherein the plug-in control area comprises at least two plug-in controls, the at least two plug-in controls corresponding to different plug-in types and voice editing types, respectively;
the receiving a first input to the insert control region includes:
receiving a first input to a target one of the at least two insert controls;
the displaying, in response to the first input, a target insert at a first location of the target voice message, comprising:
displaying, in response to the first input, a target insert corresponding to the target insert control at a first position of the target voice message; the target voice segment corresponding to the first position of the target insert mark is determined according to the insert type corresponding to the target insert control, and the voice editing type of the target insert mark is the voice editing type corresponding to the target insert control.
4. The method according to claim 2, wherein the method further comprises:
and responding to the first input, and determining a target voice fragment corresponding to the first position of the target insert mark and a voice editing type of the target insert mark according to an input mode of the first input.
5. The method of claim 1, wherein the displaying the target insert at the first location of the target voice message in response to the first input comprises:
responsive to the first input, determining a first location of the target voice message according to an input time of the first input;
displaying the target insert at the first location.
6. The method of claim 1, wherein the displaying the target insert at the first location of the target voice message in response to the first input comprises:
Determining, in response to the first input, a display parameter of the target interposer according to an interposer type of the target interposer;
and displaying the target insert at the first position of the target voice message according to the display parameters of the target insert.
7. The method according to claim 1, wherein the method further comprises:
determining display parameters of the target voice fragments according to the type of the target insert;
and updating and displaying the target voice fragment according to the display parameters of the target voice message.
8. The method of claim 1, wherein the target speech segment corresponding to the first location and the speech editing type of the target speech segment are determined according to an insert type of the target insert, and the insert type of the target insert is one of: a custom speech segment editing type, a pre-order speech segment editing type, a post-order speech segment editing type, a custom speech segment annotation type, a pre-order speech segment annotation type, or a post-order speech segment annotation type.
9. The method according to claim 1, wherein the method further comprises:
Receiving a second input of the target insert in the event that recording of the target voice message is paused;
and responding to the second input, and displaying a voice fragment editing interface according to the voice editing type of the target insert mark.
10. The method according to claim 1, wherein the method further comprises:
receiving a third input for the target insert;
and responding to the third input, and displaying a voice fragment editing interface according to the voice editing type of the target insert mark.
11. The method of claim 10, wherein the voice edit type of the target insert label is a text comment type;
the method further comprises the steps of:
receiving a fourth input to the speech segment editing interface;
and responding to the fourth input, and adding annotation information corresponding to the fourth input to the target voice segment.
12. The method of claim 10, wherein the voice edit type of the target insert label is a voice clip edit type;
the method further comprises the steps of:
receiving a fifth input to the speech segment editing interface;
and responding to the fifth input, and updating the target voice fragment according to editing information corresponding to the fifth input.
13. A voice message processing apparatus, the apparatus comprising: a receiving module and a display module;
the receiving module is used for receiving a first input under the condition of recording the target voice message;
the display module is used for responding to the first input received by the receiving module and displaying a target insert at a first position of the target voice message; the target insert is used for marking a target voice segment corresponding to the first position and a voice editing type of the target voice segment.
14. An electronic device comprising a processor, a memory and a program or instruction stored on the memory and executable on the processor, which when executed by the processor, implements the steps of the voice message processing method of any of claims 1 to 12.
15. A readable storage medium, characterized in that the readable storage medium has stored thereon a program or instructions which, when executed by a processor, implement the steps of the voice message processing method according to any of claims 1 to 12.