CN110061910B

CN110061910B - Method, device and medium for processing voice short message

Info

Publication number: CN110061910B
Application number: CN201910364866.3A
Authority: CN
Inventors: 杨静静
Original assignee: Shanghai Zhangmen Science and Technology Co Ltd
Current assignee: Shanghai Zhangmen Science and Technology Co Ltd
Priority date: 2019-04-30
Filing date: 2019-04-30
Publication date: 2021-11-30
Anticipated expiration: 2039-04-30
Also published as: CN110061910A; WO2020221105A1

Abstract

The invention discloses a method, a device and a medium for processing voice short messages. The scheme comprises: acquiring a trigger operation input by a user through a session interface of a chat session of social software; in response to the trigger operation, displaying an editing interface for a first voice short message; acquiring an editing operation based on the editing interface; and, in response to the editing operation, clipping the first voice short message to obtain a clipped voice short message. With the method, device or medium of the invention, the user can edit a voice short message and obtain the required voice short message without re-entering it, which improves the efficiency of communication based on voice short messages.

Description

Method, device and medium for processing voice short message

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a method, an apparatus, and a medium for processing a voice short message.

Background

In the prior art, most software products with social attributes provide a function for sending voice short messages: the user records his or her own voice and sends the recording to another user of the software. However, the existing function is simple and usually supports only the basic operations of entering and sending.

Disclosure of Invention

In view of this, the following embodiments of the present invention provide a method, an apparatus, and a medium for processing a voice short message, so as to avoid repeated entry, thereby improving communication efficiency based on the voice short message.

To solve the above technical problem, some embodiments of the present invention are implemented as follows:

In one aspect, some embodiments of the present invention provide a method for processing a voice short message, including: acquiring a trigger operation input by a user through a session interface of a chat session of social software; in response to the trigger operation, displaying an editing interface for a first voice short message, the first voice short message having been input by the user through the session interface; acquiring an editing operation based on the editing interface; and, in response to the editing operation, clipping the first voice short message to obtain a clipped voice short message.

In another aspect, some embodiments of the present invention provide a method for processing a voice short message, including: acquiring, by a terminal, a trigger operation input by a user through a session interface of a chat session of social software; in response to the trigger operation, displaying an editing interface for a first voice short message, the first voice short message having been input by the user through the session interface; acquiring an editing operation based on the editing interface; sending an editing instruction to a server, the editing instruction including information corresponding to the editing operation; acquiring a processing result fed back by the server for the clipping of the first voice short message; and obtaining the clipped voice short message based on the processing result.

In another aspect, some embodiments of the present invention provide a method for processing a voice short message, including: acquiring, by a server, a first voice short message uploaded by a terminal, the first voice short message having been input into the terminal through a session interface of a chat session of social software on the terminal; acquiring an editing instruction sent by the terminal, the editing instruction being generated based on an editing operation input through an editing interface of the social software on the terminal; in response to the editing instruction, clipping the first voice short message to obtain a clipped voice short message; and sending the clipped voice short message to the terminal.

In yet another aspect, some embodiments of the present invention provide an apparatus for information processing, including a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, trigger the apparatus to perform a method for processing a voice short message as described above.

In yet another aspect, some embodiments of the present invention provide a computer-readable medium, on which computer-readable instructions are stored, the computer-readable instructions being executable by a processor to implement a method for processing a voice short message as described above.
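The division of work between terminal and server described in the aspects above can be illustrated with a minimal sketch. It is not the implementation of the embodiments: the `EditInstruction` fields, the JSON encoding, the in-memory `store`, and the choice to describe the editing operation as a millisecond range to be cut out are all assumptions made for the example.

```python
import json
from dataclasses import dataclass, asdict
from typing import Dict, List


@dataclass
class EditInstruction:
    # All fields are hypothetical; the source only says the instruction
    # "comprises information corresponding to the editing operation".
    message_id: str  # which uploaded first voice short message to clip
    start_ms: int    # start of the portion to cut out
    end_ms: int      # end of the portion to cut out


def encode_instruction(instr: EditInstruction) -> str:
    """Terminal side: serialize the editing instruction for sending."""
    return json.dumps(asdict(instr))


def handle_instruction(store: Dict[str, List[int]], rate_hz: int, payload: str) -> List[int]:
    """Server side: clip the stored recording and return the clipped samples."""
    instr = EditInstruction(**json.loads(payload))
    samples = store[instr.message_id]
    lo = instr.start_ms * rate_hz // 1000
    hi = instr.end_ms * rate_hz // 1000
    # The uploaded recording itself is left untouched; a new list is returned.
    return samples[:lo] + samples[hi:]
```

In this sketch the terminal would call `encode_instruction`, transmit the string, and treat the list returned by `handle_instruction` as the processing result from which the clipped voice short message is obtained.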

In the prior art, if a user speaks haltingly or says something wrong while recording, the user can only cancel sending the current recording and record again. When re-recording, the user has to repeat content already spoken, which causes repeated recording and reduces the efficiency of communication based on voice short messages. Nevertheless, a voice short message function that supports only basic operations has become a matter of habitual thinking in the field, because such a function is usually bundled with social software such as instant messaging and is widely used in daily life. At least one technical solution adopted by the above embodiments of the present invention breaks this habitual thinking and can achieve the following beneficial effects: a trigger operation input by a user through a session interface of a chat session of social software is acquired; in response to the trigger operation, an editing interface for the first voice short message is displayed; an editing operation based on the editing interface is acquired; and, in response to the editing operation, the first voice short message is edited to obtain an edited voice short message. The user therefore does not need to enter a new voice short message when a voice short message needs editing, and can edit based on the original voice short message, which improves the efficiency of communication based on voice short messages.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and not to limit the invention. In the drawings:

Fig. 1 is a schematic view of an application scenario of a method for processing a voice short message according to some embodiments of the present invention;

Fig. 2 is a flowchart illustrating a method for processing a voice short message according to some embodiments of the present invention;

Fig. 3 is a schematic illustration of a conversation interface of a chat session of social software according to some embodiments of the invention;

Fig. 4 is a schematic diagram of a conversation interface including a text editing interface for a first voice short message according to some embodiments of the present invention;

Fig. 5 is a schematic diagram of a triggering manner for an inputted but unsent first voice short message according to some embodiments of the present invention;

Fig. 6 is a schematic diagram of a triggering manner for a sent first voice short message according to some embodiments of the present invention;

Fig. 7 is a schematic diagram of an editing manner for a plurality of first voice short messages according to some embodiments of the present invention;

Fig. 8 is a flowchart illustrating a method for processing a voice short message according to the application scenario in Fig. 1 according to some embodiments of the present invention;

Fig. 9 is a schematic structural diagram of an apparatus for information processing according to some embodiments of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the specific embodiments of the present invention and the accompanying drawings. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The technical solutions provided by the embodiments of the present invention are described in detail below with reference to the accompanying drawings.

Fig. 1 is a schematic application scenario diagram of a method for processing a voice short message according to some embodiments of the present invention. As shown in fig. 1, the application scenario may include a server 101 and a terminal device 102.

The server 101 is a server of a service provider capable of providing a social network service. The service provider may run an application 103 that supports a voice short message sending function; the application 103 may be social software. Based on the server 101 and the application 103, the service provider may provide users with a social network service that supports interaction by voice short message.

The terminal device 102 may be loaded with the application 103 that supports the voice short message sending function. As an example, in fig. 1 the terminal device 102 may be the client side of the application 103 and, correspondingly, the server 101 may be the server side of the application 103. In practical applications, the terminal device 102 may be any of various terminal devices with a display screen, including but not limited to a smartphone, a tablet computer, a portable computer or a desktop computer; the server 101 may be a service device that provides various services, including but not limited to an integrated server or a distributed server.

It should be understood that the number of servers and terminal devices in fig. 1 is merely illustrative. Any number of servers and terminal devices may be employed, as required by the implementation.

Fig. 2 is a flowchart illustrating a method for processing a voice short message according to some embodiments of the present invention. The execution subject of the flow may be a terminal device loaded with an application program.

As shown in fig. 2, the process may include:

Step S201: acquire a trigger operation input by a user through a session interface of a chat session of the social software.

The session interface of the chat session of the social software may be one of the application interfaces of the application program and may be displayed on the user's terminal device. The user can operate the session interface according to preset rules of the social software to input the trigger operation, and then edit the voice short message.

In practical applications, a voice short message editing button may be provided in the session interface, and the user may click it to input the trigger operation. Alternatively, the session interface may have no such button; instead, after the user clicks a voice short message in the session interface, an option offering the editing function is displayed.

Fig. 3 is a schematic view of a session interface of a chat session of social software according to some embodiments of the present invention. In fig. 3, the session interface is displayed on the terminal device 102 of a first user, and no voice short message editing button is permanently provided in it. As shown in (a) of fig. 3, a session interface 301 of the chat session is displayed on the display screen of the terminal device 102. In the session interface 301, session message 302 and session message 303 are messages input by the first user, and session message 304 is a message input by a second user. The second user is the chat partner of the first user, and the first user and the second user may be different users. Session message 302 carries a voice short message identifier in the form of a "sound wave" together with voice duration information ("2 s" means a duration of 2 seconds), which indicates that session message 302 is a voice short message lasting 2 seconds. Similarly, session message 303 is a voice short message lasting 6 seconds. Session message 304 carries neither the identifier nor duration information, which indicates that it is a text message. A voice short message input button 305 may also be provided in the session interface 301, and the user may click it to trigger the voice input function. When the user needs to edit voice short message 303, the user can click it. The session interface 301 may then be as shown in (b) of fig. 3: a voice short message editing button 306 is displayed on the session interface 301, and the user may click the button 306 to input the trigger operation.

Step S202: in response to the trigger operation, display an editing interface for the first voice short message, the first voice short message having been input by the user through the session interface.

In this embodiment, the editing interface is configured to display information related to an editing operation on the first voice short message, so that a user can perform a corresponding editing operation on the first voice short message as required.

Step S203: acquire an editing operation based on the editing interface.

Step S204: in response to the editing operation, clip the first voice short message to obtain a clipped voice short message.

In this embodiment, after the trigger operation input by the user is acquired, the editing interface for the first voice short message is displayed; after the user's editing operation based on the editing interface is acquired, the corresponding edited voice short message is generated. The user can thus edit the first voice short message and obtain the required voice short message without re-entering it, which improves the efficiency of communication based on voice short messages and the user experience.
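Steps S201 to S204 can be sketched as a small event-driven flow. The following Python is an illustration only: the names `VoiceMessage`, `EditSession`, `on_trigger` and `on_edit`, the representation of audio as a list of sample values, and the modelling of an editing operation as a function over those samples are assumptions made for the example and are not taken from the embodiments.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Samples = List[int]  # hypothetical: audio as a list of PCM sample values


@dataclass
class VoiceMessage:
    message_id: str
    samples: Samples


class EditSession:
    """Hypothetical terminal-side flow for steps S201 to S204."""

    def __init__(self, message: VoiceMessage):
        self.message = message
        self.editor_visible = False

    def on_trigger(self) -> None:
        # S201/S202: a trigger operation arrives, so the editing interface is shown.
        self.editor_visible = True

    def on_edit(self, edit: Callable[[Samples], Samples]) -> Optional[VoiceMessage]:
        # S203: an editing operation is only accepted through the editing interface.
        if not self.editor_visible:
            return None
        # S204: apply the edit to a copy of the original recording;
        # nothing is re-recorded and the original stays unchanged.
        clipped = edit(list(self.message.samples))
        return VoiceMessage(self.message.message_id + "-clipped", clipped)
```

For example, after `on_trigger()` has been called, `on_edit(lambda s: s[:3])` returns a new message holding only the first three samples, while the original message is kept as entered.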

In a chat scenario based on voice short messages, a voice short message entered by the user may contain a lot of content. The user may not remember exactly what was said and cannot accurately and intuitively locate the problematic portion of the recording, which makes it difficult to perform editing operations on the editing interface. Therefore, building on the method of the foregoing embodiment, this specification provides an implementation for acquiring the editing operation on the editing interface.

In one implementation of acquiring the editing operation on the editing interface, a conversion process for the first voice short message may be performed before the editing interface for the first voice short message is displayed in step S202. The conversion may be performed locally by the terminal or by the server.

When performed locally by the terminal, the conversion may include: recognizing the first voice short message to obtain text information corresponding to the first voice short message.

Correspondingly, displaying the editing interface for the first voice short message in step S202 may include: displaying a text editing interface for the first voice short message, the text editing interface containing the text corresponding to the text information.

Correspondingly, acquiring the editing operation based on the editing interface in step S203 may include: acquiring a selection operation on the text in the text editing interface.

Fig. 4 is a schematic diagram of a session interface including a text editing interface for a first voice short message according to some embodiments of the present invention. As shown in (a) of fig. 4, a session interface of the chat session of the social software is displayed on the display screen of the terminal device 102. The session interface includes a first voice short message 303 and a text editing interface 401 for that message; the first voice short message 303 is the same as the session message 303 in fig. 3. The text editing interface 401 displays the text corresponding to the first voice short message 303. Assume that the text is "I plan to leave work at six tonight and expect to get home at five tonight". Because the stated time of arriving home is earlier than the time of leaving work, the arrival time is wrong, and the first user needs to edit the first voice short message 303 to obtain a voice short message whose content is correct. Specifically, the user may perform a selection operation on the text editing interface according to preset rules of the social software, and the social software acquires that selection operation. When the user selects text on the text editing interface, the display state of the selected text may change, for example its size, font or shading. Diagram (b) of fig. 4 takes a change of shading as an example: assuming that unselected text has white shading, it can be seen that the user has selected the text "expect to get home at five tonight". The selected text may also be displayed in other ways, which are not described in detail here.

In this implementation, the text obtained by recognizing the first voice short message is shown to the user in the text editing interface, so that the user can learn the specific content of the first voice short message without playing it, which makes it easier to spot problems in the message. Because the editing operation is performed by selecting text on the text editing interface, the user can edit conveniently, quickly and intuitively, which can improve the user experience.
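One possible way to connect a text selection to the recording is sketched below. The sketch rests on two assumptions that the embodiment does not state: that speech recognition returns a start and end time for each recognized word, and that the selected text marks the portion of the recording to be cut out. The names `Word`, `selection_to_range` and `remove_range` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    text: str
    start_ms: int  # hypothetical per-word timing supplied by the recognizer
    end_ms: int


def selection_to_range(words: List[Word], first: int, last: int) -> Tuple[int, int]:
    """Map a selection of words first..last (inclusive indices) to a time range."""
    if not 0 <= first <= last < len(words):
        raise ValueError("selection out of bounds")
    return words[first].start_ms, words[last].end_ms


def remove_range(samples: List[int], rate_hz: int, start_ms: int, end_ms: int) -> List[int]:
    """Return a copy of the samples with the interval [start_ms, end_ms) cut out."""
    lo = start_ms * rate_hz // 1000
    hi = end_ms * rate_hz // 1000
    return samples[:lo] + samples[hi:]
```

With this mapping, selecting the erroneous phrase in the text editing interface yields a time range, and the clipped voice short message is the original recording with that range removed.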

In practical applications, the editing interface for the first voice short message may also be an audio editing interface, which may include a playback timeline corresponding to the first voice short message. By selecting two moments on the timeline, the user selects the portion of the voice short message lying between them. The editing interface in some embodiments of the present invention may take various forms, such as a text editing interface or an audio editing interface, and is not specifically limited here.
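Selection on the playback timeline can be sketched as follows. The function names, the representation of audio as a list of samples at a fixed sample rate, and the decision to order and clamp the two moments are assumptions made for illustration; the embodiment only states that two moments on the timeline select the corresponding portion of the message.

```python
from typing import List, Tuple


def normalize_moments(duration_ms: int, t1_ms: int, t2_ms: int) -> Tuple[int, int]:
    """Order two selected moments and clamp them to the message duration."""
    lo, hi = sorted((t1_ms, t2_ms))
    lo = max(0, min(lo, duration_ms))
    hi = max(0, min(hi, duration_ms))
    return lo, hi


def segment_between(samples: List[int], rate_hz: int, t1_ms: int, t2_ms: int) -> List[int]:
    """Return the samples lying between the two selected moments."""
    duration_ms = len(samples) * 1000 // rate_hz
    lo, hi = normalize_moments(duration_ms, t1_ms, t2_ms)
    return samples[lo * rate_hz // 1000 : hi * rate_hz // 1000]
```

Ordering the two moments lets the user pick them in either direction, and clamping keeps a selection that runs past the end of the message within the recording.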

Since the user can send, enter and recall voice short messages in the session interface of the chat session of the social software, different voice short messages input by the user may be in different states, such as an input-but-not-sent state, a sent state and a recalled state. To enrich the functions of the social software and meet more user needs, acquiring the trigger operation input by the user in step S201 may be implemented in several ways. One implementation is to acquire a trigger operation on a voice short message that has been input but not sent; another is to acquire a trigger operation on a voice short message in the sent state.

In one implementation of acquiring the trigger operation input by the user, acquiring the trigger operation input by the user through the session interface of the social software in step S201 may include:

acquiring a first trigger operation performed by the user on a first voice short message that has been input but not sent.

In this implementation manner of some embodiments of the present invention, after the user enters the first voice short message, the voice state of the first voice short message is the input but unsent state, and the user needs to perform the trigger operation of sending the first voice short message to send the first voice short message. A trigger area for the input and unsent first voice short message is displayed in the session interface of the social software, and the user can trigger the trigger area according to a preset rule of the social software so as to execute a first trigger operation on the input and unsent first voice short message. It should be noted that, the specific operation step of the user performing the first triggering operation on the input but unsent first voice short message may have various implementations, and is not limited in particular herein.

Fig. 5 is a schematic diagram of a triggering manner for a first voice short message that has been input but not sent in some embodiments of the present invention. As shown in (a) of fig. 5, a session interface of the chat session of the social software is displayed on the display screen of the terminal device 102. A first voice short message 501 in the session interface carries a text label such as "not sent" at one end, indicating that it is in the input-but-not-sent state; a voice short message 302 in the session interface carries no such label, indicating that it is in the sent state. The user can produce a voice short message in the input-but-not-sent state by clicking the voice short message input button 305 and entering voice. When the user clicks the first voice short message 501, the session interface may be as shown in (b) of fig. 5: a voice short message editing button 502 is displayed, and the user may click it to input the first trigger operation on the first voice short message 501.

In this implementation, the user can perform the first trigger operation on a first voice short message that has been input but not sent, edit it, and obtain the required voice short message without entering it again. Because the message can be edited before it is sent, the user can largely avoid sending inaccurate voice short messages, which improves the experience of the recipient.

In a chat scenario based on voice short messages, after entering a voice short message the user may be unsure whether it is clear, and may therefore need to check its content.

In some optional implementations of this embodiment, before the first trigger operation on the input-but-not-sent first voice short message is acquired, the method may further include:

acquiring a play operation performed by the user on the first voice short message that has been input but not sent; and

in response to the play operation, playing that first voice short message.

In this implementation, when the user has entered the first voice short message but has not yet performed the operation of sending it, the message is in the input-but-not-sent state, and the user may perform a play operation on it to make the terminal device play it. The specific steps of the play operation may be implemented in various ways and are not limited here.

Taking fig. 5 as an example, the first user may perform a play operation by directly clicking on the input, but not sent, first voice short message 501. Of course, according to actual needs, the play button may be displayed after the user clicks the inputted but unsent first voice short message 501, and the user may execute the play operation on the inputted but unsent first voice short message 501 by clicking the play button.

In this implementation manner, the user can perform a play operation on the first voice short message that has been input but not sent, so that the user can obtain the content of the first voice short message in an auditory manner, which is convenient and fast, and has good practicability.

In another implementation of acquiring the trigger operation input by the user, acquiring the trigger operation input by the user through the session interface of the social software in step S201 may include:

acquiring a second trigger operation performed by the user on a first voice short message that has been sent.

Correspondingly, before the editing interface for the first voice short message is displayed, the method may further include:

in response to the second trigger operation, recalling the sent first voice short message from the chat session.

In this implementation, after the user sends the entered first voice short message, its voice state may change to the sent state. The user may perform a second trigger operation on the sent first voice short message (that is, the voice short message whose voice state is the sent state); the second trigger operation may be a trigger operation instructing that the sent first voice short message be recalled. For example, the user may input the second trigger operation by clicking the sent first voice short message and then a recall option. The second trigger operation may also be implemented in other ways, which are not limited here.

Fig. 6 is a schematic diagram of a triggering manner for a sent first voice short message in some embodiments of the present invention. As shown in fig. 6, a session interface of the chat session of the social software is displayed on the display screen of the terminal device 102. When the user clicks the sent first voice short message 302, the session interface is as shown in (a) of fig. 6: a recall button 601 for recalling the sent first voice short message 302 is displayed, and the user can input the second trigger operation by clicking it. After the user clicks the recall button 601, the social software responds to the second trigger operation by recalling the sent first voice short message 302 from the chat session. The session interface may then be as shown in (b) of fig. 6, where voice short message 602 is followed by a text label such as "recalled", indicating that it is in the recalled state; voice short message 602 is the first voice short message 302 after recall. In response to the second trigger operation, the social software may also display an editing interface for voice short message 602 in the session interface.

In this implementation, the user may perform a second trigger operation on the sent first voice short message, and the social software may recall the selected message in response. The user can then edit the recalled voice short message to obtain the required voice short message without entering the corresponding first voice short message again, which reduces the time spent on voice entry. This implementation also provides a remedy when the user has sent an inaccurate voice short message.
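The two trigger paths described above, one for a message that has been input but not sent and one for a message that has been sent, can be summarized in a small state sketch. The enum values mirror the three states named in the description; the class and function names, and the choice to open the editor immediately after a recall, are assumptions made for illustration.

```python
from enum import Enum


class VoiceState(Enum):
    UNSENT = "input but not sent"
    SENT = "sent"
    RECALLED = "recalled"


class ChatMessage:
    def __init__(self, message_id: str, state: VoiceState):
        self.message_id = message_id
        self.state = state
        self.editor_open = False


def handle_edit_trigger(msg: ChatMessage) -> ChatMessage:
    """Open the editing interface, recalling the message first if it was sent."""
    if msg.state is VoiceState.SENT:
        # Second trigger operation: recall from the chat session before editing.
        msg.state = VoiceState.RECALLED
    # First trigger operation (unsent message), or continuation after a recall.
    msg.editor_open = True
    return msg
```

In this sketch a message in the input-but-not-sent state keeps its state when the editor opens, whereas a sent message always passes through the recalled state first, so the chat partner no longer sees the version being edited.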

In a chat scenario based on voice short messages, after entering several voice short messages the user may find that the order in which the chat partner receives them needs to be adjusted so that the chat partner can understand what the user means. For this case, this specification provides an implementation of editing a plurality of first voice short messages.

In this implementation, a plurality of first voice short messages may be displayed in the editing interface, and acquiring the editing operation based on the editing interface in step S203 may include:

acquiring a selection operation, input through the editing interface, on a plurality of the first voice short messages; and acquiring a splicing operation on the plurality of selected first voice short messages.

Correspondingly, clipping the first voice short message in response to the editing operation in step S204 may include:

generating a second voice short message based on the plurality of selected first voice short messages, where the second voice short message contains the selected first voice short messages and their playing order in the second voice short message is the same as the order in which they were selected.

In this implementation, the voice state of any one of the plurality of first voice short messages may be the input-but-not-sent state, the sent state or the recalled state.

Fig. 7 is a schematic diagram illustrating an editing manner for a plurality of voice short messages according to some embodiments of the present invention. As shown in (a) of fig. 7, the terminal device 102 displays an editing interface 701 generated after the user inputs a trigger operation on the session interface of the chat session of the social software. The editing interface displays the first voice short message 302 with its selection option 703, and the first voice short message 303 with its selection option 704. Assume that the content of the first voice short message 302 is "I will go home and cook this evening", and the content of the first voice short message 303 is "I get off work at 5 pm today".

In diagram (a) of fig. 7, the selection option 703 is filled with white, which may indicate that the first voice short message 302 corresponding to the selection option 703 is not selected. In practical applications, a user can select a first voice short message by clicking its selection option, and the display state of the selection option can change after the click. As shown in diagram (b) of fig. 7, in the editing interface 702 displayed after the user performs the selection operation, the selection option 705, which corresponds to the selection option 703, is filled with black and carries a tick mark, which may indicate that the first voice short message 302 has been selected by the user. Specifically, assume that the user first clicks the selection option 704, so that the corresponding selection option 706 is filled with black and carries a tick mark, and then clicks the selection option 703; that is, the user selects the first voice short message 303 first and the first voice short message 302 second. The first voice short message 303 and the first voice short message 302 are then spliced in that order to generate a second voice short message, and the content expressed when the second voice short message is played can be "I get off work at 5 pm today, and I am going home to cook this evening".

In this implementation manner, a second voice short message including a plurality of selected first voice short messages may be generated according to the plurality of selected first voice short messages selected by the user, and the playing sequence of the plurality of selected first voice short messages in the second voice short message is consistent with the selected sequence of the plurality of selected first voice short messages. The user can adjust the playing sequence of the recorded voice short messages according to the requirement, so that the chat object can conveniently understand the content expressed by the received voice short messages. And the user can splice a plurality of recorded voice short messages into a voice short message to be sent to the chat object, so that the chat object does not need to click each voice short message to obtain the content corresponding to the voice short message sent by the user, the operation steps of the user serving as the chat object are reduced, and the experience of the chat object on social software can be improved.
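The splicing rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: it assumes each first voice short message has already been reduced to raw audio data in one common encoding (file headers removed), and the function name `splice_in_selection_order` and the use of the reference numerals 302 and 303 as message keys are hypothetical.

```python
from typing import Dict, List


def splice_in_selection_order(messages: Dict[int, bytes],
                              selection_order: List[int]) -> bytes:
    """Concatenate raw audio payloads in the order the user selected them.

    Assumption: every payload uses the same encoding, so plain byte
    concatenation yields playable audio data for the second message.
    """
    parts = []
    for message_id in selection_order:
        if message_id not in messages:
            raise KeyError(f"unknown voice short message: {message_id}")
        parts.append(messages[message_id])
    return b"".join(parts)


# Message 303 is selected first and message 302 second, as in fig. 7 (b),
# so the audio of 303 is played before the audio of 302.
recorded = {302: b"<cook-tonight>", 303: b"<off-work-at-5>"}
second_message = splice_in_selection_order(recorded, [303, 302])
```

The playing order depends only on the order of the selection list, not on the order in which the messages were recorded.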

In a chat scene based on the voice short message, when a user edits an input voice short message, the input voice short message may contain content to be retained or content to be deleted. To make it easier for the user to obtain the desired voice short message through the editing operation, and based on the implementation manner in which the displayed editing interface is a text editing interface and the editing operation is a selection operation on the text editing interface, there may be a plurality of implementation manners for obtaining the clipped voice short message in response to the editing operation in step S204. One implementation may be: deleting the part of the voice short message corresponding to the characters selected by the selection operation. Another implementation may be: retaining the part of the voice short message corresponding to the characters selected by the selection operation.

In the first implementation manner of obtaining the clipped voice short message in response to the editing operation, step S204: the clipping the first voice short message in response to the editing operation to obtain a clipped voice short message may include:

and determining a starting time and an ending time according to the characters selected by the selection operation, wherein the starting time is the pronunciation starting time corresponding to the first character in the characters selected by the selection operation in the first voice short message, and the ending time is the pronunciation ending time corresponding to the last character in the characters selected by the selection operation in the first voice short message.

And determining the selected voice short message between the starting time and the ending time.

And deleting the selected voice short message to obtain the remaining voice short message.
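The determination of the starting time and the ending time can be sketched as below. The sketch assumes that speech recognition supplies a pronunciation start and end time for every character, which the text implies but does not specify; the `CharTiming` type and the `selection_to_interval` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CharTiming:
    char: str
    start: float  # pronunciation start time within the message, in seconds
    end: float    # pronunciation end time within the message, in seconds


def selection_to_interval(timings: List[CharTiming],
                          first: int, last: int) -> Tuple[float, float]:
    """Map a selection of characters (inclusive indices) to a time interval.

    The interval runs from the start of the first selected character
    to the end of the last selected character.
    """
    if not (0 <= first <= last < len(timings)):
        raise ValueError("selection is outside the displayed text")
    return timings[first].start, timings[last].end


timings = [
    CharTiming("a", 0.0, 0.4),
    CharTiming("b", 0.4, 0.9),
    CharTiming("c", 0.9, 1.1),
    CharTiming("d", 1.1, 1.6),
]
start_time, end_time = selection_to_interval(timings, 1, 2)
```

The audio between `start_time` and `end_time` is the selected voice short message that is then deleted.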

In practical applications, the deleting the selected voice short message to obtain a remaining voice short message may include: analyzing a first audio file corresponding to the first voice short message to obtain first file header data and first audio data; determining second audio data corresponding to the selected voice short message from the first audio data; determining third audio data, wherein the third audio data is the residual audio data of the first audio data after the second audio data is removed; determining second header data according to the first header data and the third audio data; and generating a second audio file according to the second file header data and the third audio data.
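The parse, cut and regenerate sequence can be sketched with Python's standard `wave` module. The text does not name an audio format, so uncompressed PCM in a WAV container is an assumption made only for this illustration, and `delete_interval` is a hypothetical helper name. The `wave` writer regenerates the header sizes when the output is closed, which plays the role of determining the second header data.

```python
import io
import wave


def delete_interval(wav_bytes: bytes, start_s: float, end_s: float) -> bytes:
    """Remove the audio between start_s and end_s from a PCM WAV file."""
    # Parse the first audio file into header parameters and audio data.
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        frame_count = src.getnframes()
        first_audio = src.readframes(frame_count)

    frame_size = params.sampwidth * params.nchannels
    # Convert times to frame indices, clamped to the valid range.
    start_f = max(0, min(frame_count, round(start_s * params.framerate)))
    end_f = max(start_f, min(frame_count, round(end_s * params.framerate)))

    # Third audio data: the first audio data with the selected part removed.
    third_audio = (first_audio[:start_f * frame_size]
                   + first_audio[end_f * frame_size:])

    # Write the second audio file; the header is rebuilt for the new length.
    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(third_audio)
    return out.getvalue()
```

Compressed formats would need decoding or cutting on codec frame boundaries, which this sketch does not attempt.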

An audio file is parsed to obtain header data and audio data, wherein the header data includes the total number of bytes of the audio file and the number of bytes of the audio data. For ease of understanding, assume that the first header data indicates that the total number of bytes of the first audio file is 220 bytes and the number of bytes of the first audio data is 200 bytes. Since the number of bytes of the first header is the difference between the total number of bytes of the first audio file and the number of bytes of the first audio data, the first header occupies 20 bytes. The number of bytes of audio data is positively and linearly correlated with the duration of the audio data. Assuming that the duration of the first audio data is 6 seconds and the duration of the determined second audio data is 3 seconds, then according to the formula: number of bytes of the second audio data = (duration of the second audio data / duration of the first audio data) × number of bytes of the first audio data, the second audio data occupies 3 / 6 × 200 = 100 bytes. The number of bytes of the third audio data is the difference between the number of bytes of the first audio data and the number of bytes of the second audio data, that is, 200 - 100 = 100 bytes. Since the headers of audio files generally have the same number of bytes, the second header can be determined to occupy 20 bytes. The total number of bytes of the second audio file is the sum of the number of bytes of the second header and the number of bytes of the third audio data, that is, 120 bytes. Correspondingly, the second header data may include: the total number of bytes of the second audio file is 120 bytes and the number of bytes of the third audio data is 100 bytes; at this time, a second audio file may be generated based on the second header data and the third audio data.
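The byte arithmetic of this example can be reproduced with a short calculation. It relies only on the stated proportionality between the number of bytes and the duration of the audio data; the function name `clipped_file_layout` and the dictionary keys are hypothetical.

```python
def clipped_file_layout(total_bytes: int, audio_bytes: int,
                        audio_seconds: float, selected_seconds: float) -> dict:
    """Derive the byte counts of the second audio file after a deletion."""
    # The header is whatever the file holds beyond the audio data.
    header_bytes = total_bytes - audio_bytes
    # Byte count scales linearly with duration.
    second_audio_bytes = round(audio_bytes * selected_seconds / audio_seconds)
    third_audio_bytes = audio_bytes - second_audio_bytes
    return {
        "header_bytes": header_bytes,
        "second_audio_bytes": second_audio_bytes,
        "third_audio_bytes": third_audio_bytes,
        # The header size is assumed unchanged in the second audio file.
        "second_file_total_bytes": header_bytes + third_audio_bytes,
    }


layout = clipped_file_layout(total_bytes=220, audio_bytes=200,
                             audio_seconds=6, selected_seconds=3)
# layout["second_file_total_bytes"] is 120 for the figures used in the text
```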

In this implementation manner, in response to a selection operation of a user on the text on the text editing interface, the part of the voice short message selected by the selection operation may be determined and deleted, so as to obtain the remaining voice short message. This manner is suitable when only a small part of the first voice short message needs to be deleted: the user can complete the editing with a few selection operations, which is convenient and fast.

In practical applications, before deleting the selected voice short message, the method may further include: displaying a first operation option for deleting the selected voice short message; and acquiring the trigger operation of the first operation option.

Correspondingly, the deleting the selected voice short message may include: and deleting the selected voice short message after the first operation option is triggered.

In practical application, if a user finds that the selection operation is performed incorrectly and selects a character which does not need to be edited, the user can return to a character editing interface by clicking an operation option for canceling the selection operation so as to perform the selection operation again. After the user determines that the characters selected by the selecting operation are the characters needing to be edited, the user can control the social software to delete the voice short message corresponding to the selected characters by clicking a first operation option for deleting the selected voice short message.

In this way, by setting a first operation option for performing a deletion operation on the selected voice short message and deleting the selected voice short message after the first operation option is triggered, the first voice short message is correspondingly edited under the condition that the user confirms that the execution of the selection operation is correct, the accuracy of obtaining the remaining voice short messages after editing the first voice short message is improved, the probability of generating wrong voice short messages due to user operation errors is reduced, and further the user experience is improved.
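The confirm-before-delete behaviour can be sketched as a small state holder. The sketch tracks only the text side of the edit, with the audio clipping left out; the class name `DeleteConfirmation` and its method names are hypothetical and do not come from the source.

```python
from typing import Optional, Tuple


class DeleteConfirmation:
    """Holds a selection until the delete option is triggered or cancelled."""

    def __init__(self, text: str) -> None:
        self.text = text
        # Selected character range as [start, end), or None if nothing is selected.
        self.selection: Optional[Tuple[int, int]] = None

    def select(self, start: int, end: int) -> None:
        if not (0 <= start < end <= len(self.text)):
            raise ValueError("selection is outside the displayed text")
        self.selection = (start, end)

    def cancel(self) -> None:
        # The user noticed a wrong selection and returns to the editing interface.
        self.selection = None

    def confirm_delete(self) -> str:
        # Corresponds to triggering the first operation option.
        if self.selection is None:
            raise RuntimeError("nothing is selected")
        start, end = self.selection
        self.text = self.text[:start] + self.text[end:]
        self.selection = None
        return self.text


edit = DeleteConfirmation("abcdef")
edit.select(1, 3)
edit.cancel()                      # wrong selection, nothing is deleted
edit.select(2, 4)
remaining = edit.confirm_delete()  # deletion happens only after confirmation
```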

In the above-mentioned second implementation manner of obtaining the clipped voice short message in response to the editing operation, step S204: the clipping the first voice short message in response to the editing operation to obtain a clipped voice short message may include:

And reserving the selected voice short message.

In practical applications, the retaining the selected voice short message may include: analyzing a first audio file corresponding to the first voice short message to obtain first file header data and first audio data; determining second audio data corresponding to the selected voice short message from the first audio data; determining second header data according to the first header data and the second audio data; and generating a second audio file according to the second file header data and the second audio data.
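Retaining the selected part can be sketched directly on the parsed audio data, using the proportionality between byte count and duration. The sketch assumes uncompressed audio with a fixed frame size, which the source does not state; `keep_interval` and its `frame_size` parameter are hypothetical.

```python
def keep_interval(first_audio: bytes, duration_s: float,
                  start_s: float, end_s: float, frame_size: int = 2) -> bytes:
    """Return the second audio data: the bytes between start_s and end_s."""
    if duration_s <= 0 or not (0 <= start_s <= end_s <= duration_s):
        raise ValueError("interval is outside the audio data")

    def to_offset(t: float) -> int:
        # Map a time to a byte offset, rounded down to a whole frame.
        offset = int(len(first_audio) * t / duration_s)
        return offset - offset % frame_size

    return first_audio[to_offset(start_s):to_offset(end_s)]


first_audio = bytes(range(200))          # 200 bytes standing in for 6 seconds
second_audio = keep_interval(first_audio, 6.0, 0.0, 3.0)
```

The second header data would then be derived from the first header data and the length of `second_audio`.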

Specifically, the second audio file is generated from the second header data and the second audio data in basically the same manner as the second audio file is generated from the second header data and the third audio data in the first implementation manner of obtaining the clipped voice short message in response to the editing operation, and the details are not repeated here.

In this implementation manner, in response to a selection operation of a user on the text on the text editing interface, the part of the voice short message selected by the selection operation may be determined and retained as the clipped voice short message. This manner is suitable when only a small part of the first voice short message needs to be retained: the user can complete the editing with a few selection operations, which is convenient and fast.

In practical applications, before the retaining the selected voice short message, the method may further include:

displaying a second operation option for reserving the selected voice short message; and acquiring the trigger operation of the second operation option.

The retaining the selected voice short message may include:

and after the second operation option is triggered, retaining the selected voice short message.

In this way, by setting a second operation option for performing a retention operation on the selected voice short message and retaining the selected voice short message after the second operation option is triggered, the first voice short message is clipped only after the user confirms that the selection operation was performed correctly. This improves the accuracy of the retained voice short message obtained by clipping the first voice short message, reduces the probability of generating a wrong voice short message due to a user operation error, and further improves the user experience.

After the user performs the editing operation to obtain the clipped voice short message, the clipped voice short message needs to be sent to the chat target, so that step S204: after the clipped voice short message is obtained, the method may further include:

acquiring the sending operation of the clipped voice short message; and responding to the sending operation, sending the clipped voice short message so that the clipped voice short message is presented in a session interface of the chat session.

In this implementation manner, a voice short message sending button may also be displayed on the session interface of the chat session of the social software, and the user may send the clipped voice short message to the chat object by clicking the voice short message sending button. Specifically, before the user clicks the voice short message sending button, the clipped voice short message may be presented in an input but unsent state, and after the user clicks the voice short message sending button, the clipped voice short message may be presented in the session interface in a sent state, so that the user can conveniently identify the voice state of each voice short message.
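The voice states mentioned in this description and the transition caused by the sending operation can be sketched as follows. The names `VoiceState`, `VoiceMessage` and `send` are hypothetical, and rejecting a send for a message that is not in the input but unsent state is an assumption of this sketch rather than a rule stated in the source.

```python
from dataclasses import dataclass
from enum import Enum


class VoiceState(Enum):
    INPUT_NOT_SENT = "input but not sent"
    SENT = "sent"
    REVOKED = "revoked"


@dataclass
class VoiceMessage:
    audio: bytes
    # A clipped voice short message starts out as input but unsent.
    state: VoiceState = VoiceState.INPUT_NOT_SENT


def send(message: VoiceMessage) -> VoiceMessage:
    """Handle a click on the send button for a clipped voice short message."""
    if message.state is not VoiceState.INPUT_NOT_SENT:
        # Assumption: only unsent messages can be sent.
        raise ValueError(f"cannot send a message in state {message.state.value!r}")
    message.state = VoiceState.SENT
    return message


clipped = VoiceMessage(audio=b"<clipped audio>")
send(clipped)  # the session interface would now present the message as sent
```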

Based on the same idea, corresponding to the method in the above embodiments, some embodiments of the present invention further provide methods for processing a voice short message in which the execution subject is a terminal device or a server.

Fig. 8 is a flowchart illustrating another method for processing a voice short message according to some embodiments of the present invention. As shown in fig. 8, the process may include:

Step S801: the server acquires a first voice short message uploaded by the terminal; the first voice short message is input into the terminal through a session interface of a chat session of social software of the terminal.

Step S802: the terminal acquires the trigger operation input by the user through a session interface of the chat session of the social software.

Step S803: and the terminal sends a voice recognition instruction to the server, wherein the voice recognition instruction is used for indicating the conversion of the first voice short message into corresponding text information.

In practical applications, the conversion process for the first voice short message may be performed locally by the terminal, or may be performed by the server. When executed by the server, the terminal may send a voice recognition instruction to the server to instruct the server to generate a conversion result for the first voice short message.

Step S804: the server identifies the first voice short message to obtain a conversion result; and the conversion result comprises the text information corresponding to the first voice short message.

Step S805: and the server sends the conversion result to the terminal.

Step S806: and the terminal displays a text editing interface aiming at the first voice short message according to the conversion result, wherein the text editing interface comprises the text corresponding to the text information.

Step S807: the terminal acquires the editing operation input by the user through the editing interface of the social software of the terminal, and generates an editing instruction based on the editing operation.

Step S808: and the terminal sends an editing instruction to the server, wherein the editing instruction comprises information corresponding to the editing operation.

In practical applications, the clipping process for the first voice short message may be performed locally by the terminal or may be performed by the server. When executed by the server, the terminal may send editing instructions to the server to instruct the server to clip the first voice short message.

Step S809: and the server responds to the editing instruction, clips the first voice short message and obtains the clipped voice short message.

Step S810: and the server sends the clipped voice short message to the terminal. The terminal can obtain the processing result of clipping the first voice short message fed back by the server; and obtaining the clipped voice short message based on the processing result.

In the processing method of the voice short message provided in fig. 8, an interaction process between the terminal device and the server involved in the execution process of the method is given, and the server recognizes, converts and clips the first voice short message to reduce the load pressure of the mobile terminal.
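The interaction of fig. 8 can be sketched as an in-process simulation. Everything here is a stand-in under loud assumptions: there is no network transport, speech recognition is replaced by a transcript that is already aligned word by word with audio bytes, and the class and method names (`Server`, `Terminal`, `upload`, `recognize`, `clip`, `edit`) are hypothetical.

```python
from typing import Dict, List, Tuple

Segment = Tuple[str, bytes]  # (recognized word, audio bytes for that word)


class Server:
    def __init__(self) -> None:
        self._messages: Dict[str, List[Segment]] = {}

    def upload(self, message_id: str, segments: List[Segment]) -> None:
        # S801: store the first voice short message uploaded by the terminal.
        self._messages[message_id] = list(segments)

    def recognize(self, message_id: str) -> List[str]:
        # S804-S805: return the conversion result (here, the stored words).
        return [word for word, _ in self._messages[message_id]]

    def clip(self, message_id: str, delete_indices: List[int]) -> bytes:
        # S809-S810: drop the selected words and return the clipped audio.
        dropped = set(delete_indices)
        kept = [audio for i, (_, audio) in enumerate(self._messages[message_id])
                if i not in dropped]
        return b"".join(kept)


class Terminal:
    def __init__(self, server: Server) -> None:
        self._server = server

    def edit(self, message_id: str, selected: List[int]) -> Tuple[List[str], bytes]:
        # S803-S805: request recognition and receive the conversion result.
        words = self._server.recognize(message_id)
        # S806-S807: a real client would display `words` and collect the
        # selection from the user; here the selection is passed in directly.
        for index in selected:
            if not 0 <= index < len(words):
                raise ValueError("selection is outside the displayed text")
        # S808-S810: send the editing instruction and receive the clipped message.
        return words, self._server.clip(message_id, selected)


server = Server()
server.upload("m1", [("hello", b"AA"), ("um", b"BB"), ("world", b"CC")])
words, clipped_audio = Terminal(server).edit("m1", [1])
```

In this arrangement the terminal only displays text and forwards the selection, which is the load reduction the description refers to.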

Based on the processing method of the voice short message provided in fig. 8, the present invention also provides a processing method of the voice short message, and the execution subject of the method is the terminal. The steps of the method may include:

the terminal acquires the trigger operation input by the user through a session interface of the chat session of the social software.

Responding to the trigger operation, and displaying an editing interface aiming at the first voice short message; the first voice short message is input by the user through the conversation interface.

And acquiring the editing operation based on the editing interface.

And sending an editing instruction to a server, wherein the editing instruction comprises information corresponding to the editing operation.

And acquiring a processing result of clipping the first voice short message fed back by the server.

And obtaining the clipped voice short message based on the processing result.

Before displaying the editing interface for the first voice short message, the method may further include:

sending a voice recognition instruction to the server, wherein the voice recognition instruction is used for indicating that the first voice short message is converted into corresponding text information; obtaining a conversion result fed back by the server; and the conversion result comprises the text information corresponding to the first voice short message.

The displaying an editing interface for the first voice short message may include: and displaying a text editing interface aiming at the first voice short message according to the conversion result, wherein the text editing interface comprises the text corresponding to the text information.

According to the processing method of the voice short message with the execution main body being the mobile terminal, the terminal sends the voice recognition instruction for indicating to convert the first voice short message into the corresponding text information and the editing instruction containing the information corresponding to the editing operation of the user on the first voice short message to the server, receives the processing result fed back by the server for editing the first voice short message, and obtains the edited voice short message.

Based on the processing method of the voice short message provided in fig. 8, some embodiments of the present invention further provide another processing method of the voice short message, and an execution subject of the method is a server. The method can comprise the following steps:

the server acquires a first voice short message uploaded by the terminal; the first voice short message is input into the terminal through a session interface of a chat session of social software of the terminal.

And acquiring an editing instruction sent by the terminal, wherein the editing instruction is generated based on editing operation input through an editing interface of social software of the terminal.

And responding to the editing instruction, clipping the first voice short message to obtain a clipped voice short message.

And sending the clipped voice short message to the terminal.

Before the obtaining of the editing instruction sent by the terminal, the method may further include:

the server acquires a voice recognition instruction sent by the terminal, wherein the voice recognition instruction is used for indicating the conversion of the first voice short message into corresponding text information; identifying the first voice short message to obtain a conversion result; the conversion result comprises text information corresponding to the first voice short message; and sending the conversion result to the terminal.

In the method for processing the voice short message with the execution main body as the server, the server feeds back the recognition and conversion result of the first voice short message and the corresponding clipping processing result to the mobile terminal by receiving and responding to the voice recognition instruction which is sent by the terminal and used for indicating the conversion of the first voice short message into the corresponding text information and the editing instruction which comprises the information corresponding to the editing operation of the user on the first voice short message, so that the terminal does not need to recognize, convert and clip the first voice short message, the load pressure of the mobile terminal can be reduced, the operation speed of the mobile terminal can be improved, and the user experience can be further improved.

Based on the same idea, some embodiments of the present invention also provide an apparatus for information processing corresponding to the above-mentioned method, as shown in fig. 9, the apparatus 900 includes a memory 930 for storing computer program instructions 920 and a processor 910 for executing the program instructions 920, wherein when the computer program instructions are executed by the processor, the apparatus is triggered to execute a method for processing a voice short message as provided in the above-mentioned embodiments.

For example, the computer program instructions, when executed by the processor, trigger the apparatus to perform the steps of:

acquiring a trigger operation input by a user through a session interface of a chat session of social software;

responding to the trigger operation, and displaying an editing interface aiming at the first voice short message; the first voice short message is input by the user through the conversation interface;

acquiring editing operation based on the editing interface;

and responding to the editing operation, clipping the first voice short message to obtain a clipped voice short message.

Alternatively, the computer program instructions, when executed by the processor, trigger the apparatus to perform the steps of:

the method comprises the steps that a terminal obtains a trigger operation input by a user through a session interface of a chat session of social software;

acquiring editing operation based on the editing interface;

sending an editing instruction to a server, wherein the editing instruction comprises information corresponding to the editing operation;

acquiring a processing result fed back by the server for clipping the first voice short message;

and obtaining the clipped voice short message based on the processing result.

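Alternatively, the computer program instructions, when executed by the processor, trigger the apparatus to perform the steps of:

the server acquires a first voice short message uploaded by the terminal; the first voice short message is input into the terminal through a session interface of a chat session of social software of the terminal;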

acquiring an editing instruction sent by the terminal, wherein the editing instruction is generated based on editing operation input through an editing interface of social software of the terminal;

responding to the editing instruction, clipping the first voice short message to obtain a clipped voice short message;

and sending the clipped voice short message to the terminal.

Based on the same idea, some embodiments of the present invention further provide a computer readable medium corresponding to the above method, where computer readable instructions are stored, and the computer readable instructions are executable by a processor to implement a method for processing a voice short message as provided in the above embodiments. The storage medium, such as: ROM/RAM, magnetic disk, optical disk, etc.

For example, the computer readable instructions, when executed by a processor, may perform the steps of:

acquiring editing operation based on the editing interface;

and obtaining the clipped voice short message based on the processing result.

and sending the clipped voice short message to the terminal.

The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functions of the units may be implemented in the same software and/or hardware or in a plurality of software and/or hardware when implementing the invention.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to some embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium which can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal or a carrier wave.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

The embodiments of the present invention are described in a progressive manner, and the same and similar parts among the embodiments can be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.

The above description is only an example of the present invention, and is not intended to limit the present invention. Various modifications and alterations to this invention will become apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.

Claims

1. A method for processing voice short message is characterized by comprising the following steps:

in response to the trigger operation, displaying an editing interface for a first voice short message, wherein the first voice short message is input by the user through the session interface; the editing interface comprises a plurality of first voice short messages; and the voice state of each of the first voice short messages is any one of an input-but-not-sent state, a sent state, or a withdrawn state;

acquiring an editing operation based on the editing interface, which comprises: acquiring a selection operation, input through the editing interface, on a plurality of the first voice short messages; and acquiring a splicing operation on the plurality of selected first voice short messages;

in response to the editing operation, clipping the first voice short message to obtain a clipped voice short message, wherein the clipping of the first voice short message in response to the editing operation comprises: generating a second voice short message based on the plurality of selected first voice short messages; the second voice short message comprises the selected first voice short messages, and the playing order of the selected first voice short messages in the second voice short message is consistent with the order in which they were selected.
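For illustration only, the splicing step recited above can be sketched as follows. This is a minimal sketch and not the claimed implementation: it assumes each voice short message has already been decoded into a list of PCM samples, and the names `VoiceMessage`, `splice_in_selection_order`, and the state strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VoiceMessage:
    """Hypothetical container for one voice short message."""
    message_id: str
    state: str          # assumed labels: "unsent", "sent", or "withdrawn"
    samples: List[int]  # decoded PCM samples (an assumption of this sketch)


def splice_in_selection_order(
    messages: Dict[str, VoiceMessage], selected_ids: List[str]
) -> VoiceMessage:
    """Build a second message whose playing order follows the selection order."""
    spliced: List[int] = []
    for message_id in selected_ids:
        # Append each selected message in the order the user selected it,
        # regardless of the order in which the messages appear in the session.
        spliced.extend(messages[message_id].samples)
    return VoiceMessage(message_id="spliced", state="unsent", samples=spliced)


# Example: three messages exist; the user selects "c" first and "a" second.
session = {
    "a": VoiceMessage("a", "sent", [1, 2]),
    "b": VoiceMessage("b", "withdrawn", [3]),
    "c": VoiceMessage("c", "unsent", [4, 5]),
}
second_message = splice_in_selection_order(session, ["c", "a"])
```

In this sketch the second message plays the samples of "c" before those of "a", matching the selection order rather than the order in which the messages were recorded.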

2. The method according to claim 1, wherein before displaying the editing interface for the first voice short message, the method further comprises:

acquiring text information corresponding to the first voice short message;

the displaying of the editing interface for the first voice short message comprises:

displaying a text editing interface for the first voice short message, wherein the text editing interface comprises text corresponding to the text information;

the acquiring of the editing operation based on the editing interface comprises:

acquiring a selection operation on the text in the text editing interface.

3. The method according to claim 1 or 2, wherein the obtaining of the trigger operation input by the user through the session interface of the social software comprises:

4. The method according to claim 3, wherein, before acquiring the first trigger operation of the user on the first voice short message that has been input but not sent, the method further comprises:

acquiring a playing operation of the user on the first voice short message that has been input but not sent;

5. The method according to claim 1 or 2, wherein the acquiring of the trigger operation input by the user through the session interface of the chat session of the social software comprises:

acquiring a second trigger operation of the user on the sent first voice short message;

before displaying the editing interface for the first voice short message, the method further comprises:

6. The method according to claim 2, wherein said clipping the first voice short message in response to the editing operation to obtain a clipped voice short message comprises:

determining a starting time and an ending time according to the characters selected by the selection operation, wherein the starting time is the pronunciation starting time corresponding to the first character in the characters selected by the selection operation in the first voice short message, and the ending time is the pronunciation ending time corresponding to the last character in the characters selected by the selection operation in the first voice short message;

determining the selected voice short message between the starting time and the ending time;
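For illustration only, the mapping from selected characters to a time range, and the subsequent deletion or retention of the selected segment referred to in the dependent claims, can be sketched as follows. This is a minimal sketch under stated assumptions: the per-character pronunciation times are assumed to be available from voice recognition, the audio is assumed to be a list of PCM samples, and all names (`CharTiming`, `selection_to_time_range`, `delete_range`, `keep_range`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CharTiming:
    """Hypothetical record: one recognized character and when it is spoken."""
    char: str
    start_ms: int  # pronunciation starting time within the message
    end_ms: int    # pronunciation ending time within the message


def selection_to_time_range(
    timings: List[CharTiming], first_index: int, last_index: int
) -> Tuple[int, int]:
    """Start time of the first selected character, end time of the last one."""
    if not 0 <= first_index <= last_index < len(timings):
        raise ValueError("selection out of range")
    return timings[first_index].start_ms, timings[last_index].end_ms


def _to_sample_index(time_ms: int, sample_rate: int) -> int:
    # Convert a time in milliseconds to a sample offset (rounded down).
    return time_ms * sample_rate // 1000


def delete_range(samples: List[int], sample_rate: int,
                 start_ms: int, end_ms: int) -> List[int]:
    """Remove the selected segment and join the remaining audio."""
    lo = _to_sample_index(start_ms, sample_rate)
    hi = _to_sample_index(end_ms, sample_rate)
    return samples[:lo] + samples[hi:]


def keep_range(samples: List[int], sample_rate: int,
               start_ms: int, end_ms: int) -> List[int]:
    """Retain only the selected segment."""
    lo = _to_sample_index(start_ms, sample_rate)
    hi = _to_sample_index(end_ms, sample_rate)
    return samples[lo:hi]


# Example: five characters of 100 ms each; the user selects the 2nd and 3rd.
timings = [CharTiming(c, i * 100, (i + 1) * 100) for i, c in enumerate("hello")]
start_ms, end_ms = selection_to_time_range(timings, 1, 2)
audio = list(range(10))  # toy audio sampled at 10 samples per second
after_delete = delete_range(audio, 10, start_ms, end_ms)
after_keep = keep_range(audio, 10, start_ms, end_ms)
```

The toy sample rate of 10 samples per second is chosen only to keep the example readable; a real recording would use a much higher rate, and a compressed format would need decoding before it could be cut this way.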

7. The method according to claim 6, wherein, before deleting the selected voice short message, the method further comprises:

displaying a first operation option for deleting the selected voice short message;

acquiring a trigger operation for the first operation option;

the deleting of the selected voice short message comprises:

deleting the selected voice short message after the first operation option is triggered.

8. The method according to claim 2, wherein said clipping the first voice short message in response to the editing operation to obtain a clipped voice short message comprises:

and retaining the selected voice short message.

9. The method according to claim 8, wherein, before retaining the selected voice short message, the method further comprises:

displaying a second operation option for retaining the selected voice short message;

acquiring a trigger operation for the second operation option;

the retaining of the selected voice short message comprises:

10. The method according to claim 1, wherein, after obtaining the clipped voice short message, the method further comprises:

acquiring a sending operation on the clipped voice short message;

and in response to the sending operation, sending the clipped voice short message so that the clipped voice short message is presented in a session interface of the chat session.

11. A method for processing a voice short message, characterized by comprising the following steps:

sending an editing instruction to a server, wherein the editing instruction comprises information corresponding to the editing operation; the editing instruction is used by the server to clip the first voice short message, the clipping comprising: generating a second voice short message based on the plurality of selected first voice short messages; the second voice short message comprises the selected first voice short messages, and the playing order of the selected first voice short messages in the second voice short message is consistent with the order in which they were selected;

and obtaining the clipped voice short message based on the processing result.

12. The method according to claim 11, wherein, before displaying the editing interface for the first voice short message, the method further comprises:

sending a voice recognition instruction to the server, wherein the voice recognition instruction instructs the server to convert the first voice short message into corresponding text information;

obtaining a conversion result fed back by the server, wherein the conversion result comprises the text information corresponding to the first voice short message;

and displaying a text editing interface for the first voice short message according to the conversion result, wherein the text editing interface comprises text corresponding to the text information.

13. A method for processing a voice short message, characterized by comprising the following steps:

acquiring an editing instruction sent by a terminal, wherein the editing instruction is generated based on an editing operation input through an editing interface of social software of the terminal; the editing interface comprises a plurality of first voice short messages; the voice state of each of the first voice short messages is any one of an input-but-not-sent state, a sent state, or a withdrawn state; and the editing operation comprises a selection operation, input through the editing interface, on a plurality of the first voice short messages, and a splicing operation on the plurality of selected first voice short messages;

in response to the editing instruction, clipping the first voice short message to obtain a clipped voice short message, wherein the clipping of the first voice short message in response to the editing instruction comprises: generating a second voice short message based on the plurality of selected first voice short messages; the second voice short message comprises the selected first voice short messages, and the playing order of the selected first voice short messages in the second voice short message is consistent with the order in which they were selected;

and sending the clipped voice short message to the terminal.

14. The method according to claim 13, wherein, before acquiring the editing instruction sent by the terminal, the method further comprises:

acquiring, by the server, a voice recognition instruction sent by the terminal, wherein the voice recognition instruction instructs the server to convert the first voice short message into corresponding text information;

performing voice recognition on the first voice short message to obtain a conversion result, wherein the conversion result comprises the text information corresponding to the first voice short message;

and sending the conversion result to the terminal.

15. An apparatus for information processing, characterized in that the apparatus comprises a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, trigger the apparatus to perform the method of any of claims 1 to 14.

16. A computer readable medium having computer readable instructions stored thereon which are executable by a processor to implement the method of any one of claims 1 to 14.