CN110943908A

CN110943908A - Voice message sending method, electronic device and medium

Info

Publication number: CN110943908A
Application number: CN201911072399.3A
Authority: CN
Inventors: Pan Hong (潘红)
Original assignee: Shanghai Sheng Electronic Payment Services Ltd
Current assignee: Shanghai Sheng Electronic Payment Services Ltd
Priority date: 2019-11-05
Filing date: 2019-11-05
Publication date: 2020-03-31

Abstract

A method for sending a voice message, an electronic device and a computer-readable medium are provided. The method comprises: in an instant messaging scenario, recording a voice message in response to a voice recording trigger operation; playing out the voice message and/or preprocessing the voice message; and, in response to a message sending trigger operation, sending the voice message or the preprocessed voice message to a communication object in the instant messaging scenario. On the one hand, the method can improve communication efficiency and avoid sending a poor-quality voice message to the chat partner; on the other hand, it can meet users' needs for diversified and engaging voice communication.

Description

Voice message sending method, electronic device and medium

Technical Field

The present application relates to the field of internet technologies, and in particular, to a method and an apparatus for sending a voice message, an electronic device, and a computer-readable medium.

Background

With the rapid development of computer and internet technology and the popularity of mobile terminals such as mobile phones, instant messaging applications have gradually become a main tool for people to communicate. With instant messaging applications, people can send each other diverse messages such as text, pictures, videos and voice, which makes communication faster and richer.

When communicating by voice message, a common implementation is that the user long-presses a voice input key in the instant messaging application and speaks the voice message while holding the key; as soon as the user releases the finger, the recorded voice message is immediately sent to the communication object.

However, because of environmental noise, a blocked microphone, poor contact between the finger and the touch screen, sudden noise during recording, and similar causes, the recorded voice message may be of poor quality and fail to meet the user's expectations. With the existing implementation, the user learns how the voice message sounds only after it has been sent, and must then re-record and resend it if the quality is poor. This lowers communication efficiency, and sending a poor-quality voice message to the chat partner may also reflect badly on the user, degrading the user experience. In addition, voice messages recorded with the existing implementation are monotonous, so they rarely leave a deep impression on or arouse the interest of the communication object, and cannot meet users' needs for diversified voice communication.

Disclosure of Invention

An object of the present application is to provide a voice message transmitting method, an electronic device, and a computer readable medium.

A first aspect of the present application provides a method for sending a voice message, including:

recording a voice message in response to a voice recording trigger operation in an instant messaging scenario;

playing out the voice message and/or preprocessing the voice message; and

in response to a message sending trigger operation, sending the voice message or the preprocessed voice message to a communication object in the instant messaging scenario.

A second aspect of the present application provides an electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, performs the method of the first aspect of the present application.

A third aspect of the present application provides a computer readable medium having computer readable instructions stored thereon which are executable by a processor to implement the method of the first aspect of the present application.

According to the voice message sending method provided by the first aspect of the present application, in an instant messaging scenario, a voice message is recorded in response to a voice recording trigger operation; the voice message is then played out and/or preprocessed; and, in response to a message sending trigger operation, the voice message or the preprocessed voice message is sent to a communication object in the instant messaging scenario. Because the recorded voice message is not sent immediately, it can first be played out so that the user can listen to or preview it before sending. The user thus learns earlier how the message sounds and can cancel sending and re-record it if the quality is poor. This improves communication efficiency, prevents poor-quality voice messages from reaching the chat partner, and improves the user experience. In addition, after recording, the voice message can be preprocessed to improve its quality or add engaging content, so that the preprocessed voice message, once sent, can leave a deep impression on the communication object and arouse their interest, meeting users' needs for diversified and engaging voice communication.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:

FIG. 1 illustrates a flow chart of a method for sending a voice message according to some embodiments of the present application;

FIGS. 2(a)-2(c) show schematic diagrams of various trigger interfaces provided by some embodiments of the present application;

FIGS. 3(a)-3(c) show schematic diagrams of various audio editing interfaces provided by some embodiments of the present application;

FIG. 4 illustrates a flow chart of a method for sending a voice message according to some embodiments of the present application;

FIG. 5 illustrates a schematic diagram of a voice messaging apparatus provided in some embodiments of the present application;

FIG. 6 illustrates a schematic diagram of an electronic device provided by some embodiments of the present application;

FIG. 7 illustrates a schematic diagram of a computer-readable storage medium provided by some embodiments of the present application.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

It is to be noted that, unless otherwise specified, technical or scientific terms used herein shall have the ordinary meaning as understood by those skilled in the art to which this application belongs.

In addition, the terms "first" and "second", etc. are used to distinguish different objects, rather than to describe a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.

The embodiment of the application provides a method and a device for sending a voice message, an electronic device and a computer readable medium, which are exemplarily described below with reference to the accompanying drawings.

Referring to fig. 1, which shows a flowchart of a voice message sending method according to some embodiments of the present application, as shown in fig. 1, the voice message sending method may include the following steps:

Step S101: in an instant messaging scenario, record a voice message in response to a voice recording trigger operation.

The instant messaging scenario may refer to an internet-based instant messaging scenario, which may be implemented by an instant messaging application (e.g., WeChat, QQ, DingTalk), where the instant messaging application may be any application having an instant messaging function.

The voice recording trigger operation may be a touch-input-based trigger operation on a touch screen. For example, referring to FIG. 2(a), it may be a click operation on the voice input key of the instant messaging application (e.g., one click starts recording and a second click ends it), a long-press operation (recording continues while the key is held), and the like.

Alternatively, the voice recording trigger operation may be a voice-input-based trigger operation on a voice interaction device (e.g., a smart speaker, or a smartphone with its intelligent voice assistant enabled). For example, when the user says "start recording" to a smart speaker, the smart speaker may be triggered to start recording a voice message. As another example, after a smartphone's intelligent voice assistant reads out a message sent by a communication object in an instant messaging application, it may ask the user whether to reply; if the user answers "yes", the smartphone may likewise be triggered to start recording. The voice inputs "start recording" and "yes" then serve as the voice recording trigger operation. Such variations are all modified embodiments of the present application and fall within its protection scope.

Step S102: play out the voice message and/or preprocess the voice message.

Playing out may refer to any processing that makes the voice message perceivable to the user. For example, playing out may include audio playback, so that the message is presented as sound; as another example, it may include converting the voice message into text and displaying the text, so that the message is presented visually. The embodiments of the present application are not limited in this respect.

The preprocessing may include audio editing and/or adding an accompanying picture, and the embodiments of the present application are not limited in this respect.

In the embodiments of the present application, after the voice message is recorded, it may be played out automatically, or it may be preprocessed automatically according to a preset preprocessing routine or the routine used last time. Play-out and preprocessing may also alternate; for example, the processed voice message may be played out after each preprocessing step.

Automatic play-out lets the user quickly learn the content and quality of the voice message without any additional operation, and automatic preprocessing improves the quality of the message or adds engaging content, also without additional operation. This effectively improves the efficiency of message processing and sending, avoids making the communication object wait too long, and meets the real-time requirements of messages in an instant messaging scenario.
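The flow of steps S101 to S103 can be pictured as a small state machine. The sketch below is a minimal illustration under stated assumptions, not the application's implementation: the `VoiceDraft` class and its method names are hypothetical, audio is modeled as a list of 16-bit PCM sample values, and a `send` callback stands in for the instant messaging application. The point it shows is that ending the recording only moves the message into a pending state; nothing is sent until the separate send trigger.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, List, Optional

Samples = List[int]  # 16-bit PCM sample values (a simplifying assumption)


class State(Enum):
    IDLE = auto()
    RECORDING = auto()
    PENDING = auto()  # recorded but not sent: play-out/preprocessing allowed
    SENT = auto()


@dataclass
class VoiceDraft:
    """Hypothetical draft object covering steps S101 to S103."""

    send: Callable[[Samples], None]  # stand-in for the IM application's send call
    state: State = State.IDLE
    samples: Samples = field(default_factory=list)
    processed: Optional[Samples] = None

    def start_recording(self) -> None:  # step S101: recording trigger
        self.state, self.samples, self.processed = State.RECORDING, [], None

    def feed(self, chunk: Samples) -> None:
        if self.state is State.RECORDING:
            self.samples.extend(chunk)

    def stop_recording(self) -> None:
        # Unlike press-and-release sending, ending the recording sends nothing.
        if self.state is State.RECORDING:
            self.state = State.PENDING

    def current(self) -> Samples:
        """The version that would be played out or sent right now."""
        return self.processed if self.processed is not None else self.samples

    def preprocess(self, step: Callable[[Samples], Samples]) -> Samples:
        # Step S102: each step builds on the previous result, so play-out
        # of current() can alternate with preprocessing.
        if self.state is not State.PENDING:
            raise RuntimeError("nothing recorded to preprocess")
        self.processed = step(self.current())
        return self.processed

    def cancel(self) -> None:
        self.state, self.samples, self.processed = State.IDLE, [], None

    def confirm_send(self) -> None:  # step S103: message sending trigger
        if self.state is not State.PENDING:
            raise RuntimeError("no recorded voice message to send")
        self.send(self.current())
        self.state = State.SENT
```

Because `preprocess` always operates on the latest version, each preprocessing step can be followed by a play-out of `current()`, which matches the alternating mode described above; `cancel` discards the draft so the user can re-record.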

Step S102 may also be executed in response to a trigger operation input by the user, which gives the user more flexibility to decide how to process the message. Similar to the voice recording trigger operation, the trigger operation of step S102 may be a touch-input-based trigger operation on a touch screen, or a voice-input-based trigger operation on a voice interaction device (e.g., a smart speaker, or a smartphone with its intelligent voice assistant enabled); the embodiments of the present application are not limited in this respect.

For example, in some embodiments, the trigger operation is a touch-input-based trigger operation on a touch screen and may include a touch operation on an operation identifier, and the method may further include:

displaying at least one operation identifier while the voice message is being recorded or after it is detected that recording has finished, where each operation identifier corresponds to one play-out mode or one preprocessing mode;

step S102 may then include:

in response to a touch operation on an operation identifier, playing out and/or preprocessing the voice message.

For example, referring to FIGS. 2(b) and 2(c), while the voice message is being recorded or after it is detected that recording has finished, operation identifiers such as "Edit", "Add picture", "Convert to text" and "Play" may be displayed on the voice input interface. Referring to FIG. 2(b), after finishing speaking, the user may slide the finger upward to end recording; sliding the finger onto one of the operation identifiers triggers the voice message to be played out or preprocessed in the play-out or preprocessing mode corresponding to that identifier. As another example, referring to FIG. 2(c), after finishing speaking, the user clicks the voice input key again to end recording and then clicks one of the operation identifiers, which likewise triggers play-out or preprocessing in the corresponding mode. These are all modified embodiments of the present application and fall within its protection scope.

By displaying operation identifiers, this embodiment provides a more intuitive, easy-to-operate human-computer interface, allowing the user to trigger play-out or preprocessing of the voice message more conveniently and accurately.
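The slide-to-identifier interaction of FIG. 2(b) comes down to a hit test on the point where the finger lifts, followed by a lookup from identifier to mode. The sketch below is illustrative only: the `Rect` layout coordinates, the identifier labels and the handler strings are hypothetical, and the handlers merely describe what a real client would do. The same lookup would serve the click interaction of FIG. 2(c).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Samples = List[int]


@dataclass(frozen=True)
class Rect:
    """Screen rectangle of one operation identifier (hypothetical layout)."""

    x: float
    y: float
    w: float
    h: float

    def contains(self, px: float, py: float) -> bool:
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h


# Hypothetical row of identifiers drawn above the voice input key.
LAYOUT: Dict[str, Rect] = {
    "Play": Rect(0, 0, 90, 60),
    "Convert to text": Rect(90, 0, 90, 60),
    "Edit": Rect(180, 0, 90, 60),
    "Add picture": Rect(270, 0, 90, 60),
}

# Each identifier maps to one play-out or preprocessing mode. These handlers
# are placeholders that only describe the action a real client would take.
HANDLERS: Dict[str, Callable[[Samples], str]] = {
    "Play": lambda s: f"play {len(s)} samples",
    "Convert to text": lambda s: "send audio to a speech-to-text engine",
    "Edit": lambda s: "open audio editing interface",
    "Add picture": lambda s: "open picture selection interface",
}


def identifier_at(px: float, py: float) -> Optional[str]:
    """Return the label of the identifier under the point, if any."""
    for label, rect in LAYOUT.items():
        if rect.contains(px, py):
            return label
    return None


def on_finger_released(px: float, py: float, samples: Samples) -> Optional[str]:
    """Recording ends when the finger lifts; lifting over an identifier
    triggers the corresponding play-out or preprocessing mode."""
    label = identifier_at(px, py)
    if label is None:
        return None  # lifted elsewhere: nothing is triggered
    return HANDLERS[label](samples)
```

Keeping the layout and the handler table separate lets the same mode table be reused when the identifiers are clicked rather than slid onto.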

In other embodiments, the trigger operation is a touch-input-based trigger operation on a touch screen and may include a gesture trigger operation, in which case step S102 may include:

in response to detecting a gesture trigger operation, playing out and/or preprocessing the voice message, where each gesture trigger operation corresponds to one play-out mode or one preprocessing mode.

For example, the gesture trigger operations may include "slide the finger up and to the left", "slide the finger up and to the right", "draw a circle", "draw an English letter", and the like, and different gesture trigger operations may correspond to different play-out or preprocessing modes, which are not described in detail here.
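To make the gesture-to-mode correspondence concrete, here is a deliberately rough stroke classifier. It is a sketch under assumptions, not a production gesture recognizer: the 40-pixel minimum length, the 20 % closure ratio for "circle", and the particular mapping of gestures to modes in `GESTURE_TO_MODE` are all hypothetical choices, and letter recognition is not attempted.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # screen coordinates, y grows downward

# Hypothetical assignment of gestures to modes; the text leaves it open.
GESTURE_TO_MODE = {"up-left": "convert_to_text", "up-right": "play", "circle": "edit"}


def classify_stroke(points: List[Point], min_length: float = 40.0) -> Optional[str]:
    """Very rough stroke classifier, for illustration only."""
    if len(points) < 2:
        return None
    path = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    if path < min_length:
        return None  # too short to be a deliberate gesture
    if math.dist(points[0], points[-1]) < 0.2 * path:
        return "circle"  # the stroke ends close to where it started
    dx = points[-1][0] - points[0][0]
    dy_up = points[0][1] - points[-1][1]  # positive when the finger moved up
    if dy_up <= 0:
        return None  # downward or flat strokes are not mapped here
    return "up-left" if dx < 0 else "up-right"


def mode_for_stroke(points: List[Point]) -> Optional[str]:
    """Translate a stroke into the play-out or preprocessing mode it triggers."""
    gesture = classify_stroke(points)
    return GESTURE_TO_MODE.get(gesture) if gesture else None
```

A real client would more likely use the platform's gesture recognizers; the point here is only that each recognized gesture resolves to exactly one mode.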

In this embodiment, play-out or preprocessing of the voice message is triggered by gestures. Gestures are simple and fast to perform, so play-out or preprocessing can be triggered quickly, improving the user experience.

In further embodiments, the trigger operation may be a voice-input-based trigger operation, and the method may further include:

playing voice prompt information while the voice message is being recorded or after it is detected that recording has finished, where the voice prompt information prompts the user to have the voice message played out and/or preprocessed;

step S102 may then include:

in response to a voice control instruction input by the user in reply to the voice prompt information, playing out and/or preprocessing the voice message, where the voice control instruction instructs play-out and/or preprocessing of the voice message.

For example, after finishing speaking, the user may say a voice command such as "end" or "over" to end recording. The voice interaction device may then play voice prompt information asking whether to process the recorded voice message. If the user answers "yes", the device may further offer various processing modes to choose from, and if the user then answers "play", the device is triggered to play the voice message. If the user is not satisfied with the voice message, the user may say "cancel send" to instruct the device to cancel sending it; if satisfied, the user may say "send it" to instruct the device to invoke the instant messaging application and send the voice message to the communication object.

In some variations, after recording ends, the voice interaction device may directly ask the user "Listen before sending?". If the user answers "yes", the device is triggered to play the voice message. If the user is not satisfied with the voice message, saying "cancel send" instructs the device to cancel sending it; if satisfied, saying "send it" instructs the device to invoke the instant messaging application to send the voice message to the communication object.
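The "listen before sending" dialog can be sketched as a loop over the user's already-transcribed replies. This is a hedged illustration: the exact command phrases, the follow-up prompt `ASK_DECISION`, and the behavior for an answer other than "yes" (playback is skipped) are assumptions not stated in the text, and speech recognition itself is out of scope.

```python
from typing import Iterable, List

ASK_PREVIEW = 'ask: "Listen before sending?"'
ASK_DECISION = 'ask: "Send it or cancel?"'  # hypothetical follow-up prompt


def listen_before_send(replies: Iterable[str]) -> List[str]:
    """Replay the dialog for already-transcribed user replies and return the
    device actions. Command phrases are illustrative assumptions."""
    actions = [ASK_PREVIEW]
    awaiting_decision = False
    for raw in replies:
        reply = raw.strip().lower()
        if not awaiting_decision:
            if reply == "yes":
                actions.append("play voice message")
            # Assumption: any other answer skips playback and goes straight on.
            awaiting_decision = True
            actions.append(ASK_DECISION)
        elif reply == "send it":
            actions.append("invoke IM app: send voice message")
            return actions
        elif reply == "cancel send":
            actions.append("cancel sending")
            return actions
        else:
            actions.append(ASK_DECISION)  # unrecognised command: ask again
    return actions
```

For example, the replies `["yes", "send it"]` produce the prompt, a playback, the follow-up prompt and finally the send action, in that order.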

With this embodiment, a voice message can be played out and preprocessed on a voice interaction device conveniently, quickly and efficiently, which effectively improves the user experience of instant messaging through voice interaction devices.

It should be noted that, in some embodiments, there may be multiple optional trigger operations for triggering step S102, such as a first, second, third and fourth trigger operation. Each trigger operation may correspond to one play-out mode or one preprocessing mode, and different trigger operations may correspond to different modes. These are described in the following embodiments, which may be understood with reference to FIG. 2(b), FIG. 2(c) and the exemplary description above.

In some embodiments, playing out the voice message may include:

playing the voice message as audio.

In a specific implementation, the voice message may be played in response to a first trigger operation.

For example, referring to FIG. 2(b), the first trigger operation may be a touch operation of sliding up from the voice input key to the operation identifier "Play", which triggers playback of the voice message.

As another example, referring to FIG. 2(c), the first trigger operation may be a touch operation of clicking the operation identifier "Play", which likewise triggers playback of the voice message.

With this embodiment, the voice message can be played before it is sent, so that the user can review or pre-listen to it and learn its quality earlier. If the quality is good, the user can click the send function identifier to send the voice message; if it is poor, the user can click the cancel-send function identifier to cancel sending. This improves communication efficiency, avoids sending a poor-quality voice message to the chat partner and the negative impression that could create, and improves the user experience.

In other embodiments, playing out the voice message may include:

converting the voice message into text information and displaying the text information.

In a specific implementation, the voice message may be converted into text information and displayed in response to a second trigger operation.

For example, referring to FIG. 2(b), the second trigger operation may be a touch operation of sliding up from the voice input key to the operation identifier "Convert to text", which triggers the voice message to be converted into text information and displayed so that the user can preview the message content.

As another example, referring to FIG. 2(c), the second trigger operation may be a touch operation of clicking the operation identifier "Convert to text", which likewise triggers the voice message to be converted into text information and displayed so that the user can preview the message content.

The voice message may be converted into text by any existing speech-to-text technology, for example by calling an existing speech-to-text engine; the embodiments of the present application are not limited in this respect.

With this embodiment, the voice message can be converted into text and displayed before it is sent, so that the user can conveniently preview its content, learn its quality earlier, and catch slips of the tongue or misstatements as early as possible.

Moreover, when a voice message is played, the user must listen for as long as the recording lasts. This transmits information inefficiently, and for long voice messages, for example those longer than 40 seconds or 1 minute, users often lack the patience to listen to the end. With this implementation, a long voice message can instead be converted into text and displayed. Because visual information is conveyed far more efficiently than auditory information, a 1-minute voice message converted into text may take the user only about 10 seconds to read. This implementation therefore better helps the user preview long voice messages.
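The duration argument suggests a simple default policy: play short messages and show long ones as text. The sketch below assumes a 16 kHz sample rate and a 40-second threshold purely for illustration (the text mentions 40 seconds and 1 minute only as examples). The duration itself is just the sample count divided by the sample rate, so a 60-second message at 16 kHz holds 960,000 samples.

```python
def duration_seconds(num_samples: int, sample_rate_hz: int) -> float:
    """Recording length: number of samples divided by the sample rate."""
    return num_samples / sample_rate_hz


def default_preview_mode(num_samples: int, sample_rate_hz: int = 16_000,
                         threshold_s: float = 40.0) -> str:
    """Hypothetical policy: play short messages, show long ones as text.

    The 16 kHz rate and 40 s threshold are assumptions for illustration.
    """
    if duration_seconds(num_samples, sample_rate_hz) > threshold_s:
        return "convert_to_text"
    return "play"
```

A message exactly at the threshold is still played; only messages strictly longer than it default to the text preview.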

In other embodiments, playing out the voice message may include:

playing the voice message, and converting the voice message into text information and displaying the text information.

With this implementation, the content of the voice message is presented to the user as both text and sound, so that, by combining sight and hearing, the user can understand its content and quality more quickly, accurately and completely.

In some further embodiments, preprocessing the voice message may include: performing audio editing on the voice message.

In a specific implementation, audio editing may be performed on the voice message in response to a third trigger operation.

For example, referring to FIG. 2(b), the third trigger operation may be a touch operation of sliding up from the voice input key to the operation identifier "Edit", which triggers editing of the voice message.

As another example, referring to FIG. 2(c), the third trigger operation may be a touch operation of clicking the operation identifier "Edit", which likewise triggers editing of the voice message.

Audio editing of the voice message may be implemented by calling any existing audio editing engine or audio editing software; the embodiments of the present application are not limited in this respect.

With this implementation, the voice message can be edited before it is sent to improve its quality or add engaging content. Once the preprocessed voice message is sent, it can leave a deep impression on the communication object and arouse their interest, meeting users' needs for diversified and engaging voice communication.

The audio editing may include, but is not limited to: adjusting the volume, adjusting the speech rate, removing noise, voice changing, trimming, and the like. Voice changing means altering information such as the pitch and timbre of the voice message so that the user's voice sounds like that of another character, for example a celebrity, a cartoon character, a game character, or a speaker of a certain dialect. It may be implemented with any existing voice changing technology or software; the embodiments of the present application are not limited in this respect, and all such implementations fall within the protection scope of the present application.

Adjusting the volume can fix a voice message recorded too quietly; adjusting the speech rate can fix a message that is hard to follow because it is spoken too fast, or one that dulls the listener's interest because it is spoken too slowly; removing noise can fix a message made unclear by environmental noise; voice changing makes the voice message more entertaining and increases the communication object's interest in listening and replying; and trimming cuts unnecessary or mistaken content out of the voice message, so that such content is not sent to the communication object by mistake.
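Three of the listed edits are simple enough to sketch directly on raw samples. The functions below assume mono 16-bit PCM held as a Python list; the names are hypothetical, and `noise_gate` is a crude per-sample stand-in for real noise removal, which the text leaves to existing audio editing engines. Speech-rate adjustment and voice changing are not shown, because doing them without altering pitch (or with controlled pitch changes) requires time-stretching or pitch-shifting algorithms that are beyond a short example.

```python
from typing import List

Samples = List[int]  # mono 16-bit PCM, an assumption for this sketch
INT16_MIN, INT16_MAX = -32768, 32767


def adjust_volume(samples: Samples, gain: float) -> Samples:
    """Scale the amplitude; saturate at the 16-bit limits instead of wrapping."""
    return [max(INT16_MIN, min(INT16_MAX, round(s * gain))) for s in samples]


def noise_gate(samples: Samples, threshold: int) -> Samples:
    """Crude noise suppression: silence samples quieter than the threshold.
    Real noise removal (e.g. spectral subtraction) is considerably more involved."""
    return [s if abs(s) >= threshold else 0 for s in samples]


def trim(samples: Samples, sample_rate_hz: int, start_s: float, end_s: float) -> Samples:
    """Cut out the interval [start_s, end_s), e.g. a misspoken phrase."""
    start = int(start_s * sample_rate_hz)
    end = int(end_s * sample_rate_hz)
    return samples[:start] + samples[end:]
```

Saturating rather than wrapping matters for the volume fix: doubling a loud sample should clip at the maximum value, not flip sign.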

On the basis of the foregoing embodiments, in some variations, performing audio editing on the voice message may include:

displaying a plurality of audio editing icons, where each audio editing icon corresponds to one audio editing template;

determining the target audio editing icon selected by the user according to the user's selection operation on the audio editing icons; and

invoking the audio editing template corresponding to the target audio editing icon to edit the voice message.

In this embodiment, editing routines are packaged as templates in advance to generate audio editing templates, so that complex audio editing can be completed automatically after a single simple selection by the user. This approach is easy to operate, efficient and entertaining.

In some variations, the audio editing template may include, but is not limited to, at least one of a volume adjustment template, a speech rate adjustment template, or a voice changing template.

For example, FIGS. 3(a)-3(c) are schematic diagrams of various audio editing interfaces provided by some embodiments of the present application. In FIGS. 3(a) and 3(b), the voice changing routine and the speech rate adjustment routine have been packaged as templates, yielding voice changing templates and speech rate adjustment templates. Once the user selects a template, the voice message is automatically processed according to the selected audio editing template, which is simple, efficient and entertaining.
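A template can be modeled as a preset sequence of (operation, parameter) steps, and each audio editing icon as a key into a registry of templates. In the sketch below, the icon identifiers, template names and the two operations (a saturating gain and a per-sample gate) are hypothetical; voice changing and speech-rate templates would plug into the same structure but are not implemented here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Samples = List[int]


def _gain(samples: Samples, factor: float) -> Samples:
    return [max(-32768, min(32767, round(s * factor))) for s in samples]


def _gate(samples: Samples, threshold: float) -> Samples:
    return [s if abs(s) >= threshold else 0 for s in samples]


OPS: Dict[str, Callable[[Samples, float], Samples]] = {"gain": _gain, "gate": _gate}


@dataclass(frozen=True)
class AudioEditTemplate:
    """A preset editing routine: (operation name, parameter) steps run in order."""

    name: str
    steps: Tuple[Tuple[str, float], ...]

    def apply(self, samples: Samples) -> Samples:
        for op, param in self.steps:
            samples = OPS[op](samples, param)
        return samples


# Hypothetical icons shown in the editing interface, one template each.
ICON_TO_TEMPLATE: Dict[str, AudioEditTemplate] = {
    "icon_louder": AudioEditTemplate("Louder", (("gain", 1.5),)),
    "icon_clean": AudioEditTemplate("Clean up", (("gate", 200), ("gain", 1.2))),
}


def edit_with_icon(icon_id: str, samples: Samples) -> Samples:
    """Invoke the template behind the icon the user selected."""
    return ICON_TO_TEMPLATE[icon_id].apply(samples)
```

The user's single selection maps to one registry lookup, while the multi-step processing stays inside the template, which is what makes the interaction a one-tap operation.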

In addition, in some embodiments, the audio editing interface also provides several function keys, such as "Listen", "Send", "Restore", "Cancel edit" and "Cancel send" in FIG. 3(a), so that the user can trigger the corresponding operations more conveniently and quickly.

For example, after editing, the user may click the "Listen" button to hear the processed voice message. If the editing result is satisfactory, the user may click the "Send" button to trigger step S103; if not, the user may click the "Restore" button to return the voice message to its original state for re-editing or direct sending. The user may also click the "Cancel edit" button to quit the current editing and return to the interface shown in FIG. 2(b), where another preprocessing mode can be selected; this improves operability, meets the user's diverse processing needs and improves the user experience. In addition, the user may click the "Cancel send" button at any time to cancel editing and sending the voice message and return to the interface shown in FIG. 2(a), so that the voice message can be re-recorded.
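The function keys can be modeled by keeping the original recording untouched alongside the edited copy. This `EditSession` class is a hypothetical sketch: the returned screen names stand in for the interfaces of FIGS. 2(a), 2(b) and 3(a), `listen` returns the samples instead of playing them, and discarding edits on "Cancel edit" is an assumption the text does not state explicitly.

```python
from typing import Callable, List

Samples = List[int]


class EditSession:
    """Hypothetical model of the FIG. 3(a) function keys."""

    def __init__(self, original: Samples, send: Callable[[Samples], None]):
        self.original = list(original)  # kept untouched so "Restore" works
        self.edited = list(original)
        self._send = send

    def apply(self, edit: Callable[[Samples], Samples]) -> None:
        self.edited = edit(self.edited)

    def listen(self) -> Samples:
        # Placeholder: a real client would play these samples.
        return list(self.edited)

    def send(self) -> str:
        self._send(list(self.edited))  # proceeds to step S103
        return "sent"

    def restore(self) -> str:
        self.edited = list(self.original)
        return "audio_editing"  # stay in the editor with the original audio

    def cancel_edit(self) -> str:
        # Assumption: quitting the editor discards this session's edits.
        self.edited = list(self.original)
        return "operation_identifiers"  # back to the FIG. 2(b) interface

    def cancel_send(self) -> str:
        self.original, self.edited = [], []
        return "voice_recording"  # back to the FIG. 2(a) interface
```

Storing both copies is what makes "Restore" cheap: it is a copy back from `original` rather than an undo of each individual edit.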

In addition, referring to FIG. 3(c), after the volume adjustment routine is packaged as a template, the resulting volume adjustment template may let the user adjust the volume parameters directly, meeting the user's need for finer-grained audio editing with more freedom.
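If the user-adjustable volume parameter of FIG. 3(c) is expressed in decibels (an assumption; the text does not specify the unit), the slider value $G$ maps to an amplitude factor $g = 10^{G/20}$, so $+6$ dB roughly doubles the amplitude and $+20$ dB multiplies it by 10. The sketch below applies that factor to 16-bit samples, clamping the slider to a hypothetical range of -12 dB to +12 dB.

```python
from typing import List

INT16_MIN, INT16_MAX = -32768, 32767


def db_to_gain(db: float) -> float:
    """Amplitude factor for a gain in decibels: 10 ** (dB / 20)."""
    return 10 ** (db / 20)


def apply_volume_db(samples: List[int], db: float,
                    min_db: float = -12.0, max_db: float = 12.0) -> List[int]:
    """Apply a user-chosen gain, clamped to a hypothetical slider range."""
    db = max(min_db, min(max_db, db))
    gain = db_to_gain(db)
    return [max(INT16_MIN, min(INT16_MAX, round(s * gain))) for s in samples]
```

Working in decibels gives the slider a perceptually more even feel than a linear multiplier, which is why volume controls commonly use it.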

In other embodiments of the present application, preprocessing the voice message may include: adding an accompanying picture to the voice message.

In a specific implementation, an accompanying picture may be added to the voice message in response to a fourth trigger operation.

For example, referring to FIG. 2(b), the fourth trigger operation may be a touch operation of sliding up from the voice input key to the operation identifier "Add picture", which triggers adding an accompanying picture to the voice message.

As another example, referring to FIG. 2(c), the fourth trigger operation may be a touch operation of clicking the operation identifier "Add picture", which likewise triggers adding an accompanying picture to the voice message.

With this embodiment, an accompanying picture can be added to the voice message before it is sent, enriching how the voice message is presented and, by combining sound and image, improving its effect or adding engaging content. Once the voice message with the picture is sent, it can leave a deep impression on the communication object and arouse their interest, meeting users' needs for diversified and engaging voice communication.

Adding an accompanying picture to the voice message may mean selecting a picture that is sent to the communication object together with the voice message, or synthesizing an audio-visual message from the selected picture and the voice message; the embodiments of the present application are not limited in this respect.

For example, in some embodiments, adding an accompanying picture to the voice message may include:

displaying at least one alternative picture;

determining a target picture selected by a user according to the selection operation of the user on the alternative picture;

generating an audio-visual message according to the target picture and the voice message;

the sending the voice message or the preprocessed voice message to the communication object in the instant communication scene may include:

sending the audio-visual message to the communication object in the instant messaging scene.

In this embodiment, referring to fig. 3(a), a plurality of candidate pictures are displayed on the picture-adding interface for the user to select, and an audio-visual message is synthesized from the target picture selected by the user and the voice message. Generating an audio-video file from a picture file and an audio file can be implemented with any audio-video editing technique or software in the prior art; the embodiments of the present application are not limited in this respect.
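As one example of such prior-art tooling, the widely used ffmpeg command-line tool can combine a still picture and an audio file into a video. The sketch below only builds the argument list; choosing ffmpeg, H.264/AAC, and the file names are assumptions for illustration, and actually running it (for instance with `subprocess.run(cmd, check=True)`) requires ffmpeg to be installed.

```python
from typing import List


def build_av_command(picture_path: str, voice_path: str, output_path: str) -> List[str]:
    """Build an ffmpeg argument list that turns a still picture plus a voice
    recording into a video file (one possible prior-art tool, assumed here)."""
    return [
        "ffmpeg",
        "-y",                       # overwrite the output file if it exists
        "-loop", "1",               # repeat the single image as a video stream
        "-i", picture_path,
        "-i", voice_path,
        "-c:v", "libx264",
        "-tune", "stillimage",      # encoder tuning for static content
        "-vf", "scale=trunc(iw/2)*2:trunc(ih/2)*2",  # yuv420p needs even dimensions
        "-pix_fmt", "yuv420p",      # widely playable pixel format
        "-c:a", "aac",
        "-shortest",                # stop when the voice track ends
        output_path,
    ]
```

`-shortest` matters here: without it, the looped image would produce an endless video stream.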

With this implementation, the user can independently choose a suitable target picture from which the audio-visual message is generated and sent to the communication object. The presentation of the voice message is thus enriched according to the user's actual needs, improving its effect or making it more interesting and meeting users' needs for diversified and interesting voice communication.

Building on the above embodiments, in some modified embodiments, the method further includes, before displaying the at least one candidate picture:

determining emotion information of the user according to the voice message;

selecting, from a preset picture library, at least one candidate picture matching the emotion information.

The user's emotion may be analyzed from the voice message using any speech-based emotion analysis technique in the prior art; the embodiments of the present application are not limited in this respect.
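The two steps above (determine an emotion, then filter a preset library) can be sketched as follows. The `estimate_emotion` rules are a deliberately crude, hypothetical stand-in for a real speech emotion recognizer, and the emotion labels, thresholds, and `Picture` tagging scheme are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Picture:
    name: str
    emotions: FrozenSet[str]  # emotion labels the picture is tagged with (assumed scheme)


def estimate_emotion(syllables_per_second: float, loudness_dbfs: float) -> str:
    """Toy stand-in for a real speech emotion recognizer (hypothetical rules)."""
    if syllables_per_second > 5.0 and loudness_dbfs > -15.0:
        return "excited"  # fast and loud
    if syllables_per_second < 3.0 and loudness_dbfs < -30.0:
        return "sad"      # slow and quiet
    return "neutral"


def select_candidates(library: List[Picture], emotion: str, limit: int = 3) -> List[Picture]:
    """Pick up to `limit` pictures from the preset library tagged with `emotion`."""
    return [p for p in library if emotion in p.emotions][:limit]
```

In practice the recognizer would be replaced by an actual model; only the library filtering step is independent of that choice.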

With this embodiment, pictures that accurately express the user's current emotion can be recommended automatically, so that the generated audio-visual message also conveys that emotion vividly in image form. This further improves the voice communication effect, makes communication more interesting, and better meets users' needs for diversified and interesting voice communication.

It should be noted that the target picture or the alternative picture may be a static picture or a dynamic picture, and the embodiment of the present application is not limited.

For example, in some variations, the target picture is a dynamic picture;

before generating the audio-visual message according to the target picture and the voice message, the method further comprises:

adjusting the frame rate of the target picture according to the speech rate of the voice message;

generating the audio-visual message according to the target picture and the voice message then includes:

generating the audio-visual message according to the frame-rate-adjusted target picture and the voice message.

With this implementation, the frame rate of the target picture is adjusted according to the speech rate of the voice message so that the motion frequency of the dynamic object in the picture matches the user's speech rate. This can effectively improve the playback effect and interest of the audio-visual message, further meeting users' needs for diversified and interesting voice communication.
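One simple way to tie the frame rate to the speech rate is proportional scaling against a reference speaking speed, clamped to a watchable range. This is a sketch under stated assumptions: the linear rule, the reference rate of 4 syllables per second, and the 5 to 30 fps bounds are illustrative choices, not values given in the application.

```python
def adjust_frame_rate(base_fps: float,
                      speech_rate: float,
                      reference_rate: float = 4.0,
                      min_fps: float = 5.0,
                      max_fps: float = 30.0) -> float:
    """Scale an animated picture's frame rate by how fast the user speaks.

    speech_rate and reference_rate share a unit (e.g. syllables per second);
    reference_rate is an assumed "normal" speaking speed.
    """
    if base_fps <= 0 or speech_rate <= 0 or reference_rate <= 0:
        raise ValueError("frame rate and speech rates must be positive")
    scaled = base_fps * speech_rate / reference_rate
    # Clamp so very fast or very slow speech still gives a watchable animation.
    return max(min_fps, min(max_fps, scaled))


def frame_duration_ms(fps: float) -> float:
    """Display time of each frame when re-timing the animated picture."""
    return 1000.0 / fps
```

For example, a 10 fps animation paired with speech at 6 syllables per second would play at 15 fps under these assumptions.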

In some further modified embodiments, preprocessing the voice message may include:

performing audio editing on the voice message and adding an accompanying picture to it.

In this embodiment, audio editing and adding an accompanying picture are combined to preprocess the voice message, so that the preprocessed voice message has a better effect or is more interesting.

In some modifications of the embodiments of the present application, step S102 may include:

in response to detecting that recording of the voice message has finished, playing out the voice message and displaying at least one operation identifier, each operation identifier corresponding to one preprocessing mode;

in response to a touch operation on an operation identifier, preprocessing the voice message using the preprocessing mode corresponding to that operation identifier.

For example, the user long-presses the "voice entry key" to record a voice message. After the user releases the finger, the voice message is not sent immediately; instead, it is played automatically so that the user can check its content and effect. At the same time, at least one operation identifier is displayed, so the user can choose at any time, based on the playback, to apply audio editing to the voice message and/or add an accompanying picture to it. During or after playback, the user can also directly trigger sending of the voice message, or click the cancel-sending function identifier to cancel sending it.

With this embodiment, once recording of the voice message is detected to be finished, the voice message is played automatically without manual operation, and the operation identifiers are displayed so the user can decide whether to edit it. This lets the user quickly learn the content and effect of the voice message, lets the user edit it or add an accompanying picture to improve its effect or make it more interesting, fits users' habits better, and can effectively improve the user experience.

It should be noted that the user may choose whether the above embodiment is executed. For example, a "voice preview" option may be provided in the instant messaging application. After the user enables it, the above embodiment is executed automatically: once recording finishes, the voice message is not sent directly but is played out automatically, and the operation identifiers are displayed so the user can trigger preprocessing of the voice message at any time. Alongside the "voice preview" option, a "send immediately after preview" option may also be provided. After the user enables it, the voice message is sent automatically once it has been played out, so that playback and sending are performed automatically after recording without manual involvement, which is more convenient.

Through the implementation mode, the user can independently select the processing mode after the voice message is recorded, and the requirement of the user for customizing the voice message processing mode is better met.
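The interplay of the two options can be sketched as a single handler invoked when recording is detected to be finished. This is a minimal sketch with hypothetical names (`PreviewSettings`, `on_recording_finished`, the identifier labels); it also assumes, as the background section describes for conventional apps, that the message is sent on release when "voice preview" is disabled.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PreviewSettings:
    voice_preview: bool = False       # the "voice preview" option
    send_after_preview: bool = False  # the "send immediately after preview" option


def on_recording_finished(message: bytes,
                          settings: PreviewSettings,
                          play: Callable[[bytes], None],
                          show_identifiers: Callable[[List[str]], None],
                          send: Callable[[bytes], None]) -> str:
    """Decide what happens once recording of a voice message is detected to be finished."""
    if not settings.voice_preview:
        send(message)                 # assumed fallback: conventional send-on-release
        return "sent"
    play(message)                     # automatic play-out, no manual step
    if settings.send_after_preview:
        send(message)                 # fully automatic: play out, then send
        return "sent_after_preview"
    # Otherwise wait for the user, who may preprocess, send, or cancel.
    show_identifiers(["audio editing", "accompanying picture", "send", "cancel sending"])
    return "awaiting_user"
```

Injecting `play`, `show_identifiers`, and `send` as callables keeps the decision logic independent of any particular UI or transport.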

Step S103: in response to a message sending trigger operation, sending the voice message or the preprocessed voice message to the communication object in the instant messaging scene.

In the embodiments of the present application, after the voice message has been played out or preprocessed, the user may perform a message sending trigger operation to send the voice message or the preprocessed voice message. If step S102 only plays the voice message out, its content is unchanged and step S103 sends the voice message itself; if step S102 includes preprocessing, the content has changed and step S103 sends the preprocessed voice message.

Like the voice entry trigger operation and the preprocessing trigger operation, the message sending trigger operation may be based on touch input on a touch screen, or on voice input to a voice interaction device (for example, a smart speaker, or a smartphone with an intelligent voice assistant enabled).

The voice message sending method provided by the embodiments of the present application achieves at least the following beneficial effects. After recording, the voice message can be played out instead of being sent to the communication object immediately, so the user can listen to or preview it before sending, learn its effect earlier, and cancel sending and re-record when the effect is poor; this improves communication efficiency, prevents poor-quality voice messages from reaching the chat partner, and improves the user experience. In addition, the recorded voice message can be preprocessed to improve its effect or add interesting content, so that once the preprocessed voice message is sent, it can leave a deep impression on the communication object and arouse their interest, meeting users' needs for diversified and interesting voice communication.

It should be noted that the embodiments of the present application are not limited to implementation in an instant messaging application. For example, with the development of artificial intelligence technology, electronic devices such as smart speakers and mobile phones may be equipped with an intelligent voice assistant (a voice interaction program based on artificial intelligence technology, such as "Siri" on the iPhone or "Xiaoyi" on Huawei phones) and thus become voice interaction devices. A user may talk directly to the intelligent voice assistant and have it invoke an instant messaging application to communicate with a communication object, which also achieves the purpose of the embodiments of the present application.

In some modifications of the embodiments of the present application, the method may further include, before step S102:

if a message sending trigger operation for the voice message is detected, sending the voice message to the communication object in the instant messaging scene.

Referring to fig. 2(b), if the user performs a message sending trigger operation before step S102 is performed, for example by clicking the send function identifier in fig. 2(b), this indicates that the user does not need to play out or preprocess the voice message, so sending of the voice message to the communication object in the instant messaging scene can be triggered directly.

By the implementation mode, the user can freely select whether to preprocess the voice message and then send the voice message according to actual requirements, and diversified voice message sending requirements of the user are met.

To better illustrate the embodiments of the present application, a specific example is described below.

Referring to fig. 4, which shows a flowchart of a voice message sending method according to some specific embodiments of the present application: the method of fig. 4 can be understood with reference to the description of the embodiment corresponding to fig. 1, parts of which are not repeated here, and that description can in turn be understood with reference to fig. 4.

As shown in fig. 4, the voice message sending method may include the following steps:

The user presses and holds the voice entry key in the instant messaging application to record a voice message, and the instant messaging tool records it.

If the user releases the finger during recording, the voice message is sent directly.

If the user slides the finger upward, recording of the voice message ends, and the function the user selected (voice play-out, text conversion and display, audio editing, or adding an accompanying picture) is determined from the user's trigger operation.

If the user selects voice play-out or text conversion and display, the voice message is played out, or converted into text and displayed, so the user can listen to or preview it in advance. If the user is satisfied with the voice message, a message sending trigger operation can be performed to send it; if not, a cancel sending trigger operation (e.g., clicking the cancel-sending function identifier) can be performed to cancel sending the voice message.

If the user selects audio editing, a plurality of audio editing templates are loaded for the user to choose from; the template selected by the user is determined from the selection operation and used to edit the voice message directly. After editing, the user auditions the result. If the user is satisfied with the editing effect, a message sending trigger operation can be performed to send the edited voice message; if not, a cancel sending trigger operation can be performed to cancel sending the voice message.

If the user selects adding an accompanying picture, a plurality of candidate pictures are displayed for the user to choose from; the target picture selected by the user is determined from the selection operation and synthesized with the voice message into an audio-visual message, which is then previewed. If the user is satisfied with the audio-visual message, a message sending trigger operation can be performed to send it; if not, a cancel sending trigger operation can be performed to cancel sending and record the voice message again.
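The branching in fig. 4 reduces to deciding which payload, if any, is finally sent. The sketch below is a hypothetical simplification: `Choice`, `fig4_flow`, and the injected `edit`/`compose` callables are illustrative names, the user's satisfaction is passed in as a flag, and preview side effects (playing audio, showing text) are omitted.

```python
from enum import Enum, auto
from typing import Callable, Optional


class Choice(Enum):
    RELEASE = auto()      # finger released during recording
    PLAY_OUT = auto()     # voice play-out
    TO_TEXT = auto()      # text conversion and display
    AUDIO_EDIT = auto()   # audio editing with a selected template
    ADD_PICTURE = auto()  # accompanying picture -> audio-visual message


def fig4_flow(voice: bytes,
              choice: Choice,
              user_satisfied: bool,
              edit: Callable[[bytes], bytes],
              compose: Callable[[bytes], object]) -> Optional[object]:
    """Return the payload that would be sent, or None if sending is cancelled."""
    if choice is Choice.RELEASE:
        return voice                  # sent directly, no preview
    if choice in (Choice.PLAY_OUT, Choice.TO_TEXT):
        payload: object = voice       # preview only: content unchanged
    elif choice is Choice.AUDIO_EDIT:
        payload = edit(voice)         # send the edited voice message
    else:  # Choice.ADD_PICTURE
        payload = compose(voice)      # send the audio-visual message
    return payload if user_satisfied else None
```

This mirrors step S103: only the preprocessing branches change what is sent.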

By the embodiment, at least the following beneficial effects can be obtained:

After recording, the voice message need not be sent to the communication object immediately; depending on the user's trigger operation, it can be sent after being played out or preprocessed. Because play-out includes voice playback or display after text conversion, the user can listen to or preview the voice message before sending it, learn its effect earlier, and cancel sending and re-record when the effect is poor; on the one hand this improves communication efficiency, and on the other it prevents poor-quality voice messages from being sent to the chat partner, improving the user experience. In addition, because preprocessing includes audio editing or adding an accompanying picture, the preprocessed voice message can have a better effect or more interesting content; once sent, it can leave a deep impression on the communication object and arouse their interest, meeting users' needs for diversified and interesting voice communication.

In the foregoing embodiment, a voice message sending method is provided, and correspondingly, the present application also provides a voice message sending apparatus. The voice message sending device provided by the embodiment of the present application can implement the above-mentioned voice message sending method, and the voice message sending device can be implemented by software, hardware, or a combination of software and hardware. For example, the voice message transmitting apparatus may include integrated or separate functional modules or units to perform the corresponding steps of the above-described methods. Referring to fig. 5, a schematic diagram of a voice message sending apparatus according to some embodiments of the present application is shown. Since the apparatus embodiments are substantially similar to the method embodiments, they are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for relevant points. The device embodiments described below are merely illustrative.

As shown in fig. 5, the voice message sending apparatus 10 may include:

the voice message recording module 101 is configured to record a voice message in an instant messaging scene in response to a voice recording trigger operation;

a play-out or preprocessing module 102, configured to play out the voice message and/or preprocess the voice message;

and the voice message sending module 103 is configured to send the voice message or the preprocessed voice message to the communication object in the instant messaging scene in response to a message sending trigger operation.
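Since the apparatus may be implemented in software, modules 101 to 103 can be pictured as three injected components run in sequence. This is a hypothetical sketch for illustration only; the class name, the callable signatures, and the convention that module 102 returns None when the user cancels are assumptions.

```python
from typing import Callable, Optional


class VoiceMessageSendingApparatus:
    """Hypothetical composition of modules 101-103 as injected callables."""

    def __init__(self,
                 record: Callable[[], bytes],                                  # module 101
                 play_out_or_preprocess: Callable[[bytes], Optional[object]],  # module 102
                 send: Callable[[str, object], None]) -> None:                 # module 103
        self._record = record
        self._process = play_out_or_preprocess
        self._send = send

    def handle(self, contact: str) -> bool:
        """Run one record -> play-out/preprocess -> send cycle.

        Returns False when module 102 reports that the user cancelled sending.
        """
        voice = self._record()
        payload = self._process(voice)
        if payload is None:
            return False
        self._send(contact, payload)
        return True
```

Injecting the modules matches the note that units may be integrated or separate: each callable can be swapped for a hardware-backed or software implementation.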

In some variations of the embodiments of the present application, the play-out or preprocessing module 102 includes:

the voice playing unit, used for playing the voice message; and/or

the voice-to-text unit, used for converting the voice message into text information and displaying it; and/or

the audio editing unit, used for performing audio editing on the voice message; and/or

the picture adding unit, used for adding an accompanying picture to the voice message.

In some variations of the embodiments of the present application, the play-out or preprocessing module 102 includes: an audio editing unit;

the audio editing unit comprises:

the icon display subunit is used for displaying a plurality of audio editing icons, wherein each audio editing icon corresponds to one audio editing template;

the icon selection subunit is used for determining a target audio editing icon selected by the user according to the selection operation of the user on the audio editing icon;

and the templating processing subunit is used for calling the audio editing template corresponding to the target audio editing icon to edit the voice message.

In some variations of the embodiments of the present application, the audio editing template includes at least one of a volume adjustment template, a speech rate adjustment template, or a voice-changing template.
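The icon-to-template lookup performed by the subunits above, with two of the template types, might look like the sketch below. The sample format, icon identifiers, and gain and rate values are hypothetical. The speech-rate template uses naive nearest-sample resampling, which also shifts pitch; a real editor would more likely use a time-stretching algorithm, and a voice-changing template is not shown.

```python
from typing import Callable, Dict, List

Samples = List[float]                 # mono PCM samples in [-1.0, 1.0] (assumed format)
Template = Callable[[Samples], Samples]


def volume_template(gain: float) -> Template:
    """Volume adjustment: scale every sample, clipping to the valid range."""
    def apply(samples: Samples) -> Samples:
        return [max(-1.0, min(1.0, s * gain)) for s in samples]
    return apply


def speech_rate_template(rate: float) -> Template:
    """Speech-rate adjustment by naive nearest-sample resampling.

    rate > 1 speeds speech up; this simple approach also shifts the pitch.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")

    def apply(samples: Samples) -> Samples:
        n_out = int(len(samples) / rate)
        return [samples[min(len(samples) - 1, int(i * rate))] for i in range(n_out)]
    return apply


# Hypothetical audio editing icon identifiers mapped to their templates.
TEMPLATES: Dict[str, Template] = {
    "louder": volume_template(2.0),
    "quieter": volume_template(0.5),
    "faster": speech_rate_template(1.5),
}


def edit_with_icon(icon_id: str, samples: Samples) -> Samples:
    """Look up the template behind the selected icon and apply it."""
    return TEMPLATES[icon_id](samples)
```

Parameterizing `volume_template` by gain also illustrates the templated volume adjustment of fig. 3(c), where the user supplies the parameter.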

In some variations of the embodiments of the present application, the play-out or preprocessing module 102 includes: a picture adding unit;

the picture adding unit comprises:

the alternative picture display subunit is used for displaying at least one alternative picture;

the target picture selection subunit is used for determining a target picture selected by the user according to the selection operation of the user on the alternative picture;

the audio-visual message generating subunit is used for generating an audio-visual message according to the target picture and the voice message;

the voice message sending module 103 includes:

the audio-visual message sending unit is used for sending the audio-visual message to the communication object in the instant messaging scene.

In some modified embodiments of the present application, the picture adding unit further includes:

the user emotion determining subunit is used for determining emotion information of the user according to the voice message;

and the alternative picture selecting subunit is used for selecting at least one alternative picture matched with the emotion information from a preset picture library.

In some variations of embodiments of the present application, the target picture is a dynamic picture;

the picture adding unit further comprises:

a frame rate adjusting subunit, configured to adjust a frame rate of the target picture according to a speech rate of the voice message;

the audio-visual message generating subunit comprises:

the frame-rate-adjusted message generating subunit, used for generating an audio-visual message according to the frame-rate-adjusted target picture and the voice message.

In some variations of the embodiments of the present application, the apparatus 10 further comprises:

the operation identifier display module is used for displaying at least one operation identifier when the voice message is recorded or after the voice message is detected to be recorded;

the play-out or preprocessing module 102 includes:

the identifier triggering unit, used for playing out the voice message and/or preprocessing the voice message in response to a trigger operation on the operation identifier.

In some variations, the play-out or preprocessing module 102 includes a gesture triggering unit, used for playing out the voice message and/or preprocessing the voice message in response to a detected gesture trigger operation.

In some variations, the apparatus 10 further includes a voice prompt information playing module, used for playing voice prompt information while the voice message is being recorded or after recording of the voice message is detected to be finished, the voice prompt information prompting the user whether to play out the voice message and/or preprocess the voice message;

the play-out or preprocessing module 102 includes:

the voice triggering unit, used for playing out the voice message and/or preprocessing the voice message in response to a voice control instruction input by the user for the voice prompt information, the voice control instruction instructing that the voice message be played out and/or preprocessed.
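After speech recognition, the voice triggering unit still has to map the user's spoken reply to an action. The sketch below uses a hypothetical English keyword table with whole-word matching; a production system would more likely apply an intent classifier to the recognition result, and both the keywords and the action names are assumptions.

```python
import re
from typing import Optional

# Hypothetical keyword table, checked in order; "cancel" comes first so that
# a phrase such as "cancel sending" is not read as a send command.
COMMANDS = [
    ("cancel", ("cancel", "discard")),
    ("play_out", ("play", "listen")),
    ("to_text", ("text",)),
    ("audio_edit", ("edit", "change voice")),
    ("add_picture", ("picture", "image")),
    ("send", ("send",)),
]


def parse_voice_command(transcript: str) -> Optional[str]:
    """Map a recognized reply to the voice prompt onto an action, if any."""
    text = transcript.lower()
    for action, keywords in COMMANDS:
        for keyword in keywords:
            # Whole-word match, so "display" does not trigger "play".
            if re.search(r"\b" + re.escape(keyword) + r"\b", text):
                return action
    return None
```

Returning None lets the caller repeat the voice prompt when the reply is not understood.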

In some variations, the play-out or preprocessing module 102 includes the automatic play-out unit, used for playing out the voice message in response to detecting that recording of the voice message has finished, and displaying at least one operation identifier, each operation identifier corresponding to one preprocessing mode;

and the preprocessing unit, used for preprocessing the voice message, in response to a touch operation on an operation identifier, using the preprocessing mode corresponding to that operation identifier.

In some variations of the embodiments of the present application, the apparatus 10 further includes:

the voice message sending module, used for sending the voice message to the communication object in the instant messaging scene if a message sending trigger operation for the voice message is detected before the voice message is played out or preprocessed.

The voice message sending apparatus 10 according to the embodiment of the present application has the same advantages as the voice message sending method according to the foregoing embodiment of the present application.

The embodiment of the present application further provides an electronic device corresponding to the voice message sending method provided in the foregoing embodiment, where the electronic device may be any electronic device with a voice processing capability, such as a mobile phone, a notebook computer, a tablet computer, a desktop computer, a smart watch, a smart sound box, and the like, so as to execute the voice message sending method.

Please refer to fig. 6, which illustrates a schematic diagram of an electronic device according to some embodiments of the present application. As shown in fig. 6, the electronic device 20 includes a processor 200, a memory 201, a bus 202, and a communication interface 203; the processor 200, the communication interface 203, and the memory 201 are connected through the bus 202. The memory 201 stores a computer program executable on the processor 200, and when executing the computer program, the processor 200 performs the voice message sending method of any of the foregoing embodiments.

The memory 201 may include high-speed Random Access Memory (RAM) and may further include non-volatile memory, such as at least one disk memory. Communication between this system network element and at least one other network element is realized through at least one communication interface 203 (wired or wireless), which may use the internet, a wide area network, a local area network, a metropolitan area network, and the like.

Bus 202 can be an ISA bus, PCI bus, EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The memory 201 is configured to store a program, and the processor 200 executes the program after receiving an execution instruction, where the voice message sending method disclosed in any embodiment of the present application may be applied to the processor 200, or implemented by the processor 200.

The processor 200 may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 200 or by instructions in the form of software. The processor 200 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in connection with the embodiments of the present application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor. The software module may reside in a storage medium well known in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory 201, and the processor 200 reads the information in the memory 201 and completes the steps of the above method in combination with its hardware.

The electronic device provided by the embodiment of the application and the voice message sending method provided by the embodiment of the application have the same inventive concept and have the same beneficial effects as the method adopted, operated or realized by the electronic device.

Referring to fig. 7, the computer-readable storage medium is shown as an optical disc 30 storing a computer program (i.e., a program product) which, when executed by a processor, performs the voice message sending method provided by any of the foregoing embodiments.

It should be noted that examples of the computer-readable storage medium may also include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory, or other optical and magnetic storage media, which are not described in detail herein.

The computer-readable storage medium provided by the above-mentioned embodiment of the present application and the voice message sending method provided by the embodiment of the present application are based on the same inventive concept, and have the same beneficial effects as the method adopted, operated or implemented by the application program stored in the computer-readable storage medium.

It should be noted that the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

Finally, it should be noted that the above embodiments are merely intended to illustrate, not to limit, the technical solutions of the present application. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the present disclosure and shall fall within the scope of the claims and the specification of the present application.

Claims

1. A method for sending a voice message, comprising:

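in an instant messaging scene, recording a voice message in response to a voice recording trigger operation;

playing out the voice message, and/or preprocessing the voice message; and

in response to a message sending trigger operation, sending the voice message or the preprocessed voice message to a communication object in the instant messaging scene.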
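
As a rough illustration of the ordering in claim 1 as summarized in the abstract (record on a recording trigger, then play out and/or preprocess, then send only on a separate send trigger), the Python sketch below models a voice message that is held back before sending. The class VoiceDraft, its states, and the player/transport callbacks are hypothetical names introduced here for illustration only; they are not taken from the application, and the sketch is not the applicant's implementation.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, List

Samples = List[float]


class DraftState(Enum):
    RECORDING = auto()
    RECORDED = auto()  # recording has stopped; the message has not been sent
    SENT = auto()


@dataclass
class VoiceDraft:
    """Hypothetical voice message held for preview/editing before sending."""
    samples: Samples = field(default_factory=list)
    state: DraftState = DraftState.RECORDING

    def append(self, chunk: Samples) -> None:
        # Accumulate audio while the recording trigger is active.
        if self.state is not DraftState.RECORDING:
            raise RuntimeError("not recording")
        self.samples.extend(chunk)

    def stop_recording(self) -> None:
        # Ending the recording does NOT send anything, unlike the
        # press-and-release flow described in the Background.
        self.state = DraftState.RECORDED

    def play_out(self, player: Callable[[Samples], None]) -> None:
        # Let the user audition the current (possibly edited) clip.
        self._require_recorded()
        player(self.samples)

    def preprocess(self, edit: Callable[[Samples], Samples]) -> None:
        # Replace the clip with an edited version (e.g. an audio template).
        self._require_recorded()
        self.samples = edit(self.samples)

    def send(self, transport: Callable[[Samples], None]) -> None:
        # Sending happens only on an explicit send trigger.
        self._require_recorded()
        transport(self.samples)
        self.state = DraftState.SENT

    def _require_recorded(self) -> None:
        if self.state is not DraftState.RECORDED:
            raise RuntimeError(f"invalid state: {self.state.name}")
```

For example, after recording, a client could call play_out() so the user can audition the clip, apply an edit with preprocess(), and call send() only when the user taps a send control; a second send() raises RuntimeError because the draft has already left the RECORDED state.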

2. The method of claim 1, wherein playing out the voice message comprises:

playing the voice message; and/or

3. The method of claim 1, wherein preprocessing the voice message comprises:

performing audio editing on the voice message; and/or

adding an accompanying picture to the voice message.

4. The method of claim 3, wherein preprocessing the voice message comprises performing audio editing on the voice message;

the audio editing of the voice message comprises:

5. The method of claim 4, wherein the audio editing template comprises at least one of a volume adjustment template, a speech rate adjustment template, or a voice changing template.
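
Claim 5 names volume adjustment and speech rate adjustment templates, among others, without fixing how they work. The sketch below assumes, purely for illustration, that a voice message is a list of mono PCM samples in the range [-1.0, 1.0] and that a template is a function from samples to samples; volume_template, rate_template, and apply_templates are hypothetical names. The rate template uses naive linear-interpolation resampling, which changes pitch along with tempo (a pitch-preserving time-stretch such as WSOLA would be the more usual choice for adjusting speech rate alone), so it is only a crude stand-in.

```python
from typing import Callable, List

Samples = List[float]
Template = Callable[[Samples], Samples]


def volume_template(gain: float) -> Template:
    """Scale amplitude by `gain`, clipping to the assumed [-1.0, 1.0] range."""
    def apply(samples: Samples) -> Samples:
        return [max(-1.0, min(1.0, s * gain)) for s in samples]
    return apply


def rate_template(speed: float) -> Template:
    """Naive linear-interpolation resampling.

    speed > 1 shortens the clip and speed < 1 lengthens it. Because this
    simply resamples, pitch shifts together with tempo.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")

    def apply(samples: Samples) -> Samples:
        if len(samples) < 2:
            return list(samples)
        last = len(samples) - 1
        out_len = max(1, int(len(samples) / speed))
        out: Samples = []
        for i in range(out_len):
            pos = min(i * speed, last)       # position in the source clip
            j = min(int(pos), last - 1)      # left neighbour index
            frac = pos - j                   # interpolation weight in [0, 1]
            out.append(samples[j] * (1 - frac) + samples[j + 1] * frac)
        return out
    return apply


def apply_templates(samples: Samples, templates: List[Template]) -> Samples:
    """Apply the selected templates in order."""
    for template in templates:
        samples = template(samples)
    return samples
```

For example, apply_templates([0.0, 0.25, 0.5, 0.75], [volume_template(2.0), rate_template(2.0)]) first yields [0.0, 0.5, 1.0, 1.0] (the last sample is clipped) and then keeps every second sample, giving [0.0, 1.0]. Chaining templates in this way is one possible means of letting the user stack several edits before sending.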

6. The method of claim 3, wherein preprocessing the voice message comprises adding an accompanying picture to the voice message;

adding the accompanying picture to the voice message comprises:

displaying at least one alternative picture;

sending the voice message or the preprocessed voice message to the communication object in the instant messaging scene comprises:

7. The method of claim 6, wherein before displaying the at least one alternative picture, further comprising:

determining emotion information of the user according to the voice message;

8. The method according to claim 6 or 7, wherein the target picture is a dynamic picture;

9. The method according to any one of claims 1 to 8, further comprising:

wherein playing out the voice message and/or preprocessing the voice message comprises:

10. The method according to any one of claims 1 to 8, wherein playing out the voice message and/or preprocessing the voice message comprises:

11. The method according to any one of claims 1 to 8, further comprising:

12. The method according to any one of claims 1 to 8, wherein playing out the voice message and/or preprocessing the voice message comprises:

13. The method according to any one of claims 1 to 12, further comprising, before playing out the voice message and/or preprocessing the voice message:

14. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method according to any one of claims 1 to 13.

15. A computer readable medium having computer readable instructions stored thereon which are executable by a processor to implement the method of any one of claims 1 to 13.