WO2017167047A1 - Method and apparatus for processing audio messages - Google Patents

Method and apparatus for processing audio messages

Info

Publication number
WO2017167047A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
message
communication
text content
server
Prior art date
Application number
PCT/CN2017/077257
Other languages
English (en)
French (fr)
Inventor
张达平
张黎黎
黄益信
陈鋆
赖建冬
钟浩华
Original Assignee
阿里巴巴集团控股有限公司
张达平
张黎黎
黄益信
陈鋆
赖建冬
钟浩华
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司, 张达平, 张黎黎, 黄益信, 陈鋆, 赖建冬, 钟浩华
Publication of WO2017167047A1
Priority to US16/143,372 (US11037568B2)
Priority to US17/316,931 (US12046242B2)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/06 Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • G10L21/10 Transforming into visible information
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/07 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail characterised by the inclusion of specific contents
    • H04L51/10 Multimedia information
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/04 Real-time or near real-time messaging, e.g. instant messaging [IM]
    • H04L51/046 Interoperability with other network applications or services
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/06 Message adaptation to terminal or network requirements
    • H04L51/066 Format adaptation, e.g. format conversion or compression
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/18 Information format or content conversion, e.g. adaptation by the network of the transmitted or received information for the purpose of wireless delivery to users or terminals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/565 Conversion or adaptation of application format or content

Definitions

  • the present application relates to the field of communications technologies, and in particular, to a method and an apparatus for processing an audio message.
  • the communication application sends and receives text as a communication message by collecting text manually input by the user.
  • manual input has many limitations. For example, because the user must keep his or her eyes on the screen of the electronic device, manual input poses a serious safety risk while the user is driving. As another example, when the electronic device is too large to be held in one hand, the user needs both hands to hold it and complete the input operation, which is difficult if the user is carrying a heavy object in one hand.
  • some communication applications eliminate the above limitation by adding an audio input function, enabling a user to more easily transmit and receive audio type communication messages.
  • the present application provides a method and an apparatus for processing an audio message, which can perform text conversion on an audio message in advance, thereby improving the response speed to the audio conversion requirement of the user.
  • a method for processing an audio message including:
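  • the server identifies the type of each communication message transmitted between the two communicating parties;
  • when the type of any communication message is an audio type, the server acquires that communication message and pre-converts it into the corresponding text content;
  • when it is determined that either communicating party has a conversion requirement for that communication message, the server transmits the text content to that communicating party.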
  • a method for processing an audio message including:
  • when receiving an audio conversion command issued by the user for any communication message of the audio type, the local communication device initiates a corresponding audio conversion request to the server;
  • a method for processing an audio message including:
  • the local communication device pre-acquires the text content corresponding to any communication message of the audio type;
  • when receiving an audio conversion command issued by the user for that communication message, the local communication device displays the pre-acquired text content.
  • a method for processing an audio message including:
  • in the process of generating a communication message of the audio type, the local communication device sequentially determines whether each collected audio segment meets a preset segmentation rule;
  • when an audio segment meets the preset segmentation rule, the local communication device splits off the audio segment in real time and uploads it to a server, so that the server pre-converts the audio segment into a corresponding text segment;
  • the text segments corresponding to all the audio segments are spliced in sequence by the server into the text content corresponding to the communication message.
  • an apparatus for processing an audio message including:
  • an identification unit, which causes the server to identify the type of each communication message transmitted between the two communicating parties;
  • a pre-conversion unit, which, when the type of any communication message is an audio type, causes the server to acquire that communication message and pre-convert it into the corresponding text content;
  • an apparatus for processing an audio message including:
  • a requesting unit, which, when the local communication device receives an audio conversion command issued by the user for any communication message of the audio type, initiates a corresponding audio conversion request to the server;
  • a display unit, which causes the local communication device to receive the text content returned by the server for that communication message and to display the text content in association with that communication message;
  • wherein the text content is actively pre-converted by the server before the audio conversion request is received.
  • an apparatus for processing an audio message including:
  • a pre-acquisition unit, which causes the local communication device to pre-acquire the text content corresponding to any communication message of the audio type;
  • an apparatus for processing an audio message including:
  • a determining unit, which, in the process of generating a communication message of the audio type, causes the local communication device to sequentially determine whether each collected audio segment meets the preset segmentation rule;
  • a processing unit, which, when an audio segment meets the preset segmentation rule, causes the local communication device to split off the audio segment and upload it to the server in real time, so that the server pre-converts the audio segments into corresponding text segments, and the text segments corresponding to all the audio segments are spliced in sequence by the server into the text content corresponding to the communication message.
  • a method for processing an audio message, including:
  • when receiving an audio conversion request from either communicating party for any audio message, the server determines the unresponsive audio messages associated with that communicating party;
  • the server obtains the text content corresponding to the requested audio message and to the unresponsive audio messages, and returns it to that communicating party.
  • a method for processing an audio message, including:
  • when receiving an audio conversion command issued by the user for any audio message, the local communication device determines first text content corresponding to that audio message and second text content corresponding to unresponsive audio messages other than that audio message;
  • the local communication device displays the first text content in association with that audio message, and the second text content in association with the unresponsive audio messages.
  • an apparatus for processing an audio message including:
  • a determining unit, which, when an audio conversion request is received from either communicating party for any audio message, causes the server to determine the unresponsive audio messages associated with that communicating party;
  • the server obtains the text content corresponding to the requested audio message and to the unresponsive audio messages, and returns it to that communicating party.
  • an apparatus for processing an audio message including:
  • a determining unit, which, when an audio conversion command issued by the user for any audio message is received, causes the local communication device to determine first text content corresponding to that audio message and second text content corresponding to unresponsive audio messages other than that audio message;
  • a display unit, which causes the local communication device to display the first text content in association with that audio message, and the second text content in association with the unresponsive audio messages.
  • the present application can perform text conversion in advance, so that when the user has an audio conversion requirement, the corresponding text content can be fed back immediately without the user waiting during the conversion process, which speeds up the response to the user's requirement and improves the user's application experience.
  • FIG. 1 is a flowchart of a method for processing an audio message based on a server side according to an exemplary embodiment of the present application.
  • FIG. 2 is a flowchart of a method for processing an audio message based on a communication device side according to an exemplary embodiment of the present application.
  • FIG. 3 is a flowchart of a method for processing an audio message based on a communication device side according to a second embodiment of the present application.
  • FIG. 4 is a flowchart of a method for processing an audio message according to an exemplary embodiment of the present application.
  • FIGS. 5-8 are schematic diagrams of interfaces of a communication application based on a receiver side according to an exemplary embodiment of the present application.
  • FIG. 9 is a schematic diagram of an interface of a communication application based on a sender side according to an exemplary embodiment of the present application.
  • FIG. 10 is a flowchart of a method for processing an audio message based on a communication device side according to a third embodiment of the present application.
  • FIG. 11 is a flowchart of another method for processing an audio message according to an exemplary embodiment of the present application.
  • FIG. 12 is a flowchart of still another method for processing an audio message according to an exemplary embodiment of the present application.
  • FIG. 13 is a flowchart of still another method for processing an audio message according to an exemplary embodiment of the present application.
  • FIG. 14 is a flowchart of another method for processing an audio message based on a server side according to an exemplary embodiment of the present application.
  • FIG. 15 is a flowchart of a method for processing an audio message based on a communication device side according to a fourth embodiment of the present application.
  • FIG. 16 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present application.
  • FIG. 17 is a block diagram of a processing apparatus for an audio message based on a server side according to an exemplary embodiment of the present application.
  • FIG. 18 is a block diagram of a processing apparatus for an audio message based on a communication device side according to an exemplary embodiment of the present application.
  • FIG. 19 is a block diagram of a processing apparatus for an audio message based on a communication device side according to a second embodiment of the present application.
  • FIG. 20 is a block diagram of a processing apparatus for an audio message based on a communication device side according to a third embodiment of the present application.
  • FIG. 21 is a block diagram of a processing apparatus for an audio message based on a communication device side according to a fourth embodiment of the present application.
  • FIG. 22 is a block diagram of a processing apparatus for an audio message based on a communication device side according to a fifth embodiment of the present application.
  • the related art proposes converting audio messages into text. Specifically, when a user receives a communication message of the audio type and cannot conveniently listen to it, the user may initiate an audio conversion request for that communication message to the server; the server then recognizes the audio data and returns the converted text content for the user to read.
  • however, the server needs a certain amount of time to convert the audio of the communication message, so after sending an audio conversion request to the server, the user waits a long time before seeing the converted text content. On the one hand, the long wait makes the user anxious; on the other hand, the user cannot respond for a long time, so the sender of the communication message receives no feedback for a long time. This not only harms the user's application experience but also greatly reduces the communication efficiency between users.
  • FIG. 1 is a flowchart of a method for processing an audio message based on a server side according to an exemplary embodiment of the present application. As shown in FIG. 1 , the method may include:
  • Step 102: The server identifies the type of each communication message transmitted between the two communicating parties.
  • Step 104: When the type of any communication message is an audio type, the server acquires that communication message and pre-converts it into the corresponding text content.
  • Step 106: When it is determined that either communicating party has a conversion requirement for that communication message, the server sends the text content to that communicating party.
  • the server may actively determine a communicating party's conversion requirement for the audio message; for example, when either communicating party has a preset communication role in the communication process, the server may determine that this communicating party has a conversion requirement and send it the corresponding text content.
  • for example, the recipient may be predefined by default as having a conversion requirement, so that whenever an audio message exists, the server pre-converts it into the corresponding text content and actively sends that text content to the recipient's communication device.
  • the communication device can then directly retrieve and display the locally stored text content without downloading it from the server in real time, so that even a poor network connection does not affect the display of the audio message's text content, which reduces the dependence on real-time network conditions.
  • the server may also determine whether a conversion requirement exists according to requests from the communicating parties; for example, when receiving an audio conversion request from either communicating party for any communication message, the server may determine that this communicating party has a conversion requirement and return to it the pre-converted text content corresponding to that communication message.
  • in this case the server returns the corresponding text content only when the communicating party actually needs it; by accurately judging the communicating party's real needs, the number of interactions and the amount of data exchanged between the server and the communication device can be reduced, which helps to reduce the power consumption of the communication device. In addition, the consumption of wireless data traffic can be reduced, avoiding unnecessary cost to the user.
  • the server actively pre-converts the audio message and obtains the corresponding text content before the user raises an audio conversion requirement. When the server then receives the user's audio conversion requirement, the text content can be returned immediately without the user waiting for the server to convert the audio message. This greatly shortens the waiting time of the recipient user and, in turn, the time the sender user at the opposite end waits for feedback, thereby improving the experience of both users and greatly improving the communication efficiency between them.
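  • steps 102 to 106 can be pictured with the minimal Python sketch below. It is only an illustration under stated assumptions: the class and method names (`Message`, `PreConversionServer`, `on_message`, `on_conversion_request`) and the `transcribe` callable are hypothetical and do not come from the application, and a trivial stand-in replaces a real speech recognition engine.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Message:
    msg_id: str
    kind: str        # hypothetical type labels: "audio" or "text"
    payload: bytes


@dataclass
class PreConversionServer:
    # `transcribe` stands in for any speech-to-text engine (assumption).
    transcribe: Callable[[bytes], str]
    text_cache: Dict[str, str] = field(default_factory=dict)

    def on_message(self, msg: Message) -> None:
        """Steps 102/104: identify the type and pre-convert audio messages."""
        if msg.kind == "audio":
            self.text_cache[msg.msg_id] = self.transcribe(msg.payload)

    def on_conversion_request(self, msg_id: str) -> Optional[str]:
        """Step 106: return the text converted earlier; nothing is transcribed here."""
        return self.text_cache.get(msg_id)


# Stand-in recognizer that records how often it is called.
transcribe_calls: List[bytes] = []


def fake_transcribe(audio: bytes) -> str:
    transcribe_calls.append(audio)
    return audio.decode("utf-8").upper()


server = PreConversionServer(transcribe=fake_transcribe)
server.on_message(Message("m1", "audio", b"hello"))   # converted on arrival
server.on_message(Message("m2", "text", b"hi"))       # not an audio message
text_for_m1 = server.on_conversion_request("m1")      # served from the cache
```

  • the point of the sketch is the ordering: the recognizer runs when the message arrives, so the later request is answered by a dictionary lookup rather than by a fresh conversion.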
  • FIG. 2 is a flowchart of a method for processing an audio message based on a communication device side according to an exemplary embodiment of the present application. As shown in FIG. 2, the method may include:
  • Step 202: When receiving an audio conversion command issued by the user for any communication message of the audio type, the local communication device initiates a corresponding audio conversion request to the server.
  • Step 204: The local communication device receives the text content returned by the server for that communication message and displays the text content in association with that communication message, wherein the text content is actively pre-converted by the server before the audio conversion request is received.
  • the local communication device initiates an audio conversion request to the server based on the audio conversion command issued by the user, which indicates the user's actual demand for audio conversion, and the server returns the required text content accordingly. Because the server has already pre-converted the audio message, the local communication device can immediately obtain the corresponding text content from the server without waiting for the server to convert the audio message in real time, which helps to improve the experience of both communicating parties and greatly improves the communication efficiency between them.
  • FIG. 3 is a flowchart of a method for processing an audio message based on a communication device side according to a second embodiment of the present application. As shown in FIG. 3, the method may include:
  • Step 302: The local communication device pre-acquires the text content corresponding to any communication message of the audio type.
  • the local communication device can pre-fetch text content from the server, and the text content is pre-converted by the server.
  • the text content may be actively pushed by the server to the local communication device; alternatively, when the local communication device identifies the types of the communication messages transmitted between itself and the peer communication device and determines that a communication message is of the audio type, it may initiate an audio conversion request to the server to obtain the text content produced by the server's pre-conversion processing.
  • performing the pre-conversion on the server makes full use of the server's powerful processing capability and improves the efficiency of pre-converting the audio message, while lowering the processing performance required of the local communication device and the processing resources it consumes, thereby reducing its power consumption.
  • the local communication device can also perform pre-conversion processing on any communication message itself to obtain the text content; in other words, the audio message is pre-converted locally by the local communication device. For example, when the local communication device identifies the types of the communication messages transmitted between itself and the peer communication device and determines that a communication message is of the audio type, it may perform local pre-conversion processing to obtain the corresponding text content. In this embodiment, local pre-conversion eliminates or reduces the dependence on the network, so the solution can be applied to more application scenarios.
  • Step 304: When receiving an audio conversion command issued by the user for that communication message, the local communication device displays the pre-acquired text content.
  • because the text content is pre-acquired, the local communication device obtains the corresponding text content directly when the user issues an audio conversion command; the user does not have to wait for the conversion, which helps to improve communication efficiency.
  • moreover, the audio conversion command does not depend on the network environment, so even if the local communication device is not connected to the network, the user can still view the text content of the corresponding audio message, which makes the solution suitable for viewing historical communication messages in certain special scenarios.
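  • the pre-acquisition of steps 302 and 304 can be sketched as a small local cache. This is an illustrative sketch only: `LocalDevice`, `FakeServer`, and their methods are hypothetical names, and the "network" is simulated by a flag that makes the fetch raise `ConnectionError`.

```python
from typing import Callable, Dict, Optional


class LocalDevice:
    """Hypothetical device-side cache of pre-acquired text content."""

    def __init__(self, fetch_text: Callable[[str], Optional[str]]) -> None:
        self._fetch_text = fetch_text          # stands in for a network call
        self._local_text: Dict[str, str] = {}

    def prefetch(self, msg_id: str) -> bool:
        """Step 302: try to obtain and store the text before the user asks."""
        try:
            text = self._fetch_text(msg_id)
        except ConnectionError:
            return False
        if text is None:
            return False
        self._local_text[msg_id] = text
        return True

    def on_convert_command(self, msg_id: str) -> Optional[str]:
        """Step 304: display from local storage; no network access needed."""
        return self._local_text.get(msg_id)


class FakeServer:
    """Simulated server holding pre-converted text (assumption)."""

    def __init__(self, texts: Dict[str, str]) -> None:
        self.texts = texts
        self.online = True

    def fetch(self, msg_id: str) -> Optional[str]:
        if not self.online:
            raise ConnectionError("network unavailable")
        return self.texts.get(msg_id)


fake_server = FakeServer({"m1": "see you at noon"})
device = LocalDevice(fake_server.fetch)
prefetched = device.prefetch("m1")     # done while the network is available
fake_server.online = False             # the device later goes offline
offline_text = device.on_convert_command("m1")
```

  • because `on_convert_command` reads only local storage, the text stays viewable after the simulated network is switched off, which mirrors the offline viewing scenario described above.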
  • FIG. 4 is a flowchart of a method for processing an audio message according to an exemplary embodiment of the present application. As shown in FIG. 4, the method may include the following steps:
  • Step 402: The server acquires a communication message transmitted between the two communicating parties.
  • in the technical solution of the present application, the two communicating parties are completely equal; that is, either communicating party can act as the sender or the receiver shown in FIG. 4. The embodiment shown in FIG. 4 therefore takes one particular communication process between the two parties, in which the sender and the receiver have been determined, to illustrate the technical solution of the present application.
  • Step 404: The server performs type identification on the communication message.
  • the communication message may include many types.
  • any communication message that includes audio data may be determined to be of the audio type, that is, an audio message, such as a voice message or a video message;
  • a voice message is taken as the example audio message here, but the present application is not limited to this.
  • Step 406: The server performs pre-conversion processing on the audio type communication message (i.e., the audio message) to obtain the corresponding text content.
  • the server may perform pre-conversion processing on the audio message in any manner known from the related art to obtain the corresponding text content.
  • the server may perform the pre-conversion processing at any appropriate time, as long as the pre-conversion processing is completed before step 408.
  • the pre-conversion processing of the audio message by the server is independent of the audio conversion command initiated by the user for the audio message, and the pre-conversion processing is performed in advance by the server.
  • the server can therefore immediately provide the pre-converted text content to the user without converting the message in real time, which avoids long waits for the communicating parties and helps to improve communication efficiency.
  • Step 408: The server receives an audio conversion request initiated by the receiver for the audio message.
  • each communicating party associated with the audio message can issue an audio conversion command, whereupon the corresponding electronic device initiates an audio conversion request to the server (this can also be understood as the user, such as the sender or the receiver, initiating the audio conversion request to the server); here, the case in which the receiver initiates the audio conversion request is taken as an example.
  • the communication application can be an instant messaging application; for example, the instant messaging application can be an enterprise instant messaging (EIM) application, such as "DING Talk".
  • as shown in FIG. 5, assume that the user "small white" sends several audio messages to the user "small black". The user "small black" can long-press an audio message (or trigger it in another manner, such as by pressure) to bring up a function option menu for that audio message.
  • the function option menu includes function options such as "handset playback", "favorite", "convert to text", and "delete". When the user "small black" selects the "convert to text" option, it may be determined that an audio conversion command for the corresponding audio message has been issued to the electronic device, and the electronic device initiates a corresponding audio conversion request to the server.
  • Step 410: The server determines the response status of other audio messages.
  • Step 412: The server sends the text content corresponding to the audio message to the receiver.
  • Step 414: The receiver displays the received text content.
  • the server may directly determine the text content corresponding to the 12 s audio message selected by the user "small black" in FIG. 5 and return that text content to the user "small black" for display.
  • the electronic device used by the user "small black" may expand the display area of the corresponding audio message, wherein the expanded display area is divided into a first area and a second area; the first area shows the corresponding audio message, and the second area shows the text content corresponding to that audio message.
  • the display area corresponding to the audio message (the display area may be the "bubble frame" shown in FIG.
  • the embodiment of the present application may include the foregoing step 410; correspondingly, in the technical solution of the present application, the server may determine the response status of the communicating parties to the transmitted communication messages. For the above audio message, when receiving an audio conversion request initiated by either communicating party for the audio message, if other messages related to that communicating party have an unresponsive response status and are of the audio type, the server may return the text content corresponding to those other messages in addition to the text content corresponding to the audio message. The text content corresponding to the other messages is likewise obtained by the server through active pre-conversion, so the communicating party does not need to wait for the server to perform the conversion in real time.
  • the server can actively deliver the text content corresponding to all three audio messages; correspondingly, as shown in FIG. 8, the electronic device used by the user "small black" can expand the display areas of the three audio messages and show the corresponding text content, including "I am not convenient to type, direct voice", "about the last contract offer", "add three more points", and so on. On the one hand, this simplifies the trigger operation of the user "small black" (that is, issuing an audio conversion command or initiating an audio conversion request), since a single trigger reveals the text of all unresponsive audio messages; on the other hand, it lets the user "small black" view multiple unresponsive audio messages at the same time. Compared with viewing the text content of each audio message separately, this offers better readability and reading continuity, makes it easier for the user "small black" to understand the communication intent of the user "small white", and helps to improve communication efficiency.
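  • the selection made in step 410 can be sketched as a filter over the recipient's audio messages. The record layout and the function name `texts_for_request` below are hypothetical, chosen only to illustrate returning the requested message together with any other unresponsive audio messages in a single reply.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AudioRecord:
    msg_id: str
    recipient: str
    text: str                 # text content already pre-converted by the server
    responded: bool = False   # hypothetical response-status flag


def texts_for_request(
    records: List[AudioRecord], recipient: str, requested_id: str
) -> Dict[str, str]:
    """Return text for the requested message plus the recipient's other
    unresponsive audio messages, in their original order."""
    result: Dict[str, str] = {}
    for rec in records:
        if rec.recipient != recipient:
            continue
        if rec.msg_id == requested_id or not rec.responded:
            result[rec.msg_id] = rec.text
    return result


records = [
    AudioRecord("m0", "small black", "an earlier message", responded=True),
    AudioRecord("m1", "small black", "I am not convenient to type, direct voice"),
    AudioRecord("m2", "small black", "about the last contract offer"),
    AudioRecord("m3", "small black", "add three more points"),
    AudioRecord("m4", "someone else", "unrelated message"),
]
batch = texts_for_request(records, "small black", "m1")
```

  • in this sketch a request for the first message alone yields the text of all three unresponsive messages, while the message that was already responded to and the message addressed to another party are left out.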
  • the response status of each communication message may be determined and processed by the recipient.
  • the electronic device used by the receiver can determine the receiver's response status to each received communication message of the audio type. When receiving an audio conversion command issued by the receiver for any audio message, if unresponsive communication messages of the audio type other than that audio message exist, the electronic device initiates to the server an audio conversion request that relates not only to that audio message (that is, it can be used to obtain the text content corresponding to that audio message) but also to the other unresponsive communication messages (that is, it can be used to obtain the text content corresponding to them).
  • if the electronic device detects that a second audio message and a third audio message also exist and that the response status of these two audio messages is unresponsive, the electronic device initiates an audio conversion request for the three audio messages to the server, thereby obtaining the text content of the three audio messages returned by the server, and displays it in the manner shown in FIG.
  • Step 416: The server notifies the sender of the response status of the audio message.
  • a black dot can be displayed near the communication message to indicate that it is in an unresponsive state.
  • the electronic device of the user "small black” can determine that the audio message is responded, thereby eliminating the first line as shown in FIG. Black dots near the audio message.
  • the server when the user "small black” initiates an audio conversion request only for the first audio message, the server returns all three audio messages corresponding to the user "small black”. The text content, the server can think that the three audio messages correspond to the responded state, and inform the user of the "white” electronic device to make it in three Near the audio message are marked “read.”
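  • The status switch described above can be sketched as follows. This is only an illustration under assumed names: `mark_responded`, the `"unresponsive"` and `"responded"` labels, and the notification dictionaries are hypothetical and do not come from the application.

```python
def mark_responded(statuses, returned_ids):
    """Switch every message whose text was returned to the responded state.

    `statuses` maps a message id to "unresponsive" or "responded".
    Returns the notifications the server would forward to the sender.
    """
    notifications = []
    for msg_id in returned_ids:
        if statuses.get(msg_id) == "unresponsive":
            statuses[msg_id] = "responded"
            notifications.append({"message_id": msg_id, "status": "responded"})
    return notifications


statuses = {"a1": "unresponsive", "a2": "unresponsive",
            "a3": "unresponsive", "a0": "responded"}
# The request named only "a1", but text for all three messages was returned.
notifications = mark_responded(statuses, ["a1", "a2", "a3"])
print(len(notifications))
```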
  • FIG. 10 is a flowchart of a method for processing an audio message based on a communication device side according to a third embodiment of the present application. As shown in FIG. 10, the method may include:
  • Step 1002: While an audio-type communication message is being generated, the local communication device determines, for each collected audio segment in turn, whether it meets a preset segmentation rule.
  • Step 1004: When any audio segment meets the preset segmentation rule, the local communication device cuts off that audio segment and uploads it to the server in real time.
  • The server receives, in order, the audio segments that the local communication device cuts and uploads in real time according to the preset rule, and pre-converts each audio segment into a corresponding text segment; the server then splices all the text segments in order to obtain the text content corresponding to the entire audio message.
  • The segmentation rule may take various forms, for example based on one dimension or a combination of several dimensions such as the duration or the data volume of the audio segment. For example, with a duration-based segmentation rule, assuming the entire audio message lasts 12 s and the predefined segment duration is 2 s, a real-time segmentation operation is performed each time 2 s is reached and that 2 s audio segment is uploaded to the server, which pre-converts it into the corresponding text segment. The entire audio message thus yields 6 audio segments and 6 corresponding text segments, which the server splices together into the text content corresponding to the entire audio message.
  • Because the sender's electronic device segments and uploads the audio message in real time, the server can receive the audio with almost no delay while the sender is still inputting it.
  • Compared with uploading the audio message only after it has been completely input, the server can complete the pre-conversion processing and obtain the corresponding text content more quickly. Even if the receiver initiates an audio conversion request immediately after receiving the audio message, the server can ensure that pre-conversion is complete before the request arrives and return the corresponding text content immediately upon receiving it.
  • The communicating parties thus avoid the inefficiency and mistyping of manual text entry by using audio input, and also avoid the waiting delay when audio is converted into text. The scheme combines the speed and convenience of audio input with the immediacy of text communication, which helps improve the communication efficiency between the two parties.
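  • The duration-based example above (a 12 s message cut every 2 s) can be sketched as follows. The function names, the `Server` class, and the stand-in `fake_transcribe` recognizer are assumptions made for illustration; a real system would run speech recognition on the uploaded audio data.

```python
def segment_by_duration(total_seconds, segment_seconds):
    """Yield (start, end) boundaries, cutting each time the preset duration is reached."""
    start = 0.0
    while start < total_seconds:
        end = min(start + segment_seconds, total_seconds)
        yield (start, end)
        start = end


def fake_transcribe(segment):
    """Stand-in for speech recognition of one audio segment (hypothetical)."""
    start, end = segment
    return f"[text {start:.0f}-{end:.0f}s]"


class Server:
    def __init__(self):
        self.text_segments = []  # (upload index, text segment)

    def receive_segment(self, index, segment):
        # Pre-convert each segment as soon as it arrives.
        self.text_segments.append((index, fake_transcribe(segment)))

    def spliced_text(self):
        # Splice the text segments in upload order.
        return "".join(text for _, text in sorted(self.text_segments))


server = Server()
for i, seg in enumerate(segment_by_duration(12, 2)):
    server.receive_segment(i, seg)   # uploaded while recording continues

print(len(server.text_segments))
print(server.spliced_text())
```

  • Keeping the upload index with each text segment lets the server splice correctly even if segments finish conversion out of order.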
  • FIG. 11 is a flowchart of another method for processing an audio message according to an exemplary embodiment of the present application. As shown in FIG. 11, the method may include the following steps:
  • Step 1102: The server acquires a communication message transmitted between the two communicating parties.
  • Step 1104: The server performs type identification on the communication message.
  • Step 1106: The server performs pre-conversion processing on the audio-type communication message (i.e., the audio message) to obtain the corresponding text content.
  • For steps 1102-1106, refer to steps 402-406 in the embodiment shown in FIG. 4; details are not repeated here.
  • Step 1108: The server sends the text content corresponding to the audio message to the receiver.
  • The server assumes by default that the receiver has an audio conversion requirement for all audio messages, so it not only obtains the text content corresponding to every audio message through pre-conversion processing but also actively pushes that text content to the receiver.
  • Step 1110: The receiver's communication device receives an audio conversion command issued by the receiver for the audio message.
  • Step 1112: The receiver's communication device determines the response status of other audio messages.
  • Step 1114: The receiver's communication device displays the text content.
  • Before the receiver issues the audio conversion command, the server has already obtained the corresponding text content through pre-conversion processing and actively pushed it to the receiver's communication device; in other words, the receiver's communication device has already "pre-acquired" the text content corresponding to the audio message before receiving the audio conversion command. Therefore, when the receiver issues an audio conversion command, the communication device can obtain and display the corresponding text content immediately, without making the receiver wait.
  • This embodiment pre-acquires the text content into local storage on the communication device, so that after receiving the audio conversion command the communication device retrieves the corresponding text content directly from local storage.
  • Because no network support is required, the embodiment shown in FIG. 11 can still satisfy the user's needs.
  • The communication device may also display the text content of these audio messages; details are not repeated here.
  • Step 1116: The receiver's communication device marks each audio message whose text content has been displayed as being in the responded state and notifies the server of the responded state, and the server notifies the sender.
  • The communication device may add the response status of the audio message to a response status switching notification and send that notification to the server, which forwards it to the sender, so that the corresponding audio message is correctly marked on the sender's communication device.
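  • A minimal receiver-side sketch of steps 1108-1116, assuming the pushed text is kept in an in-memory dictionary. The `ReceiverDevice` class, its method names, and the notification layout are hypothetical.

```python
class ReceiverDevice:
    def __init__(self):
        self.text_cache = {}   # message id -> text content pushed by the server
        self.outbox = []       # response status switching notifications for the server

    def on_text_pushed(self, msg_id, text):
        # Step 1108: the server pushes pre-converted text before any command.
        self.text_cache[msg_id] = text

    def on_conversion_command(self, msg_id):
        # Steps 1110-1114: serve the text from local storage, no network needed.
        text = self.text_cache.get(msg_id)
        if text is None:
            return None        # nothing pre-acquired for this message
        # Step 1116: mark as responded and queue a notification for the server.
        self.outbox.append({"message_id": msg_id, "status": "responded"})
        return text


device = ReceiverDevice()
device.on_text_pushed("a1", "See you at three.")
print(device.on_conversion_command("a1"))
```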
  • FIG. 12 is a flowchart of still another method for processing an audio message according to an exemplary embodiment of the present application. As shown in FIG. 12, the method may include the following steps:
  • Step 1202: The communicating parties send and receive a communication message.
  • Step 1204: The receiver's communication device performs type identification on the communication message.
  • Step 1206: When an audio message is identified, the receiver's communication device initiates an audio conversion request to the server.
  • Step 1208: The server performs pre-conversion processing on the audio-type communication message (i.e., the audio message) to obtain the corresponding text content.
  • Step 1210: The server sends the text content corresponding to the audio message to the receiver.
  • The audio conversion request is initiated to the server by the communication device on its own initiative, not in response to an audio conversion command issued by the receiver. In other words, before the receiver actually issues an audio conversion command, the communication device actively initiates an audio conversion request to the server, so that the server performs pre-conversion processing and obtains the corresponding text content; that is, the communication device performs a "pre-acquisition" operation on the text content corresponding to the audio message. Therefore, when the receiver issues an audio conversion command, the communication device can obtain and display the corresponding text content immediately, without making the receiver wait.
  • In this embodiment, the communication device performs type identification on the communication message and actively initiates an audio conversion request to the server, which triggers the server's pre-conversion processing, instead of the server starting pre-conversion processing by itself. The communication device thereby takes over the "type identification" function and reduces the processing load on the server.
  • Step 1212: The receiver's communication device receives an audio conversion command issued by the receiver for the audio message.
  • Step 1214: The receiver's communication device determines the response status of other audio messages.
  • Step 1216: The receiver's communication device displays the text content.
  • Step 1218: The receiver's communication device marks each audio message whose text content has been displayed as being in the responded state and notifies the server of the responded state, and the server notifies the sender.
  • For steps 1212-1218, refer to steps 1110-1116 in the embodiment shown in FIG. 11; details are not repeated here.
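  • The difference from the embodiment of FIG. 11 is who identifies the message type. A hedged sketch, in which `PrefetchingDevice` and the `request_conversion` callable are hypothetical stand-ins for the device logic and for the call to the server:

```python
class PrefetchingDevice:
    def __init__(self, request_conversion):
        self.request_conversion = request_conversion  # hypothetical call to the server
        self.prefetched = {}                          # message id -> text content

    def on_message_received(self, msg):
        # Steps 1204-1206: the device, not the server, identifies the type
        # and requests conversion as soon as an audio message arrives.
        if msg["type"] == "audio":
            self.prefetched[msg["id"]] = self.request_conversion(msg["id"])

    def on_conversion_command(self, msg_id):
        # Steps 1212-1216: the text is already held locally.
        return self.prefetched.get(msg_id)


requests_seen = []

def fake_server(msg_id):
    requests_seen.append(msg_id)
    return f"text of {msg_id}"

device = PrefetchingDevice(fake_server)
device.on_message_received({"id": "t1", "type": "text"})
device.on_message_received({"id": "a1", "type": "audio"})
print(requests_seen)
```

  • Only the audio message reaches the server, so the server never has to inspect text messages.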
  • FIG. 13 is a flowchart of still another method for processing an audio message according to an exemplary embodiment of the present application. As shown in FIG. 13, the method may include the following steps:
  • Step 1302: The communicating parties send and receive a communication message.
  • Step 1304: The receiver's communication device performs type identification on the communication message.
  • Step 1306: When an audio message is identified, the receiver's communication device performs pre-conversion processing on the audio-type communication message (i.e., the audio message) to obtain the corresponding text content.
  • The receiver's communication device actively identifies the type of the communication message and, when it determines that the message is an audio message, also actively performs pre-conversion processing on it to obtain the corresponding text content. Thus, even when the network environment is poor or there is no network, the receiver's communication device can still "pre-acquire" the text content of the audio message, so that when the receiver issues an audio conversion command the text content can be displayed promptly and the receiver does not have to wait.
  • When the network environment is unstable, if the communication device relies on the server to perform pre-conversion processing after receiving the audio message, the communication device may be unable to initiate the audio conversion request to the server, or the server may be unable to deliver the pre-converted text content to the communication device. The receiver's communication device may then fail to pre-acquire the corresponding text content before the receiver issues the audio conversion command, so that an audio conversion request has to be initiated to the server in real time, which undoubtedly increases the user's waiting time.
  • The pre-conversion processing (or pre-acquisition) scheme of any embodiment of the present application can improve the user experience. For example, when pre-conversion processing is implemented on the server, obtaining the text content in advance gives the server and the communication device more time and more opportunities to transmit the text content before the user issues the audio conversion command, which avoids the situation in which, when the user requests conversion in real time, the text content cannot be transmitted or transmission fails repeatedly for network reasons.
  • Step 1308: The receiver's communication device receives an audio conversion command issued by the receiver for the audio message.
  • Step 1310: The receiver's communication device determines the response status of other audio messages.
  • Step 1312: The receiver's communication device displays the text content.
  • Step 1314: The receiver's communication device marks each audio message whose text content has been displayed as being in the responded state and notifies the server of the responded state, and the server notifies the sender.
  • For steps 1308-1314, refer to steps 1110-1116 in the embodiment shown in FIG. 11; details are not repeated here.
  • FIG. 14 is a flowchart of a method for processing an audio message based on a server side according to an exemplary embodiment of the present application. As shown in FIG. 14 , the method is applied to a server, and may include the following steps:
  • Step 1402: When an audio conversion request from any communicating party for any audio message is received, the server determines the unresponsive audio messages associated with that communicating party.
  • Step 1404: The server obtains the text content corresponding to that audio message and to each of the unresponsive audio messages, and returns it to that communicating party.
  • When the server receives an audio conversion request for any audio message, it actively associates the other unresponsive audio messages with the request, so that the user obtains the text content corresponding to all unresponsive audio messages without initiating a separate audio conversion for each one, which greatly simplifies user operations.
  • When it is inconvenient for the user to trigger an audio conversion command for each audio message, for example when the user is holding an object in one hand and can operate the device only with the other, the technical solution of the present application lets the user read the text content corresponding to all the audio messages after initiating audio conversion for just one of them. In addition, when the contents of multiple audio messages are highly correlated, actively presenting the text content of all of them lets the user read the content and logic of the messages as a connected whole, which helps improve reading and communication efficiency.
  • The server may pre-convert all audio messages to obtain the corresponding text content, so that when the audio conversion request is received it only needs to look up the pre-converted text content corresponding to the requested audio message and to the unresponsive audio messages; for the technical solution in this scenario, refer to step 410 and related steps in the embodiment shown in FIG. 4; details are not repeated here.
  • Alternatively, the server may convert the requested audio message and the unresponsive audio messages into the corresponding text content in real time and return it to the user for display.
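  • Steps 1402-1404 and the two ways of obtaining the text (lookup of pre-converted text, or real-time conversion) can be sketched together. The message dictionaries, the `handle_conversion_request` name, and the `transcribe` callable are assumptions for illustration only.

```python
def handle_conversion_request(party, msg_id, messages, preconverted, transcribe):
    """Return {message id: text} for the requested audio message and for every
    other unresponsive audio message associated with `party`."""
    # Step 1402: determine the associated unresponsive audio messages.
    targets = [msg_id]
    for m in messages:
        if (m["receiver"] == party and m["type"] == "audio"
                and m["status"] == "unresponsive" and m["id"] != msg_id):
            targets.append(m["id"])
    # Step 1404: obtain the text content for each target.
    result = {}
    for target in targets:
        text = preconverted.get(target)   # pre-converted text, if available
        if text is None:
            text = transcribe(target)     # otherwise convert in real time
            preconverted[target] = text
        result[target] = text
    return result


messages = [
    {"id": "a1", "receiver": "small black", "type": "audio", "status": "unresponsive"},
    {"id": "a2", "receiver": "small black", "type": "audio", "status": "unresponsive"},
    {"id": "a3", "receiver": "white", "type": "audio", "status": "unresponsive"},
    {"id": "t1", "receiver": "small black", "type": "text", "status": "unresponsive"},
]
preconverted = {"a1": "hello"}
texts = handle_conversion_request(
    "small black", "a1", messages, preconverted,
    transcribe=lambda mid: f"<real-time text of {mid}>",
)
print(texts)
```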
  • FIG. 15 is a flowchart of a method for processing an audio message based on a communication device side according to a fourth exemplary embodiment of the present application. As shown in FIG. 15, the method may include:
  • Step 1502: When an audio conversion command issued by the user for any audio message is received, the local communication device determines the first text content corresponding to that audio message and the second text content corresponding to the unresponsive audio messages other than that audio message.
  • Step 1504: The local communication device displays the first text content in association with that audio message, and the second text content in association with the unresponsive audio messages.
  • When the communication device receives the audio conversion command, in addition to the audio message targeted by the command it actively determines the unresponsive audio messages that the command does not target, and displays the first text content and the second text content corresponding to each, which simplifies user operations and improves reading and communication efficiency; details are not repeated here.
  • The communication device may pre-acquire the first text content and the second text content before receiving the audio conversion command; for this process, refer to step 302 in the embodiment described above; details are not repeated here. Alternatively, the communication device may acquire the first text content and the second text content in real time after receiving the audio conversion command.
  • The communication device can obtain the first text content and the second text content in either of the following ways:
  • The communication device can itself convert the audio message and the unresponsive audio messages into the first text content and the second text content; when the communication device adopts the pre-conversion processing mode, the process is similar to step 1306 in the embodiment described above and is not repeated here.
  • The communication device can initiate an audio conversion request to the server to obtain the first text content and the second text content returned by the server.
  • The first text content and the second text content may be obtained by the server in real time according to the audio conversion request, that is, the server performs the audio conversion operation after receiving the request; this process is similar to step 1208 in the embodiment described above and is not repeated here. Alternatively, the first text content and the second text content may be pre-converted by the server; this process is similar to step 406 in the embodiment shown in FIG. 4, and details are not repeated here.
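  • One way to picture how these options combine on the device: use pre-acquired text when it exists, otherwise fall back to local conversion or to a server request. The lookup order, the function names, and the `local_convert` / `request_server` callables are assumptions of this sketch, not requirements of the application.

```python
def resolve_text(msg_id, cache, local_convert=None, request_server=None):
    """Return (text, source) for one audio message."""
    if msg_id in cache:
        return cache[msg_id], "pre-acquired"
    if local_convert is not None:
        text, source = local_convert(msg_id), "local conversion"
    elif request_server is not None:
        text, source = request_server(msg_id), "server request"
    else:
        raise LookupError(f"no way to obtain text for {msg_id}")
    cache[msg_id] = text
    return text, source


def on_conversion_command(tapped_id, unresponsive_ids, cache, **sources):
    # Step 1502: first text content for the tapped message,
    # second text content for the other unresponsive audio messages.
    first = resolve_text(tapped_id, cache, **sources)
    second = {m: resolve_text(m, cache, **sources)
              for m in unresponsive_ids if m != tapped_id}
    return first, second


cache = {"a1": "hi"}
first, second = on_conversion_command(
    "a1", ["a1", "a2"], cache,
    request_server=lambda m: f"text of {m}",
)
print(first, second)
```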
  • FIG. 16 shows a schematic structural diagram of an electronic device according to an exemplary embodiment of the present application.
  • the electronic device includes a processor, an internal bus, a network interface, a memory, and a non-volatile memory, and may of course include hardware required for other services.
  • The processor reads the corresponding computer program from the non-volatile memory into memory and runs it, forming the processing device for the audio message at the logical level.
  • The present application does not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the execution body of the following processing flow is not limited to logical units and may also be hardware or a logic device.
  • The processing device for the audio message may include an identification unit, a pre-conversion unit, and a sending unit, wherein:
  • the identification unit causes the server to identify the type of communication message transmitted between the two communicating parties;
  • the pre-conversion unit, when the type of any communication message is the audio type, causes the server to acquire that communication message and pre-convert it into the corresponding text content;
  • the sending unit is specifically configured to:
  • cause the server to determine that the communicating party has the conversion requirement and send the text content.
  • the sending unit is specifically configured to:
  • it also includes:
  • a returning unit which, when an audio conversion request from any communicating party for that communication message is received and unresponsive audio-type communication messages associated with that communicating party exist, causes the server to also return to that communicating party the pre-converted text content corresponding to all of the unresponsive audio-type communication messages.
  • it also includes:
  • a determining unit which, after the pre-converted text content corresponding to that communication message has been returned to the communicating party, causes the server to determine that the communication message has switched to the responded state;
  • a notifying unit which causes the server to notify the sender of that communication message of the responded state.
  • the pre-conversion unit is specifically configured to:
  • cause the server to receive, in order, the audio segments that the communicating party cuts and uploads according to a preset rule, and to pre-convert each audio segment into a corresponding text segment;
  • cause the server to splice all the text segments in order to obtain the text content.
  • The processing device for the audio message may include a requesting unit and a display unit, wherein:
  • the requesting unit causes the local communication device, upon receiving an audio conversion command issued by the user for any audio-type communication message, to initiate a corresponding audio conversion request to the server;
  • it also includes:
  • when an audio conversion command issued by the user for that communication message is received and unresponsive audio-type communication messages other than that communication message exist, the audio conversion request is also related to those unresponsive communication messages.
  • it also includes:
  • an extension unit configured to expand the display area of the corresponding communication message after the local communication device receives the text content returned by the server;
  • the expanded display area is divided into a first area and a second area; the first area is used to show the corresponding communication message, and the second area is used to show the text content corresponding to that communication message.
  • The processing device for the audio message may include a pre-acquisition unit and a display unit, wherein:
  • the pre-acquisition unit causes the local communication device to pre-acquire the text content corresponding to any audio-type communication message;
  • the pre-acquisition unit is specifically configured to:
  • cause the local communication device to perform pre-conversion processing on that communication message to obtain the text content.
  • the pre-acquisition unit is specifically configured to:
  • cause the local communication device to determine the type of the communication messages transmitted between it and the peer communication device and, if the type of any communication message is determined to be the audio type, pre-acquire the text content corresponding to that communication message.
  • when an audio conversion command issued by the user for that communication message is received and other audio-type communication messages in the unresponsive state exist, the display unit further causes the local communication device to show the pre-acquired text content corresponding to each of those other communication messages.
  • it also includes:
  • a notification unit which, after the local communication device has shown the pre-acquired text content corresponding to the other communication messages, sends a response status switching notification for those communication messages to the server, so that the server notifies the corresponding sender that those communication messages are in the responded state.
  • The processing device for the audio message may include a determining unit and a processing unit, wherein:
  • the determining unit, while an audio-type communication message is being generated, causes the local communication device to determine, for each collected audio segment in turn, whether it meets the preset segmentation rule;
  • the processing unit, when any audio segment meets the preset segmentation rule, causes the local communication device to cut off that audio segment and upload it to the server in real time, so that the server pre-converts the audio segment into a corresponding text segment and splices the text segments corresponding to all the audio segments in order into the text content corresponding to the communication message.
  • The processing device for the audio message may include a determining unit and a returning unit, wherein:
  • the determining unit, when an audio conversion request from any communicating party for any audio message is received, causes the server to determine the unresponsive audio messages associated with that communicating party;
  • the returning unit causes the server to obtain the text content corresponding to that audio message and to each of the unresponsive audio messages, and to return it to that communicating party.
  • the return unit is specifically configured to:
  • The processing device for the audio message may include a determining unit and a display unit, wherein:
  • the determining unit, when an audio conversion command issued by the user for any audio message is received, causes the local communication device to determine the first text content corresponding to that audio message and the second text content corresponding to the unresponsive audio messages other than that audio message;
  • the display unit causes the local communication device to display the first text content in association with that audio message, and the second text content in association with the unresponsive audio messages.
  • it also includes:
  • a pre-acquisition unit which causes the local communication device to pre-acquire the first text content and the second text content before the audio conversion command is received;
  • a real-time obtaining unit which causes the local communication device to acquire the first text content and the second text content in real time after the audio conversion command is received.
  • it also includes:
  • an active conversion unit which causes the local communication device to convert that audio message and the unresponsive audio messages into the first text content and the second text content on its own initiative;
  • a requesting unit which causes the local communication device to initiate an audio conversion request to the server to obtain the first text content and the second text content returned by the server; wherein the first text content and the second text content are obtained by the server in real time according to the audio conversion request, or are pre-converted by the server.
  • a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • the memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer readable medium, such as read only memory (ROM) or flash memory.
  • Memory is an example of a computer readable medium.
  • Computer readable media includes both permanent and non-persistent, removable and non-removable media.
  • Information storage can be implemented by any method or technology.
  • the information can be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disk read only memory (CD-ROM), digital versatile disk (DVD) or other optical storage, magnetic tape cartridges, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • As defined here, computer readable media does not include transitory computer readable media, such as modulated data signals and carrier waves.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Telephonic Communication Services (AREA)
  • Information Transfer Between Computers (AREA)
  • Debugging And Monitoring (AREA)
  • Communication Control (AREA)

Abstract

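A method and apparatus for processing an audio message, the method comprising: a server identifies the type of communication messages transmitted between two communicating parties (102); when the type of any communication message is the audio type, the server acquires that communication message and pre-converts it into corresponding text content (104); when it is determined that any communicating party has a conversion requirement for that communication message, the server sends the text content to that communicating party (106). The method can convert audio messages into text in advance, thereby improving the speed of response to a user's audio conversion requirement.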

Description

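Method and Apparatus for Processing Audio Messages
Technical Field
The present application relates to the field of communication technologies, and in particular to a method and apparatus for processing audio messages.
Background
With electronic devices on which a communication application is installed, users can send and receive communication messages, which makes communication between users more convenient and faster.
Generally, a communication application collects text manually entered by the user and sends and receives it as communication messages. However, manual input is subject to many limitations. For example, the user has to keep both eyes on the screen of the electronic device, so manual input may pose a great safety risk when the user is driving. As another example, when the electronic device is too large to be held in one hand, the user needs to hold it with both hands to complete the input operation, so if the user is carrying a heavy object in one hand, it is difficult to complete manual input with the other hand.
In the related art, some communication applications add an audio input function so that users can send and receive audio-type communication messages more conveniently, which removes the above limitations.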
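Summary
In view of this, the present application provides a method and apparatus for processing audio messages that can convert audio messages into text in advance, thereby improving the speed of response to a user's audio conversion requirement.
To achieve the above objective, the present application provides the following technical solutions:
According to a first aspect of the present application, a method for processing an audio message is provided, comprising:
a server identifying the type of communication messages transmitted between two communicating parties;
when the type of any communication message is the audio type, the server acquiring that communication message and pre-converting it into corresponding text content;
when it is determined that any communicating party has a conversion requirement for that communication message, the server sending the text content to that communicating party.
According to a second aspect of the present application, a method for processing an audio message is provided, comprising:
a local communication device, upon receiving an audio conversion command issued by a user for any audio-type communication message, initiating a corresponding audio conversion request to a server;
the local communication device receiving the text content corresponding to that communication message returned by the server, and displaying it in association with that communication message; wherein the text content is obtained by the server through active pre-conversion before the audio conversion request is received.
According to a third aspect of the present application, a method for processing an audio message is provided, comprising:
a local communication device pre-acquiring the text content corresponding to any audio-type communication message;
when an audio conversion command issued by a user for that communication message is received, the local communication device showing the pre-acquired text content.
According to a fourth aspect of the present application, a method for processing an audio message is provided, comprising:
while an audio-type communication message is being generated, a local communication device determining, for each collected audio segment in turn, whether it meets a preset segmentation rule;
when any audio segment meets the preset segmentation rule, the local communication device cutting off that audio segment and uploading it to a server in real time, so that the server pre-converts that audio segment into a corresponding text segment, and the text segments corresponding to all the audio segments are spliced in order by the server into the text content corresponding to the communication message.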
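According to a fifth aspect of the present application, an apparatus for processing an audio message is provided, comprising:
an identification unit, which causes a server to identify the type of communication messages transmitted between two communicating parties;
a pre-conversion unit, which, when the type of any communication message is the audio type, causes the server to acquire that communication message and pre-convert it into corresponding text content;
a sending unit, which, when it is determined that any communicating party has a conversion requirement for that communication message, causes the server to send the text content to that communicating party.
According to a sixth aspect of the present application, an apparatus for processing an audio message is provided, comprising:
a requesting unit, which causes a local communication device, upon receiving an audio conversion command issued by a user for any audio-type communication message, to initiate a corresponding audio conversion request to a server;
a display unit, which causes the local communication device to receive the text content corresponding to that communication message returned by the server and to display it in association with that communication message; wherein the text content is obtained by the server through active pre-conversion before the audio conversion request is received.
According to a seventh aspect of the present application, an apparatus for processing an audio message is provided, comprising:
a pre-acquisition unit, which causes a local communication device to pre-acquire the text content corresponding to any audio-type communication message;
a display unit, which, when an audio conversion command issued by a user for that communication message is received, causes the local communication device to show the pre-acquired text content.
According to an eighth aspect of the present application, an apparatus for processing an audio message is provided, comprising:
a determining unit, which, while an audio-type communication message is being generated, causes a local communication device to determine, for each collected audio segment in turn, whether it meets a preset segmentation rule;
a processing unit, which, when any audio segment meets the preset segmentation rule, causes the local communication device to cut off that audio segment and upload it to a server in real time, so that the server pre-converts that audio segment into a corresponding text segment, and the text segments corresponding to all the audio segments are spliced in order by the server into the text content corresponding to the communication message.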
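According to a ninth aspect of the present application, an apparatus for processing an audio message is provided, comprising:
when an audio conversion request from any communicating party for any audio message is received, a server determining the unresponsive audio messages associated with that communicating party;
the server obtaining the text content corresponding to that audio message and to each of the unresponsive audio messages, and returning it to that communicating party.
According to a tenth aspect of the present application, an apparatus for processing an audio message is provided, comprising:
when an audio conversion command issued by a user for any audio message is received, a local communication device determining the first text content corresponding to that audio message and the second text content corresponding to the unresponsive audio messages other than that audio message;
the local communication device displaying the first text content in association with that audio message, and the second text content in association with the unresponsive audio messages.
According to an eleventh aspect of the present application, an apparatus for processing an audio message is provided, comprising:
a determining unit, which, when an audio conversion request from any communicating party for any audio message is received, causes a server to determine the unresponsive audio messages associated with that communicating party;
a returning unit, which causes the server to obtain the text content corresponding to that audio message and to each of the unresponsive audio messages, and to return it to that communicating party.
According to a twelfth aspect of the present application, an apparatus for processing an audio message is provided, comprising:
a determining unit, which, when an audio conversion command issued by a user for any audio message is received, causes a local communication device to determine the first text content corresponding to that audio message and the second text content corresponding to the unresponsive audio messages other than that audio message;
a display unit, which causes the local communication device to display the first text content in association with that audio message, and the second text content in association with the unresponsive audio messages.
As the above technical solutions show, the present application converts audio messages into text in advance, so that when a user has an audio conversion requirement the corresponding text content can be returned immediately, without waiting during the conversion. This helps speed up the response to the user's requirement and thereby improves the user's experience of the application.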
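Brief Description of the Drawings
FIG. 1 is a flowchart of a server-side method for processing an audio message according to an exemplary embodiment of the present application.
FIG. 2 is a flowchart of a communication-device-side method for processing an audio message according to a first exemplary embodiment of the present application.
FIG. 3 is a flowchart of a communication-device-side method for processing an audio message according to a second exemplary embodiment of the present application.
FIG. 4 is a flowchart of a method for processing an audio message according to an exemplary embodiment of the present application.
FIGS. 5-8 are schematic diagrams of the interface of a communication application on the receiver side according to an exemplary embodiment of the present application.
FIG. 9 is a schematic diagram of the interface of a communication application on the sender side according to an exemplary embodiment of the present application.
FIG. 10 is a flowchart of a communication-device-side method for processing an audio message according to a third exemplary embodiment of the present application.
FIG. 11 is a flowchart of another method for processing an audio message according to an exemplary embodiment of the present application.
FIG. 12 is a flowchart of still another method for processing an audio message according to an exemplary embodiment of the present application.
FIG. 13 is a flowchart of still another method for processing an audio message according to an exemplary embodiment of the present application.
FIG. 14 is a flowchart of another server-side method for processing an audio message according to an exemplary embodiment of the present application.
FIG. 15 is a flowchart of a communication-device-side method for processing an audio message according to a fourth exemplary embodiment of the present application.
FIG. 16 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present application.
FIG. 17 is a block diagram of a server-side apparatus for processing an audio message according to an exemplary embodiment of the present application.
FIG. 18 is a block diagram of a communication-device-side apparatus for processing an audio message according to a first exemplary embodiment of the present application.
FIG. 19 is a block diagram of a communication-device-side apparatus for processing an audio message according to a second exemplary embodiment of the present application.
FIG. 20 is a block diagram of a communication-device-side apparatus for processing an audio message according to a third exemplary embodiment of the present application.
FIG. 21 is a block diagram of a communication-device-side apparatus for processing an audio message according to a fourth exemplary embodiment of the present application.
FIG. 22 is a block diagram of a communication-device-side apparatus for processing an audio message according to a fifth exemplary embodiment of the present application.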
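Detailed Description
Communication messages of the audio type are subject to certain scenario limitations. For example, when a user receives an audio-type communication message during a meeting, unless the user is wearing a Bluetooth headset or another wearable device, the user may be unable to listen to the message in time, and the related matter may be delayed.
To address this problem, the related art proposes converting audio messages into text. Specifically, when a user receives an audio-type communication message and it is inconvenient to listen to it, the user can send the server an audio conversion request for that message; the server then recognizes the audio data and returns the converted text content to the user for reading.
However, the server needs a certain amount of time to convert the audio of a communication message, so after sending the audio conversion request the user has to wait a long time before the converted text appears. On the one hand, the long wait makes the user anxious; on the other hand, the user does not reply for a long time, so the sender of the message receives no feedback for a long time. This harms the user experience and greatly reduces the efficiency of communication between users.
The present application therefore improves the processing of audio messages to solve the above technical problem in the related art. The following embodiments are provided to further describe the present application: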
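FIG. 1 is a flowchart of a server-side method for processing audio messages according to an exemplary embodiment of the present application. As shown in FIG. 1, the method may include:
Step 102: the server identifies the type of communication messages transmitted between the two communication parties.
Step 104: when the type of any communication message is the audio type, the server obtains that communication message and pre-converts it into corresponding text content.
Step 106: when it is determined that any communication party has a conversion need for that communication message, the server sends the text content to that communication party.
In this embodiment, the server may proactively determine a communication party's need to convert an audio message. For example, when a communication party holds a preset communication role in the communication, the server may determine that this party has a conversion need and send it the corresponding text content. For instance, the server may be predefined to assume that the receiver has a conversion need, so that whenever an audio message exists, the server always converts it into text content in advance and proactively sends that text to the receiver's communication device.
In that embodiment, because the server pre-converts the message and proactively sends the text content to the communication device, when the communication party actually needs the audio conversion, the device can directly retrieve and display the text content already stored locally, without downloading it from the server at that moment. Even if the network is poor at that time, display of the audio message's text content is unaffected; that is, the dependence on real-time network conditions is reduced.
Alternatively, in this embodiment, the server may determine whether a communication party has a conversion need according to that party's requests. For example, when an audio conversion request for any communication message is received from a communication party, the server may determine that this party has a conversion need and return to it the pre-converted text content corresponding to that communication message.
In that embodiment, the server returns the text content only when the communication party actually needs it. Accurately judging the party's real need reduces the number of interactions between the server and the communication device and the volume of data exchanged between them. This helps lower the power consumption of the communication device and, for devices on a wireless mobile network, reduces mobile data usage and avoids unnecessary charges for the user.
As can be seen from the above embodiments, in the technical solution of the present application the server can proactively convert an audio message into text content before the user asks for a conversion. When the server then receives the user's audio conversion request, it can return the text content immediately, without the user waiting for the conversion. This greatly shortens the receiver's waiting time and also the time the sender waits for feedback, which improves the experience of both parties and greatly improves the efficiency of communication between them.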
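Steps 102 to 106 can be sketched as a small in-memory service. This is only an illustrative sketch: the names `Message`, `PreconvertingServer`, and `fake_transcribe` are hypothetical, the `"audio"`/`"text"` labels are assumed type tags, and the recognizer is a placeholder rather than a real speech-to-text engine.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Message:
    msg_id: str
    kind: str       # "audio" or "text"; assumed type labels
    payload: bytes


class PreconvertingServer:
    """Hypothetical server that converts audio messages before anyone asks."""

    def __init__(self, transcribe: Callable[[bytes], str]) -> None:
        self._transcribe = transcribe           # stand-in for a real recognizer
        self._text_by_id: Dict[str, str] = {}

    def on_message(self, msg: Message) -> None:
        # Steps 102 and 104: identify the type and pre-convert audio right away.
        if msg.kind == "audio":
            self._text_by_id[msg.msg_id] = self._transcribe(msg.payload)

    def on_conversion_request(self, msg_id: str) -> Optional[str]:
        # Step 106: answer from the stored text; no recognition runs here.
        return self._text_by_id.get(msg_id)


def fake_transcribe(audio: bytes) -> str:
    # Placeholder only: pretends the audio bytes already are the transcript.
    return audio.decode("utf-8")


server = PreconvertingServer(fake_transcribe)
server.on_message(Message("m1", "audio", b"hello"))
server.on_message(Message("m2", "text", b"ignored"))
```

The point of the design is that `on_conversion_request` is a lookup, so its latency does not depend on how long recognition takes.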
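Corresponding to the embodiment shown in FIG. 1, several embodiments exist on the side of the communication device used by the user. Examples are given below:
FIG. 2 is a flowchart of a first communication-device-side method for processing audio messages according to an exemplary embodiment of the present application. As shown in FIG. 2, the method may include:
Step 202: upon receiving an audio conversion command issued by a user for any communication message of the audio type, the local communication device initiates a corresponding audio conversion request to the server.
Step 204: the local communication device receives the text content corresponding to that communication message returned by the server and displays it in association with that communication message, wherein the text content is proactively pre-converted by the server before the audio conversion request is received.
In this embodiment, the local communication device, based on the audio conversion command issued by the user, initiates an audio conversion request to the server to indicate its actual need for the conversion, and the server returns the required text content in response.
As can be seen from the above embodiment, because the server proactively pre-converts audio messages, when the local communication device initiates an audio conversion request based on the user's audio conversion command, it can obtain the corresponding text content from the server immediately, without waiting for the server to convert the audio message in real time. This improves the experience of both communication parties and greatly improves the efficiency of communication between them.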
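FIG. 3 is a flowchart of a second communication-device-side method for processing audio messages according to an exemplary embodiment of the present application. As shown in FIG. 3, the method may include:
Step 302: the local communication device pre-acquires the text content corresponding to any communication message of the audio type.
In this embodiment, the local communication device may pre-acquire the text content from the server, the text content having been pre-converted by the server. The text content may be pushed proactively by the server to the local communication device; alternatively, when the local communication device determines the type of the communication messages transmitted between it and the peer communication device and finds that a communication message is of the audio type, it may initiate an audio conversion request to the server to obtain the text content produced by the server's pre-conversion. Having the server perform the pre-conversion both makes full use of the server's processing power, which makes the pre-conversion more efficient, and lowers the processing performance required of the local communication device and the processing resources it consumes, thereby reducing its power consumption.
Alternatively, in this embodiment, the local communication device may itself pre-convert the communication message to obtain the text content; in other words, the local communication device performs local pre-conversion of the audio message. For example, when the local communication device determines the type of the communication messages transmitted between it and the peer communication device and finds that a communication message is of the audio type, it may perform the local pre-conversion to obtain the corresponding text content. Local pre-conversion removes or reduces the dependence on the network and therefore suits more application scenarios.
Step 304: when an audio conversion command issued by the user for that communication message is received, the local communication device shows the pre-acquired text content.
As can be seen from the above embodiment, because the local communication device pre-acquires the text content, when the user issues an audio conversion command the device can directly retrieve and display the corresponding text content, and the user does not have to wait for a conversion, which improves communication efficiency. In addition, because the text content is pre-acquired and held locally on the device, no network is required when the user issues the audio conversion command. Even if the local communication device is not connected to a network, the user can still view the text content of the audio message, which suits the review of historical communication messages in certain special scenarios.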
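The pre-acquisition in steps 302 and 304 amounts to filling a local store before the user asks and reading from it afterwards. The sketch below is a minimal illustration under assumed names (`DeviceTextCache`, `prefetch`, `on_convert_command` are hypothetical); whether the text arrives by server push or by local conversion is left to the caller.

```python
from typing import Dict, Optional


class DeviceTextCache:
    """Hypothetical on-device store of pre-acquired text content."""

    def __init__(self) -> None:
        self._texts: Dict[str, str] = {}

    def prefetch(self, msg_id: str, text: str) -> None:
        # Step 302: store text obtained ahead of time (pushed or locally converted).
        self._texts[msg_id] = text

    def on_convert_command(self, msg_id: str) -> Optional[str]:
        # Step 304: purely local lookup, so it also works without a network.
        return self._texts.get(msg_id)


cache = DeviceTextCache()
cache.prefetch("a1", "See you at three")
shown = cache.on_convert_command("a1")
not_prefetched = cache.on_convert_command("a2")
```

A `None` result marks a message whose text was never pre-acquired; a real client would then fall back to a real-time request.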
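The technical solution of the present application is described in detail below with reference to the interaction among the sender, the receiver, and the server during communication. FIG. 4 is a flowchart of a method for processing audio messages according to an exemplary embodiment of the present application. As shown in FIG. 4, the method may include the following steps:
Step 402: the server obtains the communication messages transmitted between the two communication parties.
In this embodiment, the two communication parties are fully symmetric in the technical solution of the present application; each party can act as the sender or the receiver shown in FIG. 4. The embodiment shown in FIG. 4 therefore identifies the sender and the receiver for one particular exchange between the two parties and uses them to illustrate the technical solution.
Step 404: the server identifies the type of the communication message.
In this embodiment, communication messages can be of many types. In the present application, any communication message containing audio data may be treated as the audio type, that is, as an audio message, for example a voice message or a video message. The interface diagrams of the communication application referred to below use voice messages as the example of audio messages, but the present application is not limited to this.
Step 406: the server pre-converts the communication message of the audio type (that is, the audio message) to obtain the corresponding text content.
In this embodiment, the server may use any approach from the related art to pre-convert the audio message and obtain the corresponding text content.
It should be noted that once the server detects that a communication message is of the audio type, it may perform the pre-conversion at any suitable moment, provided that the pre-conversion is completed before step 408. In other words, the server's pre-conversion of an audio message is independent of any audio conversion command the user issues for that message; the server completes it proactively and in advance.
Therefore, when the user sends the server an audio conversion command for the audio message, the server can immediately provide the already pre-converted text content to the user without performing the conversion in real time. This avoids long waits for both communication parties and improves communication efficiency.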
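Step 408: the server receives an audio conversion request for the audio message from the receiver.
In this embodiment, each communication party related to the audio message, such as the sender and the receiver shown in FIG. 4, can issue an audio conversion command, and the corresponding electronic device then initiates an audio conversion request to the server (which can also be understood as the sender or receiver initiating the request to the server). Here the receiver initiating the request is taken as the example.
Assume that user "Xiaobai" (小白) and user "Xiaohei" (小黑) are communicating. The present application does not limit the type of communication application they use; it may be an instant messaging application, for example an enterprise instant messaging (EIM) application such as 钉钉 (DING Talk). As shown in FIG. 5, assume that Xiaobai has sent Xiaohei several audio messages. Xiaohei can long-press the audio message to be viewed (or use another trigger such as a hard press) to bring up the function menu shown in FIG. 6, which contains options such as "Play via earpiece", "Favorite", "Convert to text", and "Delete". When Xiaohei selects "Convert to text", this is treated as issuing an audio conversion command for that audio message to the electronic device, and the electronic device initiates the corresponding audio conversion request to the server.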
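Step 410: the server determines the response state of the other audio messages.
Step 412: the server sends the text content corresponding to the audio message to the receiver.
Step 414: the receiver displays the received text content.
In one exemplary embodiment, step 410 is not included. The server directly determines the text content corresponding to the 12 s audio message that user "Xiaohei" (小黑) selected in FIG. 5 and returns it to Xiaohei for display.
After receiving the text content returned by the server, Xiaohei's electronic device may expand the display region of the corresponding audio message. The expanded display region is divided into a first region, which shows the audio message, and a second region, which shows the text content corresponding to that audio message. For example, as shown in FIG. 7, assume there are three audio messages in total and Xiaohei triggers the topmost one. The display region of that message (which may take the "bubble" form shown in FIG. 7, although the present application is not limited to this) expands downward and is divided into an upper region, corresponding to the first region, and a lower region, corresponding to the second region. The upper region shows the schematic icon of the audio message, and the lower region shows its text content, for example "I can't type right now, so I'll just send voice messages". Those skilled in the art may of course divide the expanded region by function in other ways, and the present application is not limited in this respect.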
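In another exemplary embodiment, step 410 is included. Accordingly, the server may determine the response state of the communication messages transmitted between the two parties. For the audio message above, when an audio conversion request for it is received from a communication party, if another message related to that party is in the unresponded state and is of the audio type, then in step 412 the server may return the text content of that other message in addition to the text content of the requested audio message. The text content of the other message is likewise obtained by the server proactively and in advance through pre-conversion, so the communication party does not have to wait for a real-time conversion.
Thus, as shown in FIG. 5, when user "Xiaohei" (小黑) initiates an audio conversion request only for the first audio message while a second and a third audio message also exist and are both unresponded, Xiaohei does not need to request each conversion manually; the server proactively delivers the text content of all three audio messages. Correspondingly, as shown in FIG. 8, Xiaohei's electronic device expands the display region of each of the three audio messages and shows the corresponding text content: "I can't type right now, so I'll just send voice messages", "About the quote in the last contract", and "Raise it by another three points". On the one hand, this simplifies Xiaohei's trigger operation (issuing the audio conversion command or initiating the audio conversion request), since a single trigger reveals all unresponded audio messages. On the other hand, it lets Xiaohei view several unresponded audio messages together, which reads more coherently than viewing the text of each message separately, makes it easier to understand what user "Xiaobai" (小白) intends to communicate, and improves communication efficiency.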
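The batch behavior of steps 410 and 412 can be sketched as a function that, given one requested message, also collects every other unresponded audio message. The function name `texts_to_return` and the message ids are hypothetical; the three sample strings are the example texts from FIG. 8.

```python
from typing import Dict, List


def texts_to_return(requested_id: str,
                    preconverted: Dict[str, str],
                    unresponded_ids: List[str]) -> Dict[str, str]:
    """Return text for the requested message plus every other unresponded one."""
    ordered_ids = [requested_id] + [m for m in unresponded_ids if m != requested_id]
    # Only messages that already have pre-converted text are included.
    return {m: preconverted[m] for m in ordered_ids if m in preconverted}


preconverted = {
    "a1": "I can't type right now, so I'll just send voice messages",
    "a2": "About the quote in the last contract",
    "a3": "Raise it by another three points",
}
batch = texts_to_return("a1", preconverted, unresponded_ids=["a1", "a2", "a3"])
```

Putting the requested message first keeps the reply usable even by a client that only reads the first entry.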
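In yet another exemplary embodiment, instead of the server determining the response state of each communication message through step 410, the receiver may determine and handle the response state of each message. For example, the receiver's electronic device may determine the receiver's response state for each received communication message of the audio type. When an audio conversion command for any audio message is received from the receiver, if unresponded communication messages of the audio type exist other than that audio message, the audio conversion request that the electronic device sends to the server relates not only to that audio message (that is, it can be used to obtain its text content) but also to the other unresponded messages (that is, it can be used to obtain their text content). For example, after user "Xiaohei" (小黑) triggers the first audio message in FIG. 5 on the electronic device, the device detects that a second and a third audio message also exist and that both are unresponded. The device then sends the server an audio conversion request covering all three audio messages, obtains the text content of all three from the server at once, and displays it as shown in FIG. 8. See the embodiment above for details, which are not repeated here.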
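Step 416: the server notifies the sender that the audio message is now in the responded state.
In this embodiment, as shown in FIG. 5, a black dot may be shown next to a communication message to indicate that it is unresponded. After user "Xiaohei" (小黑) triggers the first audio message and the corresponding audio conversion request is sent, Xiaohei's electronic device treats that audio message as responded and removes the black dot next to it, as shown in FIG. 7.
Meanwhile, as shown in FIG. 9, after user "Xiaobai" (小白) sends each communication message, Xiaobai's electronic device marks the response state next to each message; for example, "Read" corresponds to the responded state and "Unread" to the unresponded state. After the server receives Xiaohei's audio conversion request for the first audio message and returns the pre-converted text content to Xiaohei, it may determine that the first audio message has switched from the unresponded state to the responded state and notify Xiaobai, as the sender, of the responded state. In FIG. 9, "Read" is therefore shown next to the first audio message, while the second and third audio messages are still marked "Unread". Corresponding to the embodiment shown in FIG. 8, when Xiaohei initiates an audio conversion request only for the first audio message but the server returns the text content of all three audio messages, the server may treat all three audio messages as responded and notify Xiaobai's electronic device, which then marks all three as "Read".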
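The state switch in step 416 can be sketched as a tracker that flips messages from unread to read once their text has been returned and reports which changes the sender should be told about. `ResponseTracker` and the `"unread"`/`"read"` labels are hypothetical names chosen to mirror the "Unread"/"Read" marks of FIG. 9.

```python
from typing import Dict, Iterable, List


class ResponseTracker:
    """Hypothetical per-conversation record of which audio messages were responded to."""

    def __init__(self, message_ids: Iterable[str]) -> None:
        self.state: Dict[str, str] = {m: "unread" for m in message_ids}

    def mark_responded(self, returned_ids: Iterable[str]) -> List[str]:
        """Flip returned messages to "read"; return the ids the sender must be told about."""
        changed = [m for m in returned_ids if self.state.get(m) == "unread"]
        for m in changed:
            self.state[m] = "read"
        return changed


tracker = ResponseTracker(["a1", "a2", "a3"])
notify_first = tracker.mark_responded(["a1"])               # single-message case (FIG. 7, FIG. 9)
notify_rest = tracker.mark_responded(["a1", "a2", "a3"])    # batch case (FIG. 8)
```

Returning only the ids that actually changed avoids sending the sender a duplicate notification for a message already marked as read.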
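FIG. 10 is a flowchart of a third communication-device-side method for processing audio messages according to an exemplary embodiment of the present application. As shown in FIG. 10, the method may include:
Step 1002: in the process of generating a communication message of the audio type, the local communication device determines, one after another, whether each captured audio segment satisfies a preset segmentation rule.
Step 1004: when any audio segment satisfies the preset segmentation rule, the local communication device cuts that audio segment off in real time and uploads it to the server.
In this embodiment, the server receives, in order, the audio segments that the local communication device cuts and uploads in real time according to the preset rule, and pre-converts each audio segment into a corresponding text segment. The server then splices all text segments in order to obtain the text content corresponding to the whole audio message.
In this embodiment, the segmentation rule can take many forms, for example one dimension or a combination of several dimensions such as duration or the data size of the audio segment. For example, with a duration-based rule, assume the whole audio message lasts 12 s and the predefined segment length is 2 s. Each time 2 s is reached, a real-time cut is made and the 2 s audio segment is uploaded to the server, which can pre-convert it at once into the corresponding text segment. The whole message thus yields 6 audio segments and 6 corresponding text segments, which the server splices into the text content corresponding to the whole audio message.
In this embodiment, the sender's electronic device (the local communication device above) cuts and uploads the audio message in real time, so that while the sender is still recording the message, the server receives the audio segments with almost no delay and immediately pre-converts each one. Compared with uploading the whole audio message after recording is complete, the server finishes the pre-conversion and obtains the text content sooner. Even if the receiver initiates an audio conversion request immediately after receiving the audio message, the server can have the pre-conversion finished before the request arrives and return the text content as soon as it does. The two parties can therefore use audio input to avoid the inefficiency and typing errors of manual input, without the delay of converting audio to text. This combines the convenience of audio input with the immediacy of text and improves the efficiency of communication between the two parties.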
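The duration-based rule in the 12 s / 2 s example can be sketched as a buffer that emits a segment each time enough samples have accumulated. Everything here is an assumption for illustration: the class name `Segmenter`, the toy sample rate of 10 Hz, and the string join used for splicing.

```python
from typing import List


class Segmenter:
    """Hypothetical duration-based cutter: emits a segment once enough samples arrive."""

    def __init__(self, rate_hz: int, seconds_per_segment: float) -> None:
        self._size = int(rate_hz * seconds_per_segment)
        self._buffer: List[int] = []

    def feed(self, samples: List[int]) -> List[List[int]]:
        """Add captured samples; return every segment that is now complete."""
        self._buffer.extend(samples)
        ready = []
        while len(self._buffer) >= self._size:
            ready.append(self._buffer[:self._size])
            self._buffer = self._buffer[self._size:]
        return ready

    def flush(self) -> List[List[int]]:
        """Return the trailing partial segment, if any, when recording stops."""
        rest, self._buffer = self._buffer, []
        return [rest] if rest else []


RATE_HZ = 10                        # toy sample rate, not a real audio rate
segmenter = Segmenter(RATE_HZ, seconds_per_segment=2)
uploaded: List[List[int]] = []
for second in range(12):            # a 12 s recording, captured one second at a time
    uploaded.extend(segmenter.feed([second] * RATE_HZ))
tail = segmenter.flush()

# Server side: splice the per-segment texts in upload order.
text = "".join(f"[seg{i}]" for i in range(len(uploaded)))
```

With these numbers the recording yields six uploads and nothing is left for `flush`, matching the six segments in the example above.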
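FIG. 11 is a flowchart of another method for processing audio messages according to an exemplary embodiment of the present application. As shown in FIG. 11, the method may include the following steps:
Step 1102: the server obtains the communication messages transmitted between the two communication parties.
Step 1104: the server identifies the type of the communication message.
Step 1106: the server pre-converts the communication message of the audio type (that is, the audio message) to obtain the corresponding text content.
In this embodiment, for steps 1102-1106 see steps 402-406 of the embodiment shown in FIG. 4; details are not repeated here.
Step 1108: the server sends the text content corresponding to the audio message to the receiver.
In this embodiment, the server assumes by default that the receiver needs a conversion for every audio message. It therefore not only pre-converts all audio messages into text content but also proactively pushes the text content to the receiver.
Step 1110: the receiver's communication device receives the receiver's audio conversion command for the audio message.
Step 1112: the receiver's communication device determines the response state of the other audio messages.
Step 1114: the receiver's communication device displays the text content.
In this embodiment, before the receiver issues the audio conversion command, the server has already pre-converted the message into text content and pushed it to the receiver's communication device. In other words, the receiver's communication device has "pre-acquired" the text content of the audio message before receiving the audio conversion command. When the receiver then issues the command, the device can obtain and display the text content immediately, and the receiver does not have to wait.
In addition, compared with the embodiment shown in FIG. 4, this embodiment pre-acquires the text content onto the communication device itself, so that after receiving the audio conversion command the device simply retrieves the text content locally and no network is required. In some scenarios, for example when the user wants to convert audio messages among historical communication messages to text while offline, the embodiment shown in FIG. 11 can still meet the user's need because it requires no network support.
In this embodiment, similarly to step 410 shown in FIG. 4, if other audio messages in the unresponded state exist besides the audio message for which the receiver directly issued the audio conversion command, the communication device may display the text content of those audio messages as well; details are not repeated here.
Step 1116: the receiver's communication device marks the audio messages whose text content has been displayed as responded and notifies the server of the responded state, and the server notifies the sender.
In this embodiment, the communication device may add the responded state of the audio message to a response-state switch notification and send that notification to the server, which forwards it to the sender, so that the audio message is marked correctly on the sender's communication device.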
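FIG. 12 is a flowchart of yet another method for processing audio messages according to an exemplary embodiment of the present application. As shown in FIG. 12, the method may include the following steps:
Step 1202: the two communication parties send and receive communication messages.
Step 1204: the receiver's communication device identifies the type of the communication message.
Step 1206: when an audio message is identified, the receiver's communication device initiates an audio conversion request to the server.
Step 1208: the server pre-converts the communication message of the audio type (that is, the audio message) to obtain the corresponding text content.
Step 1210: the server sends the text content corresponding to the audio message to the receiver.
In this embodiment, the audio conversion request is initiated proactively by the communication device rather than in response to an audio conversion command from the receiver. In other words, before the receiver actually issues an audio conversion command, the communication device proactively sends the server an audio conversion request so that the server performs the pre-conversion and produces the text content; the device thereby "pre-acquires" the text content of the audio message. When the receiver then issues the audio conversion command, the device can obtain and display the text content immediately, and the receiver does not have to wait.
In addition, compared with the embodiment shown in FIG. 11, the communication device in this embodiment identifies the type of the communication message and proactively sends the audio conversion request that triggers the server's pre-conversion, rather than the server starting the pre-conversion on its own. The device thus takes over the "type identification" function and reduces the server's processing load.
Step 1212: the receiver's communication device receives the receiver's audio conversion command for the audio message.
Step 1214: the receiver's communication device determines the response state of the other audio messages.
Step 1216: the receiver's communication device displays the text content.
Step 1218: the receiver's communication device marks the audio messages whose text content has been displayed as responded and notifies the server of the responded state, and the server notifies the sender.
In this embodiment, for steps 1212-1218 see steps 1110-1116 of the embodiment shown in FIG. 11; details are not repeated here.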
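FIG. 13 is a flowchart of yet another method for processing audio messages according to an exemplary embodiment of the present application. As shown in FIG. 13, the method may include the following steps:
Step 1302: the two communication parties send and receive communication messages.
Step 1304: the receiver's communication device identifies the type of the communication message.
Step 1306: when an audio message is identified, the receiver's communication device pre-converts the communication message of the audio type (that is, the audio message) to obtain the corresponding text content.
In this embodiment, the receiver's communication device proactively identifies the type of the communication message and, when it is an audio message, also proactively completes the pre-conversion to obtain the text content. Even when the network is poor or unavailable, the receiver's communication device can therefore still "pre-acquire" the text content of the audio message and show it promptly when the receiver issues the audio conversion command, so the receiver does not have to wait.
When the network is unstable and the communication device relies on the server for pre-conversion after receiving an audio message, the device may fail to send the audio conversion request to the server, or the server may fail to deliver the pre-converted text content to the device. The device may then be unable to pre-acquire the text content before the receiver issues the audio conversion command, so the receiver has to request the conversion from the server in real time, which increases the waiting time.
In fact, when the network is unstable, the pre-conversion (or pre-acquisition) scheme of any embodiment of the present application improves the user experience, whether the pre-conversion of the audio message runs on the server or on the communication device. For example, when the pre-conversion runs on the server, the text content is obtained in advance, so the server and the communication device have more time and more opportunities to transfer it before the user issues the audio conversion command. This avoids the situation in which, when the user requests the conversion in real time, the text content cannot be transferred or the transfer fails repeatedly because of the network.
Step 1308: the receiver's communication device receives the receiver's audio conversion command for the audio message.
Step 1310: the receiver's communication device determines the response state of the other audio messages.
Step 1312: the receiver's communication device displays the text content.
Step 1314: the receiver's communication device marks the audio messages whose text content has been displayed as responded and notifies the server of the responded state, and the server notifies the sender.
In this embodiment, for steps 1308-1314 see steps 1110-1116 of the embodiment shown in FIG. 11; details are not repeated here.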
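FIG. 14 is a flowchart of another server-side method for processing audio messages according to an exemplary embodiment of the present application. As shown in FIG. 14, the method is applied to a server and may include the following steps:
Step 1402: when an audio conversion request for any audio message is received from any communication party, the server determines the unresponded audio messages related to that communication party.
Step 1404: the server obtains the text content corresponding to that audio message and to each unresponded audio message, and returns it to that communication party.
In this embodiment, when the server receives an audio conversion request for an audio message, it proactively identifies the other associated unresponded audio messages, so the user obtains the text content of all unresponded audio messages without requesting a conversion for each one, which greatly simplifies the user's operation. This is especially useful when it is inconvenient for the user to trigger audio conversion commands, for example when one hand is carrying something heavy and only the other is free: the user issues an audio conversion command for a single audio message and can read the text content of all audio messages. Moreover, when several audio messages are closely related in content, presenting their text together helps the user connect their content and logic and improves reading and communication efficiency.
In one case of this embodiment, the server pre-converts all audio messages into text content; when an audio conversion request is received, the server only needs to look up the pre-converted text content of the requested audio message and of the unresponded audio messages. For this scenario see step 410 and related steps of the embodiment shown in FIG. 4; details are not repeated here.
In another case of this embodiment, after receiving the audio conversion request the server converts the requested audio message and each unresponded audio message into text content in real time and returns it to the user for display. For the conversion of each individual audio message, see the processing in the related art; details are not repeated here.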
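FIG. 15 is a flowchart of a fourth communication-device-side method for processing audio messages according to an exemplary embodiment of the present application. As shown in FIG. 15, the method is applied to a communication device and may include the following steps:
Step 1502: when an audio conversion command issued by a user for any audio message is received, the local communication device determines first text content corresponding to that audio message, and second text content corresponding to unresponded audio messages other than that audio message.
Step 1504: the local communication device displays the first text content in association with that audio message, and the second text content in association with the unresponded audio messages.
In this embodiment, similarly to the embodiment shown in FIG. 14, when the communication device receives an audio conversion command, in addition to the audio message that the command targets it proactively identifies the unresponded audio messages that the command does not target, and displays the first and second text content corresponding to each. This simplifies the user's operation and improves reading and communication efficiency; details are not repeated here.
As for when the audio messages are converted, the communication device may pre-acquire the first and second text content before receiving the audio conversion command (see step 302 of the embodiment shown in FIG. 3; details are not repeated here), or it may acquire the first and second text content in real time after receiving the audio conversion command.
Whether pre-acquisition or real-time acquisition is used, the communication device may obtain the first and second text content in either of the following ways:
In the first way, the communication device itself converts the targeted audio message and the unresponded audio messages into the first and second text content. When the device uses pre-conversion, the process is similar to step 1306 of the embodiment shown in FIG. 13; details are not repeated here.
In the second way, the communication device initiates an audio conversion request to the server and obtains the first and second text content returned by the server. The first and second text content may be produced by the server through real-time conversion in response to the request, that is, the server performs the conversion only after receiving the request (similar to step 1208 of the embodiment shown in FIG. 12; details are not repeated here). Alternatively, the first and second text content may be pre-converted by the server (similar to step 406 of the embodiment shown in FIG. 4; details are not repeated here).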
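The choices described for FIG. 15 (pre-acquired or real-time, local conversion or server request) can be folded into one lookup with a fallback. In this sketch `resolve_texts` is a hypothetical helper, `prefetched` stands for whatever text was pre-acquired, and `convert` stands for either local conversion or a server round trip; none of these names come from the source.

```python
from typing import Callable, Dict, List, Tuple


def resolve_texts(selected_id: str,
                  unresponded_ids: List[str],
                  prefetched: Dict[str, str],
                  convert: Callable[[str], str]) -> Tuple[str, Dict[str, str]]:
    """Return (first text content, second text content keyed by message id)."""

    def text_for(msg_id: str) -> str:
        if msg_id not in prefetched:            # real-time acquisition path
            prefetched[msg_id] = convert(msg_id)
        return prefetched[msg_id]

    first = text_for(selected_id)
    second = {m: text_for(m) for m in unresponded_ids if m != selected_id}
    return first, second


calls: List[str] = []


def fake_convert(msg_id: str) -> str:
    # Placeholder for local conversion or a server request.
    calls.append(msg_id)
    return f"text of {msg_id}"


first_text, second_texts = resolve_texts(
    "a1", ["a2", "a3"], {"a1": "cached a1", "a2": "cached a2"}, fake_convert)
```

Only the message without pre-acquired text triggers a conversion, so the real-time path is used just where pre-acquisition did not reach.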
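FIG. 16 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present application. Referring to FIG. 16, at the hardware level the electronic device includes a processor, an internal bus, a network interface, memory, and non-volatile storage, and may also include other hardware required by its services. The processor reads the corresponding computer program from the non-volatile storage into memory and runs it, forming the apparatus for processing audio messages at the logical level. Besides a software implementation, the present application does not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the entity executing the following processing flow is not limited to logical units and may also be hardware or a logic device.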
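In one embodiment, referring to FIG. 17, in a software implementation the apparatus for processing audio messages may include an identification unit, a pre-conversion unit, and a sending unit, where:
the identification unit causes a server to identify the type of communication messages transmitted between two communication parties;
the pre-conversion unit, when the type of any communication message is the audio type, causes the server to obtain that communication message and pre-convert it into corresponding text content;
the sending unit, when it is determined that any communication party has a conversion need for that communication message, causes the server to send the text content to that communication party.
Optionally, the sending unit is specifically configured to:
when the communication party holds a preset communication role in the communication, cause the server to determine that the communication party has the conversion need and to send the text content.
Optionally, the sending unit is specifically configured to:
when an audio conversion request for the communication message is received from any communication party, cause the server to determine that the communication party has the conversion need and to return to it the pre-converted text content corresponding to the communication message.
Optionally, the apparatus further includes:
a determining unit, which causes the server to determine the response state of the communication messages transmitted between the two communication parties;
a returning unit, which, when an audio conversion request for the communication message is received from any communication party and unresponded communication messages of the audio type related to that party exist, causes the server to also return to that party the pre-converted text content corresponding to all unresponded communication messages of the audio type.
Optionally, the apparatus further includes:
a judging unit, which, after the pre-converted text content corresponding to the communication message has been returned to the communication party, causes the server to determine that the communication message has switched to the responded state;
a notifying unit, which causes the server to notify the sender of the communication message of the responded state.
Optionally, the pre-conversion unit is specifically configured to:
cause the server to receive, in order, the audio segments that a communication party cuts and uploads in real time according to a preset rule, and to pre-convert each audio segment into a corresponding text segment;
cause the server to splice all text segments in order to obtain the text content.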
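In one embodiment, referring to FIG. 18, in a software implementation the apparatus for processing audio messages may include a request unit and a display unit, where:
the request unit causes a local communication device, upon receiving an audio conversion command issued by a user for any communication message of the audio type, to initiate a corresponding audio conversion request to a server;
the display unit causes the local communication device to receive the text content corresponding to that communication message returned by the server and to display it in association with that communication message, wherein the text content is proactively pre-converted by the server before the audio conversion request is received.
Optionally, the apparatus further includes:
a determining unit, which causes the local communication device to determine the user's response state for received communication messages of the audio type;
wherein, when the audio conversion command issued by the user for the communication message is received, if unresponded communication messages of the audio type exist other than that communication message, the audio conversion request also relates to the unresponded communication messages.
Optionally, the apparatus further includes:
an expansion unit, which causes the local communication device, after receiving the text content returned by the server, to expand the display region of the corresponding communication message;
wherein the expanded display region is divided into a first region and a second region; the first region shows the corresponding communication message, and the second region shows the text content corresponding to the communication message.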
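In one embodiment, referring to FIG. 19, in a software implementation the apparatus for processing audio messages may include a pre-acquisition unit and a display unit, where:
the pre-acquisition unit causes a local communication device to pre-acquire the text content corresponding to any communication message of the audio type;
the display unit, when an audio conversion command issued by a user for that communication message is received, causes the local communication device to show the pre-acquired text content.
Optionally, the pre-acquisition unit is specifically configured to:
cause the local communication device to pre-acquire the text content from a server, the text content being pre-converted by the server;
or cause the local communication device to pre-convert the communication message to obtain the text content.
Optionally, the pre-acquisition unit is specifically configured to:
cause the local communication device to receive the text content pushed by a server;
or cause the local communication device, when determining the type of the communication messages transmitted between it and the peer communication device, to pre-acquire the text content corresponding to the communication message if the type of that communication message is determined to be the audio type.
Optionally, when the audio conversion command issued by the user for the communication message is received, if other communication messages of the audio type in the unresponded state exist, the display unit also causes the local communication device to show the pre-acquired text content corresponding to each of the other communication messages.
Optionally, the apparatus further includes:
a notification unit, which causes the local communication device, after showing the pre-acquired text content corresponding to each of the other communication messages, to send the server a response-state switch notification for the other communication messages, so that the server notifies the corresponding sender of the responded state of the other communication messages.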
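In one embodiment, referring to FIG. 20, in a software implementation the apparatus for processing audio messages may include a determining unit and a processing unit, where:
the determining unit, in the process of generating a communication message of the audio type, causes a local communication device to determine, one after another, whether each captured audio segment satisfies a preset segmentation rule;
the processing unit, when any audio segment satisfies the preset segmentation rule, causes the local communication device to cut that audio segment off in real time and upload it to a server, so that the server pre-converts the audio segment into a corresponding text segment, and the text segments corresponding to all audio segments are spliced in order by the server into the text content corresponding to the communication message.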
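In one embodiment, referring to FIG. 21, in a software implementation the apparatus for processing audio messages may include a determining unit and a returning unit, where:
the determining unit, when an audio conversion request for any audio message is received from any communication party, causes a server to determine the unresponded audio messages related to that communication party;
the returning unit causes the server to obtain the text content corresponding to that audio message and to each unresponded audio message, and return it to that communication party.
Optionally, the returning unit is specifically configured to:
cause the server to convert that audio message and each unresponded audio message into corresponding text content;
or cause the server to look up the pre-converted text content corresponding to that audio message and to each unresponded audio message.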
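In one embodiment, referring to FIG. 22, in a software implementation the apparatus for processing audio messages may include a determining unit and a display unit, where:
the determining unit, when an audio conversion command issued by a user for any audio message is received, causes a local communication device to determine first text content corresponding to that audio message, and second text content corresponding to unresponded audio messages other than that audio message;
the display unit causes the local communication device to display the first text content in association with that audio message, and the second text content in association with the unresponded audio messages.
Optionally, the apparatus further includes:
a pre-acquisition unit, which, before the audio conversion command is received, causes the local communication device to pre-acquire the first text content and the second text content;
or a real-time acquisition unit, which, after the audio conversion command is received, causes the local communication device to acquire the first text content and the second text content in real time.
Optionally, the apparatus further includes:
an active conversion unit, which causes the local communication device itself to convert that audio message and the unresponded audio messages into the first text content and the second text content;
or a request unit, which causes the local communication device to initiate an audio conversion request to a server to obtain the first text content and the second text content returned by the server, wherein the first text content and the second text content are produced by the server through real-time conversion in response to the audio conversion request, or are pre-converted by the server.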
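In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can store information accessible to a computing device. As defined herein, computer-readable media do not include transitory media such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", and any other variants are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to that process, method, article, or device. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes that element.
The above are merely preferred embodiments of the present application and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within its scope of protection.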

Claims (40)

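  1. A method for processing audio messages, comprising:
    identifying, by a server, the type of communication messages transmitted between two communication parties;
    when the type of any communication message is the audio type, obtaining, by the server, that communication message and pre-converting it into corresponding text content;
    when it is determined that any communication party has a conversion need for that communication message, sending, by the server, the text content to that communication party.
  2. The method according to claim 1, wherein sending, by the server, the text content to the communication party when it is determined that the communication party has a conversion need for the communication message comprises:
    when the communication party holds a preset communication role in the communication, determining, by the server, that the communication party has the conversion need, and sending the text content.
  3. The method according to claim 1, wherein sending, by the server, the text content to the communication party when it is determined that the communication party has a conversion need for the communication message comprises:
    when an audio conversion request for the communication message is received from any communication party, determining, by the server, that the communication party has the conversion need, and returning to the communication party the pre-converted text content corresponding to the communication message.
  4. The method according to claim 3, further comprising:
    determining, by the server, the response state of the communication messages transmitted between the two communication parties;
    when an audio conversion request for the communication message is received from any communication party, if unresponded communication messages of the audio type related to that communication party exist, further returning, by the server, to that communication party the pre-converted text content corresponding to all unresponded communication messages of the audio type.
  5. The method according to claim 3, further comprising:
    after the pre-converted text content corresponding to the communication message has been returned to the communication party, determining, by the server, that the communication message has switched to the responded state;
    notifying, by the server, the sender of the communication message of the responded state.
  6. The method according to claim 1, wherein obtaining, by the server, the communication message and pre-converting it into corresponding text content comprises:
    receiving, by the server, in order, the audio segments cut and uploaded in real time by a communication party according to a preset rule, and pre-converting each audio segment into a corresponding text segment;
    splicing, by the server, all text segments in order to obtain the text content.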
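  7. A method for processing audio messages, comprising:
    upon receiving an audio conversion command issued by a user for any communication message of the audio type, initiating, by a local communication device, a corresponding audio conversion request to a server;
    receiving, by the local communication device, the text content corresponding to that communication message returned by the server, and displaying it in association with that communication message, wherein the text content is proactively pre-converted by the server before the audio conversion request is received.
  8. The method according to claim 7, further comprising:
    determining, by the local communication device, the user's response state for received communication messages of the audio type;
    wherein, when the audio conversion command issued by the user for the communication message is received, if unresponded communication messages of the audio type exist other than that communication message, the audio conversion request also relates to the unresponded communication messages.
  9. The method according to claim 7, further comprising:
    after receiving the text content returned by the server, expanding, by the local communication device, the display region of the corresponding communication message;
    wherein the expanded display region is divided into a first region and a second region; the first region shows the corresponding communication message, and the second region shows the text content corresponding to the communication message.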
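  10. A method for processing audio messages, comprising:
    pre-acquiring, by a local communication device, the text content corresponding to any communication message of the audio type;
    when an audio conversion command issued by a user for that communication message is received, showing, by the local communication device, the pre-acquired text content.
  11. The method according to claim 10, wherein pre-acquiring, by the local communication device, the text content corresponding to any communication message of the audio type comprises:
    pre-acquiring, by the local communication device, the text content from a server, the text content being pre-converted by the server;
    or pre-converting, by the local communication device, the communication message to obtain the text content.
  12. The method according to claim 10, wherein pre-acquiring, by the local communication device, the text content corresponding to any communication message of the audio type comprises:
    receiving, by the local communication device, the text content pushed by a server;
    or, when the local communication device determines the type of the communication messages transmitted between it and the peer communication device, pre-acquiring the text content corresponding to the communication message if the type of that communication message is determined to be the audio type.
  13. The method according to claim 10, further comprising:
    when the audio conversion command issued by the user for the communication message is received, if other communication messages of the audio type in the unresponded state exist, further showing, by the local communication device, the pre-acquired text content corresponding to each of the other communication messages.
  14. The method according to claim 13, further comprising:
    after showing the pre-acquired text content corresponding to each of the other communication messages, sending, by the local communication device, to a server a response-state switch notification for the other communication messages, so that the server notifies the corresponding sender of the responded state of the other communication messages.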
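  15. A method for processing audio messages, comprising:
    in the process of generating a communication message of the audio type, determining, by a local communication device, one after another, whether each captured audio segment satisfies a preset segmentation rule;
    when any audio segment satisfies the preset segmentation rule, cutting off, by the local communication device, that audio segment in real time and uploading it to a server, so that the server pre-converts the audio segment into a corresponding text segment, and the text segments corresponding to all audio segments are spliced in order by the server into the text content corresponding to the communication message.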
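  16. An apparatus for processing audio messages, comprising:
    an identification unit, configured to cause a server to identify the type of communication messages transmitted between two communication parties;
    a pre-conversion unit, configured to, when the type of any communication message is the audio type, cause the server to obtain that communication message and pre-convert it into corresponding text content;
    a sending unit, configured to, when it is determined that any communication party has a conversion need for that communication message, cause the server to send the text content to that communication party.
  17. The apparatus according to claim 16, wherein the sending unit is specifically configured to:
    when the communication party holds a preset communication role in the communication, cause the server to determine that the communication party has the conversion need and to send the text content.
  18. The apparatus according to claim 16, wherein the sending unit is specifically configured to:
    when an audio conversion request for the communication message is received from any communication party, cause the server to determine that the communication party has the conversion need and to return to the communication party the pre-converted text content corresponding to the communication message.
  19. The apparatus according to claim 18, further comprising:
    a determining unit, configured to cause the server to determine the response state of the communication messages transmitted between the two communication parties;
    a returning unit, configured to, when an audio conversion request for the communication message is received from any communication party and unresponded communication messages of the audio type related to that communication party exist, cause the server to also return to that communication party the pre-converted text content corresponding to all unresponded communication messages of the audio type.
  20. The apparatus according to claim 18, further comprising:
    a judging unit, configured to, after the pre-converted text content corresponding to the communication message has been returned to the communication party, cause the server to determine that the communication message has switched to the responded state;
    a notifying unit, configured to cause the server to notify the sender of the communication message of the responded state.
  21. The apparatus according to claim 16, wherein the pre-conversion unit is specifically configured to:
    cause the server to receive, in order, the audio segments cut and uploaded in real time by a communication party according to a preset rule, and to pre-convert each audio segment into a corresponding text segment;
    cause the server to splice all text segments in order to obtain the text content.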
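  22. An apparatus for processing audio messages, comprising:
    a request unit, configured to cause a local communication device, upon receiving an audio conversion command issued by a user for any communication message of the audio type, to initiate a corresponding audio conversion request to a server;
    a display unit, configured to cause the local communication device to receive the text content corresponding to that communication message returned by the server and to display it in association with that communication message, wherein the text content is proactively pre-converted by the server before the audio conversion request is received.
  23. The apparatus according to claim 22, further comprising:
    a determining unit, configured to cause the local communication device to determine the user's response state for received communication messages of the audio type;
    wherein, when the audio conversion command issued by the user for the communication message is received, if unresponded communication messages of the audio type exist other than that communication message, the audio conversion request also relates to the unresponded communication messages.
  24. The apparatus according to claim 22, further comprising:
    an expansion unit, configured to cause the local communication device, after receiving the text content returned by the server, to expand the display region of the corresponding communication message;
    wherein the expanded display region is divided into a first region and a second region; the first region shows the corresponding communication message, and the second region shows the text content corresponding to the communication message.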
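  25. An apparatus for processing audio messages, comprising:
    a pre-acquisition unit, configured to cause a local communication device to pre-acquire the text content corresponding to any communication message of the audio type;
    a display unit, configured to, when an audio conversion command issued by a user for that communication message is received, cause the local communication device to show the pre-acquired text content.
  26. The apparatus according to claim 25, wherein the pre-acquisition unit is specifically configured to:
    cause the local communication device to pre-acquire the text content from a server, the text content being pre-converted by the server;
    or cause the local communication device to pre-convert the communication message to obtain the text content.
  27. The apparatus according to claim 25, wherein the pre-acquisition unit is specifically configured to:
    cause the local communication device to receive the text content pushed by a server;
    or cause the local communication device, when determining the type of the communication messages transmitted between it and the peer communication device, to pre-acquire the text content corresponding to the communication message if the type of that communication message is determined to be the audio type.
  28. The apparatus according to claim 25, wherein, when the audio conversion command issued by the user for the communication message is received, if other communication messages of the audio type in the unresponded state exist, the display unit also causes the local communication device to show the pre-acquired text content corresponding to each of the other communication messages.
  29. The apparatus according to claim 28, further comprising:
    a notification unit, configured to cause the local communication device, after showing the pre-acquired text content corresponding to each of the other communication messages, to send a server a response-state switch notification for the other communication messages, so that the server notifies the corresponding sender of the responded state of the other communication messages.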
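  30. An apparatus for processing audio messages, comprising:
    a determining unit, configured to, in the process of generating a communication message of the audio type, cause a local communication device to determine, one after another, whether each captured audio segment satisfies a preset segmentation rule;
    a processing unit, configured to, when any audio segment satisfies the preset segmentation rule, cause the local communication device to cut that audio segment off in real time and upload it to a server, so that the server pre-converts the audio segment into a corresponding text segment, and the text segments corresponding to all audio segments are spliced in order by the server into the text content corresponding to the communication message.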
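  31. A method for processing audio messages, comprising:
    when an audio conversion request for any audio message is received from any communication party, determining, by a server, the unresponded audio messages related to that communication party;
    obtaining, by the server, the text content corresponding to that audio message and to each unresponded audio message, and returning it to that communication party.
  32. The method according to claim 31, wherein obtaining, by the server, the text content corresponding to that audio message and to each unresponded audio message comprises:
    converting, by the server, that audio message and each unresponded audio message into corresponding text content;
    or looking up, by the server, the pre-converted text content corresponding to that audio message and to each unresponded audio message.
  33. A method for processing audio messages, comprising:
    when an audio conversion command issued by a user for any audio message is received, determining, by a local communication device, first text content corresponding to that audio message, and second text content corresponding to unresponded audio messages other than that audio message;
    displaying, by the local communication device, the first text content in association with that audio message, and the second text content in association with the unresponded audio messages.
  34. The method according to claim 33, further comprising:
    before the audio conversion command is received, pre-acquiring, by the local communication device, the first text content and the second text content;
    or, after the audio conversion command is received, acquiring, by the local communication device, the first text content and the second text content in real time.
  35. The method according to claim 33, wherein the local communication device obtains the first text content and the second text content in either of the following ways:
    the local communication device itself converts that audio message and the unresponded audio messages into the first text content and the second text content;
    or the local communication device initiates an audio conversion request to a server to obtain the first text content and the second text content returned by the server, wherein the first text content and the second text content are produced by the server through real-time conversion in response to the audio conversion request, or are pre-converted by the server.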
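  36. An apparatus for processing audio messages, comprising:
    a determining unit, configured to, when an audio conversion request for any audio message is received from any communication party, cause a server to determine the unresponded audio messages related to that communication party;
    a returning unit, configured to cause the server to obtain the text content corresponding to that audio message and to each unresponded audio message, and return it to that communication party.
  37. The apparatus according to claim 36, wherein the returning unit is specifically configured to:
    cause the server to convert that audio message and each unresponded audio message into corresponding text content;
    or cause the server to look up the pre-converted text content corresponding to that audio message and to each unresponded audio message.
  38. An apparatus for processing audio messages, comprising:
    a determining unit, configured to, when an audio conversion command issued by a user for any audio message is received, cause a local communication device to determine first text content corresponding to that audio message, and second text content corresponding to unresponded audio messages other than that audio message;
    a display unit, configured to cause the local communication device to display the first text content in association with that audio message, and the second text content in association with the unresponded audio messages.
  39. The apparatus according to claim 38, further comprising:
    a pre-acquisition unit, configured to, before the audio conversion command is received, cause the local communication device to pre-acquire the first text content and the second text content;
    or a real-time acquisition unit, configured to, after the audio conversion command is received, cause the local communication device to acquire the first text content and the second text content in real time.
  40. The apparatus according to claim 38, further comprising:
    an active conversion unit, configured to cause the local communication device itself to convert that audio message and the unresponded audio messages into the first text content and the second text content;
    or a request unit, configured to cause the local communication device to initiate an audio conversion request to a server to obtain the first text content and the second text content returned by the server, wherein the first text content and the second text content are produced by the server through real-time conversion in response to the audio conversion request, or are pre-converted by the server.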
PCT/CN2017/077257 2016-03-29 2017-03-20 音频消息的处理方法及装置 WO2017167047A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US16/143,372 US11037568B2 (en) 2016-03-29 2018-09-26 Audio message processing method and apparatus
US17/316,931 US12046242B2 (en) 2016-03-29 2021-05-11 Audio message processing method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610187534.9A CN105869654B (zh) 2016-03-29 2016-03-29 音频消息的处理方法及装置
CN201610187534.9 2016-03-29

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/143,372 Continuation US11037568B2 (en) 2016-03-29 2018-09-26 Audio message processing method and apparatus

Publications (1)

Publication Number Publication Date
WO2017167047A1 true WO2017167047A1 (zh) 2017-10-05

Family

ID=56625194

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/077257 WO2017167047A1 (zh) 2016-03-29 2017-03-20 音频消息的处理方法及装置

Country Status (4)

Country Link
US (2) US11037568B2 (zh)
CN (1) CN105869654B (zh)
TW (1) TWI808936B (zh)
WO (1) WO2017167047A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI691190B (zh) * 2017-11-23 2020-04-11 香港商阿里巴巴集團服務有限公司 語音控制方法及裝置和電子設備

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105869654B (zh) * 2016-03-29 2020-12-04 阿里巴巴集团控股有限公司 音频消息的处理方法及装置
JP7037426B2 (ja) * 2018-04-25 2022-03-16 京セラ株式会社 電子機器及び処理システム
US11977849B2 (en) * 2020-04-24 2024-05-07 Rajiv Trehan Artificial intelligence (AI) based automated conversation assistance system and method thereof
CN114678017A (zh) * 2022-02-09 2022-06-28 达闼机器人股份有限公司 语音处理方法及装置

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20020028501A (ko) * 2000-10-10 2002-04-17 김철권 통신망에서의 음성 데이터와 문자 데이터간의 변환 방법및 그 장치
CN1798220A (zh) * 2004-12-20 2006-07-05 英保达股份有限公司 语音处理系统及方法
US7136462B2 (en) * 2003-07-15 2006-11-14 Lucent Technologies Inc. Network speech-to-text conversion and store
CN103632670A (zh) * 2013-11-30 2014-03-12 青岛英特沃克网络科技有限公司 语音和文本消息自动转换系统及其方法
CN104700836A (zh) * 2013-12-10 2015-06-10 阿里巴巴集团控股有限公司 一种语音识别方法和系统
CN105162836A (zh) * 2015-07-29 2015-12-16 百度在线网络技术(北京)有限公司 执行语音通信的方法、服务器和智能终端设备
CN105869654A (zh) * 2016-03-29 2016-08-17 阿里巴巴集团控股有限公司 音频消息的处理方法及装置

Family Cites Families (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5799279A (en) * 1995-11-13 1998-08-25 Dragon Systems, Inc. Continuous speech recognition of text and commands
US5724410A (en) 1995-12-18 1998-03-03 Sony Corporation Two-way voice messaging terminal having a speech to text converter
US6353809B2 (en) * 1997-06-06 2002-03-05 Olympus Optical, Ltd. Speech recognition with text generation from portions of voice data preselected by manual-input commands
US6198808B1 (en) * 1997-12-31 2001-03-06 Weblink Wireless, Inc. Controller for use with communications systems for converting a voice message to a text message
US6483899B2 (en) * 1998-06-19 2002-11-19 At&T Corp Voice messaging system
US6871179B1 (en) * 1999-07-07 2005-03-22 International Business Machines Corporation Method and apparatus for executing voice commands having dictation as a parameter
US7398209B2 (en) * 2002-06-03 2008-07-08 Voicebox Technologies, Inc. Systems and methods for responding to natural language speech utterance
US20040176114A1 (en) 2003-03-06 2004-09-09 Northcutt John W. Multimedia and text messaging with speech-to-text assistance
US20040267527A1 (en) 2003-06-25 2004-12-30 International Business Machines Corporation Voice-to-text reduction for real time IM/chat/SMS
US7130401B2 (en) * 2004-03-09 2006-10-31 Discernix, Incorporated Speech to text conversion system
US20050266829A1 (en) 2004-04-16 2005-12-01 Lg Elcectronics, Inc. Speech-to-text messaging system and method
US7583974B2 (en) 2004-05-27 2009-09-01 Alcatel-Lucent Usa Inc. SMS messaging with speech-to-text and text-to-speech conversion
US8009815B2 (en) * 2005-08-25 2011-08-30 Thomas James Newell Message distribution system
JP2007133033A (ja) * 2005-11-08 2007-05-31 Nec Corp 音声テキスト化システム、音声テキスト化方法および音声テキスト化用プログラム
US7698140B2 (en) * 2006-03-06 2010-04-13 Foneweb, Inc. Message transcription, voice query and query delivery system
WO2009073768A1 (en) * 2007-12-04 2009-06-11 Vovision, Llc Correcting transcribed audio files with an email-client interface
US8204748B2 (en) 2006-05-02 2012-06-19 Xerox Corporation System and method for providing a textual representation of an audio message to a mobile device
US20090070109A1 (en) 2007-09-12 2009-03-12 Microsoft Corporation Speech-to-Text Transcription for Personal Communication Devices
US8958848B2 (en) * 2008-04-08 2015-02-17 Lg Electronics Inc. Mobile terminal and menu control method thereof
FR2947688B1 (fr) * 2009-07-02 2011-10-14 Peugeot Citroen Automobiles Sa Systeme telematique pour vehicule avec reconnaissance vocale et export vers un support externe au systeme
US8358752B2 (en) * 2009-11-19 2013-01-22 At&T Mobility Ii Llc User profile based speech to text conversion for visual voice mail
EP2574220B1 (en) 2010-05-17 2019-11-27 Tata Consultancy Services Ltd. Hand-held communication aid for individuals with auditory, speech and visual impairments
US8355703B2 (en) * 2010-06-08 2013-01-15 At&T Intellectual Property I, L.P. Intelligent text message-to-speech system and method for visual voice mail
US8543652B2 (en) 2010-07-22 2013-09-24 At&T Intellectual Property I, L.P. System and method for efficient unified messaging system support for speech-to-text service
US8489075B2 (en) * 2011-11-16 2013-07-16 At&T Intellectual Property I, L.P. System and method for augmenting features of visual voice mail
CN104254884B (zh) * 2011-12-07 2017-10-24 高通股份有限公司 用于分析数字化音频流的低功率集成电路
US10334069B2 (en) * 2013-05-10 2019-06-25 Dropbox, Inc. Managing a local cache for an online content-management system
KR102149266B1 (ko) * 2013-05-21 2020-08-28 삼성전자 주식회사 전자 기기의 오디오 데이터의 관리 방법 및 장치
CN103281683B (zh) * 2013-06-08 2016-08-17 网易(杭州)网络有限公司 一种发送语音消息的方法及装置
US9401146B2 (en) * 2014-04-01 2016-07-26 Google Inc. Identification of communication-related voice commands
US10033864B2 (en) 2015-05-18 2018-07-24 Interactive Intelligence Group, Inc. Dynamically switching communications to text interactions
US9807045B2 (en) 2015-06-10 2017-10-31 Google Inc. Contextually driven messaging system
US20170085506A1 (en) 2015-09-21 2017-03-23 Beam Propulsion Lab Inc. System and method of bidirectional transcripts for voice/text messaging
US10223066B2 (en) * 2015-12-23 2019-03-05 Apple Inc. Proactive assistance based on dialog communication between devices
US10121474B2 (en) * 2016-02-17 2018-11-06 Microsoft Technology Licensing, Llc Contextual note taking

Also Published As

Publication number Publication date
TW201737117A (zh) 2017-10-16
CN105869654B (zh) 2020-12-04
US11037568B2 (en) 2021-06-15
CN105869654A (zh) 2016-08-17
US12046242B2 (en) 2024-07-23
US20210266280A1 (en) 2021-08-26
TWI808936B (zh) 2023-07-21
US20190027150A1 (en) 2019-01-24

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17773077

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 17773077

Country of ref document: EP

Kind code of ref document: A1