CN112911074B - Voice communication processing method, device, equipment and machine-readable medium - Google Patents


Info

Publication number
CN112911074B
CN112911074B (application CN201911136856.0A)
Authority
CN
China
Prior art keywords
information
dialogue
voice
content
equipment
Prior art date
Legal status
Active
Application number
CN201911136856.0A
Other languages
Chinese (zh)
Other versions
CN112911074A (en)
Inventor
肖蒴
沈浩翔
陈初
符笛
冯伟国
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201911136856.0A
Publication of CN112911074A
Application granted
Publication of CN112911074B
Status: Active

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M3/00Automatic or semi-automatic exchanges
    • H04M3/42Systems providing special services or facilities to subscribers
    • H04M3/54Arrangements for diverting calls for one subscriber to another predetermined subscriber
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M1/00Substation equipment, e.g. for use by subscribers
    • H04M1/64Automatic arrangements for answering calls; Automatic arrangements for recording messages for absent subscribers; Arrangements for recording conversations
    • H04M1/65Recording arrangements for recording a message from the calling party
    • H04M1/656Recording arrangements for recording a message from the calling party for recording conversations
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M3/00Automatic or semi-automatic exchanges
    • H04M3/42Systems providing special services or facilities to subscribers
    • H04M3/42221Conversation recording systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W4/00Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/16Communication-related supplementary services, e.g. call-transfer or call-hold
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M2250/00Details of telephonic subscriber devices
    • H04M2250/74Details of telephonic subscriber devices with voice recognition means

Abstract

Embodiments of the present application provide a voice communication processing method, apparatus, device and machine-readable medium. The method, applied to a first device, includes: receiving a voice communication request, the voice communication request being a request initiated by a third device to a second device; establishing a dialogue with the third device; determining dialogue content according to first voice information of the third device; and recording the dialogue content. The embodiments of the present application can save the cost a user spends answering the voice communication request and can help the user understand the situation of the voice communication request.

Description

Voice communication processing method, device, equipment and machine-readable medium
Technical Field
The present application relates to the field of communications technologies, and in particular, to a voice communication processing method, a voice communication processing apparatus, a device, and a machine-readable medium.
Background
The popularization of mobile terminals and the development of communication technology have made it easy to connect people with one another. At the same time, internet information is increasingly transparent and easy to spread, so the identifier of a mobile terminal is no longer private and has become one of the pieces of user information that fraud institutions, marketing institutions and other organizations can easily acquire.
When using a mobile terminal, a user often receives calls from unfamiliar numbers. Many of these are meaningless calls such as sales promotions: if the user answers, harassment is hard to avoid; if the user does not answer, the user worries about missing important information, for example a colleague, friend or family member calling from an unfamiliar number.
Disclosure of Invention
The technical problem to be solved by the embodiments of the present application is to provide a voice communication processing method that can save the cost a user spends answering a voice communication request and can help the user understand the situation of the voice communication request.
Correspondingly, the embodiments of the present application also provide a voice communication processing apparatus, a device and a machine-readable medium to guarantee the implementation and application of the method.
To solve the above problems, an embodiment of the present application discloses a voice communication processing method applied to a first device, the method including:
receiving a voice communication request, the voice communication request being a request initiated by a third device to a second device;
establishing a dialogue with the third device;
determining dialogue content according to first voice information of the third device; and
recording the dialogue content.
In another aspect, an embodiment of the present application also discloses a voice communication processing method applied to a second device, where a voice communication request of the second device is transferred to a first device, the method including:
receiving, from the first device, dialogue content corresponding to the voice communication request; and
outputting the dialogue content.
In another aspect, an embodiment of the present application also discloses a voice communication processing method, including:
receiving a takeover instruction for a dialogue;
determining dialogue content of the dialogue in response to the takeover instruction; and
recording the dialogue content.
In another aspect, an embodiment of the present application also discloses a voice communication processing method applied to a first device, the method including:
receiving an access instruction sent by a second device;
accessing a dialogue corresponding to the second device according to the access instruction;
determining dialogue content according to voice information of the opposite end of the dialogue;
recording the dialogue content; and
sending the dialogue content to the second device.
In another aspect, an embodiment of the present application also discloses a voice communication processing method applied to a second device, the method including:
after a dialogue is established, sending an access instruction to a first device so that the first device accesses the dialogue; and
receiving dialogue content sent by the first device.
In another aspect, an embodiment of the present application also discloses a voice communication processing apparatus applied to a first device, the apparatus including:
a receiving module, configured to receive a voice communication request, the voice communication request being a request initiated by a third device to a second device;
a dialogue module, configured to establish a dialogue with the third device;
a determining module, configured to determine dialogue content according to first voice information of the third device; and
a recording module, configured to record the dialogue content.
In another aspect, an embodiment of the present application also discloses a voice communication processing apparatus applied to a second device, where a voice communication request of the second device is transferred to a first device, the apparatus including:
a receiving module, configured to receive, from the first device, dialogue content corresponding to the voice communication request; and
an output module, configured to output the dialogue content.
In yet another aspect, an embodiment of the present application further discloses an apparatus, including:
one or more processors; and
one or more machine-readable media having instructions stored thereon, which when executed by the one or more processors, cause the apparatus to perform one or more of the methods described previously.
In yet another aspect, embodiments of the present application disclose one or more machine-readable media having instructions stored thereon that, when executed by one or more processors, cause an apparatus to perform one or more of the methods described previously.
Embodiments of the present application include the following advantages:
the embodiments of the present application take over the voice communication request for the user, which can save the cost the user spends answering the voice communication request.
In addition, the embodiments of the present application record the dialogue content, which can help the user learn the situation of the voice communication request: for example, it can help the user judge whether the voice communication request is meaningful to the user, and the dialogue content may include information meaningful to the user.
Drawings
FIG. 1 is a schematic diagram of an application environment of a voice communication processing method according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating steps of a first embodiment of a voice communication processing method according to the present application;
FIG. 3 is an illustration of dialog content according to an embodiment of the present application;
FIG. 4 is a flowchart illustrating steps of a second embodiment of a voice communication processing method according to the present application;
FIG. 5 is a schematic diagram of an application environment of a voice communication processing method according to an embodiment of the present application;
FIG. 6 is a flowchart illustrating steps of a third embodiment of a voice communication processing method according to the present application;
FIG. 7 is a flowchart illustrating steps of a fourth embodiment of a voice communication processing method according to the present application;
FIG. 8 is a flowchart illustrating steps in a fifth embodiment of a voice communication processing method according to the present application;
FIG. 9 is a flowchart illustrating steps of a sixth embodiment of a voice communication processing method according to the present application;
FIG. 10 is a flowchart illustrating steps of a seventh embodiment of a voice communication processing method according to the present application;
FIG. 11 is a block diagram of an embodiment of a voice communication processing apparatus of the present application;
FIG. 12 is a block diagram of an embodiment of a voice communication processing apparatus of the present application; and
fig. 13 is a schematic structural diagram of an apparatus according to an embodiment of the present application.
Detailed Description
In order that the above-recited objects, features and advantages of the present application will become more readily apparent, a more particular description of the invention briefly described above will be rendered by reference to specific embodiments that are illustrated in the appended drawings.
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application are within the scope of the protection of the present application.
The concepts of the present application are susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that the description herein of specific embodiments is not intended to limit the concepts of the present application to the particular forms disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the scope of the present application.
Reference in the specification to "one embodiment," "an embodiment," "one particular embodiment," etc., means that a particular feature, structure, or characteristic may be included in the described embodiments, but every embodiment may or may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, where a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the purview of one skilled in the art to effect such feature, structure, or characteristic in connection with other ones of the embodiments whether or not explicitly described. In addition, it should be understood that the items in the list included in this form of "at least one of A, B and C" may include the following possible items: (A); (B); (C); (A and B); (A and C); (B and C); or (A, B and C). Likewise, an item listed in this form of "at least one of A, B or C" may mean (A); (B); (C); (A and B); (A and C); (B and C); or (A, B and C).
In some cases, the disclosed embodiments may be implemented as hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried on or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which may be executed by one or more processors. A machine-readable storage medium may be implemented as a storage device, mechanism, or other physical structure (e.g., a volatile or non-volatile memory, a media disc, or another physical storage device) for storing or transmitting information in a form readable by a machine.
In the drawings, some structural or methodological features may be shown in a particular arrangement and/or ordering. However, such a specific arrangement and/or ordering is not necessary. Rather, in some embodiments, such features may be arranged in a manner and/or order different from that shown in the drawings. Furthermore, the inclusion of a feature in a particular figure is not meant to imply that the feature is required in all embodiments; in some embodiments it may not be included, or it may be combined with other features.
The embodiments of the present application provide a voice communication processing scheme in which a smart phone assistant takes over the voice communication request for the user and records the actual dialogue content.
First, the embodiments of the present application take over the voice communication request for the user, which can save the cost the user spends answering the incoming call. In particular, when the communication type is the harassment type, the disturbance caused to the user by harassing calls can be reduced.
In addition, the embodiments of the present application record the dialogue content, which can help the user learn the situation of the voice communication request: for example, it can help the user judge whether the voice communication request is meaningful to the user, and the dialogue content may include information meaningful to the user.
In this embodiment of the present application, optionally, the voice communication request may specifically include: a voice incoming call request; or a voice instant messaging request, which specifically refers to a voice request in an instant messaging environment. The embodiments of the present application describe the voice communication method by taking the voice incoming call request as an example; the voice communication method corresponding to the voice instant messaging request may be implemented by analogy.
In the embodiments of the present application, the smart phone assistant takes over the voice communication request for the user as follows: the incoming call of the user's second device is transferred to the first device, on which a smart phone assistant may run. The smart phone assistant may correspond to a program of the first device, or to a process, thread, or service in the operating system of the first device. A service is a component of an operating system (such as Android) used to handle time-consuming logic in the background or to execute long-running tasks; a service can even be kept running in the background after the program exits.
In practical applications, a transfer relationship between the second device and the first device may be pre-established, so as to transfer an incoming call of the second device to the first device.
According to one embodiment, the transfer relationship may not correspond to a trigger condition. Accordingly, in any case, the incoming call of the second device can be transferred to the first device.
According to another embodiment, the transfer relationship may correspond to a trigger condition. The trigger condition may include: the incoming call of the second device is hung up; the incoming call of the second device is missed; the second device is in a busy state when the incoming call is received; or user state information corresponding to the second device meets a preset condition; and the like.
The user state information may characterize the state the user is in. It may include: the user's face information, the user's body (limb) information, information about the environment the user is in, and the like.
According to one embodiment, the preset conditions may include: the user state information characterizes the user as being in a busy state, such as the user being at home, etc.
According to another embodiment, the preset conditions may include: the user status information characterizes that the user is in a preset space, such as the user is in a kitchen or a bathroom, etc.
According to the embodiments of the present application, when the user state information meets the preset condition, the voice communication request is automatically taken over for the user, which can improve the intelligence of the takeover.
It can be appreciated that the embodiments of the present application do not limit whether the transfer relationship corresponds to a trigger condition, and the specific trigger condition corresponding to the transfer relationship.
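The transfer decision described above can be sketched as a simple predicate. This is an illustrative sketch only, not part of the patent; the class and function names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class CallContext:
    """State observed when an incoming call reaches the second device."""
    hung_up: bool = False    # call was hung up before being answered
    missed: bool = False     # call rang out unanswered
    line_busy: bool = False  # second device already in a call
    user_busy: bool = False  # user state information meets a preset condition


def should_transfer(ctx: CallContext, unconditional: bool = False) -> bool:
    """Return True when the incoming call should be forwarded to the first device.

    `unconditional=True` models the embodiment in which the transfer
    relationship has no trigger condition.
    """
    if unconditional:
        return True
    return ctx.hung_up or ctx.missed or ctx.line_busy or ctx.user_busy
```

Any one of the listed trigger conditions suffices, mirroring the "or ... or ..." enumeration in the text above.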
Referring to fig. 1, a schematic diagram of an application environment of a voice communication processing method in an embodiment of the present application is shown. A third device, as the calling device, initiates an incoming call to a second device; according to the trigger condition and the transfer relationship set for the second device, the incoming call may be transferred to a first device.
The first device can receive the transferred incoming call and establish a dialogue with the third device, determine dialogue content according to the first voice information of the third device, and record the dialogue content.
The first device and the second device may interact with each other.
For example, the first device may send the dialogue content or the takeover information corresponding to the incoming call to the second device. The dialogue content or takeover information can enable the user of the second device to learn the situation of the incoming call, to help the user judge whether the incoming call is meaningful. When the incoming call is meaningful to the user, the takeover information may further include key information meaningful to the user, to improve the efficiency with which the user acquires information.
For another example, the second device may send a reply instruction to the first device, instructing the first device to determine reply content during the dialogue. For example, if the user is interested in a certain communication type A, the reply instruction may instruct the assistant to inquire about information related to communication type A, and the like.
In the embodiments of the present application, the first device, the second device, and the third device may be different devices, and each may be a dialogue-capable device.
The first device, the second device, or the third device may specifically include, but is not limited to: a smart phone, a tablet computer, a wearable device, a smart speaker, etc. It will be appreciated that the embodiments of the present application are not limited to particular devices.
A smart speaker is an upgraded speaker product. Besides the audio output components common in speakers, such as a power amplifier and a loudspeaker, it can also include audio input components such as a microphone, and a wireless network module. The wireless network module may include a network access module such as a WIFI (Wireless Fidelity) chip, a Bluetooth module such as a Bluetooth chip, or a module for another wireless connection technology. Therefore, besides providing basic audio output, a smart speaker can serve as a voice tool for accessing the internet, connecting and interacting with the network and other devices.
The smart speaker can support both voice interaction and screen display: it can display information through a screen, and the user can also interact with it by voice.
Method embodiment one
Referring to fig. 2, a flowchart illustrating steps of a first embodiment of a voice communication processing method of the present application is shown, where the method may specifically include the following steps:
step 201, receiving a voice communication request; the voice communication request may be a request initiated by the third device to the second device;
step 202, establishing a dialogue with a third device;
step 203, determining dialogue content according to the first voice information of the third device;
step 204, recording dialogue content.
In step 201, a transfer relationship between the second device and the first device may be pre-established to transfer the voice communication request of the second device to the first device.
According to one embodiment, the transfer relationship may not correspond to a trigger condition. Accordingly, in any case, the voice communication request of the second device may be transferred to the first device.
According to another embodiment, the transfer relationship may correspond to a trigger condition. The trigger condition may include: the incoming call of the second device is hung up; the incoming call of the second device is missed; the second device is in a busy state when the incoming call is received; or user state information corresponding to the second device meets a preset condition; and the like. It can be appreciated that the embodiments of the present application do not limit whether the transfer relationship corresponds to a trigger condition, nor the specific trigger condition corresponding to the transfer relationship.
In an alternative embodiment of the present application, receiving the voice communication request may specifically include: receiving the voice communication request when the user state information corresponding to the second device meets a preset condition.
In an alternative embodiment of the present application, the transfer server may store the above transfer relationship, and transfer the voice communication request of the second device to the first device according to the above transfer relationship. It will be appreciated that embodiments of the present application are not limited to a particular process of transferring a voice communication request from a second device to a first device.
In this embodiment of the present application, the processing mode information for the voice communication request by the second device may include: answering, or rejecting.
In this embodiment of the present application, the second device may search an identifier database according to the communication identifier corresponding to the voice communication request, and determine the processing mode information according to the search result. The communication identifier may include: an incoming call number, an instant messaging number, etc. For example, a tag number library is searched according to the incoming call number, and the processing mode information is determined according to the search result.
The communication types of the tag number library may include: a fraud type, a promotion type, a transaction type, etc.
Optionally, if the incoming call number hits the fraud-type tag number library, the processing mode information may include: rejecting.
Optionally, if the incoming call number hits a non-fraud-type tag number library, the processing mode information may include: answering.
Optionally, if the incoming call number does not hit any type of tag number library, the processing mode information may include: answering.
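The lookup described in these paragraphs can be illustrated as follows (a minimal sketch; the numbers and library contents are invented placeholders, not real data):

```python
# Hypothetical tag number libraries; a real deployment would query a
# maintained identifier database rather than in-memory sets.
TAG_NUMBER_LIBRARIES = {
    "fraud": {"+86-100-0001"},
    "promotion": {"+86-100-0002"},
    "express": {"+86-100-0003"},
}


def decide_processing_mode(incoming_number: str) -> str:
    """Look up the caller number and return 'reject' or 'answer'."""
    for comm_type, numbers in TAG_NUMBER_LIBRARIES.items():
        if incoming_number in numbers:
            # Only the fraud type is rejected outright.
            return "reject" if comm_type == "fraud" else "answer"
    # Numbers that hit no library are answered so the assistant can take over.
    return "answer"
```

Note the default of answering unknown numbers: the assistant, not the user, bears the cost of picking up.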
In step 202, the first device may establish a dialogue with the third device by answering the voice communication request.
In the embodiment of the present application, the dialogue content may include: the first voice information sent by the third device, and/or the second voice information sent by the first device to the third device. For example, the second voice information may be "Hello", and the first voice information may be "May I ask whether you are planning to buy a house recently".
In this embodiment of the present application, the dialogue content may include a dialogue identity and its corresponding information; the dialogue identity may be the first device or the third device.
In this embodiment of the present application, optionally, the dialogue content may be arranged in chronological order.
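A minimal data structure for dialogue content with per-turn identities, kept in chronological order, might look like the following. This is an illustrative sketch only; the class and field names are not from the patent:

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    """One utterance in the dialogue, attributed to a dialogue identity."""
    identity: str    # "first_device" or "third_device"
    text: str
    timestamp: float


@dataclass
class DialogueContent:
    turns: list = field(default_factory=list)

    def add(self, identity: str, text: str, timestamp: float) -> None:
        """Append a turn and keep the record in chronological order."""
        self.turns.append(Turn(identity, text, timestamp))
        self.turns.sort(key=lambda t: t.timestamp)
```

Sorting on insert keeps the chronological arrangement described above even if turns arrive out of order.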
In this embodiment of the present application, optionally, recording the dialogue content specifically includes:
recording the voice information of the dialogue; and/or
recording the text information corresponding to the voice information of the dialogue; speech recognition techniques may be used to convert the voice information into text information.
In this embodiment of the present application, optionally, recording the dialogue content specifically includes:
segmenting the voice information of the dialogue to obtain corresponding voice segments; the embodiments of the present application do not limit the specific length of a voice segment; and
recording the voice segments and the text information corresponding to the voice segments.
In this embodiment of the present application, optionally, segmenting the voice information of the dialogue may specifically include:
segmenting the voice information of the dialogue according to the dialogue identity corresponding to the voice information. Different dialogue identities may correspond to different voice segments. When the voice information corresponding to one dialogue identity is long, it can be further segmented to obtain voice segments whose duration does not exceed a preset duration.
In this embodiment of the present application, optionally, the dialogue content may include a plurality of voice segments arranged in time order; each voice segment may correspond to the dialogue identity of its voice information.
Referring to fig. 3, a schematic illustration of dialogue content according to an embodiment of the present application is shown. The dialogue content may include: a plurality of voice segments arranged by time and dialogue identity, and the text information corresponding to the voice segments. The duration of a voice segment may not exceed a preset duration, to make listening convenient. Text information is also provided for each voice segment, so that the user can check the text when it is inconvenient to listen to voice.
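The segmentation rule (one segment per dialogue identity, with long utterances split so that no segment exceeds a preset duration) can be sketched as follows; this helper is hypothetical, not from the patent:

```python
def segment_by_identity(utterances, max_len=15.0):
    """Split (identity, duration) utterances into voice segments.

    Each utterance already belongs to one dialogue identity, so each
    yields at least one segment; an utterance longer than `max_len`
    seconds is split further so no segment exceeds the preset duration.
    """
    segments = []
    for identity, duration in utterances:
        remaining = duration
        while remaining > 0:
            piece = min(remaining, max_len)
            segments.append((identity, piece))
            remaining -= piece
    return segments
```

For example, a 40-second utterance with a 15-second limit yields three segments of 15, 15 and 10 seconds, all attributed to the same identity.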
In step 204, the dialogue content may be recorded so that the user learns the situation of the voice communication request from the dialogue content.
In an alternative embodiment of the present application, the method may further include: determining the communication type corresponding to the voice communication request according to the dialogue content.
The communication type may characterize the identity or purpose of the calling user corresponding to the voice communication request. Examples of the communication type may include: a harassment type (e.g., a fraud type, a promotion type, etc.), a relatives type, or a transaction type (e.g., an express delivery type, a take-out type, etc.).
In this embodiment of the present application, optionally, a mapping relationship between dialogue content and communication type may be used to determine the communication type corresponding to the voice communication request.
In this embodiment of the present application, optionally, the dialogue content may be sent to the second device; when the dialogue content is updated, the updated dialogue content may be sent to the second device so that the user obtains the dialogue content in real time. Optionally, the second device may display the real-time dialogue content through a web page or an APP interface for the user to view.
In this embodiment of the present application, optionally, takeover information corresponding to the incoming call may be sent to the second device; the takeover information may enable the user to learn the situation of the incoming call.
The takeover information specifically includes at least one of the following:
incoming call time, incoming call number, communication type, and processing mode information.
Optionally, the processing mode information specifically includes: answering, or hanging up.
Optionally, if the incoming call number hits the fraud-type tag number library, the processing mode information may include: rejecting.
Optionally, when the processing mode information includes answering, the takeover information specifically includes: the dialogue content, and/or key information extracted from the dialogue content.
The embodiments of the present application can extract key information from the dialogue content, which can improve the efficiency with which the user acquires information.
Alternatively, key information may be extracted from the conversation content depending on the communication type. Alternatively, the type of key information may be determined depending on the communication type. Taking the property type as an example, the types of key information may include: time, place, price, etc. Taking the express delivery type as an example, the types of the key information may include: time, place, etc.
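The type-dependent extraction described above can be sketched as follows. This is a minimal, hypothetical illustration: the mapping table, the regular-expression extractors, and all names are assumptions for exposition; a production system would likely use named-entity-recognition models rather than patterns.

```python
import re

# Hypothetical mapping: which key-information types to extract per communication type.
KEY_INFO_TYPES = {
    "property": ["time", "place", "price"],
    "express": ["time", "place"],
}

# Deliberately naive illustrative extractors.
EXTRACTORS = {
    "time": re.compile(r"\b\d{1,2}:\d{2}\b"),
    "place": re.compile(r"at ([A-Z][\w ]+?)(?:[,.]|$)"),
    "price": re.compile(r"\$\d+(?:,\d{3})*"),
}

def extract_key_info(dialogue_content: str, communication_type: str) -> dict:
    """Extract key information from dialogue content based on the communication type."""
    result = {}
    for info_type in KEY_INFO_TYPES.get(communication_type, []):
        match = EXTRACTORS[info_type].search(dialogue_content)
        if match:
            result[info_type] = match.group(1) if match.groups() else match.group(0)
    return result
```

For an express-type dialogue, only the time and place extractors run; types without an entry in the table yield no key information.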
The second voice information may include actively transmitted content, such as a greeting; the second voice information may further include reply content to the first voice information.
In the embodiment of the application, the second voice information can be determined by using an intelligent interaction technology. For example, after establishing the conversation, the second voice message may be a preset greeting, such as "hello". After receiving the first voice information, the second voice information may be reply content corresponding to the first voice information, and so on.
In the embodiment of the application, optionally, the mapping relationship between dialogue content and communication types may be represented by a first data analyzer. Correspondingly, the method may further include: training on training data to obtain the first data analyzer; the first data analyzer may be configured to characterize the mapping relationship between dialogue content and communication types; the training data may include: dialogue content in a corpus, and annotated communication types. Optionally, the corpus may be a dialogue corpus, and in particular, the dialogue corpus may be a telephone dialogue corpus.
In an alternative embodiment of the present application, a mathematical model may be trained based on training data to obtain the first data analyzer, which may characterize the mapping between input data (dialogue content) and output data (communication type).
A mathematical model is a scientific or engineering model constructed with mathematical logic and mathematical language: a mathematical structure that expresses, in a generalized or approximate way, the characteristics or quantitative dependency relationships of a certain object system, where the structure is a relational structure described by means of mathematical symbols. The mathematical model may be one equation or a set of algebraic, differential, integral, or statistical equations, and combinations thereof, by which the interrelationships or causal relationships between the variables of the system are described quantitatively or qualitatively. In addition to models described by equations, there are models described by other mathematical tools, such as algebra, geometry, topology, and mathematical logic. A mathematical model describes the behavior and characteristics of a system rather than the actual structure of the system. The training of the mathematical model may be performed by machine learning methods, deep learning methods, and the like; the machine learning methods may include: linear regression, decision trees, random forests, etc.; the deep learning methods may include: Convolutional Neural Networks (CNN), Long Short-Term Memory networks (LSTM), Gated Recurrent Units (GRU), etc.
In the embodiment of the present application, optionally, the dialogue content may be syntactically parsed to obtain the corresponding communication type. Syntactic analysis may include: dependency syntax analysis, and the like. The corresponding communication type may be obtained from the keywords in the syntactic analysis result. For example, the dialogue content "May I ask whether you are planning to buy a house recently" may be parsed to obtain the communication type "property type". As another example, the dialogue content "Hello, this is XXX Fitness" may be parsed to obtain the communication type "fitness type", and so on.
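As a rough sketch of this kind of type recognition, the following substitutes a plain keyword lookup for full dependency syntax analysis; the keyword table and every name are illustrative assumptions, not from the application:

```python
# Hypothetical keyword-to-type mapping; a real implementation might instead run
# dependency syntax analysis and inspect keywords in the parse result.
TYPE_KEYWORDS = {
    "property": ["buy a house", "apartment", "real estate"],
    "fitness": ["fitness", "gym", "membership"],
    "express": ["parcel", "delivery", "courier"],
}

def classify_communication_type(dialogue_content: str, default: str = "unknown") -> str:
    """Return the first communication type whose keywords appear in the dialogue."""
    text = dialogue_content.lower()
    for comm_type, keywords in TYPE_KEYWORDS.items():
        if any(kw in text for kw in keywords):
            return comm_type
    return default
```

A trained first data analyzer would replace this lookup, but the input/output contract (dialogue content in, communication type out) is the same.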
The process of determining the reply content is described in detail below. In this embodiment of the present application, the reply content may be part of the second voice information. Specifically, the second voice information may include actively transmitted content, such as a greeting; the second voice information may further include reply content to the first voice information.
In the embodiment of the application, the dialogue content may take a voice form or a text form. The first voice information sent by the third device may be collected; that is, the dialogue may be recorded, so as to obtain dialogue content in voice form.
In an optional embodiment of the present application, the information to be replied may be determined according to the pause interval information of the first voice information sent by the third device, and reply is performed on the information to be replied.
The pause interval information may reflect the pause pattern of the user's speech; for example, the user typically pauses longer after finishing a sentence, or pauses longer after asking a question in anticipation of a reply. According to the pause interval information, complete information to be replied can be obtained and replied to, which can improve the rationality of the reply timing and the accuracy of the reply content.
Accordingly, the method may further include: determining the information to be replied according to the pause interval information of the first voice information; and determining the reply content corresponding to the information to be replied according to the information to be replied and the above (context) of the information to be replied.
For example, pause interval information in the voice signal can be detected; if a pause interval exceeds the interval threshold, one utterance is considered finished, so that the information to be replied can be obtained and replied to. The interval threshold may be determined by those skilled in the art according to practical application requirements; for example, the interval threshold may be 800 milliseconds or the like, and it is understood that the embodiment of the present application is not limited to a specific interval threshold.
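A minimal sketch of this pause-based segmentation, assuming a speech recognizer emits (text, start, end) events with millisecond timestamps — the event format, the function names, and the 800 ms threshold are illustrative assumptions:

```python
INTERVAL_THRESHOLD_MS = 800  # example value from the text; tune per application

def segment_by_pause(events):
    """Split a stream of (text, start_ms, end_ms) speech events into utterances.

    A new utterance (a unit of information to be replied) begins whenever the
    silence between consecutive events exceeds the interval threshold.
    """
    utterances, current = [], []
    prev_end = None
    for text, start_ms, end_ms in events:
        if prev_end is not None and start_ms - prev_end > INTERVAL_THRESHOLD_MS:
            utterances.append(" ".join(current))
            current = []
        current.append(text)
        prev_end = end_ms
    if current:
        utterances.append(" ".join(current))
    return utterances
```

Each returned utterance can then be treated as one complete piece of information to be replied.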
In an optional embodiment of the present application, the reply intention corresponding to the information to be replied may be determined according to the information to be replied and the context of the information to be replied, and the corresponding reply content may further be determined according to the reply intention.
In this embodiment of the present application, optionally, the reply content corresponding to the information to be replied may be determined by using a mapping relationship between the information to be replied and the reply content.
In this embodiment of the present application, optionally, the mapping relationship between the information to be replied and the reply content may be represented by a second data analyzer. Correspondingly, the method may further include: training on training data to obtain the second data analyzer; the second data analyzer may be configured to characterize the mapping relationship between the information to be replied and the reply content; the training data may include: the above and the information to be replied in a corpus, and the reply content in the corpus. Optionally, the corpus may be a dialogue corpus, and in particular, the dialogue corpus may be a telephone dialogue corpus.
In an alternative embodiment of the present application, a mathematical model may be trained based on training data to obtain the second data analyzer, which may characterize the mapping between the input data (the above and the information to be replied) and the output data (the reply content).
In another alternative embodiment of the present application, the reply content may be determined according to a reply instruction of the user. Accordingly, the method may further include: receiving a reply instruction sent by the second equipment; determining reply content corresponding to the first voice information according to the reply instruction; the first voice information may be information transmitted from the third device to the first device.
In practical application, the user can determine the reply instruction according to the real-time dialogue content. The reply instruction may include: complete reply content (e.g., sentences, etc.); alternatively, the reply instruction may include: keywords of the reply content, so that corresponding reply content may be obtained by expanding the keywords. It will be appreciated that embodiments of the present application are not limited to specific reply instructions. For example, the first voice information is "are you planning to buy a house recently", and the reply instruction includes the keyword "place name" and the keyword "house type"; corresponding reply content, such as "I would like to know about house types near the place name, thank you", can be obtained according to the keyword "place name" and the keyword "house type".
In yet another alternative embodiment of the present application, the reply content may be determined according to the communication type and the information to be replied. For example, a corresponding dialogue corpus may be predetermined for the communication type, so that, according to the information to be replied, searching may be performed in the dialogue corpus corresponding to the communication type to obtain reply content corresponding to the information to be replied. For example, information a matched with the information to be replied in the dialogue corpus can be obtained, and then the reply content is obtained from the following of the information a.
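The corpus-lookup scheme above — find an utterance (information A) that matches the information to be replied and return its following utterance — can be sketched as follows. The corpus contents, the exact-substring matching, and all names are illustrative assumptions; a real corpus would be much larger and the matching fuzzy.

```python
from typing import Optional

# Hypothetical per-communication-type dialogue corpus: alternating utterances.
DIALOGUE_CORPORA = {
    "property": [
        "are you planning to buy a house",
        "I would like to know about apartments nearby, thank you",
        "what is your budget",
        "please send the price list to my number",
    ],
}

def lookup_reply(communication_type: str, info_to_reply: str) -> Optional[str]:
    """Search the corpus of the given communication type for an utterance
    matching the information to be replied; return the utterance that
    follows it as the reply content, or None when nothing matches."""
    corpus = DIALOGUE_CORPORA.get(communication_type, [])
    for i, utterance in enumerate(corpus[:-1]):
        if utterance in info_to_reply.lower():
            return corpus[i + 1]
    return None
```

Predetermining a corpus per communication type keeps the search space small and the replies on-topic for that type of call.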
In an optional embodiment of the present application, the first voice information may be replied according to the reply mode information, so as to improve the rationality of the reply content.
The reply mode information specifically includes:
first reply mode information, used for representing continuing the consultation, so as to acquire required information for the user; or
second reply mode information, used for representing quickly ending the dialogue, so as to save the cost of the dialogue.
The reply mode information can constrain the duration of the dialogue, so that the cost spent on the dialogue, such as time cost and computing resources, can be saved while satisfying the user's intention.
The embodiment of the application can provide the following determination mode of reply mode information:
Determination mode 1
In determination mode 1, the reply mode information for the dialogue may be determined according to the matching information between the communication type and the user features corresponding to the second device.
User characteristics may refer to characteristics that a user has. The user of the embodiment of the application may include: a user of the second device. Optionally, the user characteristics may include at least one of the following: preference features and static features.
Static features are relatively stable features, such as the user's age, gender, region, education background, industry, profession, marital status, consumption level, identity (e.g., dad, mom, grandpa, grandma, etc.), and so on.
In contrast to the relative stability of the static features described above, preference features are typically dynamic and may vary as user behavior changes. In an alternative embodiment of the present application, a preference feature may refer to a user's preference for content, and may vary with the user's behavior with respect to the content (at least one of browsing behavior, searching behavior, collecting behavior, storing behavior, following behavior, selecting behavior, and evaluating behavior).
Examples of preference features may include: preferred communication types, etc. The embodiment of the application can provide a setting interface so that the user can set the preferred communication types through the setting interface; it can be understood that the preferred communication types may be updated over time. For example, if user A has a demand for buying a house during period 1, then a promotional call of the property type may be meaningful to that user during period 1, so the preferred communication types set may include: the property type. As another example, if user B is interested in investment during period 2, then promotional calls of the investment type may be meaningful to that user, and thus the preferred communication types set may include: the investment type.
Optionally, the preferred communication types may include: the relatives and friends type, etc., thus meeting the requirement of obtaining information when the user calls through strange telephone numbers.
Optionally, the preferred communication types may include: transaction types such as take-out type or express type, and the like, so that daily transactions can be prevented from being missed.
The preferred communication types may include: a property type, a medical insurance type, an automobile insurance type, a financial type, a stock investment type, or a loan type, etc.
Determination mode 1 determines the reply mode information for the dialogue according to the matching information between the communication type and the user features corresponding to the second device.
Optionally, if the matching information indicates a match, the reply mode information may include: the first reply mode information, used for representing continuing the consultation. For example, in the case where the communication type matches the user's preferred communication type, information related to the communication type can be further consulted, helping the user acquire more preferred information.
Optionally, if the matching information indicates no match, the reply mode information may include: the second reply mode information, used for representing quickly ending the dialogue. For example, in the case where the communication type does not match the user's preferred communication type, the dialogue may be ended quickly to save the cost spent on the dialogue.
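Determination mode 1 amounts to a simple membership test between the communication type and the user's preferred types; a hypothetical sketch, where the mode labels and names are placeholders:

```python
FIRST_REPLY_MODE = "continue_consultation"   # first reply mode: keep consulting
SECOND_REPLY_MODE = "end_quickly"            # second reply mode: end the dialogue fast

def reply_mode_by_preference(communication_type, preferred_types):
    """Determination mode 1: pick the reply mode from the matching information
    between the communication type and the user's preferred communication types."""
    if communication_type in preferred_types:
        return FIRST_REPLY_MODE
    return SECOND_REPLY_MODE
```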
Determination mode 2
In determination mode 2, the reply mode information for the dialogue may be determined according to the communication type and/or the remaining session duration information corresponding to the second device.
The remaining session length information may be used to limit the session length, which may affect the cost of consumption of the session or the tariff of the user.
In an alternative embodiment of the present application, a session duration quota may be determined per unit period (e.g., per month), and the first remaining session duration may be obtained according to the quota and the session duration already consumed within the unit period.
Alternatively, the remaining session duration information may be obtained according to the per-session duration and the first remaining session duration, for example, as the minimum of the two. The per-session duration may characterize the duration spent on one session, and may be 3 minutes or the like.
In this embodiment of the present application, optionally, if the remaining session duration information exceeds a duration threshold, the reply mode information may include: the first reply mode information, used for representing continuing the consultation. In this case, the remaining session duration is sufficient, so the consultation may be continued; alternatively, whether to continue the consultation may be determined according to the matching information between the communication type and the user features corresponding to the second device, for example, continuing the consultation if the matching information indicates a match, and quickly ending the dialogue otherwise.
In this embodiment of the present application, optionally, if the remaining session duration information does not exceed the duration threshold, the reply mode information includes: the second reply mode information, used for representing quickly ending the dialogue. In this case, the remaining session duration is insufficient, so the dialogue can be ended quickly.
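Determination mode 2 can be sketched with two small helpers: one computing the remaining session duration as the minimum of the per-session budget and what is left of a monthly quota, the other applying the duration threshold. All numbers and names below are illustrative assumptions:

```python
DURATION_THRESHOLD_MIN = 2  # hypothetical duration threshold, in minutes

def remaining_session_duration(monthly_quota_min, consumed_min, per_session_min=3):
    """Remaining duration for this session: the smaller of the per-session
    duration and the first remaining session duration of the monthly quota."""
    first_remaining = max(monthly_quota_min - consumed_min, 0)
    return min(per_session_min, first_remaining)

def reply_mode_by_duration(remaining_min, type_matches=True):
    """Determination mode 2: continue consulting only when enough duration
    remains (optionally also requiring the communication type to match the
    user's preferences); otherwise end the dialogue quickly."""
    if remaining_min > DURATION_THRESHOLD_MIN and type_matches:
        return "continue_consultation"
    return "end_quickly"
```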
The above description is provided for details of various technical solutions for determining reply content, and it will be understood that one skilled in the art may adopt one or a combination of the above various technical solutions according to practical application requirements.
In practical applications, TTS (Text To Speech) technology may be used to convert the reply content into speech, and the speech corresponding to the reply content may be sent to the third device.
In summary, the voice communication processing method of the embodiment of the application takes over the voice communication request for the user, so that the cost spent by the user in answering the incoming call can be saved. Particularly, in the case that the communication type is the harassment type, the harassment of the harassment call to the user can be reduced.
In addition, the embodiment of the application records the dialogue content, and the dialogue content can help the user to know the condition of the voice communication request, for example, the user can be helped to judge whether the voice communication request is meaningful to the user or not, and for example, the dialogue content can comprise information meaningful to the user and the like.
In addition, the embodiment of the application determines the communication type according to the actual dialogue content; even if the harassment telephone number library has not timely recorded a new harassment telephone number, the corresponding communication type can still be identified as the harassment type according to the dialogue content corresponding to that number. The embodiment of the application is thus not affected by the recording coverage and update speed of harassment telephone numbers, so the recognition accuracy of the communication type can be improved.
Method Embodiment II
Referring to fig. 4, a flowchart illustrating steps of a second embodiment of a voice communication processing method of the present application is shown, where the method may specifically include the following steps:
step 401, receiving a voice communication request; the voice communication request may be a request initiated by the third device to the second device;
step 402, establishing a dialogue with a third device;
step 403, determining dialogue content according to the first voice information of the third device;
step 404, recording dialogue content.
With respect to the first embodiment of the method shown in fig. 2, the method of this embodiment may further include:
step 405, after establishing a session with a third device, accessing the second device into the session.
In the embodiment of the application, after the first device establishes a dialogue with the third device, the second device is accessed to the dialogue, so that the requirement of accessing the dialogue halfway by the user can be met. In other words, in the embodiment of the application, the voice communication request is first taken over by the smart phone assistant for the user, and the second device can be accessed into the session, so as to meet the requirements of letting the smart phone assistant exit the session and the user access the session in the middle.
In this embodiment of the present application, optionally, accessing the second device into the dialogue specifically includes: sending a call instruction to a fourth device to enable the fourth device to call the second device; and establishing a connection with the fourth device.
The embodiment of the application can call the second device through the fourth device to establish a dialogue between the second device and the fourth device, and can establish a connection between the first device and the fourth device, so that the dialogue between the second device and the third device can be established.
Referring to fig. 5, a schematic diagram of an application environment of a voice communication processing method according to an embodiment of the present application is shown.
In fig. 5, a third device initiates an incoming call to a second device, which may be transferred to the first device to establish a conversation 1 between the first device and the third device. Assuming that after session 1 is established, the user of the second device has a need to access the session, the second device may be called with the fourth device to establish session 2 between the second device and the fourth device. And, a secure private connection may also be established between the first device and the fourth device to enable a conversation between the third device and the second device.
In this embodiment of the present application, different devices may use different communication identifiers, where the communication identifiers may include: telephone numbers, etc. For example, the first device, the second device, the third device, and the fourth device use different telephone numbers, respectively.
In summary, according to the voice communication processing method of the embodiment of the present application, after a first device establishes a session with a third device, the second device is connected to the session, so that the requirement of a user for midway connection to the session can be met.
For example, user C misses an incoming call due to being busy or the like, so the smart phone assistant helps user C take over the incoming call. Suppose the smart phone assistant has sent the communication type to user C; if the communication type is a communication type preferred by user C, user C may access the conversation, and the smart phone assistant may exit the conversation.
Method Embodiment III
Referring to fig. 6, a flowchart illustrating steps of a third embodiment of a voice communication processing method of the present application, applied to a first device, may specifically include the following steps:
step 601, establishing a dialogue with the third device by answering the transferred incoming call;
the incoming call may be an incoming call initiated by the third device to the second device;
Step 602, receiving first voice information sent by a third device, and determining information to be replied according to pause interval information of the first voice information;
in practical application, the speech corresponding to the information to be replied can be extracted from the recording file, and the text corresponding to the information to be replied can be obtained through speech recognition technology.
Step 603, determining a communication type corresponding to the incoming call according to the information to be replied and the above information to be replied;
in practical application, NLU (Natural Language Understanding) may be performed on the information to be replied and its context, and the communication type may be determined according to the natural language understanding result.
step 604, obtaining corresponding key information from the information to be replied and its above, and adding the key information to the dialogue note;
step 605, determining reply mode information for the session according to the communication type and/or the remaining session duration information corresponding to the second device;
step 606, determining reply content corresponding to the information to be replied according to the reply mode information, the communication type and the dialogue content;
step 607, converting the reply content into a target voice, and sending the target voice to the third device;
step 608, judging whether the reply content represents ending the dialogue; if yes, executing step 609, otherwise returning to step 602;
if the reply content represents ending the dialogue (e.g., saying goodbye), step 609 is performed; if the reply content does not represent ending the dialogue and the dialogue still needs to continue, step 602 is performed to continue waiting for the next round of speech from the third device, and the above process is repeated.
Step 609, send the dialogue note and dialogue content to the second device.
The dialogue notes and the dialogue recording of the current dialogue may together serve as the record information of the current dialogue, organized and provided to the user. Optionally, text information corresponding to the dialogue recording may also be sent.
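The step 601–609 loop can be condensed into a short sketch. Here `reply_fn` stands in for steps 603–606 (type recognition, key-information extraction, reply-mode selection, and reply generation), and every name is an illustrative assumption rather than the patent's actual implementation:

```python
def run_takeover_dialogue(incoming_utterances, reply_fn, farewell="bye"):
    """Minimal sketch of the step 601-609 loop: consume the caller's
    utterances in turn, generate replies, accumulate dialogue content and
    dialogue notes, and stop once the reply content represents ending the
    dialogue (step 608)."""
    dialogue_content, notes = [], []
    for utterance in incoming_utterances:
        dialogue_content.append(("caller", utterance))
        reply, key_info = reply_fn(utterance)     # stands in for steps 603-606
        dialogue_content.append(("assistant", reply))
        notes.extend(key_info)
        if farewell in reply.lower():             # step 608: dialogue ends?
            break
    return dialogue_content, notes                # step 609: send to the user
```

The returned pair corresponds to the dialogue record and dialogue notes that step 609 sends to the second device.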
Optionally, in the above processing procedure, data desensitization and security protection processing are performed during data transmission and storage to protect user privacy.
In summary, the voice communication processing method of the embodiment of the application has the following advantages:
firstly, the embodiment of the application takes over the voice communication request for the user, so that the cost spent by the user in answering the incoming call can be saved. Particularly, in the case that the communication type is the harassment type, the harassment caused to the user by harassing calls can be reduced.
secondly, the communication type is determined according to the actual dialogue content; even if the harassment telephone number library has not timely recorded a new harassment telephone number, the corresponding communication type can still be identified as the harassment type according to the dialogue content corresponding to that number. The embodiment of the application is thus not affected by the recording coverage and update speed of harassment telephone numbers, so the recognition accuracy of the communication type can be improved.
Furthermore, the embodiment of the application can determine the processing mode information or the reply mode information of the incoming call according to the preference features of the user. Optionally, if the communication type is a communication type preferred by the user, the voice communication request can be answered on the user's behalf, and information preferred by the user can be obtained through intelligent interaction, thereby providing personalized service for the user. For example, in the case of receiving an incoming call of the property type, user D is relatively insensitive to the property type, so the processing mode information adopted may be rejecting, and the reply mode information adopted may be quickly ending the dialogue. As another example, in the case of receiving an incoming call of the property type, user E is interested in the property type, so the processing mode information adopted may be answering, the reply mode information adopted may be continuing the consultation, and key information may be obtained from the dialogue content and provided to user E.
In addition, the embodiment of the application can send the real-time dialogue content to the second device, so that the user can view the real-time dialogue content on the interface. The user may instruct the smart phone assistant to determine the reply content using text or voice input.
And, during the conversation between the smart phone assistant and the calling device, the user can make the smart phone assistant exit the conversation and access the conversation himself or herself. This can improve the intelligence of the smart phone assistant, making it more like a real, approachable, and flexible secretary rather than just a conversation robot.
In addition, the embodiment of the application can extract key information from the dialogue content and send the key information and the dialogue record to the user after the dialogue is ended. Therefore, the information acquisition efficiency of the user can be improved.
Method Embodiment IV
Referring to fig. 7, a flowchart illustrating steps of a fourth embodiment of a voice communication processing method of the present application is shown. The method may be applied to a second device, where a voice communication request of the second device is transferred to a first device, and the method may specifically include the following steps:
step 701, receiving dialogue content corresponding to a voice communication request from first equipment;
Step 702, outputting the dialogue content.
In this embodiment of the present application, in a case where a voice communication request of the second device is taken over by the first device, dialogue content corresponding to the voice communication request may be received from the first device. The dialogue content can enable the user of the second device to know the situation of the voice communication request, so as to help the user of the second device judge whether the voice communication request is meaningful to him or her.
In this embodiment of the present application, optionally, the dialogue content includes: a plurality of speech segments arranged in time order, where each speech segment corresponds to the dialogue identity corresponding to its voice information.
In this embodiment of the present application, optionally, the voice communication request is sent to the first device when the user state information corresponding to the second device meets a preset condition.
In this embodiment of the present application, optionally, the voice communication request includes:
a voice incoming call request; or
a voice instant messaging request.
In an embodiment of the present application, optionally, the method may further include:
and sending the dialogue content to the intelligent equipment so as to enable the intelligent equipment to play the dialogue content.
A smart device refers to any device, appliance, or machine having computing and processing capabilities. Smart devices are products of combining traditional electrical devices with computer technology, data processing technology, control theory, sensor technology, network communication technology, power electronics technology, and the like.
In this embodiment of the present application, optionally, the foregoing smart device may include: a smart home device. The smart home device may include: an intelligent switch, an intelligent lighting device, an intelligent refrigerator, an intelligent washing machine, an intelligent door lock, an intelligent entrance guard, etc. According to the embodiment of the application, an electroacoustic transducer assembly, such as a loudspeaker, can be integrated into the smart device, and the electroacoustic transducer assembly can play the dialogue content. This can meet the user's need to listen to the dialogue content while in the spatial environment corresponding to the smart device. For example, if the user is cooking in the kitchen, the dialogue content can be played through a smart device in the kitchen, such as an intelligent switch or intelligent lighting device.
In an embodiment of the present application, optionally, a communication network between the second device and the smart device may include: bluetooth network, infrared network, or WIFI network, etc., it is to be understood that the embodiments of the present application are not limited to a specific communication network between the second device and the smart device.
In an embodiment of the present application, optionally, the method may further include: receiving user voice sent by intelligent equipment; and transmitting the user voice to the first equipment.
The embodiment of the application can collect the user voice through the intelligent device so as to apply the user voice to the dialogue process, for example, the user voice can be used as a reply instruction in the dialogue process; alternatively, the user speech may be used as dialogue content in a dialogue process. An acoustic-electric transduction component, such as a microphone, may be provided in the smart device for capturing the user's voice.
In an embodiment of the present application, optionally, the method may further include: receiving takeover information corresponding to the voice communication request from the first device;
the takeover information may include at least one of the following information:
request time, communication identification, communication type, and processing mode information.
The above-mentioned takeover information enables the user of the second device to learn about the voice communication request, so as to help the user of the second device judge whether the voice communication request is meaningful to himself or herself. In the case that the voice communication request is meaningful to the user, the takeover information may further include key information that is meaningful to the user, so as to improve the information acquisition efficiency of the user.
In this embodiment of the present application, optionally, the processing mode information specifically includes: answering or hanging up;
in the case that the processing mode information includes answering, the takeover information may include: the conversation content, and/or key information extracted from the conversation content. In the case that the incoming call is meaningful to the user, the takeover information may further include key information that is meaningful to the user, so as to improve the information acquisition efficiency of the user.
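The takeover information described above can be sketched as a simple record type. This is an illustrative data layout only, assuming field names and example values not specified by the application:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TakeoverInfo:
    """Illustrative container for the takeover information fields."""
    request_time: str
    communication_id: str      # e.g., the caller's number
    communication_type: str    # e.g., "courier", "promotion"
    processing_mode: str       # "answer" or "hang_up"
    dialogue_content: Optional[str] = None
    key_info: Optional[str] = None


def build_takeover_info(request_time: str, comm_id: str, comm_type: str,
                        answered: bool, dialogue_content: Optional[str] = None,
                        key_info: Optional[str] = None) -> TakeoverInfo:
    # Dialogue content and key information are attached only when the
    # request was answered, matching the description above.
    if answered:
        return TakeoverInfo(request_time, comm_id, comm_type, "answer",
                            dialogue_content, key_info)
    return TakeoverInfo(request_time, comm_id, comm_type, "hang_up")
```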
Optionally, the method may further include: after the first device establishes a dialogue with the third device, the dialogue is accessed. According to the embodiment of the application, the voice communication request is taken over for the user through the intelligent telephone assistant, the user corresponding to the second equipment can be accessed into the conversation, and the requirements of enabling the intelligent telephone assistant to exit the conversation and enabling the user to access the conversation halfway can be met.
Optionally, the accessing the session specifically includes: establishing a dialogue with a fourth device according to a call request sent by the fourth device; and the fourth device establishes connection with the first device.
Optionally, the method may further include: sending a reply instruction to the first device so that the first device determines reply content corresponding to the first voice information according to the reply instruction; the first voice information may be information transmitted from the third device to the first device.
The reply instruction may instruct the first device to determine reply content during the conversation. For example, if the user is interested in a certain communication type a, the reply instruction may instruct to consult with respect to the relevant information of the communication type a, or the like.
Method embodiment five
Referring to fig. 8, a flowchart illustrating steps of a fifth embodiment of a voice communication processing method of the present application is shown, where the method specifically may include the following steps:
step 801, receiving a takeover instruction for a dialogue;
step 802, responding to the take-over instruction to determine the dialogue content of the dialogue;
step 803, recording the dialogue content.
The embodiment of the application can be applied to processing equipment, such as an intelligent sound box, having a voice processing function, where the voice processing function may include a voice acquisition function and a voice playing function. The processing device may include: an electro-acoustic transducer assembly (e.g., a loudspeaker) and an acoustic-electric transducer assembly (e.g., a microphone).
The embodiment of the application can be suitable for a voice communication scene in which a user leaves a dialogue halfway. After the user establishes the dialogue, if the user has to leave the dialogue midway (for example, to attend to something else), a take-over instruction can be triggered, and the processing device can take over the dialogue. Specifically, the processing device may determine the dialogue content and record the dialogue content, so that the user learns the follow-up of the dialogue from the dialogue content.
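Steps 801-803 can be sketched as a single take-over routine. This is a minimal sketch assuming illustrative function names; the speech collector stands in for whatever transcription pipeline the processing device uses:

```python
from typing import Callable, List


def take_over_dialogue(instruction: str,
                       collect_peer_speech: Callable[[], str],
                       record: List[str]) -> str:
    """Steps 801-803: on a take-over instruction, determine the dialogue
    content from the peer's (third) voice information and record it."""
    if instruction != "take_over":                  # step 801: check instruction
        raise ValueError("not a take-over instruction")
    content = collect_peer_speech()                 # step 802: determine content
    record.append(content)                          # step 803: record it
    return content
```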
The voice communication scenario may include: a two-party voice communication scenario, or a voice communication scenario with more than two parties; the latter may include a teleconferencing scenario, and the like.
In this embodiment of the present application, optionally, the determining the session content of the session may specifically include: and collecting third voice information of the opposite end of the conversation, and determining conversation content of the conversation according to the third voice information.
In this embodiment of the present application, optionally, the determining the session content of the session may specifically include: determining information to be replied according to the pause interval information of the third voice information; and determining the reply content corresponding to the information to be replied according to the information to be replied and the context of the information to be replied.
In this embodiment of the present application, optionally, the determining the session content of the session may specifically include: determining information to be replied according to the pause interval information of the third voice information; and determining reply content corresponding to the information to be replied according to the reply instruction of the user.
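The pause-interval rule above can be sketched as follows: consecutive peer utterances are grouped into one piece of information to be replied, and a pause longer than a threshold marks the boundary. The threshold value and input shape are assumptions for illustration:

```python
from typing import List, Tuple


def extract_units_to_reply(utterances: List[Tuple[str, float]],
                           pause_threshold: float = 0.8) -> List[str]:
    """utterances: time-ordered (text, pause_after_seconds) pairs from the
    dialogue peer. A pause at or above the threshold ends one unit of
    information to be replied."""
    units, current = [], []
    for text, pause_after in utterances:
        current.append(text)
        if pause_after >= pause_threshold:
            units.append(" ".join(current))
            current = []
    if current:  # flush a trailing unit with no long pause after it
        units.append(" ".join(current))
    return units
```

Each returned unit would then be answered either from its context or from the user's reply instruction, per the two optional rules above.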
In this embodiment of the present application, optionally, the recording session content may specifically include:
recording the voice information of the dialogue; and/or
And recording text information corresponding to the voice information of the dialogue.
In this embodiment of the present application, optionally, the recording the session content includes:
segmenting the voice information of the dialogue to obtain corresponding voice segments;
and recording the voice segment and text information corresponding to the voice segment.
In this embodiment of the present application, optionally, the segmenting the voice information of the session includes:
and segmenting the voice information of the conversation according to the conversation identity corresponding to the voice information.
In this embodiment of the present application, optionally, the session content includes: a plurality of speech segments arranged according to time; the voice segment corresponds to the dialogue identity corresponding to the voice information.
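Segmentation by dialogue identity, producing the time-ordered, identity-tagged speech segments described above, can be sketched like this. The input format (time-stamped, speaker-labelled frames) is an assumption for the example:

```python
from typing import Dict, List, Tuple


def segment_by_speaker(labelled_frames: List[Tuple[float, str, str]]
                       ) -> List[Dict[str, object]]:
    """labelled_frames: time-ordered (timestamp, speaker, text) tuples.
    Consecutive frames from the same speaker are merged into one segment,
    so the result is a time-ordered list of identity-tagged segments."""
    segments: List[Dict[str, object]] = []
    for ts, speaker, text in labelled_frames:
        if segments and segments[-1]["speaker"] == speaker:
            # Same dialogue identity keeps extending the current segment.
            segments[-1]["text"] = f'{segments[-1]["text"]} {text}'
        else:
            segments.append({"start": ts, "speaker": speaker, "text": text})
    return segments
```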
In an embodiment of the present application, optionally, the method may further include: the dialogue content is output to the user, and the embodiment of the application can play and/or display the dialogue content.
In summary, in the voice communication processing method according to the embodiment of the present application, after a user establishes a session, if the user has to leave the session midway (for example, to attend to something else), a take-over instruction may be triggered, and the processing device may take over the session. Specifically, the processing device may determine the dialogue content and record the dialogue content, so that the user learns the follow-up of the dialogue from the dialogue content.
Method embodiment six
Referring to fig. 9, a flowchart illustrating steps of a sixth embodiment of a voice communication processing method of the present application, where the method is applied to a first device, may specifically include the following steps:
step 901, receiving an access instruction sent by a second device;
step 902, accessing a session corresponding to the second device according to the access instruction;
step 903, determining dialogue content according to the voice information of the dialogue opposite terminal;
step 904, recording dialogue content;
step 905, transmitting the session content to the second device.
The embodiment of the application can be applied to a first device, such as an intelligent sound box, having a voice processing function, where the voice processing function may include a voice acquisition function and a voice playing function. The first device may include: an electro-acoustic transducer assembly (e.g., a loudspeaker) and an acoustic-electric transducer assembly (e.g., a microphone).
The embodiment of the application can be suitable for a voice communication scene in which a user leaves a dialogue halfway. After the session is established, if the user leaves the session halfway due to something, the second device may trigger an access instruction, and then the second device may send the access instruction to the first device, so that the first device accesses the session. The access instruction may include: information corresponding to the dialogue.
In this embodiment, optionally, the first device may access the session corresponding to the second device according to the method described in fig. 4 or fig. 5. For example, the second device sends a call instruction to the fifth device, so that the fifth device calls the first device; and establishing connection with the fifth device.
According to the method and the device, the first device can be called through the fifth device to establish a dialogue between the first device and the fifth device, and connection can be established between the second device and the fifth device, so that the dialogue between the dialogue opposite ends corresponding to the first device and the second device can be established.
In the embodiment of the application, the first device may determine the dialogue content and record the dialogue content, so that the user of the second device knows the follow-up condition of the dialogue according to the dialogue content. The process of recording the dialogue content may refer to the foregoing embodiments, which are not described herein in detail, but may refer to each other.
Method embodiment seven
Referring to fig. 10, a flowchart illustrating steps of a seventh embodiment of a voice communication processing method of the present application is shown, where the method is applied to a second device and may specifically include the following steps:
step 1001, after establishing a session, sending an access instruction to a first device, so that the first device accesses the session;
step 1002, receiving dialogue content sent by the first device.
The embodiment of the application can be suitable for a voice communication scene that a user leaves a dialogue halfway. After the session is established, if the user leaves the session halfway due to something, the second device may trigger an access instruction, and then the second device may send the access instruction to the first device, so that the first device accesses the session. The access instruction may include: and information corresponding to the dialogue, such as communication identification of the opposite end of the dialogue, and the like.
It should be noted that, for simplicity of description, the method embodiments are shown as a series of acts, but it should be understood by those skilled in the art that the embodiments are not limited by the order of acts described, as some steps may occur in other orders or concurrently in accordance with the embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are all preferred embodiments and that the acts referred to are not necessarily required by the embodiments of the present application.
The embodiment of the application also provides a voice communication processing device.
Referring to fig. 11, there is shown a block diagram of an embodiment of a voice communication processing apparatus of the present application, which is applied to a first device, where the apparatus may specifically include the following modules:
a receiving module 1101, configured to receive a voice communication request; the voice communication request is a request initiated by the third equipment to the second equipment;
an establishing module 1102, configured to establish a dialogue with a third device;
a determining module 1103, configured to determine dialogue content according to the first voice information of the third device;
a recording module 1104 for recording the dialogue content.
Optionally, the recording module 1104 may include:
the first recording module is used for recording the voice information of the conversation; and/or
And the second recording module is used for recording text information corresponding to the voice information of the conversation.
Optionally, the recording module 1104 may include:
the segmentation module is used for segmenting the voice information of the conversation so as to obtain corresponding voice segments;
and the third recording module is used for recording the voice section and text information corresponding to the voice section.
Alternatively, the segmentation module may include:
And the segmentation module based on the dialogue identity is used for segmenting the dialogue voice information according to the dialogue identity corresponding to the voice information.
Optionally, the session content may include: a plurality of speech segments arranged according to time; the speech segments may correspond to dialog identities corresponding to the speech information.
Optionally, the receiving module 1101 is specifically configured to receive a voice communication request when the user status information corresponding to the second device meets a preset condition.
Optionally, the voice communication request may specifically include:
a voice incoming call request; or alternatively
A voice instant messaging request.
Optionally, the apparatus may further include:
and the access module is used for accessing the second equipment into the dialogue after the dialogue is established with the third equipment.
Optionally, the access module may include:
an instruction sending module, configured to send a call instruction to a fourth device, so that the fourth device calls the second device;
and the connection establishment module is used for establishing connection with the fourth equipment.
Optionally, the apparatus may further include:
the reply instruction receiving module is used for receiving a reply instruction sent by the second equipment;
And the reply content determining module is used for determining reply content corresponding to the first voice information according to the reply instruction.
Optionally, the apparatus may further include:
and the communication type determining module is used for determining the communication type corresponding to the voice communication request according to the dialogue content.
Optionally, the apparatus may further include:
and the first reply mode information determining module is used for determining reply mode information aiming at the dialogue according to the matching information between the communication type and the user characteristics corresponding to the second equipment.
Optionally, the apparatus may further include:
and the second reply mode information determining module is used for determining reply mode information aiming at the conversation according to the communication type and/or the residual conversation duration information corresponding to the second equipment.
Optionally, the reply mode information may include:
the first reply mode information is used for representing continuous consultation; or alternatively
And the second reply mode information is used for representing the quick ending dialogue.
Optionally, the matching information is matching, and the reply mode information may include: the first reply mode information is used for representing continuous consultation; or alternatively
The matching information is not matching, and the reply mode information may include: and the second reply mode information is used for representing the quick ending dialogue.
Optionally, the remaining session duration information exceeds a duration threshold, and the reply mode information may include: the first reply mode information is used for representing continuous consultation; or alternatively
The remaining session duration information does not exceed the duration threshold, and the reply mode information may include: and the second reply mode information is used for representing the quick ending dialogue.
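The two optional reply-mode rules above (matching information and remaining dialogue duration) can be sketched as one decision function. The mode labels, parameter names, and the default threshold are assumptions for illustration:

```python
from typing import Optional

FIRST_REPLY_MODE = "continue_consultation"   # keep consulting the peer
SECOND_REPLY_MODE = "end_dialogue_quickly"   # wrap the dialogue up fast


def choose_reply_mode(type_matches_user: Optional[bool] = None,
                      remaining_seconds: Optional[float] = None,
                      duration_threshold: float = 60.0) -> str:
    """Pick a reply mode from the matching information between the
    communication type and the user features, and/or from the remaining
    dialogue duration corresponding to the second device."""
    if type_matches_user is False:
        # Communication type does not match the user's features.
        return SECOND_REPLY_MODE
    if remaining_seconds is not None and remaining_seconds <= duration_threshold:
        # Remaining duration does not exceed the threshold.
        return SECOND_REPLY_MODE
    return FIRST_REPLY_MODE
```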
Optionally, the apparatus may further include:
and the dialogue content sending module is used for sending the dialogue content to the second equipment.
Optionally, the apparatus may further include:
the key information determining module is used for determining key information from the dialogue content;
and the key information sending module is used for sending the key information to the second equipment.
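One simple way to determine key information from dialogue content, sketched below, is to keep only sentences containing likely-important details. The patterns (times, multi-digit numbers, pickup codes) are illustrative placeholders; the application does not prescribe how key information is extracted:

```python
import re
from typing import List, Optional


def extract_key_info(dialogue_content: str,
                     patterns: Optional[List[str]] = None) -> str:
    """Return the sentences of the dialogue content that match any of
    the given patterns; used as a rough stand-in for key information."""
    patterns = patterns or [r"\d{1,2}:\d{2}",   # times such as 17:30
                            r"pickup code",     # pickup-code mentions
                            r"\d{4,}"]          # long numbers (codes, phones)
    key = []
    for sentence in re.split(r"(?<=[.!?])\s+", dialogue_content):
        if any(re.search(p, sentence, re.IGNORECASE) for p in patterns):
            key.append(sentence)
    return " ".join(key)
```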
Optionally, the apparatus may further include:
the information to be replied determining module is used for determining information to be replied according to the pause interval information of the first voice information;
And the reply content determining module is used for determining reply content corresponding to the information to be replied according to the information to be replied and the context of the information to be replied.
Optionally, the apparatus may further include:
the takeover information sending module is used for sending takeover information corresponding to the voice communication request to the second equipment;
the takeover information includes at least one of the following information:
request time, communication identification, communication type, and processing mode information.
Optionally, the processing mode information may include: answering or hanging up;
in the case that the processing mode information includes answering, the takeover information includes: the conversation content, and/or key information extracted from the conversation content.
Referring to fig. 12, there is shown a block diagram of an embodiment of a voice communication processing apparatus of the present application, which is applied to a second device, where a voice communication request of the second device is sent to a first device, where the apparatus may specifically include the following modules:
a receiving module 1201, configured to receive, from the first device, dialogue content corresponding to the voice communication request;
an output module 1202 for outputting the dialog content.
Optionally, the session content includes: a plurality of speech segments arranged according to time; the voice segment corresponds to the dialogue identity corresponding to the voice information.
Optionally, the voice communication request is sent to the first device when the user state information corresponding to the second device meets a preset condition.
Optionally, the voice communication request may include:
a voice incoming call request; or alternatively
A voice instant messaging request.
Optionally, the apparatus may further include:
and the dialogue content sending module is used for sending the dialogue content to the intelligent equipment so as to enable the intelligent equipment to play the dialogue content.
Optionally, the apparatus further includes:
the user voice receiving module is used for receiving user voice sent by the intelligent equipment;
and the user voice sending module is used for sending the user voice to the first equipment.
Optionally, the smart device may include: an intelligent household device.
Optionally, the apparatus may further include: the takeover information receiving module is used for receiving takeover information corresponding to the voice communication request from the first equipment;
the takeover information may include at least one of the following information:
Request time, communication identification, communication type, and processing mode information.
Optionally, the processing mode information may include: answering or hanging up;
in the case that the processing mode information includes answering, the takeover information includes: the conversation content, and/or key information extracted from the conversation content.
Optionally, the apparatus may further include:
and the access module is used for accessing the dialogue after the dialogue between the first equipment and the third equipment is established.
Optionally, the access module may include:
the dialogue establishing module is used for establishing dialogue with the fourth equipment according to the call request sent by the fourth equipment; and the fourth device establishes connection with the first device.
For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described by differences from other embodiments, and identical and similar parts between the embodiments are all enough to be referred to each other.
The specific manner in which the various modules perform the operations in the apparatus of the above embodiments have been described in detail in connection with the embodiments of the method, and will not be described in detail herein.
Embodiments of the present application may be implemented as a system or device configured as desired using any suitable hardware and/or software. Fig. 13 schematically illustrates an exemplary device 1300 that may be used to implement various embodiments described above in this application.
For one embodiment, fig. 13 illustrates an exemplary device 1300, the device 1300 may include: one or more processors 1302, a system control module (chipset) 1304 coupled to at least one of the processors 1302, a system memory 1306 coupled to the system control module 1304, a non-volatile memory (NVM)/storage 1308 coupled to the system control module 1304, one or more input/output devices 1310 coupled to the system control module 1304, and a network interface 1312 coupled to the system control module 1304. The system memory 1306 may include: instructions 1362, the instructions 1362 being executable by the one or more processors 1302.
The processor 1302 may include one or more single-core or multi-core processors, and the processor 1302 may include any combination of general-purpose or special-purpose processors (e.g., graphics processors, application processors, baseband processors, etc.). In some embodiments, the device 1300 can be a server, a target device, a wireless device, etc. as described in the embodiments of the present application.
In some embodiments, the apparatus 1300 may include one or more machine-readable media (e.g., system memory 1306 or NVM/storage 1308) having instructions and one or more processors 1302, in combination with the one or more machine-readable media, configured to execute the instructions to implement the modules included in the foregoing apparatus to perform the actions described above in embodiments of the present application.
The system control module 1304 of an embodiment may include any suitable interface controller for providing any suitable interface to at least one of the processors 1302 and/or any suitable device or component in communication with the system control module 1304.
The system control module 1304 of an embodiment may include one or more memory controllers to provide an interface to the system memory 1306. The memory controller may be a hardware module, a software module, and/or a firmware module.
The system memory 1306 of one embodiment may be used to load and store data and/or instructions 1362. For one embodiment, the system memory 1306 may include any suitable volatile memory, such as suitable dynamic random access memory (DRAM). In some embodiments, the system memory 1306 may include: double data rate type four synchronous dynamic random access memory (DDR4 SDRAM).
The system control module 1304 of an embodiment may include one or more input/output controllers to provide interfaces to the NVM/storage 1308 and the input/output device(s) 1310.
NVM/storage 1308 for one embodiment may be used to store data and/or instructions 1382. NVM/storage 1308 may include any suitable nonvolatile memory (e.g., flash memory, etc.) and/or may include any suitable nonvolatile storage device(s), such as, for example, one or more Hard Disk Drives (HDDs), one or more Compact Disc (CD) drives, and/or one or more Digital Versatile Disc (DVD) drives, etc.
NVM/storage 1308 may include storage resources that are physically part of the device on which device 1300 is installed, or which may be accessed by the device without being part of the device. For example, NVM/storage 1308 may be accessed over a network via network interface 1312 and/or through input/output devices 1310.
Input/output device(s) 1310 for one embodiment may provide an interface for device 1300 to communicate with any other suitable device, input/output device 1310 may include a communication component, an audio component, a sensor component, and the like.
The network interface 1312 for one embodiment may provide an interface for the device 1300 to communicate over one or more networks and/or with any other suitable device, and the device 1300 may communicate wirelessly with one or more components of a wireless network according to any of one or more wireless network standards and/or protocols, for example, accessing a wireless network based on a communication standard such as WiFi (Wireless Fidelity), 2G, 3G, 4G, 5G, or a combination thereof.
For one embodiment, at least one of the processors 1302 may be packaged together with logic of one or more controllers (e.g., memory controllers) of the system control module 1304. For one embodiment, at least one of the processors 1302 may be packaged together with logic of one or more controllers of the system control module 1304 to form a System in Package (SiP). For one embodiment, at least one of the processors 1302 may be integrated on the same die with logic of one or more controllers of the system control module 1304. For one embodiment, at least one of the processors 1302 may be integrated on the same chip with logic of one or more controllers of the system control module 1304 to form a system on chip (SoC).
In various embodiments, device 1300 may include, but is not limited to: a desktop computing device or a mobile computing device (e.g., a laptop computing device, a handheld computing device, a tablet, a netbook, etc.), among others. In various embodiments, device 1300 may have more or fewer components and/or different architectures. For example, in some embodiments, device 1300 may include one or more cameras, keyboards, liquid Crystal Display (LCD) screens (including touch screen displays), non-volatile memory ports, multiple antennas, graphics chips, application Specific Integrated Circuits (ASICs), and speakers.
Wherein if the display comprises a touch panel, the display screen may be implemented as a touch screen display to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or slide action, but also the duration and pressure associated with the touch or slide operation.
The embodiment of the application also provides a non-volatile readable storage medium, in which one or more modules (programs) are stored; when the one or more modules are applied to a device, the device may be caused to execute instructions for performing the methods in the embodiments of the application.
In one example, an apparatus is provided, comprising: one or more processors; and instructions in one or more machine-readable media stored thereon, which when executed by the one or more processors, cause the apparatus to perform a method as in an embodiment of the present application, the method may comprise: the method shown in one or more of fig. 1-10.
One or more machine-readable media are also provided in one example, having instructions stored thereon that, when executed by one or more processors, cause an apparatus to perform a method as in an embodiment of the present application, the method may comprise: the method shown in one or more of fig. 1-10.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present embodiments have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the present application.
Finally, it is further noted that relational terms such as first and second are used herein solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing has described in detail a voice communication processing method, a voice communication processing apparatus, a device, and a machine-readable medium provided by the present application; specific examples are used herein to illustrate the principles and embodiments of the present application, and the above embodiments are described only to facilitate understanding of the method and core ideas of the present application. Meanwhile, those skilled in the art may make changes to the specific embodiments and application scope according to the ideas of the present application. In view of the above, the content of this specification should not be construed as limiting the present application.

Claims (43)

1. A method for processing voice communications, applied to a first device, the method comprising:
receiving a voice communication request, wherein the voice communication request is a request initiated by a third device to a second device;
establishing a dialogue with the third device;
determining dialogue content according to first voice information of the third device;
recording the dialogue content;
determining a communication type corresponding to the voice communication request according to the dialogue content; and
determining reply mode information for the dialogue according to matching information between the communication type and a user characteristic corresponding to the second device.
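The flow recited in claim 1 can be illustrated with a minimal sketch. All concrete values here are assumptions: the claims do not enumerate communication types, user characteristics, or a classification rule, so the keyword matching and the labels `"delivery"` and `"marketing"` are purely hypothetical placeholders.

```python
from dataclasses import dataclass, field

@dataclass
class Dialogue:
    content: list = field(default_factory=list)  # recorded utterances from the third device
    communication_type: str = ""

def classify_communication(content):
    """Toy classifier: infer a communication type from the dialogue content."""
    text = " ".join(content).lower()
    if "delivery" in text:
        return "delivery"
    if "loan" in text or "promotion" in text:
        return "marketing"
    return "unknown"

def choose_reply_mode(communication_type, user_interests):
    """Continue the consultation when the type matches the callee's
    characteristics; otherwise end the dialogue quickly."""
    if communication_type in user_interests:
        return "continue_consultation"
    return "end_quickly"

# End-to-end: record content, classify it, then pick a reply mode.
dialogue = Dialogue(content=["Hello, your delivery arrives today."])
dialogue.communication_type = classify_communication(dialogue.content)
mode = choose_reply_mode(dialogue.communication_type, {"delivery"})
```

The two-stage structure (classify the request from recorded content, then match against the callee's characteristics) is the part grounded in the claim; everything else is illustrative.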
2. The method of claim 1, wherein the recording of the dialogue content comprises:
recording voice information of the dialogue; and/or
recording text information corresponding to the voice information of the dialogue.
3. The method of claim 1, wherein the recording of the dialogue content comprises:
segmenting the voice information of the dialogue to obtain corresponding voice segments; and
recording the voice segments and text information corresponding to the voice segments.
4. The method of claim 3, wherein the segmenting of the voice information of the dialogue comprises:
segmenting the voice information of the dialogue according to a dialogue identity corresponding to the voice information.
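The segmentation by dialogue identity in claims 3–4 can be sketched as follows. The `(speaker, text)` turn representation is an assumption; the claims operate on voice information, and this sketch simply groups consecutive turns of the same party into one segment.

```python
def segment_by_identity(turns):
    """Group consecutive (speaker_id, text) turns of the same speaker
    into one segment, yielding (speaker_id, [texts]) pairs."""
    segments = []
    for speaker, text in turns:
        if segments and segments[-1][0] == speaker:
            segments[-1][1].append(text)        # extend the current segment
        else:
            segments.append((speaker, [text]))  # new speaker starts a new segment
    return segments

turns = [
    ("caller", "Hello, is this Mr. Li?"),
    ("caller", "I'm calling about your order."),
    ("assistant", "Yes, go ahead."),
]
segments = segment_by_identity(turns)
# two segments: one for the caller (2 utterances), one for the assistant (1)
```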
5. The method of claim 1, wherein the dialogue content comprises a plurality of voice segments arranged in time order, each voice segment corresponding to a dialogue identity corresponding to its voice information.
6. The method of claim 1, wherein the receiving of the voice communication request comprises:
receiving the voice communication request in a case where user state information corresponding to the second device meets a preset condition.
7. The method according to any one of claims 1 to 6, wherein the voice communication request comprises:
a voice incoming call request; or
a voice instant messaging request.
8. The method according to any one of claims 1 to 6, further comprising:
after establishing the dialogue with the third device, connecting the second device to the dialogue.
9. The method of claim 8, wherein the connecting of the second device to the dialogue comprises:
sending a call instruction to a fourth device so that the fourth device calls the second device; and
establishing a connection with the fourth device.
10. The method according to any one of claims 1 to 6, further comprising:
receiving a reply instruction sent by the second device; and
determining reply content corresponding to the first voice information according to the reply instruction.
11. The method of claim 1, further comprising:
determining the reply mode information for the dialogue according to the communication type and/or remaining dialogue duration information corresponding to the second device.
12. The method of claim 1 or 11, wherein the reply mode information comprises:
first reply mode information for indicating continued consultation; or
second reply mode information for indicating quickly ending the dialogue.
13. The method of claim 1, wherein when the matching information indicates a match, the reply mode information comprises first reply mode information for indicating continued consultation; or
when the matching information indicates a mismatch, the reply mode information comprises second reply mode information for indicating quickly ending the dialogue.
14. The method of claim 11, wherein when the remaining dialogue duration information exceeds a duration threshold, the reply mode information comprises first reply mode information for indicating continued consultation; or
when the remaining dialogue duration information does not exceed the duration threshold, the reply mode information comprises second reply mode information for indicating quickly ending the dialogue.
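The decisions in claims 13 and 14 can be combined into one small sketch. Note the hedge: the claims state the match condition and the duration-threshold condition independently; combining them with a logical AND, and the 60-second threshold, are assumptions made only for illustration.

```python
def reply_mode(matched, remaining_seconds, threshold=60):
    """Return the first reply mode (continue the consultation) only when the
    communication type matches the user's characteristics AND the remaining
    dialogue duration exceeds the threshold; otherwise return the second
    reply mode (end the dialogue quickly). The AND combination and the
    60 s default threshold are illustrative assumptions."""
    if matched and remaining_seconds > threshold:
        return "continue_consultation"
    return "end_quickly"
```

For example, a matched marketing-free consultation with two minutes of remaining dialogue time would continue, while a mismatched request would be wrapped up quickly regardless of remaining time.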
15. The method according to any one of claims 1 to 6, further comprising:
sending the dialogue content to the second device.
16. The method according to any one of claims 1 to 6, further comprising:
determining key information from the dialogue content; and
sending the key information to the second device.
17. The method according to any one of claims 1 to 6, further comprising:
determining information to be replied to according to pause interval information of the first voice information; and
determining reply content corresponding to the information to be replied to according to the information to be replied to and its preceding context.
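Claim 17's use of pause intervals can be sketched as below. The timestamped-utterance stream and the 1.5-second silence threshold are assumptions; the claim only says that pause interval information determines the information to be replied to, with earlier speech serving as its preceding context.

```python
def split_on_pause(events, pause_threshold=1.5):
    """Given (timestamp_seconds, utterance) pairs from the caller, treat a
    silence longer than pause_threshold as closing one unit of speech:
    everything before the last long pause becomes context, and what follows
    is the information awaiting a reply."""
    pending, context = [], []
    last_t = None
    for t, utterance in events:
        if last_t is not None and t - last_t > pause_threshold and pending:
            context.extend(pending)  # speech before the pause becomes context
            pending = []
        pending.append(utterance)
        last_t = t
    return context, pending  # (preceding context, information to be replied to)

events = [(0.0, "Hello,"), (0.4, "this is the courier."), (3.0, "Are you home now?")]
context, to_reply = split_on_pause(events)
```

Here the 2.6-second gap before the question marks the question itself as the information to reply to, with the greeting retained as context for generating the reply.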
18. The method according to any one of claims 1 to 6, further comprising:
sending takeover information corresponding to the voice communication request to the second device;
wherein the takeover information comprises at least one of the following:
a request time, a communication identifier, a communication type, and processing mode information.
19. The method of claim 18, wherein the processing mode information comprises: answering or hanging up; and
in a case where the processing mode information comprises answering, the takeover information comprises: the dialogue content, and/or key information extracted from the dialogue content.
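One way to picture the takeover information of claims 18–19 is as a record whose optional fields are populated only when the call was answered. The field names and values below are hypothetical; the claims only list the categories of information, not a concrete schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TakeoverInfo:
    request_time: str                        # when the request arrived
    communication_id: str                    # e.g. caller identifier
    communication_type: str                  # type inferred from the dialogue
    processing_mode: str                     # "answered" or "hung_up"
    dialogue_content: Optional[list] = None  # only present when answered
    key_info: Optional[str] = None           # only present when answered

# A hypothetical answered call, with dialogue content and extracted key info.
info = TakeoverInfo("2019-11-19 10:00", "call-001", "delivery", "answered",
                    dialogue_content=["Your parcel arrives today."],
                    key_info="parcel arriving today")
```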
20. A voice communication processing method, applied to a second device whose voice communication request is transferred to a first device, the method comprising:
receiving, from the first device, dialogue content corresponding to the voice communication request; and
outputting the dialogue content; wherein the first device determines a communication type corresponding to the voice communication request according to the dialogue content, and determines reply mode information for the dialogue according to matching information between the communication type and a user characteristic corresponding to the second device.
21. The method of claim 20, wherein the dialogue content comprises a plurality of voice segments arranged in time order, each voice segment corresponding to a dialogue identity corresponding to its voice information.
22. The method of claim 20, wherein the voice communication request is transferred to the first device in a case where user state information corresponding to the second device meets a preset condition.
23. The method of claim 20, wherein the voice communication request comprises:
a voice incoming call request; or
a voice instant messaging request.
24. The method of claim 20, further comprising:
sending the dialogue content to a smart device so that the smart device plays the dialogue content.
25. The method of claim 20, further comprising:
receiving user voice sent by a smart device; and
sending the user voice to the first device.
26. The method of claim 24 or 25, wherein the smart device comprises: a smart home device.
27. The method according to any one of claims 20 to 25, further comprising:
receiving, from the first device, takeover information corresponding to the voice communication request;
wherein the takeover information comprises at least one of the following:
a request time, a communication identifier, a communication type, and processing mode information.
28. The method of claim 27, wherein the processing mode information comprises: answering or hanging up; and
in a case where the processing mode information comprises answering, the takeover information comprises: the dialogue content, and/or key information extracted from the dialogue content.
29. The method according to any one of claims 20 to 25, further comprising:
after the first device establishes a dialogue with a third device, accessing the dialogue.
30. The method of claim 29, wherein the accessing of the dialogue comprises:
establishing the dialogue with a fourth device according to a call request sent by the fourth device, wherein the fourth device establishes a connection with the first device.
31. A method of processing voice communications, the method comprising:
receiving a takeover instruction for a dialogue, wherein the dialogue is established based on a voice communication request initiated by a third device to a second device;
determining dialogue content of the dialogue in response to the takeover instruction; and
recording the dialogue content; wherein a communication type corresponding to the voice communication request is determined according to the dialogue content, and reply mode information for the dialogue is determined according to matching information between the communication type and a user characteristic corresponding to the second device.
32. The method of claim 31, wherein the recording of the dialogue content comprises:
recording voice information of the dialogue; and/or
recording text information corresponding to the voice information of the dialogue.
33. The method of claim 31, wherein the recording of the dialogue content comprises:
segmenting the voice information of the dialogue to obtain corresponding voice segments; and
recording the voice segments and text information corresponding to the voice segments.
34. The method of claim 33, wherein the segmenting of the voice information of the dialogue comprises:
segmenting the voice information of the dialogue according to a dialogue identity corresponding to the voice information.
35. The method of claim 33, wherein the dialogue content comprises a plurality of voice segments arranged in time order, each voice segment corresponding to a dialogue identity corresponding to its voice information.
36. A method for processing voice communications, applied to a first device, the method comprising:
receiving an access instruction sent by a second device;
accessing, according to the access instruction, a dialogue corresponding to the second device, wherein the dialogue is established based on a voice communication request initiated by a third device to the second device;
determining dialogue content according to voice information of the opposite end of the dialogue;
recording the dialogue content; and
sending the dialogue content to the second device; wherein the first device determines a communication type corresponding to the voice communication request according to the dialogue content, and determines reply mode information for the dialogue according to matching information between the communication type and a user characteristic corresponding to the second device.
37. A method of processing voice communications, applied to a second device, the method comprising:
after a dialogue is established, sending an access instruction to a first device so that the first device accesses the dialogue, wherein the dialogue is established based on a voice communication request initiated by a third device to the second device; and
receiving dialogue content sent by the first device; wherein the first device determines a communication type corresponding to the voice communication request according to the dialogue content, and determines reply mode information for the dialogue according to matching information between the communication type and a user characteristic corresponding to the second device.
38. A voice communication processing apparatus, applied to a first device, comprising:
a receiving module, configured to receive a voice communication request, wherein the voice communication request is a request initiated by a third device to a second device;
an establishing module, configured to establish a dialogue with the third device;
a determining module, configured to determine dialogue content according to first voice information of the third device; and
a recording module, configured to record the dialogue content; wherein the first device determines a communication type corresponding to the voice communication request according to the dialogue content, and determines reply mode information for the dialogue according to matching information between the communication type and a user characteristic corresponding to the second device.
39. A voice communication processing apparatus, applied to a second device whose voice communication request is transferred to a first device, comprising:
a receiving module, configured to receive, from the first device, dialogue content corresponding to the voice communication request; and
an output module, configured to output the dialogue content; wherein the first device determines a communication type corresponding to the voice communication request according to the dialogue content, and determines reply mode information for the dialogue according to matching information between the communication type and a user characteristic corresponding to the second device.
40. An apparatus for voice communication processing, comprising:
one or more processors; and
one or more machine-readable media having instructions stored thereon, which when executed by the one or more processors, cause the apparatus to perform the method of one or more of claims 1-19.
41. One or more machine-readable media having instructions stored thereon that, when executed by one or more processors, cause an apparatus to perform the method of one or more of claims 1-19.
42. An apparatus for voice communication processing, comprising:
one or more processors; and
one or more machine-readable media having instructions stored thereon, which when executed by the one or more processors, cause the apparatus to perform the method of one or more of claims 20-37.
43. One or more machine-readable media having instructions stored thereon that, when executed by one or more processors, cause an apparatus to perform the method of one or more of claims 20-37.
CN201911136856.0A 2019-11-19 2019-11-19 Voice communication processing method, device, equipment and machine-readable medium Active CN112911074B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911136856.0A CN112911074B (en) 2019-11-19 2019-11-19 Voice communication processing method, device, equipment and machine-readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911136856.0A CN112911074B (en) 2019-11-19 2019-11-19 Voice communication processing method, device, equipment and machine-readable medium

Publications (2)

Publication Number Publication Date
CN112911074A CN112911074A (en) 2021-06-04
CN112911074B true CN112911074B (en) 2023-05-30

Family

ID=76104723

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911136856.0A Active CN112911074B (en) 2019-11-19 2019-11-19 Voice communication processing method, device, equipment and machine-readable medium

Country Status (1)

Country Link
CN (1) CN112911074B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113132927B (en) * 2019-12-27 2023-03-24 阿里巴巴集团控股有限公司 Incoming call processing method, device, equipment and machine readable medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107018227A (en) * 2016-01-28 2017-08-04 中兴通讯股份有限公司 Incoming call processing method and device
US20180020093A1 (en) * 2016-07-15 2018-01-18 Circle River, Inc. Automated call answering based on artificial intelligence
CN108900502B (en) * 2018-06-27 2021-05-11 佛山市云米电器科技有限公司 Communication method and system based on household intelligent interconnection
CN109688276B (en) * 2018-12-29 2021-05-25 苏州意能通信息技术有限公司 Incoming call filtering system and method based on artificial intelligence technology
CN110086945B (en) * 2019-04-24 2021-07-20 北京百度网讯科技有限公司 Communication method, server, intelligent device, server, and storage medium

Also Published As

Publication number Publication date
CN112911074A (en) 2021-06-04

Similar Documents

Publication Publication Date Title
US11388291B2 (en) System and method for processing voicemail
CN110392913B (en) Processing calls on a common voice-enabled device
US11188289B2 (en) Identification of preferred communication devices according to a preference rule dependent on a trigger phrase spoken within a selected time from other command data
US11200891B2 (en) Communications utilizing multiple virtual assistant services
CN112313930B (en) Method and apparatus for managing maintenance
JP7436077B2 (en) Skill voice wake-up method and device
US10257350B2 (en) Playing back portions of a recorded conversation based on keywords
KR20240021834A (en) Method, apparatus, and system for dynamically navigating interactive communication systems
CA3083709A1 (en) A system, device, and method of performing data analytics for advising asales representative during a voice call
CN112911074B (en) Voice communication processing method, device, equipment and machine-readable medium
CN113241070A (en) Hot word recall and updating method, device, storage medium and hot word system
US11856144B2 (en) Systems for identifying the answering party of an automated voice call
US20180219996A1 (en) Managing telephone interactions of a user and an agent
WO2022241018A1 (en) Systems and methods relating to artificial intelligence long-tail growth through gig customer service leverage
CN113132927B (en) Incoming call processing method, device, equipment and machine readable medium
US10938985B2 (en) Contextual preferred response time alert
US20210264910A1 (en) User-driven content generation for virtual assistant
US20230245454A1 (en) Presenting audio/video responses based on intent derived from features of audio/video interactions
EP4055590A1 (en) Systems and methods for automating voice commands
CN115174749A (en) Call forwarding method and device, electronic equipment and computer readable storage medium
CN114724587A (en) Voice response method and device
CN113851118A (en) Voice call event processing method and device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant