CN111210843A - Method and device for recommending dialect - Google Patents

Method and device for recommending dialect

Info

Publication number
CN111210843A
CN111210843A (application number CN201911425467.XA)
Authority
CN
China
Prior art keywords
target
type
language
current
recognition model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911425467.XA
Other languages
Chinese (zh)
Inventor
唐大闰
徐浩
吴明辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Miaozhen Information Technology Co Ltd
Original Assignee
Miaozhen Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Miaozhen Information Technology Co Ltd
Priority to CN201911425467.XA
Publication of CN111210843A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Child & Adolescent Psychology (AREA)
  • General Health & Medical Sciences (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a script recommendation method and device. The method includes: acquiring audio information of a target object through a sound pickup device carried by a current object, where the distance between the target object and the current object is smaller than or equal to a first threshold; recognizing the audio information to obtain the tone type and the emotion type of the target object; extracting a target script from a target script table according to the tone type and the emotion type, where the target script table stores a plurality of reference script records, each reference script record includes a consumption value of a first object, the tone type and the emotion type of the first object, and the script used by a second object when serving the first object, the first object has the same role label as the target object, and the second object has the same role label as the current object; and sending the target script to the current object. The invention solves the technical problem of low efficiency in recommending scripts.

Description

Method and device for recommending dialect
Technical Field
The invention relates to the field of intelligent devices, and in particular to a script recommendation method and device.
Background
In the related art, while serving customers, a salesperson usually needs to answer the customers' various questions, give detailed product descriptions, and so on. A less experienced salesperson may write down scripts for a variety of situations in advance; however, pausing to look up a script while actually facing a customer means the customer cannot be served effectively.
That is, the related art lacks an effective way to recommend scripts to a salesperson.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiments of the invention provide a script recommendation method and device, which at least solve the technical problem of low efficiency in recommending scripts.
According to one aspect of the embodiments of the invention, a script recommendation method is provided, including: acquiring audio information of a target object through a sound pickup device carried by a current object, where the distance between the target object and the current object is smaller than or equal to a first threshold; recognizing the audio information to obtain the tone type and the emotion type of the target object; extracting a target script from a target script table according to the tone type and the emotion type, where the target script table stores a plurality of reference script records, each reference script record includes a consumption value of a first object, the tone type and the emotion type of the first object, and the script used by a second object when serving the first object, the first object has the same role label as the target object, and the second object has the same role label as the current object; and sending the target script to the current object.
As an optional example, the target script table includes a first script table and a second script table, and before the target script is extracted from the target script table according to the tone type and the emotion type, the method further includes: acquiring the plurality of reference script records; taking each reference script record in turn as the current reference script record and performing the following steps until all reference script records have been traversed: when the consumption value of the first object in the current reference script record is greater than or equal to a second threshold, storing the tone type of the first object, the emotion type of the first object, and the script used by the second object in the current reference script record as one piece of associated data in the first script table; when the consumption value of the first object in the current reference script record is smaller than the second threshold, storing the tone type of the first object, the emotion type of the first object, and the script used by the second object in the current reference script record as one piece of associated data in the second script table.
As an optional example, recognizing the audio information to obtain the tone type and the emotion type of the target object includes: inputting the audio information into a first target recognition model to obtain a first recognition result output by the first target recognition model, where the first recognition result includes the tone type of the target object; converting the audio information into text information; and inputting the text information into a second target recognition model to obtain a second recognition result output by the second target recognition model, where the second recognition result includes the emotion type of the target object.
As an optional example, inputting the audio information into the first target recognition model to obtain the first recognition result output by the first target recognition model includes: when the first target recognition model is a model trained with sample audio data, inputting the audio information into the first target recognition model to obtain the first recognition result; when the first target recognition model is a model trained with sample spectrogram images, inputting the spectrogram of the audio information into the first target recognition model to obtain the first recognition result.
As an optional example, after sending the target script to the current object, the method further includes: acquiring a first script used by the current object; and recording the tone type of the target object, the emotion type of the target object, and the first script of the current object as one piece of associated data in the target script table according to the consumption value of the target object.
According to another aspect of the embodiments of the invention, a script recommendation apparatus is also provided, including: a first obtaining unit, configured to acquire audio information of a target object through a sound pickup device carried by a current object, where the distance between the target object and the current object is smaller than or equal to a first threshold; a recognition unit, configured to recognize the audio information to obtain the tone type and the emotion type of the target object; an extraction unit, configured to extract a target script from a target script table according to the tone type and the emotion type, where the target script table stores a plurality of reference script records, each reference script record includes a consumption value of a first object, the tone type and the emotion type of the first object, and the script used by a second object when serving the first object, the first object has the same role label as the target object, and the second object has the same role label as the current object; and a sending unit, configured to send the target script to the current object.
As an optional example, the target script table includes a first script table and a second script table, and the apparatus further includes: a second obtaining unit, configured to acquire the plurality of reference script records before the target script is extracted from the target script table according to the tone type and the emotion type; and a processing unit, configured to take each reference script record in turn as the current reference script record and perform the following steps until all reference script records have been traversed: when the consumption value of the first object in the current reference script record is greater than or equal to a second threshold, storing the tone type of the first object, the emotion type of the first object, and the script used by the second object in the current reference script record as one piece of associated data in the first script table; when the consumption value of the first object in the current reference script record is smaller than the second threshold, storing the tone type of the first object, the emotion type of the first object, and the script used by the second object in the current reference script record as one piece of associated data in the second script table.
As an optional example, the recognition unit includes: a first input module, configured to input the audio information into a first target recognition model to obtain a first recognition result output by the first target recognition model, where the first recognition result includes the tone type of the target object; a conversion module, configured to convert the audio information into text information; and a second input module, configured to input the text information into a second target recognition model to obtain a second recognition result output by the second target recognition model, where the second recognition result includes the emotion type of the target object.
As an optional example, the first input module includes: a first input submodule, configured to input the audio information into the first target recognition model to obtain the first recognition result when the first target recognition model is a model trained with sample audio data; and a second input submodule, configured to input the spectrogram of the audio information into the first target recognition model to obtain the first recognition result when the first target recognition model is a model trained with sample spectrogram images.
As an optional example, the apparatus further includes: a third obtaining unit, configured to acquire a first script used by the current object after the target script is sent to the current object; and a recording unit, configured to record the tone type of the target object, the emotion type of the target object, and the first script of the current object as one piece of associated data in the target script table according to the consumption value of the target object.
In the embodiments of the invention, audio information of a target object is acquired through a sound pickup device carried by a current object, where the distance between the target object and the current object is smaller than or equal to a first threshold; the audio information is recognized to obtain the tone type and the emotion type of the target object; a target script is extracted from a target script table according to the tone type and the emotion type, where the target script table stores a plurality of reference script records, each reference script record includes a consumption value of a first object, the tone type and the emotion type of the first object, and the script used by a second object when serving the first object, the first object has the same role label as the target object, and the second object has the same role label as the current object; and the target script is sent to the current object. By acquiring the audio information of the target object, determining its tone type and emotion type, looking up the corresponding target script, and sending it to the current object, the method promptly provides a member of staff with a script matched to the customer's reaction, thereby solving the technical problem of low efficiency in recommending scripts.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow diagram of an alternative script recommendation method according to an embodiment of the invention;
FIG. 2 is a schematic structural diagram of an alternative script recommendation apparatus according to an embodiment of the invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to one aspect of the embodiments of the invention, a script recommendation method is provided. As an optional implementation, as shown in FIG. 1, the method includes:
s102, acquiring audio information of a target object through radio equipment carried by the current object, wherein the distance between the target object and the current object is smaller than or equal to a first threshold value;
s104, identifying the audio information to obtain the tone type and the emotion type of the target object;
s106, extracting a target speech operation from a target speech operation table according to the tone type and the emotion type, wherein a plurality of reference speech operation records are stored in the target speech operation table, each reference speech operation record comprises a consumption value of a first object, the tone type and the emotion type of the first object and the speech operation used by a second object for providing service for the first object, the first object and the target object have the same role mark, and the second object and the current object have the same role mark;
s108, the target dialect is sent to the current object.
Optionally, the above script recommendation may be, but is not limited to being, applied on a terminal capable of computation, such as a mobile phone or another intelligent terminal, for example a smart badge with an earphone or a display, a smart bracelet, and the like. The terminal may interact with a server via a network, which may include, but is not limited to, a wireless network or a wired network. The wireless network includes WIFI and other networks enabling wireless communication; the wired network may include, but is not limited to, wide area networks, metropolitan area networks, and local area networks. The server may include, but is not limited to, any hardware device capable of performing computations.
Optionally, the scheme may be applied in service settings, such as in-person sales or customer care over the phone. Taking sales as an example, the terminal acquires a customer's audio information, determines the customer's tone type and emotion type from it, looks up a target script in the corresponding target script table, and prompts the member of staff with the target script, so that the staff member can respond to the customer using it.
Optionally, the target script table in this scheme may include a first script table and a second script table. The plurality of reference script records are acquired; each reference script record is taken in turn as the current reference script record, and the following steps are performed until all reference script records have been traversed: when the consumption value of the first object in the current reference script record is greater than or equal to a second threshold, the tone type of the first object, the emotion type of the first object, and the script used by the second object in the current reference script record are saved as one piece of associated data in the first script table; when the consumption value of the first object in the current reference script record is smaller than the second threshold, they are saved as one piece of associated data in the second script table. That is, the reference script records stored in the first script table are records from cases in which the customer bought a lot, and those in the second script table are from cases in which the customer bought little. For example, the staff member's script, the customer's emotion and tone, and the customer's consumption value are recorded; if the consumption value is high, the script was effective for a customer with that emotion and tone, so the script together with the customer's emotion and tone is stored as one piece of data in the first script table. If the customer's consumption value is very low, the current script did not work well for a customer with that emotion and tone, so the script together with the customer's emotion and tone is stored in the second script table. When pushing a target script to the staff member, the target script may be looked up in the first script table and/or the second script table, and there may be more than one target script. A code sketch of this partitioning follows.
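A minimal sketch of the partitioning step, reusing the hypothetical ReferenceRecord from the sketch above; the threshold value is an arbitrary placeholder:

```python
SECOND_THRESHOLD = 1000.0  # hypothetical "second threshold" consumption value

def build_script_tables(records):
    """Split reference records into effective (first) and ineffective (second) tables."""
    first_table, second_table = [], []
    for record in records:
        if record.consumption_value >= SECOND_THRESHOLD:
            first_table.append(record)   # high spend: the script worked
        else:
            second_table.append(record)  # low spend: the script did not work
    return first_table, second_table
```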
Optionally, when recognizing the customer's audio information, target recognition models may be built to recognize the emotion type and the tone type. For example, the customer's audio information is input into a first target recognition model to obtain a first recognition result output by the first target recognition model, where the first recognition result includes the tone type of the target object; the audio information is converted into text information; and the text information is input into a second target recognition model to obtain a second recognition result output by the second target recognition model, where the second recognition result includes the emotion type of the target object.
That is, the first target recognition model recognizes the customer's audio information to determine the tone type, and the second target recognition model recognizes the text corresponding to the customer's audio information to determine the customer's emotion type.
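As a sketch of this two-stage recognition, assuming pre-trained classifiers with a scikit-learn-style predict interface; the label sets and the feature handling are placeholders, not part of the patent:

```python
import numpy as np

TONE_LABELS = ["calm", "impatient", "doubtful"]       # hypothetical tone types
EMOTION_LABELS = ["positive", "neutral", "negative"]  # hypothetical emotion types

def recognize_tone(audio: np.ndarray, tone_model) -> str:
    # First target recognition model: works on the audio signal itself.
    return TONE_LABELS[int(tone_model.predict(audio.reshape(1, -1))[0])]

def recognize_emotion(text: str, emotion_model, vectorizer) -> str:
    # Second target recognition model: works on the transcribed text.
    return EMOTION_LABELS[int(emotion_model.predict(vectorizer.transform([text]))[0])]
```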
Optionally, in this scheme, the first target recognition model may be trained before the tone type is recognized, and the training may take different forms. If the first target recognition model is trained with sample audio data, the trained model can recognize the customer's audio data directly to obtain the customer's tone type. If the first target recognition model is trained with sample spectrogram images, then when the model is used after training, the customer's audio data must first be converted into a spectrogram, which is then input into the first target recognition model, and the first target recognition model recognizes the customer's tone type from it.
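A minimal sketch of the spectrogram variant, using SciPy's scipy.signal.spectrogram to turn the waveform into a time-frequency image; the sampling rate and the model interface are assumptions:

```python
import numpy as np
from scipy import signal

def audio_to_spectrogram(audio: np.ndarray, sample_rate: int = 16000) -> np.ndarray:
    # Convert the raw waveform into the spectrogram form the model was trained on.
    _, _, spectrum = signal.spectrogram(audio, fs=sample_rate)
    return np.log1p(spectrum)  # log scaling keeps amplitudes in a comparable range

def recognize_tone_from_spectrogram(audio: np.ndarray, spectrogram_model) -> str:
    image = audio_to_spectrogram(audio)
    return spectrogram_model.predict(image[np.newaxis, ...])[0]
```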
After the target script is recommended to the staff member, a first script actually used by the staff member is acquired, the customer's consumption value is obtained, and the customer's tone type, the customer's emotion type, and the first script are recorded in the target script table as one piece of associated data according to the consumption value, that is, recorded into the first script table or the second script table depending on the consumption value.
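Closing the loop, a sketch of this feedback step, reusing the hypothetical ReferenceRecord and SECOND_THRESHOLD from the sketches above:

```python
def record_feedback(first_table, second_table, tone, emotion, used_script,
                    consumption_value):
    """File the script that was actually used under the table its outcome earned."""
    record = ReferenceRecord(consumption_value, tone, emotion, used_script)
    if consumption_value >= SECOND_THRESHOLD:
        first_table.append(record)   # the script led to a purchase
    else:
        second_table.append(record)  # the script did not lead to a purchase
```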
The following description is given with reference to an example. While serving a customer, the salesperson carries a sound pickup device and wears an earphone. The device collects the customer's speech audio, from which features such as the customer's tone of voice, speech characteristics, and accent, together with text features, are extracted; these features are recognized to determine the customer's tone type and emotion type. From the tone type and the emotion type, a classification result for the customer is obtained, and a target script table is obtained accordingly. The target script table records, for that classification result, the scripts that worked well on such customers and the scripts that worked poorly. The scripts are transmitted to the salesperson's earphone, and the salesperson can serve the customer with these scripts as a reference. The script used on the customer and its result are then fed back into the target script table.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
According to another aspect of the embodiments of the invention, a script recommendation apparatus for implementing the above script recommendation method is also provided. As shown in FIG. 2, the apparatus includes:
(1) a first obtaining unit 202, configured to acquire audio information of a target object through a sound pickup device carried by a current object, where the distance between the target object and the current object is smaller than or equal to a first threshold;
(2) a recognition unit 204, configured to recognize the audio information to obtain the tone type and the emotion type of the target object;
(3) an extraction unit 206, configured to extract a target script from a target script table according to the tone type and the emotion type, where the target script table stores a plurality of reference script records, each reference script record includes a consumption value of a first object, the tone type and the emotion type of the first object, and the script used by a second object when serving the first object, the first object has the same role label as the target object, and the second object has the same role label as the current object;
(4) a sending unit 208, configured to send the target script to the current object.
Optionally, the apparatus may be applied in the same service settings as the method described above, and the optional examples given for the method, namely the first and second script tables, the two target recognition models, the spectrogram variant, the feedback recording, and the salesperson example, apply to the apparatus in the same way; they are not repeated here.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. The above-described apparatus embodiments are merely illustrative; for example, the division into units is only a division by logical function, and other divisions are possible in actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or of another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A script recommendation method, comprising:
acquiring audio information of a target object through a sound pickup device carried by a current object, wherein the distance between the target object and the current object is smaller than or equal to a first threshold;
recognizing the audio information to obtain the tone type and the emotion type of the target object;
extracting a target script from a target script table according to the tone type and the emotion type, wherein the target script table stores a plurality of reference script records, each reference script record comprises a consumption value of a first object, the tone type and the emotion type of the first object, and the script used by a second object when serving the first object, the first object has the same role label as the target object, and the second object has the same role label as the current object; and
sending the target script to the current object.
2. The method of claim 1, wherein the target script table comprises a first script table and a second script table, and before the target script is extracted from the target script table according to the tone type and the emotion type, the method further comprises:
acquiring the plurality of reference script records;
taking each reference script record in turn as the current reference script record, and performing the following steps until the plurality of reference script records have been traversed:
when the consumption value of the first object in the current reference script record is greater than or equal to a second threshold, saving the tone type of the first object, the emotion type of the first object, and the script used by the second object in the current reference script record as one piece of associated data in the first script table; and
when the consumption value of the first object in the current reference script record is smaller than the second threshold, saving the tone type of the first object, the emotion type of the first object, and the script used by the second object in the current reference script record as one piece of associated data in the second script table.
3. The method of claim 1, wherein recognizing the audio information to obtain the tone type and the emotion type of the target object comprises:
inputting the audio information into a first target recognition model to obtain a first recognition result output by the first target recognition model, wherein the first recognition result comprises the tone type of the target object;
converting the audio information into text information; and
inputting the text information into a second target recognition model to obtain a second recognition result output by the second target recognition model, wherein the second recognition result comprises the emotion type of the target object.
4. The method of claim 3, wherein inputting the audio information into the first target recognition model to obtain the first recognition result output by the first target recognition model comprises:
when the first target recognition model is a model trained with sample audio data, inputting the audio information into the first target recognition model to obtain the first recognition result; and
when the first target recognition model is a model trained with sample spectrogram images, inputting the spectrogram of the audio information into the first target recognition model to obtain the first recognition result.
5. The method of any one of claims 1 to 4, wherein after sending the target script to the current object, the method further comprises:
acquiring a first script used by the current object; and
recording the tone type of the target object, the emotion type of the target object, and the first script of the current object as one piece of associated data in the target script table according to the consumption value of the target object.
6. A script recommendation apparatus, comprising:
a first obtaining unit, configured to acquire audio information of a target object through a sound pickup device carried by a current object, wherein the distance between the target object and the current object is smaller than or equal to a first threshold;
a recognition unit, configured to recognize the audio information to obtain the tone type and the emotion type of the target object;
an extraction unit, configured to extract a target script from a target script table according to the tone type and the emotion type, wherein the target script table stores a plurality of reference script records, each reference script record comprises a consumption value of a first object, the tone type and the emotion type of the first object, and the script used by a second object when serving the first object, the first object has the same role label as the target object, and the second object has the same role label as the current object; and
a sending unit, configured to send the target script to the current object.
7. The apparatus of claim 6, wherein the target script table comprises a first script table and a second script table, and the apparatus further comprises:
a second obtaining unit, configured to acquire the plurality of reference script records before the target script is extracted from the target script table according to the tone type and the emotion type; and
a processing unit, configured to take each reference script record in turn as the current reference script record and perform the following steps until the plurality of reference script records have been traversed:
when the consumption value of the first object in the current reference script record is greater than or equal to a second threshold, saving the tone type of the first object, the emotion type of the first object, and the script used by the second object in the current reference script record as one piece of associated data in the first script table; and
when the consumption value of the first object in the current reference script record is smaller than the second threshold, saving the tone type of the first object, the emotion type of the first object, and the script used by the second object in the current reference script record as one piece of associated data in the second script table.
8. The apparatus of claim 6, wherein the recognition unit comprises:
a first input module, configured to input the audio information into a first target recognition model to obtain a first recognition result output by the first target recognition model, wherein the first recognition result comprises the tone type of the target object;
a conversion module, configured to convert the audio information into text information; and
a second input module, configured to input the text information into a second target recognition model to obtain a second recognition result output by the second target recognition model, wherein the second recognition result comprises the emotion type of the target object.
9. The apparatus of claim 8, wherein the first input module comprises:
a first input submodule, configured to input the audio information into the first target recognition model to obtain the first recognition result when the first target recognition model is a model trained with sample audio data; and
a second input submodule, configured to input the spectrogram of the audio information into the first target recognition model to obtain the first recognition result when the first target recognition model is a model trained with sample spectrogram images.
10. The apparatus of any one of claims 6 to 9, further comprising:
a third obtaining unit, configured to acquire a first script used by the current object after the target script is sent to the current object; and
a recording unit, configured to record the tone type of the target object, the emotion type of the target object, and the first script of the current object as one piece of associated data in the target script table according to the consumption value of the target object.
CN201911425467.XA 2019-12-31 2019-12-31 Method and device for recommending dialect Pending CN111210843A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911425467.XA CN111210843A (en) 2019-12-31 2019-12-31 Method and device for recommending dialect

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911425467.XA CN111210843A (en) 2019-12-31 2019-12-31 Method and device for recommending dialect

Publications (1)

Publication Number Publication Date
CN111210843A true CN111210843A (en) 2020-05-29

Family

ID=70784996

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911425467.XA Pending CN111210843A (en) 2019-12-31 2019-12-31 Method and device for recommending dialect

Country Status (1)

Country Link
CN (1) CN111210843A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112035624A (en) * 2020-09-11 2020-12-04 上海明略人工智能(集团)有限公司 Text recommendation method and device and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10037767B1 (en) * 2017-02-01 2018-07-31 Wipro Limited Integrated system and a method of identifying and learning emotions in conversation utterances
CN109767765A (en) * 2019-01-17 2019-05-17 平安科技(深圳)有限公司 Talk about art matching process and device, storage medium, computer equipment
CN110060098A (en) * 2019-04-04 2019-07-26 秒针信息技术有限公司 The acquisition methods and device of marketing term


Similar Documents

Publication Publication Date Title
US10129394B2 (en) Telephonic communication routing system based on customer satisfaction
CN105261362B (en) A kind of call voice monitoring method and system
US7085719B1 (en) Voice filter for normalizing an agents response by altering emotional and word content
JP6358093B2 (en) Analysis object determination apparatus and analysis object determination method
JP4438014B1 (en) Harmful customer detection system, method thereof and harmful customer detection program
CN110347863B (en) Speaking recommendation method and device and storage medium
GB2393605A (en) Selecting actions or phrases for an agent by analysing conversation content and emotional inflection
CN110162641B (en) Marketing method and device based on audio interaction and storage medium
CN112261230B (en) Express call answering rate assessment method, device, equipment, system and storage medium
CN108304153A (en) Voice interactive method and device
CN110111796B (en) Identity recognition method and device
CN103546613A (en) Contact person recording method, contact person recording device and mobile terminal
CN111210843A (en) Method and device for recommending dialect
KR101899193B1 (en) Device and System for providing phone number service by providing advertisements using emotion analysis of customer and method thereof
CN107331396A (en) Export the method and device of numeral
CN111210810A (en) Model training method and device
CN110765242A (en) Method, device and system for providing customer service information
CN105592234B (en) Processing method of communication data and device
CN113282783A (en) Method and device for information recommendation, electronic equipment and readable storage medium
CN111128132A (en) Voice separation method, device and system and storage medium
KR101772909B1 (en) Apparatus and method for collecting voice from voice call network and storing analysis result
US20220172231A1 (en) Speech analysis models as a service
WO2022208711A1 (en) Information processing device, information processing system, information processing method, and program
CN111818230A (en) Method for extracting key information based on client key information
CN110880326B (en) Voice interaction system and method

Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200529)