CN108205525B - Method and device for determining user intention based on user voice information

Publication number
CN108205525B
Authority
CN
China
Prior art keywords
voice, user, evaluation, phrase, determining
Legal status
Active
Application number
CN201611187130.6A
Other languages
Chinese (zh)
Other versions
CN108205525A (en)
Inventor
张柯
王晓光
褚巍
施兴
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Application filed by Alibaba Group Holding Ltd
Priority: CN201611187130.6A
Published as CN108205525A; application granted and published as CN108205525B

Classifications

    • G06F40/00: Handling natural language data (G: Physics; G06: Computing; G06F: Electric digital data processing)
    • G06F40/216: Natural language analysis; parsing using statistical methods
    • G06F40/289: Recognition of textual entities; phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/30: Semantic analysis

Abstract

The application provides a method for determining user intention based on user voice information, comprising the following steps: acquiring the real-time voice of a target user while a specified transaction is processed; determining, according to the semantic and intonation information contained in the real-time voice, speech feature values for calculating the user's intention; and using the speech feature values as input parameters of a calculation model for determining user intention based on user voice information, to calculate an evaluation value of the target user's intention for the current processing of the specified transaction. Because the speech feature values are determined from the semantic and intonation information contained in the target user's real-time voice, the reliability of the data source is guaranteed, the evaluation value of the target user's intention for the current processing of the specified transaction is determined objectively and in real time, and the business personnel handling the transaction receive real-time guidance.

Description

Method and device for determining user intention based on user voice information
Technical Field
The present application relates to methods of determining user intention, and more particularly to a method and apparatus for determining user intention based on user voice information, a method and apparatus for determining reference phrases for evaluation, a method and apparatus for generating sample data for training, and a method and apparatus for generating a calculation model for determining user intention based on user voice information. It also relates to a method for determining a user's repayment intention based on the user's voice information.
Background
In daily business activities, when a user is expected to carry out a specific behavior, business personnel need to take appropriate actions or measures, according to the user's intention, to prompt the user to carry out that behavior.
For example, when handling a transaction in which a user is expected to make a purchase, the user's purchase intention must be determined so that a suitable strategy can be adopted for that intention to persuade the user to buy.
For another example, when handling a transaction that requires a borrowing user to repay a loan, the user's repayment intention must be determined while communicating with that user, and different measures, such as emotional appeals or applying pressure, taken according to the determined intention to urge the user to repay.
When handling the above transactions, the user's corresponding intention must be determined so that appropriate measures and strategies can be taken. Existing ways of determining a user's intention while processing a given transaction are mostly the following:
In the first way, collected historical data of past processing of the specified transaction, including the users' attribute data and the processing results at the time, is analyzed; the collected known attribute data of the target user of the current processing is objectively matched against the historical data; and the target user's intention for the current processing is determined from the match.
In the second way, while communicating with the user, business personnel subjectively judge the target user's behavior, such as language and attitude, according to their own experience, and so determine the user's intention for the current processing of the specified transaction.
In the third way, business personnel combine the objective judgment about the client with the subjective judgment based on the client's language and attitude during communication to determine the user's intention for the current processing of the specified transaction.
The first way analyzes only the collected historical data: it ignores the influence of the user's current situation on the user's intention, the correctness of the collected historical data cannot be ensured, and the judgment result therefore inevitably deviates. The second and third ways depend on the personal ability of the business personnel and on how the communication with the user happens to go, so in general they cannot determine the user's intention objectively and accurately.
It can be seen that existing ways of determining a user's intention when handling a given transaction are subjective, do not take the user's current condition into account, and may rely on unreliable data sources.
Disclosure of Invention
The application provides a method for determining user intention based on user voice information, together with an apparatus for the same. It also provides a method and an apparatus for determining reference phrases for evaluation, a method and an apparatus for generating sample data for training, a method and an apparatus for generating a calculation model for determining user intention based on user voice information, and a method for determining a user's repayment intention based on the user's voice information.
The application provides a method for determining user intention based on user voice information, comprising the following steps:
acquiring the real-time voice of a target user while a specified transaction is processed;
determining, according to the semantic and intonation information contained in the real-time voice, speech feature values for calculating the user's intention;
and using the speech feature values as input parameters of a calculation model for determining user intention based on user voice information, to calculate an evaluation value of the target user's intention for the current processing of the specified transaction.
Optionally, the determining, according to the semantic and intonation information contained in the real-time voice, of speech feature values for calculating the user's intention includes:
extracting from the real-time voice, according to a preset rule, the speech of each phrase contained in it;
assigning, according to a preset rule, an intonation coefficient to the speech of each phrase;
and determining the speech feature values according to whether the reference phrases for evaluation are contained in the real-time voice and the intonation coefficients of the speech of the contained reference phrases.
Optionally, the determining of the speech feature values according to whether the reference phrases for evaluation are contained in the real-time voice and the intonation coefficients of the speech of the contained reference phrases includes:
if a reference phrase for evaluation is contained in the real-time voice, setting the corresponding speech feature value to the intonation coefficient of the speech of that reference phrase in the real-time voice.
Optionally, the reference phrases for evaluation are obtained according to the following method:
acquiring historical voice of users during past processing of the specified transaction, the historical voice comprising multiple pieces of voice, each piece being the voice of one user during one past processing of the specified transaction;
extracting from each piece of voice, according to a preset number of basic words per phrase, the phrases composed of basic words corresponding to that piece;
and taking, from the phrases corresponding to the pieces of voice in the historical voice, each phrase meeting the preset count requirement for reference phrases for evaluation as a reference phrase for evaluation.
Optionally, the preset count requirement for reference phrases for evaluation is obtained according to the following method:
sorting the phrases by their occurrence counts;
determining a count selection interval according to the sorted positions;
a phrase's count falling within the count selection interval is the count requirement for reference phrases for evaluation.
Optionally, the determining of the count selection interval according to the sorted positions includes:
taking the interval from the count of the phrase at the 40% position in the sorted queue to the count of the phrase at the 60% position as the count selection interval.
Optionally, the calculation model for determining user intention based on user voice information is obtained by the following method:
initializing the parameters of a computer deep neural network;
generating training sample data using the historical voice of users during past processing of the specified transaction;
training the computer deep neural network with the training sample data until it converges;
and using the converged computer deep neural network as the calculation model for determining user intention based on user voice information.
Optionally, the generating of training sample data using the historical voice of users during past processing of the specified transaction includes:
acquiring historical voice of users during past processing of the specified transaction, the historical voice comprising multiple pieces of voice, each piece being the voice of one user during one past processing of the specified transaction;
generating, using the historical voice, the feature values of each piece of voice relative to the reference phrases for evaluation;
and taking the feature values of each piece of voice relative to the reference phrases for evaluation, together with the result of processing the specified transaction corresponding to that piece, as sample data for training.
Optionally, the generating, using the historical voice, of the feature values of each piece of voice relative to the reference phrases for evaluation includes:
extracting from each piece of voice, according to the preset number of basic words per phrase, the speech of the phrases composed of basic words corresponding to that piece;
determining the intonation coefficient of the speech of each phrase corresponding to each piece of voice;
taking, from the phrases corresponding to the pieces of voice in the historical voice, each phrase meeting the preset count requirement for reference phrases for evaluation as a reference phrase for evaluation;
and generating the feature values of each piece of voice relative to the reference phrases for evaluation according to whether each piece of the historical voice contains the reference phrases for evaluation and the intonation coefficients of the speech of the reference phrases it contains.
Optionally, the generating of the feature values of each piece of voice relative to the reference phrases for evaluation according to whether each piece of the historical voice contains the reference phrases for evaluation and the intonation coefficients of the speech of the reference phrases it contains includes:
when a piece of the historical voice contains a reference phrase for evaluation, using the intonation coefficient of the speech of that reference phrase in the piece as the piece's feature value relative to that reference phrase.
Optionally, the specified transaction includes requesting repayment from the user, and the corresponding intention accordingly includes the user's repayment intention.
The application provides a method for determining reference phrases for evaluation, comprising the following steps:
acquiring historical voice of users during past processing of the specified transaction, the historical voice comprising multiple pieces of voice, each piece being the voice of one user during one past processing of the specified transaction;
extracting from each piece of voice, according to a preset number of basic words per phrase, the phrases composed of basic words corresponding to that piece;
and taking, from the phrases corresponding to the pieces of voice in the historical voice, each phrase meeting the preset count requirement for reference phrases for evaluation as a reference phrase for evaluation.
Optionally, the preset count requirement for reference phrases for evaluation is obtained according to the following method:
sorting the phrases by their occurrence counts;
determining a count selection interval according to the sorted positions;
a phrase's count falling within the count selection interval is the count requirement for reference phrases for evaluation.
Optionally, the determining of the count selection interval according to the sorted positions includes:
taking the interval from the count of the phrase at the 40% position in the sorted queue to the count of the phrase at the 60% position as the count selection interval.
The application provides a method for generating sample data for training, the sample data being used to train a computer deep neural network to generate a calculation model for determining user intention based on user voice information, the method comprising the following steps:
acquiring historical voice of users during past processing of the specified transaction, the historical voice comprising multiple pieces of voice, each piece being the voice of one user during one past processing of the specified transaction;
generating, using the historical voice, the feature values of each piece of voice relative to each reference phrase for evaluation;
and taking the feature values of each piece of voice relative to each reference phrase for evaluation, together with the result of processing the specified transaction corresponding to that piece, as sample data for training.
Optionally, the generating, using the historical voice, of the feature values of each piece of voice relative to each reference phrase for evaluation includes:
extracting from each piece of voice, according to the preset number of basic words per phrase, the speech of each phrase composed of basic words corresponding to that piece;
determining the intonation coefficient of the speech of each phrase corresponding to each piece of voice;
taking, from the phrases corresponding to the pieces of voice in the historical voice, each phrase meeting the preset count requirement for reference phrases for evaluation as a reference phrase for evaluation;
and generating the feature values of each piece of voice relative to each reference phrase for evaluation according to whether each piece of the historical voice contains each reference phrase for evaluation and the intonation coefficients of the speech of the reference phrases it contains.
Optionally, the generating of the feature values of each piece of voice relative to each reference phrase for evaluation according to whether each piece of the historical voice contains the reference phrases for evaluation and the intonation coefficients of the speech of the reference phrases it contains includes:
when a piece of the historical voice contains a reference phrase for evaluation, using the intonation coefficient of the speech of that reference phrase in the piece as the piece's feature value relative to that reference phrase.
Optionally, the specified transaction includes requesting repayment from the user, and the corresponding intention accordingly includes the user's repayment intention.
The application provides a method for generating a calculation model for determining user intention based on user voice information, the calculation model being used to determine user intention based on user voice information while a specified transaction is processed, the method comprising the following steps:
initializing the parameters of a computer deep neural network;
generating training sample data using the historical voice of users during past processing of the specified transaction;
training the computer deep neural network with the training sample data until it converges;
and using the converged computer deep neural network as the calculation model for determining user intention based on user voice information.
Optionally, the generating of training sample data using the historical voice of users during past processing of the specified transaction includes:
acquiring historical voice of users during past processing of the specified transaction, the historical voice comprising multiple pieces of voice, each piece being the voice of one user during one past processing of the specified transaction;
generating, using the historical voice, the feature values of each piece of voice relative to the reference phrases for evaluation;
and taking the feature values of each piece of voice relative to the reference phrases for evaluation, together with the result of processing the specified transaction corresponding to that piece, as sample data for training.
Optionally, the generating, using the historical voice, of the feature values of each piece of voice relative to the reference phrases for evaluation includes:
extracting from each piece of voice, according to the preset number of basic words per phrase, the speech of the phrases composed of basic words corresponding to that piece;
determining the intonation coefficient of the speech of each phrase corresponding to each piece of voice;
taking, from the phrases corresponding to the pieces of voice in the historical voice, each phrase meeting the preset count requirement for reference phrases for evaluation as a reference phrase for evaluation;
and generating the feature values of each piece of voice relative to the reference phrases for evaluation according to whether each piece of the historical voice contains the reference phrases for evaluation and the intonation coefficients of the speech of the reference phrases it contains.
Optionally, the generating of the feature values of each piece of voice relative to the reference phrases for evaluation according to whether each piece of the historical voice contains the reference phrases for evaluation and the intonation coefficients of the speech of the reference phrases it contains includes:
when a piece of the historical voice contains a reference phrase for evaluation, using the intonation coefficient of the speech of that reference phrase in the piece as the piece's feature value relative to that reference phrase.
Optionally, the calculation model is used to determine the user's repayment intention when repayment is requested from the user.
The application provides an apparatus for determining user intention based on user voice information, comprising:
an acquisition unit, configured to acquire the real-time voice of a target user while a specified transaction is processed;
a speech feature value determination unit, configured to determine, according to the semantic and intonation information contained in the real-time voice, speech feature values for calculating the user's intention;
and a determination unit, configured to use the speech feature values as input parameters of a calculation model for determining user intention based on user voice information, to calculate an evaluation value of the target user's intention for the current processing of the specified transaction.
Optionally, the speech feature value determination unit includes:
a phrase determination subunit, configured to extract from the real-time voice, according to a preset rule, the speech of each phrase contained in it;
an intonation coefficient determination subunit, configured to assign, according to a preset rule, an intonation coefficient to the speech of each phrase;
and a speech feature value determination subunit, configured to determine the speech feature values according to whether the reference phrases for evaluation are contained in the real-time voice and the intonation coefficients of the speech of the contained reference phrases.
Optionally, the speech feature value determination subunit is specifically configured to:
if a reference phrase for evaluation is contained in the real-time voice, set the corresponding speech feature value to the intonation coefficient of the speech of that reference phrase in the real-time voice.
The application provides an apparatus for determining reference phrases for evaluation, the reference phrases being used to determine a user's corresponding intention while a specified transaction is processed, the apparatus comprising:
an acquisition unit, configured to acquire historical voice of users during past processing of the specified transaction, the historical voice comprising multiple pieces of voice, each piece being the voice of one user during one past processing of the specified transaction;
a phrase determination unit, configured to extract from each piece of voice, according to a preset number of basic words per phrase, the phrases composed of basic words corresponding to that piece;
and a reference phrase determination unit, configured to take, from the phrases corresponding to the pieces of voice in the historical voice, each phrase meeting the preset count requirement for reference phrases for evaluation as a reference phrase for evaluation.
The application provides an apparatus for generating sample data for training, the sample data being used to train a computer deep neural network to generate a calculation model for determining user intention based on user voice information, the apparatus comprising:
an acquisition unit, configured to acquire historical voice of users during past processing of the specified transaction, the historical voice comprising multiple pieces of voice, each piece being the voice of one user during one past processing of the specified transaction;
a feature value determination unit, configured to generate, using the historical voice, the feature values of each piece of voice relative to each reference phrase for evaluation;
and a sample data determination unit, configured to take the feature values of each piece of voice relative to each reference phrase for evaluation, together with the result of processing the specified transaction corresponding to that piece, as sample data for training.
Optionally, the feature value determination unit includes:
a phrase determination subunit, configured to extract from each piece of voice, according to the preset number of basic words per phrase, the speech of each phrase composed of basic words corresponding to that piece;
an intonation coefficient determination subunit, configured to determine the intonation coefficient of the speech of each phrase corresponding to each piece of voice;
a reference phrase determination subunit, configured to take, from the phrases corresponding to the pieces of voice in the historical voice, each phrase meeting the preset count requirement for reference phrases for evaluation as a reference phrase for evaluation;
and a feature value determination subunit, configured to generate the feature values of each piece of voice relative to each reference phrase for evaluation according to whether the piece contains each reference phrase for evaluation and the intonation coefficients of the speech of the reference phrases it contains.
Optionally, the feature value determination subunit is specifically configured to:
when a piece of the historical voice contains a reference phrase for evaluation, use the intonation coefficient of the speech of that reference phrase in the piece as the piece's feature value relative to that reference phrase.
The application provides an apparatus for generating a calculation model for determining user intention based on user voice information, the calculation model being used to determine user intention based on user voice information while a specified transaction is processed, the apparatus comprising:
an initialization unit, configured to initialize the parameters of a computer deep neural network;
a sample data generation unit, configured to generate training sample data using the historical voice of users during past processing of the specified transaction;
a training unit, configured to train the computer deep neural network with the training sample data until it converges;
and a model determination unit, configured to use the converged computer deep neural network as the calculation model for determining user intention based on user voice information.
The application provides a method for determining a user's repayment intention based on the user's voice information, comprising the following steps:
acquiring the real-time voice of a target borrowing user during the current collection call;
determining, according to the semantic and intonation information contained in the real-time voice, speech feature values for calculating the user's repayment intention;
and using the speech feature values as input parameters of a calculation model for determining user repayment intention based on user voice information, to calculate an evaluation value of the target borrowing user's repayment intention for the current collection.
Optionally, the method further comprises:
providing a corresponding reference collection strategy according to the evaluation value.
Compared with the prior art, the method for determining user intention based on user voice information provided by the application has the following advantages:
The speech feature values are determined from the semantic and intonation information contained in the target user's real-time voice, which guarantees the reliability of the data source; the evaluation value of the target user's intention for the current processing of the specified transaction is determined objectively and in real time, providing real-time guidance to the business personnel handling the transaction.
Compared with the prior art, the method for determining reference phrases for evaluation provided by the application has the following advantages:
The reference phrases for evaluation are selected, according to the count requirement, from the phrases contained in users' voice during past processing of the specified transaction, so the key semantics expressing the user's intention during processing of the specified transaction can be captured accurately, improving the accuracy of the evaluation.
Compared with the prior art, the method for generating sample data for training provided by the application has the following advantages:
The historical voice of users who went through past processing of the specified transaction is used as the source data, and a correspondence is established between the language features of the historical voice and the results of processing the specified transaction, which ensures a reliable data source and improves the efficiency of training and generating the calculation model.
Compared with the prior art, the method for generating a calculation model for determining user intention based on user voice information provided by the application has the following advantages:
Training the computer deep neural network with sample data built directly from the historical voice of past processing of the specified transaction yields a calculation model with improved calculation accuracy and improves working efficiency.
Drawings
Fig. 1 is a flowchart of a method for determining user intention based on user voice information according to the first embodiment of the present application;
Fig. 2 is a schematic diagram of phrase generation in the method for determining user intention based on user voice information according to the first embodiment of the present application;
Fig. 3 is a schematic data flow diagram of the method for determining user intention based on user voice information according to the first embodiment of the present application;
Fig. 4 is a flowchart of a method for determining reference phrases for evaluation according to the second embodiment of the present application;
Fig. 5 is a flowchart of a method for generating sample data for training according to the third embodiment of the present application;
Fig. 6 is a flowchart of a method for generating a calculation model for determining user intention based on user voice information according to the fourth embodiment of the present application;
Fig. 7 is a block diagram of an apparatus for determining user intention based on user voice information according to the fifth embodiment of the present application;
Fig. 8 is a block diagram of an apparatus for determining reference phrases for evaluation according to the sixth embodiment of the present application;
Fig. 9 is a block diagram of an apparatus for generating sample data for training according to the seventh embodiment of the present application;
Fig. 10 is a block diagram of an apparatus for generating a calculation model for determining user intention based on user voice information according to the eighth embodiment of the present application;
Fig. 11 is a flowchart of a method for determining a user's repayment intention based on the user's voice information according to the ninth embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. The application can, however, be implemented in many ways other than those described here, and those skilled in the art can make similar generalizations without departing from its spirit; the application is therefore not limited to the specific implementations disclosed below.
A first embodiment of the present application provides a method for determining user intention based on user voice information; its flowchart is shown in Fig. 1. This embodiment is explained using the example of determining a target user's repayment intention while a repayment request to that user is processed, and includes the following steps:
Step S101: acquiring the real-time voice of a target user while a specified transaction is processed.
This embodiment is described using the example of determining a target user's repayment intention when repayment is requested from that user. When a target user who has borrowed money must be urged to repay, business personnel usually communicate with the target user by telephone to remind or request the target user to repay as soon as possible.
The real-time voice is the target user's voice, up to the current point in time, during the current processing of the specified transaction (urging the target user to repay). During the telephone conversation between the business personnel and the target client, the original telephone voice up to the current point in time can be acquired at any moment. Where the original telephone voice contains the business personnel's voice, that voice is stripped out before this step, so that this step acquires voice containing only the borrowing user's voice information up to the current point in time of the collection call.
Step S102: determining, according to the semantic and intonation information contained in the real-time voice, speech feature values for calculating the user's intention.
The speech feature values for calculating the user's intention can be determined from the semantic and intonation information contained in the real-time voice in various ways, depending on specifics such as the application scenario, the length of the real-time voice, or the number of words it contains. This embodiment provides the following way:
First, the speech of each phrase contained in the real-time voice is extracted from the real-time voice according to a preset rule.
The preset rule is the same rule used to extract phrases from the historical voice of users during past processing of the specified transaction.
The historical voice comprises multiple pieces of voice, each piece being the voice of one user during one past processing of the specified transaction. The historical voice contains only the users' voice and no one else's.
The rule may be as follows: decompose the long sentence corresponding to the voice into basic words, and form phrases from adjacent basic words contained in the voice according to the preset requirement on the number of basic words per phrase.
The long sentence corresponding to the voice may be decomposed into basic words by applying a software tool (e.g., word2vec or CRF) to the long sentence.
For example, where the long sentence corresponding to the user's voice is "I have no money and don't want to repay", the basic words constituting the long sentence, obtained with the word2vec tool, are "I", "have no money", "don't want", and "repay".
When the preset number of basic words per phrase is 3 or fewer, the following 9 phrases can be composed, as shown in Fig. 2 (a code sketch of this rule follows the list):
phrases with 1 basic word: "I", "have no money", "don't want", and "repay";
phrases with 2 basic words: "I have no money", "have no money, don't want", and "don't want to repay";
phrases with 3 basic words: "I have no money and don't want" and "have no money and don't want to repay".
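The phrase-generation rule above amounts to enumerating every run of at most 3 adjacent basic words. The following is a minimal sketch of that rule; the application prescribes no implementation, and the language, function name, and variable names here are illustrative assumptions:

```python
# Sketch of the phrase-generation rule: form every phrase of up to max_len
# adjacent basic words. Names are illustrative, not from the application.
from typing import List

def generate_phrases(basic_words: List[str], max_len: int = 3) -> List[str]:
    """Return all phrases of 1..max_len adjacent basic words, grouped by length."""
    phrases = []
    for n in range(1, max_len + 1):                 # phrase lengths 1..max_len
        for i in range(len(basic_words) - n + 1):   # sliding window of width n
            phrases.append(" ".join(basic_words[i:i + n]))
    return phrases

# The example sentence decomposes into 4 basic words, giving 4 + 3 + 2 = 9 phrases.
print(generate_phrases(["I", "have no money", "don't want", "repay"]))
```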
The speech corresponding to each phrase is then cut out of the speech of the long sentence and stored.
After the speech of each phrase is obtained, an intonation coefficient is assigned to the speech of each phrase according to a preset rule.
The preset rule is an intonation coefficient reference standard determined from the historical voice of users during past processing of the specified transaction.
The reference standard for determining intonation coefficients can be obtained as follows:
Acquire historical voice of users during past processing of the specified transaction; the historical voice comprises multiple pieces of voice, each piece being the voice of one user during one past processing of the specified transaction. The historical voice contains only the users' voice and no one else's.
From each piece of voice, extract the speech of the phrases composed of basic words corresponding to that piece, according to the preset number of basic words per phrase. The extraction method is the same as in the previous step and is not repeated here.
Sort the intonation values of the speech of the phrases generated from the users' historical voice (values calculated from the speech frequency of each phrase; the specific calculation method can be chosen according to the actual situation), and divide them by magnitude into a specified number of intervals. For the case where the intonation value is at most 100, divide evenly, from small to large, into 10 intonation-value intervals: 0-10, 11-20, ..., 81-90, and 91 and above.
Set a corresponding intonation coefficient for each interval, such as 1, 2, ..., 9, and 10.
This completes the intonation coefficient reference standard determined from the historical voice of users during past processing of the specified transaction.
According to this reference standard, each phrase of the real-time voice is assigned an intonation coefficient based on the phrase's intonation value.
For example, if the intonation value of a phrase in the real-time voice falls into the interval 11-20, the phrase is assigned the intonation coefficient 2, and so on for the other cases. A sketch of this mapping follows.
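A minimal sketch of this interval-to-coefficient mapping, assuming the 10 equal intervals of width 10 from the example above (the helper name is an illustrative assumption):

```python
import math

# Sketch of the intonation coefficient reference standard from the example:
# intonation values are split into 10 equal intervals of width 10, and each
# interval maps to a coefficient 1..10 (91 and above maps to 10).
def intonation_coefficient(value: float, bin_width: float = 10.0,
                           num_bins: int = 10) -> int:
    coeff = math.ceil(value / bin_width)   # 0-10 -> 1, 11-20 -> 2, ...
    return min(max(coeff, 1), num_bins)    # clamp: 91 and above -> 10

assert intonation_coefficient(15) == 2     # falls in 11-20, coefficient 2
assert intonation_coefficient(95) == 10    # 91 and above, coefficient 10
```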
After each phrase has been assigned an intonation coefficient, the speech feature values are determined according to whether the reference phrases for evaluation are contained in the real-time voice and the intonation coefficients of the speech of the contained reference phrases.
The reference phrases for evaluation may be obtained by the following method:
Acquire historical voice of users during past processing of the specified transaction; the historical voice comprises multiple pieces of voice, each piece being the voice of one user during one past processing of the specified transaction. The historical voice contains only the users' voice and no one else's.
As in the preceding steps, the phrases composed of basic words corresponding to each piece of voice are extracted from that piece according to the preset number of basic words per phrase; this is not repeated here.
From the phrases corresponding to the pieces of voice in the historical voice, each phrase meeting the preset count requirement for reference phrases for evaluation is taken as a reference phrase for evaluation, determined from the historical voice of users who went through past processing of the specified transaction.
Count the occurrences of the distinct phrases generated from the historical voice. Distinct phrases are phrases whose semantic texts differ; phrases with the same semantic text are counted as the same phrase. For example, when the phrase "I" appears in 2 pieces of historical voice, its count is 2, because the semantic text is "I" in both, regardless of other factors such as the phrase's intonation value or intonation coefficient in the different pieces.
The preset count requirement for reference phrases for evaluation may be determined according to the actual situation; the following means of determining it is provided:
Sort the distinct phrases by the number of times they appear in the historical voice, and determine the count selection interval from the sorted positions. Specifically, the rank numbers can be normalized so that they range from 0 to 1, and the counts of the phrases at the 40% and 60% positions are used as the endpoints of the count selection interval. The interval may or may not include its endpoints, as the actual situation requires. Choosing the 40% to 60% range filters out phrases that appear too often or too rarely in the historical voice and are irrelevant to the transaction, leaving the commonly used phrases that are relevant to it.
A phrase's count falling within the count selection interval is the count requirement for reference phrases for evaluation.
Reference phrases for evaluation determined from the historical voice in this way ensure that the phrases used for evaluation are the ones most relevant to the specified transaction.
For example, where 100 phrases are ranked by count from largest to smallest and the counts of the phrases ranked 40th and 60th bound the count selection interval, with the phrase ranked 40th having a count of 1000 and the phrase ranked 60th a count of 700, a count falling within the interval [700, 1000] is the count requirement for reference phrases for evaluation.
The phrases, among those corresponding to the pieces of voice in the historical voice, that meet the count requirement for reference phrases for evaluation are taken as the reference phrases for evaluation. For the count requirement of the example above, all phrases whose count is greater than or equal to 700 and less than or equal to 1000 may be used as reference phrases for evaluation. A sketch of the selection follows.
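A sketch of this selection procedure, assuming the phrases are ranked by occurrence count from largest to smallest and that the interval endpoints are included, as in the example above; all names are illustrative:

```python
from collections import Counter
from typing import Iterable, List

# Sketch of reference-phrase selection: count each distinct phrase's
# occurrences across the historical voice, take the counts at the 40% and 60%
# rank positions as the endpoints of the count selection interval, and keep
# every phrase whose count falls inside it (endpoints included here).
def select_reference_phrases(all_phrases: Iterable[str]) -> List[str]:
    counts = Counter(all_phrases)
    ranked = sorted(counts.values(), reverse=True)     # counts, largest first
    lo_idx = max(int(len(ranked) * 0.40) - 1, 0)       # the 40% rank position
    hi_idx = max(int(len(ranked) * 0.60) - 1, 0)       # the 60% rank position
    upper, lower = ranked[lo_idx], ranked[hi_idx]      # e.g. 1000 and 700
    return [p for p, c in counts.items() if lower <= c <= upper]
```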
Whether each reference phrase for evaluation is contained in the real-time voice is then judged in turn. When a reference phrase for evaluation is contained in the real-time voice, the corresponding speech feature value is set to the intonation coefficient of the speech of that reference phrase in the real-time voice.
For example, if the real-time voice contains the phrase "I have no money" with intonation coefficient 2, and "I have no money" is a reference phrase for evaluation, the speech feature value of the real-time voice corresponding to the reference phrase "I have no money" is set to 2.
When a reference phrase for evaluation is not contained in the real-time voice, the corresponding speech feature value is set to a preset value representing "not contained".
For example, if the reference phrases for evaluation include the phrase "money is insufficient", the real-time voice does not contain "money is insufficient", and the preset value representing "not contained" is 0, then the speech feature value of the real-time voice corresponding to the reference phrase "money is insufficient" is set to 0. A sketch of assembling the feature values follows.
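A sketch of assembling the speech feature values for one piece of real-time voice, assuming 0 as the preset value representing "not contained", as in the example; names are illustrative:

```python
from typing import Dict, List

# Sketch of speech feature value construction: for each reference phrase, the
# feature is the intonation coefficient of that phrase's speech if the
# real-time voice contains it, and the preset "not contained" value otherwise.
def speech_feature_values(phrase_coeffs: Dict[str, int],
                          reference_phrases: List[str],
                          not_contained: int = 0) -> List[int]:
    # phrase_coeffs maps each phrase found in the real-time voice to its
    # intonation coefficient.
    return [phrase_coeffs.get(ref, not_contained) for ref in reference_phrases]

# "I have no money" is contained with coefficient 2; "money is insufficient" is not.
features = speech_feature_values({"I have no money": 2},
                                 ["I have no money", "money is insufficient"])
assert features == [2, 0]
```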
Step S103: using the speech feature values as input parameters of a calculation model for determining user intention based on user voice information, to calculate an evaluation value of the target user's intention for the current processing of the specified transaction.
Once formed, the speech feature values are used as input parameters of the calculation model that determines user intention based on user voice information. The calculation model is a computer deep neural network with fixed parameters, and can be obtained as follows:
Set up the computer deep neural network and initialize its parameters, such as the input, the output, the number of layers, and the weights. The output of the computer deep neural network is an evaluation value of the user's intention during processing of the specified transaction.
Train the computer deep neural network with the training sample data until it converges.
Training the computer deep neural network may take the following form:
Use the feature values of each piece of voice relative to each reference phrase for evaluation, generated from the historical voice of users during past processing of the specified transaction, as the input of the computer deep neural network.
The feature values are obtained as follows:
From the historical voice of users during past processing of the specified transaction, extract the speech of the phrases composed of basic words corresponding to each piece of voice, according to the preset number of basic words per phrase. For details, see the corresponding description in the previous step.
Determine the intonation coefficient of the speech of each phrase corresponding to each piece of voice. For details, see the corresponding description in the previous step.
From the phrases corresponding to the pieces of voice in the historical voice, take each phrase meeting the preset count requirement for reference phrases for evaluation as a reference phrase for evaluation determined from the historical voice of users during past processing of the specified transaction. For details, see the corresponding description in the previous step.
Generate the feature values of each piece of voice relative to each reference phrase for evaluation according to whether the piece contains each reference phrase and the intonation coefficients of the speech of the reference phrases it contains. These feature values are determined in the same way as the speech feature values in the preceding step; for details, see the related description of the speech feature values above.
The feature values, together with the corresponding results of processing the specified transaction, are used as sample data for training the calculation model that determines user intention based on user voice information. The sample data comes directly from the historical voice and results of users during past processing of the specified transaction, and is therefore more direct and reliable.
Take the group of feature values of the reference phrases for evaluation corresponding to one piece of historical voice in the sample data as the input data for one training pass of the computer deep neural network, take the result of processing the specified transaction corresponding to that piece of historical voice as the expected value of the network's output, and keep training and adjusting the network's parameters, according to a preset acceptable range of the output value, until it converges. For example, when repayment requests to users are processed, the result that a user has not repaid is represented by "0" and the case of having repaid by "1". With a preset acceptable output range of 20%, an output of the computer deep neural network greater than or equal to 0.8 can be considered to meet the expectation of "repaid", and otherwise not to meet it; an output less than or equal to 0.2 can be considered to meet the expectation of "not repaid", and otherwise not to meet it. Where expectations are not met, the relevant parameters of the computer deep neural network need to be adjusted.
Train the computer deep neural network with the sample data until it converges, fix all its parameters, and use the network at that point as the calculation model for determining user intention based on user voice information. A minimal training sketch follows.
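The application fixes neither a network architecture nor a framework; the following sketch illustrates the training procedure with an assumed small PyTorch network, using the 20% acceptable output range from the example above as the convergence test:

```python
import torch
import torch.nn as nn

# Sketch of training the calculation model: feature vectors from historical
# voice are the inputs (shape: N pieces of voice x number of reference
# phrases), the known processing results (1.0 = repaid, 0.0 = not repaid) are
# the expected outputs, and training stops once every output lies within the
# preset acceptable range (20% here) of its expected value. The architecture
# and hyperparameters below are illustrative assumptions.
def train_intention_model(features: torch.Tensor, results: torch.Tensor,
                          max_epochs: int = 1000) -> nn.Module:
    model = nn.Sequential(
        nn.Linear(features.shape[1], 32), nn.ReLU(),
        nn.Linear(32, 1), nn.Sigmoid(),        # output: evaluation value in [0, 1]
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.BCELoss()
    for _ in range(max_epochs):
        pred = model(features).squeeze(1)
        loss = loss_fn(pred, results)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if ((pred - results).abs() <= 0.2).all():   # all outputs within range
            break
    return model

# Inference: the real-time speech feature vector yields the evaluation value.
# evaluation = model(realtime_features).item()   # e.g. 0.7
```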
The speech feature values obtained in the previous steps are used as input parameters of this calculation model, and the evaluation value of the user's corresponding intention (repayment intention) for the current processing of the specified transaction (such as a repayment request) is calculated.
For example, where the user's real-time voice is "I have no money and don't want to repay" and the corresponding speech feature values are "0, 4, 3, 5, 2, 3, 5, 3, 4", the speech feature values are used as the input of the calculation model and an evaluation value, e.g. "0.7", is calculated; the evaluation value of the user's repayment intention for the current repayment request is then 0.7.
The calculated evaluation value of the target client's repayment intention can help business personnel adopt appropriate strategies or measures to prompt the target client to repay as soon as possible.
The scheme of the method for determining user intention based on user voice information provided by the application is briefly described below from the viewpoint of data flow, as shown in Fig. 3.
The historical voice of users during past processing of the specified transaction undergoes semantic processing to obtain historical voice phrases, and phrase counting over the historical voice phrases yields the reference phrases for evaluation.
Intonation processing of the historical voice phrases yields their intonation coefficients. Combining the intonation coefficients of the historical voice phrases with the reference phrases for evaluation and the historical voice, feature value generation yields the feature values of the historical voice, and training the computer deep neural network with the feature values of the historical voice and the corresponding past results of processing the specified transaction yields the calculation model.
The real-time voice of the target user during the current processing of the specified transaction undergoes the corresponding semantic and intonation processing, yielding in turn the phrases of the real-time voice and their intonation coefficients; the processing rules are the same as those applied to the historical voice of users during past processing of the specified transaction.
Feature value generation then produces the speech feature values from the real-time voice's phrases and intonation coefficients and the reference phrases for evaluation determined from the users' historical voice during past processing of the specified transaction.
The speech feature values are used as the input of the calculation model to calculate the evaluation value of the target user's intention.
This completes the determination of the target user's intention during one processing of the specified transaction. If the target user's real-time voice continues to be acquired while the specified transaction is processed, the method can be applied to keep determining the evaluation value of the target user's intention, continuously giving business personnel a reference for adopting appropriate strategies and measures.
A second embodiment of the present application provides a method for determining reference phrases for evaluation, the reference phrases being used to determine a user's corresponding intention while a specified transaction is processed. Its flowchart is shown in Fig. 4, and it includes the following steps:
Step S201: acquiring historical voice of users during past processing of the specified transaction, the historical voice comprising multiple pieces of voice, each piece being the voice of one user during one past processing of the specified transaction.
For a detailed description of this step, refer to the descriptions of steps S101 and S102 in the first embodiment of the present application, which are not repeated here.
Step S202: extracting from each piece of voice, according to the preset number of basic words per phrase, the phrases composed of basic words corresponding to that piece.
For a detailed description of this step, refer to the description of step S102 in the first embodiment of the present application, which is not repeated here.
Step S203: taking, from the phrases corresponding to the pieces of voice in the historical voice, each phrase meeting the preset count requirement for reference phrases for evaluation as a reference phrase for evaluation.
For a detailed description of this step, refer to the description of step S102 in the first embodiment of the present application, which is not repeated here.
A third embodiment of the present application provides a method for generating sample data for training, where the sample data is used to generate a calculation model for determining a user intention based on user speech information. The flow diagram of the method is shown in fig. 5, and the method comprises the following steps:
step S301, obtaining historical voice of the past user for processing the specified transaction, wherein the historical voice comprises a plurality of voices, and each voice is voice of one user when the specified transaction is processed in the past.
For detailed description of this step, reference may be made to the description related to steps S101 and S102 in the first embodiment of the present application, which is not described herein again.
Step S302, using the historical speech, generating characteristic values of each speech relative to each reference phrase for evaluation.
For a detailed description of this step, reference may be made to the description related to step S103 in the first embodiment of the present application, which is not described herein again.
Step S303, using the feature value of each piece of voice with respect to each reference phrase for evaluation, together with the result of processing the specified transaction corresponding to each piece of voice, as sample data for training.
For a detailed description of this step, reference may be made to the description related to step S103 in the first embodiment of the present application, which is not described herein again.
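Putting steps S301 to S303 together, a hypothetical sketch of sample assembly follows; representing each voice as a phrase-to-intonation-coefficient mapping and using a binary transaction result are assumptions of the example.

```python
from typing import Dict, List, Sequence, Tuple

def training_samples(
    voices: Sequence[Dict[str, float]],   # per voice: phrase -> intonation coefficient
    outcomes: Sequence[int],              # per voice: transaction result (e.g. 1 = repaid)
    reference_phrases: Sequence[str],
    not_contained: float = 0.0,           # assumed preset "not contained" value
) -> List[Tuple[List[float], int]]:
    """Pair each voice's feature values, relative to every reference phrase
    for evaluation, with the result of the corresponding transaction."""
    return [
        ([coeffs.get(p, not_contained) for p in reference_phrases], y)
        for coeffs, y in zip(voices, outcomes)
    ]

samples = training_samples(
    voices=[{"cannot repay": 0.3}, {"next week": 0.8}],
    outcomes=[0, 1],
    reference_phrases=["cannot repay", "next week"],
)
print(samples)  # -> [([0.3, 0.0], 0), ([0.0, 0.8], 1)]
```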
A fourth embodiment of the present application provides a method for generating a calculation model for determining user intention based on user voice information. A flow chart of the method is shown in fig. 6; it includes the following steps:
Step S401, initializing each parameter of the computer deep neural network.
Step S402, taking as the input of the computer deep neural network the feature values of each piece of voice, relative to each reference phrase for evaluation, in the training sample data generated from the historical voice of users who processed the specified transaction in the past; taking the result of the corresponding user's processing of the specified transaction as the expected output value of the network; and training the computer deep neural network until it converges, according to a preset acceptable range of output values.
And S403, using the converged computer deep neural network as a calculation model for determining the user intention based on the user voice information.
The method provided by this embodiment is similar to the method of obtaining the calculation model described in step S103 of the first embodiment of this application; for a more detailed description, reference may be made to the related description there, which is not repeated here.
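For illustration, steps S401 to S403 might look as follows with scikit-learn's MLPClassifier standing in for the computer deep neural network; the patent names no framework, so the library choice and every hyperparameter here are assumptions.

```python
from sklearn.neural_network import MLPClassifier

def train_intention_model(X, y):
    # S401: parameters are initialized when the network is constructed
    net = MLPClassifier(hidden_layer_sizes=(64, 32),
                        tol=1e-4,       # stand-in for the preset acceptable output range
                        max_iter=1000,  # safety cap on training iterations
                        random_state=0)
    # S402: inputs are the feature values of each voice relative to the
    # reference phrases; expected outputs are the transaction results
    net.fit(X, y)
    # S403: the converged network serves as the calculation model
    return net

X = [[0.3, 0.0], [0.0, 0.8], [0.2, 0.1], [0.0, 0.9]]
y = [0, 1, 0, 1]
model = train_intention_model(X, y)
print(model.predict([[0.0, 0.7]]))  # illustrative prediction
```

In this stand-in, the tol parameter plays the role of the preset acceptable range: training stops once the loss stops improving by more than tol, which is one common way to operationalize convergence.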
The fifth embodiment of the present application provides an apparatus for determining a user intention based on user voice information, and the apparatus is configured as shown in fig. 7, and includes an obtaining unit U501, a voice feature value determining unit U502, and a determining unit U503.
The obtaining unit U501 is configured to obtain real-time voice of a target user when a specified transaction is processed;
the voice feature value determining unit U502 is configured to determine a voice feature value for calculating the user's intention according to semantic and intonation information included in the real-time voice.
The speech feature value determination unit U502 may include a phrase determination subunit, a tone coefficient determination subunit, and a speech feature value determination subunit.
And the phrase determining subunit is used for extracting and generating the voice of each phrase contained in the real-time voice from the real-time voice according to a preset rule.
And the tone coefficient determining subunit is configured to assign a tone coefficient to the voice of each phrase according to a preset rule.
And the voice characteristic value determining subunit is used for determining the voice characteristic value according to the condition that the reference phrase for evaluation is contained by the real-time voice and the tone coefficient of the voice of the reference phrase for evaluation contained by the real-time voice.
The speech feature value determination subunit may be specifically configured to: and if the reference phrase for evaluation is contained by the real-time voice, setting the corresponding voice characteristic value as the tone coefficient of the voice of the reference phrase for evaluation contained by the real-time voice.
The determining unit U503 is configured to calculate an evaluation value of the intention of the target user when the specified transaction is processed this time, using the voice feature value as an input parameter of a calculation model for determining the intention of the user based on the user voice information.
A sixth embodiment of the present application provides an apparatus for determining a reference phrase for evaluation, where the reference phrase for evaluation is used to evaluate a corresponding intention of a user when processing a specified transaction, and a block diagram of the apparatus is shown in fig. 8, and the apparatus includes: an acquisition unit U601, a phrase determination unit U602, and an evaluation-use reference phrase determination unit U603.
The obtaining unit U601 is configured to obtain a historical voice of a previous user processing the specified transaction, where the historical voice includes multiple voices, and each voice is a voice of a user when the specified transaction was processed at a previous time.
The phrase determining unit U602 is configured to extract and generate each phrase composed of basic words and corresponding to each piece of speech from each piece of speech according to a preset number of basic words.
The reference phrase determination unit U603 for evaluation is configured to use, as a reference phrase for evaluation, each phrase that meets a requirement for a preset number of reference phrases for evaluation from each phrase corresponding to each speech in the historical speech.
A seventh embodiment of the present application provides an apparatus for generating sample data for training, where the sample data is used to generate a calculation model for determining user intention based on user voice information. A block diagram of the apparatus is shown in fig. 9; it includes: an obtaining unit U701, a feature value determining unit U702, and a sample data determining unit U703.
The obtaining unit U701 is configured to obtain a historical voice of a previous user for processing the specified transaction, where the historical voice includes multiple voices, and each voice is a voice of one user when the specified transaction is processed in the previous time.
The feature value determination unit U702 is configured to generate feature values of each speech with respect to each reference phrase for evaluation using the historical speech.
The characteristic value determination unit may include: the system comprises a phrase determining subunit, a tone coefficient determining subunit and a characteristic value determining subunit.
The phrase determining subunit is used for extracting and generating the voice of each phrase consisting of the basic words corresponding to each voice from each voice according to the preset number of the basic words;
and the intonation coefficient determining subunit is used for determining the intonation coefficient of the voice of the phrase corresponding to each voice.
And the reference phrase determining subunit is used for taking each phrase meeting the preset reference phrase number requirement for evaluation as each reference phrase for evaluation from each phrase corresponding to each voice in the historical voice.
And the feature value determining subunit is configured to generate the feature value of each piece of voice relative to each reference phrase for evaluation, according to whether each piece of voice in the historical voice contains each reference phrase for evaluation and the intonation coefficient of the speech of the reference phrase for evaluation contained in each piece of voice.
The feature value determining subunit may be specifically configured to: when a piece of voice in the historical voice contains a reference phrase for evaluation, use the intonation coefficient of the speech of that reference phrase contained in the voice as the feature value of the voice relative to that reference phrase.
The feature value determining subunit may also be specifically configured to: when a piece of voice in the historical voice does not contain a reference phrase for evaluation, use a preset value indicating "not contained" as the feature value of the voice relative to that reference phrase.
The sample data determining unit U703 is configured to use, as training sample data for training a model used for evaluating a corresponding intention of the user when the user processes the specified transaction, feature values of the speech with respect to the evaluation reference phrases and a result of the user corresponding to the speech when the user processes the specified transaction.
An eighth embodiment of the present application provides an apparatus for generating a calculation model for determining a user intention based on user speech information, where the calculation model is used for determining the user intention based on the user speech information when processing a specified transaction, and a block diagram of the apparatus is shown in fig. 10, and the apparatus includes: an initialization unit U801, a sample data generation unit U802, a training unit U803 and a model determination unit U804.
The initialization unit U801 is configured to initialize each parameter of the computer deep neural network.
The sample data generation unit U802 is configured to generate sample data for training by using the historical voice of users when the specified transaction was processed in the past.
The training unit U803 is configured to take as the input of the computer deep neural network the feature values of each piece of voice, relative to each reference phrase for evaluation, in the training sample data generated from the historical voice of users who processed the specified transaction in the past; to take the result of the corresponding user's processing of the specified transaction as the expected output value of the network; and to train the computer deep neural network until it converges, according to a preset acceptable range of output values.
The model determining unit U804 is used for taking the converged computer deep neural network as a calculation model for determining the user intention based on the user voice information.
A ninth embodiment of the present application provides a method for determining a repayment intention of a user based on user voice information, a flowchart of which is shown in fig. 11, including:
Step S901, acquiring real-time voice of the target lending user during the current collection.
In the loan issuing business, business personnel need to urge lending users to repay. Usually, business personnel communicate with the lending user by telephone to urge repayment.
In this step, the voice produced by the lending user up to the current moment of the telephone call is obtained. This voice contains only the lending user's side of the call, and does not include the voice of the business personnel.
Step S902, determining and calculating a voice characteristic value of the repayment intention of the user according to the semantic and intonation information contained in the real-time voice.
In this step, the voice feature value for calculating the user's repayment intention is similar to the voice feature value in the first embodiment of this application; it can be determined from the semantic and intonation information obtained by processing the lending user's voice.
For a detailed description of this step, reference may be made to the related description of step S102 in the first embodiment of this application, which is not repeated here.
And step S903, taking the voice characteristic value as an input parameter of a calculation model for determining the repayment intention of the user based on the voice information of the user, and calculating to obtain an evaluation value of the repayment intention of the target user during the current collection.
And in the step, the voice characteristic value obtained in the previous step is used as an input parameter of a calculation model for determining the repayment intention of the user based on the voice information of the user, and an evaluation value of the repayment intention of the target user is calculated.
For the calculation model for determining the user's repayment intention based on user voice information, reference may be made to the related description of the calculation model for determining user intention based on user voice information in the first embodiment of this application, which is not repeated here.
The voice feature value for calculating the user's repayment intention is determined from the semantic and intonation information of the user's own voice, so the user's current actual circumstances are introduced into the calculation of the repayment intention evaluation value. Because this information comes directly from the user's voice, it is authentic and effective, making the calculation result more accurate and useful.
In addition to the above steps, this embodiment may further provide a corresponding reference collection policy according to the repayment intention of the lending user calculated in step S903; business personnel can adjust their communication mode or attitude toward the lending user according to the provided reference collection policy, so as to promote repayment as soon as possible.
Furthermore, the above steps can be repeated continuously at a certain time interval (e.g., every 5 or 10 seconds), so that the collection policy for the lending user is adjusted continuously and repayment is promoted more purposefully.
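The loop below is a hypothetical illustration of that periodic re-evaluation; the policy thresholds are invented, the model is assumed to return an evaluation value, and the voice-capture and agent-notification callables are injected as parameters so the sketch stays self-contained.

```python
import time
from typing import Callable, Dict, Sequence

def collection_loop(call_active: Callable[[], bool],
                    capture_coeffs: Callable[[], Dict[str, float]],
                    notify_agent: Callable[[float, str], None],
                    model, reference_phrases: Sequence[str],
                    interval: float = 5.0) -> None:
    """Every `interval` seconds, re-score the lending user's repayment
    intention from the voice heard so far and suggest a collection policy."""
    while call_active():
        coeffs = capture_coeffs()  # phrase -> intonation coefficient so far
        x = [coeffs.get(p, 0.0) for p in reference_phrases]
        willingness = float(model.predict([x])[0])  # evaluation value (assumed)
        if willingness > 0.7:
            policy = "confirm a concrete repayment date"
        elif willingness > 0.3:
            policy = "explain consequences and offer installment options"
        else:
            policy = "escalate to a stricter collection strategy"
        notify_agent(willingness, policy)  # surface to business personnel
        time.sleep(interval)               # e.g. every 5 or 10 seconds
```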
Although the present application has been described with reference to preferred embodiments, they are not intended to limit the present application. Those skilled in the art can make variations and modifications without departing from the spirit and scope of the present application; therefore, the protection scope of the present application should be determined by the appended claims.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media does not include transitory media, such as modulated data signals and carrier waves.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) having computer-usable program code embodied therein.

Claims (35)

1. A method for determining a user's intent based on user voice information, comprising:
acquiring real-time voice of a target user when a specified transaction is processed;
determining and calculating a voice characteristic value of the user intention according to semantic and intonation information contained in the real-time voice, wherein the method comprises the following steps: extracting and generating voices of each phrase contained in the real-time voice from the real-time voice according to a preset rule; according to a preset rule, endowing a tone coefficient to the voice of each phrase; determining a voice characteristic value according to the situation that the reference phrase for evaluation is contained by the real-time voice and the tone coefficient of the voice of the reference phrase for evaluation contained by the real-time voice;
and taking the voice characteristic value as an input parameter of a calculation model for determining the user intention based on the user voice information, and calculating to obtain an evaluation value of the intention of the target user when the specified transaction is processed at this time.
2. The method according to claim 1, wherein determining the voice feature value according to whether the reference phrase for evaluation is contained in the real-time voice and the intonation coefficient of the speech of the reference phrase for evaluation contained in the real-time voice comprises:
and if the reference phrase for evaluation is contained by the real-time voice, setting the corresponding voice characteristic value as the tone coefficient of the voice of the reference phrase for evaluation contained by the real-time voice.
3. The method of claim 1, wherein the reference phrase for evaluation is obtained according to the following method:
acquiring historical voice of a user when the specified transaction is processed in the past, wherein the historical voice comprises a plurality of pieces of voice, and each piece of voice is voice of one user when the specified transaction is processed in the past;
extracting and generating each phrase which is formed by the basic words and corresponds to each voice from each voice according to the preset number of the basic words and phrases;
and taking each phrase meeting the preset reference phrase for evaluation number requirement as a reference phrase for evaluation from phrases corresponding to each voice in the historical voice.
4. The method of claim 3, wherein the number of reference phrases for evaluation that are set in advance is obtained by:
sorting the phrases by their occurrence counts;
determining a quantity selection interval according to the sorted positions;
and taking phrases whose counts fall within the quantity selection interval as meeting the quantity requirement for reference phrases for evaluation.
5. The method of claim 4, wherein determining a number selection interval according to the ranked positions comprises:
and taking the number of the phrases corresponding to the sorting position 40% in the sorting queue to the number of the phrases corresponding to the sorting position 60% as a number selection interval.
6. The method of claim 1, wherein the computational model for determining the user's intent based on the user's voice information is obtained by:
initializing each parameter of the computer deep neural network;
generating training sample data by using the historical voice of users when the specified transaction was processed in the past;
training the computer deep neural network with the training sample data until convergence;
and taking the converged computer deep neural network as the calculation model for determining the user intention based on the user voice information.
7. The method of claim 6, wherein generating training sample data using historical speech of the user in the past when processing the specified transaction comprises:
acquiring historical voice of a user when the specified transaction is processed in the past, wherein the historical voice comprises a plurality of voices, and each voice is voice of one user when the specified transaction is processed in the past;
generating characteristic values of the voices relative to the reference phrase for evaluation by using the historical voices;
and taking the characteristic value of each voice relative to the reference phrase for evaluation and the result of processing the specified transaction corresponding to each voice as sample data for training.
8. The method of claim 7, wherein the generating feature values of the respective voices relative to the reference phrase for evaluation using the historical voices comprises:
extracting and generating the voice of the phrase consisting of the basic words corresponding to each voice from each voice according to the preset number of the basic words;
determining the tone coefficient of the voice of the phrase corresponding to each voice;
taking each phrase meeting the preset reference phrase for evaluation number requirement as a reference phrase for evaluation from phrases corresponding to each voice in the historical voice;
and generating a characteristic value of each voice relative to the reference phrase for evaluation according to the condition that each voice in the historical voices contains the reference phrase for evaluation and the tone coefficient of the voice of the reference phrase for evaluation contained in each voice.
9. The method according to claim 8, wherein the generating feature values of the voices relative to the reference phrase for evaluation according to the fact that the voices of the historical voices contain the reference phrase for evaluation and the intonation coefficients of the voices of the reference phrase for evaluation contained in the voices comprises:
and for the case that each piece of voice in the historical voice contains the reference phrase for evaluation, using the tone coefficient of the voice of the reference phrase for evaluation contained in the corresponding voice as the characteristic value of the voice relative to the reference phrase for evaluation.
10. The method of claim 1, wherein the specified transaction comprises requesting repayment from the user, and wherein the corresponding intention comprises the user's repayment intention.
11. A method of determining reference phrases for evaluation, which are used to determine a corresponding intent of a user when processing a specified transaction, the method comprising the steps of:
acquiring historical voice of a user when the specified transaction is processed in the past, wherein the historical voice comprises a plurality of pieces of voice, and each piece of voice is voice of one user when the specified transaction is processed in the past;
extracting and generating each phrase which is formed by the basic words and corresponds to each voice from each voice according to the preset number of the basic words and phrases;
taking each phrase meeting the preset reference phrase for evaluation number requirement as a reference phrase for evaluation from each phrase corresponding to each voice in the historical voice;
determining and calculating a voice characteristic value of the user intention according to semantics and intonation information contained in real-time voice of a target user during processing of a specified transaction and the reference phrase for evaluation; and taking the voice characteristic value as an input parameter of a calculation model for determining the user intention based on the user voice information, and calculating to obtain an evaluation value of the intention of the target user when the specified transaction is processed at this time.
12. The method of determining reference phrases for evaluation according to claim 11, wherein the predetermined number of reference phrases for evaluation is obtained by:
sorting the phrases by their occurrence counts;
determining a quantity selection interval according to the sorted positions;
and taking phrases whose counts fall within the quantity selection interval as meeting the quantity requirement for reference phrases for evaluation.
13. The method of determining reference phrases for evaluation according to claim 12, wherein said determining a number selection interval according to the ranked positions comprises:
and taking the number of the phrases corresponding to 40% of the sorting positions in the sorting queue to the number of the phrases corresponding to 60% of the sorting positions in the sorting queue as a number selection interval.
14. A method of generating sample data for training a computer deep neural network to generate a computational model that determines a user's intent based on user speech information, the method comprising the steps of:
acquiring historical voice of a user when a specified transaction is processed in the past, wherein the historical voice comprises a plurality of pieces of voice, and each piece of voice is voice of one user when the specified transaction is processed in the past;
generating feature values of the respective voices with respect to the respective reference phrases for evaluation using the historical voices, including: generating a characteristic value of each voice relative to each reference phrase for evaluation according to the condition that each voice in the historical voice contains each reference phrase for evaluation and the tone coefficient of the voice of the reference phrase for evaluation contained in each voice;
and taking the characteristic value of each voice relative to each reference phrase for evaluation and the result of processing the specified transaction corresponding to each voice as sample data for training.
15. The method according to claim 14, wherein the generating feature values of the respective voices with respect to the respective reference phrases for evaluation based on the fact that the respective voices in the historical voices contain the respective reference phrases for evaluation and the intonation coefficients of the voices of the reference phrases for evaluation contained in the respective voices, comprises:
extracting and generating the voice of each phrase consisting of the basic words corresponding to each voice from each voice according to the preset number of the basic words;
determining the tone coefficient of the voice of the phrase corresponding to each voice;
taking each phrase meeting the preset reference phrase for evaluation quantity requirement as each reference phrase for evaluation from each phrase corresponding to each voice in the historical voice;
and generating a characteristic value of each voice relative to each reference phrase for evaluation according to the condition that each voice in the historical voice contains each reference phrase for evaluation and the tone coefficient of the voice of the reference phrase for evaluation contained in each voice.
16. The method according to claim 15, wherein the generating feature values of the respective voices with respect to the respective reference phrases for evaluation based on the fact that the respective voices in the historical voices contain the respective reference phrases for evaluation and the intonation coefficients of the voices of the reference phrases for evaluation contained in the respective voices, comprises:
and for the case that each piece of voice in the historical voice contains the reference phrase for evaluation, using the tone coefficient of the voice of the reference phrase for evaluation contained in the corresponding voice as the characteristic value of the voice relative to the reference phrase for evaluation.
17. The method of generating training sample data as recited in claim 14, wherein the specified transaction comprises requesting repayment from the user, and wherein the corresponding intention comprises the user's repayment intention.
18. A method of generating a computational model for determining user intent based on user speech information, the computational model for determining user intent based on user speech information when processing a specified transaction, the method comprising the steps of:
initializing each parameter of the computer deep neural network;
generating training sample data by using the historical voice of users when the specified transaction was processed in the past;
training the computer deep neural network with the training sample data until convergence;
using the converged computer deep neural network as a computational model for determining user intent based on user speech information;
wherein the generating of the sample data for training by using the historical speech of the user in the past processing of the specified transaction comprises: generating a characteristic value of each voice relative to the reference phrase for evaluation according to the condition that each voice in the historical voice contains each reference phrase for evaluation and the tone coefficient of the voice of the reference phrase for evaluation contained in each voice; and taking the characteristic value of each voice relative to the reference phrase for evaluation and the result of processing the specified transaction corresponding to each voice as sample data for training.
19. The method according to claim 18, wherein generating a feature value of each piece of speech with respect to the reference phrase for evaluation based on a fact that each piece of speech in the historical speech includes the reference phrase for evaluation and a pitch coefficient of speech of the reference phrase for evaluation included in each piece of speech comprises:
extracting and generating the voice of a phrase consisting of basic words corresponding to each voice from each voice according to the preset number of the basic words;
determining the tone coefficient of the voice of the phrase corresponding to each voice;
taking each phrase meeting the preset reference phrase for evaluation number requirement as a reference phrase for evaluation from phrases corresponding to each voice in the historical voice;
and generating a characteristic value of each voice relative to the reference phrase for evaluation according to the condition that each voice in the historical voices contains the reference phrase for evaluation and the tone coefficient of the voice of the reference phrase for evaluation contained in each voice.
20. The method according to claim 19, wherein the generating of the feature value of each piece of voice relative to the reference phrase for evaluation, according to whether each piece of voice in the historical voice contains each reference phrase for evaluation and the intonation coefficient of the speech of the reference phrase for evaluation contained in each piece of voice, comprises:
and for the case that each piece of voice in the historical voice contains the reference phrase for evaluation, using the tone coefficient of the voice of the reference phrase for evaluation contained in the corresponding voice as the characteristic value of the voice relative to the reference phrase for evaluation.
21. The method of generating a computational model for determining user intent based on user voice information as recited in claim 18, wherein the computational model is used to determine user intent to repay when a repayment is requested from a user.
22. An apparatus for determining a user's intent based on user speech information, comprising:
the system comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a real-time voice of a target user when a specified transaction is processed;
the voice characteristic value determining unit is used for determining and calculating a voice characteristic value of the user intention according to the semantic and intonation information contained in the real-time voice;
the determining unit is used for calculating an evaluation value of the intention of the target user when the specified transaction is processed at this time by taking the voice characteristic value as an input parameter of a calculation model for determining the intention of the user based on the voice information of the user;
wherein the voice feature value determination unit includes: the phrase determining subunit is used for extracting and generating the voice of each phrase contained in the real-time voice from the real-time voice according to a preset rule; a tone coefficient determining subunit, configured to assign a tone coefficient to the speech of each phrase according to a preset rule; and the voice characteristic value determining subunit is used for determining the voice characteristic value according to the condition that the reference phrase for evaluation is contained by the real-time voice and the tone coefficient of the voice of the reference phrase for evaluation contained by the real-time voice.
23. The apparatus of claim 22, wherein the speech feature value determination subunit is specifically configured to:
and if the reference phrase for evaluation is contained by the real-time voice, setting the corresponding voice characteristic value as the tone coefficient of the voice of the reference phrase for evaluation contained by the real-time voice.
24. An apparatus for determining a reference phrase for evaluation, the reference phrase for evaluation being used to determine a corresponding intent of a user when processing a specified transaction, the apparatus comprising:
the acquisition unit is used for acquiring historical voice of a user when the specified transaction is processed in the past, wherein the historical voice comprises a plurality of pieces of voice, and each piece of voice is voice of one user when the specified transaction is processed in the past;
the phrase determining unit is used for extracting and generating each phrase which is formed by basic words and corresponds to each voice from each voice according to the preset number of the basic words;
an evaluation reference phrase determining unit, configured to use, as an evaluation reference phrase, each phrase that meets a preset evaluation reference phrase number requirement from among phrases corresponding to each voice in the historical voice;
determining and calculating a voice characteristic value of the user intention according to semantics and intonation information contained in real-time voice of a target user during processing of a specified transaction and the reference phrase for evaluation; and taking the voice characteristic value as an input parameter of a calculation model for determining the user intention based on the user voice information, and calculating to obtain an evaluation value of the intention of the target user when the specified transaction is processed at this time.
25. An apparatus for generating sample data for training a computer deep neural network to generate a computational model for determining a user's intent based on user speech information, the apparatus comprising:
the system comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring historical voice of a user when a specified transaction is processed in the past, the historical voice comprises a plurality of pieces of voice, and each piece of voice is voice of one user when the specified transaction is processed in the past;
a feature value determination unit configured to generate feature values of the respective voices with respect to the respective reference phrases for evaluation, using the historical voices;
a sample data determining unit, configured to use a feature value of each speech with respect to each reference phrase for evaluation and a result of processing the specified transaction corresponding to each speech as sample data for training;
the feature value determining unit further includes a feature value determining subunit, configured to generate a feature value of each piece of speech with respect to each reference phrase for evaluation according to a situation that each piece of speech in the historical speech includes each reference phrase for evaluation and a intonation coefficient of speech of each reference phrase for evaluation included in each piece of speech.
26. The apparatus for generating sample data for training as set forth in claim 25, wherein the feature value determination unit further includes:
the phrase determining subunit is used for extracting and generating the voice of each phrase consisting of the basic words corresponding to each voice from each voice according to the preset number of the basic words;
a tone coefficient determining subunit, configured to determine a tone coefficient of a voice of a phrase corresponding to each voice;
and the reference phrase determining subunit is used for taking each phrase meeting the preset reference phrase number requirement for evaluation as each reference phrase for evaluation from each phrase corresponding to each voice in the historical voice.
27. The apparatus for generating sample data for training as claimed in claim 26, wherein the eigenvalue determination subunit is specifically configured to:
and for the case that each piece of voice in the historical voice contains the reference phrase for evaluation, using the tone coefficient of the voice of the reference phrase for evaluation contained in the corresponding voice as the characteristic value of the voice relative to the reference phrase for evaluation.
28. An apparatus for generating a computational model for determining a user intent based on user speech information, the computational model for determining the user intent based on the user speech information when processing a specified transaction, comprising:
the initialization unit is used for initializing all parameters of the computer deep neural network;
the sample data generating unit is used for generating sample data for training by using the historical voice of users when the specified transaction was processed in the past;
a training unit for training the computer deep neural network with the sample data for training until convergence;
a model determination unit for using the converged computer deep neural network as a computational model for determining a user intention based on user speech information;
wherein the generating of the sample data for training by using the historical speech of the user in the past processing of the specified transaction comprises: generating a characteristic value of each voice relative to the reference phrase for evaluation according to the condition that each voice in the historical voice contains each reference phrase for evaluation and the tone coefficient of the voice of the reference phrase for evaluation contained in each voice; and taking the characteristic value of each voice relative to the reference phrase for evaluation and the result of processing the specified transaction corresponding to each voice as sample data for training.
29. A method for determining a repayment intention of a user based on voice information of the user, comprising:
acquiring real-time voice of a target lending user during the current collection;
determining and calculating a voice characteristic value of the repayment intention of the user according to semantic and intonation information contained in the real-time voice, wherein the voice characteristic value comprises the following steps: extracting and generating voices of each phrase contained in the real-time voice from the real-time voice according to a preset rule; according to a preset rule, endowing a tone coefficient to the voice of each phrase; determining a voice characteristic value according to the situation that the reference phrase for evaluation is contained by the real-time voice and the tone coefficient of the voice of the reference phrase for evaluation contained by the real-time voice;
and taking the voice characteristic value as an input parameter of a calculation model for determining the repayment intention of the user based on the voice information of the user, and calculating to obtain an evaluation value of the repayment intention of the target lending user during the current collection.
30. The method for determining a repayment intention of a user based on voice information of the user according to claim 29, further comprising:
and providing a corresponding reference collection strategy according to the evaluation value.
31. A method for determining a user's intent based on user voice information, comprising:
acquiring real-time voice of a target user when a specified transaction is processed;
determining and calculating a voice characteristic value of the user intention according to semantic and intonation information contained in the real-time voice;
taking the voice characteristic value as an input parameter of a calculation model for determining the user intention based on the user voice information, and calculating to obtain an evaluation value of the intention of the target user when the specified transaction is processed at this time;
wherein the calculation model for determining the user intention based on the user voice information is obtained by adopting the following method: initializing each parameter of the computer deep neural network; generating training sample data by using the historical voice of users when the specified transaction was processed in the past; training the computer deep neural network with the training sample data until convergence; and using the converged computer deep neural network as the calculation model for determining the user intention based on the user voice information;
wherein the generating of the sample data for training by using the historical speech of the user in the past processing of the specified transaction comprises: acquiring historical voice of a user when the specified transaction is processed in the past, wherein the historical voice comprises a plurality of voices, and each voice is voice of one user when the specified transaction is processed in the past; generating characteristic values of the voices relative to the reference phrase for evaluation by using the historical voices; and taking the characteristic value of each voice relative to the reference phrase for evaluation and the result of processing the specified transaction corresponding to each voice as sample data for training.
32. The method of claim 31, wherein the generating feature values of the respective voices relative to the reference phrase for evaluation using the historical voices comprises:
extracting and generating the voice of the phrase consisting of the basic words corresponding to each voice from each voice according to the preset number of the basic words;
determining the tone coefficient of the voice of the phrase corresponding to each voice;
taking each phrase meeting the preset reference phrase for evaluation number requirement as a reference phrase for evaluation from phrases corresponding to each voice in the historical voice;
and generating a characteristic value of each voice relative to the reference phrase for evaluation according to the condition that each voice in the historical voices contains the reference phrase for evaluation and the tone coefficient of the voice of the reference phrase for evaluation contained in each voice.
33. The method according to claim 32, wherein the generating feature values of the voices relative to the reference phrase for evaluation according to the fact that the voices in the history include the reference phrase for evaluation and the intonation coefficients of the voices of the reference phrase for evaluation included in the voices includes:
and for the case that each piece of voice in the historical voice contains the reference phrase for evaluation, using the tone coefficient of the voice of the reference phrase for evaluation contained in the corresponding voice as the characteristic value of the voice relative to the reference phrase for evaluation.
34. The method of claim 31, wherein the specified transaction comprises requesting repayment from the user, and wherein the corresponding intention comprises the user's repayment intention.
35. An apparatus for determining a user's intent based on user speech information, comprising:
the system comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a real-time voice of a target user when a specified transaction is processed;
the voice characteristic value determining unit is used for determining and calculating a voice characteristic value of the user intention according to the semantic and intonation information contained in the real-time voice;
the determining unit is used for calculating an evaluation value of the intention of the target user when the specified transaction is processed at this time by taking the voice characteristic value as an input parameter of a calculation model for determining the intention of the user based on the voice information of the user;
wherein the calculation model for determining the user intention based on the user voice information is obtained by adopting the following method: initializing each parameter of the computer deep neural network; generating training sample data by using the historical voice of users when the specified transaction was processed in the past; training the computer deep neural network with the training sample data until convergence; and using the converged computer deep neural network as the calculation model for determining the user intention based on the user voice information;
wherein the generating of the sample data for training by using the historical speech of the user in the past processing of the specified transaction comprises: acquiring historical voice of a user when the specified transaction is processed in the past, wherein the historical voice comprises a plurality of voices, and each voice is voice of one user when the specified transaction is processed in the past; generating characteristic values of the voices relative to the reference phrase for evaluation by using the historical voices; and taking the characteristic value of each voice relative to the reference phrase for evaluation and the result of processing the specified transaction corresponding to each voice as sample data for training.
CN201611187130.6A 2016-12-20 2016-12-20 Method and device for determining user intention based on user voice information Active CN108205525B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611187130.6A CN108205525B (en) 2016-12-20 2016-12-20 Method and device for determining user intention based on user voice information

Publications (2)

Publication Number Publication Date
CN108205525A CN108205525A (en) 2018-06-26
CN108205525B true CN108205525B (en) 2021-11-19

Family

ID=62603467

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611187130.6A Active CN108205525B (en) 2016-12-20 2016-12-20 Method and device for determining user intention based on user voice information

Country Status (1)

Country Link
CN (1) CN108205525B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109670148A (en) * 2018-09-26 2019-04-23 平安科技(深圳)有限公司 Collection householder method, device, equipment and storage medium based on speech recognition
CN109670166A (en) * 2018-09-26 2019-04-23 平安科技(深圳)有限公司 Collection householder method, device, equipment and storage medium based on speech recognition
CN110009480A (en) * 2019-03-06 2019-07-12 平安科技(深圳)有限公司 The recommended method in judicial collection path, device, medium, electronic equipment
CN111246027B (en) * 2020-04-28 2021-02-12 南京硅基智能科技有限公司 Voice communication system and method for realizing man-machine cooperation

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101685634A (en) * 2008-09-27 2010-03-31 上海盛淘智能科技有限公司 Children speech emotion recognition method
CN103474065A (en) * 2013-09-24 2013-12-25 贵阳世纪恒通科技有限公司 Method for determining and recognizing voice intentions based on automatic classification technology
CN103778915A (en) * 2012-10-17 2014-05-07 三星电子(中国)研发中心 Speech recognition method and mobile terminal
CN105679316A (en) * 2015-12-29 2016-06-15 深圳微服机器人科技有限公司 Voice keyword identification method and apparatus based on deep neural network
CN106205611A (en) * 2016-06-29 2016-12-07 北京智能管家科技有限公司 A kind of man-machine interaction method based on multi-modal historical responses result and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103268313B (en) * 2013-05-21 2016-03-02 北京云知声信息技术有限公司 A kind of semantic analytic method of natural language and device

Also Published As

Publication number Publication date
CN108205525A (en) 2018-06-26

Similar Documents

Publication Publication Date Title
CN107886949B (en) Content recommendation method and device
CN108205525B (en) Method and device for determining user intention based on user voice information
CN110457432B (en) Interview scoring method, interview scoring device, interview scoring equipment and interview scoring storage medium
CN108255934B (en) Voice control method and device
US10210867B1 (en) Adjusting user experience based on paralinguistic information
CN106202270B (en) Man-machine conversation method and device based on natural language
US10770062B2 (en) Adjusting a ranking of information content of a software application based on feedback from a user
CN104750674B (en) A kind of man-machine conversation's satisfaction degree estimation method and system
JP7286774B2 (en) Composite model scaling for neural networks
Farnadi et al. A multivariate regression approach to personality impression recognition of vloggers
US20200184570A1 (en) Generating self-support metrics based on paralinguistic information
CN110543550B (en) Method and device for automatically generating test questions
CN109522397A (en) Information processing method and device based on semanteme parsing
CN111179055A (en) Credit limit adjusting method and device and electronic equipment
CN111739537B (en) Semantic recognition method and device, storage medium and processor
CN116738293A (en) Service evaluation processing method and device and electronic equipment
CN106971306B (en) Method and system for identifying product problems
CN111353688B (en) User resource allocation method and device
CN106157969B (en) Method and device for screening voice recognition results
CN114186646A (en) Block chain abnormal transaction identification method and device, storage medium and electronic equipment
Li et al. Exchange rate jumps and exports: Evidence from China
CN113438374A (en) Intelligent outbound call processing method, device, equipment and storage medium
CN111161706A (en) Interaction method, device, equipment and system
CN114861680B (en) Dialogue processing method and device
KR102322899B1 (en) Solution and appratus for news positive tendency analysis using deep learning nlp model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant