CN108205525A - Method and apparatus for determining user intent based on user speech information - Google Patents

Method and apparatus for determining user intent based on user speech information

Info

Publication number
CN108205525A
CN108205525A (application number CN201611187130.6A)
Authority
CN
China
Prior art keywords
voice
user
reference phrases
phrase
characteristic value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611187130.6A
Other languages
Chinese (zh)
Other versions
CN108205525B (en)
Inventor
张柯
王晓光
褚巍
施兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201611187130.6A priority Critical patent/CN108205525B/en
Publication of CN108205525A publication Critical patent/CN108205525A/en
Application granted granted Critical
Publication of CN108205525B publication Critical patent/CN108205525B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/205: Parsing
    • G06F 40/216: Parsing using statistical methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/279: Recognition of textual entities
    • G06F 40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/30: Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The application provides a method for determining user intent based on user speech information, including: obtaining real-time speech of a target user while a specified matter is being handled; determining, from the semantic and prosodic information contained in the real-time speech, speech feature values for computing user intent; and using the speech feature values as input parameters to a computation model that determines user intent based on user speech information, to compute an assessed value of the target user's intent in the current handling of the specified matter. Because the speech feature values are determined from the semantic and prosodic information contained in the target user's real-time speech, the reliability of the data source is ensured; the method objectively determines, in real time, an assessed value of the target user's intent while the specified matter is being handled, giving the business personnel handling the matter a real-time indication.

Description

Method and apparatus for determining user intent based on user speech information
Technical field
This application relates to methods of determining user intent, and in particular to a method and apparatus for determining user intent based on user speech information. It further relates to a method and apparatus for determining assessment reference phrases, a method and apparatus for generating training sample data, and a method and apparatus for generating a computation model that determines user intent based on user speech information. It also relates to a method for determining a user's repayment intent based on user speech information.
Background
In daily operations, when it is hoped that a user will perform a specific behavior, business personnel need to take corresponding actions or measures, according to the user's intent, to encourage the user to perform that behavior.
For example, when handling a matter in which a user is expected to make a purchase, it is necessary to determine the user's purchase intent and, depending on that intent, adopt a corresponding strategy to persuade the user to buy.
As another example, when handling a matter that requires a borrowing user to repay a loan, it is necessary, during communication with the user, to determine the user's repayment intent and, depending on that intent, take different measures to advise the user (such as offering restructuring assistance or applying pressure) in order to encourage repayment.
In each of the scenarios above, the user's corresponding intent must be determined before appropriate measures and strategies can be adopted. Existing ways of determining user intent while handling a specified matter mostly fall into the following categories:
Approach one: analyze collected historical data from past handling of the specified matter, including user attribute data and outcomes at the time, and objectively judge the target user's intent in the current handling of the matter by matching the target user's known attribute data against that historical data.
Approach two: during communication with the user, business personnel rely on their own experience to subjectively judge the target user's language, attitude, and other behavior, and thereby determine the user's intent in the current handling of the specified matter.
Approach three: combine the two approaches above; that is, business personnel combine their objective judgment about the client with their subjective judgment of the client's language and attitude during communication to determine the client's intent in the current handling of the specified matter.
Approach one analyzes only the collected historical data: it neither accounts for the influence of the user's current situation on their intent nor guarantees the correctness of the collected historical data, so its judgments are inevitably biased. Approaches two and three depend on the ability of the individual business personnel and their performance when communicating with the user, and cannot determine the user's intent in a generally objective and accurate way.
In short, existing ways of determining a user's intent when handling a specified matter are subjective, ignore the user's current situation, and may rely on unreliable data sources for analysis.
Summary
This application provides a method for determining user intent based on user speech information, together with a corresponding device. It also provides a method and a device for determining assessment reference phrases, a method and a device for generating training sample data, a method and a device for generating a computation model that determines user intent based on user speech information, and a method for determining a user's repayment intent based on user speech information.
The method for determining user intent based on user speech information provided by this application includes:
obtaining real-time speech of a target user while a specified matter is being handled;
determining, from the semantic and prosodic information contained in the real-time speech, speech feature values for computing user intent;
using the speech feature values as input parameters to a computation model that determines user intent based on user speech information, and computing an assessed value of the target user's intent in the current handling of the specified matter.
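The three steps above can be sketched end to end. Everything in this sketch is an illustrative assumption rather than the patent's implementation: the example reference phrases, the intonation coefficients, and the stand-in linear scoring model (the patent's computation model is a trained deep neural network, introduced in later claims).

```python
import math

# Assumed assessment reference phrases (the patent derives these from
# historical speech; see the later claims).
REFERENCE_PHRASES = ["will pay", "no money", "next week"]

def feature_vector(phrase_coeffs):
    """Step 2: one feature per reference phrase; the value is the phrase's
    intonation coefficient if the real-time speech contains it, else 0."""
    return [phrase_coeffs.get(p, 0.0) for p in REFERENCE_PHRASES]

class IntentModel:
    """Stand-in for the trained computation model of step 3."""
    def __init__(self, weights, bias=0.0):
        self.weights, self.bias = weights, bias

    def assess(self, features):
        logit = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-logit))  # assessed value in (0, 1)

# Step 1 (capturing the call audio and recognizing phrases) is abstracted
# into this dict of phrase -> intonation coefficient:
phrase_coeffs = {"will pay": 1.2, "next week": 0.8}
model = IntentModel(weights=[1.0, -1.0, 0.5])  # fabricated weights
assessed = model.assess(feature_vector(phrase_coeffs))
```

The assessed value can then be surfaced to the business personnel handling the call as a real-time indication.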
Optionally, determining the speech feature values for computing user intent from the semantic and prosodic information contained in the real-time speech includes:
extracting, according to a preset rule, the speech of each phrase contained in the real-time speech;
assigning, according to a preset rule, an intonation coefficient to the speech of each phrase;
determining the speech feature values according to whether the real-time speech contains the assessment reference phrases and the intonation coefficients of the speech of the assessment reference phrases contained in the real-time speech.
Optionally, determining the speech feature values according to whether the real-time speech contains the assessment reference phrases and the intonation coefficients of the speech of those phrases includes:
if the real-time speech contains an assessment reference phrase, setting the corresponding speech feature value to the intonation coefficient of the speech of that assessment reference phrase contained in the real-time speech.
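The claims leave both "preset rules" open. As a concrete illustration, the sketch below assumes one plausible intonation rule (a phrase's mean pitch relative to the whole utterance, so emphasized phrases score above 1) and implements the stated feature-value rule: the coefficient if the phrase occurs, 0 otherwise. The pitch values and phrase are fabricated.

```python
def intonation_coefficient(phrase_pitches, utterance_pitches):
    """Assumed preset rule: mean pitch of the phrase divided by the mean
    pitch of the whole utterance (values above 1 suggest emphasis)."""
    phrase_mean = sum(phrase_pitches) / len(phrase_pitches)
    utterance_mean = sum(utterance_pitches) / len(utterance_pitches)
    return phrase_mean / utterance_mean

def feature_value(reference_phrase, phrase_coeffs):
    """Stated rule: if the real-time speech contains the assessment
    reference phrase, the feature value is that phrase's intonation
    coefficient; otherwise the phrase contributes 0."""
    return phrase_coeffs.get(reference_phrase, 0.0)

pitch_track = [180, 200, 220, 210, 190, 200]   # utterance pitch samples (Hz)
coeffs = {"will pay": intonation_coefficient([220, 210], pitch_track)}
```

Any other prosodic measure (energy, duration, pitch range) could stand behind the coefficient; the claims only require that some preset rule produces it.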
Optionally, the assessment reference phrases are obtained as follows:
obtaining historical speech of users from past handling of the specified matter, the historical speech comprising multiple utterances, each utterance being the speech of one user during one past handling of the specified matter;
extracting, according to a preset basic-word count, the phrases composed of basic words corresponding to each utterance;
from the phrases corresponding to the utterances in the historical speech, taking each phrase that meets a preset assessment-reference-phrase quantity requirement as an assessment reference phrase.
Optionally, the preset assessment-reference-phrase quantity requirement is obtained as follows:
sorting the phrases by their counts;
determining a quantity selection interval from the positions in the sorted order;
taking phrase counts that fall within the quantity selection interval as the quantity requirement for assessment reference phrases.
Optionally, determining the quantity selection interval from the positions in the sorted order includes:
taking the interval from the count of the phrase at the 40% position of the sorted queue to the count of the phrase at the 60% position as the quantity selection interval.
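The selection steps above can be sketched as follows. The transcripts and the basic-word count of 2 (word bigrams) are assumptions for illustration; the claimed part is ranking phrases by count and keeping those whose counts fall between the 40% and 60% positions of the sorted queue, i.e. phrases frequent enough to be meaningful but not mere filler.

```python
from collections import Counter

def extract_phrases(tokens, n=2):
    """Phrases composed of a preset number of basic words (here, bigrams)."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def select_reference_phrases(transcripts, n=2):
    counts = Counter()
    for tokens in transcripts:
        counts.update(extract_phrases(tokens, n))
    ranked = counts.most_common()               # descending by count
    upper = ranked[int(len(ranked) * 0.4)][1]   # count at the 40% position
    lower = ranked[int(len(ranked) * 0.6)][1]   # count at the 60% position
    return sorted(p for p, c in counts.items() if lower <= c <= upper)
```

With this rule, very common phrases (above the 40% position's count) and the selection band's outliers are excluded, leaving the middle band as assessment reference phrases.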
Optionally, the computation model that determines user intent based on user speech information is obtained as follows:
initializing the parameters of a computer deep neural network;
generating training sample data from historical speech of users from past handling of the specified matter;
training the computer deep neural network with the training sample data until it converges;
taking the converged computer deep neural network as the computation model that determines user intent based on user speech information.
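The training procedure can be sketched with a minimal, dependency-free stand-in. The patent specifies a computer deep neural network trained until convergence; for brevity this sketch trains a single logistic unit by gradient descent on fabricated samples, and a fixed number of epochs stands in for a convergence test. It shows the loop shape (initialize parameters, train on historical samples, keep the converged model) but not the depth of the real model.

```python
import math

def train(samples, lr=0.5, epochs=500):
    """samples: list of (feature_vector, outcome) pairs, outcome in {0, 1}.
    A single logistic unit trained by gradient descent; a real
    implementation would use a deep network in an ML framework."""
    dim = len(samples[0][0])
    w, b = [0.0] * dim, 0.0                     # initialize parameters
    for _ in range(epochs):
        for x, y in samples:
            logit = b + sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-logit))
            g = p - y                           # log-loss gradient wrt logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def assess(w, b, x):
    logit = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-logit))

# Fabricated samples: feature = intonation coefficient of one reference
# phrase (0 if absent); outcome = 1 if the user later repaid.
samples = [([1.2], 1), ([0.0], 0), ([1.0], 1), ([0.1], 0)]
w, b = train(samples)
```

Once trained, `assess` plays the role of the computation model: it maps a feature vector from real-time speech to an assessed intent value.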
Optionally, generating the training sample data from the historical speech of users from past handling of the specified matter includes:
obtaining historical speech of users from past handling of the specified matter, the historical speech comprising multiple utterances, each utterance being the speech of one user during one past handling of the specified matter;
using the historical speech to generate, for each utterance, feature values relative to the assessment reference phrases;
taking each utterance's feature values relative to the assessment reference phrases, together with the outcome of handling the specified matter corresponding to that utterance, as training sample data.
Optionally, using the historical speech to generate each utterance's feature values relative to the assessment reference phrases includes:
extracting, according to a preset basic-word count, the speech of the phrases composed of basic words corresponding to each utterance;
determining the intonation coefficient of the speech of each phrase corresponding to each utterance;
from the phrases corresponding to the utterances in the historical speech, taking each phrase that meets a preset assessment-reference-phrase quantity requirement as an assessment reference phrase;
generating each utterance's feature values relative to the assessment reference phrases according to whether the utterance contains each assessment reference phrase and the intonation coefficients of the speech of the assessment reference phrases it contains.
Optionally, generating each utterance's feature values according to whether the utterance contains each assessment reference phrase and the intonation coefficients of the speech of the assessment reference phrases it contains includes:
where an utterance in the historical speech contains an assessment reference phrase, taking the intonation coefficient of the speech of that assessment reference phrase contained in the utterance as the utterance's feature value relative to that assessment reference phrase.
Optionally, the specified matter includes requiring a user to repay a loan; correspondingly, the intent includes the user's repayment intent.
The method for determining assessment reference phrases provided by this application includes the following steps:
obtaining historical speech of users from past handling of the specified matter, the historical speech comprising multiple utterances, each utterance being the speech of one user during one past handling of the specified matter;
extracting, according to a preset basic-word count, the phrases composed of basic words corresponding to each utterance;
from the phrases corresponding to the utterances in the historical speech, taking each phrase that meets a preset assessment-reference-phrase quantity requirement as an assessment reference phrase.
Optionally, the preset assessment-reference-phrase quantity requirement is obtained as follows:
sorting the phrases by their counts;
determining a quantity selection interval from the positions in the sorted order;
taking phrase counts that fall within the quantity selection interval as the quantity requirement for assessment reference phrases.
Optionally, determining the quantity selection interval from the positions in the sorted order includes:
taking the interval from the count of the phrase at the 40% position of the sorted queue to the count of the phrase at the 60% position as the quantity selection interval.
The method for generating training sample data provided by this application produces sample data used to train a computer deep neural network to generate a computation model that determines user intent based on user speech information. The method includes the following steps:
obtaining historical speech of users from past handling of the specified matter, the historical speech comprising multiple utterances, each utterance being the speech of one user during one past handling of the specified matter;
using the historical speech to generate, for each utterance, feature values relative to each assessment reference phrase;
taking each utterance's feature values relative to each assessment reference phrase, together with the outcome of handling the specified matter corresponding to that utterance, as training sample data.
Optionally, using the historical speech to generate each utterance's feature values relative to each assessment reference phrase includes:
extracting, according to a preset basic-word count, the speech of the phrases composed of basic words corresponding to each utterance;
determining the intonation coefficient of the speech of each phrase corresponding to each utterance;
from the phrases corresponding to the utterances in the historical speech, taking each phrase that meets a preset assessment-reference-phrase quantity requirement as an assessment reference phrase;
generating each utterance's feature values relative to each assessment reference phrase according to whether the utterance contains each assessment reference phrase and the intonation coefficients of the speech of the assessment reference phrases it contains.
Optionally, this generation includes:
where an utterance in the historical speech contains an assessment reference phrase, taking the intonation coefficient of the speech of that assessment reference phrase contained in the utterance as the utterance's feature value relative to that assessment reference phrase.
Optionally, the specified matter includes requiring a user to repay a loan; correspondingly, the intent includes the user's repayment intent.
The method provided by this application for generating a computation model that determines user intent based on user speech information (the computation model being used, while a specified matter is handled, to determine user intent from user speech information) includes the following steps:
initializing the parameters of a computer deep neural network;
generating training sample data from historical speech of users from past handling of the specified matter;
training the computer deep neural network with the training sample data until it converges;
taking the converged computer deep neural network as the computation model that determines user intent based on user speech information.
Optionally, generating the training sample data from the historical speech of users from past handling of the specified matter includes:
obtaining historical speech of users from past handling of the specified matter, the historical speech comprising multiple utterances, each utterance being the speech of one user during one past handling of the specified matter;
using the historical speech to generate, for each utterance, feature values relative to the assessment reference phrases;
taking each utterance's feature values relative to the assessment reference phrases, together with the outcome of handling the specified matter corresponding to that utterance, as training sample data.
Optionally, using the historical speech to generate each utterance's feature values relative to the assessment reference phrases includes:
extracting, according to a preset basic-word count, the speech of the phrases composed of basic words corresponding to each utterance;
determining the intonation coefficient of the speech of each phrase corresponding to each utterance;
from the phrases corresponding to the utterances in the historical speech, taking each phrase that meets a preset assessment-reference-phrase quantity requirement as an assessment reference phrase;
generating each utterance's feature values relative to the assessment reference phrases according to whether the utterance contains each assessment reference phrase and the intonation coefficients of the speech of the assessment reference phrases it contains.
Optionally, this generation includes:
where an utterance in the historical speech contains an assessment reference phrase, taking the intonation coefficient of the speech of that assessment reference phrase contained in the utterance as the utterance's feature value relative to that assessment reference phrase.
Optionally, the computation model is used to determine a user's repayment intent when the user is required to repay a loan.
The device for determining user intent based on user speech information provided by this application includes:
an acquiring unit, for obtaining real-time speech of a target user while a specified matter is being handled;
a speech-feature-value determination unit, for determining, from the semantic and prosodic information contained in the real-time speech, speech feature values for computing user intent;
a determination unit, for using the speech feature values as input parameters to a computation model that determines user intent based on user speech information and computing an assessed value of the target user's intent in the current handling of the specified matter.
Optionally, the speech-feature-value determination unit includes:
a phrase determination subunit, for extracting, according to a preset rule, the speech of each phrase contained in the real-time speech;
an intonation-coefficient determination subunit, for assigning, according to a preset rule, an intonation coefficient to the speech of each phrase;
a speech-feature-value determination subunit, for determining the speech feature values according to whether the real-time speech contains the assessment reference phrases and the intonation coefficients of the speech of the assessment reference phrases contained in the real-time speech.
Optionally, the speech-feature-value determination subunit is specifically configured to:
if the real-time speech contains an assessment reference phrase, set the corresponding speech feature value to the intonation coefficient of the speech of that assessment reference phrase contained in the real-time speech.
The device for determining assessment reference phrases provided by this application (the assessment reference phrases being used to determine a user's corresponding intent while a specified matter is handled) includes:
an acquiring unit, for obtaining historical speech of users from past handling of the specified matter, the historical speech comprising multiple utterances, each utterance being the speech of one user during one past handling of the specified matter;
a phrase determination unit, for extracting, according to a preset basic-word count, the phrases composed of basic words corresponding to each utterance;
an assessment-reference-phrase determination unit, for taking, from the phrases corresponding to the utterances in the historical speech, each phrase that meets a preset assessment-reference-phrase quantity requirement as an assessment reference phrase.
The device for generating training sample data provided by this application produces sample data used to train a computer deep neural network to generate a computation model that determines user intent based on user speech information. The device includes:
an acquiring unit, for obtaining historical speech of users from past handling of the specified matter, the historical speech comprising multiple utterances, each utterance being the speech of one user during one past handling of the specified matter;
a feature-value determination unit, for using the historical speech to generate each utterance's feature values relative to each assessment reference phrase;
a sample-data determination unit, for taking each utterance's feature values relative to each assessment reference phrase, together with the outcome of handling the specified matter corresponding to that utterance, as training sample data.
Optionally, the feature-value determination unit includes:
a phrase determination subunit, for extracting, according to a preset basic-word count, the speech of the phrases composed of basic words corresponding to each utterance;
an intonation-coefficient determination subunit, for determining the intonation coefficient of the speech of each phrase corresponding to each utterance;
a reference-phrase determination subunit, for taking, from the phrases corresponding to the utterances in the historical speech, each phrase that meets a preset assessment-reference-phrase quantity requirement as an assessment reference phrase;
a feature-value determination subunit, for generating each utterance's feature values relative to each assessment reference phrase according to whether the utterance contains each assessment reference phrase and the intonation coefficients of the speech of the assessment reference phrases it contains.
Optionally, the feature-value determination subunit is specifically configured to:
where an utterance in the historical speech contains an assessment reference phrase, take the intonation coefficient of the speech of that assessment reference phrase contained in the utterance as the utterance's feature value relative to that assessment reference phrase.
The device provided by this application for generating a computation model that determines user intent based on user speech information (the computation model being used, while a specified matter is handled, to determine user intent from user speech information) includes:
an initialization unit, for initializing the parameters of a computer deep neural network;
a sample-data generation unit, for generating training sample data from historical speech of users from past handling of the specified matter;
a training unit, for training the computer deep neural network with the training sample data until it converges;
a model determination unit, for taking the converged computer deep neural network as the computation model that determines user intent based on user speech information.
The method for determining a user's repayment intent based on user speech information provided by this application includes:
obtaining real-time speech of a target borrowing user during a collection call;
determining, from the semantic and prosodic information contained in the real-time speech, speech feature values for computing the user's repayment intent;
using the speech feature values as input parameters to a computation model that determines users' repayment intent based on user speech information, and computing an assessed value of the target borrowing user's repayment intent during the current collection.
Optionally, the method further includes:
providing a corresponding reference collection strategy according to the assessed value.
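The text does not specify how an assessed value maps to a reference collection strategy; the thresholds and strategy names below are purely illustrative assumptions of how such a mapping could look.

```python
def reference_strategy(assessed_value):
    """Assumed mapping from assessed repayment intent (in [0, 1]) to a
    suggested collection strategy; thresholds are fabricated."""
    if assessed_value >= 0.7:
        return "remind"      # likely to repay: a gentle reminder suffices
    if assessed_value >= 0.4:
        return "negotiate"   # uncertain: offer restructuring or assistance
    return "escalate"        # unlikely to repay: apply firmer measures
```

In practice the strategy boundaries would themselves be tuned from historical collection outcomes.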
Compared with the prior art, the method for determining user intent based on user speech information provided by this application has the following advantages:
the speech feature values are determined from the semantic and prosodic information contained in the target user's real-time speech, which ensures the reliability of the data source; the method objectively determines, in real time, an assessed value of the target user's intent while the specified matter is being handled, giving the business personnel handling the matter a real-time indication.
Compared with the prior art, the method for determining assessment reference phrases provided by this application has the following advantages:
determining the assessment reference phrases, by a quantity requirement, from the phrases contained in users' speech during past handling of the specified matter captures the key semantics for accurately determining user intent while the matter is handled, and thus improves assessment accuracy.
Compared with the prior art, the method for generating training sample data provided by this application has the following advantages:
using the historical speech of users from past handling of the specified matter as source data, and establishing a correspondence between the speech features of that historical speech and the outcomes of handling the matter, both ensures a reliable data source and improves the efficiency of training and generating the computation model.
Compared with the prior art, the method provided by this application for generating a computation model that determines user intent based on user speech information has the following advantages:
training the computer deep neural network with sample data formed directly from the historical speech of past handling of the specified matter yields a computation model with improved calculation accuracy and improved working efficiency.
Description of the drawings
Fig. 1 is a flow diagram of the method for determining user intent based on user speech information according to the first embodiment of this application;
Fig. 2 is a schematic diagram of the method for determining user intent based on user speech information according to the first embodiment of this application;
Fig. 3 is a data-flow diagram of the method for determining user intent based on user speech information according to the first embodiment of this application;
Fig. 4 is a flow diagram of the method for determining assessment reference phrases according to the second embodiment of this application;
Fig. 5 is a flow diagram of the method for generating training sample data according to the third embodiment of this application;
Fig. 6 is a flow diagram of the method for generating a computation model that determines user intent based on user speech information according to the fourth embodiment of this application;
Fig. 7 is a structural block diagram of the device for determining user intent based on user speech information according to the fifth embodiment of this application;
Fig. 8 is a structural block diagram of the device for determining assessment reference phrases according to the sixth embodiment of this application;
Fig. 9 is a structural block diagram of the device for generating training sample data according to the seventh embodiment of this application;
Fig. 10 is a structural block diagram of the device for generating a computation model that determines user intent based on user speech information according to the eighth embodiment of this application;
Fig. 11 is a flow diagram of the method for determining a user's repayment intent based on user speech information according to the ninth embodiment of this application.
Specific embodiment
Many specific details are set forth in the following description to facilitate a full understanding of the present application. However, the present application can be implemented in many ways other than those described here, and those skilled in the art can make similar generalizations without departing from the essence of the present application; therefore, the present application is not limited by the specific implementations disclosed below.
The first embodiment of the present application provides a method for determining user intention based on user voice information; its flowchart is shown in Fig. 1. This embodiment is described using the example of determining a target user's repayment intention when the processing requires the target user to repay. The embodiment includes the following steps:

Step S101: obtain the real-time voice of the target user while the specified transaction is being processed.

This embodiment is described for the case of determining the repayment intention of a target user when the processing requires that target user to repay. When a borrowing user needs to be urged to repay, a staff member usually communicates with the target user by telephone to remind the user, or to require the user, to repay as soon as possible.

The real-time voice refers to the voice of the target user captured before the current point in time during this processing of the specified transaction (urging the target user to repay). During the telephone conversation between the staff member and the target user, the raw voice of the call before the current point in time is obtained continuously. If the raw call voice contains the staff member's voice, the staff member's voice is removed before this step, so that this step obtains only the voice containing the borrowing user's sound, captured before the current point in time while the borrowing user is being urged to repay.
Step S102: determine, according to the semantics and intonation information contained in the real-time voice, the voice feature values used to calculate the user's intention.

There are many ways to determine the voice feature values from the semantics and intonation information contained in the real-time voice; the choice can depend on several concrete conditions such as the application scenario, the length of the real-time voice, or the number of words it contains. This embodiment provides the following approach:
First, according to a preset rule, the voices of the individual phrases contained in the real-time voice are extracted and generated from the real-time voice.

The preset rule is the same rule used to extract and generate phrases from the history voices of users during previous processing of the specified transaction.

The history voices comprise a plurality of voices, each of which is the voice of one user during one previous processing of the specified transaction. The history voices contain only the voices of users, not the voices of other personnel.

The rule can be as follows: the long sentence corresponding to a voice is decomposed into basic words, and according to a preset quantity requirement for the basic words that make up a phrase, adjacent basic words contained in the voice are combined into phrases.

The long sentence corresponding to a voice can be decomposed into basic words by applying a software tool (such as word2vec or a CRF-based segmenter) to the long sentence corresponding to the voice.

For example, when the long sentence corresponding to the user's voice is "I have no money and don't want to repay", such a tool yields the basic words that make up the long sentence: "I", "have no money", "don't want to" and "repay".
When the preset quantity of basic words contained in a phrase is less than or equal to 3, as shown in Fig. 2, the following 9 phrases can be formed:

Phrases with 1 basic word: "I", "have no money", "don't want to" and "repay";

Phrases with 2 basic words: "I have no money", "have no money, don't want to" and "don't want to repay";

Phrases with 3 basic words: "I have no money, don't want to" and "have no money, don't want to repay".
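The phrase-forming rule above — combining adjacent basic words into phrases of up to a preset number of words — can be sketched as follows. This is an illustrative sketch only: the function name, the English placeholder words and the space joiner are assumptions (the patent operates on Chinese text, where basic words concatenate directly).

```python
def generate_phrases(basic_words, max_words=3):
    """Combine adjacent basic words into phrases of 1..max_words words,
    mirroring the preset basic-word quantity requirement described above."""
    phrases = []
    for n in range(1, max_words + 1):              # phrase length in basic words
        for i in range(len(basic_words) - n + 1):  # sliding window of adjacent words
            phrases.append(" ".join(basic_words[i:i + n]))
    return phrases

words = ["I", "have-no-money", "don't-want-to", "repay"]
print(generate_phrases(words))  # 4 + 3 + 2 = 9 phrases, matching the Fig. 2 example
```

For a sentence of four basic words this produces exactly the 9 phrases enumerated above: four 1-word, three 2-word and two 3-word phrases.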
Using the voice of the long sentence, the voice segment corresponding to each phrase is cut out and saved.

After the voice of each phrase has been obtained, an intonation coefficient is assigned to the voice of each phrase according to a preset rule.

The preset rule refers to an intonation coefficient reference standard determined from the history voices of users during previous processing of the specified transaction.

The intonation coefficient reference standard can be determined as follows:

Obtain the history voices of users from previous processing of the specified transaction. The history voices comprise a plurality of voices, each of which is the voice of one user during one previous processing of the specified transaction. The history voices contain only the voices of users, not the voices of other personnel.

According to the preset quantity of basic words, extract from each voice the voices of the phrases, composed of basic words, corresponding to that voice. The extraction method is the same as in the previous step and is not repeated here.
Sort the intonation values of the voices of the phrases generated from the history voices of users (i.e., values calculated from the frequency of the phrase voices; the specific calculation method can be chosen according to actual conditions), and divide the intonation values into a specified number of intervals according to their magnitude. For example, when the maximum intonation value is 100, it can be evenly divided, from small to large, into 10 intonation value intervals: 0-10, 11-20, ..., 81-90 and 91-∞.

For each interval, a corresponding intonation coefficient is set, such as 1, 2, ..., 9 and 10.

At this point, the intonation coefficient reference standard determined from the history voices of users during previous processing of the specified transaction has been established.
According to this reference standard, each phrase corresponding to the real-time voice is assigned an intonation coefficient based on its intonation value.

For example, if the intonation value of a phrase corresponding to the real-time voice falls into the interval 11-20, the intonation coefficient assigned to it is 2; other cases follow by analogy.
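Under the interval scheme just described (0-10 → 1, 11-20 → 2, ..., 91 and above → 10), looking up the coefficient for an intonation value might be sketched as below. The bin width, coefficient cap and open-ended top interval are taken from the example in the text; the function name is an assumption.

```python
def intonation_coefficient(intonation_value, bin_width=10, max_coeff=10):
    """Map an intonation value to the coefficient of its interval:
    0-10 -> 1, 11-20 -> 2, ..., 91 and above -> 10 (open-ended top bin)."""
    if intonation_value <= bin_width:
        return 1
    # e.g. 15 -> (15 - 1) // 10 + 1 = 2; the top bin is capped at max_coeff
    coeff = (intonation_value - 1) // bin_width + 1
    return min(int(coeff), max_coeff)

print(intonation_coefficient(15))  # falls in 11-20, so coefficient 2
```

The cap handles values above 91 (the 91-∞ interval), which all receive the maximum coefficient 10.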
After an intonation coefficient has been assigned to each phrase, the voice feature values are determined according to whether each assessment reference phrase is contained in the real-time voice and according to the intonation coefficients of the voices of the assessment reference phrases contained in the real-time voice.

The assessment reference phrases can be obtained as follows:

Obtain the history voices of users from previous processing of the specified transaction. The history voices comprise a plurality of voices, each of which is the voice of one user during one previous processing of the specified transaction. The history voices contain only the voices of users, not the voices of other personnel.

As in the preceding steps, extract from each voice, according to the preset quantity of basic words, the phrases composed of basic words corresponding to that voice; this is not repeated here.

From the phrases corresponding to the individual voices of the history voices, each phrase that meets a preset quantity requirement for assessment reference phrases is taken as an assessment reference phrase determined from the history voices of previous users processing the specified transaction.

Count the quantity of each distinct phrase generated from the history voices, where distinct phrases are phrases whose semantic texts differ; phrases with identical semantic text are counted as the same phrase. For example, if the phrase "I" appears in 2 history voices, then regardless of other factors such as its intonation value or intonation coefficient in the different voices, the quantity of the phrase "I" is counted as 2, because its semantic text is "I" in both.
The preset quantity requirement for assessment reference phrases can be determined according to actual conditions; the present application provides the following way of determining this requirement:

Sort the distinct phrases by the quantity of their occurrences in the history voices, and determine a quantity selection interval according to the positions in the sorted order. Specifically, the rank numbers can be normalized so that they range from 0 to 1, and the phrase quantities at the 40% and 60% positions of the sorted order are taken as the endpoints of the quantity selection interval. The interval may or may not include its endpoints, as determined by the actual situation. Choosing 40% to 60% as the quantity selection interval filters out phrases that occur too often or too rarely in the history voices and are unrelated to the transaction, while retaining the common phrases related to the transaction.

The quantity requirement for assessment reference phrases is that the quantity of a phrase falls within this quantity selection interval.

Determining the assessment reference phrases from the history voices ensures that the phrases used for assessment are most relevant to the specified transaction.

For example, suppose 100 phrases are ranked by quantity from large to small, the quantities of the phrases ranked 40 and 60 are taken as the endpoints of the quantity selection interval, the quantity of the phrase ranked 40 is 1000, and the quantity of the phrase ranked 60 is 700. Then the quantity requirement for assessment reference phrases is that the quantity of a phrase falls within the interval 700-1000.

Among the phrases corresponding to the individual voices of the history voices, the phrases that meet this quantity requirement are taken as assessment reference phrases. Continuing the preceding example, every phrase whose quantity is greater than or equal to 700 and less than or equal to 1000 can be taken as an assessment reference phrase.
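The rank-position selection described above might be sketched as follows. This is a minimal illustration under stated assumptions: the function and parameter names are invented, the endpoints are treated as inclusive, and ties in the ranking are resolved however `sorted` orders them.

```python
from collections import Counter

def select_reference_phrases(history_phrases, lo_pos=0.4, hi_pos=0.6):
    """Keep phrases whose occurrence counts fall between the counts found at
    the 40% and 60% positions of the frequency-sorted ranking, filtering out
    phrases that are too common or too rare to relate to the transaction."""
    counts = Counter(history_phrases)               # distinct by semantic text
    ranked = sorted(counts.values(), reverse=True)  # quantities, large to small
    upper = ranked[int(lo_pos * (len(ranked) - 1))]  # count at the 40% position
    lower = ranked[int(hi_pos * (len(ranked) - 1))]  # count at the 60% position
    return {p for p, c in counts.items() if lower <= c <= upper}
```

With the 700-1000 example above, a phrase occurring 850 times would be kept, while one occurring 1500 times or 10 times would be filtered out.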
Whether each assessment reference phrase is contained in the real-time voice is judged in turn. When an assessment reference phrase is contained in the real-time voice, the corresponding voice feature value is set to the intonation coefficient of the voice of that assessment reference phrase as contained in the real-time voice.

For example, if the real-time voice contains the phrase "I have no money" with intonation coefficient 2, and "I have no money" is an assessment reference phrase, then the voice feature value of the real-time voice for the reference phrase "I have no money" is set to 2.

When an assessment reference phrase is not contained in the real-time voice, the corresponding voice feature value is set to a preset value representing "not contained".

For example, if the assessment reference phrases include the phrase "short of money", the real-time voice does not contain the phrase "short of money", and the preset value representing "not contained" is 0, then the voice feature value of the real-time voice for the reference phrase "short of money" is set to 0.
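The two rules above — intonation coefficient when a reference phrase is contained, the "not contained" value otherwise — amount to building one feature value per assessment reference phrase. A minimal sketch, with assumed names and an assumed dict representation of the phrases found in the real-time voice:

```python
def voice_feature_values(phrase_coefficients, reference_phrases, absent=0):
    """phrase_coefficients maps each phrase found in the real-time voice to
    its intonation coefficient; reference phrases absent from the voice are
    assigned the preset `absent` value (0 in the example above)."""
    return [phrase_coefficients.get(p, absent) for p in reference_phrases]

live = {"I have no money": 2, "don't want to repay": 3}
refs = ["I have no money", "short of money", "don't want to repay"]
print(voice_feature_values(live, refs))  # [2, 0, 3]
```

The resulting list has one entry per assessment reference phrase and is the input vector handed to the computation model in step S103.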
Step S103: use the voice feature values as input parameters of a computation model that determines user intention based on user voice information, and calculate an assessed value of the intention of the target user during this processing of the specified transaction.

After the voice feature values have been formed, they are used as the input parameters of the computation model that determines user intention based on user voice information. The computation model is a computer deep neural network whose parameters have been fixed, and it can be obtained as follows:

Set up a computer deep neural network and initialize its parameters, such as the input, output, number of layers and weights. The output of the computer deep neural network is an assessed value of the user's intention during processing of the specified transaction.

Train the computer deep neural network with training sample data until it converges.
The computer deep neural network can be trained as follows:

The feature values of each voice generated from the history voices of users during previous processing of the specified transaction, one feature value per assessment reference phrase, are used as the input of the computer deep neural network.

These feature values are obtained as follows:

According to the preset quantity of basic words, extract from the history voices of users during previous processing of the specified transaction the voices of the phrases, composed of basic words, corresponding to each voice. For details, refer to the corresponding description in the preceding steps.

Determine the intonation coefficients of the voices of the phrases corresponding to each voice. For details, refer to the corresponding description in the preceding steps.

From the phrases corresponding to the individual voices of the history voices, take each phrase that meets the preset quantity requirement for assessment reference phrases as an assessment reference phrase determined from the history voices of users during previous processing of the specified transaction. For details, refer to the corresponding description in the preceding steps.

According to whether each voice in the history voices contains each assessment reference phrase, and according to the intonation coefficients of the voices of the assessment reference phrases contained in each voice, generate the feature value of each voice for each assessment reference phrase determined from the history voices of users during previous processing of the specified transaction. These feature values are determined in the same way as the voice feature values in the preceding steps; for details, refer to the description of the voice feature values above.

The feature values, together with the corresponding results of processing the specified transaction, are used as the training sample data for generating the computation model that determines user intention based on user voice information. Since the sample data comes directly from the history voices and results of previous processing of the specified transaction, it is more direct and more reliable.
In the sample data, the group of feature values of one history voice for the individual assessment reference phrases is used as one training input of the computer deep neural network, and the result of processing the specified transaction corresponding to that history voice is used as the expected value of the output of the computer deep neural network. According to a preset acceptable range of the output value, the parameters of the computer deep neural network are adjusted through continued training until the network converges. For example, in the case of requiring users to repay, suppose the result that a user did not repay is represented by "0", the result that a user repaid is represented by "1", and the preset acceptable range of the output value is 20%. Then when the output of the computer deep neural network is greater than or equal to 0.8, the output is considered to meet the expectation of having repaid; otherwise it is considered not to meet it. When the output of the computer deep neural network is less than or equal to 0.2, the output is considered to meet the expectation of not having repaid; otherwise it is considered not to meet it. For outputs that do not meet the expectation, the relevant parameters of the computer deep neural network need to be adjusted.

After the computer deep neural network has been trained with the sample data until it converges, its parameters are fixed, and the computer deep neural network at this point can be used as the computation model for determining user intention based on user voice information.
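As a hedged illustration of the training scheme just described — expected outputs 0/1, outputs accepted only inside the 20% tolerance band — the sketch below trains a single logistic unit as a stand-in for the deep network. The real model is a multi-layer computer deep neural network; the single unit, the learning rate and the toy feature vectors here are assumptions chosen only to make the tolerance-band check concrete.

```python
import math

def in_tolerance_band(output, expected, tol=0.2):
    """Accept output >= 0.8 as 'repaid' (expected 1), <= 0.2 as 'not repaid'."""
    return output >= 1 - tol if expected == 1 else output <= tol

def train_until_converged(samples, labels, lr=0.5, max_epochs=5000, tol=0.2):
    """SGD on one logistic unit until every training output is in the band,
    mirroring 'adjust parameters through continued training until convergence'."""
    w, b = [0.0] * len(samples[0]), 0.0
    for _ in range(max_epochs):
        converged = True
        for x, y in zip(samples, labels):
            out = 1 / (1 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
            if not in_tolerance_band(out, y, tol):
                converged = False
            # gradient step on the cross-entropy loss of the logistic unit
            w = [wi - lr * (out - y) * xi for wi, xi in zip(w, x)]
            b -= lr * (out - y)
        if converged:
            break
    return w, b  # parameters are now fixed, as in the converged model
```

Here high coefficients on "no money"-type reference phrases would plausibly correlate with the "did not repay" label 0, which is the pattern the toy data below encodes.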
The voice feature values obtained in the previous steps are used as the input parameters of the computation model, and the corresponding assessed value of the user's intention (the repayment intention) during this processing of the specified transaction (e.g., requiring the user to repay) is calculated.

For example, when the real-time voice of the user is "I have no money and don't want to repay" and the corresponding voice feature values are "0, 4, 3, 5, 2, 3, 5, 3, 4", the voice feature values are used as the input of the computation model and an assessed value such as "0.7" is calculated; the assessed value of the user's repayment intention during this request for repayment is then 0.7.

The calculated assessed value of the target user's repayment intention can help the staff member adopt a corresponding strategy or measure to urge the user to repay as soon as possible.
The scheme of the method for determining user intention based on user voice information provided by the present application is briefly explained again below from the perspective of data flow, as shown in Fig. 3.

The history voices of users during previous processing of the specified transaction are semantically processed to obtain history voice phrases, and phrase operation processing is applied to the history voice phrases to obtain the assessment reference phrases.

Intonation processing is applied to the history voice phrases to obtain the intonation coefficients of the history voice phrases. The intonation coefficients of the history voice phrases are combined with the assessment reference phrases and the history voices, and the feature values of the history voices are obtained after feature value generation. The computer deep neural network is trained with the feature values of the history voices and the corresponding results of previous processing of the specified transaction to obtain the computation model.

The real-time voice of the target user during processing of the specified transaction undergoes the corresponding semantic and intonation processing, successively yielding the phrases and the intonation coefficients of the real-time voice phrases. The processing rules of this semantic and intonation processing are identical to those applied to the history voices of users during previous processing of the specified transaction.

The voice feature values are obtained through feature value generation from the intonation coefficients of the phrases of the real-time voice and the assessment reference phrases determined from the history voices of users during previous processing of the specified transaction.

The voice feature values are used as the input of the computation model, and the assessed value of the intention of the target user is calculated.

This completes the determination of the intention of the target user during one processing of the specified transaction. If the real-time voice of the target user during processing of the specified transaction is processed successively, this method can be applied continuously to determine assessed values of the target user's intention, and these assessed values can continuously provide the staff member with a corresponding reference for adopting strategies and measures.
The second embodiment of the present application provides a method for determining assessment reference phrases, which are used to determine the corresponding intention of a user during processing of a specified transaction. The flowchart of this method is shown in Fig. 4, and it includes the following steps:

Step S201: obtain the history voices of users from previous processing of the specified transaction, the history voices comprising a plurality of voices, each of which is the voice of one user during one previous processing of the specified transaction.

For a detailed description of this step, refer to the relevant descriptions in steps S101 and S102 of the first embodiment of the present application, which are not repeated here.

Step S202: according to a preset quantity of basic words, extract from each voice the phrases, composed of basic words, corresponding to that voice.

For a detailed description of this step, refer to the relevant description in step S102 of the first embodiment of the present application, which is not repeated here.

Step S203: from the phrases corresponding to the individual voices of the history voices, take each phrase that meets a preset quantity requirement for assessment reference phrases as an assessment reference phrase.

For a detailed description of this step, refer to the relevant description in step S102 of the first embodiment of the present application, which is not repeated here.
The third embodiment of the present application provides a method for generating training sample data, the sample data being used to generate a computation model that determines user intention based on user voice information. The flowchart of this method is shown in Fig. 5, and it includes the following steps:

Step S301: obtain the history voices of previous users processing the specified transaction, the history voices comprising a plurality of voices, each of which is the voice of one user during one previous processing of the specified transaction.

For a detailed description of this step, refer to the relevant descriptions in steps S101 and S102 of the first embodiment of the present application, which are not repeated here.

Step S302: using the history voices, generate the feature value of each voice for each assessment reference phrase.

For a detailed description of this step, refer to the relevant description in step S103 of the first embodiment of the present application, which is not repeated here.

Step S303: use the feature values of each voice for the individual assessment reference phrases, together with the result of processing the specified transaction corresponding to each voice, as training sample data.

For a detailed description of this step, refer to the relevant description in step S103 of the first embodiment of the present application, which is not repeated here.
The fourth embodiment of the present application provides a method for generating a computation model that determines user intention based on user voice information, the computation model being used to determine user intention based on user voice information. The flowchart of this method is shown in Fig. 6, and it includes the following steps:

Step S401: initialize the parameters of a computer deep neural network.

Step S402: use the feature values of each voice for the individual assessment reference phrases, in the training sample data generated from the history voices of previous users processing the specified transaction, as the input of the computer deep neural network, and use the result of the corresponding user's processing of the specified transaction as the expected output value of the neural network. According to a preset acceptable range of the output value, train the computer deep neural network until it converges.

Step S403: use the converged computer deep neural network as the computation model for determining user intention based on user voice information.

The method provided by this embodiment is similar to the method of obtaining the computation model described in step S103 of the first embodiment of the present application; for a more detailed description, refer to the related description in step S103 of the first embodiment, which is not repeated here.
The fifth embodiment of the present application provides a device for determining user intention based on user voice information. The structural block diagram of the device is shown in Fig. 7, and it comprises an acquiring unit U501, a voice feature value determination unit U502 and a determination unit U503.

The acquiring unit U501 is used to obtain the real-time voice of a user processing a specified transaction.

The voice feature value determination unit U502 is used to determine, according to the semantics and intonation information contained in the real-time voice, the voice feature values for calculating user intention.

The voice feature value determination unit U502 can include a phrase determination subunit, an intonation coefficient determination subunit and a voice feature value determination subunit.

The phrase determination subunit is used to extract and generate, from the real-time voice according to a preset rule, the voices of the phrases contained in the real-time voice.

The intonation coefficient determination subunit is used to assign an intonation coefficient to the voice of each phrase according to a preset rule.

The voice feature value determination subunit is used to determine the voice feature values according to whether each assessment reference phrase is contained in the real-time voice and according to the intonation coefficients of the voices of the assessment reference phrases contained in the real-time voice.

The voice feature value determination subunit can specifically be used to: when an assessment reference phrase is contained in the real-time voice, set the corresponding voice feature value to the intonation coefficient of the voice of that assessment reference phrase as contained in the real-time voice.

The determination unit U503 is used to use the voice feature values as input parameters of the computation model that determines user intention based on user voice information, and to calculate the assessed value of the intention of the target user during this processing of the specified transaction.
The sixth embodiment of the present application provides a device for determining assessment reference phrases, which are used to assess the corresponding intention of a user when the user processes a specified transaction. The structural block diagram of the device is shown in Fig. 8, and it comprises an acquiring unit U601, a phrase determination unit U602 and an assessment reference phrase determination unit U603.

The acquiring unit U601 is used to obtain the history voices of previous users processing the specified transaction, the history voices comprising a plurality of voices, each of which is the voice of one user during one previous processing of the specified transaction.

The phrase determination unit U602 is used to extract and generate, from each voice according to a preset quantity of basic words, the phrases, composed of basic words, corresponding to that voice.

The assessment reference phrase determination unit U603 is used to take, from the phrases corresponding to the individual voices of the history voices, each phrase that meets a preset quantity requirement for assessment reference phrases as an assessment reference phrase.
The seventh embodiment of the present application provides a device for generating training sample data, the sample data being used to generate a computation model for determining user intention based on user voice information. The structural block diagram of the device is shown in Fig. 9, and it comprises an acquiring unit U701, a feature value determination unit U702 and a sample data determination unit U703.

The acquiring unit U701 is used to obtain the history voices of previous users processing the specified transaction, the history voices comprising a plurality of voices, each of which is the voice of one user during one previous processing of the specified transaction.

The feature value determination unit U702 is used to generate, using the history voices, the feature value of each voice for each assessment reference phrase.

The feature value determination unit can include a phrase determination subunit, an intonation coefficient determination subunit, a reference phrase determination subunit and a feature value determination subunit.

The phrase determination subunit is used to extract and generate, from each voice according to a preset quantity of basic words, the voices of the phrases, composed of basic words, corresponding to that voice.

The intonation coefficient determination subunit is used to determine the intonation coefficients of the voices of the phrases corresponding to each voice.

The reference phrase determination subunit is used to take, from the phrases corresponding to the individual voices of the history voices, each phrase that meets a preset quantity requirement for assessment reference phrases as an assessment reference phrase.

The feature value determination subunit generates the feature value of each voice for each assessment reference phrase according to whether each voice in the history voices contains each assessment reference phrase and according to the intonation coefficients of the voices of the assessment reference phrases contained in each voice.

The feature value determination subunit can also specifically be used to: when a voice in the history voices contains an assessment reference phrase, use the intonation coefficient of the voice of that assessment reference phrase as contained in the voice as the feature value of that voice for that assessment reference phrase.

The feature value determination subunit can also specifically be used to: when a voice in the history voices does not contain an assessment reference phrase, use a preset value representing "not contained" as the feature value of that voice for that assessment reference phrase.

The sample data determination unit U703 is used to use the feature values of each voice for the individual assessment reference phrases, together with the result of the corresponding user's processing of the specified transaction, as the training sample data for training the model that assesses the corresponding intention of a user during processing of the specified transaction.
The 8th embodiment of the application provides the computation model that a kind of generation determines user view based on user speech information Device when the computation model is used to handle specified affairs, determines user view, the apparatus structure frame based on user speech information Figure is as shown in Figure 10, including:Initialization unit U801, sample data generation unit U802, training unit U803 and model determine Unit U804.
The initialization unit U801 is configured to initialize the parameters of a computer deep neural network.
The sample data generation unit U802 is configured to generate training sample data using the historical speech of users who previously handled the specified transaction.
The training unit U803 is configured to take, as the input of the computer deep neural network, the characteristic value of each voice relative to each assessment reference phrase in the training sample data generated from the historical speech of users who previously handled the specified transaction, and to take the result of the corresponding user's handling of the specified transaction as the expected output value of the neural network; the computer deep neural network is trained according to a preset acceptable range of output values until it converges.
The model determination unit U804 is configured to use the converged computer deep neural network as the computation model that determines user intention based on user speech information.
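As a toy stand-in for the training flow of units U801 through U804 — a single-layer logistic model in place of the computer deep neural network, with data, learning rate, and tolerance all invented for the sketch — training until the outputs fall inside a preset acceptable range might look like:

```python
import math, random

def train(samples, tol=0.05, lr=0.5, max_epochs=5000, seed=0):
    """samples: list of (feature_vector, outcome) pairs, outcome in {0, 1}.
    Trains a single-layer logistic model until every prediction lies
    within `tol` of its target (a stand-in for the "preset acceptable
    range of output values"), i.e. until convergence."""
    rng = random.Random(seed)
    n = len(samples[0][0])
    w = [rng.uniform(-0.1, 0.1) for _ in range(n)]   # U801: initialize parameters
    b = 0.0
    for _ in range(max_epochs):
        converged = True
        for x, y in samples:                          # U803: features in, outcome as target
            p = 1 / (1 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
            err = p - y                               # prediction error
            if abs(err) > tol:
                converged = False
            for i in range(n):                        # gradient step per sample
                w[i] -= lr * err * x[i]
            b -= lr * err
        if converged:                                 # U804: converged model is kept
            break
    return w, b
```

A real implementation would use a multi-layer network as the text describes; the convergence criterion (all outputs within the acceptable range) is the part this sketch is meant to show.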
The ninth embodiment of the application provides a method for determining a user's repayment intention based on user speech information. A flow diagram of the method is shown in Figure 11; it comprises:
Step S901: obtain the real-time voice of the target user during debt collection.
For a lending business, collection staff need to urge borrowing users to repay their debts. Usually, the staff communicate with a borrowing user by telephone to urge repayment.
This step obtains the borrowing user's speech produced before the current moment during the present telephone conversation with the borrowing user. The speech comprises only the borrowing user's utterances in this call and does not include the collection staff's speech.
Step S902: determine, according to the semantic and prosodic information contained in the real-time voice, the speech characteristic value for calculating the user's repayment intention.
The speech characteristic value for calculating the user's repayment intention in this step is similar to the speech characteristic value in the first embodiment of the application: the voice is processed to obtain the semantic and prosodic information it contains, and the speech characteristic value for calculating the borrowing user's repayment intention is determined from that information.
For a detailed description of this step, refer to the related description of step S102 in the first embodiment of the application; it is not repeated here.
Step S903: use the speech characteristic value as an input parameter of the computation model that determines the user's repayment intention based on user speech information, and calculate an assessed value of the target user's repayment intention during this collection.
In this step, the speech characteristic value obtained in the previous step is used as the input parameter of the computation model that determines the user's repayment intention based on user speech information, and the assessed value of the target user's repayment intention is calculated.
For the computation model that determines the user's repayment intention based on user speech information, refer to the related description of the computation model that determines user intention based on user speech information in the first embodiment of the application; it is not repeated here.
Because the speech characteristic value for calculating the user's repayment intention is determined from the semantic and prosodic information of the user's own speech, the user's current actual state enters the calculation of the repayment intention assessed value. The information source is authentic and valid, which makes the calculation result more accurate and effective.
In addition to the foregoing steps, this embodiment may further provide a corresponding reference collection strategy according to the calculated repayment intention of the borrowing user; the collection staff can then, according to the provided reference collection strategy, adjust their manner or attitude in communicating with the borrowing user so as to prompt repayment as early as possible.
Further, the foregoing steps may be repeated continually at a certain interval (for example, every 5 or 10 seconds), so that the collection strategy for the borrowing user is continually adjusted and repayment is urged in a more targeted way.
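The continual re-assessment described above might be orchestrated as in the following sketch; the scoring callbacks, interval, and strategy threshold are placeholder assumptions rather than details from the application:

```python
import time

def collection_loop(get_latest_speech, score_intention, pick_strategy,
                    interval_s=5, max_rounds=3):
    """Every `interval_s` seconds, re-score the borrower's repayment
    intention from the call audio so far and refresh the suggested
    reference collection strategy (claims 32-33)."""
    strategies = []
    for _ in range(max_rounds):
        features = get_latest_speech()        # speech characteristic values so far
        assessed = score_intention(features)  # model's assessed value
        strategies.append(pick_strategy(assessed))
        time.sleep(interval_s)
    return strategies
```

In practice, `get_latest_speech` would wrap the feature extraction of step S902 and `score_intention` the computation model of step S903.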
Although the application has been disclosed above in terms of preferred embodiments, they are not intended to limit the application. Any person skilled in the art may make possible variations and modifications without departing from the spirit and scope of the application; the protection scope of the application shall therefore be subject to the scope defined by the claims of the application.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include computer-readable media in the form of volatile memory, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and can implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
Those skilled in the art will understand that embodiments of the application may be provided as a method, a system, or a computer program product. Accordingly, the application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) that contain computer-usable program code.

Claims (33)

1. A method for determining user intention based on user speech information, characterized by comprising:
obtaining the real-time voice of a target user while a specified transaction is being handled;
determining, according to the semantic and prosodic information contained in the real-time voice, a speech characteristic value for calculating user intention;
using the speech characteristic value as an input parameter of a computation model that determines user intention based on user speech information, and calculating an assessed value of the target user's intention during this handling of the specified transaction.
2. The method for determining user intention based on user speech information according to claim 1, characterized in that determining, according to the semantic and prosodic information contained in the real-time voice, the speech characteristic value for calculating user intention comprises:
extracting from the real-time voice, according to a preset rule, the speech of each phrase contained in the real-time voice;
assigning an intonation coefficient to the speech of each phrase according to a preset rule;
determining the speech characteristic value according to whether the real-time voice contains the assessment reference phrases and the intonation coefficients of the speech of the assessment reference phrases contained in the real-time voice.
3. The method for determining user intention based on user speech information according to claim 2, characterized in that determining the speech characteristic value according to whether the real-time voice contains the assessment reference phrases and the intonation coefficients of the speech of the phrases contained in the real-time voice comprises:
when the real-time voice contains an assessment reference phrase, setting the corresponding speech characteristic value to the intonation coefficient of the speech of that assessment reference phrase contained in the real-time voice.
4. The method for determining user intention based on user speech information according to claim 2, characterized in that the assessment reference phrases are obtained as follows:
obtaining the historical speech of users who previously handled the specified transaction, the historical speech comprising a plurality of voices, each voice being the speech of one user during one previous handling of the specified transaction;
extracting from each voice, according to a preset basic word quantity, the phrases composed of basic words corresponding to that voice;
selecting, from the phrases corresponding to the voices in the historical speech, the phrases that meet a preset assessment reference phrase quantity requirement as the assessment reference phrases.
5. The method for determining user intention based on user speech information according to claim 4, characterized in that the preset assessment reference phrase quantity requirement is obtained as follows:
sorting the phrases by occurrence count;
determining a quantity selection interval according to the sorted positions;
taking occurrence counts that fall within the quantity selection interval as the assessment reference phrase quantity requirement.
6. The method for determining user intention based on user speech information according to claim 5, characterized in that determining a quantity selection interval according to the sorted positions comprises:
using, as the quantity selection interval, the range from the occurrence count of the phrase at the 40% position of the sorted queue to the occurrence count of the phrase at the 60% position.
7. The method for determining user intention based on user speech information according to claim 1, characterized in that the computation model that determines user intention based on user speech information is obtained as follows:
initializing the parameters of a computer deep neural network;
generating training sample data using the historical speech of users who previously handled the specified transaction;
training the computer deep neural network with the training sample data until it converges;
using the converged computer deep neural network as the computation model that determines user intention based on user speech information.
8. The method for determining user intention based on user speech information according to claim 7, characterized in that generating training sample data using the historical speech of users who previously handled the specified transaction comprises:
obtaining the historical speech of users who previously handled the specified transaction, the historical speech comprising a plurality of voices, each voice being the speech of one user during one previous handling of the specified transaction;
generating, using the historical speech, the characteristic value of each voice relative to the assessment reference phrases;
using, as the training sample data, the characteristic value of each voice relative to the assessment reference phrases together with the result of handling the specified transaction corresponding to each voice.
9. The method for determining user intention based on user speech information according to claim 8, characterized in that generating, using the historical speech, the characteristic value of each voice relative to the assessment reference phrases comprises:
extracting from each voice, according to a preset basic word quantity, the speech of the phrases composed of basic words corresponding to that voice;
determining the intonation coefficient of the speech of each phrase corresponding to each voice;
selecting, from the phrases corresponding to the voices in the historical speech, the phrases that meet a preset assessment reference phrase quantity requirement as the assessment reference phrases;
generating the characteristic value of each voice relative to the assessment reference phrases according to whether each voice in the historical speech contains the assessment reference phrases and the intonation coefficients of the speech of the assessment reference phrases contained in each voice.
10. The method for determining user intention based on user speech information according to claim 9, characterized in that generating the characteristic value of each voice relative to the assessment reference phrases according to whether each voice in the historical speech contains the assessment reference phrases and the intonation coefficients of the speech of the assessment reference phrases contained in each voice comprises:
when a voice in the historical speech contains an assessment reference phrase, using the intonation coefficient of the speech of that assessment reference phrase contained in the voice as the characteristic value of the voice relative to that assessment reference phrase.
11. The method for determining user intention based on user speech information according to claim 1, characterized in that the specified transaction comprises requiring a user to repay a debt, and correspondingly the corresponding intention comprises the user's repayment intention.
12. A method for determining assessment reference phrases, the assessment reference phrases being used to determine a user's corresponding intention while a specified transaction is being handled, the method characterized by comprising the following steps:
obtaining the historical speech of users who previously handled the specified transaction, the historical speech comprising a plurality of voices, each voice being the speech of one user during one previous handling of the specified transaction;
extracting from each voice, according to a preset basic word quantity, the phrases composed of basic words corresponding to that voice;
selecting, from the phrases corresponding to the voices in the historical speech, the phrases that meet a preset assessment reference phrase quantity requirement as the assessment reference phrases.
13. The method for determining assessment reference phrases according to claim 12, characterized in that the preset assessment reference phrase quantity requirement is obtained as follows:
sorting the phrases by occurrence count;
determining a quantity selection interval according to the sorted positions;
taking occurrence counts that fall within the quantity selection interval as the assessment reference phrase quantity requirement.
14. The method for determining assessment reference phrases according to claim 13, characterized in that determining a quantity selection interval according to the sorted positions comprises:
using, as the quantity selection interval, the range from the occurrence count of the phrase at the 40% position of the sorted queue to the occurrence count of the phrase at the 60% position.
15. A method for generating training sample data, the sample data being used to train a computer deep neural network to generate a computation model that determines user intention based on user speech information, the method characterized by comprising the following steps:
obtaining the historical speech of users who previously handled the specified transaction, the historical speech comprising a plurality of voices, each voice being the speech of one user during one previous handling of the specified transaction;
generating, using the historical speech, the characteristic value of each voice relative to each assessment reference phrase;
using, as the training sample data, the characteristic value of each voice relative to each assessment reference phrase together with the result of handling the specified transaction corresponding to each voice.
16. The method for generating training sample data according to claim 15, characterized in that generating, using the historical speech, the characteristic value of each voice relative to each assessment reference phrase comprises:
extracting from each voice, according to a preset basic word quantity, the speech of the phrases composed of basic words corresponding to that voice;
determining the intonation coefficient of the speech of each phrase corresponding to each voice;
selecting, from the phrases corresponding to the voices in the historical speech, the phrases that meet a preset assessment reference phrase quantity requirement as the assessment reference phrases;
generating the characteristic value of each voice relative to each assessment reference phrase according to whether each voice in the historical speech contains each assessment reference phrase and the intonation coefficients of the speech of the assessment reference phrases contained in each voice.
17. The method for generating training sample data according to claim 16, characterized in that generating the characteristic value of each voice relative to each assessment reference phrase according to whether each voice in the historical speech contains each assessment reference phrase and the intonation coefficients of the speech of the assessment reference phrases contained in each voice comprises:
when a voice in the historical speech contains an assessment reference phrase, using the intonation coefficient of the speech of that assessment reference phrase contained in the voice as the characteristic value of the voice relative to that assessment reference phrase.
18. The method for generating training sample data according to claim 15, characterized in that the specified transaction comprises requiring a user to repay a debt, and correspondingly the corresponding intention comprises the user's repayment intention.
19. A method for generating a computation model that determines user intention based on user speech information, the computation model being used to determine user intention based on user speech information while a specified transaction is being handled, the method characterized by comprising the following steps:
initializing the parameters of a computer deep neural network;
generating training sample data using the historical speech of users who previously handled the specified transaction;
training the computer deep neural network with the training sample data until it converges;
using the converged computer deep neural network as the computation model that determines user intention based on user speech information.
20. The method for generating a computation model that determines user intention based on user speech information according to claim 19, characterized in that generating training sample data using the historical speech of users who previously handled the specified transaction comprises:
obtaining the historical speech of users who previously handled the specified transaction, the historical speech comprising a plurality of voices, each voice being the speech of one user during one previous handling of the specified transaction;
generating, using the historical speech, the characteristic value of each voice relative to the assessment reference phrases;
using, as the training sample data, the characteristic value of each voice relative to the assessment reference phrases together with the result of handling the specified transaction corresponding to each voice.
21. The method for generating a computation model that determines user intention based on user speech information according to claim 20, characterized in that generating, using the historical speech, the characteristic value of each voice relative to the assessment reference phrases comprises:
extracting from each voice, according to a preset basic word quantity, the speech of the phrases composed of basic words corresponding to that voice;
determining the intonation coefficient of the speech of each phrase corresponding to each voice;
selecting, from the phrases corresponding to the voices in the historical speech, the phrases that meet a preset assessment reference phrase quantity requirement as the assessment reference phrases;
generating the characteristic value of each voice relative to the assessment reference phrases according to whether each voice in the historical speech contains the assessment reference phrases and the intonation coefficients of the speech of the assessment reference phrases contained in each voice.
22. The method for generating a computation model that determines user intention based on user speech information according to claim 21, characterized in that generating the characteristic value of each voice relative to the assessment reference phrases according to whether each voice in the historical speech contains the assessment reference phrases and the intonation coefficients of the speech of the assessment reference phrases contained in each voice comprises:
when a voice in the historical speech contains an assessment reference phrase, using the intonation coefficient of the speech of that assessment reference phrase contained in the voice as the characteristic value of the voice relative to that assessment reference phrase.
23. The method for generating a computation model that determines user intention based on user speech information according to claim 19, characterized in that the computation model is used to determine a user's repayment intention when the user is required to repay a debt.
24. An apparatus for determining user intention based on user speech information, characterized by comprising:
an acquiring unit, configured to obtain the real-time voice of a target user while a specified transaction is being handled;
a speech characteristic value determination unit, configured to determine, according to the semantic and prosodic information contained in the real-time voice, a speech characteristic value for calculating user intention;
a determination unit, configured to use the speech characteristic value as an input parameter of a computation model that determines user intention based on user speech information, and to calculate an assessed value of the target user's intention during this handling of the specified transaction.
25. The apparatus for determining user intention based on user speech information according to claim 24, characterized in that the speech characteristic value determination unit comprises:
a phrase determination subelement, configured to extract from the real-time voice, according to a preset rule, the speech of each phrase contained in the real-time voice;
an intonation coefficient determination subelement, configured to assign an intonation coefficient to the speech of each phrase according to a preset rule;
a speech characteristic value determination subelement, configured to determine the speech characteristic value according to whether the real-time voice contains the assessment reference phrases and the intonation coefficients of the speech of the assessment reference phrases contained in the real-time voice.
26. The apparatus for determining user intention based on user speech information according to claim 25, characterized in that the speech characteristic value determination subelement is specifically configured to:
when the real-time voice contains an assessment reference phrase, set the corresponding speech characteristic value to the intonation coefficient of the speech of that assessment reference phrase contained in the real-time voice.
27. An apparatus for determining assessment reference phrases, the assessment reference phrases being used to determine a user's corresponding intention while a specified transaction is being handled, the apparatus characterized by comprising:
an acquiring unit, configured to obtain the historical speech of users who previously handled the specified transaction, the historical speech comprising a plurality of voices, each voice being the speech of one user during one previous handling of the specified transaction;
a phrase determination unit, configured to extract from each voice, according to a preset basic word quantity, the phrases composed of basic words corresponding to that voice;
an assessment reference phrase determination unit, configured to select, from the phrases corresponding to the voices in the historical speech, the phrases that meet a preset assessment reference phrase quantity requirement as the assessment reference phrases.
28. An apparatus for generating training sample data, the sample data being used to train a computer deep neural network to generate a computation model that determines user intention based on user speech information, characterized in that the apparatus comprises:
an acquiring unit, configured to obtain the historical speech of users who previously handled the specified transaction, the historical speech comprising a plurality of voices, each voice being the speech of one user during one previous handling of the specified transaction;
a characteristic value determination unit, configured to generate, using the historical speech, the characteristic value of each voice relative to each assessment reference phrase;
a sample data determination unit, configured to use, as the training sample data, the characteristic value of each voice relative to each assessment reference phrase together with the result of handling the specified transaction corresponding to each voice.
29. The apparatus for generating training sample data according to claim 28, characterized in that the characteristic value determination unit comprises:
a phrase determination subelement, configured to extract from each voice, according to a preset basic word quantity, the speech of the phrases composed of basic words corresponding to that voice;
an intonation coefficient determination subelement, configured to determine the intonation coefficient of the speech of each phrase corresponding to each voice;
a reference phrase determination subelement, configured to select, from the phrases corresponding to the voices in the historical speech, the phrases that meet a preset assessment reference phrase quantity requirement as the assessment reference phrases;
a characteristic value determination subelement, configured to generate the characteristic value of each voice relative to each assessment reference phrase according to whether each voice in the historical speech contains each assessment reference phrase and the intonation coefficients of the speech of the assessment reference phrases contained in each voice.
30. The apparatus for generating training sample data according to claim 29, characterized in that the characteristic value determination subelement is specifically configured to:
when a voice in the historical speech contains an assessment reference phrase, use the intonation coefficient of the speech of that assessment reference phrase contained in the voice as the characteristic value of the voice relative to that assessment reference phrase.
31. An apparatus for generating a computation model that determines user intention based on user speech information, the computation model being used to determine user intention based on user speech information while a specified transaction is being handled, characterized by comprising:
an initialization unit, configured to initialize the parameters of a computer deep neural network;
a sample data generation unit, configured to generate training sample data using the historical speech of users who previously handled the specified transaction;
a training unit, configured to train the computer deep neural network with the training sample data until it converges;
a model determination unit, configured to use the converged computer deep neural network as the computation model that determines user intention based on user speech information.
32. A method for determining a user's repayment intention based on user speech information, characterized by comprising:
obtaining the real-time voice of a target borrowing user during debt collection;
determining, according to the semantic and prosodic information contained in the real-time voice, a speech characteristic value for calculating the user's repayment intention;
using the speech characteristic value as an input parameter of a computation model that determines the user's repayment intention based on user speech information, and calculating an assessed value of the target borrowing user's repayment intention during this collection.
33. The method for determining a user's repayment intention based on user speech information according to claim 32, characterized by further comprising:
providing a corresponding reference collection strategy according to the assessed value.
CN201611187130.6A 2016-12-20 2016-12-20 Method and device for determining user intention based on user voice information Active CN108205525B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611187130.6A CN108205525B (en) 2016-12-20 2016-12-20 Method and device for determining user intention based on user voice information


Publications (2)

Publication Number Publication Date
CN108205525A true CN108205525A (en) 2018-06-26
CN108205525B CN108205525B (en) 2021-11-19

Family

ID=62603467

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611187130.6A Active CN108205525B (en) 2016-12-20 2016-12-20 Method and device for determining user intention based on user voice information

Country Status (1)

Country Link
CN (1) CN108205525B (en)



Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109670166A (en) * 2018-09-26 2019-04-23 平安科技(深圳)有限公司 Debt collection assistance method, apparatus, device and storage medium based on speech recognition
CN109670148A (en) * 2018-09-26 2019-04-23 平安科技(深圳)有限公司 Debt collection assistance method, apparatus, device and storage medium based on speech recognition
CN110009480A (en) * 2019-03-06 2019-07-12 平安科技(深圳)有限公司 Recommendation method, apparatus, medium and electronic device for judicial debt collection paths
CN111246027A (en) * 2020-04-28 2020-06-05 南京硅基智能科技有限公司 Voice communication system and method for human-machine cooperation
WO2021160191A1 (en) * 2020-04-28 2021-08-19 南京硅基智能科技有限公司 Human-in-the-loop voice communication system and method
US11380327B2 (en) 2020-04-28 2022-07-05 Nanjing Silicon Intelligence Technology Co., Ltd. Speech communication system and method with human-machine coordination

Also Published As

Publication number Publication date
CN108205525B (en) 2021-11-19

Similar Documents

Publication Publication Date Title
CN109522556B (en) Intention recognition method and device
JP6749468B2 (en) Modeling method and apparatus for evaluation model
US11282503B2 (en) Voice conversion training method and server and computer readable storage medium
CN108205525A (en) The method and apparatus that user view is determined based on user speech information
CN107886949A (en) A kind of content recommendation method and device
CN108898476A (en) A kind of loan customer credit-graded approach and device
CN110738564A (en) Post-loan risk assessment method and device and storage medium
CN110147925A (en) A kind of Application of risk decision method, device, equipment and system
CN106097043A (en) The processing method of a kind of credit data and server
KR102133886B1 (en) Method and system for predicting cryptographic price using artificial intelligence
CN108021934A (en) The method and device of more key element identifications
CN111476296A (en) Sample generation method, classification model training method, identification method and corresponding devices
CN110349007A (en) The method, apparatus and electronic equipment that tenant group mentions volume are carried out based on variable discrimination index
CN110796539A (en) Credit investigation evaluation method and device
CN109685643A (en) Loan audit risk grade determines method, apparatus, equipment and storage medium
CN111626844A (en) Enterprise credit assessment method and device based on big data analysis
CN108228950A (en) A kind of information processing method and device
CN111951050B (en) Financial product recommendation method and device
Lee Inflation targeting in practice: Further Evidence
CN111353728A (en) Risk analysis method and system
CN116361542A (en) Product recommendation method, device, computer equipment and storage medium
CN115587828A (en) Interpretable method of telecommunication fraud scene based on Shap value
TWI792101B (en) Data Quantification Method Based on Confirmed Value and Predicted Value
Ider et al. Forecasting Cryptocurrency Returns from Sentiment Signals: An Analysis of BERT Classifiers and Weak Supervision
CN115545088A (en) Model construction method, classification method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant