CN111681653A - Call control method, device, computer equipment and storage medium - Google Patents

Call control method, device, computer equipment and storage medium

Info

Publication number
CN111681653A
CN111681653A (application CN202010351277.4A)
Authority
CN
China
Prior art keywords
user
word segmentation
text data
natural language
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010351277.4A
Other languages
Chinese (zh)
Inventor
罗金雄
胡宏伟
马骏
王少军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202010351277.4A
Publication of CN111681653A
Priority to PCT/CN2020/125064 (WO2021218086A1)
Legal status: Pending

Classifications

    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G06F 18/24: Classification techniques
    • G06F 40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G06N 20/00: Machine learning
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • G10L 15/063: Training of speech recognition systems
    • G10L 15/16: Speech classification or search using artificial neural networks
    • H04M 3/5166: Centralised call answering arrangements requiring operator intervention, e.g. call or contact centers for telemarketing, in combination with interactive voice response systems or voice portals
    • G10L 2015/223: Execution procedure of a spoken command


Abstract

The embodiment of the invention discloses a call control method, a call control device, computer equipment and a storage medium. The method records, through a call-in module, user natural language data collected by an agent terminal and converts it into corresponding text data; performs word segmentation on the text data to obtain a word segmentation result; takes the words in the word segmentation result as input and trains the word segmentation result of the text data to obtain a word segmentation training result; inputs the word segmentation training result into a pre-trained neural network model for natural language classification to obtain a classification result for the natural language data; and determines the user intention according to the classification result, judging from the determined intention whether to use the call-out module to perform a call-out operation on the user. The invention provides a call control method based on a detection model, which can quickly identify the intention of a user's incoming call, helping enterprises improve customer service quality and customer satisfaction.

Description

Call control method, device, computer equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a call control method and apparatus, a computer device, and a storage medium.
Background
In recent years, data mining has attracted great attention in the information industry, mainly because of the large amount of data and the wide use thereof, and there is an urgent need to convert such data into useful information and knowledge. The obtained information and knowledge can be widely applied to various applications including business management, production control, market analysis, engineering design, scientific exploration and the like, and have great application value.
The data sources of traditional data mining are mostly existing historical data or massive web-page resources, and the processed objects are mostly text media of a single type. Such data sources are passive: one can only wait for the data to arrive, and once it is received, the complicated and many-purposed text content cannot be further screened and confirmed in an active way, so data analysis is difficult and effective, valuable information is hard to mine. The data mining cycle is therefore long, little information is determined, and the enterprise's data mining cost is very high. Moreover, collecting data manually requires a large investment of labor, and mining the intention of a user's incoming call manually is neither efficient nor accurate, so customer satisfaction ends up low.
Disclosure of Invention
In view of this, embodiments of the present invention provide a call control method, an apparatus, a computer device, and a storage medium, which can quickly identify an intention of a user for an incoming call, help an enterprise improve customer service quality, and improve customer satisfaction.
In one aspect, an embodiment of the present invention provides a call control method, where the method includes:
recording user natural language data collected by an agent terminal through a call-in module, and converting the user natural language data into corresponding text data;
performing word segmentation on the text data to obtain word segmentation results of the text data, wherein the word segmentation results comprise one or more words;
taking words in the word segmentation result as input, training the word segmentation result of the text data by using a preset word vector model, and obtaining a word segmentation training result, wherein the word segmentation training result comprises a vector representation corresponding to each word;
inputting the word segmentation training result into a neural network model which is obtained by pre-training and used for natural language classification, and obtaining a classification result aiming at natural language data;
and determining the user intention according to the classification result, and judging whether to use a calling module to carry out calling operation on the user according to the determined user intention.
In another aspect, an embodiment of the present invention provides a call control apparatus, where the apparatus includes:
the conversion unit is used for recording user natural language data collected by the seat terminal through the call-in module and converting the user natural language data into corresponding text data;
the word segmentation unit is used for segmenting the text data to obtain word segmentation results of the text data, and the word segmentation results comprise one or more words;
the training unit is used for taking the words in the word segmentation results as input, training the word segmentation results of the text data by using a preset word vector model, and acquiring word segmentation training results, wherein the word segmentation training results comprise vector representations corresponding to each word;
the classification unit is used for inputting the word segmentation training result into a neural network model which is obtained by pre-training and used for natural language classification to obtain a classification result aiming at natural language data;
and the determining and judging unit is used for determining the user intention according to the classification result and judging whether the calling module is used for calling the user or not according to the determined user intention.
In yet another aspect, an embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the call control method as described above is implemented.
In yet another aspect, the present invention also provides a computer-readable storage medium, which stores one or more computer programs that can be executed by one or more processors to implement the call control method as described above.
The embodiment of the invention provides a call control method, a call control device, computer equipment and a storage medium, wherein the method comprises the following steps: recording user natural language data collected by an agent terminal through a call-in module, and converting the user natural language data into corresponding text data; performing word segmentation on the text data to obtain word segmentation results of the text data, wherein the word segmentation results comprise one or more words; taking words in the word segmentation result as input, training the word segmentation result of the text data by using a preset word vector model, and obtaining a word segmentation training result, wherein the word segmentation training result comprises a vector representation corresponding to each word; inputting the word segmentation training result into a neural network model which is obtained by pre-training and used for natural language classification, and obtaining a classification result aiming at natural language data; the user intention is determined according to the classification result, and whether the calling module is used for calling the user or not is judged according to the determined user intention.
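The five steps summarized above can be sketched as a minimal pipeline. Every function body below is a hypothetical stand-in with canned return values, not the patent's actual implementation; only the orchestration mirrors the claimed steps.

```python
# A minimal end-to-end sketch of the claimed method. All function bodies are
# hypothetical stand-ins; only the flow of data mirrors steps S101-S105.

def speech_to_text(audio: bytes) -> str:
    # S101: would run ASR over the recorded call audio
    return "i want to query the credit card available limit"

def segment(text: str) -> list[str]:
    # S102: would use a probability-statistics segmentation model
    return text.split()

def to_vectors(words: list[str]) -> list[list[float]]:
    # S103: would look up vectors from a trained word vector model
    return [[float(len(w))] for w in words]

def classify(vectors: list[list[float]]) -> list[str]:
    # S104: would run the pre-trained neural network classifier
    return ["verb-query", "noun-credit card", "verb-available", "noun-limit"]

def decide_call_out(labels: list[str]) -> bool:
    # S105: would match the intention-slot model against a valuable-intent list
    return "verb-query" in labels

def handle_incoming_call(audio: bytes) -> bool:
    text = speech_to_text(audio)
    return decide_call_out(classify(to_vectors(segment(text))))

print(handle_incoming_call(b""))  # prints True with these canned stand-ins
```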
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present invention; those skilled in the art can obtain other drawings based on these drawings without creative effort.
Fig. 1 is a schematic view of an application scenario of a call control method according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of a call control method according to an embodiment of the present invention;
fig. 3 is another schematic flow chart of a call control method according to an embodiment of the present invention;
fig. 4 is another schematic flow chart of a call control method according to an embodiment of the present invention;
fig. 5 is another schematic flow chart of a call control method according to an embodiment of the present invention;
fig. 6 is a schematic block diagram of a call control apparatus according to an embodiment of the present invention;
fig. 7 is another schematic block diagram of a call control apparatus according to an embodiment of the present invention;
fig. 8 is another schematic block diagram of a call control apparatus according to an embodiment of the present invention;
fig. 9 is another schematic block diagram of a call control apparatus according to an embodiment of the present invention;
fig. 10 is another schematic block diagram of a call control apparatus according to an embodiment of the present invention;
fig. 11 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.
Referring to fig. 1 and fig. 2, fig. 1 is a schematic view of an application scenario of a call control method according to an embodiment of the present invention, and fig. 2 is a schematic view of a flow of the call control method according to the embodiment of the present invention. The call control method is applied to the agent terminal of the call center system. As an application, as shown in fig. 1, the call control method is applied to an agent terminal 10, and the agent terminal 10 executes a user intention identification instruction and feeds back a user intention identification result to a user terminal 20.
It should be noted that only one user terminal 20 is illustrated in fig. 1; in actual operation, the agent terminal 10 may feed back the execution result to a plurality of user terminals 20.
Referring to fig. 2, fig. 2 is a schematic flowchart of a call control method according to an embodiment of the present invention. As shown in fig. 2, the method comprises the following steps S101-S105.
S101, recording user natural language data collected by an agent terminal through a call-in module, and converting the user natural language data into corresponding text data.
In the embodiment of the invention, the agent terminal is configured with a plurality of speech channels for interacting with users: in wired communication one telephone line corresponds to one speech channel, while in wireless communication one carrier frequency corresponds to one speech channel. The call-in module, deployed on these speech channels, distributes incoming calls to the agent terminal according to an intelligent distribution policy when the terminal is in the ready state, gives an incoming-call prompt while the line is ringing together with answer and reject buttons, and starts the call with the user when the answer button is clicked. Examples of a user's spoken natural-language queries are: "I want to query the available limit of my credit card" and "I want to query this month's bill of my credit card".
Further, as shown in fig. 3, the step of recording the user natural language data collected by the agent terminal through the call-in module and converting it into corresponding text data specifically includes steps S201 to S204. S201, collecting the natural language data input by the user with a microphone on the agent terminal, and recording it through the call-in module. S202, digitizing the user natural language data to obtain a speech signal. S203, extracting acoustic features from the speech signal. S204, inputting the acoustic features into a preset acoustic model for decoding, so as to generate the text data. It should be noted that the user natural language data is an analog speech signal, so it must first be digitized before acoustic features can be extracted. The acoustic features can be extracted with methods such as mel-frequency cepstral coefficients (MFCC), linear predictive cepstral coefficients (LPCC), or the multimedia content description interface MPEG-7. The acoustic features are then input to the acoustic model and decoded into the text data corresponding to the speech signal; this is the process of converting the user natural language data into corresponding text data.
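As a rough illustration of this acoustic front end, here is a dependency-light MFCC-style feature extractor in NumPy. The framing scheme, filter-bank size, and coefficient count are illustrative choices, not the patent's; a real system would use a tuned ASR front end.

```python
# Toy MFCC pipeline: framing -> power spectrum -> mel filterbank -> log -> DCT.
import numpy as np

def mfcc_sketch(signal, sample_rate=8000, frame_len=256, n_filters=12, n_ceps=6):
    # frame the signal with a Hamming window (no overlap, for brevity)
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    frames = frames * np.hamming(frame_len)
    # power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    n_bins = power.shape[1]
    # triangular mel filterbank spanning 0 .. sample_rate/2
    mel_max = 2595 * np.log10(1 + (sample_rate / 2) / 700)
    mel_pts = np.linspace(0, mel_max, n_filters + 2)
    hz_pts = 700 * (10 ** (mel_pts / 2595) - 1)
    bins = np.floor((frame_len + 1) * hz_pts / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_bins))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    log_energy = np.log(power @ fbank.T + 1e-10)
    # DCT-II decorrelates the log filterbank energies; keep n_ceps coefficients
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1)) / (2 * n_filters))
    return log_energy @ dct.T

# a 1-second synthetic tone stands in for recorded speech
t = np.linspace(0, 1, 8000, endpoint=False)
feats = mfcc_sketch(np.sin(2 * np.pi * 440 * t))
print(feats.shape)  # one n_ceps-dim feature vector per frame
```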
S102, performing word segmentation on the text data to obtain word segmentation results of the text data, wherein the word segmentation results comprise one or more words.
Specifically, before segmenting the text data to obtain its word segmentation result, data preprocessing is performed on it. The main preprocessing step is corpus cleaning: deleting noise data, punctuation marks, and modal particles from the text data while retaining the useful data. Common cleaning methods in this embodiment include manual de-duplication, alignment, deletion, and labeling, for example removing punctuation and modal words that contribute nothing to the text features. The text data is then segmented with a word segmentation method based on a probability-statistics model. Let T = t1 t2 … tm be the Chinese character string corresponding to the text data to be segmented, let W = w1 w2 … wn be a candidate segmentation result, and let Wa, Wb, …, Wk be all possible segmentation schemes of T. The probability-statistics segmentation model searches for the target word string W satisfying P(W|T) = max(P(Wa|T), P(Wb|T), …, P(Wk|T)); that is, the word string W produced by the model is the one with the maximum estimated probability, and it is taken as the word segmentation result of the text data. For example, the text data "I want to query the available limit of my credit card" is segmented by the model into: "I want", "query", "credit card", "available", "limit"; the text data "I want to query this month's bill of my credit card" is segmented into: "I want", "query", "credit card", "this month", "bill".
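The maximum-probability search over candidate segmentations can be sketched with dynamic programming over a unigram dictionary. The dictionary probabilities below are invented, and English letter strings stand in for the Chinese example strings.

```python
# Toy maximum-probability segmentation: find W maximizing P(W|T) under a
# hypothetical unigram model (probabilities would be estimated from a corpus).
import math

UNIGRAM = {
    "i": 0.05, "want": 0.04, "iwant": 0.001,
    "credit": 0.02, "card": 0.02, "creditcard": 0.03,
    "query": 0.03, "limit": 0.02,
}

def segment(chars, max_word=10):
    """best[i] = highest log-probability segmentation of chars[:i]."""
    best = [0.0] + [-math.inf] * len(chars)
    back = [0] * (len(chars) + 1)
    for i in range(1, len(chars) + 1):
        for j in range(max(0, i - max_word), i):
            w = chars[j:i]
            if w in UNIGRAM and best[j] + math.log(UNIGRAM[w]) > best[i]:
                best[i] = best[j] + math.log(UNIGRAM[w])
                back[i] = j
    # recover the argmax segmentation by backtracking
    out, i = [], len(chars)
    while i > 0:
        out.append(chars[back[i]:i])
        i = back[i]
    return out[::-1]

print(segment("iwantquerycreditcardlimit"))
```

Note how "iwant" loses to "i" + "want" because the product of the two unigram probabilities is higher, while "creditcard" beats "credit" + "card" for the opposite reason: the dynamic programme compares whole-segmentation probabilities, not individual words.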
S103, taking the words in the word segmentation result as input, training the word segmentation result of the text data by using a preset word vector model, and obtaining a word segmentation training result, wherein the word segmentation training result comprises a vector representation corresponding to each word.
Specifically, the preset word vector model is a deep learning model based on Word2vec. In some embodiments, other machine learning models such as KNN, SVM, naive Bayes, decision trees, or K-means may be used instead, or other deep learning models such as RNN, CNN, LSTM, Seq2Seq, FastText, or TextCNN. In this embodiment the Word2vec deep learning model provided by Gensim in the Python toolkit is used. The specific training process takes the words in the word segmentation result as input, trains on the word segmentation result of the text data, and outputs the word vector training result, which includes a vector representation corresponding to each word.
Further, as shown in fig. 4, the step S103 includes steps S301 to S302:
s301, inputting words in the word segmentation result of the text data into a Python toolkit Gensim;
s302, training the Word segmentation result of the text data by using a Word2 vec-based deep learning model in a Python toolkit Gensim to obtain a vector representation corresponding to each Word as a Word segmentation training result.
In this embodiment, Gensim in the Python toolkit is used, with the following parameter settings for its Word2vec deep learning model:
[The table of Word2vec parameter settings is rendered as images in the original publication and is not recoverable from the text.]
After training with the Word2vec model in the Python toolkit Gensim is completed, a vector.bin file is obtained, which contains every word of the text data together with the word vector corresponding to each word.
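As a dependency-free stand-in for the Gensim Word2vec training named in the text, here is a tiny full-softmax skip-gram trainer in NumPy. The corpus, dimensions, and learning rate are toy values chosen purely for illustration; Gensim itself is not used here.

```python
# Tiny skip-gram word-vector trainer (full softmax, SGD) in NumPy.
import numpy as np

corpus = [["i", "want", "query", "credit", "card", "limit"],
          ["i", "want", "query", "credit", "card", "bill"]]
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}
dim, window, lr = 8, 2, 0.05

rng = np.random.default_rng(0)
W_in = rng.normal(0.0, 0.1, (len(vocab), dim))   # word ("projection") vectors
W_out = rng.normal(0.0, 0.1, (len(vocab), dim))  # context vectors

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for _ in range(200):
    for sent in corpus:
        for pos, w in enumerate(sent):
            # predict each context word within the window from the centre word
            for ctx in sent[max(0, pos - window): pos + window + 1]:
                if ctx == w:          # skip the centre word itself (toy corpus
                    continue          # has no repeated word inside a window)
                c, o = idx[w], idx[ctx]
                p = softmax(W_out @ W_in[c])
                grad = p.copy()
                grad[o] -= 1.0        # cross-entropy gradient w.r.t. logits
                g_in = W_out.T @ grad
                W_out -= lr * np.outer(grad, W_in[c])
                W_in[c] -= lr * g_in

# "limit" and "bill" share all their contexts, so their vectors drift together
def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cos(W_in[idx["limit"]], W_in[idx["bill"]]))
```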
And S104, inputting the word segmentation training result into a neural network model which is trained in advance and used for natural language classification, and obtaining a classification result aiming at natural language data.
In the embodiment of the present invention, the neural network model is:
O_t = g(V · S_t), S_t = f(U · X_t + S_{t-1});
where X_t is the value of the input layer of the recurrent neural network, S_t and S_{t-1} are values of the hidden layer, O_t is the value of the output layer, U is the first weight matrix from the input layer to the hidden layer, V is the second weight matrix from the hidden layer to the output layer, f(·) is a nonlinear activation function applied at the hidden layer, and g(·) is the softmax function applied at the output layer. Before the word vector training result is input into the pre-trained neural network model for natural language classification, that model must be pre-trained. The training process is as follows: historical word vector data is input into a pre-constructed screening model for part-of-speech tagging to obtain the part-of-speech probability corresponding to each historical word vector; if the probability is greater than or equal to a preset first probability, the corresponding historical word vector is tagged as a noun part-of-speech word vector; if it is greater than or equal to a preset second probability, it is tagged as a verb part-of-speech word vector; and if it is greater than or equal to a preset third probability, it is tagged as an adjective part-of-speech word vector. More specifically, in this embodiment the screening model is constructed by training on historical word vectors with a naive Bayes algorithm; the screening model judges whether an input word vector is a noun, verb, or adjective part-of-speech word vector. To construct the screening model for part-of-speech tagging, the word vectors in a training set are used as the model's input, the part of speech corresponding to each word vector is used as its output, and the model is obtained through training. The naive Bayes model used is as follows:
P(c_k | d) ∝ P(c_k) · ∏_j P(t_j | c_k),
wherein
P(c_k) = N_{c_k} / N,
P(t_j | c_k) = (T_{jk} + 1) / (Σ_{t ∈ V} T_{tk} + |V|).
Here N_{c_k} denotes the number of documents of category c_k in the training set, N the total number of word vectors in the training set, T_{jk} the number of times term t_j occurs in category c_k, and V the set of terms over all categories. Using this screening model as a part-of-speech classifier, an input word vector can be judged to be a noun, verb, or adjective part-of-speech word vector. For example, each word vector is input into the naive Bayes model; when the probability that it belongs to the noun category is greater than or equal to 50% (i.e. the first probability is set to 50%), it is tagged as a noun part-of-speech word vector; when the probability for the verb category is greater than or equal to 50% (the second probability set to 50%), it is tagged as a verb part-of-speech word vector; and when the probability for the adjective category is greater than or equal to 50% (the third probability set to 50%), it is tagged as an adjective part-of-speech word vector.
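To make the screening step concrete, here is a toy multinomial naive Bayes part-of-speech screen with add-one smoothing, in the spirit of the screening model described above. The suffix features and labelled examples are invented for illustration; a real model would be trained on actual tagged word vectors.

```python
# Toy naive Bayes "screening model": P(c|d) proportional to P(c) * prod P(t|c),
# with Laplace (add-one) smoothing. Training data is invented.
import math
from collections import Counter

train = [  # hypothetical (features, part-of-speech) examples
    (["suffix:tion", "cap:no"], "noun"),
    (["suffix:card", "cap:no"], "noun"),
    (["suffix:ery", "cap:no"], "noun"),
    (["suffix:ing", "cap:no"], "verb"),
    (["suffix:ate", "cap:no"], "verb"),
    (["suffix:fy", "cap:no"], "verb"),
    (["suffix:ous", "cap:no"], "adjective"),
    (["suffix:ive", "cap:no"], "adjective"),
]

classes = sorted({c for _, c in train})
prior = {c: sum(1 for _, y in train if y == c) / len(train) for c in classes}
counts = {c: Counter(t for f, y in train if y == c for t in f) for c in classes}
vocab = {t for f, _ in train for t in f}

def posterior(features):
    """Normalized class probabilities for one feature bag."""
    scores = {}
    for c in classes:
        total = sum(counts[c].values())
        s = math.log(prior[c])
        for t in features:
            s += math.log((counts[c][t] + 1) / (total + len(vocab)))  # Laplace
        scores[c] = s
    m = max(scores.values())                     # log-sum-exp normalization
    exp = {c: math.exp(v - m) for c, v in scores.items()}
    z = sum(exp.values())
    return {c: exp[c] / z for c in classes}

p = posterior(["suffix:ing", "cap:no"])
print(max(p, key=p.get))
```

The threshold rule from the text then applies directly: tag the vector with a part of speech whenever its posterior meets the preset probability (e.g. 50%).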
The neural network model is then trained by taking the part-of-speech-tagged historical word vectors as the input of the recurrent neural network and the corresponding word vector classification results as its output; this training yields the first weight matrix, the second weight matrix, and the neural network model used for subsequent word vector classification. Once the pre-trained neural network model is obtained, the user's word vector training result is input into it, and the user's word vectors are classified quickly and intelligently according to the preset neural network model. For example, for the text data "I want to query the available limit of my credit card", 4-dimensional word vectors are obtained after word segmentation and vector representation; after these are input to the pre-trained neural network model, the output classification result is: verb-query, noun-credit card, verb-available, noun-limit. For the text data "I want to query this month's bill of my credit card", 4-dimensional word vectors are likewise obtained, and the output classification result is: verb-query, noun-credit card, adjective-this month, noun-bill.
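A minimal forward pass of a recurrent classifier of the form S_t = f(U·X_t + S_{t-1}), O_t = g(V·S_t) can be sketched as follows, with f = tanh and g = softmax. The weight matrices here are random stand-ins for trained values, and the dimensions are illustrative.

```python
# Forward pass of a minimal recurrent classifier; U and V are random
# stand-ins for the trained first and second weight matrices.
import numpy as np

rng = np.random.default_rng(1)
in_dim, hid_dim, n_classes = 4, 5, 3   # e.g. classes: noun / verb / adjective
U = rng.normal(0, 0.5, (hid_dim, in_dim))     # first weight matrix
V = rng.normal(0, 0.5, (n_classes, hid_dim))  # second weight matrix

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_classify(word_vectors):
    s = np.zeros(hid_dim)
    outputs = []
    for x in word_vectors:               # one word vector per time step
        s = np.tanh(U @ x + s)           # hidden state carries the history
        outputs.append(softmax(V @ s))   # class distribution at each step
    return outputs

seq = rng.normal(0, 1, (4, in_dim))      # four word vectors for one utterance
outs = rnn_classify(seq)
print([int(o.argmax()) for o in outs])   # predicted class index per word
```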
And S105, determining the user intention according to the classification result, and judging whether to use a calling module to carry out calling operation on the user according to the determined user intention.
Specifically, which content constitutes a specific user intention, or content having value, is determined according to the classification result; this embodiment uses an intention-slot model for the determination. Further, as shown in fig. 5, step S105 includes steps S401 to S403. S401, describing the classification result with the intention-slot model. S402, determining the attributes of the user intention from the described content, and determining the user intention from those attributes. S403, judging, according to the determined user intention, whether to use the call-out module to call the user. In this embodiment, a specific user intention means something the user explicitly expresses the need to do, and the intention-slot model usually uses a description containing four slots. For example, for the classification result verb-query, noun-credit card, verb-available, noun-limit, the slots are filled as: action#name = query, target#name = available limit, params2 = credit card. For the classification result verb-query, noun-credit card, adjective-this month, noun-bill, the slots are filled as: action#name = query, target#name = bill, params1 = credit card. In the four-slot model, action#name describes the action of the user intention (query, transact, apply, and so on), and target#name is the target object of the intention (annual fee, limit, bill, and so on). The other two slots carry the attributes of the intention and supplement it, making it more precise. Without these two slots some intentions cannot be distinguished: for "query the fixed limit of my credit card" and "query the temporary limit of my credit card", action#name is "query" and target#name is "limit" in both cases, and only the attribute slots can tell the two intentions apart. In the previous examples, "query the available limit of the credit card", "query this month's credit-card bill", "raise the fixed limit of the credit card", and the like are specific user intentions; they are valuable content for a company (e.g. Ping An) because they express the user's wish to transact business. An intention such as "I want to query today's weather in Shenzhen", by contrast, has no value to Ping An. As for how the robot confirms whether an intention is valuable: business personnel define the value of intentions in advance and list the identifiable intentions, dividing the list into valuable and non-valuable; whether the current intention is valuable is then known through intention matching.
It should be noted that, in the four-element intention model, all four slots generally need to be filled to determine a precise intention. If a user only says "I want to query the credit card amount" when expressing his intention, slot filling matches only two slots, and it cannot be determined whether the expressed intention is "query the temporary credit card amount" or "query the fixed credit card amount".
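The four-element slot description above can be sketched as a small data structure. The slot names (action#name, target#name, params1, params2) follow the text; the class and method names are illustrative assumptions, not from the patent.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class IntentSlots:
    """Four-element intention-slot description (action#name, target#name, params1, params2)."""
    action_name: str                 # action of the intention, e.g. "query", "handle", "apply"
    target_name: str                 # target object, e.g. "amount", "bill", "annual fee"
    params1: Optional[str] = None    # attribute slot, e.g. "credit card"
    params2: Optional[str] = None    # attribute slot, e.g. "fixed" / "temporary" / "this month"

    def is_precise(self) -> bool:
        # A precise intention generally has all four slots filled.
        return all(v is not None for v in
                   (self.action_name, self.target_name, self.params1, self.params2))

# "query my credit card fixed amount" vs "query my credit card temporary amount":
fixed = IntentSlots("query", "amount", params1="credit card", params2="fixed")
temp = IntentSlots("query", "amount", params1="credit card", params2="temporary")
# action#name and target#name agree; only the attribute slots tell them apart.
assert (fixed.action_name, fixed.target_name) == (temp.action_name, temp.target_name)
assert fixed != temp

# "I want to query the credit card amount" fills only two slots -> a fuzzy intention.
fuzzy = IntentSlots("query", "amount")
print(fuzzy.is_precise())  # False
```

The `is_precise` check mirrors the note above: a two-slot match cannot disambiguate "temporary" from "fixed" amount.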
After the user intention is determined, it is judged whether to use the call-out module to perform a calling operation on the user; specifically, the calling operation may be a callback through a phone. For example, for the fuzzy intention "query credit card amount" expressed by the user during an earlier round of phone calls, which matches only action#name = query and target#name = amount in the precise intention model, an outbound call can be made to follow up with the user and clarify the intention.
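The value judgment and callback decision described above can be sketched as follows. The predefined intention lists and the function name are illustrative assumptions; in practice, business personnel supply the lists in advance, as the text states.

```python
# Business personnel predefine which recognizable intentions are valuable.
VALUABLE_INTENTIONS = {
    "query credit card available amount",
    "query credit card monthly bill",
    "raise credit card fixed amount",
}
NON_VALUABLE_INTENTIONS = {
    "query today's weather",
}

def should_call_back(intention: str) -> bool:
    """Decide via intention matching whether to trigger the call-out module."""
    return intention in VALUABLE_INTENTIONS

print(should_call_back("query credit card monthly bill"))  # True
print(should_call_back("query today's weather"))           # False
```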
As can be seen from the above, in the embodiment of the present invention, the call-in module records the user natural language data collected by the agent terminal and converts the user natural language data into corresponding text data; word segmentation is performed on the text data to obtain a word segmentation result of the text data, the word segmentation result including one or more words; taking the words in the word segmentation result as input, the word segmentation result of the text data is trained by using a preset word vector model to obtain a word segmentation training result, the word segmentation training result including a vector representation corresponding to each word; the word segmentation training result is input into a pre-trained neural network model for natural language classification to obtain a classification result for the natural language data; and the user intention is determined according to the classification result, and whether to use the call-out module to perform a calling operation on the user is judged according to the determined user intention.
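The five steps summarized above can be strung together as a stub pipeline. Every function body below is a placeholder assumption (the patent's actual models are not reproduced); the sketch only shows how data flows from speech to the callback decision.

```python
# Stub orchestration of the five summarized steps; all bodies are toy stand-ins.

def speech_to_text(audio: bytes) -> str:        # call-in module + conversion
    return "查询 信用卡 额度"                    # stub transcript

def segment(text: str) -> list:                 # word segmentation
    return text.split()

def to_vectors(words: list) -> dict:            # preset word vector model (stub)
    return {w: [float(len(w))] for w in words}

def classify(vectors: dict) -> list:            # pre-trained classification model (stub)
    return ["verb-query", "noun-credit card", "noun-amount"]

def decide_callback(classification: list) -> bool:  # intention determination + call decision
    return "verb-query" in classification

result = classify(to_vectors(segment(speech_to_text(b""))))
print(decide_callback(result))  # True
```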
Referring to fig. 6, in response to the above-mentioned call control method, an embodiment of the present invention further provides a call control apparatus, where the apparatus 100 includes: the device comprises a conversion unit 101, a word segmentation unit 102, a training unit 103, a classification unit 104 and a determination and judgment unit 105.
The conversion unit 101 is configured to record, through the call-in module, user natural language data collected by the agent terminal, and convert the user natural language data into corresponding text data;
a word segmentation unit 102, configured to perform word segmentation on the text data to obtain a word segmentation result of the text data, where the word segmentation result includes one or more words;
the training unit 103 is configured to train the word segmentation result of the text data by using a preset word vector model with words in the word segmentation result as input, and obtain a word segmentation training result, where the word segmentation training result includes a vector representation corresponding to each word;
a classification unit 104, configured to input the word segmentation training result to a pre-trained neural network model for natural language classification, so as to obtain a classification result for natural language data;
a determination and judgment unit 105, configured to determine a user intention according to the classification result, and judge whether to perform a call operation on the user by using the calling module according to the determined user intention.
As can be seen from the above, in the embodiment of the present invention, the incoming call module records the user natural language data collected by the agent terminal, and converts the user natural language data into corresponding text data; performing word segmentation on the text data to obtain word segmentation results of the text data, wherein the word segmentation results comprise one or more words; taking words in the word segmentation result as input, training the word segmentation result of the text data by using a preset word vector model, and obtaining a word segmentation training result, wherein the word segmentation training result comprises a vector representation corresponding to each word; inputting the word segmentation training result into a neural network model which is obtained by pre-training and used for natural language classification, and obtaining a classification result aiming at natural language data; and determining the user intention according to the classification result, and judging whether to use a calling module to call the user according to the determined user intention.
Referring to fig. 7, the conversion unit 101 includes:
a collecting unit 101a, configured to collect user natural language data using a microphone on the agent terminal, and to record, through the call-in module, the user natural language data collected by the agent terminal;
the processing unit 101b is used for performing digital processing on the user natural language data to obtain a voice signal;
an extracting unit 101c for extracting an acoustic feature of the speech signal;
the generating unit 101d is configured to input the acoustic features into a preset acoustic model for decoding, so as to generate the text data.
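As a rough, dependency-free illustration of units 101b to 101c, the sketch below digitizes a toy waveform into frames and computes a log-energy value per frame. Real systems extract richer acoustic features (e.g. MFCCs) and decode them with a trained acoustic model (unit 101d), which is not reproduced here; all names and parameters are illustrative.

```python
import math

def frame_signal(samples, frame_len=160, hop=80):
    """Split a digitized signal (list of floats) into overlapping frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def log_energy(frame):
    """A minimal per-frame acoustic feature: log of the frame energy."""
    return math.log(sum(s * s for s in frame) + 1e-10)

# Toy "speech": a 100 Hz tone sampled at 8 kHz for 0.1 s.
sr = 8000
samples = [math.sin(2 * math.pi * 100 * t / sr) for t in range(sr // 10)]
frames = frame_signal(samples)
features = [log_energy(f) for f in frames]
print(len(frames), len(features))  # one feature per frame
```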
Referring to fig. 8, the apparatus 100 further includes:
the corpus cleaning unit 102a is configured to perform corpus cleaning on the text data, so as to delete noise data in the text data and to delete punctuation marks and modal particles in the text data.
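A minimal corpus-cleaning sketch matching unit 102a: strip punctuation marks and a small set of Chinese modal particles. The particle list and function name are illustrative assumptions.

```python
import re

# Illustrative set of Chinese modal particles (语气助词); extend as needed.
MODAL_PARTICLES = "吗呢啊吧啦呀哦嘛"

def clean_corpus(text: str) -> str:
    """Delete punctuation marks and modal particles from the text data."""
    text = re.sub(r"[^\w\s]", "", text)           # drop punctuation (ASCII and CJK)
    return re.sub(f"[{MODAL_PARTICLES}]", "", text)

print(clean_corpus("我想查询信用卡额度吗？"))  # 我想查询信用卡额度
```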
Referring to fig. 9, the training unit 103 includes:
an input unit 103a, configured to input a word in the word segmentation result of the text data into a Python toolkit Gensim;
and the training subunit 103b is configured to train the word segmentation result of the text data by using a word2vec-based deep learning model in the Python toolkit Gensim, so as to obtain a vector representation corresponding to each word as the word segmentation training result.
Referring to fig. 10, the determination judging unit 105 includes:
a description unit 105a, configured to describe the classification result using an intent-slot model;
a determination unit 105b for determining an attribute of the user intention according to the described content, and determining the user intention according to the determined attribute of the user intention;
and a judgment execution unit 105c, configured to judge, according to the determined user intention, whether to use the call-out module to perform a calling operation on the user.
The call control device corresponds to the call control method one to one, and the specific principle and process thereof are the same as those of the method described in the above embodiment, and are not described again.
The above-described call control means may be implemented in the form of a computer program which is executable on a computer device as shown in fig. 11.
FIG. 11 is a schematic diagram of a computer device according to the present invention. The device may be an agent terminal of a call centre system. Referring to fig. 11, the computer device 500 includes a processor 502, a non-volatile storage medium 503, an internal memory 504, and a network interface 505, which are connected by a system bus 501. The non-volatile storage medium 503 of the computer device 500 may store, among other things, an operating system 5031 and a computer program 5032, which, when executed, may cause the processor 502 to perform a call control method. The processor 502 of the computer device 500 is used to provide computing and control capabilities that support the overall operation of the computer device 500. The internal memory 504 provides an environment for the execution of the computer program 5032 in the non-volatile storage medium 503, which, when executed by the processor, causes the processor 502 to perform a call control method. The network interface 505 of the computer device 500 is used for network communication. Those skilled in the art will appreciate that the architecture shown in fig. 11 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or fewer components than those shown, or may combine certain components, or have a different arrangement of components.
Wherein the processor 502 implements the following operations when executing the computer program:
recording user natural language data collected by an agent terminal through a call-in module, and converting the user natural language data into corresponding text data;
performing word segmentation on the text data to obtain word segmentation results of the text data, wherein the word segmentation results comprise one or more words;
taking words in the word segmentation result as input, training the word segmentation result of the text data by using a preset word vector model, and obtaining a word segmentation training result, wherein the word segmentation training result comprises a vector representation corresponding to each word;
inputting the word segmentation training result into a neural network model which is obtained by pre-training and used for natural language classification, and obtaining a classification result aiming at natural language data;
and determining the user intention according to the classification result, and judging whether to use a calling module to carry out calling operation on the user according to the determined user intention.
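The patent does not specify the architecture of the pre-trained neural network used for natural language classification. As a dependency-free stand-in with the same input/output contract (word vectors in, class label out), the sketch below averages word vectors and matches against class centroids; all vectors and labels are toy assumptions, not the patent's model.

```python
def average(vectors):
    """Average a list of equal-length word vectors into one sentence vector."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def distance(a, b):
    """Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

CENTROIDS = {  # class label -> centroid in a toy 2-d embedding space
    "verb-query": [1.0, 0.0],
    "noun-bill":  [0.0, 1.0],
}

def classify(word_vectors):
    """Nearest-centroid classification over the averaged word vectors."""
    sent = average(word_vectors)
    return min(CENTROIDS, key=lambda label: distance(CENTROIDS[label], sent))

print(classify([[0.9, 0.1], [1.1, -0.1]]))  # verb-query
```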
In one embodiment, the recording, by the call-in module, user natural language data collected by the agent terminal, and converting the user natural language data into corresponding text data includes:
collecting user natural language data by using a microphone on the agent terminal, and recording the user natural language data collected by the agent terminal through a call-in module;
carrying out digital processing on the user natural language data to obtain a voice signal;
extracting acoustic features of the voice signal;
and inputting the acoustic features into a preset acoustic model for decoding so as to generate the text data.
In one embodiment, the processor 502, when executing the computer program, further performs the following:
and performing corpus cleaning on the text data to delete the noise data in the text data and to delete punctuation marks and modal particles in the text data.
In one embodiment, the training of the word segmentation result of the text data by using a preset word vector model, with the words in the word segmentation result as input, to obtain a word segmentation training result, where the word segmentation training result includes a vector representation corresponding to each word, includes:
inputting words in the word segmentation result of the text data into a Python toolkit Gensim;
and training the word segmentation result of the text data by using a word2vec-based deep learning model in the Python toolkit Gensim to obtain a vector representation corresponding to each word as a word segmentation training result.
In one embodiment, the determining the user intention according to the classification result and judging whether to use the calling module to perform the calling operation on the user according to the determined user intention includes:
describing the classification result by using an intention-slot model;
determining the attribute of the user intention according to the described content, and determining the user intention according to the determined attribute of the user intention;
and judging whether to use the calling module to carry out calling operation on the user according to the determined user intention.
Those skilled in the art will appreciate that the embodiment of a computer device illustrated in fig. 11 does not constitute a limitation on the specific construction of the computer device, and that in other embodiments a computer device may include more or fewer components than those illustrated, or some components may be combined, or a different arrangement of components. For example, in some embodiments, the computer device only includes a memory and a processor, and in such embodiments, the structures and functions of the memory and the processor are the same as those of the embodiment shown in fig. 11, and are not described herein again.
The present invention provides a computer readable storage medium storing one or more computer programs, the one or more computer programs being executable by one or more processors to perform the steps of:
recording user natural language data collected by an agent terminal through a call-in module, and converting the user natural language data into corresponding text data;
performing word segmentation on the text data to obtain word segmentation results of the text data, wherein the word segmentation results comprise one or more words;
taking words in the word segmentation result as input, training the word segmentation result of the text data by using a preset word vector model, and obtaining a word segmentation training result, wherein the word segmentation training result comprises a vector representation corresponding to each word;
inputting the word segmentation training result into a neural network model which is obtained by pre-training and used for natural language classification, and obtaining a classification result aiming at natural language data;
and determining the user intention according to the classification result, and judging whether to use a calling module to carry out calling operation on the user according to the determined user intention.
In one embodiment, the recording, by the call-in module, user natural language data collected by the agent terminal, and converting the user natural language data into corresponding text data includes:
collecting user natural language data by using a microphone on the agent terminal, and recording the user natural language data collected by the agent terminal through a call-in module;
carrying out digital processing on the user natural language data to obtain a voice signal;
extracting acoustic features of the voice signal;
and inputting the acoustic features into a preset acoustic model for decoding so as to generate the text data.
In one embodiment, the one or more computer programs, which are executable by one or more processors, further implement the steps of:
and performing corpus cleaning on the text data to delete the noise data in the text data and to delete punctuation marks and modal particles in the text data.
In one embodiment, the training of the word segmentation result of the text data by using a preset word vector model, with the words in the word segmentation result as input, to obtain a word segmentation training result, where the word segmentation training result includes a vector representation corresponding to each word, includes:
inputting words in the word segmentation result of the text data into a Python toolkit Gensim;
and training the word segmentation result of the text data by using a word2vec-based deep learning model in the Python toolkit Gensim to obtain a vector representation corresponding to each word as a word segmentation training result.
In one embodiment, the determining the user intention according to the classification result and judging whether to use the calling module to perform the calling operation on the user according to the determined user intention includes:
describing the classification result by using an intention-slot model;
determining the attribute of the user intention according to the described content, and determining the user intention according to the determined attribute of the user intention;
and judging whether to use the calling module to carry out calling operation on the user according to the determined user intention.
The foregoing storage medium of the present invention includes: various media that can store program codes, such as a magnetic disk, an optical disk, and a Read-Only Memory (ROM).
The elements of all embodiments of the present invention may be implemented by a general-purpose integrated circuit, such as a CPU (Central Processing Unit), or by an ASIC (Application-Specific Integrated Circuit).
The steps in the call control method of the embodiment of the invention can be sequentially adjusted, combined and deleted according to actual needs.
The units in the call control device of the embodiment of the invention can be merged, divided and deleted according to actual needs.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method for call control, the method comprising:
recording user natural language data collected by an agent terminal through a call-in module, and converting the user natural language data into corresponding text data;
performing word segmentation on the text data to obtain word segmentation results of the text data, wherein the word segmentation results comprise one or more words;
taking words in the word segmentation result as input, training the word segmentation result of the text data by using a preset word vector model, and obtaining a word segmentation training result, wherein the word segmentation training result comprises a vector representation corresponding to each word;
inputting the word segmentation training result into a neural network model which is obtained by pre-training and used for natural language classification, and obtaining a classification result aiming at natural language data;
and determining the user intention according to the classification result, and judging whether to use a calling module to carry out calling operation on the user according to the determined user intention.
2. The method of claim 1, wherein the recording, by the call-in module, user natural language data collected by an agent terminal and converting the user natural language data into corresponding text data comprises:
collecting user natural language data by using a microphone on the agent terminal, and recording the user natural language data collected by the agent terminal through a call-in module;
carrying out digital processing on the user natural language data to obtain a voice signal;
extracting acoustic features of the voice signal;
and inputting the acoustic features into a preset acoustic model for decoding so as to generate the text data.
3. The method of claim 1, wherein before the segmenting the text data into words to obtain the segmentation result of the text data, the method further comprises a text data preprocessing step of:
and performing corpus cleaning on the text data to delete the noise data in the text data and to delete punctuation marks and modal particles in the text data.
4. The method of claim 1, wherein the training the segmentation result of the text data by using a preset word vector model with the word in the segmentation result as an input to obtain a segmentation training result comprises:
inputting words in the word segmentation result of the text data into a Python toolkit Gensim;
and training the word segmentation result of the text data by using a word2vec-based deep learning model in the Python toolkit Gensim to obtain a vector representation corresponding to each word as a word segmentation training result.
5. The method of claim 1, wherein the determining a user intention according to the classification result and determining whether to perform a call operation on the user using a calling module according to the determined user intention comprises:
describing the classification result by using an intention-slot model;
determining the attribute of the user intention according to the described content, and determining the user intention according to the determined attribute of the user intention;
and judging whether to use the calling module to carry out calling operation on the user according to the determined user intention.
6. A call control apparatus, characterized in that the apparatus comprises:
the conversion unit is used for recording user natural language data collected by the seat terminal through the call-in module and converting the user natural language data into corresponding text data;
the word segmentation unit is used for segmenting the text data to obtain word segmentation results of the text data, and the word segmentation results comprise one or more words;
the training unit is used for taking the words in the word segmentation results as input, training the word segmentation results of the text data by using a preset word vector model, and acquiring word segmentation training results, wherein the word segmentation training results comprise vector representations corresponding to each word;
the classification unit is used for inputting the word segmentation training result into a neural network model which is obtained by pre-training and used for natural language classification to obtain a classification result aiming at natural language data;
and the determining and judging unit is used for determining the user intention according to the classification result and judging whether the calling module is used for calling the user or not according to the determined user intention.
7. The apparatus of claim 6, wherein the conversion unit comprises:
a collecting unit, configured to collect user natural language data using a microphone on the agent terminal, and to record, through the call-in module, the user natural language data collected by the agent terminal;
the processing unit is used for carrying out digital processing on the user natural language data to obtain a voice signal;
an extraction unit for extracting acoustic features of the speech signal;
and the generating unit is used for inputting the acoustic features into a preset acoustic model for decoding so as to generate the text data.
8. The apparatus of claim 6, wherein the apparatus further comprises:
and the corpus cleaning unit is used for performing corpus cleaning on the text data, so as to delete the noise data in the text data and to delete punctuation marks and modal particles in the text data.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the call control method according to any of claims 1-5 when executing the computer program.
10. A computer-readable storage medium, storing one or more computer programs, the one or more computer programs being executable by one or more processors to implement the call control method of any one of claims 1-5.
CN202010351277.4A 2020-04-28 2020-04-28 Call control method, device, computer equipment and storage medium Pending CN111681653A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010351277.4A CN111681653A (en) 2020-04-28 2020-04-28 Call control method, device, computer equipment and storage medium
PCT/CN2020/125064 WO2021218086A1 (en) 2020-04-28 2020-10-30 Call control method and apparatus, computer device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010351277.4A CN111681653A (en) 2020-04-28 2020-04-28 Call control method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111681653A true CN111681653A (en) 2020-09-18

Family

ID=72452291

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010351277.4A Pending CN111681653A (en) 2020-04-28 2020-04-28 Call control method, device, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN111681653A (en)
WO (1) WO2021218086A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112242140A (en) * 2020-10-13 2021-01-19 中移(杭州)信息技术有限公司 Intelligent device control method and device, electronic device and storage medium
CN112347232A (en) * 2020-11-18 2021-02-09 武汉贝多多网络科技有限公司 Method for carrying out intention recognition on object based on cloud computing
CN112350908A (en) * 2020-11-10 2021-02-09 珠海格力电器股份有限公司 Control method and device of intelligent household equipment
CN113434680A (en) * 2021-06-29 2021-09-24 平安科技(深圳)有限公司 User intention analysis method and device based on seat data and electronic equipment
WO2021218086A1 (en) * 2020-04-28 2021-11-04 平安科技(深圳)有限公司 Call control method and apparatus, computer device, and storage medium
WO2022134833A1 (en) * 2020-12-23 2022-06-30 深圳壹账通智能科技有限公司 Speech signal processing method, apparatus and device, and storage medium

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114168716B (en) * 2022-02-11 2022-05-24 华南理工大学 Deep learning-based automatic engineering cost extraction and analysis method and device
CN114615378A (en) * 2022-03-10 2022-06-10 平安普惠企业管理有限公司 Call connection method and device, intelligent voice platform and storage medium
CN115687577B (en) * 2023-01-04 2023-04-07 交通运输部公路科学研究所 Road transportation normalized problem appeal discovery method and system
CN116320171B (en) * 2023-05-25 2023-12-08 安徽博天亚智能科技集团有限公司 Artificial intelligent calling system, method, device and medium based on big data
CN116361442B (en) * 2023-06-02 2023-10-17 国网浙江宁波市鄞州区供电有限公司 Business hall data analysis method and system based on artificial intelligence

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105681609A (en) * 2014-11-20 2016-06-15 中兴通讯股份有限公司 Assistance method and device for service center agent
KR101891496B1 (en) * 2017-12-26 2018-08-24 주식회사 머니브레인 Interactive ai agent system and method for actively monitoring and joining a dialogue session among users, computer readable recording medium
CN109840276A (en) * 2019-02-12 2019-06-04 北京健康有益科技有限公司 Intelligent dialogue method, apparatus and storage medium based on text intention assessment
CN110046221B (en) * 2019-03-01 2023-12-22 平安科技(深圳)有限公司 Machine dialogue method, device, computer equipment and storage medium
CN110853649A (en) * 2019-11-05 2020-02-28 集奥聚合(北京)人工智能科技有限公司 Label extraction method, system, device and medium based on intelligent voice technology
CN111681653A (en) * 2020-04-28 2020-09-18 平安科技(深圳)有限公司 Call control method, device, computer equipment and storage medium

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021218086A1 (en) * 2020-04-28 2021-11-04 平安科技(深圳)有限公司 Call control method and apparatus, computer device, and storage medium
CN112242140A (en) * 2020-10-13 2021-01-19 中移(杭州)信息技术有限公司 Intelligent device control method and device, electronic device and storage medium
CN112350908A (en) * 2020-11-10 2021-02-09 珠海格力电器股份有限公司 Control method and device of intelligent household equipment
CN112350908B (en) * 2020-11-10 2021-11-23 珠海格力电器股份有限公司 Control method and device of intelligent household equipment
CN112347232A (en) * 2020-11-18 2021-02-09 武汉贝多多网络科技有限公司 Method for carrying out intention recognition on object based on cloud computing
WO2022134833A1 (en) * 2020-12-23 2022-06-30 深圳壹账通智能科技有限公司 Speech signal processing method, apparatus and device, and storage medium
CN113434680A (en) * 2021-06-29 2021-09-24 平安科技(深圳)有限公司 User intention analysis method and device based on seat data and electronic equipment

Also Published As

Publication number Publication date
WO2021218086A1 (en) 2021-11-04

Similar Documents

Publication Publication Date Title
CN111681653A (en) Call control method, device, computer equipment and storage medium
CN108345692B (en) Automatic question answering method and system
CN110444198B (en) Retrieval method, retrieval device, computer equipment and storage medium
US10498888B1 (en) Automatic call classification using machine learning
US8238528B2 (en) Automatic analysis of voice mail content
US8135579B2 (en) Method of analyzing conversational transcripts
CN110019742B (en) Method and device for processing information
US20240144957A1 (en) End-to-end speech diarization via iterative speaker embedding
KR20150094419A (en) Apparatus and method for providing call record
CN110225210A (en) Based on call abstract Auto-writing work order method and system
CN115665325A (en) Intelligent outbound method, device, electronic equipment and storage medium
CN115935182A (en) Model training method, topic segmentation method in multi-turn conversation, medium, and device
CN110933225A (en) Call information acquisition method and device, storage medium and electronic equipment
CN114706945A (en) Intention recognition method and device, electronic equipment and storage medium
CN112995414A (en) Behavior quality inspection method, device, equipment and storage medium based on voice call
CN115455982A (en) Dialogue processing method, dialogue processing device, electronic equipment and storage medium
CN112527969A (en) Incremental intention clustering method, device, equipment and storage medium
CN110740212B (en) Call answering method and device based on intelligent voice technology and electronic equipment
CN110347696B (en) Data conversion method, device, computer equipment and storage medium
CN114240250A (en) Intelligent management method and system for vocational evaluation
CN113590828B (en) Method and device for acquiring call key information
CN117059095B (en) IVR-based service providing method and device, computer equipment and storage medium
CN114282973B (en) Electronic commerce payment system and method based on mobile terminal
CN113239164B (en) Multi-round dialogue flow construction method and device, computer equipment and storage medium
CN112434501B (en) Method, device, electronic equipment and medium for intelligent generation of worksheet

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination