CN111507751A - Communication data-based clue scoring method

Info

Publication number: CN111507751A
Application number: CN202010223418.4A
Authority: CN
Inventors: 杨植麟; 杜羽伦; 陈虞君; 张宇韬
Original assignee: Beijing Ruikelun Intelligent Technology Co ltd
Current assignee: Beijing Ruikelun Intelligent Technology Co ltd
Priority date: 2020-03-26
Filing date: 2020-03-26
Publication date: 2020-08-07

Abstract

The invention relates to the technical field of computer information, and in particular to a clue scoring method based on communication data, which mainly comprises the following steps: step 1: collecting communication data; step 2: converting the communication data into interactive text; step 3: performing data preprocessing on the interactive text and the order data to obtain a plurality of clues and labels corresponding to the clues, wherein the labels comprise positive example labels and negative example labels; step 4: preparing a natural language processing pre-trained model; step 5: training and testing the pre-trained model using the clues; step 6: generating a clue scoring result. By applying artificial intelligence technologies such as speech recognition, natural language processing and machine learning, the invention improves the clue conversion success rate of telemarketers, so that each telemarketer can more accurately reach potential customers with stronger purchase intent, thereby improving the overall clue conversion rate and operating efficiency and enabling effective clues to be reached more quickly.

Description

Communication data-based clue scoring method

Technical Field

The invention relates to the technical field of computer information, in particular to a clue scoring method based on communication data.

Background

Currently, in the field of telemarketing, sales and marketing personnel spend several hours per day following up on sales clues (leads) in a clue pool. How to screen out the clues that a salesperson should follow up first, so as to improve the final clue conversion rate, is an urgent problem.

The traditional telemarketing industry relies on customer relationship management (CRM) systems to select the clues to follow up: sales and marketing personnel tediously enter clue-related data into the system and manually judge each clue's intent to place an order. Despite the best efforts of sales management departments, these systems suffer from problems such as data loss, manual entry errors, and falsified or tampered data, which lead to inaccurate or failed judgments of clue intent. There is therefore a need for a general clue scoring method based on tamper-proof telephone recordings or communication text records.

Disclosure of Invention

The invention provides a clue scoring method based on communication data. By applying artificial intelligence technologies such as speech recognition, natural language processing and machine learning, it improves the clue conversion success rate of telemarketers, so that each telemarketer can more accurately reach potential customers with stronger purchase intent, thereby improving the overall clue conversion rate and operating efficiency and enabling effective clues to be reached more quickly.

In order to achieve the above purpose, the invention provides the following technical solution: a clue scoring method based on communication data, which mainly comprises the following steps:

step 1: collecting communication data;

step 2: converting the communication data into interactive text;

step 3: performing data preprocessing on the interactive text and the order data to obtain a plurality of clues and labels corresponding to the clues, wherein the labels comprise positive example labels and negative example labels;

step 4: preparing a natural language processing pre-trained model;

step 5: training and testing the pre-trained model using the clues;

step 6: generating a clue scoring result.

Preferably, step 3 further comprises the following steps:

step 31: sorting the interactive text of each clue by timestamp, and setting one or more evaluation time points;

step 32: for each evaluation time point, using the conversation text of each clue in the N1 days before the evaluation time point, and checking whether an order is placed within the N2 days after the evaluation time point;

step 33: featurizing the clues, including but not limited to merging interactive text, segmenting interactive text, and adding additional features.
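The sorting in step 31 and the text merging in step 33 can be illustrated with a minimal Python sketch. The `Interaction` record, its field names, and the newline separator are hypothetical choices made for illustration; the patent does not specify a data layout.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Interaction:
    """One transcribed call or chat message (hypothetical record layout)."""
    clue_id: str
    timestamp: datetime
    text: str


def merge_interactions(interactions, separator="\n"):
    """Group interactions by clue, sort each group by timestamp, join the texts."""
    by_clue = {}
    for item in interactions:
        by_clue.setdefault(item.clue_id, []).append(item)
    return {
        clue_id: separator.join(
            i.text for i in sorted(items, key=lambda i: i.timestamp)
        )
        for clue_id, items in by_clue.items()
    }
```

Each clue ends up with a single chronologically ordered text, which can then be segmented and combined with additional features.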

Preferably, step 5 further comprises the following steps:

step 51: randomly dividing the plurality of clues obtained in step 3 into a training set and a test set;

step 52: fine-tuning a binary classification model built on the pre-trained model using the training set;

step 53: testing the fine-tuned model on the test set, and selecting an optimal model.

The beneficial effects of the invention are as follows: by applying artificial intelligence technologies such as speech recognition, natural language processing and machine learning, the invention improves the clue conversion success rate of telemarketers and ensures that each telemarketer can more accurately reach potential customers with stronger purchase intent, thereby improving the overall clue conversion rate and operating efficiency and ensuring that effective clues are reached more quickly.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flow chart of model creation according to the present invention;

FIG. 2 is a flowchart illustrating an exemplary application of the optimal model of the present invention;

FIG. 3 is an example of determining positive and negative examples at an evaluation time point according to the present invention.

Detailed Description

The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it is to be understood that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

As shown in the flowcharts of FIG. 1 and FIG. 2, a clue scoring method based on communication data includes the following steps:

step 1: collecting communication data;

step 2: converting the communication data into interactive text;

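step 3: performing data preprocessing on the interactive text and the order data to obtain a plurality of clues and labels corresponding to the clues, wherein the labels comprise positive example labels and negative example labels;

step 4: preparing a natural language processing pre-trained model;

step 5: training and testing the pre-trained model using the clues;

step 6: generating a clue scoring result.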

Specifically:

The first step: collect the relevant data and store it in a computer database. The data include communication records between salespeople and customers, such as telephone recordings and text chat records, as well as order data, such as name, alias, unit, etc. Because these data are ultimately used to generate the data for training and testing the final clue scoring model, the data fields contain at least the following information:

For example, if the communication records come from call recordings, the data fields of a record are:

The order data are:

The second step: convert the communication data into interactive text. The main purpose of this step is to transcribe the telephone recordings between customers and salespeople into structured text, so that natural language processing techniques can analyze its semantics and assess each clue's intent to place an order.

The third step: perform data preprocessing on the transcribed interactive text and the order data. This step includes several preprocessing tasks. First, the historical call transcripts of each clue are sorted by call timestamp, and one or more evaluation time points are set. Then, for each evaluation time point, the conversation text of each clue in the N1 days before the evaluation time point is used, and the order data are checked to see whether an order is placed within the N2 days after the evaluation time point. Clues with an order are treated as positive examples, and clues without an order are treated as negative examples. Finally, the positive and negative clues are featurized by methods including, but not limited to, merging call texts, tokenizing call texts, segmenting text with a sliding window, and adding additional features. The final product of this step is a set of features for each clue together with the corresponding positive or negative label. As shown in FIG. 3:

Clue A should be set as a positive example, and the texts of call 2 and call 3 should be merged.

Clue B is a negative example because no order was placed for clue B. The texts of call 2 and call 3 should be merged.

Clue C is neither a positive nor a negative example because its order falls outside the target number of days N2.

Clue D is neither a positive nor a negative example because there is no call data within the target number of days N1.
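The four cases above can be expressed as a small labeling function. The following Python sketch is illustrative only: the function name, the return convention (1, 0, or `None`), and the inclusive or exclusive treatment of the window boundaries are assumptions, and the text does not say how orders placed before the evaluation time point should be handled, so this sketch simply ignores them.

```python
from datetime import datetime, timedelta


def label_clue(call_times, order_times, eval_time, n1_days, n2_days):
    """Return 1 (positive), 0 (negative) or None (excluded) for one clue.

    call_times / order_times are lists of datetimes for a single clue.
    Boundary handling is an assumption of this sketch.
    """
    window_start = eval_time - timedelta(days=n1_days)
    window_end = eval_time + timedelta(days=n2_days)

    # Clue D case: no call data in the N1 days before the evaluation point.
    if not any(window_start <= t <= eval_time for t in call_times):
        return None

    # Assumption: only orders after the evaluation point are considered.
    future_orders = [t for t in order_times if t > eval_time]

    # Clue A case: an order is placed within N2 days after the evaluation point.
    if any(t <= window_end for t in future_orders):
        return 1

    # Clue C case: the only orders fall outside the N2-day window.
    if future_orders:
        return None

    # Clue B case: calls exist but no order was placed.
    return 0
```

For example, with an evaluation time point of 2020-03-10 and N1 = N2 = 7, a clue with a call on 2020-03-08 and an order on 2020-03-12 is labeled positive, while the same clue with its order on 2020-04-01 is excluded.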

Chinese word segmentation (tokenization) splits each Chinese sentence or paragraph into a sequence of indivisible minimum-unit symbols. The Chinese tokenizer of BERT may be used for segmentation here. Because the transcribed text is usually on the order of 1,000 words long, and a model with as many parameters as BERT cannot handle such ultra-long sequences well during training, a sliding-window approach is currently needed to solve this problem. The sliding window divides a long text into several overlapping segments (for example, each segment has 256 words and the overlap length is 128 words), and each segment is then input into the BERT model as an independent text for training. Finally, the results obtained for the independent texts are integrated.
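The sliding-window segmentation can be sketched in plain Python as follows. The function names are hypothetical, the input is assumed to be an already tokenized sequence, and averaging the per-segment results is only one possible way to integrate them; the text does not specify the integration rule.

```python
def sliding_windows(tokens, window=256, overlap=128):
    """Split a token sequence into overlapping segments of at most `window` tokens."""
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("require window > 0 and 0 <= overlap < window")
    stride = window - overlap
    if len(tokens) <= window:
        return [list(tokens)]
    segments = []
    start = 0
    while True:
        segments.append(list(tokens[start:start + window]))
        if start + window >= len(tokens):
            break  # the last segment already reaches the end of the text
        start += stride
    return segments


def aggregate_scores(segment_scores):
    """Integrate per-segment scores by averaging (an assumption of this sketch)."""
    return sum(segment_scores) / len(segment_scores)
```

With the example values from the text (256-word segments, 128-word overlap), consecutive segments share half of their tokens, so content cut at one segment boundary appears intact in the neighboring segment.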

Additional features may also include follow-up cadence features, product information, user behavior features, customer attribute features, and so forth.

The fourth step: prepare a natural language processing pre-trained model. The models that can be used here include, but are not limited to, XLNet and BERT.

The fifth step: train the clue scoring model. The pre-trained model (XLNet or BERT) is fine-tuned with the prepared training data to train a binary classification model.

In this step, the data processed in the third step are randomly split by clue into a training set and a test set; preferably, 80% of the clues are used as the training set and the other 20% as the test set. The training set is then used to fine-tune the binary classification model built on the pre-trained model. Finally, the optimal model is selected according to model performance on the test set and saved.
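A minimal sketch of the random 80/20 split by clue is given below. The function name and the fixed seed are illustrative assumptions. Splitting on clue identifiers rather than on individual text segments keeps all segments of one clue on the same side of the split.

```python
import random


def split_by_clue(clue_ids, train_fraction=0.8, seed=0):
    """Randomly split unique clue IDs into (train_ids, test_ids) sets."""
    unique_ids = sorted(set(clue_ids))  # sort first so the shuffle is reproducible
    rng = random.Random(seed)
    rng.shuffle(unique_ids)
    cut = round(len(unique_ids) * train_fraction)
    return set(unique_ids[:cut]), set(unique_ids[cut:])
```

Samples (for example, sliding-window segments) are then assigned to the training or test set according to the set that contains their clue identifier.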

The sixth step: generate the clue scoring result. For each clue to be scored, the relevant features of the clue are input into the optimal model obtained in the fifth step, and the value output by the model after the Softmax layer can be regarded as the clue's order-intent score.
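How a Softmax output becomes a score can be shown with a small, framework-free Python sketch. It assumes the binary classifier emits two logits and that index 1 corresponds to the positive (order placed) class; both are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def clue_score(logits, positive_index=1):
    """Probability of the positive class, used as the clue's intent score."""
    return softmax(logits)[positive_index]
```

The score lies between 0 and 1, and for two classes it equals the logistic sigmoid of the difference between the positive and negative logits.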

In practical applications, all clues in a clue pool should be sorted by intent score from high to low; the clues at the head of the ranking are the high-intent clues that sales or marketing staff should follow up.
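The ranking of a clue pool can be sketched as follows. The mapping from clue identifier to score and the optional `top_k` cutoff are hypothetical conveniences, not part of the described method.

```python
def rank_clues(scores, top_k=None):
    """Sort (clue_id, score) pairs from highest to lowest intent score."""
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]
```

A salesperson would then work through the returned list from the top.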

The above description covers only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A clue scoring method based on communication data is characterized by comprising the following steps:

step 1: collecting communication data;

step 2: converting the communication data into interactive text;

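step 3: performing data preprocessing on the interactive text and the order data to obtain a plurality of clues and labels corresponding to the clues, wherein the labels comprise positive example labels and negative example labels;

step 4: preparing a natural language processing pre-trained model;

step 5: training and testing the pre-trained model using the clues;

step 6: generating a clue scoring result.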

2. The clue scoring method based on communication data according to claim 1, wherein step 3 further comprises the following steps:

3. The clue scoring method based on communication data according to claim 2, wherein step 5 further comprises the following steps: