CN112329843B

CN112329843B - Call data processing method, device, equipment and storage medium based on decision tree

Info

Publication number: CN112329843B
Application number: CN202011211367.XA
Authority: CN
Inventors: 李海翔
Original assignee: Ping An Life Insurance Company of China Ltd
Current assignee: Ping An Life Insurance Company of China Ltd
Priority date: 2020-11-03
Filing date: 2020-11-03
Publication date: 2024-06-11
Anticipated expiration: 2040-11-03
Also published as: CN112329843A

Abstract

The invention discloses a call data processing method, device, equipment and storage medium based on a decision tree. The method comprises the following steps: acquiring historical call data, wherein the historical call data comprises a call result identifier and historical feature values corresponding to K original features; performing controllability analysis on historical feature values corresponding to the K original features, and determining L controllable features from the K original features; carrying out relevance analysis on historical feature values corresponding to the L controllable features by adopting a call result identifier, and determining N target features from the L controllable features; forming training call data based on the call result identification and historical feature values corresponding to the N target features; and processing the training call data by adopting a decision tree model, obtaining a target decision tree, and obtaining node attribute information corresponding to each node in the target decision tree. The method can carry out call policy adjustment according to the node attribute information in the target decision tree, so that the call policy adjustment has flexibility and improves efficiency and accuracy.

Description

Call data processing method, device, equipment and storage medium based on decision tree

Technical Field

The present invention relates to the field of data processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for processing call data based on a decision tree.

Background

A voice call system uses artificial intelligence technology to place voice calls, with a robot calling clients in place of a human agent. Generally, when a voice call system schedules a call task, it needs to perform feature analysis on historical call data to determine when the call completion rate is highest, so as to formulate a reasonable call policy. In the prior art, a statistical analysis method (such as pivot table analysis) is generally adopted: the binning threshold of a feature is adjusted manually, statistics are computed, and the statistical results are visualized to determine golden section points, which are then used to adjust the call policy. A golden section point is the feature binning threshold with the highest call completion rate. The statistical analysis method suffers from low flexibility, long processing time, and low efficiency, and the golden section points it determines have low accuracy, because the search easily settles on a locally optimal solution within a limited range rather than the globally optimal one.

Disclosure of Invention

The embodiment of the invention provides a call data processing method, a call data processing device, computer equipment and a storage medium based on a decision tree, which are used for solving the problems of low flexibility, long time consumption, low efficiency and low accuracy in determining a call strategy by adopting a statistical analysis method.

A decision tree based call data processing method, comprising:

Acquiring historical call data, wherein the historical call data comprises a call result identifier and historical feature values corresponding to K original features, and the original features comprise call opportunity features, wherein K is larger than or equal to 2;

Performing controllability analysis on historical feature values corresponding to the K original features, and determining L controllable features from the K original features, wherein 2 ≤ L ≤ K;

Carrying out relevance analysis on historical feature values corresponding to the L controllable features by adopting the call result identifier, and determining N target features from the L controllable features, wherein the target features comprise the call opportunity features, and 2 ≤ N ≤ L;

forming training call data based on the call result identification and the historical feature values corresponding to the N target features;

And processing the training call data by adopting a decision tree model, obtaining a target decision tree, and obtaining node attribute information corresponding to each node in the target decision tree, wherein the node attribute information comprises node category information, node entropy value and node sample information.

A decision tree based call data processing apparatus comprising:

A historical call data acquisition module, used for acquiring historical call data, wherein the historical call data comprises a call result identifier and historical feature values corresponding to K original features, and the original features comprise call opportunity features, wherein K is larger than or equal to 2;

A controllable feature determining module, used for performing controllability analysis on historical feature values corresponding to the K original features and determining L controllable features from the K original features, wherein 2 ≤ L ≤ K;

A target feature determining module, used for carrying out relevance analysis on historical feature values corresponding to the L controllable features by adopting the call result identifier, and determining N target features from the L controllable features, wherein the target features comprise the call opportunity features, and 2 ≤ N ≤ L;

The training call data acquisition module is used for forming training call data based on the call result identification and the historical characteristic values corresponding to the N target characteristics;

the target decision tree acquisition module is used for processing the training call data by adopting a decision tree model, acquiring a target decision tree and acquiring node attribute information corresponding to each node in the target decision tree, wherein the node attribute information comprises node category information, node entropy value and node sample information.

A decision tree based call data processing method, comprising:

Acquiring at least one piece of to-be-called data corresponding to a to-be-called client, wherein the to-be-called data comprises to-be-called feature values corresponding to N target features, the target features comprise call opportunity features, and the to-be-called feature values corresponding to the call opportunity features are configured call time periods;

inputting the data to be called into the target decision tree, determining a target node to which the data to be called belongs on the target decision tree, and acquiring node attribute information of the target node;

Determining a time period priority corresponding to at least one configuration calling time period corresponding to the client to be called based on node attribute information of at least one target node corresponding to the same client to be called;

And carrying out call policy adjustment on all the clients to be called based on the time period priority corresponding to at least one configuration call time period corresponding to all the clients to be called, and determining the target call time period corresponding to the clients to be called.

A decision tree based call data processing apparatus comprising:

A to-be-called data acquisition module, used for acquiring at least one piece of to-be-called data corresponding to a to-be-called client, wherein the to-be-called data comprises to-be-called feature values corresponding to N target features, the target features comprise call opportunity features, and the to-be-called feature values corresponding to the call opportunity features are configured call time periods;

The target node determining module is used for inputting the data to be called into the target decision tree obtained in the embodiment, determining a target node to which the data to be called belongs on the target decision tree, and obtaining node attribute information of the target node;

The time period priority determining module is used for determining the time period priority corresponding to at least one configuration calling time period corresponding to the same client to be called based on node attribute information of at least one target node corresponding to the same client to be called;

and the target calling period determining module is used for carrying out calling strategy adjustment on all the clients to be called based on the period priority corresponding to at least one configuration calling period corresponding to all the clients to be called, and determining the target calling period corresponding to the clients to be called.

A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the above-mentioned decision tree based call data processing method when executing the computer program.

A computer readable storage medium storing a computer program which when executed by a processor implements the decision tree based call data processing method described above.

According to the call data processing method, device, computer equipment and storage medium based on the decision tree, controllability analysis is performed on the historical feature values corresponding to the original features to determine the controllable features, which eliminates interference from uncontrollable features and helps ensure the processing efficiency of the target decision tree generated later. Relevance analysis is performed on the historical feature values corresponding to all the controllable features and the call result identifiers to obtain target features with strong relevance to the call result identifiers, which helps ensure the processing efficiency and result relevance of the target decision tree generated subsequently, and thus the accuracy of subsequent call policy adjustment. Training call data are formed based on the N target features and processed to form the target decision tree and the node attribute information corresponding to each node; the optimal node category information can be determined quickly from the node entropy values in the node attribute information, and call policy adjustment is then performed. The call policy adjustment is thereby flexible, the time spent formulating the call policy is shortened, and the efficiency and accuracy of call policy adjustment are improved.

According to the method, the device, the computer equipment and the storage medium for processing the call data based on the decision tree, at least one piece of data to be called corresponding to the clients to be called is input into the target decision tree to determine the target node corresponding to each piece of data to be called, so that the time period priority corresponding to the configuration call time period in the data to be called is determined, the call policy adjustment is carried out on all the clients to be called according to the time period priority, the rapid and accurate call policy adjustment is carried out by using the target decision tree, the target call time period corresponding to the clients to be called is determined, the success rate of calling the clients to be called is guaranteed, the call policy adjustment has flexibility, the time consumption for making the call policy is shortened, and the call policy adjustment processing efficiency and accuracy are improved.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments of the present invention will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic diagram of an application environment of a call data processing method based on decision tree according to an embodiment of the present invention;

FIG. 2 is a flow chart of a call data processing method based on decision tree in an embodiment of the invention;

FIG. 3 is another flow chart of a decision tree based call data processing method in an embodiment of the invention;

FIG. 4 is another flow chart of a decision tree based call data processing method in an embodiment of the invention;

FIG. 5 is another flow chart of a decision tree based call data processing method in an embodiment of the invention;

FIG. 6 is another flow chart of a decision tree based call data processing method in an embodiment of the invention;

FIG. 7 is a diagram of a call data processing apparatus based on decision tree according to an embodiment of the present invention;

FIG. 8 is another schematic diagram of a decision tree based call data processing apparatus in accordance with an embodiment of the present invention;

FIG. 9 is a schematic diagram of a computer device in accordance with an embodiment of the invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The call data processing method based on the decision tree provided by the embodiment of the invention can be applied to the application environment shown in FIG. 1. Specifically, the method is applied to a voice call system which, as shown in FIG. 1, comprises a client and a server that communicate through a network. The method analyzes and visualizes historical call data based on a decision tree model and obtains a target decision tree, so that the call policy can be adjusted quickly and accurately based on the target decision tree. The client, also called the user side, is a program that corresponds to the server and provides local services to the user. The client may be installed on, but is not limited to, various personal computers, notebook computers, smartphones, tablet computers, and portable wearable devices. The server may be implemented as a stand-alone server or as a server cluster composed of a plurality of servers.

In one embodiment, as shown in fig. 2, a call data processing method based on decision tree is provided, and the method is applied to the server in fig. 1 for illustration, and includes the following steps:

S201: historical call data are acquired, wherein the historical call data comprise call result identifiers and historical feature values corresponding to K original features, and the original features comprise call opportunity features, wherein K is larger than or equal to 2.

Historical call data is information related to historical calls, i.e., data formed by a voice call system calling any customer prior to the current time of the system. Each historical call data is associated with a data identification.

The call result identifier is an identifier corresponding to a call result and reflects whether or not a historical call was put through. The call result identifier comprises a call success identifier and a call failure identifier. The call success identifier is generated when a call to a client through the voice call system succeeds; accordingly, the call failure identifier is generated when a call to a client through the voice call system fails.

The original feature is a feature for classifying the historical call data, and can be understood as shorthand for the original feature name. K is the number of original features, where K ≥ 2. The historical feature value corresponding to an original feature is the specific value of that original feature in the historical call data.

In the process of adjusting the call policy, the conditions of call opportunity, call resource scheduling and the like are mainly considered, so that the historical call data should include information related to the call opportunity, that is, the original feature at least includes a call opportunity feature, and the historical feature value corresponding to the call opportunity feature is a call timestamp. The call time stamp is a time stamp corresponding to a history call, specifically, a time stamp when the call is sent to a client through a voice call system, and is automatically recorded by the voice call system.

In this example, the original features may include at least one of a user profile feature and a call purpose feature in addition to the call opportunity feature. User profile features are features associated with the user profile of the called client, including, but not limited to, age, gender, education, occupation, and income. A call purpose feature is a feature related to the purpose of a historical call. For example, if the historical call is for policy collection, the call purpose relates to a policy, and the related features include, but are not limited to, policy premium, policy risk type, policy self-service type, and payment period.

As an example, the server may obtain, from a system database corresponding to the voice call system, historical call data formed by all historical calls before the current time of the system, so as to analyze and process all the historical call data based on the decision tree model, and utilize self-learning and correction capabilities of the decision tree model to perform continuous iterative training, so that the analysis process of the historical call data is less time-consuming, fast in efficiency and accurate in result. Generally, the number of historical call data acquired by the server needs to be greater than a preset number threshold to ensure the effectiveness of the target decision tree obtained by training on the call policy adjustment. The preset number threshold is a preset threshold for evaluating whether the number of historical call data reaches the number required to train the decision tree.

S202: performing controllability analysis on historical feature values corresponding to the K original features, and determining L controllable features from the K original features, wherein 2 ≤ L ≤ K.

The controllability analysis is a process for analyzing whether the original features corresponding to all historical call data are controllable, so that controllable features are retained and uncontrollable features are removed. A controllable feature is an original feature that is controllable; accordingly, an uncontrollable feature is an original feature that is not controllable. L is the number of controllable features, where 2 ≤ L ≤ K.

Controllability in this example means that, across all the historical call data, the historical feature values corresponding to the same original feature are controllable; that is, their integrity meets the standard and they do not change frequently. For example, suppose the server acquires 10000 pieces of historical call data. If the number of non-null historical feature values corresponding to original feature A in the 10000 pieces of historical call data is greater than a first preset number, the integrity of original feature A across all the historical call data is determined to meet the standard; if the number of changes in the historical feature values corresponding to original feature A within a preset evaluation period is smaller than a second preset number, it is determined that no frequent change exists. Original feature A is therefore considered controllable and is a controllable feature. As another example, if the number of non-null historical feature values corresponding to original feature B in the 10000 pieces of historical call data is not greater than the first preset number, the integrity of original feature B is determined not to meet the standard. In other words, too many historical feature values of original feature B are null, so that if original feature B were used as a target feature for decision tree training, classification could not proceed smoothly or the classification result would be inaccurate; original feature B is therefore determined to be uncontrollable.
For another example, if the number of changes in the historical feature values corresponding to original feature C within the preset evaluation period is not smaller than the second preset number, that is, within the preset evaluation period the historical feature value corresponding to original feature C changes in at least the second preset number of pieces of historical call data, it is determined that the historical feature values corresponding to original feature C change frequently; original feature C is therefore not controllable and is an uncontrollable feature.

As an example, the server adopts a controllability analysis logic to analyze the historical feature values corresponding to the K original features in all the historical call data, and obtains a controllability analysis result corresponding to each original feature; if the controllability analysis result corresponding to the original features is that the original features are controllable, determining the original features as controllable features; if the controllability analysis result corresponding to the original features is that the original features are not controllable, the original features are determined to be uncontrollable features, so that the purpose of determining L controllable features from K original features is achieved, interference of the uncontrollable features is eliminated, and the reliability and the accuracy of a target decision tree obtained through training are guaranteed. The controllability analysis logic is preset processing logic for performing controllability analysis.
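The controllability check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the logic actually used by the server: the record layout (a time-ordered list of dicts), the `customer_id` key, the helper name `controllable_features`, the two threshold parameters, and the way a "change" is counted (a client's value differing from that client's previous non-null value) are all hypothetical.

```python
def controllable_features(records, features, min_non_null, max_changes):
    """Return the features that are complete enough and rarely change.

    records:      list of dicts, assumed ordered by call time, each holding a
                  hypothetical "customer_id" key plus one key per original feature.
    min_non_null: stand-in for the "first preset number".
    max_changes:  stand-in for the "second preset number".
    """
    kept = []
    for feat in features:
        non_null = 0
        changes = 0
        last_seen = {}  # customer_id -> last non-null value of this feature
        for rec in records:
            value = rec.get(feat)
            if value is None:
                continue
            non_null += 1
            cid = rec["customer_id"]
            if cid in last_seen and last_seen[cid] != value:
                changes += 1
            last_seen[cid] = value
        # Integrity: the non-null count must exceed the first preset number.
        # Stability: the change count must stay below the second preset number.
        if non_null > min_non_null and changes < max_changes:
            kept.append(feat)
    return kept


# Toy data: "income" is mostly missing, "address" changes for every client.
records = [
    {"customer_id": 1, "age": 30, "income": None, "address": "A"},
    {"customer_id": 1, "age": 30, "income": None, "address": "B"},
    {"customer_id": 2, "age": 45, "income": 5000, "address": "C"},
    {"customer_id": 2, "age": 45, "income": None, "address": "D"},
]
result = controllable_features(
    records, ["age", "income", "address"], min_non_null=2, max_changes=2
)
print(result)
```

In this toy data only `age` passes both checks; `income` fails the integrity check and `address` fails the stability check.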

S203: carrying out relevance analysis on historical feature values corresponding to the L controllable features by adopting the call result identifier, and determining N target features from the L controllable features, wherein the target features comprise call opportunity features, and 2 ≤ N ≤ L.

The relevance analysis is used for analyzing the relevance between the original features corresponding to all historical call data and the call results, so as to retain target features with strong relevance. A target feature can be understood as a controllable feature whose relevance meets the standard. N is the number of target features, where 2 ≤ N ≤ L.

Relevance in this example refers to the degree of correlation between the historical feature values corresponding to the same original feature and the call result across all the historical call data. For example, if a larger historical feature value of an original feature goes with a higher probability of call success, the feature is positively correlated with the call result, and the stronger this tendency, the stronger the positive correlation; correspondingly, if a larger historical feature value goes with a lower probability of call success, the feature is negatively correlated, and the stronger this tendency, the stronger the negative correlation.

As an example, the server adopts relevance analysis logic to perform relevance analysis on the historical feature values corresponding to the L controllable features and the call result identifiers in all the historical call data, and obtains the feature relevance corresponding to each controllable feature; the N controllable features with the highest feature relevance are then selected from the L controllable features and determined as the target features. The target features are the features subsequently used as classification conditions in decision tree training.
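The selection of the N most relevant controllable features can be illustrated with a short sketch. The text does not name a relevance measure, so the use of the absolute Pearson correlation here is an assumption, as are the function names, the column layout, and the `always_keep` parameter, which models the requirement that the call opportunity feature is always among the target features. Treating a call hour as a plain number is a simplification for illustration only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def select_target_features(columns, results, n, always_keep=()):
    """Pick n features, ranked by |correlation| with the call result.

    columns:     dict mapping feature name -> list of numeric feature values.
    results:     list of call results, 1 for success and 0 for failure.
    always_keep: features that must be selected regardless of their score.
    """
    scores = {f: abs(pearson(v, results)) for f, v in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    selected = list(always_keep)
    for feat in ranked:
        if len(selected) >= n:
            break
        if feat not in selected:
            selected.append(feat)
    return selected, scores


# Hypothetical controllable features for six historical calls.
columns = {
    "call_hour": [9, 10, 14, 15, 19, 20],
    "premium": [100, 200, 300, 400, 500, 600],
    "noise": [1, 0, 1, 0, 1, 0],
}
results = [0, 0, 0, 1, 1, 1]
targets, scores = select_target_features(
    columns, results, n=2, always_keep=("call_hour",)
)
print(targets)
```

Taking the absolute value means strongly negative correlations rank as highly as strongly positive ones, which matches the description of both kinds of correlation as "strong" relevance.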

S204: forming training call data based on the call result identifier and the historical feature values corresponding to the N target features.

The training call data is the data input into the decision tree model for model training.

In the example, the server forms the training call data from the call result identifier and the historical feature values corresponding to the N target features, so as to remove the uncontrollable features and the controllable features with low feature relevance, so that the data size of the training call data is smaller, the correlation with the call result is stronger, and the training efficiency and the training effectiveness of the subsequent decision tree model are guaranteed.

S205: processing the training call data by adopting a decision tree model, obtaining a target decision tree, and obtaining node attribute information corresponding to each node in the target decision tree, wherein the node attribute information comprises node category information, node entropy value and node sample information.

The decision tree model is a simple and easy-to-use non-parametric classifier that requires no prior assumptions about the data and is characterized by high calculation speed, easily interpreted results, and strong robustness. The decision tree model may employ, but is not limited to, models such as LightGBM, GBDT and XGBoost.

The node category information is information on the classification conditions by which all training call data are divided into different nodes, and comprises a target feature and the feature classification interval corresponding to that target feature. The target feature is the feature corresponding to the classification condition, i.e., the feature used to divide all the historical call data into different sets. The feature classification interval corresponding to the target feature is the interval corresponding to the classification condition, i.e., the interval used to divide the historical feature values of the target feature in the training call data into different categories. When the target feature is the call opportunity feature, the corresponding feature classification interval is a configured call time period, which is a time period preconfigured by the system for dividing call opportunities. There is at least one configured call time period, so that historical call data can be counted by configured call time period, which facilitates subsequent call policy adjustment using the configured call time periods.

For example, for a historical feature value $d$ corresponding to any target feature $D$, let the lower limit of the feature value be $D_{\min}$ and the upper limit be $D_{\max}$, with $M$ feature splitting points $D_1, D_2, \ldots, D_M$. The $M$ feature splitting points form $M+1$ feature classification intervals, namely $D_{\min}$ to $D_1$, $D_1$ to $D_2$, ..., $D_{M-1}$ to $D_M$, and $D_M$ to $D_{\max}$, so that all historical call data are classified based on the feature classification intervals to determine different feature sets. For example, if the number $M$ of feature splitting points is 1, 2 feature classification intervals are formed. Because the target features include at least the call opportunity feature, the callable interval can be divided into $M+1$ feature classification intervals by $M$ feature splitting points. Each piece of training call data includes a call timestamp corresponding to the call opportunity feature, and during decision tree training the training call data can be divided into the node corresponding to the matching feature classification interval based on that call timestamp.
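As a small illustration of how M splitting points yield M+1 feature classification intervals, the sketch below maps a feature value to its interval with a binary search. The helper names, the example callable window of 9:00 to 21:00, and the convention that a value equal to a splitting point falls into the upper interval are assumptions; the text does not say which side of an interval is closed.

```python
from bisect import bisect_right


def interval_index(value, split_points):
    """Index (0..M) of the feature classification interval containing value.

    split_points must be sorted in ascending order. A value equal to a
    splitting point is assigned to the interval above it (an assumption).
    """
    return bisect_right(split_points, value)


def interval_bounds(idx, split_points, d_min, d_max):
    """Lower and upper bound of interval idx, given the overall value range."""
    edges = [d_min, *split_points, d_max]
    return edges[idx], edges[idx + 1]


# Hypothetical call-hour feature: callable window 9 to 21, split at 12 and 18,
# giving M = 2 splitting points and M + 1 = 3 intervals.
splits = [12, 18]
idx = interval_index(14, splits)
bounds = interval_bounds(idx, splits, 9, 21)
print(idx, bounds)
```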

The node entropy value is the entropy of the node. Entropy is a statistical index of the purity of a node. Assume that the possible values of a random variable $X$ are $x_1, x_2, \ldots, x_n$, and that each possible value has probability $P(X = x_i) = p_i$ ($i = 1, 2, \ldots, n$). The entropy of the random variable $X$ is then

$$H(X) = -\sum_{i=1}^{n} p_i \log p_i$$

Entropy can be understood as the impurity of a node; here it reflects the impurity of all training call data in any node with respect to the call completion rate.

In this example, assume that the feature set $T$ corresponding to each node is a sample set containing $|T|$ pieces of training call data. If the feature set $T$ takes $n$ different values on the class attribute $S$, samples with the same value on $S$ are assigned to the same class, so that $T$ is divided into $n$ classes $\{S_1, S_2, \ldots, S_n\}$, where $s_i$ ($1 \le i \le n$) denotes the number of samples in class $S_i$. The node entropy value corresponding to the feature set $T$ is $H(T) = -\sum_{i=1}^{n} p_i \log p_i$, where $p_i = s_i / |T|$ is the proportion of class $S_i$ in the feature set $T$. In this example, the node entropy value $H(T)$ reflects the node purity of the feature set $T$ corresponding to the node: the greater $H(T)$, the more uncertainty the node contains and the lower the node purity. $H(T)$ reaches its maximum when the probabilities of all classes are equal, at which point the node is least pure. Correspondingly, the smaller $H(T)$, the less uncertainty the node contains and the higher the node purity. The higher the node purity, the better the feature classification interval determined by the feature splitting points of the target feature in that node discriminates the value of the call result identifier, that is, the better it separates the call success identifier from the call failure identifier, so the node entropy value $H(T)$ can be used for call policy adjustment. In this example, a node with a smaller node entropy value is a node in which call success and call failure are unbalanced, that is, a node whose call success rate is far greater than its call failure rate.
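A worked computation may make the behaviour of the node entropy concrete. The sketch below evaluates the entropy of a node from its call success and call failure counts. The base-2 logarithm is an assumption (the text does not state the base), and the function name is hypothetical.

```python
import math


def node_entropy(success_count, failure_count):
    """Entropy of a two-class node, using a base-2 logarithm (an assumption)."""
    total = success_count + failure_count
    if total == 0:
        return 0.0
    h = 0.0
    for count in (success_count, failure_count):
        if count:  # a class with zero samples contributes nothing
            p = count / total
            h -= p * math.log2(p)
    return h


balanced = node_entropy(50, 50)   # equal class probabilities: maximum entropy
skewed = node_entropy(90, 10)     # mostly successes: lower entropy
pure = node_entropy(100, 0)       # a single class: zero entropy
print(balanced, round(skewed, 3), pure)
```

With two classes and a base-2 logarithm the entropy ranges from 0 for a pure node to 1 for a perfectly balanced node, so the 90/10 node scores roughly 0.469.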

The node sample information is information related to the quantity of all training call data which are divided into feature sets corresponding to the same node, and comprises the total number of samples, the number of call success and the number of call failure, wherein the total number of samples is the sum of the number of call success and the number of call failure. The number of successful calls is the number of training sample data carrying successful call identification. The number of call failures is the number of training sample data carrying the identity of the call failure.

For example, the higher the node purity of a node, the more unbalanced the training call data carrying the call success identifier and the training call data carrying the call failure identifier are within that node; that is, the absolute value of the difference between the number of call successes and the number of call failures accounts for a larger share of the total number of samples. The feature splitting point with the highest call completing rate can therefore be determined quickly from the node entropy value: the node category information corresponding to the node with the smallest node entropy value is selected to determine the feature splitting point with the highest call completing rate, and the call policy is adjusted accordingly. The call completing rate may be understood as the ratio of the number of call successes to the total number of samples.

As an example, the server processes all the training call data by adopting the decision tree model, so as to classify all the training call data according to at least two feature classification intervals corresponding to the N target features, and form a target decision tree, where each node in the target decision tree corresponds to node attribute information, and is used to reflect information corresponding to the feature set corresponding to the node.

For example, in an application scenario of policy collection, the call opportunity feature and the policy premium feature may be used as two target features, and each target feature may be given a plurality of feature splitting points, so that all training call data are divided into nodes corresponding to different feature sets, each node corresponding to one piece of node category information. The training call data in the same node are then analysed statistically to determine the node entropy value H(T) and the node sample information of that node, so as to obtain the node attribute information corresponding to each node in the target decision tree. The node attribute information includes the node category information, the node entropy value and the node sample information, and is used for subsequent call policy adjustment. In this example, the diversity of the call policy is reflected in the policy sets with different policy properties, that is, the feature sets formed by the feature classification intervals corresponding to the N target features. A uniform distribution of call connection means that, in most policy sets, the numbers of policies with call success and with call failure are roughly equal. By comparing the node entropy values of all nodes, the node with the smallest node entropy value can be identified as the node in which call success and call failure are least balanced, namely the node whose call success rate is far greater than its call failure rate, so that the call policy can be adjusted using the node category information of that node.

According to the call data processing method based on the decision tree, controllability analysis is performed on the historical feature values corresponding to the original features to determine the controllable features, which eliminates interference from uncontrollable features and helps guarantee the processing efficiency of the target decision tree generated later. Relevance analysis is performed on the historical feature values corresponding to all the controllable features using the call result identifier, to obtain target features that are strongly related to the call result identifier; this helps ensure the processing efficiency and result relevance of the target decision tree generated subsequently, and therefore the accuracy of subsequent call policy adjustment. Training call data are formed based on the N target features and processed to form the target decision tree and the node attribute information corresponding to each node; the optimal node category information is quickly determined through the node entropy values in the node attribute information, and call policy adjustment is then performed, so that call policy adjustment is flexible, the time consumed in formulating the call policy is shortened, and the efficiency and accuracy of call policy adjustment are improved.

In one embodiment, after step S205, that is, after the training call data is processed by using the decision tree model to obtain the target decision tree and the node attribute information corresponding to each node in the target decision tree, the call data processing method based on the decision tree further includes: performing visualization processing on the target decision tree by using a visualization tool to obtain a visualized decision tree, and displaying the node attribute information in the node display area corresponding to the visualized decision tree.

Wherein the visualization tool is a tool for implementing the visualization process.

As an example, the server may use Graphviz, a visualization tool, to perform visualization processing on the target decision tree to obtain a visualized decision tree, and display the node attribute information in the node display area corresponding to the visualized decision tree on the client, specifically the node category information, node entropy value and node sample information related to the call result identifier, so that call policy adjustment can be performed based on the visualized decision tree. For example, the node with the smallest node entropy value is identified as the node in which call success and call failure are least balanced, namely the node whose call success rate is far greater than its call failure rate, so that the call policy can be adjusted using the node category information of that node.
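One way to hand node attribute information to Graphviz is to emit DOT text and render it with the `dot` command-line tool. The sketch below assumes a hypothetical in-memory representation (a dictionary of nodes plus a list of parent-child edges); the field names and sample values are illustrative and are not taken from the patent.

```python
def to_dot(nodes, edges):
    """Render node attribute information as Graphviz DOT text.

    `nodes` maps a node id to a dict with the hypothetical keys
    'category', 'entropy', 'total', 'success' and 'failure'.
    `edges` is a list of (parent_id, child_id) pairs.
    Category text is assumed not to contain double quotes.
    """
    lines = ["digraph target_decision_tree {", "  node [shape=box];"]
    for node_id, info in nodes.items():
        # "\\n" emits a literal backslash-n, which DOT treats as a line break.
        label = (
            f"{info['category']}\\n"
            f"entropy = {info['entropy']:.3f}\\n"
            f"samples = {info['total']} "
            f"(success {info['success']}, failure {info['failure']})"
        )
        lines.append(f'  {node_id} [label="{label}"];')
    for parent, child in edges:
        lines.append(f"  {parent} -> {child};")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical three-node tree split on the call opportunity feature.
nodes = {
    "n0": {"category": "all calls", "entropy": 0.971,
           "total": 100, "success": 60, "failure": 40},
    "n1": {"category": "call time 09:00-12:00", "entropy": 0.469,
           "total": 50, "success": 45, "failure": 5},
    "n2": {"category": "call time 12:00-18:00", "entropy": 0.881,
           "total": 50, "success": 15, "failure": 35},
}
edges = [("n0", "n1"), ("n0", "n2")]
dot_text = to_dot(nodes, edges)
```

Saving `dot_text` to a file such as `tree.dot` and running `dot -Tpng tree.dot -o tree.png` produces an image in which each box shows the category, entropy and sample counts of one node.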

In the call data processing method based on the decision tree provided by this embodiment, after the target decision tree is determined by training with the decision tree model, the visualization tool Graphviz is used to visualize the splitting process of the target decision tree, so that a user can clearly locate the target node with the highest degree of distinction, namely the target node with the smallest node entropy value, and trace back from that target node to obtain its entire splitting process. The feature classification intervals corresponding to all target features are thus reflected intuitively, which makes the method highly interpretable and easy to express. In this example, the call policy can be adjusted according to node attribute information such as the node category information, node entropy value and node sample information of each node in the visualized decision tree, which ensures the flexibility of call policy adjustment, shortens the time consumed in formulating the call policy, and improves the efficiency and accuracy of call policy adjustment.

In an embodiment, as shown in Fig. 3, step S202, namely performing controllability analysis on the historical feature values corresponding to the K original features and determining L controllable features from the K original features, includes:

S301: Carrying out integrity statistics on the historical feature values corresponding to the same original feature in all the historical call data to obtain the feature integrity corresponding to the original feature.

The feature integrity corresponding to an original feature reflects the probability that the same original feature contains a non-null feature value across all the historical call data. A non-null feature value is a feature value that is not null.

As an example, the server obtains the number of history samples corresponding to all the historical call data, counts the non-empty feature values corresponding to the same original feature in all the historical call data, and determines the non-empty quantity corresponding to the original feature; the quotient of the non-empty quantity corresponding to the original feature and the number of history samples is determined as the feature integrity corresponding to the original feature. This avoids training the target decision tree on historical call data with low feature integrity, which would cause over-fitting and insufficient generalization capability, and thereby ensures the accuracy of the trained target decision tree. For example, if the number of history samples of the historical call data obtained by the server is 10000 and the non-empty quantity corresponding to original feature A is 9000, the feature integrity of original feature A is 90%, which indicates that 90% of the historical call data include a non-null feature value for original feature A.
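The integrity statistic can be sketched as below. The record layout (a list of dictionaries) and the rule that `None` and the empty string count as empty are assumptions made for illustration.

```python
def feature_integrity(records, feature):
    """Share of records whose value for `feature` is non-empty.

    Treating None and "" as empty is an assumption; a real system
    would use whatever its data source defines as a null value.
    """
    if not records:
        return 0.0
    non_empty = sum(
        1 for record in records if record.get(feature) not in (None, "")
    )
    return non_empty / len(records)


# Hypothetical history: 9000 of 10000 records carry a value for feature "A".
history = [{"A": 1}] * 9000 + [{"A": None}] * 1000
integrity_a = feature_integrity(history, "A")
```

For the 10000-sample example in the text, the function yields a feature integrity of 0.9, that is, 90%.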

S302: Counting the variation probability, within a preset evaluation period, of the historical feature values corresponding to the same original feature in all the historical call data to obtain the feature variation probability corresponding to the original feature.

The preset evaluation period is a period for evaluating whether the historical feature value corresponding to the original feature frequently fluctuates, and may be autonomously determined according to the actual situation, for example, may be set to 1 month. The feature fluctuation probability is a probability for reflecting the fluctuation of the historical feature value corresponding to the same original feature in all the historical call data.

As an example, the server obtains the number of history samples corresponding to all the historical call data, and counts the changes of the historical feature values corresponding to the same original feature in all the historical call data to obtain the variation quantity corresponding to each original feature, where the variation quantity corresponding to an original feature is the number of times its historical feature values changed within the preset evaluation period; the quotient of the variation quantity corresponding to the original feature and the number of history samples is determined as the feature variation probability corresponding to the original feature. For example, if the number of history samples of the historical call data acquired by the server is 10000, the preset evaluation period is 1 month, and the historical feature value corresponding to original feature A changed 100 times in the 1 month before the current system time, the feature variation probability corresponding to original feature A is 1%. It can be understood that whether the historical feature value corresponding to an original feature has changed within the preset evaluation period can be determined by querying, based on the data identifier corresponding to the historical call data, the history records related to that historical call data in the system database. This avoids subsequent decision tree training on frequently changing historical call data, and thereby helps ensure the accuracy of the trained target decision tree.
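A minimal sketch of the variation statistic follows. The change log format (pairs of feature name and change date) is hypothetical, and the 1-month evaluation period is approximated as 30 days, which is an assumption.

```python
from datetime import date, timedelta


def feature_variation_probability(change_log, feature, sample_count,
                                  today, period_days=30):
    """Changes of `feature` inside the evaluation period / sample count.

    `change_log` is a hypothetical list of (feature_name, changed_on)
    pairs; `period_days=30` approximates the 1-month period in the text.
    """
    if sample_count == 0:
        return 0.0
    start = today - timedelta(days=period_days)
    changes = sum(
        1 for name, changed_on in change_log
        if name == feature and start <= changed_on <= today
    )
    return changes / sample_count


today = date(2020, 11, 3)
change_log = (
    [("A", today - timedelta(days=d % 30)) for d in range(100)]  # in period
    + [("A", today - timedelta(days=90))] * 5                    # too old
    + [("B", today)]                                             # other feature
)
variation_a = feature_variation_probability(change_log, "A", 10000, today)
```

Only the 100 changes of feature A that fall inside the period are counted, giving 100 / 10000 = 1% as in the worked example.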

S303: If the feature integrity is greater than the integrity threshold and the feature variation probability is less than the variation probability threshold, determining the original feature as a controllable feature.

The integrity threshold is a threshold for evaluating whether the feature integrity meets the standard. The variation probability threshold is a threshold for evaluating whether the feature variation probability meets the standard.

As an example, if the feature integrity is greater than the integrity threshold and the feature variation probability is less than the variation probability threshold, the original feature is deemed to be controllable and the original feature is determined to be a controllable feature. Correspondingly, if the feature integrity is not greater than the integrity threshold, or the feature fluctuation probability is not less than the fluctuation probability threshold, the original feature is determined to have no controllability, and the original feature is determined to be an uncontrollable feature.
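The two-threshold rule can be written as a simple filter. The threshold values and the feature statistics below are hypothetical; the text leaves both thresholds to be set according to the actual situation.

```python
def select_controllable(stats, integrity_threshold, variation_threshold):
    """Keep features whose integrity is above and variation is below threshold.

    `stats` maps a feature name to (feature integrity, variation probability).
    """
    return [
        name
        for name, (integrity, variation) in stats.items()
        if integrity > integrity_threshold and variation < variation_threshold
    ]


# Hypothetical statistics for three original features.
stats = {
    "A": (0.90, 0.01),  # complete and stable: controllable
    "B": (0.60, 0.01),  # too many missing values: uncontrollable
    "C": (0.95, 0.20),  # changes too often: uncontrollable
}
controllable = select_controllable(stats, integrity_threshold=0.8,
                                   variation_threshold=0.05)
```

A feature that sits exactly on either threshold is rejected, since the conditions are strict inequalities.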

In the call data processing method based on the decision tree provided by this embodiment, only original features whose feature integrity is greater than the integrity threshold and whose feature variation probability is less than the variation probability threshold are determined as controllable features. This avoids over-fitting and insufficient generalization capability of the trained target decision tree caused by substandard feature integrity, and also prevents original features with an excessively high feature variation probability from being determined as controllable features, which would introduce uncertainty into the training call data used to train the target decision tree and affect the accuracy of both the trained target decision tree and the subsequent call policy adjustment.

As an example, as shown in fig. 4, performing relevance analysis on historical feature values corresponding to L controllable features by using call result identifiers, and determining N target features from the L controllable features includes:

S401: Processing the historical feature values corresponding to the same controllable feature in all the historical call data by using the call result identifier to acquire the information gain corresponding to the controllable feature.

As an example, step S401 specifically includes: (1) The server first uses the call result identifier to process the historical feature values corresponding to the same controllable feature in all the historical call data, and determines the empirical entropy corresponding to each controllable feature. For example, in the sample set D formed by all the historical call data, the feature classification intervals corresponding to each controllable feature can be divided into k categories {C_1, C_2, ..., C_k}, and the probability corresponding to the i-th category is $p_i = |C_i| / |D|$, where |C_i| is the number of samples of category C_i and |D| is the total number of samples of the sample set D; the empirical entropy corresponding to each controllable feature is $H(D) = -\sum_{i=1}^{k} p_i \log p_i$. (2) The server determines the information gain corresponding to each controllable feature according to the empirical entropy. The server classifies all the historical call data by using the feature classification intervals corresponding to each controllable feature, and determines the information gain corresponding to the controllable feature as the difference between the empirical entropy before and after the division, namely g(D, A) = H(D) - H(D|A), where g(D, A) is the information gain, H(D) is the empirical entropy before the division, and H(D|A) is the empirical entropy after the division. The information gain g(D, A) thus measures how well the controllable feature divides the sample set D formed by all the historical call data. It will be appreciated that the empirical entropy before division H(D) is constant for the sample set D formed from the historical call data, so the smaller the empirical entropy after division H(D|A), the larger the information gain g(D, A), the less uncertainty remains in the subsets partitioned using this controllable feature, and the more relevant the feature is to the call result.
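The information gain g(D, A) = H(D) - H(D|A) can be sketched as follows. The text does not spell out H(D|A); the sketch uses the standard conditional entropy, the sample-weighted average of the entropies of the subsets produced by feature A. The base-2 logarithm, the function names and the sample data are assumptions.

```python
import math
from collections import Counter, defaultdict


def empirical_entropy(labels):
    """H(D) over the call result identifiers in `labels` (base 2 assumed)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return sum(
        -(count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(feature_values, labels):
    """g(D, A) = H(D) - H(D|A), where H(D|A) is the weighted subset entropy."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    total = len(labels)
    conditional = sum(
        len(group) / total * empirical_entropy(group)
        for group in groups.values()
    )
    return empirical_entropy(labels) - conditional


# Hypothetical data: 1 = call success, 0 = call failure.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
call_period = ["am", "am", "am", "am", "pm", "pm", "pm", "pm"]
region = ["x", "y", "x", "y", "x", "y", "x", "y"]
gain_period = information_gain(call_period, labels)  # separates the outcomes
gain_region = information_gain(region, labels)       # carries no information
```

In this toy data the call period splits successes from failures perfectly, so its gain equals H(D), while the region feature leaves each subset evenly mixed and gains nothing.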

S402: Sorting the information gains corresponding to the L controllable features, and determining the N controllable features with the largest information gains as the target features.

As an example, the server orders the information gains corresponding to the L controllable features, determines the first N controllable features with larger information gains as target features, so as to determine the first N controllable features with larger relevance to the call result as target features, and exclude other controllable features with weaker relevance to the call result, thereby ensuring the efficiency and accuracy of the target decision tree obtained by subsequent training relative to the call policy adjustment.
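Selecting the target features then reduces to a sort. The feature names and gain values below are hypothetical.

```python
def top_n_features(gains, n):
    """Return the n feature names with the largest information gain."""
    ranked = sorted(gains.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:n]]


# Hypothetical information gains for L = 4 controllable features.
gains = {
    "call_opportunity": 0.42,
    "policy_premium": 0.31,
    "customer_age": 0.05,
    "region": 0.12,
}
target_features = top_n_features(gains, n=2)
```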

In one embodiment, as shown in Fig. 5, step S205, namely processing the training call data by using a decision tree model to obtain a target decision tree and the node attribute information corresponding to each node in the target decision tree, includes:

S501: Constructing, by using a decision tree model, an original decision tree based on the N target features and the feature classification intervals corresponding to each target feature.

As an example, all feature classification intervals corresponding to each of the N target features are determined according to the lower limit of the historical feature value, the feature splitting points and the upper limit of the feature value corresponding to that target feature; a decision tree model is then used to process all target features and their feature classification intervals to determine the original decision tree. For example, evaluation indexes such as the empirical entropy, the information gain ratio or the Gini coefficient corresponding to the feature set formed by the N target features can be calculated, and the dividing order of the target features is determined according to the values of these evaluation indexes, thereby determining the original decision tree.

S502: inputting all training call data into an original decision tree, and acquiring node attribute information corresponding to each node in the original decision tree, wherein the node attribute information comprises node category information, node entropy value and node sample information, and the node sample information comprises the total number of samples.

As an example, the server inputs all training call data into the original decision tree for classification to determine the total number of samples corresponding to each node in the original decision tree, which can be understood as the number of training call data satisfying the feature classification intervals of the target features for that node. In this example, the nodes in the original decision tree include a root node, leaf nodes and intermediate nodes located between the root node and the leaf nodes; all the training call data are input into the original decision tree to determine the leaf node corresponding to each piece of training call data, and the total number of samples corresponding to each leaf node is counted.

S503: Determining nodes whose total number of samples is smaller than a preset sample threshold as nodes to be pruned, and pruning the original decision tree by using the nodes to be pruned to obtain the target decision tree.

The preset sample threshold is a preset number threshold used for evaluating whether the total number of samples reaches a retention standard.

As an example, the server compares the total number of samples of each node with the preset sample threshold, and determines nodes whose total number of samples is smaller than the preset sample threshold as nodes to be pruned, namely nodes to be deleted. The original decision tree is pruned by using the nodes to be pruned, so that the total number of samples of each node in the pruned target decision tree is not smaller than the preset sample threshold. This avoids the over-fitting and insufficient generalization capability caused by some nodes having too few samples, which would otherwise affect the accuracy of subsequent call policy adjustment based on the target decision tree.
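A possible pruning pass is sketched below. It assumes a hypothetical nested-dictionary tree in which each node stores its total number of samples under `total` and its children under `children`; a subtree is dropped whenever its root falls below the preset sample threshold.

```python
def prune(node, min_samples):
    """Return a copy of `node` without subtrees that have too few samples.

    A child whose 'total' is below `min_samples` is removed together with
    its descendants, which can only hold the same number of samples or fewer.
    """
    kept = [
        prune(child, min_samples)
        for child in node.get("children", [])
        if child["total"] >= min_samples
    ]
    return {**node, "children": kept}


# Hypothetical original decision tree.
original_tree = {
    "name": "root", "total": 100,
    "children": [
        {"name": "morning", "total": 92, "children": [
            {"name": "morning_low_premium", "total": 88, "children": []},
            {"name": "morning_high_premium", "total": 4, "children": []},
        ]},
        {"name": "afternoon", "total": 8, "children": []},
    ],
}
target_tree = prune(original_tree, min_samples=10)
```

The function builds a new tree rather than modifying the original, so the original decision tree remains available for comparison.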

The embodiment of the invention also provides a call data processing method based on the decision tree, which is used for processing the data to be called by utilizing the target decision tree trained by the embodiment to realize the adjustment of the call strategy, and the call data processing method based on the decision tree comprises the following steps:

S601: at least one piece of to-be-called data corresponding to a to-be-called client is obtained, the to-be-called data comprises to-be-called feature values corresponding to N target features, the target features comprise calling time features, and the to-be-called feature values corresponding to the calling time features are used for configuring a calling time period.

The data to be called refers to the data for which a call is about to be made, namely the data on which call policy adjustment is to be performed in order to determine a call period.

The target features are features corresponding to classification conditions adopted by the pre-constructed target decision tree. The target features include a call opportunity feature and at least one of a user portrayal feature and a call destination feature that are controllable and have a strong correlation with call results.

The feature value to be called refers to the specific value corresponding to a target feature in the data to be called. When the target feature is the calling time feature, the feature classification intervals corresponding to the calling time feature that were used in training the target decision tree can be adopted as the default feature values to be called; each such feature classification interval is a configured calling time period. When the target feature is at least one of the user portrait feature and the call destination feature, its specific value is determined as the feature value to be called. In general, in the data to be called corresponding to the same customer to be called, the feature value to be called corresponding to the calling time feature adopts the default configured calling time periods, of which there is at least one, so there is at least one feature value to be called corresponding to the calling time feature; the user portrait feature and the call destination feature each have exactly one feature value to be called. Therefore, the same customer to be called corresponds to at least one piece of data to be called, and these pieces of data differ in the configured calling time period corresponding to the calling time feature.

S602: Inputting the data to be called into the target decision tree, determining the target node to which the data to be called belongs on the target decision tree, and acquiring the node attribute information of the target node.

The target node is the leaf node reached when the data to be called is input into the target decision tree for classification, namely the node whose node category information matches the feature values to be called corresponding to the N target features.

As an example, the server inputs each piece of data to be called into the target decision tree and, following the node traversal sequence of root node, intermediate nodes and leaf nodes, matches the feature values to be called corresponding to the N target features against the node category information corresponding to each node; that is, it compares each feature value to be called with the feature classification interval of the same target feature in the node category information of the node. If all the feature values to be called fall within the corresponding feature classification intervals, the node is a node to be selected. The last node to be selected in the node traversal sequence is determined as the target node to which the data to be called belongs, the node attribute information of the target node is acquired, and the node traversal path is determined based on the target node and all traversed nodes to be selected.
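The traversal can be sketched as a walk from the root that keeps descending while some child still matches. The tree layout, the feature names `call_hour` and `premium`, and the half-open interval convention [low, high) are all assumptions made for illustration.

```python
def matches(node, sample):
    """True if every constrained feature value lies in its interval [low, high)."""
    return all(
        low <= sample[feature] < high
        for feature, (low, high) in node["intervals"].items()
    )


def find_target_node(root, sample):
    """Return (target node, traversal path) for one piece of data to be called."""
    if not matches(root, sample):
        return None, []
    path = [root]
    current = root
    while True:
        # Descend into the first child whose intervals contain the sample.
        child = next(
            (c for c in current.get("children", []) if matches(c, sample)),
            None,
        )
        if child is None:
            break
        path.append(child)
        current = child
    return current, path


# Hypothetical target decision tree; the root places no constraint.
tree = {
    "name": "root", "intervals": {},
    "children": [
        {"name": "morning", "intervals": {"call_hour": (9, 12)}, "children": [
            {"name": "morning_low_premium",
             "intervals": {"call_hour": (9, 12), "premium": (0, 5000)}},
            {"name": "morning_high_premium",
             "intervals": {"call_hour": (9, 12), "premium": (5000, float("inf"))}},
        ]},
        {"name": "afternoon", "intervals": {"call_hour": (12, 18)}},
    ],
}
target, path = find_target_node(tree, {"call_hour": 10, "premium": 3000})
```

The last matching node becomes the target node, and `path` records every node to be selected that was visited on the way.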

S603: Determining the time period priority corresponding to at least one configured calling time period corresponding to the customer to be called, based on the node attribute information of at least one target node corresponding to the same customer to be called.

Since the feature value to be called corresponding to the calling time feature is at least one configured calling time period, one piece of to-be-called data is formed based on each configured calling time period, the number of to-be-called data corresponding to the same to-be-called client is at least one, each piece of to-be-called data determines a target node, and the target node corresponds to the configured calling time period.

In this embodiment, the server may determine the target nodes based on the at least one piece of data to be called corresponding to the same customer to be called, where each target node is associated with the configured calling time period in the corresponding data to be called. The server then obtains node attribute information, such as the node entropy value and node sample information, of the at least one target node corresponding to the same customer to be called, and determines the time period priority corresponding to the at least one configured calling time period of that customer. The time period priority can be related to the call success probability of the configured calling time period in historical calls, so that call policy adjustment for the customer to be called is performed based on the time period priority and the success rate of calling that customer is improved.

In one embodiment, step S603, namely determining, based on node attribute information of at least one target node corresponding to the same to-be-called client, a time period priority corresponding to at least one configured call time period corresponding to the to-be-called client, specifically includes:

S6031: Acquiring the node entropy value and the node sample information of at least one target node corresponding to the same customer to be called.

The node entropy value and the node sample information are specific contents of node attribute information corresponding to a target node in a target decision tree.

S6032: Acquiring the call priority corresponding to the at least one target node based on the node entropy value and the node sample information of the at least one target node.

Since the node sample information of the target node includes the total number of samples, the number of call successes and the number of call failures, if the number of call successes is greater than the number of call failures, it is indicated that the probability of call successes is greater than the probability of call failures in the corresponding configured call period; otherwise, if the number of successful calls is not greater than the number of failed calls, it indicates that the probability of successful calls in the corresponding configured call period is not greater than the probability of failed calls. The node entropy value of the target node reflects whether the call success and call failure distribution is balanced or not, and the smaller the node entropy value is, the more unbalanced the call success and the call failure are, so that the server can determine the call priority corresponding to at least one target node based on node attribute information such as the node entropy value corresponding to at least one target node and node sample information, the call priority of the at least one target node can be ordered from high to low according to the call success probability, and the time period priority is determined based on the call priority of the at least one target node.

As an example, step S6032 specifically includes: (1) Dividing the target nodes with the number of successful calls being greater than the number of failed calls into a first node set, and dividing the target nodes with the number of successful calls not greater than the number of failed calls into a second node set. (2) Forming a first priority sequence according to the order of node entropy values from small to large of all target nodes in the first node set; and forming a second priority sequence according to the order of the node entropy values from large to small of all target nodes in the second node set. (3) And acquiring the call priority corresponding to at least one target node based on the first priority sequence and the second priority sequence.

For example, for the target nodes whose number of call successes is greater than the number of call failures, the first priority sequence is formed in ascending order of node entropy value, so that the target node with the greater probability of call success and the least balance between call success and call failure is placed first in the first priority sequence, and the positions of the other target nodes in the first priority sequence are determined by their degree of imbalance between call success and call failure (that is, by node entropy value). Correspondingly, for the target nodes whose number of call successes is not greater than the number of call failures, the second priority sequence is formed in descending order of node entropy value, so that the target node with the greater probability of call failure and the least balance between call success and call failure is placed last in the second priority sequence, and the positions of the other target nodes in the second priority sequence are likewise determined by their degree of imbalance (that is, by node entropy value). As a result, the call priorities of all target nodes are ordered from the largest call success probability to the smallest.
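The two-sequence ordering can be sketched as follows. Concatenating the first priority sequence ahead of the second is an interpretation of sub-step (3), chosen so that the result runs from the highest to the lowest call success probability; the node records and their values are hypothetical.

```python
def call_priority(target_nodes):
    """Order target nodes from highest to lowest call priority.

    Nodes with more successes than failures come first, sorted by ascending
    entropy; the remaining nodes follow, sorted by descending entropy.
    Placing the first sequence ahead of the second is an assumption.
    """
    first = [n for n in target_nodes if n["success"] > n["failure"]]
    second = [n for n in target_nodes if n["success"] <= n["failure"]]
    first.sort(key=lambda n: n["entropy"])                 # ascending
    second.sort(key=lambda n: n["entropy"], reverse=True)  # descending
    return first + second


# Hypothetical target nodes of one customer, one per configured period.
target_nodes = [
    {"node": "P1", "period": "T1", "success": 90, "failure": 10, "entropy": 0.469},
    {"node": "P2", "period": "T2", "success": 40, "failure": 60, "entropy": 0.971},
    {"node": "P3", "period": "T3", "success": 10, "failure": 90, "entropy": 0.469},
    {"node": "P4", "period": "T4", "success": 80, "failure": 20, "entropy": 0.722},
    {"node": "P5", "period": "T5", "success": 60, "failure": 40, "entropy": 0.971},
]
ranked_nodes = call_priority(target_nodes)
node_order = [n["node"] for n in ranked_nodes]
period_order = [n["period"] for n in ranked_nodes]
```

Because each target node is tied to one configured calling time period, reading the periods in the same order gives the time period priority for the customer.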

S6033: Determining the time period priority corresponding to at least one configured calling time period corresponding to the customer to be called, based on the call priority corresponding to the at least one target node.

In this example, since at least one to-be-called data corresponding to the same to-be-called client respectively determines one target node, each target node corresponds to a configured call period, that is, the same to-be-called client corresponds to at least one target node, a period priority corresponding to the configured call period corresponding to the at least one target node may be determined based on a call priority corresponding to the at least one target node.

For example, if at least one target node corresponding to the data to be called corresponding to the same customer to be called is P1, P2, P3, P4 and P5, the corresponding configured calling periods are T1, T2, T3, T4 and T5, respectively; if the call priority corresponding to at least one target node is P1> P4> P5> P2> P3; the priority of the time period corresponding to the at least one configuration calling time period corresponding to the to-be-called client is T1> T4> T5> T2> T3, namely, the to-be-called client is preferentially called in the configuration calling time period T1, so that the calling success rate is higher, the calling strategy configuration is realized by utilizing the priority of the time period corresponding to the at least one configuration calling time period corresponding to the to-be-called client, and the subsequent calling success rate is ensured.

S604: Based on the time period priority corresponding to at least one configured calling time period corresponding to all the customers to be called, carrying out call policy adjustment on all the customers to be called, and determining the target calling time period corresponding to each customer to be called.

As an example, the server may select the configured calling period with the highest period priority based on the period priority corresponding to at least one configured calling period corresponding to each to-be-called client, and determine the configured calling period as the target calling period corresponding to the to-be-called client; and by analogy, determining the target call time periods corresponding to all the clients to be called, and carrying out voice calls on all the clients to be called in the target call time periods by utilizing the voice call system, thereby being beneficial to improving the call success rate.
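The mapping of step S6033 and the unconstrained selection described above can be sketched as follows, using the P1 to P5 and T1 to T5 values from the earlier example. The function names are hypothetical and chosen only for illustration.

```python
def period_priority(node_ranking, node_to_period):
    """Order configured call periods by the call priority of their target nodes."""
    return [node_to_period[node] for node in node_ranking]


def target_period_without_limits(period_ranking):
    """With no resource limit, the target period is the highest-priority period."""
    return period_ranking[0]


# Values taken from the example: call priority P1 > P4 > P5 > P2 > P3.
node_ranking = ["P1", "P4", "P5", "P2", "P3"]
node_to_period = {"P1": "T1", "P2": "T2", "P3": "T3", "P4": "T4", "P5": "T5"}

ranking = period_priority(node_ranking, node_to_period)
print(" > ".join(ranking))                    # T1 > T4 > T5 > T2 > T3
print(target_period_without_limits(ranking))  # T1
```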

As another example, in a case where the system resources of the voice call system are considered to be limited, that is, in a case where the period available resources of any one of the configuration call periods are limited, step S604 specifically includes:

S6041: based on the time period priority corresponding to the at least one configured call time period of each to-be-called client, determining the to-be-selected call time period corresponding to each to-be-called client in order of time period priority.

For example, when the time period priority of the to-be-called client U1 is T1 > T4 > T5 > T2 > T3, T1 may first be determined as the to-be-selected call time period; if T1 cannot be determined as the target call time period, T4, T5, T2 and T3 are determined as the to-be-selected call time period in turn.

S6042: and acquiring time period available resources corresponding to the time period of the to-be-selected call, and acquiring to-be-allocated resources corresponding to all the clients to be called corresponding to the time period of the to-be-selected call.

The time period available resources are the system resources that the voice call system can use in each to-be-selected call time period, i.e. the maximum resources that can be called in that to-be-selected call time period. The resources to be allocated are the call resources that need to be allocated to the to-be-called clients of that time period; the more to-be-called clients there are, the larger the resources to be allocated.

S6043: if the available resources of the time period corresponding to the time period to be selected are larger than or equal to the resources to be allocated, determining the time period to be selected corresponding to the client to be called as the target time period to be called corresponding to the client to be called.

It can be understood that if the available time period resources are greater than or equal to the resources to be allocated, it is indicated that the available time period resources corresponding to the same time period to be called can allocate the corresponding call resources to all the clients to be called, so that the time period to be called, in which the available time period resources are greater than or equal to the resources to be allocated, can be directly determined as the target call time period corresponding to the clients to be called.

S6044: if the time period available resources corresponding to the to-be-selected call time period are smaller than the resources to be allocated, carrying out client priority analysis on all to-be-called clients corresponding to the same to-be-selected call time period, and obtaining the client priority corresponding to each to-be-called client; in order of client priority, determining the to-be-called clients that fit within the quantity of time period available resources as target call clients, and determining the to-be-selected call time period as the target call time period corresponding to the target call clients; and for the to-be-called clients that do not fit within the quantity of time period available resources, updating the to-be-selected call time period corresponding to each such client in order of time period priority.

For example, when the time period priority of the to-be-called client U1 is T1 > T4 > T5 > T2 > T3, T1 may first be determined as the to-be-selected call time period. Suppose the time period available resources corresponding to this to-be-selected call time period are 1000 call resources.

In one example, analysis of all to-be-called clients determines that the resources to be allocated for the to-be-selected call time period T1 are 900, that is, 900 to-be-called clients are to be called in T1. Since the time period available resources (1000) are greater than the resources to be allocated (900), all 900 to-be-called clients can be called in T1, and T1 is the target call time period corresponding to these 900 to-be-called clients.

In another example, analysis of all to-be-called clients determines that the resources to be allocated for the to-be-selected call time period T1 are 1100, that is, 1100 to-be-called clients are to be called in T1. Since the time period available resources (1000) are smaller than the resources to be allocated (1100), client priority analysis is performed on the 1100 to-be-called clients to obtain the client priority of each. In order of client priority, the 1000 to-be-called clients with the highest client priority are determined as the target call clients corresponding to T1, and T1 is determined as the target call time period corresponding to these target call clients. For the to-be-called clients that do not fit within the time period available resources, i.e. the 100 to-be-called clients with the lowest client priority, the to-be-selected call time period must be determined again. If the to-be-called client U1 is among these 100 clients, T4 is determined as its to-be-selected call time period according to the order of time period priority, and steps S6041-S6044 are executed again. In this way the call policy of all to-be-called clients is adjusted, which helps to guarantee the call success rate of the to-be-called clients in each to-be-selected call time period.
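Steps S6041 to S6044 can be sketched as the following loop. This is an illustrative sketch under stated assumptions, not the implementation of the embodiment: each to-be-called client is assumed to consume exactly one call resource, the client priority is assumed to be given as a rank (a smaller rank means a higher priority), and resources consumed in an earlier round are assumed to remain reserved in later rounds. The function and parameter names are hypothetical.

```python
def assign_target_periods(period_prefs, capacity, client_rank):
    """Assign each client a target call period under per-period capacity.

    period_prefs: client -> periods ordered by period priority (highest first)
    capacity:     period -> available call resources (assumed one per client)
    client_rank:  client -> client priority rank (smaller = higher priority)
    Returns client -> target period, or None if every period was exhausted.
    """
    remaining = dict(capacity)
    next_choice = {client: 0 for client in period_prefs}
    target = {}
    pending = set(period_prefs)

    while pending:
        # S6041: pick each pending client's current to-be-selected period.
        requests = {}
        for client in pending:
            prefs = period_prefs[client]
            if next_choice[client] >= len(prefs):
                target[client] = None  # no period left to try
                continue
            requests.setdefault(prefs[next_choice[client]], []).append(client)

        pending = set()
        for period, clients in requests.items():
            # S6042: compare available resources with resources to allocate.
            free = remaining.get(period, 0)
            # S6043 / S6044: accept in client-priority order up to capacity.
            clients.sort(key=lambda c: client_rank[c])
            accepted, rejected = clients[:free], clients[free:]
            for client in accepted:
                target[client] = period
            remaining[period] = free - len(accepted)
            # Rejected clients move on to their next-priority period.
            for client in rejected:
                next_choice[client] += 1
                pending.add(client)

    return target


prefs = {"U1": ["T1", "T4"], "U2": ["T1", "T4"], "U3": ["T1", "T4"]}
print(assign_target_periods(prefs, {"T1": 2, "T4": 5}, {"U1": 3, "U2": 1, "U3": 2}))
```

In this small run the period T1 has room for two clients, so the lowest-priority client U1 falls back to T4, mirroring the 1000-out-of-1100 example above on a smaller scale.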

In the decision tree-based call data processing method provided by this embodiment, at least one piece of to-be-called data corresponding to a to-be-called client is input into the target decision tree to determine the target node corresponding to each piece of to-be-called data, and the time period priority corresponding to the configured call time period in the to-be-called data is determined from that node. Call policy adjustment is then performed on all to-be-called clients according to the time period priority, so that the target decision tree is used to adjust the call policy quickly and accurately and to determine the target call time period corresponding to each to-be-called client. This helps to ensure the call success rate of the to-be-called clients, makes call policy adjustment flexible, shortens the time needed to formulate a call policy, and improves the efficiency and accuracy of call policy adjustment.

It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not limit the implementation of the embodiments of the present invention in any way.

In an embodiment, a decision tree-based call data processing device is provided, where the decision tree-based call data processing device corresponds to the decision tree-based call data processing method in the above embodiment one by one. As shown in fig. 7, the decision tree based call data processing apparatus includes a historical call data acquisition module 701, a controllable feature determination module 702, a target feature determination module 703, a training call data acquisition module 704, and a target decision tree acquisition module 705. The functional modules are described in detail as follows:

The historical call data obtaining module 701 is configured to obtain historical call data, where the historical call data includes a call result identifier and historical feature values corresponding to K original features, and the original features include a call opportunity feature, where K is greater than or equal to 2.

The controllable feature determining module 702 is configured to perform controllability analysis on historical feature values corresponding to the K original features, and determine L controllable features from the K original features, where 2 ≤ L ≤ K.

The target feature determining module 703 is configured to perform relevance analysis on historical feature values corresponding to the L controllable features by using the call result identifier, and determine N target features from the L controllable features, where the target features include a call opportunity feature, and N is equal to or greater than 2 and equal to or less than L.

The training call data obtaining module 704 is configured to form training call data based on the call result identifier and the historical feature values corresponding to the N target features.

The target decision tree obtaining module 705 is configured to process the training call data by using a decision tree model, obtain a target decision tree, and obtain node attribute information corresponding to each node in the target decision tree, where the node attribute information includes node category information, node entropy value and node sample information.

Preferably, the controllable feature determination module 702 comprises:

The feature integrity obtaining unit is used for carrying out integrity statistics on the historical feature values corresponding to the same original feature in all the historical call data to obtain feature integrity corresponding to the original feature.

The feature variation probability obtaining unit is used for counting variation probabilities of historical feature values corresponding to the same original feature in all the historical call data in a preset evaluation period to obtain feature variation probabilities corresponding to the original feature.

And the controllable feature judging unit is used for determining the original feature as the controllable feature if the feature integrity is larger than the integrity threshold and the feature fluctuation probability is smaller than the fluctuation probability threshold.
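The three units above can be illustrated with the following sketch. The embodiment does not fix the exact formulas in this section, so the definitions here are assumptions: feature integrity is taken as the share of non-missing values, and feature variation probability as the share of consecutive non-missing observations (in chronological order within the evaluation period) whose value changed. The threshold values and all names are hypothetical.

```python
def feature_integrity(values):
    """Share of non-missing historical feature values (None marks missing)."""
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)


def feature_variation_probability(values):
    """Share of consecutive non-missing observations whose value changed."""
    present = [v for v in values if v is not None]
    pairs = list(zip(present, present[1:]))
    if not pairs:
        return 0.0
    return sum(a != b for a, b in pairs) / len(pairs)


def controllable_features(history, integrity_threshold=0.9,
                          variation_threshold=0.2):
    """Keep features that are complete enough and stable enough.

    history maps a feature name to its chronological list of historical
    feature values within the preset evaluation period.
    """
    return [
        name for name, values in history.items()
        if feature_integrity(values) > integrity_threshold
        and feature_variation_probability(values) < variation_threshold
    ]


history = {
    "f_stable": ["a"] * 10,                 # complete and unchanged
    "f_volatile": ["a", "b"] * 5,           # complete but changes every time
    "f_sparse": [None] * 5 + ["a"] * 5,     # half of the values are missing
}
print(controllable_features(history))
```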

Preferably, the target feature determination module 703 includes:

And the information gain acquisition unit is used for processing the historical characteristic values corresponding to the same controllable characteristic in all the historical call data by adopting the call result identification to acquire the information gain corresponding to the controllable characteristic.

And the target feature determining unit is used for sequencing the information gains corresponding to the L controllable features and determining the first N controllable features with larger information gains as target features.
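The information gain ranking performed by these two units can be sketched as follows. The sketch assumes categorical feature values and uses the standard definition of information gain (entropy of the call result identifier minus its conditional entropy given the feature); the feature names and data are made up for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy of a list of call result identifiers."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their conditional entropy given the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_target_features(features, labels, n):
    """Return the n controllable features with the largest information gain."""
    gains = {name: information_gain(vals, labels) for name, vals in features.items()}
    return sorted(gains, key=gains.get, reverse=True)[:n]


labels = [1, 1, 0, 0]  # call result identifier: 1 = success, 0 = failure
features = {
    "call_period": ["am", "am", "pm", "pm"],  # separates the results perfectly
    "channel": ["x", "y", "x", "y"],          # carries no information
    "agent": ["a", "a", "a", "b"],            # partially informative
}
print(select_target_features(features, labels, 2))
```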

In an embodiment, a decision tree-based call data processing device is provided, where the decision tree-based call data processing device corresponds to the decision tree-based call data processing method in the above embodiment one by one. As shown in fig. 8, the decision tree-based call data processing apparatus includes a to-be-called data acquisition module 801, a target node determination module 802, a period priority determination module 803, and a target call period determination module 804. The functional modules are described in detail as follows:

The to-be-called data obtaining module 801 is configured to obtain at least one to-be-called data corresponding to a to-be-called client, where the to-be-called data includes to-be-called feature values corresponding to N target features, the target features include a calling opportunity feature, and the to-be-called feature value corresponding to the calling opportunity feature is a configured calling period.

The target node determining module 802 is configured to input the data to be called into the target decision tree obtained in the above embodiment, determine a target node to which the data to be called belongs on the target decision tree, and obtain node attribute information of the target node.

The period priority determining module 803 is configured to determine, based on node attribute information of at least one target node corresponding to the same to-be-called client, a period priority corresponding to at least one configured call period corresponding to the to-be-called client.

The target calling period determining module 804 is configured to perform calling policy adjustment on all the clients to be called based on at least one period priority corresponding to the configured calling period corresponding to all the clients to be called, and determine a target calling period corresponding to the clients to be called.
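The lookup performed by the target node determining module 802 can be sketched as a walk from the root of the target decision tree to the node that matches the to-be-called data. The tree representation below is an assumption made for illustration (categorical multi-way splits, with hypothetical names `Node` and `find_target_node`); the embodiment does not prescribe a particular data structure.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    node_id: str
    split_feature: Optional[str] = None  # None marks a leaf node
    children: Dict[str, "Node"] = field(default_factory=dict)


def find_target_node(root, record):
    """Follow the record's feature values down the tree and return the node reached."""
    node = root
    while node.split_feature is not None:
        child = node.children.get(record.get(node.split_feature))
        if child is None:
            break  # unseen value: stop at the deepest matching node (assumption)
        node = child
    return node


tree = Node("root", "call_period", {
    "T1": Node("leaf_T1"),
    "T2": Node("n_T2", "channel", {"phone": Node("leaf_T2_phone")}),
})
print(find_target_node(tree, {"call_period": "T2", "channel": "phone"}).node_id)
```

The node attribute information of the returned node would then be read to rank the configured call period carried in the record.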

Preferably, the period priority determining module 803 includes:

The node information acquisition unit is used for obtaining the node entropy value and the node sample information of at least one target node corresponding to the same client to be called.

And the call priority acquisition unit is used for acquiring the call priority corresponding to the at least one target node based on the node entropy value and the node sample information of the at least one target node.

And the time period priority acquisition unit is used for determining the time period priority corresponding to at least one configuration call time period corresponding to the client to be called based on the call priority corresponding to the at least one target node.

Preferably, the target call period determining module 804 includes:

And the waiting call period determining unit is used for determining the waiting call period corresponding to each waiting call client according to the sequence of the time period priorities based on the time period priority corresponding to at least one configuration call period corresponding to each waiting call client.

The resource determining module is used for acquiring time period available resources corresponding to the time period of the to-be-selected call and acquiring to-be-allocated resources corresponding to all to-be-called clients corresponding to the time period of the to-be-selected call.

And the first target time period determining module is used for determining the time period of the to-be-called call corresponding to the to-be-called client as the target call time period corresponding to the to-be-called client if the available time period resources corresponding to the time period of the to-be-called call are larger than or equal to the to-be-allocated resources.

The second target time period determining module is used for carrying out client priority analysis on all to-be-called clients corresponding to the same to-be-selected call time period, and obtaining the client priority corresponding to each to-be-called client, if the time period available resources corresponding to the to-be-selected call time period are smaller than the resources to be allocated; in order of client priority, determining the to-be-called clients that fit within the quantity of time period available resources as target call clients, and determining the to-be-selected call time period as the target call time period corresponding to the target call clients; and for the to-be-called clients that do not fit within the quantity of time period available resources, updating the to-be-selected call time period corresponding to each such client in order of time period priority.

For specific limitations on the decision tree-based call data processing apparatus, reference may be made to the above limitations on the decision tree-based call data processing method, which are not repeated here. The modules in the decision tree-based call data processing apparatus described above may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or be independent of, a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to the above modules.

In one embodiment, a computer device is provided, which may be a server, and the internal structure of which may be as shown in fig. 9. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used for storing data used or generated during execution of the decision tree-based call data processing method. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements the decision tree-based call data processing method.

In one embodiment, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor implements the decision tree-based call data processing method of the foregoing embodiments, for example as shown in fig. 2 to fig. 6, which is not repeated here. Alternatively, when executing the computer program, the processor implements the functions of the modules/units in the foregoing embodiments of the decision tree-based call data processing apparatus, for example the functions of the modules/units shown in fig. 7 to fig. 8, which are not repeated here.

In an embodiment, a computer readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the decision tree-based call data processing method of the foregoing embodiments, for example as shown in fig. 2 to fig. 6, which is not repeated here. Alternatively, when executed by the processor, the computer program implements the functions of the modules/units in the foregoing embodiments of the decision tree-based call data processing apparatus, for example the functions of the modules/units shown in fig. 7 to fig. 8, which are not repeated here.

Those skilled in the art will appreciate that all or part of the methods of the above embodiments may be implemented by a computer program stored on a non-volatile computer readable storage medium, and that the program, when executed, may include the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions.

The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention.

Claims

1. A method for processing call data based on decision tree, comprising:

performing controllability analysis on historical feature values corresponding to the K original features, and determining L controllable features from the K original features, wherein 2 ≤ L ≤ K, comprising: carrying out integrity statistics on historical feature values corresponding to the same original feature in all the historical call data to obtain feature integrity corresponding to the original feature; counting the variation probabilities of the historical feature values corresponding to the same original feature in all the historical call data in a preset evaluation period to obtain feature variation probabilities corresponding to the original feature; if the feature integrity is greater than an integrity threshold and the feature variation probability is less than a variation probability threshold, determining the original feature as a controllable feature;

2. The decision tree based call data processing method as recited in claim 1, wherein said employing said call result identification to perform correlation analysis on historical feature values corresponding to L of said controllable features, determining N target features from L of said controllable features, comprises:

Processing the historical characteristic values corresponding to the same controllable characteristic in all the historical call data by adopting the call result identifier to acquire the information gain corresponding to the controllable characteristic;

And sequencing the information gains corresponding to the L controllable features, and determining the first N controllable features with larger information gains as target features.

3. A method for processing call data based on decision tree, comprising:

Inputting the data to be called into a target decision tree obtained by the method according to any one of claims 1-2, determining a target node to which the data to be called belongs on the target decision tree, and obtaining node attribute information of the target node;

4. A decision tree based call data processing method in accordance with claim 3, wherein said determining a time period priority corresponding to at least one configured call time period corresponding to said to-be-called client based on node attribute information of at least one of said target nodes corresponding to the same to-be-called client comprises:

obtaining the node entropy value and the node sample information of at least one target node corresponding to the same client to be called;

acquiring a call priority corresponding to at least one target node based on the node entropy value and the node sample information of the at least one target node;

and determining the time period priority corresponding to at least one configuration call time period corresponding to the client to be called based on the call priority corresponding to at least one target node.

5. The decision tree based call data processing method as recited in claim 3, wherein said performing call policy adjustment on all said to-be-called clients based on said time period priority corresponding to said at least one configured call period corresponding to all said to-be-called clients, and determining a target call period corresponding to said to-be-called clients, comprises:

Determining a waiting calling time period corresponding to each waiting calling client according to the sequence of the time period priorities based on at least one time period priority corresponding to each configuration calling time period corresponding to each waiting calling client;

acquiring time period available resources corresponding to the time period to be selected, and acquiring resources to be allocated corresponding to all clients to be called, wherein the resources to be allocated correspond to the time period to be selected;

If the available time period resources corresponding to the time period to be called are larger than or equal to the resources to be allocated, determining the time period to be called corresponding to the client to be called as the target time period to be called corresponding to the client to be called;

If the time period available resource corresponding to the time period to be selected is smaller than the to-be-allocated resource, carrying out client priority analysis on all to-be-called clients corresponding to the same time period to be selected, and obtaining the client priority corresponding to the to-be-called clients; determining a to-be-called client matched with the number of the time slot available resources as a target call client according to the priority sequence of the clients, and determining the to-be-selected call time slot as a target call time slot corresponding to the target call client; and updating and determining the waiting calling time period corresponding to the waiting calling client according to the sequence of the time period priority for the waiting calling client which is not matched with the available resource quantity of the time period.

6. A decision tree based call data processing apparatus comprising:

The controllable feature determining module is configured to perform controllability analysis on historical feature values corresponding to the K original features, and determine L controllable features from the K original features, where 2 ≤ L ≤ K, including: carrying out integrity statistics on historical feature values corresponding to the same original feature in all the historical call data to obtain feature integrity corresponding to the original feature; counting the variation probabilities of the historical feature values corresponding to the same original feature in all the historical call data in a preset evaluation period to obtain feature variation probabilities corresponding to the original feature; if the feature integrity is greater than an integrity threshold and the feature variation probability is less than a variation probability threshold, determining the original feature as a controllable feature;

7. A decision tree based call data processing apparatus comprising:

A target node determining module, configured to input the data to be called into a target decision tree obtained by the method according to any one of claims 1-2, determine a target node to which the data to be called belongs on the target decision tree, and obtain node attribute information of the target node;

8. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the decision tree based call data processing method according to any of claims 1 to 5 when executing the computer program.

9. A computer readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the decision tree based call data processing method according to any one of claims 1 to 5.