CN115344690A - Data processing method and device for business problems - Google Patents

Data processing method and device for business problems

Info

Publication number
CN115344690A
Authority
CN
China
Prior art keywords
business
model
classification
matching
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111447931.2A
Other languages
Chinese (zh)
Inventor
薛苏杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Unionpay Co Ltd
Original Assignee
China Unionpay Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Unionpay Co Ltd
Priority to CN202111447931.2A
Publication of CN115344690A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a data processing method and a data processing device for business problems. The method comprises: a feature extraction step of extracting keywords from a business problem input by a user, acquiring a user portrait of the user, and generating feature data based on the keywords and the user portrait; a classification step of inputting the feature data into a classification model trained in advance and obtaining, by using the classification model, a business classification corresponding to the business problem; and a first matching step of inputting, based on the business classification, the feature data into a matching model corresponding to the business classification and obtaining, by using the matching model, a first solution corresponding to the business problem, wherein a corresponding matching model is trained in advance for each of the different business classifications. According to the invention, the timeliness and accuracy of business problem matching can be improved.

Description

Data processing method and device for business problems
Technical Field
The present invention relates to computer network technology, and in particular, to a data processing method for business problems and a data processing apparatus for business problems.
Background
In the prior art, Bayesian classification, K-nearest-neighbor classification, neural networks, and similar algorithms are generally used to classify business problems. However, these schemes handle small-sample data poorly, handle samples with many features poorly, and do not take prior knowledge into account, so they cannot adequately classify the business problems that arise in external-facing service consultation.
Existing question-answering systems usually require the user to actively select the category of the problem to be consulted before related answers are pushed or the problem is handed to manual processing. The user experience is poor, and because similar problem descriptions exist within the same category, users are easily confused.
Disclosure of Invention
In view of the above problems, the present invention aims to provide a data processing method for business problems and a data processing apparatus for business problems, which can accurately match the business problems.
In one aspect of the invention, the data processing method for business problems comprises the following steps:
a feature extraction step of extracting keywords from a business problem input by a user, acquiring a user portrait of the user, and generating feature data based on the keywords and the user portrait;
a classification step of inputting the feature data into a classification model trained in advance and obtaining, by using the classification model, a business classification corresponding to the business problem; and
a first matching step of inputting, based on the business classification, the feature data into a matching model corresponding to the business classification and obtaining, by using the matching model, a first solution corresponding to the business problem, wherein a corresponding matching model is trained in advance for each of the different business classifications.
Optionally, after the first matching step, the method further comprises:
a first judging step of judging whether the similarity of the first solution output by the first matching step is higher than a preset first threshold; if so, taking the first solution as the final result and ending the process; otherwise, continuing to the following second matching step; and
a second matching step of inputting the feature data into a pre-trained fallback model and obtaining, by using the fallback model, a second solution corresponding to the business problem.
Optionally, after the second matching step, the method further comprises:
a second judging step of judging whether the similarity of the second solution output by the second matching step is higher than a preset second threshold; if so, taking the second solution as the final result and ending the process; otherwise, continuing to the following switching step; and
a switching step of switching to manual processing to solve the business problem.
Optionally, the feature extraction step includes:
a word segmentation sub-step of performing word segmentation on basic data related to the business problem to obtain word vectors;
a weight obtaining sub-step of obtaining the weight corresponding to each word in combination with a pre-formed custom word bank; and
a concatenation sub-step of concatenating the word vectors and the weights into a sentence vector that serves as the feature data,
wherein the custom word bank is formed by a multi-round word segmentation method.
Optionally, the classification model is constructed by training a support vector machine algorithm on collected known business problems and user portraits.
Optionally, the classification model is constructed by:
a collection sub-step of collecting basic data related to business problems and user portraits;
a feature extraction sub-step of extracting features from the basic data to obtain feature data to be trained;
a marking sub-step of assigning a business label to the feature data to be trained, wherein the business label marks the business classification corresponding to the feature data to be trained; and
a training sub-step of training on the feature data to be trained and the business labels by using a support vector machine algorithm with a kernel function based on probability information to obtain the classification model.
Optionally, the kernel function of the support vector machine algorithm is a kernel density estimation function, represented by the following formula:
Figure BDA0003384572550000031
where n is the number of samples, h is the bandwidth, and x_i are independent and identically distributed sample points.
Optionally, the kernel function of the support vector machine algorithm is a kernel density estimation function, represented by the following formula:
Figure BDA0003384572550000032
where x_i are independent and identically distributed sample points, σ is the variance, and μ is the mean.
Optionally, the collection sub-step comprises:
acquiring a custom word bank related to business problems by a multi-round word segmentation method; and
constructing a user portrait from the relevant information of the user.
Optionally, in the feature extraction sub-step, word segmentation is performed on the basic data to obtain word vectors, the weight corresponding to each word is obtained in combination with the custom word bank, and a sentence vector is formed from the word vectors and the weights and serves as the feature data to be trained.
Optionally, the matching model is a model constructed by using a BiLSTM model for each of the different business classifications.
Optionally, the fallback model is a model built for all business classifications.
The data processing apparatus for business problems according to the present invention includes:
a feature extraction module for extracting keywords from a business problem input by a user, acquiring a user portrait of the user, and generating feature data based on the keywords and the user portrait;
a classification module for inputting the feature data into a classification model trained in advance and obtaining, by using the classification model, a business classification corresponding to the business problem; and
a first matching module for inputting, based on the business classification, the feature data into a matching model corresponding to the business classification and obtaining, by using the matching model, a first solution corresponding to the business problem, wherein a corresponding matching model is trained in advance for each of the different business classifications.
Optionally, the apparatus further comprises:
a first judging module for judging whether the similarity of the first solution output by the first matching module is higher than a preset first threshold; if so, taking the first solution as the final result and ending the process; otherwise, continuing to the following second matching module; and
a second matching module for inputting the feature data into a pre-trained fallback model and obtaining, by using the fallback model, a second solution corresponding to the business problem.
Optionally, the apparatus further comprises:
a second judging module for judging whether the similarity of the second solution output by the second matching module is higher than a preset second threshold; if so, taking the second solution as the final result and ending the process; otherwise, continuing to the following switching module; and
a switching module for switching to manual processing to solve the business problem.
Optionally, the feature extraction module includes:
a word segmentation submodule for performing word segmentation on basic data related to the business problem to obtain word vectors;
a weight obtaining submodule for obtaining the weight corresponding to each word in combination with a pre-formed custom word bank; and
a concatenation submodule for concatenating the word vectors and the weights into a sentence vector that serves as the feature data,
wherein the custom word bank is formed by a multi-round word segmentation method.
Optionally, the classification model is constructed by training a support vector machine algorithm on collected known business problems and user portraits.
Optionally, the classification model is constructed by:
collecting basic data related to business problems and user portraits;
extracting features from the basic data to obtain feature data to be trained;
assigning a business label to the feature data to be trained, wherein the business label marks the business classification corresponding to the feature data to be trained; and
training on the feature data to be trained and the business labels by using a support vector machine algorithm with a kernel function based on probability information to obtain the classification model.
Optionally, the kernel function of the support vector machine algorithm is a kernel density estimation function, represented by the following formula:
Figure BDA0003384572550000051
where n is the number of samples, h is the bandwidth, and x_i are independent and identically distributed sample points.
Optionally, the kernel function of the support vector machine algorithm is a kernel density estimation function, represented by the following formula:
Figure BDA0003384572550000052
where x_i are independent and identically distributed sample points, σ is the variance, and μ is the mean.
Optionally, the matching model is a model constructed by using a BiLSTM model for each of the different business classifications.
Optionally, the fallback model is a model built for all business classifications.
The computer-readable medium of the present invention, on which a computer program is stored, is characterized in that,
the computer program, when executed by a processor, implements the data processing method for business problems.
The computer device of the present invention includes a storage module, a processor, and a computer program stored on the storage module and capable of running on the processor, and is characterized in that the processor implements the data processing method for the business problem when executing the computer program.
Drawings
Fig. 1 is a flow chart of the data processing method for business problems of the present invention.
Fig. 2 is a schematic diagram of one example of multi-round word segmentation.
Fig. 3 is a block diagram of the data processing apparatus for business problems of the present invention.
Detailed Description
The following description is of some of the various embodiments of the invention and is intended to provide a basic understanding of the invention. It is not intended to identify key or critical elements of the invention or to delineate the scope of the invention.
For the purposes of brevity and explanation, the principles of the present invention are described herein with reference primarily to exemplary embodiments thereof. However, those skilled in the art will readily recognize that the same principles are equally applicable to all types of business problem-oriented data processing methods and business problem-oriented data processing apparatuses, and that these same principles, as well as any such variations, may be implemented therein without departing from the true spirit and scope of the present patent application.
Moreover, in the following description, reference is made to the accompanying drawings that illustrate certain exemplary embodiments. Electrical, mechanical, logical, and structural changes may be made to these embodiments without departing from the spirit and scope of the present invention. In addition, while a feature of the invention may have been disclosed with respect to only one of several implementations/embodiments, such feature may be combined with one or more other features of the other implementations/embodiments as may be desired and/or advantageous for any given or identified function. The following description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined by the appended claims and their equivalents.
Words such as "comprising" and "comprises" mean that, in addition to having elements (modules) and steps which are directly and explicitly stated in the description and the claims, the technical solution of the invention does not exclude the case of having other elements (modules) and steps which are not directly or explicitly stated.
Before explaining the data processing method for business problems and the data processing apparatus for business problems of the present invention, several terms will be explained.
(1)SVM
A Support Vector Machine (SVM) is a generalized linear classifier that performs binary classification of data by supervised learning; its decision boundary is the maximum-margin hyperplane solved for the learning samples.
The SVM calculates the empirical risk using the hinge loss function and adds a regularization term to the optimization problem to optimize the structural risk; it is a sparse and robust classifier. An SVM can perform non-linear classification by means of the kernel method, and it is one of the common kernel learning methods.
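As a reminder of what "hinge loss plus a regularization term" means, the standard soft-margin SVM objective can be written as follows. The symbols w, b, λ, and the labels y_i ∈ {−1, +1} are notation introduced here for illustration; they do not appear in the original text.

\[
\min_{w,\,b}\;\; \frac{\lambda}{2}\,\lVert w \rVert^{2} \;+\; \frac{1}{n}\sum_{i=1}^{n} \max\bigl(0,\; 1 - y_i\,(w^{\top} x_i + b)\bigr)
\]

The sum is the empirical risk under the hinge loss, and the first term is the regularizer that controls the structural risk.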
(2) A priori knowledge
Prior knowledge (a priori knowledge) is knowledge that precedes experience. The term also covers forms of rational justification that do not rely on sensory or other kinds of experience.
(3)BiLSTM
BiLSTM is an abbreviation of Bi-directional Long Short-Term Memory and is composed of a forward LSTM and a backward LSTM. It is well suited to context-dependent sequence labeling tasks and is therefore often used in NLP to model contextual information.
(4) Probability density
Probability refers to the likelihood that a random event occurs. For a uniform distribution, the probability density is the probability of an interval (the range of values of an event) divided by the length of that interval; its value is non-negative.
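As a short worked instance of this definition, take a random variable X uniformly distributed on [a, b] and any sub-interval [c, d] inside it (a, b, c, d are symbols introduced here for illustration):

\[
P(c \le X \le d) = \frac{d - c}{b - a},
\qquad
f(x) = \frac{P(c \le X \le d)}{d - c} = \frac{1}{b - a} \;\ge\; 0 .
\]

For example, with [a, b] = [0, 4] the density is 1/4 everywhere on the interval, and the probability of [1, 3] is 2 × 1/4 = 1/2.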
Hereinafter, a data processing method and a data processing apparatus for a business problem according to the present invention will be described.
First, the technical idea of the data processing method for business problems of the present invention is introduced.
In practical applications of automatic answering of business problems, the inventor found that users' questions are highly diverse and that the same consultation question may involve different business content. As a result, an existing intelligent push system cannot match the corresponding answer well, and matching is difficult. If the question is transferred to manual handling, a first-line agent usually has to hold a conversation with the user, judge the category manually, and then transfer the question to second-line support staff for that category, which increases both the difficulty of routing and the time to resolution. Moreover, the overall session sample data are small in quantity and highly dispersed.
To solve these problems and improve the effectiveness and accuracy of replies to business problems, the invention classifies user problems by using an SVM (support vector machine) algorithm with a probability-density kernel function and then pushes the best answer by using a similarity-based matching algorithm. When manual processing is needed after business classification, the problem is routed accurately to the corresponding handler based on the predicted business classification.
Fig. 1 is a flow chart illustrating a data processing method for business problems according to the present invention.
As shown in Fig. 1, the data processing method for business problems of the present invention includes:
a feature extraction step S100: extracting keywords from a business problem input by a user, acquiring a user portrait of the user, and generating feature data based on the keywords and the user portrait;
a classification step S200: inputting the feature data into a classification model trained in advance and obtaining, by using the classification model, a business classification corresponding to the business problem, wherein the classification model represents the correspondence between feature data and business classifications;
a first matching step S300: inputting, based on the business classification, the feature data into a matching model corresponding to the business classification and obtaining, by using the matching model, a first solution corresponding to the business problem, wherein a different matching model is trained in advance for each business classification, and each matching model represents the correspondence between the business problems under its business classification and the corresponding solutions;
first determination step S400: judging whether the similarity of the first solution output in the first matching step S300 is higher than a preset first threshold, if so, jumping to the first output step S500, otherwise, continuing to the following second matching step S600;
first output step S500: taking the first solution as a final result and ending the process;
second matching step S600: inputting the feature data into a pre-trained fallback model and obtaining, by using the fallback model, a second solution corresponding to the business problem, wherein the fallback model represents the correspondence between the business problems under all business classifications and the corresponding solutions;
second determination step S700: judging whether the similarity of the second solution output in the second matching step S600 is higher than a preset second threshold; if so, going to the second output step S800; otherwise, going to the transfer step S900;
second output step S800: taking the second solution as the final result and ending the process; and
transfer step S900: switching to manual processing to solve the business problem.
In the above flow, the data processing method for business problems of the present invention can be realized with at least the feature extraction step S100, the classification step S200, and the first matching step S300. Feature data are generated by combining the keywords of the business problem input by the user with the user portrait; the classification model classifies the business problem based on the feature data; and the feature data are then input into the matching model for that business classification to obtain the corresponding solution. Because the business problem is first classified and then matched against the solutions under that classification, the classification can be located and the solution matched accurately.
On this basis, it is further determined whether the similarity of the first solution output by the first matching step S300 is higher than the preset first threshold. If not, the feature data are input into the pre-trained fallback model, and a second solution corresponding to the business problem is obtained by using the fallback model. If the similarity of the second solution is not higher than the preset second threshold either, the business problem can be switched to manual processing.
Here, the matching model employed in the present invention is a model constructed by using a BiLSTM model for each of the different business classifications. The fallback model employed in the present invention is a model constructed for all business classifications.
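The control flow of steps S200 to S900 can be sketched as a two-stage cascade. This is a minimal illustration rather than the patented implementation: the function name, the callable-based interfaces, the route labels, and the two threshold values are all assumptions, and the models themselves are passed in as opaque callables.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A matcher maps feature data to (solution, similarity score).
Matcher = Callable[[List[float]], Tuple[str, float]]


def answer_business_problem(
    features: List[float],
    classify: Callable[[List[float]], str],
    matchers_by_class: Dict[str, Matcher],
    fallback_matcher: Matcher,
    first_threshold: float = 0.8,   # assumed value, not given in the source
    second_threshold: float = 0.6,  # assumed value, not given in the source
) -> Tuple[str, str, Optional[str]]:
    """Return (route, business_class, solution).

    route is 'class_match', 'fallback', or 'manual' (hypothetical labels).
    """
    business_class = classify(features)                                  # S200
    solution, similarity = matchers_by_class[business_class](features)  # S300
    if similarity > first_threshold:                                     # S400
        return "class_match", business_class, solution                   # S500
    solution, similarity = fallback_matcher(features)                    # S600
    if similarity > second_threshold:                                    # S700
        return "fallback", business_class, solution                      # S800
    # S900: hand over to a human; the predicted class can guide routing.
    return "manual", business_class, None
```

Returning the predicted business classification even on the manual route reflects the idea stated earlier that the problem can be handed to the handler responsible for that classification.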
The following describes the details of the above steps.
In the present invention, as an example, in the feature extraction step S100, the following steps may be included:
a word segmentation sub-step of performing word segmentation on basic data related to the business problem to obtain word vectors;
a weight obtaining sub-step of obtaining the weight corresponding to each word in combination with a pre-formed custom word bank; and
a concatenation sub-step of concatenating the word vectors and the weights into a sentence vector that serves as the feature data,
wherein the custom word bank is formed by a multi-round word segmentation method. In the invention, the best custom word bank is obtained through multiple rounds of word segmentation. The data sources of the custom word bank mainly include specification summaries of the relevant businesses, existing historical session records, manually assigned auxiliary tags, and the like.
FIG. 2 is a schematic diagram of one example of a multi-round word segmentation.
As shown in fig. 2, an example process of multiple rounds of word segmentation includes:
step S1: forming a word bank A from the key terms of the existing rules and specifications and a default word bank (such as synonyms);
step S2: performing one round of word segmentation on the rules and specifications and the existing knowledge base by using word bank A to generate a corresponding word bank B;
step S3: combining the words in word bank B (mainly in pairs, and in triples where the business requires) and adding the combinations to word bank A to form a new word bank C;
step S4: performing word segmentation on the existing knowledge base again by using word bank C to obtain keywords, forming a word bank D; and
step S5: combining word bank D and word bank A to obtain the final word bank.
As described above, multiple rounds of filtering and selecting high-frequency words make the keywords more precise.
The user portrait, in turn, mainly includes the user's existing task records, session records, and the like, for example records of browsing the website of a new business, the business content of the most recent telephone or email consultation, and some manually assigned tags.
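Steps S1 to S5 can be sketched as follows. This is only one possible reading: the segmenter is a simple greedy longest-match over whitespace tokens standing in for a real Chinese word segmenter, only adjacent pairs are combined in S3, and keywords in S4 are selected by a minimum frequency `min_count`. All function and parameter names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set


def segment(text: str, lexicon: Set[str], max_len: int = 3) -> List[str]:
    """Greedy forward longest-match over whitespace tokens (stand-in segmenter)."""
    tokens = text.lower().split()
    out: List[str] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if n == 1 or phrase in lexicon:   # single tokens are always accepted
                out.append(phrase)
                i += n
                break
    return out


def build_word_bank(
    key_terms: Iterable[str],
    default_terms: Iterable[str],
    rules: List[str],
    knowledge_base: List[str],
    min_count: int = 2,   # assumed frequency cut-off for "keywords"
) -> Set[str]:
    bank_a = {t.lower() for t in key_terms} | {t.lower() for t in default_terms}  # S1
    # S2: the segments produced in this round play the role of word bank B.
    round_one = [segment(doc, bank_a) for doc in rules + knowledge_base]
    # S3: combine adjacent B-words in pairs and add them to A, giving C.
    pairs = {f"{a} {b}" for seg in round_one for a, b in zip(seg, seg[1:])}
    bank_c = bank_a | pairs
    # S4: re-segment the knowledge base with C; keep frequent segments as D.
    counts = Counter(w for doc in knowledge_base for w in segment(doc, bank_c))
    bank_d = {w for w, c in counts.items() if c >= min_count}
    return bank_d | bank_a                                                        # S5
```

With `key_terms=["card binding"]`, one rule text `"card binding needs real name"`, and a knowledge base of `"real name rejected"` and `"real name expired"`, the second round recognizes "real name" as a recurring phrase and adds it to the final word bank alongside "card binding".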
The classification model used in the classification step S200 is a classification model constructed based on a support vector machine algorithm (SVM). The reason for adopting the support vector machine algorithm in the invention is that the following characteristics of the support vector machine algorithm are utilized:
(1) an SVM does not require a large volume of data and is suited to small-sample learning; and (2) an SVM produces a clear separation between classes and high classification precision.
Therefore, the characteristic of the SVM can be well utilized in the invention, so that the method does not need too much data volume and can obtain higher classification precision.
As an example, the classification model is constructed by:
a collection sub-step of collecting basic data related to business problems and user portraits;
a feature extraction sub-step of extracting features from the basic data to obtain feature data to be trained;
a marking sub-step of assigning a business label to the feature data to be trained, wherein the business label marks the business classification corresponding to the feature data to be trained; and
a training sub-step of training on the feature data to be trained and the business labels by using a support vector machine algorithm with a kernel function based on probability information to obtain the classification model.
To explain the construction process of the classification model more specifically, the following description will be given by way of an example.
One exemplary classification model building process includes:
(1) Extracting key features
In the invention, the word and sentence features used mainly include quantity features, distance features, order features, keywords, the weights of the corresponding word vectors, and user features (the business information most recently browsed, the business to be carried out, and the like). Words that appear in the word bank receive a higher weight, for example 0.5 by default, which can be configured manually. M keywords and their corresponding weights are selected (for example, M = 5). If the word segmentation of a sentence yields fewer than M keywords, the shortfall is made up by randomly drawing from the existing words.
(2) Training model
The existing data is marked with corresponding service labels, and then classification is carried out by adopting a support vector machine algorithm of a kernel function based on probability information.
Here, it is assumed that the corresponding features of the questions posed by the user for the same category are all gaussian-distributed.
Then, for n independent and identically distributed sample points whose probability density function is assumed to be f, the kernel density estimate is:
Figure BDA0003384572550000111
where n is the number of samples, h is the bandwidth, and x_i are independent and identically distributed sample points.
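The formula image itself is not reproduced in this text. For reference, the standard kernel density estimator that uses exactly these symbols (n, h, x_i) is shown below; K is the kernel and the Gaussian choice of K is an assumption made here, in line with the Gaussian-distribution assumption stated above. Whether the patent's figure matches this form term by term cannot be confirmed from the text.

\[
\hat{f}_h(x) \;=\; \frac{1}{n h} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right),
\qquad
K(u) \;=\; \frac{1}{\sqrt{2\pi}}\, e^{-u^{2}/2}.
\]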
The probability density function of gaussian distribution can also be used as a kernel function, and the probability distribution function of gaussian distribution can also be expanded into a new kernel function by combining the relationship between the probability distribution function and the probability density function. By analyzing the integral form of the gaussian kernel function and introducing the concepts of mean and variance at the same time, a new kernel function is formed in the invention:
Figure BDA0003384572550000121
where x_i are independent and identically distributed sample points, σ is the variance, and μ is the mean.
And training the existing data by adopting a support vector machine algorithm based on the kernel function to obtain a corresponding classification model.
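Because the formula image for the new kernel is not reproduced in this text, the sketch below uses the ordinary Gaussian (RBF) kernel as a stand-in; it is not the patent's kernel. It only shows the general mechanics of kernel-based SVM training: compute a Gram matrix of pairwise kernel values, which an SVM solver then consumes. The function names and the `sigma` parameter are illustrative.

```python
import math
from typing import List, Sequence


def gaussian_kernel(x: Sequence[float], y: Sequence[float], sigma: float = 1.0) -> float:
    """Standard Gaussian (RBF) kernel: exp(-||x - y||^2 / (2 * sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def gram_matrix(
    rows_a: Sequence[Sequence[float]],
    rows_b: Sequence[Sequence[float]],
    sigma: float = 1.0,
) -> List[List[float]]:
    """Pairwise kernel values: entry [i][j] compares rows_a[i] with rows_b[j]."""
    return [[gaussian_kernel(a, b, sigma) for b in rows_b] for a in rows_a]
```

A matrix built as `gram_matrix(train, train)` can be passed to an SVM library that accepts precomputed kernels (for example scikit-learn's `SVC(kernel="precomputed")`), with `gram_matrix(test, train)` used at prediction time. Swapping in a different kernel only requires replacing `gaussian_kernel`.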
As an example, in the data preparation stage, 1000 existing session records are selected and cleaned, mainly by deleting picture data and duplicate data, which yields 752 cleaned test records. Two-thirds of these data are then selected as training data and the remaining third as prediction data.
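The cleaning and splitting described here can be sketched as below. The `[image]` prefix used to recognize picture records is a hypothetical marker, and the split is taken in record order because the source does not say whether the data are shuffled first.

```python
from typing import Callable, List, Sequence, Tuple


def clean_and_split(
    records: Sequence[str],
    is_picture: Callable[[str], bool] = lambda r: r.startswith("[image]"),  # hypothetical marker
) -> Tuple[List[str], List[str]]:
    """Drop picture records and duplicates, then split 2/3 training, 1/3 prediction."""
    without_pictures = [r for r in records if not is_picture(r)]
    cleaned = list(dict.fromkeys(without_pictures))  # de-duplicate, keep first occurrence
    cut = (2 * len(cleaned)) // 3
    return cleaned[:cut], cleaned[cut:]
```

For 752 cleaned records this gives 501 training records and 251 prediction records.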
Next, word segmentation is performed on the training data in combination with the custom word bank, and the weight of each word is output at the same time, yielding intermediate data. Feature extraction is then performed on the intermediate data and the user portrait data to output the corresponding word vectors (200-dimensional data); the corresponding weights are added to the word vectors to form the corresponding sentence vectors (1005-dimensional data); and a classification model based on the support vector machine algorithm with the probability-information kernel is generated from the sentence vectors of the test data.
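The text gives the dimensions (200 per word vector, 1005 per sentence vector) but not the layout. One layout that matches these numbers with M = 5 keywords is five concatenated word vectors followed by their five weights: 5 × 200 + 5 = 1005. The sketch below follows that inferred layout; the padding rule (re-sampling from the sentence's own keywords), the zero vector for unknown words, and the weight `other_weight` for words outside the word bank are assumptions.

```python
import random
from typing import Dict, List, Optional, Sequence


def build_sentence_vector(
    keywords: Sequence[str],
    embeddings: Dict[str, List[float]],
    bank_weights: Dict[str, float],
    m: int = 5,
    dim: int = 200,
    other_weight: float = 0.1,            # assumed weight for words outside the word bank
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Concatenate m word vectors and their m weights into one sentence vector."""
    if not keywords:
        raise ValueError("at least one keyword is required")
    rng = rng or random.Random(0)
    chosen = list(keywords[:m])
    while len(chosen) < m:                # pad by randomly re-drawing existing words
        chosen.append(rng.choice(list(keywords)))
    vector: List[float] = []
    for word in chosen:
        # Unknown words fall back to a zero vector (assumption).
        vector.extend(embeddings.get(word, [0.0] * dim))
    vector.extend(bank_weights.get(word, other_weight) for word in chosen)
    return vector                         # length m * dim + m
```

With the defaults, `len(build_sentence_vector(...))` is 1005 regardless of how many keywords the sentence originally produced.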
Prediction classification is then performed, and a neural network classification method and KNN are compared against the support vector machine algorithm of the invention. The prediction results are as follows:
Prediction method | Accuracy
Support vector machine algorithm of the invention | 85.56%
Neural network classification method | 82.48%
KNN (with dimensionality reduction) | 70.12%
As shown in the table above, the support vector machine algorithm of the present invention can predict the result more accurately than the other two algorithms.
In addition, if the business problem is transferred to manual processing, the handler corrects the relevant session record and labels it with the corresponding business classification. Moreover, periodic optimization learning of the business classification model can be configured.
As described above, according to the data processing method for business problems of the present invention, keywords are first extracted from the business problem input by a user and the user portrait of the user is obtained; after feature extraction, a word vector is generated and input to the SVM-based classification model, which yields the business classification of the business problem. The matching model and the knowledge base corresponding to that business classification are then obtained, and the matching model is used to obtain a solution for the business problem. If the similarity of that solution is smaller than a threshold, that is, no suitable solution is matched within the business classification, the business problem enters the pocket bottom (fallback) model, which matches a solution against the full data. If the similarity of the solution obtained by the pocket bottom model is also smaller than the threshold, the manual stage is entered; otherwise, the solution whose similarity is greater than the threshold is pushed as the final solution.
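The classify, match, fall back, and hand over flow summarized above can be sketched as a small cascade. The callables `classify`, `matchers`, and `fallback`, the function name `resolve`, and the threshold values are hypothetical stand-ins for the trained models and preset thresholds; the source does not specify how a similarity exactly equal to the threshold is handled, so the strict comparison is also an assumption.

```python
from typing import Callable, Dict, Optional, Tuple

Match = Tuple[str, float]  # (solution text, similarity score)


def resolve(
    features,
    classify: Callable[[object], str],
    matchers: Dict[str, Callable[[object], Match]],
    fallback: Callable[[object], Match],
    first_threshold: float = 0.8,   # hypothetical preset value
    second_threshold: float = 0.8,  # hypothetical preset value
) -> Optional[str]:
    """Return a solution, or None when the problem must go to manual processing."""
    # Step 1: locate the business classification.
    category = classify(features)
    # Step 2: match within the model trained for that classification.
    solution, similarity = matchers[category](features)
    if similarity > first_threshold:
        return solution
    # Step 3: fall back to the model built for all classifications.
    solution, similarity = fallback(features)
    if similarity > second_threshold:
        return solution
    # Step 4: no sufficiently similar solution, hand over to a human.
    return None


if __name__ == "__main__":
    demo_matchers = {"card": lambda f: ("Reset the PIN in the app.", 0.91)}
    print(resolve("forgot my PIN", lambda f: "card", demo_matchers,
                  lambda f: ("See the general FAQ.", 0.5)))
```

Returning `None` rather than a low-similarity answer keeps the decision to involve a human explicit at the call site.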
If no answer meeting the conditions is obtained through the matching, the process can be configured to switch to the manual stage, and the business problem must be routed to the handler group corresponding to the problem classification. First, the session records of the user for the current session are obtained, and the classification model produces a classification result for each record. The mode of these classification results is then selected, and if there are several modes, the first classification result is preferred. The business problem is transferred to the handler corresponding to the resulting business classification, and the handler finally gives accuracy feedback, which is used for further optimization learning of the classification model.
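The mode selection used for routing to a handler group can be sketched as follows. Reading "the first classification result is preferred" as "the tied label that appears earliest in the session" is an assumption, and the function name and the labels in the example are hypothetical.

```python
from collections import Counter


def dominant_classification(labels):
    """Return the most frequent label; on a tie, the one seen first in the session."""
    if not labels:
        raise ValueError("at least one classification result is required")
    counts = Counter(labels)
    best = max(counts.values())
    # Walk the labels in session order so ties resolve to the earliest one.
    for label in labels:
        if counts[label] == best:
            return label


if __name__ == "__main__":
    per_record = ["card", "loan", "loan", "card", "fee"]  # one label per session record
    print(dominant_classification(per_record))
```

In the example, "card" and "loan" both occur twice, and "card" is chosen because it appears first.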
Hereinafter, a data processing apparatus for solving a business problem according to the present invention will be described.
Fig. 3 is a block diagram of the data processing apparatus for business problems of the present invention.
As shown in fig. 3, the data processing apparatus for business problems of the present invention includes:
a feature extraction module 100, configured to extract a keyword for a business problem input by a user, obtain a user portrait of the user, and generate feature data based on the keyword and the user portrait;
a classification module 200, configured to input the feature data into a pre-trained classification model and obtain a business classification corresponding to the business problem by using the classification model;
a matching module 300, configured to input the feature data into the matching model corresponding to the business classification and obtain a first solution corresponding to the business problem by using the matching model, where a matching model is trained in advance for each of the different business classifications;
a first determining module 400, configured to determine whether the similarity of the first solution output by the matching module 300 is higher than a preset first threshold, and if so, take the first solution as the final result and end the process, and otherwise proceed to the pocket bottom module 500;
a pocket bottom module 500, configured to input the feature data into a pre-trained pocket bottom model and obtain a second solution corresponding to the business problem by using the pocket bottom model;
a second determining module 600, configured to determine whether the similarity of the second solution output by the pocket bottom module 500 is higher than a preset second threshold, and if so, take the second solution as the final result and end the process, and otherwise proceed to the switching module 700; and
a switching module 700, configured to switch to manual processing.
Wherein, the feature extraction module 100 includes:
a word segmentation submodule, configured to perform word segmentation on basic data related to the business problem to obtain word vectors;
a weight obtaining submodule, configured to obtain the weight corresponding to each word in combination with a pre-formed user-defined word bank; and
a concatenation submodule, configured to form sentence vectors from the word vectors and the weights, the sentence vectors serving as the feature data,
wherein the user-defined word bank is formed by multi-round word segmentation.
The classification model is constructed by training collected known business problems and user portraits based on a support vector machine algorithm. The classification model is constructed in the following way:
collecting basic data related to business problems and user portraits;
performing feature extraction on the basic data to obtain feature data to be trained;
assigning a business label to the feature data to be trained, wherein the business label marks the business classification corresponding to the feature data to be trained; and
training on the feature data to be trained and the business labels with a support vector machine algorithm to obtain the classification model.
Wherein, the support vector machine algorithm uses a kernel density estimation function as its kernel function, represented by the following formula:
Figure BDA0003384572550000151
wherein n is the number of samples, h is the bandwidth, and x_i are independent and identically distributed sample points.
Wherein, the support vector machine algorithm uses a kernel density estimation function as its kernel function, represented by the following formula:
Figure BDA0003384572550000152
wherein x_i are independent and identically distributed sample points, σ is the variance, and μ is the mean.
Wherein, the matching model is a model constructed with a BiLSTM model for each of the different business classifications.
Wherein, the pocket bottom model is a model constructed for all business classifications.
In the data processing apparatus for business problems of the present invention, the invention can be implemented as long as at least the feature extraction module 100, the classification module 200, and the matching module 300 are provided. The feature extraction module 100 extracts feature data by combining the keywords of the business problem input by the user with the user portrait. The classification module 200 then performs business classification on the feature data using the classification model. The feature data is further input into the matching model of the matching module 300 under that business classification to obtain the corresponding answer. In this way, the business problem is first classified and then matched to an answer within its classification, so that the classification can be located and the answer matched accurately.
On this basis, if the similarity of the solution obtained by the matching module 300 is lower than the threshold, the feature data is sent to the pocket bottom module 500 for matching to obtain a corresponding solution, and if the similarity of that solution is also lower than the threshold, the business problem is transferred to manual processing.
As described above, the data processing method and the data processing apparatus for business problems of the present invention can effectively classify and locate user problems based on business classification; for the multiple business points involved in different consultation problems, they can accurately match the knowledge points the user is focused on and push accurate answers; business classifications with ambiguous descriptions can be distinguished well; where knowledge bases of different businesses describe similar problems, the relevant answers of the corresponding classification can be pushed by judging the user's prior knowledge; and after transfer to the manual processing stage, the problem can be routed to a professional with the corresponding business skills according to the business classification.
The above examples mainly describe the data processing method and the data processing apparatus for business problems of the present invention. Although only a few specific embodiments of the invention have been described, those skilled in the art will recognize that the invention can be embodied in many other forms without departing from the spirit or scope thereof. Accordingly, the present examples and embodiments are to be considered as illustrative and not restrictive, and various modifications and substitutions may be made therein without departing from the spirit and scope of the present invention as defined by the appended claims.

Claims (24)

1. A data processing method for business problems, comprising:
a feature extraction step of extracting keywords from a business problem input by a user, acquiring a user portrait of the user, and generating feature data based on the keywords and the user portrait;
a classification step of inputting the feature data into a pre-trained classification model and obtaining, by means of the classification model, a business classification corresponding to the business problem; and
a first matching step of inputting the feature data into the matching model corresponding to the business classification and obtaining, by means of that matching model, a first solution corresponding to the business problem, wherein a matching model is trained in advance for each of the different business classifications.
2. The data processing method for business problems of claim 1, further comprising after the matching step:
a first judging step of judging whether the similarity of the first solution output by the first matching step is higher than a preset first threshold, if so, taking the first solution output by the first matching step as a final result and ending the process, otherwise, further continuing the following second matching step; and
a second matching step of inputting the feature data into a pre-trained pocket bottom model and obtaining a second solution corresponding to the business problem by using the pocket bottom model.
3. The data processing method for business problems of claim 1, further comprising, after the second matching step:
a second judging step of judging whether the similarity of the second solution output by the second matching step is higher than a preset second threshold, if so, taking the second solution output by the second matching step as a final result and ending the flow, otherwise, further continuing the following switching step; and
a switching step of switching to manual processing to solve the business problem.
4. The data processing method for business problems of claim 1, wherein the feature extraction step comprises:
a word segmentation sub-step of performing word segmentation on basic data related to the business problem to obtain word vectors;
a weight obtaining sub-step of obtaining the weight corresponding to each word in combination with a pre-formed user-defined word bank; and
a concatenation sub-step of forming sentence vectors from the word vectors and the weights, the sentence vectors serving as the feature data,
wherein the user-defined word bank is formed by a multi-round word segmentation method.
5. The data processing method for business problems of claim 1,
the classification model is constructed by training collected known business problems and user portraits based on a support vector machine algorithm.
6. The data processing method for business problems of claim 5, wherein the classification model is constructed by:
a collection sub-step of collecting basic data related to business problems and user portraits;
a feature extraction sub-step of performing feature extraction on the basic data to obtain feature data to be trained;
a marking sub-step of assigning a business label to the feature data to be trained, wherein the business label marks the business classification corresponding to the feature data to be trained; and
a training sub-step of training on the feature data to be trained and the business labels with a support vector machine algorithm to obtain the classification model.
7. The data processing method for business problems of claim 5,
the support vector machine algorithm uses a kernel density estimation function as its kernel function, represented by the following formula:
Figure FDA0003384572540000021
where n is the number of samples, h is the bandwidth, and x_i are independent and identically distributed sample points.
8. The data processing method for business problems of claim 5,
the support vector machine algorithm uses a kernel density estimation function as its kernel function, represented by the following formula:
Figure FDA0003384572540000031
wherein x_i are independent and identically distributed sample points, σ is the variance, and μ is the mean.
9. A data processing method for business problems according to claim 5, characterized in that said collecting sub-step comprises:
acquiring a user-defined word bank related to the business problem by a multi-round word segmentation method; and
constructing the user portrait according to the relevant information of the user.
10. The data processing method for business problems of claim 9,
in the feature extraction sub-step, word segmentation is performed on the basic data to obtain word vectors, the weight corresponding to each word is obtained in combination with the user-defined word bank, and sentence vectors are formed from the word vectors and the weights and serve as the feature data to be trained.
11. The data processing method for business problems of claim 1,
the matching model is a model constructed with a BiLSTM model for each of the different business classifications.
12. The data processing method for business problems of claim 2,
the pocket bottom model is a model constructed for all business classifications.
13. A data processing apparatus for business problems, comprising:
a feature extraction module, configured to extract a keyword for a business problem input by a user, acquire a user portrait of the user, and generate feature data based on the keyword and the user portrait;
a classification module, configured to input the feature data into a pre-trained classification model and obtain, by means of the classification model, a business classification corresponding to the business problem; and
a first matching module, configured to input the feature data into the matching model corresponding to the business classification and obtain, by means of that matching model, a first solution corresponding to the business problem, wherein a matching model is trained in advance for each of the different business classifications.
14. The data processing apparatus for business problems of claim 13, further comprising:
a first judging module, configured to judge whether the similarity of the first solution output by the first matching module is higher than a preset first threshold, and if so, take the first solution output by the first matching module as the final result and end the process, and otherwise proceed to the following second matching module; and
a second matching module, configured to input the feature data into a pre-trained pocket bottom model and obtain a second solution corresponding to the business problem by using the pocket bottom model.
15. The data processing apparatus for business problems of claim 13, further comprising:
a second judging module, configured to judge whether the similarity of the second solution output by the second matching module is higher than a preset second threshold, and if so, take the second solution output by the second matching module as the final result and end the process, and otherwise proceed to the following switching module; and
a switching module, configured to switch to manual processing.
16. The data processing apparatus for business problems of claim 13, wherein the feature extraction module comprises:
a word segmentation submodule, configured to perform word segmentation on basic data related to the business problem to obtain word vectors;
a weight obtaining submodule, configured to obtain the weight corresponding to each word in combination with a pre-formed user-defined word bank; and
a concatenation submodule, configured to form sentence vectors from the word vectors and the weights, the sentence vectors serving as the feature data,
wherein the user-defined word bank is formed by a multi-round word segmentation device.
17. The data processing apparatus for business problems of claim 13,
the classification model is constructed by training collected known business problems and user portraits based on a support vector machine algorithm.
18. The data processing apparatus for business problems of claim 17, wherein the classification model is constructed by:
collecting basic data related to business problems and user portraits;
performing feature extraction on the basic data to obtain feature data to be trained;
assigning a business label to the feature data to be trained, wherein the business label marks the business classification corresponding to the feature data to be trained; and
training on the feature data to be trained and the business labels with a support vector machine algorithm whose kernel function is based on probability information, to obtain the classification model.
19. The data processing apparatus for business problems of claim 18,
the support vector machine algorithm uses a kernel density estimation function as its kernel function, represented by the following formula:
Figure FDA0003384572540000051
where n is the number of samples, h is the bandwidth, and x_i are independent and identically distributed sample points.
20. The data processing apparatus for business problems of claim 18,
the support vector machine algorithm uses a kernel density estimation function as its kernel function, represented by the following formula:
Figure FDA0003384572540000052
wherein x_i are independent and identically distributed sample points, σ is the variance, and μ is the mean.
21. The data processing apparatus for business problems of claim 13,
the matching model is a model constructed with a BiLSTM model for each of the different business classifications.
22. The data processing apparatus for business problems of claim 14,
the pocket bottom model is a model constructed for all business classifications.
23. A computer-readable medium, having stored thereon a computer program,
which computer program, when being executed by a processor, carries out a data processing method for business problems according to any one of claims 1 to 12.
24. A computer device comprising a memory module, a processor, and a computer program stored on the memory module and executable on the processor,
the processor implements the data processing method for business problems of any one of claims 1 to 12 when executing the computer program.
CN202111447931.2A 2021-11-30 2021-11-30 Data processing method and device for business problems Pending CN115344690A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111447931.2A CN115344690A (en) 2021-11-30 2021-11-30 Data processing method and device for business problems

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111447931.2A CN115344690A (en) 2021-11-30 2021-11-30 Data processing method and device for business problems

Publications (1)

Publication Number Publication Date
CN115344690A true CN115344690A (en) 2022-11-15

Family

ID=83946693

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111447931.2A Pending CN115344690A (en) 2021-11-30 2021-11-30 Data processing method and device for business problems

Country Status (1)

Country Link
CN (1) CN115344690A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115766489A (en) * 2022-12-23 2023-03-07 中国联合网络通信集团有限公司 Data processing apparatus, method and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination