CN112561530A - Transaction flow processing method and system based on multi-model fusion

Transaction flow processing method and system based on multi-model fusion

Info

Publication number
CN112561530A
Authority
CN
China
Prior art keywords
transaction
model
label
transaction flow
training set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011567495.8A
Other languages
Chinese (zh)
Inventor
李振
尹正
张刚
鲍东岳
刘昊霖
傅佳美
赵希
任鹏飞
李千惠
黑小波
刘蓓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Minsheng Science And Technology Co ltd
Original Assignee
Minsheng Science And Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Minsheng Science And Technology Co ltd filed Critical Minsheng Science And Technology Co ltd
Priority to CN202011567495.8A priority Critical patent/CN112561530A/en
Publication of CN112561530A publication Critical patent/CN112561530A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00Payment architectures, schemes or protocols
    • G06Q20/38Payment protocols; Details thereof
    • G06Q20/389Keeping log of transactions for guaranteeing non-repudiation of a transaction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2457Query processing with adaptation to user needs
    • G06F16/24573Query processing with adaptation to user needs using data annotations, e.g. user-defined metadata
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Accounting & Taxation (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Library & Information Science (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Image Analysis (AREA)

Abstract

A transaction flow processing method and system based on multi-model fusion relate to the technical field of intelligent classification. The method comprises the following steps. S1: collecting transaction flow samples to construct a training set; S2: preprocessing the training set to obtain an input vector for each transaction flow sample; S3: feeding the input vectors into LightGBM, SVM and Softmax models for training, and predicting the primary label of the transaction flow to be classified by combining the three trained models; S4: on the basis of the primary label predicted in S3, predicting the secondary label of the transaction flow to be classified with a convolutional neural network model. The disclosed method adopts a hierarchical classification system, can quickly and accurately display consumption-type labels to the user, has error-correction and autonomous-learning capabilities, generates good interaction with the user, and improves the user experience.

Description

Transaction flow processing method and system based on multi-model fusion
Technical Field
The invention relates to the technical field of intelligent classification, in particular to a transaction flow processing method and a transaction flow processing system based on multi-model fusion.
Background
Receipt and payment details are data recording all transaction flows of the account corresponding to a customer, and generally include basic information such as transaction time, customer name and transaction amount for the customer to review. With the development of the internet, convenient payment has opened a new era of mobile-phone payment: online and offline transaction amounts and transaction flows are growing rapidly, customers' demands on receipt-and-payment functions keep rising, and real-time, transparent receipt-and-payment detail data has become a major trend in the financial payment market.
According to statistics, "viewing receipt and payment details" has become one of the functions customers use most frequently after logging in to an online banking APP, and optimizing the content and presentation of these details based on user needs can greatly improve the user experience and thereby strengthen user stickiness. At present, the expenditure details most online banking APPs display to the user only comprise basic information such as transaction time, transaction account, customer name and transaction amount; personalized information such as the consumption type of each expenditure or the overall proportion of each consumption type is not presented, so the user has no clear, intuitive picture of the income or expenditure of an individual transaction, the source or channel of income, or the destination or type of expenditure.
Disclosure of Invention
In view of this, the invention provides a transaction flow processing method and system based on multi-model fusion, which adopt a hierarchical classification system combining machine-learning multi-model fusion with a convolutional neural network model. The system can mark user transaction flows in real time, can quickly and accurately display consumption-type labels to the user, has error-correction and autonomous-learning capabilities, generates good interaction with the user, and improves the user experience.
In order to achieve the purpose, the invention adopts the following technical scheme:
According to a first aspect of the present invention, there is provided a transaction flow processing method based on multi-model fusion, the method comprising the following steps:
S1: collecting transaction flow samples to construct a training set, wherein each sample comprises transaction flow data and the primary label and secondary label corresponding to the transaction flow data;
S2: preprocessing the training set to obtain an input vector for each transaction flow sample;
S3: respectively feeding the input vectors into a light gradient boosting regression tree model, a support vector machine model and a logistic regression model for training, and predicting the primary label of the transaction flow to be classified by combining the three trained models;
S4: on the basis of the primary label predicted in S3, predicting the secondary label of the transaction flow to be classified with a convolutional neural network model;
S5: correcting erroneous prediction results to form new training samples, and repeating steps S2-S4 to complete the optimization of the model.
Further, the transaction flow data in S1 includes a plurality of fields, and the fields include name, remark, amount, and transaction time.
Further, the S2 specifically includes:
s21: removing special characters and stop words in the training set transaction flow data;
s22: performing word segmentation processing on each field of the transaction flow data;
s23: converting the words obtained after word segmentation into word vectors;
s24: accumulating all word vectors of each field and taking the average value to obtain a field vector;
s25: and splicing the field vectors of each sample to obtain an input vector set of each sample.
Further, in S23 the conversion from words to word vectors is completed with the word2vec model, which overcomes the curse of dimensionality and the vector sparsity produced when a discrete representation model builds a dictionary.
Further, S3 specifically includes:
S31: inputting the transaction flow data in the training set and the corresponding primary labels into a light gradient boosting regression tree model for training, and predicting, based on the trained model, the probability P_L(j) that the transaction flow to be classified belongs to each primary label, wherein j denotes the number of the primary label;
S32: inputting the transaction flow data in the training set and the corresponding primary labels into a support vector machine model for training, and predicting, based on the trained model, the probability P_SVM(j) that the transaction flow to be classified belongs to each primary label;
S33: inputting the transaction flow data in the training set and the corresponding primary labels into a logistic regression model for training, and predicting, based on the trained model, the probability P_S(j) that the transaction flow to be classified belongs to each primary label;
S34: calculating the mean probability P_j = (P_L(j) + P_SVM(j) + P_S(j)) / 3 for each primary label, and selecting the primary label with the largest mean probability as the primary-label prediction result of the transaction flow to be classified.
Further, S4 specifically includes:
S41: dividing the training set into a plurality of sub-training sets according to the primary labels in the training set;
S42: inputting the transaction flow data contained in each sub-training set and the corresponding secondary labels into a convolutional neural network for training, to obtain one convolutional neural network model per primary label;
S43: selecting the convolutional neural network model corresponding to the primary label output by S3 to predict the secondary label of the transaction flow to be classified.
Further, the convolutional neural network model includes:
the input layer is used for converting the transaction flow samples into neural network input vectors;
the convolutional layer is used for extracting text features in each neural network input vector;
the pooling layer is used for screening out important features from the text features;
the full connection layer is used for connecting the important features to the classifier to obtain the probability that the transaction flow to be classified belongs to each secondary label;
and the prediction layer is used for outputting the secondary label corresponding to the maximum probability as a prediction result.
According to a second aspect of the present invention, there is provided a transaction flow processing system based on multi-model fusion, comprising:
a sample collection module for collecting transaction flow samples to construct a training set, wherein each sample comprises transaction flow data and the corresponding primary and secondary labels;
a sample processing module for preprocessing the training set to obtain an input vector for each transaction flow sample;
a primary-label prediction module for feeding the input vectors into the light gradient boosting regression tree, support vector machine and logistic regression models for training, and predicting the primary label of the transaction flow to be classified by majority voting over the outputs of the three trained models;
a secondary-label prediction module for predicting the secondary label of the transaction flow to be classified with a convolutional neural network model, on the basis of the predicted primary label;
and a result optimization module for forming new training samples from corrected erroneous prediction results to complete the optimization of the model.
According to a third aspect of the present invention, a computer-readable storage medium is provided, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, carries out the steps of the method as set forth above.
According to a fourth aspect of the present invention, there is provided a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method as described above when executing the program.
Compared with the prior art, the transaction flow processing method and system based on multi-model fusion have the following advantages:
The system adopts a hierarchical classification system: it first fuses the votes of three models, a light gradient boosting regression tree (LightGBM), a support vector machine (SVM) and a logistic regression model (Softmax), for primary-label classification, then applies a convolutional neural network model (CNN) for secondary-label classification on top of the primary label, and displays the result to the user. The displayed results enrich the user's transaction-detail interface, allow the user to modify the marked results, provide a more intelligent bookkeeping function, improve the user experience and raise the user click-through rate.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate an embodiment of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a detailed flow chart of the method of the present invention.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The terms "first," "second," and the like in the description and in the claims of the present disclosure are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
"A plurality" means two or more.
The term "and/or" as used in this disclosure merely describes an association between objects, meaning three relationships may exist: for example, "A and/or B" may mean that A exists alone, A and B exist simultaneously, or B exists alone.
The invention adopts a hierarchical classification system, firstly divides data into a plurality of first-level consumption labels, and then divides the data under each first-level label into second-level consumption labels under corresponding labels. The method resolves a large multi-classification problem into a plurality of sub-classification problems, can greatly reduce the number of labels corresponding to a single model, improves the precision of the model, and effectively avoids the condition of low model classification accuracy under the condition of more classes. Meanwhile, a plurality of models are trained in parallel under a hierarchical classification system, so that the speed can be ensured, and the overall efficiency of the system can be improved.
The invention has a self-learning mechanism and can periodically correct and update the model in the system according to the correction data.
Considering the characteristics of CNNs, the invention adopts a CNN model to process bank transaction flows. The network can extract local features and weighs the importance of local information, which suits the fact that salient keywords in the Chinese fields of flow data can decisively influence the result. Extracting local features amounts to capturing multiple different n-gram features of the text; for one n-gram feature, several different filters extract useful information from different angles, and combining this information can learn the implicit correspondence between different fields in the flow data. Compared with other neural network models, this model is less prone to overfitting and is fast.
The method specifically comprises the following steps:
s1: the bank flow data consists of a plurality of fields: the characteristics are fields of client name, client remarks, bank remarks, money amount, transaction time and the like. And simultaneously marking a primary label and a secondary label for the transaction according to the transaction flow category. The obtained running water data and the corresponding labels form a training set of the following model.
S2: carrying out data preprocessing on the obtained training data set, wherein the data preprocessing steps are as follows:
s21, removing special characters and stop words from the Chinese field;
s22, the Chinese field is participled by using the ending participle, and the content corresponding to the final field is expressed as: s ═ w1,w2,w3,…,wlIn which wiMeans the word after the ith word segmentation processingThe term l is the number of words obtained after the Chinese field is processed by the above processing steps.
S23: represent the Chinese words as vectors with the word2vec model. word2vec represents words as dense, low-dimensional, real-valued vectors with good semantic properties, suitable for the diverse merchant names in the modeling data. This distributed representation effectively overcomes the curse of dimensionality and the extreme vector sparsity produced when a discrete representation model, such as a bag-of-words model, builds a dictionary.
S24: finally, the vector representation of the whole Chinese field is calculated by accumulating and averaging all word vectors:
Figure BDA0002861388000000051
wherein S _ vec is a vector of corresponding contents of Chinese field, w _ veciA word vector representing each word.
S25: and splicing vectors corresponding to all Chinese fields to obtain the input of each sample:
X=concat(S_vec1;...;S_vecm)
where m represents the number of chinese fields.
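Steps S23-S25 can be sketched as follows. This is an illustrative sketch only: a toy embedding table with hypothetical words and dimensions stands in for a trained word2vec model.

```python
import numpy as np

# Toy 4-dimensional embeddings standing in for word2vec output (hypothetical).
EMB = {
    "coffee": np.array([1.0, 0.0, 0.0, 0.0]),
    "shop":   np.array([0.0, 1.0, 0.0, 0.0]),
    "refund": np.array([0.0, 0.0, 1.0, 1.0]),
}

def field_vector(words):
    """S24: average all word vectors of one field to get S_vec."""
    vecs = [EMB[w] for w in words if w in EMB]
    return np.mean(vecs, axis=0)

def sample_vector(fields):
    """S25: concatenate the field vectors of one sample into X."""
    return np.concatenate([field_vector(f) for f in fields])

x = sample_vector([["coffee", "shop"], ["refund"]])
print(x.shape)  # (8,)
```

With two fields of 4 dimensions each, the sample vector has 2 * 4 = 8 dimensions, matching X = concat(S_vec_1; ...; S_vec_m).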
S3: Train the three models, a light gradient boosting regression tree (LightGBM), a support vector machine (SVM) and a logistic regression model (Softmax), on the training data, then obtain the primary classification label through a voting mechanism. The three machine learning models are mutually independent, and their optimization functions and ways of handling features differ, so the correlation between the models is low and their prediction results overlap less than results from similar models would, which makes them suitable for fusion by a voting mechanism. The specific operations are as follows:
S31: Put the training set data with the corresponding primary labels into the LightGBM model for training, and predict the probability P_L(j) of each primary label with the trained LightGBM model, where j denotes the number of the primary label. The LightGBM model formula is:

ŷ_i = Σ_{k=1}^{K} f_k(x_i),  f_k ∈ F

where f_k(x_i) denotes the prediction of the k-th residual tree for the i-th sample x_i, and F is the function space of the residual trees; from a functional point of view, each residual tree resembles a piecewise function.
S32: Put the training set data labeled only with primary labels into the SVM model and predict the probability P_SVM(j) of each primary label using OVR (one-vs-rest: during training, the samples of one class form one class and all remaining samples form the other). The concrete formula is:

P_SVM(j) = max(f_1(x), f_2(x), ..., f_n(x))

where f_n(x) is the n-th SVM classifier and j is the label corresponding to the maximum value among the prediction results.
S33: Put the training set data labeled only with primary labels into the Softmax regression model and predict the probability P_S(j) of each primary label based on the following formula:

P_S(j) = exp(θ_j^T x) / Σ_{i=1}^{n} exp(θ_i^T x)

where θ is the model parameter, n is the number of primary labels, and x is a sample.
S34: Compute the mean of the three obtained prediction probabilities with a voting mechanism:

P_j = (P_L(j) + P_SVM(j) + P_S(j)) / 3

Finally, select the category with the highest mean probability as the primary-label classification result.
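The voting step S34 can be sketched as follows; the three per-label probability vectors are hypothetical stand-ins for the outputs of the trained LightGBM, SVM and Softmax models.

```python
import numpy as np

def fuse_primary(p_lgbm, p_svm, p_softmax):
    """S34: average the three per-label probability vectors and pick
    the primary label with the highest mean probability."""
    p = (np.asarray(p_lgbm) + np.asarray(p_svm) + np.asarray(p_softmax)) / 3
    return int(np.argmax(p)), p

# Three hypothetical probability vectors over 3 primary labels.
label, p = fuse_primary([0.2, 0.5, 0.3], [0.1, 0.7, 0.2], [0.3, 0.4, 0.3])
print(label)  # 1
```

Because each input vector sums to 1, the averaged vector is itself a valid probability distribution over the primary labels.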
S4: Predict the secondary label with a convolutional neural network (CNN) model on the basis of the primary label. Since bank transaction flow data can reach the scale of hundreds of millions of records, traditional machine learning models are slow to train, whereas a neural network model shows strong learning ability on large data volumes. The specific CNN training procedure is as follows:
S41: Divide the data set into n parts according to the primary labels (n is the number of primary labels, and data_i is the data corresponding to the i-th label):

data = {data_1, data_2, ..., data_n}
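The split in S41 can be sketched as follows; the sample tuples (features, primary label, secondary label) are hypothetical.

```python
from collections import defaultdict

def split_by_primary_label(samples):
    """S41: group samples into one sub-training set per primary label,
    keeping (features, secondary label) pairs for the per-label CNNs."""
    buckets = defaultdict(list)
    for features, primary, secondary in samples:
        buckets[primary].append((features, secondary))
    return dict(buckets)

# Hypothetical samples: (features, primary label, secondary label).
data = [("t1", 0, 3), ("t2", 1, 5), ("t3", 0, 4)]
parts = split_by_primary_label(data)
print(sorted(parts))  # [0, 1]
```

Each resulting part data_i then trains its own CNN for secondary-label prediction.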
s42: will dataiInput into CNN for training.
Input layer: this layer receives the vectorized representation of the flow data. The vector consists of a Chinese character vector, a category vector and a numeric vector. The category variables (category fields of the transaction flow: transaction category, transaction behavior, etc.) are one-hot encoded to obtain the vector representation C_vec; under one-hot encoding the vector dimension equals the number of categories, the dimension of the present category is 1, and all other dimensions are 0. The Chinese fields of the input data are represented with pre-trained character vectors: the character vector char_vec_i of each character in the Chinese field is concatenated with the category vectors C_vec and the numeric vector to obtain the vector representation x_i of each character, i.e.

x_i = concat(char_vec_i; C_vec_1; ...; C_vec_n; num_1; ...; num_t)

where n denotes the number of category fields, t denotes the number of numeric fields, and num_i denotes a numeric field.
Finally, the input vector X of each flow data sample is obtained:

X = [x_1; x_2; ...; x_r]

where r is the number of characters in the Chinese field.
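The per-character input construction x_i = concat(char_vec_i; C_vec_1; ...; C_vec_n; num_1; ...; num_t) can be sketched as follows; the field sizes and values are hypothetical.

```python
import numpy as np

def one_hot(index, size):
    """One-hot category vector: the present category's dimension is 1."""
    v = np.zeros(size)
    v[index] = 1.0
    return v

def char_input_vector(char_vec, cat_vecs, nums):
    """x_i = concat(char_vec_i; C_vec_1; ...; C_vec_n; num_1; ...; num_t)."""
    parts = [np.asarray(char_vec, dtype=float)]
    parts += [np.asarray(c, dtype=float) for c in cat_vecs]
    parts += [np.array([v], dtype=float) for v in nums]
    return np.concatenate(parts)

# Hypothetical example: 4-dim character vector, one 3-class category
# field, and two numeric fields (amount, hour of day).
x = char_input_vector([0.1, 0.2, 0.3, 0.4], [one_hot(2, 3)], [98.5, 14.0])
print(x.shape)  # (9,)
```

Stacking r such vectors row by row yields the r x d input matrix X = [x_1; x_2; ...; x_r] consumed by the convolutional layer.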
Convolutional layer: this is the core layer of the CNN. The vectors from the input layer are convolved with kernels of size h x d to extract deeper features, giving the feature representation T = {t_1, t_2, ..., t_k}, where t_k denotes the column vector obtained by the k-th convolution kernel; kernel heights of 1, 2 and 3 are used.
Pooling layer: this layer compresses the features from the convolutional layer and extracts the main features (a dimension-reduction operation). The invention uses max pooling, i.e. the maximum of each column vector t_k from the previous step is taken as its most important feature.
Fully connected layer: the vector output by the pooling layer is fed into the fully connected layer, and a Softmax function yields the prediction probability set P = {p_1, p_2, ..., p_n}, where p_n is the probability of the n-th label and n is the number of labels to predict.
Prediction layer: the label with the highest probability in the set output by the fully connected layer is output as the prediction result:

Label = argmax(P)

where Label denotes the final predicted label and P the probability set.
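A minimal forward pass through the convolutional, pooling, fully connected and prediction layers can be sketched as follows. All shapes and the random weights are hypothetical, standing in for a trained model; the sketch only illustrates the data flow, not training.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_max_features(X, kernels):
    """Slide each h x d kernel over the r x d input and keep the maximum
    activation per kernel (convolutional layer + max pooling layer)."""
    r, d = X.shape
    feats = []
    for K in kernels:
        h = K.shape[0]
        acts = [np.sum(X[i:i + h] * K) for i in range(r - h + 1)]
        feats.append(max(acts))
    return np.array(feats)

def predict_secondary(X, kernels, W, b):
    """Fully connected layer + Softmax, then Label = argmax(P)."""
    z = conv_max_features(X, kernels) @ W + b
    p = np.exp(z - z.max())   # numerically stable Softmax
    p /= p.sum()
    return int(np.argmax(p)), p

# Hypothetical shapes: 6 characters x 4 dims, kernel heights 1, 2, 3,
# and 5 secondary labels.
X = rng.standard_normal((6, 4))
kernels = [rng.standard_normal((h, 4)) for h in (1, 2, 3)]
W, b = rng.standard_normal((3, 5)), rng.standard_normal(5)
label, p = predict_secondary(X, kernels, W, b)
print(p.shape)  # (5,)
```

One kernel per height is used here for brevity; a real TextCNN would use many kernels per height, with the pooled maxima concatenated before the fully connected layer.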
In total, n CNN models are trained, one for each of the n parts of the data.
S5: Correct erroneous prediction results, use them as new samples to form a training set, and feed it into the model, training the new model on the basis of the previous one.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the above implementation method can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation method. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (10)

1. A transaction flow processing method based on multi-model fusion is characterized by comprising the following steps:
s1: collecting a transaction flow sample to construct a training set, wherein the sample comprises transaction flow data and a first-level label and a second-level label corresponding to the transaction flow data;
s2: preprocessing the training set to obtain an input vector of each transaction running water sample;
S3: respectively feeding the input vectors into a light gradient boosting regression tree model, a support vector machine model and a logistic regression model for training, and predicting the primary label of the transaction flow to be classified by combining the three trained models;
S4: and on the basis of the primary label predicted in S3, predicting the secondary label of the transaction flow to be classified with a convolutional neural network model.
2. The transaction flow processing method based on multi-model fusion of claim 1, wherein the transaction flow data in S1 includes a plurality of fields, and the fields include name, remark, amount and transaction time.
3. The transaction flow processing method based on multi-model fusion of claim 2, wherein S2 specifically includes:
s21: removing special characters and stop words in the training set transaction flow data;
s22: performing word segmentation processing on each field of the transaction flow data;
s23: converting the words obtained after word segmentation into word vectors;
s24: accumulating all word vectors of each field and taking the average value to obtain a field vector;
s25: and splicing the field vectors of each sample to obtain an input vector set of each sample.
4. The transaction flow processing method based on multi-model fusion of claim 1, wherein erroneous prediction results are corrected and added to the training set as new samples, and model optimization is completed by repeating steps S2-S4.
5. The transaction flow processing method based on multi-model fusion of claim 1, wherein S3 specifically comprises:
S31: inputting the transaction flow data in the training set and the corresponding primary labels into a light gradient boosting regression tree model for training, and predicting, based on the trained model, the probability P_L(j) that the transaction flow to be classified belongs to each primary label, wherein j denotes the index of the primary label;
S32: inputting the transaction flow data in the training set and the corresponding primary labels into a support vector machine model for training, and predicting, based on the trained model, the probability P_SVM(j) that the transaction flow to be classified belongs to each primary label;
S33: inputting the transaction flow data in the training set and the corresponding primary labels into a logistic regression model for training, and predicting, based on the trained model, the probability P_S(j) that the transaction flow to be classified belongs to each primary label;
S34: calculating the probability mean P_j = (P_L(j) + P_SVM(j) + P_S(j)) / 3 for each primary label, and selecting the primary label with the largest mean probability as the primary label prediction result for the transaction flow to be classified.
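The fusion step S34 reduces to averaging the three per-label probability vectors and taking the label with the highest mean. A minimal sketch follows; the probability vectors and label names are mocked, where in practice P_L, P_SVM and P_S would come from the trained models of S31-S33.

```python
# S34 sketch: average the per-label probabilities of the three trained
# models and pick the primary label with the highest mean probability.
# The vectors below are mocked stand-ins for real model outputs.

def fuse(p_l, p_svm, p_s, labels):
    avg = [(a + b + c) / 3 for a, b, c in zip(p_l, p_svm, p_s)]
    best = max(range(len(labels)), key=lambda j: avg[j])
    return labels[best], avg

labels = ["salary", "shopping", "transfer"]   # hypothetical primary labels
p_l   = [0.2, 0.7, 0.1]   # P_L(j)   from the gradient boosting trees
p_svm = [0.3, 0.5, 0.2]   # P_SVM(j) from the support vector machine
p_s   = [0.1, 0.6, 0.3]   # P_S(j)   from logistic regression
label, avg = fuse(p_l, p_svm, p_s, labels)
```

Averaging gives each model equal weight; a weighted mean tuned on a validation set would be a natural refinement the claims do not rule out.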
6. The transaction flow processing method based on multi-model fusion of claim 1, wherein S4 specifically comprises:
S41: dividing the training set into a plurality of sub-training sets according to the primary labels in the training set;
S42: inputting the transaction flow data contained in each sub-training set and the corresponding secondary labels into a convolutional neural network for training, to obtain one convolutional neural network model per primary label;
S43: selecting the convolutional neural network model corresponding to the primary label output in S3 to complete prediction of the secondary label of the transaction flow to be classified.
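Steps S41-S43 train one secondary-label classifier per primary label and route each new sample to the model selected by the first stage. A schematic of that partition-and-route logic, with trivial majority-class stubs standing in for the trained convolutional networks and hypothetical label names:

```python
# S41-S43 sketch: partition the training set by primary label (S41),
# train one secondary model per partition (S42), then route a sample to
# the model for its predicted primary label (S43). The "models" here are
# most-frequent-label stubs standing in for the trained CNNs.
from collections import Counter, defaultdict

def train_per_primary(samples):
    """samples: list of (features, primary_label, secondary_label)."""
    buckets = defaultdict(list)                # S41: split by primary label
    for feats, primary, secondary in samples:
        buckets[primary].append(secondary)
    # S42: one "model" per primary label (majority-class stub)
    return {p: Counter(secs).most_common(1)[0][0] for p, secs in buckets.items()}

def predict_secondary(models, primary_label):
    return models[primary_label]               # S43: route to the right model

train = [
    (None, "shopping", "groceries"),
    (None, "shopping", "groceries"),
    (None, "shopping", "clothing"),
    (None, "transfer", "rent"),
]
models = train_per_primary(train)
sec = predict_secondary(models, "shopping")
```

The routing structure is the point: because each secondary model only ever sees samples of one primary label, its label space is small and its training data homogeneous.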
7. The transaction flow processing method based on multi-model fusion of claim 6, wherein the convolutional neural network model comprises:
an input layer for converting the transaction flow samples into neural network input vectors;
a convolutional layer for extracting text features from each neural network input vector;
a pooling layer for screening important features out of the text features;
a fully connected layer for feeding the important features to a classifier to obtain the probability that the transaction flow to be classified belongs to each secondary label; and
a prediction layer for outputting the secondary label with the maximum probability as the prediction result.
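The layer stack of claim 7 can be illustrated with a forward pass of a one-dimensional text CNN. Everything here is a toy: the word vectors, the two convolution filters, and the fully connected weights are made-up numbers, not the patent's trained parameters, and the secondary label names are hypothetical.

```python
# Claim 7 sketch: forward pass through input -> convolution -> max
# pooling -> fully connected -> softmax prediction, over a sequence of
# word vectors. All weights and labels are toy values.
import math

def conv1d(seq, kernel):
    """Convolutional layer: slide a width-k filter over the word vectors."""
    k = len(kernel)
    return [sum(kernel[i][d] * seq[t + i][d]
                for i in range(k) for d in range(len(seq[0])))
            for t in range(len(seq) - k + 1)]

def forward(seq, kernels, fc, labels):
    feats = [max(conv1d(seq, ker)) for ker in kernels]   # pooling: max over time
    logits = [sum(w * f for w, f in zip(row, feats))     # fully connected layer
              for row in fc]
    exps = [math.exp(z) for z in logits]                 # softmax classifier
    probs = [e / sum(exps) for e in exps]
    j = max(range(len(labels)), key=lambda i: probs[i])  # prediction layer
    return labels[j], probs

seq = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]     # input layer: 3 words, 2-dim
kernels = [[[1.0, 0.0], [0.0, 1.0]],           # two width-2 filters
           [[0.0, 1.0], [1.0, 0.0]]]
fc = [[1.0, 0.0], [0.0, 1.0]]                  # 2 secondary labels
label, probs = forward(seq, kernels, fc, ["groceries", "clothing"])
```

A production version would add multiple filter widths, nonlinearities and trained weights, but the data flow through the five layers is the same.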
8. A transaction flow processing system based on multi-model fusion, characterized by comprising:
a sample collection module for collecting transaction flow samples to construct a training set, wherein each sample comprises transaction flow data and a primary label and a secondary label corresponding to the transaction flow data;
a sample processing module for preprocessing the training set to obtain an input vector for each transaction flow sample;
a primary label prediction module for substituting the input vectors into the light gradient boosting regression tree, support vector machine and logistic regression models respectively for training, and completing prediction of the primary label of the transaction flow to be classified by majority voting over the output results of the three trained models;
and a secondary label prediction module for completing prediction of the secondary label of the transaction flow to be classified by using a convolutional neural network model, on the basis of the primary label predicted by the primary label prediction module.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the method according to any of claims 1 to 7 are carried out when the program is executed by the processor.
CN202011567495.8A 2020-12-25 2020-12-25 Transaction flow processing method and system based on multi-model fusion Pending CN112561530A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011567495.8A CN112561530A (en) 2020-12-25 2020-12-25 Transaction flow processing method and system based on multi-model fusion


Publications (1)

Publication Number Publication Date
CN112561530A (en) 2021-03-26

Family

ID=75033059

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011567495.8A Pending CN112561530A (en) 2020-12-25 2020-12-25 Transaction flow processing method and system based on multi-model fusion

Country Status (1)

Country Link
CN (1) CN112561530A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113052635A (en) * 2021-03-30 2021-06-29 北京明略昭辉科技有限公司 Population attribute label prediction method, system, computer device and storage medium
CN113065941A (en) * 2021-04-29 2021-07-02 中国银行股份有限公司 Automatic accounting method and device

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105468713A (en) * 2015-11-19 2016-04-06 西安交通大学 Multi-model fused short text classification method
CN107943865A (en) * 2017-11-10 2018-04-20 阿基米德(上海)传媒有限公司 It is a kind of to be suitable for more scenes, the audio classification labels method and system of polymorphic type
CN108509485A (en) * 2018-02-07 2018-09-07 深圳壹账通智能科技有限公司 Preprocess method, device, computer equipment and the storage medium of data
CN109460725A (en) * 2018-10-29 2019-03-12 苏州派维斯信息科技有限公司 Receipt consumption details content mergence and extracting method
CN109766358A (en) * 2018-12-17 2019-05-17 深圳壹账通智能科技有限公司 Billing data management method, device, computer equipment and storage medium
CN109886349A (en) * 2019-02-28 2019-06-14 成都新希望金融信息有限公司 A kind of user classification method based on multi-model fusion
CN109948668A (en) * 2019-03-01 2019-06-28 成都新希望金融信息有限公司 A kind of multi-model fusion method
US20190303877A1 (en) * 2018-03-30 2019-10-03 Microsoft Technology Licensing, Llc Analyzing pipelined data
WO2020001106A1 (en) * 2018-06-25 2020-01-02 阿里巴巴集团控股有限公司 Classification model training method and store classification method and device
CN110765114A (en) * 2019-09-20 2020-02-07 北京数衍科技有限公司 Transaction receipt data merging method
CN111259987A (en) * 2020-02-20 2020-06-09 民生科技有限责任公司 Method for extracting event main body based on BERT multi-model fusion
US20200302234A1 (en) * 2019-03-22 2020-09-24 Capital One Services, Llc System and method for efficient generation of machine-learning models
CN112036403A (en) * 2020-08-31 2020-12-04 合肥工业大学 Intelligent detection method for missing of bolt pin of power transmission tower based on attention mechanism
CN112084242A (en) * 2020-09-02 2020-12-15 深圳市铭数信息有限公司 Consumption information display method, device, terminal and medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LI Hengchao; LIN Hongfei; YANG Liang; XU Bo; WEI Xiaocong; ZHANG Shaowu; GULIZIRE Ainiwaer: "A two-level fusion algorithm framework for constructing user profiles", Computer Science, no. 01, 15 January 2018 (2018-01-15) *


Similar Documents

Publication Publication Date Title
CN107729309B (en) Deep learning-based Chinese semantic analysis method and device
CN110134757B (en) Event argument role extraction method based on multi-head attention mechanism
CN111966917B (en) Event detection and summarization method based on pre-training language model
CN111914558A (en) Course knowledge relation extraction method and system based on sentence bag attention remote supervision
CN109684626A (en) Method for recognizing semantics, model, storage medium and device
CN113011186B (en) Named entity recognition method, named entity recognition device, named entity recognition equipment and computer readable storage medium
CN110489523B (en) Fine-grained emotion analysis method based on online shopping evaluation
CN113312480B (en) Scientific and technological thesis level multi-label classification method and device based on graph volume network
CN110415071B (en) Automobile competitive product comparison method based on viewpoint mining analysis
CN114896388A (en) Hierarchical multi-label text classification method based on mixed attention
CN113946677B (en) Event identification and classification method based on bidirectional cyclic neural network and attention mechanism
CN112036184A (en) Entity identification method, device, computer device and storage medium based on BilSTM network model and CRF model
KR20200139008A (en) User intention-analysis based contract recommendation and autocomplete service using deep learning
CN112561530A (en) Transaction flow processing method and system based on multi-model fusion
US20220383120A1 (en) Self-supervised contrastive learning using random feature corruption
CN113051887A (en) Method, system and device for extracting announcement information elements
CN115392254A (en) Interpretable cognitive prediction and discrimination method and system based on target task
CN110674642B (en) Semantic relation extraction method for noisy sparse text
CN114880307A (en) Structured modeling method for knowledge in open education field
CN112862569B (en) Product appearance style evaluation method and system based on image and text multi-modal data
CN114416991A (en) Method and system for analyzing text emotion reason based on prompt
CN114091406A (en) Intelligent text labeling method and system for knowledge extraction
CN110969005A (en) Method and device for determining similarity between entity corpora
CN113449103A (en) Bank transaction flow classification method and system integrating label and text interaction mechanism
CN109635289B (en) Entry classification method and audit information extraction method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination