CN113449103A - Bank transaction flow classification method and system integrating label and text interaction mechanism - Google Patents


Info

Publication number
CN113449103A
Authority
CN
China
Prior art keywords
label
data
transaction flow
model
des
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110119998.7A
Other languages
Chinese (zh)
Other versions
CN113449103B (en
Inventor
李振
张刚
尹正
鲍东岳
刘蓓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Minsheng Science And Technology Co ltd
Original Assignee
Minsheng Science And Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Minsheng Science And Technology Co ltd filed Critical Minsheng Science And Technology Co ltd
Priority to CN202110119998.7A priority Critical patent/CN113449103B/en
Publication of CN113449103A publication Critical patent/CN113449103A/en
Application granted granted Critical
Publication of CN113449103B publication Critical patent/CN113449103B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/381 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using identifiers, e.g. barcodes, RFIDs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Library & Information Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Quality & Reliability (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a bank transaction flow classification method and system incorporating a label and text interaction mechanism, and relates to the field of financial business processing. The method comprises the following steps: labeling and cleaning transaction flow data to construct the training data required for model training; iteratively optimizing, on the training data, a neural network model that incorporates an interaction mechanism; calling the optimal model to label new transaction flows; and outputting the corresponding flow labels and automatically pushing weekly, monthly, or quarterly consumption reports according to the user's settings. The system can label a user's transaction flows in real time and show the user the expenditure proportions for a specified period, so that the user can understand his or her own consumption structure comprehensively and systematically. Compared with a traditional classification system, the system adds an interaction layer that explicitly calculates the degree of matching between labels and text, which guides the subsequent label prediction.

Description

Bank transaction flow classification method and system integrating label and text interaction mechanism
Technical Field
The invention relates to the field of financial business processing, in particular to a bank transaction flow classification method and system integrating a label and text interaction mechanism.
Background
Financial technology is developing rapidly, and major financial institutions are actively promoting online financial services and working to improve traditional financial services and products. Delivering financial services through mobile terminals, such as mobile banking, has changed how financial functions are implemented, has changed people's behaviors and habits in both offline and online financial transactions, and has become a key measure for banks seeking to take the lead and capture future markets. Online transactions have now penetrated everyday life, and the volume of online transaction records has grown explosively. Real-world payments, for example, consist of large-value, low-frequency payments and small-value, high-frequency payments. From this transaction information, technologies such as big data and artificial intelligence can derive valuable new data, such as the characteristics of terminal merchants or of consumers.
People tend to pay close attention to information such as the type of each transaction and its share of their total spending. According to relevant statistics, checking income and expenditure details has become one of the functions people use most frequently when logging in to a mobile banking app. Displaying richer transaction information based on user needs, for example by labeling each transaction with a consumption category such as diet, travel, or clothing, lets users conveniently check their spending in each category. Presenting a personalized consumption report at the same time can greatly improve the user experience and increase user stickiness. At present, however, the transaction details in most banking apps show only basic information such as the transaction time, counterparty, and amount. They do not output the consumption type of each transaction or the user's consumption structure over a specific period, and they lack an intuitive, systematic presentation of the user's consumption behavior.
Disclosure of Invention
In view of the above problems, the invention provides a bank transaction flow classification method and system incorporating a label and text interaction mechanism. The system can label a user's transaction flows in real time and show the user the expenditure proportions for a specified period, so that the user can understand his or her own consumption structure comprehensively and systematically. Compared with a traditional classification system, the system adds an interaction layer that explicitly calculates the degree of matching between labels and text, which guides the subsequent label prediction.
According to a first aspect of the present invention, there is provided a bank transaction flow classification method incorporating a label and text interaction mechanism, the method comprising the following steps:
a data construction step: labeling and cleaning transaction flow data to construct the training data required for model training;
a model training step: iteratively optimizing, on the training data, a neural network model that incorporates an interaction mechanism;
a label prediction step: calling the optimal model to label new transaction flows;
a result display step: outputting the corresponding flow labels and automatically pushing weekly, monthly, or quarterly consumption reports according to user settings.
Further, the data construction step specifically includes:
S1: constructing a data set by labeling and cleaning the transaction flow data;
S2: preprocessing the cleaned transaction flow data and removing interference words to obtain the training data required for model training.
Further, step S1 specifically includes:
S1.1: labeling the transaction flow data based on regular expressions, thereby obtaining labeled data with corresponding labels;
S1.2: cleaning the labeled data by de-duplicating it and deleting labeled data whose null values exceed a certain threshold;
S1.3: obtaining the description corresponding to each label, and storing the labels and their descriptions in dictionary form: $D = \{l_1: des_1;\ l_2: des_2;\ \dots;\ l_c: des_c\}$, where $c$ is the number of label classes, $l_q$ is the $q$-th class label, and $des_q$ is the description of label $l_q$ ($1 \le q \le c$).
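Steps S1.1 to S1.3 can be illustrated with a minimal Python sketch. The field names (`merchant`, `summary`, `remark`), the two rules, the label descriptions, and the null threshold are all hypothetical placeholders chosen for illustration; the patent does not specify them.

```python
import re

# Hypothetical rules (pattern -> label); real rules would be bank-specific.
LABEL_RULES = [
    (re.compile(r"supermarket|convenience store", re.IGNORECASE), "supermarket_convenience"),
    (re.compile(r"restaurant|noodle|cafe", re.IGNORECASE), "diet"),
]

# S1.3: label dictionary D = {l_q: des_q}; the descriptions are illustrative.
LABEL_DESCRIPTIONS = {
    "supermarket_convenience": "supermarket convenience store grocery",
    "diet": "restaurant noodle cafe takeaway",
}

FIELDS = ("merchant", "summary", "remark")  # assumed string fields of a record
MAX_NULL_FIELDS = 1  # assumed threshold: drop records with more empty fields


def label_record(record):
    """S1.1: return the first label whose pattern matches the record text, else None."""
    text = " ".join(record.get(f) or "" for f in FIELDS)
    for pattern, label in LABEL_RULES:
        if pattern.search(text):
            return label
    return None


def build_dataset(records):
    """S1.1 + S1.2: label by rule, then drop unlabeled, sparse, and duplicate records."""
    seen, dataset = set(), []
    for record in records:
        label = label_record(record)
        if label is None:
            continue
        null_count = sum(1 for f in FIELDS if not record.get(f))
        if null_count > MAX_NULL_FIELDS:
            continue
        key = tuple(record.get(f) or "" for f in FIELDS)
        if key in seen:
            continue
        seen.add(key)
        dataset.append({**record, "label": label})
    return dataset


records = [
    {"merchant": "AA Supermarket Co.", "summary": "payment", "remark": ""},
    {"merchant": "AA Supermarket Co.", "summary": "payment", "remark": ""},  # duplicate
    {"merchant": "BBB Noodle Shop", "summary": "", "remark": ""},  # too many nulls
    {"merchant": "AA", "summary": "payment", "remark": "weekly"},  # no rule matches
]
dataset = build_dataset(records)
```

Records that no rule matches are simply left out of the training set here; they are the cases the trained model is later expected to handle.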
Further, step S2 specifically includes:
S2.1: removing stop words from the string fields of the transaction flow data;
S2.2: concatenating the string fields obtained in step S2.1 and performing word segmentation, so that the content of each transaction flow data record is represented as $S = \{s_1, s_2, s_3, \dots, s_n\}$, where $s_i$ is the $i$-th word of the transaction flow data and $n$ is the number of words in the transaction flow data field ($1 \le i \le n$);
S2.3: performing word segmentation on the description field corresponding to each label to obtain $des_q = \{w_1, w_2, w_3, \dots, w_m\}$, where $w_j$ is the $j$-th word of the description field and $m$ is the number of words in the description field ($1 \le j \le m$);
the segmented transaction flow data fields and the segmented description fields together serve as the training data.
Further, the stop words include, but are not limited to, special characters, city names, and other information irrelevant to labeling.
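The preprocessing in steps S2.1 to S2.3 might look like the following sketch. The stop-word list is an assumption, and a whitespace tokenizer stands in for a real Chinese word segmenter, which the patent does not name.

```python
import re

# Assumed stop words (city names and uninformative tokens); special characters
# are handled separately by a character-class pattern.
STOP_PATTERN = re.compile(r"\b(?:beijing|shanghai|co|ltd)\b", re.IGNORECASE)
SPECIAL_CHARS = re.compile(r"[^\w\s]")  # anything that is not a word char or whitespace


def strip_stop_words(field):
    """S2.1: remove special characters and stop words from one string field."""
    no_special = SPECIAL_CHARS.sub(" ", field)
    return STOP_PATTERN.sub(" ", no_special)


def tokenize(text):
    """Whitespace tokenizer standing in for a real Chinese word segmenter."""
    return text.lower().split()


def preprocess_record(fields):
    """S2.2: concatenate the cleaned string fields and segment into S = {s_1..s_n}."""
    cleaned = [strip_stop_words(f) for f in fields if f]
    return tokenize(" ".join(cleaned))


def preprocess_descriptions(label_descriptions):
    """S2.3: segment each label description into des_q = {w_1..w_m}."""
    return {
        label: tokenize(SPECIAL_CHARS.sub(" ", des))
        for label, des in label_descriptions.items()
    }


S = preprocess_record(["Beijing AA Supermarket Co., Ltd.", "##payment##", None])
DES = preprocess_descriptions({"diet": "restaurant, noodle, cafe"})
```

Here `S` and `DES` together play the role of the training data described above.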
Further, the model training step specifically includes:
training a classification model incorporating the label and text interaction mechanism on the training data, using word-level information to obtain the degree of matching between each word in the text and each category through vector dot products, and applying the matching degrees in the final prediction layer.
Further, the specific training process of the classification model incorporating the label and text interaction mechanism is as follows:
an input layer: the words of the training data are mapped to continuous vectors using the word2vec model.
The vector representation of each transaction flow data field is:
$S_e = \{e_{s_1}, e_{s_2}, \dots, e_{s_n}\}$,
where $e_{s_i} \in \mathbb{R}^{d}$ is the vector representation of the $i$-th word $s_i$ of the transaction flow data, $\mathbb{R}$ denotes the real number space, and $d$ is the dimension of the vector.
The vector representation of the description field $des_q$ corresponding to label $l_q$ is:
$des_q^{e} = \{e_{w_1}, e_{w_2}, \dots, e_{w_m}\}$,
where $e_{w_j} \in \mathbb{R}^{d}$ is the vector representation of the $j$-th word $w_j$ of the description field. The vector representation matrix of all label descriptions is:
$DES_e = \{des_1^{e}, des_2^{e}, \dots, des_c^{e}\}$, $DES_e \in \mathbb{R}^{c \times m \times d}$,
where $\mathbb{R}^{c \times m \times d}$ indicates that $DES_e$ takes values in the real number space and has shape $c \times m \times d$;
an encoding layer: $S_e$ and $DES_e$ are each input into a gated recurrent unit (GRU) for encoding, where $c$ GRU encoders are required to encode the $c$ classes of labels separately, yielding the encoded representations of the transaction flow and of the label descriptions:
$S_h = \mathrm{GRU}(S_e)$, $DES_h = \{\mathrm{GRU}_1(des_1^{e}), \dots, \mathrm{GRU}_c(des_c^{e})\}$,
$S_h \in \mathbb{R}^{n \times d}$, $DES_h \in \mathbb{R}^{c \times d}$, where $\mathbb{R}^{n \times d}$ indicates that $S_h$ takes values in the real number space and has shape $n \times d$, and $\mathbb{R}^{c \times d}$ indicates that $DES_h$ takes values in the real number space and has shape $c \times d$.
When $S$ is encoded, the output at every time step is retained; when each label description $des_q$ is encoded, only the last hidden state is retained:
$S_h = \{h^{s}_1, h^{s}_2, \dots, h^{s}_n\}$,
$DES_h = \{h^{des_1}_m, h^{des_2}_m, \dots, h^{des_c}_m\}$;
an interaction layer: the degree of matching between each word $s_i$ and each label description $des_q$ is calculated, and finer-grained classification clues are obtained from these matching degrees. The matching degree is calculated as:
$Q = DES_h \times S_h^{T}$, $Q \in \mathbb{R}^{c \times n}$,
where $Q$ is obtained by multiplying the two matrices and $T$ denotes matrix transposition;
a fully connected layer: $Q$ is input to a fully connected layer, $O$ is obtained by applying the ReLU activation function, and the probability $P$ that the transaction flow data $S$ belongs to each category is obtained with the softmax function:
$O = \mathrm{ReLU}(Q \times W + b)$, $W \in \mathbb{R}^{n \times 1}$,
$P = \mathrm{softmax}(O) = \{p_1, p_2, \dots, p_c\}$,
where $W$ is a parameter matrix and $b$ is a bias, both of which are model parameters to be learned;
an output layer: the label with the maximum probability is output as the prediction result:
$Label_{pre} = \operatorname{argmax}(P)$;
wherein the model is optimized based on the following loss function:
$Loss = -\sum_{i} \sum_{j=1}^{c} y_i^{j} \log p_i^{j}$,
where $y_i^{j}$ is the value of the $j$-th dimension of the correct label of the $i$-th transaction flow, and $p_i^{j}$ is the probability, predicted by the model, that the label of the $i$-th transaction flow is $j$.
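The interaction, fully connected, and output layers described above can be traced with a small pure-Python sketch. The encoded matrices `S_h` and `DES_h` and the parameters `W` and `b` below are toy values chosen by hand, not trained weights, and the helper names are illustrative.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(values):
    shift = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def predict(S_h, DES_h, W, b):
    """Interaction layer, fully connected layer, and output layer in sequence."""
    Q = matmul(DES_h, transpose(S_h))  # c x n matching degrees
    O = [max(0.0, row[0] + b) for row in matmul(Q, W)]  # ReLU(Q x W + b), one score per label
    P = softmax(O)
    label = max(range(len(P)), key=P.__getitem__)  # argmax(P)
    return Q, P, label


# Toy example: n = 2 words, c = 2 labels, d = 2 dimensions.
S_h = [[1.0, 0.0], [0.5, 0.5]]    # n x d, one row per word
DES_h = [[1.0, 0.0], [0.0, 1.0]]  # c x d, one row per label description
W = [[1.0], [1.0]]                # n x 1
b = 0.0

Q, P, label = predict(S_h, DES_h, W, b)
```

Each entry `Q[q][i]` is the dot product between the encoding of label description `q` and the encoding of word `i`, which is what lets individual words act as classification clues.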
According to a second aspect of the present invention, there is provided a bank transaction flow classification device incorporating a label and text interaction mechanism, wherein the device operates based on the method according to any one of the above aspects and comprises the following modules:
a data construction module, used for labeling and cleaning transaction flow data and constructing the training data required for model training;
a model training module, used for iteratively optimizing, on the training data, a neural network model that incorporates an interaction mechanism;
a label prediction module, used for calling the optimal model and labeling new transaction flows;
a result display module, used for outputting the corresponding flow labels and automatically pushing weekly, monthly, or quarterly consumption reports according to user settings.
According to a third aspect of the present invention, there is provided a bank transaction flow classification system incorporating a label and text interaction mechanism, the system comprising:
a processor and a memory for storing executable instructions;
wherein the processor is configured to execute the executable instructions to perform the bank transaction flow classification method incorporating a label and text interaction mechanism according to any one of the above aspects.
According to a fourth aspect of the present invention, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the bank transaction flow classification method incorporating a label and text interaction mechanism according to any one of the above aspects.
The beneficial effects of the invention are as follows:
1) The data set is constructed automatically based on regular expressions, saving a great deal of manpower and material resources. Applying a neural network model requires a large amount of labeled data, and the traditional approach of manual labeling is time-consuming and labor-intensive. The invention obtains labeled data with regular expressions and provides training data for the model, which makes training a neural network model feasible.
2) Using a neural network model to predict the labels of bank transaction flows improves the generalization of the system. Chinese expressions are highly varied and rules cannot cover every case, whereas a neural network is self-learning, adaptive, and nonlinear, so it can learn complex semantic associations in Chinese and achieve higher precision and recall in real-world classification tasks.
3) Word-level information and label information are fully utilized, improving the convergence speed and accuracy of the model. In a traditional classification model, the overall representation of the text largely determines the classification, which neglects word-level information that can provide effective classification clues; for example, "rice noodles" strongly suggests the diet category. Moreover, whereas a traditional classification model uses label information only at the prediction layer, the model adopted by the system also uses label information at the interaction layer, and the interaction results of this layer further guide the optimization direction of the model and accelerate its convergence.
4) The invention can label each of a user's transactions with a specific consumption type in real time and can also push consumption reports to the user periodically, with the report period set by the user. From these reports the user can intuitively understand his or her consumption structure, which provides a basis for future spending and for adjusting consumption behavior. Other usage scenarios can be derived from the consumption labels, such as user profiling and product recommendation: the user's interests can be inferred from his or her consumption structure so that relevant information or financial products can be recommended in a targeted manner.
Drawings
FIG. 1 shows a block diagram of the bank transaction flow classification system incorporating a label and text interaction mechanism according to the invention;
FIG. 2 shows a structure diagram of the bank transaction flow classification model incorporating a label and text interaction mechanism according to the invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. The following description refers to the accompanying drawings, in which like numerals refer to the same or similar elements throughout the different views, unless otherwise specified. The implementations described in the following exemplary examples do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The terms "first," "second," and the like in the description and in the claims of the present disclosure are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein.
Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
"A plurality" means two or more.
"And/or": as used in this disclosure, the term "and/or" merely describes an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" may mean: A exists alone, both A and B exist, or B exists alone.
Examples
The method comprises the following steps:
S1: constructing a data set, mainly by labeling and cleaning the data.
S1.1: A bank transaction flow generally contains information related to the transaction content, such as the merchant name, customer introduction, and bank remarks. The data is labeled based on regular expressions (for example, when the word "supermarket" appears, the transaction flow "X Bao-AA Supermarket Co., Ltd." is labeled with the category "supermarket and convenience store"), so that a large amount of labeled data is obtained. Regular expressions cannot cover all cases, however (for example, when only "AA" appears without "supermarket"), so a neural network model is also needed to learn the latent information in the samples.
S1.2: Data cleaning. The data is de-duplicated (which increases sample diversity for a training set of a given size), and records with many null values are deleted.
S1.3: obtaining the description corresponding to each label (for example, {'convenience services': 'shared power bank, housekeeping service, moving, express courier, repair'}), and storing the labels and their descriptions in dictionary form: $D = \{l_1: des_1;\ l_2: des_2;\ \dots;\ l_c: des_c\}$, where $c$ is the number of label classes, $l_q$ is the $q$-th class label, and $des_q$ is the description of label $l_q$ ($1 \le q \le c$).
A regular expression describes a pattern for matching character strings. It can be used to check whether a string contains a certain substring, to replace matching substrings, or to extract substrings that meet a given condition from a string.
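The three uses of regular expressions mentioned above (checking, replacing, and extracting) can be shown with Python's standard `re` module. The flow string and the patterns are made up for illustration.

```python
import re

flow = "X Bao-AA Supermarket Co., order 20210128"  # hypothetical flow text

# Check whether the string contains a given substring pattern.
contains = re.search(r"Supermarket", flow) is not None

# Replace every run of digits with a placeholder token.
masked = re.sub(r"\d+", "<NUM>", flow)

# Extract the merchant name between the hyphen and " Co.".
match = re.search(r"-(.+?) Co\.", flow)
merchant_name = match.group(1) if match else None
```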
S2: Data preprocessing. The cleaned data is preprocessed by removing stop words, performing word segmentation, and so on.
S2.1: removing stop words from the string fields of the transaction flow. Stop words include information irrelevant to labeling, such as special characters and city or place names.
S2.2: concatenating the string fields and performing word segmentation, so that the final content of the fields is represented as $S = \{s_1, s_2, s_3, \dots, s_n\}$, where $s_i$ is the $i$-th word of the transaction flow data and $n$ is the number of words in the transaction flow data field ($1 \le i \le n$).
S2.3: performing word segmentation on the description corresponding to each label to obtain $des_q = \{w_1, w_2, w_3, \dots, w_m\}$, where $w_j$ is the $j$-th word of the description field and $m$ is the number of words in the description field ($1 \le j \le m$).
S3: A classification model incorporating a label and text interaction mechanism is adopted. It uses finer-grained, word-level information to obtain the degree of matching between each word in the text and each category through vector dot products, and applies the matching degrees in the final prediction layer. (For "BBB Glutinous Rice Ball Shop", for example, "glutinous rice ball" strongly suggests the diet category, and as the model learns, the matching degree between "glutinous rice ball" and the diet category is the largest among all categories.) The invention uses GRUs to encode the text and the labels. This network unit can capture sequential information and mitigates the vanishing and exploding gradient problems, and it has been widely used for word-level encoding in recent years. The specific training process of the model is as follows:
S3.1 Input layer: the word2vec model is used to map words to continuous vectors. word2vec converts natural-language words into dense vectors that a computer can process. The relative positions of the vectors in the space generally reflect the semantic relatedness of the words, that is, words with similar meanings are mapped to nearby positions in the vector space.
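The claim that similar words land near each other can be checked with cosine similarity. The three-dimensional vectors below are hand-written toy values, not the output of a trained word2vec model, and the vocabulary is hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy embedding table standing in for a trained word2vec model.
EMBEDDINGS = {
    "noodle": [0.9, 0.1, 0.0],
    "dumpling": [0.8, 0.2, 0.1],
    "taxi": [0.0, 0.1, 0.9],
}


def embed(words):
    """Map a segmented text S = {s_1..s_n} to its vector sequence S_e."""
    return [EMBEDDINGS[w] for w in words]


S_e = embed(["noodle", "taxi"])
sim_food = cosine(EMBEDDINGS["noodle"], EMBEDDINGS["dumpling"])
sim_other = cosine(EMBEDDINGS["noodle"], EMBEDDINGS["taxi"])
```

With these toy values the two food words are far more similar to each other than either is to "taxi", which is the property the input layer relies on.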
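The final representation of the string is:
$S_e = \{e_{s_1}, e_{s_2}, \dots, e_{s_n}\}$,
where $e_{s_i} \in \mathbb{R}^{d}$ is the vector representation of the $i$-th word $s_i$ of the transaction flow data and $d$ is the dimension of the vector. As in S2.3, $des_q = \{w_1, w_2, w_3, \dots, w_m\}$, where $w_j$ is the $j$-th word of the description field and $m$ is the number of words in the description field ($1 \le j \le m$).
The vector representation of the description field $des_q$ corresponding to label $l_q$ is:
$des_q^{e} = \{e_{w_1}, e_{w_2}, \dots, e_{w_m}\}$,
where $e_{w_j}$ is the vector representation of the $j$-th word $w_j$ of the description field. The final representation matrix of all labels (where $c$ is the number of label classes) is:
$DES_e = \{des_1^{e}, des_2^{e}, \dots, des_c^{e}\}$, $DES_e \in \mathbb{R}^{c \times m \times d}$,
where $\mathbb{R}^{c \times m \times d}$ indicates that $DES_e$ takes values in the real number space and has shape $c \times m \times d$;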
S3.2 Encoding layer: $S_e$ and $DES_e$ are each input into GRUs for encoding, where $c$ GRU encoders are required to encode the $c$ labels separately. The representations of the string and of the labels are finally obtained:
$S_h = \mathrm{GRU}(S_e)$,
$DES_h = \{\mathrm{GRU}_1(des_1^{e}), \dots, \mathrm{GRU}_c(des_c^{e})\}$,
$S_h \in \mathbb{R}^{n \times d}$, $DES_h \in \mathbb{R}^{c \times d}$, where $\mathbb{R}^{n \times d}$ indicates that $S_h$ takes values in the real number space and has shape $n \times d$, and $\mathbb{R}^{c \times d}$ indicates that $DES_h$ takes values in the real number space and has shape $c \times d$.
When $S$ is encoded, the output at every time step is retained; when each label description $des_q$ is encoded, only the last hidden state is retained:
$S_h = \{h^{s}_1, h^{s}_2, \dots, h^{s}_n\}$,
$DES_h = \{h^{des_1}_m, h^{des_2}_m, \dots, h^{des_c}_m\}$;
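The difference between keeping every time step for the text and keeping only the last hidden state for each label description can be seen in a small pure-Python GRU. This is a sketch under assumptions: the gate equations follow one common GRU formulation, the weights are random rather than trained, the hidden size is set equal to the input dimension $d$ so that the shapes match those stated above, and all function names are illustrative.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, U, b, x, h):
    """Compute W.x + U.h + b for one gate, row by row."""
    return [
        sum(w * xi for w, xi in zip(W[k], x))
        + sum(u * hi for u, hi in zip(U[k], h))
        + b[k]
        for k in range(len(b))
    ]


def init_gru(d, rng):
    """Random (W, U, b) for the update (z), reset (r), and candidate (h) gates."""
    def mat():
        return [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(d)]
    return {gate: (mat(), mat(), [0.0] * d) for gate in ("z", "r", "h")}


def gru_step(params, x, h):
    z = [sigmoid(v) for v in affine(*params["z"], x, h)]
    r = [sigmoid(v) for v in affine(*params["r"], x, h)]
    gated = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(v) for v in affine(*params["h"], x, gated)]
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def run_gru(params, sequence):
    """Return the hidden state after every time step."""
    h = [0.0] * len(sequence[0])
    states = []
    for x in sequence:
        h = gru_step(params, x, h)
        states.append(h)
    return states


rng = random.Random(0)
d, n, m, c = 3, 4, 2, 2  # toy sizes
S_e = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
DES_e = [[[rng.uniform(-1, 1) for _ in range(d)] for _ in range(m)] for _ in range(c)]

text_gru = init_gru(d, rng)
label_grus = [init_gru(d, rng) for _ in range(c)]  # one encoder per label class

S_h = run_gru(text_gru, S_e)  # n x d: every time step kept
DES_h = [run_gru(label_grus[q], DES_e[q])[-1] for q in range(c)]  # c x d: last state only
```

Keeping all $n$ states for the text is what makes the later word-by-label matching possible, while each label description is summarized by a single vector.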
S3.3 Interaction layer: the degree of matching between each word $s_i$ and each label description $des_q$ is calculated, and finer-grained classification clues are obtained from these matching degrees. The matching degree is calculated as:
$Q = DES_h \times S_h^{T}$, $Q \in \mathbb{R}^{c \times n}$,
where $Q$ is obtained by multiplying the two matrices and $T$ denotes matrix transposition;
S3.4 Fully connected layer: $Q$ is input to a fully connected layer, $O$ is obtained by applying the ReLU activation function, and the probability $P$ that the transaction flow data $S$ belongs to each category is obtained with the softmax function:
$O = \mathrm{ReLU}(Q \times W + b)$, $W \in \mathbb{R}^{n \times 1}$,
$P = \mathrm{softmax}(O) = \{p_1, p_2, \dots, p_c\}$,
S3.5 Output layer: the label with the maximum probability is output as the prediction result:
$Label_{pre} = \operatorname{argmax}(P)$;
wherein the model is optimized based on the following loss function:
$Loss = -\sum_{i} \sum_{j=1}^{c} y_i^{j} \log p_i^{j}$,
where $y_i^{j}$ is the value of the $j$-th dimension of the correct label of the $i$-th transaction flow, and $p_i^{j}$ is the probability, predicted by the model, that the label of the $i$-th transaction flow is $j$;
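As a worked illustration of the loss, the sketch below assumes it is the standard cross-entropy between one-hot correct labels and predicted probabilities, which is consistent with the two quantities defined above but is an assumption here. The small `eps` term and the example values are also additions for illustration.

```python
import math


def cross_entropy(y_true, y_pred, eps=1e-12):
    """Sum over samples i and classes j of -y_i^j * log(p_i^j).

    eps guards against log(0); it is a numerical convenience, not part of the formula.
    """
    total = 0.0
    for y_i, p_i in zip(y_true, y_pred):
        total -= sum(y * math.log(p + eps) for y, p in zip(y_i, p_i))
    return total


# Two transaction flows, three label classes, one-hot correct labels.
y_true = [[1, 0, 0], [0, 1, 0]]
confident = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]]  # mostly correct predictions
mistaken = [[0.2, 0.4, 0.4], [0.6, 0.2, 0.2]]     # mostly wrong predictions

loss_confident = cross_entropy(y_true, confident)
loss_mistaken = cross_entropy(y_true, mistaken)
```

Only the probability assigned to the correct class contributes to each sample's term, so predictions that put more mass on the correct label yield a lower loss.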
S4: displaying the label prediction results to the user, and pushing weekly, monthly, or quarterly consumption reports according to the period set by the user.
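The periodic report in S4 amounts to grouping labeled transactions by period and computing each label's share of spending. The sketch below is one possible way to do this; the tuple layout `(date, label, amount)`, the period key formats, and the sample data are assumptions.

```python
from collections import defaultdict
from datetime import date


def period_key(day, period):
    """Map a date to the identifier of its week, month, or quarter."""
    if period == "week":
        iso = day.isocalendar()  # (ISO year, ISO week, ISO weekday)
        return f"{iso[0]}-W{iso[1]:02d}"
    if period == "month":
        return f"{day.year}-{day.month:02d}"
    if period == "quarter":
        return f"{day.year}-Q{(day.month - 1) // 3 + 1}"
    raise ValueError(f"unsupported period: {period}")


def spending_report(transactions, period):
    """Return {period: {label: share of that period's total spending}}."""
    totals = defaultdict(lambda: defaultdict(float))
    for day, label, amount in transactions:
        totals[period_key(day, period)][label] += amount
    report = {}
    for key, by_label in totals.items():
        period_total = sum(by_label.values())
        report[key] = {label: amt / period_total for label, amt in by_label.items()}
    return report


# Hypothetical labeled transactions: (date, predicted label, amount).
transactions = [
    (date(2021, 1, 5), "diet", 30.0),
    (date(2021, 1, 20), "travel", 70.0),
    (date(2021, 2, 3), "diet", 50.0),
    (date(2021, 4, 1), "diet", 20.0),
]
monthly = spending_report(transactions, "month")
quarterly = spending_report(transactions, "quarter")
```

A user who selects a monthly report would see, for January 2021 in this toy data, 30% of spending on diet and 70% on travel.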
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the above implementation method can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation method. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and various modifications can be made by those skilled in the art without departing from the spirit and scope of the present invention as defined in the appended claims.

Claims (10)

1. A bank transaction flow classification method integrating a label and text interaction mechanism, characterized by comprising the following steps:
a data construction step: labeling and cleaning transaction flow data to construct the training data required for model training;
a model training step: adopting a neural network model that incorporates the interaction mechanism, and continuously optimizing the neural network model based on the training data;
a label prediction step: calling the optimized neural network model to label new transaction flows;
a result display step: outputting the corresponding transaction flow label, and automatically pushing weekly, monthly, or quarterly consumption reports according to the user's settings.
2. The bank transaction flow classification method according to claim 1, wherein the data construction step specifically comprises:
S1: constructing a data set by labeling and cleaning the transaction flow data;
S2: preprocessing the cleaned transaction flow data and removing interference words to obtain the training data required for model training.
3. The bank transaction flow classification method according to claim 2, wherein step S1 specifically comprises:
S1.1: labeling the transaction flow data based on regular-expression patterns, thereby obtaining labeled data with corresponding labels;
S1.2: cleaning the labeled data by de-duplicating it and deleting labeled data whose null values exceed a certain threshold;
S1.3: obtaining the description corresponding to each label, and storing the labels and their descriptions in dictionary form: $D = \{l_1 : des_1;\ l_2 : des_2;\ \dots;\ l_c : des_c\}$, where $c$ is the number of label classes, $l_q$ is the q-th class label, and $des_q$ is the description of the q-th class label $l_q$ ($1 \le q \le c$).
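Steps S1.1 to S1.3 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the regular-expression rules, the field name `summary`, the label names, the label descriptions, and the 0.5 null-ratio threshold are all hypothetical, and the threshold is interpreted here as a share of empty fields per record.

```python
import re

# Hypothetical labeling rules (S1.1): first matching pattern wins.
RULES = [
    ("dining", re.compile(r"restaurant|cafe|takeaway", re.I)),
    ("transport", re.compile(r"metro|taxi|bus", re.I)),
]

# Hypothetical label dictionary D = {l_q: des_q} (S1.3).
LABEL_DESCRIPTIONS = {
    "dining": "food and beverage spending",
    "transport": "public transport and taxi fares",
}


def label_record(record, text_field="summary"):
    """Return the first label whose pattern matches the text field, else None."""
    text = record.get(text_field) or ""
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return None


def clean(records, max_null_ratio=0.5):
    """De-duplicate records and drop those with too many empty fields (S1.2)."""
    seen, kept = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))  # field values must be hashable
        if key in seen:
            continue
        seen.add(key)
        nulls = sum(1 for v in rec.values() if v is None or v == "")
        if nulls / max(len(rec), 1) > max_null_ratio:
            continue
        kept.append(rec)
    return kept


def build_labeled_data(records):
    """Label records with the regex rules, then clean the labeled data."""
    labeled = []
    for rec in records:
        label = label_record(rec)
        if label is not None:
            labeled.append({**rec, "label": label})
    return clean(labeled)
```

Records that match no rule are simply left out of the labeled set in this sketch; a real pipeline might route them to manual review instead.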
4. The bank transaction flow classification method according to claim 2, wherein step S2 specifically comprises:
S2.1: removing stop words from the character-string fields in the transaction flow data;
S2.2: concatenating the character-string fields obtained in step S2.1 and performing word segmentation, the content of each transaction flow data field being represented as $S = \{s_1, s_2, s_3, \dots, s_n\}$, where $s_i$ is the i-th word of the transaction flow data and $n$ is the number of words in the transaction flow data field ($1 \le i \le n$);
S2.3: performing word segmentation on the description field corresponding to each label to obtain $des_q = \{w_1, w_2, w_3, \dots, w_m\}$, where $w_j$ is the j-th word of the description field and $m$ is the number of words in the description field ($1 \le j \le m$),
wherein each segmented transaction flow data field and each segmented description field together serve as the training data.
5. The bank transaction flow classification method according to claim 4, wherein the stop words include, but are not limited to, special characters, city names, and other information irrelevant to labeling.
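The preprocessing of steps S2.1 to S2.3 can be sketched as follows. This is an illustrative sketch only: the stop-word set and field names are hypothetical, and whitespace splitting stands in for a real word segmenter, which Chinese transaction text would require.

```python
import re

# Hypothetical stop words: city names that carry no labeling signal.
STOP_WORDS = {"beijing", "shanghai"}

# Special characters: anything that is neither a word character nor whitespace.
SPECIAL_CHARS = re.compile(r"[^\w\s]")


def remove_stop_words(text):
    """Strip special characters and stop words from one string field (S2.1)."""
    text = SPECIAL_CHARS.sub(" ", text)
    return " ".join(tok for tok in text.split() if tok.lower() not in STOP_WORDS)


def segment(text):
    """Stand-in word segmentation: split on whitespace."""
    return text.split()


def preprocess(record, fields):
    """Clean each string field, concatenate them, and segment into S (S2.2)."""
    cleaned = [remove_stop_words(record.get(f) or "") for f in fields]
    return segment(" ".join(cleaned))


def segment_descriptions(label_descriptions):
    """Segment every label description des_q into its words (S2.3)."""
    return {label: segment(des) for label, des in label_descriptions.items()}
```

For example, `preprocess({"summary": "POS*Beijing Cafe #12"}, ["summary"])` drops the special characters and the city name before splitting the remaining text into words.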
6. The bank transaction flow classification method according to claim 1, wherein the model training step specifically comprises:
training on the training data with a classification model that integrates the label and text interaction mechanism, in which word-level information is used to obtain, through a vector dot product, the matching degree between each word in the text and each category, and the matching degree is applied to the final prediction layer.
7. The bank transaction flow classification method according to claim 6, wherein the specific training process of the classification model integrating the label and text interaction mechanism is as follows:
an input layer: the words of the training data are mapped to continuous vectors using the word2vec model,
the vector of each transaction flow data field is represented as:
Figure FDA0002921666660000021
wherein
Figure FDA0002921666660000022
denotes the vector representation of the i-th word $s_i$ of the transaction flow data, where $\mathbb{R}$ in $\mathbb{R}^d$ denotes the real number space and $d$ denotes the dimension of the vector,
the vector of the description field $des_q$ corresponding to label $l_q$ is represented as:
Figure FDA0002921666660000023
wherein
Figure FDA0002921666660000024
denotes the vector representation of the j-th word $w_j$ of the description field,
the vector representation matrix corresponding to all label descriptions is:
Figure FDA0002921666660000025
$DES_e \in \mathbb{R}^{c \times m \times d}$, where $\mathbb{R}^{c \times m \times d}$ indicates that $DES_e$ takes values in the real number space with shape $c \times m \times d$;
an encoding layer: $S_e$ and $DES_e$ are respectively input into Gated Recurrent Units (GRU) for encoding, where $c$ GRU encoders are required to encode the $c$ labels separately, yielding the encoded representations of the transaction flow and of the label descriptions, respectively:
Figure FDA0002921666660000026
$S_h \in \mathbb{R}^{n \times d}$, $DES_h \in \mathbb{R}^{c \times d}$, where $\mathbb{R}^{n \times d}$ indicates that $S_h$ takes values in the real number space with shape $n \times d$, and $\mathbb{R}^{c \times d}$ indicates that $DES_h$ takes values in the real number space with shape $c \times d$,
wherein, when encoding $S$, the output at every time step is retained, whereas when encoding each label description $des_q$, only the last hidden state is retained:
Figure FDA0002921666660000027
Figure FDA0002921666660000028
an interaction layer: the matching degree between each word $s_i$ and each label description $des_q$ is calculated, and finer-grained classification clues are obtained based on the matching degree, the matching degree being calculated as follows:
Figure FDA0002921666660000031
Figure FDA0002921666660000032
$Q$ is obtained by multiplying the two matrices, and $T$ denotes matrix transposition;
a fully connected layer: $Q$ is input to a fully connected layer, $O$ is obtained using the ReLU activation function, and the probability $P$ of the transaction flow data $S$ belonging to each category is obtained through the softmax function:
$O = \mathrm{ReLU}(I \times W + b),\quad W \in \mathbb{R}^{n \times 1}$
$P = \mathrm{softmax}(O) = \{p_1, p_2, \dots, p_c\}$,
where $W$ is a parameter matrix and $b$ is a bias, both of which are parameters to be learned by the model;
an output layer: the label corresponding to the maximum probability value is output as the prediction result:
$Label_{pre} = \operatorname{argmax}(P)$;
wherein the model is optimized based on the following loss function:
Figure FDA0002921666660000033
wherein
Figure FDA0002921666660000034
denotes the value of the j-th dimension of the correct label corresponding to the i-th transaction flow, and
Figure FDA0002921666660000035
denotes the probability, predicted by the model, that the label of the i-th transaction flow is j.
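The layers of claim 7 can be sketched in NumPy as a single forward pass. This is a minimal sketch under stated assumptions, not the claimed implementation: random vectors stand in for word2vec embeddings, the GRU is a bias-free textbook cell with random weights, the matching matrix is oriented as $Q = DES_h \times S_h^T$ (shape $c \times n$) so that multiplying by $W \in \mathbb{R}^{n \times 1}$ yields one score per class, and the loss is a standard cross-entropy assumed from the description of its terms. All function and variable names are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_gru(d, rng):
    """Six d x d matrices: input/recurrent weights for update, reset, candidate."""
    return [rng.normal(scale=0.1, size=(d, d)) for _ in range(6)]


def gru_encode(x, params):
    """Run a bias-free GRU over x (steps x d); return every hidden state."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    h = np.zeros(x.shape[1])
    states = []
    for x_t in x:
        z = sigmoid(x_t @ Wz + h @ Uz)             # update gate
        r = sigmoid(x_t @ Wr + h @ Ur)             # reset gate
        h_cand = np.tanh(x_t @ Wh + (r * h) @ Uh)  # candidate state
        h = (1.0 - z) * h + z * h_cand
        states.append(h)
    return np.stack(states)


def softmax(o):
    e = np.exp(o - o.max())  # shift by the max for numerical stability
    return e / e.sum()


def forward(S_e, DES_e, text_gru, label_grus, W, b):
    """Encoding, interaction, fully connected, and softmax layers."""
    S_h = gru_encode(S_e, text_gru)                # n x d, all time steps kept
    DES_h = np.stack([gru_encode(des, g)[-1]       # c x d, last hidden state only
                      for des, g in zip(DES_e, label_grus)])
    Q = DES_h @ S_h.T                              # c x n word/label matching degrees
    O = np.maximum(Q @ W + b, 0.0).ravel()         # ReLU, one score per class
    return softmax(O)


def cross_entropy(P, y):
    """Cross-entropy between predicted probabilities P and a one-hot label y."""
    return -float(np.sum(y * np.log(P + 1e-12)))


# Toy run: n words, m description words, c classes, d-dimensional vectors.
rng = np.random.default_rng(0)
n, m, c, d = 5, 4, 3, 8
S_e = rng.normal(size=(n, d))                      # stand-in for word2vec output
DES_e = rng.normal(size=(c, m, d))
text_gru = init_gru(d, rng)
label_grus = [init_gru(d, rng) for _ in range(c)]  # one encoder per label
W, b = rng.normal(scale=0.1, size=(n, 1)), 0.0

P = forward(S_e, DES_e, text_gru, label_grus, W, b)
label_pre = int(np.argmax(P))                      # output layer
loss = cross_entropy(P, np.eye(c)[0])              # pretend class 0 is correct
```

Training would minimise `loss` over the GRU weights, `W`, and `b` with a gradient-based optimiser, which is easier in an automatic-differentiation framework than in plain NumPy.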
8. A bank transaction flow classification apparatus integrating a label and text interaction mechanism, the apparatus operating based on the method according to any one of claims 1 to 7 and comprising the following modules:
a data construction module, configured to label and clean transaction flow data and construct the training data required for model training;
a model training module, configured to adopt a neural network model that incorporates the interaction mechanism and continuously optimize the neural network model based on the training data;
a label prediction module, configured to call the optimized neural network model to label new transaction flows;
a result display module, configured to output the corresponding transaction flow labels and automatically push weekly, monthly, or quarterly consumption reports according to user settings.
9. A bank transaction flow classification system integrating a label and text interaction mechanism, the system comprising:
a processor and a memory for storing executable instructions;
wherein the processor is configured to execute the executable instructions to perform the bank transaction flow classification method integrating the label and text interaction mechanism according to any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the bank transaction flow classification method integrating the label and text interaction mechanism according to any one of claims 1 to 7.
Application CN202110119998.7A, priority date 2021-01-28, filing date 2021-01-28: Bank transaction flow classification method and system integrating label and text interaction mechanism. Status: Active. Publication: CN113449103B (en)

Publications (2)

Publication Number Publication Date
CN113449103A (en) 2021-09-28
CN113449103B (en) 2024-05-10

Family

ID=77808887

Country Status (1)

Country Link
CN (1) CN113449103B (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020032810A1 (en) * 1995-06-22 2002-03-14 Wagner Richard Hiers Open network system for I/O operation including a common gateway interface and an extended open network protocol with non-standard I/O devices utilizing device and identifier for operation to be performed with device
CN101140645A (en) * 2007-11-05 2008-03-12 陆航程 Tax controlling method based on article internet, and tax controlling method and EPC, EBC article internet and implement used for tax controlling
CN104272335A (en) * 2011-12-02 2015-01-07 艾萨薇公司 Unified processing of events associated with a transaction executing product purchase and/or use
CN107908606A (en) * 2017-10-31 2018-04-13 上海壹账通金融科技有限公司 Method and system based on different aforementioned sources automatic report generation
CN108073677A (en) * 2017-11-02 2018-05-25 中国科学院信息工程研究所 A kind of multistage text multi-tag sorting technique and system based on artificial intelligence
CN108509485A (en) * 2018-02-07 2018-09-07 深圳壹账通智能科技有限公司 Preprocess method, device, computer equipment and the storage medium of data
CN109299273A (en) * 2018-11-02 2019-02-01 广州语义科技有限公司 Based on the multi-source multi-tag file classification method and its system for improving seq2seq model
CN109711848A (en) * 2018-12-28 2019-05-03 武汉金融资产交易所有限公司 A kind of matching system and its construction method, matching process of financial transaction
CN109829818A (en) * 2019-02-03 2019-05-31 中国银行股份有限公司 Cash demand amount prediction technique, device, electronic equipment and readable storage medium storing program for executing
CN110188199A (en) * 2019-05-21 2019-08-30 北京鸿联九五信息产业有限公司 A kind of file classification method for intelligent sound interaction
CN110442707A (en) * 2019-06-21 2019-11-12 电子科技大学 A kind of multi-tag file classification method based on seq2seq
CN111274791A (en) * 2020-01-13 2020-06-12 江苏艾佳家居用品有限公司 Modeling method of user loss early warning model in online home decoration scene
CN111754241A (en) * 2019-05-27 2020-10-09 北京京东尚科信息技术有限公司 User behavior perception method, device, equipment and medium
US10831452B1 (en) * 2019-09-06 2020-11-10 Digital Asset Capital, Inc. Modification of in-execution smart contract programs

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
陈强: "兴业银行"AI+大数据"的创新应用与实践", 《 金融电子化》, pages 72 - 74 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116720944A (en) * 2023-08-10 2023-09-08 山景智能(北京)科技有限公司 Bank flowing water marking method and device
CN116720944B (en) * 2023-08-10 2023-12-19 山景智能(北京)科技有限公司 Bank flowing water marking method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant