CN116595551A - Bank transaction data management method and system - Google Patents

Info

Publication number
CN116595551A
Authority
CN
China
Prior art keywords
transaction data
data word
feature vectors
feature
context
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202310540633.0A
Other languages
Chinese (zh)
Inventor
袁明浩
欧智彪
王圳义
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Weiming Information Technology Co ltd
Original Assignee
Guangzhou Weiming Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Weiming Information Technology Co ltd
Priority to CN202310540633.0A
Publication of CN116595551A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6254 Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/02 Banking, e.g. interest calculation or account maintenance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Bioethics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Animal Behavior & Ethology (AREA)
  • Medical Informatics (AREA)
  • Machine Translation (AREA)

Abstract

A bank transaction data management method and system are disclosed. Bank transaction data to be analyzed is obtained; a deep-learning-based artificial intelligence technique mines implicit semantic understanding features of the personal information data in that transaction data; and the data segments related to the personal information data are hidden or encrypted based on those features. In this way, the personal information data in the bank transaction data can be accurately encrypted, completing the desensitization of the bank transaction data and keeping the personal information data secure.

Description

Bank transaction data management method and system
Technical Field
The application relates to the technical field of intelligent management, in particular to a banking transaction data management method and system.
Background
At present, commercial banks in China are in a strategic period of transformation. Banks need to find new development modes and gradually move from "bank informatization" to the "informatized bank". Because electronic banking and other banking services all depend on data, managing banking transaction data is particularly important.
At present, driven by both internal demands for change and external competitive pressure, large commercial banks, urban commercial banks and rural commercial banks are one after another building data management systems on top of their IT information systems in order to improve enterprise data quality. Although this approach has produced some data management results, the data management of commercial banks in China as a whole still has many problems. In recent years, bank information security incidents have been frequent in many countries, and large amounts of user information have been leaked, drawing public attention. Financial institutions urgently need to strengthen the security of personal information data.
Accordingly, an optimized banking transaction data management system is desired.
Disclosure of Invention
The present application has been made to solve the above technical problems. Embodiments of the application provide a banking transaction data management method and system that acquire banking transaction data to be analyzed, use a deep-learning-based artificial intelligence technique to mine implicit semantic understanding features of the personal information data in that transaction data, and hide or encrypt the data segments related to the personal information data based on those features. In this way, the personal information data in the banking transaction data can be accurately encrypted, completing the desensitization of the banking transaction data and keeping the personal information data secure.
In a first aspect, there is provided a banking transaction data management system comprising:
a transaction data acquisition module, configured to acquire bank transaction data to be analyzed;
a word segmentation processing module, configured to perform word segmentation on the bank transaction data to be analyzed to obtain a transaction data word sequence;
a word embedding module, configured to pass the transaction data word sequence through a word embedding layer to obtain a sequence of transaction data word feature vectors;
a transaction semantic understanding module, configured to pass the sequence of transaction data word feature vectors through a Transformer-based context encoder to obtain a plurality of transaction data word context-associated semantic feature vectors;
a global association coding module, configured to arrange the sequence of transaction data word feature vectors into a two-dimensional feature matrix and pass it through a text convolutional neural network model to obtain a transaction data word global understanding feature vector;
a query module, configured to take each transaction data word context-associated semantic feature vector as a query feature vector and calculate a transfer matrix between it and the transaction data word global understanding feature vector, so as to obtain a plurality of classification feature matrices;
a detection result judging module, configured to pass the plurality of classification feature matrices through a classifier to obtain a plurality of classification results, each classification result indicating whether the corresponding transaction data word belongs to personal information; and
a desensitization module, configured to desensitize the bank transaction data to be analyzed based on the plurality of classification results.
In the above banking transaction data management system, the transaction semantic understanding module includes: a query vector construction unit, configured to arrange the sequence of transaction data word feature vectors one-dimensionally to obtain a global transaction data word feature vector; a self-attention unit, configured to calculate the product between the global transaction data word feature vector and the transpose of each transaction data word feature vector in the sequence to obtain a plurality of self-attention correlation matrices; a normalization unit, configured to normalize each of the plurality of self-attention correlation matrices to obtain a plurality of normalized self-attention correlation matrices; an attention calculating unit, configured to pass each of the normalized self-attention correlation matrices through a Softmax classification function to obtain a plurality of probability values; and an attention applying unit, configured to weight each transaction data word feature vector in the sequence with the corresponding probability value to obtain the plurality of transaction data word context-associated semantic feature vectors.
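The five units above can be sketched in plain Python. This is only an illustration under stated assumptions, not the application's implementation: the text does not say how each correlation matrix is normalized or how a matrix becomes a single probability, so the sketch assumes scaling by the square root of the vector dimension and uses the mean matrix entry as the per-word score. The function name `context_vectors` is hypothetical.

```python
import math

def context_vectors(word_vectors):
    """word_vectors: list of equal-length lists of floats, one per word."""
    dim = len(word_vectors[0])
    # Query vector construction: concatenate all word vectors into one vector.
    global_vec = [x for vec in word_vectors for x in vec]
    scores = []
    for vec in word_vectors:
        # Self-attention unit: global_vec times vec^T gives one matrix per word.
        matrix = [[g * v for v in vec] for g in global_vec]
        # Assumed normalization: scale every entry by sqrt(dim).
        scaled = [[m / math.sqrt(dim) for m in row] for row in matrix]
        # Assumed reduction: the mean entry is the word's attention score.
        total = sum(sum(row) for row in scaled)
        scores.append(total / (len(scaled) * dim))
    # Softmax over the per-word scores gives one probability per word.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    probs = [e / norm for e in exps]
    # Attention applying unit: weight each word vector by its probability.
    weighted = [[p * x for x in vec] for p, vec in zip(probs, word_vectors)]
    return probs, weighted
```

Calling `context_vectors([[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]])` returns three probabilities that sum to one and three weighted vectors of the original dimension.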
In the above banking transaction data management system, the global association coding module is configured to: use each layer of the text convolutional neural network model to perform, in the forward pass of that layer, convolution processing, feature-matrix-based mean pooling and nonlinear activation on its input data, so that the last layer of the text convolutional neural network model outputs the transaction data word global understanding feature vector, wherein the input of the first layer of the text convolutional neural network model is the two-dimensional feature matrix.
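A single layer of this kind (convolution, then mean pooling, then a nonlinear activation) might look like the following minimal sketch. The kernel shapes, the choice of ReLU as the activation, and the name `text_cnn_layer` are assumptions made for illustration; the source does not specify them.

```python
def text_cnn_layer(matrix, kernels):
    """matrix: n_words x dim feature matrix; kernels: list of k x dim weight grids."""
    features = []
    for kernel in kernels:
        k = len(kernel)
        # Convolution: slide the kernel over windows of k consecutive words.
        responses = []
        for start in range(len(matrix) - k + 1):
            window = matrix[start:start + k]
            responses.append(sum(w * x
                                 for w_row, x_row in zip(kernel, window)
                                 for w, x in zip(w_row, x_row)))
        # Mean pooling over all window positions.
        pooled = sum(responses) / len(responses)
        # Nonlinear activation (ReLU, chosen here as an assumption).
        features.append(max(0.0, pooled))
    return features

# Two hypothetical kernels spanning two words each.
feature_vector = text_cnn_layer(
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[[1.0, 0.0], [0.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]],
)
print(feature_vector)  # [7.0, 0.0]
```

Each kernel contributes one entry of the output, so the number of kernels sets the length of the resulting feature vector.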
In the above banking transaction data management system, the query module includes: an optimization factor calculation unit, configured to calculate association-probability density distribution affine mapping factors between each transaction data word context-associated semantic feature vector and the transaction data word global understanding feature vector, to obtain a plurality of first association-probability density distribution affine mapping factors and a plurality of second association-probability density distribution affine mapping factors; a weighted optimization unit, configured to use the plurality of first association-probability density distribution affine mapping factors and the plurality of second association-probability density distribution affine mapping factors as weights to weight, respectively, each transaction data word context-associated semantic feature vector and the transaction data word global understanding feature vector, so as to obtain a plurality of corrected transaction data word context-associated semantic feature vectors and a plurality of corrected transaction data word global understanding feature vectors; and a transfer association unit, configured to calculate a transfer matrix of each corrected transaction data word context-associated semantic feature vector relative to its corresponding corrected transaction data word global understanding feature vector, so as to obtain the plurality of classification feature matrices.
In the above banking transaction data management system, the optimization factor calculation unit is configured to: calculate, with the following optimization formula, the association-probability density distribution affine mapping factors between each transaction data word context-associated semantic feature vector and the transaction data word global understanding feature vector, to obtain the plurality of first association-probability density distribution affine mapping factors and the plurality of second association-probability density distribution affine mapping factors; wherein the optimization formula is:
wherein $V_1$ represents each transaction data word context-associated semantic feature vector, $V_2$ represents the transaction data word global understanding feature vector, $M$ is the association matrix obtained by position-wise association between each transaction data word context-associated semantic feature vector and the transaction data word global understanding feature vector, $\mu$ and $\sigma$ are the mean vector and the position-wise variance matrix of the Gaussian density map formed by each transaction data word context-associated semantic feature vector and the transaction data word global understanding feature vector, the multiplication symbol in the formula represents matrix multiplication, $\exp(\cdot)$ represents the exponential operation of a matrix, that is, computing the natural exponential function raised to the power of the feature value at each position of the matrix, $w_1$ represents each of the plurality of first association-probability density distribution affine mapping factors, and $w_2$ represents each of the plurality of second association-probability density distribution affine mapping factors.
In the above banking transaction data management system, the transfer association unit is configured to: calculate, with the following transfer formula, the transfer matrix of each corrected transaction data word context-associated semantic feature vector relative to its corresponding corrected transaction data word global understanding feature vector, to obtain the plurality of classification feature matrices; wherein the transfer formula is:
wherein $V_1$ represents the corrected transaction data word context-associated semantic feature vector, $V_2$ represents the corrected transaction data word global understanding feature vector, $M$ represents each of the plurality of classification feature matrices, and the multiplication symbol in the formula represents matrix multiplication.
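As a rough illustration of a transfer matrix between two vectors, the sketch below assumes the relation $V_1 = M V_2$; the transfer formula itself is not reproduced in this text, so the direction of the mapping is an assumption and the opposite convention is equally possible. Because many matrices satisfy such a relation, the sketch picks the minimum-norm one, $M = V_1 V_2^{\top} / (V_2^{\top} V_2)$. The names `transfer_matrix` and `apply_matrix` are hypothetical.

```python
def transfer_matrix(v1, v2):
    """Return the minimum-norm M with M @ v2 == v1 (assumed convention)."""
    norm_sq = sum(x * x for x in v2)
    if norm_sq == 0:
        raise ValueError("v2 must be a non-zero vector")
    # Outer product of v1 and v2, scaled by the squared length of v2.
    return [[a * b / norm_sq for b in v2] for a in v1]

def apply_matrix(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]

# Hypothetical context vector (length 2) and global vector (length 3).
M = transfer_matrix([1.0, 2.0], [3.0, 4.0, 0.0])
recovered = apply_matrix(M, [3.0, 4.0, 0.0])  # approximately [1.0, 2.0]
```

The resulting matrix has one row per entry of $V_1$ and one column per entry of $V_2$, which is the shape later flattened by the classifier.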
In the above banking transaction data management system, the detection result judging module includes: a matrix expansion unit, configured to expand each of the plurality of classification feature matrices into a classification feature vector by row vectors or column vectors; a fully connected coding unit, configured to perform fully connected coding on the classification feature vector using a plurality of fully connected layers of the classifier to obtain a coded classification feature vector; and a classification unit, configured to pass the coded classification feature vector through a Softmax classification function of the classifier to obtain the classification result.
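The three units of the classifier can be sketched as follows. The weights are made-up illustrative values rather than trained parameters, the two output classes are assumed to mean "personal information" and "not personal information", and no activation is placed between the fully connected layers because the text does not mention one. The name `classify` is hypothetical.

```python
import math

def classify(matrix, layers):
    """matrix: classification feature matrix; layers: list of (weights, biases)."""
    # Matrix expansion unit: flatten the matrix row by row into one vector.
    vec = [x for row in matrix for x in row]
    # Fully connected coding unit: apply each layer as weights @ vec + biases.
    for weights, biases in layers:
        vec = [sum(w * x for w, x in zip(row, vec)) + b
               for row, b in zip(weights, biases)]
    # Classification unit: Softmax over the final vector.
    peak = max(vec)
    exps = [math.exp(v - peak) for v in vec]
    total = sum(exps)
    return [e / total for e in exps]

# One hypothetical layer mapping a flattened 2x2 matrix to two class scores.
probabilities = classify(
    [[1.0, 0.0], [0.0, 1.0]],
    [([[2.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]], [0.0, 0.0])],
)
```

The index of the largest probability would then be read as the classification result for the corresponding transaction data word.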
In a second aspect, there is provided a banking transaction data management method, comprising:
acquiring bank transaction data to be analyzed;
performing word segmentation on the bank transaction data to be analyzed to obtain a transaction data word sequence;
passing the transaction data word sequence through a word embedding layer to obtain a sequence of transaction data word feature vectors;
passing the sequence of transaction data word feature vectors through a Transformer-based context encoder to obtain a plurality of transaction data word context-associated semantic feature vectors;
arranging the sequence of transaction data word feature vectors into a two-dimensional feature matrix and passing it through a text convolutional neural network model to obtain a transaction data word global understanding feature vector;
taking each transaction data word context-associated semantic feature vector as a query feature vector and calculating a transfer matrix between it and the transaction data word global understanding feature vector, to obtain a plurality of classification feature matrices;
passing the plurality of classification feature matrices through a classifier to obtain a plurality of classification results, each classification result indicating whether the corresponding transaction data word belongs to personal information; and
desensitizing the bank transaction data to be analyzed based on the plurality of classification results.
In the above banking transaction data management method, passing the sequence of transaction data word feature vectors through a Transformer-based context encoder to obtain a plurality of transaction data word context-associated semantic feature vectors includes: arranging the sequence of transaction data word feature vectors one-dimensionally to obtain a global transaction data word feature vector; calculating the product between the global transaction data word feature vector and the transpose of each transaction data word feature vector in the sequence to obtain a plurality of self-attention correlation matrices; normalizing each of the plurality of self-attention correlation matrices to obtain a plurality of normalized self-attention correlation matrices; passing each of the normalized self-attention correlation matrices through a Softmax classification function to obtain a plurality of probability values; and weighting each transaction data word feature vector in the sequence with the corresponding probability value to obtain the plurality of transaction data word context-associated semantic feature vectors.
In the above banking transaction data management method, arranging the sequence of transaction data word feature vectors into a two-dimensional feature matrix and passing it through a text convolutional neural network model to obtain the transaction data word global understanding feature vector includes: using each layer of the text convolutional neural network model to perform, in the forward pass of that layer, convolution processing, feature-matrix-based mean pooling and nonlinear activation on its input data, so that the last layer of the text convolutional neural network model outputs the transaction data word global understanding feature vector, wherein the input of the first layer of the text convolutional neural network model is the two-dimensional feature matrix.
Compared with the prior art, the banking transaction data management method and system provided by the application acquire banking transaction data to be analyzed, use a deep-learning-based artificial intelligence technique to mine implicit semantic understanding features of the personal information data in that transaction data, and hide or encrypt the data segments related to the personal information data based on those features. In this way, the personal information data in the banking transaction data can be accurately encrypted, completing the desensitization of the banking transaction data and keeping the personal information data secure.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is an application scenario diagram of a banking transaction data management system according to an embodiment of the present application.
Fig. 2 is a block diagram of a banking transaction data management system according to an embodiment of the present application.
Fig. 3 is a block diagram of the transaction semantic understanding module in the banking transaction data management system according to an embodiment of the present application.
Fig. 4 is a block diagram of the query module in the banking transaction data management system according to an embodiment of the present application.
Fig. 5 is a block diagram of the detection result judging module in the banking transaction data management system according to the embodiment of the present application.
Fig. 6 is a flowchart of a banking transaction data management method according to an embodiment of the present application.
Fig. 7 is a schematic diagram of a system architecture of a banking transaction data management method according to an embodiment of the present application.
Detailed Description
The following description of the technical solutions according to the embodiments of the present application will be given with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Unless defined otherwise, all technical and scientific terms used in the embodiments of the application have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the present application is for the purpose of describing particular embodiments only and is not intended to limit the scope of the present application.
In describing embodiments of the present application, unless otherwise specified and limited, the term "connected" should be construed broadly: it may be an electrical connection, a communication between two elements, a direct connection, or an indirect connection via an intermediate medium, and those skilled in the art will understand the specific meaning of the term according to the circumstances.
It should be noted that the terms "first", "second" and "third" in the embodiments of the present application merely distinguish similar objects and do not represent a specific order of the objects. Where allowed, the objects distinguished by "first", "second" and "third" may be interchanged in order or sequence, so that the embodiments of the application described herein can be practiced in sequences other than those illustrated or described herein.
As described above, in recent years, bank information security events of various countries are frequent, and a large amount of user information is revealed, which has attracted public attention. Financial institutions are urgently required to be enhanced in terms of personal information data security. Accordingly, an optimized banking transaction data management system is desired.
Accordingly, when banking transaction data is actually managed, it is desirable to perform text-based semantic understanding of the data so that the data segments related to personal information data can be hidden or encrypted to keep that data secure. However, because bank transaction data is complex and contains a large amount of basic information and transaction information, it is difficult to capture and extract the effective information about personal information data, which weakens the protection of that data. The difficulty therefore lies in fully and accurately mining the implicit semantic understanding features of the personal information data in the bank transaction data to be analyzed, so that the personal information data can be accurately encrypted, the desensitization completed, and the security of the personal information data in the bank transaction data ensured.
In recent years, deep learning and neural networks have been widely used in the fields of computer vision, natural language processing, text signal processing, and the like. In addition, deep learning and neural networks have also shown levels approaching and even exceeding humans in the fields of image classification, object detection, semantic segmentation, text translation, and the like.
The development of deep learning and neural networks provides new ideas and solutions for mining implicit semantic understanding features of the personal information data in the banking transaction data to be analyzed.
Specifically, in the technical solution of the application, bank transaction data to be analyzed is first obtained. The bank transaction data to be analyzed contains a large amount of semantic information, and each piece of semantic information is composed of multiple words that are significant for detecting personal information. Therefore, in order to fully and accurately extract the contextual semantic association features among the words, word segmentation is performed on the bank transaction data to be analyzed to obtain a transaction data word sequence. This avoids word-order confusion during subsequent semantic understanding, which would otherwise cause errors in the semantic understanding of the bank transaction data to be analyzed.
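To make the segmentation step concrete, here is a minimal sketch that splits a transaction record into an ordered word sequence with a regular expression. The record format and the function name `segment` are hypothetical, and a regular expression is only a stand-in: Chinese-language records would normally need a dictionary-based or statistical segmenter, which the source does not name.

```python
import re

# Letters form words, digits (with an optional decimal part) form numbers,
# and any other non-space character becomes its own token.
TOKEN_PATTERN = re.compile(r"[A-Za-z]+|\d+(?:\.\d+)?|[^\sA-Za-z\d]")

def segment(record):
    """Split one transaction record into a word sequence, preserving order."""
    return TOKEN_PATTERN.findall(record)

words = segment("Transfer 500.00 CNY to Zhang Wei, account 6222020200012345678")
print(words)
# ['Transfer', '500.00', 'CNY', 'to', 'Zhang', 'Wei', ',', 'account', '6222020200012345678']
```

The order of the tokens matches the order in the record, which is what the later context encoding relies on.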
Next, considering that the bank transaction data to be analyzed consists largely of specialized terminology, and in order to improve the accuracy of its semantic understanding so that personal information data can be accurately identified and detected, the transaction data word sequence is passed through a word embedding layer, which maps each word to an embedding vector, to obtain a sequence of transaction data word feature vectors. In particular, the embedding layer may be constructed using a knowledge graph of the semantic features of banking transaction terminology, so that prior information about those semantic features is introduced when the transaction data word sequence is converted into embedding vectors.
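An embedding layer is, at its simplest, a lookup from each word to a vector. The sketch below shows only that lookup; the vocabulary, the vector values and the name `build_embedding_layer` are made up for illustration, and the knowledge-graph-informed construction mentioned above is assumed to have already produced the vectors.

```python
def build_embedding_layer(vocabulary, vectors, unknown_vector):
    """Return a function mapping a word sequence to a sequence of vectors."""
    table = dict(zip(vocabulary, vectors))

    def embed(tokens):
        # Words outside the vocabulary fall back to a shared unknown vector.
        return [table.get(tok, unknown_vector) for tok in tokens]

    return embed

# Hypothetical three-word vocabulary with two-dimensional vectors.
embed = build_embedding_layer(
    ["transfer", "account", "cny"],
    [[0.1, 0.3], [0.7, 0.2], [0.4, 0.4]],
    unknown_vector=[0.0, 0.0],
)
sequence = embed(["transfer", "cny", "zhang"])
print(sequence)  # [[0.1, 0.3], [0.4, 0.4], [0.0, 0.0]]
```

In a trained model the table entries would be learned parameters rather than fixed values.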
Further, the banking transaction data to be analyzed contains a plurality of transaction data words, and these words are semantically associated with one another through context. Therefore, in order to accurately understand the semantics of the banking transaction data to be analyzed and thereby detect and identify personal information data, the sequence of transaction data word feature vectors is further encoded by a Transformer-based context encoder. This extracts the global context semantic association feature information of each transaction data word, that is, the global semantic understanding feature information of each transaction data word within the banking transaction data to be analyzed, to obtain a plurality of transaction data word context-associated semantic feature vectors.
Then, for the banking transaction data to be analyzed, semantic association feature information also exists among the transaction data words, and this information plays an important role in identifying the data segments that hold personal information data. Therefore, to improve the accuracy of detecting and identifying those data segments, the sequence of transaction data word feature vectors is further arranged into a two-dimensional feature matrix and processed by a text convolutional neural network model to extract the global semantic association feature distribution among all transaction data words in the banking transaction data to be analyzed, thereby obtaining the transaction data word global understanding feature vector.
Next, each transaction data word context-associated semantic feature vector is taken as a query feature vector, and the transfer matrix between it and the transaction data word global understanding feature vector is calculated to obtain a plurality of classification feature matrices. In this way, the semantic features of each transaction data word can be extracted on the basis of the global semantic association features among the transaction data words, so that each word is understood accurately, the words belonging to personal information are accurately identified, and encryption can be completed. Specifically, the plurality of classification feature matrices are passed through a classifier to obtain a plurality of classification results, each indicating whether the corresponding transaction data word belongs to personal information. Once a transaction data word belonging to personal information is detected, it is encrypted, which completes the desensitization of the banking transaction data to be analyzed and keeps the personal information data in the banking transaction data secure.
In particular, in the technical solution of the present application, when each transaction data word context-associated semantic feature vector is used as a query feature vector and its transfer matrix with the transaction data word global understanding feature vector is calculated to obtain the plurality of classification feature matrices, it should be noted that the global understanding feature vector contains the intra-sample cross-correlation information of the context-associated semantic feature vectors. As a result, its degree of association with each individual context-associated semantic feature vector is low, and the two are also inconsistent in class probability density distribution, which affects the consistency among the classification results obtained when the classification feature matrices pass through the classifier.
Accordingly, the applicant separately calculates the association-probability density distribution affine mapping factors of each transaction data word context-associated semantic feature vector, denoted for example as $V_1$, and of the transaction data word global understanding feature vector, denoted for example as $V_2$, expressed as:
where $M$ is the association matrix obtained by position-by-position association between each transaction data word context-associated semantic feature vector $V_1$ and the transaction data word global understanding feature vector $V_2$, and $\mu$ and $\Sigma$ are the mean vector and the position-by-position variance matrix of the Gaussian density map formed by $V_1$ and $V_2$.
That is, by constructing the association feature space between each transaction data word context-associated semantic feature vector $V_1$ and the transaction data word global understanding feature vector $V_2$, together with the probability density space represented by the Gaussian probability density, $V_1$ and $V_2$ can each be mapped into affine homography subspaces within the association feature space and the class probability density space, so as to extract affine-homography-compliant representations of the features within the association feature domain and the class probability density domain. Weighting $V_1$ and $V_2$ by the association-probability density distribution affine mapping factor values $w_1$ and $w_2$ then improves the consistency of their class probability density distributions under the association representation, which in turn improves the consistency among the classification results obtained when the plurality of classification feature matrices pass through the classifier. In this way, the personal information data in the banking transaction data can be accurately encrypted, completing the desensitization of the banking transaction data and keeping the personal information data secure.
Fig. 1 is an application scenario diagram of a banking transaction data management system according to an embodiment of the present application. As shown in fig. 1, in this application scenario, the banking transaction data to be analyzed is first acquired (e.g., C1 as illustrated in fig. 1); the acquired data is then input into a server (e.g., S as illustrated in fig. 1) on which a banking transaction data management algorithm is deployed. The server processes the banking transaction data to be analyzed with this algorithm to generate a plurality of classification results, each indicating whether the corresponding transaction data word belongs to personal information, and desensitizes the banking transaction data to be analyzed based on those classification results.
Having described the basic principles of the present application, various non-limiting embodiments of the present application will now be described in detail with reference to the accompanying drawings.
In one embodiment of the present application, FIG. 2 is a block diagram of a banking transaction data management system in accordance with an embodiment of the present application. As shown in fig. 2, the banking transaction data management system 100 according to an embodiment of the present application includes: a transaction data acquisition module 110, configured to acquire banking transaction data to be analyzed; a word segmentation processing module 120, configured to perform word segmentation processing on the banking transaction data to be analyzed to obtain a sequence of transaction data words; a word embedding module 130, configured to pass the sequence of transaction data words through a word embedding layer to obtain a sequence of transaction data word feature vectors; a transaction semantic understanding module 140, configured to pass the sequence of transaction data word feature vectors through a Transformer-based context encoder to obtain a plurality of transaction data word context-associated semantic feature vectors; a global association coding module 150, configured to arrange the sequence of transaction data word feature vectors into a two-dimensional feature matrix and then obtain a transaction data word global understanding feature vector through a text convolutional neural network model; a query module 160, configured to take each transaction data word context-associated semantic feature vector as a query feature vector and calculate the transfer matrix between it and the transaction data word global understanding feature vector to obtain a plurality of classification feature matrices; a detection result judging module 170, configured to pass the plurality of classification feature matrices through a classifier to obtain a plurality of classification results, where each classification result indicates whether the corresponding transaction data word belongs to personal information; and a desensitizing module 180, configured to desensitize the banking transaction data to be analyzed based on the plurality of classification results.
Specifically, in the embodiment of the present application, the transaction data acquisition module 110 is configured to obtain the banking transaction data to be analyzed. In actual banking transaction data management, it is desirable to perform text-based semantic understanding on the banking transaction data so that data segments related to personal information data can be hidden or encrypted, thereby keeping the personal information data secure.
However, banking transaction data is complicated and contains a large amount of basic information and transaction information, so capturing and extracting the information that actually relates to personal information data is difficult, which lowers the level of protection for that data. The difficulty therefore lies in fully and accurately mining the implicit semantic understanding features of the personal information data in the banking transaction data to be analyzed, so that the personal information data can be accurately encrypted, the desensitization can be completed, and the personal information data in the banking transaction data can be kept secure.
In recent years, deep learning and neural networks have been widely used in fields such as computer vision, natural language processing, and text signal processing. They have also reached, and in some cases exceeded, human-level performance in tasks such as image classification, object detection, semantic segmentation, and text translation.
The development of deep learning and neural networks provides new approaches for mining the implicit semantic understanding features of personal information data in the banking transaction data to be analyzed.
Specifically, in the technical scheme of the application, firstly, bank transaction data to be analyzed is obtained.
Specifically, in the embodiment of the present application, the word segmentation processing module 120 is configured to perform word segmentation processing on the banking transaction data to be analyzed to obtain a sequence of transaction data words. The banking transaction data to be analyzed contains a large amount of semantic information, and each piece of semantic information is composed of several words that are significant for detecting personal information.
Therefore, in the technical solution of the present application, to fully and accurately extract the contextual semantic association features among the words in the banking transaction data to be analyzed, word segmentation is performed on that data to obtain a sequence of transaction data words. Segmenting first avoids word-order confusion during subsequent semantic understanding, which would otherwise introduce errors into the semantic understanding of the data.
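The word segmentation step can be illustrated with a minimal sketch. The source does not name a segmentation algorithm, so the regular-expression tokenizer below, the name `segment_transaction_text`, and the sample record are all assumptions made for illustration; a deployed system handling Chinese text would more likely rely on a dedicated segmenter.

```python
import re

# Hypothetical token pattern (an assumption, not taken from the source):
# runs of letters, numbers with an optional decimal part, or any single
# non-space symbol. Order of alternatives matters: words and numbers first.
TOKEN_PATTERN = re.compile(r"[A-Za-z]+|\d+(?:\.\d+)?|[^\sA-Za-z\d]")


def segment_transaction_text(text: str) -> list[str]:
    """Split one transaction record into words, preserving their order."""
    return TOKEN_PATTERN.findall(text)


words = segment_transaction_text("Transfer 500.00 to Zhang Wei, account 6222021234")
print(words)
```

Keeping the tokens in their original order is the point of this step: the later context encoder depends on word order being intact.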
Specifically, in the embodiment of the present application, the word embedding module 130 is configured to pass the sequence of transaction data words through a word embedding layer to obtain a sequence of transaction data word feature vectors. Because the banking transaction data to be analyzed consists of specialized terms, this step is intended to improve the accuracy of semantic understanding so that personal information data can be accurately identified and detected.
In the technical solution of the present application, the sequence of transaction data words is further passed through a word embedding layer, which maps each transaction data word to an embedding vector, to obtain a sequence of transaction data word feature vectors. In particular, the embedding layer may be constructed using a knowledge graph of the term semantic features of banking transaction data, so that prior information about those term semantic features is introduced when the transaction data word sequence is converted into embedding vectors.
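As a rough illustration of the embedding step, the sketch below maps each word to a fixed-length vector through a lookup table. The table here is randomly initialized purely as a placeholder; the source suggests building the layer from a knowledge graph of banking terms, which is not modeled. The names `build_embedding_table`, `embed_words`, and the `<unk>` fallback entry are hypothetical.

```python
import random


def build_embedding_table(vocabulary: list[str], dim: int, seed: int = 0) -> dict[str, list[float]]:
    """Create a word -> vector lookup table (random placeholder values)."""
    rng = random.Random(seed)
    table = {word: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for word in vocabulary}
    # Assumed fallback for out-of-vocabulary words: an all-zero vector.
    table["<unk>"] = [0.0] * dim
    return table


def embed_words(words: list[str], table: dict[str, list[float]]) -> list[list[float]]:
    """Map a word sequence to a sequence of feature vectors, order preserved."""
    unknown = table["<unk>"]
    return [table.get(word, unknown) for word in words]


table = build_embedding_table(["transfer", "account", "balance"], dim=4)
vectors = embed_words(["transfer", "to", "account"], table)
```

In a trained system the table entries would be learned or derived from prior knowledge rather than sampled at random.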
Specifically, in the embodiment of the present application, the transaction semantic understanding module 140 is configured to pass the sequence of transaction data word feature vectors through a Transformer-based context encoder to obtain a plurality of transaction data word context-associated semantic feature vectors. The banking transaction data to be analyzed contains a plurality of transaction data words, and these words are semantically associated with one another through context.
Therefore, in order to accurately understand the semantics of the banking transaction data to be analyzed and thereby detect and identify personal information data, the sequence of transaction data word feature vectors is further encoded by a Transformer-based context encoder. This extracts the global context semantic association feature information of each transaction data word, that is, the global semantic understanding feature information of each transaction data word within the banking transaction data to be analyzed, to obtain a plurality of transaction data word context-associated semantic feature vectors.
Fig. 3 is a block diagram of the transaction semantic understanding module in the banking transaction data management system according to an embodiment of the present application, and as shown in fig. 3, the transaction semantic understanding module 140 includes: a query vector construction unit 141, configured to one-dimensionally arrange the sequence of transaction data word feature vectors to obtain global transaction data word feature vectors; a self-attention unit 142, configured to calculate a product between the global transaction data word feature vector and a transpose vector of each transaction data word feature vector in the sequence of transaction data word feature vectors to obtain a plurality of self-attention correlation matrices; a normalization unit 143, configured to perform normalization processing on each of the plurality of self-attention correlation matrices to obtain a plurality of normalized self-attention correlation matrices; the attention calculating unit 144 is configured to obtain a plurality of probability values by using a Softmax classification function for each normalized self-attention correlation matrix in the plurality of normalized self-attention correlation matrices; and an attention applying unit 145, configured to weight each transaction data word feature vector in the sequence of transaction data word feature vectors with each probability value in the plurality of probability values as a weight to obtain the plurality of transaction data word context-associated semantic feature vectors.
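The five units 141 to 145 can be traced in a small sketch. The source does not specify how a self-attention correlation matrix is normalized or how a matrix is reduced to a single probability value, so the sketch assumes scaling by the square root of the vector dimension and averaging each matrix into one score before applying Softmax across words. The function name `context_encode` is hypothetical.

```python
import math


def context_encode(vectors: list[list[float]]) -> tuple[list[list[float]], list[float]]:
    """Weight each word vector by an attention probability (illustrative only)."""
    dim = len(vectors[0])
    # Unit 141: arrange all word vectors one-dimensionally into a global vector.
    global_vec = [x for vec in vectors for x in vec]
    # Unit 142: product of the global vector with each transposed word vector,
    # giving one correlation matrix per word (an outer product).
    matrices = [[[g * x for x in vec] for g in global_vec] for vec in vectors]
    # Unit 143 (assumption): normalize by dividing by sqrt(dim).
    scale = math.sqrt(dim)
    normalized = [[[e / scale for e in row] for row in m] for m in matrices]
    # Unit 144 (assumption): average each matrix to a score, then Softmax
    # across words to obtain one probability value per word.
    scores = [sum(sum(row) for row in m) / (len(m) * dim) for m in normalized]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Unit 145: weight each word feature vector by its probability value.
    weighted = [[p * x for x in vec] for p, vec in zip(probs, vectors)]
    return weighted, probs


encoded, probs = context_encode([[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]])
```

A standard Transformer encoder uses learned query, key, and value projections with multiple heads; this sketch only mirrors the unit-by-unit description given in the text.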
The context encoder aims to mine hidden patterns among the contexts in the word sequence. Optionally, the encoder includes a CNN (Convolutional Neural Network), a Recursive NN (Recursive Neural Network), a language model, or the like. CNN-based methods extract local features well but handle long-term dependencies in sentences poorly, so encoders based on Bi-LSTM (bidirectional Long Short-Term Memory) are widely used. A Recursive NN processes a sentence as a tree structure rather than a sequence and in theory has stronger representational capability, but it suffers from difficult sample annotation, vanishing gradients at depth, and poor parallelizability, so it is less used in practice. The Transformer is a widely applied network structure that combines characteristics of CNNs and RNNs (Recurrent Neural Networks), extracts global features well, and has an advantage over RNNs in parallel computation.
Specifically, in the embodiment of the present application, the global association coding module 150 is configured to arrange the sequence of transaction data word feature vectors into a two-dimensional feature matrix and then obtain the transaction data word global understanding feature vector through a text convolutional neural network model. For the banking transaction data to be analyzed, semantic association feature information also exists among the transaction data words, and this information plays an important role in identifying the data segments that hold personal information data.
Therefore, to improve the accuracy of detecting and identifying the data segments of personal information data, the sequence of transaction data word feature vectors is further arranged into a two-dimensional feature matrix and processed by a text convolutional neural network model to extract the global semantic association feature distribution among all transaction data words in the banking transaction data to be analyzed, thereby obtaining the transaction data word global understanding feature vector.
It should be appreciated that a convolutional neural network (Convolutional Neural Network, CNN) is an artificial neural network with wide application in fields such as image recognition. A convolutional neural network may include an input layer, hidden layers, and an output layer, where the hidden layers may include convolutional layers, pooling layers, activation layers, fully connected layers, and so on. Each layer performs its operation on the data it receives and passes the result to the next layer, so that the initial input yields the final result after passing through multiple layers.
The convolutional neural network model has excellent performance in the aspect of image local feature extraction by taking a convolutional kernel as a feature filtering factor, and has stronger feature extraction generalization capability and fitting capability compared with the traditional image feature extraction algorithm based on statistics or feature engineering.
The global association coding module 150 is configured to: use each layer of the text convolutional neural network model to perform, in the forward pass of that layer, convolution processing, feature-matrix-based mean pooling processing, and nonlinear activation processing on the input data, so that the last layer of the text convolutional neural network model outputs the transaction data word global understanding feature vector, where the input of the first layer of the text convolutional neural network model is the two-dimensional feature matrix.
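The per-layer sequence of convolution, mean pooling, and nonlinear activation can be sketched as follows. Kernel sizes, the 2x2 pooling window, the choice of ReLU as the activation, and the final flattening into a vector are assumptions; the source only names the three operations. All function names are hypothetical.

```python
def conv2d_valid(matrix: list[list[float]], kernel: list[list[float]]) -> list[list[float]]:
    """Slide the kernel over the matrix without padding ('valid' convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(matrix) - kh + 1, len(matrix[0]) - kw + 1
    return [[sum(matrix[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)] for i in range(out_h)]


def mean_pool(matrix: list[list[float]], size: int = 2) -> list[list[float]]:
    """Average non-overlapping size x size windows."""
    rows, cols = len(matrix) // size, len(matrix[0]) // size
    return [[sum(matrix[r * size + a][c * size + b]
                 for a in range(size) for b in range(size)) / (size * size)
             for c in range(cols)] for r in range(rows)]


def relu(matrix: list[list[float]]) -> list[list[float]]:
    """Element-wise nonlinear activation (ReLU chosen as an assumption)."""
    return [[max(0.0, x) for x in row] for row in matrix]


def global_understanding_vector(feature_matrix, kernels):
    """Apply conv -> mean pool -> activation per layer, then flatten."""
    x = feature_matrix
    for kernel in kernels:
        x = relu(mean_pool(conv2d_valid(x, kernel)))
    return [value for row in x for value in row]


feature_matrix = [[1.0] * 5 for _ in range(5)]  # 5 words x 5 embedding dims
ones_kernel = [[1.0, 1.0], [1.0, 1.0]]
global_vec = global_understanding_vector(feature_matrix, [ones_kernel])
```

With a 5x5 input of ones and a 2x2 kernel of ones, the convolution yields a 4x4 map of 4.0, which pools to 2x2 and flattens to four values.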
Specifically, in the embodiment of the present application, the query module 160 is configured to take each transaction data word context-associated semantic feature vector as a query feature vector and calculate the transfer matrix between it and the transaction data word global understanding feature vector to obtain a plurality of classification feature matrices. In this way, the semantic features of each transaction data word can be extracted on the basis of the global semantic association features among the transaction data words, so that each word is understood accurately, the words belonging to personal information are accurately identified, and encryption can be completed.
Fig. 4 is a block diagram of the query module in the banking data management system according to an embodiment of the present application, and as shown in fig. 4, the query module 160 includes: an optimization factor calculation unit 161 for calculating correlation-probability density distribution affine mapping factors between the respective transaction data word context-associated semantic feature vectors and the transaction data word global-understanding feature vectors to obtain a plurality of first correlation-probability density distribution affine mapping factors and a plurality of second correlation-probability density distribution affine mapping factors, respectively; the weighting optimization unit 162 is configured to respectively weight the context-associated semantic feature vectors of the transaction data words and the global understanding feature vectors of the transaction data words with the affine mapping factors of the first association-probability density distribution and the affine mapping factors of the second association-probability density distribution as weights, so as to obtain a plurality of context-associated semantic feature vectors of the corrected transaction data words and a plurality of global understanding feature vectors of the corrected transaction data words; and a transfer association unit 163, configured to calculate a transfer matrix of the context-associated semantic feature vector of each group of corresponding corrected transaction data words relative to the global understanding feature vector of the corrected transaction data words, so as to obtain the plurality of classification feature matrices.
In particular, in the technical solution of the present application, when each transaction data word context-associated semantic feature vector is used as a query feature vector and its transfer matrix with the transaction data word global understanding feature vector is calculated to obtain the plurality of classification feature matrices, it should be noted that the global understanding feature vector contains the intra-sample cross-correlation information of the context-associated semantic feature vectors. As a result, its degree of association with each individual context-associated semantic feature vector is low, and the two are also inconsistent in class probability density distribution, which affects the consistency among the classification results obtained when the classification feature matrices pass through the classifier.
Accordingly, the applicant separately calculates the association-probability density distribution affine mapping factors of each transaction data word context-associated semantic feature vector, denoted for example as $V_1$, and of the transaction data word global understanding feature vector, denoted for example as $V_2$. That is, the association-probability density distribution affine mapping factors between each transaction data word context-associated semantic feature vector and the transaction data word global understanding feature vector are calculated with the following optimization formula to obtain the plurality of first association-probability density distribution affine mapping factors and the plurality of second association-probability density distribution affine mapping factors, respectively; wherein the optimization formula is:
where $V_1$ represents each transaction data word context-associated semantic feature vector, $V_2$ represents the transaction data word global understanding feature vector, $M$ is the association matrix obtained by position-by-position association between each transaction data word context-associated semantic feature vector and the transaction data word global understanding feature vector, $\mu$ and $\Sigma$ are the mean vector and the position-by-position variance matrix of the Gaussian density map formed by each transaction data word context-associated semantic feature vector and the transaction data word global understanding feature vector, the product operator in the formula represents matrix multiplication, $\exp(\cdot)$ represents the exponential operation of a matrix, which computes the natural exponential function raised to the power of the feature value at each position in the matrix, $w_1$ represents each of the plurality of first association-probability density distribution affine mapping factors, and $w_2$ represents each of the plurality of second association-probability density distribution affine mapping factors.
That is, by constructing the association feature space between each transaction data word context-associated semantic feature vector $V_1$ and the transaction data word global understanding feature vector $V_2$, together with the probability density space represented by the Gaussian probability density, $V_1$ and $V_2$ can each be mapped into affine homography subspaces within the association feature space and the class probability density space, so as to extract affine-homography-compliant representations of the features within the association feature domain and the class probability density domain. Weighting $V_1$ and $V_2$ by the association-probability density distribution affine mapping factor values $w_1$ and $w_2$ then improves the consistency of their class probability density distributions under the association representation, which in turn improves the consistency among the classification results obtained when the plurality of classification feature matrices pass through the classifier. In this way, the personal information data in the banking transaction data can be accurately encrypted, completing the desensitization of the banking transaction data and keeping the personal information data secure.
Further, the transfer association unit 163 is configured to: calculate, with the following transfer formula, the transfer matrix of each corresponding corrected transaction data word context-associated semantic feature vector relative to the corrected transaction data word global understanding feature vector to obtain the plurality of classification feature matrices; wherein the transfer formula is:
where $V_1$ represents the corrected transaction data word context-associated semantic feature vector, $V_2$ represents the corrected transaction data word global understanding feature vector, $M$ represents each of the plurality of classification feature matrices, and the product operator in the formula represents matrix multiplication.
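The transfer formula itself is not reproduced in the text, so the following sketch rests on an assumption: that the transfer matrix $M$ satisfies $V_1 = M V_2$. Under that assumption one valid choice is the minimum-norm solution $M = V_1 V_2^{\top} / (V_2^{\top} V_2)$, which the hypothetical helper `transfer_matrix` computes. Other constructions satisfying the same relation are possible.

```python
def transfer_matrix(v1: list[float], v2: list[float]) -> list[list[float]]:
    """Return M with M @ v2 == v1 (minimum-norm solution; an assumed form)."""
    norm_sq = sum(x * x for x in v2)
    if norm_sq == 0.0:
        raise ValueError("v2 must be non-zero to define a transfer matrix")
    # Outer product of v1 and v2, scaled by 1 / ||v2||^2.
    return [[a * b / norm_sq for b in v2] for a in v1]


def matvec(matrix: list[list[float]], vector: list[float]) -> list[float]:
    """Plain matrix-vector product, used here to verify the relation."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


v1 = [1.0, 2.0]          # corrected context-associated vector (toy values)
v2 = [3.0, 4.0]          # corrected global understanding vector (toy values)
M = transfer_matrix(v1, v2)
recovered = matvec(M, v2)
```

Each query vector produces its own matrix, which matches the text's statement that the step yields a plurality of classification feature matrices.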
Specifically, in the embodiment of the present application, the detection result judging module 170 is configured to pass the plurality of classification feature matrices through a classifier to obtain a plurality of classification results, where each classification result indicates whether the corresponding transaction data word belongs to personal information; and the desensitizing module 180 is configured to desensitize the banking transaction data to be analyzed based on the plurality of classification results.
The plurality of classification feature matrices are passed through a classifier to obtain a plurality of classification results, each indicating whether the corresponding transaction data word belongs to personal information. Once a transaction data word belonging to personal information is detected, it is encrypted, which completes the desensitization of the banking transaction data to be analyzed and keeps the personal information data in the banking transaction data secure.
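The desensitization step can be sketched as a function that consumes the words and their per-word classification results. The source calls for encryption but does not name a scheme, so the sketch substitutes simple character masking as a stand-in; the function name `desensitize` and the sample flags are hypothetical.

```python
def desensitize(words: list[str], is_personal: list[bool], mask_char: str = "*") -> list[str]:
    """Replace every word flagged as personal information with a mask.

    Masking is an illustrative stand-in for the encryption described in
    the text; a real system would apply an actual encryption scheme.
    """
    if len(words) != len(is_personal):
        raise ValueError("each word needs exactly one classification result")
    return [mask_char * len(word) if flagged else word
            for word, flagged in zip(words, is_personal)]


words = ["Transfer", "500.00", "to", "Zhang", "Wei"]
flags = [False, False, False, True, True]   # toy classifier output
masked = desensitize(words, flags)
```

Because the classification results are per word, the non-personal parts of the record remain readable after processing.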
Fig. 5 is a block diagram of the detection result judging module in the banking transaction data management system according to the embodiment of the present application. As shown in fig. 5, the detection result judging module 170 includes: a matrix unfolding unit 171, configured to unfold each of the plurality of classification feature matrices into a classification feature vector by row vectors or column vectors; a full-connection encoding unit 172, configured to perform full-connection encoding on the classification feature vector using the plurality of fully connected layers of the classifier to obtain an encoded classification feature vector; and a classification unit 173, configured to pass the encoded classification feature vector through the Softmax classification function of the classifier to obtain the classification result.
In a specific example of the present application, the classifier processes the plurality of classification feature matrices according to the following formula to obtain the classification results; wherein the formula is:
$O = \mathrm{softmax}\{(W_n, B_n) : \cdots : (W_1, B_1) \mid \mathrm{Project}(F)\}$, where $W_1$ to $W_n$ are weight matrices, $B_1$ to $B_n$ are bias vectors, and $\mathrm{Project}(F)$ denotes projecting the plurality of classification feature matrices into vectors.
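The classifier formula can be read as: flatten the feature matrix, apply the fully connected layers $(W_1, B_1)$ through $(W_n, B_n)$ in order, then apply Softmax. The sketch below follows that reading with toy weights; the two-class output, the absence of activations between layers, and all names are assumptions made for illustration.

```python
import math


def project(matrix: list[list[float]]) -> list[float]:
    """Unfold a classification feature matrix into a vector, row by row."""
    return [x for row in matrix for x in row]


def dense(vector: list[float], weights: list[list[float]], bias: list[float]) -> list[float]:
    """One fully connected layer: weights @ vector + bias."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable Softmax over the class logits."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(matrix, layers):
    """Apply Project, then layers (W_1, B_1) .. (W_n, B_n), then Softmax."""
    x = project(matrix)
    for weights, bias in layers:
        x = dense(x, weights, bias)
    return softmax(x)


# Toy single-layer classifier: class 0 = not personal, class 1 = personal.
layers = [([[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]], [0.0, 0.0])]
probs = classify([[1.0, 2.0], [3.0, 4.0]], layers)
is_personal = probs[1] > probs[0]
```

Running the classifier once per classification feature matrix gives one result per transaction data word.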
In summary, the banking transaction data management system 100 according to the embodiment of the present application has been described. It acquires banking transaction data to be analyzed, uses deep-learning-based artificial intelligence technology to mine the implicit semantic understanding features of the personal information data in that data, and hides or encrypts the data segments related to personal information data on the basis of those features to keep the personal information data secure. In this way, the personal information data in the banking transaction data can be accurately encrypted, completing the desensitization of the banking transaction data and keeping the personal information data secure.
As described above, the banking data management system 100 according to the embodiment of the present application may be implemented in various terminal devices, such as a server for banking data management, and the like. In one example, the banking data management system 100 according to an embodiment of the present application may be integrated into the terminal device as a software module and/or hardware module. For example, the banking data management system 100 may be a software module in the operating system of the terminal device or may be an application developed for the terminal device; of course, the banking data management system 100 could equally be one of a number of hardware modules of the terminal device.
Alternatively, in another example, the banking data management system 100 and the terminal device may be separate devices, and the banking data management system 100 may be connected to the terminal device through a wired and/or wireless network and transmit the interactive information in a contracted data format.
In one embodiment of the present application, fig. 6 is a flowchart of a banking transaction data management method according to an embodiment of the present application. As shown in fig. 6, the banking transaction data management method according to an embodiment of the present application includes: 210, acquiring bank transaction data to be analyzed; 220, performing word segmentation on the bank transaction data to be analyzed to obtain a transaction data word sequence; 230, passing the transaction data word sequence through a word embedding layer to obtain a sequence of transaction data word feature vectors; 240, passing the sequence of transaction data word feature vectors through a Transformer-based context encoder to obtain a plurality of transaction data word context-associated semantic feature vectors; 250, arranging the sequence of transaction data word feature vectors into a two-dimensional feature matrix and passing it through a text convolutional neural network model to obtain a transaction data word global understanding feature vector; 260, taking each of the transaction data word context-associated semantic feature vectors as a query feature vector, and calculating a transfer matrix between the query feature vector and the transaction data word global understanding feature vector to obtain a plurality of classification feature matrices; 270, passing the plurality of classification feature matrices through a classifier to obtain a plurality of classification results, wherein each classification result indicates whether the corresponding transaction data word belongs to personal information; and 280, performing desensitization processing on the bank transaction data to be analyzed based on the plurality of classification results.
Fig. 7 is a schematic diagram of a system architecture of a banking transaction data management method according to an embodiment of the present application. As shown in fig. 7, in the system architecture of the banking transaction data management method, first, banking transaction data to be analyzed is acquired; then, word segmentation is performed on the bank transaction data to be analyzed to obtain a transaction data word sequence; then, the transaction data word sequence is passed through a word embedding layer to obtain a sequence of transaction data word feature vectors; then, the sequence of transaction data word feature vectors is passed through a Transformer-based context encoder to obtain a plurality of transaction data word context-associated semantic feature vectors; then, the sequence of transaction data word feature vectors is arranged into a two-dimensional feature matrix and passed through a text convolutional neural network model to obtain a transaction data word global understanding feature vector; then, each of the transaction data word context-associated semantic feature vectors is taken as a query feature vector, and a transfer matrix between the query feature vector and the transaction data word global understanding feature vector is calculated to obtain a plurality of classification feature matrices; then, the plurality of classification feature matrices are passed through a classifier to obtain a plurality of classification results, each of which indicates whether the corresponding transaction data word belongs to personal information; and finally, the bank transaction data to be analyzed is desensitized based on the plurality of classification results.
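The final step of the flow above, desensitization based on per-word classification results, can be sketched as follows. This is only an illustration under stated assumptions: the word segmentation and the classification results are taken as already available, the function name `desensitize`, the sample words, and the flags are hypothetical, and masking with asterisks stands in for whichever hiding or encryption scheme an implementation actually uses.

```python
def desensitize(words, is_personal, mask_char="*"):
    """Mask every word whose classification result marks it as personal information.

    words       -- the transaction data word sequence (already segmented)
    is_personal -- one boolean classification result per word
    """
    if len(words) != len(is_personal):
        raise ValueError("expected exactly one classification result per word")
    return [mask_char * len(word) if flag else word
            for word, flag in zip(words, is_personal)]

# Hypothetical segmented transaction record and hypothetical classifier output.
words = ["transfer", "to", "Zhang San", "account", "6222020012345678", "amount", "500"]
flags = [False, False, True, False, True, False, False]
masked = desensitize(words, flags)
```

Masking keeps the length of each hidden segment visible; an implementation that must not leak lengths could substitute a fixed-width token or an encrypted value instead.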
In a specific example, in the banking transaction data management method, passing the sequence of transaction data word feature vectors through a Transformer-based context encoder to obtain a plurality of transaction data word context-associated semantic feature vectors includes: arranging the sequence of transaction data word feature vectors one-dimensionally to obtain a global transaction data word feature vector; calculating the product between the global transaction data word feature vector and the transpose of each transaction data word feature vector in the sequence of transaction data word feature vectors to obtain a plurality of self-attention association matrices; normalizing each of the plurality of self-attention association matrices to obtain a plurality of normalized self-attention association matrices; passing each of the plurality of normalized self-attention association matrices through a Softmax classification function to obtain a plurality of probability values; and weighting each transaction data word feature vector in the sequence of transaction data word feature vectors by the corresponding probability value to obtain the plurality of transaction data word context-associated semantic feature vectors.
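The steps above follow the general pattern of self-attention: score, normalize, Softmax, then weight. The sketch below deliberately swaps in the standard scaled dot-product self-attention formulation rather than the exact matrix construction described in the text, and it uses the word vectors themselves as queries, keys, and values with no learned projections. The function names and the toy embeddings are assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention with identity projections.

    Each output vector is a softmax-weighted average of all input vectors,
    so it carries context from the whole sequence.
    """
    dim = len(vectors[0])
    scale = math.sqrt(dim)
    outputs = []
    for query in vectors:
        # Association scores between this word and every word in the sequence.
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in vectors]
        weights = softmax(scores)
        # Weight every word vector by its probability value and sum them up.
        context = [sum(w * value[j] for w, value in zip(weights, vectors))
                   for j in range(dim)]
        outputs.append(context)
    return outputs

# Toy word feature vectors (hypothetical values).
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual = self_attention(embeddings)
```

A production Transformer encoder would add learned query, key, and value projections, multiple heads, residual connections, and feed-forward sublayers; those are omitted to keep the weighting step visible.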
In a specific example, in the banking transaction data management method, arranging the sequence of transaction data word feature vectors into a two-dimensional feature matrix and then obtaining the transaction data word global understanding feature vector through a text convolutional neural network model includes: using each layer of the text convolutional neural network model to perform, in the forward pass of that layer, convolution processing, feature-matrix-based mean pooling, and nonlinear activation on the input data, so that the last layer of the text convolutional neural network model outputs the transaction data word global understanding feature vector, wherein the input of the first layer of the text convolutional neural network model is the two-dimensional feature matrix.
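A single layer of the kind described above (convolution, then mean pooling, then nonlinear activation) can be sketched in plain Python as follows. The kernel values, the 2x2 pooling window, the choice of ReLU as the nonlinearity, and the 5x5 input matrix are all assumptions made for illustration; a trained model would learn its kernels and typically stack several such layers.

```python
def conv2d_valid(matrix, kernel):
    """2-D convolution (cross-correlation form, no padding, stride 1)."""
    rows, cols = len(matrix), len(matrix[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    return [[sum(matrix[i + a][j + b] * kernel[a][b]
                 for a in range(k_rows) for b in range(k_cols))
             for j in range(cols - k_cols + 1)]
            for i in range(rows - k_rows + 1)]

def mean_pool(matrix, size=2):
    """Mean pooling over non-overlapping size x size windows."""
    rows, cols = len(matrix) // size, len(matrix[0]) // size
    return [[sum(matrix[i * size + a][j * size + b]
                 for a in range(size) for b in range(size)) / (size * size)
             for j in range(cols)]
            for i in range(rows)]

def relu(matrix):
    """Element-wise ReLU activation."""
    return [[max(0.0, value) for value in row] for row in matrix]

def cnn_layer(matrix, kernel, pool_size=2):
    """One layer: convolution, mean pooling, then nonlinear activation."""
    return relu(mean_pool(conv2d_valid(matrix, kernel), pool_size))

# Hypothetical two-dimensional feature matrix: 5 words x 5 embedding dimensions.
feature_matrix = [[float(i + j) for j in range(5)] for i in range(5)]
kernel = [[1.0, 0.0], [0.0, 1.0]]  # illustrative, not a learned kernel
feature_map = cnn_layer(feature_matrix, kernel)
# Flatten the last layer's output into a global feature vector.
global_vector = [value for row in feature_map for value in row]
```

Here the 5x5 input shrinks to 4x4 after convolution and to 2x2 after pooling, so the flattened global vector has four entries.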
In a specific example, in the banking transaction data management method, each transaction data word context-associated semantic feature vector is used as a query feature vector, and a transfer matrix between the query feature vector and the transaction data word global understanding feature vector is calculated to obtain a plurality of classification feature matrices, including: calculating correlation-probability density distribution affine mapping factors between the context correlation semantic feature vectors of the transaction data words and the global understanding feature vectors of the transaction data words respectively to obtain a plurality of first correlation-probability density distribution affine mapping factors and a plurality of second correlation-probability density distribution affine mapping factors; respectively weighting the context-associated semantic feature vectors of all the transaction data words and the global understanding feature vectors of all the transaction data words by taking the affine mapping factors of the first association-probability density distribution and the affine mapping factors of the second association-probability density distribution as weights so as to obtain context-associated semantic feature vectors of all the transaction data words and global understanding feature vectors of all the transaction data words after correction; and calculating a transfer matrix of the context-associated semantic feature vector of each corresponding corrected transaction data word relative to the global understanding feature vector of the corrected transaction data word to obtain the plurality of classification feature matrices.
In a specific example, in the above banking transaction data management method, calculating the association-probability density distribution affine mapping factors between the respective transaction data word context associated semantic feature vectors and the transaction data word global understanding feature vectors to obtain a plurality of first association-probability density distribution affine mapping factors and a plurality of second association-probability density distribution affine mapping factors, respectively, includes: calculating correlation-probability density distribution affine mapping factors between the context-associated semantic feature vectors of the respective transaction data words and the transaction data word global understanding feature vector respectively in the following optimization formula to obtain the plurality of first correlation-probability density distribution affine mapping factors and the plurality of second correlation-probability density distribution affine mapping factors; wherein, the optimization formula is:
wherein $V_1$ represents each of the transaction data word context-associated semantic feature vectors, $V_2$ represents the transaction data word global understanding feature vector, $M$ is the association matrix obtained by position-by-position association between each transaction data word context-associated semantic feature vector and the transaction data word global understanding feature vector, $\mu$ and $\Sigma$ are the mean vector and the position-by-position variance matrix of the Gaussian density map formed by each transaction data word context-associated semantic feature vector and the transaction data word global understanding feature vector, the operator symbol in the formula represents matrix multiplication, $\exp(\cdot)$ represents the exponential operation on a matrix, that is, computing the natural exponential function raised to the power of the feature value at each position of the matrix, $w_1$ represents each of the plurality of first association-probability density distribution affine mapping factors, and $w_2$ represents each of the plurality of second association-probability density distribution affine mapping factors.
In a specific example, in the banking transaction data management method, calculating a transfer matrix of the corrected transaction data word context associated semantic feature vector corresponding to each group with respect to the corrected transaction data word global understanding feature vector to obtain the plurality of classification feature matrices includes: calculating a transfer matrix of the context-associated semantic feature vector of each group of corresponding corrected transaction data words relative to the global understanding feature vector of the corrected transaction data words according to the following transfer formula to obtain a plurality of classification feature matrices; wherein, the transfer formula is:
wherein $V_1$ represents the corrected transaction data word context-associated semantic feature vector, $V_2$ represents the corrected transaction data word global understanding feature vector, $M$ represents each of the plurality of classification feature matrices, and the operator symbol in the formula represents matrix multiplication.
In a specific example, in the banking transaction data management method, passing the plurality of classification feature matrices through a classifier to obtain a plurality of classification results, each of which indicates whether the corresponding transaction data word belongs to personal information, includes: expanding the plurality of classification feature matrices into classification feature vectors according to row vectors or column vectors; performing full-connection encoding on the classification feature vectors by using a plurality of full-connection layers of the classifier to obtain encoded classification feature vectors; and passing the encoded classification feature vectors through a Softmax classification function of the classifier to obtain the classification results.
It will be appreciated by those skilled in the art that the specific operation of the various steps in the banking data management method described above has been described in detail in the description of the banking data management system described above with reference to figures 1 to 5 and, therefore, duplicate descriptions thereof will be omitted.
The present application also provides a computer program product comprising instructions which, when executed, cause an apparatus to perform operations corresponding to the above-described method.
In one embodiment of the present application, there is also provided a computer-readable storage medium storing a computer program for executing the above-described method.
It should be appreciated that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the forms of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects may be utilized. Furthermore, the computer program product may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Methods, systems, and computer program products of embodiments of the present application are described in the flow diagrams and/or block diagrams. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The basic principles of the present application have been described above in connection with specific embodiments, however, it should be noted that the advantages, benefits, effects, etc. mentioned in the present application are merely examples and not intended to be limiting, and these advantages, benefits, effects, etc. are not to be considered as essential to the various embodiments of the present application. Furthermore, the specific details disclosed herein are for purposes of illustration and understanding only, and are not intended to be limiting, as the application is not necessarily limited to practice with the above described specific details.
The block diagrams of the devices, apparatuses, equipment, and systems referred to in the present application are only illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. As will be appreciated by one of skill in the art, these devices, apparatuses, equipment, and systems may be connected, arranged, and configured in any manner. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including but not limited to," and are used interchangeably therewith. The terms "or" and "and" as used herein refer to, and are used interchangeably with, the term "and/or" unless the context clearly indicates otherwise. The term "such as" as used herein refers to, and is used interchangeably with, the phrase "such as, but not limited to."
It is also noted that in the apparatus, devices and methods of the present application, the components or steps may be disassembled and/or assembled. Such decomposition and/or recombination should be considered as equivalent aspects of the present application.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or terminal device comprising the element.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit embodiments of the application to the form disclosed herein. Although a number of example aspects and embodiments have been discussed above, a person of ordinary skill in the art will recognize certain variations, modifications, alterations, additions, and subcombinations thereof.

Claims (10)

1. A banking transaction data management system, comprising:
the transaction data acquisition module is used for acquiring bank transaction data to be analyzed;
the word segmentation processing module is used for carrying out word segmentation processing on the bank transaction data to be analyzed to obtain a transaction data word sequence;
the word embedding module is used for enabling the transaction data word sequence to pass through a word embedding layer to obtain a sequence of transaction data word feature vectors;
the transaction semantic understanding module is used for enabling the sequence of the transaction data word feature vectors to pass through a Transformer-based context encoder to obtain a plurality of transaction data word context-associated semantic feature vectors;
the global association coding module is used for arranging the sequence of the transaction data word feature vectors into a two-dimensional feature matrix and obtaining transaction data word global understanding feature vectors through a text convolutional neural network model;
the query module is used for respectively taking the context-associated semantic feature vectors of the transaction data words as query feature vectors, and calculating a transfer matrix between the context-associated semantic feature vectors and the transaction data word global understanding feature vectors so as to obtain a plurality of classification feature matrices;
the detection result judging module is used for enabling the plurality of classification feature matrixes to pass through a classifier to obtain a plurality of classification results, and each classification result is used for indicating whether each transaction data word belongs to personal information or not; and
the desensitization module is used for carrying out desensitization processing on the bank transaction data to be analyzed based on the plurality of classification results.
2. The banking transaction data management system according to claim 1, wherein the transaction semantic understanding module includes:
the query vector construction unit is used for one-dimensionally arranging the sequence of the transaction data word feature vectors to obtain global transaction data word feature vectors;
a self-attention unit, configured to calculate a product between the global transaction data word feature vector and a transpose vector of each transaction data word feature vector in the sequence of transaction data word feature vectors to obtain a plurality of self-attention correlation matrices;
the normalization unit is used for respectively performing normalization processing on each self-attention correlation matrix in the plurality of self-attention correlation matrices to obtain a plurality of normalized self-attention correlation matrices;
the attention calculating unit is used for obtaining a plurality of probability values through a Softmax classification function by each normalized self-attention correlation matrix in the normalized self-attention correlation matrices; and
the attention applying unit is used for weighting each transaction data word feature vector in the sequence of the transaction data word feature vectors by taking each probability value in the plurality of probability values as a weight so as to obtain the plurality of transaction data word context-associated semantic feature vectors.
3. The banking transaction data management system according to claim 2, wherein the global association encoding module is configured to: use each layer of the text convolutional neural network model to perform, in the forward pass of that layer, convolution processing, feature-matrix-based mean pooling, and nonlinear activation on the input data, so that the last layer of the text convolutional neural network model outputs the transaction data word global understanding feature vector, wherein the input of the first layer of the text convolutional neural network model is the two-dimensional feature matrix.
4. The banking transaction data management system according to claim 3, wherein the query module comprises:
an optimization factor calculation unit, configured to calculate association-probability density distribution affine mapping factors between the context associated semantic feature vectors of the respective transaction data words and the global understanding feature vectors of the transaction data words, respectively, to obtain a plurality of first association-probability density distribution affine mapping factors and a plurality of second association-probability density distribution affine mapping factors;
the weighted optimization unit is used for respectively taking the plurality of first association-probability density distribution affine mapping factors and the plurality of second association-probability density distribution affine mapping factors as weights and respectively weighting the context association semantic feature vectors of all transaction data words and the transaction data word global understanding feature vectors so as to obtain a plurality of corrected transaction data word context association semantic feature vectors and a plurality of corrected transaction data word global understanding feature vectors; and
the transfer association unit is used for calculating a transfer matrix of the context association semantic feature vector of each group of corresponding corrected transaction data words relative to the global understanding feature vector of the corrected transaction data words so as to obtain the plurality of classification feature matrices.
5. The banking data management system according to claim 4, wherein the optimization factor calculation unit is configured to: calculating correlation-probability density distribution affine mapping factors between the context-associated semantic feature vectors of the respective transaction data words and the transaction data word global understanding feature vector respectively in the following optimization formula to obtain the plurality of first correlation-probability density distribution affine mapping factors and the plurality of second correlation-probability density distribution affine mapping factors;
wherein, the optimization formula is:
wherein $V_1$ represents each of the transaction data word context-associated semantic feature vectors, $V_2$ represents the transaction data word global understanding feature vector, $M$ is the association matrix obtained by position-by-position association between each transaction data word context-associated semantic feature vector and the transaction data word global understanding feature vector, $\mu$ and $\Sigma$ are the mean vector and the position-by-position variance matrix of the Gaussian density map formed by each transaction data word context-associated semantic feature vector and the transaction data word global understanding feature vector, the operator symbol in the formula represents matrix multiplication, $\exp(\cdot)$ represents the exponential operation on a matrix, that is, computing the natural exponential function raised to the power of the feature value at each position of the matrix, $w_1$ represents each of the plurality of first association-probability density distribution affine mapping factors, and $w_2$ represents each of the plurality of second association-probability density distribution affine mapping factors.
6. The banking data management system according to claim 5, wherein the transfer association unit is configured to: calculating a transfer matrix of the context-associated semantic feature vector of each group of corresponding corrected transaction data words relative to the global understanding feature vector of the corrected transaction data words according to the following transfer formula to obtain a plurality of classification feature matrices;
wherein, the transfer formula is:
wherein $V_1$ represents the corrected transaction data word context-associated semantic feature vector, $V_2$ represents the corrected transaction data word global understanding feature vector, $M$ represents each of the plurality of classification feature matrices, and the operator symbol in the formula represents matrix multiplication.
7. The banking data management system according to claim 6, wherein the detection result judging module includes:
a matrix expansion unit, configured to expand the plurality of classification feature matrices into classification feature vectors according to row vectors or column vectors;
the full-connection coding unit is used for carrying out full-connection coding on the classification characteristic vectors by using a plurality of full-connection layers of the classifier so as to obtain coded classification characteristic vectors; and
the classification unit is used for passing the encoded classification feature vector through a Softmax classification function of the classifier to obtain the classification result.
8. A method of managing banking transaction data, comprising:
acquiring bank transaction data to be analyzed;
word segmentation processing is carried out on the bank transaction data to be analyzed so as to obtain a transaction data word sequence;
passing the transaction data word sequence through a word embedding layer to obtain a sequence of transaction data word feature vectors;
passing the sequence of transaction data word feature vectors through a Transformer-based context encoder to obtain a plurality of transaction data word context-associated semantic feature vectors;
arranging the sequence of transaction data word feature vectors into a two-dimensional feature matrix and passing it through a text convolutional neural network model to obtain a transaction data word global understanding feature vector;
respectively taking the context-associated semantic feature vectors of the transaction data words as query feature vectors, and calculating a transfer matrix between each query feature vector and the transaction data word global understanding feature vector to obtain a plurality of classification feature matrices;
passing the plurality of classification feature matrices through a classifier to obtain a plurality of classification results, wherein each classification result indicates whether the corresponding transaction data word belongs to personal information; and
desensitizing the bank transaction data to be analyzed based on the plurality of classification results.
9. The method of claim 8, wherein passing the sequence of transaction data word feature vectors through a Transformer-based context encoder to obtain a plurality of transaction data word context-associated semantic feature vectors comprises:
one-dimensional arrangement is carried out on the sequence of the transaction data word feature vectors so as to obtain global transaction data word feature vectors;
calculating the product between the global transaction data word feature vector and the transpose vector of each transaction data word feature vector in the sequence of transaction data word feature vectors to obtain a plurality of self-attention association matrices;
respectively normalizing each self-attention correlation matrix in the plurality of self-attention correlation matrices to obtain a plurality of normalized self-attention correlation matrices;
obtaining a plurality of probability values by using a Softmax classification function through each normalized self-attention correlation matrix in the normalized self-attention correlation matrices; and
weighting each transaction data word feature vector in the sequence of transaction data word feature vectors by taking each probability value in the plurality of probability values as a weight to obtain the plurality of transaction data word context-associated semantic feature vectors.
10. The method according to claim 9, wherein arranging the sequence of transaction data word feature vectors into a two-dimensional feature matrix and then obtaining the transaction data word global understanding feature vector through a text convolutional neural network model comprises: using each layer of the text convolutional neural network model to perform, in the forward pass of that layer, convolution processing, feature-matrix-based mean pooling, and nonlinear activation on the input data, so that the last layer of the text convolutional neural network model outputs the transaction data word global understanding feature vector, wherein the input of the first layer of the text convolutional neural network model is the two-dimensional feature matrix.
CN202310540633.0A 2023-05-15 2023-05-15 Bank transaction data management method and system Withdrawn CN116595551A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310540633.0A CN116595551A (en) 2023-05-15 2023-05-15 Bank transaction data management method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310540633.0A CN116595551A (en) 2023-05-15 2023-05-15 Bank transaction data management method and system

Publications (1)

Publication Number Publication Date
CN116595551A (en) 2023-08-15

Family

ID=87598525

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310540633.0A Withdrawn CN116595551A (en) 2023-05-15 2023-05-15 Bank transaction data management method and system

Country Status (1)

Country Link
CN (1) CN116595551A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117116498A (en) * 2023-10-23 2023-11-24 吉林大学 Mobile ward-round data processing system and method thereof

Similar Documents

Publication Publication Date Title
CN111476294B (en) Zero sample image identification method and system based on generation countermeasure network
CN111581405B (en) Cross-modal generalization zero sample retrieval method for generating confrontation network based on dual learning
Shi et al. A novel multi-branch channel expansion network for garbage image classification
CN110929080A (en) Optical remote sensing image retrieval method based on attention and generation countermeasure network
CN115951883B (en) Service component management system of distributed micro-service architecture and method thereof
Li et al. Dating ancient paintings of Mogao Grottoes using deeply learnt visual codes
CN112765370A (en) Entity alignment method and device of knowledge graph, computer equipment and storage medium
CN115222998B (en) Image classification method
CN116595551A (en) Bank transaction data management method and system
Chen et al. Malicious URL detection based on improved multilayer recurrent convolutional neural network model
Zhou et al. Hyperspectral image change detection by self-supervised tensor network
Chen et al. Image classification based on convolutional denoising sparse autoencoder
Dang et al. Spectral‐Spatial Attention Transformer with Dense Connection for Hyperspectral Image Classification
Yang et al. Bootstrapping interactive image-text alignment for remote sensing image captioning
CN114494777A (en) Hyperspectral image classification method and system based on 3D CutMix-transform
Yuan et al. CSCIM_FS: Cosine similarity coefficient and information measurement criterion-based feature selection method for high-dimensional data
CN117521012A (en) False information detection method based on multi-mode context hierarchical step alignment
CN117009516A (en) Converter station fault strategy model training method, pushing method and device
CN117082118A (en) Network connection method based on data derivation and port prediction
CN113887504A (en) Strong-generalization remote sensing image target identification method
Li et al. ViT2CMH: Vision Transformer Cross-Modal Hashing for Fine-Grained Vision-Text Retrieval.
Li et al. MS-Former: Memory-Supported Transformer for Weakly Supervised Change Detection with Patch-Level Annotations
Kang et al. Learning binary semantic embedding for breast histology image classification and retrieval
CN116434263A (en) Bank financial data management method and system
CN116821408B (en) Multi-task consistency countermeasure retrieval method and system

Legal Events

Date Code Title Description
PB01 Publication
WW01 Invention patent application withdrawn after publication

Application publication date: 20230815