CN117649240A - Suspicious account identification method, system, device, storage medium, and program product

Publication number
CN117649240A
Authority
CN
China
Prior art keywords
transaction
data
identifier
entity
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311742722.XA
Other languages
Chinese (zh)
Inventor
向玲
张雷
李迪
严东
黄霖慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Life Insurance Co ltd
Original Assignee
China Life Insurance Co ltd

Landscapes

  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The application relates to a suspicious account identification method, system, device, storage medium, and program product. The method comprises: acquiring transaction data and determining risk data in the transaction data; extracting transaction event data from the risk data, and constructing a knowledge graph according to the transaction event data, wherein the transaction event data at least comprises a transaction entity identifier, an entity relationship, and a transaction time, and the transaction entity identifier at least comprises an account identifier; and, when identifying suspicious accounts, first determining a target account identifier to be identified, determining in the knowledge graph a transaction cue graph indexed by the target account identifier, and identifying whether the target account is a suspicious account according to the transaction cue graph. With this method, the entity and relationship features specific to a knowledge graph can be used to mine suspicious clues and their hidden relationships in the transaction event data, which improves the accuracy and reliability of suspicious account identification and also improves the working efficiency of staff.

Description

Suspicious account identification method, system, device, storage medium, and program product
Technical Field
The present disclosure relates to the field of big data technologies, and in particular, to a suspicious account identification method, system, device, storage medium, and program product.
Background
Money laundering is the act of converting funds of illegal origin into apparently legal funds, or concealing the source of funds, through financial institutions such as banks and insurance companies. Anti-money laundering work is an important means of preventing financial risk, maintaining financial stability, and building a sound financial environment, and accurate identification of suspicious accounts is particularly important in that work.
In traditional anti-money laundering work, suspicious accounts are identified mainly by a rule engine that triggers on fixed indicators for individual transaction records.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a suspicious account identification method, system, apparatus, storage medium, and program product that can improve identification efficiency and reliability.
In a first aspect, the present application provides a suspicious account identification method, the method including:
acquiring transaction data and determining risk data in the transaction data;
extracting transaction event data from the risk data, and constructing a knowledge graph according to the transaction event data, wherein the transaction event data at least comprises a transaction entity identifier, an entity relationship and transaction time, and the transaction entity identifier at least comprises an account identifier;
and determining a target account identifier to be identified, determining a transaction cue graph indexed by the target account identifier in the knowledge graph, and identifying whether the target account is a suspicious account according to the transaction cue graph.
In one embodiment, constructing a knowledge-graph from transaction event data includes:
and constructing a directed graph by taking the transaction entity identifier as a node and the entity relationship as an edge, and correlating the transaction time to obtain a knowledge graph.
In one embodiment, determining a transaction cue graph indexed by the target account identifier in the knowledge graph includes:
and positioning the target account identifier in the knowledge graph, determining a target entity relationship and a target transaction entity identifier which are associated with the target account identifier, and determining a transaction cue graph according to the target account identifier, the target entity relationship and the target transaction entity identifier.
In one embodiment, determining a target entity relationship associated with a target account identification and a target transaction entity identification includes:
and expanding based on the target account identifier, taking the entity relationship directly connected or indirectly connected with the target account identifier as a target entity relationship, and taking the transaction entity identifier indirectly connected with the target account identifier as a target transaction entity identifier.
In one embodiment, acquiring transaction data and determining risk data in the transaction data includes:
acquiring original transaction data of a plurality of systems, and carrying out data summarization on each original transaction data to obtain transaction data;
and carrying out data extraction and conversion on the transaction data to obtain original risk data, and carrying out normalization processing on the original risk data to obtain risk data.
In one embodiment, the method further comprises:
the risk data, transaction event data and knowledge graph are stored in different databases, respectively.
In a second aspect, the present application provides a suspicious account identification system, the system comprising:
the risk data acquisition module is used for acquiring transaction data and determining risk data in the transaction data;
the risk data processing module is used for extracting transaction event data from the risk data and constructing a knowledge graph according to the transaction event data, wherein the transaction event data at least comprises a transaction entity identifier, an entity relationship and transaction time, and the transaction entity identifier at least comprises an account identifier;
and the suspicious account identification module is used for determining a target account identifier to be identified, determining a transaction cue graph indexed by the target account identifier in the knowledge graph, and identifying whether the target account is a suspicious account according to the transaction cue graph.
In a third aspect, the present application provides a computer device comprising a memory storing a computer program and a processor, the processor implementing the following steps when executing the computer program:
acquiring transaction data and determining risk data in the transaction data;
extracting transaction event data from the risk data, and constructing a knowledge graph according to the transaction event data, wherein the transaction event data at least comprises a transaction entity identifier, an entity relationship and transaction time, and the transaction entity identifier at least comprises an account identifier;
and determining a target account identifier to be identified, determining a transaction cue graph indexed by the target account identifier in the knowledge graph, and identifying whether the target account is a suspicious account according to the transaction cue graph.
In a fourth aspect, the present application provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
acquiring transaction data and determining risk data in the transaction data;
extracting transaction event data from the risk data, and constructing a knowledge graph according to the transaction event data, wherein the transaction event data at least comprises a transaction entity identifier, an entity relationship and transaction time, and the transaction entity identifier at least comprises an account identifier;
and determining a target account identifier to be identified, determining a transaction cue graph indexed by the target account identifier in the knowledge graph, and identifying whether the target account is a suspicious account according to the transaction cue graph.
In a fifth aspect, the present application provides a computer program product comprising a computer program which, when executed by a processor, performs the steps of:
acquiring transaction data and determining risk data in the transaction data;
extracting transaction event data from the risk data, and constructing a knowledge graph according to the transaction event data, wherein the transaction event data at least comprises a transaction entity identifier, an entity relationship and transaction time, and the transaction entity identifier at least comprises an account identifier;
and determining a target account identifier to be identified, determining a transaction cue graph indexed by the target account identifier in the knowledge graph, and identifying whether the target account is a suspicious account according to the transaction cue graph.
According to the above suspicious account identification method, system, device, storage medium, and program product, transaction data is acquired and risk data is determined in the transaction data; transaction event data is extracted from the risk data, and a knowledge graph is constructed according to the transaction event data, wherein the transaction event data at least comprises a transaction entity identifier, an entity relationship, and a transaction time, and the transaction entity identifier at least comprises an account identifier; when identifying suspicious accounts, a target account identifier to be identified is first determined, a transaction cue graph indexed by the target account identifier is determined in the knowledge graph, and whether the target account is a suspicious account is identified according to the transaction cue graph. Guided by business rules, the method uses the entity and relationship features specific to a knowledge graph to mine suspicious clues and their hidden relationships in the transaction event data, providing a new data analysis method for finance-related business. By constructing the knowledge graph and presenting transaction data that carries money laundering risk as a cue graph, suspicious transaction behavior can be analyzed systematically, which improves the accuracy and reliability of suspicious account identification and the working efficiency of staff.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the related art, the drawings that are required to be used in the embodiments or the related technical descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to the drawings without inventive effort for a person having ordinary skill in the art.
FIG. 1 is an application environment diagram of a suspicious account identification method in one embodiment;
FIG. 2 is a flow diagram of a suspicious account identification method in one embodiment;
FIG. 3 is a diagram of an interface for querying according to entity identification in one embodiment;
FIG. 4 is an expanded schematic diagram of entity relationships in one embodiment;
FIG. 5 is a schematic diagram of an entity relationship expansion in another embodiment;
FIG. 6 is a flow diagram of knowledge graph construction data in one embodiment;
FIG. 7 is a flow chart of a suspicious account identification method according to another embodiment;
FIG. 8 is a block diagram of a suspicious account identification system in one embodiment;
FIG. 9 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
The suspicious account identification method provided by the embodiment of the application can be applied to an application environment shown in fig. 1. Wherein the terminal 102 communicates with the server 104 via a network. The data storage system may store data that the server 104 needs to process. The data storage system may be integrated on the server 104 or may be located on a cloud or other network server. The server 104 obtains the transaction data and determines risk data in the transaction data; the server 104 extracts transaction event data from the risk data, and constructs a knowledge graph according to the transaction event data, wherein the transaction event data at least comprises a transaction entity identifier, an entity relationship and a transaction time, and the transaction entity identifier at least comprises an account identifier; the server 104 determines a target account identification to be identified, determines a transaction cue graph indexed by the target account identification in the knowledge graph, and identifies whether the target account is a suspicious account according to the transaction cue graph.
The terminal 102 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, internet of things devices, and portable wearable devices, where the internet of things devices may be smart speakers, smart televisions, smart air conditioners, smart vehicle devices, and the like. The portable wearable device may be a smart watch, smart bracelet, headset, or the like. The server 104 may be implemented as a stand-alone server or as a server cluster of multiple servers.
In an exemplary embodiment, as shown in fig. 2, a suspicious account identification method is provided, which is illustrated by using the method applied to the server 104 in fig. 1 as an example, and includes the following steps 202 to 206. Wherein:
Step 202, acquiring transaction data and determining risk data in the transaction data.
Transaction data describes transaction behavior and includes the transaction time, the transaction object, the transaction content, the transaction type, and the like. Analyzing the transaction data to find the risk data in it is essential for identifying suspicious accounts. Risk data is the portion of the transaction data that carries risk, such as frequent transactions or high-value transactions.
By way of example, risk data may be determined in the transaction data by way of data extraction, data conversion, data loading, and the like. The determination of risk data can be achieved by data extraction tools in the face of huge amounts of data.
In an exemplary embodiment, the determined risk data may be represented as CSV (Comma-Separated Values) data. CSV stores tabular data as plain text and can be used to exchange data between otherwise incompatible systems, which improves data processing efficiency.
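As a minimal sketch of this step, the following Python selects high-value or high-frequency transactions as risk data and serializes them as CSV. The field names (`account_id`, `amount`, `time`), the two thresholds, and the sample rows are assumptions made for illustration; the application does not specify concrete criteria or a schema.

```python
import csv
import io
from collections import Counter

# Hypothetical thresholds; the application does not give concrete values.
HIGH_VALUE = 50_000      # amount at or above which a transaction counts as high-value
HIGH_FREQUENCY = 3       # transaction count at or above which an account counts as frequent

def select_risk_rows(transactions):
    """Keep rows that are high-value or belong to a frequently transacting account."""
    counts = Counter(t["account_id"] for t in transactions)
    return [
        t for t in transactions
        if t["amount"] >= HIGH_VALUE or counts[t["account_id"]] >= HIGH_FREQUENCY
    ]

def to_csv(rows, fields=("account_id", "amount", "time")):
    """Serialize rows as plain-text CSV with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Illustrative data only.
transactions = [
    {"account_id": "001", "amount": 2688, "time": "2019-04-23"},
    {"account_id": "001", "amount": 1200, "time": "2019-05-02"},
    {"account_id": "001", "amount": 900, "time": "2019-05-09"},
    {"account_id": "002", "amount": 80000, "time": "2019-06-01"},
    {"account_id": "003", "amount": 300, "time": "2019-06-03"},
]
risk_rows = select_risk_rows(transactions)
risk_csv = to_csv(risk_rows)
```

In practice the selection rules would come from the business side; the point of the sketch is only that the output is a flat, plain-text table that downstream systems can read.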
Step 204, extracting transaction event data from the risk data, and constructing a knowledge graph according to the transaction event data, wherein the transaction event data at least comprises a transaction entity identifier, an entity relationship, and a transaction time, and the transaction entity identifier at least comprises an account identifier.
Transaction event data describes transaction events. Taking the insurance field as an example, a transaction event may be a policy application event, a policy return event, a policy borrowing event, a policy claim settlement event, and the like. In the transaction event data, a transaction entity identifier uniquely identifies a transaction entity in a transaction event, for example an account identifier, a policy identifier, or a marketer identifier. The account identifier may be an account number, an identity card number, or the like; the policy identifier may be a policy number; and the marketer identifier may be the marketer's employee number, serial number, or the like. An entity relationship represents a relationship between transaction entities in a transaction event, such as the relationship between an account and a policy or between an account and a marketer. The transaction time is the time at which the transaction event occurred. Constructing a knowledge graph from the transaction event data associates the items of transaction event data with one another and makes them easier to inspect visually.
For example, the knowledge graph may be constructed by processing the transaction event data with a JSON (JavaScript Object Notation) script generated by the Aviator Script calculation engine. The Aviator Script calculation engine is used to consolidate the data processing into Excel, which records the data lineage, that is, the source of every table and every field, and makes the data layering clearer. When iterative development is required, the historical data processing can be looked up in the Excel sheet in order to design the development approach for the new requirement, and handover time is shortened when developers change. The approach also avoids writing redundant, complex Spark code directly: developers share a single automated program that generates JSON from the Excel sheet, which reduces development cost and improves data processing efficiency.
Step 206, determining the target account identification to be identified, determining a transaction cue graph indexed by the target account identification in the knowledge graph, and identifying whether the target account is a suspicious account according to the transaction cue graph.
The target account identifier uniquely identifies the target account. The transaction cue graph can be regarded as a part of the knowledge graph: when checking whether an account is suspicious, the target account identifier to be identified is first determined, and the transaction cue graph indexed by the target account identifier is then determined in the knowledge graph, that is, the data related to the target account identifier is extracted from the knowledge graph on its own to form the transaction cue graph. Whether the target account is a suspicious account is determined by analyzing and processing the transaction cue graph. Identifying suspicious accounts in this visual way can improve the efficiency and accuracy of suspicious account identification.
In the above suspicious account identification method, transaction data is acquired and risk data is determined in the transaction data; transaction event data is extracted from the risk data, and a knowledge graph is constructed according to the transaction event data, wherein the transaction event data at least comprises a transaction entity identifier, an entity relationship, and a transaction time, and the transaction entity identifier at least comprises an account identifier; when identifying suspicious accounts, a target account identifier to be identified is first determined, a transaction cue graph indexed by the target account identifier is determined in the knowledge graph, and whether the target account is a suspicious account is identified according to the transaction cue graph. Guided by business rules, the method uses the entity and relationship features specific to a knowledge graph to mine suspicious clues and their hidden relationships in the transaction event data, providing a new data analysis method for finance-related business. By constructing the knowledge graph and presenting transaction data that carries money laundering risk as a cue graph, suspicious transaction behavior can be analyzed systematically, which improves the accuracy and reliability of suspicious account identification and the working efficiency of staff.
In one exemplary embodiment, building a knowledge-graph from transaction event data includes: and constructing a directed graph by taking the transaction entity identifier as a node and the entity relationship as an edge, and correlating the transaction time to obtain a knowledge graph.
The knowledge graph is a knowledge base for representing entities and relations, and the interrelationships among different entities are presented in a graphical mode. The knowledge graph is constructed by firstly extracting transaction entities and entity relations from transaction event data, then carrying out structural representation on the extracted transaction entities and entity relations, and storing the obtained knowledge graph, and further carrying out data reasoning, data expansion or data query on the generated knowledge graph.
Taking an insurance application event as an example, before the knowledge graph is constructed, an applicant identifier, an insured person identifier, a policy identifier, a marketer identifier, an insurance time, an insurance policy time, and the like are extracted from the insurance data. The applicant identifier, insured person identifier, policy identifier, and marketer identifier are transaction entity identifiers. An insurance relationship exists between the applicant and the insured person, a purchase relationship exists between the applicant and the policy, a guarantee relationship exists between the policy and the insured person, and a management relationship exists between the policy and the marketer; these four relationships are entity relationships. A directed graph is constructed from the transaction entity identifiers and the entity relationships, and the insurance time and insurance policy time of the event are associated with the resulting graph structure to obtain the knowledge graph.
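The construction described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the applicant's implementation: the `KnowledgeGraph` class, the `add_insurance_event` helper, the identifier strings, the relation names, and the edge directions are all hypothetical.

```python
from collections import defaultdict

class KnowledgeGraph:
    """Directed graph: nodes are entity identifiers, edges carry a relation and a time."""

    def __init__(self):
        self.nodes = {}                      # identifier -> entity type
        self.out_edges = defaultdict(list)   # source -> [(relation, target, time)]

    def add_entity(self, identifier, entity_type):
        self.nodes[identifier] = entity_type

    def add_relation(self, source, relation, target, time):
        self.out_edges[source].append((relation, target, time))

def add_insurance_event(graph, event):
    """Turn one insurance event into nodes and time-stamped directed edges."""
    graph.add_entity(event["applicant"], "client")
    graph.add_entity(event["insured"], "client")
    graph.add_entity(event["policy"], "policy")
    graph.add_entity(event["marketer"], "marketer")
    t = event["time"]
    # Relation names and directions are assumptions chosen for this sketch.
    graph.add_relation(event["applicant"], "insurance", event["insured"], t)
    graph.add_relation(event["applicant"], "purchase", event["policy"], t)
    graph.add_relation(event["policy"], "guarantee", event["insured"], t)
    graph.add_relation(event["marketer"], "management", event["policy"], t)

# Illustrative event only.
kg = KnowledgeGraph()
add_insurance_event(kg, {
    "applicant": "account:001",
    "insured": "account:002",
    "policy": "policy:999",
    "marketer": "marketer:M01",
    "time": "2019-04-23",
})
```

Keeping the time on each edge, rather than on the nodes, lets the same pair of entities be linked by several events at different times.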
FIG. 3 shows an interface for querying by entity identifier in one embodiment. In this embodiment, once the knowledge graph has been constructed from the transaction event data, information about an entity can be queried by its entity identifier in the front-end application. For example, in FIG. 3 the entity identifier is an account identifier and the information of account 001 is queried. The result shows that the entity category corresponding to the account identifier includes a client entity, that the client entity had a withdrawal event of 2688 yuan on April 23, 2019, and that the number of the withdrawn policy is 999. By associating transaction entity identifiers, entity relationships, and transaction event data, this embodiment improves entity query efficiency and makes it easier to mine relationships between entities so that suspicious accounts can be identified accurately.
In a further embodiment, determining a transaction cue graph indexed by the target account identifier in the knowledge graph includes: locating the target account identifier in the knowledge graph, determining a target entity relationship and a target transaction entity identifier associated with the target account identifier, and determining the transaction cue graph according to the target account identifier, the target entity relationship, and the target transaction entity identifier.
The target account identifier uniquely identifies the target account and may be the ID of the target account, an identity card number, or the like. In the knowledge graph, the entity relationships associated with the target account identifier are taken as target entity relationships, and the transaction cue graph is obtained from the target account identifier and the target entity relationships. In addition, the transaction entity identifiers associated with the target entity relationships are taken as target transaction entity identifiers; a target transaction entity identifier may be a policy identifier, a marketer identifier, an insured person identifier, or the like. Determining the transaction cue graph from the target account identifier, the target entity relationships, and the target transaction entity identifiers realizes data expansion from the target account identifier.
For example, the account with account identifier 001 is taken as the target account, and the relationships involving that account are obtained from the knowledge graph according to the target account identifier 001. These include the policies purchased by the target account, the policies on which it has claimed, and the policies it has revoked. For a policy purchased by the target account, the insured person, the marketer, and other information of that policy can be expanded further; likewise for the policyholder, the marketer, and other information of a policy on which the target account has claimed. The relationships between the target account and the policies, and between the policies and the insured persons and marketers, are the target entity relationships. A transaction cue graph can be determined from the target account identifier, the target entity relationships, and the target transaction entity identifiers. The transaction cue graph makes the transaction data related to the target account directly visible, which facilitates tracing of funds and investigation of suspicious accounts.
FIG. 4 is a schematic diagram of a partial relationship expansion of the account shown in FIG. 3 in one embodiment. The account entity is expanded, and the resulting transaction cue graph includes one client entity and four policy entities. The client entity is named Zhang San, and the policy entities are policy No. 1 to policy No. 4. The relationship type between each of the four policies and the client entity is the policyholder relationship, that is, Zhang San is covered by four policies. As shown in FIG. 4, the transaction time is also associated when the account relationships are expanded in this embodiment. The transaction time data can be used to examine the relationships between transaction entities within a period of time. For example, an account may be associated with a large number of insurance applications, but if those applications are not concentrated in time and are instead distributed regularly across periods, the account is less likely to be suspicious.
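A minimal sketch of extracting such a cue graph for account 001 is given below. The edge tuples, the relation names, and the `cue_graph` function are hypothetical; the sketch covers only relationships directly connected to the target account.

```python
# Each edge is (source, relation, target, time); the data is illustrative only.
EDGES = [
    ("account:001", "purchased", "policy:101", "2019-01-10"),
    ("account:001", "claimed", "policy:102", "2019-03-05"),
    ("account:001", "revoked", "policy:999", "2019-04-23"),
    ("policy:101", "guarantee", "account:002", "2019-01-10"),
    ("account:003", "purchased", "policy:200", "2019-02-01"),
]

def cue_graph(edges, target):
    """Collect the edges touching the target and the entities those edges reach."""
    relations = [e for e in edges if target in (e[0], e[2])]
    entities = {node for s, _, t, _ in relations for node in (s, t)} - {target}
    return {"target": target, "relations": relations, "entities": sorted(entities)}

cue = cue_graph(EDGES, "account:001")
```

Here `cue["relations"]` plays the role of the target entity relationships and `cue["entities"]` the role of the target transaction entity identifiers.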
In this embodiment, relationships are distinguished along the time dimension, and a special event category is derived from the transaction time. Compared with a traditional knowledge graph, the entity relationships described by the knowledge graph in this embodiment are relationships that hold over a period of time rather than something that occurs at a particular moment, for example parent-child, grandparent-grandchild, and friend relationships. A special event, by contrast, is something that occurs at a single moment, such as a policy application event, a policy return event, or a policy borrowing event. By adding the transaction time as a parameter of the knowledge graph, this embodiment can improve the accuracy of suspicious account identification.
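One way the transaction time might be used, following the FIG. 4 discussion of concentrated versus regularly distributed applications, is to count how many events fall into any short window. The sketch below is an assumption for illustration: the function name, the seven-day window, and the sample dates are hypothetical, and the application does not prescribe a particular measure.

```python
from datetime import date, timedelta

def max_events_in_window(event_dates, window_days):
    """Largest number of events that fall inside any span shorter than window_days."""
    days = sorted(event_dates)
    best, start = 0, 0
    for end in range(len(days)):
        # Shrink the window from the left until it spans fewer than window_days days.
        while days[end] - days[start] >= timedelta(days=window_days):
            start += 1
        best = max(best, end - start + 1)
    return best

# Six applications spread over six months versus six applications within one week.
spread_out = [date(2019, month, 1) for month in range(1, 7)]
clustered = [date(2019, 4, day) for day in (20, 21, 22, 23, 24, 25)]

spread_peak = max_events_in_window(spread_out, 7)
cluster_peak = max_events_in_window(clustered, 7)
```

Both accounts have the same number of applications, but only the second shows them concentrated in a single window, which is the kind of distinction the time dimension makes visible.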
In one embodiment, determining a target entity relationship and a target transaction entity identification associated with a target account identification includes: and expanding based on the target account identifier, taking the entity relationship directly connected or indirectly connected with the target account identifier as a target entity relationship, and taking the transaction entity identifier indirectly connected with the target account identifier as a target transaction entity identifier.
The entity relationship or transaction entity identifier indirectly connected with the target account identifier is not directly connected with the target account identifier, but can be associated with the target account through other entity identifiers or entity relationships, and has an indirect connection relationship with the target account. When the target account is subjected to deep analysis, each entity node and the corresponding entity relationship can be deeply expanded through the target account identification.
Continuing the foregoing example, after the target entity identifiers and target entity relationships associated with the target account identifier have been obtained, the entity relationships of each target entity identifier may be expanded further. For example, identifier B, which has an entity relationship with identifier A, is obtained from identifier A; identifier B is then expanded and analyzed to obtain identifier C, which has an entity relationship with identifier B; and so on. By continuing to expand entity identifiers and entity relationships in this way, the entity relationships between transaction entities are mined and the efficiency of suspicious account identification is improved.
FIG. 5 is a schematic diagram of a deep relationship expansion of the account shown in FIG. 3 in another embodiment. The account entity is expanded, and the resulting transaction cue graph includes five client entities and nine policy entities. The client entities are Zhang San, Li Si, Wang Wu, Zhao Liu, and Sun Qi, and the policy entities are policy 001 to policy 009. The relationships include nine insured relationships and six applicant relationships; that is, the insured persons of all nine policies are among the five client entities, whereas the five client entities are the applicants of only six of the policies. Taking Sun Qi in FIG. 5 as the target account for analysis, the dashed line between policy 07 and the client Sun Qi indicates that the policy has an entity relationship with Sun Qi: the transaction cue graph includes the client entity Sun Qi, and policy 07 insures Sun Qi. As in FIG. 4, the embodiment of FIG. 5 also associates the transaction time when the account relationships are expanded, and the transaction time helps improve the accuracy of suspicious account identification.
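The A to B to C expansion can be sketched as a breadth-first traversal with a hop limit. The `expand` function, the `max_hops` parameter, and the sample edges are hypothetical, and the sketch assumes edges may be followed in either direction during expansion.

```python
from collections import deque

# Each edge is (source, relation, target, time); the data is illustrative only.
EDGES = [
    ("account:A", "purchase", "policy:B", "2019-01-10"),
    ("policy:B", "guarantee", "account:C", "2019-01-10"),
    ("account:C", "purchase", "policy:D", "2019-02-15"),
    ("account:X", "purchase", "policy:Y", "2019-03-01"),
]

def expand(edges, target, max_hops):
    """Breadth-first expansion from the target, following edges in both directions."""
    neighbors = {}
    for edge in edges:
        source, _, dest, _ = edge
        neighbors.setdefault(source, []).append((dest, edge))
        neighbors.setdefault(dest, []).append((source, edge))
    hops = {target: 0}        # identifier -> number of hops from the target
    relations = []            # directly or indirectly connected relationships
    queue = deque([target])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue          # do not expand beyond the hop limit
        for other, edge in neighbors.get(node, []):
            if edge not in relations:
                relations.append(edge)
            if other not in hops:
                hops[other] = hops[node] + 1
                queue.append(other)
    return hops, relations

hops, relations = expand(EDGES, "account:A", max_hops=2)
```

A hop limit keeps the cue graph readable: entities with a hop count of 1 are directly connected to the target, and higher counts mark indirect connections.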
In one embodiment, acquiring transaction data and determining risk data in the transaction data includes: acquiring original transaction data of a plurality of systems, and carrying out data summarization on each original transaction data to obtain transaction data; and carrying out data extraction and conversion on the transaction data to obtain original risk data, and carrying out normalization processing on the original risk data to obtain risk data.
The original transaction data is transaction data collected directly from a transaction system. Taking the finance-related business field as an example, the same product may be offered on multiple business platforms or transaction systems at the same time, so when suspicious accounts are identified, the original transaction data of each transaction system is summarized to obtain the transaction data. The transaction data is converted, extracted and cleaned to obtain original risk data, and the original risk data is normalized to obtain standardized risk data. Because data from different systems or platforms may be inconsistent, this normalization provides a high-quality data basis for suspicious account identification.
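As a rough sketch of why normalization matters when several systems feed the same pipeline, the example below merges records from two systems into one schema. The system names, field names, date formats and the `amount_scale` factor are all hypothetical, and the summarization and normalization stages are collapsed into one pass for brevity.

```python
from datetime import datetime

# Hypothetical per-system schemas: each system names and formats fields differently.
SYSTEM_SCHEMAS = {
    "system_a": {
        "account": "acct_no",
        "amount": "amt",
        "time": "trade_time",
        "time_format": "%Y-%m-%d %H:%M:%S",
    },
    "system_b": {
        "account": "accountId",
        "amount": "amountCents",
        "time": "ts",
        "time_format": "%Y/%m/%d %H:%M",
        "amount_scale": 0.01,  # this system reports amounts in cents
    },
}


def normalize_record(system, raw):
    """Convert one raw record into the common schema."""
    schema = SYSTEM_SCHEMAS[system]
    amount = float(raw[schema["amount"]]) * schema.get("amount_scale", 1.0)
    parsed_time = datetime.strptime(raw[schema["time"]], schema["time_format"])
    return {
        "source_system": system,
        "account_id": str(raw[schema["account"]]).strip().upper(),
        "amount": round(amount, 2),
        "transaction_time": parsed_time.isoformat(),
    }


def summarize(raw_by_system):
    """Merge the raw records of every system into one normalized list."""
    merged = []
    for system, records in raw_by_system.items():
        for raw in records:
            merged.append(normalize_record(system, raw))
    return merged
```

After this step, the same account appears under one identifier regardless of the originating system, which is what later entity matching relies on.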
In one embodiment, the method further comprises: the risk data, transaction event data and knowledge graph are stored in different databases, respectively.
For example, after the risk data is determined from the original transaction data, the risk data may be uploaded and stored in the anti-money-laundering history database in the raw data layer of the Hive processing library. After the transaction event data is determined from the risk data, and data such as transaction entities and entity relationships is obtained by analyzing the transaction event data, the transaction event data and the data obtained by the analysis may be uploaded and stored in the anti-money-laundering middleware database. When knowledge graph data is obtained by processing the transaction event data, the knowledge graph data may be uploaded and stored in the anti-money-laundering knowledge graph database. Storing the data in separate databases isolates the data, which can improve data security and system scalability and makes management and maintenance easier.
In conjunction with the knowledge graph construction flowchart shown in fig. 6, as shown in fig. 7, in one embodiment, taking an insurance system as an example, the suspicious account identification method includes the following steps 702 to 708. Wherein:
Step 702, acquiring transaction data of the insurance system and determining risk data in the transaction data.
The transaction data, that is, data obtained synchronously from the transaction system, is cleaned and integrated to obtain the risk data, which may be stored in a DWD (Data Warehouse Detail, the detail layer of the data warehouse) risk data cluster.
Step 704, extracting transaction event data from the risk data by using a scheduling system and a data processing tool.
The scheduling system may be the DolphinScheduler system, a distributed, easily extensible, open-source system for visual DAG workflow task scheduling. It is suitable for enterprise-level scenarios and provides a solution for visually operating tasks and workflows and for data processing across the full life cycle.
The data processing tool may be an ETL (Extract-Transform-Load) tool, which extracts the risk data, transforms it, and loads it to a destination. The ETL tool may be GDS, DataX, or the like.
As shown in fig. 6, the risk data acquired from the DWD risk data cluster is cleaned and otherwise processed, and is then pushed, as CSV data, to a CDH (Cloudera's Distribution Including Apache Hadoop) big data cluster. CDH is a big data platform whose core is a distributed computing framework built on Apache Hadoop, and it supports various data processing engines, such as Hadoop, Spark, Impala and HBase.
The risk data is transmitted to a gateway node of the CDH big data cluster. The gateway node can be used to submit computing tasks from that node to the computing cluster, but it neither stores data nor participates in the overall computation of the cluster. The transaction event data transmitted to the gateway node is CSV data, a format that stores tabular data in plain text.
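To make the CSV hand-off concrete, the following sketch reads transaction event rows from plain-text CSV into records. The column names (`source_id`, `relation`, `target_id`, `transaction_time`) and the sample rows are assumptions chosen to match the three elements the application names (transaction entity identifier, entity relationship, transaction time); the real file layout is not given in the source.

```python
import csv
import io

# Hypothetical CSV layout and sample values, for illustration only.
SAMPLE_CSV = """source_id,relation,target_id,transaction_time
ACC-1,pays_premium_for,POL-001,2023-05-01T09:30:00
CUS-1,owns,ACC-1,2023-04-20T08:00:00
"""


def read_event_rows(csv_text):
    """Parse CSV text into a list of transaction event dictionaries."""
    reader = csv.DictReader(io.StringIO(csv_text))
    events = []
    for row in reader:
        events.append(
            {
                "source_id": row["source_id"],
                "relation": row["relation"],
                "target_id": row["target_id"],
                "transaction_time": row["transaction_time"],
            }
        )
    return events
```

In a cluster deployment the text would come from files on the gateway node rather than from an in-memory string; `io.StringIO` is used here only to keep the example self-contained.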
Step 706, processing the transaction event data to obtain a knowledge graph.
The Oozie scheduler bundled with the CDH big data cluster executes a shell script that uploads the risk data in the gateway node to the amls-his (Anti-Money Laundering Historical Data) database in the raw data layer of the Hive processing library, and then executes a Hive SQL processing script to perform the middle-layer processing and store the result in the amls-mid (Anti-Money Laundering Middleware) database. The generated JSON script is executed by the Aviator computing engine to process the transaction entities, entity relationships and transaction events, and the results are stored in the amls-knowledgegraph (Anti-Money Laundering Knowledge Graph) database. A push script is then started to push the transaction entity index to an ES (Elasticsearch) cluster, and the transaction entity, entity relationship and transaction event data is imported into HBase. Spark is used to perform multi-degree relationship expansion, that is, graph mining, on the generated transaction entities, entity relationships and transaction events, and the mined transaction clue graph is pushed to and stored in an OceanBase database.
Oozie is a scheduling system for managing Hadoop jobs on the CDH big data cluster. Shell is both a command language and a programming language, in which shell scripts are written. As a command language, the shell interactively interprets and executes commands entered by the user; as a programming language, it allows variables to be defined and parameters to be passed, and it provides many of the flow control structures found in high-level languages. Hive is a Hadoop-based data warehouse tool used for extracting, transforming and loading data; it provides a mechanism for storing, querying and analyzing large-scale data stored in Hadoop. Hive SQL is the SQL-like query language of Hive; the data resides in HDFS, and Hive SQL is a wrapper over MapReduce, with MapReduce programs running underneath.
Aviator, namely AviatorScript, is a high-performance, lightweight scripting language hosted on the JVM. In this embodiment, the traditional multi-table association mode is changed to a longitudinal aggregation mode in the Aviator lightweight engine code. JSON is a lightweight data interchange format that stores and represents data in a text format that is completely independent of any programming language. Elasticsearch is a distributed, RESTful search and data analysis engine; here it stores the transaction entity index data of the knowledge graph and speeds up searches over billions of records. Spark is a fast, general-purpose, scalable big data analysis engine developed in Scala. Scala is a multi-paradigm programming language similar to Java; it is designed to be scalable and integrates features of object-oriented programming and functional programming. OceanBase is a distributed database.
Compared with the traditional approach of writing Spark programs directly, this embodiment combines the Aviator lightweight rule engine with Spark and consolidates the overall data governance design into Excel, which records the lineage of the transaction data, that is, the source of each table and each field. This makes the division of the data hierarchy clearer; later, when iterative development is required, the old data governance process can be looked up in the Excel file in order to design the development approach for the new requirement. If a developer leaves or is replaced, the handover period can be effectively shortened. In addition, this embodiment avoids directly developing redundant and complex Spark code: developers can share the automation program that generates JSON from Excel, and the Aviator rule engine is combined with the Spark computation code, which effectively reduces development cost. For the governance of million-level incremental data, when the Aviator engine is executed in combination with Spark, this embodiment uses the longitudinal aggregation mode instead of traditional multi-table join matching, so as to reduce Shuffle operations in the overall governance flow and thereby improve computing efficiency. Shuffle means that, in big data computation, the intermediate results generated in the Map stage are sorted by key, the values with the same key are merged into a list, and the lists are then redistributed, according to a hash algorithm, to Reduce nodes for further processing. In validation experiments, compared with the previous direct Spark computation, the daily average data governance time of this embodiment was reduced from 6.5 hours to 5.5 hours.
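The source does not spell out the longitudinal aggregation mode, so the sketch below shows only one plausible reading: instead of joining several tables pairwise on a shared key, all rows are stacked into a single long list of `(key, field, value)` entries and grouped by key once. In a distributed engine, a single grouping generally needs fewer shuffles than a chain of joins. The table names, the `account_id` key and the one-row-per-account-per-table simplification are assumptions for illustration.

```python
from collections import defaultdict


def vertical_aggregate(tables, key="account_id"):
    """Stack rows from all tables, then group them by `key` in one pass.

    `tables` maps a table name to a list of row dictionaries that each
    contain `key`. Assumes at most one row per key per table; with more,
    the last row seen wins.
    """
    # Step 1: stack every table into one long list (the "vertical" layout).
    long_rows = []
    for table_name, rows in tables.items():
        for row in rows:
            for field, value in row.items():
                if field != key:
                    long_rows.append((row[key], f"{table_name}.{field}", value))

    # Step 2: a single grouping by key replaces the chain of pairwise joins.
    wide = defaultdict(dict)
    for row_key, field, value in long_rows:
        wide[row_key][field] = value
    return dict(wide)
```

A side effect of this layout is that keys missing from some tables still appear in the result, which a chain of inner joins would drop.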
Step 708, selecting the target account identifier to be identified, obtaining a transaction cue graph indexed with the target account identifier, and identifying the suspicious account according to the transaction cue graph.
In this embodiment, the knowledge graph is built on the open-source big data ecosystem: the CDH big data cluster serves as the main data platform, intermediate tables are stored in the Hive database, a large-scale graph database that integrates HBase and ES stores the knowledge graph, and the Gremlin graph query language is supported. Complex hidden information, such as relationships among clients, relationships among client behaviors and relationships among accounts, is deeply mined by graph mining algorithms and displayed in the form of a transaction clue graph, so that anomalies in client behaviors, client relationships, key transactions and the like can be found intuitively and conveniently.
For high-performance computation and search over massive data, the Spark on Hive approach is used to process and compute massive business data, so that daily incremental data is available at T+1, which greatly improves the timeliness and value of the data. The distributed, efficient data analysis and search engine Elasticsearch is connected to the big data service interface; indexing, word segmentation and similar techniques effectively shorten graph database query time, and a fuzzy query function is provided to link hidden clues together, which effectively improves computing performance and reduces processing time. Graph database technology enables expert experience to be reused: in combination with the characteristics of anti-money-laundering investigation, hidden transfers of funds between parties are constructed graphically based on GraphX and the flow of funds is displayed, forming an intelligent expert investigation mode.
This embodiment integrates business knowledge of the insurance field from multiple angles and in all respects, realizes a highly accurate, high-performance and effective artificial intelligence system through technologies such as big data, knowledge graphs and graph mining, and achieves a comprehensive leap in technical framework, performance, deployment and other aspects.
It should be understood that, although the steps in the flowcharts of the above embodiments are shown sequentially as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the execution order of these steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiment of the application also provides a suspicious account identification system for realizing the suspicious account identification method. The implementation of the solution provided by the system is similar to the implementation described in the above method, so the specific limitation in the embodiments of the suspicious account identification system provided below may be referred to the limitation of the suspicious account identification method hereinabove, and will not be described herein.
In one exemplary embodiment, as shown in FIG. 8, there is provided a suspicious account identification system comprising: a risk data acquisition module 802, a risk data processing module 804, and a suspicious account identification module 806, wherein:
a risk data acquisition module 802, configured to acquire transaction data, and determine risk data in the transaction data;
the risk data processing module 804 is configured to extract transaction event data from the risk data, and construct a knowledge graph according to the transaction event data, where the transaction event data at least includes a transaction entity identifier, an entity relationship, and a transaction time, and the transaction entity identifier at least includes an account identifier;
the suspicious account identification module 806 is configured to determine a target account identifier to be identified, determine a transaction clue graph with the target account identifier as an index in the knowledge graph, and identify whether the target account is a suspicious account according to the transaction clue graph.
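The division of work among the three modules can be sketched as three small classes wired into one pipeline. This is an illustrative skeleton only: the class and method names, the record fields, and the caller-supplied `is_risky` predicate and `rule` function are assumptions, since the source does not fix how risk data is selected or how the final judgment is made.

```python
class RiskDataAcquisitionModule:
    """Counterpart of module 802: determines the risk data in transaction data."""

    def __init__(self, is_risky):
        # `is_risky` is a caller-supplied predicate (hypothetical hook).
        self.is_risky = is_risky

    def run(self, transaction_data):
        return [record for record in transaction_data if self.is_risky(record)]


class RiskDataProcessingModule:
    """Counterpart of module 804: builds a graph from transaction events."""

    def run(self, risk_data):
        graph = {}
        for record in risk_data:
            edge = (record["relation"], record["target_id"], record["transaction_time"])
            graph.setdefault(record["source_id"], []).append(edge)
        return graph


class SuspiciousAccountIdentificationModule:
    """Counterpart of module 806: looks up the clue graph and applies a rule."""

    def __init__(self, rule):
        # `rule` maps a clue graph to True/False (hypothetical hook).
        self.rule = rule

    def run(self, graph, target_account_id):
        clue_graph = {target_account_id: graph.get(target_account_id, [])}
        return clue_graph, self.rule(clue_graph)


def identify(transaction_data, target_account_id, is_risky, rule):
    """Run the three modules in sequence for one target account."""
    risk_data = RiskDataAcquisitionModule(is_risky).run(transaction_data)
    graph = RiskDataProcessingModule().run(risk_data)
    return SuspiciousAccountIdentificationModule(rule).run(graph, target_account_id)
```

Keeping the selection predicate and the decision rule as injected functions leaves the module boundaries intact while making no claim about the actual criteria.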
In one embodiment, the risk data processing module 804 is further configured to: construct a directed graph by taking the transaction entity identifiers as nodes and the entity relationships as edges, and associate the transaction times to obtain the knowledge graph.
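A minimal sketch of such a directed graph, assuming each transaction event is a dictionary with hypothetical keys `source_id`, `relation`, `target_id` and `transaction_time`: entity identifiers become nodes, each relationship becomes a directed edge, and the transaction time is carried on the edge.

```python
from collections import defaultdict


def build_directed_graph(events):
    """Build a directed graph whose edges carry the transaction time."""
    nodes = set()
    outgoing = defaultdict(list)  # edges leaving each node
    incoming = defaultdict(list)  # edges arriving at each node
    for event in events:
        src, dst = event["source_id"], event["target_id"]
        nodes.update((src, dst))
        edge = {
            "source": src,
            "relation": event["relation"],
            "target": dst,
            "transaction_time": event["transaction_time"],
        }
        outgoing[src].append(edge)
        incoming[dst].append(edge)
    return {"nodes": nodes, "outgoing": dict(outgoing), "incoming": dict(incoming)}
```

Indexing edges by both endpoints lets a later lookup start from a target account and follow relationships in either direction without scanning the whole edge list.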
In one embodiment, the suspicious account identification module 806 is further configured to: locate the target account identifier in the knowledge graph, determine a target entity relationship and a target transaction entity identifier associated with the target account identifier, and determine the transaction cue graph according to the target account identifier, the target entity relationship and the target transaction entity identifier.
In one embodiment, the suspicious account identification module 806 is further configured to: expand based on the target account identifier, take the entity relationships directly or indirectly connected with the target account identifier as the target entity relationship, and take the transaction entity identifiers indirectly connected with the target account identifier as the target transaction entity identifier.
In one embodiment, the risk data acquisition module 802 is further configured to: acquire original transaction data of a plurality of systems, and summarize the original transaction data to obtain the transaction data; and perform data extraction and conversion on the transaction data to obtain original risk data, and normalize the original risk data to obtain the risk data.
In one embodiment, the system is further configured to: store the risk data, the transaction event data and the knowledge graph in different databases, respectively.
The modules in the suspicious account identification system described above may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and perform the operations corresponding to the modules.
In one exemplary embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 9. The computer device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is for storing business transaction data. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a suspicious account identification method.
It will be appreciated by those skilled in the art that the structure shown in fig. 9 is merely a block diagram of a portion of the structure associated with the present application and is not limiting of the computer device to which the present application applies, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In an exemplary embodiment, a computer device is provided, comprising a memory and a processor, the memory having stored therein a computer program, the processor performing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, implements the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
It should be noted that, the user information (including, but not limited to, user equipment information, user personal information, user transaction information, etc.) and the data (including, but not limited to, data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use, and processing of the relevant data are required to meet the relevant regulations.
Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program stored on a non-transitory computer-readable storage medium, and that the computer program, when executed, may include the steps of the method embodiments described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. The volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM may take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of relational databases and non-relational databases. The non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the embodiments provided herein may be, but are not limited to, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, data processing logic units based on quantum computing, and the like.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered to be within the scope of this description.
The above embodiments represent only several implementations of the present application; they are described specifically and in detail, but are not to be construed as limiting the scope of the present application. It should be noted that those skilled in the art may make various modifications and improvements without departing from the concept of the present application, and these all fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims (10)

1. A method of suspicious account identification, the method comprising:
acquiring transaction data and determining risk data in the transaction data;
extracting transaction event data from the risk data, and constructing a knowledge graph according to the transaction event data, wherein the transaction event data at least comprises a transaction entity identifier, an entity relationship and transaction time, and the transaction entity identifier at least comprises an account identifier;
and determining a target account identifier to be identified, determining a transaction cue graph indexed by the target account identifier in the knowledge graph, and identifying whether the target account is a suspicious account according to the transaction cue graph.
2. The method of claim 1, wherein constructing a knowledge-graph from the transaction event data comprises:
and constructing a directed graph by taking the transaction entity identifier as a node and the entity relationship as an edge, and correlating the transaction time to obtain the knowledge graph.
3. The method of claim 1, wherein the determining a transaction cue graph indexed by the target account identifier in the knowledge graph comprises:
and positioning the target account identifier in the knowledge graph, determining a target entity relationship and a target transaction entity identifier which are associated with the target account identifier, and determining the transaction cue graph according to the target account identifier, the target entity relationship and the target transaction entity identifier.
4. The method of claim 3, wherein the determining a target entity relationship and a target transaction entity identification associated with the target account identification comprises:
expanding based on the target account identifier, taking the entity relationship directly connected or indirectly connected with the target account identifier as the target entity relationship, and taking the transaction entity identifier indirectly connected with the target account identifier as the target transaction entity identifier.
5. The method of claim 1, wherein the acquiring transaction data and determining risk data in the transaction data comprises:
acquiring original transaction data of a plurality of systems, and carrying out data summarization on each original transaction data to obtain the transaction data;
and carrying out data extraction and conversion on the transaction data to obtain the original risk data, and carrying out normalization processing on the original risk data to obtain the risk data.
6. The method according to claim 1, wherein the method further comprises:
and storing the risk data, the transaction event data and the knowledge graph in different databases respectively.
7. A suspicious account identification system, the system comprising:
the risk data acquisition module is used for acquiring transaction data and determining risk data in the transaction data;
The risk data processing module is used for extracting transaction event data from the risk data and constructing a knowledge graph according to the transaction event data, wherein the transaction event data at least comprises a transaction entity identifier, an entity relationship and transaction time, and the transaction entity identifier at least comprises an account identifier;
and the suspicious account identification module is used for determining a target account identifier to be identified, determining a transaction clue graph indexed by the target account identifier in the knowledge graph, and identifying whether the target account is a suspicious account according to the transaction clue graph.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 6 when the computer program is executed.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
CN202311742722.XA 2023-12-18 2023-12-18 Suspicious account identification method, suspicious account identification system, suspicious account identification device, suspicious account identification storage medium, suspicious account identification program product Pending CN117649240A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311742722.XA CN117649240A (en) 2023-12-18 2023-12-18 Suspicious account identification method, suspicious account identification system, suspicious account identification device, suspicious account identification storage medium, suspicious account identification program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311742722.XA CN117649240A (en) 2023-12-18 2023-12-18 Suspicious account identification method, suspicious account identification system, suspicious account identification device, suspicious account identification storage medium, suspicious account identification program product

Publications (1)

Publication Number Publication Date
CN117649240A true CN117649240A (en) 2024-03-05

Family

ID=90043312

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311742722.XA Pending CN117649240A (en) 2023-12-18 2023-12-18 Suspicious account identification method, suspicious account identification system, suspicious account identification device, suspicious account identification storage medium, suspicious account identification program product

Country Status (1)

Country Link
CN (1) CN117649240A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination