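WO2020134213A1 - Method and system for querying abnormal financial data based on a knowledge graph - Google Patents

Method and system for querying abnormal financial data based on a knowledge graph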

Info

Publication number
WO2020134213A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
query
abnormal
information
sample
Prior art date
Application number
PCT/CN2019/106503
Other languages
English (en)
French (fr)
Inventor
鲁岑
Original Assignee
苏宁云计算有限公司
苏宁易购集团股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏宁云计算有限公司, 苏宁易购集团股份有限公司
Priority to CA3179620A (patent CA3179620A1, en)
Publication of WO2020134213A1 (zh)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00: Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03: Credit; Loans; Processing thereof

Definitions

  • the invention relates to the technical field of financial anti-fraud, in particular to a method and system for querying financial abnormal data based on knowledge graphs.
  • the prior art mainly uses telephone return visits or secondary identity confirmation to identify frauds.
  • the above-mentioned methods are somewhat effective against simple frauds, but carefully packaged frauds involve a complicated relationship network and are difficult to identify accurately by telephone return visits or secondary identity confirmation, which brings new challenges to fraud identification.
  • the purpose of the present invention is to provide a method and system for querying financial abnormal data based on knowledge graphs, which can accurately and quickly identify abnormal financial data therein by using a knowledge graph.
  • one aspect of the present invention provides a method for querying financial abnormal data based on a knowledge graph, including:
  • design the structure of the graph database according to the query requirements of financial abnormal data; the structure includes the expression of nodes and the relationships between nodes;
  • the method for designing the structure of the graph database according to the query requirements of financial abnormal data includes:
  • the query requirement of the financial abnormal data includes finding the information of illegal intermediaries in the registration information of multiple lenders; the registration information of a lender includes lender information, contact information, transferor information and/or recipient information, wherein each kind of information includes name data, telephone data and identification code data;
  • the graph database is designed according to the principle that one node corresponds to one data.
  • the method of collecting a plurality of sample source data and cleaning the data to obtain a plurality of sample data conforming to the structure of the graph database includes:
  • the deduplicated sample source data is checked for validity, sample source data whose telephone data and/or ID code data is invalid is removed, and the valid sample data is finally retained.
  • the method for identifying that the phone data and/or ID code data is invalid is:
  • the method of identifying financial abnormal data from the knowledge graph includes:
  • a plurality of the sample data are laid out in the form of nodes, and the relationship nodes are associated by indicator lines to form a knowledge graph;
  • the relationship nodes are selected from the knowledge graph, and then the illegal intermediary information is found from the selected relationship nodes.
  • the method of filtering out the relationship nodes from the knowledge graph according to the input query statement, and then finding out the information of the illegal intermediary from the filtered relationship nodes includes:
  • an abnormal node identification threshold is set, and when the degree of association of a relationship node is greater than the threshold, the nodes among the relationship nodes whose type matches the query statement are output to obtain a query result of illegal intermediary information.
  • the degree of association is defined according to the number of indicator lines connected to the node.
  • the method for querying financial abnormal data based on knowledge graph provided by the present invention has the following beneficial effects:
  • the structure of the graph database needs to be designed first according to the user's query requirements for financial abnormal data.
  • the financial abnormal data query needs to query illegal intermediary information from lenders
  • the illegal intermediary information that the platform can obtain includes not only the name, but also its effective identification information such as its telephone and identification code, so when designing the structure of the graph database, three types of nodes can be selected.
  • the relationship nodes are associated by indicator lines to match the designed structure of the graph database; multiple sample source data are then collected from the platform and, after data cleaning, formed into a CSV file that the graph database can recognize; finally, the CSV file is imported into the graph database to construct a knowledge graph of the sample data. By filtering out of the knowledge graph the nodes whose degree of association is higher than the threshold, the corresponding information data in those nodes is extracted and output as financial abnormal data, for example valid identification data such as the name, telephone or ID code of the illegal intermediary.
  • the present invention adopts the method of inputting a large amount of sample data into a graph database to form a knowledge graph to identify financial abnormal data.
  • the knowledge graph is good at handling complex network relationships and expresses multiple sample data as a structured network, from which financial abnormal data can be identified quickly and accurately.
  • Another aspect of the present invention provides a system for querying financial anomaly data based on knowledge graph, which is applied to the method for querying financial anomaly data based on knowledge graph described in the above technical solution, the system includes:
  • the graph design unit is used to design the structural composition of the graph database according to the query requirements of financial abnormal data, and the structural composition includes expressions of nodes and relationships between the nodes;
  • the sample collection unit is used to collect multiple sample source data, and after cleaning the data, obtain multiple sample data conforming to the structure of the graph database;
  • the identification output unit is configured to import the sample data into the graph database to output a knowledge graph, and then identify financial abnormal data from the knowledge graph.
  • the sample collection unit includes:
  • the information collection module is used to obtain multiple lender registration information from the database, and extract lender information, contact information, transferor information and/or recipient information from each lender registration information as sample source data;
  • the screening module is used for preliminary screening of the sample source data, excluding sample source data that does not include name data, phone data or ID code data;
  • the duplicate check module is used to check the retained sample source data for duplicates and delete duplicate sample source data;
  • the verification module is used to verify the validity of the deduplicated sample source data, remove sample source data whose phone data and/or ID code data is invalid, and finally retain the valid sample data.
  • the identification output unit includes:
  • the pre-storage module is used to preset a variety of financial abnormal data query statements in Cypher language, including abnormal name query statements, abnormal phone query statements or abnormal identification code query statements;
  • the setting module is used to set the abnormal name query statement, abnormal phone query statement or abnormal ID code query statement on the query interface in a modular form, so that the user can select the query statement input according to the query needs of the financial abnormal data;
  • the processing module is used to lay out a plurality of the sample data in the form of nodes, the relationship nodes being associated by indicator lines to form a knowledge graph;
  • the query output module is used to filter out the relationship nodes from the knowledge graph according to the input query statement, and then identify the financial abnormal data from the filtered relationship nodes and output them in the form of query results.
  • the beneficial effects of the system for querying financial anomaly data based on knowledge graph provided by the present invention are the same as the beneficial effects of the method for querying financial anomaly data based on knowledge graph provided by the above technical solution, which will not be repeated here.
  • FIG. 1 is a schematic flowchart of a method for querying financial abnormal data based on a knowledge graph in Embodiment 1 of the present invention
  • FIG. 2 is a structural block diagram of a system for querying financial abnormal data based on a knowledge graph in Embodiment 2 of the present invention.
  • FIG. 1 is a schematic flowchart of a method for querying financial abnormal data based on a knowledge graph in Embodiment 1 of the present invention.
  • this embodiment provides a method for querying financial abnormal data based on a knowledge graph, including:
  • the structural composition includes the expression of nodes and the relationships between nodes; multiple sample source data are collected and cleaned to obtain multiple sample data that conform to the structure of the graph database; the sample data is imported into the graph database to output the knowledge graph, and the financial abnormal data is then found in the knowledge graph.
  • the structure of the graph database needs to be designed first according to the user's query needs for financial anomaly data.
  • the financial anomaly data query needs to query illegal intermediaries from lenders
  • the illegal intermediary information that the platform can obtain includes not only the name, but also its effective identification information such as its telephone and identification code, so when designing the structure of the graph database, three types of nodes can be selected.
  • each node represents one piece of information data, and the relationship nodes are associated by indicator lines to match the designed structure of the graph database. Multiple sample source data are then collected from the platform and, after data cleaning, formed into a CSV file that the graph database can recognize.
  • the graph database constructs a knowledge graph of the sample data; nodes whose degree of association is higher than the threshold are selected from the knowledge graph, and the corresponding information data in those nodes is extracted and output as financial abnormal data, for example valid identification data such as the names, telephones or identification codes of illegal intermediaries.
  • a large amount of sample data is input into the graph database to form a knowledge graph to identify financial abnormal data.
  • the knowledge graph is good at handling complex network relationships; multiple sample data are expressed as a structured network, from which financial abnormal data can be identified quickly and accurately.
  • the method for designing the structure of the graph database according to the query requirements of financial abnormal data in the foregoing embodiment includes:
  • the query requirements for abnormal financial data include finding out the information of illegal intermediaries from the registration information of multiple lenders.
  • the registration information of lenders includes lender information, contact information, transferor information and/or recipient information.
  • the information includes name data, telephone data and ID code data; multiple node types are set to correspond to the multiple data types, and the graph database is designed on the principle that one node corresponds to one piece of data.
  • installment loan shopping is used as an example for explanation.
  • because the information the platform can obtain about the above-mentioned persons includes name data, phone data and ID code data, three types of nodes can be set in the graph database when designing its structure.
  • each type of node corresponds to one of the above three kinds of data.
  • the method for collecting multiple sample source data in the above embodiment and cleaning the data to obtain multiple sample data conforming to the structure of the graph database includes:
  • the sample source data that does not conform to the structure of the graph database is eliminated. If the same lender has multiple loan records, the platform will hold multiple copies of that lender's registration information.
  • the lender registration information may therefore contain duplicates, so once the sample source data is obtained it is deduplicated; the deduplicated sample source data is then checked for validity, sample source data with invalid phone data and/or ID code data is removed, and the valid sample data is finally retained.
  • invalid phone data and/or ID code data is identified by checking whether the length of the phone data and/or ID code data matches that of a standard phone number and/or standard identification code. For example, mobile phone numbers that are not 11 digits and identification codes that are not 18 digits in the sample source data are determined to be invalid.
  • the method for identifying financial abnormal data from the knowledge graph in the above embodiment includes:
  • the Cypher language is used to preset a variety of financial abnormal data query statements, including abnormal name query statements, abnormal phone query statements or abnormal ID code query statements; these query statements are set on the query interface as modules, so that users can select the query statement to input according to their query requirements for financial abnormal data; the multiple sample data are laid out in the form of nodes, and the relationship nodes are linked by indicator lines to form a knowledge graph; based on the input query statement, the relationship nodes are filtered out of the knowledge graph, and the information of illegal intermediaries is then found among them.
  • this embodiment presets query modules written as Cypher statements on the platform query interface, such as an illegal intermediary name query module or an illegal intermediary telephone query module, so that business personnel can directly drag the illegal intermediary name query module into the query box of the platform when searching for the name of an illegal intermediary.
  • after the program receives the query instruction, it filters the relationship nodes out of the knowledge graph.
  • the relationship nodes here include the name data, telephone data and ID code data of the illegal intermediary; the illegal intermediary's name data is finally found among the relationship nodes and output as the result.
  • each sample data includes three types of data: name, phone and ID code
  • indicator lines associate the three nodes of the same sample data.
  • the nodes with the same data are deduplicated, and the indicator lines connected to a deleted node are then transferred to the node that is retained.
  • a knowledge graph is finally formed.
  • this embodiment has the following advantages:
  • the above method of filtering out relation nodes from the knowledge graph according to the input query sentence, and then finding out the information of illegal intermediaries from the selected relation nodes includes:
  • the degree of association is defined according to the number of indicator lines connected to the node.
  • this embodiment provides a system for querying financial abnormal data based on knowledge graphs, including:
  • the graph design unit 1 is used to design the structural composition of the graph database according to the query requirements of financial abnormal data, and the structural composition includes expressions of nodes and relationships between the nodes;
  • the sample collection unit 2 is used to collect multiple sample source data, and after cleaning the data, obtain multiple sample data conforming to the structure of the graph database;
  • the identification output unit 3 is used to import sample data into a graph database to output a knowledge graph, and then find financial abnormal data from the knowledge graph.
  • the sample collection unit 2 includes:
  • the information collection module 21 is used to obtain multiple lender registration information from the database, and extract the lender information, contact information, transferor information and/or recipient information from each lender registration information as sample source data;
  • the screening module 22 is used for preliminary screening of sample source data, excluding sample source data that does not include name data, telephone data or ID code data;
  • the duplicate check module 23 is used to check the retained sample source data for duplicates and delete duplicate sample source data;
  • the verification module 24 is used to verify the validity of the deduplicated sample source data, remove sample source data whose phone data and/or ID code data is invalid, and finally retain the valid sample source data.
  • the identification output unit 3 includes:
  • the pre-storage module 31 is used to preset a variety of financial abnormal data query statements in Cypher language, including abnormal name query statements, abnormal phone query statements or abnormal identification code query statements;
  • the setting module 32 is used to set the abnormal name query sentence, the abnormal phone query sentence or the abnormal identification code query sentence on the query interface in a modular form, so that the user can select the corresponding query sentence input according to the query needs of the financial abnormal data;
  • the processing module 33 is used to lay out a plurality of sample data in the form of nodes, the relationship nodes being associated by indicator lines to form a knowledge graph;
  • the query output module 34 is used to filter out the relationship nodes from the knowledge graph according to the input query statement, and then identify the financial abnormal data from the filtered relationship nodes to output in the form of query results.
  • the beneficial effects of the system for querying financial abnormal data based on knowledge graphs provided by the embodiments of the present invention are the same as the beneficial effects of the method for querying financial abnormal data based on knowledge graphs provided in Embodiment 1 above, and details are not described herein.
  • the above program can be stored in a computer-readable storage medium.
  • when executed, the program performs each step of the method in the foregoing embodiment, and the storage medium may be a ROM/RAM, magnetic disk, optical disk, memory card, or the like.

Landscapes

  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Engineering & Computer Science (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

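A method and system for querying abnormal financial data based on a knowledge graph, which use the knowledge graph to identify abnormal financial data accurately and quickly. The method comprises: designing the structural composition of a graph database according to the query requirements for abnormal financial data, the structural composition including representations of nodes and of the relationships between nodes; collecting multiple pieces of sample source data and cleaning them to obtain multiple pieces of sample data that conform to the structural composition of the graph database; and importing the sample data into the graph database to output a knowledge graph, then finding the abnormal financial data in the knowledge graph.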

Description

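Method and System for Querying Abnormal Financial Data Based on a Knowledge Graph
Technical Field
The present invention relates to the technical field of financial anti-fraud, and in particular to a method and system for querying abnormal financial data based on a knowledge graph.
Background Art
With the development of internet finance, a loan-intermediary industry has gradually emerged. Such intermediaries package application materials for people who normally find it hard to obtain loan approval, such as applicants with a bad credit record (征信黑户) or with no credit history (征信白户), and help them sidestep the platform's risk controls. Because most of these customers lack a normal ability to repay, a successful disbursement may leave the financial platform with bad debts and asset losses. Identifying such fraud is therefore essential to preventing it.
The prior art mainly relies on telephone return visits or secondary identity confirmation to identify fraud. In practice, these approaches are somewhat effective against simple fraud, but carefully packaged fraud involves a complex relationship network and is hard to identify accurately through telephone return visits or secondary identity confirmation, which poses a new challenge for fraud identification.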
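Summary of the Invention
The object of the present invention is to provide a method and system for querying abnormal financial data based on a knowledge graph, which use the knowledge graph to identify abnormal financial data accurately and quickly.
To achieve the above object, one aspect of the present invention provides a method for querying abnormal financial data based on a knowledge graph, comprising:
designing the structural composition of a graph database according to the query requirements for abnormal financial data, the structural composition including representations of nodes and of the relationships between nodes;
collecting multiple pieces of sample source data, and cleaning them to obtain multiple pieces of sample data that conform to the structural composition of the graph database;
importing the sample data into the graph database to output a knowledge graph, and then finding the abnormal financial data in the knowledge graph.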
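Preferably, designing the structural composition of the graph database according to the query requirements for abnormal financial data comprises:
the query requirements for abnormal financial data include finding illegal-intermediary information in the registration information of multiple lenders, the lender registration information including lender information, contact information, transferor information and/or recipient information, where each kind of information includes name data, telephone data and identification code data;
multiple node types are set up to correspond to the multiple data types, and the graph database is designed on the principle that one node corresponds to one piece of data.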
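Preferably, collecting multiple pieces of sample source data and cleaning them to obtain multiple pieces of sample data that conform to the structural composition of the graph database comprises:
obtaining multiple pieces of lender registration information from a database, and extracting from each the lender information, contact information, transferor information and/or recipient information as sample source data;
preliminarily screening the sample source data, and discarding sample source data that does not include name data, telephone data or identification code data;
checking the retained sample source data for duplicates and deleting duplicate sample source data;
verifying the validity of the deduplicated sample source data, removing sample source data whose telephone data and/or identification code data is invalid, and finally retaining the valid sample data.
Optionally, telephone data and/or identification code data is identified as invalid as follows:
whether the data is invalid is determined by checking whether the length of the telephone data and/or identification code data matches that of a standard telephone number and/or standard identification code.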
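The length comparison can be sketched in Python as follows. The 11-digit and 18-digit standard lengths are the ones given as an example in Embodiment 1; the function names are hypothetical, and the sketch checks length only, without any checksum or format validation.

```python
# Hypothetical helper names; the standard lengths follow the example
# given in Embodiment 1 (11-digit mobile number, 18-digit identification code).
STANDARD_PHONE_LENGTH = 11
STANDARD_ID_LENGTH = 18


def is_valid_phone(phone: str) -> bool:
    """Return True when the phone data has the standard length."""
    return len(phone.strip()) == STANDARD_PHONE_LENGTH


def is_valid_id_code(id_code: str) -> bool:
    """Return True when the identification code has the standard length."""
    return len(id_code.strip()) == STANDARD_ID_LENGTH


def is_valid_sample(phone: str, id_code: str) -> bool:
    """A sample is kept only if both fields pass the length comparison."""
    return is_valid_phone(phone) and is_valid_id_code(id_code)
```

A record whose phone or identification code fails the comparison would be removed from the sample source data before import.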
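Preferably, identifying abnormal financial data in the knowledge graph comprises:
presetting multiple query statements for abnormal financial data in the Cypher language, including an abnormal-name query statement, an abnormal-telephone query statement or an abnormal-identification-code query statement;
placing the abnormal-name, abnormal-telephone or abnormal-identification-code query statement on the query interface in modular form, so that the user can select the query statement to input according to the query requirement for abnormal financial data;
laying out the multiple pieces of sample data as nodes, the relationship nodes being associated by indicator lines to form the knowledge graph;
filtering relationship nodes out of the knowledge graph according to the input query statement, and then finding the illegal-intermediary information among the filtered relationship nodes.
Optionally, filtering relationship nodes out of the knowledge graph according to the input query statement and then finding the illegal-intermediary information among the filtered relationship nodes comprises:
setting an abnormal-node identification threshold and, when the degree of association of a relationship node is greater than the threshold, outputting the nodes among the relationship nodes whose type matches the query statement, to obtain the query result of illegal-intermediary information.
By way of example, the degree of association is defined by the number of indicator lines connected to the node.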
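Compared with the prior art, the method for querying abnormal financial data based on a knowledge graph provided by the present invention has the following beneficial effects:
In the method provided by the present invention, the structural composition of the graph database is first designed according to the user's query requirements for abnormal financial data. When the requirement is to find illegal-intermediary information among lenders, the illegal-intermediary information available to the platform includes not only names but also valid identifying information such as telephone numbers and identification codes, so three types of nodes can be used when designing the structural composition of the graph database: each node represents one piece of information data, and relationship nodes are associated by indicator lines. Multiple pieces of sample source data are then collected from the platform and cleaned to form a CSV file that the graph database can recognize. Finally, the CSV file is imported into the graph database to build a knowledge graph of the sample data; nodes whose degree of association exceeds the threshold are filtered out of the knowledge graph, and the corresponding information data in those nodes is extracted and output as abnormal financial data, for example valid identifying data such as the illegal intermediary's name, telephone number or identification code.
The present invention thus identifies abnormal financial data by loading a large amount of sample data into a graph database to form a knowledge graph. It exploits the knowledge graph's strength in handling complex network relationships to represent multiple pieces of sample data as a structured network, from which abnormal financial data can be identified quickly and accurately.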
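Another aspect of the present invention provides a system for querying abnormal financial data based on a knowledge graph, applied to the method for querying abnormal financial data based on a knowledge graph described in the above technical solution, the system comprising:
a graph design unit, configured to design the structural composition of a graph database according to the query requirements for abnormal financial data, the structural composition including representations of nodes and of the relationships between nodes;
a sample collection unit, configured to collect multiple pieces of sample source data and clean them to obtain multiple pieces of sample data that conform to the structural composition of the graph database;
an identification output unit, configured to import the sample data into the graph database to output a knowledge graph, and then identify the abnormal financial data in the knowledge graph.
Preferably, the sample collection unit comprises:
an information collection module, configured to obtain multiple pieces of lender registration information from a database, and extract from each the lender information, contact information, transferor information and/or recipient information as sample source data;
a screening module, configured to preliminarily screen the sample source data and discard sample source data that does not include name data, telephone data or identification code data;
a duplicate check module, configured to check the retained sample source data for duplicates and delete duplicate sample source data;
a verification module, configured to verify the validity of the deduplicated sample source data, remove sample source data whose telephone data and/or identification code data is invalid, and finally retain the valid sample data.
Preferably, the identification output unit comprises:
a pre-storage module, configured to preset multiple query statements for abnormal financial data in the Cypher language, including an abnormal-name query statement, an abnormal-telephone query statement or an abnormal-identification-code query statement;
a setting module, configured to place the abnormal-name, abnormal-telephone or abnormal-identification-code query statement on the query interface in modular form, so that the user can select the query statement to input according to the query requirement for abnormal financial data;
a processing module, configured to lay out the multiple pieces of sample data as nodes, the relationship nodes being associated by indicator lines to form the knowledge graph;
a query output module, configured to filter relationship nodes out of the knowledge graph according to the input query statement, and then identify the abnormal financial data among the filtered relationship nodes and output it as a query result.
Compared with the prior art, the beneficial effects of the system provided by the present invention are the same as those of the method provided by the above technical solution and are not repeated here.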
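Brief Description of the Drawings
The drawings described here are provided for a further understanding of the present invention and form part of it. The illustrative embodiments of the present invention and their descriptions are used to explain the invention and do not unduly limit it. In the drawings:
FIG. 1 is a schematic flowchart of the method for querying abnormal financial data based on a knowledge graph in Embodiment 1 of the present invention;
FIG. 2 is a structural block diagram of the system for querying abnormal financial data based on a knowledge graph in Embodiment 2 of the present invention.
Reference numerals:
1 - graph design unit;                     2 - sample collection unit;
3 - identification output unit;            21 - information collection module;
22 - screening module;                     23 - duplicate check module;
24 - verification module;                  31 - pre-storage module;
32 - setting module;                       33 - processing module;
34 - query output module.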
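Detailed Description
To make the above objects, features and advantages of the present invention clearer and easier to understand, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Embodiment 1
FIG. 1 is a schematic flowchart of the method for querying abnormal financial data based on a knowledge graph in Embodiment 1 of the present invention. Referring to FIG. 1, this embodiment provides a method for querying abnormal financial data based on a knowledge graph, comprising:
designing the structural composition of a graph database according to the query requirements for abnormal financial data, the structural composition including representations of nodes and of the relationships between nodes; collecting multiple pieces of sample source data and cleaning them to obtain multiple pieces of sample data that conform to the structural composition of the graph database; importing the sample data into the graph database to output a knowledge graph, and then finding the abnormal financial data in the knowledge graph.
In the method provided by this embodiment, the structural composition of the graph database is first designed according to the user's query requirements for abnormal financial data. When the requirement is to find illegal-intermediary information among lenders, the illegal-intermediary information available to the platform includes not only names but also valid identifying information such as telephone numbers and identification codes, so three types of nodes can be used when designing the structural composition of the graph database: each node represents one piece of information data, and relationship nodes are associated by indicator lines. Multiple pieces of sample source data are then collected from the platform and cleaned to form a CSV file that the graph database can recognize. Finally, the CSV file is imported into the graph database to build a knowledge graph of the sample data; nodes whose degree of association exceeds the threshold are filtered out of the knowledge graph, and the corresponding information data in those nodes is extracted and output as abnormal financial data, for example valid identifying data such as the illegal intermediary's name, telephone number or identification code.
This embodiment thus identifies abnormal financial data by loading a large amount of sample data into a graph database to form a knowledge graph. It exploits the knowledge graph's strength in handling complex network relationships to represent multiple pieces of sample data as a structured network, from which abnormal financial data can be identified quickly and accurately.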
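Specifically, in the above embodiment, designing the structural composition of the graph database according to the query requirements for abnormal financial data comprises:
the query requirements for abnormal financial data include finding illegal-intermediary information in the registration information of multiple lenders, the lender registration information including lender information, contact information, transferor information and/or recipient information, where each kind of information includes name data, telephone data and identification code data; multiple node types are set up to correspond to the multiple data types, and the graph database is designed on the principle that one node corresponds to one piece of data.
In a specific implementation, installment-loan shopping is taken as an example for ease of understanding. When looking for illegal intermediaries in installment-loan shopping, suspicious leads must be traced starting from the lender, the recipient of the purchased goods and the related transferor, so as to uncover the illegal intermediaries among them. Because the information the platform can obtain about these people includes name data, telephone data and identification code data, three types of nodes can be set up in the graph database to represent these three kinds of data when designing its structural composition. After knowledge-graph analysis of multiple installment-loan shopping records, the nodes with a high degree of association are filtered out to uncover suspected illegal intermediaries.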
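As a minimal sketch of the "one node corresponds to one piece of data" principle, the three node types could be modelled in Python as below. The class, enum and function names are hypothetical and are not taken from the patent, which does not specify a concrete schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class NodeType(Enum):
    """The three node types, one per kind of data (names are assumptions)."""
    NAME = "Name"
    PHONE = "Phone"
    ID_CODE = "IdCode"


@dataclass(frozen=True)
class Node:
    """One node holds exactly one piece of data of one type."""
    node_type: NodeType
    value: str


def nodes_for_person(name: str, phone: str, id_code: str) -> List[Node]:
    """Split one person's registration information into three nodes."""
    return [
        Node(NodeType.NAME, name),
        Node(NodeType.PHONE, phone),
        Node(NodeType.ID_CODE, id_code),
    ]
```

Making `Node` frozen means two nodes with the same type and value compare equal and hash alike, which is convenient when identical nodes are merged later.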
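Specifically, in the above embodiment, collecting multiple pieces of sample source data and cleaning them to obtain multiple pieces of sample data that conform to the structural composition of the graph database comprises:
obtaining multiple pieces of lender registration information from a database, and extracting from each the lender information, contact information, transferor information and/or recipient information as sample source data; preliminarily screening the sample source data and discarding sample source data that does not include name data, telephone data or identification code data; checking the retained sample source data for duplicates and deleting duplicate sample source data; verifying the validity of the deduplicated sample source data, removing sample source data whose telephone data and/or identification code data is invalid, and finally retaining the valid sample data.
In a specific implementation, after multiple pieces of sample source data have been obtained, those that do not conform to the structural composition of the graph database are discarded. If the same lender has multiple loan records, the platform will hold multiple pieces of registration information for that lender, some of which may be duplicates, so the sample source data is deduplicated once it has been obtained. The deduplicated sample source data is then verified for validity: sample source data whose telephone data and/or identification code data is invalid is removed, and the valid sample data is finally retained. Telephone data and/or identification code data is identified as invalid by checking whether its length matches that of a standard telephone number and/or standard identification code; for example, mobile phone numbers in the sample source data that are not 11 digits long and identification codes that are not 18 digits long are judged invalid.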
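The cleaning steps above (screening, duplicate check, validity check) and the CSV output mentioned earlier can be sketched as follows. The column names `name`, `phone` and `id_code` and both function names are assumptions for illustration; the patent does not specify the CSV layout.

```python
import csv
import io
from typing import Dict, Iterable, List

FIELDS = ("name", "phone", "id_code")  # hypothetical column names


def clean_samples(source: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Screen, deduplicate and validate sample source data."""
    seen = set()
    cleaned = []
    for record in source:
        row = tuple((record.get(field) or "").strip() for field in FIELDS)
        # 1. Preliminary screening: drop records missing any of the three fields.
        if not all(row):
            continue
        # 2. Duplicate check: keep only the first copy of identical records.
        if row in seen:
            continue
        seen.add(row)
        # 3. Validity check by length (11-digit phone, 18-digit identification code).
        if len(row[1]) != 11 or len(row[2]) != 18:
            continue
        cleaned.append(dict(zip(FIELDS, row)))
    return cleaned


def to_csv(samples: List[Dict[str, str]]) -> str:
    """Serialise cleaned samples as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(samples)
    return buffer.getvalue()
```

The resulting text could be written to a file and handed to whatever CSV import mechanism the chosen graph database provides.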
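Preferably, in the above embodiment, identifying abnormal financial data in the knowledge graph comprises:
presetting multiple query statements for abnormal financial data in the Cypher language, including an abnormal-name query statement, an abnormal-telephone query statement or an abnormal-identification-code query statement; placing these query statements on the query interface in modular form, so that the user can select the query statement to input according to the query requirement for abnormal financial data; laying out the multiple pieces of sample data as nodes, the relationship nodes being associated by indicator lines to form the knowledge graph; filtering relationship nodes out of the knowledge graph according to the input query statement, and then finding the illegal-intermediary information among the filtered relationship nodes.
In a specific implementation, every query against the graph database has to be written as a Cypher statement that the graph database can recognize before the database can return the corresponding result. This is clearly hard for business staff without a computing background and makes the database inconvenient to use. To solve this problem, this embodiment presets query modules, already written as Cypher statements, on the platform's query interface, for example an illegal-intermediary name query module or an illegal-intermediary telephone query module. When looking up the name of an illegal intermediary, business staff can simply drag the illegal-intermediary name query module into the platform's query box and search. After receiving the query instruction, the program filters the relationship nodes out of the knowledge graph; the relationship nodes here include the illegal intermediary's name data, telephone data and identification code data. Finally, the illegal intermediary's name data is found among the relationship nodes and output as the result.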
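A hedged sketch of how such preset query modules might be stored: a Python dictionary maps a module name to a parameterised Cypher statement. The patent does not disclose the actual query text, so the node labels (`Name`, `Phone`, `IdCode`), the `value` property, the module names and the `$threshold` parameter are all assumptions.

```python
# Hypothetical labels, property name and module names; the patent does not
# give the text of its preset Cypher statements.
_TEMPLATE = (
    "MATCH (n:{label})-[r]-() "
    "WITH n, count(r) AS degree "
    "WHERE degree > $threshold "
    "RETURN n.value AS value, degree "
    "ORDER BY degree DESC"
)

PRESET_QUERIES = {
    "abnormal_name": _TEMPLATE.format(label="Name"),
    "abnormal_phone": _TEMPLATE.format(label="Phone"),
    "abnormal_id_code": _TEMPLATE.format(label="IdCode"),
}


def select_query(module: str, threshold: int):
    """Return the preset statement and its parameters for the chosen module."""
    return PRESET_QUERIES[module], {"threshold": threshold}
```

Keeping the threshold as a query parameter rather than formatting it into the string means the business user only picks a module, and the statement text itself never changes.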
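It can be understood that the multiple pieces of sample data are laid out as nodes, and the relationship nodes are associated by indicator lines to form the knowledge graph, as follows:
Because each piece of sample data includes three types of data (name, telephone and identification code), three nodes can be built for each piece of sample data when constructing the knowledge graph, so that each node represents one piece of data, and the three nodes of the same piece of sample data are associated with one another by indicator lines. Once the nodes for all pieces of sample data have been built, nodes holding the same data are deduplicated, and the indicator lines that were connected to a deleted node are reattached to the node retained after deduplication, finally forming the knowledge graph.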
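The construction can be sketched in plain Python with an adjacency map. Instead of building all nodes and deduplicating afterwards, this sketch keys each node by its (type, value) pair, so identical nodes merge as they are inserted and their indicator lines end up on the single retained node; that shortcut, the function name and the field names are assumptions, and parallel lines between the same pair of nodes collapse into one.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

NodeKey = Tuple[str, str]  # (node type, data value)


def build_graph(samples: Iterable[Dict[str, str]]) -> Dict[NodeKey, Set[NodeKey]]:
    """Build an undirected graph: three linked nodes per sample, merged by value."""
    graph = defaultdict(set)
    for sample in samples:
        nodes = [
            ("name", sample["name"]),
            ("phone", sample["phone"]),
            ("id_code", sample["id_code"]),
        ]
        # Link the three nodes of the same sample pairwise.
        for a, b in combinations(nodes, 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)
```

For example, two samples that share one phone number produce five nodes rather than six, and the shared phone node is linked to both names and both identification codes.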
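As the above implementation shows, this embodiment has the following advantages:
1. It reduces the complexity of querying the graph database. The database's dedicated language and syntax could previously be mastered only by professional data analysts and engineers; now business staff who do not know a programming language can also run queries.
2. It reduces the cost of communication between business staff and developers. Business staff previously had to go through a multi-department process of writing a query requirements specification, waiting for the R&D department to schedule it, and waiting for the R&D department to implement it. Now the R&D department only needs to import the data into the graph database, and business staff can handle subsequent use themselves.
3. It improves query efficiency. Previously, the results obtained by data analysts could only be worked with interactively on the graph database using Cypher statements, and the graph data had to be converted back into a table structure before the business department could use it. Now the graph database is deployed on the platform and business staff can obtain query results directly, making the whole process quick and convenient.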
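Further, filtering relationship nodes out of the knowledge graph according to the input query statement and then finding the illegal-intermediary information among the filtered relationship nodes comprises:
An abnormal-node identification threshold is set, and when the degree of association of a relationship node is greater than the threshold, the nodes among the relationship nodes whose type matches the query statement are output, giving the query result of illegal-intermediary information. The degree of association is defined by the number of indicator lines connected to the node.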
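One possible reading of this threshold rule, sketched in Python: the degree of association of a node is the number of indicator lines attached to it, and nodes of the queried type whose degree exceeds the threshold are returned. The adjacency-map representation, the function name and the sort order are assumptions; the patent leaves the exact selection logic open.

```python
from typing import Dict, List, Set, Tuple

NodeKey = Tuple[str, str]  # (node type, data value)


def find_abnormal(
    graph: Dict[NodeKey, Set[NodeKey]], node_type: str, threshold: int
) -> List[Tuple[str, int]]:
    """Return (value, degree) for nodes of node_type with degree > threshold."""
    hits = [
        (value, len(neighbours))
        for (kind, value), neighbours in graph.items()
        if kind == node_type and len(neighbours) > threshold
    ]
    # Highest degree first; ties broken by value for a stable order.
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))
```

With a graph in which one phone number is linked to many names and identification codes, querying the phone type returns that number as a candidate for manual review, while ordinary phone numbers with only two links stay below the threshold.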
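Embodiment 2
Referring to FIG. 1 and FIG. 2, this embodiment provides a system for querying abnormal financial data based on a knowledge graph, comprising:
a graph design unit 1, configured to design the structural composition of a graph database according to the query requirements for abnormal financial data, the structural composition including representations of nodes and of the relationships between nodes;
a sample collection unit 2, configured to collect multiple pieces of sample source data and clean them to obtain multiple pieces of sample data that conform to the structural composition of the graph database;
an identification output unit 3, configured to import the sample data into the graph database to output a knowledge graph, and then find the abnormal financial data in the knowledge graph.
Preferably, the sample collection unit 2 comprises:
an information collection module 21, configured to obtain multiple pieces of lender registration information from a database, and extract from each the lender information, contact information, transferor information and/or recipient information as sample source data;
a screening module 22, configured to preliminarily screen the sample source data and discard sample source data that does not include name data, telephone data or identification code data;
a duplicate check module 23, configured to check the retained sample source data for duplicates and delete duplicate sample source data;
a verification module 24, configured to verify the validity of the deduplicated sample source data, remove sample source data whose telephone data and/or identification code data is invalid, and finally retain the valid sample data.
Preferably, the identification output unit 3 comprises:
a pre-storage module 31, configured to preset multiple query statements for abnormal financial data in the Cypher language, including an abnormal-name query statement, an abnormal-telephone query statement or an abnormal-identification-code query statement;
a setting module 32, configured to place the abnormal-name, abnormal-telephone or abnormal-identification-code query statement on the query interface in modular form, so that the user can select the query statement to input according to the query requirement for abnormal financial data;
a processing module 33, configured to lay out the multiple pieces of sample data as nodes, the relationship nodes being associated by indicator lines to form the knowledge graph;
a query output module 34, configured to filter relationship nodes out of the knowledge graph according to the input query statement, and then identify the abnormal financial data among the filtered relationship nodes and output it as a query result.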
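Compared with the prior art, the beneficial effects of the system provided by this embodiment of the present invention are the same as those of the method provided by Embodiment 1 above and are not repeated here.
A person of ordinary skill in the art will understand that all or some of the steps of the above method can be carried out by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, performs the steps of the method of the above embodiments. The storage medium may be a ROM/RAM, a magnetic disk, an optical disc, a memory card, or the like.
The above is only a specific implementation of the present invention, but the scope of protection of the present invention is not limited to it. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. The scope of protection of the present invention shall therefore be determined by the scope of protection of the claims.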

Claims (10)

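  1. A method for querying abnormal financial data based on a knowledge graph, characterized by comprising:
    designing the structural composition of a graph database according to the query requirements for abnormal financial data, the structural composition including representations of nodes and of the relationships between nodes;
    collecting multiple pieces of sample source data, and cleaning them to obtain multiple pieces of sample data that conform to the structural composition of the graph database;
    importing the sample data into the graph database to output a knowledge graph, and then finding the abnormal financial data in the knowledge graph.
  2. The method according to claim 1, characterized in that designing the structural composition of the graph database according to the query requirements for abnormal financial data comprises:
    the query requirements for abnormal financial data include finding illegal-intermediary information in the registration information of multiple lenders, the lender registration information including lender information, contact information, transferor information and/or recipient information, where each kind of information includes name data, telephone data and identification code data;
    multiple node types are set up to correspond to the multiple data types, and the graph database is designed on the principle that one node corresponds to one piece of data.
  3. The method according to claim 2, characterized in that collecting multiple pieces of sample source data and cleaning them to obtain multiple pieces of sample data that conform to the structural composition of the graph database comprises:
    obtaining multiple pieces of lender registration information from a database, and extracting from each the lender information, contact information, transferor information and/or recipient information as sample source data;
    preliminarily screening the sample source data, and discarding sample source data that does not include name data, telephone data or identification code data;
    checking the retained sample source data for duplicates and deleting duplicate sample source data;
    verifying the validity of the deduplicated sample source data, removing sample source data whose telephone data and/or identification code data is invalid, and finally retaining the valid sample data.
  4. The method according to claim 3, characterized in that telephone data and/or identification code data is identified as invalid as follows:
    whether the data is invalid is determined by checking whether the length of the telephone data and/or identification code data matches that of a standard telephone number and/or standard identification code.
  5. The method according to claim 2, characterized in that identifying abnormal financial data in the knowledge graph comprises:
    presetting multiple query statements for abnormal financial data in the Cypher language, including an abnormal-name query statement, an abnormal-telephone query statement or an abnormal-identification-code query statement;
    placing the abnormal-name, abnormal-telephone or abnormal-identification-code query statement on the query interface in modular form, so that the user can select the query statement to input according to the query requirement for abnormal financial data;
    laying out the multiple pieces of sample data as nodes, the relationship nodes being associated by indicator lines to form the knowledge graph;
    filtering relationship nodes out of the knowledge graph according to the input query statement, and then finding the illegal-intermediary information among the filtered relationship nodes.
  6. The method according to claim 5, characterized in that filtering relationship nodes out of the knowledge graph according to the input query statement and then finding the illegal-intermediary information among the filtered relationship nodes comprises:
    setting an abnormal-node identification threshold and, when the degree of association of a relationship node is greater than the threshold, outputting the nodes among the relationship nodes whose type matches the query statement, to obtain the query result of illegal-intermediary information.
  7. The method according to claim 5 or 6, characterized in that the degree of association is defined by the number of indicator lines connected to the node.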
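  8. A system for querying abnormal financial data based on a knowledge graph, characterized by comprising:
    a graph design unit, configured to design the structural composition of a graph database according to the query requirements for abnormal financial data, the structural composition including representations of nodes and of the relationships between nodes;
    a sample collection unit, configured to collect multiple pieces of sample source data and clean them to obtain multiple pieces of sample data that conform to the structural composition of the graph database;
    an identification output unit, configured to import the sample data into the graph database to output a knowledge graph, and then find the abnormal financial data in the knowledge graph.
  9. The system according to claim 8, characterized in that the sample collection unit comprises:
    an information collection module, configured to obtain multiple pieces of lender registration information from a database, and extract from each the lender information, contact information, transferor information and/or recipient information as sample source data;
    a screening module, configured to preliminarily screen the sample source data and discard sample source data that does not include name data, telephone data or identification code data;
    a duplicate check module, configured to check the retained sample source data for duplicates and delete duplicate sample source data;
    a verification module, configured to verify the validity of the deduplicated sample source data, remove sample source data whose telephone data and/or identification code data is invalid, and finally retain the valid sample data.
  10. The system according to claim 8, characterized in that the identification output unit comprises:
    a pre-storage module, configured to preset multiple query statements for abnormal financial data in the Cypher language, including an abnormal-name query statement, an abnormal-telephone query statement or an abnormal-identification-code query statement;
    a setting module, configured to place the abnormal-name, abnormal-telephone or abnormal-identification-code query statement on the query interface in modular form, so that the user can select the query statement to input according to the query requirement for abnormal financial data;
    a processing module, configured to lay out the multiple pieces of sample data as nodes, the relationship nodes being associated by indicator lines to form the knowledge graph;
    a query output module, configured to filter relationship nodes out of the knowledge graph according to the input query statement, and then identify the abnormal financial data among the filtered relationship nodes and output it as a query result.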
PCT/CN2019/106503 2018-12-25 2019-09-18 Method and system for querying abnormal financial data based on a knowledge graph WO2020134213A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CA3179620A CA3179620A1 (en) 2018-12-25 2019-09-18 Method and system for querying abnormal financial data on basis of knowledge map

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811588282.6A CN109491995A (zh) 2018-12-25 2018-12-25 Method and system for querying abnormal financial data based on a knowledge graph
CN201811588282.6 2018-12-25

Publications (1)

Publication Number Publication Date
WO2020134213A1 true WO2020134213A1 (zh) 2020-07-02

Family

ID=65711856

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/106503 WO2020134213A1 (zh) 2018-12-25 2019-09-18 Method and system for querying abnormal financial data based on a knowledge graph

Country Status (3)

Country Link
CN (1) CN109491995A (zh)
CA (2) CA3230500A1 (zh)
WO (1) WO2020134213A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112632132A (zh) * 2020-12-31 2021-04-09 中国农业银行股份有限公司 一种异常导入数据的处理方法、装置及设备

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109491995A (zh) * 2018-12-25 2019-03-19 苏宁易购集团股份有限公司 基于知识图谱查询金融异常数据的方法及系统
CN110321438A (zh) * 2019-06-14 2019-10-11 北京奇艺世纪科技有限公司 基于复杂网络的实时欺诈检测方法、装置及电子设备
CN110609905A (zh) * 2019-09-12 2019-12-24 深圳众赢维融科技有限公司 超点类型识别和图数据处理方法及装置
CN110837538A (zh) * 2019-10-24 2020-02-25 北京中科捷信信息技术有限公司 金融知识图谱可视化查询与多维分析系统
TWI736233B (zh) * 2020-04-23 2021-08-11 兆豐國際商業銀行股份有限公司 貸前調查系統以及貸前調查方法
CN112686760B (zh) * 2021-01-20 2021-09-14 深圳市全景网络有限公司 基于大数据的金融业务处理方法及平台
CN113469697B (zh) * 2021-06-30 2022-12-06 重庆富民银行股份有限公司 基于知识图谱的无监督异常检测方法及装置
CN115269879B (zh) * 2022-09-05 2023-05-05 北京百度网讯科技有限公司 知识结构数据的生成方法、数据搜索方法和风险告警方法

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108038700A (zh) * 2017-12-22 2018-05-15 上海前隆信息科技有限公司 一种反欺诈数据分析方法与系统
CN108280760A (zh) * 2018-01-25 2018-07-13 树根互联技术有限公司 一种金融风险在线监控方法和装置
CN108492173A (zh) * 2018-03-23 2018-09-04 上海氪信信息技术有限公司 一种基于双模网络图挖掘算法的信用卡反欺诈预测方法
CN109002470A (zh) * 2018-06-12 2018-12-14 东方银谷(北京)投资管理有限公司 知识图谱构建方法及装置、客户端
CN109064318A (zh) * 2018-08-24 2018-12-21 苏宁消费金融有限公司 一种基于知识图谱的互联网金融风险监测系统
CN109491995A (zh) * 2018-12-25 2019-03-19 苏宁易购集团股份有限公司 基于知识图谱查询金融异常数据的方法及系统

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109033135A (zh) * 2018-06-06 2018-12-18 北京大学 一种面向软件项目知识图谱的自然语言查询方法及系统
CN109064313A (zh) * 2018-07-20 2018-12-21 重庆富民银行股份有限公司 基于知识图谱技术的贷后预警监测系统

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112632132A (zh) * 2020-12-31 2021-04-09 中国农业银行股份有限公司 一种异常导入数据的处理方法、装置及设备
CN112632132B (zh) * 2020-12-31 2024-04-12 中国农业银行股份有限公司 一种异常导入数据的处理方法、装置及设备

Also Published As

Publication number Publication date
CA3230500A1 (en) 2020-07-02
CA3179620A1 (en) 2020-07-02
CN109491995A (zh) 2019-03-19

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19904112

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19904112

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 3179620

Country of ref document: CA