WO2020015594A1 - Information repair method, apparatus, electronic device, and computer-readable medium - Google Patents

Information repair method, apparatus, electronic device, and computer-readable medium

Info

Publication number
WO2020015594A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
relationship network
logical
information
key
Prior art date
Application number
PCT/CN2019/095867
Other languages
English (en)
French (fr)
Inventor
卢周
袁力
范叶亮
杜强
项祖琪
钱勇
Original Assignee
京东数字科技控股有限公司
Priority date
Filing date
Publication date
Application filed by 京东数字科技控股有限公司
Publication of WO2020015594A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G06Q40/02 Banking, e.g. interest calculation or account maintenance

Definitions

  • the present disclosure relates generally to the field of Internet technologies, and in particular, to an information repair method, apparatus, electronic device, and computer-readable medium.
  • the present disclosure provides an information repair method, device, electronic device, and computer-readable medium to solve the above technical problems.
  • an information repair method including:
  • the data source of the user data is at least one of an e-commerce login account number, an ID number, a bank card number, a mobile phone number, a wallet payment account number, a financial account number, and a mobile device number.
  • integrating the user data to obtain a data integration table includes:
  • when the same field is included in different data sources, the field value from the data source with the higher priority is selected; when a field is included in only one data source, its field value is taken from that data source;
  • the data integration table is formed according to a field and a corresponding field value.
  • constructing a relationship network according to the data integration table includes:
  • the relationship network is formed by the nodes and the edges.
  • performing multi-account fusion on the data integration table and generating the fused relationship network by combining the relationship network includes:
  • an edge table and the correspondence between the node number and the original KEY are formed;
  • the correspondence between the node number and the logical KEY is calculated by using the connected component algorithm according to the edge table and the number of nodes;
  • the original KEY is a unique identification number of the user data, and the logical KEY is used to mark a unique logical subject corresponding to multiple original KEYs after multi-account fusion.
  • forming an edge table and the correspondence between the node number and the original KEY based on the deleted data integration table includes:
  • the repairing of the specified information according to the post-fusion relationship network includes:
  • a repair result of the specified information is obtained according to a field value corresponding to a specified field in the first N shortest path logical KEYs.
  • a breadth-first search is performed on a plurality of the merged relationship networks starting from a specified logical KEY to find other related logical KEYs.
  • Finding the first N shortest path logical KEYs includes:
  • the first N shortest path logic KEYs are obtained according to the first N values with the smallest values among the multiple path lengths.
  • an information repair apparatus including:
  • a data acquisition module configured to acquire user data through an e-commerce platform and integrate the user data to obtain a data integration table
  • a network building module configured to build a relationship network according to the data integration table
  • a fusion module configured to perform multi-account fusion on the data integration table, and generate a fusion relationship network by combining the relationship network;
  • a repair module configured to perform repair of specified information according to the fused relationship network.
  • an electronic device including a processor; a memory storing instructions for the processor to control the method steps as described above.
  • a computer-readable medium having computer-executable instructions stored thereon, which, when executed by a processor, implement the method steps described above.
  • a relationship network is constructed from the user data of an e-commerce platform, and calculations such as fusion and search are performed on it to repair some of the user's information. On the one hand, specified information that has been lost can be repaired, yielding more contact information for a borrower who has lost contact. On the other hand, because the method is based on actual shopping data from the e-commerce platform, the reliability of the contact information obtained for the lost-contact borrower can be improved.
  • FIG. 1 shows a flowchart of an information repair method provided in an embodiment of the present disclosure.
  • FIG. 2 shows a flowchart of step S110 in FIG. 1 according to an embodiment of the present disclosure.
  • FIG. 3 shows a flowchart of step S120 in FIG. 1 according to an embodiment of the present disclosure.
  • FIG. 4 shows a schematic diagram of a relationship network in an embodiment of the present disclosure.
  • FIG. 5 illustrates a flowchart of step S130 in FIG. 1 according to an embodiment of the present disclosure.
  • FIG. 6 shows a flowchart of step S520 in FIG. 5 according to an embodiment of the present disclosure.
  • FIG. 7 shows a schematic diagram of a fused relationship network in an embodiment of the present disclosure.
  • FIG. 8 illustrates a flowchart of step S140 in FIG. 1 according to an embodiment of the present disclosure.
  • FIG. 9 illustrates a schematic diagram of an information repair apparatus provided in another embodiment of the present disclosure.
  • FIG. 10 is a schematic structural diagram of an electronic device suitable for implementing the embodiments of the present application provided by an embodiment of the present disclosure.
  • the proportion of lost contacts among new applicants has reached more than 30%, and after an account enters the bad stage (more than 30 days past due), the proportion of lost contacts reaches 70%.
  • the proportion of customers in arrears whose contact information is missing is high, or the frequent contacts provided by customers in arrears are rarely usable, which hampers post-loan collection and generates more non-performing assets.
  • FIG. 1 shows a flowchart of an information repair method provided in an embodiment of the present disclosure, including the following steps:
  • step S110 user data is obtained through an e-commerce platform, and the user data is integrated to obtain a data integration table.
  • step S120 a relationship network is constructed according to the data integration table.
  • step S130 multi-account fusion is performed on the data integration table, and a relationship network after fusion is generated by combining the relationship network.
  • step S140 the specified information is repaired according to the post-fusion relation network.
  • the information repair method provided by the embodiment of the present disclosure constructs a relationship network from the user data of the e-commerce platform and performs calculations such as fusion and search on it, thereby repairing some of the user's information. On the one hand, specified information that has been lost can be repaired, yielding more contact information for the lost-contact borrower. On the other hand, because the method is based on actual shopping data from the e-commerce platform, the reliability of the contact information obtained for the lost-contact borrower can be improved.
  • step S110 user data is acquired through an e-commerce platform, and the user data is integrated to obtain a data integration table.
  • users leave many kinds of information during the entire shopping process, such as registering an account, browsing products, placing an order, and completing payment. This includes data such as mobile phone numbers, devices, card numbers, and account numbers.
  • the data source of the user data obtained through the e-commerce platform is at least one of an e-commerce login account, an ID number, a bank card number, a mobile phone number, a wallet payment account, a financial account number, and a mobile device number.
  • more user data such as address information, can be obtained in the e-commerce platform.
  • only the repair of contact information is used here as an example of specified information.
  • FIG. 2 shows a flowchart of integrating the user data in step S110 to obtain a data integration table, which includes the following steps:
  • step S210 data source priorities are set for different data sources of the user data.
  • step S220 field values corresponding to the field level are acquired based on the data source and the priority of the data source.
  • when the same field is included in different data sources, the field value from the data source with the higher priority is selected; when a field is included in only one data source, the corresponding field value is taken from that data source.
  • step S230 the data integration table is formed according to a field and a corresponding field value.
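  • Steps S210 to S230 can be illustrated with a minimal Python sketch. The record layout (`priority`, `fields`), the field names, and the rule that a larger number means a more credible source are assumptions made for illustration; the disclosure does not specify them.

```python
def integrate(records):
    """Merge several source records for one user into one integrated row.

    Each record is a dict with a numeric 'priority' (assumed: larger means
    more credible) and a 'fields' dict of field name -> value.
    """
    merged = {}           # field -> chosen value
    chosen_priority = {}  # field -> priority of the source that supplied it
    for rec in records:
        for field, value in rec["fields"].items():
            if value is None:
                continue
            # Take the value if the field is new, or if this source outranks
            # the source that supplied the current value.
            if field not in merged or rec["priority"] > chosen_priority[field]:
                merged[field] = value
                chosen_priority[field] = rec["priority"]
    return merged


# Hypothetical example: the ID number appears in two sources.
records = [
    {"source": "ecommerce_basic", "priority": 3,
     "fields": {"id_number": "ID-A", "phone": "P1"}},
    {"source": "payment_real_name", "priority": 8,
     "fields": {"id_number": "ID-B"}},
]
row = integrate(records)
```

    Here the ID number is taken from the higher-priority source, while the phone number, present in only one source, is taken from that source.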
  • Table 1 is a data integration table showing the user data under the above data sources, as follows:
  • Idlist table:

    | Field | Type | Field description |
    |---|---|---|
    | us _ * _ key | Bigint | Original KEY |
    | us _ * _ p | String | E-commerce login account |
    | us _ * _ n | String | Username |
    | id _ * _ n | String | Identification number |
    | wa _ * _ t | String | Payment wallet account |
    | cu _ * _ ao | String | Financial account |
    | mo _ * _ e | String | Phone number |
    | ba _ * _ ao | String | Bank card number |
    | de _ * _ id | String | Mobile device ID |
    | da _ * _ a | String | Data source |
    | sr _ * _ ue | Bigint | Data source priority |
  • the e-commerce login account is mainly used to integrate data of user equipment, ID card number, card number, and mobile phone number from different sources.
  • when a user visits an e-commerce website for the first time, they usually first register for a website account (that is, an e-commerce login account). After logging in to the account, they search and browse for products, fill in the delivery address, the consignee, and the delivery phone number, and finally place an order to purchase the product.
  • a real person may have multiple e-commerce login accounts and may use them to place orders for themselves or for friends and relatives, leaving behind the mobile phone numbers, addresses, names, and so on of those recipients.
  • the mobile device number is also left behind (such as the IMEI, which can uniquely identify a mobile device).
  • the "data source" field identifies the system from which the data originates. Different data sources have different credibility. For example, an ID number sourced from the payment real-name system is more reliable than one sourced from the e-commerce basic information system.
  • the data source priority field quantifies the priority of the data source, so that in subsequent steps large amounts of data can be filtered and selected according to data source priority.
  • step S120 a relationship network is constructed according to the data integration table.
  • FIG. 3 shows a flowchart of constructing a relationship network according to the data integration table in step S120, including the following steps:
  • step S310 a data source of the user data is used as a node.
  • step S320 an edge between two nodes is obtained according to a direct association or an interval association between the nodes.
  • step S330 the relationship network is formed by the nodes and the edges.
  • FIG. 4 shows a schematic diagram of the relationship network formed based on the above Table 1.
  • the source of nodes in the relationship network shown in FIG. 4 is mainly the various accounts in Table 1, including: e-commerce login account 41, ID number 42, Bank card number 43, mobile phone number 44, wallet payment account number 45, financial account number 46, and mobile device number 47.
  • Nodes in the network also have attributes, and the node attributes mainly include source, time, etc.
  • the attributes of the e-commerce login account 41 include: user level, registration time, latest order time, and recent consumption amount.
  • the attributes of the ID number 42 include: (type: passport / driver's license / identity card), province, data source, and data source priority.
  • the attributes of bank card number 43 include: (type: debit / credit), bank, data source, data source priority.
  • the attributes of the mobile phone number 44 include: mobile phone number, last use time, data source, and priority of the data source.
  • the attributes of the wallet payment account 45 include: wallet payment account and registration time.
  • the attributes of the financial account number 46 include a financial account number and registration time.
  • the attributes of mobile device number 47 include: e-commerce login account, IMEI number, data source, and data source priority.
  • there are two types of edges in the relationship network: one is a known binding or usage relationship, in which all accounts in the same row of Table 1 are connected by edges; the other is an implicit relationship derived from rules, such as associating multiple rows of data that share the same ID number.
  • Edge attributes mainly include: real-name authentication, transaction, registration, binding, etc.
  • the attribute of the edge of the e-commerce login account 41 and the ID number 42 is real name authentication
  • the attribute of the edge of the e-commerce login account 41 and bank card number 43 is card binding / transaction
  • the attribute of the edge of the e-commerce login account 41 and the mobile phone number 44 is transaction
  • the attribute of the edge of the e-commerce login account 41 and the wallet payment account 45 is registration
  • the attribute of the edge of the e-commerce login account 41 and financial account 46 is registration
  • the attributes of the edge of the e-commerce login account 41 and the mobile device number 47 are transactions. For the attributes of other edges, refer to FIG. 4, which is not repeated here.
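  • The construction of the relationship network described above can be sketched as follows. This is a simplified illustration, not the disclosed implementation: the row field names are hypothetical, edges are drawn only between the login account and the other accounts in the same row (as in the edge attributes listed above), and the implicit association between rows arises because rows sharing a value share a node.

```python
# Edge attribute per account type, following the attributes listed above.
EDGE_ATTR = {
    "id_number": "real-name authentication",
    "bank_card": "card binding / transaction",
    "phone": "transaction",
    "wallet_account": "registration",
    "financial_account": "registration",
    "device_id": "transaction",
}


def build_network(rows):
    """Return (nodes, edges) for rows of the integration table.

    A node is a (type, value) pair; edges maps (login_node, other_node)
    to the edge attribute.
    """
    nodes = set()
    edges = {}
    for row in rows:
        login = ("login_account", row["login_account"])
        nodes.add(login)
        for field, attr in EDGE_ATTR.items():
            value = row.get(field)
            if value:
                node = (field, value)
                nodes.add(node)  # identical values in other rows reuse this node
                edges[(login, node)] = attr
    return nodes, edges


# Hypothetical rows: two login accounts verified with the same ID number.
rows = [
    {"login_account": "u1", "id_number": "ID1", "phone": "p1"},
    {"login_account": "u2", "id_number": "ID1"},
]
nodes, edges = build_network(rows)
```

    In this example the two login accounts are not directly connected, but both are linked to the same ID number node, which is one way to realize the indirect association mentioned in step S320.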
  • step S130 multi-account fusion is performed on the data integration table, and a relationship network after fusion is generated by combining the relationship network.
  • in step S130, a connected component algorithm with filtering associates the various nodes (mobile phones, ID cards, e-commerce login accounts, etc.) identified as belonging to the same "logical person" with the same logical KEY, thereby mapping the various account numbers to a logical KEY.
  • FIG. 5 shows a flowchart of performing multi-account fusion on the data integration table in step S130 and combining the relationship network to generate a fused relationship network, which specifically includes the following steps:
  • step S510 a data source with a data source priority in the data integration table that is lower than or equal to a preset value is deleted.
  • step S520 a correspondence between an edge table and a node number and the original KEY is formed according to the deleted data integration table.
  • step S530 the correspondence between the node number and the logical KEY is calculated by using a connected component algorithm according to the edge table and the number of nodes.
  • step S540 according to the correspondence between the node number and the original KEY and the correspondence between the node number and the logical KEY, the correspondence between the original KEY and the logical KEY is calculated.
  • the original KEY is a unique identification number of the user data
  • the logical KEY is used to mark a unique logical subject corresponding to multiple original KEYs after multi-account fusion.
  • the original KEY (that is, the field us _ * _ key) is the unique ID of the original data, and it is guaranteed that the original KEY of the data will not change each time the data table is updated.
  • step S550 the post-fusion relationship network is generated according to the correspondence between the original KEY and the logical KEY.
  • FIG. 6 shows that in step S520, an edge table is formed according to the deleted data integration table, and the corresponding relationship between the node number and the original KEY includes:
  • step S610 nodes in the deleted data integration table are consecutively numbered.
  • step S620 the nodes of the same original KEY are combined in pairs to obtain an edge table, where the edge table includes a starting node number and an ending node number.
  • step S630 a correspondence between a node number and an original KEY is constructed.
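  • Steps S610 to S630 can be sketched as follows. The row layout and field names are hypothetical, and storing a set of original KEYs per node (because a node value can occur in several rows) is an assumption of this sketch.

```python
from itertools import combinations


def build_edge_table(rows, node_fields):
    """Number nodes consecutively from 0 and pair up nodes of the same row.

    Returns (node_ids, node_to_keys, edge_table):
      node_ids     - (field, value) -> node number
      node_to_keys - node number -> set of original KEYs containing the node
      edge_table   - list of (start node number, end node number)
    """
    node_ids = {}
    node_to_keys = {}
    edge_table = []
    for row in rows:
        ids_in_row = []
        for field in node_fields:
            value = row.get(field)
            if value is None:
                continue
            node = (field, value)
            if node not in node_ids:
                node_ids[node] = len(node_ids)  # next consecutive number
            nid = node_ids[node]
            node_to_keys.setdefault(nid, set()).add(row["original_key"])
            ids_in_row.append(nid)
        # Nodes that share an original KEY are combined in pairs.
        edge_table.extend(combinations(ids_in_row, 2))
    return node_ids, node_to_keys, edge_table


# Hypothetical filtered rows: two original KEYs sharing one ID number.
rows = [
    {"original_key": 100001, "phone": "p1", "id_number": "i1"},
    {"original_key": 100002, "phone": "p2", "id_number": "i1"},
]
node_ids, node_to_keys, edge_table = build_edge_table(rows, ("phone", "id_number"))
```

    The shared ID number receives a single node number, so the two rows become connected through it in the edge table.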
  • the correspondence between the original KEY and the logical KEY is output by the fusion service. The input is a text file in which each row holds the data associated with one unique original KEY.
  • the fields are separated by spaces; the output is also a text file, in which each row maps the original KEY of an input row to its logical KEY, in the format: original KEY [space] logical KEY.
  • the output is the logical KEY corresponding to all the original KEYs, and the minimum value of all corresponding original KEYs is taken as the logical KEY, which specifically includes the following three steps:
  • the first step is to filter the data integration table shown in Table 1.
  • nodes of the phone number, ID number, and bank card number types whose data source priority is above 6 are retained; other nodes are deleted, and edges whose start or end point is not in the retained list are deleted directly.
  • nodes are uniformly ID-coded: all nodes are numbered consecutively starting from 0 to obtain a unique node number (i.e., node ID); where the node values in multiple rows of Table 1 are the same, the node IDs are the same.
  • the mobile phone number node 13***** generates a new node ID 101
  • the original KEY is 100001
  • the format of the corresponding relationship between the retained node and the original KEY is 10110110000.
  • the second step is to calculate the connected components in the graph.
  • the connected component algorithm used is a standard algorithm: the input is an edge table file and the number of nodes, and the output is the correspondence between node IDs and connected component IDs. Since the useless edges and nodes have been filtered out in the first step, the nodes in the same connected component can be regarded as the same "logical subject" (that is, "logical person").
  • the ID of a connected component is the smallest node ID in that connected component.
  • the connected component ID is the logical KEY.
  • the output data file is in the format:
  • the third step is to calculate the correspondence between the original KEY and the logical KEY from the connected components.
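  • The second and third steps can be sketched with a union-find structure, which is one standard way to compute connected components; the disclosure does not say which implementation it uses. The component ID is the smallest node ID, as stated above. For the final mapping, this sketch follows the earlier statement that the minimum of all corresponding original KEYs serves as the logical KEY; that choice, and the input layout, are assumptions.

```python
def connected_components(num_nodes, edge_table):
    """Return node ID -> component ID, where the component ID is the
    smallest node ID in the component."""
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edge_table:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Attach the larger root under the smaller one, so every root
            # stays the minimum node ID of its component.
            if ra < rb:
                parent[rb] = ra
            else:
                parent[ra] = rb
    return {n: find(n) for n in range(num_nodes)}


def original_to_logical(node_to_keys, node_to_component):
    """Map each original KEY to the smallest original KEY in its component."""
    component_min_key = {}
    for node, keys in node_to_keys.items():
        comp = node_to_component[node]
        low = min(keys)
        component_min_key[comp] = min(low, component_min_key.get(comp, low))
    return {
        key: component_min_key[node_to_component[node]]
        for node, keys in node_to_keys.items()
        for key in keys
    }


# Hypothetical input: nodes 0-2 are linked, node 3 is isolated.
edge_table = [(0, 1), (2, 1)]
node_to_keys = {0: {100001}, 1: {100001, 100002}, 2: {100002}, 3: {100003}}
components = connected_components(4, edge_table)
key_map = original_to_logical(node_to_keys, components)
```

    Original KEYs 100001 and 100002 end up with the same logical KEY because their rows share a node, while 100003 remains its own logical subject.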
  • Idlist table:

    | Field | Type | Field description |
    |---|---|---|
    | us _ * _ key | Bigint | Original KEY |
    | us _ * _ p | String | E-commerce login account |
    | mo _ * _ e | String | Phone number |
    | id _ * _ n | String | Identification number |
    | wa _ * _ t | String | Payment wallet account |
    | cu _ * _ ao | String | Financial account |
    | ba _ * _ ao | String | Bank card number |
    | da _ * _ a | String | Data source |
    | sr _ * _ ue | Bigint | Data source priority |
    | logic_key | Bigint | Logical key |
  • from this, the fused relationship network shown schematically in FIG. 7 is obtained.
  • with the logical KEY 701 as the center, the network also includes six nodes: mobile phone number 702, financial account number 703, ID number 704, e-commerce login account 705, bank card number 706, and payment wallet account 707.
  • each of these nodes forms an edge with the logical KEY 701, and the attribute of each edge is the data source.
  • step S140 the specified information is repaired according to the post-fusion relation network.
  • in this step, a breadth-first search is first performed on the fused relationship networks, starting from a specified logical KEY, to find other related logical KEYs and to identify the first N shortest-path logical KEYs. Then, the repair result of the specified information is obtained according to the field value corresponding to the specified field in the first N shortest-path logical KEYs.
  • FIG. 8 shows a flowchart of repairing specified information according to the post-fusion relationship network in step S140, which specifically includes the following steps:
  • step S810 a node table is generated according to the deleted data integration table.
  • step S820 edge weights, vertex type weights, and vertex degrees are obtained according to the node table and the edge table.
  • step S840 the first N shortest path logic KEYs are obtained according to the first N values with the smallest values among the multiple path lengths.
  • step S850 the repair result of the specified information is obtained according to the field value corresponding to the specified field in the first N shortest path logical KEYs.
  • This step mainly relies on person-to-person contact (shared mobile phone, shipping address, etc.) to query the logical person who has the closest relationship with the lost customer.
  • these logical persons are likely to be the customer himself (that is, the borrower) or natural persons who have a close relationship with him, so the lost-contact customer can be expected to be reached through the contact details of these logical persons.
  • each "logical person" has a unique KEY (that is, a logical KEY). The various accounts of the "logical person" (mobile phone number, e-commerce login account, bank card number, etc.) extend outward from the logical KEY at the center, and logical persons are connected to one another through these information nodes.
  • when the repair algorithm is executed, a BFS (Breadth-First Search) is run from the logical KEY to find other related logical KEYs.
  • the path length is obtained according to the weighted summary of edge weight, vertex type weight, and vertex degree.
  • the first N (e.g., the first 100) values
  • the first step is to create a fusion relationship network diagram from Table 2.
  • the various data in Table 2 are mapped to the graph to generate a node table and an edge table.
  • each edge is attached with the source weight (that is, the edge weight) and the vertex type weight.
  • the vertex degree is calculated.
  • the vertex degree is equal to the out degree + the in degree.
  • the number of edges leaving a vertex is called the vertex's out-degree; the number of times a vertex in a directed graph serves as the end point of an edge is its in-degree.
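  • The vertex degree defined above can be computed directly from an edge table. A minimal sketch, assuming the edge table is a list of (start, end) node numbers:

```python
from collections import Counter


def vertex_degrees(edge_table):
    """Return vertex -> degree, where degree = out-degree + in-degree."""
    out_deg = Counter(start for start, _ in edge_table)  # edges leaving a vertex
    in_deg = Counter(end for _, end in edge_table)       # edges arriving at a vertex
    return {v: out_deg[v] + in_deg[v] for v in set(out_deg) | set(in_deg)}


# Hypothetical edge table: vertex 0 has two outgoing edges and one incoming edge.
degrees = vertex_degrees([(0, 1), (0, 2), (3, 0)])
```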
  • the generated files include:
  • the second step is the calculation of the BFS path length, that is, performing a BFS operation on the specified logical KEY, and finally returning the first N (eg, the first 100) shortest path logical KEYs.
  • customers may fill in the same personal information through multiple systems. An ID number, for example, may be filled in when registering basic information with the e-commerce platform or during real-name verification when binding a card for payment, so the same vertex can come from different sources.
  • the credibility of the information is different when the source is different.
  • an ID number sourced from the payment real-name table is more credible than one sourced from the e-commerce basic information table.
  • the edges from the logical KEY to other types of vertices are given different weights according to their sources, that is, edge weights. The smaller the value, the more reliable the data source; the larger the value, the less reliable it is.
  • Path length = edge weight + vertex type weight + vertex degree.
  • Dijkstra's single-source shortest path algorithm is used to calculate the shortest path from the specified KEY to each other KEY associated with it, with path lengths calculated according to the formula above.
  • the Dijkstra algorithm is the shortest path algorithm from one vertex to the other vertices, which solves the shortest path problem in a directed graph.
  • the main feature of Dijkstra's algorithm is to expand to the outer layer with the starting point as the center until it reaches the end point.
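  • A minimal sketch of the shortest-path search is given below. It is not the modified BFS of the disclosure: it is a plain Dijkstra search in which the cost of moving onto a vertex is assumed to be the edge weight plus that vertex's type weight plus its degree, which is one reading of the path length formula above. The graph layout, node names, and weights are hypothetical.

```python
import heapq


def nearest_logical_keys(adj, node_type, type_weight, degree, source, top_n):
    """Return up to top_n (path_length, logical_key) pairs closest to source.

    adj maps a node to a list of (neighbor, edge_weight) pairs.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, edge_w in adj.get(u, []):
            # Assumed step cost: edge weight + vertex type weight + vertex degree.
            nd = d + edge_w + type_weight[node_type[v]] + degree[v]
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    others = [(d, n) for n, d in dist.items()
              if node_type[n] == "logical_key" and n != source]
    return sorted(others)[:top_n]


# Hypothetical graph: K1 shares a phone with K2 and a device with K3.
adj = {
    "K1": [("phoneA", 1.0), ("devB", 2.0)],
    "phoneA": [("K1", 1.0), ("K2", 1.0)],
    "devB": [("K1", 2.0), ("K3", 2.0)],
    "K2": [("phoneA", 1.0)],
    "K3": [("devB", 2.0)],
}
node_type = {"K1": "logical_key", "K2": "logical_key", "K3": "logical_key",
             "phoneA": "phone", "devB": "device"}
type_weight = {"logical_key": 0.0, "phone": 1.0, "device": 3.0}
degree = {"K1": 2, "phoneA": 2, "devB": 2, "K2": 1, "K3": 1}

ranked = nearest_logical_keys(adj, node_type, type_weight, degree, "K1", 2)
# ranked == [(6.0, "K2"), (10.0, "K3")]
```

    Because every term of the step cost is non-negative, Dijkstra's algorithm applies; K2 ranks ahead of K3 because the shared phone carries lower weights than the shared device in this example.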
  • Table 3 shows the logic of the modified BFS used in this embodiment as follows:
  • each node ID is mapped back to the original KEY, and the final output format is:
  • the path length is a closeness value of different logical KEYs.
  • the specified information (such as a phone number) corresponding to the N logical keys with the closest relationship is selected, so as to obtain new possible contact information of the borrower.
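  • The final selection step can be sketched as follows: given logical KEYs ranked by path length, collect the values of the specified field that are not already known for the borrower. The function name, the row layout, and the de-duplication rule are assumptions for illustration.

```python
def repair_contact(ranked_keys, rows_by_logical_key, field="phone", known=()):
    """Return candidate values as (value, logical_key, path_length) tuples,
    ordered from the closest logical KEY to the farthest."""
    candidates = []
    seen = set(known)  # values already on file are not repair results
    for path_length, logical_key in ranked_keys:
        for row in rows_by_logical_key.get(logical_key, []):
            value = row.get(field)
            if value and value not in seen:
                seen.add(value)
                candidates.append((value, logical_key, path_length))
    return candidates


# Hypothetical data: the borrower's own number "p1" is already known.
ranked_keys = [(6.0, "K2"), (10.0, "K3")]
rows_by_logical_key = {
    "K2": [{"phone": "p2"}, {"phone": "p1"}],
    "K3": [{"phone": "p2"}, {"phone": None}],
}
result = repair_contact(ranked_keys, rows_by_logical_key, known=("p1",))
```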
  • the information restoration method constructs a relational network by constructing user data of the e-commerce platform, and performs calculations such as fusion and search to achieve the restoration of some user information.
  • the lost contact information can be repaired to get more contact information of the lost contact borrower.
  • the reliability of the contact information of the lost borrower can be improved.
  • FIG. 9 shows a schematic diagram of an information repair apparatus provided in another embodiment of the present disclosure.
  • the information repair apparatus 900 includes a data acquisition module 910, a network construction module 920, a fusion module 930, and a repair module 940.
  • the data acquisition module 910 is configured to acquire user data through an e-commerce platform and integrate the user data to obtain a data integration table;
  • the network construction module 920 is configured to construct a relationship network according to the data integration table;
  • the fusion module 930 is configured to perform multi-account fusion on the data integration table, and generate a fused relationship network in combination with the relationship network;
  • a repair module 940 is configured to perform repair of specified information according to the fused relationship network.
  • the information repair apparatus provided by the embodiments of the present disclosure, on the one hand, separately constructs a nested data structure for data of different granularity levels, so that there is no need to cache wait when data is stored in the database, regardless of the granularity of the data obtained Real-time warehousing can be performed to improve data query performance and simplify the multi-data warehousing process.
  • the coarse-grained granularity with the highest granularity level is used as the index for statistics, there is no need to remove the weight.
  • the indicators based on the coarse-grained statistical value can be directly summed.
  • the present disclosure also provides an electronic device including a processor and a memory.
  • the memory stores operation instructions for the processor to control the following method: acquiring user data through an e-commerce platform and integrating the user data to obtain a data integration table; constructing a relationship network according to the data integration table; performing multi-account fusion on the data integration table and combining the relationship network to generate a fused relationship network; and repairing specified information according to the fused relationship network.
  • FIG. 10 illustrates a schematic structural diagram of a computer system 1000 suitable for implementing an electronic device according to an embodiment of the present application.
  • the electronic device shown in FIG. 10 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
  • the computer system 1000 includes a central processing unit (CPU) 1001, which can perform various appropriate actions and processes based on a program stored in a read-only memory (ROM) 1002 or a program loaded from a storage portion 1008 into a random access memory (RAM) 1003.
  • in the RAM 1003, various programs and data required for the operation of the system 1000 are also stored.
  • the CPU 1001, the ROM 1002, and the RAM 1003 are connected to each other through a bus 1004.
  • An input / output (I / O) interface 1005 is also connected to the bus 1004.
  • the following components are connected to the I / O interface 1005: an input portion 1006 including a keyboard, a mouse, etc.; an output portion 1007 including a cathode ray tube (CRT), a liquid crystal display (LCD), etc., and a speaker; a storage portion 1008 including a hard disk, etc.; and a communication section 1009 including a network interface card such as a LAN card, a modem, and the like. The communication section 1009 performs communication processing via a network such as the Internet.
  • the drive 1010 is also connected to the I / O interface 1005 as needed.
  • a removable medium 1011 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is installed on the drive 1010 as needed, so that a computer program read therefrom is installed into the storage section 1008 as needed.
  • the process described above with reference to the flowchart may be implemented as a computer software program.
  • embodiments of the present disclosure include a computer program product including a computer program carried on a computer-readable medium, the computer program containing program code for performing a method shown in a flowchart.
  • the computer program may be downloaded and installed from a network through the communication section 1009, and / or installed from a removable medium 1011.
  • this computer program is executed by a central processing unit (CPU) 1001
  • the computer-readable medium shown in this application may be a computer-readable signal medium or a computer-readable storage medium or any combination of the foregoing.
  • the computer-readable medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable media may include, but are not limited to: electrical connections with one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable Read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the foregoing.
  • a computer-readable medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal that is included in baseband or propagated as part of a carrier wave, and which carries computer-readable program code. This propagated data signal may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the computer-readable signal medium can also be any computer-readable medium other than a computer-readable medium, which can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • each block in the flowchart or block diagram may represent a module, a program segment, or a part of code, which contains one or more executable instructions for implementing the specified logical function.
  • the functions marked in the blocks may also occur in a different order than that marked in the drawings. For example, two successively represented blocks may actually be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved.
  • each block in the block diagram or flowchart, and combinations of blocks in the block diagram or flowchart, can be implemented with a dedicated hardware-based system that performs the specified function or operation, or can be implemented with a combination of dedicated hardware and computer instructions.
  • the units described in the embodiments of the present application may be implemented by software or hardware.
  • the described unit may also be provided in a processor, for example, it may be described as: a processor includes a sending unit, an obtaining unit, a determining unit, and a first processing unit.
  • the name of these units does not constitute a limitation on the unit itself in some cases.
  • the sending unit can also be described as a "unit that sends a picture acquisition request to a connected server".
  • the present disclosure also provides a computer-readable medium, which may be included in the device described in the above embodiments; or may exist alone without being assembled into the device.
  • the computer-readable medium carries one or more programs.
  • the device includes the following method steps:

Landscapes

  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Engineering & Computer Science (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

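An information repair method and apparatus, an electronic device, and a computer-readable medium, belonging to the field of Internet technology. The method comprises: acquiring user data through an e-commerce platform and integrating the user data to obtain a data integration table (S110); constructing a relationship network according to the data integration table (S120); performing multi-account fusion on the data integration table and generating a fused relationship network in combination with the relationship network (S130); and repairing specified information according to the fused relationship network (S140). By building a relationship network from the user data of the e-commerce platform and performing fusion, search, and other computations on it, the method repairs certain user information, in particular specified information through which the user can no longer be reached, so that contact details can be obtained for more borrowers who have lost contact. Moreover, because the method is based on shopping data that actually occurred on the e-commerce platform, the reliability of the contact details obtained for such borrowers can be improved.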

Description

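Information repair method and apparatus, electronic device, and computer-readable medium
The present disclosure claims priority to Chinese invention patent application No. 201810804729.2, filed on July 20, 2018 and entitled "Information repair method and apparatus, electronic device, and computer-readable medium".
Technical Field
The present disclosure relates generally to the field of Internet technology, and in particular to an information repair method and apparatus, an electronic device, and a computer-readable medium.
Background
With the development of Internet finance, users apply for loans by filling in information online, and technologies such as big data and machine learning are used for automated credit decisions, anti-fraud, loan collection, personalized marketing, and the like, which greatly improves business efficiency and reduces costs.
After an Internet credit customer becomes overdue, the loan recovery rate depends mainly on post-loan collection. At present, collectors mainly contact the borrower by telephone, using the personal phone number and the frequent contacts' phone numbers that the borrower provided when applying for the loan. However, if the borrower has no intention of repaying, the numbers provided are rarely valid, which hampers telephone collection.
Therefore, the technical solutions in the prior art still leave room for improvement.
The above information disclosed in this Background section is only intended to enhance understanding of the background of the present disclosure, and it may therefore include information that does not constitute prior art known to a person of ordinary skill in the art.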
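Summary
The present disclosure provides an information repair method and apparatus, an electronic device, and a computer-readable medium to solve the above technical problem.
Other features and advantages of the present disclosure will become apparent from the following detailed description, or will be learned in part through practice of the present disclosure.
According to one aspect of the present disclosure, an information repair method is provided, comprising:
acquiring user data through an e-commerce platform, and integrating the user data to obtain a data integration table;
constructing a relationship network according to the data integration table;
performing multi-account fusion on the data integration table, and generating a fused relationship network in combination with the relationship network;
repairing specified information according to the fused relationship network.
In an embodiment of the present disclosure, the data source of the user data is at least one of an e-commerce login account, an ID card number, a bank card number, a mobile phone number, a wallet payment account, a financial management account, and a mobile device number.
In an embodiment of the present disclosure, integrating the user data to obtain the data integration table comprises:
setting a data source priority for each of the different data sources of the user data;
when different data sources contain the same field, selecting the field value from the data source with the higher priority according to the data source priority; when different data sources do not contain the same field, obtaining the field value corresponding to the field from the respective data source;
forming the data integration table from the fields and the corresponding field values.
In an embodiment of the present disclosure, constructing the relationship network according to the data integration table comprises:
taking the data sources of the user data as nodes;
obtaining an edge between two nodes according to a direct association or an indirect association between the nodes;
forming the relationship network from the nodes and the edges.
In an embodiment of the present disclosure, performing multi-account fusion on the data integration table and generating the fused relationship network in combination with the relationship network comprises:
deleting, from the data integration table, the data sources whose data source priority is lower than or equal to a preset value;
forming an edge table and a correspondence between node numbers and original KEYs from the data integration table after deletion;
computing a correspondence between node numbers and logical KEYs from the edge table and the number of nodes using a connected-component algorithm;
computing a correspondence between original KEYs and logical KEYs from the correspondence between node numbers and original KEYs and the correspondence between node numbers and logical KEYs;
generating the fused relationship network according to the correspondence between original KEYs and logical KEYs;
wherein the original KEY is the unique identification number of the user data, and the logical KEY is used to mark the unique logical subject to which multiple original KEYs correspond after multi-account fusion.
In an embodiment of the present disclosure, forming the edge table and the correspondence between node numbers and original KEYs from the data integration table after deletion comprises:
numbering the nodes in the data integration table after deletion consecutively;
combining the nodes of the same original KEY in pairs to obtain the edge table, the edge table including a start node number and an end node number;
constructing the correspondence between node numbers and original KEYs.
In an embodiment of the present disclosure, repairing the specified information according to the fused relationship network comprises:
performing, in a plurality of the fused relationship networks, a breadth-first search starting from a specified logical KEY to find other related logical KEYs, and finding the top N shortest-path logical KEYs;
obtaining a repair result for the specified information according to the field values of a specified field in the top N shortest-path logical KEYs.
In an embodiment of the present disclosure, performing the breadth-first search starting from the specified logical KEY to find other related logical KEYs and finding the top N shortest-path logical KEYs comprises:
generating a node table from the data integration table after deletion;
obtaining edge weights, vertex type weights, and vertex degrees from the node table and the edge table;
traversing from the specified logical KEY with a breadth-first search algorithm and performing a weighted computation to obtain a plurality of path lengths, wherein path length = edge weight + vertex type weight + vertex degree;
obtaining the top N shortest-path logical KEYs from the N smallest values among the plurality of path lengths.
According to another aspect of the present disclosure, an information repair apparatus is provided, comprising:
a data acquisition module configured to acquire user data through an e-commerce platform and integrate the user data to obtain a data integration table;
a network construction module configured to construct a relationship network according to the data integration table;
a fusion module configured to perform multi-account fusion on the data integration table and generate a fused relationship network in combination with the relationship network;
a repair module configured to repair specified information according to the fused relationship network.
According to yet another aspect of the present disclosure, an electronic device is provided, comprising a processor and a memory storing instructions for the processor to control the method steps described above.
According to a further aspect of the present disclosure, a computer-readable medium is provided, having computer-executable instructions stored thereon which, when executed by a processor, implement the method steps described above.
According to the information repair method and apparatus, electronic device, and computer-readable medium provided by the embodiments of the present disclosure, on the one hand, a relationship network is built from the user data of the e-commerce platform, and fusion, search, and other computations are performed on it to repair certain user information, in particular specified information through which the user can no longer be reached, so that contact details can be obtained for more borrowers who have lost contact. On the other hand, because the method is based on shopping data that actually occurred on the e-commerce platform, the reliability of the contact details obtained for such borrowers can be improved.
It should be understood that the above general description and the following detailed description are merely exemplary and do not limit the present disclosure.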
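Brief Description of the Drawings
The above and other objects, features, and advantages of the present disclosure will become more apparent from the detailed description of its example embodiments with reference to the accompanying drawings.
FIG. 1 shows a flowchart of an information repair method provided in an embodiment of the present disclosure.
FIG. 2 shows a flowchart of step S110 in FIG. 1 in an embodiment of the present disclosure.
FIG. 3 shows a flowchart of step S120 in FIG. 1 in an embodiment of the present disclosure.
FIG. 4 shows a schematic diagram of a relationship network in an embodiment of the present disclosure.
FIG. 5 shows a flowchart of step S130 in FIG. 1 in an embodiment of the present disclosure.
FIG. 6 shows a flowchart of step S520 in FIG. 5 in an embodiment of the present disclosure.
FIG. 7 shows a schematic diagram of a fused relationship network in an embodiment of the present disclosure.
FIG. 8 shows a flowchart of step S140 in FIG. 1 in an embodiment of the present disclosure.
FIG. 9 shows a schematic diagram of an information repair apparatus provided in another embodiment of the present disclosure.
FIG. 10 shows a schematic structural diagram of an electronic device suitable for implementing embodiments of the present application, provided in an embodiment of the present disclosure.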
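Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the example embodiments can be implemented in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that the present disclosure will be thorough and complete and will fully convey the concept of the example embodiments to those skilled in the art. The drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and repeated description of them is therefore omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, many specific details are provided to give a full understanding of the embodiments of the present disclosure. However, those skilled in the art will appreciate that the technical solutions of the present disclosure may be practiced while omitting one or more of the specific details, or other methods, components, apparatuses, steps, and the like may be adopted. In other cases, well-known structures, methods, apparatuses, implementations, materials, or operations are not shown or described in detail, to avoid obscuring aspects of the present disclosure.
Some of the block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different network and/or processor devices and/or microcontroller devices.
To make the objects, technical solutions, and advantages of the present disclosure clearer, the present disclosure is described in further detail below with reference to specific embodiments and the accompanying drawings.
In related embodiments of the present disclosure, according to relevant statistics, in the domestic personal credit market more than 30% of newly applying customers lose contact, and after entering the non-performing stage (more than 30 days overdue) the proportion is as high as 70%. Telephone collection relies on the personal phone number and the associated persons' phone numbers provided when the loan was applied for. Because a large share of the contact details filled in by the owing customer (i.e., the borrower) no longer reach the customer, or the frequent contacts the customer filled in are of little use, post-loan collection is impaired and more non-performing assets result.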
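FIG. 1 shows a flowchart of an information repair method provided in an embodiment of the present disclosure, comprising the following steps:
As shown in FIG. 1, in step S110, user data is acquired through an e-commerce platform and integrated to obtain a data integration table.
As shown in FIG. 1, in step S120, a relationship network is constructed according to the data integration table.
As shown in FIG. 1, in step S130, multi-account fusion is performed on the data integration table, and a fused relationship network is generated in combination with the relationship network.
As shown in FIG. 1, in step S140, specified information is repaired according to the fused relationship network.
In the information repair method provided by the embodiments of the present disclosure, on the one hand, a relationship network is built from the user data of the e-commerce platform, and fusion, search, and other computations are performed on it to repair certain user information, in particular specified information through which the user can no longer be reached, so that contact details can be obtained for more borrowers who have lost contact. On the other hand, because the method is based on shopping data that actually occurred on the e-commerce platform, the reliability of the contact details obtained for such borrowers can be improved.
The information repair method provided by the present disclosure is described in detail below with reference to the flowchart shown in FIG. 1:
In step S110, user data is acquired through an e-commerce platform and integrated to obtain a data integration table.
User data is acquired through an e-commerce platform because, in e-commerce, a user leaves many kinds of information while completing a purchase: registering an account, browsing products, placing an order, and completing payment all leave behind a large amount of data such as phone numbers, devices, card numbers, and accounts.
In an embodiment of the present disclosure, the data source of the user data acquired through the e-commerce platform is at least one of an e-commerce login account, an ID card number, a bank card number, a mobile phone number, a wallet payment account, a financial management account, and a mobile device number. In practice more user data, such as address information, can be obtained from the e-commerce platform; the present disclosure uses the repair of contact details as the example of specified information.
In an embodiment of the present disclosure, FIG. 2 shows a flowchart of integrating the user data to obtain the data integration table in step S110, comprising the following steps:
As shown in FIG. 2, in step S210, a data source priority is set for each of the different data sources of the user data.
As shown in FIG. 2, in step S220, the fields and the corresponding field values are obtained based on the data sources and the data source priorities.
Specifically: when different data sources contain the same field, the field value from the data source with the higher priority is selected according to the data source priority; when different data sources do not contain the same field, the field value corresponding to the field is obtained from the respective data source.
As shown in FIG. 2, in step S230, the data integration table is formed from the fields and the corresponding field values.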
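The selection rule of steps S210 to S230 can be illustrated with a minimal sketch. It is not taken from the disclosure: the record layout, the source names, and the convention that a larger number means a higher priority are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: (source name, source priority, {field: value}).
# Assumption: a larger priority number means a more trusted source.
SourceRecord = Tuple[str, int, Dict[str, str]]


def integrate(records: List[SourceRecord]) -> Dict[str, str]:
    """Merge the records of one user into a single row of the integration table.

    A field present in several sources takes the value from the source with the
    highest priority; a field present in one source only is copied as it is.
    """
    merged: Dict[str, str] = {}
    best_priority: Dict[str, int] = {}
    for _source, priority, fields in records:
        for field, value in fields.items():
            if field not in merged or priority > best_priority[field]:
                merged[field] = value
                best_priority[field] = priority
    return merged


records = [
    ("ecommerce_profile", 3, {"id_number": "ID-A", "phone": "P-1"}),
    ("payment_real_name", 8, {"id_number": "ID-B"}),
]
row = integrate(records)
print(row)
```

Here the ID number comes from the hypothetical real-name payment source because its priority is higher, while the phone number, which only one source provides, is copied unchanged.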
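Table 1 is the data integration table and shows the user data from the above data sources:
Idlist table field    Field type    Field description
us_*_key              Bigint        Original KEY
us_*_p                String        E-commerce login account
us_*_n                String        User name
id_*_n                String        ID card number
wa_*_t                String        Payment wallet account
cu_*_ao               String        Financial management account
mo_*_e                String        Mobile phone number
ba_*_ao               String        Bank card number
de_*_id               String        Mobile device ID
da_*_a                String        Data source
sr_*_ue               Bigint        Data source priority
Table 1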
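Table 1 lists the fields, field types, and field descriptions of the user data in the Idlist table. The present disclosure centers on the e-commerce login account and integrates data from different sources, such as user devices, ID card numbers, card numbers, and phone numbers. When a user first visits an e-commerce site, the user usually registers a site account (the e-commerce login account), logs in, searches and browses products, and finally fills in a delivery address, recipient, and recipient phone number to place an order. A real person may have several e-commerce login accounts and may use them to order goods for themselves or for relatives and friends, which leaves behind the phone numbers, addresses, and names of those recipients. If the purchase is made on a mobile device, the device identifier is also recorded (for example the IMEI, which uniquely identifies a mobile device). Several accounts may also log in on the same device. If online quick payment is chosen at checkout, the user's real-name information (ID card, phone number) and card information (credit card number, debit card number, issuing bank) are bound as well. These data, produced by different systems (the payment real-name system, the e-commerce basic information system, and so on), accumulate into a large body of comprehensive user data covering e-commerce login accounts, mobile devices, phone numbers, ID card numbers, delivery addresses, banks, and more.
As described above, personal information fields such as ID card number and phone number come from several systems (for example payment real-name and e-commerce basic information). The "data source" field identifies the system a value came from, and different data sources have different credibility. An ID card number from the payment real-name system, for example, is more reliable than one from the e-commerce basic information system. The "data source priority" field quantifies the priority of the data source so that later steps can select and discard data according to it.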
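In step S120, a relationship network is constructed according to the data integration table.
In an embodiment of the present disclosure, FIG. 3 shows a flowchart of constructing the relationship network according to the data integration table in step S120, comprising the following steps:
As shown in FIG. 3, in step S310, the data sources of the user data are taken as nodes.
As shown in FIG. 3, in step S320, an edge between two nodes is obtained according to a direct association or an indirect association between the nodes.
As shown in FIG. 3, in step S330, the relationship network is formed from the nodes and the edges.
FIG. 4 shows a schematic diagram of the relationship network formed from Table 1. The nodes of the relationship network in FIG. 4 come mainly from the various accounts in Table 1: e-commerce login account 41, ID card number 42, bank card number 43, mobile phone number 44, wallet payment account 45, financial management account 46, and mobile device number 47. Nodes in the network also have attributes, mainly source, time, and the like. For example, the attributes of e-commerce login account 41 include user level, registration time, time of the most recent order, and most recent purchase amount. The attributes of ID card number 42 include type (passport / driving licence / ID card), province, data source, and data source priority. The attributes of bank card number 43 include type (debit / credit), bank, data source, and data source priority. The attributes of mobile phone number 44 include phone number, time of last use, data source, and data source priority. The attributes of wallet payment account 45 include wallet payment account and registration time. The attributes of financial management account 46 include financial management account and registration time. The attributes of mobile device number 47 include e-commerce login account, IMEI, data source, and data source priority.
The edges of the relationship network in FIG. 4 have two origins. First, the accounts that appear in the same row of Table 1 (known binding and usage relationships) are all connected by an edge. Second, implicit relationships are derived from known relationships by rules; for example, rows that share the same ID card number are associated with one another. Edge attributes mainly include real-name verification, transaction, registration, binding, and the like. Taking the edges of e-commerce login account 41 as an example: its edge to ID card number 42 has the attribute real-name verification, its edge to bank card number 43 has the attribute card binding/transaction, its edge to mobile phone number 44 has the attribute transaction, its edge to wallet payment account 45 has the attribute registration, its edge to financial management account 46 has the attribute registration, and its edge to mobile device number 47 has the attribute transaction. The attributes of the other edges are shown in FIG. 4 and are not repeated here.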
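The two origins of edges described above can be sketched as follows. This is an illustrative sketch only: the field names, the edge labels `same_row` and `implied_same_id`, and the choice to link login accounts that share an ID number are assumptions, not details given in the disclosure.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical field names standing in for the account columns of Table 1.
ACCOUNT_FIELDS = ["login", "id_number", "bank_card", "phone", "wallet", "finance", "device"]


def build_network(rows):
    """Return (nodes, edges) for a list of integration-table rows (dicts).

    Nodes are (field, value) pairs. Edges are (node_a, node_b, origin) triples:
    'same_row' for accounts in one row, 'implied_same_id' for login accounts
    of different rows that share an ID number (one possible derivation rule).
    """
    nodes = set()
    edges = set()
    rows_by_id = defaultdict(list)
    for row in rows:
        row_nodes = sorted((f, row[f]) for f in ACCOUNT_FIELDS if row.get(f))
        nodes.update(row_nodes)
        for a, b in combinations(row_nodes, 2):
            edges.add((a, b, "same_row"))
        if row.get("id_number"):
            rows_by_id[row["id_number"]].append(row)
    for group in rows_by_id.values():
        logins = sorted({("login", r["login"]) for r in group if r.get("login")})
        for a, b in combinations(logins, 2):
            edges.add((a, b, "implied_same_id"))
    return nodes, edges


rows = [
    {"login": "u1", "id_number": "ID-A", "phone": "P-1"},
    {"login": "u2", "id_number": "ID-A", "phone": "P-2"},
]
nodes, edges = build_network(rows)
print(len(nodes), len(edges))
```

In this example the two rows share an ID number, so besides the six same-row edges the sketch adds one implied edge between the two login accounts.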
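In step S130, multi-account fusion is performed on the data integration table, and a fused relationship network is generated in combination with the relationship network.
In the relationship network, several e-commerce login accounts may in fact belong to the same "logical person". These accounts need to be fused so that a larger and more accurate relationship network can be used to improve the repair of lost contacts. In step S130, a connected-component algorithm with filtering therefore associates the various nodes identified as the same "logical person" (phone, ID card, e-commerce login account, and so on) with one logical KEY, which maps every account to a logical KEY.
In an embodiment of the present disclosure, FIG. 5 shows a flowchart of performing multi-account fusion on the data integration table and generating the fused relationship network in combination with the relationship network in step S130, comprising the following steps:
As shown in FIG. 5, in step S510, the data sources in the data integration table whose data source priority is lower than or equal to a preset value are deleted.
As shown in FIG. 5, in step S520, an edge table and a correspondence between node numbers and original KEYs are formed from the data integration table after deletion.
As shown in FIG. 5, in step S530, a correspondence between node numbers and logical KEYs is computed from the edge table and the number of nodes using a connected-component algorithm.
As shown in FIG. 5, in step S540, a correspondence between original KEYs and logical KEYs is computed from the correspondence between node numbers and original KEYs and the correspondence between node numbers and logical KEYs. The original KEY is the unique identification number of the user data, and the logical KEY is used to mark the unique logical subject to which multiple original KEYs correspond after multi-account fusion.
As shown in Table 1, the original KEY (field us_*_key) is the unique ID of the original data, and it is guaranteed not to change each time the data table is updated.
As shown in FIG. 5, in step S550, the fused relationship network is generated according to the correspondence between original KEYs and logical KEYs.
In an embodiment of the present disclosure, FIG. 6 shows that forming the edge table and the correspondence between node numbers and original KEYs from the data integration table after deletion in step S520 comprises:
As shown in FIG. 6, in step S610, the nodes in the data integration table after deletion are numbered consecutively.
As shown in FIG. 6, in step S620, the nodes of the same original KEY are combined in pairs to obtain the edge table, which includes a start node number and an end node number.
As shown in FIG. 6, in step S630, the correspondence between node numbers and original KEYs is constructed.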
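Based on the data integration table shown in Table 1, the fusion service outputs the correspondence between original KEYs and logical KEYs. The input is a text file in which each line holds the data associated with one unique original KEY, with the content shown in Table 1 and the fields separated by spaces. The output is also a text file in which each line is the mapping between an original KEY of the input and its logical KEY, in the format: original KEY[space]logical KEY. The output gives the logical KEY of every original KEY, where the smallest of all the original KEYs that belong to a logical KEY is taken as that logical KEY. The process comprises the following three steps:
Step 1: filter the data integration table shown in Table 1.
Depending on how strict the business requirement is (which controls how strict the repair result is), restricted sources are specified and the structure of the relationship network is updated. The filtering rules are as follows:
1) E-commerce login accounts match
2) Mobile phone numbers match, with the source restricted to specified sources
3) ID card numbers match, with the source restricted to specified sources
4) Financial management accounts match
5) Payment wallet accounts match
6) Bank card numbers match, with the source restricted to specified sources
For the data shown in Table 1, the phone number, ID card number, and bank card number nodes whose source priority is 6 or above are kept and the other nodes are deleted; any edge whose start or end point is not in the kept list is deleted directly.
In addition, because data such as phone numbers and ID card numbers differ greatly in format, all nodes are ID-encoded for ease of computation: the nodes are numbered consecutively starting from 0, giving each node a unique node number (node ID). Nodes with the same value in different rows of Table 1 receive the same node ID.
(1) The node IDs of the same original KEY are combined in pairs to obtain the edge table, in the format:
start node ID[space]end node ID.
(2) The correspondence between node IDs and original KEYs is kept for later lookup, in the format:
node ID[space]original KEY
For example, if the phone number node 13******* is assigned the new node ID 101 and its original KEY is 100001, the correspondence between the kept node and the original KEY is recorded as: 101 100001.
Step 2: compute the connected components of the graph.
The connected-component algorithm used is the standard algorithm. Its input is the edge table file and the number of nodes, and its output is the correspondence from node ID to connected-component ID. Because step 1 has already filtered out all useless edges and nodes, the nodes in the same connected component can be regarded as the same "logical subject" (i.e., "logical person"). The connected-component ID is the smallest node ID in the component.
The connected-component ID is taken as the logical KEY, and a data file is output in the format:
node ID[space]logical KEY.
Step 3: compute the correspondence from original KEY to logical KEY from the connected components.
Node IDs are first mapped to original KEYs by looking up the node ID -> KEY correspondence table. Finally, all original KEYs and their corresponding logical KEYs are output in the format:
original KEY[space]logical KEY.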
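The three steps above can be sketched end to end. The sketch below is a hedged illustration rather than the disclosed implementation: it uses a union-find structure as one standard way to compute connected components, assumes a hypothetical row layout, keeps restricted node types only when the row's source priority is at least 6, and reports the smallest original KEY of each component as the logical KEY.

```python
from itertools import combinations

# Hypothetical node types. RESTRICTED types are kept only from trusted sources.
RESTRICTED = {"phone", "id_number", "bank_card"}
ALWAYS_KEPT = {"login", "wallet", "finance"}


def fuse(rows, min_priority=6):
    """Map each original KEY to a logical KEY.

    rows: list of {"key": int, "priority": int, "nodes": {type: value}}.
    """
    node_ids = {}    # (type, value) -> consecutive node ID starting at 0
    node_keys = {}   # node ID -> original KEYs whose rows contain the node
    edges = []       # edge table: (start node ID, end node ID)
    for row in rows:
        ids = []
        for ntype, value in row["nodes"].items():
            if ntype in RESTRICTED and row["priority"] < min_priority:
                continue  # step 1: drop low-priority restricted nodes
            if ntype not in RESTRICTED and ntype not in ALWAYS_KEPT:
                continue  # e.g. user name or device ID, not used for fusion
            nid = node_ids.setdefault((ntype, value), len(node_ids))
            node_keys.setdefault(nid, set()).add(row["key"])
            ids.append(nid)
        edges.extend(combinations(ids, 2))  # pairwise edges within one KEY

    # Step 2: connected components via union-find; the root of a component
    # is always its smallest node ID.
    parent = list(range(len(node_ids)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # Step 3: logical KEY = smallest original KEY inside the component.
    smallest = {}
    for nid, keys in node_keys.items():
        root = find(nid)
        smallest[root] = min(min(keys), smallest.get(root, min(keys)))
    return {k: smallest[find(nid)] for nid, keys in node_keys.items() for k in keys}


rows = [
    {"key": 100001, "priority": 8, "nodes": {"login": "u1", "phone": "P-1"}},
    {"key": 100002, "priority": 8, "nodes": {"login": "u2", "phone": "P-1"}},
    {"key": 100003, "priority": 3, "nodes": {"login": "u3", "phone": "P-1"}},
]
mapping = fuse(rows)
lines = [f"{k} {v}" for k, v in sorted(mapping.items())]
print("\n".join(lines))
```

The first two rows share a phone number from a trusted source and are fused into one logical KEY; the third row's phone number comes from a low-priority source, is filtered out, and so that account stays on its own.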
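Finally, this result is merged with the original input, that is, the logical KEY is added to the data integration table. Because filtering has deleted some nodes (such as user name and mobile device ID), the fusion table shown in Table 2 is obtained:
Idlist table field    Field type    Field description
us_*_key              bigint        Original KEY
us_*_p                string        E-commerce login account
mo_*_e                String        Mobile phone number
id_*_n                string        ID card number
wa_*_t                String        Payment wallet account
cu_*_ao               String        Financial management account
ba_*_ao               String        Bank card number
da_*_a                string        Data source
sr_*_ue               bigint        Data source priority
logic_key             bigint        Logical KEY
Table 2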
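The schematic diagram of the fused relationship network shown in FIG. 7 is then obtained from the fusion table. As shown in FIG. 7, the network is centered on logical KEY 701 and further includes six nodes: mobile phone number 702, financial management account 703, ID card number 704, e-commerce login account 705, bank card number 706, and payment wallet account 707. Each of these nodes forms an edge with logical KEY 701, and the attribute of every edge is the source.
In step S140, specified information is repaired according to the fused relationship network.
In an embodiment of the present disclosure, this step first performs, in a plurality of the fused relationship networks, a breadth-first search starting from a specified logical KEY to find other related logical KEYs and finds the top N shortest-path logical KEYs, and then obtains a repair result for the specified information according to the field values of a specified field in the top N shortest-path logical KEYs.
Specifically, FIG. 8 shows a flowchart of repairing the specified information according to the fused relationship network in step S140, comprising the following steps:
As shown in FIG. 8, in step S810, a node table is generated from the data integration table after deletion.
As shown in FIG. 8, in step S820, edge weights, vertex type weights, and vertex degrees are obtained from the node table and the edge table.
As shown in FIG. 8, in step S830, a breadth-first search algorithm traverses from the specified logical KEY and performs a weighted computation to obtain a plurality of path lengths, where path length = edge weight + vertex type weight + vertex degree.
As shown in FIG. 8, in step S840, the top N shortest-path logical KEYs are obtained from the N smallest values among the plurality of path lengths.
As shown in FIG. 8, in step S850, the repair result for the specified information is obtained according to the field values of the specified field in the top N shortest-path logical KEYs.
This step relies mainly on the connections between people (shared phones, delivery addresses, and so on) to find the logical persons most closely related to the customer who has lost contact. These logical persons are likely to be the customer (i.e., the borrower) or real persons closely related to the customer, and their contact details are expected to make it possible to reach the customer.
In the relationship network, the different accounts that belong to one real person are ultimately tied together through the same "logical person". Each "logical person" has a unique KEY (the logical KEY). Centered on the logical KEY, the various accounts of the "logical person" (phone number, e-commerce login account, bank card number, and so on) extend outward, and logical persons are connected to one another through information nodes. When the repair algorithm runs, it always starts from a logical KEY and performs a BFS (Breadth-First Search) to find other related logical KEYs. During the search, edge weights, vertex type weights, vertex degrees, and the like are combined by weighting into a path length, and in the end only the top N shortest paths (for example, N may be 100) are kept.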
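In this embodiment, the path length computation is divided broadly into two steps:
Step 1: create the fused relationship network graph from Table 2.
The different kinds of data in Table 2 are mapped into the graph, producing a node table and an edge table. Each edge carries a source weight (the edge weight) and a vertex type weight, and the vertex degrees are computed at the same time. The vertex degree equals out-degree plus in-degree: in a directed graph, the number of edges leaving a vertex is called its out-degree, and the number of times a vertex is the end point of an edge is its in-degree.
Based on the above, the files produced include:
(1) a mapping table from the different vertex accounts (phone, address, PIN, and other types) to logical KEYs
(2) a mapping table from logical KEYs to the different vertex accounts (phone, address, PIN, and other types)
(3) an edge table (including attributes such as data source)
(4) statistics of the degree of each node.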
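As a small illustration of the degree statistics in item (4), the following sketch counts out-degree plus in-degree from an edge table of (start node ID, end node ID) pairs. The sample edge table is hypothetical.

```python
from collections import Counter


def vertex_degrees(edge_table):
    """Return {node ID: out-degree + in-degree} for a directed edge table."""
    out_degree = Counter(start for start, _end in edge_table)
    in_degree = Counter(end for _start, end in edge_table)
    return {v: out_degree[v] + in_degree[v] for v in set(out_degree) | set(in_degree)}


edge_table = [(0, 1), (2, 1), (1, 3)]
degrees = vertex_degrees(edge_table)
print(degrees)
```

Node 1 is the end point of two edges and the start point of one, so its degree is 3; a vertex shared by many accounts would show up here with a conspicuously high value.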
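Step 2: BFS path length computation, that is, a BFS is performed from the specified logical KEY and the top N (for example, the top 100) shortest-path logical KEYs are returned.
In real application scenarios, a customer (i.e., the borrower) may fill in personal information through several systems. An ID card number, for example, may be entered when registering e-commerce basic information or when binding a card with real-name verification for payment. The same vertex information therefore has different credibility depending on its source: an ID card number from the payment real-name table is more credible than one from the e-commerce basic information table. To measure this, the edges from a logical KEY to vertices of other types are given different weights according to their source. This is the edge weight: the smaller the value, the more reliable the data source, and the larger the value, the less reliable. Similarly, two original KEYs connected through an ID card number are connected more reliably than two connected through a device, so the different types of vertices (non-original-KEY vertices) are given different weights. This is the vertex type weight: the smaller the value, the closer the relationship between the connected KEYs.
Furthermore, the degrees of different vertices vary widely, and a vertex with a high degree may be a public vertex that lends little credibility to a relationship. The out-degree and in-degree of each vertex are therefore counted to obtain the vertex degree, that is, vertex degree = out-degree + in-degree.
Finally, when the path length between KEYs is computed, the three quantities are added together; the smaller the value, the closer the relationship. The formula is:
path length = edge weight + vertex type weight + vertex degree.
A parallel algorithm traverses the graph from different nodes using the BFS algorithm, and Dijkstra's single-source shortest-path algorithm is used to compute the shortest path from a KEY to every other KEY associated with it, using the path length formula above. Dijkstra's algorithm finds the shortest paths from one vertex to all other vertices and solves the shortest-path problem in directed graphs. Its main characteristic is that it expands outward layer by layer from the starting point until it reaches the end point.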
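The following sketch shows one way the path length formula could be combined with Dijkstra's algorithm. It is not the modified BFS of Table 3, which is available only as a figure; the graph layout, the weight values, and the choice to charge one hop as edge weight + vertex type weight + vertex degree of the shared vertex are assumptions made for illustration.

```python
import heapq


def shortest_paths(start, key_to_vertices, vertex_to_keys, type_weight, degree):
    """Dijkstra over logical KEYs that are linked through shared account vertices.

    key_to_vertices: {logical KEY: [(vertex, vertex type, edge weight), ...]}
    vertex_to_keys:  {vertex: [logical KEY, ...]}
    Assumed hop cost: edge weight + vertex type weight + vertex degree.
    Returns {other logical KEY: path length}.
    """
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, key = heapq.heappop(heap)
        if d > dist.get(key, float("inf")):
            continue  # stale heap entry
        for vertex, vtype, edge_weight in key_to_vertices.get(key, []):
            hop = edge_weight + type_weight[vtype] + degree[vertex]
            for other in vertex_to_keys.get(vertex, []):
                candidate = d + hop
                if other != key and candidate < dist.get(other, float("inf")):
                    dist[other] = candidate
                    heapq.heappush(heap, (candidate, other))
    dist.pop(start)
    return dist


# Hypothetical data: KEYs 1 and 2 share a phone, 1 and 3 share a device,
# 3 and 4 share another phone.
key_to_vertices = {
    1: [("P-1", "phone", 1), ("D-1", "device", 2)],
    2: [("P-1", "phone", 1)],
    3: [("D-1", "device", 2), ("P-2", "phone", 1)],
    4: [("P-2", "phone", 1)],
}
vertex_to_keys = {"P-1": [1, 2], "D-1": [1, 3], "P-2": [3, 4]}
type_weight = {"phone": 1, "device": 5}
degree = {"P-1": 2, "D-1": 2, "P-2": 2}

distances = shortest_paths(1, key_to_vertices, vertex_to_keys, type_weight, degree)
print(distances)
```

With these illustrative weights, the KEY reached through a shared phone ends up closer than the KEY reached through a shared device, which matches the intent that a smaller path length indicates a closer relationship.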
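Table 3 shows the logic of the modified BFS used in this embodiment:
Figure PCTCN2019095867-appb-000001
Table 3
The node IDs are then mapped back to original KEYs, and the final output format is:
logical KEY[space]logical KEY1:path length[space]logical KEY2:path length[space]logical KEY3:path length…
where the path length is the closeness value between different logical KEYs.
Finally, the returned path lengths are sorted and the specified information (such as phone numbers) of the N closest logical KEYs is selected, which yields new possible contact details for the borrower.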
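The final selection and the output line format can be sketched as follows. The function names, the tie-breaking by KEY, and the sample data are hypothetical; only the "logical KEY[space]logical KEY1:path length…" layout and the idea of keeping the N closest KEYs come from the text.

```python
import heapq


def nearest_keys(distances, n=100):
    """Return the n (logical KEY, path length) pairs with the smallest path length."""
    return heapq.nsmallest(n, distances.items(), key=lambda item: (item[1], item[0]))


def format_line(start_key, ranked):
    """Render 'logical KEY[space]logical KEY1:path length[space]...'."""
    return " ".join([str(start_key)] + [f"{key}:{length}" for key, length in ranked])


def repaired_values(ranked, field_by_key):
    """Collect the specified field (e.g. a phone number) of the closest KEYs."""
    return [field_by_key[key] for key, _length in ranked if key in field_by_key]


distances = {2: 4, 3: 9, 4: 13}            # hypothetical path lengths from KEY 1
phone_by_key = {2: "P-1", 4: "P-2"}        # hypothetical known phone numbers

ranked = nearest_keys(distances, n=2)
line = format_line(1, ranked)
candidates = repaired_values(ranked, phone_by_key)
print(line)
print(candidates)
```

Only KEYs that actually carry a value for the specified field contribute candidates, so the list of repaired contact details may be shorter than N.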
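Based on the above steps, by fusing the user data of the e-commerce platform, constructing the relationship network, and running graph algorithms, more potential contact details can be obtained for owing customers who have lost contact, which helps collectors reach the customer, carry out telephone collection, and resolve non-performing assets.
In summary, in the information repair method provided by the embodiments of the present disclosure, on the one hand, a relationship network is built from the user data of the e-commerce platform, and fusion, search, and other computations are performed on it to repair certain user information, in particular specified information through which the user can no longer be reached, so that contact details can be obtained for more borrowers who have lost contact. On the other hand, because the method is based on shopping data that actually occurred on the e-commerce platform, the reliability of the contact details obtained for such borrowers can be improved.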
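FIG. 9 shows a schematic diagram of an information repair apparatus provided in another embodiment of the present disclosure. As shown in FIG. 9, the information repair apparatus 900 includes a data acquisition module 910, a network construction module 920, a fusion module 930, and a repair module 940.
The data acquisition module 910 is configured to acquire user data through an e-commerce platform and integrate the user data to obtain a data integration table; the network construction module 920 is configured to construct a relationship network according to the data integration table; the fusion module 930 is configured to perform multi-account fusion on the data integration table and generate a fused relationship network in combination with the relationship network; the repair module 940 is configured to repair specified information according to the fused relationship network.
For the functions of the modules in the apparatus, refer to the related description in the method embodiments above; they are not repeated here.
In summary, in the information repair apparatus provided by the embodiments of the present disclosure, on the one hand, nested data structures are constructed separately for data of different granularity levels, so that no caching and waiting is needed when data is stored; data of any granularity can be stored in real time as it is obtained, which improves data query performance and simplifies the process of storing multiple kinds of data. On the other hand, because statistics are computed using the coarse granularity, which is the highest granularity level, as the index, no deduplication is needed, and indicators whose values are totalled at the coarse granularity can be summed directly without any double counting in the result.
In another aspect, the present disclosure further provides an electronic device including a processor and a memory, the memory storing operation instructions for the processor to control the following method: acquiring user data through an e-commerce platform and integrating the user data to obtain a data integration table; constructing a relationship network according to the data integration table; performing multi-account fusion on the data integration table and generating a fused relationship network in combination with the relationship network; and repairing specified information according to the fused relationship network.
Referring now to FIG. 10, a schematic structural diagram of a computer system 1000 of an electronic device suitable for implementing embodiments of the present application is shown. The electronic device shown in FIG. 10 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present application.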
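As shown in FIG. 10, the computer system 1000 includes a central processing unit (CPU) 1001, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 1002 or a program loaded from a storage portion 1008 into a random access memory (RAM) 1003. The RAM 1003 also stores various programs and data required for the operation of the system 1000. The CPU 1001, the ROM 1002, and the RAM 1003 are connected to one another through a bus 1004. An input/output (I/O) interface 1005 is also connected to the bus 1004.
The following components are connected to the I/O interface 1005: an input portion 1006 including a keyboard, a mouse, and the like; an output portion 1007 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, as well as a speaker; a storage portion 1008 including a hard disk and the like; and a communication portion 1009 including a network interface card such as a LAN card or a modem. The communication portion 1009 performs communication processing via a network such as the Internet. A drive 1010 is also connected to the I/O interface 1005 as needed. A removable medium 1011, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1010 as needed, so that a computer program read from it can be installed into the storage portion 1008 as needed.
In particular, according to embodiments of the present disclosure, the process described above with reference to the flowcharts may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 1009 and/or installed from the removable medium 1011. When the computer program is executed by the central processing unit (CPU) 1001, the above functions defined in the system of the present application are performed.
It should be noted that the computer-readable medium shown in the present application may be a computer-readable signal medium, a computer-readable medium, or any combination of the two. The computer-readable medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device. In the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. The computer-readable signal medium may also be any computer-readable medium other than a computer-readable medium, which can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wire, optical cable, RF, and the like, or any suitable combination of the above.
The flowcharts and block diagrams in the drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a part of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two blocks shown in succession may actually be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams or flowcharts, and combinations of blocks in the block diagrams or flowcharts, can be implemented with a dedicated hardware-based system that performs the specified functions or operations, or with a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor; for example, it may be described as: a processor includes a sending unit, an obtaining unit, a determining unit, and a first processing unit. In some cases the names of these units do not constitute a limitation on the units themselves; for example, the sending unit may also be described as "a unit that sends a picture acquisition request to a connected server".
In another aspect, the present disclosure further provides a computer-readable medium, which may be included in the device described in the above embodiments, or may exist alone without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by the device, cause the device to perform the following method steps:
acquiring user data through an e-commerce platform and integrating the user data to obtain a data integration table; constructing a relationship network according to the data integration table; performing multi-account fusion on the data integration table and generating a fused relationship network in combination with the relationship network; and repairing specified information according to the fused relationship network.
It should be clearly understood that the present disclosure describes how to form and use particular examples, but the principles of the present disclosure are not limited to any details of these examples. Rather, based on the teaching of the present disclosure, these principles can be applied to many other embodiments.
Exemplary embodiments of the present disclosure have been specifically shown and described above. It should be understood that the present disclosure is not limited to the detailed structures, arrangements, or implementation methods described herein; rather, the present disclosure is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.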

Claims (11)

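  1. An information repair method, comprising:
    acquiring user data through an e-commerce platform, and integrating the user data to obtain a data integration table;
    constructing a relationship network according to the data integration table;
    performing multi-account fusion on the data integration table, and generating a fused relationship network in combination with the relationship network;
    repairing specified information according to the fused relationship network.
  2. The information repair method according to claim 1, wherein the data source of the user data is at least one of an e-commerce login account, an ID card number, a bank card number, a mobile phone number, a wallet payment account, a financial management account, and a mobile device number.
  3. The information repair method according to claim 2, wherein integrating the user data to obtain the data integration table comprises:
    setting a data source priority for each of the different data sources of the user data;
    when different data sources contain the same field, selecting the field value from the data source with the higher priority according to the data source priority; when different data sources do not contain the same field, obtaining the field value corresponding to the field from the respective data source;
    forming the data integration table from the fields and the corresponding field values.
  4. The information repair method according to claim 2, wherein constructing the relationship network according to the data integration table comprises:
    taking the data sources of the user data as nodes;
    obtaining an edge between two nodes according to a direct association or an indirect association between the nodes;
    forming the relationship network from the nodes and the edges.
  5. The information repair method according to claim 4, wherein performing multi-account fusion on the data integration table and generating the fused relationship network in combination with the relationship network comprises:
    deleting, from the data integration table, the data sources whose data source priority is lower than or equal to a preset value;
    forming an edge table and a correspondence between node numbers and original KEYs from the data integration table after deletion;
    computing a correspondence between node numbers and logical KEYs from the edge table and the number of nodes using a connected-component algorithm;
    computing a correspondence between original KEYs and logical KEYs from the correspondence between node numbers and original KEYs and the correspondence between node numbers and logical KEYs;
    generating the fused relationship network according to the correspondence between original KEYs and logical KEYs;
    wherein the original KEY is the unique identification number of the user data, and the logical KEY is used to mark the unique logical subject to which multiple original KEYs correspond after multi-account fusion.
  6. The information repair method according to claim 5, wherein forming the edge table and the correspondence between node numbers and original KEYs from the data integration table after deletion comprises:
    numbering the nodes in the data integration table after deletion consecutively;
    combining the nodes of the same original KEY in pairs to obtain the edge table, the edge table including a start node number and an end node number;
    constructing the correspondence between node numbers and original KEYs.
  7. The information repair method according to claim 5, wherein repairing the specified information according to the fused relationship network comprises:
    performing, in a plurality of the fused relationship networks, a breadth-first search starting from a specified logical KEY to find other related logical KEYs, and finding the top N shortest-path logical KEYs;
    obtaining a repair result for the specified information according to the field values of a specified field in the top N shortest-path logical KEYs.
  8. The information repair method according to claim 7, wherein performing, in the plurality of fused relationship networks, the breadth-first search starting from the specified logical KEY to find other related logical KEYs and finding the top N shortest-path logical KEYs comprises:
    generating a node table from the data integration table after deletion;
    obtaining edge weights, vertex type weights, and vertex degrees from the node table and the edge table;
    traversing from the specified logical KEY with a breadth-first search algorithm and performing a weighted computation to obtain a plurality of path lengths, wherein path length = edge weight + vertex type weight + vertex degree;
    obtaining the top N shortest-path logical KEYs from the N smallest values among the plurality of path lengths.
  9. An information repair apparatus, comprising:
    a data acquisition module configured to acquire user data through an e-commerce platform and integrate the user data to obtain a data integration table;
    a network construction module configured to construct a relationship network according to the data integration table;
    a fusion module configured to perform multi-account fusion on the data integration table and generate a fused relationship network in combination with the relationship network;
    a repair module configured to repair specified information according to the fused relationship network.
  10. An electronic device, comprising:
    a processor;
    a memory storing instructions for the processor to control the method steps according to any one of claims 1-8.
  11. A computer-readable medium having computer-executable instructions stored thereon, wherein the executable instructions, when executed by a processor, implement the method steps according to any one of claims 1-8.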
PCT/CN2019/095867 2018-07-20 2019-07-12 信息修复方法、装置、电子设备及计算机可读介质 WO2020015594A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810804729.2 2018-07-20
CN201810804729.2A CN110738558B (zh) 2018-07-20 2018-07-20 信息修复方法、装置、电子设备及计算机可读介质

Publications (1)

Publication Number Publication Date
WO2020015594A1 true WO2020015594A1 (zh) 2020-01-23

Family

ID=69163589

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/095867 WO2020015594A1 (zh) 2018-07-20 2019-07-12 信息修复方法、装置、电子设备及计算机可读介质

Country Status (2)

Country Link
CN (1) CN110738558B (zh)
WO (1) WO2020015594A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112115367A (zh) * 2020-09-28 2020-12-22 北京百度网讯科技有限公司 基于融合关系网络的信息推荐方法、装置、设备和介质
CN112214648A (zh) * 2020-10-13 2021-01-12 合肥小龟快跑信息科技有限公司 一种根据采集点反馈异常信息实现爆管分析逻辑的方法

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111917779B (zh) * 2020-08-04 2022-10-21 北京金山云网络技术有限公司 基于目标账号的数据处理方法、装置、系统及服务端设备
CN112069231B (zh) * 2020-09-08 2024-05-17 京东科技控股股份有限公司 用户信息处理方法及装置、存储介质、电子设备
CN112817993B (zh) * 2021-01-30 2022-12-02 上海浦东发展银行股份有限公司 一种失联客户信息修复方法及其系统
CN113157704B (zh) * 2021-05-06 2023-07-25 成都卫士通信息产业股份有限公司 层级关系分析方法、装置、设备及计算机可读存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103246653A (zh) * 2012-02-03 2013-08-14 腾讯科技(深圳)有限公司 数据处理方法和装置
CN105956016A (zh) * 2016-04-21 2016-09-21 成都数联铭品科技有限公司 关联信息可视化处理系统
CN107193855A (zh) * 2016-12-30 2017-09-22 杭州博采网络科技股份有限公司 一种数据分析系统及方法
CN107291709A (zh) * 2016-03-30 2017-10-24 阿里巴巴集团控股有限公司 关系网络的构建方法及装置
US20180075104A1 (en) * 2016-09-15 2018-03-15 Oracle International Corporation Techniques for relationship discovery between datasets
CN107862047A (zh) * 2017-11-08 2018-03-30 爱财科技有限公司 基于多个数据源的自然人数据处理方法和系统

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2001231260A1 (en) * 2000-02-14 2001-08-27 Bpass, Inc. Accessing information for multiple financial accounts via the internet
TW550477B (en) * 2000-03-01 2003-09-01 Passgate Corp Method, system and computer readable medium for Web site account and e-commerce management from a central location
US7865414B2 (en) * 2000-03-01 2011-01-04 Passgate Corporation Method, system and computer readable medium for web site account and e-commerce management from a central location
US20120010994A1 (en) * 2010-07-08 2012-01-12 American Express Travel Related Services Company, Inc. Systems and methods for transaction account offerings
US8959113B2 (en) * 2011-03-30 2015-02-17 Open Text S.A. System, method and computer program product for managing tabulated metadata
US8595267B2 (en) * 2011-06-27 2013-11-26 Amazon Technologies, Inc. System and method for implementing a scalable data storage service
US20140074865A1 (en) * 2012-09-10 2014-03-13 Service Repair Solutions, Inc. Identifying vehicle systems using vehicle components
US20150161622A1 (en) * 2013-12-10 2015-06-11 Florian Hoffmann Fraud detection using network analysis
JP6736450B2 (ja) * 2016-10-25 2020-08-05 株式会社日立製作所 データ分析支援装置及びデータ分析支援システム
CN107909178B (zh) * 2017-08-31 2021-06-08 深圳壹账通智能科技有限公司 电子装置、失联修复率预测方法和计算机可读存储介质
CN108173847A (zh) * 2017-12-27 2018-06-15 百度在线网络技术(北京)有限公司 多账号用户追踪方法、装置、设备及计算机可读介质

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103246653A (zh) * 2012-02-03 2013-08-14 腾讯科技(深圳)有限公司 数据处理方法和装置
CN107291709A (zh) * 2016-03-30 2017-10-24 阿里巴巴集团控股有限公司 关系网络的构建方法及装置
CN105956016A (zh) * 2016-04-21 2016-09-21 成都数联铭品科技有限公司 关联信息可视化处理系统
US20180075104A1 (en) * 2016-09-15 2018-03-15 Oracle International Corporation Techniques for relationship discovery between datasets
CN107193855A (zh) * 2016-12-30 2017-09-22 杭州博采网络科技股份有限公司 一种数据分析系统及方法
CN107862047A (zh) * 2017-11-08 2018-03-30 爱财科技有限公司 基于多个数据源的自然人数据处理方法和系统

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112115367A (zh) * 2020-09-28 2020-12-22 北京百度网讯科技有限公司 基于融合关系网络的信息推荐方法、装置、设备和介质
CN112115367B (zh) * 2020-09-28 2024-04-02 北京百度网讯科技有限公司 基于融合关系网络的信息推荐方法、装置、设备和介质
CN112214648A (zh) * 2020-10-13 2021-01-12 合肥小龟快跑信息科技有限公司 一种根据采集点反馈异常信息实现爆管分析逻辑的方法

Also Published As

Publication number Publication date
CN110738558B (zh) 2024-03-05
CN110738558A (zh) 2020-01-31

Similar Documents

Publication Publication Date Title
WO2020015594A1 (zh) 信息修复方法、装置、电子设备及计算机可读介质
US11989789B2 (en) Systems and methods for locating merchant terminals based on transaction data
US20130339186A1 (en) Identifying Fraudulent Users Based on Relational Information
US20150169726A1 (en) Methods and systems for analyzing entity performance
WO2019196549A1 (zh) 确定高风险用户的方法及装置
US20160055501A1 (en) System and method for determining a cohort
US11698269B2 (en) Systems and methods for resolving points of interest on maps
WO2012034237A1 (en) Systems and methods for providing virtual currencies
US11257088B2 (en) Knowledge neighbourhoods for evaluating business events
US20140358742A1 (en) Systems And Methods For Mapping In-Store Transactions To Customer Profiles
CN108595579A (zh) 联系人亲密度估算方法、装置、计算机设备和存储介质
CN114140256A (zh) 数据处理方法、装置、设备、介质和程序产品
US11222026B1 (en) Platform for staging transactions
US10216830B2 (en) Multicomputer processing of client device request data using centralized event orchestrator and link discovery engine
US10296882B2 (en) Multicomputer processing of client device request data using centralized event orchestrator and link discovery engine
US20190332706A1 (en) Systems and Methods for Providing Data Structure Access
US12001800B2 (en) Semantic-aware feature engineering
US20240185284A1 (en) Confidence levels in management and determination of user identity using identity graphs
US20240185277A1 (en) Management and determination of user identity using identity graphs
US20240185275A1 (en) Customer data verification in management and determination of user identity using identity graphs
US20240185242A1 (en) Probabilistic matching of account information in management and determination of user identity using identity graphs
TWI786378B (zh) 基於家庭關係的家庭戶網絡管理方法及系統
US20240086577A1 (en) Pair-wise graph querying, merging, and computing for account linking
US20220148004A1 (en) Systems and methods for predicting on-file payment credentials
JP2023007389A (ja) ブロックチェーンに基づくサイト選択方法、装置、機器および記憶媒体

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19837503

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19837503

Country of ref document: EP

Kind code of ref document: A1