CN111652667A - Method for aligning entity data of main related natural persons of enterprise - Google Patents

Method for aligning entity data of main related natural persons of enterprise

Info

Publication number
CN111652667A
Authority
CN
China
Prior art keywords
data
investment
enterprises
enterprise
same
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911424586.3A
Other languages
Chinese (zh)
Inventor
吴桐
曾途
尹康
韩远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Business Big Data Technology Co Ltd
Original Assignee
Chengdu Business Big Data Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Business Big Data Technology Co Ltd
Priority to CN201911424586.3A
Publication of CN111652667A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0609 Buyer or seller confidence or verification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/06 Asset management; Financial planning or analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Theoretical Computer Science (AREA)
  • Development Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • Evolutionary Computation (AREA)
  • Human Resources & Organizations (AREA)
  • Game Theory and Decision Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Operations Research (AREA)
  • Medical Informatics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a method and a system for aligning entity data of the main related natural persons of an enterprise. The method and system use the enterprise investment relationship network to help judge whether natural persons with the same name in different enterprises are the same person. The model is trained with a machine learning method so that it can adapt to more complex scenarios. The method and system fully mine the relevance of the data and exploit the advantages of integrated data analysis. By combining graph computation with machine learning, they accurately judge, without involving private data, whether important same-name natural persons of different enterprises are the same natural person, providing more reliable data support for related data analysis and for the construction of knowledge graphs.

Description

Method for aligning entity data of main related natural persons of enterprise
Technical Field
The invention relates to the technical field of natural language processing, in particular to a method and a system for aligning entity data of main related natural persons of an enterprise.
Background
When information is extracted from multiple data sources for association analysis, unique identification of identity becomes very important. For example, when an association graph is drawn, if it cannot be determined that natural persons appearing in the information of different enterprises are the same person, the graphs cannot be merged and factual associations between the enterprises cannot be established. Conversely, if data are merged indiscriminately without confirming that two people with the same name are the same person, errors may be introduced into the constructed association network.
In recent years, same-name person recognition has been treated as a classification problem in machine learning: features are constructed from other information about the same-name persons across enterprises, and a classification algorithm judges whether two same-name persons are the same person. A shared investor or senior executive tightly links different enterprises. Accurately and comprehensively identifying whether same-name persons in different enterprises are the same person is therefore of great significance for enterprise credit assessment and for the analysis of risk propagation.
Compared with the identification of same-name natural persons in other applications, unique identification of natural persons in enterprise investment relationships is particularly important: enterprises, as the main carriers of modern social activity, account for a large share of employment and investment and influence social activity as a whole. At present, methods for identifying same-name persons across enterprises rely mainly on data such as recruitment data and identity card data. However, such data involve personal privacy, are difficult to obtain, and have limited coverage. In addition, enterprise investment data have their own characteristics, and current entity alignment algorithms make comparatively little use of the attributes of the investment relationship data themselves.
Disclosure of Invention
The invention aims to provide a method and a system for aligning entity data of the main related natural persons of an enterprise. The relevance of the data is fully mined and the advantages of integrated data analysis are exploited. By using the characteristics of investment association relationships and of enterprise data, and without relying on other data, the method judges more accurately whether important same-name natural persons of different enterprises are the same natural person, and greatly reduces the computational cost.
The method uses machine learning and model building to combine multiple association-network features in the judgment; compared with simple rule-based judgment, it is more stable and computationally more efficient.
In order to solve the above technical problems, embodiments of the present invention provide the following technical solutions:
A method for aligning entity data of the main related natural persons of an enterprise uses the distance of the investment relationship between enterprises to help judge whether natural persons with the same name in different enterprises are the same person. When the degree of the investment relationship between two enterprises is less than a set threshold, the same-name natural persons of the two enterprises are considered to be the same person. In this patent, "natural persons" refers to an enterprise's major shareholders, directors, supervisors, senior executives, and the like.
The method comprises the following implementation steps:
(1) acquiring enterprise data, wherein the enterprise data comprise investment relationships and main related natural persons;
(2) acquiring the enterprise data of enterprises that have natural persons with the same name;
(3) constructing an association network between two enterprises that have a natural person with the same name;
(4) calculating investment path data between the enterprises;
(5) constructing a judgment model with the inter-enterprise investment paths as features;
(6) training the model using labeled data;
(7) inputting the investment path feature vector of the two enterprises whose same-name persons are to be judged into the trained model, which outputs the probability that the same-name natural persons are the same person; when the probability is greater than a set threshold, they are judged to be the same person.
Further, the investment path data in the step (4) comprises:
(a) the shortest investment path between two enterprises;
(b) the number of investment paths between two enterprises;
(c) the number of the same-name natural people between two enterprises.
Further, the method includes a step of converting the investment path data into a vector.
Further, the investment path data vector in step (7) is X = (x1, x2, x3, …), where X is the investment path vector of the enterprises to be judged, x1 is the reciprocal of the shortest investment path length, x2 is the number of investment paths, and x3 is the number of same-name natural persons.
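As an illustration only, the vector described above can be sketched in Python as follows. The class and field names are hypothetical, and mapping a missing path to 0 is an assumption; the sample values (shortest path 4, 5 paths, 3 same-name persons) are taken from Example 1 of this description.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InvestmentPathData:
    """Path statistics for one pair of enterprises (field names are hypothetical)."""
    shortest_path_length: int  # shortest investment path length; 0 means no path was found
    path_count: int            # number of investment paths between the two enterprises
    same_name_count: int       # number of same-name natural persons in the two enterprises


def to_feature_vector(d: InvestmentPathData) -> List[float]:
    """Return X = (x1, x2, x3) as defined in the text."""
    # x1 is the reciprocal of the shortest path length; "no path" is mapped to 0 (assumption).
    x1 = 1.0 / d.shortest_path_length if d.shortest_path_length > 0 else 0.0
    x2 = float(d.path_count)
    x3 = float(d.same_name_count)
    return [x1, x2, x3]


features = to_feature_vector(InvestmentPathData(4, 5, 3))
print(features)  # [0.25, 5.0, 3.0]
```

Using the reciprocal for x1 means that a shorter path yields a larger feature value, so all three components grow as the association between the enterprises becomes closer.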
Further, the model formula is as follows:
Figure BDA0002350811790000031
where hθ(x) is the probability that same-name persons in different enterprises are the same person, θ is the model parameter vector to be trained, and x is the vectorized feature vector.
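The formula image is not reproduced in this text. As a minimal sketch, assuming the model takes the standard logistic (sigmoid) form h(x) = 1 / (1 + exp(-θ·x)), which outputs a value between 0 and 1, the prediction and the threshold decision could look as follows. The function names and the default threshold of 0.5 are illustrative assumptions, and an intercept is only present if x carries a constant 1 component.

```python
import math
from typing import Sequence


def predict_probability(theta: Sequence[float], x: Sequence[float]) -> float:
    """Assumed logistic form: 1 / (1 + exp(-theta . x)), computed in a numerically stable way."""
    if len(theta) != len(x):
        raise ValueError("theta and x must have the same length")
    z = sum(t * xi for t, xi in zip(theta, x))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow for large negative z
    return e / (1.0 + e)


def is_same_person(theta: Sequence[float], x: Sequence[float], threshold: float = 0.5) -> bool:
    """Judge 'same person' when the predicted probability exceeds the set threshold."""
    return predict_probability(theta, x) > threshold
```

For example, with all parameters equal to zero the predicted probability is exactly 0.5 for any input, so no pair would be judged to be the same person under a threshold of 0.5.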
Further, the formula of the loss function in the model training process is as follows:
Figure BDA0002350811790000032
where J(θ) is the loss function, m is the number of samples, y^(i) is the label value of the i-th sample, hθ(x^(i)) is the model's predicted value for the i-th sample, n is the number of model parameters, and λ is the regularization parameter.
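The loss formula image is likewise not reproduced here. The sketch below assumes the standard L2-regularized cross-entropy loss of logistic regression, which is consistent with the symbols listed above, and minimizes it by plain gradient descent. Regularizing every parameter (including the constant term), the learning rate, the number of iterations and the six training samples are all illustrative assumptions; the samples are made up and are not data from this patent.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(theta: Sequence[float], x: Sequence[float]) -> float:
    return sigmoid(sum(t * v for t, v in zip(theta, x)))


def loss(theta, X, y, lam: float) -> float:
    """Assumed form: mean cross-entropy plus lam / (2m) * sum(theta_j ** 2)."""
    m = len(X)
    data = 0.0
    for xi, yi in zip(X, y):
        h = min(max(predict(theta, xi), 1e-12), 1.0 - 1e-12)  # guard against log(0)
        data -= yi * math.log(h) + (1 - yi) * math.log(1.0 - h)
    return data / m + lam / (2 * m) * sum(t * t for t in theta)


def train(X, y, lam: float = 0.01, lr: float = 0.3, epochs: int = 3000) -> List[float]:
    """Batch gradient descent on the loss above, starting from theta = 0."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(epochs):
        grad = [lam / m * t for t in theta]      # gradient of the regularization term
        for xi, yi in zip(X, y):
            err = predict(theta, xi) - yi        # h(x) - y
            for j in range(n):
                grad[j] += err * xi[j] / m
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


# Made-up labeled samples: [1 (constant), x1, x2, x3]; label 1 means "same person".
X = [[1.0, 0.50, 3.0, 2.0], [1.0, 0.33, 2.0, 3.0], [1.0, 0.25, 5.0, 3.0],
     [1.0, 0.00, 0.0, 1.0], [1.0, 0.20, 1.0, 1.0], [1.0, 0.00, 1.0, 1.0]]
y = [1, 1, 1, 0, 0, 0]
theta = train(X, y)
predictions = [1 if predict(theta, xi) > 0.5 else 0 for xi in X]
```

In practice a library implementation of logistic regression would normally be used instead of a hand-written loop; the loop is shown only to make the roles of m, λ and θ explicit.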
Further, when the shortest investment path length between the enterprises is greater than 5, the shortest-investment-path component of the vector is set to 0.
Further, investment paths whose length is greater than 6 are not counted in the number of valid paths.
Further, the length of the shortest investment path between the two enterprises for which the association network is constructed in step (3) is less than 5.
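The three cutoffs above can be illustrated with a short sketch. The constant and function names are hypothetical, the thresholds are the values stated in the text, and how path length is measured (hops or enterprises on the path) is left to the caller.

```python
from typing import List, Sequence

MAX_SHORTEST_LENGTH = 5   # a shortest path longer than this gives x1 = 0
MAX_COUNTED_LENGTH = 6    # paths longer than this are not counted in x2
NETWORK_LENGTH_LIMIT = 5  # a network is only built when the shortest path is below this


def include_in_network(shortest_length: int) -> bool:
    """Step (3) cutoff: only closely connected enterprise pairs get an association network."""
    return shortest_length < NETWORK_LENGTH_LIMIT


def path_features(path_lengths: Sequence[int]) -> List[float]:
    """Return [x1, x2] from the lengths of all investment paths between two enterprises."""
    if any(n <= 0 for n in path_lengths):
        raise ValueError("path lengths must be positive")
    if not path_lengths:
        return [0.0, 0.0]
    shortest = min(path_lengths)
    x1 = 0.0 if shortest > MAX_SHORTEST_LENGTH else 1.0 / shortest
    x2 = float(sum(1 for n in path_lengths if n <= MAX_COUNTED_LENGTH))
    return [x1, x2]
```

For instance, a pair whose only paths have lengths 6 and 7 gets x1 = 0 and a path count of 1, because the length-7 path is ignored.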
Further, the invention provides a system for aligning entity data of the main related natural persons of an enterprise; the system comprises a data acquisition module, a data storage module, and a data processing module;
the data acquisition module acquires the relevant data of the target to be analyzed;
the data storage module stores the input and output data of the data acquisition module and the data processing module;
the data processing module uses the above method for aligning entity data of the main related natural persons of an enterprise to judge whether natural persons with the same name in the information of different enterprises are the same person.
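One possible way to organize the three modules is sketched below. All class and method names are hypothetical, the storage is a simple in-memory dictionary, and the judgment rule passed to the processing module is a placeholder rather than the trained model.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]


class DataStorageModule:
    """Keeps the inputs and outputs of the other two modules in memory."""

    def __init__(self) -> None:
        self.inputs: Dict[Pair, List[float]] = {}
        self.outputs: Dict[Pair, bool] = {}


class DataAcquisitionModule:
    """Obtains feature data for an enterprise pair from a caller-supplied source."""

    def __init__(self, source: Callable[[Pair], List[float]], storage: DataStorageModule) -> None:
        self.source = source
        self.storage = storage

    def acquire(self, pair: Pair) -> List[float]:
        features = self.source(pair)
        self.storage.inputs[pair] = features
        return features


class DataProcessingModule:
    """Applies a judgment function to stored features and stores the result."""

    def __init__(self, judge: Callable[[List[float]], bool], storage: DataStorageModule) -> None:
        self.judge = judge
        self.storage = storage

    def process(self, pair: Pair) -> bool:
        result = self.judge(self.storage.inputs[pair])
        self.storage.outputs[pair] = result
        return result


storage = DataStorageModule()
acquisition = DataAcquisitionModule(lambda pair: [0.25, 5.0, 3.0], storage)
# Placeholder rule standing in for the trained model.
processing = DataProcessingModule(lambda x: x[0] > 0 and x[2] >= 1, storage)
acquisition.acquire(("A", "B"))
same = processing.process(("A", "B"))
```

Passing the data source and the judgment function in as callables keeps the modules independent of any particular database or model implementation.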
Further, the invention provides electronic equipment which comprises a memory and a processor, wherein the memory is connected with the processor, and the processor finishes judging whether natural people with the same name in different enterprise information are the same person or not by the method.
Further, the present invention provides a computer readable storage medium comprising computer readable instructions for causing an electronic device to perform the operational steps contained in the method of the present invention.
Compared with the prior art, the method and the system have the following beneficial effects: they fully mine the relevance of the data and exploit the advantages of integrated data analysis. By using the characteristics of investment association relationships and of enterprise data, and without relying on other data, the method judges more accurately whether important same-name natural persons of different enterprises are the same natural person, and greatly reduces the computational cost. The method builds a machine learning model to combine multiple association-network features in the judgment; compared with simple rule-based judgment, it is more stable and computationally more efficient.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
FIG. 1 is a schematic diagram of the implementation steps of the method of the present invention.
Fig. 2 is a schematic diagram of the association map constructed in example 1.
Fig. 3 is a schematic diagram of the investment paths calculated in example 1.
Fig. 4 is a schematic diagram of the investment paths calculated in example 1.
Fig. 5 is a schematic diagram of the investment paths calculated in example 1.
Fig. 6 is a schematic diagram of the investment paths calculated in example 1.
Fig. 7 is a schematic diagram of the investment paths calculated in example 1.
Fig. 8 is a schematic block diagram of the electronic apparatus described in embodiment 5.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
The method of the invention comprises the following implementation steps, as shown in Fig. 1:
(1) acquiring enterprise data, wherein the enterprise data comprise investment relationships and main related natural persons;
(2) acquiring the enterprise data of enterprises that have natural persons with the same name;
(3) constructing an association network between two enterprises that have a natural person with the same name;
(4) calculating investment path data between the enterprises;
(5) constructing a judgment model with the inter-enterprise investment paths as features;
(6) training the model using labeled data;
(7) inputting the investment path feature vector of the two enterprises whose same-name persons are to be judged into the trained model, which outputs the probability that the same-name natural persons are the same person; when the probability is greater than a set threshold, they are judged to be the same person.
Further, the investment path data in the step (4) comprises:
(a) The shortest investment path between two enterprises. The distance of the investment relationship largely reflects how closely two enterprises are connected: if two enterprises with a close investment relationship have natural persons with the same name among their major shareholders and senior executives, the probability that these are the same natural person is high. Using the association distance as a basis for judging whether same-name natural persons are the same person builds on mature techniques such as shortest-investment-path computation and knowledge graphs; it greatly simplifies the judgment and calculation of the uniqueness of natural persons while achieving high accuracy, and it provides data support for the optimized construction of knowledge graphs, graph reasoning, and similar tasks.
(b) The number of investment paths between two enterprises. Investment paths reflect how closely enterprises are connected; multiple short investment paths indicate an extremely close association and serve as strong evidence that same-name natural persons are the same person. This feature is highly accurate and simple to compute; when massive data are judged and sorted, such a simple and effective calculation greatly reduces the computational cost.
(c) The number of same-name natural persons between two enterprises, that is, whether several same-name persons appear in the enterprises being analyzed. In the field of enterprise investment, major shareholders, directors, supervisors, and senior executives interact closely and may be shared among several enterprises. When the investment association is relatively close and several same-name natural persons appear as major shareholders, directors, supervisors, or senior executives, the probability that each pair of same-name persons is the same natural person is extremely high. This cross-validation improves the accuracy of the judgment.
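Feature (c) amounts to a set intersection over the names of each enterprise's key persons. The sketch below assumes a hypothetical mapping from enterprise to the names of its shareholders, directors, supervisors and senior executives, and assumes that names have already been normalized so that equal strings mean equal names; the sample names are placeholders.

```python
from typing import Dict, Set

# Hypothetical records: enterprise -> names of its main related natural persons.
key_persons: Dict[str, Set[str]] = {
    "A": {"a", "b", "c", "d"},
    "B": {"a", "b", "c", "e"},
}


def same_name_persons(records: Dict[str, Set[str]], first: str, second: str) -> Set[str]:
    """Return the names that appear among the key persons of both enterprises."""
    return records.get(first, set()) & records.get(second, set())


shared = same_name_persons(key_persons, "A", "B")
print(sorted(shared))  # ['a', 'b', 'c']
```

The size of the returned set, here 3, is the value that would be used as the same-name count for this pair of enterprises.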
Further, the method includes a step of converting the investment path data into a vector. Vectorizing the investment path data solves the problem that graph-structured data cannot participate directly in calculation: indirect data such as graph paths are converted into vectors that can be operated on, which greatly simplifies the calculation and improves computational efficiency.
Further, the investment path data vector in step (7) is X = (x1, x2, x3, …), where X is the investment path vector of the enterprises to be judged, x1 is the reciprocal of the shortest investment path length, x2 is the number of investment paths, and x3 is the number of same-name natural persons. The shortest investment path, the number of investment paths, the number of same-name natural persons, and other features are combined into the feature vector of the two enterprises being analyzed, so that multiple features can take part in model training and prediction at the same time.
Further, the model formula is as follows:
Figure BDA0002350811790000081
where hθ(x) is the probability that same-name persons in different enterprises are the same person, θ is the model parameter vector, and x is the vectorized investment path feature vector. A logistic regression model is used for prediction; its output lies in the interval (0, 1) and is therefore suitable for representing a probability. If the threshold is set to 0.5, then when the calculated result is greater than 0.5 the same-name persons of the different enterprises are considered to be the same person.
Further, the formula of the loss function in the model training process is as follows:
Figure BDA0002350811790000082
where J(θ) is the loss function, m is the number of samples, y^(i) is the label value of the i-th sample, hθ(x^(i)) is the model's predicted value for the i-th sample, n is the number of model parameters, and λ is the regularization parameter.
Further, when the shortest investment path length between the enterprises is greater than 5, the shortest-investment-path component of the vector is set to 0. Beyond 5 degrees the association along an investment path is very weak and no longer strongly indicative, so setting this component to 0 filters out low-value data and reduces computational complexity.
Further, investment paths whose length is greater than 6 are not counted in the number of valid paths. The graph structure of investment paths exceeding 5 degrees is complex, and the association is weak and no longer strongly indicative.
Further, the length of the shortest investment path between the two enterprises for which the association network is constructed in step (3) is less than 5. Constructing the graph for investment paths exceeding 5 degrees is very complex and offers little reference value; setting a shortest-distance threshold during graph construction reduces its cost and improves the computational efficiency of the whole method.
Example 1
Basic data of the companies are acquired, mainly comprising the enterprise name, the names of related natural persons such as the company's major shareholders, senior executives, directors, and supervisors, and the names of the enterprise's upstream and downstream investment enterprises. These are basic enterprise data that fall within the scope of enterprise information disclosure and can be obtained from public channels.
The number of companies associated with the same name is calculated; company-name keywords are extracted; and an investment network is constructed with the investment relationships between companies as edges. For example, suppose the database contains the following data:
The upstream and downstream investment enterprises of enterprise A are as follows:
A C
A E
A D
A F
The upstream and downstream investment enterprises of enterprise B are as follows:
B H
B G
B K
The upstream and downstream investment enterprises of enterprise C are as follows:
C A
C H
C K
The upstream and downstream investment enterprises of enterprise D are as follows:
D E
D F
D G
From these data the investment network shown in Fig. 2 can be constructed. Suppose enterprise A and enterprise B have natural persons with the same names: a, b, and c. The shortest investment distance between enterprises A and B is calculated to be 4 degrees. All paths from enterprise A to enterprise B, as shown in Figs. 3 to 7, are: A-C-H-B, A-C-K-B, A-D-G-B, A-E-D-G-B, and A-F-D-G-B; the shortest paths are A-C-H-B, A-C-K-B, and A-D-G-B. There are 5 paths whose length is less than the preset threshold of 5.
The feature vector is then input into the pre-trained model to judge, for each of a, b, and c, whether the same-name persons are the same natural person; if the probability is higher than 0.5, they are judged to be the same natural person.
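The path calculation in this example can be reproduced with a short sketch, assuming that the investment relationships are treated as undirected edges and that only simple paths (no enterprise visited twice) are enumerated. The function names and the limit of 5 hops are illustrative assumptions. On the listed data the enumeration yields A-E-D-G-B for the path through E, since the data contain no direct D-B relationship.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

# Investment relationships listed in the example, read as undirected edges.
EDGES: List[Tuple[str, str]] = [
    ("A", "C"), ("A", "E"), ("A", "D"), ("A", "F"),
    ("B", "H"), ("B", "G"), ("B", "K"),
    ("C", "A"), ("C", "H"), ("C", "K"),
    ("D", "E"), ("D", "F"), ("D", "G"),
]


def build_graph(edges: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    graph: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def shortest_hops(graph: Dict[str, Set[str]], src: str, dst: str) -> int:
    """Breadth-first search; returns the number of edges, or -1 if dst is unreachable."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in sorted(graph[node]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return -1


def simple_paths(graph: Dict[str, Set[str]], src: str, dst: str, max_hops: int) -> List[List[str]]:
    """Depth-first enumeration of simple paths with at most max_hops edges."""
    results: List[List[str]] = []

    def dfs(node: str, path: List[str]) -> None:
        if node == dst:
            results.append(list(path))
            return
        if len(path) - 1 >= max_hops:
            return
        for nxt in sorted(graph[node]):
            if nxt not in path:
                path.append(nxt)
                dfs(nxt, path)
                path.pop()

    dfs(src, [src])
    return results


graph = build_graph(EDGES)
hops = shortest_hops(graph, "A", "B")
paths = ["-".join(p) for p in simple_paths(graph, "A", "B", max_hops=5)]
print(hops, paths)
```

The sketch finds 5 paths, in agreement with the example. The shortest path A-C-H-B has 3 edges and 4 enterprises on it, so the distance of 4 stated in the example matches if distance is counted in enterprises rather than edges; which convention is intended is an assumption here.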
Example 2
As shown in Fig. 8, the dynamic comparison sample set constructing system of the present embodiment also provides an electronic device for implementing the method for judging same-name persons across enterprises; the electronic device may comprise a processor 51 and a memory 52, wherein the memory 52 is coupled to the processor 51. It is noted that this figure is exemplary and that other types of structures may be used in addition to or in place of this structure.
As shown in Fig. 8, the electronic device may further include an input unit 53, a display unit 54, and a power supply 55. Note that the electronic device does not necessarily include all the components shown in Fig. 8; it may also include components not shown in Fig. 8, for which reference may be made to the prior art.
The processor 51, also sometimes referred to as a controller or operational control, may comprise a microprocessor or other processor device and/or logic device, the processor 51 receiving input and controlling operation of the various components of the electronic device.
The memory 52 may be one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a non-volatile memory, or other suitable devices, and may store the configuration information of the processor 51, the instructions executed by the processor 51, the recorded table data, and other information. The processor 51 may execute a program stored in the memory 52 to realize information storage or processing, or the like. In one embodiment, a buffer memory, i.e., a buffer, is also included in the memory 52 to store the intermediate information.
The input unit 53 is for example used to provide the processor 51 with text data to be annotated. The display unit 54 is used for displaying various results in the process, such as input text data, the converted multi-dimensional vector, the calculated distance value, etc., and may be, for example, an LCD display, but the present invention is not limited thereto. The power supply 55 is used to provide power to the electronic device.
Embodiments of the present invention further provide a computer readable instruction, where when the instruction is executed in an electronic device, the program causes the electronic device to execute the operation steps included in the method of the present invention.
Embodiments of the present invention further provide a storage medium storing computer-readable instructions, where the computer-readable instructions cause an electronic device to execute the operation steps included in the method of the present invention.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be embodied in electronic hardware, computer software, or combinations of both, and that the components and steps of the examples have been described in a functional general in the foregoing description for the purpose of illustrating clearly the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, can be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
Those of ordinary skill in the art will appreciate that the various illustrative modules described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided in the present application, it should be understood that the disclosed system may be implemented in other ways. The system embodiments described above are merely illustrative: the division into modules is only a logical division, and an actual implementation may divide them differently. For example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not implemented.
The above describes only specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the claims.

Claims (12)

1. A method for aligning entity data of main related natural persons of an enterprise, characterized by comprising the following steps: (1) acquiring enterprise data, wherein the enterprise data comprises investment relations and main related natural persons;
(2) acquiring the enterprise data of natural persons with the same name;
(3) constructing an association relationship network between two enterprises that have a natural person of the same name;
(4) calculating investment path data among enterprises;
(5) constructing a judgment model by taking the investment path among enterprises as a characteristic;
(6) training the model using the labeled data;
(7) inputting the investment path feature vector between the two enterprises with the same-name natural persons to be judged into the trained model, the model outputting the probability that the same-name natural persons are the same person; and judging them to be the same person when the probability value is greater than a set threshold.
2. The method of claim 1, wherein the investment path data in step (4) comprises:
(a) the shortest investment path between two enterprises;
(b) the number of investment paths between two enterprises;
(c) the number of same-name natural persons between two enterprises.
3. The method of claim 2, wherein the method includes a process of converting the investment path data into a vector.
4. The method of claim 3, wherein the investment path data vector in step (7) is: X = (X1, X2, X3, …); wherein X is the investment path vector data of the enterprises to be judged, X1 is the reciprocal of the shortest investment path length, X2 is the number of investment paths, and X3 is the number of same-name natural persons.
5. The method of claim 4, wherein the model formula is:
Figure FDA0002350811780000021
wherein hθ(x) is the probability that the same-name natural persons of different enterprises are the same person; θ is the parameter of the model to be trained, and x is the vectorized feature vector.
6. The method of claim 5, wherein the loss function during model training is formulated as follows:
Figure FDA0002350811780000022
where j (θ) is the loss function, m is the number of samples, y(i)For the judgment of the ith sample, label value, hθ(x(i)) The model predicted value of the ith sample is, n is the number of model parameters, and lambda is a regular term parameter.
7. The method of claim 6 wherein the shortest investment path vector is set to 0 when the shortest investment path length between businesses is greater than 5.
8. The method of claim 7 wherein investment paths having an investment path length greater than 6 do not count in number of active paths.
9. The method of claim 8, wherein the shortest investment path length between two enterprises that construct the associative relationship network in step (3) is less than 5.
10. A method and system for aligning entity data of main related natural persons of an enterprise, characterized in that the system comprises a data acquisition module, a data storage module, and a data processing module;
the data acquisition module acquires the relevant data of a target to be analyzed;
the data storage module stores the input and output data of the data acquisition module and the data processing module;
the data processing module determines, by the method of any one of claims 1 to 9, whether natural persons of the same name in different enterprise information are the same person.
11. An electronic device, comprising a memory and a processor, wherein the memory is connected to the processor, and the processor performs the determination of whether natural persons of the same name are the same person in different business information by the method of any one of claims 1 to 9.
12. A computer-readable storage medium comprising computer-readable instructions for causing an electronic device to perform the operational steps contained in the method of any one of claims 1 to 9.
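
The feature construction of claims 4, 7 and 8 and the thresholded decision of claim 1 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The model and loss formulas are images that did not survive in this text, so the sigmoid form in `predict` and the cross-entropy loss with an L2 penalty in `regularized_loss` are assumptions chosen to be consistent with the symbol definitions in claims 5 and 6. Treating the investment graph as undirected, the bias term `theta[0]`, the guard for a zero-length path, and all function names and sample data are likewise hypothetical.

```python
import math
from collections import deque

def shortest_path_length(graph, src, dst):
    """Breadth-first search; returns the number of edges, or None if unreachable."""
    if src == dst:
        return 0
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def count_paths(graph, src, dst, max_len=6):
    """Count simple paths from src to dst that use at most max_len edges."""
    def dfs(node, visited, depth):
        if node == dst:
            return 1
        if depth == max_len:  # longer paths are not counted (claim 8)
            return 0
        return sum(dfs(nxt, visited | {nxt}, depth + 1)
                   for nxt in graph.get(node, ()) if nxt not in visited)
    return dfs(src, {src}, 0)

def feature_vector(graph, a, b, same_name_count, max_shortest=5, max_len=6):
    """X1: reciprocal of shortest path length, X2: path count, X3: same-name count."""
    d = shortest_path_length(graph, a, b)
    # Claim 7: X1 is 0 when the shortest path is longer than 5.
    # Assumption: unreachable pairs and a zero-length path also map to 0.
    x1 = 0.0 if d is None or d == 0 or d > max_shortest else 1.0 / d
    return [x1, float(count_paths(graph, a, b, max_len)), float(same_name_count)]

def predict(theta, x):
    """Assumed logistic form: sigmoid(theta[0] + theta[1:] . x)."""
    z = theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def regularized_loss(theta, xs, ys, lam):
    """Assumed loss: mean cross-entropy plus lam/(2m) * sum of squared weights."""
    m, eps = len(xs), 1e-12
    ce = sum(y * math.log(predict(theta, x) + eps)
             + (1 - y) * math.log(1 - predict(theta, x) + eps)
             for x, y in zip(xs, ys))
    return -ce / m + lam / (2 * m) * sum(t * t for t in theta[1:])

# Hypothetical investment graph: A-B, B-C and A-C are investment relations.
graph = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
x = feature_vector(graph, "A", "C", same_name_count=1)
theta = [0.0, 0.0, 0.0, 0.0]       # untrained placeholder parameters
threshold = 0.5                    # hypothetical decision threshold
is_same_person = predict(theta, x) > threshold
```

With trained parameters in place of the placeholder `theta`, `is_same_person` corresponds to the threshold comparison in step (7) of claim 1. The path-length caps are passed as arguments so that the values 5 and 6 from claims 7 and 8 stay visible and adjustable.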
CN201911424586.3A 2019-12-31 2019-12-31 Method for aligning entity data of main related natural persons of enterprise Pending CN111652667A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911424586.3A CN111652667A (en) 2019-12-31 2019-12-31 Method for aligning entity data of main related natural persons of enterprise

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911424586.3A CN111652667A (en) 2019-12-31 2019-12-31 Method for aligning entity data of main related natural persons of enterprise

Publications (1)

Publication Number Publication Date
CN111652667A true CN111652667A (en) 2020-09-11

Family

ID=72346447

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911424586.3A Pending CN111652667A (en) 2019-12-31 2019-12-31 Method for aligning entity data of main related natural persons of enterprise

Country Status (1)

Country Link
CN (1) CN111652667A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112182240A (en) * 2020-09-23 2021-01-05 成都数联铭品科技有限公司 Method and system for identifying and processing same-name natural person entity super node and electronic equipment
CN112215500A (en) * 2020-10-15 2021-01-12 支付宝(杭州)信息技术有限公司 Account relation identification method and device
CN112232771A (en) * 2020-10-17 2021-01-15 严怀华 Big data analysis method and big data cloud platform applied to smart government-enterprise cloud service
CN112287674A (en) * 2020-12-17 2021-01-29 成都数联铭品科技有限公司 Method and system for identifying homonymous large nodes among enterprises, electronic equipment and storage medium
CN112487819A (en) * 2020-12-18 2021-03-12 成都数联铭品科技有限公司 Method, system, electronic device and storage medium for identifying homonyms among enterprises

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112182240A (en) * 2020-09-23 2021-01-05 成都数联铭品科技有限公司 Method and system for identifying and processing same-name natural person entity super node and electronic equipment
CN112182240B (en) * 2020-09-23 2024-04-02 成都数联铭品科技有限公司 Super node identification processing method and system for entities of same-name natural persons and electronic equipment
CN112215500A (en) * 2020-10-15 2021-01-12 支付宝(杭州)信息技术有限公司 Account relation identification method and device
CN112215500B (en) * 2020-10-15 2022-06-28 支付宝(杭州)信息技术有限公司 Account relation identification method and device
CN112232771A (en) * 2020-10-17 2021-01-15 严怀华 Big data analysis method and big data cloud platform applied to smart government-enterprise cloud service
CN112287674A (en) * 2020-12-17 2021-01-29 成都数联铭品科技有限公司 Method and system for identifying homonymous large nodes among enterprises, electronic equipment and storage medium
CN112287674B (en) * 2020-12-17 2021-03-26 成都数联铭品科技有限公司 Method and system for identifying homonymous large nodes among enterprises, electronic equipment and storage medium
CN112487819A (en) * 2020-12-18 2021-03-12 成都数联铭品科技有限公司 Method, system, electronic device and storage medium for identifying homonyms among enterprises

Similar Documents

Publication Publication Date Title
CN111652667A (en) Method for aligning entity data of main related natural persons of enterprise
US11915104B2 (en) Normalizing text attributes for machine learning models
WO2021088499A1 (en) False invoice issuing identification method and system based on dynamic network representation
Butler et al. Financial forecasting using character n-gram analysis and readability scores of annual reports
CN110196943A (en) A kind of position intelligent recommendation system, method and its system
CN111221873A (en) Inter-enterprise homonym identification method and system based on associated network
CN112579727A (en) Document content extraction method and device, electronic equipment and storage medium
CN111241153A (en) Enterprise natural person entity comprehensive judgment alignment method and system
CN112990281A (en) Abnormal bid identification model training method, abnormal bid identification method and abnormal bid identification device
CN113674087A (en) Enterprise credit rating method, apparatus, electronic device and medium
Khairi et al. Stock price prediction using technical, fundamental and news based approach
WO2023071120A1 (en) Method for recognizing proportion of green assets in digital assets and related product
CN109284504A (en) It grinds to call the score using the security of deep learning model and analyses method and device
CN115563271A (en) Artificial intelligence accounting data entry method, system, equipment and storage medium
CN111179055A (en) Credit limit adjusting method and device and electronic equipment
CN113392920B (en) Method, apparatus, device, medium, and program product for generating cheating prediction model
CN114090601A (en) Data screening method, device, equipment and storage medium
CN113506023A (en) Working behavior data analysis method, device, equipment and storage medium
Chen The application of an improved C4.5 decision tree
CN107402925B (en) Information pushing method and device
CN114490965B (en) Question processing method and device, electronic equipment and storage medium
CN115034762A (en) Post recommendation method and device, storage medium, electronic equipment and product
CN113849618A (en) Strategy determination method and device based on knowledge graph, electronic equipment and medium
Zhang et al. Credit Scoring model based on kernel density estimation and support vector machine for group feature selection
Nong Construction and Simulation of Financial Risk Prediction Model Based on LSTM

Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200911