CN109408704B - Fund data association method, system, computer device and storage medium - Google Patents

Fund data association method, system, computer device and storage medium

Info

Publication number
CN109408704B
Authority
CN
China
Prior art keywords
entity
fund
data
foundation
type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811022472.1A
Other languages
Chinese (zh)
Other versions
CN109408704A (en
Inventor
陈泽晖
刘琼
蒋逸文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201811022472.1A priority Critical patent/CN109408704B/en
Priority to PCT/CN2018/124252 priority patent/WO2020048059A1/en
Publication of CN109408704A publication Critical patent/CN109408704A/en
Application granted granted Critical
Publication of CN109408704B publication Critical patent/CN109408704B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00: Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/06: Asset management; Financial planning or analysis

Abstract

The present invention relates to the field of financial technologies, and in particular to a fund data association method, system, computer device and storage medium. The method comprises: periodically collecting fund base information from a plurality of fund websites; screening the fund base information through a preset data screening rule to obtain a plurality of fund data sets; generating a plurality of entities for the same fund data set, each entity correspondingly generating a plurality of entity types; and matching every two entities and, when the two entities have the same entity type, setting that entity type as a relationship type, associating the two entities through the relationship type, and generating relationship data in which the two entities and the entity type are associated. By processing the fund base information into relationship data with association relationships, the invention lets a user find fund data that are closely associated, and achieves effective extraction and association of massive public data.

Description

Fund data association method, system, computer device and storage medium
Technical Field
The present invention relates to the field of financial technologies, and in particular, to a method, a system, a computer device, and a storage medium for associating fund data.
Background
The fund market is high-risk and high-return, so research on knowledge discovery in fund data has long attracted attention. In recent years, with the rapid development of computer technology and the great increase in storage capacity, research in this area has made considerable progress.
At present, financial apps and websites on the market generate a large amount of fund information data every day, and these data help investors understand stock market trends and make sound investment decisions. However, because the data volume is so large, investors cannot extract or identify effective, novel information that contributes to investment from the mass of data. Moreover, the financial apps and websites currently on the market only display fund information as simple text lists, so clients cannot get an intuitive sense of the association relationships among funds, and regular, real-time data updating is lacking.
Disclosure of Invention
In view of the foregoing, and because currently available fund information data can only be presented as a simple list of funds, it is necessary to provide a fund data association method, system, computer device and storage medium.
A fund data association method, comprising:
periodically collecting fund base information from a plurality of fund websites, and storing the fund base information in a buffer;
retrieving the fund base information stored in the buffer, screening the fund base information through a preset data screening rule to obtain a plurality of fund data sets, wherein the fund data in each fund data set have an association relationship, and storing the plurality of fund data sets in the buffer;
sequentially retrieving each fund data set stored in the buffer, and generating a plurality of entities for the same fund data set, wherein each entity correspondingly generates a plurality of entity types, the entities comprise funds, fund managers or fund companies, and the entity types comprise other fund data related to the entities;
and matching every two entities, setting the entity type as a relationship type when the two entities have the same entity type, associating the two entities through the relationship type, generating relationship data in which the two entities and the entity type are associated, and storing the relationship data in a database.
In one embodiment, the periodically collecting fund base information from a plurality of fund websites and storing the fund base information in a buffer includes:
presetting a website list, wherein the website list comprises the web addresses of a plurality of fund websites;
invoking a browser kernel to send a webpage access request to each website in the website list in turn, and waiting for feedback information from the website that received the webpage access request, wherein the feedback information comprises feedback information accepting access and feedback information refusing access;
when the feedback information accepting access is received, invoking a web crawler algorithm preset in the database to collect the fund base information, and then continuing to invoke the browser kernel to access the other websites in the website list until all websites in the website list have been traversed;
when the feedback information refusing access is received, continuing to invoke the browser kernel to access the other websites in the website list until all websites in the website list have been traversed;
and aggregating the fund base information collected by the web crawler algorithm.
In one embodiment, when the database of the fund website is a local database, the method further comprises periodically collecting fund base information from the database:
presetting a target path list of the database, wherein the target path list contains a plurality of paths that provide fund base information, and each path corresponds to at least one file in the database;
and invoking a timed task, sequentially reading the paths in the target path list, looking up the corresponding file in the database according to each path, and collecting the fund base information in the file.
In one embodiment, the retrieving the fund base information stored in the buffer and screening the fund base information through a preset data screening rule to obtain a plurality of fund data sets includes:
the data screening rule is a rule for extracting, from the fund base information, keywords including the résumé information of a fund manager together with the corresponding fund company, the funds managed, the fund companies managed, the holdings information of the managed funds, social relations and news media information;
and invoking the data screening rule to extract from the fund base information by natural language processing or regular expressions, obtaining a plurality of pieces of fund data, and assembling the fund data and association relationships of each fund manager into one fund data set, so that a plurality of fund managers yield a plurality of fund data sets, wherein the association relationship comprises at least one relationship between the fund manager and the fund company, the managed fund company, the university attended, the mentor, the classmates and the spouse.
In one embodiment, the generating a plurality of entities for the same fund data set, each entity correspondingly generating a plurality of entity types, includes:
assigning unique codes to the fund manager, the fund company and the fund in the fund data set respectively, and storing the unique codes and the corresponding fund data in a target keyword text;
acquiring the unique code and the fund data corresponding to the unique code from the target keyword text, setting the fund data as a name, and generating an entity, wherein the entity comprises the unique code and the name;
and acquiring other fund data related to the entity in the fund data set, setting the association relationship between the entity and the other fund data as a node type, setting the other fund data as an entity name, and generating one entity type, wherein the entity type comprises the node type and the entity name, the entity also comprises the node type, and the entity and the entity type are associated through the node type.
In one embodiment, the entity types further include weight coefficients, and when generating one entity type, the method further includes:
invoking a preset weight list, wherein the weight list comprises a node type and a corresponding weight coefficient, obtaining the weight coefficient according to the node type, and generating the entity type from the node type, the entity name and the weight coefficient.
In one embodiment, the matching every two entities, setting the entity type as a relationship type when the two entities have the same entity type, associating the two entities through the relationship type, and generating relationship data in which the two entities and the entity type are associated, includes:
matching every two entities, setting the entity type as the relationship type when the two entities have the same entity type, and generating a relationship link, wherein the relationship link comprises the node type in the relationship type, a link source and a link target, the unique code of one entity being set as the link source and the unique code of the other entity being set as the link target;
and associating the relationship link with the link source, and generating relationship data in which the link source, the relationship link and the link target are associated.
A fund data association system, comprising:
a collecting unit, used for periodically collecting fund base information from a plurality of fund websites and storing the fund base information in a buffer;
a screening unit, used for retrieving the fund base information stored in the buffer, screening the fund base information through a preset data screening rule to obtain a plurality of fund data sets, wherein the fund data in each fund data set have an association relationship, and storing the plurality of fund data sets in the buffer;
an entity and entity type generating unit, used for sequentially retrieving each fund data set stored in the buffer and generating a plurality of entities for the same fund data set, wherein each entity correspondingly generates a plurality of entity types, the entities comprise funds, fund managers or fund companies, and the entity types comprise other fund data related to the entities;
and an association unit, used for matching every two entities, setting the entity type as a relationship type when the two entities have the same entity type, associating the two entities through the relationship type, generating relationship data in which the two entities and the entity type are associated, and storing the relationship data in a database.
A computer device comprising a memory and a processor, the memory having stored therein computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the fund data association method described above.
A storage medium storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the fund data association method described above.
The above fund data association method, system, computer device and storage medium comprise: periodically collecting fund base information from a plurality of fund websites and storing the fund base information in a buffer; retrieving the fund base information stored in the buffer, screening it through a preset data screening rule to obtain a plurality of fund data sets, wherein the fund data in each fund data set have an association relationship, and storing the plurality of fund data sets in the buffer; sequentially retrieving each fund data set stored in the buffer and generating a plurality of entities for the same fund data set, wherein each entity correspondingly generates a plurality of entity types, the entities comprise funds, fund managers or fund companies, and the entity types comprise other fund data related to the entities; and matching every two entities, setting the entity type as a relationship type when the two entities have the same entity type, associating the two entities through the relationship type, generating relationship data in which the two entities and the entity type are associated, and storing the relationship data in a database. The invention periodically obtains fund base information from multiple sources, mines it in depth, and generates relationship data by constructing the relationships among fund data, so that a user can later start from a given piece of fund data and look up other fund data closely associated with it, achieving effective extraction and association of massive public data.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention.
FIG. 1 is a flow chart of a method of fund data association in one embodiment of the invention;
FIG. 2 is a flow chart of step S3 in one embodiment;
FIG. 3 is a diagram of an association between two entities in one embodiment;
FIG. 4 is a block diagram of a fund data association system in accordance with one embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
FIG. 1 is a flowchart of a method for associating fund data according to an embodiment of the invention, as shown in FIG. 1, comprising the following steps:
Step S1, collecting fund base information: fund base information is periodically collected from a plurality of fund websites and stored in a buffer.
To enrich the fund data, this step crawls different fund websites to obtain massive fund base information. The sources of the fund base information may be external, such as the official websites of fund companies, public data of regulatory authorities, CNKI (China National Knowledge Infrastructure), domestic and foreign academic paper websites, Sina Weibo and various investment recommendation websites, and may also be app channels such as Tianyancha. A large amount of fund base information can then be obtained by crawling periodically or at scheduled times with a web crawler algorithm, and the fund base information is stored in a buffer.
In one embodiment, obtaining fund base information via a web crawler algorithm includes:
presetting a website list, wherein the website list comprises the web addresses of a plurality of fund websites; invoking a browser kernel to send a webpage access request to each website in the website list in turn, and waiting for feedback information from the website that received the webpage access request, wherein the feedback information comprises feedback information accepting access and feedback information refusing access; when the feedback information accepting access is received, invoking a web crawler algorithm preset in a database to collect the fund base information, and then continuing to invoke the browser kernel to access the other websites in the website list until all websites in the website list have been traversed; when the feedback information refusing access is received, continuing to invoke the browser kernel to access the other websites in the website list until all websites in the website list have been traversed; and aggregating the fund base information collected by the web crawler algorithm.
In this embodiment, crawling the data with a web crawler algorithm allows automatic operation without manual screening of information; it is convenient, key information is not easily missed, and massive fund base information can be obtained with little effort.
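The traversal described in this embodiment can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `collect_fund_info`, `fake_fetch` and `line_extractor`, the `funds.example` addresses, and the convention that a refused request is signalled by `None` are all hypothetical. A stand-in fetcher replaces the browser kernel so that the sketch runs without network access.

```python
from typing import Callable, Iterable, List, Optional

# Assumption: a fetcher returns the page text when the site accepts the
# request, and None when the site refuses access.
Fetcher = Callable[[str], Optional[str]]
Extractor = Callable[[str], List[str]]


def collect_fund_info(site_list: Iterable[str], fetch: Fetcher,
                      extract: Extractor) -> List[str]:
    """Visit every site in the list and aggregate the extracted records."""
    collected: List[str] = []
    for url in site_list:
        page = fetch(url)
        if page is None:
            # Access refused: move on to the next site in the list.
            continue
        # Access accepted: run the (hypothetical) crawler extraction step.
        collected.extend(extract(page))
    return collected


def fake_fetch(url: str) -> Optional[str]:
    """Stand-in for the browser kernel; site "b" refuses access."""
    pages = {
        "https://funds.example/a": "manager: A\nmanager: B",
        "https://funds.example/c": "manager: C",
    }
    return pages.get(url)


def line_extractor(page: str) -> List[str]:
    """Stand-in for the crawler: one record per non-empty line."""
    return [line for line in page.splitlines() if line.strip()]


sites = [
    "https://funds.example/a",
    "https://funds.example/b",
    "https://funds.example/c",
]
collected = collect_fund_info(sites, fake_fetch, line_extractor)
```

Passing the fetcher and extractor in as parameters keeps the traversal logic independent of any particular browser kernel or crawler.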
The database of the fund website in this step may also be a local database. For example, when the sources of fund base information are the Ping An Euler graph data service website, the Ping An Insurance website and the Lujin (Lufax) website, the databases of these websites are local databases, so massive fund base information can be obtained from the local databases.
In one embodiment, periodically collecting fund base information from a database includes:
presetting a target path list of the database, wherein the target path list contains a plurality of paths that provide fund base information, and each path corresponds to at least one file in the database; and invoking a timed task, sequentially reading the paths in the target path list, looking up the corresponding files in the database according to the paths, and collecting the fund base information in the files.
This embodiment makes use of the abundant data in the local database and periodically collects fund base information through a timed task, providing more reliable data for the subsequent fund data association.
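As a rough sketch of the path-list step, the function below reads each path in a target path list and gathers the lines found in the corresponding files. The function name `collect_from_paths`, the one-record-per-line file layout, and the choice to skip missing files are assumptions made for illustration. Scheduling (the timed task itself, for example a cron job) is left outside the sketch.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def collect_from_paths(target_paths: Iterable[Path]) -> List[str]:
    """Read each path in turn and collect the non-empty lines of its file."""
    records: List[str] = []
    for path in target_paths:
        if not path.is_file():
            # Assumption: a path with no matching file is simply skipped.
            continue
        text = path.read_text(encoding="utf-8")
        records.extend(line.strip() for line in text.splitlines() if line.strip())
    return records


# Demonstration with a temporary directory standing in for the local database.
with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp) / "managers.txt"
    existing.write_text("fund manager 1\nfund manager 2\n", encoding="utf-8")
    missing = Path(tmp) / "missing.txt"
    demo_records = collect_from_paths([existing, missing])
```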
Step S2, screening data: the fund base information stored in the buffer is retrieved and screened through a preset data screening rule to obtain a plurality of fund data sets, wherein the fund data in each fund data set have an association relationship, and the plurality of fund data sets are stored in the buffer.
The data screening rule in this step is a rule for extracting, from the fund base information, keywords including the résumé information of the fund manager together with the corresponding fund company, the funds managed, the fund companies managed, the holdings information of the managed funds, social relations and news media information.
The résumé information of the fund manager includes the fund manager's name, marital status, university attended, mentors, classmates and the like. Social relations include family information such as parents, brothers, sisters, children and spouse. News media information includes public news, reports and other information related to the fund manager.
In one embodiment, screening the fund base information to obtain a plurality of fund data sets includes:
invoking the data screening rule to extract from the fund base information by natural language processing or regular expressions, obtaining a plurality of pieces of fund data, and assembling the fund data and association relationships of each fund manager into one fund data set, so that a plurality of fund managers yield a plurality of fund data sets, wherein the association relationship comprises at least one relationship between the fund manager and the fund company, the managed fund company, the university attended, the mentor, the classmates and the spouse.
Natural language processing is a technique in which software processes text written in human language in a way that follows human logical thinking; here the fund base information is extracted according to the data screening rule to obtain a plurality of pieces of fund data.
A regular expression is a text pattern: a logical formula that operates on strings, in which predefined specific characters and combinations of those characters form a "rule string" that expresses a filtering logic for strings. When the fund base information is structured data, it is extracted through regular expressions to obtain a plurality of pieces of fund data.
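To make the regular-expression route concrete, here is a minimal sketch that pulls manager, company and fund fields out of structured text. The record layout (`manager=...; company=...; fund=...`), the pattern `RECORD_PATTERN`, the function `extract_fund_data` and the sample values are all assumptions chosen for the example; the patent does not specify the input format.

```python
import re
from typing import Dict, List

# Hypothetical record layout: "manager=<name>; company=<name>; fund=<name>"
RECORD_PATTERN = re.compile(
    r"manager=(?P<manager>[^;]+);\s*"
    r"company=(?P<company>[^;]+);\s*"
    r"fund=(?P<fund>[^;\n]+)"
)


def extract_fund_data(text: str) -> List[Dict[str, str]]:
    """Return one dict per record matched by the screening pattern."""
    return [
        {key: value.strip() for key, value in match.groupdict().items()}
        for match in RECORD_PATTERN.finditer(text)
    ]


sample = (
    "manager=Fund manager 1; company=Company A; fund=Fund X\n"
    "manager=Fund manager 2; company=Company A; fund=Fund Y\n"
)
rows = extract_fund_data(sample)
```

Each resulting dict could then be grouped by its `manager` field to form one fund data set per fund manager.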
The plurality of fund data sets in this embodiment may be as shown in Table 1 below:
TABLE 1 (table image not reproduced)
In this embodiment, the fund base information is processed by different means to obtain the core data related to each fund. When these core data are used for the subsequent association, the various relationships among fund managers can be strung together and the association relationships of funds can be mined, so that the most closely associated funds are found and reliable investment data are provided to users.
Step S3, generating entities and entity types: each fund data set stored in the buffer is retrieved in turn, and a plurality of entities are generated for the same fund data set, wherein each entity correspondingly generates a plurality of entity types, the entities comprise funds, fund managers or fund companies, and the entity types comprise other fund data related to the entities.
In one embodiment, as shown in FIG. 2, generating a plurality of entities for the same fund data set, each entity correspondingly generating a plurality of entity types, includes:
Step S301, assigning unique codes: unique codes are assigned to the fund manager, the fund company and the fund in the fund data set respectively, and the unique codes and the corresponding fund data are stored in a target keyword text.
Because fund managers, fund companies and funds are important to fund investment data in the financial field, and related fund managers often tend to cluster, this step assigns a unique code to every fund manager, fund company and fund in the fund data set for use in the subsequent generation of entities, and takes the fund manager, the fund company and the fund as the main lines along which the association relationships of a given entity are displayed.
The target keyword text may take the form shown in Table 2:
TABLE 2 (table image not reproduced)
Step S302, generating entities: the unique code and the fund data corresponding to the unique code are acquired from the target keyword text, the fund data is set as a name, and an entity is generated, wherein the entity comprises the unique code and the name.
In this step, every fund manager, fund company and fund in the target keyword text is generated as one entity comprising a unique code and a name. Specifically, Tables 3, 4 and 5 below show three entities: a fund, a fund manager and a fund company, respectively.
TABLE 3 (table image not reproduced)
As shown in the table above, the generated entity is a fund entity with unique code 160716 and name Carnival base 50.
TABLE 4 (table image not reproduced)
As shown in the table above, the generated entity is a fund manager entity with unique code 30284601 and name fund manager 1.
TABLE 5 (table image not reproduced)
As shown in the table above, the generated entity is a fund company entity with unique code 80000223 and name Cared fund management Limited.
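Step S302 can be pictured with a small sketch. The `Entity` dataclass, the `build_entities` function and the representation of the target keyword text as a list of (unique code, name) pairs are assumptions for illustration; the codes and names are the ones quoted in the text above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Entity:
    """An entity as described in step S302: a unique code plus a name."""
    unique_code: str
    name: str


def build_entities(target_keyword_text: List[Tuple[str, str]]) -> List[Entity]:
    """Generate one entity per (unique code, fund data) pair."""
    return [Entity(unique_code=code, name=name) for code, name in target_keyword_text]


# Assumed in-memory form of the target keyword text, using the values
# quoted for Tables 3, 4 and 5.
target_keyword_text = [
    ("160716", "Carnival base 50"),
    ("30284601", "fund manager 1"),
    ("80000223", "Cared fund management Limited"),
]
entities = build_entities(target_keyword_text)
```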
Step S303, generating entity types: other fund data related to the entity in the fund data set are acquired, the association relationship between the entity and the other fund data is set as a node type, the other fund data is set as an entity name, and an entity type is generated, wherein the entity type comprises the node type and the entity name, the entity also comprises the node type, and the entity and the entity type are associated through the node type.
Because the same fund data set contains the fund manager and the other fund data corresponding to that fund manager, once a plurality of entities have been generated from the same fund data set, the relationship between each entity and the other fund data is expressed and associated through entity types. For example, step S302 generates a fund manager entity with unique code 30284601 and name fund manager 1, as shown in Table 4. The other fund data associated with fund manager 1 in the fund data set are shown in Table 1; the node types include fund company, fund, graduation institution, mentor, classmate and spouse, so the fund manager entity is associated with six entity types. Taking the fund company as an example, the association relationship "fund company" is set as the node type and "jia-to-date fund management limited company" is set as the entity name, generating an entity type whose node type is fund company and whose entity name is jia-to-date fund management limited company. The fund manager entity is associated with this entity type through the node type fund company.
The entity type further includes a weight coefficient, and generating an entity type further includes: invoking a preset weight list, wherein the weight list comprises node types and the corresponding weight coefficients, obtaining the weight coefficient according to the node type, and generating the entity type from the node type, the entity name and the weight coefficient. Because entities that have an association relationship tend to cluster together, the invention uses weighted relationships, that is, the entity types are weighted, so that the importance of the relationship between two entities can be determined when the entities are subsequently associated. Specifically: when the node type is fund company, the weight w=1; when it is fund, w=2; graduation institution, w=1; mentor, w=2; classmate, w=1; and spouse, w=2.
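The weighted entity types might be built along the following lines. The weight values come from the paragraph above; the `EntityType` dataclass, the `build_entity_types` function and the placeholder names such as "University A" are hypothetical, since the original tables are not available.

```python
from dataclasses import dataclass
from typing import Dict, List

# Preset weight list: node type -> weight coefficient (values from the text).
WEIGHT_LIST: Dict[str, int] = {
    "fund company": 1,
    "fund": 2,
    "graduation institution": 1,
    "mentor": 2,
    "classmate": 1,
    "spouse": 2,
}


@dataclass(frozen=True)
class EntityType:
    """An entity type: node type, entity name and weight coefficient."""
    node_type: str
    entity_name: str
    weight: int


def build_entity_types(related_data: Dict[str, str]) -> List[EntityType]:
    """Create one weighted entity type per related piece of fund data."""
    return [
        EntityType(node_type, entity_name, WEIGHT_LIST[node_type])
        for node_type, entity_name in related_data.items()
    ]


# Placeholder related data for one fund manager (names are hypothetical,
# except the fund company name, which is quoted from the description).
related_data = {
    "fund company": "jia-to-date fund management limited company",
    "graduation institution": "University A",
    "mentor": "Mentor B",
}
entity_types = build_entity_types(related_data)
```

A node type missing from the weight list raises `KeyError` in this sketch; a real system would need to decide how to handle unlisted node types.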
Specifically, the entity generated for fund manager 1 in Table 1 through steps S302 and S303, together with its entity types, is shown in Table 6. The entity generated for fund manager 2 in Table 1 through steps S302 and S303, together with its entity types, is shown in Table 7.
TABLE 6 (table image not reproduced)
TABLE 7 (table image not reproduced)
In this embodiment, unique codes are used to distinguish the many pieces of fund data in the fund data set, so that the most important items in the fund data set, namely the fund manager, the fund company and the fund, are unique, providing data support for the subsequent generation of entities and entity types. When entities and entity types are generated, a plurality of entities are generated from the same fund data set and each entity generates a plurality of entity types associated with it, so that the relationships contained in the fund data set are linked together as far as possible, providing the linkage for the subsequent generation of relationship data.
Step S4, associating the relationships among entities: every two entities are matched; when the two entities have the same entity type, the entity type is set as a relationship type, the two entities are associated through the relationship type, relationship data in which the two entities and the entity type are associated are generated, and the relationship data are stored in a database.
In one embodiment, every two entities are matched; when the two entities have the same entity type, the entity type is set as the relationship type and a relationship link is generated, wherein the relationship link comprises the node type in the relationship type, a link source and a link target, the unique code of one entity being set as the link source and the unique code of the other entity being set as the link target; and the relationship link is associated with the link source, so that the link source, the relationship link and the link target form relationship data with an association relationship.
Specifically, Tables 6 and 7 in step S3 produce two entities. Matching the two entities finds that they have the same entity type; this entity type is set as the relationship type and a relationship link is generated, in which the node type is fund company, the link source is 30284601 and the link target is 30414880. As shown in FIG. 3, the fund manager entity with unique code 30284601 is associated with the relationship link through the link source, and through the relationship link forms an association relationship with the link target with unique code 30414880; the association relationship is the relationship in the relationship type. In addition, unique code 30284601 and unique code 30414880 are each associated with their own entity types.
In this embodiment, adding relationship links connects two related entities through a link source and a link target, thereby associating them and generating relationship data. In this way the other entity types of a given entity can be kept distinct: entity types that are not shared remain associated only with their respective entities, so the overall association relationship is concise and clear, and the relationship data are easy to store and understand.
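The pairwise matching of step S4 can be sketched as below. The dataclasses `EntityType`, `Entity` and `RelationLink`, the function `link_entities`, and the university names are assumptions for illustration; the two unique codes and the shared fund company node type follow the example in the text.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import FrozenSet, List


@dataclass(frozen=True)
class EntityType:
    node_type: str
    entity_name: str


@dataclass(frozen=True)
class Entity:
    unique_code: str
    name: str
    entity_types: FrozenSet[EntityType]


@dataclass(frozen=True)
class RelationLink:
    node_type: str  # node type taken from the shared entity type
    source: str     # unique code of one entity (link source)
    target: str     # unique code of the other entity (link target)


def link_entities(entities: List[Entity]) -> List[RelationLink]:
    """Match every pair of entities and link those sharing an entity type."""
    links: List[RelationLink] = []
    for first, second in combinations(entities, 2):
        shared = first.entity_types & second.entity_types
        # Sort so the output order does not depend on set iteration order.
        for entity_type in sorted(shared, key=lambda t: (t.node_type, t.entity_name)):
            links.append(
                RelationLink(entity_type.node_type, first.unique_code, second.unique_code)
            )
    return links


company = EntityType("fund company", "jia-to-date fund management limited company")
manager_1 = Entity("30284601", "fund manager 1", frozenset({
    company, EntityType("graduation institution", "University A"),  # hypothetical
}))
manager_2 = Entity("30414880", "fund manager 2", frozenset({
    company, EntityType("graduation institution", "University B"),  # hypothetical
}))
links = link_entities([manager_1, manager_2])
```

Here the two managers share only the fund company entity type, so a single link is produced with source 30284601 and target 30414880, while the unshared graduation institutions stay attached to their own entities.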
In the fund data association method, a large amount of fund basic information is collected at regular intervals from a plurality of external channels, by a web crawler algorithm or by searching a database. This enriches the fund data and reduces the chance of missing fund-related information, and because collection is performed on a schedule, the acquired fund basic information can be updated in a timely manner. After the fund basic information is obtained, it is screened by preset data screening rules to obtain the fund data most closely related to a fund, such as the career history of the fund manager and the corresponding fund companies, the managed funds, the managed fund companies, the position information of the managed funds, social relationships, and news media information. Because the amount of fund data is large, the fund data associated with a particular fund manager is grouped together, with the fund manager as the identifier, to generate a fund data set. Entities and entity types are then generated in turn from the fund data in each fund data set, and the connections among entities are associated through relationship links, with the fund manager, the fund company and the fund as the main threads, to obtain multiple items of relationship data. The massive fund-related information is thereby organized, integrated and associated, so that users can obtain the most closely associated fund-related information from a large body of information with complicated relationships.
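The grouping of fund data by fund manager can be pictured with a short Python sketch. The record layout (a dict with a `fund_manager` key), the function name `group_by_manager` and the sample values are assumptions made for illustration; the text does not specify how fund data is represented.

```python
from collections import defaultdict


def group_by_manager(records):
    """Group fund data records into one fund data set per fund manager.

    Each record is assumed (hypothetically) to be a dict carrying a
    'fund_manager' key that identifies the manager it belongs to.
    """
    data_sets = defaultdict(list)
    for record in records:
        data_sets[record["fund_manager"]].append(record)
    return dict(data_sets)


# Hypothetical screened fund data.
records = [
    {"fund_manager": "Manager A", "kind": "fund company", "value": "Example Fund Co."},
    {"fund_manager": "Manager B", "kind": "fund company", "value": "Example Fund Co."},
    {"fund_manager": "Manager A", "kind": "managed fund", "value": "Example Growth Fund"},
]
fund_data_sets = group_by_manager(records)
```

Each value in `fund_data_sets` then plays the role of one fund data set from which entities and entity types are generated.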
In one embodiment, a fund data association system is provided, as shown in fig. 4, comprising the following units:
a collecting unit, configured to collect fund basic information from a plurality of fund websites at regular intervals and store the fund basic information in a buffer;
a screening unit, configured to retrieve the fund basic information stored in the buffer and screen it by a preset data screening rule to obtain a plurality of fund data sets, wherein the fund data in each fund data set has an association relationship, and to store the plurality of fund data sets in the buffer;
an entity and entity type generating unit, configured to retrieve each fund data set stored in the buffer in turn and generate a plurality of entities for the same fund data set, a plurality of entity types being generated for each entity, wherein an entity is a fund, a fund manager or a fund company, and the entity types comprise other fund data related to the entity;
and an association unit, configured to match every two entities, set the entity type as a relationship type when the two entities have the same entity type, associate the two entities through the relationship type, generate relationship data in which the two entities and the entity type are associated, and store the relationship data in the database.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the following steps: collecting fund basic information from a plurality of fund websites at regular intervals, and storing the fund basic information in a buffer; retrieving the fund basic information stored in the buffer, screening it by a preset data screening rule to obtain a plurality of fund data sets, wherein the fund data in each fund data set has an association relationship, and storing the plurality of fund data sets in the buffer; retrieving each fund data set stored in the buffer in turn, and generating a plurality of entities for the same fund data set, a plurality of entity types being generated for each entity, wherein the entities comprise funds, fund managers or fund companies, and the entity types comprise other fund data related to the entities; and matching every two entities, setting the entity type as a relationship type when the two entities have the same entity type, associating the two entities through the relationship type, generating relationship data in which the two entities and the entity type are associated, and storing the relationship data in a database.
In one embodiment, a storage medium is provided, storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps: collecting fund basic information from a plurality of fund websites at regular intervals, and storing the fund basic information in a buffer; retrieving the fund basic information stored in the buffer, screening it by a preset data screening rule to obtain a plurality of fund data sets, wherein the fund data in each fund data set has an association relationship, and storing the plurality of fund data sets in the buffer; retrieving each fund data set stored in the buffer in turn, and generating a plurality of entities for the same fund data set, a plurality of entity types being generated for each entity, wherein the entities comprise funds, fund managers or fund companies, and the entity types comprise other fund data related to the entities; and matching every two entities, setting the entity type as a relationship type when the two entities have the same entity type, associating the two entities through the relationship type, generating relationship data in which the two entities and the entity type are associated, and storing the relationship data in a database.
Those of ordinary skill in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing related hardware. The program may be stored in a computer-readable storage medium, which may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this description.
The above embodiments represent only some exemplary embodiments of the invention. Although they are described in specific detail, they are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the invention, all of which fall within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.

Claims (8)

1. A fund data association method, comprising:
collecting fund basic information from a plurality of fund websites at regular intervals, and storing the fund basic information in a buffer;
retrieving the fund basic information stored in the buffer, screening the fund basic information by a preset data screening rule to obtain a plurality of fund data sets, wherein the fund data in each fund data set has an association relationship, and storing the plurality of fund data sets in the buffer;
retrieving each fund data set stored in the buffer in turn, and generating a plurality of entities for the same fund data set, a plurality of entity types being generated for each entity, wherein the entities comprise funds, fund managers or fund companies, and the entity types comprise other fund data related to the entities;
matching every two entities, setting the entity type as a relationship type when the entity types are the same, associating the two entities through the relationship type, generating relationship data in which the two entities and the entity type are associated, and storing the relationship data in a database;
wherein the generating a plurality of entities for the same fund data set, a plurality of entity types being generated for each entity, comprises:
marking the fund manager, the fund company and the fund in the fund data set with respective unique codes, and storing the unique codes and the corresponding fund data in a target keyword text;
acquiring, from the target keyword text, the unique code and the fund data corresponding to the unique code, setting the fund data as a name, and generating an entity, wherein the entity comprises the unique code and the name;
acquiring other fund data related to the entity in the fund data set, setting the relationship between the entity and the other fund data as a node type, setting the other fund data as an entity name, and generating one entity type, wherein the entity type comprises the node type and the entity name, the entity also comprises the node type, and the entity and the entity type are associated through the node type;
and wherein the matching every two entities, setting the entity type as a relationship type when the entity types are the same, associating the two entities through the relationship type, and generating relationship data in which the two entities and the entity type are associated, comprises:
matching every two entities, setting the entity type as the relationship type when the entity types are the same, and generating a relationship link, wherein the relationship link comprises the node type of the relationship type, a link source and a link target, the unique code of one entity being set as the link source and the unique code of the other entity being set as the link target;
and associating the relationship link with the link source, and generating relationship data in which the link source, the relationship link and the link target are associated.
2. The fund data association method of claim 1, wherein the collecting fund basic information from a plurality of fund websites at regular intervals and storing the fund basic information in a buffer comprises:
presetting a website list, wherein the website list comprises the web addresses of a plurality of fund websites;
invoking a browser kernel to send web page access requests in turn to the web addresses in the website list, and waiting to receive feedback information sent by the website to which the web page access request was addressed, wherein the feedback information comprises feedback information accepting access and feedback information refusing access;
when the feedback information accepting access is received, invoking a web crawler algorithm preset in the database to collect the fund basic information, and then continuing to invoke the browser kernel to access the other web addresses in the website list until all web addresses in the website list have been traversed;
when the feedback information refusing access is received, continuing to invoke the browser kernel to access the other web addresses in the website list until all web addresses in the website list have been traversed;
and aggregating the fund basic information collected by the web crawler algorithm.
3. The fund data association method of claim 1, wherein, when the database of the fund website is a local database, the method further comprises collecting fund basic information from the database at regular intervals, including:
presetting a target path list of the database, wherein the target path list contains a plurality of paths that provide fund basic information, and each path corresponds to at least one file in the database;
and invoking a timed task, reading the paths in the target path list in turn, looking up the corresponding file in the database according to each path, and collecting the fund basic information in the file.
4. The fund data association method of claim 1, wherein the retrieving the fund basic information stored in the buffer and screening the fund basic information by a preset data screening rule to obtain a plurality of fund data sets comprises:
the data screening rule being a rule for extracting, from the fund basic information, keywords that include the career history of a fund manager and the corresponding fund companies, the managed funds, the managed fund companies, the position information of the managed funds, social relationships and news media information;
and invoking the data screening rule to extract from the fund basic information by natural language processing or regular expressions, extracting a plurality of items of fund data, and assembling the fund data and association relationships of each fund manager into one fund data set, a plurality of fund managers yielding a plurality of fund data sets, wherein the association relationship comprises at least one of the relationships among fund manager, fund company, managed fund company, alma mater, advisor, classmate and spouse.
5. The fund data association method of claim 1, wherein the entity type further comprises a weight coefficient, and the generating one entity type further comprises:
invoking a preset weight list, wherein the weight list comprises node types and corresponding weight coefficients, obtaining the weight coefficient according to the node type, and generating the entity type from the node type, the entity name and the weight coefficient.
6. A fund data association system, comprising:
a collecting unit, configured to collect fund basic information from a plurality of fund websites at regular intervals and store the fund basic information in a buffer;
a screening unit, configured to retrieve the fund basic information stored in the buffer and screen the fund basic information by a preset data screening rule to obtain a plurality of fund data sets, wherein the fund data in each fund data set has an association relationship, and to store the plurality of fund data sets in the buffer;
an entity and entity type generating unit, configured to retrieve each fund data set stored in the buffer in turn and generate a plurality of entities for the same fund data set, a plurality of entity types being generated for each entity, wherein each entity is a fund, a fund manager or a fund company, and each entity type comprises other fund data related to the entity;
an association unit, configured to match every two entities, set the entity type as a relationship type when the entity types are the same, associate the two entities through the relationship type, generate relationship data in which the two entities and the entity type are associated, and store the relationship data in a database;
wherein the generating a plurality of entities for the same fund data set, a plurality of entity types being generated for each entity, comprises:
marking the fund manager, the fund company and the fund in the fund data set with respective unique codes, and storing the unique codes and the corresponding fund data in a target keyword text;
acquiring, from the target keyword text, the unique code and the fund data corresponding to the unique code, setting the fund data as a name, and generating an entity, wherein the entity comprises the unique code and the name;
acquiring other fund data related to the entity in the fund data set, setting the relationship between the entity and the other fund data as a node type, setting the other fund data as an entity name, and generating one entity type, wherein the entity type comprises the node type and the entity name, the entity also comprises the node type, and the entity and the entity type are associated through the node type;
and wherein the matching every two entities, setting the entity type as a relationship type when the entity types are the same, associating the two entities through the relationship type, and generating relationship data in which the two entities and the entity type are associated, comprises:
matching every two entities, setting the entity type as the relationship type when the entity types are the same, and generating a relationship link, wherein the relationship link comprises the node type of the relationship type, a link source and a link target, the unique code of one entity being set as the link source and the unique code of the other entity being set as the link target;
and associating the relationship link with the link source, and generating relationship data in which the link source, the relationship link and the link target are associated.
7. A computer device comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the fund data association method of any one of claims 1 to 5.
8. A storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the fund data association method of any one of claims 1 to 5.
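For illustration only, and not as part of the claims, the website-list traversal recited in claim 2 can be sketched in Python as follows. The function `collect_fund_info` and the callables `request_access` and `crawl` are hypothetical stand-ins for the browser kernel request and the web crawler algorithm, and the URLs are placeholders; no real network access is performed.

```python
def collect_fund_info(url_list, request_access, crawl):
    """Traverse a preset website list and aggregate what the crawler collects.

    request_access(url) -> bool : hypothetical stand-in for the browser-kernel
        request; True means the site accepted access, False means it refused.
    crawl(url) -> list          : hypothetical stand-in for the crawler step.
    """
    collected = []
    for url in url_list:
        if request_access(url):
            # Access accepted: collect fund basic information from this site.
            collected.extend(crawl(url))
        # Access refused: skip this site and continue with the rest of the list.
    return collected


# Stub callables so the sketch runs without any network access.
def fake_request_access(url):
    return "refused" not in url


def fake_crawl(url):
    return [f"fund basic information from {url}"]


website_list = [
    "https://funds-a.example",
    "https://refused.example",
    "https://funds-b.example",
]
fund_basic_information = collect_fund_info(
    website_list, fake_request_access, fake_crawl
)
```

Passing the access check and the crawler in as callables keeps the traversal logic separate from whichever browser kernel or crawler is actually used.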
CN201811022472.1A 2018-09-03 2018-09-03 Fund data association method, system, computer device and storage medium Active CN109408704B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811022472.1A CN109408704B (en) 2018-09-03 2018-09-03 Fund data association method, system, computer device and storage medium
PCT/CN2018/124252 WO2020048059A1 (en) 2018-09-03 2018-12-27 Fund data association method and system, computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811022472.1A CN109408704B (en) 2018-09-03 2018-09-03 Fund data association method, system, computer device and storage medium

Publications (2)

Publication Number Publication Date
CN109408704A CN109408704A (en) 2019-03-01
CN109408704B true CN109408704B (en) 2023-05-30

Family

ID=65464485

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811022472.1A Active CN109408704B (en) 2018-09-03 2018-09-03 Fund data association method, system, computer device and storage medium

Country Status (2)

Country Link
CN (1) CN109408704B (en)
WO (1) WO2020048059A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112150292A (en) * 2020-09-27 2020-12-29 方雷(成都)科技有限公司 Monitoring method, device and system for fund fixed account and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106294588A (en) * 2016-07-28 2017-01-04 广东中标数据科技股份有限公司 The method and device of fast search content to be inquired about
CN106682150A (en) * 2016-12-22 2017-05-17 北京锐安科技有限公司 Information processing method and device
CN107506484A (en) * 2017-09-18 2017-12-22 携程旅游信息技术(上海)有限公司 Operation/maintenance data related auditing method, system, equipment and storage medium
CN107506486A (en) * 2017-09-21 2017-12-22 北京航空航天大学 A kind of relation extending method based on entity link

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10303999B2 (en) * 2011-02-22 2019-05-28 Refinitiv Us Organization Llc Machine learning-based relationship association and related discovery and search engines
CN103064837A (en) * 2011-10-19 2013-04-24 西安邮电学院 Retrieval of leading figures in academic fields and visualized navigation system
CN107369091B (en) * 2016-05-12 2021-02-05 创新先进技术有限公司 Product recommendation method and device and financial product recommendation method
CN107194754A (en) * 2017-04-11 2017-09-22 美林数据技术股份有限公司 Stock trader's Products Show method based on mixing collaborative filtering

Also Published As

Publication number Publication date
CN109408704A (en) 2019-03-01
WO2020048059A1 (en) 2020-03-12

Similar Documents

Publication Publication Date Title
US9495345B2 (en) Methods and systems for modeling complex taxonomies with natural language understanding
US10963513B2 (en) Data system and method
US8180758B1 (en) Data management system utilizing predicate logic
US10339038B1 (en) Method and system for generating production data pattern driven test data
US9619571B2 (en) Method for searching related entities through entity co-occurrence
US11714869B2 (en) Automated assistance for generating relevant and valuable search results for an entity of interest
US10592508B2 (en) Organizing datasets for adaptive responses to queries
US10235449B1 (en) Extracting product facets from unstructured data
CN109918678B (en) Method and device for identifying field meaning
KR101864401B1 (en) Digital timeline output system for support of fusion of traditional culture
WO2015084757A1 (en) Systems and methods for processing data stored in a database
CN107451280B (en) Data communication method and device and electronic equipment
CN109408704B (en) Fund data association method, system, computer device and storage medium
CN111383072A (en) User credit scoring method, storage medium and server
CN109726292A (en) Text analyzing method and apparatus towards extensive multilingual data
US20160246794A1 (en) Method for entity-driven alerts based on disambiguated features
US20220035792A1 (en) Determining metadata of a dataset
KR102041915B1 (en) Database module using artificial intelligence, economic data providing system and method using the same
CN104951869A (en) Workflow-based public opinion monitoring method and workflow-based public opinion monitoring device
KR102594204B1 (en) Server for providing graph of stock investment information and method thereof
US11776176B2 (en) Visual representation of directional correlation of service health
US20230306277A1 (en) Graph Database Implemented Knowledge Mesh
Purohit et al. Transactional Knowledge Graph Generation To Model Adversarial Activities
CN116484054A (en) Data processing method and related device
US11120341B1 (en) Determining the value of facts in a knowledge base and related techniques

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant