CN109408643B - Fund similarity calculation method, system, computer equipment and storage medium - Google Patents

Info

Publication number
CN109408643B
Authority
CN
China
Prior art keywords
fund
entity
knowledge base
similarity
formula
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811019295.1A
Other languages
Chinese (zh)
Other versions
CN109408643A (en
Inventor
陈泽晖
胡逸凡
黄鸿顺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201811019295.1A
Publication of CN109408643A
Application granted
Publication of CN109408643B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/06 Asset management; Financial planning or analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Operations Research (AREA)
  • Technology Law (AREA)
  • Human Resources & Organizations (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The present disclosure relates to the field of graph theory and network analysis technologies, and in particular to a fund similarity calculation method, system, computer device, and storage medium. The fund similarity calculation method comprises: extracting fund knowledge from an information source and establishing a fund knowledge base; fusing the fund knowledge according to a preset rule and storing it in a database; receiving a fund query request and screening out, from the fund knowledge base, the fund entity closest to the fund query request, wherein the fund in the query request and the fund entity form a fund pair; obtaining the fund information of the fund pair from the fund knowledge base and generating a corresponding bipartite graph; and calculating the weighted similarity of the entities contained in the bipartite graph by using a preset algorithm and displaying the calculation result. The invention represents funds and stocks as a bipartite graph and performs weighted similarity calculation on the entities it contains, which helps discover fund managers with similar styles and potential influence relationships among fund managers.

Description

Fund similarity calculation method, system, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of graph theory and network analysis technologies, and in particular to a fund similarity calculation method, system, computer device, and storage medium.
Background
Knowledge graphs are currently a very popular research area. A knowledge graph is essentially a semantic network whose nodes represent entities or concepts and whose edges represent the semantic relationships between them. For a knowledge base containing numerous entities, the associations between entities deserve attention in addition to the information about the entities themselves. One of the problems faced is: given two entities, are they similar, and how high is their similarity?
The similarity between entities refers to similarity in deep semantics, not only to the traditional similarity of surface information. Judging the similarity between entities requires first understanding their semantic information, so traditional character-level similarity methods are not feasible. In a knowledge base where the data is already stored in structured form, the attributes of an entity can serve as the main basis for judging similarity. However, entities have many kinds of attributes, and deciding which attributes are important and how to calculate attribute similarity becomes the key to solving the problem.
Existing knowledge graphs offer little analysis of entity similarity. Meanwhile, in the field of finance, the traditional way to measure the similarity of fund holdings is to sum, for each pair of funds, the holding proportions of the stocks they hold in common. This does not consider the distribution of the holdings themselves, so the method is not comprehensive enough.
Disclosure of Invention
Based on this, it is necessary to provide a fund similarity calculation method, system, computer device and storage medium that address two problems: existing knowledge graphs offer little analysis of entity similarity, and the calculation of fund holding similarity in the financial field does not consider the distribution of the holdings themselves.
A fund similarity calculation method comprises the following steps:
after extracting the fund knowledge in the information source, establishing a fund knowledge base, wherein the fund knowledge base comprises a plurality of subgraphs, and each subgraph comprises an entity, a relationship and an attribute;
fusing the fund knowledge in the fund knowledge base according to a preset rule;
storing the fused fund knowledge metadata in a database;
receiving a fund query request, and screening out from the fund knowledge base the fund entity closest to the fund query request, wherein the fund in the query request and the fund entity form a fund pair;
obtaining the fund information of the fund pair from the fund knowledge base, and generating a corresponding bipartite graph;
and obtaining the bipartite graph, calculating the weighted similarity of the entities contained in the bipartite graph by using a preset algorithm, and displaying the calculation result.
In one embodiment, establishing the fund knowledge base after extracting the fund knowledge in the information source, wherein the fund knowledge base comprises a plurality of subgraphs and each subgraph comprises an entity, a relationship and an attribute, includes:
identifying the fund knowledge contained in the information source, and identifying the data type and the data source of the fund knowledge;
screening and summarizing the fund knowledge according to its data type and data source, so that fund knowledge with the same data type and the same data source is grouped into one class;
and establishing the fund knowledge base from the summarized fund knowledge.
In one embodiment, fusing the fund knowledge in the fund knowledge base according to a preset rule includes:
assigning an ID to each entity in the fund knowledge base;
and comparing the entities in the fund knowledge base by ID, wherein entities with the same ID are treated as the same entity and their relationships and attributes are merged, and entities that do not share an ID are not merged.
In one embodiment, receiving the fund query request and screening out from the fund knowledge base the fund entity closest to the fund query request, wherein the fund in the query request and the fund entity form a fund pair, includes:
receiving the fund query request, and matching the keywords of the fund query request against the fund knowledge base to find the closest fund entity;
and forming a fund pair from the fund contained in the successfully matched fund query request and the matched fund entity.
In one embodiment, obtaining the fund information of the fund pair from the fund knowledge base and generating the corresponding bipartite graph includes:
obtaining the fund information of each fund in the fund pair from the fund knowledge base, wherein the fund information comprises the number of stocks held and the holding ratio of each stock;
and generating, from the acquired fund information, a bipartite graph of funds and the stocks they hold, represented as G = (V, E), wherein V is a fund set, and E is a stock set.
In one embodiment, obtaining the fund information of the fund pair from the fund knowledge base and generating the corresponding bipartite graph further includes:
setting the stock similarity according to a preset rule, wherein the stock similarity is denoted S_weighted;
substituting the number of stocks held in common by the funds in the fund pair into formula (1) to calculate an evidence factor, denoted evidence, wherein formula (1) is as follows:
[Formula (1): image BDA0001786917360000041]
in formula (1), |O(A) ∩ O(B)| denotes the number of neighbor nodes shared by fund A and fund B, i.e. the number of stocks held by both funds; it can be seen that evidence is less than 1 and approaches 1 as |O(A) ∩ O(B)| grows;
substituting the holding ratios of the stocks held by each fund in the fund pair into formula (2) to calculate the weight factors of that fund, denoted W, wherein formula (2) is as follows:
[Formula (2): image BDA0001786917360000042]
in formula (2), W(A, i) denotes the weight of stock i held by fund A, and variance(i) denotes the variance of the weights of all edges adjacent to i;
substituting the stock similarity, the evidence factor and the weight factors into formula (3) or formula (4) to calculate the weighted similarity of each fund pair, wherein formula (3) is as follows:
[Formula (3): image BDA0001786917360000043]
expanding formula (3) gives formula (4), which is as follows:
[Formula (4): image BDA0001786917360000051]
in formula (3), O(A) denotes the set of neighbor nodes on the outgoing edges of fund A, i.e. all stocks held by fund A; |O(A)| denotes the number of stocks held by fund A; and O_i(A) denotes the i-th neighbor node on an outgoing edge of fund A, i.e. a particular stock i held by fund A.
In one embodiment, after obtaining the bipartite graph and calculating the weighted similarity of the entities contained in the bipartite graph by using a preset algorithm, the method further includes:
obtaining the calculated similarity of the fund pair, comparing it with a preset threshold, and, if it is higher than the threshold, marking the fund entities contained in the corresponding fund pair;
and outputting the marked entities to other platforms, which perform deep relationship mining on them.
Based on the same conception, the present application also provides a fund similarity calculation system, the fund similarity calculation system comprising:
an extraction unit configured to extract the fund knowledge in an information source and then establish a fund knowledge base, wherein the fund knowledge base comprises a plurality of subgraphs, and each subgraph comprises an entity, a relationship and an attribute;
a fusion unit configured to fuse the fund knowledge in the fund knowledge base according to a preset rule;
a storage unit configured to store the fused fund knowledge base in a database;
a query unit configured to receive a fund query request and screen out from the fund knowledge base the fund entity closest to the fund query request, wherein the fund in the query request and the fund entity form a fund pair;
a generation unit configured to obtain the fund information of the fund pair from the fund knowledge base and generate a corresponding bipartite graph;
a computing unit configured to obtain the bipartite graph, calculate the weighted similarity of the entities contained in the bipartite graph by using a preset algorithm, and display the calculation result.
Based on the same technical concept, the embodiments of the present application further provide a computer device comprising a memory and a processor, wherein the memory stores computer readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the fund similarity calculation method described above.
Based on the same technical concept, the embodiments of the present application also provide a storage medium storing computer readable instructions, which when executed by one or more processors, cause the one or more processors to perform the steps of the fund similarity calculation method as described above.
In the fund similarity calculation method, system, computer device and storage medium described above, a fund knowledge base is established after the fund knowledge in an information source is extracted, wherein the fund knowledge base comprises a plurality of subgraphs, and each subgraph comprises an entity, a relationship and an attribute; the fund knowledge is fused according to a preset rule and stored in a database; a fund query request is received, and the fund entity closest to the fund query request is screened out from the fund knowledge base, wherein the fund in the query request and the fund entity form a fund pair; the fund information of the fund pair is obtained from the fund knowledge base, and a corresponding bipartite graph is generated; and the weighted similarity of the entities contained in the bipartite graph is calculated by using a preset algorithm, and the calculation result is displayed. Funds and stocks are represented as a bipartite graph, the holding ratio is used as the weight of the holding relationship, and a weighted similarity is calculated. For funds whose holdings are highly similar, the weighted similarity can serve as one of the criteria for related-fund query and tracking, which helps discover fund managers with similar styles and potential influence relationships among fund managers.
Drawings
FIG. 1 is a flow chart of a method of fund similarity calculation in one embodiment of the present application;
FIG. 2 is a flow chart of knowledge fusion in one embodiment of the present application;
FIG. 3 is a flow chart of generating a bipartite graph in one embodiment of the present application;
FIG. 4 is a schematic illustration of a bipartite graph in one embodiment of the application;
FIG. 5 is a flow chart of computing similarity in one embodiment of the present application;
FIG. 6 is a functional block diagram of a fund similarity calculation system in one embodiment of the present application.
Detailed Description
FIG. 1 is a flowchart of a fund similarity calculation method in one embodiment of the present application. As shown in FIG. 1, the method includes:
S1, after extracting the fund knowledge in an information source, establishing a fund knowledge base, wherein the fund knowledge base comprises a plurality of subgraphs, and each subgraph comprises an entity, a relationship and an attribute;
In this step, the knowledge contained in the information source is extracted through identification, understanding, screening, summarization and similar processes, and a knowledge element base comprising entities, relationships and attributes is established.
S2, fusing the fund knowledge in the fund knowledge base according to a preset rule;
In this step, knowledge from different knowledge sources is integrated under the same framework specification, and the entities in the knowledge element base are assigned IDs. The fusion process includes fusing new data with old data, and further includes assessing knowledge quality and performing weighted fusion according to preset fusion rules.
S3, storing the fused fund knowledge metadata in a database;
In this step, the fused data is stored. The storage database may be a relational database, an RDF database, a graph database or the like, or any combination of such databases.
S4, receiving a fund query request, and screening out from the fund knowledge base the fund entity closest to the fund query request, wherein the fund in the query request and the fund entity form a fund pair;
In this step, after a query request containing fund A is received, fund B, the fund closest to fund A, is matched in the fund knowledge base by keyword, and fund A and fund B form a fund pair.
S5, obtaining the fund information of the fund pair from the fund knowledge base, and generating a corresponding bipartite graph;
In this step, the fund information of fund A and fund B in the fund pair is obtained from the fund knowledge base. The fund information comprises the number of stocks held and the holding ratio of each stock. A bipartite graph of funds and the stocks they hold is generated from the fund information and passed to the real-time computation unit of the fund similarity calculation platform for calculation.
S6, obtaining the bipartite graph, calculating the weighted similarity of the entities contained in the bipartite graph by using a preset algorithm, and displaying the calculation result;
In this step, the weighted similarity of the fund pair is calculated with the SimRank++ algorithm. The SimRank algorithm is based on a graph-theoretic model in which the relationships between objects are modeled as a directed graph G = (V, E), where V is the node set of the directed graph and represents all objects in the application domain, and E is the edge set of the directed graph and represents the relationships between objects. For a node a in the graph, the set of neighbor nodes associated with its incoming edges (in-neighbors) is denoted I(a), and the set of neighbor nodes associated with its outgoing edges (out-neighbors) is denoted O(a). Because the SimRank algorithm is not sufficiently accurate, this technical solution adopts the improved SimRank++ algorithm, which builds on SimRank by adding an evidence factor.
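The in-neighbor and out-neighbor sets described above can be illustrated with a minimal Python sketch. The edge list, the node names and the helper `neighbor_sets` are hypothetical and chosen only for illustration; they are not taken from the patent.

```python
from collections import defaultdict

def neighbor_sets(edges):
    """Return (I, O): in-neighbor and out-neighbor sets for each node of a directed graph."""
    in_nbrs, out_nbrs = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        out_nbrs[src].add(dst)   # dst is reached by an outgoing edge of src
        in_nbrs[dst].add(src)    # src reaches dst by an incoming edge of dst
    return in_nbrs, out_nbrs

# Hypothetical holdings: funds "A" and "B" point at the stocks they hold.
edges = [("A", "a"), ("A", "b"), ("B", "b"), ("B", "c")]
I, O = neighbor_sets(edges)
shared = O["A"] & O["B"]  # stocks held by both funds
```

Here `O["A"]` plays the role of O(A), the stocks held by fund A, and `shared` is the set whose size later enters the evidence factor.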
With this method, fund similarity can be calculated, which makes it easier to discover fund managers with similar styles and potential influence relationships among fund managers.
In one embodiment, establishing the fund knowledge base after extracting the fund knowledge in the information source, wherein the fund knowledge base comprises a plurality of subgraphs and each subgraph comprises an entity, a relationship and an attribute, includes:
S101, identifying the fund knowledge contained in the information source, and identifying the data type and the data source of the fund knowledge;
In this step, the knowledge in the information source is identified by data type and data source. For example, data in an enterprise's internal database is structured data; table data on websites such as the Tiantian Fund website is semi-structured data; and full-text data such as fund research reports, fund manager resumes and Xueqiu (Snowball) community comments is unstructured data.
S102, screening and summarizing the fund knowledge according to its data type and data source, so that fund knowledge with the same data type and the same data source is grouped into one class;
In this step, knowledge data with the same data type and the same data source is grouped into the same class, and a different extraction method is used for each data type. For example, structured data is extracted with manually defined rules, semi-structured data is extracted with a crawler or regular-expression matching, and unstructured data is extracted with natural language processing.
S103, establishing the fund knowledge base from the summarized fund knowledge.
In this embodiment, extracting the data in the information source and establishing the fund knowledge base lays the foundation for the subsequent integration of the data in the fund knowledge base.
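The grouping in S102 can be sketched in Python as follows. This is only an illustrative sketch: the record layout (`data_type`, `source`, `payload`), the `EXTRACTORS` table and the sample records are assumptions made for the example, not structures defined in the patent.

```python
from collections import defaultdict

# Assumed mapping from data type to extraction approach, following the examples in the text.
EXTRACTORS = {
    "structured": "manually defined rules",
    "semi-structured": "crawler or regular-expression matching",
    "unstructured": "natural language processing",
}

def group_knowledge(items):
    """Group raw items by (data type, data source) so each class can share one extractor."""
    groups = defaultdict(list)
    for item in items:
        groups[(item["data_type"], item["source"])].append(item["payload"])
    return dict(groups)

# Hypothetical raw records.
items = [
    {"data_type": "structured", "source": "internal_db", "payload": "holdings row 1"},
    {"data_type": "semi-structured", "source": "fund_site", "payload": "holdings table"},
    {"data_type": "structured", "source": "internal_db", "payload": "holdings row 2"},
]
groups = group_knowledge(items)
for (data_type, source), payloads in groups.items():
    print(f"{source}: {len(payloads)} item(s), extract with {EXTRACTORS[data_type]}")
```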
FIG. 2 is a flowchart of knowledge fusion in one embodiment of the present application. As shown in FIG. 2, the flow includes:
S201, assigning an ID to each entity in the fund knowledge base;
In this step, before the knowledge data in the knowledge element base is fused according to a preset fusion rule, every entity is assigned an ID. For example, fund entities and stock entities use their market trading codes as IDs.
S202, comparing the entities in the fund knowledge base by ID, wherein entities with the same ID are treated as the same entity and their relationships and attributes are merged, and entities that do not share an ID are not merged;
In this step, the data fusion includes fusing new data with old data, and further includes assessing knowledge quality and performing weighted fusion according to preset fusion rules. Under the preset fusion rules, the entities in the fund knowledge base are assigned IDs, the relationships and attributes of entities with the same ID are fused, and similar attributes are fused for entities that do not share an ID.
In this embodiment, the entities in the fund knowledge base are assigned IDs and then fused according to the preset fusion rule, so that the knowledge in the knowledge base is integrated in an orderly way and the required fund information can be found quickly in the fund knowledge base.
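The ID-based merge of S202 can be sketched as below. The record layout, the helper `fuse_entities`, the sample codes and the choice that later records overwrite earlier attribute values are all assumptions for illustration; the patent does not specify how conflicting attributes are resolved.

```python
def fuse_entities(entities):
    """Merge the attributes and relationships of records that share the same ID."""
    fused = {}
    for ent in entities:
        rec = fused.setdefault(
            ent["id"], {"id": ent["id"], "attributes": {}, "relations": set()}
        )
        # Assumption: on conflict, the later record overwrites the earlier value.
        rec["attributes"].update(ent.get("attributes", {}))
        rec["relations"].update(ent.get("relations", ()))
    return fused

# Hypothetical records; the IDs stand in for market trading codes.
records = [
    {"id": "F001", "attributes": {"name": "Fund A"}, "relations": {("holds", "S001")}},
    {"id": "F001", "attributes": {"manager": "Manager X"}, "relations": {("holds", "S002")}},
    {"id": "F002", "attributes": {"name": "Fund B"}, "relations": set()},
]
fused = fuse_entities(records)
```

The two `F001` records collapse into one entity carrying both attributes and both holding relationships, while `F002` is left untouched.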
In one embodiment, receiving the fund query request and screening out from the fund knowledge base the fund entity closest to the fund query request, wherein the fund in the query request and the fund entity form a fund pair, includes:
S401, receiving the fund query request, and matching the keywords of the fund query request against the fund knowledge base to find the closest fund entity;
In this step, a fund query request containing fund A is received, and the closest fund entity B is matched in the fund knowledge base according to the keywords of fund A.
S402, forming a fund pair from the fund contained in the successfully matched fund query request and the matched fund entity.
In this step, fund A contained in the successfully matched fund query request and fund B contained in the matched fund entity form the fund pair corresponding to the fund query request.
In this embodiment, matching the fund entity corresponding to the fund query request in the fund knowledge base and forming a fund pair lays the foundation for quickly calculating the similarity of the fund pair.
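The text does not say how keyword closeness is measured. As one possible reading, the sketch below ranks candidate funds by the Jaccard overlap of their keyword sets; the Jaccard measure, the helper `closest_fund` and the keyword data are assumptions for illustration only.

```python
def closest_fund(query_id, keywords_by_fund):
    """Return the fund whose keyword set overlaps most with the query fund's keywords."""
    query = keywords_by_fund[query_id]

    def jaccard(other):
        union = query | other
        return len(query & other) / len(union) if union else 0.0

    scores = {
        fund: jaccard(keywords)
        for fund, keywords in keywords_by_fund.items()
        if fund != query_id
    }
    return max(scores, key=scores.get)

# Hypothetical keyword sets stored with each fund entity.
keywords_by_fund = {
    "A": {"growth", "technology", "large-cap"},
    "B": {"growth", "technology", "mid-cap"},
    "C": {"bond", "income"},
}
match = closest_fund("A", keywords_by_fund)
fund_pair = ("A", match)
```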
FIG. 3 is a flowchart of generating a bipartite graph in one embodiment of the present application. As shown in FIG. 3, the flow includes:
S501, obtaining the fund information of each fund in the fund pair from the fund knowledge base, wherein the fund information comprises the number of stocks held and the holding ratio of each stock;
In this step, the fund information of the fund pair is obtained from the fund knowledge base, including the number of stocks held by fund A and by fund B and the holding ratio of each of those stocks.
S502, generating, from the acquired fund information, a bipartite graph of funds and the stocks they hold, represented as G = (V, E), wherein V is a fund set, and E is a stock set;
In this step, a bipartite graph of funds and stocks is generated from the acquired fund information of the fund pair. A bipartite graph is a graph whose nodes can be divided into two subsets such that the two nodes joined by any edge come from different subsets.
Specifically, in the bipartite graph shown in FIG. 4, for example, funds A and B are on the left, stocks a, b, c and d are on the right, and the edges represent holding ratios. The technical solution is not limited to the enumerated funds A and B, nor to the enumerated stocks a, b, c and d.
In this embodiment, a corresponding bipartite graph is generated from the fund information of the fund pair, which provides the basis for the subsequent calculation of the similarity of the fund pair.
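A bipartite graph like the one described for FIG. 4 can be held in plain Python containers, as in the sketch below. The holding ratios are invented for the example, and `build_bipartite` is a hypothetical helper; the graph is stored as two node sets plus a weighted edge map.

```python
def build_bipartite(holdings):
    """Build (funds, stocks, edges) where edges maps (fund, stock) to the holding ratio."""
    funds = set(holdings)
    stocks = {stock for held in holdings.values() for stock in held}
    edges = {
        (fund, stock): ratio
        for fund, held in holdings.items()
        for stock, ratio in held.items()
    }
    return funds, stocks, edges

# Invented holding ratios for funds A and B over stocks a, b, c, d.
holdings = {
    "A": {"a": 0.5, "b": 0.3, "c": 0.2},
    "B": {"b": 0.4, "c": 0.4, "d": 0.2},
}
funds, stocks, edges = build_bipartite(holdings)

# Bipartite property: every edge joins one fund node and one stock node.
is_bipartite = all(f in funds and s in stocks for f, s in edges)
```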
FIG. 5 is a flowchart of computing similarity in one embodiment of the present application. As shown in FIG. 5, the flow includes:
S601, setting the stock similarity according to a preset rule, wherein the stock similarity is denoted S_weighted;
In this step, the stock similarity is set according to a preset rule. For example, the similarity of a stock with itself is set to 1, the similarity of two stocks in the same industry is set to 0.5, and all other similarities are set to 0.
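The example rule above (1 for the same stock, 0.5 for the same industry, 0 otherwise) can be written as a small function. The industry lookup table and the stock names are hypothetical.

```python
def stock_similarity(stock_1, stock_2, industry_of):
    """Preset rule from the example: 1 for the same stock, 0.5 for the same industry, else 0."""
    if stock_1 == stock_2:
        return 1.0
    ind_1, ind_2 = industry_of.get(stock_1), industry_of.get(stock_2)
    if ind_1 is not None and ind_1 == ind_2:
        return 0.5
    return 0.0

# Hypothetical industry classification.
industry_of = {"a": "banking", "b": "banking", "c": "energy"}
same = stock_similarity("a", "a", industry_of)
peer = stock_similarity("a", "b", industry_of)
unrelated = stock_similarity("a", "c", industry_of)
```

Stocks missing from the lookup table are treated as having no industry, so two unknown stocks are not counted as peers.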
S602, substituting the number of stocks held in common by the funds in the fund pair into formula (1) to calculate an evidence factor, denoted evidence, wherein formula (1) is as follows:
[Formula (1): image BDA0001786917360000131]
In formula (1), |O(A) ∩ O(B)| denotes the number of neighbor nodes shared by fund A and fund B, i.e. the number of stocks held by both funds. It can be seen that evidence is less than 1 and approaches 1 as |O(A) ∩ O(B)| grows;
In this step, the number of stocks held by both fund A and fund B is determined from all the stocks held by fund A and fund B in the fund pair, and this number is substituted into formula (1) to calculate the evidence factor.
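Formula (1) is available here only as an image reference, so the sketch below assumes the geometric form used in the published SimRank++ algorithm, evidence(A, B) = sum over i = 1..|O(A) ∩ O(B)| of 1/2^i. That form matches the stated properties (less than 1, approaching 1 as the overlap grows), but whether it is exactly the patent's formula is an assumption.

```python
def evidence(held_a, held_b):
    """Evidence factor under the assumed SimRank++ geometric form.

    n is the number of stocks held by both funds; the result is the
    sum of 1/2**i for i = 1..n, which stays below 1 and approaches 1.
    """
    n = len(set(held_a) & set(held_b))
    return sum(0.5 ** i for i in range(1, n + 1))

# Hypothetical holdings: two shared stocks (b and c).
ev_two_shared = evidence({"a", "b", "c"}, {"b", "c", "d"})
ev_none_shared = evidence({"a"}, {"d"})
```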
S603, substituting the holding ratios of the stocks held by each fund in the fund pair into formula (2) to calculate the weight factors of that fund, denoted W, wherein formula (2) is as follows:
[Formula (2): image BDA0001786917360000132]
In formula (2), W(A, i) denotes the weight of stock i held by fund A, and variance(i) denotes the variance of the weights of all edges adjacent to i;
In this step, the holding ratio of each stock held by fund A and fund B in the fund pair is substituted into formula (2) to calculate the weight factor of each stock in the corresponding fund.
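Formula (2) is likewise only an image reference. The sketch below assumes the published SimRank++ weight, W(A, i) = exp(-variance(i)) · w(A, i) / Σ_j w(A, j), where w(A, i) is the holding ratio on the edge from fund A to stock i. Using the population variance (`statistics.pvariance`) is a further assumption, and the holdings are invented.

```python
import math
from statistics import pvariance

def weight_factors(fund, holdings):
    """Weight factor of each stock held by `fund`, under the assumed SimRank++ form."""
    total = sum(holdings[fund].values())
    factors = {}
    for stock, ratio in holdings[fund].items():
        # Weights of all edges adjacent to this stock, across every fund holding it.
        incident = [held[stock] for held in holdings.values() if stock in held]
        factors[stock] = math.exp(-pvariance(incident)) * ratio / total
    return factors

# Invented holding ratios.
holdings = {
    "A": {"a": 0.5, "b": 0.3, "c": 0.2},
    "B": {"b": 0.4, "c": 0.4, "d": 0.2},
}
w_a = weight_factors("A", holdings)
```

A stock held by only one fund has zero variance, so its factor reduces to the fund's normalized holding ratio; stocks whose holding ratios differ widely across funds are damped by the exponential term.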
S604, substituting the stock similarity, the evidence factor and the weight factors into formula (3) or formula (4) to calculate the weighted similarity of the fund pair, wherein formula (3) is as follows:
[Formula (3): image BDA0001786917360000141]
In formula (3), O(A) denotes the set of neighbor nodes on the outgoing edges of fund A, i.e. all stocks held by fund A; |O(A)| denotes the number of stocks held by fund A; and O_i(A) denotes the i-th neighbor node on an outgoing edge of fund A, i.e. a particular stock i held by fund A;
Expanding formula (3) gives formula (4):
[Formula (4): image BDA0001786917360000142]
In this step, the weighted similarity of the fund pair is calculated by substituting the preset stock similarity, the evidence factor of the fund pair and the weight factor of each stock in each fund into formula (3) or formula (4).
With this method, the similarity of fund A and fund B in the fund pair is calculated with the SimRank++ algorithm, which makes it easier to discover fund managers with similar styles and potential influence relationships among fund managers.
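Formulas (3) and (4) are not reproduced in this text, so the following is a sketch under the assumption that formula (3) follows the published SimRank++ weighted form: S_weighted(A, B) = evidence(A, B) · C · Σ_i Σ_j W(A, i) · W(B, j) · S_weighted(O_i(A), O_j(B)). The decay constant `c` comes from SimRank and may not appear in the patent's formula; setting `c=1.0` drops it. All input values are invented.

```python
def weighted_similarity(weights_a, weights_b, evidence_value, stock_sim, c=0.8):
    """One evaluation of the assumed SimRank++ weighted similarity for a fund pair.

    weights_a / weights_b map each held stock to its weight factor,
    evidence_value is the evidence factor of the pair, and
    stock_sim(i, j) returns the preset similarity of two stocks.
    """
    total = sum(
        w_i * w_j * stock_sim(i, j)
        for i, w_i in weights_a.items()
        for j, w_j in weights_b.items()
    )
    return evidence_value * c * total

# Invented inputs: weight factors per fund, two shared stocks, identity-only stock similarity.
weights_a = {"a": 0.5, "b": 0.3, "c": 0.2}
weights_b = {"b": 0.4, "c": 0.4, "d": 0.2}
identical_only = lambda i, j: 1.0 if i == j else 0.0

score = weighted_similarity(weights_a, weights_b, 0.75, identical_only)
```

With the identity-only stock similarity, only the shared stocks b and c contribute to the double sum; a rule that also scores same-industry stocks would add cross terms.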
In one embodiment, after obtaining the bipartite graph and calculating the weighted similarity of the entities contained in the bipartite graph by using a preset algorithm, the method further includes:
obtaining the calculated similarity of the fund pair, comparing it with a preset threshold, and, if it is higher than the threshold, marking the entities contained in the corresponding fund pair; and outputting the marked entities to other platforms, which perform deep relationship mining on them.
In this embodiment, the similarity calculated for any fund pair is compared with a preset threshold, and results higher than the threshold are output to other platforms for deep relationship mining, which makes it easier to discover fund managers with similar styles and potential influence relationships among fund managers.
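The threshold comparison can be sketched as a simple filter. The scores, the threshold value and the helper `mark_pairs` are hypothetical; handing the marked pairs to another platform is outside the scope of the sketch.

```python
def mark_pairs(similarities, threshold):
    """Return the fund pairs whose similarity is strictly higher than the threshold."""
    return [pair for pair, score in similarities.items() if score > threshold]

# Hypothetical similarity results for three fund pairs.
similarities = {("A", "B"): 0.42, ("A", "C"): 0.05, ("B", "C"): 0.30}
marked = mark_pairs(similarities, threshold=0.25)
```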
Based on the same conception, the application also provides a fund similarity calculation system. As shown in FIG. 6, the fund similarity calculation system comprises an extraction unit, a fusion unit, a storage unit, a query unit, a generation unit and an operation unit, wherein: the extraction unit is configured to extract the fund knowledge in an information source and then establish a fund knowledge base, wherein the fund knowledge base comprises a plurality of subgraphs, and each subgraph comprises an entity, a relationship and an attribute; the fusion unit is configured to fuse the fund knowledge in the fund knowledge base according to a preset rule; the storage unit is configured to store the fused fund knowledge base in a database; the query unit is configured to receive a fund query request and match, in the fund knowledge base, the fund entity closest to the fund query request, wherein the fund in the query request and the fund entity form a fund pair; the generation unit is configured to obtain the fund information of the fund pair from the fund knowledge base and generate a corresponding bipartite graph; and the operation unit is configured to obtain the bipartite graph, calculate the weighted similarity of the fund pair by using a preset algorithm, and display the calculation result.
Based on the same technical concept, the embodiments of the present application further provide a computer device, where the computer device includes a memory and a processor, the memory stores computer readable instructions, and the computer readable instructions, when executed by one or more processors, cause the one or more processors to implement the steps of the fund similarity calculation method in the foregoing embodiments.
Based on the same technical concept, the embodiments of the present application further provide a storage medium storing computer readable instructions, where the computer readable instructions, when executed by one or more processors, cause the one or more processors to implement the steps of the fund similarity calculation method in the above embodiments. The storage medium may be a non-volatile storage medium.
Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments may be implemented by a program instructing related hardware; the program may be stored in a computer readable storage medium, and the storage medium may include: a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.
The technical features of the above-described embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this description.
The above-described embodiments represent only some exemplary embodiments of the present application. They are described specifically and in detail, but are not to be construed as limiting the scope of the patent application. It should be noted that those skilled in the art could make various modifications and improvements without departing from the spirit of the present application, and such modifications and improvements fall within the scope of protection claimed in the present application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims (7)

1. The fund similarity calculation method is characterized by comprising the following steps of:
after extracting the fund data in the information source, establishing a fund knowledge base, wherein the fund knowledge base comprises a plurality of subgraphs, and each subgraph comprises an entity, a relationship and an attribute;
identifying each entity in the fund knowledge base by an ID; comparing the entities in the fund knowledge base according to the ID, wherein entities with the same ID are the same entity and the relationships and attributes of the same entity are merged, and entities that do not share an ID are not merged;
storing the fused fund knowledge metadata in a database;
receiving a fund inquiry request, and screening out a fund entity closest to the fund inquiry request from the fund knowledge base, wherein the fund inquiry request and the fund entity are a pair of fund pairs;
obtaining the fund information of each fund in the fund pair from the fund knowledge base, wherein the fund information comprises the number of stocks and the stock ratio; generating, from the acquired fund information, a bipartite graph of the funds and the stocks they hold, represented as G = (V, E), wherein V is the fund set and E is the stock set;
obtaining the bipartite graph and setting the similarity of stocks according to a preset rule, wherein the similarity of stocks is represented by S weighted A;
substituting the number of stocks held in common by the funds contained in the fund pair into formula (1) to calculate an evidence factor, wherein the evidence factor is denoted by evidence, and formula (1) is as follows:
Figure FDA0004109885340000011
in formula (1), |O(A) ∩ O(B)| represents the number of neighbor nodes shared by fund A and fund B, that is, the number of stocks held by both funds; it can be seen that evidence is less than 1 and approaches 1 as |O(A) ∩ O(B)| increases;
substituting stock ratios of the funds contained in the pair of funds into a formula (2) respectively to calculate weight factors corresponding to the funds, wherein the weight factors are represented by W, and the formula (2) is as follows:
Figure FDA0004109885340000021
in formula (2), W(A, i) represents the weight of stock i in fund A, and variance(i) represents the variance of the set of weights of all edges adjacent to i;
substituting the similarity of stocks, the evidence factor and the weight factor into formula (3) or formula (4) to calculate the weighted similarity of the fund pair, and displaying the calculation result, wherein formula (3) is as follows:
Figure FDA0004109885340000022
expanding equation (3) to obtain equation (4), equation (4) is as follows:
Figure FDA0004109885340000023
in formula (3), O(A) represents the set of neighbor nodes on the outgoing edges of fund A, that is, all stocks held by fund A; |O(A)| represents the number of stocks held by fund A; and O_i(A) represents one neighbor node i on an outgoing edge of fund A, that is, a particular stock i held by fund A.
2. The method of claim 1, wherein the step of extracting the fund data from the information source and then establishing a fund knowledge base, wherein the fund knowledge base comprises a plurality of subgraphs, each subgraph comprising an entity, a relationship and an attribute, comprises:
identifying fund data in an information source, and identifying the data type and the data source of the fund data;
filtering and grouping the fund data according to its data type and data source, wherein fund data with the same data type and the same data source is grouped into one category;
and establishing a fund knowledge base according to the grouped fund data.
3. The method of claim 1, wherein the step of receiving a fund query and screening the fund knowledge base for a fund entity closest to the fund query, the fund query and the fund entity being a pair of funds, comprises:
receiving a fund inquiry request, and matching the keywords of the fund inquiry request to the nearest fund entity in the fund knowledge base;
and forming a fund pair from the fund contained in the successfully matched fund inquiry request and the matched fund entity.
4. The method for calculating the similarity of funds according to claim 1, wherein after obtaining the bipartite graph and calculating the weighted similarity of the entities included in the bipartite graph by using a preset algorithm, the method further comprises:
obtaining a calculation result of the similarity of the fund pair, comparing the result with a preset threshold, and if the result is higher than the threshold, identifying an entity contained in the fund pair corresponding to the result;
and outputting the identified entity to other platforms, wherein the other platforms perform deep relationship mining on the identified entity.
5. A fund similarity computing system, the fund similarity computing system comprising:
the extraction unit is arranged for extracting the fund knowledge in the information source and then establishing a fund knowledge base, wherein the fund knowledge base comprises a plurality of subgraphs, and each subgraph comprises an entity, a relation and an attribute;
the fusion unit is configured to identify each entity in the fund knowledge base by an ID, and to compare the entities in the fund knowledge base according to the ID, wherein entities with the same ID are the same entity and the relationships and attributes of the same entity are merged, and entities that do not share an ID are not merged;
a storage unit configured to store the fused fund knowledge base in a database;
the query unit is used for receiving a fund query request and screening out a fund entity closest to the fund query request from the fund knowledge base, wherein the fund query request and the fund entity are a pair of fund pairs;
a generation unit configured to acquire, from the fund knowledge base, fund information of each fund in the fund pair, the fund information including the number of stocks and the stock ratio; and to generate, from the acquired fund information, a bipartite graph of the funds and the stocks they hold, represented as G = (V, E), wherein V is the fund set and E is the stock set;
an operation unit configured to acquire the bipartite graph and set the similarity of stocks according to a preset rule, wherein the similarity of stocks is represented by S weighted A;
substituting the number of stocks held in common by the funds contained in the fund pair into formula (1) to calculate an evidence factor, wherein the evidence factor is denoted by evidence, and formula (1) is as follows:
Figure FDA0004109885340000051
in formula (1), |O(A) ∩ O(B)| represents the number of neighbor nodes shared by fund A and fund B, that is, the number of stocks held by both funds; it can be seen that evidence is less than 1 and approaches 1 as |O(A) ∩ O(B)| increases;
substituting stock ratios of the funds contained in the pair of funds into a formula (2) respectively to calculate weight factors corresponding to the funds, wherein the weight factors are represented by W, and the formula (2) is as follows:
Figure FDA0004109885340000052
in formula (2), W(A, i) represents the weight of stock i in fund A, and variance(i) represents the variance of the set of weights of all edges adjacent to i;
substituting the similarity of stocks, the evidence factor and the weight factor into formula (3) or formula (4) to calculate the weighted similarity of the fund pair, and displaying the calculation result, wherein formula (3) is as follows:
Figure FDA0004109885340000053
expanding equation (3) to obtain equation (4), equation (4) is as follows:
Figure FDA0004109885340000054
in formula (3), O(A) represents the set of neighbor nodes on the outgoing edges of fund A, that is, all stocks held by fund A; |O(A)| represents the number of stocks held by fund A; and O_i(A) represents one neighbor node i on an outgoing edge of fund A, that is, a particular stock i held by fund A.
6. A computer device comprising a memory and a processor, the memory having stored therein computer readable instructions which, when executed by one or more of the processors, cause the one or more processors to perform the steps of the fund similarity calculation method of any of claims 1 to 4.
7. A storage medium storing computer-readable instructions that, when executed by one or more processors, cause one or more of the processors to perform the steps of the fund similarity calculation method of any of claims 1 to 4.
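Formula (1) of claims 1 and 5 is available here only as a figure reference, so its exact form is not reproduced. The claims state two properties of the evidence factor: it is less than 1, and it approaches 1 as the number of shared stocks |O(A) ∩ O(B)| increases. One function with both properties is the geometric sum 1/2 + 1/4 + ... + 1/2^n used as the evidence term in the SimRank++ literature. The sketch below assumes that form; that formula (1) is exactly this sum is an assumption, and the function name `evidence` is chosen only to match the symbol in the claims.

```python
def evidence(shared_count: int) -> float:
    """Evidence factor for a fund pair sharing `shared_count` stocks.

    ASSUMPTION: uses the geometric sum 1/2 + 1/4 + ... + 1/2**n, which is
    below 1 for every n and approaches 1 as n grows. The claimed formula (1)
    may differ in detail.
    """
    if shared_count < 0:
        raise ValueError("shared_count must be non-negative")
    return sum(0.5 ** i for i in range(1, shared_count + 1))


# The factor rises quickly with the number of shared stocks.
for n in (1, 2, 3, 10):
    print(n, evidence(n))
```

Under this assumed form, two funds sharing a single stock receive a factor of 0.5, while funds sharing ten stocks are already very close to 1, so the factor mainly penalizes pairs whose similarity rests on very few common holdings.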
CN201811019295.1A 2018-09-03 2018-09-03 Fund similarity calculation method, system, computer equipment and storage medium Active CN109408643B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811019295.1A CN109408643B (en) 2018-09-03 2018-09-03 Fund similarity calculation method, system, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811019295.1A CN109408643B (en) 2018-09-03 2018-09-03 Fund similarity calculation method, system, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN109408643A CN109408643A (en) 2019-03-01
CN109408643B true CN109408643B (en) 2023-05-30

Family

ID=65463876

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811019295.1A Active CN109408643B (en) 2018-09-03 2018-09-03 Fund similarity calculation method, system, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109408643B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110111110A (en) * 2019-04-01 2019-08-09 北京三快在线科技有限公司 The method and apparatus of knowledge based map detection fraud, storage medium
CN111428053B (en) * 2020-03-30 2023-10-20 西安交通大学 Construction method of tax field-oriented knowledge graph
CN111563133A (en) * 2020-05-06 2020-08-21 支付宝(杭州)信息技术有限公司 Method and system for data fusion based on entity relationship
CN112800285B (en) * 2021-02-03 2024-10-18 京东科技控股股份有限公司 Data query method, device, storage medium and product based on graph database

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8285719B1 (en) * 2008-08-08 2012-10-09 The Research Foundation Of State University Of New York System and method for probabilistic relational clustering
CN106126828A (en) * 2016-06-28 2016-11-16 北京大学 A kind of enhanced scalability SimRank computational methods based on unidirectional migration
CN107742131A (en) * 2017-11-06 2018-02-27 众安信息技术服务有限公司 Financial asset sorting technique and device
CN107943873A (en) * 2017-11-13 2018-04-20 平安科技(深圳)有限公司 Knowledge mapping method for building up, device, computer equipment and storage medium
CN108363816A (en) * 2018-03-21 2018-08-03 北京理工大学 Open entity relation extraction method based on sentence justice structural model

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8195693B2 (en) * 2004-12-16 2012-06-05 International Business Machines Corporation Automatic composition of services through semantic attribute matching

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
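Wan Hualin, Hu Hong, Shi Zhongzhi. Image similarity measurement using bipartite graph matching. Journal of Computer-Aided Design & Computer Graphics. 2002, (No. 11), full text. *
Ma Yunlong; Lin Yuan; Lin Hongfei. Research on query expansion based on the weight-normalized SimRank method. Journal of Chinese Information Processing. 2011, (No. 01), full text. *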

Also Published As

Publication number Publication date
CN109408643A (en) 2019-03-01

Similar Documents

Publication Publication Date Title
CN109299090B (en) Foundation centrality calculating method, system, computer equipment and storage medium
CN109408643B (en) Fund similarity calculation method, system, computer equipment and storage medium
US11188537B2 (en) Data processing
US11354282B2 (en) Classifying an unmanaged dataset
US10019442B2 (en) Method and system for peer detection
Karthikeyan et al. A survey on association rule mining
CN106844407B (en) Tag network generation method and system based on data set correlation
CN107729336A (en) Data processing method, equipment and system
CN110019689A (en) Position matching process and position matching system
CN110795524B (en) Main data mapping processing method and device, computer equipment and storage medium
CN106407208A (en) Establishment method and system for city management ontology knowledge base
US20220129635A1 (en) Semantic model instantiation method, system and apparatus
CN113010688A (en) Knowledge graph construction method, device and equipment and computer readable storage medium
CN110287292B (en) Judgment criminal measuring deviation degree prediction method and device
US11295078B2 (en) Portfolio-based text analytics tool
CN112907358A (en) Loan user credit scoring method, loan user credit scoring device, computer equipment and storage medium
Dharmawan et al. Book recommendation using Neo4j graph database in BibTeX book metadata
Abul-Basher et al. Tasweet: optimizing disjunctive regular path queries in graph databases
CN116049379A (en) Knowledge recommendation method, knowledge recommendation device, electronic equipment and storage medium
CN113781246B (en) Strategy generation method and device based on preset label and storage medium
CN114254617A (en) Method, device, computing equipment and storage medium for revising clauses
CN112131259B (en) Similar malicious software recommendation method, device, medium and equipment
CN117993772A (en) Knowledge graph-based crowdsourcing data acquisition method and system and electronic equipment
CN115293479A (en) Public opinion analysis workflow system and method thereof
Giacometti et al. Comparison table generation from knowledge bases

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant