CN109408643B - Fund similarity calculation method, system, computer equipment and storage medium - Google Patents
Fund similarity calculation method, system, computer equipment and storage medium
- Publication number
- CN109408643B (CN201811019295.1A)
- Authority
- CN
- China
- Prior art keywords
- fund
- entity
- knowledge base
- similarity
- formula
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/06—Asset management; Financial planning or analysis
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Finance (AREA)
- Accounting & Taxation (AREA)
- Development Economics (AREA)
- Operations Research (AREA)
- Technology Law (AREA)
- Human Resources & Organizations (AREA)
- Entrepreneurship & Innovation (AREA)
- Economics (AREA)
- Marketing (AREA)
- Strategic Management (AREA)
- Game Theory and Decision Science (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
Abstract
The present disclosure relates to the field of graph theory and network analysis, and in particular to a method, a system, a computer device, and a storage medium for calculating fund similarity. A fund similarity calculation method comprises: extracting fund knowledge from an information source and establishing a fund knowledge base; merging the fund knowledge according to a preset rule and storing it in a database; receiving a fund inquiry request and screening out, from the fund knowledge base, the fund entity closest to the fund inquiry request, wherein the fund inquiry request and the fund entity form a fund pair; obtaining the fund information of the fund pair from the fund knowledge base and generating a corresponding bipartite graph; and calculating the weighted similarity of the entities contained in the bipartite graph by using a preset algorithm, and displaying the calculation result. The invention represents funds and stocks as a bipartite graph and performs a weighted similarity calculation on the entities contained in it, which facilitates mining fund managers with similar styles and the potential influence relationships among fund managers.
Description
Technical Field
The present disclosure relates to the field of graph theory and network analysis technologies, and in particular, to a method, a system, a computer device, and a storage medium for calculating a fund similarity.
Background
Knowledge graphs are currently a very active research area. A knowledge graph is essentially a semantic network whose nodes represent entities or concepts and whose edges represent the semantic relationships between them. For a knowledge base containing numerous entities, the association information between entities matters in addition to the information about the entities themselves. One problem that arises is: given two entities, are they similar, and how similar are they?
Similarity between entities here means similarity at the level of deep semantics, not just the traditional similarity of surface information. Judging it requires first understanding the semantic information of the entities, so traditional character-based similarity methods are not feasible. In a knowledge base whose data is already stored in structured form, the attributes of an entity can serve as the main basis for judging similarity. However, entities have many kinds of attributes, and deciding which attributes matter and how to calculate similarity over them becomes the key to solving the problem.
Existing knowledge graph work offers little analysis of entity similarity. Meanwhile, in finance, the traditional way to measure the similarity of fund holdings is to sum, for each pair of funds, the holding ratios of the stocks they hold in common; this ignores the distribution of the holdings themselves and is therefore not comprehensive enough.
Disclosure of Invention
Based on this, it is necessary to provide a method, a system, a computer device and a storage medium for calculating fund similarity, to address the problems that existing knowledge graphs offer little analysis of entity similarity and that, in the financial field, the calculation of fund holding similarity does not consider the distribution of the holdings themselves.
A fund similarity calculation method comprises the following steps:
after extracting the fund knowledge in the information source, establishing a fund knowledge base, wherein the fund knowledge base comprises a plurality of subgraphs, and each subgraph comprises an entity, a relationship and an attribute;
fusing the fund knowledge in the fund knowledge base according to a preset rule;
storing the fused fund knowledge metadata in a database;
receiving a fund inquiry request, and screening out a fund entity closest to the fund inquiry request from the fund knowledge base, wherein the fund inquiry request and the fund entity are a pair of fund pairs;
obtaining the fund information of the fund pair from the fund knowledge base, and generating a corresponding bipartite graph;
and obtaining the bipartite graph, calculating the weighted similarity of the entities contained in the bipartite graph by using a preset algorithm, and displaying the calculation result.
In one embodiment, after extracting the fund knowledge in the information source, a fund knowledge base is established, where the fund knowledge base includes a plurality of subgraphs, and each subgraph includes an entity, a relationship and an attribute, and the method includes:
identifying fund knowledge contained in an information source, and identifying the data type and the data source of the fund knowledge;
screening and summarizing the fund knowledge by data type and data source, grouping fund knowledge with the same data type and the same data source into one class;
and establishing a fund knowledge base from the summarized and organized fund knowledge.
In one embodiment, the fusing the fund knowledge in the fund knowledge base according to a preset rule includes:
assigning an ID to each entity in the fund knowledge base;
and judging each entity in the fund knowledge base according to its ID, wherein entities with the same ID are the same entity and their relationships and attributes are merged, and entities that do not share an ID are not merged.
In one embodiment, the receiving the fund query request and screening the fund knowledge base for a fund entity closest to the fund query request, the fund query request and the fund entity being a pair of funds, includes:
receiving a fund inquiry request, and matching the keywords of the fund inquiry request to the nearest fund entity in the fund knowledge base;
and forming a fund pair from the fund contained in the successfully matched fund inquiry request and the matched fund entity.
In one embodiment, the obtaining the fund information of the fund pair from the fund knowledge base, and generating the corresponding bipartite graph include:
obtaining the fund information of each fund in the fund pair from the fund knowledge base, wherein the fund information comprises the number of stocks and the stock ratio;
and generating, from the acquired fund information, a bipartite graph of the funds and stocks linked by holding relationships, represented as G = (V, E), wherein V is the fund set and E is the stock set.
In one embodiment, the obtaining the fund information of the fund pair in the fund knowledge base generates a corresponding bipartite graph, and further includes:
setting the similarity of stocks according to a preset rule, wherein the similarity of stocks is denoted S_weighted;
substituting the number of stocks held in common by the funds in the fund pair into formula (1) to calculate an evidence factor, wherein the evidence factor is denoted evidence, and formula (1) is as follows:
in formula (1), |O(A) ∩ O(B)| represents the number of neighbor nodes shared by fund A and fund B, namely the number of stocks they hold in common; it can be seen that evidence is less than 1, and the larger |O(A) ∩ O(B)| is, the closer evidence is to 1;
substituting the stock holding ratios of each fund in the fund pair into formula (2) to calculate the weight factors corresponding to that fund, wherein the weight factor is denoted W, and formula (2) is as follows:
in formula (2), W(A, i) represents the weight of stock i in fund A, and variance(i) represents the variance of the set of weights of all edges incident to i;
substituting the similarity of stocks, the evidence factor and the weight factors into formula (3) or formula (4) to calculate the weighted similarity of the fund pair, wherein formula (3) is as follows:
expanding formula (3) gives formula (4), which is as follows:
in formula (3), O(A) represents the set of neighbor nodes corresponding to the outgoing edges of fund A, namely all the stocks held by fund A; |O(A)| represents the number of stocks held by fund A; and O_i(A) represents the i-th neighbor node of the outgoing edges of fund A, namely a particular stock i held by fund A.
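The formula images are not reproduced in this text. As a hedged reconstruction only, the forms below are those of the published SimRank and SimRank++ algorithms, which the description names, and they agree with the properties stated here (evidence below 1 and approaching 1 as the number of shared holdings grows; a weight that depends on variance(i); a similarity defined over O(A), |O(A)| and O_i(A)). The raw holding ratio w(A, i) and the decay constant C are symbols assumed from those algorithms, not taken from this text, and the patent's actual formulas (1) to (4) may differ.

```latex
% Evidence factor: geometric sum over the number of shared holdings
\mathrm{evidence}(A,B) = \sum_{k=1}^{|O(A)\cap O(B)|} \frac{1}{2^{k}}

% Weight factor: normalized holding ratio damped by the variance at stock i
W(A,i) = e^{-\mathrm{variance}(i)} \cdot \frac{w(A,i)}{\sum_{j\in O(A)} w(A,j)}

% SimRank-style similarity over the holdings of A and B
S(A,B) = \frac{C}{|O(A)|\,|O(B)|} \sum_{i=1}^{|O(A)|} \sum_{j=1}^{|O(B)|} S\bigl(O_i(A),\,O_j(B)\bigr)

% Weighted form combining evidence and weight factors
S_{weighted}(A,B) = \mathrm{evidence}(A,B) \cdot C \sum_{i\in O(A)} \sum_{j\in O(B)} W(A,i)\,W(B,j)\,S_{weighted}(i,j)
```

With two shared holdings, for instance, the evidence term would be 1/2 + 1/4 = 0.75, and each additional shared holding moves it closer to 1.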
In one embodiment, after the obtaining the bipartite graph and applying a preset algorithm to calculate the weighted similarity of the entities included in the bipartite graph, the method further includes:
obtaining the calculated similarity of the fund pair, comparing it with a preset threshold, and if it is higher than the threshold, marking the fund entities contained in the corresponding fund pair;
and outputting the marked entities to other platforms, which perform in-depth relationship mining on them.
Based on the same conception, the present application also provides a fund similarity calculation system, the fund similarity calculation system comprising:
the extraction unit is arranged for extracting the fund knowledge in the information source and then establishing a fund knowledge base, wherein the fund knowledge base comprises a plurality of subgraphs, and each subgraph comprises an entity, a relation and an attribute;
the fusion unit is used for fusing the fund knowledge in the fund knowledge base according to a preset rule;
a storage unit configured to store the fused fund knowledge base in a database;
the query unit is used for receiving a fund query request and screening out a fund entity closest to the fund query request from the fund knowledge base, wherein the fund query request and the fund entity are a pair of fund pairs;
the generation unit is arranged to acquire the fund information of the fund pair from the fund knowledge base and generate a corresponding bipartite graph;
the computing unit is arranged to acquire the bipartite graph, calculate the weighted similarity of the entities contained in the bipartite graph by using a preset algorithm, and display the calculation result.
Based on the same technical concept, the embodiments of the present application further provide a computer device comprising a memory and a processor, wherein the memory stores computer readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the fund similarity calculation method described above.
Based on the same technical concept, the embodiments of the present application also provide a storage medium storing computer readable instructions, which when executed by one or more processors, cause the one or more processors to perform the steps of the fund similarity calculation method as described above.
In the fund similarity calculation method, system, computer device and storage medium described above, a fund knowledge base is established after the fund knowledge in an information source is extracted, wherein the fund knowledge base comprises a plurality of subgraphs, and each subgraph comprises an entity, a relationship and an attribute; the fund knowledge is merged according to a preset rule and stored in a database; a fund inquiry request is received, and the fund entity closest to the fund inquiry request is screened out from the fund knowledge base, wherein the fund inquiry request and the fund entity form a fund pair; the fund information of the fund pair is obtained from the fund knowledge base, and a corresponding bipartite graph is generated; and the weighted similarity of the entities contained in the bipartite graph is calculated by using a preset algorithm, and the calculation result is displayed. Funds and stocks are represented as a bipartite graph, the holding ratio is used as the weight of each holding relationship, and a weighted similarity is calculated; for funds whose holdings are highly similar, the weighted similarity can serve as one criterion for related-fund queries and tracking, which helps in mining fund managers with similar styles and the potential influence relationships among fund managers.
Drawings
FIG. 1 is a flow chart of a method of fund similarity calculation in one embodiment of the present application;
FIG. 2 is a flow chart of knowledge fusion in one embodiment of the present application;
FIG. 3 is a flow chart of generating a bipartite graph in one embodiment of the present application;
FIG. 4 is a schematic illustration of a bipartite graph in one embodiment of the application;
FIG. 5 is a flow chart of computing similarity in one embodiment of the present application;
FIG. 6 is a functional block diagram of a fund similarity calculation system in one embodiment of the present application.
Detailed Description
FIG. 1 is a flowchart of a fund similarity calculation method according to an embodiment of the present application. As shown in FIG. 1, the method includes:
S1, after extracting the fund knowledge in an information source, establishing a fund knowledge base, wherein the fund knowledge base comprises a plurality of subgraphs, and each subgraph comprises an entity, a relation and an attribute;
In this step, the knowledge contained in the information source is extracted through processes such as identification, understanding, screening and induction, and a knowledge element base is established, wherein the knowledge element base comprises entities, relations and attributes.
S2, fusing the fund knowledge in the fund knowledge base according to a preset rule;
In this step, knowledge data from different knowledge sources is integrated under the same framework specification, and the entities in the knowledge element base are assigned IDs. The fusion process includes fusing new data with old data, and further includes assessing knowledge quality and performing weighted fusion according to preset fusion rules.
S3, storing the fused fund knowledge metadata in a database;
In this step, the fused data is stored. The storage database may be a relational database, an RDF database, a graph database or the like, or any combination of these.
S4, receiving a fund inquiry request, and screening out a fund entity closest to the fund inquiry request from the fund knowledge base, wherein the fund inquiry request and the fund entity are a pair of fund pairs;
In this step, after a query request containing fund A is received, keyword matching in the fund knowledge base finds the fund B closest to fund A, and fund A and fund B form a fund pair.
S5, obtaining the fund information of the fund pair from the fund knowledge base, and generating a corresponding bipartite graph;
In this step, the fund information of fund A and fund B in the fund pair is obtained from the fund knowledge base, the fund information comprising the number of stocks held and the stock holding ratios; a bipartite graph of the funds and the stocks they hold is generated from the fund information and passed to a real-time computing unit in the fund similarity calculation platform.
S6, acquiring the bipartite graph, calculating weighted similarity of entities contained in the bipartite graph by using a preset algorithm, and displaying a calculation result;
In this step, the weighted similarity of the fund pair is calculated with the SimRank++ algorithm. The SimRank algorithm is based on a graph model in which the relationships between objects are modeled as a directed graph G = (V, E), where V is the node set of the directed graph, representing all objects in the application domain, and E is the edge set of the directed graph, representing the relationships between objects. For a node a in the graph, the set of neighbor nodes associated with its incoming edges (in-neighbors) is denoted I(a), and the set of neighbor nodes corresponding to its outgoing edges (out-neighbors) is denoted O(a). Because the SimRank algorithm is not sufficiently accurate, this technical scheme adopts the improved SimRank++ algorithm, which builds on SimRank by adding an evidence factor.
This embodiment calculates fund similarity, and doing so helps in mining fund managers with similar styles and the potential influence relationships among fund managers.
In one embodiment, after extracting the fund knowledge in the information source, a fund knowledge base is established, where the fund knowledge base includes a plurality of subgraphs, and each subgraph includes an entity, a relationship and an attribute, and the method includes:
S101, identifying fund knowledge contained in an information source, and identifying the data type and the data source of the fund knowledge;
In this step, the knowledge in the information source is identified by data type and data source. For example, data in an enterprise's internal database is structured data; chart data on websites such as Daily Fund Net is semi-structured data; and free-text data such as fund research reports, fund manager resumes and Snowball community comments is unstructured data.
S102, screening and summarizing the fund knowledge by data type and data source, grouping fund knowledge with the same data type and the same data source into one class;
In this step, knowledge data with the same data type and the same data source is grouped into one class, and a different extraction method is used for each data type: for example, structured data is extracted with manually defined rules, semi-structured data with crawlers or regular-expression matching, and unstructured data with natural language processing.
S103, establishing a fund knowledge base from the summarized and organized fund knowledge.
In this embodiment, by extracting data in the information source and establishing the fund knowledge base, a foundation is provided for further integration of the data in the fund knowledge base.
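The grouping in S102 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the record fields (`data_type`, `source`, `content`), the source names and the `EXTRACTION_METHOD` table are hypothetical, chosen only to mirror the three data types and extraction approaches named above.

```python
from collections import defaultdict

# Hypothetical mapping from data type to the extraction approach named in the text.
EXTRACTION_METHOD = {
    "structured": "manual rules",
    "semi-structured": "crawler or regular expressions",
    "unstructured": "natural language processing",
}

def group_knowledge(items):
    """Group items that share both data type and data source into one class."""
    groups = defaultdict(list)
    for item in items:
        groups[(item["data_type"], item["source"])].append(item["content"])
    return dict(groups)

# Illustrative records only; field names and values are assumptions.
items = [
    {"data_type": "structured", "source": "internal_db", "content": "fund A holdings"},
    {"data_type": "structured", "source": "internal_db", "content": "fund B holdings"},
    {"data_type": "unstructured", "source": "research_report", "content": "manager profile"},
]
groups = group_knowledge(items)
for (data_type, source), contents in groups.items():
    print(source, "->", EXTRACTION_METHOD[data_type], contents)
```

Keying the groups on the (data type, data source) pair keeps the choice of extraction method a simple lookup per class.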
FIG. 2 is a flow chart of knowledge fusion in one embodiment of the present application, as shown in FIG. 2, comprising:
S201, assigning an ID to each entity in the fund knowledge base;
In this step, before the knowledge data in the knowledge element base is fused according to the preset fusion rule, every entity is assigned an ID; for example, fund entities and stock entities use their market trading codes as IDs.
S202, judging each entity in the fund knowledge base according to its ID, wherein entities with the same ID are the same entity and their relationships and attributes are merged, and entities that do not share an ID are not merged;
In this step, data fusion includes fusing new data with old data, and further includes assessing knowledge quality and performing weighted fusion according to the preset fusion rule. Under the preset fusion rule, the entities in the fund knowledge base are assigned IDs, entities with the same ID have their relationships and attributes fused, and entities without the same ID have their similar attributes fused.
In this embodiment, the entities in the fund knowledge base are assigned IDs and then fused according to the preset fusion rule, so that the knowledge in the knowledge base is integrated in an orderly way and the required fund information can be found quickly.
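The ID-based merge of S201 and S202 can be sketched as below. This is a minimal sketch under assumptions: the record layout (`id`, `attributes`, `relations`) and the sample codes are hypothetical, and letting later attribute values overwrite earlier ones is only one possible reading of fusing new data with old data. Quality assessment and weighted fusion are not modeled.

```python
def fuse_entities(records):
    """Merge records that share an ID; records with distinct IDs stay separate."""
    merged = {}
    for rec in records:
        entity = merged.setdefault(
            rec["id"], {"id": rec["id"], "attributes": {}, "relations": set()}
        )
        # Assumption: newer attribute values overwrite older ones.
        entity["attributes"].update(rec.get("attributes", {}))
        entity["relations"].update(rec.get("relations", []))
    return merged

# Illustrative records; the IDs stand in for market trading codes.
records = [
    {"id": "000001", "attributes": {"name": "Fund A"}, "relations": [("holds", "600000")]},
    {"id": "000001", "attributes": {"manager": "Zhang"}, "relations": [("holds", "600036")]},
    {"id": "000002", "attributes": {"name": "Fund B"}, "relations": []},
]
fused = fuse_entities(records)
```

Here the two records with ID `000001` collapse into one entity carrying both attributes and both holding relations, while `000002` is left untouched.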
In one embodiment, the receiving the fund query request and screening the fund knowledge base for a fund entity closest to the fund query request, the fund query request and the fund entity being a pair of funds, includes:
S401, receiving a fund inquiry request, and matching the keywords of the fund inquiry request to the nearest fund entity in the fund knowledge base;
In this step, a fund inquiry request containing fund A is received, and the closest fund entity B is matched in the fund knowledge base according to the keywords of fund A.
S402, forming a fund pair from the fund contained in the successfully matched fund inquiry request and the matched fund entity.
In this step, fund A from the fund inquiry request and the matched fund entity B form the fund pair corresponding to the fund inquiry request.
In this embodiment, matching the fund inquiry request to its corresponding fund entity in the fund knowledge base and forming a fund pair provides the basis for quickly calculating the similarity of the fund pair.
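The text does not say how the keyword match picks the closest fund. As one possible stand-in, the sketch below uses string similarity from Python's standard `difflib` module; the function name `match_closest_fund` and the sample fund names are hypothetical, and a real system might instead match on manager, sector or other keywords.

```python
import difflib

def match_closest_fund(query_fund, fund_names):
    """Return (query_fund, closest_other_fund), or None if there is no candidate."""
    candidates = [name for name in fund_names if name != query_fund]
    # Assumption: closeness is approximated by character-level string similarity.
    matches = difflib.get_close_matches(query_fund, candidates, n=1, cutoff=0.0)
    return (query_fund, matches[0]) if matches else None

knowledge_base_funds = ["Growth Fund A", "Growth Fund B", "Bond Income Trust"]
fund_pair = match_closest_fund("Growth Fund A", knowledge_base_funds)
```

Excluding the queried fund from the candidates avoids pairing a fund with itself.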
FIG. 3 is a flow chart of generating a bipartite graph in one embodiment of the present application, as shown in FIG. 3, the flow chart includes:
S501, acquiring the fund information of each fund in the fund pair from the fund knowledge base, wherein the fund information comprises the number of stocks held and the stock holding ratios;
In this step, the fund information of the fund pair is obtained from the fund knowledge base, including the number of stocks held by fund A and their holding ratios, and the number of stocks held by fund B and their holding ratios.
S502, generating, from the acquired fund information, a bipartite graph of the funds and stocks linked by holding relationships, represented as G = (V, E), wherein V is the fund set and E is the stock set;
In this step, a bipartite graph of funds and stocks is generated from the obtained fund information of the fund pair. A bipartite graph is a graph whose nodes can be divided into two subsets such that the two endpoints of every edge come from different subsets.
Specifically, in the bipartite graph shown in FIG. 4, for example, funds A and B are on the left, stocks a, b, c and d are on the right, and the edges represent the stock holding ratios. The technical scheme is not limited to the enumerated funds A and B, nor to the enumerated stocks a, b, c and d.
In this embodiment, a corresponding bipartite graph is generated for the relevant fund information of the fund pair, so as to provide a basis for the subsequent calculation of the similarity of the fund pair.
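A bipartite graph like the one described for FIG. 4 can be held in plain dictionaries. The sketch below is illustrative only: the holding ratios are made-up numbers, and the `holdings` layout (fund mapped to its stocks and ratios) is an assumed representation, not one specified by the patent.

```python
def build_bipartite_graph(holdings):
    """holdings: {fund: {stock: holding_ratio}} -> (fund nodes, stock nodes, weighted edges)."""
    funds = set(holdings)
    stocks = {stock for positions in holdings.values() for stock in positions}
    # Every edge joins a fund node to a stock node and carries the holding ratio.
    edges = {
        (fund, stock): ratio
        for fund, positions in holdings.items()
        for stock, ratio in positions.items()
    }
    return funds, stocks, edges

# Made-up holding ratios for funds A and B over stocks a to d.
holdings = {
    "A": {"a": 0.5, "b": 0.3, "c": 0.2},
    "B": {"b": 0.4, "c": 0.4, "d": 0.2},
}
funds, stocks, edges = build_bipartite_graph(holdings)
shared = set(holdings["A"]) & set(holdings["B"])  # stocks held by both funds
```

Because every edge key is a (fund, stock) tuple, no edge can connect two nodes from the same side, which is the defining property of a bipartite graph.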
FIG. 5 is a flow chart of computing similarity in one embodiment of the present application, as shown in FIG. 5, comprising:
S601, setting the similarity of stocks according to a preset rule, wherein the similarity of stocks is denoted S_weighted;
In this step, the similarity of stocks is set according to a preset rule; for example, the similarity of a stock with itself is set to 1, the similarity of two stocks in the same industry is set to 0.5, and all other similarities are set to 0.
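The example rule in this step maps directly onto a small function. In this sketch the `industry_of` lookup table and its contents are hypothetical; only the three values 1, 0.5 and 0 come from the text.

```python
def stock_similarity(stock_i, stock_j, industry_of):
    """Preset rule: 1 for the same stock, 0.5 for the same industry, 0 otherwise."""
    if stock_i == stock_j:
        return 1.0
    industry_i = industry_of.get(stock_i)
    industry_j = industry_of.get(stock_j)
    if industry_i is not None and industry_i == industry_j:
        return 0.5
    return 0.0

# Hypothetical industry labels for illustration.
industry_of = {"a": "banking", "b": "banking", "c": "energy"}
```

Stocks missing from the lookup table are treated as belonging to no industry, so they only match themselves.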
S602, substituting the number of stocks held in common by the fund pair into formula (1) to calculate an evidence factor, wherein the evidence factor is denoted evidence, and formula (1) is as follows:
in formula (1), |O(A) ∩ O(B)| represents the number of neighbor nodes shared by fund A and fund B, namely the number of stocks they hold in common; it can be seen that evidence is less than 1, and the larger |O(A) ∩ O(B)| is, the closer evidence is to 1;
In this step, the number of stocks held in common by fund A and fund B is determined from all the stocks held by fund A and fund B in the fund pair, and this number is substituted into formula (1) to calculate the evidence factor.
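Formula (1) itself is not reproduced in this text. The sketch below assumes the geometric-sum evidence term of the published SimRank++ algorithm, which matches the properties stated above (always less than 1, and closer to 1 as the number of shared stocks grows); the patent's exact formula may differ.

```python
def evidence(stocks_a, stocks_b):
    """Assumed SimRank++ form: sum of 1/2**k for k = 1 .. number of shared stocks."""
    n_shared = len(set(stocks_a) & set(stocks_b))
    return sum(0.5 ** k for k in range(1, n_shared + 1))

ev = evidence({"a", "b", "c"}, {"b", "c", "d"})  # two shared stocks: 0.5 + 0.25
```

With no shared stocks the factor is 0, so under this assumed form two funds with disjoint holdings receive no evidence at all.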
S603, substituting the stock holding ratios of each fund in the fund pair into formula (2) to calculate the weight factors corresponding to that fund, wherein the weight factor is denoted W, and formula (2) is as follows:
in formula (2), W(A, i) represents the weight of stock i in fund A, and variance(i) represents the variance of the set of weights of all edges incident to i;
In this step, the holding ratio of each stock held by fund A and fund B in the fund pair is substituted into formula (2) to calculate the weight factor of each stock in the corresponding fund.
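Formula (2) is likewise missing from this text. The sketch below assumes the SimRank++ weight, in which the normalized holding ratio is multiplied by exp(-variance(i)); using the population variance and normalizing by the fund's total holding ratio are assumptions taken from that algorithm, not from the patent.

```python
import math
from statistics import pvariance

def weight_factors(holdings, fund):
    """Assumed form: W(fund, i) = exp(-variance(i)) * ratio(fund, i) / sum of the fund's ratios."""
    total = sum(holdings[fund].values())
    factors = {}
    for stock, ratio in holdings[fund].items():
        # Weights of all edges incident to this stock, across every fund in the graph.
        incident = [positions[stock] for positions in holdings.values() if stock in positions]
        factors[stock] = math.exp(-pvariance(incident)) * ratio / total
    return factors

# Made-up holding ratios; stock b is held by both funds with different ratios.
holdings = {"A": {"a": 0.6, "b": 0.4}, "B": {"b": 0.2, "c": 0.8}}
weights_a = weight_factors(holdings, "A")
```

A stock held by only one fund has zero variance and keeps its full normalized ratio, while a stock whose ratios differ across funds is damped slightly.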
S604, substituting the similarity of stocks, the evidence factor and the weight factors into formula (3) or formula (4) to calculate the weighted similarity of the fund pair, wherein formula (3) is as follows:
in formula (3), O(A) represents the set of neighbor nodes corresponding to the outgoing edges of fund A, namely all the stocks held by fund A; |O(A)| represents the number of stocks held by fund A; and O_i(A) represents the i-th neighbor node of the outgoing edges of fund A, namely a particular stock i held by fund A;
expanding formula (3) gives formula (4),
In this step, the preset similarity of stocks, the evidence factor of the fund pair and the weight factors of the stocks in each fund are substituted into formula (3) or formula (4) to calculate the weighted similarity of the fund pair.
In this embodiment, the similarity of fund A and fund B in the fund pair is calculated with the SimRank++ algorithm, which helps in mining fund managers with similar styles and the potential influence relationships among fund managers.
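Putting steps S601 to S604 together, the sketch below computes a single-pass weighted similarity for one fund pair. Since formulas (1) to (4) are not reproduced in this text, it assumes the SimRank++ forms: a geometric-sum evidence factor, exp(-variance) weights, and a decay constant (the `decay` parameter, with an arbitrary default of 0.8) that does not appear in the description. It is not iterated, because the stock similarities are preset rather than recomputed.

```python
import math
from statistics import pvariance

def weighted_similarity(holdings, fund_a, fund_b, stock_sim, decay=0.8):
    """evidence * decay * sum over stock pairs of W(a, i) * W(b, j) * stock_sim(i, j)."""
    pos_a, pos_b = holdings[fund_a], holdings[fund_b]
    n_shared = len(set(pos_a) & set(pos_b))
    evidence = sum(0.5 ** k for k in range(1, n_shared + 1))  # assumed form of formula (1)

    def weights(positions):
        # Assumed form of formula (2): normalized ratio damped by exp(-variance).
        total = sum(positions.values())
        out = {}
        for stock, ratio in positions.items():
            incident = [p[stock] for p in holdings.values() if stock in p]
            out[stock] = math.exp(-pvariance(incident)) * ratio / total
        return out

    w_a, w_b = weights(pos_a), weights(pos_b)
    total = sum(w_a[i] * w_b[j] * stock_sim(i, j) for i in w_a for j in w_b)
    return evidence * decay * total

def same_stock_only(i, j):
    """Simplest preset rule: 1 for the same stock, 0 otherwise."""
    return 1.0 if i == j else 0.0

# Made-up holdings: A and B are identical, C shares nothing with A.
holdings = {"A": {"a": 0.5, "b": 0.5}, "B": {"a": 0.5, "b": 0.5}, "C": {"c": 1.0}}
sim_ab = weighted_similarity(holdings, "A", "B", same_stock_only)
sim_ac = weighted_similarity(holdings, "A", "C", same_stock_only)
```

In this example the identical funds A and B score about 0.3 (evidence 0.75, decay 0.8, weighted sum 0.5), while A and C score 0 because they share no holdings.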
In one embodiment, after the obtaining the bipartite graph and applying a preset algorithm to calculate the weighted similarity of the entities included in the bipartite graph, the method further includes:
obtaining the calculated similarity of the fund pair, comparing it with a preset threshold, and if it is higher than the threshold, marking the entities contained in the corresponding fund pair; and outputting the marked entities to other platforms, which perform in-depth relationship mining on them.
In this embodiment, the similarity calculated for any fund pair is compared with a preset threshold, and results higher than the threshold are output to other platforms for in-depth relationship mining, which helps in mining fund managers with similar styles and the potential influence relationships among fund managers.
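The threshold comparison can be sketched as a simple filter. The similarity values and the threshold of 0.2 below are made up for illustration; the text does not specify a threshold value or how results are handed to other platforms.

```python
def flag_similar_pairs(similarities, threshold):
    """Return the fund pairs whose weighted similarity is strictly above the threshold."""
    return [pair for pair, score in similarities.items() if score > threshold]

# Made-up similarity scores keyed by fund pair.
similarities = {("A", "B"): 0.30, ("A", "C"): 0.05}
flagged = flag_similar_pairs(similarities, threshold=0.2)
```

Only the flagged pairs would then be passed on for further relationship mining.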
Based on the same conception, the application also provides a fund similarity calculation system. As shown in FIG. 6, the fund similarity calculation system comprises an extraction unit, a fusion unit, a storage unit, a query unit, a generation unit and an operation unit, wherein: the extraction unit is configured to extract the fund knowledge in the information source and then establish a fund knowledge base, wherein the fund knowledge base comprises a plurality of subgraphs, and each subgraph comprises an entity, a relation and an attribute; the fusion unit is configured to fuse the fund knowledge in the fund knowledge base according to a preset rule; the storage unit is configured to store the fused fund knowledge base in a database; the query unit is configured to receive a fund inquiry request and match the fund entity closest to the fund inquiry request in the fund knowledge base, wherein the fund inquiry request and the fund entity form a fund pair; the generation unit is configured to acquire the fund information of the fund pair from the fund knowledge base and generate a corresponding bipartite graph; and the operation unit is configured to acquire the bipartite graph, calculate the weighted similarity of the fund pair by using a preset algorithm, and display the calculation result.
Based on the same technical concept, the embodiments of the present application further provide a computer device comprising a memory and a processor, wherein the memory stores computer readable instructions which, when executed by one or more processors, cause the one or more processors to implement the steps of the fund similarity calculation method in the foregoing embodiments.
Based on the same technical concept, the embodiments of the present application further provide a storage medium storing computer readable instructions which, when executed by one or more processors, cause the one or more processors to implement the steps of the fund similarity calculation method in the above embodiments. The storage medium may be a non-volatile storage medium.
Those of ordinary skill in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing related hardware; the program may be stored in a computer readable storage medium, and the storage medium may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.
The technical features of the above-described embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to be within the scope of this description.
The above-described embodiments represent only some exemplary embodiments of the present application. Although they are described specifically and in detail, they are not to be construed as limiting the scope of the patent application. It should be noted that those skilled in the art could make various modifications and improvements without departing from the spirit of the present application, and such modifications and improvements fall within the scope of protection claimed in the present application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.
Claims (7)
1. A fund similarity calculation method, characterized by comprising the following steps:
extracting fund data from an information source and then establishing a fund knowledge base, wherein the fund knowledge base comprises a plurality of subgraphs, and each subgraph comprises entities, relationships and attributes;
identifying each entity in the fund knowledge base by an ID; comparing the entities in the fund knowledge base according to their IDs, wherein entities having the same ID are treated as the same entity and their relationships and attributes are merged, and an entity whose ID matches no other entity is not merged;
storing the fused fund knowledge metadata in a database;
receiving a fund query request, and screening out, from the fund knowledge base, the fund entity closest to the fund query request, wherein the fund query request and the fund entity form a fund pair;
obtaining the fund information of each fund in the fund pair from the fund knowledge base, wherein the fund information comprises the number of stocks held and the stock holding ratio; generating, from the acquired fund information, a bipartite graph of the funds and the stocks they hold, represented as G = (V, E), wherein V is the fund set and E is the stock set;
obtaining the bipartite graph and setting the similarity of stocks according to a preset rule, wherein the similarity of stocks is denoted S_weighted;
substituting the number of identical stocks held by the funds in the fund pair into formula (1) to calculate an evidence factor, wherein the evidence factor is denoted evidence, and formula (1) is as follows:
in formula (1), |O(A) ∩ O(B)| represents the number of adjacent nodes shared by fund A and fund B, that is, the number of stocks that both funds hold; it can be seen that evidence is less than 1 and approaches 1 as |O(A) ∩ O(B)| grows;
substituting stock ratios of the funds contained in the pair of funds into a formula (2) respectively to calculate weight factors corresponding to the funds, wherein the weight factors are represented by W, and the formula (2) is as follows:
in formula (2), W(A, i) represents the weight of stock i held by fund A, and variance(i) represents the variance of the set of weights of all edges adjacent to i;
substituting the similarity of stocks, the evidence factor and the weight factor into formula (3) or formula (4) to calculate the weighted similarity between the two funds of the fund pair, and displaying the calculation result, wherein formula (3) is as follows:
expanding formula (3) yields formula (4), which is as follows:
in formula (3), O(A) represents the set of adjacent nodes reached by the outgoing edges of fund A, that is, all stocks held by fund A; |O(A)| represents the number of stocks held by fund A; and O_i(A) represents the adjacent node i reached by an outgoing edge of fund A, that is, a particular stock i held by fund A.
2. The fund similarity calculation method of claim 1, wherein the step of extracting fund data from an information source and then establishing a fund knowledge base, the fund knowledge base comprising a plurality of subgraphs, each subgraph comprising entities, relationships and attributes, comprises:
identifying fund data in the information source, and identifying the data type and the data source of the fund data;
screening and summarizing the fund data according to data type and data source, wherein fund data having the same data type and the same data source are grouped into one category;
and establishing the fund knowledge base from the grouped fund data.
3. The fund similarity calculation method of claim 1, wherein the step of receiving a fund query request and screening out, from the fund knowledge base, the fund entity closest to the fund query request, the fund query request and the fund entity forming a fund pair, comprises:
receiving the fund query request, and matching the keywords of the fund query request against the fund knowledge base to find the closest fund entity;
and constructing a fund pair from the fund named in the successfully matched fund query request and the matched fund entity.
4. The fund similarity calculation method of claim 1, wherein after the bipartite graph is obtained and the weighted similarity of the entities contained in the bipartite graph is calculated using a preset algorithm, the method further comprises:
obtaining the calculated similarity of the fund pair, comparing it with a preset threshold, and, if it is higher than the threshold, marking the entities contained in the corresponding fund pair;
and outputting the marked entities to other platforms, wherein the other platforms perform in-depth relationship mining on the marked entities.
5. A fund similarity calculation system, comprising:
an extraction unit configured to extract fund knowledge from an information source and then establish a fund knowledge base, wherein the fund knowledge base comprises a plurality of subgraphs, and each subgraph comprises entities, relationships and attributes;
a fusion unit configured to identify each entity in the fund knowledge base by an ID and to compare the entities in the fund knowledge base according to their IDs, wherein entities having the same ID are treated as the same entity and their relationships and attributes are merged, and an entity whose ID matches no other entity is not merged;
a storage unit configured to store the fused fund knowledge base in a database;
a query unit configured to receive a fund query request and screen out, from the fund knowledge base, the fund entity closest to the fund query request, wherein the fund query request and the fund entity form a fund pair;
a generation unit configured to acquire, from the fund knowledge base, the fund information of each fund in the fund pair, the fund information comprising the number of stocks held and the stock holding ratio, and to generate, from the acquired fund information, a bipartite graph of the funds and the stocks they hold, represented as G = (V, E), wherein V is the fund set and E is the stock set;
an operation unit configured to acquire the bipartite graph and set the similarity of stocks according to a preset rule, wherein the similarity of stocks is denoted S_weighted;
substituting the number of identical stocks held by the funds in the fund pair into formula (1) to calculate an evidence factor, wherein the evidence factor is denoted evidence, and formula (1) is as follows:
in formula (1), |O(A) ∩ O(B)| represents the number of adjacent nodes shared by fund A and fund B, that is, the number of stocks that both funds hold; it can be seen that evidence is less than 1 and approaches 1 as |O(A) ∩ O(B)| grows;
substituting stock ratios of the funds contained in the pair of funds into a formula (2) respectively to calculate weight factors corresponding to the funds, wherein the weight factors are represented by W, and the formula (2) is as follows:
in formula (2), W(A, i) represents the weight of stock i held by fund A, and variance(i) represents the variance of the set of weights of all edges adjacent to i;
substituting the similarity of stocks, the evidence factor and the weight factor into formula (3) or formula (4) to calculate the weighted similarity between the two funds of the fund pair, and displaying the calculation result, wherein formula (3) is as follows:
expanding formula (3) yields formula (4), which is as follows:
in formula (3), O(A) represents the set of adjacent nodes reached by the outgoing edges of fund A, that is, all stocks held by fund A; |O(A)| represents the number of stocks held by fund A; and O_i(A) represents the adjacent node i reached by an outgoing edge of fund A, that is, a particular stock i held by fund A.
6. A computer device comprising a memory and a processor, the memory having stored therein computer readable instructions which, when executed by one or more of the processors, cause the one or more processors to perform the steps of the fund similarity calculation method of any of claims 1 to 4.
7. A storage medium storing computer-readable instructions that, when executed by one or more processors, cause one or more of the processors to perform the steps of the fund similarity calculation method of any of claims 1 to 4.
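The evidence factor and weight factor recited in claims 1 and 5 can be sketched as follows. Formulas (1) and (2) are not reproduced in this text, so the forms below are assumptions chosen to match the stated properties (evidence below 1 and approaching 1 as the number of shared stocks grows; weight depending on the holding ratio and on variance(i)). They follow the evidence and weight definitions commonly used with weighted SimRank variants and may differ from the patented formulas. The function names and the nested-dictionary graph layout are hypothetical.

```python
import math
from statistics import pvariance


def evidence(holdings_a, holdings_b):
    """Assumed form: sum_{i=1..n} 2**-i = 1 - 2**-n, n = |O(A) ∩ O(B)|."""
    n = len(set(holdings_a) & set(holdings_b))
    return 1.0 - 2.0 ** (-n)


def weight(graph, fund, stock):
    """Assumed form: exp(-variance(stock)) * ratio / sum of the fund's ratios.

    variance(stock) is taken over the ratios on every edge that touches
    the stock, across all funds present in the graph.
    """
    incident = [held[stock] for held in graph.values() if stock in held]
    total = sum(graph[fund].values())
    return math.exp(-pvariance(incident)) * graph[fund][stock] / total


# graph maps fund -> {stock: holding ratio}; the values are made up.
graph = {
    "A": {"X": 0.6, "Y": 0.4},
    "B": {"X": 0.5, "Z": 0.5},
}
ev = evidence(graph["A"], graph["B"])  # the funds share one stock, X
w_ax = weight(graph, "A", "X")
```

Under these assumptions, two funds with no stock in common get an evidence factor of 0, and each additional shared stock halves the remaining distance to 1.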
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811019295.1A CN109408643B (en) | 2018-09-03 | 2018-09-03 | Fund similarity calculation method, system, computer equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811019295.1A CN109408643B (en) | 2018-09-03 | 2018-09-03 | Fund similarity calculation method, system, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109408643A CN109408643A (en) | 2019-03-01 |
CN109408643B CN109408643B (en) | 2023-05-30 |
Family
ID=65463876
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811019295.1A Active CN109408643B (en) | 2018-09-03 | 2018-09-03 | Fund similarity calculation method, system, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109408643B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110111110A (en) * | 2019-04-01 | 2019-08-09 | 北京三快在线科技有限公司 | The method and apparatus of knowledge based map detection fraud, storage medium |
CN111428053B (en) * | 2020-03-30 | 2023-10-20 | 西安交通大学 | Construction method of tax field-oriented knowledge graph |
CN111563133A (en) * | 2020-05-06 | 2020-08-21 | 支付宝(杭州)信息技术有限公司 | Method and system for data fusion based on entity relationship |
CN112800285B (en) * | 2021-02-03 | 2024-10-18 | 京东科技控股股份有限公司 | Data query method, device, storage medium and product based on graph database |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8285719B1 (en) * | 2008-08-08 | 2012-10-09 | The Research Foundation Of State University Of New York | System and method for probabilistic relational clustering |
CN106126828A (en) * | 2016-06-28 | 2016-11-16 | 北京大学 | A kind of enhanced scalability SimRank computational methods based on unidirectional migration |
CN107742131A (en) * | 2017-11-06 | 2018-02-27 | 众安信息技术服务有限公司 | Financial asset sorting technique and device |
CN107943873A (en) * | 2017-11-13 | 2018-04-20 | 平安科技(深圳)有限公司 | Knowledge mapping method for building up, device, computer equipment and storage medium |
CN108363816A (en) * | 2018-03-21 | 2018-08-03 | 北京理工大学 | Open entity relation extraction method based on sentence justice structural model |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8195693B2 (en) * | 2004-12-16 | 2012-06-05 | International Business Machines Corporation | Automatic composition of services through semantic attribute matching |
Non-Patent Citations (2)
Title |
---|
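Wan Hualin, Hu Hong, Shi Zhongzhi. Image similarity measurement using bipartite graph matching. Journal of Computer-Aided Design & Computer Graphics. 2002, (No. 11), full text. * |
Ma Yunlong; Lin Yuan; Lin Hongfei. Research on query expansion based on the weight-normalized SimRank method. Journal of Chinese Information Processing. 2011, (No. 01), full text. * |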
Also Published As
Publication number | Publication date |
---|---|
CN109408643A (en) | 2019-03-01 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN109299090B (en) | Foundation centrality calculating method, system, computer equipment and storage medium | |
CN109408643B (en) | Fund similarity calculation method, system, computer equipment and storage medium | |
US11188537B2 (en) | Data processing | |
US11354282B2 (en) | Classifying an unmanaged dataset | |
US10019442B2 (en) | Method and system for peer detection | |
Karthikeyan et al. | A survey on association rule mining | |
CN106844407B (en) | Tag network generation method and system based on data set correlation | |
CN107729336A (en) | Data processing method, equipment and system | |
CN110019689A (en) | Position matching process and position matching system | |
CN110795524B (en) | Main data mapping processing method and device, computer equipment and storage medium | |
CN106407208A (en) | Establishment method and system for city management ontology knowledge base | |
US20220129635A1 (en) | Semantic model instantiation method, system and apparatus | |
CN113010688A (en) | Knowledge graph construction method, device and equipment and computer readable storage medium | |
CN110287292B (en) | Judgment criminal measuring deviation degree prediction method and device | |
US11295078B2 (en) | Portfolio-based text analytics tool | |
CN112907358A (en) | Loan user credit scoring method, loan user credit scoring device, computer equipment and storage medium | |
Dharmawan et al. | Book recommendation using Neo4j graph database in BibTeX book metadata | |
Abul-Basher et al. | Tasweet: optimizing disjunctive regular path queries in graph databases | |
CN116049379A (en) | Knowledge recommendation method, knowledge recommendation device, electronic equipment and storage medium | |
CN113781246B (en) | Strategy generation method and device based on preset label and storage medium | |
CN114254617A (en) | Method, device, computing equipment and storage medium for revising clauses | |
CN112131259B (en) | Similar malicious software recommendation method, device, medium and equipment | |
CN117993772A (en) | Knowledge graph-based crowdsourcing data acquisition method and system and electronic equipment | |
CN115293479A (en) | Public opinion analysis workflow system and method thereof | |
Giacometti et al. | Comparison table generation from knowledge bases |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||