CN111079035B - Domain searching and sorting method based on dynamic map link analysis - Google Patents

Domain searching and sorting method based on dynamic map link analysis

Info

Publication number
CN111079035B
Authority
CN
China
Prior art keywords
file
node
entity
authority
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911146865.8A
Other languages
Chinese (zh)
Other versions
CN111079035A (en)
Inventor
鲍家坤
刘思培
高天成
曹玲玲
张志虎
袁鸯
宋春林
侯海婷
邹媛媛
童安玲
李金龙
李香亭
王娟
杨磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
North Information Control Institute Group Co ltd
Original Assignee
North Information Control Institute Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by North Information Control Institute Group Co ltd filed Critical North Information Control Institute Group Co ltd
Priority to CN201911146865.8A priority Critical patent/CN111079035B/en
Publication of CN111079035A publication Critical patent/CN111079035A/en
Application granted granted Critical
Publication of CN111079035B publication Critical patent/CN111079035B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9538 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/9038 Presentation of query results
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the field of Internet search, and in particular relates to a domain search ranking method based on dynamic map link analysis. The method establishes semantic-level link relations among the file resources in a domain search, performs calculation on those relations from two aspects, authority and relevance, and finally produces a fused ranking of the search results. The method comprises the following steps: dynamic construction of a domain map oriented to search ranking; offline incremental calculation of file node authority based on the full graph; online calculation of file node relevance based on a search subgraph; and fused ranking of search results based on authority and relevance. Using the entities and relations in the text content of the files as ties, the method associates originally isolated files at the semantic level, overcomes the information-island problem of individual files in search ranking, analyzes file nodes at both the authority level and the relevance level, and finally achieves a fused ranking of the search results.

Description

Domain searching and sorting method based on dynamic map link analysis
Technical Field
The invention belongs to the field of Internet search, and in particular relates to a domain search ranking method based on dynamic map link analysis.
Background
Helping users locate the resources they need accurately and quickly is a consistent goal of search engines. However, as information is continuously generated and accumulated, a search often returns a large number of results. A search engine must therefore rely on an effective search ranking method to return the results the user wants and present them first. Compared with general Internet search, domain search serves more specific users with more specific purposes, and therefore places higher demands on search ranking.
Conventional search ranking methods based on word frequency and word position rely on too narrow a ranking basis and cannot take the quality of file resources into account. Existing search ranking methods based on web page link analysis (such as PageRank and HillTop) cannot be applied directly to domain search, which lacks web page link relations. Existing search ranking methods based on learning user browsing preferences (such as RankSVM) usually train on "user-query" records as isolated samples; they handle historical search requests from historical users well but struggle to rank effectively for new users or new requests, and even when improved by using similar "user-query" records they are unsuitable for domain search scenarios with few users. The bid-based ranking used by Internet search engines runs counter to the professionalism and authority that domain search requires and is not applicable.
Disclosure of Invention
The invention aims to provide a domain searching and sorting method based on dynamic map link analysis.
The technical solution for realizing the purpose of the invention is as follows:
The method first establishes semantic-level link relations among the file resources in a domain search, then performs calculation from two aspects, authority and relevance, and finally produces a fused ranking of the search results. The specific steps are as follows:
Step (1): dynamic construction of a domain map oriented to search ranking. Taking the various file sets of the domain as input, a domain map is constructed.
Step (2): offline incremental calculation of file node authority based on the full graph. Taking the domain map from step (1) as input, the authority of each file node in the domain map is calculated.
Step (3): online calculation of file node relevance based on a search subgraph. Taking the domain map and the user's search terms as input, the search subgraph related to the search is extracted from the whole domain map, and the relevance of each file node in the subgraph is calculated.
Step (4): fused ranking of search results based on authority and relevance. Taking the authority and relevance of each file node in the search subgraph from step (3) as input, the ranking degree of each file node is calculated, and the results are sorted by ranking degree and returned to the user.
Compared with the prior art, the invention has the following notable advantages:
(1) The domain map construction method oriented to search ranking uses the entities and relations in the text content of the files as ties to associate originally isolated files at the semantic level. This overcomes the information-island problem of individual files in search ranking and brings all domain files into one association system for evaluation; the constructed domain map lays the foundation for analyzing the authority and relevance of each file node.
(2) The definitions and calculation methods for file node authority and relevance based on the domain map, given in step (2) and step (3), quantitatively evaluate the authority of a file node within the whole domain map and the relevance between a file node and the user's search keywords within the search subgraph, thereby enabling the ranking method of step (4), which fuses authority and relevance.
(3) The dynamic construction method and incremental calculation method given in step (1) and step (2) build the domain map dynamically and calculate file node authority over the whole domain map incrementally as the files to be searched are added, deleted, and modified, which reduces the system's computation and improves its efficiency and practicality.
Drawings
FIG. 1 is a flow chart of a search ranking method of the present invention.
Fig. 2 is a partial schematic diagram of a domain map of the search ranking method of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative and intended to explain the present invention and should not be construed as limiting the invention.
FIG. 1 shows the overall flow of the domain search ranking method based on dynamic map link analysis according to an embodiment of the invention, which includes the following steps:
Step S1: dynamic construction of a domain map oriented to search ranking. FIG. 2 is a partial schematic diagram of a domain map, which consists of four kinds of elements: entity nodes, file nodes, association edges, and link edges. File nodes correspond to all the files to be searched in the domain search; file types include, but are not limited to, text files, multimedia files, and database files. Entity nodes correspond to the entities described in the file content and are obtained through entity extraction, entity disambiguation, coreference resolution, and similar steps. Association edges describe the relations between entity nodes; they are obtained by relation extraction once the entity nodes have been extracted, and new potential relations between entities can be discovered by relation reasoning. Each association edge has a weight that represents how close the relation between the two entity nodes is. Link edges connect entity nodes to file nodes: a link edge exists where an entity node was extracted from a file's content. Each link edge has a weight that represents how closely the file and the entity are related.
Step S101: construction of "entity nodes" and "association edges" and calculation of "association weights". Drawing on prior knowledge of the domain, entity recognition, entity disambiguation (which resolves one name referring to different things), and coreference resolution (which resolves different names referring to the same thing) are applied to the text content of the files to obtain named entities that are as accurate and unambiguous as possible; these entities form the "entity nodes" of the domain map. Next, the relations between entities in the text content are identified to obtain candidate relations, and disambiguation and resolution of these relations yield accurate, unambiguous relations, which form the "association edges" of the domain map. Note that association edges are directed and appear in pairs. Each association edge has an "association weight" that represents how close the relation between the entities is; it can be represented by, but is not limited to, the co-occurrence of the entities.
The association weight calculation comprises two steps, initial association weight calculation and normalization. If the entities at the two ends of an association edge co-occur in k files in total, the initial association weight corrValue'(i, j) of that edge equals k. After the initial weights of all association edges have been calculated, the initial weights corrValue'(i, j) of the edges leaving the same entity node are normalized in proportion to their values, giving the association weights corrValue(i, j).
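As a non-limiting illustration, the two-step association weight calculation can be sketched in Python as follows. The function name `association_weights`, the input layout (a mapping from file id to the set of entities extracted from that file), and the sample data are assumptions made only for this sketch.

```python
from collections import defaultdict
from itertools import combinations


def association_weights(file_entities):
    """Return corrValue(i, j) for every directed association edge.

    file_entities maps a file id to the set of entity names extracted from it
    (hypothetical input layout).
    """
    initial = defaultdict(int)  # corrValue'(i, j): number of files where i and j co-occur
    for entities in file_entities.values():
        for a, b in combinations(sorted(entities), 2):
            initial[(a, b)] += 1  # association edges are directed and appear in pairs
            initial[(b, a)] += 1

    outgoing_total = defaultdict(int)
    for (i, _), k in initial.items():
        outgoing_total[i] += k

    # Normalize the edges leaving the same entity node so that they sum to 1.
    return {(i, j): k / outgoing_total[i] for (i, j), k in initial.items()}


files = {"f1": {"A", "B"}, "f2": {"A", "B", "C"}, "f3": {"A", "C"}}
corr = association_weights(files)
print(corr[("A", "B")], corr[("B", "A")])
```

In this sample, A co-occurs with B in two files and with C in two files, so corrValue(A, B) = 2/4 = 0.5, whereas corrValue(B, A) = 2/3; the weights of a pair of opposite edges generally differ after normalization.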
Step S102: construction of "file nodes" and "link edges" and calculation of "link weights". File nodes in the domain map are in one-to-one correspondence with the files to be searched and can be constructed directly: each file node represents one file to be searched. If an "entity node" was extracted from the content of the file corresponding to a "file node", a link edge exists between that entity node and that file node. Link weight calculation comprises two processes, initial link weight calculation and normalization. The initial link weight takes two aspects into account: the degree of association α of the entity node to the file node, and the degree of importance β of the file node to the entity node.
(1) When it is difficult to manually classify file nodes or score their importance to entity nodes, β ≡ 1 for all file nodes. The initial link weight is then linkValue' = α; after the initial weights of all link edges have been calculated, the initial weights of the link edges connected to the same file node are normalized to obtain the link weight linkValue. α is calculated as follows:
$$\alpha = \mathrm{TF}(t,d)\cdot\mathrm{IDF}(t,d)\cdot\alpha_1(t,d), \qquad \mathrm{IDF}(t,d) = \log\frac{N}{N_{t,d}+\gamma}$$
where t is the entity name of the entity node; d is the file to be retrieved; TF(t, d) is the frequency with which t occurs in d; N is the number of files in the set of files to be retrieved; $N_{t,d}$ is the number of files containing entity t; γ is set to 0.01 so that the denominator is never zero; and $\alpha_1(t,d)$ is a position coefficient that is greater than 1 when the entity name t appears in a special position such as the title, abstract, or keywords, and equal to 1 otherwise.
(2) When entities and files can be manually classified and scored by domain, for example when files in the financial domain are divided into types such as reports, accounts, and financial news, files in the mechanical domain into types such as instruction manuals, operation manuals, and reference materials, and files in the software domain into types such as software test descriptions, software development manuals, and software test reports, a β value is set for the importance of each file type within its domain. The initial link weight is then linkValue' = α·β, with α calculated as in case (1).
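By way of example only, the initial link weight and its per-file normalization may be sketched as below. The helper names, the position coefficient 1.5, and the sample counts are illustrative assumptions; only γ = 0.01 and the TF·IDF·α1 form come from the text. Case (1) corresponds to `beta=1.0`.

```python
import math
from collections import defaultdict

GAMMA = 0.01  # value stated in the text; keeps the IDF denominator non-zero


def initial_link_weight(tf, n_files, n_files_with_entity, position_coeff=1.0, beta=1.0):
    """linkValue' = alpha * beta, with alpha = TF * IDF * alpha_1."""
    idf = math.log(n_files / (n_files_with_entity + GAMMA))
    return tf * idf * position_coeff * beta


def normalize_per_file(initial):
    """initial maps (file_id, entity) to linkValue'; returns the normalized linkValue.

    Link edges of a file whose initial weights sum to zero are skipped.
    """
    totals = defaultdict(float)
    for (file_id, _), weight in initial.items():
        totals[file_id] += weight
    return {key: weight / totals[key[0]] for key, weight in initial.items() if totals[key[0]]}


# Hypothetical file d1: entity A occurs 3 times (also in the title), entity B once.
initial = {
    ("d1", "A"): initial_link_weight(tf=3, n_files=10, n_files_with_entity=2, position_coeff=1.5),
    ("d1", "B"): initial_link_weight(tf=1, n_files=10, n_files_with_entity=5),
}
link_value = normalize_per_file(initial)
print(link_value)
```

Note that with this IDF form an entity that occurs in every file receives a slightly negative IDF, because N/(N + γ) is below 1; how such entities are treated is left open here.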
Step S103: dynamic incremental update of the map. In domain search, the files to be searched may be updated, so an incremental update mechanism for the domain map is needed to avoid rebuilding the whole map whenever a few files change. The set of files to be searched can change in three ways: files are added, deleted, or modified. When a file is added, the entity nodes, file node, association edges, and link edges corresponding to the new file are extracted as in steps S101 and S102, and the weights of the affected association edges and link edges are updated. When a file is deleted, the corresponding file node and the edges connected to it are deleted first; if this leaves an entity node with no link edges, that entity node and its association edges are deleted as well; the weights of the affected association edges and link edges are then updated. When a file is modified, the domain map is updated by the equivalent deletion followed by addition.
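The deletion case can be illustrated with the following minimal sketch. The dictionary layout of the map (`links`, `assoc`) and the function name `delete_file` are assumptions for illustration; recomputing the weights of the affected edges is left to the routines of steps S101 and S102.

```python
def delete_file(links, assoc, file_id):
    """Remove a file node from the map and report what must be recomputed.

    links: {file_id: {entity: link_weight}}
    assoc: {(i, j): association_weight}
    (hypothetical in-memory layout of the domain map)
    """
    touched = set(links.pop(file_id, {}))  # entities the deleted file linked to
    still_linked = {e for ents in links.values() for e in ents}
    orphans = touched - still_linked       # entity nodes left without any link edge
    for edge in [e for e in assoc if e[0] in orphans or e[1] in orphans]:
        del assoc[edge]                    # drop association edges of removed entities
    # (deleted entity nodes, surviving entity nodes whose edge weights need updating)
    return orphans, touched - orphans


links = {"f1": {"A": 1.0}, "f2": {"A": 0.5, "B": 0.5}}
assoc = {("A", "B"): 1.0, ("B", "A"): 1.0}
deleted, to_update = delete_file(links, assoc, "f2")
print(deleted, to_update, assoc)
```

A modified file would be handled in this sketch as `delete_file` followed by the add-file path.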
Step S2: offline incremental calculation of file node authority based on the full graph. The invention treats the entity nodes of the domain map as the states the system can reach; the transition probabilities between states are determined by the weights of the association edges between entity nodes, so the whole system forms a Markov chain, and the stationary distribution of that Markov chain gives the authority of the entity nodes. If there are N entity nodes in total, the transition matrix is $B_{N\times N}$ (a matrix B with N rows and N columns) and the authority vector of the N entity nodes is $x_{N\times 1}$ (a column vector x with N rows), with Bx = x. The method is based on the Monte Carlo method: random walks simulate users visiting entity nodes, and when the domain map changes, the random walks can be updated incrementally for the affected entity nodes, giving an incremental calculation of entity node authority. The authority of a file node equals the sum, over its link edges, of the link weight multiplied by the authority of the linked entity node.
Step S201: design of "entity node" authority.
If entity node i has an association edge pointing to entity node j, then $A_{ji}$ = corrValue(i, j); otherwise $A_{ji}$ = 0. Because step S101 normalizes the weights of the association edges leaving the same entity node, the i-th column of matrix A sums to 1 whenever entity node i has an association edge pointing to another entity node. If entity node i does not point to any other entity node, $A_{ii}$ is set to 1. This guarantees that A is a transition matrix in which every column sums to 1.
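The construction of matrix A can be sketched as follows, purely as an illustration; the nested-list representation and the function name `transition_matrix` are assumptions.

```python
def transition_matrix(entities, corr):
    """Build A with A[j][i] = corrValue(i, j); every column sums to 1.

    entities: ordered list of entity names; corr: {(i, j): corrValue(i, j)}.
    """
    index = {name: k for k, name in enumerate(entities)}
    n = len(entities)
    A = [[0.0] * n for _ in range(n)]
    for (i, j), weight in corr.items():
        A[index[j]][index[i]] = weight  # column i holds the edges leaving node i
    for col in range(n):
        if all(A[row][col] == 0.0 for row in range(n)):
            A[col][col] = 1.0           # node with no outgoing edge: A_ii = 1
    return A


entities = ["A", "B", "C"]
corr = {("A", "B"): 1.0, ("B", "A"): 0.5, ("B", "C"): 0.5}  # C has no outgoing edge
A = transition_matrix(entities, corr)
print([sum(A[row][col] for row in range(3)) for col in range(3)])
```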
Considering that, with some probability 1-δ, a user skips the link relations and visits a new node directly (this probability can be estimated as the number of times users visit a new node directly divided by the total number of node visits), the Markov model gives:
Figure GDA0004076750730000051
By the definition in step S2, the authority vector x of the N entity nodes is the stationary distribution of the Markov chain. $x_n$ and $x_{n+1}$ in the formula above can be regarded as successive iterates in the calculation of x (that is, $x = x_{n=\infty}$).
Let
Figure GDA0004076750730000052
Every column of B also sums to 1, so calculating entity node authority is equivalent to solving for the stationary distribution of the Markov chain with transition matrix B; that is, the authority vector satisfies x = Bx (equivalently, $x = x_{n=\infty}$).
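The two formulas of this step appear only as images and are not reproduced in this text. One form consistent with the surrounding description, under the assumption that a direct visit picks any of the N entity nodes with equal probability, is x_{n+1} = δ·A·x_n + ((1-δ)/N)·1 and B = δ·A + ((1-δ)/N)·E, where 1 is the all-ones vector and E the all-ones matrix. The sketch below iterates this assumed form; δ = 0.85, the tolerance, and the function name are likewise illustrative choices, not values from the invention.

```python
def stationary_distribution(A, delta=0.85, tol=1e-12, max_iter=1000):
    """Iterate x_{n+1} = delta * A * x_n + (1 - delta) / N until it converges.

    Assumes a uniform direct visit over the N entity nodes (see lead-in).
    A is a column-stochastic matrix given as nested lists, A[j][i] = P(i -> j).
    """
    n = len(A)
    x = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [
            delta * sum(A[row][col] * x[col] for col in range(n)) + (1.0 - delta) / n
            for row in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    return x


# Hypothetical 3-node chain: node 0 -> 1, node 1 -> 0, node 2 -> 0.
A = [[0.0, 1.0, 1.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]]
x = stationary_distribution(A)
print(x)
```

Node 2 has no incoming association edge in this sample, so its authority comes only from direct visits.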
Step S202: design of the incremental calculation of "entity node" authority based on the Monte Carlo method. Random walks simulate users visiting entity nodes, and the stationary distribution of the Markov chain in step S201, that is, the authority of each entity node, is estimated by counting how many times each node is visited.
The invention uses a cyclic-start scheme: M random walks are started from each of the N entity nodes (N×M random walks in total). At each step, a walk visits a new node directly with probability (1-δ), which can be regarded as the end of the current walk, and moves from entity node i to entity node j with probability δ·corrValue(i, j). Finally, the number of visits v(i) to each entity node i is counted, and v(i) divided by the total number of visits to all entity nodes gives the average visit probability of node i, that is, the authority of entity node i.
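A minimal sketch of the cyclic-start random walk estimate is given below. The continuation probability is named `delta` here, following the 1-δ direct-visit probability of step S201; the adjacency layout `out_edges`, the seed, and the sample graph are assumptions for illustration.

```python
import random
from collections import Counter


def random_walk(start, out_edges, delta, rng):
    """One walk from start; at each step it stops with probability 1 - delta."""
    path = [start]
    node = start
    while rng.random() < delta:
        targets = out_edges.get(node)
        if targets:
            nodes, weights = zip(*targets.items())
            node = rng.choices(nodes, weights=weights)[0]
        path.append(node)  # a node without outgoing edges is revisited (A_ii = 1)
    return path


def estimate_authority(entities, out_edges, m, delta, seed=0):
    """Start m walks from every entity node and normalize the visit counts."""
    rng = random.Random(seed)
    walks = [random_walk(start, out_edges, delta, rng) for start in entities for _ in range(m)]
    visits = Counter(node for path in walks for node in path)
    total = sum(visits.values())
    return {e: visits[e] / total for e in entities}, walks


# out_edges[i][j] = corrValue(i, j) (hypothetical sample graph)
out_edges = {"A": {"B": 1.0}, "B": {"A": 1.0}, "C": {"A": 1.0}}
authority, walks = estimate_authority(["A", "B", "C"], out_edges, m=200, delta=0.85)
print(authority)
```

The recorded `walks` are returned as well, since the incremental update described next reuses them.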
When the structure of the domain map changes, the authority of every entity node can be calculated incrementally. The random walks from before each round of structural change must be recorded. For the current round, the changed entity nodes (added or deleted entity nodes; denote this set X) and the changed association edges (added or deleted association edges, or edges whose weights changed; denote this set Y) are collected, and the entity nodes in X together with the entity nodes connected by edges in Y form a set Z. The nodes in Z are the trigger nodes at which a random walk must be updated. The update examines the N×M random walks of the previous round, finds the first trigger node in each walk, keeps the part of the walk before the trigger node, and continues the walk from there on the new domain map; the authority of every entity node is then recalculated.
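The update procedure might be sketched as below. Several details are assumptions of this sketch rather than statements of the invention: a walk is resumed from the trigger node itself when that node still exists, walks that started at a deleted node are dropped, and each newly added entity node receives m fresh walks.

```python
import random
from collections import Counter


def continue_walk(path, out_edges, delta, rng):
    """Extend path from its last node; stop with probability 1 - delta per step."""
    node = path[-1]
    while rng.random() < delta:
        targets = out_edges.get(node)
        if targets:
            nodes, weights = zip(*targets.items())
            node = rng.choices(nodes, weights=weights)[0]
        path.append(node)
    return path


def update_walks(walks, triggers, new_entities, new_out_edges, m, delta, seed=0):
    """Reuse old walks up to their first trigger node, then re-walk on the new map."""
    rng = random.Random(seed)
    alive = set(new_entities)
    updated = []
    for path in walks:
        if path[0] not in alive:
            continue                                  # start node was deleted
        first = next((k for k, node in enumerate(path) if node in triggers), None)
        if first is None:
            updated.append(path)                      # unaffected walk is kept as-is
            continue
        keep = path[:first + 1] if path[first] in alive else path[:first]
        updated.append(continue_walk(keep, new_out_edges, delta, rng))
    started = {path[0] for path in updated}
    for node in new_entities:                         # new entity nodes get m walks
        if node not in started:
            updated.extend(continue_walk([node], new_out_edges, delta, rng) for _ in range(m))
    visits = Counter(node for path in updated for node in path)
    total = sum(visits.values())
    return {e: visits[e] / total for e in new_entities}, updated


old_walks = [["A", "B"], ["B"], ["C", "A"]]
new_edges = {"A": {"B": 1.0}, "B": {"A": 1.0}, "C": {"B": 1.0}}  # edge C->A became C->B
authority, walks = update_walks(old_walks, {"C"}, ["A", "B", "C"], new_edges, m=1, delta=0.5)
print(walks)
```

Only the third walk passes through the trigger node C, so the first two walks are reused unchanged and the third is regenerated after C.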
Step S203: calculation of "file node" authority. The authority authorityFile of a file node equals the sum, over its link edges, of the link weight linkValue multiplied by the authority authorityEntity of the linked entity node. That is,
$$\mathrm{authorityFile}(p) = \sum_{q} \mathrm{linkValue}(p,q)\cdot\mathrm{authorityEntity}(q)$$
where authorityFile(p) is the authority of file node p; authorityEntity(q) is the authority of entity node q, the sum running over every entity node q that has a link edge to file node p; and linkValue(p, q) is the weight of the link edge between file node p and entity node q.
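For illustration, the weighted sum can be written as the short Python sketch below; the dictionary layout and the sample numbers are assumptions.

```python
def file_authority(links, entity_authority):
    """authorityFile(p) = sum over q of linkValue(p, q) * authorityEntity(q).

    links: {file_id: {entity: linkValue}}; entity_authority: {entity: authority}.
    """
    return {
        file_id: sum(weight * entity_authority[entity] for entity, weight in ents.items())
        for file_id, ents in links.items()
    }


links = {"p": {"q1": 0.25, "q2": 0.75}}
entity_authority = {"q1": 0.4, "q2": 0.2}
print(file_authority(links, entity_authority))  # 0.25*0.4 + 0.75*0.2, about 0.25
```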
Step S3: online calculation of file node relevance based on the search subgraph. A search subgraph is extracted from the domain map according to the file nodes contained in the search results. The relevance of an entity node is determined by the number of file nodes it links to. The relevance of a file node is determined by the products of the weight of each of its link edges and the relevance of the entity node that edge links to.
Step S301: construction of the search subgraph. The search subgraph is built from the relevant results of each search and is a subgraph of the domain map. Each relevant result the search engine obtains, by keyword matching or similar means, corresponds to a file node, and these file nodes form the "file nodes" of the search subgraph. The link edges of these file nodes in the domain map, and the entity nodes they link to, form the link edges and entity nodes of the search subgraph. The association relations between these entity nodes are retained from the structure of the domain map and form the association edges of the search subgraph.
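The extraction of the search subgraph can be sketched as follows, assuming the domain map is held as two dictionaries (`links` and `assoc`, both hypothetical layouts) and the search engine returns a list of matching file ids.

```python
def search_subgraph(links, assoc, hit_files):
    """Extract the subgraph induced by the files returned for one search.

    links: {file_id: {entity: linkValue}}; assoc: {(i, j): corrValue(i, j)}.
    """
    sub_links = {f: dict(links[f]) for f in hit_files if f in links}
    sub_entities = {e for ents in sub_links.values() for e in ents}
    # Keep only association edges whose two ends are both in the subgraph.
    sub_assoc = {
        (i, j): w for (i, j), w in assoc.items() if i in sub_entities and j in sub_entities
    }
    return sub_links, sub_entities, sub_assoc


links = {"f1": {"A": 1.0}, "f2": {"B": 0.5, "C": 0.5}, "f3": {"C": 1.0}}
assoc = {("A", "B"): 1.0, ("B", "C"): 1.0, ("C", "B"): 1.0}
print(search_subgraph(links, assoc, ["f2", "f3"]))
```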
Step S302: calculation of "entity node" relevance. The relevance of an entity node is determined by the number of file nodes it links to: the relevance of each entity node in the search subgraph equals the number of file nodes linked to it. Assuming FIG. 2 is a search subgraph, entity node A and entity node B each have a relevance of 3.
Step S303: calculation of "file node" relevance. The relevance of a file node is determined by the product of the weight of each of its link edges and the relevance of the entity node that edge links to. When a file node has several link edges, the product is calculated for each link edge and the products are summed.
As an example of this rule, assume FIG. 2 is a search subgraph, the relevance of entity node A and of entity node B are relavancyEntityA and relavancyEntityB respectively, linkValue3 is the weight of the link edge between entity node A and file node C, and linkValue4 is the weight of the link edge between entity node B and file node C. The relevance of file node C is then calculated as:
relavancyFileC=relavancyEntityA·linkValue3+relavancyEntityB·linkValue4。
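The two relevance calculations can be sketched as below. The sample subgraph is hypothetical and is not a transcription of FIG. 2; it is chosen so that entities A and B are each linked by three files.

```python
from collections import Counter


def entity_relevance(sub_links):
    """Relevance of an entity node = number of file nodes linked to it in the subgraph."""
    return Counter(entity for ents in sub_links.values() for entity in ents)


def file_relevance(sub_links, ent_rel):
    """Relevance of a file node = sum of linkValue * relevance of the linked entity."""
    return {
        file_id: sum(weight * ent_rel[entity] for entity, weight in ents.items())
        for file_id, ents in sub_links.items()
    }


sub_links = {
    "C": {"A": 0.6, "B": 0.4},
    "D": {"A": 1.0},
    "E": {"A": 0.5, "B": 0.5},
    "F": {"B": 1.0},
}
ent_rel = entity_relevance(sub_links)
print(ent_rel["A"], ent_rel["B"])
print(file_relevance(sub_links, ent_rel)["C"])  # 3*0.6 + 3*0.4, about 3.0
```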
Step S4: fused ranking of search results based on authority and relevance.
The method considers both authority and relevance when ranking search results, so the ranking degree of each file node is rankValue = Ω·authorityFile + (1-Ω)·λ·relavancyFile, where λ is introduced so that authority and relevance are on a similar scale, and Ω determines the relative weight of authority and relevance in the ranking of file nodes. Only the files retrieved by each search are considered here.
If the median of authorityFile is a and the median of relavancyFile is b, λ may be set to a/b. To determine Ω, manually ranked samples are constructed for the results of m searches, where the sample for the i-th search contains $n_i$ manually ranked results; for any given Ω, the automatic ranking of the $n_i$ results of the i-th search can be obtained. Treating the manual rankings as correct and taking minimization of the error rate of the automatic rankings as the optimization objective, the value of Ω can be obtained by equidistant sampling (increasing Ω from 0 to 1 in steps of Δ, where Δ is set by the required precision, for example 0.01) or by a one-dimensional search algorithm (such as Newton's method).
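The fused ranking and the equidistant sampling of Ω may be sketched as follows. The text does not define the error rate, so this sketch assumes the fraction of file pairs ordered differently from the manual ranking; computing λ separately for each search and the sample data are also assumptions.

```python
from statistics import median


def rank_files(authority, relevance, omega, lam):
    """Sort files by rankValue = omega*authorityFile + (1-omega)*lam*relavancyFile."""
    score = {f: omega * authority[f] + (1 - omega) * lam * relevance[f] for f in authority}
    return sorted(score, key=score.get, reverse=True)


def pairwise_error(auto, manual):
    """Fraction of file pairs that the automatic ranking orders unlike the manual one."""
    pos = {f: k for k, f in enumerate(auto)}
    pairs = [(a, b) for k, a in enumerate(manual) for b in manual[k + 1:]]
    return sum(pos[a] > pos[b] for a, b in pairs) / len(pairs) if pairs else 0.0


def fit_omega(samples, step=0.01):
    """samples: list of (authority, relevance, manual_order), one entry per search."""
    best_omega, best_err = 0.0, float("inf")
    for k in range(round(1 / step) + 1):
        omega = k * step
        err = 0.0
        for authority, relevance, manual in samples:
            lam = median(authority.values()) / median(relevance.values())  # lambda = a / b
            err += pairwise_error(rank_files(authority, relevance, omega, lam), manual)
        if err < best_err:
            best_omega, best_err = omega, err
    return best_omega, best_err / len(samples)


sample = (
    {"f1": 0.5, "f2": 0.3, "f3": 0.2},  # authorityFile
    {"f1": 1.0, "f2": 3.0, "f3": 2.0},  # relavancyFile
    ["f1", "f2", "f3"],                 # manual ranking
)
print(fit_omega([sample]))
```

The grid search keeps the smallest Ω that attains the minimum error; a median relevance of zero would need separate handling.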

Claims (5)

1. A domain search ranking method based on dynamic map link analysis, characterized in that semantic-level link relations are established among the file resources in a domain search, calculation is then performed from two aspects, authority and relevance, and a fused ranking of the search results is finally achieved; the method comprises the following specific steps:
step (1): dynamic construction of a domain map oriented to search ranking; taking the various file sets of the domain as input, a domain map is constructed;
the step (1) comprises the following steps:
step (11): construction of "entity nodes" and "association edges" and calculation of "association weights";
accurate and unambiguous named entities are obtained by performing entity recognition, entity disambiguation, and coreference resolution on the text content of the files, these entities forming the "entity nodes" of the domain map; the relations between entities in the text content are identified to obtain candidate relations, and accurate and unambiguous relations are obtained through disambiguation and resolution of these relations, these relations forming the "association edges" of the domain map; each association edge has an "association weight" whose magnitude represents how close the relation between the entities is;
step (12): construction of "file nodes" and "link edges" and calculation of "link weights"; the file nodes in the domain map are in one-to-one correspondence with the files to be searched and are constructed directly, each file node in the map representing one file to be searched; if an "entity node" is extracted from the file content corresponding to a "file node", a link edge exists between that entity node and that file node; the calculation of the link weight comprises two processes, initial link weight calculation and normalization;
step (13): dynamic incremental update of the map;
the set of files to be searched changes in three ways, namely files being added, deleted, or modified; when a file is added, the entity nodes, file node, association edges, and link edges corresponding to the new file are extracted according to steps (11) and (12), and the weights of the affected association edges and link edges are updated; when a file is deleted, the corresponding file node and the edges connected to it are deleted first; if this leaves an entity node with no link edges, that entity node and its association edges are deleted; the weights of the affected association edges and link edges are then updated; when a file is modified, the domain map is updated by the equivalent deletion followed by addition;
step (2): offline incremental calculation of file node authority based on the full graph; taking the domain map from step (1) as input, the authority of each file node in the domain map is calculated;
in the step (2), the entity nodes of the domain map are taken as the states the system can reach, the transition probabilities between states are determined by the weights of the association edges between entity nodes, the whole system forms a Markov chain, and the stationary distribution of the Markov chain is the authority of the entity nodes; the step specifically comprises:
step (21): design of "entity node" authority;
step (22): incremental calculation of "entity node" authority; based on the Monte Carlo method, random walks simulate users visiting entity nodes, and when the domain map changes, the random walks are updated incrementally for the affected entity nodes, achieving incremental calculation of entity node authority;
step (23): calculation of "file node" authority; the authority authorityFile of a file node equals the sum, over its link edges, of the link weight linkValue multiplied by the authority authorityEntity of the linked entity node, namely
$$\mathrm{authorityFile}(p) = \sum_{q} \mathrm{linkValue}(p,q)\cdot\mathrm{authorityEntity}(q)$$
where authorityFile(p) is the authority of file node p; authorityEntity(q) is the authority of entity node q, the sum running over every entity node q that has a link edge to file node p; and linkValue(p, q) is the weight of the link edge between file node p and entity node q;
step (3): online calculation of file-node relevance based on the search subgraph; taking the domain map and the user's search terms as input, a search subgraph related to the search is extracted from the whole domain map, and the relevance of each file node in the subgraph is calculated;
the step (3) specifically comprises the following steps:
step (31): construction of the search subgraph; the search subgraph is constructed from the related results obtained by each search and is a subgraph of the domain map; each related result obtained by the search engine through keyword matching corresponds to a file node, and these file nodes form the file nodes of the search subgraph; the link edges of these file nodes in the domain map and the entity nodes they link to form, respectively, the link edges and the entity nodes of the search subgraph; the entity nodes in the search subgraph keep their association relations according to the structure of the domain map, forming the association edges of the search subgraph;
step (32): relevance calculation for the entity nodes of the search subgraph; the relevance of an entity node is determined by the number of file nodes linked to it: the relevance of each entity node in the search subgraph is equal to the number of file nodes linked to that entity node;
step (33): relevance calculation for the file nodes of the search subgraph; the relevance of a file node is determined by the product of the weight of each of its link edges and the relevance of the entity node that edge links to; when the file node has a plurality of link edges, the product is calculated for each link edge and the products are summed;
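Steps (31) to (33) can be sketched as follows. This is an illustrative reading, not the patented implementation: it assumes the keyword-matching engine has already returned the hit files, and it assumes that an entity node's relevance counts only the file nodes inside the search subgraph. The names `subgraph_relevance`, `hits`, and the sample data are hypothetical.

```python
from typing import Dict, Iterable, Tuple

def subgraph_relevance(hit_files: Iterable[str],
                       link_value: Dict[Tuple[str, str], float]):
    """Return (entity relevance, file relevance) for the search subgraph."""
    hits = set(hit_files)
    # Step (31): keep only the link edges whose file node was returned by the search.
    sub_links = {(p, q): w for (p, q), w in link_value.items() if p in hits}
    # Step (32): entity relevance = number of hit files linked to the entity.
    entity_rel: Dict[str, int] = {}
    for (_p, q) in sub_links:
        entity_rel[q] = entity_rel.get(q, 0) + 1
    # Step (33): file relevance = sum of link weight * relevance of the linked entity.
    file_rel: Dict[str, float] = {p: 0.0 for p in hits}
    for (p, q), w in sub_links.items():
        file_rel[p] += w * entity_rel[q]
    return entity_rel, file_rel

link_value = {("doc1", "radar"): 0.75, ("doc1", "sonar"): 0.25,
              ("doc2", "radar"): 1.0, ("doc3", "sonar"): 1.0}
hits = ["doc1", "doc2"]  # assumed output of the keyword-matching search
entity_rel, file_rel = subgraph_relevance(hits, link_value)
```

In this example `doc3` is not a hit, so `sonar` is linked to one file in the subgraph and `radar` to two.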
step (4): fused ranking of the search results based on authority and relevance; taking the authority and the relevance of each file node in the search subgraph of step (3) as input, the ranking value of each file node is comprehensively calculated, and the file nodes are sorted by ranking value and returned to the user.
2. The method according to claim 1, wherein the calculation of the "association weight" in step (11) includes two steps, initial association weight calculation and normalization, specifically: if the entities at the two ends of an association edge co-occur in k files in total, the initial association weight corrValue'(i, j) of the association edge is equal to k; after the initial association weights of all association edges have been calculated, the initial association weights corrValue'(i, j) of the edges leaving the same entity node are normalized in proportion to their values, giving the association weights corrValue(i, j) of the association edges.
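The two-step calculation in claim 2 can be sketched as below. The sketch assumes each association edge is stored as two directed weights so that normalization per source entity node is well defined; the function name `association_weights` and the sample files are hypothetical.

```python
from collections import defaultdict
from itertools import permutations
from typing import Dict, List, Tuple

def association_weights(file_entities: Dict[str, List[str]]) -> Dict[Tuple[str, str], float]:
    """Co-occurrence counts corrValue'(i, j), normalized over the edges leaving each node i."""
    initial: Dict[Tuple[str, str], int] = defaultdict(int)
    for entities in file_entities.values():
        # Every ordered pair of distinct entities in the same file co-occurs once.
        for i, j in permutations(sorted(set(entities)), 2):
            initial[(i, j)] += 1
    out_total: Dict[str, int] = defaultdict(int)
    for (i, _j), k in initial.items():
        out_total[i] += k
    return {(i, j): k / out_total[i] for (i, j), k in initial.items()}

files = {"doc1": ["radar", "sonar"],
         "doc2": ["radar", "sonar", "antenna"],
         "doc3": ["radar", "antenna"]}
corr = association_weights(files)
```

With these files, `radar` co-occurs twice with each neighbour, so both of its outgoing weights are 0.5, while `sonar` splits its weight 2/3 to `radar` and 1/3 to `antenna`.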
3. The method according to claim 1, wherein the initial link weight calculation in step (12) considers two aspects, namely the degree of association α of the entity node with the file node and the degree of importance β of the file node to the entity node, specifically:
(1) when the importance of the file node to the entity node is difficult to classify or evaluate manually, β = 1 for all file nodes and the initial link weight linkValue' = α; after the initial weight of each link edge has been calculated, the initial weights of the link edges connected to the same file node are normalized, giving the link weight linkValue; α is calculated as follows:
α = TF(t, d) · IDF(t, d) · α_1(t, d)
where t is the entity name of the entity node, d is the file to be retrieved, TF(t, d) is the frequency of occurrence of t in d, IDF(t, d) = log(N / (N_{t,d} + γ)), N is the number of files in the file set to be retrieved, N_{t,d} is the number of files containing the entity t, γ is taken as 0.01 to ensure that the denominator is not zero, and α_1(t, d) is a position coefficient: when the entity name t appears in the title, abstract, or keywords, the position coefficient is greater than 1, and otherwise it is 1;
(2) when the entities and files can be manually classified and scored by field, a β value is set according to the importance of the different types of files in each field, and in this case the initial link weight linkValue' = α · β.
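A minimal sketch of case (1) of claim 3 (β = 1) follows. The claim only requires the position coefficient to exceed 1 for prominent positions, so the value 1.5 is an assumption, as are the function names, the use of the natural logarithm, and the sample counts.

```python
import math
from typing import Dict, Tuple

GAMMA = 0.01  # smoothing term from the claim; keeps the denominator non-zero

def alpha(tf: float, n_files: int, n_files_with_t: int,
          prominent: bool, position_boost: float = 1.5) -> float:
    """alpha = TF * IDF * alpha_1, where position_boost is an assumed coefficient > 1."""
    idf = math.log(n_files / (n_files_with_t + GAMMA))
    return tf * idf * (position_boost if prominent else 1.0)

def normalize_per_file(initial: Dict[Tuple[str, str], float]) -> Dict[Tuple[str, str], float]:
    """Normalize the initial weights of the link edges attached to the same file node."""
    totals: Dict[str, float] = {}
    for (p, _q), w in initial.items():
        totals[p] = totals.get(p, 0.0) + w
    return {(p, q): w / totals[p] for (p, q), w in initial.items()}

# beta = 1, so linkValue' = alpha for each (file node, entity node) link edge.
initial = {
    ("doc1", "radar"): alpha(tf=4, n_files=100, n_files_with_t=10, prominent=True),
    ("doc1", "sonar"): alpha(tf=2, n_files=100, n_files_with_t=20, prominent=False),
}
link_value = normalize_per_file(initial)
```

Note that when an entity appears in every file, N / (N_{t,d} + γ) falls slightly below 1 and the logarithm becomes negative; the sketch assumes positive initial weights and does not handle that case.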
4. The method according to claim 1, wherein the incremental calculation of the authority of the entity nodes adopts a cyclic starting point scheme: M random walks are started from each of the N entity nodes, giving N×M random walks in total; at each step the random walk directly accesses a new node with probability (1-α), and walks from entity node i to entity node j with probability α·corrValue(i, j); finally, the number of times v(i) that each entity node i is accessed is counted, and v(i) is divided by the total number of accesses over all entity nodes, giving the average access probability of node i, namely the authority of entity node i;
when the structure of the domain map changes, the authority of each entity node is calculated incrementally, specifically: first, the random walk processes from before the current round of map change are recorded, and the entity nodes changed in this round, including added and deleted entity nodes, are collected as the set X, and the changed association edges as the set Y; the entity nodes that have an association relation with X or are connected by Y are denoted as the set Z, so that X∪Z is the set of trigger nodes for which the random walks need to be updated; the update examines the N×M random walks of the previous round, finds the first trigger node in each random walk, keeps the part of the walk before the trigger node, continues the subsequent random walk according to the new domain map, and then calculates the authority of each entity node.
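The procedure of claim 4 can be sketched as follows, under several stated assumptions. The "direct access to a new node" with probability (1-α) is modeled by ending the current walk, the next walk then beginning at the next cyclic starting point. A walk is kept up to and including its first trigger node when that node still exists, and is re-simulated from there on the new map. Walks whose starting node was deleted are dropped, and M fresh walks are started from each added node. All function names and the sample graphs are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Set

Graph = Dict[str, Dict[str, float]]  # node -> {neighbour: corrValue}; every node is a key

def walk(corr: Graph, start: str, alpha: float, rng: random.Random) -> List[str]:
    """One random walk; it stops with probability (1 - alpha) at every step."""
    path = [start]
    while corr[path[-1]] and rng.random() < alpha:
        nbrs = corr[path[-1]]
        path.append(rng.choices(list(nbrs), weights=list(nbrs.values()))[0])
    return path

def simulate(corr: Graph, m: int, alpha: float, rng: random.Random) -> List[List[str]]:
    """Cyclic starting points: M walks from each of the N entity nodes."""
    return [walk(corr, s, alpha, rng) for s in corr for _ in range(m)]

def authority(walks: List[List[str]]) -> Dict[str, float]:
    """Visit count v(i) divided by the total number of visits."""
    visits = Counter(v for path in walks for v in path)
    total = sum(visits.values())
    return {v: c / total for v, c in visits.items()}

def update_walks(walks: List[List[str]], new_corr: Graph, triggers: Set[str],
                 m: int, alpha: float, rng: random.Random) -> List[List[str]]:
    """Reuse each walk up to its first trigger node, then continue on the new map."""
    updated: List[List[str]] = []
    for path in walks:
        k = next((i for i, v in enumerate(path) if v in triggers), None)
        if k is None:
            updated.append(path)                       # unaffected walk, reused as-is
        elif path[k] in new_corr:
            updated.append(path[:k] + walk(new_corr, path[k], alpha, rng))
        elif k > 0:
            updated.append(path[:k])                   # trigger node was deleted
    starts = {p[0] for p in updated}
    for s in new_corr:                                 # added nodes get M fresh walks
        if s not in starts:
            updated.extend(walk(new_corr, s, alpha, rng) for _ in range(m))
    return updated

rng = random.Random(0)
old = {"a": {"b": 1.0}, "b": {"a": 0.5, "c": 0.5}, "c": {"b": 1.0}}
walks = simulate(old, m=200, alpha=0.85, rng=rng)
# Node d is added and linked to c, so X = {d}, Z = {c}, and the triggers are {c, d}.
new = {"a": {"b": 1.0}, "b": {"a": 0.5, "c": 0.5},
       "c": {"b": 0.5, "d": 0.5}, "d": {"c": 1.0}}
new_walks = update_walks(walks, new, triggers={"c", "d"}, m=200, alpha=0.85, rng=rng)
auth = authority(new_walks)
```

Only the suffixes of walks that reach `c` are recomputed; walks that never touch a trigger node are carried over unchanged, which is where the incremental saving comes from. The α used here is the walk continuation probability of claim 4, not the association degree α of claim 3.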
5. The method according to claim 1, wherein the step (4) is specifically:
the search result ordering needs to comprehensively consider the influence of authority and relevance, so that the ranking degree of each file node is as follows:
rankValue=Ω·authorityFile+(1-Ω)·λ·relavancyFile,
where λ is introduced so that authority and relevance are on a similar scale, and Ω determines the weights of authority and relevance in the ranking of file nodes; only the files retrieved in each search are considered as file nodes here;
if the median of authorityFile is a and the median of relavancyFile is b, λ is taken as a/b; manually ranked samples are constructed for m searches, the i-th search having n_i manually ranked results, and for each given Ω the n_i automatically ranked results of the i-th search are obtained; the manually ranked samples are taken as the correct rankings, minimizing the error rate of the automatic rankings is taken as the optimization target, and the value of Ω is obtained either by equidistant sampling, in which Ω is increased from 0 to 1 in steps of Δ, or by a one-dimensional search algorithm.
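The fusion and the equidistant sampling of Ω in claim 5 can be sketched as below. The claim does not define the error rate, so the sketch assumes it is the fraction of positions at which the automatic ranking differs from the manual one; the function names and the sample scores are likewise hypothetical, and the medians are assumed to be positive.

```python
from statistics import median
from typing import Dict, List, Tuple

def rank_files(authority: Dict[str, float], relevancy: Dict[str, float],
               omega: float) -> List[str]:
    """Sort files by rankValue = omega*authority + (1 - omega)*lambda*relevancy."""
    lam = median(authority.values()) / median(relevancy.values())  # lambda = a / b
    score = {p: omega * authority[p] + (1 - omega) * lam * relevancy[p] for p in authority}
    return sorted(score, key=score.get, reverse=True)

def error_rate(predicted: List[str], manual: List[str]) -> float:
    """Assumed metric: share of positions where the two rankings disagree."""
    return sum(a != b for a, b in zip(predicted, manual)) / len(manual)

Sample = Tuple[Dict[str, float], Dict[str, float], List[str]]

def fit_omega(samples: List[Sample], delta: float = 0.1) -> float:
    """Equidistant sampling: try omega = 0, delta, 2*delta, ..., 1 and keep the best."""
    steps = round(1 / delta)
    grid = [i / steps for i in range(steps + 1)]
    def total_error(omega: float) -> float:
        return sum(error_rate(rank_files(a, r, omega), manual) for a, r, manual in samples)
    return min(grid, key=total_error)

authority = {"d1": 0.5, "d2": 0.3, "d3": 0.2}
relevancy = {"d1": 1.0, "d2": 3.0, "d3": 2.0}
manual = ["d2", "d1", "d3"]  # hypothetical manually ranked sample for one search
best_omega = fit_omega([(authority, relevancy, manual)])
```

In this toy sample, Ω = 1 ranks purely by authority and Ω = 0 purely by scaled relevance; the manual order is reproduced only for intermediate values, and the grid search returns the first such value.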
CN201911146865.8A 2019-11-21 2019-11-21 Domain searching and sorting method based on dynamic map link analysis Active CN111079035B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911146865.8A CN111079035B (en) 2019-11-21 2019-11-21 Domain searching and sorting method based on dynamic map link analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911146865.8A CN111079035B (en) 2019-11-21 2019-11-21 Domain searching and sorting method based on dynamic map link analysis

Publications (2)

Publication Number Publication Date
CN111079035A CN111079035A (en) 2020-04-28
CN111079035B true CN111079035B (en) 2023-04-28

Family

ID=70311460

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911146865.8A Active CN111079035B (en) 2019-11-21 2019-11-21 Domain searching and sorting method based on dynamic map link analysis

Country Status (1)

Country Link
CN (1) CN111079035B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111737489A (en) * 2020-06-17 2020-10-02 广联达科技股份有限公司 Building information retrieval method, device, equipment and readable storage medium
CN113343046B (en) * 2021-05-20 2023-08-25 成都美尔贝科技股份有限公司 Intelligent search ordering system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108182186A (en) * 2016-12-08 2018-06-19 广东精点数据科技股份有限公司 A kind of Web page sequencing method based on random forests algorithm
CN109710701A (en) * 2018-12-14 2019-05-03 浪潮软件股份有限公司 A kind of automated construction method for public safety field big data knowledge mapping

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8682932B2 (en) * 2012-02-16 2014-03-25 Oracle International Corporation Mechanisms for searching enterprise data graphs

Also Published As

Publication number Publication date
CN111079035A (en) 2020-04-28

Similar Documents

Publication Publication Date Title
Guo et al. RésuMatcher: A personalized résumé-job matching system
US9171078B2 (en) Automatic recommendation of vertical search engines
CN110704743B (en) Semantic search method and device based on knowledge graph
US7877389B2 (en) Segmentation of search topics in query logs
RU2501078C2 (en) Ranking search results using edit distance and document information
West et al. Mining missing hyperlinks from human navigation traces: A case study of Wikipedia
US7853599B2 (en) Feature selection for ranking
CN104361102B (en) A kind of expert recommendation method and system based on group matches
US20150227589A1 (en) Semantic matching and annotation of attributes
US20120124034A1 (en) Co-selected image classification
US20080294628A1 (en) Ontology-content-based filtering method for personalized newspapers
CN111797214A (en) FAQ database-based problem screening method and device, computer equipment and medium
EP3937029A2 (en) Method and apparatus for training search model, and method and apparatus for searching for target object
Omidvar et al. Context based user ranking in forums for expert finding using WordNet dictionary and social network analysis
CN111382283B (en) Resource category label labeling method and device, computer equipment and storage medium
Wu et al. Extracting topics based on Word2Vec and improved Jaccard similarity coefficient
CN111079035B (en) Domain searching and sorting method based on dynamic map link analysis
CN111159563A (en) Method, device and equipment for determining user interest point information and storage medium
CN112202889B (en) Information pushing method, device and storage medium
Li et al. Crowdsourced top-k queries by pairwise preference judgments with confidence and budget control
Rezaeenour et al. Developing a new hybrid intelligent approach for prediction online news popularity
Lu et al. Semantic link analysis for finding answer experts
Lai et al. Question routing by modeling user expertise and activity in cQA services
KR102008387B1 (en) Patent search engine evaluation system based on non-recall and method thereof
Bragilovski et al. Searching for class models

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant