WO2021124933A1

WO2021124933A1 - Information processing system and information processing method

Info

Publication number: WO2021124933A1
Application number: PCT/JP2020/045268
Authority: WO
Inventors: 桂太杉原
Original assignee: 桂太杉原
Priority date: 2019-12-20
Filing date: 2020-12-04
Publication date: 2021-06-24
Also published as: JPWO2021124933A1; JP6919961B1

Abstract

According to one aspect of the present disclosure, a document network consisting of a plurality of documents that are connected by at least a weak connection is identified. A particular document included in the document network and having an in-link from at least two documents is identified. A plurality of sub-networks are identified on the basis of the particular document. A score for each of the plurality of documents constituting the document network is calculated by executing individual processing on each of the sub-networks. In the individual processing, a score for each document included in a corresponding sub-network is calculated. For each duplicate document belonging to two or more sub-networks, the scores of the corresponding duplicate documents in the two or more sub-networks are merged.

Description

Information processing system and information processing method

Cross-reference to related applications

This international application claims priority based on Japanese Patent Application No. 2019-230822 filed with the Japan Patent Office on December 20, 2019, and the entire contents of Japanese Patent Application No. 2019-230822 are incorporated in this international application by reference.

This disclosure relates to an information processing system and an information processing method.

The technology for ranking web pages is already known (see Patent Document 1). In a simple example of this technique, a higher page rank is assigned to web pages linked from more web pages. To calculate the page rank, an adjacency matrix that represents the connection relationship (in other words, the connection state) between web pages in binary form with the values 0 and 1 is used, or a matrix that is a modification of the adjacency matrix and whose components include the values 0 and 1 and other real numbers is used.

Patent Document 1: Japanese Unexamined Patent Publication No. 2017-102712

In the conventional method of determining the page rank based on the above-mentioned adjacency matrix, it is necessary to determine the virtual connection relationship from all web pages to all web pages in addition to the actual connection relationship between web pages. For this reason, it is difficult to give a good ranking of web pages.

Therefore, according to one aspect of the present disclosure, it is desirable to be able to provide a technique capable of scoring a plurality of documents more appropriately than before.

According to one aspect of the disclosure, an information processing system is provided. The information processing system includes a document network discriminating unit, a document discriminating unit, a sub-network discriminating unit, and a score calculating unit.

The document network discriminating unit is configured to discriminate a document network composed of a plurality of documents connected by at least a weak connection, based on data representing a connection relationship between documents. The document discriminating unit is configured to discriminate a specific document that is included in the discriminated document network and has inlinks from two or more documents.

The sub-network discriminating unit is configured to discriminate a plurality of sub-networks included in the document network based on the discriminated specific document. The score calculation unit is configured to calculate the scores of each of the plurality of documents constituting the document network by executing individual processing for each of the plurality of determined sub-networks. In the individual processing, the score of each document included in the corresponding subnetwork is calculated.

The document network contains one or more duplicate documents. Each duplicate document is a document that belongs to two or more subnetworks out of a plurality of subnetworks. The score calculation unit calculates one score for the corresponding duplicate document by integrating the scores of the corresponding duplicate document in two or more subnetworks for each of the one or more duplicate documents.

According to the information processing system according to one aspect of the present disclosure, it is possible to appropriately perform scoring of a plurality of documents in a document network including a document having inlinks from two or more documents. Therefore, the information processing system according to one aspect of the present disclosure is very useful for scoring a plurality of documents in a document network having a complicated connection relationship.

According to one aspect of the present disclosure, the sub-network discriminating unit may discriminate a plurality of sub-networks having a specific document as a boundary. The plurality of subnetworks may include at least two upstream subnetworks and a downstream subnetwork. The two or more upstream subnetworks correspond to the two or more inlinks of the specific document, and in each of the upstream subnetworks, the specific document has the one corresponding inlink. The downstream subnetwork is connected to the specific document through the outlink that the specific document has.

In this case, the specific document is a duplicate document that belongs to the two or more upstream subnetworks. The score calculation unit may calculate one integrated score for the specific document by integrating the scores of the specific document in the upstream subnetworks, and may calculate the score of each document belonging to the downstream subnetwork on the basis of the integrated score of the specific document.

According to one aspect of the present disclosure, the sub-network discriminating unit may discriminate the sub-network for each in-link of the specific document as a plurality of sub-networks. Each of the sub-networks may include a group of documents located upstream of the corresponding inlink, a specific document, and a group of documents located downstream of the outlink of the specific document.

According to one aspect of the present disclosure, the sub-network discriminating unit may discriminate the sub-network for each combination of in-link and out-link possessed by the specific document as a plurality of sub-networks. Each of the sub-networks may include a group of documents located upstream of the inlink corresponding to the combination, a specific document, and a group of documents located downstream of the outlink corresponding to the combination.

According to one aspect of the disclosure, integration may be achieved by calculating the sum of the scores of the corresponding duplicate document in the two or more subnetworks. According to one aspect of the disclosure, integration may be achieved by calculating a representative value of the scores of the corresponding duplicate document in the two or more subnetworks. The representative value may be an average value.
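As a minimal sketch of the integration described above, and not the implementation of the disclosure, the scores of a duplicate document may be merged by a sum or by an average as follows. The function name `merge_scores`, the dictionary layout, and the document identifiers are hypothetical and are introduced only for illustration.

```python
from statistics import mean


def merge_scores(per_subnetwork_scores, mode="sum"):
    """Merge per-subnetwork scores into one score per document.

    per_subnetwork_scores: list of dicts {document_id: score}, one dict per
    subnetwork (hypothetical layout). A document that appears in two or more
    dicts is a duplicate document; its scores are integrated.
    mode: "sum" or "average" (the two integration methods named in the text).
    """
    collected = {}
    for scores in per_subnetwork_scores:
        for doc, score in scores.items():
            collected.setdefault(doc, []).append(score)

    if mode == "sum":
        combine = sum
    elif mode == "average":
        combine = mean
    else:
        raise ValueError("mode must be 'sum' or 'average'")
    return {doc: combine(values) for doc, values in collected.items()}


# Hypothetical scores: document "d3" belongs to both upstream subnetworks.
upstream_a = {"d1": 0.2, "d3": 0.4}
upstream_b = {"d2": 0.3, "d3": 0.6}
merged_sum = merge_scores([upstream_a, upstream_b], mode="sum")
merged_avg = merge_scores([upstream_a, upstream_b], mode="average")
```

Documents that belong to only one subnetwork keep their single score under either mode.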

According to one aspect of the present disclosure, the individual processing may include a process of calculating the score of each document contained in the corresponding subnetwork using a Hermitian adjacency matrix based on the connection relationship between the documents in the corresponding subnetwork.

According to one aspect of the present disclosure, the individual processing may include a process of modifying the corresponding subnetwork so that an outlink is virtually provided to each trailing document, that is, each document that is included in the corresponding subnetwork and has no outlink, by adding a dummy document to the trailing document. The individual processing may include a process of defining the Hermitian adjacency matrix based on the connection relationship between the documents in the modified subnetwork.

In one aspect of the disclosure, the Hermitian adjacency matrix may be an N-by-N Hermitian matrix based on the connection relationships between the documents D [m] (1 ≤ m (integer) ≤ N) that make up the corresponding subnetwork.

In the Hermitian adjacency matrix, the component h (p, q) in the p-th row and q-th column shows the value 1 when both a link from the document D [p] to the document D [q] and a link from the document D [q] to the document D [p] are present, and shows the value 0 when neither the link from the document D [p] to the document D [q] nor the link from the document D [q] to the document D [p] is present. The component h (p, q) shows the value + i (i is the imaginary unit) when there is a link from the document D [p] to the document D [q] but no link from the document D [q] to the document D [p], and shows the value -i when there is no link from the document D [p] to the document D [q] but there is a link from the document D [q] to the document D [p]. The Hermitian adjacency matrix may thus correspond to a Hermitian matrix whose diagonal components are zero.

The individual processing may include a process of transforming the Hermitian adjacency matrix to define a special Hermitian adjacency matrix and using the eigenvectors of the special Hermitian adjacency matrix to calculate the score of each document contained in the corresponding subnetwork.

The Hermitian adjacency matrix may be deformed so that when each component of the eigenvector is placed on the complex plane, all the components fall within the angle range of π / 2 radians on the complex plane.
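The angle-range condition stated above can be checked numerically. The following is a sketch under stated assumptions, not a procedure taken from the disclosure: the function names are hypothetical, components of (near) zero magnitude are ignored because their angle is undefined, and the eigenvector is assumed to be supplied as a plain list of complex numbers.

```python
import cmath
import math


def angular_spread(components, eps=1e-12):
    """Return the smallest arc (in radians) on the complex plane that
    contains the directions of all nonzero components."""
    angles = sorted(
        cmath.phase(z) % (2 * math.pi) for z in components if abs(z) > eps
    )
    if len(angles) < 2:
        return 0.0
    # Gaps between neighbouring directions, including the wrap-around gap.
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(angles[0] + 2 * math.pi - angles[-1])
    # The occupied arc is the full circle minus the largest empty gap.
    return 2 * math.pi - max(gaps)


def within_quarter_turn(components, tol=1e-9):
    """True when all components fall within an angle range of pi/2 radians."""
    return angular_spread(components) <= math.pi / 2 + tol


# Hypothetical vectors used only to exercise the check.
fits = within_quarter_turn([1 + 0j, 1 + 1j, 1j])
does_not_fit = within_quarter_turn([1 + 0j, -1 + 0j])
```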

According to one aspect of the present disclosure, a plurality of documents are scored using a special Hermitian adjacency matrix corresponding to the Hermitian adjacency matrix, in which the connection relationship between documents can be represented by the four values 1, 0, + i, and -i. Therefore, it is not necessary to determine a virtual connection relationship from all documents to all documents, and scoring of each document based on the connection relationship between documents can be realized more appropriately than before.

According to one aspect of the present disclosure, a computer program may be provided. The computer program may be a computer program for operating the computer as at least one of the document network discrimination unit, the document discrimination unit, the subnetwork discrimination unit, and the score calculation unit included in the above-mentioned information processing system.

According to one aspect of the present disclosure, an information processing method executed by a computer may be provided. The information processing method may include determining a document network composed of a plurality of documents connected by at least a weak connection based on data representing a connection relationship between documents.

The information processing method may include discriminating a specific document having an inlink from two or more documents included in the discriminated document network. The information processing method may include discriminating a plurality of sub-networks included in the document network based on the discriminated specific document.

The information processing method may include calculating the score of each of the plurality of documents constituting the document network by executing, as individual processing for each of the plurality of discriminated subnetworks, a process of calculating the score of each document included in the corresponding subnetwork.

The document network may include one or more duplicate documents that belong to two or more subnetworks among the plurality of subnetworks. Calculating the score of each of the documents that make up the document network may include calculating, for each of the one or more duplicate documents, one score for the corresponding duplicate document by integrating the scores of the corresponding duplicate document in the two or more subnetworks.

According to one aspect of the present disclosure, an information processing method can be provided that includes at least part of the procedure performed by the information processing system described above.

According to one aspect of the present disclosure, an information processing system may be provided that includes a processor and a memory containing instructions for causing the processor to perform a particular process. The specific process may be a process corresponding to the above-mentioned information processing method.

FIG. 1 is a block diagram showing the structure of the information processing system of the first embodiment.
FIG. 2 is a functional block diagram of the information processing system.
FIG. 3 is a functional block diagram showing the details of the query response unit.
FIG. 4 is a flowchart showing the process executed by the second scoring unit.
FIG. 5 is a diagram showing the first document network, which does not include a join node.
FIG. 6 is a diagram showing the second document network, which contains join nodes.
FIG. 7 is a flowchart showing the first part of the score calculation process executed by the second scoring unit.
FIG. 8 is a flowchart showing the sub-processing executed by the second scoring unit.
FIG. 9 is a diagram showing the first document network with dummy nodes added.
FIG. 10 is a diagram showing the first example of the special Hermitian adjacency matrix.
FIG. 11A is an explanatory diagram regarding the arrangement of each node on the complex plane, and FIG. 11B is a diagram illustrating a score calculation method.
FIG. 12 is a diagram showing the second example of the special Hermitian adjacency matrix.
FIG. 13 is a diagram showing the subgraph leading from the tip node to the 0th layer connection node.
FIG. 14 is a diagram showing the subgraph to which the dummy node is added.
FIG. 15 is a flowchart showing the second part of the score calculation process.
FIG. 16 is a diagram showing the subgraph leading to the first layer connection node.
FIG. 17 is a diagram showing the subgraph leading to the second layer connection node.
FIG. 18 is a diagram showing the subgraph leading to the third layer connection node.
FIG. 19 is a flowchart showing the third part of the score calculation process.
FIG. 20 is a diagram showing the non-circular subgraph leading from a join node to a rear end node.
FIG. 21 is a diagram showing the circular subgraph.
FIG. 22 is a flowchart showing the score calculation process of the second embodiment.
FIG. 23 is a diagram showing an example of the third document network.
FIGS. 24A and 24B are diagrams showing subgraphs of the third document network in the second embodiment.
FIG. 25 is a flowchart showing the score calculation process of the third embodiment.
FIGS. 26A and 26B are diagrams showing subgraphs of the third document network in the third embodiment.
FIG. 27 is a diagram showing an example of the fourth document network.
FIG. 28 is a diagram showing an example of the fifth document network.
FIGS. 29A and 29B are diagrams showing subgraphs of the fourth document network.
FIG. 30 is a flowchart showing the sub-processing executed by the second scoring unit in the fourth embodiment.
FIG. 31 is a flowchart showing the sub-processing executed by the second scoring unit in the fifth embodiment.
FIG. 32 is an explanatory diagram regarding the special Hermitian adjacency matrix in the fifth embodiment.
FIG. 33 is an explanatory diagram regarding the special Hermitian adjacency matrix in the fifth embodiment.

1 ... information processing system, 5 ... user terminal, 10 ... calculation unit, 11 ... processor, 15 ... memory, 20 ... storage unit, 30 ... communication unit, 110 ... crawler, 120 ... indexer, 130 ... query processing unit, 140 ... query response unit, 141 ... first scoring unit, 143 ... second scoring unit, 145 ... ranking unit, 147 ... output unit, 210 ... page repository, 220 ... index storage unit.

An exemplary embodiment of the present disclosure will be described below with reference to the drawings.

[First Embodiment]
The information processing system 1 of the present embodiment shown in FIG. 1 is configured to provide the user terminal 5 with a list of documents corresponding to the search query in response to the search query input from the user terminal 5. The document is a web document, specifically a web page. The information processing system 1 functions as a search engine that can be used from the user terminal 5 through the communication network. The communication network is, for example, the Internet.

The information processing system 1 includes a calculation unit 10, a storage unit 20, and a communication unit 30. The calculation unit 10 includes a processor 11 and a memory 15. The storage unit 20 stores data and the computer programs executed by the processor 11. The storage unit 20 can include either a hard disk drive or a solid state drive.

The communication unit 30 includes a communication interface capable of communicating with the user terminal 5. The calculation unit 10 realizes a search function by executing a process according to a computer program stored in the storage unit 20. Specifically, the process for realizing the search function is executed by the processor 11. The information processing system 1 briefly shown in FIG. 1 may be composed of one or more cooperating server devices.

The search function is realized by the calculation unit 10 functioning as the crawler 110, the indexer 120, the query processing unit 130, and the query response unit 140 shown in FIG. 2, and by the storage unit 20 functioning as the page repository 210 and the index storage unit 220.

The crawler 110 is configured to collect a plurality of web pages existing in a communication network, like a well-known crawler. The web pages collected by the crawler 110 are stored in the page repository 210.

The indexer 120 is configured to analyze and index each web page stored in the page repository 210. Indexing generates index data for each web page. The index data for each web page is stored in the index storage unit 220.

Each index data includes a content index and a structural index. The content index contains information on the corresponding web page keywords, titles, and key texts. The structure index contains information that represents the hyperlink structure of the corresponding web page. A group of index data represents the connection relationship between web pages.

The query processing unit 130 receives a search query from a user and extracts a group of related pages, which is a set of web pages corresponding to the search query, from all the web pages. All the web pages referred to here correspond to a group of web pages found in the communication network by the crawler 110 and whose index data is registered in the index storage unit 220.

Specifically, based on the content index of each web page stored in the index storage unit 220, the query processing unit 130 extracts from all the web pages, as the related page group, a set of web pages that include the vocabulary corresponding to the search query. Information on the extracted related page group is provided to the query response unit 140.

The query response unit 140 transmits a search result list in which the related page groups are arranged in the order of page rank to the user terminal 5 as response data for the search query based on the information of the related page group provided.

Each related page is ranked higher as the web page has a higher degree of relevance and importance to the search query, and is placed at the top of the search result list. The search result list is configured as a web page having links to the listed related pages, similar to the response data from a conventional search engine. The link referred to here is a so-called hyperlink.

As shown in FIG. 3, the query response unit 140 includes a first scoring unit 141, a second scoring unit 143, a ranking unit 145, and an output unit 147. The first scoring unit 141 scores each of the related pages for the related page group corresponding to the search query based on the degree of relevance to the search query of the page content. Specifically, the first scoring unit 141 is configured to give a content score as the first score to each of the related pages.

The second scoring unit 143 operates independently of the search query and is configured to give each of the web pages collected by the crawler 110, as the second score, an importance score based on the connection relationship between the web pages.

The second score is calculated so that a web page that is estimated to be more important from the connection relationship between the web pages shows a larger value. The second score shows a higher value for web pages with many links, for web pages linked from web pages with high importance scores, and for web pages with links from web pages that have few links to other web pages.

The ranking unit 145 is based on the first score calculated by the first scoring unit 141 for each of the related pages and the second score calculated by the second scoring unit 143 for each of the related pages. It is configured to calculate the page rank of each related page.

According to one example, the page rank of each related page corresponds to the weighted sum of the first score and the second score. For example, using the first score X1, the second score X2, and a weighting coefficient α that takes a value between 0 and 1, the page rank Y of each related page can be calculated according to the formula Y = α · X1 + (1 - α) · X2. The page rank of each related page may be understood as an overall score that includes a content score based on the search query and an importance score based on the connection relationship between web pages.

Based on the page rank of each related page calculated by the ranking unit 145, the output unit 147 transmits a search result list, which is a page list in which the related pages corresponding to the search query are arranged in descending order of page rank, to the user terminal 5 from which the search query was sent.
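The weighted sum of the first score and the second score may be sketched as follows. This is an illustration only: the function name `page_rank`, the page names, and the score values are hypothetical, and α = 0.5 is an arbitrary choice within the stated range of 0 to 1.

```python
def page_rank(x1, x2, alpha):
    """Weighted sum of the content score x1 and the link-based score x2,
    with a weighting coefficient alpha between 0 and 1."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * x1 + (1.0 - alpha) * x2


# Hypothetical related pages: name -> (first score X1, second score X2).
pages = {"page_a": (0.9, 0.2), "page_b": (0.5, 0.8)}
alpha = 0.5

# Related pages arranged in descending order of page rank.
ranked = sorted(
    pages, key=lambda name: page_rank(*pages[name], alpha), reverse=True
)
```

With these hypothetical values, `page_b` is placed above `page_a` because its higher second score outweighs its lower first score when α = 0.5.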

Specifically, the second scoring unit 143 periodically executes the process shown in FIG. 4 and, based on the latest index data stored in the index storage unit 220, calculates, for each group of web pages that are at least weakly connected, the second score of each web page belonging to that group.

When the process shown in FIG. 4 is started, the second scoring unit 143 extracts one or more document networks from all the web pages (S110). By referring to the index data stored in the index storage unit 220, the second scoring unit 143 can extract, from among all the web pages, each group of web pages that are at least weakly connected as one document network. A document network consists of a group of web pages that are at least weakly connected.

A network consisting of a group of nodes that are at least weakly connected corresponds to a network in which, when the direction of the links between nodes is ignored, every remaining node can be reached by following links from any one node belonging to the network.

That is, the document network is composed of a group of web pages in which, when the direction of the links is ignored, any one web page belonging to the document network is connected at least indirectly to each of the remaining web pages.
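The extraction of weakly connected groups in S110 may be sketched as a breadth-first search over the links with their direction ignored. This is one common way to find weakly connected components and is not stated in the disclosure to be the method used; the function name and the page numbering are hypothetical. In this sketch an isolated page would be returned as a network of its own, which is an assumption.

```python
from collections import defaultdict, deque


def extract_document_networks(pages, links):
    """Group pages into weakly connected document networks.

    pages: iterable of page identifiers.
    links: iterable of (source, destination) pairs, one per hyperlink.
    """
    # Ignore the direction of each link by storing it both ways.
    undirected = defaultdict(set)
    for src, dst in links:
        undirected[src].add(dst)
        undirected[dst].add(src)

    seen = set()
    networks = []
    for start in pages:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in undirected[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        networks.append(component)
    return networks


# Hypothetical pages and links: 1 -> 2 <- 3 form one network, 4 -> 5 another.
pages = [1, 2, 3, 4, 5]
links = [(1, 2), (3, 2), (4, 5)]
networks = extract_document_networks(pages, links)
```

Pages 1 and 3 are not linked to each other in either direction, yet they fall in the same network because both are connected to page 2 once the link direction is ignored.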

FIGS. 5 and 6 show examples of two different document networks. The document network is represented as a directed graph. One circle in FIGS. 5 and 6 corresponds to one node, in other words, one web page. An arrow in the figures indicates that a link (hyperlink) to the web page corresponding to the end point of the arrow is formed on the web page corresponding to the start point of the arrow. That is, it means that it is possible to move from the web page corresponding to the start point of the arrow to the web page corresponding to the end point of the arrow via a link.

Each web page in the document network shown in FIGS. 5 and 6 is clearly at least indirectly connected to other web pages in the document network when the direction of the arrow is ignored. In the following, each of a plurality of web pages in one document network is also referred to as a k-th web page by using the number k shown in a circle in the figure. Each web page in the document network is also referred to as a node. The k-th node means the k-th web page.

In S120 following S110, the second scoring unit 143 selects one of the one or more extracted document networks as the document network to be processed. After that, the second scoring unit 143 executes the score calculation process shown in FIG. 7 in order to calculate the second score of each node in the document network to be processed (S130).

The second scoring unit 143 repeatedly executes the score calculation process until the score calculation process is executed for all the document networks (S120-S140). That is, the second scoring unit 143 sequentially selects each document network as a processing target (S120), and executes a score calculation process for the selected document network to be processed (S130).

When the second scoring unit 143 finishes the score calculation process for all the document networks (Yes in S140), the process shown in FIG. 4 ends. In this way, the second scoring unit 143 calculates the second score of each web page constituting the corresponding document network for each document network. The calculated second score is provided to the ranking unit 145.

When the score calculation process (S130) shown in FIG. 7 is started, the second scoring unit 143 determines the tip node having no inlink in the document network to be processed (S210).

A node with an inlink means a node to which a link is formed in another node. In other words, a web page with an inlink means a web page for which a link (hyperlink) that can move to this web page is formed in another web page. In the following, a node that does not have an inlink is also referred to as a "tip node" or a "leading node".

In FIG. 5, the nodes having an inlink are the second, third, fourth, fifth, sixth, seventh, and eighth nodes, and the node having no inlink is the first node. The nodes having no inlink in FIG. 6 are the first, second, fourth, tenth, thirteenth, and fourteenth nodes.

When the document network to be processed is a document network that does not have a leading node, a dummy node DP that does not have an inlink is added to the document network. Specifically, a dummy node DP having outlinks to all the nodes in the document network is added to the document network.

A node with an outlink means a node with a link to another node. In other words, a web page with an outlink means a web page in which a link (hyperlink) that can be moved to another web page is formed. In the following, a node that does not have an outlink is also referred to as a "rear end node".

When a dummy node DP having no inlink is added to the document network, the second scoring unit 143 treats the document network to which the dummy node DP has been added as the document network to be processed, and determines the added dummy node DP to be the tip node.

In S220 following S210, the second scoring unit 143 determines a join node having a plurality of inlinks. One join node means one node having a plurality of inlinks.

The document network shown in FIG. 5 does not have a join node. The join nodes in the document network shown in FIG. 6 are the third, sixth, seventh, twelfth, and fifteenth nodes indicated by double circles. For example, the third node has an inlink from the first node and an inlink from the second node.
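The determination of tip nodes, rear end nodes, and join nodes in S210 and S220 may be sketched by counting the inlinks and outlinks of each node. The function name and the small example network below are hypothetical; the example is not the network of FIG. 5 or FIG. 6, whose full link structure is not reproduced in the text.

```python
def classify_nodes(nodes, links):
    """Return (tip_nodes, rear_end_nodes, join_nodes).

    tip node:      a node with no inlink
    rear end node: a node with no outlink
    join node:     a node with a plurality of (two or more) inlinks
    """
    in_degree = {node: 0 for node in nodes}
    out_degree = {node: 0 for node in nodes}
    # set() removes repeated links between the same ordered pair of nodes.
    for src, dst in set(links):
        out_degree[src] += 1
        in_degree[dst] += 1

    tip_nodes = [n for n in nodes if in_degree[n] == 0]
    rear_end_nodes = [n for n in nodes if out_degree[n] == 0]
    join_nodes = [n for n in nodes if in_degree[n] >= 2]
    return tip_nodes, rear_end_nodes, join_nodes


# Hypothetical network: nodes 1 and 2 both link to node 3, then 3 -> 4 -> 5.
nodes = [1, 2, 3, 4, 5]
links = [(1, 3), (2, 3), (3, 4), (4, 5)]
tip_nodes, rear_end_nodes, join_nodes = classify_nodes(nodes, links)
```

In this example node 3 is a join node because it has an inlink from node 1 and an inlink from node 2, which mirrors the description of the third node above.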

When the processing in S220 reveals that the document network to be processed has a join node (Yes in S230), the second scoring unit 143 executes the processing in S250. When it is found that the document network to be processed does not have a join node, the second scoring unit 143 executes the process of S240.

In S240, the second scoring unit 143 calculates the score of each node in the document network to be processed by using the Hermitian adjacency matrix H corresponding to the document network. The calculated score of each node is a score based on the connection relationship between the nodes.

The second scoring unit 143 outputs the score of each node calculated in S240 to the ranking unit 145 as the second score of each web page (S245). After that, the score calculation process shown in FIG. 7 is completed.

In S240, the second scoring unit 143 can calculate the score of each node by the same method as the international application PCT/JP2018/026560 filed on July 13, 2018 by the same applicant. Specifically, the second scoring unit 143 can calculate the score of each node by executing the sub-processing shown in FIG. 8.

In the following, in order to explain the score calculation method, each of the nodes constituting the document network to be processed is expressed as node D [m]. The variable m takes an integer value from 1 to N (1 ≦ m ≦ N). N is the number of nodes in the document network to be processed. Node D [m] corresponds to the m-th node in the corresponding document network, i.e. the m-th web page.

When the sub-processing shown in FIG. 8 is started, the second scoring unit 143 generates the Hermitian adjacency matrix H corresponding to the document network to be processed (S1010). Specifically, the second scoring unit 143 generates the Hermitian adjacency matrix H in which the connection relationships between the nodes in the document network to be processed are represented by the values 1, 0, + i, and -i. Here, i represents the imaginary unit.

The Hermitian adjacency matrix H is a matrix of N rows and N columns (NxN) corresponding to the number N of nodes of the document network to be processed, and each component takes any value of 1, 0, + i, or -i. It is a matrix. The expression "component h (p, q)" in the following indicates the component of the p-th row and q-th column in the Hermitian adjacency matrix H.

When there is a link from node D [p] to node D [q] and a link from node D [q] to node D [p] in the document network to be processed, the corresponding component h (p, q) is set to the value 1.

When neither the link from node D [p] to node D [q] nor the link from node D [q] to node D [p] exists, the corresponding component h (p, q) is set to the value 0. Therefore, the diagonal component h (p, p) of the Hermitian adjacency matrix H has a value of zero.

When there is a link from node D [p] to node D [q] but no link from node D [q] to node D [p], the corresponding component h (p, q) is set to the value + i. When there is no link from node D [p] to node D [q] but there is a link from node D [q] to node D [p], the corresponding component h (p, q) is set to the value -i.

The second scoring unit 143 sets the value of each component h (p, q) as described above according to the connection relationship between the nodes of the document network to be processed, and generates the Hermitian adjacency matrix H (S1010).

When the value of each component h (p, q) is set according to the above-mentioned rule, the component h (q, p) in the q-th row and p-th column, which is located symmetrically to the component h (p, q) in the p-th row and q-th column, is the complex conjugate of the component h (p, q). Therefore, the Hermitian adjacency matrix H is a Hermitian matrix.
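The rule for setting each component h (p, q) may be sketched as follows using Python's built-in complex numbers. The function names and the example links are hypothetical, and the sketch assumes that a link from a node to itself is ignored so that the diagonal stays zero.

```python
def hermitian_adjacency(n, links):
    """Build the N-by-N matrix H for nodes D[1]..D[n].

    links: iterable of (p, q) pairs meaning a link from D[p] to D[q].
    h(p, q) is 1 for links in both directions, 0 for no link,
    +i for a link only from D[p] to D[q], and -i for a link only
    from D[q] to D[p].
    """
    # Assumption: self-links are dropped, keeping every h(p, p) at zero.
    link_set = {(p, q) for p, q in links if p != q}
    h = [[0j] * n for _ in range(n)]
    for p in range(1, n + 1):
        for q in range(1, n + 1):
            forward = (p, q) in link_set
            backward = (q, p) in link_set
            if forward and backward:
                h[p - 1][q - 1] = 1 + 0j
            elif forward:
                h[p - 1][q - 1] = 1j
            elif backward:
                h[p - 1][q - 1] = -1j
    return h


def is_hermitian(h):
    """True when h(q, p) is the complex conjugate of h(p, q) for all p, q."""
    size = len(h)
    return all(
        h[p][q] == h[q][p].conjugate() for p in range(size) for q in range(size)
    )


# Hypothetical network: D[1] and D[2] link to each other, D[2] links to D[3].
H = hermitian_adjacency(3, [(1, 2), (2, 1), (2, 3)])
```

Rows and columns are stored with zero-based indices, so h (p, q) in the text corresponds to `H[p - 1][q - 1]` in the sketch.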

In S1010, a dummy node DP is added to each rear-end node having no outlink in the document network to be processed before the Hermitian adjacency matrix H is generated.

As shown in FIG. 9, the dummy node DP added to the rear end node is a node that has one inlink from the rear end node but does not have an outlink. The document network shown in FIG. 9 is a document network in which a dummy node DP is added to each of the fifth and eighth nodes having no outlink in the document network shown in FIG. In S1010, the Hermitian adjacency matrix H is generated for the document network to be processed to which the dummy node DP is added in this way.
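The addition of a dummy node DP to each rear end node may be sketched as follows. The function name, the tuple used to label each dummy node, and the example network are hypothetical and serve only to illustrate the step.

```python
def add_rear_dummies(nodes, links):
    """Attach one dummy node to every node that has no outlink.

    Each dummy node has exactly one inlink (from its rear end node)
    and no outlink. Returns new node and link lists; inputs are unchanged.
    """
    has_outlink = {src for src, _ in links}
    new_nodes = list(nodes)
    new_links = list(links)
    for node in nodes:
        if node not in has_outlink:
            dummy = ("DP", node)  # hypothetical label for the dummy node
            new_nodes.append(dummy)
            new_links.append((node, dummy))
    return new_nodes, new_links


# Hypothetical network: node 1 links to nodes 2 and 3, which have no outlink.
nodes = [1, 2, 3]
links = [(1, 2), (1, 3)]
nodes_with_dp, links_with_dp = add_rear_dummies(nodes, links)
```

After this step the former rear end nodes each have one outlink, and the number of nodes N used for the matrix includes the added dummy nodes.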

In the following S1020, the second scoring unit 143 generates a special Hermitian adjacency matrix H1 which is a modification of the generated Hermitian adjacency matrix H. The transformation is performed so that when each component of the eigenvector V of the special Hermitian adjacency matrix H1 is arranged in the complex plane, all of the components fall within the angular range of π / 2 radians. At the time of deformation, the second scoring unit 143 calculates the first correction amount C1 and the second correction amount C2.

The value of the parameter n included in the first correction amount C1 and the second correction amount C2 is defined as a natural number equal to or more than the maximum value of the number N of nodes in one or more document networks extracted in S110. A dummy node DP may be added to the document network as described above. In this case, the number of nodes N is the number of nodes in the document network including the dummy node DP. By defining the value of the parameter n in this way, all the components of the eigenvector V fall within the angle range of π / 2 radians.

The larger the value of the parameter n, the more the component of the eigenvector V falls within the angle range smaller than the angle range of π / 2 radians. The purpose of keeping all of the components within the π / 2 radian angle range is to ensure that all of the components fit within one quadrant on the complex plane. For good calculation of the second score, the parameter n is set to a small value within the range where this purpose can be achieved. The above-mentioned second correction amount C2 contributes to the adjustment of the angle range, and the first correction amount C1 helps to avoid changing the absolute value of the matrix component by the second correction amount C2.

After calculating the first correction amount C1 and the second correction amount C2, the second scoring unit 143 replaces each component showing the value + i in the Hermitian adjacency matrix H with the value C1 (C2 + i), and replaces each component showing the value −i with the value C1 (C2-i). The second scoring unit 143 further changes the value C1 (C2 + i) of each such component in each row of the Hermitian adjacency matrix H after the substitution to the value {C1 (C2 + i) / W}, that is, the value divided by the sum W of the number of components showing the value C1 (C2 + i) and the number of components showing the value 1 in the same row.

The second scoring unit 143 further changes the value C1 (C2-i) of the component located symmetrically, across the diagonal component, to each component changed to the value {C1 (C2 + i) / W}, to the complex conjugate {C1 (C2-i) / W} of the value {C1 (C2 + i) / W}. The second scoring unit 143 generates the Hermitian matrix defined by such substitution and modification as the special Hermitian adjacency matrix H1.

A specific example of the transformation procedure from the Hermitian adjacency matrix H to the special Hermitian adjacency matrix H1 is shown in FIG. For example, when, among the N components h (p1, 1), h (p1, 2), ..., h (p1, N) in the p1-th row, the components that take the value + i and the components that take the value 1 number W1 in total, each component showing the value + i in the p1-th row of the Hermitian adjacency matrix H is changed to the value {C1 (C2 + i) / W1}.

When, among the N components h (p2, 1), h (p2, 2), ..., h (p2, N) in the p2-th row, the components that take the value + i and the components that take the value 1 number W2 in total, each component showing the value + i in the p2-th row of the Hermitian adjacency matrix H is changed to the value {C1 (C2 + i) / W2}.

Furthermore, the value of each component showing the value −i is changed to the complex conjugate of the component located symmetrically to it across the diagonal component. For example, the value of the component h (q1, p1), which is located symmetrically across the diagonal component to the component h (p1, q1) showing the value {C1 (C2 + i) / W1}, is changed to {C1 (C2-i) / W1}. Similarly, the value of the component h (q2, p2), which is located symmetrically across the diagonal component to the component h (p2, q2) showing the value {C1 (C2 + i) / W2}, is changed to {C1 (C2-i) / W2}.
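The substitution and division described above can be sketched in Python as follows. This is a minimal illustration, not the disclosed implementation: the correction amounts C1 and C2 are taken as real-valued parameters `c1` and `c2` supplied by the caller (how they are calculated is outside this sketch), components of value 1 and 0 are assumed to be left unchanged, and the function name `special_hermitian_h1` is hypothetical.

```python
def special_hermitian_h1(H, c1, c2):
    """Derive H1 from H for given real correction amounts c1 and c2."""
    n = len(H)
    plus = c1 * (c2 + 1j)    # replaces each component of value +i
    minus = c1 * (c2 - 1j)   # replaces each component of value -i
    H1 = [row[:] for row in H]
    for p in range(n):
        # W: number of components in row p showing +i or 1
        w = sum(1 for x in H[p] if x == 1j or x == 1)
        for q in range(n):
            if H[p][q] == 1j:
                H1[p][q] = plus / w
                # the symmetric component becomes the complex conjugate
                H1[q][p] = minus / w
    return H1
```

Because `c1` and `c2` are real, `minus / w` is the complex conjugate of `plus / w`, so the result remains a Hermitian matrix.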

In the following S1030, the second scoring unit 143 calculates the eigenvalue and the eigenvector V of the special Hermitian adjacency matrix H1 generated in S1020. Since the special Hermitian adjacency matrix H1 is a matrix of NxN, the eigenvector V is an N-dimensional vector containing N components.

Below, each component of the eigenvector V corresponding to the eigenvalue with the maximum absolute value is represented using V [m]. The variable m takes an integer value from the value 1 to the value N. That is, the eigenvector V is V = {V [1], V [2], ..., V [N]}. Each component V [m] (1 ≦ m ≦ N) of the eigenvector V corresponds to the node D [m] (1 ≦ m ≦ N) constituting the document network.

In the following S1040, the second scoring unit 143 divides each component V [m] (1 ≦ m ≦ N) of the eigenvector V corresponding to the eigenvalue with the maximum absolute value of the special Hermitian adjacency matrix H1 by the component E corresponding to the start point node of the document network. When the start point node is the s-th node D [s], the component E is the s-th component V [s] of the eigenvector V (E = V [s]).

The start point node corresponds to the node to which the lowest score should be given among the nodes in the document network. Consider each combination of a tip node and a rear end node that can be reached from that tip node by moving according to the direction of the links in the document network to be processed. The start point node can be set to the tip node of the combination that has the largest number of nodes from the tip node to the rear end node.

When each component V [m] (1 ≦ m ≦ N) of the eigenvector V is divided by the component E, the component of the eigenvector V corresponding to the start point node is converted to the value 1. In the following, the eigenvector V after division is expressed as the eigenvector V1. The eigenvector V1 is V1 = {V [1] / E, V [2] / E, ..., V [s] / E = 1, ..., V [N] / E}. By division, the components of the eigenvector V1 corresponding to the start node are placed on the real axis in the complex plane.
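S1030 and S1040 can be sketched as follows. The text does not state how the eigenvector is obtained; this sketch swaps in plain power iteration, which converges only when a single eigenvalue strictly dominates in absolute value and the starting vector is not orthogonal to its eigenvector. A library eigensolver could be used instead. The function names and the iteration count are assumptions.

```python
def dominant_eigenvector(matrix, iterations=200):
    """Approximate the eigenvector for the eigenvalue of largest absolute value
    by power iteration (an assumed technique, not taken from the text)."""
    n = len(matrix)
    v = [1 + 0j] * n  # arbitrary starting vector (assumption)
    for _ in range(iterations):
        w = [sum(matrix[r][c] * v[c] for c in range(n)) for r in range(n)]
        scale = max(abs(x) for x in w)
        if scale == 0:
            raise ValueError("iteration collapsed to the zero vector")
        v = [x / scale for x in w]
    return v


def normalize_by_start(vector, start_index):
    """S1040: divide every component by the start point node's component E."""
    e = vector[start_index]
    return [x / e for x in vector]
```

Dividing by E removes the arbitrary complex scale factor of the eigenvector, so the component of the start point node becomes exactly 1 and lies on the real axis.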

After finishing the processing in S1040, the second scoring unit 143 calculates the score of each node D [m] in the document network based on each component V1 [m] = V [m] / E (1 ≦ m ≦ N) of the eigenvector V1 after division (S1050).

In S1050, the second scoring unit 143 rotationally transforms each component V1 [m] (1 ≦ m ≦ N) on the complex plane. Specifically, the second scoring unit 143 rotates each component V1 [m] (1 ≦ m ≦ N) of the eigenvector V1 on the complex plane so that the component located closest to the first quadrant is positioned on the fourth quadrant side at an angle θ1 from the real axis.

According to FIG. 11A, the component V1 [s] corresponding to the start point node is on the real axis of the complex plane. Each of the black and white circles in FIGS. 11A and 11B corresponds to one of the components of the eigenvector V1, in other words, one of the nodes in the document network, and the black circle corresponds to the starting page.

By the above rotation conversion, as shown in FIG. 11B, the component corresponding to the start point node is rotated on the complex plane so as to be located on the fourth quadrant side at an angle θ1 from the real axis. This rotational transformation is performed to shift the start point node from the real axis into the fourth quadrant for proper scoring. The angle θ1 is set to an angle small enough that all the components of the eigenvector V1 are still located in the fourth quadrant after the rotation transformation. The angle θ1 may be zero as long as the scoring is not adversely affected.

The eigenvector V1 after the rotation conversion is expressed below as the eigenvector Vc = {Vc [1], Vc [2], ..., Vc [s], ..., Vc [N]}. Each component Vc [m] (1 ≦ m ≦ N) of the eigenvector Vc is a complex number.

In S1050, the score of each node is calculated based on the position of each component Vc [m] (1 ≦ m ≦ N) on the complex plane. In the following, each component Vc [m] (1 ≦ m ≦ N) of the eigenvector Vc after rotation conversion is also expressed as the score reference value Vc [m] (1 ≦ m ≦ N) of each node. The score reference value Vc [m] is the score reference value of the m-th node, and is used for scoring the m-th node. When the angle θ1 is zero, the score reference value Vc [m] (1 ≦ m ≦ N) of each node corresponds to V1 [m] (1 ≦ m ≦ N).

The function arg (x) described below in the present specification may be understood as the argument of the complex number x on the complex plane. x is, for example, Vc [m]. The angle θ [m] of the component Vc [m] shown in FIG. 11B, measured from the real axis toward the fourth quadrant on the complex plane, is equal to {2π-arg (Vc [m])}. The notation | x | used below means the absolute value of the complex number x. When x = Vc [m], | x | corresponds to the length L [m] of Vc [m] shown in FIG. 11B on the complex plane.

In S1050, the second scoring unit 143 calculates the score equivalent value Z [m] (1 ≦ m ≦ N) of each node from the position on the complex plane of the score reference value Vc [m] (1 ≦ m ≦ N) of that node, as Z [m] = L [m] · θ [m] = | Vc [m] | · {2π-arg (Vc [m])} (1 ≦ m ≦ N).

As another example, the score equivalent value Z [m] may be calculated according to the formula Z [m] = | Vc [m] | ^d1 · {2π-arg (Vc [m])} ^d2. The values d1 and d2 are any real numbers greater than zero. As d1 is larger, the value of Z [m] becomes larger according to the smaller number of outlinks at each point from the start point node. The larger d2 is, the larger the value of Z [m] is according to the distance from the starting node.

After that, the second scoring unit 143 calculates the score of each node D [m] (1 ≦ m ≦ N) in the document network based on the score equivalent value Z [m]. In S1050, the score X corresponding to the node D [m] is calculated according to X = Z [m] −Z0. Z0 is, for example, the minimum value of Z [m] in the entire document network. In this case, the score of the node D [m] indicating the smallest Z [m] is zero. Z0 may have a value of zero. That is, the Z0 term may be omitted.
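The rotation and score calculation of S1050 can be sketched as follows. The sketch is illustrative only. It assumes that the angle θ [m] is obtained as the clockwise angle from the positive real axis, reduced to the range from 0 up to (but not including) 2π, which equals {2π-arg (Vc [m])} for components strictly inside the fourth quadrant and gives 0 for a component lying on the positive real axis. The function names and default parameter values are hypothetical.

```python
import cmath
import math


def fourth_quadrant_angle(v):
    """Clockwise angle of v from the positive real axis, in [0, 2*pi)."""
    return (-cmath.phase(v)) % (2 * math.pi)


def node_scores(v1, theta1=0.0, d1=1.0, d2=1.0, subtract_minimum=True):
    """Compute the scores X[m] from the normalized eigenvector components V1[m]."""
    # Rotate every component clockwise by theta1 to obtain Vc[m].
    vc = [v * cmath.exp(-1j * theta1) for v in v1]
    # Z[m] = |Vc[m]|^d1 * theta[m]^d2
    z = [abs(v) ** d1 * fourth_quadrant_angle(v) ** d2 for v in vc]
    # Z0 is the minimum of Z[m], or zero when the Z0 term is omitted.
    z0 = min(z) if subtract_minimum else 0.0
    return [value - z0 for value in z]
```

With d1 = d2 = 1 the sketch reduces to Z [m] = L [m] · θ [m], and with `subtract_minimum=True` the node with the smallest Z [m] receives a score of zero.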

In S245, the score X of each node calculated in this way is output to the ranking unit 145 as the second score of each web page.

As another example, the second scoring unit 143 may generate the special Hermitian adjacency matrix H2 shown in FIG. 12 in place of the above-mentioned special Hermitian adjacency matrix H1 in S1020. The special Hermitian adjacency matrix H2 shown in FIG. 12 is an example of the special Hermitian adjacency matrix H2 corresponding to the Hermitian adjacency matrix H shown in the upper part of FIG.

When the special Hermitian adjacency matrix H2 is generated, the second scoring unit 143 can replace each component showing the value + i in the Hermitian adjacency matrix H with the value C1 (C2 + i), and replace each component showing the value −i with the value C1 (C2-i). The second scoring unit 143 can further perform the following processes A and B.

(Process A)
The second scoring unit 143 can change the value C1 (C2 + i) in the components of each row of the Hermitian adjacency matrix H after substitution to the value {C1 (C2 + i) / W}, obtained by dividing by the number W of components showing the value C1 (C2 + i) or the value 1 in the same row. The second scoring unit 143 can further change the value C1 (C2-i) in the component located symmetrically, across the diagonal component, to each component changed to the value {C1 (C2 + i) / W}, to the complex conjugate {C1 (C2-i) / W} of the value {C1 (C2 + i) / W}.

(Process B)
The second scoring unit 143 can change the value C1 (C2-i) in the components of each row of the Hermitian adjacency matrix H after substitution to the value {C1 (C2-i) Z}, obtained by multiplying by the number Z of components showing the value C1 (C2-i) or the value 1 in the same row. The second scoring unit 143 can further change the value C1 (C2 + i) in the component located symmetrically, across the diagonal component, to each component changed to the value {C1 (C2-i) Z}, to the complex conjugate {C1 (C2 + i) Z} of the value {C1 (C2-i) Z}.

By such substitution and modification, the special Hermitian adjacency matrix H2 is generated. The second scoring unit 143 may execute the process B after the process A, may execute the process A after the process B, or may execute the process A and the process B in parallel. Regardless of the order in which the process A and the process B are executed, the same special Hermitian adjacency matrix H2 is generated.

The value Z1 in the special Hermitian adjacency matrix H2 shown in FIG. 12 corresponds to the total number of components that take the value −i and components that take the value 1 among the N components h (p3, 1), h (p3, 2), ..., h (p3, N) in the p3-th row. The value Z2 corresponds to the total number of components that take the value −i and components that take the value 1 among the N components h (p4, 1), h (p4, 2), ..., h (p4, N) in the p4-th row. The p3-th row may be understood as the row in which the value {C1 (C2-i) Z1 / W1} is shown in FIG. 12. The p4-th row may be understood as the row in which the value {C1 (C2-i) Z2 / W2} is shown in FIG. 12.
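The combined effect of the process A and the process B can be sketched as follows. This sketch reflects one reading of the description, in which a component of value + i at row p and column q ends up as C1 (C2 + i) · Z / W, with W counted in row p and Z counted in row q, and the symmetric component becomes its complex conjugate. The real-valued parameters `c1` and `c2` and the function name `special_hermitian_h2` are assumptions for illustration.

```python
def special_hermitian_h2(H, c1, c2):
    """Derive H2 from H by applying the division (process A) and the
    multiplication (process B) in a single pass."""
    n = len(H)
    # W[p]: number of components in row p showing +i or 1
    W = [sum(1 for x in row if x == 1j or x == 1) for row in H]
    # Z[p]: number of components in row p showing -i or 1
    Z = [sum(1 for x in row if x == -1j or x == 1) for row in H]
    H2 = [row[:] for row in H]
    for p in range(n):
        for q in range(n):
            if H[p][q] == 1j:
                value = c1 * (c2 + 1j) * Z[q] / W[p]
                H2[p][q] = value
                H2[q][p] = value.conjugate()
    return H2
```

Since the counts W and Z are taken from the unmodified matrix H, the result does not depend on the order in which the two processes are applied.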

The second scoring unit 143 can execute the processing of S1030-S1050 by using the special Hermitian adjacency matrix H2 calculated in this way in place of the special Hermitian adjacency matrix H1.

In S250 (see FIG. 7), the second scoring unit 143 determines the number of layers J of the join nodes included in the document network to be processed. In the present embodiment, the first joining node that appears when moving between the nodes according to the direction of the link from the tip node is defined as the 0th layer joining node.

The join node that appears next to the 0th layer join node is defined as the 1st layer join node, and the join node that appears next to the j-th layer join node is defined as the (j + 1)-th layer join node (j is an integer greater than or equal to 0). According to this definition, when the document network contains join nodes up to the (J-1)-th layer, the number of layers of the join nodes in the document network is J.

According to the document network shown in FIG. 6, the 0th layer connection node is the 3rd node and the 12th node, the 1st layer connection node is the 6th node, and the 2nd layer connection node is the 7th node. The third layer join node is the fifteenth node. The number of layers J of the join nodes in the document network shown in FIG. 6 is 4.

As can be understood from this explanation, for a join node that can take a plurality of layer numbers depending on the tip node, the highest layer number among the available layer numbers is assigned to the corresponding join node. The seventh node is not a first layer join node but a second layer join node.
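The layer assignment can be sketched as follows. The sketch assumes an acyclic network and treats a node with inlinks from two or more nodes as a join node; the function name `join_node_layers` and the small example network in the usage note are hypothetical and do not reproduce FIG. 6.

```python
from functools import lru_cache


def join_node_layers(links):
    """Map each join node to its layer number (highest available layer).

    `links` is a list of (source, target) pairs of an acyclic network.
    """
    predecessors = {}
    nodes = set()
    for source, target in links:
        nodes.update((source, target))
        predecessors.setdefault(target, set()).add(source)
    joins = {n for n in nodes if len(predecessors.get(n, ())) >= 2}

    @lru_cache(maxsize=None)
    def joins_up_to(node):
        # Largest number of join nodes on any path from a tip node
        # to `node`, counting `node` itself if it is a join node.
        best = max((joins_up_to(p) for p in predecessors.get(node, ())),
                   default=0)
        return best + (1 if node in joins else 0)

    return {n: joins_up_to(n) - 1 for n in joins}
```

For the hypothetical links 1→3, 2→3, 3→4, 5→4, 4→6, and 3→6, node 6 can be reached through one or two earlier join nodes, and the sketch assigns it the higher layer number 2, so the number of layers J is 3.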

After the processing of S250, the second scoring unit 143 sets j = 0 (S260) and determines the subgraph guided from the tip node to the jth layer (that is, the 0th layer) connecting node (S270).

According to the example of the document network shown in FIG. 6, the subgraphs determined in S270 are the subgraph SG1 consisting of the first node and the third node, the subgraph SG2 consisting of the second node and the third node, the subgraph SG3 consisting of the tenth node, the eleventh node, the twelfth node, and the twentieth node, and the subgraph SG4 consisting of the thirteenth node and the twelfth node, as shown in FIG. In S270, the subgraph is determined for each combination of the tip node and the 0th layer join node.

The second scoring unit 143 executes the same processing as that shown in FIG. 8 for each of the subgraphs determined in S270 (S280). As a result, the score reference value and the score of each node in the subgraph are calculated for each subgraph.

In S280, the second scoring unit 143 can sequentially select the determined subgraphs as processing targets and execute the processing shown in FIG. Here, the subgraph to be processed is treated in the same manner as the “document network to be processed” in the description of FIG. 8, and the score reference value and the score of each node in the subgraph are calculated.

For example, when the subgraph to be processed is the subgraph SG3 consisting of the 10th node, the 11th node, the 12th node, and the 20th node, a dummy node DP is added to each of the 12th node and the 20th node, which have no outlink in this subgraph (see FIG. 14). Dummy nodes DP may be added to an addition target node in the same number as the number of outlinks that the addition target node had before the subgraph was extracted.

In S280, the Hermitian adjacency matrix H corresponding to the subgraph to which the dummy node DP is added is generated. Based on the special Hermitian adjacency matrix H1 or the special Hermitian adjacency matrix H2 corresponding to the Hermitian adjacency matrix H, the score reference values and scores of the tenth node, the eleventh node, the twelfth node, and the twentieth node are calculated. The above-mentioned value Z0 can be set so that the score of the tenth node, which is the tip node, becomes zero, for example.

When the score reference value and the score for each subgraph are calculated in S280, the second scoring unit 143 integrates the score reference value and the score of the j-layer connecting node overlapping between the subgraphs (S285).

In S285, the second scoring unit 143 synthesizes the score reference values of each j-th layer join node overlapping between the subgraphs as follows, and calculates a single score reference value and a single score for each of the j-th layer join nodes in the document network to be processed.

Specifically, for one j-th layer join node, the second scoring unit 143 determines, among the score reference values of that join node calculated for each subgraph, the score reference value having the largest angle θ from the real axis on the complex plane. The score reference value of the join node in each subgraph is then rotated on the complex plane so that it overlaps, in direction, the score reference value having the largest angle θ.

The second scoring unit 143 vector-synthesizes each of the overlapping score reference values on the complex plane, and determines the only score reference value for one j-layer connecting node as the composite vector. Based on this unique score reference value Vx, the score equivalent value Zx = | Vx | ^d1 · {2π-arg (Vx)} ^d2 corresponding to one j-layer connection node is calculated. The score X of the j-layer join node can be calculated according to X = Zx−Z0.
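The integration of S285 can be sketched as follows. The sketch is illustrative only and assumes that the angle θ is measured clockwise from the positive real axis; the function name `merge_reference_values` is hypothetical.

```python
import cmath
import math


def merge_reference_values(values):
    """Combine the per-subgraph score reference values of one join node.

    Every value is rotated onto the direction of the value with the
    largest clockwise angle, and the rotated values are added as vectors.
    """
    thetas = [(-cmath.phase(v)) % (2 * math.pi) for v in values]
    theta_max = max(thetas)
    rotated = [cmath.rect(abs(v), -theta_max) for v in values]
    return sum(rotated)
```

Because all rotated values point in the same direction, the magnitude of the resulting Vx is the sum of the individual magnitudes, and its angle is the largest of the individual angles.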

In order to give a common score X to a j-th layer join node overlapping between the subgraphs, the value Z0 can be set to the minimum value of Z [m] in the subgraph in which the score reference value of that join node has the largest angle θ. Alternatively, the value Z0 may be zero, as described above.

As another example, for one j-th layer join node, the second scoring unit 143 may vector-synthesize the score reference values of the join node for each subgraph on the complex plane without first rotating them to overlap, and determine the result as the single score reference value for that join node.

When a plurality of j-th layer join nodes exist in the document network, the second scoring unit 143 executes the above processing for each of the j-th layer join nodes in S285. As a result, for each j-th layer join node, the second scoring unit 143 calculates the score reference value Vx, which integrates the score reference values Vc of that join node between the subgraphs, and the corresponding score X.

When the processing in S285 is completed, the second scoring unit 143 increments the value of the variable j by 1 (S290), and in the subsequent S300 determines whether or not the value of the variable j is less than the number of layers J.

When it is determined that the value of the variable j is equal to or greater than the number of layers J (No in S300), the second scoring unit 143 executes the process of S410 (see FIG. 19). On the other hand, if it is determined that the value of the variable j is less than the number of layers J (Yes in S300), the second scoring unit 143 executes the process of S310 (see FIG. 15).

In S310, the second scoring unit 143 determines the subgraph from the tip node to the j-layer join node. The subgraph determined here is a subgraph in which no other join node is included between the tip node and the j-th layer join node. In S310, for each combination of the tip node and the j-th layer join node, a subgraph including one tip node and one join node corresponding to the combination is determined.

According to the example of the document network shown in FIG. 6, the subgraph determined in S310 is a subgraph SG5 including a fourth node, a fifth node, and a sixth node, as shown in FIG.

When the processing in S310 reveals that the corresponding subgraph exists (Yes in S320), the second scoring unit 143 executes the same processing as in S280 for each of the subgraphs determined in S310 (S330). As a result, the score reference value and the score of each node in the subgraph are calculated for each subgraph. After that, the second scoring unit 143 executes the process of S340.

If it is found by the processing of S310 that the corresponding subgraph does not exist (No in S320), the second scoring unit 143 does not execute the processing of S330, but executes the processing of S340.

In S340, the second scoring unit 143 sets the variable f to a value of zero. In the following S350, the second scoring unit 143 determines the subgraph from the f-layer connection node to the j-layer connection node. The subgraph determined here is a subgraph in which no other join node is included between the f-layer join node and the j-layer join node.

In S350, for each combination of the f-layer connection node and the j-layer connection node, a subgraph including one f-layer connection node and one j-layer connection node corresponding to the combination is determined. In the subgraph, the f-layer join node corresponds to the front end node, and the j-layer join node corresponds to the rear end node.

According to the example of the document network shown in FIG. 6, when f = 0 and j = 1, the subgraph determined in S350 is the subgraph SG6 including the third node and the sixth node shown in FIG.

When it is found that the corresponding subgraph does not exist by the processing in S350 (No in S360), the second scoring unit 143 executes the processing of S380 without executing the processing of S370.

On the other hand, when it is found that the corresponding subgraph exists (Yes in S360), the second scoring unit 143 executes the same processing as in S280 for each of the subgraphs determined in S350. As a result, the score reference value and the score of each node in the subgraph are calculated for each subgraph (S370).

In S370, the second scoring unit 143 further corrects the score reference value and score of each node in the calculated subgraph according to the already calculated score reference value and score of the f-layer connection node.

The score reference value and score of the f-layer connection node in the subgraph are calculated before the processing of S370. For example, the score reference value and the score of the 0th layer connection node when f = 0 are calculated in S285. In S370, the second scoring unit 143 modifies the score reference values and scores of the remaining nodes in the subgraph based on the score reference value and score of the f-layer connection node, which have already been calculated.

According to the first example of S370, when the score reference value of the f-layer connection node calculated before the processing of S370 is Va, the second scoring unit 143 modifies the score reference value of each node in the subgraph as follows.

That is, the second scoring unit 143 rotates the score reference value of each node before modification calculated in S370 on the complex plane so that the score reference value of the f-layer connection node matches the above Va. The score reference value of each node when rotated in this way is determined as the corrected score reference value.

The second scoring unit 143 can calculate the modified score X of each node in the subgraph based on the score reference value Vc of each node after modification. The score X can be calculated so that the score X of the f-layer join node is the same as the score before modification.

According to the second example of S370, when the score of the f-layer connection node calculated before the processing of S370 is Xa, the second scoring unit 143 modifies the score of each node in the subgraph as follows.

That is, the second scoring unit 143 adds, to the score of each node before modification calculated in S370, the difference between the above Xa and the score of the f-layer connection node before modification calculated in S370. As a result, the second scoring unit 143 corrects the score of each node in the subgraph so that the score of the f-layer connection node matches the above Xa.
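The two examples of the correction in S370 can be sketched as follows. The sketch is an interpretation under assumptions: in the first example, the rotation is taken to align the argument of the f-layer join node's score reference value with that of Va (a pure rotation leaves the magnitudes unchanged), and the function names are hypothetical.

```python
import cmath


def align_to_reference(values, f_index, va):
    """First example: rotate all score reference values in the subgraph so
    that the f-layer join node's value points in the direction of Va."""
    rotation = cmath.exp(1j * (cmath.phase(va) - cmath.phase(values[f_index])))
    return [v * rotation for v in values]


def shift_scores(scores, f_index, xa):
    """Second example: add (Xa - score of the f-layer join node) to every
    score in the subgraph so that the join node's score becomes Xa."""
    offset = xa - scores[f_index]
    return [s + offset for s in scores]
```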

When the processing in S370 is completed, the second scoring unit 143 increments the value of the variable f by 1 (S380). After that, the second scoring unit 143 determines whether or not the value of the variable f is less than the value of the variable j (S390). If an affirmative determination is made here (Yes in S390), the second scoring unit 143 executes the process of S350. If a negative determination is made (No in S390), the second scoring unit 143 executes the process of S400.

In S400, for each of the j-th layer join nodes overlapping between the subgraphs determined in S310 and S350, the second scoring unit 143 integrates the score reference values and the scores for each subgraph of the corresponding j-th layer join node, calculated by the processing of S330 and S370, in the same manner as in the process of S285.

That is, for each of the j-th layer join nodes, the second scoring unit 143 calculates a score reference value Vx that integrates the score reference values Vc for each subgraph of the corresponding j-th layer join node, and calculates the score X = Zx-Z0 based on the score equivalent value Zx corresponding to the score reference value Vx.

After that, the second scoring unit 143 increments the value of the variable j by 1 (S290) and executes the processes of S300 to S400. The second scoring unit 143 calculates the score reference value and the score of each node up to the (J-1) layer connection node by repeatedly executing S300 to S400 while incrementing the value of the variable j.

After calculating the score reference value and the score of each node up to the (J-1) layer connection node, the second scoring unit 143 makes a negative judgment in S300 and executes the process of S410 (see FIG. 19).

The flow of processing before S410 will be specifically described based on the example of the document network shown in FIG. When j = 0, the second scoring unit 143 calculates the score reference values and scores of the first, second, and third nodes by executing the processing related to the subgraph SG1 (see FIG. 13) and the processing related to the subgraph SG2.

The second scoring unit 143 further calculates the score reference values and scores of the tenth, eleventh, twelfth, thirteenth, and twentieth nodes by executing the processing related to the subgraph SG3 and the processing related to the subgraph SG4.

After that, in the process of j = 1, the second scoring unit 143 executes the processing related to the subgraph SG5 (see FIG. 16) and the processing related to the subgraph SG6, and calculates the score reference values and scores of the fourth, fifth, and sixth nodes.

In the process of j = 2, the second scoring unit 143 executes the processing related to the subgraph SG7 (see FIG. 17) and the processing related to the subgraph SG8, and calculates the score reference value and the score of the seventh node.

In the process of j = 3, the second scoring unit 143 executes the processing related to the subgraph SG9 (see FIG. 18) and the processing related to the subgraph SG10, and calculates the score reference values and scores of the 8th, 14th, and 15th nodes.

In S410 (see FIG. 19), the second scoring unit 143 determines each non-cyclic subgraph that starts at a join node and ends at a rear end node that is not a join node. According to the example of the document network shown in FIG. 6, the subgraph determined in S410 is the subgraph SG11 composed of the 15th, 16th, 17th, 18th, and 19th nodes shown in FIG.

When it is found that the corresponding subgraph does not exist by the processing in S410 (No in S420), the second scoring unit 143 executes the processing of S440 without executing the processing of S430. On the other hand, when it is found that the corresponding subgraph exists (Yes in S420), the second scoring unit 143 calculates the score reference value and the score of each node in the subgraph for each subgraph determined in S410. (S430).

In S430, the second scoring unit 143 modifies the score reference value and the score of each node in the subgraph based on the already calculated score and score reference value of the join node, as in the process of S370. In this way, the score and score reference value of each node in the subgraph are determined.

In the following S440, the second scoring unit 143 determines each cyclic subgraph, that is, each subgraph containing a cycle. According to the example of the document network shown in FIG. 6, the subgraph determined in S440 is the subgraph SG12 composed of the sixth, seventh, eighth, and ninth nodes shown in FIG.

When no cyclic subgraph exists (No in S450), the second scoring unit 143 executes the process of S500 without executing the processes of S460-S490. On the other hand, when a cyclic subgraph exists (Yes in S450), the second scoring unit 143 calculates, in S460-S490, the score reference value and score of each node in the subgraph for each subgraph determined in S440. When the second scoring unit 143 has executed the processing of S470-S480 for all the subgraphs determined in S440 (Yes in S490), the second scoring unit 143 executes the processing of S500.

In S460, the second scoring unit 143 selects one of the subgraphs determined in S440. In S470, the second scoring unit 143 determines a common additional score for each node in the subgraph based on the score group of the nodes for which the score has already been calculated in the selected subgraph.

According to the first example of S470, the second scoring unit 143 determines the maximum value of the score group in the above subgraph as the added score. According to the second example of S470, the second scoring unit 143 determines the average value of the score group as the added score.

In S480, the second scoring unit 143 adds the determined added score to the score of each node in the selected subgraph to correct the score of each node. For the nodes for which the score has not been calculated before the addition, the score can be regarded as zero and the above-determined addition score can be added.
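The processing of S470 and S480 can be sketched as follows. This is a minimal illustration in which scores are held in a dictionary keyed by node identifier; the function name `apply_cycle_bonus`, the `mode` parameter, and the node identifiers used in the usage note are assumptions.

```python
def apply_cycle_bonus(scores, cycle_nodes, mode="max"):
    """Add a common additional score to every node of one cyclic subgraph.

    `scores` maps node identifiers to already calculated scores. Nodes of
    the subgraph that are missing from `scores` are treated as having a
    score of zero before the addition.
    """
    known = [scores[n] for n in cycle_nodes if n in scores]
    if not known:
        return dict(scores)  # nothing to base the additional score on
    # First example of S470: maximum value; second example: average value.
    bonus = max(known) if mode == "max" else sum(known) / len(known)
    updated = dict(scores)
    for n in cycle_nodes:
        updated[n] = updated.get(n, 0.0) + bonus
    return updated
```

For example, if nodes 6, 7, and 8 already have the scores 1.0, 3.0, and 2.0 and node 9 has no score yet, the maximum-based variant adds 3.0 to all four nodes, so node 9 receives the score 3.0.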

The second scoring unit 143 executes the process of S500 when the score of the node in each subgraph is corrected in this way. Before the processing of S500 is executed, the scores of all the nodes in the document network are determined.

In S500, the second scoring unit 143 outputs the score of each node in the determined document network to the ranking unit 145 as the second score of each web page. After that, the score calculation process is completed.

The information processing system 1 of the first embodiment described above can be modified as follows. As a first modification, the second scoring unit 143 may generate the Hermitian adjacency matrix H and calculate the score reference value and the score without placing the dummy node DP for a node having no outlink.

As a second modification, the second scoring unit 143 may correct the score reference value Vc[m] of each node to a value that excludes the influence of the outlink destinations of the corresponding node and, using the corrected score reference value Vc*[m] instead of the uncorrected score reference value Vc[m], calculate the score equivalent value Z[m] = |Vc*[m]|^d1 · {2π − arg(Vc*[m])}^d2 corresponding to each node.

As a third modification, the second scoring unit 143 may correct the score reference value Vc[m] of each node to a value that excludes the influence of the inlink sources of the corresponding node and, using the corrected score reference value Vc*[m] instead of the score reference value before correction, calculate the score equivalent value Z[m] = |Vc*[m]|^d1 · {2π − arg(Vc*[m])}^d2 corresponding to each node. As in the first modification, the second modification and the third modification can be implemented by generating the Hermitian adjacency matrix H without placing a dummy node DP for a node having no outlink.

According to the information processing system 1 of the present embodiment, a plurality of web pages are scored using the special Hermitian adjacency matrices H1 and H2 corresponding to the Hermitian adjacency matrix H, in which the connection relationship between web pages is expressed by the four values 1, 0, +i, and −i. Therefore, it is not necessary to establish a virtual connection relationship from all web pages to all web pages, and scoring and ranking of each web page based on the connection relationship between web pages can be realized more appropriately than before.

According to this embodiment, even in a document network having a join node, scoring / ranking of a plurality of web pages can be appropriately performed using the Hermitian adjacency matrix H. Therefore, the output unit 147 can provide the user terminal 5 with an appropriate search result list based on the connection relationship between the web pages based on the second score from the second scoring unit 143.

[Second Embodiment]
Subsequently, the information processing system 1 of the second embodiment will be described. The information processing system 1 of the second embodiment is configured in the same manner as the information processing system 1 of the first embodiment, except that a score calculation process having a content different from that of the first embodiment is executed in S130. Therefore, in the following, only the score calculation process executed by the second scoring unit 143 in S130 will be described. It may be understood that the configuration of the information processing system 1 of the second embodiment, which is not described below, is the same as that of the first embodiment.

In the second embodiment, the second scoring unit 143 executes the score calculation process shown in FIG. 22 when calculating the second score of each node included in the document network to be processed selected in S120 in S130.

When the score calculation process shown in FIG. 22 is started, the second scoring unit 143 determines the tip node having no inlink included in the document network to be processed (S610). For example, when the document network to be processed is the document network illustrated in FIG. 23, the second scoring unit 143 determines the first node and the eighth node as the tip node.

After that, the second scoring unit 143 selects one of the tip nodes included in the document network (S620) and determines the subgraph including the selected tip node (S630). The determined subgraph consists of the selected tip node and all the nodes in the document network that can be reached from this tip node by following links in their direction.

According to the example of the document network shown in FIG. 23, when the selected tip node is the first node, the subgraph SG21 composed of the first through ninth nodes excluding the eighth node is determined in S630, as shown in FIG. 24A. When the selected tip node is the eighth node, the subgraph SG22 composed of the nodes from the third node to the ninth node is determined in S630, as shown in FIG. 24B.
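
The determination of S610-S630 amounts to a reachability search from each tip node. The following sketch illustrates this under stated assumptions: the function names and the edge-list representation are introduced here, and the five-node network in the usage note is a hypothetical example, not the network of FIG. 23.

```python
from collections import deque


def find_tip_nodes(nodes, links):
    """S610: nodes that never appear as the destination of a link."""
    destinations = {dst for _, dst in links}
    return [n for n in nodes if n not in destinations]


def reachable_subgraph(tip, links):
    """S630: the tip node plus every node reachable by following link directions."""
    successors = {}
    for src, dst in links:
        successors.setdefault(src, []).append(dst)
    seen = {tip}
    queue = deque([tip])
    while queue:  # breadth-first search over outlinks
        node = queue.popleft()
        for nxt in successors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

In a hypothetical network with links 1→2, 2→3, 4→3, and 3→5, the tip nodes are 1 and 4, and their subgraphs {1, 2, 3, 5} and {3, 4, 5} overlap at the join node 3 and everything downstream of it.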

After that, the second scoring unit 143 executes the same processing as that shown in FIG. 8, regarding the subgraph determined in S630 as the document network to be processed, and calculates the score reference value and the score of each node in the subgraph (S640).

In S640, the second scoring unit 143 can generate the Hermitian adjacency matrix H and calculate the score reference value and the score without arranging the dummy node DP at the rear end node. The second scoring unit 143 can calculate the score reference value of each node in the subgraph such that the score reference value of the tip node in the subgraph is located, in the complex plane, at the point of value 1 on the real axis or at a specific point in the fourth quadrant close to the real axis. The specific point may be a point obtained by rotating the point of value 1 on the real axis toward the fourth quadrant by an angle θ1.

In S650 following S640, the second scoring unit 143 determines whether all the tip nodes have been selected and the processing of S640 has been executed for each of them. If a negative determination is made in S650, the second scoring unit 143 changes the selected tip node (S620) and determines the subgraph of the newly selected tip node (S630). Then, the score reference value and the score of each node in the subgraph are calculated based on the Hermitian adjacency matrix H of the determined subgraph (S640).

In this way, the second scoring unit 143 calculates the score reference value and the score of each node based on the corresponding Hermitian adjacency matrix H for each subgraph corresponding to each of the tip nodes included in the document network. In other words, the second scoring unit 143 determines a subgraph for each inlink of the join node, and calculates the score reference value and the score of each node based on the corresponding Hermitian adjacency matrix H for each subgraph. After that, the second scoring unit 143 makes an affirmative determination in S650 and executes the process of S660.

A subgraph may include nodes that overlap with other subgraphs. In S640, a score reference value and a score are calculated for each such overlapping node separately in each subgraph.

In S660, the second scoring unit 143 calculates a value obtained by integrating the scores of the corresponding nodes in each subgraph as the second score of each node in the document network. Specifically, the second scoring unit 143 calculates the second score of one node as the total value of the scores in each subgraph of the node.

Alternatively, the second scoring unit 143 may calculate the second score of a node based on the composite vector of the score reference values of the node in the respective subgraphs. For each node, the second scoring unit 143 treats this composite vector as the single score reference value Vx of the node, and can calculate the score equivalent value Zx corresponding to the score reference value Vx according to the equation Zx = |Vx|^d1 · {2π − arg(Vx)}^d2. The second scoring unit 143 can output the calculated score equivalent value Zx as the second score of the corresponding node.
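
The two integration options of S660 can be illustrated as follows. This is a sketch under assumptions: the function names and dictionary inputs are introduced here, the composite vector is taken to be the complex sum of the score reference values, and arg(·) is assumed to lie in [0, 2π) so that a point in the fourth quadrant yields a small positive value of 2π − arg.

```python
import cmath
import math


def score_equivalent(v, d1=1.0, d2=1.0):
    """Z = |V|^d1 * (2*pi - arg V)^d2, with arg V assumed to lie in [0, 2*pi)."""
    angle = cmath.phase(v) % (2 * math.pi)
    return abs(v) ** d1 * (2 * math.pi - angle) ** d2


def merge_by_sum(scores_per_subgraph):
    """Second score of a node = total of its scores over the subgraphs containing it."""
    merged = {}
    for scores in scores_per_subgraph:
        for node, score in scores.items():
            merged[node] = merged.get(node, 0.0) + score
    return merged


def merge_by_composite_vector(refs_per_subgraph, d1=1.0, d2=1.0):
    """Second score of a node = Zx of the sum Vx of its score reference values."""
    composite = {}
    for refs in refs_per_subgraph:
        for node, v in refs.items():
            composite[node] = composite.get(node, 0j) + v
    return {node: score_equivalent(vx, d1, d2) for node, vx in composite.items()}
```

With the composite-vector option, two subgraphs that both assign the reference value −i to a node give Vx = −2i, hence Zx = 2 · (π/2) = π when d1 = d2 = 1.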

Also according to the second embodiment described above, the information processing system 1 can appropriately perform scoring / ranking of a plurality of web pages related to a document network having a join node by using the Hermitian adjacency matrix H.

The second embodiment may be modified in the same manner as the first embodiment. That is, the modifications relating to score calculation described as the first, second, and third modifications of the first embodiment may be applied to the second embodiment.

[Third Embodiment]
Subsequently, the information processing system 1 of the third embodiment will be described. The information processing system 1 of the third embodiment is configured in the same manner as the information processing system 1 of the first embodiment, except that a score calculation process having a content different from that of the first embodiment is executed in S130. Therefore, in the following, only the score calculation process executed by the second scoring unit 143 in S130 will be described. It may be understood that the configuration of the information processing system 1 of the third embodiment, which is not described below, is the same as that of the first embodiment.

In the third embodiment, the second scoring unit 143 executes the score calculation process shown in FIG. 25 when calculating the second score of each node included in the document network to be processed selected in S120 in S130.

When the score calculation process shown in FIG. 25 is started, the second scoring unit 143 determines the tip nodes having no inlink and the rear end nodes having no outlink included in the document network to be processed (S710).

When there is no rear end node without an outlink, the second scoring unit 143 formally identifies as rear end nodes those nodes for which the total number of outlinks and inlinks when the document network is represented by a directed graph differs from the number of links when the document network is represented by an undirected graph. If there is no such node either, the second scoring unit 143 adds to the document network a dummy node DP having inlinks from all the nodes in the document network, and determines the dummy node DP as the rear end node.

When the document network to be processed is the document network exemplified in FIG. 23, the second scoring unit 143 determines the first and eighth nodes as the tip nodes, and the fifth, seventh, and ninth nodes as the rear end nodes. When the document network to be processed is the document network exemplified in FIGS. 27 and 28, the second scoring unit 143 determines the first node as the tip node and the ninth node as the rear end node.
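
One possible reading of S710, including its two fallbacks, is sketched below. The function names and the edge-list representation are assumptions introduced here. In particular, the comparison of directed and undirected link counts is interpreted as detecting nodes that take part in a pair of mutual links, which is an interpretation of the description rather than something it states explicitly.

```python
def _directed_count(node, links):
    """Total number of outlinks and inlinks of `node` in the directed graph."""
    return sum(1 for src, dst in links if node in (src, dst))


def _undirected_count(node, links):
    """Number of links of `node` when each linked pair is counted once."""
    return len({frozenset((src, dst)) for src, dst in links if node in (src, dst)})


def find_tip_and_rear_nodes(nodes, links, dummy="DP"):
    """Return (tips, rears, nodes, links); nodes and links grow if a dummy is added."""
    nodes = list(nodes)
    links = list(dict.fromkeys(links))  # drop duplicate links, keep order
    has_inlink = {dst for _, dst in links}
    has_outlink = {src for src, _ in links}
    tips = [n for n in nodes if n not in has_inlink]
    rears = [n for n in nodes if n not in has_outlink]
    if not rears:
        # first fallback: nodes whose directed and undirected link counts differ
        rears = [n for n in nodes
                 if _directed_count(n, links) != _undirected_count(n, links)]
    if not rears:
        # second fallback: dummy node with an inlink from every existing node
        links = links + [(n, dummy) for n in nodes]
        nodes.append(dummy)
        rears = [dummy]
    return tips, rears, nodes, links
```

For a pure cycle 1→2→3→1, neither fallback condition finds a rear end node among the existing nodes, so the dummy node is appended and returned as the only rear end node.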

After that, the second scoring unit 143 determines a subgraph for each combination of a tip node and a rear end node included in the document network (S720). Each subgraph consists of the group of nodes through which it is possible to move from the tip node to the rear end node by following links in their direction. The subgraph for each combination of a tip node and a rear end node may be understood as a subgraph for each combination of an inlink and an outlink of the join node.

According to the example of the document network shown in FIG. 23, in S720, the second scoring unit 143 determines the three subgraphs SG31, SG32, and SG33 shown in FIG. 26A, whose tip node is the first node, and the three subgraphs SG34, SG35, and SG36 shown in FIG. 26B, whose tip node is the eighth node.

According to the example of the document network shown in FIG. 27, in S720, the second scoring unit 143 determines two subgraphs, each having the first node as the tip node and the ninth node as the rear end node: the subgraph shown in FIG. 29A, which passes through the first outlink of the third node, and the subgraph shown in FIG. 29B, which passes through the second outlink of the third node.
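
The pairing of tip nodes and rear end nodes in S720 can be sketched as the intersection of two reachability sets. The function names and the six-node network in the usage note are hypothetical, and this sketch only covers the tip/rear pairing: it does not separate subgraphs that share the same tip node and rear end node but pass through different outlinks, as in the example of FIG. 27.

```python
def _reach(start, adjacency):
    """All nodes reachable from `start` through `adjacency` (start included)."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def subgraphs_per_combination(nodes, links):
    """Map (tip, rear) -> set of nodes lying on some directed path from tip to rear."""
    forward, backward = {}, {}
    for src, dst in links:
        forward.setdefault(src, []).append(dst)
        backward.setdefault(dst, []).append(src)
    tips = [n for n in nodes if n not in backward]   # no inlink
    rears = [n for n in nodes if n not in forward]   # no outlink
    result = {}
    for tip in tips:
        downstream = _reach(tip, forward)
        for rear in rears:
            if rear in downstream:
                # keep nodes that are downstream of the tip and upstream of the rear
                result[(tip, rear)] = downstream & _reach(rear, backward)
    return result
```

For links 1→2, 2→3, 4→3, 3→5, and 3→6, the join node 3 has two inlinks and two outlinks, and the sketch returns four subgraphs, one per combination, such as {1, 2, 3, 5} for the pair (1, 5).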

After that, the second scoring unit 143 selects one of the determined subgraphs (S730), executes the same processing as that shown in FIG. 8 with the selected subgraph regarded as the document network to be processed, and calculates the first provisional score of each node in the subgraph (S740).

In S740, the second scoring unit 143 can calculate the first provisional score Xp1[m] of the m-th node according to the equation Xp1[m] = {(2π − arg(Vc[m])) / (π/2n)}^d3. Vc[m] is the score reference value of the m-th node. d3 is any real number greater than 0. The larger d3 is, the more the first provisional score Xp1[m] increases with the distance (the number of links) from the tip node having no inlink.
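
As a numerical illustration of the equation for Xp1[m], the following sketch treats n as a given parameter and reads the divisor π/2n as π/(2n). Both points, together with the convention that arg(·) lies in [0, 2π), are assumptions made here. The function name is hypothetical.

```python
import cmath
import math


def first_provisional_score(vc, n, d3=1.0):
    """Xp1 = ((2*pi - arg Vc) / (pi / (2*n))) ** d3, with arg Vc taken in [0, 2*pi)."""
    angle = cmath.phase(vc) % (2 * math.pi)
    return ((2 * math.pi - angle) / (math.pi / (2 * n))) ** d3
```

Under these assumptions, a score reference value of −i (so that 2π − arg = π/2) with n = 2 gives a ratio of 2, and therefore Xp1 = 2 for d3 = 1 and Xp1 = 4 for d3 = 2.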

The second scoring unit 143 selects the subgraphs one by one (S730) and executes the processing of S740 until the processing of S740 has been executed for all the subgraphs (No in S750). As a result, the second scoring unit 143 calculates the first provisional score of each node in the subgraph for each subgraph (S740).

When the processing of S740 has been executed for all the subgraphs (Yes in S750), the second scoring unit 143 calculates, for each node in the document network, the average value of the first provisional scores of the node as the second provisional score of the node (S760). The second provisional score of a node is the sum of the first provisional scores of the node divided by the number of subgraphs to which the node belongs.

After that, the second scoring unit 143 calculates the third provisional score of each node based on the second provisional score of each node in the document network (S770). In S770, the second scoring unit 143 calculates the third provisional score Xp3[m] of the m-th node from the second provisional score Xp2[m] and the score reference value Vc[m] of the m-th node according to the equation Xp3[m] = Xp2[m] · |Vc[m]|.

After that, the second scoring unit 143 calculates the second score of each node in the document network using the third provisional score of each node (S780). Specifically, the second scoring unit 143 calculates, as the second score of the m-th node in the document network, the value Xp3[m] / M^d4 obtained by dividing the third provisional score Xp3[m] by the value M^d4. Here, M is the product of the number of outlinks of each node that can reach the m-th node from the rearmost node.
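
The chain S760-S780 can be summarized in a few lines. In this sketch the function name and all inputs are hypothetical: the per-subgraph first provisional scores, the magnitude |Vc[m]| of each node, and the factor M of each node are assumed to have been obtained beforehand, and d4 is treated as a free parameter.

```python
def second_scores(first_scores_per_subgraph, vc_magnitude, m_factor, d4=1.0):
    """Combine per-subgraph first provisional scores into the second score of each node.

    first_scores_per_subgraph: list of dicts node -> Xp1 (one dict per subgraph)
    vc_magnitude: dict node -> |Vc[m]|
    m_factor: dict node -> M (product of outlink counts, supplied by the caller)
    """
    totals, counts = {}, {}
    for scores in first_scores_per_subgraph:
        for node, xp1 in scores.items():
            totals[node] = totals.get(node, 0.0) + xp1
            counts[node] = counts.get(node, 0) + 1
    result = {}
    for node, total in totals.items():
        xp2 = total / counts[node]               # S760: average over the node's subgraphs
        xp3 = xp2 * vc_magnitude[node]           # S770: Xp3 = Xp2 * |Vc|
        result[node] = xp3 / m_factor[node] ** d4  # S780: Xp3 / M^d4
    return result
```

For instance, a node with first provisional scores 3.0 and 5.0 in two subgraphs, |Vc| = 0.5, and M = 2 obtains Xp2 = 4.0, Xp3 = 2.0, and a second score of 1.0 when d4 = 1.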

Also according to the third embodiment described above, scoring / ranking of a plurality of web pages can be appropriately performed using the Hermitian adjacency matrix H for a document network having a join node.

[Fourth Embodiment]
Subsequently, the information processing system 1 of the fourth embodiment will be described. The information processing system 1 of the fourth embodiment is different from the first embodiment in that the second scoring unit 143 executes the sub-processing shown in FIG. 30 instead of the sub-processing shown in FIG. On the other hand, the information processing system 1 of the fourth embodiment is basically the same as that of the first embodiment in other respects. Therefore, in the following, a configuration different from that of the first embodiment of the information processing system 1 of the fourth embodiment will be selectively described, and the description of the same configuration as that of the first embodiment will be omitted.

The sub-processing shown in FIG. 30 is executed in S240. Further, in S280, S330, S370, and S430, the same processing as the sub-processing shown in FIG. 30 is executed when calculating the score reference value and the score.

When the sub-processing shown in FIG. 30 is started, the second scoring unit 143 generates the Hermitian adjacency matrix H corresponding to the document network to be processed (S1110), similarly to S1010.

After that, the second scoring unit 143 generates a special Hermitian adjacency matrix H3 which is a modification of the generated Hermitian adjacency matrix H (S1120). The special Hermitian adjacency matrix H3 is generated by substituting all the diagonal components in the special Hermitian adjacency matrix H1 generated in S1020 from a value 0 to a value -1.

That is, the second scoring unit 143 can generate the special Hermitian adjacency matrix H3 by first transforming the Hermitian adjacency matrix H generated in S1110 in the same manner as in S1020 to generate the special Hermitian adjacency matrix H1, and then replacing all the diagonal components of the special Hermitian adjacency matrix H1, which have the value 0, with the value −1.

In the following S1130, the second scoring unit 143 generates the column vector B. The special Hermitian adjacency matrix H3 generated in S1120 is a matrix of N rows and N columns (NxN) corresponding to the number N of nodes of the document network to be processed.

The column vector B generated in S1130 corresponds to a matrix of N rows and 1 column, and is generated so that each component indicates a value according to whether or not the corresponding node is a node having an inlink.

Specifically, the second scoring unit 143 generates the column vector B by setting each component corresponding to a node having no inlink to the value −1, and each component corresponding to a node having an inlink to the value 0.
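
The generation of the column vector B in S1130 can be sketched as follows. The function name and the edge-list representation are assumptions introduced here, and the five-node chain in the usage note is a hypothetical network.

```python
def column_vector_b(nodes, links):
    """B[m] = -1 for a node without any inlink, 0 for a node with at least one inlink."""
    has_inlink = {dst for _, dst in links}
    return [0 if node in has_inlink else -1 for node in nodes]
```

For a chain 1→2→3→4→5, only the first node has no inlink, so the result is [-1, 0, 0, 0, 0].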

In the following S1140, the second scoring unit 143 solves the following simultaneous equations including the special Hermitian adjacency matrix H3 and the column vector B, and thereby obtains each component u[m] (1 ≦ m ≦ N) of the score reference vector U, which is the solution of the simultaneous equations. The score reference vector U corresponds to an N-by-1 matrix like the column vector B. In the following equation, the m-th row component of the score reference vector U is represented by u[m], the m-th row component of the column vector B is represented by b[m], and the special Hermitian adjacency matrix H3 is represented by H'.

For example, assume an exemplary and simple document network in which only the first node has no inlinks and the number of nodes is 5. In this case, the simultaneous equations can be expressed as follows using the exemplary Hermitian adjacency matrix H3.

In the subsequent S1150, the second scoring unit 143 determines the score reference value Vc [m] of each node based on each component u [m] of the score reference vector U calculated in S1140.

As with the eigenvector V1 in the first embodiment, the second scoring unit 143 can correct each component u[m] (1 ≦ m ≦ N) of the score reference vector U by dividing it by the component E = u[s] corresponding to the start point node of the document network. The correction value u[m] / E (1 ≦ m ≦ N) can be further corrected so as to be rotated on the complex plane from the real axis toward the fourth quadrant by an angle θ1. This corrected value can be determined as the score reference value Vc[m].

In the following S1160, the second scoring unit 143 calculates, as the score equivalent value Z[m] (1 ≦ m ≦ N) of each node, the value Z[m] = |Vc[m]| · {2π − arg(Vc[m])} based on the score reference value Vc[m] (1 ≦ m ≦ N) of each node. As in the first embodiment, the score equivalent value Z[m] may instead be calculated according to the formula Z[m] = |Vc[m]|^d1 · {2π − arg(Vc[m])}^d2.

The second scoring unit 143 further calculates the score X of each node D [m] (1 ≦ m ≦ N) in the document network based on the score equivalent value Z [m] (S1160). The second scoring unit 143 can calculate the score X corresponding to the node D [m] according to X = Z [m] −Z0, as in the first embodiment. Z0 is, for example, the minimum value of Z [m] in the entire document network. Z0 may have a value of zero.
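
The calculation of S1160 can be illustrated numerically as follows. The function name is hypothetical, and arg(·) is assumed to lie in [0, 2π), which is the convention under which a score reference value in the fourth quadrant gives a small positive value of 2π − arg.

```python
import cmath
import math


def scores_from_reference_values(vc, d1=1.0, d2=1.0, z0=None):
    """Return X[m] = Z[m] - Z0 with Z[m] = |Vc[m]|^d1 * (2*pi - arg Vc[m])^d2.

    z0=None uses the minimum of Z[m] as Z0; pass z0=0.0 to leave Z unshifted.
    """
    z = []
    for v in vc:
        angle = cmath.phase(v) % (2 * math.pi)  # assumption: arg in [0, 2*pi)
        z.append(abs(v) ** d1 * (2 * math.pi - angle) ** d2)
    base = min(z) if z0 is None else z0
    return [zm - base for zm in z]
```

For reference values of magnitude 1 at angles −0.1, −π/2, and π, the values of Z are 0.1, π/2, and π, so with Z0 equal to the minimum the scores X are 0, π/2 − 0.1, and π − 0.1.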

In this embodiment, the score X of each node is calculated in this way. The treatment of the calculated score X is the same as that of the first embodiment. According to the present embodiment, the score X of each node can be calculated using the Hermitian adjacency matrix H without calculating the eigenvalues and the eigenvectors as in the first embodiment.

[Fifth Embodiment]
Subsequently, the information processing system 1 of the fifth embodiment will be described. The information processing system 1 of the fifth embodiment is different from the first embodiment in that the second scoring unit 143 executes the sub-processing shown in FIG. 31 instead of the sub-processing shown in FIG. On the other hand, the information processing system 1 of the fifth embodiment is basically the same as that of the first embodiment in other respects. Therefore, in the following, a configuration different from that of the first embodiment of the information processing system 1 of the fifth embodiment will be selectively described, and the description of the same configuration as that of the first embodiment will be omitted.

The sub-processing shown in FIG. 31 is executed in S240. Further, in S280, S330, S370, and S430, the same processing as the sub-processing shown in FIG. 31 is executed when calculating the score reference value and the score.

When the sub-processing shown in FIG. 31 is started, the second scoring unit 143 generates the Hermitian adjacency matrix H corresponding to the document network to be processed (S1210), similarly to S1010.

After that, the second scoring unit 143 generates a special Hermitian adjacency matrix H4 which is a modification of the generated Hermitian adjacency matrix H (S1220). In order to generate the special Hermitian adjacency matrix H4, the second scoring unit 143 can replace all the components of the value +i in the Hermitian adjacency matrix H with the value 0. Further, all the components of the value −i can be replaced with the value C1(C2 − i). By this substitution, the exemplary Hermitian adjacency matrix H shown in the upper part of FIG. 32 is replaced with the matrix H⁽¹⁾ shown in the lower part of FIG. 32.

The second scoring unit 143 further identifies, in the above matrix H⁽¹⁾, each column corresponding to a node having two or more outlinks, in other words, each column in which the number of components having the value −i in the Hermitian adjacency matrix H is two or more, and replaces the components of the value C1(C2 − i) in that column with the value C1(C2 − i) / R. The value R is the number of outlinks and corresponds to the number of components having the value −i in the corresponding column of the Hermitian adjacency matrix H. By this substitution, the matrix H⁽¹⁾ shown in the lower part of FIG. 32 is replaced with the matrix H⁽²⁾ shown in the upper part of FIG. 33.

The second scoring unit 143 further replaces all the diagonal components in the matrix H⁽²⁾ with the value −1, as shown in the lower part of FIG. 33, to generate the special Hermitian adjacency matrix H4. In the subsequent S1230, the second scoring unit 143 generates the column vector B as in the process in S1130.

After that, the second scoring unit 143 solves the simultaneous equations including the special Hermitian adjacency matrix H4 and the column vector B in the same manner as the processing in S1140, so that each component of the score reference vector U corresponding to the solution of the simultaneous equations u [m] (1 ≦ m ≦ N) is obtained (S1240).

The processing in S1240 is the same as the processing in S1140 except that the special Hermitian adjacency matrix H4 is used instead of the special Hermitian adjacency matrix H3. In the solution of simultaneous equations, the component corresponding to the tip node having no inlink shows the real number 1.
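
The flow of S1210-S1240 can be sketched end to end as follows. Several points are assumptions made for this illustration: the simultaneous equations are taken to have the form H4 · U = B (the equation itself is not reproduced in this text), C1 and C2 are treated as caller-supplied parameters, nodes are numbered from 0, and all function names are hypothetical. The Hermitian adjacency matrix follows the four-value definition (1, 0, +i, −i) used throughout, and the linear system is solved by plain Gaussian elimination.

```python
def hermitian_adjacency(n, links):
    """h[p][q] = 1 (links both ways), +1j (p->q only), -1j (q->p only), else 0."""
    link_set = set(links)
    h = [[0j] * n for _ in range(n)]
    for p in range(n):
        for q in range(n):
            if p == q:
                continue
            out, back = (p, q) in link_set, (q, p) in link_set
            if out and back:
                h[p][q] = 1 + 0j
            elif out:
                h[p][q] = 1j
            elif back:
                h[p][q] = -1j
    return h


def special_matrix_h4(h, c1, c2):
    """Apply the substitutions of S1220 to the Hermitian adjacency matrix h."""
    n = len(h)
    h4 = [[0j] * n for _ in range(n)]
    for q in range(n):
        r = sum(1 for p in range(n) if h[p][q] == -1j)  # R: -i entries in column q
        for p in range(n):
            if p == q:
                h4[p][q] = -1 + 0j             # diagonal components become -1
            elif h[p][q] == -1j:
                h4[p][q] = c1 * (c2 - 1j) / r  # for R = 1 the division changes nothing
            elif h[p][q] == 1j:
                h4[p][q] = 0j                  # +i components become 0
            else:
                h4[p][q] = h[p][q]
    return h4


def solve_linear(a, b):
    """Solve a * u = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [complex(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]
```

Under these assumptions, a chain 0→1→2 with C1 = 1 and C2 = 0 and B = [−1, 0, 0] yields U = [1, −i, −1]: the component of the tip node is the real number 1, and each link multiplies the component by −i. A singular matrix would raise ZeroDivisionError in this sketch.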

In the following S1250, the second scoring unit 143 determines the score reference value Vc [m] of each node based on each component u [m] of the score reference vector U calculated in S1240. According to the present embodiment, the second scoring unit 143 can determine the score reference value Vc [m] of each node to be the same value as the corresponding component u [m] of the score reference vector U.

In the following S1260, the second scoring unit 143 calculates the score equivalent value Z[m] (1 ≦ m ≦ N) of each node according to the following equation, based on the score reference value Vc[m] (1 ≦ m ≦ N) of each node. The value d is any real number greater than zero. The value d may be the value 1.

The second scoring unit 143 further calculates the score X of each node D [m] (1 ≦ m ≦ N) in the document network based on the score equivalent value Z [m] (S1260). The second scoring unit 143 can calculate the score X corresponding to the node D [m] according to X = Z [m] −Z0, as in the first embodiment. Z0 is, for example, the minimum value of Z [m] in the entire document network. Z0 may have a value of zero.

In this embodiment, the score X of each node is calculated in this way. The treatment of the calculated score X is the same as that of the first embodiment. According to the present embodiment, the score X of each node can be calculated using the Hermitian adjacency matrix H without obtaining the eigenvalues and the eigenvectors as in the first embodiment.

Although the exemplary embodiments of the present disclosure have been described above, the present disclosure is not limited to the above-described embodiments. The disclosure may apply to scoring documents with link / citation relationships that are not limited to web documents. The technical ideas according to the fourth embodiment and the fifth embodiment may be applied to the second embodiment or the third embodiment.

The technique of adding, to a document network having no tip node, a dummy node DP that has no inlink and has an outlink to every node in the document network can be applied to the first to fifth embodiments. Furthermore, the dummy node DP having no inlink may also be added to a document network that has a tip node.

Similarly, the technique of adding, to a document network having no rear end node, a dummy node DP that has no outlink and has an inlink from every node in the document network can be applied to the first to fifth embodiments. Furthermore, the dummy node DP having no outlink may also be added to a document network that has a rear end node.

The functions of one component in the above embodiments may be distributed to a plurality of components. The functions of a plurality of components may be integrated into one component. Some of the configurations of the above embodiments may be omitted. All aspects included in the technical idea identified from the wording of the claims are embodiments of the present disclosure.

Claims

An information processing system comprising:
a document network discriminating unit configured to discriminate, based on data representing the connection relationship between documents, a document network composed of a plurality of documents linked by at least weak connection;
a document discriminating unit configured to discriminate a specific document that is included in the discriminated document network and has inlinks from two or more documents;
a subnetwork discriminating unit configured to discriminate a plurality of subnetworks included in the document network based on the discriminated specific document; and
a score calculation unit configured to calculate the score of each of the plurality of documents constituting the document network by executing individual processing for each of the discriminated plurality of subnetworks, the score calculation unit calculating, in the individual processing, the score of each document included in the corresponding subnetwork,
wherein
the document network includes one or more duplicate documents that belong to two or more of the plurality of subnetworks, and
the score calculation unit calculates, for each of the one or more duplicate documents, one score for the corresponding duplicate document by integrating the scores of the corresponding duplicate document in the two or more subnetworks.
The information processing system according to claim 1, wherein
the subnetwork discriminating unit discriminates the plurality of subnetworks with the specific document as a boundary,
the plurality of subnetworks include at least two or more upstream subnetworks corresponding to the two or more inlinks of the specific document, each of the upstream subnetworks being connected to the specific document through the corresponding one inlink, and a downstream subnetwork connected to the specific document through an outlink of the specific document,
the specific document is the duplicate document belonging to the two or more upstream subnetworks, and
the score calculation unit calculates one integrated score for the specific document by integrating the scores of the specific document in the upstream subnetworks, and calculates the score of each document belonging to the downstream subnetwork based on the integrated score of the specific document.
The information processing system according to claim 1, wherein
the subnetwork discriminating unit discriminates, as the plurality of subnetworks, a subnetwork for each inlink of the specific document, each such subnetwork including the document group located upstream of the corresponding inlink, the specific document, and the document group located downstream of the outlinks of the specific document.
The information processing system according to claim 1, wherein
the subnetwork discriminating unit discriminates, as the plurality of subnetworks, a subnetwork for each combination of an inlink and an outlink of the specific document, each such subnetwork including the document group located upstream of the inlink corresponding to the combination, the specific document, and the document group located downstream of the outlink corresponding to the combination.
The information processing system according to claim 3 or 4, wherein
the integration is achieved by calculating the sum of the scores of the corresponding duplicate document in the two or more subnetworks, or by calculating a representative value of the scores in the two or more subnetworks.
The information processing system according to any one of claims 1 to 5, wherein
the individual processing includes a process of calculating the score of each document included in the corresponding subnetwork by using a Hermitian adjacency matrix based on the connection relationship between documents in the corresponding subnetwork.
The information processing system according to claim 6, wherein the individual processing further includes a process of modifying the corresponding subnetwork by adding a dummy document to a trailing document that is included in the corresponding subnetwork and has no outlink, so that the trailing document is virtually provided with an outlink, and defining the Hermitian adjacency matrix based on the connection relationship between documents in the modified subnetwork.
The information processing system according to claim 6 or claim 7, wherein the Hermitian adjacency matrix is an N-row, N-column Hermitian matrix based on the connection relationship between the documents D[m] (m is an integer, 1 ≦ m ≦ N) constituting the corresponding subnetwork, in which the component h(p, q) in row p and column q indicates the value 1 when there is a link from document D[p] to document D[q] and a link from document D[q] to document D[p], indicates the value 0 when there is neither a link from document D[p] to document D[q] nor a link from document D[q] to document D[p], indicates the value +i (i is the imaginary unit) when there is a link from document D[p] to document D[q] but no link from document D[q] to document D[p], and indicates the value −i when there is no link from document D[p] to document D[q] but there is a link from document D[q] to document D[p], and in which the diagonal components are zero.
The information processing system according to claim 8.
The individual processing includes a process of transforming the Hermitian adjacency matrix to define a special Hermitian adjacency matrix and calculating the score of each document included in the corresponding subnetwork using an eigenvector of the special Hermitian adjacency matrix. The special Hermitian adjacency matrix corresponds to the Hermitian adjacency matrix transformed so that, when the components of the eigenvector are placed on the complex plane, all of the components fall within an angular range of π/2 radians.
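The π/2 condition on the eigenvector components can be checked with a short Python sketch. This only tests whether a given vector satisfies the angular condition; it does not reproduce the transformation that yields the special Hermitian adjacency matrix, which this claim does not spell out. The function names and tolerances are assumptions.

```python
import numpy as np


def angular_spread(vector, tol=1e-12):
    """Smallest arc (in radians) containing the arguments of all
    nonzero components of a complex vector."""
    components = np.asarray(vector, dtype=complex)
    nonzero = components[np.abs(components) > tol]
    if nonzero.size <= 1:
        return 0.0
    angles = np.sort(np.angle(nonzero))
    gaps = np.diff(angles)
    # Gap that wraps around from the largest angle back to the smallest.
    wrap_gap = angles[0] + 2 * np.pi - angles[-1]
    largest_gap = max(gaps.max(), wrap_gap)
    return float(2 * np.pi - largest_gap)


def within_quarter_turn(vector, tol=1e-9):
    """True when all nonzero components fit in an arc of pi/2 radians."""
    return angular_spread(vector) <= np.pi / 2 + tol
```

The spread is unchanged when the whole vector is multiplied by a unit complex number, so the check does not depend on the arbitrary overall phase of an eigenvector returned by a numerical solver.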
An information processing system comprising:
a processor; and
a memory containing instructions for causing the processor to perform a specific process.
The specific process includes:
identifying, based on data representing the connection relationships between documents, a document network consisting of a plurality of documents connected by at least a weak connection;
identifying a specific document that is included in the identified document network and has in-links from two or more documents;
identifying a plurality of subnetworks included in the document network based on the identified specific document; and
calculating a score for each of the plurality of documents constituting the document network by executing, as individual processing for each of the identified subnetworks, a process of calculating the score of each document included in the corresponding subnetwork.
The document network includes one or more duplicate documents that belong to two or more of the plurality of subnetworks.
Calculating the score for each of the plurality of documents constituting the document network includes calculating, for each of the one or more duplicate documents, one score for the corresponding duplicate document by integrating the scores of that duplicate document in the two or more subnetworks.
An information processing method executed by a computer, the method including:
identifying, based on data representing the connection relationships between documents, a document network consisting of a plurality of documents connected by at least a weak connection;
identifying a specific document that is included in the identified document network and has in-links from two or more documents;
identifying a plurality of subnetworks included in the document network based on the identified specific document; and
calculating a score for each of the plurality of documents constituting the document network by executing, as individual processing for each of the identified subnetworks, a process of calculating the score of each document included in the corresponding subnetwork.
The document network includes one or more duplicate documents that belong to two or more of the plurality of subnetworks.
Calculating the score for each of the plurality of documents constituting the document network includes calculating, for each of the one or more duplicate documents, one score for the corresponding duplicate document by integrating the scores of that duplicate document in the two or more subnetworks.
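The first two steps of the method, finding a weakly connected document network and the documents in it with in-links from two or more documents, can be sketched in Python as follows. The link representation as `(source, target)` tuples and both function names are assumptions for illustration.

```python
from collections import defaultdict, deque


def weakly_connected_components(documents, links):
    """Group documents that are connected when link direction is ignored."""
    neighbours = defaultdict(set)
    for src, dst in links:
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    seen, components = set(), []
    for start in documents:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:  # breadth-first search over undirected neighbours
            doc = queue.popleft()
            component.add(doc)
            for nxt in neighbours[doc]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components


def specific_documents(component, links):
    """Documents in the component with in-links from two or more documents."""
    in_sources = defaultdict(set)
    for src, dst in links:
        if src in component and dst in component and src != dst:
            in_sources[dst].add(src)
    return {doc for doc, srcs in in_sources.items() if len(srcs) >= 2}
```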
The information processing method according to claim 11.
Identifying the plurality of subnetworks includes identifying a plurality of subnetworks having the specific document as a boundary.
The plurality of subnetworks include at least: two or more upstream subnetworks corresponding to the two or more in-links of the specific document, each upstream subnetwork being connected to the specific document through the corresponding one of the in-links; and a downstream subnetwork connected to the specific document through an out-link of the specific document.
The specific document is the duplicate document belonging to the two or more upstream subnetworks.
Calculating the score for each of the plurality of documents constituting the document network includes calculating one integrated score for the specific document by integrating the scores of the specific document in the upstream subnetworks, and calculating the score of each document belonging to the downstream subnetwork based on the integrated score of the specific document.
The information processing method according to claim 11.
Identifying the plurality of subnetworks includes identifying, for each in-link of the specific document, a subnetwork that includes the document group located upstream of the corresponding in-link, the specific document, and the document group located downstream of the out-links of the specific document.
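The per-in-link split can be illustrated with a minimal Python sketch. It assumes that "upstream of an in-link" means every document that can reach the source of that in-link without passing through the specific document, and that "downstream" means every document reachable from the specific document. Both readings, and all names below, are assumptions for illustration.

```python
from collections import defaultdict


def reachable(start, adjacency, blocked=None):
    """Documents reachable from start (inclusive), never entering `blocked`."""
    seen, stack = {start}, [start]
    while stack:
        doc = stack.pop()
        for nxt in adjacency[doc]:
            if nxt not in seen and nxt != blocked:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def subnetworks_per_inlink(specific, links):
    """Map each in-link source of `specific` to its subnetwork (a set)."""
    successors, predecessors = defaultdict(set), defaultdict(set)
    for src, dst in links:
        successors[src].add(dst)
        predecessors[dst].add(src)

    # The specific document together with everything downstream of it.
    downstream = reachable(specific, successors)
    result = {}
    for source in sorted(predecessors[specific]):
        # Walk links backwards from the in-link's source document.
        upstream = reachable(source, predecessors, blocked=specific)
        result[source] = upstream | downstream
    return result
```

Under these assumptions the specific document and its downstream documents appear in every per-in-link subnetwork, so they are the duplicate documents whose scores are later integrated.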
The information processing method according to claim 11.
Identifying the plurality of subnetworks includes identifying, for each combination of an in-link and an out-link of the specific document, a subnetwork that includes the document group located upstream of the in-link corresponding to the combination, the specific document, and the document group located downstream of the out-link corresponding to the combination.
The information processing method according to claim 13 or 14.
The integration is achieved by calculating the sum of the scores of the corresponding duplicate document in the two or more subnetworks, or by calculating a representative value of the scores in the two or more subnetworks.
A computer program for causing a computer to function as the document network discriminating unit, the document discriminating unit, the sub-network discriminating unit, and the score calculating unit included in the information processing system according to any one of claims 1 to 9.