WO2019106878A1 - Information processing system, information processing method, and computer program - Google Patents

Info

Publication number
WO2019106878A1
Authority
WO
WIPO (PCT)
Prior art keywords
document
value
component
information processing
adjacency matrix
Prior art date
Application number
PCT/JP2018/026560
Other languages
French (fr)
Japanese (ja)
Inventor
桂太 杉原
Original Assignee
桂太 杉原
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 桂太 杉原
Priority to JP2018559909A (JP6502592B1)
Publication of WO2019106878A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/16: Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Definitions

  • the present disclosure relates to an information processing system, an information processing method, and a computer program.
  • A page rank is set higher for a web page that is linked from many web pages.
  • The page rank calculation uses an adjacency matrix in which the connection relationship between web pages is represented in binary form by the values 0 and 1, or a matrix with the values 0, 1, and other real numbers obtained by modifying each component of the adjacency matrix.
  • an information processing system for scoring a plurality of documents includes an identification unit, a determination unit, a setting unit, a definition unit, an eigenvector calculation unit, and a score calculation unit.
  • The identification unit identifies one or more document networks, each including at least a weakly connected document group, based on data representing a connection relationship between a plurality of documents.
  • The determination unit determines, for each of the one or more document networks, a start document, which is the document located at the start point of the corresponding document network.
  • For each of the one or more document networks, the setting unit sets a virtual document network that includes dummy documents, by connecting a virtual dummy document of in-degree 1 and out-degree 0 to each document of out-degree 0 in the corresponding document network.
  • the definition unit defines a special Hermite adjacency matrix for each of the one or more virtual document networks corresponding to the one or more document networks.
  • The special Hermitian adjacency matrix is a modification of the N-by-N Hermitian adjacency matrix based on the connection relationship between the documents D[m] (1 ≤ m ≤ N) constituting the corresponding virtual document network.
  • m is an integer.
  • the Hermitian adjacency matrix is a Hermitian matrix with zero diagonal components.
  • The component h(p, q) at the p-th row and the q-th column indicates the value 1 when there is a link from the document D[p] to the document D[q] and a link from the document D[q] to the document D[p]; the value 0 when neither link exists; the value +i (i is the imaginary unit) when there is a link from the document D[p] to the document D[q] but no link from the document D[q] to the document D[p]; and the value -i when there is no link from the document D[p] to the document D[q] but there is a link from the document D[q] to the document D[p].
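The four-valued definition above can be illustrated with a minimal Python sketch. The function name `hermitian_adjacency`, the 0-based document indices, and the representation of links as `(p, q)` pairs are assumptions made for illustration; they do not come from the disclosure.

```python
def hermitian_adjacency(n, links):
    """Build an N-by-N Hermitian adjacency matrix from directed links.

    `links` holds (p, q) pairs meaning "document p links to document q".
    Indices are 0-based here (an assumption; the text numbers documents from 1).
    """
    link_set = set(links)
    h = [[0j] * n for _ in range(n)]
    for p in range(n):
        for q in range(n):
            if p == q:
                continue  # diagonal components stay zero
            forward = (p, q) in link_set
            backward = (q, p) in link_set
            if forward and backward:
                h[p][q] = 1 + 0j   # links in both directions
            elif forward:
                h[p][q] = 1j       # link p -> q only
            elif backward:
                h[p][q] = -1j      # link q -> p only
    return h


# Example network: 0 -> 1, and 1 <-> 2
H = hermitian_adjacency(3, {(0, 1), (1, 2), (2, 1)})
```

Because h(q, p) is always the complex conjugate of h(p, q), the resulting matrix is Hermitian by construction.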
  • the eigenvector calculation unit calculates an eigenvector corresponding to the largest absolute value eigenvalue of the special Hermite adjacency matrix.
  • The eigenvector has a component corresponding to each of the documents D[m] (1 ≤ m ≤ N).
  • the eigenvector calculation unit can calculate an eigenvector corresponding to a positive eigenvalue, when a positive eigenvalue and a negative eigenvalue are present as the maximum absolute value eigenvalues.
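As a hedged sketch of this eigenvector selection, the following uses NumPy's `numpy.linalg.eigh`, which returns real eigenvalues in ascending order for a Hermitian matrix. The function name `dominant_eigenvector` and the tolerance `tol` used to detect a tie between a positive and a negative eigenvalue are illustrative assumptions.

```python
import numpy as np


def dominant_eigenvector(h, tol=1e-9):
    """Return (eigenvalue, eigenvector) for the eigenvalue of largest absolute value.

    When a positive and a negative eigenvalue tie for the largest absolute
    value (within `tol`, an assumed tolerance), the positive one is chosen.
    """
    h = np.asarray(h, dtype=complex)
    eigenvalues, eigenvectors = np.linalg.eigh(h)  # real eigenvalues, ascending
    largest_abs = np.max(np.abs(eigenvalues))
    # Indices of all eigenvalues whose absolute value ties for the maximum.
    candidates = np.flatnonzero(np.abs(np.abs(eigenvalues) - largest_abs) <= tol)
    # Among the tied candidates, prefer the largest signed (positive) eigenvalue.
    best = candidates[np.argmax(eigenvalues[candidates])]
    return eigenvalues[best], eigenvectors[:, best]


# Two documents with a single one-way link: eigenvalues are +1 and -1.
lam, vec = dominant_eigenvector([[0, 1j], [-1j, 0]])
```

The eigenvector is only defined up to a complex scale factor, which is why the later steps fix its orientation relative to the start document.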
  • The score calculation unit calculates the score of each of the documents D[m] (1 ≤ m ≤ N) based on the positional relationship between the documents D[m] (1 ≤ m ≤ N) on the complex plane when the components of the eigenvector are arranged on the complex plane with reference to the component corresponding to the start document.
  • a plurality of documents are scored using a special Hermite adjacency matrix corresponding to a Hermitian adjacency matrix capable of expressing the connection relation between documents with four values of 1, 0, + i and -i. For this reason, it is not necessary to presume virtual connection relationships from all documents to all documents, and scoring / ranking of each document based on the connection relationship between documents can be realized more appropriately than in the past.
  • The definition unit may be configured to define the special Hermitian adjacency matrix by modifying the Hermitian adjacency matrix so that, when each component of the eigenvector is placed on the complex plane, all components fall within an angular range of π/2 radians (that is, 90 degrees) in the complex plane.
  • In this case, the score is calculated after the Hermitian adjacency matrix has been modified so that the components of the eigenvector corresponding to the documents do not make more than one rotation on the complex plane, and in particular so that they fit within the region corresponding to one quadrant of the complex plane. Distance information in the rotational direction on the complex plane can therefore be used to score each document appropriately.
  • The special Hermitian adjacency matrix may be obtained by replacing each component indicating the value +i in the Hermitian adjacency matrix with the value C1(C2 + i), based on the first correction amount C1 and the second correction amount C2, and replacing each component indicating the value -i with the value C1(C2 - i). The value C1(C2 + i) of the components of each row in the Hermitian adjacency matrix after the replacement is then changed according to the number of components indicating the value C1(C2 + i) or the value 1 in the same row.
  • the first correction amount C1 and the second correction amount C2 can be determined as follows using the parameter n.
  • The parameter n may be determined, based on the number of documents max{N} in the virtual document network having the largest number of documents N among the one or more virtual document networks, so that the components fall within an angular range of π/2 radians in the complex plane.
  • The first correction amount C1 and the second correction amount C2 help to define a special Hermitian adjacency matrix such that all components fall within an angular range of π/2 radians in the complex plane while the relationship among the components of the eigenvector is maintained.
  • The special Hermitian adjacency matrix replaces each component indicating the value +i in the Hermitian adjacency matrix with the value C1(C2 + i) and replaces each component indicating the value -i with the value C1(C2 - i). The value C1(C2 + i) of the components of each row in the Hermitian adjacency matrix after the replacement is changed to the value {C1(C2 + i)/W}, that is, divided by the number W of components indicating the value C1(C2 + i) or the value 1 in the same row. The value C1(C2 - i) of the component located symmetrically across the diagonal from a component changed to the value {C1(C2 + i)/W} is changed to the complex conjugate {C1(C2 - i)/W} of the value {C1(C2 + i)/W}. The value {C1(C2 - i)/W} of the components of each row in the modified Hermitian adjacency matrix H is changed to the value {C1(C2 - i)Z/W}, that is, multiplied by the number Z of components that indicated the value C1(C2 - i) or the value 1 in the same row before the change, and further symmetrical
  • Alternatively, the special Hermitian adjacency matrix may correspond to a Hermitian matrix defined as follows: each component indicating the value +i in the Hermitian adjacency matrix is replaced with the value C + i based on the correction amount C; the value C + i of the components of each row in the Hermitian adjacency matrix after the replacement is changed to the value {(C + i)/R}, that is, divided by the number R of components indicating the value C + i in the same row; and each component indicating the value -i is changed to the complex conjugate {(C - i)/R} of the component located symmetrically across the diagonal.
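The single-correction-amount variant can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: the function name `special_hermitian_simple` is hypothetical, the matrix is a list of lists whose entries are exactly 0, 1, 1j, or -1j, and the correction amount `c` is supplied by the caller because the formula that derives C from the parameter n is not reproduced in this text.

```python
def special_hermitian_simple(h, c):
    """Apply the correction amount `c` to a Hermitian adjacency matrix.

    Each +i in row p becomes (c + i) / R, where R is the number of +i
    components in that row; the mirrored -i component becomes the complex
    conjugate (c - i) / R. Components with value 0 or 1 are left unchanged.
    """
    n = len(h)
    s = [row[:] for row in h]  # work on a copy
    for p in range(n):
        plus_cols = [q for q in range(n) if h[p][q] == 1j]
        r = len(plus_cols)
        for q in plus_cols:
            value = (c + 1j) / r
            s[p][q] = value
            s[q][p] = value.conjugate()  # keeps the matrix Hermitian
    return s


# Document 0 links one-way to documents 1 and 2, so R = 2 for row 0.
h = [[0j, 1j, 1j],
     [-1j, 0j, 0j],
     [-1j, 0j, 0j]]
s = special_hermitian_simple(h, 2.0)
```

Every -i component sits opposite a +i component, so updating the mirrored entry inside the same loop covers all of them.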
  • the correction amount C may be determined as follows using the parameter n.
  • The correction amount C serves to define a special Hermitian adjacency matrix such that all components of the eigenvector fall within an angular range of π/2 radians in the complex plane.
  • For each of the one or more virtual document networks, the score calculation unit may apply a rotational transformation to the value V[m] (1 ≤ m ≤ N) of each component of the eigenvector so that the component corresponding to the start document is placed at a specific position in the complex plane. The score calculation unit may calculate each score of the document D[m] (1 ≤ m ≤ N) based on the value Vc[m] after rotational transformation of the corresponding document D[m].
  • The score calculation unit may apply the rotational transformation to the value V[m] (1 ≤ m ≤ N) of each component of the eigenvector so that the component corresponding to the start document is placed, as the specific position, at a predetermined non-zero angle from a specific axis located at a boundary of one quadrant in the complex plane.
  • The score calculation unit may calculate, as each score of the document D[m] (1 ≤ m ≤ N), based on the value Vc[m] after rotational transformation of the corresponding document D[m], a value corresponding to the product (θ[m]·L[m]) of the angle θ[m] of the value Vc[m] from the specific axis in the complex plane and the absolute value L[m] of the value Vc[m].
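A minimal sketch of the rotational transformation and the angle-times-length score follows. Several choices here are assumptions for illustration only: the specific axis is taken to be the positive real axis, angles are measured counterclockwise with `cmath.phase`, and the function name `rotate_and_score` and the offset angle `phi` are hypothetical.

```python
import cmath


def rotate_and_score(v, start, phi):
    """Rotate eigenvector components and score them as angle * length.

    `v` is a list of complex components V[m], `start` is the index of the
    start document, and `phi` is the assumed non-zero angle (in radians,
    from the positive real axis) at which the start component is placed.
    """
    # One common rotation factor moves V[start] to the angle phi.
    rotation = cmath.exp(1j * (phi - cmath.phase(v[start])))
    vc = [x * rotation for x in v]  # values Vc[m] after rotation
    # Score: angle of Vc[m] from the axis times the absolute value of Vc[m].
    scores = [cmath.phase(z) * abs(z) for z in vc]
    return vc, scores


vc, scores = rotate_and_score([1j, 1 + 1j], start=0, phi=cmath.pi / 8)
```

Multiplying every component by the same unit-modulus factor preserves the relative angles and lengths, so only the reference direction changes.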
  • For each of the one or more virtual document networks, the score calculation unit may divide the value V[m] (1 ≤ m ≤ N) of each component of the eigenvector by a specific value E based on the value V[s] of the component corresponding to the start document, so that the component corresponding to the start document is placed on a specific axis of the complex plane.
  • The score calculation unit may then apply a rotational transformation to the value V[m]/E (1 ≤ m ≤ N) of each component of the eigenvector so that the component corresponding to the start document is rotated from the specific axis to a position at a predetermined non-zero angle.
  • Alternatively, the score calculation unit may divide the value V[m] (1 ≤ m ≤ N) of each component of the eigenvector by the specific value E, and then apply a rotational transformation to the value V[m]/E (1 ≤ m ≤ N) of each component so that the component located most upstream of the specific axis in the complex plane, with respect to the rotation direction of the rotational transformation, is rotated from the specific axis to a position at a predetermined non-zero angle.
  • Each score of the document D[m] (1 ≤ m ≤ N) may be calculated as a value obtained by subtracting a common amount from the product (θ[m]·L[m]) so that the minimum of all the scores of the documents D[m] (1 ≤ m ≤ N) is zero.
  • For each of the one or more virtual document networks, the score calculation unit may divide the value V[m] (1 ≤ m ≤ N) of each component of the eigenvector by the specific value E based on the value V[s] of the component corresponding to the start document, thereby placing the component corresponding to the start document on the specific axis of the complex plane, and may then calculate the score of each of the documents D[m] (1 ≤ m ≤ N).
  • the particular axis may be a real axis in the complex plane.
  • The score calculation unit can place the component corresponding to the start document on the real axis of the complex plane by dividing the value V[m] of each component of the eigenvector by the value V[s] of the component corresponding to the start document, used as the specific value E.
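The normalisation by E = V[s] and the shift that makes the minimum score zero can be sketched as two small helpers. The function names `normalise_by_start` and `shift_min_to_zero` are hypothetical, and the second helper assumes the products θ[m]·L[m] have already been computed.

```python
def normalise_by_start(v, start):
    """Divide every component V[m] by E = V[start].

    After the division the start component equals 1 + 0j, so it lies on
    the real axis of the complex plane.
    """
    e = v[start]
    return [x / e for x in v]


def shift_min_to_zero(products):
    """Subtract the smallest product so that the minimum score becomes zero."""
    lowest = min(products)
    return [p - lowest for p in products]


vn = normalise_by_start([2j, 1 + 1j], start=0)
shifted = shift_min_to_zero([0.3, 0.1, 0.5])
```

Dividing by a complex number both rescales and rotates the components, which is why a single division is enough to bring the start component onto the real axis.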
  • The determination unit may determine a document located at the start point of the longest path in the corresponding document network as the start document. In one aspect of the present disclosure, when a plurality of paths correspond to the longest path in the corresponding document network, the determination unit may determine, as the start document, the document with the highest out-degree or the document with the lowest out-degree among the documents located at the start points of the plurality of paths.
  • When a plurality of paths correspond to the longest path in the corresponding document network, the determination unit may place a virtual dummy document of in-degree zero connected to the documents located at the start points of the plurality of paths, and may determine the placed virtual dummy document as the start document.
  • When a plurality of paths correspond to the longest path in the corresponding document network and, further, a plurality of documents have the highest out-degree or the lowest out-degree among the documents located at the start points of the plurality of paths, the determination unit may place a virtual dummy document of in-degree zero connected to those documents with the highest or lowest out-degree, and may determine the placed virtual dummy document as the start document.
  • the arrangement determination unit may be configured to determine the arrangement of documents in the list of documents corresponding to the search query based on the scores of the plurality of documents calculated by the score calculation unit.
  • The arrangement determination unit may generate a list of documents corresponding to a search query such that documents with higher scores are placed higher in the list.
  • an information processing method may be provided that includes a procedure performed by the above-described information processing system. That is, according to one aspect of the present disclosure, an information processing method for scoring a plurality of documents may be provided.
  • An information processing method includes: identifying one or more document networks, each including at least a weakly connected document group, based on data representing a connection relationship between a plurality of documents; determining, for each of the one or more document networks, a start document, which is a document located at the start point of the corresponding document network; setting, for each of the one or more document networks, a virtual document network including dummy documents by connecting a virtual dummy document of in-degree 1 and out-degree 0 to each document of out-degree 0 in the corresponding document network; defining a special Hermitian adjacency matrix for each of the one or more virtual document networks corresponding to the one or more document networks; calculating an eigenvector that corresponds to the eigenvalue of maximum absolute value of the special Hermitian adjacency matrix and that has a component corresponding to each of the documents D[m] (1 ≤ m ≤ N); and calculating, for each of the one or more virtual document networks, the score of each document D[m] based on the positional relationship on the complex plane between the documents D[m] when the components of the eigenvector are arranged on the complex plane with reference to the component corresponding to the start document.
  • The special Hermitian adjacency matrix may be defined by modifying the Hermitian adjacency matrix so that, when each component of the eigenvector is placed on the complex plane, all the components fall within an angular range of π/2 radians in the complex plane.
  • a computer program for causing a computer to execute the above-described information processing method may be provided.
  • a non-transitory tangible storage medium storing the computer program may be provided.
  • an information processing system may be provided that includes a processor and a memory that stores a computer program for causing the processor to perform processing for scoring a plurality of documents.
  • FIG. 5A is a diagram showing an example of a linked document network
  • FIG. 5B is a diagram showing a corresponding virtual document network
  • FIG. 6A is a diagram showing an example of a linked document network
  • FIG. 6B is a diagram showing a corresponding virtual document network
  • FIG. 7A shows an example of a linked document network
  • FIG. 7B shows a corresponding virtual document network
  • FIG. 8A shows an example of a linked document network
  • FIG. 8B shows a corresponding virtual document network.
  • FIG. 9A shows an example of a linked document network
  • FIG. 9B shows a corresponding virtual document network
  • FIG. 10A is a diagram showing an example of a linked document network
  • FIG. 10B is a diagram showing a corresponding virtual document network.
  • FIG. 11 is a diagram showing an example of a linked document network. FIG. 12 is a flowchart showing the process that the second scoring unit performs to set the start page. Further figures comprise a diagram explaining the method of generating the special Hermitian adjacency matrix and a flowchart showing the process that the second scoring unit performs for rotational transformation.
  • FIG. 15A is an explanatory diagram of the arrangement of a plurality of components on a complex plane
  • FIG. 15B is a diagram illustrating a method of calculating a score. The remaining figures comprise a diagram showing a modification of the special Hermitian adjacency matrix, a flowchart showing the processing that the second scoring unit performs in another modification, a diagram showing another modification of the special Hermitian adjacency matrix, and a diagram explaining the score calculation method in that other modification.
  • Reference signs: 1: information processing system; 5: user terminal; 10: arithmetic unit; 11: processor; 15: memory; 20: storage unit; 30: communication unit; 110: crawler; 120: indexer; 130: query processing unit; 140: query response unit; 141: first scoring unit; 143: second scoring unit; 145: ranking unit; 147: output unit; 210: page repository; 220: index storage unit; SP: start page; EP: end page; DP: dummy page.
  • the information processing system 1 of the present embodiment shown in FIG. 1 is configured to provide the user terminal 5 with a list of documents corresponding to the search query in response to the search query input from the user terminal 5.
  • the document is a web document, specifically a web page. That is, the information processing system 1 functions as a search engine available to the user terminal 5 through the communication network.
  • the communication network is, for example, the Internet.
  • the information processing system 1 includes an arithmetic unit 10, a storage unit 20, and a communication unit 30.
  • the arithmetic unit 10 includes a processor 11 and a memory 15.
  • the storage unit 20 stores computer programs and data executed by the processor.
  • the storage unit 20 can include one of a hard disk drive and a solid state drive.
  • the communication unit 30 is configured to be able to communicate with the user terminal 5.
  • Arithmetic unit 10 implements a search function, that is, a function as a search engine by executing processing in accordance with the computer program stored in storage unit 20. Specifically, the processing for realizing the search function is executed by the processor 11.
  • The information processing system 1 schematically shown in FIG. 1 can be configured as one server device or as a group of cooperating server devices.
  • The search function is realized by the arithmetic unit 10 functioning as the crawler 110, the indexer 120, the query processing unit 130, and the query response unit 140 shown in FIG. 2, and by the storage unit 20 functioning as the page repository 210 and the index storage unit 220.
  • the crawler 110 is configured to collect web pages that reside in a communication network, similar to known crawlers.
  • the web pages collected by the crawler 110 are stored in the page repository 210.
  • the indexer 120 is configured to analyze and index web pages stored in the page repository 210. Through indexing, information valuable for search is extracted from the web page, and index data corresponding to the web page is generated.
  • Index data includes information valuable to searches extracted from web pages.
  • index data includes content index, structure index, and special purpose index.
  • the content index includes information on web page keywords, titles, and key sentences.
  • the structure index includes information representing the hyperlink structure of the web page.
  • Special purpose indexes hold information useful for special query processing such as image indexes and pdf indexes.
  • the index data for each web page generated by the indexer 120 is stored in the index storage unit 220.
  • a group of index data represents a connection between web pages.
  • the query processing unit 130 receives a search query from the user, and extracts a related page group which is a set of web pages corresponding to the search query from among all the web pages. All web pages here are found in the communication network by the crawler 110, and correspond to web pages in which index data is registered in the index storage unit 220.
  • The query processing unit 130 extracts, from among all the web pages, a set of web pages containing vocabulary corresponding to the search query as the related page group. It then provides information on the related page group to the query response unit 140.
  • the query response unit 140 transmits a search result list in which related page groups are arranged in page rank order, to the user terminal 5 as response data to the search query, based on the information on the related page group provided from the query processing unit 130.
  • Among the related pages, a web page with higher relevance to the search query and higher importance is given a higher page rank and is placed nearer the top of the search result list.
  • Like response data from conventional search engines, the search result list is configured as a web page with links to the listed related pages.
  • the links referred to here are so-called hyperlinks.
  • the query response unit 140 includes a first scoring unit 141, a second scoring unit 143, a ranking unit 145, and an output unit 147.
  • The first scoring unit 141 is configured to score each related page in the related page group corresponding to the search query based on the degree of relevance of the page content to the search query, and to give each related page a content score as a first score.
  • The second scoring unit 143 operates independently of the search query, and is configured to give each web page collected by the crawler 110 an importance score, based on the connection relationship between web pages, as a second score.
  • The second score is calculated so that a web page estimated to be more important from the connection relationship between web pages receives a larger value, and is assigned to each web page. Specifically, the second score is calculated so that a web page linked from many web pages, a web page linked from a web page with a high importance score, and a web page linked from a web page that has few links to other web pages each receive a larger value.
  • The ranking unit 145 is configured to calculate the page rank of each related page based on the first score calculated for each related page by the first scoring unit 141 and the second score calculated for each related page by the second scoring unit 143.
  • The page rank of each related page corresponds to a weighted sum of the first score and the second score.
  • The page rank of each related page may be understood as an overall score combining the content score based on the search query and the importance score based on the connection relationship between web pages.
  • Based on the page rank of each related page calculated by the ranking unit 145, the output unit 147 generates a page list in which the related pages corresponding to the search query are arranged in descending order of page rank, and transmits it as the search result list to the user terminal 5 that sent the search query.
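The weighted-sum ranking and descending-order listing can be sketched briefly. The function name `rank_pages`, the dictionary representation of scores, and the default weights of 0.5 are assumptions for illustration; the text does not state the weights.

```python
def rank_pages(first_scores, second_scores, w_content=0.5, w_importance=0.5):
    """Order related pages by a weighted sum of content and importance scores.

    `first_scores` maps each related page to its content score (first score);
    `second_scores` maps pages to their importance score (second score).
    The weights are hypothetical values chosen only for this example.
    """
    page_rank = {
        page: w_content * first_scores[page] + w_importance * second_scores[page]
        for page in first_scores
    }
    # Highest page rank first, as in the search result list.
    return sorted(page_rank, key=page_rank.get, reverse=True)


order = rank_pages(
    {"a": 0.9, "b": 0.2, "c": 0.5},   # content scores
    {"a": 0.3, "b": 0.4, "c": 0.9},   # importance scores
)
```

Only pages present in `first_scores` are ranked, which mirrors the fact that the second score exists for all crawled pages while the first score exists only for the related page group.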
  • Conventionally, scoring of web pages based on the connection relationship between web pages is performed using an adjacency matrix in which each component is represented by one of two real values (1, 0).
  • In the present embodiment, by contrast, web page scoring is performed using a Hermitian adjacency matrix H whose components are represented by the four complex values 1, 0, +i, and -i.
  • i represents an imaginary unit.
  • a Hermite adjacency matrix H is generated for each web page group in a weakly connected relationship.
  • In a network consisting of nodes in a weak connection relationship, any one node belonging to the network is linked, at least indirectly, to every remaining node when the direction of the links between nodes is ignored.
  • Accordingly, a document network consisting of web pages in a weak connection relationship is a group of web pages in which any one web page belonging to the document network is linked, at least indirectly, to the remaining web pages when the link direction is ignored.
  • The second scoring unit 143 periodically executes the processing of the flowchart shown in FIG. 4 and thereby calculates, based on the latest index data stored in the index storage unit 220, a second score for each web page belonging to each web page group in a weakly connected relationship.
  • The second scoring unit 143 first extracts one or more linked document networks from all the web pages (S110). Specifically, by referring to the index data stored in the index storage unit 220, it extracts each web page group having a weakly connected relationship among all the web pages as one linked document network.
  • One linked document network is composed of web pages having weak link relationships.
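The extraction of weakly connected groups in S110 can be illustrated with a breadth-first search over links treated as undirected. The function name `weakly_connected_networks` and the representation of links as `(source, destination)` pairs are assumptions; how the real system reads them from the index data is not described here.

```python
from collections import defaultdict, deque


def weakly_connected_networks(pages, links):
    """Group pages into weakly connected networks.

    `links` holds (source, destination) pairs. Link direction is ignored,
    so two pages land in the same group whenever some chain of links,
    followed in either direction, joins them.
    """
    neighbours = defaultdict(set)
    for src, dst in links:
        neighbours[src].add(dst)
        neighbours[dst].add(src)  # ignore direction

    seen = set()
    networks = []
    for page in pages:
        if page in seen:
            continue
        seen.add(page)
        queue = deque([page])
        group = set()
        while queue:
            current = queue.popleft()
            group.add(current)
            for nxt in neighbours[current]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        networks.append(group)
    return networks


networks = weakly_connected_networks([1, 2, 3, 4, 5], [(1, 2), (3, 2), (4, 5)])
```

In this example pages 1 and 3 are not linked to each other directly, but both link to page 2, so all three fall into one network.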
  • FIGS. 5A, 6A, 7A, 8A, 9A, 10A, and 11 show examples of linked document networks.
  • One circle in FIGS. 5A, 6A, 7A, 8A, 9A, 10A, and 11 corresponds to one node, in other words, one web page.
  • the arrow in the figure indicates that a link (hyperlink) to the web page corresponding to the end point of the arrow is formed on the web page corresponding to the start point of the arrow. That is, it means that the web page corresponding to the start point of the arrow can be moved via the link to the web page corresponding to the end point of the arrow.
  • a double-headed arrow means that links to each other are formed in two web pages connected to the arrow and can be moved in both directions.
  • Each web page in the document network shown in the figure is at least indirectly connected to another web page when the direction of the arrow is ignored, as apparent from the figure.
  • these web pages will be expressed as the k-th web page using the number k shown in a circle in the figure.
  • the second scoring unit 143 sets the start page SP for each of the linked document networks. Specifically, the second scoring unit 143 sets the start point page SP by executing the process shown in FIG. 12 for each of the linked document networks.
  • the second scoring unit 143 searches for one or more paths corresponding to the longest path in the corresponding linked document network (S121).
  • For each node (web page) in the corresponding linked document network, the second scoring unit 143 can move from that node to adjacent nodes along the arrows, observing the rule that the same node cannot be visited more than once, until no destination remains, and count the number of nodes constituting the resulting route. The route with the largest number of nodes is then taken as the longest route.
  • the second scoring unit 143 determines whether a plurality of routes are found as the longest route (S122). When only one route is found as the longest route (No in S122), the second scoring unit 143 sets a web page corresponding to the start point of the route found as the longest route as the start page SP (S123).
  • The first web page, which has in-degree zero and is located at the start of the longest path, is set as the start page SP.
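The longest-route search of S121 can be sketched as an exhaustive depth-first walk that never revisits a node. This is an illustration only: the function name `longest_paths` and the dictionary-of-out-links input are assumptions, and the exhaustive search can take exponential time on large networks.

```python
def longest_paths(out_links):
    """Return every route with the largest number of nodes.

    `out_links` maps a node to the nodes it links to. Starting from each
    node, the walk follows links without visiting any node twice and
    stops when no unvisited destination remains.
    """
    nodes = set(out_links) | {d for dsts in out_links.values() for d in dsts}
    best = []

    def walk(path, visited):
        nonlocal best
        extended = False
        for nxt in out_links.get(path[-1], ()):
            if nxt not in visited:
                extended = True
                visited.add(nxt)
                path.append(nxt)
                walk(path, visited)
                path.pop()
                visited.remove(nxt)
        if not extended:  # no destination left: the route ends here
            if not best or len(path) > len(best[0]):
                best = [list(path)]
            elif len(path) == len(best[0]):
                best.append(list(path))

    for node in sorted(nodes):
        walk([node], {node})
    return best


paths = longest_paths({1: [2], 2: [3], 3: [], 4: [2]})
```

Here two routes tie for the longest, which is exactly the situation the subsequent tie-breaking steps address.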
  • When a plurality of routes are found as the longest route (Yes in S122), the second scoring unit 143 searches the plurality of routes corresponding to the longest route for one or more routes corresponding to the specific route, that is, the route whose start point has the smallest or the largest out-degree (S124). The designer of the system can freely decide whether the specific route is defined as “a route with the smallest out-degree from the start point” or “a route with the largest out-degree from the start point”. A plurality of routes corresponding to the longest route exist in the linked document networks illustrated in FIGS. 10A and 11.
  • The second scoring unit 143 determines whether or not a plurality of routes corresponding to the specific route are found (S125). If it determines that a plurality of routes are found (Yes in S125), the second scoring unit 143 places one dummy page of in-degree zero, connected to the start points of those routes, upstream of the plurality of routes corresponding to the specific route, and sets this dummy page as the start page SP (S126).
  • the dummy page said here means a virtual web page.
  • one dummy page having links to the first web page and the third web page is provided as a start page SP upstream of the first web page and the third web page.
  • When the second scoring unit 143 determines that only one route is found as the specific route (No in S125), it sets the web page located at the start point of this route as the start page SP (S127).
  • the linked document network shown in FIG. 11 includes a plurality of paths starting from the first, second, third, fourth, fifth, sixth, seventh and tenth web pages as the longest paths.
  • Among these, the path whose start point has the smallest out-degree is the path starting from the tenth web page. Therefore, when the specific route is defined as the route with the smallest out-degree from the start point, the tenth web page is set as the start page SP.
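The decision flow of S122 to S127 can be sketched as follows. The function name `choose_start_page`, the `"DUMMY_START"` label, and the return shape are hypothetical. The sketch also assumes that routes sharing the same start point count once, which the text does not state explicitly.

```python
def choose_start_page(longest, out_degree, prefer_smallest=True):
    """Pick the start page SP from the longest routes.

    `longest` is a list of routes (lists of pages) that tie for the longest
    route; `out_degree` maps a page to its out-degree. Returns a tuple
    (start_page, dummy_targets): `dummy_targets` is empty unless a virtual
    dummy page of in-degree zero has to be placed upstream of several pages.
    """
    starts = sorted({path[0] for path in longest})
    if len(starts) == 1:
        return starts[0], []               # single longest route (S123)
    pick = min if prefer_smallest else max  # designer's choice of rule
    target = pick(out_degree[s] for s in starts)
    chosen = [s for s in starts if out_degree[s] == target]
    if len(chosen) == 1:
        return chosen[0], []               # single specific route (S127)
    return "DUMMY_START", chosen           # dummy page linked to all ties (S126)


single = choose_start_page([[1, 2, 3], [4, 2, 3]], {1: 2, 4: 1})
tied = choose_start_page([[1, 2, 3], [4, 2, 3]], {1: 1, 4: 1})
```

In the tied case the caller would add links from the dummy page to each page in `dummy_targets` before building the matrix.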
  • the second scoring unit 143 sets the start point page SP according to the network structure for each of the connected document networks (S120). Thereafter, the second scoring unit 143 sets a virtual document network corresponding to each of the connected document networks (S130). One virtual document network corresponds to one linked document network.
  • For each of the linked document networks, the second scoring unit 143 searches the corresponding linked document network for web pages of out-degree zero as end pages EP, and sets a virtual document network by adding a dummy page DP of in-degree 1 and out-degree 0 to each end page EP found.
  • The dummy page DP connected to an end page EP is a virtual web page such that, in the corresponding virtual document network, there is a link from the end page EP to the dummy page DP and no link from the dummy page DP to any other web page.
  • the second scoring unit 143 adds a dummy page DP to each of the end point pages EP to set a virtual document network.
  • FIGS. 5B, 6B, 7B, 8B, 9B, 10B, and 11 show the virtual document networks corresponding to the linked document networks shown in FIGS. 5A, 6A, 7A, 8A, 9A, 10A, and 11, respectively.
  • The virtual document network corresponding to the linked document network shown in FIG. 11 is the same as that linked document network, because the network contains no web page with an out-degree of zero.
  • The virtual document network is set by adding the dummy page DP to the end page EP for the following reason. If scoring were performed using a Hermitian adjacency matrix H based on the linked document network without the dummy page DP, the score of an end page EP with an out-degree of zero would tend to be lower than the score of the adjacent page that links to it. However, an end page EP, which lies at the end of the link direction between web pages, can be regarded as a web page of high importance for which a high score should be calculated.
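  • As an illustration only, the addition of dummy pages DP described above might be sketched in Python as follows. The function name `add_dummy_pages`, the representation of links as a set of (source, target) pairs, and the tuple used as the identifier of a dummy page are assumptions made for this sketch and do not appear in the original disclosure.

```python
def add_dummy_pages(pages, links):
    """Return the pages and links of the virtual document network.

    pages: list of page ids; links: set of (source, target) pairs.
    """
    pages = list(pages)
    links = set(links)
    sources = {src for src, _ in links}
    # End pages EP are the pages with an out-degree of zero.
    end_pages = [p for p in pages if p not in sources]
    for ep in end_pages:
        dummy = ("DP", ep)        # hypothetical id for the dummy page
        pages.append(dummy)
        links.add((ep, dummy))    # EP -> DP only; DP has no outgoing link
    return pages, links


virtual_pages, virtual_links = add_dummy_pages([1, 2, 3], {(1, 2), (2, 3)})
```

  • In this made-up three-page chain, page 3 is the only end page EP, so one dummy page is appended and the number of elements N of the virtual document network becomes 4.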
  • The second scoring unit 143 specifies the maximum value max{N} of the number of elements N of the virtual document networks (S140).
  • The number of elements N of one virtual document network is the total number of web pages and dummy pages belonging to that virtual document network.
  • The maximum value max{N} is the number of elements N of the virtual document network that has the largest number of elements among all the virtual document networks.
  • The second scoring unit 143 selects one of the virtual document networks as a processing target (S150).
  • The virtual document network selected here is hereinafter referred to as the selected document network.
  • The second scoring unit 143 calculates, based on the Hermitian adjacency matrix H of the selected document network, a second score for each of the web pages constituting the selected document network (S160 to S220).
  • Hereinafter, each of the web pages constituting the selected document network is expressed as web page D[m].
  • The variable m takes an integer value from 1 to N (1 ≤ m ≤ N), and the web page D[m] is the m-th web page in the selected document network.
  • N is the number of elements of the selected document network.
  • The second scoring unit 143 generates a Hermitian adjacency matrix H that represents the connection relationship of the web pages in the selected document network with the values 1, 0, +i, and -i (S160).
  • The Hermitian adjacency matrix H is a matrix of N rows and N columns (N × N) corresponding to the number of elements N of the selected document network, and each of its components takes one of the values 1, 0, +i, and -i.
  • In the following, the expression “component h(p, q)” denotes the component in the p-th row and q-th column of the Hermitian adjacency matrix H.
  • When there is both a link from the web page D[p] to the web page D[q] and a link from the web page D[q] to the web page D[p], the second scoring unit 143 sets the corresponding component h(p, q) to the value 1.
  • The second scoring unit 143 sets each diagonal component h(p, p) of the Hermitian adjacency matrix H to the value zero.
  • When there is a link from the web page D[p] to the web page D[q] but no link from the web page D[q] to the web page D[p], the second scoring unit 143 sets the corresponding component h(p, q) to the value +i. When there is no link from the web page D[p] to the web page D[q] but there is a link from the web page D[q] to the web page D[p], the second scoring unit 143 sets the corresponding component h(p, q) to the value -i. Components for pairs of web pages with no link in either direction remain zero.
  • In this way, the second scoring unit 143 sets the value of each component h(p, q) and generates the Hermitian adjacency matrix H.
  • Because h(q, p) is the complex conjugate of h(p, q), the Hermitian adjacency matrix H is a Hermitian matrix.
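  • As an illustration only, the construction of the Hermitian adjacency matrix H might be sketched in Python as follows. The function name `hermitian_adjacency`, the 0-based page indices, and the representation of links as a set of (p, q) pairs are assumptions made for this sketch and do not appear in the original disclosure.

```python
def hermitian_adjacency(n, links):
    """Build the N x N matrix H with entries 1, 0, +i, -i.

    links: set of (p, q) pairs, 0-based, meaning a link from page p to page q.
    """
    H = [[0j] * n for _ in range(n)]
    for p in range(n):
        for q in range(n):
            if p == q:
                continue                  # diagonal components stay zero
            forward = (p, q) in links
            backward = (q, p) in links
            if forward and backward:
                H[p][q] = 1 + 0j          # links in both directions
            elif forward:
                H[p][q] = 1j              # link from p to q only
            elif backward:
                H[p][q] = -1j             # link from q to p only
    return H


# Page 0 links to page 1; pages 1 and 2 link to each other.
H = hermitian_adjacency(3, {(0, 1), (1, 2), (2, 1)})
```

  • For this made-up network, h(0, 1) is +i, h(1, 0) is -i, and h(1, 2) and h(2, 1) are both 1, so each component equals the complex conjugate of its transposed counterpart.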
  • The second scoring unit 143 generates a special Hermitian adjacency matrix H1 by transforming the Hermitian adjacency matrix H (S170).
  • The transformation is performed such that, when the components of the eigenvector V of the special Hermitian adjacency matrix H1 are arranged as complex vectors in the complex plane, all of the components fall within an angle range of π/2 radians.
  • The second scoring unit 143 calculates a first correction amount C1 and a second correction amount C2 according to the following equation based on the maximum value max{N} specified in S140.
  • The parameter n is a natural number equal to or greater than the maximum value max{N} specified in S140.
  • Setting n = max{N} is sufficient to fit all of the above components into an angle range of π/2 radians.
  • If the value of the parameter n is made larger than the maximum value max{N}, all of the components fall within an angle range smaller than π/2 radians.
  • The purpose of keeping all of the components within an angle range of π/2 radians is to ensure that all of the components lie in one quadrant of the complex plane.
  • The parameter n can be set to any value as long as this purpose is achieved. However, for good calculation of the second score, the parameter n is preferably set to as small a value as possible within the range in which the purpose is achieved.
  • The second correction amount C2 contributes to adjusting the angle range, and the first correction amount C1 serves to prevent the absolute value of the matrix components from changing due to the second correction amount C2.
  • The second scoring unit 143 replaces each component of value +i in the Hermitian adjacency matrix H with the value C1(C2 + i), and replaces each component of value -i with the value C1(C2 - i).
  • The second scoring unit 143 further changes the value C1(C2 + i) of each such component in each row of the Hermitian adjacency matrix H after the replacement to the value {C1(C2 + i)/W}, obtained by dividing it by the sum W of the number of components indicating the value C1(C2 + i) and the number of components indicating the value 1 in the same row.
  • The second scoring unit 143 further changes the value C1(C2 - i) of the component located symmetrically, across the diagonal, to a component changed to the value {C1(C2 + i)/W} to the complex conjugate {C1(C2 - i)/W} of the value {C1(C2 + i)/W}.
  • The Hermitian matrix defined by these replacements and changes is generated as the special Hermitian adjacency matrix H1.
  • A specific example of the transformation procedure from the Hermitian adjacency matrix H to the special Hermitian adjacency matrix H1 is shown in the drawings.
  • When the components taking the value +i and the components taking the value 1 in the p1-th row of the Hermitian adjacency matrix H total W1, each component indicating the value +i in the p1-th row is changed to the value {C1(C2 + i)/W1}.
  • Similarly, when the corresponding total in the p2-th row is W2, each component indicating the value +i in the p2-th row of the Hermitian adjacency matrix H is changed to the value {C1(C2 + i)/W2}.
  • The value of each component indicating the value -i is changed to the complex conjugate of the component located symmetrically across the diagonal.
  • The value of the component h(q1, p1), located symmetrically across the diagonal from the component h(p1, q1) indicating the value {C1(C2 + i)/W1}, is changed to {C1(C2 - i)/W1}.
  • The value of the component h(q2, p2), located symmetrically across the diagonal from the component h(p2, q2) indicating the value {C1(C2 + i)/W2}, is changed to {C1(C2 - i)/W2}.
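  • As an illustration only, the transformation from H to H1 might be sketched in Python as follows. The equation for the correction amounts C1 and C2 is not reproduced in this text, so the sketch takes them as arguments. The value chosen for `c2` is a placeholder, and the relation `c1 = 1/sqrt(1 + c2**2)` is an assumption inferred from the statement that C1 keeps the absolute value of the components unchanged. The function name `special_hermitian_h1` is likewise hypothetical.

```python
import math


def special_hermitian_h1(H, c1, c2):
    """Transform H (entries 1, 0, +i, -i) into the special matrix H1."""
    n = len(H)
    H1 = [[complex(x) for x in row] for row in H]
    for p in range(n):
        # W: number of +i components plus number of 1 components in row p.
        w = sum(1 for x in H[p] if x == 1j or x == 1)
        for q in range(n):
            if H[p][q] == 1j:
                value = c1 * (c2 + 1j) / w
                H1[p][q] = value
                H1[q][p] = value.conjugate()   # symmetric component
    return H1


# Page 0 links to pages 1 and 2; pages 1 and 2 link to each other.
H = [[0, 1j, 1j],
     [-1j, 0, 1],
     [-1j, 1, 0]]
c2 = 1.0                              # placeholder, not the source's equation
c1 = 1.0 / math.sqrt(1.0 + c2 ** 2)   # assumed: keeps |C1(C2 + i)| equal to 1
H1 = special_hermitian_h1(H, c1, c2)
```

  • In this made-up example, row 0 contains two components of value +i, so W is 2 and both are changed to C1(C2 + i)/2, while the components of value 1 are left unchanged.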
  • The second scoring unit 143 divides each component V[m] (1 ≤ m ≤ N) of the eigenvector V corresponding to the eigenvalue of the special Hermitian adjacency matrix H1 with the largest absolute value by the component E corresponding to the start page SP of the selected document network (S190).
  • The start page SP here is the start page SP set in S120 for the selected document network.
  • By this division, the component of the eigenvector V corresponding to the start page SP is converted to the value 1.
  • The eigenvector V after the division is expressed as an eigenvector V1.
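  • As an illustration only, the division by the start-page component might be sketched in Python as follows. The function name `normalize_by_start_page` and the sample eigenvector components are made up for this sketch.

```python
def normalize_by_start_page(V, start_index):
    """Divide every component of V by the component E of the start page SP."""
    E = V[start_index]
    return [v / E for v in V]


V = [0.5 + 0.5j, 1.0 + 0.0j, 0.2 - 0.4j]   # made-up eigenvector components
V1 = normalize_by_start_page(V, start_index=0)
```

  • After the division, the component at `start_index` equals 1, so it lies on the real axis of the complex plane.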
  • The second scoring unit 143 rotationally transforms each component V[m]/E (1 ≤ m ≤ N) of the eigenvector V1 after the division (S200).
  • Specifically, the second scoring unit 143 can execute the process shown in the drawings.
  • With the components of the eigenvector V1 arranged in the complex plane, the second scoring unit 143 determines whether any component is located on the first-quadrant side of the real axis (S201). Components located on the real axis itself are not regarded as being on the first-quadrant side.
  • When no component is located on the first-quadrant side, the second scoring unit 143 rotationally transforms each component of the eigenvector V1 so as to rotate it by a predetermined angle θ1 toward the fourth quadrant (S203). In this manner, the second scoring unit 143 rotationally transforms each component of the eigenvector V1 so that the component corresponding to the start page SP is arranged at a specific position separated from the real axis by the angle θ1 on the complex plane.
  • FIG. 15A illustrates the arrangement of the components on the complex plane before the rotational transformation.
  • Each black circle and white circle corresponds to one of the components of the eigenvector V1, and the black circle in particular corresponds to the component of the start page SP.
  • FIG. 15B illustrates the arrangement of the components on the complex plane after the rotational transformation.
  • The angle θ1 is, for example, π/180 radians.
  • The angle θ1 need only be such that the component corresponding to the start page SP is separated from the real axis while all components of the eigenvector V1 remain within the fourth quadrant, and it is not limited to π/180 radians. The reason for separating the component corresponding to the start page SP from the real axis will be described later.
  • When a component is located on the first-quadrant side, the second scoring unit 143 proceeds to S205.
  • In S205, the second scoring unit 143 identifies the angle θ2, measured from the real axis, of the component farthest from the real axis in the first quadrant.
  • The second scoring unit 143 then rotationally transforms each component of the eigenvector V1 so as to rotate it by the angle θ1 + θ2 toward the fourth quadrant (S207).
  • The angle θ1 is as described above for the process of S203.
  • In this way, the second scoring unit 143 places all the components of the eigenvector V1 in the fourth quadrant of the complex plane, including the components located on the first-quadrant side of the component corresponding to the start page SP.
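  • As an illustration only, the rotational transformation of S201 to S207 might be sketched in Python as follows. The function name `rotate_into_fourth_quadrant` and the sample components are made up, and the sketch assumes that all components already lie within an angle range of π/2 radians, as the text requires.

```python
import cmath
import math


def rotate_into_fourth_quadrant(V1, theta1=math.pi / 180):
    """Rotate all components clockwise so that none stays above the real axis."""
    # Angle of each component in (-pi, pi]; positive means first-quadrant side.
    angles = [cmath.phase(v) for v in V1]
    # theta2: largest angle on the first-quadrant side, or 0 if there is none.
    theta2 = max((a for a in angles if a > 0), default=0.0)
    rotation = cmath.exp(-1j * (theta1 + theta2))   # clockwise rotation
    return [v * rotation for v in V1]


# Made-up components; index 0 is the start page SP (value 1 after division).
V1 = [1 + 0j, cmath.rect(1.0, 0.2), cmath.rect(2.0, -0.3)]
Vc = rotate_into_fourth_quadrant(V1)
```

  • When no component lies above the real axis, `theta2` is zero and the rotation reduces to the single angle `theta1`, which corresponds to the branch of S203.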
  • The second scoring unit 143 arranges each component Vc[m] (1 ≤ m ≤ N) of the eigenvector Vc obtained by the rotational transformation as a complex vector in the complex plane, and calculates a value Z[m] = L[m] × θ[m] (1 ≤ m ≤ N) corresponding to the distance of Vc[m] from the real axis (S210).
  • FIG. 15B conceptually shows the value Z[m].
  • The value L[m] is the absolute value |Vc[m]| of the component Vc[m].
  • The value θ[m] is the angle between the component Vc[m] and the real axis, in other words, the clockwise angle of the component Vc[m] from the real axis.
  • The second scoring unit 143 calculates the second score of each web page D[m] (1 ≤ m ≤ N) based on the value Z[m], and stores the calculated second score in the storage unit 20 (S220).
  • Here, min{Z[m]} denotes the smallest value among Z[m] (1 ≤ m ≤ N). That is, the second score X2 of each web page D[m] (1 ≤ m ≤ N) is calculated as X2 = Z[m] - min{Z[m]}, so that the second score X2 of the web page D[m] with the smallest Z[m] is zero.
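  • As an illustration only, the calculation of Z[m] and of the second score X2 might be sketched in Python as follows. The function name `second_scores` and the sample components are made up, and the sketch assumes that all components of Vc already lie in the fourth quadrant.

```python
import cmath


def second_scores(Vc):
    """Return X2[m] = Z[m] - min{Z[m]} with Z[m] = L[m] * theta[m]."""
    # theta[m]: clockwise angle from the real axis; L[m]: absolute value.
    Z = [abs(v) * (-cmath.phase(v)) for v in Vc]
    z_min = min(Z)
    return [z - z_min for z in Z]


# Made-up fourth-quadrant components given as (absolute value, angle).
Vc = [cmath.rect(1.0, -0.1), cmath.rect(2.0, -0.5), cmath.rect(1.0, -0.3)]
X2 = second_scores(Vc)
```

  • For these made-up components the values Z[m] are about 0.1, 1.0, and 0.3, so the first page receives a second score of zero and the second page the highest score.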
  • Thus, the second scoring unit 143 calculates, as the second score, a higher score for a web page whose component of the eigenvector V is farther from the real axis and for a web page whose component has a larger absolute value.
  • The component corresponding to the start page SP is at the specific position closest to the real axis, and the second score corresponding to the start page SP is zero. Therefore, according to the present embodiment, a higher second score is calculated for web pages farther from the start page SP.
  • In other words, a higher second score is calculated for a web page whose component of the eigenvector V is rotated further from the component corresponding to the start page SP and for a web page whose component has a larger absolute value.
  • Each component of the eigenvector Vc falls within an angle range of π/2 radians of the complex plane without making a full rotation on the complex plane.
  • The component corresponding to the start page SP is arranged at the specific position.
  • Therefore, the value Z[m] and the second score of each web page D[m] are calculated according to the positional relationship with the component corresponding to the start page SP.
  • The rotational transformation is performed so that the component corresponding to the start page SP is offset from the real axis.
  • Rotating each component Vc[m] away from the real axis amounts to rotating each component of the eigenvector V1 so that θ[m] becomes nonzero.
  • As a result, the second score, in particular the second score of a web page having bidirectional links, can be favorably calculated based on L[m].
  • Furthermore, thanks to the dummy page DP, the second score of each web page can be calculated so that the second score corresponding to the end page EP becomes high. Therefore, according to the present embodiment, an appropriate score according to the connection relationship of the web pages in the selected document network can be calculated as the second score.
  • After calculating the second score of each web page D[m] in the selected document network in S220, the second scoring unit 143 proceeds to S230 and determines whether the processing of S160 to S220 has been executed for all virtual document networks and the second scores for the web pages have been calculated. If there is an unprocessed virtual document network (No in S230), the process returns to S150, and one of the unprocessed virtual document networks is selected as the new processing target. That is, the selected document network is changed. Thereafter, the processing of S160 to S220 is executed for the new selected document network, and the second score of each web page in the selected document network is calculated and stored.
  • the second scoring unit 143 ends the processing shown in FIG.
  • The second scoring unit 143 can read out, from the storage unit 20 as needed, the second score of each web page stored there by this processing. For example, when a search query occurs, the second scoring unit 143 can read out the second score of each of the related pages corresponding to the search query from the storage unit 20 and provide the second scores to the ranking unit 145.
  • The processing shown in FIG. 4 arranges the components corresponding to the web pages in the fourth quadrant, further arranges the component corresponding to the start page SP at a specific position, and then calculates the value Z[m].
  • Therefore, the second score can be calculated on a common scale for web pages of different linked document networks.
  • Accordingly, even when the related page group includes web pages of different linked document networks, the ranking unit 145 can appropriately calculate the overall score of each related page, that is, the page rank, based on the second score.
  • The ranking unit 145 can determine the arrangement of the web pages in the search result list such that a web page with a higher page rank Y, which is based on the first score and the second score, is placed higher in the list.
  • The output unit 147 can thus provide the user terminal 5 with an appropriate search result list based on the connection relationship between web pages. Specifically, it can provide the user terminal 5 with a search result list in which web pages on which links leading away from the start page converge, and for which a high second score is therefore calculated, are positioned higher. Therefore, according to the present embodiment, a plurality of web pages can be scored and ranked according to their connection relationship more appropriately than before, and appropriate search results can be provided to the user terminal 5.
  • According to the present embodiment, the web pages in the linked document network of FIG. 5A are ranked in the order “5” “4” “3” “2” “1”, as in the conventional method.
  • The web pages in the linked document network of FIG. 7A are ranked in the order “4” “3” “7” “8” “6” “2” “5” “1” according to the conventional method, whereas according to the present embodiment they are ranked in the order “4” “3” “2” “5” “6” “7” “8” “1”.
  • The web pages in the linked document network of FIG. 8A are ranked in the order “5” “4” “2” “3” “1” according to the conventional method, whereas according to the present embodiment they are ranked in the order “5” “4” “3” “2” “1”.
  • The web pages in the linked document network of FIG. 9A are ranked in the order “4” “3” “2” “1” according to the conventional method, and according to the present embodiment they are likewise ranked in the order “4” “3” “2” “1”.
  • The web pages in the linked document network of FIG. 11 are ranked in the order “4” “6” “7” “1” “2” “8” “9” “3” “5” “10” according to the conventional method, whereas according to the present embodiment they are ranked in the order “4” “6” “7” “2” “1” “3” “5” “8” “9” “10”.
  • The second score is calculated so as to indicate a larger value for a web page on which links leading away from the start page SP converge.
  • The second score is calculated so as to indicate a larger value for a web page on which more links converge.
  • The second score is calculated so as to indicate a larger value for a web page linked from web pages having high importance scores.
  • The second score is calculated so as to indicate a larger value for a web page linked from web pages that have few links to other web pages.
  • The second score is further calculated so as to indicate a larger value for a web page having more links to other web pages.
  • The second score is further calculated so as to indicate a larger value for a web page having links to web pages with high importance scores. This point is a particular feature of the special Hermitian adjacency matrix H1. Therefore, according to the present embodiment, good scoring of web pages can be realized.
  • The second scoring unit 143 executes the process of S210 after generating the eigenvector V1 in S190, without executing the process of S200.
  • This example can be applied to a linked document network that has no bidirectional links and in which no component is placed on the first-quadrant side of the component of the start page SP.
  • The angle added to θ[m] may be set to a small value such as π/180 radians so that the sum does not exceed π/2 radians.
  • As a result, a nonzero value Z[m] that reflects the value L[m] can be calculated, and an appropriate second score can be calculated.
  • In the third modification, the second scoring unit 143 generates another special Hermitian adjacency matrix H2 instead of the above-mentioned special Hermitian adjacency matrix H1.
  • The eigenvalues and eigenvectors of the special Hermitian adjacency matrix H2 are then calculated (S180).
  • The processes from S190 onward are executed using the eigenvector V corresponding to the eigenvalue with the largest absolute value calculated here. That is, the second scoring unit 143 of the third modification executes the same process as in the above-described embodiment, using the special Hermitian adjacency matrix H2 instead of the special Hermitian adjacency matrix H1.
  • the special Hermite adjacency matrix H2 shown in FIG. 16 is an example of the special Hermitian adjacency matrix H2 corresponding to the Hermite adjacency matrix H shown in the upper part of FIG.
  • The second scoring unit 143 replaces each component of value +i in the Hermitian adjacency matrix H with the value C1(C2 + i), and replaces each component of value -i with the value C1(C2 - i).
  • The second scoring unit 143 further performs the following processing A and processing B.
  • In processing A, the second scoring unit 143 changes the value C1(C2 + i) of each such component in each row of the Hermitian adjacency matrix H after the replacement to the value {C1(C2 + i)/W}, obtained by dividing it by the number W of components indicating the value C1(C2 + i) or the value 1 in the same row, and further changes the value C1(C2 - i) of the component located symmetrically across the diagonal to the complex conjugate {C1(C2 - i)/W} of the value {C1(C2 + i)/W}.
  • In processing B, the second scoring unit 143 changes the value C1(C2 - i) of each such component in each row of the Hermitian adjacency matrix H after the replacement to the value {C1(C2 - i)Z}, obtained by multiplying it by the number Z of components indicating the value C1(C2 - i) or the value 1 in the same row, and further changes the value C1(C2 + i) of the component located symmetrically across the diagonal to the complex conjugate {C1(C2 + i)Z} of the value {C1(C2 - i)Z}.
  • In this way, the special Hermitian adjacency matrix H2 is generated.
  • The second scoring unit 143 may execute processing B after processing A, or may execute processing A after processing B.
  • The same special Hermitian adjacency matrix H2 is generated whichever order processing A and processing B are performed in.
  • For example, when processing A is executed first, the second scoring unit 143 changes the value C1(C2 + i) of each such component in each row of the Hermitian adjacency matrix H after the replacement to the value {C1(C2 + i)/W}, obtained by dividing it by the number W of components indicating the value C1(C2 + i) or the value 1 in the same row, and changes the value C1(C2 - i) of the component located symmetrically across the diagonal to {C1(C2 - i)/W}.
  • The second scoring unit 143 then changes the value {C1(C2 - i)/W} of each such component in each row of the matrix after that change to the value {C1(C2 - i)Z/W}, obtained by multiplying it by the number Z of components that indicated the value C1(C2 - i) or the value 1 in the same row before the change, and changes the component located symmetrically across the diagonal to its complex conjugate.
  • The value Z1 in the special Hermitian adjacency matrix H2 shown in FIG. 16 is the number of components taking the value -i or the value 1 among the N components h(p3, 1), h(p3, 2), ..., h(p3, N) in the p3-th row.
  • The value Z2 is the number of components taking the value -i or the value 1 among the N components h(p4, 1), h(p4, 2), ..., h(p4, N) in the p4-th row.
  • Row p3 may be understood as the row in which the value {C1(C2 - i)Z1/W1} is shown in the figure.
  • Row p4 may be understood as the row in which the value {C1(C2 - i)Z2/W2} is shown in the figure.
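  • As an illustration only, the combined effect of processing A and processing B might be sketched in Python as follows. The function name `special_hermitian_h2`, the four-page example network, and the values of `c1` and `c2` are made up for this sketch. The equation that defines C1 and C2 is not reproduced in this text, so they are simply passed in as placeholders.

```python
def special_hermitian_h2(H, c1, c2):
    """Transform H (entries 1, 0, +i, -i) into the special matrix H2."""
    n = len(H)
    # W[p]: components of value +i or 1 in row p (used by processing A).
    W = [sum(1 for x in row if x == 1j or x == 1) for row in H]
    # Z[p]: components of value -i or 1 in row p (used by processing B).
    Z = [sum(1 for x in row if x == -1j or x == 1) for row in H]
    H2 = [[complex(x) for x in row] for row in H]
    for p in range(n):
        for q in range(n):
            if H[p][q] == 1j:                     # link from p to q only
                value = c1 * (c2 + 1j) * Z[q] / W[p]
                H2[p][q] = value
                H2[q][p] = value.conjugate()      # symmetric component
    return H2


# Made-up network: page 0 links to pages 2 and 3, page 1 links to page 2.
H = [[0, 0, 1j, 1j],
     [0, 0, 1j, 0],
     [-1j, -1j, 0, 0],
     [-1j, 0, 0, 0]]
c1, c2 = 0.5, 1.0                                 # placeholder values
H2 = special_hermitian_h2(H, c1, c2)
```

  • Because the divisor W depends only on the row of the +i component and the multiplier Z only on the row of the -i component, the result does not depend on the order of processing A and processing B, which matches the statement in the text.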
  • The second scoring unit 143 uses the special Hermitian adjacency matrix H2 thus calculated in place of the special Hermitian adjacency matrix H1 to execute the processing of S180 to S220. For each of the linked document networks shown in FIGS. 5A, 7A, 8A, 9A, and 11, the ranking of web pages based on the second score calculated by applying the third modification is described below in the same manner as in the above embodiment.
  • The web pages in the linked document network of FIG. 5A are ranked in the order “5” “4” “3” “2” “1”.
  • The web pages in the linked document network of FIG. 7A are ranked in the order “3” “4” “2” “5” “6” “7” “8” “1”.
  • The web pages in the linked document network of FIG. 8A are ranked in the order “5” “4” “3” “2” “1”.
  • The web pages in the linked document network of FIG. 9A are ranked in the order “4” “3” “1” “2”.
  • The web pages in the linked document network of FIG. 11 are ranked in the order “4” “6” “7” “2” “1” “3” “5” “8” “9” “10”.
  • The special Hermitian adjacency matrix H2 differs from the special Hermitian adjacency matrix H1 in that the second score is calculated so as to indicate a larger value for a web page having links to web pages that receive more links from other web pages. This point is a particular feature of the special Hermitian adjacency matrix H2.
  • In other respects, the special Hermitian adjacency matrix H2 exhibits features similar to those of the special Hermitian adjacency matrix H1, except that the feature by which the second score shows larger values for web pages having links to web pages with higher importance scores is specific to the special Hermitian adjacency matrix H1. According to the third modification, good scoring of web pages can be realized.
  • The linked document network handled here includes web pages with an in-degree of zero and web pages with an out-degree of zero. Linked document networks having a cyclic structure in which a plurality of paths exist from one start page SP to one end page EP, and linked document networks in which mutual links exist, are not handled.
  • the second scoring unit 143 is configured to execute the process shown in FIG. 17 instead of the process shown in FIG.
  • the process shown in FIG. 17 has many parts that are the same as the process shown in FIG. Therefore, in the following, the description of the same parts will be simplified.
  • The second scoring unit 143 extracts one or more linked document networks from all the web pages, as in S110 (S310). Thereafter, the second scoring unit 143 sets the start page SP for each of the linked document networks (S320). When the linked document network includes only one web page with an in-degree of zero, that web page is set as the start page SP of the linked document network. When the linked document network includes a plurality of web pages with an in-degree of zero, the start page SP is set based on the path length.
  • Specifically, the second scoring unit 143 searches for the path with the largest path length among the paths that follow the link direction from a web page with an in-degree of zero to a web page with an out-degree of zero.
  • The second scoring unit 143 sets, as the start page SP, the web page with an in-degree of zero at the start of the path with the largest path length.
  • When there are a plurality of paths with the largest path length, the second scoring unit 143 can arrange a dummy page with an in-degree of zero connected to the start points of the plurality of paths, as shown in FIGS. 10A and 10B, and set that dummy page as the start page SP.
  • Alternatively, when there are a plurality of paths with the largest path length and a plurality of their start points share the smallest (or largest) out-degree, the second scoring unit 143 can arrange a dummy page with an in-degree of zero connected to the start points of the plurality of paths, as shown in FIGS. 10A and 10B, and set the dummy page as the start page SP. When there are a plurality of paths with the largest path length but only one of their start points has the smallest (or largest) out-degree, the second scoring unit 143 can set that start point as the start page SP.
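  • As an illustration only, the selection of the start page SP by path length might be sketched in Python as follows. The function name `choose_start_page` and the identifier `"SP_dummy"` are hypothetical. The sketch assumes an acyclic linked document network and uses the smallest out-degree as the tie-break, which is only one of the two alternatives the text mentions.

```python
from functools import lru_cache


def choose_start_page(pages, links):
    """Return (start page SP, links), adding a dummy page when ties remain."""
    succ = {p: [] for p in pages}
    indeg = {p: 0 for p in pages}
    for src, dst in links:
        succ[src].append(dst)
        indeg[dst] += 1

    @lru_cache(maxsize=None)
    def longest(p):
        # Number of links on the longest path from p (acyclic network assumed).
        return max((1 + longest(q) for q in succ[p]), default=0)

    roots = [p for p in pages if indeg[p] == 0]
    best = max(longest(r) for r in roots)
    candidates = [r for r in roots if longest(r) == best]
    if len(candidates) > 1:
        # Tie-break on the smallest out-degree of the start point.
        low = min(len(succ[r]) for r in candidates)
        candidates = [r for r in candidates if len(succ[r]) == low]
    if len(candidates) == 1:
        return candidates[0], set(links)
    dummy = "SP_dummy"                  # hypothetical id for the dummy page
    return dummy, set(links) | {(dummy, r) for r in candidates}


sp_a, _ = choose_start_page([1, 2, 3, 4], {(1, 2), (2, 3), (4, 3)})
sp_b, links_b = choose_start_page([1, 2, 3], {(1, 3), (2, 3)})
```

  • In the first made-up network, page 1 starts the longest path and becomes the start page SP. In the second, pages 1 and 2 tie on both path length and out-degree, so a dummy page linked to both is returned instead.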
  • The second scoring unit 143 sets a virtual document network for each of the linked document networks, as in the process of S130 (S330). That is, the web pages with an out-degree of zero in the corresponding linked document network are searched for as end pages EP, a dummy page DP is added to each end page EP, and the virtual document network is set.
  • The second scoring unit 143 specifies the maximum value max{N} of the number of elements N of the virtual document networks, as in S140 (S340). In the subsequent S350, the second scoring unit 143 selects one of the virtual document networks as a processing target.
  • In S360, the second scoring unit 143 generates a Hermitian adjacency matrix H corresponding to the selected document network, as in S160. Furthermore, the second scoring unit 143 generates a special Hermitian adjacency matrix H3 by transforming the Hermitian adjacency matrix H (S370). In the transformation, the second scoring unit 143 calculates the correction amount C according to the following equation based on the maximum value max{N} specified in S340.
  • The parameter n in the above equation is determined in advance so that, when the components of the eigenvector of the special Hermitian adjacency matrix H3 are arranged as complex vectors in the complex plane, all of the components fall within an angle range of π/2 radians.
  • The parameter n may be defined as a natural number equal to or greater than the maximum value max{N} specified in S340.
  • The second scoring unit 143 replaces each component of value +i in the Hermitian adjacency matrix H with the value (C + i), further changes each component indicating the value (C + i) in each row of the Hermitian adjacency matrix H after the replacement to the value {(C + i)/R}, obtained by dividing it by the number R of components indicating the value (C + i) in the same row, and further changes the value of each component indicating the value -i to the complex conjugate {(C - i)/R} of the component located symmetrically across the diagonal.
  • The second scoring unit 143 generates the Hermitian matrix defined by these replacements and changes as the special Hermitian adjacency matrix H3.
  • the special Hermite adjacency matrix H3 shown in FIG. 18 is an example of the special Hermitian adjacency matrix H3 corresponding to the Hermite adjacency matrix H shown in the upper part of FIG.
  • When R1 of the N components h(p1, 1), h(p1, 2), ..., h(p1, N) in the p1-th row take the value +i, each component indicating the value +i in the p1-th row of the Hermitian adjacency matrix H is changed to the value {(C + i)/R1}.
  • When R2 of the N components h(p2, 1), h(p2, 2), ..., h(p2, N) in the p2-th row take the value +i, each component indicating the value +i in the p2-th row of the Hermitian adjacency matrix H is changed to the value {(C + i)/R2}.
  • The value of the component h(q1, p1), located symmetrically across the diagonal from the component h(p1, q1) indicating the value {(C + i)/R1}, is changed to {(C - i)/R1}.
  • The value of the component h(q2, p2), located symmetrically across the diagonal from the component h(p2, q2) indicating the value {(C + i)/R2}, is changed to {(C - i)/R2}.
  • In this way, the special Hermitian adjacency matrix H3 is generated.
  • Because linked document networks with mutual links are not handled here, the Hermitian adjacency matrix H does not include components of value 1.
  • Therefore, the number R corresponds to the number W of the above embodiment.
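  • As an illustration only, the transformation from H to H3 might be sketched in Python as follows. The equation for the correction amount C is not reproduced in this text, so the value of `c` used here is a placeholder. The function name `special_hermitian_h3` and the three-page example are made up for this sketch.

```python
def special_hermitian_h3(H, c):
    """Transform H (entries 0, +i, -i only) into the special matrix H3."""
    n = len(H)
    H3 = [[complex(x) for x in row] for row in H]
    for p in range(n):
        # R: number of +i components in row p.
        r = sum(1 for x in H[p] if x == 1j)
        for q in range(n):
            if H[p][q] == 1j:
                value = (c + 1j) / r
                H3[p][q] = value
                H3[q][p] = value.conjugate()   # symmetric component
    return H3


# Made-up network without mutual links: page 0 links to pages 1 and 2.
H = [[0, 1j, 1j],
     [-1j, 0, 0],
     [-1j, 0, 0]]
H3 = special_hermitian_h3(H, c=2.0)            # c is a placeholder value
```

  • Row 0 has two components of value +i, so R is 2 and each becomes (C + i)/2, while the symmetric components become (C - i)/2.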
  • the second scoring unit 143 calculates the eigenvalues and eigenvectors of the special Hermitian adjacency matrix H3 generated in S370, as in the process of S180.
  • as in the process of S190, the second scoring unit 143 divides each component V[m] (1 ≤ m ≤ N) of the eigenvector V corresponding to the eigenvalue with the maximum absolute value of the special Hermitian adjacency matrix H3 by the component E corresponding to the start page SP of the selected document network.
  • when the eigenvalues with the maximum absolute value include a positive eigenvalue and a negative eigenvalue, each component V[m] (1 ≤ m ≤ N) of the eigenvector V corresponding to the positive eigenvalue can be divided by the component E corresponding to the start page SP of the selected document network.
  • the component of the eigenvector V corresponding to the start page SP is thereby placed on the real axis of the complex plane.
  • the second scoring unit 143 arranges each component V[m]/E (1 ≤ m ≤ N) of the eigenvector V1 after division on the complex plane as a complex vector.
  • the value θ[m] in the fourth modified example corresponds to the clockwise angle of the component V[m]/E from the real axis, as shown in the drawings.
  • the start page SP is arranged on the real axis. Therefore, the value θ[m] can also be described as the clockwise angle with respect to the component corresponding to the start page SP. That is, the value θ[m] corresponds to the angle formed by the component V[m]/E and the component V[s]/E corresponding to the start page SP.
  • the value L[m] corresponds to the absolute value |V[m]/E| of the component V[m]/E.
  • the meanings of the white and black circles shown in FIG. 19 are as described in connection with FIG. 15A.
  • based on the special Hermitian adjacency matrix H3, the second scoring unit 143 calculates, as the second score, a score that is higher for web pages whose eigenvector component lies farther from the start page SP.
  • each component V[m]/E (1 ≤ m ≤ N) of the eigenvector V1 is always arranged within one quadrant of the complex plane, without making a full rotation on the complex plane.
  • the start page SP is fixed at the point on the real axis corresponding to the value 1 in the complex plane, and each component V[m]/E of the eigenvector V1 is arranged in the fourth quadrant. Therefore, according to the fourth modification, the importance score (second score) can be calculated easily and appropriately in accordance with the connection relationships between web pages, with the start page as the reference.
  • after calculating the second score of each web page in the selected document network in S400, the second scoring unit 143 proceeds to S410 and determines whether the processing of S360 to S400 has been executed, and the second scores of the web pages calculated, for all virtual document networks.
  • if unprocessed virtual document networks remain, the process returns to S350, and one of the unprocessed virtual document networks is selected as the new selected document network to be processed. Thereafter, the processing of S360 to S400 is executed for the new selected document network, and the second score of each web page in the selected document network is calculated and stored. Then, when the processing of S360 to S400 has been executed for all the virtual document networks, the processing shown in FIG. 17 ends. According to the fourth modification, there is the technical advantage that the amount of calculation required to calculate the second score can be reduced.
  • the technology of the present disclosure is characterized by using a special Hermitian adjacency matrix as described above. It is mathematically proven that the eigenvalues of a Hermitian matrix, including a Hermitian adjacency matrix and a special Hermitian adjacency matrix, are always real.
  • provided that the documents (web pages) are at least weakly connected, the Hermitian adjacency matrix and the special Hermitian adjacency matrix generally fall into one of the following cases with respect to the eigenvalue of maximum absolute value.
  • the eigenvalue of maximum absolute value is only a positive one with multiplicity 1; or it is only a negative one with multiplicity 1; or it consists of a positive eigenvalue and a negative eigenvalue, each with multiplicity 1. This has been demonstrated in several cases.
  • when the eigenvalues of maximum absolute value are a positive one and a negative one, adopting the positive eigenvalue ensures that the eigenvector corresponding to the eigenvalue of maximum absolute value is uniquely determined.
  • when the eigenvalues of maximum absolute value include a positive and a negative eigenvalue, in general, each component of the eigenvector of the negative eigenvalue coincides with the corresponding component of the eigenvector of the positive eigenvalue once the sign of its real part and the sign of its imaginary part are each either kept or reversed.
  • a change in the sign of the real part of a complex number corresponds to moving the complex number to the position symmetric with respect to the imaginary axis in the complex plane.
  • a change in the sign of the imaginary part of a complex number corresponds to moving the complex number to the position symmetric with respect to the real axis in the complex plane. That is, when the eigenvalues of maximum absolute value include a positive and a negative eigenvalue, the positional relationship on the complex plane among the components of the eigenvector of the negative eigenvalue and the positional relationship on the complex plane among the components of the eigenvector of the positive eigenvalue correspond to each other through symmetry about the real axis and symmetry about the imaginary axis.
  • accordingly, when the eigenvalues of maximum absolute value are a positive and a negative eigenvalue, it is also possible to specify the simple rule of adopting the negative eigenvalue and then perform an operation that produces the same result as adopting the positive eigenvalue.
  • the special Hermitian adjacency matrix corresponding to each of the weakly connected document networks is defined using a common term: the number of documents in the document network having the largest number of documents among all the document networks. Therefore, the score given to each document in each document network can be meaningfully compared / ranked not only within the same document network but also among different document networks.
  • the special Hermitian adjacency matrix of the present disclosure does not require positing virtual connection relationships from all documents to all documents. Moreover, with regard to the scoring / ranking of each document based on the connection relationships between documents, the special Hermitian adjacency matrix of the present disclosure yields results with the same tendency as the conventional adjacency matrix method with a damping factor of 0.85.
  • when the score of a document needs to be changed, for example according to the degree of appearance of the document, there is no need to change the damping factor and search empirically for the necessary value.
  • instead, appropriate scoring / ranking can be realized in a more organized, systematic, and theoretical manner.
  • this can be done using a plurality of dummy documents constituting a path whose length (i.e., the number of consecutive arrows whose direction is preserved) corresponds to the required change in score.
  • alternatively, the special Hermitian adjacency matrix of the present disclosure can be changed in accordance with the necessary change, while maintaining the properties of a Hermitian matrix, at the two locations representing the connection between the document whose score needs to be changed and one other document. This also makes it possible to realize scoring / ranking more appropriately, in a more organized, systematic, and theoretical manner.
  • the special Hermitian adjacency matrix H2 is an organized, systematic, and theoretical development of the special Hermitian adjacency matrix H1.
  • the advantages of the special Hermitian adjacency matrix H1 described above hold also for the special Hermitian adjacency matrix H2. In this sense, the present disclosure provides significant technical improvements over the prior art.
  • in S190 and S390, the second scoring unit 143 divides each component V[m] of the eigenvector V by the component V[s] of the start page SP, thereby placing the start page SP at the point of value 1 on the real axis of the complex plane.
  • each component V[m] of the eigenvector V may instead be divided by −V[s], by +V[s]·i, by −V[s]·i, or by a predetermined multiple (real-number multiple) of any of these.
  • when each component is divided by −V[s], the start page SP is placed at the point −1 on the real axis of the complex plane, and each component of the divided eigenvector V1 is arranged in the second quadrant of the complex plane. Also in this case, the second score can be calculated appropriately.
  • among the plurality of paths that qualify as the longest path, the second scoring unit 143 may search for one or more specific paths whose start point has the largest out-degree.
  • the present disclosure is not limited to web documents and may be applied to the scoring of any documents having link / citation relationships.
  • the functions of one component in the above embodiment may be distributed to a plurality of components.
  • the functions of multiple components may be integrated into one component.
  • a part of the configuration of the above embodiment may be omitted. All aspects included in the technical idea specified by the wording of the claims are embodiments of the present disclosure.
  • the processes of S110 and S310 executed by the second scoring unit 143 correspond to an example of the process implemented by the identifying unit.
  • the processes of S120 and S320 correspond to an example of the process implemented by the determination unit.
  • the processes of S130 and S330 correspond to an example of the process realized by the setting unit.
  • the processes of S160, S170, S360, and S370 correspond to an example of the process realized by the definition unit.
  • the processes of S180 and S380 correspond to an example of the process realized by the eigenvector calculation unit.
  • the processes of S190 to S220 and S390 to S400 correspond to an example of the process realized by the score calculation unit.
  • the process performed by the ranking unit 145 and the output unit 147 corresponds to an example of the process implemented by the arrangement determination unit.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A system of the present disclosure calculates an eigenvector of a special Hermitian adjacency matrix in each of one or more virtual file networks which correspond to one or more file networks. The special Hermitian adjacency matrix is a modification of a Hermitian adjacency matrix of N rows by N columns based on a connection relationship between D[m](1≤m≤N) files that form a corresponding virtual file network. The calculated eigenvector corresponds to an eigenvalue of the maximum absolute value. The eigenvector has components which respectively correspond to the D[m](1≤m≤N) files. The system calculates, on the basis of a component that corresponds to a starting point file, respective scores of the D[m](1≤m≤N) files on the basis of a positional relationship between the D[m](1≤m≤N) files on a complex plane, when each component of the eigenvector is arranged on the complex plane.

Description

INFORMATION PROCESSING SYSTEM, INFORMATION PROCESSING METHOD, AND COMPUTER PROGRAM
Cross-reference to related applications
This international application claims priority based on Japanese Patent Application No. 2017-227936, filed with the Japan Patent Office on November 28, 2017, and the entire contents of Japanese Patent Application No. 2017-227936 are incorporated into this international application by reference.
The present disclosure relates to an information processing system, an information processing method, and a computer program.
Techniques for scoring / ranking web pages are already known (see Patent Document 1). In a simple example of this technique, a web page is assigned a higher page rank the more web pages link to it. The page rank calculation uses an adjacency matrix in which the connection relationships between web pages are expressed in binary by the values 0 and 1, and a matrix, obtained by changing each component of the adjacency matrix, whose components include the values 0 and 1 and other real numbers.
Japanese Unexamined Patent Application Publication No. 2017-102712 (JP 2017-102712 A)
In the conventional method of determining page rank based on the adjacency matrix described above, it is necessary to posit, in addition to the actual connection relationships between web pages, virtual connection relationships from every web page to every web page. This makes it difficult to rank web pages well.
Accordingly, in one aspect of the present disclosure, it is desirable to provide a technique capable of scoring / ranking a plurality of documents more appropriately than before, based on a novel adjacency matrix.
According to one aspect of the present disclosure, an information processing system for scoring a plurality of documents is provided. The information processing system includes an identifying unit, a determination unit, a setting unit, a definition unit, an eigenvector calculation unit, and a score calculation unit.
The identifying unit identifies, based on data representing the connection relationships among a plurality of documents, one or more document networks, each consisting of a group of documents that are at least weakly connected. For each of the one or more document networks, the determination unit determines a start document, which is the document located at the start point of the corresponding document network.
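The identification of weakly connected document networks can be illustrated with a short sketch. The Python fragment below groups documents into networks by ignoring link direction and running a breadth-first search. The function name `weakly_connected_networks`, the string document identifiers, and the edge-list input format are assumptions made for illustration and do not appear in the disclosure.

```python
from collections import defaultdict, deque


def weakly_connected_networks(documents, links):
    """Group documents into weakly connected networks.

    `links` is a list of (source, destination) pairs; direction is ignored
    because weak connectivity only asks whether a link exists either way.
    """
    neighbours = defaultdict(set)
    for src, dst in links:
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    seen = set()
    networks = []
    for doc in documents:
        if doc in seen:
            continue
        # Breadth-first search collects every document reachable from `doc`.
        seen.add(doc)
        queue = deque([doc])
        component = []
        while queue:
            current = queue.popleft()
            component.append(current)
            for nxt in neighbours[current]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        networks.append(sorted(component))
    return networks


# Hypothetical example: two separate networks.
networks = weakly_connected_networks(
    ["a", "b", "c", "d", "e"],
    [("a", "b"), ("c", "b"), ("d", "e")],
)
```

In this example, documents a, b, and c form one network even though no directed path leads from a to c, because weak connectivity disregards link direction.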
The setting unit sets, for each of the one or more document networks, a virtual document network including dummy documents by connecting, in the corresponding document network, a virtual dummy document with in-degree 1 and out-degree 0 to each document whose out-degree is zero.
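As a minimal sketch of this setting step, the following Python fragment appends one dummy document to every document whose out-degree is zero. The function name `add_dummy_documents` and the tuple-based naming of dummy documents are hypothetical choices made for illustration.

```python
def add_dummy_documents(documents, links):
    """Return the documents and links of the virtual document network."""
    out_degree = {doc: 0 for doc in documents}
    for src, _dst in links:
        out_degree[src] += 1

    virtual_documents = list(documents)
    virtual_links = list(links)
    for doc in documents:
        if out_degree[doc] == 0:
            # Hypothetical naming scheme for the dummy document.
            dummy = ("dummy", doc)
            virtual_documents.append(dummy)
            # The dummy receives exactly one link and has no outgoing links.
            virtual_links.append((doc, dummy))
    return virtual_documents, virtual_links


# Hypothetical example: b and c have no outgoing links.
docs, links = add_dummy_documents(["a", "b", "c"], [("a", "b"), ("a", "c")])
```

After this step every original document has an out-degree of at least one, and only the dummy documents have an out-degree of zero.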
The definition unit defines a special Hermitian adjacency matrix for each of the one or more virtual document networks corresponding to the one or more document networks. The special Hermitian adjacency matrix is a modification of the N-row, N-column Hermitian adjacency matrix based on the connection relationships among the documents D[m] (1 ≤ m ≤ N) constituting the corresponding virtual document network. Here, m is an integer.
The Hermitian adjacency matrix is a Hermitian matrix whose diagonal components are zero. In the Hermitian adjacency matrix, the component h(p,q) in row p and column q indicates the value 1 when a link from document D[p] to document D[q] exists and a link from document D[q] to document D[p] also exists; the value 0 when neither a link from document D[p] to document D[q] nor a link from document D[q] to document D[p] exists; the value +i (i is the imaginary unit) when a link from document D[p] to document D[q] exists but no link from document D[q] to document D[p] exists; and the value −i when no link from document D[p] to document D[q] exists but a link from document D[q] to document D[p] exists.
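The four-valued encoding described above can be sketched as follows. This Python fragment builds the Hermitian adjacency matrix as a nested list of complex numbers. The function name `hermitian_adjacency`, the 0-based indices (the text uses 1-based indices), and the edge-list input format are illustrative assumptions.

```python
def hermitian_adjacency(n, links):
    """Build the N x N Hermitian adjacency matrix from directed links.

    `links` holds (p, q) pairs of 0-based indices meaning "p links to q".
    """
    link_set = {(p, q) for p, q in links if p != q}  # diagonal stays zero
    h = [[0j] * n for _ in range(n)]
    for p in range(n):
        for q in range(n):
            if p == q:
                continue
            forward = (p, q) in link_set
            backward = (q, p) in link_set
            if forward and backward:
                h[p][q] = 1 + 0j   # links in both directions
            elif forward:
                h[p][q] = 1j       # only p -> q
            elif backward:
                h[p][q] = -1j      # only q -> p
            # otherwise the component stays 0
    return h


# Hypothetical example: 0 -> 1, and a mutual link between 1 and 2.
H = hermitian_adjacency(3, [(0, 1), (1, 2), (2, 1)])
```

Because h(q,p) is always the complex conjugate of h(p,q), the resulting matrix is Hermitian by construction.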
The eigenvector calculation unit calculates the eigenvector corresponding to the eigenvalue of maximum absolute value of the special Hermitian adjacency matrix. The eigenvector has a component corresponding to each of the documents D[m] (1 ≤ m ≤ N). According to one aspect of the present disclosure, when a positive eigenvalue and a negative eigenvalue both exist as eigenvalues of maximum absolute value, the eigenvector calculation unit can calculate the eigenvector corresponding to the positive eigenvalue.
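The disclosure does not state which numerical method is used to obtain this eigenvector. As one possible sketch, the Python fragment below uses shifted power iteration, a technique substituted here purely for illustration, to find the largest and smallest eigenvalues of a small Hermitian matrix and then applies the rule of preferring the positive eigenvalue on a tie. The function names, the iteration count, the start vector, and the tolerance are all assumptions; convergence can be slow when eigenvalues lie close together.

```python
def mat_vec(matrix, vector):
    """Multiply a square matrix (nested lists) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def top_eigenpair(matrix, iterations=5000):
    """Largest (algebraic) eigenvalue of a Hermitian matrix and its eigenvector."""
    n = len(matrix)
    # Shifting by more than the largest absolute row sum makes every
    # eigenvalue of (matrix + shift * I) positive, so power iteration
    # converges toward the algebraically largest eigenvalue.
    shift = max(sum(abs(a) for a in row) for row in matrix) + 1.0
    # Assumed start vector; it must not be orthogonal to the target eigenvector.
    vector = [complex(1.0, 0.1 * (k + 1)) for k in range(n)]
    for _ in range(iterations):
        product = mat_vec(matrix, vector)
        vector = [y + shift * x for y, x in zip(product, vector)]
        length = sum(abs(x) ** 2 for x in vector) ** 0.5
        vector = [x / length for x in vector]
    product = mat_vec(matrix, vector)
    # Rayleigh quotient of the normalized vector; real for a Hermitian matrix.
    eigenvalue = sum(x.conjugate() * y for x, y in zip(vector, product)).real
    return eigenvalue, vector


def dominant_eigenpair(matrix, tolerance=1e-9):
    """Eigenpair of maximum absolute value, preferring the positive one on a tie."""
    largest, v_pos = top_eigenpair(matrix)
    negated = [[-a for a in row] for row in matrix]
    neg_largest, v_neg = top_eigenpair(negated)
    smallest = -neg_largest
    if largest >= abs(smallest) - tolerance:
        return largest, v_pos
    return smallest, v_neg


# Hypothetical example: document 0 links to document 1 (eigenvalues +1 and -1).
value, vec = dominant_eigenpair([[0, 1j], [-1j, 0]])
```

In practice a library routine for Hermitian eigenproblems would normally replace this hand-written iteration; the sketch only shows how the selection rule between a positive and a negative eigenvalue of equal magnitude can be expressed.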
For each of the one or more virtual document networks, the score calculation unit calculates a score for each of the documents D[m] (1 ≤ m ≤ N) based on the positional relationship among the documents D[m] (1 ≤ m ≤ N) on the complex plane when each component of the eigenvector is arranged on the complex plane with the component corresponding to the start document as a reference.
This information processing system scores a plurality of documents using a special Hermitian adjacency matrix corresponding to a Hermitian adjacency matrix that can express the connection relationships between documents with the four values 1, 0, +i, and −i. Therefore, there is no need to posit virtual connection relationships from all documents to all documents, and the scoring / ranking of each document based on the connection relationships between documents can be realized more appropriately than before.
In one aspect of the present disclosure, the definition unit may be configured to define the special Hermitian adjacency matrix by modifying the Hermitian adjacency matrix so that, when each component of the eigenvector is arranged on the complex plane, all components fall within an angular range of π/2 radians (i.e., 90 degrees) on the complex plane.
In this information processing system, the score is calculated after modifying the Hermitian adjacency matrix so that the eigenvector components corresponding to the documents do not make one or more full rotations on the complex plane and, in particular, so that every component fits within a region the size of one quadrant of the complex plane. Each document can therefore be appropriately scored using distance information in the rotational direction on the complex plane.
In one aspect of the present disclosure, the special Hermitian adjacency matrix may correspond to a Hermitian matrix defined, based on a first correction amount C1 and a second correction amount C2, by: replacing each component indicating the value +i in the Hermitian adjacency matrix with the value C1(C2+i) and each component indicating the value −i with the value C1(C2−i); changing the value C1(C2+i) of each such component in each row of the Hermitian adjacency matrix after the replacement to the value {C1(C2+i)/W}, obtained by dividing by the number W of components in the same row that indicate the value C1(C2+i) or the value 1; and further changing the value C1(C2−i) of the component located symmetrically across the diagonal from each component changed to the value {C1(C2+i)/W} to the complex conjugate {C1(C2−i)/W} of the value {C1(C2+i)/W}. The first correction amount C1 and the second correction amount C2 may be determined using a parameter n as follows.
Figure JPOXMLDOC01-appb-M000004
The parameter n may be determined, based on the number of documents max{N} in the virtual document network having the largest number of documents N among the one or more virtual document networks, so that all of the above components fall within an angular range of π/2 radians on the complex plane.
The first correction amount C1 and the second correction amount C2 help define the special Hermitian adjacency matrix so that all components fall within an angular range of π/2 radians on the complex plane while the relationships among the components of the eigenvector are preserved.
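The replacement and row-wise division described above can be sketched in Python as follows. Because the formulas for C1 and C2 are given only in an equation image that is not reproduced in this text, the sketch takes `c1` and `c2` as plain parameters, and the numeric values in the example are arbitrary. The function name `special_hermitian_h1` is an assumption, as is the choice to leave components of value 1 and 0 unchanged, since the passage does not mention modifying them.

```python
def special_hermitian_h1(h, c1, c2):
    """Apply the C1/C2 replacement and the division by W to a Hermitian adjacency matrix."""
    n = len(h)
    h1 = [row[:] for row in h]  # work on a copy; the input is not modified
    for p in range(n):
        forward = [q for q in range(n) if h[p][q] == 1j]
        # W counts the components of row p that are +i (replaced by C1(C2+i)) or 1.
        w = len(forward) + sum(1 for q in range(n) if h[p][q] == 1)
        for q in forward:
            h1[p][q] = c1 * (c2 + 1j) / w
            # The mirrored component becomes the complex conjugate,
            # which keeps the matrix Hermitian.
            h1[q][p] = c1 * (c2 - 1j) / w
    return h1


# Hypothetical example: 0 -> 1 (one-way) and a mutual link between 0 and 2.
h = [
    [0, 1j, 1],
    [-1j, 0, 0],
    [1, 0, 0],
]
h1 = special_hermitian_h1(h, c1=0.5, c2=2.0)
```

In the example, row 0 contains one +i component and one component of value 1, so W is 2 and the component h(0,1) becomes 0.5·(2+i)/2.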
In one aspect of the present disclosure, the special Hermitian adjacency matrix may correspond to a Hermitian matrix defined by: replacing each component indicating the value +i in the Hermitian adjacency matrix with the value C1(C2+i) and each component indicating the value −i with the value C1(C2−i); changing the value C1(C2+i) of each such component in each row of the Hermitian adjacency matrix after the replacement to the value {C1(C2+i)/W}, obtained by dividing by the number W of components in the same row that indicate the value C1(C2+i) or the value 1; changing the value C1(C2−i) of the component located symmetrically across the diagonal from each component changed to the value {C1(C2+i)/W} to the complex conjugate {C1(C2−i)/W} of the value {C1(C2+i)/W}; changing the value {C1(C2−i)/W} of each such component in each row of the Hermitian adjacency matrix H after these changes to the value {C1(C2−i)Z/W}, obtained by multiplying by the number Z of components in the same row that indicated the value C1(C2−i) or the value 1 before the changes; and further changing the value {C1(C2+i)/W} of the component located symmetrically across the diagonal from each component changed to the value {C1(C2−i)Z/W} to the complex conjugate {C1(C2+i)Z/W} of the value {C1(C2−i)Z/W}.
In one aspect of the present disclosure, the special Hermitian adjacency matrix may correspond to a Hermitian matrix defined, based on a correction amount C, by: replacing each component indicating the value +i in the Hermitian adjacency matrix with the value C+i; changing the value of each component indicating the value C+i in each row of the Hermitian adjacency matrix after the replacement to the value {(C+i)/R}, obtained by dividing by the number R of components in the same row that indicate the value C+i; and further changing the value of each component indicating the value −i to the complex conjugate {(C−i)/R} of the component located symmetrically across the diagonal. The correction amount C may be determined using a parameter n as follows.
Figure JPOXMLDOC01-appb-M000005
The correction amount C helps define the special Hermitian adjacency matrix so that all components of the eigenvector fall within an angular range of π/2 radians on the complex plane.
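As a minimal sketch of this single-correction variant, the Python fragment below replaces each +i component with (C+i)/R and sets the mirrored component to the complex conjugate. The formula for C is given only in an equation image that is not reproduced here, so `c` is passed as a parameter and the value used in the example is arbitrary. The function name `special_hermitian_h3` is an illustrative assumption.

```python
def special_hermitian_h3(h, c):
    """Apply the correction amount C and the division by R to a Hermitian adjacency matrix."""
    n = len(h)
    h3 = [row[:] for row in h]  # work on a copy; the input is not modified
    for p in range(n):
        targets = [q for q in range(n) if h[p][q] == 1j]
        r = len(targets)  # number of +i components in row p
        for q in targets:
            h3[p][q] = (c + 1j) / r
            # h[q][p] was -i; it becomes the complex conjugate of h3[p][q].
            h3[q][p] = (c - 1j) / r
    return h3


# Hypothetical example: document 0 links to documents 1 and 2.
h = [
    [0, 1j, 1j],
    [-1j, 0, 0],
    [-1j, 0, 0],
]
h3 = special_hermitian_h3(h, c=2.0)
```

Every −i component is the mirror of exactly one +i component, so each is rewritten exactly once and the result stays Hermitian.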
In one aspect of the present disclosure, the score calculation unit may, for each of the one or more virtual document networks, rotationally transform the values V[m] (1 ≤ m ≤ N) of the components of the eigenvector so that the component corresponding to the start document is placed at a specific position on the complex plane. The score calculation unit may calculate the score of each of the documents D[m] (1 ≤ m ≤ N) based on the value Vc[m] of the corresponding document D[m] after the rotational transformation.
Calculating the scores in this way places the start documents of the plurality of document networks at the same position on the complex plane, so that documents belonging to the different document networks can be appropriately scored by the same evaluation criterion.
In one aspect of the present disclosure, the score calculation unit may rotationally transform the values V[m] (1 ≤ m ≤ N) of the components of the eigenvector so that the component corresponding to the start document is placed, as the specific position, at a predetermined non-zero angle from a specific axis located at the boundary of one quadrant of the complex plane.
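A rotational transformation of this kind amounts to multiplying every component by one complex number of unit modulus. The Python sketch below rotates all components so that the start component ends up at a chosen argument. The function name `rotate_to_start_angle`, the use of the real axis as the specific axis, and the example angle of π/8 measured clockwise are assumptions made for illustration.

```python
import cmath


def rotate_to_start_angle(values, start_index, start_angle):
    """Rotate all components so that the start component has argument `start_angle`."""
    current = cmath.phase(values[start_index])
    # A unit-modulus factor changes arguments only, never absolute values.
    rotation = cmath.exp(1j * (start_angle - current))
    return [v * rotation for v in values]


# Hypothetical example: place the start component pi/8 clockwise of the real axis.
rotated = rotate_to_start_angle([1j, 1], start_index=0, start_angle=-cmath.pi / 8)
```

Since the same factor is applied to every component, the angles between components and their absolute values are unchanged by the transformation.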
In one aspect of the present disclosure, the score calculation unit may calculate, as the score of each of the documents D[m] (1 ≤ m ≤ N), based on the value Vc[m] of the corresponding document D[m] after the rotational transformation, a value corresponding to the product (θ[m]·L[m]) of the angle θ[m] of the value Vc[m] from the specific axis on the complex plane and the absolute value L[m] of the value Vc[m].
In one aspect of the present disclosure, the score calculation unit may, for each of the one or more virtual document networks, place the component corresponding to the start document on a specific axis of the complex plane by dividing the values V[m] (1 ≤ m ≤ N) of the components of the eigenvector by a specific value E based on the value V[s] of the component corresponding to the start document. The score calculation unit may then rotationally transform the values V[m]/E (1 ≤ m ≤ N) of the components of the eigenvector so that the component corresponding to the start document is rotated from the specific axis to a position at a predetermined non-zero angle.
In one aspect of the present disclosure, when, upon dividing the values V[m] (1 ≤ m ≤ N) of the components of the eigenvector by the specific value E, a component exists that is located upstream of the specific axis in the rotation direction of the rotational transformation on the complex plane, the score calculation unit may rotationally transform the values V[m]/E (1 ≤ m ≤ N) of the components of the eigenvector so that the most upstream component is rotated to a position at a predetermined non-zero angle from the specific axis.
In one aspect of the present disclosure, the score of each of the documents D[m] (1 ≤ m ≤ N) may be calculated as the product (θ[m]·L[m]) reduced by an amount such that the minimum of the scores of all the documents D[m] (1 ≤ m ≤ N) becomes zero.
In one aspect of the present disclosure, the score calculation unit may, for each of the one or more virtual document networks, place the component corresponding to the start document on a specific axis of the complex plane by dividing the values V[m] (1 ≤ m ≤ N) of the components of the eigenvector by a specific value E based on the value V[s] of the component corresponding to the start document, and may calculate, as the score of each of the documents D[m] (1 ≤ m ≤ N), a value corresponding to the product (θ[m]·L[m]) of the angle θ[m] from the specific axis of the value V[m]/E of the component of the corresponding document D[m] and the absolute value L[m] of the value V[m]/E.
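The division by the start component followed by the θ[m]·L[m] product can be sketched as follows. This Python fragment assumes that the specific value E equals V[s], that the specific axis is the real axis, and that the angle is measured clockwise, as in the fourth modified example. The function name `theta_l_scores` is hypothetical.

```python
import cmath


def theta_l_scores(eigenvector, start_index):
    """Score each component as (clockwise angle from the real axis) * (absolute value)."""
    start = eigenvector[start_index]
    scores = []
    for value in eigenvector:
        z = value / start          # the start component itself becomes 1
        theta = -cmath.phase(z)    # clockwise angle (assumed sign convention)
        scores.append(theta * abs(z))
    return scores


# Hypothetical example: eigenvector of [[0, i], [-i, 0]] for eigenvalue +1,
# where document 0 links to document 1.
scores = theta_l_scores([1j, 1], start_index=0)
```

In the example, the linked-to document sits a quarter turn clockwise from the start document at unit distance, so its score is π/2 while the start document scores zero.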
In one aspect of the present disclosure, the specific axis may be the real axis of the complex plane. The score calculation unit can place the component corresponding to the start document on the real axis of the complex plane by dividing the value V[m] of each component of the eigenvector by, as the specific value E, the value V[s] of the component corresponding to the start document.
In one aspect of the present disclosure, the determination unit may determine, as the start document, the document located at the start point of the longest path in the corresponding document network.
In one aspect of the present disclosure, when a plurality of paths qualify as the longest path in the corresponding document network, the determination unit may determine, as the start document, the document with the highest out-degree or the document with the lowest out-degree among the documents located at the start points of those paths.
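The start-document rule above can be sketched for the simplified case of a document network without cycles. The Python fragment below computes the longest path starting from each document with a memoized recursion and breaks ties by the highest out-degree. The acyclic assumption, the function name `find_start_document`, and the fallback to input order for any remaining ties are assumptions of this sketch; the disclosure does not state how cycles are handled at this step.

```python
from functools import lru_cache


def find_start_document(documents, links):
    """Pick the start of the longest path; ties go to the highest out-degree.

    Assumes the links contain no cycles (simplification for this sketch).
    """
    successors = {doc: [] for doc in documents}
    for src, dst in links:
        successors[src].append(dst)

    @lru_cache(maxsize=None)
    def longest_from(doc):
        # Number of links on the longest path that starts at `doc`.
        return max((1 + longest_from(nxt) for nxt in successors[doc]), default=0)

    # Compare by path length first, then by out-degree.
    return max(documents, key=lambda d: (longest_from(d), len(successors[d])))


# Hypothetical example: a and d both start paths of length 2; d has out-degree 2.
start = find_start_document(
    ["a", "b", "c", "d", "e"],
    [("a", "b"), ("b", "c"), ("d", "b"), ("d", "e")],
)
```

Choosing the lowest out-degree instead, which the text also allows, would only require negating the second element of the comparison key.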
 本開示の一側面では、判別部は、対応する文書ネットワークにおいて複数の経路が最も長い経路に該当する場合には、複数の経路の始点に位置する文書に接続される入次数がゼロの仮想的なダミー文書を配置し、配置した仮想的なダミー文書を始点文書として判別してもよい。本開示の一側面では、判別部は、対応する文書ネットワークにおいて複数の経路が最も長い経路に該当し、さらに、これら複数の経路の始点に位置する複数の文書において出次数が最も多い文書又は出次数が最も少ない文書が複数存在する場合には、複数の経路の始点に位置する出次数が最も多い又は出次数が最も少ない複数の文書に接続される入次数がゼロの仮想的なダミー文書を配置し、配置した仮想的なダミー文書を始点文書として判別してもよい。 In one aspect of the present disclosure, in the case where the plurality of routes correspond to the longest route in the corresponding document network, the determination unit is a virtual in-degree zero connected to the document located at the start point of the plurality of routes. The dummy document may be placed, and the placed virtual dummy document may be determined as the start point document. In one aspect of the present disclosure, the determination unit determines that the plurality of paths correspond to the longest path in the corresponding document network, and further, the document or output document with the highest outdegree in the plurality of documents located at the start points of the plurality of paths. When there are a plurality of documents with the lowest degree, a virtual dummy document with an in-degree of zero connected to a plurality of documents with the highest out degree or the lowest out degree located at the start points of the plurality of paths The arranged and arranged virtual dummy documents may be determined as the start document.
In one aspect of the present disclosure, the information processing system may include an arrangement determination unit that determines the order of documents in a list of documents corresponding to a search query, based on the scores of the plurality of documents calculated by the score calculation unit. In one aspect of the present disclosure, the arrangement determination unit may generate the list of documents corresponding to the search query such that a document with a higher score is placed higher in the list.
According to this information processing system, a list in which the plurality of documents corresponding to the search query are arranged in order of importance can be generated, and an appropriate search result can be output.
In one aspect of the present disclosure, an information processing method including the procedure executed by the above-described information processing system may be provided. That is, in one aspect of the present disclosure, an information processing method for scoring a plurality of documents may be provided.
An information processing method according to one aspect of the present disclosure may include: identifying, based on data representing connection relationships among a plurality of documents, one or more document networks each composed of a group of documents that are at least weakly connected; determining, for each of the identified one or more document networks, a start document, which is a document located at the start point of the corresponding document network; setting, for each of the one or more document networks, a virtual document network including dummy documents by connecting a virtual dummy document with an in-degree of 1 and an out-degree of 0 to each document whose out-degree is zero in the corresponding document network; defining a special Hermitian adjacency matrix for each of the one or more virtual document networks corresponding to the one or more document networks; calculating an eigenvector that corresponds to the eigenvalue of the special Hermitian adjacency matrix having the largest absolute value and that has a component corresponding to each of the documents D[m] (1 ≦ m ≦ N); and calculating, for each of the one or more virtual document networks, the score of each of the documents D[m] (1 ≦ m ≦ N) based on the positional relationship among the documents D[m] (1 ≦ m ≦ N) on the complex plane when the components of the eigenvector are placed on the complex plane with the component corresponding to the start document as a reference.
The special Hermitian adjacency matrix may be defined by modifying the Hermitian adjacency matrix such that, when the components of the eigenvector are placed on the complex plane, all of the components fall within an angular range of π/2 radians on the complex plane.
In one aspect of the present disclosure, a computer program for causing a computer to execute the above-described information processing method may be provided. According to one aspect of the present disclosure, a non-transitory tangible recording medium storing this computer program may be provided. In one aspect of the present disclosure, an information processing system may be provided that includes a processor and a memory storing a computer program for causing the processor to execute processing for scoring a plurality of documents.
FIG. 1 is a block diagram showing the configuration of an information processing system.
FIG. 2 is a functional block diagram of the information processing system.
FIG. 3 is a functional block diagram showing details of a query response unit.
FIG. 4 is a flowchart showing processing executed by a second scoring unit.
FIG. 5A is a diagram showing an example of a connected document network, and FIG. 5B is a diagram showing the corresponding virtual document network.
FIG. 6A is a diagram showing an example of a connected document network, and FIG. 6B is a diagram showing the corresponding virtual document network.
FIG. 7A is a diagram showing an example of a connected document network, and FIG. 7B is a diagram showing the corresponding virtual document network.
FIG. 8A is a diagram showing an example of a connected document network, and FIG. 8B is a diagram showing the corresponding virtual document network.
FIG. 9A is a diagram showing an example of a connected document network, and FIG. 9B is a diagram showing the corresponding virtual document network.
FIG. 10A is a diagram showing an example of a connected document network, and FIG. 10B is a diagram showing the corresponding virtual document network.
FIG. 11 is a diagram showing an example of a connected document network.
FIG. 12 is a flowchart showing processing executed by the second scoring unit to set a start page.
FIG. 13 is a diagram explaining a method of generating a special Hermitian adjacency matrix.
FIG. 14 is a flowchart showing processing executed by the second scoring unit for rotational transformation.
FIG. 15A is an explanatory diagram of the arrangement of a plurality of components on the complex plane, and FIG. 15B is a diagram explaining a method of calculating scores.
FIG. 16 is a diagram showing a modification of the special Hermitian adjacency matrix.
FIG. 17 is a flowchart showing processing executed by the second scoring unit in another modification.
FIG. 18 is a diagram showing another modification of the special Hermitian adjacency matrix.
FIG. 19 is a diagram explaining a method of calculating scores in another modification.
DESCRIPTION OF SYMBOLS 1: information processing system, 5: user terminal, 10: arithmetic unit, 11: processor, 15: memory, 20: storage unit, 30: communication unit, 110: crawler, 120: indexer, 130: query processing unit, 140: query response unit, 141: first scoring unit, 143: second scoring unit, 145: ranking unit, 147: output unit, 210: page repository, 220: index storage unit, SP: start page, EP: end page, DP: dummy page.
Exemplary embodiments of the present disclosure are described below with reference to the drawings.
The information processing system 1 of the present embodiment shown in FIG. 1 is configured to respond to a search query input from a user terminal 5 by providing the user terminal 5 with a list of documents corresponding to the search query. The documents are web documents, specifically web pages. That is, the information processing system 1 functions as a search engine that can be used from the user terminal 5 through a communication network. The communication network is, for example, the Internet.
The information processing system 1 includes an arithmetic unit 10, a storage unit 20, and a communication unit 30. The arithmetic unit 10 includes a processor 11 and a memory 15. The storage unit 20 stores data and the computer programs executed by the processor 11. The storage unit 20 can include either a hard disk drive or a solid-state drive.
The communication unit 30 is configured to be able to communicate with the user terminal 5. The arithmetic unit 10 implements a search function, that is, the function of a search engine, by executing processing in accordance with the computer programs stored in the storage unit 20. Specifically, the processing for implementing this search function is executed by the processor 11. The information processing system 1, shown in simplified form in FIG. 1, can in practice be composed of a group of one or more cooperating server devices.
The search function is implemented by the arithmetic unit 10 functioning as the crawler 110, the indexer 120, the query processing unit 130, and the query response unit 140 shown in FIG. 2, and by the storage unit 20 functioning as the page repository 210 and the index storage unit 220.
Like well-known crawlers, the crawler 110 is configured to collect web pages that exist on the communication network. The web pages collected by the crawler 110 are accumulated in the page repository 210.
The indexer 120 is configured to analyze and index the web pages accumulated in the page repository 210. Through indexing, information that is valuable for searching is extracted from each web page, and index data corresponding to that web page is generated.
The index data includes the information valuable for searching that was extracted from the web page. Specifically, the index data includes a content index, a structure index, and a special-purpose index. The content index includes information on the keywords, title, and key sentences of the web page. The structure index includes information representing the hyperlink structure of the web page. The special-purpose index holds information useful for special query processing, such as an image index and a PDF index. The index data generated by the indexer 120 for each web page is stored in the index storage unit 220. Taken together, the set of index data represents the connection relationships among the web pages.
The query processing unit 130 receives a search query from a user and extracts, from among all web pages, a related page group, which is the set of web pages corresponding to the search query. Here, "all web pages" refers to the group of web pages that the crawler 110 has found on the communication network and whose index data is registered in the index storage unit 220.
Specifically, based on the content index of each web page stored in the index storage unit 220, the query processing unit 130 extracts, from among all web pages, the set of web pages containing vocabulary corresponding to the search query as the related page group. It then provides information on this related page group to the query response unit 140.
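The extraction of a related page group from the content index can be illustrated with a minimal sketch. The dictionary layout of `content_index`, the function name `related_pages`, and the rule that a page must contain every query term are assumptions made for illustration; the disclosure only states that pages containing vocabulary corresponding to the search query are extracted.

```python
def related_pages(content_index, query):
    """Return IDs of pages whose indexed terms include every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    return {
        page_id
        for page_id, words in content_index.items()
        if all(term in words for term in terms)
    }


# Hypothetical content index: page ID -> set of indexed keywords.
content_index = {
    "page1": {"matrix", "eigenvalue", "ranking"},
    "page2": {"matrix", "cooking"},
    "page3": {"ranking", "search"},
}
print(sorted(related_pages(content_index, "matrix ranking")))
```

A production index would normally be inverted (term to pages) rather than scanned page by page; the forward layout is used here only to keep the sketch short.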
Based on the information on the related page group provided by the query processing unit 130, the query response unit 140 transmits to the user terminal 5, as response data to the search query, a search result list in which the related pages are arranged in order of page rank. Among the related pages, a web page with higher relevance to the search query and higher importance is ranked higher and placed higher in the search result list. Like response data from conventional search engines, the search result list is configured as a web page having links to the listed related pages. The links referred to here are so-called hyperlinks.
More specifically, as shown in FIG. 3, the query response unit 140 includes a first scoring unit 141, a second scoring unit 143, a ranking unit 145, and an output unit 147.
The first scoring unit 141 is configured to score each related page in the related page group corresponding to the search query based on the relevance of its page content to the search query, and to give each related page a content score as a first score.
The second scoring unit 143 operates independently of the search query and is configured to give each web page collected by the crawler 110 an importance score, based on the connection relationships among web pages, as a second score.
The second score is calculated and assigned to each web page such that a web page estimated to be more important from the connection relationships among web pages receives a larger value. Specifically, the second score takes a larger value for a web page that attracts more links, for a web page that is linked from web pages with high importance scores, and for a web page that is linked from web pages having few links to other web pages.
The ranking unit 145 is configured to calculate the page rank of each related page based on the first score calculated for that related page by the first scoring unit 141 and the second score calculated for that related page by the second scoring unit 143.
In a simple example, the page rank of each related page corresponds to a weighted sum of the first score and the second score. For example, using the first score X1, the second score X2, and a weighting coefficient α that takes a value between 0 and 1, the page rank Y of each related page can be calculated according to the formula Y = α·X1 + (1-α)·X2. The page rank of each related page may be understood as an overall score whose components are the content score based on the search query and the importance score based on the connection relationships among web pages.
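The weighted sum can be written out directly. The following is a minimal sketch; the function name `page_rank` and the sample values are hypothetical and serve only to illustrate the formula.

```python
def page_rank(x1, x2, alpha):
    """Weighted sum Y = alpha * X1 + (1 - alpha) * X2, with 0 <= alpha <= 1."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    return alpha * x1 + (1.0 - alpha) * x2


# Hypothetical scores: content score X1 = 4.0, importance score X2 = 8.0.
print(page_rank(4.0, 8.0, 0.25))
```

With α close to 1 the ordering is dominated by the content score, and with α close to 0 it is dominated by the importance score.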
The output unit 147 transmits to the user terminal 5 that sent the search query, as the search result list, a page list in which the related pages corresponding to the search query are listed in descending order of the page rank calculated for each related page by the ranking unit 145.
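Ordering the related pages by page rank amounts to a descending sort. The sketch below assumes a hypothetical mapping from page ID to page rank; how ties are broken is not specified in the disclosure, and here tied pages simply keep their input order.

```python
def search_result_list(page_ranks):
    """Return page IDs ordered from the highest page rank to the lowest."""
    return sorted(page_ranks, key=lambda page_id: page_ranks[page_id], reverse=True)


# Hypothetical page ranks for three related pages.
ranks = {"page1": 0.42, "page2": 0.91, "page3": 0.10}
print(search_result_list(ranks))
```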
Next, the details of the processing executed by the second scoring unit 143, together with the details of the second-score calculation technique that characterizes the present embodiment, are described below.
Conventionally, web pages have been scored on the basis of the connection relationships among web pages using an adjacency matrix in which each component is represented by one of the two (real) values 1 and 0. In contrast, in the present embodiment, web pages are scored using a Hermitian adjacency matrix H in which each component is represented by one of the four (complex) values 1, 0, +i, and -i. Here, i denotes the imaginary unit.
The Hermitian adjacency matrix H is generated for each group of weakly connected web pages. As is well known, a network composed of weakly connected nodes is a network in which, when the direction of the links between nodes is ignored, every remaining node can be reached by following links from any one node belonging to the network. Likewise, a document network composed of weakly connected web pages consists of a group of web pages in which any one web page belonging to the document network has at least an indirect connection to each of the remaining web pages when the direction of the links is ignored.
By periodically executing the processing shown in the flowchart of FIG. 4, the second scoring unit 143 calculates, for each group of weakly connected web pages, the second score of each web page belonging to that group, based on the latest index data stored in the index storage unit 220.
When the processing shown in FIG. 4 starts, the second scoring unit 143 extracts one or more connected document networks from among all web pages (S110). Specifically, by referring to the index data stored in the index storage unit 220, it extracts each group of weakly connected web pages among all web pages as one connected document network. Each connected document network is thus composed of a group of weakly connected web pages.
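Step S110 corresponds to finding the weakly connected components of the link graph. The following is a minimal sketch using a breadth-first search over links treated as undirected; the function name and the representation of links as `(source, destination)` pairs are assumptions for illustration and are not taken from the disclosure.

```python
from collections import defaultdict, deque


def weakly_connected_components(pages, links):
    """Group pages into components, ignoring link direction."""
    neighbours = defaultdict(set)
    for src, dst in links:
        neighbours[src].add(dst)
        neighbours[dst].add(src)  # treat every link as undirected
    seen, components = set(), []
    for page in pages:
        if page in seen:
            continue
        component, queue = set(), deque([page])
        seen.add(page)
        while queue:
            current = queue.popleft()
            component.add(current)
            for other in neighbours[current]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        components.append(component)
    return components


# Hypothetical link data: pages 1-3 form one network, pages 4-5 another.
pages = [1, 2, 3, 4, 5]
links = [(1, 2), (3, 2), (4, 5)]
print(weakly_connected_components(pages, links))
```

In this sketch a page with no links at all forms a component of its own; whether such a page is treated as a document network is not stated in the text.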
FIGS. 5A, 6A, 7A, 8A, 9A, 10A, and 11 show examples of connected document networks. Each circle in FIGS. 5A, 6A, 7A, 8A, 9A, 10A, and 11 corresponds to one node, in other words, one web page.
An arrow in these figures indicates that the web page at the tail of the arrow contains a link (hyperlink) to the web page at the head of the arrow. That is, it is possible to move via the link from the web page at the tail of the arrow to the web page at the head of the arrow. A double-headed arrow means that the two web pages connected by the arrow each contain a link to the other, so that movement is possible in both directions.
As is clear from the figures, each web page in the document networks shown there is at least indirectly connected to the other web pages when the direction of the arrows is ignored. In the following, these web pages are referred to as the k-th web page, using the number k shown inside the corresponding circle in the figures.
In S120, which follows S110, the second scoring unit 143 sets a start page SP for each of the connected document networks. Specifically, the second scoring unit 143 executes the processing shown in FIG. 12 for each connected document network to set the start page SP.
When the processing shown in FIG. 12 starts, the second scoring unit 143 searches the corresponding connected document network for one or more paths that qualify as the longest path (S121). For each node (web page) in the corresponding connected document network, the second scoring unit 143 can start from that node and move to adjacent nodes in the direction of the arrows, under the rule that the same node may not be visited more than once, until no further move is possible; it then calculates the number of nodes constituting the resulting path and identifies the path with the largest number of nodes as the longest path.
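The search rule of S121 can be sketched as an exhaustive depth-first enumeration of simple paths. The function name `longest_paths` and the dictionary `out_links` (page to list of linked pages, with every page present as a key) are assumptions for illustration. The enumeration is exponential in the worst case, so this is a sketch of the rule rather than a scalable implementation.

```python
def longest_paths(out_links):
    """Return all simple directed paths that visit the maximum number of nodes."""
    best = []

    def extend(path, visited):
        nonlocal best
        moved = False
        for nxt in out_links.get(path[-1], ()):
            if nxt not in visited:  # a node may not be visited twice
                moved = True
                extend(path + [nxt], visited | {nxt})
        if not moved:  # no destination left: the path is complete
            if not best or len(path) > len(best[0]):
                best = [path]
            elif len(path) == len(best[0]):
                best.append(path)

    for start in out_links:  # try every page as a starting node
        extend([start], {start})
    return best


# Hypothetical network: 1 -> 2, 2 -> 3, 2 -> 4.
out_links = {1: [2], 2: [3, 4], 3: [], 4: []}
print(longest_paths(out_links))
```

When several paths tie for the maximum length, all of them are returned, which is the situation handled by the tie-breaking steps S122 onward.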
The second scoring unit 143 then determines whether a plurality of paths were found as the longest path (S122). When only one path was found as the longest path (No in S122), the second scoring unit 143 sets the web page at the start point of that path as the start page SP (S123).
For example, for the connected document networks shown in FIGS. 5A, 6A, 7A, 8A, and 9A, the first web page, which is located at the start point of the longest path and has an in-degree of zero, is set as the start page SP.
When a plurality of paths were found as the longest path (Yes in S122), the second scoring unit 143 searches those paths for one or more paths that qualify as the specific path, namely a path whose start point has the minimum or the maximum out-degree (S124). Whether the specific path is defined as "the path whose start point has the minimum out-degree" or as "the path whose start point has the maximum out-degree" can be decided freely by the system designer. The connected document networks illustrated in FIGS. 10A and 11 each contain a plurality of paths that qualify as the longest path.
The terms "out-degree" and "in-degree" are explained here. When, within one connected document network, a web page contains links to U1 web pages, the out-degree of that web page is U1. When a web page is linked from U2 web pages, the in-degree of that web page is U2.
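The two definitions can be checked with a short counting routine. This is a minimal sketch; the function name `degrees` and the pair-based link representation are hypothetical, and duplicate links between the same two pages are counted once on the assumption that the degree counts distinct linked pages.

```python
def degrees(pages, links):
    """Return (out_degree, in_degree) dictionaries for one document network."""
    out_degree = {page: 0 for page in pages}
    in_degree = {page: 0 for page in pages}
    for src, dst in set(links):  # count each distinct link once
        out_degree[src] += 1
        in_degree[dst] += 1
    return out_degree, in_degree


# Hypothetical network: page 1 links to pages 2 and 3, page 2 links to page 3.
out_degree, in_degree = degrees([1, 2, 3], [(1, 2), (1, 3), (2, 3)])
print(out_degree, in_degree)
```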
After completing the search in S124, the second scoring unit 143 determines whether a plurality of paths qualifying as the specific path were found (S125). When it determines that a plurality of paths were found (Yes in S125), the second scoring unit 143 places, upstream of those paths, one dummy page that has an in-degree of zero and is connected to the start points of those paths, and sets this dummy page as the start page SP (S126).
The dummy page referred to here is a virtual web page. In FIG. 10B, one dummy page having links to the first web page and the third web page is provided upstream of the first and third web pages as the start page SP.
On the other hand, when the second scoring unit 143 determines that only one path was found as the specific path (No in S125), it sets the web page located at the start point of this path as the start page SP (S127). The connected document network shown in FIG. 11 contains, as longest paths, a plurality of paths starting from the first, second, third, fourth, fifth, sixth, seventh, and tenth web pages. Among them, the path whose start point has the minimum out-degree is the path starting from the tenth web page. Accordingly, when the specific path is defined as the path whose start point has the minimum out-degree, the tenth web page is set as the start page SP.
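The decision flow of S122 to S127 can be sketched as follows. The function name, the `"DUMMY_START"` identifier, and the `prefer_min` flag are hypothetical. The sketch also assumes that candidates are compared per distinct start page, so that several longest paths sharing one start page count as a single candidate; the flowchart description is phrased in terms of paths, so this is an interpretation rather than the confirmed behavior.

```python
def choose_start_page(longest, out_degree, prefer_min=True):
    """Return (start_page, extra_links) following steps S122 to S127."""
    starts = sorted({path[0] for path in longest})
    if len(starts) == 1:                      # S123: a single candidate
        return starts[0], []
    pick = min if prefer_min else max         # S124: design-time choice
    target = pick(out_degree[s] for s in starts)
    tied = [s for s in starts if out_degree[s] == target]
    if len(tied) == 1:                        # S127: a unique specific path
        return tied[0], []
    dummy = "DUMMY_START"                     # S126: virtual page, in-degree 0
    return dummy, [(dummy, s) for s in tied]


# Hypothetical case: two longest paths whose start pages 1 and 3 both have
# out-degree 1, so a dummy start page linking to both is introduced.
paths = [[1, 2, 5], [3, 4, 5]]
print(choose_start_page(paths, {1: 1, 3: 1}))
```

The returned extra links are the links from the dummy page to the tied start pages and would be added to the network before later steps.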
In this way, the second scoring unit 143 sets, for each connected document network, a start page SP that reflects its network structure (S120). The second scoring unit 143 then sets a virtual document network corresponding to each connected document network (S130). One virtual document network corresponds to one connected document network.
In S130, for each connected document network, the second scoring unit 143 searches the corresponding connected document network for web pages with an out-degree of zero as end pages EP, and sets the virtual document network by attaching a dummy page DP with an in-degree of 1 and an out-degree of 0 to each end page EP found.
The dummy page DP connected to an end page EP is a virtual web page such that, in the corresponding virtual document network, a link exists from the end page EP to the dummy page DP and no link exists from the dummy page DP to any other web page.
When a plurality of end pages EP are found in the corresponding connected document network, the second scoring unit 143 sets the virtual document network by attaching a dummy page DP to each of the end pages EP.
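Step S130 can be sketched as a single pass that attaches one dummy page to every page without outgoing links. The function name and the `("DP", page)` tuple used to identify a dummy page are hypothetical conventions chosen for this example.

```python
def build_virtual_network(pages, links):
    """Attach one virtual dummy page to every page whose out-degree is zero."""
    has_out_link = {src for src, _ in links}
    virtual_pages, virtual_links = list(pages), list(links)
    for page in pages:
        if page not in has_out_link:          # end page EP
            dummy = ("DP", page)              # hypothetical dummy-page ID
            virtual_pages.append(dummy)
            virtual_links.append((page, dummy))
    return virtual_pages, virtual_links


# Hypothetical network: page 1 links to pages 2 and 3, which are end pages.
v_pages, v_links = build_virtual_network([1, 2, 3], [(1, 2), (1, 3)])
print(v_pages, v_links)
```

Each dummy page receives exactly one incoming link and no outgoing link, and a network in which every page has an outgoing link is returned unchanged.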
FIGS. 5B, 6B, 7B, 8B, 9B, 10B, and 11 show the virtual document networks corresponding to the connected document networks shown in FIGS. 5A, 6A, 7A, 8A, 9A, 10A, and 11, respectively. The virtual document network corresponding to the connected document network shown in FIG. 11 is identical to that connected document network, because the connected document network contains no web page with an out-degree of zero.
The reason for setting the virtual document network by attaching dummy pages DP to the end pages EP is as follows.
If scoring with the Hermitian adjacency matrix H were performed on a connected document network without dummy pages DP, the score of an end page EP, whose out-degree is zero, would tend to be lower than the scores of the adjacent pages that link to the end page EP. However, an end page EP, which corresponds to the terminus in the direction of the links between web pages, can be regarded as a highly important web page and is a web page for which a high score should be calculated.
For this reason, the present embodiment introduces the dummy pages DP so that a high score (second score) is calculated for the end pages EP.
After setting the virtual document networks in S130, the second scoring unit 143 identifies the maximum value max{N} of the number of elements N of the virtual document networks (S140). The number of elements N of a virtual document network corresponds to the total number of web pages and dummy pages belonging to that virtual document network. For example, the number of elements N of the virtual document network shown in FIG. 6B is 14 (N = 14), and the number of elements N of the virtual document network shown in FIG. 8B is 7 (N = 7). The maximum value max{N} is the number of elements N of the virtual document network that has the largest number of elements among all the virtual document networks.
In the subsequent S150, the second scoring unit 143 selects one of the virtual document networks as the processing target. The virtual document network selected here is hereinafter referred to as the selected document network.
The second scoring unit 143 then calculates a second score for each of the web pages constituting the selected document network, based on the Hermitian adjacency matrix H of the selected document network (S160 to S220). In the following, one of the web pages constituting the selected document network is denoted as web page D[m]. The variable m takes integer values from 1 to N (1 ≦ m ≦ N), and web page D[m] corresponds to the m-th web page in the selected document network. N is the number of elements of the selected document network.
In S160, the second scoring unit 143 generates a Hermitian adjacency matrix H that represents the connection relationship among the web pages in the selected document network with the values 1, 0, +i, and -i. The Hermitian adjacency matrix H is a matrix of N rows and N columns (NxN), where N is the number of elements of the selected document network, and each of its components takes one of the values 1, 0, +i, and -i. In the following, the expression “component h(p,q)” denotes the component in the p-th row and q-th column of the Hermitian adjacency matrix H.
When a link from web page D[p] to web page D[q] exists and a link from web page D[q] to web page D[p] also exists, the second scoring unit 143 sets the corresponding component h(p,q) to the value 1.
When neither a link from web page D[p] to web page D[q] nor a link from web page D[q] to web page D[p] exists, the corresponding component h(p,q) is set to the value 0. Accordingly, the second scoring unit 143 sets the diagonal components h(p,p) of the Hermitian adjacency matrix H to zero.
Furthermore, when a link from web page D[p] to web page D[q] exists but no link from web page D[q] to web page D[p] exists, the second scoring unit 143 sets the corresponding component h(p,q) to the value +i. When no link from web page D[p] to web page D[q] exists but a link from web page D[q] to web page D[p] exists, it sets the corresponding component h(p,q) to the value -i.
In this way, the second scoring unit 143 sets the value of each component h(p,q) and generates the Hermitian adjacency matrix H.
Figure JPOXMLDOC01-appb-M000006
In summary, the rules stated above for the links between web pages D[p] and D[q] give:
\[
h(p,q) =
\begin{cases}
1 & \text{if the links } D[p] \to D[q] \text{ and } D[q] \to D[p] \text{ both exist} \\
0 & \text{if neither link exists} \\
+i & \text{if only the link } D[p] \to D[q] \text{ exists} \\
-i & \text{if only the link } D[q] \to D[p] \text{ exists}
\end{cases}
\]
When the value of each component h(p,q) is set according to the above rules, the component h(q,p) in the q-th row and p-th column, which is located symmetrically to the component h(p,q) across the diagonal, is set to the complex conjugate of h(p,q). Therefore, the Hermitian adjacency matrix H is a Hermitian matrix.
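The rules for setting h(p,q) can be sketched in Python as follows. This is a minimal illustration, not the implementation of the embodiment: the function names `hermitian_adjacency` and `is_hermitian`, the 0-based page indices, and the three-page example network are assumptions introduced here.

```python
def hermitian_adjacency(n, links):
    """Build the N x N matrix H from directed links given as 0-based (p, q) pairs."""
    # Self-links are ignored so that the diagonal stays 0, as in the text.
    link_set = {(p, q) for p, q in links if p != q}
    h = [[0j] * n for _ in range(n)]
    for p in range(n):
        for q in range(n):
            if p == q:
                continue
            forward = (p, q) in link_set   # link D[p] -> D[q]
            backward = (q, p) in link_set  # link D[q] -> D[p]
            if forward and backward:
                h[p][q] = 1 + 0j           # links in both directions
            elif forward:
                h[p][q] = 1j               # only D[p] -> D[q]
            elif backward:
                h[p][q] = -1j              # only D[q] -> D[p]
    return h


def is_hermitian(h):
    """Check that h(q, p) equals the complex conjugate of h(p, q)."""
    n = len(h)
    return all(h[p][q] == h[q][p].conjugate()
               for p in range(n) for q in range(n))


# Hypothetical network: page 0 links to page 1; pages 1 and 2 link to each other.
H = hermitian_adjacency(3, [(0, 1), (1, 2), (2, 1)])
```

In this example the one-way link produces the conjugate pair +i and -i, the two-way link produces 1 in both positions, and `is_hermitian(H)` confirms the symmetry described in the text.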
Thereafter, the second scoring unit 143 generates a special Hermitian adjacency matrix H1 by transforming the Hermitian adjacency matrix H (S170). The transformation is performed so that, when each component of the eigenvector V of the special Hermitian adjacency matrix H1 is placed on the complex plane as a complex vector, all of the components fall within an angular range of π/2 radians.
For the transformation, the second scoring unit 143 calculates a first correction amount C1 and a second correction amount C2 according to the following equations, based on the maximum value max{N} identified in S140.
Figure JPOXMLDOC01-appb-M000007
Here, the parameter n is a natural number equal to or greater than the maximum value max{N} identified in S140.
Setting n = max{N} is sufficient to keep all of the above components within an angular range of π/2 radians. The larger the parameter n is made relative to the maximum value max{N}, the narrower the angular range, below π/2 radians, within which all of the components fall. The purpose of keeping all of the components within an angular range of π/2 radians is to ensure that they all fall within a single quadrant of the complex plane. The parameter n may be set to any value that achieves this purpose. However, for good calculation of the second score, the parameter n is preferably set to as small a value as possible within the range that achieves the purpose.
The second correction amount C2 contributes to adjusting the angular range, and the first correction amount C1 serves to prevent the absolute values of the matrix components from changing because of the second correction amount C2.
After calculating the first correction amount C1 and the second correction amount C2, the second scoring unit 143 replaces each component of value +i in the Hermitian adjacency matrix H with the value C1(C2+i), and each component of value -i with the value C1(C2-i). The second scoring unit 143 then changes each component of value C1(C2+i) in each row of the replaced matrix to the value {C1(C2+i)/W}, where W is the sum of the number of components in the same row having the value C1(C2+i) and the number of components in that row having the value 1. The second scoring unit 143 further changes the value C1(C2-i) of the component located symmetrically, across the diagonal, to a component changed to {C1(C2+i)/W} into the complex conjugate {C1(C2-i)/W}. The Hermitian matrix defined by these replacements and changes is generated as the special Hermitian adjacency matrix H1.
FIG. 13 shows a specific example of the procedure for transforming the Hermitian adjacency matrix H into the special Hermitian adjacency matrix H1. For example, if, among the N components h(p1,1), h(p1,2), ..., h(p1,N) in row p1, a total of W1 components take the value +i or the value 1, each component of value +i in row p1 of the Hermitian adjacency matrix H is changed to the value {C1(C2+i)/W1}. Likewise, if, among the N components h(p2,1), h(p2,2), ..., h(p2,N) in row p2, a total of W2 components take the value +i or the value 1, each component of value +i in row p2 is changed to the value {C1(C2+i)/W2}.
Furthermore, the value of each component of value -i is changed to the complex conjugate of the component located symmetrically to it across the diagonal. For example, the component h(q1,p1), located symmetrically across the diagonal to the component h(p1,q1) of value {C1(C2+i)/W1}, is changed to {C1(C2-i)/W1}. Similarly, the component h(q2,p2), located symmetrically across the diagonal to the component h(p2,q2) of value {C1(C2+i)/W2}, is changed to {C1(C2-i)/W2}.
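The transformation from H to H1 described above can be sketched as follows. Because the equations for C1 and C2 are given only in the equation figure, this sketch takes them as caller-supplied arguments; the values 0.5 and 2.0 used below are arbitrary placeholders and are not derived from the patent's formula. The function name `special_hermitian` and the example matrix are likewise hypothetical.

```python
def special_hermitian(h, c1, c2):
    """Return H1 built from H (entries 1, 0, +i, -i) and correction amounts c1, c2."""
    n = len(h)
    h1 = [row[:] for row in h]  # entries of value 1 and 0 are kept unchanged
    for p in range(n):
        # W: number of components in row p whose value is +i or 1
        w = sum(1 for q in range(n) if h[p][q] in (1j, 1 + 0j))
        for q in range(n):
            if h[p][q] == 1j:
                value = c1 * (c2 + 1j) / w      # {C1(C2+i)/W}
                h1[p][q] = value
                h1[q][p] = value.conjugate()    # mirrored -i entry: {C1(C2-i)/W}
    return h1


# Hypothetical 3-page matrix: page 0 -> page 1 (one-way), page 0 <-> page 2 (two-way).
H = [[0j, 1j, 1 + 0j],
     [-1j, 0j, 0j],
     [1 + 0j, 0j, 0j]]

# c1 and c2 are placeholder values chosen only to make the arithmetic easy to follow.
H1 = special_hermitian(H, c1=0.5, c2=2.0)
```

Row 0 contains one component of value +i and one of value 1, so W = 2 for that row; the +i component becomes 0.5(2+i)/2 and the mirrored component in row 1 becomes its complex conjugate.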
In the subsequent S180, the second scoring unit 143 calculates the eigenvalues and eigenvectors of the special Hermitian adjacency matrix H1 generated in S170. Since the special Hermitian adjacency matrix H1 is an NxN matrix, each eigenvector is an N-dimensional vector with N components. Below, each component of the eigenvector V corresponding to the eigenvalue with the largest absolute value is written as V[m]. The variable m takes integer values from 1 to N. That is, the eigenvector is V = {V[1], V[2], ..., V[N]}. Each component V[m] (1 ≦ m ≦ N) of the eigenvector V corresponds to the web page D[m] (1 ≦ m ≦ N) of the selected document network.
In the subsequent S190, the second scoring unit 143 divides each component V[m] (1 ≦ m ≦ N) of the eigenvector V, which corresponds to the eigenvalue of the special Hermitian adjacency matrix H1 with the largest absolute value, by the component E corresponding to the start point page SP of the selected document network. Here, the start point page SP is the start point page set in S120 for the selected document network. When the start point page SP is the s-th web page D[s], the component E is the s-th component V[s] of the eigenvector V (E = V[s]).
Dividing each component V[m] (1 ≦ m ≦ N) of the eigenvector V by the component E converts the component corresponding to the start point page SP to the value 1. Below, the eigenvector V after the division is written as eigenvector V1, where V1 = {V[1]/E, V[2]/E, ..., V[s]/E = 1, ..., V[N]/E}. As a result of the division, the component of the eigenvector V1 corresponding to the start point page SP lies on the real axis of the complex plane.
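The division in S190 can be illustrated with a short Python sketch. The function name `normalize_by_start`, the 0-based index `s`, and the sample vector are assumptions made for illustration; the vector is not an eigenvector of any particular network.

```python
def normalize_by_start(v, s):
    """Divide every component of v by the start-page component E = v[s]."""
    e = v[s]
    if e == 0:
        raise ValueError("start-page component is zero; cannot normalize")
    return [vm / e for vm in v]


# Hypothetical complex vector; index 0 plays the role of the start point page SP.
V = [2j, 1 + 1j, -3 + 0j]
V1 = normalize_by_start(V, 0)
```

After the division the start-page component equals 1, so it sits on the positive real axis, and every other component is expressed relative to it.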
After the processing in S190, the second scoring unit 143 applies a rotational transformation to each component V[m]/E (1 ≦ m ≦ N) of the divided eigenvector V1 (S200). In S200, the second scoring unit 143 can execute the processing shown in FIG. 14.
When the processing shown in FIG. 14 starts, the second scoring unit 143 determines whether, with the components of the eigenvector V1 placed on the complex plane, any component lies on the first-quadrant side of the real axis (S201). Components lying on the real axis itself are not counted as lying on the first-quadrant side.
In many cases, no component lies further toward the first quadrant than the component of the start point page SP, which lies on the real axis. However, when the corresponding virtual document network contains a cyclic structure, a component may lie further toward the first quadrant than the start point page SP. The processing of S201 is performed to detect such components, which occur with low probability.
When it determines that no component lies on the first-quadrant side of the real axis (No in S201), the second scoring unit 143 rotates each component of the eigenvector V1 toward the fourth quadrant by a predetermined angle θ1 (S203). In this way, the second scoring unit 143 rotates each component of the eigenvector V1 so that the component corresponding to the start point page SP is placed at a specific position separated from the real axis by the angle θ1 on the complex plane.
FIG. 15A illustrates the arrangement of the components on the complex plane before the rotation. In FIG. 15A, each black or white circle corresponds to one component of the eigenvector V1, and the black circle in particular corresponds to the component of the start point page SP. FIG. 15B illustrates the arrangement of the components on the complex plane after the rotation.
The angle θ1 is, for example, π/180 radians. The angle θ1 is not limited to π/180 radians; any angle suffices that moves the component corresponding to the start point page SP off the real axis while keeping all components of the eigenvector V1 within the fourth quadrant. The reason for moving the component corresponding to the start point page SP off the real axis is described later.
When it determines that a component lies on the first-quadrant side of the real axis (Yes in S201), the second scoring unit 143 proceeds to S205. In S205, the second scoring unit 143 identifies the angle θ2, measured from the real axis, of the component farthest from the real axis toward the first quadrant. Thereafter, the second scoring unit 143 rotates each component of the eigenvector V1 toward the fourth quadrant by the angle θ1+θ2 (S207).
The angle θ1 is as described above for S203. By rotating by the angle θ1+θ2, the second scoring unit 143 places all components of the eigenvector V1, including those lying further toward the first quadrant than the start point page SP, within the fourth quadrant of the complex plane. Below, the eigenvector V1 after the rotation is written as eigenvector Vc = {Vc[1], Vc[2], ..., Vc[s], ..., Vc[N]}.
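The branch in S201 to S207 can be sketched as a single function. This is an illustrative reading of the steps, under the assumption that "on the first-quadrant side of the real axis" means a strictly positive imaginary part; the function name `rotate_into_fourth_quadrant` and the sample components are hypothetical.

```python
import cmath
import math


def rotate_into_fourth_quadrant(v1, theta1=math.pi / 180):
    """Rotate all components clockwise by theta1 (S203) or theta1 + theta2 (S207)."""
    # S201: components strictly above the real axis (assumed meaning of
    # "first-quadrant side"); components on the real axis are excluded.
    above = [c for c in v1 if c.imag > 0]
    # S205: theta2 is the largest angle above the real axis, or 0 if none exists.
    theta2 = max((cmath.phase(c) for c in above), default=0.0)
    # Multiplying by exp(-i * angle) rotates clockwise, toward the fourth quadrant.
    rotation = cmath.exp(-1j * (theta1 + theta2))
    return [c * rotation for c in v1]


# Hypothetical normalized vector: start page at 1, one component below the
# real axis, and one component 0.2 rad above it.
v1 = [1 + 0j, cmath.rect(1.0, -0.5), cmath.rect(0.5, 0.2)]
vc = rotate_into_fourth_quadrant(v1)
```

With these sample values the component that was 0.2 rad above the real axis ends up θ1 below it, and the start-page component ends up θ1 + 0.2 rad below the real axis.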
After the rotation in S200, the second scoring unit 143 calculates, for each component Vc[m] (1 ≦ m ≦ N) of the eigenvector Vc placed on the complex plane as a complex vector, a value Z[m] = L[m]·θ[m] (1 ≦ m ≦ N) corresponding to the distance of the component from the real axis (S210). FIG. 15B conceptually shows the value Z[m].
Here, the value L[m] is the absolute value |Vc[m]| of the component Vc[m]. The value θ[m] is the angle between the component Vc[m] and the real axis, in other words, the clockwise angle of the component Vc[m] from the real axis.
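As a small illustration of Z[m] = L[m]·θ[m], the following sketch computes the value for components that already lie in the fourth quadrant. The function name `z_values` and the sample components are assumptions; the clockwise angle is taken as the negated phase returned by `cmath.phase`.

```python
import cmath


def z_values(vc):
    """Return Z[m] = |Vc[m]| * (clockwise angle of Vc[m] from the real axis)."""
    # cmath.phase is counterclockwise-positive, so the clockwise angle is -phase.
    return [abs(c) * (-cmath.phase(c)) for c in vc]


# Hypothetical fourth-quadrant components given as (modulus, phase) pairs.
vc = [cmath.rect(1.0, -0.1), cmath.rect(2.0, -0.5)]
Z = z_values(vc)
```

Here the first component yields approximately 1.0 × 0.1 and the second approximately 2.0 × 0.5, so both a larger modulus and a larger clockwise angle raise Z[m].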
Thereafter, the second scoring unit 143 calculates the second score of each web page D[m] (1 ≦ m ≦ N) based on the value Z[m] and stores the calculated second scores in the storage unit 20 (S220).
The second score X2 corresponding to web page D[m] is calculated according to X2 = Z[m] - min{Z[m]}, where min{Z[m]} is the minimum of the values Z[m] (1 ≦ m ≦ N). That is, the second score X2 of each web page D[m] (1 ≦ m ≦ N) is obtained by subtracting the minimum value min{Z[m]} from Z[m], so that the web page D[m] with the smallest Z[m] has a second score of zero. However, the second score X2 need not be calculated for components corresponding to dummy pages DP, because a second score for a dummy page DP, which does not actually exist, is meaningless.
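The subtraction of the minimum can be sketched as follows. The function name `second_scores`, the dictionary return type, and the sample values are illustrative assumptions. The sketch reads the text as taking the minimum over all Z[m], including components for dummy pages, and then simply not reporting scores for the dummy pages.

```python
def second_scores(z, dummy_indices=()):
    """Return {page index: X2} with X2 = Z[m] - min{Z[m]}, skipping dummy pages."""
    z_min = min(z)  # minimum over all components, 1 <= m <= N in the text
    dummy = set(dummy_indices)
    return {m: zm - z_min for m, zm in enumerate(z) if m not in dummy}


# Hypothetical Z values; index 3 is assumed to be a dummy page DP.
X2 = second_scores([0.1, 1.0, 0.4, 2.5], dummy_indices=(3,))
```

The page with the smallest Z[m] receives a second score of zero, and the dummy page is omitted from the result.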
In this way, the second scoring unit 143 calculates, as the second score, a higher score for a web page whose eigenvector component is farther from the real axis and for a web page whose component has a larger absolute value.
In many cases, the component corresponding to the start point page SP is at the specific position closest to the real axis, and the second score of the start point page SP is zero. Therefore, according to the present embodiment, a higher second score is calculated for a web page that is farther from the start point page SP. Specifically, a higher second score is calculated for a web page whose eigenvector component is rotated further on the complex plane from the component corresponding to the start point page SP, and for a web page whose component has a larger absolute value. Even when the determination in S201 is affirmative and the processing proceeds to S205, the angle θ2 is not large, so for most web pages the second score is still higher the farther they are from the start point page SP.
With the special Hermitian adjacency matrix H1, thanks to the second correction amount C2, the components of the eigenvector Vc never make a full turn around the complex plane and always fall within an angular range of π/2 radians, specifically within the fourth quadrant. In addition, the rotation places the component corresponding to the start point page SP at a specific position on the complex plane (the position at angle θ1). The second score is then calculated according to the value Z[m] = L[m]·θ[m] (1 ≦ m ≦ N), measured from the real axis, which has a fixed relationship to that specific position. In this way, the value Z[m] and the second score of each web page D[m] are calculated according to its positional relationship to the component corresponding to the start point page SP.
Therefore, according to the present embodiment, an importance score (second score) that follows the connection relationship among web pages, with the start point page SP as a reference, can be calculated easily and appropriately.
In the present embodiment, the rotation is performed so that the component corresponding to the start point page SP is placed off the real axis because Z[m] is calculated according to the equation Z[m] = L[m]·θ[m]. With this equation, every component on the real axis has θ[m] = 0 and therefore Z[m] = 0, so the value L[m] is not reflected in the second score. Rotating so that every component Vc[m] is off the real axis corresponds to rotating each component of the eigenvector V1 so that θ[m] is nonzero. This rotation allows the second score, in particular the second score of web pages having bidirectional links, to be calculated well based on L[m].
It is also significant that, according to the present embodiment, thanks to the dummy pages DP, the second score of each web page can be calculated so that the second score of the end point page EP becomes high. Therefore, according to the present embodiment, an appropriate score reflecting the connection relationship of the web pages in the selected document network can be calculated as the second score.
After calculating the second score of each web page D[m] in the selected document network in S220, the second scoring unit 143 proceeds to S230 and determines whether the processing of S160 to S220 has been executed, and the second scores of the web pages calculated, for all the virtual document networks. If an unprocessed virtual document network remains (No in S230), the processing returns to S150, and one of the unprocessed virtual document networks is selected as the new processing target; that is, the selected document network is changed. The processing of S160 to S220 is then executed for the new selected document network, and the second score of each of its web pages is calculated and stored.
When the processing of S160 to S220 has been executed for all the virtual document networks, the second scoring unit 143 ends the processing shown in FIG. 4. The second scoring unit 143 can read the second scores stored in the storage unit 20 by this processing from the storage unit 20 as needed. For example, when a search query occurs, the second scoring unit 143 can read the second score of each related page corresponding to the search query from the storage unit 20 and provide it to the ranking unit 145.
According to the present embodiment, the processing shown in FIG. 4 places the components corresponding to the web pages in the fourth quadrant, places the component corresponding to the start point page SP at the specific position, and then calculates the value Z[m]. Second scores can therefore be calculated on a common scale for web pages of different linked document networks.
Therefore, even when the related page group includes web pages from different linked document networks, the ranking unit 145 can appropriately calculate the overall score, that is, the page rank, of each related page based on the second scores.
That is, based on the second score of each web page, the ranking unit 145 can determine the order of the web pages in the search result list, in the form of a page rank Y based on the first score and the second score, so that a web page with a higher second score is placed higher.
The output unit 147 can provide the user terminal 5 with an appropriate search result list based on such connection relationships among web pages. Specifically, it can provide the user terminal 5 with a search result list in which web pages that are far from the start point page and on which links converge, for which a high second score is calculated, are placed higher. Therefore, according to the present embodiment, a plurality of web pages can be scored and ranked according to their connection relationships more appropriately than before, and appropriate search results can be provided to the user terminal 5.
Here, the ranking of web pages based on second scores calculated by applying the technique of the present embodiment is described for each of the linked document networks shown in FIG. 5A, FIG. 7A, FIG. 8A, FIG. 9A, and FIG. 11. According to the conventional method, the web pages of the linked document network shown in FIG. 5A are ranked in the order “5” “4” “3” “2” “1”. The numbers shown here correspond to the numbers written inside the circles in the corresponding figure. That is, “5” denotes the web page corresponding to the node labeled “5” in FIG. 5A. This sequence means that web page “5” has the highest second score and therefore the highest rank, and web page “1” has the lowest second score and therefore the lowest rank.
According to the present embodiment, the web pages in the linked document network of FIG. 5A are ranked in the order “5” “4” “3” “2” “1”, the same as with the conventional method.
The web pages in the linked document network of FIG. 7A are ranked in the order “4” “3” “7” “8” “6” “2” “5” “1” according to the conventional method, and in the order “4” “3” “2” “5” “6” “7” “8” “1” according to the present embodiment.
The web pages in the linked document network of FIG. 8A are ranked in the order “5” “4” “2” “3” “1” according to the conventional method, and in the order “5” “4” “3” “2” “1” according to the present embodiment.
The web pages in the linked document network of FIG. 9A are ranked in the order “4” “3” “2” “1” according to the conventional method, and in the same order “4” “3” “2” “1” according to the present embodiment.
The web pages in the linked document network of FIG. 11 are ranked in the order “4” “6” “7” “1” “2” “8” “9” “3” “5” “10” according to the conventional method, and in the order “4” “6” “7” “2” “1” “3” “5” “8” “9” “10” according to the present embodiment.
As can be understood from these results, with the special Hermitian adjacency matrix H1, the second score is calculated to be larger for a web page on which links converge at a position farther from the start point page SP. The second score is calculated to be larger for a web page on which more links converge. The second score is calculated to be larger for a web page that is linked from web pages having high importance scores. The second score is calculated to be larger for a web page that is linked from web pages having few links to other web pages.
The second score is furthermore calculated to be larger for a web page that has more links to other web pages. The second score is furthermore calculated to be larger for a web page that has links to web pages having high importance scores. These points in particular are characteristic of the special Hermitian adjacency matrix H1. Therefore, according to the present embodiment, good scoring of web pages can be achieved.
Next, modifications of the information processing system 1 described above are explained. The modifications described below differ from the information processing system 1 described above only in part of the processing executed by the second scoring unit 143. The rest of the configuration of each modification is the same as that of the information processing system 1 described above. Therefore, the following selectively describes the processing of the second scoring unit 143 that is characteristic of each modification.
[First Modification]
In S210 (see FIG. 4), the second scoring unit 143 calculates the value Z[m] = L[m]·θ[m] using, as the angle θ[m], the angle from the component corresponding to the start point page SP rather than the angle from the real axis. In the first modification, the calculation of the angle θ[m] is more complicated than in the embodiment described above, but in many cases the second score can be calculated appropriately.
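A minimal sketch of the angle measurement used in the first modification is shown below. It assumes that the angle is measured clockwise from the start-page component and wrapped into the interval (-π, π]; the function name `z_from_start_component` and the sample components are hypothetical.

```python
import cmath
import math


def z_from_start_component(v, s):
    """Return Z[m] = |v[m]| * (clockwise angle of v[m] measured from v[s])."""
    start_phase = cmath.phase(v[s])
    z = []
    for c in v:
        # Clockwise angle from the start-page component.
        theta = start_phase - cmath.phase(c)
        # Wrap the difference into (-pi, pi] so that it stays a signed angle.
        theta = math.atan2(math.sin(theta), math.cos(theta))
        z.append(abs(c) * theta)
    return z


# Hypothetical components; index 0 plays the role of the start point page SP.
v = [cmath.rect(1.0, -0.1), cmath.rect(2.0, -0.6)]
Z = z_from_start_component(v, 0)
```

In this variant the start-page component always yields Z = 0, and the second component yields approximately 2.0 × 0.5.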
[Second Modification]
The second scoring unit 143 generates the eigenvector V1 in S190 and then executes S210 without executing S200. In S210, the second scoring unit 143 calculates the value Z[m] using the eigenvector V1 in place of the eigenvector Vc described above. That is, the value Z[m] is calculated according to the equation Z[m] = L[m]·θ[m], based on the values L[m] and θ[m] corresponding to the component V[m]/E of the eigenvector V1. This example is applicable to a connected document network that has no bidirectional links and in which no web page is placed on the first-quadrant side of the start page SP.
Alternatively, in S210 of the second modification, the value Z[m] may be calculated according to the equation Z[m] = L[m]·(θ[m] + β). Here β may be set to a small value, such as π/180 radians, chosen so that θ[m] + β does not exceed π/2 radians. With this example, even when the connected document network contains bidirectional links, a non-zero value Z[m] that carries the information of the value L[m] can be calculated, and an appropriate second score can be obtained.
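The effect of the offset β can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that θ[m] is the clockwise angle of the component from the positive real axis; the function name `z_value` and the sample component are hypothetical and do not come from the disclosure.

```python
import cmath
import math

def z_value(component: complex, beta: float = 0.0) -> float:
    """Return L * (theta + beta) for one normalized eigenvector component.

    Assumption: theta is the clockwise angle from the positive real axis,
    so a component in the fourth quadrant has a positive theta.
    """
    length = abs(component)            # L[m]
    theta = -cmath.phase(component)    # theta[m], clockwise angle in radians
    return length * (theta + beta)

beta = math.pi / 180                   # small offset, as in the text

# A component with theta = 0 scores zero without the offset, but keeps the
# information carried by L[m] once beta is added.
on_axis = complex(0.5, 0.0)
without_offset = z_value(on_axis)
with_offset = z_value(on_axis, beta)   # equals 0.5 * pi / 180
```

With β = 0 the value Z[m] of such a component collapses to zero regardless of L[m]; with a small β it stays proportional to L[m].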
[Third Modification]
In S170, the second scoring unit 143 generates another special Hermitian adjacency matrix H2 in place of the special Hermitian adjacency matrix H1 described above. In S180, it calculates the eigenvalues and eigenvectors of this special Hermitian adjacency matrix H2. It then executes the processing from S190 onward using the eigenvector V corresponding to the eigenvalue of maximum absolute value calculated here. In other words, the second scoring unit 143 of the third modification executes the same processing as in the embodiment described above, using the special Hermitian adjacency matrix H2 instead of the special Hermitian adjacency matrix H1.
The special Hermitian adjacency matrix H2 shown in FIG. 16 is an example that corresponds to the Hermitian adjacency matrix H shown in the upper part of FIG. 13. In S170, the second scoring unit 143 replaces each component of value +i in the Hermitian adjacency matrix H with the value C1(C2+i), and each component of value -i with the value C1(C2-i). The second scoring unit 143 then performs the following Process A and Process B.
(Process A)
The second scoring unit 143 changes the value C1(C2+i) of each component in each row of the Hermitian adjacency matrix H after the replacement to the value {C1(C2+i)/W}, obtained by dividing it by the number W of components in the same row that have the value C1(C2+i) or the value 1. It further changes the value C1(C2-i) of the component located symmetrically across the diagonal from each component changed to {C1(C2+i)/W} to the complex conjugate {C1(C2-i)/W} of {C1(C2+i)/W}.
(Process B)
The second scoring unit 143 changes the value C1(C2-i) of each component in each row of the Hermitian adjacency matrix H after the replacement to the value {C1(C2-i)Z}, obtained by multiplying it by the number Z of components in the same row that have the value C1(C2-i) or the value 1. It further changes the value C1(C2+i) of the component located symmetrically across the diagonal from each component changed to {C1(C2-i)Z} to the complex conjugate {C1(C2+i)Z} of {C1(C2-i)Z}.
The special Hermitian adjacency matrix H2 is generated by these replacements and changes. The second scoring unit 143 may execute Process B after Process A, may execute Process A after Process B, or may execute Process A and Process B concurrently. The same special Hermitian adjacency matrix H2 is generated in every case.
For clarity, the procedure for executing Process B after Process A is restated here. The second scoring unit 143 changes the value C1(C2+i) of each component in each row of the Hermitian adjacency matrix H after the replacement to the value {C1(C2+i)/W}, obtained by dividing it by the number W of components in the same row that have the value C1(C2+i) or the value 1, and changes the value C1(C2-i) of the component located symmetrically across the diagonal to the complex conjugate {C1(C2-i)/W}. The second scoring unit 143 then changes the value {C1(C2-i)/W} of each component in each row of the Hermitian adjacency matrix H after that change to the value {C1(C2-i)Z/W}, obtained by multiplying it by the number Z of components in the same row that had the value C1(C2-i) or the value 1 before the change, and changes the value {C1(C2+i)/W} of the component located symmetrically across the diagonal to the complex conjugate {C1(C2+i)Z/W}. Process A applied after the replacement first yields the special Hermitian adjacency matrix H1, and the subsequent Process B yields the special Hermitian adjacency matrix H2 of this modification.
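The replacement step followed by Process A and Process B can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the implementation of the disclosure: the matrix is a list of rows whose entries 1j, -1j, 1, and 0 stand for +i, -i, 1, and 0; the function name `build_h2` and the sample matrix are hypothetical; and the values passed for C1 and C2 are arbitrary placeholders, since their definitions are given elsewhere in this document.

```python
def build_h2(H, c1, c2, order="AB"):
    """Build the special matrix H2 from a Hermitian adjacency matrix H.

    H is a square list of rows with entries 1j, -1j, 1 or 0.
    order is "AB" (Process A then B) or "BA" (Process B then A).
    """
    n = len(H)
    # Row counts taken on the pattern of H before Process A or B changes it.
    W = [sum(1 for x in row if x in (1j, 1)) for row in H]    # +i or 1
    Z = [sum(1 for x in row if x in (-1j, 1)) for row in H]   # -i or 1

    # Replacement: +i -> C1(C2+i), -i -> C1(C2-i); entries 0 and 1 are kept.
    M = [[c1 * (c2 + 1j) if x == 1j else
          c1 * (c2 - 1j) if x == -1j else complex(x)
          for x in row] for row in H]

    def process_a():
        for p in range(n):
            for q in range(n):
                if H[p][q] == 1j:
                    M[p][q] = M[p][q] / W[p]        # divide by W of row p
                    M[q][p] = M[p][q].conjugate()   # mirror across the diagonal

    def process_b():
        for p in range(n):
            for q in range(n):
                if H[p][q] == -1j:
                    M[p][q] = M[p][q] * Z[p]        # multiply by Z of row p
                    M[q][p] = M[p][q].conjugate()   # mirror across the diagonal

    for step in order:
        if step == "A":
            process_a()
        else:
            process_b()
    return M

# Hypothetical 3-page pattern: two one-way links and one bidirectional link.
H = [[0,   1j, 1j],
     [-1j, 0,  1],
     [-1j, 1,  0]]
h2_ab = build_h2(H, c1=1.0, c2=2.0, order="AB")
h2_ba = build_h2(H, c1=1.0, c2=2.0, order="BA")
```

In this sketch both orders give the same matrix because each entry that started as +i ends up scaled by Z of the mirrored row and divided by W of its own row, whichever process runs first.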
The value Z1 in the special Hermitian adjacency matrix H2 shown in FIG. 16 is the number of components taking the value -i or the value 1 among the components h(p3,1), h(p3,2), ..., h(p3,N) of row p3. The value Z2 is the number of components taking the value -i or the value 1 among the N components h(p4,1), h(p4,2), ..., h(p4,N) of row p4. Row p3 may be understood as the row in which the value {C1(C2-i)Z1/W1} appears in FIG. 16, and row p4 as the row in which the value {C1(C2-i)Z2/W2} appears.
The second scoring unit 143 executes the processing of S180 to S220 using the special Hermitian adjacency matrix H2 calculated in this way in place of the special Hermitian adjacency matrix H1.
For each of the connected document networks shown in FIGS. 5A, 7A, 8A, 9A, and 11, the ranking of web pages based on the second score calculated by applying the third modification is described below in the same notation as in the embodiment described above.
According to the third modification, the web pages in the connected document network of FIG. 5A are ranked in the order "5", "4", "3", "2", "1". The web pages in the connected document network of FIG. 7A are ranked in the order "3", "4", "2", "5", "6", "7", "8", "1".
The web pages in the connected document network of FIG. 8A are ranked in the order "5", "4", "3", "2", "1". The web pages in the connected document network of FIG. 9A are ranked in the order "4", "3", "1", "2". The web pages in the connected document network of FIG. 11 are ranked in the order "4", "6", "7", "2", "1", "3", "5", "8", "9", "10".
The special Hermitian adjacency matrix H2 differs from the special Hermitian adjacency matrix H1 in that the second score is calculated so that it takes a larger value for a web page that links to web pages receiving many links from other web pages. This property in particular characterizes the special Hermitian adjacency matrix H2.
In other respects the special Hermitian adjacency matrix H2 exhibits the same properties as the special Hermitian adjacency matrix H1, except that the property whereby the second score takes a larger value for a web page that links to web pages with high importance scores is specific to the special Hermitian adjacency matrix H1. The third modification can also achieve good scoring of web pages.
[Fourth Modification]
The fourth modification assumes that the connected document network contains a web page with in-degree zero and a web page with out-degree zero. Furthermore, it does not handle connected document networks having a cyclic structure in which multiple paths exist from one start page SP to one end page EP, or connected document networks containing mutual links. Under these assumptions, the second scoring unit 143 is configured to execute the processing shown in FIG. 17 instead of the processing shown in FIG. 4. Much of the processing shown in FIG. 17 is the same as that shown in FIG. 4, so the description of the common parts is abbreviated below.
When the processing shown in FIG. 17 starts, the second scoring unit 143 extracts one or more connected document networks from all the web pages, as in S110 (S310). The second scoring unit 143 then sets a start page SP for each connected document network (S320). When a connected document network contains only one web page with in-degree zero, that web page is set as the start page SP of the network. When a connected document network contains multiple web pages with in-degree zero, the start page SP is set based on path length.
Specifically, the second scoring unit 143 searches, among the paths that follow the link direction from a web page with in-degree zero to a web page with out-degree zero, for the path with the maximum path length. The second scoring unit 143 sets the web page with in-degree zero on that path as the start page SP. When multiple paths have the maximum path length, the second scoring unit 143 can place a dummy page with in-degree zero that connects to the start points of those paths, as shown in FIGS. 10A and 10B, and set this dummy page as the start page SP. More specifically, when multiple paths have the maximum path length and multiple start points of those paths share the minimum (or maximum) out-degree, the second scoring unit 143 can place such a dummy page, as shown in FIGS. 10A and 10B, and set it as the start page SP. When multiple paths have the maximum path length but only one of their start points has the minimum (or maximum) out-degree, the second scoring unit 143 can set that start point as the start page SP.
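The path-length rule of S320 can be sketched in Python as follows. The sketch assumes an acyclic network given as a dictionary from each page to the pages it links to, and it implements only the simpler variant in which any tie between longest paths is resolved with a dummy page; the out-degree tie-break is omitted. The function name `choose_start_page` and the identifier `DUMMY_SP` are hypothetical.

```python
from functools import lru_cache

def choose_start_page(links):
    """Pick the start page SP of an acyclic connected document network.

    links maps each page to the list of pages it links to.
    Returns (start_page, links), where links may have gained a dummy page.
    """
    has_incoming = {q for targets in links.values() for q in targets}
    pages = set(links) | has_incoming
    sources = sorted(p for p in pages if p not in has_incoming)  # in-degree 0

    @lru_cache(maxsize=None)
    def longest(p):
        # Number of links on the longest directed path starting at p.
        # In an acyclic network such a path always ends at an out-degree-0 page.
        return max((1 + longest(q) for q in links.get(p, [])), default=0)

    best = max(longest(s) for s in sources)
    candidates = [s for s in sources if longest(s) == best]
    if len(candidates) == 1:
        return candidates[0], links

    # Tie: add an in-degree-zero dummy page linking to every tied start point.
    dummy = "DUMMY_SP"  # hypothetical identifier
    extended = {p: list(targets) for p, targets in links.items()}
    extended[dummy] = candidates
    return dummy, extended

# "a" starts the longest path (a -> b -> c), so it becomes the start page.
start, _ = choose_start_page({"a": ["b"], "b": ["c"], "d": ["c"]})

# Two equally long paths: a dummy page is placed in front of "a" and "b".
tied_start, tied_links = choose_start_page({"a": ["c"], "b": ["c"]})
```

The recursion depth grows with the path length, so an iterative topological traversal would be preferable for large networks.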
After S320, the second scoring unit 143 sets a virtual document network for each connected document network, as in S130 (S330). That is, it searches the connected document network for web pages with out-degree zero, treats each as an end page EP, and appends a dummy page DP to the end page EP to form the virtual document network.
The second scoring unit 143 then identifies the maximum value max{N} of the number of elements N over the virtual document networks, as in S140 (S340). In the subsequent S350, the second scoring unit 143 selects one of the virtual document networks as the processing target.
In the subsequent S360, the second scoring unit 143 generates the Hermitian adjacency matrix H corresponding to the selected document network, as in S160. The second scoring unit 143 then generates a special Hermitian adjacency matrix H3 by transforming the Hermitian adjacency matrix H (S370). For this transformation, the second scoring unit 143 calculates a correction amount C according to the following equation, based on the maximum value max{N} calculated in S340.
Figure JPOXMLDOC01-appb-M000008
The parameter n in the above equation is determined in advance so that, when the components of the eigenvector of the special Hermitian adjacency matrix H3 are placed on the complex plane as complex vectors, all of them fall within an angular range of π/2 radians. The parameter n may be set to a natural number equal to or greater than the maximum value max{N} identified in S340; for example, n = max{N} or n = 2(max{N}-1). In the fourth modification, the component corresponding to the start page SP lies on the real axis of the complex plane, so in theory all components fit within the fourth quadrant of the complex plane even with n = max{N}-1.
After calculating the correction amount C, the second scoring unit 143 replaces each component of value +i in the Hermitian adjacency matrix H with the value (C+i). It then changes each component of value (C+i) in each row of the matrix after the replacement to the value {(C+i)/R}, obtained by dividing it by the number R of components in the same row that have the value (C+i). Finally, it changes each component of value -i to the complex conjugate {(C-i)/R} of the component located symmetrically across the diagonal.
The second scoring unit 143 generates the Hermitian matrix defined by these replacements and changes as the special Hermitian adjacency matrix H3. The special Hermitian adjacency matrix H3 shown in FIG. 18 is an example that corresponds to the Hermitian adjacency matrix H shown in the upper part of FIG. 13.
In the Hermitian adjacency matrix H, if R1 of the N components h(p1,1), h(p1,2), ..., h(p1,N) of row p1 take the value +i, each component of value +i in row p1 is changed to the value {(C+i)/R1}. Likewise, if R2 of the N components h(p2,1), h(p2,2), ..., h(p2,N) of row p2 take the value +i, each component of value +i in row p2 is changed to the value {(C+i)/R2}.
Furthermore, the value of the component h(q1,p1), located symmetrically across the diagonal from the component h(p1,q1) of value {(C+i)/R1}, is changed to {(C-i)/R1}. Similarly, the value of the component h(q2,p2), located symmetrically across the diagonal from the component h(p2,q2) of value {(C+i)/R2}, is changed to {(C-i)/R2}. The special Hermitian adjacency matrix H3 is generated in this way.
The fourth modification does not handle connected document networks containing bidirectional links. The Hermitian adjacency matrix H therefore contains no component of value 1. As this shows, the number R corresponds to the number W of the embodiment described above.
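The construction of the special Hermitian adjacency matrix H3 in S370 can be sketched in Python as follows. Because the equation for the correction amount C is not reproduced in this text, the sketch takes C as an input; the value 3.0 used below is an arbitrary placeholder, and the function name `build_h3` and the sample matrix are hypothetical. Entries 1j, -1j, and 0 stand for +i, -i, and 0.

```python
def build_h3(H, c):
    """Build H3 from a Hermitian adjacency matrix H with entries 1j, -1j, 0.

    c is the real correction amount C, supplied by the caller.
    """
    n = len(H)
    M = [[0j] * n for _ in range(n)]
    for p in range(n):
        r = sum(1 for x in H[p] if x == 1j)   # R: number of +i entries in row p
        for q in range(n):
            if H[p][q] == 1j:
                M[p][q] = (c + 1j) / r        # (C+i)/R
                M[q][p] = (c - 1j) / r        # complex conjugate across the diagonal
    return M

# Hypothetical pattern: row 0 holds two +i entries, so R = 2 for that row.
H = [[0,   1j, 1j],
     [-1j, 0,  0],
     [-1j, 0,  0]]
h3 = build_h3(H, c=3.0)
```

Every non-zero entry is written together with its mirrored conjugate, so the result is Hermitian by construction.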
In the subsequent S380, the second scoring unit 143 calculates the eigenvalues and eigenvectors of the special Hermitian adjacency matrix H3 generated in S370, as in S180. In the subsequent S390, as in S190, the second scoring unit 143 divides each component V[m] (1≦m≦N) of the eigenvector V corresponding to the eigenvalue of maximum absolute value of the special Hermitian adjacency matrix H3 by the component E corresponding to the start page SP of the selected document network. When the eigenvalues of maximum absolute value include both a positive and a negative eigenvalue, each component V[m] (1≦m≦N) of the eigenvector V corresponding to the positive eigenvalue can be divided by the component E. This division places the component of the eigenvector V corresponding to the start page SP on the real axis of the complex plane.
In S400, the second scoring unit 143 calculates, for each component V[m]/E (1≦m≦N) of the eigenvector V1 after the division, placed on the complex plane as a complex vector, the value Z[m] = L[m]·θ[m] (1≦m≦N) corresponding to its distance in the rotational direction from the component corresponding to the start page SP (the real axis), and uses it as the second score X2 = Z[m] of the corresponding web page D[m] (1≦m≦N). The second score X2 need not be calculated for components corresponding to dummy pages DP. The calculated second scores X2 are stored in the storage unit 20.
As shown in FIG. 19, the value θ[m] in the fourth modification corresponds to the clockwise angle of the component V[m]/E from the real axis. The start page SP is placed on the real axis, so θ[m] can also be described as the clockwise angle measured from the component corresponding to the start page SP. That is, θ[m] corresponds to the angle between the component V[m]/E and the component V[s]/E corresponding to the start page SP. The value L[m] corresponds to the absolute value |V[m]/E| of the component V[m]/E. The white and black circles in FIG. 19 have the meanings described in connection with FIG. 15A.
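Steps S390 and S400 can be sketched in Python as follows, taking the eigenvector as given. The input vector below is made up for illustration and is not the eigenvector of any matrix in this document; the function name `second_scores` and its parameters are hypothetical.

```python
import cmath

def second_scores(V, start_index, dummy_indices=()):
    """Return {m: Z[m]} with Z[m] = L[m] * theta[m] for non-dummy pages.

    V is a list of complex eigenvector components; start_index is the
    position of the start page SP; dummy_indices marks dummy pages DP.
    """
    E = V[start_index]
    scores = {}
    for m, v in enumerate(V):
        if m in dummy_indices:
            continue                  # no score is needed for dummy pages DP
        w = v / E                     # component V[m]/E; the start page maps to 1
        L = abs(w)                    # L[m] = |V[m]/E|
        theta = -cmath.phase(w)       # clockwise angle from the real axis
        scores[m] = L * theta
    return scores

# Made-up components; index 0 is the start page and index 3 a dummy page.
scores = second_scores([1 + 1j, 2 + 0j, 1 - 1j, 5 + 5j], 0, dummy_indices={3})
```

Here the component at index 1 becomes 1-i after the division (L = √2, θ = π/4) and the component at index 2 becomes -i (L = 1, θ = π/2), so the page at index 2 receives the higher score.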
In this way, the second scoring unit 143 calculates the second score based on the special Hermitian adjacency matrix H3 so that a web page receives a higher score the farther its component of the eigenvector V lies from that of the start page SP.
With the special Hermitian adjacency matrix H3, the correction amount C ensures that the components V[m]/E (1≦m≦N) of the eigenvector V1 always lie within a single quadrant of the complex plane, without completing a full rotation around it. In particular, the start page SP is fixed on the real axis at the value 1, and each component V[m]/E of the eigenvector V1 is placed in the fourth quadrant. The fourth modification therefore makes it possible to calculate, easily and appropriately, an importance score (second score) that follows the connection relationships among web pages relative to the start page.
After calculating the second score of each web page in the selected document network in S400, the second scoring unit 143 proceeds to S410 and determines whether the processing of S360 to S400 has been executed, and the second scores of the web pages calculated, for all the virtual document networks.
When the second scoring unit 143 determines that an unprocessed virtual document network remains (No in S410), it returns to S350 and selects one of the unprocessed virtual document networks as the new processing target. It then executes the processing of S360 to S400 for the newly selected document network, and calculates and stores the second score of each web page in it. When the processing of S360 to S400 has been executed for all the virtual document networks, the processing shown in FIG. 17 ends. The fourth modification has the technical advantage of reducing the amount of computation required to calculate the second score.
Exemplary embodiments of the present disclosure, including modifications, have been described above. As noted, the technique of the present disclosure is characterized by its use of a special Hermitian adjacency matrix. It is mathematically proven that the eigenvalues of a Hermitian matrix, including the Hermitian adjacency matrix and the special Hermitian adjacency matrix, are always real.
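The standard argument for real eigenvalues is short and is reproduced here for reference. The symbols λ (an eigenvalue), v (a non-zero eigenvector with Hv = λv), and the superscript * (conjugate transpose) are notation introduced for this note only.

```latex
\begin{aligned}
v^{*} H v &= \lambda\, v^{*} v, \\
\overline{v^{*} H v} &= v^{*} H^{*} v = v^{*} H v
  \quad \text{(since } H^{*} = H \text{)}, \\
\Rightarrow\ \overline{\lambda}\, v^{*} v &= \lambda\, v^{*} v
  \ \Rightarrow\ \overline{\lambda} = \lambda
  \quad \text{(since } v^{*} v > 0 \text{)}.
\end{aligned}
```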
Furthermore, it has been demonstrated in multiple examples that, provided the connection relationships among documents (web pages) are at least weakly connected, the eigenvalues of maximum absolute value of the Hermitian adjacency matrix and the special Hermitian adjacency matrix generally fall into one of three cases: only a positive eigenvalue with multiplicity 1, only a negative eigenvalue with multiplicity 1, or a positive and a negative eigenvalue, each with multiplicity 1.
Therefore, by specifying the simple rule that the positive eigenvalue is adopted when the eigenvalues of maximum absolute value are one positive and one negative, the eigenvector corresponding to the eigenvalue of maximum absolute value can be guaranteed to be unique. When the eigenvalues of maximum absolute value include a positive and a negative eigenvalue, the complex components of the eigenvector for the negative eigenvalue generally coincide with the corresponding components of the eigenvector for the positive eigenvalue once the signs of their real parts and imaginary parts are each either kept or flipped.
Flipping the sign of the real part of a complex number moves it to the position symmetric about the imaginary axis in the complex plane, and flipping the sign of the imaginary part moves it to the position symmetric about the real axis. In other words, when the eigenvalues of maximum absolute value include a positive and a negative eigenvalue, the positions on the complex plane of the components of the eigenvector for the negative eigenvalue correspond to those of the eigenvector for the positive eigenvalue through symmetry about the real axis and symmetry about the imaginary axis. It is therefore also possible to specify the simple rule that the negative eigenvalue is adopted in that case, and to apply an operation that produces the same result as adopting the positive eigenvalue.
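A minimal instance of this correspondence can be checked by hand or with the Python sketch below. It uses a hypothetical two-page network with a single one-way link, encoded with entries +i and -i, whose eigenvalues of maximum absolute value are +1 and -1. The helper `matvec` and the hand-derived eigenvectors are introduced for illustration only, and a single example does not establish the general tendency described above.

```python
def matvec(H, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(h * x for h, x in zip(row, v)) for row in H]

H = [[0,   1j],
     [-1j, 0]]          # Hermitian adjacency matrix of one one-way link

v_pos = [1, -1j]        # eigenvector for eigenvalue +1
v_neg = [1, 1j]         # eigenvector for eigenvalue -1

# Verify H v = lambda v for both eigenpairs.
assert matvec(H, v_pos) == [(+1) * x for x in v_pos]
assert matvec(H, v_neg) == [(-1) * x for x in v_neg]

# Flipping the sign of every imaginary part (reflection about the real axis)
# maps the eigenvector of -1 onto the eigenvector of +1.
flipped = [complex(x).conjugate() for x in v_neg]
assert flipped == [complex(x) for x in v_pos]
```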
According to the present disclosure, the special Hermitian adjacency matrix corresponding to each weakly connected document network is defined using a common term: the number of documents in the document network that has the most documents among all the document networks. The scores assigned to documents can therefore be meaningfully compared and ranked not only within the same document network but also across different document networks.
 Consider a conventional adjacency matrix that represents the connection relationship between documents in binary form with the values 0 and 1. To guarantee mathematically that, among the eigenvalues of maximum absolute value, the real eigenvalue has multiplicity 1, and thereby to ensure that the corresponding eigenvector is unique, one method is to posit virtual connections from every document to every document in addition to the actual connections from each document to the other documents, and to assume that the former (the actual connections) are used with a certain probability and the latter (the virtual connections) with the remaining probability. Comparison and ranking of documents under this method had several drawbacks: changing the former probability changes the rank of each document, so the results are not consistent; the value of the former probability that yields an appropriate comparison or ranking could only be found largely empirically; and the choice of that probability left room for arbitrary ranking. The former probability is called the damping factor, and a typical value is 0.85, but the premise that the actual connections are used only with probability 0.85 is far from reality. Furthermore, regarding the latter probability, the premise that the virtual connections from a given document to each of the other documents are all used with equal probability also diverges greatly from reality.
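 To make the role of the damping factor concrete, the following is a minimal Python sketch of the conventional approach described above: power iteration in which actual links are followed with probability d and uniform virtual links with probability 1 - d. The function name `damped_rank`, the example graph, and the handling of documents without outgoing links are illustrative assumptions and are not taken from the disclosure.

```python
def damped_rank(links, n, d=0.85, iters=200):
    """Power iteration for the conventional damped ranking (illustrative sketch).

    links: set of (p, q) pairs meaning document p links to document q.
    n:     number of documents, indexed 0..n-1.
    d:     damping factor, the probability that an actual link is followed.
    """
    out = [[q for (p, q) in links if p == src] for src in range(n)]
    rank = [1.0 / n] * n
    for _ in range(iters):
        # Virtual links: every document receives the same share (1 - d) / n.
        nxt = [(1.0 - d) / n] * n
        for src in range(n):
            if out[src]:
                share = d * rank[src] / len(out[src])
                for q in out[src]:
                    nxt[q] += share
            else:
                # Assumption: a document without out-links spreads its weight uniformly.
                for q in range(n):
                    nxt[q] += d * rank[src] / n
        rank = nxt
    return rank


links = {(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)}
for d in (0.5, 0.85, 0.99):
    print(d, [round(r, 3) for r in damped_rank(links, 4, d)])
```

 Running the loop with different values of d shows the scores shifting with the chosen damping factor, which is the dependence the passage above criticizes.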
 The special Hermitian adjacency matrix of the present disclosure does not require positing virtual connections from all documents to all documents. Moreover, for scoring and ranking each document based on the connection relationships between documents, the special Hermitian adjacency matrix of the present disclosure gives results that largely follow the same tendency as the conventional adjacency-matrix method with a damping factor of 0.85.
 Furthermore, according to the present disclosure, when a different result is desired, there is no need to vary the damping factor and search empirically for the required value. Instead, the required number of out-degree dummies or in-degree dummies is added to the documents whose scores need to change, which achieves scoring and ranking that is appropriate in the sense of being more systematic, methodical, and theoretically grounded.
 Furthermore, for a document whose score needs to change, scoring and ranking that is appropriate in the same sense can also be achieved by adding a plurality of dummy documents, together with a plurality of connections between those dummy documents, that form a path whose length (that is, the number of consecutive arrows followed in their direction) corresponds to the required score change, such that the path either departs from or arrives at the document whose score needs to change.
 In addition, with the special Hermitian adjacency matrix of the present disclosure, more appropriate scoring and ranking in the same sense can also be achieved by changing, according to the required change and while preserving the Hermitian property of the matrix, the two components that represent the connection relationship between the document whose score needs to change and one other document.
 The special Hermitian adjacency matrix H2 is a systematic, methodical, and theoretical development of the special Hermitian adjacency matrix H1. The advantages of the special Hermitian adjacency matrix H1 described above also hold for the special Hermitian adjacency matrix H2. In this sense, the present disclosure provides a meaningful technical improvement over the prior art.
 It goes without saying that the present disclosure is not limited to the above embodiment and modifications and can take various forms. For example, in S190 and S390, the second scoring unit 143 divides each component V[m] of the eigenvector V by the component V[s] of the start page SP, thereby placing the start page SP at the point of value 1 on the real axis of the complex plane.
 However, in S190 and S390, each component V[m] of the eigenvector V may instead be divided by -V[s], by +V[s]·i, by -V[s]·i, or by a predetermined (real) multiple of any of these. For example, when each component V[m] of the eigenvector V is divided by -V[s], the start page SP is placed at the point -1 on the real axis of the complex plane, and the components of the divided eigenvector V1 are placed in the second quadrant of the complex plane. The second score can be calculated appropriately in this case as well.
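 The effect of the alternative divisors can be checked with a short Python sketch. It divides a small example vector by V[s], -V[s], V[s]·i, or -V[s]·i and prints where the start component lands. The function name `normalize_by_start`, the divisor keys, and the example vector are hypothetical and chosen only for illustration; the vector is not an eigenvector computed from any matrix in the disclosure.

```python
import cmath


def normalize_by_start(v, s, divisor="+V"):
    """Divide every component of v by a divisor derived from the start component v[s]."""
    base = {
        "+V": v[s],
        "-V": -v[s],
        "+Vi": v[s] * 1j,
        "-Vi": -v[s] * 1j,
    }[divisor]
    return [x / base for x in v]


# Hypothetical example components given in polar form (length, angle in radians).
v = [cmath.rect(0.5, 1.0), cmath.rect(0.8, 0.6), cmath.rect(0.3, 0.2)]
s = 0  # index of the start page SP in this example

for key in ("+V", "-V", "+Vi", "-Vi"):
    start = normalize_by_start(v, s, key)[s]
    print(key, "start component ->", complex(round(start.real, 6), round(start.imag, 6)))
```

 In this sketch the start component lands at 1, -1, -i, and +i respectively, while the angle differences and length ratios between components stay the same, which is why each choice still allows a score to be computed from the relative positions.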
 In addition, in S124 shown in FIG. 12, the second scoring unit 143 may search, among the plurality of paths that qualify as the longest path, for one or more paths that qualify as the specific path whose out-degree from the start point is largest. The present disclosure is not limited to web documents and may be applied to scoring any documents that have link or citation relationships.
 A function of one component in the above embodiment may be distributed among a plurality of components. Functions of a plurality of components may be integrated into one component. Part of the configuration of the above embodiment may be omitted. Every aspect included in the technical idea identified by the wording of the claims is an embodiment of the present disclosure.
 Finally, the correspondence between terms is described. The processing of S110 and S310 executed by the second scoring unit 143 corresponds to an example of the processing realized by the specifying unit. The processing of S120 and S320 corresponds to an example of the processing realized by the determination unit. The processing of S130 and S330 corresponds to an example of the processing realized by the setting unit. The processing of S160, S170, S360, and S370 corresponds to an example of the processing realized by the definition unit. The processing of S180 and S380 corresponds to an example of the processing realized by the eigenvector calculation unit. The processing of S190 to S220 and S390 to S400 corresponds to an example of the processing realized by the score calculation unit. The processing executed by the ranking unit 145 and the output unit 147 corresponds to an example of the processing realized by the arrangement determination unit.

Claims (18)

  1.  An information processing system for scoring a plurality of documents, comprising:
     a specifying unit that specifies, based on data representing connection relationships among the plurality of documents, one or more document networks each composed of a group of documents that are at least weakly connected;
     a determination unit that determines, for each of the one or more document networks, a start document, which is the document located at the start point of the corresponding document network;
     a setting unit that sets, for each of the one or more document networks, a virtual document network including dummy documents by connecting, in the corresponding document network, a virtual dummy document with in-degree 1 and out-degree 0 to each document whose out-degree is zero;
     a definition unit that defines a special Hermitian adjacency matrix for each of the one or more virtual document networks corresponding to the one or more document networks, wherein the special Hermitian adjacency matrix is a modification of an N-row, N-column Hermitian adjacency matrix based on the connection relationships among the documents D[m] (m an integer, 1≦m≦N) constituting the corresponding virtual document network, and the Hermitian adjacency matrix is a Hermitian matrix with zero diagonal components whose component h(p,q) in row p and column q takes the value 1 when a link from document D[p] to document D[q] exists and a link from document D[q] to document D[p] exists, the value 0 when neither a link from document D[p] to document D[q] nor a link from document D[q] to document D[p] exists, the value +i (i being the imaginary unit) when a link from document D[p] to document D[q] exists but no link from document D[q] to document D[p] exists, and the value -i when no link from document D[p] to document D[q] exists but a link from document D[q] to document D[p] exists;
     an eigenvector calculation unit that calculates an eigenvector corresponding to the eigenvalue of maximum absolute value of the special Hermitian adjacency matrix, the eigenvector having a component corresponding to each of the documents D[m] (1≦m≦N); and
     a score calculation unit that calculates, for each of the one or more virtual document networks, a score of each of the documents D[m] (1≦m≦N) based on the positional relationship on the complex plane among the documents D[m] (1≦m≦N) when each component of the eigenvector is placed on the complex plane with reference to the component corresponding to the start document,
     wherein the definition unit defines the special Hermitian adjacency matrix by modifying the Hermitian adjacency matrix such that, when each component of the eigenvector is placed on the complex plane, all of the components fall within an angular range of π/2 radians in the complex plane.
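     The component rule for the (unmodified) Hermitian adjacency matrix in claim 1 can be sketched in Python as follows. The function name `hermitian_adjacency`, the 0-based indexing, and the three-document example are illustrative assumptions; the sketch covers only the matrix before it is modified into the special Hermitian adjacency matrix.

```python
def hermitian_adjacency(links, n):
    """Build the N x N Hermitian adjacency matrix from directed links.

    links: set of (p, q) pairs meaning a link from document p to document q (0-based).
    Returns a list of rows holding Python complex numbers.
    """
    h = [[0j] * n for _ in range(n)]
    for p in range(n):
        for q in range(n):
            if p == q:
                continue                  # diagonal components stay zero
            forward = (p, q) in links
            backward = (q, p) in links
            if forward and backward:
                h[p][q] = 1 + 0j          # links in both directions
            elif forward:
                h[p][q] = 1j              # only p -> q
            elif backward:
                h[p][q] = -1j             # only q -> p
    return h


# Example: D0 -> D1, and a mutual link between D1 and D2.
h = hermitian_adjacency({(0, 1), (1, 2), (2, 1)}, 3)
for row in h:
    print(row)
```

     Because h(q,p) is always the complex conjugate of h(p,q), the matrix is Hermitian and its eigenvalues are real.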
  2.  The information processing system according to claim 1, wherein
     the special Hermitian adjacency matrix corresponds to a Hermitian matrix defined by, based on a first correction amount C1 and a second correction amount C2: replacing each component of the Hermitian adjacency matrix having the value +i with the value C1(C2+i) and each component having the value -i with the value C1(C2-i); changing the value C1(C2+i) of each component in each row of the Hermitian adjacency matrix after the replacement to the value {C1(C2+i)/W} obtained by dividing by the number W of components in the same row having the value C1(C2+i) or the value 1; and further changing the value C1(C2-i) of the component located symmetrically across the diagonal from each component changed to {C1(C2+i)/W} to the complex conjugate {C1(C2-i)/W} of the value {C1(C2+i)/W},
     the first correction amount C1 and the second correction amount C2 are determined using a parameter n as follows,
     Figure JPOXMLDOC01-appb-M000001
     and the parameter n is determined, based on the number of documents max{N} in the virtual document network having the largest number of documents N among the one or more virtual document networks, such that all of the components fall within the angular range of π/2 radians in the complex plane.
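     The replacement and row-normalization steps of claim 2 can be sketched in Python as below. The formulas for C1 and C2 are given only in the equation image referenced in the claim and are not reproduced in this text, so the sketch takes `c1` and `c2` as plain inputs; the numeric values used in the example are arbitrary placeholders, not values from the disclosure, and the sketch assumes both are real. The function name `special_hermitian_h1` is hypothetical.

```python
def special_hermitian_h1(h, c1, c2):
    """Apply the claim 2 steps to a Hermitian adjacency matrix h (list of rows).

    c1, c2: correction amounts, assumed real here; their defining formulas are
            not available in this text and must be supplied by the caller.
    """
    n = len(h)
    h1 = [row[:] for row in h]
    for p in range(n):
        # W: number of components in row p equal to +i (one-way link) or 1 (mutual link).
        w = sum(1 for q in range(n) if h[p][q] in (1j, 1))
        for q in range(n):
            if h[p][q] == 1j:
                value = c1 * (c2 + 1j) / w
                h1[p][q] = value                # C1(C2+i)/W
                h1[q][p] = value.conjugate()    # C1(C2-i)/W at the mirrored position
    return h1


# Example: D0 -> D1, D0 -> D2, and a mutual link between D1 and D2.
h = [[0j, 1j, 1j],
     [-1j, 0j, 1 + 0j],
     [-1j, 1 + 0j, 0j]]
h1 = special_hermitian_h1(h, c1=0.5, c2=2.0)   # placeholder correction amounts
for row in h1:
    print(row)
```

     In this sketch components with the value 1 are left unchanged, since the claim describes changes only to the components derived from +i and -i.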
  3.  The information processing system according to claim 1, wherein
     the special Hermitian adjacency matrix corresponds to a Hermitian matrix defined by, based on a first correction amount C1 and a second correction amount C2: replacing each component of the Hermitian adjacency matrix having the value +i with the value C1(C2+i) and each component having the value -i with the value C1(C2-i); changing the value C1(C2+i) of each component in each row of the Hermitian adjacency matrix after the replacement to the value {C1(C2+i)/W} obtained by dividing by the number W of components in the same row having the value C1(C2+i) or the value 1; changing the value C1(C2-i) of the component located symmetrically across the diagonal from each component changed to the value {C1(C2+i)/W} to the complex conjugate {C1(C2-i)/W} of the value {C1(C2+i)/W}; changing the value {C1(C2-i)/W} of each component in each row of the Hermitian adjacency matrix H after these changes to the value {C1(C2-i)Z/W} obtained by multiplying by the number Z of components that had the value C1(C2-i) or the value 1 in the same row before the changes; and further changing the value {C1(C2+i)/W} of the component located symmetrically across the diagonal from each component changed to the value {C1(C2-i)Z/W} to the complex conjugate {C1(C2+i)Z/W} of the value {C1(C2-i)Z/W},
     the first correction amount C1 and the second correction amount C2 are determined using a parameter n as follows,
     Figure JPOXMLDOC01-appb-M000002
     and the parameter n is determined, based on the number of documents max{N} in the virtual document network having the largest number of documents N among the one or more virtual document networks, such that all of the components fall within the angular range of π/2 radians in the complex plane.
  4.  The information processing system according to claim 1, wherein
     the special Hermitian adjacency matrix corresponds to a Hermitian matrix defined by, based on a correction amount C: replacing each component of the Hermitian adjacency matrix having the value +i with the value C+i; changing the value of each component having the value C+i in each row of the Hermitian adjacency matrix after the replacement to the value {(C+i)/R} obtained by dividing by the number R of components in the same row having the value C+i; and further changing the value of each component having the value -i to the complex conjugate {(C-i)/R} of the component located symmetrically across the diagonal from it,
     the correction amount C is determined using a parameter n as follows,
     Figure JPOXMLDOC01-appb-M000003
     and the parameter n is determined, based on the number of documents max{N} in the virtual document network having the largest number of documents N among the one or more virtual document networks, such that all of the components fall within the angular range of π/2 radians in the complex plane.
  5.  The information processing system according to any one of claims 1 to 4, wherein
     the score calculation unit, for each of the one or more virtual document networks, applies a rotation transformation to the values V[m] (1≦m≦N) of the components of the eigenvector such that the component corresponding to the start document is placed at a specific position in the complex plane, and calculates the score of each of the documents D[m] (1≦m≦N) based on the rotation-transformed value Vc[m] of the corresponding document D[m].
  6.  The information processing system according to claim 5, wherein
     the score calculation unit applies the rotation transformation to the values V[m] (1≦m≦N) of the components of the eigenvector such that the component corresponding to the start document is placed, as the specific position, at a position at a predetermined nonzero angle from a specific axis located at the boundary of the one quadrant in the complex plane.
  7.  The information processing system according to claim 5 or claim 6, wherein
     the score calculation unit calculates, as the score of each of the documents D[m] (1≦m≦N), based on the rotation-transformed value Vc[m] of the corresponding document D[m], a value corresponding to the product (θ[m]·L[m]) of the angle θ[m] of the value Vc[m] from a specific axis in the complex plane and the absolute value L[m] of the value Vc[m].
  8.  The information processing system according to any one of claims 1 to 4, wherein
     the score calculation unit, for each of the one or more virtual document networks, divides the values V[m] (1≦m≦N) of the components of the eigenvector by a specific value E based on the value V[s] of the component corresponding to the start document, thereby placing the component corresponding to the start document on a specific axis of the complex plane, then applies a rotation transformation to the values V[m]/E (1≦m≦N) of the components of the eigenvector such that the component corresponding to the start document is rotated to a position at a predetermined nonzero angle from the specific axis, and calculates, as the score of each of the documents D[m] (1≦m≦N), based on the rotation-transformed value Vc[m] of the corresponding document D[m], a value corresponding to the product (θ[m]·L[m]) of the angle θ[m] of the value Vc[m] from the specific axis in the complex plane and the absolute value L[m] of the value Vc[m].
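     A minimal Python sketch of the scoring in claim 8 follows. It assumes that the specific value E is V[s] itself, that the specific axis is the positive real axis, and that the rotation is counterclockwise; these choices, the function name `angle_length_scores`, and the example vector are illustrative assumptions and not the only options the claim permits.

```python
import cmath


def angle_length_scores(v, s, delta):
    """Score each component as angle * length after dividing by v[s] and rotating by delta.

    v:     eigenvector components (complex numbers).
    s:     index of the start document.
    delta: predetermined nonzero rotation angle in radians.
    """
    e = v[s]                                   # assumption: specific value E equals V[s]
    rotation = cmath.rect(1.0, delta)          # unit complex number with angle delta
    rotated = [x / e * rotation for x in v]    # Vc[m]
    # theta[m] is measured from the positive real axis; L[m] is the absolute value.
    return [cmath.phase(vc) * abs(vc) for vc in rotated]


# Hypothetical two-component example: the second component leads the start
# component by 0.5 rad and is twice as long.
v = [1 + 1j, (1 + 1j) * cmath.rect(2.0, 0.5)]
print(angle_length_scores(v, s=0, delta=0.1))
```

     Rotating by a nonzero angle keeps the start document from receiving a score of exactly zero merely because its angle from the axis is zero.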
  9.  The information processing system according to claim 8, wherein
     when, upon dividing the values V[m] (1≦m≦N) of the components of the eigenvector by the specific value E, there is a component located upstream of the specific axis in the rotation direction of the rotation transformation in the complex plane, the score calculation unit applies the rotation transformation to the values V[m]/E (1≦m≦N) of the components of the eigenvector such that the most upstream component is rotated to the position at the predetermined nonzero angle from the specific axis.
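     The upstream check of claim 9 can be illustrated with the Python sketch below. It assumes a counterclockwise rotation about the origin with the positive real axis as the specific axis, so that "upstream" means a negative angle; this reading, the tolerance `eps`, and the function name `rotate_with_upstream_check` are assumptions made for illustration.

```python
import cmath


def rotate_with_upstream_check(v, s, delta, eps=1e-12):
    """Divide by v[s], then rotate so that no component ends up below angle delta.

    If some component has a negative angle after the division (it lies upstream
    of the real axis), the most upstream component is rotated to angle delta.
    Otherwise the start component (angle 0) is rotated to angle delta.
    """
    divided = [x / v[s] for x in v]
    lowest = min(cmath.phase(x) for x in divided)
    shift = delta - lowest if lowest < -eps else delta
    return [x * cmath.rect(1.0, shift) for x in divided]


# Hypothetical example: the second component lies 0.3 rad upstream of the start component.
v = [1 + 0j, cmath.rect(1.0, -0.3), cmath.rect(2.0, 0.4)]
for vc in rotate_with_upstream_check(v, s=0, delta=0.1):
    print(round(cmath.phase(vc), 6), round(abs(vc), 6))
```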
  10.  The information processing system according to any one of claims 7 to 9, wherein
     the score of each of the documents D[m] (1≦m≦N) is calculated as a value obtained by applying a subtraction to the product (θ[m]·L[m]) such that the minimum of all the scores of the documents D[m] (1≦m≦N) becomes zero.
  11.  The information processing system according to any one of claims 1 to 4, wherein
     the score calculation unit, for each of the one or more virtual document networks, divides the values V[m] (1≦m≦N) of the components of the eigenvector by a specific value E based on the value V[s] of the component corresponding to the start document, thereby placing the component corresponding to the start document on a specific axis of the complex plane, and calculates, as the score of each of the documents D[m] (1≦m≦N), a value corresponding to the product (θ[m]·L[m]) of the angle θ[m], from the specific axis, of the value V[m]/E of the component of the corresponding document D[m] and the absolute value L[m] of the value V[m]/E.
  12.  The information processing system according to any one of claims 8, 9, and 11, wherein
     the score calculation unit uses, as the specific value E, the value V[s] of the component corresponding to the start document, and divides the value V[m] of each component of the eigenvector by the value V[s], thereby placing the component corresponding to the start document on the real axis of the complex plane.
  13.  The information processing system according to any one of claims 1 to 12, wherein
     the determination unit determines, as the start document, the document located at the start point of the longest path in the corresponding document network.
  14.  The information processing system according to claim 13, wherein
     when a plurality of paths qualify as the longest path in the corresponding document network, the determination unit determines, as the start document, one of the document with the largest out-degree and the document with the smallest out-degree among the documents located at the start points of the plurality of paths.
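     The start-document selection of claims 13 and 14 can be sketched in Python as follows. The sketch finds the longest simple path by exhaustive depth-first search, which is practical only for small networks; the disclosure does not specify a search algorithm in this passage, so the search strategy, the function name `choose_start_document`, and the `prefer` flag are illustrative assumptions.

```python
def choose_start_document(links, nodes, prefer="max"):
    """Pick the start document: the start of a longest path, ties broken by out-degree.

    links:  iterable of (p, q) pairs meaning a link from p to q.
    nodes:  iterable of document identifiers.
    prefer: "max" picks the candidate with the largest out-degree, "min" the smallest.
    """
    nodes = list(nodes)
    succ = {n: [] for n in nodes}
    for p, q in links:
        succ[p].append(q)

    def longest_from(node, visited):
        # Length (number of arrows) of the longest simple path starting at node.
        best = 0
        for nxt in succ[node]:
            if nxt not in visited:
                best = max(best, 1 + longest_from(nxt, visited | {nxt}))
        return best

    length = {n: longest_from(n, {n}) for n in nodes}
    top = max(length.values())
    candidates = sorted(n for n in nodes if length[n] == top)
    pick = max if prefer == "max" else min
    return pick(candidates, key=lambda n: len(succ[n]))


links = [("a", "b"), ("b", "c"), ("d", "b"), ("d", "e")]
nodes = ["a", "b", "c", "d", "e"]
print(choose_start_document(links, nodes, "max"))
print(choose_start_document(links, nodes, "min"))
```

     In the example both "a" and "d" start a path of length 2, so the out-degree decides between them.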
  15.  The information processing system according to claim 13, wherein
     when a plurality of paths qualify as the longest path in the corresponding document network, the determination unit places a virtual dummy document with in-degree zero that is connected to the documents located at the start points of the plurality of paths, and determines the placed virtual dummy document as the start document.
  16.  The information processing system according to any one of claims 1 to 15, further comprising
     an arrangement determination unit that determines, based on the scores of the plurality of documents calculated by the score calculation unit, the arrangement of documents in a list of documents corresponding to a search query.
  17.  An information processing method for scoring a plurality of documents, the method comprising, by a computer:
     specifying, based on data representing connection relationships among the plurality of documents, one or more document networks each composed of a group of documents that are at least weakly connected;
     determining, for each of the specified one or more document networks, a start document, which is the document located at the start point of the corresponding document network;
     setting, for each of the one or more document networks, a virtual document network including dummy documents by connecting, in the corresponding document network, a virtual dummy document with in-degree 1 and out-degree 0 to each document whose out-degree is zero;
     defining a special Hermitian adjacency matrix for each of the one or more virtual document networks corresponding to the one or more document networks, wherein the special Hermitian adjacency matrix is a modification of an N-row, N-column Hermitian adjacency matrix based on the connection relationships among the documents D[m] (m an integer, 1≦m≦N) constituting the corresponding virtual document network, and the Hermitian adjacency matrix is a Hermitian matrix with zero diagonal components whose component h(p,q) in row p and column q takes the value 1 when a link from document D[p] to document D[q] exists and a link from document D[q] to document D[p] exists, the value 0 when neither a link from document D[p] to document D[q] nor a link from document D[q] to document D[p] exists, the value +i (i being the imaginary unit) when a link from document D[p] to document D[q] exists but no link from document D[q] to document D[p] exists, and the value -i when no link from document D[p] to document D[q] exists but a link from document D[q] to document D[p] exists;
     calculating an eigenvector corresponding to the eigenvalue of maximum absolute value of the special Hermitian adjacency matrix, the eigenvector having a component corresponding to each of the documents D[m] (1≦m≦N); and
     calculating, for each of the one or more virtual document networks, a score of each of the documents D[m] (1≦m≦N) based on the positional relationship on the complex plane among the documents D[m] (1≦m≦N) when each component of the eigenvector is placed on the complex plane with reference to the component corresponding to the start document,
     wherein the special Hermitian adjacency matrix is defined by modifying the Hermitian adjacency matrix such that, when each component of the eigenvector is placed on the complex plane, all of the components fall within an angular range of π/2 radians in the complex plane.
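     The first and third steps of the method, specifying weakly connected document networks and attaching dummy documents to documents with out-degree zero, can be sketched in Python as below. The union-find grouping, the `dummy:` naming scheme, and the function names `weak_components` and `add_sink_dummies` are illustrative assumptions, not details taken from the disclosure.

```python
def weak_components(links, nodes):
    """Group documents into weakly connected networks (link direction is ignored)."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for p, q in links:
        parent[find(p)] = find(q)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())


def add_sink_dummies(links, members):
    """Attach one dummy document (in-degree 1, out-degree 0) to each document without out-links."""
    inner = {(p, q) for (p, q) in links if p in members}
    has_out = {p for (p, _) in inner}
    dummy_links = {(m, "dummy:" + m) for m in members if m not in has_out}
    virtual_docs = set(members) | {d for (_, d) in dummy_links}
    return virtual_docs, inner | dummy_links


links = {("a", "b"), ("c", "b"), ("x", "y")}
nodes = ["a", "b", "c", "x", "y"]
for members in weak_components(links, nodes):
    docs, vlinks = add_sink_dummies(links, members)
    print(sorted(docs), sorted(vlinks))
```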
  18.  A computer program for causing a computer to execute the information processing method according to claim 17.
PCT/JP2018/026560 2017-11-28 2018-07-13 Information processing system, information processing method, and computer program WO2019106878A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2018559909A JP6502592B1 (en) 2017-11-28 2018-07-13 INFORMATION PROCESSING SYSTEM, INFORMATION PROCESSING METHOD, AND COMPUTER PROGRAM

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2017-227936 2017-11-28
JP2017227936 2017-11-28

Publications (1)

Publication Number Publication Date
WO2019106878A1 true WO2019106878A1 (en) 2019-06-06

Family

ID=66664439

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2018/026560 WO2019106878A1 (en) 2017-11-28 2018-07-13 Information processing system, information processing method, and computer program

Country Status (1)

Country Link
WO (1) WO2019106878A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112734039A (en) * 2021-03-31 2021-04-30 杭州海康威视数字技术股份有限公司 Virtual confrontation training method, device and equipment for deep neural network
WO2021124933A1 (en) * 2019-12-20 2021-06-24 桂太 杉原 Information processing system and information processing method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005275794A (en) * 2004-03-24 2005-10-06 Ntt Data Corp Inter-information relevancy analyzing device and method, program and recording medium
JP2014241034A (en) * 2013-06-11 2014-12-25 インターナショナル・ビジネス・マシーンズ・コーポレーションInternational Business Machines Corporation Device, method and program to retrieve sentence

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021124933A1 (en) * 2019-12-20 2021-06-24 桂太 杉原 Information processing system and information processing method
JP6919961B1 (en) * 2019-12-20 2021-08-18 桂太 杉原 Information processing system and information processing method
CN112734039A (en) * 2021-03-31 2021-04-30 杭州海康威视数字技术股份有限公司 Virtual confrontation training method, device and equipment for deep neural network

Similar Documents

Publication Publication Date Title
CN108804641B (en) Text similarity calculation method, device, equipment and storage medium
Vluymans et al. Learning from imbalanced data
Yu et al. On service community learning: A co-clustering approach
JP5616444B2 (en) Method and system for document indexing and data querying
CN112989842A (en) Construction method of universal embedded framework of multi-semantic heterogeneous graph
US20070011110A1 (en) Building support vector machines with reduced classifier complexity
Zheng et al. Deviation-based contextual SLIM recommenders
US11500884B2 (en) Search and ranking of records across different databases
CN114186084B (en) Online multi-mode Hash retrieval method, system, storage medium and equipment
WO2019106878A1 (en) Information processing system, information processing method, and computer program
CN106528648A (en) Distributed keyword approximate search method for RDF in combination with Redis memory database
Peng et al. Hierarchical visual-textual knowledge distillation for life-long correlation learning
CN112069399A (en) Personalized search system based on interactive matching
Chios et al. Helping results assessment by adding explainable elements to the deep relevance matching model
CN110516078A (en) Alignment schemes and device
JP6502592B1 (en) INFORMATION PROCESSING SYSTEM, INFORMATION PROCESSING METHOD, AND COMPUTER PROGRAM
Haj Mohamed et al. Local commute-time guided MDS for 3D non-rigid object retrieval
Mccarthy et al. Generating diverse compound critiques
JP5507620B2 (en) Synonym estimation device, synonym estimation method, and synonym estimation program
CN115238170A (en) User portrait processing method and system based on block chain finance
KR102205061B1 (en) Method and apparatus of metadata recommendation service
JP2021096848A (en) Deep metric learning method and system
WO2021124933A1 (en) Information processing system and information processing method
JP5202569B2 (en) Machine learning method and machine learning system
Liao et al. Drug3d-dti: improved drug-target interaction prediction by incorporating spatial information of small molecules

Legal Events

Date Code Title Description
ENP Entry into the national phase
Ref document number: 2018559909; Country of ref document: JP; Kind code of ref document: A

NENP Non-entry into the national phase
Ref country code: DE

122 EP: PCT application non-entry in European phase
Ref document number: 18884021; Country of ref document: EP; Kind code of ref document: A1