WO2022259303A1 - Name data association device, name data association method, and name data association program - Google Patents

Name data association device, name data association method, and name data association program

Info

Publication number
WO2022259303A1
Authority
WO
WIPO (PCT)
Prior art keywords
path
name data
database
data
building
Prior art date
Application number
PCT/JP2021/021548
Other languages
English (en)
Japanese (ja)
Inventor
まな美 小川
正崇 佐藤
Original Assignee
日本電信電話株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電信電話株式会社
Priority to JP2023527147A (JPWO2022259303A1)
Priority to PCT/JP2021/021548 (WO2022259303A1)
Publication of WO2022259303A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21: Design, administration or maintenance of databases
    • G06F16/215: Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors

Definitions

  • Embodiments of the present invention relate to a name data association device, a name data association method, and a name data association program.
  • Non-Patent Document 1 and Non-Patent Document 2 propose a method of finding the most similar character string by quantitatively calculating the degree of similarity between the character strings being searched.
  • Non-Patent Document 3 proposes a method of accurately and efficiently searching for character strings representing the same matter by creating a search dictionary.
  • Patent Literature 1 discloses a linking method using peripheral information of data for which name identification is desired.
  • The methods of Non-Patent Documents 1 and 2 are popular and effective when abbreviated notation is the only notational variation present.
  • However, each common name is associated with a name that is similar to it as a character string, so there is a high possibility of presenting an erroneous result. This is because, in many cases, the notation of a common name differs significantly from the name to which it should be linked.
  • The methods of Non-Patent Documents 1 and 2 were developed on the assumption that they would be used for Japanese, so the scope of application of the technology is limited. Since the characteristics of abbreviations in Japanese do not all match those of other languages, there is no guarantee that the methods disclosed in Non-Patent Documents 1 and 2 can be applied without problems to name data entered in other languages.
  • Non-Patent Document 3 is the optimal method for common name notation.
  • However, the dictionary needs to be expanded accordingly, so it has the drawback that coping with notational variations takes time.
  • Japanese Patent Application Laid-Open No. 2002-200010 proposes a technique that makes it possible to identify common names by using information around data to be identified (data A and data B in the same database are related, etc.) without relying on a dictionary.
  • However, the technology disclosed in Patent Document 1 requires that the graphs constructed from the name data of each database satisfy a kind of inclusion relationship (for every edge of one graph, a corresponding edge always exists in the other graph). For this reason, it is difficult to consolidate name data that yields graphs in which the inclusion relationship does not hold, and even where it is possible, a large number of candidate names are generated.
  • This invention seeks to provide a technology that can accurately associate synonymous name data that has notational variations between databases to be integrated without requiring human intervention.
  • a name data association device according to one aspect associates name data between a first database, which holds a plurality of name data and adjacency information indicating a logical or physical adjacency relationship between the name data, and a second database, which holds a plurality of name data, adjacency information of the name data, and path identification information representing the paths to which the name data belong.
  • the device comprises a common data extraction unit, a path creation unit, and an association unit. The common data extraction unit extracts name data having the same notation in the first database and the second database as common data.
  • the path creation unit extracts, from the paths represented by the path identification information held by the second database, partial paths that have common data extracted by the common data extraction unit as endpoints and non-common data as the vertices between the endpoints. Based on the information held by the first database, it then creates, for each partial path, paths whose endpoints are the same common data as the endpoints of the partial path and whose length is equal to or greater than the length of the partial path.
  • the associating unit searches for combinations of vertices on the partial path and vertices on the paths created by the path creation unit, and thereby associates the name data held by the first database with the name data held by the second database.
  • FIG. 1 is a block diagram showing an example of the configuration of a name data association device according to one embodiment of the present invention.
  • FIG. 2 is a diagram showing an example of the hardware configuration of the name data association device.
  • FIG. 3 is a diagram showing an example of information held by a basic database stored in a basic database storage unit.
  • FIG. 4 is a diagram showing an example of information held by a derived database stored in a derived database storage unit.
  • FIG. 5 is a flow chart showing an example of processing operations related to association of name data in the name data association device.
  • FIG. 6 is a schematic diagram for explaining a method of associating names.
  • FIG. 7 is a diagram showing an example of information held by a basic database in an operation example.
  • FIG. 8 is a diagram showing an example of information held by the derivative database in the operation example.
  • FIG. 9 is a schematic diagram showing an example of a cycle graph created from information held in a basic database by the graph creating unit in the operation example.
  • FIG. 10 is a schematic diagram showing an example of a path generated from a cycle graph created from information held in a derived database by the graph creating unit in the operation example.
  • FIG. 11 is a diagram illustrating an example of output information stored in an output information storage unit in an operation example.
  • each data column can include name data and string-specific data corresponding to the name data, such as measurement value, date and time of measurement, date and time of sales, and amount of sales.
  • each database holds logical or physical adjacency information indicating the adjacency relationship of name data.
  • the adjacency information indicating the adjacency relationship of the name data refers to information on how data items are connected to each other, such as personal connections (person A and person B are acquaintances) or connections on a network (building A and building B are connected by a cable).
  • each database has columns named "upper building" and "lower building", which indicates that the name data stored in "upper building" and the name data stored in "lower building" are adjacent.
  • at least one of the plurality of databases is given, in addition to the adjacency information, path identification information indicating the path to which the name data belongs.
  • FIG. 1 is a block diagram showing an example of the configuration of a name data association device according to one embodiment of the present invention.
  • the name data association device includes a basic database (in the figure, database is abbreviated as DB) 1, a derivative database 2, a graph creation unit 3, a common data extraction unit 4, a path information extraction unit 5, a path creation unit 6, an association unit 7, and a data output unit 8.
  • the basic database 1 is a first database that holds a plurality of name data and adjacency information indicating the adjacency relationship between the name data.
  • the derivative database 2 is a second database that holds a plurality of name data, adjacency information of the name data, and path identification information representing paths to which the name data belong.
  • based on the information held by the basic database 1 and the derived database 2, the graph creation unit 3 creates an undirected graph with name data as vertices.
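A minimal sketch of how such an undirected graph might be built from adjacency rows. The function name `build_undirected_graph` and the row layout (one `(upper building, lower building)` pair per row) are assumptions made for illustration; the sample rows simply reuse the building names from the path example in this description.

```python
from collections import defaultdict


def build_undirected_graph(adjacency_rows):
    """Return a mapping vertex -> set of adjacent vertices.

    Each row is a hypothetical (upper_building, lower_building) pair;
    the edge is stored in both directions because the graph is undirected.
    """
    graph = defaultdict(set)
    for upper, lower in adjacency_rows:
        graph[upper].add(lower)
        graph[lower].add(upper)
    return dict(graph)


# Illustrative rows only (assumed layout of the adjacency information).
rows = [
    ("Shinjuku Building", "Minami-Shinjuku Building"),
    ("Minami-Shinjuku Building", "Gaien Building"),
    ("Gaien Building", "Yotsuya Building"),
    ("Yotsuya Building", "Shinjuku Building"),
]
graph = build_undirected_graph(rows)
```

An adjacency-set representation keeps neighbour lookups cheap, which matters once paths have to be enumerated on the graph.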
  • the common data extraction unit 4 extracts name data that is written in the same way between the basic database 1 and the derivative database 2 as common data.
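Extracting name data that is written identically in both databases amounts to a set intersection. The sketch below assumes the names are available as plain lists; `extract_common_data` is a hypothetical helper name, and the two lists are the vertex sets given later in the operation example.

```python
def extract_common_data(names_a, names_b):
    """Return the names that appear with identical notation in both lists."""
    return set(names_a) & set(names_b)


# The two vertex sets listed in the operation example.
names_long = [
    "Fukuoka Hanazono Building", "Tatsukoyama Building", "Fukuyama Date Building",
    "Kuwabara Building", "Fukui Fujita Building", "Fukuchi Yanagawa Building",
    "Hoshina Building", "Osorezan Building", "Tsukidate Building",
    "Fukushima Kawamata Building",
]
names_short = [
    "Hanazono Building", "Date Building", "Kuwabara Building", "Fujita Building",
    "Yanagawa Building", "Hoshina Building", "Osorezan Building",
    "Tsukikan Building", "Kawamata Building",
]
common = extract_common_data(names_long, names_short)
```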
  • the path information extraction unit 5 generates all paths that start from one of the common data extracted by the common data extraction unit 4 and have the name data held by the derivative database 2 as vertices.
  • the end point of the path may have the same common data as the start point, or may have different common data from the start point.
  • the path information extraction unit 5 extracts path information including the number of vertices, the name data of the included vertices, and the position on the path for each of those paths.
  • the path information extraction unit 5 can extract path information based on the undirected graph created by the graph creation unit 3 and the path identification information held by the derivative database 2 .
  • the path creation unit 6 extracts partial paths having common data as their endpoints, that is, as their start point and end point, from each path indicated by the path information extracted by the path information extraction unit 5. Then, based on the information held by the basic database 1, the path creation unit 6 enumerates all paths whose endpoints are the common data matching those of each partial path and that have a prescribed length. For example, the path creation unit 6 can enumerate paths based on the undirected graph created by the graph creation unit 3 from the basic database 1.
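The enumeration of paths between two common-data endpoints can be sketched as a depth-first search that never reuses a vertex or an edge and that bounds the path length. This is an illustrative sketch, not the claimed implementation: `enumerate_paths`, the adjacency-set graph format, and the toy vertex names are all assumptions. Length is counted in edges.

```python
def enumerate_paths(graph, start, end, min_len, max_len):
    """Enumerate paths from start to end with min_len <= edge count <= max_len.

    No vertex or edge is used twice, except that the end vertex may equal
    the start vertex (a cycle back to the starting point).
    """
    paths = []

    def extend(path, used_edges):
        current = path[-1]
        length = len(path) - 1  # number of edges walked so far
        if length > 0 and current == end:
            if length >= min_len:
                paths.append(list(path))
            return  # never continue through the end vertex
        if length == max_len:
            return  # length budget exhausted
        for neighbour in sorted(graph.get(current, ())):
            edge = frozenset((current, neighbour))
            if edge in used_edges:
                continue
            if neighbour in path and neighbour != end:
                continue
            extend(path + [neighbour], used_edges | {edge})

    extend([start], frozenset())
    return paths


# Toy graph (hypothetical): two routes from "s" to "t", of length 3 and 4.
toy_graph = {
    "s": {"a", "c"}, "a": {"s", "b"}, "b": {"a", "t"},
    "c": {"s", "d"}, "d": {"c", "e"}, "e": {"d", "t"},
    "t": {"b", "e"},
}
paths_x0 = enumerate_paths(toy_graph, "s", "t", 3, 3)
paths_x1 = enumerate_paths(toy_graph, "s", "t", 3, 4)
```

Bounding the length keeps the search finite and small, at the cost of missing correspondences that need a longer detour than the bound allows.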
  • the associating unit 7 searches for combinations of vertex name data from the partial paths extracted by the path creating unit 6 and the enumerated paths, for example, based on a character string similarity such as the edit distance. Then, the associating unit 7 associates the name data held by the basic database 1 with the name data held by the derivative database 2 based on the combinations found.
  • the data output unit 8 generates output information based on the result of association by the association unit 7 and outputs it.
  • the data output unit 8 can generate, as output information, a correspondence table representing the correspondence of name data based on the result of association by the association unit 7 .
  • the data output unit 8 may also convert the name data of the information held in the basic database 1 based on the result of association by the association unit 7 to create a new database, and use this as the output information.
  • the data output unit 8 may also integrate the information held by the basic database 1 and the derivative database 2 based on the result of association by the association unit 7 to create a new database, and use this as the output information.
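As a rough illustration of converting name data with a correspondence table, the sketch below rewrites every name in a set of rows using a dictionary. The helper name `rename_with_table`, the row layout, and the direction of the mapping are assumptions; the two name pairs are taken from the correct associations listed in the operation example.

```python
def rename_with_table(rows, correspondence):
    """Replace each name that has an entry in the correspondence table.

    Names without an entry (for example, common data) are left unchanged.
    """
    return [tuple(correspondence.get(name, name) for name in row) for row in rows]


# Hypothetical correspondence table and adjacency rows.
correspondence = {
    "Fukui Fujita Building": "Fujita Building",
    "Fukuchi Yanagawa Building": "Yanagawa Building",
}
source_rows = [
    ("Kuwabara Building", "Fukui Fujita Building"),
    ("Fukui Fujita Building", "Fukuchi Yanagawa Building"),
]
converted_rows = rename_with_table(source_rows, correspondence)
```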
  • FIG. 2 is a diagram showing an example of the hardware configuration of the name data association device.
  • the name data association device is composed of a computer such as a server computer or a personal computer, and has a hardware processor 101 such as a CPU (Central Processing Unit).
  • In the name data association device, a program memory 102, a data memory 103, a communication interface 104, and an input/output interface (denoted as input/output IF in FIG. 2) 105 are connected to the processor 101.
  • the communication interface 104 can include, for example, one or more wired or wireless communication modules. If the basic database 1 and/or the derivative database 2 are configured in a data server or the like connected via a network such as a LAN (Local Area Network) or the Internet, the communication interface 104 can communicate with the data server or the like and retrieve data from it. The communication interface 104 can also communicate with an external data processing device or the like to receive a request from the data processing device, and can send a data processing result corresponding to the request back to the data processing device.
  • An input unit 107 and a display unit 108 are connected to the input/output interface 105 .
  • as the input unit 107 and the display unit 108, a so-called tablet-type input/display device can be used, in which an input detection sheet employing an electrostatic or pressure method is arranged on the display screen of a display device using, for example, liquid crystal or organic EL (Electro Luminescence). Note that the input unit 107 and the display unit 108 may be configured as independent devices.
  • the input/output interface 105 inputs operation information input from the input unit 107 to the processor 101 and displays display information generated by the processor 101 on the display unit 108 .
  • the input unit 107 and the display unit 108 do not have to be connected to the input/output interface 105 .
  • the input unit 107 and the display unit 108 are provided with a communication unit for connecting to the communication interface 104 directly or via a network, so that information can be exchanged with the processor 101 .
  • the input/output interface 105 may have a read/write function for a recording medium such as a semiconductor memory, for example a flash memory, or may have a function for connecting to a reader/writer that has a read/write function for such a recording medium. As a result, a recording medium detachable from the name data association device can be used as a database for holding name data.
  • the input/output interface 105 may further have a connection function with other devices.
  • the program memory 102 is a non-transitory tangible computer-readable storage medium that combines, for example, a non-volatile memory that can be written and read at any time, such as an HDD (Hard Disk Drive) or SSD (Solid State Drive), with a non-volatile memory such as a ROM.
  • the program memory 102 stores programs necessary for the processor 101 to execute various control processes according to one embodiment. That is, the processing function units of the above-described graph creation unit 3, common data extraction unit 4, path information extraction unit 5, path creation unit 6, association unit 7, and data output unit 8 can all be realized by causing the processor 101 to read and execute the programs stored in the program memory 102. Some or all of these processing function units may be implemented in various other forms, including integrated circuits such as Application Specific Integrated Circuits (ASICs) or field-programmable gate arrays (FPGAs).
  • the data memory 103 is a tangible computer-readable storage medium, for example, a combination of the above nonvolatile memory and a volatile memory such as RAM (Random Access Memory).
  • This data memory 103 is used to store various data acquired and created in the process of performing various processes. That is, in the data memory 103, an area for storing various data is appropriately secured in the process of performing various processes. As such areas, the data memory 103 can be provided with, for example, a basic database storage unit 1031 , a derived database storage unit 1032 , a temporary storage unit 1033 and an output information storage unit 1034 .
  • the basic database storage unit 1031 stores information of the basic database 1, and the derived database storage unit 1032 stores information of the derived database 2. That is, the basic database 1 and the derivative database 2 can be configured in the basic database storage unit 1031 and the derivative database storage unit 1032 .
  • FIG. 3 is a diagram showing an example of information held by the basic database 1 stored in the basic database storage unit 1031, and FIG. 4 is a diagram showing an example of information held by the derived database 2 stored in the derived database storage unit 1032. Here, an example is shown in which the name data is the name of a building. In the basic database 1 stored in the basic database storage unit 1031, the upper building and the lower building are adjacent to each other.
  • a combination of buildings having the same path identifier (identifier is abbreviated as ID in the figure) represents one path (Shinjuku Building → Minami-Shinjuku Building → Gaien Building → Yotsuya Building → Shinjuku Building).
  • the building names in the derived database 2 are denoted by $c_i$ ($i \in \{1, 2, \ldots, n\}$).
  • the building names in the basic database 1 are denoted by $d_j$ ($j \in \{1, 2, \ldots, m\}$).
  • $n$ and $m$ are the numbers of building names in the respective databases.
  • the information stored in the basic database storage unit 1031 and the derived database storage unit 1032 is, for example, the information of the basic database 1 and the derived database 2 input from the input unit 107 received by the processor 101 via the input/output interface 105.
  • a base database 1 and a derived database 2 can be constructed in the data memory 103 .
  • all or part of the information held by the basic database 1 and the derived database 2 constructed in an external data server may be stored in the basic database storage unit 1031 and the derived database storage unit 1032 .
  • the processor 101 acquires information accumulated in the database server via the communication interface 104 in response to an instruction by a user operation from the input unit 107, and stores them in the storage units 1031 and 1032.
  • processor 101 may acquire information recorded on a recording medium via input/output interface 105 .
  • the processor 101 may also receive, from an external data processing device or the like via the communication interface 104, a request to associate name data together with the information of the basic database 1 and the derived database 2, and may store the received database information in the storage units 1031 and 1032 as the information to be processed.
  • the temporary storage unit 1033 stores the undirected graphs created when the processor 101 operates as the graph creation unit 3, the common data extracted when it operates as the common data extraction unit 4, the path information about all paths extracted when it operates as the path information extraction unit 5, the partial paths extracted and the paths enumerated when it operates as the path creation unit 6, and the result of name data association obtained when it operates as the association unit 7.
  • the output information storage unit 1034 stores output information obtained when the processor 101 operates as the data output unit 8 described above.
  • FIG. 5 is a flow chart showing an example of a processing operation related to association of name data in the name data association device.
  • the information of the basic database 1 is already stored in the basic database storage unit 1031 and the information of the derivative database 2 is already stored in the derivative database storage unit 1032 .
  • when name data association is instructed from the input unit 107 via the input/output interface 105, or from an external data processing device via the communication interface 104, the processor 101 of the name data association device starts the operation shown in this flowchart.
  • the processor 101 performs the operation as the graph creation unit 3. That is, for each of the information of the derived database 2 stored in the derived database storage unit 1032 and the information of the basic database 1 stored in the basic database storage unit 1031, the processor 101 uses the adjacency information to generate cycle graphs $G_c$ and $G_d$ having the name data as vertices (step S1).
  • the generated cycle graphs G c and G d are stored in the temporary storage unit 1033 of the data memory 103 .
  • when the building names $c_i$ in the derived database 2 and the building names $d_j$ in the basic database 1 are each taken as vertices, and adjacent vertices are interpreted as being connected by edges, the cycle graphs $G_c$ and $G_d$, which are the following undirected graphs, can be constructed.
  • a cycle is a subgraph of the cycle graph G c and indicates a path whose start point and end point are the same vertex.
  • $E_d$: the set of edges obtained from the adjacency information of the basic database 1. $g_d : E_d \to P(V_d)$: a mapping that associates each element of $E_d$ with a subset of the vertex set $V_d$.
  • $E_c$: the set of edges obtained from the adjacency information of the derivative database 2. $g_c : E_c \to P(V_c)$: a mapping that associates each element of $E_c$ with a subset of the vertex set $V_c$.
  • the processor 101 of the name data association device operates as the common data extraction unit 4 . That is, the processor 101 extracts common name data between the information of the basic database 1 stored in the basic database storage unit 1031 and the information of the derived database 2 stored in the derived database storage unit 1032 (step S2). The extracted common name data is stored in temporary storage section 1033 of data memory 103 .
  • the processor 101 executes the operation as the path information extractor 5.
  • Path information indicating the extracted path ⁇ k is stored in the temporary storage unit 1033 of the data memory 103 .
  • the path information can include the number of vertices of the path ⁇ k extracted, the name data of the included vertices and their positions on the path.
  • I k the array of vertices included in the set S among the vertices forming the path ⁇ k , which are defined below.
  • I k : ( ⁇ k [i]
  • ⁇ k [i] ⁇ S, ⁇ k [i] ⁇ s k , i 1, 2, . . . ⁇ k
  • the processor 101 performs the operation as the path creating section 6.
  • L k i is a partial path from vertex l k [i] to vertex l k [i+1] in path ⁇ k .
  • l k [i] is the i-th element of the array l k .
  • based on the extracted partial paths, the processor 101 then enumerates all paths in the cycle graph $G_d$ of the basic database 1 whose start point is $l_k[i]$, whose end point is $l_k[i+1]$, and whose length is at least the length of $L_k^i$ and at most the length of $L_k^i$ plus $x$ ($i = 1, 2, \ldots$) (step S5).
  • $x$ is an integer greater than or equal to 0 specified by the user. Note that the same vertex or edge is not traversed twice when enumerating these paths. Let the set of enumerated paths be $A_k^i$.
  • the processor 101 operates as the associating unit 7. That is, if the set $A_k^i$ of paths enumerated in step S5 contains a path whose length equals that of $L_k^i$, the processor 101 first takes that path as $\alpha$ and selects name combinations as follows (step S6): $(L_k^i[j], \alpha[j])$, $j = 1, 2, \ldots$
  • otherwise, a combination of name data is searched for and associated based on a character string similarity such as the edit distance, and the result is stored in the temporary storage unit 1033 of the data memory 103 (step S7).
  • Edit distance is disclosed, for example, in D. Gusfield. "Algorithms on strings, trees and sequences: computer science and computational biology.” Cambridge university press, 1997.
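For reference, a minimal sketch of the edit distance as it is commonly defined (the Levenshtein distance with unit cost for insertion, deletion, and substitution). The description only says "edit distance", so the unit-cost variant and the function name `edit_distance` are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (unit costs)."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty string
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]
```

Keeping only two rows of the dynamic-programming table holds memory use proportional to the length of the second string.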
  • FIG. 6 is a schematic diagram for explaining a method of associating names.
  • name data of buildings BL d stored in the basic database 1 A building, B building, ... n building
  • names of buildings BL c stored in the derivative database 2 ⁇ building, ⁇ building, ... ⁇ building.
  • processor 101 can search for a combination of name data in the following procedure.
  • the name data for which there is only one combination is output as it is, and for other names, the name data for which the output result has already been obtained is excluded from the candidates.
  • Consistency here means the following. Suppose that there are multiple candidate names for a certain name A and that one of them, name B, has been excluded by the above operation. Let P be the path that served as the basis for outputting the combination (A, B) of the excluded name B and the name A. This path P also gives a name combination (C, D) for a name C different from the name A. Since the combination (A, B) has been excluded, the combination (C, D) is excluded as well.
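The first part of this procedure (output names that have exactly one candidate, then remove already-output names from the remaining candidate lists) can be sketched as a simple fixed-point loop. The sketch deliberately omits the path-based consistency check described above, and `resolve_unique_candidates` and the toy names are hypothetical.

```python
def resolve_unique_candidates(candidates):
    """Repeatedly fix names that have a single candidate.

    candidates maps a name to its set of candidate counterparts. Returns
    (resolved, remaining): the decided pairs and the still-ambiguous names.
    """
    remaining = {name: set(options) for name, options in candidates.items()}
    resolved = {}
    progress = True
    while progress:
        progress = False
        for name in sorted(remaining):
            options = remaining[name]
            if len(options) == 1:
                (choice,) = options
                resolved[name] = choice
                del remaining[name]
                # A counterpart that is already output is no longer a candidate.
                for other_options in remaining.values():
                    other_options.discard(choice)
                progress = True
                break
    return resolved, remaining


# Toy input (hypothetical names).
resolved, remaining = resolve_unique_candidates(
    {"A": {"x"}, "B": {"x", "y"}, "C": {"y", "z"}}
)
```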
  • a more specific example will be described later as an operation example.
  • the processor 101 determines whether or not all of the paths ⁇ k have been processed based on the path information extracted in step S3 (step S8). That is, it is determined whether the processing has been completed for all vertices of all paths ⁇ k . If it is determined that there is a path ⁇ k that has not yet been processed, the processor 101 updates k, shifts to the process of step S4, and repeats the processes of steps S4 to S7.
  • when it is determined that all paths have been processed, the processor 101 operates as the data output unit 8 to output name data association information (step S9). That is, the processor 101 generates output information, in the form instructed from the input unit 107 or from an external data processing device, from the association results stored in the temporary storage unit 1033 of the data memory 103, and stores the generated output information in the output information storage unit 1034 of the data memory 103. The processor 101 can display this output information on the display unit 108 via the input/output interface 105, or can transmit it to an external data processing device via the communication interface 104.
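  • the path creation unit 6 extracts partial paths having common data as endpoints and non-common data as the vertices between the endpoints, and creates, for each partial path, paths that have the same common-data endpoints and a length equal to or greater than the length of the partial path.
  • by searching for combinations of vertices on the partial paths and vertices on the created paths, the associating unit 7 associates the name data held by the basic database 1 and the name data held by the derived database 2 with each other.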
  • synonymous name data that has spelling variations between databases to be integrated can be accurately matched without human intervention, even if the character string data corresponding to the name data does not have a corresponding relationship between databases. be able to. Therefore, it is possible to collect information without omission on a certain matter between different databases.
  • the effect of improving work efficiency can be expected by reducing human operations.
  • the graph creation unit 3 creates cycle graphs Gd and Gc , which are undirected graphs of the basic database 1 and the derivative database 2, with the name data as vertices,
  • the path information extracting unit 5 generates all paths ⁇ k whose endpoints are the common data and whose vertices are the name data held in the derived database 2 , and for each of these paths ⁇ k , the number of vertices and the names of the vertices included. Extract the path information, including the data and its position on the path.
  • the path creation unit 6 extracts partial paths from the cycle graph G c based on the path information, and extracts partial paths from the cycle graph G d for each of these partial paths.
  • a path can be created that excludes vertices that have no possibility of being associated with data.
  • the path creation unit 6 creates a path including the number of vertices equal to or greater than the number of vertices of the path ⁇ k and equal to or less than the number of vertices specified by the user. Therefore, by limiting the number of vertices included in the path, the processing time can be shortened.
  • for each vertex on a path created by the path creation unit 6, when its position on the path corresponds to a vertex on the partial path, the associating unit 7 associates the name data of that vertex, among the name data held by the basic database 1, with the name data of the corresponding vertex on the partial path, among the name data held by the derivative database 2. When the position on the path does not correspond to a vertex on the partial path, the associating unit 7 associates the name data of the vertex on the path with the name data of a vertex on the partial path based on the character string similarity between the name data. Therefore, the name data held by the basic database 1 can be easily associated with the name data held by the derivative database 2.
  • the name data association device repeats the processing of the path creation unit 6 and the association unit 7 until the processing for all paths ⁇ k generated by the path information extraction unit 5 is completed. Therefore, it is possible to reduce the probability that the name data held by the derivative database 2 fails to be associated with the name data held by the basic database 1 .
  • the name data association device uses the data output unit 8 to generate output information including a name data correspondence table based on the result of name data association. Therefore, by using this output information, it is possible to perform database integration processing. Further, the name data association device according to one embodiment may generate information of the integrated database as the output information.
  • FIG. 7 is a diagram showing an example of information held by the basic database 1 stored in the basic database storage unit 1031 in the operation example. The adjacency information obtained from this basic database is as follows.
  • the notation (A, B) indicates that data name A and data name B are connected.
  • $V_c$ = {Fukuoka Hanazono Building, Tatsukoyama Building, Fukuyama Date Building, Kuwabara Building, Fukui Fujita Building, Fukuchi Yanagawa Building, Hoshina Building, Osorezan Building, Tsukidate Building, Fukushima Kawamata Building}
  • $V_d$ = {Hanazono Building, Date Building, Kuwabara Building, Fujita Building, Yanagawa Building, Hoshina Building, Osorezan Building, Tsukikan Building, Kawamata Building}
  • the correct combinations of name data, that is, the correct association of the name data, are as follows. {(Tsukidate Building, Tsukikan Building), (Fukushima Kawamata Building, Kawamata Building), (Fukuoka Hanazono Building, Hanazono Building), (Fukuyama Date Building, Date Building), (Fukui Fujita Building, Fujita Building), (Fukuchi Yanagawa Building, Yanagawa Building)}
  • In step S1, the processor 101 of the name data association device operates as the graph creation unit 3 to create a cycle graph.
  • FIG. 9 is a schematic diagram showing an example of the cycle graph Gd created from the information held by the basic database 1 in the operation example.
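The graph creation in step S1 can be sketched as follows. This is a minimal illustration, assuming that the neighborhood information is available as (A, B) pairs in the notation described above; the function name `build_graph` and the three sample pairs are hypothetical and are not taken from the embodiment.

```python
from collections import defaultdict


def build_graph(pairs):
    """Build an undirected adjacency map from (A, B) neighborhood pairs."""
    graph = defaultdict(set)
    for a, b in pairs:
        # (A, B) means that name A and name B are connected, in both directions.
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


# Hypothetical fragment of neighborhood information, for illustration only.
pairs = [
    ("Kuwabara Building", "Fujita Building"),
    ("Fujita Building", "Yanagawa Building"),
    ("Yanagawa Building", "Hoshina Building"),
]
graph = build_graph(pairs)
```

An adjacency map keyed by name keeps the later path searches simple, because the neighbors of any vertex can be looked up directly.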
  • In step S2, the processor 101 operates as the common data extraction unit 4 to extract name data common to the cycle graph Gc and the cycle graph Gd.
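Step S2 amounts to a set intersection over names with identical notation. The sketch below uses the two vertex sets listed in this operation example; the variable names `v_c`, `v_d`, and `common` are illustrative assumptions.

```python
# Vertex sets as listed in the operation example.
v_c = {
    "Fukuoka Hanazono Building", "Tatsukoyama Building", "Fukuyama Date Building",
    "Kuwabara Building", "Fukui Fujita Building", "Fukuchi Yanagawa Building",
    "Hoshina Building", "Osorezan Building", "Tsukidate Building",
    "Fukushima Kawamata Building",
}
v_d = {
    "Hanazono Building", "Date Building", "Kuwabara Building", "Fujita Building",
    "Yanagawa Building", "Hoshina Building", "Osorezan Building",
    "Tsukikan Building", "Kawamata Building",
}

# Common data: names whose notation is exactly the same in both sets.
common = v_c & v_d
```

Only exact string matches count as common data here, so "Tsukidate Building" and "Tsukikan Building" remain unmatched at this stage.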
  • In step S3, the processor 101 operates as the path information extraction unit 5 to extract path information from the derivative database 2, and in step S4, operates as the path creation unit 6 to extract partial paths.
  • FIG. 10 is a schematic diagram showing an example of the path ⁇ 1 generated from the cycle graph Gc created from the information held by the derivative database 2 in the operation example.
  • The processor 101 extracts, from the cycle graph G c, partial paths whose endpoints are elements of the building name set S.
  • L_{1,1} = (Kuwabara Building, Fujita Building, Yanagawa Building, Hoshina Building)
  • L_{1,2} = (Hoshina Building, Osorezan Building)
  • L_{1,3} = (Osorezan Building, Tsukikan Building, Kawamata Building, Hanazono Building, Date Building, Kuwabara Building)
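The three partial paths above can be reproduced by cutting the path at every common vertex. The sketch below assumes that the path is given as a closed vertex sequence that starts and ends at a common vertex; the function name `split_partial_paths` and this input representation are assumptions made for illustration.

```python
def split_partial_paths(path, common):
    """Split a vertex sequence into partial paths whose endpoints are common data."""
    partials = []
    current = None  # vertices collected since the last common vertex
    for vertex in path:
        if current is not None:
            current.append(vertex)
        if vertex in common:
            # A common vertex closes the running partial path and opens the next one.
            if current is not None and len(current) > 1:
                partials.append(tuple(current))
            current = [vertex]
    return partials


common = {"Kuwabara Building", "Hoshina Building", "Osorezan Building"}
# Closed sequence for the path of the operation example (assumed representation).
path = [
    "Kuwabara Building", "Fujita Building", "Yanagawa Building", "Hoshina Building",
    "Osorezan Building", "Tsukikan Building", "Kawamata Building",
    "Hanazono Building", "Date Building", "Kuwabara Building",
]
partials = split_partial_paths(path, common)
```

The sketch also returns the partial path of length 1, which has no interior vertex to associate and is therefore skipped in the later steps.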
  • In step S5, for the partial path L_{1,1}, the processor 101 enumerates the paths on the cycle graph Gd that have "Kuwabara Building" and "Hoshina Building" as endpoints and a length of at least 3 and at most 3+x.
  • Length 3: (Kuwabara Building, Fukui Fujita Building, Fukuchi Yanagawa Building, Hoshina Building)
  • Length 4: (Kuwabara Building, Fukuyama Date Building, Tatsukoyama Building, Fukuoka Hanazono Building, Hoshina Building)
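The bounded path enumeration of step S5 can be sketched with a depth-first search. Several assumptions apply: the edge list contains only the edges implied by the two paths listed above (the full neighborhood information is not reproduced), x is taken to be 1, names are spelled as in the vertex set, and the function name `paths_between` is hypothetical.

```python
def paths_between(graph, start, goal, min_len, max_len):
    """Enumerate simple paths from start to goal with min_len <= edges <= max_len."""
    results = []

    def dfs(path):
        node = path[-1]
        length = len(path) - 1  # number of edges used so far
        if node == goal:
            if length >= min_len:
                results.append(tuple(path))
            return
        if length == max_len:
            return  # no room left to extend this path
        for nxt in sorted(graph[node]):
            if nxt not in path:  # keep the path simple (no repeated vertex)
                dfs(path + [nxt])

    dfs([start])
    return sorted(results, key=len)


# Assumed edges: only those implied by the two listed paths.
edges = [
    ("Kuwabara Building", "Fukui Fujita Building"),
    ("Fukui Fujita Building", "Fukuchi Yanagawa Building"),
    ("Fukuchi Yanagawa Building", "Hoshina Building"),
    ("Kuwabara Building", "Fukuyama Date Building"),
    ("Fukuyama Date Building", "Tatsukoyama Building"),
    ("Tatsukoyama Building", "Fukuoka Hanazono Building"),
    ("Fukuoka Hanazono Building", "Hoshina Building"),
]
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

x = 1  # assumed slack; the example lists paths of length 3 and 4
found = paths_between(graph, "Kuwabara Building", "Hoshina Building", 3, 3 + x)
```

Bounding the search depth at 3+x keeps the enumeration small even when the graph contains many alternative routes between the two endpoints.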
  • Every combination has an edit distance of 1, so the following candidates can be considered. Candidates for "Fujita Building": "Fukuyama Date Building", "Tatsukoyama Building", "Fukui Fujita Building". Candidates for "Yanagawa Building": "Fukuoka Hanazono Building", "Tatsukoyama Building", "Fukuchi Yanagawa Building".
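One way to arrive at these candidate lists is to map the interior vertices of the partial path, in order, onto every order-preserving selection of interior vertices of each enumerated path. The sketch below illustrates that reading only; it does not compute the edit distance used by the embodiment, and the function name `candidates` is an assumption.

```python
from itertools import combinations


def candidates(partial, paths):
    """Collect, per interior vertex of `partial`, the vertices it may correspond to."""
    inner = partial[1:-1]
    result = {vertex: set() for vertex in inner}
    for path in paths:
        # combinations() keeps the input order, so each choice preserves path order.
        for choice in combinations(path[1:-1], len(inner)):
            for source, target in zip(inner, choice):
                result[source].add(target)
    return result


partial = ("Kuwabara Building", "Fujita Building", "Yanagawa Building", "Hoshina Building")
paths = [
    ("Kuwabara Building", "Fukui Fujita Building", "Fukuchi Yanagawa Building",
     "Hoshina Building"),
    ("Kuwabara Building", "Fukuyama Date Building", "Tatsukoyama Building",
     "Fukuoka Hanazono Building", "Hoshina Building"),
]
cand = candidates(partial, paths)
```

With the two enumerated paths as input, this yields three candidates for each of the two interior vertices, the same sets as listed above.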
  • Partial path L_{1,2} is omitted because it has length 1.
  • FIG. 11 is a diagram showing an example of output information stored in this output information storage unit 1034. Although the output information is shown here as a correspondence table showing the correspondence of name data, it is of course not limited to this.
  • The case where the number of target databases is two has been described as an example, but the number of target databases may be three or more. That is, if at least one of three or more databases holds path identification information, name data can be associated with the remaining two or more databases.
  • The processor 101 may access an external data server through the communication interface 104 as needed, proceed with processing using the information accumulated in the basic database 1 and the derivative database 2 constructed there, and store only the processing results of each step in the temporary storage unit 1033.
  • the capacity of the data memory 103 included in the name data association device can be suppressed, and the name data association device can be configured at low cost.
  • The method described in each embodiment can be stored, as a program (software means) that can be executed by a computer, on a recording medium such as a magnetic disk (floppy (registered trademark) disk, hard disk, etc.), an optical disk (CD-ROM, DVD, MO, etc.), or a semiconductor memory (ROM, RAM, flash memory, etc.), or can be transmitted and distributed via a communication medium.
  • the programs stored on the medium also include a setting program for configuring software means (including not only execution programs but also tables and data structures) to be executed by the computer.
  • a computer that realizes this apparatus reads a program recorded on a recording medium, and in some cases, builds software means by a setting program, and executes the above-described processes by controlling the operation by this software means.
  • the term "recording medium” as used herein is not limited to those for distribution, and includes storage media such as magnetic disks, semiconductor memories, etc. provided in computers or devices connected via a network.
  • the present invention is not limited to the above embodiments, and can be modified in various ways without departing from the gist of the invention at the implementation stage.
  • The embodiments may be implemented in combination wherever possible, in which case the combined effects are obtained.
  • the above-described embodiments include inventions at various stages, and various inventions can be extracted by appropriately combining a plurality of disclosed constituent elements.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A name data association device according to one embodiment includes: a common data extraction unit that extracts, as common data, name data having the same notation between a first database (DB) and a second database, the first database holding a plurality of pieces of name data and adjacency information indicating adjacency relationships between the name data, and the second database holding the plurality of pieces of name data, adjacency information between the name data, and path identification information representing the paths to which the name data belong; a path creation unit that extracts partial paths from a path represented by the path identification information held by the second database, the partial paths having the common data as endpoints and non-common data as vertices between the endpoints, and that generates, based on the information held by the first database, for each of the partial paths, a path having the same common-data endpoints as the partial path and a length greater than or equal to that of the partial path; and an association unit that searches, for each of the partial paths, for a combination of the vertices on the partial path and the vertices on the path, thereby associating the name data between the first database and the second database.
PCT/JP2021/021548 2021-06-07 2021-06-07 Name data association device, name data association method, and name data association program WO2022259303A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2023527147A JPWO2022259303A1 (fr) 2021-06-07 2021-06-07
PCT/JP2021/021548 WO2022259303A1 (fr) 2021-06-07 2021-06-07 Name data association device, name data association method, and name data association program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/021548 WO2022259303A1 (fr) 2021-06-07 2021-06-07 Name data association device, name data association method, and name data association program

Publications (1)

Publication Number Publication Date
WO2022259303A1 true WO2022259303A1 (fr) 2022-12-15

Family

ID=84424985

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/021548 WO2022259303A1 (fr) 2021-06-07 2021-06-07 Name data association device, name data association method, and name data association program

Country Status (2)

Country Link
JP (1) JPWO2022259303A1 (fr)
WO (1) WO2022259303A1 (fr)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005011049A (ja) * 2003-06-19 2005-01-13 Nec Soft Ltd Database integration device
JP2017123062A (ja) * 2016-01-07 2017-07-13 Fujitsu Ltd Relation information generation method, device, and program
JP2020064417A (ja) * 2018-10-16 2020-04-23 NTT TechnoCross Corp Management device, management method, and program

Also Published As

Publication number Publication date
JPWO2022259303A1 (fr) 2022-12-15

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21944984

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2023527147

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE