WO2014167647A1 - Data management device, data management method, and non-transitory recording medium
- Publication number
- WO2014167647A1 (PCT/JP2013/060712, JP2013060712W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- file
- virtual
- attribute
- information
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Class or cluster creation or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2457—Query processing with adaptation to user needs
- G06F16/24575—Query processing with adaptation to user needs using context
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/60—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H40/00—ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices
- G16H40/20—ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the management or administration of healthcare resources or facilities, e.g. managing hospital staff or surgery rooms
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H70/00—ICT specially adapted for the handling or processing of medical references
- G16H70/60—ICT specially adapted for the handling or processing of medical references relating to pathologies
Definitions
- The present invention relates to a data management device, a data management method, and a non-transitory recording medium, and is suitably applied to a data management device, a data management method, and a non-transitory recording medium for managing unstructured data.
- a wide variety of data is electronically managed in an information system, and a user collects, processes, and displays data through the information system in order to obtain knowledge from the data.
- Such electronic data includes structured data, which has structural information, and unstructured data, which does not.
- Structured data is, for example, data in which various characteristics of the data are managed using structural information such as attributes and attribute values.
- Unstructured data does not have a structure such as attributes and attribute values, and is generally managed as files in the information system.
- For structured data, the information system can collect, process, and display the data by using the structural information as a clue.
- A user of the data likewise uses the structural information of structured data to compare the attribute values of specific attributes between data items. This makes it easier to acquire knowledge such as the differences and similarities between data items.
- However, because structured data prescribes the structure that represents the data, information that does not fit that structure may not be included as data.
- Unstructured data, in contrast, does not prescribe a structure for representing data, so information that cannot be represented by structured data is also included. Therefore, more information and knowledge may be obtainable from it than from structured data.
- On the other hand, because unstructured data has no structural information, it is difficult for the information system to collect data using structural information as a clue and for the user to discover knowledge. Techniques for structuring data in response to an information acquisition request from a user have therefore been disclosed.
- Patent Document 1 discloses a technique for extracting information from a plurality of HTML documents and structuring data.
- the technology has means for storing attribute information that is structure information, the location of an HTML document that includes information that is attribute values of the attribute, and information extraction rules from the HTML document.
- Patent Document 2 discloses, as a method for presenting unstructured data to a user, a method of presenting a structured representation of unstructured data by writing information extracted from a set of unstructured data as attribute values of attributes. As a result, various information systems and users can manage unstructured data using the structural information as a clue.
- As described above, in Patent Document 1, information extraction processing is executed as the means for structuring data when a search query is received. The latest information at the time the extraction is executed can therefore be acquired, but the time needed to obtain the structured search result increases because of the extraction processing. Further, the extraction target is an HTML document, which holds clues to structural information in the form of tags, and unstructured data is not targeted. Patent Document 2 discloses a method of structuring unstructured data by extracting information as combinations of attributes and attribute values. However, it is the same as Patent Document 1 in that information extraction processing must be executed when a search query is received.
- The present invention has been made in consideration of the above points, and proposes a data management device, a data management method, and a non-transitory recording medium.
- The present invention provides a data management device comprising: a storage unit that stores a first database holding structured data, in which a plurality of features of data are structured by attributes and attribute values, and a second database holding unstructured data in units of files; and a control unit that combines the structured data and the unstructured data and manages them as virtual structure data that is accessed when a search query for the second database is executed. The control unit sets, as the attribute value of a virtual attribute of the virtual structure data, a value extracted from a file of the second database by a predetermined information extraction rule, and updates the attribute value of the virtual attribute of the virtual structure data when the file of the second database holding the unstructured data is updated.
- With this configuration, the structured data and the unstructured data are combined into virtual structure data that is accessed when a search query is executed on the second database, and the attribute value of a virtual attribute of the virtual structure data is set to a value extracted from a file of the second database by a predetermined information extraction rule. Then, when the file of the second database holding the unstructured data is updated, the attribute value of the virtual attribute of the virtual structure data is updated.
- As a result, the desired extracted data can be acquired simply by accessing the structured data, which reflects the latest state of the unstructured data, without re-running the extraction process on the source unstructured data every time a search is executed.
- Unstructured data can thus be managed efficiently by combining it with existing structured data.
- the data management device 101 includes a memory 111, a CPU 112, a communication device 113, a storage device 114, an input device 115, a display device 116, and the like.
- the CPU 112 functions as an arithmetic processing device and a control device, and controls the overall operation of the data management device 101 according to various programs stored in the memory 111.
- the memory 111 is a ROM (Read Only Memory), a RAM (Random Access Memory), or the like.
- the ROM 202 stores programs used by the CPU 112, operation parameters, and the like, and the RAM 203 temporarily stores programs used in execution by the CPU 112 and parameters that change as appropriate during that execution. These are connected to each other by a host bus including a CPU bus.
- the CPU 112 includes an information extraction rule registration unit 131, an information extraction rule holding unit 132, a virtual attribute update unit 133, an information extraction unit 134, a related file information holding unit 135, and an update detection unit 136.
- Each unit of the CPU 112 registers an information extraction rule to be described later, executes an information extraction process, registers related file information, and manages updating of virtual structure data according to the registered information extraction rule. The processing executed by each unit will be described in detail later.
- the communication device 113 is a communication interface configured by a communication device or the like for connecting to a network.
- the communication device 113 may be a wireless LAN (Local Area Network) compatible communication device, a wireless USB compatible communication device, or a wired communication device.
- the storage device 114 is composed of, for example, an HDD (Hard Disk Drive), and stores programs executed by the CPU 112 and various data. Further, a first database 151 and a second database 152 to be described later may be stored in the storage device 114, or may be stored in a storage device separate from the data management device 101.
- the storage device 114 stores various programs 121, data 122, information extraction rules 123, and related file information 124 for the data management device 101 to execute processing. Each information stored in the storage device 114 will be described in detail later.
- the input device 115 is a device for inputting an instruction to a computer such as a keyboard and a mouse, and inputs an instruction such as starting a program.
- the display device 116 is a display or the like, and displays the execution status and execution result of the processing by the data management device 101.
(1-2) Function of Data Management Device
- First, structured data and unstructured data managed by the data management device 101 will be described.
- the structure data will be described using a relational database as an example of data having an attribute and attribute value structure.
- data is expressed as records, and attributes are expressed as column names.
- the attribute value is written to the cell corresponding to the specific attribute in the record.
- As unstructured data, a file including document information, image information, video information, audio information, or the like will be described as an example.
- the information extraction rule registration unit 131 receives the information extraction rule 123 via the communication device 113 or the input device 115, extracts the virtual attribute name and the information on the table to which the virtual attribute is added from the virtual attribute name and the virtual attribute addition destination included in the information extraction rule 123, and stores them in the information extraction rule holding unit 132.
- the information extraction rule 123 will be described with reference to FIG.
- In the information extraction rule 123, a rule for extracting predetermined information is set, and the information extraction rule registration unit 131 stores the rule in the storage device 114. As shown in FIG. 2, information such as a virtual attribute name, a virtual attribute addition destination, an extraction target specifying condition, an output destination specifying condition, an extraction process content, and a use dictionary is set in the information extraction rule 123.
- the virtual attribute name is information for specifying the position in the structured data at which the result extracted from a file of the unstructured data is written.
- the virtual attribute addition destination is information for specifying the database and the table to which the virtual attribute name is added.
- the extraction target specifying condition consists of information on the database that contains the unstructured data to be extracted and a condition for narrowing down the extraction targets.
- the output destination specifying condition is a condition for specifying a position in a table to which a result extracted from unstructured data is written.
- the content of the extraction process includes the name of the attribute value output as the extraction result and the extraction condition for the attribute value.
- the use dictionary is information for setting a dictionary to be referred to when extracting information.
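The items above can be pictured as a simple record. The following is a minimal sketch in Python, not the format used in the patent: the class name `InformationExtractionRule`, its field names, and the concrete strings in the example instance are all hypothetical and only mirror the items that the text says are set in the information extraction rule 123.

```python
from dataclasses import dataclass, field


@dataclass
class InformationExtractionRule:
    """Hypothetical container mirroring the items set in information extraction rule 123."""

    virtual_attribute_name: str      # column to which extracted values are written
    addition_destination: tuple      # (database, table) that receives the virtual attribute
    extraction_target: tuple         # (database, file category) narrowing the files to extract from
    output_destination_key: str      # attribute that locates the row to write to
    extraction_content: dict = field(default_factory=dict)  # output name and extraction conditions
    use_dictionary: str = ""         # dictionary consulted during extraction


# Illustrative values only (assumed strings, not a transcription of FIG. 2).
rule = InformationExtractionRule(
    virtual_attribute_name="concurrent",
    addition_destination=("database A", "table 1"),
    extraction_target=("database B", "nursing record"),
    output_destination_key="patient ID",
    extraction_content={"output": "disease name", "condition": "disease name with onset expression"},
    use_dictionary="medical dictionary A",
)
```

Keeping the rule as plain data makes it possible to store it alongside the related file information and to look it up again when a file changes.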
- In the example of FIG. 2, the virtual attribute name is "concurrent", and the table of the first database 151 to which the virtual attribute is added is table 1 of database A. Further, the file of the second database 152 to be extracted is the nursing record file of database B, and the extraction result is written at the position specified by the patient ID in table 1.
- The name of the attribute value output as the extraction result is the disease name.
- The disease name condition indicates that disease names defined in the medical dictionary A are extracted.
- The onset information condition determines whether the text includes an expression with the same meaning as onset, such as "onset", "takes", or "sees symptoms". If, in accordance with condition 1 of the extraction process content, there is a description stating that a disease named in the medical dictionary A has developed, that disease name is extracted.
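The extraction condition just described (a dictionary disease name that appears together with an onset expression) could be approximated as below. This is a rough sketch under stated assumptions: the dictionary contents, the list of onset expressions, the sentence-level matching, and the function name `extract_disease_names` are all illustrative and are not taken from the patent.

```python
import re
from typing import List

# Hypothetical stand-in for "medical dictionary A"; the entries are illustrative only.
MEDICAL_DICTIONARY_A = {"influenza", "pneumonia", "diabetes"}

# Expressions treated as meaning "onset"; an assumed, deliberately small list.
ONSET_EXPRESSIONS = ("onset", "developed", "symptoms of")


def extract_disease_names(text: str) -> List[str]:
    """Return dictionary disease names that share a sentence with an onset expression."""
    found: List[str] = []
    for sentence in re.split(r"[.!?]\s*", text.lower()):
        # Skip sentences that do not describe an onset at all.
        if not any(expr in sentence for expr in ONSET_EXPRESSIONS):
            continue
        for disease in sorted(MEDICAL_DICTIONARY_A):
            if disease in sentence and disease not in found:
                found.append(disease)
    return found


names = extract_disease_names("Patient developed influenza on day 3. History of diabetes.")
```

Here "diabetes" is not returned because its sentence contains no onset expression; a real system would likely need morphological analysis rather than substring matching.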
- the information extraction rule 123 shown in FIG. 2 is an example; if a plurality of information extraction results exist, a list of the plurality of output results may be written as the virtual attribute value.
- In addition, the information extraction rule 123 may set a rule that writes the number of full-text search hits for the second database as the virtual attribute value, a rule that writes the location information of a related file, or a rule that writes the result of statistical processing performed on the information in the related file.
- the information extraction rule registration unit 131 uses the information set as the virtual attribute addition destination of the information extraction rule 123 to identify the database (first database 151) to which the virtual attribute is added and the table 1510 included in that database. Then, the information extraction rule registration unit 131 generates the virtual structure data 153 by adding a column whose column name is the virtual attribute name to the identified database table. Alternatively, instead of actually adding the column to the table, the virtual structure data 153 may be generated by newly creating a table composed of a unique ID, which uniquely identifies each record in the original table, and the virtual attribute. After the virtual attribute has been added to the identified table in this way, information for determining the initial value of the virtual attribute is extracted, and the related file information 124 described later is registered in the related file information holding unit 135.
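The second option (a separate table holding only a unique ID and the virtual attribute) can be sketched with Python's standard `sqlite3` module. The table names `table1` and `table1_virtual`, the view name `virtual_structure`, the schema, and the sample rows are assumptions made for illustration; the patent does not prescribe a particular database engine or layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Existing structured table (first database); the schema is an illustrative assumption.
conn.execute("CREATE TABLE table1 (patient_id TEXT PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [("P001", "A"), ("P002", "B")])

# Side table: unique ID plus the virtual attribute; the original table is left untouched.
conn.execute(
    "CREATE TABLE table1_virtual ("
    " patient_id TEXT PRIMARY KEY REFERENCES table1(patient_id),"
    " concurrent TEXT)"
)
conn.execute("INSERT INTO table1_virtual VALUES (?, ?)", ("P001", "influenza"))

# A view joins both so that queries see one table that includes the virtual column.
conn.execute(
    "CREATE VIEW virtual_structure AS"
    " SELECT t.patient_id, t.name, v.concurrent"
    " FROM table1 AS t LEFT JOIN table1_virtual AS v ON t.patient_id = v.patient_id"
)

rows = conn.execute("SELECT * FROM virtual_structure ORDER BY patient_id").fetchall()
original_columns = [r[1] for r in conn.execute("PRAGMA table_info(table1)")]
```

One reason to prefer the side table is that the existing structured database keeps its schema, which matters when other applications already depend on it.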
- the information extraction unit 134 refers to the extraction target specifying condition indicated in the information extraction rule 123 and identifies the files 1520a, 1520b, and 1520c (hereinafter sometimes referred to collectively as file 1520) of the database (second database 152) that are the target of information extraction. Then, it identifies a file using the information set in the output destination specifying condition and identifies the position of the virtual attribute value to which the information extracted from that file is written. For example, in the information extraction rule 123 of FIG. 2, the patient ID is specified as the output destination specifying condition, so the nursing record file of each patient is identified, and the position where the information extracted from the file is written is identified from the virtual attribute value column in the table 1530 of the virtual structure data 153.
- the information extraction unit 134 registers the identified file in the related file information 124 as a related file, in association with virtual attribute value specifying information that specifies the position of the virtual attribute value. For example, in the information extraction rule 123 of FIG. 2, the patient ID is specified as the output destination specifying condition, so the nursing record file of each patient is registered in the related file information 124 as a related file associated with the virtual attribute value of that patient.
- the information extraction unit 134 performs information extraction processing on the related files associated in the related file information 124 for each identified virtual attribute value, and writes the extraction result to the virtual structure data 153 as the identified virtual attribute value.
- the information extraction unit 134 also registers the related files in the related file information 124 of the related file information holding unit 135 in association with the information extraction rule. Thereby, the related file information 124 shown in FIG. 4 is held in the related file information holding unit 135.
- the related file information 124 includes a virtual attribute value specifying information column 1240, a related file column 1241, and an information extraction rule column 1242.
- the virtual attribute value specifying information column 1240 stores information for specifying the position of the virtual attribute value of the virtual structure data 153 to which the information extracted from the file is written.
- The related file column 1241 stores, as a related file, information for identifying a file to be extracted.
- Information indicating the information extraction rule 123 is stored in the information extraction rule column 1242.
- For example, the write destination of the virtual attribute value extracted from the related file file1 (the nursing record file of each patient) according to the associated information extraction rule is the position specified by the "concurrent" column in the row of patient name A in the nursing record table 1530 of the virtual structure data 153.
- In this way, the information indicating the related file to be extracted and the information extraction rule can be set in association with each other in the related file information 124 of the related file information holding unit 135. Further, the virtual attribute value is extracted from the specified related file according to the information extraction rule of the related file information 124 and is set at the position indicated by the virtual attribute value specifying information, whereby the virtual structure data 153 is generated.
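The three columns of the related file information 124 map naturally onto a small record type. The sketch below is an assumption-laden illustration: the class `RelatedFileEntry`, its field names, the identifier `rule1`, and the second sample row are hypothetical; only `file1` and patient name A come from the text.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RelatedFileEntry:
    """One row of the related file information 124 (field names are hypothetical)."""

    value_position: Tuple[str, str, str]  # (table, row key, virtual attribute) locating the value
    related_file: str                     # identifies the file the value is extracted from
    rule_id: str                          # identifies the information extraction rule to apply


related_file_info = [
    RelatedFileEntry(("nursing record", "patient A", "concurrent"), "file1", "rule1"),
    RelatedFileEntry(("nursing record", "patient B", "concurrent"), "file2", "rule1"),
]


def entries_for_file(path: str) -> List[RelatedFileEntry]:
    """Look up every virtual attribute value that depends on the given file."""
    return [entry for entry in related_file_info if entry.related_file == path]
```

Indexing by file is what lets a later file update be traced back to exactly the virtual attribute values that need refreshing.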
- When a file is updated, the update detection unit 136 checks whether the updated file matches a related file set in the related file information 124.
- Whether or not a file has been updated is determined based on, for example, whether or not the file update date has changed.
- A file update here includes file deletion.
- If a matching related file exists, the update detection unit 136 executes information extraction processing according to the information extraction rule 123 associated with that related file. Then, the virtual attribute update unit 133 writes the extracted result as the virtual attribute value at the position specified by the output destination specifying condition and the virtual attribute name.
- In this way, the data extracted from the unstructured data is managed as the virtual structure data 153 in combination with the existing structured data, and when the unstructured data is updated, the virtual structure data 153 is also updated so that it holds the latest data.
- As a result, the desired extracted data can be obtained simply by accessing the virtual structure data 153, which reflects the latest state of the unstructured data, without re-running the extraction process on the source unstructured data every time a search is executed on the virtual structure data 153.
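Detecting an update from a changed file update date, with deletion also counted as an update, can be sketched by comparing two snapshots of modification times. This is one possible approach, not the patent's stated implementation; the helper names `snapshot` and `updated_files` and the temporary files are assumptions for the demonstration.

```python
import os
import tempfile


def snapshot(paths):
    """Record the modification time of each file; a missing file maps to None."""
    return {p: (os.path.getmtime(p) if os.path.exists(p) else None) for p in paths}


def updated_files(before, after):
    """Files whose modification time changed; deletion (None) counts as an update."""
    return sorted(p for p in before if before[p] != after.get(p))


# Demonstration on temporary files standing in for files of the second database.
with tempfile.TemporaryDirectory() as d:
    a, b, c = (os.path.join(d, n) for n in ("a.txt", "b.txt", "c.txt"))
    for p in (a, b, c):
        with open(p, "w") as f:
            f.write("initial")
    before = snapshot([a, b, c])
    os.utime(a, (before[a] + 10, before[a] + 10))  # simulate an edit to a.txt
    os.remove(c)                                   # simulate deletion of c.txt
    changed = [os.path.basename(p) for p in updated_files(before, snapshot([a, b, c]))]
```

Polling snapshots is simple but has a detection delay; file system change notifications would be an alternative where they are available.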
- The data management device 101 first executes information extraction rule registration processing, which registers a virtual attribute name, a virtual attribute addition destination, and the like based on the input information extraction rule 123. The data management device 101 then executes virtual attribute initial value determination processing, which extracts data from the information extraction target files according to the information extraction rule 123 and writes the extraction result as a virtual attribute value at the identified position in the table 1530 of the virtual structure data 153. Further, when a file included in the second database 152 is updated, it executes virtual attribute update processing, which updates the virtual attribute corresponding to the updated file. Each process is described in detail below.
- the information extraction rule registration unit 131 extracts the information set in the virtual attribute name and the virtual attribute addition destination included in the information extraction rule 123, and stores the virtual attribute name and the information on the table to which the virtual attribute is added in the related file information holding unit 135 (S102).
- the information extraction rule registration unit 131 identifies the database to which the virtual attribute is added and the table included in that database (S103). Specifically, when database A and table 1 are set as the virtual attribute addition destination of the information extraction rule 123, the information extraction rule registration unit 131 identifies database A as the database to which the virtual attribute is added, and further identifies table 1 included in database A.
- the information extraction rule registration unit 131 adds a column whose column name is the virtual attribute name of the information extraction rule 123 to the table identified in step S103 (S104). Specifically, when the virtual attribute name of the information extraction rule 123 is set to "concurrent", the information extraction rule registration unit 131 adds a column named "concurrent" to table 1 identified in step S103.
- the information extraction unit 134 specifies a file that is a target of information extraction in accordance with the extraction target specifying condition set in the information extraction rule 123 (S201).
- the information extraction unit 134 specifies a file using the information of the output destination specifying condition of the information extraction rule 123, and specifies the position of the virtual attribute value that is the writing destination of the information extracted from the file (S202). Specifically, when the output destination specifying condition is a patient ID, the information extracting unit 134 specifies a nursing record file for each patient. Then, the position to write the virtual attribute value in the table 1530 of the virtual structure data 153 is specified as the destination to write the information extracted from the nursing record file.
- the information extraction unit 134 registers the file identified in step S202 in the related file information 124 as a related file, in association with the virtual attribute value specifying information that specifies the position of the virtual attribute value (S203). Specifically, since the patient ID is specified as the output destination specifying condition in the information extraction rule 123, the information extraction unit 134 registers the nursing record file of each patient in the related file information 124 as a related file associated with the virtual attribute value of that patient.
- the information extraction unit 134 executes information extraction processing for the related files associated with the related file information 124 for each identified virtual attribute value (S204). Subsequently, the information extraction unit 134 writes the result of the extraction process executed in step S204 as a virtual attribute value in the specified writing position of the corresponding table 1530 of the virtual structure data 153 (S205).
- In this way, the information indicating the related file to be extracted and the information extraction rule can be set in association with each other in the related file information 124 of the related file information holding unit 135. Further, the virtual attribute value is extracted from the specified related file according to the information extraction rule of the related file information 124 and is set at the position indicated by the virtual attribute value specifying information, whereby the virtual structure data 153 is generated.
- the update detection unit 136 determines whether a file included in the second database 152 that is a target of information extraction has been updated (S301).
- If it is determined in step S301 that a file has been updated, the update detection unit 136 acquires the related file information 124 stored in the related file information holding unit 135 and checks whether any related file matches the updated file (S302).
- the update detection unit 136 determines whether there is a matching related file in the confirmation in step S302 (S303). If it is determined in step S303 that no matching file exists, the update detection unit 136 repeats the processing from step S301 onward. On the other hand, if it is determined in step S303 that there is a matching file, the update detection unit 136 executes the process of step S304.
- the update detection unit 136 executes information extraction processing on the matching related files according to the information extraction rule 123 corresponding to the related file information 124 (S304). Then, the virtual attribute update unit 133 updates the result extracted by the information extraction process executed in step S304 as the virtual attribute value at the position specified by the output destination specifying condition and the virtual attribute name (S305).
- In this way, the data extracted from the unstructured data is managed as the virtual structure data 153 in combination with the existing structured data.
- When the unstructured data is updated, the virtual structure data 153 is also updated so that it holds the latest data.
- As a result, the desired extracted data can be obtained simply by accessing the virtual structure data 153, which reflects the latest state of the unstructured data, without re-running the extraction process on the source unstructured data every time a search is executed on the virtual structure data 153.
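Steps S302 to S305 can be condensed into one small function that is called once an update has been detected. The sketch below uses in-memory dictionaries in place of the databases; the function name `on_file_updated`, its parameters, the tuple layout, and the sample rule are all assumptions made for illustration.

```python
def on_file_updated(path, read_file, related_info, rules, virtual_table):
    """Sketch of steps S302 to S305 for one updated file (all names are assumptions).

    related_info:  list of (row_key, attribute, related_file, rule_id) tuples
    rules:         maps rule_id to an extraction function that takes the file text
    virtual_table: maps (row_key, attribute) to the current virtual attribute value
    """
    # S302/S303: is the updated file registered as a related file?
    matches = [r for r in related_info if r[2] == path]
    for row_key, attribute, _file, rule_id in matches:
        # S304: re-run extraction; S305: overwrite the virtual attribute value.
        virtual_table[(row_key, attribute)] = rules[rule_id](read_file(path))
    return bool(matches)


# Illustrative in-memory stand-ins for the second database and the registered rule.
files = {"file1": "Patient developed influenza."}
rules = {"rule1": lambda text: "influenza" if "influenza" in text else "-"}
related_info = [("patient A", "concurrent", "file1", "rule1")]
virtual_table = {("patient A", "concurrent"): "influenza"}

files["file1"] = "Symptoms resolved."  # the related file is edited
handled = on_file_updated("file1", files.get, related_info, rules, virtual_table)
ignored = on_file_updated("file9", files.get, related_info, rules, virtual_table)
```

Because extraction runs at update time rather than at query time, a later search only has to read `virtual_table`.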
- the virtual structure data management screen 500 is a screen that a user uses for managing virtual structure data.
- FIG. 8 shows an example of managing a virtual structure database that has an IP address 192.168.1.1 as an access point and is given the name medical information.
- the virtual DB name 501 displays "medical information", indicating the database name, and 192.168.1.1, indicating the IP address.
- In the table name 502, a list of the table names managed as virtual structure data is displayed.
- The table information of the existing structured database that the user has selected to be managed as virtual structure data is displayed side by side.
- In the "concurrent" column of sample 506, either "influenza" or a hyphen indicating "not applicable" is displayed as the extraction result.
- The related file information, that is, the file from which the word or phrase was extracted, is also displayed. At this time, in addition to the file name, the part of the file from which the word was extracted may be displayed. The information extraction rule used to extract the phrase may also be displayed.
- An arbitrary attribute is added as a virtual attribute to data included in the structured first database 151, and an information extraction rule is registered that uses the result of a search query on the second database 152 as the attribute value of the virtual attribute. The files of the second database 152 involved in deriving the result of the search query are stored in association with the information extraction rule. Then, when a related file is updated, the search query is executed again, and the execution result is set as the new attribute value of the virtual attribute.
- As a result, the desired extracted data can be obtained simply by accessing the virtual structure data 153, which reflects the latest state of the unstructured data, without re-running the extraction process on the source unstructured data every time a search is executed on the virtual structure data 153.
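A virtual attribute backed by a search query can be illustrated with a full-text hit count, one of the rule types mentioned earlier in the text. This is a hedged sketch: the class `QueryBackedAttribute`, the substring search, and the sample notes are assumptions, and the re-run policy simply follows the description (re-execute only when a related file is updated).

```python
class QueryBackedAttribute:
    """Virtual attribute whose value is a full-text search hit count (hypothetical class)."""

    def __init__(self, files, keyword):
        self.files = files      # file name -> text; stands in for the second database
        self.keyword = keyword
        self.refresh()

    def refresh(self):
        """Run the search query and remember which files contributed to the result."""
        self.related = {name for name, text in self.files.items() if self.keyword in text}
        self.value = len(self.related)

    def on_update(self, name, new_text):
        """Apply a file update; re-run the query only if a related file changed."""
        was_related = name in self.related
        self.files[name] = new_text
        if was_related:
            self.refresh()
        return was_related


files = {"note1": "fever and cough", "note2": "cough only", "note3": "no symptoms"}
attr = QueryBackedAttribute(files, "cough")
initial_value = attr.value
reran = attr.on_update("note2", "recovered")
```

Note that in this sketch a file that was not previously related is not picked up when it changes; newly relevant files are the concern of the file addition handling described next.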
- the update / addition detection unit 137 has a function of detecting the addition of a file to the second database 152 that manages unstructured data.
- the additional file checking unit 138 has a function of adding the information of the added file to the related file information holding unit 135 and a function of writing the information extracted from the added file to the corresponding virtual attribute value of the structured data.
- the additional file inspection unit 138 receives the location information of the file added to the second database 152 from the update / addition detection unit 137 (S401). Then, the additional file checking unit 138 acquires the information extraction rule 123 from the information extraction rule holding unit 132 (S402).
- the additional file inspection unit 138 acquires the extraction target specifying condition for specifying the file as the information extraction target from the information extraction rule 123 (S403).
- in step S403, for example, when the information extraction rule 123 shown in FIG. 2 is used, database B and the nursing record are extracted as the extraction target specifying condition.
- the additional file inspection unit 138 checks whether the additional file matches the extraction target specifying condition (S404). In this embodiment, it is checked whether the additional file is data added to the database B or a file belonging to the nursing record.
- the additional file inspection unit 138 determines whether the file matches the extraction target specifying condition as a result of the inspection in step S404 (S405). If it is determined in step S405 that the file does not match, the additional file inspection unit 138 ends the process. If it is determined in step S405 that the file matches, the additional file inspection unit 138 executes the process of step S406.
- in step S406, the additional file inspection unit 138 uses the output destination specifying condition of the acquired information extraction rule 123 to specify the position of the virtual attribute value to which the information extracted from the additional file is to be written. Subsequently, the additional file inspection unit 138 associates the additional file with the identified virtual attribute value position as a related file (S407).
- the information extraction unit 134 executes information extraction processing for the related files associated with the related file information 124 for each identified virtual attribute value (S408). Subsequently, the information extraction unit 134 writes the result of the extraction process executed in step S204 as a virtual attribute value in the specified writing position of the corresponding table 1530 of the virtual structure data 153 (S409).
- the update / addition detection unit 137 can also detect updates to the added file. If the result of information extraction according to the information extraction rule 123 corresponding to the related file changes, the process of updating the virtual attribute value in the table 1530 of the virtual structure data 153 is repeated.
- even when it is determined in step S405 that the additional file does not match the extraction target specifying condition, a subsequent update may cause it to match. In that case, the added file may be stored as an unrelated file, and the process shown in FIG. 10 may be executed again when the unrelated file is updated.
- a search query is executed on the unstructured data, information extraction processing is executed on the resulting files, and the extraction result is written to a virtual attribute value that indicates one feature of the data included in the structure data and that can be specified by the information extraction rule.
- an example will be described of a virtual structure data management device that specifies the position of the virtual attribute value to which the information extraction result is written by using attribute values of attributes other than the virtual attribute among the data included in the structure data.
- the data management device 101 according to the present embodiment has the same hardware configuration as in the first embodiment, so a detailed description is omitted. It differs from the first embodiment in that it includes an information extraction rule expansion unit 139 and a structure data acquisition unit 140, as shown in FIG. 11.
- the structure data acquisition unit 140 has a function of acquiring structure data related to the received information extraction rule 123.
- the information extraction rule expansion unit 139 has a function of expanding the information extraction rule 123 using the structure data acquired by the structure data acquisition unit 140.
- the information extraction rule registration unit 131 determines whether the information extraction rule 123 has been received via the communication device 113 or the input device 115 (S501).
- the information extraction rule registration unit 131 acquires the information set in the virtual attribute name and the virtual attribute addition destination of the information extraction rule 123, and stores the virtual attribute name and the information on the table to which the virtual attribute is added in the information extraction rule holding unit 132 (S502). Assume that the patient information table 1510 included in the first database 151 shown in FIG.
- the structure data acquisition unit 140 acquires the attribute value of the attribute that identifies each row of the table 1510 acquired in step S502 (S503).
- the value for identifying each row in the table 1510 is an attribute value that is different between each row included in the table 1510, and is a value that can uniquely identify each row. For example, when the patient names are all different, only the patient name may be used, or when each row is uniquely identified by combining the patient name and the hospitalization date, the combination of the patient name and the hospitalization date may be used. Further, it may be a patient ID set to identify each row of the table 1510.
- the information extraction rule expansion unit 139 adds the identification attribute values that identify each row, acquired in step S503, to the output destination specifying condition of the information extraction rule 123 (S504). As illustrated in FIG. 13, the information extraction rule expansion unit 139 adds the patient name and hospitalization date that identify each row of the table 1510 to the output destination specifying condition of the information extraction rule 123.
- the related file is identified according to the expanded output destination specifying condition. Then, the information specifying the position of the virtual attribute value of the record that contains the attribute values used to expand the output destination specifying condition is associated with the related file.
- the patient names A, B, and C are attribute values for extending the output destination specifying conditions.
- the virtual attribute name is “coincident”
- the files related to the virtual attribute value exist in database B, and a related file that contains a description of Mr. A is associated with the information specifying the position of the virtual attribute of the record whose patient name is A.
- the output destination specifying conditions extended in this way are displayed as extended rules related to related files in the virtual structure data management screen 500 presented to the user in FIG.
- a patient name & hospitalization date @ patient table may be displayed as an extended rule. This means that a file including both the patient name and hospitalization date of the patient table managed as virtual structure data as information is used as the related file.
- the search for unstructured data included nursing records and disease names.
- the nursing record and the disease name are included, the patient name is Mr. C, and the hospitalization date is December 1st.
- the files to be extracted can be further narrowed down.
- the position of the virtual attribute value to which the information extraction result extracted from the unstructured data is written can be specified using attribute values of attributes other than the virtual attribute of the data included in the structure data. As a result, even when the structure data contains a large amount of data, the rule that specifies the write destination of the information extraction result can be described more simply.
- for a virtual attribute of the structure data, the files in the unstructured data that are involved in determining the virtual attribute value are stored in the related file information 124 as related files. Information is then extracted from the related files, and the information extraction result is written as the virtual attribute value.
- when the user wants to know the details of the source of the extracted information, the user can obtain the related file itself and refer to its contents. If there are many related files, however, it becomes difficult for the user to look through all of them.
- the attribute value of the attribute included in the structure data other than the virtual attribute is used to manage the strength of the association with the data for a plurality of related files. Therefore, when there are many related files, the user can refer to a file having a strong connection with the extracted data.
- the structure data acquisition unit 140 has a function of acquiring structure data related to the received information extraction rule 123.
- the relation strength calculation unit 141 has a function of calculating the relation strength between a related file and a virtual attribute value using the structure data acquired by the structure data acquisition unit 140.
- the information extraction rule registration unit 131 associates a related file with a virtual attribute value using the extraction target specifying condition and the output destination specifying condition described in the information extraction rule 123 (S601).
- the structure data acquisition unit 140 acquires an attribute value other than the virtual attribute value of the record associated with the related file in step S601 (S602).
- the relation strength calculation unit 141 calculates the relation strength between the attribute value acquired in step S602 and the related file (S603).
- the relation strength may be the number of times the attribute value acquired in step S602 appears in the related file. If the attribute value is a character string, occurrences of its near-synonyms or synonyms may also be counted. Each attribute value may also be weighted according to whether it is duplicated between records, and the number of appearances multiplied by the weighting coefficient may be used. Further, when a plurality of attribute values are acquired in step S602, structural information in the related file, such as the proximity of the positions at which those attribute values appear, may be used.
- the relation strength calculation unit 141 stores the relation strength calculated by these methods in the related file information 124 for each related file (S604). Specifically, the relation strength calculation unit 141 stores the calculated relation strength (score) in the relation strength (score) column 1243 of the related file information 124 illustrated in FIG. 16 for each related file.
- the relation strength (score) set in steps S603 and S604 is used in response to the user's file request. For example, when the user refers to the related files used as the extraction source in order to investigate the details of the virtual attribute value of “Mr. A, co-occurring”, the related files are presented in order of relation strength: file12.doc, file11.doc, file1.doc.
- an object included in a file is extracted, and the extraction result is registered as a virtual attribute value of data included in the structure data.
- when the file to be extracted is a document, words included in the document and related words, such as near-synonyms and synonyms of those words, can be extracted.
- when the file to be extracted is a moving image, the images and name of the moving image can be extracted.
- beyond the objects explicitly expressed in it, a file to be extracted also contains various information obtained by analyzing its contents, such as the category and class of the file, predictions of information that will appear in the future, and whether the information is positive or negative. Therefore, in the present embodiment, to extract such information, statistics on the information included in the file are acquired, and analysis processing or data mining that makes a determination on the result is performed.
- the statistical calculation unit 142 has a function of performing a statistical calculation defined for information associated with a related file.
- the statistical calculation unit 142 acquires statistical information on the information in one or more related files and performs analysis processing or data mining that makes a determination on the result. By writing the result of the analysis processing or data mining by the statistical calculation unit 142 to the structure data as a virtual attribute value, object information that is not explicitly expressed in the related files can also be structured.
- the statistical calculation unit 142 starts the following processing when the virtual attribute value that is the destination of the information extracted from the unstructured data has been specified, after the information extraction rule 123 is registered or a file of the unstructured data is updated or added.
- the statistical calculation unit 142 acquires a file related to the identified virtual attribute value from the related file information holding unit 135 (S701).
- the statistical calculation unit 142 performs statistical calculation according to a predetermined statistical calculation rule for one or more related files (S702).
- the statistical calculation rule used in step S702 can be exemplified by the statistical calculation rule shown in FIG.
- rule 1 is a rule for calculating the number of words that match words appearing in the dictionary.
- rule 2 is a rule that tabulates the frequency of appearance of words with a positive meaning, such as “can”, “recovery”, and “becomes better”, and words with a negative meaning, such as “can't”, “deteriorates”, and “becomes worse”.
- rule 3 is a rule for counting the number of words belonging to a specific category or class, such as a word related to treatment, a word related to rehabilitation, and a word related to meal.
- after performing the aggregation according to the statistical calculation rules described above, the statistical calculation unit 142 notifies the information extraction unit 134 of the aggregation results (S703).
- the information extraction unit 134 applies the information extraction rule to the statistical calculation result notified in step S703 and writes the resulting information extraction result as the identified virtual attribute value (S704).
- examples of the information extraction rule applied in step S704 include the following. One is a rule that registers the disease name word with the highest appearance frequency.
- One is a rule that compares the number of positive and negative items of information and registers “positive” if positive information is more numerous.
- One is a rule that writes a category name if the number of words in a specific category is large.
- One is a rule that registers a word derived from the plurality of category names that appear.
- statistical calculation may be performed using metadata attached to the file.
- for example, file creator information, updater information, and personal information such as persons appearing in the file may be used.
- for example, only files created or updated by a specific person may be subject to statistical calculation. Statistical calculation then uses only files created or updated by a reliable person, which can improve the accuracy of the information.
- Metadata other than personal information may also be used.
- file creation time or update time, time information included in the file, or the like may be used.
- the tendency of numerical change may be extracted from the time information attached to the file and the numerical information in the file, and the future numerical value may be extracted as the predicted value.
- various metadata such as position information, language information, color information, right information, access right information, or version information may be used.
- the data that is the target of information extraction is unstructured data, but the data that is the target of information extraction may be arbitrary data including structural data.
- an arbitrary target data group is divided into appropriate partial data. Then, the divided partial data is handled in the same manner as the related file described above, and the update of the partial data is detected.
- when the partial data is updated, the result obtained by applying the information extraction rule to the partial data is written as the updated virtual attribute value of the virtual structure data.
- the present invention is not limited to the above-described embodiment, and includes various modifications.
- the above-described embodiments have been described in detail for easy understanding of the present invention, and the invention is not necessarily limited to embodiments having all the configurations described. Part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to that of one embodiment. Other configurations can also be added to, deleted from, or substituted for part of the configuration of each embodiment.
- each of the above-described configurations, functions, processing units, processing means, and the like may be realized by hardware by designing a part or all of them with, for example, an integrated circuit.
- Each of the above-described configurations, functions, and the like may be realized by software by interpreting and executing a program that realizes each function by the processor.
- Information such as programs, tables, and files that realize each function can be stored in a memory, a hard disk, a recording device such as an SSD (Solid State Drive), or a recording medium such as an IC card, an SD card, or a DVD.
- the control lines and information lines are those that are considered necessary for the explanation, and not all the control lines and information lines on the product are necessarily shown. Actually, it may be considered that almost all the components are connected to each other.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Public Health (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Medical Informatics (AREA)
- Epidemiology (AREA)
- Business, Economics & Management (AREA)
- General Business, Economics & Management (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
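(1-1) Configuration of the data management device
First, the hardware configuration of the data management device 101 will be described with reference to FIG. 1. As shown in FIG. 1, the data management device 101 includes a memory 111, a CPU 112, a communication device 113, a storage device 114, an input device 115, a display device 116, and the like.
First, the structure data and unstructured data managed by the data management device 101 will be described. Structure data is described using a relational database as an example of data that has a structure of attributes and attribute values. In a relational database, data is represented as records and attributes are represented by column names. An attribute value is written to the cell corresponding to a specific attribute in a record. Unstructured data is described using, as examples, files that contain document information, image information, video information, audio information, or the like.
Next, the operation of the data management device 101 will be described in detail. The data management device 101 first executes information extraction rule registration processing, which registers the virtual attribute name, the virtual attribute addition destination, and so on based on the input information extraction rule 123. Then, in accordance with the information extraction rule 123, the data management device 101 executes virtual attribute value initial value determination processing, which extracts data from the files targeted for information extraction and writes the extraction result as a virtual attribute value at the specified position in the write-destination table 1530 of the virtual structure data 153. Furthermore, when a file included in the second database 152 is updated, it executes virtual attribute update processing, which updates the virtual attribute corresponding to the updated file. Each process is described in detail below.
The information extraction rule registration processing will be described in detail with reference to FIG. 5. As shown in FIG. 5, the information extraction rule registration unit 131 determines whether the information extraction rule 123 has been received via the communication device 113 or the input device 115 (S101).
Next, the virtual attribute value initial value determination processing will be described in detail with reference to FIG. 6. As shown in FIG. 6, the information extraction unit 134 identifies the files targeted for information extraction in accordance with the extraction target specifying condition set in the information extraction rule 123 (S201).
Next, the virtual attribute update processing will be described in detail with reference to FIG. 7. As shown in FIG. 7, the update detection unit 136 determines whether a file included in the second database 152, which is the target of information extraction, has been updated (S301).
Next, the virtual structure data management screen 500 will be described with reference to FIG. 8. The virtual structure data management screen 500 is the screen that the user uses to manage virtual structure data. FIG. 8 shows an example of managing a virtual structure database that has the IP address 192.168.1.1 as its access point and is named "medical information".
As described above, according to the present embodiment, an arbitrary attribute is added as a virtual attribute to data included in the structured first database 151; an information extraction rule that sets the attribute value of the virtual attribute to the result of a search query on the second database 152 is registered; and the files of the second database 152 involved in deriving the result of the search query are stored as related files in association with the information extraction rule. Then, when a related file is updated, the search query is executed again, and the execution result is set as the new attribute value of the virtual attribute.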
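The update cycle of the first embodiment can be illustrated with a minimal Python sketch. It is not the patented implementation: the class names `ExtractionRule` and `VirtualTable`, the `DISEASES` dictionary, and the substring-based matching are all assumptions made for illustration. It shows a virtual attribute whose value comes from a query over files, the related files being remembered per row, and the query being re-run only for rows whose related files were updated.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

# Hypothetical disease dictionary used by the example extraction function.
DISEASES = ("pneumonia", "diabetes")


def extract_diseases(texts: List[str]) -> str:
    """Return the dictionary words found in the texts, comma-separated."""
    found = [d for d in DISEASES if any(d in t for t in texts)]
    return ", ".join(found)


@dataclass
class ExtractionRule:
    virtual_attribute: str                   # name of the virtual column
    matches: Callable[[str, str], bool]      # (row key, file text) -> relevant?
    extract: Callable[[List[str]], str]      # related file texts -> value


@dataclass
class VirtualTable:
    rows: Dict[str, Dict[str, str]]          # structured data, keyed by row key
    files: Dict[str, str]                    # unstructured data: name -> text
    rule: ExtractionRule
    related: Dict[str, Set[str]] = field(default_factory=dict)

    def refresh(self, row_key: str) -> None:
        """Run the query for one row, remember its related files, store the value."""
        names = sorted(n for n, t in self.files.items()
                       if self.rule.matches(row_key, t))
        self.related[row_key] = set(names)
        self.rows[row_key][self.rule.virtual_attribute] = self.rule.extract(
            [self.files[n] for n in names])

    def initialise(self) -> None:
        """Determine the initial virtual attribute value of every row."""
        for key in self.rows:
            self.refresh(key)

    def on_file_updated(self, name: str, text: str) -> None:
        """Re-run the query only for rows whose related files include `name`."""
        self.files[name] = text
        for key, names in list(self.related.items()):
            if name in names:
                self.refresh(key)


rule = ExtractionRule("complication", lambda key, text: key in text, extract_diseases)
table = VirtualTable(
    rows={"patient-A": {}, "patient-B": {}},
    files={"file1.doc": "patient-A pneumonia", "file2.doc": "patient-B diabetes"},
    rule=rule,
)
table.initialise()
table.on_file_updated("file1.doc", "patient-A diabetes")
```

Reading `table.rows` afterwards requires no extraction work, which is the point of keeping the value in the virtual attribute rather than re-querying the files on every search.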
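The following describes the case where, in addition to files being updated or deleted, a newly created file is added to the second database 152. When a new file is added, a virtual attribute value in the table 1510 included in the first database 151 may change. In the present embodiment, therefore, the virtual attribute values affected by the added file are identified.
The data management device 101 according to the present embodiment has the same hardware configuration as in the first embodiment, so a detailed description is omitted. It differs from the first embodiment in that it includes an update/addition detection unit 137 and an additional file inspection unit 138, as shown in FIG. 9.
As shown in FIG. 10, the additional file inspection unit 138 first receives, from the addition detection unit 137, the location information of the file added to the second database 152 (S401). The additional file inspection unit 138 then acquires the information extraction rule 123 from the information extraction rule holding unit 132 (S402).
As described above, according to the present embodiment, even when a new file is added to the unstructured data, the user can search structure data that reflects the latest information extractable from that new file. In addition, as in the first embodiment, information extraction from the unstructured data need not be executed every time the user searches the structure data, so the time needed to obtain search results can be shortened.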
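The handling of an added file (steps S401 to S409) can be sketched roughly as follows. This is an illustrative assumption rather than the device's actual logic: the `Rule` and `AddedFile` types, the `DISEASES` dictionary, and the use of a substring test as the output destination specifying condition are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical disease dictionary used by the example extraction function.
DISEASES = ("pneumonia", "diabetes")


def extract_diseases(texts: List[str]) -> str:
    """Return the dictionary words found in the texts, comma-separated."""
    return ", ".join(d for d in DISEASES if any(d in t for t in texts))


@dataclass
class AddedFile:
    database: str
    category: str
    name: str
    text: str


@dataclass
class Rule:
    target_database: str     # extraction target condition, e.g. database B
    target_category: str     # extraction target condition, e.g. nursing record
    virtual_attribute: str


def matches_target(rule: Rule, added: AddedFile) -> bool:
    """S404/S405: does the added file satisfy the extraction target condition?"""
    return (added.database == rule.target_database
            and added.category == rule.target_category)


def handle_added_file(rule: Rule, added: AddedFile,
                      rows: Dict[str, Dict[str, str]],
                      related: Dict[str, List[AddedFile]],
                      unrelated: List[str],
                      extract: Callable[[List[str]], str]) -> List[str]:
    """Return the keys of the rows whose virtual attribute value was rewritten."""
    if not matches_target(rule, added):
        unrelated.append(added.name)   # kept so a later update can be re-checked
        return []
    written = []
    for key, row in rows.items():
        # S406 (assumed condition): the row key appears in the file text.
        if key in added.text:
            related.setdefault(key, []).append(added)               # S407
            row[rule.virtual_attribute] = extract(                  # S408, S409
                [f.text for f in related[key]])
            written.append(key)
    return written
```

Files that fail the check are recorded in `unrelated` so that they can be inspected again if they are updated later.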
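In the following, as in the first embodiment, a search query is executed on the unstructured data, information extraction processing is executed on the resulting files, and the extraction result is written to a virtual attribute value that indicates one feature of the data included in the structure data and that can be specified by the information extraction rule. When the structure data contains a large amount of data, it may be difficult to uniquely specify the position of the virtual attribute value to which the information extraction result is written.
The data management device 101 according to the present embodiment has the same hardware configuration as in the first embodiment, so a detailed description is omitted. It differs from the first embodiment in that it includes an information extraction rule expansion unit 139 and a structure data acquisition unit 140, as shown in FIG. 11.
With reference to FIG. 12, the processing that expands the information extraction rule when the information extraction rule 123 is given will be described.
As described above, according to the present embodiment, the position of the virtual attribute value to which the information extraction result extracted from the unstructured data is written can be specified using attribute values of attributes other than the virtual attribute of the data included in the structure data. As a result, even when the structure data contains a large amount of data, the rule that specifies the write destination of the information extraction result can be described more simply.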
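One way to picture the rule expansion is the sketch below. It assumes the output destination specifying condition can be represented as a list of attribute names and that a file relates to a row when it contains all of that row's identifying values. The function names, the column names, and the year in the sample dates are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]


def identifying_attributes(rows: Sequence[Row],
                           candidates: Sequence[Sequence[str]]) -> Tuple[str, ...]:
    """Return the first candidate attribute combination that is unique per row."""
    for combo in candidates:
        keys = [tuple(row[a] for a in combo) for row in rows]
        if len(set(keys)) == len(keys):
            return tuple(combo)
    raise ValueError("no candidate combination identifies every row uniquely")


def expand_output_condition(condition: List[str], rows: Sequence[Row],
                            candidates: Sequence[Sequence[str]]) -> List[str]:
    """Append the identifying attributes to the output destination condition."""
    extra = identifying_attributes(rows, candidates)
    return condition + [a for a in extra if a not in condition]


def rows_for_file(file_text: str, rows: Sequence[Row],
                  attributes: Sequence[str]) -> List[int]:
    """Indexes of the rows whose identifying values all appear in the file."""
    return [i for i, row in enumerate(rows)
            if all(row[a] in file_text for a in attributes)]


patients = [
    {"patient_name": "Mr. A", "admission_date": "2012-11-01"},
    {"patient_name": "Mr. C", "admission_date": "2012-12-01"},
    {"patient_name": "Mr. C", "admission_date": "2013-01-15"},
]
condition = expand_output_condition(
    [], patients, [("patient_name",), ("patient_name", "admission_date")])
hits = rows_for_file("nursing record: Mr. C, admitted 2012-12-01, pneumonia",
                     patients, condition)
```

Because two rows share the name "Mr. C", the name alone is rejected and the name plus admission date combination is chosen, mirroring the "patient name & hospitalization date" rule described in the text.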
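In the first embodiment, for a virtual attribute of the structure data, the files in the unstructured data that are involved in determining the virtual attribute value are stored in the related file information 124 as related files. Information is then extracted from the related files, and the information extraction result is written as the virtual attribute value. When the user wants to know the details of the source of the extracted information, the user can obtain the related file itself and refer to its contents. If there are many related files, however, it becomes difficult for the user to look through all of them.
The data management device 101 according to the present embodiment has the same hardware configuration as in the first embodiment, so a detailed description is omitted. It differs from the first embodiment in that it includes a structure data acquisition unit 140 and a relation strength calculation unit 141, as shown in FIG. 14.
With reference to FIG. 15, the processing that identifies the related files and at the same time calculates the relation strength between each related file and the virtual attribute value will be described.
As described above, according to the present embodiment, when there are multiple related files, they can be sorted in order of the strength of their association with the data included in the source structure data and presented to the user. Thus, when the user refers to related files, the association strength serves as a hint for identifying which of the related files to refer to first.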
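As a rough illustration of the relation strength, the following sketch scores each related file by counting how often the record's non-virtual attribute values appear in it, optionally weighted, and sorts the files by that score. The function names, sample texts, and weighting scheme are assumptions; synonym matching and positional proximity, which the description also mentions, are left out.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def relation_strength(file_text: str, attribute_values: Sequence[str],
                      weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted count of non-overlapping occurrences of each attribute value."""
    weights = weights or {}
    return sum(file_text.count(v) * weights.get(v, 1.0) for v in attribute_values)


def rank_related_files(files: Dict[str, str], attribute_values: Sequence[str],
                       weights: Optional[Dict[str, float]] = None
                       ) -> List[Tuple[str, float]]:
    """Return (file name, score) pairs, strongest relation first."""
    scored = [(name, relation_strength(text, attribute_values, weights))
              for name, text in files.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


related_files = {
    "file1.doc": "Mr. A was seen.",
    "file11.doc": "Mr. A, admitted 2012-11-01.",
    "file12.doc": "Mr. A, admitted 2012-11-01. Mr. A recovered.",
}
ranking = rank_related_files(related_files, ["Mr. A", "2012-11-01"])
```

A weight below 1.0 could be given to values that are duplicated across records, so that values shared by many rows contribute less to the score.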
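In the first embodiment, objects included in a file are extracted, and the extraction result is registered as a virtual attribute value of the data included in the structure data. When the file to be extracted is a document, words included in the document and related words, such as near-synonyms and synonyms of those words, can be extracted. When the file to be extracted is a moving image, the images and name of the moving image can be extracted. Beyond the objects explicitly expressed in it, a file to be extracted also contains various information obtained by analyzing its contents, such as the category and class of the file, predictions of information that will appear in the future, and whether the information is positive or negative. Therefore, in the present embodiment, to extract such information, statistics on the information included in the files are acquired, and analysis processing or data mining that makes a determination on the result is performed.
The data management device 101 according to the present embodiment has the same hardware configuration as in the first embodiment, so a detailed description is omitted. It differs from the first embodiment in that it includes a statistical calculation unit 142, as shown in FIG. 17.
With reference to FIG. 18, information extraction processing that uses statistical information on related files when extracting information from unstructured data will be described.
As described above, according to the present embodiment, information on objects that is not explicitly expressed in the files of the unstructured data can be structured and managed as virtual attribute values of the data included in the structure data.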
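The statistical step can be sketched as word counting followed by a decision rule. The word lists below are hypothetical stand-ins for the positive and negative vocabularies, the tokenizer is deliberately simple, and returning "neutral" on a tie is an assumption that the description does not specify.

```python
import re
from collections import Counter
from typing import Iterable, List, Optional, Set

# Hypothetical vocabularies standing in for the positive and negative word lists.
POSITIVE = {"can", "recovery", "better"}
NEGATIVE = {"can't", "deteriorates", "worse"}


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into words (apostrophes are kept)."""
    return re.findall(r"[a-z']+", text.lower())


def polarity(texts: Iterable[str]) -> str:
    """Compare positive and negative word counts over all related files."""
    counts = Counter(word for text in texts for word in tokenize(text))
    positive = sum(counts[w] for w in POSITIVE)
    negative = sum(counts[w] for w in NEGATIVE)
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "neutral"   # assumed tie handling


def most_frequent_term(texts: Iterable[str], dictionary: Set[str]) -> Optional[str]:
    """Return the dictionary word that appears most often, or None if none appear."""
    counts = Counter(word for text in texts for word in tokenize(text)
                     if word in dictionary)
    return counts.most_common(1)[0][0] if counts else None
```

The value returned by `polarity` or `most_frequent_term` would then be written as the virtual attribute value, in the same way as a directly extracted word.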
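In the above embodiments, the data targeted for information extraction is unstructured data, but it may be arbitrary data, including structure data. In that case, the arbitrary target data group is divided into appropriate partial data. The divided partial data is then handled in the same way as the related files described above, and updates to the partial data are detected. When the partial data is updated, the result obtained by applying the information extraction rule to the partial data is written as the updated virtual attribute value of the virtual structure data.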
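The description does not say how updates to partial data are detected. The sketch below assumes one possible approach, comparing a content hash per partition, and uses hypothetical names (`partition`, `fingerprint`, `changed_partitions`) and a hypothetical `ward` column as the split key.

```python
import hashlib
import json
from typing import Dict, List

Record = Dict[str, str]


def partition(records: List[Record], key: str) -> Dict[str, List[Record]]:
    """Split the data group into partial data by the value of one column."""
    parts: Dict[str, List[Record]] = {}
    for record in records:
        parts.setdefault(record[key], []).append(record)
    return parts


def fingerprint(part: List[Record]) -> str:
    """Content hash of one partition (an assumed update-detection mechanism)."""
    payload = json.dumps(part, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def changed_partitions(previous: Dict[str, str],
                       parts: Dict[str, List[Record]]) -> List[str]:
    """Names of partitions that are new or whose content hash has changed."""
    return sorted(name for name, part in parts.items()
                  if previous.get(name) != fingerprint(part))
```

Only the partitions returned by `changed_partitions` would need the information extraction rule applied again, just as only updated related files trigger re-extraction.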
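111 Memory
112 CPU
113 Communication device
114 Storage device
115 Input device
116 Display device
131 Information extraction rule registration unit
132 Information extraction rule holding unit
133 Virtual attribute update unit
134 Information extraction unit
135 Related file information holding unit
136 Update detection unit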
Claims (15)
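- A data management device comprising:
a storage unit that stores a first database holding structure data in which a plurality of features of data are structured by attributes and attribute values, and a second database holding unstructured data in units of files; and
a control unit that combines the structure data and the unstructured data and manages them as virtual structure data accessed when a search query on the second database is executed, sets the attribute value of a virtual attribute of the virtual structure data to a value extracted from a file of the second database by a predetermined information extraction rule, and updates the attribute value of the virtual attribute of the virtual structure data when a file of the second database holding the unstructured data is updated.
- A data management device, wherein the control unit:
generates virtual structure data by adding the attribute value of the virtual attribute to data included in the first database, registers an information extraction rule that sets the attribute value of the virtual attribute to the result of a search query on the second database, and stores the files of the second database involved in deriving the result of the search query as related files in association with the information extraction rule; and
when a related file is updated, executes the search query again and sets the execution result as the new attribute value of the virtual attribute.
- The data management device according to claim 1, wherein the control unit:
when a file is newly added to the second database, checks whether the added file matches the condition of the search query indicated in the information extraction rule and, if it matches, executes the search query again and sets the execution result as the new attribute value of the virtual attribute.
- The data management device according to claim 1, wherein the control unit:
takes the search query that searches for the attribute value of the virtual attribute as a first query;
forms a second search query by adding to the first query, as a condition for searching for the attribute value of the virtual attribute, an attribute value of an attribute that the data has other than the virtual attribute; and
registers an information extraction rule that sets the attribute value of the virtual attribute to the result of the second search query.
- The data management device according to claim 2, wherein the control unit:
counts the number of attribute values of attributes of the data other than the virtual attribute that are included; and
stores, in association with the related file, the strength of the association between the data and the related file according to the counted number.
- The data management device according to claim 1, wherein the control unit:
for the search results on the second database, calculates statistical information by counting the number of specific objects appearing in the files of the search results;
manages mapping information for deriving a specific value according to the counted number of objects; and
sets the derived value as the attribute value of the virtual attribute.
- The data management device according to claim 6, wherein the control unit:
acquires person information associated with the related file, such as creator information of the related file, updater information, and information on persons included in the file; and
combines the person information acquired for the related file with the statistical information of the objects extracted from the related file, and sets the combined person and object statistical information as attribute value information of the virtual attribute.
- The data management device according to claim 6, wherein the control unit:
acquires time information such as the creation date and time of the related files, the update date and time, the date and time of registration in the second database, and time information included in the files; and
sorts the related files in order of the acquired time information, counts the number of specific objects included in the related files, compares the counted numbers of objects among the related files to extract the change over time in the number of object appearances, and sets the result as trend information of the virtual attribute.
- The data management device according to any one of claims 1 to 8, wherein the control unit:
manages, in combination with the second database that holds data in units of files, an arbitrary database that holds data divided into specific partitions;
registers an extraction rule that uses the result of a search query on the arbitrary database;
stores the specific partitions of the arbitrary database involved in deriving the result of the search query as related partitions, in the same manner as the related files; and
when a related partition is updated, executes the search query again and sets the execution result as the new attribute value of the virtual attribute.
- A data management method in a data management device comprising a storage unit that stores a first database holding structure data in which a plurality of features of data are structured by attributes and attribute values and a second database holding unstructured data in units of files, and a control unit that combines the structure data and the unstructured data and manages them as virtual structure data accessed when a search query on the second database is executed, the method comprising:
a first step in which the control unit sets the attribute value of a virtual attribute of the virtual structure data to a value extracted from a file of the second database by a predetermined information extraction rule; and
a second step in which the control unit updates the attribute value of the virtual attribute of the virtual structure data when a file of the second database holding the unstructured data is updated.
- A data management method comprising:
a third step in which the control unit generates virtual structure data by adding the virtual attribute to data included in the first database;
a fourth step in which the control unit registers an information extraction rule that sets the attribute value of the virtual attribute to the result of a search query on the second database;
a fifth step of storing the files of the second database involved in deriving the result of the search query as related files in association with the information extraction rule; and
a sixth step of, when a related file is updated, executing the search query again and setting the execution result as the new attribute value of the virtual attribute.
- The data management method according to claim 9, comprising a seventh step in which the control unit, in the sixth step, when a file is newly added to the second database, checks whether the added file matches the condition of the search query indicated in the information extraction rule and, if it matches, executes the search query again and sets the execution result as the new attribute value of the virtual attribute.
- The data management method according to claim 9, comprising an eighth step in which the control unit, in the fourth step, takes the search query that searches for the attribute value of the virtual attribute as a first query, forms a second search query by adding to the first query, as a condition for searching for the attribute value of the virtual attribute, an attribute value of an attribute that the data has other than the virtual attribute, and registers an information extraction rule that sets the attribute value of the virtual attribute to the result of the second search query.
- The data management method according to claim 9, comprising a ninth step in which the control unit, in the fifth step, counts the number of attribute values of attributes of the data other than the virtual attribute that are included, and stores, in association with the related file, the strength of the association between the data and the related file according to the counted number.
- A non-transitory recording medium that records a program for causing a computer to function as a data management device comprising:
a storage unit that stores a first database holding structure data in which a plurality of features of data are structured by attributes and attribute values, and a second database holding unstructured data in units of files; and
a control unit that combines the structure data and the unstructured data and manages them as virtual structure data accessed when a search query on the second database is executed, sets the attribute value of a virtual attribute of the virtual structure data to a value extracted from a file of the second database by a predetermined information extraction rule, and updates the attribute value of the virtual attribute of the virtual structure data when a file of the second database holding the unstructured data is updated.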
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
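PCT/JP2013/060712 WO2014167647A1 (ja) | 2013-04-09 | 2013-04-09 | Data management apparatus, data management method and non-transitory recording medium |
JP2015510993A JP6042974B2 (ja) | 2013-04-09 | 2013-04-09 | Data management apparatus, data management method and non-transitory recording medium |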
US14/782,237 US20160041992A1 (en) | 2013-04-09 | 2013-04-09 | Data management apparatus, data management method and non-transitory recording medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2013/060712 WO2014167647A1 (ja) | 2013-04-09 | 2013-04-09 | Data management apparatus, data management method and non-transitory recording medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2014167647A1 true WO2014167647A1 (ja) | 2014-10-16 |
Family
ID=51689083
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2013/060712 WO2014167647A1 (ja) | 2013-04-09 | 2013-04-09 | データ管理装置、データ管理方法及び非一時的な記録媒体 |
Country Status (3)
Country | Link |
---|---|
US (1) | US20160041992A1 (ja) |
JP (1) | JP6042974B2 (ja) |
WO (1) | WO2014167647A1 (ja) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101601916B1 (ko) * | 2014-04-30 | 2016-03-21 | 울산과학기술원 | 외래진료에 대한 프로세스 시뮬레이션 모델 도출 시스템 및 이를 이용한 프로세스 시뮬레이션 모델 도출 방법 |
US20170031966A1 (en) * | 2015-07-29 | 2017-02-02 | International Business Machines Corporation | Ingredient based nutritional information |
US10521464B2 (en) * | 2015-12-10 | 2019-12-31 | Agile Data Decisions, Llc | Method and system for extracting, verifying and cataloging technical information from unstructured documents |
US10956467B1 (en) * | 2016-08-22 | 2021-03-23 | Jpmorgan Chase Bank, N.A. | Method and system for implementing a query tool for unstructured data files |
US10877944B2 (en) * | 2019-05-08 | 2020-12-29 | Atlassian Pty Ltd. | External data repository file integration using a virtual file system |
CN111177156B (zh) * | 2019-12-31 | 2023-10-03 | 广东科学技术职业学院 | 一种大数据存储方法及系统 |
JP2021189569A (ja) * | 2020-05-26 | 2021-12-13 | 富士通株式会社 | データ更新プログラム、データ更新装置及びデータ更新方法 |
CN112765712A (zh) * | 2021-01-20 | 2021-05-07 | 广联达科技股份有限公司 | Bim数据的结构化管理方法、装置、计算机设备及存储介质 |
CN113705415B (zh) * | 2021-08-23 | 2023-10-27 | 中国电子科技集团公司第十五研究所 | 基于雷达情报的空情目标特征提取方法及装置 |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2010282241A (ja) * | 2007-08-20 | 2010-12-16 | Nec Corp | ファイル管理装置、ファイル管理システム、ファイル管理方法、および、プログラム |
JP2012515407A (ja) * | 2009-01-16 | 2012-07-05 | グーグル・インコーポレーテッド | 非構造化電子文書コレクションからの情報の取り出しおよび表示 |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3708146B2 (ja) * | 1994-10-14 | 2005-10-19 | 富士通株式会社 | ファイルシステムおよびそのファイルシステムで管理される情報の属性構造 |
US8200775B2 (en) * | 2005-02-01 | 2012-06-12 | Newsilike Media Group, Inc | Enhanced syndication |
US8347088B2 (en) * | 2005-02-01 | 2013-01-01 | Newsilike Media Group, Inc | Security systems and methods for use with structured and unstructured data |
US20080275731A1 (en) * | 2005-05-18 | 2008-11-06 | Rao R Bharat | Patient data mining improvements |
JP2007199315A (ja) * | 2006-01-25 | 2007-08-09 | Ntt Software Corp | コンテンツ提供装置 |
JP2010211438A (ja) * | 2009-03-10 | 2010-09-24 | Hitachi Ltd | 文書検索装置及び文書検索方法 |
JP5485866B2 (ja) * | 2010-12-28 | 2014-05-07 | 株式会社日立ソリューションズ | 情報管理方法、及び情報提供用計算機 |
-
2013
- 2013-04-09 US US14/782,237 patent/US20160041992A1/en not_active Abandoned
- 2013-04-09 JP JP2015510993A patent/JP6042974B2/ja not_active Expired - Fee Related
- 2013-04-09 WO PCT/JP2013/060712 patent/WO2014167647A1/ja active Application Filing
Also Published As
Publication number | Publication date |
---|---|
JP6042974B2 (ja) | 2016-12-14 |
US20160041992A1 (en) | 2016-02-11 |
JPWO2014167647A1 (ja) | 2017-02-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6042974B2 (ja) | データ管理装置、データ管理方法及び非一時的な記録媒体 | |
US9836389B2 (en) | Test data generation utilizing analytics | |
CN103631847A (zh) | 基于上下文的搜索与图形节点相关的数据存储的方法和系统 | |
CA2816781C (en) | Identifying client states | |
US20150039984A1 (en) | Table format multi-dimensional data translation method and device | |
WO2017065891A1 (en) | Automated join detection | |
CN117409922A (zh) | 一种用于临床辅助决策的循证方法 | |
JP7324058B2 (ja) | 文章解析方法、文章解析プログラム、および文章解析システム | |
JP2007334412A (ja) | 検索プログラムおよび検索装置 | |
WO2015124086A1 (en) | Virus signature matching method and apparatus | |
JP6075013B2 (ja) | ログ取得プログラム、ログ取得装置及びログ取得方法 | |
US8302045B2 (en) | Electronic device and method for inspecting electrical rules of circuit boards | |
JP2019148859A (ja) | フローダイアグラムを用いたモデル開発環境におけるデザインパターンの発見を支援する装置および方法 | |
JP5826148B2 (ja) | 図面管理サーバ及びこれを用いた図面管理システム | |
JP5020274B2 (ja) | 意味ドリフトの発生評価方法及び装置 | |
KR20140123000A (ko) | 연상 메모리 내의 문맥적 결과를 식별하기 위한 시스템 및 방법 | |
JP2015094988A (ja) | データ構造、データ生成装置、その方法及びプログラム | |
JP6375066B2 (ja) | 解析支援システム及び解析支援方法 | |
US11886459B2 (en) | Data management system and data management method | |
JP2013149068A (ja) | ファイル間の関連性の解析方法及びシステム並びにプログラム | |
JP7314089B2 (ja) | 検索支援システム、及び検索支援方法 | |
US11151158B2 (en) | Data duplication device and computer readable medium | |
JP7119411B2 (ja) | データベース装置、データ管理方法、及びコンピュータ・プログラム | |
JP2018055522A (ja) | 計装図データ生成装置、計装図検索システム及びプログラム | |
CN117688124A (zh) | 数据查询索引创建方法、装置、存储介质及电子设备 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 13882005; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2015510993; Country of ref document: JP; Kind code of ref document: A |
| | WWE | Wipo information: entry into national phase | Ref document number: 14782237; Country of ref document: US |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 13882005; Country of ref document: EP; Kind code of ref document: A1 |