WO2015037498A1 - Digital information analysis system, digital information analysis method, and digital information analysis program - Google Patents

Digital information analysis system, digital information analysis method, and digital information analysis program

Info

Publication number
WO2015037498A1
WO2015037498A1 PCT/JP2014/073260 JP2014073260W WO2015037498A1 WO 2015037498 A1 WO2015037498 A1 WO 2015037498A1 JP 2014073260 W JP2014073260 W JP 2014073260W WO 2015037498 A1 WO2015037498 A1 WO 2015037498A1
Authority
WO
WIPO (PCT)
Prior art keywords
digital information
information
metadata
relationship
unit
Prior art date
Application number
PCT/JP2014/073260
Other languages
French (fr)
Japanese (ja)
Inventor
守本 正宏
秀樹 武田
和巳 蓮子
Original Assignee
株式会社Ubic
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社Ubic
Publication of WO2015037498A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually

Definitions

  • the present invention relates to a digital information analysis system, a digital information analysis method, and a digital information analysis program.
  • the present invention relates to a digital information analysis system, a digital information analysis method, and a digital information analysis program that can improve the accuracy of analysis by further using metadata associated with digital information.
  • conventionally, a system is known that displays recorded digital information and sets, for each of a plurality of document files, user identification information indicating which of the users included in the user information the document file relates to.
  • the system records the set user identification information in a storage unit, accepts designation of at least one user, and searches for document files in which the user identification information corresponding to the designated user is set.
  • the system then sets, via a display unit, incidental information indicating whether or not each retrieved document file is related to a lawsuit, and outputs the document files related to the lawsuit based on the incidental information (see, for example, Patent Document 1).
  • according to the system described in Patent Document 1, it is possible to extract only digital document information related to a specific person and to reduce the workload of preparing evidence for litigation.
  • an object of the present invention is to provide a digital information analysis system, a digital information analysis method, and a digital information analysis program that can improve the accuracy of analysis using metadata.
  • the present invention provides a digital information analysis system including: a metadata acquisition unit that acquires metadata associated with digital information stored in an information processing apparatus; a first updating unit that updates, based on the relationship between the metadata and first digital information related to a predetermined specific item, a weighting parameter set that associates, for a predetermined morpheme, weighting information of the predetermined morpheme for the first digital information with weighting information of the predetermined morpheme for second digital information not related to the predetermined specific item; and a second update unit that updates the relevance between the predetermined morpheme and the digital information using the weighting parameter set updated by the first updating unit.
  • the digital information analysis system further includes a parameter set acquisition unit that acquires a weighting parameter set, and a relationship information storage unit that stores relationship information indicating a relationship between the metadata and the first digital information.
  • the digital information is target digital information to be investigated that is stored in the information processing apparatus, and the metadata acquisition unit acquires at least one metadata among a plurality of metadata associated with the target digital information.
  • the first update unit may update the weighting parameter set based on the relationship information stored in the relationship information storage unit in association with the metadata acquired by the metadata acquisition unit, and the second update unit may use the weighting parameter set to update the strength information indicating the strength of the relationship between the predetermined morpheme and the target digital information.
  • the weighting information of the predetermined morpheme may include appearance frequency information indicating a frequency at which the predetermined morpheme appears in the first digital information or the second digital information.
  • the metadata may be structural metadata or descriptive metadata.
  • the parameter set acquisition unit may acquire scoring parameters including strength information associated in advance with each of a plurality of morphemes.
  • the first update unit may update the scoring parameters acquired by the parameter set acquisition unit based on the relationship information stored in the relationship information storage unit in association with the metadata acquired by the metadata acquisition unit.
  • the second update unit may update the strength information using the scoring parameters updated by the first update unit.
  • the digital information analysis system may further include a relevance determination unit that determines the relevance of the target digital information to the predetermined specific item based on a result of morphological analysis of the target digital information and the strength information, and a determination result setting unit that associates the determination result of the relevance determination unit with the target digital information.
  • the predetermined specific item may be information indicating that the digital information is related to a lawsuit.
  • the present invention also provides a digital information analysis method including: a metadata acquisition stage of acquiring metadata associated with digital information stored in an information processing apparatus; a first update stage of updating, based on the relationship between the metadata and first digital information related to a predetermined specific item, a weighting parameter set that associates, for a predetermined morpheme, weighting information of the predetermined morpheme for the first digital information with weighting information of the predetermined morpheme for second digital information not related to the predetermined specific item; and a second update stage of updating the relevance between the predetermined morpheme and the digital information using the weighting parameter set updated in the first update stage.
  • the present invention also provides a digital information analysis program that causes a computer to implement: a metadata acquisition function of acquiring metadata associated with digital information stored in an information processing apparatus; a first updating function of updating, based on the relationship between the metadata and first digital information related to a predetermined specific item, a weighting parameter set that associates, for a predetermined morpheme, weighting information of the predetermined morpheme for the first digital information with weighting information of the predetermined morpheme for second digital information not related to the predetermined specific item; and a second update function of updating the relevance between the predetermined morpheme and the digital information using the weighting parameter set updated by the first updating function.
  • according to the present invention, a digital information analysis system, a digital information analysis method, and a digital information analysis program that can improve the accuracy of analysis using metadata can be provided.
  • the digital information analysis system 1 automatically outputs digital information related to a predetermined specific item from a plurality of digital information stored in an information processing apparatus 2 such as a user terminal or a server.
  • the extraction system is a system that automatically corrects or updates the weight of a morpheme indicating the strength of association between a morpheme and target digital information by using metadata attached to the digital information.
  • Metadata is incidental information, additional information, supplementary information, properties, etc. of documents. It is information added to input information (that is, keywords, documents, etc.) or to morpheme information of the input information, and/or information for further processing the input information or the processing result of the input information.
  • the predetermined specific item is information indicating that it is related to a lawsuit, for example.
  • when a crime or legal dispute involving computers occurs, such as unauthorized access or leakage of confidential information, the digital information analysis system 1 can be applied to forensics, a technology that collects and analyzes the devices, data, and electronic records needed to investigate the cause, and establishes their value as legal evidence.
  • the predetermined specific item may be information indicating, for example, a case, a target, a situation, an action, or the like in which the driver of a vehicle is expected to exercise information processing ability.
  • the digital information analysis system 1 can be applied to a driving assistance system that assists the driver in operating the vehicle (e.g., avoiding collisions with obstacles such as pedestrians, guardrails, and other vehicles, parking in a garage, changing lanes, and merging onto or exiting highways).
  • the predetermined specific item may be, for example, information indicating symptoms, illnesses, diseases, syndromes, etc. that a doctor diagnoses as an unhealthy state (a state in which a person's mind or body is unwell or impaired).
  • the digital information analysis system 1 can be applied to a medical diagnosis system that extracts healthcare data related to a predetermined symptom from a plurality of pieces of healthcare data acquired from structured healthcare data and / or unstructured healthcare data, enabling predictive diagnosis of diseases.
  • the server is one or more servers, and may be configured to include a plurality of servers.
  • the server includes a server capable of storing digital information such as a mail server, a file server, or a document management server.
  • the user terminal is one or more user terminals, and can be configured to include a plurality of user terminals.
  • the user terminal includes a personal computer, a notebook computer, a tablet PC, or a mobile communication terminal such as a mobile phone.
  • FIG. 1 shows an example of functional configuration blocks of the digital information analysis system according to the present embodiment.
  • the digital information analysis system 1 includes a parameter set storage unit 10 that stores a weighting parameter set in which morpheme weighting information for digital information related to predetermined specific items and morpheme weighting information for digital information not related to predetermined specific items are associated with each of a plurality of morphemes, an input unit 12, and a parameter set acquisition unit 14 that acquires a weighting parameter set from the parameter set storage unit 10 or from information input to the input unit 12.
  • the digital information analysis system 1 also includes a metadata acquisition unit 16 that acquires metadata associated with the digital information stored in the information processing apparatus 2, and a relationship information storage unit 18 that stores relationship information indicating a relationship between the metadata and the digital information.
  • the digital information analysis system 1 further includes a first update unit 20 that updates the weighting parameter set, a second update unit 22 that updates the strength information indicating the strength of the relationship between the morpheme and the digital information (that is, the “weight” of the morpheme), a relevance determination unit 24 that determines the relevance of the digital information to the predetermined specific item, a determination result setting unit 26 that associates the determination result of the relevance determination unit 24 with the digital information, and an output unit 28 that outputs the setting result of the determination result setting unit 26 to the outside.
  • the output unit 28 is a display device such as a display that can display digital information and / or an output device such as a printer that outputs the digital information to a predetermined medium. Furthermore, the output unit 28 can output the information to be output by recording the information on a recording medium such as a magnetic recording medium or an optical recording medium.
  • the information processing apparatus 2 includes a digital information storage unit that stores a plurality of digital information and an information output unit that outputs the digital information to the outside.
  • the digital information storage unit stores a plurality of digital information such as a document file including text information, a text file, or an e-mail.
  • the digital information storage unit supplies metadata of predetermined digital information to the metadata acquisition unit 16 in response to an action from the metadata acquisition unit 16.
  • the digital information analysis system 1 and the information processing apparatus 2 are connected to be communicable with each other via a communication network such as the Internet or a wired or wireless network such as a LAN.
  • the digital information analysis system 1 can also include some or all of the functions and configurations of the information processing apparatus 2.
  • the parameter set storage unit 10 stores a weighting parameter set that associates, with a predetermined morpheme (for example, a keyword), weighting information of the predetermined morpheme for first digital information that has a relationship with a predetermined specific item and weighting information of the predetermined morpheme for second digital information that has no relationship with the predetermined specific item.
  • that is, the parameter set storage unit 10 associates, with one morpheme, first weighting information of the one morpheme for the first digital information and second weighting information of the one morpheme for the second digital information, and stores the one morpheme, the first weighting information, and the second weighting information as a weighting parameter set.
  • the first digital information may be referred to as a “HOT document”, that is, a document that has a relationship with a predetermined specific item.
  • the second digital information may be referred to as a “Non-HOT document”, that is, a document that has no relationship with the predetermined specific item.
  • the weighting information can include, for example, appearance frequency information indicating the frequency with which a predetermined morpheme appears in the digital information.
  • as an example, suppose the morpheme “confidential” exists as a keyword.
  • suppose also that the frequency (document frequency) at which the morpheme appears in documents related to a predetermined specific item such as fraud (HOT documents) is significantly higher than the frequency at which it appears in Non-HOT documents.
  • in that case, the “weight” of the morpheme, which represents its relevance to HOT documents, is set so that the weight of the morpheme for HOT documents is larger than its weight for Non-HOT documents. The larger the “weight” of a morpheme, the more often the morpheme appears in HOT documents.
  • the parameter set storage unit 10 supplies the weighting parameter set to the parameter set acquisition unit 14 in response to the action from the parameter set acquisition unit 14.
  • the parameter set acquisition unit 14 acquires a weighting parameter set stored in the parameter set storage unit 10 or a weighting parameter set input to the input unit 12.
  • the parameter set acquisition unit 14 can also have a function of calculating the value Nhot / Nnon-hot, where Nhot is the number of occurrences of one morpheme in HOT documents and Nnon-hot is the number of occurrences of the same morpheme in Non-HOT documents.
  • the parameter set acquisition unit 14 can also acquire scoring parameters including strength information associated in advance with each of a plurality of morphemes.
  • the scoring parameters are the weights of a plurality of morphemes. The parameter set acquisition unit 14 therefore acquires information indicating the weight associated with each of the plurality of morphemes.
  • the parameter set storage unit 10 can store scoring parameters. The parameter set acquisition unit 14 supplies information indicating the acquired weighting parameter set, scoring parameters, and / or calculation results to the first update unit 20.
  • the metadata acquisition unit 16 acquires at least one piece of metadata among a plurality of pieces of metadata associated with the target digital information that is the investigation target stored in the information processing apparatus 2.
  • the metadata is structural metadata or descriptive metadata.
  • metadata includes email headers (information indicating sender / receiver attributes, information indicating transmission time, information indicating reception time, information indicating TO / CC / BCC, etc.) and document properties (file creation time).
  • the metadata is information that identifies the sender of the electronic mail (that is, identification information that uniquely identifies the sender (sender identification information)).
  • the metadata acquisition unit 16 supplies the acquired metadata to the first update unit 20.
  • the relationship information storage unit 18 stores relationship information indicating the relationship between the metadata and the first digital information. For example, consider the case where the metadata is e-mail sender identification information. In this case, the relationship is determined based on information indicating to which recipients the sender identified by one piece of sender identification information has sent many e-mails corresponding to HOT documents. That is, when one sender has sent more e-mails corresponding to HOT documents to a specific recipient (or a plurality of specific recipients) among a plurality of recipients than to other recipients, the relationship indicated by the relationship information is “high”.
  • the relationship information storage unit 18 stores the relationship information in association with each of a plurality of metadata.
  • the relationship information storage unit 18 supplies the relationship information to the first update unit 20.
  • the first update unit 20 updates the weighting parameter set acquired by the parameter set acquisition unit 14 based on the relationship information stored in the relationship information storage unit 18 in association with the metadata acquired by the metadata acquisition unit 16.
  • the first updating unit 20 updates the weighting parameter set based on whether the metadata and the predetermined morpheme are related to HOT documents. As an example, when the morpheme “confidential” exists, if e-mail that a first sender, identified from the metadata (that is, from a property included in the metadata), has sent to other recipients is more relevant to HOT documents than e-mail that a second sender has sent to other recipients, the first updating unit 20 corrects or updates the weighting parameter set using information such as the document frequency.
  • the first update unit 20 can also update the scoring parameters acquired by the parameter set acquisition unit 14 based on the relationship information stored in the relationship information storage unit 18 in association with the metadata acquired by the metadata acquisition unit 16. Furthermore, when the metadata includes a plurality of properties, the first update unit 20 can also correct or update the weighting parameter set and / or the scoring parameters using different patterns depending on the presence or absence of each property.
  • the first update unit 20 determines whether a property is included in the document. The first updating unit 20 can then calculate, for each of a plurality of properties, the number of HOT documents having the property and the number of HOT documents not having the property. The first updating unit 20 can also correct or update the weighting parameter set or the scoring parameters in the same manner as described above based on the presence or absence of each of the plurality of properties.
  • when there are a plurality of HOT documents, the first update unit 20 can also manage them by dividing them into two groups, HOT documents having the property and HOT documents not having the property. Similarly, when there are a plurality of Non-HOT documents, the first updating unit 20 can also manage them by dividing them into two groups, Non-HOT documents having the property and Non-HOT documents not having the property. The first updating unit 20 supplies the updated weighting parameter set and / or scoring parameters to the second updating unit 22.
  • the second updating unit 22 uses the weighting parameter set updated by the first updating unit 20 and, with the same calculation method as that of the first updating unit 20, updates the strength information indicating the strength of the relationship between the predetermined morpheme and the target digital information.
  • the second updating unit 22 can also update the strength information using the scoring parameters updated by the first updating unit 20. For example, the update process can be executed based on the number of HOT documents having the property, the number of HOT documents not having the property, the number of Non-HOT documents having the property, and the number of Non-HOT documents not having the property.
  • the second updating unit 22 can execute the updating process using the updated scoring parameter.
  • the second update unit 22 supplies the updated information to the relevance determination unit 24.
  • the relevance determination unit 24 determines relevance with a predetermined specific item of the target digital information based on the result of the morphological analysis of the target digital information and the strength information.
  • the relevance determination unit 24 supplies information indicating the determination result to the determination result setting unit 26.
  • the determination result setting unit 26 associates the determination result of the relevance determination unit 24 with the target digital information.
  • the determination result setting unit 26 supplies the target digital information associated with the determination result to the output unit 28.
  • FIG. 2 shows an example of the flow of processing in the digital information analysis system according to the embodiment of the present invention.
  • the parameter set acquisition unit 14 acquires a weighting parameter set from the parameter set storage unit 10 or from information input to the input unit 12 (step 10; hereinafter, “step” is expressed as “S”).
  • the parameter set acquisition unit 14 supplies the acquired weighting parameter set to the first update unit 20.
  • the metadata acquisition unit 16 acquires the metadata of the target digital information from the information processing apparatus 2 (S15).
  • the metadata acquisition unit 16 supplies the acquired metadata to the first update unit 20.
  • the first update unit 20 corrects or updates the weighting parameter set received from the parameter set acquisition unit 14 (S20).
  • the first updating unit 20 supplies the corrected or updated weighting parameter set to the second updating unit 22.
  • the second updating unit 22 uses the weighting parameter set received from the first updating unit 20 to update the strength information indicating the strength of relevance between the predetermined morpheme and the target digital information (S25).
  • the second update unit 22 supplies the updated strength information to the relevance determination unit 24.
  • the relevance determination unit 24 determines relevance between the target digital information and a predetermined specific item using the updated strength information and the result of the morphological analysis of the target digital information (S30).
  • the relevance determination unit 24 supplies information indicating the determination result to the determination result setting unit 26.
  • the determination result setting unit 26 associates information indicating the determination result with the target digital information.
  • the output unit 28 outputs the target digital information associated with the information indicating the determination result to the outside.
  • FIG. 3 shows an example of the hardware configuration of the digital information analysis system according to the embodiment of the present invention.
  • the digital information analysis system 1 includes a CPU 1500, a graphic controller 1520, a memory 1530 such as a Random Access Memory (RAM), a Read-Only Memory (ROM), and / or a flash ROM, a storage device 1540 that stores data, a read / write device 1545, an input device 1560, and a communication interface 1550.
  • the digital information analysis system 1 also includes a chipset 1510 that communicably connects the CPU 1500, the graphic controller 1520, the memory 1530, the storage device 1540, the read / write device 1545, the input device 1560, and the communication interface 1550 to each other.
  • the chipset 1510 passes data between components by connecting the memory 1530, the CPU 1500 that accesses the memory 1530 and executes predetermined processing, and the graphic controller 1520 that controls display on an external display device.
  • the CPU 1500 operates based on a program stored in the memory 1530 and controls each component.
  • the graphic controller 1520 displays an image on a predetermined display device based on the image data temporarily stored on the buffer provided in the memory 1530.
  • the chipset 1510 connects a storage device 1540, a read / write device 1545, and a communication interface 1550.
  • the storage device 1540 stores programs and data used by the CPU 1500 of the digital information analysis system 1.
  • the storage device 1540 is, for example, a flash memory.
  • the read / write device 1545 reads the program and / or data from the storage medium storing the program and / or data, and stores the read program and / or data in the storage device 1540.
  • the read / write device 1545 acquires a predetermined program from a server on the Internet via the communication interface 1550 and stores the acquired program in the storage device 1540.
  • the communication interface 1550 executes data transmission / reception with an external device via a communication network. Further, when the communication network is disconnected, the communication interface 1550 can execute data transmission / reception with an external device without going through the communication network.
  • An input device 1560 such as a keyboard, a tablet, or a mouse is connected to the chip set 1510 via a predetermined interface.
  • the digital information analysis program for the digital information analysis system 1 stored in the storage device 1540 is provided to the storage device 1540 via a communication network such as the Internet or a recording medium such as a magnetic recording medium or an optical recording medium.
  • the digital information analysis program for the digital information analysis system 1 stored in the storage device 1540 is executed by the CPU 1500.
  • the digital information analysis program executed by the digital information analysis system 1 operates on the CPU 1500 and causes the digital information analysis system 1 to function as the parameter set storage unit 10, the input unit 12, the parameter set acquisition unit 14, the metadata acquisition unit 16, the relationship information storage unit 18, the first update unit 20, the second update unit 22, the relevance determination unit 24, the determination result setting unit 26, and the output unit 28 described with reference to the drawings.
  • by referring to the metadata of documents, the digital information analysis system 1 can subdivide a plurality of document files according to predetermined criteria, such as whether a document corresponds to a HOT document having (or not having) a predetermined property or to a Non-HOT document having (or not having) a predetermined property, and can then correct or update the weights of morphemes. It can therefore provide a highly accurate forensic system.
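
The occurrence ratio Nhot / Nnon-hot and the split of HOT and Non-HOT documents by the presence of a metadata property can be illustrated with a minimal Python sketch. This is not the patented implementation. The `Doc` type, the function names, the sample documents, and the additive smoothing term (used only to avoid division by zero) are all assumptions introduced for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Doc:
    morphemes: Tuple[str, ...]  # assumed output of morphological analysis
    is_hot: bool                # True if related to the predetermined specific item
    has_property: bool          # True if the metadata property of interest is present


def occurrence_counts(docs):
    """Count how many times each morpheme occurs across the given documents."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.morphemes)
    return counts


def hot_ratio(morpheme, docs, smoothing=1.0):
    """Return (Nhot + s) / (Nnon-hot + s) for one morpheme.

    The smoothing term s is an assumption of this sketch, not part of the source.
    """
    docs = list(docs)
    n_hot = occurrence_counts(d for d in docs if d.is_hot)[morpheme]
    n_non_hot = occurrence_counts(d for d in docs if not d.is_hot)[morpheme]
    return (n_hot + smoothing) / (n_non_hot + smoothing)


def ratio_by_property(morpheme, docs, smoothing=1.0):
    """Compute the ratio separately for documents with and without the property."""
    docs = list(docs)
    with_prop = [d for d in docs if d.has_property]
    without_prop = [d for d in docs if not d.has_property]
    return {
        "with_property": hot_ratio(morpheme, with_prop, smoothing),
        "without_property": hot_ratio(morpheme, without_prop, smoothing),
    }


# Hypothetical sample data: two HOT and two Non-HOT documents.
docs = [
    Doc(("confidential", "transfer"), is_hot=True, has_property=True),
    Doc(("confidential", "meeting"), is_hot=True, has_property=True),
    Doc(("meeting", "lunch"), is_hot=False, has_property=True),
    Doc(("confidential", "lunch"), is_hot=False, has_property=False),
]

overall = hot_ratio("confidential", docs)        # (2 + 1) / (1 + 1) = 1.5
split = ratio_by_property("confidential", docs)  # 3.0 with the property, 0.5 without
```

In this toy data, "confidential" looks only mildly indicative overall, but strongly indicative among documents that carry the property, which is the kind of distinction the property-based subdivision is meant to expose.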

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Library & Information Science (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

 The present invention comprises: a metadata acquisition unit that acquires metadata associated with digital information; a first updating unit that updates, for a prescribed morpheme and on the basis of the relationship between the metadata and first digital information related to a predetermined specific matter, a weighting parameter set in which weighting information of the prescribed morpheme for the first digital information and weighting information of the prescribed morpheme for second digital information not related to the predetermined specific matter are associated with each other; and a second updating unit that updates the correlation of the prescribed morpheme with the digital information by using the weighting parameter set updated by the first updating unit.
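
The data flow among the three units named in the abstract can be sketched as follows. This is a rough illustration under stated assumptions, not the claimed method: the multiplicative first update, the log-ratio strength, the additive document score, and every name and value below (`WeightPair`, `relationship_by_sender`, the sample e-mail) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class WeightPair:
    hot: float      # weighting information for the first (related) digital information
    non_hot: float  # weighting information for the second (unrelated) digital information


def acquire_metadata(document):
    """Metadata acquisition: here, simply the sender field of an e-mail-like dict."""
    return document["sender"]


def first_update(params, relationship):
    """Scale the related-side weight by the metadata relationship (assumed rule)."""
    return {m: WeightPair(p.hot * relationship, p.non_hot) for m, p in params.items()}


def second_update(params):
    """Derive per-morpheme strength as a log ratio (assumed formula, positive weights)."""
    return {m: math.log(p.hot / p.non_hot) for m, p in params.items()}


def score(document, strength):
    """Sum the strengths of the morphemes found in the document (assumed scoring)."""
    return sum(strength.get(m, 0.0) for m in document["morphemes"])


# Hypothetical weighting parameter set and relationship information.
params = {"confidential": WeightPair(2.0, 1.0), "lunch": WeightPair(1.0, 2.0)}
relationship_by_sender = {"alice@example.com": 2.0}
doc = {"sender": "alice@example.com", "morphemes": ["confidential", "lunch"]}

sender = acquire_metadata(doc)
relationship = relationship_by_sender.get(sender, 1.0)  # neutral factor if unknown
strength = second_update(first_update(params, relationship))
doc_score = score(doc, strength)
```

Keeping the parameter update and the strength derivation as separate functions mirrors the separation between the first and second updating units, so the update rule can be swapped without touching the scoring step.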

Description

デジタル情報分析システム、デジタル情報分析方法、及びデジタル情報分析プログラムDigital information analysis system, digital information analysis method, and digital information analysis program
 本発明は、デジタル情報分析システム、デジタル情報分析方法、及びデジタル情報分析プログラムに関する。特に、本発明は、デジタル情報に付随するメタデータを更に用いることで分析の精度を向上させることができるデジタル情報分析システム、デジタル情報分析方法、及びデジタル情報分析プログラムに関する。 The present invention relates to a digital information analysis system, a digital information analysis method, and a digital information analysis program. In particular, the present invention relates to a digital information analysis system, a digital information analysis method, and a digital information analysis program that can improve the accuracy of analysis by further using metadata associated with digital information.
 Conventionally, a system is known that displays recorded digital information; sets, for each of a plurality of document files, user identification information indicating which of the users included in user information the file relates to; records the set user identification information in a storage unit; accepts the designation of at least one user; searches for document files for which user identification information corresponding to the designated user has been set; sets, via a display unit, supplementary information indicating whether each retrieved document file is related to a lawsuit; and outputs the document files related to the lawsuit on the basis of the supplementary information (see, for example, Patent Document 1). According to the system described in Patent Document 1, only the digital document information related to a specific person is extracted, which reduces the workload of preparing evidence for a lawsuit.
Patent Document 1: JP 2012-181851 A
 In a system such as that described in Patent Document 1, using the metadata of a document file in addition to the result of morphological analysis of that file is expected to improve the accuracy of analysis and to further reduce the workload.
 Therefore, an object of the present invention is to provide a digital information analysis system, a digital information analysis method, and a digital information analysis program that can improve the accuracy of analysis by using metadata.
 To achieve the above object, the present invention provides a digital information analysis system comprising: a metadata acquisition unit that acquires metadata associated with digital information stored in an information processing apparatus; a first updating unit that updates a weighting parameter set on the basis of the relationship between the metadata and first digital information related to a predetermined specific matter, the weighting parameter set associating a predetermined morpheme with weighting information of the predetermined morpheme for the first digital information and weighting information of the predetermined morpheme for second digital information not related to the predetermined specific matter; and a second updating unit that updates the relevance between the predetermined morpheme and the digital information by using the weighting parameter set updated by the first updating unit.
 The digital information analysis system may further comprise a parameter set acquisition unit that acquires the weighting parameter set, and a relationship information storage unit that stores relationship information indicating the relationship between the metadata and the first digital information. In this case, the digital information is target digital information that is stored in the information processing apparatus and is to be investigated; the metadata acquisition unit acquires at least one of a plurality of pieces of metadata associated with the target digital information; the first updating unit updates the weighting parameter set on the basis of the relationship information that the relationship information storage unit stores in association with the metadata acquired by the metadata acquisition unit; and the second updating unit uses the weighting parameter set to update strength information indicating the strength of the relevance between the predetermined morpheme and the target digital information.
 In the digital information analysis system, the weighting information of the predetermined morpheme may include appearance frequency information indicating how often the predetermined morpheme appears in the first digital information or the second digital information.
 In the digital information analysis system, the metadata may be structural metadata or descriptive metadata.
 In the digital information analysis system, the parameter set acquisition unit may acquire scoring parameters including strength information associated in advance with each of a plurality of morphemes; the first updating unit may update the scoring parameters acquired by the parameter set acquisition unit on the basis of the relationship information stored in the relationship information storage unit in association with the metadata acquired by the metadata acquisition unit; and the second updating unit may update the strength information by using the scoring parameters updated by the first updating unit.
 The digital information analysis system may further comprise a relevance determination unit that determines the relevance of the target digital information to the predetermined specific matter on the basis of the result of morphological analysis of the target digital information and the strength information, and a determination result setting unit that associates the determination result of the relevance determination unit with the target digital information.
 In the digital information analysis system, the predetermined specific matter may be information indicating relevance to a lawsuit.
 To achieve the above object, the present invention also provides a digital information analysis method comprising: a metadata acquisition step of acquiring metadata associated with digital information stored in an information processing apparatus; a first updating step of updating a weighting parameter set on the basis of the relationship between the metadata and first digital information related to a predetermined specific matter, the weighting parameter set associating a predetermined morpheme with weighting information of the predetermined morpheme for the first digital information and weighting information of the predetermined morpheme for second digital information not related to the predetermined specific matter; and a second updating step of updating the relevance between the predetermined morpheme and the digital information by using the weighting parameter set updated in the first updating step.
 To achieve the above object, the present invention further provides a digital information analysis program that causes a computer to implement: a metadata acquisition function of acquiring metadata associated with digital information stored in an information processing apparatus; a first updating function of updating a weighting parameter set on the basis of the relationship between the metadata and first digital information related to a predetermined specific matter, the weighting parameter set associating a predetermined morpheme with weighting information of the predetermined morpheme for the first digital information and weighting information of the predetermined morpheme for second digital information not related to the predetermined specific matter; and a second updating function of updating the relevance between the predetermined morpheme and the digital information by using the weighting parameter set updated by the first updating function.
 According to the present invention, it is possible to provide a digital information analysis system, a digital information analysis method, and a digital information analysis program that can improve the accuracy of analysis by using metadata.
FIG. 1 is a functional block diagram of the digital information analysis system according to the present embodiment.
FIG. 2 is a flowchart of the processing of the digital information analysis system according to the embodiment of the present invention.
FIG. 3 is a hardware configuration diagram of the digital information analysis system according to the present embodiment.
[Embodiment]
(Outline of digital information analysis system 1)
 The digital information analysis system 1 according to the present embodiment is a system that automatically extracts digital information relevant to a predetermined specific matter from a plurality of pieces of digital information stored in an information processing apparatus 2 such as a user terminal or a server. Using metadata attached to the digital information, the system automatically corrects or updates the weight of a morpheme, which indicates the strength of the relevance between the morpheme and the digital information of interest.
 In the present embodiment, metadata is incidental information, additional information, supplementary information, properties, or the like of a document. It is information added to input information (that is, keywords, documents, and the like) or to morpheme information of the input information, and/or information used to further process the input information or the result of processing the input information.
 The predetermined specific matter is, for example, information indicating relevance to a lawsuit. As one example, the digital information analysis system 1 according to the present embodiment can be applied to forensics, a technique that, when a computer-related crime or legal dispute such as unauthorized access or leakage of confidential information occurs, collects and analyzes the digital information (electronic records) needed to investigate the cause of the crime or dispute and establishes its value as legal evidence.
 The predetermined specific matter may also be, for example, information indicating a case, object, situation, action, or the like in which the driver of a vehicle is considered to exercise information processing ability. In this case, as one example, the digital information analysis system 1 according to the present embodiment can be applied to a driving assist system that assists the driver in operating the vehicle (avoiding collisions with obstacles such as pedestrians, guardrails, and other vehicles; parking in a garage; changing lanes; merging onto and exiting from an expressway; and so on).
 The predetermined specific matter may also be, for example, information indicating a symptom, disease, disorder, syndrome, or the like that a doctor has diagnosed as an unhealthy state (a state in which a disorder or problem has arisen in a person's mind or body). In this case, as one example, the digital information analysis system 1 according to the present embodiment can be applied to a medical diagnosis system that extracts healthcare data related to a given symptom from a plurality of pieces of healthcare data obtained from structured healthcare data and/or unstructured healthcare data, and thereby enables predictive diagnosis of illness.
 In the present embodiment, the server is one or more servers and may be configured to include a plurality of servers. For example, the server includes a server capable of storing digital information, such as a mail server, a file server, or a document management server. Likewise, the user terminal is one or more user terminals and may be configured to include a plurality of user terminals. For example, the user terminal includes a personal computer, a notebook computer, a tablet PC, or a mobile communication terminal such as a mobile phone.
(Details of digital information analysis system 1)
 FIG. 1 shows an example of the functional configuration blocks of the digital information analysis system according to the present embodiment.
 The digital information analysis system 1 according to the present embodiment includes: a parameter set storage unit 10 that stores a weighting parameter set in which each of a plurality of morphemes is associated with weighting information of the morpheme for digital information related to a predetermined specific matter and weighting information of the morpheme for digital information not related to the predetermined specific matter; a parameter set acquisition unit 14 that acquires the weighting parameter set from the parameter set storage unit 10 or from information input to an input unit 12; a metadata acquisition unit 16 that acquires metadata associated with digital information stored in the information processing apparatus 2; and a relationship information storage unit 18 that stores relationship information indicating the relationship between the metadata and the digital information.
 The digital information analysis system 1 also includes: a first updating unit 20 that updates the weighting parameter set; a second updating unit 22 that updates strength information indicating the strength of the relevance between a morpheme and digital information (that is, the "weight" of the morpheme); a relevance determination unit 24 that determines, on the basis of the result of morphological analysis of the digital information to be examined, the relevance between the predetermined specific matter and that digital information; a determination result setting unit 26 that associates the determination result of the relevance determination unit 24 with that digital information; and an output unit 28 that outputs the setting result of the determination result setting unit 26 to the outside.
 The output unit 28 is a display device, such as a display capable of showing digital information, and/or an output device, such as a printer that outputs digital information to a predetermined medium. The output unit 28 can also output information by recording it on a recording medium such as a magnetic recording medium or an optical recording medium.
(Information processing apparatus 2)
 The information processing apparatus 2 includes a digital information storage unit that stores a plurality of pieces of digital information and an information output unit that outputs digital information to the outside. The digital information storage unit stores a plurality of pieces of digital information such as document files containing text, text files, and e-mails. In response to a request from the metadata acquisition unit 16, the digital information storage unit supplies the metadata of the specified digital information to the metadata acquisition unit 16. The digital information analysis system 1 and the information processing apparatus 2 are connected so that they can communicate with each other via a communication network such as the Internet, or via a wired or wireless network such as a LAN. The digital information analysis system 1 may also include some or all of the functions and configuration of the information processing apparatus 2.
(Parameter set storage unit 10)
 The parameter set storage unit 10 stores a weighting parameter set that associates a predetermined morpheme (for example, a keyword) with weighting information of the predetermined morpheme for first digital information related to a predetermined specific matter and weighting information of the predetermined morpheme for second digital information not related to the predetermined specific matter. That is, the parameter set storage unit 10 associates one morpheme with first weighting information of that morpheme for the first digital information and second weighting information of that morpheme for the second digital information, and stores the morpheme, the first weighting information, and the second weighting information together as a weighting parameter set.
 In the present embodiment, the first digital information, which is a document related to the predetermined specific matter, may be referred to as a "HOT document", and the second digital information, which is a document not related to the predetermined specific matter, may be referred to as a "Non-HOT document". The weighting information can include, for example, appearance frequency information indicating how often a predetermined morpheme appears in the digital information.
 For example, assume that the morpheme "confidential" exists as a keyword. If the frequency (document frequency) with which this morpheme appears in documents related to the predetermined specific matter (for example, fraud), that is, in HOT documents, is markedly higher than the frequency with which it appears in Non-HOT documents, then the "weight" of the morpheme for HOT documents is larger than its weight for Non-HOT documents. A larger weight indicates that the morpheme appears more often in HOT documents. In response to a request from the parameter set acquisition unit 14, the parameter set storage unit 10 supplies the weighting parameter set to the parameter set acquisition unit 14.
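As an illustration only, a weighting parameter set can be pictured as a record holding one morpheme together with its two pieces of weighting information. The sketch below assumes that the weighting information is a document frequency and that each document has already been split into morphemes; the names `WeightingParameterSet`, `document_frequency`, and `build_parameter_set` are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass


@dataclass
class WeightingParameterSet:
    """One morpheme with its weighting information for both document groups."""
    morpheme: str
    hot_weight: float      # first weighting information (HOT documents)
    non_hot_weight: float  # second weighting information (Non-HOT documents)


def document_frequency(morpheme, documents):
    """Fraction of documents that contain the morpheme (assumed weighting)."""
    if not documents:
        return 0.0
    hits = sum(1 for morphemes in documents if morpheme in morphemes)
    return hits / len(documents)


def build_parameter_set(morpheme, hot_docs, non_hot_docs):
    """Associate one morpheme with its HOT and Non-HOT document frequencies."""
    return WeightingParameterSet(
        morpheme=morpheme,
        hot_weight=document_frequency(morpheme, hot_docs),
        non_hot_weight=document_frequency(morpheme, non_hot_docs),
    )


# Each document is represented as the list of morphemes it contains.
hot_docs = [["confidential", "transfer"], ["confidential", "meeting"]]
non_hot_docs = [["lunch", "meeting"], ["confidential", "schedule"],
                ["report"], ["holiday"]]
params = build_parameter_set("confidential", hot_docs, non_hot_docs)
```

In this toy data the morpheme appears in every HOT document but in only one of four Non-HOT documents, so its HOT weight is the larger of the two.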
(Parameter set acquisition unit 14)
 The parameter set acquisition unit 14 acquires the weighting parameter set stored in the parameter set storage unit 10 or a weighting parameter set input to the input unit 12. The parameter set acquisition unit 14 may also have a function of calculating the value Nhot/Nnon-hot, where Nhot is the number of occurrences of one morpheme in HOT documents and Nnon-hot is the number of occurrences of that morpheme in Non-HOT documents.
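The ratio Nhot/Nnon-hot described above can be sketched as follows. This is a minimal example that assumes documents are lists of morphemes; the function name `occurrence_ratio` and the choice to return `None` when Nnon-hot is zero are assumptions of this sketch, since the specification does not say how that case is handled.

```python
def occurrence_ratio(morpheme, hot_docs, non_hot_docs):
    """Return N_hot / N_non-hot for one morpheme, or None if N_non-hot is 0."""
    # Count every occurrence of the morpheme, not just the documents containing it.
    n_hot = sum(doc.count(morpheme) for doc in hot_docs)
    n_non_hot = sum(doc.count(morpheme) for doc in non_hot_docs)
    if n_non_hot == 0:
        return None  # ratio undefined; this handling is an assumption
    return n_hot / n_non_hot


ratio = occurrence_ratio(
    "confidential",
    hot_docs=[["confidential", "confidential", "memo"], ["confidential"]],
    non_hot_docs=[["confidential"], ["memo"]],
)
```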
 The parameter set acquisition unit 14 can also acquire scoring parameters including strength information associated in advance with each of a plurality of morphemes. In the present embodiment, the scoring parameters are the weights of a plurality of morphemes. The parameter set acquisition unit 14 therefore acquires information indicating the weight associated with each of the plurality of morphemes. In this case, the parameter set storage unit 10 can store the scoring parameters. The parameter set acquisition unit 14 supplies the acquired weighting parameter set, the scoring parameters, and/or information indicating the calculation result to the first updating unit 20.
(Metadata acquisition unit 16)
 The metadata acquisition unit 16 acquires at least one of a plurality of pieces of metadata associated with the target digital information, that is, the digital information stored in the information processing apparatus 2 that is to be investigated. Specifically, the metadata is structural metadata or descriptive metadata. Examples include the mail header of an e-mail (information indicating attributes of the sender and recipients, the transmission time, the reception time, TO/CC/BCC, and the like) and the properties of a document (the file creation time, the file creator, the information processing terminal on which the file was created, the update time, and the like). As one example, when the target digital information is an e-mail, the metadata is information that identifies the sender of the e-mail (that is, sender identification information that uniquely identifies the sender). The metadata acquisition unit 16 supplies the acquired metadata to the first updating unit 20.
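To make the e-mail case concrete, the following sketch reads header metadata with Python's standard `email` package. It is not the implementation described in the specification; the function name `extract_mail_metadata`, the dictionary keys, and the sample message are assumptions chosen for illustration.

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr


def extract_mail_metadata(raw_message):
    """Collect sender, recipients, and sending time from a raw e-mail."""
    msg = message_from_string(raw_message)
    return {
        # parseaddr returns (display name, address); keep only the address.
        "sender": parseaddr(msg.get("From", ""))[1],
        "to": [addr for _, addr in getaddresses(msg.get_all("To", []))],
        "cc": [addr for _, addr in getaddresses(msg.get_all("Cc", []))],
        "sent_at": msg.get("Date"),
    }


raw = (
    "From: Alice <alice@example.com>\n"
    "To: Bob <bob@example.com>, carol@example.com\n"
    "Date: Mon, 01 Sep 2014 09:00:00 +0900\n"
    "Subject: schedule\n"
    "\n"
    "body text\n"
)
meta = extract_mail_metadata(raw)
```

Here the sender address plays the role of the sender identification information mentioned above.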
(Relationship information storage unit 18)
 The relationship information storage unit 18 stores relationship information indicating the relationship between metadata and the first digital information. Consider, for example, the case where the metadata is the sender identification information of an e-mail. In this case, the relationship is determined on the basis of information indicating to which recipients the sender identified by a given piece of sender identification information sent many e-mails that are HOT documents. That is, when one sender has sent more such e-mails to a specific recipient (or to several specific recipients) than to the other recipients, the relationship indicated by the relationship information is "high". The relationship information storage unit 18 stores such relationship information in association with each of a plurality of pieces of metadata, and supplies the relationship information to the first updating unit 20.
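One way to picture the sender-based relationship information is to count, per sender, how many HOT e-mails went to each recipient and then check whether they concentrate on one recipient. The sketch below is only an assumed reading: the record layout, the function names, and the 0.5 concentration threshold are hypothetical, and the specification does not define how "high" is computed.

```python
from collections import Counter, defaultdict


def hot_mail_counts(mails):
    """Count HOT e-mails per sender and recipient."""
    counts = defaultdict(Counter)
    for mail in mails:
        if mail["is_hot"]:
            for recipient in mail["recipients"]:
                counts[mail["sender"]][recipient] += 1
    return counts


def relationship_level(recipient_counts, threshold=0.5):
    """Label a sender "high" when a single recipient receives more than
    `threshold` of that sender's HOT e-mails (assumed rule and value)."""
    total = sum(recipient_counts.values())
    if total == 0:
        return "low"
    top = max(recipient_counts.values())
    return "high" if top / total > threshold else "low"


mails = [
    {"sender": "s1", "recipients": ["r1"], "is_hot": True},
    {"sender": "s1", "recipients": ["r1"], "is_hot": True},
    {"sender": "s1", "recipients": ["r2"], "is_hot": True},
    {"sender": "s1", "recipients": ["r3"], "is_hot": False},
    {"sender": "s2", "recipients": ["r1"], "is_hot": True},
    {"sender": "s2", "recipients": ["r2"], "is_hot": True},
]
counts = hot_mail_counts(mails)
levels = {sender: relationship_level(c) for sender, c in counts.items()}
```

In this toy data, sender `s1` sends two of three HOT e-mails to `r1` and is labeled "high", while `s2` spreads them evenly and is labeled "low".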
(First updating unit 20)
 The first updating unit 20 updates the weighting parameter set acquired by the parameter set acquisition unit 14 on the basis of the relationship information that the relationship information storage unit 18 stores in association with the metadata acquired by the metadata acquisition unit 16. The first updating unit 20 updates the weighting parameter set according to whether the metadata and the predetermined morpheme are related to HOT documents. As one example, where the morpheme "confidential" exists, if e-mails sent to other recipients by a first sender identified from the metadata (that is, from a property included in the metadata) are more relevant to HOT documents than e-mails sent to other recipients by a second sender, the first updating unit 20 corrects or updates the weighting parameter set using information such as the document frequency.
 The first updating unit 20 can also update the scoring parameters acquired by the parameter set acquisition unit 14 on the basis of the relationship information stored in the relationship information storage unit 18 in association with the metadata acquired by the metadata acquisition unit 16. Furthermore, when the metadata includes a plurality of properties, the first updating unit 20 can correct or update the weighting parameter set and/or the scoring parameters using a different pattern depending on the presence or absence of each property.
 Consider, for example, the case where one piece of metadata includes a property p1 and a property p2 and both HOT documents and Non-HOT documents exist. In this case, there may be HOT documents having p1, HOT documents having p2, Non-HOT documents having p1, and Non-HOT documents having p2. The first updating unit 20 determines, for example, whether a document includes a property. Then, for each of the plurality of properties, the first updating unit 20 can calculate the number of HOT documents that have the property and the number of HOT documents that do not. The first updating unit 20 can also correct or update the weighting parameter set or the scoring parameters in the same manner as described above, on the basis of the presence or absence of each of the plurality of properties.
 As one example, when a plurality of HOT documents exist regardless of whether they have a property, the first updating unit 20 can manage them by dividing them into two groups: HOT documents that have the property and HOT documents that do not. Similarly, when a plurality of Non-HOT documents exist, the first updating unit 20 can manage them by dividing them into two groups: Non-HOT documents that have the property and Non-HOT documents that do not. The first updating unit 20 supplies the updated weighting parameter set and/or scoring parameters to the second updating unit 22.
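The grouping by property described above can be sketched as a small counting routine. The example assumes each document is a record with a HOT flag and a set of property names; `count_by_property` and the key names are hypothetical and are not taken from the specification.

```python
def count_by_property(documents, properties):
    """For each property, count HOT and Non-HOT documents with and without it."""
    table = {}
    for prop in properties:
        cell = {"hot_with": 0, "hot_without": 0,
                "non_hot_with": 0, "non_hot_without": 0}
        for doc in documents:
            group = "hot" if doc["is_hot"] else "non_hot"
            suffix = "with" if prop in doc["properties"] else "without"
            cell[f"{group}_{suffix}"] += 1
        table[prop] = cell
    return table


documents = [
    {"is_hot": True, "properties": {"p1"}},
    {"is_hot": True, "properties": {"p1", "p2"}},
    {"is_hot": True, "properties": set()},
    {"is_hot": False, "properties": {"p2"}},
    {"is_hot": False, "properties": set()},
]
table = count_by_property(documents, ["p1", "p2"])
```

Each entry of `table` holds the four counts for one property, which correspond to the two HOT groups and the two Non-HOT groups.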
(Second updating unit 22)
 The second updating unit 22 uses the weighting parameter set updated by the first updating unit 20, together with the same calculation method as the first updating unit 20, to update the strength information indicating the strength of the relevance between the predetermined morpheme and the target digital information. The second updating unit 22 can also update the strength information using the scoring parameters updated by the first updating unit 20. For example, the second updating unit 22 can execute the update processing on the basis of the number of HOT documents that have a property, the number of HOT documents that do not have the property, the number of Non-HOT documents that have the property, and the number of Non-HOT documents that do not have the property.
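The specification names the four document counts used in the update but gives no formula. Purely as an assumed illustration, the sketch below nudges a morpheme's strength by a smoothed log odds ratio computed from those counts; the formula, the `smoothing` and `rate` parameters, and the function names are all hypothetical.

```python
import math


def property_log_odds(hot_with, hot_without, non_hot_with, non_hot_without,
                      smoothing=1.0):
    """Smoothed log odds ratio of having the property, HOT vs Non-HOT."""
    odds_hot = (hot_with + smoothing) / (hot_without + smoothing)
    odds_non_hot = (non_hot_with + smoothing) / (non_hot_without + smoothing)
    return math.log(odds_hot / odds_non_hot)


def update_strength(strength, counts, rate=0.1):
    """Shift the strength in proportion to the log odds ratio (assumed rule)."""
    return strength + rate * property_log_odds(**counts)


counts = {"hot_with": 3, "hot_without": 1,
          "non_hot_with": 1, "non_hot_without": 3}
new_strength = update_strength(1.0, counts)
```

Under this assumed rule, a property that is more common among HOT documents raises the strength, and one that is evenly distributed leaves it unchanged.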
 また、第2の更新部22は、第1の更新部20からスコアリングパラメーターを受け取った場合、更新後のスコアリングパラメーターを用いて更新処理を実行できる。第2の更新部22は、更新後の各情報を関連性判断部24に供給する。 In addition, when receiving the scoring parameter from the first updating unit 20, the second updating unit 22 can execute the updating process using the updated scoring parameter. The second update unit 22 supplies the updated information to the relevance determination unit 24.
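 As a rough illustration of an update driven by the four document counts, the sketch below derives a single strength value from them. The formula is an assumption chosen for illustration (a smoothed log odds ratio); this passage does not state the actual calculation method, and the function name `update_strength` and the `smoothing` parameter are hypothetical.

```python
import math


def update_strength(hot_with, hot_without, non_hot_with, non_hot_without,
                    smoothing=0.5):
    """Assumed formula: smoothed log odds ratio of the 2x2 count table.

    A positive value means the property is seen more often among HOT documents
    than among Non-HOT documents; a negative value means the opposite.
    """
    odds_hot = (hot_with + smoothing) / (hot_without + smoothing)
    odds_non_hot = (non_hot_with + smoothing) / (non_hot_without + smoothing)
    return math.log(odds_hot / odds_non_hot)


strong = update_strength(8, 2, 1, 9)    # property concentrated in HOT documents
neutral = update_strength(5, 5, 5, 5)   # property equally common in both groups
print(strong > 0, neutral)
```

 The smoothing term only keeps the ratio defined when one of the counts is zero; any other monotone measure of the imbalance between the two groups could be substituted.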
(Relevance determination unit 24, determination result setting unit 26)
 The relevance determination unit 24 determines the relevance of the target digital information to the predetermined specific matter based on the result of morphological analysis of the target digital information and on the strength information. The relevance determination unit 24 supplies information indicating the determination result to the determination result setting unit 26. The determination result setting unit 26 then associates the determination result of the relevance determination unit 24 with the target digital information, and supplies the target digital information with the associated determination result to the output unit 28.
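 The following sketch shows one way the determination and the result setting could fit together. It assumes that morphological analysis has already produced a list of morphemes, that the strength information is a mapping from morpheme to a numeric strength, and that relevance is decided by comparing the summed strength with a threshold. The names `judge_relevance`, `set_result`, and `threshold` are hypothetical, and the summation rule is an illustrative assumption rather than the method of the specification.

```python
def judge_relevance(morphemes, strength, threshold=1.0):
    """Sum the strengths of the morphemes found in the document (assumed rule)."""
    score = sum(strength.get(m, 0.0) for m in morphemes)
    return {"score": score, "relevant": score >= threshold}


def set_result(document, result):
    """Return a copy of the document record with the determination result attached."""
    tagged = dict(document)
    tagged["judgment"] = result
    return tagged


strength = {"price": 0.75, "adjust": 0.5, "lunch": -0.5}
document = {"id": "mail-001", "morphemes": ["price", "adjust", "meeting"]}

result = judge_relevance(document["morphemes"], strength)
tagged = set_result(document, result)
print(tagged["judgment"])
```

 Returning a copy rather than modifying the input keeps the original record unchanged, so the same document can be judged again after the strength information is updated.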
(Outline of digital information analysis method)
 FIG. 2 shows an example of the flow of processing in the digital information analysis system according to the embodiment of the present invention.
 First, the parameter set acquisition unit 14 acquires a weighting parameter set from the parameter set storage unit 10 or from information input to the input unit 12 (step 10; hereinafter, "step" is abbreviated as "S"). The parameter set acquisition unit 14 supplies the acquired weighting parameter set to the first update unit 20.
 Meanwhile, the metadata acquisition unit 16 acquires the metadata of the target digital information from the information processing apparatus 2 (S15). The metadata acquisition unit 16 supplies the acquired metadata to the first update unit 20. Based on the metadata received from the metadata acquisition unit 16 and the relationship information stored in the relationship information storage unit 18, the first update unit 20 corrects or updates the weighting parameter set received from the parameter set acquisition unit 14 (S20). The first update unit 20 supplies the corrected or updated weighting parameter set to the second update unit 22.
 The second update unit 22 uses the weighting parameter set received from the first update unit 20 to update the strength information indicating the strength of the relevance between the predetermined morpheme and the target digital information (S25). The second update unit 22 supplies the updated strength information to the relevance determination unit 24. The relevance determination unit 24 uses the updated strength information and the result of morphological analysis of the target digital information to determine the relevance between the target digital information and the predetermined specific matter (S30). The relevance determination unit 24 supplies information indicating the determination result to the determination result setting unit 26. The determination result setting unit 26 associates the information indicating the determination result with the target digital information. The output unit 28 then outputs, to the outside, the target digital information with which the information indicating the determination result is associated.
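 The order of steps S10 to S30 can be sketched as a small pipeline. Everything inside the step functions is a placeholder assumption: the scaling rule in `first_update`, the subtraction in `second_update`, the threshold in `judge`, and the dictionary layouts are hypothetical stand-ins chosen only so that the flow runs end to end.

```python
def first_update(parameter_set, metadata, relationship_info):
    # S20 (placeholder rule): scale each HOT weight by the factors
    # registered for the metadata properties present on the document.
    factor = 1.0
    for prop in metadata:
        factor *= relationship_info.get(prop, 1.0)
    return {m: {"hot": w["hot"] * factor, "non_hot": w["non_hot"]}
            for m, w in parameter_set.items()}


def second_update(parameter_set):
    # S25 (placeholder rule): strength = HOT weight minus Non-HOT weight.
    return {m: w["hot"] - w["non_hot"] for m, w in parameter_set.items()}


def judge(morphemes, strength, threshold=0.5):
    # S30 (placeholder rule): compare the summed strength with a threshold.
    return sum(strength.get(m, 0.0) for m in morphemes) > threshold


def analyze(document, parameter_set, relationship_info):
    metadata = document["metadata"]                                      # S15
    updated = first_update(parameter_set, metadata, relationship_info)   # S20
    strength = second_update(updated)                                    # S25
    relevant = judge(document["morphemes"], strength)                    # S30
    return {**document, "relevant": relevant}   # result associated with the document


parameter_set = {"price": {"hot": 0.5, "non_hot": 0.25}}   # S10: acquired set
relationship_info = {"p1": 2.0}
doc = {"id": "d1", "metadata": ["p1"], "morphemes": ["price", "meeting"]}
result = analyze(doc, parameter_set, relationship_info)
print(result["relevant"])
```

 In this toy run the property p1 doubles the HOT weight of "price", which is what moves the document over the threshold; the same document without p1 would not be flagged.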
 FIG. 3 shows an example of the hardware configuration of the digital information analysis system according to the embodiment of the present invention.
 The digital information analysis system 1 according to the present embodiment includes a CPU 1500; a graphic controller 1520; a memory 1530 such as a random access memory (RAM), a read-only memory (ROM), and/or a flash ROM; a storage device 1540 that stores data; a read/write device 1545 that reads data from and/or writes data to a recording medium; an input device 1560 for inputting data; a communication interface 1550 that transmits and receives data to and from external communication equipment; and a chipset 1510 that connects the CPU 1500, the graphic controller 1520, the memory 1530, the storage device 1540, the read/write device 1545, the input device 1560, and the communication interface 1550 so that they can communicate with one another.
 The chipset 1510 interconnects the memory 1530, the CPU 1500, which accesses the memory 1530 and executes predetermined processing, and the graphic controller 1520, which controls display on an external display device, and thereby passes data among these components. The CPU 1500 operates based on a program stored in the memory 1530 and controls each component. The graphic controller 1520 causes a predetermined display device to display an image based on image data temporarily stored in a buffer provided in the memory 1530.
 The chipset 1510 also connects the storage device 1540, the read/write device 1545, and the communication interface 1550. The storage device 1540 stores the programs and data used by the CPU 1500 of the digital information analysis system 1. The storage device 1540 is, for example, a flash memory. The read/write device 1545 reads a program and/or data from a storage medium that stores the program and/or data, and stores the read program and/or data in the storage device 1540. For example, the read/write device 1545 acquires a predetermined program from a server on the Internet via the communication interface 1550 and stores the acquired program in the storage device 1540.
 The communication interface 1550 transmits and receives data to and from external devices via a communication network. When the communication network is unavailable, the communication interface 1550 can also transmit and receive data to and from an external device without going through the communication network. The input device 1560, such as a keyboard, a tablet, or a mouse, is connected to the chipset 1510 via a predetermined interface.
 The digital information analysis program for the digital information analysis system 1 that is stored in the storage device 1540 is provided to the storage device 1540 via a communication network such as the Internet, or via a recording medium such as a magnetic recording medium or an optical recording medium. The digital information analysis program stored in the storage device 1540 is executed by the CPU 1500.
 The digital information analysis program executed by the digital information analysis system 1 according to the present embodiment acts on the CPU 1500 to cause the digital information analysis system 1 to function as the parameter set storage unit 10, the input unit 12, the parameter set acquisition unit 14, the metadata acquisition unit 16, the relationship information storage unit 18, the first update unit 20, the second update unit 22, the relevance determination unit 24, the determination result setting unit 26, and the output unit 28 described with reference to FIGS. 1 and 2.
(Effects of the embodiment)
 The digital information analysis system 1 according to the present embodiment refers to the metadata of documents and subdivides a plurality of document files according to predetermined criteria, such as whether a document corresponds to a HOT document having (or not having) a predetermined property or to a Non-HOT document having (or not having) a predetermined property. Because the weights of morphemes can be corrected or updated on this finer basis, a highly accurate forensic system can be provided.
 Although an embodiment of the present invention has been described above, the embodiment does not limit the invention according to the claims. It should also be noted that not all combinations of the features described in the embodiment are essential to the means for solving the problem of the invention. Furthermore, the technical elements of the embodiment described above may be applied alone, or may be divided into a plurality of parts, such as program components and hardware components, and applied.
DESCRIPTION OF SYMBOLS
 1    Digital information analysis system
 2    Information processing apparatus
 10   Parameter set storage unit
 12   Input unit
 14   Parameter set acquisition unit
 16   Metadata acquisition unit
 18   Relationship information storage unit
 20   First update unit
 22   Second update unit
 24   Relevance determination unit
 26   Determination result setting unit
 28   Output unit
 1500 CPU
 1510 Chipset
 1520 Graphic controller
 1530 Memory
 1540 Storage device
 1545 Read/write device
 1550 Communication interface
 1560 Input device

Claims (9)

  1.  A digital information analysis system comprising:
     a metadata acquisition unit that acquires metadata associated with digital information stored in an information processing apparatus;
     a first update unit that updates, based on a relationship between the metadata and first digital information related to a predetermined specific matter, a weighting parameter set in which a predetermined morpheme is associated with weighting information of the predetermined morpheme for the first digital information and with weighting information of the predetermined morpheme for second digital information not related to the predetermined specific matter; and
     a second update unit that updates the relevance between the predetermined morpheme and the digital information using the weighting parameter set updated by the first update unit.
  2.  The digital information analysis system according to claim 1, further comprising:
     a parameter set acquisition unit that acquires the weighting parameter set; and
     a relationship information storage unit that stores relationship information indicating a relationship between the metadata and the first digital information,
     wherein the digital information is target digital information that is an investigation target stored in the information processing apparatus,
     the metadata acquisition unit acquires at least one of a plurality of pieces of metadata associated with the target digital information,
     the first update unit updates the weighting parameter set based on the relationship information stored in the relationship information storage unit in association with the metadata acquired by the metadata acquisition unit, and
     the second update unit updates, using the weighting parameter set, strength information indicating the strength of the relevance between the predetermined morpheme and the target digital information.
  3.  The digital information analysis system according to claim 2, wherein the weighting information of the predetermined morpheme includes appearance frequency information indicating the frequency with which the predetermined morpheme appears in the first digital information or the second digital information.
  4.  The digital information analysis system according to any one of claims 1 to 3, wherein the metadata is structural metadata or descriptive metadata.
  5.  The digital information analysis system according to any one of claims 1 to 4, wherein
     the parameter set acquisition unit acquires a scoring parameter including strength information associated in advance with each of a plurality of morphemes,
     the first update unit updates the scoring parameter acquired by the parameter set acquisition unit based on the relationship information stored in the relationship information storage unit in association with the metadata acquired by the metadata acquisition unit, and
     the second update unit updates the strength information using the scoring parameter updated by the first update unit.
  6.  The digital information analysis system according to any one of claims 1 to 5, further comprising:
     a relevance determination unit that determines the relevance of the target digital information to the predetermined specific matter based on a result of morphological analysis of the target digital information and the strength information; and
     a determination result setting unit that associates the determination result of the relevance determination unit with the target digital information.
  7.  The digital information analysis system according to any one of claims 1 to 6, wherein the predetermined specific matter is information indicating a relation to a lawsuit.
  8.  A digital information analysis method comprising:
     a metadata acquisition step of acquiring metadata associated with digital information stored in an information processing apparatus;
     a first update step of updating, based on a relationship between the metadata and first digital information related to a predetermined specific matter, a weighting parameter set in which a predetermined morpheme is associated with weighting information of the predetermined morpheme for the first digital information and with weighting information of the predetermined morpheme for second digital information not related to the predetermined specific matter; and
     a second update step of updating the relevance between the predetermined morpheme and the digital information using the weighting parameter set updated in the first update step.
  9.  A digital information analysis program that causes a computer to realize:
     a metadata acquisition function of acquiring metadata associated with digital information stored in an information processing apparatus;
     a first update function of updating, based on a relationship between the metadata and first digital information related to a predetermined specific matter, a weighting parameter set in which a predetermined morpheme is associated with weighting information of the predetermined morpheme for the first digital information and with weighting information of the predetermined morpheme for second digital information not related to the predetermined specific matter; and
     a second update function of updating the relevance between the predetermined morpheme and the digital information using the weighting parameter set updated by the first update function.
PCT/JP2014/073260 2013-09-10 2014-09-03 Digital information analysis system, digital information analysis method, and digital information analysis program WO2015037498A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2013-187084 2013-09-10
JP2013187084 2013-09-10

Publications (1)

Publication Number Publication Date
WO2015037498A1 true WO2015037498A1 (en) 2015-03-19

Family

ID=52665604

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2014/073260 WO2015037498A1 (en) 2013-09-10 2014-09-03 Digital information analysis system, digital information analysis method, and digital information analysis program

Country Status (2)

Country Link
TW (1) TW201510922A (en)
WO (1) WO2015037498A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003016106A (en) * 2001-06-29 2003-01-17 Fuji Xerox Co Ltd Device for calculating degree of association value
JP2006244028A (en) * 2005-03-02 2006-09-14 Nippon Hoso Kyokai <Nhk> Information exhibition device and information exhibition program
JP2011209930A (en) * 2010-03-29 2011-10-20 Ubic:Kk Forensic system, forensic method, and forensic program
WO2013129548A1 (en) * 2012-02-29 2013-09-06 株式会社Ubic Document classification system, document classification method, and document classification program

Also Published As

Publication number Publication date
TW201510922A (en) 2015-03-16

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14843669

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14843669

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP