WO2016042762A1

WO2016042762A1 - Information-generating device, information-generating method, and recording medium

Info

Publication number: WO2016042762A1
Application number: PCT/JP2015/004707
Authority: WO
Inventors: 健太郎園田; かや人関谷; 由也木津
Original assignee: 日本電気株式会社
Priority date: 2014-09-19
Filing date: 2015-09-16
Publication date: 2016-03-24
Also published as: JPWO2016042762A1; JP6436171B2

Abstract

Provided is art for generating dummy information that is less readily distinguished by attackers as dummy information. This information-generating device comprises: an analysis unit that breaks down, into words, a character string included in constituent element information related to constituent elements of a system; an associated-word determination unit that, on the basis of concept information and in relation to words that are among the aforementioned words and are included in the concept information, determines associated words for said words; and a synthesis unit that generates dummy information comprising a character string that is different from the character string included in the constituent element information. Said dummy information is generated by combining: the associated words; and either words that come before and/or after the words in the character string prior to being broken down into the words used to determine the associated words, or associated words for said words.

Description

Information generating apparatus, information generating method, and recording medium

The present invention relates to an information generation device, an information generation method, and a recording medium.

Recently, defensive measures against cyber attacks on companies and social infrastructure have been under consideration. Such measures typically monitor, detect, and block cyber attacks and virus intrusions.

However, cyber attacks show no sign of stopping, and because attack methods keep evolving and it is technically difficult to fully guarantee attack detection accuracy, it is very difficult to completely protect companies and social infrastructure against virus intrusion.

Therefore, it is necessary to consider defensive measures based on the premise that a cyber attack has infiltrated the network that constitutes a company or social infrastructure, or that a virus has already infiltrated the inside.

Patent Document 1 describes a method of identifying the source of a leak when personal information or the like leaks to the outside. In the technique described in Patent Document 1, dummy data is mixed into the search results of a database and is associated with the identification information of the search requester, so that a search requester who leaks customer data containing this dummy data can be identified.

Patent Document 2 describes an information processing apparatus that displays dummy data on a display unit when a condition for releasing the security lock is not satisfied.

Further, as a data generation method, Patent Document 3 discloses a method of dividing each of two words into two parts and connecting the first half of one word to the second half of the other word. Patent Document 4 discloses a method of assigning priorities to a plurality of terms related to a certain term.

Also, Patent Document 5 describes a method for increasing or decreasing the value of sequential number phrases indicating the order of numbers, English letters, symbols, and the like.

Patent Document 1: JP 2005-222135 A
Patent Document 2: JP 2013-250776 A
Patent Document 3: JP 2009-271784 A
Patent Document 4: JP-A-10-214271
Patent Document 5: JP-A-8-171559

When dummy information such as dummy data is used to identify an attacker who has entered the network, the dummy information is preferably difficult for the attacker to identify as dummy information. That is, it is preferable that the dummy information be difficult to distinguish from regular information, not look out of place, and be readily picked up by an attacker. Here, regular information means information that already exists (is in use).

In the technique described in Patent Document 1, dummy data is generated by using synonyms, or by defining words that can plausibly serve as address names or the like and combining them as appropriate.

In the technique described in Patent Document 2, dummy data is generated by replacing a part of information in data usable as dummy data.

Thus, there is a problem that dummy data generated by rewriting or combining the original information (data) is likely to be recognized as a dummy by an attacker.

Also, in the method described in Patent Document 3, in which words are divided at an arbitrary position and the resulting fragments are connected, the generated character string becomes a mere sequence of characters that makes no sense. Such dummy data is also likely to be recognized as a dummy by an attacker.

In addition, when the dummy information is a static list prepared in advance, it may be easily recognized as dummy information, depending on the ICT (Information and Communication Technology) system to which it is applied. Further, when the regular information of the target ICT system is organized and dummy information suited to that system is generated manually, generating the dummy information takes an enormous amount of time as the amount of regular information and the number of target ICT systems increase.

The present invention has been made in view of the above-described problems, and an object of the present invention is to provide a technique for generating dummy information that is more difficult for an attacker to identify as dummy information.

An information generation apparatus according to an aspect of the present invention includes: an analysis unit that decomposes a character string included in component information relating to a component of a system into words; an associative word determination unit that determines, for a word that is among the decomposed words and is included in concept information, an associative word of that word based on the concept information; and a synthesis unit that generates dummy information including a character string different from the character string included in the component information by combining the associative word with either a word that precedes and/or follows, in the character string before decomposition, the word used to determine the associative word, or an associative word of such a word.

An information generation method according to an aspect of the present invention includes: decomposing a character string included in component information relating to a component of a system into words; determining, for a word that is among the decomposed words and is included in concept information, an associative word of that word based on the concept information; and generating dummy information including a character string different from the character string included in the component information by combining the associative word with either a word that precedes and/or follows, in the character string before decomposition, the word used to determine the associative word, or an associative word of such a word.

Note that a computer program that realizes the information generation apparatus or the information generation method by a computer and a computer-readable storage medium that stores the computer program are also included in the scope of the present invention.

According to the present invention, it is possible to generate dummy information that is more difficult for an attacker to identify as dummy information.

A functional block diagram showing an example of the functional configuration of the information generation apparatus according to the first embodiment of the present invention.
A diagram showing an example of the configuration of the information generation system according to the second embodiment of the present invention.
A functional block diagram showing an example of the functional configuration of the information generation apparatus according to the second embodiment of the present invention.
A flowchart showing an example of the processing flow of the information generation apparatus according to the second embodiment of the present invention.
A diagram for explaining the operation of the analysis unit of the information generation apparatus according to the second embodiment of the present invention.
A diagram for explaining the structure of the concept information stored in the storage unit of the information generation apparatus according to the second embodiment of the present invention.
A diagram for explaining the operation of the synthesis unit of the information generation apparatus according to the second embodiment of the present invention.
A diagram for explaining the operation when the component information collected by the information generation apparatus according to the second embodiment of the present invention is a file name.
A diagram for explaining the operation when the component information collected by the information generation apparatus according to the second embodiment of the present invention is a mail address.
A diagram for explaining the operation when the component information collected by the information generation apparatus according to the second embodiment of the present invention is a mail address.
A diagram for explaining the operation when the component information collected by the information generation apparatus according to the second embodiment of the present invention is a URI (Uniform Resource Identifier).
A diagram for explaining the structure of the concept information stored in the storage unit of the information generation apparatus according to the third embodiment of the present invention.
A functional block diagram showing an example of the functional configuration of the information generation apparatus according to the fifth embodiment of the present invention.
A diagram illustrating the hardware configuration of a computer (information processing apparatus) that can implement each embodiment of the present invention.

<First Embodiment>
A first embodiment of the present invention will be described in detail with reference to the drawings. FIG. 1 is a functional block diagram showing an example of a functional configuration of the information generation apparatus 10 according to the first embodiment of the present invention. Note that FIG. 1 shows the configuration characteristic of the present invention; needless to say, the information generation apparatus 10 may include members not shown in FIG. 1. The direction of the arrows in the drawings is merely an example and does not limit the direction of signals between blocks. The same applies to the other block diagrams referred to hereinafter.

As shown in FIG. 1, the information generation apparatus 10 includes an analysis unit 101, an associative word determination unit 102, and a synthesis unit 103.

The analysis unit 101 analyzes a character string included in component information relating to a component of the system, such as a server, and decomposes it into one or more words. The component information includes, for example, a host name indicating each component, a user account used to access each component, a file name indicating an information resource stored in each component, and a URI (Uniform Resource Identifier) indicating the location of an information resource, but the component information of the present embodiment is not limited to these. The component information may include, for example, the email address of a user who uses each component. Since this component information is information actually used in the system, it is also called regular information of the system.

When the component information includes a plurality of character strings, the analysis unit 101 analyzes the character string for each character string and decomposes the character string into one or more words. Then, the analysis unit 101 outputs the decomposed word to the associative word determination unit 102 and the synthesis unit 103.

The associative word determination unit 102 receives the decomposed word from the analysis unit 101. Then, the associative word determination unit 102 confirms whether or not the received word is included in the concept information, and determines an associative word of the word based on the concept information for the word included in the concept information. The associative word determination unit 102 outputs the determined associative word to the synthesis unit 103.

The synthesis unit 103 receives the decomposed words from the analysis unit 101 and the associative words from the associative word determination unit 102. The synthesis unit 103 combines an associative word with either (a) a word that, in the character string before decomposition, precedes or follows the word used to determine the associative word, or (b) an associative word of such a preceding or following word. In this way, the synthesis unit 103 generates a character string different from the character string included in the component information.

For example, when the character strings included in the component element information are “spring01” and “fall02”, the analysis unit 101 decomposes “spring01” into “spring” and “01”. Also, the analysis unit 101 decomposes “fall02” into “fall” and “02”.

Then, the associative word determination unit 102 checks whether “spring”, “fall”, “01”, and “02” are included in the concept information. When “spring” and “fall” are included in the concept information, the associative word determination unit 102 determines, based on the concept information, associative words of “spring” and “fall” (for example, “winter” and “autumn”).

Then, the synthesis unit 103 combines (1) “winter” with (2) the word that follows, in the character string before decomposition, the word used to determine “winter”.

The number of words used to determine this associative word may be one or plural. For example, when “spring” and “fall” share the superordinate concept “season”, the associative word determination unit 102 determines an associative word “winter” for the two words. Thus, when an associative word is determined using two words, the character strings before the word decomposition used in the determination of “winter” in the above (2) are “spring01” and “fall02”. The words following the word used to determine “winter” are “01” and “02”. The same is true for “autumn”.

Thereby, the synthesis unit 103 generates “winter01”, “winter02”, “autumn01”, and “autumn02”. These character strings are different from the character strings “spring01” and “fall02” included in the component information.

As described above, the synthesis unit 103 can generate dummy information including the generated character string.
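The flow of this example can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the implementation of the embodiment: the `CONCEPTS` table, the regular-expression decomposition, and the function names are all hypothetical, and the sketch assumes that the word found in the concept information is the first word of each character string.

```python
import re

# Hypothetical concept information: superordinate concept -> subordinate words.
CONCEPTS = {"season": ["spring", "summer", "fall", "autumn", "winter"]}


def decompose(text):
    """Split a string into runs of letters and runs of digits.

    A simple stand-in for the language analysis of the analysis unit.
    """
    return re.findall(r"[A-Za-z]+|\d+", text)


def associative_words(words):
    """Return words sharing a superordinate concept with any given word."""
    found = []
    for members in CONCEPTS.values():
        if any(word in members for word in words):
            for member in members:
                if member not in words and member not in found:
                    found.append(member)
    return found


def generate_dummy_strings(regular_strings):
    """Combine associative words with the words that followed the originals."""
    parts = [decompose(s) for s in regular_strings]
    heads = [p[0] for p in parts if p]  # assumed: concept word comes first
    tails = sorted({"".join(p[1:]) for p in parts if p})
    dummies = []
    for assoc in associative_words(heads):
        for tail in tails:
            candidate = assoc + tail
            # Keep only strings that differ from the regular information.
            if candidate not in regular_strings and candidate not in dummies:
                dummies.append(candidate)
    return dummies


print(generate_dummy_strings(["spring01", "fall02"]))
```

With this hypothetical table the result contains “winter01”, “winter02”, “autumn01”, and “autumn02” from the example, together with the corresponding “summer” strings, because the sketch uses every sibling word rather than selecting two of them.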

(effect)
When deciding on component information such as a host name, users tend to name it based on heuristics such as classification, category, and serial number. Since the dummy information generated by the information generation apparatus 10 according to the present embodiment is generated based on the concept information, it is difficult to distinguish from component information named according to such heuristics. Further, since the dummy information is generated based on the concepts of the words in the component information, it does not look out of place even when used as component information.

Thus, the information generation apparatus 10 according to the present embodiment can generate dummy information that is more difficult for an attacker to identify as dummy information. That is, the information generation apparatus 10 according to the present embodiment can automatically generate dummy information that an attacker will use without noticing that it is a dummy.

A system in which dummy information generated in this way is deployed can detect attackers more readily.

<Second Embodiment>
Next, a second embodiment based on the above-described first embodiment will be described. For convenience of explanation, members having the same functions as those included in the drawings described in the first embodiment described above are given the same reference numerals, and descriptions thereof are omitted.

First, the configuration of the information generation system 1 according to the present embodiment will be described. FIG. 2 is a diagram illustrating an example of the configuration of the information generation system 1 according to the present embodiment. As shown in FIG. 2, the information generation system 1 according to the present embodiment includes an information generation device 100 and an in-company system 300. The information generation apparatus 100 and the in-company system 300 are connected via a network 200 so that they can communicate with each other.

The in-company system 300 is a system using ICT (Information and Communication Technology). In the present embodiment, an example of an ICT system will be described taking a system built in a company as an example, but the ICT system of the present embodiment is not limited to this. The ICT system only needs to be an environment where services can be used via a network or the like.

The in-company system 300 includes various devices such as a server, a client, and a storage. Hereinafter, these are referred to as components of the in-company system 300.

The information generation device 100 is a device that generates dummy information. The configuration of the information generation apparatus 100 will be described with reference to different drawings.

(Configuration of information generating apparatus 100)
Next, the functional configuration of the information generation apparatus 100 according to the present embodiment will be described with reference to FIG. 3. FIG. 3 is a functional block diagram illustrating an example of a functional configuration of the information generation apparatus 100 according to the present embodiment. The information generation apparatus 100 according to the present embodiment has a configuration in which the information generation apparatus 10 described in the first embodiment further includes a sequential number generation unit 104, a collection unit 110, and a storage unit 120. Specifically, as illustrated in FIG. 3, the information generation apparatus 100 includes an analysis unit 101, an associative word determination unit 102, a synthesis unit 103, a sequential number generation unit 104, a collection unit 110, and a storage unit 120.

The collection unit 110 is means for collecting component information regarding each component of the in-company system 300 via the network 200.

The collection unit 110 collects component information from, for example, a directory service. Then, the collection unit 110 outputs the collected component information (hereinafter referred to as collection data) to the analysis unit 101. Note that the collection unit 110 may store the collected component information in, for example, the storage unit 120 or the collection unit 110 described later. The collection unit 110 may be configured to collect data of a specific type (for example, a host name) among the component element information regarding each component of the in-company system 300.

The analysis unit 101 receives the collected data from the collection unit 110. Similarly to the analysis unit 101 in the first embodiment, the analysis unit 101 performs language analysis on one or more character strings included in the collected data (component information) and decomposes each character string into one or more words. In the present embodiment, the analysis unit 101 uses morphological analysis as the language analysis, but the language analysis method of the present embodiment is not limited to this. The analysis unit 101 may perform language analysis using other methods.

The analysis unit 101 outputs the one or more words decomposed for each character string (referred to as decomposed data) to the associative word determination unit 102, the synthesis unit 103, and the sequential number generation unit 104. At this time, the analysis unit 101 assigns an attribute to each of the decomposed words. This attribute includes information indicating the position of the word in the character string before decomposition (the character string included in the collected data). For example, when the analysis unit 101 decomposes the character string “AAA01” into “AAA” and “01”, the attribute of “AAA” contains information indicating that the word is the first part of the character string before decomposition. The information included in the attribute is not limited to this. The attribute may include, for example, information that there is a space after the decomposed word, or information indicating the character string before decomposition.

Note that when the collected data includes multiple types of component information, the analysis unit 101 may perform analysis processing by extracting data of the analysis target type from the collected data.

The storage unit 120 stores conceptual information prepared in advance. Concept information is information including a concept dictionary, which is a dictionary for defining the concept of words. The concept information is not limited to this, and may include, for example, antonyms, synonyms, and the like of a certain word. The conceptual information may be changed as appropriate according to the components in the in-company system 300.

In the present embodiment, the configuration in which the storage unit 120 is built in the information generating apparatus 100 will be described as an example, but the configuration related to the storage unit 120 is not limited to this. The storage unit 120 may be realized by a storage device that is separate from the information generation device 100.

The associative word determination unit 102 determines each associative word of one or more words decomposed by the analysis unit 101 based on the concept information, similarly to the associative word determination unit 102 of the first embodiment. Specifically, the associative word determination unit 102 receives the decomposed data decomposed by the language analysis from the analysis unit 101. Then, the associative word determination unit 102 refers to the concept information stored in the storage unit 120 and confirms whether one or more words included in the decomposition data are included in the concept information. The associative word determination unit 102 determines another word having the same superordinate concept as that of the word included in the concept information as an associative word for the word.

The associative word determination unit 102 assigns, to the determined associative word, the attribute of the word used to determine it (a word included in the decomposed data output from the analysis unit 101) as the attribute of the associative word. For example, when the attribute of the word used to determine the associative word (referred to as the original word) includes information indicating its position in the character string before decomposition, the associative word determination unit 102 includes that position information in the attribute of the associative word. Note that the attribute of the associative word may also include information indicating the original word.

Then, the associative word determination unit 102 outputs the determined associative word to the synthesis unit 103.

The sequential number generation unit 104 receives the decomposed data from the analysis unit 101. Then, the sequential number generation unit 104 identifies, among the received words, a word from which serial numbers or other consecutive phrases (hereinafter referred to as sequential number phrases) can be generated. Such a word is, for example, a consecutive number or a letter of the alphabet. Words from which sequential number phrases can be generated are not limited to numbers and letters of the alphabet, and may be phrases such as “α, β, γ, ...”. That is, such a word may be any word included in a predetermined array.

The sequential number generation unit 104 generates sequential number phrases for the identified word. That is, the sequential number generation unit 104 extracts, as sequential number phrases, words that are included in the predetermined array containing the identified word and that differ from the identified word. The sequential number generation unit 104 assigns, to each generated sequential number phrase, the attribute of the word from which it was generated (the original word) as the attribute of the sequential number phrase. For example, when the attribute of the original word includes information indicating its position in the character string before decomposition, the sequential number generation unit 104 includes that position information in the attribute of the sequential number phrase. Note that the attribute of the sequential number phrase may also include information indicating the original word.

Then, the sequential number generation unit 104 outputs the generated sequential number phrases to the synthesis unit 103.
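The generation of sequential number phrases can be sketched as follows. This is a minimal sketch under assumptions that are not part of the embodiment: the `ARRAYS` table and the function name are hypothetical, and the sketch returns only the entries that follow the identified word, whereas the description above only requires entries of the same array that differ from it.

```python
import string

# Hypothetical predetermined arrays of ordered words.
ARRAYS = [list(string.ascii_lowercase), ["α", "β", "γ", "δ", "ε"]]


def sequential_number_phrases(word, count=2):
    """Return up to `count` phrases that follow `word` in its sequence."""
    if word.isascii() and word.isdigit():
        # Preserve zero padding, e.g. "01" -> "02", "03".
        width = len(word)
        return [str(int(word) + i).zfill(width) for i in range(1, count + 1)]
    for array in ARRAYS:
        if word in array:
            start = array.index(word) + 1
            return array[start:start + count]
    # The word does not belong to any known sequence.
    return []


print(sequential_number_phrases("01"))
print(sequential_number_phrases("a"))
print(sequential_number_phrases("test"))
```

A fuller version would also copy the attribute of the original word (for example, its position in the character string before decomposition) to each generated phrase, as described above.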

The synthesis unit 103 receives the decomposed data from the analysis unit 101, the associative words from the associative word determination unit 102, and the sequential number phrases from the sequential number generation unit 104. The synthesis unit 103 combines (synthesizes) an associative word with either a word that, in the character string before decomposition, precedes or follows the word used to determine the associative word, or an associative word of such a preceding or following word. A specific example of the combining method of the synthesis unit 103 will be described with reference to different drawings.

(Processing flow of information generating apparatus 100)
Next, processing of the information generation apparatus 100 according to the present embodiment will be described with reference to FIGS. 4 to 7. FIG. 4 is a flowchart illustrating an example of a processing flow of the information generation apparatus 100 according to the present embodiment. FIGS. 5 to 7 are diagrams for explaining the operation of the information generation apparatus 100 according to the present embodiment. In the following, the case where the component information collected by the collection unit 110 is a host name is described as an example.

As shown in FIG. 4, the collection unit 110 collects component information (step S41). Then, the analysis unit 101 performs language analysis on the character strings included in the collected component information (collected data) and decomposes each of them into one or more words (step S42).

Here, the process of step S42 will be described with an example. FIG. 5 is a diagram for explaining the operation of the analysis unit 101 of the information generation apparatus 100 according to the present embodiment.

The collected data collected by the collection unit 110 includes the four host names “spring-a”, “fall”, “test01”, and “test02” shown on the left side of FIG. 5. The analysis unit 101 performs language analysis on each of these host names and decomposes them into one or more words. That is, the analysis unit 101 generates decomposed data including the six words “spring”, “-a”, “fall”, “test”, “01”, and “02”. Here, the analysis unit 101 may include the number of appearances in the attribute of a word that appears more than once (in this example, “test”).

In addition, the analysis unit 101 includes, in the attributes of “spring” and “test”, information indicating that the word is the first part of the character string before decomposition, and includes, in the attributes of “-a”, “01”, and “02”, information indicating that the word is the last part of the character string before decomposition. Further, the analysis unit 101 may include, in the attribute of “fall”, information indicating that the character string was not decomposed.

The analysis unit 101 outputs the decomposed data including these words to the associative word determination unit 102, the synthesis unit 103, and the sequential number generation unit 104.
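The decomposition of step S42 and the attributes attached to each word can be sketched as follows. The embodiment uses morphological analysis; the regular expression below is only a simple stand-in, and the `Word` class, its field names, and the position labels are hypothetical names introduced for illustration. When the same word appears at different positions, this sketch keeps only the first position it sees.

```python
import re
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    position: str   # "first", "middle", "last", or "whole" (not decomposed)
    count: int = 1  # number of appearances across the collected data


def analyze(host_names):
    """Decompose host names into words and attach position attributes."""
    words = {}
    for name in host_names:
        # Letters, a hyphen followed by letters, or digits (assumed token shapes).
        tokens = re.findall(r"[A-Za-z]+|-[A-Za-z]+|\d+", name)
        for index, token in enumerate(tokens):
            if len(tokens) == 1:
                position = "whole"
            elif index == 0:
                position = "first"
            elif index == len(tokens) - 1:
                position = "last"
            else:
                position = "middle"
            if token in words:
                words[token].count += 1  # overlapping word, e.g. "test"
            else:
                words[token] = Word(token, position)
    return list(words.values())


for word in analyze(["spring-a", "fall", "test01", "test02"]):
    print(word)
```

For the four host names above, the sketch yields the six words of the example, with “test” carrying an appearance count of 2 and “fall” marked as not decomposed.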

Returning to FIG. 4, the continuation of the processing by the information generating apparatus 100 will be described. After step S42, the associative word determination unit 102 refers to the storage unit 120 and determines an associative word of the word included in the conceptual information among one or more words included in the decomposed data (step S43).

Here, the process of step S43 will be further described with reference to FIG. FIG. 6 is a diagram for explaining the configuration of the conceptual information stored in the storage unit 120. As shown in FIG. 6, the storage unit 120 stores tree-structured conceptual information. Note that the data structure of the concept information is not limited to this, and any structure may be used as long as the superordinate concept and / or subordinate concept of a word can be understood.

FIG. 6 includes “spring”, “summer”, “winter”, “fall”, “autumn”, and the like as examples of concept information. “season”, which is the superordinate concept of these words, is associated with each of them (for example, with “spring”) as its superordinate concept. Similarly, “xxx” is associated with “season” as the superordinate concept of “season”.

In FIG. 6, one superordinate concept is associated as a superordinate concept of a word, but the present embodiment is not limited to this. One word may be associated with a plurality of superordinate concepts. For example, “spring” may be associated with a superordinate concept “elastic body”.

The associative word determination unit 102 checks whether or not “spring” included in the decomposition data is included in the concept information, and if included, searches for a superordinate concept of the word. When the superordinate concept “season” of “spring” is searched, the associative word determination unit 102 determines an arbitrary word from the subordinate concepts of “season” as an associative word. In this example, it is assumed that the associative word determination unit 102 determines “winter” as an associative word of “spring”. Similarly, the associative word determination unit 102 determines “autumn” as an associative word of “fall”.
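The lookup described here can be sketched as follows. The dictionary below is a hypothetical encoding of the tree of FIG. 6 in which each word points to a single superordinate concept, and the function names are assumptions made for illustration; the sketch returns all candidates and leaves the arbitrary choice of one of them to the caller.

```python
# Hypothetical tree-structured concept information: word -> superordinate concept.
SUPERORDINATE = {
    "spring": "season",
    "summer": "season",
    "winter": "season",
    "fall": "season",
    "autumn": "season",
    "season": "xxx",
}


def subordinates(concept):
    """Return the words whose superordinate concept is `concept`."""
    return [w for w, parent in SUPERORDINATE.items() if parent == concept]


def associative_candidates(word):
    """Return the other words that share the superordinate concept of `word`."""
    concept = SUPERORDINATE.get(word)
    if concept is None:
        return []  # the word is not included in the concept information
    return [w for w in subordinates(concept) if w != word]


print(associative_candidates("spring"))
print(associative_candidates("test"))
```

Any element of the returned list, such as “winter” for “spring”, could then be chosen as the associative word.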

Also, the associative word determination unit 102 checks whether “test” is included in the concept information. In this example, it is assumed that “test” is not included in the concept information. Further, the associative word determination unit 102 checks whether “-a”, “01”, and “02” are included in the concept information. In this example, it is assumed that these words are not included in the concept information either.

As described above, the associative word determination unit 102 checks, for every word included in the decomposed data, whether the word is included in the concept information, and determines an associative word for each word that is included. Then, the associative word determination unit 102 includes the attribute of the original word in the attribute of the determined word. That is, the associative word determination unit 102 includes, in the attribute of “winter”, information indicating that the word is the first part of the character string before decomposition. Further, the associative word determination unit 102 may include, in the attribute of “autumn”, information indicating that the character string was not decomposed. According to the configuration of FIG. 6, “autumn” can also be an associative word of “spring”, and “winter” can be an associative word of “fall”. Therefore, the associative word determination unit 102 may, for example, include the attribute of “fall” in the attribute of “winter”.

In addition, when the decomposed data includes a plurality of words having the same superordinate concept, the associative word determination unit 102 preferably uses these words together to determine their superordinate concept. For example, when the superordinate concepts of “spring” are “season” and “elastic body”, the associative word determination unit 102 might otherwise determine a subordinate concept of “elastic body” as an associative word of “spring”.

Therefore, the associative word determination unit 102 searches for a superordinate concept common to “spring” and “fall”. That is, the associative word determination unit 102 confirms whether the superordinate concept of “spring” and the superordinate concept of “fall” are the same. When the superordinate concepts are the same, the associative word determination unit 102 retrieves this superordinate concept (“season”) and determines, as associative words of “spring” and “fall”, words that are subordinate concepts of “season” other than “spring” and “fall”. As a result, the information generation apparatus 100 can generate dummy information that is difficult for an attacker to identify as a dummy.

At this time, the associative word determination unit 102 preferably determines at least as many associative words as the number of original words used to determine them. Thereby, the information generation apparatus 100 can generate at least as many pieces of dummy information as there are pieces of regular information. Note, however, that the number of words determined as associative words is arbitrary and need not equal the number of original words used to determine them.

Then, the associative word determination unit 102 includes attributes of “spring” and “fall” in the respective attributes of “winter” and “autumn”, which are words determined as associative words. That is, the associative word determination unit 102 includes, in each of the attributes “winter” and “autumn”, information indicating that the word is the first part of the character string before decomposition and that it is not decomposed.

Then, the associative word determination unit 102 outputs “winter” and “autumn” to the synthesis unit 103.
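The selection of associative words through a shared superordinate concept can be sketched as follows. This is a minimal illustration, not the implementation of the associative word determination unit 102: the concept table `CONCEPTS`, the function names, and the alphabetical tie-break among shared concepts are all assumptions introduced for the example.

```python
import random

# Hypothetical concept information: superordinate concept -> subordinate words.
CONCEPTS = {
    "season": ["spring", "summer", "fall", "autumn", "winter"],
    "elastic body": ["spring", "rubber", "coil"],
}

def superordinates(word):
    """Return the set of superordinate concepts that list `word`."""
    return {parent for parent, children in CONCEPTS.items() if word in children}

def associative_words(words, count, rng=random):
    """Pick up to `count` words under a concept shared by all known `words`."""
    known = [w for w in words if superordinates(w)]
    if not known:
        return []
    shared = set.intersection(*(superordinates(w) for w in known))
    if not shared:
        return []
    parent = sorted(shared)[0]  # arbitrary tie-break (assumption)
    candidates = [w for w in CONCEPTS[parent] if w not in known]
    return rng.sample(candidates, min(count, len(candidates)))

# "spring" alone is ambiguous; together with "fall" only "season" is shared.
picked = associative_words(["spring", "fall", "test", "-a"], 2)
```

With both “spring” and “fall” present, only subordinate concepts of “season” are candidates, which is the reason for searching a common superordinate concept. With “spring” alone, this sketch would fall back to “elastic body”.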

Also, the associative word determination unit 102 may give the same attributes to words having the same superordinate concept. For example, as shown in FIG. 6, since “spring” and “fall” have the same superordinate concept, the associative word determination unit 102 may include the attribute of “fall” in the attribute of “spring”, and the attribute of “spring” in the attribute of “fall”. The associative word determination unit 102 then outputs, to the synthesis unit 103, information indicating that the attributes of the words included in the decomposed data have been changed.

Returning to FIG. 4, the continuation of the processing by the information generating apparatus 100 will be described. The serial number generation unit 104 generates serial number phrases for those words, among the one or more words included in the decomposed data, from which serial number phrases can be generated (step S44). Step S44 may be performed after step S42, simultaneously with step S43, or before step S43.

For example, when receiving the decomposed data as shown on the right side of FIG. 5, the serial number generation unit 104 identifies “−a”, “01”, and “02” as words having continuity. The serial number generation unit 104 generates “−b” and “−c” based on “−a”. The serial number generation unit 104 generates “03” and “04” based on “01” and “02”. Note that the number of serial number phrases generated by the serial number generation unit 104 is not particularly limited.

Then, the serial number generation unit 104 includes the information held in the attributes of “−a”, “01”, and “02”, that is, information indicating that the word is the last part of the character string before decomposition, in the attributes of “−b”, “−c”, “03”, and “04”, respectively. The serial number generation unit 104 then outputs the generated serial number phrases to the synthesis unit 103.
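The generation of serial number phrases can be sketched as follows. This is only an illustration under stated assumptions: the function name `next_serials` is hypothetical, tokens are assumed to be either zero-padded digits or a single lowercase letter after an optional non-letter prefix, and an ASCII hyphen is used in place of the “−” shown in the text.

```python
import re
import string

def next_serials(samples, count):
    """Continue a run of same-pattern tokens such as ["01", "02"] or ["-a"]."""
    last = max(samples)  # assumes tokens of equal width, so string order works
    m = re.fullmatch(r"(\D*?)(\d+)", last)
    if m:
        prefix, digits = m.groups()
        start = int(digits) + 1
        width = len(digits)  # keep the zero padding of the original token
        return [f"{prefix}{n:0{width}d}" for n in range(start, start + count)]
    m = re.fullmatch(r"([^a-z]*)([a-z])", last)
    if m:
        prefix, letter = m.groups()
        idx = string.ascii_lowercase.index(letter) + 1
        return [prefix + c for c in string.ascii_lowercase[idx:idx + count]]
    return []  # the token has no recognizable continuity

print(next_serials(["01", "02"], 2))  # ['03', '04']
print(next_serials(["-a"], 2))        # ['-b', '-c']
```

A word such as “test” matches neither pattern and yields no serial number phrases, which corresponds to a word that cannot generate a serial number phrase.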

After step S43 and step S44 are completed, the synthesis unit 103 generates dummy information by performing synthesis processing using the associative words “winter” and “autumn”, the serial number phrases “−b”, “−c”, “03”, and “04”, and the words included in the decomposed data (step S45).

In this way, by using the serial number phrases generated by the serial number generation unit 104 for synthesis, the information generating apparatus 100 can generate, in a greater variety of patterns, dummy information that is difficult for an attacker to identify as a dummy.

Here, with reference to FIG. 7, the process of step S45 will be further described. FIG. 7 is a diagram for explaining the operation of the combining unit 103 of the information generating apparatus 100 according to the present embodiment.

As shown in FIG. 7, the synthesis unit 103 combines the following (A) and (B).
(A) an associative word or a word that cannot generate a serial number phrase among the words received from the analysis unit 101 (described as decomposition data A in FIG. 7),
(B) A serial number phrase or a word that can generate a serial number phrase among the words received from the analysis unit 101 (described as decomposition data B in FIG. 7).

At this time, the synthesis unit 103 groups the words having the same attribute information into one array in each of (A) and (B), and generates dummy information by combining the elements of these arrays. In this example, the attributes of the associative words and of decomposition data A included in (A) contain information indicating that the word is the first part of the character string before decomposition, and the attributes of the serial number phrases and of decomposition data B included in (B) contain information indicating that the word is the last part of the character string before decomposition. Therefore, the synthesis unit 103 treats the associative words and decomposition data A included in (A) as one array, and the serial number phrases and decomposition data B included in (B) as another array, and combines the elements (words) of the two arrays. In FIG. 7, the “X” between the arrays indicates that the elements of the arrays are combined.

At this time, so that a combined character string is not the same as an original character string (regular information), the synthesis unit 103 uses the component element information held in the storage unit 120 or the collection unit 110 to confirm that the combined character string is not included in the component element information. When the combined character string is included in the component element information, it is an original character string, so the synthesis unit 103 does not use it as dummy information.

Thus, the synthesis unit 103 generates dummy information using the associative words generated based on the concept information of the component element information. Therefore, the information generating apparatus 100 can generate dummy information that does not look out of place even when used as component information. Further, the synthesis unit 103 may include in the dummy information character strings that do not use an associative word, for example, “test03” and “test-a”. Thereby, more character strings can be generated as dummy information.
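The combination of arrays (A) and (B) and the exclusion of regular character strings can be sketched as follows. This is a minimal sketch rather than the synthesis unit 103 itself: the function name `synthesize` and the list of regular host names are assumptions made for illustration, and an ASCII hyphen is used in place of “−”.

```python
from itertools import product

def synthesize(first_parts, last_parts, regular):
    """Combine every first part with every last part, skipping regular strings."""
    regular = set(regular)
    combined = (head + tail for head, tail in product(first_parts, last_parts))
    return [s for s in combined if s not in regular]

# Assumed regular host names (hypothetical values for this example).
regular_names = ["spring-a", "fall", "test01", "test02"]
first = ["spring", "fall", "test", "winter", "autumn"]   # array (A)
last = ["-a", "01", "02", "-b", "-c", "03", "04"]        # array (B)

dummies = synthesize(first, last, regular_names)
print("test03" in dummies, "spring-a" in dummies)  # True False
```

The Cartesian product corresponds to the “X” between the arrays in FIG. 7, and the membership check corresponds to confirming that a combined character string is not included in the component element information.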

The combining unit 103 may store the generated dummy information as information that can be browsed by an attacker, for example, in an external storage device or the like, or may transmit it to a predetermined device. Further, the composition unit 103 may be configured to transmit the generated dummy information to another device or system when an inquiry is made from another device or system.

In FIGS. 5 to 7, the case where the component information is a host name has been described as an example. However, the information generation apparatus 100 can generate dummy information for other types of component information as well. For example, even when the component information is a user account, the information generation apparatus 100 can generate dummy information for the user account in the same manner as for the host name.

In addition, even when the component information is a file name, URI, or mail address, the information generation apparatus 100 can generate dummy information for these pieces of information.

Hereinafter, a method in which the information generation apparatus 100 generates dummy information will be described separately for the case where the component information is a file name, a URI, and a mail address.

(When the component information is a file name)
In the following, description will be given by taking as an example that the component information collected by the collection unit 110 is a file name. FIG. 8 is a diagram for explaining the operation of the information generating apparatus 100 when the component information (collected data) to be collected is a file name.

Even when the collected data is a file name, the information generating apparatus 100 can generate dummy information by the same process as the host name. FIG. 8 illustrates a case where the file name is a character string including a space between words.

Suppose that the file name included in the collected data is “Japanese summer 2014”, as shown in FIG. 8. At this time, the analysis unit 101 decomposes the character string (file name) into “Japanese”, “summer”, and “2014”. Then, the analysis unit 101 includes, in the attribute of “Japanese”, information indicating that the word is the first part of the character string before decomposition and information indicating that a space follows the word. Further, the analysis unit 101 includes, in the attribute of “summer”, information indicating that the word is the second part of the character string before decomposition and information indicating that a space follows the word. Further, the analysis unit 101 includes, in the attribute of “2014”, information indicating that the word is the last part of the character string before decomposition.

As described above, the analysis unit 101 includes the information indicating the position of the space in the attribute of the word immediately before the space, but the analysis unit 101 of the present embodiment is not limited to this. The analysis unit 101 may be configured to include the information indicating the position of the space in the attribute of the word immediately after the space.

Then, the associative word determination unit 102 confirms whether these words are included in the concept information. In this example, it is assumed that “Japanese” and “summer” are included in the concept information. The associative word determination unit 102 searches for the superordinate concepts of “Japanese” and “summer”. In this example, it is assumed that “Japanese” and “summer” do not have the same superordinate concept. The associative word determination unit 102 determines an associative word of “Japanese” (for example, “American”) and an associative word of “summer” (for example, “winter”).

Then, the associative word determination unit 102 includes the attribute of “Japanese” in the attribute of “American”, and the attribute of “summer” in the attribute of “winter”.

Further, the serial number generation unit 104 generates “2013” based on “2014”. Then, the serial number generation unit 104 includes the attribute “2014” in the attribute “2013”.

The synthesis unit 103 then takes (1) the associative words and those words received from the analysis unit 101 from which serial number phrases cannot be generated, and (2) the serial number phrases and those words received from the analysis unit 101 from which serial number phrases can be generated, groups the words having the same attribute information into one array each, and generates dummy information by combining these arrays.

That is, as shown in FIG. 8, the synthesis unit 103 combines the following (A) to (C).
(A) “American” and “Japanese”, which are associative words or words received from the analysis unit 101 from which serial number phrases cannot be generated, and which have the attribute of being the first part of the character string before decomposition,
(B) “winter” and “summer”, which are associative words or words received from the analysis unit 101 from which serial number phrases cannot be generated, and which have the attribute of being the second part of the character string before decomposition,
(C) “2013” and “2014”, which are serial number phrases or words received from the analysis unit 101 from which serial number phrases can be generated, and which have the attribute of being the last part of the character string before decomposition.

Further, when combining (A) to (C), the synthesis unit 103 inserts a space at a predetermined position according to information indicating the position of the space included in the attribute of each word. Thereby, the composition unit 103 can generate dummy information including character strings such as “American winter 2013” and “American summer 2014” as shown on the right side of FIG.
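The use of position and space attributes when combining (A) to (C) can be sketched as follows. This is an illustrative sketch only: the `Word` class, its field names, and the use of an integer position in place of the “first part / second part / last part” attributes are assumptions not found in the original text.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Word:
    text: str
    position: int      # index of the word in the string before decomposition
    space_after: bool  # True if a space followed the word

def build_names(words, regular):
    """Group words by position, combine one per position, re-insert spaces."""
    slots = {}
    for w in words:
        slots.setdefault(w.position, []).append(w)
    names = []
    for combo in product(*(slots[p] for p in sorted(slots))):
        name = "".join(w.text + (" " if w.space_after else "") for w in combo)
        if name not in regular:  # never emit a regular file name
            names.append(name)
    return names

words = [
    Word("Japanese", 0, True), Word("American", 0, True),
    Word("summer", 1, True), Word("winter", 1, True),
    Word("2014", 2, False), Word("2013", 2, False),
]
names = build_names(words, {"Japanese summer 2014"})
```

Because each associative word and serial number phrase inherits the attributes of its original word, “American” carries the same position and space information as “Japanese”, so strings such as “American winter 2013” come out with the spaces in the right places.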

(When the component information is an email address)
In the following, description will be given by taking as an example that the component information collected by the collection unit 110 is a mail address. FIGS. 9 and 10 are diagrams for explaining the operation of the information generating apparatus 100 when the component information (collected data) to be collected is a mail address.

As shown in FIG. 9, when the collected data consists of e-mail addresses, the analysis unit 101 decomposes each e-mail address into a local part and a domain. Then, the analysis unit 101 analyzes the character string included in the local part and breaks it down into words. An example of the decomposed words is shown on the right side of the figure. The local part “a-xxx” of the first mail address shown in FIG. 9 is decomposed into “a-” and “xxx”. At this time, the analysis unit 101 assigns to “a-”, as an attribute, information indicating that the word is the first part of the character string before decomposition. In addition, the analysis unit 101 assigns to “xxx” information indicating that the word is the second (last in the local part) word of the character string before decomposition. Similarly, for the other mail addresses, the analysis unit 101 decomposes each address into a local part and a domain, and divides the local part into words. In this example, the at sign is included in the domain as the first character of the domain.

Assume that the associative word determination unit 102 determines “vvv” and “nnn” as the associative words of “xxx” and “kkk”, and determines “yy” as the associative word of “zz”.

Further, assume that the serial number generation unit 104 generates “c-” and “d-” as serial number phrases of “a-” and “b-”, and generates “02” as a serial number phrase of “01”.

After that, the synthesis unit 103 takes (1) the associative words and those words received from the analysis unit 101 from which serial number phrases cannot be generated, and (2) the serial number phrases and those words received from the analysis unit 101 from which serial number phrases can be generated, groups the words having the same attribute information into one array each, treats the domains as one further array, and generates dummy information by combining these arrays.

That is, as shown in FIG. 10, the synthesis unit 103 combines the following (A) to (C).
(A) “a-”, “b-”, “c-”, and “d-”, which are serial number phrases or words received from the analysis unit 101 from which serial number phrases can be generated, and which have the attribute of being the first part of the character string before decomposition,
(B) “xxx”, “kkk”, “vvv”, and “nnn”, which are associative words or words received from the analysis unit 101 from which serial number phrases cannot be generated, and which have the attribute of being the second part of the character string before decomposition,
(C) the domains.

Further, the synthesis unit 103 combines the following (D) to (F) as shown in FIG.
(D) “zz” and “yy”, which are associative words or words received from the analysis unit 101 from which serial number phrases cannot be generated, and which have the attribute of being the first part of the character string before decomposition,
(E) “01” and “02”, which are serial number phrases or words received from the analysis unit 101 from which serial number phrases can be generated, and which have the attribute of being the second part of the character string before decomposition,
(F) the domains.

Thereby, the synthesizing unit 103 can generate dummy information including a character string such as “a-vvv@yyy.ne.jp” as shown on the right side of FIG.

As described above, according to the information generation apparatus 100 of the present embodiment, even if the component information is a mail address, dummy information that is difficult for an attacker to identify as dummy information can be generated.

(When component information is URI)
In the following, description will be given by taking as an example that the component information collected by the collection unit 110 is a URI. FIG. 11 is a diagram for explaining the operation of the information generation apparatus 100 when the collected data is a URI.

As shown in FIG. 11, when the collected data is a URI, the analysis unit 101 decomposes the character string described as the URI into its hierarchy levels. The analysis unit 101 then analyzes the character string of each hierarchy level and decomposes it into words. An example of the decomposed words is shown in FIG. 11. As shown there, the character string “folder01” in the first hierarchy level of the URI is broken down into “folder” and “01”.

At this time, the analysis unit 101 includes, as attributes of each decomposed word, information indicating the hierarchy level that contains the character string from which the word was decomposed and information indicating the position of the word within that character string.

Then, the associative word determination unit 102 determines associative words for the decomposed words. Further, the serial number generation unit 104 generates serial number phrases. The synthesis unit 103 then synthesizes the words within each hierarchy level, and subsequently synthesizes the character strings of the hierarchy levels. Since the synthesis method is the same as that described above, its description is omitted. The synthesis unit 103 generates, as dummy information, a character string in which the delimiter that separates hierarchy levels is inserted between the levels.

As described above, according to the information generation apparatus 100 according to the present embodiment, even if the component information is a URI, dummy information that is difficult to be discriminated by an attacker as dummy information can be generated.
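The per-hierarchy decomposition and recombination for a URI can be sketched as follows. This is a rough sketch under several assumptions: only the path portion is handled, “/” is taken as the hierarchy delimiter, each level is assumed to be letters followed by optional digits, and the second-level name “doc02” and both function names are hypothetical.

```python
import re
from itertools import product

def split_levels(path):
    """Split a path into hierarchy levels, then each level into (word, number)."""
    levels = []
    for segment in path.strip("/").split("/"):
        m = re.fullmatch(r"([A-Za-z]+)(\d*)", segment)
        levels.append(m.groups() if m else (segment, ""))
    return levels

def dummy_paths(level_candidates, regular):
    """Pick one candidate string per level and join the levels with '/'."""
    regular = set(regular)
    paths = ("/" + "/".join(parts) for parts in product(*level_candidates))
    return [p for p in paths if p not in regular]

print(split_levels("/folder01/doc02"))  # [('folder', '01'), ('doc', '02')]

# Candidates per level would come from the word-level synthesis shown earlier.
paths = dummy_paths([["folder01", "folder02"], ["doc02", "doc03"]],
                    ["/folder01/doc02"])
```

Keeping the hierarchy level as part of each word's attributes is what allows the candidates to be grouped per level before the delimiter is re-inserted.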

As described above, the information generation apparatus 100 according to the present embodiment provides the same effects as the information generation apparatus 10 according to the first embodiment. Further, even if the component information is a host name, a file name, a user account, a mail address, a URI, or the like, the information generation apparatus 100 according to the present embodiment can more suitably generate dummy information that is difficult for an attacker to identify as dummy information.

<Third Embodiment>
Next, a third embodiment of the present invention will be described with reference to the drawings. For convenience of explanation, members having the same functions as those included in the drawings described in the first and second embodiments described above are given the same reference numerals, and descriptions thereof are omitted. In the present embodiment, another method of associative word generation by the associative word determination unit 102 will be described. Note that the information generation apparatus 100 according to the present embodiment has the same configuration as that shown in FIG.

FIG. 12 is a diagram for explaining a configuration of the concept information stored in the storage unit 120 of the information generating apparatus 100 according to the present embodiment. As illustrated in FIG. 12, the concept information indicates that the superordinate concept of “spring”, “fall”, and the like is “season”, and that the superordinate concept of “season” and the like is “xxx”. Further, FIG. 12 shows that “yyy” exists as a superordinate concept multiple levels above “xxx”. In addition, “zzz” is included in the subordinate concepts of “yyy”, “fruit” is included in the subordinate concepts of “zzz”, and “apple” and “orange” are included in the subordinate concepts of “fruit”.

As shown in FIG. 12, since the data structure of the concept information stored in the storage unit 120 according to the present embodiment is a tree structure, each word is referred to as a node in the present embodiment.

The associative word determination unit 102 according to the present embodiment calculates the distance between superordinate concepts when a plurality of superordinate concepts, each common to a plurality of words included in the concept information, are retrieved. For example, it is assumed that the words decomposed by the analysis unit 101 include “spring”, “fall”, “apple”, and “orange”. At this time, the associative word determination unit 102 searches for the superordinate concepts of all the words. Since “spring” and “fall” have the same superordinate concept “season”, and “apple” and “orange” have the same superordinate concept “fruit”, the associative word determination unit 102 determines that two superordinate concepts have been retrieved.

Thereafter, the associative word determination unit 102 calculates the inter-node distance between the “season” node and the “fruit” node. In this embodiment, the distance between the node and the parent node of this node is 1. This distance is also called the arrival hop count. That is, the distance (the number of hops reached) from the node to the parent node of this node is 1.

Then, the associative word determination unit 102 determines, as an associative word, a word that is a subordinate concept of another superordinate concept located, from at least one of the superordinate concepts (“season”, “fruit”), at a distance of approximately half the calculated distance (also called the intermediate distance or the number of intermediate hops).

For example, when the arrival hop count between the “season” node and the “fruit” node is 8, the number of intermediate hops is 4. Therefore, the associative word determination unit 102 determines, as associative words, words that are child nodes (subordinate concepts) of a node (superordinate concept) whose arrival hop count from the “season” node is 4. When the node whose arrival hop count from the “season” node is 4 is “city” and its subordinate concepts are “tokyo”, “paris”, “kyoto”, and so on, the associative word determination unit 102 determines a predetermined number of words from among “tokyo”, “paris”, “kyoto”, and so on as associative words. The number of words determined as associative words is arbitrary.
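The intermediate-hop selection can be sketched on a small tree as follows. This is one possible reading, not the method of FIG. 12 itself: the tree `PARENT` is a reduced hypothetical example in which “season” and “fruit” are 4 hops apart (rather than 8), the function names are invented, and the restriction to concepts whose children are leaf words is an assumption.

```python
from collections import deque

# Hypothetical concept tree (child -> parent), loosely modeled on FIG. 12.
PARENT = {
    "xxx": "yyy", "zzz": "yyy",
    "season": "xxx", "city": "xxx", "fruit": "zzz",
    "spring": "season", "fall": "season", "winter": "season",
    "tokyo": "city", "paris": "city", "kyoto": "city",
    "apple": "fruit", "orange": "fruit",
}

def children(node):
    return [c for c, p in PARENT.items() if p == node]

def hop_counts(start):
    """Breadth-first search; each parent/child edge counts as one hop."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        neighbors = children(node) + ([PARENT[node]] if node in PARENT else [])
        for nxt in neighbors:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def midpoint_associations(concept_a, concept_b):
    """Collect leaf words under concepts lying half-way (in hops) from concept_a."""
    dist = hop_counts(concept_a)
    half = dist[concept_b] // 2
    words = []
    for node, d in dist.items():
        if node in (concept_a, concept_b) or d != half:
            continue
        kids = children(node)
        # Keep only concepts whose children are leaf words (assumption).
        if kids and not any(children(k) for k in kids):
            words.extend(kids)
    return words

print(midpoint_associations("season", "fruit"))  # ['tokyo', 'paris', 'kyoto']
```

In this reduced tree the number of intermediate hops is 2, and “city” is the only concept at that distance from “season” whose children are words.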

In the above example, the subordinate concepts of a node whose arrival hop count from the “season” node equals the number of intermediate hops are used as associative words. However, the subordinate concepts of a node whose arrival hop count from the “fruit” node equals the number of intermediate hops may be used as associative words instead.

Thereby, the information generation apparatus 100 according to the present embodiment can generate dummy information using keywords close to the words (keywords) included in the regular information. That is, the information generation apparatus 100 can determine as associative words not only words that share the direct superordinate concept of a word included in the regular information, but also words whose superordinate concept differs from that direct superordinate concept. Thereby, the information generation apparatus 100 according to the present embodiment can generate dummy information that goes beyond the range that can be assumed from the regular information and that is more difficult for an attacker to identify as dummy information.

(Modification 1)
Next, a modification according to the present embodiment will be described. In this modification, another method of associative word generation by the associative word determination unit 102 will be described.

In this modification, for example, a case where three or more superordinate concepts that are common among a plurality of words included in the concept information are searched will be described. For example, it is assumed that the words decomposed by the analysis unit 101 include “spring”, “fall”, “apple”, “orange”, “tokyo”, and “paris”. At this time, the associative word determination unit 102 searches for a superordinate concept of all words. Note that “spring” and “fall” have the same superordinate concept “season”, “apple” and “orange” have the same superordinate concept “fruit”, and “tokyo” and “paris” are the same. Suppose that it has a superordinate concept “city”. Therefore, the associative word determination unit 102 determines that three superordinate concepts have been searched.

Then, the associative word determination unit 102 calculates the arrival hop counts between the three superordinate concepts. That is, the associative word determination unit 102 calculates (1) the distance between “season” and “fruit”, (2) the distance between “season” and “city”, and (3) the distance between “fruit” and “city”.

Then, the associative word determination unit 102 determines, as an associative word, a word that is a subordinate concept of another superordinate concept located, from at least one of the superordinate concepts (“season”, “fruit”, “city”), at the average of the calculated arrival hop counts (also referred to as the average hop count).

For example, if the average hop count is 4, the associative word determination unit 102 determines, as associative words, words that are child nodes (subordinate concepts) of a node whose arrival hop count from the “fruit” node is 4. If the node whose arrival hop count from the “fruit” node is 4 is “restaurant” and its subordinate concepts are “cafeteria”, “teashop”, “pub”, and so on, the associative word determination unit 102 determines a predetermined number of words from among “cafeteria”, “teashop”, “pub”, and so on as associative words. The number of words determined as associative words is arbitrary.
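The average hop count over the three pairs of superordinate concepts can be sketched as follows. The pairwise hop counts in `HOPS` are hypothetical values chosen so that the average is 4, and rounding the average to an integer is an assumption, since the text does not state how a fractional average is handled.

```python
from itertools import combinations
from statistics import mean

# Hypothetical arrival hop counts between superordinate concepts.
HOPS = {
    frozenset({"season", "fruit"}): 4,
    frozenset({"season", "city"}): 3,
    frozenset({"fruit", "city"}): 5,
}

def mean_hop_count(concepts):
    """Average the hop count over every pair of superordinate concepts."""
    pairs = combinations(concepts, 2)
    return round(mean(HOPS[frozenset(pair)] for pair in pairs))

print(mean_hop_count(["season", "fruit", "city"]))  # 4
```

The resulting value would then be used in place of the number of intermediate hops when looking for the node whose subordinate concepts become associative words.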

In the above example, the subordinate concepts of a node whose arrival hop count from the “fruit” node equals the average hop count are used as associative words, but the subordinate concepts of a node whose arrival hop count from the “season” node or the “city” node equals the average hop count may be used as associative words instead. Further, the subordinate concepts of a node whose arrival hop count from the root equals the average hop count may be used as associative words.

Thereby, the information generating apparatus 100 according to the present modification can obtain the same effects as the information generating apparatus 100 according to the third embodiment.

Note that the associative word determination unit 102 according to the present modification may, like the associative word determination unit 102 according to the third embodiment described above, further determine as associative words the words that are subordinate concepts of another superordinate concept located at the number of intermediate hops from at least one of the superordinate concepts.

(Modification 2)
Next, another modification according to the present embodiment will be described. In this modification, another method of associative word generation by the associative word determination unit 102 will be described.

The associative word determination unit 102 according to this modification may determine an associative word using an initial value given in advance. This initial value is, for example, a value indicating how many levels to go up from a certain word. In this case, the associative word determination unit 102 identifies the word (a superordinate concept at a predetermined distance) that lies that number of levels above a word included in the concept information. For example, when the word included in the concept information is “winter” and the initial value is 2, the associative word determination unit 102 identifies the superordinate concept two levels above “winter” (“xxx” in FIG. 12). Then, the associative word determination unit 102 determines a word that is a subordinate concept of “xxx” as an associative word. The associative word determination unit 102 may add the associative word determined in the present modification to the associative words determined in the third embodiment or the first modification.

Even when the associative word determination unit 102 determines associative words in this way, the information generation apparatus 100 according to the present modification can generate dummy information that goes beyond the range that can be assumed from the regular information and that is more difficult for an attacker to identify as dummy information.

(Modification 3)
Next, another modification according to the present embodiment will be described. In this modification, another method of associative word generation by the associative word determination unit 102 will be described.

The associative word determining unit 102 according to the present modification may determine an associative word using an initial value given in advance, similarly to the associative word determining unit 102 according to the second modification. The initial value in this modification is a value indicating the number of required associative words.

The associative word determination unit 102 searches for a superordinate concept of a word included in the concept information, and determines a subordinate concept word for the superordinate concept as an associative word. At this time, when the number of words of the lower concept is smaller than the initial value, the associative word determination unit 102 determines the word of the lower concept for the higher concept of the higher concept as the associative word.

For example, when the word included in the concept information is “winter” and the initial value is 8, the associative word determination unit 102 identifies the superordinate concept one level above “winter” (“season” in FIG. 12). Then, the associative word determination unit 102 checks whether the number of words that are subordinate concepts of “season”, excluding “winter”, is equal to or greater than the initial value (8). When the number of words that are subordinate concepts of “season”, excluding “winter”, is 6, for example, the associative word determination unit 102 identifies the superordinate concept of “season” (“xxx” in FIG. 12). Then, the associative word determination unit 102 confirms whether the number of words that are subordinate concepts of “xxx” is equal to or greater than the initial value. In this way, the associative word determination unit 102 according to the present modification goes up the hierarchy, searching the superordinate concepts, until a superordinate concept appears that has at least as many subordinate-concept words as the initial value. When such a superordinate concept is found, the associative word determination unit 102 determines the words that are subordinate concepts of that superordinate concept as associative words. The associative word determination unit 102 may add the associative words determined in the present modification to the associative words determined in the third embodiment, the first modification, or the second modification.
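The upward search until enough candidate words are found can be sketched as follows. This is an illustrative sketch only: the tree `PARENT` is a small hypothetical example (so the required count is 5 rather than 8), “subordinate-concept words” are interpreted as the leaf words below a concept, and returning an empty list when no level has enough words is an assumption.

```python
# Hypothetical concept tree (child -> parent).
PARENT = {
    "season": "xxx", "city": "xxx",
    "spring": "season", "summer": "season", "fall": "season", "winter": "season",
    "tokyo": "city", "paris": "city", "kyoto": "city",
}

def leaf_words(node):
    """Return the leaf words below `node` (the node itself if it is a leaf)."""
    kids = [c for c, p in PARENT.items() if p == node]
    if not kids:
        return [node]
    return [leaf for k in kids for leaf in leaf_words(k)]

def associations_at_least(word, required):
    """Climb from `word` until a concept offers at least `required` other words."""
    node = word
    while node in PARENT:
        node = PARENT[node]
        candidates = [w for w in leaf_words(node) if w != word]
        if len(candidates) >= required:
            return candidates
    return []  # no level offers enough words (behavior assumed here)

print(associations_at_least("winter", 3))  # ['spring', 'summer', 'fall']
print(associations_at_least("winter", 5))
# ['spring', 'summer', 'fall', 'tokyo', 'paris', 'kyoto']
```

With a required count of 3, “season” already suffices. With 5, the search moves one level up to “xxx” and the city names become candidates as well.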

<Fourth embodiment>
Next, a fourth embodiment of the present invention will be described. For convenience of explanation, members having the same functions as those included in the drawings described in the first to third embodiments described above are given the same reference numerals, and descriptions thereof are omitted. In this embodiment, another method of generating dummy information by the synthesis unit 103 will be described. Note that the information generation apparatus 100 according to the present embodiment has the same configuration as that shown in FIG.

In each of the above-described embodiments, all character strings combined by the synthesis unit 103 are used as dummy information, but the configuration of the synthesis unit 103 is not limited to this.

The synthesis unit 103 of the information generation apparatus 100 according to the present embodiment may assign a priority to each combined (synthesized) character string. For example, when a combined character string includes an associative word and/or a sequential number phrase, the synthesis unit 103 may set the priority of that character string higher. As another example, when the attributes of the words included in a combined character string include the number of appearances, the synthesis unit 103 may set a higher priority for a character string that includes a word with a larger number of appearances. Further, for example, when the associative word included in a combined character string has been determined from a predetermined number of words or more, the synthesis unit 103 may set the priority of that character string higher. The priority setting method is thus not particularly limited. The priority may indicate a level or may be a rank.

When the priority assigned to a character string indicates a level, and a larger numerical value indicates a higher level, the synthesis unit 103 generates, as dummy information, the character strings whose priority is greater than a predetermined value.

Further, when the priority given to the character string is a rank (priority order), the synthesis unit 103 generates a character string having a higher priority than a predetermined value as dummy information. For example, when the predetermined value is N (N is a natural number), the synthesis unit 103 generates the top N character strings as dummy information.
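The two selection modes described above (by level and by rank) can be illustrated with a short sketch. The `Candidate` fields, the weights in `priority_level`, and the sample strings in the usage note are hypothetical; the description leaves the priority setting method open, so this is one possible scoring, not the method of the embodiment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """A combined character string plus the (hypothetical) facts used to score it."""
    text: str
    has_associative_word: bool
    has_sequential_phrase: bool
    appearance_count: int


def priority_level(c: Candidate) -> int:
    """Assumed scoring: a larger value means a higher priority level."""
    level = c.appearance_count
    if c.has_associative_word:
        level += 10  # illustrative weight
    if c.has_sequential_phrase:
        level += 10  # illustrative weight
    return level


def select_by_level(cands: List[Candidate], predetermined_value: int) -> List[str]:
    """Level mode: keep strings whose level is greater than the predetermined value."""
    return [c.text for c in cands if priority_level(c) > predetermined_value]


def select_top_n(cands: List[Candidate], n: int) -> List[str]:
    """Rank mode: order by level (highest first) and keep the top N strings."""
    ranked = sorted(cands, key=priority_level, reverse=True)
    return [c.text for c in ranked[:n]]
```

For example, given candidates scored 23, 11, and 5, `select_by_level(cands, 10)` keeps the first two, and `select_top_n(cands, 1)` keeps only the highest-ranked one.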

For example, a character string that includes a word with a large number of appearances or a sequential number phrase can be said to be difficult for an attacker to identify as dummy information. Therefore, the character strings generated by the synthesis unit 103 in the present embodiment are more difficult for an attacker to identify as dummy information than those generated in the first to third embodiments described above. The information generation apparatus 100 according to the present embodiment can thus generate, as dummy information, only character strings that are more difficult for an attacker to identify as dummy information.

<Fifth embodiment>
Next, a fifth embodiment of the present invention will be described in detail with reference to the drawings.

For convenience of explanation, members having the same functions as those described in the first to fourth embodiments are denoted by the same reference numerals and description thereof is omitted.

FIG. 13 is a functional block diagram showing an example of the functional configuration of the information generation apparatus according to the present embodiment. The information generation apparatus 400 according to the present embodiment is configured by adding a storage unit 420 to the information generation apparatus 100 according to the second to fourth embodiments. Specifically, as illustrated in FIG. 13, the information generation apparatus 400 includes an analysis unit 101, an associative word determination unit 102, a synthesis unit 103, a sequential number generation unit 104, a collection unit 110, a storage unit (second storage unit) 120, and a storage unit (first storage unit) 420. The configuration of the information generation system including the information generation apparatus 400 according to the present embodiment is the same as the configuration described with reference to FIG.

In the present embodiment, a configuration in which the storage unit 420 is built into the information generation apparatus 400 will be described as an example. However, the configuration of the storage unit 420 is not limited to this. The storage unit 420 may be realized by a storage device that is separate from the information generation apparatus 400. In addition, although the storage unit 120 and the storage unit 420 are described as separate components in the present embodiment, they may be realized by a single storage unit.

The storage unit 420 stores material information. The material information is information indicating material that can be used as dummy information. Specifically, the material information includes words that the user has listed in advance and registered in the storage unit 420 as words that are difficult for a computer to generate automatically. Such words include plausible names that can be used as dummy information (for example, proxygate2, ip8800, and dhcp01) and names that conform to a naming rule unique to the in-company system 300 to which the dummy information is applied. A unique naming rule is, for example, a rule that the name of a server installed in Tokyo is “tk-svr”. That is, the words included in the material information are character strings that are highly likely to be used in the system but are not conceptualized.

The associative word determination unit 102 checks whether the word received from the analysis unit 101 is included in the concept information. If the word is not included in the concept information, the associative word determination unit 102 checks whether a sequential number phrase can be generated from the word. If no sequential number phrase can be generated from the word, the associative word determination unit 102 may register the word as material information in the storage unit 420.

As described above, the material information includes not only words registered in advance by the user but also words registered by the associative word determination unit 102. Note that the word to be registered as material information by the user may be selected from words determined to be not included in the concept information by the associative word determination unit 102.

In addition, for words that the user has listed in advance and registered as material information in the storage unit 420, the associative word determination unit 102 may confirm whether the words themselves are actually in use by querying a DNS (Domain Name System) server, a DHCP (Dynamic Host Configuration Protocol) server, or the like in the in-company system 300. If the inquiry shows that a word not included in the concept information is not actually in use, the associative word determination unit 102 may register this word as material information.
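The registration checks described for the associative word determination unit 102 can be summarized in a small decision function. The sketch below rests on assumptions: the function name, the representation of the concept information and of the sequence-capable words as plain sets, and the `is_in_use` callback (standing in for an inquiry to a DNS or DHCP server, which is not performed here) are all hypothetical.

```python
from typing import Callable, Optional, Set


def should_register_as_material(
    word: str,
    concept_words: Set[str],
    sequence_words: Set[str],
    is_in_use: Optional[Callable[[str], bool]] = None,
) -> bool:
    """Decide whether `word` may be registered as material information.

    concept_words:  words contained in the concept information (assumed a set).
    sequence_words: words from which a sequential number phrase can be
                    generated (assumed a set).
    is_in_use:      optional hypothetical callback that reports whether the
                    word is actually used in the system, e.g. via DNS/DHCP.
    """
    if word in concept_words:
        return False  # handled through the concept information instead
    if word in sequence_words:
        return False  # handled through sequential number phrases instead
    if is_in_use is not None and is_in_use(word):
        return False  # the name already exists in the real system
    return True
```

Passing the lookup as a callback keeps the decision logic free of network access; an actual deployment would have to supply its own inquiry routine.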

Note that the associative word determination unit 102 may attach an attribute to each word registered as material information. Information included in this attribute may be information arbitrarily registered by the user. When the word registered as the material information is a word supplied from the analysis unit 101, the attribute of the word registered as the material information may be an attribute given to this word by the analysis unit 101.

The synthesis unit 103 receives the decomposed words from the analysis unit 101, the associative words from the associative word determination unit 102, and the sequential number phrases from the sequential number generation unit 104. Further, the synthesis unit 103 acquires the material information from the storage unit 420.

The synthesis unit 103 performs synthesis using not only the associative words determined by the associative word determination unit 102 but also the words included in the material information acquired from the storage unit 420, treating the latter as associative words. Note that the synthesis unit 103 may also perform synthesis using the words themselves decomposed by the analysis unit 101, as in the second and third embodiments described above. Further, as in the fourth embodiment described above, the synthesis unit 103 may assign a priority to each synthesized character string and use the character strings with higher priority as dummy information. Since the synthesis unit 103 according to the present embodiment thus uses the same synthesis methods as in the above-described embodiments, a detailed description is omitted here.
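As a rough illustration of using material information as associative words, the following sketch substitutes one word at a time while keeping the words before and after it. The function name, the data layout, and the rule that material words are offered only at positions that already have associative words are assumptions made for the example; the embodiment does not fix these details.

```python
from typing import Dict, List


def synthesize(
    words: List[str],
    associative: Dict[str, List[str]],
    material: List[str],
) -> List[str]:
    """Generate candidate dummy strings from a decomposed character string.

    words:       the decomposed words, in their original order.
    associative: associative words determined for each word (hypothetical layout).
    material:    words from the material information, used like associative words.
    """
    original = "".join(words)
    results: List[str] = []
    for i, word in enumerate(words):
        if word not in associative:
            continue  # this word is kept as-is in every candidate
        for substitute in associative[word] + material:
            # Keep the words before and after position i, replace only word i.
            candidate = "".join(words[:i] + [substitute] + words[i + 1:])
            if candidate != original and candidate not in results:
                results.append(candidate)
    return results
```

For instance, decomposing a hypothetical name into `["winter", "-", "svr"]`, with “summer” as an associative word of “winter” and “proxygate2” as material, yields `summer-svr` and `proxygate2-svr`, both different from the original string.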

As described above, the information generation apparatus 400 according to the present embodiment generates dummy information using the material information. Thereby, the information generation device 400 according to the present embodiment can obtain the same effects as those of the information generation devices according to the first to fourth embodiments described above. In addition, the information generation apparatus 400 according to the present embodiment can generate dummy information more similar to regular information.

<Example of hardware configuration>
Here, a configuration example of hardware capable of realizing the information generation apparatus (10, 100, 400) according to each embodiment described above will be described. The information generation apparatus (10, 100, 400) described above may be realized as a dedicated device, or may be realized using a computer (information processing apparatus).

FIG. 14 is a diagram illustrating a hardware configuration of a computer (information processing apparatus) capable of realizing each embodiment of the present invention.

The hardware of the information processing apparatus (computer) 90 shown in FIG. 14 includes a CPU (Central Processing Unit) 11, a communication interface (I/F) 12, an input/output user interface 13, a ROM (Read Only Memory) 14, a RAM (Random Access Memory) 15, a storage device 17, and a drive device 18 for a computer-readable storage medium 19, which are connected via a bus 16. The input/output user interface 13 is a man-machine interface such as a keyboard, which is an example of an input device, and a display, which is an example of an output device. The communication interface 12 is a general communication means by which the devices according to the above-described embodiments (FIGS. 1, 3, and 13) communicate with an external device via the communication network 80. In this hardware configuration, the CPU 11 controls the overall operation of the information processing apparatus 90 that realizes the information generation apparatuses (10, 100, 400) according to the embodiments.

Each of the above-described embodiments is realized, for example, by supplying a program (computer program) capable of realizing the processing described in that embodiment to the information processing apparatus 90 illustrated in FIG. 14, and then reading the program into the CPU 11 and executing it. The program may be a program capable of realizing the various processes described in the flowchart (FIG. 4) referred to in the description of the above embodiments, or each unit (each block) of the apparatus shown in the block diagrams (FIGS. 1, 3, and 13).

The program supplied to the information processing apparatus 90 may be stored in a readable/writable temporary storage memory (15) or in a non-volatile storage device (17) such as a hard disk drive. That is, in the storage device 17, the program group 17A is, for example, a set of programs that can realize the function of each unit of the information generation apparatus (10, 100, 400) in each of the above-described embodiments. The various kinds of stored information 17B are, for example, the collected data, decomposed data, associative words, sequential number phrases, dummy information, concept information, and material information in the above-described embodiments. However, when the program is implemented in the information processing apparatus 90, the constituent unit of each program module is not limited to the division into blocks shown in the block diagrams (FIG. 1, FIG. 3, and FIG. 13), and may be selected as appropriate at implementation time.

In the above case, the program can be supplied to the apparatus by a procedure that is currently common, such as installing it in the apparatus via various computer-readable recording media (19) such as a CD-ROM (Compact Disc Read-Only Memory) or a flash memory, or downloading it from outside via a communication line (80) such as the Internet. In such a case, each embodiment can be regarded as being constituted by the code constituting the computer program (program group 17A) or by the storage medium (19) in which the code is stored.

The present invention has been described above using the exemplary embodiments as examples. However, the technical scope of the present invention is not limited to the scope described in each embodiment described above. It will be apparent to those skilled in the art that various modifications and improvements can be made to the embodiments. New embodiments to which such changes or improvements are added can also be included in the technical scope of the present invention. This is clear from the matters described in the claims.

This application claims priority based on Japanese Patent Application No. 2014-190805 filed on September 19, 2014, the entire disclosure of which is incorporated herein.

DESCRIPTION OF SYMBOLS
1 Information generation system
10 Information generation apparatus
100 Information generation apparatus
101 Analysis unit
102 Associative word determination unit
103 Synthesis unit
104 Sequential number generation unit
110 Collection unit
120 Storage unit
200 Network
300 In-company system
400 Information generation apparatus
420 Storage unit
80 Communication network
90 Information processing apparatus
11 CPU
12 Communication interface
13 Input/output user interface
14 ROM
15 RAM
16 Bus
17 Storage device
18 Drive device
19 Storage medium

Claims

1. An information generation apparatus comprising:
analysis means for decomposing, into words, a character string included in component information relating to a component of a system;
associative word determination means for determining, based on concept information, an associative word for a word that is among the decomposed words and is included in the concept information; and
synthesis means for generating dummy information comprising a character string different from the character string included in the component information, by combining the associative word with either a word that comes before and/or after, in the character string prior to decomposition, the word used to determine the associative word, or an associative word of that adjacent word.
2. The information generation apparatus according to claim 1, wherein the associative word determination means searches, based on the concept information, for a superordinate concept common to a plurality of words that are among the decomposed words and are included in the concept information, and, when a common superordinate concept is found, determines subordinate-concept words of that superordinate concept, other than the plurality of words, as associative words for the plurality of words included in the concept information.
3. The information generation apparatus according to claim 2, wherein the associative word determination means determines, as associative words, at least as many subordinate-concept words as the number of the words included in the concept information.
4. The information generation apparatus according to claim 2, wherein, when a plurality of superordinate concepts common to the plurality of words included in the concept information are found, the associative word determination means calculates a distance on a data structure between the superordinate concepts, and determines, as an associative word, a subordinate-concept word of another superordinate concept located within approximately half of that distance from at least one of the superordinate concepts.
5. The information generation apparatus according to claim 2 or 3, wherein, when at least three superordinate concepts common to the plurality of words included in the concept information are found, the associative word determination means calculates distances on a data structure between the superordinate concepts, and determines, as an associative word, a subordinate-concept word of another superordinate concept located at the average of those distances from at least one of the superordinate concepts.
6. The information generation apparatus according to any one of claims 1 to 5, further comprising sequential number generation means for generating, for a word that is among the decomposed words and is included in a predetermined array, other words in the predetermined array as sequential number phrases for that word,
wherein the synthesis means generates the dummy information comprising a character string that combines the associative word with the sequential number phrase for a word that comes before and/or after, in the character string prior to decomposition, the word used to determine the associative word, and that is included in the predetermined array.
7. The information generation apparatus according to any one of claims 1 to 6, wherein the associative word determination means identifies a superordinate concept whose distance on a data structure from a word included in the concept information is a predetermined distance, and determines a subordinate-concept word of that superordinate concept as an associative word.
8. The information generation apparatus, wherein, when searching, based on the concept information, for a superordinate concept of a plurality of words that are among the decomposed words and are included in the concept information and determining a subordinate-concept word of the superordinate concept as an associative word, the associative word determination means searches for a superordinate concept for which the number of associative words is equal to or greater than a predetermined value, and determines a subordinate-concept word of that superordinate concept as an associative word.
9. The information generation apparatus according to any one of claims 1 to 8, wherein the synthesis means assigns a priority to each combined character string, and generates the dummy information based on whether the priority is equal to or greater than a predetermined value.
10. The information generation apparatus according to claim 1, further comprising first storage means for storing material information indicating, as material, words that are likely to be used in the system,
wherein the synthesis means generates the dummy information using a word included in the material information as the associative word.
11. The information generation apparatus according to any one of claims 1 to 10, wherein
the component information includes an email address,
the analysis means decomposes the email address into a local part and a domain, and decomposes a character string included in the local part into words, and
the synthesis means combines, for the local part, the associative word of a word in the local part with either a word that comes before and/or after, in the character string prior to decomposition, the word used to determine the associative word, or an associative word of that adjacent word, and further combines the combined character string with the domain, thereby generating the dummy information comprising a character string different from the character string described as the email address.
12. The information generation apparatus according to any one of the preceding claims, wherein
the component information includes a URI (Uniform Resource Identifier),
the analysis means decomposes the character string described as the URI into hierarchy levels, and decomposes the character string of each level into words, and
the synthesis means combines, within a level, the associative word of a word in that level with either a word that comes before and/or after, in the character string prior to decomposition, the word used to determine the associative word, or an associative word of that adjacent word, and combines the character strings of the levels so that the combined character string is included in at least one level, thereby generating the dummy information comprising a character string different from the character string described as the URI.
13. The information generation apparatus according to any one of claims 1 to 12, further comprising second storage means for storing the concept information.
14. An information generation method comprising:
decomposing, into words, a character string included in component information relating to a component of a system;
determining, based on concept information, an associative word for a word that is among the decomposed words and is included in the concept information; and
generating dummy information comprising a character string different from the character string included in the component information, by combining the associative word with either a word that comes before and/or after, in the character string prior to decomposition, the word used to determine the associative word, or an associative word of that adjacent word.
15. A computer-readable recording medium storing a program that causes a computer to execute:
a process of decomposing, into words, a character string included in component information relating to a component of a system;
a process of determining, based on concept information, an associative word for a word that is among the decomposed words and is included in the concept information; and
a process of generating dummy information comprising a character string different from the character string included in the component information, by combining the associative word with either a word that comes before and/or after, in the character string prior to decomposition, the word used to determine the associative word, or an associative word of that adjacent word.