JP3514193B2

JP3514193B2 - Surname data generation device

Info

Publication number: JP3514193B2
Application number: JP36381399A
Authority: JP
Inventors: 美知雄鍵井
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1999-12-22
Filing date: 1999-12-22
Publication date: 2004-03-31
Anticipated expiration: 2019-12-22
Also published as: JP2001175684A

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention: The present invention relates to a surname data generation device that generates fictitious surname data for use in databases during system development.

[0002]

2. Description of the Related Art: Conventionally, when test data about individuals is generated for testing a system that has a database, numerical data is created from statistical data and the like. For personal names, candidate names stored in a dictionary are displayed and the developer selects and enters them, or the selected names are converted before use.

[0003]

However, the prior art has the following problems. A small number of personal names can easily be generated by having the developer enter the data. When a large number of personal names is generated, however, the surnames in particular have an occurrence-count distribution far removed from that of real common surnames, so the resulting database is unrealistic. Conversely, when the surname data is made realistic, the original data from which it was derived can be inferred.

[0004] The present invention has been made in view of these problems. Its object is to provide a surname data generation device that easily generates the surnames in personal names used in system tests, both by using high-frequency surnames to approximate the real distribution and by combining character strings, so that no trouble arises even if test results using this surname data are disclosed.

[0005]

SUMMARY OF THE INVENTION The gist of the present invention according to claim 1 is a surname data generation device used for test data and the like in system development, comprising: random number generating means for generating random numbers to obtain random values; storage means in which, via input means, the following are registered in order to generate the surname data: a plurality of frequent surnames that occur with high frequency in general printed matter, a plurality of character strings for combination, and a prohibition table having predetermined prohibition rules for excluding surname data generated by combining the character strings; surname generation means having a frequent surname generation unit, which corrects the value of a second random number from the random number generating means by an approximation process under which smaller values occur more often, thereby approximating the real distribution, and generates surname data by acquiring from the frequent surnames the frequent surname corresponding to the corrected value, and a combined surname generation unit, which generates the surname data from a combination of the character strings corresponding respectively to the values of a third random number and a fourth random number from the random number generating means and adds 1 to the generation count each time surname data is generated, the surname generation means outputting the generated data via output means; processing distribution means for distributing, on the basis of a set value that defines a predetermined ratio at which surname data based on the frequent surnames and surname data based on combinations of the character strings are generated and of a first random number generated by the random number generating means, between the process of generating surname data based on the frequent surnames and the process of generating surname data based on combinations of the character strings; and prohibition checking means for performing a prohibition check with reference to the prohibition rules in the prohibition table when the surname data is generated by combining the character strings.
The gist of the present invention according to claim 2 is the surname data generation device according to claim 1, wherein the storage means comprises: a frequent surname table in which the frequent surnames, found by searching personal-name dictionaries, printed matter and the like, are registered sorted in order of frequency; a character string table in which the character strings making up combined surname data are registered, namely upper character strings (a single character or a character string placed in the upper, that is first, part of the surname data) and lower character strings (a single character or a character string placed in the lower, that is last, part of the surname data); and a prohibition table in which are registered a first prohibition rule, being combinations of characters for excluding predetermined character strings when character strings are registered as upper character strings, a second prohibition rule, being combinations of characters for excluding predetermined character strings when character strings are registered as lower character strings, and a third prohibition rule for excluding, when the upper and lower character strings are combined to generate the surname data, surname data formed by predetermined combinations of upper and lower character strings, and predetermined combinations of a first attribute attached to the upper character string and a second attribute attached to the lower character string.
The gist of the present invention according to claim 3 is the surname data generation device according to claim 1 or 2, wherein the prohibition checking means: when a character string is registered in the storage means as an upper character string, refers to the first prohibition rule in the prohibition table and excludes from registration any character string that forms a matching character combination; when a character string is registered in the storage means as a lower character string, refers to the second prohibition rule in the prohibition table and excludes from registration any character string that forms a matching character combination; and when the combined surname generation unit combines upper and lower character strings to generate the surname data, refers to the third prohibition rule in the prohibition table and excludes generation of surname data from a matching combination of upper and lower character strings, and also excludes generation of the surname data when the combination of the first attribute attached to the upper character string and the second attribute attached to the lower character string matches.

[0006]

BEST MODE FOR CARRYING OUT THE INVENTION Embodiments of the present invention are described in detail below with reference to the drawings. As shown in FIG. 1, the surname data generation device according to this embodiment is broadly composed of input means 5, random number generating means 10, storage means 20, surname generation means 30, processing distribution means 40, prohibition checking means 50, and output means 60.

[0007] Input means 5 is used to register data such as tables in storage means 20 in advance, in order to generate the required surname data. Random number generating means 10 generates random numbers to obtain random values; here it generates a first, a second, a third, and a fourth random number.

[0008] Storage means 20 has a frequent surname table 22, a character string table 23, and a prohibition table 26. In frequent surname table 22, a plurality of surnames found by searching personal-name dictionaries, printed matter and the like are registered sorted in order of frequency. In character string table 23, a plurality of upper character strings 24 and a plurality of lower character strings 25 are registered. Upper character strings 24 form the upper (first) part of the surname data generated by combination, and lower character strings 25 form the lower (last) part. A first attribute is attached to each upper character string 24, and a second attribute to each lower character string 25. Upper character strings 24 and lower character strings 25 may each be either a single character or a character string; in this embodiment, the character strings are described as having two characters.

[0009] Prohibition table 26 has a first prohibition rule 27, a second prohibition rule 28, and a third prohibition rule 29. First prohibition rule 27 registers combinations of characters used to exclude predetermined character strings when upper character strings 24 are registered. Second prohibition rule 28 registers combinations of characters used to exclude predetermined character strings when lower character strings 25 are registered. Third prohibition rule 29 registers combinations of upper character strings 24 and lower character strings 25, and combinations of first and second attributes, used to exclude surname data formed by predetermined combinations when upper character strings 24 and lower character strings 25 are combined to generate surname data.
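The contents of storage means 20 described in [0008] and [0009] could be modelled roughly as follows. This is a hypothetical layout, not one given in the source: the class and field names are invented for illustration, each character string carries a single attribute label, and third prohibition rule 29 is split into a set of forbidden string pairs and a set of forbidden attribute pairs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StringEntry:
    """One upper or lower character string with its attribute (hypothetical)."""
    text: str
    attribute: str  # hypothetical label, e.g. "animal", "terrain", "other"


@dataclass
class ProhibitionTable:
    """Prohibition table 26 (hypothetical layout)."""
    first: set[str] = field(default_factory=set)   # rule 27: combos barred from upper strings
    second: set[str] = field(default_factory=set)  # rule 28: combos barred from lower strings
    third_pairs: set[tuple[str, str]] = field(default_factory=set)       # rule 29: (upper, lower) strings
    third_attr_pairs: set[tuple[str, str]] = field(default_factory=set)  # rule 29: (attr1, attr2)


@dataclass
class Storage:
    """Storage means 20: tables 22, 23 (as upper/lower lists) and 26."""
    frequent: list[str] = field(default_factory=list)       # most frequent surname first
    upper: list[StringEntry] = field(default_factory=list)  # upper character strings 24
    lower: list[StringEntry] = field(default_factory=list)  # lower character strings 25
    prohibitions: ProhibitionTable = field(default_factory=ProhibitionTable)
```

Using `default_factory` gives every `Storage` its own empty tables, so registering rules in one instance never leaks into another.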

[0010] Surname generation means 30 has a frequent surname generation unit 32 and a combined surname generation unit 36, and sends the generated surname data to output means 60.

[0011] Frequent surname generation unit 32 corrects the value of the second random number from random number generating means 10 by an approximation process under which smaller values occur more often, so that the values approximate the real distribution. It then acquires, from the registered frequent surname table 22, the frequent surname corresponding to the corrected value, generates surname data from it, and adds 1 to the generation count.

[0012] Combined surname generation unit 36 generates surname data by combining the upper character string 24 corresponding to the value of the third random number from random number generating means 10 with the lower character string 25 corresponding to the value of the fourth random number, and adds 1 to the generation count.

[0013] Processing distribution means 40 distributes generation between the processing in frequent surname generation unit 32 and the processing in combined surname generation unit 36 by comparing a preset set value with the first random number from random number generating means 10.

[0014] When upper character strings 24 are registered, prohibition checking means 50 refers to first prohibition rule 27 and excludes matching character strings. When lower character strings 25 are registered, it refers to second prohibition rule 28 and excludes matching character strings. When combined surname generation unit 36 generates surname data, it refers to third prohibition rule 29 and excludes surname data whose combination of upper character string 24 and lower character string 25, or whose combination of first and second attributes, matches a registered combination.

[0015] Output means 60 outputs the surname data sent from surname generation means 30.

[0016] FIG. 2 is a flowchart showing the processing flow of the device of FIG. 1. The operation of the surname data generation device is described below with reference to FIG. 2.

[0017] First, the number of surnames generated so far is compared with the required number (shown in the figure as "Has the required number been generated?"). If the generated count is smaller than the required number ("No" in the figure), processing continues (step S101).

[0018] To decide, for each surname, whether it is generated from a frequent surname or from a combination of character strings, random number generating means 10 generates a first random number (step S102). This first random number takes a value of at least 0 and less than 1.

[0019] The first random number is compared with a preset set value to determine whether it is greater than (or greater than or equal to) that value (shown in the figure as "Greater than the set value?") (step S103). The set value is chosen so that the frequent surname generation process and the combined surname generation process occur in the desired distribution ratio.

[0020] If the first random number is less than (or less than or equal to) the set value ("No" in the figure), frequent surname generation is performed (step S104) and processing returns to step S101.

[0021] The processing of step S104 is as follows. First, a random number is generated (step S201); this is the second random number.

[0022] Because the second random number is uniformly distributed, its value is corrected so that the distribution of values approximates the real distribution (step S202).

[0023] The frequent surname corresponding to the corrected value is acquired from frequent surname table 22 (shown in the figure as "Get surname from table") (step S203).

[0024] 1 is added to the generation count (step S204), and processing returns to step S101.

[0025] If, in step S103, the first random number is greater than (or greater than or equal to) the set value ("Yes" in the figure), combined surname generation is performed (step S105).

[0026] The processing of step S105 is as follows. First, one set (two values) of random numbers, each at least 0 and less than 100, is generated (shown in the figure as "Generate a set of random numbers") (step S211). These two values are the third random number and the fourth random number.

[0027] The upper character string 24 corresponding to the value of the third random number obtained in step S211 is acquired from character string table 23, and then the lower character string 25 corresponding to the value of the fourth random number is acquired from character string table 23 (shown in the figure as "Get a set of character strings from table") (step S212).

[0028] A prohibition check is performed with reference to prohibition table 26 to determine whether the generated surname data matches third prohibition rule 29 (shown in the figure as "Prohibition check") (step S213).

[0029] If it does not match third prohibition rule 29 ("No" in the figure), 1 is added to the generation count (step S214) and control returns to step S101.

[0030] If, in step S213, it matches third prohibition rule 29 ("Yes" in the figure), the generated surname data is discarded and processing returns to step S211.

[0031] If, in step S101, the number of generated surnames has reached the required number ("Yes" in the figure), processing ends.
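The top-level flow of FIG. 2 (steps S101 to S105) can be sketched in Python as follows. This is an illustrative sketch, not the patent's implementation: the function name `generate_surnames` and its parameters are hypothetical, the two generation processes are passed in as callables, and step S103 uses the convention that a first random number below the set value selects frequent surname generation.

```python
import random
from collections.abc import Callable


def generate_surnames(
    required: int,
    set_value: float,
    frequent_fn: Callable[[random.Random], str],
    combined_fn: Callable[[random.Random], str],
    rng: random.Random,
) -> list[str]:
    """Sketch of FIG. 2, steps S101 to S105 (hypothetical names)."""
    surnames: list[str] = []
    # S101: keep going while the generated count is below the required count.
    while len(surnames) < required:
        # S102: first random number, 0 <= u < 1.
        u = rng.random()
        # S103: a value below the set value selects frequent surname generation.
        if u < set_value:
            surnames.append(frequent_fn(rng))  # S104 (S201 to S204)
        else:
            surnames.append(combined_fn(rng))  # S105 (S211 to S214)
    # Each append plays the role of "add 1 to the generation count".
    return surnames
```

The length of the returned list serves as the generation count, so the S101 test and the S204/S214 increments collapse into the `while` condition.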

[0032] (Example) A concrete example of a method using the surname data generation device of this embodiment is described. In advance, surnames found by searching general personal-name dictionaries, printed matter and the like are sorted in order of frequency and registered in frequent surname table 22. As an example, about 100 to 1,000 entries are registered.
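Building frequent surname table 22 amounts to counting surname occurrences and keeping the most common ones, most frequent first. A minimal sketch, assuming the occurrences have already been extracted from a dictionary or printed matter into a list; `build_frequent_table` is a hypothetical name, and the ranking uses the standard-library method `collections.Counter.most_common`.

```python
from collections import Counter
from collections.abc import Iterable


def build_frequent_table(observed: Iterable[str], size: int) -> list[str]:
    """Return up to `size` surnames sorted by descending occurrence count."""
    counts = Counter(observed)  # surname -> number of occurrences
    return [name for name, _ in counts.most_common(size)]
```

With `size=100` this yields the "top 100 frequent surnames" used in the worked example that follows.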

[0033] Next, the required number of surnames for the database test is set in advance, together with a set value specifying what proportion of that total is generated from the frequent surnames registered in frequent surname table 22. For example, if 10,000 surnames are required, the device is set to generate 2,000 of them (20% of the total) from the top 100 frequent surnames sorted in order of frequency.

[0034] Next, to generate surname data by combining character strings, a plurality of upper character strings 24 and a plurality of lower character strings 25 are registered in character string table 23. Upper character strings 24 form the upper (first) part of the surname data generated by combination, and lower character strings 25 form the lower (last) part. A first attribute is attached to each upper character string 24, and a second attribute to each lower character string 25. Upper character strings 24 and lower character strings 25 may each be a single character or a character string.

[0035] Prohibition table 26 has first prohibition rule 27, second prohibition rule 28, and third prohibition rule 29. First prohibition rule 27 registers combinations of characters for excluding predetermined character strings when upper character strings 24 are registered. Second prohibition rule 28 registers combinations of characters for excluding predetermined character strings when lower character strings 25 are registered.

[0036] During this registration, a prohibition check is performed with reference to first prohibition rule 27. Two-character strings such as 佐々 (Sasa) may be registered, but runs of the same character are excluded from upper character strings 24. Each registered character or character string is given a unified code, such as a JIS code, and a first attribute (living/non-living, tangible/intangible, man-made/not man-made, numeral, color, direction, time, relationship, terrain, or other (for example, cannot be followed by a blank)).

[0037] Next, a prohibition check is performed with reference to second prohibition rule 28. Two-character strings such as 河原 (Kawara) may be registered, but runs of the same character are excluded from lower character strings 25. Each registered character is given a unified code, such as a JIS code, and a second attribute (living/non-living, tangible/intangible, man-made/not man-made, numeral, color, direction, time, relationship, and so on). As an example, about 100 to 500 entries each are registered as upper character strings 24 and as lower character strings 25.
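The registration-time checks of [0036] and [0037] can be sketched as a filter over candidate strings. This sketch rests on two assumptions: a "run of the same character" means two identical adjacent characters (so 佐々, written with the iteration mark 々, passes while 佐佐 does not), and a registered character combination excludes any candidate that contains it as a substring. The function names are hypothetical; the same filter serves upper strings (first prohibition rule 27) and lower strings (second prohibition rule 28).

```python
from collections.abc import Iterable


def has_repeated_char(text: str) -> bool:
    """True if two adjacent characters are identical, e.g. "佐佐"."""
    return any(a == b for a, b in zip(text, text[1:]))


def register_strings(candidates: Iterable[str], forbidden: set[str]) -> list[str]:
    """Keep the candidates that pass a first or second prohibition rule.

    `forbidden` holds the registered character combinations; a candidate
    containing any of them as a substring is excluded (assumed matching rule).
    """
    registered: list[str] = []
    for text in candidates:
        if has_repeated_char(text):
            continue  # run of the same character
        if any(combo in text for combo in forbidden):
            continue  # matches a registered character combination
        registered.append(text)
    return registered
```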

[0038] The third prohibition rule registers combinations of upper character strings 24 and lower character strings 25, and combinations of first and second attributes, for excluding surname data formed by predetermined combinations of character strings when upper character strings 24 and lower character strings 25 are combined to generate surname data.

[0039] Next, generation of the required surname data, consisting of frequent surnames and combined surnames, is started. The number of surnames generated so far is compared with the required number to decide whether to continue or end processing (step S101). When the first surname is about to be generated, the generation count is still 0, so the generation count is less than the required number and surname data is generated. When the generation count equals the required number, processing ends.

[0040] For each surname, a first random number of at least 0 and less than 1 is generated to decide whether frequent surname generation or combined surname generation is performed (step S102).

[0041] The first random number is compared with the set value to determine whether it is greater than (or greater than or equal to) it (step S103). The set value has been chosen in advance so that the ratio between frequent surname generation and combined surname generation matches the desired distribution ratio. In this example, frequent surname generation is performed whenever the first random number is less than 0.20, so about 2,000 of the surnames are based on frequent surnames.
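As a quick check of the "about 2,000" figure, treat each of the N = 10,000 draws as an independent trial that selects frequent surname generation with probability equal to the set value s (an idealization; the source only states the approximate count). The number of frequent-surname records N_freq then follows a binomial distribution:

$$s = \frac{2000}{10000} = 0.20,\qquad \mathbb{E}[N_{\text{freq}}] = N s = 2000,\qquad \sigma = \sqrt{N s (1 - s)} = \sqrt{1600} = 40$$

Under this assumption the actual count typically lies within about ±80 (two standard deviations) of 2,000, which is why the count is only approximately 2,000.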

[0042] If the first random number is less than (or less than or equal to) the set value, frequent surname generation is performed (step S104).

[0043] The processing of step S104 is detailed below. First, a random number is generated; as an example, a random number of at least 1 and less than 500 is generated (step S201).

[0044] This random number is the second random number. Because the second random number is uniformly distributed, its value is substituted into a function that is convex downward (so that smaller values occur more often), correcting it so that small values are more heavily represented and the distribution approximates the real one (step S202). As an example, the real distribution is approximated by substituting into the following function, where R is the value of the second random number, taken to lie in the range 0 ≤ R < 500:

$$f(R) = \frac{0.000017}{125}R^{3} + \frac{0.007}{25}R^{2} + \frac{0.13}{5}R$$

Because f(0) = 0, f(500) = 17 + 70 + 13 = 100, and f increases monotonically, the values obtained by this correction lie in the range 0 to 99.

[0045] Surname data is generated from the frequent surname in frequent surname table 22 that corresponds to the corrected value (step S203), 1 is added to the generation count (step S204), and processing returns to step S101.
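Steps S201 to S203 can be sketched as below, using the correction function of [0044] directly. Assumptions: R is drawn as a float in [0, 500), the corrected value is truncated to an integer index, and the frequent surname table holds at least 100 entries with index 0 being the most frequent; the clamp with `min` is an added guard for shorter tables and is not part of the source. The names `correct`, `corrected_index` and `frequent_surname` are hypothetical.

```python
import random


def correct(r: float) -> float:
    """Correction function of [0044]: maps 0 <= R < 500 onto 0 <= f(R) < 100."""
    return (0.000017 / 125) * r**3 + (0.007 / 25) * r**2 + (0.13 / 5) * r


def corrected_index(r: float) -> int:
    """Truncate the corrected value to an index into the frequent surname table."""
    return int(correct(r))


def frequent_surname(table: list[str], rng: random.Random) -> str:
    """Steps S201 to S203; table[0] is the most frequent surname."""
    r = rng.random() * 500                  # S201: uniform second random number
    idx = corrected_index(r)                # S202: small indices become more likely
    return table[min(idx, len(table) - 1)]  # S203: clamp is an added guard
```

Because f(250) ≈ 26.1, half of all draws land on the 27 most frequent surnames, which is the skew toward common names that the correction is meant to produce.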

[0046] If, in step S103, the first random number is greater than (or greater than or equal to) the set value, combined surname generation is performed (step S105).

[0047] The processing of step S105 is detailed below. Random number generating means 10 generates one set (two values) of random numbers, each at least 0 and less than 100 (step S211). These are the third random number and the fourth random number.

[0048] The upper character string 24 corresponding to the value of the third random number obtained in step S211 is acquired from character string table 23, and then the lower character string 25 corresponding to the value of the fourth random number is acquired from character string table 23 (step S212).

[0049] Here, a prohibition check is performed with reference to the third prohibition rule in prohibition table 26 to determine whether the generated surname data matches it (step S213). The third prohibition rule registers cases such as the two acquired character strings being identical, or two animals being joined together. For example, 河原河原 (Kawara-kawara) and 馬鹿 (baka, "fool", written with the characters for horse and deer) are excluded under this third prohibition rule.
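The third prohibition check of step S213 can be sketched as a predicate over an (upper, lower) pair. This is a sketch under assumptions: each string is paired with a single attribute label, the labels used here ("animal", "terrain", "direction") are hypothetical stand-ins for the attribute categories listed in [0036] and [0037], and the identical-string case from the example is checked alongside the registered string pairs and attribute pairs.

```python
Entry = tuple[str, str]  # (character string, attribute label); labels are hypothetical


def violates_third_rule(
    upper: Entry,
    lower: Entry,
    forbidden_pairs: set[tuple[str, str]],
    forbidden_attr_pairs: set[tuple[str, str]],
) -> bool:
    """Return True if the (upper, lower) combination must be discarded (step S213)."""
    (u_text, u_attr), (l_text, l_attr) = upper, lower
    if u_text == l_text:
        return True  # identical strings, e.g. 河原 + 河原
    if (u_text, l_text) in forbidden_pairs:
        return True  # explicitly registered string combination
    return (u_attr, l_attr) in forbidden_attr_pairs  # e.g. ("animal", "animal")
```

With `("animal", "animal")` registered as a forbidden attribute pair, 馬 + 鹿 is rejected without listing that specific string pair.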

[0050] If the registered third prohibition rule does not apply, the surname data is adopted, 1 is added to the generation count (step S214), and processing returns to step S101.

[0051] If, in step S213, third prohibition rule 29 applies, processing returns to step S211.
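Steps S211 to S214, including the retry from S213 back to S211, can be sketched as a loop. Assumptions: the third and fourth random numbers are integers in [0, 100) and are scaled onto tables of any length by `index * len(table) // 100` (the source leaves this mapping unspecified); `is_prohibited` stands for the third-rule check; and `max_tries` is an added safeguard, since the flowchart itself retries without bound. All names are hypothetical.

```python
import random
from collections.abc import Callable

Entry = tuple[str, str]  # (character string, attribute label)


def combined_surname(
    upper: list[Entry],
    lower: list[Entry],
    is_prohibited: Callable[[Entry, Entry], bool],
    rng: random.Random,
    max_tries: int = 1000,
) -> str:
    """Steps S211 to S214 with the S213 -> S211 retry (hypothetical names)."""
    for _ in range(max_tries):
        third = rng.randrange(100)   # S211: third random number, 0 <= third < 100
        fourth = rng.randrange(100)  # S211: fourth random number
        # S212: scale each number onto its table (assumed mapping).
        up = upper[third * len(upper) // 100]
        low = lower[fourth * len(lower) // 100]
        # S213: redraw if the third prohibition rule applies.
        if not is_prohibited(up, low):
            return up[0] + low[0]  # S214: adopt; the caller increments the count
    raise RuntimeError("no permitted combination found within max_tries")
```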

[0052] Note that the present invention is not limited to this embodiment and can be applied to any technique relating to surname data generation devices to which it is suitably applicable.

[0053] The number, position, shape and the like of the above components are not limited to those of the embodiment described above; any number, position, shape and the like suitable for carrying out the invention may be used. In addition, registering more prohibition rules in prohibition table 26 makes it possible to generate more realistic surname data.

[0054]

EFFECTS OF THE INVENTION Because the present invention is configured as described above, it provides the following effects. Surname data for the people used in system tests can be generated easily, both by extracting high-frequency surnames from a publicly known database so as to approximate the real distribution of surnames and by combining character strings. Moreover, because the test data uses fictitious surname data that approximates the real distribution rather than reusing real data, disclosing it causes no problem.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of a surname data generation device according to an embodiment of the present invention.

FIG. 2 is a flowchart showing the flow of processing performed by the device of FIG. 1.

[Explanation of Reference Numerals]

5 Input means
10 Random number generation means
20 Storage means
22 Frequent surname table
23 Character string table
24 Upper character string
25 Lower character string
26 Prohibition table
27 First prohibition
28 Second prohibition
29 Third prohibition
30 Surname generation means
32 Frequent surname generation unit
36 Combination surname generation unit
40 Processing distribution means
50 Prohibition check means
60 Output means

Continuation of the front page: (58) Fields searched (Int.Cl.⁷, DB name): G06F 17/30 170; G06F 9/06 540; G06F 11/28 340; G06F 17/21 590; JICST file (JOIS)

Claims

(57) [Claims]

1. A surname data generation device used to generate test data in system development, comprising:

- random number generation means for generating random numbers to obtain random values;
- storage means in which are registered, for generating surname data via input means, a plurality of frequent surnames that appear with high frequency in general printed matter, a plurality of character strings for combination, and a prohibition table holding predetermined prohibition rules for excluding surname data generated by combining the character strings;
- surname generation means, which includes:
  - a frequent surname generation unit that corrects the value of a second random number generated by the random number generation means by an approximation process such that smaller values occur more often, so as to approximate the real distribution, and that generates surname data by acquiring the frequent surname corresponding to the corrected value; and
  - a combination surname generation unit that generates surname data by combining the character strings corresponding to the values of a third random number and a fourth random number generated by the random number generation means;

  the surname generation means adding one to the count of generated surname data each time surname data is generated and outputting the generated data via output means;
- processing distribution means that determines whether surname data is generated from the frequent surnames or from a combination of the character strings, based on a first random number generated by the random number generation means and a set value defining a predetermined ratio between surname data generated from the frequent surnames and surname data generated from combinations of the character strings; and
- prohibition check means that, when surname data is generated from a combination of the character strings, performs a prohibition check with reference to the prohibition rules in the prohibition table.
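Claim 1 combines several random decisions. A first random number chooses between the frequent-surname path and the combination path according to a set ratio. A corrected second random number then indexes the frequency-sorted table, and third and fourth random numbers select the strings to combine.

The claim does not state the correction formula. Squaring a uniform value, which makes small indices (the most frequent surnames) more likely, is therefore an assumption in this sketch. `FREQUENT_RATIO`, the sample table, and the helper names are hypothetical too. The sample surnames are common Japanese surnames, and their order is illustrative only.

```python
import random

# Hypothetical sketch of the claim 1 flow. FREQUENT_RATIO, the squaring
# "approximation process" and the sample table are assumptions.

# Sample frequency-sorted table (most frequent first); order is illustrative.
FREQUENT = ["佐藤", "鈴木", "高橋", "田中", "渡辺"]
FREQUENT_RATIO = 0.7  # set value: share of output drawn from FREQUENT


def skewed_index(r2, n):
    """Map uniform r2 in [0, 1) to an index in [0, n), favouring small
    indices, i.e. the most frequent surnames."""
    return int(r2 * r2 * n)


def simple_combine(rng):
    """Stand-in combination path (no prohibition check): the two choices
    play the role of the third and fourth random numbers."""
    return rng.choice(["河", "山"]) + rng.choice(["原", "田"])


def generate(rng, combine=simple_combine):
    r1 = rng.random()                  # first random number: choose the path
    if r1 < FREQUENT_RATIO:
        r2 = rng.random()              # second random number, then corrected
        return FREQUENT[skewed_index(r2, len(FREQUENT))]
    return combine(rng)


rng = random.Random(42)
counts = {}
generated = 0
for _ in range(10_000):
    name = generate(rng)
    counts[name] = counts.get(name, 0) + 1
    generated += 1                     # one added per generated surname
print(generated, counts.get("佐藤", 0), counts.get("渡辺", 0))
```

Under the squaring assumption, about 45% of frequent-path picks land on the first entry and about 11% on the last. These figures follow from P(int(5u²) = 0) = √0.2 and P(int(5u²) = 4) = 1 − √0.8.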

2. The surname data generation device according to claim 1, wherein the storage means comprises:

- a frequent surname table in which frequent surnames, found by searching personal name dictionaries, printed matter, and the like, are registered sorted in descending order of frequency;
- a character string table in which the character strings that are combined to generate surname data are registered, namely upper character strings (a single character or character string positioned in the upper part of the surname data) and lower character strings (a single character or character string positioned in the lower part of the surname data); and
- a prohibition table in which are registered:
  - a first prohibition, which is a combination of characters for excluding a predetermined character string when a character string is registered as an upper character string;
  - a second prohibition, which is a combination of characters for excluding a predetermined character string when a character string is registered as a lower character string; and
  - a third prohibition, which, when an upper character string and a lower character string are combined to generate surname data, excludes surname data formed by a predetermined combination of an upper character string and a lower character string, and excludes a predetermined combination of a first attribute added to the upper character string and a second attribute added to the lower character string.
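The storage layout of claim 2 maps naturally onto a few small records. In this sketch the frequent surname table is derived from raw observations with `collections.Counter.most_common()`, which returns elements ordered from most to least common. The dataclass and field names are hypothetical, as is the idea of building the table from a flat list of observed surnames.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical data layout for the storage means of claim 2.


@dataclass(frozen=True)
class CharString:
    text: str
    attribute: str  # e.g. "animal"; consulted by the third prohibition


@dataclass
class Storage:
    frequent_surnames: list = field(default_factory=list)  # most frequent first
    upper_strings: list = field(default_factory=list)      # upper part of a surname
    lower_strings: list = field(default_factory=list)      # lower part of a surname


def build_frequent_table(observed_surnames):
    """Count surnames found in a name source and sort them by frequency."""
    return [name for name, _ in Counter(observed_surnames).most_common()]


store = Storage()
store.frequent_surnames = build_frequent_table(
    ["田中", "佐藤", "佐藤", "鈴木", "佐藤", "田中"])
store.upper_strings.append(CharString("河", "other"))
store.lower_strings.append(CharString("原", "other"))
print(store.frequent_surnames)  # ['佐藤', '田中', '鈴木']
```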

3. The surname data generation device according to claim 1 or 2, wherein the prohibition check means:

- when a character string is registered in the storage means as an upper character string, refers to the first prohibition in the prohibition table and excludes from registration any character string formed by a corresponding combination of characters;
- when a character string is registered in the storage means as a lower character string, refers to the second prohibition in the prohibition table and excludes from registration any character string formed by a corresponding combination of characters; and
- when the combination surname generation unit generates surname data by combining an upper character string and a lower character string, refers to the third prohibition in the prohibition table and:
  - excludes generation of surname data from a corresponding combination of upper and lower character strings; and
  - excludes generation of surname data when the combination of the first attribute added to the upper character string and the second attribute added to the lower character string falls under the third prohibition.
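The registration-time checks of claim 3 filter strings before they ever reach the tables. The sketch models the first and second prohibitions as sets of banned substrings. That representation, the `register` helper, and the placeholder entries are assumptions chosen only to exercise the code. The claim itself says only that these prohibitions are "combinations of characters".

```python
# Hypothetical sketch of the registration-time checks in claim 3.
# The prohibition entries below are placeholders, not taken from the patent.

FIRST_PROHIBITION = {"鹿"}    # placeholder: banned in upper strings
SECOND_PROHIBITION = {"馬"}   # placeholder: banned in lower strings


def _contains_banned(text, banned):
    """True if any banned character combination occurs in text."""
    return any(b in text for b in banned)


def register(text, position, upper_table, lower_table):
    """Register text as an upper or lower string unless a prohibition applies.

    Returns True if the string was registered, False if it was excluded.
    """
    if position == "upper":
        if _contains_banned(text, FIRST_PROHIBITION):    # first prohibition
            return False
        upper_table.append(text)
        return True
    if position == "lower":
        if _contains_banned(text, SECOND_PROHIBITION):   # second prohibition
            return False
        lower_table.append(text)
        return True
    raise ValueError(f"unknown position: {position!r}")


upper, lower = [], []
register("河", "upper", upper, lower)
register("小鹿", "upper", upper, lower)   # excluded by the placeholder rule
register("田", "lower", upper, lower)
print(upper, lower)  # ['河'] ['田']
```

Checking at registration time keeps the later combination step cheap, because only the third prohibition remains to be evaluated per generated pair.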