JP2002259368A

JP2002259368A - Method and device for working document cipher, document cipher working processing program and recording medium therefor

Info

Publication number: JP2002259368A
Application number: JP2001056249A
Authority: JP
Inventors: Kenichi Kawamura; 賢一川村; Hideaki Harada; 英昭原田; Yoshiyuki Kawabe; 美如河辺; Toshihiro Iwamoto; 俊洋岩元
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2001-03-01
Filing date: 2001-03-01
Publication date: 2002-09-13

Abstract

PROBLEM TO BE SOLVED: To make documents easy to distribute by automatically redacting the privacy-related parts of a document. SOLUTION: A word dictionary 130 stores, for each relevant word, a user-set word (replacement word) that replaces it so that it cannot be identified. The device comprises a means 111 for performing morphological analysis on an input document with reference to the word dictionary; a means 112 for extracting proper noun parts relating to privacy information on the basis of the morphological analysis result; a means 122 for redacting an extracted proper noun part with the replacement word acquired from the word dictionary; means 123, 124 and 125 for replacing an extracted proper noun part with a non-identifying symbol, alphabetic letter or initial letter, respectively; and a means 121 for selecting one of the means 122-125 according to, for example, the type of the character string to be processed.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to techniques for redacting documents (fuseji processing, in which sensitive text is replaced with substitute characters), and in particular to a document redaction method, a document redaction device, and a program and program recording medium therefor, which make it possible to avoid infringing privacy information by applying redaction processing to the content of a document.

[0002]

[Prior Art] If an existing electronic document (hereinafter simply "document") is distributed as is, for example through a company newsletter, the Internet, or an e-mail attachment, the privacy information carried by proper nouns may be infringed, depending on the document. Conventionally, this has been avoided by having a person extract, one by one, the proper noun parts relating to privacy information in the document and replace them with symbols or the like.

[0003]

[Problems to be Solved by the Invention] In the prior art, the extraction and redaction of proper noun parts relating to privacy information in a document were done by hand. This made the work laborious and error-prone, lengthened the time from document creation until the document could be distributed, and made it difficult to distribute documents easily.

[0004] An object of the present invention is to solve these problems: by automatically applying redaction processing to a document, it avoids infringing privacy information and facilitates distribution of the document.

[0005]

[Means for Solving the Problems] The principal feature of the present invention is that a personal computer, network terminal, or other document creation and editing device is provided with: a word dictionary for morphological analysis that stores, for each word, a replacement word that replaces the word so that it cannot be identified; a function that performs morphological analysis on an input document with reference to the word dictionary and extracts proper noun parts relating to privacy information on the basis of the analysis result; and a function that obtains from the word dictionary the replacement word for redacting an extracted proper noun part and uses it to redact the character string of that proper noun part so that it cannot be identified.

[0006] For an input document, proper noun parts relating to privacy information (proper nouns bearing on portrait rights, company and personal information bearing on reputation, and so on) are first extracted on the basis of the word dictionary. Next, for each extracted proper noun part, a replacement word is obtained from the word dictionary and used to redact the part, so that the proper noun part relating to privacy information can no longer be identified. For any desired word in the word dictionary, the user sets in advance an arbitrary word that replaces it in a non-identifying way. This is easily done with, for example, a user word registration function.

[0007]

[Embodiments of the Invention] An embodiment of the present invention is described in detail below with reference to the drawings. FIG. 1 is a block diagram of an embodiment of the present invention. In FIG. 1, reference numeral 100 denotes the main body of the document redaction device, which in hardware terms consists of a CPU, memory (RAM), and the like. Functionally, the main body 100 is divided into two modules: an extraction unit 110 that extracts proper noun parts relating to privacy information from an input document (electronic document), and a processing unit 120 that redacts the proper noun parts extracted by the extraction unit 110 so that they cannot be identified.

[0008] The extraction unit 110 consists of a morphological analysis unit 111, which morphologically analyzes the input document with reference to a word dictionary 130, and a proper noun extraction unit 112, which extracts proper noun parts on the basis of the morphological information produced by the morphological analysis unit 111 and also refers to a suffix table 140 to obtain the type of each proper noun part's social or personal attribute. The processing unit 120 consists of a user-set character processing unit 122, which replaces an extracted proper noun part with the user-set replacement word (replacement characters) stored in the word dictionary 130; a symbol processing unit 123, which replaces an extracted proper noun part with symbols; an alphabetic character processing unit 124, which replaces an extracted proper noun part with alphabetic letters; an initial character processing unit 125, which refers to an initial character table 160 and replaces a proper noun part with its initial letter; and a processing selection unit 121, which selects among the processing units 122, 123, 124, and 125 by referring to, for example, a redaction processing table 150.

[0009] The word dictionary 130, suffix table 140, redaction processing table 150, initial character table 160, and so on are in practice stored on, for example, a hard disk.

[0010] FIG. 3 shows an example of the word dictionary 130. The word dictionary 130 stores, for each word (headword), its reading, attributes, part of speech, and so on, and is used for purposes such as morphological analysis of text. In the present invention, the word dictionary 130 additionally stores the characters (replacement word) that the user has set for replacing a target word (proper noun) so that it cannot be identified. In FIG. 3, the "user-set characters" column holds the replacement word. In the example, the replacement words "ラボ研" ("Lab Institute") and "Ａ研究員" ("Researcher A") are stored for "武蔵野電信電話" ("Musashino Telegraph and Telephone") and "電電太郎" ("Denden Taro"), respectively. The user can change the replacement words at will.
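As a rough illustration, the word dictionary of FIG. 3 could be modelled as follows. This is a minimal Python sketch, not the implementation described in this publication: the class `DictEntry`, its field names, the function `set_replacement`, the readings, the attribute labels, and the third entry (which has no replacement word) are all assumptions made for the example. Only the two headwords and their replacement words come from the text.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class DictEntry:
    """One row of the word dictionary (field names are hypothetical)."""
    headword: str
    reading: str
    attribute: str                     # e.g. "social" or "personal"
    part_of_speech: str
    replacement: Optional[str] = None  # user-set characters, if any


# Headwords and replacement words follow the FIG. 3 example in the text;
# readings, attribute labels and the last entry are illustrative assumptions.
WORD_DICTIONARY: Dict[str, DictEntry] = {
    entry.headword: entry
    for entry in (
        DictEntry("武蔵野電信電話", "むさしのでんしんでんわ", "social", "proper noun", "ラボ研"),
        DictEntry("電電太郎", "でんでんたろう", "personal", "proper noun", "Ａ研究員"),
        DictEntry("武蔵野", "むさしの", "social", "proper noun"),
    )
}


def set_replacement(headword: str, replacement: str) -> None:
    """Register or change the user-set replacement word for a headword."""
    WORD_DICTIONARY[headword] = replace(
        WORD_DICTIONARY[headword], replacement=replacement
    )
```

Keeping the replacement word as an optional field on the ordinary dictionary entry mirrors the idea that the same dictionary serves both morphological analysis and redaction.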

[0011] FIG. 4 shows an example of the suffix table 140. The suffix table 140 gives the correspondence between types of proper noun and suffixes. In the example, when the suffix is "株式会社" (corporation), "氏" (Mr./Ms.), or "市" (city), the type of the proper noun part preceding it is "company name", "personal name", or "city name", respectively.
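The suffix lookup described above can be sketched in a few lines of Python. The names `SUFFIX_TABLE` and `classify_by_suffix` are hypothetical, and matching the longest suffix first is an assumption of this sketch rather than something the text specifies.

```python
from typing import Optional, Tuple

# Suffix -> type of the preceding proper noun part (the three rows of FIG. 4
# that are quoted in the text).
SUFFIX_TABLE = {
    "株式会社": "company name",
    "氏": "personal name",
    "市": "city name",
}


def classify_by_suffix(text: str) -> Optional[Tuple[str, str, str]]:
    """Split `text` into (proper noun part, suffix, type), or return None.

    Longer suffixes are tried first so that a multi-character suffix is not
    shadowed by a shorter one (an assumption of this sketch).
    """
    for suffix in sorted(SUFFIX_TABLE, key=len, reverse=True):
        if text.endswith(suffix) and len(text) > len(suffix):
            return text[: -len(suffix)], suffix, SUFFIX_TABLE[suffix]
    return None
```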

[0012] FIG. 5 shows an example of the redaction processing table 150. The redaction processing table 150 gives the correspondence between the type and other properties of a partial character string to be processed (proper noun part) and the redaction method (user-set character processing, symbol processing, alphabetic character processing, initial character processing, and so on). In the "user-set character processing flag" column, "〇" means that user-set character processing is selected preferentially and "×" means that it is not selected. For example, when the string to be processed has a social attribute of type "company name", user-set character processing with the replacement word from the word dictionary takes priority, and symbol processing is selected if the word dictionary holds no replacement word. When the attribute is social but the type is "city name", user-set character processing is not selected and initial character processing is selected immediately. When the string has a personal attribute of type "personal name", user-set character processing again takes priority, as for "company name", and alphabetic character processing is selected if the word dictionary holds no replacement word. The user can change the redaction processing table 150 at will.
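One possible way to represent the table of FIG. 5 in code is shown below. It is only a sketch: `Rule`, `REDACTION_TABLE`, `methods_in_order`, and the method labels are hypothetical names, and the "〇"/"×" flag is modelled as a boolean.

```python
from typing import Dict, List, NamedTuple


class Rule(NamedTuple):
    attribute: str            # "social" or "personal"
    try_user_set_first: bool  # True for the "〇" flag, False for "×"
    fallback: str             # method used when user-set processing is skipped or fails


# The three rows described in the text for FIG. 5.
REDACTION_TABLE: Dict[str, Rule] = {
    "company name": Rule("social", True, "symbol"),
    "city name": Rule("social", False, "initial"),
    "personal name": Rule("personal", True, "alphabet"),
}


def methods_in_order(kind: str) -> List[str]:
    """Return the redaction methods to try for a given type, in priority order."""
    rule = REDACTION_TABLE[kind]
    first = ["user_set"] if rule.try_user_set_first else []
    return first + [rule.fallback]
```

Because the table is plain data, changing how a type is redacted amounts to editing one row, which matches the statement that the user can change the table at will.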

[0013] FIG. 5 is merely an example. It is also possible, regardless of the type of the partial character string to be processed, to select user-set character processing with the replacement word from the word dictionary first and, only when the word dictionary holds no replacement word, to choose the redaction method case by case according to the type of the string; or to select a single predetermined method regardless of type.

[0014] FIG. 6 shows an example of the initial character table 160. The initial character table 160 gives the correspondence between a "reading" and the "initial character" that should replace it. For example, when the proper noun part of the string to be processed begins with "あ" (a), the proper noun part is to be replaced with "Ａ".

[0015] FIG. 2 is a schematic flowchart of the operation of this embodiment. The operation of the device of FIG. 1 is described below with reference to FIG. 2.

[0016] First, the extraction unit 110 reads the document to be processed (electronic document) into memory (RAM) or the like (step 1). The morphological analysis unit 111 of the extraction unit 110 refers to the word dictionary 130 (FIG. 3), splits the input document into words, and obtains morphological information for each word, such as its reading, part of speech, and conjugated form (step 2). This morphological analysis also yields part-of-speech attributes; for proper nouns, attributes such as social or personal are obtained as well.

[0017] Next, the proper noun extraction unit 112 checks, on the basis of the morphological information obtained, whether any proper noun is present. If so, it treats the partial character string containing the proper noun as a string that may infringe privacy information and makes it a partial character string to be processed (step 3). To distinguish an extracted string from other strings, a flag, for example, is attached to it. The proper noun extraction unit 112 also refers to the suffix table 140 (FIG. 4) to obtain, for each extracted string, the more specific type of its social or personal attribute, such as "company name", "city name", or "personal name". Alternatively, the morphological analysis unit 111 can refer to the suffix table 140 during morphological analysis and subdivide proper nouns into "company name", "city name", "personal name", and so on.
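Step 3 can be illustrated with a small Python sketch that takes already-analyzed morphemes and flags the strings to be processed. Everything here is an assumption made for illustration: the classes `Morpheme` and `Segment`, the function `extract_targets`, the part-of-speech labels, and the hand-written analysis in `SAMPLE`. A real system would obtain the morphemes from a morphological analyzer, and joining a proper noun with one directly following suffix is a simplification.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Morpheme:
    surface: str
    pos: str  # part of speech, e.g. "proper noun", "suffix", "particle"


@dataclass
class Segment:
    text: str
    flagged: bool = False  # True marks a partial character string to be processed


def extract_targets(morphemes: List[Morpheme]) -> List[Segment]:
    """Merge each proper noun with a directly following suffix and flag it."""
    segments: List[Segment] = []
    i = 0
    while i < len(morphemes):
        current = morphemes[i]
        if current.pos == "proper noun":
            text = current.surface
            nxt = morphemes[i + 1] if i + 1 < len(morphemes) else None
            if nxt is not None and nxt.pos == "suffix":
                text += nxt.surface
                i += 1  # the suffix has been consumed
            segments.append(Segment(text, flagged=True))
        else:
            segments.append(Segment(current.surface))
        i += 1
    return segments


# Hand-written analysis of a sample sentence (not produced by a real analyzer).
SAMPLE = [
    Morpheme("電電太郎", "proper noun"),
    Morpheme("氏", "suffix"),
    Morpheme("が", "particle"),
    Morpheme("発表", "noun"),
    Morpheme("し", "verb"),
    Morpheme("た", "auxiliary verb"),
]
```

The flag on each segment is what the later check in step 4 would look at to decide whether any redaction is needed.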

[0018] In the processing unit 120, the processing selection unit 121 first checks whether any partial character string to be processed has been extracted from the document (step 4). This is determined by, for example, whether a flag has been attached to a character string. If none has been extracted, processing ends without doing anything.

[0019] If partial character strings to be processed have been extracted, the processing selection unit 121 refers to the redaction processing table 150 (FIG. 5) and, for every such string, selects the user-set character processing unit 122, the symbol processing unit 123, the alphabetic character processing unit 124, or the initial character processing unit 125 according to, for example, the type of its social or personal attribute (step 5). With the redaction processing table 150 of FIG. 5, for example, when the string has a social attribute of type "company name", the processing selection unit 121 first selects the user-set character processing unit 122 and, on being notified by that unit that processing is not possible (the word dictionary 130 holds no replacement word), then selects the symbol processing unit 123. When the attribute is social but the type is "city name", it selects the initial character processing unit 125 immediately. When the string has a personal attribute of type "personal name", it first selects the user-set character processing unit 122 and, on being notified by that unit that processing is not possible, then selects the alphabetic character processing unit 124.
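The select-then-fall-back behaviour of step 5 can be sketched as a small dispatcher. This is an illustrative Python sketch under stated assumptions: `select_and_apply`, `ORDER`, `HANDLERS`, and `REPLACEMENTS` are hypothetical names, the handlers are toy stand-ins for the units 122 to 125, and a return value of `None` plays the role of the "processing not possible" notification.

```python
from typing import Callable, Dict, List, Optional, Tuple

Handler = Callable[[str], Optional[str]]


def select_and_apply(
    kind: str,
    target: str,
    order: Dict[str, List[str]],
    handlers: Dict[str, Handler],
) -> Tuple[str, str]:
    """Try each method listed for `kind` until one returns a result.

    A handler that returns None signals that it could not process the
    target, so the next method in the list is selected.
    """
    for method in order[kind]:
        result = handlers[method](target)
        if result is not None:
            return method, result
    raise LookupError(f"no redaction method succeeded for {target!r}")


# Toy stand-ins for units 122-125 (hypothetical, for demonstration only).
REPLACEMENTS = {"武蔵野電信電話株式会社": "ラボ研"}
HANDLERS: Dict[str, Handler] = {
    "user_set": REPLACEMENTS.get,  # None when no replacement word is stored
    "symbol": lambda s: "××",
    "alphabet": lambda s: "Ａ",
    "initial": lambda s: "Ｍ",
}
ORDER = {
    "company name": ["user_set", "symbol"],
    "city name": ["initial"],
    "personal name": ["user_set", "alphabet"],
}
```

With these toy handlers, the company name is handled by user-set processing, while the personal name falls through to the alphabetic handler because no replacement word is registered for it in `REPLACEMENTS`.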

[0020] As noted in connection with FIG. 5, the processing selection unit 121 can instead select the user-set character processing unit 122 first for every partial character string to be processed, regardless of type, and select the symbol processing unit 123, the alphabetic character processing unit 124, or the initial character processing unit 125 only when the user-set character processing unit 122 reports that processing is not possible. Which selection method (selection mode) applies can be specified by the user in advance.

[0021] When the user-set character processing unit 122 is selected in step 5, it searches the word dictionary 130 (FIG. 3) using the reading of the proper noun in the partial character string to be processed, and redacts the string by replacing the whole string containing that proper noun (proper noun plus suffix) with the user-set characters (replacement word) stored in the word dictionary 130 (step 6). With FIG. 3, for example, "武蔵野電信電話株式会社" ("Musashino Telegraph and Telephone Corporation") is redacted to "ラボ研" ("Lab Institute") and "電電太郎氏" ("Mr. Denden Taro") to "Ａ研究員" ("Researcher A"). When no user-set characters (replacement word) are stored in the word dictionary 130, the user-set character processing unit 122 returns a processing-not-possible notification to the processing selection unit 121.
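Step 6 might look like the following in Python. This is a sketch, not the described implementation: the names `USER_SET_BY_READING`, `user_set_processing`, and `redact_in_text` are hypothetical, the readings used as keys are assumptions, and `None` stands in for the processing-not-possible notification.

```python
from typing import Dict, Optional

# Reading of the proper noun -> user-set characters.
# The replacement words come from the FIG. 3 example; the readings are assumed.
USER_SET_BY_READING: Dict[str, str] = {
    "むさしのでんしんでんわ": "ラボ研",
    "でんでんたろう": "Ａ研究員",
}


def user_set_processing(proper_noun_reading: str) -> Optional[str]:
    """Look up the replacement word by reading; None means "not possible"."""
    return USER_SET_BY_READING.get(proper_noun_reading)


def redact_in_text(text: str, target: str, proper_noun_reading: str) -> Optional[str]:
    """Replace the whole target (proper noun plus suffix) with the replacement word."""
    replacement = user_set_processing(proper_noun_reading)
    if replacement is None:
        return None  # caller should fall back to another redaction method
    return text.replace(target, replacement)
```

Note that the suffix disappears together with the proper noun, which is why "電電太郎氏" becomes "Ａ研究員" rather than "Ａ研究員氏".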

[0022] When the symbol processing unit 123 is selected in step 5, it replaces the proper noun part of the string to be processed with symbols such as "××", "○○", or "□□", so that, for example, "武蔵野電信電話株式会社" ("Musashino Telegraph and Telephone Corporation") becomes "××会社" ("×× Company") (step 7). The user is free to set which symbols are used.

[0023] Similarly, when the alphabetic character processing unit 124 is selected in step 5, it replaces the proper noun part of the string to be processed with an alphabetic letter such as "Ａ", "Ｂ", or "Ｃ", so that, for example, "電電太郎氏" ("Mr. Denden Taro") becomes "Ａ氏" ("Mr. A") (step 8). Here too, the user is free to set which letters are used.
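Symbol processing and alphabetic character processing (steps 7 and 8) can be sketched together, since both replace only the proper noun part and keep the suffix. The names `symbol_processing` and `AlphabetAllocator` are hypothetical. Two points are assumptions of this sketch rather than statements from the text: the caller supplies the proper noun and suffix already separated (where that boundary falls depends on the morphological analysis), and each distinct proper noun is given the next unused letter so that different names stay distinguishable.

```python
from typing import Dict, Sequence


def symbol_processing(proper_noun: str, suffix: str, symbol: str = "××") -> str:
    """Replace the proper noun part with a user-configurable symbol."""
    del proper_noun  # the proper noun itself is discarded entirely
    return symbol + suffix


class AlphabetAllocator:
    """Assign letters to proper nouns in order of first appearance (assumed policy)."""

    def __init__(self, letters: Sequence[str] = ("Ａ", "Ｂ", "Ｃ")) -> None:
        self._letters = list(letters)
        self._assigned: Dict[str, str] = {}

    def process(self, proper_noun: str, suffix: str) -> str:
        if proper_noun not in self._assigned:
            if len(self._assigned) >= len(self._letters):
                raise ValueError("no unused letters left")
            self._assigned[proper_noun] = self._letters[len(self._assigned)]
        return self._assigned[proper_noun] + suffix
```

Both the symbol and the letter set are parameters, reflecting the statement that the user is free to choose them.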

[0024] Similarly, when the initial character processing unit 125 is selected in step 5, it refers to the initial character table 160 (FIG. 6) and replaces the proper noun part of the string to be processed with the proper noun's initial letters, such as "Ｍ", "Ｏ", or "Ｍ．Ｋ", so that, for example, "東京都武蔵野市" ("Musashino City, Tokyo") becomes "東京都Ｍ市" ("M City, Tokyo") (step 9). Specifically, the initial character processing unit 125 searches the initial character table 160 using the reading of the proper noun and redacts the proper noun to the corresponding initial letter.
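Initial character processing (step 9) can be sketched as a lookup on the first character of the reading. The names `INITIAL_TABLE` and `initial_processing` are hypothetical. Of the table rows, "あ" to "Ａ" is stated in the text and "む" to "Ｍ" is implied by the Musashino example, while "お" to "Ｏ" is an assumption. Passing the retained prefix ("東京都") separately is likewise an assumption about how the string is segmented.

```python
from typing import Dict, Optional

# First character of the reading -> initial letter (a small excerpt only).
INITIAL_TABLE: Dict[str, str] = {
    "あ": "Ａ",
    "お": "Ｏ",
    "む": "Ｍ",
}


def initial_processing(prefix: str, reading: str, suffix: str) -> Optional[str]:
    """Replace the proper noun with its initial, keeping prefix and suffix.

    Returns None when the reading is empty or its first character is not
    in the table.
    """
    initial = INITIAL_TABLE.get(reading[:1])
    if initial is None:
        return None
    return prefix + initial + suffix
```

For the example in the text, the reading "むさしの" yields "Ｍ", so "東京都" plus "Ｍ" plus "市" gives "東京都Ｍ市".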

[0025] Finally, the processing unit 120 overwrites the original document with the document in which all partial character strings to be processed have been redacted (step 10). In this way, a document in which the parts that could infringe privacy information have been redacted is created automatically.

[0026] FIGS. 7 to 10 show a concrete example of document redaction according to the present invention. Suppose the original document (document to be processed) is as shown in FIG. 7. When the document of FIG. 7 is input and morphologically analyzed by the morphological analysis unit 111 of the extraction unit 110, the morphological information shown in FIG. 8 is obtained. On the basis of this information, the proper noun extraction unit 112 checks whether the input document contains partial character strings to be processed that include proper nouns. In this example, "Musashino Telegraph and Telephone Corporation" (武蔵野電信電話株式会社), "Mr. Denden Taro" (電電太郎氏), and "Musashino City, Tokyo" (東京都武蔵野市) are extracted as such strings. From a suffix table 140 like that of FIG. 4, their types are then determined to be "company name", "personal name", and "city name", respectively.

[0027] In the processing unit 120, the processing selection unit 121 first uses the redaction processing table 150 to select, for each of the strings extracted by the extraction unit 110 ("Musashino Telegraph and Telephone Corporation", "Mr. Denden Taro", and "Musashino City, Tokyo"), the processing unit 122, 123, 124, or 125 that is to redact it. When the user-set character processing unit 122 is selected, it searches the word dictionary 130 and replaces, for example, "Musashino Telegraph and Telephone Corporation" and "Mr. Denden Taro" with "Lab Institute" and "Researcher A", respectively. When the symbol processing unit 123, the alphabetic character processing unit 124, or the initial character processing unit 125 is selected, it replaces the proper noun part of the string with symbols, an alphabetic letter, or an initial letter, respectively.

[0028] FIG. 9 shows the results of applying symbol processing, alphabetic character processing, and initial character processing, in the processing units 123, 124, and 125 respectively, to all of the strings "Musashino Telegraph and Telephone Corporation", "Mr. Denden Taro", and "Musashino City, Tokyo", regardless of the type of their social or personal attributes.

[0029] Here, on the basis of the redaction processing table 150 of FIG. 5, user-set character processing is applied to "Musashino Telegraph and Telephone Corporation", whose social attribute type is "company name", and to "Mr. Denden Taro", whose personal attribute type is "personal name", while initial character processing is applied to "Musashino City, Tokyo", whose social attribute type is "city name". Initial character processing follows the example of FIG. 9. Accordingly, "Musashino Telegraph and Telephone Corporation" is replaced with "Lab Institute", "Mr. Denden Taro" with "Researcher A", and "Musashino City, Tokyo" with "M City, Tokyo". As a result, the redacted document of FIG. 10 is obtained from the original document of FIG. 7.

[0030] How a partial character string to be processed is redacted can be set freely by an individual, company, or other user, and redaction can then be carried out according to that setting. In the embodiment, this is easily achieved by changing the contents of the redaction processing table 150.

[0031] The present invention has been described above on the basis of the illustrated embodiment, but it is of course not limited to that embodiment. For example, the selection of the redaction method for a partial character string to be processed need not be implemented with a table.

[0032] A program for causing a computer to execute the procedure for extracting proper noun parts relating to privacy information from an input document and the procedure for redacting the extracted parts so that they cannot be identified (specifically, a procedure such as that of FIG. 2) can be provided recorded in advance on a computer-readable recording medium (FD, CD-ROM, MO, and so on). Installing the program recorded on the medium on a computer allows the extraction unit 110 and processing unit 120 of FIG. 1 to perform their intended functions. Such a program may also be preinstalled on the computer.

[0033]

[Effects of the Invention] As described above, the document redaction method and device of the present invention, and the program and program recording medium therefor, provide the following effects.
(1) Because processing is automatic, time and labor are reduced compared with conventional manual redaction.
(2) As a result of (1), the time from document creation until the document can be distributed is shorter than before.
(3) As a result of (1) and (2), documents can be distributed easily.
(4) The user can freely set whatever replacement characters they like in the word dictionary for morphological analysis.

[Brief description of the drawings]

FIG. 1 is a configuration diagram of an embodiment of the present invention.

FIG. 2 is a schematic flowchart showing an example of the operation of the present invention.

FIG. 3 is a diagram showing an example of the word dictionary.

FIG. 4 is a diagram showing an example of the suffix table.

FIG. 5 is a diagram showing an example of the redaction processing table.

FIG. 6 is a diagram showing an example of the initial character table.

FIG. 7 is a diagram showing an example document used to explain a concrete example of the present invention.

FIG. 8 is a diagram showing the morphological information of the example document of FIG. 7.

FIG. 9 is a diagram showing an example of symbol processing, alphabetic character processing, and initial character processing.

FIG. 10 is a diagram showing the example document of FIG. 7 after redaction processing.

[Explanation of symbols]

100 Main body of document redaction device
110 Extraction unit
111 Morphological analysis unit
112 Proper noun extraction unit
120 Processing unit
121 Processing selection unit
122 User-set character processing unit
123 Symbol processing unit
124 Alphabetic character processing unit
125 Initial character processing unit
130 Word dictionary
140 Suffix table
150 Redaction processing table
160 Initial character table

Continuation of front page

(72) Inventor: 河辺美如 (Kawabe), c/o Nippon Telegraph and Telephone Corporation, 2-3-1 Otemachi, Chiyoda-ku, Tokyo
(72) Inventor: 岩元俊洋 (Toshihiro Iwamoto), c/o Nippon Telegraph and Telephone Corporation, 2-3-1 Otemachi, Chiyoda-ku, Tokyo
F-terms (reference): 5B009 MB03 ME24 QB14; 5B091 AB06 CA02 CC02

Claims

[Claims]

1. A document hidden character processing method for processing a proper noun part relating to privacy information in a document so that it cannot be identified, the method comprising: storing in a word dictionary, in association with a word, a word that replaces that word in an unidentifiable manner (hereinafter, a replacement word); performing morphological analysis on an input document with reference to the word dictionary; extracting a proper noun part relating to privacy information based on the result of the morphological analysis; acquiring from the word dictionary a replacement word for processing the extracted proper noun part into an unidentifiable form; and processing the character string of the proper noun part into hidden characters using the acquired replacement word.
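The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the tokenizer is a toy stand-in for morphological analysis, and the names `word_dictionary`, `register`, `analyze`, and `process_document`, as well as the sample entries, are hypothetical.

```python
import re

# Hypothetical word dictionary: surface form -> (part of speech, replacement word).
word_dictionary = {}

def register(word, pos, replacement=None):
    """Store a word, its part of speech, and an optional replacement word."""
    word_dictionary[word] = (pos, replacement)

def analyze(document):
    """Toy stand-in for morphological analysis (assumption): split the text into
    word and non-word tokens and look each token up in the dictionary."""
    morphemes = []
    for token in re.findall(r"\w+|\W+", document):
        pos, replacement = word_dictionary.get(token, ("other", None))
        morphemes.append((token, pos, replacement))
    return morphemes

def process_document(document):
    """Replace each extracted proper noun with its registered replacement word."""
    output = []
    for surface, pos, replacement in analyze(document):
        if pos == "proper_noun" and replacement is not None:
            output.append(replacement)
        else:
            # Tokens without a registered replacement are left unchanged here.
            output.append(surface)
    return "".join(output)

# Illustrative entries only; they do not come from the patent.
register("Yamada", "proper_noun", "Mr. A")
register("Osaka", "proper_noun", "a certain city")
```

With these entries, `process_document("Yamada visited Osaka.")` returns `"Mr. A visited a certain city."`.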

2. The document hidden character processing method according to claim 1, wherein, when no replacement word for processing the extracted proper noun part into an unidentifiable form is stored in the word dictionary, the character string of the proper noun part is replaced with at least one of a predetermined symbol, a predetermined alphabetic character, or an initial character of the character string, so that it is processed into an unidentifiable form.
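The fallback described in claim 2 might look like the following sketch. The function names, the `*` symbol, and the rule for deriving an initial are assumptions made for illustration; the patent's own initial character table is not reproduced here.

```python
# Hypothetical replacement-word dictionary; entries are illustrative only.
REPLACEMENT_WORDS = {"Yamada": "Mr. A"}

def mask_with_symbol(text, symbol="*"):
    """Replace every character with a fixed symbol (assumed symbol: '*')."""
    return symbol * len(text)

def mask_with_initial(text):
    """Keep only the first character, upper-cased and followed by a period."""
    return text[0].upper() + "." if text else text

def redact(proper_noun, dictionary=REPLACEMENT_WORDS, fallback=mask_with_symbol):
    """Use the dictionary's replacement word when one is stored;
    otherwise apply the fallback replacement."""
    replacement = dictionary.get(proper_noun)
    if replacement is not None:
        return replacement
    return fallback(proper_noun)
```

For example, `redact("Yamada")` returns the stored replacement word, while `redact("Suzuki", fallback=mask_with_initial)` returns `"S."` because no replacement word is registered for that name.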

3. The document hidden character processing method according to claim 1, wherein, according to the type of the extracted proper noun part, the character string of the proper noun part is replaced with one of a replacement word stored in the word dictionary, a predetermined symbol, a predetermined alphabetic character, or an initial character of the character string, so that it is processed into an unidentifiable form.
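Type-dependent selection, as in claim 3, can be sketched with a lookup table that maps a proper noun type to a processing method. The type names (`person`, `place`, `organization`), the table contents, and the placeholder letter are hypothetical and chosen only to make the example concrete.

```python
def to_symbol(text):
    """Mask every character with '*' (assumed symbol)."""
    return "*" * len(text)

def to_alphabet(text):
    """Replace the whole string with a single placeholder letter (assumed: 'A')."""
    return "A"

def to_initial(text):
    """Keep only the upper-cased first character."""
    return text[:1].upper()

METHODS = {"symbol": to_symbol, "alphabet": to_alphabet, "initial": to_initial}

# Hypothetical selection table: proper noun type -> processing method.
PROCESSING_TABLE = {
    "person": "dictionary",
    "place": "alphabet",
    "organization": "initial",
}

def process_by_type(text, noun_type, dictionary, table=PROCESSING_TABLE):
    """Pick the processing method from the noun type, then apply it."""
    method = table.get(noun_type, "symbol")  # unknown types are masked with symbols
    if method == "dictionary":
        # If no replacement word is registered, fall back to symbol masking.
        return dictionary.get(text, to_symbol(text))
    return METHODS[method](text)
```

Keeping the selection in a table rather than in branching code means the policy for a given type can be changed without touching the processing functions.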

4. A document hidden character processing device for processing a proper noun part relating to privacy information in a document so that it cannot be identified, the device comprising: a word dictionary that stores, in association with a word, a replacement word that replaces the word in an unidentifiable manner; extraction means for morphologically analyzing an input document with reference to the word dictionary and extracting a proper noun part relating to privacy information based on the result of the morphological analysis; and processing means for acquiring from the word dictionary a replacement word for processing the extracted proper noun part into an unidentifiable form, and processing the character string of the proper noun part into hidden characters using the acquired replacement word.

5. A document hidden character processing device for processing a proper noun part relating to privacy information in a document so that it cannot be identified, the device comprising: a word dictionary that stores, in association with a word, a replacement word that replaces the word in an unidentifiable manner; extraction means for morphologically analyzing an input document with reference to the word dictionary and extracting a proper noun part relating to privacy information based on the result of the morphological analysis; first processing means for acquiring from the word dictionary a replacement word for processing the extracted proper noun part into an unidentifiable form, and processing the character string of the proper noun part into hidden characters using the acquired replacement word; second processing means for replacing the character string of the extracted proper noun part with at least one of a predetermined symbol, a predetermined alphabetic character, or an initial character of the character string, thereby processing it into an unidentifiable form; and processing selection means for selecting the first processing means or the second processing means.

6. The document hidden character processing device according to claim 5, wherein the processing selection means selects the first processing means, and selects the second processing means when no replacement word for processing the extracted proper noun part into an unidentifiable form is stored in the word dictionary.

7. The document hidden character processing device according to claim 5, wherein the processing selection means selects the first processing means or the second processing means according to the type of the extracted proper noun part.

8. A document hidden character processing program for causing a computer to execute: a process of morphologically analyzing an input document with reference to a word dictionary and extracting a proper noun part relating to privacy information based on the result of the morphological analysis; and a process of acquiring from the word dictionary a replacement word for processing the extracted proper noun part into an unidentifiable form, and processing the character string of the proper noun part into hidden characters using the acquired replacement word.

9. A computer-readable recording medium on which the document hidden character processing program according to claim 8 is recorded.