JP2002230006A

JP2002230006A - Method for analyzing free descriptive answer, method for extracting keyword from free descriptive document, and method for supporting analysis of the document

Info

Publication number: JP2002230006A
Application number: JP2001360968A
Authority: JP
Inventors: Sadanobu Takane; 定信高根
Original assignee: Individual
Current assignee: Individual
Priority date: 2000-11-28
Filing date: 2001-11-27
Publication date: 2002-08-16

Abstract

PROBLEM TO BE SOLVED: To construct a keyword database from actual free-description answer texts, and to analyze those texts on the basis of that database. SOLUTION: The method comprises a step of generating keyword primes (source strings for keywords) by extracting, from a plurality of free-description answers, identical character strings contained in the answers of two or more respondents; a step of constructing a keyword database by deleting duplicated identical expressions from the keyword primes and deleting unnecessary character strings that cannot serve as keywords; a step of comparing the plurality of free-description answers with the database and, when a free-description answer contains a character string including a keyword, counting it as a "reaction"; and a step of associating the attributes of each person whose reaction was counted with the keyword.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a method for analyzing free-description answers, such as responses to a questionnaire, and to a method for extracting keywords from a large number of free-description documents. In this specification, "free-description answer" covers not only documents that exist in text form from the outset but also document data produced as a result of conversion. For example, it includes information obtained by voice from consumers or respondents at a call center and then converted into an electronic document.

[0002]

[Description of the Related Art] With the spread of e-mail, there are more and more opportunities to obtain survey responses from subjects, customer opinions, and the like as electronic documents. Documents sent as electronic documents are usually free-description answers, and the analysis of free-description answers obtained in this way, such as questionnaire results, has conventionally been carried out by hand. Manually analyzing hundreds or thousands of documents, however, is extremely laborious and time-consuming.

[0003] Recently, automatic analysis of Japanese text by computer has advanced, and electronic documents are now searched by keyword. However, survey responses and general opinions are free-description answers, and even if a keyword database is prepared in advance, it cannot fully cover every possible free-description answer. Moreover, even for a survey that asks the same question, the content of the answers may change considerably depending on, for example, when the survey is conducted, and in some cases the keywords cannot be narrowed down without actually analyzing the answers.

[0004]

[Problem to Be Solved by the Invention] An object of the present invention is to construct a keyword database from actual free-description answers, and to analyze free-description answers, or to support their analysis, on the basis of that keyword database. Another object of the present invention is to extract keywords from a large number of free-description texts.

[0005]

[Means for Solving the Problem] To achieve these objects, the technical means adopted by the present invention comprises: a step of generating keyword primes by extracting, from a large number of free-description answers, identical character strings contained in the answers of two or more respondents; a step of constructing a keyword database by deleting duplicated identical expressions from the keyword primes and deleting unnecessary character strings that cannot serve as keywords; a step of comparing the free-description answers with the database and, if a free-description answer contains a character string including a keyword, counting it as a "reaction"; and a step of associating the attributes of each person whose reaction was counted with the keyword.

[0006] Another technical means adopted by the present invention comprises: a step of generating keyword primes by extracting, from a large number of free-description answers, identical character strings contained in the answers of two or more respondents; a step of extracting keyword candidates by deleting duplicated identical character strings from the keyword primes and deleting unnecessary character strings that cannot serve as keywords; a step of comparing the keyword candidates with a previously stored keyword database and, when a keyword candidate is a character string not found in the database, adding it as a new keyword so as to update the database and construct a new database; a step of comparing the free-description answers with the new database and, if a free-description answer contains a character string including a keyword of the new database, counting it as a "reaction"; and a step of associating the attributes of each person whose reaction was counted with the keyword.

In one preferred embodiment, this technique is used for the second and subsequent analyses of the same question. That is, the existing keyword database is one constructed by the method described in claim 1 or claim 17. Alternatively, a preliminary keyword database may be prepared in advance and used as the existing keyword database. For example, for a given question, a thesaurus of keywords (important words) that can be expected in the answers may be prepared and used as the keyword database. In one preferred embodiment, each term in the thesaurus is given an identification symbol (for example, a number); when an obtained keyword contains any term of the thesaurus, the identification symbol of that term is assigned to the keyword, and the obtained keywords are classified on the basis of the assigned identification symbols. In this way, the extracted keywords can be classified automatically using the identification symbols as keys.

[0007] The "two or more answers" basically means the answers of two or more different respondents, but may also include duplicated copies of the same person's answer. For example, when the number of answers is small, it can be useful to extract keyword primes from the answers of a single person.

[0008] The keyword extraction method adopted by the present invention comprises a step of generating keyword primes and a step of extracting keywords from the keyword primes. The step of generating keyword primes includes a step of extracting the longest matching expression among the character strings of a large number of free-description documents; by repeating this step until the extracted character string reaches a predetermined number of characters, identical character strings contained in at least two of the documents are extracted to generate the keyword primes. The step of extracting keywords deletes duplicated identical character strings from the keyword primes and deletes unnecessary character strings that cannot serve as keywords, thereby extracting keyword candidates.

[0009] A free-description document contains one or more words, one or more phrases, one or more sentences, or any combination of these.

[0010] Yet another means adopted by the present invention is a method for supporting the analysis of free-description documents, comprising a step of extracting the longest matching expression among the character strings of a large number of free-description documents, and repeating this step until the extracted character string reaches a predetermined number of characters, thereby extracting identical character strings contained in at least two free-description documents to generate keyword candidates. How the keyword candidates extracted by this support method are used, applied, analyzed, or further processed is not limited; for example, organizing the obtained keyword candidates may involve manual work.

[0011]

[Embodiments of the Invention] The present invention relates to a method for analyzing free-description answers using a computer. A free-description answer is electronic document data. In one preferred embodiment it is an electronic document obtained from a subject by e-mail, but the way the electronic document data is generated is not limited; it includes, for example, handwritten documents or speech converted into electronic documents. Such electronic documents are stored in a storage device of a computer and analyzed by the analysis method according to the present invention. In one preferred embodiment, the present invention is used to analyze a large number of answers obtained from a large number of subjects in response to a given question.

[0012] [Acquisition of free-description answers] For the question 「この夏一番おいしかった弁当」 ("The most delicious bento (boxed lunch) this summer"), answers were obtained as electronic documents by e-mail from a large number of subjects. Each subject is associated with attributes of that subject. The answers in this example include sex and age as subject attributes, but the attributes are not limited to these. Subjects may be associated with attributes by having them enter their sex and age when answering, or, when answers are obtained from members, by using the sex and age stored in the computer in advance. In the latter case, for example, a member number is entered when answering, and attributes such as sex and age are associated with the answer using the member number as a key. FIG. 1 shows some of the free-description answers. Age and sex are associated with each free-description answer as attributes.

[0013] [Generation of keyword primes] In the free-description answers, identical expressions (character strings) used by more than one person are detected and stored as keyword primes. That is, a character string that appears in the answers of two or more people is taken as a keyword prime. For example, when the answer 「ハンバーグと唐揚げの弁当」 ("hamburger steak and fried chicken bento") is picked up first, the remaining answers are scanned from the beginning of the text, from the end, or from both, for a character string identical to 「ハンバーグと唐揚げの弁当」. The full string is compared with the other answers first, and is then shortened step by step. When the string has been shortened to 「ハンバーグ」 ("hamburger steak") and another answer contains the string 「ハンバーグ弁当」 ("hamburger steak bento"), the string 「ハンバーグ」 appears in more than one answer and is therefore picked up as a keyword prime. When the string has been shortened to 「唐揚げの弁当」 ("fried chicken bento") and another answer contains 「唐揚げの弁当」, the string 「唐揚げの弁当」 is picked up as a keyword prime for the same reason. Alternatively, when the string has been shortened to 「唐揚げ」 ("fried chicken") and another answer contains 「唐揚げ弁当」, the string 「唐揚げ」 is picked up as a keyword prime. Likewise, when the string is 「の弁当」 (the particle "no" followed by "bento") and another answer contains 「天ぷらの弁当」 ("tempura bento"), the string 「の弁当」 is picked up as a keyword prime. The shortest string considered is two or three characters long. As is clear from the above, the analysis method according to the present invention may generate several keyword primes from a single person's answer. FIG. 2 shows an excerpt of the keyword primes. The number to the left of each character string is simply a serial number.
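The shortening procedure in paragraph [0013] can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the patent: the function name `keyword_primes`, the `min_len` parameter, and the choice to enumerate every contiguous substring (which covers shortening from the head, from the tail, or from both) are hypothetical.

```python
def keyword_primes(answers, min_len=2):
    """Collect substrings of at least min_len characters that occur in another answer."""
    primes = []
    for i, answer in enumerate(answers):
        others = answers[:i] + answers[i + 1:]
        # Start with the whole answer, then try progressively shorter substrings.
        for length in range(len(answer), min_len - 1, -1):
            for start in range(len(answer) - length + 1):
                candidate = answer[start:start + length]
                # Meaning is ignored: any string shared with another answer qualifies.
                if any(candidate in other for other in others):
                    primes.append(candidate)
    return primes


# Illustrative answers built from the strings quoted in the text.
answers = ["ハンバーグと唐揚げの弁当", "ハンバーグ弁当", "天ぷらの弁当"]
primes = keyword_primes(answers)
```

Duplicates are kept on purpose in this sketch, because the same string is found once from each answer that contains it; removing them belongs to the organizing step that follows.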

[0014] [Organizing the keyword primes] The keyword primes are sorted into Japanese syllabary (a-i-u-e-o) order. FIG. 3 shows the sorted file. A file is then created by deleting identical expressions from the sorted file; this is shown in FIG. 4. For example, the string 「おにぎり」 ("onigiri", rice ball) appears several times in FIG. 3, but after the duplicate strings are deleted it appears only once in FIG. 4. Sorting the keyword primes in some fixed order is not essential, but it is useful when deleting identical expressions.

[0015] Because the method according to the present invention picks up any character string common to multiple answers regardless of its meaning, the keyword primes include character strings that cannot serve as keywords. A step of deleting such strings from the keyword primes is therefore required. A keyword prime is deleted automatically when its first character is a geminate or small kana (っ, ょ, etc.), a long-vowel mark (ー), or the moraic nasal (ん), or when it begins with a symbol other than kanji, hiragana, katakana, or alphabetic characters (for example 【, 】, ＜, ＞, 『, 』, 「, 」, ［, ］, ！, ＃, ＄, ％, ＆, ＝, ？). This work can also be done manually through an input device while viewing the computer's display, but from a labor-saving standpoint it is preferable to do it automatically by mechanical processing.
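The organizing steps of paragraphs [0014] and [0015] (sorting, removing duplicates, and dropping strings with an unusable first character) might look like the sketch below. The names `BAD_LEADING`, `is_valid_start`, and `tidy_primes` are hypothetical, and the exact set of small kana is an assumption, since the text only says "っ, ょ, etc.". Python's default string sort uses code-point order, which only approximates Japanese syllabary order.

```python
import unicodedata

# Assumed set of characters that may not start a keyword:
# small kana, the long-vowel mark, and the moraic nasal.
BAD_LEADING = set("っゃゅょッャュョーんン")

# Unicode name prefixes treated as kanji, hiragana, katakana, or alphabet.
ALLOWED_PREFIXES = (
    "CJK UNIFIED IDEOGRAPH",
    "HIRAGANA LETTER",
    "KATAKANA LETTER",
    "LATIN",
    "FULLWIDTH LATIN",
)


def is_valid_start(ch):
    """Return True if ch may be the first character of a keyword."""
    if ch in BAD_LEADING:
        return False
    return unicodedata.name(ch, "").startswith(ALLOWED_PREFIXES)


def tidy_primes(primes):
    """Sort, remove duplicates, and drop strings with an unusable first character."""
    unique_sorted = sorted(set(primes))  # code-point order, not true syllabary order
    return [p for p in unique_sorted if p and is_valid_start(p[0])]


result = tidy_primes(
    ["唐揚げ", "おにぎり", "おにぎり", "ーグ弁当", "ん弁当", "！弁当", "っと", "の弁当"]
)
```

Note that a string such as 「の弁当」 survives this filter; the text handles such cases separately, through an unwanted-word list or manual deletion.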

[0016] In addition, any remaining strings that are not suitable as keywords are deleted automatically or by hand. For example, the string 「のお弁当」 (the particle "no" followed by "bento") in FIG. 4 is not appropriate as a keyword and should be deleted. Such unwanted keywords may be removed by creating and storing an unwanted-word database, so that any string found in that database is automatically dropped from the keyword primes. Alternatively, they may be deleted from the keyword primes manually. These steps may be carried out at the same time as the "construction of the database" described next.

[0017] [Construction of the database] The organized keywords themselves constitute the database, but dividing them into groups is useful for later analysis. FIG. 5 shows an excerpt of the classified keywords. Grouping may be done by putting keywords that contain a common character string into the same class (alternatively, the match rate between character strings may be evaluated, and strings with a predetermined match rate put into the same class), or by putting synonyms or words of similar meaning into one class. For example, by collecting keywords that contain the string 「幕の内弁当」 ("makunouchi bento"), 「幕の内弁当」, 「和風幕の内弁当」 ("Japanese-style makunouchi bento"), and 「デラックス幕の内弁当」 ("deluxe makunouchi bento") form the 「幕の内弁当」 group (class). Similarly, by collecting keywords that contain the string 「カルビ」 ("kalbi", Korean-style short rib), 「カルビー弁当」, 「牛カルビ丼」 ("beef kalbi rice bowl"), and 「スタミナカルビ丼」 ("stamina kalbi rice bowl") form the 「カルビー弁当」 group (class).
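Grouping by a shared character string, as described in paragraph [0017], can be sketched as follows. The function name `group_by_substring` and the idea of passing the grouping strings in as an explicit list are assumptions made for this example; the text does not say how the common strings are chosen.

```python
def group_by_substring(keywords, group_keys):
    """Place each keyword in the first group whose key string it contains."""
    groups = {key: [] for key in group_keys}
    ungrouped = []
    for keyword in keywords:
        for key in group_keys:
            if key in keyword:
                groups[key].append(keyword)
                break
        else:
            # No grouping string matched this keyword.
            ungrouped.append(keyword)
    return groups, ungrouped


keywords = [
    "幕の内弁当", "和風幕の内弁当", "デラックス幕の内弁当",
    "カルビー弁当", "牛カルビ丼", "スタミナカルビ丼", "おにぎり",
]
groups, ungrouped = group_by_substring(keywords, ["幕の内弁当", "カルビ"])
```

Keywords that match no grouping string are returned separately so that they can be reviewed by hand, which the text explicitly allows.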

[0018] Alternatively, a dictionary of synonyms and similar words may be stored in advance and used for classification, by automatically judging, for example, that 「カルビ」 and 「カルビー」 are synonymous, that 「から揚げ」, 「唐揚げ」, 「唐揚」, and 「からあげ」 (spelling variants of "fried chicken") are synonymous, or that 「スパゲッティ」 ("spaghetti") and 「ペペロンチーノ」 ("peperoncino") are similar. The present invention does not exclude manual work in the classification. For example, the strings 「買いません」 ("would not buy"), 「とくになし」 and 「特になし」 ("nothing in particular"), 「別に」 ("not really"), and 「食べない」 ("do not eat") may be classified manually into a 「買いません」 group.
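A synonym dictionary of the kind mentioned in paragraph [0018] can be represented as a mapping from each variant to a representative form. The sketch below is illustrative only: the name `classify_by_synonym`, the dictionary layout, and the choice of 「唐揚げ」 and 「カルビ」 as representative forms are assumptions, and whole keywords (not substrings) are looked up.

```python
from collections import defaultdict

# Hypothetical synonym dictionary: variant -> representative form.
SYNONYMS = {
    "から揚げ": "唐揚げ",
    "唐揚": "唐揚げ",
    "からあげ": "唐揚げ",
    "カルビー": "カルビ",
}


def classify_by_synonym(keywords, synonyms):
    """Group keywords under the representative form given by the synonym dictionary."""
    classes = defaultdict(list)
    for keyword in keywords:
        # Keywords without an entry form their own class.
        classes[synonyms.get(keyword, keyword)].append(keyword)
    return dict(classes)


classes = classify_by_synonym(
    ["唐揚げ", "から揚げ", "からあげ", "唐揚", "カルビー", "カルビ", "おにぎり"],
    SYNONYMS,
)
```

The manually defined 「買いません」 group from the text could be added to the same dictionary by mapping each of its member strings to 「買いません」.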

[0019] When analyzing a second or subsequent set of free-description answers on the same theme, keyword primes are generated by the method described above and then organized. The resulting keyword candidates are compared with the database that has already been constructed. If the same character string as a keyword candidate already exists in the database, the candidate does not need to be added. If a keyword candidate differs from the character strings in the existing database, it is decided whether to include it in an existing group or to create a separate group. The decision is made by evaluating the match rate between the character string of the keyword candidate and those of the existing keywords: a candidate with the predetermined match rate is placed in the same class as the keyword it was compared with, and a new class is created when the match rate is at or below the predetermined value. This decision may also be made manually.
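The database update in paragraph [0019] can be sketched as below. The text does not define how the match rate is computed, so this example substitutes the similarity ratio from Python's standard `difflib.SequenceMatcher`; that choice, the function name `update_database`, and the threshold of 0.6 are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def update_database(database, candidates, threshold=0.6):
    """Add keyword candidates to a database of the form {group head: [keywords]}."""
    for candidate in candidates:
        # A candidate that is already registered needs no action.
        if any(candidate in members for members in database.values()):
            continue
        best_head, best_rate = None, 0.0
        for head, members in database.items():
            for keyword in members:
                rate = SequenceMatcher(None, candidate, keyword).ratio()
                if rate > best_rate:
                    best_head, best_rate = head, rate
        if best_head is not None and best_rate > threshold:
            database[best_head].append(candidate)  # join the closest existing class
        else:
            database[candidate] = [candidate]  # at or below the threshold: new class
    return database


database = {"幕の内弁当": ["幕の内弁当", "和風幕の内弁当"]}
updated = update_database(database, ["幕の内弁当", "洋風幕の内弁当", "おにぎり"])
```

Any other string similarity measure could be used in place of `SequenceMatcher`; the decision rule (join a class above the threshold, otherwise start a new one) is the part taken from the text.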

[0020] [Analysis of the free-description answers] An analysis data sheet is created on the basis of the keyword groups. The analysis data sheet consists of the head keyword (character string) of each group, and under each head keyword there are several keywords arranged hierarchically. As shown in FIG. 6, 「010 カルビー弁当」 contains 「牛カルビ弁当」, 「カルビー丼」, and 「カルビ丼」. The analysis data sheet is compared with the free-description answers, and every answer that contains the same character string as a keyword is counted as a "reaction". FIG. 7 illustrates files created during the analysis; it lists the answers that reacted to 「010 カルビー弁当」 and those that reacted to 「030 幕の内弁当」. In the analysis method of the present invention, one answer may react to more than one keyword, so the results have the same meaning as those of a question that allows multiple answers per person. The analysis data sheet is useful for organizing the results afterwards, but it is also possible simply to compare the database with the free-description answers, count the reactions, and sort them by class later.
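Counting "reactions" as described in paragraph [0020] reduces to a substring test of every keyword against every answer. In the sketch below, the name `count_reactions`, the dictionary layout of the data sheet, and the sample answers and respondent IDs are hypothetical; the group labels and keywords are the ones quoted in the text.

```python
def count_reactions(answers, data_sheet):
    """Return {group head: [respondent ids]} for answers containing any group keyword."""
    reactions = {head: [] for head in data_sheet}
    for respondent_id, text in answers:
        for head, keywords in data_sheet.items():
            # One answer may react to several groups ("multiple answers per person").
            if any(keyword in text for keyword in keywords):
                reactions[head].append(respondent_id)
    return reactions


data_sheet = {
    "010 カルビー弁当": ["カルビー弁当", "牛カルビ弁当", "カルビー丼", "カルビ丼"],
    "030 幕の内弁当": ["幕の内弁当"],
}
answers = [
    (1, "牛カルビ弁当と幕の内弁当"),
    (2, "幕の内弁当がおいしかった"),
    (3, "おにぎり"),
]
reactions = count_reactions(answers, data_sheet)
```

Keeping the respondent IDs, not just the counts, is a design choice that makes the later association with attributes straightforward.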

[0021] [Associating the attributes of people whose reactions were counted with the analysis results] FIG. 8 is a table displaying the analysis results; it shows the number of reactions to each keyword in the analysis data sheet, broken down by sex (male/female) and age group (boys/girls, 20s, 30s, 40s). FIG. 9 additionally shows the positional relationship of the elements obtained by principal component analysis.
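A breakdown of reactions by sex and age group, in the spirit of the table in FIG. 8, can be sketched with a counter keyed by attribute pairs. The names `age_band` and `cross_tabulate`, the attribute records, and the "under 20" label are assumptions for this example; the sample data is invented and does not reproduce the figures in the patent.

```python
from collections import Counter


def age_band(age):
    """Map an age to a coarse band such as '20s' (band labels are assumed)."""
    return "under 20" if age < 20 else f"{age // 10 * 10}s"


def cross_tabulate(reactions, attributes):
    """Count reactions per group head, broken down by (sex, age band)."""
    table = {}
    for head, respondent_ids in reactions.items():
        table[head] = Counter(
            (attributes[rid]["sex"], age_band(attributes[rid]["age"]))
            for rid in respondent_ids
        )
    return table


# Hypothetical attribute records keyed by respondent (member) number.
attributes = {
    1: {"sex": "F", "age": 24},
    2: {"sex": "M", "age": 35},
    3: {"sex": "F", "age": 17},
}
reactions = {"010 カルビー弁当": [1, 3], "030 幕の内弁当": [1, 2]}
table = cross_tabulate(reactions, attributes)
```

The resulting counts could then be fed to a principal component analysis, as FIG. 9 suggests, but that step is outside this sketch.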

[0022] [Another example of free-description answers] FIG. 10 shows an excerpt of free-description answers on "the process of buying a car". The number on the left is a member number; using it as a key, the attributes of the member (subject) who answered are retrieved and associated with the answer. Answers on "the process of buying a car" tend, on the whole, to be longer than the answers on "the most delicious bento" discussed above, but the analysis method according to the present invention is not affected by the length of the free-description answers.

[0023] An example of analysis when the free-description answers are long is described below. The character string from the beginning of the text to the first full stop (。) or comma (、), and each string from one punctuation mark to the next, is taken as a longest string and shortened step by step from there (that is, punctuation marks do not form part of a character string). For example, given the free-description answer 「インターネットで価格帯と大まかな条件に見合う車探し。車種／メーカーが絞られていき、ある程度絞った後連絡して見積りその後の実車の試乗。」 ("Searching on the Internet for cars that fit my price range and rough requirements. The model/maker gets narrowed down, and after narrowing down to some extent I make contact, get a quote, and then test-drive the actual car."), the answer is divided into three strings: 「インターネットで価格帯と大まかな条件に見合う車探し」, 「車種／メーカーが絞られていき」, and 「ある程度絞った後連絡して見積りその後の実車の試乗」. Taking the first as an example, the other free-description answers are first checked for a string identical to the whole of 「インターネットで価格帯と大まかな条件に見合う車探し」; the string is then shortened gradually while being checked for matches with strings in the other answers. For example, when it has been shortened to 「インターネットで」 ("on the Internet") and another answer contains 「インターネットで検索」 ("search on the Internet"), 「インターネットで」 is extracted as a keyword prime. Keyword primes are extracted from the other two strings in the same way. After the keyword primes have been extracted, the free-description answers are analyzed by the same steps as in the embodiment described above.
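The handling of long answers in paragraph [0023] can be sketched as a split at punctuation followed by shortening of each segment. The function names `split_segments` and `longest_shared_prefix` are hypothetical, and shortening only from the tail is a simplification chosen for this example; the text also allows scanning from the head or from both ends.

```python
import re


def split_segments(answer):
    """Split an answer at Japanese commas and full stops, dropping empty pieces."""
    return [segment for segment in re.split("[、。]", answer) if segment]


def longest_shared_prefix(segment, other, min_len=2):
    """Shorten segment from the tail until it occurs in other; None if it never does."""
    for end in range(len(segment), min_len - 1, -1):
        if segment[:end] in other:
            return segment[:end]
    return None


answer = (
    "インターネットで価格帯と大まかな条件に見合う車探し。"
    "車種／メーカーが絞られていき、ある程度絞った後連絡して見積りその後の実車の試乗。"
)
segments = split_segments(answer)
shared = longest_shared_prefix(segments[0], "インターネットで検索")
```

Splitting first keeps each comparison short, which is presumably why the text treats punctuation as a boundary that character strings never cross.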

[0024] [Classifying keywords with a thesaurus] Classification of keywords using a thesaurus is described below. FIG. 11 is an example of a thesaurus that classifies important words collected from a survey on labor issues. The way the thesaurus is created is not limited; for example, it can be created by extracting keywords with the method according to the present invention and classifying the extracted keywords manually. FIG. 12 shows keywords extracted with the method according to the present invention after they have been checked against the thesaurus of FIG. 11; a keyword in which thesaurus terms are found is assigned up to three candidate entries. The three columns of numbers from the left are the code numbers. For example, the keyword 「営業成績」 ("sales results") is assigned the two thesaurus terms 「営業」 ("sales", code 41) and 「成績」 ("results", code 11). A 0 in the second or third column means that there is no second or third candidate. The number immediately to the left of each keyword is its hit count. Keywords assigned the same thesaurus code number are then gathered together automatically to classify the keywords. For example, the keywords assigned code number 11 are 「その業績」, 「ポイントだ」, 「営業成績」, 「基本給＋実績」, 「業績」, and 「業務実績」, and these are classified as one class. Manual fine-tuning may also be carried out when the final analysis data sheet is created.
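The thesaurus lookup in paragraph [0024] can be sketched as follows. Only the entries 「営業」 (41) and 「成績」 (11) come from the text; the other thesaurus entries, the function names `assign_codes` and `classify_by_code`, and the dictionary layout are assumptions for illustration, inferred loosely from the keywords the text lists under code 11.

```python
from collections import defaultdict

# Term -> code number. Only 営業 (41) and 成績 (11) are given in the text;
# 業績 and 実績 are assumed entries added for this example.
THESAURUS = {"営業": 41, "成績": 11, "業績": 11, "実績": 11}


def assign_codes(keyword, thesaurus, max_codes=3):
    """Return up to max_codes code numbers for a keyword, padded with 0."""
    codes = []
    for term, code in thesaurus.items():
        if term in keyword and code not in codes:
            codes.append(code)
    codes = codes[:max_codes]
    return codes + [0] * (max_codes - len(codes))


def classify_by_code(keywords, thesaurus):
    """Gather keywords that were assigned the same code number."""
    classes = defaultdict(list)
    for keyword in keywords:
        for code in assign_codes(keyword, thesaurus):
            if code:  # 0 means "no candidate"
                classes[code].append(keyword)
    return dict(classes)


classes = classify_by_code(["営業成績", "業績", "業務実績", "おにぎり"], THESAURUS)
```

A keyword with several codes, such as 「営業成績」, lands in more than one class, which mirrors the up-to-three candidate columns described for FIG. 12.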

[0025]

[Effects of the Invention] According to the present invention, free-description answers can be analyzed automatically, or their analysis can be supported, so the labor and time required for analysis are greatly reduced compared with conventional manual work. The present invention picks out keywords by comparing the free-description answers (raw data) with one another and extracting identical expressions (character strings), without breaking each answer down into morphemes. The keyword extraction is therefore simple, and a keyword database can be built from up-to-date keywords. Furthermore, because the free-description answers are searched using keywords made up of expressions that actually occur in them, answers containing a keyword can be counted reliably.

[Brief description of the drawings]

【図１】自由記述回答例を示す図である。FIG. 1 is a diagram showing an example of a free description answer.

【図２】複数人の同一表現として抽出されたキーワード
の素の一部を示す図である。FIG. 2 is a diagram illustrating a part of a keyword extracted as the same expression of a plurality of persons;

【図３】図３のものをアイウエオ順に並べたものの一部
を示す図である。FIG. 3 is a diagram showing a part of the arrangement of FIG.

【図４】図４のものから重複表現を削除したものの一部
を示す図である。FIG. 4 is a diagram showing a part of the configuration shown in FIG. 4 from which redundant expressions have been deleted;

【図５】図５のものから不要語を削除し、かつ、キーワ
ードを分類したものの一部を示す図である。FIG. 5 is a diagram showing a part of a keyword obtained by deleting unnecessary words from those shown in FIG. 5 and classifying keywords;

【図６】分析用データシートを示す図である。FIG. 6 is a diagram showing a data sheet for analysis.

FIG. 7 is a diagram showing an example of a file created when free description answers are analyzed using the keyword database.

FIG. 8 is a table showing the relationship between the number of reactions for each keyword and the attributes.

FIG. 9 is a diagram showing the positional relationship of elements obtained by principal component analysis.

FIG. 10 is a diagram illustrating other free description answer sentences.

FIG. 11 is a diagram showing a thesaurus for classifying keywords.

FIG. 12 is a diagram in which the obtained keywords are checked against the thesaurus.

Claims

[Claims]

1. A method for analyzing free description answers, comprising: (a) extracting, from a large number of free description answers, identical character strings contained in two or more answers to generate keyword elements; (b) constructing a keyword database by deleting duplicated identical expressions from the keyword elements and deleting unnecessary character strings that cannot be keywords; (c) comparing the large number of free description answers with the database and, when a free description answer contains a character string including a keyword, counting it as "reaction present"; and (d) associating the attributes of the persons whose reactions were counted with the keywords.

2. The method according to claim 1, wherein the two or more answers are answers of two or more different respondents.

3. The method according to claim 1, wherein the two or more answers include a copy of the same person's answer.

4. The method for analyzing free description answers according to claim 1, wherein the step (a) includes extracting the longest matching expression among the character strings of the free description answers.

5. The method for analyzing free description answers according to claim 4, wherein the step of extracting the longest matching expression among the character strings of the free description answers is repeated until the extracted character string reaches a predetermined number of characters.

6. A method according to claim 5, wherein the predetermined number of characters is two or three.

7. The method for analyzing free description answers according to claim 4, wherein the character string from the beginning of a sentence to the first period or comma, or the character string from one period or comma to the next period or comma, is taken as the longest character string, and the character string is successively shortened from there.

8. The method for analyzing free description answers according to claim 1, wherein in the step (b), the step of deleting unnecessary character strings includes deleting any character string whose first character is a geminate consonant mark (sokuon), a long vowel mark, a syllabic nasal (hatsuon), or a symbol.

9. The method for analyzing free description answers according to claim 1, wherein in the step (b), the step of deleting unnecessary words is performed by comparison with a previously stored unnecessary word database.

10. The method of analyzing a free description answer according to claim 1, wherein the step (b) further includes a step of classifying a keyword.

11. A method according to claim 10, further comprising: classifying keywords including the same character string into the same class.

12. A free description answer analysis method according to claim 10, further comprising: classifying keywords having similar meanings into the same class.

13. A method according to claim 12, wherein keywords are classified by a previously stored similar word database.

14. The free description answer analysis method according to claim 1, wherein the free description answer sentence and the attribute information of the respondent are associated in advance.

15. A method for analyzing free description answers, comprising: (a) extracting, from a large number of free description answers, identical character strings contained in two or more answers to generate keyword elements; (b) extracting keyword candidates by deleting duplicated identical character strings from the keyword elements and deleting unnecessary character strings that cannot be keywords; (c) comparing a previously stored keyword database with the keyword candidates and, when a keyword candidate does not exist in the database, adding it as a new keyword and updating the database to construct a new database; (d) comparing the large number of free description answers with the new database and, when a free description answer contains a character string including a keyword of the new database, counting it as "reaction present"; and a step of associating the attributes of the persons whose reactions were counted with the keywords.

16. The method for analyzing free description answers according to claim 15, wherein the keywords in the existing database are classified, the method including a step of determining, when a new keyword is added, whether to include the keyword in an existing class or to create another class.

17. The method for analyzing free description answers according to claim 15, wherein the keyword database is a thesaurus and each vocabulary item constituting the thesaurus is given an identification symbol, the method including a step of, when an obtained keyword contains any vocabulary item constituting the thesaurus, assigning the identification symbol of that vocabulary item to the keyword and classifying the obtained keywords based on the assigned identification symbols.

18. A keyword extraction method comprising: (a) extracting the longest matching expression among the character strings of a large number of free description documents, and repeating this step until the extracted character string reaches a predetermined number of characters, thereby extracting identical character strings contained in two or more free description documents and generating keyword elements; and (b) extracting keyword candidates by deleting duplicated identical character strings from the keyword elements and deleting unnecessary character strings that cannot be keywords.

19. The keyword extraction method according to claim 18, wherein the free description document comprises one or more words, one or more phrases, one or more sentences, or any combination thereof.

20. The keyword extracting method according to claim 18, wherein the predetermined number of characters is two or three.

21. The keyword extraction method according to claim 18, wherein the character string from the beginning of a sentence to the first period or comma, or the character string from one period or comma to the next period or comma, is taken as the longest character string, and the character string is successively shortened from there.

22. A free description document analysis method comprising extracting the longest matching expression among the character strings of a large number of free description documents, and repeating this step until the extracted character string reaches a predetermined number of characters, thereby extracting identical character strings contained in at least two free description documents and generating keyword candidates.