JP2011018155A

JP2011018155A - Method, device and program for creating infant vocabulary development database

Info

Publication number: JP2011018155A
Application number: JP2009161592A
Authority: JP
Inventors: Tetsuo Kobayashi; 哲生小林; Masaaki Nagata; 昌明永田
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2009-07-08
Filing date: 2009-07-08
Publication date: 2011-01-27
Anticipated expiration: 2029-07-08
Also published as: JP5371589B2

Abstract

[Problem] To provide a technique for constructing a highly reliable infant vocabulary development database from data collected through user participation.
[Solution] Input means 11 accepts word information entered through a user terminal; acquisition age generation means 12 calculates the acquisition age (in months) of the accepted word information; semantic category assignment means 13 assigns a semantic category to the accepted word information; fraud detection means 14 applies the acquisition age or semantic category to a plurality of predefined criteria and excludes fraudulent word information; average acquisition age calculation means 15 calculates the average acquisition age of the word information that was not excluded and is deemed valid; reliability determination means 16 judges the reliability of the word information from the average acquisition age; and generation means 17 generates the infant vocabulary development database from the word information judged reliable.
[Selected drawing] FIG. 1

Description

The present invention relates to a technique for creating an infant vocabulary development database from data posted via the Web, and in particular to a technique for ensuring the reliability of the posted data so as to achieve a high-quality database.

A wide variety of dictionary and database browsing services currently exist on the Web, ranging from digitized versions of conventional printed dictionaries, such as English-Japanese and Japanese-language dictionaries, to user-participation dictionaries typified by "Wikipedia".

Compared with printed dictionaries, the greatest advantage of publishing a dictionary or database on the Web is that many users can easily add and change information, so new information can be added in a timely manner. Although there are potential drawbacks regarding the accuracy and reliability of posted information, exploiting this property of the Web makes it possible to build, efficiently and quickly, kinds of dictionaries and databases that did not exist before.

Against this background, a database on infant vocabulary development called the "Children's Word Dictionary" (こども語辞書) is now published on the Web, and a dictionary without parallel anywhere in the world is being built. Users record in a web diary tool, as needed, when their infant began to utter which words, and the database created by organizing and processing those records is made available for browsing on the Web. With this browsing service, one can easily search and view when a given word is acquired (e.g., the word "mama": average acquisition age 15.4 months) and what an infant word means (e.g., infant word "sha" → meaning "train"). It is a useful childcare information service for parents of children aged 0 to 3 (Non-Patent Document 1).

Non-Patent Document 1: Tetsuo Kobayashi, Masaaki Nagata, "An attempt to collect early vocabulary development data through the Web and its applications", Abstracts of the 8th Annual Meeting of the Japanese Society of Baby Science, April 12-13, 2008, p. 73
Non-Patent Document 2: Kim, M., McGregor, K. K., & Thompson, C. K. (2000), "Early lexical development in English- and Korean-speaking children: language-general and language-specific patterns", Journal of Child Language, 27, 225-254
Non-Patent Document 3: Tamiko Ogura, Toru Watanuki, "Normative study of vocabulary development in Japanese children: from the Japanese MacArthur Communicative Development Inventories", Bulletin of the Kyoto International Social Welfare Center "Development and Rehabilitation Research", No. 24, November 2008, pp. 3-42

Data on infant vocabulary development are usually collected by researchers in developmental psychology or psycholinguistics, who observe children's speech directly or interview mothers in person, so the reliability of the data is rarely an issue.

However, when an infant vocabulary development database is built from user-submitted data, the reliability of the data becomes a problem, for example because fraudulent data may be posted. In the "Children's Word Dictionary" in particular, each individual's information about when a word was uttered is a key component. If many users post inaccurate data, or maliciously post false data despite having no child, the accuracy of the aggregated data may decline.

Moreover, when the database is opened to the public as a browsing service, there is an even greater responsibility to present accurate information. Some technique is therefore needed to avoid these problems, and ensuring the accuracy and reliability of the database is important.

The present invention addresses the above problems. Its object is to provide a technique for constructing a highly reliable infant vocabulary development database from data collected through user participation.

To that end, when creating an infant vocabulary development database from data collected through user participation, the present invention performs fraudulent data detection and reliability verification processes that exploit the characteristics of infant vocabulary development, thereby providing a technique for creating a high-accuracy, high-quality database.

One aspect of the present invention is a method for creating an infant vocabulary development database using word information posted on the Web through a user's terminal, the method comprising: an input acceptance step in which input means causes the terminal to display an interface for entering word information and accepts the word information entered by the user through the terminal; a fraud detection step in which fraud detection means applies the acquisition age or semantic category of the word information accepted in the input acceptance step to a plurality of predefined criteria, detects fraudulent word information, and excludes it; an average acquisition age calculation step in which average acquisition age calculation means calculates the average acquisition age of the word information that was not excluded in the fraud detection step and is deemed valid; a reliability determination step in which reliability determination means judges the reliability of the valid word information based on the average acquisition age calculated in the average acquisition age calculation step; and a generation step in which generation means generates the infant vocabulary development database from the word information judged reliable in the reliability determination step.

Another aspect of the present invention is an apparatus for creating an infant vocabulary development database using word information posted on the Web through a user's terminal, the apparatus comprising: input means that causes the terminal to display an interface for entering word information and accepts the word information entered by the user through the terminal; fraud detection means that applies the acquisition age or semantic category of the word information accepted by the input means to a plurality of predefined criteria, detects fraudulent word information, and excludes it; average acquisition age calculation means that calculates the average acquisition age of the word information that was not excluded by the fraud detection means and is deemed valid; reliability determination means that judges the reliability of the valid word information based on the average acquisition age calculated by the average acquisition age calculation means; and generation means that generates the infant vocabulary development database from the word information judged reliable by the reliability determination means.

The present invention may also take the form of a program that causes a computer to function as each of the means of the apparatus. The program may be provided stored on a recording medium.

According to the present invention, a highly reliable infant vocabulary development database is constructed from data collected through user participation.

FIG. 1: Basic configuration diagram of an infant vocabulary development database creation device according to an embodiment of the present invention.
FIG. 2: Processing chart of the same device.
FIG. 3: Configuration diagram of the infant vocabulary development database creation device according to the embodiment.
FIG. 4: Illustration of a word input example in the user interface.
FIG. 5: Example of the semantic category classification table.
FIG. 6: Example of the semantic category definition dictionary.
FIG. 7: Chart showing the processing of the fraudulent data detection unit.
FIG. 8: Example of the cross-sectional 50% attainment age dictionary.
FIG. 9: Example of the infant vocabulary development database.
FIG. 10: Illustration of the search page provided by the user interface browsing unit.
FIG. 11: Illustration of the search results page provided by the user interface browsing unit.

≪Basic Configuration≫
FIG. 1 shows the basic configuration of an infant vocabulary development database creation device according to an embodiment of the present invention. The creation device 1 is assumed to be connected via the Internet to a user terminal (not shown).

The creation device 1 applies fraudulent data detection and data reliability determination processes to data on infant vocabulary development posted from the user terminal, and creates a high-quality infant vocabulary development database. Specifically, the creation device 1 is a computer equipped with ordinary hardware resources such as a CPU, memory (RAM), a hard disk drive, and a communication device.

Through the cooperation of these hardware resources with software resources (OS, applications), the creation device 1 comprises: user interface input means 11, which provides the user terminal with a user interface for entering words in a manner suited to the characteristics of infant vocabulary; word acquisition age generation means 12, which calculates the acquisition age of each word entered through the input means 11; infant vocabulary semantic category assignment means 13, which assigns a semantic category to each input word; fraudulent data detection means 14, which uses the results of means 12 and 13 to apply a plurality of defined criteria to the set of input words and detect fraudulent data; average acquisition age generation means 15, which generates the average acquisition age of each input word from the valid data sets not excluded by the detection means 14; data reliability determination means 16, which judges the reliability of the data sets using the average acquisition age calculated by the generation means 15; database item generation means 17, which generates an entry for each input word from the data sets judged highly reliable by the determination means 16 and compiles the entries into the infant vocabulary development database; and user interface browsing means 18, which provides a user interface through which the database generated by the generation means 17 can be browsed and searched on the user terminal.

The generation means 17 generates the database on the hard disk drive. The input means 11 and the browsing means 18 provide their respective user interfaces to the user terminal through the communication device. The input means 11 provides the user interface of a web diary tool.

FIG. 2 shows the database creation process of the creation device 1. First, a web diary tool word acceptance step (S01) is performed: the user enters words that his or her child has learned into the web diary tool interface displayed on the user terminal by the input means 11, and the input means 11 accepts the entered words via the Internet.

Next, for each input word accepted in S01, a word acquisition age generation step (S02) is performed in which the generation means 12 calculates the acquisition age. This is followed by an infant vocabulary semantic category assignment step (S03) in which the assignment means 13 assigns a semantic category to each input word.

Then a fraudulent data detection step (S04) is performed on the set of input words accepted in S01: the detection means 14 detects and excludes fraudulent data based on a plurality of reference indices derived from the scientific characteristics of infant vocabulary. This step uses the word acquisition age calculated in S02 and the semantic category assigned in S03.

For each valid input word not excluded in S04, an average acquisition age generation step (S05) is performed in which the generation means 15 generates the average acquisition age. Based on that average acquisition age, a data reliability determination step (S06) is performed in which the determination means 16 refers to a dictionary that tabulates and lists the acquisition age of each word in advance, and evaluates and determines the reliability of each input word.

Based on the input words found to be highly reliable in S06, a database item generation step (S07) is performed in which the generation means 17 generates the final database items. The generated items are compiled into a database and used as the infant vocabulary development database, and the browsing means 18 displays on the user terminal a user interface for browsing and searching it.

≪Embodiment≫
FIG. 3 shows the configuration of an embodiment of the creation device 1. It comprises a user interface input unit 31, a word acquisition age generation unit 32, an infant vocabulary semantic category assignment unit 33, an intermediate data holding unit 34, a fraudulent data detection unit 35, an average acquisition age generation unit 36, a data reliability determination unit 37, a database item generation unit 38, and a user interface browsing unit 39. Units 31 to 33 and 35 to 39 correspond to means 11 to 18, respectively. Each of units 31 to 39 is described in detail below.

(1) User interface input unit 31
The user interface input unit 31 displays the user interface of a personal web diary tool in the browser of the user terminal via the Internet. As shown in FIG. 4, this interface presents input fields Q and R for recording, for each date (e.g., "day x, month x, 200x"), which word (e.g., "wanwan") was uttered with which meaning (e.g., "dog"). Of the word information entered here, the data entered in field Q is called the input word and the data entered in field R is called the input meaning.

The input unit 31 accepts these input data, thereby collecting the posted data needed to create the database. The collected data are forwarded to units 32 and 33. Basic information such as a personal identification ID, sex, date of birth, birth order, and area of residence is entered separately before the web diary tool is used. The entered basic information is forwarded to the generation unit 32 and the data holding unit 34.

(2) Word acquisition age generation unit 32
The word acquisition age generation unit 32 determines the acquisition age of an input word, i.e., how many months after birth it was uttered, from the difference between the date on which the data were recorded in input fields Q and R and the date of birth of the user's child entered as basic information. For example, for a recording date of "2008.10.21" and a date of birth of "2007.8.5", the acquisition age is calculated as "14.5 months". The calculated acquisition age is forwarded to the intermediate data holding unit 34 paired with the input word.
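
The calculation described above can be sketched as follows. The text does not state how the fractional part of a month is derived, so this sketch assumes whole calendar months plus the remaining days over a nominal 30-day month; the function name `acquisition_age_months` is hypothetical. Under that assumption it reproduces the example value of 14.5 months.

```python
from datetime import date


def acquisition_age_months(birth: date, recorded: date) -> float:
    """Return the age in months at the recording date, to one decimal place.

    Assumption (not from the source): whole calendar months plus the
    remaining days divided by a nominal 30-day month.
    """
    if recorded < birth:
        raise ValueError("recording date precedes date of birth")
    months = (recorded.year - birth.year) * 12 + (recorded.month - birth.month)
    days = recorded.day - birth.day
    if days < 0:
        # Borrow one month when the day of month has not yet been reached.
        months -= 1
        days += 30
    return round(months + days / 30, 1)


# Example from the text: born 2007.8.5, recorded 2008.10.21.
print(acquisition_age_months(date(2007, 8, 5), date(2008, 10, 21)))  # 14.5
```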

(3) Infant vocabulary semantic category assignment unit 33
The infant vocabulary semantic category assignment unit 33 assigns a semantic category (e.g., animal) to the input meaning (e.g., dog) with reference to the semantic category classification table of FIG. 5. To do so, it checks the input word (e.g., "wanwan") against the semantic category definition dictionary 310 in FIG. 3 and determines the category of the input meaning. As shown in FIG. 6, the definition dictionary 310 defines semantic categories in advance for words that may be posted as infant vocabulary.

For example, according to the definition dictionary 310 of FIG. 6, the input word (e.g., "wanwan") falls under category ID "25", category name "animal". The input meaning (e.g., dog) is therefore assigned the semantic category "25. Animal" under "2. Surroundings", as shown in the classification table of FIG. 5. The assigned semantic category is forwarded to the intermediate data holding unit 34 paired with the input meaning. The classification table of FIG. 5 and the definition dictionary 310 of FIG. 6 are both stored on the hard disk drive.
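
A minimal sketch of this lookup is shown below. Only the entry mapping "wanwan" (わんわん) to category 25 "animal" comes from the text; the other dictionary entries and the names `DEFINITION_DICTIONARY` and `assign_category` are hypothetical and stand in for the contents of FIG. 5 and FIG. 6, which are not reproduced here.

```python
# Category names taken from the IDs mentioned in the description.
CATEGORY_NAMES = {
    21: "food/drink",
    25: "animal",
    26: "vehicle",
    31: "daily routine/greeting",
    41: "action word",
}

# Hypothetical excerpt of the definition dictionary 310.
# Only "wanwan" -> 25 is given in the text; the rest are illustrative.
DEFINITION_DICTIONARY = {
    "wanwan": 25,   # わんわん (dog)
    "bubu": 26,     # illustrative entry
    "baibai": 31,   # illustrative entry
}


def assign_category(input_word):
    """Return (category_id, category_name), or None for an unknown word."""
    category_id = DEFINITION_DICTIONARY.get(input_word)
    if category_id is None:
        return None
    return category_id, CATEGORY_NAMES[category_id]


print(assign_category("wanwan"))  # (25, 'animal')
```

How words missing from the dictionary are handled is not stated in the text; returning `None` here is an assumption.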

(4) Intermediate data holding unit 34
The intermediate data holding unit 34 organizes and holds the data forwarded from units 31 to 33. For each input word (e.g., "wanwan"), it forms a data set consisting of the input meaning (e.g., dog), acquisition age (e.g., 14.5 months), semantic category (e.g., 25. Animal), and personal identification ID (e.g., F09-3-456).

Specifically, the intermediate data holding unit 34 stores the data sets for each user in sequence, using the memory (RAM) or the hard disk drive.
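
One way to picture the data set and its per-user storage is the in-memory sketch below. The field names and the `IntermediateStore` class are hypothetical; the example values are those given in the text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class WordRecord:
    """One data set as described for the intermediate data holding unit."""
    user_id: str            # personal identification ID
    input_word: str         # field Q
    input_meaning: str      # field R
    acquisition_age: float  # months
    category_id: int        # semantic category ID


class IntermediateStore:
    """Holds records per user, in the order they were added."""

    def __init__(self):
        self._by_user = defaultdict(list)

    def add(self, record: WordRecord) -> None:
        self._by_user[record.user_id].append(record)

    def records_for(self, user_id: str) -> list:
        return list(self._by_user.get(user_id, []))


store = IntermediateStore()
store.add(WordRecord("F09-3-456", "wanwan", "dog", 14.5, 25))
print(len(store.records_for("F09-3-456")))  # 1
```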

(5) Fraudulent data detection unit 35
The fraudulent data detection unit 35 identifies and detects fraudulent data in the data sets held by the intermediate data holding unit 34, based on four indices defined in the program: the meaningful-word age check definition 311, the noun category check definition 312, the daily routine/greeting category check definition 313, and the NV ratio calculation check definition 314 shown in FIG. 3. The processing of the fraudulent data detection unit 35 is described below with reference to FIG. 7.

S11: First, from each user's input words in the data sets, the fraudulent data detection unit 35 selects the words from the one with the lowest acquisition age through the 50th. The selected words are called the "early 50-word vocabulary". From S12 onward, fraudulent data are detected in this early 50-word vocabulary based on each of the indices.
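
The selection in S11 amounts to sorting one user's words by acquisition age and keeping the first 50. A minimal sketch, with the hypothetical helper name `earliest_words` and records simplified to (word, acquisition age) pairs:

```python
def earliest_words(records, limit=50):
    """Return up to `limit` records with the lowest acquisition age.

    `records` is an iterable of (word, acquisition_age_in_months) pairs
    belonging to a single user. Ties keep their original order.
    """
    return sorted(records, key=lambda record: record[1])[:limit]


# 60 illustrative records with ages 10.0, 10.1, ... months.
sample = [(f"word{i}", 10.0 + i / 10) for i in range(60)]
early = earliest_words(reversed(sample))
print(len(early), early[0], early[-1])
```

How a user with fewer than 50 recorded words is treated is not stated in the text; this sketch simply returns all of that user's words.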

S12: A meaningful-word age check, i.e., fraudulent data detection based on the meaningful-word age check definition 311, is performed on the early 50-word vocabulary selected in S11. If the early 50-word vocabulary contains a meaningful word whose acquisition age is 8 months or younger, the data are regarded as fraudulent. The input meaning in the data set is used to check whether a word is meaningful.

An acquisition age of 8 months or younger is treated as fraudulent because of the scientific finding that, up to 8 months of age, the articulatory organs and the brain functions that control them, as well as the brain functions that cognitively associate sound representations with their referents, are not yet sufficiently mature. For example, even if input field Q records that a 3-month-old uttered meaningful words such as "give me", "jump", or "fell down", this is presumed to be impossible in reality.
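
The check in S12 can be sketched as below. The text says only that the input meaning is used to decide whether a word is meaningful; treating any non-empty input meaning as "meaningful" is an assumption of this sketch, as are the names used.

```python
MIN_PLAUSIBLE_AGE_MONTHS = 8.0  # threshold stated in the description


def has_implausibly_early_word(early_vocabulary):
    """Return True when a meaningful word is recorded at 8 months or younger.

    `early_vocabulary` holds (word, input_meaning, acquisition_age) tuples.
    Assumption: a word counts as meaningful when its input meaning is
    non-empty.
    """
    return any(
        bool(meaning) and age <= MIN_PLAUSIBLE_AGE_MONTHS
        for _word, meaning, age in early_vocabulary
    )


plausible = [("mama", "mother", 11.0), ("wanwan", "dog", 14.5)]
suspicious = [("jump", "jump", 3.0), ("mama", "mother", 11.0)]
print(has_implausibly_early_word(plausible))   # False
print(has_implausibly_early_word(suspicious))  # True
```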

S13, S14: Next, a noun category check, i.e., fraudulent data detection based on the noun category check definition 312, is performed on the early 50-word vocabulary selected in S11 (S13). This checks whether the early 50-word vocabulary contains any word belonging to a noun category (21 "Food/drink" through 26 "Vehicles" in the semantic category classification table of FIG. 5). If no word belongs to a noun category, the data are treated as fraudulent.

Then a daily routine/greeting category check, i.e., fraudulent data detection based on the daily routine/greeting category check definition 313, is performed on the early 50-word vocabulary (S14). This checks whether the early 50-word vocabulary contains any word belonging to the daily routine/greeting category (31 "Daily routines/greetings" in the semantic category classification table of FIG. 5). If no word belongs to that category, the data are treated as fraudulent, as in S13. The semantic categories in the data sets are used for the checks in S13 and S14.

The absence of words from either category is treated as fraudulent because the inventors' survey of children learning Japanese found that the early 50-word vocabulary always contains words from both categories. If no word from one of these categories is present, the data are therefore highly likely to be fraudulent.
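
Both presence checks (S13 and S14) operate on the semantic category IDs of the early 50-word vocabulary and can be sketched as below. The category IDs are those cited in the text; the function names are hypothetical.

```python
NOUN_CATEGORIES = frozenset(range(21, 27))  # 21 "Food/drink" .. 26 "Vehicles"
ROUTINE_GREETING_CATEGORY = 31              # "Daily routines/greetings"


def has_noun_word(category_ids):
    """S13: at least one word falls in a noun category."""
    return any(c in NOUN_CATEGORIES for c in category_ids)


def has_routine_or_greeting_word(category_ids):
    """S14: at least one word falls in the daily routine/greeting category."""
    return ROUTINE_GREETING_CATEGORY in set(category_ids)


ids = [25, 25, 31, 41, 21]
print(has_noun_word(ids), has_routine_or_greeting_word(ids))  # True True
print(has_noun_word([31, 41]))                                # False
```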

S15: Finally, an NV ratio calculation check between the noun categories (N) and the verb category (V) in the early 50-word vocabulary selected in S11, i.e., fraudulent data detection based on the NV ratio calculation check definition 314, is performed.

Specifically, the "NV ratio" (noun-verb ratio) is calculated from the numbers of words in the early 50-word vocabulary that belong to the noun categories (21 "Food/drink" through 26 "Vehicles" in the semantic category classification table of FIG. 5) and to the verb category (41 "Action words" in the same table). Here, NV ratio = number of noun-category words ÷ number of verb-category words, and if the calculated NV ratio falls outside a specified range the data are regarded as fraudulent.

According to research by the inventors, the NV ratio of the early 50-word vocabulary of children learning Japanese has a mean of 3.32 and a standard deviation of 1.78, and the 95.45% of the data corresponding to two standard deviations (2SD) fall within the range 0 to 6.87. This empirical regularity is used as an index: if the NV ratio is outside the range 0 to 6.87, the data are regarded as fraudulent.

Note, however, that the NV ratio is known to differ depending on the language being acquired, so the above range cannot be applied to languages other than Japanese. According to Non-Patent Document 2, the mean NV ratio is about 12.0 in English, considerably higher than in Japanese, and about 1.8 in Korean, lower than in Japanese. The range of the NV ratio should therefore be adjusted according to the language.
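
The NV ratio check can be sketched as below. The bounds 0 to 6.87 are taken from the text as stated (the rounded mean and standard deviation, 3.32 + 2 × 1.78, would give 6.88, and the lower bound of the 2SD interval is negative and is clipped to 0). The text does not say what happens when the early vocabulary contains no verb at all; treating that case as out of range is an assumption, and all names are hypothetical.

```python
NOUN_CATEGORIES = range(21, 27)   # 21 "Food/drink" .. 26 "Vehicles"
VERB_CATEGORY = 41                # "Action words"
NV_RANGE_JAPANESE = (0.0, 6.87)   # range stated for Japanese learners


def nv_ratio(category_ids):
    """Return noun count / verb count, or None when there are no verbs."""
    ids = list(category_ids)
    nouns = sum(1 for c in ids if c in NOUN_CATEGORIES)
    verbs = sum(1 for c in ids if c == VERB_CATEGORY)
    return nouns / verbs if verbs else None


def nv_ratio_in_range(category_ids, bounds=NV_RANGE_JAPANESE):
    """Return True when the NV ratio lies within `bounds` (inclusive).

    Assumption: an undefined ratio (no verbs) is treated as out of range.
    """
    ratio = nv_ratio(category_ids)
    return ratio is not None and bounds[0] <= ratio <= bounds[1]


print(nv_ratio([21, 22, 25, 41]))           # 3.0
print(nv_ratio_in_range([21, 22, 25, 41]))  # True
print(nv_ratio_in_range([21] * 7 + [41]))   # False
```

Passing a different `bounds` tuple is one way to adjust the range for another language, as the description suggests.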

If all four indices 311 to 314 are ultimately satisfied in the processing of S12 to S15, that user's data sets are deemed valid and are used as valid data in the subsequent analysis. The valid data are kept in the intermediate data holding unit 34.

On the other hand, if the data is judged fraudulent under even one of the indices 311 to 314, the user's data sets are invalidated and excluded from the subsequent analysis. At that time, they may also be deleted from the intermediate data holding unit 34. Applying criteria based on indices that exploit the characteristics of infant language development makes it possible to detect fraudulent data, such as prank submissions, with high accuracy.
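The all-or-nothing rule above (a user's data sets are kept only if every index is satisfied) can be sketched as below. The function name and the representation of each index as a callable are assumptions made for illustration. The individual checks for indices 311 to 314 are not reproduced here.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical type: a check receives one user's records and returns True if they pass.
Check = Callable[[List[dict]], bool]


def filter_valid_users(
    datasets_by_user: Dict[str, List[dict]],
    checks: Sequence[Check],
) -> Dict[str, List[dict]]:
    """Keep only users whose records pass every check; one failure excludes the user."""
    return {
        user: records
        for user, records in datasets_by_user.items()
        if all(check(records) for check in checks)
    }
```

Because `all` stops at the first failing check, a user is excluded as soon as one index flags the data, which matches the rule that a single violation invalidates the whole data set.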

(6) Average acquired age generation unit 36
The average acquired age generation unit 36 generates, for the valid data accepted by the fraudulent data detection unit 35, the average acquired age and the number of posted data items for each input word. For example, it selects all entries registered with the meaning “mama” and averages the acquired ages reported by the individual users, thereby generating the average acquired age of the word “mama” (e.g., 16.7 months) and the number of posted data items (e.g., 123). The average acquired age and the number of posted data items generated here are stored in the memory (RAM).
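As a rough sketch of the aggregation performed by unit 36, the following groups posted entries by word and returns the mean acquired age together with the number of posts. The function name and the `(word, acquired_age_in_months)` input shape are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def average_acquired_age(
    posts: Iterable[Tuple[str, float]],
) -> Dict[str, Tuple[float, int]]:
    """Map each word to (average acquired age in months, number of posted items)."""
    ages_by_word: Dict[str, List[float]] = defaultdict(list)
    for word, age_in_months in posts:
        ages_by_word[word].append(age_in_months)
    return {
        word: (sum(ages) / len(ages), len(ages))
        for word, ages in ages_by_word.items()
    }
```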

(7) Data reliability determination unit 37
The data reliability determination unit 37 determines the reliability of the valid data held in the intermediate data holding unit 34, using the average acquired age of each input word (e.g., 16.7 months for the word “mama”) and the number of posted data items (e.g., 123 for the word “mama”) stored in the memory (RAM).

In detail, the average acquired age of each word (e.g., 16.7 months for the word “mama”) is first collated with the cross-sectional 50% reaching age dictionary 315 in FIG. 3 to determine the reliability of the valid data. As shown in FIG. 8, the dictionary 315 lists the “50% reaching age” of each word in advance and is stored in the hard disk drive device.

The “50% reaching age” stored here is obtained as follows. Parents of children aged 10 to 36 months answer a checklist (questionnaire) indicating which words their child currently knows. The responses are tabulated by age in months, and the proportion of children in each age group who have acquired each word is calculated (for example, in the 18-month group, “mama” 65% and “papa” 57%). The age at which the acquisition proportion of a word reaches 50% is then provisionally set as the acquired age of that word (see Non-Patent Document 3).

For example, if the acquisition proportion of the word “manma” is 46% in the 13-month group, 51% in the 14-month group, and 60% in the 15-month group, the 50% reaching age of “manma” is set to 14 months. This “50% reaching age” differs completely from the average acquired age calculated by the average acquired age generation unit 36 in terms of data collection and calculation method. Nevertheless, according to the inventors' research, the two values agree at a statistically significant level (the intraclass correlation coefficient was 0.7, significant at the 5% level).
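The “manma” example can be reproduced with a short sketch. The function name is hypothetical, and the rule used here (the youngest age group whose acquisition proportion is at least 50%) is an interpretation that is consistent with the example in the text.

```python
from typing import Dict, Optional


def fifty_percent_reaching_age(
    proportion_by_age: Dict[int, float],
    threshold_percent: float = 50.0,
) -> Optional[int]:
    """Return the youngest age group (in months) whose proportion reaches the threshold.

    Returns None if no age group reaches the threshold.
    """
    for age_in_months in sorted(proportion_by_age):
        if proportion_by_age[age_in_months] >= threshold_percent:
            return age_in_months
    return None


# Values from the text: 46% at 13 months, 51% at 14 months, 60% at 15 months.
manma = {13: 46.0, 14: 51.0, 15: 60.0}
print(fifty_percent_reaching_age(manma))  # 14
```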

Based on this scientific fact, the average acquired age in the transferred data is collated with the dictionary 315 word by word. If it falls within two months before or after the 50% reaching age (for example, 12 to 16 months for the word “manma”, whose 50% reaching age is 14 months), the reliability of the valid data for that average acquired age is affirmed. This reliability acceptance range (two months before and after) is set in the program.

However, when the number of posted data items included in the transferred data is too small, it is difficult to estimate the average acquired age accurately. Therefore, a criterion is set under which only input words with 10 or more posted data items are treated as valid. Words with 9 or fewer posted data items are regarded as unreliable and excluded from the valid data. This reference number is also set in the program and can be adjusted as necessary.
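The two reliability conditions (agreement within two months of the dictionary value, and at least 10 posted items) might be combined as in the sketch below. The function and parameter names are hypothetical. Treating the two-month boundary as inclusive follows the 12 to 16 month example in the text, while treating a word missing from the dictionary as unreliable is an assumption, since the text does not cover that case.

```python
from typing import Dict


def is_reliable(
    word: str,
    average_age: float,
    post_count: int,
    reaching_age_dictionary: Dict[str, float],
    tolerance_months: float = 2.0,
    min_posts: int = 10,
) -> bool:
    """Check a word's average acquired age against the 50% reaching age dictionary."""
    if post_count < min_posts:
        # Too few posts for a trustworthy average (9 or fewer by default).
        return False
    reference_age = reaching_age_dictionary.get(word)
    if reference_age is None:
        # Assumption: words absent from the dictionary cannot be verified.
        return False
    return abs(average_age - reference_age) <= tolerance_months
```

Both thresholds are keyword parameters because the text states that the acceptance range and the reference number are set in the program and can be adjusted.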

(8) Database item generation unit 38
The database item generation unit 38 takes the valid data whose reliability has been affirmed through the series of processing steps of the units 32 to 37, and aggregates and processes it for each input word (e.g., “Anpanman”) into the following items: average acquired age (e.g., 18.3 months), category ID of the semantic category (e.g., 52 “characters”), number of posted data items (e.g., 133), and utterance examples (e.g., “anpan”, “panpan”). The result is compiled into a database as shown in FIG. 9. The database generated here is the infant vocabulary development database.
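One possible shape for a single database entry is sketched below. The class and field names are hypothetical and only mirror the items listed in the text, while the actual layout of FIG. 9 may differ. The example values are the ones given for “Anpanman”.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VocabularyEntry:
    """One row of the infant vocabulary development database (illustrative only)."""

    word: str
    average_acquired_age: float  # in months
    category_id: int
    category_name: str
    post_count: int
    utterance_examples: List[str] = field(default_factory=list)


entry = VocabularyEntry(
    word="Anpanman",
    average_acquired_age=18.3,
    category_id=52,
    category_name="characters",
    post_count=133,
    utterance_examples=["anpan", "panpan"],
)
```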

Therefore, according to the creation device 1, a highly accurate, high-quality infant vocabulary development database is created from web-posted data from the user terminals, that is, data collected by a user-participation method, through fraudulent data detection (S11 to S15) that exploits the vocabulary development characteristics of children learning Japanese, followed by a reliability verification process.

(9) User interface browsing unit 39
The user interface browsing unit 39 provides the user terminal with a user interface for searching and browsing, on the web, the infant vocabulary development database created by the database item generation unit 38. That is, as shown in FIG. 10, the browsing unit 39 displays a search page for searching the database in the browser of the user terminal.

A search result is output in response to a search request entered on this search page, and the browser display of the user terminal is switched to the search result page, as shown in FIG. 11. The user can thus easily obtain information on infant vocabulary development through the user terminal. For example, if the pronunciation “wanwan” is entered in field T, “Search by free keyword”, on the search page of FIG. 10 and a search by pronunciation is requested, then, as shown in FIG. 11, the browser of the user terminal displays the “meaning” (dog, animal, NHK character) for the “sound” (“wanwan”), the “category” (animals), the “average acquired age” (18.1 months), and an acquisition distribution (frequency distribution table) W showing the percentage (%) of utterances.

Likewise, to find out which words 16-month-old children tend to learn on average, the user clicks “16 months” under the item “Search by age” in FIG. 10 in the browser, and the words whose average acquired age is 16.0 to 16.9 months are displayed as a list in the browser of the user terminal. The user can also click the item “Search by index” or the item “Search by semantic category” in FIG. 10 to display the corresponding results in the browser.
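The “Search by age” lookup can be illustrated with a small filter. The function name and the `(word, average_acquired_age)` input shape are assumptions. The range of 16.0 to 16.9 months in the text is implemented here as “the whole-month part of the average equals the selected month”.

```python
import math
from typing import Iterable, List, Tuple


def words_for_age(
    entries: Iterable[Tuple[str, float]],
    months: int,
) -> List[str]:
    """Return words whose average acquired age falls in [months, months + 1)."""
    return [
        word
        for word, average_age in entries
        if math.floor(average_age) == months
    ]


# Average ages below are example values quoted in the text.
entries = [("mama", 16.7), ("wanwan", 18.1), ("Anpanman", 18.3)]
print(words_for_age(entries, 16))  # ['mama']
```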

This provides a “children's word dictionary” service in which the language development process of infants can be easily searched and browsed, so that parents can readily look up the information they want about the growth of their child's language. Because fraudulent data has been rejected by the fraudulent data detection unit 35 and the data reliability determination unit 37, and the infant vocabulary development database is built from reliable posted data, it is a highly accurate, high-quality database of infant vocabulary development.

Therefore, the growth of a child's language can be estimated and understood more accurately, and evidence-based information can be disclosed. In addition, the average acquired age of each word in the infant vocabulary development database could in future be applied to early vocabulary development education support systems and to spoken dialogue systems for infants that adapt to the child's age, so the database is applicable in a variety of industrial settings.

≪Programs, etc.≫
The present invention can also be configured as a program for causing a computer to function as some or all of the means 12 to 18 and the units 31 to 39 constituting the creation device 1. This program can cause a computer to execute all or some of the steps S01 to S07 and S11 to S15.

This program can be provided through a network, for example via a website or e-mail. The program can also be recorded on a recording medium such as a CD-ROM, DVD-ROM, CD-R, CD-RW, DVD-R, DVD-RW, MO, HDD, or Blu-ray Disc (registered trademark) for storage and distribution. The program is read from the recording medium using a recording medium drive device, and since the program code itself realizes the processing of the above embodiment, the recording medium also constitutes the present invention.

Description of symbols
1 … Infant vocabulary development database creation device
11 … User interface input means (input means)
12 … Word acquired age generation means (acquired age generation means)
13 … Infant vocabulary semantic category assigning means (category assigning means)
14 … Fraudulent data detection means (fraud detection means)
15 … Average acquired age generation means
16 … Data reliability determination means (reliability determination means)
17 … Database item generation means (generation means)
18 … User interface browsing means
31 … User interface input unit
32 … Word acquired age generation unit
33 … Infant vocabulary semantic category assigning unit
34 … Intermediate data holding unit
35 … Fraudulent data detection unit
36 … Average acquired age generation unit
37 … Data reliability determination unit
38 … Database item generation unit
310 … Semantic category definition dictionary
311 … Meaningful word age collation definition
312 … Noun category collation definition
313 … Daily routine / greeting category collation definition
314 … NV ratio calculation collation definition
315 … Cross-sectional 50% reaching age dictionary

Claims

A method for creating an infant vocabulary development database using word information posted on the web through a user terminal,
An input receiving step of displaying an interface for inputting word information on the terminal and receiving the word information input by the user through the terminal;
A fraud detection step in which the fraud detection means detects fraudulent word information by applying the acquired age or semantic category of the word information received in the input receiving step to a plurality of definition criteria prepared in advance, and excludes the fraudulent word information;
An average acquired age calculating step of calculating an average acquired age of the word information recognized as valid without being excluded in the fraud detection step;
A reliability determining step of determining the reliability of the word information recognized as valid based on the average acquired age calculated in the average acquired age calculating step;
A generating step for generating an infant vocabulary development database based on the word information whose reliability is recognized in the reliability determining step;
A method for creating an infant vocabulary development database.

An acquired age generation step of calculating, for the word information received in the input receiving step, the acquired age from the difference between the input date and time of the word information and the date of birth input through the terminal;
A category assigning step in which the category assigning unit assigns the semantic category with reference to a category dictionary prepared in advance for the word information received in the input accepting step;
A user interface browsing means for displaying on the terminal an interface capable of searching and browsing the database generated in the generating step on the web;
The infant vocabulary development database creation method according to claim 1, further comprising:

The fraud detection step includes a step of selecting an arbitrary number of the word information for each user in ascending order of the acquired age;
Recognizing the user's word information group as incorrect information if the acquired word information group includes meaningful word information whose acquired age is before a reference value;
Collating the semantic categories of each of the selected word information, and if there is no word information corresponding to a noun category or daily / greeting category, recognizing the user's word information group as incorrect information;
By referring to the semantic category of each selected word information, the ratio of the word information corresponding to the noun category and the word information belonging to the verb category is calculated, and the ratio is outside the range of the predetermined index. If there is a step of recognizing the user's word information group as illegal information,
The method for creating an infant vocabulary development database according to claim 1, wherein:

The reliability determining step is to check the reliability of each word information recognized as valid by comparing the average acquired age with an age dictionary summarizing acquired age for each word;
If the number of word information groups recognized as valid does not exceed a preset reference number, denying the reliability of the word information groups;
The method for creating an infant vocabulary development database according to any one of claims 1 to 3.

A device for creating an infant vocabulary development database using word information posted on the web through a user terminal,
An input means for displaying an interface for inputting word information on the terminal, and receiving word information input by a user through the terminal;
The fraud detection means for detecting the fraudulent word information by applying the acquisition age or semantic category of the word information received by the input means to a plurality of definition criteria prepared in advance, and removing the fraud information;
An average acquired age calculating means for calculating an average acquired age of word information recognized as valid without being excluded by the fraud detection means;
Reliability determining means for judging the reliability of the word information recognized as valid based on the average acquired age calculated by the average acquired age calculating means;
Generating means for generating an infant vocabulary development database based on word information whose reliability is recognized by the reliability determining means;
An infant vocabulary development database creation device characterized by comprising:

For the word information received by the input means, acquired age generation means for calculating the acquired age from the difference between the input date and time of the word information and the date of birth input through the terminal;
For the word information received by the input means, referring to a category dictionary prepared in advance, category giving means for giving the semantic category;
User interface browsing means for displaying on the terminal an interface capable of searching and browsing the database generated in the generating step on the web;
The infant vocabulary development database creation device according to claim 5, further comprising:

The fraud detection means selects an arbitrary number of the word information for each user in ascending order of the acquired age;
Means for recognizing the user's word information group as fraudulent information if the acquired word information group includes meaningful word information whose acquired age is before a reference value;
Means for recognizing the user's word information group as illegal information if the semantic category of each selected word information is collated and there is no word information corresponding to a noun category or daily / greeting category;
By referring to the semantic category of each selected word information, the ratio of the word information corresponding to the noun category and the word information belonging to the verb category is calculated, and the ratio is outside the range of the predetermined index. If there is, means for recognizing the user's word information group as illegal information,
The infant vocabulary development database creation device according to any one of claims 5 and 6, characterized by comprising:

The reliability determination means is a means for checking the reliability of each word information recognized as valid by checking the average acquired age for each word with a month dictionary summarizing acquired age for each word;
If the number of word information groups recognized as valid does not exceed a preset reference number, means for denying the reliability of the word information groups;
The infant vocabulary development database creation device according to any one of claims 5 to 7, further comprising:

An infant vocabulary development database creation program for causing a computer to function as each means constituting the infant vocabulary development database creation device according to any one of claims 5 to 8.