JP5877775B2

JP5877775B2 - Content management apparatus, content management system, content management method, program, and storage medium

Info

Publication number: JP5877775B2
Application number: JP2012193445A
Authority: JP
Inventors: 岩田　泰明; 泰明岩田; 康博中駄; 美樹真山; 豊明鈴鹿
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2012-09-03
Filing date: 2012-09-03
Publication date: 2016-03-08
Anticipated expiration: 2032-09-03
Also published as: JP2014049044A

Description

The present invention relates to a content management apparatus, a content management system, a content management method, a program, and a storage medium, and relates, for example, to a technique for efficiently assigning features to content.

In recent years, ICT has been increasingly adopted in the education field, and the need to make effective use of digital content is growing. For example, software such as a CMS (Content Management System) has been proposed that supports teachers' use of content by sharing files of digital teaching materials and lesson plans within a school or among multiple schools (see, for example, Non-Patent Document 1). In such software, appropriate metadata must be assigned to each piece of content in order to share content effectively. For example, to retrieve content effectively in a category-based or text-based search, or to decide easily whether to use a piece of content without checking all of it, the categories and keywords to which the content belongs must be assigned appropriately. For this reason, some software has a function that displays a metadata registration screen or the like on which a user manually enters a title, school division, grade, subject, keywords, and so on for content registered in the CMS. In addition, techniques for automatically extracting characteristic words from the text in content have been proposed (see, for example, Non-Patent Document 2).

Non-Patent Document 1: Hitachi Solutions, Ltd., MEANS file server slimming solution, http://hitachisoft.jp/products/means/slimserver/
Non-Patent Document 2: Teruo Koyama, "Extracting Compound Words from Japanese Text", Journal of Information Knowledge Society, vol.19, No.4, pp.306-315, 2010

However, metadata must be entered for each piece of content on a metadata registration screen or the like while checking what the content to be registered contains. The registration work therefore becomes enormous as the amount of content grows. Moreover, when assigning keywords that describe the content, the registrant subjectively searches the text in the content for important words, or devises keywords that do not appear in the text but best express the content, so the assigned vocabulary differs from one registrant to another. Furthermore, in content for the education field, the figures that teachers use when explaining to students in particular are often created with images and the like to suit the teacher's teaching style, so the keywords extracted by the existing technique (Non-Patent Document 2) alone do not necessarily provide enough vocabulary for text search or for understanding the content.
The present invention has been made in view of this situation, and provides a technique that enables a user, even one with limited IT literacy, to register digital content in a shared server or the like easily and accurately.

To solve the above problem, in the present invention the content management apparatus stores in a storage device, in advance, a plurality of types of content management information that are used to classify content when the content is registered, together with feature word information that is prepared in advance and associated with each piece of content management information. Based on the content management information and feature word information stored in the storage device, the apparatus obtains metadata for the content to be registered as support information and outputs it. Each piece of content management information includes a description relating to a content category. Here, the content management apparatus executes: a process of extracting keywords contained in the content to be registered; a process of calculating the frequency with which the extracted keywords appear in the descriptions of the content management information and scoring the plurality of types of content management information based on that frequency; and a process of obtaining the feature word information associated with the content management information that has the highest score from the scoring process and outputting that feature word information as support information (metadata).

According to the present invention, a user, even one unfamiliar with IT, can register their own content in a shared server or the like easily and with accurate keywords assigned.
Further features related to the present invention will become apparent from the description in this specification and the accompanying drawings. Aspects of the present invention are achieved and realized by elements, combinations of various elements, the following detailed description, and the appended claims.
It should be understood that the description in this specification is merely a typical example and does not limit the claims or the applications of the present invention in any sense.

FIG. 1 is a functional block diagram showing a schematic configuration example of a content management apparatus (content management system) according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of the data structures of the curriculum and the feature words.
FIG. 3 is a diagram showing the data structure of the content information.
FIG. 4 is a diagram showing an example of the screen displayed by the registered content selection screen display unit 109.
FIG. 5 is an example of content that contains text information.
FIG. 6 is a flowchart for explaining the overall outline of the processing executed in the content management apparatus.
FIG. 7 is a flowchart for explaining details of the metadata estimation processing unit 111.
FIG. 8 is a flowchart for explaining the process of scoring the curricula in the metadata estimation processing unit 111.
FIG. 9 is a flowchart for explaining details of the metadata input screen display unit 112.
FIG. 10 is a diagram showing an example of a screen displayed by the metadata input screen display unit 112.
FIG. 11 is a diagram showing an example of a screen displayed by the metadata input screen display unit 112.
FIG. 12 is a diagram showing an example of a screen displayed by the metadata input screen display unit 112.

The best embodiment for implementing the apparatus of the present invention will be described in detail below with reference to the accompanying drawings. FIGS. 1 to 12 illustrate embodiments of the present invention. In these drawings, parts with the same reference numerals denote the same items, and their basic configuration and operation are the same. The devices, methods, and the like used in the embodiments are examples, and the present invention is of course not limited to them.

Furthermore, as described later, the embodiments of the present invention may be implemented as software running on a general-purpose computer, as dedicated hardware, or as a combination of software and hardware.

In the following description, each item of information of the present invention is described in a "table" format. However, this information does not have to be expressed as a table data structure; it may be expressed as a list, a DB, a queue, or some other data structure. For this reason, a "table", "list", "DB", "queue", and the like may simply be called "information" to indicate that it does not depend on a particular data structure.

In addition, the expressions "identification information", "identifier", "name", and "ID" can be used when describing the content of each item of information, and these expressions are interchangeable.

In the following, each process in the embodiments of the present invention is described with a "program" as the grammatical subject (the acting entity). Because a program performs its defined processing by being executed by a processor, using memory and a communication port (communication control device), the description may equally be read with the processor as the subject. Processing disclosed with a program as the subject may also be processing performed by a computer such as a management server or by an information processing apparatus. Part or all of a program may be realized by dedicated hardware, and a program may be modularized. The various programs may be installed on each computer from a program distribution server or from a storage medium.

<Configuration of the content management system>
FIG. 1 is a functional block diagram showing a schematic configuration of the content management system (content management apparatus). In FIG. 1 the content management system 1 is shown as a single computer, but its components may be located remotely from one another and connected via a network. In that case, for example, the content management system 1 may consist of a client terminal device and a shared server (content management server device). The client terminal device used by a user who wants to register content is then a computer that has the display device 106 and the input device 107, and the components other than the display device 106 and the input device 107 can be placed on the shared server side.

The content management system 1 has a curriculum DB 100, a feature word DB 101, a content information DB 102, a central processing unit (processor) 103, a program memory 104, a data memory 105 that stores data needed for processing in the central processing unit 103, a display device 106 for displaying data, an input device 107 for performing operations on the displayed data such as selecting a menu item, and a file server 108 that stores content in a file system.

The central processing unit 103 includes a registered content selection screen display unit 109, a content information extraction processing unit 110, a metadata estimation processing unit 111, and a metadata input screen display unit 112. In this embodiment the system is configured as a computer, and the registered content selection screen display unit 109, the content information extraction processing unit 110, the metadata estimation processing unit 111, and the metadata input screen display unit 112 are each realized as part of the functions of a program executed on the computer. These programs are stored in the program memory 104, and the central processing unit 103 loads them into its internal memory when executing the processing.

The data memory 105 stores the curriculum (information) 113 read from the curriculum DB 100, the feature words (information) 114 read from the feature word DB 101, and the content information 115 read from the content information DB 102.

<Data structures>
FIG. 2 shows the data structures of the curriculum 113 and the feature words 114 held in the data memory 105. For example, the curriculum 113 holds information from multiple curriculum documents that describe specific teaching content in the education field, stored as text classified into multiple categories (FIG. 2A). The feature words 114 are education-field feature words associated with each curriculum, and hold information closely related to the theme of each curriculum (FIG. 2B). Each curriculum has at least one feature word. Concrete examples of curricula for the education field include lesson plans issued by textbook publishers and by educational institutions such as cram schools.

As shown in FIG. 2A, the curriculum 113 includes an ID 200, a school division 201, a grade 202, a subject 203, a theme 204, a description 205, and a score 206, and holds this information in the form of an array, for example. The ID 200 holds a value assigned uniquely to the curriculum 113. The school division 201 holds a character string for a school division category such as elementary school, junior high school, or high school. The grade 202 holds a character string for a grade category whose parent in the hierarchy is the school division 201. The subject 203 holds a subject category whose parent in the hierarchy is the grade 202. The theme 204 holds a character string summarizing each text obtained by further subdividing, by teaching content, the part of the curriculum text classified by school division 201, grade 202, and subject 203. The description 205 holds the text that corresponds to the theme 204. The score 206 holds a value indicating the degree to which the content to be registered is covered by the description 205; its initial value is 0. The example in FIG. 2A is a text describing what is taught about volcanic activity in first-year junior high school science, with the theme defined as "state of volcanic activity".

As shown in FIG. 2B, the feature words 114 include an ID 207, a related ID 208, a keyword 209, and an importance flag 210, and hold this information in the form of an array, for example. The ID 207 holds a value assigned uniquely to the feature word 114. The related ID 208 holds one of the IDs 200 in the list of curricula 113, indicating that feature words 114 are associated with a curriculum 113 in a many-to-one relationship. The keyword 209 holds a character string such as a matter or term to be taught that is defined in the description 205 of the associated curriculum 113. The importance flag 210 holds "true" when the word in the keyword 209 is judged to be important for the content being registered; its initial value is "false". In the present invention, a feature word does not have to be a word (keyword) that appears in the curriculum theme 204, in the description 205, or in the text information contained in the user's content. As described later, users do not have to think up keywords for the content they want to register themselves; instead they select the keywords for that content from the feature words presented as metadata (words with stronger relevance are presented with emphasis such as bold type). The feature words associated with each curriculum (which consist of multiple words) include keywords that do not appear in the curriculum theme 204, in the description 205, or in the text information contained in the user's content.
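The two records described for FIG. 2A and FIG. 2B can be pictured as follows. This is a minimal sketch, not the patent's implementation: the class names, field names, the helper `feature_words_for`, and the sample values (including the feature words "magma" and "lava") are illustrative assumptions; only the reference numerals and the initial values come from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Curriculum:
    id: int                # ID 200, unique per curriculum
    school_division: str   # school division 201
    grade: str             # grade 202
    subject: str           # subject 203
    theme: str             # theme 204
    description: str       # description 205
    score: float = 0.0     # score 206, initial value 0


@dataclass
class FeatureWord:
    id: int                  # ID 207, unique per feature word
    related_id: int          # related ID 208, points at one Curriculum.id
    keyword: str             # keyword 209
    important: bool = False  # importance flag 210, initial value false


def feature_words_for(curriculum: Curriculum, words: List[FeatureWord]) -> List[FeatureWord]:
    """Follow the many-to-one link from feature words to a curriculum."""
    return [w for w in words if w.related_id == curriculum.id]


# Illustrative data only; the description and keywords are invented.
volcano = Curriculum(1, "junior high school", "1st grade", "science",
                     "state of volcanic activity", "(description text)")
words = [
    FeatureWord(10, 1, "magma"),
    FeatureWord(11, 1, "lava"),
    FeatureWord(12, 2, "convex lens"),
]
linked = feature_words_for(volcano, words)
```

Storing the link on the feature word side keeps the many-to-one relationship explicit and lets one curriculum own any number of feature words.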

FIG. 3 shows the data structure of the content information 115 held in the data memory 105. This content information is used when a user registers their own content, and is managed in the data memory 105 with the data structure shown in FIG. 3. When registration of the content is complete, the corresponding content information 115 is deleted from the data memory.

As shown in FIG. 3, the content information 115 includes an ID 300, a file 301, a title 302, a school division 303, a grade 304, a subject 305, content text 306, content keywords 307, and assigned keywords 308, and represents the metadata of the content to be registered. The ID 300 stores a value assigned uniquely to the content information 115. The file 301 stores the file name of the content to be registered. The title 302 stores a character string giving the title of the content. The school division 303 indicates the school division category to which the content belongs. The grade 304 indicates the grade category to which the content belongs. The subject 305 indicates the subject category to which the content belongs. The content text 306 stores the text information in the content. The content keywords 307 store, in array form, the keywords extracted from the content text 306. The assigned keywords 308 store, in array form, the keywords that the user selected from those suggested by the metadata estimation process described later. Initially, all items other than the ID 300 and the file 301 are null.
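The content information record of FIG. 3 might be sketched like this, assuming Python's `None` stands in for the null initial values. The class name, field names, and the file name `lesson.pptx` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ContentInfo:
    id: int                                        # ID 300
    file: str                                      # file 301
    title: Optional[str] = None                    # title 302
    school_division: Optional[str] = None          # school division 303
    grade: Optional[str] = None                    # grade 304
    subject: Optional[str] = None                  # subject 305
    content_text: Optional[str] = None             # content text 306
    content_keywords: Optional[List[str]] = None   # content keywords 307
    assigned_keywords: Optional[List[str]] = None  # assigned keywords 308


# Only the ID and the file are set at creation; everything else starts as null.
info = ContentInfo(id=1, file="lesson.pptx")
```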

<Registered content selection screen>
FIG. 4 shows an example of the registered content selection screen that is processed by the registered content selection screen display unit 109 of FIG. 1 and displayed on the display device 106.

As shown in FIG. 4, the registered content selection screen has a school division menu 400, which is a pull-down menu, a file path input form 401, a registration button 402, and an end button 403. The user selects the school division that applies to the content to be registered from the school division menu 400. The school division menu 400 holds a predefined list of school divisions such as "elementary school", "junior high school", and "high school".

Next, the user enters the file path of the content to be registered, which is stored on an HDD (not shown) or on the file server 108, and presses the registration button 402. This starts the metadata registration process for that content.

The content files to be registered are files that contain text information, such as Word, Excel, and PowerPoint (registered trademark) files provided by Microsoft (registered trademark), and Flash files. FIG. 5 shows an example of a PowerPoint (registered trademark) file containing text information as a content file to be registered. The metadata registration process is described in detail later with reference to FIG. 6. When the user presses the end button 403, the processing in the content management apparatus ends.

<Outline of the metadata registration process>
FIG. 6 is a flowchart for explaining the outline of the processing performed in the metadata registration process. This flowchart shows the process of inferring metadata for the content file that the user specified on the registered content selection screen (FIG. 4) and displaying it as candidates on the metadata registration screen (FIG. 10). The metadata entered by the user is then saved in a storage device (not shown), for example a storage device on the shared server side.

In FIG. 6, the content information extraction processing unit 110 first stores, as the content information 115, the school division selected by the user in the school division 303, the file path in the file 301, and the text in the content in the content text 306 (step 600).

Next, the content information extraction processing unit 110 obtains the curricula 113 that have the same values as the school division 303, grade 304, and subject 305 in the content information 115 (step 601). However, any of the school division 303, grade 304, and subject 305 that is null is not included in the retrieval conditions. That is, when the school division 303 is "junior high school" and the grade 304 and subject 305 are null, all curricula 113 whose school division 201 is "junior high school" are obtained. This is the situation in the first pass of the metadata registration process, as opposed to the recalculation described later, because only the school division 303 that the user specified on the registered content selection screen (FIG. 4) is known.
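The retrieval rule of step 601, where null fields are left out of the conditions, can be sketched as below. This is an illustration under assumptions: curricula are represented as plain dictionaries, and the function name `select_curricula` and the sample rows are invented.

```python
from typing import Dict, List, Optional


def select_curricula(curricula: List[Dict[str, str]],
                     school_division: Optional[str],
                     grade: Optional[str],
                     subject: Optional[str]) -> List[Dict[str, str]]:
    """Return curricula matching every condition that is not None (null)."""
    conditions = {"school_division": school_division,
                  "grade": grade,
                  "subject": subject}
    # Null members are dropped from the retrieval conditions.
    active = {key: value for key, value in conditions.items() if value is not None}
    return [c for c in curricula
            if all(c[key] == value for key, value in active.items())]


curricula = [
    {"id": "1", "school_division": "junior high school", "grade": "1", "subject": "science"},
    {"id": "2", "school_division": "junior high school", "grade": "2", "subject": "math"},
    {"id": "3", "school_division": "elementary school", "grade": "1", "subject": "science"},
]

# First pass: only the school division is known.
first_pass = select_curricula(curricula, "junior high school", None, None)
# Recalculation: the user has since supplied a grade.
second_pass = select_curricula(curricula, "junior high school", "1", None)
```

Each value the user supplies later narrows the candidate set, which is why the process loops back to this step after an edit.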

The metadata estimation processing unit 111 then estimates the grade, subject, and related keywords that apply to the content from the file path and the text in the content (step 602). Step 602 is described in detail later with reference to FIG. 7.

Next, the metadata input screen display unit 112 displays a metadata input screen that reflects the estimated metadata for the content (step 603). Step 603 is described in detail later with reference to FIG. 9.

If the user enters or changes the title, grade, or subject on the screen displayed by the process of FIG. 9, the processing returns to step 601 so that the metadata is estimated again with the entered information taken into account (step 604). If, in step 604, the user has not entered or changed the title, grade, or subject on the displayed screen, the metadata registration process ends and the display returns to the registered content selection screen of FIG. 4.
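The loop formed by steps 601 to 604 can be sketched as a function that keeps re-estimating until the user stops changing the title, grade, or subject. This is only a sketch: `estimate` and `show_screen` are hypothetical callbacks standing in for the estimation and screen display units, and the scripted answers replace real user input.

```python
from typing import Callable, Dict, List, Optional

WATCHED_FIELDS = ("title", "grade", "subject")


def register_metadata(content: Dict[str, Optional[str]],
                      estimate: Callable[[Dict[str, Optional[str]]], List[str]],
                      show_screen: Callable[[Dict[str, Optional[str]], List[str]], Dict[str, str]]) -> int:
    """Run estimation rounds until the user makes no relevant change.

    Returns the number of estimation rounds that were performed.
    """
    rounds = 0
    while True:
        candidates = estimate(content)               # steps 601 and 602
        rounds += 1
        changes = show_screen(content, candidates)   # step 603
        changed = {k: v for k, v in changes.items()
                   if k in WATCHED_FIELDS and content.get(k) != v}
        if not changed:                              # step 604: nothing changed
            return rounds
        content.update(changed)                      # re-estimate with new input


# Scripted user: sets the grade once, then accepts the result.
answers = iter([{"grade": "1st grade"}, {}])
content = {"title": None, "grade": None, "subject": None}
rounds = register_metadata(content,
                           estimate=lambda c: [],
                           show_screen=lambda c, cands: next(answers))
```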

<Details of the metadata estimation processing unit (step 602)>
FIG. 7 is a flowchart for explaining details of the metadata estimation processing unit 111 in step 602 of FIG. 6. This flowchart shows the process of extracting keywords from the content text and then using those keywords to estimate which curriculum theme the content to be registered corresponds to. The feature words associated with the curriculum that has the highest score become the metadata candidates for the content.
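The final selection step, taking the feature words of the highest-scoring curriculum as metadata candidates, might look like the sketch below. The function name, the dictionary layout, and the sample scores are assumptions, and so is the tie-breaking (the first curriculum with the maximum score wins); the text does not say how ties are handled.

```python
from typing import Dict, List


def metadata_candidates(scores: Dict[int, float],
                        feature_words: Dict[int, List[str]]) -> List[str]:
    """Return the feature words linked to the curriculum with the top score."""
    if not scores:
        return []
    # max() keeps the first key with the maximum value (assumed tie-breaking).
    best_id = max(scores, key=lambda cid: scores[cid])
    return feature_words.get(best_id, [])


scores = {1: 2.5, 2: 1.0, 3: 0.5}          # curriculum ID -> score 206
feature_words = {1: ["magma", "lava"],     # curriculum ID -> keywords 209
                 2: ["convex lens"]}
candidates = metadata_candidates(scores, feature_words)
```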

In FIG. 7, the metadata estimation processing unit 111 first performs morphological analysis on the file 301 and the content text 306 in the content information 115 (step 700).

Next, the metadata estimation processing unit 111 checks whether the title 302 in the content information 115 is null (step 701).

If the title 302 is determined to be null in step 701, the metadata estimation processing unit 111 extracts keywords from the morphological analysis result of step 700 (step 703).

If the title 302 is determined not to be null in step 701, the metadata estimation processing unit 111 performs morphological analysis on the character string of the title 302 (step 702) and extracts keywords from the morphological analysis results of steps 700 and 702 (step 703). Here, a keyword means a group of words in the content text that together play the role of a noun. For example, the character string "光の屈折" ("refraction of light") contains three keyword patterns: "光" (light), "屈折" (refraction), and "光の屈折". Likewise, the character string "凸レンズの働き" ("function of a convex lens") contains four keyword patterns: "凸", "レンズ", "凸レンズ", and "凸レンズの働き". Step 703 extracts from the character string keywords of this kind, formed from runs of nouns or from combinations that end in a verb. Various existing techniques can be applied to keyword extraction. A typical technique performs morphological analysis on the target character string and treats words concatenated according to their part of speech as keywords (see, for example, Non-Patent Document 1). Basically, this approach regards a character string of one or more consecutive nouns as a keyword, and it is in common use. Many techniques have also been proposed that analyze the extracted keywords in more detail to improve extraction accuracy. This embodiment uses such a keyword extraction technique.
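The two worked examples above can be reproduced with a small rule set over tokens that are already tagged with a part of speech. This is a sketch, not the extraction technique the patent cites: the tagging is supplied by hand instead of by a morphological analyzer, and the three rules (every noun, every run of two or more consecutive nouns, and every phrase joined by the particle "の" that may end in a verb) are an assumed reading chosen to match the examples.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (surface form, part of speech), assumed pre-tagged


def extract_keywords(tokens: List[Token]) -> List[str]:
    keywords: List[str] = []
    run: List[str] = []     # current run of consecutive nouns
    phrase: List[str] = []  # current phrase joined by "の"

    def add(word: str) -> None:
        if word not in keywords:
            keywords.append(word)

    def flush_run() -> None:
        if len(run) >= 2:            # a compound made of consecutive nouns
            add("".join(run))
        run.clear()

    def flush_phrase() -> None:
        flush_run()
        while phrase and phrase[-1] == "の":  # drop a dangling particle
            phrase.pop()
        if "の" in phrase:           # only phrases actually joined by "の"
            add("".join(phrase))
        phrase.clear()

    for surface, pos in tokens:
        if pos == "noun":
            add(surface)
            run.append(surface)
            phrase.append(surface)
        elif pos == "particle" and surface == "の" and phrase:
            flush_run()
            phrase.append(surface)
        elif pos == "verb" and phrase and phrase[-1] == "の":
            flush_run()
            phrase.append(surface)   # verb used as the phrase ending
            flush_phrase()
        else:
            flush_phrase()
    flush_phrase()
    return keywords


light = extract_keywords([("光", "noun"), ("の", "particle"), ("屈折", "noun")])
lens = extract_keywords([("凸", "noun"), ("レンズ", "noun"),
                         ("の", "particle"), ("働き", "verb")])
```

In practice the tokens would come from a morphological analyzer, and the rules would need tuning for text beyond these two examples.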

Next, the metadata estimation processing unit 111 obtains the morphemes in the content text whose part of speech is a verb, normalizes them to their base form, and adds them to the content keywords 307 (step 704). For example, given the character string "〜が見えて" ("... is visible, and"), "見え" is obtained as a verb, and its base form is "見える" ("to be visible").
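Step 704 can be sketched as follows, assuming each morpheme arrives as a (surface, part of speech, base form) triple of the kind a morphological analyzer would typically produce. The triple layout and the function name are assumptions; no real analyzer is called here.

```python
from typing import List, Tuple

Morpheme = Tuple[str, str, str]  # (surface form, part of speech, base form)


def add_verb_base_forms(morphemes: List[Morpheme],
                        content_keywords: List[str]) -> List[str]:
    """Append the base form of every verb to the content keywords (no duplicates)."""
    for _surface, pos, base in morphemes:
        if pos == "verb" and base not in content_keywords:
            content_keywords.append(base)
    return content_keywords


# Hand-tagged morphemes for the fragment "が見えて".
morphemes = [("が", "particle", "が"),
             ("見え", "verb", "見える"),
             ("て", "particle", "て")]
content_keywords = add_verb_base_forms(morphemes, ["光"])
```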

The metadata estimation processing unit 111 then estimates which curriculum 113 the content to be registered corresponds to by scoring each curriculum with the content keywords 307 (step 705). Details of step 705 are described later with reference to FIG. 8.

<Details of Curriculum Scoring Processing (Step 705)>
FIG. 8 is a flowchart for explaining the details of the curriculum scoring processing in step 705 of FIG. 7. This flowchart shows processing for calculating which curriculum document the content to be registered matches, among the curriculum documents that are prepared in advance as a dictionary and already classified into categories such as grade and subject. The degree of match is expressed as a score calculated for each curriculum 113 from the appearance frequencies of the content keywords 307. The score combines, by multiplication, two aggregated quantities: the frequency with which each word of the content keywords 307 appears in the description 205, and the frequency of curricula whose description 205 contains that word. The first quantity reflects the observation that, when a curriculum covers the same material as the content, the content keywords tend to be used often in its description; the higher the total appearance frequency of the words, the better the content matches the curriculum. The second quantity treats a word that appears in many curricula 113 as a "common word" across all curriculum documents; the higher this frequency, the less useful the word is for identifying the corresponding curriculum. In other words, it keeps overly common words from receiving high scores. The score of each curriculum is calculated from these two frequencies, and the curriculum corresponding to the content to be registered is estimated.

In FIG. 8, the metadata estimation processing unit 111 first obtains one word (hereinafter, i) from the content keywords 307 as the processing target (step 800).

Next, the metadata estimation processing unit 111 initializes to 0 a variable total_frequency, which stores the appearance frequency of i across the descriptions 205 of all curricula 113 held in memory (step 801).

The metadata estimation processing unit 111 then obtains one curriculum from the curricula 113 held in memory (step 802).

The metadata estimation processing unit 111 counts the number of occurrences of i in the description of the obtained curriculum (step 803) and adds that count to total_frequency (step 804). When counting occurrences in step 803, morphological analysis is performed on the curriculum description, and words that are verbs are normalized to their base form.

The metadata estimation processing unit 111 checks whether steps 803 and 804 have been performed for all curricula in memory (step 805). If not, processing returns to step 802, and the appearance frequency of i is calculated for the next unprocessed curriculum.

If all curricula in memory have been processed in step 805, the metadata estimation processing unit 111 divides the total number of curricula in memory by the number of curricula containing i at least once, and stores the result in the variable curriculum_frequency (step 806).

Further, the metadata estimation processing unit 111 calculates the logarithm of curriculum_frequency to a predetermined base x (hereinafter, IDF(i)) (step 807). IDF(i) is a value (first evaluation value) indicating how commonly the word i is used across all curriculum documents; the higher the value, the more the word is confined to a limited number of curricula.
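Steps 806 and 807 can be illustrated with a short sketch. It assumes each curriculum description has already been reduced to a list of words. The function name, the dictionary layout, the default base of 2, and the handling of a word that appears in no curriculum are all assumptions; the text only says that the base x is specified in advance.

```python
import math
from typing import Dict, List


def inverse_document_frequency(word: str, curricula: Dict[str, List[str]], base: float = 2.0) -> float:
    """Return log_base(N / N_word), where N_word counts curricula whose description contains word."""
    containing = sum(1 for words in curricula.values() if word in words)
    if containing == 0:
        # Assumption: a word found in no curriculum contributes nothing to any score.
        return 0.0
    curriculum_frequency = len(curricula) / containing  # step 806
    return math.log(curriculum_frequency, base)         # step 807


curricula = {
    "A": ["water", "sun", "light"],
    "B": ["water", "sun", "plant"],
    "C": ["water", "rock"],
    "D": ["water", "magma"],
}
print(inverse_document_frequency("plant", curricula))  # appears in 1 of 4 curricula
print(inverse_document_frequency("water", curricula))  # appears in every curriculum
```

A word found in every curriculum gets a value of zero, which matches the stated intent of not rewarding overly common words.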

Next, the metadata estimation processing unit 111 obtains one curriculum from the curricula 113 held in memory (step 808).

The metadata estimation processing unit 111 then divides the appearance frequency of i in that curriculum by total_frequency (hereinafter, TF(i)) (step 809). By dividing the number of occurrences of i in each curriculum by the number of occurrences of i in all curricula in memory, step 809 yields a relative frequency that can be compared among the curricula containing i (second evaluation value). It also normalizes the count of i against the counts of the other words in the content keywords. For example, suppose there are junior high school science curricula A and B, and the content keywords are "water", "sun", and "plant". Curriculum A describes instruction on the refraction of light, and curriculum B describes instruction on the structure of plant bodies. The description of curriculum A explains how light, such as sunlight, refracts when it passes from air into water, and the description of curriculum B explains photosynthesis and how plants take in water through their roots. Suppose further that in curriculum A "water" appears 19 times, "sun" once, and "plant" 0 times, and that in curriculum B "water" appears 3 times, "sun" twice, and "plant" 5 times. Even though the content to be registered actually corresponds to curriculum B, a comparison of simple totals gives 20 for curriculum A and 10 for curriculum B, so curriculum A could be wrongly estimated as the match. Any content whose keywords include "water" would then tend to be misattributed to curriculum A, whatever its actual subject matter. To avoid this, the number of occurrences of each word in a curriculum is divided by that word's total number of occurrences across all curricula. This preserves the comparison of frequencies between curricula while suppressing the score of a word whose frequency is far higher than that of the other words. In this example, across curricula A and B, "water" appears 22 times, "sun" 3 times, and "plant" 5 times. The normalized total for curriculum A is therefore the sum of the relative frequencies of "water", "sun", and "plant": 19/22 + 1/3 + 0/5 = 1.2 (rounded to one decimal place). Similarly, the normalized total for curriculum B is 3/22 + 2/3 + 5/5 = 1.8 (rounded to one decimal place).
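The arithmetic of this example can be checked with a few lines of code. The counts are the ones given in the text; the variable and function names are illustrative only.

```python
# Occurrence counts of each content keyword in each curriculum description (from the example).
counts = {
    "A": {"water": 19, "sun": 1, "plant": 0},
    "B": {"water": 3, "sun": 2, "plant": 5},
}


def raw_total(curriculum: str) -> int:
    """Simple sum of occurrences, which favours curriculum A."""
    return sum(counts[curriculum].values())


def normalized_total(curriculum: str) -> float:
    """Sum of per-word relative frequencies (occurrences / occurrences in all curricula)."""
    total = 0.0
    for word, n in counts[curriculum].items():
        total_frequency = sum(c[word] for c in counts.values())
        if total_frequency:
            total += n / total_frequency
    return total


for name in counts:
    print(name, raw_total(name), round(normalized_total(name), 1))
```

With raw totals, curriculum A (20) outranks B (10); with normalized totals the order reverses (1.2 for A against 1.8 for B), as the text describes.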

Next, the metadata estimation processing unit 111 multiplies TF(i) by IDF(i) and adds the product to the score of that curriculum (step 810). As a result, a curriculum scores higher the more content keywords it uses at high frequency and the more of the content keywords are words used only in a limited number of curricula.
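Taken together, steps 806 to 810 amount to the following computation. The symbols are introduced here for illustration and do not appear in the text: $n_{i,c}$ is the number of occurrences of word $i$ in the description of curriculum $c$, $N$ is the number of curricula in memory, $N_i$ is the number of curricula containing $i$ at least once, $K$ is the set of content keywords, and $x$ is the predetermined base.

```latex
\mathrm{TF}(i, c) = \frac{n_{i,c}}{\sum_{c'} n_{i,c'}}, \qquad
\mathrm{IDF}(i) = \log_x \frac{N}{N_i}, \qquad
\mathrm{score}(c) = \sum_{i \in K} \mathrm{TF}(i, c)\,\mathrm{IDF}(i)
```

Note that the denominator of TF here is the total count of the word across all curricula, as described for step 809, rather than the length of a single document as in the more common TF-IDF formulation.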

The metadata estimation processing unit 111 then checks whether the score based on the appearance frequency of i has been calculated for all curricula in memory (step 811).

If it is determined in step 811 that not all curricula in memory have been processed, processing returns to step 808 to calculate the score for the remaining curricula.

If it is determined in step 811 that all curricula in memory have been processed, the metadata estimation processing unit 111 checks whether all words in the content keywords have been processed (step 812).

If it is determined in step 812 that not all words have been processed, processing returns to step 800 to process the remaining words.

If it is determined in step 812 that all words have been processed, the curriculum scoring processing ends.
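The whole flow of FIG. 8 (steps 800 to 812) could be sketched as below. This is a simplified illustration, not the patented implementation: curriculum descriptions are assumed to be pre-tokenized word lists, and the function name, the dictionary layout, the default base of 2, and the skipping of words that occur in no curriculum are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List


def score_curricula(content_keywords: List[str],
                    descriptions: Dict[str, List[str]],
                    base: float = 2.0) -> Dict[str, float]:
    """Score every curriculum against the content keywords (FIG. 8, steps 800-812)."""
    counts = {cid: Counter(words) for cid, words in descriptions.items()}
    scores = {cid: 0.0 for cid in descriptions}
    for word in content_keywords:                                   # steps 800 and 812
        total_frequency = sum(c[word] for c in counts.values())     # steps 801-805
        if total_frequency == 0:
            continue  # assumption: a word found nowhere is skipped
        containing = sum(1 for c in counts.values() if c[word] > 0)
        idf = math.log(len(counts) / containing, base)              # steps 806-807
        for cid, c in counts.items():                               # steps 808 and 811
            tf = c[word] / total_frequency                          # step 809
            scores[cid] += tf * idf                                 # step 810
    return scores


descriptions = {
    "A": ["water"] * 19 + ["sun"],
    "B": ["water"] * 3 + ["sun"] * 2 + ["plant"] * 5,
    "C": ["rock", "magma"],
}
scores = score_curricula(["water", "sun", "plant"], descriptions)
print(max(scores, key=scores.get))
```

With the counts from the earlier example plus a third, unrelated curriculum C, curriculum B receives the highest score because "plant" occurs only there.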

<Details of Processing by the Metadata Input Screen Display Unit (S603)>
FIG. 9 is a flowchart for explaining the details of the processing performed by the metadata input screen display unit 112 in step 603 of FIG. 6. This flowchart shows screen display processing that, based on the scored curricula 113 in memory, suggests category information and keywords (feature words) as candidate metadata for the content to be registered. Among the metadata displayed as candidates, keywords judged more likely to be related to the content, for example, are highlighted on the metadata input screen.

In FIG. 9, the metadata input screen display unit 112 first sorts the curricula 113 in memory in descending order of score (step 900).

Next, the metadata input screen display unit 112 obtains the first curriculum (step 901) and stores the values of the grade 202 and the subject 203 of that curriculum 113 in the grade 304 and the subject 305 of the content information 115 (step 902).
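Steps 900 to 902 can be pictured with the following sketch. The `Curriculum` record, the dictionary used for the content information, and the function name are hypothetical stand-ins for the curriculum 113 and content information 115 structures.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Curriculum:
    id: int
    grade: str
    subject: str
    score: float


def apply_best_curriculum(curricula: List[Curriculum], content_info: Dict[str, str]) -> Curriculum:
    """Sort by score (step 900), take the top entry (step 901), copy grade and subject (step 902)."""
    ranked = sorted(curricula, key=lambda c: c.score, reverse=True)
    best = ranked[0]
    content_info["grade"] = best.grade
    content_info["subject"] = best.subject
    return best


curricula = [
    Curriculum(1, "1st grade", "science", 0.7),
    Curriculum(2, "3rd grade", "science", 2.1),
]
content_info: Dict[str, str] = {}
best = apply_best_curriculum(curricula, content_info)
print(best.id, content_info)
```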

Further, the metadata input screen display unit 112 obtains from the feature word DB 102 the feature words 114 whose related ID 208 holds the value of the ID 200 of that curriculum 113 (step 903).

To determine, by comparison with the content keywords, whether any of the obtained feature words 114 is more strongly related to the content to be registered, the metadata input screen display unit 112 obtains one keyword 209 (referred to as A) from the feature words 114 held in memory (step 905) and one content keyword 307 (referred to as B) from the content information 115 (step 906).

The metadata input screen display unit 112 then checks whether either character string is contained in the other, that is, whether B partially matches A or A partially matches B (step 907).

If it is determined in step 907 that one character string partially matches the other, the metadata input screen display unit 112 judges that A is more likely to be related to the content to be registered and stores true in the importance flag 210 of the feature word 114 whose keyword 209 is A (step 908). If no partial match is found in step 907, step 908 is skipped.

The metadata input screen display unit 112 then checks whether all content keywords have been compared with A (step 909).

If it is determined in step 909 that not all content keywords have been compared, processing returns to step 906 to compare the remaining content keywords with A.

If it is determined in step 909 that all content keywords have been compared with A, the metadata input screen display unit 112 checks whether the keywords 209 of all feature words 114 have been examined for their likelihood of being related to the content to be registered (step 910).

If it is determined in step 910 that the keywords 209 of some feature words 114 have not been processed, processing returns to step 905 to process the keywords 209 of the remaining feature words 114.

If it is determined in step 910 that the keywords 209 of all feature words 114 have been processed, the metadata input screen display unit 112 displays the metadata input screen (FIG. 10) (step 911). After displaying the metadata input screen, the program waits for a processing instruction issued by the user from that screen. The metadata input screen is described in detail later with reference to FIG. 10.
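The nested comparison of steps 905 to 910 can be sketched as follows. The `FeatureWord` record and the function name are hypothetical, and the guard against empty strings is an added assumption (an empty string would otherwise match everything as a substring).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FeatureWord:
    keyword: str             # corresponds to keyword 209
    important: bool = False  # corresponds to importance flag 210


def flag_important(feature_words: List[FeatureWord], content_keywords: List[str]) -> List[FeatureWord]:
    """Set the importance flag when a feature word and a content keyword contain one another."""
    for fw in feature_words:                 # steps 905 and 910: each keyword A
        for kw in content_keywords:          # steps 906 and 909: each content keyword B
            if not kw or not fw.keyword:
                continue                     # assumption: ignore empty strings
            if kw in fw.keyword or fw.keyword in kw:  # step 907: partial match either way
                fw.important = True                   # step 908
    return feature_words


words = [FeatureWord("火山"), FeatureWord("マグマ"), FeatureWord("地層")]
flag_important(words, ["火山の噴火", "溶岩"])
print([(w.keyword, w.important) for w in words])
```

Here only 「火山」 is flagged, because it is contained in the content keyword 「火山の噴火」.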

Next, the metadata input screen display unit 112 determines whether the user has changed the value of the display category field 1005 via its pull-down list on the metadata input screen (FIG. 10) (step 912).

If the value of the display category field 1005 has not been changed, processing ends. If it has been changed, processing proceeds to step 913.

The metadata input screen display unit 112 obtains the ID 200 of the curriculum 113 whose topic 204 holds the topic character string selected by the user's change instruction (step 913).

Next, the metadata input screen display unit 112 obtains from the feature word DB 102 the feature words 114 whose related ID 208 is that ID, and updates the memory (step 914).

The metadata input screen display unit 112 then performs steps 905 to 911 to determine whether the category keywords associated with the curriculum designated by the user are likely to be related to the content to be registered, updates the related keyword field 1006 of the metadata input screen, and redraws the screen. For example, when the user selects "2. Appearance of strata" from the pull-down list of the display category 1200 in FIG. 12, described later, the metadata input screen display unit 112 obtains the curriculum whose topic is that character string, obtains the feature words associated with that curriculum, and updates the related keyword field 1201. When the related keyword field 1201 is updated, words whose check boxes the user has checked remain in memory and continue to be displayed in the related keyword field 1201 after the update.
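The retention of checked keywords across a category change might be handled as in this sketch. The function name and the display order (checked words first, then the new candidates) are assumptions; the text only states that checked words stay in memory and remain displayed.

```python
from typing import List


def refresh_related_keywords(new_candidates: List[str], checked: List[str]) -> List[str]:
    """Return the keywords to display after the display category changes."""
    displayed = list(checked)          # previously checked words are kept
    for keyword in new_candidates:     # then the feature words of the newly selected curriculum
        if keyword not in displayed:
            displayed.append(keyword)
    return displayed


print(refresh_related_keywords(["地層", "堆積", "火山"], ["火山", "マグマ"]))
```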

<Metadata Input Screen>
The metadata input screen is described in detail with reference to FIG. 10. FIG. 10 shows an example of the metadata input screen.

On the metadata input screen, the grade and subject estimated for the content are displayed as already-selected values for the user, and keyword candidates (feature words) related to the content are displayed.

In the GUI window, the file path on the file server 108 of the file to be registered is displayed in the file field 1000, and a thumbnail image of the file is displayed in the thumbnail field 1001.

The title field 1002 displays the title 302 of the content information 115 held in memory. The grade field 1003 displays the grade 304, and the subject field 1004 displays the subject 305. The title field 1002 is a text area in which the user can freely enter text. The grade field 1003 and the subject field 1004 are pull-down lists pre-populated with the values that correspond to the category specified by the school division 303 of the content information 115. For example, when the school division 303 is "junior high school", the grade field 1003 lists "1st grade", "2nd grade", and "3rd grade", and the subject field 1004 lists "Japanese", "mathematics", "English", "science", "social studies", "music", "art", "health and physical education", "technology and home economics", and "moral education". In the grade field 1003 and the subject field 1004, the entries matching the grade 304 and the subject 305 of the content information 115 are displayed as selected.

As described above, the data memory 105 stores the information of all curricula; for example, the topics 204 of the curricula 113 are stored in list order. The display category field 1005 displays the topic of the selected curriculum. The display category field 1005 is a pull-down list from which the user can select the topic of a desired curriculum. When the user changes the selection in this pull-down list, an instruction to start processing is passed to the waiting program (see step 913 in FIG. 9). The initial selection when the metadata input screen is displayed is the topic 204 at the head of the list of curricula 113.

The related keyword field 1006 displays, each with a check box, the keywords 209 of those feature words 114 in the data memory 105 that are associated with the curriculum whose topic is shown in the display category field 1005. When the importance flag 210 of a feature word 114 is true, its keyword is emphasized, for example by displaying it in bold. Emphasis may also be given by other means, such as changing the color or the character size. This makes it easier for the user to find, among the displayed keywords, those highly relevant to the content to be registered.

The additional keyword field 1007 is a text area in which the user can type, directly from the keyboard, any keywords to be added beyond the candidates displayed in the related keyword field. Multiple keywords are entered separated by spaces.

While checking the content to be registered in the thumbnail 1001, the user enters the title of the content and reviews and selects the estimated grade, subject, and related keywords.

In FIG. 10, when the user presses the registration button 1008 after filling in the fields, the values entered or selected on the metadata input screen are stored in the content information 115. The file field 1000 is stored in the file 301, the title field 1002 in the title 302, the grade field 1003 in the grade 304, and the subject field 1004 in the subject 305 of the content information 115. Words whose check boxes are checked in the related keyword field 1006 are stored as assigned keywords 308, and the character string in the additional keyword field 1007 is split at spaces and each resulting word is also stored as an assigned keyword 308. The content information 115 holding the content metadata is then stored in the content information DB 102. In this way, the user can register metadata for the content from the metadata input screen.
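How the assigned keywords 308 are assembled on registration can be sketched as follows. The function name and the removal of duplicates are assumptions. Python's `str.split()` with no argument splits on any Unicode whitespace, so it also handles the full-width space (U+3000) that Japanese input methods often produce.

```python
from typing import List


def collect_assigned_keywords(checked_keywords: List[str], additional_text: str) -> List[str]:
    """Combine checked related keywords with the space-separated additional keywords."""
    assigned = list(checked_keywords)
    for word in additional_text.split():  # splits on ASCII and full-width spaces alike
        if word not in assigned:          # assumption: duplicates are stored only once
            assigned.append(word)
    return assigned


print(collect_assigned_keywords(["火山"], "溶岩　火山灰 火山"))
```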

When the user enters or changes a value in the title field 1002, the grade field 1003, or the subject field 1004, the content registration program receives a processing instruction from the screen and starts processing. The values of the title field 1002, the grade field 1003, and the subject field 1004 are stored in the title 302, the grade 304, and the subject 305 of the content information 115. In this case, the value of the display category field has not been changed in step 912, so the processing of the flowchart of FIG. 9 ends. Then, in step 604 of the flowchart of FIG. 6, it is determined that the user has entered or changed the title, grade, or subject, and processing returns to step 601, where only the curricula matching the changed grade and subject are obtained and the content metadata is recalculated.

In the metadata estimation processing of step 602 as well, keywords are extracted from the title character string entered by the user and added to the content keywords. These added content keywords are used to estimate the corresponding curriculum.

For example, when the grade is changed from "1st grade" to "3rd grade" in FIG. 10, the content information 115 in memory is updated so that the title 302 is "Volcanic eruption", the school division 303 is "junior high school", the grade 304 is "3rd grade", and the subject 305 is "science". Only the curricula 113 classified under school division "junior high school", grade "3rd grade", and subject "science" are then obtained from the curriculum DB 100, and the scores of the obtained curricula are recalculated for the content to be registered.
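The narrowing of candidate curricula before rescoring can be illustrated as follows. The in-memory list of dictionaries stands in for the curriculum DB 100, and the field names and function name are hypothetical.

```python
from typing import Dict, List


def select_curricula(curricula: List[Dict[str, str]],
                     school_division: str, grade: str, subject: str) -> List[Dict[str, str]]:
    """Keep only the curricula that match the given school division, grade, and subject."""
    return [
        c for c in curricula
        if c["school_division"] == school_division
        and c["grade"] == grade
        and c["subject"] == subject
    ]


curricula = [
    {"id": "1", "school_division": "junior high school", "grade": "1st grade", "subject": "science"},
    {"id": "2", "school_division": "junior high school", "grade": "3rd grade", "subject": "science"},
    {"id": "3", "school_division": "junior high school", "grade": "3rd grade", "subject": "mathematics"},
]
selected = select_curricula(curricula, "junior high school", "3rd grade", "science")
print([c["id"] for c in selected])
```

Only the selected curricula would then be passed to the scoring processing of FIG. 8.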

FIG. 11 shows the metadata input screen after the scores are recalculated. When the user changes the grade field 1100 from "1st grade" to "3rd grade", the display category 1101 and the related keyword field 1102 are updated. When the user then presses the registration button, the keywords displayed in the related keyword (related feature word) field 1102 that the user has checked are registered as metadata of the content.

<Summary>
(1) The foregoing has described a content management apparatus (content management system) for assigning metadata to content to be registered. Based on text information obtainable from the content, the content management apparatus classifies the content against the documents (content descriptions) of curricula (content management information) that are classified in advance into hierarchical categories such as grade and subject, and suggests the keywords (feature words) associated with the category or curriculum. In particular, educational content created with text and images in line with a curriculum often contains words that are specific to, or frequent in, each topic, because it explains the matters the curriculum directs to be taught. Exploiting this property makes it easier to classify content by curriculum topic. In addition, characteristic words (feature words) are defined in advance for each topic and associated with it, so that they can be suggested as keywords related to the content. Preparing the keywords in advance as a dictionary makes it possible to cover words that do not appear in the content text but are important for describing the topic, general terms that denote the topic as a whole, and spelling variants. The user only needs to select the suggested keywords with check boxes. In this way, metadata useful for search and for users' understanding of the content is estimated from the text information in the content, and reflecting the estimate on the registration screen simplifies content registration.

On the metadata registration screen of FIG. 10, entering and confirming the metadata items from the top, in the order of title, grade, and subject, progressively narrows the candidate curricula. This raises the accuracy of estimating the curriculum that corresponds to the content, so that more appropriate keywords can be suggested when the user selects related keywords. Using this feature extraction apparatus thus simplifies the work of assigning metadata that is useful for search and for users who consult the content information to understand the content.

This embodiment has described processing for registering a content file. The target need not be a content file; it may be, for example, a Web page published on the Web. In that case, the URL of the Web page to be registered is given as input, and in the content information extraction processing of step 600 in FIG. 6, the text in the Web page is obtained and stored in the content text 306. The same processing as for text obtained from a file then becomes possible.

This embodiment has also described processing for registering content files that contain text information, such as Office files and Flash files on the file server 108. Registration is not limited to such files; files that contain no text information, such as images, may also be registered. In that case, a text area for entering, for example, usage examples of the image file is added to the metadata input screen, and the user fills it in. The usage-example text entered by the user is subjected to morphological analysis, and keywords are extracted by the processing of steps 703 and 704 in FIG. 7 and used as content keywords. Alternatively, character strings may be extracted from the image by OCR (Optical Character Recognition) and used as content keywords. This allows each curriculum to be scored in the same way as for a content file that has text information. In addition, each time the user enters characters in the text area, the matching curriculum 113 is recalculated and the related keyword field 1006 is dynamically updated.

(2) This embodiment has described, for the processing of step 601 in FIG. 6, classifying content against a curriculum for the education field that is organized in advance into hierarchical categories such as grade and subject. The approach is not limited to educational curricula; call center manuals, industrial product manuals, and the like may be used instead. For example, when a call center manual is used, providing as a dictionary documents in which the manual has been classified in advance by type of support task makes it easy to identify the part of the manual that corresponds to an inquiry or complaint, and also supports statistical processing per category. Furthermore, associating past handling cases with each support task in advance allows them to be suggested to the call center's users. Likewise, when a product manual is used, classifying customer complaint information by type of defect makes it possible to identify the part of the manual that corresponds to a defect and to compile statistics per defect.

Furthermore, the invention can also be applied, for example, to processing in which an individual who wants to sell his or her own car as a used car registers it in a database. In this case, content management information for managing the car's model, model year, color, condition, and so on, together with the feature words associated with it, is prepared in advance in place of the curriculum information described in this embodiment. The invention can also be applied when an individual wants to register content about a particular hobby in a database and share the information with others. In this case, content management information made up of various hobby categories, together with the feature words associated with it, is prepared in advance in place of the curriculum information.

Thus, it will be understood that the present invention applies not only to metadata presentation processing when registering educational content but also to metadata presentation processing when registering various other types of content.

(3) This embodiment has described, for the processing of steps 903 and 914 in FIG. 9, acquiring the feature words of each curriculum subject from the feature word DB, in which characteristic words for each subject are prepared in advance. Instead of acquiring feature words from the feature word DB, the feature words may be generated automatically from the curriculum documents. This can be realized by performing morphological analysis on the curriculum documents classified by subject and applying a frequency-based method such as TF-IDF, or a method based on word co-occurrence frequency such as mutual information or the chi-square test.
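As one possible illustration of the TF-IDF option, the following Python sketch ranks the words of each subject's curriculum text and keeps the highest-scoring ones as feature words. It assumes morphological analysis has already produced a word list per subject, and it uses the common TF-IDF form (relative term frequency times the logarithm of the inverse document frequency); the function name and the `top_n` parameter are hypothetical.

```python
import math
from collections import Counter


def generate_feature_words(subject_tokens, top_n=3):
    """Rank each subject's words by TF-IDF and keep up to top_n of them.

    subject_tokens maps a subject name to the list of words obtained from
    that subject's curriculum text (tokenization is outside this sketch).
    """
    n_subjects = len(subject_tokens)

    # Document frequency: number of subjects whose text contains the word.
    doc_freq = Counter()
    for tokens in subject_tokens.values():
        doc_freq.update(set(tokens))

    feature_words = {}
    for subject, tokens in subject_tokens.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores = {
            word: (count / total) * math.log(n_subjects / doc_freq[word])
            for word, count in counts.items()
        }
        # Sort by descending score, then alphabetically for a stable order.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        # Words that occur in every subject score zero and are dropped.
        feature_words[subject] = [w for w in ranked if scores[w] > 0][:top_n]
    return feature_words
```

Words shared by every subject receive a score of zero and are excluded, so the generated feature words are those that distinguish one subject from the others. Unlike a hand-built dictionary, this approach cannot supply words that are absent from the curriculum text.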

(4) The present invention can also be realized by the program code of software that implements the functions of the embodiment. In this case, a storage medium on which the program code is recorded is provided to a system or apparatus, and a computer (or CPU or MPU) of that system or apparatus reads the program code stored on the storage medium. The program code read from the storage medium then itself realizes the functions of the embodiment described above, and the program code and the storage medium storing it constitute the present invention. Storage media for supplying such program code include, for example, a flexible disk, CD-ROM, DVD-ROM, hard disk, optical disk, magneto-optical disk, CD-R, magnetic tape, nonvolatile memory card, and ROM.

In addition, an OS (operating system) or the like running on the computer may perform part or all of the actual processing based on the instructions of the program code, and the functions of the embodiment described above may be realized by that processing. Furthermore, after the program code read from the storage medium has been written into memory on the computer, the CPU or the like of the computer may perform part or all of the actual processing based on the instructions of the program code, and the functions of the embodiment may be realized by that processing.

Furthermore, the program code of software that implements the functions of the embodiment may be distributed over a network and stored in storage means such as a hard disk or memory of a system or apparatus, or on a storage medium such as a CD-RW or CD-R, and the computer (or CPU or MPU) of that system or apparatus may read and execute the program code stored in the storage means or on the storage medium at the time of use.

Finally, it should be understood that the processes and techniques described here are not inherently tied to any particular apparatus and can be implemented by any suitable combination of components. Various types of general-purpose devices can also be used in accordance with the teachings described here. It may also prove beneficial to build a dedicated apparatus to carry out the method steps described here. Various inventions can be formed by appropriately combining the constituent elements disclosed in the embodiment. For example, some constituent elements may be removed from the full set shown in the embodiment, and constituent elements from different embodiments may be combined as appropriate. Although the present invention has been described with reference to specific examples, these are in all respects illustrative rather than restrictive. Those skilled in the art will appreciate that many combinations of hardware, software, and firmware are suitable for implementing the present invention. For example, the software described can be implemented in a wide range of programming or scripting languages, such as assembler, C/C++, Perl, Shell, PHP, and Java (registered trademark).

Furthermore, the control lines and information lines shown in the above embodiment are those considered necessary for explanation; not all control lines and information lines of an actual product are necessarily shown. All components may be connected to one another.

In addition, other implementations of the present invention will be apparent to those of ordinary skill in the art from consideration of the specification and embodiments disclosed here. The various aspects and/or components of the described embodiments can be used alone or in any combination in a computerized storage system that has a data management function. The specification and specific examples are merely exemplary, and the scope and spirit of the invention are indicated by the following claims.

100 ... Curriculum DB
101 ... Feature word DB
102 ... Content information DB
103 ... Central processing unit
104 ... Program memory
105 ... Data memory
106 ... Display device
107 ... Input device
108 ... File server
109 ... Registered content selection screen display unit
110 ... Content information extraction processing unit
111 ... Metadata estimation processing unit
112 ... Metadata input screen display unit
113 ... Curriculum
114 ... Feature word
115 ... Content information

Claims

A content management apparatus that provides metadata of content as support information when a user registers the content, the apparatus comprising:
a storage device that stores a plurality of types of content management information, prepared in advance and used to classify the content when the content is registered, and feature word information prepared in advance in association with each piece of content management information; and
a processor that, based on the content management information and the feature word information stored in the storage device, acquires metadata of the content to be registered and outputs it as the support information,
wherein the content management information includes a description regarding a content category, and
the processor executes:
processing of extracting keywords contained in the content to be registered;
processing of calculating the appearance frequency of the extracted keywords in the descriptions of the content management information and scoring the plurality of types of content management information based on the appearance frequency; and
processing of acquiring the feature word information associated with the content management information having the highest score value obtained by the scoring processing, and outputting that feature word information as the support information.

The content management apparatus according to claim 1,
wherein, in the scoring processing, when an extracted keyword appears in a plurality of pieces of content management information, the processor:
calculates a first evaluation value indicating whether the extracted keyword is characteristic and relatively important;
calculates a second evaluation value obtained by normalizing the appearance frequency of the extracted keyword in each piece of content management information; and
calculates the score value of each piece of content management information by multiplying the first evaluation value by the second evaluation value.

The content management apparatus according to claim 2,
wherein the processor:
calculates the first evaluation value by dividing the total number of pieces of content management information by the number of pieces of content management information in which the extracted keyword appears; and
calculates the second evaluation value by dividing the appearance frequency of the extracted keyword in each piece of content management information by the appearance frequency of the extracted keyword across all of the content management information.

The content management apparatus according to claim 1,
wherein, in the output processing, the processor generates and outputs a GUI that enables the user to select the feature word information associated with the content management information having the highest score value.

The content management apparatus according to claim 4,
wherein the processor further:
determines, for each of a plurality of feature words contained in the feature word information associated with the content management information having the highest score value, whether the feature word has an inclusion relationship with an extracted keyword, and executes processing of attaching, to feature word information that has the inclusion relationship, flag information indicating that relationship; and
in the output processing, generates the GUI so that feature words of feature word information to which the flag information is attached are output separately from feature words of feature word information without the flag information.

The content management apparatus according to claim 1,
wherein the content management information includes a plurality of types of hierarchical category information used when classifying content,
the plurality of types of hierarchical category information includes major category information for classifying content into a first category, middle category information for further subdividing content classified into the first category into a second category, and minor category information for further subdividing content classified into the second category into a third category, and
the processor narrows down, based on category information designated by the user, the content management information for which the appearance frequency of the extracted keywords is counted.

A content management system comprising:
the content management apparatus according to claim 1; and
at least one client terminal device having a display device and an input device,
wherein the client terminal device transmits content to be registered, designated via the input device, to the content management apparatus,
the content management apparatus transmits the support information to the client terminal device, and
the display device of the client terminal device displays the support information received from the content management apparatus.

A content management method for providing metadata of content as support information when a user registers the content, the method comprising:
a step in which a processor of a content management apparatus provides a storage device that stores a plurality of types of content management information, prepared in advance and used to classify the content when the content is registered, and feature word information prepared in advance in association with each piece of content management information; and
a step in which the processor, based on the content management information and the feature word information stored in the storage device, acquires metadata of the content to be registered as the support information and outputs it,
wherein the content management information includes a description regarding a content category, and
in the outputting step, the processor executes:
processing of extracting keywords contained in the content to be registered;
processing of calculating the appearance frequency of the extracted keywords in the descriptions of the content management information and scoring the plurality of types of content management information based on the appearance frequency; and
processing of acquiring the feature word information associated with the content management information having the highest score value obtained by the scoring processing, and outputting that feature word information as the support information.

The content management method according to claim 8,
wherein, in the scoring processing, when an extracted keyword appears in a plurality of pieces of content management information, the processor:
calculates a first evaluation value indicating whether the extracted keyword is characteristic and relatively important;
calculates a second evaluation value obtained by normalizing the appearance frequency of the extracted keyword in each piece of content management information; and
calculates the score value of each piece of content management information by multiplying the first evaluation value by the second evaluation value.

The content management method according to claim 9,
wherein the processor:
calculates the first evaluation value by dividing the total number of pieces of content management information by the number of pieces of content management information in which the extracted keyword appears; and
calculates the second evaluation value by dividing the appearance frequency of the extracted keyword in each piece of content management information by the appearance frequency of the extracted keyword across all of the content management information.

The content management method according to claim 8,
wherein, in the output processing, the processor generates and outputs a GUI that enables the user to select the feature word information associated with the content management information having the highest score value.

The content management method according to claim 11,
wherein the processor further:
determines, for each of a plurality of feature words contained in the feature word information associated with the content management information having the highest score value, whether the feature word has an inclusion relationship with an extracted keyword, and executes processing of attaching, to feature word information that has the inclusion relationship, flag information indicating that relationship; and
in the output processing, generates the GUI so that feature words of feature word information to which the flag information is attached are output separately from feature words of feature word information without the flag information.

The content management method according to claim 8,
wherein the content management information includes a plurality of types of hierarchical category information used when classifying content,
the plurality of types of hierarchical category information includes major category information for classifying content into a first category, middle category information for further subdividing content classified into the first category into a second category, and minor category information for further subdividing content classified into the second category into a third category, and
the method further comprises a step in which the processor narrows down, based on category information designated by the user, the content management information for which the appearance frequency of the extracted keywords is counted.

A program for causing a computer to execute the content management method according to claim 8.

A computer-readable storage medium storing a program for causing a computer to execute the content management method according to claim 8.