JP2008046738A

JP2008046738A - Life record creation system and its control method

Info

Publication number: JP2008046738A
Application number: JP2006219734A
Authority: JP
Inventors: Atsushi Chazono; 篤茶園
Original assignee: SoftBank Mobile Corp
Current assignee: SoftBank Corp
Priority date: 2006-08-11
Filing date: 2006-08-11
Publication date: 2008-02-28
Anticipated expiration: 2026-08-11
Also published as: JP4937671B2

Abstract

PROBLEM TO BE SOLVED: To create a Web page by directly inputting data from a site such as a trip destination, without requiring any knowledge of Web page creation.

SOLUTION: This life record creation system has a server that provides guidance to a user who wants to create life records such as a diary or trip records. The user transmits text, voice, and video to the server from a mobile phone according to the guidance provided by the server. The server determines the genre and outline of the information from the location information of the mobile phone, the transmitted information, and the like, and creates a Web page according to a predetermined format.

COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a system that supports the creation of a diary, a travel record, and a life record, and to a control method thereof.

It is widely practiced to create life records such as diaries and travel records, including video or audio, as electronic data and to publish them as web pages. To publish a life record such as a diary as a web page, the electronic data is described in HTML (hypertext markup language) and uploaded from the user's personal computer to a server on the web. Therefore, to publish video or audio captured at a travel destination on a web page, the user must return home, edit the video, audio, and explanatory text, and create a web page in HTML, which takes considerable effort and requires advanced skills and knowledge.
References related to the creation of web pages that contain video or audio information include:
JP 2003-173309 A
JP 2004-32129 A

The present invention provides a system, and a control method therefor, that allows a user who wishes to create a life-related record such as a diary or a travel record to create, with simple operations, a life record in a format that can be published as a web page.

The system according to the present invention includes a server that provides guidance, and a user transmits text, audio, and video to the server using a mobile phone (mobile terminal) according to the guidance provided by the server. The server determines the genre, outline, and so on of the record desired by the user from the location information of the mobile phone, the information transmitted by the user, and the like, and creates a web page according to a predetermined format.

In the system according to the present invention, video captured by the camera of a mobile phone, or audio input from its microphone, is transmitted to the server in accordance with instructions from the server, at the site that is the subject of the diary, at a travel destination, or the like. The server creates the necessary information from the transmitted video or audio and composes a web page. The user therefore needs no knowledge of web pages and can create a web page by entering data directly from a site such as a travel destination.

FIG. 1 is a diagram showing the functions of the server 100 and an outline of the mobile phone (mobile terminal) 110 according to the present invention. The server 100 provides guidance to a user who wishes to create a life record such as a diary or a travel record. Following the guidance provided by the server 100, the user transmits video, audio, and text to the server 100 using the camera, microphone, or keyboard of the mobile phone 110. The server 100 converts the transmitted audio into text data by speech recognition means, calculates the frequency of the words it contains, and determines the genre and other attributes of the information transmitted by the user. It also edits the transmitted video to create data for the web page.

A user who wishes to create a life record calls the server 100 from the mobile phone 110 and requests the start of recording and the creation of a web page. When a plurality of formats such as a diary and a travel journal are provided, the server 100 presents the available formats to the user and asks the user to select one. The template storage unit 101 stores guidance templates, each of which describes guidance instructing the user to transmit audio or video, together with commands that specify how the audio or video data transmitted by the user is to be shaped and processed.
The template reading unit 102 of the server 100 reads the guidance template for the format specified by the user from the template storage unit 101 and passes it to the template interpretation execution unit 103. The interpretation execution unit 103 takes out the commands described in the passed template one by one and executes them.

When the command taken from the template and interpreted is one that displays a guidance message on the display unit and asks the user for a key operation, the server 100 transmits the guidance message and a key input instruction to the mobile phone 110 via the communication control unit 104 and waits for the key information to be returned. The mobile phone 110 displays the guidance message received via the communication control unit 111 on the display unit 113 and prompts the user to operate the keys 114.
When the command requests audio input, the server 100 requests the mobile phone 110 to input audio via the communication control units 104 and 111. On receiving this request, the mobile phone 110 acquires audio from the microphone 116 through the audio input control unit 115 and transmits it to the server 100. When the command requests video input, the server 100 requests the mobile phone 110 to input video. The video input control unit 117 of the mobile phone 110 controls the camera 118 and transmits the obtained video data to the server 100.
When a command taken from the template instructs processing of the video data or audio data transmitted by the user, the template interpretation execution unit 103 passes the video data or audio data to the corresponding processing means of the shaping processing unit 105 and instructs it to perform the processing.
When all the commands described in the guidance template have been executed, the server 100 applies the data generated by the shaping processing unit 105 to the output format template, generates HTML data for the web page, and stores it in the data storage unit 106. The data stored in the data storage unit 106 is published as a web page by the Internet output unit 107 as appropriate.
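The command-by-command execution described above can be sketched as a small dispatch loop. This is only an illustration under stated assumptions: the command kinds (`prompt_key`, `get_audio`, `get_video`, `process`), the `Command` and `Session` classes, and the `fake_terminal` stand-in for the mobile phone are all hypothetical names, since the text does not give the template syntax at this point.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Command:
    """One entry of a guidance template (hypothetical representation)."""
    kind: str        # "prompt_key", "get_audio", "get_video", or "process"
    arg: str = ""    # guidance text, or the name of a processing step


@dataclass
class Session:
    """Holds what the terminal sent back and what processing produced."""
    received: dict = field(default_factory=dict)
    log: list = field(default_factory=list)


def run_template(commands, terminal: Callable[[str, str], str], processors: dict) -> Session:
    """Execute the template commands strictly in order, one at a time."""
    session = Session()
    for cmd in commands:
        if cmd.kind in ("prompt_key", "get_audio", "get_video"):
            # Send the request to the terminal and wait for its reply.
            session.received[cmd.kind] = terminal(cmd.kind, cmd.arg)
        elif cmd.kind == "process":
            # Hand the data received so far to the named processing step.
            session.received[cmd.arg] = processors[cmd.arg](session.received)
        else:
            raise ValueError(f"unknown command: {cmd.kind}")
        session.log.append(cmd.kind)
    return session


def fake_terminal(kind: str, text: str) -> str:
    """Stand-in for the mobile phone: returns canned replies."""
    return {"prompt_key": "1", "get_audio": "audio-bytes", "get_video": "video-bytes"}[kind]


template = [
    Command("prompt_key", "Press 1 for diary, 2 for travel record"),
    Command("get_audio", "Please speak after the tone"),
    Command("get_video", "Please start filming"),
    Command("process", "describe"),
]
processors = {"describe": lambda data: f"{len(data)} inputs received"}
result = run_template(template, fake_terminal, processors)
print(result.log)
print(result.received["describe"])
```

Keeping the terminal behind a plain callable means the same loop can be exercised without a phone connection, which is why a canned `fake_terminal` is enough here.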

FIG. 2 is a schematic diagram of the server 100 and shows, in particular, the functional configuration of the shaping processing unit 105. The function of each unit is as follows.
Communication management unit 104: exchanges data with the mobile phone (mobile terminal) 110 via the network 109.
Template interpretation execution unit 103: in response to a request from the terminal 110, reads a guidance template from the template storage unit (101 in FIG. 1; not shown in FIG. 2) and sequentially executes the commands described in the template.
User information acquisition unit 131: consists of a user identification unit 132, a time acquisition unit 133, and a position acquisition unit 134. The user identification unit 132 identifies the user from information on the requesting terminal 110, the user's login ID, or the like, and acquires from a database the user information to be used in the life record. The time acquisition unit 133 acquires the current time from the clock of the server 100 or the terminal 110. The position acquisition unit 134 acquires the position information of the terminal 110 from information obtained from the GPS function of the terminal 110, indoor or outdoor beacons, or the like. A configuration is also possible in which the time acquired by the time acquisition unit 133 is converted to the local time based on the position information.

Call information recording unit 135: operates when the template interpretation execution unit 103 executes a command in the guidance template that acquires audio or video, and consists of an audio recording unit 136 and a video recording unit 137. When a command to acquire audio or video is executed, the server 100 instructs the terminal 110 to transmit audio from the microphone or video captured by the camera, and instructs the analog call record storage unit 138 to record the audio data or video data that is transmitted.
Analog call record storage unit 138: records the audio data and video data transmitted from the terminal 110, and consists of an analog audio data recording unit 139 and an analog video data recording unit 140.
Call information encoding unit 141: digitizes the audio data and video data that were transmitted from the terminal 110 and recorded in the analog call record storage unit 138, and consists of an audio encoding unit 142 that digitizes the audio data and a video encoding unit 143 that digitizes the video data.

Digital call record storage unit 145: records the audio data and video data digitized by the call information encoding unit 141, and consists of a digital audio data recording unit 146 and a digital video data recording unit 147.
Category data unit 148: stores the data that the category classification unit 154 uses when estimating the category of a text, and consists of category definition data 149 and synonym data 150. The synonym data 150 is a database of synonyms, used to convert words with similar meanings into a single unified word for purposes such as appearance frequency analysis. The category definition data 149 is a database that stores information on the words likely to be used in each field, and is used to determine the category to which a text belongs from the distribution or frequency of the words it contains.

Media analysis unit 151: acquires the data needed to create a life record from the digital audio data and digital video data. The digital audio data is converted into text data by the speech recognition unit 152, and the text analysis unit 153 performs analyses such as morphological analysis and semantic analysis. Through this analysis, important words such as nouns and verbs are extracted from the text. The category classification unit 154 compares the appearance frequency and other properties of the extracted words against the word information stored in the category data unit 148 and determines the category to which the text belongs.
The media decomposing unit 155 separates the video and the audio contained in the video data and passes the audio data to the speech recognition unit 152 for use in determining the category. The separated video data is passed to the video analysis unit 156. The video analysis unit 156 detects the amount of change in the video and divides the video by treating points where the change is large as scene changes. The image extraction unit 157 extracts characteristic points of the video, based on the beginning of each scene, the duration of the video, or the like, and creates still images.

Output format template 162: stores templates that define the format for publishing a diary or travel record as a web page, or templates for printing them.
Output unit 158: consists of an output format analysis unit 160 and an output shaping unit 161. The output format analysis unit 160 reads the corresponding template from those stored in the output format template 162 according to the format requested by the user (for example, diary or travel record, web publication or printing) and analyzes it. For example, when a travel record is to be published as a web page, the title, summary, still images, and so on created by the shaping processing unit 105 are, based on this analysis, embedded into data described in HTML and converted into a format that can be published as a web page.
The converted data is stored in the output data storage unit 163 and published.

FIG. 3 shows the process in which a user accesses the server 100 of the system 301 from the mobile phone 300, registers a life record such as a travel record, and publishes it as a web page.
Step 302: The user accesses the server 100 of the system 301 from a mobile phone or videophone and requests the creation of a life record. The server 100 authenticates the user based on the caller's telephone number and the user code and password entered by the user. If life records can be created in several formats, such as diary format or travel journal format, or in several modes, such as with or without video and with or without audio, the server presents the available formats to the user and asks the user to select the desired one.
Step 303: The guidance template corresponding to the web page format selected by the user is read from the template storage unit 101 and passed to the template interpretation execution unit 103. The template interpretation execution unit 103 executes the commands described in the guidance template one by one. The user information is acquired by the user identification unit 132, the current time by the time acquisition unit 133, and the user's position information by the position acquisition unit 134, and each is stored. In addition, audio data is acquired by the audio recording unit 136 and recorded in the analog audio data recording unit 139, and video data is acquired by the video recording unit 137 and recorded in the analog video data recording unit 140.

Step 304: When all the commands described in the guidance template have been executed, the server 100 notifies the user that the communication has ended. The user disconnects the telephone connection.
Step 305: The server 100 converts the audio data and video data recorded in the analog audio data recording unit 139 and the analog video data recording unit 140 into digital data that can be published as a web page. The audio data is digitized by the audio encoding unit 142 and recorded in the digital audio data recording unit 146, and the video data is digitized by the video encoding unit 143 and recorded in the digital video data recording unit 147.
Step 306: The media analysis unit 151 analyzes the digitized audio data and video data. The audio data is converted into text data, from which the genre, summary, and so on of the record to be published are created. Still images and the like that characterize the video are created from the video data.
Step 307: The summary, still images, video data, and so on created in step 306 are merged into the web page template to create data that can be published as a web page.
Step 308: The created web page data is registered on the web page server and published.
In the example of FIG. 3, audio and video are acquired from the mobile phone in step 303, the telephone connection is disconnected in step 304, and the acquired audio and video are then processed in steps 305 to 307 to create the data for the web page. However, if the server 100 can process the acquired audio and video fast enough, the telephone connection may instead be disconnected when the processing of step 307 is completed.

310 in FIG. 3 is an example of a published web page. 311 is the time and position acquired by the time acquisition unit 133 and the position acquisition unit 134, together with the title and heading information obtained by the media analysis unit 151. 312 is a summary created by the text analysis unit 153 from the user's audio data. 313 is a still image created by the image extraction unit 157 from the video data captured by the user with the mobile phone camera, and it links to the video data. 314 is a list of the words that appear most often in the text created from the user's audio data by the text analysis unit 153; it is used, for example, for web searches of the home page. 315 is a link to the audio file.

FIG. 4 is a diagram showing the flow of processing in the shaping processing unit 105.
Step 400: The template interpretation execution unit 103 reads the guidance template corresponding to the format specified by the user from the template storage unit 101 and starts executing the commands described in the template.
Step 401: The user identification unit 132, time acquisition unit 133, and position acquisition unit 134 of the user information acquisition unit 131 acquire the user information, the time information, and the position information of the terminal 110, respectively, and store them in a predetermined storage location.
Step 402: Having executed the command that acquires audio data and video data, the template interpretation execution unit 103 asks the user to transmit audio and video and instructs the call information recording unit 135 to acquire the audio data and video data transmitted by the user. The call information recording unit 135 receives the audio data and video data transmitted from the mobile phone through the audio recording unit 136 and the video recording unit 137, respectively, and records them in the analog audio data recording unit 139 and the analog video data recording unit 140 of the analog call record storage unit 138. The recording operation ends when the user signals the end of the audio and video.
Step 403: When acquisition of the audio data and video data is finished, the template interpretation execution unit 103 instructs the call information encoding unit 141 to digitize the audio data and video data recorded in the analog call record storage unit 138. The audio encoding unit 142 of the call information encoding unit 141 converts the audio data recorded in the analog audio data recording unit 139 into digital data in a format that can be published on an Internet web page. Likewise, the video encoding unit 143 converts the video data recorded in the analog video data recording unit 140 into digital data in a format that can be published on an Internet web page.

Step 404: The media analysis unit 151 uses the speech recognition unit 152 to convert the digitized audio data, and the digital audio data extracted from the video data, into text data by speech recognition.
Step 405: The text analysis unit 153 of the media analysis unit 151 analyzes the text data generated by the speech recognition unit 152 using morphological analysis techniques such as grammatical analysis or semantic analysis, and extracts the words that make up the text.
Step 406: The category data unit 148 contains the synonym data 150, a database of words whose meanings are the same or very close. The text analysis unit 153 refers to the synonym data 150, converts each word extracted from the text data into its unified word, and calculates the appearance frequency of each word.
Step 407: A portion of the text data that contains many of the high-frequency words calculated in step 406 is extracted and used as the summary. This embodiment creates the summary based on word appearance frequency. However, many methods of creating a summary from long text data have been proposed, and the method to adopt can be chosen as appropriate.
Step 408: The category data unit 148 contains the category definition data 149, which consists of categories such as "travel" and "Europe" and the words belonging to each category. The text analysis unit 153 extracts a predetermined number of categories that contain many of the words extracted from the text data and uses them as the categories of the web page to be created. The category data unit 148 shown in FIG. 2 consists of the synonym database 150, which unifies, for example, "hotel", "inn", and "cottage" into "hotel", and the category definition database 149, which defines "travel" by words such as "hotel", "airplane", and "sightseeing". The two databases can also be integrated into a single database. FIG. 5-1 shows an example in which synonyms and categories are defined at the same time. In that figure, for example, "travel", the word that represents synonyms such as "overseas travel", "family travel", and "honeymoon", is used directly as a category.
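Steps 406 to 408 can be illustrated with a short sketch. It assumes the text has already been split into sentences of content words (the result of the morphological analysis in step 405), and it uses toy synonym and category tables modelled on the "hotel" and "travel" examples in the text. The function names, the "food" category, and the choice of a single sentence as the summary are assumptions made for illustration only; the description itself leaves the summarization method open.

```python
from collections import Counter

# Toy data modelled on the examples in the text; real databases would be far larger.
SYNONYMS = {"inn": "hotel", "cottage": "hotel"}
CATEGORIES = {
    "travel": {"hotel", "airplane", "sightseeing"},
    "food": {"restaurant", "dinner", "lunch"},   # hypothetical second category
}


def normalize(words):
    """Step 406: map each word to its unified form."""
    return [SYNONYMS.get(w, w) for w in words]


def word_frequencies(sentences):
    """Step 406: count unified words over the whole text."""
    return Counter(w for s in sentences for w in normalize(s))


def summarize(sentences, freq, top_n=2):
    """Step 407 (assumed variant): pick the sentence richest in frequent words."""
    top = {w for w, _ in freq.most_common(top_n)}
    return max(sentences, key=lambda s: sum(w in top for w in normalize(s)))


def categorize(freq, maxnum=1):
    """Step 408: rank categories by how often their words occur in the text."""
    scores = {c: sum(freq[w] for w in ws) for c, ws in CATEGORIES.items()}
    ranked = sorted((c for c in scores if scores[c] > 0), key=lambda c: -scores[c])
    return ranked[:maxnum]


sentences = [
    ["airplane", "paris"],
    ["inn", "hotel", "airplane"],
    ["cottage", "dinner"],
]
freq = word_frequencies(sentences)
print(freq.most_common(2))
print(summarize(sentences, freq))
print(categorize(freq, maxnum=2))
```

Because "inn" and "cottage" are folded into "hotel" before counting, "hotel" becomes the most frequent word even though it is spoken only once, which is the effect the synonym data is meant to have.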

Step 409: The media decomposing unit 155 extracts the audio portion from the digitized video data, passes it to the speech recognition unit 152 to request the processing of step 404 onward, and analyzes the video data. The analysis flow is shown in FIG. 5-2(a). This processing divides the video data into scenes by treating places where the image changes greatly as scene changes.

The flow of FIG. 5-2(a) is outlined below.
In step 521, the first pointer is set at the head of the digital video data, and in step 522 the video is advanced by one frame. Step 523 determines whether the video data has ended. If it has, the flow proceeds to step 526 and the processing ends. If the video data continues, the flow proceeds to step 524. In step 524, the amount of change between the new frame and the previous frame is calculated, and it is determined whether the amount of change is equal to or greater than a predetermined amount. If the amount of change is smaller than the predetermined amount, the new frame and the previous frame are judged to belong to the same continuous scene, and the flow returns to step 522. If the amount of change is equal to or greater than the predetermined amount, the flow proceeds to step 525. Because the current frame is the head of a new scene, step 525 increments the pointer number by 1 and sets that pointer at the position of the current frame. The flow then returns to step 522 and processing continues.
FIG. 5-2(b) shows the relationship between the video data and the pointers. Pointer 1 is set at the head of the video data 527, and pointer 2, pointer 3, ..., pointer n are set in sequence at the portions of the video where the change is large. The video between one pointer and the next is regarded as one scene.
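The pointer-setting flow of FIG. 5-2(a) can be sketched as follows. The description does not say how the amount of change between frames is measured, so the mean absolute pixel difference used here is an assumption, as are the function names and the representation of a frame as a flat list of pixel values.

```python
def frame_change(prev, curr):
    """Mean absolute pixel difference between two equally sized frames (assumed metric)."""
    return sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr)


def find_scene_pointers(frames, threshold):
    """Return the frame indices at which scenes start (pointer 1, 2, ..., n)."""
    if not frames:
        return []
    pointers = [0]                      # step 521: first pointer at the head
    for i in range(1, len(frames)):     # steps 522-523: advance until the video ends
        # step 524: compare the current frame with the previous one
        if frame_change(frames[i - 1], frames[i]) >= threshold:
            pointers.append(i)          # step 525: next pointer at the current frame
    return pointers


def split_scenes(frames, pointers):
    """Cut the frame list into scenes, one per pointer."""
    bounds = pointers + [len(frames)]
    return [frames[start:end] for start, end in zip(bounds, bounds[1:])]


# Five tiny 4-pixel "frames": two dark, two bright, one dark again.
frames = [
    [10, 10, 10, 10],
    [11, 10, 10, 11],
    [200, 200, 190, 210],
    [201, 199, 190, 210],
    [0, 0, 5, 0],
]
pointers = find_scene_pointers(frames, threshold=50)
scenes = split_scenes(frames, pointers)
print(pointers)
print([len(scene) for scene in scenes])
```

The first frame of each returned scene is also the natural candidate for a representative still image, since it sits exactly at a pointer position.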

The description now returns to the processing flow of the shaping processing unit 105 in FIG. 4.
Step 410: The video data is divided into the scenes determined in step 409, and an individual file is generated for each scene.
Step 411: The opening frame of each scene is extracted as a still image and used as the representative image of the scene.
Step 412: The summary, categories, per-scene video files, and still images created or determined in steps 407, 408, 410, and 411, together with the user's digital audio file, are merged into the web page template, which describes the layout of the web page, to create the web page.
Step 413: The web page data created in step 412, which is described in HTML, is registered on the web page server, and the processing ends.
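As a rough illustration of step 412, the sketch below merges a summary, keywords, per-scene files, and an audio link into an HTML page using Python's standard `string.Template`. The page layout, the placeholder names, and the file names are all hypothetical; the actual output format templates are not reproduced in this part of the description.

```python
from html import escape
from string import Template

# Hypothetical output format template; placeholders start with "$".
PAGE_TEMPLATE = Template("""<html>
<head><title>$title</title></head>
<body>
<h1>$title</h1>
<p class="meta">$when, $where</p>
<p class="summary">$summary</p>
<ul class="scenes">
$scenes
</ul>
<p class="keywords">$keywords</p>
<a href="$audio">Audio</a>
</body>
</html>
""")


def render_page(title, when, where, summary, scenes, keywords, audio):
    """Fill the template; scenes is a list of (still_image, video_file) pairs."""
    scene_items = "\n".join(
        f'<li><a href="{escape(video, quote=True)}">'
        f'<img src="{escape(still, quote=True)}" alt="scene {i}"></a></li>'
        for i, (still, video) in enumerate(scenes, start=1)
    )
    return PAGE_TEMPLATE.substitute(
        title=escape(title), when=escape(when), where=escape(where),
        summary=escape(summary), scenes=scene_items,
        keywords=escape(", ".join(keywords)), audio=escape(audio, quote=True),
    )


page = render_page(
    title="My first trip <abroad>",
    when="2006-08-11",
    where="Paris",
    summary="We flew & stayed at a hotel.",
    scenes=[("scene1.jpg", "scene1.mp4")],
    keywords=["hotel", "airplane"],
    audio="voice.wav",
)
print(page)
```

Escaping every recognized string before substitution matters here because the title and summary come from speech recognition output and cannot be assumed to be safe HTML.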

FIG. 6 is an example of the dialogue with the user that takes place when the template interpretation execution unit 103 executes a guidance template, and FIGS. 7-1 and 7-2 are examples of guidance templates executed by the template interpretation execution unit 103. The guidance template is described below with reference to the flow of FIG. 6.

Step 600: When the user accesses the server 100 and a guidance template is determined, the template interpretation execution unit 103 reads the determined guidance template from the template storage unit and starts executing it. In step 600, the user is asked to input the title of the record.
(A) Declares the start of the template, and allocates and initializes variables.
(B) A command that acquires the audio and video for the title. Using a voice dialogue (dialogue), the audio is acquired into the file "title_audio" and the video into the file "title_video". Input ends when the "#" key is pressed.
As guidance to the user, the message "Please say the title after the tone. Press the # button when you are finished." is played using the text-to-speech function (TTS).

Step 601: The user shoots the subject with the camera of the mobile phone while saying the title “My first overseas trip”, and presses the “#” key when finished. The server collects the time, the position information, and so on, and analyzes the title input by the user.
(C) The acquired audio data is recognized using the speech-to-text function (STT), and the result is stored in the text file “user_title”.
(D) Acquires the local-time (local) time zone into the variable “user_timel”. Where necessary, processing such as +09:00 is applied to obtain Japan time or local time.
(E) Acquires the position information of the mobile phone from GPS, a base station, or the like into the variable “user_place”. The geodetic system and the coordinate system can be specified, so that the position can be acquired as, for example, latitude and longitude. Where necessary, the acquired latitude and longitude are converted to a country or a city (city); the level of conversion can be specified.
(F) Determines the category (user_category) into which the audio data is classified from the words contained in the acquired audio data. The data may be classified into more than one category, up to a maximum of “maxnum”.
(G) Calculates the frequency of the words contained in the acquired audio data, and acquires the most frequent words (user_word), up to the maximum number (maxnum).
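Commands (F) and (G) can be illustrated with a short Python sketch. It assumes the recognized text has already been split into words, and the keyword-to-category table `CATEGORY_TABLE` and the function names are hypothetical; the table actually used by the system is not reproduced here.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical keyword-to-category table, for illustration only.
CATEGORY_TABLE: Dict[str, str] = {
    "trip": "travel",
    "hotel": "travel",
    "paris": "travel",
    "dinner": "food",
    "wine": "food",
}


def frequent_words(words: List[str], maxnum: int) -> List[str]:
    """Command (G): the most frequent words, at most maxnum of them."""
    counts = Counter(w.lower() for w in words)
    return [word for word, _ in counts.most_common(maxnum)]


def categories(words: List[str], maxnum: int) -> List[str]:
    """Command (F): categories matched by the words, most matches first, at most maxnum."""
    hits = Counter(
        CATEGORY_TABLE[w.lower()] for w in words if w.lower() in CATEGORY_TABLE
    )
    return [category for category, _ in hits.most_common(maxnum)]


words = "first trip to paris and the hotel in paris had great wine".split()
user_word = frequent_words(words, maxnum=1)
user_category = categories(words, maxnum=2)
print(user_word)      # ['paris']
print(user_category)  # ['travel', 'food']
```

For Japanese speech, a morphological analyzer would be needed to split the recognized text into words before counting; that step is omitted here.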

Step 602: In this step, the user is asked to send the body of the record.
(H) A command for acquiring the audio and video of the body. Using the voice dialogue method (dialogue), the audio data is acquired into the file “message_audio” and the video data into the file “message_video”. Input ends when the “#” key is pressed.
As guidance to the user, the message “Please say the body text after the tone. Press the # button when finished.” is played using the speech synthesis function (TTS).

Step 603: The user shoots the subject with the camera of the mobile phone while saying the body text “I am in Paris ...”, and presses the “#” key when finished. The server analyzes the body text input by the user.
(I) The acquired audio is recognized using the speech-to-text function (STT), and the result is stored in the text file “user_message”.
(J) The frequent words “user_word” are extracted from the text file “user_message”, and the summary sentence “user_summary” is created with reference to them.
(K) The video file “message_video” is analyzed, the portions with a large amount of change, that is, the scene switching points, are detected, and their times (scene_time) are acquired.
(L) The audio file “message_audio” is divided in accordance with the detected scenes.
(M) The video file “message_video” is divided in accordance with the detected scenes.
(N) A still image is created from the start of the title video “title_video” created in step 600 and is used as the title still image “title_image”.
(O) A still image is created from the start of each video file obtained by dividing the body video into scenes (imessage_div).
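Commands (K) to (M) can be sketched as follows. The patent does not state how the "amount of change" is measured, so this example assumes a mean absolute pixel difference between consecutive frames compared against a threshold; the frame representation, the threshold value, and the function names are all hypothetical.

```python
from typing import List, Sequence


def scene_times(frames: Sequence[Sequence[int]], fps: float, threshold: float) -> List[float]:
    """Command (K), sketched: times (seconds) where the frame content changes sharply."""
    times = []
    for i in range(1, len(frames)):
        prev, cur = frames[i - 1], frames[i]
        # Mean absolute pixel difference between consecutive frames (assumed measure).
        diff = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        if diff > threshold:
            times.append(i / fps)
    return times


def split_by_times(samples: Sequence[int], rate: int, times: List[float]) -> List[Sequence[int]]:
    """Commands (L)/(M), sketched: cut a sample sequence at the detected scene times."""
    cuts = [0] + [int(t * rate) for t in times] + [len(samples)]
    return [samples[a:b] for a, b in zip(cuts, cuts[1:])]


# Three dark frames followed by three bright frames, at 2 frames per second.
frames = [[10, 10, 10, 10]] * 3 + [[200, 200, 200, 200]] * 3
times = scene_times(frames, fps=2.0, threshold=50.0)
print(times)  # [1.5]

# Ten audio samples at 4 samples per second, cut at the detected scene time.
parts = split_by_times(list(range(10)), rate=4, times=times)
print([len(p) for p in parts])  # [6, 4]
```

Using the same `scene_time` values for both the audio and the video keeps the divided files aligned scene by scene.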

Step 604: The user is asked to confirm whether or not to publish the created record.
(P) To confirm whether the created record is to be made public or kept private, the voice message “One item has been received. Press * to make it public, or press the # button to keep it private.” is played.

Step 605: The user presses the “*” key to publish the record. On receiving the “*” key, the template interpretation execution unit 103 registers the created data in the web server.

Step 606: The user is notified that the created data has been registered in the web server, and is asked whether to continue or end the process.
(Q) To confirm whether the process is to continue or end, the voice message “The record has been set to public. Press * to continue, or press the # button to finish.” is played.

Step 607: The user signals the end with the “*” key.
(R) The template interpretation execution unit 103 performs termination processing such as closing files, and disconnects from the mobile phone.

The guidance template shown in FIGS. 7-1 and 7-2 is an example in which the commands are executed linearly, in order, from the first command to the last. More complex guidance becomes possible by adding control commands that change the order of processing, for example a conditional jump command that changes the execution order in response to the key pressed by the user, or a loop command that repeats execution a predetermined number of times.
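The difference between linear execution and execution with a control command can be illustrated with a small interpreter. This is a sketch only: the command names `say`, `get_key`, `jump_if_key`, and `end` are hypothetical and are not the commands of FIGS. 7-1 and 7-2.

```python
from typing import Iterator, List, Tuple

Command = Tuple[str, ...]


def run_template(commands: List[Command], keys: Iterator[str]) -> List[str]:
    """Execute commands in order; 'jump_if_key' may change the execution order."""
    log: List[str] = []
    pc = 0          # index of the next command to execute
    last_key = ""   # most recent key received from the user
    while pc < len(commands):
        op, *args = commands[pc]
        if op == "say":
            log.append(args[0])          # stands in for playing a TTS message
        elif op == "get_key":
            last_key = next(keys)        # stands in for waiting for a key press
        elif op == "jump_if_key":
            key, target = args[0], int(args[1])
            if last_key == key:
                pc = target              # conditional jump
                continue
        elif op == "end":
            break
        pc += 1
    return log


template = [
    ("say", "Press * to continue, or # to finish."),  # 0
    ("get_key",),                                      # 1
    ("jump_if_key", "*", "0"),                         # 2: loop back on "*"
    ("say", "Goodbye."),                               # 3
    ("end",),                                          # 4
]
log = run_template(template, iter(["*", "#"]))
print(len(log))  # 3: the prompt is played twice, then the closing message
```

Without the `jump_if_key` entry the same list runs straight through once, which corresponds to the linear case described above.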

In step 412 of FIG. 4, the output format analysis unit 160 reads the web page template from the output format template storage unit 152 and combines it with the information acquired by the user information acquisition unit 131, such as the position information and the time information, and with the images or text generated by the media analysis unit 151, to generate a web page that can be published. FIG. 8 shows an example of the generated web page.
Reference numeral 801 in FIG. 8 denotes the title portion, which displays the title converted by command (C) in FIG. 7-1.
Reference numeral 802 denotes the registration time, which displays the time information acquired by command (D) in FIG. 7-1, converted to Japan time (or local time).
Reference numeral 803 denotes the registration location, which displays the position information of the mobile phone acquired by command (E) in FIG. 7-1, converted to a country name or a city name.
Reference numeral 805 denotes the words that appear frequently in the message, which displays the high-frequency words extracted from the text file by command (J) in FIG. 7-2.
Reference numeral 806 denotes the summary of the message, which displays the summary sentence generated from the frequent words by command (J) in FIG. 7-2.
Reference numeral 807 denotes the link to the voice message, in which data indicating the storage location of the audio file generated by command (L) in FIG. 7-2 is stored together with a link tag.
Reference numeral 808 denotes the still image, which displays the still image generated by command (N) in FIG. 7-2.
Reference numeral 809 denotes the description of the still image, which displays the text of the title portion.
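The conversion of the registration time 802 to Japan time or local time can be sketched with Python's standard `datetime` module. The example assumes the server holds the time as a UTC timestamp; the function name `format_registration_time` and the +02:00 local offset are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def format_registration_time(utc_time: datetime, offset_hours: float, fmt: str = "%H:%M") -> str:
    """Shift a UTC timestamp to the given UTC offset and format it as hh:mm."""
    tz = timezone(timedelta(hours=offset_hours))
    return utc_time.astimezone(tz).strftime(fmt)


recorded = datetime(2006, 8, 11, 3, 30, tzinfo=timezone.utc)
print(format_registration_time(recorded, 9))  # 12:30, Japan time (+09:00)
print(format_registration_time(recorded, 2))  # 05:30, an assumed local offset of +02:00
```

A fixed offset is enough for Japan time; for local time at a travel destination, a real system would more likely look up the time zone from the acquired position so that daylight saving time is handled.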

FIG. 9 is an example of the HTML template used to generate the web page shown in FIG. 8. The output format analysis unit 160 analyzes this HTML template and substitutes, into each part enclosed by <#server# … #server#>, either data created by the media analysis unit 151 or, as link information, the location where a data file is stored. An outline of each part follows.

(A) The header portion of the web page.
(B) Corresponds to the title portion 801 in FIG. 8. The title data “user_title” is substituted into “<#server# print_text($user_title[$num]); #server#>”.
(C) Corresponds to the registration time 802 in FIG. 8. The time data “user_time”, converted to Japan time or local time, is substituted into “<#server# print_time($user_time, "hh:mm", "local"); #server#>”.
(D) Corresponds to the registration location 803 in FIG. 8. The position data “user_place”, converted to a country name or a city name, is substituted into “<#server# print_text($user_place[$num]); #server#>”.
(E) Corresponds to the category 804 in FIG. 8. The category data “user_category” is substituted into “<#server# print_texts($user_category[$num][], "、"); #server#>”.
(F) Corresponds to the frequent words in the message 805 in FIG. 8. The frequent-word data “user_word” is substituted into “<#server# print_texts($user_word[$num][], "、"); #server#>”.
(G) Corresponds to the still image 808 in FIG. 8. The location at which the image file “title_image” is stored is substituted into “<#server# insert_img($title_image[$num]); #server#>”.
(H) Corresponds to the description 809 of the still image in FIG. 8. The title data “user_title” is substituted into “<#server# print_text($user_title[$num]); #server#>”.
(I) Corresponds to the message summary 806 in FIG. 8. The summary data “user_summary” is substituted into “<#server# print_text($user_summary[$num]); #server#>”.
(J) Corresponds to the link to the audio 807 in FIG. 8. The location at which the audio file “message_audio” is stored is substituted into “<#server# print_filepath($message_audio[$num]); #server#>”.
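The substitution into the <#server# … #server#> parts can be sketched in Python with regular expressions. The tag syntax and the command names `print_text`, `print_texts`, `print_filepath`, and `insert_img` are taken from the description of FIG. 9; how each command renders its value, the HTML escaping, and the `render` function itself are assumptions, and `print_time` is left untouched because its argument form differs.

```python
import re
from html import escape
from typing import Any, Dict

# One <#server# ... #server#> tag, capturing the command inside it.
TAG = re.compile(r"<#server#\s*(.*?)\s*#server#>", re.S)
# A command of the form name($var[$num]); or name($var[$num][], "sep");
CALL = re.compile(r'(\w+)\(\$(\w+)\[\$num\](?:\[\])?(?:,\s*"([^"]*)")?\);')


def render(template: str, data: Dict[str, Any], num: int) -> str:
    """Replace each supported tag with the data for record number num."""

    def substitute(match):
        call = CALL.fullmatch(match.group(1))
        if call is None:
            return match.group(0)  # leave unsupported commands untouched
        func, var, sep = call.groups()
        value = data[var][num]
        if func == "print_texts":
            return escape((sep or "").join(value))
        if func == "insert_img":
            # Assumed rendering: an <img> element pointing at the stored file.
            return '<img src="%s">' % escape(str(value), quote=True)
        if func in ("print_text", "print_filepath"):
            return escape(str(value))
        return match.group(0)

    return TAG.sub(substitute, template)


template = (
    "<h1><#server# print_text($user_title[$num]); #server#></h1>\n"
    '<p><#server# print_texts($user_word[$num][], ", "); #server#></p>'
)
data = {
    "user_title": ["My first overseas trip"],
    "user_word": [["Paris", "hotel"]],
}
html = render(template, data, num=0)
print(html)
```

Escaping the substituted text matters here because the values come from speech recognition of user input and are written directly into HTML.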

In the above example, data for the web is generated by substituting text or images into predetermined locations of the HTML template. It is also possible to change the number of images displayed on the web page according to the number of scenes into which the video data was divided in step 410 of FIG. 4. In this embodiment the user inputs audio and video, but a configuration in which the user inputs text or still images and the web page is created from them is also possible. Likewise, the output template in this embodiment is an HTML template for creating a web page, but XML or a script for printing can also be used as the output template.

The service providing system according to the present invention has a server that provides guidance to users who wish to create a life record such as a diary or a travel record, and the user sends text, audio, and video from a mobile phone to the server in accordance with the guidance provided by the server. From the position information of the mobile phone and the transmitted data, the server determines the genre, summary, and so on of the information, and creates a web page according to a predetermined format.

With the life record creation apparatus and control method of the present invention, a web page can be created easily at the scene that is the subject of a diary, at a travel destination, or the like, by sending to the server, in accordance with instructions from the server, video shot with the camera of a mobile phone or audio input from its microphone. The server extracts the necessary information from the transmitted video or audio and creates the web page. With this configuration, the user needs no knowledge of web page creation and can create a web page by inputting data directly from the scene, such as a travel destination.

Overview of the system according to the present invention
Configuration diagram of the server of the system according to the present invention
Overview of the processing flow of the system according to the present invention
Overview of the processing flow of the server of the system according to the present invention
Diagram of the synonym table used in the system according to the present invention
Processing flow diagram of video data in the system according to the present invention
Diagram showing the exchange of data between the system according to the present invention and the user
Diagram of a template for controlling data exchange between the system according to the present invention and the user
Diagram of a template for controlling data exchange between the system according to the present invention and the user
Diagram showing an example of a web page
Diagram of a template for creating a web page in the system according to the present invention

Claims

1. A life record creation system comprising:
a guidance template storage unit that stores a guidance template;
an output template storage unit that stores an output template;
a template reading unit that reads out the guidance template in response to a request from a mobile terminal;
a template interpretation execution unit that executes the read guidance template;
a shaping processing unit that receives and shapes audio data and video data from the mobile terminal in response to execution of the guidance template by the template interpretation execution unit; and
an output shaping unit that generates output data, based on the output template, from the data shaped by the shaping processing unit.

2. The life record creation system according to claim 1, wherein the guidance template includes a command for acquiring time information and position information of the mobile terminal.

3. The life record creation system according to claim 1, wherein the guidance template includes a command for displaying a guide on a mobile terminal and acquiring a key input from the mobile terminal.

4. The life record creation system according to claim 1, wherein the shaping processing unit includes speech recognition means for recognizing speech data acquired from a mobile terminal and converting the speech data into text data.

5. The life record creation system according to claim 4, wherein the shaping processing unit includes category determining means for extracting words having a high appearance frequency from the text data and determining a category of the text data.

6. The life record creation system according to claim 4, wherein the shaping processing unit includes summary sentence generation means for generating a summary sentence from the text data.

7. The life record creation system according to claim 1, wherein the shaping processing unit includes a scene decomposing unit that extracts change points of video data acquired from a mobile terminal and decomposes the video data into scenes.

8. The life record creation system according to claim 1, wherein the shaping processing unit includes a still image generation unit that generates a still image from video data acquired from a mobile terminal.

9. The life record creation system according to claim 1, wherein the output template is a template that generates HTML data of a web page from data shaped by the shaping processing unit.

10. A life record creation method comprising:
a step of reading a guidance template from a template storage unit in response to a request from a mobile terminal;
a template interpretation execution step of sequentially executing the commands of the read guidance template;
a shaping processing step of receiving and shaping audio data and video data from the mobile terminal in response to execution by the template interpretation execution step; and
an output shaping step of generating output data, based on an output template, from the data shaped by the shaping processing step.

11. The life record creation method according to claim 10, wherein the guidance template includes a command for acquiring position information and time information of a mobile terminal.

12. The life record creation method according to claim 10 or 11, wherein the guidance template includes a command for displaying a guide on a mobile terminal and acquiring a key input from the mobile terminal.

13. The life record creation method according to claim 10, wherein the shaping processing step includes a speech recognition step of recognizing speech data acquired from a mobile terminal and converting the speech data into text data.

14. The life record creation method according to claim 13, wherein the shaping processing step includes a category determination step of extracting words having a high appearance frequency from the text data and determining a category of the text data.

15. The life record creation method according to claim 13, wherein the shaping processing step includes a summary sentence generation step for generating a summary sentence from the text data.

16. The life record creation method according to claim 10, wherein the shaping processing step includes a scene decomposition step of extracting change points of the video data acquired from the mobile terminal and decomposing the video data into scenes.

17. The life record creation method according to claim 10, wherein the shaping processing step includes a still image generation step of generating a still image from video data acquired from a mobile terminal.

18. The life record creation method according to claim 10, wherein the output template is a template for generating HTML data of a web page from the data shaped in the shaping processing step.