JP2002041529A

JP2002041529A - Portable telephone set and unarranged visual information processing method

Info

Publication number: JP2002041529A
Application number: JP2000229604A
Authority: JP
Inventors: Atsuo Namekawa; 敦夫滑川
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2000-07-28
Filing date: 2000-07-28
Publication date: 2002-02-08
Anticipated expiration: 2020-07-28
Also published as: JP4438194B2

Abstract

PROBLEM TO BE SOLVED: To make it easy to search the unorganized visual information files (images, text information, etc.) acquired by a portable telephone set. SOLUTION: A title in the form of sound information is attached as retrieval information to each piece of unorganized visual information, so that retrieval can be based on the sound information. Text indexes are generated from the beginning part of each sound information file and used to facilitate retrieval.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

FIELD OF THE INVENTION

The present invention relates to a portable telephone and to an unorganized visual information processing method suitable for use in a portable telephone.

[0002]

DESCRIPTION OF THE RELATED ART

Conventionally, a portable telephone or a PDA (personal information terminal) with a transmission/reception function, denoted 1, can easily acquire, as shown in FIG. 10, video information from a digital video camera 2 that captures moving and still images, from a digital still camera 3 that can capture still images and the like, and from an imaging camera 15 built into the portable telephone 1, as well as text information and the like from the Internet 4. (In the present invention, information shown on a display device, such as video information and text information, is defined as visual information and is hereinafter referred to as such.)

[0003] For example, the digital video camera 2 and the portable telephone 1 are connected by wire through an interface such as USB (Universal Serial Bus) or IEEE 1394 (Institute of Electrical and Electronics Engineers 1394), and the digital still camera 3 and the portable telephone 1 exchange data wirelessly through an interface such as Bluetooth (an interface for short-range wireless transmission and reception). In addition, visual information such as text data, graphics, and video data from the Web sites 5a, 5b, 5c, ... on the Internet 4 can easily be downloaded wirelessly via a mobile phone base station 6 and a radio tower 7. The portable telephone can accept a small memory card such as a Memory Stick or flash memory, a Bluetooth transmission/reception card, and the like, so that a large amount of visual information can be stored in the memory of the portable telephone and can also be exchanged with a PDA or similar device. [In this description, the term "portable telephone" covers mobile phone terminals (64 kbps), portable information terminals (164 kbps to 384 kbps), car multimedia terminals (64 kbps, 128 kbps, 144 kbps), portable multimedia terminals (2 Mbps), and the like.]

[0004]

PROBLEMS TO BE SOLVED BY THE INVENTION

In the portable telephone 1 described above, the display surface of the display device 8, such as an LCD, is extremely small, and the operation unit 9, such as a numeric keypad, has few keys. Consequently, when a user tries to key in titles or descriptive information in order to search a number of unorganized visual information files stored, for example, in a small memory card, operability is poor and the task takes a long time compared with a typical device that has a relatively large display and a keyboard for character input.

[0005] The present invention has been made to solve the above problem. Its object is to provide a portable telephone and an unorganized visual information processing method in which an audio file is created using the voice transmission/reception function that is a basic function of a portable telephone, and this audio file is attached as a title to each of a plurality of visual information files, so that desired visual information such as a given image can be searched for easily.

[0006]

MEANS FOR SOLVING THE PROBLEMS

A portable telephone 1 according to a first aspect of the present invention comprises: visual information acquisition means 14, 16 for acquiring a plurality of pieces of unorganized visual information into files; search audio information creating means 19 for creating search audio information corresponding to an unorganized visual information file; storage means 11a, 11b for storing the visual information and the search audio information; and search file creating means 24 for merging or linking the visual information and the search audio information within the storage means.

[0007] A portable telephone 1 according to a second aspect of the present invention comprises, in addition to the first aspect: speech recognition means 25 for recognizing the leading portion of the explanatory file created by the search audio information creating means 21; and search text information creating means 26 for converting the recognized leading portion into text information to create search text information. The search text information is merged or linked with the search audio information and the visual information.

[0008] A portable telephone according to a third aspect of the present invention comprises, in addition to the first and second aspects: reproduction/selection means 22 for reproducing and selecting the search audio information stored in the storage means 11a, 11b; and variable reproduction speed means 22a capable of changing the reproduction speed of the reproduction/selection means 22 to a higher or lower speed, so that the time needed to search the search audio information can be varied.

[0009] An unorganized visual information processing method according to the present invention comprises: a visual information acquisition step S₁ of acquiring a plurality of pieces of unorganized visual information into files; a search audio information creating step S₆ of creating search audio information corresponding to an unorganized visual information file; and a search file creating step S₉ of merging or linking the visual information and the search audio information within the storage means.

[0010] According to the first portable telephone and the unorganized visual information processing method of the present invention, when a plurality of pieces of visual information stored in the storage means are to be searched, speech, which is easy to input, can be attached as title and descriptive information in place of text data, which is hard to enter on a portable telephone. This makes it easier to organize and search the visual information. Moreover, by describing links to the audio information files and visual information files in a text index file, the association between audio information files and visual information files can be changed easily and quickly. The association can also be made flexibly: not only one-to-one, but also a plurality of visual information files to one audio information file, or a plurality of audio information files to one visual information file.
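The link-based association described above can be pictured with a small sketch. It is only an illustration in Python under assumed names: `IndexEntry`, `relink_audio`, and the file names are hypothetical, since the description does not fix a concrete format for the text index file at this point.

```python
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    """One record in a hypothetical text index file."""
    tag: str                                          # text index, e.g. a recognized first word
    audio_files: list = field(default_factory=list)   # links to audio information files
    visual_files: list = field(default_factory=list)  # links to visual information files


def relink_audio(entry: IndexEntry, old: str, new: str) -> None:
    """Swap one audio link; the audio and image data themselves are not rewritten."""
    entry.audio_files = [new if name == old else name for name in entry.audio_files]


# One spoken title linked to two images (a one-to-many association).
entry = IndexEntry(
    tag="Zushikaigan",
    audio_files=["title_001.snd"],
    visual_files=["000301162500.jpg", "000301162730.jpg"],
)

# Re-recording the title only changes a link in the index.
relink_audio(entry, "title_001.snd", "title_002.snd")
print(entry.audio_files)  # ['title_002.snd']
```

Because only the link text changes, neither the image data nor the audio data has to be copied, which is the property the paragraph attributes to linking through an index file.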

[0011] According to the second portable telephone of the present invention, a text index of the leading portion of the audio information file is created by speech recognition and text conversion. Adding search by this text index further improves search efficiency.

[0012] According to the third portable telephone of the present invention, using an audio codec whose reproduction speed can be controlled when creating the audio information file shortens the search time.

[0013]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The portable telephone and the unorganized visual information processing method of the present invention will now be described with reference to FIGS. 1 to 9.

[0014] FIG. 1 is a functional block diagram of the portable telephone of the present invention, FIG. 2 is a block diagram of the portable telephone of the present invention, FIGS. 3 to 6 are flowcharts explaining its operation, and FIGS. 7 to 9 are diagrams explaining the files.

[0015] As shown in the block diagram of FIG. 2, the portable telephone 1 used in the present invention has: a radio transmission/reception unit including a transmitting/receiving antenna 30, an RF block 31, and a modulation/demodulation block 32; an audio codec unit 39 having an audio output unit, which includes a speaker 33 and a digital-to-analog converter (D/A) 34 and emits voice and dialer signals, and an audio input unit, which includes a microphone 35 and an analog-to-digital converter (A/D) 36; a small operation unit 9 including a keyboard; a small display unit 8 such as an LCD; a processor unit 37 for system and transmission control, such as a microcomputer (CPU) or digital signal processor (DSP); a built-in memory unit 38, such as ROM and RAM, for programs and data; external memory units 11a, 11b such as memory cards; an imaging camera 15 built into the portable telephone 1; and an image codec unit 40. These parts are connected to the processor 37 via a bus.

[0016] The external memories 11a, 11b (Memory Stick, PC Card, SmartMedia card, Miniature Card, CompactFlash card) can be inserted and removed through a card slot provided in the outer housing of the portable telephone 1.

[0017] FIG. 1 shows the functional blocks of the present invention in the hardware configuration described above. In FIG. 1, a visual information file 27, such as image information, and an audio information file 28 are stored in the external memories 11a, 11b.

[0018] Although FIG. 1 shows two external memories 11a, 11b, a single memory is sufficient, and a memory built into the portable telephone 1 (other than the built-in memory unit 38) may be used instead of the external memories 11a, 11b.

[0019] As described with reference to FIG. 10, visual information is supplied to an external input unit 14. This visual information comprises: image information, such as still and moving images, taken from other devices 12 (see FIG. 1) such as the digital video camera 2, the digital still camera 3, or an image scanner; figures, tables, and text information taken from a personal computer, PDA, or the like on an intranet, the Internet, or the like 13; and combinations of image information with figures, tables, and text information.

[0020] Physically, the external input unit 14 may be a USB port, or a Bluetooth modem that converts wirelessly transmitted and received digital data into baseband digital data. The control software for acquiring the visual information resides in the processor 37, and the external input unit 14 is taken to include this software.

[0021] The built-in image device is the imaging camera 15, such as a CCD or CMOS sensor, built into the portable telephone 1. Since the image file creation unit 16 corresponds to the image codec unit 40 of FIG. 2 and the external input unit 14 described above, the output of the built-in image device 15 may instead be supplied to the external input unit.

[0022] The visual information file 27, such as image information or text information, stored in the memory 11a is displayed on the display unit 8, such as an LCD, and selected by means of an image display/selection unit 17 and a text display/selection unit 23.

[0023] The audio file creation unit 19 makes use of functions already present in the portable telephone 1: the audio input device (microphone) 35, the audio codec unit 39, the audio reproduction/selection unit 22, and so on.

[0024] In general, in the portable telephone 1, as shown in FIG. 2, the audio signal input from the microphone 35 is converted into digital data by the A/D 36 and then encoded by the audio codec unit 39 into a predetermined audio codec format. So that the resulting digital data can be sent over the air, it is transmitted from the antenna 30 via the modulation/demodulation unit 32 and the RF block unit 31.

[0025] Conversely, a signal received by the antenna 30 is demodulated by the RF block unit 31 and the modulation/demodulation unit 32 back into digital data, decoded by the audio codec unit 39 according to the audio codec format, converted into an analog signal by the D/A 34, and output to the speaker 33.

[0026] In the present invention, the audio file creation unit 19 encodes audio into the audio codec format and stores it in the memory 11b as an audio information file 28. An audio file attachment unit 24 is provided which, using the audio reproduction/selection unit 22 (audio codec unit 39) that decodes, reproduces, and selects the audio information file 28, attaches (merges) a title serving as descriptive and search information to each of a plurality of unprocessed pieces of visual information such as image information and text information. The aim is to organize, and make it easy to search, a plurality of unorganized pieces of visual information downloaded from Internet services such as those known as i-mode and EZ-Web.

[0027] Conventionally, when handling a plurality of unorganized pieces of visual information, files have been distinguished automatically, without the user entering search information such as an index number or file name, by assigning sequential index numbers in order of download or creation, or by using the time stamp of download or creation as the file name. With these methods, however, finding a required piece of visual information means displaying the images or other visual information on the display unit 8 one after another, which is tedious. To organize files more efficiently than with simple index numbers or time-stamp file names, the user needs to give each file an identifying title, but a portable telephone has fewer keys than a personal computer or the like, so entering a text title is not easy.

[0028] The present invention focuses on the fact that audio files can easily be created using the basic functions of a portable telephone, and seeks to solve the above problem by newly adding a function that attaches such an audio file, as a title or description, to each of a plurality of image or other files that need organizing.

[0029] FIG. 3 is a flowchart of the audio file attachment process performed by the processor (hereinafter, CPU) 37 to attach the above audio information file, as search information such as a title, to an image information file, which is one kind of visual information.

[0030] In the first step S₁ of FIG. 3, visual information is downloaded: image information acquired via USB, Bluetooth, or the like from other devices 12 such as the digital video camera 2 or the digital still camera 3, or text information, graphic information, and the like from the Internet or the like 13.

[0031] In the second step S₂, a file name, time stamp, or the like is automatically added based on the download time.
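As a minimal sketch of this automatic naming, the Python function below derives a file name from the download time. The `YYMMDDhhmmss` layout is inferred from the example given later in the description (000301162500 for 16:25:00 on March 1, 2000); the function name `timestamp_file_name` and the optional extension parameter are hypothetical.

```python
from datetime import datetime


def timestamp_file_name(downloaded_at: datetime, extension: str = "") -> str:
    """Build a YYMMDDhhmmss file name from the download time."""
    stem = downloaded_at.strftime("%y%m%d%H%M%S")
    return f"{stem}.{extension}" if extension else stem


print(timestamp_file_name(datetime(2000, 3, 1, 16, 25, 0)))  # 000301162500
```

A name built this way sorts chronologically as plain text, but it tells the user nothing about the content, which is the limitation the spoken title is meant to address.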

[0032] In this state, the display unit 8 shows the downloaded image 8a and a prompt 8b asking whether to save the image 8a.

[0033] In the third step S₃, the CPU 37 determines whether to save the downloaded image. The user selects 0 (yes) or 1 (no) in response to the prompt 8b on the display unit 8. If NO, the process proceeds to the fourth step S₄, displays the image 8a, and ends; if YES, it proceeds to the fifth step S₅.

[0034] In the fifth step S₅, a voice input message is displayed: the display unit 8 shows the downloaded image 8a together with a message 8c such as "Please input speech".

[0035] In the sixth step S₆, while viewing the image 8a on the display unit 8, the user speaks a search title, description, or the like corresponding to the image 8a into the microphone 35, for example "Sunset at Zushi Beach".

[0036] In the seventh step S₇, the CPU 37 plays back "Sunset at Zushi Beach" from the speaker 33 via the audio codec unit 39 and the D/A 34.

[0037] In the eighth step S₈, the user is asked whether the played-back audio is acceptable, and a prompt 8d reading "OK?" is shown on the display unit 8. The user selects "1" or "0"; if NO, the process returns to the start of the fifth step S₅, and if YES, it proceeds to the ninth step S₉.

[0038] In the ninth step S₉, the audio information file is merged with the visual information file, such as an image, so that the two are combined into a single file as shown in FIG. 7.

[0039] FIG. 7 shows the layout of the file in which the audio file attachment unit 24 has combined the visual information file 27 and the audio information file 28 into one. Reference numeral 50 denotes a text index part, such as the file name automatically generated in the second step S₂; 51 denotes the audio information file, such as "Sunset at Zushi Beach", stored as the audio information file 28 in the sixth step S₆; and 52 denotes the visual information file, such as an image, holding the image 8a acquired from the digital video camera 3 or the like in the first step S₁. The file name 50 may be a time stamp such as 16:25:00 on March 1, 2000 (000301162500, PCK).

[0040] After the audio information file 51 and the visual information file 52 have been merged into a single file in this way, the process ends.
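One way to picture the merged file of FIG. 7 is as three sections written back to back. The Python sketch below is an assumption for illustration only: the description does not state the byte layout, so the length-prefixed format and the names `merge_sections` and `split_sections` are hypothetical.

```python
import struct


def merge_sections(index_text: str, audio: bytes, visual: bytes) -> bytes:
    """Pack index part 50, audio part 51 and visual part 52 into one blob.

    Assumed layout: each section is a 4-byte big-endian length
    followed by that many payload bytes.
    """
    sections = [index_text.encode("utf-8"), audio, visual]
    return b"".join(struct.pack(">I", len(s)) + s for s in sections)


def split_sections(blob: bytes):
    """Inverse of merge_sections: return (index_text, audio, visual)."""
    parts, offset = [], 0
    for _ in range(3):
        (length,) = struct.unpack_from(">I", blob, offset)
        offset += 4
        parts.append(blob[offset:offset + length])
        offset += length
    return parts[0].decode("utf-8"), parts[1], parts[2]


# Dummy payloads stand in for encoded speech and image data.
merged = merge_sections("000301162500", b"\x01\x02", b"\xff\xd8\xff")
index_text, audio, visual = split_sections(merged)
```

With a single-file layout like this, replacing only the audio section means rewriting the whole blob.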

[0041] Next, FIG. 4 shows a flowchart for searching for a given piece of visual information among the plurality of unorganized visual information files merged as described above.

[0042] In FIG. 4, in the first step ST₁, the CPU 37 asks on the display unit 8 whether to play back the audio titles. For example, a prompt 8e such as "Play back audio?" is shown on the screen of the display unit 8. The user selects "0" for YES or "1" for NO.

[0043] If the answer in the first step ST₁ is NO, the process proceeds to the second step ST₂ and conventional processing is performed: the plurality of visual information files, such as images, are displayed one after another, and the user looks through the displayed images to find the desired visual information.

[0044] If the answer in the first step ST₁ is YES, then in the third step ST₃ the CPU 37 plays back, in order, through the speaker 33 the audio titles merged into the visual information files.

[0045] If the audio is simply played back at normal speed during this playback (fifth step ST₅), input is easier than typing text because the title was entered by voice, but the search may take longer than looking through the images in order in the second step ST₂. The fourth step ST₄ therefore asks whether to use fast listening.

[0046] This fast-listening function is carried out by a variable reproduction speed control unit 22a provided in the audio reproduction/selection unit 22 of FIG. 1. Using MPEG-4 (Moving Picture Experts Group-4, an audio-visual object coding standard) HVXC (Harmonic Vector eXcitation Coding) as the audio codec makes fast and slow playback easy.

[0047] HVXC is a parametric coding scheme. Its speech coding bit rate is 2 kbps to 4 kbps, and in variable bit rate mode it can be used down to about 1.2 kbps. The spectrum of a voiced sound such as a human voice forms a regular signal called harmonics. HVXC discards the phase information from this and quantizes the frequency, amplitude (pitch), and so on as a harmonic amplitude sequence. Because these parameters can be identified on the playback side, speed and pitch can be controlled and changed easily.
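To give a feel for what these bit rates mean for storage, the short Python calculation below estimates the size of one spoken title. The 3-second duration is an assumption chosen for illustration, and treating the 1.2 kbps variable-rate figure as a constant rate is a simplification.

```python
def title_size_bytes(bit_rate_bps: float, seconds: float) -> float:
    """Bytes needed to store `seconds` of speech at a constant bit rate."""
    return bit_rate_bps * seconds / 8


# Assumed 3-second title at the rates named in the text.
for rate in (1200, 2000, 4000):
    print(rate, title_size_bytes(rate, 3.0))
```

At 2 kbps, a 3-second title comes to 750 bytes under these assumptions, so attaching a spoken title adds little to the size of an image file.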

[0048] In this case, the key operation for fast or slow listening works as follows. A prompt 8f such as "Fast listening?" is shown on the screen of the display unit 8, and the user selects "0" (yes) or "1" (no) to decide whether to use fast listening. If NO, normal playback is performed in the fifth step ST₅ as described above; if YES, the variable speed control unit 22a performs fast or slow playback in the sixth step ST₆.

[0049] After the fifth step ST₅ or the sixth step ST₆, the process proceeds to the seventh step ST₇, where it is determined whether the visual information, such as an image, corresponding to the played-back audio title is the one wanted.

[0050] Specifically, the screen of the display unit 8 shows a prompt 8g such as "Is this the screen for this audio?" with the choices "0": yes, "1": back, and "2": next. Thus, when the audio title "Sunset at Zushi Beach", for example, has been played back, the user answers YES by selecting "0", and the process proceeds to the eighth step ST₈. If NO, the process returns to the start of the fourth step ST₄: selecting "1" plays the same fast-listening title again, and selecting "2" plays the next audio title.

[0051] In the eighth step ST₈, the image 8a corresponding to "Sunset at Zushi Beach" in the corresponding visual information file is displayed on the display unit 8, and the file search and display process ends.

[0052] With the search method described above, a desired piece of visual information can be found quickly among a plurality of unorganized pieces of visual information by playing back the audio titles.
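The search flow of FIG. 4 can be sketched roughly as a loop over the merged files. The Python below is a simplified illustration, not the patented implementation: the `play` and `ask` callbacks stand in for the speaker and keypad, the fast-listening choice is reduced to a single flag instead of being asked again on each pass, and all names and sample titles other than "Sunset at Zushi Beach" are hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple

MergedFile = Tuple[str, str]  # (audio title, image name); stand-ins for real data


def search_by_audio_title(
    files: Sequence[MergedFile],
    play: Callable[[str, bool], None],
    ask: Callable[[], str],
    fast: bool = False,
) -> Optional[str]:
    """Play titles one by one until the user confirms one (ST3 to ST8)."""
    position = 0
    while position < len(files):
        title, image = files[position]
        play(title, fast)      # ST5 (normal) or ST6 (fast listening)
        answer = ask()         # ST7: "0" yes, "1" back, "2" next
        if answer == "0":
            return image       # ST8: show the matching image
        if answer == "2":
            position += 1      # move on to the next title
        # "1" (or any other key) replays the same title
    return None                # ran out of titles without a match


files = [("Sunrise at Mt. Fuji", "a.jpg"), ("Sunset at Zushi Beach", "b.jpg")]
played = []
answers = iter(["1", "2", "0"])  # replay, next, confirm
found = search_by_audio_title(
    files, lambda title, fast: played.append(title), lambda: next(answers)
)
print(found)  # b.jpg
```

Passing the playback and key input as callbacks keeps the sketch testable without audio hardware.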

[0053] Next, with reference to FIGS. 5 and 6 and FIGS. 8 and 9, a portable telephone and an unorganized visual information processing method are described in which, following the first step S₁ through the eighth step S₈ of the audio information file attachment process described with FIG. 3, a text-indexed audio file of the kind indicated by 29 is created via the speech recognition/text conversion unit 25 and the audio file sorting unit 26 shown in the functional blocks of FIG. 1, so that visual information can be searched and organized efficiently.

[0054] The speech recognition/text conversion unit 25 of FIG. 1 converts the audio information taken from the audio information file 28 in the memory 11b into text (characters). The important point here is that correctly converting all ordinary conversation into text would be difficult to implement in a portable telephone given its processor power and memory capacity. The speech recognition/text conversion unit 25 therefore only needs to convert into text the leading part (about one word) of the audio information file, which is all that is needed to sort the audio information files (for example, in dictionary order). The text-indexed audio information files 29 produced by this conversion are sorted on their text tags (for example, in dictionary order) by the audio information file sorting unit 26.

[0055] FIG. 5 assumes that the portable telephone 1 has been set to index creation mode in its mode settings. From the YES outcome of the eighth step S₈, at which the audio file attachment process of the first to eighth steps S₁ to S₈ described with FIG. 3 is complete, the CPU 37 determines in the first step STE₁ whether index creation mode is set. If no index is to be created, the process goes from the second step STE₂ to the third step STE₃, where the audio information file is merged with the visual information file, such as an image, and the process ends.

[0056] If index creation mode is set (YES), the process proceeds to the fourth step STE₄, where the leading part of the audio information file is cut out and subjected to speech recognition.

[0057] In the fifth step STE₅, an index "Zushikaigan" (the reading of "Zushi Beach"), which is part of "Sunset at Zushi Beach", is created from the speech recognition result.

[0058] In the sixth step STE₆, the index and the audio information file are merged with the visual information file, such as an image.

【００５９】第７ステップＳＴＥ₇では検索のために画
像等の視覚情報ファイルのインデックスによるソートが
行なわれる。例えば辞書順に並び替えが行なわれた後に
インデックス作成処理終了に至る。In the seventh step STE ₇ , visual information files such as images are sorted by index for retrieval. For example, after sorting in dictionary order, the index creation processing ends.
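The index creation and sorting of steps STE₄ to STE₇ can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the implementation described in the patent: speech recognition itself is not modeled (the recognized text is passed in as a string), the names `Entry`, `make_index`, and `sort_by_index` are hypothetical, the six-character cut is only one way to take "about one word", and plain code-point ordering stands in for dictionary order.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    index: str        # text index taken from the head of the recognized speech
    voice_file: str   # name of the audio information file
    visual_file: str  # name of the visual information file


def make_index(recognized_text: str, max_chars: int = 6) -> str:
    """Keep only the head of the recognized text (assumption: a fixed-length cut)."""
    return recognized_text[:max_chars]


def sort_by_index(entries):
    """Sort entries by their text index (code-point order as a stand-in for dictionary order)."""
    return sorted(entries, key=lambda e: e.index)


# The recognized title is assumed to arrive as katakana text.
zushi = Entry(make_index("ズシカイガンノユウヒ"), "00042190200.wav", "000426190000.jpg")
# The two entries below use made-up file names purely for illustration.
others = [
    Entry(make_index("ハコネ"), "voice_b.wav", "image_b.jpg"),
    Entry(make_index("アタミ"), "voice_a.wav", "image_a.jpg"),
]
ordered = sort_by_index([zushi] + others)
```

Because only the head of each title is kept, two recordings that start with the same word receive the same index; the text handles that case by falling back to playing the audio title.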

【００６０】上述ではインデックスによる並び替えを第
７ステップＳＴＥ₇で行なったが、第６ステップＳ₆の
音声情報ファイル作成時に並び替えたり、第７ステップ
Ｓ₇での音声再生時に選択された（例えば「ズ」で始ま
る音声を指定）音声が再生できればテキストのインデッ
クスが付けられたまま並び替えることなくメモリ１１ｂ
上に保存されていてもよい。In the above, sorting by index is performed in the seventh step STE₇. However, the files may instead be sorted when the audio information file is created in the sixth step S₆, or, provided that the audio selected during playback in the seventh step S₇ (for example, by specifying audio beginning with "zu") can be played back, the files may be stored in the memory 11b with their text indexes attached and without being sorted.

【００６１】上述のインデックス作成処理の第６ステッ
プＳＴＥ₆でインデックス、音声情報ファイル、画像等
の視覚情報ファイルのマージで作成した１つのファイル
のイメージは図７に示す様に例えば「ズシカイガン」の
様なインデックス５３が付加される。As shown in FIG. 7, the single file created in the sixth step STE₆ of the index creation processing described above, by merging the index, the audio information file, and the visual information file such as an image, has an index 53 such as "Zushikaigan" added to it.

【００６２】この様な１つのファイル内に３つの各ファ
イル（インデックス、音声情報、視覚情報）をまとめた
場合、一度作成した１つのファイルに対し、例えば音声
情報ファイルだけを付け替え（関連付けの変更）る場合
に、プロセッサの負荷が大きくなり、メモリの利用効率
も悪くなる。When the three files (index, audio information, and visual information) are combined into one file in this way, replacing only the audio information file (that is, changing the association) in a file that has already been created increases the processor load and lowers memory utilization efficiency.

【００６３】そこでテキストのタグファイルを作成し、
複数のファイルをリンクさせることで検索を容易にし、
メモリも小容量化可能にすることも出来る。これを図８
及び図９を用いて説明する。Therefore, a text tag file may be created to link the multiple files, which makes searching easier and also allows the memory capacity to be reduced. This is described with reference to FIGS. 8 and 9.

【００６４】図８（Ａ）の２７は図３の第１ステップＳ
₁でダウンロード等により取得した画像等の視覚情報フ
ァイルであり、例えば０００４２６１９００００Ｊｐｇ
は取得日時にしたがって自動的に第２ステップＳ₂で付
加したファイル名であるが携帯電話機１でのファイル名
と重複しない様にしている。Reference numeral 27 in FIG. 8(A) denotes a visual information file such as an image acquired by downloading or the like in the first step S₁ of FIG. 3. For example, 000426190000Jpg is a file name added automatically in the second step S₂ according to the acquisition date and time, chosen so as not to duplicate any file name on the mobile phone 1.
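The date-and-time naming described in this paragraph can be sketched as follows. This is a minimal Python illustration, not the patent's implementation: the YYMMDDhhmmss layout is inferred from the example name 000426190000, and the function name `timestamp_name` and the one-second bump used to avoid duplicate names are assumptions introduced here.

```python
from datetime import datetime, timedelta


def timestamp_name(acquired: datetime, ext: str, existing: set) -> str:
    """Build a YYMMDDhhmmss-style file name that is not already in `existing`.

    Assumption: a clash is resolved by advancing the timestamp one second at a time.
    """
    candidate = acquired
    while True:
        name = f"{candidate:%y%m%d%H%M%S}.{ext}"
        if name not in existing:
            return name
        candidate += timedelta(seconds=1)


# An image acquired at 19:00:00 on 26 April 2000, with no other files present.
first = timestamp_name(datetime(2000, 4, 26, 19, 0, 0), "jpg", set())
# A second image with the same timestamp gets the next free name.
second = timestamp_name(datetime(2000, 4, 26, 19, 0, 0), "jpg", {first})
```

Naming files by acquisition time keeps them unique without any text entry by the user, which is the constraint the paragraph is working under.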

【００６５】図８（Ｂ）の２８は図３の第６ステップＳ
₆で作成した「逗子海岸の夕日」という音声情報ファイ
ルであり、例えば０００４２１９０２００．ｗａｖは音
声情報ファイルのファイル名である。Reference numeral 28 in FIG. 8(B) denotes the audio information file "Sunset at Zushi Beach" created in the sixth step S₆ of FIG. 3. For example, 00042190200.wav is the file name of the audio information file.

【００６６】図８（Ｃ）の５３は音声情報ファイル２８
の先頭から音声認識により音声からテキストに変換し、
図５の第５ステップＳＴＥ₅で作成した「ズシカイガ
ン」と云うインデックスファイルであり、０００４２６
１９０２００．ｔｘｔはインデックス作成時のファイル
名である。Reference numeral 53 in FIG. 8(C) denotes the index file "Zushikaigan", created in the fifth step STE₅ of FIG. 5 by converting the head of the audio information file 28 from speech to text by speech recognition. 000426190200.txt is the file name given at index creation.

【００６７】本発明では上述の視覚情報ファイル２７、
音声情報ファイル２８、インデックスファイル５３のフ
ァイルをリンクするテキストインデックスタグファイル
５４を作成する。このテキストインデックスタグファイ
ルを作成する際にファイル名を新たに作成してもよいが
図８（Ｃ）に示すインデックスファイルは既にテキスト
なのでこのファイル名を修正したファイル名０００４２
６１９０２００．ｈｅａｄを作成しリンク情報として
〈ＴＩＴＴＬＥ〉，〈ＶＯＩＣＥ〉，〈ＣＯＮＴＥＮＴ
Ｓ〉等のタグを定義し、ここに各ファイル名を記述する
様にする。In the present invention, a text index tag file 54 is created that links the visual information file 27, the audio information file 28, and the index file 53 described above. A new file name may be created for this text index tag file. However, since the index file shown in FIG. 8(C) is already text, a file name modified from its name, 000426190200.head, is created, and tags such as <TITLE>, <VOICE>, and <CONTENTS> are defined as link information, with each file name described within them.

【００６８】又、テキストインデックスタグファイル５
４をユーザ独自の仕様でなく、国際標準であるＸＭＬ
（ＥｘｔｅｎｓｉｂｌｅＭａｒｋｕｐＬａｎｇｕａ
ｇｅ）のタグが定義可能なマークアップ言語で記述する
ことによって、携帯電話機に限定されることなく、イン
ターネット上のＷｅｂサイト、他のユーザのパーソナル
コンピュータ、その他ＸＭＬ記述を認識できる機器でも
テキストインデックスタグファイルのリンク情報が認識
でき、音声情報ファイル、画像等の視覚情報ファイルを
テキストインデックスタグファイルと一緒に配布・管理
することで、異なるメディア（音声と画像等の視覚情
報）のファイルをまとめて扱うことが、インターネット
上のＷｅｂサイト、他のユーザのパーソナルコンピュー
タ、その他ＸＭＬ記述を認識できる機器上で容易にな
る。Furthermore, by describing the text index tag file 54 not in a user-specific format but in XML (Extensible Markup Language), an international-standard markup language in which tags can be defined, the link information of the text index tag file can be recognized not only on mobile phones but also on Web sites on the Internet, on other users' personal computers, and on other devices that can recognize XML descriptions. By distributing and managing the audio information files and the visual information files such as images together with the text index tag file, files of different media (audio, and visual information such as images) can easily be handled together on such Web sites, personal computers, and devices.
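A text index tag file of this kind might look like the output of the following sketch. It is a minimal Python illustration under stated assumptions: the actual layout of FIG. 9 is not reproduced in this text, so the root element name `TAGFILE`, the choice to store the index text in `<TITLE>`, and the function names `build_tag_file` and `relink_voice` are hypothetical. Only the tag names TITLE, VOICE, and CONTENTS come from the description.

```python
import xml.etree.ElementTree as ET


def build_tag_file(title: str, voice_files, content_files) -> str:
    """Return an XML string that links one index to audio and visual files."""
    root = ET.Element("TAGFILE")                 # hypothetical root element
    ET.SubElement(root, "TITLE").text = title    # assumption: the index text goes here
    for name in voice_files:
        ET.SubElement(root, "VOICE").text = name
    for name in content_files:
        ET.SubElement(root, "CONTENTS").text = name
    return ET.tostring(root, encoding="unicode")


def relink_voice(xml_text: str, new_voice: str) -> str:
    """Change the association by rewriting only the first VOICE link."""
    root = ET.fromstring(xml_text)
    root.find("VOICE").text = new_voice
    return ET.tostring(root, encoding="unicode")


# One audio title linked to two images (the second image name is made up).
tag_xml = build_tag_file(
    "Zushikaigan",
    ["00042190200.wav"],
    ["000426190000.jpg", "000426190100.jpg"],
)
# Re-associating a different recording touches only the small tag file.
relinked = relink_voice(tag_xml, "000426193000.wav")
```

The media files are never rewritten here; only the link text changes, which is the reason the description gives for keeping the three files separate.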

【００６９】図６は、インデックスが付加されたファイ
ル（インデックスモード）でのファイル検索再生、表示
のフローチャートを示すものであり、第１ステップＳＴ
ＥＰ ₁ではインデックスを表示するか否かの判断が成さ
れた、ＮＯであれば第２ステップＳＴＥＰ₂で従来の画
像表示による検索或は図３で説明した音声タイトルによ
る検索が行なわれる。FIG. 6 is a flowchart of file search, playback, and display for files to which an index has been added (index mode). In the first step STEP₁, it is determined whether or not to display the index. If NO, a search by conventional image display, or the search by audio title described with reference to FIG. 3, is performed in the second step STEP₂.

【００７０】第１ステップＳＴＥＰ₁がＹＥＳの場合
は、第３ステップＳＴＥＰ₃に進んでインデックスの一
覧表８ｈを表示部８に表示するか、インデックスの最初
の文字を表示する。When the first step STEP ₁ is YES, the process proceeds to the third step STEP ₃ to display the index list 8 h on the display unit 8 or to display the first character of the index.

【００７１】第４ステップＳＴＥＰ₄では表示部８の該
当する番号、例えば「４」を入力するか、ファイルの多
いときは１頁で一覧できないので最初の文字「ズシカイ
ガン」の「ズ」のみを入力する。In the fourth step STEP₄, the corresponding number shown on the display unit 8, for example "4", is entered; or, when there are so many files that they cannot be listed on one page, only the first character "zu" of "Zushikaigan" is entered.

【００７２】第５ステップＳＴＥＰ₅では対応する画像
８ａ等の視覚情報ファイルの表示と音声タイトルが再生
される。即ち、表示部８には画像８ａと「ズシカイガ
ン」の文字８ｉが表示されると同時に音声タイトルも再
生されて終了に至る。In the fifth step STEP₅, the corresponding visual information file such as the image 8a is displayed and the audio title is played back. That is, the image 8a and the characters 8i "Zushikaigan" are displayed on the display unit 8 while the audio title is played back, and the process ends.
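The selection in steps STEP₃ to STEP₅ can be sketched as follows. This is a minimal Python illustration, not the patent's implementation: the function name `select` is hypothetical, list numbers are assumed to be 1-based as on the displayed list, and the index titles other than "ズシカイガン" are made up.

```python
def select(titles, key: str):
    """Pick index titles either by list number or by leading character(s).

    A numeric key is treated as a 1-based position in the displayed list;
    any other key is treated as the beginning of the index text.
    """
    if key.isdigit():
        position = int(key)
        return [titles[position - 1]] if 1 <= position <= len(titles) else []
    return [title for title in titles if title.startswith(key)]


# Index list as it might appear on the display, already sorted.
titles = ["アタミ", "エノシマ", "カマクラ", "ズシカイガン"]

by_number = select(titles, "4")   # user enters the list number
by_prefix = select(titles, "ズ")  # user enters only the first character
```

When a prefix matches more than one title, the returned list has several entries, and the text resolves the choice by playing back the audio titles.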

【００７３】上述の様なインデックスモードでの視覚情
報ファイルの検索は（１）テキストインデックスタグファイル５４のタグ
〈ＴＩＴＬＥ〉の中身の一覧表の表示する。（２）同じテキストインデックスがある場合等は音声フ
ァイルを再生する。２段階にすることで単純に音声ファイルを再生して探す
よりも検索効率を上げることが可能となる。また、テキ
ストインデックスタグファイル５４を設けることで音声
情報ファイル２８と視覚情報ファイル２７を別々に存在
させたままでリンクを変更するだけで関連付けの変更が
容易に行なえる。更に、音声情報ファイルと視覚情報フ
ァイルの関連付けを１：１だけでなく１つの音声情報フ
ァイルに対して複数の画像等の視覚情報ファイルや、複
数の音声情報ファイルを１つの視覚情報ファイルに関連
付けすることが可能となり、メモリを節約可能となる。In index mode as described above, a visual information file is searched for in two stages: (1) a list of the contents of the <TITLE> tags in the text index tag file 54 is displayed; and (2) when, for example, identical text indexes exist, the audio file is played back. This two-stage approach makes searching more efficient than simply playing back audio files one by one. In addition, providing the text index tag file 54 allows an association to be changed easily, merely by changing a link, while the audio information file 28 and the visual information file 27 remain separate files. Furthermore, the association between audio information files and visual information files need not be 1:1: a plurality of visual information files such as images can be associated with one audio information file, or a plurality of audio information files with one visual information file, which saves memory.

【００７４】[0074]

【発明の効果】本発明の携帯電話機及び未整理の視覚情
報の処理方法によれば、（１）インターネット等と接続して得た複数の視覚情報
ファイルや、端末内蔵あるいは外部の画像デバイスから
入力した複数の視覚情報ファイルについて、携帯電話機
では困難なテキスト入力をすることなく、容易に入力可
能な音声をタイトル、説明情報として添付することによ
り、複数の視覚情報ファイルの整理が可能となる。（２）音声情報ファイルを作成する際に、再生速度制御
可能な音声コーデック（例えばＭＰＥＧ−４ＨＶＸ
Ｃ）を使用することにより、検索時間を短縮化すること
が出来る。（３）音声認識・テキスト変換により音声情報ファイル
の先頭部分のテキスト検索を作成することにより、テキ
ストインデックスによる検索も加えることで、より検索
効率を上げることができる。（４）テキストインデックス、音声情報ファイル、画像
等の視覚情報ファイルを１つのファイルとしてあらたに
作成し直すのではなく、テキストインデックスファイル
に音声情報ファイル、画像等の視覚情報ファイルへのリ
ンクを記述することで音声情報ファイル、視覚情報ファ
イルの関連付けの変更を容易に高速に行なえる。さら
に、音声情報ファイルと視覚情報ファイルの関連付けを
柔軟に行なうことができ、１対１だけでなく、１つの音
声情報ファイルに複数の視覚情報ファイルを、または複
数の音声情報ファイルを１つの視覚情報ファイルに関連
付けることが容易になる。等の多くの効果を奏する。According to the mobile phone and the unorganized visual information processing method of the present invention, the following effects, among many others, are obtained. (1) For a plurality of visual information files obtained by connecting to the Internet or the like, or input from an image device built into or external to the terminal, easily entered audio is attached as title and explanatory information, without the text entry that is difficult on a mobile phone, so that the visual information files can be organized. (2) Using an audio codec capable of playback speed control (for example, MPEG-4 HVXC) when creating the audio information file shortens the search time. (3) Creating a text index from the head portion of the audio information file by speech recognition and text conversion adds search by text index, which further improves search efficiency. (4) Rather than recreating the text index, the audio information file, and the visual information file such as an image as a single new file, links to the audio information file and the visual information file are described in the text index file, so that the association between them can be changed easily and quickly. In addition, audio information files and visual information files can be associated flexibly: not only one-to-one, but also a plurality of visual information files with one audio information file, or a plurality of audio information files with one visual information file.

[Brief description of the drawings]

【図１】本発明の携帯電話機の機能ブロック図である。FIG. 1 is a functional block diagram of a mobile phone according to the present invention.

【図２】本発明の携帯電話機のブロック図である。FIG. 2 is a block diagram of a mobile phone according to the present invention.

【図３】本発明の音声情報ファイル添付時のフローチャ
ートである。FIG. 3 is a flowchart when an audio information file is attached according to the present invention.

【図４】本発明のファイル検索時のフローチャートであ
る。FIG. 4 is a flowchart at the time of file search according to the present invention.

【図５】本発明のインデックス作成時のフローチャート
である。FIG. 5 is a flowchart at the time of creating an index according to the present invention.

【図６】本発明のインデックスモードのファイル検索時
のフローチャートである。FIG. 6 is a flowchart at the time of file search in the index mode of the present invention.

【図７】本発明の音声情報ファイル添付時のファイルの
イメージ図である。FIG. 7 is an image diagram of a file when an audio information file is attached according to the present invention.

【図８】本発明に用いる各ファイルの説明図である。FIG. 8 is an explanatory diagram of each file used in the present invention.

【図９】本発明のテキストインデックスタグファイルの
イメージ図である。FIG. 9 is an image diagram of a text index tag file of the present invention.

【図１０】従来の情報取得方法説明図である。FIG. 10 is an explanatory diagram of a conventional information acquisition method.

[Explanation of symbols]

１‥‥携帯電話機、８‥‥表示部、１１ａ，１１ｂ‥‥
メモリ、１４‥‥外部入力部、１６‥‥画像ファイル作
成部、１９‥‥音声ファイル作成部、２２‥‥音声情報
ファイル添付部、２５‥‥音声認識テキスト変換部、２
７‥‥視覚情報ファイル、２８‥‥音声情報ファイル、
３３‥‥スピーカ、３５‥‥マイクロホン 1: mobile phone; 8: display unit; 11a, 11b: memory; 14: external input unit; 16: image file creation unit; 19: audio file creation unit; 22: audio information file attachment unit; 25: speech recognition / text conversion unit; 27: visual information file; 28: audio information file; 33: speaker; 35: microphone

【手続補正書】[Procedure amendment]

【提出日】平成１２年８月１７日（２０００．８．１
７）[Submission date] August 17, 2000 (2000.8.17)

【手続補正１】[Procedure amendment 1]

【補正対象書類名】明細書[Document name to be amended] Specification

【補正対象項目名】００６４[Correction target item name] 0064

【補正方法】変更[Correction method] Change

【補正内容】[Correction contents]

【００６４】図８（Ａ）の２７は図３の第１ステップＳ
1 でダウンロード等により取得した画像等の視覚情報フ
ァイルであり、例えば０００４２６１９００００．ｊｐ
ｇは取得日時にしたがって自動的に第２ステップＳ2 で
付加したファイル名であるが携帯電話機１でのファイル
名と重複しない様にしている。Reference numeral 27 in FIG. 8(A) denotes a visual information file such as an image acquired by downloading or the like in the first step S1 of FIG. 3. For example, 000426190000.jpg is a file name added automatically in the second step S2 according to the acquisition date and time, chosen so as not to duplicate any file name on the mobile phone 1.

【手続補正２】[Procedure amendment 2]

【補正対象書類名】明細書[Document name to be amended] Specification

【補正対象項目名】００６７[Correction target item name] 0067

【補正方法】変更[Correction method] Change

【補正内容】[Correction contents]

【００６７】本発明では上述の視覚情報ファイル２７、
音声情報ファイル２８、インデックスファイル５３のフ
ァイルをリンクするテキストインデックスタグファイル
５４を作成する。このテキストインデックスタグファイ
ルを作成する際にファイル名を新たに作成してもよいが
図８（Ｃ）に示すインデックスファイルは既にテキスト
なのでこのファイル名を修正したファイル名０００４２
６１９０２００．ｈｅａｄを作成しリンク情報として
〈ＴＩＴＬＥ〉，〈ＶＯＩＣＥ〉，〈ＣＯＮＴＥＮＴ
Ｓ〉等のタグを定義し、ここに各ファイル名を記述する
様にする。In the present invention, a text index tag file 54 is created that links the visual information file 27, the audio information file 28, and the index file 53 described above. A new file name may be created for this text index tag file. However, since the index file shown in FIG. 8(C) is already text, a file name modified from its name, 000426190200.head, is created, and tags such as <TITLE>, <VOICE>, and <CONTENTS> are defined as link information, with each file name described within them.

【手続補正３】[Procedure amendment 3]

【補正対象書類名】明細書[Document name to be amended] Specification

【補正対象項目名】００７３[Correction target item name] 0073

【補正方法】変更[Correction method] Change

【補正内容】[Correction contents]

【００７３】上述の様なインデックスモードでの視覚情
報ファイルの検索は（１）テキストインデックスタグファイル５４のタグ
〈ＴＩＴＬＥ〉の中身の一覧表を表示する。（２）同じテキストインデックスがある場合等は音声フ
ァイルを再生する。２段階にすることで単純に音声ファイルを再生して探す
よりも検索効率を上げることが可能となる。また、テキ
ストインデックスタグファイル５４を設けることで音声
情報ファイル２８と視覚情報ファイル２７を別々に存在
させたままでリンクを変更するだけで関連付けの変更が
容易に行なえる。更に、音声情報ファイルと視覚情報フ
ァイルの関連付けを１：１だけでなく１つの音声情報フ
ァイルに対して複数の画像等の視覚情報ファイルや、複
数の音声情報ファイルを１つの視覚情報ファイルに関連
付けすることが可能となり、メモリを節約可能となる。In index mode as described above, a visual information file is searched for in two stages: (1) a list of the contents of the <TITLE> tags in the text index tag file 54 is displayed; and (2) when, for example, identical text indexes exist, the audio file is played back. This two-stage approach makes searching more efficient than simply playing back audio files one by one. In addition, providing the text index tag file 54 allows an association to be changed easily, merely by changing a link, while the audio information file 28 and the visual information file 27 remain separate files. Furthermore, the association between audio information files and visual information files need not be 1:1: a plurality of visual information files such as images can be associated with one audio information file, or a plurality of audio information files with one visual information file, which saves memory.

フロントページの続き (51)Int.Cl.⁷ 識別記号ＦＩテーマコート゛(参考）Ｇ１０Ｌ 13/00 Ｈ０４Ｍ 1/00 Ｒ５Ｋ０２７ 15/00 Ｈ０４Ｎ 5/76 Ｂ５Ｋ０６７Ｈ０４Ｑ 7/38 Ｇ１０Ｌ 3/00 ＥＨ０４Ｍ 1/00 ５５１ＡＨ０４Ｎ 5/76 Ｈ０４Ｂ 7/26 １０９ＴＦターム(参考） 5B075 ND14 ND16 ND36 NK10 NK54 5B082 EA09 GC04 5C052 AC08 DD04 5D015 EE01 HH01 KK02 KK03 5D045 AA07 AB01 AB26 5K027 AA11 FF22 HH26 5K067 AA21 BB04 DD51 EE02 FF02 HH22 HH23 KK15
Continued from the front page (51) Int.Cl.⁷ Identification symbol FI Theme code (reference) G10L 13/00 H04M 1/00 R 5K027 15/00 H04N 5/76 B 5K067 H04Q 7/38 G10L 3/00 E H04M 1/00 551A H04N 5/76 H04B 7/26 109T F-term (reference) 5B075 ND14 ND16 ND36 NK10 NK54 5B082 EA09 GC04 5C052 AC08 DD04 5D015 EE01 HH01 KK02 KK03 5D045 AA07 AB01 AB26 5K027 AA11 FF22 HH26 5K067 AA21 BB04 DD51 EE02 FF02 HH22 HH23 KK15

Claims

[Claims]

1. A mobile phone comprising: visual information acquisition means for acquiring a plurality of pieces of unorganized visual information as files; search audio information creation means for creating search audio information corresponding to the unorganized visual information files; storage means for storing the visual information and the search audio information; and search file creation means for merging or linking the visual information and the search audio information in the storage means.

2. The mobile phone according to claim 1, further comprising: speech recognition means for recognizing the head portion of the explanation file created by the search audio information creation means; and search text information creation means for creating search text information by converting the head-portion information recognized by the speech recognition means into text information, wherein the search text information is merged or linked with the search audio information and the visual information.

3. The mobile phone according to claim 1, further comprising: playback/selection means for playing back and selecting the search audio information stored in the storage means; and variable playback speed means capable of changing the playback speed of the playback/selection means to a high or low speed, whereby the search time for the search audio information is made variable.

4. An unorganized visual information processing method comprising: a visual information acquisition step of acquiring a plurality of pieces of unorganized visual information as files; a search audio information creation step of creating search audio information corresponding to the unorganized visual information files; and a search file creation step of merging or linking the visual information and the search audio information in storage means.