JP2010139843A

JP2010139843A - Voice information collection device, and method and program for the same

Info

Publication number: JP2010139843A
Application number: JP2008316921A
Authority: JP
Inventors: Manabu Satsusano; 学颯々野; Tobias Cincarek; トビアスツィンツァレク
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2008-12-12
Filing date: 2008-12-12
Publication date: 2010-06-24
Anticipated expiration: 2028-12-12
Also published as: JP4808763B2

Abstract

PROBLEM TO BE SOLVED: To provide a voice information collection device, and a method and a program for the same, that efficiently collect voice data. SOLUTION: A service providing server 100 has a function of performing authentication with, and collecting, voice data for text displayed as an image, and includes a voice information database 101, a voice candidate database 102, and a search history database 103 as storage means. It also includes a voice input request means 110, an image conversion means 120, a voice information acquiring means 130, a voice recognition means 140, a registration determination means 150, a voice registering means 160, a service provision determination means 170, a service providing means 180, and a text extracting means 190 as arithmetic processing means. COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to a voice information collection device that provides various information services, and to a method and a program for the same.

Conventionally, techniques such as image authentication and voice authentication have been used when various information services (downloading, account creation, and the like) are used on a network such as the Internet. These techniques are authentication systems for preventing unauthorized access by malicious programs (for example, random and repeated password entry). In image authentication, for example, an image in which the background or the angle of the characters has been altered, or an image that anyone can recognize, is displayed, and the user is asked to enter text representing the image. In voice authentication, for example, audio is played and the user is asked to enter what was heard as text.

Meanwhile, there is a technology called voice recognition, which is used for personal authentication and the like. Voice recognition is a technology in which a computer analyzes words spoken by a person and either extracts the spoken content as character data or identifies the individual, and it is attracting attention as a new input means. Because voice recognition uses statistical methods, data recording a large number of utterances is indispensable.
For example, a technique is known in which, for the purpose of language learning, conversation example data and misunderstanding case data are publicly solicited in order to collect learning data adapted to conversational situations (see, for example, Patent Document 1).

Patent Document 1: JP 2003-84654 A

However, in Patent Document 1, data for language learning is publicly solicited, so the contributors are language learners. The users who submit data are therefore limited, which restricts how broadly and in what volume data can be collected. In addition, collecting data through public solicitation is often a burden on users, and even if an incentive is offered, it is difficult to promote data collection.

An object of the present invention is to provide a voice information collection device, and a method and a program for the same, that can efficiently collect voice data.

The voice information collection device of the present invention is a voice information collection device that collects voice information while providing an information service to a terminal device connected via a network, and is characterized by comprising: voice information storage means for storing text data in association with pronunciation information of the text data; voice information acquisition means for transmitting text data stored in the voice information storage means to the terminal device for display, and acquiring voice information that is voice-input at the terminal device for that text data; information service providing means for performing voice recognition on the acquired voice information and starting provision of the information service when the voice recognition result matches the pronunciation information of the transmitted text data; and voice input request means for causing text data that is a collection target of the voice information to be displayed on the terminal device, in the same format as the text data that the voice information acquisition means causes the terminal device to display and alongside that displayed text data, so that the user voice-inputs the text data that is the collection target.

The present invention provides an information service through voice recognition using text data and its voice information, and can also collect voice information for text data. Here, voice information is data obtained by digitizing the voice of a person uttering the text data, and pronunciation information is data indicating how the voice data is read.
The information service is provided using text data for which voice information is already stored in the voice information storage means. The collection targets of voice information are all of the text data stored in the voice information storage means, but it is preferable to use text data for which no voice information is stored.

According to this invention, the voice information acquisition means and the information service providing means transmit, to the terminal device for display, text data for which voice information is already stored among the text data stored in the voice information storage means; perform voice recognition on the voice information that is voice-input for this text data; compare the voice recognition result with the pronunciation information stored in the voice information storage means; and provide the information service if they match. In addition, the voice input request means transmits text data stored in the voice information storage means to the terminal device together with the text data used for voice recognition, displays the two side by side, and has the user voice-input this text data.
By having the voice input for using the information service and the voice input for collecting voice information performed at the same time in this way, the user can be prompted to input voice without confusion. As a result, voice information for the text data to be collected can be gathered efficiently. Moreover, because a user tries to enter appropriate information when using an information service, more appropriate voice information can be obtained.

In the voice information collection device of the present invention, it is preferable to further comprise input information storage means for storing, in the voice information storage means, the voice information that is voice-input for the text data displayed by the voice input request means, in association with that text data.

In this invention, the voice information collection device described above stores, in the voice information storage means, the voice information that is voice-input for the text data to be collected, so more accurate voice information can be collected in the voice information storage means.

In the voice information collection device of the present invention, it is preferable to further comprise: input candidate storage means for storing the text data, the voice information, and the number of times the same voice information has been acquired, in association with one another; and registration determination means for determining that registration is possible when the number of times for given text data and voice information is equal to or greater than a predetermined number, wherein the input information storage means stores the voice information determined to be registrable in the voice information storage means in association with the text data.

According to this invention, the input candidate storage means stores the text data stored in the voice information storage means, together with the number of times the same voice information has been acquired for that text data. The greater the number of times, the more widely known the information can be said to be. The registration determination means therefore treats voice information acquired a predetermined number of times or more as registrable, and the input information storage means stores the registrable voice information in the voice information storage means in association with the text data, so more appropriate voice information for the text data can be collected. Data collected in this way is highly reliable.

In the voice information collection device of the present invention, it is preferable to further comprise search history storage means for storing a search history of keyword searches, wherein the voice input request means extracts, from the search history storage means and based on the search history, a keyword whose search rate has risen sharply, and causes the terminal device to display it as text data that is a collection target of the voice information.

In this invention, the voice input request means extracts a keyword whose search rate has risen sharply, based on the search history stored in the search history storage means, and uses it as text data that is a collection target of voice information. Because the search history storage means stores keywords searched for on search sites and the like, topical and trending keywords are stored promptly. Among these, keywords whose search rate has risen sharply are even more topical and trending. Voice information for highly topical and trending keywords can therefore be collected quickly and with high accuracy.
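
The keyword extraction described above might be sketched as follows. The text does not define what counts as a sharp rise in search rate, so this sketch assumes a simple day-over-day ratio; the function name `surging_keywords`, the `min_ratio` parameter, and the sample rows are all hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical search-history rows: (keyword, search date, search count).
HISTORY = [
    ("keyword_a", date(2008, 12, 10), 100),
    ("keyword_a", date(2008, 12, 11), 110),
    ("keyword_b", date(2008, 12, 10), 5),
    ("keyword_b", date(2008, 12, 11), 60),
]

def surging_keywords(history, previous_day, current_day, min_ratio=3.0):
    """Return keywords whose count on current_day is at least min_ratio
    times their count on previous_day (an assumed definition of a surge)."""
    counts = defaultdict(dict)
    for keyword, day, count in history:
        counts[keyword][day] = counts[keyword].get(day, 0) + count
    result = []
    for keyword, by_day in counts.items():
        before = by_day.get(previous_day, 0)
        after = by_day.get(current_day, 0)
        # Treat a missing previous-day count as 1 to avoid division by zero.
        if after / max(before, 1) >= min_ratio:
            result.append(keyword)
    return sorted(result)
```

Keywords returned by such a function would then serve as collection-target text data; other surge definitions (for example, comparison against a longer moving average) would fit the description equally well.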

In the voice information collection device of the present invention, it is preferable to further comprise image conversion means for generating image data by converting the text data into an image.
According to this invention, the text data extracted from the voice information storage means is converted into image data, and each piece of converted image data is transmitted to the terminal device for display. Because image data is difficult to analyze, unauthorized access and the like can be prevented and security can be improved.

In the voice information collection device of the present invention, the text data is preferably text data relating to an advertisement.
According to this invention, text data relating to an advertisement is used as the text data. Text data relating to an advertisement is information about a product or service, such as a product name, a manufacturer name, or a catchphrase, and this information is displayed as text. Consequently, the advertisement text catches the user's eye each time an information service is used, and the user speaks the advertisement text aloud, so the advertising effect can be improved.

The voice information collection method of the present invention is a voice information collection method for collecting voice information while providing an information service to a terminal device connected via a network, and is characterized by: storing text data in voice information storage means in association with pronunciation information of the text data; transmitting text data stored in the voice information storage means to the terminal device for display, and acquiring voice information that is voice-input at the terminal device for that text data; performing voice recognition on the acquired voice information and starting provision of the information service when the voice recognition result matches the pronunciation information of the transmitted text data; and causing text data that is a collection target of the voice information to be displayed on the terminal device, in the same format as the text data that the voice information acquisition means causes the terminal device to display and alongside that displayed text data, so that the user voice-inputs the text data that is the collection target.

According to this invention, text data for which voice information is already stored, among the text data stored in the voice information storage means, is transmitted to the terminal device for display; voice recognition is performed on the voice information that is voice-input for this text data; the voice recognition result is compared with the pronunciation information stored in the voice information storage means; and the information service is provided if they match. In addition, text data stored in the voice information storage means is transmitted to the terminal device together with the text data used for voice recognition, the two are displayed side by side, and the user is asked to voice-input this text data.
By performing the voice input for using the information service and the voice input for collecting voice information at the same time in this way, the user can be prompted to input voice without confusion. As a result, voice information for the text data to be collected can be gathered efficiently. Moreover, because a user tries to enter appropriate information when using an information service, more appropriate voice information can be obtained.

The voice information collection program of the present invention is characterized by causing a computer to execute the voice information collection method described above.
The voice information collection program of the present invention is also characterized by causing a computer to function as the voice information collection device described above.
According to this invention, the voice information collection program causes a computer to carry out the voice information collection method described above, so the same effects as described above can be obtained with a simple configuration that requires only installing the program, which makes it highly useful.

[First Embodiment]
A first embodiment of the present invention will be described with reference to FIGS. 1 to 4.
FIG. 1 is a block diagram showing a schematic configuration of a service providing system according to the first embodiment of the present invention, and FIG. 2 is a block diagram showing a schematic configuration of the service providing server in the first embodiment. FIG. 3 is a schematic diagram showing the start page of the first embodiment as displayed on a screen. FIG. 4 is a flowchart showing the operation of the service providing server in the first embodiment.
[Configuration of the service providing system]
As shown in FIG. 1, the service providing system 10 includes a service providing server 100, a web server 200 connected to the service providing server 100 via the Internet 20, and a terminal device 300.

The Internet 20 is an internet based on a general-purpose protocol such as TCP/IP, but is not limited to this. Any configuration capable of transmitting and receiving data can be used, for example an intranet such as a LAN (Local Area Network); a network such as a communication line network or a broadcasting network in which a plurality of base stations capable of transmitting and receiving information over a wireless medium form the network; or even the wireless medium itself serving as a medium for directly receiving data.

The service providing server 100 has functions of providing an information service and of collecting voice information for text data. That is, the service providing server 100 functions as the voice information collection device of the present invention.
As shown in FIG. 2, the service providing server 100 includes, as storage means, a voice information database 101 serving as the voice information storage means, a voice candidate database 102 serving as the input candidate storage means, and a search history database 103 serving as the search history storage means.

The voice information database 101 has, for example, as shown in Table 1, a table structure in which, for each item of text data, voice data representing that text data and reading data, which serves as pronunciation information indicating the reading of that text data, are stored as one record. The voice data is data obtained by digitizing the voice of an utterance of the text and is stored in any of various data formats. The reading data is stored as katakana or hiragana text data. Note that the voice information database 101 contains both text data for which voice data is stored and text data for which it is not.
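
One possible in-memory shape for a record of the voice information database 101 is sketched below. The class name `VoiceInfoRecord`, its field names, and the sample entries are hypothetical; the description only states that text data, voice data, and kana reading data form one record and that voice data may be absent.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class VoiceInfoRecord:
    text: str                      # text data
    reading: Optional[str] = None  # reading data in kana; None if not yet collected
    voice_data: List[bytes] = field(default_factory=list)  # digitized utterances

    def has_voice(self) -> bool:
        """True when both reading data and at least one utterance are stored."""
        return self.reading is not None and len(self.voice_data) > 0

# Illustrative rows only: one text with voice data, one without.
records = [
    VoiceInfoRecord("東京", "トウキョウ", [b"\x00\x01"]),
    VoiceInfoRecord("大阪"),
]
```

A record whose `has_voice()` is true can serve as authentication text, while one without voice data is a natural collection target.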

The voice candidate database 102 has, for example, as shown in Table 2, a table structure in which, for each item of text data, a voice data candidate for the text, a reading data candidate based on that voice data candidate, and an input count indicating how many times the same voice data candidate has been input for the same text data are stored as one record. The text data stored here is the same as the text data stored in the voice information database 101 described above, and a plurality of voice data candidates and a plurality of reading data candidates can exist for one item of text data. Voice data that a user has input by voice is stored as a voice data candidate, and katakana or hiragana text is stored as a reading data candidate. The input count stores the number of times the same voice data candidate has been voice-input for the same text.
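
As a rough illustration of how one text can carry several candidates, the sketch below models the voice candidate database 102 as a list of records. The class `VoiceCandidate`, the helper `candidates_for`, and the sample rows are hypothetical and only mirror the columns named in the description.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class VoiceCandidate:
    text: str          # same text data as in the voice information database
    voice_data: bytes  # voice data candidate input by a user
    reading: str       # reading data candidate (kana)
    input_count: int   # times this candidate was input for this text

# Illustrative rows: one text with two competing readings.
table = [
    VoiceCandidate("日本", b"\x01", "ニホン", 2),
    VoiceCandidate("日本", b"\x02", "ニッポン", 1),
    VoiceCandidate("東京", b"\x03", "トウキョウ", 1),
]

def candidates_for(rows: List[VoiceCandidate], text: str) -> List[VoiceCandidate]:
    """Return every candidate stored for the given text."""
    return [row for row in rows if row.text == text]
```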

The search history database 103 has, for example, as shown in Table 3, a table structure in which, for each keyword, the number of searches and the search date for that keyword are stored as one record. Keywords searched for on search sites and the like are stored as the keywords.

The service providing server 100 also includes, as arithmetic processing means, a voice input request means 110, an image conversion means 120, a voice information acquisition means 130, a voice recognition means 140, a registration determination means 150, a voice registration means 160 serving as the input information storage means, a service provision determination means 170, a service providing means 180 serving as the information service providing means, and a text extraction means 190. Although not shown, it may also include data transmission/reception means for transmitting and receiving data via the Internet 20, output means for outputting a web page as a screen display, input means such as a keyboard, and the like.

The voice input request means 110 transmits information on a start page for starting use of a service to the terminal device 300 via the Internet 20. This information includes authentication text data used to permit use of the service and target text data used to collect voice, as well as information such as the form that makes up the start page described later. As the authentication text data, text data for which voice data and reading data are already stored is extracted at random from the text data stored in the voice information database 101. As the target text data, text data for which voice data and reading data are not stored is preferentially extracted from the text data stored in the voice information database 101.
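
The two extraction rules above could be sketched as follows. This is a minimal illustration, assuming the database is reduced to `(text, has_voice_and_reading)` pairs; the function name `pick_texts` and the fallback used when every text already has voice data are assumptions, since the description only says that texts without voice data are preferred.

```python
import random

def pick_texts(records, rng=random):
    """records: list of (text, has_voice_and_reading) pairs (hypothetical shape).
    Returns (authentication text, target text)."""
    with_voice = [text for text, has in records if has]
    without_voice = [text for text, has in records if not has]
    if not with_voice:
        raise ValueError("no text with stored voice data is available for authentication")
    # Authentication text: random choice among texts that already have voice data.
    auth_text = rng.choice(with_voice)
    # Target text: prefer texts lacking voice data; otherwise fall back (assumed)
    # to any other stored text.
    pool = without_voice or [text for text in with_voice if text != auth_text]
    if not pool:
        raise ValueError("no text is available as a collection target")
    return auth_text, rng.choice(pool)
```

Passing a seeded `random.Random` instance as `rng` makes the selection reproducible, which is convenient when testing.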

The image conversion means 120 converts the authentication text data and the target text data extracted by the voice input request means 110 into image data. For example, font images of various characters are stored in advance in storage means (not shown), the font image corresponding to each character is extracted one character at a time, and the extracted font images are combined to generate an image corresponding to the text data. This image is then converted into bitmap-format image data (image data represented by a dot pattern) and deformed, for example by distorting it using random numbers. The method of converting text data into image data is not limited to this, and any method can be used.
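
The glyph-combining and distortion steps can be illustrated with a small pure-Python sketch. The 3x5 glyph table `GLYPHS` is a hypothetical stand-in for the stored font images, and the random per-row horizontal shift is just one assumed way of "distorting using random numbers"; the description does not fix a particular distortion.

```python
import random

# Hypothetical 3x5 glyph bitmaps standing in for stored font images.
GLYPHS = {
    "A": ["010", "101", "111", "101", "101"],
    "B": ["110", "101", "110", "101", "110"],
}
GLYPH_HEIGHT = 5

def text_to_bitmap(text, rng, max_shift=2):
    """Combine per-character glyphs into rows of 0/1 pixels, then distort
    each row by a random horizontal shift."""
    rows = []
    for y in range(GLYPH_HEIGHT):
        # One blank column separates adjacent characters.
        rows.append("0".join(GLYPHS[ch][y] for ch in text))
    bitmap = []
    for row in rows:
        shift = rng.randint(0, max_shift)
        # Pad on both sides so every row keeps the same width after shifting.
        shifted = "0" * shift + row + "0" * (max_shift - shift)
        bitmap.append([int(pixel) for pixel in shifted])
    return bitmap

if __name__ == "__main__":
    for line in text_to_bitmap("AB", random.Random(1)):
        print("".join("#" if pixel else "." for pixel in line))
```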

FIG. 3 shows a concrete example of the start page as displayed on the screen by the output means of the terminal device 300, based on the information transmitted by the voice input request means 110 and the image conversion means 120.
As shown in FIG. 3, the start page 30 includes an authentication area 31 used for starting the service, a target area 32 used for voice collection, an OK button 33, a cancel button 34, and an operation explanation section 35 that displays a sentence prompting voice input.

The authentication area 31 is provided with an authentication text display unit 311, a microphone display unit 312 showing an image of a microphone to indicate that voice input is possible, an input start button 313 for starting voice input of the authentication text displayed on the authentication text display unit 311, and an input end button 314 for ending voice input of that authentication text. The authentication text display unit 311 displays the image data obtained by converting the authentication text data into an image. The voice input between the pressing of the input start button 313 and the subsequent pressing of the input end button 314 is collected as authentication voice data.

The target area 32 is provided with a target text display unit 321, a microphone display unit 322 showing an image of a microphone to indicate that voice input is possible, an input start button 323 for starting voice input of the text displayed on the target text display unit 321, and an input end button 324 for ending voice input of that text. The target text display unit 321 displays the image data obtained by converting the target text data into an image. The voice input between the pressing of the input start button 323 and the subsequent pressing of the input end button 324 is collected as target voice data.

By giving the authentication area 31 and the target area 32, which prompt the user for voice input, the same display format in this way, the user can be led to input voice for the purpose of using the service without being conscious of voice collection, so voice information can be collected efficiently.

When pressed, the OK button 33 transmits the acquired authentication voice data and target voice data to the service providing server 100.
The cancel button 34 cancels the processing.
The operation explanation section 35 displays an explanatory sentence prompting voice input. The sentence is preferably worded so that both the authentication text and the target text are voice-input; for example, as shown in FIG. 3, "Please voice-input the following two readings." is displayed.

The voice information acquisition means 130 acquires the authentication voice data and the target voice data that are voice-input on the start page 30 displayed by the output means of the terminal device 300. Specifically, the voice data input at the terminal device 300 is transmitted to the service providing server 100 and received by the data transmission/reception means of the service providing server 100.

The voice recognition means 140 recognizes the authentication voice data and the target voice data acquired by the voice information acquisition means 130 and converts them into text data (converted text). The method of converting voice data into text data is not particularly limited, and a known method can be used.

The registration determination means 150 determines whether the acquired target voice data can be registered in the voice information database 101. Specifically, the target text data displayed as an image on the target text display unit 321 of the start page 30, the acquired target voice data, and the converted text obtained by the voice recognition means 140 converting the target voice data into text are matched against the text, voice data candidates, and reading candidates stored in the voice candidate database 102. If the stored input count of the matching text data, voice data candidate, and reading candidate is, for example, two or more, registration is permitted; if the input count is less than two, registration is not permitted. In other words, registration becomes possible once the same voice data candidate and reading candidate have been input three or more times for one text, counting the current input in addition to the stored count. Accordingly, if the input count is one, the input count in the voice candidate database 102 is incremented by one, and if the text is not yet registered, it is newly stored in the voice candidate database 102.
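
The decision flow above can be sketched as follows. This sketch assumes that candidates are keyed by `(text, reading)`, because the description does not say how two voice inputs are judged to be "the same"; the function name `judge_registration` and the constant name are hypothetical, while the threshold of two stored inputs is the example value given in the text.

```python
REQUIRED_PRIOR_INPUTS = 2  # the "for example, two or more" threshold in the text

def judge_registration(candidates, text, reading):
    """candidates maps (text, reading) -> stored input count.
    Returns True when the current input may be registered."""
    key = (text, reading)
    stored = candidates.get(key, 0)
    if stored >= REQUIRED_PRIOR_INPUTS:
        return True
    # Not yet registrable: create the entry or add one to its count.
    candidates[key] = stored + 1
    return False

if __name__ == "__main__":
    table = {}
    for attempt in range(1, 4):
        ok = judge_registration(table, "日本", "ニホン")
        print(attempt, ok)
```

With an empty candidate table, the first two identical inputs are only counted, and the third is judged registrable, which matches the "three or more times" statement.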

The voice registration means 160 stores the target voice data that the registration determination means 150 has determined to be registerable in the voice information database 101. Because the corresponding text data is already stored in the voice information database 101, the voice data and its reading data are stored. One or more pieces of voice data can be stored for one piece of text data. Registration may be skipped when voice data registered by the same user (identified, for example, by user authentication) or similar voice data has already been registered.

The service provision determination means 170 determines whether the service can be provided, using the authentication text data displayed as an authentication text image on the start page 30 and the acquired authentication voice data. Specifically, it matches the authentication text data, the authentication voice data, and the converted text obtained by the voice recognition means 140 from the authentication voice data against the text data, voice data, and reading data stored in the voice information database 101. If matching data exists, the service can be provided. Because the voice information database 101 stores multiple pieces of voice data for one piece of text data, a match with any one of them is sufficient. If there is no matching text data, the service is not provided.
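
The determination described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the description does not specify how two pieces of voice data are judged to match, so the sketch compares a hypothetical `voice_key` (for example, some fingerprint of the audio), and the names `VoiceEntry` and `can_provide_service` as well as the dictionary layout are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceEntry:
    voice_key: str  # hypothetical identifier standing in for stored voice data
    reading: str    # reading text obtained by voice recognition


def can_provide_service(voice_db: dict, auth_text: str,
                        voice_key: str, converted_text: str) -> bool:
    """Return True if the input matches any stored entry for auth_text.

    voice_db is an assumed stand-in for the voice information database 101,
    mapping each text to the list of VoiceEntry items stored for it.
    """
    entries = voice_db.get(auth_text)
    if not entries:
        # No matching text data (or no voice data for it): service not provided.
        return False
    candidate = VoiceEntry(voice_key, converted_text)
    # A match with any one of the stored entries is sufficient.
    return any(entry == candidate for entry in entries)
```

With two entries stored for one text, an input that agrees with either entry on both the voice key and the reading is accepted, and anything else is rejected.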

The service providing means 180 provides the service when the service provision determination means 170 has determined that the service can be provided. Examples of services include downloads and the creation of new accounts. Information on the top page or service start page of each service is transmitted to the terminal device 300, and the services become available according to the content displayed on these web pages.

The text extraction means 190 extracts, from the keywords stored in the search history database 103, keywords whose search count has risen sharply, and stores them in the voice information database 101. For example, because the search history database 103 stores the number of searches per day, a keyword for which the ratio between the previous day's and the current day's search counts exceeds a predetermined value is extracted as a rapidly rising word and stored in the voice information database 101. The text extraction means 190 checks the search history database 103 periodically, and the interval can be adjusted as appropriate, for example every hour, every three hours, or every five hours.
The voice input request means 110 can extract text data from the voice information database 101, and can also extract keywords whose search count has risen sharply via the text extraction means 190.
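
As a minimal sketch of the ratio test described above, the following assumes that the ratio is taken as the current day's count divided by the previous day's count, and that keywords with no searches on the previous day are skipped. Both choices, the function name `extract_rising_keywords`, and the input layout are assumptions made for illustration; the description only states that the ratio is compared with a predetermined value.

```python
def extract_rising_keywords(daily_counts: dict, ratio_threshold: float) -> list:
    """Return keywords whose day-over-day search ratio exceeds the threshold.

    daily_counts maps keyword -> (previous_day_count, current_day_count).
    """
    rising = []
    for keyword, (previous, current) in daily_counts.items():
        if previous <= 0:
            # Assumption: no baseline for the previous day, so no ratio is computed.
            continue
        if current / previous > ratio_threshold:
            rising.append(keyword)
    return rising
```

A job that calls this function at the chosen interval (every hour, every three hours, and so on) and writes the result into the voice information database would correspond to the periodic check.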

The web server 200 is a device that provides various kinds of information on websites; for example, the web pages of a search site are registered on it.
Although not illustrated, the terminal device 300 includes, as arithmetic processing means, terminal transmitting/receiving means that requests a desired service from the service providing server 100 and receives the web page of the requested service; output means that outputs the web page as a screen display; and input means such as a mouse and keyboard for character input and a microphone for voice input. As storage means, it includes, for example, a database that stores form data for various forms. The terminal device 300 is not particularly limited; examples include a mobile phone and a notebook personal computer.
Although FIG. 1 illustrates a configuration in which one terminal device 300 is connected to one service providing server 100, in practice the service providing server 100 is connected to many terminal devices 300.

[Operation of the Service Providing System]
Next, the operation of the service providing system 10 will be described with reference to FIG. 1, FIG. 2, and FIG. 4.
First, the user operates the input means of the terminal device 300 and, to access the service providing server 100, for example starts a web browser, enters an address, and requests the information service to be used, for example a download service.
In step S1, when the service providing server 100 receives the request for the information service through transmitting/receiving means (not shown), the process proceeds to step S2.
In step S2, the voice input request means 110 transmits to the terminal device 300 start page information, such as authentication image data and target image data, for display as the start page 30 shown in FIG. 3. The authentication image data is displayed in the authentication text display section 311; it is produced by randomly extracting, from the text data stored in the voice information database 101, text data for which voice data is stored, and converting it into an image with the image conversion means 120. The target image data is displayed in the target text display section 321; it is produced by preferentially extracting, from the text data stored in the voice information database 101, text data for which no voice data is stored, and converting it into an image with the image conversion means 120.
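
The selection of the two texts in step S2 might look like the sketch below. It leaves out the image conversion and assumes a dictionary mapping each text to its list of stored voice entries. The fallback used when every text already has voice data is an assumption (the description only says such text is extracted "preferentially"), and `choose_prompts` is a hypothetical name.

```python
import random


def choose_prompts(voice_db: dict, rng: random.Random) -> tuple:
    """Pick (authentication_text, target_text) for the start page.

    voice_db maps text -> list of stored voice entries (possibly empty).
    """
    with_voice = [t for t, entries in voice_db.items() if entries]
    without_voice = [t for t, entries in voice_db.items() if not entries]
    if not with_voice:
        raise ValueError("no text with stored voice data is available for authentication")
    # Authentication text: chosen at random among texts that already have voice data.
    auth_text = rng.choice(with_voice)
    # Target text: prefer texts without voice data; otherwise (assumption) any other text.
    pool = without_voice or [t for t in voice_db if t != auth_text]
    target_text = rng.choice(pool) if pool else None
    return auth_text, target_text
```

Passing a seeded `random.Random` instance keeps the choice reproducible in tests, while a server would more likely use an unseeded generator.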

The terminal device 300 receives the start page information through transmitting/receiving means (not shown) and displays the start page 30 through the output means. On the start page 30, the user presses the input start button 313 and then, using input means such as a microphone, reads aloud the text displayed in the authentication text display section 311 to begin voice input. The user then presses the input end button 314 to end the voice input. Similarly, the user presses the input start button 323, reads aloud the text displayed in the target text display section 321 using input means such as a microphone, and then presses the input end button 324 to end the voice input. When the user presses the OK button 33, the authentication voice data and the target voice data that were input are transmitted to the service providing server 100.

In step S3, the voice information acquisition means 130 of the service providing server 100 acquires the authentication voice data and the target voice data input at the terminal device 300, and the process proceeds to step S4. Specifically, the authentication voice data and the target voice data are received by the transmitting/receiving means.
In step S4, the voice recognition means 140 recognizes the acquired authentication voice data and target voice data and converts them into text data, and the process proceeds to step S5. The text data converted here serves as the reading data of the voice data.
In step S5, the registration determination means 150 determines whether the acquired target voice data can be registered in the voice information database 101. Specifically, the target text displayed as image data in the target text display section 321 of the start page 30, the acquired target voice data, and the reading data generated by converting the target voice data into text are collated with the text data, voice data candidates, and reading data candidates stored in the voice candidate database 102. If matching data exists and its input count is 2 or more, registration is determined to be possible and the process proceeds to step S6. If the input count is less than 2, registration is determined not to be possible because the predetermined number of inputs has not been reached, and the process proceeds to step S7.

In step S6, the voice registration means 160 stores the target voice data determined to be registerable, together with its reading data, as the voice data and reading data of the corresponding text data stored in the voice information database 101, and the process proceeds to step S8. If identical voice data and reading data are already stored in the voice information database 101, this processing is not performed. The input count for the corresponding text data, voice data candidate, and reading data candidate in the voice candidate database 102 is then incremented by 1.
In step S7, the input count for the corresponding text data, voice data candidate, and reading data candidate in the voice candidate database 102 is incremented by 1. If the voice candidate database 102 contains no voice data candidate and reading data candidate matching the acquired voice data and reading data, they are newly stored. The process then proceeds to step S8.
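
Steps S5 to S7 can be summarized in a short sketch. It uses two plain dictionaries as hypothetical stand-ins for the voice candidate database 102 and the voice information database 101, and again represents voice data by an assumed `voice_key`. The function name `process_target_input` and the returned step labels are illustrative only.

```python
def process_target_input(text, voice_key, reading, candidates, voice_db, threshold=2):
    """Route one target voice input through steps S5 to S7.

    candidates: dict mapping (text, voice_key, reading) -> input count (database 102).
    voice_db:   dict mapping text -> list of (voice_key, reading) pairs (database 101).
    Returns "S6" if the input was judged registerable, otherwise "S7".
    """
    key = (text, voice_key, reading)
    count = candidates.get(key, 0)
    if count >= threshold:
        # Step S6: register, unless identical data is already stored.
        entries = voice_db.setdefault(text, [])
        if (voice_key, reading) not in entries:
            entries.append((voice_key, reading))
        step = "S6"
    else:
        # Step S7: not enough matching inputs yet.
        step = "S7"
    # Both branches increment the count; a new candidate starts at 1.
    candidates[key] = count + 1
    return step
```

With the default threshold of 2, the first two identical inputs take the S7 branch and the third takes S6, which matches the "three or more inputs" condition stated for the registration determination.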

In step S8, the service provision determination means 170 determines whether the authentication voice data acquired in step S3 and the text data converted in step S4 match the voice data and reading data of the corresponding text data stored in the voice information database 101. If they match, the service can be provided and the process proceeds to step S9. If they do not match, the service cannot be provided; the process returns to step S2, and the start page 30 is transmitted again to request voice input. At this point, the authentication text image displayed in the authentication text display section 311 may be changed to a different authentication text image.
In step S9, the service providing means 180 provides the service to the terminal device 300. In this embodiment, as the download service, for example, the file to be downloaded is transmitted and displayed on the screen by the output means of the terminal device 300.

In addition, the text extraction means 190 periodically accesses the search history database 103, extracts rapidly rising words for which the ratio between the previous day's and the current day's search history is equal to or greater than a predetermined value, and stores them as text data in the voice information database 101.

[Effects of the First Embodiment]
The first embodiment described above provides the following effects.
In the service providing server 100, the service provision determination means 170 determines whether the service can be provided based on the authentication text data displayed as an image on the start page 30 and the authentication voice data input by voice for that authentication text data, while the registration determination means 150 and the voice registration means 160 store in the voice information database 101 the target voice data input by voice on the start page 30 for the target text data.
That is, target voice data for the target text data can be collected every time the service is provided, and statistical data can be collected without making the user aware that data is being collected. In addition, because the authentication text data, which is read aloud for service provision, and the target text data, which is read aloud for voice collection, are displayed side by side in the same format, voice input can be prompted without making the user aware of the voice collection.

Furthermore, the target voice data and reading data are stored in the voice information database 101 when the same voice data and the same reading data have been input three or more times for one piece of text data, so highly reliable data can be collected. The voice information database 101 built in this way can be used widely as a voice database and is highly useful.

In addition, the text extraction means 190 periodically accesses the search history database 103, extracts keywords whose search count has risen sharply, and stores them in the voice information database 101. Because such keywords reflect the topics and trends of the moment, voice data for them can be collected efficiently and promptly. Voice data for the latest keywords can therefore be collected quickly and with high accuracy.

Furthermore, in this embodiment, the text data is converted into an image by the image conversion means 120 and displayed as image data in the authentication text display section 311 and the target text display section 321. Because image data is difficult to analyze, unauthorized access and the like can be prevented, and security can be improved.

[Second Embodiment]
A second embodiment of the present invention will be described with reference to FIG. 5.
FIG. 5 is a block diagram showing a schematic configuration of the service providing server in the second embodiment of the present invention.
[Configuration of the Service Providing System]
The second embodiment has the same configuration as the first embodiment except that authentication is performed using advertisement text data stored in the advertisement information database 104. The following describes the parts that differ from the first embodiment.

The advertisement information database 104 has, for example, as shown in Table 4 below, a table structure in which, for each piece of advertisement text data, voice data and its reading data, which are advertisement attribute information relating to the advertisement image, are stored as one record. Examples of the advertisement text data include the name of the product shown in the advertisement image data, the manufacturer, and the catchphrase, which are stored as text data. As the reading data, the text obtained by analyzing the voice data and converting it into text is stored in katakana. The reading data may instead be stored in hiragana.
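
One way to picture such a record is the following sketch. The class and field names are hypothetical and do not reproduce the columns of Table 4. The record holds a list of pronunciations on the assumption, taken from the later description of the service provision determination, that several pieces of voice data and reading data may be stored for one advertisement text.

```python
from dataclasses import dataclass, field


@dataclass
class Pronunciation:
    voice_data: bytes  # recorded audio for the advertisement text
    reading: str       # reading stored in katakana (hiragana is also allowed)


@dataclass
class AdRecord:
    ad_text: str  # product name, manufacturer, catchphrase, etc.
    pronunciations: list = field(default_factory=list)

    def add(self, voice_data: bytes, reading: str) -> None:
        """Attach one more voice/reading pair to this advertisement text."""
        self.pronunciations.append(Pronunciation(voice_data, reading))
```

A record for a made-up product name would be created as `AdRecord("サンプル茶")` and filled by calling `add` with the audio bytes and a reading such as `"サンプルチャ"`.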

Because whether the service can be provided is determined using advertisement text data, the operations of the voice input request means 112 and the service provision determination means 172 differ from those in the first embodiment, and they are described below.
The voice input request means 112 transmits information on the start page for starting use of the service to the terminal device 300 via the Internet 20. This information includes an authentication image for determining whether the service can be provided and a target image for collecting voice, as well as information such as the form that makes up the start page. As the authentication image, image data obtained by converting advertisement text data randomly extracted from the advertisement information database 104 into an image with the image conversion means 120 is displayed. As the target image, image data obtained by preferentially extracting, from the text stored in the voice information database 101, text for which no voice data and reading data are stored, and converting it into an image with the image conversion means 120, is displayed.

The service provision determination means 172 determines whether the service can be provided, using the advertisement text data displayed as an image on the start page 30 and the acquired advertisement voice data. Specifically, the advertisement text data displayed as the authentication image, the acquired advertisement voice data, and the reading data obtained by the voice recognition means 140 from the advertisement voice data are collated with the advertisement text data, voice data, and reading data stored in the voice information database 101. If they match, the service can be provided. Because the advertisement information database 104 stores multiple pieces of voice data and reading data for one piece of advertisement text data, a match with any one of them is sufficient. If there is no matching data, the service is not provided.

The service providing server 400 configured as described above can collect voice data for the text data stored in the voice information database 101 through the same operation as in the first embodiment. The advertisement information database 104 stores data based on advertisement information transmitted from an advertiser terminal (not shown) connected via the Internet 20.

[Effects of the Second Embodiment]
In addition to the effects of the first embodiment, the second embodiment described above provides the following effects.
The service providing server 400 displays advertisement text data as an image on the start page 30. An advertisement is therefore displayed each time use of the service begins, which improves the advertising effect. In particular, an advertisement on the start page 30, which is always displayed when the service is used, readily catches the user's eye and is highly effective. Moreover, because the user speaks the advertisement text as it is, the user is more likely to register the product name, manufacturer name, catchphrase, and so on, further improving the advertising effect.

[Modifications]
The present invention is not limited to the embodiments described above and includes, for example, the following modifications within a range in which the object of the present invention can be achieved.
When the voice information database 101 holds multiple pieces of voice data for one piece of text data, a voice data collection grouped by text data can be created. Because multiple pieces of voice data can be collected in this way, accuracy can be improved when the collection is used as a voice information database.
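
Grouping the collected voice data by text, as this modification describes, could be sketched as below. The flat row layout `(text, voice_key, reading)` and the function name `group_voice_data` are assumptions made for the example; the description does not specify an export format.

```python
from collections import defaultdict


def group_voice_data(records):
    """Group rows of (text, voice_key, reading) into a per-text collection."""
    grouped = defaultdict(list)
    for text, voice_key, reading in records:
        grouped[text].append((voice_key, reading))
    # Return a plain dict so that looking up a missing text raises KeyError.
    return dict(grouped)
```

Each key of the result is one text, and its value lists every voice/reading pair collected for that text in input order.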

In the above embodiments, the target voice input information is stored in the voice information storage means regardless of whether the service can be provided, but it may instead be stored only when the service can be provided. Because accurate voice input is needed for the service to be provided, requiring re-input makes it possible to collect more reliable voice information. It is also likely that clearly anomalous voice data, such as noise from erroneous operation or speech from a person who does not know Japanese, can be excluded.
Furthermore, the voice information storage means used in the above embodiments is configured to collect voice data and the reading data for it, but it is not limited to this, and the items may be changed, added, or removed as appropriate according to the information to be collected.

Furthermore, in the above embodiments, the text data acquired from the voice information database 101 is converted into image data by the image conversion means 120, but the text data may instead be transmitted to the terminal device and displayed as is, without conversion. This simplifies the configuration of the service providing server 100 and speeds up processing.

In the above embodiments, the registration determination means 150 permits registration in the voice information database 101 when the same target voice data has been input by voice three times for the target text data displayed on the terminal device 300, but the number of times used to determine whether registration is possible is not limited to this. To increase reliability, it is preferable to increase the number of times.
Note that users who have frequently been permitted to use the service on the basis of authentication text data and authentication voice data are highly reliable, so highly reliable voice data can be collected from them.

The present invention can be used as a voice information collection device that provides various information services.

FIG. 1 is a block diagram showing a schematic configuration of the service providing system according to the first embodiment of the present invention.
FIG. 2 is a block diagram showing a schematic configuration of the service providing server in the first embodiment.
FIG. 3 is a schematic diagram showing the start page of the first embodiment as displayed on the screen.
FIG. 4 is a flowchart showing the operation of the service providing system in the first embodiment.
FIG. 5 is a block diagram showing a schematic configuration of the service providing server in the second embodiment.

Explanation of symbols

10 Service providing system
20 Internet
30 Start page
31 Authentication area
32 Target area
100 Service providing server
101 Voice information database
102 Voice candidate database
103 Search history database
104 Advertisement information database
110, 112 Voice input request means
120 Image conversion means
130 Voice information acquisition means
140 Voice recognition means
150 Registration determination means
160 Voice registration means
170, 172 Service provision determination means
180 Service providing means
190 Text extraction means
200 Web server
300 Terminal device

Claims

A voice information collection device that collects voice information while providing an information service to a terminal device connected via a network, comprising:
voice information storage means for storing text data in association with pronunciation information of the text data;
voice information acquisition means for transmitting text data stored in the voice information storage means to the terminal device for display, and acquiring voice information input by voice at the terminal device for that text data; and
information service providing means for performing voice recognition on the acquired voice information and starting provision of the information service when the voice recognition result matches the pronunciation information of the transmitted text data,
wherein the voice information acquisition means comprises voice input request means for causing the terminal device to display text data for which voice information is to be collected, in the same format as and in parallel with the displayed text data, thereby prompting voice input for the text data to be collected.

The voice information collection device according to claim 1,
further comprising input information storage means for storing, in the voice information storage means, the voice information input by voice for the text data displayed by the voice input request means, in association with that text data.

The voice information collection device according to claim 2, further comprising:
input candidate storage means for storing the text data, the voice information, and the number of times the same voice information has been acquired, in association with one another; and
registration determination means for determining that registration is possible when the number of times for given text data and voice information is equal to or greater than a predetermined number,
wherein the input information storage means stores the voice information determined to be registerable in the voice information storage means in association with the text data.

The voice information collection device according to any one of claims 1 to 3,
further comprising search history storage means for storing a search history of keyword searches,
wherein the voice input request means
extracts, from the search history storage means, a keyword whose search rate has risen sharply based on the search history, and causes the terminal device to display it as the text data for which voice information is to be collected.

The voice information collection device according to any one of claims 1 to 4,
further comprising image conversion means for generating image data by converting the text data into an image.

The voice information collection device according to any one of claims 1 to 5,
wherein the text data is text data relating to an advertisement.

A voice information collection method for collecting voice information while providing an information service to a terminal device connected via a network, the method comprising:
storing text data in voice information storage means in association with pronunciation information of the text data;
transmitting the text data stored in the voice information storage means to the terminal device for display, and acquiring voice information input by voice at the terminal device for that text data;
performing voice recognition on the acquired voice information and starting provision of the information service when the voice recognition result matches the pronunciation information of the transmitted text data; and
causing the terminal device to display text data for which voice information is to be collected, in the same format as and in parallel with the displayed text data, thereby prompting voice input for the text data to be collected.

A voice information collection program that causes a computer to execute the voice information collection method according to claim 7.

A voice information collection program that causes a computer to function as the voice information collection device according to any one of claims 1 to 6.