JP7136606B2 - CONTENT CONTROL DEVICE, CONTROL METHOD AND CONTROL PROGRAM

Info

Publication number: JP7136606B2
Application number: JP2018122475A
Authority: JP
Inventors: 郁雄北岸
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2018-06-27
Filing date: 2018-06-27
Publication date: 2022-09-13
Anticipated expiration: 2038-06-27
Also published as: JP2020004056A

Description

Embodiments of the present invention relate to a content control device, a control method, and a control program.

In recent years, with the spread of the Internet, various services that support daily life have come to be provided. For example, there are techniques for assisting a user in operating a device.

For example, Patent Document 1 discloses a technique that lets users, such as visually impaired people, learn how to operate a device without placing a burden on them.

Patent Document 1: JP 2007-13469 A

However, with the conventional technology described above, it is not always possible to let a user easily control audio content when that content is being listened to in the background while the user does something else. For example, the conventional technology concerns a digital multifunction peripheral (MFP) that integrates into a single unit the functions of office automation (OA) equipment such as facsimile machines, printers, and scanners. So that even users such as visually impaired people can easily learn how to operate the MFP, it controls the output of operation explanation information about the MFP, which is provided by voice, in accordance with the user's declaration.

This conventional technology appears to presuppose that the user is listening attentively to the operation explanation information provided by voice. Consequently, when the operation explanation information is only being listened to in the background while the user does something else, for example, it is not always possible to let the user easily control it.

The present application has been made in view of the above, and its object is to let a user easily control audio content even when that content is being listened to in the background.

A content control device according to the present application includes a reception unit that receives, from a user, a voice instruction to perform an operation that controls reproduction of output audio content, and a content control unit that performs, on the audio content, the operation corresponding to the voice instruction received by the reception unit.

According to one aspect of the embodiment, a user can easily control audio content even when that content is being listened to in the background.

FIG. 1 is a diagram illustrating an example of content control processing according to the embodiment.
FIG. 2 is a diagram illustrating a configuration example of a control system according to the embodiment.
FIG. 3 is a diagram illustrating a configuration example of a content control device according to the embodiment.
FIG. 4 is a diagram illustrating an example of a content storage unit according to the embodiment.
FIG. 5 is a diagram illustrating an example of an action history storage unit according to the embodiment.
FIG. 6 is a diagram illustrating an example of an operation history storage unit according to the embodiment.
FIG. 7 is a diagram illustrating an example of a user information storage unit according to the embodiment.
FIG. 8 is a diagram illustrating an example of a distribution content storage unit according to the embodiment.
FIG. 9 is a flowchart illustrating an insertion processing procedure according to the embodiment.
FIG. 10 is a diagram schematically illustrating insertion processing according to the embodiment.
FIG. 11 is a flowchart illustrating a UI processing procedure according to the embodiment.
FIG. 12 is a hardware configuration diagram illustrating an example of a computer that implements the functions of the content control device.

Modes for carrying out the content control device, control method, and control program according to the present application (hereinafter referred to as "embodiments") will be described below with reference to the drawings. The content control device, control method, and control program according to the present application are not limited by these embodiments. In the following embodiments, the same parts are denoted by the same reference numerals, and overlapping descriptions are omitted.

[1. Content control processing]
First, an example of content control processing according to the embodiment will be described with reference to FIG. 1. FIG. 1 is a diagram illustrating an example of content control processing according to the embodiment. The content control processing according to the embodiment is performed by the content control device 100. It comprises two control processes: processing that inserts second audio content into first audio content (hereinafter, "insertion processing"), and processing that receives a user's voice instruction concerning the inserted second audio content and performs the corresponding operation on that content (hereinafter, "UI processing").

Prior to the description of FIG. 1, the control system according to the embodiment will be described with reference to FIG. 2. FIG. 2 is a diagram illustrating a configuration example of the control system 1 according to the embodiment. As shown in FIG. 2, the control system 1 includes a terminal device 10, an output device 30, an external device 60, and a content control device 100. The terminal device 10, the output device 30, the external device 60, and the content control device 100 are communicably connected via a network N by wire or wirelessly. The control system 1 shown in FIG. 2 may include a plurality of terminal devices 10, a plurality of output devices 30, a plurality of external devices 60, and a plurality of content control devices 100.

The terminal device 10 is an information processing device used by a user. The terminal device 10 is, for example, a smartphone, a tablet terminal, a notebook PC (Personal Computer), a desktop PC, a mobile phone, or a PDA (Personal Digital Assistant). For example, the terminal device 10 displays various kinds of content (for example, article content, advertising content, blogs, shopping content, and music content) on its display screen in accordance with user operations.

As will be described later, the terminal device 10 also displays on its screen, or outputs as audio, content transmitted from the content control device 100. That is, the terminal device 10 can operate in cooperation with the content control device 100. To realize such cooperation, a predetermined application may be installed on the terminal device 10 in advance.

The output device 30 is, for example, a smart speaker, a car navigation system, or a mobile robot. In this embodiment, the output device 30 is assumed to be a smart speaker. Therefore, in the following embodiments, the output device 30 may be referred to as the "smart speaker 30".

The external device 60 is a server device that serves as the provider of the first audio content and the second audio content to the content control device 100. The external device 60 may be a server device belonging to the business operator that manages the content control device 100 (hereinafter, "business operator Z"), or a server device managed by a different business operator (that is, another company from the perspective of business operator Z).

Here, the premise of the content control processing according to the embodiment will be described. The embodiment does not provide audio content in response to a user's explicit wish to listen to it. Rather, it presupposes processing and controlling audio content so that a user who is busy with some task can catch it "out of the corner of the ear". In other words, the embodiment presupposes that audio content listened to in the background is processed and controlled, and that such content can be controlled in accordance with the user's instructions even while it is being listened to in the background.

Examples of content control processing for such background listening are given below. One example is a scene in which the user has woken up and is getting ready for work. For example, when the content control device 100 detects that the user is getting ready for work, it performs control so that the weather forecast for the area around the user's workplace is played as audio content partway through the main audio content (for example, a music channel) being output from the smart speaker 30. The content control device 100 also performs control so that delay information for the train line the user usually takes is played as audio content partway through the main audio content.

The content control device 100 also performs control so that headlines of news the user is interested in are played as audio content partway through the main audio content. Likewise, it performs control so that music of the user's favorite genre, or special offers from restaurants near the user's workplace, are played as audio content partway through the main audio content.

Another example is a scene in which the user is doing laundry. For example, when the content control device 100 detects that the user is doing laundry, it performs control so that online shopping or advertising content about detergents and washing machines is played as audio content partway through the main audio content (for example, a music channel) being output from the smart speaker 30. The content control device 100 also performs control so that the weather forecast for the area around the user's home, or special offers from supermarkets near the user's home, are played as audio content partway through the main audio content.

Yet another example is a scene in which the user is cooking. For example, when the content control device 100 detects that the user is cooking, it performs control so that radio shopping or advertising content about cooking utensils is played as audio content partway through the main audio content (for example, a music channel) being output from the smart speaker 30.

In the above examples, the main audio content output from the smart speaker 30 corresponds to the first audio content. The audio content that is controlled so as to be played partway through the main audio content corresponds to the second audio content.

It is also assumed that, in such a state, the content control device 100 operates the second audio content being listened to in the background in accordance with the user's goal. Here, the user's goal is, for example, to rewind or fast-forward the second audio content that has been output.

Based on the above premises, the content control device 100 performs the content control processing according to the embodiment. Specifically, as the insertion processing, the content control device 100 acquires candidate second audio content to be inserted into the first audio content and, based on user information about the user at the delivery destination, selects from the candidates the second audio content to be inserted into the first audio content. More specifically, based on the user information, the content control device 100 determines the target second audio content to be inserted and the time position within the first audio content at which it is to be inserted. The content control device 100 then selects the determined second audio content and actually inserts it into the first audio content.

As the UI processing, when the content control device 100 receives from the user a voice instruction to perform an operation that controls reproduction of the second audio content inserted into the first audio content, it performs the operation corresponding to the received voice instruction on that audio content. For example, the content control device 100 receives a voice instruction to rewind, cue (return to the beginning), clip (extract), fast-forward, or skip as the operation that controls reproduction, and performs the corresponding operation on the second audio content. An example of the content control processing is described below. In the example of FIG. 1, steps S1 to S6 correspond to the insertion processing, and steps S7 to S8 correspond to the UI processing.

First, the content control device 100 acquires candidate second audio content to be inserted into the first audio content (step S1). For example, the content control device 100 can acquire source data (for example, text) for the candidate second audio content from the external device 60. The content control device 100 may also acquire the first audio content from the external device 60. In the example of FIG. 1, the content control device 100 acquires news content SC11, advertising content SC21, route information SC31, and weather information SC41 as source data for the candidate second audio content.

News content, route information, and weather information may need to be up to date. Therefore, for first audio content whose delivery schedule is fixed, the content control device 100 may, for example, acquire the latest news content SC11, route information SC31, and weather information SC41 as of 10 minutes before the scheduled delivery time, and perform the following processing before the scheduled delivery time arrives.
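As a rough illustration of the timing described above, the following Python sketch computes when the source data would be fetched for a scheduled delivery. The function name `fetch_time` and the `lead` parameter are hypothetical; the 10-minute lead is only the example value given in the text.

```python
from datetime import datetime, timedelta


def fetch_time(scheduled: datetime,
               lead: timedelta = timedelta(minutes=10)) -> datetime:
    """Return the time at which the latest source data should be fetched.

    `lead` is an assumed, configurable margin before the scheduled delivery.
    """
    return scheduled - lead


# A delivery scheduled for 7:00 would fetch its news, route, and
# weather data at 6:50 under the 10-minute example.
when = fetch_time(datetime(2018, 6, 27, 7, 0))
```

The margin has to be long enough for the editing, conversion, and insertion steps that follow to finish before delivery starts.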

The content control device 100 converts the acquired source data into audio data (step S2). The content control device 100 first restructures and edits the acquired source data. For example, when the source data exceeds a predetermined number of characters, the content control device 100 edits it so that it fits within that number. Also, for example, the content control device 100 analyzes the text and restructures the source data if there is a problem with the sentence structure or the like. The content control device 100 then converts the restructured and edited source data into audio data. The converted audio data is the candidate second audio content to be inserted into the first audio content.
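The editing step can be pictured with the minimal Python sketch below, which trims source text to a character limit and prefers to cut at a sentence boundary. The function name `fit_to_limit`, the set of sentence terminators, and the cutting rule are assumptions for illustration; the embodiment does not specify how the text is shortened, and the speech-synthesis step is not shown.

```python
def fit_to_limit(text: str, max_chars: int) -> str:
    """Trim `text` to at most `max_chars` characters.

    Assumed rule: if the text is too long, cut at the last sentence
    terminator that fits; if none fits, fall back to a hard cut.
    """
    if len(text) <= max_chars:
        return text
    clipped = text[:max_chars]
    # Position of the last sentence terminator inside the clipped text.
    cut = max(clipped.rfind(ch) for ch in ".!?。")
    if cut >= 0:
        return clipped[:cut + 1]
    return clipped.rstrip()


short = fit_to_limit("Rain is expected. Trains are delayed by ten minutes.", 30)
```

Cutting at a sentence boundary keeps the synthesized speech from ending mid-sentence, at the cost of sometimes using less than the full character budget.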

The content control device 100 may instead acquire from the external device 60 candidate second audio content that has already been converted into audio data. In that case, the content control device 100 does not, for example, restructure or edit the data, and uses the acquired audio data as the second audio content in the following processing.

Next, based on the user information about the user at the delivery destination, the content control device 100 determines the target second audio content to be inserted into the first audio content and the time position within the first audio content at which the target second audio content is to be inserted (step S3).

First, an example of the user information about the user at the delivery destination will be described. In the example of FIG. 1, the user at the delivery destination is user U1. The content control device 100 therefore acquires the behavior history of user U1 as the user information of user U1. For example, the content control device 100 acquires the behavior history of user U1 from the terminal device 10 of user U1. In one example, the content control device 100 acquires a behavior history on the Internet as the behavior history of user U1. The kind of Internet behavior history acquired is not limited; for example, the content control device 100 acquires content browsing history, purchase history, and video and music viewing history.

In another example, the content control device 100 acquires a behavior history in daily life as the behavior history of user U1. The kind of daily-life behavior history acquired is not limited; for example, the content control device 100 acquires wake-up time, bedtime, train lines used, workplace location, and movement history. The content control device 100 may also identify trends in the lifestyle of user U1 from the daily-life behavior history and treat this lifestyle as one kind of behavior history.

The content control device 100 may also acquire, as the behavior history of user U1, behavior information indicating what user U1 is doing at this very moment. For example, when user U1 is currently doing laundry, the content control device 100 acquires that fact as behavior history. Likewise, when user U1 is currently cooking, the content control device 100 acquires that fact as behavior history.

Although an example of acquiring the behavior history of user U1 as the user information has been described, the content control device 100 can also acquire the operation history of user U1 as the user information of user U1. Specifically, the content control device 100 acquires the history of operations that user U1 has performed on the smart speaker 30. For example, the content control device 100 acquires an operation history indicating what content user U1 listened to by operating the smart speaker 30.

Returning to step S3, the content control device 100 calculates, based on the user information, what kind of audio content user U1 tends to listen to at what time, that is, the relationship between time slots and the audio content user U1 tends to listen to in each slot. Based on the calculation result, the content control device 100 then determines the target second audio content and the time position at which the target second audio content is to be inserted.

In the example of FIG. 1, assume that the content control device 100 has calculated that user U1 tends to browse web news between 7:00 and 8:00 every morning, and has identified that manga is a hobby of user U1. Assume further that the first audio content scheduled for delivery between 7:00 and 8:00 a.m. is music content MC1. In this case, the content control device 100 determines news content SC11 and advertising content SC21 as the target second audio content to be inserted into music content MC1, the first audio content. Advertising content SC21 is, for example, advertising content for manga, matching the hobby of user U1.
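One simple way to picture the tendency calculation is the Python sketch below, which counts, for each hour of the day, the content category the user consumed most often and then filters candidates for a delivery hour. This is only an assumed illustration: the embodiment does not specify the calculation, and the names `hourly_tendency` and `select_for_hour` and the `(hour, category)` record shape are hypothetical. Hobby-based selection, such as the manga advertisement, is not covered here.

```python
from collections import Counter, defaultdict


def hourly_tendency(history):
    """Map each hour to the category seen most often in that hour.

    `history` is an iterable of (hour, category) records.
    """
    counts = defaultdict(Counter)
    for hour, category in history:
        counts[hour][category] += 1
    return {hour: c.most_common(1)[0][0] for hour, c in counts.items()}


def select_for_hour(tendency, hour, candidates):
    """Keep the candidates whose category matches the user's tendency."""
    wanted = tendency.get(hour)
    return [c for c in candidates if c["category"] == wanted]


history = [(7, "news"), (7, "news"), (7, "music"), (21, "music")]
tendency = hourly_tendency(history)
candidates = [{"id": "SC11", "category": "news"},
              {"id": "SC41", "category": "weather"}]
chosen = select_for_hour(tendency, 7, candidates)
```

A real system would likely weight recent records more heavily and combine several signals, but a per-hour majority count is enough to reproduce the "web news between 7:00 and 8:00" observation in the example.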

The content control device 100 also determines the time positions within music content MC1: the 5-minute mark of playback (corresponding to 7:05 a.m.) as the position at which news content SC11 is inserted, and the 30-minute mark of playback (corresponding to 7:30 a.m.) as the position at which advertising content SC21 is inserted.

For example, for content that needs to be up to date, such as news content, the content control device 100 determines the time position so that the content is output as early as possible after delivery of the first audio content starts. The same applies to weather information. On the other hand, for content that does not need to be particularly up to date, such as advertising content, the content control device 100 determines the time position so that the content is output around the middle of the first audio content.
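The placement rule above could be sketched in Python as follows. The category names, the `insertion_offset` function, and the 5-minute early offset are assumptions chosen to reproduce the example of a 60-minute program (news at the 5-minute mark, advertising at the 30-minute mark); the embodiment only states "as early as possible" and "around the middle".

```python
# Categories assumed to need up-to-date delivery (hypothetical labels).
REALTIME_CATEGORIES = {"news", "weather", "route"}


def insertion_offset(category: str, program_minutes: float,
                     early_minutes: float = 5.0) -> float:
    """Return the playback offset (in minutes) at which to insert content.

    Time-sensitive content goes near the start; everything else goes
    at the midpoint of the first audio content.
    """
    if category in REALTIME_CATEGORIES:
        return min(early_minutes, program_minutes)
    return program_minutes / 2


news_at = insertion_offset("news", 60)
ad_at = insertion_offset("advertising", 60)
```

With a 60-minute program this yields offsets of 5 and 30 minutes, matching the positions used for SC11 and SC21 in the example.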

Based on the information determined in step S3, the content control device 100 then inserts news content SC11 and advertising content SC21 into music content MC1, the first audio content (step S4). Hereinafter, music content MC1 with news content SC11 and advertising content SC21 inserted is referred to as music content MSC1.
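The insertion in step S4 can be illustrated with the Python sketch below, which splits the first audio content at each determined time position and places the second audio content between the pieces. Treating insertion as "split and interleave" is an assumption (the text does not say whether the main content is paused or overlaid), and `build_timeline` and its tuple layout are hypothetical.

```python
def build_timeline(main_id, main_minutes, inserts):
    """Interleave inserted content with segments of the main content.

    `inserts` is a list of (offset_minutes, content_id). The result is a
    list of (content_id, start, end); inserted items are played in full,
    so their start and end are left as None. Offsets are assumed to lie
    within the main content.
    """
    timeline = []
    cursor = 0
    for offset, content_id in sorted(inserts):
        if offset > cursor:
            timeline.append((main_id, cursor, offset))
            cursor = offset
        timeline.append((content_id, None, None))
    if cursor < main_minutes:
        timeline.append((main_id, cursor, main_minutes))
    return timeline


msc1 = build_timeline("MC1", 60, [(30, "SC21"), (5, "SC11")])
```

For the example, the resulting timeline plays MC1 for 5 minutes, then SC11, then MC1 up to the 30-minute mark, then SC21, then the remainder of MC1.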

In this state, the content control device 100 performs predetermined output control on music content MSC1 (step S5). For example, when user U1 is getting ready for work, user U1 may be so preoccupied that music content MSC1 is barely heard. Therefore, for example, when user U1 is predicted to be getting ready for work, the content control device 100 controls the volume so that music content MSC1 is output at a volume higher than a predetermined value. The content control device 100 may instead control the volume so that only news content SC11 and advertising content SC21, the second audio content included in music content MSC1, are output at a volume higher than the predetermined value.
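A minimal Python sketch of this volume rule follows. The function `output_volume`, the integer volume scale, and the `step` above the threshold are assumptions; the text only requires that the volume exceed a predetermined value when the user is predicted to be busy.

```python
def output_volume(current: int, threshold: int, user_busy: bool,
                  step: int = 1) -> int:
    """Return the volume to use for output.

    If the user is predicted to be busy and the current volume does not
    exceed the threshold, raise it to just above the threshold.
    """
    if user_busy and current <= threshold:
        return threshold + step
    return current


raised = output_volume(current=5, threshold=6, user_busy=True)
```

To boost only the inserted second audio content, the same function could be applied per segment, passing `user_busy=False` for segments of the main content.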

The content control device 100 then delivers music content MSC1 according to its delivery schedule (for example, delivery scheduled to start at 7:00 a.m.) (step S6). Specifically, the content control device 100 transmits music content MSC1 to the smart speaker 30, thereby causing the smart speaker 30 to deliver (output or play) music content MSC1.

In this state, assume that user U1 is actually getting ready for work while listening to music content MSC1 in the background. Assume further that, after news content SC11 has been output, user U1 wants to hear it again and says "Let me hear it again" to the smart speaker 30. This voice instruction instructs the content control device 100 to perform an operation that controls reproduction of news content SC11, the second audio content.

On receiving such a voice instruction, the smart speaker 30 transmits it to the content control device 100. For example, the smart speaker 30 transmits audio data corresponding to the voice instruction to the content control device 100. The content control device 100 thereby receives the voice instruction from user U1 (step S7).

The content control device 100 then determines that the voice instruction "Let me hear it again" means playing news content SC11 from the beginning, that is, cueing it, and cues news content SC11 (step S8). For example, the content control device 100 instructs the smart speaker 30 to cue news content SC11.
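Steps S7 and S8 can be pictured with the Python sketch below, which maps a recognized utterance to a playback operation and applies it to a playback position. Only the pairing of "Let me hear it again" with cueing comes from the text; the other phrases, the function names `parse_instruction` and `apply_operation`, and the 10-second step are hypothetical. Speech recognition itself is assumed to have already produced the text.

```python
# Phrase-to-operation table; all entries except the first are assumed.
PHRASES = {
    "let me hear it again": "cue",
    "go back a little": "rewind",
    "fast forward": "fast_forward",
    "skip this": "skip",
}


def parse_instruction(utterance: str):
    """Return the operation named by the utterance, or None if unknown."""
    text = utterance.strip().lower()
    for phrase, operation in PHRASES.items():
        if phrase in text:
            return operation
    return None


def apply_operation(operation, position: int, length: int,
                    step: int = 10) -> int:
    """Return the new playback position (seconds) of the second content."""
    if operation == "cue":
        return 0  # play again from the beginning
    if operation == "rewind":
        return max(0, position - step)
    if operation == "fast_forward":
        return min(length, position + step)
    if operation == "skip":
        return length  # jump to the end of the inserted content
    return position  # unknown instruction: leave playback unchanged


operation = parse_instruction("Let me hear it again")
new_position = apply_operation(operation, position=42, length=90)
```

Clipping (extracting) is omitted because it saves a portion of the content rather than moving the playback position.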

So far, using FIG. 1 and taking user U1 as an example, it has been shown how the content control device 100 inserts sub audio content (second audio content) into main audio content (first audio content) in accordance with the user information of user U1, and controls reproduction of audio content in accordance with the voice instructions of user U1. In practice, however, the content control device 100 performs this content control processing for each individual user. In other words, the content control device 100 personalizes audio content for each user.

また、コンテンツ制御装置１００は、図１で説明したコンテンツ制御処理のうち、ＵＩ処理を行うことで、音声コンテンツがながら聴取されている場合であっても、当該音楽コンテンツを容易に制御させることができる。また、コンテンツ制御装置１００は、図１で説明したコンテンツ制御処理のうち、挿入処理を行うことで、ユーザが「耳の端で捉えられる」程度に加工および制御した音声コンテンツをユーザに提供することができるため、ながら聴取の中でもユーザの欲する情報を効果的に提供することができる。 In addition, by performing the UI processing among the content control processing described with reference to FIG. 1, the content control device 100 allows the user to easily control the content even when the audio content is being listened to while the user is doing something else. Also, by performing the insertion processing among the content control processing described with reference to FIG. 1, the content control device 100 can provide the user with audio content that has been processed and controlled to the extent that the user can "catch it with the edge of the ear", and can therefore effectively provide the information the user wants even during such background listening.

〔２．コンテンツ制御装置の構成〕 [2. Configuration of content control device]
次に、図３を用いて、実施形態にかかるコンテンツ制御装置１００について説明する。図３は、実施形態にかかるコンテンツ制御装置１００の構成例を示す図である。図３に示すように、コンテンツ制御装置１００は、通信部１１０と、記憶部１２０と、制御部１３０とを有する。 Next, the content control device 100 according to the embodiment will be described with reference to FIG. 3. FIG. 3 is a diagram showing a configuration example of the content control device 100 according to the embodiment. As shown in FIG. 3, the content control device 100 has a communication unit 110, a storage unit 120, and a control unit 130.

（通信部１１０について） (Regarding the communication unit 110)
通信部１１０は、例えば、ＮＩＣ（Network Interface Card）等によって実現される。そして、通信部１１０は、ネットワークＮと有線または無線で接続され、例えば、端末装置１０、スマートスピーカー３０との間で情報の送受信を行う。 The communication unit 110 is realized by, for example, a NIC (Network Interface Card) or the like. The communication unit 110 is connected to the network N by wire or wirelessly, and transmits and receives information to and from, for example, the terminal device 10 and the smart speaker 30.

（記憶部１２０について） (Regarding the storage unit 120)
記憶部１２０は、例えば、ＲＡＭ（Random Access Memory)、フラッシュメモリ等の半導体メモリ素子またはハードディスク、光ディスク等の記憶装置によって実現される。記憶部１２０は、コンテンツ記憶部１２１と、行動履歴記憶部１２２と、操作履歴記憶部１２３と、ユーザ情報記憶部１２４と、配信コンテンツ記憶部１２５とを有する。 The storage unit 120 is realized by, for example, a semiconductor memory device such as a RAM (Random Access Memory) or a flash memory, or a storage device such as a hard disk or an optical disk. The storage unit 120 has a content storage unit 121, an action history storage unit 122, an operation history storage unit 123, a user information storage unit 124, and a distribution content storage unit 125.

（コンテンツ記憶部１２１について） (Regarding the content storage unit 121)
コンテンツ記憶部１２１は、第１の音声コンテンツおよび第２の音声コンテンツに関する情報を記憶する。ここで、図４に実施形態にかかるコンテンツ記憶部１２１の一例を示す。図４の例では、コンテンツ記憶部１２１には、第１の音声コンテンツを記憶するコンテンツ記憶部１２１－１と、第２の音声コンテンツを記憶するコンテンツ記憶部１２１－２とを有する。 The content storage unit 121 stores information regarding the first audio content and the second audio content. Here, FIG. 4 shows an example of the content storage unit 121 according to the embodiment. In the example of FIG. 4, the content storage unit 121 has a content storage unit 121-1 that stores first audio content and a content storage unit 121-2 that stores second audio content.

図４の例では、コンテンツ記憶部１２１－１は、「カテゴリ」、「コンテンツＩＤ」、「第１の音声コンテンツ」、「配信日時」といった項目を有する。 In the example of FIG. 4, the content storage unit 121-1 has items such as "category", "content ID", "first audio content", and "delivery date and time".

「カテゴリ」は、「第１の音声コンテンツ」が属するカテゴリを示す。「第１の音声コンテンツ」が音楽コンテンツである場合には、「カテゴリ」として「音楽」が入力される。「コンテンツＩＤ」は、「第１の音声コンテンツ」を識別する識別情報を示す。「第１の音声コンテンツ」は、第１の音声コンテンツそのもの、つまり音声データを示す。「配信時刻」は、「第１の音声コンテンツ」が配信される配信日時（タイムスケジュール）を示す。以下、第１の音声コンテンツを区別して表記する場合には、「コンテンツＩＤ」を用いる。例えば、コンテンツＩＤ「ＭＣ１」によって識別される第１の音声コンテンツであれば、第１の音声コンテンツＭＣ１と表記する。 "Category" indicates the category to which the "first audio content" belongs. If the "first audio content" is music content, "music" is entered as the "category". "Content ID" indicates identification information for identifying the "first audio content". "First audio content" indicates the first audio content itself, that is, the audio data. "Delivery date and time" indicates the date and time (time schedule) at which the "first audio content" is delivered. In the following description, the "content ID" is used to distinguish individual first audio contents. For example, the first audio content identified by the content ID "MC1" is written as the first audio content MC1.

すなわち、図４の例では、コンテンツＩＤ「ＭＣ１」によって識別される第１の音声コンテンツ「ＭＤＡ１１」は、カテゴリ「音楽」に属し、配信日時「午前７時～午前８時」であることを示す。 That is, the example of FIG. 4 indicates that the first audio content "MDA11" identified by the content ID "MC1" belongs to the category "music" and has a delivery date and time of "7:00 am to 8:00 am".

また、図４の例では、コンテンツ記憶部１２１－２は、「カテゴリ」、「コンテンツＩＤ」、「変換前コンテンツ」、「変換後コンテンツ」といった項目を有する。 In the example of FIG. 4, the content storage unit 121-2 has items such as "category", "content ID", "pre-conversion content", and "post-conversion content".

「カテゴリ」は、「第２の音声コンテンツ」が属するカテゴリを示す。「第２の音声コンテンツ」が広告コンテンツである場合には、「カテゴリ」として「広告」が入力される。「コンテンツＩＤ」は、「第２の音声コンテンツ」を識別する識別情報を示す。「変換前コンテンツ」は、編成・編集・音声データへの変換が行われる前のコンテンツであり、第２の音声コンテンツの元データに対応する。図１でも示したが、「変換前コンテンツ」は、例えば、テキストデータであり、外部装置６０から取得される。 "Category" indicates the category to which the "second audio content" belongs. If the "second audio content" is advertisement content, "advertisement" is entered as the "category." "Content ID" indicates identification information for identifying the "second audio content". "Pre-conversion content" is content before being organized, edited, and converted into audio data, and corresponds to the original data of the second audio content. As shown in FIG. 1, the “pre-conversion content” is text data, for example, and is acquired from the external device 60 .

「変換後コンテンツ」は、「変換前コンテンツ」が編成および編集され、さらに音声データへの変換された後の音声コンテンツであり、第１の音声コンテンツに挿入される候補の第２の音声コンテンツに対応する。以下、第２の音声コンテンツを区別して表記する場合には、「コンテンツＩＤ」を用いる。例えば、コンテンツＩＤ「ＳＣ１１」によって識別される第２の音声コンテンツであれば、第２の音声コンテンツＳＣ１１と表記する。 "Post-conversion content" is the audio content obtained after the "pre-conversion content" has been organized, edited, and converted into audio data, and corresponds to a candidate second audio content to be inserted into the first audio content. In the following description, the "content ID" is used to distinguish individual second audio contents. For example, the second audio content identified by the content ID "SC11" is written as the second audio content SC11.

すなわち、図４の例では、コンテンツＩＤ「ＳＣ１１」によって識別される第２の音声コンテンツ「ＳＤＡ１１ｂ」は、カテゴリ「ニュース」に属することを示す。なお、コンテンツ記憶部１２１－２に記憶される第２の音声コンテンツは、第１の音声コンテンツに挿入される候補の第２の音声コンテンツである。 That is, the example of FIG. 4 indicates that the second audio content "SDA11b" identified by the content ID "SC11" belongs to the category "news." The second audio content stored in the content storage unit 121-2 is a candidate second audio content to be inserted into the first audio content.
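The two tables of FIG. 4 described above can be pictured as two record types. The following is only a minimal sketch: the class and field names are hypothetical, the audio payloads are placeholder byte strings, and the pre-conversion text is a made-up placeholder, since the source gives only the item names and the example values "MC1", "MDA11", "SC11", and "SDA11b".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FirstAudioContent:
    """One row of content storage unit 121-1 (field names are assumptions)."""
    category: str          # e.g. "music"
    content_id: str        # e.g. "MC1"
    audio: bytes           # the audio data itself
    delivery_window: str   # delivery date and time (time schedule)


@dataclass
class SecondAudioContent:
    """One row of content storage unit 121-2 (field names are assumptions)."""
    category: str                            # e.g. "news" or "advertisement"
    content_id: str                          # e.g. "SC11"
    pre_conversion: str                      # source text from the external device
    post_conversion: Optional[bytes] = None  # audio data after conversion


# Placeholder payloads standing in for real audio data and source text.
mc1 = FirstAudioContent("music", "MC1", b"MDA11", "07:00-08:00")
sc11 = SecondAudioContent("news", "SC11", "placeholder headline text", b"SDA11b")
```

Keeping the pre-conversion text and the converted audio in the same record mirrors the "pre-conversion content" and "post-conversion content" items of the second table.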

（行動履歴記憶部１２２について） (Regarding the action history storage unit 122)
行動履歴記憶部１２２は、ユーザの行動履歴を記憶する。ここで、図５に実施形態にかかる行動履歴記憶部１２２の一例を示す。図５の例では、行動履歴記憶部１２２は、「ユーザＩＤ」、「カテゴリ」、「日時」、「行動履歴」といった項目を有する。 The action history storage unit 122 stores a user's action history. Here, FIG. 5 shows an example of the action history storage unit 122 according to the embodiment. In the example of FIG. 5, the action history storage unit 122 has items such as "user ID", "category", "date and time", and "action history".

「ユーザＩＤ」は、ユーザまたはユーザの端末装置１０を識別する識別情報を示す。「カテゴリ」は、対応する「行動履歴」がどのような状況下での行動履歴なのかを識別する識別情報を示す。例えば、対応する「行動履歴」がインターネット上での行動履歴である場合には、カテゴリ「ネット」が対応付けられる。また、例えば、対応する「行動履歴」が日常生活（生活導線上）での行動履歴である場合には、カテゴリ「日常生活」が対応付けられる。 “User ID” indicates identification information for identifying the user or the user's terminal device 10 . "Category" indicates identification information that identifies under what circumstances the corresponding "action history" is the action history. For example, when the corresponding "action history" is the action history on the Internet, the category "net" is associated. Further, for example, when the corresponding "action history" is the action history in daily life (on the life line), the category "daily life" is associated.

「日時」は、「行動履歴」が示す行動が行われた日時を示す。「行動履歴」は、ユーザによって実際に行われた行動の履歴情報を示す。インターネット上の行動履歴としては、例えば、コンテンツに対する閲覧履歴、購買履歴、動画や音楽の視聴履歴等が挙げられる。また、日常生活上の行動履歴としては、起床時刻、就寝時刻、利用路線、職場位置、移動履歴等が挙げられる。 The "date and time" indicates the date and time when the action indicated by the "action history" was performed. “Action history” indicates history information of actions actually performed by the user. The action history on the Internet includes, for example, browsing history of contents, purchase history, viewing history of moving images and music, and the like. Also, the behavior history in daily life includes wake-up time, bedtime, used route, workplace location, movement history, and the like.

また、コンテンツ制御装置１００は、ユーザがまさに現在行っている行動（例えば、洗濯や料理）を検知出来た場合には、この行動を示す行動情報を行動履歴として記憶してもよい。なお、コンテンツ制御装置１００は、例えば、スマートスピーカー３０を介して、ユーザがまさに現在行っている行動を検知することができる。例えば、コンテンツ制御装置１００は、スマートスピーカー３０が洗濯機の動作音を検出した場合には、ユーザは現在、洗濯をしていることを検知する。 Also, if the content control device 100 can detect an action that the user is currently performing (for example, washing or cooking), the content control device 100 may store action information indicating this action as an action history. Note that the content control device 100 can detect, for example, the action that the user is currently performing via the smart speaker 30 . For example, the content control device 100 detects that the user is currently doing laundry when the smart speaker 30 detects the operation sound of the washing machine.

また、コンテンツ制御装置１００は、上記行動履歴から、ユーザの生活スタイルの傾向を特定し、傾向として特定した生活スタイルも記憶してよい。 Also, the content control device 100 may identify a tendency of the user's lifestyle from the action history, and may also store the identified lifestyle as the tendency.

すなわち、図５の例では、ユーザＩＤ「Ｕ１」によって識別されるユーザ（ユーザＵ１）が、「２０１８年５月２０日２０時」に「化粧品サイト」にアクセスしたことを示す。 That is, the example of FIG. 5 indicates that the user (user U1) identified by the user ID "U1" accessed the "cosmetics site" at "20:00 on May 20, 2018".

（操作履歴記憶部１２３について） (Regarding the operation history storage unit 123)
操作履歴記憶部１２３は、ユーザの行動履歴を記憶する。本実施形態では、操作履歴記憶部１２３は、スマートスピーカー３０に対するユーザの操作履歴を記憶するものとする。ここで、図６に実施形態にかかる操作履歴記憶部１２３の一例を示す。図６の例では、操作履歴記憶部１２３は、「ユーザＩＤ」、「日時」、「操作履歴」といった項目を有する。 The operation history storage unit 123 stores a history of the user's actions. In this embodiment, the operation history storage unit 123 stores the history of the user's operations on the smart speaker 30. Here, FIG. 6 shows an example of the operation history storage unit 123 according to the embodiment. In the example of FIG. 6, the operation history storage unit 123 has items such as "user ID", "date and time", and "operation history".

「ユーザＩＤ」は、ユーザまたはユーザの端末装置１０を識別する識別情報を示す。「日時」は、「操作履歴」が示す行動が行われた日時を示す。「操作履歴」は、ユーザがスマートスピーカー３０を操作した操作履歴を示す。具体的には、「操作履歴」は、ユーザがスマートスピーカー３０を操作し、どのようなコンテンツ（チャンネルまたは番組）を聞いたかを示す操作履歴を示す。 “User ID” indicates identification information for identifying the user or the user's terminal device 10 . The "date and time" indicates the date and time when the action indicated by the "operation history" was performed. The “operation history” indicates the operation history of the smart speaker 30 operated by the user. Specifically, “operation history” indicates an operation history indicating what content (channel or program) the user has listened to by operating the smart speaker 30 .

すなわち、図６の例では、ユーザＩＤ「Ｕ１」によって識別されるユーザ（ユーザＵ１）が、「２０１８年５月２０日７時」に、ユーザがスマートスピーカー３０を操作し、音楽コンテンツであるチャンネルＣｈ１を流したことを示す。 That is, the example of FIG. 6 indicates that the user (user U1) identified by the user ID "U1" operated the smart speaker 30 at "7:00 on May 20, 2018" and played channel Ch1, which is music content.

（ユーザ情報記憶部１２４について） (Regarding the user information storage unit 124)
ユーザ情報記憶部１２４は、ユーザに関する各種情報を記憶する。例えば、ユーザ情報記憶部１２４は、「行動履歴」や「操作履歴」から特定されたユーザの生活スタイルや行動傾向、趣味嗜好等を記憶する。ここで、図７に実施形態にかかるユーザ情報記憶部１２４の一例を示す。図７の例では、ユーザ情報記憶部１２４は、「ユーザＩＤ」、「属性情報」、「趣味嗜好」、「生活状況」、「傾向情報」といった項目を有する。 The user information storage unit 124 stores various information about users. For example, the user information storage unit 124 stores the user's lifestyle, behavioral tendencies, hobbies and preferences, and the like identified from the "action history" and "operation history". Here, FIG. 7 shows an example of the user information storage unit 124 according to the embodiment. In the example of FIG. 7, the user information storage unit 124 has items such as "user ID", "attribute information", "hobbies and preferences", "living situation", and "tendency information".

「ユーザＩＤ」は、ユーザまたはユーザの端末装置１０を識別する識別情報を示す。「属性情報」は、ユーザの属性情報を示す。「属性情報」としては、例えば、ユーザの年齢や性別が挙げられる。「趣味嗜好」は、ユーザの趣味嗜好を示す。例えば、コンテンツ制御装置１００は、ユーザの「行動履歴」や「操作履歴」に基づいて、ユーザの趣味嗜好を特定する。 “User ID” indicates identification information for identifying the user or the user's terminal device 10 . "Attribute information" indicates user attribute information. "Attribute information" includes, for example, the user's age and gender. “Preferences” indicates the preferences of the user. For example, the content control device 100 identifies the user's hobbies and preferences based on the user's "behavior history" and "operation history."

「生活状況」は、ユーザの生活スタイル（生活の傾向）を示す。例えば、コンテンツ制御装置１００は、ユーザの「行動履歴」や「操作履歴」に基づいて、ユーザの生活状況を特定する。「傾向情報」は、ユーザの行動傾向を示す。「傾向情報」は、例えば、ユーザがどの時間帯にどのような音声コンテンツを聞く傾向にあるかを示す情報である。 “Lifestyle” indicates the user's lifestyle (living tendency). For example, the content control device 100 identifies the user's living situation based on the user's "behavior history" and "operation history." "Tendency information" indicates user behavior tendencies. The “tendency information” is, for example, information indicating what kind of audio content the user tends to listen to in what time slot.

すなわち、図７の例では、ユーザＩＤ「Ｕ１」によって識別されるユーザ（ユーザＵ１）は、趣味嗜好「マンガ、ファッション」であり、生活状況「７時起床、六本木周辺で勤務」、傾向情報「７時台にニュースコンテンツを視聴、７時台に天気予報を視聴、２０時台にネットショッピングを視聴」であることを示す。 That is, the example of FIG. 7 indicates that the user (user U1) identified by the user ID "U1" has the hobbies and preferences "manga, fashion", the living situation "wakes up at 7:00, works around Roppongi", and the tendency information "views news content in the 7:00 hour, views the weather forecast in the 7:00 hour, views online shopping in the 20:00 hour".

（配信コンテンツ記憶部１２５について） (Regarding the distribution content storage unit 125)
配信コンテンツ記憶部１２５は、ユーザに対して実際に配信される音声コンテンツ、すなわち第２の音声コンテンツが挿入された第１の音声コンテンツを記憶する。なお、以下の実施形態では、第２の音声コンテンツが挿入された第１の音声コンテンツを「配信コンテンツ」と表記する場合がある。ここで、図８に実施形態にかかる配信コンテンツ記憶部１２５の一例を示す。図８の例では、配信コンテンツ記憶部１２５は、「ユーザＩＤ」、「配信コンテンツＩＤ」、「配信コンテンツ」といった項目を有する。 The distribution content storage unit 125 stores the audio content actually distributed to the user, that is, the first audio content into which the second audio content has been inserted. In the following embodiments, the first audio content into which the second audio content has been inserted may be referred to as "distribution content". Here, FIG. 8 shows an example of the distribution content storage unit 125 according to the embodiment. In the example of FIG. 8, the distribution content storage unit 125 has items such as "user ID", "distribution content ID", and "distribution content".

「ユーザＩＤ」は、ユーザまたはユーザの端末装置１０を識別する識別情報を示す。「配信コンテンツＩＤ」は、配信コンテンツを識別する識別情報を示す。「配信コンテンツ」は、配信コンテンツそのもの、つまり音声データを示す。 “User ID” indicates identification information for identifying the user or the user's terminal device 10 . “Distribution content ID” indicates identification information for identifying distribution content. “Distributed content” indicates the distributed content itself, that is, audio data.

（制御部１３０について） (Regarding the control unit 130)
図３に戻り、制御部１３０は、ＣＰＵ（Central Processing Unit）やＭＰＵ（Micro Processing Unit）等によって、コンテンツ制御装置１００内部の記憶装置に記憶されている各種プログラムがＲＡＭを作業領域として実行されることにより実現される。また、制御部１３０は、例えば、ＡＳＩＣ（Application Specific Integrated Circuit）やＦＰＧＡ（Field Programmable Gate Array）等の集積回路により実現される。 Returning to FIG. 3, the control unit 130 is realized by a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or the like executing various programs stored in a storage device inside the content control device 100, using the RAM as a work area. The control unit 130 is also realized by, for example, an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).

図３に示すように、制御部１３０は、取得部１３１と、変換部１３２と、特定部１３３と、決定部１３４と、選択部１３５と、出力制御部１３６と、受付部１３７と、コンテンツ制御部１３８と、配信部１３９とを有し、以下に説明する情報処理の機能や作用を実現または実行する。なお、制御部１３０の内部構成は、図３に示した構成に限られず、後述する情報処理を行う構成であれば他の構成であってもよい。また、制御部１３０が有する各処理部の接続関係は、図３に示した接続関係に限られず、他の接続関係であってもよい。 As shown in FIG. 3, the control unit 130 has an acquisition unit 131, a conversion unit 132, an identification unit 133, a determination unit 134, a selection unit 135, an output control unit 136, a reception unit 137, a content control unit 138, and a distribution unit 139, and implements or executes the information processing functions and actions described below. Note that the internal configuration of the control unit 130 is not limited to the configuration shown in FIG. 3, and may be another configuration as long as it performs the information processing described later. Moreover, the connection relationship between the processing units of the control unit 130 is not limited to the connection relationship shown in FIG. 3, and may be another connection relationship.

（取得部１３１について） (Regarding the acquisition unit 131)
取得部１３１は、第１の音声コンテンツに挿入される候補の第２の音声コンテンツを取得する。例えば、取得部１３１は、候補の第２の音声コンテンツの元データ（例えば、テキスト）を取得する。また、取得部１３１は、コンテンツ制御処理（挿入処理）が行われるタイミングになるとコンテンツ記憶部１２１－２から候補の第２の音声コンテンツを取得する。 The acquisition unit 131 acquires candidate second audio content to be inserted into the first audio content. For example, the acquisition unit 131 acquires the original data (for example, text) of the candidate second audio content. Further, the acquisition unit 131 acquires the candidate second audio content from the content storage unit 121-2 at the timing when the content control processing (insertion processing) is performed.

なお、取得部１３１は、候補の第２の音声コンテンツとして、リアルタイム性が求められるような音声コンテンツの元データについては、第１の音声コンテンツに対して決められている配信のタイムスケジュールに基づいて、例えば、配信予定時刻の１０分前（１０分前に限定されない）の最新情報の元データを取得する。そして、取得部１３１は、この最新情報の元データを用いて、コンテンツ制御処理（挿入処理）を行わせる。 Note that, for the original data of candidate second audio content that requires real-time freshness, the acquisition unit 131 acquires the original data of the latest information as of, for example, 10 minutes before the scheduled delivery time (the lead time is not limited to 10 minutes), based on the delivery time schedule determined for the first audio content. The acquisition unit 131 then causes the content control processing (insertion processing) to be performed using the original data of this latest information.

例えば、取得部１３１は、元データを外部装置６０から取得することができる。一方で、取得部１３１は、元データではなく、候補の第２の音声コンテンツそのもの（音声データに変換済）を外部装置６０から取得してもよい。また、取得部１３１は、第１の音声データについても外部装置６０から取得してよい。また、取得部１３１は、コンテンツ制御処理（挿入処理）が行われるタイミングになるとコンテンツ記憶部１２１－１から第１の音声コンテンツを取得する。 For example, the acquisition unit 131 can acquire original data from the external device 60 . On the other hand, the acquisition unit 131 may acquire the candidate second audio content itself (converted into audio data) from the external device 60 instead of the original data. The acquisition unit 131 may also acquire the first audio data from the external device 60 . Also, the acquisition unit 131 acquires the first audio content from the content storage unit 121-1 at the timing of performing the content control process (insertion process).

（変換部１３２について） (Regarding the conversion unit 132)
変換部１３２は、取得部１３１により取得された元データをまず編成および編集する。例えば、変換部１３２は、元データであるテキストの文字数が所定の文字数を超えている場合には、予め決められた文字数に収まるよう元データを編集する。また、例えば、変換部１３２は、元データであるテキストを解析し、文章構成等に不具合がある場合には、元データを編成し直す。そして、変換部１３２は、編成および編集した元データを音声データに変換する。 The conversion unit 132 first organizes and edits the original data acquired by the acquisition unit 131. For example, when the number of characters of the text that is the original data exceeds a predetermined number of characters, the conversion unit 132 edits the original data so that it fits within the predetermined number of characters. Also, for example, the conversion unit 132 analyzes the text that is the original data, and reorganizes the original data if there is a problem with the sentence structure or the like. Then, the conversion unit 132 converts the organized and edited original data into audio data.
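The edit-then-convert flow of the conversion unit 132 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the character limit `MAX_CHARS`, the truncation strategy, and the function names are hypothetical, and `synthesize` is a stand-in that merely encodes the text, where a real system would call a text-to-speech engine.

```python
MAX_CHARS = 40  # assumed limit; the source only says "a predetermined number of characters"


def edit_text(text: str, max_chars: int = MAX_CHARS) -> str:
    """Normalize whitespace and trim the text so it fits within max_chars."""
    text = " ".join(text.split())
    if len(text) <= max_chars:
        return text
    return text[:max_chars].rstrip()


def synthesize(text: str) -> bytes:
    """Stand-in for a text-to-speech engine (assumption): returns UTF-8 bytes."""
    return text.encode("utf-8")


def convert(source_text: str, max_chars: int = MAX_CHARS) -> bytes:
    """Edit the source text first, then convert the result to 'audio data'."""
    return synthesize(edit_text(source_text, max_chars))
```

Simple truncation is used here only because it is easy to verify; the source leaves open how the text is shortened or reorganized.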

（特定部１３３について） (Regarding the identification unit 133)
特定部１３３は、配信コンテンツが配信される配信先のユーザについて、当該ユーザの行動傾向、生活スタイル（生活状況）、趣味嗜好を特定する。具体的には、特定部１３３は、ユーザ情報として、行動履歴記憶部１２２に記憶される行動履歴や、操作履歴記憶部１２３に記憶される操作履歴に基づいて、行動傾向、生活スタイル（生活状況）、趣味嗜好を特定する。 The identification unit 133 identifies the behavioral tendencies, lifestyle (living situation), and hobbies and preferences of the user to whom the distribution content is distributed. Specifically, the identification unit 133 identifies the behavioral tendencies, lifestyle (living situation), and hobbies and preferences based on, as user information, the action history stored in the action history storage unit 122 and the operation history stored in the operation history storage unit 123.

一例を示すと、特定部１３３は、ユーザ情報に基づいて、ユーザがどの時間にどのような音声コンテンツを聞く傾向にあるか、といった行動傾向を特定（算出）する。また、特定部１３３は、ユーザ情報に基づいて、毎日どれくらいの時間に就寝起床するか、日常生活の中での移動エリアはどのエリアであるか、といった行動傾向を特定（算出）する。また、特定部１３３は、移動エリアの傾向から、例えば、ユーザの勤務先を特定することができる。なお、以下に示す決定部１３４が特定部１３３の処理を行ってもよい。かかる場合、コンテンツ制御装置１００は、特定部１３３を有しない。 To give an example, the identifying unit 133 identifies (calculates) a behavioral tendency, such as what kind of audio content the user tends to listen to at what time, based on the user information. Based on the user information, the specifying unit 133 also specifies (calculates) behavioral tendencies such as what time the user goes to bed and wakes up every day, and which area the user moves to in daily life. Further, the specifying unit 133 can specify, for example, the place of work of the user from the movement area tendency. Note that the determination unit 134 described below may perform the processing of the identification unit 133 . In such a case, the content control device 100 does not have the specifying unit 133 .
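One way to picture the tendency calculation is a per-hour frequency count over the operation history. The sketch below assumes the history has already been reduced to `(hour, category)` pairs; that input shape and the function name are assumptions, and the source does not state which statistic is actually used.

```python
from collections import Counter, defaultdict


def listening_tendency(history):
    """Return {hour: most frequently played category} from (hour, category) pairs."""
    by_hour = defaultdict(Counter)
    for hour, category in history:
        by_hour[hour][category] += 1
    return {hour: counts.most_common(1)[0][0] for hour, counts in by_hour.items()}


# Illustrative history: the user mostly plays news in the 7:00 hour.
history = [(7, "news"), (7, "news"), (7, "weather"), (20, "shopping")]
tendency = listening_tendency(history)
```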

（決定部１３４について） (Regarding the determination unit 134)
決定部１３４は、ユーザ情報に基づいて、第１の音声コンテンツに挿入される対象の第２の音声コンテンツと、当該対象の第２の音声コンテンツが第１の音声コンテンツに挿入される時間位置であって第１の音声コンテンツにおける時間位置とを決定する。かかるユーザ情報は、行動履歴記憶部１２２に記憶される行動履歴や、操作履歴記憶部１２３に記憶される操作履歴である。 Based on the user information, the determination unit 134 determines the target second audio content to be inserted into the first audio content, and the time position within the first audio content at which the target second audio content is to be inserted. Such user information is the action history stored in the action history storage unit 122 and the operation history stored in the operation history storage unit 123.

そして、決定部１３４は、ユーザ情報に基づきユーザがどの時間にどのような音声コンテンツを聞く傾向にあるかが算出された算出結果に基づいて、対象の第２の音声コンテンツと時間位置とを決定する。また、決定部１３４は、ユーザ情報に基づいて、候補の第２の音声コンテンツのうち、ユーザの生活状況または趣味嗜好に応じた音声コンテンツを第１の音声コンテンツに挿入される対象の第２の音声コンテンツとして決定する。 Then, the determination unit 134 determines the target second audio content and the time position based on the result of calculating, from the user information, what kind of audio content the user tends to listen to at what time. Further, based on the user information, the determination unit 134 determines, from among the candidate second audio contents, audio content that matches the user's living situation or hobbies and preferences as the target second audio content to be inserted into the first audio content.
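The decision made by the determination unit 134 can be sketched as a lookup against the calculated tendency. Everything specific here is an assumption for illustration: the function name, the idea of a predefined list of allowed insertion offsets (`slots`, in seconds from the start of the first audio content), the rule of taking the first slot, and the content ID "SC21".

```python
def determine(candidates, tendency, start_hour, slots):
    """Pick a candidate whose category matches the user's tendency for start_hour.

    candidates: list of (content_id, category) pairs
    tendency:   {hour: preferred category}
    slots:      allowed insertion offsets in seconds (hypothetical)
    Returns (content_id, time_position), or None when nothing matches.
    """
    preferred = tendency.get(start_hour)
    if preferred is None or not slots:
        return None
    for content_id, category in candidates:
        if category == preferred:
            return content_id, slots[0]
    return None


decision = determine(
    candidates=[("SC21", "advertisement"), ("SC11", "news")],
    tendency={7: "news"},
    start_hour=7,
    slots=[300, 1500],
)
```

Returning `None` when no candidate matches leaves the first audio content unchanged, which is one plausible fallback; the source does not describe that case.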

（選択部１３５について） (Regarding the selection unit 135)
選択部１３５は、配信先のユーザに関するユーザ情報に基づいて、候補の第２の音声コンテンツのうち、第１の音声コンテンツに挿入される第２の音声コンテンツを選択する。例えば、選択部１３５は、配信先のユーザに関するユーザ情報として、ユーザの行動履歴に基づいて、候補の第２の音声コンテンツのうち、第１の音声コンテンツに挿入される対象の第２の音声コンテンツを選択する。また、選択部１３５は、配信先のユーザに関するユーザ情報として、音声コンテンツを出力する所定の出力装置に対するユーザの操作履歴に基づいて、候補の第２の音声コンテンツのうち、第１の音声コンテンツに挿入される対象の第２の音声コンテンツを選択する。選択部１３５は、決定部１３４により決定された第２の音声コンテンツを対象の第２の音声コンテンツとして選択する。例えば、選択部１３５は、コンテンツ記憶部１２１－２から対象の第２の音声コンテンツを選択する。 The selection unit 135 selects, from among the candidate second audio contents, the second audio content to be inserted into the first audio content, based on user information about the destination user. For example, the selection unit 135 selects the target second audio content to be inserted into the first audio content from among the candidate second audio contents, based on the user's action history as the user information about the destination user. The selection unit 135 also selects the target second audio content to be inserted into the first audio content from among the candidate second audio contents, based on the user's operation history with respect to a predetermined output device that outputs audio content, as the user information about the destination user. The selection unit 135 selects the second audio content determined by the determination unit 134 as the target second audio content. For example, the selection unit 135 selects the target second audio content from the content storage unit 121-2.

また、選択部１３５は、決定部１３４により決定された時間位置に、選択した対象の第２の音声コンテンツを挿入する。すなわち、選択部１３５は、挿入部に対応する処理部でもある。 Also, the selection unit 135 inserts the selected second audio content at the time position determined by the determination unit 134 . That is, the selection unit 135 is also a processing unit corresponding to the insertion unit.
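Inserting the selected content at the determined time position amounts to splicing one sample sequence into another. The sketch below models audio as plain Python lists of samples and expresses the time position as a sample index; both choices, and the function name, are assumptions made to keep the example self-contained.

```python
def insert_at(first, second, position):
    """Return a new sequence with `second` spliced into `first` at `position`."""
    if not 0 <= position <= len(first):
        raise ValueError("position is outside the first audio content")
    return first[:position] + second + first[position:]


# Main content [1, 2, 3, 4] with sub content [9, 9] inserted at sample index 2.
distributed = insert_at([1, 2, 3, 4], [9, 9], 2)
```

The result corresponds to what the source calls "distribution content": the first audio content with the second audio content inserted.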

（出力制御部１３６について） (Regarding the output control unit 136)
出力制御部１３６は、ユーザに関する情報に基づいて、第１の音声コンテンツまたは第２の音声コンテンツの出力を制御する。具体的には、出力制御部１３６は、ユーザに関する情報として、ユーザの周辺環境（周辺ノイズ）を示す環境情報またはユーザの属性情報に基づいて、第１の音声コンテンツまたは第２の音声コンテンツの出力を制御する。 The output control unit 136 controls output of the first audio content or the second audio content based on information about the user. Specifically, the output control unit 136 controls output of the first audio content or the second audio content based on, as the information about the user, environment information indicating the user's surrounding environment (ambient noise) or the user's attribute information.

一例を示すと、出力制御部１３６は、前記ユーザに関する情報に基づいて、第１の音声コンテンツまたは第２の音声コンテンツが出力される際の音量、音色またはピッチを制御する。他の一例を示すと、出力制御部１３６は、ユーザに関する情報に基づいて、第１の音声コンテンツまたは第２の音声コンテンツを巻き戻す。 As an example, the output control unit 136 controls the volume, timbre, or pitch when the first audio content or the second audio content is output, based on the information about the user. As another example, the output control unit 136 rewinds the first audio content or the second audio content based on information about the user.
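As one illustration of volume control driven by environment information, the sketch below raises the output volume as ambient noise rises above a quiet threshold. The linear rule and every constant in it (`base`, `quiet_db`, `gain_per_db`) are assumptions; the source states only that volume, timbre, or pitch is controlled based on information about the user.

```python
def output_volume(noise_db, base=0.5, quiet_db=40.0, gain_per_db=0.01):
    """Return a volume in [0.0, 1.0] that grows with ambient noise (assumed rule)."""
    extra = max(0.0, noise_db - quiet_db) * gain_per_db
    return min(1.0, base + extra)


quiet_room = output_volume(35.0)    # below the threshold: base volume
running_tap = output_volume(60.0)   # 20 dB above the threshold: louder
very_loud = output_volume(120.0)    # clamped at the maximum
```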

なお、出力制御部１３６は、第１の音声コンテンツまたは第２の音声コンテンツがスマートスピーカー３０から出力される前、すなわちスマートスピーカー３０に配信コンテンツを配信する前の段階で、上記出力制御を行ってもよい。一方、出力制御部１３６は、配信コンテンツがスマートスピーカー３０で再生されている最中に、配信コンテンツに対して、上記出力制御を行ってもよい。つまり、出力制御部１３６は、出力中の配信コンテンツについて動的に上記出力制御を行ってよい。 Note that the output control unit 136 may perform the above output control before the first audio content or the second audio content is output from the smart speaker 30, that is, before the distribution content is distributed to the smart speaker 30. Alternatively, the output control unit 136 may perform the above output control on the distribution content while the smart speaker 30 is reproducing it. In other words, the output control unit 136 may dynamically perform the above output control on distribution content that is being output.

（受付部１３７について） (Regarding the reception unit 137)
受付部１３７は、出力された音声コンテンツの再生を制御する操作を行うよう指示する音声指示をユーザから受け付ける。例えば、受付部１３７は、スマートスピーカー３０から出力された音声コンテンツの再生を制御する操作を行うよう指示する音声指示をユーザから受け付ける。具体的には、受付部１３７は、出力された音声コンテンツとして、第１の音声コンテンツに挿入された第２の音声コンテンツの再生を制御する操作を行うよう指示する音声指示をユーザから受け付ける。 The reception unit 137 receives, from the user, a voice instruction to perform an operation for controlling reproduction of the output audio content. For example, the reception unit 137 receives, from the user, a voice instruction to perform an operation for controlling reproduction of audio content output from the smart speaker 30. Specifically, the reception unit 137 receives, from the user, a voice instruction to perform an operation for controlling reproduction of the second audio content inserted in the first audio content, as the output audio content.

例えば、受付部１３７は、音声コンテンツの再生を制御する操作として、音声コンテンツを巻き戻すよう指示する音声指示を受け付ける。かかる音声指示としては、巻き戻す操作をダイレクトに指示する「巻き戻し！あるいは巻き戻せ」といったワードがある。 For example, the reception unit 137 receives a voice instruction to rewind the audio content as an operation for controlling reproduction of the audio content. Such voice instructions include words that directly instruct the rewinding operation, such as "Rewind!" or "Rewind it".

また、受付部１３７は、音声コンテンツの再生を制御する操作として、音声コンテンツを頭出しするよう指示する音声指示を受け付ける。かかる音声指示としては、頭出し操作をダイレクトに指示する「頭出し」といったワードの他に、音声コンテンツに対するユーザの反応を示す発話での音声指示がある。音声コンテンツに対するユーザの反応を示す発話には、例えば、図１で示すように「えっ？」等のワードがある。また、受付部１３７は、頭出し操作をダイレクトに指示する「頭出し」ではなく、これを言い換えた類義の音声指示、例えば、「もう一度聞かせて」等のワードを受け付けてもよい。 The accepting unit 137 also accepts a voice instruction for cueing the audio content as an operation for controlling the reproduction of the audio content. Examples of such voice instructions include a word such as "cue" that directly instructs a cue operation, as well as a voice instruction that indicates the user's reaction to the audio content. An utterance indicating a user's reaction to audio content includes, for example, words such as "Eh?" as shown in FIG. Further, the accepting unit 137 may accept a voice instruction with a similar meaning, for example, a word such as "Let me hear it again" instead of "cue" that directly instructs a cue operation.

なお、受付部１３７は、直前に再生された音声コンテンツに対する音声指示だけでなく、直前に再生された音声コンテンツよりもさらに過去に再生された音声コンテンツに対する音声指示も受け付けることができる。頭出し操作を例に挙げると、受付部１３７は、例えば、「２つ前まで戻して」等といったように、どの音声コンテンツを頭出しさせるかを指定する言葉（かかる例では「２つ前」）を含む音声指示を受け付ける。 Note that the reception unit 137 can receive not only a voice instruction for the audio content played immediately before, but also a voice instruction for audio content played earlier than that. Taking the cueing operation as an example, the reception unit 137 receives a voice instruction that includes words designating which audio content is to be cued (in this example, "two back"), such as "Go back two".
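Resolving an instruction such as "Go back two" requires some record of what has been played. The sketch below assumes a list of already-played content IDs, most recent last, and assumes that a count of 1 means the item played immediately before; the function name, that indexing convention, and the IDs "SC12" and "SC13" are all hypothetical.

```python
def content_to_cue(played, steps_back=1):
    """Return the content ID `steps_back` items before the end of `played`, or None."""
    if not 1 <= steps_back <= len(played):
        return None
    return played[-steps_back]


played = ["SC11", "SC12", "SC13"]      # oldest first, most recent last
just_played = content_to_cue(played)   # default: the item played immediately before
two_back = content_to_cue(played, 2)   # "Go back two"
```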

また、受付部１３７は、音声コンテンツの再生を制御する操作として、音声コンテンツが挿入された第１の音声コンテンツから、第２の音声コンテンツを抽出するよう指示する音声指示を受け付ける。かかる音声指示としては、抽出操作をダイレクトに指示する「クリップ」といったワードを受け付ける。また、受付部１３７は、抽出操作をダイレクトに指示する「クリップ」ではなく、これを言い換えた類義の音声指示、例えば、「取っといて」や「後で聞きたい」等のワードを受け付けてもよい。 The reception unit 137 also receives, as an operation for controlling reproduction of the audio content, a voice instruction to extract the second audio content from the first audio content into which it has been inserted. As such a voice instruction, the reception unit 137 accepts a word that directly instructs the extraction operation, such as "clip". Instead of "clip", the reception unit 137 may also accept a paraphrased voice instruction with a similar meaning, for example, words such as "take it" or "I want to hear it later".

また、受付部１３７は、音声コンテンツの再生を制御する操作として、音声コンテンツを早送り、または、スキップするよう指示する音声指示を受け付ける。かかる音声指示としては、早送り操作をダイレクトに指示する「早送り」、スキップ操作をダイレクトに指示する「スキップ」といったワードを受け付ける。また、受付部１３７は、早送り操作、スキップ操作をダイレクトに指示するワードでの音声指示ではなく、これを言い換えた類義の音声指示、例えば、「早めて」や「次のやつ」等のワードを受け付けてもよい。 The reception unit 137 also receives a voice instruction to fast-forward or skip the audio content as an operation for controlling reproduction of the audio content. As such voice instructions, the reception unit 137 accepts words such as "fast forward", which directly instructs a fast-forward operation, and "skip", which directly instructs a skip operation. Instead of words that directly instruct a fast-forward or skip operation, the reception unit 137 may also accept paraphrased voice instructions with a similar meaning, for example, words such as "faster" or "next one".

例えば、スマートスピーカー３０は上記のようなワードを含む音声指示を検出すると、この音声指示をテキストに変換し、変換したテキストをコンテンツ制御装置１００に送信する。これにより、受付部１３７は、音声指示として、テキストに変換された音声指示を受け付ける。なお、スマートスピーカー３０は上記のようなワードを含む音声指示を検出すると、この音声指示そのもの、つまり音声データをコンテンツ制御装置１００に送信してもよい。 For example, when the smart speaker 30 detects a voice instruction including the above words, it converts the voice instruction into text and transmits the converted text to the content control device 100 . Accordingly, the reception unit 137 receives the voice instruction converted into text as the voice instruction. Note that, when the smart speaker 30 detects a voice instruction including the words as described above, the smart speaker 30 may transmit the voice instruction itself, that is, the voice data to the content control device 100 .

また、受付部１３７は、音声指示を受け付けると、受け付けた音声指示がどのような操作を指示するのかを判定する。図３では、不図示であるが、例えば、コンテンツ制御装置１００は、音声指示を示す各ワードのテキスト（例えば、「巻き戻し」「頭出し」「え？」「早送り」等）と、そのワードが示す操作の内容とが対応付けられた記憶部を有する。かかる場合、受付部１３７は、音声指示に対応するテキストと、かかる記憶部とを照らし合わせて、音声指示がどのような操作を指示するのかを判定する。 Upon receiving a voice instruction, the reception unit 137 determines what operation the received voice instruction specifies. Although not shown in FIG. 3, the content control device 100 has, for example, a storage unit in which the text of each word indicating a voice instruction (for example, "rewind", "cue", "eh?", "fast forward") is associated with the operation that the word indicates. In that case, the reception unit 137 compares the text corresponding to the voice instruction against this storage unit to determine what operation the voice instruction specifies.
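The lookup described above can be sketched as a small table search. This is only an illustrative sketch: the table name INSTRUCTION_TABLE, the function determine_operation, the operation labels, and the substring-matching rule are all assumptions made for the example, not details stated in the specification.

```python
from typing import Optional

# Hypothetical table associating each instruction word with an operation label.
# The words come from the description; the labels are illustrative only.
INSTRUCTION_TABLE = {
    "巻き戻し": "rewind",
    "頭出し": "cue",
    "クリップ": "clip",
    "取っといて": "clip",
    "後で聞きたい": "clip",
    "早送り": "fast_forward",
    "早めて": "fast_forward",
    "スキップ": "skip",
    "次のやつ": "skip",
}


def determine_operation(text: str) -> Optional[str]:
    """Return the operation label for the recognized text, or None if no word matches."""
    # Assumed matching rule: try longer words first so that a longer phrase
    # takes precedence over a shorter word it might contain.
    for word in sorted(INSTRUCTION_TABLE, key=len, reverse=True):
        if word in text:
            return INSTRUCTION_TABLE[word]
    return None
```

Keeping synonyms such as 「取っといて」 in the same table as the direct word 「クリップ」 means the paraphrased instructions need no separate code path.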

（コンテンツ制御部１３８について）
コンテンツ制御部１３８は、受付部１３７により受け付けられた音声指示に応じた操作を音声コンテンツに対して行う。例えば、コンテンツ制御部１３８は、受付部１３７により音声コンテンツを巻き戻すよう指示する音声指示が受け付けられた場合に、音声コンテンツを巻き戻す。例えば、コンテンツ制御部１３８は、スマートスピーカー３０に対して、音声コンテンツを巻き戻すよう指示する。 (Regarding the content control unit 138)
The content control unit 138 performs an operation on the audio content according to the audio instruction accepted by the accepting unit 137 . For example, the content control unit 138 rewinds the audio content when the receiving unit 137 receives a voice instruction to rewind the audio content. For example, the content controller 138 instructs the smart speaker 30 to rewind the audio content.

また、コンテンツ制御部１３８は、受付部１３７により音声コンテンツを頭出しするよう指示する音声指示が受け付けられた場合に、音声コンテンツを頭出しする。例えば、コンテンツ制御部１３８は、スマートスピーカー３０に対して、音声コンテンツを巻き戻すよう指示する。 Further, the content control unit 138 cues the audio content to its beginning when the reception unit 137 receives a voice instruction to cue the audio content. For example, the content control unit 138 instructs the smart speaker 30 to rewind the audio content.

また、コンテンツ制御部１３８は、受付部１３７により音声コンテンツが挿入された第１の音声コンテンツから音声コンテンツを抽出するよう指示する音声指示が受け付けられた場合に、音声コンテンツを第１の音声コンテンツから抽出する。かかる場合の音声指示は、上記の通り、「クリップ」、「取っといて」、「後で聞きたい」といった音声指示である。この場合、コンテンツ制御部１３８は、コンテンツ制御装置１００とは異なる他の装置に対して、第１の音声コンテンツから抽出した音声コンテンツ、すなわち第２の音声コンテンツを送信（転送）する。コンテンツ制御装置１００とは異なる他の装置は、例えば、ユーザの端末装置１０である。これにより、ユーザは、自身の好きなタイミングで、端末装置１０で第２の音声コンテンツを聞き直すことができるようになる。 Further, when the reception unit 137 receives a voice instruction to extract audio content from the first audio content into which that audio content has been inserted, the content control unit 138 extracts the audio content from the first audio content. The voice instructions in this case are, as described above, voice instructions such as "clip", "keep it", and "I want to listen to it later". In this case, the content control unit 138 transmits (transfers) the audio content extracted from the first audio content, that is, the second audio content, to a device other than the content control device 100. The device other than the content control device 100 is, for example, the terminal device 10 of the user. This allows the user to listen to the second audio content again on the terminal device 10 at a time of the user's choosing.

また、コンテンツ制御部１３８は、受付部１３７により音声コンテンツを早送り、または、スキップするよう指示する音声指示が受け付けられた場合に、音声コンテンツを早送り、または、スキップする。 Further, the content control unit 138 fast-forwards or skips the audio content when the receiving unit 137 receives a voice instruction to fast-forward or skip the audio content.

（配信部１３９）
配信部１３９は、第２の音声コンテンツが挿入された第１の音声コンテンツである配信コンテンツを配信する。例えば、配信部１３９は、スマートスピーカー３０に配信コンテンツを配信する。 (Distribution unit 139)
The distribution unit 139 distributes the distribution content, which is the first audio content in which the second audio content is inserted. For example, the distribution unit 139 distributes distribution content to the smart speaker 30 .

〔３．処理手順〕
（挿入処理について）
次に、図９を用いて、実施形態にかかるコンテンツ制御装置１００が実行するコンテンツ制御処理のうち、挿入処理の手順について説明する。図９は、実施形態にかかる挿入処理手順を示すフローチャートである。 [3. Processing procedure]
(About insert processing)
Next, with reference to FIG. 9, the procedure of the insertion processing, which is part of the content control processing executed by the content control device 100 according to the embodiment, will be described. FIG. 9 is a flowchart illustrating the insertion processing procedure according to the embodiment.

まず、特定部１３３は、配信コンテンツが配信される配信先のユーザについて、当該ユーザの行動傾向、生活スタイル（生活状況）、趣味嗜好を特定する（ステップＳ１０１）。具体的には、特定部１３３は、ユーザ情報として、行動履歴記憶部１２２に記憶される行動履歴や、操作履歴記憶部１２３に記憶される操作履歴に基づいて、行動傾向、生活スタイル（生活状況）、趣味嗜好を特定する。一例を示すと、特定部１３３は、ユーザ情報に基づいて、ユーザがどの時間にどのような音声コンテンツを聞く傾向にあるか、すなわち時間帯と当該時間帯に聞かれる音声コンテンツの内容との傾向を算出する。以下、ユーザＵ１を例に説明する。 First, the identification unit 133 identifies the behavioral tendencies, lifestyle (living situation), and tastes and preferences of the user to whom the distribution content is distributed (step S101). Specifically, the identification unit 133 identifies the behavioral tendencies, lifestyle (living situation), and tastes and preferences based on, as user information, the behavior history stored in the behavior history storage unit 122 and the operation history stored in the operation history storage unit 123. As one example, based on the user information, the identification unit 133 calculates what kind of audio content the user tends to listen to at what time, that is, the tendency relating each time period to the audio content listened to in that period. User U1 is used as an example below.

図９では、特定部１３３は、ユーザＵ１は「７時台にニュースを視聴する傾向にある」、「２０時台にラジオショッピングを視聴する傾向にある」、「マンガが趣味」、「勤務地六本木周辺」と特定したものとする。 In FIG. 9, it is assumed that the identification unit 133 has identified that the user U1 "tends to listen to news in the 7:00 hour", "tends to listen to radio shopping in the 20:00 hour", "has manga as a hobby", and "works around Roppongi".
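One simple way to picture the tendency calculation of step S101 is a frequency count per hour. The sketch below is an assumption for illustration only: the specification does not say how the tendency is computed, and the function name listening_tendencies and the (hour, genre) record shape are hypothetical.

```python
from collections import Counter, defaultdict


def listening_tendencies(history):
    """Return {hour: most frequently heard genre} from (hour, genre) records.

    `history` stands in for the behavior and operation histories; its shape
    is an assumption made for this example.
    """
    counts = defaultdict(Counter)
    for hour, genre in history:
        counts[hour][genre] += 1
    # For each hour, keep the genre that appears most often.
    return {hour: c.most_common(1)[0][0] for hour, c in counts.items()}
```

With records in which news dominates the 7:00 hour and shopping the 20:00 hour, the result mirrors the tendencies identified for the user U1 in FIG. 9.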

なお、図９の例では、特定部１３３は、初めにこのような特定処理を行っているが、特定処理を行うタイミングは限定されない。行動履歴や操作履歴は、ユーザの行動に応じて日に日に蓄積されてゆくものなので、特定部１３３は、例えば、所定期間毎に特定処理を繰り返し、その度に、ユーザ情報記憶部１２４に記憶される記憶情報（「趣味嗜好」「生活状況」「傾向情報」）を更新する。 Note that in the example of FIG. 9, the specifying unit 133 initially performs such specifying processing, but the timing of performing the specifying processing is not limited. Since the action history and the operation history are accumulated day by day in accordance with the actions of the user, the identification unit 133 repeats the identification process, for example, every predetermined period of time. The stored information (“hobbies and preferences”, “living situation”, and “tendency information”) is updated.

次に、取得部１３１は、挿入処理を行うタイミングであるか否かを判定する（ステップＳ１０２）。例えば、取得部１３１は、コンテンツ記憶部１２１－１に記憶される第１の音声コンテンツの配信日時（配信スケジュール）に基づいて、挿入処理を行うタイミングであるか否かを判定する。これまで説明してきた通り、第１の音声コンテンツに挿入される第２の音声コンテンツの中には、リアルタイム性が求められるようなものが存在するため、このことを考慮して、取得部１３１は、第１の音声コンテンツの配信日時の直前（例えば、１０分前）になれば、挿入処理を行うタイミングであると判定する（ステップＳ１０２；Ｙｅｓ）。一方、取得部１３１は、挿入処理を行うタイミングでないと判定した場合には（ステップＳ１０２；Ｎｏ）、挿入処理を行うタイミングになるまで待機する。 Next, the acquisition unit 131 determines whether or not it is time to perform the insertion processing (step S102). For example, the acquisition unit 131 makes this determination based on the delivery date and time (delivery schedule) of the first audio content stored in the content storage unit 121-1. As described so far, some of the second audio content to be inserted into the first audio content needs to be current in real time. Taking this into account, the acquisition unit 131 determines that it is time to perform the insertion processing once the time is immediately before the delivery date and time of the first audio content (for example, 10 minutes before) (step S102; Yes). On the other hand, when the acquisition unit 131 determines that it is not yet time to perform the insertion processing (step S102; No), it waits until that time arrives.
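The timing check of step S102 can be sketched as a comparison against the delivery schedule. This is a minimal sketch under assumptions: the 10-minute lead time is the example value given in the text, while the function name is_insertion_time and the choice to stop returning True once the delivery time has arrived are illustrative.

```python
from datetime import datetime, timedelta

# Example lead time from the description ("for example, 10 minutes before").
LEAD_TIME = timedelta(minutes=10)


def is_insertion_time(now: datetime, delivery_at: datetime,
                      lead_time: timedelta = LEAD_TIME) -> bool:
    """Return True once `now` falls within `lead_time` before the scheduled delivery."""
    return delivery_at - lead_time <= now < delivery_at
```

A caller would poll this function and wait while it returns False, which corresponds to the "step S102; No" branch.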

なお、取得部１３１は、第２の音声コンテンツの中に、リアルタイム性が求められるものが存在しない場合には、必ずしもステップＳ１０２の処理を行う必要はない。また、例えば、天気情報として、単に外気温に関する天気情報を第２の音声コンテンツとして挿入するにとどめる場合には、コンテンツ制御装置１００は、コンテンツ記憶部１２１－２において予め外気温毎の第２の音声コンテンツ（「変換後コンテンツ」）を予め有しておくことができる。このような場合、取得部１３１は、挿入処理を行う時点での外気温に応じた第２の音声コンテンツをコンテンツ記憶部１２１－２から取得すればよい。 Note that the acquisition unit 131 does not necessarily need to perform the process of step S102 when there is no second audio content that requires real-time performance. Further, for example, when only weather information related to the outside temperature is inserted as the second audio content as the weather information, the content control device 100 stores the second sound information for each outside temperature in advance in the content storage unit 121-2. Audio content (“converted content”) can be pre-existing. In such a case, the acquisition unit 131 may acquire the second audio content corresponding to the outside temperature at the time of performing the insertion process from the content storage unit 121-2.

次に、取得部１３１は、第１の音声コンテンツに挿入される候補の第２の音声コンテンツに対応する元データを取得する（ステップＳ１０３）。例えば、取得部１３１は、候補の第２の音声コンテンツの元データ（例えば、テキスト）として、ニュースコンテンツの元データ、広告コンテンツの元データ、路線情報の元データ、天気情報の元データ等を外部装置６０から取得する。 Next, the acquisition unit 131 acquires original data corresponding to the second audio content that is a candidate to be inserted into the first audio content (step S103). For example, the acquisition unit 131 externally acquires source data of news content, source data of advertisement content, source data of route information, source data of weather information, etc. as source data (eg, text) of the second voice content of the candidate. Obtained from device 60 .

次に、変換部１３２は、取得部１３１により取得された元データを編成、編集、変換する（ステップＳ１０４）。例えば、変換部１３２は、取得部１３１により取得された元データを編成および編集したものを、音声データつまり候補の第２の音声コンテンツに変換する。 Next, the conversion unit 132 organizes, edits, and converts the original data acquired by the acquisition unit 131 (step S104). For example, the conversion unit 132 converts the organized and edited original data acquired by the acquisition unit 131 into audio data, that is, the candidate second audio content.

次に、決定部１３４は、ユーザ情報に基づいて、第１の音声コンテンツに挿入される対象の第２の音声コンテンツと、当該対象の第２の音声コンテンツが第１の音声コンテンツに挿入される時間位置であって第１の音声コンテンツにおける時間位置とを決定する（ステップＳ１０５）。例えば、決定部１３４は、ステップＳ１０１で特定部に特定された行動傾向、生活スタイル、趣味嗜好に基づいて、第１の音声コンテンツに挿入される対象の第２の音声コンテンツと、当該対象の第２の音声コンテンツが第１の音声コンテンツに挿入される時間位置とを決定する。 Next, based on the user information, the determining unit 134 determines the second audio content to be inserted into the first audio content, and the second audio content to be inserted into the first audio content. A time position in the first audio content is determined (step S105). For example, the determination unit 134 selects the second audio content to be inserted into the first audio content and the second audio content to be inserted into the first audio content based on the behavioral tendency, lifestyle, and tastes and preferences specified by the specifying unit in step S101. A time position at which the second audio content is to be inserted into the first audio content is determined.

図９の例では、決定部１３４は、ニュースコンテンツである第２の音声コンテンツＳＣ１１およびマンガに関する広告コンテンツである第２の音声コンテンツＳＣ２１を、午前７時～午前８時の間に配信予定の音楽コンテンツである第１の音声コンテンツＭＣ１（１時間番組）に挿入される対象の第２の音声コンテンツとして決定したものとする。また、決定部１３４は、第１の音声コンテンツＭＣ１の時間位置として再生時間５分の位置（午前７時０５分に対応する時間位置）を第２の音声コンテンツＳＣ１１を挿入させる時間位置、第１の音声コンテンツＭＣ１の時間位置として再生時間３０分の位置（午前７時３０分に対応する時間位置）を第２の音声コンテンツＳＣ２１を挿入させる時間位置として決定する。 In the example of FIG. 9, it is assumed that the determination unit 134 has determined the second audio content SC11, which is news content, and the second audio content SC21, which is advertising content related to manga, as the second audio content to be inserted into the first audio content MC1 (a one-hour program), which is music content scheduled to be delivered between 7:00 am and 8:00 am. The determination unit 134 also determines, as time positions in the first audio content MC1, the position at 5 minutes of playback time (corresponding to 7:05 am) as the time position at which the second audio content SC11 is inserted, and the position at 30 minutes of playback time (corresponding to 7:30 am) as the time position at which the second audio content SC21 is inserted.

また、図９の例では、決定部１３４は、天気情報である第２の音声コンテンツＳＣ４１およびユーザＵ１の勤務地周辺の店舗広告である第２の音声コンテンツＳＣ２２を、午後８時～午後９時の間に配信予定のネットショッピングである第１の音声コンテンツＭＣ４（１時間番組）に挿入される対象の第２の音声コンテンツとして決定したものとする。特定部１３３は、ユーザＵ１は２０時台にネットショッピングを視聴する傾向にあることから、この時間帯に流される第２の音声コンテンツはより聞かれる可能性が高い。したがって、決定部１３４は、午後８時～午後９時の間に配信予定の第１の音声コンテンツＭＣ４に第２の音声コンテンツが挿入されるよう決定している。 Also in the example of FIG. 9, it is assumed that the determination unit 134 has determined the second audio content SC41, which is weather information, and the second audio content SC22, which is a store advertisement for the area around the workplace of the user U1, as the second audio content to be inserted into the first audio content MC4 (a one-hour program), which is online shopping scheduled to be delivered between 8:00 pm and 9:00 pm. Because the identification unit 133 has identified that the user U1 tends to listen to online shopping in the 20:00 hour, second audio content played in this time period is more likely to be heard. Therefore, the determination unit 134 determines that the second audio content is to be inserted into the first audio content MC4 scheduled to be delivered between 8:00 pm and 9:00 pm.

また、決定部１３４は、第１の音声コンテンツＭＣ４の時間位置として再生時間５分の位置（午後８時０５分に対応する時間位置）を第２の音声コンテンツＳＣ４１を挿入させる時間位置、第１の音声コンテンツＭＣ４の時間位置として再生時間３０分の位置（午後８時３０分に対応する時間位置）を第２の音声コンテンツＳＣ２２を挿入させる時間位置として決定する。 The determination unit 134 also determines, as time positions in the first audio content MC4, the position at 5 minutes of playback time (corresponding to 8:05 pm) as the time position at which the second audio content SC41 is inserted, and the position at 30 minutes of playback time (corresponding to 8:30 pm) as the time position at which the second audio content SC22 is inserted.

そして、選択部１３５は、決定部１３４により決定された対象の第２の音声コンテンツをコンテンツ記憶部１２１－１から選択し、決定部１３４により決定された時間位置に第２の音声コンテンツを挿入する（ステップＳ１０６）。ここで、ステップＳ１０５およびＳ１０６について図１０を用いて説明する。図１０は、実施形態にかかる挿入処理を模式的に示す図である。 Then, the selection unit 135 selects the target second audio content determined by the determination unit 134 from the content storage unit 121-1, and inserts the second audio content at the time position determined by the determination unit 134. (Step S106). Here, steps S105 and S106 will be described with reference to FIG. FIG. 10 is a diagram schematically illustrating insertion processing according to the embodiment;

図１０に示すように、選択部１３５は、第１の音声コンテンツＭＣ１を時間位置「再生時間５分の位置」および時間位置「再生時間３０分の位置」それぞれで分割する。分割後の第１の音声コンテンツＭＣ１は、図１０のように３つになる。そして、選択部１３５は、分割された第１の音声コンテンツＭＣ１の間に第２の音声コンテンツＳＣ１１および第２の音声コンテンツＳＣ２１をそれぞれ挿入する。 As shown in FIG. 10, the selection unit 135 divides the first audio content MC1 at the time position "5 minute reproduction time position" and the time position "30 minute reproduction time position". The first audio content MC1 after division becomes three as shown in FIG. Then, the selection unit 135 inserts the second audio content SC11 and the second audio content SC21 between the divided first audio contents MC1.

また、図１０に示すように、選択部１３５は、第１の音声コンテンツＭＣ４を時間位置「再生時間５分の位置」および時間位置「再生時間３０分の位置」それぞれで分割する。分割後の第１の音声コンテンツＭＣ４は、図１０のように３つになる。そして、選択部１３５は、分割された第１の音声コンテンツＭＣ４の間に第２の音声コンテンツＳＣ４１および第２の音声コンテンツＳＣ２２をそれぞれ挿入する。 Further, as shown in FIG. 10, the selection unit 135 divides the first audio content MC4 at the time position "5 minutes of playback time" and at the time position "30 minutes of playback time". After the division, the first audio content MC4 consists of three parts, as shown in FIG. 10. The selection unit 135 then inserts the second audio content SC41 and the second audio content SC22, respectively, between the divided parts of the first audio content MC4.
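The divide-and-insert operation of step S106 can be sketched on a timeline of segments rather than on real audio data. The function name build_delivery_timeline and the tuple layout of the segments are hypothetical, chosen only to make the splitting visible.

```python
def build_delivery_timeline(first_id, duration_min, insertions):
    """Split the first content at each time position and interleave the second contents.

    insertions: list of (position_min, second_id) pairs.
    Returns a list of segments, each either
    ("first", first_id, start_min, end_min) or ("second", second_id).
    """
    timeline = []
    start = 0
    for position, second_id in sorted(insertions):
        if not 0 < position < duration_min:
            raise ValueError(f"position {position} is outside the first content")
        if position > start:
            # Part of the first content that plays before this insertion point.
            timeline.append(("first", first_id, start, position))
        timeline.append(("second", second_id))
        start = position
    # Remainder of the first content after the last insertion point.
    timeline.append(("first", first_id, start, duration_min))
    return timeline
```

For the one-hour content MC1 with insertions at 5 and 30 minutes, this yields three parts of MC1 with SC11 and SC21 between them, matching the division shown in FIG. 10.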

なお、図１０では、選択部１３５が、第１の音声コンテンツの間に第２の音声コンテンツを挿入する例を示した。しかし、選択部１３５は、第１の音声コンテンツに対して第２の音声コンテンツを重畳してもよい。一例を示すと、選択部１３５は、第１の音声コンテンツＭＣ１の時間位置「再生時間５分の位置」に、第２の音声コンテンツＳＣ１１の先頭を合わせるようにして、第１の音声コンテンツＭＣ１に対して第２の音声コンテンツＳＣ１１を重畳させる。このようなことから、本実施形態では、「挿入する」とは「重畳する」の概念を含み得るものとする。 Note that FIG. 10 shows an example in which the selection unit 135 inserts the second audio content between the first audio contents. However, the selection unit 135 may superimpose the second audio content on the first audio content. To give an example, the selection unit 135 aligns the beginning of the second audio content SC11 with the time position “the position of 5 minutes of playback time” of the first audio content MC1 so as to match the first audio content MC1. The second audio content SC11 is superimposed on it. For this reason, in the present embodiment, "inserting" can include the concept of "overlapping."

次に、出力制御部１３６は、ユーザに関する情報として、ユーザの周辺環境を示す環境情報またはユーザの属性情報に基づいて、第１の音声コンテンツまたは第２の音声コンテンツの出力を制御する（ステップＳ１０７）。例えば、出力制御部１３６は、ユーザＵ１に関する情報に基づいて、第１の音声コンテンツまたは第２の音声コンテンツが出力される際の音量、音色またはピッチを制御する。また、出力制御部１３６は、ユーザに関する情報に基づいて、第１の音声コンテンツまたは第２の音声コンテンツを巻き戻す。 Next, the output control unit 136 controls the output of the first audio content or the second audio content based on the environment information indicating the surrounding environment of the user or the attribute information of the user as information about the user (step S107). ). For example, the output control unit 136 controls the volume, timbre, or pitch when the first audio content or the second audio content is output, based on the information regarding the user U1. Also, the output control unit 136 rewinds the first audio content or the second audio content based on the information about the user.

例えば、出力制御部１３６は、スマートスピーカー３０を介して、ユーザＵ１の周辺環境として、ユーザＵ１が現在、洗濯を行っている旨を検知したとする。かかる場合、出力制御部１３６は、洗濯音に邪魔されて音声コンテンツが聞き取りにくくなるのを防ぐため、第１の音声コンテンツまたは第２の音声コンテンツが所定値より高い音量で出力されるよう音量を制御する。出力制御部１３６は、ユーザＵ１が現在、洗濯を行っている場合、洗濯音に邪魔されて音声コンテンツを聞きのがしたと予測し、音声コンテンツを巻き戻してもよい。また、出力制御部１３６は、ユーザが年配（６０歳以上等）の場合には、より音声コンテンツを聞き取り易いよう音声コンテンツのピッチが所定値より遅くなるよう制御する。 For example, suppose the output control unit 136 detects, via the smart speaker 30, that the user U1 is currently doing laundry, as the surrounding environment of the user U1. In that case, to prevent the audio content from becoming hard to hear over the laundry noise, the output control unit 136 controls the volume so that the first audio content or the second audio content is output at a volume higher than a predetermined value. If the user U1 is currently doing laundry, the output control unit 136 may also predict that the user U1 missed the audio content because of the laundry noise, and rewind the audio content. In addition, when the user is elderly (for example, 60 years old or older), the output control unit 136 controls the pitch of the audio content to be slower than a predetermined value so that the audio content is easier to hear.

なお、出力制御部１３６は、上記のような出力制御を第２の音声コンテンツにだけ適用してもよい。また、出力制御部１３６は、音声コンテンツが配信される前の段階で出力制御を行うのではなく、出力中の音声コンテンツに対して出力制御を行ってもよい。例えば、出力制御部１３６は、第２の音声コンテンツＳＣ１１が再生されている最中に、洗濯音を検知すると、第２の音声コンテンツＳＣ１１の音量を高めてもよい。 Note that the output control unit 136 may apply the above output control only to the second audio content. Also, the output control unit 136 may perform output control on the audio content being output instead of performing the output control before the audio content is distributed. For example, the output control unit 136 may increase the volume of the second audio content SC11 when detecting the washing sound while the second audio content SC11 is being played.
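The output control rules of step S107 can be summarized as a small decision function. Everything numeric here is an assumption: the specification only speaks of "predetermined values", so the baseline of 1.0, the boosted volume of 1.5, and the slowed rate of 0.8 are placeholders, and treating the slower "pitch" as a playback rate is an interpretation. The names OutputSettings and decide_output are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OutputSettings:
    volume: float = 1.0   # relative to an assumed baseline of 1.0
    speed: float = 1.0    # playback rate; values below 1.0 are slower
    rewind: bool = False  # whether to replay content the user may have missed


def decide_output(noise_detected: bool, user_age: int) -> OutputSettings:
    """Choose output settings from the surrounding environment and a user attribute."""
    settings = OutputSettings()
    if noise_detected:
        # Laundry noise or similar: raise the volume and optionally rewind.
        settings.volume = 1.5
        settings.rewind = True
    if user_age >= 60:
        # Elderly user (example threshold from the text): slow the delivery.
        settings.speed = 0.8
    return settings
```

Because the function takes only the current observations, it can be called either before delivery or again while the content is playing, which corresponds to the two timings mentioned above.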

最後に、配信部１３９は、第２の音声コンテンツが挿入された第１の音声コンテンツである配信コンテンツを配信する（ステップＳ１０８）。例えば、配信部１３９は、スマートスピーカー３０に配信コンテンツを配信する。 Finally, the distribution unit 139 distributes the distribution content, which is the first audio content in which the second audio content is inserted (step S108). For example, the distribution unit 139 distributes distribution content to the smart speaker 30 .

（ＵＩ処理について）
次に、図１１を用いて、実施形態にかかるコンテンツ制御装置１００が実行するコンテンツ制御処理のうち、ＵＩ処理の手順について説明する。図１１は、実施形態にかかるＵＩ処理手順を示すフローチャートである。 (About UI processing)
Next, the procedure of the UI processing, which is part of the content control processing executed by the content control device 100 according to the embodiment, will be described with reference to FIG. 11. FIG. 11 is a flowchart illustrating the UI processing procedure according to the embodiment.

まず、受付部１３７は、出力された音声コンテンツの再生を制御する操作を行うよう指示する音声指示をユーザから受け付けたか否かを判定する（ステップＳ２０１）。具体的には、受付部１３７は、出力された第２の音声コンテンツ（第１の音声コンテンツに挿入された第２の音声コンテンツ）の再生を制御する操作を行うよう指示する音声指示をユーザから受け付けたか否かを判定する。受付部１３７は、音声指示を受け付けていない場合には（ステップＳ２０１；Ｎｏ）、受け付けるまで待機する。 First, the reception unit 137 determines whether or not a voice instruction instructing to perform an operation for controlling reproduction of the output audio content has been received from the user (step S201). Specifically, the reception unit 137 receives a voice instruction from the user to perform an operation to control reproduction of the output second audio content (the second audio content inserted into the first audio content). Determine whether or not it has been accepted. If the reception unit 137 has not received a voice instruction (step S201; No), it waits until reception.

例えば、受付部１３７は、スマートスピーカー３０を介して、「巻き戻し」、「頭出し」、「クリップ」、「早送り」、「スキップ」等の音声指示を受け付けることができる。そして、受付部１３７は、音声指示を受け付けたと判定すると（ステップＳ２０１；Ｙｅｓ）、受け付けた音声指示がどのような操作を指示するのかを判定する（ステップＳ２０２）。また、受付部１３７は、判定した操作を示す情報をコンテンツ制御部１３８に送信する。 For example, the receiving unit 137 can receive voice instructions such as “rewind”, “cue”, “clip”, “fast forward”, and “skip” via the smart speaker 30 . Then, when determining that the voice instruction has been received (step S201; Yes), the receiving unit 137 determines what kind of operation the received voice instruction instructs (step S202). The accepting unit 137 also transmits information indicating the determined operation to the content control unit 138 .

コンテンツ制御部１３８は、受付部１３７により受け付けられた音声指示に応じた操作を第２の音声コンテンツに対して行う（ステップＳ２０３）。例えば、コンテンツ制御部１３８は、受付部１３７により第２の音声コンテンツを巻き戻すよう指示する音声指示が受け付けられた場合に、第２の音声コンテンツを巻き戻す。一例を示すと、コンテンツ制御部１３８は、スマートスピーカー３０に対して、第２の音声コンテンツを巻き戻すよう指示する。 The content control unit 138 performs an operation on the second audio content according to the voice instruction accepted by the accepting unit 137 (step S203). For example, the content control unit 138 rewinds the second audio content when the receiving unit 137 receives a voice instruction to rewind the second audio content. In one example, the content controller 138 instructs the smart speaker 30 to rewind the second audio content.
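Step S203 can be pictured as applying the determined operation to a playback position. This is a toy sketch: the class name Playback, the operation labels, and the 10-second step for rewinding and fast-forwarding are assumptions, since the specification does not state how far these operations move.

```python
class Playback:
    """Toy playback state for one second audio content (positions in seconds)."""

    def __init__(self, duration: float, position: float = 0.0):
        self.duration = duration
        self.position = position

    def apply(self, operation: str, step: float = 10.0) -> float:
        """Apply an operation label and return the new playback position."""
        if operation == "rewind":
            self.position = max(0.0, self.position - step)
        elif operation == "cue":
            # Return to the beginning of the content.
            self.position = 0.0
        elif operation == "fast_forward":
            self.position = min(self.duration, self.position + step)
        elif operation == "skip":
            # Jump to the end so that the next content can start.
            self.position = self.duration
        else:
            raise ValueError(f"unknown operation: {operation}")
        return self.position
```

In the described system the content control unit 138 would send the corresponding instruction to the smart speaker 30 rather than hold the position itself; the class only makes the effect of each operation concrete.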

〔４．変形例〕
上記実施形態にかかるコンテンツ制御装置１００は、上記実施形態以外にも種々の異なる形態にて実施されてよい。そこで、以下では、コンテンツ制御装置１００の他の実施形態について説明する。 [4. Modification]
The content control device 100 according to the above embodiment may be implemented in various different forms other than the above embodiment. Therefore, another embodiment of the content control device 100 will be described below.

〔４－１．音声指示について〕
上記実施形態では、受付部１３７が、音声コンテンツの再生を制御する操作を行うよう指示する音声指示をユーザから受け付ける例を示した。しかし、この例に限らず受付部１３７は、音声コンテンツが示す内容に関連する操作を行うよう指示する音声指示をユーザから受け付けてもよい。この点について、第１の音声コンテンツ（ネットショッピングとする）に対して、料理器具に関する音声の広告コンテンツである第２の音声コンテンツが挿入されている例を用いて説明する。なお、説明の便宜上、かかる第２の音声コンテンツを第２の音声コンテンツＳＣ５１とする。 [4-1. About voice instructions]
In the above-described embodiment, an example was shown in which the reception unit 137 receives a voice instruction from the user that instructs to perform an operation for controlling reproduction of audio content. However, not limited to this example, the reception unit 137 may receive from the user a voice instruction instructing to perform an operation related to the content indicated by the audio content. This point will be described using an example in which the second audio content, which is audio advertising content related to cooking utensils, is inserted into the first audio content (online shopping). For convenience of explanation, this second audio content is referred to as a second audio content SC51.

このような状態において、第２の音声コンテンツＳＣ５１がスマートスピーカー３０から出力されている際に、ユーザＵ１が「これ買いたい！」と発言したとする。つまり、ユーザＵ１が、第２の音声コンテンツＳＣ５１が示す内容（料理器具）を欲する旨の意志表示を音声にて行ったとする。言い換えれば、受付部１３７が、第２の音声コンテンツＳＣ５１が示す商品（料理器具）の購入に関する操作を行うよう指示する音声指示をユーザＵ１から受け付けたとする。 Assume that the user U1 says "I want to buy this!" while the second audio content SC51 is being output from the smart speaker 30 in such a state. That is, it is assumed that the user U1 expresses his or her desire for the content (cooking utensils) indicated by the second audio content SC51. In other words, it is assumed that the reception unit 137 receives from the user U1 a voice instruction instructing to perform an operation related to purchase of the product (cooking utensil) indicated by the second audio content SC51.

かかる場合、コンテンツ制御部１３８は、コンテンツ制御装置１００とは異なる他の装置として、例えば、ユーザＵ１の端末装置１０に上記料理器具の購入サイトを示すアドレス情報（ＵＲＬ）を送信する。なお、かかるアドレス情報は、例えば、コンテンツ記憶部１２１－２において、第２の音声コンテンツに対応付けられる。 In such a case, the content control unit 138 transmits address information (URL) indicating the purchase site of the cooking utensils to, for example, the terminal device 10 of the user U1 as a device different from the content control device 100 . This address information is associated with the second audio content in, for example, the content storage unit 121-2.

また、かかる変形例では、第２の音声コンテンツが広告コンテンツである例を示したが、第２の音声コンテンツは広告コンテンツでなくてもよい。例えば、第２の音声コンテンツが路線情報である場合、ユーザＵ１が「○○線の路線情報知りたい！」と発言したとする。言い換えれば、受付部１３７が、路線情報を提示するよう指示する音声指示をユーザＵ１から受け付けたとする。かかる場合、コンテンツ制御部１３８は、ユーザＵ１の端末装置１０に〇〇線の路線情報を示すアドレス情報（ＵＲＬ）を送信する。 Also, in this modified example, an example is shown in which the second audio content is advertising content, but the second audio content may not be advertising content. For example, when the second audio content is route information, assume that user U1 says, "I want to know route information on XX line!" In other words, it is assumed that the receiving unit 137 receives from the user U1 a voice instruction instructing to present route information. In such a case, the content control unit 138 transmits address information (URL) indicating route information of the XX line to the terminal device 10 of the user U1.

これにより、実施形態にかかるコンテンツ制御装置１００は、ユーザの音声指示に応じて単に音声コンテンツを再生制御するだけでなく、音声コンテンツに関連する様々なユーザの要望に応えることができる。 As a result, the content control apparatus 100 according to the embodiment can not only simply control the reproduction of audio content according to the user's voice instruction, but also respond to various user requests related to audio content.

〔４－２．第２の音声コンテンツについて（１）〕
上記実施形態では、選択部１３５が、第２の音声コンテンツとして、ニュースコンテンツ、広告コンテンツ、路線情報、天気情報等を挿入する例を示した。しかし、選択部１３５は、これら以外にも様々な音声コンテンツを第２の音声コンテンツとして挿入することができる。例えば、ユーザＵ１が月額利用料金を支払って情報提供サービスを契約しているものとする。情報提供サービスとしては、例えば、バス送迎情報等がある。そして、ユーザＵ１は、このバス送迎情報を確認して、毎朝送迎バスを利用して通勤しているものとする。 [4-2. Regarding the second audio content (1)]
In the above embodiment, an example was given in which the selection unit 135 inserts news content, advertisement content, route information, weather information, etc. as the second audio content. However, the selection unit 135 can insert various audio contents other than these as the second audio contents. For example, it is assumed that user U1 has contracted for an information providing service by paying a monthly usage fee. As an information providing service, for example, there is bus pick-up information and the like. It is assumed that user U1 confirms this bus pick-up information and uses the pick-up bus to commute every morning.

かかる場合、決定部１３４は、午前７時から午前８時の間に放送される第１の音声コンテンツＭＣ１（音楽コンテンツ）に挿入させる対象の第２の音声コンテンツとして、その日のバス送迎情報を決定する。そして、選択部１３５は、バス送迎情報を第１の音声コンテンツＭＣ１に挿入する。また、配信部１３９は、タイムスケジュールに従って、バス送迎情報を含む配信コンテンツを配信する。 In this case, the determination unit 134 determines the bus pick-up information of the day as the second audio content to be inserted into the first audio content MC1 (music content) broadcast from 7:00 am to 8:00 am. Then, the selection unit 135 inserts the bus pick-up information into the first audio content MC1. Also, the distribution unit 139 distributes distribution content including bus pick-up information according to the time schedule.

これにより、実施形態にかかるコンテンツ制御装置１００は、ユーザにとって必要性の高い情報を優先的に提供することができる。 Thereby, the content control apparatus 100 according to the embodiment can preferentially provide information that is highly necessary for the user.

〔４－３．第２の音声コンテンツについて（２）〕
特定部１３３は、ユーザ情報として、行動履歴や操作履歴が十分に蓄積されていないうちは、正確に行動傾向を特定することができない場合がある。かかる場合、決定部１３４は、ユーザ情報を用いずに時間位置を決定してもよい。単純な例では、決定部１３４は、第１の音声コンテンツに対して予め決められた時間位置（例えば、再生時間１分の位置と５分の位置）を第２の音声コンテンツを挿入させる時間位置として決定する。 [4-3. Regarding the second audio content (2)]
The identification unit 133 may be unable to identify behavioral tendencies accurately until sufficient behavior history and operation history have accumulated as user information. In such a case, the determination unit 134 may determine the time position without using user information. In a simple example, the determination unit 134 determines predetermined time positions in the first audio content (for example, the positions at 1 minute and 5 minutes of playback time) as the time positions at which the second audio content is inserted.

また、第１の音声コンテンツが他社から提供される場合、提供主は第１の音声コンテンツに対して第２の音声コンテンツを挿入してよい場所を予め指定している場合がある。例えば、提供主は、第１の音声コンテンツにおいて第２の音声コンテンツを挿入してよい時間位置にタグを埋め込んでいる場合がある。かかる場合、決定部１３４は、タグが示す時間位置を第２の音声コンテンツを挿入させる時間位置として決定する。 Also, when the first audio content is provided by another company, the provider may have previously specified a place where the second audio content may be inserted into the first audio content. For example, the provider may have embedded tags in the first audio content at points in time where the second audio content may be inserted. In such a case, the determination unit 134 determines the time position indicated by the tag as the time position to insert the second audio content.
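The alternatives in this modification can be combined into one selection routine. The priority order used below (provider tags first, then positions derived from user information, then fixed defaults) is an assumption for illustration; the specification describes the cases separately and does not rank them. The function name decide_insert_positions is hypothetical.

```python
def decide_insert_positions(provider_tags, tendency_positions, defaults=(1, 5)):
    """Pick insertion positions in minutes of playback time.

    provider_tags: positions the content provider marked as allowed (may be empty).
    tendency_positions: positions derived from user information (may be empty).
    defaults: fixed positions used when neither source is available.
    """
    if provider_tags:
        return sorted(provider_tags)
    if tendency_positions:
        return sorted(tendency_positions)
    return list(defaults)
```

The default of 1 and 5 minutes is the example given in the text for the case where user information is insufficient.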

〔４－４．音声変換〕
上記実施形態では、変換部１３２がコンテンツのテキストデータを音声データに変換する、すなわち音声合成する例を示した。このとき変換部は、ユーザに適した形で声データに変換してもよい。例えば、変換部１３２は、ユーザの方言に合わせて音声データに変換する。また、変換部１３２は、ユーザに対して、例えば、好みの有名人（例えば、声優等）を一覧から選択させ、選択された有名人の声に合わせた音声データに変換してもよい。 [4-4. voice conversion]
In the above embodiment, an example was given in which the conversion unit 132 converts the text data of content into voice data, that is, performs speech synthesis. At this time, the conversion unit 132 may convert the text into voice data in a form suited to the user. For example, the conversion unit 132 converts the text into voice data that matches the user's dialect. Alternatively, the conversion unit 132 may have the user select, for example, a favorite celebrity (for example, a voice actor) from a list, and convert the text into voice data that matches the selected celebrity's voice.

〔４－５．出力制御について（１）〕
上記実施形態では、出力制御部１３６が、ユーザに関する情報に基づいて、第１の音声コンテンツまたは第２の音声コンテンツが出力される際の音量、音色またはピッチを制御する例を示した。また、出力制御部１３６が巻き戻しする例を示した。しかし、出力制御部１３６は、ユーザに関する情報として、ユーザの位置関係に基づいて、上記出力制御を行ってよい。ここで、ユーザの位置関係とは、ユーザＵ１を例にあげると、例えば、ユーザＵ１とスマートスピーカー３０との位置関係である。出力制御部１３６は、ユーザＵ１（端末装置１０）およびスマートスピーカー３０から位置情報を取得することができる。このため、出力制御部１３６は、例えば、ユーザＵ１とスマートスピーカー３０とが所定距離以上離れている場合、出力制御部１３６は、ユーザＵ１はスマートスピーカー３０から離れた位置に居るため、音声を聞き取りづらい状況にあると判断する。このように判断した場合、出力制御部１３６は、例えば、配信コンテンツの音量を上げるよう制御する。 [4-5. About output control (1)]
In the above embodiment, an example was given in which the output control unit 136 controls the volume, timbre, or pitch at which the first audio content or the second audio content is output, based on information about the user. An example in which the output control unit 136 rewinds content was also given. However, the output control unit 136 may perform the above output control based on the user's positional relationship as the information about the user. Here, taking the user U1 as an example, the user's positional relationship is, for example, the positional relationship between the user U1 and the smart speaker 30. The output control unit 136 can acquire position information from the user U1 (terminal device 10) and the smart speaker 30. Therefore, for example, when the user U1 and the smart speaker 30 are separated by a predetermined distance or more, the output control unit 136 judges that the user U1 is in a situation where the audio is hard to hear because the user U1 is away from the smart speaker 30. When it so judges, the output control unit 136 performs control to, for example, raise the volume of the distribution content.

これにより、実施形態にかかるコンテンツ制御装置１００は、ユーザの状況に合わせて出力制御することができるため、音声コンテンツに対する聞き易さの満足度を高めることができる。 As a result, the content control apparatus 100 according to the embodiment can perform output control according to the user's situation, so that the user's satisfaction with the audibility of the audio content can be enhanced.
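The distance-based control in this modification can be sketched as a threshold test. The sketch assumes that the two positions are available as planar coordinates in meters and that volume is an integer percentage; the threshold of 3 m, the base volume of 50, and the boost of 20 are placeholder values, and the function name volume_for_distance is hypothetical.

```python
import math


def volume_for_distance(user_xy, speaker_xy, base_volume=50,
                        threshold_m=3.0, boost=20):
    """Return the output volume (0 to 100) given user and speaker positions.

    Positions are (x, y) pairs in meters, an assumed representation of the
    position information obtained from the terminal device and the speaker.
    """
    if math.dist(user_xy, speaker_xy) >= threshold_m:
        # User is at or beyond the predetermined distance: raise the volume.
        return min(100, base_volume + boost)
    return base_volume
```

For example, a user 5 m from the speaker would get the boosted volume, while a user 1 m away would get the base volume.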

〔４－６．出力制御について（２）〕
また、出力制御部１３６は、聴覚心理学を活用した出力制御を行ってもよい。一例を示すと、カクテルパーティー効果がある。カクテルパーティー効果とは、雑音の中でも自身が必要とする音声を選択して聞き取ることができる人間の能力である。これには、雑音と必要音声との周波数の差が関与していると考えられる。したがって、出力制御部１３６は、音声コンテンツの周波数を制御する。例えば、出力制御部１３６は、ユーザが音声コンテンツをより聞き取り易くなるよう音声コンテンツの周波数を制御する。 [4-6. About output control (2)]
In addition, the output control unit 136 may perform output control using psychoacoustics. One example is the cocktail party effect. The cocktail party effect is the ability of humans to select and hear the sounds they need in the midst of noise. It is believed that this is related to the frequency difference between the noise and the desired voice. Therefore, the output control unit 136 controls the frequency of audio content. For example, the output control unit 136 controls the frequency of the audio content so that the user can easily hear the audio content.

〔４－７．出力制御について（３）〕
また、出力制御部１３６は、第２の音声コンテンツの出力制御に連動させて、第１の音声コンテンツを出力制御してもよい。例えば、出力制御部１３６は、ユーザＵ１が現在、洗濯を行っている場合、洗濯音に邪魔されて音声コンテンツを聞きのがしたと予測し、音声コンテンツを巻き戻す。例えば、出力制御部１３６は、第２の音声コンテンツを巻き戻しそして再生する。また、この第２の音声コンテンツの再生が終了すれば、出力制御部１３６は、続きから再び第１の音声コンテンツの再生を行うが、この際、洗濯音が鳴っていても第１の音声コンテンツも聞き取り易いよう、第１の音声コンテンツも所定値より大きい音量で出力されるよう音量を制御する。また、出力制御部１３６は、第１の音声コンテンツの再生を再開させる際に、ユーザＵ１の周辺環境においてユーザＵ１にとってより重要な情報が出力されている場合（例えば、テレビがついている、駅の放送が流れている等）には、第１の音声コンテンツの音量が所定値より小さい音量で出力されるよう音量を制御してもよい。 [4-7. Regarding output control (3)]
The output control unit 136 may also control the output of the first audio content in conjunction with the output control of the second audio content. For example, if the user U1 is currently doing laundry, the output control unit 136 predicts that the user U1 missed the audio content because of the noise of the washing, and rewinds the audio content. For example, the output control unit 136 rewinds the second audio content and plays it again. When playback of this second audio content ends, the output control unit 136 resumes playback of the first audio content from where it left off; at this time, it controls the volume so that the first audio content is also output at a volume higher than a predetermined value, so that the first audio content is also easy to hear even while the washing noise continues. In addition, when resuming playback of the first audio content, if information that is more important to the user U1 is being output in the surrounding environment of the user U1 (for example, a television is on or a station announcement is playing), the output control unit 136 may control the volume so that the first audio content is output at a volume lower than the predetermined value.
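The linked control described above can be sketched as follows. This is only an illustrative Python example: the `Environment` fields, the numeric volume values, and the rule that nearby important audio takes priority over ambient noise are assumptions of the sketch and are not stated in the embodiment.

```python
from dataclasses import dataclass


@dataclass
class Environment:
    noisy: bool = False                   # e.g. a washing machine is running
    important_audio_nearby: bool = False  # e.g. a TV is on or a station announcement is playing


def resume_volume(env, predetermined=50, delta=15, lo=0, hi=100):
    """Volume at which the first audio content is resumed.

    Assumption of this sketch: nearby important audio takes priority over noise.
    """
    if env.important_audio_nearby:
        return max(lo, predetermined - delta)  # quieter than the predetermined value
    if env.noisy:
        return min(hi, predetermined + delta)  # louder than the predetermined value
    return predetermined


def plan_after_missed_insert(env):
    """Replay the second audio content, then resume the first at an adjusted volume."""
    return [("replay", "second"), ("resume", "first", resume_volume(env))]
```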

〔５．ハードウェア構成〕
また、上述してきた実施形態にかかるコンテンツ制御装置１００は、例えば図１２に示すような構成のコンピュータ１０００によって実現される。図１２は、コンテンツ制御装置１００の機能を実現するコンピュータ１０００の一例を示すハードウェア構成図である。コンピュータ１０００は、ＣＰＵ１１００、ＲＡＭ１２００、ＲＯＭ１３００、ＨＤＤ１４００、通信インターフェイス（Ｉ／Ｆ）１５００、入出力インターフェイス（Ｉ／Ｆ）１６００、及びメディアインターフェイス（Ｉ／Ｆ）１７００を有する。 [5. Hardware configuration]
The content control device 100 according to the embodiments described above is implemented by, for example, a computer 1000 configured as shown in FIG. 12. FIG. 12 is a hardware configuration diagram showing an example of the computer 1000 that implements the functions of the content control device 100. The computer 1000 has a CPU 1100, a RAM 1200, a ROM 1300, an HDD 1400, a communication interface (I/F) 1500, an input/output interface (I/F) 1600, and a media interface (I/F) 1700.

ＣＰＵ１１００は、ＲＯＭ１３００又はＨＤＤ１４００に格納されたプログラムに基づいて動作し、各部の制御を行う。ＲＯＭ１３００は、コンピュータ１０００の起動時にＣＰＵ１１００によって実行されるブートプログラムや、コンピュータ１０００のハードウェアに依存するプログラム等を格納する。 The CPU 1100 operates based on programs stored in the ROM 1300 or HDD 1400 and controls each section. The ROM 1300 stores a boot program executed by the CPU 1100 when the computer 1000 is started up, a program depending on the hardware of the computer 1000, and the like.

ＨＤＤ１４００は、ＣＰＵ１１００によって実行されるプログラム、および、かかるプログラムによって使用されるデータ等を格納する。通信インターフェイス１５００は、通信網５０を介して他の機器からデータを受信してＣＰＵ１１００へ送り、ＣＰＵ１１００が生成したデータを、通信網５０を介して他の機器へ送信する。 HDD 1400 stores programs executed by CPU 1100 and data used by these programs. Communication interface 1500 receives data from other devices via communication network 50 and sends the data to CPU 1100 , and transmits data generated by CPU 1100 to other devices via communication network 50 .

ＣＰＵ１１００は、入出力インターフェイス１６００を介して、ディスプレイやプリンタ等の出力装置、及び、キーボードやマウス等の入力装置を制御する。ＣＰＵ１１００は、入出力インターフェイス１６００を介して、入力装置からデータを取得する。また、ＣＰＵ１１００は、生成したデータを、入出力インターフェイス１６００を介して出力装置へ出力する。 The CPU 1100 controls output devices such as displays and printers, and input devices such as keyboards and mice, through an input/output interface 1600 . CPU 1100 acquires data from an input device via input/output interface 1600 . CPU 1100 also outputs the generated data to an output device via input/output interface 1600 .

メディアインターフェイス１７００は、記録媒体１８００に格納されたプログラム又はデータを読み取り、ＲＡＭ１２００を介してＣＰＵ１１００に提供する。ＣＰＵ１１００は、かかるプログラムを、メディアインターフェイス１７００を介して記録媒体１８００からＲＡＭ１２００上にロードし、ロードしたプログラムを実行する。記録媒体１８００は、例えばＤＶＤ（Digital Versatile Disc）、ＰＤ（Phase change rewritable Disk）等の光学記録媒体、ＭＯ（Magneto-Optical disk）等の光磁気記録媒体、テープ媒体、磁気記録媒体、または半導体メモリ等である。 The media interface 1700 reads a program or data stored in a recording medium 1800 and provides it to the CPU 1100 via the RAM 1200. The CPU 1100 loads the program from the recording medium 1800 onto the RAM 1200 via the media interface 1700 and executes the loaded program. The recording medium 1800 is, for example, an optical recording medium such as a DVD (Digital Versatile Disc) or a PD (Phase change rewritable Disk), a magneto-optical recording medium such as an MO (Magneto-Optical disk), a tape medium, a magnetic recording medium, or a semiconductor memory.

例えば、コンピュータ１０００が実施形態にかかるコンテンツ制御装置１００として機能する場合、コンピュータ１０００のＣＰＵ１１００は、ＲＡＭ１２００上にロードされたプログラムを実行することにより、制御部１３０の機能を実現する。また、ＨＤＤ１４００には、記憶部１２０内のデータが格納される。コンピュータ１０００のＣＰＵ１１００は、これらのプログラムを、記録媒体１８００から読み取って実行するが、他の例として、他の装置から、通信網５０を介してこれらのプログラムを取得してもよい。 For example, when the computer 1000 functions as the content control device 100 according to the embodiment, the CPU 1100 of the computer 1000 implements the functions of the control unit 130 by executing programs loaded onto the RAM 1200 . In addition, data in storage unit 120 is stored in HDD 1400 . CPU 1100 of computer 1000 reads these programs from recording medium 1800 and executes them, but as another example, these programs may be obtained from another device via communication network 50 .

〔６．その他〕
上記実施形態において説明した各処理のうち、自動的に行われるものとして説明した処理の全部または一部を手動的に行うこともでき、あるいは、手動的に行われるものとして説明した処理の全部または一部を公知の方法で自動的に行うこともできる。この他、上記文書中や図面中で示した処理手順、具体的名称、各種のデータやパラメータを含む情報については、特記する場合を除いて任意に変更することができる。 [6. Others]
Of the processes described in the above embodiments, all or part of the processes described as being performed automatically can also be performed manually, and all or part of the processes described as being performed manually can also be performed automatically by known methods. In addition, the processing procedures, specific names, and information including various data and parameters shown in the above description and drawings can be changed arbitrarily unless otherwise specified.

また、図示した各装置の各構成要素は機能概念的なものであり、必ずしも物理的に図示の如く構成されていることを要しない。すなわち、各装置の分散・統合の具体的形態は図示のものに限られず、その全部または一部を、各種の負荷や使用状況などに応じて、任意の単位で機能的または物理的に分散・統合して構成することができる。 Also, each component of each illustrated device is functionally conceptual and does not necessarily need to be physically configured as illustrated. In other words, the specific form of distribution and integration of each device is not limited to the one illustrated, and all or part of each device can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like.

また、上述してきた各実施形態は、処理内容を矛盾させない範囲で適宜組み合わせることが可能である。 Moreover, each embodiment described above can be appropriately combined within a range in which the processing contents are not inconsistent.

〔７．効果〕
実施形態にかかるコンテンツ制御装置１００は、受付部１３７と、コンテンツ制御部１３８とを有する。受付部１３７は、出力された音声コンテンツの再生を制御する操作を行うよう指示する音声指示をユーザから受け付ける。コンテンツ制御部１３８は、受付部１３７により受け付けられた音声指示に応じた操作を音声コンテンツに対して行う。 [7. Effects]
The content control device 100 according to the embodiment has a reception unit 137 and a content control unit 138. The reception unit 137 receives, from the user, a voice instruction to perform an operation for controlling playback of the output audio content. The content control unit 138 performs, on the audio content, the operation corresponding to the voice instruction received by the reception unit 137.

これにより、実施形態にかかるコンテンツ制御装置１００は、音声コンテンツがながら聴取されている場合であっても、当該音楽コンテンツを容易に制御させることができる。 Thereby, the content control device 100 according to the embodiment allows the music content to be controlled easily even when the audio content is being listened to while the user is doing something else.

また、受付部１３７は、出力された音声コンテンツとして、第１の音声コンテンツに挿入された第２の音声コンテンツの再生を制御する操作を行うよう指示する音声指示をユーザから受け付ける。 Further, the receiving unit 137 receives, from the user, a voice instruction to perform an operation for controlling reproduction of the second audio content inserted in the first audio content as the output audio content.

これにより、実施形態にかかるコンテンツ制御装置１００は、第２の音声コンテンツがながら聴取されている場合であっても、当該音楽コンテンツを容易に制御させることができる。 Thereby, the content control device 100 according to the embodiment allows the music content to be controlled easily even when the second audio content is being listened to while the user is doing something else.

また、受付部１３７は、音声コンテンツの再生を制御する操作として、音声コンテンツを巻き戻すよう指示する音声指示を受け付け、コンテンツ制御部１３８は、受付部１３７により音声コンテンツを巻き戻すよう指示する音声指示が受け付けられた場合に、音声コンテンツを巻き戻す。 In addition, the reception unit 137 receives, as an operation for controlling playback of the audio content, a voice instruction to rewind the audio content, and the content control unit 138 rewinds the audio content when the reception unit 137 has received the voice instruction to rewind the audio content.

これにより、実施形態にかかるコンテンツ制御装置１００は、ユーザの一声で音声コンテンツを巻き戻すことができるため、容易に聞き逃し対策を実現することができる。 As a result, the content control apparatus 100 according to the embodiment can rewind the audio content with a single voice of the user, so that it is possible to easily implement countermeasures against missed hearing.

また、受付部１３７は、音声コンテンツの再生を制御する操作として、音声コンテンツを頭出しするよう指示する音声指示を受け付け、コンテンツ制御部１３８は、受付部１３７により音声コンテンツを頭出しするよう指示する音声指示が受け付けられた場合に、音声コンテンツを頭出しする。 Further, the reception unit 137 receives, as an operation for controlling playback of the audio content, a voice instruction to cue the audio content to its beginning, and the content control unit 138 cues the audio content to its beginning when the reception unit 137 has received that voice instruction.

これにより、実施形態にかかるコンテンツ制御装置１００は、ユーザの一声で音声コンテンツを頭出しすることができるため、容易に聞き逃し対策を実現することができる。 As a result, the content control apparatus 100 according to the embodiment can cue audio content with a single voice of the user, so that measures against missed hearing can be easily implemented.

また、受付部１３７は、音声コンテンツの再生を制御する操作として、音声コンテンツが挿入された第１の音声コンテンツから音声コンテンツを抽出するよう指示する音声指示を受け付け、コンテンツ制御部１３８は、受付部１３７により音声コンテンツが挿入された第１の音声コンテンツから音声コンテンツを抽出するよう指示する音声指示が受け付けられた場合に、音声コンテンツを第１の音声コンテンツから抽出する。 In addition, the reception unit 137 receives, as an operation for controlling playback of the audio content, a voice instruction to extract the audio content from the first audio content into which it has been inserted, and the content control unit 138 extracts the audio content from the first audio content when the reception unit 137 has received that voice instruction.

これにより、実施形態にかかるコンテンツ制御装置１００は、ユーザにとって重要な音声部分を保管しておくことができる。 As a result, the content control apparatus 100 according to the embodiment can store audio parts that are important to the user.

また、コンテンツ制御部１３８は、コンテンツ制御装置とは異なる他の装置に対して、第１の音声コンテンツから抽出した音声コンテンツに関するコンテンツを送信する。 Also, the content control unit 138 transmits content related to the audio content extracted from the first audio content to another device different from the content control device.

これにより、実施形態にかかるコンテンツ制御装置１００は、ユーザにとって重要な音声部分をユーザの端末装置に保管させておくことができるため、音声コンテンツを聞きたいときに聞き直しさせることができる。 As a result, the content control apparatus 100 according to the embodiment can store the audio portion that is important for the user in the terminal device of the user, so that the user can listen to the audio content again when he or she wants to hear it.

また、受付部１３７は、音声コンテンツの再生を制御する操作として、音声コンテンツを早送り、または、スキップするよう指示する音声指示を受け付け、コンテンツ制御部１３８は、受付部１３７により音声コンテンツを早送り、または、スキップするよう指示する音声指示が受け付けられた場合に、音声コンテンツを早送り、または、スキップする。 In addition, the reception unit 137 receives, as an operation for controlling playback of the audio content, a voice instruction to fast-forward or skip the audio content, and the content control unit 138 fast-forwards or skips the audio content when the reception unit 137 has received that voice instruction.

これにより、実施形態にかかるコンテンツ制御装置１００は、ユーザの一声で音声コンテンツを早送り、スキップすることができるため、ユーザが不要と感じる音声コンテンツを容易に省略することができる。 As a result, the content control apparatus 100 according to the embodiment can fast-forward or skip audio content with the user's voice, so that the user can easily omit audio content that the user feels is unnecessary.

また、受付部１３７は、音声指示として、音声コンテンツに対するユーザの反応を示す発話での音声指示を受け付ける。 In addition, the reception unit 137 receives, as the voice instruction, a voice instruction in an utterance indicating the user's reaction to the audio content.

これにより、実施形態にかかるコンテンツ制御装置１００は、ユーザが操作に準じた音声指示をしなくとも、反射的につい発してまったような言葉についても音声指示と見なして音声コンテンツを制御することができる。 As a result, even when the user does not give a voice instruction that names an operation, the content control device 100 according to the embodiment can treat words that the user utters reflexively as a voice instruction and control the audio content accordingly.
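The operations summarized above (rewind, cue, extract, fast-forward, skip, and reactive utterances treated as instructions) can be sketched in Python as a simple dispatcher. The phrase tables, the 10-second step, and the `Player` class are hypothetical and chosen only for illustration; in particular, which reactive utterance maps to which operation is an assumption, not something the embodiment states. Speech recognition itself is outside the scope of the sketch, which starts from already-recognized text.

```python
from dataclasses import dataclass, field


@dataclass
class Player:
    position: float        # current playback position in seconds
    segment_start: float   # where the inserted (second) audio content begins
    segment_end: float     # where the inserted (second) audio content ends
    saved: list = field(default_factory=list)  # extracted segments kept for later

    def rewind(self, seconds=10.0):
        self.position = max(self.segment_start, self.position - seconds)

    def cue(self):
        self.position = self.segment_start

    def fast_forward(self, seconds=10.0):
        self.position = min(self.segment_end, self.position + seconds)

    def skip(self):
        self.position = self.segment_end

    def extract(self):
        self.saved.append((self.segment_start, self.segment_end))


# Hypothetical phrase tables: explicit commands and reflexive reactions.
COMMANDS = {"rewind": "rewind", "from the start": "cue", "save that": "extract",
            "fast forward": "fast_forward", "skip": "skip"}
REACTIONS = {"what?": "rewind", "huh?": "rewind", "not interested": "skip"}


def handle_utterance(player, utterance):
    """Map recognized text to an operation and apply it; return the operation name or None."""
    text = utterance.strip().lower()
    op = COMMANDS.get(text) or REACTIONS.get(text)
    if op is None:
        return None
    getattr(player, op)()
    return op
```

For example, with a player positioned at 42 s inside an inserted segment spanning 30 s to 60 s, the utterance "What?" is treated as a rewind instruction and moves the position back to 32 s.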

また、実施形態にかかるコンテンツ制御装置１００は、取得部１３１と、選択部１３５とを有する。取得部１３１は、第１の音声コンテンツに挿入される候補の第２の音声コンテンツを取得する。選択部１３５は、配信先のユーザに関するユーザ情報に基づいて、候補の第２の音声コンテンツのうち、第１の音声コンテンツに挿入される第２の音声コンテンツを選択する。そして、受付部１３７は、第１の音声コンテンツに挿入された第２の音声コンテンツであって選択部１３５により選択された第２の音声コンテンツの再生を制御する操作を行うよう指示する音声指示をユーザから受け付ける。 Also, the content control device 100 according to the embodiment has an acquisition unit 131 and a selection unit 135 . The acquisition unit 131 acquires a second audio content candidate to be inserted into the first audio content. The selection unit 135 selects the second audio content to be inserted into the first audio content from the second audio content candidates based on the user information about the user of the distribution destination. Then, the reception unit 137 issues a voice instruction to perform an operation for controlling reproduction of the second audio content selected by the selection unit 135, which is the second audio content inserted into the first audio content. Accept from users.

また、実施形態にかかるコンテンツ制御装置１００は、取得部１３１と、選択部１３５とを有する。取得部１３１は、第１の音声コンテンツに挿入される候補の第２の音声コンテンツを取得する。選択部１３５は、配信先のユーザに関するユーザ情報に基づいて、候補の第２の音声コンテンツのうち、第１の音声コンテンツに挿入される第２の音声コンテンツを選択する。 Also, the content control device 100 according to the embodiment has an acquisition unit 131 and a selection unit 135 . The acquisition unit 131 acquires a second audio content candidate to be inserted into the first audio content. The selection unit 135 selects the second audio content to be inserted into the first audio content from the second audio content candidates based on the user information about the user of the distribution destination.

これにより、実施形態にかかるコンテンツ制御装置１００は、ユーザが「耳の端で捉えられる」程度に加工および制御した音声コンテンツをユーザに提供することができるため、ながら聴取の中でもユーザの欲する情報を効果的に提供することができる。 As a result, the content control device 100 according to the embodiment can provide the user with audio content that has been processed and controlled to the extent that the user can "catch it with the edge of the ear", and can therefore effectively provide the information the user wants even while the user is listening while doing something else.

また、選択部１３５は、配信先のユーザに関するユーザ情報としてユーザの行動履歴に基づいて、候補の第２の音声コンテンツのうち、第１の音声コンテンツに挿入される対象の第２の音声コンテンツを選択する。 Further, the selection unit 135 selects the second audio content to be inserted into the first audio content from among the candidate second audio content based on the action history of the user as the user information about the user of the distribution destination. select.

これにより、実施形態にかかるコンテンツ制御装置１００は、ユーザが「耳の端で捉えられる」程度の第２の音声コンテンツを提供することができる。 Thereby, the content control apparatus 100 according to the embodiment can provide the second audio content that the user can "catch with the edge of the ear".

また、選択部１３５は、配信先のユーザに関するユーザ情報として、音声コンテンツを出力する所定の出力装置に対するユーザの操作履歴に基づいて、候補の第２の音声コンテンツのうち、第１の音声コンテンツに挿入される対象の第２の音声コンテンツを選択する。 In addition, the selection unit 135 selects the first audio content among the second audio content candidates based on the user's operation history with respect to a predetermined output device that outputs the audio content as the user information about the user of the distribution destination. Selecting the second audio content to be inserted.

これにより、実施形態にかかるコンテンツ制御装置１００は、ユーザ毎に最適化（パーソナライズ化）した第２の音声コンテンツをながら聴取させることができる。 Thereby, the content control apparatus 100 according to the embodiment can make the user listen to the second audio content optimized (personalized) for each user.

また、実施形態にかかるコンテンツ制御装置１００は、決定部１３４を有する。決定部１３４は、ユーザ情報に基づいて、第１の音声コンテンツに挿入される対象の第２の音声コンテンツと、当該対象の第２の音声コンテンツが第１の音声コンテンツに挿入される時間位置であって第１の音声コンテンツにおける時間位置とを決定する。また、選択部１３５は、決定部１３４により決定された第２のコンテンツを対象の第２の音声コンテンツとして選択する。 Also, the content control device 100 according to the embodiment has a determination unit 134. Based on the user information, the determination unit 134 determines the target second audio content to be inserted into the first audio content and the time position, within the first audio content, at which the target second audio content is to be inserted. The selection unit 135 then selects the second content determined by the determination unit 134 as the target second audio content.

これにより、実施形態にかかるコンテンツ制御装置１００は、ユーザ毎に最適化（パーソナライズ化）した第２の音声コンテンツを、ユーザ毎に適した時間に提供することができる。 Thereby, the content control apparatus 100 according to the embodiment can provide the second audio content optimized (personalized) for each user at a time suitable for each user.

また、決定部１３４は、ユーザ情報に基づきユーザがどの時間にどのような音声コンテンツを聞く傾向にあるかが算出された算出結果に基づいて、対象の第２の音声コンテンツと時間位置とを決定する。 Further, the determination unit 134 determines the target second audio content and the time position based on a calculation result obtained by calculating, from the user information, what kind of audio content the user tends to listen to at what time.

これにより、実施形態にかかるコンテンツ制御装置１００は、第２の音声コンテンツに対するパーソナライズ化の精度を高めることができる。 Thereby, the content control apparatus 100 according to the embodiment can improve the accuracy of personalization for the second audio content.
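One possible way to picture the determination described above is the following Python sketch, which derives a listening tendency (the most frequently heard category per hour of day) from a history and uses it to pick a candidate and an insertion offset. The data shapes, the hour-level granularity, and the function names are assumptions made for this example; the embodiment does not state how the tendency is calculated.

```python
from collections import Counter


def listening_tendency(history):
    """history: iterable of (hour_of_day, category). Returns {hour: most frequent category}."""
    counts = {}
    for hour, category in history:
        counts.setdefault(hour, Counter())[category] += 1
    return {hour: c.most_common(1)[0][0] for hour, c in counts.items()}


def decide_insertion(history, candidates, start_hour, duration_hours):
    """Pick (content_id, offset_seconds) for the first hour whose preferred category
    matches a candidate; return None when nothing matches.

    candidates: list of (content_id, category) for the second audio content.
    The offset is measured from the start of the first audio content.
    """
    tendency = listening_tendency(history)
    for offset in range(duration_hours):
        wanted = tendency.get((start_hour + offset) % 24)
        for content_id, category in candidates:
            if category == wanted:
                return content_id, offset * 3600
    return None
```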

また、決定部１３４は、ユーザ情報に基づいて、候補の第２の音声コンテンツのうち、ユーザの生活状況または趣味嗜好に応じた音声コンテンツを第１の音声コンテンツに挿入される対象の第２の音声コンテンツとして決定する。 Further, based on the user information, the determination unit 134 determines, from among the candidate second audio content, audio content that matches the user's living situation or hobbies and preferences as the target second audio content to be inserted into the first audio content.

これにより、実施形態にかかるコンテンツ制御装置１００は、第２の音声コンテンツに対するパーソナライズ化の精度を高めることができるとともに、ユーザにとって興味度の高い音声コンテンツを提供することができる。 As a result, the content control apparatus 100 according to the embodiment can improve the accuracy of personalization of the second audio content, and can provide audio content with a high degree of interest to the user.

また、実施形態にかかるコンテンツ制御装置１００は、挿入部（選択部１３５）を有する。挿入部（選択部１３５）は、第２のコンテンツを決定部１３４により決定された時間位置に挿入する。 Further, the content control device 100 according to the embodiment has an insertion section (selection section 135). The insertion unit (selection unit 135 ) inserts the second content at the time position determined by the determination unit 134 .
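The insertion at a determined time position can be illustrated with the following minimal Python sketch, which treats each audio content as a list of samples. The sample-list representation, the function name, and the clamping of out-of-range positions are assumptions for illustration only; real audio data would also need matching sample rates and channel layouts.

```python
def insert_audio(first, second, position_s, sample_rate):
    """Return a new sample list with `second` inserted into `first` at position_s seconds."""
    index = round(position_s * sample_rate)
    index = max(0, min(index, len(first)))  # clamp to the bounds of the first content
    return first[:index] + second + first[index:]
```

With a sample rate of 4 samples per second, inserting `[9, 9]` into `[1, 2, 3, 4]` at 0.5 s places the new samples after the second sample of the first content.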

また、実施形態にかかるコンテンツ制御装置１００は、出力制御部１３６を有する。出力制御部１３６は、ユーザに関する情報に基づいて、第１の音声コンテンツまたは第２の音声コンテンツの出力を制御する。 Also, the content control device 100 according to the embodiment has an output control unit 136 . The output control unit 136 controls output of the first audio content or the second audio content based on the information about the user.

これにより、実施形態にかかるコンテンツ制御装置１００は、第１の音声コンテンツまたは第２の音声コンテンツの聞き取り易さに貢献することができる。 Thereby, the content control apparatus 100 according to the embodiment can contribute to the ease of listening to the first audio content or the second audio content.

また、出力制御部１３６は、ユーザに関する情報として、ユーザの周辺環境を示す環境情報またはユーザの属性情報に基づいて、第１の音声コンテンツまたは第２の音声コンテンツの出力を制御する。 In addition, the output control unit 136 controls the output of the first audio content or the second audio content based on the environment information indicating the user's surrounding environment or the user's attribute information as information about the user.

また、出力制御部１３６は、ユーザに関する情報に基づいて、第１の音声コンテンツまたは第２の音声コンテンツが出力される際の音量、音色またはピッチを制御する。 The output control unit 136 also controls the volume, tone color, or pitch when the first audio content or the second audio content is output, based on the information about the user.

また、出力制御部１３６は、ユーザに関する情報に基づいて、第１の音声コンテンツまたは第２の音声コンテンツを巻き戻す。 Also, the output control unit 136 rewinds the first audio content or the second audio content based on the information about the user.

また、実施形態にかかるコンテンツ制御装置１００は、変換部１３２を有する。変換部１３２は、所定のコンテンツを音声データに変換する。そして、取得部１３１は、第１の音声コンテンツに挿入される候補の第２の音声コンテンツとして、変換部１３２により変換された音声データを取得する。 Also, the content control device 100 according to the embodiment has a conversion unit 132 . The conversion unit 132 converts predetermined content into audio data. Then, the acquiring unit 131 acquires the audio data converted by the converting unit 132 as a candidate second audio content to be inserted into the first audio content.

これにより、実施形態にかかるコンテンツ制御装置１００は、テキストデータのコンテンツから音声コンテンツを合成することができる。 Thereby, the content control apparatus 100 according to the embodiment can synthesize audio content from text data content.

また、選択部１３５は、配信先のユーザに関するユーザ情報に基づいて、候補の第２の音声コンテンツのうち、第１の音声コンテンツに挿入される第２のコンテンツであって、スマートスピーカーから出力される対象の第２のコンテンツを選択する。 Further, based on the user information about the user of the distribution destination, the selection unit 135 selects, from among the candidate second audio content, the target second content that is to be inserted into the first audio content and output from the smart speaker.

これにより、実施形態にかかるコンテンツ制御装置１００は、スマートスピーカーから音声コンテンツを出力させることでながら聴取を実現することができる。 As a result, the content control apparatus 100 according to the embodiment can realize listening while outputting the audio content from the smart speaker.

以上、本願の実施形態をいくつかの図面に基づいて詳細に説明したが、これらは例示であり、発明の開示の欄に記載の態様を始めとして、当業者の知識に基づいて種々の変形、改良を施した他の形態で本発明を実施することが可能である。 The embodiments of the present application have been described above in detail based on several drawings, but these are examples, and the present invention can be carried out in other forms, including the aspects described in the disclosure of the invention, with various modifications and improvements based on the knowledge of those skilled in the art.

また、上述してきた「部（section、module、unit）」は、「手段」や「回路」などに読み替えることができる。例えば、取得部は、取得手段や取得回路に読み替えることができる。 Also, the above-mentioned "section, module, unit" can be read as "means" or "circuit". For example, the acquisition unit can be read as acquisition means or an acquisition circuit.

１制御システム 1 control system
１０端末装置 10 terminal device
３０出力装置 30 output device
６０外部装置 60 external device
１００コンテンツ制御装置 100 content control device
１２１コンテンツ記憶部 121 content storage unit
１２２行動履歴記憶部 122 action history storage unit
１２３操作履歴記憶部 123 operation history storage unit
１２４ユーザ情報記憶部 124 user information storage unit
１２５配信コンテンツ記憶部 125 distribution content storage unit
１３１取得部 131 acquisition unit
１３２変換部 132 conversion unit
１３３特定部 133 identification unit
１３４決定部 134 determination unit
１３５選択部 135 selection unit
１３６出力制御部 136 output control unit
１３７受付部 137 reception unit
１３８コンテンツ制御部 138 content control unit
１３９配信部 139 distribution unit

Claims

1. A content control device comprising:
an acquisition unit that acquires second audio content, which is content in a category different from that of first audio content;
a selection unit that selects, from among the second audio content acquired by the acquisition unit, second audio content belonging to a category determined based on user information about a user of a distribution destination, as the second audio content to be newly inserted into the first audio content;
a reception unit that receives, from the user, a voice instruction to perform an operation for controlling playback of the second audio content that was selected by the selection unit and inserted into the first audio content so as to be output at a playback position that is included in the first audio content and is determined based on the user's viewing tendency for audio content; and
a content control unit that performs, on the second audio content, the operation corresponding to the voice instruction received by the reception unit.

2. The content control device according to claim 1, wherein the reception unit receives from a user a voice instruction instructing to perform an operation for controlling reproduction of the second audio content.

The receiving unit receives, as an operation for controlling reproduction of the second audio content, a voice instruction instructing to rewind the second audio content,
3. The content control unit rewinds the second audio content when the reception unit receives a voice instruction to rewind the second audio content. 3. The content control device according to .

4. The content control device according to any one of claims 1 to 3, wherein the reception unit receives, as the operation for controlling playback of the second audio content, a voice instruction to cue the second audio content, and
the content control unit cues the second audio content when the reception unit receives the voice instruction to cue the second audio content.

5. The content control device according to any one of claims 1 to 4, wherein the reception unit receives, as the operation for controlling playback of the second audio content, a voice instruction to extract the second audio content from the first audio content into which the second audio content is inserted, and
the content control unit extracts the second audio content from the first audio content when the reception unit receives that voice instruction.

6. The content control device according to claim 5, wherein the content control unit transmits content related to the second audio content extracted from the first audio content to another device different from the content control device.

7. The content control device according to any one of claims 1 to 6, wherein the reception unit receives, as the operation for controlling playback of the second audio content, a voice instruction to fast-forward or skip the second audio content, and
the content control unit fast-forwards or skips the second audio content when the reception unit receives that voice instruction.

8. The content control device according to any one of claims 1 to 7, wherein the reception unit receives, as the voice instruction, an utterance indicating the user's reaction to the audio content.

9. A content control device comprising:
an acquisition unit that acquires second audio content, which is content in a category different from that of first audio content;
a selection unit that selects, from among the second audio content acquired by the acquisition unit, second audio content to be newly inserted into the first audio content, based on user information about a user of a distribution destination; and
an insertion unit that inserts data constituting the second audio content selected by the selection unit into data constituting the first audio content so that the second audio content is output at a playback position that is included in the first audio content and is determined based on the user's viewing tendency for audio content.

10. The content control device according to claim 9, wherein the selection unit selects, from among the second audio content, the second audio content to be inserted into the first audio content based on the user's action history as the user information about the user of the distribution destination.

11. The content control device according to claim 9 or 10, wherein the selection unit selects, from among the second audio content, the second audio content to be inserted into the first audio content based on, as the user information about the user of the distribution destination, the user's operation history with respect to a predetermined output device that outputs audio content.

12. The content control device according to any one of claims 9 to 11, further comprising a determination unit that determines, based on the user information, target second audio content to be inserted into the first audio content and a playback position that is included in the first audio content and at which the target second audio content is inserted into the first audio content,
wherein the selection unit selects the target second audio content determined by the determination unit.

The determining unit selects the second audio content of the target based on the viewing tendency, which is a calculation result of calculating what kind of audio content the user tends to listen to at what time based on the user information. and the playback position.

14. The content control device according to claim 12 or 13, wherein the determination unit determines, based on the user information, audio content among the second audio content that matches the user's living situation or hobbies and preferences as the target second audio content to be inserted into the first audio content.

15. The content control device according to any one of claims 12 to 14, further comprising an insertion unit that inserts the second audio content selected by the selection unit at the playback position determined by the determination unit.

16. The content control device according to any one of claims 9 to 15, further comprising an output control unit that controls output of the first audio content or the second audio content based on information about the user.

17. The content control device according to claim 16, wherein the output control unit controls the output of the first audio content or the second audio content based on, as the information about the user, environment information indicating the user's surrounding environment or attribute information of the user.

17. The output control unit controls volume, timbre, or pitch when the first audio content or the second audio content is output, based on the information about the user. 18. The content control device according to 17.

19. The content control device according to any one of claims 16 to 18, wherein the output control unit rewinds the first audio content or the second audio content based on the information about the user.

20. The content control device according to any one of claims 9 to 19, further comprising a conversion unit that converts predetermined content into audio data,
wherein the acquisition unit acquires the audio data converted by the conversion unit as second audio content to be inserted into the first audio content.

21. The content control device according to any one of claims 9 to 20, wherein the selection unit selects, based on the user information about the user of the distribution destination, from among the second audio content, target second audio content that is to be inserted into the first audio content and output from a smart speaker.

A control method executed by a content control device, the method comprising:
an acquisition step of acquiring second audio content, which is content in a category different from that of first audio content;
a selection step of selecting, from among the second audio content acquired in the acquisition step, second audio content belonging to a category determined based on user information about a user at the distribution destination, as the second audio content to be newly inserted into the first audio content;
a reception step of receiving, from a user, a voice instruction to perform an operation for controlling reproduction of the second audio content selected in the selection step, the selected second audio content having been inserted into the first audio content so as to be output at a playback position that is included in the first audio content and is determined based on the user's viewing tendency for audio content; and
a content control step of performing the operation on the second audio content in accordance with the voice instruction received in the reception step.
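
The reception and content control steps of the method above can be illustrated with a minimal sketch. It assumes the voice instruction has already been transcribed to text; the instruction words ("skip", "pause", "resume", "rewind"), the `PlaybackState` class, and the 10-second rewind amount are hypothetical and do not come from the claims.

```python
from dataclasses import dataclass


@dataclass
class PlaybackState:
    """Hypothetical playback state of the inserted second audio content."""
    position_s: float = 0.0
    playing: bool = True


def apply_voice_instruction(state, instruction, duration_s, rewind_s=10.0):
    """Content control step: apply a transcribed voice instruction.

    `instruction` is assumed to be text produced by a speech recogniser;
    the vocabulary below is illustrative only.
    """
    text = instruction.strip().lower()
    if text == "skip":
        # Jump to the end of the inserted content and stop playing it.
        state.position_s = duration_s
        state.playing = False
    elif text == "pause":
        state.playing = False
    elif text == "resume":
        state.playing = True
    elif text == "rewind":
        # Move back by rewind_s seconds, never before the start.
        state.position_s = max(0.0, state.position_s - rewind_s)
    else:
        raise ValueError(f"unsupported instruction: {instruction!r}")
    return state
```

For example, calling `apply_voice_instruction(PlaybackState(position_s=25.0), "rewind", duration_s=30.0)` moves the playback position back to 15 seconds while leaving playback running.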

A control method executed by a content control device, the method comprising:
an acquisition step of acquiring second audio content, which is content in a category different from that of first audio content;
a selection step of selecting, from among the second audio content acquired in the acquisition step, second audio content to be newly inserted into the first audio content, based on user information about a user at the distribution destination; and
an insertion step of inserting data constituting the second audio content selected in the selection step into data constituting the first audio content, so that the selected second audio content is output at a playback position that is included in the first audio content and is determined based on the user's viewing tendency for audio content.
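
The selection and insertion steps of the method above can be sketched as follows. This is only an illustration under stated assumptions: the `Clip` class, the `preferred_category` key in the user information, and the idea of summarising the viewing tendency as a typical listening fraction are all hypothetical, and audio data is simplified to a list of samples.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    """Hypothetical stand-in for a piece of audio content."""
    title: str
    category: str
    samples: list  # audio data, simplified to a list of numbers


def select_clip(candidates, user_info, first):
    """Selection step: pick a candidate in the user's preferred category
    whose category differs from that of the first audio content."""
    preferred = user_info["preferred_category"]  # assumed key
    for clip in candidates:
        if clip.category == preferred and clip.category != first.category:
            return clip
    return None


def playback_position(first, viewing_tendency):
    """Map a viewing tendency to a sample index inside the first content.

    Assumption: the tendency is given as the fraction of a track the user
    typically listens to, clamped to the range 0.0 to 1.0.
    """
    fraction = min(max(viewing_tendency["typical_listen_fraction"], 0.0), 1.0)
    return int(len(first.samples) * fraction)


def insert_clip(first, second, position):
    """Insertion step: splice the second content's data into the first's."""
    merged = first.samples[:position] + second.samples + first.samples[position:]
    return Clip(first.title, first.category, merged)
```

A caller would run `select_clip`, then `playback_position`, then `insert_clip`; the merged data plays the second audio content when playback of the first reaches the chosen position.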

A control program for causing a computer to execute:
an acquisition procedure of acquiring second audio content, which is content in a category different from that of first audio content;
a selection procedure of selecting, from among the second audio content acquired in the acquisition procedure, second audio content belonging to a category determined based on user information about a user at the distribution destination, as the second audio content to be newly inserted into the first audio content;
a reception procedure of receiving, from a user, a voice instruction to perform an operation for controlling reproduction of the second audio content selected in the selection procedure, the selected second audio content having been inserted into the first audio content so as to be output at a playback position that is included in the first audio content and is determined based on the user's viewing tendency for audio content; and
a content control procedure of performing the operation on the second audio content in accordance with the voice instruction received in the reception procedure.

A control program for causing a computer to execute:
an acquisition procedure of acquiring second audio content, which is content in a category different from that of first audio content;
a selection procedure of selecting, from among the second audio content acquired in the acquisition procedure, second audio content to be newly inserted into the first audio content, based on user information about a user at the distribution destination; and
an insertion procedure of inserting data constituting the second audio content selected in the selection procedure into data constituting the first audio content, so that the selected second audio content is output at a playback position that is included in the first audio content and is determined based on the user's viewing tendency for audio content.