WO2020071255A1 - Information presentation device - Google Patents

Information presentation device

Info

Publication number
WO2020071255A1
Authority
WO
WIPO (PCT)
Prior art keywords
utterance
information
score
unit
sentence
Prior art date
Application number
PCT/JP2019/038045
Other languages
French (fr)
Japanese (ja)
Inventor
優太朗 白水
Original Assignee
NTT DOCOMO, INC. (株式会社NTTドコモ)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NTT DOCOMO, INC. (株式会社NTTドコモ)
Priority to JP2020550375A, granted as patent JP7146933B2
Publication of WO2020071255A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00: Speech synthesis; Text to speech systems
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00: Speech synthesis; Text to speech systems
    • G10L13/08: Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination

Definitions

  • The present disclosure relates to an information providing device.
  • Various studies have been made on how to determine the contents of a response to a user in a dialogue system that converses with the user (for example, see Patent Document 1).
  • The present disclosure has been made in view of the above and provides an information providing apparatus capable of giving a user a more appropriate utterance.
  • An information providing apparatus according to the present disclosure provides voice information to a user by uttering an utterance sentence that is a candidate to be uttered to the user.
  • The apparatus includes a score assigning unit that calculates and assigns, to each utterance sentence, a score that is a numerical value related to the priority of uttering that sentence.
  • The apparatus further includes an utterance standby information holding unit that holds each utterance sentence in association with the score assigned to it by the score assigning unit, an utterance information selection unit that selects the utterance sentence held in association with the highest score among the scores held in the utterance standby information holding unit, and an output unit that outputs the utterance sentence selected by the utterance information selection unit as voice information.
  • According to the present disclosure, an information providing device capable of giving a user a more appropriate utterance is provided.
  • FIG. 1 is a diagram illustrating a schematic configuration of an information providing device.
  • FIG. 2 is a diagram illustrating an utterance management module of the information providing device.
  • FIG. 3 is a diagram illustrating an example of information held in an utterance standby information holding unit of the information providing device.
  • FIG. 4 is a sequence diagram explaining an information provision method performed by the information providing device.
  • FIG. 5 is a diagram illustrating a hardware configuration of the information providing device.
  • FIG. 1 is a diagram illustrating a schematic configuration of an information providing apparatus according to an embodiment of the present disclosure.
  • the information providing device 1 illustrated in FIG. 1 is a device that provides information to a user U by voice.
  • the information providing apparatus 1 has a function of responding by voice in response to an utterance from the user U. That is, the information providing device 1 functions as an interactive device capable of interacting with the user U.
  • the information providing device 1 includes a dialogue module 10, an utterance management module 20, a response information generation module 30, and utterance information generation modules 40A and 40B.
  • The information providing apparatus 1 is characterized in that it provides the user U with information corresponding to specific content that progresses in real time, while the user U is watching or is interested in that content.
  • the specific content that progresses in real time includes, for example, sports, horse racing, stock price fluctuation, and the like. Further, general information such as weather information and general news, or everyday life itself may be treated as “content”.
  • the information providing apparatus 1 provides information to the user U in accordance with the progress of the contents.
  • the content that the user U is watching or is interested in may be referred to as “target content”.
  • the information providing device 1 provides information on the target content to the user U.
  • the information providing device 1 also has a configuration that responds to a question from the user U. Therefore, the information providing device 1 is a device that responds to a question from the user U and voluntarily provides information to the user U.
  • The interaction module 10 of the information providing device 1 is a module serving as an interface for transmitting and receiving voice information to and from the user U; it has a function of receiving the voice uttered by the user U and a function of emitting a voice to the user U. That is, the dialogue module 10 also functions as an output unit that provides voice information to the user U by speaking.
  • the dialogue module 10 may have a function of performing voice recognition processing of the received voice of the user U and converting the voice to text data.
  • the voice information from the user U converted into text data is sent to the utterance management module 20 described later.
  • the dialogue module 10 may have a function of performing a speech synthesis process of text data provided from an utterance management module 20 described later for utterance to the user U.
  • the utterance management module 20 manages voice information uttered to the user U. Although the details will be described later, it manages what information about the target content is provided to the user U at what timing according to the progress of the target content. In addition, it has a function of appropriately responding to an inquiry from the user U when there is an inquiry from the user U.
  • In this way, the utterance management module 20 has a function of managing the provision of information to the user U, including responses from the information providing device 1 to inquiries.
  • The utterance management module 20 accumulates the response sentences generated by the response information generation module 30 and the utterance sentences generated by the utterance information generation modules 40A and 40B, and performs control for outputting them to the user U via the dialogue module 10 at an appropriate timing. Details of this control will be described later.
  • the response information generation module 30 is a module that generates a response sentence for the content of an inquiry from the user U.
  • the response information generation module 30 generates a response sentence to the inquiry from the user U sent from the utterance management module 20.
  • a configuration may be employed in which communication or the like is performed with an external device or the like to obtain necessary information.
  • the response sentence generated by the response information generation module 30 is sent to the utterance management module 20 and output to the user U.
  • Each utterance information generation module 40A, 40B has a function of generating an utterance sentence that spontaneously utters to the user U.
  • The information providing apparatus 1 is shown as an example in which two utterance information generation modules 40A and 40B are used, but the number of utterance information generation modules may be one, or three or more.
  • The utterance information generation module 40A is a module that generates utterance sentences including information directly related to the progress of the target content, while the utterance information generation module 40B is a module that generates utterance sentences concerning related information that is not directly tied to the progress of the target content.
  • This division of roles between the modules may or may not be adopted.
  • the utterance information generation module 40A generates an utterance sentence relating to information relating to the progress of the target content.
  • the utterance information generation module 40A acquires information from the content progress information providing device 50, which is an external device that provides information relating to the progress of the target content, in order to generate an utterance sentence.
  • the content progress information providing device 50 is a device that provides information indicating the progress of the target content in a data format different from that of the text.
  • the information indicating the progress of the target content includes, for example, information indicating the details of play when the target content is a sport. Further, when the target content is a stock price change, information on a stock having a large price change is included.
  • When acquiring these pieces of information from the content progress information providing device 50, the utterance information generation module 40A generates a sentence (a natural sentence) explaining them.
  • the utterance information generation module 40B generates an utterance sentence relating to information related to the target content.
  • the utterance information generation module 40B acquires information from the content progress information providing device 50, which is an external device that provides information relating to the progress of the target content, in order to generate an utterance sentence. Further, information may be acquired from an external DB 60 (database) or the like, which is a device different from the content progress information providing device 50.
  • the information related to the target content includes, for example, information relating to a player who has performed a specific play when the target content is a sport, information describing a specific play, and the like.
  • When the target content is a stock price fluctuation, examples include information on stock prices in the same industry as a stock whose price fluctuation is large, information on related companies, and the like.
  • By combining information on the progress of the target content output from the content progress information providing device 50 with information output from the external DB 60 and the like, the utterance information generation module 40B generates a sentence explaining that information.
  • The utterance sentences generated by the utterance information generation modules 40A and 40B are sent to the utterance management module 20 and output to the user U.
  • the utterance management module 20 includes an utterance standby information holding unit 21, an utterance determination unit 22, and a score providing unit 23.
  • the score assigning unit 23 includes a content score calculating unit 24, an elapsed time score calculating unit 25, and a situation score calculating unit 26.
  • The utterance standby information holding unit 21 has a function of holding the utterance sentences sent from the utterance information generation modules 40A and 40B. That is, it has a function of holding utterance sentences, which are candidates for the voice information to be provided to the user U by utterance.
  • The utterance determination unit 22 has a function of determining whether or not to provide voice information to the user U by utterance and, when uttering, of selecting, from the utterance sentences held in the utterance standby information holding unit 21, the utterance sentence to be output (uttered) as voice information.
  • the score assigning unit 23 assigns a score to each utterance sentence held in the utterance standby information holding unit 21. The score is a numerical value related to the priority of the utterance.
  • An utterance sentence with a higher assigned score is preferentially provided as voice information to the user U.
  • the score is a value set in consideration of the surrounding environment and the like of the user U, and can be said to be a numerical value in consideration of the context.
  • the utterance information generation modules 40A and 40B generate various utterance sentences in accordance with the progress of the target content.
  • If the information providing apparatus 1 output all of these utterance sentences to the user U, the voice information output from the information providing apparatus 1 would become excessive.
  • It is therefore preferable that the content of the audio information output from the information providing apparatus 1 be appropriately changed according to the situation (particularly, the progress of the content).
  • Depending on the situation, the user U may want to know information describing the progress of the target content itself rather than related information on the target content.
  • Conversely, when the information explaining the progress of the target content decreases, providing related information on the target content may attract the interest of the user U.
  • Therefore, not all of the utterance sentences generated by the utterance information generation modules 40A and 40B are uttered; rather, utterance sentences are selected and output in consideration of the progress of the target content.
  • the utterance management module 20 performs this management.
  • The utterance standby information holding unit 21 of the utterance management module 20 accumulates the utterance sentences (utterance candidates) generated by the utterance information generation modules 40A and 40B. The utterance determination unit 22 then determines which of the accumulated utterance sentences is output as voice information. What is used for this determination is the score calculated and assigned by the score assigning unit 23. The utterance determination unit 22 refers to the score assigned by the score assigning unit 23 to each utterance sentence held in the utterance standby information holding unit 21, and selects, as the utterance sentence to be output as voice information, the utterance sentence whose score is the highest and is equal to or greater than a predetermined threshold.
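The selection rule just described can be sketched as follows. This is an illustrative sketch, not code from the patent: the function name `select_utterance`, the pair representation of the holding unit, and the 0.5 threshold are all assumptions.

```python
THRESHOLD = 0.5  # assumed value; the patent leaves the concrete threshold unspecified

def select_utterance(queue, threshold=THRESHOLD):
    """Return the highest-scored sentence if it clears the threshold, else None.

    `queue` is a list of (sentence, score) pairs, standing in for the
    utterance standby information holding unit 21.
    """
    if not queue:
        return None
    best = max(queue, key=lambda pair: pair[1])
    if best[1] >= threshold:
        queue.remove(best)  # an uttered sentence is deleted from the holding unit
        return best[0]
    return None

queue = [("Player A has taken a shot.", 0.8), ("Player A is from XX.", 0.7)]
print(select_utterance(queue))  # -> Player A has taken a shot.
```

When no held sentence reaches the threshold, the function returns `None`, which corresponds to deferring the utterance to the next periodic check.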
  • the score assigning unit 23 calculates a score based mainly on three elements, and calculates a score for each utterance sentence by adding the scores.
  • The three elements are a “content score”, an “elapsed time score”, and a “situation score”; the scores associated with these elements are calculated by the content score calculation unit 24, the elapsed time score calculation unit 25, and the situation score calculation unit 26, respectively.
  • The “content score” is a numerical value calculated based on the content included in the utterance sentence.
  • the utterance sentence includes a plurality of words.
  • the content score is a score given based on these plural words.
  • The method of assigning the content score is not particularly limited, but a simple method is to determine in advance, for each word related to the target content, a score to be assigned to that word, and to add up the scores corresponding to the words included in the utterance sentence. Alternatively, a method may be used in which a feature amount is calculated for each utterance sentence using a machine-learning technique such as deep learning, and the feature amount is used as the score.
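The "simple method" above can be sketched in a few lines. The per-word score table below is invented for illustration and is not taken from the patent.

```python
# Predetermined per-word scores for words related to the target content
# (invented example values for a soccer-like target content).
WORD_SCORES = {"goal": 1.0, "shot": 0.6, "pass": 0.1}

def content_score(sentence_words):
    """Sum the predetermined scores of the words in an utterance sentence."""
    return sum(WORD_SCORES.get(word, 0.0) for word in sentence_words)

print(content_score(["player", "a", "has", "taken", "a", "shot"]))  # -> 0.6
```

Words absent from the table simply contribute 0, so only the content-related vocabulary affects the score.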
  • For example, training (teacher) data may be created by preparing a plurality of pairs, each consisting of a sentence composed of a set of words and a score (a numerical value expressed as a probability from 0 to 1) associated with the importance of that sentence.
  • The score of the target utterance sentence may then be calculated from this training data using a deep learning method such as a CNN (Convolutional Neural Network).
  • the deep learning method or the machine learning method used for calculating the score is not particularly limited.
  • the “elapsed time score” is a numerical value corresponding to the elapsed time since the information providing apparatus 1 previously output the voice information to the user U.
  • the information providing apparatus 1 changes the score according to the length of time since the last time the audio information was output to the user U.
  • As an example, the elapsed time score calculation unit 25 may prepare in advance a formula for calculating a score from the elapsed time and automatically determine the value to assign from that formula; however, the calculation is not limited to this method.
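The text only says that a calculation formula is prepared in advance; the saturating curve below is one assumed example, in which the score grows from 0 toward 1 as the silence since the previous utterance lengthens. The 60-second time constant is an invented value.

```python
import math

def elapsed_time_score(seconds_since_last_utterance, time_constant=60.0):
    """Score rises with elapsed time, discouraging back-to-back utterances."""
    return 1.0 - math.exp(-seconds_since_last_utterance / time_constant)

print(round(elapsed_time_score(0.0), 3))   # 0.0 right after an utterance
print(round(elapsed_time_score(60.0), 3))  # ~0.632 after one minute of silence
```

Any monotonically increasing formula would serve the same purpose of suppressing utterances that would follow too soon after the previous one.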
  • The “situation score” is a numerical value determined based on the progress of the target content, indicating the necessity of providing information according to the status of the target content. For example, in a tense situation of the target content, the demand of the user U for audio information directly related to the target content may increase, so a higher score may be set for such information. Conversely, in a situation where the target content is not tense, the user U may be more interested in related information, so a higher score may be set for the related information. In this way, the situation score is a score set according to the progress of the target content.
  • the situation score calculation unit 26 may be configured to calculate the situation score using the information on the progress of the content from the content progress information providing device 50 acquired by the utterance information generation modules 40A and 40B.
  • As a method of calculating the situation score, the situation score calculation unit 26 can use a method in which a specific score is determined in advance for highly relevant words included in the information on the progress of the content provided from the content progress information providing device 50.
  • For example, when the target content is soccer, a method is conceivable in which a score of 1.0 is given to a “goal”, considered a big event, and a score of 0.1 is given to a “pass”, which tends to occur frequently.
  • a method of converting the position of the ball being played into a score and using the score may be considered.
  • the target content is a stock price fluctuation, for example, a “fall” or “stop height” of the stock price may be a word to which a score is given.
  • a situation score can be given even when the target content is a general situation such as everyday life in general. Specifically, a high-value score is set in advance for a word indicating an event (for example, information indicating a change in weather) that is assumed that the user U desires to provide audio information from the information providing apparatus 1, A low score can be set in advance to a word indicating an event that is assumed that the user U does not want to provide the voice information (for example, information indicating that the user U is going out). With such a configuration, even when the target content is not limited to a specific category such as sports, a situation score can be given.
  • As another example, a method may be employed in which, at the timing when the utterance sentence is generated, the situation score calculation unit 26 takes, from the information on the progress of the content provided from the content progress information providing device 50, the latest piece of information (or a predetermined number of the latest pieces), extracts the specific words for which scores have been determined, calculates the sum of those scores, and uses the sum as the situation score.
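This sum-over-recent-events method can be sketched as follows, reusing the soccer example scores above ("goal" = 1.0, "pass" = 0.1). The event strings and names are illustrative assumptions.

```python
# Predetermined scores for words appearing in progress information
# (the soccer example values from the text).
EVENT_WORD_SCORES = {"goal": 1.0, "pass": 0.1}

def situation_score(latest_events, table=EVENT_WORD_SCORES):
    """Sum the predetermined scores of scored words found in recent events."""
    return sum(table.get(word, 0.0)
               for event in latest_events
               for word in event.split())

print(situation_score(["pass to the wing", "pass", "goal scored"]))  # -> 1.2
```

A burst of high-scoring events (such as a goal) thus raises the situation score at exactly the moments when the user is likely to want progress information.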
  • the method of giving the situation score is not particularly limited, and various methods can be used.
  • a technique such as machine learning may be used for giving the situation score, similarly to the content score.
  • the situation score may be calculated based on information different from the information acquired by the utterance information generation modules 40A and 40B.
  • the score calculated by each unit may be represented by a numerical value of 0 or more, or may be represented by a probability (0 to 1).
  • the content score calculation unit 24, the elapsed time score calculation unit 25, and the situation score calculation unit 26 calculate scores for one utterance sentence from different viewpoints. Then, the score assigning unit 23 calculates a score for one utterance sentence by adding the scores calculated by the respective units.
  • the method of adding the scores is not particularly limited, and may be, for example, a simple addition.
  • When the content score, the elapsed time score, and the situation score are each calculated as a probability (0 to 1), the sum of the logarithms of the scores may be calculated, to avoid underflow, and used as the score for one utterance sentence.
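The log-sum combination can be sketched as below: when each score is a probability in (0, 1], summing logarithms instead of multiplying the raw probabilities avoids numerical underflow while preserving the ranking between sentences. The function name is an assumption.

```python
import math

def combined_score(content, elapsed, situation):
    """Combine three probability-valued scores as a sum of logarithms."""
    return sum(math.log(s) for s in (content, elapsed, situation))

# A sentence that scores higher on every element also ranks higher overall.
print(combined_score(0.9, 0.8, 0.9) > combined_score(0.2, 0.1, 0.2))  # True
```

The resulting values are non-positive (log of a probability), but only their relative order matters for choosing which sentence to utter.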
  • In this way, a score is calculated and assigned by the score assigning unit 23.
  • The utterance standby information holding unit 21 holds each utterance sentence waiting to be uttered by the information providing apparatus 1 (i.e., a queue) together with its assigned score.
  • For example, a score of 0.8 is associated with the utterance sentence “Player A has taken a shot.”
  • A score of 0.7 is associated with the utterance sentence “Player A is from XX.”
  • In this manner, the utterance standby information holding unit 21 holds the utterance sentences generated by the utterance information generation modules 40A and 40B in a state where scores are individually assigned to them.
  • The score associated with an utterance sentence held in the utterance standby information holding unit 21 may be updated as appropriate. For example, the score given to an utterance sentence may become inappropriate as the target content progresses. Also, when an utterance using a held utterance sentence is made, or when the information providing apparatus 1 outputs voice in response to an inquiry from the user U, the elapsed time is reset. Therefore, the scores held in the utterance standby information holding unit 21 may be updated by the score assigning unit 23 as appropriate (for example, at a predetermined timing).
  • The utterance determination unit 22 refers to the utterance standby information holding unit 21 periodically (for example, every several hundred milliseconds). It then refers to the utterance sentence having the highest score among the held utterance sentences and, when the score given to that sentence is equal to or greater than a predetermined threshold, determines to utter to the user U based on that sentence and selects it. That is, the utterance determination unit 22 functions as an utterance information selection unit that selects the utterance sentence to be provided as voice information to the user U.
  • When the score given to the highest-scoring utterance sentence held in the utterance standby information holding unit 21 is smaller than the threshold, the utterance determination unit 22 may determine that there is no need to utter and defer the utterance until the next opportunity.
  • The utterance sentence that the utterance determination unit 22 has determined to utter is deleted from the utterance standby information holding unit 21. This prevents the same utterance sentence from being uttered again. Further, an utterance sentence that has remained in the utterance standby information holding unit 21 for more than a predetermined time can also be deleted, since its likelihood of being uttered has decreased. Such a configuration prevents utterance sentences that are unlikely to be uttered in the future from being held for a long time and increasing the amount of data.
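The holding-unit maintenance described above can be sketched as a time-to-live filter. The 120-second TTL and the `(sentence, score, enqueued_at)` entry shape are assumptions for illustration.

```python
def expire_stale(queue, now, ttl_seconds=120.0):
    """Keep only (sentence, score, enqueued_at) entries younger than the TTL."""
    return [entry for entry in queue if now - entry[2] <= ttl_seconds]

queue = [("fresh sentence", 0.7, 1000.0),   # enqueued 10 s ago
         ("stale sentence", 0.9, 800.0)]    # enqueued 210 s ago
print(expire_stale(queue, now=1010.0))      # only the fresh entry survives
```

Running this filter at each periodic check keeps the held data bounded even when many low-priority sentences are generated.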
  • the timing at which the utterance determination unit 22 refers to the utterance standby information holding unit 21 may be appropriately changed according to the presence or absence of an utterance (such as an inquiry) from the user U. For example, when there is an utterance from the user U, the information providing device 1 gives priority to responding to the utterance from the user U, and the spontaneous utterance from the information providing device 1 is omitted. Therefore, in such a state, the utterance determination unit 22 may omit reference to the utterance standby information holding unit 21 itself.
  • the utterance management module 20 also manages a response to an utterance from the user U. Therefore, the utterance management module 20 manages the information to be uttered based on the priority of the utterance and the like so that the response to the utterance from the user U and the spontaneous provision of the voice information can be compatible.
  • FIG. 4 illustrates a procedure related to the voluntary provision of audio information from the information providing apparatus 1 side. Note that FIG. 4 does not describe the procedure relating to a response to an utterance from the user U.
  • the utterance information generation modules 40A and 40B of the information providing device 1 acquire information on the progress of the target content from the content progress information providing device 50 (S01).
  • the utterance information generation modules 40A and 40B generate an utterance sentence for the information providing apparatus 1 to output voice to the user U based on the content progress information from the content progress information providing apparatus 50 (S02).
  • the generated utterance sentence is sent from the utterance information generation modules 40A and 40B to the utterance management module 20 (S03).
  • In the score assigning unit 23, the score of the utterance sentence is calculated (S04). This calculation includes the process of computing the individual scores and adding them together.
  • the utterance waiting information holding unit 21 holds the utterance sentence and a score corresponding to the utterance sentence (S05).
  • That is, once the score is calculated (S04), it is associated with the utterance sentence and held in the utterance standby information holding unit 21 (S05).
  • the score associated with the utterance sentence held in the utterance standby information holding unit 21 may be updated by the score assigning unit 23 as necessary (S06).
  • The utterance determination unit 22 of the utterance management module 20 refers to the utterance standby information holding unit 21 and determines whether to utter any of the utterance sentences held there (S07).
  • the utterance determination unit 22 determines that no utterance is made, the subsequent processing is not performed, and the utterance determination (S07) is repeated periodically.
  • the utterance sentence to be output as the voice information is sent to the dialogue module 10 by the utterance determination unit 22 (S08). Then, the utterance sentence is converted into a voice by the dialogue module 10 and output to the user U, that is, the utterance is performed (S09).
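The sequence of steps S01 through S09 can be sketched end to end under the same assumptions as the earlier snippets. Sentence generation and scoring here are simplified stand-ins, not the patent's actual methods, and all names and values are illustrative.

```python
THRESHOLD = 0.5
WORD_SCORES = {"goal": 1.0, "pass": 0.1}

def run_cycle(progress_events, holding_unit):
    for event in progress_events:                  # S01: progress info arrives
        sentence = f"Event: {event}"               # S02: generate utterance sentence
        score = WORD_SCORES.get(event, 0.0)        # S04: assign a score
        holding_unit.append((sentence, score))     # S05: hold sentence with score
    if holding_unit:                               # S07: utterance determination
        best = max(holding_unit, key=lambda p: p[1])
        if best[1] >= THRESHOLD:
            holding_unit.remove(best)              # an uttered sentence is deleted
            return best[0]                         # S08-S09: output as voice
    return None

holding_unit = []
print(run_cycle(["pass", "goal"], holding_unit))   # -> Event: goal
```

After the cycle, the low-scoring "pass" sentence remains queued, where a later score update or expiry would decide its fate.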
  • As described above, the information providing apparatus 1 is an information providing apparatus that provides voice information to the user U by uttering utterance sentences that are candidates to be uttered to the user U.
  • It includes the score assigning unit 23, which calculates and assigns, to each utterance sentence, a score that is a numerical value related to the priority of uttering that sentence, and the utterance standby information holding unit 21, which holds each utterance sentence in association with the score assigned to it by the score assigning unit.
  • It further includes the utterance determination unit 22, serving as an utterance information selection unit that selects the utterance sentence held in association with the highest score among the scores held in the utterance standby information holding unit 21, and the dialogue module 10, serving as an output unit that outputs the selected utterance sentence as voice information.
  • In the information providing apparatus 1, a score, which is a numerical value related to the priority of uttering an utterance sentence, is calculated and assigned to each utterance sentence, and is held by the utterance standby information holding unit 21 in association with that sentence.
  • the utterance sentence associated with the highest score is selected by the utterance determination unit 22 as the utterance information selection unit, and is output as voice information by the dialogue module 10 as the output unit.
  • With such a configuration, an utterance sentence can be selected and output as audio information based on the score related to its priority, making it possible to give the user a more appropriate utterance.
  • The utterance determination unit 22, as the utterance information selection unit, selects the utterance sentence held in association with the highest score among the scores held in the utterance standby information holding unit 21 when that score satisfies a predetermined condition. With such a configuration, the utterance sentence output as voice information is one whose score not only is the highest but also satisfies the additional condition. Therefore, a more appropriate utterance can be given to the user according to the priority.
  • In the embodiment, the predetermined condition is that the score is equal to or greater than a predetermined threshold.
  • the score assigning unit 23 is configured to calculate a score based on a content score corresponding to the content of the utterance sentence.
  • the score assigning unit 23 is configured to calculate the score based on the elapsed time score related to the elapsed time from the previous utterance by the own device.
  • With this configuration, a score that takes the elapsed time into account can be given. For example, an utterance can be prevented when the elapsed time from the previous utterance is too short, enabling a more appropriate utterance to the user.
  • the score assigning unit 23 is configured to calculate a score based on a situation score relating to the situation of the content to which the audio information is provided.
  • the information providing apparatus 1 described in the above embodiment is not limited to the above configuration, and various changes can be made.
  • each module constituting the information providing device 1 may be an individual device. Further, each module may be configured by a plurality of devices.
  • The score assigned by the score assigning unit 23 need not include all of the content score, the elapsed time score, and the situation score; any of them may be omitted.
  • the score assigning unit 23 may calculate the score based on information different from the above three scores. Further, the score assigning unit 23 may calculate a score by combining the above three scores and a score calculated based on other information. As described above, the method of calculating the score provided by the score providing unit 23 can be appropriately changed.
  • a configuration is described in which the utterance determination unit 22 determines that an utterance sentence is to be uttered as voice information when its score is the highest and is equal to or greater than a predetermined threshold.
  • the utterance determination unit 22 may be configured to select the utterance sentence having the highest score among the utterance sentences held in the utterance standby information holding unit 21 when other conditions are satisfied.
  • the other condition may be, for example, that the length of the utterance sentence is equal to or less than a predetermined number of characters.
  • other predetermined conditions may be set in consideration of functions related to the output of voice information from the information providing device 1 and the like.
  • the utterance determination unit 22 may be configured to select the utterance sentence having the highest score from the utterance sentences held in the utterance standby information holding unit 21.
  • the output frequency of the voice information may be adjusted by adjusting the timing at which the utterance determination unit 22 selects utterance sentences (for example, selecting utterance sentences periodically).
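  • the selection conditions described in these variations (highest score, a score threshold, a maximum sentence length) could be sketched as follows; the threshold and character-limit values are assumptions for illustration, not values specified by the patent:

```python
from typing import List, Optional, Tuple

SCORE_THRESHOLD = 0.6   # assumed value for the predetermined threshold
MAX_CHARS = 60          # assumed value for the predetermined number of characters

def select_utterance(held: List[Tuple[str, float]]) -> Optional[str]:
    """Pick the highest-scoring held utterance sentence, and emit it only
    when the predetermined conditions (threshold, length) are satisfied."""
    if not held:
        return None
    sentence, score = max(held, key=lambda pair: pair[1])
    if score >= SCORE_THRESHOLD and len(sentence) <= MAX_CHARS:
        return sentence
    return None
```

  A periodic selection timing, as mentioned above, would then amount to calling `select_utterance` on a timer, so that lowering the call frequency lowers the output frequency of voice information.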
  • each functional block may be realized by one device that is physically and/or logically coupled, or by a plurality of devices that are physically and/or logically separated and connected directly and/or indirectly (for example, by wire and/or wirelessly).
  • the information providing apparatus 1 may function as a computer that performs processing of the information providing apparatus 1 according to the present embodiment.
  • FIG. 5 is a diagram illustrating an example of a hardware configuration of the information providing apparatus 1 according to the present embodiment.
  • the information providing device 1 described above may be physically configured as a computer device including a processor 1001, a memory 1002, a storage 1003, a communication device 1004, an input device 1005, an output device 1006, a bus 1007, and the like.
  • the term “apparatus” can be read as a circuit, a device, a unit, or the like.
  • the hardware configuration of the information providing apparatus 1 may be configured to include one or more devices illustrated in the drawing, or may be configured without including some devices.
  • the functions of the information providing apparatus 1 are realized by reading predetermined software (a program) onto hardware such as the processor 1001 and the memory 1002, with the processor 1001 performing arithmetic operations and controlling communication by the communication device 1004 and the reading and/or writing of data in the memory 1002 and the storage 1003.
  • the processor 1001 controls the entire computer by operating an operating system, for example.
  • the processor 1001 may be configured by a central processing unit (CPU) including an interface with peripheral devices, a control device, an arithmetic device, a register, and the like.
  • each function of the information providing device 1 may be realized by the processor 1001.
  • the processor 1001 reads out a program (program code), software modules, and data from the storage 1003 and/or the communication device 1004 to the memory 1002, and executes various processes according to these.
  • as the program, a program that causes a computer to execute at least a part of the operations described in the above embodiment is used.
  • each function of the information providing apparatus 1 may be realized by a control program stored in the memory 1002 and operated by the processor 1001.
  • Processor 1001 may be implemented with one or more chips.
  • the program may be transmitted from a network via a telecommunication line.
  • the memory 1002 is a computer-readable recording medium, and may be configured by at least one of a ROM (Read Only Memory), an EPROM (Erasable Programmable ROM), an EEPROM (Electrically Erasable Programmable ROM), a RAM (Random Access Memory), and the like.
  • the memory 1002 may be called a register, a cache, a main memory (main storage device), or the like.
  • the memory 1002 can store a program (program code), a software module, and the like that can be executed to perform the method according to an embodiment of the present disclosure.
  • the storage 1003 is a computer-readable recording medium, and may be configured by at least one of, for example, an optical disk such as a CD-ROM (Compact Disc ROM), a hard disk drive, a flexible disk, a magneto-optical disk (for example, a compact disc, a digital versatile disc, a Blu-ray (registered trademark) disc), a smart card, a flash memory (for example, a card, a stick, a key drive), a floppy (registered trademark) disk, a magnetic strip, and the like.
  • the storage 1003 may be called an auxiliary storage device.
  • the storage medium described above may be, for example, a database including the memory 1002 and / or the storage 1003, a server, or any other suitable medium.
  • the communication device 1004 is hardware (transmitting / receiving device) for performing communication between computers via a wired and / or wireless network, and is also referred to as, for example, a network device, a network controller, a network card, a communication module, or the like.
  • each function of the information providing device 1 may be realized by the communication device 1004.
  • the input device 1005 is an input device (for example, a keyboard, a mouse, a microphone, a switch, a button, a sensor, and the like) that receives an external input.
  • the output device 1006 is an output device that performs output to the outside (for example, a display, a speaker, an LED lamp, and the like). Note that the input device 1005 and the output device 1006 may have an integrated configuration (for example, a touch panel).
  • the devices such as the processor 1001 and the memory 1002 are connected by a bus 1007 for communicating information.
  • the bus 1007 may be configured by a single bus, or may be configured by a different bus between the devices.
  • the information providing device 1 may be configured to include hardware such as a microprocessor, a digital signal processor (DSP), an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), and an FPGA (Field Programmable Gate Array), and some or all of the functional blocks may be realized by such hardware.
  • the processor 1001 may be implemented by at least one of these hardware.
  • the notification of information may be performed by physical layer signaling (for example, DCI (Downlink Control Information), UCI (Uplink Control Information)), upper layer signaling (for example, RRC (Radio Resource Control) signaling, MAC (Medium Access Control) signaling, broadcast information (MIB (Master Information Block), SIB (System Information Block))), other signals, or a combination thereof.
  • the RRC signaling may be referred to as an RRC message, and may be, for example, an RRC connection setup (RRC Connection Setup) message, an RRC connection reconfiguration (RRC Connection Reconfiguration) message, or the like.
  • each aspect/embodiment described in the present disclosure may be applied to systems using LTE (Long Term Evolution), LTE-A (Long Term Evolution-Advanced), SUPER 3G, IMT-Advanced, 4G (4th generation mobile communication system), 5G (5th generation mobile communication system), FRA (Future Radio Access), NR (New Radio), W-CDMA (registered trademark), GSM (registered trademark), CDMA2000 (Code Division Multiple Access 2000), UMB (Ultra Mobile Broadband), IEEE 802.11 (Wi-Fi (registered trademark)), IEEE 802.16 (WiMAX (registered trademark)), IEEE 802.20, UWB (Ultra-WideBand), Bluetooth (registered trademark), and other suitable systems, and/or to at least one of next-generation systems extended based on these.
  • a plurality of systems may be combined (for example, a combination of at least one of LTE and LTE-A with 5G) and applied.
  • Information and the like can be output from an upper layer (or lower layer) to a lower layer (or upper layer). Input and output may be performed via a plurality of network nodes.
  • Input and output information and the like may be stored in a specific place (for example, a memory) or may be managed by a management table. Information that is input and output can be overwritten, updated, or added. The output information or the like may be deleted. The input information or the like may be transmitted to another device.
  • the determination may be made based on a value represented by one bit (0 or 1), a Boolean value (true or false), or a comparison of numerical values (for example, a comparison with a predetermined value).
  • Each aspect/embodiment described in the present disclosure may be used alone, may be used in combination, or may be switched during execution. Further, the notification of predetermined information (for example, notification of "X") is not limited to being performed explicitly, and may be performed implicitly (for example, by not performing the notification of the predetermined information).
  • software, instructions, and the like may be transmitted and received via a transmission medium.
  • when software is transmitted from a website, server, or other remote source using wired technology such as coaxial cable, fiber optic cable, twisted pair and digital subscriber line (DSL) and/or wireless technology such as infrared, radio and microwave, these wired and/or wireless technologies are included within the definition of transmission medium.
  • the information, signals, etc. described in this disclosure may be represented using any of a variety of different technologies.
  • data, instructions, commands, information, signals, bits, symbols, chips, and the like that may be referred to throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or magnetic particles, optical fields or photons, or any combination thereof.
  • the terms "system" and "network" are used interchangeably.
  • the information, parameters, and the like described in the present disclosure may be represented by an absolute value, a relative value from a predetermined value, or another corresponding information.
  • the terms "determining" and "deciding" may encompass a wide variety of operations. "Determining" and "deciding" may include, for example, regarding judging, calculating, computing, processing, deriving, investigating, looking up (for example, searching in a table, a database, or another data structure), and ascertaining as having been "determined" or "decided". "Determining" and "deciding" may also include regarding receiving (for example, receiving information), transmitting (for example, transmitting information), input, output, and accessing (for example, accessing data in a memory) as having been "determined" or "decided".
  • "determining" and "deciding" may include regarding resolving, selecting, choosing, establishing, comparing, and the like as having been "determined" or "decided".
  • "determining" and "deciding" may include regarding any operation as having been "determined" or "decided".
  • the terms "connected" and "coupled", and any variations thereof, mean any direct or indirect connection or coupling between two or more elements, and may include the presence of one or more intermediate elements between two elements that are "connected" or "coupled" to each other.
  • the coupling or connection between the elements may be physical, logical, or a combination thereof.
  • the two elements may be considered "connected" or "coupled" to each other by using one or more wires, cables, and/or printed electrical connections, and, as some non-limiting and non-exhaustive examples, by using electromagnetic energy such as electromagnetic energy having wavelengths in the radio frequency, microwave, and light (both visible and invisible) regions.
  • the term “A and B are different” may mean that “A and B are different from each other”.
  • the term may also mean that "A and B are each different from C".
  • terms such as "separate" and "coupled" may be interpreted similarly to "different".
  • DESCRIPTION OF SYMBOLS: 1 ... Information providing device, 10 ... Dialogue module, 20 ... Utterance management module, 21 ... Utterance standby information holding unit, 22 ... Utterance determination unit, 23 ... Score assigning unit, 24 ... Content score calculation unit, 25 ... Elapsed time score calculation unit, 26 ... Situation score calculation unit, 30 ... Response information generation module, 40A, 40B ... Utterance information generation module.

Abstract

An information presentation device (1) presents speech information to a user U via utterance, the information presentation device comprising: a scoring unit (23) that calculates a score that is a numerical value related to the priority of utterance of an utterance sentence and gives the score to an utterance sentence that is a candidate as an utterance to the user U; a stand-by utterance information holding unit (21) that holds, in association with each other, an utterance sentence and a score given by the scoring unit to the utterance sentence; an utterance determination unit (22) that serves as an utterance information selection unit for selecting the utterance sentence associated with the highest score held in the stand-by utterance information holding unit (21); and a dialog module (10) that serves as an output unit for outputting as speech information the utterance sentence selected by the utterance determination unit (22) that serves as the utterance information selection unit.

Description

Information providing device
 The present disclosure relates to an information providing device.
 Various studies have been made on how to determine the content of a response to a user in a dialogue system for performing a dialogue with the user (for example, see Patent Document 1).
Japanese Patent Application Laid-Open No. 2018-109663
 However, there has been room for study on how to determine the content that the system actively utters to the user.
 The present disclosure has been made in view of the above, and provides an information providing apparatus capable of giving a more appropriate utterance to a user.
 In order to achieve the above object, an information providing apparatus according to an exemplary embodiment of the present disclosure is an information providing apparatus that provides voice information to a user by utterance, including: a score assigning unit that calculates and assigns, to an utterance sentence that is a candidate to be uttered to the user, a score that is a numerical value related to the priority of uttering the utterance sentence; an utterance standby information holding unit that holds the utterance sentence and the score assigned to the utterance sentence by the score assigning unit in association with each other; an utterance information selection unit that selects the utterance sentence held in association with the highest score among the scores held in the utterance standby information holding unit; and an output unit that outputs the utterance sentence selected by the utterance information selection unit as voice information.
 According to the present disclosure, an information providing device capable of giving a more appropriate utterance to a user is provided.
FIG. 1 is a diagram illustrating a schematic configuration of the information providing device. FIG. 2 is a diagram illustrating the utterance management module of the information providing device. FIG. 3 is a diagram illustrating an example of information held in the utterance standby information holding unit of the information providing device. FIG. 4 is a sequence diagram illustrating an information providing method by the information providing device. FIG. 5 is a diagram illustrating a hardware configuration of the information providing device.
 Hereinafter, exemplary embodiments of the present disclosure will be described with reference to the accompanying drawings. In the description of the drawings, the same elements are denoted by the same reference signs, and redundant description is omitted.
 FIG. 1 is a diagram illustrating a schematic configuration of an information providing apparatus according to an embodiment of the present disclosure. The information providing device 1 illustrated in FIG. 1 is a device that provides information to a user U by voice. The information providing device 1 also has a function of responding by voice to an utterance from the user U. That is, the information providing device 1 functions as a dialogue device capable of interacting with the user U. The information providing device 1 includes a dialogue module 10, an utterance management module 20, a response information generation module 30, and utterance information generation modules 40A and 40B.
 The information providing apparatus 1 according to the present embodiment is characterized in that, when the user U is watching or showing interest in specific content that progresses in real time, it provides the user U with information corresponding to that content. Examples of content that progresses in real time include sports, horse racing, and stock price fluctuations. General information such as weather information and general news, or everyday life itself, may also be treated as "content". When the user U is watching such content, the information providing apparatus 1 provides information to the user U in accordance with the progress of the content. Hereinafter, the content that the user U is watching or showing interest in may be referred to as the "target content". The information providing apparatus 1 provides the user U with information on the target content.
 Providing information to the user U in the above case does not mean responding to a question from the user U, but refers to the information providing apparatus 1 voluntarily providing information to the user U. Note that the information providing apparatus 1 also has a configuration for responding to questions from the user U. Therefore, the information providing apparatus 1 is a device that responds to questions from the user U and also voluntarily provides information to the user U.
 The dialogue module 10 of the information providing device 1 is a module serving as an interface for transmitting and receiving information by voice to and from the user U, and has a function of receiving voice uttered by the user U and a function of emitting voice to the user U. That is, the dialogue module 10 also functions as an output unit that provides voice information to the user U by utterance.
 The function of receiving voice uttered by the user U may be provided by, for example, a microphone, and the function of emitting voice to the user U may be provided by, for example, a speaker. The dialogue module 10 may also have a function of performing voice recognition processing on the received voice of the user U and converting it into text data. The voice information from the user U converted into text data is sent to the utterance management module 20 described later. Furthermore, the dialogue module 10 may have a function of performing speech synthesis processing on text data provided from the utterance management module 20 for utterance to the user U. When the dialogue module 10 performs the above voice recognition processing and speech synthesis processing, text data is used for transmitting and receiving information (information provided to the user U or information obtained from the user U) between the dialogue module 10 and the utterance management module 20.
 The utterance management module 20 manages the voice information uttered to the user U. Although details will be described later, it manages what information about the target content is provided to the user U, and at what timing, according to the progress of the target content. It also has a function of appropriately responding to inquiries from the user U. The utterance management module 20 thus has a function of managing information provision to the user U, including responses to inquiries made to the information providing device 1. The utterance management module 20 accumulates the response sentences generated by the response information generation module 30 and the utterance sentences generated by the utterance information generation modules 40A and 40B, and controls their output to the user U via the dialogue module 10 at an appropriate timing. Details of this control will be described later.
 The response information generation module 30 is a module that generates a response sentence to the content of an inquiry from the user U. The response information generation module 30 generates a response sentence to the inquiry from the user U sent from the utterance management module 20. When generating the response sentence, it may communicate with an external device or the like to obtain necessary information. The response sentence generated by the response information generation module 30 is sent to the utterance management module 20 and output to the user U.
 Each of the utterance information generation modules 40A and 40B has a function of generating utterance sentences to be spoken spontaneously to the user U. The information providing apparatus 1 is shown here with two utterance information generation modules 40A and 40B, but the number of utterance information generation modules may be one, or three or more. In the information providing apparatus 1, the utterance information generation module 40A is a module that generates utterance sentences including information directly related to the progress of the target content, and the utterance information generation module 40B is a module that generates utterance sentences relating to information that is not directly related to the progress of the target content but is related to the content. However, when a plurality of utterance information generation modules are provided, such a division of roles may or may not be applied.
 The utterance information generation module 40A generates utterance sentences relating to the progress of the target content. To generate an utterance sentence, the utterance information generation module 40A acquires information from the content progress information providing device 50, an external device that provides information relating to the progress of the target content. The content progress information providing device 50 is a device that provides information indicating the progress of the target content in a data format different from text. The information indicating the progress of the target content is, for example, information indicating the details of a play when the target content is a sport, or information on stocks with large price fluctuations when the target content is stock price fluctuations. Upon acquiring such information from the content progress information providing device 50, the utterance information generation module 40A generates sentences (natural sentences) explaining the information.
 The utterance information generation module 40B generates utterance sentences relating to information related to the target content. To generate an utterance sentence, the utterance information generation module 40B acquires information from the content progress information providing device 50, and may also acquire information from an external DB 60 (database) or the like, which is a device different from the content progress information providing device 50. The information related to the target content is, for example, information relating to a player who has performed a specific play or information explaining a specific play when the target content is a sport, or information on stock prices in the same industry as a stock with large price fluctuations or information on related companies when the target content is stock price fluctuations. By combining the information relating to the progress of the target content output from the content progress information providing device 50 with the information output from the external DB 60 and the like, the utterance information generation module 40B generates sentences explaining the information.
 The utterance sentences generated by the utterance information generation modules 40A and 40B are sent to the utterance management module 20 and output to the user U.
 Next, the utterance management module 20 will be further described with reference to FIG. 2. As shown in FIG. 2, the utterance management module 20 includes an utterance standby information holding unit 21, an utterance determination unit 22, and a score assigning unit 23. The score assigning unit 23 includes a content score calculation unit 24, an elapsed time score calculation unit 25, and a situation score calculation unit 26.
 The utterance standby information holding unit 21 has a function of holding the utterance sentences sent from the utterance information generation modules 40A and 40B, that is, a function of holding utterance sentences that are candidate information for providing voice information to the user U by utterance. The utterance determination unit 22 determines whether or not to provide voice information to the user U by utterance and, when uttering, has a function of selecting the utterance sentence to be output (uttered) as voice information from the utterance sentences held in the utterance standby information holding unit 21. The score assigning unit 23 assigns a score to each utterance sentence held in the utterance standby information holding unit 21. The score is a numerical value related to the priority of the utterance. An utterance sentence with a higher assigned score is information that is more preferably provided to the user U as voice information. In other words, the score is a value set in consideration of the surrounding environment of the user U and the like, and can be said to be a numerical value that takes the context into account.
 The utterance information generation modules 40A and 40B generate various utterance sentences in accordance with the progress of the target content. However, if the information providing apparatus 1 were to output all of the utterance sentences to the user U, the voice information output from the information providing apparatus 1 could become excessive. Also, when there is an inquiry from the user U, it is desirable to give priority to the response to the inquiry. Therefore, the amount of utterance sentences in the voice information output from the information providing apparatus 1 needs to be adjusted appropriately.
 It is also desirable that the content of the voice information output from the information providing apparatus 1 change appropriately according to the situation (in particular, the progress of the content). For example, in a situation where the target content is tense, the user U may want information describing the progress of the target content itself rather than information related to the target content. Conversely, in a situation where the target content changes little, there is less information describing its progress, so providing related information may attract the interest of the user U.
 In view of the above, the information providing apparatus 1 does not utter all of the utterance sentences generated by the utterance information generation modules 40A and 40B, but selects and outputs utterance sentences in consideration of the progress of the target content and the like. The utterance management module 20 performs this management.
 The utterance standby information holding unit 21 of the utterance management module 20 accumulates the utterance sentences generated by the utterance information generation modules 40A and 40B (utterance sentences that are candidates for utterance). The utterance determination unit 22 then determines which of the accumulated utterance sentences is to be output as voice information. This determination uses the score calculated and assigned by the score assigning unit 23. The utterance determination unit 22 refers to the score assigned by the score assigning unit 23 to each utterance sentence held in the utterance standby information holding unit 21, and selects, as the utterance sentence to be output as voice information, the utterance sentence whose score is the highest and is equal to or greater than a predetermined threshold.
 The score assigning unit 23 calculates scores based mainly on three elements and adds them together to obtain the score of each utterance sentence. The three elements are the "content score", the "elapsed time score", and the "situation score", which are calculated by the content score calculation unit 24, the elapsed time score calculation unit 25, and the situation score calculation unit 26, respectively.
 The "content score" is a numerical value calculated based on the content of the utterance sentence. An utterance sentence contains a plurality of words, and the content score is assigned based on these words. The method of assigning the content score is not particularly limited, but a simple method is to predetermine a score for each word related to the target content and add up the scores corresponding to the words contained in the utterance sentence. Alternatively, a machine learning technique such as deep learning may be used to calculate a feature value for each utterance sentence and use that feature value as the score. For example, training data may be created by preparing a plurality of pairs, each consisting of a sentence (a set of words) and a score associated with the importance of that sentence (a numerical value expressed as a probability from 0 to 1), and the score of a target utterance sentence may then be calculated from this training data using a deep learning technique such as a CNN (Convolutional Neural Network). The deep learning or machine learning technique used for calculating the score is not particularly limited.
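 As one concrete illustration of the simple word-sum method described above, a content score can be computed by looking up each word of the utterance sentence in a predefined table. This is only a sketch: the word list, the score values, and the whitespace tokenization are hypothetical assumptions, not part of the embodiment.

```python
# Hypothetical per-word scores for a soccer content domain
# (the words and values are illustrative assumptions, not from the embodiment).
WORD_SCORES = {
    "goal": 1.0,
    "shot": 0.8,
    "pass": 0.1,
}

def content_score(utterance: str) -> float:
    """Sum the predefined scores of the words contained in the utterance."""
    words = utterance.lower().rstrip(".").split()
    return sum(WORD_SCORES.get(w, 0.0) for w in words)

print(content_score("Player A has taken a shot."))  # 0.8
```

 A production system would use proper morphological analysis rather than whitespace splitting, particularly for Japanese text, but the lookup-and-sum structure is the same.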
 The "elapsed time score" is a numerical value corresponding to the time that has elapsed since the information providing apparatus 1 last output voice information to the user U. The information providing apparatus 1 changes this score according to the length of time since the last output. For example, the elapsed time score calculation unit 25 may prepare in advance a formula for calculating the score from the elapsed time and automatically determine the value to be assigned based on that formula. However, the method is not limited to this.
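 A minimal sketch of such a formula: the elapsed time score below rises linearly with time since the last utterance and saturates at 1.0 after 60 seconds. The linear shape and the 60-second constant are assumptions made here for illustration; the embodiment does not fix any particular formula.

```python
def elapsed_time_score(seconds_since_last_utterance: float,
                       saturation: float = 60.0) -> float:
    """Grow linearly from 0 toward 1 as the time since the last utterance
    increases, capping at 1.0 (hypothetical formula)."""
    return min(seconds_since_last_utterance / saturation, 1.0)

print(elapsed_time_score(30.0))   # 0.5
print(elapsed_time_score(120.0))  # 1.0
```

 Any monotone increasing function of the elapsed time would serve the same purpose of suppressing utterances that would follow too closely on the previous one.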
 The "situation score" is a numerical value, determined based on the progress of the target content, that indicates the need for information provision according to the state of the target content. For example, in a situation where the target content is tense, the user U's demand for voice information about the target content is likely to be high, so the score can be set high. Conversely, in a situation where the target content is not tense, the user U may also be interested in information related to the target content, so a higher score may be set for related information. In this way, the situation score is a score set according to the progress of the target content.
 As described above, the situation score is set based on the progress of the target content. Therefore, the situation score calculation unit 26 may be configured to calculate the situation score using the content progress information that the utterance information generation modules 40A and 40B acquire from the content progress information providing device 50. In such a configuration, the situation score calculation unit 26 can use a method of predetermining specific scores for words that appear in the progress information provided by the content progress information providing device 50 and that are highly relevant to the progress of the target content. For example, when the target content is soccer, a score of 1.0 may be assigned to "goal", which is considered a major event, and a score of 0.1 to "pass", which occurs frequently in soccer. In the case of soccer, the position of the ball in play may also be converted into a score and used. When the target content is stock price movements, words such as "crash" or "limit high" may be the words to which scores are assigned.
 A situation score can also be assigned when the target content is a general situation such as everyday life. Specifically, a high score can be set in advance for words indicating events for which the user U is assumed to want voice information from the information providing apparatus 1 (for example, information indicating a change in the weather), and a low score can be set in advance for words indicating events for which the user U is assumed not to want voice information (for example, information indicating the user U's plans to go out). With such a configuration, a situation score can be assigned even when the target content is not limited to a specific category such as sports.
 The situation score calculation unit 26 may, for example, at the timing when an utterance sentence is generated, extract the specific words with predetermined scores from the most recent piece (or a predetermined number of the most recent pieces) of content progress information provided by the content progress information providing device 50, and calculate the sum of their scores as the situation score. Thus, the method of assigning the situation score is not particularly limited, and various methods can be used. As with the content score, a technique such as machine learning may also be used to assign the situation score.
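 The keyword-based method just described can be sketched as follows: scores are predetermined for event words in the content progress information, and the situation score is the sum over the most recent events. The event names, score values, and window size below are hypothetical.

```python
# Hypothetical event scores for soccer progress information.
EVENT_SCORES = {"goal": 1.0, "shot": 0.6, "pass": 0.1}

def situation_score(progress_events: list, window: int = 3) -> float:
    """Sum the predetermined scores of the most recent `window` events."""
    recent = progress_events[-window:]
    return sum(EVENT_SCORES.get(e, 0.0) for e in recent)

events = ["pass", "pass", "shot", "goal"]
print(round(situation_score(events), 2))  # 1.7 (pass + shot + goal)
```

 With this shape, a burst of high-scoring events such as a goal pushes the situation score up, matching the idea that a tense phase of the content raises the need for information provision.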
 The situation score may also be calculated based on information different from the information acquired by the utterance information generation modules 40A and 40B.
 The score calculated by each unit may be expressed as a numerical value of 0 or more, or as a probability (0 to 1).
 As described above, the content score calculation unit 24, the elapsed time score calculation unit 25, and the situation score calculation unit 26 calculate scores for a single utterance sentence from mutually different viewpoints, and the score assigning unit 23 combines the scores calculated by these units to obtain the score for that utterance sentence. The method of combining the scores is not particularly limited; for example, simple addition may be used. When the content score, the elapsed time score, and the situation score are each calculated as probabilities (0 to 1), the sum of the logarithms of the scores may be used as the score for the utterance sentence in order to avoid underflow.
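 When the three component scores are probabilities in (0, 1], multiplying them together can underflow, so the text suggests summing their logarithms instead. A sketch under that assumption (the particular probability values are illustrative):

```python
import math

def combined_score(content: float, elapsed: float, situation: float) -> float:
    """Combine three probability-valued scores by summing their logarithms,
    which equals the log of their product and avoids floating-point underflow."""
    return math.log(content) + math.log(elapsed) + math.log(situation)

# A higher (less negative) combined score means a higher utterance priority.
print(combined_score(0.8, 0.5, 0.9) > combined_score(0.8, 0.5, 0.1))  # True
```

 Because the logarithm is monotone, ranking utterance sentences by this log-sum gives the same ordering as ranking them by the product of the three probabilities.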
 A score is calculated and assigned by the score assigning unit 23 to each utterance sentence generated by the utterance information generation modules 40A and 40B and held in the utterance standby information holding unit 21. As a result, as shown in FIG. 3, the utterance standby information holding unit 21 holds each utterance sentence awaiting utterance by the information providing apparatus 1 (the queue) together with its assigned score. In the example shown in FIG. 3, a score of 0.8 is associated with the utterance sentence "Player A has taken a shot.", and a score of 0.7 is associated with the utterance sentence "Player A is from XX.". In this way, the utterance standby information holding unit 21 holds the utterance sentences generated by the utterance information generation modules 40A and 40B with scores assigned individually.
 The scores associated with the utterance sentences held in the utterance standby information holding unit 21 may be updated as appropriate. For example, depending on the progress of the target content, the score assigned to an utterance sentence may no longer be appropriate. Also, when the information providing apparatus 1 outputs voice, either by uttering an utterance sentence held in the utterance standby information holding unit 21 or in response to an inquiry from the user U, the elapsed time is reset. Therefore, the scores held in the utterance standby information holding unit 21 may be updated by the score assigning unit 23 as appropriate (for example, at a predetermined timing).
 The utterance determination unit 22 refers to the utterance standby information holding unit 21 periodically (for example, every few hundred milliseconds). It refers to the utterance sentence with the highest score among the utterance sentences held in the utterance standby information holding unit 21, and if the score assigned to that utterance sentence is equal to or greater than a predetermined threshold, determines that an utterance to the user U is to be made based on that utterance sentence and selects it. That is, the utterance determination unit 22 functions as an utterance information selection unit that selects the utterance sentence to be provided to the user U as voice information.
 When the score assigned to the highest-scoring utterance sentence held in the utterance standby information holding unit 21 is smaller than the threshold, the utterance determination unit 22 may determine that no utterance is necessary and refrain from uttering until the next opportunity.
 An utterance sentence that the utterance determination unit 22 has determined to utter is deleted from the utterance standby information holding unit 21. This prevents the same utterance sentence from being uttered again. An utterance sentence for which a predetermined time has elapsed since it was stored in the utterance standby information holding unit 21 may also be deleted, on the grounds that its likelihood of being uttered has become low. With such a configuration, it is possible to prevent the amount of data from increasing because utterance sentences that are unlikely to be uttered in the future remain held for a long time.
 The timing at which the utterance determination unit 22 refers to the utterance standby information holding unit 21 may be changed as appropriate depending on, for example, whether there has been an utterance (such as an inquiry) from the user U. For example, when there has been an utterance from the user U, the information providing apparatus 1 gives priority to responding to it, and spontaneous utterance by the information providing apparatus 1 is omitted. In such a state, the utterance determination unit 22 may therefore omit referring to the utterance standby information holding unit 21 altogether. In the information providing apparatus 1, the utterance management module 20 also manages responses to utterances from the user U. The utterance management module 20 therefore manages the information to be uttered based on the priority of utterances and the like, so that responding to utterances from the user U and spontaneously providing voice information can both be achieved.
 Next, an information providing method in the information providing apparatus 1 will be described with reference to FIG. 4. FIG. 4 illustrates the procedure for the spontaneous provision of voice information from the information providing apparatus 1 side; the procedure for responding to an utterance from the user U is not described in FIG. 4.
 As shown in FIG. 4, the utterance information generation modules 40A and 40B of the information providing apparatus 1 acquire information on the progress of the target content from the content progress information providing device 50 (S01). Based on the content progress information from the content progress information providing device 50, the utterance information generation modules 40A and 40B generate utterance sentences for the information providing apparatus 1 to output to the user U as voice (S02). The generated utterance sentences are sent from the utterance information generation modules 40A and 40B to the utterance management module 20 (S03).
 In the utterance management module 20, the score assigning unit 23 first calculates the score of the acquired utterance sentence (S04). The score calculation includes the content score calculation by the content score calculation unit 24, the elapsed time score calculation by the elapsed time score calculation unit 25, the situation score calculation by the situation score calculation unit 26, and the process of combining these in the score assigning unit 23. After the score is calculated, the utterance standby information holding unit 21 holds the utterance sentence and its corresponding score (S05). Alternatively, the utterance sentence may first be held in the utterance standby information holding unit 21 (S05) and the score calculated afterwards (S04), with the score then associated with the utterance sentence and held in the utterance standby information holding unit 21.
 In the utterance management module 20, the score assigning unit 23 may update the scores associated with the utterance sentences held in the utterance standby information holding unit 21 as necessary (S06).
 Thereafter, the utterance determination unit 22 of the utterance management module 20 refers to the utterance standby information holding unit 21 and determines whether to utter an utterance sentence held there (S07). When the utterance determination unit 22 determines that no utterance is to be made, the subsequent processing is not performed, and the utterance determination (S07) is repeated periodically. On the other hand, when it is determined that an utterance is to be made, the utterance determination unit 22 sends the utterance sentence to be output as voice information to the dialogue module 10 (S08). The dialogue module 10 then converts the utterance sentence into voice and outputs it to the user U, that is, performs the utterance (S09).
 As described above, the information providing apparatus 1 according to the present embodiment is an information providing apparatus that provides voice information to the user U by utterance, and includes: the score assigning unit 23, which calculates and assigns, to each utterance sentence that is a candidate for utterance to the user U, a score that is a numerical value related to the priority of uttering that utterance sentence; the utterance standby information holding unit 21, which holds each utterance sentence in association with the score assigned to it by the score assigning unit; the utterance determination unit 22, serving as an utterance information selection unit that selects the utterance sentence held in association with the highest of the scores held in the utterance standby information holding unit 21; and the dialogue module 10, serving as an output unit that outputs the utterance sentence selected by the utterance determination unit 22 as voice information.
 According to the information providing apparatus 1 described above, a score, which is a numerical value related to the priority of uttering an utterance sentence, is calculated and assigned to each utterance sentence and held by the utterance standby information holding unit 21 in association with that utterance sentence. The utterance sentence associated with the highest score is selected by the utterance determination unit 22 serving as the utterance information selection unit and is output as voice information by the dialogue module 10 serving as the output unit. With this configuration, the information providing apparatus 1 can select an utterance sentence and output it as voice information based on a score related to priority, so that a more appropriate utterance can be made to the user according to the priority.
 In the information providing apparatus 1, the utterance determination unit 22 serving as the utterance information selection unit selects the utterance sentence held in association with the highest of the scores held in the utterance standby information holding unit 21 only when that utterance sentence satisfies a predetermined condition. With this configuration, an utterance sentence is output as voice information only when it not only has the highest score but also satisfies the other condition. Therefore, a more appropriate utterance can be made to the user according to the priority.
 In the above embodiment, the predetermined condition is whether the score is equal to or greater than a predetermined threshold. With this configuration, only an utterance sentence whose score is equal to or greater than the threshold, that is, whose priority is considered sufficiently high, is output as voice information. Therefore, even an utterance sentence with the highest score among those held in the utterance standby information holding unit is prevented from being output as voice information when its priority cannot be said to be sufficiently high, so that a more appropriate utterance can be made to the user.
 The score assigning unit 23 is configured to calculate the score based on the content score corresponding to the content of the utterance sentence.
 By calculating the score based on the content score corresponding to the content of the utterance sentence, an appropriate score according to the content of the utterance sentence can be assigned. For example, a score indicating a higher priority can be assigned to an utterance sentence with important content, so that a more appropriate utterance can be made to the user.
 The score assigning unit 23 is also configured to calculate the score based on the elapsed time score related to the time elapsed since the apparatus's own previous utterance.
 By calculating the score based on the elapsed time score related to the time elapsed since the apparatus's own previous utterance, a score that takes the elapsed time into account can be assigned. This makes it possible, for example, to avoid uttering when the time elapsed since the previous utterance is too short, so that a more appropriate utterance can be made to the user.
 The score assigning unit 23 is also configured to calculate the score based on the situation score related to the state of the content for which voice information is provided.
 By calculating the score based on the situation score related to the state of the content, an appropriate score according to the state of the content can be assigned. For example, when the state of the content suggests that the provision of voice information should be reduced, the score can be lowered; a score can thus be assigned that takes into account the timing at which the provision of voice information is desired, so that a more appropriate utterance can be made to the user.
 The information providing apparatus 1 described in the above embodiment is not limited to the above configuration, and various modifications can be made.
 In the above embodiment, the case where the information providing apparatus 1 is configured as a single apparatus has been described, but the functions of the information providing apparatus 1 may be distributed over a plurality of apparatuses. For example, each module constituting the information providing apparatus 1 may be a separate apparatus, and each module may itself be composed of a plurality of apparatuses.
 In the above embodiment, the case where the information providing apparatus 1 has a function of responding to utterances (inquiries) from the user U has been described, but the information providing apparatus 1 only needs to have at least the function of outputting voice information by utterance.
 In the above embodiment, a configuration in which the score assigning unit 23 calculates the content score, the elapsed time score, and the situation score has been described, but the score assigned by the score assigning unit 23 need not include all of the content score, the elapsed time score, and the situation score. The score assigning unit 23 may also calculate the score based on information different from these three scores, or may calculate the score by combining the three scores with a score calculated based on other information. In this way, the method of calculating the score assigned by the score assigning unit 23 can be changed as appropriate.
 In the above embodiment, a configuration has been described in which the utterance determination unit 22 determines that an utterance sentence is to be uttered as voice information when its score is the highest and is equal to or greater than a predetermined threshold. However, the utterance determination unit 22 may instead be configured to select the highest-scoring utterance sentence held in the utterance standby information holding unit 21 when it satisfies some other condition, for example, that the length of the utterance sentence is equal to or less than a predetermined number of characters. Such other conditions (predetermined conditions) may be set in consideration of the functions related to the output of voice information from the information providing apparatus 1 and the like. The utterance determination unit 22 may also be configured simply to select the highest-scoring utterance sentence held in the utterance standby information holding unit 21. In this case, the frequency of voice information output may be adjusted by, for example, adjusting the timing at which the utterance determination unit 22 selects utterance sentences (the timing of the periodic utterance sentence selection).
(その他)
 上記実施の形態の説明に用いたブロック図は、機能単位のブロックを示している。これらの機能ブロック(構成部)は、ハードウェア及び/又はソフトウェアの任意の組み合わせによって実現される。また、各機能ブロックの実現手段は特に限定されない。すなわち、各機能ブロックは、物理的及び/又は論理的に結合した1つの装置により実現されてもよいし、物理的及び/又は論理的に分離した2つ以上の装置を直接的及び/又は間接的に(例えば、有線及び/又は無線)で接続し、これら複数の装置により実現されてもよい。
(Other)
The block diagram used in the description of the above embodiment shows blocks in functional units. These functional blocks (components) are realized by an arbitrary combination of hardware and/or software. The means for realizing each functional block is not particularly limited. That is, each functional block may be realized by one device that is physically and/or logically coupled, or by two or more devices that are physically and/or logically separated and connected directly and/or indirectly (for example, by wire and/or wirelessly).
 例えば、本開示の一実施の形態における情報提供装置1は、本実施形態の情報提供装置1の処理を行うコンピュータとして機能してもよい。図5は、本実施形態に係る情報提供装置1のハードウェア構成の一例を示す図である。上述の情報提供装置1は、物理的には、プロセッサ1001、メモリ1002、ストレージ1003、通信装置1004、入力装置1005、出力装置1006、バス1007などを含むコンピュータ装置として構成されてもよい。 For example, the information providing apparatus 1 according to an embodiment of the present disclosure may function as a computer that performs processing of the information providing apparatus 1 according to the present embodiment. FIG. 5 is a diagram illustrating an example of a hardware configuration of the information providing apparatus 1 according to the present embodiment. The information providing device 1 described above may be physically configured as a computer device including a processor 1001, a memory 1002, a storage 1003, a communication device 1004, an input device 1005, an output device 1006, a bus 1007, and the like.
 なお、以下の説明では、「装置」という文言は、回路、デバイス、ユニットなどに読み替えることができる。情報提供装置1のハードウェア構成は、図に示した各装置を1つ又は複数含むように構成されてもよいし、一部の装置を含まずに構成されてもよい。 In the following description, the term “apparatus” can be read as a circuit, a device, a unit, or the like. The hardware configuration of the information providing apparatus 1 may be configured to include one or more devices illustrated in the drawing, or may be configured without including some devices.
 情報提供装置1における各機能は、プロセッサ1001、メモリ1002などのハードウェア上に所定のソフトウェア(プログラム)を読み込ませることで、プロセッサ1001が演算を行い、通信装置1004による通信や、メモリ1002及びストレージ1003におけるデータの読み出し及び/又は書き込みを制御することで実現される。 The functions of the information providing apparatus 1 are realized by loading predetermined software (a program) onto hardware such as the processor 1001 and the memory 1002, whereby the processor 1001 performs operations and controls communication by the communication device 1004 and reading and/or writing of data in the memory 1002 and the storage 1003.
 プロセッサ1001は、例えば、オペレーティングシステムを動作させてコンピュータ全体を制御する。プロセッサ1001は、周辺装置とのインタフェース、制御装置、演算装置、レジスタなどを含む中央処理装置(CPU:Central Processing Unit)で構成されてもよい。例えば、情報提供装置1の各機能は、プロセッサ1001で実現されてもよい。 The processor 1001 controls the entire computer by operating an operating system, for example. The processor 1001 may be configured by a central processing unit (CPU: Central Processing Unit) including an interface with a peripheral device, a control device, an arithmetic device, a register, and the like. For example, each function of the information providing device 1 may be realized by the processor 1001.
 また、プロセッサ1001は、プログラム(プログラムコード)、ソフトウェアモジュールやデータを、ストレージ1003及び/又は通信装置1004からメモリ1002に読み出し、これらに従って各種の処理を実行する。プログラムとしては、上述の実施の形態で説明した動作の少なくとも一部をコンピュータに実行させるプログラムが用いられる。例えば、情報提供装置1の各機能は、メモリ1002に格納され、プロセッサ1001で動作する制御プログラムによって実現されてもよい。上述の各種処理は、1つのプロセッサ1001で実行される旨を説明してきたが、2以上のプロセッサ1001により同時又は逐次に実行されてもよい。プロセッサ1001は、1以上のチップで実装されてもよい。なお、プログラムは、電気通信回線を介してネットワークから送信されても良い。 The processor 1001 reads out a program (program code), a software module, and data from the storage 1003 and / or the communication device 1004 to the memory 1002, and executes various processes according to these. As the program, a program that causes a computer to execute at least a part of the operation described in the above embodiment is used. For example, each function of the information providing apparatus 1 may be realized by a control program stored in the memory 1002 and operated by the processor 1001. Although it has been described that the various processes described above are executed by one processor 1001, the processes may be executed simultaneously or sequentially by two or more processors 1001. Processor 1001 may be implemented with one or more chips. Note that the program may be transmitted from a network via a telecommunication line.
 メモリ1002は、コンピュータ読み取り可能な記録媒体であり、例えば、ROM(Read Only Memory)、EPROM(Erasable Programmable ROM)、EEPROM(Electrically Erasable Programmable ROM)、RAM(Random Access Memory)などの少なくとも1つで構成されてもよい。メモリ1002は、レジスタ、キャッシュ、メインメモリ(主記憶装置)などと呼ばれてもよい。メモリ1002は、本開示の一実施の形態に係る方法を実施するために実行可能なプログラム(プログラムコード)、ソフトウェアモジュールなどを保存することができる。 The memory 1002 is a computer-readable recording medium, and may be configured by at least one of, for example, a ROM (Read Only Memory), an EPROM (Erasable Programmable ROM), an EEPROM (Electrically Erasable Programmable ROM), and a RAM (Random Access Memory). The memory 1002 may be called a register, a cache, a main memory (main storage device), or the like. The memory 1002 can store an executable program (program code), software modules, and the like for carrying out the method according to an embodiment of the present disclosure.
 ストレージ1003は、コンピュータ読み取り可能な記録媒体であり、例えば、CD-ROM(Compact Disc ROM)などの光ディスク、ハードディスクドライブ、フレキシブルディスク、光磁気ディスク(例えば、コンパクトディスク、デジタル多用途ディスク、Blu-ray(登録商標)ディスク)、スマートカード、フラッシュメモリ(例えば、カード、スティック、キードライブ)、フロッピー(登録商標)ディスク、磁気ストリップなどの少なくとも1つで構成されてもよい。ストレージ1003は、補助記憶装置と呼ばれてもよい。上述の記憶媒体は、例えば、メモリ1002及び/又はストレージ1003を含むデータベース、サーバその他の適切な媒体であってもよい。 The storage 1003 is a computer-readable recording medium, and may be configured by at least one of, for example, an optical disk such as a CD-ROM (Compact Disc ROM), a hard disk drive, a flexible disk, a magneto-optical disk (for example, a compact disk, a digital versatile disk, a Blu-ray (registered trademark) disk), a smart card, a flash memory (for example, a card, a stick, a key drive), a floppy (registered trademark) disk, and a magnetic strip. The storage 1003 may be called an auxiliary storage device. The above-described storage medium may be, for example, a database including the memory 1002 and/or the storage 1003, a server, or another appropriate medium.
 通信装置1004は、有線及び/又は無線ネットワークを介してコンピュータ間の通信を行うためのハードウェア(送受信デバイス)であり、例えばネットワークデバイス、ネットワークコントローラ、ネットワークカード、通信モジュールなどともいう。例えば、情報提供装置1の各機能は、通信装置1004で実現されてもよい。 The communication device 1004 is hardware (transmitting / receiving device) for performing communication between computers via a wired and / or wireless network, and is also referred to as, for example, a network device, a network controller, a network card, a communication module, or the like. For example, each function of the information providing device 1 may be realized by the communication device 1004.
 入力装置1005は、外部からの入力を受け付ける入力デバイス(例えば、キーボード、マウス、マイクロフォン、スイッチ、ボタン、センサなど)である。出力装置1006は、外部への出力を実施する出力デバイス(例えば、ディスプレイ、スピーカー、LEDランプなど)である。なお、入力装置1005及び出力装置1006は、一体となった構成(例えば、タッチパネル)であってもよい。 The input device 1005 is an input device (for example, a keyboard, a mouse, a microphone, a switch, a button, a sensor, and the like) that receives an external input. The output device 1006 is an output device that performs output to the outside (for example, a display, a speaker, an LED lamp, and the like). Note that the input device 1005 and the output device 1006 may have an integrated configuration (for example, a touch panel).
 また、プロセッサ1001やメモリ1002などの各装置は、情報を通信するためのバス1007で接続される。バス1007は、単一のバスで構成されてもよいし、装置間で異なるバスで構成されてもよい。 The devices such as the processor 1001 and the memory 1002 are connected by a bus 1007 for communicating information. The bus 1007 may be configured by a single bus, or may be configured by a different bus between the devices.
 また、情報提供装置1は、マイクロプロセッサ、デジタル信号プロセッサ(DSP:Digital Signal Processor)、ASIC(Application Specific Integrated Circuit)、PLD(Programmable Logic Device)、FPGA(Field Programmable Gate Array)などのハードウェアを含んで構成されてもよく、当該ハードウェアにより、各機能ブロックの一部又は全てが実現されてもよい。例えば、プロセッサ1001は、これらのハードウェアの少なくとも1つで実装されてもよい。 The information providing device 1 may also be configured to include hardware such as a microprocessor, a digital signal processor (DSP), an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), and an FPGA (Field Programmable Gate Array), and some or all of the functional blocks may be realized by such hardware. For example, the processor 1001 may be implemented by at least one of these pieces of hardware.
 以上、本開示について詳細に説明したが、当業者にとっては、本開示が本開示中に説明した実施形態に限定されるものではないということは明らかである。本開示は、請求の範囲の記載により定まる本開示の趣旨及び範囲を逸脱することなく修正及び変更態様として実施することができる。したがって、本開示の記載は、例示説明を目的とするものであり、本実施形態に対して何ら制限的な意味を有するものではない。 Although the present disclosure has been described in detail above, it is obvious to those skilled in the art that the present disclosure is not limited to the embodiments described in the present disclosure. The present disclosure can be implemented as modified and changed aspects without departing from the spirit and scope of the present disclosure defined by the description of the claims. Therefore, the description of the present disclosure is intended for illustrative purposes, and has no restrictive meaning to the present embodiment.
 情報の通知は、本明細書で説明した態様/実施形態に限られず、他の方法で行われてもよい。例えば、情報の通知は、物理レイヤシグナリング(例えば、DCI(Downlink Control Information)、UCI(Uplink Control Information))、上位レイヤシグナリング(例えば、RRC(Radio Resource Control)シグナリング、MAC(Medium Access Control)シグナリング、報知情報(MIB(Master Information Block)、SIB(System Information Block)))、その他の信号又はこれらの組み合わせによって実施されてもよい。また、RRCシグナリングは、RRCメッセージと呼ばれてもよく、例えば、RRC接続セットアップ(RRC Connection Setup)メッセージ、RRC接続再構成(RRC Connection Reconfiguration)メッセージなどであってもよい。 Notification of information is not limited to the aspects/embodiments described in this specification, and may be performed by other methods. For example, notification of information may be implemented by physical layer signaling (for example, DCI (Downlink Control Information), UCI (Uplink Control Information)), higher layer signaling (for example, RRC (Radio Resource Control) signaling, MAC (Medium Access Control) signaling, broadcast information (MIB (Master Information Block), SIB (System Information Block))), other signals, or a combination thereof. RRC signaling may be referred to as an RRC message, and may be, for example, an RRC Connection Setup message, an RRC Connection Reconfiguration message, or the like.
 本開示において説明した各態様/実施形態は、LTE(Long Term Evolution)、LTE-A(LTE-Advanced)、SUPER 3G、IMT-Advanced、4G(4th generation mobile communication system)、5G(5th generation mobile communication system)、FRA(Future Radio Access)、NR(new Radio)、W-CDMA(登録商標)、GSM(登録商標)、CDMA2000、UMB(Ultra Mobile Broadband)、IEEE 802.11(Wi-Fi(登録商標))、IEEE 802.16(WiMAX(登録商標))、IEEE 802.20、UWB(Ultra-WideBand)、Bluetooth(登録商標)、その他の適切なシステムを利用するシステム及びこれらに基づいて拡張された次世代システムの少なくとも一つに適用されてもよい。また、複数のシステムが組み合わされて(例えば、LTE及びLTE-Aの少なくとも一方と5Gとの組み合わせ等)適用されてもよい。 Each aspect/embodiment described in the present disclosure may be applied to at least one of systems using LTE (Long Term Evolution), LTE-A (LTE-Advanced), SUPER 3G, IMT-Advanced, 4G (4th generation mobile communication system), 5G (5th generation mobile communication system), FRA (Future Radio Access), NR (New Radio), W-CDMA (registered trademark), GSM (registered trademark), CDMA2000, UMB (Ultra Mobile Broadband), IEEE 802.11 (Wi-Fi (registered trademark)), IEEE 802.16 (WiMAX (registered trademark)), IEEE 802.20, UWB (Ultra-WideBand), Bluetooth (registered trademark), or other appropriate systems, and next-generation systems extended based thereon. A plurality of systems may also be applied in combination (for example, a combination of at least one of LTE and LTE-A with 5G).
 本明細書で説明した各態様/実施形態の処理手順、シーケンス、フローチャートなどは、矛盾の無い限り、順序を入れ替えてもよい。例えば、本明細書で説明した方法については、例示的な順序で様々なステップの要素を提示しており、提示した特定の順序に限定されない。 The processing procedures, sequences, flowcharts, and the like of each aspect/embodiment described in this specification may be reordered as long as there is no inconsistency. For example, the methods described herein present elements of the various steps in an exemplary order, and are not limited to the specific order presented.
 情報等は、上位レイヤ(又は下位レイヤ)から下位レイヤ(又は上位レイヤ)へ出力され得る。複数のネットワークノードを介して入出力されてもよい。 Information and the like can be output from an upper layer (or lower layer) to a lower layer (or upper layer). Input and output may be performed via a plurality of network nodes.
 入出力された情報等は特定の場所(例えば、メモリ)に保存されてもよいし、管理テーブルで管理してもよい。入出力される情報等は、上書き、更新、または追記され得る。出力された情報等は削除されてもよい。入力された情報等は他の装置へ送信されてもよい。 Input and output information and the like may be stored in a specific place (for example, a memory) or may be managed in a management table. Input and output information and the like can be overwritten, updated, or appended. Output information and the like may be deleted. Input information and the like may be transmitted to another device.
 判定は、1ビットで表される値(0か1か)によって行われてもよいし、真偽値(Boolean:trueまたはfalse)によって行われてもよいし、数値の比較(例えば、所定の値との比較)によって行われてもよい。 The determination may be made based on a value represented by one bit (0 or 1), based on a Boolean value (true or false), or based on a comparison of numerical values (for example, a comparison with a predetermined value).
 本開示において説明した各態様/実施形態は単独で用いてもよいし、組み合わせて用いてもよいし、実行に伴って切り替えて用いてもよい。また、所定の情報の通知(例えば、「Xであること」の通知)は、明示的に行うものに限られず、暗黙的(例えば、当該所定の情報の通知を行わない)ことによって行われてもよい。 Each aspect/embodiment described in the present disclosure may be used alone, may be used in combination, or may be switched in accordance with execution. Further, notification of predetermined information (for example, notification of "being X") is not limited to being performed explicitly, and may be performed implicitly (for example, by not performing notification of the predetermined information).
 ソフトウェアは、ソフトウェア、ファームウェア、ミドルウェア、マイクロコード、ハードウェア記述言語と呼ばれるか、他の名称で呼ばれるかを問わず、命令、命令セット、コード、コードセグメント、プログラムコード、プログラム、サブプログラム、ソフトウェアモジュール、アプリケーション、ソフトウェアアプリケーション、ソフトウェアパッケージ、ルーチン、サブルーチン、オブジェクト、実行可能ファイル、実行スレッド、手順、機能などを意味するよう広く解釈されるべきである。 Software, whether referred to as software, firmware, middleware, microcode, a hardware description language, or by another name, should be interpreted broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executable files, execution threads, procedures, functions, and the like.
 また、ソフトウェア、命令などは、伝送媒体を介して送受信されてもよい。例えば、ソフトウェアが、同軸ケーブル、光ファイバケーブル、ツイストペア及びデジタル加入者回線(DSL)などの有線技術及び/又は赤外線、無線及びマイクロ波などの無線技術を使用してウェブサイト、サーバ、又は他のリモートソースから送信される場合、これらの有線技術及び/又は無線技術は、伝送媒体の定義内に含まれる。 Software, instructions, and the like may also be transmitted and received via a transmission medium. For example, when software is transmitted from a website, a server, or another remote source using wired technologies such as coaxial cable, optical fiber cable, twisted pair, and digital subscriber line (DSL), and/or wireless technologies such as infrared, radio, and microwave, these wired and/or wireless technologies are included within the definition of a transmission medium.
 本開示で説明した情報、信号などは、様々な異なる技術のいずれかを使用して表されてもよい。例えば、上記の説明全体に渡って言及され得るデータ、命令、コマンド、情報、信号、ビット、シンボル、チップなどは、電圧、電流、電磁波、磁界若しくは磁性粒子、光場若しくは光子、又はこれらの任意の組み合わせによって表されてもよい。 The information, signals, and the like described in the present disclosure may be represented using any of a variety of different technologies. For example, data, instructions, commands, information, signals, bits, symbols, chips, and the like that may be referred to throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or magnetic particles, optical fields or photons, or any combination thereof.
 なお、本開示で説明した用語及び/又は本開示の理解に必要な用語については、同一の又は類似する意味を有する用語と置き換えてもよい。 Note that terms described in the present disclosure and/or terms necessary for understanding the present disclosure may be replaced with terms having the same or similar meanings.
 本開示で使用する「システム」および「ネットワーク」という用語は、互換的に使用される。 As used in the present disclosure, the terms "system" and "network" are used interchangeably.
 また、本開示で説明した情報、パラメータなどは、絶対値で表されてもよいし、所定の値からの相対値で表されてもよいし、対応する別の情報で表されてもよい。 Further, the information, parameters, and the like described in the present disclosure may be represented by absolute values, by relative values from predetermined values, or by other corresponding information.
 上述したパラメータに使用する名称はいかなる点においても限定的なものではない。さらに、これらのパラメータを使用する数式等は、本明細書で明示的に開示したものと異なる場合もある。 The names used for the above-described parameters are not limiting in any respect. Further, the mathematical expressions and the like using these parameters may differ from those explicitly disclosed in this specification.
 本開示で使用する「判断(determining)」、「決定(determining)」という用語は、多種多様な動作を包含する場合がある。「判断」、「決定」は、例えば、判定(judging)、計算(calculating)、算出(computing)、処理(processing)、導出(deriving)、調査(investigating)、探索(looking up)(例えば、テーブル、データベースまたは別のデータ構造での探索)、確認(ascertaining)した事を「判断」「決定」したとみなす事などを含み得る。また、「判断」、「決定」は、受信(receiving)(例えば、情報を受信すること)、送信(transmitting)(例えば、情報を送信すること)、入力(input)、出力(output)、アクセス(accessing)(例えば、メモリ中のデータにアクセスすること)した事を「判断」「決定」したとみなす事などを含み得る。また、「判断」、「決定」は、解決(resolving)、選択(selecting)、選定(choosing)、確立(establishing)、比較(comparing)などした事を「判断」「決定」したとみなす事を含み得る。つまり、「判断」「決定」は、何らかの動作を「判断」「決定」したとみなす事を含み得る。 As used in the present disclosure, the terms "determining" and "deciding" may encompass a wide variety of operations. "Determining" and "deciding" may include, for example, regarding judging, calculating, computing, processing, deriving, investigating, looking up (for example, looking up in a table, a database, or another data structure), or ascertaining as "determining" or "deciding". "Determining" and "deciding" may also include regarding receiving (for example, receiving information), transmitting (for example, transmitting information), input, output, or accessing (for example, accessing data in a memory) as "determining" or "deciding". Further, "determining" and "deciding" may include regarding resolving, selecting, choosing, establishing, comparing, and the like as "determining" or "deciding". In other words, "determining" and "deciding" may include regarding some operation as "determining" or "deciding".
 「接続された(connected)」、「結合された(coupled)」という用語、又はこれらのあらゆる変形は、2又はそれ以上の要素間の直接的又は間接的なあらゆる接続又は結合を意味し、互いに「接続」又は「結合」された2つの要素間に1又はそれ以上の中間要素が存在することを含むことができる。要素間の結合又は接続は、物理的なものであっても、論理的なものであっても、或いはこれらの組み合わせであってもよい。本開示で使用する場合、2つの要素は、1又はそれ以上の電線、ケーブル及び/又はプリント電気接続を使用することにより、並びにいくつかの非限定的かつ非包括的な例として、無線周波数領域、マイクロ波領域及び光(可視及び不可視の両方)領域の波長を有する電磁エネルギーなどの電磁エネルギーを使用することにより、互いに「接続」又は「結合」されると考えることができる。 The terms "connected" and "coupled", and any variations thereof, mean any direct or indirect connection or coupling between two or more elements, and can include the presence of one or more intermediate elements between two elements that are "connected" or "coupled" to each other. The coupling or connection between elements may be physical, logical, or a combination thereof. As used in the present disclosure, two elements can be considered "connected" or "coupled" to each other by using one or more electrical wires, cables, and/or printed electrical connections, and, as some non-limiting and non-exhaustive examples, by using electromagnetic energy, such as electromagnetic energy having wavelengths in the radio frequency region, the microwave region, and the optical (both visible and invisible) region.
 本開示で使用する「に基づいて」という記載は、別段に明記されていない限り、「のみに基づいて」を意味しない。言い換えれば、「に基づいて」という記載は、「のみに基づいて」と「に少なくとも基づいて」の両方を意味する。 As used in the present disclosure, the phrase "based on" does not mean "based only on" unless otherwise specified. In other words, the phrase "based on" means both "based only on" and "based at least on".
 「含む(include)」、「含んでいる(including)」、およびそれらの変形が、本開示あるいは請求の範囲で使用されている限り、これら用語は、用語「備える(comprising)」と同様に、包括的であることが意図される。さらに、本開示あるいは請求の範囲において使用されている用語「または(or)」は、排他的論理和ではないことが意図される。 To the extent that the terms "include" and "including", and variations thereof, are used in the present disclosure or the claims, these terms, like the term "comprising", are intended to be inclusive. Further, the term "or" as used in the present disclosure or the claims is not intended to be an exclusive or.
 本開示において、文脈または技術的に明らかに1つのみしか存在しない装置である場合以外は、複数の装置をも含むものとする。本開示の全体において、文脈から明らかに単数を示したものではなければ、複数のものを含むものとする。 In the present disclosure, a plurality of devices shall also be included, except where it is clear from the context or the technology that only one device can exist. Throughout the present disclosure, the plural shall be included unless the singular is clearly indicated by the context.
 本開示において、例えば、英語でのa, an及びtheのように、翻訳により冠詞が追加された場合、本開示は、これらの冠詞の後に続く名詞が複数形であることを含んでもよい。 In the present disclosure, where articles are added by translation, for example, a, an, and the in English, the present disclosure may include that the nouns following these articles are plural.
 本開示において、「AとBが異なる」という用語は、「AとBが互いに異なる」ことを意味してもよい。なお、当該用語は、「AとBがそれぞれCと異なる」ことを意味してもよい。「離れる」、「結合される」などの用語も、「異なる」と同様に解釈されてもよい。 In the present disclosure, the phrase "A and B are different" may mean that "A and B are different from each other". The phrase may also mean that "A and B are each different from C". Terms such as "separated" and "coupled" may be interpreted in the same manner as "different".
 1…情報提供装置、10…対話モジュール、20…発話管理モジュール、21…発話待機情報保持部、22…発話判定部、23…スコア付与部、24…内容スコア算出部、25…経過時間スコア算出部、26…状況スコア算出部、30…応答情報生成モジュール、40A,40B…発話情報生成モジュール。 DESCRIPTION OF SYMBOLS: 1... information providing device; 10... dialogue module; 20... utterance management module; 21... utterance standby information holding unit; 22... utterance determination unit; 23... score assigning unit; 24... content score calculation unit; 25... elapsed time score calculation unit; 26... situation score calculation unit; 30... response information generation module; 40A, 40B... utterance information generation module.

Claims (6)

  1.  ユーザに対して発話により音声情報を提供する情報提供装置であって、
     前記ユーザに対して発話する候補となる発話文に対して、前記発話文の発話の優先度に関係する数値であるスコアを算出して付与するスコア付与部と、
     前記発話文と、前記発話文に対して前記スコア付与部により付与されたスコアとを対応付けて保持する発話待機情報保持部と、
     前記発話待機情報保持部に保持されている前記スコアのうち最も高いスコアに対応付けられて保持されている発話文を選択する発話情報選択部と、
     前記発話情報選択部により選択された発話文を音声情報として出力する出力部と、
     を有する、情報提供装置。
    An information providing apparatus that provides voice information to a user by utterance, the apparatus comprising:
    a score assigning unit that calculates and assigns, to an utterance sentence that is a candidate to be uttered to the user, a score that is a numerical value related to the priority of uttering the utterance sentence;
    an utterance standby information holding unit that holds the utterance sentence in association with the score assigned to the utterance sentence by the score assigning unit;
    an utterance information selection unit that selects the utterance sentence held in association with the highest score among the scores held in the utterance standby information holding unit; and
    an output unit that outputs the utterance sentence selected by the utterance information selection unit as voice information.
  2.  前記発話情報選択部は、前記発話待機情報保持部に保持されている前記スコアのうち最も高いスコアに対応付けられて保持されている発話文が、所定の条件を満たす場合に、当該発話文を選択する、請求項1に記載の情報提供装置。 The information providing apparatus according to claim 1, wherein the utterance information selection unit selects the utterance sentence held in association with the highest score among the scores held in the utterance standby information holding unit, when the utterance sentence satisfies a predetermined condition.
  3.  前記所定の条件は、前記スコアが所定の閾値以上であるか否かである、請求項2に記載の情報提供装置。 The information providing device according to claim 2, wherein the predetermined condition is whether the score is equal to or greater than a predetermined threshold.
  4.  前記スコア付与部は、前記発話文の内容に対応する内容スコアに基づいて、前記スコアを算出する、請求項1~3のいずれか一項に記載の情報提供装置。 The information providing apparatus according to any one of claims 1 to 3, wherein the score assigning unit calculates the score based on a content score corresponding to the content of the utterance sentence.
  5.  前記スコア付与部は、自装置による前回の発話からの経過時間に係る経過時間スコアに基づいて、前記スコアを算出する、請求項1~4のいずれか一項に記載の情報提供装置。 The information providing apparatus according to any one of claims 1 to 4, wherein the score assigning unit calculates the score based on an elapsed time score relating to an elapsed time from a previous utterance by the own apparatus.
  6.  前記スコア付与部は、前記音声情報を提供する対象となるコンテンツの状況に係る状況スコアに基づいて、前記スコアを算出する、請求項1~5のいずれか一項に記載の情報提供装置。 The information providing apparatus according to any one of claims 1 to 5, wherein the score providing unit calculates the score based on a situation score relating to a situation of the content to which the audio information is provided.
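Purely as a non-authoritative illustration of how the units recited in claim 1 relate to one another, the pipeline of assigning, holding, selecting, and outputting can be sketched as follows. All class and method names, the placeholder scoring rule (shorter sentences score higher), and the output format are assumptions made for this sketch, not part of the claims:

```python
# Hypothetical sketch of the claimed structure: a score assigning unit,
# an utterance standby information holding unit, an utterance information
# selection unit, and an output unit. Names and the scoring rule are
# illustrative assumptions only.
class ScoreAssigner:
    """スコア付与部: assigns a priority score to a candidate sentence."""
    def assign(self, sentence: str) -> float:
        # Placeholder rule for this sketch: favor shorter sentences.
        return 1.0 / (1 + len(sentence))


class StandbyHolder:
    """発話待機情報保持部: holds sentences in association with scores."""
    def __init__(self):
        self._held = {}  # sentence -> score

    def hold(self, sentence: str, score: float) -> None:
        self._held[sentence] = score

    def highest(self):
        # Sentence associated with the highest held score, if any.
        return max(self._held, key=self._held.get) if self._held else None


class UtteranceSelector:
    """発話情報選択部: selects the sentence with the highest held score."""
    def __init__(self, holder: StandbyHolder):
        self.holder = holder

    def select(self):
        return self.holder.highest()


class OutputUnit:
    """出力部: stands in here for text-to-speech output of voice information."""
    def emit(self, sentence: str) -> str:
        return f"[speak] {sentence}"


assigner, holder = ScoreAssigner(), StandbyHolder()
for s in ["Hello.", "The weather is fine today."]:
    holder.hold(s, assigner.assign(s))
spoken = OutputUnit().emit(UtteranceSelector(holder).select())
```

Under the placeholder rule, the shorter sentence is held with the higher score, so it is the one selected and passed to the output unit.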
PCT/JP2019/038045 2018-10-05 2019-09-26 Information presentation device WO2020071255A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2020550375A JP7146933B2 (en) 2018-10-05 2019-09-26 Information provision device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2018190167 2018-10-05
JP2018-190167 2018-10-05

Publications (1)

Publication Number Publication Date
WO2020071255A1 (en)

Family

ID=70054786

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2019/038045 WO2020071255A1 (en) 2018-10-05 2019-09-26 Information presentation device

Country Status (2)

Country Link
JP (1) JP7146933B2 (en)
WO (1) WO2020071255A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014073613A1 (en) * 2012-11-08 2014-05-15 日本電気株式会社 Conversation-sentence generation device, conversation-sentence generation method, and conversation-sentence generation program
JP2015127758A (en) * 2013-12-27 2015-07-09 シャープ株式会社 Response control device and control program
JP2015138147A (en) * 2014-01-22 2015-07-30 シャープ株式会社 Server, interactive device, interactive system, interactive method and interactive program
WO2015174172A1 (en) * 2014-05-13 2015-11-19 シャープ株式会社 Control device and message output control system
JP2017161875A (en) * 2016-03-11 2017-09-14 富士通株式会社 Information processing apparatus, information processing method, and information processing program
JP6400871B1 (en) * 2018-03-20 2018-10-03 ヤフー株式会社 Utterance control device, utterance control method, and utterance control program


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KUNUGI, MASASHI: "Construction of non-task-oriented dialogue system considering context", Proceedings of the 30th Annual Conference of the Japanese Society for Artificial Intelligence, June 2016, pages 1-4 *

Also Published As

Publication number Publication date
JPWO2020071255A1 (en) 2021-09-02
JP7146933B2 (en) 2022-10-04


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19868803; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2020550375; Country of ref document: JP; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 19868803; Country of ref document: EP; Kind code of ref document: A1)