WO2020071255A1 - Information presentation device - Google Patents

Information presentation device

Info

Publication number
WO2020071255A1
Authority
WO
WIPO (PCT)
Prior art keywords
utterance
information
score
unit
sentence
Prior art date
Application number
PCT/JP2019/038045
Other languages
French (fr)
Japanese (ja)
Inventor
優太朗 白水
Original Assignee
NTT DOCOMO, INC. (株式会社NTTドコモ)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NTT DOCOMO, INC. (株式会社NTTドコモ)
Priority to JP2020550375A, granted as patent JP7146933B2
Publication of WO2020071255A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00: Speech synthesis; Text to speech systems
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00: Speech synthesis; Text to speech systems
    • G10L13/08: Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination

Definitions

  • The present disclosure relates to an information providing device.
  • Various studies have been made on how to determine the contents of a response to a user in a dialogue system that converses with the user (for example, see Patent Document 1).
  • The present disclosure has been made in view of the above and provides an information providing apparatus capable of giving a user a more appropriate utterance.
  • An information providing apparatus according to the present disclosure provides voice information to a user by uttering an utterance sentence that is a candidate to be uttered to the user.
  • The apparatus includes a score assigning unit that calculates and assigns, to each utterance sentence, a score that is a numerical value related to the priority of uttering that sentence.
  • The apparatus further includes an utterance standby information holding unit that holds each utterance sentence in association with the score assigned to it by the score assigning unit, an utterance information selection unit that selects the utterance sentence held in association with the highest score among the scores held in the utterance standby information holding unit, and an output unit that outputs the utterance sentence selected by the utterance information selection unit as voice information.
  • According to the present disclosure, an information providing device capable of giving a user a more appropriate utterance is provided.
  • FIG. 1 is a diagram illustrating a schematic configuration of an information providing device.
  • FIG. 2 is a diagram illustrating an utterance management module of the information providing device.
  • FIG. 3 is a diagram illustrating an example of information held in an utterance standby information holding unit of the information providing device.
  • FIG. 4 is a sequence diagram explaining an information provision method performed by the information providing device.
  • FIG. 5 is a diagram illustrating a hardware configuration of the information providing device.
  • FIG. 1 is a diagram illustrating a schematic configuration of an information providing apparatus according to an embodiment of the present disclosure.
  • the information providing device 1 illustrated in FIG. 1 is a device that provides information to a user U by voice.
  • the information providing apparatus 1 has a function of responding by voice in response to an utterance from the user U. That is, the information providing device 1 functions as an interactive device capable of interacting with the user U.
  • the information providing device 1 includes a dialogue module 10, an utterance management module 20, a response information generation module 30, and utterance information generation modules 40A and 40B.
  • The information providing apparatus 1 is characterized in that it provides the user U with information corresponding to specific content that progresses in real time, while the user U is watching or is interested in that content.
  • the specific content that progresses in real time includes, for example, sports, horse racing, stock price fluctuation, and the like. Further, general information such as weather information and general news, or everyday life itself may be treated as “content”.
  • the information providing apparatus 1 provides information to the user U in accordance with the progress of the contents.
  • the content that the user U is watching or is interested in may be referred to as “target content”.
  • the information providing device 1 provides information on the target content to the user U.
  • the information providing device 1 also has a configuration that responds to a question from the user U. Therefore, the information providing device 1 is a device that responds to a question from the user U and voluntarily provides information to the user U.
  • The interaction module 10 of the information providing device 1 is a module serving as an interface for transmitting and receiving voice information to and from the user U; it has a function of receiving the voice uttered by the user U and a function of emitting a voice to the user U. That is, the dialogue module 10 also functions as an output unit that provides voice information to the user U by speaking.
  • the dialogue module 10 may have a function of performing voice recognition processing of the received voice of the user U and converting the voice to text data.
  • the voice information from the user U converted into text data is sent to the utterance management module 20 described later.
  • the dialogue module 10 may have a function of performing a speech synthesis process of text data provided from an utterance management module 20 described later for utterance to the user U.
  • the utterance management module 20 manages voice information uttered to the user U. Although the details will be described later, it manages what information about the target content is provided to the user U at what timing according to the progress of the target content. In addition, it has a function of appropriately responding to an inquiry from the user U when there is an inquiry from the user U.
  • In this way, the utterance management module 20 has a function of managing the provision of information to the user U, including responses from the information providing device 1 to inquiries.
  • The utterance management module 20 accumulates the response sentences generated by the response information generation module 30 and the utterance sentences generated by the utterance information generation modules 40A and 40B, and performs control for outputting them to the user U via the dialogue module 10 at an appropriate timing. Details of this control will be described later.
  • the response information generation module 30 is a module that generates a response sentence for the content of an inquiry from the user U.
  • the response information generation module 30 generates a response sentence to the inquiry from the user U sent from the utterance management module 20.
  • a configuration may be employed in which communication or the like is performed with an external device or the like to obtain necessary information.
  • the response sentence generated by the response information generation module 30 is sent to the utterance management module 20 and output to the user U.
  • Each utterance information generation module 40A, 40B has a function of generating an utterance sentence that spontaneously utters to the user U.
  • The information providing apparatus 1 is shown as an example in which two utterance information generation modules 40A and 40B are used, but the number of utterance information generation modules may be one, or three or more.
  • The utterance information generation module 40A is a module that generates utterance sentences including information directly related to the progress of the target content, while the utterance information generation module 40B is a module that generates utterance sentences concerning related information that is not directly tied to the progress of the target content.
  • This division of roles between the modules may or may not be adopted.
  • the utterance information generation module 40A generates an utterance sentence relating to information relating to the progress of the target content.
  • the utterance information generation module 40A acquires information from the content progress information providing device 50, which is an external device that provides information relating to the progress of the target content, in order to generate an utterance sentence.
  • the content progress information providing device 50 is a device that provides information indicating the progress of the target content in a data format different from that of the text.
  • the information indicating the progress of the target content includes, for example, information indicating the details of play when the target content is a sport. Further, when the target content is a stock price change, information on a stock having a large price change is included.
  • When acquiring these pieces of information from the content progress information providing device 50, the utterance information generation module 40A generates a sentence (a natural sentence) explaining them.
  • the utterance information generation module 40B generates an utterance sentence relating to information related to the target content.
  • the utterance information generation module 40B acquires information from the content progress information providing device 50, which is an external device that provides information relating to the progress of the target content, in order to generate an utterance sentence. Further, information may be acquired from an external DB 60 (database) or the like, which is a device different from the content progress information providing device 50.
  • the information related to the target content includes, for example, information relating to a player who has performed a specific play when the target content is a sport, information describing a specific play, and the like.
  • When the target content is a stock price fluctuation, examples include information on stock prices in the same industry as a stock whose price fluctuation is large, information on related companies, and the like.
  • By combining information on the progress of the target content output from the content progress information providing device 50 with information output from the external DB 60 and the like, the utterance information generation module 40B generates a sentence explaining that information.
  • The utterance sentences generated by the utterance information generation modules 40A and 40B are sent to the utterance management module 20 and output to the user U.
  • the utterance management module 20 includes an utterance standby information holding unit 21, an utterance determination unit 22, and a score providing unit 23.
  • the score assigning unit 23 includes a content score calculating unit 24, an elapsed time score calculating unit 25, and a situation score calculating unit 26.
  • The utterance standby information holding unit 21 has a function of holding the utterance sentences sent from the utterance information generation modules 40A and 40B. That is, it has a function of holding utterance sentences, which are candidates for the voice information to be provided to the user U by utterance.
  • The utterance determination unit 22 has a function of determining whether or not to provide voice information to the user U by utterance and, when uttering, of selecting, from the utterance sentences held in the utterance standby information holding unit 21, the utterance sentence to be output (uttered) as voice information.
  • the score assigning unit 23 assigns a score to each utterance sentence held in the utterance standby information holding unit 21. The score is a numerical value related to the priority of the utterance.
  • An utterance sentence with a higher assigned score is preferentially provided as voice information to the user U.
  • the score is a value set in consideration of the surrounding environment and the like of the user U, and can be said to be a numerical value in consideration of the context.
  • the utterance information generation modules 40A and 40B generate various utterance sentences in accordance with the progress of the target content.
  • If the information providing apparatus 1 output all of these utterance sentences to the user U, the voice information output from the information providing apparatus 1 would become excessive.
  • It is therefore preferable that the content of the audio information output from the information providing apparatus 1 be appropriately changed according to the situation (particularly, the progress of the content).
  • Depending on the situation, the user U may want to know information describing the progress of the target content itself rather than related information on the target content.
  • Conversely, when the information explaining the progress of the target content decreases, providing related information on the target content may attract the interest of the user U.
  • Therefore, not all of the utterance sentences generated by the utterance information generation modules 40A and 40B are uttered; rather, utterance sentences are selected and output in consideration of the progress of the target content.
  • the utterance management module 20 performs this management.
  • The utterance standby information holding unit 21 of the utterance management module 20 accumulates the utterance sentences (utterance candidates) generated by the utterance information generation modules 40A and 40B. The utterance determination unit 22 then determines which of the accumulated utterance sentences is output as voice information. What is used for this determination is the score calculated and assigned by the score assigning unit 23. The utterance determination unit 22 refers to the score assigned by the score assigning unit 23 to each utterance sentence held in the utterance standby information holding unit 21, and selects, as the utterance sentence to be output as voice information, the utterance sentence whose score is the highest and is equal to or greater than a predetermined threshold.
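The selection rule just described can be sketched as follows. This is an illustrative sketch, not code from the patent: the function name `select_utterance`, the pair representation of the holding unit, and the 0.5 threshold are all assumptions.

```python
THRESHOLD = 0.5  # assumed value; the patent leaves the concrete threshold unspecified

def select_utterance(queue, threshold=THRESHOLD):
    """Return the highest-scored sentence if it clears the threshold, else None.

    `queue` is a list of (sentence, score) pairs, standing in for the
    utterance standby information holding unit 21.
    """
    if not queue:
        return None
    best = max(queue, key=lambda pair: pair[1])
    if best[1] >= threshold:
        queue.remove(best)  # an uttered sentence is deleted from the holding unit
        return best[0]
    return None

queue = [("Player A has taken a shot.", 0.8), ("Player A is from XX.", 0.7)]
print(select_utterance(queue))  # -> Player A has taken a shot.
```

When no held sentence reaches the threshold, the function returns `None`, which corresponds to deferring the utterance to the next periodic check.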
  • the score assigning unit 23 calculates a score based mainly on three elements, and calculates a score for each utterance sentence by adding the scores.
  • The three elements are a “content score”, an “elapsed time score”, and a “situation score”; the scores associated with these elements are calculated by the content score calculation unit 24, the elapsed time score calculation unit 25, and the situation score calculation unit 26, respectively.
  • The “content score” is a numerical value calculated based on the content included in the utterance sentence.
  • the utterance sentence includes a plurality of words.
  • the content score is a score given based on these plural words.
  • The method of assigning the content score is not particularly limited, but a simple method is to determine in advance, for each word related to the target content, a score to be assigned to that word, and to add up the scores corresponding to the words included in the utterance sentence. Alternatively, a method may be used in which a feature amount is calculated for each utterance sentence using a machine-learning technique such as deep learning, and the feature amount is used as the score.
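The "simple method" above can be sketched in a few lines. The per-word score table below is invented for illustration and is not taken from the patent.

```python
# Predetermined per-word scores for words related to the target content
# (invented example values for a soccer-like target content).
WORD_SCORES = {"goal": 1.0, "shot": 0.6, "pass": 0.1}

def content_score(sentence_words):
    """Sum the predetermined scores of the words in an utterance sentence."""
    return sum(WORD_SCORES.get(word, 0.0) for word in sentence_words)

print(content_score(["player", "a", "has", "taken", "a", "shot"]))  # -> 0.6
```

Words absent from the table simply contribute 0, so only the content-related vocabulary affects the score.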
  • For example, training (teacher) data may be created by preparing a plurality of pairs, each consisting of a sentence composed of a set of words and a score (a numerical value expressed as a probability from 0 to 1) associated with the importance of that sentence.
  • The score of the target utterance sentence may then be calculated from this training data using a deep learning method such as a CNN (Convolutional Neural Network).
  • the deep learning method or the machine learning method used for calculating the score is not particularly limited.
  • the “elapsed time score” is a numerical value corresponding to the elapsed time since the information providing apparatus 1 previously output the voice information to the user U.
  • the information providing apparatus 1 changes the score according to the length of time since the last time the audio information was output to the user U.
  • As an example, the elapsed time score calculation unit 25 may prepare in advance a formula for calculating a score from the elapsed time and automatically determine the value to assign from that formula; however, the calculation is not limited to this method.
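The text only says that a calculation formula is prepared in advance; the saturating curve below is one assumed example, in which the score grows from 0 toward 1 as the silence since the previous utterance lengthens. The 60-second time constant is an invented value.

```python
import math

def elapsed_time_score(seconds_since_last_utterance, time_constant=60.0):
    """Score rises with elapsed time, discouraging back-to-back utterances."""
    return 1.0 - math.exp(-seconds_since_last_utterance / time_constant)

print(round(elapsed_time_score(0.0), 3))   # 0.0 right after an utterance
print(round(elapsed_time_score(60.0), 3))  # ~0.632 after one minute of silence
```

Any monotonically increasing formula would serve the same purpose of suppressing utterances that would follow too soon after the previous one.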
  • The “situation score” is a numerical value determined based on the progress of the target content, indicating the necessity of providing information according to the status of the target content. For example, in a tense situation of the target content, the demand of the user U for audio information directly related to the target content may increase, so a higher score may be set for such information. Conversely, in a situation where the target content is not tense, the user U may be more interested in related information, so a higher score may be set for the related information. In this way, the situation score is a score set according to the progress of the target content.
  • the situation score calculation unit 26 may be configured to calculate the situation score using the information on the progress of the content from the content progress information providing device 50 acquired by the utterance information generation modules 40A and 40B.
  • As a method of calculating the situation score, the situation score calculation unit 26 can use a method in which a specific score is determined in advance for highly relevant words included in the information on the progress of the content provided from the content progress information providing device 50.
  • For example, when the target content is soccer, a method is conceivable in which a score of 1.0 is given to a “goal”, considered a big event, and a score of 0.1 is given to a “pass”, which tends to occur frequently.
  • a method of converting the position of the ball being played into a score and using the score may be considered.
  • the target content is a stock price fluctuation, for example, a “fall” or “stop height” of the stock price may be a word to which a score is given.
  • a situation score can be given even when the target content is a general situation such as everyday life in general. Specifically, a high-value score is set in advance for a word indicating an event (for example, information indicating a change in weather) that is assumed that the user U desires to provide audio information from the information providing apparatus 1, A low score can be set in advance to a word indicating an event that is assumed that the user U does not want to provide the voice information (for example, information indicating that the user U is going out). With such a configuration, even when the target content is not limited to a specific category such as sports, a situation score can be given.
  • As another example, a method may be employed in which, at the timing when the utterance sentence is generated, the situation score calculation unit 26 takes, from the information on the progress of the content provided from the content progress information providing device 50, the latest piece of information (or a predetermined number of the latest pieces), extracts the specific words for which scores have been determined, calculates the sum of those scores, and uses the sum as the situation score.
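This sum-over-recent-events method can be sketched as follows, reusing the soccer example scores above ("goal" = 1.0, "pass" = 0.1). The event strings and names are illustrative assumptions.

```python
# Predetermined scores for words appearing in progress information
# (the soccer example values from the text).
EVENT_WORD_SCORES = {"goal": 1.0, "pass": 0.1}

def situation_score(latest_events, table=EVENT_WORD_SCORES):
    """Sum the predetermined scores of scored words found in recent events."""
    return sum(table.get(word, 0.0)
               for event in latest_events
               for word in event.split())

print(situation_score(["pass to the wing", "pass", "goal scored"]))  # -> 1.2
```

A burst of high-scoring events (such as a goal) thus raises the situation score at exactly the moments when the user is likely to want progress information.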
  • the method of giving the situation score is not particularly limited, and various methods can be used.
  • a technique such as machine learning may be used for giving the situation score, similarly to the content score.
  • the situation score may be calculated based on information different from the information acquired by the utterance information generation modules 40A and 40B.
  • the score calculated by each unit may be represented by a numerical value of 0 or more, or may be represented by a probability (0 to 1).
  • the content score calculation unit 24, the elapsed time score calculation unit 25, and the situation score calculation unit 26 calculate scores for one utterance sentence from different viewpoints. Then, the score assigning unit 23 calculates a score for one utterance sentence by adding the scores calculated by the respective units.
  • the method of adding the scores is not particularly limited, and may be, for example, a simple addition.
  • When the content score, the elapsed time score, and the situation score are each calculated as a probability (0 to 1), the sum of the logarithms of the scores may be calculated, to avoid underflow, and used as the score for one utterance sentence.
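The log-sum combination can be sketched as below: when each score is a probability in (0, 1], summing logarithms instead of multiplying the raw probabilities avoids numerical underflow while preserving the ranking between sentences. The function name is an assumption.

```python
import math

def combined_score(content, elapsed, situation):
    """Combine three probability-valued scores as a sum of logarithms."""
    return sum(math.log(s) for s in (content, elapsed, situation))

# A sentence that scores higher on every element also ranks higher overall.
print(combined_score(0.9, 0.8, 0.9) > combined_score(0.2, 0.1, 0.2))  # True
```

The resulting values are non-positive (log of a probability), but only their relative order matters for choosing which sentence to utter.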
  • In this way, a score is calculated and assigned by the score assigning unit 23.
  • The utterance standby information holding unit 21 holds each utterance sentence waiting to be uttered by the information providing apparatus 1 (i.e., a queue) together with its assigned score.
  • For example, a score of 0.8 is associated with the utterance sentence “Player A has taken a shot.”
  • A score of 0.7 is associated with the utterance sentence “Player A is from XX.”
  • In this manner, the utterance standby information holding unit 21 holds the utterance sentences generated by the utterance information generation modules 40A and 40B in a state where scores are individually assigned to them.
  • The score associated with an utterance sentence held in the utterance standby information holding unit 21 may be updated as appropriate. For example, the score given to an utterance sentence may become inappropriate as the target content progresses. Also, when an utterance using a held utterance sentence is made, or when the information providing apparatus 1 outputs voice in response to an inquiry from the user U, the elapsed time is reset. Therefore, the scores held in the utterance standby information holding unit 21 may be updated by the score assigning unit 23 as appropriate (for example, at a predetermined timing).
  • The utterance determination unit 22 refers to the utterance standby information holding unit 21 periodically (for example, every several hundred milliseconds). It then refers to the utterance sentence having the highest score among the held utterance sentences and, when the score given to that sentence is equal to or greater than a predetermined threshold, determines to utter to the user U based on that sentence and selects it. That is, the utterance determination unit 22 functions as an utterance information selection unit that selects the utterance sentence to be provided as voice information to the user U.
  • When the score given to the highest-scoring utterance sentence held in the utterance standby information holding unit 21 is smaller than the threshold, the utterance determination unit 22 may determine that there is no need to utter and defer the utterance until the next opportunity.
  • The utterance sentence that the utterance determination unit 22 has determined to utter is deleted from the utterance standby information holding unit 21. This prevents the same utterance sentence from being uttered again. Further, an utterance sentence that has remained in the utterance standby information holding unit 21 for more than a predetermined time can also be deleted, since its likelihood of being uttered has decreased. Such a configuration prevents utterance sentences that are unlikely to be uttered in the future from being held for a long time and increasing the amount of data.
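The holding-unit maintenance described above can be sketched as a time-to-live filter. The 120-second TTL and the `(sentence, score, enqueued_at)` entry shape are assumptions for illustration.

```python
def expire_stale(queue, now, ttl_seconds=120.0):
    """Keep only (sentence, score, enqueued_at) entries younger than the TTL."""
    return [entry for entry in queue if now - entry[2] <= ttl_seconds]

queue = [("fresh sentence", 0.7, 1000.0),   # enqueued 10 s ago
         ("stale sentence", 0.9, 800.0)]    # enqueued 210 s ago
print(expire_stale(queue, now=1010.0))      # only the fresh entry survives
```

Running this filter at each periodic check keeps the held data bounded even when many low-priority sentences are generated.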
  • the timing at which the utterance determination unit 22 refers to the utterance standby information holding unit 21 may be appropriately changed according to the presence or absence of an utterance (such as an inquiry) from the user U. For example, when there is an utterance from the user U, the information providing device 1 gives priority to responding to the utterance from the user U, and the spontaneous utterance from the information providing device 1 is omitted. Therefore, in such a state, the utterance determination unit 22 may omit reference to the utterance standby information holding unit 21 itself.
  • the utterance management module 20 also manages a response to an utterance from the user U. Therefore, the utterance management module 20 manages the information to be uttered based on the priority of the utterance and the like so that the response to the utterance from the user U and the spontaneous provision of the voice information can be compatible.
  • FIG. 4 illustrates a procedure related to the voluntary provision of audio information from the information providing apparatus 1 side. Note that FIG. 4 does not describe the procedure relating to a response to an utterance from the user U.
  • the utterance information generation modules 40A and 40B of the information providing device 1 acquire information on the progress of the target content from the content progress information providing device 50 (S01).
  • the utterance information generation modules 40A and 40B generate an utterance sentence for the information providing apparatus 1 to output voice to the user U based on the content progress information from the content progress information providing apparatus 50 (S02).
  • the generated utterance sentence is sent from the utterance information generation modules 40A and 40B to the utterance management module 20 (S03).
  • In the score assigning unit 23, the score of the utterance sentence is calculated (S04). This calculation includes the process of computing the individual scores and adding them together.
  • the utterance waiting information holding unit 21 holds the utterance sentence and a score corresponding to the utterance sentence (S05).
  • That is, once the score is calculated (S04), it is associated with the utterance sentence and held in the utterance standby information holding unit 21 (S05).
  • the score associated with the utterance sentence held in the utterance standby information holding unit 21 may be updated by the score assigning unit 23 as necessary (S06).
  • The utterance determination unit 22 of the utterance management module 20 refers to the utterance standby information holding unit 21 and determines whether to utter any of the utterance sentences held there (S07).
  • the utterance determination unit 22 determines that no utterance is made, the subsequent processing is not performed, and the utterance determination (S07) is repeated periodically.
  • the utterance sentence to be output as the voice information is sent to the dialogue module 10 by the utterance determination unit 22 (S08). Then, the utterance sentence is converted into a voice by the dialogue module 10 and output to the user U, that is, the utterance is performed (S09).
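The sequence of steps S01 through S09 can be sketched end to end under the same assumptions as the earlier snippets. Sentence generation and scoring here are simplified stand-ins, not the patent's actual methods, and all names and values are illustrative.

```python
THRESHOLD = 0.5
WORD_SCORES = {"goal": 1.0, "pass": 0.1}

def run_cycle(progress_events, holding_unit):
    for event in progress_events:                  # S01: progress info arrives
        sentence = f"Event: {event}"               # S02: generate utterance sentence
        score = WORD_SCORES.get(event, 0.0)        # S04: assign a score
        holding_unit.append((sentence, score))     # S05: hold sentence with score
    if holding_unit:                               # S07: utterance determination
        best = max(holding_unit, key=lambda p: p[1])
        if best[1] >= THRESHOLD:
            holding_unit.remove(best)              # an uttered sentence is deleted
            return best[0]                         # S08-S09: output as voice
    return None

holding_unit = []
print(run_cycle(["pass", "goal"], holding_unit))   # -> Event: goal
```

After the cycle, the low-scoring "pass" sentence remains queued, where a later score update or expiry would decide its fate.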
  • As described above, the information providing apparatus 1 is an information providing apparatus that provides voice information to the user U by uttering utterance sentences that are candidates to be uttered to the user U.
  • It includes the score assigning unit 23, which calculates and assigns, to each utterance sentence, a score that is a numerical value related to the priority of uttering that sentence, and the utterance standby information holding unit 21, which holds each utterance sentence in association with the score assigned to it by the score assigning unit.
  • It further includes the utterance determination unit 22, serving as an utterance information selection unit that selects the utterance sentence held in association with the highest score among the scores held in the utterance standby information holding unit 21, and the dialogue module 10, serving as an output unit that outputs the selected utterance sentence as voice information.
  • In the information providing apparatus 1, a score, which is a numerical value related to the priority of uttering an utterance sentence, is calculated and assigned to each utterance sentence, and is held by the utterance standby information holding unit 21 in association with that sentence.
  • the utterance sentence associated with the highest score is selected by the utterance determination unit 22 as the utterance information selection unit, and is output as voice information by the dialogue module 10 as the output unit.
  • With such a configuration, an utterance sentence can be selected and output as audio information based on the score related to its priority, making it possible to give the user a more appropriate utterance.
  • The utterance determination unit 22, as the utterance information selection unit, selects the utterance sentence held in association with the highest score among the scores held in the utterance standby information holding unit 21 when that score satisfies a predetermined condition. With such a configuration, the utterance sentence output as voice information is one whose score not only is the highest but also satisfies the additional condition. Therefore, a more appropriate utterance can be given to the user according to the priority.
  • In the embodiment, the predetermined condition is that the score is equal to or greater than a predetermined threshold.
  • the score assigning unit 23 is configured to calculate a score based on a content score corresponding to the content of the utterance sentence.
  • the score assigning unit 23 is configured to calculate the score based on the elapsed time score related to the elapsed time from the previous utterance by the own device.
  • With this configuration, a score that takes the elapsed time into account can be given. For example, an utterance can be prevented when the elapsed time from the previous utterance is too short, enabling a more appropriate utterance to the user.
  • the score assigning unit 23 is configured to calculate a score based on a situation score relating to the situation of the content to which the audio information is provided.
  • the information providing apparatus 1 described in the above embodiment is not limited to the above configuration, and various changes can be made.
  • each module constituting the information providing device 1 may be an individual device. Further, each module may be configured by a plurality of devices.
  • The score assigned by the score assigning unit 23 need not include all of the content score, the elapsed time score, and the situation score; any of them may be omitted.
  • the score assigning unit 23 may calculate the score based on information different from the above three scores. Further, the score assigning unit 23 may calculate a score by combining the above three scores and a score calculated based on other information. As described above, the method of calculating the score provided by the score providing unit 23 can be appropriately changed.
  • a configuration is described in which the utterance determination unit 22 determines that an utterance sentence is to be uttered as voice information when its score is the highest and is equal to or greater than a predetermined threshold.
  • the utterance determination unit 22 may be configured to select the utterance sentence having the highest score among the utterance sentences held in the utterance standby information holding unit 21 when other conditions are satisfied.
  • the other condition may be, for example, that the length of the utterance sentence is equal to or less than a predetermined number of characters.
  • other predetermined conditions may be set in consideration of functions related to the output of voice information from the information providing device 1 and the like.
  • the utterance determination unit 22 may be configured to select the utterance sentence having the highest score from the utterance sentences held in the utterance standby information holding unit 21.
  • the output frequency of the voice information may be adjusted by adjusting the timing at which the utterance determination unit 22 selects utterance sentences (for example, selecting utterance sentences periodically).
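  • the selection conditions described in these variations (highest score, a score threshold, a maximum sentence length) could be sketched as follows; the threshold and character-limit values are assumptions for illustration, not values specified by the patent:

```python
from typing import List, Optional, Tuple

SCORE_THRESHOLD = 0.6   # assumed value for the predetermined threshold
MAX_CHARS = 60          # assumed value for the predetermined number of characters

def select_utterance(held: List[Tuple[str, float]]) -> Optional[str]:
    """Pick the highest-scoring held utterance sentence, and emit it only
    when the predetermined conditions (threshold, length) are satisfied."""
    if not held:
        return None
    sentence, score = max(held, key=lambda pair: pair[1])
    if score >= SCORE_THRESHOLD and len(sentence) <= MAX_CHARS:
        return sentence
    return None
```

  A periodic selection timing, as mentioned above, would then amount to calling `select_utterance` on a timer, so that lowering the call frequency lowers the output frequency of voice information.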
  • each functional block may be realized by one device that is physically and/or logically coupled, or by a plurality of devices that are physically and/or logically separated and connected directly and/or indirectly (for example, by wire and/or wirelessly).
  • the information providing apparatus 1 may function as a computer that performs processing of the information providing apparatus 1 according to the present embodiment.
  • FIG. 5 is a diagram illustrating an example of a hardware configuration of the information providing apparatus 1 according to the present embodiment.
  • the information providing device 1 described above may be physically configured as a computer device including a processor 1001, a memory 1002, a storage 1003, a communication device 1004, an input device 1005, an output device 1006, a bus 1007, and the like.
  • the term “apparatus” can be read as a circuit, a device, a unit, or the like.
  • the hardware configuration of the information providing apparatus 1 may be configured to include one or more devices illustrated in the drawing, or may be configured without including some devices.
  • the functions of the information providing apparatus 1 are realized by reading predetermined software (a program) onto hardware such as the processor 1001 and the memory 1002, with the processor 1001 performing arithmetic operations and controlling communication by the communication device 1004 and the reading and/or writing of data in the memory 1002 and the storage 1003.
  • the processor 1001 controls the entire computer by operating an operating system, for example.
  • the processor 1001 may be configured by a central processing unit (CPU) including an interface with peripheral devices, a control device, an arithmetic device, a register, and the like.
  • each function of the information providing device 1 may be realized by the processor 1001.
  • the processor 1001 reads out a program (program code), software modules, and data from the storage 1003 and/or the communication device 1004 to the memory 1002, and executes various processes according to these.
  • as the program, a program that causes a computer to execute at least a part of the operations described in the above embodiment is used.
  • each function of the information providing apparatus 1 may be realized by a control program stored in the memory 1002 and operated by the processor 1001.
  • Processor 1001 may be implemented with one or more chips.
  • the program may be transmitted from a network via a telecommunication line.
  • the memory 1002 is a computer-readable recording medium, and may be configured by at least one of a ROM (Read Only Memory), an EPROM (Erasable Programmable ROM), an EEPROM (Electrically Erasable Programmable ROM), a RAM (Random Access Memory), and the like.
  • the memory 1002 may be called a register, a cache, a main memory (main storage device), or the like.
  • the memory 1002 can store a program (program code), a software module, and the like that can be executed to perform the method according to an embodiment of the present disclosure.
  • the storage 1003 is a computer-readable recording medium, and may be configured by at least one of, for example, an optical disk such as a CD-ROM (Compact Disc ROM), a hard disk drive, a flexible disk, a magneto-optical disk (for example, a compact disc, a digital versatile disc, a Blu-ray (registered trademark) disc), a smart card, a flash memory (for example, a card, a stick, a key drive), a floppy (registered trademark) disk, a magnetic strip, and the like.
  • the storage 1003 may be called an auxiliary storage device.
  • the storage medium described above may be, for example, a database including the memory 1002 and / or the storage 1003, a server, or any other suitable medium.
  • the communication device 1004 is hardware (transmitting / receiving device) for performing communication between computers via a wired and / or wireless network, and is also referred to as, for example, a network device, a network controller, a network card, a communication module, or the like.
  • each function of the information providing device 1 may be realized by the communication device 1004.
  • the input device 1005 is an input device (for example, a keyboard, a mouse, a microphone, a switch, a button, a sensor, and the like) that receives an external input.
  • the output device 1006 is an output device that performs output to the outside (for example, a display, a speaker, an LED lamp, and the like). Note that the input device 1005 and the output device 1006 may have an integrated configuration (for example, a touch panel).
  • the devices such as the processor 1001 and the memory 1002 are connected by a bus 1007 for communicating information.
  • the bus 1007 may be configured by a single bus, or may be configured by a different bus between the devices.
  • the information providing device 1 may be configured to include hardware such as a microprocessor, a digital signal processor (DSP), an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), and an FPGA (Field Programmable Gate Array), and some or all of the functional blocks may be realized by such hardware.
  • the processor 1001 may be implemented by at least one of these hardware.
  • the notification of information may be performed by physical layer signaling (for example, DCI (Downlink Control Information), UCI (Uplink Control Information)), upper layer signaling (for example, RRC (Radio Resource Control) signaling, MAC (Medium Access Control) signaling, broadcast information (MIB (Master Information Block), SIB (System Information Block))), other signals, or a combination thereof.
  • the RRC signaling may be referred to as an RRC message, and may be, for example, an RRC connection setup (RRC Connection Setup) message, an RRC connection reconfiguration (RRC Connection Reconfiguration) message, or the like.
  • each aspect/embodiment described in the present disclosure may be applied to systems using LTE (Long Term Evolution), LTE-A (Long Term Evolution-Advanced), SUPER 3G, IMT-Advanced, 4G (4th generation mobile communication system), 5G (5th generation mobile communication system), FRA (Future Radio Access), NR (New Radio), W-CDMA (registered trademark), GSM (registered trademark), CDMA2000 (Code Division Multiple Access 2000), UMB (Ultra Mobile Broadband), IEEE 802.11 (Wi-Fi (registered trademark)), IEEE 802.16 (WiMAX (registered trademark)), IEEE 802.20, UWB (Ultra-WideBand), Bluetooth (registered trademark), and other suitable systems, and/or to at least one of next-generation systems extended based on these.
  • a plurality of systems may be combined (for example, a combination of at least one of LTE and LTE-A with 5G) and applied.
  • Information and the like can be output from an upper layer (or lower layer) to a lower layer (or upper layer). Input and output may be performed via a plurality of network nodes.
  • Input and output information and the like may be stored in a specific place (for example, a memory) or may be managed by a management table. Information that is input and output can be overwritten, updated, or added. The output information or the like may be deleted. The input information or the like may be transmitted to another device.
  • the determination may be made based on a value represented by one bit (0 or 1), a Boolean value (true or false), or a comparison of numerical values (for example, a comparison with a predetermined value).
  • Each aspect/embodiment described in the present disclosure may be used alone, may be used in combination, or may be switched during execution. Further, the notification of predetermined information (for example, notification of "X") is not limited to being performed explicitly, and may be performed implicitly (for example, by not performing the notification of the predetermined information).
  • software, instructions, and the like may be transmitted and received via a transmission medium.
  • when software is transmitted from a website, server, or other remote source using wired technology such as coaxial cable, fiber optic cable, twisted pair and digital subscriber line (DSL) and/or wireless technology such as infrared, radio and microwave, these wired and/or wireless technologies are included within the definition of transmission medium.
  • the information, signals, etc. described in this disclosure may be represented using any of a variety of different technologies.
  • data, instructions, commands, information, signals, bits, symbols, chips, and the like that may be referred to throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or magnetic particles, optical fields or photons, or any combination thereof.
  • the terms "system" and "network" are used interchangeably.
  • the information, parameters, and the like described in the present disclosure may be represented by an absolute value, a relative value from a predetermined value, or another corresponding information.
  • the terms "determining" and "deciding" may encompass a wide variety of operations. "Determining" and "deciding" may include, for example, regarding judging, calculating, computing, processing, deriving, investigating, looking up (for example, searching in a table, a database, or another data structure), and ascertaining as having been "determined" or "decided". "Determining" and "deciding" may also include regarding receiving (for example, receiving information), transmitting (for example, transmitting information), input, output, and accessing (for example, accessing data in a memory) as having been "determined" or "decided".
  • "determining" and "deciding" may include regarding resolving, selecting, choosing, establishing, comparing, and the like as having been "determined" or "decided".
  • "determining" and "deciding" may include regarding any operation as having been "determined" or "decided".
  • the terms "connected" and "coupled", and any variations thereof, mean any direct or indirect connection or coupling between two or more elements, and may include the presence of one or more intermediate elements between two elements that are "connected" or "coupled" to each other.
  • the coupling or connection between the elements may be physical, logical, or a combination thereof.
  • the two elements may be considered "connected" or "coupled" to each other by using one or more wires, cables, and/or printed electrical connections, and, as some non-limiting and non-exhaustive examples, by using electromagnetic energy such as electromagnetic energy having wavelengths in the radio frequency, microwave, and light (both visible and invisible) regions.
  • the term “A and B are different” may mean that “A and B are different from each other”.
  • the term may also mean that "A and B are each different from C".
  • terms such as "separate" and "coupled" may be interpreted similarly to "different".
  • DESCRIPTION OF SYMBOLS: 1 ... Information providing device, 10 ... Dialogue module, 20 ... Utterance management module, 21 ... Utterance standby information holding unit, 22 ... Utterance determination unit, 23 ... Score assigning unit, 24 ... Content score calculation unit, 25 ... Elapsed time score calculation unit, 26 ... Situation score calculation unit, 30 ... Response information generation module, 40A, 40B ... Utterance information generation module.

Abstract

An information presentation device (1) presents speech information to a user U via utterance, the information presentation device comprising: a scoring unit (23) that calculates a score that is a numerical value related to the priority of utterance of an utterance sentence and gives the score to an utterance sentence that is a candidate as an utterance to the user U; a stand-by utterance information holding unit (21) that holds, in association with each other, an utterance sentence and a score given by the scoring unit to the utterance sentence; an utterance determination unit (22) that serves as an utterance information selection unit for selecting the utterance sentence associated with the highest score held in the stand-by utterance information holding unit (21); and a dialog module (10) that serves as an output unit for outputting as speech information the utterance sentence selected by the utterance determination unit (22) that serves as the utterance information selection unit.

Description

Information providing device
 The present disclosure relates to an information providing device.
 Various studies have been made on how to determine the content of a response to a user in a dialogue system for performing a dialogue with the user (for example, see Patent Document 1).
Japanese Patent Application Laid-Open No. 2018-109663
 However, there has been room for study on how to determine the content that the system actively utters to the user.
 The present disclosure has been made in view of the above, and provides an information providing apparatus capable of giving a more appropriate utterance to a user.
 In order to achieve the above object, an information providing apparatus according to an exemplary embodiment of the present disclosure is an information providing apparatus that provides voice information to a user by utterance, including: a score assigning unit that calculates and assigns, to an utterance sentence that is a candidate to be uttered to the user, a score that is a numerical value related to the priority of uttering the utterance sentence; an utterance standby information holding unit that holds the utterance sentence and the score assigned to the utterance sentence by the score assigning unit in association with each other; an utterance information selection unit that selects the utterance sentence held in association with the highest score among the scores held in the utterance standby information holding unit; and an output unit that outputs the utterance sentence selected by the utterance information selection unit as voice information.
 According to the present disclosure, an information providing device capable of giving a more appropriate utterance to a user is provided.
FIG. 1 is a diagram illustrating a schematic configuration of the information providing device. FIG. 2 is a diagram illustrating the utterance management module of the information providing device. FIG. 3 is a diagram illustrating an example of information held in the utterance standby information holding unit of the information providing device. FIG. 4 is a sequence diagram illustrating an information providing method by the information providing device. FIG. 5 is a diagram illustrating a hardware configuration of the information providing device.
 Hereinafter, exemplary embodiments of the present disclosure will be described with reference to the accompanying drawings. In the description of the drawings, the same elements are denoted by the same reference signs, and redundant description is omitted.
 FIG. 1 is a diagram illustrating a schematic configuration of an information providing apparatus according to an embodiment of the present disclosure. The information providing device 1 illustrated in FIG. 1 is a device that provides information to a user U by voice. The information providing device 1 also has a function of responding by voice to an utterance from the user U. That is, the information providing device 1 functions as a dialogue device capable of interacting with the user U. The information providing device 1 includes a dialogue module 10, an utterance management module 20, a response information generation module 30, and utterance information generation modules 40A and 40B.
 The information providing apparatus 1 according to the present embodiment is characterized in that, when the user U is watching or showing interest in specific content that progresses in real time, it provides the user U with information corresponding to that content. Examples of content that progresses in real time include sports, horse racing, and stock price fluctuations. General information such as weather information and general news, or everyday life itself, may also be treated as "content". When the user U is watching such content, the information providing apparatus 1 provides information to the user U in accordance with the progress of the content. Hereinafter, the content that the user U is watching or showing interest in may be referred to as the "target content". The information providing apparatus 1 provides the user U with information on the target content.
 Providing information to the user U in the above case does not mean responding to a question from the user U, but refers to the information providing apparatus 1 voluntarily providing information to the user U. Note that the information providing apparatus 1 also has a configuration for responding to questions from the user U. Therefore, the information providing apparatus 1 is a device that responds to questions from the user U and also voluntarily provides information to the user U.
 The dialogue module 10 of the information providing device 1 is a module serving as an interface for transmitting and receiving information by voice to and from the user U, and has a function of receiving voice uttered by the user U and a function of emitting voice to the user U. That is, the dialogue module 10 also functions as an output unit that provides voice information to the user U by utterance.
 The function of receiving voice uttered by the user U may be provided by, for example, a microphone, and the function of emitting voice to the user U may be provided by, for example, a speaker. The dialogue module 10 may also have a function of performing voice recognition processing on the received voice of the user U and converting it into text data. The voice information from the user U converted into text data is sent to the utterance management module 20 described later. Furthermore, the dialogue module 10 may have a function of performing speech synthesis processing on text data provided from the utterance management module 20 for utterance to the user U. When the dialogue module 10 performs the above voice recognition processing and speech synthesis processing, text data is used for transmitting and receiving information (information provided to the user U or information obtained from the user U) between the dialogue module 10 and the utterance management module 20.
 The utterance management module 20 manages the voice information uttered to the user U. Although details will be described later, it manages what information about the target content is provided to the user U, and at what timing, according to the progress of the target content. It also has a function of appropriately responding to inquiries from the user U. The utterance management module 20 thus has a function of managing information provision to the user U, including responses to inquiries made to the information providing device 1. The utterance management module 20 accumulates the response sentences generated by the response information generation module 30 and the utterance sentences generated by the utterance information generation modules 40A and 40B, and controls their output to the user U via the dialogue module 10 at an appropriate timing. Details of this control will be described later.
 The response information generation module 30 is a module that generates a response sentence to the content of an inquiry from the user U. The response information generation module 30 generates a response sentence to the inquiry from the user U sent from the utterance management module 20. When generating the response sentence, it may communicate with an external device or the like to obtain necessary information. The response sentence generated by the response information generation module 30 is sent to the utterance management module 20 and output to the user U.
 Each of the utterance information generation modules 40A and 40B has a function of generating utterance sentences to be spoken spontaneously to the user U. The information providing apparatus 1 is shown here with two utterance information generation modules 40A and 40B, but the number of utterance information generation modules may be one, or three or more. In the information providing apparatus 1, the utterance information generation module 40A is a module that generates utterance sentences including information directly related to the progress of the target content, and the utterance information generation module 40B is a module that generates utterance sentences relating to information that is not directly related to the progress of the target content but is related to the content. However, when a plurality of utterance information generation modules are provided, such a division of roles may or may not be applied.
 The utterance information generation module 40A generates utterance sentences relating to the progress of the target content. To generate an utterance sentence, the utterance information generation module 40A acquires information from the content progress information providing device 50, an external device that provides information relating to the progress of the target content. The content progress information providing device 50 is a device that provides information indicating the progress of the target content in a data format different from text. The information indicating the progress of the target content is, for example, information indicating the details of a play when the target content is a sport, or information on stocks with large price fluctuations when the target content is stock price fluctuations. Upon acquiring such information from the content progress information providing device 50, the utterance information generation module 40A generates sentences (natural sentences) explaining the information.
 The utterance information generation module 40B generates utterance sentences relating to information related to the target content. To generate an utterance sentence, the utterance information generation module 40B acquires information from the content progress information providing device 50, and may also acquire information from an external DB 60 (database) or the like, which is a device different from the content progress information providing device 50. The information related to the target content is, for example, information relating to a player who has performed a specific play or information explaining a specific play when the target content is a sport, or information on stock prices in the same industry as a stock with large price fluctuations or information on related companies when the target content is stock price fluctuations. By combining the information relating to the progress of the target content output from the content progress information providing device 50 with the information output from the external DB 60 and the like, the utterance information generation module 40B generates sentences explaining the information.
 The utterance sentences generated by the utterance information generation modules 40A and 40B are sent to the utterance management module 20 and output to the user U.
 Next, the utterance management module 20 will be further described with reference to FIG. 2. As shown in FIG. 2, the utterance management module 20 includes an utterance standby information holding unit 21, an utterance determination unit 22, and a score assigning unit 23. The score assigning unit 23 includes a content score calculation unit 24, an elapsed time score calculation unit 25, and a situation score calculation unit 26.
 The utterance standby information holding unit 21 has a function of holding the utterance sentences sent from the utterance information generation modules 40A and 40B, that is, a function of holding utterance sentences that are candidate information for providing voice information to the user U by utterance. The utterance determination unit 22 determines whether or not to provide voice information to the user U by utterance and, when uttering, has a function of selecting the utterance sentence to be output (uttered) as voice information from the utterance sentences held in the utterance standby information holding unit 21. The score assigning unit 23 assigns a score to each utterance sentence held in the utterance standby information holding unit 21. The score is a numerical value related to the priority of the utterance. An utterance sentence with a higher assigned score is information that is more preferably provided to the user U as voice information. In other words, the score is a value set in consideration of the surrounding environment of the user U and the like, and can be said to be a numerical value that takes the context into account.
 The utterance information generation modules 40A and 40B generate various utterance sentences in accordance with the progress of the target content. However, if the information providing apparatus 1 were to output all of the utterance sentences to the user U, the voice information output from the information providing apparatus 1 could become excessive. Also, when there is an inquiry from the user U, it is desirable to give priority to the response to the inquiry. Therefore, the amount of utterance sentences in the voice information output from the information providing apparatus 1 needs to be adjusted appropriately.
 It is also desirable that the content of the voice information output from the information providing apparatus 1 change appropriately according to the situation (in particular, the progress of the content). For example, in a situation where the target content is tense, the user U may want information describing the progress of the target content itself rather than information related to the target content. Conversely, in a situation where the target content changes little, there is less information describing its progress, so providing related information may attract the interest of the user U.
 In view of the above, the information providing apparatus 1 does not utter all of the utterance sentences generated by the utterance information generation modules 40A and 40B, but selects and outputs utterance sentences in consideration of the progress of the target content and the like. The utterance management module 20 performs this management.
 The utterance standby information holding unit 21 of the utterance management module 20 accumulates the utterance sentences generated by the utterance information generation modules 40A and 40B (utterance sentences that are candidates for utterance). The utterance determination unit 22 then determines which of the accumulated utterance sentences is to be output as voice information. This determination uses the score calculated and assigned by the score assigning unit 23. The utterance determination unit 22 refers to the score assigned by the score assigning unit 23 to each utterance sentence held in the utterance standby information holding unit 21, and selects, as the utterance sentence to be output as voice information, the utterance sentence whose score is the highest and is equal to or greater than a predetermined threshold.
 The score assigning unit 23 calculates scores based mainly on three elements and adds them together to obtain the score of each utterance sentence. The three elements are the "content score", the "elapsed time score", and the "situation score", which are calculated by the content score calculation unit 24, the elapsed time score calculation unit 25, and the situation score calculation unit 26, respectively.
 The "content score" is a numerical value calculated based on the content of the utterance sentence. An utterance sentence contains a plurality of words, and the content score is assigned based on these words. The method of assigning the content score is not particularly limited, but a simple method is to predetermine a score for each word related to the target content and add up the scores corresponding to the words contained in the utterance sentence. Alternatively, a machine learning technique such as deep learning may be used to calculate a feature value for each utterance sentence and use that feature value as the score. For example, training data may be created by preparing a plurality of pairs, each consisting of a sentence (a set of words) and a score associated with the importance of that sentence (a numerical value expressed as a probability from 0 to 1), and the score of a target utterance sentence may then be calculated from this training data using a deep learning technique such as a CNN (Convolutional Neural Network). The deep learning or machine learning technique used for calculating the score is not particularly limited.
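 As one concrete illustration of the simple word-sum method described above, a content score can be computed by looking up each word of the utterance sentence in a predefined table. This is only a sketch: the word list, the score values, and the whitespace tokenization are hypothetical assumptions, not part of the embodiment.

```python
# Hypothetical per-word scores for a soccer content domain
# (the words and values are illustrative assumptions, not from the embodiment).
WORD_SCORES = {
    "goal": 1.0,
    "shot": 0.8,
    "pass": 0.1,
}

def content_score(utterance: str) -> float:
    """Sum the predefined scores of the words contained in the utterance."""
    words = utterance.lower().rstrip(".").split()
    return sum(WORD_SCORES.get(w, 0.0) for w in words)

print(content_score("Player A has taken a shot."))  # 0.8
```

 A production system would use proper morphological analysis rather than whitespace splitting, particularly for Japanese text, but the lookup-and-sum structure is the same.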
 The "elapsed time score" is a numerical value corresponding to the time that has elapsed since the information providing apparatus 1 last output voice information to the user U. The information providing apparatus 1 changes this score according to the length of time since the last output. For example, the elapsed time score calculation unit 25 may prepare in advance a formula for calculating the score from the elapsed time and automatically determine the value to be assigned based on that formula. However, the method is not limited to this.
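 A minimal sketch of such a formula: the elapsed time score below rises linearly with time since the last utterance and saturates at 1.0 after 60 seconds. The linear shape and the 60-second constant are assumptions made here for illustration; the embodiment does not fix any particular formula.

```python
def elapsed_time_score(seconds_since_last_utterance: float,
                       saturation: float = 60.0) -> float:
    """Grow linearly from 0 toward 1 as the time since the last utterance
    increases, capping at 1.0 (hypothetical formula)."""
    return min(seconds_since_last_utterance / saturation, 1.0)

print(elapsed_time_score(30.0))   # 0.5
print(elapsed_time_score(120.0))  # 1.0
```

 Any monotone increasing function of the elapsed time would serve the same purpose of suppressing utterances that would follow too closely on the previous one.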
 The "situation score" is a numerical value, determined based on the progress of the target content, that indicates the need for information provision according to the state of the target content. For example, in a situation where the target content is tense, the user U's demand for voice information about the target content is likely to be high, so the score can be set high. Conversely, in a situation where the target content is not tense, the user U may also be interested in information related to the target content, so a higher score may be set for related information. In this way, the situation score is a score set according to the progress of the target content.
 As described above, the situation score is set based on the progress of the target content. Therefore, the situation score calculation unit 26 may be configured to calculate the situation score using the content progress information that the utterance information generation modules 40A and 40B acquire from the content progress information providing device 50. In such a configuration, the situation score calculation unit 26 can use a method of predetermining specific scores for words that appear in the progress information provided by the content progress information providing device 50 and that are highly relevant to the progress of the target content. For example, when the target content is soccer, a score of 1.0 may be assigned to "goal", which is considered a major event, and a score of 0.1 to "pass", which occurs frequently in soccer. In the case of soccer, the position of the ball in play may also be converted into a score and used. When the target content is stock price movements, words such as "crash" or "limit high" may be the words to which scores are assigned.
 A situation score can also be assigned when the target content is a general situation such as everyday life. Specifically, a high score can be set in advance for words indicating events for which the user U is assumed to want voice information from the information providing apparatus 1 (for example, information indicating a change in the weather), and a low score can be set in advance for words indicating events for which the user U is assumed not to want voice information (for example, information indicating the user U's plans to go out). With such a configuration, a situation score can be assigned even when the target content is not limited to a specific category such as sports.
 The situation score calculation unit 26 may, for example, at the timing when an utterance sentence is generated, extract the specific words with predetermined scores from the most recent piece (or a predetermined number of the most recent pieces) of content progress information provided by the content progress information providing device 50, and calculate the sum of their scores as the situation score. Thus, the method of assigning the situation score is not particularly limited, and various methods can be used. As with the content score, a technique such as machine learning may also be used to assign the situation score.
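 The keyword-based method just described can be sketched as follows: scores are predetermined for event words in the content progress information, and the situation score is the sum over the most recent events. The event names, score values, and window size below are hypothetical.

```python
# Hypothetical event scores for soccer progress information.
EVENT_SCORES = {"goal": 1.0, "shot": 0.6, "pass": 0.1}

def situation_score(progress_events: list, window: int = 3) -> float:
    """Sum the predetermined scores of the most recent `window` events."""
    recent = progress_events[-window:]
    return sum(EVENT_SCORES.get(e, 0.0) for e in recent)

events = ["pass", "pass", "shot", "goal"]
print(round(situation_score(events), 2))  # 1.7 (pass + shot + goal)
```

 With this shape, a burst of high-scoring events such as a goal pushes the situation score up, matching the idea that a tense phase of the content raises the need for information provision.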
 The situation score may also be calculated based on information different from the information acquired by the utterance information generation modules 40A and 40B.
 The score calculated by each unit may be expressed as a numerical value of 0 or more, or as a probability (0 to 1).
 As described above, the content score calculation unit 24, the elapsed time score calculation unit 25, and the situation score calculation unit 26 calculate scores for a single utterance sentence from mutually different viewpoints, and the score assigning unit 23 combines the scores calculated by these units to obtain the score for that utterance sentence. The method of combining the scores is not particularly limited; for example, simple addition may be used. When the content score, the elapsed time score, and the situation score are each calculated as probabilities (0 to 1), the sum of the logarithms of the scores may be used as the score for the utterance sentence in order to avoid underflow.
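 When the three component scores are probabilities in (0, 1], multiplying them together can underflow, so the text suggests summing their logarithms instead. A sketch under that assumption (the particular probability values are illustrative):

```python
import math

def combined_score(content: float, elapsed: float, situation: float) -> float:
    """Combine three probability-valued scores by summing their logarithms,
    which equals the log of their product and avoids floating-point underflow."""
    return math.log(content) + math.log(elapsed) + math.log(situation)

# A higher (less negative) combined score means a higher utterance priority.
print(combined_score(0.8, 0.5, 0.9) > combined_score(0.8, 0.5, 0.1))  # True
```

 Because the logarithm is monotone, ranking utterance sentences by this log-sum gives the same ordering as ranking them by the product of the three probabilities.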
 A score is calculated and assigned by the score assigning unit 23 to each utterance sentence generated by the utterance information generation modules 40A and 40B and held in the utterance standby information holding unit 21. As a result, as shown in FIG. 3, the utterance standby information holding unit 21 holds each utterance sentence awaiting utterance by the information providing apparatus 1 (the queue) together with its assigned score. In the example shown in FIG. 3, a score of 0.8 is associated with the utterance sentence "Player A has taken a shot.", and a score of 0.7 is associated with the utterance sentence "Player A is from XX.". In this way, the utterance standby information holding unit 21 holds the utterance sentences generated by the utterance information generation modules 40A and 40B with scores assigned individually.
 The scores associated with the utterance sentences held in the utterance standby information holding unit 21 may be updated as appropriate. For example, depending on the progress of the target content, the score assigned to an utterance sentence may no longer be appropriate. Also, when the information providing apparatus 1 outputs voice, either by uttering an utterance sentence held in the utterance standby information holding unit 21 or in response to an inquiry from the user U, the elapsed time is reset. Therefore, the scores held in the utterance standby information holding unit 21 may be updated by the score assigning unit 23 as appropriate (for example, at a predetermined timing).
 The utterance determination unit 22 refers to the utterance standby information holding unit 21 periodically (for example, every few hundred milliseconds). It refers to the utterance sentence with the highest score among the utterance sentences held in the utterance standby information holding unit 21, and if the score assigned to that utterance sentence is equal to or greater than a predetermined threshold, determines that an utterance to the user U is to be made based on that utterance sentence and selects it. That is, the utterance determination unit 22 functions as an utterance information selection unit that selects the utterance sentence to be provided to the user U as voice information.
 When the score assigned to the highest-scoring utterance sentence held in the utterance standby information holding unit 21 is smaller than the threshold, the utterance determination unit 22 may determine that no utterance is necessary and refrain from uttering until the next opportunity.
 An utterance sentence that the utterance determination unit 22 has determined to utter is deleted from the utterance standby information holding unit 21. This prevents the same utterance sentence from being uttered again. An utterance sentence for which a predetermined time has elapsed since it was stored in the utterance standby information holding unit 21 may also be deleted, on the grounds that its likelihood of being uttered has become low. With such a configuration, it is possible to prevent the amount of data from increasing because utterance sentences that are unlikely to be uttered in the future remain held for a long time.
 The timing at which the utterance determination unit 22 refers to the utterance standby information holding unit 21 may be changed as appropriate depending on, for example, whether there has been an utterance (such as an inquiry) from the user U. For example, when there has been an utterance from the user U, the information providing apparatus 1 gives priority to responding to it, and spontaneous utterance by the information providing apparatus 1 is omitted. In such a state, the utterance determination unit 22 may therefore omit referring to the utterance standby information holding unit 21 altogether. In the information providing apparatus 1, the utterance management module 20 also manages responses to utterances from the user U. The utterance management module 20 therefore manages the information to be uttered based on the priority of utterances and the like, so that responding to utterances from the user U and spontaneously providing voice information can both be achieved.
 Next, an information providing method in the information providing apparatus 1 will be described with reference to FIG. 4. FIG. 4 illustrates the procedure for the spontaneous provision of voice information from the information providing apparatus 1 side; the procedure for responding to an utterance from the user U is not described in FIG. 4.
 As shown in FIG. 4, the utterance information generation modules 40A and 40B of the information providing apparatus 1 acquire information on the progress of the target content from the content progress information providing device 50 (S01). Based on the content progress information from the content progress information providing device 50, the utterance information generation modules 40A and 40B generate utterance sentences for the information providing apparatus 1 to output to the user U as voice (S02). The generated utterance sentences are sent from the utterance information generation modules 40A and 40B to the utterance management module 20 (S03).
 In the utterance management module 20, the score assigning unit 23 first calculates the score of the acquired utterance sentence (S04). The score calculation includes the content score calculation by the content score calculation unit 24, the elapsed time score calculation by the elapsed time score calculation unit 25, the situation score calculation by the situation score calculation unit 26, and the process of combining these in the score assigning unit 23. After the score is calculated, the utterance standby information holding unit 21 holds the utterance sentence and its corresponding score (S05). Alternatively, the utterance sentence may first be held in the utterance standby information holding unit 21 (S05) and the score calculated afterwards (S04), with the score then associated with the utterance sentence and held in the utterance standby information holding unit 21.
 In the utterance management module 20, the score assigning unit 23 may update the scores associated with the utterance sentences held in the utterance standby information holding unit 21 as necessary (S06).
 Thereafter, the utterance determination unit 22 of the utterance management module 20 refers to the utterance standby information holding unit 21 and determines whether to utter an utterance sentence held there (S07). When the utterance determination unit 22 determines that no utterance is to be made, the subsequent processing is not performed, and the utterance determination (S07) is repeated periodically. On the other hand, when it is determined that an utterance is to be made, the utterance determination unit 22 sends the utterance sentence to be output as voice information to the dialogue module 10 (S08). The dialogue module 10 then converts the utterance sentence into voice and outputs it to the user U, that is, performs the utterance (S09).
 As described above, the information providing apparatus 1 according to the present embodiment is an information providing apparatus that provides voice information to the user U by utterance, and includes: the score assigning unit 23, which calculates and assigns, to each utterance sentence that is a candidate for utterance to the user U, a score that is a numerical value related to the priority of uttering that utterance sentence; the utterance standby information holding unit 21, which holds each utterance sentence in association with the score assigned to it by the score assigning unit; the utterance determination unit 22, serving as an utterance information selection unit that selects the utterance sentence held in association with the highest of the scores held in the utterance standby information holding unit 21; and the dialogue module 10, serving as an output unit that outputs the utterance sentence selected by the utterance determination unit 22 as voice information.
 According to the information providing apparatus 1 described above, a score, which is a numerical value related to the priority of uttering an utterance sentence, is calculated and assigned to each utterance sentence and held by the utterance standby information holding unit 21 in association with that utterance sentence. The utterance sentence associated with the highest score is selected by the utterance determination unit 22 serving as the utterance information selection unit and is output as voice information by the dialogue module 10 serving as the output unit. With this configuration, the information providing apparatus 1 can select an utterance sentence and output it as voice information based on a score related to priority, so that a more appropriate utterance can be made to the user according to the priority.
 In the information providing apparatus 1, the utterance determination unit 22 serving as the utterance information selection unit selects the utterance sentence held in association with the highest of the scores held in the utterance standby information holding unit 21 only when that utterance sentence satisfies a predetermined condition. With this configuration, an utterance sentence is output as voice information only when it not only has the highest score but also satisfies the other condition. Therefore, a more appropriate utterance can be made to the user according to the priority.
 In the above embodiment, the predetermined condition is whether the score is equal to or greater than a predetermined threshold. With this configuration, only an utterance sentence whose score is equal to or greater than the threshold, that is, whose priority is considered sufficiently high, is output as voice information. Therefore, even an utterance sentence with the highest score among those held in the utterance standby information holding unit is prevented from being output as voice information when its priority cannot be said to be sufficiently high, so that a more appropriate utterance can be made to the user.
 The score assigning unit 23 is configured to calculate the score based on the content score corresponding to the content of the utterance sentence.
 By calculating the score based on the content score corresponding to the content of the utterance sentence, an appropriate score according to the content of the utterance sentence can be assigned. For example, a score indicating a higher priority can be assigned to an utterance sentence with important content, so that a more appropriate utterance can be made to the user.
 The score assigning unit 23 is also configured to calculate the score based on the elapsed time score related to the time elapsed since the apparatus's own previous utterance.
 By calculating the score based on the elapsed time score related to the time elapsed since the apparatus's own previous utterance, a score that takes the elapsed time into account can be assigned. This makes it possible, for example, to avoid uttering when the time elapsed since the previous utterance is too short, so that a more appropriate utterance can be made to the user.
 The score assigning unit 23 is also configured to calculate the score based on the situation score related to the state of the content for which voice information is provided.
 By calculating the score based on the situation score related to the state of the content, an appropriate score according to the state of the content can be assigned. For example, when the state of the content suggests that the provision of voice information should be reduced, the score can be lowered; a score can thus be assigned that takes into account the timing at which the provision of voice information is desired, so that a more appropriate utterance can be made to the user.
 The information providing apparatus 1 described in the above embodiment is not limited to the above configuration, and various modifications can be made.
 In the above embodiment, the case where the information providing apparatus 1 is configured as a single apparatus has been described, but the functions of the information providing apparatus 1 may be distributed over a plurality of apparatuses. For example, each module constituting the information providing apparatus 1 may be a separate apparatus, and each module may itself be composed of a plurality of apparatuses.
 In the above embodiment, the case where the information providing apparatus 1 has a function of responding to utterances (inquiries) from the user U has been described, but the information providing apparatus 1 only needs to have at least the function of outputting voice information by utterance.
 In the above embodiment, a configuration in which the score assigning unit 23 calculates the content score, the elapsed time score, and the situation score has been described, but the score assigned by the score assigning unit 23 need not include all of the content score, the elapsed time score, and the situation score. The score assigning unit 23 may also calculate the score based on information different from these three scores, or may calculate the score by combining the three scores with a score calculated based on other information. In this way, the method of calculating the score assigned by the score assigning unit 23 can be changed as appropriate.
 In the above embodiment, a configuration has been described in which the utterance determination unit 22 determines that an utterance sentence is to be uttered as voice information when its score is the highest and is equal to or greater than a predetermined threshold. However, the utterance determination unit 22 may instead be configured to select the highest-scoring utterance sentence held in the utterance standby information holding unit 21 when it satisfies some other condition, for example, that the length of the utterance sentence is equal to or less than a predetermined number of characters. Such other conditions (predetermined conditions) may be set in consideration of the functions related to the output of voice information from the information providing apparatus 1 and the like. The utterance determination unit 22 may also be configured simply to select the highest-scoring utterance sentence held in the utterance standby information holding unit 21. In this case, the frequency of voice information output may be adjusted by, for example, adjusting the timing at which the utterance determination unit 22 selects utterance sentences (the timing of the periodic utterance sentence selection).
(その他)
 上記実施の形態の説明に用いたブロック図は、機能単位のブロックを示している。これらの機能ブロック(構成部)は、ハードウェア及び/又はソフトウェアの任意の組み合わせによって実現される。また、各機能ブロックの実現手段は特に限定されない。すなわち、各機能ブロックは、物理的及び/又は論理的に結合した1つの装置により実現されてもよいし、物理的及び/又は論理的に分離した2つ以上の装置を直接的及び/又は間接的に(例えば、有線及び/又は無線)で接続し、これら複数の装置により実現されてもよい。
(Other)
The block diagram used in the description of the above embodiment shows blocks in functional units. These functional blocks (components) are realized by an arbitrary combination of hardware and/or software. The means for realizing each functional block is not particularly limited. That is, each functional block may be realized by one device that is physically and/or logically coupled, or by two or more devices that are physically and/or logically separated and connected directly and/or indirectly (for example, by wire and/or wirelessly).
 例えば、本開示の一実施の形態における情報提供装置1は、本実施形態の情報提供装置1の処理を行うコンピュータとして機能してもよい。図5は、本実施形態に係る情報提供装置1のハードウェア構成の一例を示す図である。上述の情報提供装置1は、物理的には、プロセッサ1001、メモリ1002、ストレージ1003、通信装置1004、入力装置1005、出力装置1006、バス1007などを含むコンピュータ装置として構成されてもよい。 For example, the information providing apparatus 1 according to an embodiment of the present disclosure may function as a computer that performs processing of the information providing apparatus 1 according to the present embodiment. FIG. 5 is a diagram illustrating an example of a hardware configuration of the information providing apparatus 1 according to the present embodiment. The information providing device 1 described above may be physically configured as a computer device including a processor 1001, a memory 1002, a storage 1003, a communication device 1004, an input device 1005, an output device 1006, a bus 1007, and the like.
 なお、以下の説明では、「装置」という文言は、回路、デバイス、ユニットなどに読み替えることができる。情報提供装置1のハードウェア構成は、図に示した各装置を1つ又は複数含むように構成されてもよいし、一部の装置を含まずに構成されてもよい。 In the following description, the term “apparatus” can be read as a circuit, a device, a unit, or the like. The hardware configuration of the information providing apparatus 1 may be configured to include one or more devices illustrated in the drawing, or may be configured without including some devices.
 情報提供装置1における各機能は、プロセッサ1001、メモリ1002などのハードウェア上に所定のソフトウェア(プログラム)を読み込ませることで、プロセッサ1001が演算を行い、通信装置1004による通信や、メモリ1002及びストレージ1003におけるデータの読み出し及び/又は書き込みを制御することで実現される。 The functions of the information providing apparatus 1 are realized by loading predetermined software (a program) onto hardware such as the processor 1001 and the memory 1002, whereby the processor 1001 performs operations and controls communication by the communication device 1004 and reading and/or writing of data in the memory 1002 and the storage 1003.
 プロセッサ1001は、例えば、オペレーティングシステムを動作させてコンピュータ全体を制御する。プロセッサ1001は、周辺装置とのインタフェース、制御装置、演算装置、レジスタなどを含む中央処理装置(CPU:Central Processing Unit)で構成されてもよい。例えば、情報提供装置1の各機能は、プロセッサ1001で実現されてもよい。 The processor 1001 controls the entire computer by operating an operating system, for example. The processor 1001 may be configured by a central processing unit (CPU: Central Processing Unit) including an interface with a peripheral device, a control device, an arithmetic device, a register, and the like. For example, each function of the information providing device 1 may be realized by the processor 1001.
 また、プロセッサ1001は、プログラム(プログラムコード)、ソフトウェアモジュールやデータを、ストレージ1003及び/又は通信装置1004からメモリ1002に読み出し、これらに従って各種の処理を実行する。プログラムとしては、上述の実施の形態で説明した動作の少なくとも一部をコンピュータに実行させるプログラムが用いられる。例えば、情報提供装置1の各機能は、メモリ1002に格納され、プロセッサ1001で動作する制御プログラムによって実現されてもよい。上述の各種処理は、1つのプロセッサ1001で実行される旨を説明してきたが、2以上のプロセッサ1001により同時又は逐次に実行されてもよい。プロセッサ1001は、1以上のチップで実装されてもよい。なお、プログラムは、電気通信回線を介してネットワークから送信されても良い。 The processor 1001 reads out a program (program code), a software module, and data from the storage 1003 and / or the communication device 1004 to the memory 1002, and executes various processes according to these. As the program, a program that causes a computer to execute at least a part of the operation described in the above embodiment is used. For example, each function of the information providing apparatus 1 may be realized by a control program stored in the memory 1002 and operated by the processor 1001. Although it has been described that the various processes described above are executed by one processor 1001, the processes may be executed simultaneously or sequentially by two or more processors 1001. Processor 1001 may be implemented with one or more chips. Note that the program may be transmitted from a network via a telecommunication line.
 メモリ1002は、コンピュータ読み取り可能な記録媒体であり、例えば、ROM(Read Only Memory)、EPROM(Erasable Programmable ROM)、EEPROM(Electrically Erasable Programmable ROM)、RAM(Random Access Memory)などの少なくとも1つで構成されてもよい。メモリ1002は、レジスタ、キャッシュ、メインメモリ(主記憶装置)などと呼ばれてもよい。メモリ1002は、本開示の一実施の形態に係る方法を実施するために実行可能なプログラム(プログラムコード)、ソフトウェアモジュールなどを保存することができる。 The memory 1002 is a computer-readable recording medium, and may be configured by at least one of, for example, a ROM (Read Only Memory), an EPROM (Erasable Programmable ROM), an EEPROM (Electrically Erasable Programmable ROM), and a RAM (Random Access Memory). The memory 1002 may be called a register, a cache, a main memory (main storage device), or the like. The memory 1002 can store an executable program (program code), software modules, and the like for carrying out the method according to an embodiment of the present disclosure.
 ストレージ1003は、コンピュータ読み取り可能な記録媒体であり、例えば、CD-ROM(Compact Disc ROM)などの光ディスク、ハードディスクドライブ、フレキシブルディスク、光磁気ディスク(例えば、コンパクトディスク、デジタル多用途ディスク、Blu-ray(登録商標)ディスク)、スマートカード、フラッシュメモリ(例えば、カード、スティック、キードライブ)、フロッピー(登録商標)ディスク、磁気ストリップなどの少なくとも1つで構成されてもよい。ストレージ1003は、補助記憶装置と呼ばれてもよい。上述の記憶媒体は、例えば、メモリ1002及び/又はストレージ1003を含むデータベース、サーバその他の適切な媒体であってもよい。 The storage 1003 is a computer-readable recording medium, and may be configured by at least one of, for example, an optical disk such as a CD-ROM (Compact Disc ROM), a hard disk drive, a flexible disk, a magneto-optical disk (for example, a compact disk, a digital versatile disk, a Blu-ray (registered trademark) disk), a smart card, a flash memory (for example, a card, a stick, a key drive), a floppy (registered trademark) disk, and a magnetic strip. The storage 1003 may be called an auxiliary storage device. The above-described storage medium may be, for example, a database including the memory 1002 and/or the storage 1003, a server, or another appropriate medium.
 通信装置1004は、有線及び/又は無線ネットワークを介してコンピュータ間の通信を行うためのハードウェア(送受信デバイス)であり、例えばネットワークデバイス、ネットワークコントローラ、ネットワークカード、通信モジュールなどともいう。例えば、情報提供装置1の各機能は、通信装置1004で実現されてもよい。 The communication device 1004 is hardware (transmitting / receiving device) for performing communication between computers via a wired and / or wireless network, and is also referred to as, for example, a network device, a network controller, a network card, a communication module, or the like. For example, each function of the information providing device 1 may be realized by the communication device 1004.
 入力装置1005は、外部からの入力を受け付ける入力デバイス(例えば、キーボード、マウス、マイクロフォン、スイッチ、ボタン、センサなど)である。出力装置1006は、外部への出力を実施する出力デバイス(例えば、ディスプレイ、スピーカー、LEDランプなど)である。なお、入力装置1005及び出力装置1006は、一体となった構成(例えば、タッチパネル)であってもよい。 The input device 1005 is an input device (for example, a keyboard, a mouse, a microphone, a switch, a button, a sensor, and the like) that receives an external input. The output device 1006 is an output device that performs output to the outside (for example, a display, a speaker, an LED lamp, and the like). Note that the input device 1005 and the output device 1006 may have an integrated configuration (for example, a touch panel).
 また、プロセッサ1001やメモリ1002などの各装置は、情報を通信するためのバス1007で接続される。バス1007は、単一のバスで構成されてもよいし、装置間で異なるバスで構成されてもよい。 The devices such as the processor 1001 and the memory 1002 are connected by a bus 1007 for communicating information. The bus 1007 may be configured by a single bus, or may be configured by a different bus between the devices.
 また、情報提供装置1は、マイクロプロセッサ、デジタル信号プロセッサ(DSP:Digital Signal Processor)、ASIC(Application Specific Integrated Circuit)、PLD(Programmable Logic Device)、FPGA(Field Programmable Gate Array)などのハードウェアを含んで構成されてもよく、当該ハードウェアにより、各機能ブロックの一部又は全てが実現されてもよい。例えば、プロセッサ1001は、これらのハードウェアの少なくとも1つで実装されてもよい。 The information providing device 1 may also be configured to include hardware such as a microprocessor, a digital signal processor (DSP), an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), and an FPGA (Field Programmable Gate Array), and some or all of the functional blocks may be realized by such hardware. For example, the processor 1001 may be implemented by at least one of these pieces of hardware.
 以上、本開示について詳細に説明したが、当業者にとっては、本開示が本開示中に説明した実施形態に限定されるものではないということは明らかである。本開示は、請求の範囲の記載により定まる本開示の趣旨及び範囲を逸脱することなく修正及び変更態様として実施することができる。したがって、本開示の記載は、例示説明を目的とするものであり、本実施形態に対して何ら制限的な意味を有するものではない。 Although the present disclosure has been described in detail above, it is obvious to those skilled in the art that the present disclosure is not limited to the embodiments described in the present disclosure. The present disclosure can be implemented as modified and changed aspects without departing from the spirit and scope of the present disclosure defined by the description of the claims. Therefore, the description of the present disclosure is intended for illustrative purposes, and has no restrictive meaning to the present embodiment.
 情報の通知は、本明細書で説明した態様/実施形態に限られず、他の方法で行われてもよい。例えば、情報の通知は、物理レイヤシグナリング(例えば、DCI(Downlink Control Information)、UCI(Uplink Control Information))、上位レイヤシグナリング(例えば、RRC(Radio Resource Control)シグナリング、MAC(Medium Access Control)シグナリング、報知情報(MIB(Master Information Block)、SIB(System Information Block)))、その他の信号又はこれらの組み合わせによって実施されてもよい。また、RRCシグナリングは、RRCメッセージと呼ばれてもよく、例えば、RRC接続セットアップ(RRC Connection Setup)メッセージ、RRC接続再構成(RRC Connection Reconfiguration)メッセージなどであってもよい。 Notification of information is not limited to the aspects/embodiments described in this specification, and may be performed by other methods. For example, notification of information may be implemented by physical layer signaling (for example, DCI (Downlink Control Information), UCI (Uplink Control Information)), higher layer signaling (for example, RRC (Radio Resource Control) signaling, MAC (Medium Access Control) signaling, broadcast information (MIB (Master Information Block), SIB (System Information Block))), other signals, or a combination thereof. RRC signaling may be referred to as an RRC message, and may be, for example, an RRC Connection Setup message, an RRC Connection Reconfiguration message, or the like.
 本開示において説明した各態様/実施形態は、LTE(Long Term Evolution)、LTE-A(LTE-Advanced)、SUPER 3G、IMT-Advanced、4G(4th generation mobile communication system)、5G(5th generation mobile communication system)、FRA(Future Radio Access)、NR(new Radio)、W-CDMA(登録商標)、GSM(登録商標)、CDMA2000、UMB(Ultra Mobile Broadband)、IEEE 802.11(Wi-Fi(登録商標))、IEEE 802.16(WiMAX(登録商標))、IEEE 802.20、UWB(Ultra-WideBand)、Bluetooth(登録商標)、その他の適切なシステムを利用するシステム及びこれらに基づいて拡張された次世代システムの少なくとも一つに適用されてもよい。また、複数のシステムが組み合わされて(例えば、LTE及びLTE-Aの少なくとも一方と5Gとの組み合わせ等)適用されてもよい。 Each aspect/embodiment described in the present disclosure may be applied to at least one of systems using LTE (Long Term Evolution), LTE-A (LTE-Advanced), SUPER 3G, IMT-Advanced, 4G (4th generation mobile communication system), 5G (5th generation mobile communication system), FRA (Future Radio Access), NR (New Radio), W-CDMA (registered trademark), GSM (registered trademark), CDMA2000, UMB (Ultra Mobile Broadband), IEEE 802.11 (Wi-Fi (registered trademark)), IEEE 802.16 (WiMAX (registered trademark)), IEEE 802.20, UWB (Ultra-WideBand), Bluetooth (registered trademark), or other appropriate systems, and next-generation systems extended based thereon. A plurality of systems may also be applied in combination (for example, a combination of at least one of LTE and LTE-A with 5G).
 本明細書で説明した各態様/実施形態の処理手順、シーケンス、フローチャートなどは、矛盾の無い限り、順序を入れ替えてもよい。例えば、本明細書で説明した方法については、例示的な順序で様々なステップの要素を提示しており、提示した特定の順序に限定されない。 The processing procedures, sequences, flowcharts, and the like of each aspect/embodiment described in this specification may be reordered as long as there is no inconsistency. For example, the methods described herein present elements of the various steps in an exemplary order, and are not limited to the specific order presented.
 情報等は、上位レイヤ(又は下位レイヤ)から下位レイヤ(又は上位レイヤ)へ出力され得る。複数のネットワークノードを介して入出力されてもよい。 Information and the like can be output from an upper layer (or lower layer) to a lower layer (or upper layer). Input and output may be performed via a plurality of network nodes.
 入出力された情報等は特定の場所(例えば、メモリ)に保存されてもよいし、管理テーブルで管理してもよい。入出力される情報等は、上書き、更新、または追記され得る。出力された情報等は削除されてもよい。入力された情報等は他の装置へ送信されてもよい。 Input and output information and the like may be stored in a specific place (for example, a memory) or may be managed in a management table. Input and output information and the like can be overwritten, updated, or appended. Output information and the like may be deleted. Input information and the like may be transmitted to another device.
 判定は、1ビットで表される値(0か1か)によって行われてもよいし、真偽値(Boolean:trueまたはfalse)によって行われてもよいし、数値の比較(例えば、所定の値との比較)によって行われてもよい。 The determination may be made based on a value represented by one bit (0 or 1), based on a Boolean value (true or false), or based on a comparison of numerical values (for example, a comparison with a predetermined value).
 本開示において説明した各態様/実施形態は単独で用いてもよいし、組み合わせて用いてもよいし、実行に伴って切り替えて用いてもよい。また、所定の情報の通知(例えば、「Xであること」の通知)は、明示的に行うものに限られず、暗黙的(例えば、当該所定の情報の通知を行わない)ことによって行われてもよい。 Each aspect/embodiment described in the present disclosure may be used alone, may be used in combination, or may be switched in accordance with execution. Further, notification of predetermined information (for example, notification of "being X") is not limited to being performed explicitly, and may be performed implicitly (for example, by not performing notification of the predetermined information).
 ソフトウェアは、ソフトウェア、ファームウェア、ミドルウェア、マイクロコード、ハードウェア記述言語と呼ばれるか、他の名称で呼ばれるかを問わず、命令、命令セット、コード、コードセグメント、プログラムコード、プログラム、サブプログラム、ソフトウェアモジュール、アプリケーション、ソフトウェアアプリケーション、ソフトウェアパッケージ、ルーチン、サブルーチン、オブジェクト、実行可能ファイル、実行スレッド、手順、機能などを意味するよう広く解釈されるべきである。 Software, whether referred to as software, firmware, middleware, microcode, a hardware description language, or by another name, should be interpreted broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executable files, execution threads, procedures, functions, and the like.
 また、ソフトウェア、命令などは、伝送媒体を介して送受信されてもよい。例えば、ソフトウェアが、同軸ケーブル、光ファイバケーブル、ツイストペア及びデジタル加入者回線(DSL)などの有線技術及び/又は赤外線、無線及びマイクロ波などの無線技術を使用してウェブサイト、サーバ、又は他のリモートソースから送信される場合、これらの有線技術及び/又は無線技術は、伝送媒体の定義内に含まれる。 Software, instructions, and the like may also be transmitted and received via a transmission medium. For example, when software is transmitted from a website, a server, or another remote source using wired technologies such as coaxial cable, optical fiber cable, twisted pair, and digital subscriber line (DSL), and/or wireless technologies such as infrared, radio, and microwave, these wired and/or wireless technologies are included within the definition of a transmission medium.
 本開示で説明した情報、信号などは、様々な異なる技術のいずれかを使用して表されてもよい。例えば、上記の説明全体に渡って言及され得るデータ、命令、コマンド、情報、信号、ビット、シンボル、チップなどは、電圧、電流、電磁波、磁界若しくは磁性粒子、光場若しくは光子、又はこれらの任意の組み合わせによって表されてもよい。 The information, signals, and the like described in the present disclosure may be represented using any of a variety of different technologies. For example, data, instructions, commands, information, signals, bits, symbols, chips, and the like that may be referred to throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or magnetic particles, optical fields or photons, or any combination thereof.
 なお、本開示で説明した用語及び/又は本開示の理解に必要な用語については、同一の又は類似する意味を有する用語と置き換えてもよい。 Note that terms described in the present disclosure and/or terms necessary for understanding the present disclosure may be replaced with terms having the same or similar meanings.
 本開示で使用する「システム」および「ネットワーク」という用語は、互換的に使用される。 As used in the present disclosure, the terms "system" and "network" are used interchangeably.
 また、本開示で説明した情報、パラメータなどは、絶対値で表されてもよいし、所定の値からの相対値で表されてもよいし、対応する別の情報で表されてもよい。 Further, the information, parameters, and the like described in the present disclosure may be represented by absolute values, by relative values from predetermined values, or by other corresponding information.
 上述したパラメータに使用する名称はいかなる点においても限定的なものではない。さらに、これらのパラメータを使用する数式等は、本明細書で明示的に開示したものと異なる場合もある。 The names used for the above-described parameters are not limiting in any respect. Further, the mathematical expressions and the like using these parameters may differ from those explicitly disclosed in this specification.
 本開示で使用する「判断(determining)」、「決定(determining)」という用語は、多種多様な動作を包含する場合がある。「判断」、「決定」は、例えば、判定(judging)、計算(calculating)、算出(computing)、処理(processing)、導出(deriving)、調査(investigating)、探索(looking up)(例えば、テーブル、データベースまたは別のデータ構造での探索)、確認(ascertaining)した事を「判断」「決定」したとみなす事などを含み得る。また、「判断」、「決定」は、受信(receiving)(例えば、情報を受信すること)、送信(transmitting)(例えば、情報を送信すること)、入力(input)、出力(output)、アクセス(accessing)(例えば、メモリ中のデータにアクセスすること)した事を「判断」「決定」したとみなす事などを含み得る。また、「判断」、「決定」は、解決(resolving)、選択(selecting)、選定(choosing)、確立(establishing)、比較(comparing)などした事を「判断」「決定」したとみなす事を含み得る。つまり、「判断」「決定」は、何らかの動作を「判断」「決定」したとみなす事を含み得る。 As used in the present disclosure, the terms "determining" and "deciding" may encompass a wide variety of operations. "Determining" and "deciding" may include, for example, regarding judging, calculating, computing, processing, deriving, investigating, looking up (for example, looking up in a table, a database, or another data structure), or ascertaining as "determining" or "deciding". "Determining" and "deciding" may also include regarding receiving (for example, receiving information), transmitting (for example, transmitting information), input, output, or accessing (for example, accessing data in a memory) as "determining" or "deciding". Further, "determining" and "deciding" may include regarding resolving, selecting, choosing, establishing, comparing, and the like as "determining" or "deciding". In other words, "determining" and "deciding" may include regarding some operation as "determining" or "deciding".
 「接続された(connected)」、「結合された(coupled)」という用語、又はこれらのあらゆる変形は、2又はそれ以上の要素間の直接的又は間接的なあらゆる接続又は結合を意味し、互いに「接続」又は「結合」された2つの要素間に1又はそれ以上の中間要素が存在することを含むことができる。要素間の結合又は接続は、物理的なものであっても、論理的なものであっても、或いはこれらの組み合わせであってもよい。本開示で使用する場合、2つの要素は、1又はそれ以上の電線、ケーブル及び/又はプリント電気接続を使用することにより、並びにいくつかの非限定的かつ非包括的な例として、無線周波数領域、マイクロ波領域及び光(可視及び不可視の両方)領域の波長を有する電磁エネルギーなどの電磁エネルギーを使用することにより、互いに「接続」又は「結合」されると考えることができる。 The terms "connected" and "coupled", and any variations thereof, mean any direct or indirect connection or coupling between two or more elements, and can include the presence of one or more intermediate elements between two elements that are "connected" or "coupled" to each other. The coupling or connection between elements may be physical, logical, or a combination thereof. As used in the present disclosure, two elements can be considered "connected" or "coupled" to each other by using one or more electrical wires, cables, and/or printed electrical connections, and, as some non-limiting and non-exhaustive examples, by using electromagnetic energy, such as electromagnetic energy having wavelengths in the radio frequency region, the microwave region, and the optical (both visible and invisible) region.
 本開示で使用する「に基づいて」という記載は、別段に明記されていない限り、「のみに基づいて」を意味しない。言い換えれば、「に基づいて」という記載は、「のみに基づいて」と「に少なくとも基づいて」の両方を意味する。 As used in the present disclosure, the phrase "based on" does not mean "based only on" unless otherwise specified. In other words, the phrase "based on" means both "based only on" and "based at least on".
 「含む(include)」、「含んでいる(including)」、およびそれらの変形が、本開示あるいは請求の範囲で使用されている限り、これら用語は、用語「備える(comprising)」と同様に、包括的であることが意図される。さらに、本開示あるいは請求の範囲において使用されている用語「または(or)」は、排他的論理和ではないことが意図される。 To the extent that the terms "include" and "including", and variations thereof, are used in the present disclosure or the claims, these terms, like the term "comprising", are intended to be inclusive. Further, the term "or" as used in the present disclosure or the claims is not intended to be an exclusive or.
 本開示において、文脈または技術的に明らかに1つのみしか存在しない装置である場合以外は、複数の装置をも含むものとする。本開示の全体において、文脈から明らかに単数を示したものではなければ、複数のものを含むものとする。 In the present disclosure, a plurality of devices shall also be included, except where it is clear from the context or the technology that only one device can exist. Throughout the present disclosure, the plural shall be included unless the singular is clearly indicated by the context.
 本開示において、例えば、英語でのa, an及びtheのように、翻訳により冠詞が追加された場合、本開示は、これらの冠詞の後に続く名詞が複数形であることを含んでもよい。 In the present disclosure, where articles are added by translation, for example, a, an, and the in English, the present disclosure may include that the nouns following these articles are plural.
 本開示において、「AとBが異なる」という用語は、「AとBが互いに異なる」ことを意味してもよい。なお、当該用語は、「AとBがそれぞれCと異なる」ことを意味してもよい。「離れる」、「結合される」などの用語も、「異なる」と同様に解釈されてもよい。 In the present disclosure, the phrase "A and B are different" may mean that "A and B are different from each other". The phrase may also mean that "A and B are each different from C". Terms such as "separated" and "coupled" may be interpreted in the same manner as "different".
 1…情報提供装置、10…対話モジュール、20…発話管理モジュール、21…発話待機情報保持部、22…発話判定部、23…スコア付与部、24…内容スコア算出部、25…経過時間スコア算出部、26…状況スコア算出部、30…応答情報生成モジュール、40A,40B…発話情報生成モジュール。 DESCRIPTION OF SYMBOLS: 1... information providing device; 10... dialogue module; 20... utterance management module; 21... utterance standby information holding unit; 22... utterance determination unit; 23... score assigning unit; 24... content score calculation unit; 25... elapsed time score calculation unit; 26... situation score calculation unit; 30... response information generation module; 40A, 40B... utterance information generation module.

Claims (6)

  1.  ユーザに対して発話により音声情報を提供する情報提供装置であって、
     前記ユーザに対して発話する候補となる発話文に対して、前記発話文の発話の優先度に関係する数値であるスコアを算出して付与するスコア付与部と、
     前記発話文と、前記発話文に対して前記スコア付与部により付与されたスコアとを対応付けて保持する発話待機情報保持部と、
     前記発話待機情報保持部に保持されている前記スコアのうち最も高いスコアに対応付けられて保持されている発話文を選択する発話情報選択部と、
     前記発話情報選択部により選択された発話文を音声情報として出力する出力部と、
     を有する、情報提供装置。
    An information providing apparatus that provides voice information to a user by utterance, the apparatus comprising:
    a score assigning unit that calculates and assigns, to an utterance sentence that is a candidate to be uttered to the user, a score that is a numerical value related to the priority of uttering the utterance sentence;
    an utterance standby information holding unit that holds the utterance sentence in association with the score assigned to the utterance sentence by the score assigning unit;
    an utterance information selection unit that selects the utterance sentence held in association with the highest score among the scores held in the utterance standby information holding unit; and
    an output unit that outputs the utterance sentence selected by the utterance information selection unit as voice information.
  2.  前記発話情報選択部は、前記発話待機情報保持部に保持されている前記スコアのうち最も高いスコアに対応付けられて保持されている発話文が、所定の条件を満たす場合に、当該発話文を選択する、請求項1に記載の情報提供装置。 The information providing apparatus according to claim 1, wherein the utterance information selection unit selects the utterance sentence held in association with the highest score among the scores held in the utterance standby information holding unit, when the utterance sentence satisfies a predetermined condition.
  3.  前記所定の条件は、前記スコアが所定の閾値以上であるか否かである、請求項2に記載の情報提供装置。 The information providing device according to claim 2, wherein the predetermined condition is whether the score is equal to or greater than a predetermined threshold.
  4.  前記スコア付与部は、前記発話文の内容に対応する内容スコアに基づいて、前記スコアを算出する、請求項1~3のいずれか一項に記載の情報提供装置。 The information providing apparatus according to any one of claims 1 to 3, wherein the score assigning unit calculates the score based on a content score corresponding to the content of the utterance sentence.
  5.  前記スコア付与部は、自装置による前回の発話からの経過時間に係る経過時間スコアに基づいて、前記スコアを算出する、請求項1~4のいずれか一項に記載の情報提供装置。 The information providing apparatus according to any one of claims 1 to 4, wherein the score assigning unit calculates the score based on an elapsed time score relating to an elapsed time from a previous utterance by the own apparatus.
  6.  前記スコア付与部は、前記音声情報を提供する対象となるコンテンツの状況に係る状況スコアに基づいて、前記スコアを算出する、請求項1~5のいずれか一項に記載の情報提供装置。 The information providing apparatus according to any one of claims 1 to 5, wherein the score providing unit calculates the score based on a situation score relating to a situation of the content to which the audio information is provided.
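Purely as a non-authoritative illustration of how the units recited in claim 1 relate to one another, the pipeline of assigning, holding, selecting, and outputting can be sketched as follows. All class and method names, the placeholder scoring rule (shorter sentences score higher), and the output format are assumptions made for this sketch, not part of the claims:

```python
# Hypothetical sketch of the claimed structure: a score assigning unit,
# an utterance standby information holding unit, an utterance information
# selection unit, and an output unit. Names and the scoring rule are
# illustrative assumptions only.
class ScoreAssigner:
    """スコア付与部: assigns a priority score to a candidate sentence."""
    def assign(self, sentence: str) -> float:
        # Placeholder rule for this sketch: favor shorter sentences.
        return 1.0 / (1 + len(sentence))


class StandbyHolder:
    """発話待機情報保持部: holds sentences in association with scores."""
    def __init__(self):
        self._held = {}  # sentence -> score

    def hold(self, sentence: str, score: float) -> None:
        self._held[sentence] = score

    def highest(self):
        # Sentence associated with the highest held score, if any.
        return max(self._held, key=self._held.get) if self._held else None


class UtteranceSelector:
    """発話情報選択部: selects the sentence with the highest held score."""
    def __init__(self, holder: StandbyHolder):
        self.holder = holder

    def select(self):
        return self.holder.highest()


class OutputUnit:
    """出力部: stands in here for text-to-speech output of voice information."""
    def emit(self, sentence: str) -> str:
        return f"[speak] {sentence}"


assigner, holder = ScoreAssigner(), StandbyHolder()
for s in ["Hello.", "The weather is fine today."]:
    holder.hold(s, assigner.assign(s))
spoken = OutputUnit().emit(UtteranceSelector(holder).select())
```

Under the placeholder rule, the shorter sentence is held with the higher score, so it is the one selected and passed to the output unit.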
PCT/JP2019/038045 2018-10-05 2019-09-26 Information presentation device WO2020071255A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2020550375A JP7146933B2 (en) 2018-10-05 2019-09-26 Information provision device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2018190167 2018-10-05
JP2018-190167 2018-10-05

Publications (1)

Publication Number Publication Date
WO2020071255A1 (en)

Family

ID=70054786

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2019/038045 WO2020071255A1 (en) 2018-10-05 2019-09-26 Information presentation device

Country Status (2)

Country Link
JP (1) JP7146933B2 (en)
WO (1) WO2020071255A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014073613A1 (en) * 2012-11-08 2014-05-15 日本電気株式会社 Conversation-sentence generation device, conversation-sentence generation method, and conversation-sentence generation program
JP2015127758A (en) * 2013-12-27 2015-07-09 シャープ株式会社 Response control device and control program
JP2015138147A (en) * 2014-01-22 2015-07-30 シャープ株式会社 Server, interactive device, interactive system, interactive method and interactive program
WO2015174172A1 (en) * 2014-05-13 2015-11-19 シャープ株式会社 Control device and message output control system
JP2017161875A (en) * 2016-03-11 2017-09-14 富士通株式会社 Information processing apparatus, information processing method, and information processing program
JP6400871B1 (en) * 2018-03-20 2018-10-03 ヤフー株式会社 Utterance control device, utterance control method, and utterance control program


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KUNUGI, MASASHI: "Construction of non-task-oriented dialogue system considering context", Proceedings of the 30th Annual Conference of the Japanese Society for Artificial Intelligence, June 2016, pages 1-4 *

Also Published As

Publication number Publication date
JPWO2020071255A1 (en) 2021-09-02
JP7146933B2 (en) 2022-10-04


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19868803; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2020550375; Country of ref document: JP; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 19868803; Country of ref document: EP; Kind code of ref document: A1)