WO2024057381A1 - Information processing device, information processing method, program, and recording medium - Google Patents


Info

Publication number
WO2024057381A1
Authority
WO
WIPO (PCT)
Prior art keywords
wake word
user
detection unit
situation
movement
Application number
PCT/JP2022/034147
Other languages
French (fr)
Japanese (ja)
Inventor
哲也 三ツ井
宣昭 田上
洋人 河内
Original Assignee
Pioneer Corporation (パイオニア株式会社)
Application filed by Pioneer Corporation
Priority to PCT/JP2022/034147
Publication of WO2024057381A1


Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06F ELECTRIC DIGITAL DATA PROCESSING
                • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
                    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
                    • G06F3/16 Sound input; Sound output
        • G10 MUSICAL INSTRUMENTS; ACOUSTICS
            • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
                • G10L15/00 Speech recognition
                    • G10L15/08 Speech classification or search
                        • G10L15/10 Speech classification or search using distance or distortion measures between unknown speech and reference templates
                    • G10L15/28 Constructional details of speech recognition systems

Definitions

  • The present invention relates to an information processing device, an information processing method, a program, and a recording medium.
  • The present invention has been made in view of the problem given above as an example, and its object is to provide an information processing device, an information processing method, a program, and a recording medium that make it easier to detect a wake word even when the user is moving.
  • The invention according to claim 1 is an information processing device comprising: a movement situation detection unit that determines, based on movement information of a user, whether the user is in a situation where the user is likely to utter a wake word; and a wake word detection unit that, when the movement situation detection unit determines that the user is in such a situation, makes the wake word easier to detect and detects the wake word from the voice uttered by the user.
  • The invention according to claim 5 is an information processing method for an information processing apparatus that includes a movement situation detection unit and a wake word detection unit, the method comprising: a first step in which the movement situation detection unit determines, based on movement information of a user, whether the user is in a situation where the user is likely to utter a wake word; and a second step in which, when the movement situation detection unit has determined that the user is in such a situation, the wake word detection unit makes the wake word easier to detect and detects the wake word from the voice uttered by the user.
  • The invention according to claim 6 is a program for causing a computer to execute an information processing method of an information processing apparatus that includes a movement situation detection unit and a wake word detection unit, the method comprising: a first step in which the movement situation detection unit determines, based on movement information of a user, whether the user is in a situation where the user is likely to utter a wake word; and a second step in which, when the movement situation detection unit has determined that the user is in such a situation, the wake word detection unit makes the wake word easier to detect and detects the wake word from the voice uttered by the user.
  • The invention according to claim 7 is a computer-readable non-transitory recording medium on which is recorded a program for causing a computer to execute an information processing method of an information processing apparatus that includes a movement situation detection unit and a wake word detection unit, the method comprising: a first step in which the movement situation detection unit determines, based on movement information of a user, whether the user is in a situation where the user is likely to utter a wake word; and a second step in which, when the movement situation detection unit has determined that the user is in such a situation, the wake word detection unit makes the wake word easier to detect and detects the wake word from the voice uttered by the user.
  • FIG. 1 is a diagram showing the configuration of an information processing device according to a first embodiment.
  • FIG. 2 is a diagram showing a processing flow of the information processing apparatus according to the first embodiment.
  • FIG. 3 is a diagram illustrating a speed determination processing flow of the information processing apparatus according to the first embodiment.
  • FIG. 4 is a diagram showing the configuration of an information processing device according to a second embodiment.
  • FIG. 5 is a diagram showing a processing flow of the information processing apparatus according to the second embodiment.
  • FIG. 6 is a diagram illustrating a speed determination processing flow of the information processing apparatus according to the second embodiment.
  • The information processing device includes a movement situation detection unit and a wake word detection unit.
  • The movement situation detection unit determines, based on the user's movement information, whether the user is in a situation where the user is likely to utter a wake word. For example, if there is little change in the user's movement speed, the movement situation detection unit determines that the user is moving stably and is therefore likely to utter the wake word. When the movement situation detection unit determines that the user is in such a situation, the wake word detection unit makes the wake word easier to detect and detects the wake word from the voice uttered by the user.
  • The wake word detection unit makes the wake word easier to detect by changing the threshold used for detecting it, and detects the wake word from the voice uttered by the user.
  • Specifically, the wake word detection unit lowers the threshold for detecting the wake word, which makes the wake word easier to detect. The wake word detection unit can therefore detect the wake word easily even when the user is moving.
  • Example 1: An information processing device 1 according to this embodiment is explained with reference to FIGS. 1 to 3.
  • The information processing device 1 includes a movement information acquisition unit 110, a movement situation detection unit 120, a voice acquisition unit 130, and a wake word detection unit 140.
  • The movement information acquisition unit 110 acquires movement information of the user.
  • The movement information acquisition unit 110 is configured with, for example, a speed sensor, and acquires the user's movement speed.
  • Alternatively, the movement information acquisition unit 110 may acquire the current position from a navigation device, a GPS (Global Positioning System) receiver, or the like, and calculate the movement speed from it.
  • The movement information acquisition unit 110 acquires speed information at regular time intervals and transmits it to the movement situation detection unit 120, described below.
  • The movement situation detection unit 120 determines, based on the user's movement information, whether the user is in a situation where the user is likely to utter a wake word. Specifically, the movement situation detection unit 120 refers to the movement information received from the movement information acquisition unit 110 and determines that the user is likely to utter a wake word when there is little change in the user's movement speed.
  • The movement situation detection unit 120 continuously acquires speed information from the movement information acquisition unit 110. When, for example, it confirms that the change in movement speed is within 5%, it determines that the speed change is small and that the user is therefore likely to utter a wake word. The movement situation detection unit 120 transmits the determination result to the wake word detection unit 140, described below.
  • The voice acquisition unit 130 acquires the voice uttered by the user using, for example, a microphone connected to it, and stores the voice in a storage unit (not shown).
  • The voice acquisition unit 130 notifies the wake word detection unit 140, described below, that the voice uttered by the user has been stored in the storage unit.
  • When the movement situation detection unit 120 determines that the user is likely to utter a wake word, the wake word detection unit 140 makes the wake word easier to detect and detects the wake word from the voice uttered by the user.
  • A wake word recognition rate, for example, is used to detect the wake word.
  • The wake word recognition rate is, for example, a value indicating how closely a word obtained from the voice uttered by the user matches a reference wake word stored in advance in a storage unit (not shown).
  • The wake word detection unit 140 calculates the recognition rate from the voice uttered by the user and compares it with a predetermined threshold set for detecting the wake word. If the calculated recognition rate is greater than or equal to the threshold, the wake word detection unit 140 detects the voice uttered by the user as the wake word.
  • The wake word detection unit 140 changes the threshold depending on whether the user is in a situation where the user is likely to utter a wake word, thereby making the wake word easier to detect. Specifically, when the wake word detection unit 140 receives information indicating that the user is unlikely to utter a wake word, it sets the threshold to, for example, 80%. When it receives information indicating that the user is likely to utter a wake word, it sets the threshold to, for example, 50%. In the latter case, the wake word detection unit 140 can detect the wake word more easily than when it has received information indicating that the user is unlikely to utter it.
  • The wake word detection unit 140 starts determination process A (step S110).
  • In determination process A, the movement information acquisition unit 110 acquires speed information from a speed sensor or the like and transmits it to the movement situation detection unit 120 (step S510).
  • The movement situation detection unit 120 determines whether the speed change is small (step S520).
  • If the movement situation detection unit 120 determines that the speed change is large ("NO" in step S520), it determines that the user is unlikely to utter the wake word, transmits the determination result to the wake word detection unit 140 (step S530), and the process moves to step S540.
  • The wake word detection unit 140 sets the threshold TH to, for example, 80% and ends determination process A (step S540).
  • The threshold TH is compared with the recognition rate in step S140, described below.
  • If the movement situation detection unit 120 determines that the speed change is small ("YES" in step S520), it determines that the user is likely to utter the wake word and transmits the determination result to the wake word detection unit 140.
  • The wake word detection unit 140 then sets the threshold TH to, for example, 50% and ends determination process A (step S560). That is, determination process A (step S110) determines whether the user is in a situation where the user is likely to utter the wake word and sets the threshold TH according to the result.
  • Next, the wake word detection unit 140 determines whether the user has spoken (step S120). When it receives notification from the voice acquisition unit 130 that the voice uttered by the user has been stored in the storage unit, it determines that the user has spoken.
  • If the wake word detection unit 140 determines that the user has not spoken ("NO" in step S120), the process returns to step S110.
  • If the wake word detection unit 140 determines that the user has spoken ("YES" in step S120), it compares the voice uttered by the user with the reference wake word stored in the storage unit and calculates the wake word recognition rate (step S130).
  • The wake word detection unit 140 then determines whether the calculated recognition rate is greater than or equal to the threshold TH set in step S110 (step S140).
  • If the calculated recognition rate is less than the threshold TH ("NO" in step S140), the wake word detection unit 140 returns the process to step S110.
  • If the calculated recognition rate is greater than or equal to the threshold TH ("YES" in step S140), the wake word detection unit 140 detects the voice uttered by the user as the wake word and ends the process (step S150).
  • The information processing device 1 thus includes the movement situation detection unit 120 and the wake word detection unit 140.
  • The movement situation detection unit 120 determines, based on the user's movement information, whether the user is in a situation where the user is likely to utter a wake word; if the change in the user's movement speed is small, it determines that the user is likely to utter the wake word. When the movement situation detection unit 120 makes that determination, the wake word detection unit 140 makes the wake word easier to detect and detects it from the voice uttered by the user. In other words, when the movement situation detection unit 120 determines that the user is moving stably, the wake word detection unit 140 makes the wake word easier to detect before detecting it. The wake word can therefore be detected easily even when the user is moving.
  • Further, when the movement situation detection unit 120 determines that the user is likely to utter a wake word, the wake word detection unit 140 makes the wake word easier to detect by changing the threshold for detecting it. In other words, the wake word detection unit 140 lowers the threshold against which the calculated recognition rate is compared. The wake word can therefore be detected easily even when the user is moving.
  • Example 2: An information processing device 1A according to this embodiment is explained with reference to FIGS. 4 to 6.
  • The information processing device 1A includes a movement information acquisition unit 110, a movement situation detection unit 120, a voice acquisition unit 130, a wake word detection unit 140A, and a communication unit 210. Components with the same reference numerals as in the first embodiment have the same functions, so detailed explanations of them are omitted.
  • The wake word detection unit 140A transmits the audio data of the user's utterance to a server 900 (described below) via the communication unit 210 (described below) and detects the wake word based on the determination made by the server 900.
  • The wake word detection unit 140A calculates a recognition rate from the voice uttered by the user and compares it with a predetermined threshold set for detecting the wake word. If the calculated recognition rate is greater than or equal to the threshold, the wake word detection unit 140A detects the voice as the wake word. Specifically, the wake word detection unit 140A sets the threshold to, for example, 80%, and detects the voice uttered by the user as the wake word when the calculated recognition rate is at or above that threshold. If the calculated recognition rate is below the threshold, the wake word detection unit 140A transmits the voice to the server 900, described below. When the wake word detection unit 140A receives information from the server 900 that the voice has been detected as the wake word, it detects the voice uttered by the user as the wake word.
  • Communication between the wake word detection unit 140A and the server 900 takes place via the communication unit 210, which is configured with, for example, a communication module that can connect to the Internet.
  • The server 900 is a cloud server equipped with high-performance speech recognition processing whose wake word detection accuracy is higher than that of the wake word detection unit 140A. The server 900 can therefore calculate a higher recognition rate than the wake word detection unit 140A.
  • The server 900 calculates the recognition rate from the user's voice transmitted from the wake word detection unit 140A and detects the wake word using its own threshold (the server threshold). When the calculated recognition rate is greater than or equal to the server threshold, the server 900 transmits information indicating that the voice uttered by the user is the wake word to the wake word detection unit 140A.
  • The wake word detection unit 140A starts determination process B (step S210).
  • In determination process B, the movement information acquisition unit 110 acquires speed information from a speed sensor or the like and transmits it to the movement situation detection unit 120 (step S610).
  • The movement situation detection unit 120 determines whether the speed change is small, based on the speed information received from the movement information acquisition unit 110 (step S620).
  • If the movement situation detection unit 120 determines that the speed change is large ("NO" in step S620), it determines that the user is unlikely to utter the wake word, transmits the determination result to the wake word detection unit 140A, and ends determination process B (step S630).
  • If the movement situation detection unit 120 determines that the speed change is small ("YES" in step S620), it determines that the user is likely to utter the wake word, transmits the determination result to the wake word detection unit 140A, and ends determination process B (step S640). That is, determination process B (step S210) determines whether the user is in a situation where the user is likely to utter the wake word.
  • The wake word detection unit 140A then determines whether the user has spoken (step S220).
  • If the wake word detection unit 140A determines that the user has not spoken ("NO" in step S220), the process returns to step S210.
  • If the wake word detection unit 140A determines that the user has spoken ("YES" in step S220), it compares the voice uttered by the user with the reference wake word and calculates the wake word recognition rate (step S230).
  • The wake word detection unit 140A determines whether the calculated recognition rate is greater than or equal to a threshold of, for example, 80% (step S240).
  • If the calculated recognition rate is 80% or more ("YES" in step S240), the wake word detection unit 140A detects the voice uttered by the user as the wake word and ends the process (step S290).
  • If the calculated recognition rate is less than 80% ("NO" in step S240), the wake word detection unit 140A moves the process to step S250.
  • The wake word detection unit 140A determines, based on the result of determination process B, whether the user is in a situation where the user is likely to utter the wake word (step S250).
  • If the result of step S210 is that the user is unlikely to utter the wake word ("NO" in step S250), the process returns to step S210.
  • If the result of step S210 is that the user is likely to utter the wake word ("YES" in step S250), the wake word detection unit 140A moves the process to step S260.
  • The wake word detection unit 140A checks whether the calculated recognition rate is greater than or equal to a threshold of, for example, 50% (step S260). If it is less than 50% ("NO" in step S260), the wake word detection unit 140A returns the process to step S210. When the recognition rate is below 50%, the server 900 is also unlikely to recognize the wake word, so the voice uttered by the user is not transmitted to the server 900.
  • If the calculated recognition rate is 50% or more ("YES" in step S260), the wake word detection unit 140A transmits the voice uttered by the user to the server 900 via the communication unit 210, and the process moves to step S270.
  • The server 900 calculates the recognition rate of the voice transmitted from the wake word detection unit 140A (step S270).
  • The server 900 compares the calculated recognition rate with the server threshold. If the recognition rate is less than the server threshold, the server 900 notifies the wake word detection unit 140A that it is below the server threshold; if it is greater than or equal to the server threshold, the server 900 notifies the wake word detection unit 140A of that instead (step S280).
  • If the wake word detection unit 140A receives information from the server 900 that the recognition rate is less than the server threshold ("NO" in step S280), the process returns to step S210. If it receives information that the recognition rate is greater than or equal to the server threshold ("YES" in step S280), the wake word detection unit 140A detects the voice uttered by the user as the wake word and ends the process (step S290).
  • The information processing device 1A thus includes the movement situation detection unit 120, the wake word detection unit 140A, and the communication unit 210.
  • The wake word detection unit 140A transmits the voice uttered by the user to the server 900 via the communication unit 210.
  • The server 900 detects the wake word using its high-performance, high-accuracy speech recognition processing and transmits the detection result to the wake word detection unit 140A.
  • The wake word detection unit 140A detects the wake word based on the detection result sent from the server 900. In other words, the wake word detection unit 140A relies on the result of the more accurate speech recognition processing performed by the server 900. The wake word can therefore be detected easily even when the user is moving.
  • The speed may also be calculated from current position information acquired from a smartphone or the like and used to determine whether the user is in a situation where the user is likely to utter a wake word. This makes the wake word easy to detect even when the user is walking.
  • In the embodiments above, the movement situation detection unit 120 makes the determination based on speed information, but it may instead make it based on acceleration. That is, the movement situation detection unit 120 determines the change in the user's speed from acceleration information and, if the change in speed is small, determines that the user is likely to utter the wake word. The wake word can therefore be detected easily even when the user is moving.
  • Likewise, instead of speed information, the determination may be based on biological information such as heart rate or respiratory rate. Specifically, biological information such as heart rate, heart rate variability, and respiratory rate is acquired from a smartwatch worn by the user. If the heart rate is not elevated, the heart rate does not fluctuate, or the breathing is not rapid, the user may be judged not to be nervous and therefore likely to utter the wake word.
  • When the movement situation detection unit 120 determines from the biological information that the user is not nervous, it determines that the user is likely to utter the wake word. Therefore, as long as the user is not nervous, the wake word can be detected easily even while the user is moving.
  • Reference signs: 1 Information processing device; 1A Information processing device; 110 Movement information acquisition unit; 120 Movement situation detection unit; 140 Wake word detection unit; 140A Wake word detection unit; 210 Communication unit; 900 Server
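The two-stage check of Example 2 (steps S240 to S290) can be sketched in Python as follows, assuming the local recognition rate is already computed as a value between 0 and 1 and that the round trip through the communication unit 210 to the server 900 is represented by a caller-supplied function. The 80% and 50% values are the examples from the description; the function name `detect_wake_word_1a` and the `ask_server` callback are hypothetical, and the server threshold itself is not specified in the source.

```python
from typing import Callable

LOCAL_THRESHOLD = 0.80  # step S240 example value
SEND_THRESHOLD = 0.50   # step S260 example value


def detect_wake_word_1a(local_rate: float,
                        likely_to_utter: bool,
                        ask_server: Callable[[], bool]) -> bool:
    """Hypothetical sketch of Example 2's decision for one utterance.

    `ask_server` stands in for sending the audio to the server 900 via the
    communication unit 210 (S270-S280); it should return True when the
    server's recognition rate reaches the (unspecified) server threshold.
    """
    if local_rate >= LOCAL_THRESHOLD:   # S240 "YES": accept locally
        return True
    if not likely_to_utter:             # S250 "NO": speed unstable, do not ask
        return False
    if local_rate < SEND_THRESHOLD:     # S260 "NO": server unlikely to succeed
        return False
    return ask_server()                 # S260 "YES": defer to the server


if __name__ == "__main__":
    print(detect_wake_word_1a(0.9, False, lambda: False))  # True (local match)
    print(detect_wake_word_1a(0.6, False, lambda: True))   # False (unstable speed)
    print(detect_wake_word_1a(0.6, True, lambda: True))    # True (server confirms)
```

In this reading, the 50% gate keeps audio the server is unlikely to recognize from being sent at all, and the movement situation check keeps the server out of the loop while the user's speed is fluctuating.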

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

The purpose of the present invention is to provide an information processing device, an information processing method, a program, and a recording medium with which, on the basis of a user's movement information, determination is made as to whether the user is in a situation in which the user is likely to speak a wake word, so as to enable easier detection of the wake word even if the user is moving. This invention comprises: a movement situation detection unit 120 for determining, on the basis of a user's movement information, whether the user is in a situation in which the user is likely to speak the wake word; and a wake word detection unit 140 for making it easier to detect the wake word by modifying a threshold value for detecting the wake word if the movement situation detection unit 120 has determined that the user is in a situation in which the user is likely to speak the wake word, and detecting the wake word from audio uttered by the user.
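As a minimal sketch of the threshold switching summarized above, assuming the recognition rate has already been computed as a value between 0 and 1, the following Python shows how the example thresholds from Example 1 (80% normally, 50% when the user is judged likely to utter the wake word) change the outcome. The function and constant names are hypothetical, and the speech recognition that produces the recognition rate is out of scope.

```python
# Hypothetical names; 0.80 and 0.50 are the example thresholds from Example 1.
NORMAL_THRESHOLD = 0.80   # user judged unlikely to utter the wake word
RELAXED_THRESHOLD = 0.50  # user judged likely to utter the wake word


def select_threshold(likely_to_utter: bool) -> float:
    """Return the detection threshold TH for the current movement situation."""
    return RELAXED_THRESHOLD if likely_to_utter else NORMAL_THRESHOLD


def is_wake_word(recognition_rate: float, likely_to_utter: bool) -> bool:
    """Detect the wake word when the recognition rate reaches TH (rate >= TH)."""
    return recognition_rate >= select_threshold(likely_to_utter)


if __name__ == "__main__":
    # A 0.6 match is rejected while the speed is fluctuating...
    print(is_wake_word(0.6, likely_to_utter=False))  # False
    # ...but accepted while the user is moving steadily.
    print(is_wake_word(0.6, likely_to_utter=True))   # True
```

The same utterance is therefore rejected or accepted depending only on the movement situation, which is the effect the abstract describes.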

Description

Information processing device, information processing method, program and recording medium
The present invention relates to an information processing device, an information processing method, a program, and a recording medium.
In recent years, devices such as smartphones and smart speakers that activate a voice assistant when the user speaks a wake word (also called a wake-up word or hot word) have become widespread.
When this type of device operates in an environment containing noise, it may fail to detect the wake word correctly. A technique has therefore been disclosed that, for example, links multiple devices to estimate the noise level appropriately and detect the wake word (see, for example, Patent Document 1).
Patent Document 1: JP 2021-15202 A
However, because the technique described in Patent Document 1 estimates noise in a predetermined, fixed space, one example of a problem is that the wake word may not be detected when, for example, the user is moving while carrying the device or is traveling in a vehicle.
The present invention has been made in view of the problem given above as an example, and its object is to provide an information processing device, an information processing method, a program, and a recording medium that make it easier to detect a wake word even when the user is moving.
To solve the above problem, the invention according to claim 1 is an information processing device comprising: a movement situation detection unit that determines, based on movement information of a user, whether the user is in a situation where the user is likely to utter a wake word; and a wake word detection unit that, when the movement situation detection unit determines that the user is in such a situation, makes the wake word easier to detect and detects the wake word from the voice uttered by the user.
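One way the movement situation detection unit of claim 1 could judge the user's situation is sketched below in Python. Example 1 only states that the user is judged likely to utter the wake word when the change in movement speed is within, for example, 5%. Reading that as the spread of the most recent speed samples relative to the largest one is an assumption, as are the window size, the handling of a short history or a stationary user, and the class name `MovementSituationDetector`.

```python
from collections import deque
from typing import Deque


class MovementSituationDetector:
    """Hypothetical stand-in for the movement situation detection unit 120.

    Assumptions (not from the source): the last `window` speed samples are
    kept, and "speed change within 5%" means (max - min) / max <= tolerance
    over that window.
    """

    def __init__(self, window: int = 5, tolerance: float = 0.05) -> None:
        self.window = window
        self.tolerance = tolerance
        self.samples: Deque[float] = deque(maxlen=window)

    def add_speed(self, speed: float) -> None:
        """Record one speed sample, as delivered at regular intervals by unit 110."""
        self.samples.append(speed)

    def likely_to_utter_wake_word(self) -> bool:
        """Return True when the buffered speeds changed by at most `tolerance`."""
        if len(self.samples) < self.window:
            return False  # assumption: too little history to call the motion stable
        top = max(self.samples)
        if top == 0:
            return True  # assumption: a stationary user shows no speed change
        return (top - min(self.samples)) / top <= self.tolerance


if __name__ == "__main__":
    detector = MovementSituationDetector(window=3)
    for speed in (60.0, 61.0, 62.0):   # steady cruising
        detector.add_speed(speed)
    print(detector.likely_to_utter_wake_word())  # True
    detector.add_speed(40.0)           # sudden braking
    print(detector.likely_to_utter_wake_word())  # False
```

A bounded `deque` keeps only the latest samples, so a single abrupt speed change is enough to switch the judgment back to "unlikely" until the speed settles again.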
Further, the invention according to claim 5 is an information processing method for an information processing apparatus that includes a movement situation detection unit and a wake word detection unit, the method comprising: a first step in which the movement situation detection unit determines, based on movement information of a user, whether the user is in a situation where the user is likely to utter a wake word; and a second step in which, when the movement situation detection unit has determined that the user is in such a situation, the wake word detection unit makes the wake word easier to detect and detects the wake word from the voice uttered by the user.
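The two steps of claim 5 can be sketched as the polling loop of Example 1 (steps S110 to S150). This is an illustrative Python sketch, not the patented implementation: each cycle is assumed to supply recent speed samples and, when the user has spoken, a precomputed recognition rate. The 5%, 80%, and 50% figures are the example values from Example 1; the spread-based reading of "5% change", the input format, and all names are hypothetical.

```python
from typing import Iterable, Optional, Sequence, Tuple

# Hypothetical input format: one cycle = (recent speed samples, recognition
# rate of the utterance in that cycle, or None when the user did not speak).
Cycle = Tuple[Sequence[float], Optional[float]]


def speed_change_is_small(speeds: Sequence[float], tolerance: float = 0.05) -> bool:
    """First step: a small speed change means the user is likely to utter the wake word.

    Assumes `speeds` is non-empty; a stationary user (all zeros) counts as stable.
    """
    top = max(speeds)
    return top == 0 or (top - min(speeds)) / top <= tolerance


def run_detection(cycles: Iterable[Cycle]) -> Optional[int]:
    """Second step: return the index of the cycle where the wake word is detected."""
    for index, (speeds, rate) in enumerate(cycles):
        # Determination process A (S110): choose the threshold TH.
        threshold = 0.50 if speed_change_is_small(speeds) else 0.80
        if rate is None:          # S120 "NO": nobody spoke, loop back to S110
            continue
        if rate >= threshold:     # S140 "YES"
            return index          # S150: wake word detected
        # S140 "NO": loop back to S110
    return None                   # input exhausted without a detection


if __name__ == "__main__":
    cycles = [
        ([60.0, 75.0, 55.0], 0.6),   # speed fluctuating: TH = 0.80, rejected
        ([60.0, 60.5, 61.0], None),  # steady, but no utterance
        ([60.0, 60.5, 61.0], 0.6),   # steady: TH = 0.50, accepted
    ]
    print(run_detection(cycles))  # 2
```

Recomputing TH on every cycle mirrors the flow returning to step S110 after each rejected or missing utterance, so the threshold always reflects the latest movement situation.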
Further, the invention according to claim 6 is a program for causing a computer to execute an information processing method of an information processing apparatus that includes a movement situation detection unit and a wake word detection unit, the method comprising: a first step in which the movement situation detection unit determines, based on movement information of a user, whether the user is in a situation where the user is likely to utter a wake word; and a second step in which, when the movement situation detection unit has determined that the user is in such a situation, the wake word detection unit makes the wake word easier to detect and detects the wake word from the voice uttered by the user.
Further, the invention according to claim 7 is a computer-readable non-transitory recording medium on which is recorded a program for causing a computer to execute an information processing method of an information processing apparatus that includes a movement situation detection unit and a wake word detection unit, the method comprising: a first step in which the movement situation detection unit determines, based on movement information of a user, whether the user is in a situation where the user is likely to utter a wake word; and a second step in which, when the movement situation detection unit has determined that the user is in such a situation, the wake word detection unit makes the wake word easier to detect and detects the wake word from the voice uttered by the user.
FIG. 1 is a diagram showing the configuration of an information processing device according to Example 1.
FIG. 2 is a diagram showing the processing flow of the information processing device according to Example 1.
FIG. 3 is a diagram showing the speed determination processing flow of the information processing device according to Example 1.
FIG. 4 is a diagram showing the configuration of an information processing device according to Example 2.
FIG. 5 is a diagram showing the processing flow of the information processing device according to Example 2.
FIG. 6 is a diagram showing the speed determination processing flow of the information processing device according to Example 2.
The information processing device according to the embodiment includes a movement status detection unit and a wake word detection unit.
The movement status detection unit determines, based on the user's movement information, whether the user is in a situation where the user can easily utter a wake word.
For example, when the user's movement speed changes little, the movement status detection unit determines that the user is moving stably and is therefore in a situation where the user can easily utter the wake word.
When the movement status detection unit determines that the user is in such a situation, the wake word detection unit makes the wake word easier to detect and detects the wake word from the voice uttered by the user.
For example, the wake word detection unit makes the wake word easier to detect by changing the threshold used for wake word detection.
In other words, when the movement status detection unit determines that the user is moving stably, the wake word detection unit lowers the detection threshold so that the wake word is easier to detect.
As a result, the wake word detection unit can detect the wake word more easily even while the user is moving.
<Example 1>
An information processing device 1 according to this example is described with reference to FIGS. 1 to 3.
<Configuration of information processing device 1>
As shown in FIG. 1, the information processing device 1 includes a movement information acquisition unit 110, a movement status detection unit 120, a voice acquisition unit 130, and a wake word detection unit 140.
The movement information acquisition unit 110 acquires the user's movement information.
Specifically, the movement information acquisition unit 110 includes, for example, a speed sensor and acquires the user's movement speed.
Alternatively, the movement information acquisition unit 110 may acquire the current position from a navigation device, a GPS (Global Positioning System) receiver, or the like, and calculate the movement speed from it.
The movement information acquisition unit 110 acquires speed information at regular time intervals and transmits it to the movement status detection unit 120, described below.
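The text does not say how speed is derived from positions. The sketch below assumes position fixes arrive as timestamped latitude/longitude pairs; the `Fix` type and the function names are hypothetical. It computes the speed between consecutive fixes using the haversine great-circle distance:

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


@dataclass
class Fix:
    """One position fix (hypothetical structure, not from the source)."""
    t: float    # timestamp in seconds
    lat: float  # latitude in degrees
    lon: float  # longitude in degrees


def haversine_m(a: Fix, b: Fix) -> float:
    """Great-circle distance between two fixes, in metres."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlam = math.radians(b.lon - a.lon)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def speeds_from_fixes(fixes: list[Fix]) -> list[float]:
    """Speed in m/s between each pair of consecutive fixes."""
    speeds = []
    for prev, cur in zip(fixes, fixes[1:]):
        dt = cur.t - prev.t
        if dt <= 0:
            continue  # skip duplicate or out-of-order fixes
        speeds.append(haversine_m(prev, cur) / dt)
    return speeds
```

For example, two fixes one second apart and 0.0001° of longitude apart on the equator give about 11.1 m/s (roughly 40 km/h). Position-derived speed tends to be noisy, so a real device would probably smooth it; smoothing is outside this sketch.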
The movement status detection unit 120 determines, based on the user's movement information, whether the user is in a situation where the user can easily utter a wake word.
Specifically, the movement status detection unit 120 refers to the movement information received from the movement information acquisition unit 110. When the user's movement speed changes little, it determines that the user is in a situation where the user can easily utter the wake word.
More specifically, the movement status detection unit 120 continuously acquires speed information from the movement information acquisition unit 110. When it confirms that the movement speed has changed by no more than, for example, 5%, it judges that the speed change is small and determines that the user is in a situation where the user can easily utter the wake word.
The movement status detection unit 120 transmits the determination result to the wake word detection unit 140, described below.
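The 5% criterion can be read in more than one way. The sketch below assumes it means that each speed sample differs from the previous one by at most 5% of the previous value. The minimum window length, the treatment of a standstill, and the name `is_speed_stable` are all assumptions, not details from the source.

```python
def is_speed_stable(speeds: list[float], tolerance: float = 0.05) -> bool:
    """Return True if every consecutive pair of speed samples differs by at
    most `tolerance` relative to the earlier sample (0.05 = 5%)."""
    if len(speeds) < 2:
        return False  # assumption: too little data to call the motion stable
    for prev, cur in zip(speeds, speeds[1:]):
        if prev == 0.0:
            if cur != 0.0:
                return False  # starting to move from a standstill
            continue          # remaining at a standstill counts as no change
        if abs(cur - prev) / prev > tolerance:
            return False
    return True
```

For example, the samples `[20.0, 20.5, 21.0]` change by about 2.5% per step and count as stable, while `[20.0, 22.0]` changes by 10% and does not.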
The voice acquisition unit 130 acquires the voice uttered by the user through, for example, a microphone connected to the voice acquisition unit 130, and stores it in a storage unit (not shown).
The voice acquisition unit 130 notifies the wake word detection unit 140, described below, that the user's voice has been stored in the storage unit.
When the movement status detection unit 120 determines that the user is in a situation where the user can easily utter a wake word, the wake word detection unit 140 makes the wake word easier to detect and detects the wake word from the voice uttered by the user.
The wake word is detected using, for example, a wake word recognition rate.
The recognition rate is, for example, a value indicating how closely a word obtained from the user's voice matches a comparison wake word stored in advance in a storage unit (not shown).
The wake word detection unit 140 calculates the recognition rate from the user's voice and compares it with a predetermined threshold set for wake word detection.
If the calculated recognition rate is equal to or greater than the threshold, the wake word detection unit 140 detects the user's voice as the wake word.
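The source does not say how the recognition rate is computed. As an illustration only, the sketch below uses the standard-library `difflib.SequenceMatcher.ratio()` on a text transcript as a stand-in similarity score in [0, 1]. The comparison wake word `hello car` and the function names are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical comparison wake word; the source only says one is stored in advance.
COMPARISON_WAKE_WORD = "hello car"


def recognition_rate(transcript: str, reference: str = COMPARISON_WAKE_WORD) -> float:
    """Similarity in [0, 1] between a recognized word and the stored wake word.

    String similarity on a transcript is only a stand-in here; a real
    detector would typically score acoustic features instead.
    """
    return SequenceMatcher(None, transcript.lower(), reference.lower()).ratio()


def is_wake_word(transcript: str, threshold: float) -> bool:
    """Detect the wake word when the recognition rate meets or exceeds the threshold."""
    return recognition_rate(transcript) >= threshold
```

With this stand-in, `"hello cat"` scores about 0.89 against `"hello car"` and passes an 80% threshold, while an unrelated phrase such as `"good morning"` scores well below 50%.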
The wake word detection unit 140 also changes the threshold according to whether the user is in a situation where the user can easily utter a wake word, which makes the wake word easier to detect.
Specifically, when the wake word detection unit 140 receives information indicating that the user cannot easily utter the wake word, it sets the threshold to, for example, 80%.
When it receives information indicating that the user can easily utter the wake word, it sets the threshold to, for example, 50%.
In the latter case, the wake word detection unit 140 detects the wake word more readily than when it has received information indicating that the user cannot easily utter the wake word.
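Switching the threshold amounts to choosing between two constants. This sketch uses the example values from the text (80% and 50%, written as 0.80 and 0.50). The function names are hypothetical.

```python
STRICT_THRESHOLD = 0.80   # example value: user cannot easily utter the wake word
RELAXED_THRESHOLD = 0.50  # example value: user can easily utter the wake word


def select_threshold(can_easily_utter: bool) -> float:
    """Pick the detection threshold from the movement status result."""
    return RELAXED_THRESHOLD if can_easily_utter else STRICT_THRESHOLD


def detect(rate: float, can_easily_utter: bool) -> bool:
    """Accept the utterance as the wake word if its rate meets the chosen threshold."""
    return rate >= select_threshold(can_easily_utter)
```

With these values, an utterance scoring 0.70 is accepted while the user is moving stably but rejected otherwise.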
<Processing of information processing device 1>
The processing of the information processing device 1 is described with reference to FIGS. 2 and 3.
As shown in FIG. 2, the wake word detection unit 140 starts determination process A (step S110).
As shown in FIG. 3, the movement information acquisition unit 110 acquires speed information from a speed sensor or the like and transmits it to the movement status detection unit 120 (step S510).
Based on the speed information received from the movement information acquisition unit 110, the movement status detection unit 120 determines whether the speed change is small (step S520).
If the movement status detection unit 120 determines that the speed change is large ("NO" in step S520), it determines that the user cannot easily utter the wake word and transmits this result to the wake word detection unit 140 (step S530). The process then proceeds to step S540.
The wake word detection unit 140 sets the threshold TH to, for example, 80% and ends determination process A (step S540).
The threshold TH is compared with the recognition rate in step S140, described below.
If, on the other hand, the movement status detection unit 120 determines that the speed change is small ("YES" in step S520), it determines that the user can easily utter the wake word and transmits this result to the wake word detection unit 140 (step S550).
The wake word detection unit 140 sets the threshold TH to, for example, 50% and ends determination process A (step S560).
In other words, determination process A (step S110) determines whether the user is in a situation where the user can easily utter the wake word and sets the value of the threshold TH based on that result.
As shown in FIG. 2, the wake word detection unit 140 determines whether the user has spoken (step S120).
When the wake word detection unit 140 receives information from the voice acquisition unit 130 that the user's voice has been stored in the storage unit, it determines that the user has spoken.
If the wake word detection unit 140 determines that the user has not spoken ("NO" in step S120), the process returns to step S110.
If it determines that the user has spoken ("YES" in step S120), the wake word detection unit 140 compares the user's voice with the comparison wake word stored in the storage unit and calculates the wake word recognition rate (step S130).
The wake word detection unit 140 determines whether the calculated recognition rate is equal to or greater than the threshold TH set in step S110 (step S140).
If the calculated recognition rate is less than the threshold TH ("NO" in step S140), the wake word detection unit 140 returns the process to step S110.
If the calculated recognition rate is equal to or greater than the threshold TH ("YES" in step S140), the wake word detection unit 140 detects the user's voice as the wake word and ends the process (step S150).
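Putting the steps of FIGS. 2 and 3 together, the loop can be simulated as follows. This is a sketch under stated assumptions. Each iteration is modeled as a hypothetical "frame" that carries recent speed samples and, if the user spoke, an already-computed recognition rate. The thresholds are the example values from the text, and all names are hypothetical.

```python
from typing import Iterable, Optional


def speed_is_stable(speeds: list[float], tolerance: float = 0.05) -> bool:
    """S520 (assumed reading): consecutive speeds differ by at most 5%."""
    if len(speeds) < 2:
        return False
    for prev, cur in zip(speeds, speeds[1:]):
        if prev == 0.0:
            if cur != 0.0:
                return False
        elif abs(cur - prev) / prev > tolerance:
            return False
    return True


def run_example1(
    frames: Iterable[tuple[list[float], Optional[float]]],
) -> Optional[int]:
    """Simulate the FIG. 2 / FIG. 3 loop.

    Each frame is (recent speed samples, recognition rate of a new utterance,
    or None if the user did not speak). Returns the index of the frame in
    which the wake word is detected, or None if it never is.
    """
    for i, (speeds, rate) in enumerate(frames):
        # Determination process A (S110, S510-S560): choose TH.
        th = 0.50 if speed_is_stable(speeds) else 0.80
        if rate is None:   # S120 "NO": back to S110
            continue
        if rate >= th:     # S140 "YES" -> S150: wake word detected
            return i
        # S140 "NO": back to S110 on the next frame
    return None
```

For instance, an utterance scoring 0.70 is rejected in a frame where speed jumps from 20 to 30 m/s. The same score is accepted in a later frame where speed holds near 20 m/s.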
The information processing device 1 according to this example includes the movement status detection unit 120 and the wake word detection unit 140.
The movement status detection unit 120 determines, based on the user's movement information, whether the user is in a situation where the user can easily utter a wake word. When the user's movement speed changes little, it determines that the user is in such a situation.
When the movement status detection unit 120 determines that the user is in a situation where the user can easily utter a wake word, the wake word detection unit 140 makes the wake word easier to detect and detects the wake word from the voice uttered by the user.
That is, when the movement status detection unit 120 determines that the user is moving stably, the wake word detection unit 140 makes the wake word easier to detect and then detects it.
Therefore, the wake word can be detected more easily even while the user is moving.
Further, in the information processing device 1 according to this example, when the movement status detection unit 120 determines that the user is in a situation where the user can easily utter a wake word, the wake word detection unit 140 makes the wake word easier to detect by changing the detection threshold.
That is, in that case the wake word detection unit 140 lowers the threshold against which the calculated recognition rate is compared, which makes the wake word easier to detect.
Therefore, the wake word can be detected more easily even while the user is moving.
<Example 2>
An information processing device 1A according to this example is described with reference to FIGS. 4 to 6.
<Configuration of information processing device 1A>
As shown in FIG. 4, the information processing device 1A includes the movement information acquisition unit 110, the movement status detection unit 120, the voice acquisition unit 130, a wake word detection unit 140A, and a communication unit 210.
Components with the same reference numerals as in Example 1 have the same functions, so their detailed description is omitted.
When the movement status detection unit 120 determines that the user is in a situation where the user can easily utter the wake word, the wake word detection unit 140A transmits the user's voice data to a server 900 via the communication unit 210 (both described below). It then detects the wake word based on the judgment made by the server 900.
The wake word detection unit 140A calculates a recognition rate from the user's voice and compares it with a predetermined threshold set for wake word detection.
If the calculated recognition rate is equal to or greater than the threshold, the wake word detection unit 140A detects the user's voice as the wake word.
Specifically, the wake word detection unit 140A sets the threshold to, for example, 80%, and detects the user's voice as the wake word when the calculated recognition rate is equal to or greater than the threshold.
If the calculated recognition rate is less than the threshold, the wake word detection unit 140A transmits the user's voice to the server 900.
When the wake word detection unit 140A receives information from the server 900 that the user's voice has been detected as the wake word, it detects the user's voice as the wake word.
Communication between the wake word detection unit 140A and the server 900 takes place via the communication unit 210, which is configured with, for example, a communication module capable of connecting to the Internet.
The server 900 is a cloud server equipped with high-performance speech recognition processing whose wake word detection accuracy is higher than that of the wake word detection unit 140A.
The server 900 can therefore calculate a higher recognition rate than the one calculated by the wake word detection unit 140A.
The server 900 calculates a recognition rate from the user's voice sent by the wake word detection unit 140A and detects the wake word using its own detection threshold (the server threshold).
When the calculated recognition rate is equal to or greater than the server threshold, the server 900 transmits information indicating that the user's voice is the wake word to the wake word detection unit 140A.
<Processing of information processing device 1A>
The processing of the information processing device 1A is described with reference to FIGS. 5 and 6.
As shown in FIG. 5, the wake word detection unit 140A starts determination process B (step S210).
As shown in FIG. 6, the movement information acquisition unit 110 acquires speed information from a speed sensor or the like and transmits it to the movement status detection unit 120 (step S610).
Based on the speed information received from the movement information acquisition unit 110, the movement status detection unit 120 determines whether the speed change is small (step S620).
If the movement status detection unit 120 determines that the speed change is large ("NO" in step S620), it determines that the user cannot easily utter the wake word, transmits this result to the wake word detection unit 140A, and ends determination process B (step S630).
If, on the other hand, it determines that the speed change is small ("YES" in step S620), it determines that the user can easily utter the wake word, transmits this result to the wake word detection unit 140A, and ends determination process B (step S640).
In other words, determination process B (step S210) determines whether the user is in a situation where the user can easily utter the wake word.
As shown in FIG. 5, the wake word detection unit 140A determines whether the user has spoken (step S220).
If the wake word detection unit 140A determines that the user has not spoken ("NO" in step S220), the process returns to step S210.
If it determines that the user has spoken ("YES" in step S220), the wake word detection unit 140A compares the user's voice with the comparison wake word and calculates the wake word recognition rate (step S230).
The wake word detection unit 140A determines whether the calculated recognition rate is equal to or greater than a threshold of, for example, 80% (step S240).
If the calculated recognition rate is equal to or greater than the 80% threshold ("YES" in step S240), the wake word detection unit 140A detects the user's voice as the wake word and ends the process (step S290).
If the calculated recognition rate is less than the 80% threshold ("NO" in step S240), the wake word detection unit 140A moves the process to step S250.
Based on the result of determination process B, the wake word detection unit 140A determines whether the user is in a situation where the user can easily utter the wake word (step S250).
If the result of step S210 is that the user cannot easily utter the wake word ("NO" in step S250), the wake word detection unit 140A returns the process to step S210.
If the result of step S210 is that the user can easily utter the wake word ("YES" in step S250), the wake word detection unit 140A moves the process to step S260.
The wake word detection unit 140A checks whether the calculated recognition rate is equal to or greater than a threshold of, for example, 50% (step S260).
If the calculated recognition rate is less than the 50% threshold ("NO" in step S260), the wake word detection unit 140A returns the process to step S210.
When the recognition rate is below 50%, even the server 900 would have difficulty recognizing the wake word, so the user's voice is not transmitted to the server 900.
If the calculated recognition rate is equal to or greater than the 50% threshold ("YES" in step S260), the wake word detection unit 140A transmits the user's voice to the server 900 via the communication unit 210, and the process moves to step S270.
The server 900 calculates the recognition rate of the user's voice received from the wake word detection unit 140A (step S270).
The server 900 compares the calculated recognition rate with the server threshold. If the recognition rate is less than the server threshold, the server 900 transmits information to that effect to the wake word detection unit 140A.
If the calculated recognition rate is equal to or greater than the server threshold, the server 900 transmits information to that effect to the wake word detection unit 140A (step S280).
If the wake word detection unit 140A receives information from the server 900 that the recognition rate is less than the server threshold ("NO" in step S280), the process returns to step S210.
If the wake word detection unit 140A receives information from the server 900 that the recognition rate is equal to or greater than the server threshold ("YES" in step S280), it detects the user's voice as the wake word and ends the process (step S290).
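The decision sequence of steps S240 to S290 can be sketched as a single function. The server round trip is abstracted as a hypothetical `ask_server` callback so the sketch runs without a network. The thresholds are the example values from the text, and all names are assumptions.

```python
from typing import Callable

LOCAL_THRESHOLD = 0.80  # example value for step S240
SEND_THRESHOLD = 0.50   # example value for step S260


def detect_hybrid(
    local_rate: float,
    can_easily_utter: bool,
    audio: bytes,
    ask_server: Callable[[bytes], bool],
) -> bool:
    """Return True if the utterance is accepted as the wake word.

    `ask_server` is a hypothetical stand-in for steps S270-S280: it sends
    the audio to the server and returns True if the server's recognition
    rate is at or above the server threshold.
    """
    if local_rate >= LOCAL_THRESHOLD:  # S240 "YES" -> S290
        return True
    if not can_easily_utter:           # S250 "NO" -> back to S210
        return False
    if local_rate < SEND_THRESHOLD:    # S260 "NO": server unlikely to help
        return False
    return ask_server(audio)           # S270-S280
```

In this ordering, the server is contacted only when two conditions hold: the local score falls between 50% and 80%, and the user is moving stably. Most utterances are therefore decided on the device without a network round trip.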
The information processing device 1A according to this example includes the movement status detection unit 120, the wake word detection unit 140A, and the communication unit 210.
When the movement status detection unit 120 determines that the user is in a situation where the user can easily utter a wake word, the wake word detection unit 140A transmits the user's voice to the server 900 via the communication unit 210.
The server 900 detects the wake word using its high-performance, high-accuracy speech recognition processing and transmits the detection result to the wake word detection unit 140A.
The wake word detection unit 140A then detects the wake word based on the detection result received from the server 900.
In other words, the wake word detection unit 140A detects the wake word based on the result of the high-accuracy speech recognition processing performed by the server 900.
Therefore, the wake word can be detected more easily even while the user is moving.
<Other Examples>
In the information processing devices 1 and 1A described above, speed information may be unavailable from a vehicle, for example because the user is moving on foot. In that case, the speed may be calculated from current position information acquired from a smartphone or the like, and whether the user is in a situation where the user can easily utter a wake word may be determined from that speed information.
This makes the wake word easier to detect even when the user is walking.
In the information processing devices 1 and 1A described above, the movement status detection unit 120 determines from speed information whether the user is in a situation where the user can easily utter a wake word. It may instead make this determination based on acceleration.
That is, the movement status detection unit 120 judges the change in the user's speed from acceleration information. When the speed change is small, it determines that the user is in a situation where the user can easily utter the wake word.
Therefore, the wake word can be detected more easily even while the user is moving.
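One simple way to judge the speed change from acceleration is to require that longitudinal acceleration stay small throughout a short window. The 0.3 m/s² limit and the function name below are hypothetical; the source gives no acceleration threshold.

```python
def is_stable_from_acceleration(
    accels_mps2: list[float], max_abs_accel: float = 0.3
) -> bool:
    """Treat motion as stable if the magnitude of every longitudinal
    acceleration sample in the window is at most `max_abs_accel` (m/s^2).

    The default limit is a placeholder, not a value from the source.
    """
    if not accels_mps2:
        return False  # assumption: no data means stability cannot be judged
    return all(abs(a) <= max_abs_accel for a in accels_mps2)
```

Checking the peak magnitude, rather than integrating acceleration into a net speed change, also catches cases where the user speeds up and then slows down again within the window.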
Further, in the information processing devices 1 and 1A described above, the movement situation detection unit 120 determines, based on speed information, whether the user is in a situation where it is easy to utter the wake word; however, the determination may instead be based on biological information such as heart rate and respiratory rate.
Specifically, biological information such as heart rate, heart rate variability, and respiratory rate is acquired from a device worn by the user, such as a smartwatch.
If "the heart rate is not fast", "there is no heart rate variability", or "the breathing is not fast", the user is not in a tense state, and it may therefore be determined that the user is in a situation where it is easy to utter the wake word.
That is, when the movement situation detection unit 120 determines, based on the biological information, that the user is not tense, it determines that the user is in a situation where it is easy to utter the wake word.
Therefore, when the user is not tense, the wake word can be detected more easily even while the user is moving.
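The biometric variant can be sketched as below. This is a minimal sketch under stated assumptions: all three thresholds are hypothetical, "heart rate variability" is approximated here by the spread (population standard deviation) of recent heart-rate samples, and the data class is an illustrative container, not an API of any smartwatch. Because the text joins its three conditions with "or", any single condition suffices by default; `require_all=True` is offered as a stricter alternative.

```python
from dataclasses import dataclass
from statistics import fmean, pstdev
from typing import Sequence

# Hypothetical thresholds, for illustration only (not taken from the document).
FAST_HEART_RATE_BPM = 100.0       # mean heart rate at or above this counts as "fast"
HEART_RATE_FLUCTUATION_BPM = 5.0  # spread at or above this counts as "fluctuating"
FAST_RESPIRATION_PER_MIN = 20.0   # breaths per minute at or above this counts as "fast"


@dataclass
class VitalSigns:
    heart_rate_bpm: Sequence[float]  # recent heart-rate samples, e.g. from a smartwatch
    respiration_per_min: float       # current respiratory rate


def user_is_not_tense(v: VitalSigns, require_all: bool = False) -> bool:
    """Judge whether the user is not tense from biological information.

    A True result corresponds to "the user is in a situation where it is
    easy to utter the wake word".
    """
    conditions = [
        fmean(v.heart_rate_bpm) < FAST_HEART_RATE_BPM,          # heart rate is not fast
        pstdev(v.heart_rate_bpm) < HEART_RATE_FLUCTUATION_BPM,  # no variability (approximation)
        v.respiration_per_min < FAST_RESPIRATION_PER_MIN,       # breathing is not fast
    ]
    return all(conditions) if require_all else any(conditions)
```

With the defaults, heart-rate samples of 120, 121, 120, 121 bpm and a respiratory rate of 28 per minute pass under the "any" reading (the heart rate is steady) but fail with `require_all=True`.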
Although the embodiments of the present invention have been described above in detail with reference to the drawings, the specific configuration is not limited to these embodiments and also includes design changes and the like within a scope not departing from the gist of the present invention.
1; Information processing device
1A; Information processing device
110; Movement information acquisition unit
120; Movement situation detection unit
140; Wake word detection unit
140A; Wake word detection unit
210; Communication unit
900; Server

Claims (7)

  1.  An information processing device comprising:
     a movement situation detection unit that determines, based on movement information of a user, whether the user is in a situation where it is easy to utter a wake word; and
     a wake word detection unit that, when the movement situation detection unit determines that the user is in a situation where it is easy to utter the wake word, makes the wake word easier to detect and detects the wake word from speech uttered by the user.
  2.  The information processing device according to claim 1, wherein the movement situation detection unit determines that the user is in a situation where it is easy to utter the wake word when there is little change in the movement speed of the user.
  3.  The information processing device according to claim 1 or 2, wherein the wake word detection unit makes the wake word easier to detect by changing a threshold value used for detecting the wake word.
  4.  The information processing device according to claim 1 or 2, further comprising a communication unit that sends and receives data to and from a server on the cloud,
     wherein, when the movement situation detection unit determines that the user is in a situation where it is easy to utter the wake word, the wake word detection unit transmits speech data uttered by the user to the server via the communication unit and detects the wake word based on a determination made by the server.
  5.  An information processing method of an information processing device comprising a movement situation detection unit and a wake word detection unit, the method comprising:
     a first step in which the movement situation detection unit determines, based on movement information of a user, whether the user is in a situation where it is easy to utter a wake word; and
     a second step in which, when the movement situation detection unit determines that the user is in a situation where it is easy to utter the wake word, the wake word detection unit makes the wake word easier to detect and detects the wake word from speech uttered by the user.
  6.  A program for causing a computer to execute an information processing method of an information processing device comprising a movement situation detection unit and a wake word detection unit, the information processing method comprising:
     a first step in which the movement situation detection unit determines, based on movement information of a user, whether the user is in a situation where it is easy to utter a wake word; and
     a second step in which, when the movement situation detection unit determines that the user is in a situation where it is easy to utter the wake word, the wake word detection unit makes the wake word easier to detect and detects the wake word from speech uttered by the user.
  7.  A computer-readable non-transitory recording medium recording a program for causing a computer to execute an information processing method of an information processing device comprising a movement situation detection unit and a wake word detection unit, the information processing method comprising:
     a first step in which the movement situation detection unit determines, based on movement information of a user, whether the user is in a situation where it is easy to utter a wake word; and
     a second step in which, when the movement situation detection unit determines that the user is in a situation where it is easy to utter the wake word, the wake word detection unit makes the wake word easier to detect and detects the wake word from speech uttered by the user.
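Claims 1, 3 and 4 can be pictured as a small decision layer around a keyword-spotting model. The sketch below is a hedged illustration only: the detection score from a local model, the two threshold values, and the `server_check` callback standing in for the communication unit and cloud server are all hypothetical assumptions, not details given in the document.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class WakeWordDetector:
    """Hypothetical sketch of a wake word detection unit (claims 1, 3 and 4)."""

    # Hypothetical score thresholds; a lower threshold makes the wake word easier to detect.
    normal_threshold: float = 0.8
    relaxed_threshold: float = 0.6
    # Hypothetical stand-in for sending speech to a cloud server via a communication
    # unit and receiving its determination (claim 4). None means local-only (claim 3).
    server_check: Optional[Callable[[bytes], bool]] = None

    def detect(self, score: float, audio: bytes, easy_to_utter: bool) -> bool:
        """Decide whether the wake word was uttered.

        score: confidence from a local keyword-spotting model (assumed in [0, 1]).
        audio: the speech data uttered by the user.
        easy_to_utter: output of the movement situation detection unit (claim 1).
        """
        if not easy_to_utter:
            # Ordinary situation: use the normal, stricter threshold.
            return score >= self.normal_threshold
        if self.server_check is not None:
            # Claim 4 variant: defer to the server's determination.
            return self.server_check(audio)
        # Claim 3 variant: relax the threshold so the wake word is easier to detect.
        return score >= self.relaxed_threshold
```

With the defaults, a score of 0.7 is rejected in the ordinary situation but accepted when the movement situation detection unit reports that it is easy to utter the wake word.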

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/034147 WO2024057381A1 (en) 2022-09-13 2022-09-13 Information processing device, information processing method, program, and recording medium

Publications (1)

Publication Number Publication Date
WO2024057381A1 (en) 2024-03-21

Family

ID=90274422

Country Status (1)

Country Link
WO (1) WO2024057381A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2017537361A (en) * 2014-09-12 2017-12-14 アップル インコーポレイテッド Dynamic threshold for always listening for speech trigger
WO2019176252A1 (en) * 2018-03-13 2019-09-19 ソニー株式会社 Information processing device, information processing system, information processing method, and program

Similar Documents

Publication Publication Date Title
EP3413305B1 (en) Dual mode speech recognition
US11676600B2 (en) Methods and apparatus for detecting a voice command
KR101981878B1 (en) Control of electronic devices based on direction of speech
CN112106381B (en) User experience assessment method, device and equipment
US10728941B2 (en) Bidirectional sending and receiving of wireless data
KR101986354B1 (en) Speech-controlled apparatus for preventing false detections of keyword and method of operating the same
US9542947B2 (en) Method and apparatus including parallell processes for voice recognition
WO2018039045A1 (en) Methods and systems for keyword detection using keyword repetitions
US20120166190A1 (en) Apparatus for removing noise for sound/voice recognition and method thereof
US20200053611A1 (en) Wireless device connection handover
US10147444B2 (en) Electronic apparatus and voice trigger method therefor
US20180144740A1 (en) Methods and systems for locating the end of the keyword in voice sensing
US11763819B1 (en) Audio encryption
CN110265036A (en) Voice awakening method, system, electronic equipment and computer readable storage medium
US20160314801A1 (en) Content reproduction device, content reproduction program, and content reproduction method
CN111916068A (en) Audio detection method and device
US11562748B2 (en) Detecting and suppressing commands in media that may trigger another automated assistant
US11064281B1 (en) Sending and receiving wireless data
WO2024057381A1 (en) Information processing device, information processing method, program, and recording medium
US12080276B2 (en) Adapting automated speech recognition parameters based on hotword properties
KR102061206B1 (en) Speech-controlled apparatus for preventing false detections of keyword and method of operating the same
JP2023050044A (en) Information processing unit, information processing method, program and recording medium
WO2021147018A1 (en) Electronic device activation based on ambient noise
WO2021153102A1 (en) Information processing device, information processing system, information processing method and information processing program
CN117238316A (en) Voice activity detection system

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22958716

Country of ref document: EP

Kind code of ref document: A1