WO2019174604A1 - Electronic device and electronic device control method - Google Patents

Electronic device and electronic device control method

Info

Publication number
WO2019174604A1
WO2019174604A1 (application PCT/CN2019/078052; CN2019078052W)
Authority
WO
WIPO (PCT)
Prior art keywords
voice
volume value
utterance
electronic device
data
Prior art date
Application number
PCT/CN2019/078052
Other languages
English (en)
French (fr)
Inventor
小林丈朗
大久保好理
石丸大
吉沢纯一
Original Assignee
青岛海信电器股份有限公司
东芝视频解决方案株式会社
Priority date
Filing date
Publication date
Application filed by 青岛海信电器股份有限公司 and 东芝视频解决方案株式会社
Priority to CN201980016654.4A (published as CN112189230A)
Publication of WO2019174604A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L13/033 Voice editing, e.g. manipulating the voice of the synthesiser
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/10 Speech classification or search using distance or distortion measures between unknown speech and reference templates
    • G10L15/26 Speech to text systems

Definitions

  • Embodiments of the present invention relate to an electronic device and a method of controlling the electronic device.
  • The voice interaction system collects the voice uttered by the user, for example with a microphone, and parses the collected voice through voice recognition processing, thereby determining what the user said and providing the user with a response corresponding to the recognized content.
  • The voice interaction system generally includes two parts: a voice interaction service part and an electronic device part.
  • The voice interaction service part takes the content uttered by the user as input, parses the input content by voice recognition processing, and provides the user with a response corresponding to the analysis result.
  • The electronic device part has the functions of transmitting the user's utterance collected by the microphone to the voice interaction service as voice data, outputting the content of the response produced by the voice interaction service to the user as voice, and controlling peripheral devices.
  • In some cases, rather than providing a response corresponding to content uttered by the user, the voice interaction service spontaneously provides information to the electronic device.
  • Patent Document 1: Japanese Laid-Open Patent Publication No. 2017-122930
  • The volume at which the electronic device outputs voice is generally set by the user. Further, depending on the type of electronic device, some devices can set a separate volume value for each function, such as a wake-up sound, in addition to the volume for voice interaction.
  • When the voice interaction service spontaneously provides information to the electronic device and the electronic device outputs the provided information as voice, the volume value for voice interaction preset in the electronic device is generally used.
  • The content that the voice interaction service provides autonomously is sometimes information of higher urgency or higher importance for the user. Yet even when such urgent or important information is output as voice, the electronic device outputs it at the normal voice interaction volume value. If the normal voice interaction volume value is set low, the information can only be output at a low volume, so there is a problem that the user cannot recognize the urgency or importance of the output voice.
  • In view of the above, an object of the present invention is to provide a voice interaction system composed of a voice interaction service and an electronic device, in which the electronic device outputs information spontaneously provided by the voice interaction service not at the volume value preset in the electronic device for voice interaction, but at the utterance volume value notified by the server.
  • An electronic device that transmits voice input from the outside to a server includes: a voice input unit that collects the voice input from the outside; a trigger word detecting unit that detects a trigger word in the voice input from the voice input unit; a control unit that, when the trigger word detecting unit detects the trigger word, transmits at least the portion of the voice following the trigger word to the server; an utterance unit that outputs as voice either a response transmitted by the server corresponding to the voice transmitted by the control unit, or the content of utterance control data that the server transmits spontaneously, not in response to the voice transmitted from the control unit; and a recording unit storing categories of output voice types and a volume value for the voice output of each category. The utterance unit reads the category attached by the server when transmitting the utterance control data, reads the volume value corresponding to that category from the recording unit, and outputs the content of the utterance control data as voice at the corresponding volume value.
  • FIG. 1 is a schematic diagram showing a voice interaction system to which an embodiment of the present invention is applied;
  • FIG. 2 is a detailed structural diagram of the electronic device and the voice interaction service shown in FIG. 1;
  • FIG. 3 is a diagram showing the processing performed when the electronic device 1, having recognized the trigger word uttered by the user 5, performs interactive processing with the voice interaction service A2-1, in the electronic device and voice interaction service shown in FIG. 2;
  • FIG. 4 is a diagram showing an example of the processing sequence when the voice interaction service A performs the autonomous processing of spontaneously transmitting information to the electronic device, in the electronic device and voice interaction service shown in FIG. 2;
  • FIG. 5A is a diagram showing an example of the format of the utterance control data when the voice interaction service A transmits, in one data block, the utterance voice data and the utterance volume value used when the utterance unit of the electronic device 1 plays the utterance voice data;
  • FIG. 5B is a diagram showing an example of the format of the utterance control data when the voice interaction service A transmits the utterance voice data and the utterance volume value used when the utterance unit of the electronic device 1 plays the utterance voice data in separate data blocks;
  • FIG. 6A shows the relationship between the value set for the utterance volume value 502 of the data format shown in FIGS. 5A and 5B and the volume value at which the utterance unit 206 of the electronic device 1 plays the voice data, when the set value is a numerical value;
  • FIG. 6B shows the relationship between the value set for the utterance volume value 502 of the data format shown in FIGS. 5A and 5B and the volume value at which the utterance unit 206 of the electronic device 1 plays the voice data, when the set value is an identifier;
  • FIG. 7 is a diagram showing the flow of processing when the electronic device receives the utterance control data including the volume value shown in FIGS. 5A and 5B;
  • FIG. 8A is a schematic diagram showing an example of the change in the volume value when the utterance unit 206 outputs voice data, when the autonomous processing sequence shown in FIG. 4 is performed during the interactive processing sequence shown in FIG. 3;
  • FIG. 8B is a schematic diagram showing another example of the change in the volume value when the utterance unit 206 outputs voice data, when the autonomous processing sequence shown in FIG. 4 is performed between the interactive processing sequences shown in FIG. 3;
  • FIG. 8C is a schematic diagram showing still another example of the change in the volume value when the utterance unit 206 outputs voice data, when the autonomous processing sequence shown in FIG. 4 is performed during the interactive processing sequence shown in FIG. 3;
  • FIG. 9A is a schematic diagram showing an example of the change in the volume value when the utterance unit 206 outputs voice data, when the voice interaction service A2-1, after an event A (S900) from the outside, continues with the processing sequence corresponding to an event B (S910) from the outside;
  • FIG. 9B shows the change in the volume value when the utterance unit 206 outputs voice data, when the electronic device 1 is capable of switching between the voice interaction service A2-1 and the voice interaction service B2-2 and the autonomous processing sequence is performed in each voice interaction service in response to an external event;
  • FIG. 10 shows an example of a data block of a volume value change notification for each category;
  • FIG. 11 shows an example of a data block of an output voice notification using a voice class identifier (IDC);
  • FIG. 12 shows example 1 of the volume value management table for each category;
  • FIG. 13 shows an example of the procedure of the volume value change notification;
  • FIG. 14 shows example 2 of the volume value management table for each category;
  • FIG. 15 is a flow chart showing whether or not to implement the volume change for each category.
  • An embodiment of the present invention provides an electronic device including a communication receiving unit and an utterance unit.
  • The communication receiving unit is configured to receive utterance control data sent from the server, where the utterance control data includes at least an utterance volume value and utterance voice data; the utterance unit plays the utterance voice data at the utterance volume value, as illustrated in the sketch below.
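  • As a rough illustration only (not part of the patent text), the utterance control data described above can be modeled as a record carrying an ID, a volume value, and the voice payload; the class, field, and speaker-interface names below are illustrative assumptions.

```python
# Minimal sketch of the utterance control data and its playback.
# All names here are illustrative; the patent defines only the fields:
# utterance voice ID, utterance volume value, utterance voice data.
from dataclasses import dataclass

@dataclass
class UtteranceControlData:
    utterance_voice_id: int  # identification number of the utterance control data
    utterance_volume: int    # volume value to use when playing the voice data
    voice_data: bytes        # encoded voice data to be played

def play_utterance(ctrl: UtteranceControlData, speaker) -> None:
    """Play the utterance voice data at the server-specified volume."""
    speaker.set_volume(ctrl.utterance_volume)  # apply the utterance volume value
    speaker.play(ctrl.voice_data)              # output the utterance voice data
```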
  • In the embodiments of the present invention, the electronic device may be a mobile phone or a voice interaction terminal such as a smart speaker, or it may be an electronic device that merely receives utterance control data, such as notifications, from the server and plays back the content of that utterance control data.
  • The server in the embodiments of the present invention may be any server capable of interacting with an electronic device, for example a voice interaction service that performs voice interaction with a voice interaction terminal such as a smart speaker, or it may be a server that merely sends utterance control data, such as notifications, to the electronic device.
  • Some embodiments of the present invention provide a voice interaction terminal, such as a smart speaker.
  • FIG. 1 is a diagrammatic view of a voice interaction system including an electronic device 1 provided in some embodiments of the present invention.
  • The voice interaction system is constituted, for example, by the electronic device 1 disposed in the room 4 and the voice interaction service 2 existing in the cloud; the electronic device 1 and the voice interaction service 2 can communicate with each other via the network 3.
  • The electronic device 1 also communicates, through a short-range wireless communication system such as Bluetooth (registered trademark), ZigBee (registered trademark), or Wi-Fi, with a lamp 10, an air conditioner 11, and a video playback device 12 disposed in the room 4.
  • The electronic device 1 can also control peripheral devices by a communication method that does not require pairing, such as infrared communication.
  • The electronic device 1 can also communicate with electronic devices other than those shown here.
  • The voice interaction service 2 includes two voice interaction services: voice interaction service A2-1 and voice interaction service B2-2. Which of voice interaction service A2-1 and voice interaction service B2-2 is used is determined by the trigger word uttered by the user.
  • FIG. 1 shows an example in which the voice interaction service 2 has two voice interaction services, voice interaction service A2-1 and voice interaction service B2-2; however, only one voice interaction service may exist, or three or more voice interaction services may exist.
  • After the user speaks to the electronic device 1, the electronic device 1 transmits the voice data of the user's utterance, collected by an attached pickup such as a microphone, to the voice interaction service 2 via the network 3.
  • The voice interaction service 2 that has received the voice data transmitted from the electronic device 1 parses the received voice data and generates a response based on the parsed content. After generating the response, the voice interaction service 2 transmits the generated response to the electronic device 1 via the network 3.
  • The response generated by the voice interaction service 2 includes two kinds of responses: a voice response and a command response.
  • The voice response is a response generated by the voice interaction service 2 based on the voice data input from the electronic device 1.
  • The command response is a response that, based on the voice data input from the electronic device 1, controls a device included in the electronic device 1 or a peripheral device connected to the electronic device 1 through a short-range wireless communication system or the like.
  • The device included in the electronic device 1 is, for example, an attached camera.
  • A peripheral device connected to the electronic device 1 through a short-range wireless communication system or the like is, for example, the lamp 10 or the air conditioner 11.
  • The response content of the voice data response is, for example, a reply such as "Good morning. How do you feel today?" made to content the user said to the electronic device 1, such as "Good morning"; or, for a question such as "When will I arrive in Osaka on the Shinkansen?", an answer to the user's inquiry such as "If you depart in 30 minutes, you will arrive in Osaka before 8:00".
  • After the electronic device 1 receives a response from the voice interaction service 2, when the response is a voice data response, the content of the response can be output as voice by, for example, an attached speaker. The user can thus hear the response of the voice interaction system to what he or she said.
  • The response content of the command response includes a command.
  • The content of the command response may include, for example, "play from www.xxxxxx.co.jp/musicBBB.wav".
  • Here the command portion "play" is combined with "www.xxxxxx.co.jp/musicBBB.wav", a portion converted into text data based on the content of the user's utterance.
  • After obtaining the response from the voice interaction service 2, if the response is a command response including text data, the electronic device 1 interprets the text data portion in addition to interpreting the command, and controls the device that is the control object. For example, when the content of the command is "play from www.xxxxxx.co.jp/musicBBB.wav", the electronic device 1 can acquire the data at www.xxxxxx.co.jp/musicBBB.wav and play the acquired data in the electronic device 1, as in the sketch below.
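  • As a rough sketch of the command-response handling just described (assumed structure; "play from <URL>" is the only command shown in the text, and the player function here is a hypothetical stand-in):

```python
# Sketch: split a command response into the command portion and the
# text-data portion, then act on it. Only "play" is illustrated, per the
# example above; the media player is a hypothetical stub.
from urllib.request import urlopen

def local_player_play(data: bytes) -> None:
    """Hypothetical playback helper standing in for the device's player."""
    ...

def handle_command_response(response: str) -> None:
    command, sep, argument = response.partition(" from ")
    if not sep:
        raise ValueError("malformed command response")
    if command == "play":
        # Acquire the data at the given address and play it on the device.
        data = urlopen("http://" + argument).read()
        local_player_play(data)
    else:
        raise ValueError(f"unsupported command: {command}")
```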
  • The voice interaction service 2 can provide information based on interaction with the user.
  • The voice interaction service 2 can also spontaneously provide information to the electronic device 1 when there is no input of voice data from the electronic device 1.
  • The information spontaneously provided by the voice interaction service 2 may be, for example, information about a bus approaching a bus stop near the user, weather information such as rain clouds approaching the user's residential area, or other information corresponding to the personal needs of the user; or it may be publicly available information such as an earthquake early warning or a tsunami warning.
  • FIG. 2 shows a detailed structural diagram of the electronic device 1 and the voice interaction service 2 shown in FIG. 1.
  • The electronic device 1 and the voice interaction service 2 can communicate with each other via the network 3.
  • The electronic device 1 includes: a trigger word detecting unit 201; a control unit 202 that controls the entire electronic device; a system memory 203 that holds the program for controlling the electronic device 1 and serves as work memory; a communication control unit 204 that communicates with the voice interaction service 2 or peripheral devices via the network 3; a voice input unit 205 that collects the user's utterances; an utterance unit 206 that outputs voice data responses; a display unit 207 that displays the state of the electronic device 1, the content of responses, and a function setting interface; and operation buttons 208 operated by the user.
  • The trigger word detecting unit (which may also be referred to as a keyword detecting unit) 201 is a processing unit that detects a trigger word from the content of the user's speech.
  • The trigger word is a prescribed keyword uttered by the user to start the interaction processing with the electronic device 1.
  • When the trigger word detecting unit 201 detects that the user has uttered the specified keyword, that is, the trigger word, the electronic device 1 treats the content of the user's speech following the trigger word as speech addressed to the electronic device 1 and continues to process what the user says.
  • The trigger words to be detected by the trigger word detecting unit 201 are stored in a trigger word storage area (not shown) of the system memory 203.
  • The trigger word detecting unit 201 can always detect whichever of the set trigger words the user utters. Only when a set trigger word is detected does it notify the control unit 202 that the set trigger word has been detected, so that the subsequent speech content of the user can be processed.
  • The control unit 202 that has received the notification processes the user's subsequent speech content while exchanging data with the voice interaction service 2.
  • The communication control unit 204 is a processing unit that controls communication with the voice interaction service 2.
  • The communication control unit 204 monitors the communication state with the voice interaction service 2 via the network 3 and notifies the control unit 202 of whether communication with the voice interaction service 2 is possible.
  • The communication control unit 204 may also include a short-range wireless communication system such as Bluetooth, ZigBee, or Wi-Fi, or a communication method such as infrared communication.
  • The voice input unit 205 includes a pickup, such as a microphone, and a processing unit that collects the voice uttered by the user.
  • The utterance unit 206 is a processing unit that, when the response generated by the voice interaction service 2 is a voice response, outputs the data in the voice response transmitted from the voice interaction service 2 as voice.
  • The data in the voice response may be voice data, which the utterance unit 206 outputs directly as voice.
  • The data in the voice response transmitted from the voice interaction service 2 may also be text data; in that case, the utterance unit 206 that obtains the text data may convert it into voice data using a voice synthesis function and output the voice.
  • In either case, the utterance unit 206 performs the process of outputting the content of the transmitted data in the form of voice.
  • In FIG. 2, the voice interaction service 2 has two voice interaction services: voice interaction service A2-1 and voice interaction service B2-2.
  • The voice interaction service A2-1 includes a voice recognition system 261 that recognizes voice data transmitted from the electronic device 1 and converts it into utterance text, a speech intent understanding system 262 that understands the user's utterance intent from the utterance text, and an interactive processing system 263 and an autonomous processing system 265 that generate responses to the utterance content as understood by the speech intent understanding system 262.
  • The autonomous processing system 265 performs processing that spontaneously provides information to the electronic device 1 in response to an event from the outside.
  • The voice interaction service B2-2 likewise includes a voice recognition system 271 that recognizes voice data transmitted from the electronic device 1 and converts it into text data, a speech intent understanding system 272 that understands the user's utterance intent from the character string converted from the voice data, and an interactive processing system 273 and an autonomous processing system 275 that generate responses to the user's speech content as understood by the speech intent understanding system 272.
  • The voice interaction service A2-1 and the voice interaction service B2-2 each have their own strengths in voice interaction processing, and each handles a different processing field (electronics, medicine, agriculture, sports, etc.).
  • FIG. 2 shows an example in which the voice interaction service 2 includes two voice interaction services, voice interaction service A2-1 and voice interaction service B2-2, but it may include, for example, only one voice interaction service, or three or more voice interaction services.
  • FIG. 3 is a schematic diagram of the processing sequence in which the electronic device 1, having recognized the trigger word uttered by the user 5, performs interactive processing with the voice interaction service A2-1, in the electronic device 1 and voice interaction service 2 shown in FIG. 2. It is assumed that the electronic device 1 is preset to use the voice interaction service A2-1 to generate responses to the utterances of the user 5 when the trigger word uttered by the user 5 is determined to be trigger word A.
  • The voice input unit 205 of the electronic device 1, having collected the utterance of the user 5, transmits the collected voice as voice data to the trigger word detecting unit 201.
  • The trigger word detecting unit 201 determines, by voice recognition processing, whether the voice data transmitted from the voice input unit 205 matches a trigger word previously stored in the system memory 203.
  • When the trigger word is detected, the electronic device 1 issues an interaction start instruction to the voice interaction service A2-1 (S312).
  • The voice interaction service A2-1 that has received the interaction start instruction (S312) prepares to parse the subsequent voice data transmitted from the electronic device 1.
  • After the utterance of S310 to S311, the user 5 then speaks to the electronic device 1 (S313, S314).
  • The electronic device 1 that has collected the voice uttered by the user 5 transmits the collected utterance voice as voice data to the voice interaction service A2-1 (S315). Even while the user 5 is still speaking, the electronic device 1 can sequentially transmit the previously collected utterance voice as voice data to the voice interaction service A2-1.
  • The voice data transmitted by the electronic device 1 to the voice interaction service A2-1 in S315 may be only the voice data of the user's speech in S313 to S314, may be the voice data including the trigger word A of S310 to S311, or may be the voice data of the utterance in any section of the user's speech between S310 and S314.
  • The voice interaction service A2-1 that has received the voice data transmitted from the electronic device 1 parses the received voice data and generates a response corresponding to the analysis result.
  • The voice interaction service A2-1 that has completed the response generation transmits the generated response, that is, the utterance control data (2), to the electronic device 1 (S316).
  • The electronic device 1 that has received the response from the voice interaction service A2-1 operates based on the content of the response.
  • Consider the case where the response generated by the voice interaction service A2-1, that is, the utterance control data (2), is voice data.
  • The electronic device 1 that has received the utterance control data (2) (S316) outputs the content of the response as voice through the utterance unit 206 (S317, S318).
  • At this time, the utterance unit 206 outputs the content of the response at the volume value for voice interaction preset in the electronic device 1.
  • After outputting the response, the electronic device 1 transmits an interaction end notification, indicating that the utterance output has ended, to the voice interaction service A2-1 (S319).
  • During the period from utterance start (2) (S317) to utterance end (2) (S318), the volume value output by the utterance unit 206 of the electronic device 1 is the normal voice interaction volume value preset in the electronic device 1.
  • FIG. 4 shows an example of the processing sequence in the case where the voice interaction service A2-1 performs the autonomous processing of spontaneously transmitting information to the electronic device 1, in the electronic device 1 and voice interaction service 2 shown in FIG. 2.
  • After receiving an event from the outside (S400), the voice interaction service A2-1 transmits the utterance control data (1) corresponding to the received event to the electronic device 1 (S401).
  • Upon receiving the utterance control data (1) (S401), the electronic device 1 performs the utterance corresponding to the received utterance control data (1) (S402).
  • Upon completing the utterance corresponding to the received utterance control data (1) (S403), the electronic device 1 transmits an utterance end notification to the voice interaction service A2-1 (S404).
  • The volume value of the utterance output by the utterance unit 206 of the electronic device 1 during the period from utterance start (1) (S402) to utterance end (1) (S403) is the value specified by the voice interaction service A2-1.
  • In order for the voice interaction service A2-1 to specify the volume value output by the utterance unit 206 of the electronic device 1, the utterance volume value needs to be transmitted from the voice interaction service A2-1 to the electronic device 1.
  • FIGS. 5A and 5B are diagrams showing examples of formats in which the utterance volume value is included in the utterance control data transmitted from the voice interaction service A2-1 to the electronic device 1 in the autonomous processing sequence shown in FIG. 4.
  • FIG. 5A shows an example of the format 500A of the utterance control data when the voice interaction service A2-1 transmits, in one data block, the utterance voice data and the utterance volume value used when the utterance unit 206 of the electronic device 1 plays the utterance voice data.
  • The utterance voice ID 501 is the identification number of the utterance control data.
  • The utterance volume value 502 is the volume value at which the utterance unit 206 of the electronic device 1 plays the voice data of the utterance voice data 503.
  • The utterance voice data 503 is the voice data played by the utterance unit 206 of the electronic device 1.
  • The electronic device 1 plays the utterance voice data using the utterance volume value 502 included in the data block with the same utterance voice ID 501.
  • FIG. 5B shows an example of the format 500B of the utterance control data when the voice interaction service A2-1 transmits the utterance voice data and the utterance volume value used when the utterance unit 206 of the electronic device 1 plays the utterance voice data in separate data blocks.
  • The composition of this format is:
  • the first data block includes an utterance voice ID 501 and an utterance volume value 502;
  • the second data block includes an utterance voice ID 501 and utterance voice data 503.
  • When the electronic device 1 detects a data block composed of the utterance voice ID 501 and the utterance volume value 502, and a data block containing utterance voice data 503 in which the same utterance voice ID is set, the utterance unit 206 of the electronic device 1 plays the utterance voice data 503 using the utterance volume value 502 included in the detected data block. That is, the electronic device determines, according to the utterance voice ID, the utterance volume value applicable to the utterance voice data, and the utterance unit plays the utterance voice data at the utterance volume value applicable to it, as in the sketch below.
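  • The two formats can be sketched as follows (assumed field names; format 500A carries ID, volume value, and voice data in one block, while format 500B splits them into two blocks that the device matches by utterance voice ID):

```python
# Sketch of handling formats 500A and 500B. Blocks are modeled as dicts;
# the field names are assumptions, not the patent's wire format.
pending_volumes: dict[int, int] = {}    # utterance voice ID -> volume value
pending_voice: dict[int, bytes] = {}    # utterance voice ID -> voice data

def on_block(block: dict, speaker) -> None:
    vid = block["utterance_voice_id"]
    if "volume" in block and "voice_data" in block:       # format 500A: one block
        speaker.set_volume(block["volume"])
        speaker.play(block["voice_data"])
        return
    if "volume" in block:                                 # format 500B, first block
        pending_volumes[vid] = block["volume"]
    else:                                                 # format 500B, second block
        pending_voice[vid] = block["voice_data"]
    if vid in pending_volumes and vid in pending_voice:   # same ID seen in both
        speaker.set_volume(pending_volumes.pop(vid))
        speaker.play(pending_voice.pop(vid))
```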
  • The volume value set by the voice interaction service A2-1 for the utterance volume value 502 may be a numerical value or a preset identifier.
  • An identifier does not indicate the volume value absolutely, as a numerical value such as 1, 2, or 3 that can be set in the utterance unit 206 of the electronic device 1, but expresses the volume value relatively, such as normal or large.
  • When the utterance volume value 502 is set as an identifier, the voice data is output at the volume value corresponding to that identifier.
  • FIGS. 5A and 5B show examples in which the utterance volume value is included in the utterance control data transmitted from the voice interaction service A2-1 to the electronic device 1 in the autonomous processing sequence shown in FIG. 4.
  • The same format can also be used in the interactive processing sequence shown in FIG. 3.
  • In this case, the voice interaction service A2-1 may set the utterance volume value 502 of the utterance control data transmitted to the electronic device 1 to invalid (NULL).
  • When the utterance unit 206 of the electronic device 1 that has received the utterance control data recognizes that the utterance volume value 502 is set to invalid (NULL), it may utter at the volume value preset in the electronic device 1.
  • FIG. 6A shows the value set for the utterance volume value 502 and the volume value at which the utterance unit 206 of the electronic device 1 speaks, when the value of the utterance volume value 502 of the data format shown in FIGS. 5A and 5B is set to a numerical value.
  • 601 is the value set by the voice interaction service A2-1 for the utterance volume value 502 of the utterance control data.
  • 602 is the volume value at which the utterance unit 206 of the electronic device 1 plays the voice data.
  • The combination 610 of the value set for the utterance volume value 502 and the volume value at which the utterance unit 206 outputs the voice data is an example in which the utterance unit 206 of the electronic device 1 uses the value 5 set in the utterance volume value 502 and speaks at the volume value 5.
  • The combination 611 is an example in which the value 5 is set for the utterance volume value 502 and the utterance unit 206 of the electronic device 1 speaks at the volume value 4.
  • This is an example in which the volume value set by the voice interaction service A2-1 exceeds the upper limit of the volume value range of the electronic device 1, so the utterance unit 206 substitutes the upper limit of the settable volume values, 4.
  • FIG. 6B shows the value set for the utterance volume value 502 and the volume value at which the utterance unit 206 of the electronic device 1 plays the voice data, when the value of the utterance volume value 502 of the data format shown in FIGS. 5A and 5B is set to an identifier.
  • In this example, the identifier has values of three levels: normal, larger, and smaller.
  • The utterance unit 206 of the electronic device 1 can set the volume value to seven levels from 1 to 7, and can also accept the three identifier levels of normal, larger, and smaller.
  • When the identifier is normal, the utterance unit 206 substitutes the value 4 to set the volume value and plays the voice data.
  • When the identifier is larger, the utterance unit 206 substitutes the value 5 to set the volume value and plays the voice data.
  • When the identifier is smaller, the utterance unit 206 substitutes the value 3 to set the volume value and plays the voice data.
  • The combination 620 of the identifier set for the utterance volume value 502 and the volume value at which the utterance unit 206 outputs the voice data is an example in which the utterance unit 206 of the electronic device 1 substitutes the value 5 for the identifier "larger" set as the utterance volume value 502, sets the volume value, and outputs the voice data.
  • The combination 621 is an example in which the utterance unit 206 of the electronic device 1 substitutes the value 3 for the identifier "smaller" set as the utterance volume value 502, sets the volume value, and outputs the voice data.
  • The correspondence in which the identifier normal is replaced by the numerical value 4, larger by 5, and smaller by 3 is only an example and is not limited thereto. For example, the value corresponding to normal may be 4, the value corresponding to larger may be 7, and the value corresponding to smaller may be 1.
  • The identifier may also have, for example, five levels: slightly smaller, normal, slightly larger, larger, and maximum.
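  • Taken together, FIGS. 6A and 6B and the NULL case suggest a volume-resolution step of the following shape (a sketch under assumptions: a settable range of 1 to 7 and the example identifier mapping normal=4, larger=5, smaller=3 from the text):

```python
# Sketch: resolve the utterance volume value 502 before playback.
# Numeric values are clamped to the device's settable range; identifiers
# are mapped to device-specific numbers; NULL falls back to the preset.
DEVICE_MIN, DEVICE_MAX = 1, 7                              # assumed device range
IDENTIFIER_MAP = {"smaller": 3, "normal": 4, "larger": 5}  # example mapping only

def resolve_volume(value, preset: int) -> int:
    if value is None:                       # NULL: use the device's preset value
        return preset
    if isinstance(value, str):              # relative identifier
        return IDENTIFIER_MAP[value]
    return max(DEVICE_MIN, min(DEVICE_MAX, value))  # clamp a numeric value
```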
  • With such a structure, the voice interaction service 2 can set the volume value used when the electronic device 1 speaks in accordance with the content to be provided, for example its urgency, importance, or suddenness. Thereby, the user of the electronic device 1 can easily recognize the urgency, importance, or suddenness of the provided information, improving the usability of the voice interaction system for the user.
  • When the voice interaction service 2 sets the volume value for the utterance volume value 502, an identifier can be set for information that is more urgent and more public.
  • For example, when the voice interaction service 2 transmits setting information to a plurality of electronic devices 1 having different terminal specifications, using an identifier such as maximum allows the transmission processing to be completed much faster than setting, for each electronic device 1, a volume value matched to each terminal's specification.
  • FIG. 7 is a flow chart showing the processing when the electronic device 1 receives the utterance control data including the volume value shown in FIGS. 5A and 5B.
  • Upon receiving the utterance control data, the communication control unit 204 of the electronic device 1 starts the reception processing (S700).
  • The communication control unit 204 parses the received utterance control data and acquires the utterance voice ID 501, the utterance volume value 502, and the utterance voice data 503 (S701). The communication control unit 204 transmits the acquired utterance volume value 502 and utterance voice data 503 to the utterance unit 206.
  • The utterance unit 206 outputs the transmitted utterance voice data 503 using the transmitted utterance volume value 502 (S702).
  • The utterance voice data 503 output using the transmitted utterance volume value is limited to the utterance voice data 503 having the same utterance voice ID.
  • After the output, the utterance unit 206 ends the processing (S703).
  • The utterance unit 206 of the electronic device 1 uses the utterance volume value 502 included in the transmitted utterance control data to play the voice data of the utterance voice data 503 only when spontaneous information is transmitted from the voice interaction service 2.
  • FIG. 8A is a schematic diagram showing an example of the change in the volume value when the utterance unit 206 outputs voice data, when the autonomous processing sequence shown in FIG. 4 is performed during the interactive processing sequence shown in FIG. 3.
  • the processing from S800 to S809 in Fig. 8A is the same as the processing from S310 to S319 in Fig. 3.
  • the processing from S820 to S829 is also the same as the processing from S310 to S319 in FIG.
  • the processing from S810 to S814 is the same as the processing from S400 to S404 in FIG.
  • Assume that the volume value at which the utterance unit 206 of the electronic device 1 speaks is set to, for example, 3; that is, the preset volume value of the utterance unit is 3.
  • Assume also that the utterance volume value included in the utterance control data transmitted from the voice interaction service A2-1 in S811 is, for example, the value 4.
  • The volume value of the utterance from utterance start (2) (S807) to utterance end (2) (S808) is the volume value 3 set in the utterance unit 206.
  • The volume value of the utterance from utterance start (3) (S812) to utterance end (3) (S813) is the utterance volume value included in the utterance control data received by the electronic device 1 in the process of S811, that is, the value 4.
  • The volume value of the utterance from utterance start (4) (S827) to utterance end (4) (S828) is the volume value 3 set in the utterance unit 206. That is, the utterance unit 206 retains the preset volume value it held before starting to play the content of the utterance control data.
  • FIG. 8B is a schematic diagram showing another example of the change in the volume value when the utterance unit 206 outputs voice data, when the autonomous processing sequence shown in FIG. 4 is performed between the interactive processing sequences shown in FIG. 3.
  • In this example, the electronic device 1 is in a state in which the microphone is muted.
  • The microphone-muted state is, for example, a state in which the voice input unit 205 of the electronic device 1 is set not to collect the utterances of the user.
  • In this case, even if the user 5 utters the trigger word A, the electronic device 1 does not send the interaction start instruction to the voice interaction service A. As a result, the electronic device 1 does not return responses to the utterances of S830 to S831 and S833 to S834 to the user 5.
  • Similarly, the electronic device 1 does not respond even if the user 5 speaks again as shown in S850 to S851 and S853 to S854.
  • FIG. 8C is a schematic diagram showing still another example of the change in the volume value when the utterance unit 206 outputs voice data, when the autonomous processing sequence shown in FIG. 4 is performed between the interactive processing sequences shown in FIG. 3.
  • In this example, the electronic device is in a speaker-muted state, which may be, for example, a state in which the preset volume value used when the utterance unit 206 of the electronic device 1 speaks is 0.
  • In this case, even if the user 5 utters the trigger word A as shown in S860 to S861 and then speaks as shown in S863 to S864, and the utterance control data (2) corresponding to the utterance is transmitted from the voice interaction service A (S866), the electronic device 1 does not perform the utterance corresponding to the utterance control data (2). As a result, the user 5 hears no responses to the utterances of S860 to S861 and S863 to S864.
  • Similarly, the electronic device 1 does not respond even if the user 5 speaks again as shown in S880 to S881 and S883 to S884.
  • As described above, the utterance unit 206 of the electronic device 1 may use the utterance volume value included in the utterance control data only in the autonomous processing sequence, and in other cases control the utterance according to the volume value set in the electronic device 1 or a setting state such as the mute state, as in the sketch below.
  • Alternatively, the utterance unit 206 can be configured to play the voice data at the utterance volume value included in the utterance control data in every sequence.
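  • The volume selection behavior of FIGS. 8A to 8C can be summarized in a small decision function (a sketch; the flag and parameter names are assumptions):

```python
# Sketch: the server-specified volume is honored only in the autonomous
# processing sequence; interactive responses keep the device's preset
# volume, which may be 0 (speaker muted, as in FIG. 8C).
def playback_volume(is_autonomous: bool, ctrl_volume, preset: int) -> int:
    if is_autonomous and ctrl_volume is not None:
        return ctrl_volume   # utterance volume value from the utterance control data
    return preset            # interactive sequence: keep the preset volume value
```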
  • FIG. 9A shows an example of the change in the volume value at which the utterance unit 206 outputs voice data when the voice interaction service A2-1, after an event from the outside (S900), continues with the processing sequence corresponding to another event from the outside (S910).
  • The processing from S900 to S904 is the same as the processing from S400 to S404 in FIG. 4.
  • The processing from S910 to S914 is also the same as the processing from S400 to S404 in FIG. 4.
  • The volume value of the utterance from utterance start (1) (S902) to utterance end (1) (S903) is the volume value included in the utterance control data (1) received by the electronic device 1 in the process of S901, for example the value 4.
  • The utterance volume value during the period from utterance start (2) (S912) to utterance end (2) (S913) is the volume value included in the utterance control data (2) received by the electronic device 1 in the process of S911, for example the value 2.
  • In some embodiments, the electronic device 1 can switch among a plurality of voice interaction services.
  • FIG. 9B shows the change in the volume value at which the utterance unit 206 outputs voice data when the electronic device 1 is capable of switching between the voice interaction service A2-1 and the voice interaction service B2-2, and the processing sequence corresponding to an external event is performed in each voice interaction service.
  • The processing from S920 to S924 is the same as the processing from S400 to S404 in FIG. 4.
  • The processing from S930 to S934 is also the same as the processing from S400 to S404 in FIG. 4.
  • The utterance volume value from utterance start (1) (S922) to utterance end (1) (S923) is the volume value included in the utterance control data (1) received by the electronic device 1 in the process of S921, for example the value 4.
  • The utterance volume value from utterance start (2) (S932) to utterance end (2) (S933) is the volume value included in the utterance control data (2) received by the electronic device 1 in the process of S931, for example the value 2.
  • The processing sequence example of FIG. 9B is one in which the electronic device 1 receives the utterance control data (1) from the voice interaction service A2-1 (S921) and, during its processing (S922 to S924), receives the utterance control data (2) from the voice interaction service B2-2.
  • Even in this case, the electronic device 1 can use each separately specified utterance volume value to set the volume value of the voice data played by the utterance unit 206.
  • Further, the voice interaction system of the embodiment of the present invention can, using a menu displayed on the display unit 207 of the electronic device 1, select autonomous information providing services delivered through the autonomous processing sequence performed by the voice interaction service 2, or set conditions for them.
  • When the user inputs the selection or condition setting of the autonomous information providing service through the interface of the autonomous information providing menu, the input content is registered via the network 3 in the autonomous processing system 265 of the voice interaction service 2.
  • The autonomous processing system 265 refers to the kind or condition of the registered information providing service and supplies information conforming to the registered content to the electronic device 1 of the user 5.
  • Thereby, the user 5 of the electronic device 1 can select an autonomous information providing service that matches his or her preferences from among the autonomous information providing services offered by a large number of voice interaction services.
  • The autonomous processing system 265 refers to the kind of the registered information providing service and supplies information of that kind conforming to the registered content to the electronic device 1 of the user 5.
  • Suppose the user 5 of the electronic device 1 wants to further filter the provided information within the selected autonomous information providing service C.
  • In that case, the user 5 registers, for example, the location information of the electronic device 1 through the interface of the autonomous information providing menu.
  • The autonomous processing system 265 then refers to the condition registered for the information providing service C and supplies only the information conforming to the condition, among the information of the information providing service C, to the electronic device 1 of the user 5, as in the sketch below.
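  • A condition filter of this kind might look as follows (a sketch; the item structure and the use of a region field are assumptions for illustration):

```python
# Sketch: keep only the items of information providing service C that
# match the condition (here, a region) the user registered.
def filter_provided_info(items: list[dict], registered_region: str) -> list[dict]:
    return [item for item in items if item.get("region") == registered_region]
```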
  • When the voice interaction service spontaneously provides information to the electronic device, the information may be information of higher urgency or higher importance.
  • In the embodiments of the present invention, the volume value of the utterance of the utterance unit 206 of the electronic device 1 can be controlled based on the content provided by the voice interaction service 2, so that information can be provided to users effectively.
  • Moreover, the user 5 can filter the autonomous information provided by the voice interaction service 2; for example, by filtering the autonomous information by the regionality of the provided information, the user can easily obtain autonomous information that matches his or her needs.
  • In some embodiments, the utterance unit 206 may output as voice the response transmitted by the server corresponding to the voice transmitted by the control unit 202, or the content of a notification that the server transmits spontaneously, not in response to the voice transmitted from the control unit 202, and may output the content of the notification using the volume value attached by the voice interaction service 2 when sending the notification, that is, the utterance volume value in the utterance control data.
  • The voice output may also be performed based on the notified voice category.
  • For example, the system memory 203 may store a correspondence between voice categories and the volume values applicable to them, so that the volume value applicable to a voice category is determined according to the voice category attached when the voice interaction service 2 transmits the notification, and the content of the notification is played at that volume value.
  • The interaction among the user 5, the electronic device 1, and the voice interaction service 2, and the process by which the voice interaction service 2 generates a response or notification, are similar to the processes in the foregoing embodiments and are not repeated here.
  • The difference is that when the electronic device 1 outputs as voice the notification sent by the voice interaction service 2, it can read the voice category attached by the voice interaction service 2 when the notification was sent, determine the volume value corresponding to the voice category according to the correspondence between voice categories and applicable volume values, and output the content of the notification as voice at the corresponding volume value.
  • FIG. 10 shows an example of a data block of utterance control data including a voice class.
  • In some embodiments, the correspondence between the voice category and the volume value applicable to it is stored in advance in the electronic device 1.
  • The voice class identifier may represent the voice class of the utterance control data, and the volume value indicates the volume value corresponding to that voice class.
  • The volume value can be a numerical value and can be set according to actual needs, which is not limited herein.
  • The storage device in the electronic device prestores a correspondence between the voice class identifier (IDC) and the volume value.
  • FIG. 11 shows an example of a data block of utterance control data including a voice class identifier. Similar to FIG. 5A, it is an example of the format of the utterance control data when the voice interaction service A2-1, in outputting voice data, transmits as one data block the utterance voice data and a voice class identifier indicating the voice type of the utterance, from which the electronic device 1 derives the volume value used when the utterance unit 206 plays the voice data.
  • The utterance voice ID is the identification number of the utterance control data, and the voice class identifier indicates the type of the utterance.
  • The data block of the utterance control data includes a voice class identifier and utterance voice data.
  • FIG. 12 shows an example of the volume value management table for each category.
  • In some embodiments, a correspondence between the voice class identifier and the volume value applicable to the voice class may be stored.
  • The voice class identifier indicates the voice class of the utterance, for example the numbers 001, 002, 003, 004, 005.
  • The numbering and the number of categories are only examples and are not limited herein.
  • The volume value indicates the volume value applicable to the voice output of each voice category, that is, the volume value corresponding to each voice category. The larger the volume value, the louder the sound when outputting voice data of that category.
  • The application example shows a scenario or example in which each voice category is applicable.
  • For example, the volume value corresponding to the voice category identifier 001 applies to normal voice interaction; that is, during normal voice interaction the output volume is relatively low.
  • The volume value corresponding to the identifier 002 is 50, and the category may be, for example, timing and reminders; that is, for timers and reminders the output volume may be slightly higher.
  • The volume value corresponding to the identifier 003 applies to, for example, news broadcasts; that is, during a news broadcast the output volume is normal.
  • The volume value corresponding to the identifier 004 applies to, for example, normal alarms; that is, when a normal alarm is output, the volume is slightly higher.
  • The volume value corresponding to the identifier 005 applies to, for example, emergency alerts; that is, in the case of an emergency alert, the output volume is the largest.
  • When outputting a notification, the electronic device 1 can read the voice class identifier attached by the voice interaction service 2 when transmitting the voice data and determine the applicable volume value according to the volume value management table, as in the sketch below.
  • Further, the volume value corresponding to a voice class identifier is not fixed and can be updated or changed.
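  • A per-category lookup of this kind might be sketched as follows (only the value 50 for category 002 is stated in the text; the other table values are assumptions for illustration):

```python
# Sketch: resolve a voice class identifier (IDC) through a prestored
# volume value management table, then play the notification.
VOLUME_TABLE = {
    "001": 30,   # normal voice interaction (relatively low; assumed value)
    "002": 50,   # timing and reminders (value given in the text)
    "003": 40,   # news broadcast (normal; assumed value)
    "004": 60,   # normal alarm (slightly higher; assumed value)
    "005": 100,  # emergency alert (largest; assumed value)
}

def play_by_category(idc: str, voice_data: bytes, speaker) -> None:
    speaker.set_volume(VOLUME_TABLE[idc])  # volume value for this category
    speaker.play(voice_data)               # output the notification content
```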
  • FIG. 13 shows an example of the steps of the volume value change notification.
  • In some embodiments, the voice interaction service 2 may send a voice category and volume value setting information corresponding to that voice category to the electronic device 1, and the electronic device 1 may then update the volume value corresponding to the voice category according to the voice category and volume value setting information.
  • The voice interaction service 2 can transmit a voice category identifier when transmitting the voice category.
  • For example, if the voice category identifier has the value 001 and the volume value setting information is 50, then after receiving the voice category identifier and the volume value setting information, the electronic device 1 can update the volume value corresponding to the voice category numbered 001 to 50.
  • In some embodiments, the electronic device 1 may also refrain from updating the preset volume value of a voice category. That is, the system memory 203 may store, for each voice category, control information indicating whether to perform the update process described in FIG. 13 according to the voice category and volume value setting information transmitted by the voice interaction service 2.
  • When the communication control unit 204 of the electronic device 1 receives the voice category and volume value setting information transmitted by the voice interaction service 2, it determines, based on the control information corresponding to the voice category, whether to update the volume value applicable to the voice output of the specified category. If yes, it updates the volume value applicable to the voice output of that voice category based on the voice category and volume value setting information; if not, no update is performed.
  • the control information for indicating whether to update based on the voice category and volume value setting information transmitted by the voice interactive service 2 is simultaneously stored in the electronic device 1. For example, taking the voice category identifier numbered 001 as an example, if the changeable mark corresponding to the voice category is “not available”, it means that even if the volume value setting information for the voice category from the voice interactive service 2 is received, It is also impossible to change the volume value; for example, the voice category identifier numbered 004, and the changeable label corresponding to the voice category is “OK”, indicating that the volume from the voice interaction service 2 is received for the voice category. For the value setting information, the volume value can be changed. The specific change process is similar to that of FIG. 12 and will not be described here.
  • Fig. 15 is a flow chart showing whether or not to implement volume change of each category.
  • the electronic device 1 After receiving the volume value setting update notification of the voice category, that is, after receiving the voice category and the volume value setting information corresponding to the voice category, the electronic device 1 can determine whether the volume value corresponding to the voice category can be updated, and if so, Then, according to the category and volume value setting information, the volume value applicable when the category is output is updated, and if not, the volume value applicable when the category is output is not updated.
  • In some embodiments, an electronic device includes one or more processors and one or more memories in which preset instructions are stored; the processor can read and execute the instructions in the memory in order to:
  • receive utterance control data sent from a server, the utterance control data including at least an utterance volume value and uttered voice data;
  • play the uttered voice data using the utterance volume value;
  • play the uttered voice data at the utterance volume value applicable to it, as determined from the utterance voice ID.
  • In some embodiments, an electronic device includes one or more processors and one or more memories in which preset instructions are stored; the processor can read and execute the instructions in the memory in order to:
  • receive utterance control data sent from a server, the utterance control data including at least uttered voice data and a voice category identifier;
  • receive a voice category and volume value setting information for that voice category, and update the volume value applicable to the voice category according to the voice category and the volume value setting information.
  • In some embodiments, an electronic device includes one or more processors and one or more memories in which preset instructions are stored;
  • the memory stores control information indicating whether updates are to be performed upon receiving a voice category and the volume value setting information of that voice category;
  • the processor is configured to read and execute the preset instructions stored in the memory,
  • and, upon receiving a voice category and the volume value setting information of that voice category, to determine whether to update according to the control information; if so, to update the volume value applicable to the voice category according to the specified category and the volume value setting information.
  • In some embodiments, a voice interaction terminal includes one or more processors and one or more memories; the memory stores preset instructions, and the processor is configured to read and execute the preset instructions stored in the memory,
  • to transmit, when the trigger word detection unit detects the trigger word, at least the portion of the voice located after the trigger word to the server;
  • to receive utterance control data sent from the server, the utterance control data including at least an utterance volume value and uttered voice data.
  • In some embodiments, a voice interaction terminal includes one or more processors and one or more memories storing categories of output voice types and the volume values applicable to the voice output of each category, together with preset instructions; the processor is configured to read and execute the preset instructions stored in the memory,
  • to transmit, when the trigger word detection unit detects the trigger word, at least the portion of the voice located after the trigger word to the server;
  • to receive utterance control data sent from the server, the utterance control data including at least uttered voice data and a voice category identifier.
  • The memory in the above embodiments may be a non-volatile memory readable by the processor.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)

Abstract

An object of embodiments of the present invention is to provide an electronic device which does not use the volume value set in the electronic device when outputting information provided autonomously by a voice interaction service, but instead outputs it at the volume value notified by the voice interaction service. The solution is an electronic device which transmits voice input from outside to the voice interaction service via a network, wherein the utterance unit outputs the content of the utterance control data as voice at the volume value attached by the voice interaction service when it transmitted the utterance control data.

Description

Electronic device and electronic device control method
This application claims priority to Japanese Patent Application No. 2018-045903, entitled "Voice interaction terminal and voice interaction terminal control method" and filed with the Japan Patent Office on March 13, 2018, the entire contents of which are incorporated herein by reference.
Technical field
Embodiments of the present invention relate to an electronic device and a control method of the electronic device.
Background
A voice interaction system collects the voice uttered by a user with, for example, a microphone, parses the collected voice through voice recognition processing so as to identify the content uttered by the user, and provides the user with a response corresponding to the identified content. Such a voice interaction system broadly comprises two parts: a voice interaction service part and an electronic device part.
The voice interaction service part takes the content uttered by the user as input, parses the input content through voice recognition processing, and provides the user with a response corresponding to the parsing result.
The electronic device part inputs the content of the user's utterance collected by a microphone to the voice interaction service as voice data, outputs the content of the response from the voice interaction service to the user as voice, and controls peripheral devices.
In addition, the voice interaction service sometimes provides information to the electronic device autonomously, rather than as a response to input of content uttered by the user.
Prior art documents
Patent Document 1: Japanese Laid-Open Patent Publication No. 2017-122930
Summary of the invention
Problems to be solved by the invention
The volume of the voice output by an electronic device is generally set by the user. Depending on the type of electronic device, there are also devices in which, in addition to the voice interaction volume, a separate volume can be set for each function, such as the volume for reminder functions like wake-up alarms.
On the other hand, when the voice interaction service autonomously provides information to the electronic device and the electronic device outputs the provided information as voice, the voice interaction volume value preset in the electronic device is generally used.
The content of a service provided autonomously by the voice interaction service is information of high urgency or high importance to the user. Yet even when such highly urgent or highly important information is output as voice, the electronic device outputs it at the usual voice interaction volume value.
Therefore, even when highly urgent or highly important information is to be conveyed to the user, if the usual voice interaction volume value is set low, the voice can only be output at that low volume value, and there is the problem that the user cannot recognize the urgency or importance of the output voice.
Accordingly, an object of embodiments of the present invention is to provide a voice interaction system composed of a voice interaction service and a voice interaction device, in which the electronic device does not output information provided autonomously by the voice interaction service at the voice interaction volume value preset in the electronic device, but instead outputs it at the utterance volume value notified by the server.
Solution to the problems
An electronic device according to embodiments of the present invention is an electronic device which transmits voice input from outside to a server, and includes: a voice input unit which collects the voice input from outside; a trigger word detection unit which detects a trigger word in the voice input through the voice input unit; a control unit which, when the trigger word detection unit detects the trigger word, transmits at least the portion of the voice located after the trigger word to the server; an utterance unit which outputs as voice the response transmitted by the server in reply to the voice transmitted through the control unit, or the content of utterance control data transmitted autonomously by the server independently of the voice transmitted from the control unit; and a recording unit which stores categories indicating output voice types and the volume value applicable to the voice output of each category. The utterance unit reads the category attached by the server when transmitting the utterance control data, reads the volume value corresponding to the attached category from the recording unit, and outputs the content of the utterance control data as voice at the corresponding volume value.
Brief description of the drawings
Fig. 1 is a schematic diagram of a voice interaction system to which an embodiment of the present invention is applied;
Fig. 2 is a detailed structural diagram of the electronic device and the voice interaction service shown in Fig. 1;
Fig. 3 is a schematic diagram of an example of the processing sequence performed when, in the electronic device and voice interaction service shown in Fig. 2, the electronic device 1 that has recognized the trigger word uttered by the user 5 performs interactive processing with the voice interaction service A2-1;
Fig. 4 is a schematic diagram of an example of the processing sequence when, in the electronic device and voice interaction service shown in Fig. 2, the voice interaction service A performs autonomous processing in which it autonomously sends information to the electronic device;
Fig. 5A shows an example of the format of the utterance control data when the voice interaction service A sends, in a single data block, the uttered voice data and the utterance volume value to be used when the utterance unit of the electronic device 1 plays that uttered voice data;
Fig. 5B shows an example of the format of the utterance control data when the voice interaction service A sends the uttered voice data and the utterance volume value to be used when the utterance unit of the electronic device 1 plays that uttered voice data as separate data blocks;
Fig. 6A is a schematic diagram of the relationship between the value set for the utterance volume value 502 of the data format shown in Figs. 5A and 5B and the volume value at which the utterance unit 206 of the electronic device 1 plays the voice data, when the set value is a numeric value;
Fig. 6B is a schematic diagram of the relationship between the value set for the utterance volume value 502 of the data format shown in Figs. 5A and 5B and the volume value at which the utterance unit 206 of the electronic device 1 plays the voice data, when the set value is an identifier;
Fig. 7 is a schematic diagram of the processing flow when the electronic device receives utterance control data including the volume value shown in Fig. 5;
Fig. 8A is a schematic diagram of one example of how the volume value changes when the utterance unit 206 outputs utterance data, when the autonomous processing sequence shown in Fig. 4 is performed during the interactive processing sequence shown in Fig. 3;
Fig. 8B is a schematic diagram of another example of how the volume value changes when the utterance unit 206 outputs utterance data, when the autonomous processing sequence shown in Fig. 4 is performed between the interactive processing sequences shown in Fig. 3;
Fig. 8C is a schematic diagram of yet another example of how the volume value changes when the utterance unit 206 outputs utterance data, when the autonomous processing sequence shown in Fig. 4 is performed during the interactive processing sequence shown in Fig. 3;
Fig. 9A is a schematic diagram of an example of how the volume value changes when the utterance unit 206 outputs uttered voice data, when the voice interaction service A2-1, after an external event A900, continues with an autonomous processing sequence corresponding to a further external event B910;
Fig. 9B is a schematic diagram of an example of how the volume value changes when the utterance unit 206 outputs uttered voice data, when the electronic device 1 can switch between the voice interaction service A2-1 and the voice interaction service B2-2 and autonomous processing sequences are performed in response to external events in the respective voice interaction services;
Fig. 10 shows an example of a data block of a volume value change notification for each category;
Fig. 11 shows an example of a data block of an output voice notification using a voice category identifier (IDC);
Fig. 12 shows example 1 of a volume value management table for each category;
Fig. 13 shows an example of the steps of a volume value change notification;
Fig. 14 shows example 2 of a volume value management table for each category;
Fig. 15 is a flow chart for judging whether to implement the volume change of each category.
Description of reference numerals
1: electronic device               2: voice interaction service
3: network                         201: trigger word detection unit
202: control unit                  203: system memory
204: communication control unit    205: voice input unit
206: utterance unit                207: display unit
208: operation buttons             261: voice recognition system
262: intent understanding system   263: interaction processing system
265: autonomous processing system
Detailed description of the embodiments
Specific embodiments of the present invention are described below with reference to the drawings.
An embodiment of the present invention provides an electronic device including a communication receiving unit and an utterance unit. The communication receiving unit receives utterance control data sent from a server, the utterance control data including at least an utterance volume value and uttered voice data; the utterance unit plays the uttered voice data at the utterance volume value. Specifically, in some embodiments of the present invention, the electronic device may be a mobile phone, a voice interaction terminal such as a smart speaker, or an electronic device that merely receives utterance control data such as information sent by a server and broadcasts that utterance control data.
The server in embodiments of the present invention may be a server capable of interacting with the electronic device, for example a voice interaction service that performs voice interaction with a voice interaction terminal such as a smart speaker, or it may be a server that merely sends utterance control data such as notifications to the electronic device.
By way of example, some embodiments of the present invention provide a voice interaction terminal, for example a smart speaker. Fig. 1 is a schematic diagram of a voice interaction system including the electronic device 1 provided in some embodiments of the present invention.
This voice interaction system is composed of, for example, the electronic device 1 placed in a room 4 and the voice interaction service 2 residing in the cloud; the electronic device 1 and the voice interaction service 2 can communicate with each other via the network 3.
The electronic device 1 can also communicate with the lamp 10, the air conditioner 11 and the recording/playback device 12 installed in the room 4 via a short-range wireless communication system such as Bluetooth (registered trademark), ZigBee (registered trademark) or Wi-Fi. In addition, the electronic device 1 can control peripheral devices through communication methods that do not require pairing, such as infrared communication. The electronic device 1 can furthermore communicate with electronic devices other than those shown here.
The voice interaction service 2 comprises two voice interaction services, the voice interaction service A2-1 and the voice interaction service B2-2. Which of the two is used is determined by the trigger word uttered by the user.
Although the example of Fig. 1 shows the voice interaction service 2 having the two voice interaction services A2-1 and B2-2, there may, for example, be only one voice interaction service, or three or more voice interaction services.
When the user speaks to the electronic device 1, the electronic device 1 transmits the voice data of the user's utterance, collected by an attached sound pickup such as a microphone, to the voice interaction service 2 via the network 3.
Having received the voice data sent from the electronic device 1, the voice interaction service 2 parses the received voice data and generates a response based on the parsed content. After generating the response, the voice interaction service 2 sends the generated response to the electronic device 1 via the network 3.
The responses generated by the voice interaction service 2 are of two kinds: voice responses and command responses. A voice response is a response generated by the voice interaction service 2 based on the voice data input from the electronic device 1. A command response is a command, based on the voice data input from the electronic device 1, for controlling an electronic device (component) belonging to the electronic device 1 or a peripheral device (peripheral component) connected to the electronic device 1 through a short-range wireless communication system or the like. An electronic device (component) belonging to the electronic device 1 is, for example, an attached camera. A peripheral device connected to the electronic device 1 through a short-range wireless communication system or the like is, for example, the lamp 10 or the air conditioner 11.
The response content of a voice data response is a reply matched to what the user said to the electronic device 1: for example, for an utterance such as "Good morning", a reply such as "Good morning. How are you feeling today?"; or, for a question such as "If I take the Shinkansen to Osaka now, what time will I arrive?", an answer such as "If you leave 30 minutes from now, you will arrive in Osaka before 8 p.m.".
After the electronic device 1 obtains a response from the voice interaction service 2, if the response is a voice data response, it can output the content of the response as voice through, for example, an attached speaker. The user can thus hear the voice interaction system's response to what he or she said.
The response content of a command response includes a command. For example, when the user says "turn on the air conditioner" to the electronic device 1, the response content of the command response of the voice interaction service 2 may include the command "component = air conditioner 11, operation = ON, mode = cooling, settings = temperature 26 degrees, maximum fan speed". When the user says "turn on the light", the response content may include the command "component = lamp 10, operation = ON".
After obtaining a response from the voice interaction service 2, if the response is a command response, the electronic device 1 controls the component that is the control target contained in the command, as sketched below. For example, when the content of the command includes "component = air conditioner 11, operation = ON, mode = cooling, settings = temperature 26 degrees, maximum fan speed", the electronic device 1 starts the air conditioner 11 with a setting of 26 degrees and maximum fan speed through control of its internal short-range wireless communication system such as Wi-Fi, ZigBee or Bluetooth.
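As a minimal sketch, the dispatch of such a command response might look like the following; the field names and the Transport stub are illustrative assumptions, not a format defined by this disclosure.

```python
# Minimal sketch: routing a parsed command response to a peripheral
# device over a short-range link. All names here are placeholders.

class Transport:
    """Stand-in for a ZigBee/Bluetooth/Wi-Fi handle to one peripheral."""
    def __init__(self, name: str):
        self.name = name

    def send(self, settings: dict) -> None:
        print(f"{self.name} <- {settings}")

TRANSPORTS = {"air_conditioner_11": Transport("AC"),
              "lamp_10": Transport("lamp")}

def dispatch_command_response(response: dict) -> None:
    transport = TRANSPORTS[response["component"]]
    # Forward every remaining key/value pair as a device setting.
    transport.send({k: v for k, v in response.items() if k != "component"})

dispatch_command_response({"component": "air_conditioner_11",
                           "operation": "ON", "mode": "cooling",
                           "temperature_c": 26, "fan": "max"})
```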
For an utterance by the user to the electronic device 1 such as "play content BBB of video service AAA", the content of the command response may include, for example, "play from www.xxxxxx.co.jp/musicBBB.wav"; it may also be composed of a command part, "play", and a part converted into text data based on the content of the user's utterance, "www.xxxxxx.co.jp/musicBBB.wav".
After obtaining a response from the voice interaction service 2, if the response is a command response that includes text data, the electronic device 1 interprets the text data part in addition to interpreting the command, and controls the component that is the control target. For example, when the content of the command is "play from www.xxxxxx.co.jp/musicBBB.wav", the electronic device 1 can fetch the data at www.xxxxxx.co.jp/musicBBB.wav and play the fetched data on the electronic device 1.
In this way, the voice interaction service 2 can provide information based on interaction with the user.
In addition, even when there is no input of voice data from the electronic device 1, the voice interaction service 2 can autonomously provide information to the electronic device 1.
The information provided autonomously by the voice interaction service 2 may be, for example, information about a bus approaching a bus stop near the user, weather information such as rain clouds approaching the user's residential area, information matched to the user's personal needs, or information of high public relevance such as an emergency earthquake bulletin or a tsunami warning.
Fig. 2 is a detailed structural diagram of the electronic device 1 and the voice interaction service 2 shown in Fig. 1. The electronic device 1 and the voice interaction service 2 can communicate with each other via the network 3.
The electronic device 1 is composed of a trigger word detection unit 201; a control unit 202 which controls the electronic device as a whole; a system memory 203 which contains the programs for controlling the electronic device 1 and working memory; a communication control unit 204 for communicating with the voice interaction service 2 or peripheral components via the network 3; a voice input unit 205 which collects the user's utterances; an utterance unit 206 for outputting voice data responses; a display unit 207 which displays the state of the electronic device 1, the content of responses, a function setting interface, and the like; and operation buttons 208 operated by the user.
The trigger word detection unit (which may also be called a keyword detection unit) 201 is a processing unit that detects a trigger word in the content of the user's utterances.
A trigger word is a prescribed word uttered by the user to start interaction processing with the electronic device 1. When the trigger word detection unit 201 detects that the user has uttered the prescribed keyword, i.e. the trigger word, the electronic device 1 treats the content of the user's utterance located after the trigger word as speech addressed to the electronic device 1, and continues to process the content of the user's utterance.
The trigger words to be detected by the trigger word detection unit 201 are stored in a trigger word storage area (not shown) of the system memory 203. Whichever of the configured trigger words the user utters, the trigger word detection unit 201 can always detect it. Furthermore, only when a configured trigger word has been detected does it notify the control unit 202 that the configured trigger word has been detected, so that the subsequent content of the user's utterance can be processed; a sketch of this gating step follows. The control unit 202, having received the notification, processes the subsequent content of the user's utterance while exchanging data with the voice interaction service 2.
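As a minimal sketch under the assumption that the recognized utterance is already available as text, the gate might be written as follows; the trigger words and service names are placeholders.

```python
# Minimal sketch of the trigger-word gate: only speech located after a
# configured trigger word is forwarded to the corresponding service.

TRIGGER_WORDS = {"trigger_a": "service_A", "trigger_b": "service_B"}

def route_utterance(utterance: str):
    """Return (service, speech after the trigger) or None if no trigger."""
    for trigger, service in TRIGGER_WORDS.items():
        if utterance.startswith(trigger):
            # Only the speech located after the trigger word is forwarded.
            return service, utterance[len(trigger):].strip()
    return None  # no trigger word detected: the utterance is ignored

# route_utterance("trigger_a turn on the light")
# -> ("service_A", "turn on the light")
```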
The communication control unit 204 is a processing unit that controls communication with the voice interaction service 2. The communication control unit 204 monitors the state of communication with the voice interaction service 2 via the network 3 and notifies the control unit 202 whether communication with the voice interaction service 2 is possible. The communication control unit 204 may also include a short-range wireless communication system such as Bluetooth, ZigBee or Wi-Fi, or a communication method such as infrared communication.
The voice input unit 205 is a processing unit that includes a sound pickup, such as a microphone, and can collect the voice uttered by the user.
The utterance unit 206 is a processing unit that, when the response generated by the voice interaction service 2 is a voice response, outputs the data in the voice response sent from the voice interaction service 2 as voice. Specifically, in some embodiments the data in the voice response may be voice data, which the utterance unit 206 outputs as voice. In some embodiments the data in the voice response may be text data: the utterance unit 206, having obtained the text data, can convert it into voice data using a speech synthesis function and output it as voice. Moreover, even when the voice interaction service 2 provides information autonomously, the utterance unit 206 likewise performs the processing of outputting the content of the transmitted data in the form of voice.
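A minimal sketch of this text-or-audio handling follows; synthesize() and play_pcm() are hypothetical stand-ins for the device's speech-synthesis and audio-output functions, not APIs defined by this disclosure.

```python
# Minimal sketch of the utterance unit's handling: text data is converted
# via speech synthesis first; voice data is played directly.

def speak_response(payload, volume: int, synthesize, play_pcm):
    if isinstance(payload, str):       # text data: convert via TTS first
        pcm = synthesize(payload)
    else:                              # already voice data (e.g. PCM bytes)
        pcm = payload
    play_pcm(pcm, volume)

# Example with trivial stand-ins:
speak_response("Good morning.", 3,
               synthesize=lambda text: text.encode("utf-8"),
               play_pcm=lambda pcm, vol: print(f"play {len(pcm)}B @ vol {vol}"))
```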
The voice interaction service 2 has two voice interaction services, the voice interaction service A2-1 and the voice interaction service B2-2. The voice interaction service A2-1 includes a voice recognition system 261 which recognizes the voice data sent from the electronic device 1 and converts it into utterance text, a voice intent understanding system 262 which understands the user's utterance intent from the utterance text, and an interaction processing system 263 and an autonomous processing system 265 which generate responses to the content of the user's utterance understood by the voice intent understanding system 262. In some embodiments, the voice interaction service A2-1 may be constituted by these systems.
Unlike the processing based on interaction with the electronic device 1 through the voice recognition system 261 and the interaction processing system 263, the autonomous processing system 265 performs processing that autonomously provides information to the electronic device 1 in response to external events.
Like the voice interaction service A2-1, the voice interaction service B2-2 also includes a voice recognition system 271 which recognizes the voice data sent from the electronic device 1 and converts it into text data, a voice intent understanding system 272 which understands the intent of the character string converted from the voice data into text data, and an interaction processing system 273 and an autonomous processing system 275 for generating responses to the content of the user's utterance understood by the voice intent understanding system 272. In some embodiments, the voice interaction service B2-2 may likewise be constituted by these systems.
The voice interaction service A2-1 and the voice interaction service B2-2 each have the characteristics of the voice interaction service processing they are good at, and may each excel in different processing fields (fields such as electrical appliances, medicine, agriculture or sports).
Although the example of Fig. 2 shows the voice interaction service 2 containing the two voice interaction services A2-1 and B2-2, it may, for example, contain only one voice interaction service, or three or more voice interaction services.
Fig. 3 is a schematic diagram of the processing sequence when, in the electronic device 1 and voice interaction service 2 shown in Fig. 2, the electronic device 1 that has recognized the trigger word uttered by the user 5 performs interactive processing with the voice interaction service A2-1. It is assumed that when the trigger word uttered by the user 5 is judged to be trigger word A, the electronic device 1 is preset to use the voice interaction service A2-1 to generate the response to the utterance of the user 5.
When the user 5 speaks (S310, S311), the voice input unit 205 of the electronic device 1, which collects the utterance of the user 5, sends the collected voice as voice data to the trigger word detection unit 201. The trigger word detection unit 201 judges through voice recognition processing whether the voice data sent from the voice input unit 205 matches a trigger word stored in advance in the system memory 203.
If, as a result of the judgment, what the user 5 said in S310 and S311 is judged to be trigger word A, the electronic device 1 issues an interaction start instruction to the voice interaction service A2-1 (S312). The voice interaction service A2-1, having received the interaction start instruction (S312), prepares to parse the subsequent voice data sent from the electronic device 1.
After speaking in S310 and S311, the user 5 continues speaking to the electronic device 1 (S313, S314). After recognizing that the series of utterances of the user 5 (S313, S314) has ended, the electronic device 1, which has collected the voice of the user's utterances, sends the collected voice of the utterances as voice data to the voice interaction service A2-1 (S315). Moreover, even while the user 5 is still speaking, the electronic device 1 may send the voice of the utterances collected so far to the voice interaction service A2-1 successively as voice data.
The voice data sent by the electronic device 1 to the voice interaction service A2-1 in S315 may be only the voice data of the user's utterances in S313 to S314, may be voice data that includes the trigger word A of S310 to S311, or may be the voice data of the utterances in any interval of the user's utterances between S310 and S314.
The voice interaction service A2-1, having received the voice data sent from the electronic device 1, parses the received voice data and generates a response corresponding to the parsing result. Having finished generating the response, the voice interaction service A2-1 sends the generated response, i.e. the utterance control data (2), to the electronic device 1 (S316).
The electronic device 1, having received the response from the voice interaction service A2-1, acts based on the content of the response. The example of Fig. 3 is the case where the response generated by the voice interaction service A2-1, i.e. the utterance control data (2), is voice data. The electronic device 1, having received (S316) the utterance control data (2), outputs the content of the response as voice through the utterance unit 206 (S317, S318). The utterance unit 206 outputs the content of the response at the voice interaction volume value preset in the electronic device 1.
After finishing outputting the response, the electronic device 1 sends to the voice interaction service A2-1 an interaction end notification indicating that the utterance output has ended (S319).
During the period from utterance start (2) S317 to utterance end (2), the volume value output by the utterance unit 206 of the electronic device 1 is the usual voice interaction volume value preset in the electronic device 1.
Fig. 4 is a schematic diagram of an embodiment of the processing sequence when, in the electronic device 1 and voice interaction service 2 shown in Fig. 2, the voice interaction service A2-1 performs autonomous processing in which it autonomously sends information to the electronic device 1.
After receiving an event from outside (S400), the voice interaction service A2-1 sends utterance control data (1) corresponding to the received event to the electronic device 1 (S401). After receiving the utterance control data (1) (S401), the electronic device 1 performs the utterance corresponding to the received utterance control data (1) (S402). After finishing the utterance corresponding to the received utterance control data (1) (S403), the electronic device 1 sends the end of the utterance to the voice interaction service A2-1 as an utterance end notification (S404).
Here, the volume value of the utterance output by the utterance unit 206 of the electronic device 1 during the period from the start of utterance (1) (S402) to the end of utterance (1) (S403) is the value specified by the voice interaction service A2-1.
In some embodiments, in order to specify the volume value output by the utterance unit 206 of the electronic device 1, the voice interaction service A2-1 needs to send the utterance volume value from the voice interaction service A2-1 to the electronic device 1.
Figs. 5A and 5B are examples of the format when, in the autonomous processing sequence shown in Fig. 4, the utterance control data sent by the voice interaction service A2-1 to the electronic device 1 includes an utterance volume value.
Fig. 5A shows an example of the format 500A of the utterance control data when the voice interaction service A2-1 sends, in a single data block, the uttered voice data and the utterance volume value to be used when the utterance unit 206 of the electronic device 1 plays that uttered voice data.
The utterance voice ID 501 is the identification number of the utterance control data.
The utterance volume value 502 is the volume value at which the voice data of the uttered voice data 503 is played by the utterance unit 206 of the electronic device 1.
The uttered voice data 503 is the voice data played by the utterance unit 206 of the electronic device 1.
In this case, when the data block including the uttered voice data 503 is played by the utterance unit 206, the electronic device 1 can speak using the utterance volume value 502 contained in the data block with the same utterance voice ID 501.
Fig. 5B shows an example of the format 500B of the utterance control data when the voice interaction service A2-1 sends the uttered voice data and the utterance volume value to be used when the utterance unit 206 of the electronic device 1 plays that uttered voice data as separate data blocks.
The specific composition of this format is:
First data block: includes the utterance voice ID 501 and the utterance volume value 502;
Second data block: includes the utterance voice ID 501 and the uttered voice data 503.
In this case, the electronic device 1 detects the data block composed of the utterance voice ID 501 and the utterance volume value 502, as well as the data block including the uttered voice data 503 in which the same utterance voice ID has been set; when the uttered voice data 503 contained in the detected data block is played by the utterance unit 206 of the electronic device 1, it speaks using the utterance volume value 502. That is, the electronic device determines, from the utterance voice ID, the utterance volume value applicable to the uttered voice data, and the utterance unit plays the uttered voice data at the utterance volume value applicable to that uttered voice data.
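As a minimal sketch, the handling of the two delivery formats might look like the following; the field names are illustrative assumptions, and the essential point is that in the split format of Fig. 5B the volume block and the audio block are paired by their shared utterance voice ID before playback.

```python
# Minimal sketch of the formats of Figs. 5A/5B: a single block carries
# both volume and audio; split blocks are matched by utterance voice ID.

pending_volumes = {}   # utterance voice ID -> volume (first block)
pending_audio = {}     # utterance voice ID -> audio bytes (second block)

def on_block(block: dict):
    uid = block["utterance_id"]
    if "volume" in block and "audio" in block:      # Fig. 5A: one block
        play(block["audio"], block["volume"])
        return
    if "volume" in block:                           # Fig. 5B: first block
        pending_volumes[uid] = block["volume"]
    if "audio" in block:                            # Fig. 5B: second block
        pending_audio[uid] = block["audio"]
    if uid in pending_volumes and uid in pending_audio:
        play(pending_audio.pop(uid), pending_volumes.pop(uid))

def play(audio: bytes, volume: int):
    # Stand-in for the utterance unit's playback at the given volume.
    print(f"playing {len(audio)} bytes at volume {volume}")

# Split delivery (Fig. 5B): the two blocks share utterance voice ID 7.
on_block({"utterance_id": 7, "volume": 4})
on_block({"utterance_id": 7, "audio": b"\x00\x01"})
```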
In addition, the volume value that the voice interaction service A2-1 sets for the utterance volume value 502 may be a numeric value or a preset identifier.
An identifier does not express the volume value settable by the utterance unit 206 of the electronic device 1 absolutely, with numeric values such as 1, 2 or 3, but expresses the volume value relatively, with terms such as "normal" or "loud".
When the utterance volume value 502 is set to an identifier, the utterance unit 206 of the electronic device 1 outputs the voice data at the volume obtained by replacing the identifier's value with a numeric value.
Although Figs. 5A and 5B show examples in which the utterance control data sent by the voice interaction service A2-1 to the electronic device 1 in the autonomous processing sequence of Fig. 4 contains a volume value, this format may also be used in the interactive processing sequence shown in Fig. 3. For example, the voice interaction service A2-1 may set the utterance volume value 502 of the utterance control data sent to the electronic device 1 to invalid (NULL). When the utterance unit 206 of the electronic device 1 that has received the utterance control data recognizes that the utterance volume value 502 is set to invalid (NULL), it may simply speak using the volume value preset in the electronic device 1.
Fig. 6A is a schematic diagram of the relationship between the value set for the utterance volume value 502 of the data format shown in Figs. 5A and 5B and the volume value at which the utterance unit 206 of the electronic device 1 speaks, when the set value is a numeric value.
601 is the value set by the voice interaction service A2-1 for the utterance volume value 502 of the utterance control data. 602 is the volume value at which the utterance unit 206 of the electronic device 1 plays the voice data.
Combination 610 of the value set for the utterance volume value 502 and the volume value at which the utterance unit 206 outputs the voice data is an example in which the utterance unit 206 of the electronic device 1 uses the value 5 set as the utterance volume value 502 and speaks at volume value 5.
Combination 611 is an example in which, for the value 5 set as the utterance volume value 502, the utterance unit 206 of the electronic device 1 speaks at volume value 4. This is an example in which, because the volume value set by the voice interaction service A2-1 exceeds, for instance, the upper limit of the volume value range of the electronic device 1, the utterance unit 206 substitutes 4, the upper limit of the settable volume values.
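A minimal sketch of this clamping behavior, assuming the example device's settable range of 1 to 4:

```python
# Minimal sketch of the clamping of Fig. 6A: an out-of-range utterance
# volume value is replaced by the nearest settable limit.

def clamp_volume(requested: int, lo: int = 1, hi: int = 4) -> int:
    """Replace an out-of-range utterance volume value with the range limit."""
    return max(lo, min(hi, requested))

# clamp_volume(5) -> 4, matching combination 611 above
```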
Fig. 6B is a schematic diagram of the relationship between the value set for the utterance volume value 502 of the data format shown in Figs. 5A and 5B and the volume value at which the utterance unit 206 of the electronic device 1 plays the voice data, when the set value is an identifier.
In the example of Fig. 6B, it is assumed that the identifier has three levels of values: normal, loud and quiet.
It is further assumed that the utterance unit 206 of the electronic device 1 can set the volume value to seven levels from 1 to 7, and can also be set with the three identifier levels normal, loud and quiet. Here, when the volume value set as the utterance volume value 502 is the identifier "normal", the utterance unit 206 substitutes the numeric value 4 to set the volume value and plays the voice data. When the set volume value is the identifier "loud", the utterance unit 206 substitutes the numeric value 5 to set the volume value and plays the voice data. When the set volume value is the identifier "quiet", the utterance unit 206 substitutes the numeric value 3 to set the volume value and plays the voice data.
Combination 620 of the identifier set as the utterance volume value 502 and the volume value at which the utterance unit 206 outputs the voice data is an example in which the utterance unit 206 of the electronic device 1 replaces the identifier value "loud" set as the utterance volume value 502 with the numeric value 5 to set the volume value, and outputs the voice data.
Combination 621 is an example in which the utterance unit 206 of the electronic device 1 replaces the identifier value "quiet" set as the utterance volume value 502 with the numeric value 3 to set the volume value, and outputs the voice data.
In the example of Fig. 6B, the identifier "normal" becomes 4 when replaced with a numeric value, "loud" becomes 5, and "quiet" becomes 3; these are merely examples and are not limiting. For example, among the numeric values corresponding to each identifier value, the value corresponding to "normal" may be 4, the value corresponding to "loud" may be 7, and the value corresponding to "quiet" may be 1. The identifier may also have, for example, five levels of values: slightly quiet, normal, slightly loud, loud and maximum.
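A minimal sketch of this identifier substitution, mirroring the example mapping above (normal to 4, loud to 5, quiet to 3); a device is free to define a different mapping or more identifier levels.

```python
# Minimal sketch of the identifier substitution of Fig. 6B: a relative
# identifier is replaced by a device-specific numeric volume value.

IDENTIFIER_TO_VOLUME = {"normal": 4, "loud": 5, "quiet": 3}

def resolve_volume(value) -> int:
    """Accept a numeric volume value or a relative identifier."""
    if isinstance(value, int):
        return value
    return IDENTIFIER_TO_VOLUME[value]

# resolve_volume("loud") -> 5 (combination 620)
# resolve_volume("quiet") -> 3 (combination 621)
```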
In this way, when autonomously providing information to the electronic device 1, the voice interaction service 2 can set the volume value at which the electronic device 1 speaks in accordance with the content to be provided, based on, for example, its urgency, importance or suddenness. The user of the electronic device 1 can thereby easily recognize the urgency, importance or suddenness of the provided information, which improves usability for users of the voice interaction system.
Furthermore, when setting a volume value for the utterance volume value 502, the voice interaction service 2 may choose between setting a numeric value and setting an identifier based on the content of the autonomously provided information. For example, an identifier may be set for information of high urgency and high public relevance. For such information, for instance when providing an emergency earthquake bulletin to many electronic devices 1 with different terminal specifications, the voice interaction service 2 can complete its transmission processing much faster by setting an identifier such as "maximum" than by setting a volume value for each electronic device 1 as a numeric value conforming to each terminal's specification.
Fig. 7 is a schematic diagram of the processing flow when the electronic device 1 receives utterance control data including the volume value shown in Fig. 5.
After receiving the utterance control data, the communication control unit 204 of the electronic device 1 starts reception processing (S700).
The communication control unit 204 parses the received utterance control data and obtains the utterance voice ID 501, the utterance volume value 502 and the uttered voice data 503 (S701). The communication control unit 204 passes the obtained utterance volume value 502 and uttered voice data 503 to the utterance unit 206.
The utterance unit 206 outputs the passed uttered voice data 503 using the passed utterance volume value 502 (S702).
In the utterance unit 206, the uttered voice data 503 output using the passed utterance volume value is limited to uttered voice data 503 having the same utterance voice ID. After the utterance of the voice data is completed, the utterance unit 206 ends the processing (S703).
In this way, the utterance unit 206 of the electronic device 1 utters the voice data of the uttered voice data 503 using the utterance volume value 502 contained in the transmitted utterance control data only when autonomous information has been sent from the voice interaction service 2.
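A minimal sketch of this receive flow, including the NULL fallback described earlier; parse_block() and play() are hypothetical stand-ins for the communication control unit's format parsing and the utterance unit's playback.

```python
# Minimal sketch of the receive flow of Fig. 7: parse the utterance
# control data (S701), fall back to the preset volume when the utterance
# volume value is NULL, and speak (S702), then end (S703).

PRESET_VOLUME = 3  # the device's preset voice-interaction volume value

def on_utterance_control_data(raw: bytes, parse_block, play) -> None:
    block = parse_block(raw)               # -> {"id", "volume", "audio"}
    volume = block.get("volume")
    if volume is None:                     # utterance volume value is NULL
        volume = PRESET_VOLUME             # fall back to the preset value
    play(block["audio"], volume)
```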
Next, how the volume value changes when the utterance unit 206 outputs uttered voice data is explained through combinations of the interactive processing sequence shown in Fig. 3 and the autonomous processing sequence shown in Fig. 4.
Fig. 8A is a schematic diagram of one example of how the volume value changes when the utterance unit 206 outputs utterance data, when the autonomous processing sequence shown in Fig. 4 is performed during the interactive processing sequence shown in Fig. 3.
The processing from S800 to S809 in Fig. 8A is the same as the processing from S310 to S319 in Fig. 3. The processing from S820 to S829 is likewise the same as the processing from S310 to S319 in Fig. 3. The processing from S810 to S814 is the same as the processing from S400 to S404 in Fig. 4.
Here, it is assumed that the volume value at which the utterance unit 206 of the electronic device 1 speaks is set to, for example, 3, i.e. the preset volume value of the utterance unit is 3. On the other hand, it is assumed that the utterance volume value contained in the utterance control data sent from the voice interaction service A2-1 in S811 is, for example, the numeric value 4.
In this case, the volume value of the utterance during the period from the start of utterance (2) (S807) to the end of utterance (2) (S808) is the volume value 3 set in the utterance unit 206. On the other hand, the volume value of the utterance during the period from the start of utterance (3) (S812) to the end of utterance (3) (S813) is the utterance volume value contained in the utterance control data (2) received by the electronic device 1 in the processing of S811, i.e. the numeric value 4.
Furthermore, the volume value of the utterance during the period from the start of utterance (4) (S827) to the end of utterance (4) (S828) is the volume value 3 set in the utterance unit 206. That is, the utterance unit 206 keeps the preset volume value that was in effect before it started playing the content of the utterance control data.
Fig. 8B is a schematic diagram of another example of how the volume value changes when the utterance unit 206 outputs utterance data, when the autonomous processing sequence shown in Fig. 4 is performed between the interactive processing sequences shown in Fig. 3.
Assume the electronic device 1 is in a microphone-muted state. The muted state is, for example, a state in which the voice input unit 205 of the electronic device 1 is set not to collect the user's utterances.
In the microphone-muted state, as shown in Fig. 8B, even if the user 5 utters trigger word A as shown in S830 to S831 and then performs utterance (1) as shown in S833 to S834, the electronic device 1 does not send an interaction start instruction to the voice interaction service A. As a result, the electronic device 1 does not return to the user 5 a response to the utterances from S830 to S831 and from S833 to S834.
In this state, when S840 to S844, which are the same as the autonomous processing sequence shown in Fig. 4, are performed, the utterance volume value during the period from the start of utterance (3) (S842) to the end of utterance (3) (S843) is the utterance volume value contained in the utterance control data (2) received by the electronic device 1 in the processing of S841, i.e. the numeric value 4.
Moreover, after the start of utterance (3) (S842) and the end of utterance (3) (S843), even if the user 5 speaks again as shown in S850 to S851 and S853 to S854, the electronic device 1 does not respond.
Fig. 8C is a schematic diagram of yet another example of how the volume value changes when the utterance unit 206 outputs utterance data, when the autonomous processing sequence shown in Fig. 4 is performed between the interactive processing sequences shown in Fig. 3.
The muted state of the electronic device may be, for example, a state in which the preset volume value used when the utterance unit 206 of the electronic device 1 speaks is 0.
In the muted state in which the volume value when the utterance unit 206 speaks is 0, as shown in Fig. 8C, the user 5 utters trigger word A as shown in S860 to S861 and then performs utterance (1) as shown in S863 to S864; even if the voice interaction service A sends the utterance control data (2) in response (S866), the electronic device 1 does not speak in accordance with that utterance control data (2). As a result, the user 5 hears no response to the utterances of S860 to S861 and S863 to S864.
In this state, when S870 to S874, which are the same as the autonomous processing sequence shown in Fig. 4, are performed, the volume value of the utterance during the period from the start of utterance (3) (S872) to the end of utterance (3) (S873) is the utterance volume value contained in the utterance control data (2) received by the electronic device 1 in the processing of S871, i.e. the numeric value 4.
Moreover, after the start of utterance (3) (S872) and the end of utterance (3) (S873), even if the user 5 speaks again as shown in S880 to S881 and S883 to S884, the electronic device 1 does not respond.
In this way, the utterance unit 206 of the electronic device 1 speaks using the utterance volume value contained in the utterance control data only in the autonomous processing sequence; in all other cases, it can control its utterances according to the volume value set in the electronic device 1 or according to a configured state such as the muted state.
Furthermore, in the voice interaction system of embodiments of the present invention, even when autonomous processing sequences are consecutive, the volume value of the voice data played by the utterance unit 206 can be set according to the utterance volume value contained in the utterance control data of each sequence.
Fig. 9A is a schematic diagram of an example of how the volume value changes when the utterance unit 206 outputs uttered voice data, when the voice interaction service A2-1, after an event from outside (S900), continues with an autonomous processing sequence corresponding to a further event from outside (S910). The processing from S900 to S904 is the same as the processing from S400 to S404 in Fig. 4. The processing from S910 to S914 is likewise the same as the processing from S400 to S404 in Fig. 4.
Here, the volume value of the utterance during the period from the start of utterance (1) (S902) to the end of utterance (1) (S903) is the utterance volume value contained in the utterance control data (1) received by the electronic device 1 in the processing of S901, for example the numeric value 4. On the other hand, the utterance volume value during the period from the start of utterance (2) (S912) to the end of utterance (2) (S913) is the utterance volume value contained in the utterance control data (2) received by the electronic device 1 in the processing of S911, for example the numeric value 2.
In addition, in the voice interaction system of embodiments of the present invention, as explained with reference to Fig. 1, the electronic device 1 can switch between multiple voice interaction services.
Fig. 9B is a schematic diagram of an example of how the volume value changes when the utterance unit 206 outputs uttered voice data, when the electronic device 1 can switch between the voice interaction service A2-1 and the voice interaction service B2-2 and autonomous processing sequences are performed in response to external events in the respective voice interaction services.
The processing from S920 to S924 is the same as the processing from S400 to S404 in Fig. 4. The processing from S930 to S934 is likewise the same as the processing from S400 to S404 in Fig. 4.
Here, the utterance volume value during the period from the start of utterance (1) (S922) to the end of utterance (1) (S923) is the utterance volume value contained in the utterance control data (1) received by the electronic device 1 in the processing of S921, for example the numeric value 4. On the other hand, the utterance volume value during the period from the start of utterance (2) (S932) to the end of utterance (2) (S933) is the utterance volume value contained in the utterance control data (2) received by the electronic device 1 in the processing of S931, for example the numeric value 2.
The processing sequence of Fig. 9B is an example in which the electronic device 1 receives the utterance control data (1) from the voice interaction service A2-1 (S921) and, while processing it (S922 to S924), receives the utterance control data (2) from the voice interaction service B2-2 (S931). In this case, since the corresponding uttered voice data and utterance volume value can be identified by the utterance voice ID contained in the received utterance control data, the electronic device 1 can also set the volume value of the voice data played by the utterance unit 206 using the separately specified utterance volume values.
Furthermore, the voice interaction system of embodiments of the present invention can use a menu displayed on the display unit 207 of the electronic device 1 to select the autonomous information provision services available through the autonomous processing sequences performed by the voice interaction service 2, or to set conditions for them. As for the selection of, or condition setting for, such autonomous information provision services, when the user enters them through the interface of an autonomous information provision menu, the entered content is registered via the network 3 in the autonomous processing system 265 of the voice interaction service 2.
When there is an event from outside, the autonomous processing system 265 refers to the kinds of registered information provision services or their conditions, and provides the information matching the registered content to the electronic device 1 of the user 5.
For example, the user 5 of the electronic device 1 can select, from among the autonomous information provision services offered by a large number of voice interaction services, the autonomous information provision services matching his or her preferences. When there is an event from outside, the autonomous processing system 265 refers to the kinds of registered information provision services and provides information of the kinds matching the registered content to the electronic device 1 of the user 5.
There are also cases where, for example, the user 5 of the electronic device 1 wants to further filter the information provided within a selected autonomous information provision service C. For instance, among the information provided by the autonomous information provision service C, the user 5 may want only information about the area around his or her residence. In that case, the user 5 can register the location information of the electronic device 1 from, for example, the interface of the autonomous information provision menu. When there is an event from outside, the autonomous processing system 265 refers to the registered conditions of the information provision service C and provides to the electronic device 1 of the user 5 only the information, among the information of the information provision service C, that matches those conditions, as sketched below.
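A minimal sketch of this filtering step follows; the registration fields and event fields are illustrative assumptions.

```python
# Minimal sketch of the autonomous processing system's filtering: an
# external event is forwarded only to users whose registered service
# selection (and optional region condition) matches it.

registrations = {
    "user5": {"services": {"service_C"}, "region": "home_area"},
}

def forward_event(event: dict) -> None:
    for user, reg in registrations.items():
        if event["service"] not in reg["services"]:
            continue                          # service not selected
        region = reg.get("region")
        if region is not None and event.get("region") != region:
            continue                          # filtered out by condition
        print(f"push to {user}'s device: {event['payload']}")

forward_event({"service": "service_C", "region": "home_area",
               "payload": "rain expected nearby"})
```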
Thus, when a voice interaction service autonomously provides information to an electronic device, the information may be of high urgency or high importance. To handle this, by providing a function with which the voice interaction service 2 specifies the volume value of the utterance of the utterance unit 206 of the electronic device 1, the volume of that utterance can be controlled based on the content provided by the voice interaction service 2, and information can thereby be provided to the user effectively.
In addition, since the user 5 can filter the autonomous information provided by the voice interaction service 2, for example by the regional relevance of the provided information, autonomous information matching the user's needs can be obtained easily.
In the above embodiments, the utterance unit 206 can output as voice the response sent by the voice interaction service 2 in reply to the voice transmitted through the control unit 202, or the content of a notification sent autonomously and independently of the voice transmitted from the control unit 202, and it outputs the content of the notification as voice using the volume value attached by the voice interaction service 2 when sending the notification, i.e. the utterance volume value accompanying the uttered voice data.
In some embodiments, the voice output can also be performed according to the voice category of the notification. For example, the system memory 203 can store the correspondence between voice categories and the volume values applicable to each voice category, so that the volume value applicable to the voice category identifier is determined according to the voice category attached by the voice interaction service 2 when sending the notification, and the content of the notification is played at that volume value.
It should be noted that in the following embodiments, the interaction among the user 5, the electronic device 1 and the voice interaction service 2, and the process by which the voice interaction service 2 generates responses or notifications, are similar to the processes in the preceding embodiments and are not repeated here. The difference is that, when outputting as voice a notification sent by the voice interaction service 2, the electronic device 1 can read the voice category attached by the voice interaction service 2 when sending the notification, determine the volume value corresponding to the voice category according to the correspondence between voice categories and their applicable volume values, and output the content of the notification as voice at the corresponding volume value.
Fig. 10 shows an example of a data block of utterance control data containing a voice category.
The correspondence between voice categories and the volume values applicable to each voice category is stored in the electronic device 1 in advance.
Here, the voice category identifier (Identifier Category, IDC) can indicate the voice category of the utterance control data, while the volume value indicates the volume value corresponding to that voice category. The volume value can be a numeric value and can be set according to actual needs; it is not limited here.
A storage device in the electronic device 1, for example the system memory 203, prestores the correspondence between voice category identifiers (IDC) and volume values.
Fig. 11 shows an example of a data block of utterance control data that includes a voice category identifier. Similar to Fig. 5A, it is an example of the format of the utterance control data when the voice interaction service A2-1 sends, in a single data block, the uttered voice data, the volume value to be used when the utterance unit 206 of the electronic device 1 plays that uttered voice data, and the voice category identifier indicating the type of that utterance.
Here, the utterance voice ID is the identification number of the utterance control data, and the voice category identifier indicates the type of the utterance. In some embodiments, the data block of the utterance control data includes the voice category identifier and the uttered voice data.
Fig. 12 shows an example of the volume value management table for each category. As described above, the recording unit of the electronic device 1, which in some embodiments may specifically be the system memory 203, can store the correspondence between voice category identifiers and the volume values applicable to each voice category.
Here, the voice category identifier indicates the voice category of the utterance and may, for example, be numbered sequentially 001, 002, 003, 004, 005; the numbering values and the number of entries here are only examples and are not limiting.
The volume value indicates the volume value applicable to the voice output of each voice category, in other words the volume value corresponding to each voice category; the larger the volume value, the louder the sound when uttered voice data of that type is output.
The application example column indicates a scenario or example to which each voice category applies.
For example, according to this volume value management table, the voice category identifier numbered 001 corresponds to a volume value of 30; this category may be, for example, normal voice interaction, i.e. during normal voice interaction the output volume is relatively low. The voice category identifier numbered 002 corresponds to a volume value of 50; this category may be, for example, timers and reminders, i.e. for timer and reminder utterances the output volume can be somewhat higher. The voice category identifier numbered 003 corresponds to a volume value of 40; this category may be, for example, news broadcasts, i.e. during a news broadcast the output volume is moderate. The voice category identifier numbered 004 corresponds to a volume value of 50; this category may be, for example, ordinary alarms, i.e. for an ordinary alarm the output volume is somewhat higher. The voice category identifier numbered 005 corresponds to a volume value of 70; this category may be, for example, emergency alerts, i.e. in the case of an emergency alert the output volume is the largest.
According to the embodiments shown in the figures above, when outputting uttered voice data, the electronic device 1 can read the voice category identifier attached by the voice interaction service 2 when sending the uttered voice data, determine the volume value corresponding to the attached voice category identifier according to the above volume value management table, and output the content of the uttered voice data as voice at the corresponding volume value.
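A minimal sketch of this lookup, with table contents mirroring the example values above:

```python
# Minimal sketch of the lookup against the volume value management table
# of Fig. 12 (category identifier -> applicable volume value).

VOLUME_TABLE = {
    "001": 30,   # normal voice interaction
    "002": 50,   # timers and reminders
    "003": 40,   # news broadcast
    "004": 50,   # ordinary alarm
    "005": 70,   # emergency alert
}

def volume_for_category(idc: str, default: int = 30) -> int:
    """Return the volume value applicable to a voice category identifier."""
    return VOLUME_TABLE.get(idc, default)
```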
The volume value corresponding to a voice category identifier is not fixed; it can be updated or changed.
Fig. 13 shows an example of the steps of a volume value change notification.
To update the volume value corresponding to a voice category, the voice interaction service 2 can send that voice category and the volume value setting information corresponding to it to the electronic device 1; the electronic device 1 can then update the volume value corresponding to the voice category according to the voice category and the volume value setting information. When sending the voice category, the voice interaction service 2 can send a voice category identifier.
As shown in Fig. 12, if what the voice interaction service 2 sends is a voice category identifier with the value 001 and the above volume value setting information is 50, then after receiving the voice category identifier and the volume value setting information, the electronic device 1 can update the volume value corresponding to the voice category numbered 001 to 50.
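A minimal sketch of this update step, reusing VOLUME_TABLE from the earlier sketch; the notification field names are illustrative assumptions.

```python
# Minimal sketch of the update of Fig. 13: a volume value setting update
# notification carries a voice category identifier and the new value.

def apply_volume_update(notification: dict, table: dict) -> None:
    idc = notification["category_id"]        # e.g. "001"
    table[idc] = notification["volume"]      # e.g. 30 -> 50

apply_volume_update({"category_id": "001", "volume": 50}, VOLUME_TABLE)
```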
In addition, after receiving the voice category and volume value setting information sent by the voice interaction service 2, the electronic device 1 may also choose not to update the preset volume value of that voice category. That is, the system memory 203 can store, for each voice category, control information indicating whether to perform the update process described with reference to Fig. 13 according to the voice category and volume value setting information sent by the voice interaction service 2. When the communication control unit 204 of the electronic device 1 receives the voice category and volume value setting information sent by the voice interaction service 2, the communication control unit 204 determines, based on the control information corresponding to that voice category, whether to update the volume value applicable to the voice output of the specified category; if so, it updates the volume value applicable to the voice output of that category according to the voice category and the volume value setting information, and if not, it does not update it.
As shown in Fig. 14, the control information indicating whether to update according to the voice category and volume value setting information sent by the voice interaction service 2 is also stored in the electronic device 1. For example, for the voice category identifier numbered 001, the changeable flag corresponding to that voice category is "not allowed", which means that even if volume value setting information for that voice category is received from the voice interaction service 2, its volume value cannot be changed. For the voice category identifier numbered 004, the changeable flag corresponding to that voice category is "allowed", which means that if volume value setting information for that voice category is received from the voice interaction service 2, its volume value can be changed; the specific change process is similar to that of Fig. 12 and is not repeated here.
Fig. 15 is a flow chart for judging whether to implement the volume change of each category. After receiving a volume value setting update notification for a voice category, i.e. after receiving a voice category and the volume value setting information corresponding to it, the electronic device 1 can judge whether the volume value corresponding to that voice category may be updated; if so, it updates the volume value applicable when that category is output according to the category and the volume value setting information, and if not, it does not update the volume value applicable when that category is output. A sketch of this decision follows.
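```python
# Minimal sketch of the decision of Figs. 14/15: a per-category changeable
# flag gates whether an update notification is applied. Only the flags for
# 001 ("not allowed") and 004 ("allowed") come from the example above; the
# remaining flags are illustrative assumptions.

CHANGEABLE = {"001": False, "002": True, "003": True, "004": True, "005": False}

def on_volume_setting_update(idc: str, new_volume: int, table: dict) -> bool:
    """Apply the update if the control information permits; report result."""
    if not CHANGEABLE.get(idc, False):   # control information says "no"
        return False                     # keep the stored volume value
    table[idc] = new_volume              # update the applicable volume value
    return True
```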
In some embodiments of the present invention, an electronic device includes one or more processors and one or more memories; preset instructions are stored in the memory, and the processor can read and execute the instructions in the memory in order to:
receive utterance control data sent from a server, the utterance control data including at least an utterance volume value and uttered voice data;
play the uttered voice data using the utterance volume value;
determine, from the utterance voice ID, the utterance volume value applicable to the uttered voice data;
play the uttered voice data at the utterance volume value applicable to the uttered voice data.
In some embodiments of the present invention, an electronic device includes one or more processors and one or more memories; preset instructions are stored in the memory, and the processor can read and execute the instructions in the memory,
to store categories indicating output voice types and the volume value applicable to the voice output of each category;
to receive utterance control data sent from a server, the utterance control data including at least uttered voice data and a voice category identifier;
to read the voice category identifier, read the volume value corresponding to the voice category identifier from the recording unit, and play the uttered voice data as voice at the corresponding volume value;
to receive a voice category and volume value setting information for that voice category, and update the volume value applicable to the voice category according to the voice category and the volume value setting information.
In some embodiments of the present invention, an electronic device includes one or more processors and one or more memories; preset instructions are stored in the memory,
the memory stores control information indicating whether updates are to be performed upon receiving a voice category and the volume value setting information of that voice category;
the processor is configured to read and execute the preset instructions stored in the memory,
to determine, upon receiving a voice category and the volume value setting information of that voice category, whether to update according to the control information; if so, to update the volume value applicable to the voice category according to the specified category and the volume value setting information.
In some embodiments of the present invention, a voice interaction terminal includes one or more processors and one or more memories; preset instructions are stored in the memory, and the processor is configured to read and execute the preset instructions stored in the memory,
to collect the voice input from outside;
to send the voice input from outside to a server;
to detect a trigger word in the voice input through the voice input unit;
to send, when the trigger word detection unit detects the trigger word, at least the portion of the voice located after the trigger word to the server;
to receive a voice response sent by the server;
to play the voice response using a preset volume value;
to receive utterance control data sent from the server, the utterance control data including at least an utterance volume value and uttered voice data;
to play the uttered voice data using the utterance volume value.
In some embodiments of the present invention, a voice interaction terminal includes one or more processors and one or more memories storing categories indicating output voice types and the volume value applicable to the voice output of each category; preset instructions are stored in the memory, and the processor is configured to read and execute the preset instructions stored in the memory,
to collect the voice input from outside;
to send the voice input from outside to a server;
to detect a trigger word in the voice input through the voice input unit;
to send, when the trigger word detection unit detects the trigger word, at least the portion of the voice located after the trigger word to the server;
to receive a voice response sent by the server;
to play the voice response using a preset volume value;
to receive utterance control data sent from the server, the utterance control data including at least uttered voice data and a voice category identifier;
to read the voice category identifier, read the volume value corresponding to the voice category identifier from the recording unit, and play the uttered voice data as voice at the corresponding volume value.
The memory in the above embodiments may be a non-volatile memory readable by the processor.
Although several embodiments of the present invention have been described above, these embodiments are presented only as examples and are not intended to limit the scope of the present invention. These novel embodiments can be implemented in various other forms, and various omissions, substitutions and changes can be made without departing from the spirit of the present invention. These embodiments and their variations are included in the scope and spirit of the present invention, and are likewise included in the invention described in the claims and its equivalents. Furthermore, among the constituent elements of the claims, cases where the constituent elements are expressed separately, cases where several are expressed in combination, and cases where such combinations are expressed together are all within the scope of the present invention. In addition, multiple embodiments may be combined, and embodiments formed by such combinations are also within the scope of the present invention.
In this specification and the drawings, constituent elements that produce the same or similar functions as those described with earlier drawings are given the same reference numerals, and detailed description is omitted as appropriate to avoid repetition. The device of the present invention is also applicable when the claims are expressed as control logic, as a program containing instructions for execution by a computer, or as a computer-readable recording medium on which such instructions are recorded. Moreover, the names and terms used are not limiting; other expressions with substantially the same content or the same purport are also included in the present invention.

Claims (10)

  1. An electronic device, comprising a communication receiving unit and an utterance unit;
    the communication receiving unit is configured to receive utterance control data sent from a server, the utterance control data including at least an utterance volume value and uttered voice data;
    the utterance unit plays the uttered voice data using the utterance volume value.
  2. The electronic device according to claim 1, wherein the utterance volume value and the uttered voice data are received by the electronic device in a single data block.
  3. The electronic device according to claim 2, wherein the single data block includes an utterance voice ID, the utterance volume value and the uttered voice data.
  4. The electronic device according to claim 1, wherein the utterance volume value and the uttered voice data are received by the electronic device in separate data blocks.
  5. The electronic device according to claim 4, wherein
    an utterance voice ID and the utterance volume value are received by the electronic device in a first data block;
    the utterance voice ID and the uttered voice data are received by the electronic device in a second data block;
    the electronic device determines, from the utterance voice ID, the utterance volume value applicable to the uttered voice data;
    the utterance unit plays the uttered voice data at the utterance volume value applicable to the uttered voice data.
  6. An electronic device, comprising:
    a recording unit configured to store categories indicating output voice types and the volume value applicable to the voice output of each category;
    a communication receiving unit configured to receive utterance control data sent from a server, the utterance control data including at least uttered voice data and a voice category identifier;
    an utterance unit configured to read the voice category identifier, read the volume value corresponding to the voice category identifier from the recording unit, and play the uttered voice data as voice at the corresponding volume value.
  7. The electronic device according to claim 6, wherein the communication receiving unit receives a voice category and volume value setting information for the voice category, and updates the volume value applicable to the voice category according to the voice category and the volume value setting information.
  8. The electronic device according to claim 7, wherein
    the recording unit stores control information indicating whether updates are to be performed upon receiving a voice category and the volume value setting information of the voice category;
    when the communication receiving unit receives a voice category and the volume value setting information of the voice category, it determines whether to update according to the control information; if so, it updates the volume value applicable to the voice category according to the specified category and the volume value setting information.
  9. A voice interaction terminal which sends voice input from outside to a server, the voice interaction terminal further comprising:
    a voice input unit configured to collect the voice input from outside;
    a trigger word detection unit configured to detect a trigger word in the voice input through the voice input unit;
    a control unit configured to send, when the trigger word detection unit detects the trigger word, at least the portion of the voice located after the trigger word to the server;
    a communication receiving unit configured to receive a voice response sent by the server;
    an utterance unit configured to play the voice response using a preset volume value;
    the communication receiving unit is further configured to receive utterance control data sent from the server, the utterance control data including at least an utterance volume value and uttered voice data;
    the utterance unit plays the uttered voice data using the utterance volume value.
  10. A voice interaction terminal which sends voice input from outside to a server, the voice interaction terminal further comprising:
    a voice input unit configured to collect the voice input from outside;
    a trigger word detection unit configured to detect a trigger word in the voice input through the voice input unit;
    a control unit configured to send, when the trigger word detection unit detects the trigger word, at least the portion of the voice located after the trigger word to the server;
    a communication receiving unit configured to receive a voice response sent by the server;
    an utterance unit configured to play the voice response using a preset volume value;
    a recording unit configured to store categories indicating output voice types and the volume value applicable to the voice output of each category;
    the communication receiving unit is configured to receive utterance control data sent from the server, the utterance control data including at least uttered voice data and a voice category identifier;
    the utterance unit is configured to read the voice category identifier, read the volume value corresponding to the voice category identifier from the recording unit, and play the uttered voice data as voice at the corresponding volume value.
PCT/CN2019/078052 2018-03-13 2019-03-13 Electronic device and electronic device control method WO2019174604A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201980016654.4A CN112189230A (zh) Electronic device and electronic device control method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2018045903A JP6929811B2 (ja) 2018-03-13 2018-03-13 Voice interaction terminal and voice interaction terminal control method
JP2018-045903 2018-03-13

Publications (1)

Publication Number Publication Date
WO2019174604A1 true WO2019174604A1 (zh) 2019-09-19

Family

ID=67907319

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/078052 WO2019174604A1 (zh) 2018-03-13 2019-03-13 电子设备及电子设备控制方法

Country Status (3)

Country Link
JP (1) JP6929811B2 (zh)
CN (1) CN112189230A (zh)
WO (1) WO2019174604A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1397063A * 2000-11-27 2003-02-12 Koninklijke Philips Electronics N.V. Method for controlling a device having a sound output device
CN103517119A * 2012-06-15 2014-01-15 Samsung Electronics Co., Ltd. Display apparatus, method of controlling the display apparatus, server, and method of controlling the server
US20150127340A1 * 2013-11-07 2015-05-07 Alexander Epshteyn Capture
CN106205648A * 2016-08-05 2016-12-07 易晓阳 Voice-controlled network music playback method

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4080986B2 * 2003-10-28 2008-04-23 Mitsubishi Electric Corporation Voice notification device
CN100587746C * 2004-12-23 2010-02-03 Telefonaktiebolaget LM Ericsson Method for notifying a plurality of mobile terminals of an emergency event
CN101489091A * 2009-01-23 2009-07-22 Shenzhen Huawei Communication Technologies Co., Ltd. Voice signal transmission processing method and device
CN101909105A * 2009-06-05 2010-12-08 Hong Fu Jin Precision Industry (Shenzhen) Co., Ltd. Mobile phone volume adjustment method
JP6068088B2 * 2012-10-22 2017-01-25 Hochiki Corporation Disaster-prevention alarm system
JP5996603B2 * 2013-10-31 2016-09-21 Sharp Corporation Server, utterance control method, utterance device, utterance system, and program
CN103943105A * 2014-04-18 2014-07-23 Anhui USTC iFlytek Co., Ltd. Voice interaction method and system
JP6391386B2 * 2014-09-22 2018-09-19 Sharp Corporation Server, server control method, and server control program
JP6678315B2 * 2015-04-24 2020-04-08 Panasonic IP Management Co., Ltd. Voice playback method, voice interaction device, and voice interaction program
JP6779659B2 * 2015-07-21 2020-11-04 Panasonic Intellectual Property Corporation of America Control method and control device
CN106231108B * 2016-08-10 2019-10-29 TCL Mobile Communication Technology (Ningbo) Co., Ltd. Mobile terminal volume control method and system
CN107146613A * 2017-04-10 2017-09-08 Beijing Orion Star Technology Co., Ltd. Voice interaction method and device
CN107084511B * 2017-06-21 2019-09-06 GD Midea Heating & Ventilating Equipment Co., Ltd. Method and device for guiding operation of an air conditioner, and air conditioner


Also Published As

Publication number Publication date
JP2019159121A (ja) 2019-09-19
JP6929811B2 (ja) 2021-09-01
CN112189230A (zh) 2021-01-05

Similar Documents

Publication Publication Date Title
JP6402748B2 (ja) Voice interaction device and utterance control method
US8527258B2 (en) Simultaneous interpretation system
US9641954B1 (en) Phone communication via a voice-controlled device
KR102489914B1 (ko) Electronic apparatus and method for controlling same
US20100250253A1 (en) Context aware, speech-controlled interface and system
KR102056330B1 (ko) Interpretation device and method therefor
JP2003223188A (ja) Voice input system, voice input method, and voice input program
KR102151626B1 (ko) Device for processing a specific task during a call and method therefor
KR101327112B1 (ko) Terminal providing various user interfaces using ambient sound information and control method thereof
KR102447381B1 (ko) Method for providing an artificial-intelligence service during a call and electronic device therefor
JP5616390B2 (ja) Response generation device, response generation method, and response generation program
JP2013195823A (ja) Dialogue support device, dialogue support method, and dialogue support program
KR20190030081A (ko) Method for providing an artificial-intelligence assistant service, and voice recognition equipment used therefor
US10255266B2 (en) Relay apparatus, display apparatus, and communication system
KR20200045851A (ko) Electronic device and system providing a voice recognition service
US10002611B1 (en) Asynchronous audio messaging
JP6385150B2 (ja) Management device, conversation system, conversation management method, and program
WO2019174604A1 (zh) Electronic device and electronic device control method
CN111258529A (zh) Electronic device and control method therefor
JP2007286376A (ja) Voice guidance system
KR102000282B1 (ko) Conversation support device for assisting auditory function
KR102170902B1 (ko) Real-time multi-party interpretation wireless earset and transmission/reception method using same
JP2018055155A (ja) Voice interaction device and voice interaction method
JP2011128260A (ja) Foreign-language conversation support device, method, program, and telephone terminal device
JP2020119043A (ja) Speech translation system and speech translation method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19768234; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 19768234; Country of ref document: EP; Kind code of ref document: A1)