WO2023209888A1 - Information processing device - Google Patents

Information processing device

Info

Publication number
WO2023209888A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
unit
guidance voice
route
information processing
Prior art date
Application number
PCT/JP2022/019144
Other languages
French (fr)
Japanese (ja)
Inventor
Satoru Takizawa (悟 滝澤)
Original Assignee
Pioneer Corporation (パイオニア株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pioneer Corporation (パイオニア株式会社)
Priority to PCT/JP2022/019144 priority Critical patent/WO2023209888A1/en
Publication of WO2023209888A1 publication Critical patent/WO2023209888A1/en

Links

Images

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 — Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 — Input arrangements or combined input and output arrangements for interaction between user and computer
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 — Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 — Sound input; Sound output

Definitions

  • the present invention relates to an information processing device.
  • Car navigation systems that provide route guidance using audio information are known (for example, see Patent Documents 1 and 2).
  • Patent Document 1 discloses a route guidance device that includes a guidance output section that outputs a guidance voice and a direction input section that serves as an input means.
  • Patent Document 2 discloses a speech recognition device that includes a determination unit that determines a recognition vocabulary comprehension level that indicates the extent to which a user understands speech-recognizable vocabulary from the acquired user's utterance content.
  • Patent Document 1: JP 2008-261641 A; Patent Document 2: Japanese Patent Application Publication No. 2012-27487
  • In a route guidance device such as that described in Patent Document 1, the user inputs the direction into the direction input section and queries the system to check whether the recognized direction is correct.
  • In the speech recognition device of Patent Document 2, a microphone collects speech indicating the content of the user's utterance, a speech recognition unit performs speech recognition processing on the collected speech, and the recognition vocabulary comprehension level is determined based on, for example, the number of timeouts counted during the speech recognition processing. The guidance is then changed according to the recognition vocabulary comprehension level.
  • Although such a system can determine whether or not the user was able to speak with the system when determining the recognition vocabulary comprehension level, it is difficult for the system to understand why the user was able to speak (or why the user was unable to speak).
  • One example of the problem to be solved by the present invention is the following: when a situation arises in which there is a matter the user wants to confirm, for reasons such as the user not being able to understand the guidance voice or the guidance voice lacking some information, enable the device or system to understand the cause of that matter so that the information can be used to improve the guidance voice.
  • The invention according to claim 1 includes: an audio output unit that outputs a guidance voice to be provided to a user; a detection unit that detects the user's reaction to the guidance voice; an extraction unit that extracts the user's confirmation behavior regarding the guidance voice from among the detected reactions; an estimation unit that estimates the matter the user wants to confirm based on the confirmation behavior; and a storage unit that stores information about the estimated matter.
  • the invention according to claim 10 is an information processing method executed by a computer, comprising: an audio output step of outputting a guidance voice to be provided to a user; a detection step of detecting a reaction of the user to the guidance voice; an extraction step of extracting the user's confirmation behavior regarding the guidance voice from among the reactions detected in the detection step; an estimation step of estimating what the user wants to confirm based on the confirmation behavior;
  • the method is characterized by comprising a storage step of storing information about the estimated item.
  • the invention according to claim 11 is characterized in that the information processing method according to claim 10 is executed by a computer as an information processing program.
  • The invention according to claim 12 is characterized in that the information processing program according to claim 11 is stored in a computer-readable storage medium.
  • FIG. 1 is a schematic configuration diagram of an information processing device according to an embodiment of the present invention.
  • FIG. 2 is a flowchart of the operation of the information processing device shown in FIG. 1.
  • FIG. 3 is a schematic configuration diagram of an information processing device in Example 2.
  • FIG. 4 is a schematic configuration diagram of an information processing device in Example 3.
  • the audio output unit outputs guidance audio provided to the user.
  • the detection unit detects the user's reaction to the guidance voice.
  • the extraction unit extracts the user's confirmation behavior regarding the guidance voice from among the reactions detected by the detection unit.
  • the estimating unit estimates what the user wants to confirm based on the confirmation behavior.
  • the storage unit stores information about the items estimated by the estimation unit.
  • the extraction section can extract the user's confirmation behavior from among the user's reactions detected by the detection section, so that it is possible to detect that the user needs some kind of confirmation regarding the guidance voice.
  • This makes it possible for the information processing device to understand that a situation has occurred in which there is a matter the user wants to confirm, for reasons such as the user not being able to understand the guidance voice or information being lacking in the guidance voice.
  • the estimating section estimates the items that the user wants to confirm based on the confirmation behavior
  • The storage section can store information about the matters estimated by the estimating section, so this information can be accumulated and utilized. Therefore, by analyzing the accumulated information, the device or system side can understand the cause of the occurrence of the matters that the user wants to confirm regarding the guidance voice, so that it can be used to improve the guidance voice.
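The flow described above (audio output → detection → extraction → estimation → storage) can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: all class names, marker keywords, and category strings are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Reaction:
    kind: str       # e.g. "utterance", "device_use", "vehicle" (assumed labels)
    content: str

@dataclass
class InformationProcessingDevice:
    stored_items: list = field(default_factory=list)

    def output_guidance(self, text: str) -> str:
        # audio output unit: provide the guidance voice to the user
        return text

    def detect(self, raw_events) -> list:
        # detection unit: every observed event is a candidate reaction
        return [Reaction(kind, content) for kind, content in raw_events]

    def extract(self, reactions) -> list:
        # extraction unit: keep only reactions that look like confirmation behavior
        markers = ("smartphone", "where is", "don't understand", "too fast")
        return [r for r in reactions
                if any(m in r.content.lower() for m in markers)]

    def estimate(self, behavior: Reaction) -> str:
        # estimation unit: guess the matter the user wants to confirm
        text = behavior.content.lower()
        if "too fast" in text:
            return "matter related to the speed of the guidance voice"
        if "where is" in text:
            return "matter related to a place name"
        return "content of the guidance voice"

    def store(self, item: str) -> None:
        # storage unit: accumulate estimated matters for later analysis
        self.stored_items.append(item)

device = InformationProcessingDevice()
device.output_guidance("Turn right at the next intersection.")
reactions = device.detect([
    ("utterance", "Where is the interchange?"),
    ("vehicle", "speed steady"),
])
for behavior in device.extract(reactions):
    device.store(device.estimate(behavior))
```

After this run, `device.stored_items` holds one accumulated matter, since only the "Where is…" utterance looked like confirmation behavior.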
  • the storage unit may be configured to classify and store information about the items estimated by the estimation unit according to predetermined conditions.
  • Information about the matters estimated by the estimation unit can be classified and stored as, for example, "the content of the guidance voice," "the voice of the guidance voice itself," "matters related to place names," "matters related to the speed of the guidance voice," "matters related to the pronunciation of the guidance voice," and so on. Thus, for example, regarding the first guidance voice, it is possible to analyze by category what kind of content the user needed to confirm.
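As a rough sketch, classification under such predetermined conditions could be implemented as a keyword lookup. The category names follow the text above, while the keyword rules and function names are assumptions for illustration.

```python
from collections import defaultdict

# Category names follow the text; the keyword rules are assumptions.
RULES = {
    "matters related to the speed of the guidance voice": ("too fast",),
    "matters related to the pronunciation of the guidance voice": ("pronunciation",),
    "matters related to place names": ("place name", "where is"),
}

def classify(estimated_item: str) -> str:
    text = estimated_item.lower()
    for category, keywords in RULES.items():
        if any(k in text for k in keywords):
            return category
    return "content of the guidance voice"  # default bucket

def store_classified(items):
    # group each estimated matter under its category for later analysis
    buckets = defaultdict(list)
    for item in items:
        buckets[classify(item)].append(item)
    return dict(buckets)

buckets = store_classified([
    "the guidance was too fast to follow",
    "could not catch the pronunciation",
    "where is the named interchange",
])
```

Grouping by category like this is what makes the per-category analysis mentioned above straightforward.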
  • the information processing device may include a route search unit that searches for a route for a moving body
  • the guidance voice may be a voice that conveys to the user the route for the mobile body that has been searched by the route search unit.
  • the confirmation action does not have to be an action for the route search unit.
  • For example, there are cases where a user driving a mobile body stops the vehicle on the roadside after listening to a guidance voice and searches for information using a device such as a smartphone, or where the driver instructs a fellow passenger to re-check the route using a mobile device.
  • Such behavior is not directed at the route search unit, but it can still be extracted as a confirmation action. Therefore, it becomes easier for the information processing device to understand that a situation in which the user wants to confirm the guidance voice has occurred, and to analyze the cause of that situation, which can contribute to improving the guidance voice.
  • the information processing device also includes a route search unit that searches for a route for the mobile object, and the guidance voice is a voice that conveys the route of the mobile body searched by the route search unit to the user.
  • the output section, the detection section, the extraction section, the estimation section, and the storage section may be provided in a terminal device that moves together with the mobile object. By doing so, the information processing device can be configured with one terminal device without using external equipment such as a server device.
  • the information processing device also includes a route search unit that searches for a route for the mobile object, and the guidance voice is a voice that conveys the route of the mobile body searched by the route search unit to the user.
  • the output section, the extraction section, the estimation section, and the storage section may be provided in a first terminal device that moves together with the mobile object.
  • the detection unit may be provided in a second terminal device that moves together with the mobile object and can communicate with the first terminal device. By doing so, the load on each device can be reduced compared to an information processing device configured with one terminal device.
  • the information processing device also includes a route search unit that searches for a route for the mobile object, and the guidance voice is a voice that conveys the route of the mobile body searched by the route search unit to the user.
  • the output unit may be provided in a first terminal device that moves together with the mobile object.
  • the detection unit may be provided in a second terminal device that moves together with the mobile object and can communicate with the first terminal device.
  • The extraction unit, the estimation unit, and the storage unit may be provided in a server device that can communicate with the first terminal device and the second terminal device. By doing so, the load on each device can be reduced compared to an information processing device configured with one terminal device, or one configured with a first terminal device and a second terminal device.
  • the second terminal device may include an information search unit independent from the route search unit.
  • The confirmation actions may include the user using the information search unit to search for a route for the moving object, the user using the information search unit to search for the coordinates of a specific point, and the user using the information search unit to perform a keyword search for a specific word. By doing this, if a problem occurs, such as the user not being able to understand the guidance voice or something being missing from the guidance voice, the problem that the user attempted to solve by using the information search unit, which is independent from the route search section, can be extracted as a confirmation action.
  • The confirmation action may be the user uttering a voice in response to the guidance voice.
  • the estimating unit may estimate the item the user wants to confirm based on the sound made by the user.
  • By doing this, the information processing device can understand, from the voice uttered by the user, that the user needs to confirm something about the guidance voice, and can estimate the matter the user wants to confirm based on that voice. For this reason, points in the guidance voice that were difficult to understand, such as "the guidance voice was too fast to understand" or "the pronunciation of the guidance voice was poor and I couldn't understand it," can be analyzed for each user, and it becomes possible to adjust the guidance voice so that it is optimal for each user.
  • The extraction unit can also extract actions other than the user's direct response to the guidance voice as confirmation actions. For example, a user might talk to himself or herself, saying things like "Where is (place name)?", "I don't really understand (place name)," "I couldn't catch it because it was too fast," or "I couldn't catch it because the pronunciation was poor," or might voluntarily give instructions to a fellow passenger, such as "look up (place name) on your smartphone" or "check the route again on your smartphone." The fact that the user spontaneously uttered such a voice, rather than speaking to the information processing device, can be extracted as a confirmation action. Therefore, there is no need for the information processing device to specifically guide the user to speak while driving, and it is possible to avoid diverting the user's attention while driving or making the user feel bothered by the guidance.
  • an information processing method is an information processing method executed by a computer.
  • In the audio output step, a guidance voice to be provided to the user is output.
  • In the detection step, the user's reaction to the guidance voice is detected.
  • In the extraction step, the user's confirmation behavior regarding the guidance voice is extracted from among the reactions detected in the detection step.
  • In the estimation step, the matters that the user wants to confirm are estimated based on the confirmation behavior.
  • In the storage step, information about the matters estimated in the estimation step is stored.
  • In this way, the matters that the user wants to confirm are estimated based on the confirmation behavior, and information about the estimated matters can be stored in the storage step, so this information can be accumulated and utilized. Therefore, by analyzing the accumulated information, the device or system side can understand the cause of the occurrence of the matters that the user wants to confirm regarding the guidance voice, so that it can be used to improve the guidance voice.
  • the above-described information processing method may be executed by a computer as an information processing program.
  • When a situation occurs in which the user wants to confirm the guidance voice, the situation can be grasped using the computer.
  • the above-described information processing program may be stored in a computer-readable storage medium.
  • the information processing program can be distributed as a standalone program in addition to being incorporated into a device, and version upgrades can be easily performed.
  • FIG. 1 is a schematic configuration diagram of an information processing apparatus 1 according to Example 1 of this embodiment.
  • the information processing device 1 includes, for example, a terminal device 10 (first terminal device) that moves together with a vehicle (mobile object).
  • the terminal device 10 is, for example, a navigation device installed in a vehicle, and includes a control section 11, an input section 12, an output section 13 (sound output section), and a storage section 14.
  • the control unit 11 is composed of a CPU equipped with a memory such as a RAM or ROM, and controls the entire terminal device 10.
  • The control unit 11 includes a user identification unit 110, a route search unit 111, a detection unit 112, an extraction unit 113, and an estimation unit 114, and performs route search processing to search for a route, audio output processing to output information about the searched route to the user as a guidance voice, and so on.
  • The control unit 11 also monitors various devices installed in the vehicle and performs detection processing to detect the user's reactions to the guidance voice.
  • the various devices monitored by the control unit 11 are, for example, devices such as a GPS receiver, various sensors such as an acceleration sensor and a gyro sensor, an in-vehicle camera, or a microphone.
  • The control unit 11 further performs an extraction process to extract the user's confirmation behavior related to the guidance voice from among the user's reactions detected by the detection process, and an estimation process to estimate the confirmation matters that the user wants to confirm based on the extracted confirmation behavior.
  • the control unit 11 also performs a classification process of classifying confirmation items according to predetermined conditions, a storage process of storing the classified confirmation items in the storage unit 14, and the like. Note that details of the user's reaction, user's confirmation behavior, and confirmation items will be explained later when the operation of the information processing device 1 is explained.
  • the input unit 12 is composed of a device such as a microphone, an input button, or a touch panel, for example.
  • the input unit 12 receives a user's instruction by the user's voice or by operating an input button or touch panel, and transmits a signal indicating the instruction to the control unit 11.
  • the output unit 13 is comprised of devices such as a speaker that outputs guidance audio and an amplifier that drives the speaker.
  • the output unit 13 outputs information about the vehicle route searched by the route search process as a guidance voice, and transmits it to the user.
  • The storage unit 14 is composed of, for example, a hard disk or a non-volatile memory, and stores programs and data for the control unit 11 to perform the above-mentioned control (for example, map data necessary for route search processing and voice data necessary for voice output processing), confirmation data 140 (described later) indicating confirmation behavior, estimated data 141 (described later) indicating confirmation matters, and so on.
  • this information processing method can be executed as an information processing program on a computer equipped with a CPU or the like. Further, this information processing program may be stored in a computer-readable storage medium.
  • the user identification unit 110 performs user identification processing (step S100).
  • the user who uses the terminal device 10 is identified.
  • For example, the user may be automatically identified by matching an in-vehicle image taken by the in-vehicle camera with images of the vehicle's driver and passengers stored in advance in the storage unit 14, or a question may be output from the terminal device 10 to the driver and fellow passengers via the output unit 13, the user may be requested to input an answer to the question into the input unit 12, and the user may be identified based on the answer.
  • Alternatively, a user terminal such as a smartphone owned by the user may be registered in advance in the terminal device 10, and the user may be identified based on the terminal device 10 detecting the user terminal by wireless means or the like, or on the user terminal being directly connected to the terminal device 10.
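The third identification method above (a pre-registered user terminal detected wirelessly or connected directly) could be sketched as a simple lookup. The terminal IDs and the table layout are illustrative assumptions.

```python
# User-identification sketch: match a detected terminal against terminals
# registered in advance in the terminal device. IDs and table are assumptions.
REGISTERED_TERMINALS = {
    "aa:bb:cc:01": "driver",
    "aa:bb:cc:02": "passenger",
}

def identify_user(detected_terminal_id):
    # Returns the registered user, or None so the terminal device can fall
    # back to camera matching or a spoken question, as described above.
    return REGISTERED_TERMINALS.get(detected_terminal_id)
```

Returning `None` rather than raising keeps the fallback path (camera matching or a question via the output unit) easy to chain.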
  • In step S200, it is determined whether to perform route search processing.
  • the input unit 12 is first monitored, and it is determined whether an instruction for performing route search processing is input.
  • The instruction for performing the route search process may be given by the user inputting the starting point and destination into the input section 12, by voice, such as "Check the route from XX (starting point) to XX (destination)," or by operating a touch panel or input button. If the result of the determination is YES, the process advances to the next step S300; if the result is NO, the process waits until the instruction is input.
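Recognizing such a spoken route-search instruction could be sketched as below, assuming an English phrasing like "Check the route from X to Y"; the actual wording and the speech recognizer are not specified in the text, so this parser is purely illustrative.

```python
import re

# Hypothetical parser for a spoken route-search instruction.
# The "route from ... to ..." phrasing is an assumption.
PATTERN = re.compile(r"route from (?P<origin>.+?) to (?P<dest>.+)", re.IGNORECASE)

def parse_instruction(utterance: str):
    m = PATTERN.search(utterance)
    if m is None:
        return None  # not a route-search instruction: keep waiting (NO branch)
    # YES branch: hand the origin/destination pair to the route search unit
    return m.group("origin").strip(), m.group("dest").strip()
```

A `None` result corresponds to the NO branch of step S200 (keep waiting for an instruction).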
  • step S300 the route search unit 111 performs a route search process.
  • In this process, a route from the departure point input into the input unit 12 to the destination is searched, and guidance information about the searched route is generated. Since the contents of the route search process are publicly known, a detailed explanation is omitted.
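The text notes that route search itself is publicly known; a typical approach is a shortest-path search such as Dijkstra's algorithm, sketched here on a toy road graph. The node names and costs are invented for illustration and are not from the patent.

```python
import heapq

def dijkstra(graph, start, goal):
    # graph: {node: [(neighbor, cost), ...]}; returns (total_cost, path)
    queue = [(0, start, [start])]  # priority queue ordered by accumulated cost
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, step in graph.get(node, []):
            if nxt not in seen:
                heapq.heappush(queue, (cost + step, nxt, path + [nxt]))
    return float("inf"), []  # no route found

# Toy road network (illustrative only)
roads = {
    "A": [("B", 2), ("C", 5)],
    "B": [("C", 1), ("D", 4)],
    "C": [("D", 1)],
}
cost, route = dijkstra(roads, "A", "D")
```

The resulting node sequence is what would be turned into guidance information for the voice output step.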
  • Step S400 is a step corresponding to the audio output process in the above embodiment.
  • the control section 11 performs voice output processing, and the output section 13 outputs a guidance voice.
  • the guidance voice is a voice provided to the user, and in this case indicates the guidance information generated in step S300 described above.
  • the vehicle route searched by the route search unit is transmitted to the user by this guidance voice.
  • For example, the guidance voice may explain the route to the user using road names, such as "Go along XX road (name of the road), enter the XX expressway (name of the expressway) from the XX interchange (name of the interchange), and get off at the XX interchange (name of the interchange)," or using place names, such as "This is the route that passes through XX (place name) and goes to XX (place name)."
  • The guidance voice may also use facility names, such as "Turn right at the intersection before XX (name of facility), then turn left at the traffic light after XX (name of facility)."
  • The route may also be explained to the user by a combination of these road names, place names, and facility names.
  • The guidance voice may include a request for a response from the user, such as "If you do not understand the guidance, please reply that you did not understand." However, from the viewpoint of preventing distraction and annoyance to the user, it is more preferable that the guidance voice not require the user to respond.
  • step S400 ends and the process advances to step S500.
  • Step S500 is a detection step in the embodiment described above.
  • the detection unit 112 performs detection processing.
  • devices such as a GPS receiver, various sensors such as an acceleration sensor and a gyro sensor, an in-vehicle camera, or a microphone are monitored. Then, through these devices, sounds, user actions, behavior of the vehicle operated by the user, etc. that occur after the guidance voice is output are detected as the user's reaction.
  • the detected reaction is treated as reaction data indicating the reaction, and is temporarily stored in the storage unit 14 in a ring buffer format.
  • the detection unit 112 detects the user's reaction to the guidance voice, and the reaction is temporarily stored in the storage unit 14. Then, the process advances to step S600.
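The temporary ring-buffer storage of reaction data described here can be sketched with a bounded deque: once the buffer is full, appending a new reaction silently discards the oldest one. The buffer size of 3 is an arbitrary assumption for illustration.

```python
from collections import deque

# A deque with maxlen behaves as a ring buffer for temporary storage of
# reaction data; the capacity (3) is an illustrative assumption.
reaction_buffer = deque(maxlen=3)

def record_reaction(event: str) -> None:
    # newest reaction in, oldest reaction out once the buffer is full
    reaction_buffer.append(event)

for event in ["utterance: where is it?", "vehicle: stopped",
              "device: smartphone unlocked", "utterance: check again"]:
    record_reaction(event)
```

After the loop, only the three most recent reactions remain, matching the ring-buffer behavior described for the storage unit 14.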
  • Step S600 is a step corresponding to the extraction step in the embodiment described above.
  • the extraction unit 113 performs extraction processing.
  • the storage unit 14 is monitored for a predetermined period of time.
  • the predetermined time may be, for example, about 1 to 3 minutes after the guidance voice is output. After the monitoring, it is determined whether or not data indicating the user's confirmation behavior regarding the guidance voice is found among the temporarily stored reaction data.
  • the extraction unit 113 extracts the user's confirmation behavior regarding the guidance voice from among the reactions detected by the detection unit 112. Then, if the confirmation data 140 can be extracted (in the case of YES), the process advances to step S700, and if the confirmation data 140 cannot be extracted (in the case of NO), the operation of the terminal device 10 is ended.
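A sketch of this extraction step, assuming timestamped reaction records and simple keyword markers for confirmation behavior: the 180-second window follows the "about 1 to 3 minutes" mentioned above, but the markers and exact criteria are assumptions.

```python
# Extraction sketch: monitor temporarily stored reactions for a fixed window
# after the guidance voice and keep only those that look like confirmation
# behavior. Timestamps are seconds since the guidance voice was output.
WINDOW_S = 180  # "about 1 to 3 minutes" per the text
MARKERS = ("smartphone", "where is", "check the route", "too fast")

def extract_confirmation(reactions):
    # reactions: [(t_seconds, text), ...]
    return [text for t, text in reactions
            if t <= WINDOW_S and any(m in text.lower() for m in MARKERS)]

found = extract_confirmation([
    (30, "Where is the interchange?"),
    (60, "radio volume changed"),
    (400, "check the route again"),   # outside the monitoring window
])
```

An empty result corresponds to the NO branch (no confirmation data 140 extracted, operation ends); a non-empty result advances to step S700.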
  • The confirmation action extracted as the confirmation data 140 refers to an action taken when a user who has heard the guidance voice encounters something he or she wants to confirm about it; that is, an action indicating that the user was for some reason unable to understand the guidance voice, or that something was missing from the guidance voice.
  • The confirmation behavior is, for example, the user "stopping the vehicle on the roadside and operating a device such as a smartphone."
  • This confirmation behavior is discovered and extracted by the extraction unit 113 when the reaction data created by the detection unit 112 includes data indicating that the user is operating a device such as a smartphone, or video data of the user operating such a device.
  • The confirmation behavior may also be the user "talking to himself or herself," saying things like "Where is XX (place name)?", "I don't really understand XX (place name)," "I couldn't catch it because it was too fast," or "I couldn't catch it because the pronunciation was poor," or "giving instructions to fellow passengers," such as "look up XX (place name) on your smartphone" or "check the route again on your smartphone." This confirmation behavior is discovered and extracted by the extraction unit 113 when the reaction data created by the detection unit 112 includes voice data indicating such soliloquy or instructions to a fellow passenger.
  • As described above, confirmation actions include actions other than, for example, re-searching for a route using the route search unit 111, that is, actions not directed at the route search unit 111 that initially performed the route search; such behavior is also extracted by the extraction unit 113.
  • The user's utterance of a voice in response to the guidance voice is also extracted as confirmation behavior, and this behavior includes spontaneous actions other than replying to the guidance voice, such as "talking to oneself" and "giving instructions to fellow passengers."
  • Step S700 is a step corresponding to the estimation process in the above-mentioned embodiment.
  • the estimation unit 114 performs estimation processing.
  • confirmation items that the user wants to confirm are estimated based on the confirmation data 140 (that is, the extracted confirmation behavior).
  • For example, if the confirmation data 140 indicates that the user has stopped the vehicle on the roadside and is operating a device such as a smartphone, there is a high possibility that the user has interrupted driving to do some research concerning the guidance voice.
  • The fact that the user is likely doing such research means that the user could most likely hear the guidance voice itself. Therefore, when such confirmation data is extracted, the confirmation matter is presumed to be "the content of the guidance voice."
  • The same applies when confirmation data 140 with the content "giving instructions to fellow passengers" is extracted: since the user is giving instructions to fellow passengers, the user can hear the guidance voice, but it can be estimated that the user does not understand what the guidance voice is saying, because the user is having a fellow passenger check the route using another device, namely a smartphone. The same is also true when confirmation data 140 containing "talking to oneself," such as "Where is XX (place name)?" or "I don't really understand XX (place name)," is extracted.
  • In step S700, confirmation matters are thus estimated based on the user's confirmation behavior, such as using another device after listening to the guidance voice or uttering a voice in response to the guidance voice.
  • Also, in step S700, regardless of whether the confirmation matter is "the content of the guidance voice" or "the voice of the guidance voice itself," the confirmation matter may be specified based on the voice uttered by the user, and when this is possible, such identification is made. For example, if the content of the confirmation data 140 is "talking to oneself," such as "Where is XX (place name)?" or "I don't really understand XX (place name)," the confirmation matter is identified as "related to place names." If the content of the confirmation data 140 is "talking to oneself" such as "It was too fast to hear," the confirmation matter is identified as "related to the speed of the guidance voice." If the content of the confirmation data 140 is "talking to oneself" such as "I couldn't understand it because of poor pronunciation," the confirmation matter is identified as "related to the pronunciation of the guidance voice."
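The identification rules described here amount to mapping the wording of the extracted utterance to a confirmation matter. A hedged sketch follows; the keyword rules and return strings merely mirror the examples above and are not the patented logic.

```python
# Sketch of the identification in step S700: map the wording of the user's
# extracted utterance to a confirmation matter. Rules are assumptions that
# follow the examples in the text.
def estimate_confirmation_item(utterance: str) -> str:
    text = utterance.lower()
    if "too fast" in text:
        return "related to the speed of the guidance voice"
    if "pronunciation" in text:
        return "related to the pronunciation of the guidance voice"
    if "where is" in text or "don't really understand" in text:
        return "related to place names"
    # fallback when the wording gives no finer detail
    return "content of the guidance voice"
```

The fallback branch corresponds to the case where only "the content of the guidance voice" can be presumed, as described for device-operation confirmation data.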
  • the estimation unit 114 creates estimated data 141 as information regarding the confirmation items.
  • This estimated data 141 indicates confirmation items. This ends step S700, and the process advances to step S800.
  • Step S800 is a storage step in the above embodiment.
  • the control unit 11 performs classification processing and storage processing.
  • the estimated data 141 is classified according to predetermined conditions.
  • For example, the estimated data 141 can be classified using the types of confirmation matters mentioned above ("content of the guidance voice," "the voice of the guidance voice itself," "matters related to place names," "matters related to the speed of the guidance voice," "matters related to the pronunciation of the guidance voice").
  • the estimated data 141 is stored as data that cannot be overwritten.
  • The estimated data 141 may be stored as-is, but, for example, data on the user specified by the user identification section 110, data on instructions input into the input section 12, data on the guidance voice output from the output section 13, the confirmation data 140 that is the basis for creating the estimated data 141, and the current position and current time received by the GPS receiver may be added to the estimated data 141 and stored. By doing this, the reason why the guidance voice was not understood by the user can be analyzed from various viewpoints. The operation of the terminal device 10 then ends.
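An append-only storage sketch for step S800 is shown below, attaching the kind of contextual data mentioned above to each estimated item. The field names and the in-memory list standing in for the storage unit 14 are assumptions.

```python
# Append-only storage sketch for step S800. The record fields mirror the
# data the text says may be added (user, guidance voice, position, time);
# all field names are illustrative assumptions.
storage = []  # stands in for the non-volatile storage unit 14

def store_estimated_data(item, user_id, guidance_text, position, timestamp):
    record = {
        "confirmation_item": item,
        "user": user_id,
        "guidance_voice": guidance_text,
        "position": position,    # current position from the GPS receiver
        "time": timestamp,       # current time from the GPS receiver
    }
    storage.append(record)       # append only: earlier records are never overwritten
    return record

store_estimated_data("related to place names", "user-1",
                     "Turn right at the XX interchange.",
                     (35.68, 139.69), "2022-04-28T10:00:00")
```

Appending rather than overwriting preserves the accumulation property the text relies on for later multi-viewpoint analysis.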
As described above, the extraction unit 113 can extract the user's confirmation behavior from among the user's reactions detected by the detection unit 112, so it is possible to detect that the user needs some kind of confirmation regarding the guidance voice. In other words, the information processing device 1 can grasp that a situation has occurred in which the user has something to confirm, for reasons such as the user not being able to understand the guidance voice or information being missing from it.
Furthermore, the estimation unit 114 estimates the confirmation items based on the confirmation behavior, and the storage unit 14 can store the estimated data 141 (information about the confirmation items estimated by the estimation unit 114), so confirmation items can be accumulated and utilized. Therefore, by analyzing the accumulated estimated data 141, the device or system side can grasp why the matters the user wants to confirm arose, and use this to improve the guidance voice.
When the estimated data 141 is stored together with the above-mentioned user data, the instruction data input to the input section 12, the guidance voice data output by the output section 13, the confirmation data 140 on which the estimated data 141 is based, and the current location and current time received by the GPS receiver, it becomes possible, for example, to check the relationship between the guidance voice and the confirmation items for each user. The guidance voice can then be optimized according to the characteristics of each user, and expressions that many users find difficult to understand can be examined. Furthermore, since the relationship between the user's information, the current location, the current time, and the confirmation items can be checked, it is also possible to identify points and routes whose guidance voice is difficult for many users to understand.
Because the information about the confirmation items estimated by the estimation unit 114 is classified into categories such as "content of the guidance voice," "the voice of the guidance voice itself," "related to place names," "related to the speed of the guidance voice," and "related to the pronunciation of the guidance voice," handling the stored items is easier when they are later utilized, for example when analyzing by category what kind of content users needed to confirm for a given guidance voice.
In addition, when a problem occurs, such as the user being unable to understand the guidance voice or information being missing from it, an attempt by the user to solve the problem by means other than the route search unit 111 can be extracted as a confirmation behavior. Specifically, when a user driving a vehicle stops on the roadside after listening to the guidance voice and operates a device such as a smartphone, or when the driver instructs a passenger to re-check the route using a mobile terminal, that action can be extracted as a confirmation behavior.
Since the information processing device 1 can be configured as a single terminal device 10 without using external equipment such as a server device, its configuration can be simplified.
The information processing device 1 can also recognize, from the voice uttered by the user, that the user needs to confirm something about the guidance voice, and can estimate the confirmation items from that voice. It therefore becomes possible to analyze, for each user, data on the points of the guidance voice that were difficult to understand, such as "the guidance voice was too fast to catch" or "the pronunciation was poor and I couldn't catch it," and to adjust the guidance voice so that it is optimal for each user.
Furthermore, the extraction unit 113 can extract actions other than the user's direct response to the guidance voice as confirmation behaviors, for example self-directed remarks such as "Where is (place name)?," "I don't really understand (place name)," "It was too fast to catch," or "The pronunciation was poor and I couldn't catch it."
Example 2: Next, the information processing device 1 according to Example 2 of this embodiment will be described with reference to FIG. 3. The same parts as in Example 1 described above are given the same reference numerals, and their explanation is omitted or simplified.
In Example 2, the information processing device 1 includes a terminal device 10 (first terminal device) and a mobile terminal 20 (second terminal device).
The terminal device 10 is a device that moves with the vehicle, as in Example 1, and has the same configuration as the terminal device 10 of Example 1 except that it does not have the detection section 112 and instead has a communication section 15.
The mobile terminal 20 is a device that moves with the vehicle (mobile object) and is provided so as to be able to communicate with the terminal device 10. The mobile terminal 20 is, for example, a smartphone or a tablet, and includes a control section 21 and a communication section 22. The control unit 21 is composed of a CPU equipped with memory such as RAM and ROM, and controls the entire mobile terminal 20. The control unit 21 includes a detection unit 211 and an information search unit 212. The detection unit 211 corresponds to the detection unit 112 in Example 1.
The information search unit 212 performs information searches independently of the route search unit 111 of the terminal device 10: route search processing (a vehicle route search performed independently of the route search unit 111), point search processing (checking the coordinates of a specific point), and keyword search processing (searching for a specific word).
The information search unit 212 may use a known search engine used on the World Wide Web, and the search results may be output as images or as audio by an output unit (not shown) provided in the mobile terminal 20.
The communication unit 22 is provided so as to be able to communicate with the communication unit 15 of the terminal device 10, so that the reaction data created by the detection unit 211 can be transmitted from the mobile terminal 20 to the terminal device 10.
In step S100, in addition to the user identification processing, cooperation processing of the mobile terminal 20 with the terminal device 10 is performed. Specifically, authentication is performed to determine whether the mobile terminal 20 is a terminal that can cooperate with the terminal device 10, and if cooperation is accepted, the terminal device 10 and the mobile terminal 20 become able to communicate. This allows the terminal device 10 to grasp the user's reactions detected by the detection unit 211 of the mobile terminal 20 and the content of the information searches performed by the information search unit 212. The process then advances to step S200.
In step S500, detection processing is performed: the information search section 212 of the mobile terminal 20 is monitored, and any operation of the information search section 212 that occurs after the guidance voice is output is detected as a user reaction. In this case, the confirmation behavior is that the user "performed a search action using the information search unit 212." This confirmation behavior is extracted by the extraction unit 113 as the confirmation data 140 when the reaction data created by the detection unit 211 includes data indicating that the information search unit 212 was operated for a route search, point search, keyword search, or the like. That is, in addition to the confirmation behaviors of Example 1, the confirmation behaviors include the user using the information search unit 212 to search for a vehicle route, to perform a point search checking the coordinates of a specific point, and to perform a keyword search for a specific word.
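The detection-and-extraction rule of step S500 in Example 2 can be sketched as a time-windowed filter over reported events. The event representation and the fixed time window below are assumptions; the patent only states that search operations occurring after the guidance voice is output are treated as reactions and extracted as confirmation behaviors.

```python
# Hedged sketch of Example 2's extraction: an operation of the information
# search unit 212 (route / point / keyword search) occurring after the
# guidance voice was output is extracted as a confirmation action.
# Event tuples and the 120-second window are illustrative assumptions.

SEARCH_KINDS = {"route_search", "point_search", "keyword_search"}

def extract_confirmation_actions(events, guidance_time, window_s=120):
    """events: list of (timestamp_s, kind) tuples from detection unit 211.
    Returns search operations within window_s seconds after the guidance."""
    return [
        (t, kind) for t, kind in events
        if kind in SEARCH_KINDS and guidance_time <= t <= guidance_time + window_s
    ]
```

A bounded window is one plausible way to tie a search action to a specific guidance voice; the patent itself does not specify how that association is made.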
With this configuration, when a problem occurs, such as the user being unable to understand the guidance voice or information being missing from it, an attempt by the user to solve the problem using the information search unit 212, which is independent of the route search unit 111, can be extracted as a confirmation behavior. Therefore, for example, if a user driving a vehicle stops on the roadside after listening to the guidance voice and searches for information using a device such as a smartphone, that action can be extracted as a confirmation behavior.
Furthermore, since the terminal device 10 and the mobile terminal 20 can be linked, it is possible to refer to the search word (keyword) input to the information search unit 212, the output method of the search result (i.e., audio or image), and so on. For example, an action in which the user performs a route search using the information search unit 212 and displays the result on a display unit (not shown) can be regarded as a confirmation behavior and extracted as the confirmation data 140; in this case, it can be inferred that the user was unable to visualize the route from the guidance voice alone and wanted to confirm it with images.
Similarly, an action in which the user performs a keyword search using the information search unit 212 can be regarded as a confirmation behavior and extracted as the confirmation data 140. If the search string of the keyword search is the name of a facility, a place name, or a road name, it can be presumed that the user was unable to understand that facility name, place name, or road name in the guidance voice. If the search string is written in hiragana or katakana, it can be presumed that the user does not know the written form of the facility name, place name, or road name. If the user searches for the same facility name, place name, or road name multiple times by keyword, it can be presumed that the user could think of multiple locations with the same sound as the name that was announced.
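The hiragana/katakana inference above can be sketched with standard Unicode block ranges (Hiragana U+3040–U+309F, Katakana U+30A0–U+30FF). The function names and the inference labels are illustrative assumptions; the character-range check itself is standard.

```python
# Illustrative sketch: if the keyword-search string is written only in kana,
# presume the user heard the name but could not identify its written form.
# The Unicode ranges are standard; the inference rule is a simplification.

def is_kana_only(query: str) -> bool:
    """True if every character is hiragana (U+3040-U+309F) or
    katakana (U+30A0-U+30FF)."""
    return all('\u3040' <= ch <= '\u30ff' for ch in query)

def infer_confusion(query: str) -> str:
    if is_kana_only(query):
        return "user knows the sound but not the written name"
    return "user could not catch the name in the guidance voice"
```

For example, a search for "しんじゅく" (kana) rather than "新宿" (kanji) suggests the user caught the sound of the place name but not its written form.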
Since the information processing device 1 of Example 2 is configured with the terminal device 10 and the mobile terminal 20, the load on each device can be reduced compared to an information processing device 1 configured with a single terminal device 10.
Example 3: Next, the information processing device 1 according to Example 3 of this embodiment will be described with reference to FIG. 4. The same parts as in Examples 1 and 2 described above are denoted by the same reference numerals, and their description is omitted or simplified.
In Example 3, the information processing device 1 includes a terminal device 10 (first terminal device), a mobile terminal 20 (second terminal device), and a server device 30. The terminal device 10 is a device that moves with the vehicle, as in Example 1, and has the same configuration as the terminal device 10 of Example 2 except that it does not include the extraction unit 113, the estimation unit 114, and the storage unit 14. The mobile terminal 20 has the same configuration as the mobile terminal 20 of Example 2.
The server device 30 is provided so as to be able to communicate with the terminal device 10 and the mobile terminal 20, and includes a control section 31, a storage section 32, and a communication section 33. The control unit 31 includes an extraction unit 310 and an estimation unit 311. The extraction unit 310, the estimation unit 311, and the storage unit 32 correspond to the extraction unit 113, the estimation unit 114, and the storage unit 14 in Examples 1 and 2, respectively, and have the same functions. The communication unit 33 is provided so as to be able to communicate with the communication unit 15 and the communication unit 22, and can transmit and receive the above-mentioned reaction data, confirmation data 140, and estimated data 141.
With this configuration, the load on each device can be reduced compared to an information processing device 1 configured with a single terminal device 10, or one configured with the terminal device 10 (first terminal device) and the mobile terminal 20 (second terminal device).
As a modification, the items that the user wants to confirm, estimated in the estimation processing, may include "detailed information on the facility." In this modification, the device is configured to be able to provide audio guidance on information about the destination facility, and actions such as "the user looked up detailed information about the facility using a smartphone or the like" are extracted as confirmation behaviors. From this confirmation behavior, it is estimated that the information the user needed (i.e., the information missing from the guidance voice) was "detailed information about the facility."

Abstract

When there is a point that a user wishes to confirm, because the user could not understand a guidance voice or the guidance voice was insufficient, this invention enables a device or system to grasp that situation and the cause of the point the user wishes to confirm, in order to improve the guidance voice. In this information processing device, a voice output unit outputs the guidance voice to be provided to the user. A detection unit detects the user's response to the guidance voice. An extraction unit extracts, from the responses detected by the detection unit, a confirmation action of the user pertaining to the guidance voice. An estimation unit estimates, on the basis of the confirmation action, the point the user wishes to confirm. A storage unit stores information about the point estimated by the estimation unit.

Description

Information processing device
The present invention relates to an information processing device.
Car navigation systems that provide route guidance using audio information are known (for example, see Patent Documents 1 and 2).
Patent Document 1 discloses a route guidance device that includes a guidance output section that outputs a guidance voice and a direction input section that serves as an input means. Patent Document 2 discloses a speech recognition device that includes a determination unit that determines a recognition-vocabulary comprehension level, indicating the extent to which a user understands speech-recognizable vocabulary, from the acquired content of the user's utterances.
Patent Document 1: JP 2008-261641 A
Patent Document 2: JP 2012-27487 A
In a route guidance device such as that described in Patent Document 1, a user who judges that the direction to proceed is uncertain even after listening to the guidance voice inputs a direction into the direction input section to query the system, and checks whether the recognized direction is correct. In this case, however, it is difficult for the system to grasp why the user judged the direction to be uncertain, and it is therefore difficult to improve the guidance voice that the user judged to be uncertain.
In a speech recognition device such as that described in Patent Document 2, a microphone collects speech representing the content of a user's utterance, a speech recognition unit performs speech recognition processing on the collected speech, and the recognition-vocabulary comprehension level is determined based on, for example, the number of timeouts counted during the speech recognition processing. The guidance is then changed according to the recognition-vocabulary comprehension level. However, even if the system can grasp whether the user was able to speak with it when determining the comprehension level, it is difficult for the system to grasp why the user was, or was not, able to speak.
One problem to be solved by the present invention is, when the user has something to confirm, for reasons such as not being able to understand the guidance voice or information being missing from it, to enable the device or system side to grasp that situation and the cause of the matter the user wants to confirm, so that this can be used to improve the guidance voice.
To solve the above problem, the invention according to claim 1 comprises: an audio output unit that outputs a guidance voice to be provided to a user; a detection unit that detects the user's reaction to the guidance voice; an extraction unit that extracts, from the reactions detected by the detection unit, the user's confirmation behavior regarding the guidance voice; an estimation unit that estimates, based on the confirmation behavior, the matter the user wants to confirm; and a storage unit that stores information about the matter estimated by the estimation unit.
The invention according to claim 10 is an information processing method executed by a computer, comprising: an audio output step of outputting a guidance voice to be provided to a user; a detection step of detecting the user's reaction to the guidance voice; an extraction step of extracting, from the reactions detected in the detection step, the user's confirmation behavior regarding the guidance voice; an estimation step of estimating, based on the confirmation behavior, the matter the user wants to confirm; and a storage step of storing information about the matter estimated in the estimation step.
The invention according to claim 11 is characterized in that the information processing method according to claim 10 is executed by a computer as an information processing program.
The invention according to claim 11 (as numbered in the original) is characterized in that the information processing program according to claim 10 is stored in a computer-readable storage medium.
FIG. 1 is a schematic configuration diagram of an information processing device according to an embodiment of the present invention. FIG. 2 is a flowchart of the operation of the information processing device shown in FIG. 1. FIG. 3 is a schematic configuration diagram of the information processing device in Example 2. FIG. 4 is a schematic configuration diagram of the information processing device in Example 3.
An information processing device according to an embodiment of the present invention will be described below. In the information processing device of the present invention, the audio output unit outputs a guidance voice to be provided to the user. The detection unit detects the user's reaction to the guidance voice. The extraction unit extracts, from the reactions detected by the detection unit, the user's confirmation behavior regarding the guidance voice. The estimation unit estimates, based on the confirmation behavior, the matter the user wants to confirm. The storage unit stores information about the matter estimated by the estimation unit. In this way, the extraction unit can extract the user's confirmation behavior from among the user's reactions detected by the detection unit, so the information processing device can grasp that the user needs some kind of confirmation regarding the guidance voice, that is, that a situation has occurred in which the user has something to confirm, for reasons such as not being able to understand the guidance voice or information being missing from it.
Furthermore, the estimation unit estimates the matter the user wants to confirm based on the confirmation behavior, and the storage unit can store information about the matter estimated by the estimation unit, so information about the estimated matter can be accumulated and utilized. By analyzing the accumulated information, the device or system side can grasp the cause of the matters the user wants to confirm regarding the guidance voice, and use this to improve the guidance voice.
The storage unit may be configured to classify and store the information about the matters estimated by the estimation unit according to predetermined conditions. In this way, the information can be divided into categories such as "content of the guidance voice," "the voice of the guidance voice itself," "related to place names," "related to the speed of the guidance voice," and "related to the pronunciation of the guidance voice," which makes the information easier to handle when it is later utilized, for example when analyzing by category what content users needed to confirm for a given guidance voice.
The information processing device may include a route search unit that searches for a route for a mobile object, and the guidance voice may be a voice that conveys to the user the route searched by the route search unit. The confirmation behavior need not be an action directed at the route search unit. In this way, when a problem occurs, such as the user being unable to understand the guidance voice or information being missing from it, an attempt by the user to solve the problem by means other than the route search unit can be extracted as a confirmation behavior. Specifically, when a user driving a mobile object stops the vehicle on the roadside after listening to the guidance voice and searches for information using a device such as a smartphone, or when the driver instructs a passenger to re-check the route using a mobile terminal, that action can be extracted as a confirmation behavior.
This makes it even easier for the information processing device to grasp that a situation has occurred in which the user wants to confirm something about the guidance voice, and contributes to analyzing the cause of such situations and to improving the guidance voice.
The information processing device may include a route search unit that searches for a route for a mobile object, the guidance voice may be a voice that conveys to the user the route searched by the route search unit, and the route search unit, the audio output unit, the detection unit, the extraction unit, the estimation unit, and the storage unit may be provided in a terminal device that moves together with the mobile object. In this way, the information processing device can be configured with a single terminal device without using external equipment such as a server device.
Alternatively, the route search unit, the audio output unit, the extraction unit, the estimation unit, and the storage unit may be provided in a first terminal device that moves together with the mobile object, and the detection unit may be provided in a second terminal device that moves together with the mobile object and can communicate with the first terminal device. In this way, the load on each device can be reduced compared to an information processing device configured with a single terminal device.
Alternatively, the route search unit and the audio output unit may be provided in a first terminal device that moves together with the mobile object, the detection unit may be provided in a second terminal device that moves together with the mobile object and can communicate with the first terminal device, and the extraction unit, the estimation unit, and the storage unit may be provided in a server device that can communicate with the first and second terminal devices. In this way, the load on each device can be reduced compared to an information processing device configured with a single terminal device, or one configured with a first and a second terminal device.
The second terminal device may include an information search unit independent of the route search unit. In this case, the confirmation behavior may be any of: the user using the information search unit to search for a route for the mobile object; the user using the information search unit to perform a point search checking the coordinates of a specific point; and the user using the information search unit to perform a keyword search for a specific word. In this way, when a problem occurs, such as the user being unable to understand the guidance voice or information being missing from it, an attempt by the user to solve the problem using the information search unit, which is independent of the route search unit, can be extracted as a confirmation behavior. Therefore, even if the user tries to solve the problem by searching for a route on a terminal device other than the one that originally performed the route search (for example, a smartphone), the information processing device is less likely to overlook that such a problem occurred, and the information accumulated in the storage unit can be used to improve the content of the guidance voice.
The confirmation behavior may also be the user uttering a voice in response to the guidance voice, and the estimation unit may estimate the matter the user wants to confirm based on the uttered voice. In this way, the information processing device can recognize from the user's voice that the user needs to confirm something about the guidance voice, and can estimate the matter from that voice. This makes it possible, for example, to analyze data for each user on points of the guidance voice that were difficult to understand, such as "the guidance voice was too fast to catch" or "the pronunciation was poor and I couldn't catch it," and to adjust subsequent guidance voices to be optimal for each user.
 Further, the guidance voice does not need to request a response from the user. In this way, the extraction unit can extract, as confirmation actions, actions other than the user responding to the guidance voice. For example, the user muttering to themselves, such as "Where is (place name)?", "I don't really understand (place name)", "It was too fast to catch", or "The pronunciation was unclear and I couldn't catch it", or the user spontaneously instructing a passenger, such as "Look up (place name) on your smartphone" or "Check the route again on your smartphone", can be extracted as a confirmation action; that is, what is extracted is the user spontaneously uttering a voice, not the user speaking to the information processing device in response to a request for a reply. Therefore, there is no need for the information processing device to deliberately prompt the driving user to speak, and it is possible to prevent such prompting from diverting the attention of the driving user or making the user feel annoyed.
 Further, an information processing method according to an embodiment of the present invention is an information processing method executed by a computer. In this information processing method, a guidance voice to be provided to a user is output in an audio output step. Then, in a detection step, the user's reaction to the guidance voice is detected. Then, in an extraction step, the user's confirmation action regarding the guidance voice is extracted from the reactions detected in the detection step. Then, in an estimation step, the item the user wants to confirm is estimated based on the confirmation action. Then, in a storage step, information about the item estimated in the estimation step is stored. In this way, the user's confirmation action can be extracted in the extraction step from the user's reactions detected in the detection step, so the occurrence of a confirmation action can be recognized. That is, it can be recognized that the content of the guidance voice required confirmation by the user. Furthermore, since the estimation step estimates the item the user wants to confirm based on the confirmation action, and the storage step stores information about the estimated item, information about the estimated item can be accumulated and utilized. Therefore, by analyzing the accumulated information, the device or system can grasp the cause of the item the user wanted to confirm about the guidance voice, which can be used to improve the guidance voice.
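 The chain of steps described above (audio output, detection, extraction, estimation, storage) can be sketched as a simple processing pipeline. This is a minimal sketch, not the claimed method itself; all function names, record fields, and the keyword rule are illustrative assumptions.

```python
# Minimal sketch of the pipeline: detect -> extract -> estimate -> store.
# All names, record fields, and the keyword rule below are assumptions.

def detect_reactions(events):
    """Detection step: every event observed after the guidance voice counts as a reaction."""
    return list(events)

def extract_confirmation_actions(reactions):
    """Extraction step: keep only reactions that look like confirmation actions."""
    return [r for r in reactions if r.get("kind") in ("utterance", "device_use")]

def estimate_item(action):
    """Estimation step: map a confirmation action to the item the user wants to confirm."""
    if action["kind"] == "device_use":
        return "content of the guidance voice"
    if "too fast" in action.get("text", ""):
        return "speed of the guidance voice"
    return "content of the guidance voice"

storage = []  # storage step: accumulated information about estimated items

def run_pipeline(events):
    for action in extract_confirmation_actions(detect_reactions(events)):
        storage.append(estimate_item(action))
    return storage
```

A steering event passes through detection but is not a confirmation action, so only the utterance reaches the estimation and storage steps.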
 Further, the above-described information processing method may be executed by a computer as an information processing program. In this way, when a situation occurs in which there is an item the user wants to confirm about the guidance voice, that situation can be grasped using the computer. In addition, the cause of the item the user wanted to confirm about the guidance voice can be grasped so that it can be used to improve the guidance voice.
 Further, the above-described information processing program may be stored in a computer-readable storage medium. In this way, the information processing program can be distributed on its own in addition to being incorporated into a device, and version upgrades and the like can be performed easily.
[Example 1]
 FIG. 1 is a schematic configuration diagram of an information processing device 1 according to Example 1 of the present embodiment. The information processing device 1 includes, for example, a terminal device 10 (first terminal device) that moves together with a vehicle (mobile object). The terminal device 10 is, for example, a navigation device installed in the vehicle, and includes a control unit 11, an input unit 12, an output unit 13 (audio output unit), and a storage unit 14.
 The control unit 11 is composed of, for example, a CPU equipped with memory such as RAM and ROM, and controls the entire terminal device 10. The control unit 11 includes a user identification unit 110, a route search unit 111, a detection unit 112, an extraction unit 113, and an estimation unit 114, and performs identification processing for identifying the user of the terminal device 10, route search processing for searching for a route for the vehicle, audio output processing for outputting information about the searched route as a guidance voice provided to the user, and the like.
 The control unit 11 also monitors various devices mounted on the vehicle and performs detection processing for detecting the user's reaction to the guidance voice. Here, the various devices monitored by the control unit 11 are, for example, a GPS receiver, various sensors such as an acceleration sensor and a gyro sensor, an in-vehicle camera, and a microphone.
 The control unit 11 also performs extraction processing for extracting, from the user's reactions detected by the detection processing, the user's confirmation action regarding the guidance voice, and estimation processing for estimating, based on the extracted confirmation action, a confirmation item, which is an item the user wants to confirm. The control unit 11 further performs classification processing for classifying confirmation items according to predetermined conditions, storage processing for saving the classified confirmation items in the storage unit 14, and the like. Details of the user's reactions, the user's confirmation actions, and the confirmation items will be described later in the explanation of the operation of the information processing device 1.
 The input unit 12 is composed of devices such as a microphone, input buttons, and a touch panel. The input unit 12 receives the user's instructions through the user's voice or through the user's operation of the input buttons or touch panel, and transmits a signal indicating the instruction to the control unit 11.
 The output unit 13 is composed of devices such as a speaker that outputs the guidance voice and an amplifier that drives the speaker. The output unit 13 outputs information about the vehicle route found by the route search processing as a guidance voice and conveys it to the user.
 The storage unit 14 is composed of, for example, a hard disk or non-volatile memory, and stores programs and data used by the control unit 11 to perform the above-described control (for example, map data necessary for the route search processing and voice data necessary for the audio output processing), as well as confirmation data 140 (described later) indicating confirmation actions, estimated data 141 (described later) indicating confirmation items, and the like.
 Next, an example of the operation (information processing method) of the information processing device 1 configured as described above will be described with reference to the flowchart in FIG. 2. Note that this information processing method can be executed as an information processing program on a computer equipped with a CPU or the like. This information processing program may also be stored in a computer-readable storage medium.
 First, when the terminal device 10 is started, the user identification unit 110 performs user identification processing (step S100). In this step, the user who uses the terminal device 10 is identified. At this time, the user may be identified automatically by matching an in-vehicle image captured by the in-vehicle camera against images of the vehicle's driver and passengers stored in advance in the storage unit 14; alternatively, the terminal device 10 may output a question to the driver and passengers via the output unit 13, request that the user input an answer to the question into the input unit 12, and identify the user based on the answer. Furthermore, when a user terminal such as a smartphone owned by the user has been registered in the terminal device 10 in advance, the user may be identified based on the user terminal being detected by the terminal device 10 via wireless means or the like, or being directly connected to the terminal device 10.
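 The last identification variant in step S100, matching a detected user terminal against terminals registered in advance, could look like the following sketch. The registry contents and device identifiers are hypothetical, not part of the described device.

```python
# Sketch of step S100: identify the user from a pre-registered paired terminal.
# The registry contents and device identifiers below are hypothetical.

REGISTERED_TERMINALS = {
    "phone-aaa": "user_A",
    "phone-bbb": "user_B",
}

def identify_user(detected_device_ids):
    """Return the first registered user whose terminal was detected, else None."""
    for device_id in detected_device_ids:
        if device_id in REGISTERED_TERMINALS:
            return REGISTERED_TERMINALS[device_id]
    return None  # fall back to asking a question via the output unit 13
```

Returning None models the fallback path in which the device instead questions the occupants.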
 Next, the control unit 11 determines whether an instruction for route search processing has been given (step S200). In this step, the input unit 12 is first monitored, and it is determined whether an instruction for performing route search processing has been input. Specifically, the instruction for performing route search processing is the user inputting a departure point and a destination into the input unit 12; it may be given by voice, such as "Find the route from 〇〇 (departure point) to ×× (destination)", or by operating the touch panel or input buttons. If the result of the determination is YES, the process advances to the next step S300; if NO, the process waits until such an instruction is input.
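 The YES/NO judgment in step S200 amounts to recognizing a departure point and a destination in the user's voice input. A minimal pattern-matching sketch follows; the phrase pattern is an assumption for illustration, not the device's actual recognizer.

```python
import re

# Sketch of step S200: detect a route-search instruction of the form
# "... from X to Y". The phrase pattern is an illustrative assumption.
ROUTE_REQUEST = re.compile(r"from (?P<origin>.+?) to (?P<dest>.+?)(?:\.|$)")

def parse_route_instruction(text):
    """Return (origin, destination) if the text is a route request, else None."""
    m = ROUTE_REQUEST.search(text)
    if m:
        return m.group("origin"), m.group("dest")
    return None  # no instruction: keep waiting (the NO branch of step S200)
```

A None result corresponds to the NO branch, in which the control unit keeps monitoring the input unit.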
 In step S300, the route search unit 111 performs route search processing. In this step, a route from the departure point input into the input unit 12 to the destination is searched for, and guidance information about that route is created. Since the content of route search processing is well known, a detailed explanation is omitted; using the map data stored in the storage unit 14 and the vehicle's current position and current time received by the GPS receiver, a route from the departure point to the destination is searched for, and guidance information about the found route is generated. When the route search processing is complete, step S300 ends and the process advances to step S400.
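 The well-known route search that step S300 relies on is typically a shortest-path search over a map graph; a textbook Dijkstra sketch is shown below. The graph nodes and travel costs are invented for illustration and do not come from the map data described above.

```python
import heapq

# Sketch of step S300: Dijkstra shortest-path search over an illustrative map graph.
# Node names and travel costs are invented for this example.
GRAPH = {
    "start":    [("junction", 2), ("bypass", 5)],
    "junction": [("goal", 4)],
    "bypass":   [("goal", 3)],
    "goal":     [],
}

def search_route(origin, destination, graph=GRAPH):
    """Return (cost, node list) of the cheapest route, or None if unreachable."""
    queue = [(0, origin, [origin])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == destination:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph[node]:
            heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return None
```

The returned node list would then be turned into guidance information for the audio output step.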
 Step S400 corresponds to the audio output step in the above-described embodiment. In this step, the control unit 11 performs audio output processing, and the output unit 13 outputs a guidance voice. The guidance voice is a voice provided to the user, and in this case conveys the guidance information generated in step S300 described above. Through this guidance voice, the vehicle route found by the route search unit is communicated to the user.
 The guidance voice may explain the route to the user using road names and the like, for example, "Proceed along 〇〇 Road (road name), enter the □□ Expressway (expressway name) at the ×× Interchange (interchange name), and exit at the △△ Interchange (interchange name)", or using place names, for example, "This route passes through ●● (place name) and heads toward ◎◎ (place name)". The guidance voice may also explain the route using facility names, for example, "Turn right at the intersection before ◇◇ (facility name), and turn left at the traffic light past ◆◆ (facility name)", or by combining road names, place names, and facility names.
 Note that the guidance voice may include a request for a response from the user, such as "If you did not understand the guidance, please reply that you did not understand"; however, from the viewpoint of avoiding diverting the attention of the driving user or annoying the user, it is more preferable that the guidance voice does not request a response from the user. When the guidance voice has been output, step S400 ends and the process advances to step S500.
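 The example guidance sentences above suggest simple template filling from route elements. The sketch below paraphrases the road-name style example; the template wording is an assumption, not the device's actual phrasing.

```python
# Sketch of building a guidance voice text from route elements, following the
# road-name style example above. The template wording is an assumption.

def build_guidance(road, entry_ic, expressway, exit_ic):
    """Fill the road/interchange placeholders of the example sentence."""
    return (f"Proceed along {road}, enter {expressway} at {entry_ic}, "
            f"and exit at {exit_ic}.")
```

The resulting string would be passed to the output unit 13 for speech synthesis or playback.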
 Step S500 is the detection step in the above-described embodiment. In this step, the detection unit 112 performs detection processing. Here, first, devices such as the GPS receiver, various sensors such as the acceleration sensor and gyro sensor, the in-vehicle camera, and the microphone are monitored. Through these devices, sounds, user actions, the behavior of the vehicle operated by the user, and the like that occur after the guidance voice is output are detected as the user's reactions. Each detected reaction is turned into reaction data indicating that reaction, and is temporarily saved in the storage unit 14 in a ring buffer format. In this way, in step S500, the detection unit 112 detects the user's reactions to the guidance voice, and those reactions are temporarily saved in the storage unit 14. The process then advances to step S600.
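 The temporary "ring buffer format" saving of reaction data can be sketched with a fixed-capacity deque that silently overwrites the oldest entries, in contrast to the permanent storage performed later in step S800. The capacity is an assumption.

```python
from collections import deque

# Sketch of step S500's temporary storage: a ring buffer in which the oldest
# reaction data is overwritten once capacity is reached. Capacity is an assumption.
class ReactionBuffer:
    def __init__(self, capacity=4):
        self._buf = deque(maxlen=capacity)

    def save(self, reaction):
        self._buf.append(reaction)  # silently drops the oldest entry when full

    def snapshot(self):
        """Return the currently held reaction data, oldest first."""
        return list(self._buf)
```

Overwritability is the point: reaction data is disposable unless the extraction step promotes it to confirmation data 140.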
 Step S600 corresponds to the extraction step in the above-described embodiment. In this step, the extraction unit 113 performs extraction processing. Here, first, the storage unit 14 is monitored for a predetermined time. The predetermined time may be, for example, about one to three minutes after the guidance voice is output. After this monitoring, it is determined whether data indicating a confirmation action by the user regarding the guidance voice is found among the temporarily saved reaction data.
 When data indicating a confirmation action is found, that data is extracted as confirmation data 140. In this way, the extraction unit 113 extracts the user's confirmation action regarding the guidance voice from the reactions detected by the detection unit 112. If confirmation data 140 could be extracted (YES), the process advances to step S700; if confirmation data 140 could not be extracted (NO), the operation of the terminal device 10 ends.
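 The extraction in step S600, scanning reaction data recorded within a fixed window after the guidance voice for entries that qualify as confirmation actions, could look like the following sketch. The window length matches the upper end of the one-to-three-minute figure above; the record field names and the set of qualifying kinds are assumptions.

```python
# Sketch of step S600: from timestamped reaction data, keep only entries that
# occurred within the monitoring window and whose kind marks a confirmation
# action. Field names and the kind set are illustrative assumptions.

WINDOW_SECONDS = 180  # upper end of the 1-3 minute monitoring window

CONFIRMATION_KINDS = {
    "vehicle_stopped_and_device_used",
    "utterance",
    "passenger_instruction",
}

def extract_confirmation_data(reactions, guidance_time, window=WINDOW_SECONDS):
    """Return reaction records inside the window that indicate a confirmation action."""
    return [
        r for r in reactions
        if 0 <= r["time"] - guidance_time <= window
        and r["kind"] in CONFIRMATION_KINDS
    ]
```

An empty result corresponds to the NO branch, where the terminal device simply ends its operation.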
 Here, a confirmation action extracted as confirmation data 140 is an action a user who has heard the guidance voice takes when some item the user wants to confirm about that guidance voice has arisen, that is, an action indicating that the user could not understand the guidance voice for some reason, that the guidance voice lacked some information, or the like. A confirmation action is, for example, the user "stopping the vehicle on the road shoulder and operating a device such as a smartphone"; this confirmation action is found and extracted by the extraction unit 113 when, for example, the reaction data created by the detection unit 112 includes data indicating that the vehicle has stopped together with video data of the user operating a device such as a smartphone.
 A confirmation action may also be the user "muttering to themselves", such as "Where is ●● (place name)?", "I don't really understand ●● (place name)", "It was too fast to catch", or "The pronunciation was unclear and I couldn't catch it", or "instructing a passenger", such as "Look up ●● (place name) on your smartphone" or "Check the route again on your smartphone"; these confirmation actions are found and extracted by the extraction unit 113 when, for example, the reaction data created by the detection unit 112 includes voice data indicating such muttering or instructions to a passenger.
 In this way, confirmation actions include actions other than re-searching for a route using the route search unit 111, that is, actions not directed at the route search unit 111 that originally performed the route search, and such actions are also extracted by the extraction unit 113. In addition, as described above, the user uttering a voice in response to the guidance voice is also extracted as a confirmation action, and such actions include spontaneous actions other than replying to the guidance voice, such as "muttering to oneself" and "instructing a passenger".
 Step S700 corresponds to the estimation step in this example. In this step, the estimation unit 114 performs estimation processing. Here, a confirmation item, which is an item the user wants to confirm, is estimated based on the confirmation data 140 (that is, the extracted confirmation action). For example, if the content of the confirmation data 140 is "stopping the vehicle on the road shoulder and operating a device such as a smartphone", the user has interrupted driving to operate a device such as a smartphone, so it is highly likely that the user is looking something up about the guidance voice. At the same time, the fact that the user is likely looking something up means that the user most likely heard the guidance voice itself. Therefore, when such confirmation data is extracted, the confirmation item is estimated to be the "content of the guidance voice".
 The same applies when confirmation data 140 with the content "instructing a passenger" is extracted. Since the user is giving instructions to a passenger, the user heard the guidance voice; however, because the user is having the passenger re-check the route using a separate device, namely a smartphone, it can be estimated that the user did not understand what the guidance voice conveyed. The same also applies when confirmation data 140 with the content "muttering to oneself", such as "Where is ●● (place name)?" or "I don't really understand ●● (place name)", is extracted.
 In contrast, when confirmation data 140 with the content "muttering to oneself" such as "It was too fast to catch" or "The pronunciation was unclear and I couldn't catch it" is extracted, it is highly likely that the user could not accurately hear the voice itself, let alone the content of the guidance voice. Therefore, when confirmation data 140 with such content is extracted, the confirmation item is estimated to be the "voice itself of the guidance voice". In this way, in step S700, the confirmation item may be estimated based on a confirmation action such as the user using another device after hearing the guidance voice, or uttering a voice in response to the guidance voice.
 Also in step S700, in both the case where the confirmation item is the "content of the guidance voice" and the case where it is the "voice itself of the guidance voice", the confirmation item may be specifiable based on the voice uttered by the user, and in that case such specification is performed. For example, if the content of the confirmation data 140 is "muttering to oneself" such as "Where is ●● (place name)?" or "I don't really understand ●● (place name)", the confirmation item is specified as "relating to a place name"; if the content of the confirmation data 140 is "muttering to oneself" such as "It was too fast to catch", the confirmation item is specified as "relating to the speed of the guidance voice"; and if the content of the confirmation data 140 is "muttering to oneself" such as "The pronunciation was unclear and I couldn't catch it", the confirmation item is specified as "relating to the pronunciation of the guidance voice".
 When these confirmation items have been estimated or specified, the estimation unit 114 creates estimated data 141 as information about the confirmation items. The estimated data 141 indicates the confirmation items. Step S700 then ends, and the process advances to step S800.
 Step S800 is the storage step in the above-described embodiment. In this step, the control unit 11 performs classification processing and storage processing. Here, first, the estimated data 141 is classified according to predetermined conditions. In this case, the estimated data 141 can be classified by, for example, deciding on a condition of sorting by the types of confirmation items described above ("content of the guidance voice", "voice itself of the guidance voice", "relating to a place name", "relating to the speed of the guidance voice", "relating to the pronunciation of the guidance voice") and assigning a classification code to each type.
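 Assigning a classification code per confirmation-item type, as described for step S800, might look like the following sketch. The code values are invented; the category labels paraphrase the types listed above.

```python
# Sketch of step S800's classification: attach a classification code to each
# piece of estimated data based on its confirmation-item type. Codes are invented.

CLASSIFICATION_CODES = {
    "content of the guidance voice": "C01",
    "voice itself of the guidance voice": "C02",
    "relating to a place name": "C03",
    "relating to the speed of the guidance voice": "C04",
    "relating to the pronunciation of the guidance voice": "C05",
}

def classify(estimated_items):
    """Return (code, item) pairs; unknown types get the catch-all code 'C99'."""
    return [(CLASSIFICATION_CODES.get(item, "C99"), item) for item in estimated_items]
```

Coding the categories this way makes later per-category analysis a simple group-by over the stored records.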
 When the classification of the estimated data 141 is complete, storage processing is performed to save the classified estimated data 141 in the storage unit 14. In this storage processing, unlike the temporary saving of the reaction data described above, the estimated data 141 is saved as data that cannot be overwritten. At this time, the estimated data 141 may simply be saved as is, but, for example, the data of the user identified by the user identification unit 110, the data of the instruction input into the input unit 12, the data of the guidance voice output by the output unit 13, the confirmation data 140 from which the estimated data 141 was created, and the current position and current time data received by the GPS receiver may be attached to the estimated data 141 and saved together with it. In this way, the reason why the guidance voice was not understood by the user, and the like, can be analyzed from various viewpoints. The operation of the terminal device 10 then ends.
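 The non-overwritable saving with attached metadata (user, instruction, guidance voice, confirmation data, position, time) could be sketched as building append-only records. All field names are assumptions for illustration.

```python
# Sketch of step S800's storage: append-only records bundling the estimated
# data with the contextual metadata listed above. Field names are assumptions.

PERSISTENT_STORE = []  # stands in for the non-volatile part of storage unit 14

def save_estimated_data(item, user, instruction, guidance, confirmation, position, time):
    """Append one non-overwritable record combining the estimate and its context."""
    record = {
        "confirmation_item": item,
        "user": user,
        "instruction": instruction,
        "guidance_voice": guidance,
        "confirmation_data": confirmation,
        "position": position,
        "time": time,
    }
    PERSISTENT_STORE.append(record)  # never overwritten, unlike the ring buffer
    return record
```

Keeping the context alongside each estimate is what makes the later per-user and per-location analyses possible.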
 According to Example 1, the extraction unit 113 can extract the user's confirmation action from the user's reactions detected by the detection unit 112, so the information processing device 1 can recognize that the user needed to confirm something about the guidance voice, that is, that a situation occurred in which there was an item the user wanted to confirm because, for example, the user could not understand the guidance voice or the guidance voice lacked some information. In addition, the estimation unit 114 estimates the confirmation item (item) the user wants to confirm based on the confirmation action, and the storage unit can store the estimated data 141 (information about the confirmation item estimated by the estimation unit 114), so confirmation items can be accumulated and utilized. Therefore, by analyzing the accumulated estimated data 141, the device or system can grasp the cause of the item the user wanted to confirm about the guidance voice, which can be used to improve the guidance voice.
 In this case, if the above-described user data, the data of the instruction input into the input unit 12, the data of the guidance voice output by the output unit 13, the confirmation data 140 from which the estimated data 141 was created, and the current position and current time data received by the GPS receiver are attached to the estimated data 141, it becomes possible, for example, to examine the relationship between the guidance voice and the confirmation items for each user; accordingly, the guidance voice can be optimized to suit the characteristics of each user, and expressions in the guidance voice that many users find difficult to understand can be investigated. It also becomes possible to examine the relationships among the user's information, the current position and current time, and the confirmation items, so points and routes at which the guidance voice is difficult for many users to understand can also be identified.
 Furthermore, the information about the confirmation items estimated by the estimation unit 114 can be divided into categories such as "content of the guidance voice", "voice itself of the guidance voice", "relating to a place name", "relating to the speed of the guidance voice", and "relating to the pronunciation of the guidance voice". Therefore, when the items stored in the storage unit 14 are utilized in the future, for example when analyzing, category by category, what content users needed to confirm about a given guidance voice, handling of those items can be made easier.
 In addition, when a problem occurs such as the user not being able to understand the guidance voice or the guidance voice lacking some information as described above, the user's attempt to resolve the problem by means other than using the route search unit 111 can be extracted as a confirmation action. Specifically, when the user driving the vehicle stops the vehicle on the road shoulder after hearing the guidance voice and operates a device such as a smartphone, or when the driver instructs a passenger to re-check the route using a mobile terminal, that is, when the user attempts to resolve the above-described problem by means other than using the route search unit 111, that action can be extracted as a confirmation action. This makes it even easier for the information processing device 1 to recognize that a situation has occurred in which there is an item the user wants to confirm about the guidance voice, which can contribute to analyzing the cause of such items and to improving the guidance voice.
 Furthermore, since the information processing device 1 can be configured from a single terminal device 10 without using external equipment such as a server device, the configuration of the information processing device 1 can be kept simple.
 Furthermore, the information processing device 1 can recognize, from the voice uttered by the user, that the user needs to confirm something about the guidance voice, and can estimate the confirmation item based on that utterance. Therefore, data on the points that were hard to understand in the guidance voice, for example "the guidance voice was too fast to catch" or "the pronunciation of the guidance voice was too poor to catch", can be analyzed for each user, and subsequent guidance voices can be adjusted so that they are optimal for each user.
 Furthermore, according to the present example, the extraction unit 113 can extract, as confirmation behaviors, behaviors other than the user replying to the guidance voice. For example, the user muttering to himself or herself "Where is ●● (place name)?", "I'm not sure about ●● (place name)", "It was too fast to catch", or "The pronunciation was too poor to catch", or the user spontaneously instructing a passenger "Look up ●● (place name) on your smartphone" or "Check the route again on your smartphone", can be extracted as confirmation behaviors. In other words, what is extracted is a voice uttered spontaneously, not speech addressed to the information processing device 1 in response to a request for a reply. Therefore, the information processing device 1 does not need to deliberately prompt the driving user to speak, which avoids distracting the user from driving or making the user feel bothered by such prompts.
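 As a rough illustration of how spontaneous utterances such as those above might be flagged as confirmation behaviors, the recognized text could be matched against characteristic phrases. The phrase list and function names below are assumptions made for this sketch; the publication does not disclose a specific matching method.

```python
# Hypothetical phrase-matching sketch: flag spontaneous utterances that
# suggest the user needed to confirm something about the guidance voice.
CONFIRMATION_PHRASES = [
    "where is",            # "Where is XX (place name)?"
    "not sure about",      # "I'm not sure about XX"
    "too fast",            # "It was too fast to catch"
    "pronunciation",       # "The pronunciation was too poor to catch"
    "look up",             # instruction to a passenger
    "check the route",     # instruction to a passenger
]

def is_confirmation_utterance(text: str) -> bool:
    """True if the utterance matches any characteristic confirmation phrase."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in CONFIRMATION_PHRASES)

def extract_confirmations(utterances):
    """Keep only utterances that look like confirmation behaviors."""
    return [u for u in utterances if is_confirmation_utterance(u)]
```

In practice such matching would operate on speech-recognition output (in the user's language) and could be replaced by a learned classifier; the list above only conveys the idea.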
[Example 2]
 Next, the information processing device 1 according to Example 2 of the present embodiment will be described with reference to FIG. 3. The same parts as in Example 1 described above are given the same reference numerals, and their description is omitted or simplified.
 As shown in FIG. 3, the information processing device 1 according to the present example includes a terminal device 10 (first terminal device) and a mobile terminal 20 (second terminal device).
 The terminal device 10 is a device that moves with the vehicle as in Example 1, and has the same configuration as the terminal device 10 of Example 1 except that it does not have the detection unit 112 and does have a communication unit 15.
 The mobile terminal 20 is a device that moves with the vehicle (mobile object) and is provided so as to be able to communicate with the terminal device 10. The mobile terminal 20 is, for example, a device such as a smartphone or tablet, and includes a control unit 21 and a communication unit 22.
 The control unit 21 is composed of, for example, a CPU with memory such as RAM and ROM, and controls the mobile terminal 20 as a whole. The control unit 21 includes a detection unit 211 and an information search unit 212. The detection unit 211 corresponds to the detection unit 112 in Example 1.
 The information search unit 212 performs information searches independently of the route search unit 111 of the terminal device 10, and is provided so as to be able to execute, independently of the route search unit 111, route search processing (route search) that searches for a route for the vehicle, point search processing (point search) that looks up the coordinates of a specific point, and keyword search processing (keyword search) that searches on a specific word.
 Note that the information search unit 212 may use a known search engine used on the World Wide Web, and the search results may be output as images or audio by an output unit (not shown) provided in the mobile terminal 20.
 The communication unit 22 is provided so as to be able to communicate with the communication unit 15 of the terminal device 10, so that the reaction data created by the detection unit 211 can be transmitted from the mobile terminal 20 to the terminal device 10.
 Next, the operation (information processing method) of the information processing device 1 with this configuration will be described. Although some of the processing is performed by a different device than in Example 1, the flow (steps) of the operation is the same as in Example 1, and is therefore described with reference to FIG. 2.
 First, in step S100, in addition to the user identification processing, processing for linking the mobile terminal 20 with the terminal device 10 is performed. Here, authentication is performed to determine whether the mobile terminal 20 is a terminal that can be linked with the terminal device 10; if the link is approved, the terminal device 10 and the mobile terminal 20 become able to communicate. This allows the terminal device 10 to grasp the user's reactions detected by the detection unit 211 of the mobile terminal 20 and the content of information searches performed by the information search unit 212. The process then advances to step S200.
 Steps S200 to S400 are the same as in Example 1, and their description is omitted. In step S500, detection processing is performed. Here, in addition to the configuration of Example 1, the information search unit 212 of the mobile terminal 20 is monitored, and operation of the information search unit 212 occurring after the guidance voice has been output is detected as a user reaction.
 In step S600, in addition to the examples of Example 1, the user "having performed a search action using the information search unit 212" is treated as a confirmation behavior. This confirmation behavior is found by the extraction unit 113 and extracted as confirmation data 140 when, for example, the reaction data created by the detection unit 211 contains data indicating that the information search unit 212 operated for a route search, point search, or keyword search. That is, in addition to the confirmation behaviors of Example 1, the confirmation behaviors include the user using the information search unit 212 to search for a vehicle route, the user using the information search unit 212 to perform a point search to look up the coordinates of a specific point, and the user using the information search unit 212 to perform a keyword search on a specific word.
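 The extraction performed in step S600 might look like the following sketch; the reaction-data schema (dictionaries with "type" and "after_guidance" fields) is an assumption made for illustration, not the disclosed format.

```python
# Hypothetical sketch of step S600: pull search actions out of the
# reaction data sent from the mobile terminal 20. The dict schema is assumed.
SEARCH_TYPES = {"route_search", "point_search", "keyword_search"}

def extract_confirmation_data(reaction_data):
    """Return reactions indicating the information search unit 212 operated
    after the guidance voice was output (treated as confirmation behaviors)."""
    return [
        r for r in reaction_data
        if r.get("type") in SEARCH_TYPES and r.get("after_guidance", False)
    ]
```

Filtering on both the action type and the timing relative to the guidance voice keeps unrelated terminal activity (for example a volume change, or a search made before guidance) out of the confirmation data 140.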
 According to Example 2, when a problem occurs such as the user being unable to understand the guidance voice or the guidance voice lacking some information, the fact that the user tried to solve the problem using the information search unit 212, which is independent of the route search unit 111, can be extracted as a confirmation behavior. Therefore, even when, for example, a user driving the vehicle stops the vehicle on the shoulder after hearing the guidance voice and searches for information on a device such as a smartphone, that behavior can be extracted as a confirmation behavior. Moreover, since the terminal device 10 and the mobile terminal 20 can be linked, the confirmation item can be estimated with reference to the search words (keywords) entered into the information search unit 212 and the output method of the search results (that is, audio or image), so that what the user wanted to confirm can be grasped in more detail.
 Therefore, for example, the user performing a route search with the information search unit 212 and displaying the result on a display unit (not shown) can be treated as a confirmation behavior and extracted as confirmation data 140; in this case it can be inferred that the user could not picture the route from the guidance voice and wanted to confirm the route with an image. Likewise, the user performing a keyword search with the information search unit 212 can be treated as a confirmation behavior and extracted as confirmation data 140; in this case, if the search string of the keyword search is the name of a facility, a place name, or the name of a road, it can be inferred that the user could not understand the facility name, place name, or road name in the guidance voice.
 Moreover, in this case, especially if the keyword search string is in hiragana or katakana, it can be inferred that the user did not know the facility name, place name, or road name. Also, if the user has performed keyword searches for the same facility name, place name, or road name multiple times, it can be inferred that the user can think of multiple locations with the same sound as the announced facility name, place name, or road name.
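 The inferences described here can be sketched as below; the function names and the repetition threshold are assumptions for illustration. Hiragana occupies U+3040 to U+309F and katakana U+30A0 to U+30FF in Unicode, which makes an all-kana check straightforward.

```python
# Hypothetical sketch of the inferences described above: an all-kana query
# suggests the user did not know the written name; a repeated query suggests
# the user can think of several same-sounding places.
from collections import Counter

def is_all_kana(text: str) -> bool:
    """True if every character is hiragana or katakana (spaces ignored)."""
    chars = [c for c in text if not c.isspace()]
    return bool(chars) and all(
        "\u3040" <= c <= "\u309f" or "\u30a0" <= c <= "\u30ff" for c in chars
    )

def infer_from_searches(queries):
    """Map each search query to an inference about why the user searched."""
    counts = Counter(queries)
    inferences = {}
    for query, n in counts.items():
        if is_all_kana(query):
            inferences[query] = "user likely did not know the name"
        elif n > 1:
            inferences[query] = "user may know multiple same-sounding places"
    return inferences
```

A query typed in kana (e.g. うえの) rather than kanji (上野) is treated as a sign the user heard the name but could not write it, matching the inference in the text above.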
 In this way, even when the user tries to solve the problem by searching for information on a mobile terminal 20 such as a smartphone (a terminal device other than the terminal device 10 that originally performed the route search), it becomes difficult for the information processing device 1 to overlook that the problem occurred, and the estimated data 141 accumulated in the storage unit 14 can be utilized to improve the content of the guidance voice. Furthermore, since the information processing device 1 can be configured from the terminal device 10 and the mobile terminal 20, the load on each device can be reduced compared with an information processing device 1 configured from a single terminal device 10.
[Example 3]
 Next, the information processing device 1 according to Example 3 of the present embodiment will be described with reference to FIG. 4. The same parts as in Examples 1 and 2 described above are given the same reference numerals, and their description is omitted or simplified.
 As shown in FIG. 4, the information processing device 1 according to the present example includes a terminal device 10 (first terminal device), a mobile terminal 20 (second terminal device), and a server device 30.
 The terminal device 10 is a device that moves with the vehicle as in Example 1, and has the same configuration as the terminal device 10 of Example 2 except that it does not include the extraction unit 113, the estimation unit 114, or the storage unit 14. The mobile terminal 20 has the same configuration as the mobile terminal 20 of Example 2.
 The server device 30 is provided so as to be able to communicate with the terminal device 10 and the mobile terminal 20, and includes a control unit 31, a storage unit 32, and a communication unit 33. The control unit 31 includes an extraction unit 310 and an estimation unit 311. The extraction unit 310, the estimation unit 311, and the storage unit 32 correspond to the extraction unit 113, the estimation unit 114, and the storage unit 14 in Examples 1 and 2, respectively, and have the same functions as those components. The communication unit 33 is provided so as to be able to communicate with the communication unit 15 and the communication unit 22, and can transmit and receive the above-described reaction data, confirmation data 140, and estimated data 141.
 With this arrangement, the load on each device can be reduced compared with an information processing device 1 configured from a single terminal device 10 or an information processing device 1 configured from the terminal device 10 (first terminal device) and the mobile terminal 20 (second terminal device).
 Note that the present invention is not limited to the above examples. That is, those skilled in the art can implement various modifications based on conventionally known knowledge without departing from the gist of the present invention. Such modifications are, of course, included within the scope of the present invention as long as they still comprise the information processing device of the present invention.
 For example, the items the user wants to confirm that are estimated in the estimation step may include "detailed information about a facility". Specifically, first, in the audio output step, information about the destination facility is announced by voice. Then, when information about the facility is announced, a behavior such as "the user looked up detailed information about the facility on a smartphone or the like" is extracted as a confirmation behavior. From that confirmation behavior, it is estimated that the information the user needed (or that was missing from the guidance voice) was "detailed information about the facility". In this way, the device or system can grasp more of the information that the user needed (or that was missing from the voice guidance).
 1   Information processing device
 13  Output unit (audio output unit)
 14  Storage unit
 112 Detection unit
 113 Extraction unit
 114 Estimation unit

Claims (12)

  1.  An information processing device comprising:
     an audio output unit that outputs a guidance voice to be provided to a user;
     a detection unit that detects a reaction of the user to the guidance voice;
     an extraction unit that extracts, from among the reactions detected by the detection unit, a confirmation behavior of the user regarding the guidance voice;
     an estimation unit that estimates an item the user wants to confirm based on the confirmation behavior; and
     a storage unit that stores information about the item estimated by the estimation unit.
  2.  The information processing device according to claim 1, wherein the storage unit classifies and stores the information about the items estimated by the estimation unit according to predetermined conditions.
  3.  The information processing device according to claim 1, further comprising a route search unit that searches for a route of a mobile object,
     wherein the guidance voice is a voice that conveys to the user the route of the mobile object searched for by the route search unit, and
     the confirmation behavior is not a behavior directed toward the route search unit.
  4.  The information processing device according to claim 1, further comprising a route search unit that searches for a route of a mobile object,
     wherein the guidance voice is a voice that conveys to the user the route of the mobile object searched for by the route search unit, and
     the route search unit, the audio output unit, the detection unit, the extraction unit, the estimation unit, and the storage unit are provided in a terminal device that moves together with the mobile object.
  5.  The information processing device according to claim 1, further comprising a route search unit that searches for a route of a mobile object,
     wherein the guidance voice is a voice that conveys to the user the route of the mobile object searched for by the route search unit,
     the route search unit, the audio output unit, the extraction unit, the estimation unit, and the storage unit are provided in a first terminal device that moves together with the mobile object, and
     the detection unit is provided in a second terminal device that moves together with the mobile object and can communicate with the first terminal device.
  6.  The information processing device according to claim 1, further comprising a route search unit that searches for a route of a mobile object,
     wherein the guidance voice is a voice that conveys to the user the route of the mobile object searched for by the route search unit,
     the route search unit and the audio output unit are provided in a first terminal device that moves together with the mobile object,
     the detection unit is provided in a second terminal device that moves together with the mobile object and can communicate with the first terminal device, and
     the extraction unit, the estimation unit, and the storage unit are provided in a server device that can communicate with the first terminal device and the second terminal device.
  7.  The information processing device according to claim 5 or 6, wherein the second terminal device includes an information search unit independent of the route search unit, and
     the confirmation behavior is any of: the user using the information search unit to search for a route of the mobile object; the user using the information search unit to perform a point search to look up the coordinates of a specific point; and the user using the information search unit to perform a keyword search on a specific word.
  8.  The information processing device according to claim 1, wherein the confirmation behavior is the user uttering a voice in response to the guidance voice, and
     the estimation unit estimates the item based on the voice uttered by the user.
  9.  The information processing device according to claim 1, wherein the guidance voice does not request a response from the user.
  10.  An information processing method executed by a computer, the method comprising:
     an audio output step of outputting a guidance voice to be provided to a user;
     a detection step of detecting a reaction of the user to the guidance voice;
     an extraction step of extracting, from among the reactions detected in the detection step, a confirmation behavior of the user regarding the guidance voice;
     an estimation step of estimating an item the user wants to confirm based on the confirmation behavior; and
     a storage step of storing information about the item estimated in the estimation step.
  11.  An information processing program that causes a computer to execute the information processing method according to claim 10.
  12.  A computer-readable storage medium storing the information processing program according to claim 11.
PCT/JP2022/019144 2022-04-27 2022-04-27 Information processing device WO2023209888A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/019144 WO2023209888A1 (en) 2022-04-27 2022-04-27 Information processing device


Publications (1)

Publication Number Publication Date
WO2023209888A1

Family

ID=88518361

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/019144 WO2023209888A1 (en) 2022-04-27 2022-04-27 Information processing device

Country Status (1)

Country Link
WO (1) WO2023209888A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09329456A (en) * 1996-06-12 1997-12-22 Alpine Electron Inc Navigation system
JP2004053541A (en) * 2002-07-24 2004-02-19 Mazda Motor Corp Route guidance apparatus, route guidance method, and program for route guidance
JP2010038821A (en) * 2008-08-07 2010-02-18 Sharp Corp Information processor and information processing method
US20220005469A1 (en) * 2018-09-27 2022-01-06 Bayerische Motoren Werke Aktiengesellschaft Providing Interactive Feedback, on a Spoken Announcement, for Vehicle Occupants

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MORIMOTO, Yosuke; MORI, Hiroki: "An automated speech guidance that is aware of listener's response", Materials from the 87th Language/Speech Understanding and Dialogue Processing Study Group (SIG-SLUD-B902), 20 November 2019, pages 99-100, XP093103173 *


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22940166

Country of ref document: EP

Kind code of ref document: A1