WO2020194915A1

WO2020194915A1 - Control data generation device, user device, and information processing system

Info

Publication number: WO2020194915A1
Application number: PCT/JP2019/049097
Authority: WO
Inventors: 田中　彰; 翔七尾; 充弘小形; 誠村▲崎▼; 昇悟池田; 広樹石塚
Original assignee: 株式会社Ｎｔｔドコモ
Priority date: 2019-03-28
Filing date: 2019-12-16
Publication date: 2020-10-01
Also published as: JP7369181B2; JPWO2020194915A1

Abstract

In the present invention, a server device 100A comprises: an acquisition unit 120A that acquires log data in which a time is associated with operation content including an operation performed using the voice of a user at a user device 200A; an estimation unit 122A that, on the basis of log data LG, estimates a non-operated time slot during which no operations using voice are performed, from among a plurality of time slots obtained by dividing up one day; and a control data generation unit 124 that generates control data for instructing that a sound input device 252 which receives voice input be turned off during the non-operated time slot.

Description

Control data generator, user device and information processing system

The present invention relates to reduction of power consumption in a user device.

Conventionally, a technique of recognizing a user's voice and controlling a device based on the recognition result has been known. Since the user's voice is input using the sound input device, it is necessary to supply electric power to the sound input device in order to perform the operation by voice. Patent Document 1 discloses a device including a normal mode in which power is supplied to the sound input device to wait for a user's voice and a power saving mode in which power is not supplied to the sound input device. A device having two modes selects a mode by referring to a table that specifies a normal mode and a power saving mode for each time zone.

Japanese Unexamined Patent Publication No. 2014-212641

However, in the conventional technology, since the time zone for selecting the power saving mode is predetermined, it is not possible to set the time for selecting the power saving mode for each user. That is, in the conventional technique, it has not been possible to reduce the power consumption of the user device by grasping the tendency of the operation by voice for each user and controlling the sound input device to the off state.

In order to solve the above problems, a control data generation device according to a preferred embodiment of the present invention includes: an acquisition unit that acquires log data in which operation content, including operations by the user's voice on a user device, is associated with time; an estimation unit that, based on the log data, estimates an unoperated time zone in which no voice operation is performed, from among a plurality of time zones that divide a day; and a control data generation unit that generates control data instructing that a sound input device that accepts voice input be turned off during the unoperated time zone.

Further, an information processing system according to a preferred embodiment of the present invention includes a user device managed by a user and a server device capable of communicating with the user device. The user device includes: a sound input device that accepts input of the user's voice; a control unit that turns off the sound input device based on control data; and a first communication device that transmits, to the server device, log data in which operation content, including operations by the user's voice on the user device, is associated with time, and that receives the control data transmitted from the server device. The server device includes: a second communication device that receives the log data transmitted from the user device and transmits the control data to the user device; an estimation unit that, based on the log data, estimates an unoperated time zone in which no voice operation is performed, from among a plurality of time zones that divide a day; and a control data generation unit that generates the control data instructing that the sound input device that accepts the voice input be turned off during the unoperated time zone.

According to the present invention, it is possible to reduce the power consumption of the user device by grasping the tendency of the operation by voice for each user and controlling the sound input device to the off state.

FIG. 1 is a block diagram showing the overall configuration of the information processing system according to the first embodiment of the present invention.
FIG. 2 is an explanatory diagram showing an example of log data.
FIG. 3 is an explanatory diagram showing the process by which the learning unit generates teacher data.
FIG. 4 is a flowchart showing an operation example of the information processing system according to the first embodiment.
FIG. 5 is a block diagram showing a configuration example of the server device according to the second embodiment.
FIG. 6 is an explanatory diagram showing an example of action data.
FIG. 7 is a flowchart showing an operation example of the information processing system according to the second embodiment.
FIG. 8 is a block diagram showing a configuration example of the user device according to the third embodiment.
FIG. 9 is a block diagram showing the overall configuration of the information processing system according to the fourth embodiment.
FIG. 10 is a flowchart showing an operation example of the information processing system according to the fourth embodiment.

[1. First Embodiment]
FIG. 1 is a block diagram showing an overall configuration of the information processing system 10 according to the first embodiment. As illustrated in FIG. 1, the information processing system 10 includes a server device 100A and a user device 200A owned by the user. In the following description, a smartphone is assumed as the user device 200A. However, any information processing device can be adopted as the user device 200A. For example, the user device 200A may be a portable information terminal such as a notebook computer, a wearable device, or a tablet terminal.
Further, the user device 200A has a voice operation function capable of controlling the operation by the voice of the user.

[1-1. Server device]
The server device 100A includes a processing device 110, a storage device 130, and a communication device 140. Each element of the server device 100A is connected to each other by a single bus or a plurality of buses for communicating information. The term "device" in the present specification may be read as another term such as a circuit, a device or a unit. Further, each element of the server device 100A and the user device 200A may be composed of a single device or a plurality of devices. Some elements of the server device 100A and the user device 200A may be omitted.

The processing device 110 is a processor that controls the entire server device 100A, and is composed of, for example, a single chip or a plurality of chips. The processing device 110 is composed of, for example, a central processing unit (CPU) including an interface with peripheral devices, an arithmetic unit, registers, and the like. A part or all of the functions of the processing device 110 may be realized by hardware such as a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), or an FPGA (Field Programmable Gate Array). The processing device 110 executes various processes in parallel or sequentially.

The storage device 130 is a recording medium that can be read by the processing device 110. It stores a plurality of programs, including the control program P1 executed by the processing device 110, and various data used by the processing device 110, such as the learning model M1 and the log data LG.

The log data LG is generated by the user device 200A and transmitted to the server device 100A. The log data LG is data in which the operation content and the time in the user device 200A are associated with each other. The operation content includes the operation by the user's voice. The user's voice operation is not always performed.
In the following description, of the plurality of time zones Tz in which one day is divided, the time zone Tz in which no voice operation is performed is referred to as an unoperated time zone Tx. In the present embodiment, one day is divided into 72 time zones Tz1 to Tz72 (see FIG. 3). The time width is 20 minutes in each of the 72 time zones Tz1 to Tz72. When each time zone Tz1 to Tz72 is not distinguished, any time zone is simply referred to as a time zone Tz. Further, the number of time zones Tz is not limited to 72, and may be 2 or more. Further, the time width of each time zone Tz may be different. For example, 3 hours from 2:00 to 5:00, when the user is likely to be sleeping, may be assigned to one time zone Tz.
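
The division of one day into time zones Tz can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the sketch numbers the time zones from 0 (an assumption made here), so that 08:00 to 08:20 has index 24.

```python
from datetime import time

SLOTS_PER_DAY = 72                       # number of time zones in the present embodiment
SLOT_MINUTES = 24 * 60 // SLOTS_PER_DAY  # 20 minutes per time zone


def slot_index(t: time) -> int:
    """Return the zero-based index (0..71) of the time zone containing t."""
    return (t.hour * 60 + t.minute) // SLOT_MINUTES


def slot_bounds(index: int) -> tuple[time, time]:
    """Return (start, end) of a time zone; the last one ends at 00:00 of the next day."""
    start = index * SLOT_MINUTES
    end = (start + SLOT_MINUTES) % (24 * 60)
    return time(start // 60, start % 60), time(end // 60, end % 60)
```

Time zones of unequal width, such as a single three-hour time zone from 2:00 to 5:00, would need a table of boundaries instead of the fixed division used here.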

The storage device 130 may be composed of at least one of, for example, a ROM (Read Only Memory), an EPROM (Erasable Programmable ROM), an EEPROM (Electrically Erasable Programmable ROM), a RAM (Random Access Memory), and a flash memory. The storage device 130 may be called a register, a cache, a main memory (main storage device), or the like.

The communication device 140 is hardware (transmission / reception device) for communicating with other devices. The communication device 140 is also called, for example, a network device, a network controller, a network card, a communication module, or the like.

The processing device 110 functions as an acquisition unit 120A, an estimation unit 122A, a control data generation unit 124, and a transmission control unit 126 by reading the control program P1 from the storage device 130 and executing the program. The control program P1 may be transmitted from another device via the network.

The acquisition unit 120A acquires the log data LG by using the communication device 140. FIG. 2 shows an example of log data LG. The log data LG shown in FIG. 2 shows the operation contents of the user device 200A in the time zone from 8:00 am to 8:40 am among the operation contents of the user device 200A on March 11, 2019. In this example, the log data LG includes records r1 to r14. For example, record r1 indicates that the user device 200A was unlocked at 8:00. The record r5 also shows that the voice operation was performed at 8:12. The records r1 to r8 belong to the time zone Tz24, and the records r9 to r14 belong to the time zone Tz25. As described above, in the example of FIG. 2, the log data LG is data having a plurality of sets (plural records) of the operation content and the time in the user device 200A in each time zone.

Based on the log data LG, the estimation unit 122A estimates the unoperated time zone Tx in which no voice operation is performed among the plurality of time zone Tz that divide the day. The estimation unit 122A includes a learning unit 1221 and a prediction unit 1222.
The learning unit 1221 causes the learning model M1 to machine-learn the relationship between the log data LG and the unoperated time zone Tx. The unoperated time zone Tx used by the learning unit 1221 represents a time zone in which no voice operation was actually performed. The prediction unit 1222 uses the learning model M1 to generate prediction data Dp indicating the presence or absence of voice operation in a future time zone. The prediction data Dp indicates the presence or absence of voice operation for one or more time zones Tz. For example, the prediction data Dp indicates the presence or absence of voice operation for each of the 72 time zones Tz1 to Tz72 on a daily basis. Alternatively, if the current date and time is 8:15 am on March 11, the prediction data Dp may indicate the presence or absence of voice operation for each time zone Tz from 8:20 am on March 11 to 8:20 am on March 12. A time zone Tz for which the prediction data Dp indicates that there is no voice operation is a time zone Tz estimated to be the above-mentioned unoperated time zone Tx.

More specifically, the learning unit 1221 generates label data Dl indicating the correctness of the prediction data Dp based on the log data LG. When the prediction data Dp indicates the presence or absence of voice operation for a plurality of time zones Tz, the label data Dl indicates correctness for the plurality of time zones Tz. That is, the label data Dl indicates correctness with respect to the time zone Tz corresponding to the prediction data Dp. The learning unit 1221 determines the correctness of the time zone Tz corresponding to the prediction data Dp by referring to the log data LG of the time zone, and generates label data Dl indicating the determination result.
Further, the learning unit 1221 generates a pair of the label data Dl and the log data LG before the time zone Tz corresponding to the prediction data Dp as the teacher data Dt, and causes the learning model M1 to learn the teacher data Dt.

FIG. 3 is an explanatory diagram showing the process by which the learning unit 1221 generates the teacher data Dt. For example, when the prediction date is March 11, the prediction unit 1222 generates the prediction data Dp before March 11. In this example, it is assumed that the prediction data Dp is generated on March 10. The learning unit 1221 generates voice operation data Dr indicating the presence or absence of voice operation in each time zone Tz of the prediction date, based on the log data LG of March 11. In the voice operation data Dr, "1" indicates that the voice operation was not performed, and "0" indicates that the voice operation was performed.
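
As a minimal sketch of how data like the voice operation data Dr could be derived from one day of log records, consider the following Python fragment. The `LogRecord` type, the `"voice"` operation label, and the zero-based time zone index are all assumptions made for illustration. The sketch uses booleans (True means a voice operation was performed) rather than the "1"/"0" encoding described above.

```python
from dataclasses import dataclass
from datetime import datetime

SLOTS_PER_DAY = 72
SLOT_MINUTES = 20


@dataclass
class LogRecord:
    """One hypothetical log record: a time and an operation label."""
    timestamp: datetime
    operation: str  # e.g. "unlock" or "voice" (labels assumed for this sketch)


def voice_operation_flags(records: list[LogRecord]) -> list[bool]:
    """Return one flag per time zone: True if any voice operation falls in it."""
    flags = [False] * SLOTS_PER_DAY
    for record in records:
        if record.operation == "voice":
            minutes = record.timestamp.hour * 60 + record.timestamp.minute
            flags[minutes // SLOT_MINUTES] = True
    return flags
```

For example, a record with operation "voice" at 8:12 sets the flag of the time zone covering 08:00 to 08:20, while an unlock record at 8:00 sets nothing.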

In the example shown in FIG. 3, the voice operation data Dr indicates that there was voice operation (voice operation was performed) in the time zone Tz24 (08:00-08:20) and the time zone Tz25 (08:20-08:40). On the other hand, the prediction data Dp indicates that there is voice operation (it was predicted that voice operation would be performed) in the time zone Tz24 (08:00-08:20), and that there is no voice operation (it was predicted that no voice operation would be performed) in the time zone Tz25 (08:20-08:40). Therefore, the prediction of the unoperated time zone Tx using the learning model M1 is incorrect in the time zone Tz25. The learning unit 1221 generates label data Dl. The label data Dl is "1" in a time zone Tz where the prediction is correct, and "0" in a time zone Tz where the prediction is incorrect. In the example shown in FIG. 3, the learning unit 1221 generates label data Dl indicating an error in the time zone Tz25.
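
The comparison that yields the label data Dl can be sketched as follows. The function name and the boolean input representation (True means voice operation present) are assumptions of this sketch; the output follows the convention stated above, with 1 for a correct prediction and 0 for an incorrect one.

```python
def make_label_data(predicted: list[bool], actual: list[bool]) -> list[int]:
    """Compare prediction and actual voice use per time zone.

    Returns 1 where the prediction matched what actually happened, 0 otherwise.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must cover the same time zones")
    return [1 if p == a else 0 for p, a in zip(predicted, actual)]
```

With a prediction of "voice" for 08:00-08:20 and "no voice" for 08:20-08:40, and actual voice use in both, the label is 1 for the first time zone and 0 for the second, as in the example of FIG. 3.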

Further, in the example shown in FIG. 3, the learning unit 1221 generates, as the teacher data Dt, a pair consisting of the log data LG of March 10, which precedes the prediction date of March 11, and the label data Dl. By causing the learning model M1 to machine-learn the teacher data Dt, the accuracy of estimating the unoperated time zone Tx is improved.

The description now returns to FIG. 1. The control data generation unit 124 generates control data Dc instructing that the sound input device 252, which receives voice input, be turned off in each time zone Tz for which the prediction data Dp generated by the estimation unit 122A indicates no voice operation, that is, in the unoperated time zone Tx indicated by the prediction data Dp. The control data Dc may be the same as the prediction data Dp, or may indicate the start time and end time of the period during which the sound input device 252 is in the off state. For example, when the prediction data Dp is as shown in FIG. 3, the control data Dc may be the same 72-bit data as the prediction data Dp, or may indicate a start time of 08:20 and an end time of 08:40.
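
The second form of the control data Dc, a list of start and end times, can be derived from per-time-zone flags by merging consecutive time zones. The sketch below is one possible way to do this, not the method fixed by the specification; the function names, the "HH:MM" string format, and the zero-based indexing are assumptions.

```python
def _format_minutes(minutes: int) -> str:
    """Format minutes since midnight as 'HH:MM' (1440 becomes '24:00')."""
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


def off_intervals(no_voice: list[bool], slot_minutes: int = 20) -> list[tuple[str, str]]:
    """Merge consecutive no-voice time zones into (start, end) periods."""
    intervals = []
    run_start = None
    # A trailing False acts as a sentinel that closes a run reaching midnight.
    for index, is_off in enumerate(no_voice + [False]):
        if is_off and run_start is None:
            run_start = index
        elif not is_off and run_start is not None:
            intervals.append((_format_minutes(run_start * slot_minutes),
                              _format_minutes(index * slot_minutes)))
            run_start = None
    return intervals
```

A single flagged time zone covering 08:20 to 08:40 produces the period ("08:20", "08:40"), which corresponds to the start and end times given in the example above.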

Next, the transmission control unit 126 controls the communication device 140 to cause the communication device 140 to transmit the control data Dc to the user device 200A.

[1-2. User device]
Next, the user device 200A includes a processing device 210, a storage device 230, a communication device 240, an input device 250, an output device 260, a motion detection device 270, and a GPS device 280. The processing device 210 is a processor that controls the entire user device 200A, and is configured in the same manner as the processing device 110.

The storage device 230 is a recording medium that can be read by the processing device 210, and stores a plurality of programs including the control program P2 executed by the processing device 210 and various data used by the processing device 210. The storage device 230 may be composed of, for example, at least one of ROM, EPROM, EEPROM, RAM and the like.

The communication device 240 is hardware (transmission / reception device) for communicating with other devices. The communication device 240 may be configured in the same manner as the communication device 140. The communication device 240 is an example of the first communication device.

The input device 250 is an input device that accepts input from the outside. For example, the input device 250 accepts an operation for inputting a code such as a number and a character into the processing device 210. Input operations include user touch operations and user voice operations. Regarding the touch operation, for example, a touch panel that detects the contact of the user's finger with the display surface of the display device 261 is suitable as the input device 250. The input device 250 may include a plurality of controls that can be operated by the user.

In addition, the sound input device 252 accepts the user's voice operation. The sound input device 252 includes a microphone that converts sound into an electric signal, an amplifier that amplifies the output signal of the microphone, and an AD converter that converts the output signal of the amplifier into a digital signal. Further, the sound input device 252 includes a switch provided between the power supply line, which supplies power, and the amplifier and the AD converter. When the switch is turned on, power is supplied to the sound input device 252, and the sound input device 252 is turned on. When the sound input device 252 is on, the user's voice can be converted into sound data and output. On the other hand, when the switch is turned off, power is not supplied to the sound input device 252. In this case, the sound input device 252 is turned off. The sound input device 252 in the off state cannot convert the user's voice into sound data. The switch is controlled by the processing device 210. In the present specification, the on state of the sound input device 252 means an operating state in which sound can be converted into sound data. Further, the off state of the sound input device 252 means an operating state in which sound cannot be converted into sound data. Therefore, the off state includes a sleep state in which the time until the transition to the on state is short. The sleep state consumes less power than the on state. The sleep state is different from the completely off state in which the power of the user device 200A is not consumed at all.

The output device 260 is a device that performs output from the user device 200A to the outside. The output device 260 includes, for example, a display device 261 for displaying an image and a sound output device 262 for outputting sound. The display device 261 displays various images under the control of the processing device 210. For example, various display panels such as a liquid crystal display panel and an organic EL (Electro Luminescence) display panel are preferably used as the display device 261.

The motion detection device 270 detects the motion of the user device 200A and outputs motion data. The motion detection device 270 corresponds to an inertial sensor such as a gyro sensor that detects angular acceleration and an acceleration sensor that detects acceleration. When the motion detection device 270 detects an acceleration larger than a predetermined value, it means that the user is moving at high speed in a vehicle (for example, a train or a car). On the contrary, when the motion detection device 270 detects an acceleration smaller than a predetermined value, it can be detected that the user is walking or running.

The GPS device 280 receives radio waves from a plurality of satellites and generates position data using the received radio waves. The position data indicates the position of the user device 200A. The position data may be in any format as long as the position can be specified. The position data indicates, for example, the latitude and longitude of the user device 200A. In this example, the position data is obtained from the GPS device 280, but the user device 200A may acquire the position data by any method. For example, the position data may be acquired using the cell ID assigned to the base station that is the communication destination of the user device 200A. Further, when the user device 200A communicates with an access point of a wireless LAN (Local Area Network), the user device 200A may acquire the position data by referring to a database in which the identification address (MAC (Media Access Control) address) on the network assigned to the access point is associated with the actual address (location) of the access point. Further, the user device 200A may receive the ID information included in an advertisement packet conforming to the BLE (Bluetooth Low Energy) standard, and may acquire the position data based on the ID information.

The processing device 210 functions as a control data acquisition unit 220, a control unit 222, a voice agent unit 224, and a transmission control unit 226 by reading the control program P2 from the storage device 230 and executing the program. The control program P2 may be transmitted from another device via the network.

The control data acquisition unit 220 acquires the control data Dc from the server device 100A by using the communication device 240.
The control unit 222 turns off the sound input device 252 based on the control data Dc. When the control data Dc indicates the start time and the end time of the unoperated time zone Tx, the control unit 222 turns off the switch when the current time matches the start time. On the other hand, when the current time matches the end time, the control unit 222 turns on the switch.
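
The switching behavior of the control unit 222 can be illustrated with a small Python sketch. Instead of reacting only at the exact start and end times, this variant computes the desired switch state for any current time, which gives the same result at those two instants. The function name, the half-open interpretation of each period, and the handling of periods that cross midnight are assumptions of the sketch.

```python
from datetime import time


def mic_should_be_on(now: time, off_periods: list[tuple[time, time]]) -> bool:
    """Return False while `now` lies inside any off period [start, end)."""
    for start, end in off_periods:
        if start <= end:
            inside = start <= now < end
        else:
            # The period crosses midnight, e.g. 23:00 to 05:00.
            inside = now >= start or now < end
        if inside:
            return False
    return True
```

With an off period from 08:20 to 08:40, the function returns True at 08:19, False from 08:20 through 08:39, and True again at 08:40.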
Further, the control unit 222 generates the log data LG in which the operation content of the user device 200A and the time are associated with each other, and stores the log data LG in the storage device 230.

The voice agent unit 224 recognizes the voice based on the sound data output from the sound input device 252, interprets the operation instruction given by the user's voice, and controls the user device 200A. For example, if the user's voice is "What is the weather today?", the voice agent unit 224 accesses a weather forecast site, obtains today's weather forecast, and informs the user of today's weather by voice or image.

The transmission control unit 226 causes the communication device 240 to transmit the log data LG stored in the storage device 230 to the server device 100A.

[1-3. Information processing system operation]
Next, the operation of the information processing system 10 will be described. FIG. 4 is a flowchart showing the operation of the information processing system.

First, the processing device 210 of the user device 200A functions as the control unit 222, generates the log data LG, and stores the log data LG in the storage device 230 (S200).

Next, the processing device 210 functions as a transmission control unit 226, controls the communication device 240, and causes the communication device 240 to transmit the log data LG to the server device 100A (S210). The transmission of the log data LG may be a periodic transmission such as once a day. Alternatively, the log data LG may be transmitted every time a predetermined number of new log data LGs are generated.
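
The two transmission triggers mentioned for step S210 (a periodic transmission, or a transmission after a predetermined number of new records) can be sketched as a single decision function. The function name, parameters, and default values are hypothetical; either trigger can be disabled by passing None.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_transmit(pending: int,
                    last_sent: datetime,
                    now: datetime,
                    batch_size: Optional[int] = None,
                    period: Optional[timedelta] = timedelta(days=1)) -> bool:
    """Decide whether the stored log data should be sent now.

    pending    -- number of records generated since the last transmission
    batch_size -- send once this many new records exist (None disables)
    period     -- send once this much time has passed (None disables)
    """
    if batch_size is not None and pending >= batch_size:
        return True
    if period is not None and now - last_sent >= period:
        return True
    return False
```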

Next, the processing device 110 of the server device 100A functions as the acquisition unit 120A and acquires the log data LG transmitted from the user device 200A (S100). Specifically, the acquisition unit 120A causes the communication device 140 to receive the log data LG transmitted from the user device 200A. Upon this reception, the acquisition unit 120A acquires the log data LG.

In step S110, the processing device 110 functions as a learning unit 1221 and generates voice operation data Dr based on the daily log data LG. The voice operation data Dr indicates the presence or absence of voice operation for each time zone Tz.

In step S120, the processing device 110 functions as the learning unit 1221 and generates label data Dl. In this example, the processing device 110 generates label data Dl on a daily basis. Specifically, the processing device 110 compares, for each time zone Tz, the voice operation data Dr with the prediction data Dp of the prediction date corresponding to the voice operation data Dr, and generates label data Dl indicating the correctness of the prediction.

In step S130, the processing device 110 functions as the learning unit 1221, generates a pair of the label data Dl and the log data LG before the prediction date as teacher data Dt, and causes the learning model M1 to learn the teacher data Dt.

In step S140, the processing device 110 functions as the prediction unit 1222, inputs the log data LG before the prediction date to the learning model M1, and generates the prediction data Dp indicating the unoperated time zone Tx on the prediction date. The period before the prediction date is, for example, the day before the prediction date. For example, if the prediction date is the next day, the log data LG before the prediction date is the log data LG of the current day.
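
The specification does not fix the internal form of the learning model M1. Purely to illustrate the input and output of step S140, the sketch below swaps in a much simpler frequency rule in place of a learned model: a time zone is predicted to have no voice operation if no voice operation was observed in that time zone on any of the supplied past days. The function name and data layout are assumptions.

```python
def predict_no_voice(history: list[list[bool]]) -> list[bool]:
    """Stand-in for the learned model (NOT the learning model M1 itself).

    history[d][i] is True if voice was used in time zone i on past day d.
    Returns True for each time zone predicted to have no voice operation.
    """
    if not history:
        raise ValueError("at least one past day is required")
    slots = len(history[0])
    if any(len(day) != slots for day in history):
        raise ValueError("all days must cover the same number of time zones")
    return [not any(day[i] for day in history) for i in range(slots)]
```

A trained model would replace this rule; the surrounding data flow, past log data in and per-time-zone predictions out, stays the same.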

In step S150, the processing device 110 functions as the control data generation unit 124 and generates the control data Dc instructing that the sound input device 252 be turned off during the unoperated time zone Tx.

In step S160, the processing device 110 functions as a transmission control unit 126, and the communication device 140 is used to transmit the control data Dc to the user device 200A.

Next, in step S220, the processing device 210 of the user device 200A functions as the control data acquisition unit 220 and causes the communication device 240 to receive the control data Dc transmitted from the server device 100A.

In step S230, the processing device 210 functions as the control unit 222 and controls the sound input device 252 to the off state during the unoperated time zone Tx based on the control data Dc.

As described above, according to the present embodiment, the server device 100A includes the acquisition unit 120A, which acquires the log data LG in which operation content, including operations by the user's voice on the user device 200A, is associated with time, and the estimation unit 122A, which estimates, based on the log data LG, the unoperated time zone Tx in which no voice operation is performed from among the plurality of time zones Tz that divide a day. Since the estimation of the unoperated time zone Tx by the estimation unit 122A is based on the operation content of the user device 200A, the estimation takes into account how the user operates the user device 200A. Therefore, it is possible to grasp the tendency of voice operation for each user and estimate the unoperated time zone Tx. Further, the server device 100A includes the control data generation unit 124, which generates the control data Dc instructing that the sound input device 252, which accepts voice input, be turned off during the unoperated time zone Tx. When the user device 200A uses the control data Dc, the sound input device 252 is controlled to be in the off state during the estimated unoperated time zone Tx. This control makes it possible to reduce the power consumption of the user device 200A.

Further, the estimation unit 122A includes the learning unit 1221, which causes the learning model M1 to learn the relationship between the log data LG and the unoperated time zone Tx, and the prediction unit 1222, which uses the learning model M1 to generate the prediction data Dp indicating the presence or absence of voice operation in a future time zone Tz. Since the relationship between the log data LG and the unoperated time zone Tx is machine-learned by the learning model M1, the accuracy of predicting the unoperated time zone Tx using the learning model M1 can be gradually improved as machine learning progresses. In this case, the control data generation unit 124 generates the control data Dc instructing that the sound input device 252 be turned off during the unoperated time zone Tx indicated by the prediction data Dp. When the user device 200A uses the control data Dc, the sound input device 252 can be controlled to be in the off state during the unoperated time zone Tx indicated by the prediction data Dp. As a result, it is possible to reduce the power consumption of the user device 200A.

Further, the learning unit 1221 generates the label data Dl indicating the correctness of the prediction data Dp based on the log data LG, and causes the learning model M1 to learn, as the teacher data Dt, a pair of the label data Dl and the log data LG before the time zone Tz corresponding to the prediction data Dp.
That is, the learning unit 1221 causes the learning model M1 to perform machine learning using the teacher data Dt. Therefore, the learning model M1 can be trained in a shorter period than in the case where the learning model M1 is constructed without teacher data.

The acquisition unit 120A, the estimation unit 122A, and the control data generation unit 124 included in the server device 100A of the first embodiment are examples of the control data generation device that generates control data.

[2. Second Embodiment]
The information processing system 10 according to the second embodiment is configured in the same manner as the information processing system 10 of the first embodiment shown in FIG. 1, except that a server device 100B is provided instead of the server device 100A.

FIG. 5 is a block diagram showing a configuration example of the server device 100B. The server device 100B is configured in the same manner as the server device 100A of the first embodiment, except that the estimation unit 122B is used instead of the estimation unit 122A, the storage device 130 stores the control program P3 instead of the control program P1, the storage device 130 stores the learning model M2 instead of the learning model M1, and the storage device 130 stores the action data Da. In the server device 100A, the pair of the log data LG and the label data Dl is used as the teacher data Dt, but in the server device 100B, the pair of the action data Da and the label data Dl is used as the teacher data Dt.
The processing device 110 functions as an acquisition unit 120A, an estimation unit 122B, a control data generation unit 124, and a transmission control unit 126 by reading the control program P3 from the storage device 130 and executing the program.

The estimation unit 122B includes a learning unit 1223 and a prediction unit 1222. Based on the log data LG acquired from the user device 200A by the acquisition unit 120A, the learning unit 1223 generates the action data Da, in which the action content of the user is associated with the time at which the action was taken. The action data Da is not the log data LG itself, but data obtained by interpreting the log data LG and mapping it to the action content of the user.

Examples of the user's action content relating to application execution include video playback, music playback, game execution, email, and web search. These action contents are identified from the applications recorded in the log data LG. The location of the user, for example home, office, or cafe, is also included in the action content. These locations are identified based on the position data contained in the log data LG. In addition, the action contents related to the movement of the user include walking, running, moving by train, and the like. These action contents are generated from the acceleration data included in the log data LG.

For example, the action data Da shown in FIG. 6 is generated based on the log data LG shown in FIG. 2, which shows the operation content of the user device 200A in the time zone from 8:00 am to 8:40 am. In this example, the records r1 to r14 of the log data LG shown in FIG. 2 are compressed into the records R1 to R10 of the action data Da. Further, "commuting route" is assigned as the action content to the record R4 and the record R8 of the action data Da. As described above, in the example of FIG. 6, the action data Da is data having a plurality of sets (a plurality of records) of the user's action content and time in each time zone.
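As a rough illustration of how consecutive log records might be compressed into fewer action records, the following Python sketch merges adjacent log entries that map to the same action content. The record layouts, the `APP_TO_ACTION` mapping, and the function name `to_action_data` are assumptions made for illustration only; the specification does not define the conversion rule.

```python
from dataclasses import dataclass

# Hypothetical mapping from an application name in the log to an action content.
APP_TO_ACTION = {
    "video_player": "video playback",
    "music_player": "music playback",
    "mail": "email",
    "browser": "web search",
}

@dataclass
class LogRecord:
    time: str  # "HH:MM", time at which the operation was logged
    app: str   # application recorded in the log data LG

@dataclass
class ActionRecord:
    start: str   # time of the first log record merged into this action
    action: str  # action content of the user

def to_action_data(log: list[LogRecord]) -> list[ActionRecord]:
    """Merge runs of consecutive log records that share the same action content."""
    actions: list[ActionRecord] = []
    for rec in log:
        action = APP_TO_ACTION.get(rec.app, "other")
        if actions and actions[-1].action == action:
            continue  # same action as the previous record: keep the earlier start time
        actions.append(ActionRecord(start=rec.time, action=action))
    return actions

log = [
    LogRecord("08:00", "music_player"),
    LogRecord("08:05", "music_player"),
    LogRecord("08:10", "mail"),
    LogRecord("08:12", "mail"),
    LogRecord("08:20", "browser"),
]
action_data = to_action_data(log)  # five log records become three action records
```

In this sketch only the application name is interpreted; location data and acceleration data could be mapped to action contents in the same way.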

The learning unit 1223 generates a set of the action data Da and the label data Dl as the teacher data Dt, and causes the learning model M2 to learn the teacher data Dt. The label data Dl is generated in the same manner as by the learning unit 1221 of the first embodiment. Since the action data Da abstracts the content of the log data LG from the viewpoint of the action content of the user, the learning efficiency can be improved.

Next, the operation of the information processing system 10 according to the second embodiment will be described. FIG. 7 is a flowchart showing the operation of the information processing system 10 according to the second embodiment. The flowchart shown in FIG. 7 differs from the flowchart shown in FIG. 4 in that step S131 is adopted instead of step S130, and step S122 is provided between step S120 and step S131.

In step S122, the processing device 110 functions as a learning unit 1223 and generates action data Da based on the log data LG. Further, in step S131, the processing device 110 functions as a learning unit 1223, generates a set of action data Da and label data Dl as teacher data Dt, and causes the learning model M2 to learn the teacher data Dt.
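The pairing of action data and label data in step S131 can be pictured with the following Python sketch. It assumes 20-minute time zones, represents times as minutes since midnight, and treats the label as "no voice operation was logged in the time zone"; the names `Action`, `build_teacher_data`, and `SLOT_MINUTES` are hypothetical, and the actual form of the label data Dl and of the input to the learning model M2 is not specified in this section.

```python
from typing import NamedTuple

SLOT_MINUTES = 20  # assumed length of one time zone Tz

class Action(NamedTuple):
    minute: int   # minutes since midnight at which the action started
    content: str  # action content, e.g. "music playback"

def build_teacher_data(actions, voice_minutes, slot):
    """Return one (input, label) pair for the given slot index.

    input: action records observed before the slot starts
    label: True if no voice operation was logged inside the slot
    """
    start = slot * SLOT_MINUTES
    end = start + SLOT_MINUTES
    history = [a for a in actions if a.minute < start]
    unoperated = not any(start <= m < end for m in voice_minutes)
    return history, unoperated

actions = [
    Action(480, "music playback"),  # 08:00
    Action(490, "email"),           # 08:10
    Action(505, "web search"),      # 08:25
]
voice_minutes = [492, 503]  # minutes at which voice operations were logged

# Slot 25 covers 08:20 to 08:40 under the zero-based numbering assumed here.
features, label = build_teacher_data(actions, voice_minutes, slot=25)
```

A real implementation would still have to encode `features` numerically before training; that step is omitted because the text does not describe it.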

As described above, according to the second embodiment, the learning unit 1223 generates the label data Dl indicating the correctness of the prediction data Dp based on the log data LG, generates, based on the log data LG before the time zone Tz corresponding to the prediction data Dp, the action data Da in which the action content of the user is associated with the time at which the action was performed, and causes the learning model M2 to learn the set of the label data Dl and the action data Da as the teacher data Dt. Since the action data Da abstracts the content of the log data LG from the viewpoint of the user's action, the learning efficiency of the learning model M2 can be improved.

The acquisition unit 120A, the estimation unit 122B, and the control data generation unit 124 included in the server device 100B of the second embodiment are examples of the control data generation device that generates control data.

[3. Third Embodiment]
FIG. 8 is a block diagram showing a configuration example of the user device 200B according to the third embodiment. The user device 200B differs from the user device 200A of the first embodiment shown in FIG. 1 in that it includes the estimation unit 122A, it includes the acquisition unit 120B, the storage device 230 stores the control program P4 instead of the control program P1, and the storage device 230 stores the learning model M1.

That is, in the first embodiment, the user device 200A transmits the log data LG to the server device 100A, and the server device 100A constructs the learning model M1. In the third embodiment, by contrast, the user device 200B constructs the learning model M1 based on the log data LG, and turns off the sound input device 252 in the non-operation time zone Tx predicted by using the learning model M1.

The processing device 210 functions as an acquisition unit 120B, an estimation unit 122A, a control data generation unit 124, a control unit 222, and a voice agent unit 224 by reading the control program P4 from the storage device 230 and executing the program.

The acquisition unit 120B acquires the log data LG by reading it from the storage device 230. In this respect, it differs from the acquisition unit 120A of the first embodiment, which acquires the log data LG from the user device 200A. The estimation unit 122A estimates the non-operation time zone Tx based on the log data LG. Specifically, the prediction unit 1222 generates the prediction data Dp using the learning model M1. The control unit 222 controls the sound input device 252 to the off state during the non-operation time zone Tx based on the control data Dc.

Since the user device 200B of the third embodiment does not transmit the log data LG to the server device 100A, communication resources can be saved. Further, since the log data LG includes personal information, the user device 200B can enhance the security from the viewpoint of protecting the personal information.

The acquisition unit 120B, the estimation unit 122A, and the control data generation unit 124 included in the user device 200B of the third embodiment are examples of the control data generation device that generates control data.
Further, in the user device 200B described above, the estimation unit 122B described in the second embodiment may be used instead of the estimation unit 122A. When the estimation unit 122B is used, the learning efficiency of the learning model M2 can be improved as compared with the case where the estimation unit 122A is used.

[4. Fourth Embodiment]
In the first embodiment, the second embodiment, and the third embodiment described above, the unoperated time zone Tx was estimated using the learning model M1 or M2. On the other hand, in the fourth embodiment, the log data LG is analyzed without using machine learning, and the non-operation time zone Tx is specified.

FIG. 9 is a block diagram showing a configuration example of the information processing system 10 according to the fourth embodiment. The information processing system 10 according to the fourth embodiment is configured in the same manner as the information processing system 10 according to the first embodiment, except that the server device 100C is used instead of the server device 100A.

The server device 100C differs from the server device 100A of the first embodiment in that the estimation unit 122C is used instead of the estimation unit 122A, the storage device 130 stores the control program P5 instead of the control program P1, and the storage device 130 does not store the learning model M1.

The differences will be mainly explained below. The estimation unit 122C estimates the non-operation time zone Tx based on the log data LG. The estimation unit 122C includes a calculation unit 1224 and an identification unit 1225. Based on the log data LG, the calculation unit 1224 calculates, for each of the plurality of time zones Tz, an evaluation value indicating the degree of possibility that no operation by voice is performed. The evaluation value may be defined so that a larger value indicates a higher possibility that no operation by voice is performed, or conversely, so that a smaller value indicates a higher possibility that no operation by voice is performed. For example, the number of times the voice operation is performed in each time zone Tz is used as the evaluation value. In that case, the smaller the evaluation value, the higher the possibility that no voice operation is performed.

The identification unit 1225 specifies the non-operation time zone Tx based on the comparison result of comparing the evaluation value with the predetermined value. A time zone Tz having an evaluation value smaller than a predetermined value is specified as an unoperated time zone Tx.

The control data generation unit 124 generates the control data Dc instructing that the sound input device 252 be turned off in the non-operation time zone Tx specified by the identification unit 1225, and the control unit 222 controls the sound input device 252 to the off state according to the control data Dc.
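One possible shape for the control data Dc is a list of off intervals, as in the Python sketch below. The interval representation, the 20-minute slot length, and the names `generate_control_data` and `mic_should_be_off` are assumptions for illustration; the specification does not fix the format of the control data.

```python
SLOT_MINUTES = 20  # assumed length of one time zone Tz

def generate_control_data(unoperated_slots):
    """Merge adjacent non-operation slots into (start_minute, end_minute) off intervals."""
    intervals = []
    for slot in sorted(set(unoperated_slots)):
        start = slot * SLOT_MINUTES
        end = start + SLOT_MINUTES
        if intervals and intervals[-1][1] == start:
            intervals[-1] = (intervals[-1][0], end)  # extend the previous interval
        else:
            intervals.append((start, end))
    return intervals

def mic_should_be_off(control_data, minute):
    """Return True if the given minute of the day falls inside an off interval."""
    return any(start <= minute < end for start, end in control_data)

# Slots 3-5 (01:00 to 02:00) and 30-31 (10:00 to 10:40) were specified as non-operation slots.
control_data = generate_control_data([30, 31, 3, 4, 5])
```

Merging adjacent slots keeps the control data small and lets the control unit switch the sound input device once per interval rather than once per slot.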

FIG. 10 is a flowchart for explaining the operation of the information processing system 10 according to the fourth embodiment. The flowchart shown in FIG. 10 is different from the flowchart shown in FIG. 4 in that steps S102 and S104 are provided instead of steps S120, S130 and S140. The differences will be described.

In step S102, the processing device 110 functions as the calculation unit 1224 and calculates the evaluation value for each of the plurality of time zones Tz based on the log data LG. Specifically, the calculation unit 1224 extracts the log data LG for a predetermined past period (for example, one month) up to the present, counts the number of times the voice operation was performed in each time zone Tz, and uses the count as the evaluation value. For example, the count is 50 in the time zone Tz25 from 8:20 to 8:40, 2 in the time zone Tz26 from 10:00 to 10:20, and so on.

In step S104, the processing device 110 functions as the identification unit 1225 and specifies the non-operation time zone Tx based on the result of comparing the evaluation value with the predetermined value. In the above example, if the predetermined value is "3", the time zone Tz26, whose evaluation value "2" is smaller than the predetermined value, is specified as the non-operation time zone Tx.
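Steps S102 and S104 can be sketched in Python as follows, assuming 20-minute time zones and voice operation times given as minutes since midnight. The slot indices are a zero-based numbering introduced here and are not meant to match the Tz labels in the text; the function names are likewise hypothetical.

```python
from collections import Counter

SLOT_MINUTES = 20                        # assumed length of one time zone Tz
SLOTS_PER_DAY = 24 * 60 // SLOT_MINUTES  # 72 slots per day

def evaluation_values(voice_minutes):
    """Step S102: count voice operations per slot; the count is the evaluation value."""
    counts = Counter(minute // SLOT_MINUTES for minute in voice_minutes)
    return [counts[slot] for slot in range(SLOTS_PER_DAY)]

def non_operation_slots(values, predetermined_value):
    """Step S104: return slots whose evaluation value is smaller than the predetermined value."""
    return [slot for slot, value in enumerate(values) if value < predetermined_value]

# One month of voice operations, given as minutes since midnight:
# 50 operations at 08:25 and 2 operations at 10:05.
voice_minutes = [8 * 60 + 25] * 50 + [10 * 60 + 5] * 2
values = evaluation_values(voice_minutes)
slots = non_operation_slots(values, predetermined_value=3)
```

With the predetermined value set to 3, the slot containing 10:05 (two operations) is specified as a non-operation slot, while the slot containing 08:25 (50 operations) is not. Slots with no logged voice operation at all are also specified, since their evaluation value is 0.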

The server device 100C of the present embodiment includes the calculation unit 1224, which calculates, based on the log data LG, an evaluation value indicating the degree of possibility that no operation by voice is performed for each of the plurality of time zones Tz, and the identification unit 1225, which specifies the non-operation time zone Tx based on the result of comparing the evaluation value with the predetermined value. Therefore, the non-operation time zone Tx can be estimated more easily than when the learning model M1 or M2 is used.

The acquisition unit 120A, the estimation unit 122C, and the control data generation unit 124 included in the server device 100C of the fourth embodiment are examples of the control data generation device that generates control data.

Further, in the user device 200B of the third embodiment, the estimation unit 122C may be used instead of the estimation unit 122A, and the learning model M1 may be omitted from the storage device 230.

[5. Modification example]
The present invention is not limited to the embodiments exemplified above. Specific modifications are illustrated below. Two or more aspects arbitrarily selected from the following examples may be combined.

[First modification]
In the second embodiment described above, the acquisition unit 120A may acquire schedule data indicating a schedule related to the user's behavior in addition to the log data LG. The schedule data may be acquired from the user apparatus 200A or may be acquired from another server apparatus. For example, if the user stores the schedule data on the cloud, the schedule data may be acquired from the server device that manages the schedule data.
The learning unit 1223 may generate the action data Da based on the log data LG and the schedule data before the time zone corresponding to the prediction data Dp.
According to the first modification, since the action data Da is generated in consideration of not only the log data LG but also the schedule data, the non-operation time zone Tx on the predicted date can be estimated based on more accurate action data Da. As a result, the estimation accuracy of the non-operation time zone Tx can be improved, and the power consumption of the user device 200A can be reduced.
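A minimal Python sketch of combining log-derived action records with schedule entries is shown below. The `Record` layout, the `merge_schedule` function, and the rule that an observed log record takes precedence over a schedule entry at the same time are all assumptions for illustration; the first modification does not state how the two sources are combined.

```python
from typing import NamedTuple

class Record(NamedTuple):
    minute: int   # start time, minutes since midnight
    content: str  # action content
    source: str   # "log" or "schedule"

def merge_schedule(actions_from_log, schedule):
    """Combine log-derived actions with schedule entries, ordered by start time.

    When both sources have a record at the same minute, the schedule entry is
    dropped in favour of what was actually observed in the log (assumed policy).
    """
    seen = {r.minute for r in actions_from_log}
    extra = [r for r in schedule if r.minute not in seen]
    return sorted(actions_from_log + extra, key=lambda r: r.minute)

log_actions = [
    Record(480, "music playback", "log"),
    Record(540, "email", "log"),
]
schedule = [
    Record(540, "meeting", "schedule"),
    Record(600, "meeting", "schedule"),
]
merged = merge_schedule(log_actions, schedule)
```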

[Second modification]
In each of the above embodiments, the user device 200A or 200B may include a detection device that detects the state in which the user device 200A or 200B is being used. The detection device is, for example, a proximity sensor. The control unit 222 turns off the sound input device 252 when the detection result of the detection device indicates a predetermined state in a time zone Tz in which the control data Dc does not specify that the sound input device 252 be turned off. For example, when the control unit 222 determines, based on the output data of the proximity sensor, that the display surface of the display device 261 is close to an object (for example, a table), the control unit 222 turns off the sound input device 252. Alternatively, the control unit 222 uses the detection device to detect the SN ratio based on the sound data output from the sound input device 252, and turns off the sound input device 252 when the detected SN ratio is lower than a predetermined value. The SN ratio may be calculated by treating the energy component of the human voice band as the signal component S and the energy component of the other bands as the noise component N. When the SN ratio is lower than the predetermined value, there is a high possibility that voice recognition is not possible. Therefore, even if the sound input device 252 is turned off in such a state, the power consumption of the user device 200A or 200B can be reduced without significantly impairing the operability of the user device 200A or 200B.
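The band-energy SN ratio described above can be sketched in Python as follows. The sketch uses a naive discrete Fourier transform from the standard library so that it stays self-contained; the voice band of 300 Hz to 3400 Hz, the 0 dB threshold, and all function names are assumptions chosen for illustration, not values taken from the specification.

```python
import cmath
import math

VOICE_BAND_HZ = (300.0, 3400.0)  # assumed human voice band
SNR_THRESHOLD_DB = 0.0           # hypothetical predetermined value

def band_energies(samples, sample_rate):
    """Return (signal_energy, noise_energy) computed with a naive DFT.

    Energy in bins inside VOICE_BAND_HZ is treated as the signal component S;
    energy in all other non-DC bins is treated as the noise component N.
    """
    n = len(samples)
    signal = noise = 0.0
    for k in range(1, n // 2):  # skip DC, use positive frequencies only
        freq = k * sample_rate / n
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        energy = abs(coeff) ** 2
        if VOICE_BAND_HZ[0] <= freq <= VOICE_BAND_HZ[1]:
            signal += energy
        else:
            noise += energy
    return signal, noise

def snr_db(samples, sample_rate):
    """Return the SN ratio in decibels."""
    signal, noise = band_energies(samples, sample_rate)
    if noise == 0.0:
        return math.inf
    if signal == 0.0:
        return -math.inf
    return 10.0 * math.log10(signal / noise)

def should_turn_off(samples, sample_rate):
    """Turn the sound input device off when the SN ratio is below the threshold."""
    return snr_db(samples, sample_rate) < SNR_THRESHOLD_DB

RATE = 8000  # samples per second
N = 256      # samples per analysis frame

def tone(freq, amp):
    return [amp * math.sin(2 * math.pi * freq * i / RATE) for i in range(N)]

# A strong 1000 Hz tone (inside the voice band) with weak 125 Hz hum, and the reverse.
clean = [s + h for s, h in zip(tone(1000, 1.0), tone(125, 0.1))]
noisy = [s + h for s, h in zip(tone(1000, 0.1), tone(125, 1.0))]
```

In practice a fast Fourier transform would replace the quadratic-time loop; the naive form is used here only to keep the example free of external dependencies.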

[Third modification]
In the first embodiment, the second embodiment, and the fourth embodiment described above, the control data Dc is transmitted from the server device 100A or 100C to the user device 200A, but the control data Dc may instead be transmitted to another device owned by the user of the user device 200A. As the other device, for example, a wearable device is applicable. In this case, the control data Dc generated from the log data LG of the user device 200A can be applied to the wearable device. Since the wearable device does not have to transmit the log data LG to the server device 100A or 100C, the power consumption of the wearable device can be reduced. The other device owned by the user may be a so-called AI speaker.

[6. Others]
(1) In the above-described embodiments, the storage devices 130 and 230 are recording media readable by the processing device 110 or 210. A ROM and a RAM are given as examples, but the storage devices may also be flexible disks, magneto-optical disks (for example, compact discs, digital versatile discs, Blu-ray® discs), smart cards, flash memory devices (for example, cards, sticks, key drives), CD-ROMs (Compact Disc-ROMs), registers, removable disks, hard disks, Floppy® disks, magnetic strips, databases, servers, or other suitable storage media. The program may also be transmitted from a network via a telecommunication line.

(2) In the above-described embodiments, the described information, signals, and the like may be represented using any of a variety of different techniques. For example, data, instructions, commands, information, signals, bits, symbols, chips, and the like that may be referred to throughout the above description may be represented by voltage, current, electromagnetic waves, magnetic fields or magnetic particles, light fields or photons, or any combination of these.

(3) In the above-described embodiment, the input / output information and the like may be stored in a specific place (for example, a memory) or may be managed by using a management table. Input / output information and the like can be overwritten, updated, or added. The output information and the like may be deleted. The input information or the like may be transmitted to another device.

(4) In the above-described embodiments, the determination may be made by a value represented by 1 bit (0 or 1), by a Boolean value (true or false), or by numerical comparison (for example, comparison with a predetermined value).

(5) The order of the processing procedures, sequences, flowcharts, etc. exemplified in the above-described embodiment may be changed as long as there is no contradiction. For example, the methods described in the present disclosure present elements of various steps using exemplary order, and are not limited to the particular order presented.

(6) Each of the functions illustrated in FIGS. 1, 5, 8 and 9 is realized by any combination of at least one of hardware and software. Further, the method of realizing each functional block is not particularly limited. That is, each functional block may be realized using one physically or logically coupled device, or may be realized using two or more physically or logically separate devices that are connected directly or indirectly (for example, by wire or wirelessly). The functional block may be realized by combining software with the one device or the plurality of devices.

(7) Regarding the program illustrated in the above-described embodiments, software, whether it is called software, firmware, middleware, microcode, hardware description language, or another name, should be interpreted broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executable files, execution threads, procedures, functions, and the like.

In addition, software, instructions, information, and the like may be transmitted and received via a transmission medium. For example, when software is transmitted from a website, a server, or another remote source using at least one of wired technology (coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL: Digital Subscriber Line), etc.) and wireless technology (infrared, microwave, etc.), at least one of these wired and wireless technologies is included within the definition of a transmission medium.

(8) In the embodiments described above, the terms "connected" and "coupled", and any variations thereof, mean any direct or indirect connection or coupling between two or more elements, and can include the presence of one or more intermediate elements between two elements that are "connected" or "coupled" to each other. The coupling or connection between the elements may be physical, logical, or a combination thereof. For example, "connection" may be read as "access". As used in the present disclosure, two elements can be considered to be "connected" or "coupled" to each other using at least one of one or more wires, cables, and printed electrical connections, and, as some non-limiting and non-comprehensive examples, using electromagnetic energy having wavelengths in the radio frequency region, the microwave region, and the light (both visible and invisible) region.

(9) In the above embodiments, the phrase "based on" does not mean "based only on" unless otherwise stated. In other words, the phrase "based on" means both "based only on" and "based at least on".

(10) When "include", "including", and variations thereof are used in the above-described embodiments, these terms are intended to be inclusive, in the same manner as the term "comprising". Furthermore, the term "or" used in the present disclosure is intended not to be an exclusive OR.

(11) In the present disclosure, when articles such as a, an, and the in English are added by translation, the present disclosure may include the case where the nouns following these articles are plural.

(12) In the present disclosure, the term "A and B are different" may mean "A and B are different from each other". The term may mean that "A and B are different from C". Terms such as "separate" and "combined" may be interpreted in the same way as "different".

(13) Each aspect / embodiment described in the present disclosure may be used alone, in combination, or switched in accordance with execution. Further, the notification of predetermined information (for example, the notification of "being X") is not limited to explicit notification, and may be performed implicitly (for example, by not performing the notification of the predetermined information).

Although the present disclosure has been described in detail above, it is clear to those skilled in the art that the present disclosure is not limited to the embodiments described in the present disclosure. The present disclosure may be implemented as an amendment or modification mode without departing from the purpose and scope of the present disclosure, which is determined by the description of the claims. Therefore, the description of this disclosure is for the purpose of exemplary explanation and does not have any restrictive meaning to this disclosure.

10 ... Information processing system, 100A, 100B, 100C ... Server device, 110, 210 ... Processing device, 120A, 120B ... Acquisition unit, 122A, 122B, 122C ... Estimation unit, 124 ... Control data generation unit, 200A, 200B ... User device, 222 ... Control unit, 252 ... Sound input device, 1221, 1223 ... Learning unit, 1222 ... Prediction unit, 1224 ... Calculation unit, 1225 ... Identification unit, Da ... Action data, Dc ... Control data, Dl ... Label data, Dp ... Prediction data, Dt ... Teacher data, LG ... Log data, M1, M2 ... Learning model, Tx ... Non-operation time zone.

Claims

1. A control data generation device comprising:
an acquisition unit that acquires log data in which a time is associated with operation content including an operation performed by voice of a user on a user device;
an estimation unit that estimates, based on the log data, a non-operation time zone in which no operation by voice is performed, from among a plurality of time zones obtained by dividing one day; and
a control data generation unit that generates control data instructing that a sound input device that accepts voice input be turned off during the non-operation time zone.

2. The control data generation device according to claim 1, wherein
the estimation unit includes:
a learning unit that causes a learning model to learn the relationship between the log data and a time zone in which no operation by voice was performed; and
a prediction unit that generates, using the learning model, prediction data indicating the presence or absence of an operation by voice in a future time zone, and
the control data is data instructing that the sound input device be turned off during the non-operation time zone indicated by the prediction data.

3. The control data generation device according to claim 2, wherein
the learning unit generates label data indicating the correctness of the prediction data based on the log data, and causes the learning model to learn, as teacher data, a set of the label data and the log data before the time zone corresponding to the prediction data.

4. The control data generation device according to claim 2, wherein
the learning unit:
generates label data indicating the correctness of the prediction data based on the log data;
generates, based on the log data before the time zone corresponding to the prediction data, action data in which action content of the user is associated with a time at which the action was performed; and
causes the learning model to learn a set of the label data and the action data as teacher data.

5. The control data generation device according to claim 4, wherein
the acquisition unit acquires schedule data indicating a schedule related to the user's actions, and
the learning unit generates the action data based on the schedule data and the log data before the time zone corresponding to the prediction data.

6. The control data generation device according to claim 1, wherein
the estimation unit includes:
a calculation unit that calculates, based on the log data, an evaluation value indicating the degree of possibility that no operation by voice is performed, for each of the plurality of time zones; and
an identification unit that specifies the non-operation time zone based on a result of comparing the evaluation value with a predetermined value.

7. A user device comprising:
the control data generation device according to any one of claims 1 to 6;
a sound input device that accepts voice input of the user; and
a control unit that turns off the sound input device based on the control data.

8. An information processing system comprising a user device managed by a user and a server device capable of communicating with the user device, wherein
the user device includes:
a sound input device that accepts voice input of the user;
a control unit that turns off the sound input device based on control data; and
a first communication device that transmits, to the server device, log data in which a time is associated with operation content including an operation performed by voice of the user on the user device, and receives the control data transmitted from the server device, and
the server device includes:
a second communication device that receives the log data transmitted from the user device and transmits the control data to the user device;
an estimation unit that estimates, based on the log data, a non-operation time zone in which no operation by voice is performed, from among a plurality of time zones obtained by dividing one day; and
a control data generation unit that generates the control data instructing that the sound input device that accepts the voice input be turned off during the non-operation time zone.

9. The information processing system according to claim 8, wherein
the user device includes a detection device that detects a state in which the user device is used, and
the control unit turns off the sound input device when the detection result of the detection device indicates a predetermined state in a time zone in which the control data does not specify that the sound input device be turned off.