WO2024171312A1 - Information processing device, information processing method, and information processing program - Google Patents

Information processing device, information processing method, and information processing program

Info

Publication number
WO2024171312A1
WO2024171312A1 PCT/JP2023/005036 JP2023005036W WO2024171312A1 WO 2024171312 A1 WO2024171312 A1 WO 2024171312A1 JP 2023005036 W JP2023005036 W JP 2023005036W WO 2024171312 A1 WO2024171312 A1 WO 2024171312A1
Authority
WO
WIPO (PCT)
Prior art keywords
reward
time
schedule
current bias
information processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2023/005036
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
康紀 赤木
直貴 丸茂
健 倉島
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NTT Inc
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp
Priority to JP2025500476A (JPWO2024171312A1)
Priority to PCT/JP2023/005036 (WO2024171312A1)
Publication of WO2024171312A1
Legal status: Ceased

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management

Definitions

  • This invention relates to an information processing device, an information processing method, and an information processing program.
  • Non-Patent Document 1 proposes a model based on graph theory, which is one of the means of such modeling.
  • states that a person can take are represented as vertices, and actions that a person can take in each state are represented as edges.
  • a cost is set for each edge, which represents the effort required to take each action.
  • a reward is placed at each vertex, which represents the reward for reaching the corresponding state.
  • the agent evaluates its own gains for candidate sequences of actions (paths on the graph) that it can take in the future, and selects the action with the largest gain.
  • the gain of a sequence of actions is calculated with quasi-hyperbolic discounting, a form of present bias that weights near-term costs heavily and future costs lightly.
  • This model has attracted attention as an effective way of explaining irrational human behavior, and it has been extended to cover other biases, for example in Non-Patent Documents 2 and 3.
  • This invention was made in light of the above circumstances, and its purpose is to provide a technology that maximizes the performance of individual subjects (agents) by creating a model that is a simpler version of the model disclosed in Non-Patent Document 1 and optimizing reward scheduling in that model (when and how much of a reward is to be given, with the overall budget set).
  • one aspect of the present invention is an information processing device that includes an input information acquisition unit that acquires input information including a total reward budget, the number of time steps until a target time, and a current bias strength; a calculation unit that calculates a predetermined threshold value greater than 1/2 based on the total reward budget, the number of time steps, and the current bias strength; a determination unit that determines whether the current bias strength is greater than the predetermined threshold value, and if the current bias strength is less than the predetermined threshold value, determines whether the current bias strength is greater than 1/2; a reward schedule calculation unit that calculates a reward schedule that corresponds to the result of the determination, including the number of times to give a reward, the time to give a reward, the reward to be given, and the achievement value required at that time; and an output control unit that controls to output the reward schedule.
  • FIG. 1 is a diagram showing an image of this embodiment.
  • FIG. 2 is a block diagram showing an example of a hardware configuration of the information processing device according to the present embodiment.
  • FIG. 3 is a block diagram showing the software configuration of the information processing apparatus according to this embodiment in relation to the hardware configuration shown in FIG. 2.
  • FIG. 4 is a flowchart showing an example of the operation of the information processing device to optimize a reward schedule and output the optimized reward schedule.
  • FIG. 5 is a diagram showing an example of input information.
  • FIG. 6 shows an example of the optimized schedule displayed by the output control unit.
  • FIG. 7 shows an example of the optimized schedule displayed by the output control unit.
  • the graph has vertices $(i, x)$ with $i \in \{0, \ldots, N\}$ and $x \ge 0$, and edges of the form $((i, x), (i+1, y))$.
  • i is a discrete value while x is a continuous value.
  • the cost of edge $((i, x), (i+1, y))$ can be written as $(y - x)^2$.
  • the reward obtained at vertex (i,x) is written as r(i,x).
  • i corresponds to the time
  • x corresponds to a numerical index that indicates the progress of the task. For example, for the task of "walking 100,000 steps in 30 days," i represents the current day and x represents the number of steps walked to date. N is the maximum time, that is, the number of time steps until the target time, which corresponds to "30 days" in this example.
  • $\beta$ is a parameter that indicates the strength of the present bias, in particular the strength of the quasi-hyperbolic discounting, and takes a value between 0 and 1. If $\beta$ is small, the subject places a particularly large weight on the most recent costs; if $\beta$ is large, the subject also takes future costs into account.
  • a total reward of R is given over N time steps.
  • the reward can be divided and presented multiple times, and each reward is expressed by the set (n, a, r) of time n, required achievement value a, and reward amount r.
  • V 1 ⁇ V k holds.
  • the reward schedule can be determined using Algorithm 1 below.
  • Figure 1 shows an image of this embodiment.
  • FIG. 2 is a block diagram showing an example of a hardware configuration of the information processing device 1 according to the present embodiment.
  • the information processing device 1 is a computer that outputs a schedule for giving rewards to subjects (agents) based on input data.
  • the information processing device 1 can create a schedule (timing) for giving rewards necessary for subjects to achieve their goals based on various information input by an administrator who manages the information processing device 1.
  • the information processing device 1 may be a portable terminal that the administrator can carry around. However, the information processing device 1 is not limited to a portable terminal, and may be a stationary personal computer on which a user can perform operations.
  • the information processing device 1 includes a control unit 10, a program storage unit 20, a data storage unit 30, a communication interface 40, an input/output interface 50, an input device 51, and an output device 52.
  • the control unit 10, the program storage unit 20, the data storage unit 30, the communication interface 40, and the input/output interface 50 are communicatively connected to each other via a bus. Furthermore, the input/output interface 50 is communicatively connected to the input device 2 and the output device 3.
  • the control unit 10 controls the information processing device 1.
  • the control unit 10 includes a hardware processor such as a central processing unit (CPU).
  • the control unit 10 may be an integrated circuit capable of executing various programs.
  • the program storage unit 20 can use, as a storage medium, a combination of non-volatile memory that can be written to and read from at any time, such as an EPROM (Erasable Programmable Read Only Memory), HDD (Hard Disk Drive), or SSD (Solid State Drive), and non-volatile memory such as a ROM (Read Only Memory).
  • the program storage unit 20 stores the programs necessary to execute various processes. In other words, the control unit 10 can realize various controls and operations by reading and executing the programs stored in the program storage unit 20.
  • the data storage unit 30 is a storage that uses a combination of non-volatile memory, such as a HDD or memory card, which can be written to and read from at any time, and volatile memory, such as a RAM (Random Access Memory), as a storage medium.
  • the data storage unit 30 is used to store data acquired and generated in the process of the control unit 10 executing programs and performing various processes.
  • the communication interface 40 includes one or more wired or wireless communication modules.
  • the communication interface 40 includes a communication module that wirelessly connects to an external device via a network.
  • the communication interface 40 may also include a wired communication module that enables a direct connection to an external device without going through a network.
  • the communication interface 40 may also include a wireless communication module that uses short-range wireless technology to wirelessly connect to an external device.
  • the communication interface 40 may be a general communication interface as long as it can communicate with an external device under the control of the control unit 10 and send and receive various information.
  • the input/output interface 50 is connected to the input device 51, the output device 52, etc.
  • the input/output interface 50 is an interface that enables the transmission and reception of information between the input device 51, the output device 52, etc.
  • the input/output interface 50 may be integrated with the communication interface 40.
  • the information processing device 1 and at least one of the input device 2 and the output device 3, etc. may be wirelessly connected using short-range wireless technology, etc., and the transmission and reception of information may be performed using the short-range wireless technology.
  • the input device 2 includes, for example, a keyboard or a pointing device for the administrator of the information processing device 1 to input various information.
  • the input device 51 may also include a reader for reading data to be stored in the program storage unit 20 or the data storage unit 30 from a memory medium such as a USB memory, or a disk device for reading such data from a disk medium.
  • the output device 3 includes a display that displays the output data to be presented to the administrator from the information processing device 1, a printer that prints it, etc.
  • FIG. 3 is a block diagram showing the software configuration of the information processing device 1 according to the present embodiment in relation to the hardware configuration shown in FIG. 2.
  • the control unit 10 includes an input information acquisition unit 101, a reward schedule optimization unit 102, and an output control unit 103.
  • the data storage unit 30 includes an input information storage unit 301 and a reward schedule storage unit 302.
  • the input information acquisition unit 101 acquires input information.
  • the input information includes the number of time steps N, the current bias strength $\beta$, the total reward budget R, etc.
  • the input information acquisition unit 101 stores the input information in the input information storage unit 301.
  • the current bias strength $\beta$ is a value that can be determined by the interventionist from the state of the subject. For example, the interventionist assesses the state of the subject using a questionnaire and determines the current bias strength $\beta$ from that state.
  • Reward schedule optimization unit 102 is an optimization unit that optimizes a schedule for determining at what timing rewards should be given to subjects based on input information in order to maximize a numerical index. For example, reward schedule optimization unit 102 optimizes the reward schedule according to the above-mentioned Algorithm 1.
  • Reward schedule optimization unit 102 includes a $\beta_0$ calculation unit 1021, a $\beta$ determination unit 1022, and a reward schedule calculation unit 1023.
  • the $\beta_0$ calculation unit 1021 reads out the input information stored in the input information storage unit 301. Then, the $\beta_0$ calculation unit 1021 calculates a predetermined threshold value $\beta_0$ that is equal to or greater than 1/2 based on at least the number of time steps N and the current bias strength $\beta$. The method of calculating the predetermined threshold value $\beta_0$ will be described in detail later.
  • the $\beta$ determination unit 1022 determines whether the current bias strength $\beta$ included in the input information is equal to or greater than the calculated $\beta_0$. If $\beta$ is equal to or greater than $\beta_0$, the $\beta$ determination unit 1022 outputs the input information and information indicating that $\beta$ is equal to or greater than $\beta_0$ to the reward schedule calculation unit 1023. Furthermore, if the $\beta$ determination unit 1022 determines that $\beta$ is less than $\beta_0$, it determines whether $\beta$ is greater than 1/2.
  • the reward schedule calculation unit 1023 calculates a reward schedule that corresponds to the result determined by the $\beta$ determination unit 1022.
  • the reward schedule includes the number of times the reward is given, the time at which the reward is given, the reward to be given, and the achievement value required at that time.
  • the reward schedule calculation unit 1023 calculates the reward schedule according to Algorithm 1 described above. A more detailed explanation of the method for calculating the reward schedule will be given later.
  • the output control unit 103 controls the display of the output device 3 to display the optimized schedule calculated by the reward schedule calculation unit 1023.
  • the input information storage unit 301 is used to store the input information acquired by the input information acquisition unit 101.
  • the reward schedule storage unit 302 is used to store the reward schedule calculated by the reward schedule optimization unit 102.
  • FIG. 4 is a flowchart showing an example of the operation of information processing device 1 to optimize a reward schedule and output the optimized reward schedule. That is, FIG. 4 shows an example of the operation of the information processing device 1 to optimize the reward schedule using Algorithm 1 described above.
  • the operation is started by an instruction from an administrator of the information processing device 1. Alternatively, the operation may be started when the control unit 10 receives input information, which will be described below, from an external device via the communication interface 40.
  • In step ST101, the input information acquisition unit 101 acquires input information.
  • the input information includes the number of time steps N, the current bias strength $\beta$, the total reward budget R, etc.
  • the input information acquisition unit 101 stores the input information in the input information storage unit 301.
  • FIG. 5 is a diagram showing an example of input information.
  • in this example, the number of time steps N is 20, the current bias strength $\beta$ is 0.6, and the total reward budget R is 100.
  • In step ST102, the $\beta_0$ calculation unit 1021 calculates $\beta_0$.
  • the $\beta_0$ calculation unit 1021 reads out the input information stored in the input information storage unit 301.
  • the $\beta_0$ calculation unit 1021 outputs the calculated $\beta_0$ to the $\beta$ determination unit 1022.
  • In step ST103, the $\beta$ determination unit 1022 determines whether the current bias strength $\beta$ included in the input information is equal to or greater than the calculated $\beta_0$. If so, it outputs the input information and information indicating that $\beta$ is equal to or greater than $\beta_0$ to the reward schedule calculation unit 1023, and the process proceeds to step ST104. Otherwise, the process proceeds to step ST105.
  • In step ST105, the $\beta$ determination unit 1022 determines whether $\beta$ is 1/2 or greater. If it is, the unit outputs the input information and information indicating that $\beta$ is 1/2 or greater to the reward schedule calculation unit 1023. Otherwise, it outputs the input information and information indicating that $\beta$ is less than 1/2 to the reward schedule calculation unit 1023.
  • the optimal reward schedule is calculated as the rewards to be given when the total reward budget is R.
  • the reward schedule calculation unit 1023 stores the calculated reward schedule in the reward schedule storage unit 302.
  • the reward schedule calculation unit 1023 determines each time $N_i$ at which a reward should be given by rounding to an integer value so that
  • the reward schedule calculation unit 1023 calculates that the achievement value $A_i$ required at time $N_i$ is
  • the reward schedule calculation unit 1023 takes the above schedule as the optimized reward schedule.
  • the reward schedule calculation unit 1023 stores the calculated reward schedule in the reward schedule storage unit 302.
  • In step ST108, the output control unit 103 controls the output device 3 so that the reward schedule is displayed on a display or the like.
  • the output control unit 103 acquires the reward schedule stored in the reward schedule storage unit 302, and controls the output device 3 so that the reward schedule is displayed on a display or the like.
  • the intervener can maximize the subject's performance by rewarding the subject based on the reward schedule.
  • FIGS. 6 and 7 show examples of the optimized schedule displayed by the output control unit 103.
  • FIG. 6 is a diagram showing the number of times k to give a reward.
  • FIG. 7 is a diagram showing a time $N_i$, an achievement value $A_i$ required at the time $N_i$, and a reward $R_i$ to be given.
  • the information processing device 1 can optimize a schedule for how to give rewards under the influence of the present bias, thereby maximizing the performance of a target human (subject) in a task.
  • the operations of the components can be constructed as a program, which can be installed and executed on a computer used as the information processing device.
  • the method described in the above embodiment can be stored as a program (software means) that can be executed by a calculator (computer) on a storage medium such as a magnetic disk (floppy disk, hard disk, etc.), optical disk (CD-ROM, DVD, MO, etc.), semiconductor memory (ROM, RAM, flash memory, etc.), and can also be distributed by transmitting it via a communication medium.
  • the program stored on the medium also includes a setting program that configures the software means (including not only execution programs but also tables and data structures) that the computer executes.
  • the computer that realizes this device reads the program stored in the storage medium, and in some cases, configures the software means using the setting program, and executes the above-mentioned processing by controlling the operation of this software means.
  • the storage medium referred to in this specification is not limited to one for distribution, but also includes storage media such as magnetic disks and semiconductor memories installed inside the computer or in devices connected via a network.
  • this invention is not limited to the above-described embodiment, and various modifications can be made in the implementation stage without departing from the gist of the invention.
  • the various embodiments may be implemented in combination as appropriate as possible, in which case the combined effects can be obtained.
  • the above-described embodiment includes inventions at various stages, and various inventions can be extracted by appropriate combinations of the multiple constituent elements disclosed.
  • REFERENCE SIGNS LIST 1: Information processing device; 2: Input device; 3: Output device; 10: Control unit; 101: Input information acquisition unit; 102: Reward schedule optimization unit; 1021: $\beta_0$ calculation unit; 1022: $\beta$ determination unit; 1023: Reward schedule calculation unit; 103: Output control unit; 20: Program storage unit; 30: Data storage unit; 301: Input information storage unit; 302: Reward schedule storage unit; 40: Communication interface; 50: Input/output interface; 51: Input device; 52: Output device
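
The model and the branching of steps ST103 and ST105 described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the applicant's implementation. The names `InputInfo`, `edge_cost`, `perceived_cost`, and `select_case` are hypothetical. The discounting form (first edge counted in full, every later edge multiplied by the bias strength) is the usual form of quasi-hyperbolic discounting and is assumed here. The threshold is passed in as an argument because its formula is not reproduced in this text.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class InputInfo:
    """Input information as in the FIG. 5 example (field names are hypothetical)."""
    num_time_steps: int    # N
    bias_strength: float   # current bias strength, between 0 and 1
    total_budget: float    # R


def edge_cost(x: float, y: float) -> float:
    """Cost of the edge ((i, x), (i + 1, y)), i.e. (y - x)^2."""
    return (y - x) ** 2


def perceived_cost(progress: Sequence[float], beta: float) -> float:
    """Cost of a path of progress values as seen from its first vertex.

    Assumption: the first edge counts in full and every later edge is
    multiplied by beta (standard quasi-hyperbolic discounting).
    """
    costs = [edge_cost(x, y) for x, y in zip(progress, progress[1:])]
    if not costs:
        return 0.0
    return costs[0] + beta * sum(costs[1:])


def select_case(info: InputInfo, beta0: float) -> str:
    """Three-way branch corresponding to steps ST103 and ST105.

    beta0 is supplied by the caller; how it is computed is not shown here.
    The second comparison uses ">= 1/2" as in the flowchart description.
    """
    beta = info.bias_strength
    if beta >= beta0:
        return "beta >= beta0"
    if beta >= 0.5:
        return "1/2 <= beta < beta0"
    return "beta < 1/2"


if __name__ == "__main__":
    info = InputInfo(num_time_steps=20, bias_strength=0.6, total_budget=100.0)
    # 0.75 is a made-up threshold used only to exercise the branch.
    print(select_case(info, beta0=0.75))
    print(perceived_cost([0.0, 1.0, 2.0, 3.0], beta=0.5))
```

With the FIG. 5 inputs and the made-up threshold of 0.75, the sketch selects the middle case. A smaller bias strength makes later edges count less in `perceived_cost`, which matches the description of a subject who weights the most recent costs heavily.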

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Human Resources & Organizations (AREA)
  • Operations Research (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
PCT/JP2023/005036 2023-02-14 2023-02-14 Information processing device, information processing method, and information processing program Ceased WO2024171312A1 (ja)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2025500476A JPWO2024171312A1 2023-02-14 2023-02-14
PCT/JP2023/005036 WO2024171312A1 (ja) 2023-02-14 2023-02-14 Information processing device, information processing method, and information processing program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2023/005036 WO2024171312A1 (ja) 2023-02-14 2023-02-14 Information processing device, information processing method, and information processing program

Publications (1)

Publication Number Publication Date
WO2024171312A1 true WO2024171312A1 (ja) 2024-08-22

Family

ID=92421013

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/005036 Ceased WO2024171312A1 (ja) 2023-02-14 2023-02-14 Information processing device, information processing method, and information processing program

Country Status (2)

Country Link
JP (1) JPWO2024171312A1
WO (1) WO2024171312A1

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013084175A (ja) * 2011-10-12 2013-05-09 Sony Corp Information processing apparatus, information processing method, and program
JP2019141869A (ja) * 2018-02-19 2019-08-29 Fanuc Corp Control device and machine learning device
WO2021165425A1 (en) * 2020-02-21 2021-08-26 Philip Morris Products Sa Method and apparatus for interactive and privacy-preserving communication between a server and a user device
JP2021192141A (ja) * 2020-06-05 2021-12-16 The University of Tokyo Learning device, learning method, and learning program

Also Published As

Publication number Publication date
JPWO2024171312A1 2024-08-22

Similar Documents

Publication Publication Date Title
Frazier et al. Incentivizing exploration
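KR102322845B1 (ko) Method, device, and system for deriving an artificial-intelligence-based brand marketing strategy
Wang et al. MeLoDy: A long-term dynamic quality-aware incentive mechanism for crowdsourcing
Ballings et al. CRM in social media: Predicting increases in Facebook usage frequency
JP6737707B2 (ja) Method, apparatus, and system for content recommendation
Wright et al. Level-0 meta-models for predicting human behavior in games
JP5484968B2 (ja) Information processing apparatus, information processing method, and program
US20100293026A1 (en) Crowdsourcing
JP6931624B2 (ja) Learning support device and learning support method
CN108537567A (zh) Method and device for determining a target user group
JP4847916B2 (ja) Recommendation device, recommendation method, and recommendation program that take purchase order into account, and recording medium storing the program
JP5984147B2 (ja) Information processing apparatus, information processing method, and program
CN104899266A (zh) Application recommendation method and device
CN111694753B (zh) Application program testing method, device, and computer storage medium
CN109800138B (zh) CPU testing method, electronic device, and storage medium
WO2017219121A2 (en) Method and system for determining optimized customer touchpoints
Fujii et al. Numerical analysis of non-constant pure rate of time preference: a model of climate policy
KR20220170583A (ko) Federated learning method and apparatus
JP2009110341A (ja) Prediction device, prediction method, and prediction program using time information, and recording medium storing the program
CN117980938A (zh) Intelligent predictive A/B testing
KR102010031B1 (ko) Method and apparatus for predicting game indicator information
CN112669091A (zh) Data processing method, device, and storage medium
JP7741383B2 (ja) Information processing device, information processing method, and program
WO2024171312A1 (ja) Information processing device, information processing method, and information processing program
CN112446763B (zh) Service recommendation method, device, and electronic equipment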

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 23922651; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2025500476; Country of ref document: JP; Kind code of ref document: A)
WWE WIPO information: entry into national phase (Ref document number: 2025500476; Country of ref document: JP)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 23922651; Country of ref document: EP; Kind code of ref document: A1)