KR20170140655A

KR20170140655A - Context Aware POMDP based Adaptive Eye Tracking for Robust Human Computer Interaction

Info

Publication number: KR20170140655A
Application number: KR1020160073242A
Authority: KR
Inventors: 이필규
Original assignee: 인하대학교 산학협력단
Priority date: 2016-06-13
Filing date: 2016-06-13
Publication date: 2017-12-21
Also published as: KR101832805B1

Abstract

The present invention relates to an eye tracking apparatus using CA-POMDP for robust human-computer interaction.
According to the present invention, the eye tracking apparatus includes: an eye tracking module that, when a face image is input, extracts the eye region, binarizes the eye image with a binarization threshold, tracks the pupil position in the binarized eye image to perform eye tracking, and outputs a reward value for each system control parameter adjustment action; an image quality evaluation module that receives the face image, evaluates its quality, and provides an image quality label; and a CA-POMDP module that defines the system control parameters as states, builds a world context model for each image quality label from combinations of various objects, determines the current state from the information in the world context model (expressed as the attributes and attribute values of those objects) according to the image quality label provided by the image quality evaluation module, performs an executable system control parameter adjustment action based on the determined current state, updates the action function value based on the reward value of that action, and selects the optimal system control parameter adjustment action according to the updated action function value.

Description

The present invention relates to an eye tracking apparatus using CA-POMDP for robust human-computer interaction.

Recently, many eye-tracking methods have been proposed for a variety of goals and applications.

Automatic eye tracking has been used in many application areas because it is closely linked to the state of human perception, attention, and cognition.

As an effective tool for interaction, eye tracking can be used for unobtrusive, hands-free human-computer interaction (HCI) and for communication between computers and people.

Nevertheless, despite considerable development effort, many problems remain, such as illumination, viewing angle, and scale, along with problems caused by changes in the operating environment that arise from individual eye shapes and various kinds of jitter.

The most successful commercial eye tracking technologies either require expensive image capture devices (for example, costly cameras and lenses) or allow only very restricted operation under tightly controlled conditions.

Recently, the greatest interest in eye tracking has come from the HCI field, for web usability, advertising, smart TVs, and mobile applications.

However, conventional eye tracking technology is too costly and too restrictive for use in low-cost industrial applications.

Successful HCI applications need a more accessible and more general eye tracking technology that works without controlled lighting or controlled conditions, reduces cost, and simplifies complicated initial user setup.

Prior art patent documents: Published Patent No. 2015-002741; Registration No. 10-138775


To solve the above problems, the present invention aims to provide an eye tracking apparatus using CA-POMDP for robust human-computer interaction that, instead of relying on expensive image capture devices or highly restricted conditions, guarantees performance by optimizing the system control parameters with CA-POMDP.

One aspect of the present invention includes: an eye tracking module that, when a face image is input, extracts the eye region, binarizes the eye image with a binarization threshold, tracks the pupil position in the binarized eye image to perform eye tracking, and outputs a reward value for each system control parameter adjustment action; an image quality evaluation module that receives the face image, evaluates its quality, and provides an image quality label; and a CA-POMDP module that defines the system control parameters as states, builds a world context model for each image quality label from combinations of various objects, determines the current state from the information in the world context model (expressed as the attributes and attribute values of those objects) according to the image quality label provided by the image quality evaluation module, performs an executable system control parameter adjustment action based on the determined current state, updates the action function value based on the reward value of that action, and selects the optimal system control parameter adjustment action according to the updated action function value.

In one aspect of the present invention, the system control parameters include the binarization threshold, the angle parameter of the partial Hough transform, and the noise parameter of the Kalman filter.

In one aspect of the present invention, the image quality evaluation module includes: an image quality learning unit that learns image quality indices from image quality index prototypes using collected training images; and an image quality labeling unit that receives an image, extracts a region of interest, divides it into rectangular patches, loads the prototypes of the image quality learning unit, and extracts the pixels of a scanning window to compute an image quality label.

In one aspect of the present invention, the image quality labeling unit computes a simplified image quality label that takes illumination changes into account.

In one aspect of the present invention, the CA-POMDP module includes: a control unit that defines the system control parameters as states, determines the current state of each state variable from the information in the world context model (expressed as the attributes and attribute values of various objects), and performs an executable action based on the updated action function value; a world context modeling unit that builds a world context model for each image quality label from combinations of various objects; and a real-time Q-learning unit that updates the action function value based on the reward value of the action taken in the current state and stores the updated action function value.

In one aspect of the present invention, the real-time Q-learning unit acquires an action, computes an internal reward, observes the new internal state based on the immediate reward, and updates the action function value of the world context model.
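The update performed by the real-time Q-learning unit can be sketched with the standard one-step tabular Q-learning rule. This is only an illustration: the text above does not give the update formula, so the class name `RealTimeQLearner`, the learning rate `alpha`, the discount factor `gamma`, the epsilon-greedy selection, and the encoding of a state as an (image quality label, parameter value) pair are all assumptions.

```python
from collections import defaultdict
import random


class RealTimeQLearner:
    """Tabular Q-learning over (state, action) pairs; all names are illustrative."""

    def __init__(self, actions, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha = alpha        # learning rate (assumed value)
        self.gamma = gamma        # discount factor (assumed value)
        self.epsilon = epsilon    # exploration rate (assumed value)
        self.q = defaultdict(float)   # action function values, default 0.0
        self.rng = random.Random(seed)

    def select_action(self, state):
        # Epsilon-greedy: explore occasionally, otherwise take the best-known action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # One-step Q-learning: move Q(s, a) toward reward + gamma * max_a' Q(s', a').
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
        return self.q[(state, action)]
```

In this sketch a state could be, for example, `("good", 128)` for a hypothetical quality label and binarization threshold, and the actions could be `"raise"`, `"lower"`, and `"keep"`; the reward would come from the eye tracking module.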

Instead of relying on expensive image capture devices or highly restricted conditions, the present invention can guarantee performance by optimizing the system control parameters with CA-POMDP.

FIG. 1 is a block diagram of an eye tracking apparatus using POMDP for robust human-computer interaction according to an embodiment of the present invention.
FIG. 2 shows the partial circles that represent the outer boundary of the iris for different degrees of eye opening.
FIG. 3 illustrates the process by which the image quality labeling unit labels image quality.
FIG. 4 compares a general POMDP with the CA-POMDP of the present invention.
FIG. 5 is a flowchart illustrating the operation of the eye tracking apparatus of FIG. 1.

To explain the present invention, its operational advantages, and the objects achieved by practicing it, preferred embodiments of the present invention are illustrated and described below.

First, the terms used in this application are used only to describe particular embodiments and are not intended to limit the present invention; singular expressions include plural expressions unless the context clearly indicates otherwise. In this application, terms such as "comprise" or "have" specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to preclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In describing the present invention, detailed descriptions of related known configurations or functions are omitted where they could obscure the gist of the invention.

FIG. 1 is a block diagram of an eye tracking apparatus using POMDP for robust human-computer interaction according to an embodiment of the present invention.

Referring to FIG. 1, the eye tracking apparatus using POMDP for robust human-computer interaction according to an embodiment of the present invention includes an eye tracking module 100, an image quality evaluation module 200, and a CA-POMDP module 300.

When a face image is input, the eye tracking module 100 extracts the eye region, binarizes the eye image with a binarization threshold, and tracks the pupil position in the binarized eye image to perform eye tracking.
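The binarization step can be sketched as follows. This is a simplified illustration, not the module's actual implementation: the eye image is assumed to be a grayscale array given as a list of rows, pupil pixels are assumed to be darker than the threshold, and a plain centroid stands in for the pupil position (the description later estimates the eye center with a partial Hough transform). The function names are hypothetical.

```python
def binarize(eye, threshold):
    """Return a 0/1 mask; 1 marks pixels darker than the binarization threshold."""
    return [[1 if px < threshold else 0 for px in row] for row in eye]


def pupil_center(mask):
    """Centroid (row, col) of the foreground pixels, or None if the mask is empty."""
    total = sum_r = sum_c = 0
    for r, row in enumerate(mask):
        for c, v in enumerate(row):
            if v:
                total += 1
                sum_r += r
                sum_c += c
    if total == 0:
        return None   # threshold too low: nothing was classified as pupil
    return (sum_r / total, sum_c / total)
```

An empty mask is one simple signal that the threshold suits the current illumination poorly, which is the kind of situation the adaptive parameter control is meant to handle.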

The performance of the eye tracking method carried out by the eye tracking module 100 depends heavily on the algorithm chosen at each step and on the thresholds and parameters associated with the chosen algorithm (collectively, the system control parameters).

The eye tracking method is best formulated as a dynamic control problem: determining the structure of the eye tracking algorithm and the thresholds and parameters associated with that structure.

Because the main cause of degraded eye tracking performance is the change in image quality caused by changes in illumination, pose, and image appearance, these thresholds and parameters are strongly affected by changes in the environment. In the present invention, the system control parameters of the eye tracking module 100 are adaptively controlled using image-quality labels, observations, and rewards.

The eye region can be located accurately using well-known face localization and eye region localization methods, and eye tracking can likewise be performed with well-known eye tracking methods.

초기 눈의 중심이 전처리, 특성 추출 및 부분 허프 변환(Hough transformation)에 의해 추정될 수 있다. The center of the initial eye can be estimated by preprocessing, feature extraction and partial Hough transformation.

전처리는 적응적으로 선택된 알고리즘 하부 구조 및 선택된 알고리즘 하부 구조와 관련된 문턱값을 이용하여 수행될 수 있고, 특성 추출은 윤곽이나 엣지 이미지를 생성하기 위해 수행될 수 있다. Preprocessing may be performed using an adaptively selected algorithmic infrastructure and a threshold associated with the selected algorithmic infrastructure, and feature extraction may be performed to generate an outline or edge image.

윤곽 또는 엣지 이미지는 최적의 트래킹 포인트를 결정하기 위해, 홍채 경계의 조절된 호각 파라미터를 이용한 부분 허프 변환에 의해 처리될 수 있다. The contour or edge image may be processed by a partial Huff transform using the adjusted whistle parameter of the iris boundary to determine the optimal tracking point.

최종적으로, 칼만 필터가 적용될 수 있고, 다음 눈 중심이 예측될 수 있다. 상술한 단계들은 눈 추적이 종료될 때까지 반복될 수 있다. Finally, a Kalman filter can be applied and the next eye center can be predicted. The above steps can be repeated until the eye tracking is finished.

눈 추적 제어 공간은, 예를 들어 가능한 알고리즘 구조와 문턱값들 또는 파라미터들의 다양성은, 추적의 정확도와 실행 시간 제약간의 트레이드 오프를 고려한 이전 지식 또는 경험에 기반하여 결정될 수 있다.The eye tracking control space can be determined based on previous knowledge or experience, for example, considering possible algorithmic structures and thresholds or a variety of parameters, taking into account tradeoffs between tracking accuracy and runtime constraints.

In this eye tracking method, the preprocessing stage may consist of histogram equalization, Retinex with one threshold, and end-in contrast stretching with two parameters.

For example, the possible algorithm structures of the preprocessing stage may be histogram equalization (HE), Retinex, Retinex followed by histogram equalization in a serial combination, or end-in contrast stretching.

Next, the feature extraction stage may be (1) binarization (BN), (2) the Canny edge detection algorithm, (3) binarization and Canny in a parallel combination using an AND operation, or (4) binarization and Canny in a parallel combination using an OR operation. Here, binarization followed by contour extraction in a serial combination is denoted simply as binarization.

Feature extraction may be performed selectively using one of the algorithm structures (1) to (4) above. Binarization may have one threshold (X_BN), and the Canny algorithm may have two thresholds for edge detection. One of the two Canny thresholds relates to edge linking, and the other relates to finding the initial segments of strong edges.

Meanwhile, in the eye tracking method the iris boundary can be approximated by a circle, because the method aims to provide real-time performance using low-resolution images.

The algorithm can detect the center of the eye (iris) by computing the outer boundary of the iris image. To extract features that are less affected by noise and eyelid occlusion, a modified Hough transform (referred to as the partial Hough transform) may be used.

Instead of a complete circle, two portions of the circle (partial circles) are used together, which minimizes the effect of occlusion by the upper and lower eyelids.

The four-dimensional control space for eye center tracking may include the angle values that delimit the portions of the iris boundary of interest (that is, the partial circles).

FIG. 2 shows the partial circles representing the outer boundary of the iris according to the degree of eye opening. To avoid the effect of eyelid occlusion, only the portions of the outer iris boundary between the angles Φ₁ and Φ₂ and between the angles Φ₃ and Φ₄ are considered, rather than the entire iris boundary.

The partial Hough transform algorithm for circle fitting can be described as follows. First, the outer circle of the iris can be expressed by Equation 1.

(Equation 1)

(x - a₀)² + (y - b₀)² = r²

Here, (a₀, b₀) denotes the coordinates of the eye center (the center of the outer iris boundary), and r denotes the radius of the circle. A two-stage algorithm can be used. The first stage finds the center of the eye circle, and the second stage estimates the normal direction of the outer iris boundary points on the partial circles about the eye center. Let (x, y) be a point on the outer iris boundary with a known gradient direction. The mapping from the triple (x, y, X_PTH) to the center parameters (a, b) is a straight line, where X_PTH denotes the angle of the gradient direction. The point where many such lines intersect identifies the coordinates of the eye center. The relation between (x, y, X_PTH) and (a, b) is given by Equation 2.

(Equation 2)

X_PTH = arctan((y - b) / (x - a))

In the partial Hough transform algorithm, the range of the outer iris boundary can be restricted by four angle parameters X_PTH1, X_PTH2, X_PTH3, and X_PTH4 to avoid the effect of eyelid occlusion, as in the example of Equation 3.

(Equation 3)

X_PTH = [X_PTH1, X_PTH2] ∪ [X_PTH3, X_PTH4]
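The center-voting stage of the partial Hough transform can be illustrated with a minimal sketch. It is not the patented implementation: it assumes the iris radius is already known, that each boundary point carries a gradient direction pointing away from the center (consistent with Equation 2), and that the two allowed arcs are passed in as angle pairs. The function names `in_arc` and `vote_eye_center` are hypothetical.

```python
import math
from collections import Counter

TWO_PI = 2.0 * math.pi


def in_arc(theta, lo, hi):
    """True if angle theta lies on the counter-clockwise arc from lo to hi (radians)."""
    return (theta - lo) % TWO_PI <= (hi - lo) % TWO_PI


def vote_eye_center(edge_points, radius, arcs):
    """Vote for the circle center (a, b) using only points on the allowed arcs.

    edge_points: iterable of (x, y, theta); theta is the gradient direction at
                 (x, y), assumed to point away from the center as in Equation 2.
    radius:      iris radius r, assumed known here (a simplification).
    arcs:        [(X_PTH1, X_PTH2), (X_PTH3, X_PTH4)] in radians.
    """
    votes = Counter()
    for x, y, theta in edge_points:
        if not any(in_arc(theta, lo, hi) for lo, hi in arcs):
            continue  # boundary part that may be occluded by an eyelid
        # Step back along the gradient direction by one radius to reach the center.
        a = round(x - radius * math.cos(theta))
        b = round(y - radius * math.sin(theta))
        votes[(a, b)] += 1
    return votes.most_common(1)[0][0] if votes else None
```

Restricting the votes to the left and right arcs means that edge points produced by the eyelids, which fall outside those angular ranges, never reach the accumulator.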

Meanwhile, the Kalman filter (KF) is a recursive approach to the discrete-time linear filtering problem that estimates the state of a process so as to minimize the squared error.

The Kalman filter can be used to compute an approximate solution for eye center tracking formulated as a Bayesian tracking problem.

Assume that the motion of the eye center has a constant velocity, so that the state transition model F_t and the covariance matrix of the process noise model can be determined as in Equations 4 and 5.

(Equation 4)

(Equation 5)

Here, ΔT denotes the time interval between adjacent frames (usually very short), and q denotes a parameter associated with the process noise. Assuming that H_t is constant, the matrix of the measurement model and the measurement noise covariance can be determined as in Equations 6 and 7, respectively.

(Equation 6)

(Equation 7)

Here, r denotes a parameter associated with the measurement noise. A common difficulty of the Kalman filter approach is estimating the noise covariance matrices Q_{t-1} and R_t. In many applications, the noise covariances Q_{t-1} and R_t stabilize quickly and remain constant, so the parameters r and q can be computed in advance by running the filter offline or by determining steady-state values. In eye tracking, however, the noise is not constant because of the uncertainty of a dynamically changing environment. The parameters r and q can therefore be adjusted to achieve optimal performance in the eye tracking scheme according to embodiments of the present invention.
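The role of q and r as tunable parameters can be illustrated with a small constant-velocity Kalman filter for one coordinate of the eye center. Equations 4 to 7 are not reproduced in this text, so the matrices below are an assumption: they follow the common textbook constant-velocity model with a white-noise acceleration process covariance scaled by q and a scalar measurement variance r. The class name `ConstantVelocityKalman1D` and the initial covariance `p0` are hypothetical.

```python
class ConstantVelocityKalman1D:
    """Kalman filter for one coordinate; the state is (position, velocity)."""

    def __init__(self, q, r, dt=1.0, p0=100.0):
        self.q, self.r, self.dt = q, r, dt
        self.pos, self.vel = 0.0, 0.0
        # State covariance P stored as four scalars (initially p0 * identity).
        self.p00, self.p01, self.p10, self.p11 = p0, 0.0, 0.0, p0

    def predict(self):
        """Propagate the state with F = [[1, dt], [0, 1]] and add process noise."""
        dt, q = self.dt, self.q
        self.pos += dt * self.vel
        p00 = (self.p00 + dt * (self.p01 + self.p10)
               + dt * dt * self.p11 + q * dt ** 3 / 3.0)
        p01 = self.p01 + dt * self.p11 + q * dt ** 2 / 2.0
        p10 = self.p10 + dt * self.p11 + q * dt ** 2 / 2.0
        p11 = self.p11 + q * dt
        self.p00, self.p01, self.p10, self.p11 = p00, p01, p10, p11
        return self.pos

    def update(self, z):
        """Correct the state with a position measurement z (H = [1, 0])."""
        s = self.p00 + self.r            # innovation variance
        k0, k1 = self.p00 / s, self.p10 / s  # Kalman gain
        residual = z - self.pos
        self.pos += k0 * residual
        self.vel += k1 * residual
        p00, p01 = (1 - k0) * self.p00, (1 - k0) * self.p01
        p10, p11 = self.p10 - k1 * self.p00, self.p11 - k1 * self.p01
        self.p00, self.p01, self.p10, self.p11 = p00, p01, p10, p11
        return self.pos
```

For a two-dimensional eye center, one such filter per axis would be one simple arrangement. A larger q makes the filter follow rapid eye movements more closely, while a larger r makes it trust noisy measurements less, which is why both are natural candidates for adaptive control.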

For notational consistency with the other system control parameters, r is written as X_KF1 and q as X_KF2.

In summary, the system control parameter space X^Ψ, whose elements are the parameters above, is expressed by Equation 8.

(Equation 8)

Here, each parameter takes values in its corresponding discrete domain.
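Because each parameter takes values in a discrete domain, the parameter space is a Cartesian product of those domains. The sketch below enumerates such a space. The parameter names come from the description, but the numeric domains and the function name `enumerate_parameter_space` are illustrative assumptions only.

```python
from itertools import product


def enumerate_parameter_space(domains):
    """Return every combination of parameter values as a list of dicts.

    domains: mapping from parameter name to its list of discrete values.
    """
    names = list(domains)
    return [dict(zip(names, values))
            for values in product(*(domains[name] for name in names))]


# Hypothetical discrete domains; the real ranges are not given in the text.
example_domains = {
    "X_BN": [60, 80, 100],   # binarization threshold
    "X_KF1": [0.1, 1.0],     # measurement noise parameter r
    "X_KF2": [0.01, 0.1],    # process noise parameter q
}
```

With these example domains the space contains 3 × 2 × 2 = 12 parameter settings. Adding the four angle parameters and the preprocessing thresholds multiplies the size further, which is why the description later restricts the search with context models.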

When the CA-POMDP module 300 defines the system control parameters as a state s and changes them by an action a, the eye tracking module 100 evaluates the result of executing each action, determines a reward value, and provides it to the real-time Q-learning unit 330 for reinforcement learning. The reward value is also provided to the world context modeling unit 320 so that the world context model can be updated.

Meanwhile, given an external environment, the optimal algorithm structure with its associated thresholds and parameters can be determined according to the recognized image context.

In general, a context can be any information that affects the performance of the system of interest. The image context can be affected by the direction, brightness, contrast, and spectral composition of the illumination. Using the concept of image context together with image quality analysis can improve system performance.

Recently, image quality analysis has been applied successfully to various applications such as image storage, compression, communication, display, segmentation, and recognition, and it can be used to determine the preprocessing for adaptive illumination normalization using adaptive weighting parameters.

To this end, the image quality evaluation module 200 analyzes the image quality and reports the result to the CA-POMDP module 300 as an image quality label.

The image quality evaluation module 200 consists of an image quality labeling unit 210 and an image quality learning unit 220.

When an image frame I is input, the image quality labeling unit 210 extracts a region of interest (ROI) 210-1, as shown in FIG. 3.

The grid Θ consists of n grid cells and is defined as Θ = {ω₁, ..., ω_n}; the ROI is defined as I(Θ).

The image quality labeling unit 210 then divides the region of interest into rectangular patches 210-2 and creates an index of the rectangular patches 210-2.

Denoting a rectangular patch by I(ω), where each ω is the center grid cell of the patch, the ROI is expressed as I(Θ) = I(ω₁) ∪ ... ∪ I(ω_n).

Let Λ_i (i = 1, ..., n) be the label set of the i-th of the n grid cells, and let Λ⁺_i = {0} ∪ Λ_i, where {0} denotes the null label (no label). If ℓ^j_i denotes the j-th label of Λ⁺_i, then Λ_i = {ℓ¹_i, ..., ℓ^l_i}. Accordingly, the image quality label of the region of interest satisfies Φ(I(Θ)) ∈ L, where the label space is L = Λ⁺₁ × ... × Λ⁺_n.

Next, the image quality labeling unit 210 loads the prototype pixels, extracts the pixels of the W scanning windows of a rectangular patch, computes the image quality by comparing the pixels of each scanning window with the prototype pixels, and generates and outputs an image quality index.

The W scanning windows are traversed from left to right and from top to bottom of the rectangular patch.

In more detail, for scanning window k, the pixels of the cluster prototype with label ℓ^j_i are written z^k(ℓ^j) = {z^k₁, ..., z^k_m}. The image quality values of the test pixels of scanning window k are written x^k = {x^k₁, ..., x^k_m}, and the estimate Δ(ℓ^j) over the W scanning windows is determined by Equations 9 and 10.

(Equation 9)

(Equation 10)

Here, Δ(ℓ^j) ∈ {-1, 1}, and Δ(ℓ^j) attains its maximum value of 1.0 when z^k(ℓ^j) = x^k, that is, when z^k_i = x^k_i for all i = 1, ..., m.

The image quality label of an image patch I(ω) is determined by Equation 11.

(Equation 11)

The image quality label of the region of interest I(Θ) is obtained by Equation 12.

(Equation 12)

ℓ(I(Θ)) = ℓ*₁ ℓ*₂ … ℓ*_n

The image quality value x^k for each label ℓ^j_i is decomposed into three parts, namely correlation, illumination distortion, and contrast distortion, as in Equation 13.

(Equation 13)

Considering the main factors that determine eye tracking performance and the real-time constraint, the illumination effect is the most important, so the image quality value is simplified to Equation 14.

(Equation 14)
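An illumination-only quality score of this kind can be sketched as follows. Equations 13 and 14 are not reproduced in this text, so the formula is an assumption: it uses the luminance term of the widely used universal image quality index, 2·mean(x)·mean(z) / (mean(x)² + mean(z)²), which equals 1.0 when the two windows have the same mean brightness. Selecting the label whose prototype scores highest is likewise an assumed reading of Equation 11, and the names `illumination_quality` and `best_label` are hypothetical.

```python
def illumination_quality(x, z):
    """Luminance similarity between test pixels x and prototype pixels z.

    Returns a value in (0, 1] for non-negative pixel intensities; 1.0 means
    the two windows have identical mean brightness.
    """
    mean_x = sum(x) / len(x)
    mean_z = sum(z) / len(z)
    denom = mean_x * mean_x + mean_z * mean_z
    if denom == 0:
        return 1.0  # both windows are entirely black: treat as identical
    return 2.0 * mean_x * mean_z / denom


def best_label(x, prototypes):
    """Pick the label whose prototype window is closest in illumination to x.

    prototypes: mapping from label name to a list of prototype pixel values.
    """
    return max(prototypes, key=lambda label: illumination_quality(x, prototypes[label]))
```

Because only two means are computed per window, a score of this form is cheap enough to evaluate for every scanning window of every patch in real time, which matches the motivation given for dropping the correlation and contrast terms.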

Accordingly, the image quality of the region of interest is expressed by Equation 15.

(Equation 15)

In short, the image quality labeling unit 210 computes the image quality so that it reflects illumination changes.

Next, the image quality learning unit 220 learns the image quality index from the image quality index prototypes using the collected training images.

Meanwhile, the CA-POMDP module 300 consists of a control unit 310, a world context modeling unit 320, and a real-time Q-learning unit 330.

The CA-POMDP module 300 defines the system control parameters as a state s (performed by the control unit 310).

The CA-POMDP module 300 constructs a world context model for each image quality label as a combination of various objects (performed by the world context modeling unit 320), and determines the current state by deciding the Boolean value of each state variable based on the information in the world context model, which is expressed by the attributes and attribute values of those objects (performed by the control unit 310).

The current state determined by the CA-POMDP module 300 is the basis for action selection and Q-learning.

The CA-POMDP module 300 updates the Q value, which is the value function of an action, based on the reward for that action in the current state, and stores the updated Q value (performed by the real-time Q-learning unit 330). It executes a feasible action based on the updated Q values (performed by the control unit 310), receives the reward value from the eye tracking module 100, updates the Q value, and selects the action with the highest Q value (performed by the control unit 310 and the real-time Q-learning unit 330).

In general, the partially observable Markov decision process (POMDP) is a general decision-making framework for partially observable problems.

It is a model suited to real problems in which uncertainty exists; it assumes an accurate model of the real world and finds the optimal action policy for the given model.

To account for uncertain situations that can only be partially observed, a POMDP has the components {S, A, O, T, Ω, R, γ}.

Here, S is the set of states s, A is the set of actions a that the controller can take, T is the state transition probability distribution, R is the probability distribution of the reward r for the controller's actions, O is the set of observations o, and Ω is the probability distribution of observations given the actual state. γ is the discount factor, a real value between 0 and 1.

That is, with current state s and next state s′, T(s, a, s′) = P(s′ | s, a) and Ω(s, a, o′) = P(o′ | s, a).

In an MDP, decisions are based on a known state, but this is not possible in a POMDP. A belief space must therefore be defined, in which the state is determined probabilistically from partial observations. With this belief space, solving the POMDP reduces to solving an MDP.

Let b_t(s) be the probability distribution over the state s at time t. It is determined by the agent's history as b_t(s) = P(s_t | h), where h = ((a₀, o₁, r₁), (a₁, o₂, r₂), ..., (a_{t-1}, o_t, r_t)).

Finding the optimal policy of a POMDP is equivalent to solving the belief-state MDP given by the four-tuple [B, A, τ, R_B].

The belief state b is updated to the next belief state b′ using the state estimator τ, so that b′ = τ(b, a, o). The update follows the Bayesian rule [5] in Equation 16.

(Equation 16)
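The Bayesian belief update can be sketched as follows. Equation 16 is not reproduced in this text, so the code uses the standard textbook form, b′(s′) ∝ Ω(s′, a, o) · Σ_s T(s, a, s′) · b(s), as an assumption. The table layout (nested lists indexed by state, action, and observation) and the function name `belief_update` are hypothetical.

```python
def belief_update(belief, action, observation, T, Omega):
    """Return the next belief b' = tau(b, a, o) over a finite state set.

    belief:      list of probabilities b[s].
    T:           T[s][a][s2] = P(s2 | s, a).
    Omega:       Omega[s2][a][o] = P(o | s2, a), indexed by the resulting state.
    """
    n = len(belief)
    unnormalized = []
    for s2 in range(n):
        # Predict: probability of reaching s2 under the chosen action.
        predicted = sum(T[s][action][s2] * belief[s] for s in range(n))
        # Correct: weight by how likely the observation is in s2.
        unnormalized.append(Omega[s2][action][observation] * predicted)
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("observation has zero probability under this belief")
    return [u / total for u in unnormalized]
```

The inner sum runs over all states for every candidate next state, so one update costs on the order of |S|² operations. This is the cost that grows quickly with the number of states in a general POMDP.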

The reward function is defined by Equation 17.

(Equation 17)

Here, V* denotes the value function of a policy π. If V* is greater than or equal to the value function of every other policy that can be defined, the policy is called the optimal policy and is denoted π*.

Consequently, solving the MDP amounts to finding the optimal policy π*.

The optimal policy is defined by the Bellman equation in Equation 18.

(Equation 18)

Here, Q(b, a) is the action-value function, and the optimal action-value function is Q*(b, a).

The optimal action-value function is obtained by Equation 19.

(Equation 19)

The observation distribution function here is Ω, the probability distribution of observations given the actual state.

In the CA-POMDP, the state space is defined by the system control space, which depends on the context space. Here, a context can be any information that affects the system performance of the eye tracking module 100. The image context can be the direction, brightness, contrast, and spectral composition of the illumination.

The action set A represents the adaptation of the system control parameters in the state space. The observation set O represents the tracking performance and the image quality state in which the object appears.

The agent cannot directly know the current state of the environment and can only obtain observations. The agent therefore maintains a probability distribution over the current environment state based on all actions taken so far and the corresponding observations. This distribution is called the belief state, and the space of belief states is called the belief space B. The agent thus handles uncertainty on the basis of the belief space.

In a general POMDP, however, the belief space B grows explosively with the number of states |S|, so a real-time solution cannot be provided.

To avoid this computational difficulty, the probability distributions of state transitions, observations, and rewards must be combined with a limited set of world context models.

The world context models represent domain knowledge about the uncertainty of eye tracking, and this domain knowledge is used to guide the search over the time horizon.

Unlike a general POMDP, the state is observed within a single model, and transitions can be concentrated on local states that depend on the current world context model.

Because of this additional simplification, the proposed CA-POMDP loses some accuracy, but it clearly gains the computational efficiency needed to satisfy the real-time constraint.

The CA-POMDP is defined by the eight elements <S, A, O, T, Ω, R, γ, Φ>.

Here, Φ denotes the context space, which may include the direction of illumination, brightness, viewing angle, and scale that affect the eye image.

The context space represents image quality measured in consideration of the environment in which the image was acquired. For example, the context space includes several variables: Φ₁ is the direction of illumination, Φ₂ is the brightness, Φ₃ is the viewing angle, and Φ₄ is the scale of the object.

These elements are measured as discrete values, and a context state φ is an assignment of a value to each variable, that is, φ = {Φ₁ = φ₁, ..., Φ_M = φ_M}.

The state space S is defined by the system control parameters in the eye tracking space and is affected by changes in the environment. Because the exact context is not available, the state space S is treated as partially observable. Because the image quality state cannot be measured directly, it is estimated by the image quality label.

The state space S is the set of system control spaces X that depend on the context space Φ. That is, S = (X | Φ), where X = {X₁, ..., X_L} and Φ = {Φ₁, ..., Φ_M}, so S comprises the random variables of the system control space conditioned on the context space.

Each random variable X_i denotes a threshold or parameter used in eye tracking. An assignment of values to the individual random variables of the system control parameters is written x = {X₁ = x₁, ..., X_L = x_L}.

A state s in the state space S is expressed by Equation 20.

(Equation 20)

Here, s ∈ S, s is a scalar representing the configuration of the controller, and D_i is the discretized domain of the system control parameter X_i.

The control unit 310 can change the system control parameters of the eye tracking apparatus through actions, but it cannot change the image quality in the context space.

Image quality cannot in practice be measured completely; it is available only partially and to a limited extent. In addition, because of the uncertainty in the context space, the state s is only partially observable.

An action a is defined in the action space A. An action a_i is defined for each random variable X_i of the system control space X that depends on the context space Φ, and this relation is expressed by Equation 21.

(Equation 21)

Here, an action a belongs to the action space A, and each component is a scalar representing the controller's action -d_i, 0, or +d_i on the level of X_i, that is, a decrease by d_i units, no change (saturation), or an increase by d_i units.
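The effect of such an action on the discretized parameters can be sketched as an index shift within each domain D_i. Equation 21 is not reproduced in this text, so this is an illustration under assumptions: each parameter is represented by its index in D_i, one step corresponds to d_i, and a step that would leave the domain is clamped at the boundary. The function name `apply_action` is hypothetical.

```python
def apply_action(indices, action, domain_sizes):
    """Apply a per-parameter step of -1, 0, or +1 to the current parameter indices.

    indices:      current index of each parameter X_i within its domain D_i.
    action:       one step per parameter, each in {-1, 0, +1}.
    domain_sizes: number of discrete values in each D_i.
    """
    return tuple(
        max(0, min(size - 1, index + step))  # clamp so the index stays inside D_i
        for index, step, size in zip(indices, action, domain_sizes)
    )
```

For example, with three parameters of five levels each, `apply_action((0, 2, 4), (-1, 1, 1), (5, 5, 5))` leaves the first and third parameters at their boundaries and moves the second one level up.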

In the eye tracking algorithm, the next state is governed by the transition probability function T, which models the uncertainty.

Given the current state s, a context state φ ∈ Φ, and an action a, the function T gives the probability of a transition from the current state to the next state s′. The state transition probability function T is a probability distribution over the next state s′ and is defined as in Equation 22.

(Equation 22)

The observation space O is a distribution space spanning eye tracking performance and image quality, written O = {τ, φ}, where τ is the tracking accuracy and φ is the image quality measurement. An observation o is defined over the observation space using the observation probability function Ω, as in Equation 23.

(Equation 23)

Here, o_τ and o_φ denote the tracking accuracy and the image quality measurement, respectively.

In addition, the reward is given immediately as a result of an action and is expressed by the reward function R, with the reward r written as r = R(s, a). The reward function gives the reward obtained when action a is taken in state s. It is measured on the basis of tracking performance: a high score is awarded when the tracking confidence exceeds a threshold, and a low score otherwise. γ is the discount factor, a real value between 0 and 1.

To overcome the heavy computational demands of the CA-POMDP, the problem is simplified through flexible world context models and online learning for each world context model.

The world context models are precompiled by the world context modeling unit 320 over the joint space of the transition, observation, and reward distributions used in online learning.

Because of insufficient information and the real-time constraint, online learning is formulated as dynamic adaptation of the system control parameters of the eye tracking algorithm and is solved by a real-time Q-learning approach [16].
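A tabular Q-learning loop of the general kind referred to here can be sketched as follows. This is the standard update Q(s, a) ← Q(s, a) + α·(r + γ·max_a′ Q(s′, a′) − Q(s, a)), not the specific real-time variant of reference [16]. The class name `QLearner`, the learning rate `alpha`, the exploration rate `epsilon`, and the state labels in the usage note are assumptions for illustration.

```python
import random
from collections import defaultdict


class QLearner:
    """Minimal tabular Q-learning agent with epsilon-greedy exploration."""

    def __init__(self, actions, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)      # Q values, default 0.0
        self.rng = random.Random(seed)   # seeded for reproducible exploration

    def best_action(self, state):
        """Action with the highest Q value in this state."""
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def choose_action(self, state):
        """Explore with probability epsilon, otherwise act greedily."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return self.best_action(state)

    def update(self, state, action, reward, next_state):
        """Move Q(state, action) toward reward + gamma * max Q(next_state, .)."""
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
        return self.q[(state, action)]
```

In the setting described here, a state would be a parameter setting under the current world context model, an action a parameter adjustment, and the reward the score returned by the eye tracking module. Keeping one Q table per world context model would be one way to keep the table small enough for real-time use.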

정책 학습은 온라인 정책 학습과 오프라인 정책 학습으로 구분된다[10,11,12]. 오프라인 학습은 행위 선택 정책을 결정하기 위하여 막대한 계산을 요구한다. 이것은 실시간 임무에서 때때로 유리하나, 실행시간에 매우 빈번하게 발생하는 새로운 환경에서 적용하기는 어렵다. Policy learning is divided into online policy learning and offline policy learning [10,11,12]. Offline learning requires enormous computation to determine the behavior selection policy. This is sometimes advantageous in real-time tasks, but is difficult to apply in a new environment that occurs very frequently at run time.

온라인 접근은 실시간으로 약간의 계산을 요구하면서 적응성이 뛰어나다. 월드 컨텍스트 모델링부(320)에 의한 월드 컨텍스트 모델에 대하여 오프라인 학습을 그리고 실시간 Q-학습부(330)에 의한 실시간 국부 정책 결정에 대하여 온라인 학습을 결합하며 계산상의 이점이 있다. Online access is highly adaptable, requiring little computation in real time. There is an advantage in calculating the off-line learning for the world context model by the world context modeling unit 320 and for on-line learning for real-time local policy decision by the real-time Q-learning unit 330.

Through the world context modeling unit 320, the CA-POMDP constructs the world context models offline and, during execution, selects the best state with respect to the probability distributions T and R.

Because the world context models are precompiled, with no real-time performance penalty, online learning only needs to explore a limited set of belief states to find a good policy.

Online learning proceeds as the maximum value is computed under the current world context model, and the policy is determined locally.

A world context model is held for several steps, so that enough actions occur over a period longer than a single action. Holding the same world context model over several time intervals has the advantage of keeping the exploration strategy consistent. Online policy computation must balance system accuracy against real-time constraints.

Meanwhile, the CA-POMDP performs offline learning on the world context models (carried out by the world context modeling unit 320) and performs online learning on the states that depend on the current world context model (carried out by the real-time Q-learning unit 330).

Because the cardinality of the sets S, A, O, and R is bounded, the CA-POMDP admits sufficient bounds on transitions, observations, and rewards.

For offline learning, the world context modeling unit 320 precompiles the world context models in the joint space CM = T × Ω × R. A similar approach appears in reference [17], but it does not consider a context space that reflects image quality labels.

FIG. 4 compares a general POMDP with the CA-POMDP of the present invention.

In a general POMDP, actions are governed by a single world context model, whereas in the CA-POMDP the transition behavior is influenced by both the current state and the world context model.

The CA-POMDP consists of multiple world context models, each with a local belief space over the underlying MDP. The CA-POMDP relies on the Markov property under partial observation in its interaction with the environment.

The random variables A, O, and R are associated with the probability distributions T, Ω, and R. O₁, O₂, ..., O_N are random variables of the observation probability distribution Ω; A₀, A₁, ..., A_{N-1} are random variables of the transition probability distribution T; and R₁, R₂, ..., R_N are random variables of the reward probability distribution R.

For the actions A₀=a₀, A₁=a₁, ..., A_{N-1}=a_{N-1}, the observations O₁=o₁, O₂=o₂, ..., O_N=o_N and the rewards R₁=r₁, R₂=r₂, ..., R_N=r_N are obtained, respectively.

The training data collected for a world context model c in the space CM is expressed as h = {(a₀, o₁, r₁), (a₁, o₂, r₂), ..., (a_{n-1}, o_n, r_n)}. The total training data over all world context models is denoted H. The world context model and the state are partially observable, and the joint belief is expressed by Equation 24 below.

(Equation 24)

Here, b_c(s) = b(s｜c,h), which is determined by the belief update. The action is selected, based on the belief b_CM(c｜φ) over the world context models, so as to maximize the expected discounted reward over the local execution horizon.

b_CM(c) denotes the belief over the world context models c and is simplified as in Equation 25 below [17].

(Equation 25)

Here, the delta function in Equation 25 is that of the world context model at the image quality label φ, and w_i denotes the weight associated with c_i. The context state φ is a simplification of the image quality label ψ and is not in closed form. The belief over the world context models constructed by the world context modeling unit 320 is determined by the history of domain actions, observations, and rewards. Since the transition, observation, and reward models are discrete multinomials, the Dirichlet distribution and the Bayesian method are used to learn the distributions [19].
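As a rough illustration of the Dirichlet-based Bayesian learning of a discrete distribution mentioned above, the sketch below keeps counts per outcome and returns the posterior predictive probability. The class name `DirichletMultinomial`, the symmetric prior `alpha`, and the toy outcomes are assumptions made for illustration; the patent does not specify them.

```python
from collections import Counter


class DirichletMultinomial:
    """Dirichlet prior over a discrete (multinomial) distribution.

    Hypothetical helper: `outcomes` and the symmetric prior `alpha`
    are illustrative assumptions, not values given in the patent.
    """

    def __init__(self, outcomes, alpha=1.0):
        self.outcomes = list(outcomes)
        self.alpha = alpha
        self.counts = Counter()

    def update(self, outcome):
        # Bayesian update: observing an outcome adds one to its count.
        self.counts[outcome] += 1

    def predictive(self, outcome):
        # Posterior predictive: (count + alpha) / (total + K * alpha).
        total = sum(self.counts.values())
        k = len(self.outcomes)
        return (self.counts[outcome] + self.alpha) / (total + k * self.alpha)


# Toy observation model for one world context model.
obs_model = DirichletMultinomial(outcomes=["tracked", "lost"], alpha=1.0)
for o in ["tracked", "tracked", "lost"]:
    obs_model.update(o)
p_tracked = obs_model.predictive("tracked")  # (2 + 1) / (3 + 2) = 0.6
```

The same structure could hold one such model per (state, action) pair for transitions and rewards; that arrangement is likewise an assumption.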

For a new random variable (a,o,r), its probability is computed under the current world context model, and the next context model is determined by the MAP clustering rule. The probability under world context model c is given by Equation 26 below.

(Equation 26)

Here, p((a,o,r)｜c,h) denotes the conditional density of the current world context model, and the prior probability P(c) is the belief b_CM(c) of the current world context model. h and H denote the training sets of an individual world context model and of all world context models, respectively. The next world context model for the new random variable is determined using the MAP rule, as in Equation 27 below.

(Equation 27)
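The MAP rule described above can be sketched as follows, assuming the posterior over world context models is proportional to the conditional density times the prior belief. The function name `map_next_context`, the dictionary inputs, and the context labels "bright" and "dim" are hypothetical; Equations 26 and 27 themselves are not reproduced here and may differ in detail.

```python
def map_next_context(likelihoods, beliefs):
    """Return (best_context, posterior) under the MAP rule.

    likelihoods[c] stands for p((a, o, r) | c, h) and beliefs[c] for the
    prior b_CM(c); both dictionaries are hypothetical inputs.
    """
    joint = {c: likelihoods[c] * beliefs[c] for c in beliefs}
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("new sample has zero probability under every model")
    posterior = {c: v / evidence for c, v in joint.items()}
    best = max(posterior, key=posterior.get)
    return best, posterior


# Toy numbers: the new sample is better explained by the "dim" model.
likelihoods = {"bright": 0.10, "dim": 0.40}
beliefs = {"bright": 0.70, "dim": 0.30}
next_context, posterior = map_next_context(likelihoods, beliefs)
```

In this toy case the joint scores are 0.07 for "bright" and 0.12 for "dim", so the stronger likelihood outweighs the weaker prior.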

Keeping the same world context model for some time improves the consistency of the exploration strategy. The world context model is changed on the basis of sufficient synchronization of actions, observations, and rewards over a span of several time steps rather than at each individual time step.

Meanwhile, the world context modeling unit 320 models the world context models through offline learning.

The real-time Q-learning unit 330 then computes the optimal policy based on the world context model.

In this way, the CA-POMDP determines the optimal policy based on the current world context model. This lets the control unit 310 consider only the local time horizon of the world context model and concentrate on a small set of belief states through online learning.

Although building the world context models offline removes much of the computational overhead of the CA-POMDP, the time complexity still grows exponentially, namely O((｜A｜｜O｜)^k), where k is the depth of the recursive computation and "O" is big-O notation, so the problem remains intractable. Reducing the search space of the CA-POMDP does not resolve this even with a recent online approach, because a satisfactory search depth cannot be reached. The real-time constraints of eye tracking call for a simple and fast algorithm; real-time Q-learning performs dynamic programming asynchronously with incomplete information rather than solving directly in the belief state.
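To give a feel for the exponential growth stated above, the short sketch below evaluates (｜A｜｜O｜)^k for a few depths. The sizes of 3 actions and 4 observations are arbitrary illustrative values, not figures from the patent.

```python
def lookahead_leaves(num_actions, num_observations, depth):
    """Leaves of a depth-limited lookahead tree: (|A| * |O|) ** depth."""
    return (num_actions * num_observations) ** depth


# Assumed toy sizes: |A| = 3, |O| = 4, depths k = 1..4.
growth = [lookahead_leaves(3, 4, k) for k in range(1, 5)]
# growth == [12, 144, 1728, 20736]
```

Even at these small sizes, each extra level of depth multiplies the work by 12, which is why a deep search is hard to fit into a per-frame time budget.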

The approximate optimal value function V^* of the CA-POMDP is simplified by assuming that all uncertainty is resolved at the next time step, that is, all uncertainty beyond the current belief is resolved after the next action.

In the QMDP heuristic, at each time step the action with the largest long-term reward over all states, weighted by the belief state, is selected, as expressed by Equation 28 below.

(Equation 28)

Here, V^*_MDP(s) denotes the optimal value function of state s in the underlying MDP, and the belief state b_c is computed using Equation 16.
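A minimal sketch of the QMDP heuristic in its commonly used form, which scores each action by the belief-weighted Q-values of the underlying MDP, is given below. Equation 28 is not reproduced here and may be written differently; the function name `qmdp_action`, the state names, and the Q-values are illustrative assumptions.

```python
def qmdp_action(belief, q_mdp, actions):
    """Pick argmax over a of sum over s of b(s) * Q_MDP(s, a).

    belief maps state -> probability; q_mdp maps (state, action) -> value.
    Both are hypothetical inputs for illustration.
    """
    def score(a):
        return sum(p * q_mdp[(s, a)] for s, p in belief.items())

    return max(actions, key=score)


# Toy example with two threshold states and two adjustment actions.
belief = {"low_threshold": 0.8, "high_threshold": 0.2}
q_mdp = {
    ("low_threshold", "raise"): 1.0,
    ("low_threshold", "lower"): 0.0,
    ("high_threshold", "raise"): 0.0,
    ("high_threshold", "lower"): 2.0,
}
chosen = qmdp_action(belief, q_mdp, ["raise", "lower"])
```

Here "raise" scores 0.8 and "lower" scores 0.4, so the belief-weighted choice follows the more probable state even though "lower" has the larger single Q-value.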

The simplified optimal value is linear in the underlying MDP and can be solved efficiently. The execution time in the underlying MDP for world context model c is defined by Equation 29 below.

(Equation 29)

Here, [expression not reproduced], [expression not reproduced].

For each world context model, the local search space in the underlying MDP reduces solving the MDP to a small time horizon. The underlying MDP still has incomplete information.

The conflict between removing the eye tracking module and tracking actions while achieving optimal performance is a delicate issue. This kind of conflict requires extensive exploration and nontrivial mechanisms.

The MDP addresses this problem using either Bayesian or non-Bayesian methods. Real-time Q-learning is not a direct Bayesian method.

For each world context model, Q-learning by the real-time Q-learning unit 330 maintains a Q-table and, instead of an evaluation function, backs up estimates of the Q-values, that is, the optimal action function values, for each admissible state-action pair.

The Q-table of world context model c is initialized from the collected training data. Q^c_k(s_i, a_i) denotes the estimate of Q^{c*}(s_i, a_i) at stage k, that is, [expression not reproduced].

At each stage k = 0, 1, ..., n, let the set of feasible state-action pairs whose Q-values are updated at stage k be given [symbol not reproduced]. For (s_i, a_i) in this set, the Q-value is backed up at stage k+1 according to Equation 30 below.

(Equation 30)

Here, s_i' is the next state, that is, s_i' = T(s_i, a_i); α(s_i, a_i) is the learning rate parameter for the current state-action pair; and r_i is the reward function. The optimal policy can thus be determined without constructing an explicit model of the underlying MDP.

In real-time application, the control unit 310 observes the state s_t at each time step t and computes the optimal Q-value at every subsequent time step after the initial state.

The control unit 310 selects an action a_t ∈ A_c(s_t), executes it, and receives an immediate reward when the state of the eye tracking module 100 transitions to the next state.

The Q-value of world context model c is then backed up at time t+1 according to Equation 31 below.

(Equation 31)

Here, α(s_t, a_t) is the learning rate parameter for the current state-action pair at time step t, and r_t is the reward function.
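The backup described above can be sketched with the textbook tabular Q-learning update, keeping one Q-table per world context model. Equation 31 is not reproduced here, so the exact form is an assumption, as are the discount factor `gamma`, the function name `q_backup`, and the action and context names.

```python
from collections import defaultdict


def q_backup(q_table, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning backup for the pair (s, a), done in place.

    Assumed textbook form:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)).
    """
    best_next = max(q_table[(s_next, a2)] for a2 in actions)
    target = r + gamma * best_next
    q_table[(s, a)] += alpha * (target - q_table[(s, a)])
    return q_table[(s, a)]


# One Q-table per world context model c (context name "dim" is hypothetical).
q_tables = defaultdict(lambda: defaultdict(float))
actions = ["raise_threshold", "lower_threshold"]
new_q = q_backup(q_tables["dim"], s=0, a="raise_threshold", r=1.0,
                 s_next=1, actions=actions, alpha=0.5, gamma=0.9)
```

With an all-zero table, a reward of 1.0 and alpha = 0.5, the updated entry is 0.5; only the single visited state-action pair changes.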

The real-time Q-learning performed by unit 330 is a special case of offline Q-learning in which the set of state-action pairs whose Q-values are backed up at each step t is (s_t, a_t).

The real-time Q-learning of unit 330 converges to Q^* under the conditions required for offline Q-learning to converge.

Watkins [47] showed that Q-learning converges to the optimal policy provided that every feasible action is backed up in each state over an infinite number of time steps and the learning rate α is gradually decreased over the time steps t.

The real-time Q-learning unit 330 is advantageous because, in real-time applications, its backups require less computation than asynchronous DP backups.

If there are n states and the largest number of actions allowed in any state is m, a Q-learning backup costs O(m), which is less than the O(mn) of a DP backup.
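The cost comparison above can be made concrete by counting the inner-loop operations of each kind of backup. The sketch assumes a Q-learning backup takes a maximum over at most m action values, while a full DP backup of one state computes, for each of m actions, an expectation over n successor states; the function names and toy sizes are illustrative.

```python
def q_learning_backup_ops(next_action_values):
    """Max over the next state's action values: at most m operations."""
    ops = 0
    best = float("-inf")
    for value in next_action_values:
        best = max(best, value)
        ops += 1
    return best, ops


def dp_backup_ops(transition, reward, values, gamma=0.9):
    """Full DP backup of one state; transition[a][s2] is P(s2 | s, a)."""
    ops = 0
    best = float("-inf")
    for a, probs in enumerate(transition):      # m actions
        expected = 0.0
        for s2, p in enumerate(probs):          # n successor states
            expected += p * values[s2]
            ops += 1
        best = max(best, reward[a] + gamma * expected)
    return best, ops


n, m = 4, 3  # assumed toy sizes
_, q_ops = q_learning_backup_ops([0.0] * m)
uniform = [[1.0 / n] * n for _ in range(m)]
_, dp_ops = dp_backup_ops(uniform, [0.0] * m, [0.0] * n)
```

For these sizes the Q-learning backup touches 3 values and the DP backup 12, matching the m versus mn scaling.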

In the present invention, additional computation is required when the local policy is not optimal, that is, when the criterion for successful eye tracking is not met. In that case, the next feasible world context model is tried and the time slice is terminated.

The accuracy of eye tracking and the real-time requirements can be balanced by adjusting the length of the time slice.

The operation of this eye tracking apparatus using the CA-POMDP for robust human computer interaction is described below with reference to the flowchart of FIG. 5.

The control unit 310 of the CA-POMDP module 300 initializes the state s for each world context model (S100).

The image quality labeling unit 210 of the image quality evaluation module 200 then acquires the first image, computes its image quality label, and provides the label to the control unit 310 and the world context modeling unit 320 of the CA-POMDP module 300 (S200).

Meanwhile, the control unit 310 of the CA-POMDP module 300 queries the real-time Q-learning unit 330 for the world context model c (S300).

Thereafter, the control unit 310 initializes the time slice (S400).

The real-time Q-learning unit 330 then selects an action a from the current state s using a greedy policy (S500).

Because the greedy policy does not allow actions that currently have low values to be learned, it does not explore untried action sequences, even though these might lead to high values in the future. For this reason, an ε-greedy policy can be used to balance exploration and exploitation.
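A minimal sketch of ε-greedy action selection over a Q-table follows. The function name `epsilon_greedy`, the default ε of 0.1, and the action names are assumptions for illustration; the patent does not give a value for ε.

```python
import random


def epsilon_greedy(q_table, state, actions, epsilon=0.1, rng=random):
    """With probability epsilon pick a random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)          # explore
    # Exploit: unseen state-action pairs default to a value of 0.0.
    return max(actions, key=lambda a: q_table.get((state, a), 0.0))


q_table = {(0, "raise_threshold"): 0.2, (0, "lower_threshold"): 0.7}
actions = ["raise_threshold", "lower_threshold"]
greedy_choice = epsilon_greedy(q_table, 0, actions, epsilon=0.0)
```

Setting ε to 0 recovers the purely greedy policy of step S500, while larger values spend more time steps on untried actions.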

The real-time Q-learning unit 330 takes the action, observes the immediate reward r by computing the internal reward, and observes the new internal state s' (S600).

Next, the real-time Q-learning unit 330 updates the Q-table of world context model c using Equation 31 (S700).

The real-time Q-learning unit 330 then changes the state to the next state (S800). If the time slice has ended or eye tracking satisfies the success condition (S900), the process returns to step S200; otherwise, it returns to step S500.
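Steps S100 to S900 can be outlined as a loop, sketched below under several assumptions: the callables `label_of`, `context_of`, and `step` are hypothetical stand-ins for the image quality labeling unit, the world context lookup, and the eye tracking module, and the update uses the textbook Q-learning form with an assumed discount factor `gamma`. The toy environment at the end is invented purely to make the sketch runnable.

```python
import random
from collections import defaultdict


def run_tracker(frames, label_of, context_of, step, actions,
                slice_len=5, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Sketch of S100-S900 with hypothetical callables:
    label_of(frame) -> image quality label (S200),
    context_of(label) -> world context model id c (S300),
    step(frame, s, a) -> (reward, next_state, success) (S600).
    """
    rng = random.Random(seed)
    q = defaultdict(lambda: defaultdict(float))   # one Q-table per context
    state = defaultdict(int)                      # S100: state per context
    for frame in frames:
        c = context_of(label_of(frame))           # S200, S300
        for _ in range(slice_len):                # S400: start time slice
            s = state[c]
            if rng.random() < epsilon:            # S500: epsilon-greedy
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda x: q[c][(s, x)])
            r, s_next, success = step(frame, s, a)          # S600
            best = max(q[c][(s_next, x)] for x in actions)
            q[c][(s, a)] += alpha * (r + gamma * best - q[c][(s, a)])  # S700
            state[c] = s_next                     # S800
            if success:                           # S900: back to S200
                break
    return q, dict(state)


def toy_step(frame, s, a):
    """Invented environment: threshold index 0..4, tracking succeeds at 3."""
    s_next = min(4, max(0, s + a))
    hit = s_next == 3
    return (1.0 if hit else 0.0), s_next, hit


q, final_state = run_tracker(frames=range(50), label_of=lambda f: "dim",
                             context_of=lambda label: label,
                             step=toy_step, actions=[1, -1])
```

The `slice_len` argument plays the role of the time slice length, which the description identifies as the knob for trading tracking accuracy against real-time cost.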

According to the present invention as described above, performance can be ensured by optimizing the system control parameters with the CA-POMDP, instead of relying on expensive image capture equipment or highly constrained conditions.

The foregoing description merely illustrates the technical idea of the present invention, and those of ordinary skill in the art to which the present invention pertains may make various modifications and variations without departing from its essential characteristics. Therefore, the embodiments described in the present invention are intended to explain, not to limit, the technical idea of the present invention, and the technical idea of the present invention is not limited by these embodiments. The scope of protection of the present invention should be construed according to the following claims, and all technical ideas within a scope equivalent thereto should be construed as falling within the scope of rights of the present invention.

100: eye tracking module 200: image quality evaluation module
210: image quality labeling unit 220: image quality learning unit
300: CA-POMDP module 310: control unit
320: world context modeling unit 330: real-time Q-learning unit

Claims

An eye tracking apparatus comprising: an eye tracking module that, when a face image is input, extracts the eye region, binarizes the eye image with a binarization threshold, tracks the pupil position in the binarized eye image to perform eye tracking, and outputs a reward value for an adjustment action on the system control parameters;
an image quality evaluation module that receives the face image and evaluates its image quality to provide an image quality label; and
a CA-POMDP module that defines the system control parameters as states, constructs world context models as combinations of various objects for each image quality label, determines the current state based on information in the world context model, expressed by attributes and attribute values of various objects, according to the image quality label provided by the image quality evaluation module, performs an executable system control parameter adjustment action based on the determined current state, updates the action function value based on the reward value of the system control parameter adjustment action, and selects the optimal system control parameter adjustment action according to the updated action function value.

The eye tracking apparatus according to claim 1,
wherein the system control parameters comprise a binarization threshold, an angle parameter of the partial Hough transform, and a noise parameter of the Kalman filter.

The eye tracking apparatus according to claim 1,
wherein the image quality evaluation module comprises:
an image quality learning unit that learns image quality indices from image quality index prototypes using collected training images; and
an image quality labeling unit that extracts a region of interest, divides the image into rectangular patches, loads the prototypes of the image quality learning unit, extracts the pixels of a scanning window, and computes the image quality label.

The eye tracking apparatus according to claim 3,
wherein, in computing the image quality label, the image quality labeling unit computes a simplified image quality label that takes illumination changes into account.

The eye tracking apparatus according to claim 1,
wherein the CA-POMDP module comprises:
a control unit that defines the system control parameters as states, determines the current state of each state variable based on information in the world context model, expressed by attributes and attribute values of various objects, and performs an executable action based on the updated action function value;
a world context modeling unit that constructs world context models as combinations of various objects for each image quality label; and
a real-time Q-learning unit that updates the action function value based on the reward value of the action taken in the current state and stores the updated action function value.

The eye tracking apparatus according to claim 1,
wherein the real-time Q-learning unit takes an action, observes the immediate reward by computing an internal reward, observes the new internal state, and updates the action function value of the world context model.