KR20230059029A - Electronic device and operating method for the same

Info

Publication number: KR20230059029A
Application number: KR1020210143035A
Authority: KR
Inventors: 데벤드라 아가르왈; 이수명
Original assignee: 삼성전자주식회사
Priority date: 2021-10-25
Filing date: 2021-10-25
Publication date: 2023-05-03
Also published as: WO2023075118A1

Abstract

An electronic device is disclosed that includes a memory storing one or more instructions and a processor executing the one or more instructions stored in the memory, wherein the processor may: receive a user voice input based on an execution screen of an application; obtain screen information about the execution screen by analyzing the execution screen based on information about the application; obtain application state information by inputting the obtained screen information to a neural network; and perform an operation corresponding to the user voice input based on the application state information. The state of the application can thus be identified by analyzing its execution screen.

Description

Electronic device and operating method thereof

Various embodiments relate to an electronic device and an operating method thereof, and more particularly, to an electronic device that identifies the state of a running application using a neural network, and an operating method thereof.

An artificial intelligence (AI) system is a computer system that implements human-level intelligence: the machine learns and makes judgments on its own, and its recognition rate improves the more it is used. AI technology consists of machine learning (deep learning) technology, which uses algorithms that classify and learn the features of input data on their own, and element technologies that use machine learning algorithms to mimic functions of the human brain such as cognition and judgment.

The element technologies may include, for example, at least one of: linguistic understanding technology, which recognizes human language and characters; visual understanding technology, which recognizes objects as human vision does; inference/prediction technology, which judges information in order to infer and predict logically; knowledge representation technology, which processes human experience information into knowledge data; and motion control technology, which controls the autonomous driving of vehicles and the movement of robots. Linguistic understanding is a technology for recognizing and applying/processing human language and characters, and includes natural language processing, machine translation, dialogue systems, question answering, and speech recognition/synthesis.

Speech recognition refers to the process of converting an audio signal obtained through a sound sensor, such as a microphone, into text data such as words or sentences. When speech recognition is used to perform a function or operation of an application, the function or operation corresponding to a user voice input is determined by the state of the application, so the state of the application needs to be identified.

One way to identify the state of an application is to analyze the application screen itself using an image classification network. Another is to perform image processing on the entire application screen and compare the result with data that describes every possible state of the application in detail. However, these methods require a large amount of memory, and the image processing takes a long time.

The disclosed embodiments may provide an electronic device capable of identifying the state of an application by analyzing an execution screen of the application, and an operating method thereof.

An electronic device according to an embodiment includes a memory that stores one or more instructions and a processor that executes the one or more instructions stored in the memory, wherein the processor may receive a user voice input based on an execution screen of an application, obtain screen information about the execution screen by analyzing the execution screen based on information about the application, obtain application state information by inputting the obtained screen information to a neural network, and perform an operation corresponding to the user voice input based on the application state information.

The information about the application according to an embodiment may include at least one of the types of one or more items included in the application, size information of the items, location information of the items, and pixel value information of the items according to whether they are highlighted.

The screen information according to an embodiment may include at least one of information about bounding boxes included in the application execution screen, whether the bounding boxes are highlighted, and whether items included in the application are included in the execution screen.

The application state information according to an embodiment may include information about the items included in the execution screen and information about the selected item among those items.

The processor according to an embodiment may perform the operation corresponding to the user voice input when an item corresponding to the user voice input is included in the execution screen.

The processor according to an embodiment may perform the operation corresponding to the user voice input based on the positional relationship, on the execution screen, between the selected item and the item corresponding to the user voice input.

The electronic device according to an embodiment may further include a display, and the processor may control the display to display the execution screen.

The electronic device according to an embodiment may further include a microphone that receives the user voice input.

The electronic device according to an embodiment may further include a communication unit that receives the user voice input.

An operating method of an electronic device according to an embodiment may include: receiving a user voice input based on an execution screen of an application; obtaining screen information about the execution screen by analyzing the execution screen based on information about the application; obtaining application state information by inputting the screen information to a neural network; and performing an operation corresponding to the user voice input based on the application state information.

An electronic device according to an embodiment may obtain screen information by analyzing an execution screen of an application based on application information, and may identify the state of the application using the obtained screen information and a pre-trained neural network. Accordingly, the amount of memory used to identify the state of the application can be reduced, and processing can be faster.

FIG. 1 is a diagram illustrating an electronic device according to an embodiment.
FIG. 2 is a flowchart illustrating an operating method of an electronic device according to an embodiment.
FIG. 3 is a diagram for explaining an operation in which an electronic device according to an embodiment obtains application state information.
FIG. 4 is a diagram for explaining an operation in which a screen analyzer according to an embodiment analyzes an application execution screen to obtain screen information.
FIG. 5 is a diagram for explaining an operation in which an application state determination unit according to an embodiment determines the state of an application based on screen information.
FIG. 6 is a diagram for explaining a method of training a neural network according to an embodiment.
FIGS. 7A and 7B are diagrams for explaining an operation in which an electronic device according to an embodiment performs an operation corresponding to a user voice input based on application state information.
FIG. 8 is a flowchart illustrating a method by which an electronic device according to an embodiment performs an operation corresponding to a user voice input.
FIG. 9 is a diagram for explaining a method by which an electronic device according to an embodiment performs an operation corresponding to a user voice input based on application state information.
FIG. 10 is a diagram illustrating a voice recognition system according to an embodiment.
FIG. 11 is a block diagram illustrating a configuration of an electronic device according to an embodiment.
FIG. 12 is a block diagram illustrating a configuration of an electronic device according to another embodiment.

The terms used in this specification are briefly described first, and the present invention is then described in detail.

The terms used in the present invention are, as far as possible, general terms that are currently in wide use, selected in consideration of their functions in the present invention; however, they may vary depending on the intention of those skilled in the art, precedent, the emergence of new technologies, and the like. In certain cases, a term has been arbitrarily selected by the applicant, in which case its meaning is described in detail in the corresponding part of the description. Therefore, a term used in the present invention should be defined based on its meaning and the overall content of the present invention, not simply on its name.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. In addition, terms such as "...unit" and "module" used in the specification denote a unit that processes at least one function or operation, and such a unit may be implemented in hardware, in software, or in a combination of hardware and software.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can readily practice them. However, the present invention may be embodied in many different forms and is not limited to the embodiments described herein. To describe the present invention clearly, parts irrelevant to the description are omitted from the drawings, and similar parts are given similar reference numerals throughout the specification.

In the embodiments of this specification, the term "user" means a person who controls a system, function, or operation, and may include a developer, an administrator, or an installer.

In addition, in the embodiments of this specification, an "image" or "picture" may denote a still image, a moving image composed of a plurality of consecutive still images (or frames), or a video.

FIG. 1 is a diagram illustrating an electronic device according to an embodiment.

Referring to FIG. 1, an electronic device 100 according to an embodiment may be an electronic device that receives a user voice input and performs an operation corresponding to the received user voice input. The electronic device 100 may be implemented in various forms, such as a TV, a set-top box, a mobile phone, a tablet PC, a digital camera, a camcorder, a laptop computer, a desktop computer, an e-book reader, a digital broadcasting terminal, a PDA (Personal Digital Assistant), a PMP (Portable Multimedia Player), a navigation device, an MP3 player, or a wearable device.

Although FIG. 1 shows the electronic device 100 as including a display, the disclosure is not limited thereto. The electronic device 100 may be connected by wired or wireless communication to a separate display device that includes a display, and may be configured to transmit video/audio signals to that display device.

In addition, the electronic device 100 may be a stationary electronic device placed at a fixed location or a mobile electronic device having a portable form, and may be a digital broadcast receiver capable of receiving digital broadcasts.

A control device 200 according to an embodiment may be implemented as any of various types of devices for controlling the electronic device 100, such as a remote controller or a mobile phone. An application for controlling the electronic device 100 may be installed on the control device 200, and the control device 200 may control the electronic device 100 using the installed application. The control device 200 may control the electronic device 100 using IR (Infrared), BT (Bluetooth), Wi-Fi, or the like.

Meanwhile, a user may speak to the electronic device 100 or the control device 200, and the utterance may include natural language that causes the electronic device 100 to perform a specific function (for example, controlling the operation of hardware/software components included in the electronic device 100, or searching for content).

The electronic device 100 according to an embodiment may convert the user's utterance (an analog voice signal) into a digital audio signal (audio data) using a built-in or external audio input module (for example, a microphone).

Alternatively, a voice input application or a voice recognition application may be installed on the control device 200, and the user may use that application to provide voice input to the control device 200. The user may speak to the control device 200, and the control device 200 may convert the user's utterance (an analog voice signal) into a digital audio signal (audio data) using a built-in or external microphone.

The control device 200 may transmit the converted audio data to the electronic device 100, for example using BT (Bluetooth) or Wi-Fi. The electronic device 100 may receive the audio data corresponding to the user's utterance (the user voice input) from the control device 200 through a communication module that includes a BT (Bluetooth) module or a Wi-Fi module.

When a user voice input is received, the electronic device 100 according to an embodiment may perform an operation or function corresponding to the user voice input. To do so, the electronic device 100 needs to identify the state of the currently running application.

For example, as shown in FIG. 1, when the operation corresponding to the user voice input is to select the settings menu, the electronic device 100 may identify the item currently selected on the application execution screen and the item corresponding to the settings menu (the "Settings" item 40), and may generate a key operation based on the positional relationship between the selected item and the "Settings" item 40.

Specifically, when the "Home" item 10 is selected on the current application execution screen displayed on the electronic device 100 and the "Settings" item 40 is the third item to the right of the "Home" item 10, the electronic device 100 may generate a right-key input three times to select the "Settings" item 40. The "Settings" item 40 may thereby be selected. To distinguish the selected item from the other items, the electronic device 100 may display the selected item highlighted, or may display a focus, a cursor, or the like on the selected item. However, the disclosure is not limited thereto, and the selected item may be indicated in various ways.

Likewise, when the "Movies" item 20 is selected on the current application execution screen displayed on the electronic device 100 and the "Settings" item 40 is the second item to the right of the "Movies" item 20, the electronic device 100 may generate a right-key input twice. The "Settings" item 40 may thereby be selected.
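The two cases above can be sketched as a small calculation over the item order. The following Python is only an illustration under stated assumptions: the function name `key_presses`, the key labels "RIGHT" and "LEFT", and the placeholder name "Item 30" for the unnamed item between "Movies" and "Settings" are hypothetical and do not come from the specification.

```python
def key_presses(items, selected, target):
    """Return the directional key inputs that move the focus from
    `selected` to `target` in a single horizontal row of items."""
    offset = items.index(target) - items.index(selected)
    key = "RIGHT" if offset > 0 else "LEFT"
    return [key] * abs(offset)


# Item order assumed from FIG. 1; "Item 30" is a placeholder name.
menu = ["Home", "Movies", "Item 30", "Settings"]

print(key_presses(menu, "Home", "Settings"))    # ['RIGHT', 'RIGHT', 'RIGHT']
print(key_presses(menu, "Movies", "Settings"))  # ['RIGHT', 'RIGHT']
```

The sketch assumes a one-dimensional row; a grid layout would need a two-dimensional offset and up/down keys as well.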

As described above, to perform an operation corresponding to a user voice input, the electronic device 100 needs application state information, which includes information about the items included in the current application execution screen, information about the selected item among those items, and the like.

Hereinafter, a method by which the electronic device 100 according to an embodiment obtains application state information is described in detail with reference to the drawings.

FIG. 2 is a flowchart illustrating an operating method of an electronic device according to an embodiment.

Referring to FIG. 2, the electronic device 100 may receive a user voice input (S210).

For example, when the user's utterance is captured by an audio input module (for example, a microphone) built into or external to the electronic device 100, the electronic device 100 may receive the user voice input from the audio input module. Alternatively, when the user's utterance is captured by a microphone built into or external to an external device such as the control device 200, the electronic device 100 may receive the user voice input from the external device through a communication module (communication unit).

The electronic device 100 may obtain screen information by analyzing the application execution screen (S220).

The electronic device 100 according to an embodiment may obtain screen information about the execution screen by analyzing the execution screen based on information about the application. The information about the application may include at least one of the types of one or more items included in the application, size information of the items, location information of the items, and pixel value information of the items according to whether they are highlighted.

Based on the information about the application, the electronic device 100 may detect the bounding boxes included in the current execution screen and obtain screen information that includes item information corresponding to each detected bounding box, whether each detected bounding box is a highlighted item, and the like.

The electronic device 100 may obtain application state information by inputting the screen information to a neural network (S230).

The neural network according to an embodiment may be trained using a plurality of training data items, each including screen information about an application and the application state information corresponding to that screen information.

The electronic device 100 may input the obtained screen information to the neural network and obtain application state information as the output. The application state information according to an embodiment may include information about the items included in the current application execution screen, information about the selected item among those items, and the like.

The electronic device 100 according to an embodiment may perform an operation corresponding to the received user voice input based on the application state information (S240).

For example, the electronic device 100 may determine and perform the operation corresponding to the user voice input based on whether the current application execution screen includes an item corresponding to the user voice input, the positional relationship between the item selected on the execution screen and the item corresponding to the user voice input, and the like.
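The flow of steps S210 to S240 can be sketched as follows. This is a minimal illustration, not the claimed implementation: `AppState`, `handle_voice_input`, `fake_analyzer`, and `fake_model` are hypothetical names, the screen analyzer and the trained neural network are replaced by trivial stand-in functions, and the operation is reduced to a signed focus offset along a single row.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class AppState:
    items: List[str]          # items shown on the execution screen
    selected: Optional[str]   # currently selected (highlighted) item


def handle_voice_input(target: str, screen, app_info,
                       analyze_screen: Callable,
                       state_model: Callable) -> Optional[int]:
    """Return the signed focus offset to `target`, or None if no action applies.

    `target` is the item named by the already received voice input (S210).
    """
    screen_info = analyze_screen(screen, app_info)  # S220: screen -> screen information
    state = state_model(screen_info)                # S230: screen information -> state
    # S240: act only if the requested item is on screen and an item is selected.
    if target not in state.items or state.selected is None:
        return None
    return state.items.index(target) - state.items.index(state.selected)


# Hypothetical stand-ins for the screen analyzer and the trained network.
def fake_analyzer(screen, app_info):
    return {"items": app_info["items"], "highlighted": screen["focus"]}


def fake_model(screen_info):
    return AppState(items=screen_info["items"],
                    selected=screen_info["highlighted"])


app_info = {"items": ["Home", "Movies", "Item 30", "Settings"]}
offset = handle_voice_input("Settings", {"focus": "Movies"}, app_info,
                            fake_analyzer, fake_model)
print(offset)  # 2
```

Passing the analyzer and the model in as callables keeps the sketch independent of any particular screen-capture or inference library.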

FIG. 3 is a diagram for explaining an operation in which an electronic device according to an embodiment obtains application state information.

Referring to FIG. 3, the electronic device 100 according to an embodiment may obtain a current application execution screen 301. Specifically, when the electronic device 100 includes a display and the application execution screen 301 is displayed on that display, the electronic device 100 may obtain the execution screen 301 by capturing the application execution screen 301 being displayed.

Alternatively, when the electronic device 100 does not include a display and transmits execution screen information (video data) to an external display device, the electronic device 100 may obtain the application execution screen 301 based on the execution screen information. The electronic device 100 may also receive, from the external display device, the application execution screen 301 captured by the external display device. However, the disclosure is not limited thereto.

The electronic device 100 according to an embodiment may include a screen analyzer 310. The screen analyzer 310 according to an embodiment may obtain screen information by analyzing the application execution screen 301. The screen analyzer 310 may include suitable logic, circuits, interfaces, and/or code operable to obtain screen information from the application execution screen 301.

The screen analyzer 310 according to an embodiment may analyze the application execution screen 301 using information about the application (application information). The electronic device 100 may hold application information for each of a plurality of applications, and may analyze the application execution screen 301 using the application information corresponding to that execution screen.

For example, the application information may include at least one of the types of one or more items included in the application, size information of the items, location information of the items, and pixel value information of the items according to whether they are highlighted.

Specifically, the application information may indicate that the application includes thumbnail items and menu items, that the thumbnail items have a size of 100 x 200, and that the menu items have a size of 30 x 30. The application information may also indicate that the menu items are arranged horizontally in the upper area of the screen and that the thumbnail items are located in the central area of the screen. The application information may further indicate that items have a grayscale pixel value of 10 or more and that a highlighted item has a grayscale pixel value of 220 or more. In addition, the application information may include a list of the items included in the application. The application information described above is merely an example, and the application information may include various other information depending on the application.
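As an illustration only, the example application information above could be held in a small data structure such as the following Python sketch. The class and field names (`ItemSpec`, `AppInfo`, `kind_for_size`, and so on) are hypothetical, the (width, height) ordering of the sizes is an assumption because the text gives only "100 x 200" and "30 x 30", and the item list is a placeholder.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ItemSpec:
    size: Tuple[int, int]  # size from the text; (width, height) order is assumed
    region: str            # screen area where this kind of item appears


@dataclass(frozen=True)
class AppInfo:
    item_specs: Dict[str, ItemSpec]
    item_list: Tuple[str, ...]     # items the application can contain
    item_min_gray: int = 10        # grayscale value of any item is at least this
    highlight_min_gray: int = 220  # grayscale value of a highlighted item is at least this


APP_INFO = AppInfo(
    item_specs={
        "thumbnail": ItemSpec(size=(100, 200), region="center"),
        "menu": ItemSpec(size=(30, 30), region="top"),
    },
    item_list=("Home", "Movies", "Settings"),  # placeholder list
)


def kind_for_size(info: AppInfo, size: Tuple[int, int]) -> Optional[str]:
    """Return the item kind whose declared size matches `size`, if any."""
    for kind, spec in info.item_specs.items():
        if spec.size == size:
            return kind
    return None


print(kind_for_size(APP_INFO, (30, 30)))  # menu
```

Keeping one such record per application matches the statement that the device holds application information for each of a plurality of applications.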

화면 분석부(310)가 어플리케이션 정보를 이용하여, 화면 정보를 획득하는 동작에 대해서는 도 4를 참조하여 자세히 설명하기로 한다.An operation of acquiring screen information by the screen analyzer 310 using application information will be described in detail with reference to FIG. 4 .

FIG. 4 is a diagram for explaining an operation in which the screen analyzer analyzes an application execution screen to obtain screen information, according to an embodiment.

Referring to FIG. 4, the screen analyzer 310 may detect bounding boxes included in the execution screen 301 based on the application information. Specifically, the screen analyzer 310 may detect bounding boxes 411, 412, 413, and 414 corresponding to menu items in the upper region of the execution screen 301 based on the location information of the menu items. Also, the screen analyzer 310 may detect bounding boxes 421, 422, 423, and 424 corresponding to thumbnail items in the central region of the execution screen 301 based on the location information of the thumbnail items. In addition, based on the item list included in the application information, the screen analyzer 310 may identify the item corresponding to each of the detected bounding boxes and determine whether each item included in the application is displayed on the execution screen 301 (item state information). Also, based on the pixel values of the detected bounding boxes, the screen analyzer 310 may determine whether each bounding box is highlighted.

이에 따라, 일 실시예에 따른 화면 정보(430)는 하이라이트 박스 정보, 노말 박스 정보, 및 아이템 상태 정보를 포함할 수 있다. 이때, 하이라이트 박스 정보 및 노말 박스 정보는, 바운딩 박스의 위치 좌표, 바운딩 박스의 너비, 바운딩 박스의 높이 값들을 포함할 수 있다.Accordingly, the screen information 430 according to an embodiment may include highlight box information, normal box information, and item state information. In this case, the highlight box information and the normal box information may include location coordinates of the bounding box, width of the bounding box, and height values of the bounding box.
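As a minimal sketch of how such screen information might be assembled, the following assumes that the bounding boxes have already been detected and matched to item names, and that a box counts as highlighted when its mean gray-scale value is 220 or more. The use of the mean, the `BoundingBox` type, and the dictionary keys are assumptions of this sketch, not details given in the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class BoundingBox:
    x: int       # left coordinate
    y: int       # top coordinate
    width: int
    height: int


HIGHLIGHT_MIN_GRAY = 220  # threshold from the example application information


def mean_gray(image: Sequence[Sequence[int]], box: BoundingBox) -> float:
    """Average gray-scale value of the pixels inside the box (0.0 if empty)."""
    rows = image[box.y:box.y + box.height]
    pixels = [p for row in rows for p in row[box.x:box.x + box.width]]
    return sum(pixels) / len(pixels) if pixels else 0.0


def build_screen_info(image, boxes_by_item: Dict[str, BoundingBox],
                      item_list: Sequence[str]) -> dict:
    """Split boxes into highlighted and normal, and record which items are shown."""
    highlight: List[BoundingBox] = []
    normal: List[BoundingBox] = []
    for box in boxes_by_item.values():
        if mean_gray(image, box) >= HIGHLIGHT_MIN_GRAY:
            highlight.append(box)
        else:
            normal.append(box)
    # Item state information: is each item of the application on this screen?
    item_state = {name: name in boxes_by_item for name in item_list}
    return {"highlight_boxes": highlight, "normal_boxes": normal,
            "item_state": item_state}


# Tiny 2 x 4 gray-scale image: bright left half, dark right half.
IMAGE = [[230, 230, 50, 50],
         [230, 230, 50, 50]]
BOXES = {"content 1": BoundingBox(0, 0, 2, 2), "content 2": BoundingBox(2, 0, 2, 2)}
INFO = build_screen_info(IMAGE, BOXES, ["content 1", "content 2", "content 3"])
```

In this toy input the box for "content 1" is classified as highlighted, the box for "content 2" as normal, and "content 3" is recorded as not displayed.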

다시, 도 3을 참조하면, 일 실시예에 따른 전자 장치(100)는 어플리케이션 상태 결정부(320)를 포함할 수 있다. 일 실시예에 따른 어플리케이션 상태 결정부(320)는 화면 분석부(310)에서 획득된 화면 정보에 기초하여, 어플리케이션의 상태를 결정할 수 있다. 어플리케이션 상태 결정부(320)는 화면 정보로부터 어플리케이션의 상태를 결정할 수 있도록 동작되는 적절한 로직, 회로, 인터페이스, 및/또는 코드를 포함할 수 있다.Again, referring to FIG. 3 , the electronic device 100 according to an embodiment may include an application state determining unit 320 . The application state determination unit 320 according to an embodiment may determine the state of the application based on screen information obtained from the screen analysis unit 310 . The application state determination unit 320 may include appropriate logic, circuits, interfaces, and/or codes that operate to determine the state of the application from screen information.

일 실시예에 따른 어플리케이션 상태 결정부(320)는 어플리케이션 실행 화면에 대한 화면 정보와 뉴럴 네트워크를 이용하여, 어플리케이션의 상태를 결정할 수 있다. 이에 대해서는 도 5를 참조하여 자세히 설명하기로 한다.The application state determination unit 320 according to an embodiment may determine the state of the application by using screen information about the application execution screen and a neural network. This will be described in detail with reference to FIG. 5 .

FIG. 5 is a diagram for explaining an operation in which the application state determination unit determines the state of an application based on screen information, according to an embodiment.

The application state determination unit 320 according to an embodiment may include a neural network 520. The neural network 520 may be a neural network that receives screen information and outputs application state information, and may include at least one neural network. The neural network 520 according to an embodiment includes one or more layers that perform operations, and may include a deep neural network (DNN: Deep Neural Network) having a plurality of layers.

For a neural network to accurately output result data corresponding to input data, the neural network must be trained according to its purpose. Here, 'training' may mean inputting various data into the neural network and training it so that the neural network itself discovers or learns a method of analyzing the input data, a method of classifying the input data, and/or a method of extracting, from the input data, the features needed to generate result data. Specifically, through the training process, the neural network may learn from training data (for example, a plurality of different images) and thereby optimize and set the weight values inside the neural network. The neural network with the optimized weight values then learns from input data on its own and outputs the desired result.

예를 들어, 학습(training)을 통하여, 뉴럴 네트워크(520)가 입력된 화면 정보에 기초하여, 어플리케이션 상태 정보를 출력하도록 뉴럴 네트워크(520) 내부의 가중치 값들이 최적화될 수 있다. 이에 따라, 학습(training)이 완료된 뉴럴 네트워크(520)는 화면 정보를 입력 받고, 어플리케이션의 상태 정보를 출력할 수 있다.For example, through training, weight values inside the neural network 520 may be optimized so that the neural network 520 outputs application state information based on input screen information. Accordingly, the neural network 520 on which training is completed may receive screen information and output state information of the application.

일 실시예에 따른 어플리케이션의 상태 정보는, 현재 어플리케이션 실행 화면에 포함되는 아이템들에 대한 정보, 아이템들 중 선택된(하이라이트된) 아이템에 대한 정보 등을 포함할 수 있다.State information of an application according to an embodiment may include information on items included in a current application execution screen, information on a selected (highlighted) item among items, and the like.

FIG. 6 is a diagram for explaining a method of training a neural network according to an embodiment.

일 실시예에 따른 뉴럴 네트워크(610)는 외부 장치에 의해 훈련될 수 있다. 이때, 외부 장치는 일 실시예에 따른 전자 장치(100)와 다른 별도의 장치일 수 있다. 예를 들어, 외부 장치는 훈련 데이터 셋에 기초하여, 뉴럴 네트워크(610)를 훈련시킬 수 있으며, 훈련이 완료된 뉴럴 네트워크(610)에 대한 정보를 전자 장치(100)로 전송할 수 있다. The neural network 610 according to an embodiment may be trained by an external device. In this case, the external device may be a separate device different from the electronic device 100 according to an embodiment. For example, the external device may train the neural network 610 based on a training data set, and may transmit information about the neural network 610 on which training is completed to the electronic device 100 .

외부 장치는 복수의 훈련 데이터들(620)에 기초하여, 뉴럴 네트워크(610)를 훈련시킬 수 있다. 이때, 복수의 훈련 데이터들(620)은, 어플리케이션의 복수의 실행 화면들에 기초하여 생성될 수 있다. 하나의 어플리케이션은 복수의 실행 화면들을 포함할 수 있으며, 복수의 실행 화면들 각각은 어플리케이션의 상태 정보에 대응될 수 있다.The external device may train the neural network 610 based on the plurality of training data 620 . In this case, the plurality of training data 620 may be generated based on the plurality of execution screens of the application. One application may include a plurality of execution screens, and each of the plurality of execution screens may correspond to state information of the application.

For example, the plurality of training data 620 may include first training data that includes first screen information representing a first execution screen of the application and first state information of the application corresponding to the first execution screen, second training data that includes second screen information representing a second execution screen of the application and second state information of the application corresponding to the second execution screen, and so on.

The external device may input the first screen information included in the first training data to the neural network 610 and update the weights included in the neural network 610 in a direction that minimizes the difference between the resulting output data and the first state information included in the first training data. Likewise, the external device may input the second screen information included in the second training data to the neural network 610 and update the weights included in the neural network 610 in a direction that minimizes the difference between the resulting output data and the second state information included in the second training data. The external device may train the neural network 610 by updating its weights in the same manner using the plurality of training data 620. The trained neural network 610, or information about the neural network 610, may be transmitted to the electronic device 100.
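The weight update described above may be sketched, under strong simplifying assumptions, with a single linear layer trained by gradient descent on a squared-error loss. The disclosure does not specify the network architecture, the loss, or how screen information and state information are encoded as numbers; the vector encodings, learning rate, and function names here are hypothetical stand-ins.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Example = Tuple[Vector, Vector]  # (screen information, state information)


def predict(weights: List[Vector], bias: Vector, x: Sequence[float]) -> Vector:
    """Output of a single linear layer: weights @ x + bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def squared_error(output: Sequence[float], target: Sequence[float]) -> float:
    """Difference between the network output and the labelled state."""
    return sum((o - t) ** 2 for o, t in zip(output, target))


def train(data: Sequence[Example], n_in: int, n_out: int,
          lr: float = 0.1, epochs: int = 200) -> Tuple[List[Vector], Vector]:
    """Update weights example by example in the direction that reduces the error."""
    weights = [[0.0] * n_in for _ in range(n_out)]
    bias = [0.0] * n_out
    for _ in range(epochs):
        for x, target in data:
            output = predict(weights, bias, x)
            for i in range(n_out):
                grad = 2.0 * (output[i] - target[i])  # d(error)/d(output[i])
                for j in range(n_in):
                    weights[i][j] -= lr * grad * x[j]
                bias[i] -= lr * grad
    return weights, bias


# Two toy training examples standing in for the first and second training data.
TRAINING_DATA: List[Example] = [([1.0, 0.0], [1.0, 0.0]),
                                ([0.0, 1.0], [0.0, 1.0])]
WEIGHTS, BIAS = train(TRAINING_DATA, n_in=2, n_out=2)
FINAL_LOSS = sum(squared_error(predict(WEIGHTS, BIAS, x), t)
                 for x, t in TRAINING_DATA)
```

A real implementation would more likely use a multi-layer network and a machine learning framework; the loop above only shows the "input, compare with the label, update the weights" cycle.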

FIGS. 7A and 7B are diagrams for explaining an operation in which an electronic device performs an operation corresponding to a user voice input based on application state information, according to an embodiment.

도 7a를 참조하면, 일 실시예에 따른 전자 장치(100)는 전자 장치(100)에 설치된 복수의 어플리케이션들 중 제1 어플리케이션을 실행할 수 있다. 전자 장치(100)는 제1 어플리케이션이 실행되면, 제1 어플리케이션의 제1 실행 화면(710)을 표시할 수 있다. 전자 장치(100)는 제1 어플리케이션의 제1 실행 화면(710)이 표시된 상태에서, 사용자의 발화(예를 들어, "컨텐츠 2 보여줘")에 대응하는 사용자 음성 입력을 수신할 수 있다.Referring to FIG. 7A , the electronic device 100 according to an embodiment may execute a first application among a plurality of applications installed on the electronic device 100 . When the first application is executed, the electronic device 100 may display the first execution screen 710 of the first application. The electronic device 100 may receive a user voice input corresponding to the user's utterance (eg, “Show me content 2”) while the first execution screen 710 of the first application is displayed.

전자 장치(100)는 사용자의 음성 입력에 대응하는 동작을 수행하기 위해, 제1 어플리케이션의 제1 실행 화면(710)에 대응하는 제1 어플리케이션의 상태 정보를 획득할 수 있다. 일 실시예에 따른 전자 장치(100)가 어플리케이션의 상태 정보를 획득하는 동작에 대해서는 도 2 내지 도 6에서 자세히 설명하였으므로, 구체적인 설명은 생략하기로 한다.The electronic device 100 may obtain state information of the first application corresponding to the first execution screen 710 of the first application in order to perform an operation corresponding to the user's voice input. Since the operation of obtaining state information of an application by the electronic device 100 according to an embodiment has been described in detail with reference to FIGS. 2 to 6 , a detailed description thereof will be omitted.

The electronic device 100 may obtain, as state information corresponding to the first execution screen (first state information), information indicating that the item 720 corresponding to content 1 is selected and highlighted on the first execution screen 710, and that the item 730 corresponding to content 2, which is named in the user's utterance, is located second to the right of the item 720 corresponding to content 1.

Based on the first state information, the electronic device 100 may execute content 2, which is named in the user's utterance, by generating a right direction key input twice and selecting the item 730 corresponding to content 2. Accordingly, the electronic device 100 may operate to display the execution screen 740 of content 2.
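The step from state information to key inputs may be sketched as follows, assuming the items are arranged in a single horizontal row. The function name, the key names "RIGHT", "LEFT", and "ENTER", and the optional confirmation key are assumptions of this sketch; the disclosure states only that the right direction key is generated twice.

```python
from typing import List, Sequence


def keys_to_select(row: Sequence[str], selected: str, target: str,
                   confirm: bool = False) -> List[str]:
    """Direction keys that move the highlight from `selected` to `target`."""
    offset = row.index(target) - row.index(selected)
    direction = "RIGHT" if offset > 0 else "LEFT"
    keys = [direction] * abs(offset)
    if confirm:
        keys.append("ENTER")  # hypothetical key that executes the selected item
    return keys


# Content 2 sits second to the right of the highlighted content 1.
ROW = ["content 1", "content 3", "content 2"]
print(keys_to_select(ROW, "content 1", "content 2"))  # ['RIGHT', 'RIGHT']
```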

한편, 도 7b를 참조하면, 일 실시예에 따른 전자 장치(100)는 복수의 어플리케이션들 중 제1 어플리케이션을 실행할 수 있다. 전자 장치(100)는 제1 어플리케이션이 실행되면, 제1 어플리케이션의 제2 실행 화면(750)을 표시할 수 있다.Meanwhile, referring to FIG. 7B , the electronic device 100 according to an embodiment may execute a first application among a plurality of applications. When the first application is executed, the electronic device 100 may display the second execution screen 750 of the first application.

전자 장치(100)는 제1 어플리케이션의 제2 실행 화면(750)이 표시된 상태에서, 사용자의 발화(예를 들어, "컨텐츠 2 보여줘")에 대응하는 사용자 음성 입력을 수신할 수 있다.The electronic device 100 may receive a user voice input corresponding to the user's utterance (eg, “show content 2”) in a state where the second execution screen 750 of the first application is displayed.

전자 장치(100)는 사용자의 음성 입력에 대응하는 동작을 수행하기 위해, 제1 어플리케이션의 제2 실행 화면(750)에 대응하는 제1 어플리케이션의 제2 상태 정보를 획득할 수 있다. 이때, 제2 상태 정보는, 제2 실행 화면은 컨텐츠 2에 대응하는 아이템을 포함하지 않으며, 검색 메뉴에 대응하는 아이템(760)을 포함한다는 정보를 포함할 수 있다.The electronic device 100 may obtain second state information of the first application corresponding to the second execution screen 750 of the first application in order to perform an operation corresponding to the user's voice input. In this case, the second status information may include information indicating that the second execution screen does not include an item corresponding to content 2 and includes an item 760 corresponding to the search menu.

전자 장치(100)는 제2 상태 정보에 기초하여, 검색 메뉴를 선택하여, 사용자의 발화에 포함된 컨텐츠 2를 검색하고, 컨텐츠 2에 대한 검색 결과(770)를 디스플레이하도록 동작할 수 있다.Based on the second state information, the electronic device 100 may operate to select a search menu, search for content 2 included in the user's utterance, and display a search result 770 for content 2.
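The difference between the two cases (requested item on screen versus not on screen) may be sketched as a small decision function. The dictionary layout of the state information, the item name "search", and the returned action labels are hypothetical and chosen only for this illustration.

```python
from typing import Dict, List, Tuple


def plan_action(state: Dict[str, List[str]], requested: str) -> Tuple[str, str]:
    """Choose how to reach the requested content from the current screen state."""
    items = state["items"]
    if requested in items:
        # The item is on screen: move the highlight to it and select it.
        return ("navigate", requested)
    if "search" in items:
        # The item is not on screen: use the search menu instead.
        return ("search", requested)
    return ("unsupported", requested)


FIRST_STATE = {"items": ["content 1", "content 3", "content 2"]}
SECOND_STATE = {"items": ["search", "content 5", "content 6"]}
```

With these toy states, a request for "content 2" yields a navigation action on the first screen and a search action on the second.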

FIG. 8 is a flowchart illustrating a method by which an electronic device performs an operation corresponding to a user voice input, according to an embodiment.

도 8을 참조하면, 일 실시예에 따른 전자 장치는 어플리케이션 실행 화면을 표시할 수 있다(S810).Referring to FIG. 8 , the electronic device according to an embodiment may display an application execution screen (S810).

예를 들어, 전자 장치(100)는 홈 화면을 표시할 수 있다. 홈 화면은 복수의 어플리케이션들 각각에 대응하는 아이템을 포함할 수 있다. 전자 장치(100)는 홈 화면에서 제1 어플리케이션에 대응하는 아이템을 선택하는 사용자 입력을 수신하면, 제1 어플리케이션을 실행시킬 수 있다. 전자 장치(100)는 제1 어플리케이션이 실행되면, 제1 어플리케이션의 실행 화면을 표시할 수 있다.For example, the electronic device 100 may display a home screen. The home screen may include items corresponding to each of a plurality of applications. When receiving a user input for selecting an item corresponding to the first application on the home screen, the electronic device 100 may execute the first application. When the first application is executed, the electronic device 100 may display an execution screen of the first application.

전자 장치(100)는 어플리케이션의 실행 화면이 표시되는 상태에서, 사용자의 음성 입력을 수신할 수 있다(S820).The electronic device 100 may receive a user's voice input while an application execution screen is displayed (S820).

전자 장치(100)는 표시된 실행 화면을 분석하여, 화면 정보를 획득할 수 있다(S830).The electronic device 100 may acquire screen information by analyzing the displayed execution screen (S830).

전자 장치(100)가 실행 화면을 분석하여, 화면 정보를 획득하는 방법에 대해서는 도 2 내지 도 4에서 자세히 설명하였으므로 구체적인 설명은 생략하기로 한다.Since the method for acquiring screen information by analyzing the execution screen by the electronic device 100 has been described in detail with reference to FIGS. 2 to 4 , a detailed description thereof will be omitted.

The electronic device 100 may identify the currently selected item on the execution screen of the application by inputting the screen information obtained in step S830 to the neural network (S840).

전자 장치(100)가 화면 정보를 뉴럴 네트워크 입력함으로써, 선택된 아이템에 대한 정보를 포함하는 어플리케이션의 상태 정보를 획득하는 방법에 대해서는 도 5 및 6에서 자세히 설명하였으므로 구체적인 설명은 생략하기로 한다.Since the method for acquiring state information of an application including information on a selected item by the electronic device 100 inputting screen information to a neural network has been described in detail with reference to FIGS. 5 and 6 , a detailed description thereof will be omitted.

전자 장치(100)는 현재 선택된 아이템에 기초하여, 사용자 음성 입력에 대응하는 동작을 수행할 수 있다(S850).The electronic device 100 may perform an operation corresponding to the user's voice input based on the currently selected item (S850).

예를 들어, 전자 장치(100)는 현재 선택된 아이템의 위치와 사용자 음성 입력에 대응하는 아이템의 위치 관계에 기초하여, 사용자 음성 입력에 대응하는 동작을 수행할 수 있다.For example, the electronic device 100 may perform an operation corresponding to a user voice input based on a positional relationship between a location of a currently selected item and an item corresponding to the user voice input.
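Steps S830 to S850 may be sketched as one function that takes the screen analysis and the state inference as injected callables. The two lambdas below are simple stand-ins, not the screen analyzer or the trained neural network of the disclosure, and the screen representation, dictionary keys, and key names are assumptions.

```python
from typing import Callable, Dict, List


def handle_voice_input(screen: List[Dict], requested_item: str,
                       analyze_screen: Callable, infer_state: Callable) -> List[str]:
    """Return the key inputs that move the selection to the requested item."""
    screen_info = analyze_screen(screen)   # S830: obtain screen information
    state = infer_state(screen_info)       # S840: identify the selected item
    items = state["items"]
    # S850: act on the positional relationship between the two items.
    offset = items.index(requested_item) - items.index(state["selected"])
    return ["RIGHT" if offset > 0 else "LEFT"] * abs(offset)


# Stand-ins for the screen analyzer and the trained neural network.
analyze = lambda screen: {"boxes": screen}
infer = lambda info: {
    "items": [b["item"] for b in info["boxes"]],
    "selected": next(b["item"] for b in info["boxes"] if b["highlighted"]),
}

SCREEN = [{"item": "content 1", "highlighted": True},
          {"item": "content 3", "highlighted": False},
          {"item": "content 2", "highlighted": False}]
KEYS = handle_voice_input(SCREEN, "content 2", analyze, infer)
```

Passing the two stages in as parameters keeps the control flow of the flowchart separate from how each stage is implemented.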

FIG. 9 is a diagram for explaining a method by which an electronic device performs an operation corresponding to a user voice input based on application state information, according to an embodiment.

도 9를 참조하면, 일 실시예에 따른 전자 장치(100)는 복수의 어플리케이션들 중 제1 어플리케이션을 실행할 수 있다. 전자 장치(100)는 제1 어플리케이션이 실행되면, 제1 어플리케이션의 제1 실행 화면을 표시할 수 있다. 전자 장치(100)는 제1 어플리케이션의 제1 실행 화면이 표시된 상태에서, 사용자의 발화(예를 들어, "컨텐츠 2 선택해줘")에 대응하는 사용자 음성 입력을 수신할 수 있다.Referring to FIG. 9 , the electronic device 100 according to an embodiment may execute a first application among a plurality of applications. When the first application is executed, the electronic device 100 may display a first execution screen of the first application. The electronic device 100 may receive a user voice input corresponding to the user's speech (eg, "Select content 2") in a state where the first execution screen of the first application is displayed.

전자 장치(100)는 사용자의 음성 입력에 대응하는 동작을 수행하기 위해, 제1 어플리케이션의 제1 실행 화면에 대응하는 제1 어플리케이션의 상태 정보를 획득할 수 있다. 일 실시예에 따른 전자 장치(100)가 어플리케이션의 상태 정보를 획득하는 동작에 대해서는 도 2 내지 도 6에서 자세히 설명하였으므로, 구체적인 설명은 생략하기로 한다.The electronic device 100 may obtain state information of the first application corresponding to the first execution screen of the first application in order to perform an operation corresponding to the user's voice input. Since the operation of obtaining state information of an application by the electronic device 100 according to an embodiment has been described in detail with reference to FIGS. 2 to 6 , a detailed description thereof will be omitted.

The electronic device 100 may obtain, as state information corresponding to the first execution screen (first state information), information indicating that the item 920 corresponding to content 1 is selected and highlighted on the first execution screen 910, and that the item 930 corresponding to content 2, which is named in the user's utterance, is located second to the right of the item 920 corresponding to content 1.

전자 장치(100)는 제1 상태 정보에 기초하여, 사용자의 발화에 포함된 컨텐츠 2를 선택하기 위하여, 오른쪽 방향 키를 2번 발생시켜, 컨텐츠 2에 대응하는 아이템(930)을 선택할 수 있다. 이에 따라, 컨텐츠 2에 대응하는 아이템(930)은 하이라이트될 수 있다. 또는 컨텐츠 2에 대응하는 아이템(930) 상에 포커스가 표시될 수 있다.Based on the first state information, the electronic device 100 may generate the right direction key twice to select the item 930 corresponding to the content 2 in order to select the content 2 included in the user's utterance. Accordingly, the item 930 corresponding to content 2 may be highlighted. Alternatively, the focus may be displayed on the item 930 corresponding to content 2.

FIG. 10 is a diagram illustrating a voice recognition system according to an embodiment.

도 10을 참조하면, 일 실시예에 따른 음성 인식 시스템은, 전자 장치(100) 및 서버(1000)를 포함할 수 있다. 서버(1000)는 전자 장치(100)와 네트워크 또는 근거리 통신을 통하여 연결될 수 있다. 일 실시예에 따른 서버(1000)는 음성 인식 처리를 수행하는 서버일 수 있다. 또한, 도 10에는 서버가 하나인 것으로 도시하였으나, 이에 한정되지 않으며, 복수의 서버들에 의해 음성 인식 처리가 수행될 수 있다.Referring to FIG. 10 , a voice recognition system according to an embodiment may include an electronic device 100 and a server 1000. The server 1000 may be connected to the electronic device 100 through a network or short-range communication. The server 1000 according to an embodiment may be a server that performs voice recognition processing. In addition, although FIG. 10 shows one server, it is not limited thereto, and voice recognition processing may be performed by a plurality of servers.

일 실시예에 따른 전자 장치(100)는 사용자의 발화에 대응하는 사용자 음성 입력을 수신할 수 있다. 전자 장치(100)는 사용자 음성 입력을 수신하여, 사용자 음성 입력에 대한 음성 인식 처리를 수행할 수 있다. 또는, 음성 입력에 대응하는 신호(오디오 신호)를 서버(1000)로 전송할 수 있다. 서버(1000)는 전자 장치(100)로부터 수신한 오디오 데이터에 대한 음성 인식 처리를 수행할 수 있다.The electronic device 100 according to an embodiment may receive a user voice input corresponding to a user's speech. The electronic device 100 may receive a user voice input and perform voice recognition processing on the user voice input. Alternatively, a signal (audio signal) corresponding to the voice input may be transmitted to the server 1000 . The server 1000 may perform voice recognition processing on audio data received from the electronic device 100 .

음성 인식 처리는 오디오 신호에 대응하는 텍스트 데이터를 획득하는 처리일 수 있다. 음성 인식 처리는 스피치 투 텍스트(STT) 처리를 포함할 수 있다. 예를 들어, 음성 인식 처리는 사용자가 발화한 음성 신호를 문자열로 인식하는 처리를 포함할 수 있다. 음성 인식 결과 획득된 텍스트는 자연어 형태의 문장 형태, 워드 형태, 또는 구 형태를 가질 수 있다. 다만, 이에 한정되는 것은 아니다.The speech recognition process may be a process of obtaining text data corresponding to an audio signal. Speech recognition processing may include speech-to-text (STT) processing. For example, the voice recognition process may include a process of recognizing a voice signal uttered by a user as a character string. The text obtained as a result of voice recognition may have a natural language sentence form, word form, or phrase form. However, it is not limited thereto.

서버(1000)는 음성 인식 결과에 기초하여, 특정 동작 또는 기능을 수행할 수 있다. 또는, 서버(1000)는 음성 인식 결과(예를 들어, 서버에서 획득된 텍스트 데이터)를 전자 장치(100) 또는 다른 서버로 전송할 수 있다.The server 1000 may perform a specific operation or function based on the voice recognition result. Alternatively, the server 1000 may transmit a voice recognition result (eg, text data obtained from the server) to the electronic device 100 or another server.

전자 장치(100)는 전자 장치(100)에서 음성 인식 처리를 수행하여 획득한 음성 인식 결과 또는 서버(1000)로부터 수신한 음성 인식 결과에 기초하여, 특정 동작 또는 기능을 수행할 수 있다. 이때, 일 실시예에 따른 전자 장치(100)는 실행 중인 어플리케이션의 상태 정보에 기초하여, 음성 인식 결과에 대응하는 특정 동작 또는 기능을 수행할 수 있다.The electronic device 100 may perform a specific operation or function based on a voice recognition result obtained by performing voice recognition processing in the electronic device 100 or a voice recognition result received from the server 1000 . In this case, the electronic device 100 according to an embodiment may perform a specific operation or function corresponding to a voice recognition result based on state information of an application being executed.

또는, 다른 서버가 음성 인식 결과를 수신하는 경우, 다른 서버는 음성 인식 결과에 기초하여, 특정 기능을 수행하거나, 다른 전자 장치가 특정 기능을 수행하도록 제어할 수 있다.Alternatively, when another server receives a voice recognition result, the other server may perform a specific function based on the voice recognition result or control another electronic device to perform a specific function.

FIG. 11 is a block diagram illustrating a configuration of an electronic device according to an embodiment.

도 11을 참조하면, 일 실시예에 따른 전자 장치(100)는 마이크로폰(110), 프로세서(120), 메모리(130), 통신부(140) 및 디스플레이(150)를 포함할 수 있다.Referring to FIG. 11 , an electronic device 100 according to an embodiment may include a microphone 110, a processor 120, a memory 130, a communication unit 140, and a display 150.

The microphone 110 according to an embodiment may receive a sound signal from an external device or a speaker (a user of the electronic device 100). The microphone 110 according to an embodiment may receive the voice of a user's utterance. The microphone 110 may receive an external sound signal and convert it into an electrical signal (audio data). The microphone 110 may use various noise removal algorithms to remove noise generated while receiving the external sound signal.

일 실시예에 따른 통신부(140)는 Wi-Fi 모듈, 블루투스 모듈, 적외선 통신 모듈 및 무선 통신 모듈, LAN 모듈, 이더넷(Ethernet) 모듈, 유선 통신 모듈 등을 포함할 수 있다. 이때, 각 통신 모듈은 적어도 하나의 하드웨어 칩 형태로 구현될 수 있다.The communication unit 140 according to an embodiment may include a Wi-Fi module, a Bluetooth module, an infrared communication module, a wireless communication module, a LAN module, an Ethernet module, a wired communication module, and the like. At this time, each communication module may be implemented in the form of at least one hardware chip.

The Wi-Fi module and the Bluetooth module communicate using the Wi-Fi scheme and the Bluetooth scheme, respectively. When a Wi-Fi module or a Bluetooth module is used, various connection information such as an SSID and a session key is first transmitted and received, a communication connection is established using that information, and then various information can be transmitted and received. The wireless communication module may include at least one communication chip that performs communication according to various wireless communication standards such as Zigbee, 3G (3rd Generation), 3GPP (3rd Generation Partnership Project), LTE (Long Term Evolution), LTE-A (LTE Advanced), 4G (4th Generation), and 5G (5th Generation).

일 실시예에 따른 통신부(140)는 제어 장치(200)로부터 사용자 음성 입력을 수신할 수 있다.The communication unit 140 according to an embodiment may receive a user voice input from the control device 200 .

The processor 120 according to an embodiment controls the overall operation of the electronic device 100 and the signal flow between internal components of the electronic device 100, and performs a function of processing data.

프로세서(120)는 싱글 코어, 듀얼 코어, 트리플 코어, 쿼드 코어 및 그 배수의 코어를 포함할 수 있다. 또한, 프로세서(120)는 복수의 프로세서를 포함할 수 있다. 예를 들어, 프로세서(120)는 메인 프로세서(main processor, 도시되지 아니함) 및 슬립 모드(sleep mode)에서 동작하는 서브 프로세서(sub processor, 도시되지 아니함)로 구현될 수 있다.The processor 120 may include a single core, a dual core, a triple core, a quad core, and multiple cores thereof. Also, the processor 120 may include a plurality of processors. For example, the processor 120 may be implemented as a main processor (not shown) and a sub processor (not shown) operating in a sleep mode.

Also, the processor 120 may include at least one of a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), and a VPU (Video Processing Unit). Alternatively, depending on the embodiment, the processor 120 may be implemented as an SoC (System on Chip) integrating at least one of a CPU, a GPU, and a VPU.

The memory 130 according to an embodiment may store various data, programs, or applications for driving and controlling the electronic device 100.

또한, 메모리(130)에 저장되는 프로그램은 하나 이상의 인스트럭션들을 포함할 수 있다. 메모리(130)에 저장된 프로그램(하나 이상의 인스트럭션들) 또는 어플리케이션은 프로세서(120)에 의해 실행될 수 있다.Also, a program stored in memory 130 may include one or more instructions. A program (one or more instructions) or application stored in memory 130 may be executed by processor 120 .

일 실시예에 따른 프로세서(120)는 도 3 내지 6에서 설명한 화면 분석부 및 어플리케이션 상태 결정부 중 적어도 하나를 포함할 수 있다.The processor 120 according to an embodiment may include at least one of the screen analysis unit and the application state determination unit described with reference to FIGS. 3 to 6 .

일 실시예에 따른 프로세서(120)는 어플리케이션에 대한 정보에 기초하여, 실행 화면을 분석함으로써, 실행 화면에 대한 화면 정보를 획득할 수 있다. 이때, 어플리케이션에 대한 정보는, 어플리케이션에 포함된 하나 이상의 아이템들의 종류, 아이템들의 크기 정보, 아이템들의 위치 정보, 하이라이트 여부에 따른 아이템들의 픽셀 값 정보 중 적어도 하나를 포함할 수 있다. The processor 120 according to an embodiment may obtain screen information on the execution screen by analyzing the execution screen based on information about the application. In this case, the information about the application may include at least one of types of one or more items included in the application, size information of the items, location information of the items, and pixel value information of the items according to whether or not they are highlighted.

또한, 프로세서(120)는 어플리케이션에 대한 정보에 기초하여, 현재 실행 화면에 포함되는 바운딩 박스들을 검출하고, 검출된 바운딩 박스들 각각에 대응하는 아이템 정보, 검출된 바운딩 박스가 하이라이트된 아이템인지 여부 등을 포함하는 화면 정보를 획득할 수 있다.In addition, the processor 120 detects bounding boxes included in the current execution screen based on information about the application, item information corresponding to each of the detected bounding boxes, whether the detected bounding box is a highlighted item, and the like. Screen information including may be obtained.

프로세서(120)는 화면 정보를 뉴럴 네트워크에 입력함으로써, 어플리케이션 상태 정보를 획득할 수 있다. 일 실시예에 따른 뉴럴 네트워크는 어플리케이션에 대한 화면 정보와 화면 정보에 대응하는 어플리케이션 상태 정보를 포함하는 복수의 훈련 데이터들을 이용하여, 훈련될 수 있다.The processor 120 may obtain application state information by inputting screen information to the neural network. A neural network according to an embodiment may be trained using a plurality of training data including screen information about an application and application state information corresponding to the screen information.

프로세서(120)는 획득한 화면 정보를 뉴럴 네트워크에 입력함으로써, 어플리케이션 상태 정보를 출력으로 획득할 수 있다. 일 실시예에 따른 어플리케이션 상태 정보는, 현재 어플리케이션 실행 화면에 포함되는 아이템들에 대한 정보, 아이템들 중 선택된 아이템에 대한 정보 등을 포함할 수 있다.The processor 120 may obtain application state information as an output by inputting the obtained screen information to the neural network. Application state information according to an embodiment may include information on items included in a current application execution screen, information on a selected item among items, and the like.

프로세서(120)는 어플리케이션 상태 정보에 기초하여, 수신한 사용자 음성 입력에 대응하는 동작을 수행할 수 있다. 예를 들어, 프로세서(120)는 현재 어플리케이션 실행 화면에 사용자 음성 입력에 대응하는 아이템이 포함되는지 여부, 실행 화면에서 선택된 아이템과 사용자 음성 입력에 대응하는 아이템 사이의 위치 관계 등에 기초하여, 사용자 음성 입력에 대응하는 동작을 결정하고, 수행할 수 있다.The processor 120 may perform an operation corresponding to the received user voice input based on application state information. For example, the processor 120 inputs the user's voice based on whether the current application execution screen includes an item corresponding to the user's voice input, the positional relationship between the item selected on the execution screen and the item corresponding to the user's voice input, and the like. An operation corresponding to may be determined and performed.

The display 150 according to an embodiment generates a driving signal by converting an image signal, a data signal, an OSD signal, a control signal, and the like processed by the processor 120. The display 150 may be implemented as a PDP, an LCD, an OLED display, a flexible display, or the like, and may also be implemented as a 3D display. The display 150 may also be configured as a touch screen and used as an input device in addition to an output device.

The display 150 according to an embodiment may display an application execution screen.

FIG. 12 is a block diagram illustrating the configuration of an electronic device according to another embodiment.

Referring to FIG. 12, the electronic device 1200 of FIG. 12 may be an embodiment of the electronic device 100 described with reference to FIGS. 1 to 11.

Referring to FIG. 12, the electronic device 1200 according to an embodiment may include a tuner unit 1240, a processor 1210, a display 1220, a communication unit 1250, a sensing unit 1230, an input/output unit 1270, a video processing unit 1280, an audio processing unit 1285, an audio output unit 1260, a memory 1290, and a power supply unit 1295.

The microphone 1231, the communication unit 1250, the processor 1210, the memory 1290, and the display 1220 of FIG. 12 correspond to the microphone 110, the communication unit 140, the processor 120, the memory 130, and the display 150 of FIG. 11, respectively. Descriptions that repeat what has been given above are therefore omitted.

The tuner unit 1240 according to an embodiment may process a broadcast signal received by wire or wirelessly through amplification, mixing, resonance, and the like, and may tune to and select only the frequency of the channel that the broadcast receiving device 100 intends to receive from among many radio wave components. The broadcast signal includes audio, video, and additional information (for example, an Electronic Program Guide (EPG)).

The tuner unit 1240 may receive broadcast signals from various sources such as terrestrial broadcasting, cable broadcasting, satellite broadcasting, and Internet broadcasting. The tuner unit 1240 may also receive a broadcast signal from a source such as analog broadcasting or digital broadcasting.

The sensing unit 1230 senses a user's voice, a user's image, or a user's interaction, and may include a microphone 1231, a camera unit 1232, and a light receiving unit 1233.

The microphone 1231 receives a voice uttered by the user. The microphone 1231 may convert the received voice into an electrical signal and output it to the processor 1210. The user voice may include, for example, a voice corresponding to a menu or function of the electronic device 1200.

The camera unit 1232 may receive an image (for example, consecutive frames) corresponding to a user's motion, including a gesture, within the camera recognition range. Using the result of recognizing the received motion, the processor 1210 may select a menu displayed on the electronic device 1200 or perform control corresponding to the motion recognition result.

The light receiving unit 1233 receives an optical signal (including a control signal) from an external control device through a light window (not shown) in the bezel of the display 1220 or the like. The light receiving unit 1233 may receive, from the control device, an optical signal corresponding to a user input (for example, a touch, a press, a touch gesture, a voice, or a motion). A control signal may be extracted from the received optical signal under the control of the processor 1210.

Under the control of the processor 1210, the input/output unit 1270 receives video (for example, moving images), audio (for example, voice and music), additional information (for example, an EPG), and the like from outside the electronic device 1200. The input/output interface may include any one of an HDMI (High-Definition Multimedia Interface) port, MHL (Mobile High-Definition Link), USB (Universal Serial Bus), DP (Display Port), Thunderbolt, a VGA (Video Graphics Array) port, an RGB port, D-SUB (D-subminiature), DVI (Digital Visual Interface), a component jack, and a PC port.

The processor 1210 controls the overall operation of the electronic device 1200 and the signal flow between internal components of the electronic device 1200, and processes data. When there is a user input or a preset and stored condition is satisfied, the processor 1210 may execute an operating system (OS) and various applications stored in the memory 1290.

The processor 1210 may include a RAM that stores signals or data input from outside the electronic device 1200 or that is used as a storage area for various tasks performed in the electronic device 1200, a ROM that stores a control program for controlling the electronic device 1200, and a processor.

The video processing unit 1280 processes video data received by the electronic device 1200. The video processing unit 1280 may perform various kinds of image processing on the video data, such as decoding, scaling, noise filtering, frame rate conversion, and resolution conversion.

The audio processing unit 1285 processes audio data. The audio processing unit 1285 may perform various kinds of processing on the audio data, such as decoding, amplification, and noise filtering. The audio processing unit 1285 may include a plurality of audio processing modules to process audio corresponding to a plurality of pieces of content.

Under the control of the processor 1210, the audio output unit 1260 outputs audio included in the broadcast signal received through the tuner unit 1240. The audio output unit 1260 may output audio (for example, voice or sound) input through the communication unit 1250 or the input/output unit 1270. The audio output unit 1260 may also output audio stored in the memory 1290 under the control of the processor 1210. The audio output unit 1260 may include at least one of a speaker, a headphone output terminal, or an S/PDIF (Sony/Philips Digital Interface) output terminal.

Under the control of the processor 1210, the power supply unit 1295 supplies power input from an external power source to the internal components of the electronic device 1200. The power supply unit 1295 may also supply power output from one or more batteries (not shown) located inside the electronic device 1200 to the internal components under the control of the processor 1210.

The memory 1290 may store various data, programs, or applications for driving and controlling the electronic device 1200 under the control of the processor 1210. The memory 1290 may include the following, none of which are shown: a broadcast reception module, a channel control module, a volume control module, a communication control module, a voice recognition module, a motion recognition module, a light reception module, a display control module, an audio control module, an external input control module, a power control module, a power control module for an external device connected wirelessly (for example, via Bluetooth), a voice database (DB), or a motion database (DB). These modules and databases of the memory 1290 may be implemented as software so that the electronic device 1200 can perform a broadcast reception control function, a channel control function, a volume control function, a communication control function, a voice recognition function, a motion recognition function, a light reception control function, a display control function, an audio control function, an external input control function, a power control function, or a power control function for an external device connected wirelessly (for example, via Bluetooth). The processor 1210 may perform each function using the software stored in the memory 1290.

Meanwhile, the block diagrams of the electronic devices 100 and 1200 shown in FIGS. 11 and 12 are block diagrams for one embodiment. Each component of the block diagrams may be integrated, added, or omitted according to the specifications of the electronic device 100 or 1200 as actually implemented. That is, two or more components may be combined into one component, or one component may be subdivided into two or more components, as needed. In addition, the function performed in each block is for describing the embodiments, and its specific operation or device does not limit the scope of the present invention.

The operating method of the electronic device according to an embodiment may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known to and usable by those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

In addition, the operating method of the electronic device according to the disclosed embodiments may be provided as part of a computer program product. The computer program product may be traded between a seller and a buyer as a commodity.

The computer program product may include an S/W program and a computer-readable storage medium in which the S/W program is stored. For example, the computer program product may include a product in the form of an S/W program (for example, a downloadable app) that is distributed electronically through the manufacturer of the electronic device or through an electronic market (for example, Google Play Store or App Store). For electronic distribution, at least part of the S/W program may be stored in a storage medium or generated temporarily. In this case, the storage medium may be a storage medium of the manufacturer's server, of the electronic market's server, or of a relay server that temporarily stores the S/W program.

In a system composed of a server and a client device, the computer program product may include a storage medium of the server or a storage medium of the client device. Alternatively, when there is a third device (for example, a smartphone) communicatively connected to the server or the client device, the computer program product may include a storage medium of the third device. Alternatively, the computer program product may include the S/W program itself, transmitted from the server to the client device or the third device, or from the third device to the client device.

In this case, one of the server, the client device, and the third device may execute the computer program product to perform the method according to the disclosed embodiments. Alternatively, two or more of the server, the client device, and the third device may execute the computer program product to perform the method according to the disclosed embodiments in a distributed manner.

For example, a server (for example, a cloud server or an artificial intelligence server) may execute a computer program product stored in the server and control a client device communicatively connected to the server to perform the method according to the disclosed embodiments.

Although the embodiments have been described in detail above, the scope of the present invention is not limited thereto. Various modifications and improvements made by those skilled in the art using the basic concept of the present invention defined in the following claims also fall within the scope of the present invention.

Claims

An electronic device comprising:
a memory that stores one or more instructions; and
a processor that executes the one or more instructions stored in the memory,
wherein the processor:
receives a user voice input based on an execution screen of an application;
obtains screen information about the execution screen by analyzing the execution screen based on information about the application;
obtains application state information by inputting the obtained screen information to a neural network; and
performs an operation corresponding to the user voice input based on the application state information.

The electronic device of claim 1,
wherein the information about the application
comprises at least one of types of one or more items included in the application, size information of the items, location information of the items, and pixel value information of the items according to whether the items are highlighted.

The electronic device of claim 1,
wherein the screen information
comprises at least one of information about bounding boxes included in the execution screen of the application, whether the bounding boxes are highlighted, and whether items included in the application are included in the execution screen.

The electronic device of claim 1,
wherein the application state information
comprises information about items included in the execution screen and information about an item selected from among the items.

The electronic device of claim 4,
wherein the processor
performs the operation corresponding to the user voice input when an item corresponding to the user voice input is included in the execution screen.

The electronic device of claim 4,
wherein the processor
performs the operation corresponding to the user voice input based on a positional relationship between the selected item on the execution screen and an item corresponding to the user voice input.

The electronic device of claim 1,
wherein the electronic device
further comprises a display, and
the processor controls the display to display the execution screen.

The electronic device of claim 1,
wherein the electronic device
further comprises a microphone that receives the user voice input.

The electronic device of claim 1,
wherein the electronic device
further comprises a communication unit that receives the user voice input.

A method of operating an electronic device, the method comprising:
receiving a user voice input based on an execution screen of an application;
obtaining screen information about the execution screen by analyzing the execution screen based on information about the application;
obtaining application state information by inputting the obtained screen information to a neural network; and
performing an operation corresponding to the user voice input based on the application state information.

The method of claim 10,
wherein the information about the application
comprises at least one of types of one or more items included in the application, size information of the items, location information of the items, and pixel value information of the items according to whether the items are highlighted.

The method of claim 10,
wherein the screen information
comprises at least one of information about bounding boxes included in the execution screen of the application, whether the bounding boxes are highlighted, and whether items included in the application are included in the execution screen.

The method of claim 10,
wherein the application state information
comprises information about items included in the execution screen and information about an item selected from among the items.

The method of claim 13,
wherein the performing of the operation corresponding to the user voice input based on the application state information
comprises performing the operation corresponding to the user voice input when an item corresponding to the user voice input is included in the execution screen.

The method of claim 13,
wherein the performing of the operation corresponding to the user voice input based on the application state information
comprises performing the operation corresponding to the user voice input based on a positional relationship between the selected item on the execution screen and an item corresponding to the user voice input.

The method of claim 10,
wherein the method
further comprises displaying the execution screen.

The method of claim 10,
wherein the receiving of the user voice input
comprises receiving the user voice input through a microphone.

The method of claim 10,
wherein the receiving of the user voice input
comprises receiving the user voice input through a communication unit.

One or more computer-readable recording media storing a program for performing the method of claim 10.