WO2023075118A1 - Electronic device and operating method thereof - Google Patents

Electronic device and operating method thereof

Info

Publication number
WO2023075118A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
electronic device
application
execution screen
screen
Prior art date
Application number
PCT/KR2022/013102
Other languages
English (en)
Korean (ko)
Inventor
이수명
아가르왈데벤드라
Original Assignee
삼성전자 주식회사 (Samsung Electronics Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 삼성전자 주식회사 (Samsung Electronics Co., Ltd.)
Publication of WO2023075118A1

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 - Sound input; Sound output
    • G06F3/167 - Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 - Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481 - Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F3/0482 - Interaction with lists of selectable items, e.g. menus
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 - Sound input; Sound output
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 - Arrangements for executing specific programs
    • G06F9/451 - Execution arrangements for user interfaces
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Definitions

  • Various embodiments relate to an electronic device and an operating method thereof. More particularly, the present invention relates to an electronic device that identifies the state of a running application using a neural network, and an operating method thereof.
  • An artificial intelligence system is a computer system that implements human-level intelligence: a machine that learns and makes judgments on its own, and whose recognition rate improves with use.
  • Artificial intelligence technology consists of machine learning (deep learning) technology, which uses algorithms that classify and learn the characteristics of input data on their own, and elemental technologies that use machine learning algorithms to mimic functions of the human brain such as recognition and judgment.
  • Elemental technologies may include, for example, at least one of: linguistic understanding technology that recognizes human language/characters, visual understanding technology that recognizes objects as human eyes do, reasoning/prediction technology that judges information and logically infers and predicts from it, knowledge representation technology that processes human experience information into knowledge data, and motion control technology that controls the autonomous driving of vehicles and the movement of robots.
  • Linguistic understanding is a technology for recognizing and applying/processing human language/text, and includes natural language processing, machine translation, dialogue systems, question answering, speech recognition/synthesis, and the like.
  • Speech recognition refers to the process of converting an audio signal obtained through a sound sensor, such as a microphone, into text data such as words or sentences. When a function or operation of an application is performed using speech recognition, the function or operation corresponding to a user's voice input depends on the state of the application, so the state of the application needs to be identified.
  • As one method of identifying the state of an application, the application screen itself may be analyzed using an image classification network.
  • Alternatively, the state of the application may be identified by performing image processing on the entire application screen and comparing the result with data that describes every possible state of the application in detail.
  • However, these methods require a large amount of memory, and the image processing takes a long time.
  • Disclosed embodiments may provide an electronic device capable of identifying an application state by analyzing an execution screen of the application, and an operating method thereof.
  • An electronic device according to an embodiment includes a memory that stores one or more instructions, and a processor that executes the one or more instructions stored in the memory. The processor may receive a user voice input while an execution screen of an application is displayed, obtain screen information on the execution screen by analyzing the execution screen based on information about the application, obtain application state information by inputting the obtained screen information to a neural network, and perform an operation corresponding to the user voice input based on the application state information.
  • The information about an application may include at least one of: the types of one or more items included in the application, size information of the items, location information of the items, and pixel value information of the items according to whether they are highlighted.
  • The screen information may include at least one of: information about bounding boxes included in the application execution screen, whether each bounding box is highlighted, and whether the items included in the application appear on the execution screen.
  • Application state information may include information on items included in the execution screen and information on a selected item among the items.
  • the processor may perform an operation corresponding to the user's voice input when an item corresponding to the user's voice input is included in the execution screen.
  • the processor may perform an operation corresponding to the user voice input based on a positional relationship between the selected item on the execution screen and the item corresponding to the user voice input.
  • the electronic device may further include a display, and the processor may control the display to display the execution screen.
  • An electronic device may further include a microphone for receiving the user's voice input.
  • the electronic device may further include a communication unit that receives the user's voice input.
  • An operating method of an electronic device according to an embodiment includes: receiving a user voice input while an execution screen of an application is displayed; obtaining screen information on the execution screen by analyzing the execution screen based on information about the application; obtaining application state information by inputting the screen information to a neural network; and performing an operation corresponding to the user voice input based on the application state information.
  • An electronic device according to an embodiment may obtain screen information by analyzing an execution screen of an application based on application information, and may identify the state of the application using the obtained screen information and a pre-trained neural network. Accordingly, the memory required to identify the state of the application can be reduced, and the processing speed can be increased.
  • FIG. 1 is a diagram illustrating an electronic device according to an exemplary embodiment.
  • FIG. 2 is a flowchart illustrating a method of operating an electronic device according to an exemplary embodiment.
  • FIG. 3 is a diagram for explaining an operation of obtaining application state information by an electronic device according to an exemplary embodiment.
  • FIG. 4 is a diagram for explaining an operation of obtaining screen information by a screen analyzer analyzing an application execution screen according to an exemplary embodiment.
  • FIG. 5 is a diagram for explaining an operation of an application state determination unit determining an application state based on screen information according to an exemplary embodiment.
  • FIG. 6 is a diagram for explaining a method for training a neural network according to an exemplary embodiment.
  • FIGS. 7A and 7B are diagrams for explaining an operation in which an electronic device performs an operation corresponding to a user voice input based on application state information, according to an exemplary embodiment.
  • FIG. 8 is a flowchart illustrating a method of performing an operation corresponding to a user voice input by an electronic device according to an exemplary embodiment.
  • FIG. 9 is a diagram for explaining a method of performing an operation corresponding to a user voice input based on application state information by an electronic device according to an exemplary embodiment.
  • FIG. 10 is a diagram illustrating a voice recognition system according to an exemplary embodiment.
  • FIG. 11 is a block diagram illustrating a configuration of an electronic device according to an exemplary embodiment.
  • FIG. 12 is a block diagram illustrating the configuration of an electronic device according to another embodiment.
  • the term "user” means a person who controls a system, function, or operation, and may include a developer, administrator, or installer.
  • 'image' or 'picture' may indicate a still image, a motion picture composed of a plurality of continuous still images (or frames), or a video.
  • FIG. 1 is a diagram illustrating an electronic device according to an exemplary embodiment.
  • an electronic device 100 may be an electronic device that receives a user's voice input and performs an operation corresponding to the received user's voice input.
  • The electronic device 100 may be implemented in various forms, such as a TV, set-top box, mobile phone, tablet PC, digital camera, camcorder, laptop computer, desktop, e-book reader, digital broadcasting terminal, personal digital assistant (PDA), portable multimedia player (PMP), navigation device, MP3 player, or wearable device.
  • Although the electronic device 100 is illustrated as including a display in FIG. 1, it is not limited thereto.
  • the electronic device 100 may be connected to a separate display device including a display through wired/wireless communication and transmit video/audio signals to the display device.
  • the electronic device 100 may be a fixed electronic device disposed at a fixed location or a mobile electronic device having a portable form, and may be a digital broadcasting receiver capable of receiving digital broadcasting.
  • The control device 200 may be implemented as various types of devices for controlling the electronic device 100, such as a remote controller or a mobile phone.
  • An application for controlling the electronic device 100 may be installed in the control device 200 , and the control device 200 may control the electronic device 100 using the installed application.
  • The control device 200 may control the electronic device 100 using infrared (IR), Bluetooth (BT), Wi-Fi, and the like.
  • The user may speak to the electronic device 100 or the control device 200, and the utterance may include natural language that causes the electronic device 100 to perform a specific function (e.g., operation control of a hardware/software component included in the electronic device 100, or a content search).
  • The electronic device 100 may convert a user's utterance (analog voice signal) into a digital audio signal (audio data) using a built-in or external audio input module (e.g., a microphone).
  • a voice input application or a voice recognition application is installed in the control device 200, and the user can perform a voice input with the control device 200 using the corresponding application.
  • The user may speak to the control device 200, and the control device 200 may convert the user's utterance (analog voice signal) into a digital audio signal (audio data) using a built-in or external microphone.
  • The control device 200 may transmit the converted audio data to the electronic device 100.
  • the control device 200 may transmit audio data to the electronic device 100 using BT (Bluetooth), Wi-Fi, or the like.
  • the electronic device 100 may receive audio data corresponding to a user's speech (user voice input) from the control device 200 through a communication module including a Bluetooth (BT) module or a Wi-Fi module.
  • When a user voice input is received, the electronic device 100 according to an embodiment may perform an operation or function corresponding to the user voice input. For the electronic device 100 to perform the operation or function corresponding to the user voice input, the state of the currently running application needs to be identified.
  • For example, the electronic device 100 may identify the item currently selected on the application execution screen and the item corresponding to the setting menu (the "setting" item 40), and may generate key inputs based on the positional relationship between the selected item and the "setting" item 40.
  • For example, the electronic device 100 may generate a right-key input three times to select the "setting" item 40. Accordingly, the "setting" item 40 may be selected.
  • The electronic device 100 may highlight the selected item, or display a focus or cursor on it, in order to distinguish it from other items. However, it is not limited thereto, and the electronic device 100 may indicate the selected item in various other ways.
  • As another example, the "movie" item 20 may be selected on the current application execution screen displayed on the electronic device 100, with the "setting" item 40 located second to the right of the "movie" item 20.
  • In this case, the electronic device 100 may generate the right-key input twice. Accordingly, the "setting" item 40 may be selected.
  • To do this, the electronic device 100 needs application state information, including information on the items included in the current application execution screen and information on which of those items is selected, as in the sketch below.
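  • The navigation logic above can be illustrated with a short sketch. This is a minimal illustration under assumed item names and layout, not the patent's implementation; the `AppState` record and key names are hypothetical.

```python
# Minimal sketch of directional-key navigation from application state
# information; assumes a single horizontal row of items (as in FIG. 1).
from dataclasses import dataclass

@dataclass
class AppState:
    items: list        # items on the execution screen, left to right
    selected: str      # currently selected (highlighted) item

def keys_to_reach(state, target):
    """Return the directional key inputs that move the selection from the
    currently selected item to the target item."""
    src = state.items.index(state.selected)
    dst = state.items.index(target)
    key = "RIGHT" if dst > src else "LEFT"
    return [key] * abs(dst - src)

# Example mirroring FIG. 1: "movie" is selected and "setting" is the second
# item to its right, so two right-key inputs are generated.
state = AppState(items=["home", "movie", "drama", "setting"], selected="movie")
print(keys_to_reach(state, "setting"))  # ['RIGHT', 'RIGHT']
```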
  • FIG. 2 is a flowchart illustrating a method of operating an electronic device according to an exemplary embodiment.
  • the electronic device 100 may receive a user's voice input (S210).
  • When the electronic device 100 receives a user's utterance through an audio input module (e.g., a microphone) built into or connected to the electronic device 100, the user voice input is received from the audio input module.
  • Alternatively, when the user's utterance is received by an external device, the electronic device 100 receives the user voice input from the external device through a communication module (communication unit).
  • the electronic device 100 may obtain screen information by analyzing the application execution screen (S220).
  • the electronic device 100 may obtain screen information on the execution screen by analyzing the execution screen based on information about the application.
  • The information about the application may include at least one of: the types of one or more items included in the application, size information of the items, location information of the items, and pixel value information of the items according to whether they are highlighted.
  • The electronic device 100 may detect bounding boxes included in the current execution screen based on the information about the application, and may obtain screen information including the item corresponding to each detected bounding box, whether each detected bounding box is highlighted, and the like.
  • the electronic device 100 may acquire application state information by inputting screen information to the neural network (S230).
  • a neural network may be trained using a plurality of training data including screen information about an application and application state information corresponding to the screen information.
  • the electronic device 100 may obtain application state information as an output by inputting the obtained screen information to the neural network.
  • Application state information may include information on items included in a current application execution screen, information on a selected item among items, and the like.
  • the electronic device 100 may perform an operation corresponding to the received user voice input based on application state information (S240).
  • The electronic device 100 may determine and perform the operation corresponding to the user voice input based on whether an item corresponding to the user voice input is included in the current application execution screen, the positional relationship between the item selected on the execution screen and the item corresponding to the user voice input, and the like.
  • FIG. 3 is a diagram for explaining an operation of obtaining application state information by an electronic device according to an exemplary embodiment.
  • The electronic device 100 may obtain the current application execution screen 301. Specifically, when the electronic device 100 includes a display and the application execution screen 301 is displayed on it, the electronic device 100 may obtain the execution screen 301 by capturing the currently displayed screen.
  • When the electronic device 100 does not include a display and transmits execution screen information (video data) to an external display device, the electronic device 100 may obtain the application execution screen 301 based on that execution screen information. Alternatively, the electronic device 100 may receive, from the external display device, the application execution screen 301 captured by that device. However, it is not limited thereto.
  • the electronic device 100 may include a screen analyzer 310 .
  • the screen analyzer 310 may obtain screen information by analyzing the application execution screen 301 .
  • the screen analyzer 310 may include appropriate logic, circuits, interfaces, and/or codes that operate to obtain screen information from the application execution screen 301 .
  • The screen analyzer 310 may analyze the application execution screen 301 using information about the application (application information).
  • the electronic device 100 may include application information for each of a plurality of applications, and may analyze the application execution screen 301 using application information corresponding to the application execution screen 301 .
  • application information may include at least one of types of one or more items included in the application, size information of items, location information of items, and pixel value information of items according to whether or not they are highlighted.
  • For example, the application information may indicate that the application includes thumbnail items and menu items, that the thumbnail items have a size of 100 x 200, and that the menu items have a size of 30 x 30. It may also indicate that menu items are arranged horizontally in the upper region of the screen and that thumbnail items are located in the central region of the screen, and that items have a grayscale pixel value of 10 or more while a highlighted item has a grayscale pixel value of 220 or more. The application information may also include a list of the items included in the application. The application information described above is merely an example, and it may include various types of information depending on the application; a sketch of such a record follows.
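  • As a concrete illustration, such application information could be recorded in a simple structure like the following sketch; the field names and values are illustrative assumptions mirroring the example above, not the patent's schema.

```python
# Hypothetical application-information record for one application; the
# field names and values mirror the example in the text and are assumptions.
APP_INFO = {
    "item_types": ["thumbnail", "menu"],
    "item_size": {"thumbnail": (100, 200), "menu": (30, 30)},   # width x height
    "item_region": {"menu": "top_row", "thumbnail": "center"},  # screen layout
    "pixel_thresholds": {"item_min_gray": 10, "highlight_min_gray": 220},
    "item_list": ["home", "movie", "drama", "setting"],         # items in the app
}
```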
  • FIG. 4 is a diagram for explaining an operation of obtaining screen information by a screen analyzer analyzing an application execution screen according to an exemplary embodiment.
  • The screen analyzer 310 may detect the bounding boxes included in the execution screen 301 based on the application information. Specifically, the screen analyzer 310 may detect the bounding boxes 411, 412, 413, and 414 corresponding to the menu items in the upper region of the execution screen 301 based on the location information of the menu items, and may detect the bounding boxes 421, 422, 423, and 424 corresponding to the thumbnail items in the central area of the execution screen 301 based on the location information of the thumbnail items.
  • The screen analyzer 310 may identify the item corresponding to each detected bounding box based on the item list included in the application information, and may determine whether each item included in the application is displayed on the execution screen 301 (item state information). Also, based on the pixel values of the detected bounding boxes, it may distinguish whether each bounding box is highlighted.
  • the screen information 430 may include highlight box information, normal box information, and item state information.
  • The highlight box information and the normal box information may include the location coordinates, width, and height of each bounding box, as in the sketch below.
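  • A compact sketch of this analysis step, under the assumptions of the APP_INFO example above; bounding-box detection itself (e.g., template matching against the known item sizes and regions) is assumed to have already produced the boxes.

```python
# Minimal screen-analysis sketch: classify detected bounding boxes as
# highlighted or normal using the grayscale threshold from the application
# information, and record which known items appear on the screen.

def analyze_screen(boxes, app_info):
    """boxes: list of dicts such as
    {"item": "movie", "x": 0, "y": 0, "w": 30, "h": 30, "mean_gray": 230}.
    Returns screen information for the state-determination network."""
    threshold = app_info["pixel_thresholds"]["highlight_min_gray"]
    highlight_boxes, normal_boxes = [], []
    for box in boxes:
        geometry = (box["x"], box["y"], box["w"], box["h"])  # location/size
        if box["mean_gray"] >= threshold:
            highlight_boxes.append(geometry)
        else:
            normal_boxes.append(geometry)
    on_screen = {box["item"] for box in boxes}
    item_state = {item: item in on_screen for item in app_info["item_list"]}
    return {
        "highlight_boxes": highlight_boxes,
        "normal_boxes": normal_boxes,
        "item_state": item_state,  # which application items are on the screen
    }
```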
  • the electronic device 100 may include an application state determining unit 320 .
  • the application state determination unit 320 may determine the state of the application based on screen information obtained from the screen analysis unit 310 .
  • the application state determination unit 320 may include appropriate logic, circuits, interfaces, and/or codes that operate to determine the state of the application from screen information.
  • the application state determination unit 320 may determine the state of the application by using screen information about the application execution screen and a neural network. This will be described in detail with reference to FIG. 5 .
  • FIG. 5 is a diagram for explaining an operation of an application state determination unit determining an application state based on screen information according to an exemplary embodiment.
  • the application state determination unit 320 may include a neural network 520 .
  • the neural network 520 may be a neural network that receives screen information and outputs application state information, and may include at least one neural network.
  • the neural network 520 according to an embodiment includes one or more layers that perform calculations and may include a deep neural network (DNN) having a plurality of layers.
  • In order for the neural network to accurately output result data corresponding to input data, the neural network must be trained according to its purpose.
  • Here, 'training' means teaching the neural network so that it can discover or learn on its own how to analyze input data, classify it, and/or extract from it the features needed to generate result data.
  • The neural network may be trained on training data (e.g., a plurality of different images) to optimize the weight values inside the network. A network with optimized weight values then outputs the desired result for given input data.
  • Through training, the weight values inside the neural network 520 may be optimized so that the neural network 520 outputs application state information from input screen information. Accordingly, the trained neural network 520 may receive screen information and output the state information of the application.
  • State information of an application may include information on items included in a current application execution screen, information on a selected (highlighted) item among items, and the like.
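  • For concreteness, a minimal sketch of such a state network in PyTorch, assuming the screen information has been flattened into a fixed-length feature vector and that the state is predicted as (a) which item is selected and (b) which items are present; the framework choice, sizes, and output heads are all assumptions.

```python
import torch.nn as nn

NUM_ITEMS = 4      # assumed size of the application's item list
FEATURE_DIM = 32   # assumed length of the flattened screen-information vector

class StateNetwork(nn.Module):
    """Maps flattened screen information to application state information."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(FEATURE_DIM, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
        )
        self.selected_head = nn.Linear(64, NUM_ITEMS)  # which item is selected
        self.presence_head = nn.Linear(64, NUM_ITEMS)  # which items are on screen

    def forward(self, screen_features):
        hidden = self.backbone(screen_features)
        return self.selected_head(hidden), self.presence_head(hidden)
```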
  • FIG. 6 is a diagram for explaining a method for training a neural network according to an exemplary embodiment.
  • the neural network 610 may be trained by an external device.
  • the external device may be a separate device different from the electronic device 100 according to an embodiment.
  • The external device may train the neural network 610 based on a training data set, and may transmit information about the neural network 610 on which training is completed to the electronic device 100.
  • the external device may train the neural network 610 based on the plurality of training data 620 .
  • the plurality of training data 620 may be generated based on the plurality of execution screens of the application.
  • One application may include a plurality of execution screens, and each of the plurality of execution screens may correspond to state information of the application.
  • For example, the plurality of training data 620 may include first training data, consisting of first screen information representing a first execution screen of the application and first state information of the application corresponding to the first execution screen, and second training data, consisting of second screen information representing a second execution screen of the application and second state information of the application corresponding to the second execution screen.
  • The external device may input the first screen information of the first training data to the neural network 610 and update the weights included in the neural network 610 in a direction that minimizes the difference between the network's output data and the first state information of the first training data. Likewise, the external device may input the second screen information of the second training data to the neural network 610 and update the weights so that the difference between the output data and the second state information is minimized. By updating the weights of the neural network 610 in this way with the plurality of training data 620, the external device may train the neural network 610; the trained neural network 610, or information about it, may then be transmitted to the electronic device 100. A sketch of this weight-update loop follows.
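  • A sketch of that weight-update loop for the StateNetwork above; the loss choices (cross-entropy for the selected item, binary cross-entropy for item presence) and the tiny synthetic dataset are assumptions for illustration.

```python
import torch
import torch.nn as nn

model = StateNetwork()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
ce_loss, bce_loss = nn.CrossEntropyLoss(), nn.BCEWithLogitsLoss()

# Each training sample pairs screen information (as features) with state
# information labels: the selected item's index and an item-presence vector.
dataset = [
    (torch.randn(FEATURE_DIM), torch.tensor(1), torch.tensor([1., 1., 1., 1.])),
    (torch.randn(FEATURE_DIM), torch.tensor(3), torch.tensor([1., 0., 1., 1.])),
]

for epoch in range(10):
    for features, selected, presence in dataset:
        sel_logits, pres_logits = model(features.unsqueeze(0))
        # Update the weights in the direction that minimizes the difference
        # between the network's output and the state-information labels.
        loss = (ce_loss(sel_logits, selected.unsqueeze(0))
                + bce_loss(pres_logits, presence.unsqueeze(0)))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```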
  • FIGS. 7A and 7B are diagrams for explaining an operation in which an electronic device performs an operation corresponding to a user voice input based on application state information, according to an exemplary embodiment.
  • the electronic device 100 may execute a first application among a plurality of applications installed on the electronic device 100 .
  • the electronic device 100 may display the first execution screen 710 of the first application.
  • the electronic device 100 may receive a user voice input corresponding to the user's utterance (eg, “Show me content 2”) while the first execution screen 710 of the first application is displayed.
  • The electronic device 100 may obtain state information of the first application corresponding to the first execution screen 710 of the first application in order to perform an operation corresponding to the user's voice input. Since the operation of obtaining state information of an application by the electronic device 100 according to an embodiment has been described in detail with reference to FIGS. 2 to 6, a detailed description thereof is omitted here.
  • For example, information indicating that the item 720 corresponding to content 1 is selected and highlighted, and that the item 730 corresponding to content 2 mentioned in the user's utterance is located second to the right of the item 720 corresponding to content 1, may be obtained as the state information (first state information) corresponding to the first execution screen 710.
  • To execute content 2 mentioned in the user's utterance, the electronic device 100 may generate the right direction key twice to select the item 730 corresponding to content 2, and then execute content 2. Accordingly, the electronic device 100 may operate to display the execution screen 740 of content 2.
  • the electronic device 100 may execute a first application among a plurality of applications.
  • the electronic device 100 may display the second execution screen 750 of the first application.
  • The electronic device 100 may receive a user voice input corresponding to the user's utterance (e.g., "Show content 2") while the second execution screen 750 of the first application is displayed.
  • the electronic device 100 may obtain second state information of the first application corresponding to the second execution screen 750 of the first application in order to perform an operation corresponding to the user's voice input.
  • The second state information may include information indicating that the second execution screen does not include an item corresponding to content 2 but does include an item 760 corresponding to the search menu.
  • In this case, the electronic device 100 may operate to select the search menu, search for content 2 mentioned in the user's utterance, and display a search result 770 for content 2, as in the sketch below.
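  • The two behaviors of FIGS. 7A and 7B reduce to one decision: navigate directly when the requested item is on screen, otherwise fall back to the search menu. A hedged sketch, in which `press_keys`, `press_select`, and `type_query` are hypothetical stand-ins for the device's key-generation operations.

```python
# Sketch of the FIG. 7A / FIG. 7B decision: select the requested item when
# it is on the execution screen, otherwise go through the search menu.

def handle_request(items, selected, target, press_keys, press_select, type_query):
    """items: on-screen items left to right; selected: highlighted item."""
    def moves(destination):
        src, dst = items.index(selected), items.index(destination)
        return ["RIGHT" if dst > src else "LEFT"] * abs(dst - src)

    if target in items:            # FIG. 7A: item on screen -> select and run it
        press_keys(moves(target))
        press_select()
    elif "search" in items:        # FIG. 7B: not on screen -> use the search menu
        press_keys(moves("search"))
        press_select()
        type_query(target)         # search for the requested content
```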
  • FIG. 8 is a flowchart illustrating a method of performing an operation corresponding to a user voice input by an electronic device according to an exemplary embodiment.
  • the electronic device may display an application execution screen (S810).
  • the electronic device 100 may display a home screen.
  • the home screen may include items corresponding to each of a plurality of applications.
  • the electronic device 100 may execute the first application.
  • the electronic device 100 may display an execution screen of the first application.
  • the electronic device 100 may receive a user's voice input while an application execution screen is displayed (S820).
  • When the electronic device 100 receives the user's utterance through an audio input module (e.g., a microphone) built into or connected to the electronic device 100, the user voice input is received from the audio input module.
  • Alternatively, when the user's utterance is received by an external device, the electronic device 100 receives the user voice input from the external device through a communication module (communication unit).
  • the electronic device 100 may acquire screen information by analyzing the displayed execution screen (S830).
  • The electronic device 100 may identify the item currently selected on the application execution screen by inputting the screen information obtained in operation S830 to the neural network (S840).
  • the electronic device 100 may perform an operation corresponding to the user's voice input based on the currently selected item (S850).
  • The electronic device 100 may perform the operation corresponding to the user voice input based on the positional relationship between the currently selected item and the item corresponding to the user voice input, as in the end-to-end sketch below.
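  • Putting operations S810 through S850 together, a hedged end-to-end sketch; `capture_screen`, `detect_boxes`, `stt`, `to_features`, and `parse_target` are hypothetical helpers, while `analyze_screen`, `StateNetwork`, and `handle_request` refer to the earlier sketches.

```python
# End-to-end sketch of FIG. 8 (S810-S850), composed from the earlier sketches.

def on_voice_input(audio, app_info, model, h):
    text = h.stt(audio)                                # S820: voice input -> text
    boxes = h.detect_boxes(h.capture_screen(), app_info)
    screen_info = analyze_screen(boxes, app_info)      # S830: screen information
    sel_logits, _ = model(h.to_features(screen_info))  # S840: selected item
    selected = app_info["item_list"][int(sel_logits.argmax())]
    target = h.parse_target(text, app_info["item_list"])
    on_screen = [i for i, shown in screen_info["item_state"].items() if shown]
    handle_request(on_screen, selected, target,        # S850: perform operation
                   h.press_keys, h.press_select, h.type_query)
```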
  • FIG. 9 is a diagram for explaining a method of performing an operation corresponding to a user voice input based on application state information by an electronic device according to an exemplary embodiment.
  • the electronic device 100 may execute a first application among a plurality of applications.
  • the electronic device 100 may display a first execution screen of the first application.
  • the electronic device 100 may receive a user voice input corresponding to the user's speech (eg, "Select content 2") in a state where the first execution screen of the first application is displayed.
  • The electronic device 100 may obtain state information of the first application corresponding to the first execution screen of the first application in order to perform an operation corresponding to the user's voice input. Since the operation of obtaining state information of an application by the electronic device 100 according to an embodiment has been described in detail with reference to FIGS. 2 to 6, a detailed description thereof is omitted here.
  • For example, information indicating that the item 920 corresponding to content 1 is selected and highlighted, and that the item 930 corresponding to content 2 mentioned in the user's utterance is located second to the right of the item 920 corresponding to content 1, may be obtained as the state information (first state information) corresponding to the first execution screen.
  • To select content 2 mentioned in the user's utterance, the electronic device 100 may generate the right direction key twice to select the item 930 corresponding to content 2. Accordingly, the item 930 corresponding to content 2 may be highlighted, or the focus may be displayed on it.
  • FIG. 10 is a diagram illustrating a voice recognition system according to an exemplary embodiment.
  • a voice recognition system may include an electronic device 100 and a server 1000.
  • the server 1000 may be connected to the electronic device 100 through a network or short-range communication.
  • the server 1000 according to an embodiment may be a server that performs voice recognition processing.
  • Although FIG. 10 shows one server, it is not limited thereto, and voice recognition processing may be performed by a plurality of servers.
  • the electronic device 100 may receive a user voice input corresponding to a user's speech.
  • the electronic device 100 may receive a user voice input and perform voice recognition processing on the user voice input.
  • Alternatively, the electronic device 100 may transmit a signal (audio signal) corresponding to the received voice input to the server 1000.
  • the server 1000 may perform voice recognition processing on audio data received from the electronic device 100 .
  • the speech recognition process may be a process of obtaining text data corresponding to an audio signal.
  • Speech recognition processing may include speech-to-text (STT) processing.
  • the voice recognition process may include a process of recognizing a voice signal uttered by a user as a character string.
  • the text obtained as a result of voice recognition may have a natural language sentence form, word form, or phrase form. However, it is not limited thereto.
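  • As a concrete illustration of the speech-to-text step, a minimal sketch using the open-source SpeechRecognition package; the package choice and the Google Web Speech backend are assumptions, not the engine used in the disclosed system.

```python
# Minimal STT sketch using the third-party SpeechRecognition package
# (pip install SpeechRecognition); this backend requires network access.
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.AudioFile("user_utterance.wav") as source:  # audio data from the device
    audio = recognizer.record(source)

# Convert the recorded audio signal into a character string.
text = recognizer.recognize_google(audio)
print(text)  # e.g., "Show me content 2"
```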
  • the server 1000 may perform a specific operation or function based on the voice recognition result. Alternatively, the server 1000 may transmit a voice recognition result (eg, text data obtained from the server) to the electronic device 100 or another server.
  • the electronic device 100 may perform a specific operation or function based on a voice recognition result obtained by performing voice recognition processing in the electronic device 100 or a voice recognition result received from the server 1000 .
  • the electronic device 100 may perform a specific operation or function corresponding to a voice recognition result based on state information of an application being executed.
  • the other server may perform a specific function based on the voice recognition result or control another electronic device to perform a specific function.
  • FIG. 11 is a block diagram illustrating a configuration of an electronic device according to an exemplary embodiment.
  • an electronic device 100 may include a microphone 110, a processor 120, a memory 130, a communication unit 140, and a display 150.
  • The microphone 110 may receive a sound signal from an external device or a speaker (e.g., a user of the electronic device 100).
  • the microphone 110 may receive a voice of a user's speech.
  • the microphone 110 may receive an external sound signal and convert it into an electrical signal (audio data).
  • the microphone 110 may use various noise cancellation algorithms for removing noise generated in the process of receiving an external sound signal.
  • the communication unit 140 may include a Wi-Fi module, a Bluetooth module, an infrared communication module, a wireless communication module, a LAN module, an Ethernet module, a wired communication module, and the like. At this time, each communication module may be implemented in the form of at least one hardware chip.
  • the Wi-Fi module and the Bluetooth module perform communication using the Wi-Fi method and the Bluetooth method, respectively.
  • In the case of the Wi-Fi module and the Bluetooth module, various connection information such as an SSID and a session key is first exchanged, and after a communication connection is established using this information, various other information may be transmitted and received.
  • The wireless communication module may include at least one communication chip that performs communication according to various wireless communication standards such as ZigBee, 3rd Generation (3G), 3rd Generation Partnership Project (3GPP), Long Term Evolution (LTE), LTE Advanced (LTE-A), 4th Generation (4G), and 5th Generation (5G).
  • The communication unit 140 may receive a user voice input from the control device 200.
  • The processor 120 controls the overall operation of the electronic device 100 and the signal flow between the internal components of the electronic device 100, and processes data.
  • The processor 120 may include a single core, dual cores, triple cores, quad cores, or a multiple thereof. Also, the processor 120 may include a plurality of processors. For example, the processor 120 may be implemented as a main processor (not shown) and a sub-processor (not shown) that operates in a sleep mode.
  • the processor 120 may include at least one of a Central Processing Unit (CPU), a Graphic Processing Unit (GPU), and a Video Processing Unit (VPU). Alternatively, according to embodiments, it may be implemented in the form of a system on chip (SoC) in which at least one of a CPU, a GPU, and a VPU is integrated.
  • The memory 130 may store various data, programs, or applications for driving and controlling the electronic device 100.
  • a program stored in memory 130 may include one or more instructions.
  • a program (one or more instructions) or application stored in memory 130 may be executed by processor 120 .
  • the processor 120 may include at least one of the screen analysis unit and the application state determination unit described with reference to FIGS. 3 to 6 .
  • the processor 120 may obtain screen information on the execution screen by analyzing the execution screen based on information about the application.
  • the information about the application may include at least one of types of one or more items included in the application, size information of the items, location information of the items, and pixel value information of the items according to whether or not they are highlighted.
  • The processor 120 may detect bounding boxes included in the current execution screen based on the information about the application, and may acquire screen information including the item corresponding to each detected bounding box, whether each detected bounding box is highlighted, and the like.
  • the processor 120 may obtain application state information by inputting screen information to the neural network.
  • a neural network according to an embodiment may be trained using a plurality of training data including screen information about an application and application state information corresponding to the screen information.
  • the processor 120 may obtain application state information as an output by inputting the obtained screen information to the neural network.
  • Application state information may include information on items included in a current application execution screen, information on a selected item among items, and the like.
  • The processor 120 may perform an operation corresponding to the received user voice input based on the application state information. For example, the processor 120 may determine and perform the operation corresponding to the user voice input based on whether the current application execution screen includes an item corresponding to the user voice input, the positional relationship between the item selected on the execution screen and the item corresponding to the user voice input, and the like.
  • The display 150 converts an image signal, a data signal, an OSD signal, a control signal, or the like processed by the processor 120 to generate a driving signal.
  • The display 150 may be implemented as a PDP, LCD, OLED, or flexible display, and may also be implemented as a 3D display. Also, the display 150 may be configured as a touch screen and used as an input device in addition to an output device.
  • the display 150 may display an application execution screen.
  • FIG. 12 is a block diagram illustrating the configuration of an electronic device according to another embodiment.
  • the electronic device 1200 of FIG. 12 may be an embodiment of the electronic device 100 described with reference to FIGS. 1 to 11 .
  • An electronic device 1200 according to another embodiment includes a tuner unit 1240, a processor 1210, a display 1220, a communication unit 1250, a sensing unit 1230, an input/output unit 1270, a video processing unit 1280, an audio processing unit 1285, an audio output unit 1260, a memory 1290, and a power supply unit 1295.
  • The microphone 1231 of FIG. 12 corresponds to the microphone 110 of FIG. 11, the communication unit 1250 of FIG. 12 corresponds to the communication unit 140 of FIG. 11, the processor 1210 of FIG. 12 corresponds to the processor 120 of FIG. 11, the memory 1290 of FIG. 12 corresponds to the memory 130 of FIG. 11, and the display 1220 of FIG. 12 corresponds to the display 150 of FIG. 11. Therefore, descriptions that overlap with those given above are omitted.
  • The tuner unit 1240 may tune and select only the frequency of a channel that the electronic device 1200 wants to receive, from among the many radio wave components of broadcast signals received by wire or wirelessly, through amplification, mixing, resonance, and the like.
  • the broadcast signal includes audio, video, and additional information (eg, Electronic Program Guide (EPG)).
  • the tuner unit 1240 may receive broadcast signals from various sources such as terrestrial broadcasting, cable broadcasting, satellite broadcasting, and Internet broadcasting.
  • the tuner unit 1240 may receive a broadcasting signal from a source such as analog broadcasting or digital broadcasting.
  • the sensing unit 1230 detects a user's voice, a user's video, or a user's interaction, and may include a microphone 1231, a camera unit 1232, and a light receiving unit 1233.
  • the microphone 1231 receives the user's utterance.
  • the microphone 1231 may convert the received voice into an electrical signal and output it to the processor 1210 .
  • the user's voice may include, for example, a voice corresponding to a menu or function of the electronic device 1200 .
  • the camera unit 1232 may receive an image (eg, continuous frames) corresponding to a user's motion including a gesture within the camera recognition range.
  • The processor 1210 may use the received motion recognition result to select a menu displayed on the electronic device 1200 or to perform control corresponding to the motion recognition result.
  • The light receiving unit 1233 receives optical signals (including control signals) from an external control device through a light window (not shown) in the bezel of the display 1220.
  • the light receiving unit 1233 may receive an optical signal corresponding to a user input (eg, touch, pressure, touch gesture, voice, or motion) from the control device.
  • a control signal may be extracted from the received optical signal under control of the processor 1210 .
  • Under the control of the processor 1210, the input/output unit 1270 receives video (e.g., moving pictures), audio (e.g., voice, music), additional information (e.g., an EPG), and the like from outside the electronic device 1200.
  • The input/output interfaces may include HDMI (High-Definition Multimedia Interface), MHL (Mobile High-Definition Link), USB (Universal Serial Bus), DP (DisplayPort), Thunderbolt, VGA (Video Graphics Array) port, RGB port, D-subminiature (D-SUB), digital visual interface (DVI), component jack, and PC port.
  • The processor 1210 controls the overall operation of the electronic device 1200 and the signal flow between the internal components of the electronic device 1200, and processes data.
  • The processor 1210 may execute an operating system (OS) and various applications stored in the memory 1290 when there is a user input or when a preset, stored condition is satisfied.
  • The processor 1210 may include RAM that stores signals or data input from outside the electronic device 1200 or is used as a storage area for the various tasks performed in the electronic device 1200, ROM that stores a control program for controlling the electronic device 1200, and a processor core.
  • the video processor 1280 processes video data received by the electronic device 1200 .
  • the video processing unit 1280 may perform various image processing such as decoding, scaling, noise filtering, frame rate conversion, and resolution conversion on video data.
  • the audio processing unit 1285 processes audio data.
  • the audio processing unit 1285 may perform various processes such as decoding or amplifying audio data and filtering noise. Meanwhile, the audio processing unit 1285 may include a plurality of audio processing modules to process audio corresponding to a plurality of contents.
  • the audio output unit 1260 outputs audio included in the broadcast signal received through the tuner unit 1240 under the control of the processor 1210 .
  • the audio output unit 1260 may output audio (eg, voice, sound) input through the communication unit 1250 or the input/output unit 1270 .
  • the audio output unit 1260 may output audio stored in the memory 1290 under the control of the processor 1210 .
  • the audio output unit 1260 may include at least one of a speaker, a headphone output terminal, or a Sony/Philips Digital Interface (S/PDIF) output terminal.
  • the power supply unit 1295 supplies power input from an external power source to internal components of the electronic device 1200 under the control of the processor 1210 .
  • The power supply unit 1295 may supply power output from one or more batteries (not shown) located inside the electronic device 1200 to the internal components under the control of the processor 1210.
  • the memory 1290 may store various data, programs, or applications for driving and controlling the electronic device 1200 under the control of the processor 1210 .
  • the memory 1290 includes a broadcast reception module (not shown), a channel control module, a volume control module, a communication control module, a voice recognition module, a motion recognition module, an optical reception module, a display control module, an audio control module, an external input control module, and a power supply. It may include a control module, a power control module of an external device connected wirelessly (eg, Bluetooth), a voice database (DB), or a motion database (DB).
  • The modules and databases of the memory 1290 may be implemented in the form of software to perform, in the electronic device 1200, a broadcast reception control function, a channel control function, a volume control function, a communication control function, a voice recognition function, a motion recognition function, a light reception control function, a display control function, an audio control function, an external input control function, a power control function, or a power control function for an external device connected wirelessly (e.g., via Bluetooth).
  • The processor 1210 may perform each function using this software stored in the memory 1290.
  • FIGS. 11 and 12 are block diagrams for one embodiment.
  • Each component of the block diagram may be integrated, added, or omitted according to specifications of the electronic device 100 or 1200 that is actually implemented. That is, if necessary, two or more components may be combined into one component, or one component may be subdivided into two or more components.
  • The functions performed in each block are for explaining the embodiments, and their specific operations or devices do not limit the scope of the present invention.
  • An operating method of an electronic device may be implemented in the form of program instructions that can be executed through various computer means and recorded in a computer readable medium.
  • the computer readable medium may include program instructions, data files, data structures, etc. alone or in combination.
  • Program instructions recorded on the medium may be those specially designed and configured for the present invention, or those known and available to those skilled in computer software.
  • Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.
  • Examples of program instructions include high-level language codes that can be executed by a computer using an interpreter, as well as machine language codes such as those produced by a compiler.
  • the operating method of the electronic device according to the disclosed embodiments may be included in a computer program product and provided.
  • Computer program products may be traded between sellers and buyers as commodities.
  • a computer program product may include a S/W program and a computer-readable storage medium in which the S/W program is stored.
  • The computer program product may include a product in the form of a S/W program (e.g., a downloadable app) distributed electronically through the manufacturer of the electronic device or through an electronic marketplace (e.g., Google Play Store, App Store).
  • For electronic distribution, at least a part of the S/W program may be stored in a storage medium or may be generated temporarily.
  • In this case, the storage medium may be a storage medium of the manufacturer's server, of the electronic market's server, or of a relay server that temporarily stores the S/W program.
  • a computer program product may include a storage medium of a server or a storage medium of a client device in a system composed of a server and a client device.
  • Alternatively, in a system that includes a third device communicatively connected to the server or the client device, the computer program product may include a storage medium of the third device.
  • Alternatively, the computer program product may include the S/W program itself, transmitted from the server to the client device or the third device, or from the third device to the client device.
  • one of the server, the client device and the third device may execute the computer program product to perform the method according to the disclosed embodiments.
  • two or more of the server, the client device, and the third device may execute the computer program product to implement the method according to the disclosed embodiments in a distributed manner.
  • a server may execute a computer program product stored in the server to control a client device communicatively connected to the server to perform a method according to the disclosed embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Disclosed is an electronic device including: a memory that stores one or more instructions; and a processor that executes the one or more instructions stored in the memory, wherein the processor may: receive a user voice input based on an execution screen of an application; obtain screen information for the execution screen by analyzing the execution screen based on information about the application; obtain application state information by inputting the obtained screen information into a neural network; and perform an operation corresponding to the user voice input based on the application state information.
PCT/KR2022/013102 2021-10-25 2022-09-01 Electronic device and operating method thereof WO2023075118A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020210143035A KR20230059029A (ko) 2021-10-25 2021-10-25 Electronic device and operating method thereof
KR10-2021-0143035 2021-10-25

Publications (1)

Publication Number Publication Date
WO2023075118A1 (fr) 2023-05-04

Family

ID=86159473

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2022/013102 WO2023075118A1 (fr) Electronic device and operating method thereof

Country Status (2)

Country Link
KR (1) KR20230059029A (fr)
WO (1) WO2023075118A1 (fr)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101521909B1 (ko) * 2008-04-10 2015-05-20 엘지전자 주식회사 Mobile terminal and menu control method thereof
KR20170014353A (ko) * 2015-07-29 2017-02-08 삼성전자주식회사 Apparatus and method for voice-based screen navigation
KR20200008341A (ko) * 2018-07-16 2020-01-28 주식회사 케이티 Media playback device and method for controlling a screen, and server for analyzing the screen
US20200202851A1 (en) * 2018-12-20 2020-06-25 Shenzhen Lenkeng Technology Co., Ltd. Speech recognition device and system
KR20210029010A (ko) * 2019-09-05 2021-03-15 주식회사 엘지유플러스 Operating method of set-top box for automatic screen recognition, and set-top box therefor

Also Published As

Publication number Publication date
KR20230059029A (ko) 2023-05-03

Similar Documents

Publication Publication Date Title
  • WO2020101143A1 Image display apparatus and operating method therefor
  • WO2014106986A1 Electronic apparatus controlled by a user's voice and method for controlling the same
  • WO2021101087A1 Electronic apparatus and control method thereof
  • WO2019231138A1 Image display apparatus and operating method thereof
  • WO2021071155A1 Electronic apparatus and control method thereof
  • WO2019142988A1 Electronic device, control method therefor, and computer-readable recording medium
  • EP3867742A1 Electronic apparatus and control method therefor
  • WO2021251632A1 Display device for generating multimedia content, and operating method of the display device
  • WO2020050508A1 Image display apparatus and operating method thereof
  • WO2017146518A1 Server, image display apparatus, and method for operating the image display apparatus
  • WO2020101189A1 Image and audio processing apparatus and operating method thereof
  • WO2020166796A1 Electronic device and control method therefor
  • WO2020060071A1 Electronic apparatus and control method thereof
  • WO2023075118A1 Electronic device and operating method thereof
  • WO2019216484A1 Electronic device and operating method thereof
  • WO2021256760A1 Mobile electronic device and control method thereof
  • WO2022108008A1 Electronic apparatus and control method thereof
  • WO2021049802A1 Electronic device and control method thereof
  • WO2023014030A1 Display device and operating method thereof
  • WO2023068502A1 Display device and operating method thereof
  • WO2018155810A1 Electronic device, control method therefor, and non-transitory computer-readable recording medium
  • WO2022092530A1 Electronic device and control method thereof
  • WO2022124640A1 Electronic device and control method therefor
  • WO2021172747A1 Electronic device and control method therefor
  • WO2021107371A1 Electronic device and control method thereof

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22887322

Country of ref document: EP

Kind code of ref document: A1