WO2020247259A1 - Methods and systems for facilitating application programming interface communications - Google Patents

Methods and systems for facilitating application programming interface communications

Info

Publication number
WO2020247259A1
WO2020247259A1 (PCT/US2020/035191)
Authority
WO
WIPO (PCT)
Prior art keywords
image
command
user interface
api
response
Prior art date
Application number
PCT/US2020/035191
Other languages
English (en)
Inventor
Manik Malhotra
Jon Wayne Heim
Thomas Page ODOM
Original Assignee
Rovi Guides, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US16/430,711 external-priority patent/US11249823B2/en
Priority claimed from US16/430,719 external-priority patent/US10990456B2/en
Application filed by Rovi Guides, Inc. filed Critical Rovi Guides, Inc.
Publication of WO2020247259A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 - Arrangements for executing specific programs
    • G06F 9/451 - Execution arrangements for user interfaces
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223 - Execution procedure of a spoken command

Definitions

  • Viewers are consuming content in a plurality of ways and from a plurality of devices. Each of these devices and each of these forms of consumption comes with unique features and requirements for interacting with the content.
  • a first party may provide devices that use a third party’s (e.g., a voice search software provider) application.
  • a first party’s device may provide voice search features.
  • API calls for a Search/Recommendation & Voice Search application would conventionally include only a specific input (e.g., an audio sample of the voice command received from a user) because the voice recognition application’s function is conventionally to interpret the audio data.
  • API calls for Natural Language Processing applications would conventionally include only a specific input (e.g., a text string of a command received from a user) because the Natural Language Processing function is conventionally to interpret the text.
  • By including user interface context in the API request, the limitations discussed above can be overcome.
  • the UI context at the time of API call even at a basic level, provides useful data (e.g., what screen is currently displayed on the device, the name of the content currently being played, whether the closed captions are enabled or not, etc.) for interpreting ambiguous commands, identifying user intent, etc. or otherwise mitigating the potential for poor performance or the loss of certain features.
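As a rough sketch (not part of the disclosure), the kind of user-interface context described above can be bundled alongside the command in a request body. The field names below ("screen", "now_playing", "captions_enabled") are hypothetical illustrations, not part of any published API.

```python
import json

def build_api_request(command_text: str, ui_context: dict) -> str:
    """Bundle a user command with a snapshot of the UI state at call time.

    The context keys are illustrative only; a real system would include
    whatever state it can cheaply capture (current screen, playing title,
    caption state, etc.).
    """
    request = {
        "command": command_text,
        "ui_context": ui_context,
    }
    return json.dumps(request)

payload = build_api_request(
    "skip ad",
    {"screen": "playback", "now_playing": "Example Movie", "captions_enabled": False},
)
```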
  • the device may receive, by the control circuitry, a command (e.g., vocal search command).
  • the device may capture, by the control circuitry, an image of the user interface.
  • the device may then generate an application programming interface (“API”) request for interpreting the command (e.g., an API request for a voice recognition application), wherein the API request includes the image.
  • the device may receive, by the control circuitry, an API response to the API request, wherein the API response is customized based on the image.
  • a device may receive, by control circuitry, an API request for interpreting a command, wherein the API request includes an image of a user interface as displayed on a display screen when the command was received.
  • the device may determine, by the control circuitry, a command response based on the command and the image.
  • the device may generate an API response based on the command response.
  • the device may then transmit the API response.
  • FIG. 1 shows an illustrative embodiment of determining a context of a user interface and supplementing an API request in accordance with some embodiments of the disclosure.
  • FIG. 2 shows another illustrative embodiment of determining a context of a user interface and supplementing an API request in accordance with some embodiments of the disclosure.
  • FIG. 3 shows yet another illustrative embodiment of determining a context of a user interface and supplementing an API request in accordance with some embodiments of the disclosure.
  • FIG. 4 is a block diagram of an illustrative user device in accordance with some embodiments of the disclosure.
  • FIG. 5 is a flow chart of illustrative steps involved in facilitating application programming interface communications in accordance with some embodiments of the disclosure.
  • FIG. 6 is a flow chart of illustrative steps involved in facilitating application programming interface communications in accordance with some embodiments of the disclosure.
  • FIG. 7 is a flow chart of illustrative steps involved in customizing an API response in accordance with some embodiments of the disclosure.
  • FIG. 8 is a flow chart of illustrative steps involved in determining the context of a user interface in accordance with some embodiments of the disclosure.
  • FIG. 9 is an illustrative example of a supplemented API call in accordance with some embodiments of the disclosure.

Detailed Description of Drawings
  • one or more devices may generate for display a user interface on a display screen.
  • a display screen is currently displaying user interface 100 with four objects (i.e., objects 102, 104, 106, and 108) corresponding to different types of content.
  • an “object” may include any portion of content and/or user interface that has electronically or manually distinguishable boundaries.
  • an object may correspond to a detectable class of items (e.g., an alphanumeric character, face of a person, etc.).
  • the object may be detectable by metadata or other tags in content or may be detected through the use of machine learning approaches such as edge orientation histograms, scale-invariant feature transform descriptors, vectors, etc.
  • object 102 includes content (e.g., an advertisement) that is currently being displayed.
  • Object 102 includes audio, video and textual data.
  • the textual data may appear as textual information within the content or may include metadata (e.g., subtitles, program descriptions, etc.).
  • the terms “asset” and/or “content” should be understood to mean an electronically consumable user asset, such as television programming, as well as pay-per-view programs, on-demand programs (as in video-on-demand (VOD) systems), IP TV, Internet content (e.g., streaming content, downloadable content, Webcasts, etc.), live video (e.g., Facebook Live or Twitch), video (e.g.
  • Object 104 corresponds to an on-screen function (e.g., the function of skipping the currently displayed advertisement). As explained below, some on-screen functions may correspond to user interface templates. That is, objects may appear at predetermined positions in a user interface template, and those predetermined positions may correspond to preset functions.
  • the user interfaces referred to herein may include interfaces provided by any applications that allow users to navigate among and locate content.
  • Object 106 corresponds to a playback tracker bar.
  • the playback tracker bar may, within its boundaries, feature multiple other objects.
  • object 108 is within the boundaries of object 106.
  • Object 108 corresponds to a playback timer, which describes the current point of playback of the content.
  • the functions and operations provided by the illustrative objects are not meant to be limiting.
  • these objects may relate to any operation such as the modification, selection, and/or navigation of data related to content, such as libraries, playlists, listings, titles, descriptions, ratings information (e.g., parental control ratings, critic ratings, etc.), genre or category information, actor information, logo data (for broadcasters' or providers' logos, etc.), content format (e.g., standard definition, high definition, 3D, 360 video, etc.), advertisement information (e.g., text, images, video clips, etc.).
  • Functions and operations may also include playing content or executing a “fast-access playback operation,” which should be understood to mean any operation that pertains to pausing or playing back a non-linear asset faster than normal playback speed or in a different order than the asset is designed to be played, such as a fast-forward, rewind, skip, chapter selection, segment selection, skip segment, jump segment, next segment, previous segment, skip advertisement or commercial, next chapter, previous chapter or any other operation that does not play back the asset at normal playback speed.
  • the system has identified the object boundaries in boundary layout 110.
  • the system has identified object boundaries 112, 114, 116, and 118 for objects 102, 104, 106, and 108, respectively. This identification may occur prior to modifying the user interface in response to the command.
  • Object boundaries 112, 114, 116, and 118 may then be used to classify each object and retrieve additional information about each object.
  • the system may input each object, object boundary, or characteristics of the object or object boundary into a lookup table database that lists potential objects, object boundaries, or object characteristics. The lookup table may then return additional characteristics for an object.
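The lookup-table idea above can be sketched as a simple keyed mapping from object characteristics to additional attributes. The keys and attribute values here are hypothetical examples, not values from the disclosure.

```python
# Hypothetical lookup table mapping object characteristics (here, a
# normalized boundary position and an object kind) to extra attributes
# such as the template the object belongs to and the function it serves.
OBJECT_DB = {
    ("bottom-right", "button"): {"template": "Playback", "function": "skip_ad"},
    ("bottom", "bar"):          {"template": "Playback", "function": "seek"},
}

def lookup_object(position: str, kind: str) -> dict:
    # Return known characteristics, or an empty record for unknown objects.
    return OBJECT_DB.get((position, kind), {})
```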
  • the system may use the position of the object boundary 114 to determine additional characteristics of object 104.
  • object 104 may correspond to a particular template (e.g., a “Playback” template) and may additionally be associated with a skip-ad function.
  • the information used to populate database 120 may be retrieved in numerous ways.
  • database 120 may be populated automatically by the system (e.g., the API includes, or has access to, database 120) or the system may generate database 120 (e.g., the API analyzes metadata included in the content, user interface, etc. and compiles information about each object).
  • the system may use this information to determine how to interpret a received command. For example, receipt of a user command to “Skip Ad” may trigger a search for content titled “Skip Ad” if the user interface (or user interface template) is currently displaying a search screen; however, if the system determines that an option for a “Skip Ad” function is currently displayed, the system may trigger the “Skip Ad” function instead. By doing so, the API response is customized based on the image by interpreting the command based on an object in the image.
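A minimal sketch of this context-dependent interpretation, assuming the on-screen function names have already been extracted from the image (the names themselves are hypothetical):

```python
def interpret_command(command: str, on_screen_functions: set) -> dict:
    """If the spoken text matches a function currently displayed, trigger
    that function; otherwise fall back to treating the text as a search
    query, mirroring the "Skip Ad" example above."""
    normalized = command.strip().lower().replace(" ", "_")
    if normalized in on_screen_functions:
        return {"action": "trigger_function", "function": normalized}
    return {"action": "search", "query": command}
```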
  • FIG. 2 shows another illustrative embodiment of determining a context of a user interface and supplementing an API request.
  • a display screen is currently displaying user interface 200 with four objects (i.e., objects 202, 204, 206, and 208) corresponding to different types of content.
  • the system has identified object boundaries 212, 214, 216, and 218 for objects 202, 204, 206, and 208, respectively.
  • Object boundaries 212, 214, 216, and 218 may then be used to classify each object and retrieve additional information about each object.
  • FIG. 2 also includes database 220, which includes additional or alternative information and classes of information beyond those shown in database 120 (FIG. 1).
  • database 220 includes classes of OCR’ed text for each object (if detectable) as well as a determined context.
  • the context may be determined based directly on the content (e.g., screenshot image) received with the API request, or the context may be determined based on a further analysis of the data in the records (e.g., records 222, 224, 226, and 228) in database 220.
  • a device may receive, by control circuitry, a command (e.g., vocal search command).
  • the device may capture, by the control circuitry, an image of the user interface (e.g., a screenshot of the display upon which the user interface is present).
  • the device may then generate an application programming interface (“API”) request for interpreting the command (e.g., an API request for a voice recognition application), wherein the API request includes the image (e.g., appended to, or included in, the API request as described in FIG. 9 below).
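One plausible way to include the image in the API request body is sketched below. The base64 encoding is an assumption made so binary image data can travel inside JSON; the disclosure does not mandate any particular encoding, and the field names are hypothetical.

```python
import base64
import json

def build_voice_api_request(audio_id: str, screenshot_png: bytes) -> str:
    """Append a captured screenshot to an API request for interpreting a
    voice command. The screenshot bytes are base64-encoded so the whole
    request can be serialized as JSON."""
    return json.dumps({
        "command_audio": audio_id,
        "screenshot": base64.b64encode(screenshot_png).decode("ascii"),
    })
```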
  • the device may receive, by the control circuitry, an API response to the API request, wherein the API response is customized based on the image.
  • the first device may send the API request to a second device (e.g., a server).
  • the first device may supplement an API request with information that is cached on the first device (e.g., an image, metadata, or other information derived from the current state of the user interface).
  • the second device may receive, by control circuitry, the application programming interface (“API”) request for interpreting a command, wherein the API request includes an image of a user interface as displayed on a display screen when the command was received.
  • the second device may then determine, by the control circuitry, a command response based on the command and the image.
  • the second device may generate an API response based on the command response.
  • the second device may then transmit the API response.
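The second device's receive / determine / generate / transmit sequence might be sketched as follows. The request fields ("command", "ui_objects") are hypothetical, and the matching logic is a simple stand-in for the richer image analysis described elsewhere in this disclosure.

```python
import json

def handle_api_request(raw_request: str) -> str:
    """Decode an API request, determine a command response based on the
    command plus labels derived from the UI image, and serialize an API
    response for transmission back to the requester."""
    req = json.loads(raw_request)
    command = req["command"]
    ui_objects = req.get("ui_objects", [])  # e.g., labels extracted from the image
    if command.lower() == "skip ad" and "skip_ad" in ui_objects:
        body = {"action": "trigger_function", "function": "skip_ad"}
    else:
        body = {"action": "search", "query": command}
    return json.dumps({"status": "ok", "response": body})
```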
  • the first device may determine the object boundaries and reference database 220.
  • the information derived from database 220 (e.g., the context of the user interface) may then be included in the API request.
  • the device issuing an API request may not be the same device that is causing a user interface to be displayed.
  • the device issuing the API request may determine a device to which the command relates or may pull data from multiple devices and send the data from multiple devices in the API request.
  • the system may first analyze the supplemental data to determine the device to which the user command related. For example, a user may issue a voice command that is received by a first device (e.g., a smart home device with voice recognition); the first device may then pull data from multiple other devices and include that data in an API request (e.g., to a server).
  • the system may pull initial data from other devices on a network (e.g., a television, set-top box, stereo, computer, etc.) to determine the device to which the command relates.
  • This initial data pull may involve detecting which devices are powered on or off (e.g., powered-off devices may be excluded from further analysis), whether or not a device is currently in use (e.g., only currently-in-use devices may be selected), and/or other filter steps.
  • the system may then analyze data about the remaining devices to select a given device from which to pull more data (if necessary). For example, in response to receiving a voice command, the system may detect that three devices corresponding to the user (e.g., on the user’s network or currently logged into a user’s online profile) are available. The system may then pull data from those devices. Alternatively, the system may pull supplemental data from all devices (e.g., without first filtering).
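The filtering steps above (exclude powered-off devices, then prefer devices currently in use) can be sketched as follows; the device record fields are assumed for illustration.

```python
def candidate_devices(devices: list) -> list:
    """Filter a list of device records: drop powered-off devices, and if
    any remaining device is currently in use, restrict to those."""
    powered = [d for d in devices if d["powered_on"]]
    in_use = [d for d in powered if d.get("in_use")]
    chosen = in_use or powered  # fall back to all powered-on devices
    return [d["name"] for d in chosen]
```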
  • the system may analyze the supplemental data pulled from the one or more devices. If the system did not select the device to which the command relates based on an initial data pull, the system may analyze the supplemental data received from the one or more devices (or request more) to select the device, prior to determining a context of the command. For example, based on an image and/or other data included within an API request, the system may, in addition to determining a context of the command, also determine a device to which the command relates (e.g., prior to determining the context and/or customizing an API response).
  • This determination may be based on current content of a device (e.g., a word that is included in the title of content being displayed on a device), functions associated with the device (e.g., a function (“record,” “volume up,” etc.) that is only available on one device), key words detected in the user command (e.g., a command naming the device), etc.
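A sketch of selecting the target device from the three kinds of evidence just listed (words from content titles, device-specific functions, and the device name appearing in the command). The scoring weights are arbitrary illustrations, not values from the disclosure.

```python
def score_device(command: str, device: dict) -> int:
    """Score how well a device record matches a user command."""
    words = set(command.lower().split())
    score = 0
    # (a) overlap with the title of content currently on the device
    score += len(words & set(device.get("now_playing", "").lower().split()))
    # (b) overlap with functions only this device supports (weighted higher)
    score += 2 * len(words & set(device.get("functions", [])))
    # (c) the command names the device explicitly (weighted highest)
    if device["name"].lower() in command.lower():
        score += 3
    return score

def pick_device(command: str, devices: list) -> str:
    return max(devices, key=lambda d: score_device(command, d))["name"]
```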
  • FIG. 3 shows another illustrative embodiment of determining a context of a user interface and supplementing an API request.
  • a screen capture is performed every time the user presses the search button (voice or text) before screen 302 is changed to show either audio cues or a keyboard.
  • the screen capture (or information derived from the screen capture) is then sent as part of an API request.
  • the API that receives the request may then extract the user interface context from the screen capture and respond accordingly.
  • the amount and type of information that the API extracts may vary.
  • the API may segment the screen capture of screen 302 into multiple objects by analyzing the screen capture and assigning boundaries to the detected objects.
  • the API may use the screen capture to generate a vector or polygonal data structure based on each object.
  • the data structure may include data defining interconnected vectors or polygons for further analysis.
  • the original user interface or displayed image may include vectors or polygons such that when those vectors or polygons are rendered (e.g., by a graphics engine) the resulting rendering will represent the object or resemble the object with sufficient similarity as to be recognized by the API, without the API having to generate vectors or polygons from the image.
  • the image file comprising the vectors and/or polygons for rendering by the graphics engine (or a simplified subset of the file), is sent to the API rather than a screen capture.
  • the API can apply an optical character recognition (“OCR”) algorithm to detect different blocks of text, options, and/or functions.
  • the API can detect the order in which the results are displayed, on-screen options like “Skip Ad,” names of the content that is playing, enabled settings, positions of content playback, etc. This information can serve as additional inputs (along with the received text or voice command) for a natural language processing or natural language understanding algorithm used to generate the API response.
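Turning OCR'ed blocks into structured inputs for a language-understanding step might look like the sketch below. The block schema (a "kind" label and a bounding "box") is assumed for illustration.

```python
def ocr_blocks_to_context(blocks: list) -> dict:
    """Convert OCR'ed text blocks into the extra inputs described above:
    on-screen options, and the top-to-bottom order of result listings."""
    options = [b["text"] for b in blocks if b.get("kind") == "button"]
    listings = sorted(
        (b for b in blocks if b.get("kind") == "listing"),
        key=lambda b: b["box"][1],  # sort by the top y-coordinate
    )
    return {
        "options": options,
        "listing_order": [b["text"] for b in listings],
    }
```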
  • a search application using natural language understanding may account for the various detected objects when resolving ambiguities in the command.
  • the system may use information derived from the detected objects to weigh a potential response to the API request. For example, if the system is trying to select between a first response and a second response, the system may use on-screen listings that are closely associated with the first response (or the subject matter of the first response) to select the first response over the second response.
  • a user is watching a video on screen 302.
  • Screen 302 is currently displaying a skippable advertisement.
  • screen 302 includes an option to “Skip Ad.”
  • the user may issue a voice command to “Skip Ad”.
  • the system may send the command along with the screenshot of screen 302 in an API request.
  • the API may detect the “Skip Ad” option in the screenshot along with the coordinates of the option itself. It should be noted that in some embodiments, the image analysis may occur prior to sending the API request. That is, the device and/or application that received the command may analyze the screenshot and send the results of the analysis as part of the API request.
  • the API may then customize a response to the API request. For example, in response to determining that the voice command was “Skip Ad” when there is a “Skip Ad” function currently displayed, the API response may include instructions to select the “Skip Ad” action or otherwise trigger the on-screen icon (or its function). For example, the API response may include instructions to select the coordinates of the polygon containing the “Skip Ad” function.
  • the API may customize a response to the API request by adjusting its logic (e.g., modifying the route of a decision tree based on the inputs created by supplemental data in the API) as shown in logic 308.
  • the API may determine that the API request is a command from a user to select an item using its position. For example, the API may determine that the current screenshot is of a list of available content. Using the screen capture, the API can not only detect each of the listed assets (e.g., via detecting titles, metadata, etc.), but it can also assign an ordinal position to each of the listings, to easily generate API responses that trigger actions for commands with a positional component such as “select the third one.”
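Resolving a positional command against the detected listing order could be sketched as below (the ordinal vocabulary is deliberately tiny and illustrative).

```python
# Hypothetical ordinal vocabulary mapping words to zero-based indices.
ORDINALS = {"first": 0, "second": 1, "third": 2, "fourth": 3, "fifth": 4}

def resolve_ordinal(command: str, listings: list):
    """Map a command like 'select the third one' onto the on-screen
    listing order detected from the screen capture; None if no ordinal
    matches or the index is out of range."""
    lowered = command.lower()
    for word, index in ORDINALS.items():
        if word in lowered and index < len(listings):
            return listings[index]
    return None
```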
  • FIG. 4 shows a generalized embodiment of illustrative user device 400, which may in some embodiments constitute a device capable of issuing an API request, responding to an API request, or both. It should also be noted that in some embodiments user device 400 may correspond to a server (either remote or local) and the API may form part of that server.
  • User device 400 may receive content and data via input/output (hereinafter “I/O”) path 402.
  • I/O path 402 may provide content and data to control circuitry 404, which includes processing circuitry 406 and storage 408.
  • Control circuitry 404 may be used to send and receive commands, requests, and other suitable data using I/O path 402.
  • I/O path 402 may connect control circuitry 404 (and specifically processing circuitry 406) to one or more communications paths (described below). I/O functions may be provided by one or more of these communications paths, but are shown as a single path in FIG. 4 to avoid overcomplicating the drawing.
  • Control circuitry 404 may be based on any suitable processing circuitry such as processing circuitry 406.
  • processing circuitry may be distributed across multiple separate processors or processing units, for example, multiple of the same type of processing units (e.g., two Intel Core i7 processors) or multiple different processors (e.g., an Intel Core i5 processor and an Intel Core i7 processor).
  • control circuitry 404 may include communications circuitry suitable for communicating with a server or other networks or servers.
  • Memory may be an electronic storage device provided as storage 408 that is part of control circuitry 404.
  • the phrase “electronic storage device” or “storage device” should be understood to mean any device for storing electronic data, computer software, or firmware, including cloud-based devices.
  • a user may send instructions to control circuitry 404 using user input interface 410.
  • User input interface 410 may be any suitable user interface, such as a remote control, mouse, trackball, keypad, keyboard, touchscreen, touchpad, stylus input, joystick, voice recognition interface, or other user input interfaces.
  • user input interface may be incorporated into user device 400 or may be incorporated into another device accessible by user device 400.
  • For example, if user device 400 is a user optical device, surface space limitations may prevent the user input interface from recognizing one or more input types. In such a case, user input interface 410 may be implemented on a separate device that is accessible to control circuitry 404 (FIG. 4).
  • Display 412 may be provided as a stand-alone device or integrated with other elements of user equipment device 400.
  • display 412 may be a touchscreen or touch-sensitive display.
  • FIG. 5 is a flow chart of illustrative steps involved in facilitating application programming interface communications in accordance with some embodiments of the disclosure.
  • process 500 or any step thereof could be displayed on, or provided by, one or more devices (e.g., device 400 (FIG. 4)).
  • process 500 may be executed using one or more of control circuitry 404 (FIG. 4), processing circuitry 406 (FIG. 4), or storage 408 (FIG. 4).
  • steps of process 500 may be incorporated into or combined with one or more steps of any other process (e.g., as described in FIGS. 6-8).
  • process 500 generates for display (e.g., using control circuitry 404 (FIG. 4)) a user interface (e.g., user interface 100 (FIG. 1)) on a display screen (e.g., display 412 (FIG. 4)).
  • the content and/or guide may appear on the computer screen.
  • process 500 receives (e.g., using control circuitry 404 (FIG. 4)) a command while the user interface is displayed. For example, while the user is viewing content, the user may issue a voice command or enter a text string.
  • the voice command or text string may relate to searching for additional content or relate to receiving additional information on content currently displayed on screen.
  • process 500 captures (e.g., using control circuitry 404 (FIG. 4)) an image of the user interface in response to receiving the command.
  • the system may capture an image (e.g., a screenshot of the user interface), wherein the image is captured prior to modifying the user interface in response to the command.
  • process 500 generates (e.g., using control circuitry 404 (FIG. 4)) an API request for interpreting the command, wherein the API request includes the image.
  • the API request may be structured similarly to the illustrative API request of FIG. 9.
  • process 500 receives (e.g., using control circuitry 404 (FIG. 4)) an API response to the API request, wherein the API response is customized based on the image or vectorized data file.
  • the API request generated in step 508 may be transmitted to another device or application that generates an API response (e.g., as discussed below in FIG. 7).
  • the API response may be customized (e.g., as described below in FIG. 8) based on the image (and/or the context of the user interface as described below in FIG. 8).
  • FIG. 6 is a flow chart of illustrative steps involved in facilitating application programming interface communications in accordance with some embodiments of the disclosure.
  • process 600 could be displayed on, or provided by, one or more devices (e.g., device 400 (FIG. 4)).
  • process 600 may be executed using one or more of control circuitry 404 (FIG. 4), processing circuitry 406 (FIG. 4), or storage 408 (FIG. 4).
  • steps of process 600 may be incorporated into or combined with one or more steps of any other process (e.g., as described in FIGS. 5, 7, and 8).
  • process 600 receives (e.g., using control circuitry 404 (FIG. 4)) an API request for interpreting a command, wherein the API request includes an image of a user interface as displayed on a display screen when the command was received.
  • the system may receive an API request as shown in FIG. 9 below.
  • process 600 determines (e.g., using control circuitry 404 (FIG. 4)) a command response based on the command and the image. For example, as discussed below in FIG. 7, the system may determine a response to the API request based on both the command received from the user as well as the supplemental content (e.g., an image) received with the API request.
  • process 600 generates (e.g., using control circuitry 404 (FIG. 4)) an API response based on the command response. For example, after determining a command response at step 604, the system generates an API response.
  • the API response may be generated in the same format of the API request as described in FIG. 9 below.
  • the API request and response may take any format (e.g., JSON or XML).
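For illustration, the same command response serialized both ways; the element and field names are hypothetical, and only Python standard-library serializers are used.

```python
import json
from xml.etree.ElementTree import Element, SubElement, tostring

def response_as_json(action: str, target: str) -> str:
    # JSON form of a simple API response.
    return json.dumps({"action": action, "target": target})

def response_as_xml(action: str, target: str) -> str:
    # Equivalent XML form of the same response.
    root = Element("apiResponse")
    SubElement(root, "action").text = action
    SubElement(root, "target").text = target
    return tostring(root, encoding="unicode")
```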
  • process 600 transmits (e.g., using control circuitry 404 (FIG. 4)) the API response.
  • the API response may be transmitted to a second device (e.g., the device that issued the API request) or a second application (e.g., the application that issued the API request).
  • FIG. 7 is a flow chart of illustrative steps involved in customizing an API response in accordance with some embodiments of the disclosure. It should be noted that process 700 or any step thereof could be displayed on, or provided by, one or more devices (e.g., device 400 (FIG. 4)). For example, process 700 may be executed using one or more of control circuitry 404 (FIG. 4), processing circuitry 406 (FIG. 4), or storage 408 (FIG. 4). In addition, one or more steps of process 700 may be incorporated into or combined with one or more steps of any other process (e.g., as described in FIGS. 5, 6, and 8).
  • process 700 determines (e.g., using control circuitry 404 (FIG. 4)) an object in the image.
  • the system may detect the object by metadata or other tags in content or through the use of machine learning approaches such as edge orientation histograms, scale-invariant feature transform descriptors, polygons, vectors, etc.
  • process 700 determines (e.g., using control circuitry 404 (FIG. 4)) whether or not to customize the response to the API request based on the context of the object. This determination may be an automatic determination based on information in the API request or information supplementing the API request.
  • the system may determine what information to use to customize the response. Alternatively or additionally, the system may look for instructions on what information to use to customize the response. Alternatively or additionally, the system may allow a user to manually determine or select presets for how API responses should be customized. If process 700 determines not to customize the response based on the context of the object, process 700 continues to step 714. If process 700 determines to customize the response based on the context of the object, process 700 continues to step 706.
  • process 700 determines a context for the user interface based on the object. For example, the system may input the object into a lookup table database that lists the context of a given object (e.g., record 224 (FIG. 2)). The system may then receive an output of the context for that object.
  • process 700 customizes the API response based on the context.
  • the system may generate the API response based on the context (i.e., the system may modify the API response to the command in the API request based on the context of an object found in an image of the user interface, display capture or rendering file).
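The lookup-and-customize steps can be sketched as follows, assuming the lookup table is a simple object-to-context mapping; the table contents and field names are hypothetical:

```python
# Hypothetical lookup table mapping a detected object to the context
# of the user interface in which it appears (cf. record 224 of FIG. 2).
OBJECT_CONTEXTS = {
    "movie_poster": "list_of_movies",
    "volume_slider": "playback_controls",
}

def customize_by_context(api_response, detected_object):
    """Attach the object's context to the API response so the command
    can be interpreted relative to what was on screen."""
    context = OBJECT_CONTEXTS.get(detected_object)
    if context is not None:
        api_response = dict(api_response, context=context)
    return api_response

resp = customize_by_context({"action": "select"}, "movie_poster")
print(resp)
```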
  • process 700 determines (e.g., using control circuitry 404 (FIG. 4)) whether or not to customize the response to the API request based on the context of the position of the object. This determination may be an automatic determination based on information in the API request or information supplementing the API request. For example, based on a file type of the information supplementing the API request, the system may determine what information to use to customize the response. Alternatively or additionally, the system may look for instructions on what information to use to customize the response. Alternatively or additionally, the system may allow a user to manually determine or select presets for how API responses should be customized. If process 700 determines not to customize the response based on the position of the object, process 700 continues to step 724.
  • process 700 determines a position of the object. For example, the system may input the object into a lookup table database that lists the position of a given object (e.g., record 124 (FIG. 1)). The system may then receive an output of the position for that object. Alternatively or additionally, the system may determine the position of the object as part of, or instead of, the detection of the boundaries of the object, as described below in FIG. 8. Alternatively or additionally, the system may determine the object itself from the vector or polygon information.
  • process 700 customizes the API response based on the position.
  • the system may generate the API response based on the position (i.e., the system may modify the API response to the command in the API request based on the position of an object found in an image of the user interface).
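A sketch of position-based customization, assuming the object's boundaries arrive as polygon vertices; the centroid heuristic is an illustrative choice, not one mandated by the disclosure:

```python
def object_position(polygon):
    """Approximate the object's position as the centroid of its
    bounding polygon, given as a list of (x, y) vertices."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return sum(xs) / len(xs), sum(ys) / len(ys)

def customize_by_position(api_response, polygon):
    """Modify the API response based on where the object sits in the
    captured image (e.g., to resolve "the one on the left")."""
    x, y = object_position(polygon)
    return dict(api_response, position={"x": x, "y": y})

# A rectangular object spanning (0, 0) to (100, 50).
resp = customize_by_position({"action": "play"},
                             [(0, 0), (100, 0), (100, 50), (0, 50)])
print(resp["position"])
```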
  • process 700 determines (e.g., using control circuitry 404 (FIG. 4)) whether or not to customize the response to the API request based on a word (or other text, alphanumeric character, etc.) of the object.
  • This determination may be an automatic determination based on information in the API request or information supplementing the API request. For example, based on a file type of the information supplementing the API request, the system may determine what information to use to customize the response. Alternatively or additionally, the system may call or query for instructions on what information to use to customize the response. Alternatively or additionally, the system may allow a user to manually determine or select presets for how API responses should be customized. If process 700 determines not to customize the response based on a word corresponding to the object, process 700 continues to step 734. If process 700 determines to customize the response based on the word corresponding to the object, process 700 continues to step 726.
  • process 700 determines a word in (or corresponding to) the object. For example, the system may input the object into a lookup table database that lists the OCR’ed content in a given object (e.g., record 224 (FIG. 2)). The system may then receive an output of the word for that object.
  • process 700 customizes the API response based on the word. For example, the system may generate the API response based on the word (i.e., the system may modify the API response to the command in the API request based on the word corresponding to an object found in an image of the user interface).
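The word-based customization can be sketched as follows; the OCR record and the disambiguation rule are illustrative assumptions:

```python
# Hypothetical record of OCR'ed text for objects detected in the image
# (cf. record 224 of FIG. 2).
OCR_WORDS = {"object_17": "Inception"}

def customize_by_word(api_response, object_id, command_text):
    """If the command contains an ambiguous reference ("play that"),
    substitute the word OCR'ed from the on-screen object."""
    word = OCR_WORDS.get(object_id)
    if word and "that" in command_text:
        api_response = dict(api_response, target=word)
    return api_response

resp = customize_by_word({"action": "play"}, "object_17", "play that")
print(resp)
```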
  • process 700 transmits the API response based on the one or more customizations in steps 708, 718, or 728. It should be noted that in some embodiments, step 734 corresponds to step 608.
  • FIG. 8 is a flow chart of illustrative steps involved in determining the context of a user interface.
  • process 800 or any step thereof could be displayed on, or provided by, one or more devices (e.g., device 400 (FIG. 4)).
  • process 800 may be executed using one or more of control circuitry 404 (FIG. 4), processing circuitry 406 (FIG. 4), or storage 408 (FIG. 4).
  • one or more steps of process 800 may be incorporated into or combined with one or more steps of any other process (e.g., as described in FIGS. 5-7).
  • process 800 determines (e.g., using control circuitry 404 (FIG. 4)) an object in the image.
  • the system may detect the object by metadata or other tags in content, or through the use of machine learning approaches such as edge orientation histograms, scale-invariant feature transform descriptors, vectors, polygons, etc.
  • process 800 determines (e.g., using control circuitry 404 (FIG. 4)) boundaries of objects in the image or the objects themselves. For example, the system may identify points in the image at which the image brightness changes sharply or has discontinuities (edge detection) and/or partition the image into multiple segments or sets of pixels (texture segmentation). It should be noted that in some embodiments, the detection of the object in step 802 may include the determination of the boundaries at step 804. In such case, the system stores the boundaries of the object for use in template matching in step 806.
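The edge-detection part of this step can be illustrated with a naive sketch that flags pixels where brightness changes sharply between neighbors. The threshold and the sample patch are made up; a production system would use a standard edge detector rather than this toy loop:

```python
def edge_points(image, threshold=50):
    """Flag pixels where horizontal or vertical brightness changes
    sharply (i.e., the difference to the next pixel exceeds the
    threshold), approximating edge detection on a grayscale image."""
    h, w = len(image), len(image[0])
    edges = []
    for y in range(h):
        for x in range(w):
            right = abs(image[y][x] - image[y][x + 1]) if x + 1 < w else 0
            down = abs(image[y][x] - image[y + 1][x]) if y + 1 < h else 0
            if max(right, down) > threshold:
                edges.append((x, y))
    return edges

# A 4x4 grayscale patch with a bright square in the lower-right corner.
patch = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 200, 200],
    [0, 0, 200, 200],
]
print(edge_points(patch))
```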
  • process 800 matches (e.g., using control circuitry 404 (FIG. 4)) the boundaries of objects to a user interface template of a plurality of user interface templates, wherein each of the plurality of user interface templates corresponds to a respective context.
  • the system may input the template into a lookup table database that lists the context of a given template. The system may then receive an output of the context that matches the inputted template.
  • process 800 determines (e.g., using control circuitry 404 (FIG. 4)) the context for the user interface based on the respective context for the user interface template.
  • the system may then customize the determined API response based on the context.
  • the context may be used to determine the circumstances of the command in terms of which it can be fully understood and assessed by the system. For example, if the context relates to a list of movies, the system may account for that context when determining the response.
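The template-matching and context-lookup steps above can be sketched as follows. The templates, the bounding-box layout, and the distance score are all illustrative assumptions, not the matching method prescribed by the disclosure:

```python
# Hypothetical user interface templates, each described by the rough
# bounding boxes (x1, y1, x2, y2) of its objects plus a context.
TEMPLATES = {
    "movie_grid": {"boxes": [(0, 0, 50, 50), (60, 0, 110, 50)],
                   "context": "list_of_movies"},
    "settings":   {"boxes": [(0, 0, 200, 20)],
                   "context": "settings_menu"},
}

def box_distance(a, b):
    return sum(abs(p - q) for p, q in zip(a, b))

def match_template(object_boxes):
    """Pick the template whose object boxes are closest, in total, to
    the boundaries detected in the image; return that template's
    context. A mismatched object count is heavily penalized."""
    def score(name):
        t = TEMPLATES[name]["boxes"]
        pairs = zip(sorted(object_boxes), sorted(t))
        return (sum(box_distance(a, b) for a, b in pairs)
                + abs(len(object_boxes) - len(t)) * 1000)
    best = min(TEMPLATES, key=score)
    return TEMPLATES[best]["context"]

# Boundaries close to the two tiles of the "movie_grid" template.
print(match_template([(1, 1, 49, 52), (58, 0, 111, 49)]))
```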
  • FIG. 9 is an illustrative example of a supplemented API request in accordance with some embodiments of the disclosure.
  • API request 900 includes URL 902, body 904, body 906, and method 908.
  • API request 900 may correspond to one half of the API request-response cycle between one or more devices and/or applications. For example, communication in HTTP (Hypertext Transfer Protocol) centers on the request-response cycle, in which the client (e.g., a first device and/or application) sends a request to a URL (Uniform Resource Locator) and the server returns a response.
  • URL 902 allows the client to inform the server (e.g., a second device and/or application) what resources to use. For example, URL 902 directs the server to the “VoiceRecognitionApplication.”
  • API request 900 also includes body 904 and body 906, which contain headers and data.
  • the headers (e.g., “Content-Type”) provide metadata about the request. For example, the header information may be used to determine what information should be used to customize a response (e.g., as described in FIG. 7).
  • Body 904 and body 906 also include data (i.e., files). For example, body 904 corresponds to an image (e.g., a screenshot of a user interface), while body 906 corresponds to an audio track (e.g., a recording of a voice command issued by a user).
  • Method 908 informs the server of the action the client wants the server to take.
  • Method 908 indicates a “POST” request asking the server to create a new resource.
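The shape of a supplemented API request like request 900 can be sketched as a multipart POST that carries both the screenshot and the voice recording. Only the “VoiceRecognitionApplication” resource and the POST method come from the example above; the host name, field names, and byte payloads are hypothetical:

```python
import uuid

def build_supplemented_request(url, command_audio, screenshot_png):
    """Assemble a POST request that supplements the voice command
    (audio body) with an image of the user interface (image body),
    joined into one multipart message."""
    boundary = uuid.uuid4().hex
    body = b""
    for name, content_type, data in [
        ("image", "image/png", screenshot_png),   # cf. body 904
        ("audio", "audio/wav", command_audio),    # cf. body 906
    ]:
        head = (f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n'
                f"Content-Type: {content_type}\r\n\r\n")
        body += head.encode() + data + b"\r\n"
    body += f"--{boundary}--\r\n".encode()
    return {
        "method": "POST",  # asks the server to create a new resource
        "url": url,
        "headers": {"Content-Type":
                    f"multipart/form-data; boundary={boundary}"},
        "body": body,
    }

req = build_supplemented_request(
    "https://example.com/VoiceRecognitionApplication",
    command_audio=b"RIFF-fake-audio",
    screenshot_png=b"\x89PNG-fake-image")
print(req["headers"]["Content-Type"])
```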
  • a method for facilitating communications using application programming interfaces (“APIs”) comprising: generating for display, by control circuitry, a user interface on a display screen; while the user interface is displayed, receiving, by the control circuitry, a command; in response to receiving the command, capturing, by the control circuitry, an image of the user interface; generating an application programming interface (“API”) request for interpreting the command, wherein the API request includes the image; and receiving, by the control circuitry, an API response to the API request, wherein the API response is customized based on the image.
  • API response is customized based on the image by: determining an object in the image; determining a context for the user interface based on the object; and generating the API response based on the context.
  • the object in the image is determined by: determining boundaries of objects in the image; matching the boundaries of objects to a user interface template of a plurality of user interface templates, wherein each of the plurality of user interface templates corresponds to a respective context; and determining the context for the user interface based on the respective context for the user interface template.
  • API response is customized based on the image by: determining an object in the image; determining a word corresponding to the object in the user interface; and generating the API response based on the word.
  • the method of item 1 further comprising transmitting, by the control circuitry, the API request from a first device to a second device.
  • a system for facilitating communications using application programming interfaces (“APIs”) comprising: control circuitry configured to: generate for display a user interface on a display screen; receive a command while the user interface is displayed; capture an image of the user interface in response to receiving the command; and generate an application programming interface (“API”) request for interpreting the command, wherein the API request includes the image; and input circuitry configured to: receive an API response to the API request, wherein the API response is customized based on the image.
  • the control circuitry is further configured to: determine an object in the image; determine a context for the user interface based on the object; and generate the API response based on the context.
  • control circuitry is further configured to: determine boundaries of objects in the image; match the boundaries of objects to a user interface template of a plurality of user interface templates, wherein each of the plurality of user interface templates corresponds to a respective context; and determine the context for the user interface based on the respective context for the user interface template.
  • control circuitry is further configured to: determine an object in the image; determine a position of the object in the user interface; and generate the API response based on the position.
  • control circuitry is further configured to: determine an object in the image; determine a word corresponding to the object in the user interface; and generate the API response based on the word.
  • control circuitry is further configured to cache the image in the API request.
  • a method for facilitating communications using application programming interfaces (“APIs”) comprising: generating for display a user interface on a display screen; while the user interface is displayed, receiving a command; in response to receiving the command, capturing an image of the user interface; generating an application programming interface (“API”) request for interpreting the command, wherein the API request includes the image; and receiving an API response to the API request, wherein the API response is customized based on the image.
  • the object in the image is determined by: determining boundaries of objects in the image; matching the boundaries of objects to a user interface template of a plurality of user interface templates, wherein each of the plurality of user interface templates corresponds to a respective context; and determining the context for the user interface based on the respective context for the user interface template.
  • API response is customized based on the image by: determining an object in the image; determining a word corresponding to the object in the user interface; and generating the API response based on the word.
  • a system for facilitating communications using application programming interfaces (“APIs”) comprising: means for generating for display a user interface on a display screen; means for receiving a command while the user interface is displayed; means for capturing an image of the user interface in response to receiving the command; means for generating an application programming interface (“API”) request for interpreting the command, wherein the API request includes the image; and means for receiving an API response to the API request, wherein the API response is customized based on the image.
  • the means for generating the API request is further configured to: determine an object in the image; determine a context for the user interface based on the object; and generate the API response based on the context.
  • the means for generating the API request is further configured to: determine boundaries of objects in the image; match the boundaries of objects to a user interface template of a plurality of user interface templates, wherein each of the plurality of user interface templates corresponds to a respective context; and determine the context for the user interface based on the respective context for the user interface template.
  • the means for generating the API request is further configured to: determine an object in the image; determine a position of the object in the user interface; and generate the API response based on the position.
  • the means for generating the API request is further configured to: determine an object in the image; determine a word corresponding to the object in the user interface; and generate the API response based on the word.
  • a non-transitory computer-readable medium having instructions recorded thereon for facilitating communications using application programming interfaces (“APIs”), the instructions comprising: an instruction for generating for display a user interface on a display screen; an instruction for receiving a command while the user interface is displayed; an instruction for capturing an image of the user interface in response to receiving the command; an instruction for generating an application programming interface (“API”) request for interpreting the command, wherein the API request includes the image; and an instruction for receiving an API response to the API request, wherein the API response is customized based on the image.
  • the object in the image is determined by: determining boundaries of objects in the image; matching the boundaries of objects to a user interface template of a plurality of user interface templates, wherein each of the plurality of user interface templates corresponds to a respective context; and determining the context for the user interface based on the respective context for the user interface template.
  • the non-transitory computer-readable medium of item 41 further comprising an instruction for transmitting the API request from a first device to a second device.
  • a method for facilitating communications using application programming interfaces (“APIs”) comprising: receiving, by control circuitry, an application programming interface (“API”) request for interpreting a command, wherein the API request includes an image of a user interface as displayed on a display screen when the command was received; determining, by the control circuitry, a command response based on the command and the image; generating an API response based on the command response; and transmitting the API response.
  • determining the command response based on the command and the image further comprises: determining an object in the image; determining a context for the user interface based on the object; and customizing the command response based on the context.
  • the object in the image is determined by: determining boundaries of objects in the image; matching the boundaries of objects to a user interface template of a plurality of user interface templates, wherein each of the plurality of user interface templates corresponds to a respective context; and determining the context for the user interface based on the respective context for the user interface template.
  • determining the command response based on the command and the image further comprises: determining an object in the image; determining a position of the object in the user interface; and customizing the command response based on the position.
  • determining the command response based on the command and the image further comprises: determining an object in the image; determining a word corresponding to the object in the user interface; and customizing the command response based on the word.
  • determining the command response based on the command and the image comprises interpreting the command based on an object in the image.
  • a system for facilitating communications using application programming interfaces (“APIs”) comprising: control circuitry configured to: receive an application programming interface (“API”) request for interpreting a command, wherein the API request includes an image of a user interface as displayed on a display screen when the command was received; determine a command response based on the command and the image; and generate an API response based on the command response; and output circuitry configured to transmit the API response.
  • control circuitry is further configured to: determine an object in the image; determine a context for the user interface based on the object; and customize the command response based on the context.
  • control circuitry is further configured to: determine boundaries of objects in the image; match the boundaries of objects to a user interface template of a plurality of user interface templates, wherein each of the plurality of user interface templates corresponds to a respective context; and determine the context for the user interface based on the respective context for the user interface template.
  • control circuitry is further configured to: determine an object in the image; determine a position of the object in the user interface; and customize the command response based on the position.
  • control circuitry is further configured to: determine an object in the image; determine a word corresponding to the object in the user interface; and customize the command response based on the word.
  • determining the command response based on the command and the image comprises interpreting the command based on an object in the image.
  • control circuitry is further configured to cache the image in the API request.
  • image is captured prior to modifying the user interface in response to the command.
  • a method for facilitating communications using application programming interfaces (“APIs”) comprising: receiving an application programming interface (“API”) request for interpreting a command, wherein the API request includes an image of a user interface as displayed on a display screen when the command was received; determining a command response based on the command and the image; generating an API response based on the command response; and transmitting the API response.
  • determining the command response based on the command and the image further comprises: determining an object in the image; determining a context for the user interface based on the object; and customizing the command response based on the context.
  • the object in the image is determined by: determining boundaries of objects in the image; matching the boundaries of objects to a user interface template of a plurality of user interface templates, wherein each of the plurality of user interface templates corresponds to a respective context; and determining the context for the user interface based on the respective context for the user interface template.
  • determining the command response based on the command and the image further comprises: determining an object in the image; determining a position of the object in the user interface; and customizing the command response based on the position.
  • determining the command response based on the command and the image further comprises: determining an object in the image; determining a word corresponding to the object in the user interface; and customizing the command response based on the word.
  • determining the command response based on the command and the image comprises interpreting the command based on an object in the image.
  • a system for facilitating communications using application programming interfaces (“APIs”) comprising: means for receiving an application programming interface (“API”) request for interpreting a command, wherein the API request includes an image of a user interface as displayed on a display screen when the command was received; means for determining a command response based on the command and the image; means for generating an API response based on the command response; and means for transmitting the API response.
  • the means for determining the command response based on the command and the image further comprises: means for determining an object in the image; means for determining a context for the user interface based on the object; and means for customizing the command response based on the context.
  • the object in the image is determined by: determining boundaries of objects in the image; matching the boundaries of objects to a user interface template of a plurality of user interface templates, wherein each of the plurality of user interface templates corresponds to a respective context; and determining the context for the user interface based on the respective context for the user interface template.
  • the means for determining the command response based on the command and the image further comprises: means for determining an object in the image; means for determining a position of the object in the user interface; and means for customizing the command response based on the position.
  • the means for determining the command response based on the command and the image further comprises: means for determining an object in the image; means for determining a word corresponding to the object in the user interface; and means for customizing the command response based on the word.
  • a non-transitory computer-readable medium having instructions recorded thereon for facilitating communications using application programming interfaces (“APIs”), the instructions comprising: an instruction for receiving an application programming interface (“API”) request for interpreting a command, wherein the API request includes an image of a user interface as displayed on a display screen when the command was received; an instruction for determining a command response based on the command and the image; an instruction for generating an API response based on the command response; and an instruction for transmitting the API response.
  • the instruction for determining the command response based on the command and the image further comprises: determining an object in the image; determining a context for the user interface based on the object; and customizing the command response based on the context.
  • the object in the image is determined by: determining boundaries of objects in the image; matching the boundaries of objects to a user interface template of a plurality of user interface templates, wherein each of the plurality of user interface templates corresponds to a respective context; and determining the context for the user interface based on the respective context for the user interface template.
  • non-transitory computer-readable medium of item 91 further comprising an instruction for caching, by the control circuitry, the image in the API request.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Disclosed are methods and systems for facilitating communications using application programming interfaces (APIs) by interpreting a received command based on the command and an image of the user interface that was displayed on a display screen when the command was received.
PCT/US2020/035191 2019-06-04 2020-05-29 Methods and systems for facilitating application programming interface communications WO2020247259A1 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US16/430,711 2019-06-04
US16/430,719 2019-06-04
US16/430,711 US11249823B2 (en) 2019-06-04 2019-06-04 Methods and systems for facilitating application programming interface communications
US16/430,719 US10990456B2 (en) 2019-06-04 2019-06-04 Methods and systems for facilitating application programming interface communications

Publications (1)

Publication Number Publication Date
WO2020247259A1 true WO2020247259A1 (fr) 2020-12-10

Family

Family ID: 72179171

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2020/035191 WO2020247259A1 (fr) 2019-06-04 2020-05-29 Methods and systems for facilitating application programming interface communications

Country Status (1)

Country Link
WO (1) WO2020247259A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024179203A1 (fr) * 2023-03-02 2024-09-06 华为技术有限公司 Procédé de commande vocale et dispositif électronique

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120254810A1 (en) * 2011-03-31 2012-10-04 Microsoft Corporation Combined Activation for Natural User Interface Systems
US20160259775A1 (en) * 2015-03-08 2016-09-08 Speaktoit, Inc. Context-based natural language processing
US20170031652A1 (en) * 2015-07-29 2017-02-02 Samsung Electronics Co., Ltd. Voice-based screen navigation apparatus and method
US20180336009A1 (en) * 2017-05-22 2018-11-22 Samsung Electronics Co., Ltd. System and method for context-based interaction for electronic devices


Similar Documents

Publication Publication Date Title
US10148928B2 (en) Generating alerts based upon detector outputs
US20190373322A1 (en) Interactive Video Content Delivery
US20180152767A1 (en) Providing related objects during playback of video data
US10198498B2 (en) Methods and systems for updating database tags for media content
US10333767B2 (en) Methods, systems, and media for media transmission and management
JP2021525031A (ja) 埋め込まれた情報カード位置特定およびコンテンツ抽出のためのビデオ処理
US20160014482A1 (en) Systems and Methods for Generating Video Summary Sequences From One or More Video Segments
US11627379B2 (en) Systems and methods for navigating media assets
US20150293995A1 (en) Systems and Methods for Performing Multi-Modal Video Search
US20140255003A1 (en) Surfacing information about items mentioned or presented in a film in association with viewing the film
CN110663079A (zh) 基于语音纠正使用自动语音识别生成的输入的方法和系统
US10419799B2 (en) Systems and methods for navigating custom media presentations
US9542395B2 (en) Systems and methods for determining alternative names
US20150012946A1 (en) Methods and systems for presenting tag lines associated with media assets
US11249823B2 (en) Methods and systems for facilitating application programming interface communications
US10990456B2 (en) Methods and systems for facilitating application programming interface communications
WO2020247259A1 (fr) Methods and systems for facilitating application programming interface communications
US20220317968A1 (en) Voice command processing using user interface context
US20240305865A1 (en) Methods and systems for automated content generation
US11856245B1 (en) Smart automatic skip mode

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20760616

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20760616

Country of ref document: EP

Kind code of ref document: A1