WO2021061620A1 - Inserting owner-specified data processing pipelines into an input/output path of an object storage service - Google Patents

Inserting owner-specified data processing pipelines into an input/output path of an object storage service

Info

Publication number
WO2021061620A1
WO2021061620A1 (PCT/US2020/051955)
Authority
WO
WIPO (PCT)
Prior art keywords
data
execution
code
request
output
Prior art date
Application number
PCT/US2020/051955
Other languages
English (en)
Other versions
WO2021061620A9 (fr)
Inventor
Kevin C. Miller
Ramyanshu Datta
Timothy Lawrence Harris
Original Assignee
Amazon Technologies, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US16/586,619 external-priority patent/US11106477B2/en
Priority claimed from US16/586,673 external-priority patent/US11360948B2/en
Priority claimed from US16/586,704 external-priority patent/US11055112B2/en
Application filed by Amazon Technologies, Inc. filed Critical Amazon Technologies, Inc.
Priority to CN202080067195.5A priority Critical patent/CN114586011B/zh
Priority to EP20786202.0A priority patent/EP4035007A1/fr
Publication of WO2021061620A1 publication Critical patent/WO2021061620A1/fr
Publication of WO2021061620A9 publication Critical patent/WO2021061620A9/fr

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5077Logical partitioning of resources; Management or configuration of virtualized resources
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline, look ahead
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/445Program loading or initiating
    • G06F9/44568Immediately runnable code
    • G06F9/44573Execute-in-place [XIP]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/544Buffers; Shared memory; Pipes
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/02Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
    • H04L63/0272Virtual private networks

Definitions

  • data centers provide a number of other beneficial services to client devices.
  • data centers may provide data storage services configured to store data submitted by client devices, and enable retrieval of that data over a network.
  • data storage services can be provided, often varying according to their input/output (I/O) mechanisms.
  • database services may allow I/O based on a database query language, such as the Structured Query Language (SQL).
  • Block storage services may allow I/O based on modification to one or more defined-length blocks, in a manner similar to how an operating system interacts with local storage, and may thus facilitate virtualized disk drives usable, for example, to store an operating system of a virtual machine instance. Object storage services may allow I/O at the level of individual objects or resources, such as individual files, which may vary in content and length.
  • FIG. 1 is a block diagram depicting an illustrative environment in which an object storage service can operate in conjunction with an on-demand code execution system to implement functions in connection with input/output (I/O) requests to the object storage service;
  • FIG. 2 depicts a general architecture of a computing device providing a frontend of the object storage service of FIG. 1;
  • FIG. 3 is a flow diagram depicting illustrative interactions for enabling a client device to modify an I/O path for the object storage service by insertion of a function implemented by execution of a task on the on-demand code execution system;
  • FIG. 4 is an illustrative visualization of a pipeline of functions to be applied to an I/O path for the object storage service of FIG. 1;
  • FIGS. 5A-5B show a flow diagram depicting illustrative interactions for handling a request to store input data as an object on the object storage service of FIG. 1, including execution of an owner-specified task to the input data and storage of output of the task as the object;
  • FIG. 8 is a flow chart depicting an illustrative routine for executing a task on the on-demand code execution system of FIG. 1 to enable data manipulations during implementation of an owner-defined function.
  • aspects of the present disclosure relate to handling requests to read or write to data objects on an object storage system. More specifically, aspects of the present disclosure relate to modification of an input/output (I/O) path for an object storage service, such that one or more data manipulations can be inserted into the I/O path to modify the data to which a called request method is applied, without requiring a calling client device to specify such data manipulations.
  • data manipulations occur through execution of user-submitted code, which may be provided for example by an owner of a collection of data objects on an object storage system in order to control interactions with that data object.
  • an owner of an object collection may wish to ensure that end users do not submit objects to the collection that include any personally identifying information (to ensure end users' privacy);
  • the owner may submit code executable to strip such information from a data input.
  • the owner may further specify that such code should be executed during each write of a data object to the collection.
  • the code may be first executed against the input data, and resulting output data may be written to the collection as the data object.
  • an on-demand code execution system may generate an execution environment for the code, provision the environment with the code, execute the code, and provide a result
  • an on-demand code execution system can remove a need for a user to handle configuration and management of environments for code execution.
  • Example techniques for implementing an on-demand code execution system are disclosed, for example, within U.S. Patent No. 9,323,556, entitled “PROGRAMMATIC EVENT DETECTION AND MESSAGE GENERATION FOR REQUESTS TO EXECUTE PROGRAM CODE,” and filed September 30, 2014 (the ‘“556 Patent”), the entirety of which is hereby incorporated by reference.
  • the task could provide an interface similar or identical to that of the object storage service, and be operable to obtain input data in response to a request method call (e.g., HTTP PUT or GET calls), execute the code of the task against the input data, and perform a call to the object storage service for implementation of the request method on resulting output data. End users might be required under this scenario to submit I/O requests to the on-demand code execution system, rather than the object storage service, to ensure execution of the task.
  • this technique may require that code of a task be authored to both provide an interface to end users that enables handling of calls to implement request methods on input data, and an interface that enables performance of calls from the task execution to the object storage service.
  • Implementation of these network interfaces may significantly increase the complexity of the required code, thus disincentivizing owners of data collections from using this technique.
  • where user-submitted code directly implements network communication, that code may need to be varied according to the request method handled. For example, a first set of code may be required to support GET operations, a second set of code may be required to support PUT operations, etc. Because embodiments of the present disclosure relieve the user-submitted code of the requirement of handling network communications, one set of code may in some cases be enabled to handle multiple request methods.
  • embodiments of the present disclosure can enable strong integration of serverless task executions with interfaces of an object storage service, such that the service itself is configured to invoke a task execution on receiving an I/O request to a data collection.
  • generation of code to perform data manipulations may be simplified by configuring the object storage service to facilitate data input and output from a task execution, without requiring the task execution to itself implement network communications for I/O operations.
  • an object storage service and on-demand code execution system can be configured in one embodiment to “stage” input data to a task execution in the form of a handle (e.g., a POSIX-compliant descriptor) to an operating-system-level input/output stream, such that code of a task can manipulate the input data via defined-stream operations (e.g., as if the data existed within a local file system).
  • This stream-level access to input data can be contrasted, for example, with network-level access of input data, which generally requires that code implement network communication to retrieve the input data.
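As a concrete sketch of the stream-based model described above, the handler below reads staged input data from one stream handle and writes manipulated output to another, with no network code involved. The handler signature, the redaction logic, and the use of in-memory streams in place of real OS-level descriptors are illustrative assumptions, not the service's actual interface.

```python
import io
import re

def strip_emails(input_stream, output_stream):
    # Hypothetical task handler: consumes staged input data line by line
    # and writes redacted output, as if operating on local files.
    for line in input_stream:
        output_stream.write(
            re.sub(rb"[\w.+-]+@[\w-]+\.[\w.-]+", b"[REDACTED]", line))

# The service would stage real OS-level descriptors; in-memory byte
# streams stand in for them in this sketch.
src = io.BytesIO(b"contact: alice@example.com\nvalue: 42\n")
dst = io.BytesIO()
strip_emails(src, dst)
```

Because the handler only sees stream handles, the same code runs unchanged whether the data arrives via a PUT or is being returned via a GET.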
  • the object storage service and on-demand code execution system can similarly be configured to provide an output stream handle to which a task execution may write output
  • the object storage service and on-demand code execution system may handle such writes as output data of the task execution, and apply a called request method to the output data.
  • a general-use on-demand code execution system may operate permissively with respect to network communications from a task execution, enabling any network communication from the execution unless such communication is explicitly denied
  • This permissive model is reflective of the use of task executions as micro-services, which often require interaction with a variety of other network services. However, this permissive model also decreases security of the function, since potentially malicious network communications can also reach the execution.
  • task executions used to perform data manipulations on an object storage system’s I/O path can utilize a restrictive model, whereby only explicitly-allowed network communications can occur from an environment executing a task.
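The contrast between the two network models described above might be sketched as follows; the policy shapes and endpoint names are assumptions for illustration only.

```python
# Permissive model (general-use task executions): every destination is
# allowed unless explicitly denied.
DENY_LIST = {("blocked.example", 80)}

def permissive_allows(host, port):
    return (host, port) not in DENY_LIST

# Restrictive model (I/O-path data manipulations): every destination is
# denied unless explicitly allowed.
ALLOW_LIST = {("object-storage.internal", 443)}

def restrictive_allows(host, port):
    return (host, port) in ALLOW_LIST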
  • a data collection owner may require only a single data manipulation to occur with respect to I/O to the collection.
  • the object storage service may detect I/O to the collection, implement the data manipulation (e.g., by executing a serverless task within an environment provisioned with input and output handles), and apply the called request method to the resulting output data.
  • an owner may request multiple data manipulations occur with respect to an I/O path. For example, to increase portability and reusability, an owner may author multiple serverless tasks, which may be combined to define a series of serverless tasks to be executed on I/O to the path.
  • an object storage system may natively provide one or more data manipulations.
  • an object storage system may natively accept requests for only portions of an object (e.g., of a defined byte range), or may natively enable execution of queries against data of an object (e.g., SQL queries).
  • any combination of various native manipulations and serverless task-based manipulations may be specified for a given I/O path.
  • an owner may specify that, for a particular request to read an object, a given SQL query be executed against the object, the output of which is processed via a first task execution, the output of which is processed via a second task execution, etc.
  • the collection of data manipulations (e.g., native manipulations, serverless task-based manipulations, or a combination thereof) applied to an I/O path is generally referred to herein as a data processing “pipeline” applied to the I/O path.
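Conceptually, such a pipeline is an ordered sequence of manipulations applied before the called request method runs; in this sketch plain functions stand in for both native and task-based stages, and the stage implementations are illustrative assumptions.

```python
def apply_pipeline(stages, data):
    # Apply each data manipulation in order; the output of one stage is
    # the input to the next.
    for stage in stages:
        data = stage(data)
    return data

# Stage 1: a native, SQL-like filter selecting adult records.
select_adults = lambda rows: [r for r in rows if r["age"] >= 18]
# Stage 2: a task-based manipulation redacting names.
redact_names = lambda rows: [{**r, "name": "***"} for r in rows]

records = [{"name": "Ada", "age": 36}, {"name": "Bob", "age": 9}]
result = apply_pipeline([select_adults, redact_names], records)
```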
  • a particular path modification (e.g., the addition of a pipeline) applied to an I/O path may vary according to attributes of the path, such as a client device from which an I/O request originates or an object or collection of objects within the request.
  • pipelines may be applied to individual objects, such that the pipeline is applied to all I/O requests for the object, or a pipeline may be selectively applied only when certain client devices access the object.
  • an object storage service may provide multiple I/O paths for an object or collection.
  • the same object or collection may be associated with multiple resource identifiers on the object storage service, such that the object or collection can be accessed through the multiple identifiers (e.g., uniform resource identifiers, or URIs), which illustratively correspond to different network-accessible endpoints.
  • different pipelines may be applied to each I/O path for a given object.
  • a first I/O path may be associated with unprivileged access to a data set, and thus be subject to data manipulations that remove confidential information from the data set during retrieval.
  • a second I/O path may be associated with privileged access, and thus not be subject to those data manipulations.
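A minimal sketch of per-path pipeline selection under the example above; the endpoint names and the registry shape are hypothetical.

```python
# Each resource identifier (I/O path) maps to its own pipeline. The
# unprivileged endpoint strips a confidential field on retrieval; the
# privileged endpoint applies no manipulation.
PIPELINES = {
    "public.objects.example": [
        lambda obj: {k: v for k, v in obj.items() if k != "ssn"},
    ],
    "internal.objects.example": [],
}

def handle_get(endpoint, obj):
    for stage in PIPELINES.get(endpoint, []):
        obj = stage(obj)
    return obj

record = {"name": "Ada", "ssn": "123-45-6789"}
```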
  • pipelines may be selectively applied based on other criteria.
  • embodiments disclosed herein improve the ability of computing systems, such as object storage systems, to provide and enforce data manipulation functions against data objects.
  • prior techniques generally depend on external enforcement of data manipulation functions (e.g., requesting that users strip personal information before uploading it)
  • embodiments of the present disclosure enable direct insertion of data manipulation into an I/O path for the object storage system.
  • embodiments of the present disclosure provide a secure mechanism for implementing data manipulations, by providing for serverless execution of manipulation functions within an isolated execution environment.
  • Embodiments of the present disclosure further improve operation of serverless functions, by enabling such functions to operate on the basis of local stream (e.g., “file”) handles, rather than requiring that functions act as network-accessible services.
  • the presently disclosed embodiments therefore address technical problems inherent within computing systems, such as the difficulty of enforcing data manipulations at storage systems and the complexity of creating external services to enforce such data manipulations.
  • These technical problems are addressed by the various technical solutions described herein, including the insertion of data processing pipelines into an I/O path for an object or object collection, potentially without knowledge of a requesting user, the use of serverless functions to perform aspects of such pipelines, and the use of local stream handles to enable simplified creation of serverless functions.
  • the present disclosure represents an improvement on existing data processing systems and computing systems in general.
  • the on-demand code execution system may provide a network-accessible service enabling users to submit or designate computer- executable source code to be executed by virtual machine instances on the on-demand code execution system.
  • Each set of code on the on-demand code execution system may define a “task,” and implement specific functionality corresponding to that task when executed on a virtual machine instance of the on-demand code execution system.
  • Individual implementations of the task on the on-demand code execution system may be referred to as an “execution” of the task (or a “task execution”).
  • the on-demand code execution system may enable users to directly trigger execution of a task based on a variety of potential events, such as transmission of an application programming interface (“API”) call to the on-demand code execution system, or transmission of a specially formatted hypertext transport protocol (“HTTP”) packet to the on-demand code execution system.
  • the on-demand code execution system may further interact with an object storage system, in order to execute tasks during application of a data manipulation pipeline to an I/O path.
  • the on-demand code execution system can therefore execute any specified executable code “on-demand,” without requiring configuration or maintenance of the underlying hardware or infrastructure on which the code is executed.
  • the on-demand code execution system may be configured to execute tasks in a rapid manner (e.g., in under 100 milliseconds [ms]), thus enabling execution of tasks in “real-time” (e.g., with little or no perceptible delay to an end user).
  • the on-demand code execution system can include one or more virtual machine instances that are “pre-warmed” or pre-initialized (e.g., booted into an operating system and executing a complete or substantially complete runtime environment) and configured to enable execution of user-defined code, such that the code may be rapidly executed in response to a request to execute the code, without delay caused by initializing the virtual machine instance.
  • the code corresponding to that task can be executed within a pre-initialized virtual machine in a very short amount of time.
  • virtual machine instance is intended to refer to an execution of software or other executable code that emulates hardware to provide an environment or platform on which software may execute (an example “execution environment”).
  • Virtual machine instances are generally executed by hardware devices, which may differ from the physical hardware emulated by the virtual machine instance.
  • a virtual machine may emulate a first type of processor and memory while being executed on a second type of processor and memory.
  • virtual machines can be utilized to execute software intended for a first execution environment (e.g., a first operating system) on a physical device that is executing a second execution environment (e.g., a second operating system).
  • hardware emulated by a virtual machine instance may be the same or similar to hardware of an underlying device.
  • a device with a first type of processor may implement a plurality of virtual machine instances, each emulating an instance of that first type of processor.
  • virtual machine instances can be used to divide a device into a number of logical sub-devices (each referred to as a “virtual machine instance”). While virtual machine instances can generally provide a level of abstraction away from the hardware of an underlying physical device, this abstraction is not required. For example, assume a device implements a plurality of virtual machine instances, each of which emulate hardware identical to that provided by the device.
  • each virtual machine instance may allow a software application to execute code on the underlying hardware without translation, while maintaining a logical separation between software applications running on other virtual machine instances.
  • This process, which is generally referred to as “native execution,” may be utilized to increase the speed or performance of virtual machine instances.
  • Other techniques that allow direct utilization of underlying hardware, such as hardware pass-through techniques, may be used as well.
  • while a virtual machine instance is one example of an execution environment, other execution environments are also possible.
  • tasks or other processes may be executed within a software “container,” which provides a runtime environment without itself providing virtualization of hardware.
  • Containers may be implemented within virtual machines to provide additional security, or may be run outside of a virtual machine instance.
  • the object storage service 160 can operate to enable clients to read, write, modify, and delete data objects, each of which represents a set of data associated with an identifier (an “object identifier” or “resource identifier”) that can be interacted with as an individual resource.
  • an object may represent a single file submitted by a client device 102 (though the object storage service 160 may or may not store such an object as a single file).
  • This object-level interaction can be contrasted with other types of storage services, such as block-based storage services providing data manipulation at the level of individual blocks or database storage services providing data manipulation at the level of tables (or parts thereof) or the like.
  • frontends 162, which provide an interface (e.g., a command-line interface (CLI), application programming interface (API), or other programmatic interface) through which client devices 102 can interface with the service 160 to configure the service 160 on their behalf and to perform I/O operations on the service 160.
  • a client device 102 may interact with a frontend 162 to create a collection of data objects on the service 160 (e.g., a “bucket” of objects) and to configure permissions for that collection.
  • Client devices 102 may thereafter create, read, update, or delete objects within the collection based on the interfaces of the frontends 162.
  • the frontend 162 provides a REST-compliant HTTP interface supporting a variety of request methods, each of which corresponds to a requested I/O operation on the service 160.
  • request methods may include, for example, GET operations to retrieve an object from the service 160 and PUT operations to store an object on the service 160.
  • the service 160 may provide a POST operation similar to a PUT operation but associated with a different upload mechanism (e.g., a browser-based HTML upload), or a HEAD operation enabling retrieval of metadata for an object without retrieving the object itself.
  • the service 160 may enable operations that combine one or more of the above operations, or that combine an operation with a native data manipulation.
  • the service 160 may provide a COPY operation enabling copying of an object stored on the service 160 to another object, which operation combines a GET operation with a PUT operation.
  • the service 160 may provide a SELECT operation enabling specification of an SQL query to be applied to an object prior to returning the contents of that object, which combines an application of an SQL query to a data object (a native data manipulation) with a GET operation.
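The effect of such a SELECT operation can be sketched with an in-memory SQL engine applied to a CSV-formatted object before the GET returns; the object format, the table schema, and the helper function are illustrative assumptions.

```python
import csv
import io
import sqlite3

def select_from_object(csv_bytes, query):
    # Load the object's rows into a throwaway table, run the caller's
    # query, and return only the query results (a native manipulation
    # combined with a GET).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE obj (name TEXT, age INTEGER)")
    rows = csv.reader(io.StringIO(csv_bytes.decode()))
    conn.executemany("INSERT INTO obj VALUES (?, ?)", rows)
    return conn.execute(query).fetchall()

result = select_from_object(
    b"Ada,36\nBob,9\n", "SELECT name FROM obj WHERE age >= 18")
```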
  • a call to a request method, along with any associated data (e.g., a portion of a data object), may be transmitted to the service via an HTTP request, which itself may include an HTTP method.
  • the HTTP method specified within the request may match the operation requested at the service 160.
  • the HTTP method of a request may not match the operation requested at the service 160.
  • a request may utilize an HTTP POST method to transmit a request to implement a SELECT operation at the service 160.
  • frontends 162 may be configured to obtain a call to a request method, and apply that request method to input data for the method.
  • a frontend 162 can respond to a request to PUT input data into the service 160 as an object by storing that input data as the object on the service 160.
  • Objects may be stored, for example, on object data stores 168, which correspond to any persistent or substantially persistent storage (including hard disk drives (HDDs), solid state drives (SSDs), network accessible storage (NAS), storage area networks (SANs), non-volatile random access memory (NVRAM), or any of a variety of storage devices known in the art).
  • the frontend 162 can respond to a request to GET an object from the service 160 by retrieving the object from the stores 168 (the object representing input data to the GET resource request), and returning the object to a requesting client device 102.
  • calls to a request method may invoke one or more native data manipulations provided by the service 160.
  • a SELECT operation may provide an SQL-formatted query to be applied to an object (also identified within the request), or a GET operation may provide a specific range of bytes of an object to be returned.
  • the service 160 illustratively includes an object manipulation engine 170 configured to perform native data manipulations, which illustratively corresponds to a device configured with software executable to implement native data manipulations on the service 160 (e.g., by stripping non-selected bytes from an object for a byte-range GET, by applying an SQL query to an object and returning results of the query, etc.).
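The byte-range case mentioned above reduces to returning only the selected slice of the object; a minimal sketch, with an inclusive range as in an HTTP Range request:

```python
def byte_range_get(obj, start, end):
    # Strip non-selected bytes from the object, returning only the
    # inclusive range [start, end].
    return obj[start:end + 1]
```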
  • the service 160 can further be configured to enable modification of an I/O path for a given object or collection of objects, such that a called request method is applied to an output of a data manipulation, rather than to the data originally identified in the request. For example, the service 160 may enable a client device 102 to specify that GET operations for a given object should be subject to execution of a user-defined task on the on-demand code execution system 120, such that the data returned in response to the operation is the output of a task execution rather than the requested object.
  • the service 160 may enable a client device 102 to specify that PUT operations to store a given object should be subject to execution of a user-defined task on the on-demand code execution system 120, such that the data stored in response to the operation is the output of a task execution rather than the data provided for storage by a client device 102.
  • path modifications may include specification of a pipeline of data manipulations, including native data manipulations, task-based manipulations, or combinations thereof.
  • a client device 102 may specify a pipeline or other data manipulation for an object or object collection through the frontend 162, which may store a record of the pipeline or manipulation in the I/O path modification data store 164, which store 164, like the object data stores 168, can represent any persistent or substantially persistent storage. While shown as distinct in FIG. 1, in some instances the data stores 164 and 168 may represent a single collection of data stores. For example, data modifications to objects or collections may themselves be stored as objects on the service 160.
  • the system further includes an on-demand code execution system 120.
  • the system 120 is solely usable by the object storage service 160 in connection with data manipulations of an I/O path.
  • the system 120 is additionally accessible by client devices 102 to directly implement serverless task executions.
  • the client devices 102, object storage service 160, and on-demand code execution system 120 may communicate via a network 104, which may include any wired network, wireless network, or combination thereof.
  • the network 104 may be a personal area network, local area network, wide area network, over-the-air broadcast network (e.g., for radio or television), cable network, satellite network, cellular telephone network, or combination thereof.
  • the network 104 may be a publicly accessible network of linked networks, possibly operated by various distinct parties, such as the Internet.
  • the network 104 may be a private or semi-private network, such as a corporate or university intranet.
  • the network 104 may include one or more wireless networks, such as a Global System for Mobile Communications (GSM) network, a Code Division Multiple Access (CDMA) network, a Long Term Evolution (LTE) network, or any other type of wireless network.
  • the network 104 can use protocols and components for communicating via the Internet or any of the other aforementioned types of networks.
  • the protocols used by the network 104 may include Hypertext Transfer Protocol (HTTP), HTTP Secure (HTTPS), Message Queue Telemetry Transport (MQTT), Constrained Application Protocol (CoAP), and the like. Protocols and components for communicating via the Internet or any of the other aforementioned types of communication networks are well known to those skilled in the art and, thus, are not described in more detail herein.
  • the system 120 includes one or more frontends 130, which enable interaction with the on-demand code execution system 120.
  • the frontends 130 serve as a “front door” to the other services provided by the on-demand code execution system 120, enabling users (via client devices 102) or the service 160 to provide, request execution of, and view results of computer executable code.
  • the frontends 130 include a variety of components to enable interaction between the on-demand code execution system 120 and other computing devices.
  • each frontend 130 may include a request interface providing client devices 102 and the service 160 with the ability to upload or otherwise communicate user-specified code to the on-demand code execution system 120 and to thereafter request execution of that code.
  • the request interface illustratively communicates with external computing devices via a graphical user interface (GUI), CLI, or API.
  • the frontends 130 process the requests and make sure that the requests are properly authorized. For example, the frontends 130 may determine whether the user associated with the request is authorized to access the user code specified in the request.
  • references to user code as used herein may refer to any program code (e.g., a program, routine, subroutine, thread, etc.) written in a specific program language.
  • program code may be used interchangeably.
  • Such user code may be executed to achieve a specific function, for example, in connection with a particular data transformation developed by the user.
  • individual collections of user code (e.g., to achieve a specific function) are referred to herein as “tasks,” while specific executions of that code (including, e.g., compiling code, interpreting code, or otherwise making the code executable) are referred to as “task executions” or simply “executions.” Tasks may be written, by way of non-limiting example, in JavaScript (e.g., node.js), Java, Python, or Ruby (or another programming language).
  • the frontend 130 can include an execution queue, which can maintain a record of requested task executions.
  • the number of simultaneous task executions by the on-demand code execution system 120 is limited, and as such, new task executions initiated at the on-demand code execution system 120 (e.g., via an API call, via a call from an executed or executing task, etc.) may be placed on the execution queue and processed, e.g., in a first-in-first-out order.
  • the on-demand code execution system 120 may include multiple execution queues, such as individual execution queues for each user account
  • users of the service provider system 110 may desire to limit the rate of task executions on the on-demand code execution system 120 (e.g., for cost reasons).
  • the on-demand code execution system 120 may utilize an account-specific execution queue to throttle the rate of simultaneous task executions by a specific user account.
  • the on-demand code execution system 120 may prioritize task executions, such that task executions of specific accounts or of specified priorities bypass or are prioritized within the execution queue.
  • the on-demand code execution system 120 may execute tasks immediately or substantially immediately after receiving a call for that task, and thus, the execution queue may be omitted.
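The queueing behavior described above can be sketched briefly in Python. This is a minimal illustration under stated assumptions, not the system's actual implementation: the `max_concurrent` limit and per-account keying stand in for the account-specific throttling described.

```python
from collections import defaultdict, deque

class ExecutionQueues:
    """Per-account FIFO queues with a simple concurrency limit,
    illustrating one way a frontend might throttle task executions."""

    def __init__(self, max_concurrent=2):
        self.max_concurrent = max_concurrent
        self.queues = defaultdict(deque)   # account_id -> queued calls
        self.running = defaultdict(int)    # account_id -> active executions

    def submit(self, account_id, call):
        """Start the call immediately if under the limit; otherwise queue it."""
        if self.running[account_id] < self.max_concurrent:
            self.running[account_id] += 1
            return ("started", call)
        self.queues[account_id].append(call)
        return ("queued", call)

    def complete(self, account_id):
        """Mark one execution finished and start the next queued call, if any."""
        self.running[account_id] -= 1
        if self.queues[account_id]:
            nxt = self.queues[account_id].popleft()  # first-in-first-out order
            self.running[account_id] += 1
            return ("started", nxt)
        return None
```

Priority handling or queue omission (immediate execution) would replace the FIFO `deque` with other policies, as the text notes.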
  • the output interface may transmit data regarding task executions (e.g., results of a task, errors related to the task execution, or details of the task execution, such as total time required to complete the execution, total data processed via the execution, etc.) to the client devices 102 or the object storage service 160.
  • the on-demand code execution system 120 may include multiple frontends 130.
  • a load balancer may be provided to distribute the incoming calls to the multiple frontends 130, for example, in a round-robin fashion.
  • the manner in which the load balancer distributes incoming calls to the multiple frontends 130 may be based on the location or state of other components of the on-demand code execution system 120. For example, a load balancer may distribute calls to a geographically nearby frontend 130, or to a frontend with capacity to service the call.
  • the load balancer may distribute calls according to the capacities or loads on those other components. Calls may in some instances be distributed between frontends 130 deterministically, such that a given call to execute a task will always (or almost always) be routed to the same frontend 130. This may, for example, assist in maintaining an accurate execution record for a task, to ensure that the task executes only a desired number of times. For example, calls may be distributed to load balance between frontends 130. Other distribution techniques, such as anycast routing, will be apparent to those of skill in the art.
  • the on-demand code execution system 120 further includes one or more worker managers 140 that manage the execution environments, such as virtual machine instances 150 (shown as VM instance 150A and 150B, generally referred to as a “VM”), used for servicing incoming calls to execute tasks. While the following will be described with reference to virtual machine instances 150 as examples of such environments, embodiments of the present disclosure may utilize other environments, such as software containers.
  • each worker manager 140 manages an active pool 148, which is a group (sometimes referred to as a pool) of virtual machine instances 150 executing on one or more physical host computing devices that are initialized to execute a given task (e.g., by having the code of the task and any dependency data objects loaded into the instance).
  • the instances 150 are described here as being assigned to a particular task, in some embodiments, the instances may be assigned to a group of tasks, such that the instance is tied to the group of tasks and any tasks of the group can be executed within the instance.
  • the tasks in the same group may belong to the same security group (e.g., based on their security credentials) such that executing one task in a container on a particular instance 150 after another task has been executed in another container on the same instance does not pose security risks.
  • a task may be associated with permissions encompassing a variety of aspects controlling how a task may execute. For example, permissions of a task may define what network connections (if any) can be initiated by an execution environment of the task.
  • permissions of a task may define what authentication information is passed to a task, controlling what network-accessible resources are accessible to execution of a task (e.g., objects on the service 160).
  • a security group of a task is based on one or more such permissions.
  • a security group may be defined based on a combination of permissions to initiate network connections and permissions to access network resources.
  • the tasks of the group may share common dependencies, such that an environment used to execute one task of the group can be rapidly modified to support execution of another task within the group.
  • each frontend 130 passes a request to a worker manager 140 to execute the task.
  • each frontend 130 may be associated with a corresponding worker manager 140 (e.g., a worker manager 140 co-located or geographically nearby to the frontend 130) and thus, the frontend 130 may pass most or all requests to that worker manager 140.
  • a frontend 130 may include a location selector configured to determine a worker manager 140 to which to pass the execution request
  • the location selector may determine the worker manager 140 to receive a call based on hashing the call, and distributing the call to a worker manager 140 selected based on the hashed value (e.g., via a hash ring).
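Hash-based selection as described above is commonly realized with a consistent hash ring. The sketch below is a generic illustration (the manager names and virtual-node count are hypothetical), not the service's actual routing code.

```python
import hashlib
from bisect import bisect_right

class HashRing:
    """Minimal consistent-hash ring mapping a call (e.g., by task ID) to a
    worker manager, so the same call is routed to the same manager."""

    def __init__(self, managers, vnodes=64):
        # Each manager occupies several positions ("virtual nodes") on the ring.
        self.ring = sorted(
            (self._hash(f"{m}-{i}"), m)
            for m in managers for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode()).hexdigest(), 16)

    def route(self, call_key):
        """Return the manager owning the first ring position at or after the
        hash of the call key, wrapping around the end of the ring."""
        idx = bisect_right(self.keys, self._hash(call_key)) % len(self.keys)
        return self.ring[idx][1]
```

Because routing depends only on the hashed call key, a given task is consistently distributed to the same worker manager, supporting the deterministic distribution discussed above.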
  • the worker manager 140 may then modify a virtual machine instance 150 (if necessary) and execute the code of the task within the instance 150.
  • user code 156 may interact with input data and output data as files on the VM data store 158, by use of file handles passed to the code 156 during an execution.
  • input and output data may be stored as files within a kernel-space file system of the data store 158.
  • the staging code 157 may provide a virtual file system, such as a filesystem in userspace (FUSE) interface, which provides an isolated file system accessible to the user code 156, such that the user code’s access to the VM data store 158 is restricted.
  • “local file system” generally refers to a file system as maintained within an execution environment, such that software executing within the environment can access data as files, rather than via a network connection.
  • the data storage accessible via a local file system may itself be local (e.g., local physical storage), or may be remote (e.g., accessed via a network protocol, like NFS, or represented as a virtualized block device provided by a network-accessible service); the term “local file system” thus refers to a mechanism by which software accesses data, rather than to the physical location of the data.
  • the VM data store 158 can include any persistent or non-persistent data storage device.
  • the VM data store 158 is physical storage of the host device, or a virtual disk drive hosted on physical storage of the host device.
  • the VM data store 158 is represented as local storage, but is in fact a virtualized storage device provided by a network accessible service.
  • the VM data store 158 may be a virtualized disk drive provided by a network-accessible block storage service.
  • the object storage service 160 may be configured to provide file-level access to objects stored on the data stores 168, thus enabling the VM data store 158 to be virtualized based on communications between the staging code 157 and the service 160.
  • the object storage service 160 can include a file-level interface 166 providing network access to objects within the data stores 168 as files.
  • the file-level interface 166 may, for example, represent a network-based file system server (e.g., a network file system (NFS)) providing access to objects as files, and the staging code 157 may implement a client of that server, thus providing file-level access to objects of the service 160.
  • the VM data store 158 may represent virtualized access to another data store executing on the same host device of a VM instance 150.
  • an active pool 148 may include one or more data staging VM instances (not shown in FIG. 1), which may be co-tenanted with VM instances 150 on the same host device.
  • a data staging VM instance may be configured to support retrieval and storage of data from the service 160 (e.g., data objects or portions thereof, input data passed by client devices 102, etc.), and storage of that data on a data store of the data staging VM instance.
  • the data staging VM instance may, for example, be designated as unavailable to support execution of user code 156, and thus be associated with elevated permissions relative to instances 150 supporting execution of user code.
  • the data staging VM instance may make this data accessible to other VM instances 150 within its host device (or, potentially, on nearby host devices), such as by use of a network- based file protocol, like NFS.
  • Other VM instances 150 may then act as clients to the data staging VM instance, enabling creation of virtualized VM data stores 158 that, from the point of view of user code 156A, appear as local data stores.
  • IO streams may additionally be used to read from or write to other interfaces of a VM instance 150 (while still removing a need for user code 156 to conduct operations other than stream-level operations, such as creating network connections).
  • staging code 157 may “pipe” input data to an execution of user code 156 as an input stream, the output of which may be “piped” to the staging code 157 as an output stream.
  • a staging VM instance or a hypervisor to a VM instance 150 may pass input data to a network port of the VM instance 150, which may be read-from by staging code 157 and passed as an input stream to the user code 156.
  • data written to an output stream by the task code 156 may be written to a second network port of the instance 150A for retrieval by the staging VM instance or hypervisor.
  • a hypervisor to the instance 150 may pass input data as data written to a virtualized hardware input device (e.g., a keyboard) and staging code 157 may pass to the user code 156 a handle to the IO stream corresponding to that input device.
  • the hypervisor may similarly pass to the user code 156 a handle for an IO stream corresponding to a virtualized hardware output device, and read data written to that stream as output data.
  • the examples provided herein with respect to file streams may generally be modified to relate to any IO stream.
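The "piping" arrangement described above can be illustrated with ordinary operating-system streams. The sketch below uses a subprocess's stdin and stdout as the input and output streams; this is an assumption for illustration, not the staging code 157 itself.

```python
import subprocess
import sys

def run_task_with_streams(user_code: str, input_data: bytes) -> bytes:
    """Pipe input data to an execution of user code as an input stream and
    capture its output stream, roughly as staging code might."""
    proc = subprocess.run(
        [sys.executable, "-c", user_code],
        input=input_data,        # becomes the task's stdin (input stream)
        capture_output=True,
    )
    return proc.stdout           # the task's stdout (output stream)

# The "user code" below performs only stream-level operations: it neither
# opens files nor creates network connections.
TASK = "import sys; sys.stdout.write(sys.stdin.read().upper())"
```

Note that the streams are created outside the user code and outside its execution scope, so the code needs no privileges beyond reading and writing the handles it is given.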
  • the object storage service 160 and on-demand code execution system 120 are depicted in FIG. 1 as operating in a distributed computing environment including several computer systems that are interconnected using one or more computer networks (not shown in FIG. 1).
  • the object storage service 160 and on-demand code execution system 120 could also operate within a computing environment having a fewer or greater number of devices than are illustrated in FIG. 1.
  • the depiction of the object storage service 160 and on-demand code execution system 120 in FIG. 1 should be taken as illustrative and not limiting to the present disclosure.
  • the on-demand code execution system 120 or various constituents thereof could implement various Web services components, hosted or “cloud” computing environments, or peer to peer network configurations to implement at least a portion of the processes described herein.
  • the object storage service 160 and on-demand code execution system 120 may be combined into a single service.
  • the object storage service 160 and on-demand code execution system 120 may be implemented in hardware or in software executed by hardware devices and may, for instance, include one or more physical or virtual servers implemented on physical computer hardware configured to execute computer executable instructions for performing various features that will be described herein.
  • the one or more servers may be geographically dispersed or geographically co-located, for instance, in one or more data centers.
  • the one or more servers may operate as part of a system of rapidly provisioned and released computing resources, often referred to as a “cloud computing environment.”
  • While some functionalities are generally described herein with reference to an individual component of the object storage service 160 and on-demand code execution system 120, other components or a combination of components may additionally or alternatively implement such functionalities.
  • while the object storage service 160 is depicted in FIG. 1 as including an object manipulation engine 170, functions of that engine 170 may additionally or alternatively be implemented as tasks on the on-demand code execution system 120.
  • the on-demand code execution system 120 is described as an example system to apply data manipulation tasks, other compute systems may be used to execute user-defined tasks, which compute systems may include more, fewer or different components than depicted as part of the on-demand code execution system 120.
  • the object storage service 160 may include a physical computing device configured to execute user-defined tasks on demand, thus itself representing a compute system; the specific configuration of elements within FIG. 1 is intended to be illustrative.
  • FIG. 2 depicts a general architecture of a frontend server 200 computing device implementing a frontend 162 of FIG. 1.
  • the general architecture of the frontend server 200 depicted in FIG. 2 includes an arrangement of computer hardware and software that may be used to implement aspects of the present disclosure. The hardware may be implemented on physical electronic devices, as discussed in greater detail below.
  • the frontend server 200 may include many more (or fewer) elements than those shown in FIG. 2. It is not necessary, however, that all of these generally conventional elements be shown in order to provide an enabling disclosure. Additionally, the general architecture illustrated in FIG. 2 may be used to implement one or more of the other components illustrated in FIG. 1.
  • the frontend server 200 includes a processing unit 290, a network interface 292, a computer readable medium drive 294, and an input/output device interface 296, all of which may communicate with one another by way of a communication bus.
  • the network interface 292 may provide connectivity to one or more networks or computing systems.
  • the processing unit 290 may thus receive information and instructions from other computing systems or services via the network 104.
  • the processing unit 290 may also communicate to and from primary memory 280 or secondary memory 298 and further provide output information for an optional display (not shown) via the input/output device interface 296.
  • the input/output device interface 296 may also accept input from an optional input device (not shown).
  • the memory 280 may include a control plane unit 286 and data plane unit 288 each executable to implement aspects of the present disclosure.
  • the control plane unit 286 may include code executable to enable owners of data objects or collections of objects to attach manipulations, serverless functions, or data processing pipelines to an I/O path, in accordance with embodiments of the present disclosure.
  • the control plane unit 286 may enable the frontend 162 to implement the interactions of FIG. 3.
  • the data plane unit 288 may illustratively include code enabling handling of I/O operations on the object storage service 160, including implementation of manipulations, serverless functions, or data processing pipelines attached to an I/O path (e.g., via the interactions of FIGS. 5A-6B, implementation of the routines of FIGS. 7-8, etc.).
  • the frontend server 200 of FIG. 2 is one illustrative configuration of such a device, of which others are possible.
  • a frontend server 200 may in some embodiments be implemented as multiple physical host devices.
  • a first device of such a frontend server 200 may implement the control plane unit 286, while a second device may implement the data plane unit 288.
  • while described in FIG. 2 as a frontend server 200, similar components may be utilized in some embodiments to implement other devices shown in the environment 100 of FIG. 1.
  • a similar device may implement a worker manager 140, as described in more detail in U.S. Patent No. 9,323,556, entitled “PROGRAMMATIC EVENT DETECTION AND MESSAGE GENERATION FOR REQUESTS TO EXECUTE PROGRAM CODE,” and filed September 30, 2014 (the “’556 Patent”), the entirety of which is hereby incorporated by reference.
  • the interactions of FIG. 3 enable a client device 102A to modify an I/O path for one or more objects on an object storage service 160 by inserting a data manipulation into the I/O path, which manipulation is implemented within a task executable on the on-demand code execution system 120.
  • as used herein, “IO” refers to operating-system-level input/output.
  • Streams may be created in various manners.
  • a programming language may generate a stream by use of a function library to open a file on a local operating system, or a stream may be created by use of a “pipe” operator (e.g., within an operating system shell command language).
  • most general purpose programming languages include, as basic functionality of the code, the ability to interact with streams.
  • because streams may be created outside of the code, and potentially outside of an execution environment of the code, stream manipulation code need not necessarily be trusted to conduct certain operations that may be necessary to create a stream.
  • a stream may represent information transmitted over a network connection, without the code being provided with access to that network connection.
  • the code may be authored in a variety of programming languages. Authoring tools for such languages are known in the art and thus will not be described herein. While authoring is described in FIG. 3 as occurring on the client device 102A, the service 160 may in some instances provide interfaces (e.g., web GUIs) through which to author or select code.
  • the client device 102A submits the stream manipulation code to the frontend 162 of the service 160, and requests that an execution of the code be inserted into an I/O path for one or more objects.
  • the frontends 162 may provide one or more interfaces to the device 102A enabling submission of the code (e.g., as a compressed file).
  • the frontends 162 may further provide interfaces enabling designation of one or more I/O paths to which an execution of the code should be applied.
  • Each I/O path may correspond, for example, to an object or collection of objects (e.g., a “bucket” of objects).
  • an I/O path may further correspond to a given way of accessing such object or collection (e.g., a URI through which the object is created), to one or more accounts attempting to access the object or collection, or to other path criteria. Designation of the path modification is then stored in the I/O path modification data store 164, at (3). Additionally, the stream manipulation code is stored within the object data stores 166 at (4).
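One way to picture the path-modification designations stored at (3) is a registry keyed by request method, collection, and key prefix. The criteria shape and the task names below are hypothetical, chosen only to illustrate matching a request against stored designations.

```python
# Registry mapping I/O path criteria to a named manipulation. A path here is
# characterized by request method, bucket, and optional key prefix; real
# criteria could also include URI, account, or authentication status.
PATH_MODIFICATIONS = []

def register_modification(method, bucket, prefix, task_name):
    """Record that task_name should run on requests matching the criteria."""
    PATH_MODIFICATIONS.append((method, bucket, prefix, task_name))

def lookup_modifications(method, bucket, key):
    """Return task names whose criteria match the request, in registration order."""
    return [
        task for (m, b, p, task) in PATH_MODIFICATIONS
        if m == method and b == bucket and key.startswith(p)
    ]
```

On receiving an I/O request, a frontend would consult such a registry and, for each match, invoke the task against the request's input data before applying the called method.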
  • the service 160 when an I/O request is received via the specified I/O path, the service 160 is configured to execute the stream manipulation code against input data for the request (e.g., data provided by the client device 102A or an object of the service 160, depending on the I/O request), before then applying the request to the output of the code execution.
  • a client device 102A (which in FIG. 3 illustratively represents an owner of the object) may thereafter have data stored on or retrieved from the object storage service 160 in accordance with the modified I/O path.
  • the interactions of FIG. 3 generally relate to insertion of a single data manipulation into the I/O path of an object or collection on the service 160.
  • an owner of an object or collection is enabled to insert multiple data manipulations into such an I/O path.
  • Each data manipulation may correspond, for example, to a serverless code-based manipulation or a native manipulation of the service 160.
  • an owner has submitted a data set to the service 160 as an object, and that the owner wishes to provide an end user with a filtered view of a portion of that data set. While the owner could store that filtered view of the portion as a separate object and provide the end user with access to that separate object, this results in data duplication on the service 160.
  • the owner wishes to provide multiple end users with different portions of the data set, potentially with customized filters, that data duplication grows, resulting in significant inefficiencies.
  • another option may be for the owner to author or obtain custom code to implement different filters on different portions of the object, and to insert that code into the I/O path for the object.
  • this approach may require the owner to duplicate some native functionality of the service 160 (e.g., an ability to retrieve a portion of a data set).
  • this approach would inhibit modularity and reusability of code, since a single set of code would be required to conduct two functions (e.g., selecting a portion of the data and filtering that portion).
  • embodiments of the present disclosure enable an owner to create a pipeline of data manipulations to be applied to an I/O path, linking together multiple data manipulations, each of which may also be inserted into other I/O paths.
  • An illustrative visualization of such a pipeline is shown in FIG. 4 as pipeline 400.
  • the pipeline 400 illustrates a series of data manipulations that an owner specifies are to occur on calling of a request method against an object or object collection.
  • the pipeline begins with input data, specified within the call according to a called request method.
  • a PUT call may generally include the input data as the data to be stored, while a GET call may generally include the input data by reference to a stored object.
  • a LIST call may specify a directory, a manifest of which is the input data to the LIST request method.
  • as shown in FIG. 4, in the pipeline 400 the called request method is not initially applied to the input data. Rather, the input data is initially passed to an execution of “code A” 404, where code A represents a first set of user-authored code. The output of that execution is then passed to “native function A” 406, which illustratively represents a native function of the service 160, such as a “SELECT” or byte-range function implemented by the object manipulation engine 170.
  • the output of that native function 406 is then passed to an execution of “code B” 408, which represents a second set of user-authored code. Thereafter, the output of that execution 408 is passed to the called request method 410 (e.g., GET, PUT, LIST, etc.). Accordingly, rather than the request method being applied to the input data as in conventional techniques, in the illustration of FIG. 4, the request method is applied to the output of the execution 408, which illustratively represents a transformation of the input data according to one or more owner- specified manipulations 412. Notably, implementation of the pipeline 400 may not require any action or imply any knowledge of the pipeline 400 on the part of a calling client device 102.
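The linear pipeline 400 amounts to function composition: each stage consumes the previous stage's output, and the called request method is applied last. The sketch below is illustrative only; the stages standing in for "code A," a SELECT-like native function, and "code B" are hypothetical, not actual service functions.

```python
def make_pipeline(*stages):
    """Compose data manipulations so the called request method is applied to
    the output of the final stage rather than to the raw input data."""
    def apply(input_data):
        for stage in stages:
            input_data = stage(input_data)  # each stage feeds the next
        return input_data
    return apply

# Hypothetical stand-ins for "code A" 404, "native function A" 406, "code B" 408.
code_a = lambda data: data.strip()              # user-authored normalization
native_select = lambda data: data.split(",")[0]  # a SELECT-like native function
code_b = lambda data: data.upper()              # user-authored transformation

pipeline = make_pipeline(code_a, native_select, code_b)
```

A GET or PUT would then operate on `pipeline(input_data)`, so a calling client needs no knowledge of the pipeline's existence.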
  • implementation of pipelines can be expected not to impact existing mechanisms of interacting with the service 160 (other than altering the data stored on or retrieved from the service 160 in accordance with the pipeline).
  • implementation of a pipeline can be expected not to require reconfiguration of existing programs utilizing an API of the service 160.
  • the service 160 may enable an owner to configure non-linear pipelines, such as by including conditional or branching nodes within the pipeline.
  • the return value of a data manipulation may be used to select a conditional branch within a branched pipeline, such that a first return value causes the pipeline to proceed on a first branch, while a second return value causes the pipeline to proceed on a second branch.
  • pipelines may include parallel branches, such that data is copied or divided to multiple data manipulations, the outputs of which are passed to a single data manipulation for merging prior to executing the called method.
  • the service 160 may illustratively provide a graphical user interface through which owners can create pipelines, such as by specifying nodes within the pipeline and linking those nodes together; a variety of flow-based development interfaces are known and may be utilized in conjunction with aspects of the present disclosure.
  • a pipeline applied to a particular I/O path may be generated on-the-fly, at the time of a request, based on data manipulations applied to the path according to different criteria. For example, an owner of a data collection may apply a first data manipulation to all interactions with objects within a collection, and a second data manipulation to all interactions obtained via a given URI.
  • the service 160 may generate a pipeline combining the first and second data manipulations.
  • the service 160 may illustratively implement a hierarchy of criteria, such that manipulations applied to objects are placed within the pipeline prior to manipulations applied to a URI, etc.
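Such on-the-fly assembly can be pictured as ordering whatever manipulations are attached to the matching criteria. The three-level hierarchy and the manipulation names below are assumptions for illustration, not the service's actual ordering rules.

```python
# Assumed hierarchy of criteria: manipulations attached to an object run
# before those attached to a URI, which run before account-level ones.
HIERARCHY = ["object", "uri", "account"]

def build_pipeline(attached):
    """Given {criterion: manipulation} applicable to a request, return the
    ordered list of manipulations composing the on-the-fly pipeline."""
    return [attached[level] for level in HIERARCHY if level in attached]
```

For a request matching both a collection-wide manipulation and a URI-specific one, the service would execute the resulting list in order before (or after) applying the called method.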
  • a serverless task execution may be passed the content of a request (including, e.g., a called method and parameters) and be configured to modify and return, as a return value to a frontend 162, a modified version of the method or parameters.
  • a serverless task execution may be passed a call to “GET” that data object, and may transform parameters of the GET request such that it applies only to a specific byte range of the data object corresponding to the portion that the user may access.
  • tasks may be utilized to implement customized parsing or restrictions on called methods, such as by limiting the methods a user may call, the parameters to those methods, or the like.
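A manipulation of this kind can be sketched as a function that receives the content of the request and returns a modified version. The request shape (a dict with `method` and `range` entries) is hypothetical, standing in for whatever representation the frontend 162 passes to a task execution.

```python
def restrict_get_range(request, allowed_start, allowed_end):
    """Serverless-style manipulation: given the content of a GET request,
    return a modified version whose byte range is clamped to the range the
    caller is permitted to access."""
    if request.get("method") != "GET":
        raise ValueError("only GET requests may be rewritten here")
    start, end = request.get("range", (allowed_start, allowed_end))
    clamped = (max(start, allowed_start), min(end, allowed_end))
    return {**request, "range": clamped}  # modified parameters, returned to the frontend
```

The frontend would then service the request using the returned parameters, so the caller only ever reads the permitted portion of the object.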
  • in addition to application of one or more functions to a request (e.g., to modify the method called or the method parameters), manipulations may be applied to the data itself.
  • a data object may contain sensitive data that a data owner desires to remove prior to providing the data to a client.
  • the owner may further enable a client to specify native manipulations to the data set, such as conducting a database query on the dataset (e.g., via a SELECT resource method).
  • embodiments of the present disclosure can enable an owner to specify manipulations to occur subsequent to application of a called method but prior to conducting a final operation to satisfy a request. For example, in the case of a SELECT operation, the service 160 may first conduct the SELECT operation against specified input data (e.g., a data object), and then pass the output of that SELECT operation to a data manipulation, such as a serverless task execution. The output of that execution can then be returned to a client device 102 to satisfy the request.
  • while FIG. 3 and FIG. 4 are generally described with reference to serverless tasks authored by an owner of an object or collection, the service 160 may in some instances enable code authors to share their tasks with other users of the service 160, such that code of a first user is executed in the I/O path of an object owned by a second user.
  • the service 160 may also provide a library of tasks for use by each user.
  • the code of a shared task may be provided to other users.
  • the code of the shared task may be hidden from other users, such that the other users can execute the task but not view code of the task.
  • other users may illustratively be enabled to modify specific aspects of code execution, such as the permissions under which the code will execute.
  • with reference to FIGS. 5A and 5B, illustrative interactions will be discussed for applying a modification to an I/O path for a request to store an object on the service 160, which request is referred to in connection with these figures as a “PUT” request or “PUT object call.” While shown in two figures, numbering of interactions is maintained across FIGS. 5A and 5B.
  • the interactions begin at (1), where a client device 102A submits a PUT object call to the storage service 160, corresponding to a request to store input data (e.g., included or specified within the call) on the service 160.
  • the input data may correspond, for example, to a file stored on the client device 102A.
  • the call is directed to a frontend 162 of the service 160 that, at (2), retrieves from the I/O path modification data store 164 an indication of modifications to the I/O path for the call.
  • the indication may reflect, for example, a pipeline to be applied to calls received on the I/O path.
  • the I/O path for a call may generally be specified with respect to a request method included within a call, an object or collection of objects indicated within the call, a specific mechanism of reaching the service 160 (e.g., protocol, URI used, etc.), an identity or authentication status of the client device 102A, or a combination thereof.
  • for example, in FIG. 5A, the I/O path used can correspond to use of a PUT request method directed to a particular URI (e.g., associated with the frontend 162) to store an object in a particular logical location on the service 160; it is assumed that the owner of that logical location has previously specified a modification to the I/O path, and specifically, has specified that a serverless function should be applied to the input data before a result of that function is stored in the service 160.
  • the frontend 162 detects within the modifications for the I/O path inclusion of a serverless task execution.
  • the frontend 162 submits a call to the on-demand code execution system 120 to execute the task specified within the modifications against the input data specified within the call.
  • the on-demand code execution system 120 at (5), therefore generates an execution environment 502 in which to execute code corresponding to the task.
  • the call may be directed to a frontend 130 of the system, which may distribute instructions to a worker manager 140 to select or generate a VM instance 150 in which to execute the task, which VM instance 150 illustratively represents the execution environment 502.
  • the system 120 further provisions the environment with code 504 of the task indicated within the I/O path modification (which may be retrieved, for example, from the object data stores 166). While not shown in FIG. 5A, the environment 502 further includes other dependencies of the code, such as access to an operating system, a runtime required to execute the code, etc.
  • generation of the execution environment 502 can include configuring the environment 502 with security constraints limiting access to network resources.
  • the environment 502 can be configured with no ability to send or receive information via a network.
  • access to such resources can be provided on a “whitelist” basis, such that network communications from the environment 502 are allowed only for specified domains, network addresses, or the like.
  • Network restrictions may be implemented, for example, by a host device hosting the environment 502 (e.g., by a hypervisor or host operating system). In some instances, network access requirements may be utilized to assist in placement of the environment 502, either logically or physically.
• where a task does not require access to other network resources, the environment 502 for the task may be placed on a host device that is distant from other network-accessible services of the service provider system 110. Where a task requires access to otherwise private network services, such as services implemented within a virtual private cloud (e.g., a local-area-network-like environment implemented on the service 160 on behalf of a given user), the environment 502 may be created to exist logically within that cloud, such that a task execution 502 accesses resources within the cloud.
  • a task may be configured to execute within a private cloud of a client device 102 that submits an I/O request.
  • a task may be configured to execute within a private cloud of an owner of the object or collection referenced within the request.
  • the system 120 provisions the environment with stream-level access to an input file handle 506 and an output file handle 508, usable to read from and write to the input data and output data of the task execution, respectively.
• file handles 506 and 508 may point to a (physical or virtual) block storage device (e.g., disk drive) attached to the environment 502, such that the task can interact with a local file system to read input data and write output data.
  • the environment 502 may represent a virtual machine with a virtual disk drive, and the system 120 may obtain the input data from the service 160 and store the input data on the virtual disk drive.
• file handles 506 and 508 may point to a network file system, such as an NFS-compatible file system, on which the input data has been stored.
  • the frontend 162 during processing of the call may store the input data as an object on the object data stores 166, and the file-level interface 166 may provide file-level access to the input data and to a file representing output data.
  • the file handles 506 and 508 may point to files on a virtual file system, such as a file system in user space.
• By way of the handles 506 and 508, the task code 504 is enabled to read the input data and write output data using stream manipulations, as opposed to being required to implement network transmissions. Creation of the handles 506 and 508 (or streams corresponding to the handles) may illustratively be achieved by execution of staging code 157 within or associated with the environment 502.
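As a concrete illustration, task code of the kind contemplated here might be structured as the following minimal Python sketch, assuming the input stream is staged on stdin and the output stream is collected from stdout (the upper-casing transform is an arbitrary placeholder, not part of the disclosure):

```python
import sys

def transform(data: bytes) -> bytes:
    # Arbitrary placeholder manipulation: upper-case the input bytes.
    return data.upper()

def main() -> int:
    # Read the staged input via the input file handle (e.g., handle 506,
    # passed as stdin); no network transmission is required.
    data = sys.stdin.buffer.read()
    # Write the result via the output file handle (e.g., handle 508,
    # passed as stdout); staging code collects this as the task's output.
    sys.stdout.buffer.write(transform(data))
    return 0  # success return value
```

The task interacts only with the two streams; staging code 157 is responsible for wiring them to the underlying object data.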
• The interactions of FIG. 5A are continued in FIG. 5B, where the system 120 executes the task code 504.
• because the code 504 may be user-authored, any number of functionalities may be implemented within the code 504. For the purposes of description of FIGS. 5A and 5B, it will be assumed that the code 504, when executed, reads input data from the input file handle 506 (which may be passed as a commonly used input stream, such as stdin), manipulates the input data, and writes output data to the output file handle 508 (which may be passed as a commonly used output stream, such as stdout).
  • the system 120 obtains data written to the output file (e.g., the file referenced in the output file handle) as output data of the execution.
• a success return value from the task code may further indicate that the frontend 162 should return an HTTP 200 code to the client device 102A.
  • An error return value may, for example, indicate that the frontend 162 should return a 3XX HTTP redirection or 4XX HTTP error code to the client device 102A.
  • return values may specify to the frontend 162 content of a return message to the client device 102A other than a return value.
  • the frontend 162 may be configured to return a given HTTP code (e.g., 200) for any request from the client device 102A that is successfully retrieved at the frontend 162 and invokes a data processing pipeline.
• a task execution may then be configured to specify, within its return value, data to be passed to the client device 102A in addition to that HTTP code.
  • data may illustratively include structured data (e.g., extensible markup language (XML) data) providing information generated by the task execution, such as data indicating success or failure of the task.
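Such a structured return value might be assembled as in the following sketch; the element names and the Python representation are illustrative assumptions, not mandated by the disclosure:

```python
import xml.etree.ElementTree as ET

def build_return_payload(succeeded: bool, detail: str) -> str:
    # Build XML data describing the outcome of the task execution; the
    # frontend could pass this to the client alongside a fixed HTTP code.
    root = ET.Element("result")
    ET.SubElement(root, "status").text = "success" if succeeded else "failure"
    ET.SubElement(root, "detail").text = detail
    return ET.tostring(root, encoding="unicode")
```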
  • a call to PUT an object on the storage service 160 resulted in creation of that object on the service 160.
  • the object stored on the service 160 corresponds to output data of an owner-specified task, thus enabling the owner of the object greater control over the contents of that object.
• the service 160 may additionally store the input data as an object (e.g., where the owner-specified task generates data derived from the input data, such as a checksum generated from the input data).
• with reference to FIGS. 6A and 6B, illustrative interactions will be discussed for applying a modification to an I/O path for a request to retrieve an object on the service 160, which request is referred to in connection with these figures as a “GET” request or “GET call.” While shown in two figures, numbering of interactions is maintained across FIGS. 6A and 6B.
• the interactions begin at (1), where a client device 102A submits a GET call to the storage service 160, corresponding to a request to obtain data of an object (identified within the call) stored on the service 160.
  • the call is directed to a frontend 162 of the service 160 that, at (2), retrieves from the I/O path modification data store 164 an indication of modifications to the I/O path for the call.
  • the I/O path used can correspond to use of a GET request method directed to a particular URI (e.g., associated with the frontend 162) to retrieve an object in a particular logical location on the service 160 (e.g., a specific bucket).
  • the frontend 162 detects within the modifications for the I/O path inclusion of a serverless task execution.
  • the frontend 162 submits a call to the on-demand code execution system 120 to execute the task specified within the modifications against the object specified within the call.
  • the on-demand code execution system 120 at (5), therefore generates an execution environment 502 in which to execute code corresponding to the task.
  • the call may be directed to a frontend 130 of the system, which may distribute instructions to a worker manager 140 to select or generate a VM instance 150 in which to execute the task, which VM instance 150 illustratively represents the execution environment 502.
• the system 120 further provisions the environment with code 504 of the task indicated within the I/O path modification (which may be retrieved, for example, from the object data stores 166). While not shown in FIG. 6A, the environment 502 further includes other dependencies of the code, such as access to an operating system, a runtime required to execute the code, etc. The system 120 further provisions the environment with access to an input file handle 506 and an output file handle 508, usable to read from and write to the input data (the object) and output data of the task execution, respectively. As discussed above, file handles 506 and 508 may point to a (physical or virtual) block storage device (e.g., disk drive) attached to the environment 502, such that the task can interact with a local file system to read input data and write output data.
• By way of the handles 506 and 508, the task code 504 is enabled to read the input data and write output data using stream manipulations, as opposed to being required to implement network transmissions. Creation of the handles 506 and 508 may illustratively be achieved by execution of staging code 157 within or associated with the environment 502.
• The interactions of FIG. 6A are continued in FIG. 6B, where the system 120 executes the task code 504 at (7).
• because the task code 504 may be user-authored, any number of functionalities may be implemented within the code 504.
  • the code 504 when executed, reads input data (corresponding to the object identified within the call) from the input file handle 506 (which may be passed as a commonly used input stream, such as stdin), manipulates the input data, and writes output data to the output file handle 508 (which may be passed as a commonly used output stream, such as stdout).
  • the system 120 obtains data written to the output file (e.g., the file referenced in the output file handle) as output data of the execution.
  • the system 120 obtains a return value of the code execution (e.g., a value passed in a final call of the function). For the purposes of description of FIGS. 6A and 6B, it will be assumed that the return value indicates success of the execution.
• the output data and the success return value are then passed to the frontend 162. The frontend 162, at (11), then returns the output data of the task execution as the requested object.
• Interaction (11) thus illustratively corresponds to implementation of the GET request method, initially called for by the client device 102A, albeit by returning the output of the task execution rather than the object specified within the call.
• a call to GET an object from the storage service 160 therefore results in return of data to the client device 102A as the object.
• the data provided to the client device 102A corresponds to output data of an owner-specified task, thus enabling the owner of the object greater control over the data returned to the client device 102A.
  • output data of a task execution and a return value of that execution may be returned separately.
  • success return value is assumed in FIGS. 6A and 6B, other types of return value are possible and contemplated, such as error values, pipeline-control values, or calls to execute other data manipulations.
• return values may indicate what return value is to be returned to the client device 102A (e.g., as an HTTP status code). In some instances, where output data is iteratively returned from a task execution, the output data may also be iteratively provided by the frontend 162 to the client device 102A.
• Where output data is large (e.g., on the order of hundreds of megabytes, gigabytes, etc.), iteratively returning output data to the client device 102A can enable that data to be provided as a stream, thus speeding delivery of the content to the device 102A relative to delaying return of the data until execution of the task completes.
  • a serverless task may be inserted into the I/O path of the service 160 to perform functions other than data manipulation.
• a serverless task may be utilized to perform validation or authorization with respect to a called request method, to verify that a client device 102A is authorized to perform the method.
• Task-based validation or authorization may enable functions not provided natively by the service 160. For example, an owner may wish to limit access to objects in a collection based on when the objects were created, such as objects in the collection created during a certain time range (e.g., the last 30 days, any time excluding the last 30 days, etc.).
• while the service 160 may natively provide authorization on a per-object or per-collection basis, the service 160 may in some cases not natively provide authorization on a duration-since-creation basis. Accordingly, embodiments of the present disclosure enable the owner to insert into an I/O path to the collection (e.g., a GET path using a given URI to the collection) a serverless task that determines whether the client is authorized to retrieve a requested object based on a creation time of that object.
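A duration-since-creation check of this kind could look like the following sketch; the metadata key "created", the ISO-8601 timestamp format, and the returned strings are assumptions for illustration only:

```python
from datetime import datetime, timedelta, timezone

def authorize_by_age(object_metadata: dict, max_age_days: int = 30) -> str:
    # Parse the object's creation time from its metadata (hypothetical field).
    created = datetime.fromisoformat(object_metadata["created"])
    # Authorize only objects created within the allowed window; the string
    # returned stands in for the task's authorized/unauthorized return value.
    age = datetime.now(timezone.utc) - created
    return "authorized" if age <= timedelta(days=max_age_days) else "unauthorized"
```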
  • the return value provided by an execution of the task may correspond to an “authorized” or “unauthorized” response. In instances where a task does not perform data manipulation, it may be unnecessary to provision an environment of the task execution with input and output stream handles.
  • the service 160 and system 120 can be configured to forego provisioning the environment with such handles in these cases.
  • Whether a task implements data manipulation may be specified, for example, on creation of the task and stored as metadata for the task (e.g., within the object data stores 166).
  • the service 160 may thus determine from that metadata whether data manipulation within the task should be supported by provisioning of appropriate stream handles.
  • the system 120 may be configured to detect completion of a function based on interaction with an output stream handle.
• staging code within an environment (e.g., code providing a file system in user space or a network-based file system) may detect a call by the task execution to close the output stream handle.
• the staging code may interpret such a call as successful completion of the function, and notify the service 160 of successful completion without requiring the task execution to explicitly provide a return value.
• Such information may include the content of the request from the client device 102 (e.g., the HTTP data transmitted), metadata regarding the request (e.g., a time of the request), metadata regarding the client device 102 (e.g., an authentication status of the device, account information, or request history), or metadata regarding the requested object or collection (e.g., size, storage location, permissions, or time created, modified, or accessed).
  • task executions may be configured to modify metadata regarding input data, which may be stored together with the input data (e.g., within the object) and thus written by way of an output stream handle, or which may be separately stored and thus modified by way of a metadata stream handle, inclusion of metadata in a return value, or separate network transmission to the service 160.
• with reference to FIG. 7, an illustrative routine 700 for implementing owner-defined functions in connection with an I/O request obtained at the object storage service of FIG. 1 over an I/O path will be described.
• the routine 700 may illustratively be implemented subsequent to association of an I/O path (e.g., defined in terms of an object or collection, a mechanism of access to the object or collection, such as a URI, an account transmitting an I/O request, etc.) with a pipeline of data manipulations.
  • the routine 700 may be implemented prior to the interactions of FIG. 3, discussed above.
  • the routine 700 is illustratively implemented by a frontend 162.
  • the routine 700 begins at block 702, where the frontend 162 obtains a request to apply an I/O method to input data.
• the request is illustratively obtained from a client device (e.g., an end user device).
  • the I/O method may correspond, for example, to an HTTP request method, such as GET, PUT, LIST, DELETE, etc.
• the input data may be included within the request (e.g., within a PUT request), or referenced in the request (e.g., as an existing object on the object storage service 160).
  • the frontend 162 determines one or more data manipulations in the I/O path for the request.
  • the I/O path may be defined based on a variety of criteria (or combinations thereof), such as the object or collection referenced in the request, a URI through which the request was transmitted, an account associated with the request, etc.
• Manipulations for each defined I/O path may illustratively be stored at the object storage service 160. Accordingly, at block 704, the frontend 162 may compare parameters of the I/O path for the request to stored data manipulations at the object storage service 160 to determine data manipulations inserted into the I/O path.
• the manipulations may form a pipeline, which can be generated by the frontend 162 at block 704 (e.g., by combining multiple manipulations that apply to the I/O path).
  • an additional data manipulation may be specified within the request, which data manipulation may be inserted, for example, prior to pre-specified data manipulations (e.g., not specified within the request).
  • the request may exclude reference to any data manipulation.
  • FIG. 7 illustratively describes data manipulations
  • other processing may be applied to an I/O path by an owner.
  • an owner may insert into an I/O path for an object or collection a serverless task that provides authentication independent of data manipulation.
  • block 706 may be modified such that other data, such as metadata regarding a request or an object specified in the request, is passed to an authentication function or other path manipulation.
• the routine 700 then returns to block 708, until no additional manipulations exist to be implemented.
  • the routine 700 then proceeds to block 712, where the frontend 162 applies the called I/O method (e.g., GET, PUT, POST, LIST, DELETE, etc.) to the output of the prior manipulation.
• the frontend 162 may provide the output as a result of a GET or LIST request, or may store the output as a new object as a result of a PUT or POST request. A response may then be returned to a requesting device, such as an indication of success of the routine 700 (or, in cases of failure, failure of the routine).
  • the response may be determined by a return value provided by a data manipulation implemented at blocks 706 or 710 (e.g., the final manipulation implemented before error or success).
• a manipulation that indicates an error (e.g., lack of authorization) may instruct the frontend 162 to return a corresponding error code to the requesting device.
  • a manipulation that proceeds successfully may instruct the frontend 162 to return an HTTP code indicating success, or may instruct the frontend 162 to return a code otherwise associated with application of the I/O method (e.g., in the absence of data manipulations).
  • the routine 700 thereafter ends at block 714.
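The manipulation loop of blocks 706 and 710 can be sketched as follows, modeling each manipulation as a plain callable (in practice each may be a serverless task invocation rather than a local function):

```python
from typing import Callable, Iterable

def apply_pipeline(input_data: bytes,
                   manipulations: Iterable[Callable[[bytes], bytes]]) -> bytes:
    # The initial manipulation is applied to the input data (block 706);
    # each subsequent manipulation is applied to the prior output (block 710).
    output = input_data
    for manipulation in manipulations:
        output = manipulation(output)
    # The called I/O method is then applied to this final output (block 712).
    return output
```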
  • routine 700 enables an owner of data objects to assert greater control over I/O to an object or collection stored on the object storage service 160 on behalf of the owner.
  • serverless task executions may provide a return value.
  • this return value may instruct a frontend 162 as to further actions to take in implementing the manipulation.
  • an error return value may instruct the frontend 162 to halt implementation of manipulations, and provide a specified error value (e.g., an HTTP error code) to a requesting device.
  • Another return value may instruct the frontend 162 to implement an additional serverless task or manipulation.
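One way a frontend might dispatch on such return values is sketched below; the dictionary schema, field names, and codes are assumptions for illustration, not part of the disclosure:

```python
def interpret_return_value(return_value: dict) -> dict:
    # Hypothetical schema: {"action": "error" | "invoke" | "success", ...}.
    action = return_value.get("action", "success")
    if action == "error":
        # Halt further manipulations and surface the specified error code.
        return {"halt": True, "http_code": return_value.get("http_code", 500)}
    if action == "invoke":
        # Implement an additional serverless task named by the execution.
        return {"halt": False, "next_task": return_value["task"]}
    # Otherwise proceed, eventually returning a success code.
    return {"halt": False, "http_code": 200}
```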
• an illustrative routine 800 will be described for executing a task on the on-demand code execution system of FIG. 1 to enable data manipulations during implementation of an owner-defined function. The routine 800 is illustratively implemented by the on-demand code execution system 120 of FIG. 1.
• the system 120 stages the environment with an I/O stream representing the input data.
  • the system 120 may configure the environment with a file system that includes the input data, and pass to the task code a handle enabling access of the input data as a file stream.
  • the system 120 may configure the environment with a network file system, providing network-based access to the input data (e.g., as stored on the object storage system).
  • the system 120 may configure the environment with a “local” file system (e.g., from the point of view of an operating system providing the file system), and copy the input data to the local file system.
  • the local file system may, for example, be a filesystem in user space (FUSE).
• the I/O stream may alternatively be provided by staging code within the local file environment or by a network-based device, for example by “piping” the input data to the execution environment, by writing the input data to a network socket of the environment (which may not provide access to an external network), etc.
• the system 120 further configures the environment with stream-level access to an output stream, such as by creating a file on the file system for the output data, enabling an execution of the task to create such a file, piping a handle of the environment (e.g., stdout) to a location on another VM instance colocated with the environment or a hypervisor of the environment, etc.
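The staging of input and output streams can be illustrated with ordinary OS pipes, which stand in for the file, socket, or piping mechanisms described (a simplified sketch under those assumptions; real staging code would also handle large payloads and cleanup):

```python
import os

def stage_streams(input_data: bytes):
    # One pipe stands in for the input stream (e.g., handle 506), another
    # for the output stream (e.g., handle 508) provisioned to the task.
    in_read, in_write = os.pipe()
    out_read, out_write = os.pipe()
    # "Pipe" the input data into the environment's input stream, then close
    # the write end so the task observes end-of-stream after the data.
    os.write(in_write, input_data)
    os.close(in_write)
    return in_read, out_read, out_write
```

The task execution would read from in_read and write to out_write, while the system collects output data from out_read.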
  • the task is executed within the environment.
• Execution of the task may include executing code of the task, and passing to the execution a handle or handles of the input stream and output stream.
  • the system 120 may pass to the execution a handle for the input data, as stored on the file system, as a “stdin” variable.
  • the system may further pass to the execution a handle for the output data stream, e.g., as a “stdout” variable.
  • the system 120 may pass other information, such as metadata of the request or an object or collection specified within the request, as parameters to the execution.
• the code of the task may thus execute to conduct stream manipulations on the input data according to functions of the code, and to write an output of the execution to the output stream using OS-level stream operations.
  • the routine 800 then proceeds to block 810, where the system 120 returns data written to the output stream as output data of the task (e.g., to the frontend 162 of the object storage system).
• block 810 may occur subsequent to the execution of the task completing, and as such, the system 120 may return the data written as the complete output data of the task. In other instances, block 810 may occur during execution of the task. For example, the system 120 may detect new data written to the output stream and return that data immediately, without awaiting completion of the task execution.
• the system 120 may delete data of the output file after writing, such that sending of new data immediately obviates a need for the file system to maintain sufficient storage to store all output data of the task execution. Still further, in some embodiments, block 810 may occur on detecting a close of the output stream handle describing the output stream.
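Iterative return of output data can be sketched as a generator that yields chunks as the execution writes them and treats an empty read (the stream handle having been closed) as completion; the chunk size is an arbitrary choice:

```python
def stream_task_output(read_chunk, chunk_size: int = 64 * 1024):
    # Yield output data as the task execution produces it, rather than
    # waiting for the execution to complete; each chunk can be discarded
    # once sent, so the full output never needs to be stored at once.
    while True:
        chunk = read_chunk(chunk_size)
        if not chunk:
            # An empty read signals the output stream handle was closed,
            # which can be treated as completion of the task's output.
            break
        yield chunk
```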
• the routine 800 then proceeds to block 812, where the system 120 returns a return value provided by the execution (e.g., to the frontend 162 of the object storage system). The return value may specify an outcome of the execution, such as success or failure. In some instances, the return value may specify a next action to be undertaken, such as implementation of an additional data manipulation.
  • the return value may specify data to be provided to a calling device requesting an I/O operation on a data object, such as an HTTP code to be returned.
  • the frontend 162 may obtain such return value and undertake appropriate action, such as returning an error or HTTP code to a calling device, implementing an additional data manipulation, performing an I/O operation on output data, etc.
• a return value may be explicitly specified within code of the task. In other instances, such as where no return value is specified within the code, a default return value may be returned (e.g., a ‘1’ indicating success).
  • the routine 800 then ends at block 814.
• Clause 1. A system to apply a data processing pipeline to input/output (I/O) operations to a collection of data objects stored on a data object storage service, the system comprising: one or more data stores including: a collection of data objects; and information designating the data processing pipeline as being applied to the I/O operations prior to providing responses to requests to perform the I/O operations, wherein the data processing pipeline includes a series of individual data manipulations, an initial data manipulation of the series being applied to input data of each I/O request, and subsequent data manipulations of the series being applied to an output of a prior data manipulation in the series; one or more processors configured with computer-executable instructions to: obtain an I/O request from a client device to perform an I/O operation against the collection of data objects, the I/O request specifying a request method to be applied to input data, wherein the input data corresponds to an existing data object of the collection of data objects when the I/O operation corresponds to retrieving the existing data object from the collection of data objects, and wherein the input data corresponds to data provided by the client device when the I/O operation corresponds to storing a new data object into the collection of data objects; implement an initial data manipulation of the series against the input data; for each subsequent data manipulation of the series, implement the subsequent data manipulation against an output of a prior data manipulation in the series; and apply the request method to the output of a final data manipulation in the series.
• Clause 3. The system of Clause 2, wherein to implement the at least one data manipulation, the one or more processors are configured to provision an execution environment of the execution with access to a first I/O stream corresponding to an input data file and a second I/O stream corresponding to an output data file for the execution.
• Clause 4. The system of Clause 1, wherein at least one data manipulation of the series is implemented as native functionality of the data object storage service.
• Clause 5. A computer-implemented method comprising: obtaining from a client device a request to perform an input/output (I/O) operation against one or more data objects on an object storage service, wherein the request specifies input data for the I/O operation, wherein the input data corresponds to an existing data object of the one or more data objects when the I/O operation corresponds to retrieving the existing data object from the one or more data objects, and wherein the input data corresponds to data provided by the client device when the I/O operation corresponds to storing a new data object into the one or more data objects; determining a data processing pipeline to be applied to the I/O operation prior to providing a response, wherein the data processing pipeline includes a series of individual data manipulations, an initial data manipulation of the series being applied to the input data, and subsequent data manipulations of the series being applied to an output of a prior data manipulation in the series; implementing an initial data manipulation of the series against the input data; for each subsequent data manipulation of the series, implementing the subsequent data manipulation against an output of a prior data manipulation in the series; and applying the I/O operation to the output of a final data manipulation in the series.
• Clause 6. The computer-implemented method of Clause 5, wherein determining the data processing pipeline to be applied to the I/O operation prior to providing the response comprises identifying an I/O path of the request based at least in part on the one or more data objects, a resource identifier to which the request was transmitted, or the client device.
• Clause 7. The computer-implemented method of Clause 5, wherein determining the data processing pipeline to be applied to the I/O operation prior to providing the response comprises combining a first data manipulation associated with the one or more data objects and a second data manipulation associated with a resource identifier to which the request was transmitted.
• Clause 8. The computer-implemented method of Clause 5, wherein the I/O operation corresponds to a hypertext transport protocol (HTTP) GET operation, and wherein applying the request method to the output comprises transmitting a response to the GET operation to the client device.
• Clause 9. The computer-implemented method of Clause 5, wherein at least one data manipulation of the series is implemented on an on-demand code execution system as a serverless function implemented by execution of code specified by an owner of the one or more data objects.
• Clause 10. The computer-implemented method of Clause 9, wherein implementing the at least one data manipulation comprises generating an execution environment on the on-demand code execution system for the execution, the execution environment lacking network access.
• Clause 11. The computer-implemented method of Clause 9, wherein implementing the at least one data manipulation comprises generating an execution environment on the on-demand code execution system for the execution, the execution environment having access to a virtual private local area network associated with the client device.
• Clause 12. The computer-implemented method of Clause 5, wherein the data processing pipeline further includes an authorization function prior to the series, and wherein the method further comprises: implementing the authorization function; passing to the authorization function metadata regarding the request; and verifying a return value of the authorization function indicating successful authorization.
• Clause 13. The computer-implemented method of Clause 12 further comprising passing to the authorization function metadata regarding the existing data object.
• Clause 14. Non-transitory computer-readable media comprising computer-executable instructions that, when executed by a computing system, cause the computing system to: obtain from a client device a request to perform an input/output (I/O) operation against one or more data objects on an object storage service, wherein the request specifies input data for the I/O operation; determine a data processing pipeline to be applied to the I/O operation prior to providing a response, wherein the data processing pipeline includes a series of individual data manipulations, an initial data manipulation of the series being applied to the input data, and subsequent data manipulations of the series being applied to an output of a prior data manipulation in the series; implement an initial data manipulation of the series against data of the data object; for each subsequent data manipulation of the series, implement the subsequent data manipulation against an output of a prior data manipulation in the series; and apply the I/O operation to the output of a final data manipulation in the series.
• Clause 15. The non-transitory computer-readable media of Clause 14, wherein the I/O operation is an HTTP request method.
• Clause 16. The non-transitory computer-readable media of Clause 14, wherein the input data corresponds to a manifest of the one or more data objects when the I/O operation corresponds to retrieving a listing of data objects in the one or more data objects.
• Clause 18. The non-transitory computer-readable media of Clause 14, wherein at least one data manipulation of the series is implemented on an on-demand code execution system as a serverless function implemented by execution of code specified by an owner of the collection of data objects.
• Clause 19. The non-transitory computer-readable media of Clause 18, wherein to implement the at least one data manipulation, the instructions cause the computing system to provision an execution environment of the execution with access to a first I/O stream corresponding to an input data file and a second I/O stream corresponding to an output data file for the execution.
• Clause 20. The non-transitory computer-readable media of Clause 14, wherein the data processing pipeline further includes an authorization function prior to the series, and wherein the instructions further cause the computing system to: implement the authorization function; pass to the authorization function metadata regarding the request; and verify a return value of the authorization function indicating successful authorization.
• Clause 2. The system of Clause 1, wherein the request is a hypertext transport protocol (HTTP) GET request.
• Clause 3. The system of Clause 1, wherein the execution occurs within an isolated execution environment of the on-demand code execution system, and wherein the input handle describes a file on a local file system of the execution environment.
• Clause 4.
• Clause 5. A computer-implemented method comprising: obtaining from a client device a request to perform an input/output (I/O) operation against one or more data objects on an object storage service, wherein the request specifies input data for the I/O operation, wherein the input data corresponds to a data object of the one or more data objects when the I/O operation corresponds to retrieving an existing data object from the one or more data objects, and wherein the input data corresponds to data provided by the client device when the I/O operation corresponds to storing a new data object into the one or more data objects; determining a modification to the I/O operation established by an owner of the one or more data objects independent of the request, the modification requesting initiation of an execution of owner-defined code against the input data prior to providing a response to the request; initiating, on an on-demand code execution system, the execution of the owner-defined code; obtaining, from the execution of the owner-defined code, output data representing a manipulation of the input data; and performing the I/O operation on the output data, wherein performing the I/O operation comprises at least one of transmitting the output data to the client device as the existing data object or storing the output data as the new data object.
  • Clause 6 The computer-implemented method of Clause 5, wherein the I/O operation is at least one of a GET request, a PUT request, a POST request, a LIST request, or a DELETE request.
  • Clause 7 The computer-implemented method of Clause 5 further comprising passing to the execution of the owner-defined code an input handle providing access to the input data as an I/O stream.
  • Clause 8 The computer-implemented method of Clause 7 further comprising passing to the execution of the owner-defined code an output handle providing access to the output data as an I/O stream.
  • Clause 9 The computer-implemented method of Clause 8, wherein implementing the execution of the owner-defined code comprises implementing the execution in an isolated execution environment without network access.
  • Clause 12 The computer-implemented method of Clause 5, wherein the request is a hypertext transport protocol (HTTP) request, and wherein the method further comprises: obtaining, from the execution of the owner-defined code, a return value identifying an HTTP response code; and returning the HTTP response code to the client device responsive to the request.
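Clauses 7, 8, and 12 together suggest a calling convention for owner-defined code: the input data and output destination are exposed as I/O streams, and the return value identifies an HTTP response code. A minimal Python sketch of that convention follows; the function signature and the use of in-memory streams are assumptions made for illustration.

```python
# Hypothetical sketch of owner-defined code receiving input and output
# handles as I/O streams and returning an HTTP response code.
import io

def owner_defined_code(input_handle, output_handle):
    # Read the input data through the input handle as an I/O stream...
    data = input_handle.read()
    # ...write the manipulated data through the output handle...
    output_handle.write(data.upper())
    # ...and return a value identifying an HTTP response code.
    return 200

# Stand-ins for the streams the service would provide to the execution.
input_stream = io.BytesIO(b"object contents")
output_stream = io.BytesIO()
status = owner_defined_code(input_stream, output_stream)
```

The service would then return the response code to the client and apply the request method to whatever was written through the output handle.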
  • Clause 13 Non-transitory computer-readable media storing computer-executable instructions that, when executed by a computing system, cause the computing system to: obtain from a client device a request to perform an input/output (I/O) operation against one or more data objects on an object storage service, wherein the request specifies input data for the I/O operation, wherein the input data corresponds to a data object of the one or more data objects when the I/O operation corresponds to retrieving an existing data object from the one or more data objects, and wherein the input data corresponds to data provided by the client device when the I/O operation corresponds to storing a new data object into the one or more data objects; determine a modification to the I/O operation established by an owner of the one or more data objects independent of the request, the modification requesting initiation of an execution of owner-defined code against the input data prior to providing a response to the request; initiate, on an on-demand code execution system, the execution of the owner-defined code; obtain, from the execution of the owner-defined code, output data representing a manipulation of the input data; and perform the I/O operation on the output data by at least one of transmitting the output data to the client device as the existing data object or storing the output data as the new data object.
  • Clause 14 The non-transitory computer-readable media of Clause 13, wherein the computer-executable instructions further cause the computing system to pass to the execution of the owner-defined code an input handle providing access to the input data as an I/O stream.
  • Clause 15 The non-transitory computer-readable media of Clause 14, wherein the input handle describes a file on a virtual file system implemented in an execution environment of the on-demand code execution system.
  • Clause 16 The non-transitory computer-readable media of Clause 15, further comprising staging code that, when executed by the computing system, causes the computing system to: provision the execution environment with the file; and provision the execution environment with an output file; wherein the on-demand code execution system is configured to pass an output file handle for the output file to the execution of the owner-defined code, and wherein the output data is written by the execution to the output file using the output file handle.
  • Clause 17 The non-transitory computer-readable media of Clause 16, wherein the computer-executable instructions are executed on one or more computing devices of the computing system, and wherein the staging code further causes the computing system to: detect a closing of the output file handle by the execution; and provide contents of the output file to the one or more computing devices.
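The staging behaviour described here, provisioning the execution environment with input and output files, handing the output file handle to the owner-defined code, and collecting the output file's contents once the handle is closed, can be sketched as follows. This is a hypothetical illustration using a temporary directory as a stand-in for the execution environment's file system; file names are invented.

```python
# Hypothetical staging sketch: provision an input file and an output file,
# run the owner-defined code against their handles, and collect the output
# once the output handle is closed.
import os
import tempfile

def stage_and_execute(input_data, owner_defined_code):
    workdir = tempfile.mkdtemp()                   # stand-in environment
    input_path = os.path.join(workdir, "input")    # provisioned input file
    output_path = os.path.join(workdir, "output")  # provisioned output file
    with open(input_path, "wb") as f:
        f.write(input_data)
    in_handle = open(input_path, "rb")
    out_handle = open(output_path, "wb")
    owner_defined_code(in_handle, out_handle)
    in_handle.close()
    # Closing the output handle signals that the output data is complete.
    out_handle.close()
    # Staging code detects the close and provides the output file's contents.
    with open(output_path, "rb") as f:
        return f.read()

result = stage_and_execute(b"abc", lambda i, o: o.write(i.read() * 2))
```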
  • Clause 18 The non-transitory computer-readable media of Clause 15, wherein the client device is authenticated as associated with a user account, wherein a user of the user account has specified a set of network transmissions permitted from the execution of the owner-defined code, and wherein to implement the execution of the owner-defined code on the on-demand code execution system, the computing system is configured to generate an execution environment on the on-demand code execution system with network communications limited to the permitted set of network transmissions.
  • Clause 19 The non-transitory computer-readable media of Clause 18, wherein the user is the owner.
  • Clause 20 The non-transitory computer-readable media of Clause 15, wherein the execution is implemented in an execution environment of the on-demand code execution system, and wherein the execution environment is located within a virtual private local area network associated with a user account of the client device.
  • input/output (I/O) operations requesting retrieval of data objects stored on an object storage service, the system comprising: one or more computing devices implementing the object storage service, the service storing a collection of data objects on behalf of an owner; and one or more processors configured with computer-executable instructions to: receive a request from a computing device of the owner to insert execution of owner-defined code into an input/output (I/O) path for the collection of data objects; store an association of the I/O path with the owner-defined code that configures the object storage service to, prior to satisfying requests received via the I/O path for the collection of data objects, initiate an execution of the owner-defined code; and wherein the object storage service is configured by the stored association of the I/O path with the owner-defined code to: receive, via the I/O path, an I/O request associated with the collection of data objects, wherein the I/O request specifies a request method to be applied to input data; and satisfy the I/O request at least partly by initiating the execution of the owner-defined code against the input data to produce output data, and applying the request method to the output data.
  • Clause 2 The system of Clause 1, wherein the request method is hypertext transport protocol (HTTP) GET, wherein the input data is an object within the collection of objects to be retrieved by the request, and wherein applying the request method to the output data comprises returning the output data as the object.
  • Clause 3 The system of Clause 1, wherein the I/O path is defined at least in part based on one or more of the collection of data objects, a resource identifier to which the I/O request is transmitted, or a computing device from which the I/O request is received.
  • Clause 4 The system of Clause 1, wherein the one or more processors are further configured with the computer-executable instructions to: receive a request from the computing device of the owner to insert execution of second owner-defined code into a second input/output (I/O) path for the collection of data objects; store an association of the second I/O path with the second owner-defined code that configures the object storage service to, prior to satisfying requests received via the second I/O path for the collection of data objects, execute the second owner-defined code; and wherein the object storage service is configured by the stored association of the second I/O path with the second owner-defined code to: receive, via the second I/O path, a second I/O request associated with the collection of data objects, wherein the second I/O request specifies an additional request method to be applied to second input data; and satisfy the second I/O request at least partly by initiating an execution of the second owner-defined code against the second input data.
  • Clause 5 A computer-implemented method comprising: receiving a request from a computing device of an owner to insert execution of owner-defined code into an input/output (I/O) path for a collection of data objects stored on an object storage service on behalf of the owner; and storing an association of the I/O path with the owner-defined code that configures the object storage service to, prior to satisfying I/O requests to the collection of data objects, execute the owner-defined code; wherein configuring the object storage service causes the object storage service, in response to an I/O request associated with the collection of data objects and specifying a request method to be applied to input data, to satisfy the I/O request at least partly by: initiating an execution of the owner-defined code against the input data to produce output data, and applying the request method to the output data.
  • Clause 7 The computer-implemented method of Clause 5, wherein the owner-defined code is selected from a library of code associated with the object storage service.
  • Clause 8 The computer-implemented method of Clause 5, wherein the request method is hypertext transport protocol (HTTP) PUT, wherein the input data is provided within the I/O request, and wherein applying the request method to the output data comprises storing the output data as a new object within the collection of objects.
  • Clause 9 The computer-implemented method of Clause 5, wherein the I/O request includes a call to second owner-defined code associated with the collection of data objects, and wherein the object storage service is configured, in response to the I/O request, to execute the second owner-defined code to produce the input data.
  • Clause 10 The computer-implemented method of Clause 5, wherein the I/O request does not reference the owner-defined code.
  • Clause 11 The computer-implemented method of Clause 5, wherein to satisfy the I/O request at least partly by initiating the execution of the owner-defined code against the input data to produce output data, the object storage service is further configured to pass to the execution a handle providing access to the input data as an I/O stream.
  • Clause 13 Non-transitory computer-readable media storing computer-executable instructions that, when executed by a data object storage system, cause the data object storage system to: receive a request from a computing device of an owner to insert execution of owner-defined code into an input/output (I/O) path for a collection of data objects stored on an object storage service on behalf of the owner; store an association of the I/O path with the owner-defined code; receive an I/O request associated with the collection of data objects, wherein the I/O request specifies a request method to be applied to input data; and responsive to the stored association of the I/O path with the owner-defined code, satisfy the I/O request at least partly by initiating an execution of the owner-defined code against the input data to produce output data, and applying the request method to the output data.
  • Clause 14 The non-transitory computer-readable media of Clause 13, wherein the request method is hypertext transport protocol (HTTP) LIST, wherein the input data is a manifest of data objects within the collection of data objects, and wherein applying the request method to the output data comprises returning the output data as the manifest of data objects within the collection of data objects.
  • Clause 15 The non-transitory computer-readable media of Clause 14, wherein the output data excludes identification of at least one data object within the collection of data objects removed from the manifest during execution of the owner-defined code.
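Clauses 14 and 15 describe a LIST-path manipulation: the input data is a manifest of the data objects in the collection, and the owner-defined code removes entries before the manifest is returned. A minimal sketch follows; the `internal/` key-prefix convention is invented purely for illustration.

```python
# Hypothetical owner-defined code for a LIST request: filter the manifest so
# that the returned output excludes objects the requester should not see.

def filter_manifest(manifest):
    # Remove every data object whose key marks it as internal-only.
    return [key for key in manifest if not key.startswith("internal/")]

manifest = ["reports/2020.csv", "internal/audit.log", "reports/2021.csv"]
visible = filter_manifest(manifest)
```

The request method (LIST) is then applied to `visible`, so the excluded object never appears in the manifest the client receives.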
  • Clause 16 The non-transitory computer-readable media of Clause 13, wherein to satisfy the I/O request at least partly by initiating the execution of the owner-defined code against the input data to produce output data, the object storage service is further configured to pass to the execution a file handle providing access to the input data as an I/O stream.
  • Clause 17 The non-transitory computer-readable media of Clause 13, wherein to satisfy the I/O request at least partly by initiating the execution of the owner-defined code against the input data to produce output data, the object storage service is further configured to pass to the execution a file handle providing access to an output stream for the output data.
  • Clause 18 The non-transitory computer-readable media of Clause 13, wherein the owner-defined code is first owner-defined code, and wherein the computer-executable instructions further cause the data object storage system to: receive a request to insert execution of second owner-defined code into the input/output (I/O) path for the collection of data objects subsequent to execution of the first owner-defined code; store an association of the I/O path with the second owner-defined code; receive a second I/O request associated with the collection of data objects, wherein the second I/O request specifies an additional request method to be applied to additional input data; and responsive to the stored associations of the I/O path with the first owner-defined code and the second owner-defined code, satisfy the second I/O request at least partly by initiating an execution of the first owner-defined code against the input data to produce intermediate data, initiating an execution of the second owner-defined code against the intermediate data to produce output data, and applying the request method to the output data.
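The two-stage pipeline of Clause 18, in which the first owner-defined code produces intermediate data, the second consumes it, and the request method is then applied to the final output, can be sketched as follows. The function bodies here are illustrative stand-ins, not taken from the patent.

```python
# Hypothetical two-stage data processing pipeline in the I/O path.

def first_code(input_data):
    # First owner-defined code, e.g. normalizing the input.
    return input_data.strip()

def second_code(intermediate):
    # Second owner-defined code, executed against the intermediate data.
    return intermediate.upper()

def satisfy_request(input_data, apply_request_method):
    intermediate = first_code(input_data)   # first execution
    output = second_code(intermediate)      # second execution
    # Finally, apply the request method to the output data.
    return apply_request_method(output)

response = satisfy_request("  quarterly report  ", lambda out: {"body": out})
```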
  • Terms such as "a device configured to" are intended to include one or more recited devices. Such one or more recited devices can also be collectively configured to carry out the stated recitations.
  • For example, "a processor configured to carry out recitations A, B and C" can include a first processor configured to carry out recitation A working in conjunction with a second processor configured to carry out recitations B and C.

Abstract

Systems and methods are described for modifying input and output (I/O) to an object storage service by implementing one or more owner-specified functions against I/O requests. The functions can be applied before the request method (e.g., GET or PUT) specified in the I/O request is implemented, such that the data to which the method is applied may not match the object specified in the request. For example, a user may request to obtain (e.g., GET) a data set. The data set may be passed to a function that filters sensitive data from the data set, and the GET request method may then be applied to the output of the function. In this manner, the object storage service provides object owners greater control over objects stored on or retrieved from the service.
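The abstract's GET example, filtering sensitive data from a data set before the GET method is applied, might look like this in outline. The field names and record layout are invented for illustration; the point is only that the method is applied to the function's output rather than to the stored object.

```python
# Hypothetical owner-specified function in the GET path: strip sensitive
# fields from each record before the object is returned to the requester.

SENSITIVE_FIELDS = {"ssn", "password"}

def filter_sensitive(record):
    return {k: v for k, v in record.items() if k not in SENSITIVE_FIELDS}

def handle_get(stored_object):
    # Apply the owner-specified function, then serve its output as the object.
    return [filter_sensitive(rec) for rec in stored_object]

served = handle_get([{"name": "Ada", "ssn": "123-45-6789"}])
```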
PCT/US2020/051955 2019-09-27 2020-09-22 Inserting owner-specified data processing pipelines into input/output path of object storage service WO2021061620A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202080067195.5A CN114586011B (zh) 2019-09-27 2020-09-22 Inserting owner-specified data processing pipelines into input/output path of object storage service
EP20786202.0A EP4035007A1 (fr) 2019-09-27 2020-09-22 Inserting owner-specified data processing pipelines into input/output path of object storage service

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US16/586,619 US11106477B2 (en) 2019-09-27 2019-09-27 Execution of owner-specified code during input/output path to object storage service
US16/586,619 2019-09-27
US16/586,673 US11360948B2 (en) 2019-09-27 2019-09-27 Inserting owner-specified data processing pipelines into input/output path of object storage service
US16/586,704 2019-09-27
US16/586,673 2019-09-27
US16/586,704 US11055112B2 (en) 2019-09-27 2019-09-27 Inserting executions of owner-specified code into input/output path of object storage service

Publications (2)

Publication Number Publication Date
WO2021061620A1 true WO2021061620A1 (fr) 2021-04-01
WO2021061620A9 WO2021061620A9 (fr) 2022-04-28

Family

ID=72744930

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2020/051955 WO2021061620A1 (fr) 2019-09-27 2020-09-22 Inserting owner-specified data processing pipelines into input/output path of object storage service

Country Status (3)

Country Link
EP (1) EP4035007A1 (fr)
CN (1) CN114586011B (fr)
WO (1) WO2021061620A1 (fr)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11360948B2 (en) 2019-09-27 2022-06-14 Amazon Technologies, Inc. Inserting owner-specified data processing pipelines into input/output path of object storage service
US11394761B1 (en) 2019-09-27 2022-07-19 Amazon Technologies, Inc. Execution of user-submitted code on a stream of data
US11416628B2 (en) 2019-09-27 2022-08-16 Amazon Technologies, Inc. User-specific data manipulation system for object storage service based on user-submitted code
US11550944B2 (en) 2019-09-27 2023-01-10 Amazon Technologies, Inc. Code execution environment customization system for object storage service
US11656892B1 (en) 2019-09-27 2023-05-23 Amazon Technologies, Inc. Sequential execution of user-submitted code and native functions
US11860879B2 (en) 2019-09-27 2024-01-02 Amazon Technologies, Inc. On-demand execution of object transformation code in output path of object storage service

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
CN117111904 (zh) * 2023-04-26 2023-11-24 领悦数字信息技术有限公司 Method and system for automatically converting a Web application into serverless functions

Citations (3)

Publication number Priority date Publication date Assignee Title
US20150372807A1 (en) * 2014-06-18 2015-12-24 Open Text S.A. Flexible and secure transformation of data using stream pipes
US9323556B2 (en) 2014-09-30 2016-04-26 Amazon Technologies, Inc. Programmatic event detection and message generation for requests to execute program code
US20180322176A1 (en) * 2017-05-02 2018-11-08 Home Box Office, Inc. Data delivery architecture for transforming client response data

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
EP3436927B1 (fr) * 2016-03-30 2023-12-13 Amazon Technologies Inc. Traitement d'ensembles préexistants de données dans un environnement d'exécution de code sur demande
US10305734B2 (en) * 2016-04-07 2019-05-28 General Electric Company Method, system, and program storage device for customization of services in an industrial internet of things
WO2018005829A1 (fr) * 2016-06-30 2018-01-04 Amazon Technologies, Inc. Exécution de code à la demande à l'aide d'alias de comptes croisés

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
US20150372807A1 (en) * 2014-06-18 2015-12-24 Open Text S.A. Flexible and secure transformation of data using stream pipes
US9323556B2 (en) 2014-09-30 2016-04-26 Amazon Technologies, Inc. Programmatic event detection and message generation for requests to execute program code
US20180322176A1 (en) * 2017-05-02 2018-11-08 Home Box Office, Inc. Data delivery architecture for transforming client response data

Non-Patent Citations (1)

Title
AMAZON: "AWS Lambda: Developer Guide", 26 June 2016 (2016-06-26), XP055402554, Retrieved from the Internet <URL:https://docs.aws.amazon.com/lambda/latest/dg/welcome.html> [retrieved on 20170830] *

Also Published As

Publication number Publication date
CN114586011B (zh) 2023-06-06
WO2021061620A9 (fr) 2022-04-28
CN114586011A (zh) 2022-06-03
EP4035007A1 (fr) 2022-08-03

Similar Documents

Publication Publication Date Title
US11386230B2 (en) On-demand code obfuscation of data in input path of object storage service
US11106477B2 (en) Execution of owner-specified code during input/output path to object storage service
EP4034998B1 User-specific data manipulation system for object storage service based on user-submitted code
US10908927B1 (en) On-demand execution of object filter code in output path of object storage service
US11055112B2 (en) Inserting executions of owner-specified code into input/output path of object storage service
US11860879B2 (en) On-demand execution of object transformation code in output path of object storage service
US11360948B2 (en) Inserting owner-specified data processing pipelines into input/output path of object storage service
EP4035007A1 Inserting owner-specified data processing pipelines into input/output path of object storage service
US11836516B2 (en) Reducing execution times in an on-demand network code execution system using saved machine states
US11023416B2 (en) Data access control system for object storage service based on owner-defined code
US11138030B2 (en) Executing code referenced from a microservice registry
US10996961B2 (en) On-demand indexing of data in input path of object storage service
US11550944B2 (en) Code execution environment customization system for object storage service
CN114586010B On-demand execution of object filter code in output path of object storage service
US11023311B2 (en) On-demand code execution in input path of data uploaded to storage service in multiple data portions
US11250007B1 (en) On-demand execution of object combination code in output path of object storage service
US11416628B2 (en) User-specific data manipulation system for object storage service based on user-submitted code
EP4035047A1 On-demand code obfuscation of data in input path of object storage service
US11394761B1 (en) Execution of user-submitted code on a stream of data
US11656892B1 (en) Sequential execution of user-submitted code and native functions
US20240103942A1 (en) On-demand code execution data management

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20786202

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2020786202

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2020786202

Country of ref document: EP

Effective date: 20220428