CN105706047B - Partition-based data stream processing framework - Google Patents

Info

Publication number
CN105706047B
Authority
CN
China
Prior art keywords
node
data
record
partition
working node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201480061587.5A
Other languages
Chinese (zh)
Other versions
CN105706047A (en)
Inventor
M·M·泰默
G·D·高雷
J·D·杜纳根
G·伯吉斯
熊颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Amazon Technologies Inc
Original Assignee
Amazon Technologies Inc
Priority claimed from US14/077,167 (US10635644B2)
Application filed by Amazon Technologies Inc
Publication of CN105706047A
Application granted
Publication of CN105706047B

Abstract

A control node of a multi-tenant stream processing service receives a request indicating an operation to be performed on data records of a particular data stream. Based on a stream partitioning policy, the control node determines an initial number of worker nodes to be used. The control node configures a worker node to perform the operation on received data records. In response to a determination that the worker node is in an unhealthy state, the control node configures a replacement worker node.

Description

Partition-based data stream processing framework
Background
As the cost of data storage has declined over the years, and as the ability to interconnect various elements of the computing infrastructure has improved, more and more data pertaining to a wide variety of applications can potentially be collected and analyzed. For example, mobile phones can generate data indicating their locations, the applications being used by the phone users, and so on, at least some of which can be collected and analyzed in order to present customized coupons, advertisements and the like to the users. The analysis of data collected by surveillance cameras may be useful in preventing and/or solving crimes, and data collected from sensors embedded at various locations within airplane engines, automobiles or complex machinery may be used for various purposes such as preventive maintenance, improving efficiency and lowering costs.
The increase in volumes of streaming data has been accompanied by (and in some cases made possible by) the increasing use of commodity hardware. The advent of virtualization technologies for commodity hardware has provided benefits with respect to managing large-scale computing resources for many types of applications, allowing various computing resources to be efficiently and securely shared by multiple customers. For example, virtualization technologies may allow a single physical computing machine to be shared among multiple users by providing each user with one or more virtual machines hosted by the single physical computing machine. Each such virtual machine is a software simulation acting as a distinct logical computing system that provides users with the illusion that they are the sole operators and administrators of a given hardware computing resource, while also providing application isolation and security among the various virtual machines. Furthermore, some virtualization technologies are capable of providing virtual resources that span two or more physical resources, such as a single virtual machine with multiple virtual processors that spans multiple distinct physical computing systems. In addition to computing platforms, some large organizations also provide various types of storage services built using virtualization technologies. Using such storage services, large amounts of data can be stored with desired durability levels.
Despite the availability of virtualized computing and/or storage resources at relatively low cost from various providers, the management and orchestration of the collection, storage and processing of large, dynamically fluctuating streams of data remains a challenging proposition for a variety of reasons. As more resources are added to a system set up for handling large streams of data, for example, imbalances in workload between different parts of the system may arise. If left unaddressed, such imbalances may lead to severe performance problems at some resources, in addition to underutilization (and hence wastage) of other resources. Clients may also be concerned about the security of their streaming data, or of the results of analyzing streaming data, if such data or results are stored at facilities that the clients do not control. The failures that naturally tend to occur with increasing frequency as distributed systems grow in size, such as the occasional loss of connectivity and/or hardware failure, may also have to be addressed effectively to prevent costly disruptions of stream data collection, storage or analysis.
Brief description of the drawings
Fig. 1 provides a simplified overview of data stream concepts, according to at least some embodiments.
Fig. 2 provides an overview of the flow of data among various subcomponents of a stream management system (SMS) and a stream processing system (SPS) comprising a collection of stream processing stages, according to at least some embodiments.
Fig. 3 illustrates examples of respective sets of programmatic interfaces that may be implemented at an SMS and an SPS, according to at least some embodiments.
Fig. 4 illustrates an example web-based interface that may be implemented to enable SPS clients to generate graphs of stream processing stages, according to at least some embodiments.
Fig. 5 illustrates examples of programmatic record submission interfaces and record retrieval interfaces that may be implemented at an SMS, according to at least some embodiments.
Fig. 6 illustrates example elements of an ingestion subsystem of an SMS, according to at least some embodiments.
Fig. 7 illustrates example elements of a storage subsystem of an SMS, according to at least some embodiments.
Fig. 8 illustrates example elements of a retrieval subsystem of an SMS and examples of interactions of the retrieval subsystem with an SPS, according to at least some embodiments.
Fig. 9 illustrates examples of redundancy groups that may be set up for nodes of an SMS or an SPS, according to at least some embodiments.
Fig. 10 illustrates a provider network environment in which the nodes of a given redundancy group may be distributed among a plurality of data centers, according to at least some embodiments.
Fig. 11 illustrates a plurality of placement destinations that may be selected for nodes of an SMS or an SPS, according to at least some embodiments.
Fig. 12a and Fig. 12b illustrate examples of security option requests that may be submitted by SPS clients and SMS clients, respectively, according to at least some embodiments.
Fig. 13a illustrates example interactions between a stream data producer and an ingestion node of an SMS, according to at least some embodiments.
Fig. 13b illustrates example elements of a sequence number that may be generated for an ingested data record at an SMS, according to at least some embodiments.
Fig. 14 illustrates examples of ordered storage and retrieval of stream data records at an SMS, according to at least some embodiments.
Fig. 15 illustrates an example of a stream partition mapping and corresponding configuration decisions that may be made for SMS and SPS nodes, according to at least some embodiments.
Fig. 16 illustrates an example of dynamic stream repartitioning, according to at least some embodiments.
Fig. 17 is a flow diagram illustrating aspects of operations that may be performed to support respective sets of programmatic interfaces for data record ingestion and data record retrieval, according to at least some embodiments.
Fig. 18a is a flow diagram illustrating aspects of operations that may be performed to configure stream processing stages, according to at least some embodiments.
Fig. 18b is a flow diagram illustrating aspects of operations that may be performed in response to invocations of components of a client library for configuration of stream processing worker nodes, according to at least some embodiments.
Fig. 19 is a flow diagram illustrating aspects of operations that may be performed to implement one or more recovery policies for stream processing, according to at least some embodiments.
Fig. 20 is a flow diagram illustrating aspects of operations that may be performed to implement a plurality of security options for data streams, according to at least some embodiments.
Fig. 21 is a flow diagram illustrating aspects of operations that may be performed to implement a partitioning policy for data streams, according to at least some embodiments.
Fig. 22 is a flow diagram illustrating aspects of operations that may be performed to implement dynamic repartitioning of data streams, according to at least some embodiments.
Fig. 23 is a flow diagram illustrating aspects of operations that may be performed to implement an at-least-once record ingestion policy for data stream records, according to at least some embodiments.
Fig. 24 is a flow diagram illustrating aspects of operations that may be performed to implement a plurality of persistence policies for data streams, according to at least some embodiments.
Fig. 25 illustrates an example of a stream processing system in which worker nodes of a processing stage coordinate their workloads using a database table, according to at least some embodiments.
Fig. 26 illustrates example entries that may be stored in a partition assignment table used for workload coordination, according to at least some embodiments.
Fig. 27 illustrates aspects of operations that may be performed by a worker node of a stream processing stage to select partitions on which to perform processing operations, according to at least some embodiments.
Fig. 28 illustrates aspects of operations that may be performed by a worker node of a stream processing stage to update a partition assignment table based on information obtained from a stream management service control subsystem, according to at least some embodiments.
Fig. 29 illustrates aspects of load balancing operations that may be performed by worker nodes of a stream processing stage, according to at least some embodiments.
Fig. 30 is a block diagram illustrating an example computing device that may be used in at least some embodiments.
While embodiments are described herein by way of example for several embodiments and illustrative drawings, those skilled in the art will recognize that embodiments are not limited to the embodiments or drawings described. It should be understood that the drawings and detailed description are not intended to limit embodiments to the particular form disclosed; on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope defined by the appended claims. The headings used herein are for organizational purposes only and are not meant to limit the scope of the description or the claims. As used throughout this application, the word "may" is used in a permissive sense (i.e., meaning having the potential to) rather than the mandatory sense (i.e., meaning must). Similarly, the words "include", "including" and "includes" mean including, but not limited to.
Detailed description
Various embodiments of methods and apparatus are described for managing the creation, storage, retrieval and processing of large-scale data streams designed to handle hundreds or even thousands of concurrent data producers and data consumers. The term "data stream", as used herein, refers to a sequence of data records that may be generated by one or more data producers and accessed by one or more data consumers, where each data record is assumed to be an immutable sequence of bytes. A stream management service (SMS) may provide programmatic interfaces (e.g., application programming interfaces (APIs), web pages or web sites, graphical user interfaces, or command-line tools) to enable the creation, configuration and deletion of streams, as well as the submission, storage and retrieval of stream data records in some embodiments. Some types of stream operations that involve interactions with SMS control components (such as stream creation or deletion, or the kinds of dynamic repartitioning operations described below) may be referred to herein as "control-plane" operations, while operations such as data record submissions, storage and retrievals that typically do not require interactions with control components may be referred to herein as "data-plane" operations. Dynamically provisioned sets of compute, storage and networking resources may be used to implement the service in some such embodiments, based for example on partitioning policies that allow the stream management workload to be distributed in a scalable fashion among numerous service components, as described below in further detail. The acronym SMS may be used herein to refer to a stream management service, and also to a stream management system comprising the collection of virtual and/or physical resources used to implement a stream management service.
In various embodiments, some customers of the SMS may develop applications that directly invoke the SMS programmatic interfaces. In at least some embodiments, however, in addition to the SMS interfaces, a higher-level abstraction or application-level processing framework may be provided for customers. Such a framework may simplify various aspects of stream processing for those clients that do not wish to develop applications using the lower-level stream management functions supported directly by the SMS. The framework may provide its own programmatic interfaces (built, for example, on top of the SMS interfaces), enabling customers to focus more on the business logic to be implemented using stream records than on lower-level stream management operations. In some embodiments the higher-level framework may be implemented as a stream processing service (SPS) with its own control-plane and data-plane components, which may provide advanced functionality such as automated resource provisioning for stream processing, automated failovers of processing nodes, the ability to construct arbitrary stream processing workflow graphs, support for ephemeral streams, dynamic repartitioning based on workload changes or other triggering conditions, and so on. In at least some embodiments, the stream management service, the stream processing service, or both services may be implemented as multi-tenant managed network-accessible services in a virtualization environment. That is, in such embodiments various physical resources (such as computer servers or hosts, storage devices, networking devices and the like) may, at least in some cases, be shared among streams of different customers, without necessarily making the customers aware of exactly how the resources are being shared, or even making a customer aware that a given resource is being shared at all. Control components of the managed multi-tenant stream management and/or processing services may dynamically add, remove or reconfigure nodes or resources being used for a particular stream based on various applicable policies, some of which may be client-selectable. In addition, the control components may also be responsible for transparently implementing various types of security protocols (e.g., to ensure that one client's stream application cannot access another client's data, even though at least some hardware or software may be shared by both clients), monitoring resource usage for billing, generating logging information that can be used for auditing or debugging, and so on. From the perspective of clients of the managed multi-tenant service(s), the control and administrative functionality implemented by the service(s) may eliminate much of the complexity involved in supporting large-scale streaming applications. In some scenarios, customers of such multi-tenant services may be able to indicate that they do not wish to share resources for at least some types of stream-related operations, in which case some physical resources may be designated at least temporarily as single-tenant for those types of operations (i.e., limited to operations performed on behalf of a single customer or client).
A number of different approaches may be taken to the implementation of SMS and/or SPS control-plane and data-plane operations in various embodiments. For example, with respect to control-plane operations, in some implementations a redundancy group of control servers or nodes may be set up. The redundancy group may include a plurality of control servers, of which one server is designated as a primary server responsible for responding to administrative requests regarding various streams, while another server may be designated to take over as the primary in the event of a triggering condition such as a failure at (or loss of connectivity to) the current primary. In another implementation, one or more tables created at a network-accessible database service may be used to store control-plane metadata (such as partition maps) for various streams, and various ingestion, storage or retrieval nodes may be able to access the tables as needed to obtain the subsets of metadata required for data-plane operations. Details regarding various aspects of the SPS and SMS data-plane and control-plane functionality in different embodiments are provided below. It is noted that in some embodiments in which a stream management service is implemented, a stream processing service providing higher-level primitives may not necessarily be implemented. In other embodiments, only the high-level programmatic interfaces of a stream processing service may be exposed to customers, and the lower-level stream management interfaces it uses may not be made available to clients.
According to some embodiments, a stream management system may comprise a plurality of independently configurable subsystems, including a record ingestion subsystem primarily responsible for obtaining or collecting data records, a record storage subsystem primarily responsible for saving the data record contents in accordance with applicable persistence or durability policies, and a record retrieval subsystem primarily responsible for responding to read requests directed at the stored records. A control subsystem may also be implemented in some embodiments, comprising one or more administrative or control components responsible for configuring the remaining subsystems, e.g., by dynamically determining and/or initializing the required number of nodes for each of the ingestion, storage and retrieval subsystems at selected resources such as virtual or physical servers. Each of the ingestion, storage, retrieval and control subsystems may be implemented using a respective plurality of hardware and/or software components, which may collectively be referred to as the "nodes" or "servers" of the subsystem. The various resources of an SMS may thus logically be said to belong to one of four functional categories: ingestion, storage, retrieval or control. In some implementations, respective sets of control components may be established for each of the other subsystems, e.g., independent ingestion control subsystems, storage control subsystems and/or retrieval control subsystems may be implemented. Each such control subsystem may be responsible for identifying the resources to be used for the other nodes of the corresponding subsystem and/or for responding to administrative queries from clients or from other subsystems. In some implementations, pools of nodes capable of performing various types of SMS and/or SPS functions may be set up in advance, and selected members of those pools may be assigned to new streams or new processing stages as needed.
Stream partitioning policies and associated mappings may be implemented in at least some embodiments, e.g., to distribute subsets of the data records between different sets of ingestion, storage, retrieval and/or control nodes. For example, based on the partitioning policy selected for a particular data stream, as well as on other factors such as the expected record ingestion rates and/or retrieval rates, a control component may determine how many nodes (e.g., processes or threads) should be established initially (i.e., at stream creation time) for ingestion, storage and retrieval, and how those nodes should be mapped to virtual and/or physical machines. Over time, the workload associated with a given stream may increase or decrease, which (among other triggering conditions) may lead to repartitioning of the stream. Such repartitioning may involve changes to various parameters, such as the function used to determine a record's partition, the partitioning keys used, the total number of partitions, the number of ingestion, storage or retrieval nodes, or the placement of the nodes on different physical or virtual resources. In at least some embodiments, the repartitioning may be implemented dynamically without interrupting the flow of data records, using techniques described below in further detail. Different partitioning schemes and repartition-triggering criteria may be used for different data streams in some embodiments, e.g., based on client-provided parameters or on heuristics of the SMS control nodes. In some embodiments, it may be possible to limit the number and/or frequency of repartitions, e.g., based on client preferences, the expected lifetime of a stream, or other factors.
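A partition mapping of the kind described above might be sketched as follows. This is an illustrative sketch only: the use of an MD5 hash reduced modulo the partition count, the round-robin assignment of partitions to nodes, and all function and node names are assumptions made for the example, not details taken from the text.

```python
import hashlib
from typing import Dict, List, Tuple


def partition_for_key(partition_key: str, num_partitions: int) -> int:
    """Map a partition key to a partition index with a stable hash (assumed: MD5 mod N)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_partitions


def build_partition_map(num_partitions: int, nodes: List[str]) -> Dict[int, str]:
    """Assign each partition to a node round-robin (a hypothetical placement rule)."""
    return {p: nodes[p % len(nodes)] for p in range(num_partitions)}


def route(partition_key: str, partition_map: Dict[int, str]) -> Tuple[int, str]:
    """Return the (partition, node) pair responsible for a record's partition key."""
    partition = partition_for_key(partition_key, len(partition_map))
    return partition, partition_map[partition]


# Initial configuration: 4 partitions spread over 2 ingestion nodes.
initial_map = build_partition_map(4, ["ingest-0", "ingest-1"])
# Repartitioning changes the parameters (here, partition count and node count).
repartitioned_map = build_partition_map(8, ["ingest-0", "ingest-1", "ingest-2"])
```

In this sketch, repartitioning amounts to building a new map with different parameters; how a changeover could happen without interrupting record flow is outside the scope of the example.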
A number of different record ingestion policies and interfaces may be implemented in different embodiments. For example, in some embodiments, clients (e.g., executable components or modules configured to invoke the programmatic interfaces of the SMS on behalf of customers of the SMS) may utilize either in-line submission interfaces or by-reference submission interfaces. For in-line submissions, the contents or body of the data record may be included as part of the submission request in such embodiments. In contrast, in a by-reference submission request, an address (such as a storage device address, a database record address, or a URL (Uniform Resource Locator)) may be provided from which the contents or body of the data record can be obtained. In some implementations, a hybrid submission interface may also or instead be supported, in which up to the first N bytes of the data record may be included in-line, while the remaining bytes (if any) are provided by reference. In such a scenario, short records (whose bodies are less than N bytes long) may be fully specified by the submission request, while portions of longer records may have to be obtained from the corresponding address.
In addition to the different alternatives for specifying record contents during ingestion, in some embodiments a variety of acknowledgement-related or de-duplication-related ingestion policies may also be implemented. For example, for some stream applications, clients may wish to ensure that each and every data record is ingested reliably by the SMS. In large distributed stream management environments, packets may be lost, or various failures may occur from time to time along the path between the data producers and the ingestion nodes, which could potentially result in some submitted data being lost. In some embodiments, therefore, an SMS may implement an at-least-once ingestion policy, in accordance with which a record submitter may submit the same record one or more times until a positive acknowledgement is received from the ingestion subsystem. Under normal operating conditions, a record may be submitted once, and the submitter may receive an acknowledgement after the receiving ingestion node has obtained and stored the record. If the acknowledgement is lost or delayed, or if the record submission request itself is lost, the submitter may resubmit the same data record one or more times, until eventually an acknowledgement is received. The ingestion node may, for example, generate an acknowledgement for each submission, regardless of whether it is a duplicate, based on an expectation that the record would not be resubmitted if an acknowledgement had already been received by the submitter. In at least some embodiments, however, the ingestion node may be responsible for recognizing that the same data record has been submitted multiple times, and for avoiding storing new copies of the duplicate data unnecessarily. In one embodiment, at least two versions of the at-least-once ingestion policy may be supported: one version (which may be termed "at-least-once ingestion, no-duplication") in which the SMS is responsible for de-duplicating data records (i.e., ensuring that data is stored at the SMS storage subsystem in response to only one of a set of two or more submissions), and one version (which may be termed "at-least-once, duplication-permitted") in which duplication of data records stored by the SMS is permitted. The at-least-once, duplication-permitted approach may be useful for stream applications in which there are few or no negative consequences of data record duplication, and/or for stream applications that perform their own duplicate elimination. Other ingestion policies may also be supported, such as a best-effort ingestion policy in which acknowledgements are not required for every data record submitted. In at least some embodiments, the loss of a few data records may be acceptable if a best-effort ingestion policy is in effect. Clients may select which ingestion policies they wish to use for various streams in various embodiments.
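The at-least-once, no-duplication behavior described above can be sketched as a submitter that retries until it sees an acknowledgement, paired with an ingestion node that acknowledges every submission but stores each record only once. The client-supplied `record_id` used to recognize duplicates, the retry limit, and the simulated lossy transport are assumptions made for this illustration.

```python
from typing import Callable, Dict, Optional, Tuple


class IngestionNode:
    """Acknowledges every submission but stores each record only once."""

    def __init__(self) -> None:
        self.stored: Dict[str, bytes] = {}   # record_id -> body
        self.sequence: Dict[str, int] = {}   # record_id -> assigned sequence number
        self._next_seq = 0

    def submit(self, record_id: str, body: bytes) -> int:
        if record_id not in self.sequence:   # first time this record is seen
            self.sequence[record_id] = self._next_seq
            self.stored[record_id] = body
            self._next_seq += 1
        return self.sequence[record_id]      # the same ack is returned for duplicates


Send = Callable[[IngestionNode, str, bytes], Optional[int]]


def submit_at_least_once(node: IngestionNode, record_id: str, body: bytes,
                         send: Send, max_attempts: int = 5) -> Tuple[int, int]:
    """Resubmit the same record until an acknowledgement arrives."""
    for attempt in range(1, max_attempts + 1):
        ack = send(node, record_id, body)
        if ack is not None:
            return ack, attempt
    raise RuntimeError("no acknowledgement after %d attempts" % max_attempts)


def make_lossy_send(acks_to_drop: int) -> Send:
    """Simulated transport: the record is delivered, but the first few acks are lost."""
    state = {"dropped": 0}

    def send(node: IngestionNode, record_id: str, body: bytes) -> Optional[int]:
        ack = node.submit(record_id, body)
        if state["dropped"] < acks_to_drop:
            state["dropped"] += 1
            return None
        return ack

    return send


node = IngestionNode()
ack, attempts = submit_at_least_once(node, "r1", b"hello", make_lossy_send(2))
```

Under the duplication-permitted variant, the membership check in `submit` would simply be omitted and every submission would be stored.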
With respect to the storage of stream records, a number of alternative policies may also be supported in at least some embodiments. For example, a client may be able to choose a persistence policy from among several supported by the SMS. The persistence policy governs such aspects of record storage as the number of copies of a given data record that are to be stored and the type of storage technology to be used for the copies (e.g., volatile or non-volatile RAM, rotating disk-based storage, solid-state devices (SSDs), network-attached storage devices, and the like). For example, if a client selects an N-replica persistence policy using disk-based storage, a data record submission may not be considered complete until N copies of the record have been safely written to N respective disk devices. In at least some embodiments in which disk-based storage devices are used, the SMS storage subsystem may attempt to write the incoming data records of a given partition sequentially to disk, e.g., to avoid the performance impact of disk seeks. Sequence numbers may be generated for (and stored with) data records using various techniques described below, including for example timestamp-based techniques that enable ordered record retrieval based on ingestion times. In at least some embodiments, the data records of a given partition may be stored together, e.g., contiguously on disk, and separately from the data records of other partitions. In some implementations, at least some data records may be archived to a different type of storage and/or deleted from the SMS after a time period, in accordance with a retention policy (selected by a client or by the SMS) or a de-duplication time window policy (indicating the time period, subsequent to the submission of any given data record, during which the SMS may be required to ensure that no duplicates of that record are stored in the SMS storage subsystem, even if some duplicates are submitted). Such removal operations may be referred to herein as stream "trimming". In some embodiments, clients may submit stream trimming requests, e.g., notifying the SMS that specified data records are no longer needed and can therefore be deleted from the perspective of the client submitting the trimming request, or explicitly requesting the deletion of specified data records. In scenarios in which there may be multiple clients consuming the data records of a given stream, the SMS may be responsible for ensuring that a given record is not deleted or trimmed prematurely, before it has been accessed by all the interested consumers. In some implementations, if there are N data consumers of a given stream, the SMS may wait until it has determined that all N data consumers have read or processed a given record R of the stream before deleting R. The SMS may determine that R has been read by all the consumers based, for example, on respective trimming requests from the consumers, or on respective indications of how far within the stream the data consumers have progressed. In some embodiments, some types of data consumers (such as testing-related applications) may accept the deletion of at least a small subset of data records before they have been accessed. Accordingly, in at least some embodiments, applications may be able to notify the SMS regarding the acceptability of data deletion prior to retrieval, and the SMS may schedule deletions in accordance with the notifications. In some embodiments, an archival policy may be implemented, e.g., as part of the data retention policy, indicating for example the types of storage devices to which stream data records should be copied and the scheduling policies to be used for such copies.
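The rule that a record R is deleted only after all N consumers have read it can be sketched by tracking, per consumer, the highest sequence number read so far and trimming up to the minimum across consumers. The class and method names below are hypothetical, and the sketch assumes each consumer reads in sequence-number order.

```python
from typing import Dict, Iterable, List


class TrimTracker:
    """Tracks how far each registered consumer has progressed within a stream."""

    def __init__(self, consumers: Iterable[str]) -> None:
        # -1 means "has not read anything yet"; at least one consumer is assumed.
        self.progress: Dict[str, int] = {c: -1 for c in consumers}

    def report_progress(self, consumer: str, sequence_number: int) -> None:
        """Record that `consumer` has read everything up to `sequence_number`."""
        self.progress[consumer] = max(self.progress[consumer], sequence_number)

    def safe_trim_point(self) -> int:
        """Highest sequence number that every consumer has already read."""
        return min(self.progress.values())


def trim(records: Dict[int, bytes], tracker: TrimTracker) -> List[int]:
    """Delete records that all consumers have read; return the removed sequence numbers."""
    point = tracker.safe_trim_point()
    removed = sorted(seq for seq in records if seq <= point)
    for seq in removed:
        del records[seq]
    return removed


records = {seq: b"payload" for seq in range(5)}
tracker = TrimTracker(["consumer-a", "consumer-b"])
tracker.report_progress("consumer-a", 3)
tracker.report_progress("consumer-b", 1)
first_trim = trim(records, tracker)   # only records read by both consumers are removed
```

The slowest consumer bounds what can be trimmed, which matches the requirement that a record is not deleted before every interested consumer has accessed it.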
In at least some embodiments, a plurality of programmatic interfaces may also be supported for record retrieval. In one embodiment, an iterator-based approach may be used, in which one programmatic interface (such as getIterator) is used to instantiate and position an iterator or cursor at a specified logical offset (for example, based on sequence number or timestamp) within a partition of a stream. A different programmatic interface (such as getNextRecords) may then be used to read a specified number of data records sequentially, starting from the current position of the iterator. The instantiation of an iterator in effect allows a client to specify an arbitrary or random starting position for record retrieval within the stream partition. If a client wishes to read data records in a random access pattern in such an embodiment, the client may have to create new iterators repeatedly. In storage systems based on rotating disks, the disk seeks required for frequent random accesses may significantly affect I/O response times. Accordingly, as an incentive for clients to read stream data records sequentially rather than randomly, in at least some embodiments different (for example, higher) billing rates may be applied to random read accesses than to sequential read accesses. Thus, in some implementations, a client may be billed X currency units per getIterator call and Y currency units per record retrieved via getNextRecords, with X>Y. When alternative client interfaces are supported for other operation categories (such as ingestion), in at least some embodiments the billing rates or prices for the alternatives may also differ; for example, a client may be charged more for a by-reference submission request than for an inline submission request, just as a client may be charged more for random reads than for sequential reads. In various embodiments, other factors may also influence billing, such as the sizes of the data records, the distribution of write versus read requests over time, the persistence policies selected, and so on.
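The effect of the X>Y pricing on the two access patterns can be made concrete with a small calculation. The function name and the rates below are illustrative assumptions only; the text does not specify actual prices.

```python
def read_charge(iterator_calls, records_read, x_per_iterator, y_per_record):
    """Total charge for a read workload, in currency units.

    x_per_iterator is the price of one getIterator call (X) and
    y_per_record the price of one record returned by getNextRecords (Y).
    The scheme described in the text assumes X > Y.
    """
    if x_per_iterator <= y_per_record:
        raise ValueError("the incentive scheme assumes X > Y")
    return iterator_calls * x_per_iterator + records_read * y_per_record


# Reading 1000 records sequentially needs a single iterator.
sequential = read_charge(iterator_calls=1, records_read=1000,
                         x_per_iterator=5, y_per_record=1)

# Reading the same 1000 records in random order needs one iterator per read.
random_access = read_charge(iterator_calls=1000, records_read=1000,
                            x_per_iterator=5, y_per_record=1)
```

With these assumed rates the random pattern costs several times more than the sequential one for the same number of records, which is the incentive the paragraph describes.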
According to some embodiments, a stream processing service (SPS) may allow clients to specify arbitrarily complex processing workflows comprising numerous processing stages, in which the output of the processing performed at a given stage may be used as input for zero or more other stages. In some embodiments, partitioning policies (similar to those described for the SMS for ingesting, storing and retrieving data records) may be used to divide the processing workload among a plurality of worker nodes at each stage. In one such embodiment, programmatic SPS interfaces may be implemented that enable clients to specify various configuration settings for any given stage, including, for example, the input data sources for the stage (e.g., one or more streams from which data records are to be retrieved, together with the partitioning policies for those streams), the processing operations to be performed at the stage, and a descriptor or specification for the distribution of output or results from the stage (e.g., whether the output is to be saved to storage locations, sent to a network endpoint, or fed into one or more other processing stages in the form of a different stream). In at least some embodiments, the processing operations specified for an SPS stage may be idempotent: that is, if a given processing operation is performed multiple times on the same input data, the result does not differ from the result that would have been obtained had the operation been performed only once. If the processing operations are idempotent, recovery from failures (such as a worker node failure at an SPS stage) may be simplified, as described in further detail below. According to some embodiments, non-idempotent processing operations may be permitted at some or all SPS stages.
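The distinction between idempotent and non-idempotent processing operations can be illustrated with two toy operations. Both functions and the `replay` helper are hypothetical examples, not operations named in the text.

```python
def apply_max(state, record):
    """Idempotent: keeps the largest value seen per key."""
    key, value = record
    state[key] = max(state.get(key, value), value)
    return state


def apply_count(state, record):
    """Not idempotent: every repeated application changes the result."""
    key, _ = record
    state[key] = state.get(key, 0) + 1
    return state


def replay(operation, records, times):
    """Apply the operation to the whole record list `times` times over."""
    state = {}
    for _ in range(times):
        for record in records:
            operation(state, record)
    return state


records = [("a", 3), ("b", 5), ("a", 7)]
max_once = replay(apply_max, records, 1)
max_twice = replay(apply_max, records, 2)
count_once = replay(apply_count, records, 1)
count_twice = replay(apply_count, records, 2)
```

Replaying the input leaves the maximum-per-key result unchanged, whereas the counter doubles, which is why repeated processing after a failure is harmless only in the idempotent case.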
Based at least in part on configuration information received via the SPS programmatic interfaces, such as the input stream partitioning policies and the nature of the processing operations, in various embodiments SPS control servers may determine how many worker nodes are to be set up initially for each stage of a processing workflow. The performance capabilities of the resources to be used for the worker nodes (e.g., the virtual or physical machines being used) may also be taken into account when determining the initial number and placement of the worker nodes. The selected number of worker nodes (each of which may, in some implementations, comprise an executable thread or an executable process) may then be instantiated. Each worker node may be configured, for example, to obtain data records from the appropriate input sources (e.g., from the retrieval nodes of one or more stream partitions), to perform the specified processing operations on the data records, and to transmit the results of the processing to the specified output destinations. In addition, in at least some embodiments, a checkpoint scheme may be implemented, in accordance with which a given worker node may be configured to store progress records or checkpoints indicating the portion of a partition that has been processed at that worker node, on the assumption that the partition records are processed sequentially. In some implementations, the worker node may, for example, write a progress record to persistent storage periodically (e.g., once every N seconds or once every R processed data records) and/or in response to checkpoint requests from an SPS control server.
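A worker that writes a progress record once every R processed data records might look like the sketch below. The class name, the `progress_store` dictionary (standing in for persistent storage) and the `last_processed` key are assumptions made for illustration.

```python
class CheckpointingWorker:
    """Processes records in order and checkpoints every `every_n_records`."""

    def __init__(self, progress_store, every_n_records):
        self.progress_store = progress_store  # stand-in for persistent storage
        self.every_n_records = every_n_records
        self._since_checkpoint = 0

    def process(self, records, operation):
        for sequence_number, data in records:
            operation(data)
            self._since_checkpoint += 1
            if self._since_checkpoint >= self.every_n_records:
                self.save_checkpoint(sequence_number)

    def save_checkpoint(self, sequence_number):
        # The progress record names the last sequence number fully processed.
        self.progress_store["last_processed"] = sequence_number
        self._since_checkpoint = 0


store = {}
seen = []
worker = CheckpointingWorker(store, every_n_records=3)
worker.process([(n, f"r{n}") for n in range(1, 8)], seen.append)
```

After seven records with R = 3, the stored progress record points at record 6; record 7 has been processed but not yet checkpointed, so it would be reprocessed after a failure.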
In some embodiments, the progress records may be used for rapid recovery from worker node failures. For example, an SPS control server may monitor the health status of each worker node over time, e.g., using a heartbeat mechanism and/or by monitoring resource utilization levels (such as CPU utilization, I/O device utilization, or network utilization levels). In response to a determination by the SPS control server that a particular worker node is in an undesired or unhealthy state (e.g., if it is unresponsive or overloaded), a replacement worker node may be instantiated to take over the responsibilities of the particular worker node. The replacement worker node may access the most recent progress record stored by the replaced worker node to identify the set of data records that the replacement worker node should process. In embodiments in which the processing operations are idempotent, even if some operations are repeated (e.g., because the most recent progress record was written some time before the replacement worker node was instantiated), the overall results of the processing are not affected by the failure and replacement. In some implementations, in addition to storing progress records indicating the subset of a given stream or partition that it has processed, a worker node may also be configured to store accumulated application state information. For example, if a stream processing workflow is responsible for determining client billing amounts for a particular service based on an analysis of stream data records that indicate service usage metrics, a worker node may periodically store the cumulative billing amounts determined for various clients.
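Checkpoint-based recovery with an idempotent operation can be sketched as a small simulation. Everything here is hypothetical scaffolding: `run_worker`, the `fail_after` switch used to simulate a crash, and the choice of doubling a value as the processing operation.

```python
def run_worker(records, results, progress, fail_after=None, checkpoint_every=2):
    """Process records after the last checkpoint; return False on simulated failure.

    Results are keyed by sequence number, so reprocessing a record simply
    overwrites the same entry (an idempotent operation).
    """
    start = progress.get("last_sn", 0)
    done = 0
    for sn, value in records:
        if sn <= start:
            continue  # already covered by the progress record
        if fail_after is not None and done == fail_after:
            return False  # simulated crash before processing this record
        results[sn] = value * 2
        done += 1
        if done % checkpoint_every == 0:
            progress["last_sn"] = sn
    if records:
        progress["last_sn"] = max(start, records[-1][0])
    return True


records = [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50)]

# Uninterrupted run, for comparison.
expected, _ = {}, run_worker(records, expected_results := {}, {})

# First worker crashes after three records; its last checkpoint is at record 2.
results, progress = {}, {}
first_ok = run_worker(records, results, progress, fail_after=3)
checkpoint_at_failure = progress["last_sn"]

# The replacement worker resumes from the progress record and repeats record 3.
second_ok = run_worker(records, results, progress)
```

Record 3 is processed twice, once by each worker, yet the final results match the uninterrupted run because the operation is idempotent.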
In at least some embodiments, the SPS control servers may also be configured to respond to various other triggers, such as changing workload levels or detected workload imbalances (e.g., if the ingestion rate for one partition becomes disproportionately higher than those of other partitions), by initiating other actions, such as requesting dynamic repartitioning of the input streams for various stages, changing the number of worker nodes assigned to a given partition at a given stage, assigning higher-performance worker nodes to some stages, or transferring worker nodes from one physical resource to another physical resource with a different performance capability. In some embodiments, for example in response to a determination by an SPS control server that a best-effort recovery policy (rather than a checkpoint-based recovery policy) is to be implemented for a given stage, progress records of the type described above may not be stored by the worker nodes of at least some SPS stages. In some implementations of such a best-effort recovery policy, a replacement worker node may simply process new data records as they are received, without requiring access to progress records. In some embodiments, if a client wishes to implement a best-effort recovery policy at an SPS stage, the stream processing operations performed at the stage need not be idempotent. In some embodiments in which non-idempotent processing operations are to be performed on stream records at an SPS stage, checkpoint-based recovery may not be supported, and a different recovery scheme such as best-effort recovery may be used. In at least one embodiment, only idempotent stream processing operations may be allowed at SPS stages.
The data records of some streams may contain sensitive or confidential information, or the processing operations performed at the SPS stages may involve proprietary algorithms whose discovery by competitors could be problematic. Clients may therefore be concerned about the security of various categories of stream management and processing operations, especially if the operations are performed using resources located at provider network data centers that are not fully controlled by the clients themselves. Networks set up by an entity such as a company or a public sector organization to provide one or more network-accessible services (such as various types of cloud-based database, computing or storage services), accessible via the Internet and/or other networks to a distributed set of clients, may be termed provider networks herein. In some implementations, clients may be able to choose from among a plurality of security-related options for their data streams. As described above, a combined SPS and SMS configuration may comprise nodes belonging to a number of different functional categories, such as control nodes for the SMS and/or the SPS, SMS ingestion nodes, SMS storage nodes, SMS retrieval nodes, and SPS processing or worker nodes. In some embodiments, the security-related choices made available to clients may include options for the placement or location of the various types of nodes. For example, in one embodiment, a client may be able to request that the SPS worker nodes for one or more processing stages of a stream workflow be implemented at computing devices located at client-owned facilities, even if the stream records are initially collected and/or stored using resources located at a provider network. In response to such placement requests, nodes of different functional categories for a given stream may be instantiated at respective resource sets with differing security characteristics or profiles.
In different embodiments, the resource sets may differ from one another in various security-related characteristics, including, for example, physical location, the physical security protocols in use (e.g., who has physical access to the resources), network isolation levels (e.g., the extent to which the network addresses of the resources are visible to various entities), multi-tenancy versus single-tenancy, and so on. In some embodiments, clients may be able to establish isolated virtual networks (IVNs) within a provider network, with a given client being given substantial control over the networking configuration of the various devices included within that client's IVN. In particular, clients may be able to restrict access to the network addresses (e.g., Internet Protocol or IP addresses) assigned to the various servers or compute instances within their IVNs. In such embodiments, clients may request that certain subsets of their SMS or SPS nodes be instantiated within specified IVNs. In embodiments in which provider network resources such as virtualization instance hosts (which may typically be configured as multi-tenant hosts) are used for various categories of SMS or SPS nodes, a client may request that some set of nodes be instantiated on instance hosts that are restricted to implementing instances belonging to that client alone (i.e., some SMS or SPS nodes may be implemented at instance hosts configured as single-tenant hosts).
In some embodiments, as another security-related option, clients may request that the data records of a particular stream be encrypted before they are transmitted over a network link, for example before being ingested at the SMS, between the ingestion subsystem and the storage subsystem, between the storage subsystem and the retrieval subsystem, between the retrieval subsystem and the SPS worker nodes, and/or between the worker nodes and the SPS output destinations. In some embodiments, clients may specify the encryption algorithms to be used. In one embodiment, secure networking protocols such as TLS (Transport Layer Security) or SSL (Secure Sockets Layer) may be used for data record transmissions and/or for transmitting SPS processing results.
Data stream concepts and overview
Fig. 1 provides a simplified overview of data stream concepts, according to at least some embodiments. As shown, a stream 100 may comprise a plurality of data records (DRs) 110, such as DRs 110A, 110B, 110C, 110D and 110E. One or more data producers 120 (which may also be referred to as data sources), such as data producers 120A and 120B, may perform write operations 151 to generate the contents of the data records of stream 100. Many different types of data producers may generate streams of data in different embodiments, such as mobile phone or tablet applications, sensor arrays, social media platforms, logging applications or system logging components, monitoring agents of various kinds, and so on. One or more data consumers 130 (such as data consumers 130A and 130B) may perform read operations 152 to access the contents of the data records generated by the data producers 120. In some embodiments, data consumers 130 may comprise, for example, the worker nodes of a stream processing stage.
In at least some embodiments, a given data record 110 as stored in an SMS may comprise a data portion 101 (e.g., data portions 101A, 101B, 101C, 101D and 101E of DRs 110A, 110B, 110C, 110D and 110E respectively) and a sequence number SN 102 (e.g., SNs 102A, 102B, 102C, 102D and 102E of DRs 110A, 110B, 110C, 110D and 110E respectively). In the depicted embodiment, the sequence number 102 may indicate the order in which the DRs are received at a stream management system (or at a particular node of a stream management system). In some implementations, the data portions 101 may comprise immutable, uninterpreted byte sequences: that is, once a write operation 151 is completed, the contents of the DR generated as a result of the write may not be changed by the SMS, and in general the SMS may not be aware of the semantics of the data. In some implementations, different data records of a given stream 100 may comprise different amounts of data, while in other implementations all the data records of a given stream may be of the same size. In at least some implementations, nodes of the SMS (e.g., ingestion subsystem nodes and/or storage subsystem nodes) may be responsible for generating the SNs 102. As described in further detail below, the sequence numbers of the data records need not always be consecutive. In one implementation, clients or data producers 120 may provide, as part of a write request, an indication of a minimum sequence number to be used for the corresponding data record. In some embodiments, data producers 120 may submit write requests that contain pointers to (or addresses of) the data portions of the data records, e.g., by providing a storage device address (such as a device name and an offset within the device) or a network address (such as a URL) from which the data portion may be obtained.
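The record structure described above (an immutable data portion plus a sequence number that need not be consecutive and may honor a producer-supplied minimum) can be sketched as follows. `DataRecord` and `SequenceAssigner` are hypothetical names; the assignment rule is one simple way to satisfy the stated properties, not necessarily the one the SMS uses.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class DataRecord:
    """A stored record: the SMS never modifies either field after the write."""
    sequence_number: int
    data: bytes  # uninterpreted bytes; the SMS does not know their semantics


class SequenceAssigner:
    """Assigns increasing sequence numbers at one (hypothetical) ingestion node."""

    def __init__(self):
        self._last = 0

    def assign(self, data, min_sequence_number=None):
        candidate = self._last + 1
        if min_sequence_number is not None:
            # Honor the producer's requested minimum; this may leave a gap.
            candidate = max(candidate, min_sequence_number)
        self._last = candidate
        return DataRecord(candidate, bytes(data))


assigner = SequenceAssigner()
first = assigner.assign(b"alpha")
second = assigner.assign(b"beta", min_sequence_number=10)
third = assigner.assign(b"gamma")
```

The sequence numbers here come out increasing but not consecutive, while still reflecting the order in which the records were received.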
In various embodiments, the stream management service may be responsible for receiving the data from the data producers 120, storing the data, and enabling the data consumers 130 to access the data in one or more access patterns. In at least some embodiments, the stream 100 may be partitioned or "sharded" to distribute the workload of receiving, storing and retrieving the data records. In such embodiments, a partition or shard may be selected for an incoming data record 110 based on one or more attributes of the data record, and the specific nodes that are to ingest, store or retrieve the data record may be identified based on the partition. In some implementations, the data producers 120 may provide an explicit partitioning key with each write operation, which may serve as the partitioning attribute, and such keys may be mapped to partition identifiers. In other implementations, the SMS may infer the partition ID based on such factors as the identity of the data producer 120, the IP address of the data producer, or even the contents of the data submitted. In some implementations in which data streams are partitioned, sequence numbers may be assigned on a per-partition basis: for example, although the sequence numbers may indicate the order in which the data records of a particular partition are received, the sequence numbers of data records DR1 and DR2 in two different partitions may not necessarily indicate the relative order in which DR1 and DR2 were received. In other implementations, the sequence numbers may be assigned on a stream-wide rather than a per-partition basis, so that if the sequence number SN1 assigned to a data record DR1 is lower than the sequence number SN2 assigned to a data record DR2, this implies that DR1 was received earlier than DR2 by the SMS, regardless of the partitions to which DR1 and DR2 belong.
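Mapping a producer-supplied partitioning key to a partition identifier, with sequence numbers assigned per partition, can be sketched as below. Hashing the key with SHA-256 and taking the remainder is an assumption chosen for the example; the text does not state which mapping function is used, and `PartitionedStream` is a hypothetical in-memory stand-in.

```python
import hashlib
from collections import defaultdict


def partition_id(partition_key, partition_count):
    """Map a partitioning key to a partition id in [0, partition_count)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % partition_count


class PartitionedStream:
    """In-memory stream whose sequence numbers are assigned per partition."""

    def __init__(self, partition_count):
        self.partition_count = partition_count
        self._next_sn = defaultdict(lambda: 1)
        self.partitions = defaultdict(list)

    def put_record(self, partition_key, data):
        pid = partition_id(partition_key, self.partition_count)
        sn = self._next_sn[pid]
        self._next_sn[pid] = sn + 1
        self.partitions[pid].append((sn, data))
        return pid, sn


stream = PartitionedStream(partition_count=4)
pid_a1, sn_a1 = stream.put_record("sensor-17", b"reading-1")
pid_a2, sn_a2 = stream.put_record("sensor-17", b"reading-2")
```

Records with the same key always land in the same partition and are ordered there; under this per-partition scheme, sequence numbers from two different partitions say nothing about the relative arrival order of their records.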
In various embodiments, the retrieval or read interfaces supported by an SMS may allow data consumers 130 to access data records sequentially and/or in random order. In one embodiment, an iterator-based set of read application programming interfaces (APIs) may be supported. A data consumer 130 may submit a request to obtain an iterator for a data stream, with the initial position of the iterator indicated by a specified sequence number and/or a partition identifier. After the iterator is instantiated, the data consumer may submit requests to read data records in sequential order, starting from that initial position within the stream or partition. In such embodiments, if a data consumer wishes to read data records in some random order, a new iterator may have to be instantiated for each read. In at least some implementations, the data records of a given partition or stream may be written to disk-based storage in sequence number order, typically using sequential write operations that avoid disk seeks. Sequential read operations may also avoid the overhead of disk seeks. Accordingly, in some embodiments, pricing incentives may be used to encourage data consumers to perform more sequential reads than random reads: for example, random-access read operations such as iterator instantiations may have higher associated billing rates than sequential-access read operations.
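The iterator-based read pattern can be sketched with a small in-memory partition. The method names mirror the getIterator and getNextRecords interfaces mentioned in the text, but the signatures, the integer iterator representation and the class itself are assumptions made for illustration.

```python
import bisect


class Partition:
    """Records of one partition, kept in sequence-number order."""

    def __init__(self, records):
        self._records = sorted(records)  # list of (sequence_number, data)
        self._sns = [sn for sn, _ in self._records]

    def get_iterator(self, sequence_number):
        """Position an iterator at the first record with SN >= sequence_number."""
        return bisect.bisect_left(self._sns, sequence_number)

    def get_next_records(self, iterator, limit):
        """Read up to `limit` records sequentially; return them and a new iterator."""
        batch = self._records[iterator:iterator + limit]
        return batch, iterator + len(batch)


partition = Partition([(1, b"a"), (3, b"b"), (4, b"c"), (8, b"d")])

it = partition.get_iterator(3)                      # one positioning call
batch1, it = partition.get_next_records(it, 2)      # sequential read
batch2, it = partition.get_next_records(it, 2)      # continues where it left off
batch3, it = partition.get_next_records(it, 2)      # nothing left to read
```

A sequential consumer positions the iterator once and keeps reading, whereas a consumer jumping to arbitrary sequence numbers would call `get_iterator` before every read.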
Exemplary system environment
Fig. 2 is provided according at least some embodiments in Workflow Management System (SMS) and adopting including the stream process stage The general introduction of data flow among each sub-components of the stream processing system (SPS) of collection.As shown, SMS 280 may include absorbing Subsystem 204, storage subsystem 206, retrieval subsystem 208 and SMS control subsystems 210.As described below, SMS subsystems In each may include for example using the various resources in provider network (or client all or third party's facility) The one or more nodes or component that the corresponding executable thread or process of place's instantiation are realized.Absorb the section of subsystem 204 Point can based on the partitioning strategies for the stream come by (for example, by node of SMS control subsystems 210) configuration come obtain come From the data record of the specific data stream of data manufacturer 120 (such as 120A, 120B and 120C), and each intake node can The data record of reception is transferred to the corresponding node of storage subsystem 206.Storage subsystem node can be used for institute according to selection Data record is stored on any different types of storage device by the persistence strategy for stating stream.The node of retrieval subsystem 208 It may be in response to the read requests from data consumer, such as working node of SPS 290.Subsystem can be controlled by means of SPS System 220 is established the stream process stage 215, stage 215A, 215B, 1215C and 215D of such as SPS 290.Each stage 215 can wrap It includes to be configured by SPS control subsystems 220 and be worked with executing one group of the one or more of processing operation in the data record of reception Node.As shown, some stages 215 (such as 215A and 215B) directly can obtain data record from SMS280, and it is other Stage, (such as 215C and 215D) can receive their input from other stages.In some embodiments, multiple SPS stages 215 can parallel 
work-flow, such as at stage 215A and 215B, different processing operations can be in the data retrieved from identical stream It is performed simultaneously on record.It should be noted that similar to the corresponding subsystem shown in Figure 2 for those of specific stream and processing rank Section can also be instantiated for other streams.
In at least some embodiments, at least some of the nodes of the subsystems and processing stages shown in Fig. 2 may be implemented using provider network resources. As noted earlier, networks set up by an entity such as a company or a public sector organization to provide one or more network-accessible services (such as various types of cloud-based database, computing or storage services), accessible via the Internet and/or other networks to a distributed set of clients, may be termed provider networks herein. Some of the services may be used to build higher-level services: for example, computing, storage or database services may be used as building blocks for a stream management service or a stream processing service. At least some of the core services of a provider network may be packaged for client use in service units called "instances": for example, a virtual machine instantiated by a virtualized computing service may represent a "compute instance", a storage device such as a block-level volume instantiated by a storage service may be referred to as a "storage instance", and a database management server may be referred to as a "database instance". Computing devices, such as servers, at which such units of the various network-accessible services of a provider network are implemented may be referred to herein as "instance hosts" or, more simply, as "hosts". In some embodiments, nodes of the ingestion subsystem 204, the storage subsystem 206, the retrieval subsystem 208, the SMS control subsystem 210, the processing stages 215 and/or the SPS control subsystem 220 may comprise threads or processes executing at various compute instances on a plurality of instance hosts. A given instance host may comprise several compute instances, and the collection of compute instances at a particular instance host may be used to implement nodes for a variety of different streams of one or more clients. In some embodiments, storage instances may be used to store the data records of various streams, or as destinations for the results of stream processing stages. As described below with reference to Fig. 15 and Fig. 16, control subsystem nodes may, over time, dynamically modify the populations of other subsystems in response to various triggering conditions, e.g., by adding or removing nodes, by changing the mappings of nodes to processes, compute instances or instance hosts, or by repartitioning a given stream while continuing to receive, store and process data records.
In the context of embodiments in which provider network resources are used for stream-related operations, the term "client", when used as the source or destination of a given communication, may refer to any of the computing devices, processes, hardware modules or software modules that are owned by, managed by, or allocated to an entity (such as an organization, a group with multiple users, or a single user) that is capable of accessing and using at least one network-accessible service of the provider network. Clients of one service may themselves be implemented using the resources of another service: for example, a stream data consumer (a client of the stream management service) may comprise a compute instance (a resource provided by a virtualized computing service).
A given provider network may include numerous data centers (which may be distributed across different geographical regions) hosting various resource pools, such as collections of physical and/or virtualized computer servers, storage servers each with one or more storage devices, networking equipment and the like, needed to implement, configure and distribute the infrastructure and services offered by the provider. In such embodiments, a number of different hardware and/or software components, some of which may be instantiated or executed at different data centers or in different geographical regions, may collectively be used to implement each of the services. Clients may interact with the resources and services of the provider network from devices located at client-owned or client-managed premises or data centers external to the provider network, and/or from devices within the provider network. It is noted that although provider networks serve as one example context in which many of the stream management and processing techniques described herein may be implemented, those techniques may also be applied to types of distributed systems other than provider networks, e.g., to large-scale distributed environments operated by a single business entity for its own applications.
Programmatic interface examples
As indicated above, in at least some embodiments an SPS may use SMS programmatic interfaces to build higher-level functionality that SPS clients can use more easily to implement the desired business logic for various stream-based applications. An analogy may be helpful when considering the differences between SPS and SMS functionality. SPS functions may generally be compared to programming language constructs in higher-level languages (such as C++), while SMS functions may generally be compared to the assembly language instructions into which a compiler translates those constructs. It may be possible to implement the same operations using assembly language instructions directly, but programming in the higher-level language is typically easier for many categories of customers or users. Similarly, it may be possible to implement various applications using the primitives provided by an SMS, but it may be easier to do so using SPS features. SPS processing operations (such as idempotent processing operations performed on data records) may be implemented on the data contents of the stream records, while SMS operations are performed to acquire, store and retrieve the records themselves, usually without considering the contents of the records. Fig. 3 shows example sets of programmatic interfaces that may be implemented at an SMS and an SPS respectively, according to at least some embodiments. A number of different application programming interfaces (APIs) are indicated for both the SMS and the SPS by way of example. The APIs shown are not intended to be an exhaustive list of those supported in any given implementation, and some of the APIs shown may not be supported in a given implementation.
As indicated by arrow 350, SPS clients 375 may invoke SPS programmatic interfaces 305 to configure processing stages. Various types of SPS programmatic interfaces 305 may be implemented in different embodiments. For example, a createStreamProcessingStage API may enable clients to request the configuration of a new processing stage 215 for a specified input stream, such that each of the worker nodes of the stage is configured to perform a set of idempotent operations specified in the interface invocation and to distribute the results to destinations indicated by an output distribution descriptor or policy. In some versions of the createStreamProcessingStage API or its equivalent, a client may also request the creation of the input stream, while in other versions the input stream may have to be created before the processing stage is created. A recovery policy may be specified for the worker nodes, indicating, for example, whether a checkpoint-based recovery technique is to be used or whether a best-effort recovery technique is preferred. In some embodiments, an initializeWorkerNode API may be supported to request the explicit instantiation of worker nodes at a specified stage. In embodiments in which checkpoint-based recovery is implemented, a saveCheckpoint API may be supported to allow clients to request that progress records be generated by worker nodes.
Various types of SPS output management APIs may be supported in different embodiments, such as a setOutputDistribution API by which a client may indicate one or more streams to be created from the results of the processing operations performed at a specified stage, and the particular partitioning policies to be used for the newly created streams. Some processing stages may be configured primarily for repartitioning: for example, one partitioning function PF1, which maps data records to N1 partitions based on record attribute set A1, may be in use for an input stream S1, and a processing stage may be used to implement a different partitioning function PF2 that maps those same data records to N2 partitions (using either a different attribute set A2 or the same attribute set A1). Some SPS APIs, such as linkStage, may be used to configure arbitrary graphs (e.g., directed acyclic graphs) comprising a plurality of stages. In some embodiments, connectors to third-party or open-source stream processing frameworks or services may be supported. In one such embodiment, an SPS stage may be used to prepare data records (e.g., by appropriately formatting the results of the processing operations performed at the stage) for consumption by existing third-party or open-source systems. In the depicted embodiment, an API such as createThirdPartyConnector may be used to set up such connectors, and the appropriate transformation of the results of the SPS stage into a format compatible with the third-party system may be performed by one or more connector modules instantiated as a result of a createThirdPartyConnector invocation.
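Linking stages into a directed acyclic graph, and rejecting a link that would create a cycle at ordering time, can be sketched with Python's standard `graphlib` module. The `Workflow` class and its `link_stage` method are hypothetical and only loosely modeled on the linkStage API named above; its real signature is not given in the text.

```python
from graphlib import CycleError, TopologicalSorter


class Workflow:
    """A set of stages plus directed links from upstream to downstream stages."""

    def __init__(self):
        self._upstream = {}  # stage name -> set of stages feeding into it

    def link_stage(self, upstream, downstream):
        self._upstream.setdefault(upstream, set())
        self._upstream.setdefault(downstream, set()).add(upstream)

    def execution_order(self):
        """Return stages so that every stage follows all of its inputs.

        Raises graphlib.CycleError if the links do not form a DAG.
        """
        return list(TopologicalSorter(self._upstream).static_order())


# Stage names follow Fig. 2: 215A and 215B read from the SMS,
# while 215C and 215D take their input from other stages.
workflow = Workflow()
workflow.link_stage("215A", "215C")
workflow.link_stage("215B", "215C")
workflow.link_stage("215C", "215D")
order = workflow.execution_order()

cyclic = Workflow()
cyclic.link_stage("s1", "s2")
cyclic.link_stage("s2", "s1")
try:
    cyclic.execution_order()
    cycle_detected = False
except CycleError:
    cycle_detected = True
```

The particular fan-in chosen here (both 215A and 215B feeding 215C) is an assumption for the example; the text only states that 215C and 215D receive their inputs from other stages.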
The SPS may invoke SMS APIs 307 to perform at least some of its functions, as indicated by arrow 352. In the depicted embodiment, the SMS APIs 307 may include, for example, createStream and deleteStream (to create and delete a stream, respectively) and getStreamInfo (to obtain metadata for a stream, such as the network addresses of the various types of nodes responsible for a given partition). A putRecord interface may be used to write data records, while the getIterator and getNextRecord interfaces may be used for non-sequential and sequential reads respectively. In some embodiments, a repartition stream interface may be used to request dynamic repartitioning of a specified stream. Clients 370 that wish to do so may invoke the SMS APIs 307 directly, as indicated by arrow 354. As indicated earlier, various other SMS and/or SPS APIs may also be implemented in other embodiments, and some of the APIs listed in Fig. 3 may not be implemented in some embodiments.
In various embodiments, programmatic interfaces other than APIs may also or instead be implemented for either the SPS or the SMS. Such interfaces may include graphical user interfaces, web pages or web sites, command-line interfaces, and the like. In some cases, web-based interfaces or GUIs may use the APIs as building blocks: for example, a web-based interaction may result in the invocation of one or more APIs at control components of the SMS or SPS. Fig. 4 shows an example web-based interface that may be implemented to enable SPS clients to generate graphs of stream processing stages, according to at least some embodiments. As shown, the interface comprises a web page 400 with a message area 402, a menu area 404 and a graph design area 403.
Users may be provided with general instructions regarding the construction of stream processing graphs in message area 402, as well as links for learning more about stream concepts and primitives. A number of graphical icons may be provided in menu area 404 as part of a stream processing graph toolset. For example, clients may be allowed to indicate persistent streams 451, ephemeral streams 452, or connectors 453 to third-party processing environments as the inputs or outputs of various SPS processing stages. With respect to the SPS/SMS for which the web-based interface is implemented, a persistent stream 451 may be defined as a stream whose data records are stored on persistent storage devices such as disks, non-volatile RAM or SSDs, and an ephemeral stream 452 may be defined as one whose data records need not be stored at persistent storage devices. An ephemeral stream may be created, for example, from the output of an SPS stage that is expected to be consumed as input by a different SPS stage at which a best-effort recovery policy is to be implemented.
Two types of processing stages are supported in the example SPS graph construction web page 400: stages 455, in which checkpoint-based worker node recovery is used (e.g., each worker node saves progress records at intervals, and in the event of a failure of a particular worker node, a replacement node refers to the failed node's progress records to determine which data records to start processing); and stages 456, in which best-effort recovery is used (e.g., replacement worker nodes do not refer to progress records, but simply start processing new data records as they are received). Details regarding the processing operations to be performed at each stage may be entered by clicking on the corresponding icon in the graph construction area 403, as indicated by the instructions in message area 402. In addition to icons for streams, connectors and processing stages, menu area 404 also includes icon type 459, indicating third-party or external stream processing systems, and icon type 460, indicating nodes of a storage service that may be implemented at the provider network whose resources are being used for the processing stages.
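By way of non-limiting illustration, the difference between the two recovery policies can be sketched as a choice of the position at which a replacement worker node begins processing. The function name `resume_position`, the policy labels, and the assumption that sequence numbers are consecutive integers starting at 0 are hypothetical and are introduced only for this sketch.

```python
from typing import Optional


def resume_position(policy: str,
                    last_checkpoint: Optional[int],
                    newest_sequence_number: int) -> int:
    """Return the sequence number at which a replacement worker starts.

    Assumption for this sketch: sequence numbers are consecutive integers
    starting at 0, and a progress record holds the last processed number.
    """
    if policy == "checkpoint":
        # Checkpoint-based recovery: consult the failed node's progress
        # record and continue just after the last processed record.
        return 0 if last_checkpoint is None else last_checkpoint + 1
    if policy == "best_effort":
        # Best-effort recovery: ignore progress records and only handle
        # records that arrive after the replacement takes over.
        return newest_sequence_number + 1
    raise ValueError(f"unknown recovery policy: {policy}")
```

Under these assumptions, a checkpoint-based replacement may reprocess records handled after the last progress record was saved, whereas a best-effort replacement may skip records that arrived while no worker node was active.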
In the example scenario shown in Fig. 4, a client has constructed a graph 405 comprising three processing stages 412, 415 and 416 within graph design area 403. Processing stage 412, which is configured to use checkpoint-based recovery, uses a persistent stream 411 as input. The output or results of the processing at stage 412 are sent to two destinations: in the form of a different persistent stream 413 that forms the input of stage 415, and in the form of an ephemeral stream 414 that forms the input of stage 416. Stages 415 and 416 both use best-effort recovery policies for their worker nodes. The output of stage 415 is sent in the form of an ephemeral stream to storage service node 419. The output of stage 415 is sent via connector 417 to third-party processing system 418. A "Save graph" button 420 may be used to save a representation of the processing stage graph, e.g., in any appropriate format such as JSON (JavaScript Object Notation), XML (Extensible Markup Language) or YAML. In various embodiments, arbitrarily complex processing workflows may be constructed using tools similar to those shown in Fig. 4. The workflows created using such tools may subsequently be activated, and such activations may result in invocations of SMS APIs; for example, to obtain data records for a processing stage such as stage 412 of Fig. 4, the getIterator and/or getNextRecord interfaces may be invoked on stream 411.
Fig. 5 illustrates examples of programmatic record submission interfaces and record retrieval interfaces that may be implemented at an SMS, according to at least some embodiments. In the depicted embodiment, data records such as the illustrated DRs 110K and 110Q may be submitted to the SMS via various types of programmatic ingestion interfaces 510. In some embodiments, a DR 110 may comprise four types of elements: a stream identifier such as 501A (for stream "S1") or 501B (for stream "S2"); an indication of the data or body of the record; an optional partition key 504 (such as 504A or 504B); and an optional sequencing preference indicator 506 (such as sequencing preference indicators 506A and 506B). In some data records the data itself may be provided in-line (e.g., in-line data 502 of DR 110K), while for other data records a pointer or address 503 may be provided, indicating to the SMS a network-accessible location (or an address at a local device that does not require network transfers). In some embodiments, a given stream may support both in-line and by-reference (address-based) data record submissions. In other embodiments, a given stream may require data producers to supply all the data in-line or all the data by reference. In some implementations, a data record submission may include a partition identifier to be used for the record.
In the depicted embodiment, incoming data records 110 may be directed to respective ingestion and/or storage nodes based on a partitioning policy. Similarly, record retrieval may also be partition-based; e.g., one or more retrieval nodes may be designated for responding to read requests directed to records of a given partition. For some streams, data producers may be required to provide an explicit partition key with each data record write request. For other streams, the SMS may distribute the data records according to a partitioning scheme that relies on metadata or attributes other than explicitly supplied partition keys; e.g., identification information pertaining to the submitting data producer may be used as a partition key, or some or all of the submitting data producer's IP address may be used, or a portion of the data being submitted may be used. In some implementations, for example, a hash function may be applied to a partition key to obtain an integer value of a certain size, such as a 128-bit integer. The full range of non-negative integers of that size (e.g., from 0 to 2^128-1) may be divided into N contiguous sub-ranges, with each sub-range representing a respective partition. Thus, in such an example, any given partition key determined or supplied for a data record would be hashed to a corresponding 128-bit integer, and the contiguous sub-range of 128-bit integers to which that integer belongs may indicate the partition to which the data record belongs. Further details about partitioning policies and their use are provided below with reference to Fig. 15.
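The hash-based mapping of partition keys to contiguous sub-ranges described above might be sketched as follows. The use of MD5 (chosen here only because its digest is exactly 128 bits), the equal-width sub-ranges, and the names `partition_for_key` and `KEY_SPACE_BITS` are assumptions of this sketch; no particular hash function or sub-range layout is prescribed by the description.

```python
import hashlib

KEY_SPACE_BITS = 128  # size of the hashed integer, as in the 128-bit example


def partition_for_key(partition_key: str, num_partitions: int) -> int:
    """Map a partition key to the index of one of N contiguous sub-ranges.

    Sub-range i is assumed to cover the hashed values h for which
    i * 2**128 / N <= h < (i + 1) * 2**128 / N.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # MD5 yields 16 bytes, i.e. an integer in the range 0 .. 2**128 - 1.
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hashed = int.from_bytes(digest, "big")
    # floor(hashed * N / 2**128) is the index of the enclosing sub-range.
    return (hashed * num_partitions) >> KEY_SPACE_BITS
```

Because the mapping depends only on the key and on N, every record carrying the same partition key is directed to the same partition for as long as the number of partitions is unchanged.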
The set of nodes responsible for ingesting or accepting the data records of a particular partition, storing the data records, and responding to read requests directed to the particular partition is collectively referred to in Fig. 5 as the ISR (ingestion, storage and retrieval) nodes for the partition. The notation Sj-Pk is used to indicate the k-th partition of stream Sj. In the illustrated embodiment, ISR nodes 520A are configured for ingesting, storing and retrieving records of partition S1-P1, ISR nodes 520B are set up for records of partition S1-P2, ISR nodes 520C are set up for records of partition S1-P3, ISR nodes 520K are set up for records of partition S2-P1, and ISR nodes 520L are set up for records of partition S2-P2. In some embodiments, a given node of an ingestion subsystem, a storage subsystem or a retrieval subsystem may be configured to handle data records of more than one partition (or more than one partition of more than one stream). In some embodiments, the records of a single partition of a given stream may be ingested, stored or retrieved by more than one node. The number of ingestion nodes designated for a given partition Sj-Pk may, in at least some cases, differ from the number of ingestion nodes designated for a different partition Sj-Pl, and may also differ from the number of storage nodes designated for Sj-Pk and/or the number of retrieval nodes designated for Sj-Pk. With respect to ingestion and/or retrieval, SMS control nodes may implement APIs (such as getStreamInfo) in some embodiments to allow clients to determine which nodes are responsible for which partitions. The mappings between data records and partitions, and between partitions and the ISR nodes (or control nodes) configured, may be modified over time, as described below in the discussion of dynamic repartitioning.
In some embodiments, several different programmatic interfaces 580 may be implemented for retrieving or reading stream data records from a given partition. As shown in Fig. 5, some retrieval interfaces 581 may be implemented for non-sequential access, such as getIterator (to instantiate an iterator or read cursor at or after a data record with a specified sequence number) or getRecord (to read a data record with a specified sequence number). Other retrieval interfaces 582 may be implemented for sequential retrieval, such as getNextRecord (an interface requesting that N records be read from the current position of an iterator, in order of increasing sequence number). In rotating-disk-based storage systems, as mentioned earlier, sequential I/O may in many cases be much more efficient than random I/O, because the number of disk head seeks required on average per I/O is typically much lower for sequential I/O than for random I/O. In many embodiments, the data records of a given partition may be written in sequence-number order, and as a result sequential read requests based on sequence-number ordering (e.g., using getNextRecord or a similar interface) may be much more efficient than random read requests. In at least some embodiments, therefore, different billing rates may be set for sequential versus non-sequential retrieval interfaces; for example, clients may be charged more for non-sequential reads.
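A minimal in-memory sketch of the two access patterns is given below, assuming that the records of a partition are kept in increasing sequence-number order. The class `PartitionLog` and its method names are hypothetical stand-ins that merely mirror the getIterator and getNextRecord interfaces; they are not the interfaces of any actual service.

```python
import bisect
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class PartitionLog:
    """Records of one partition, kept sorted by sequence number."""
    sequence_numbers: List[int] = field(default_factory=list)
    payloads: List[Any] = field(default_factory=list)

    def put_record(self, sequence_number: int, payload: Any) -> None:
        # Assumption: records arrive in increasing sequence-number order.
        if self.sequence_numbers and sequence_number <= self.sequence_numbers[-1]:
            raise ValueError("sequence numbers must increase")
        self.sequence_numbers.append(sequence_number)
        self.payloads.append(payload)

    def get_iterator(self, sequence_number: int) -> int:
        """Non-sequential access: position a cursor at or after a number."""
        return bisect.bisect_left(self.sequence_numbers, sequence_number)

    def get_next_records(self, iterator: int, limit: int) -> Tuple[List[Tuple[int, Any]], int]:
        """Sequential access: read up to `limit` records from the cursor."""
        end = min(iterator + limit, len(self.sequence_numbers))
        records = list(zip(self.sequence_numbers[iterator:end],
                           self.payloads[iterator:end]))
        return records, end  # `end` is the advanced cursor
```

In this sketch the iterator is a plain list index, so a sequential read is a contiguous slice, which loosely parallels the reduced seek cost of sequential I/O noted above.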
Ingestion subsystem
Fig. 6 illustrates example elements of an ingestion subsystem 204 of an SMS, according to at least some embodiments. In the depicted embodiment, ingestion operations are logically divided into front-end and back-end functions, with the front-end functions involving interactions with data producers 120 (e.g., 120A, 120B or 120C), and the back-end functions involving interactions with the SMS storage subsystem. Such a front-end/back-end split may have several advantages, such as enhancing the security of the storage subsystem and avoiding having to provide partitioning policy details to data producers. SMS client libraries 602 may be provided for installation at various data producers 120, and the data producers may invoke programmatic interfaces included in the libraries 602 to submit data for ingestion. For example, in one embodiment the data producers 120 may comprise logging or monitoring agents instantiated at hundreds or thousands of physical and/or virtual servers of a provider network. Such agents may collect various log messages and/or metrics at their respective servers and periodically submit the collected messages or metrics to a front-end load distributor 604 endpoint instantiated by one or more ingestion control nodes 660 of the SMS. In some embodiments, one or more virtual IP addresses (VIPs) may be established for the load distributors, to which data producers may submit the stream data. In one implementation, a round-robin DNS (Domain Name System) technique may be used for a VIP to select, from among several equivalently configured load distributors, the particular load distributor to which data is to be sent by data producers 120.
In the depicted embodiment, the received data records may be directed to any of several front-end nodes 606 (e.g., 606A, 606B or 606C). In at least some embodiments, the load distributor 604 may not be aware of the partitioning policies 650 in use for the data records, and the front-end node 606 for a given data record may therefore be chosen using round-robin load balancing (or some other general-purpose load balancing algorithm) rather than partition-based load balancing. The front-end nodes 606 may be aware of the partitioning policies 650 for the various streams, and may interact with the ingestion control nodes 660 to obtain the identity of the specific back-end ingestion node 608 (e.g., 608A, 608B or 608C) that is configured for a given partition's data records. Accordingly, in the depicted embodiment, the front-end nodes 606 may each transmit data records to a plurality of back-end nodes 608, based on the respective partitions to which the data records belong. As noted earlier, the partition to which a data record belongs may be determined based on any combination of various factors, such as a partition key supplied by the data producer, one or more other attributes such as the identity or address of the data producer, or the contents of the data.
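The two routing steps described above (partition-unaware selection of a front-end node, followed by partition-aware forwarding to a back-end node) might be sketched as follows. The class names, the record layout, and the dictionary standing in for information obtained from an ingestion control node are hypothetical and serve only to illustrate the division of responsibilities.

```python
import itertools
from typing import Callable, Dict, Iterable


class LoadDistributor:
    """Chooses front-end nodes round-robin; knows nothing about partitions."""

    def __init__(self, front_end_nodes: Iterable[str]) -> None:
        self._cycle = itertools.cycle(list(front_end_nodes))

    def pick_front_end(self) -> str:
        return next(self._cycle)


class FrontEndNode:
    """Applies the partitioning policy and forwards by partition."""

    def __init__(self,
                 partition_of: Callable[[dict], str],
                 back_end_for_partition: Dict[str, str]) -> None:
        # partition_of: stand-in for the stream's partitioning policy.
        # back_end_for_partition: stand-in for the mapping that a front-end
        # node would obtain from an ingestion control node.
        self._partition_of = partition_of
        self._back_end_for_partition = back_end_for_partition

    def route(self, record: dict) -> str:
        """Return the back-end node configured for the record's partition."""
        return self._back_end_for_partition[self._partition_of(record)]
```

A consequence of this arrangement is that any front-end node can accept any record, while all records of a given partition still converge on the back-end node configured for that partition.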
The back-end nodes 608 may each receive data records belonging to one or more partitions of one or more streams, and transmit the data records to one or more nodes of the storage subsystem. In some embodiments, the back-end nodes may be referred to as "PUT servers", in which case the data is submitted via HTTP (HyperText Transfer Protocol) "PUT" web service APIs. A given back-end node may determine the set of storage subsystem nodes to which its data records are to be transmitted by submitting a query to a control node 660 (which in turn may submit a corresponding query to a control node of the storage subsystem, in embodiments in which control functions for the different subsystems are handled by separate sets of nodes).
In at least some embodiments, a number of different ingestion acknowledgement policies 652 may be supported, such as an at-least-once ingestion policy or a best-effort ingestion policy. In an at-least-once policy, the data producers 120 may require positive acknowledgements for each data record submitted, and may repeatedly submit the same data record (if an acknowledgement of the first submission is not received) until an acknowledgement is eventually received. In the best-effort ingestion policy, positive acknowledgements may not be required for at least some data records submitted (although the ingestion subsystem may still provide occasional acknowledgements, or may respond to explicit requests for acknowledgements from the data producers). In some embodiments in which the ingestion subsystem 204 is required to provide acknowledgements to the data producers, the back-end ingestion node 608 responsible for a given data record may wait until the required number of replicas of the data record have been successfully created at the storage subsystem (e.g., in accordance with a persistence policy established for the stream) before generating an acknowledgement. In various embodiments, a sequence number may be generated by the ingestion subsystem for each data record received, e.g., indicative of the order in which that record was ingested relative to other records of the same partition or stream, and such a sequence number may be returned to the data producer as an acknowledgement, or as part of an acknowledgement. Further details regarding sequence numbers are provided below with reference to Fig. 13a and Fig. 13b. In some implementations, the acknowledgement and/or sequence number may be transmitted back to the data producer via a front-end node 606. In at least one implementation, the at-least-once policy may be implemented between the front-end and back-end nodes of the ingestion subsystem itself; e.g., a given front-end node 606 may repeatedly submit a data record to the appropriate back-end node 608 until the back-end node provides an acknowledgement.
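The at-least-once submission behavior described above can be sketched as a simple retry loop. The function `submit_at_least_once`, the `send` callable (assumed to return the assigned sequence number as the acknowledgement, or `None` when no acknowledgement arrives), and the `max_attempts` bound are all assumptions of this sketch; the description itself places no limit on the number of resubmissions.

```python
from typing import Callable, Optional


def submit_at_least_once(send: Callable[[bytes], Optional[int]],
                         record: bytes,
                         max_attempts: int = 5) -> int:
    """Resubmit the same record until it is acknowledged.

    `send` is a hypothetical transport call. The attempt bound is added
    here only so that the sketch terminates.
    """
    for _ in range(max_attempts):
        sequence_number = send(record)
        if sequence_number is not None:
            # The sequence number serves as (part of) the acknowledgement.
            return sequence_number
    raise TimeoutError(f"no acknowledgement after {max_attempts} attempts")
```

Because a submission may succeed even though its acknowledgement is lost, a record submitted this way may be ingested more than once, which is why the policy guarantees only at-least-once ingestion.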
The ingestion control nodes 660 may be responsible, among other functions, for: instantiating the front-end and back-end nodes, monitoring the health and workload levels of the nodes, orchestrating failovers as needed, providing responses to queries regarding which nodes are responsible for a given partition or to policy-related queries, and performing ingestion-related configuration operations resulting from dynamic repartitioning of streams. In some embodiments, the number of ingestion control nodes designated for a given set of one or more streams may itself change over time; e.g., one or more master control nodes may be responsible for reconfiguring the control node pool as needed. In some embodiments in which redundancy groups are set up for ingestion front-end or back-end nodes, as described in further detail below with respect to Fig. 9 and Fig. 10, the control nodes 660 may be responsible for keeping track of which nodes are primaries and which are non-primaries, for detecting the triggering conditions for failover, and for selecting replacements when failovers are required. It is noted that the multi-layered ingestion subsystem architecture illustrated in Fig. 6 may not be implemented in some embodiments; e.g., only a single set of ingestion nodes may be configured in some scenarios.
Storage subsystem
Fig. 7 illustrates example elements of a storage subsystem of an SMS, according to at least some embodiments. As shown, ingestion nodes 608 (e.g., back-end ingestion nodes in embodiments in which front-end and back-end ingestion responsibilities are handled by different sets of nodes) may transmit data records of one or more partitions of a stream to respective storage nodes 702 configured for those partitions. For example, data record 110A of partition S1-P1 is sent to storage node 702A, data record 110B of partition S2-P3 is sent to storage nodes 702B and 702C, data record 110C of partition S3-P7 is sent to storage node 702D, and data record 110D of partition S4-P5 is sent initially to storage node 702E. Storage control nodes 780 may be responsible, in the depicted embodiment, for: enforcing the persistence policies 750 that are applied to data records of different streams, configuring and reconfiguring storage nodes as needed, monitoring storage node states, managing failovers, responding to storage configuration queries or storage policy queries, and various other administrative tasks.
Persistence policies 750 may differ from one another in various ways in different embodiments. For example, a persistence policy P1 applied to stream Sj may differ from a policy P2 applied to stream Sk in: (a) the number of replicas of each data record to be stored; (b) the type of storage device or system on which the replicas are to be stored (e.g., whether replicas are to be stored in volatile memory, non-volatile caches, rotating-disk-based storage, solid-state drives (SSDs), storage appliances of various kinds, RAID (redundant arrays of inexpensive disks) of various kinds, in database management systems, at nodes of a storage service implemented by the provider network, and so forth); (c) the geographical distribution of the replicas (e.g., whether the stream data is to be made resilient to large-scale failures or certain types of disasters by placing replicas in different data centers); (d) the write acknowledgement protocol (e.g., if N replicas are to be stored, how many of the N copies have to be written successfully before an acknowledgement should be provided to the ingestion node); and/or (e) whether, in cases in which multiple replicas of data records are to be stored, the replicas should be created in parallel or sequentially. In some cases in which multiple replicas are to be stored, as in the case of data record 110D, a given storage node may transmit the data record to another storage node (e.g., storage node 702E sends data record 110D for further replication to storage node 702F, and storage node 702F sends it on to storage node 702G). In other cases in which a multiple-replica persistence policy is used, as in the case of data record 110B for which two in-memory replicas are to be stored, the ingestion node may initiate the multiple replications in parallel. In at least some embodiments, the client's chosen persistence policy may not specify the type of storage location to be used for stream data records; instead, the SMS may select the appropriate types of storage technology and/or locations based on various criteria, such as cost, performance, proximity to data sources, durability requirements, and so on. In one embodiment, either the client or the SMS may decide to use different storage technologies or storage location types for different partitions of a given stream, or for different streams.
In the example shown in Fig. 7, the persistence policy applied to stream S1 (or at least to partition S1-P1 of stream S1) is a single-replica in-memory policy, while for stream S2 a two-parallel-replica in-memory policy is applied. Accordingly, an in-memory replica 704A of data record 110A is created at storage node 702A, while two in-memory replicas 705A and 705B corresponding to data record 110B are created in parallel at storage nodes 702B and 702C. For stream S3's data record 110C, a single on-disk replica 706A is created. For stream S4, a sequential three-replica-on-disk policy is applicable, and as a result respective on-disk replicas 707A, 707B and 707C are created sequentially at storage nodes 702E, 702F and 702G. Various other types of persistence policies may be applied to data streams in different embodiments. Nodes of the retrieval subsystem may obtain the data records from the appropriate storage nodes in response to invocations of various types of retrieval APIs by data consumers.
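The parallel and sequential replication patterns of Fig. 7 might be modeled, purely for illustration, as lists of (sender, receiver) hops derived from a persistence policy. The `PersistencePolicy` fields and the `replication_plan` function are hypothetical names chosen for this sketch, and the policy captures only three of the dimensions discussed above (replica count, medium, and parallel versus sequential creation).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PersistencePolicy:
    replica_count: int  # number of replicas of each data record
    medium: str         # e.g. "memory" or "disk" (informational only here)
    parallel: bool      # True: replicas created in parallel; False: in a chain


def replication_plan(policy: PersistencePolicy,
                     ingestion_node: str,
                     storage_nodes: List[str]) -> List[Tuple[str, str]]:
    """Return the (sender, receiver) transfers needed to create the replicas."""
    if len(storage_nodes) < policy.replica_count:
        raise ValueError("not enough storage nodes for the policy")
    targets = storage_nodes[:policy.replica_count]
    if policy.parallel:
        # The ingestion node itself initiates every replication.
        return [(ingestion_node, target) for target in targets]
    # Sequential: each storage node passes the record on to the next one.
    hops, sender = [], ingestion_node
    for target in targets:
        hops.append((sender, target))
        sender = target
    return hops
```

For instance, under these assumptions the sequential three-replica policy of stream S4 yields the chain from the ingestion node to 702E, from 702E to 702F, and from 702F to 702G, while the two-parallel-replica policy of stream S2 yields two transfers that both originate at the ingestion node.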
Retrieval subsystem and processing stages
Fig. 8 illustrates example elements of a retrieval subsystem of an SMS and examples of interactions of the retrieval subsystem with an SPS, according to at least some embodiments. As shown, retrieval subsystem 206 may comprise a plurality of retrieval nodes 802, such as retrieval nodes 802A, 802B and 802C, as well as a collection of retrieval control nodes 880. Each of the retrieval nodes 802 may be configured to respond to stream data retrieval requests from various clients or data consumers 130, such as the worker nodes 840 of an SPS described below. A variety of programmatic retrieval interfaces 802 may be implemented by the retrieval nodes in different embodiments, such as the non-sequential and sequential retrieval interfaces described earlier. In some embodiments, web services APIs such as HTTP GET requests may be used for data record retrieval, and the retrieval nodes 802 may accordingly be referred to as GET servers. In the depicted embodiment, a given retrieval node 802 may be configured, e.g., by a retrieval control node 880, to obtain data records of one or more stream partitions from the appropriate set of storage subsystem nodes 702, such as storage nodes 702A and 702B.
In the depicted embodiment, a retrieval node 802 may interact with one or more storage nodes 702, and may also respond to retrieval requests received from one or more SPS worker nodes 840. For example, data records of partition S4-P5 (e.g., data record 110K) and of partition S5-P8 (e.g., data record 110L) are read from storage node 702A by retrieval node 802A, and provided to worker nodes 840A and 840K respectively. Data records of partition S6-P7, such as 110M, are read by retrieval node 802B from storage node 702A and provided to worker node 840K. Data records of partition S4-P7 are read by retrieval node 802C from storage node 702B and provided to worker node 840B, and also to other data consumers 130 (e.g., data consumers that directly invoke SMS retrieval APIs instead of interacting with the SMS via an SPS).
In at least some embodiments, some or all of the retrieval nodes 802 may implement respective caches 804 (such as cache 804A at retrieval node 802A, cache 804B at retrieval node 802B, and cache 804C at retrieval node 802C) in which data records of various partitions may be retained temporarily in anticipation of future retrieval requests. Retrieval control nodes 880 may be responsible for implementing a number of retrieval policies 882, including for example caching policies (e.g., how large a cache should be configured for a given partition, and how long data records should be cached), storage node selection policies (e.g., which particular storage node should be contacted first to obtain a given data record, in scenarios in which multiple replicas of data records are stored), and so on. In addition, retrieval control nodes may be responsible for instantiating and monitoring retrieval nodes 802, responding to queries regarding which retrieval nodes are responsible for which partitions, initiating or responding to repartitioning operations, and so on.
In the illustrated example, SPS 290 comprises two processing stages, 215A and 215B. SPS control nodes 885 may be responsible for instantiating worker nodes 840 at the various processing stages 215, such as worker node 840A to process records of partition S4-P5, worker node 840B to process records of partition S4-P7, and worker node 840K to process records of partitions S5-P8 and S6-P7. The SPS control nodes 885 may implement programmatic interfaces (such as those illustrated in Fig. 3 and Fig. 4) enabling SPS clients to design processing workflows. Various checkpoint policies 850 may be implemented for different processing stages or workflows, indicating when or if worker nodes are to store progress records indicating how far along they are in processing their respective partitions, the types of storage devices to be used for the progress records, and so on. Failover/recovery policies 852 may indicate the triggering conditions or thresholds that are to lead to replacing a worker node with a different node, and whether best-effort recovery or checkpoint-based recovery is to be used for a given processing stage. In at least some embodiments, the SPS control nodes 885 may interact with various types of SMS control nodes, e.g., to identify the retrieval nodes from which data records of a given stream are to be obtained, to establish new ephemeral or persistent streams that may be required for a particular processing workflow, and so on. In at least one embodiment, clients may interact with the SPS control nodes to instantiate streams; e.g., some clients may wish to invoke only higher-level SPS interfaces rather than utilizing SMS control interfaces. It is noted that although separate sets of control nodes are shown in Fig. 6, Fig. 7 and Fig. 8 for the SMS ingestion, storage and retrieval subsystems and for the SPS stages, in at least some embodiments a given control node may be used for several of the subsystems and/or the SPS.
Node redundancy groups
In at least some embodiments, redundant groups of nodes may be configured for one or more subsystems of an SMS. That is, instead of, for example, configuring one retrieval node for retrieving data records of a stream partition Sj-Pk, two or more nodes may be established for such retrievals, with one node being granted a "primary" or active role at a given point in time, while the other node or nodes are designated as "non-primary" nodes. The current primary node may be responsible for responding to work requests, e.g., requests received either from clients or from nodes of other subsystems. The non-primary node or nodes may remain dormant until a failover is triggered, e.g., due to a failure, loss of connectivity to the primary, or other triggering conditions, at which point a selected non-primary may be notified by a control node to take over the responsibilities of the previous primary. The primary role may thus be revoked from the current incumbent primary node during failover, and granted to a current non-primary node. In some embodiments, non-primary nodes may themselves take over as primary when a determination is made that a failover is to occur (e.g., explicit notifications may not be required). Such redundant groups of nodes may be set up for ingestion, storage, retrieval and/or control functions at an SMS in various embodiments, and a similar approach may also be taken for worker nodes at an SPS in at least some embodiments. Such groups comprising at least one primary node and at least one non-primary node for a given function may be referred to as "redundancy groups" or "replication groups" in some embodiments. It is noted that redundancy groups of storage nodes may be implemented independently of the number of physical copies of the data records that are stored; e.g., the number of replicas to be stored of a data record may be determined by a persistence policy, while the number of storage nodes configured for the corresponding partition may be determined based on redundancy group policies.
Fig. 9 illustrates examples of redundancy groups that may be set up for nodes of an SMS or an SPS, according to at least some embodiments. In the depicted embodiment, for a given stream partition Sj-Pk, respective redundancy groups (RGs) 905, 915, 925 and 935 are set up for ingestion nodes, storage nodes, retrieval nodes, and control nodes. A common RG 935 for control nodes is implemented in the illustrated embodiment, although separate RGs for ingestion control nodes, storage control nodes or retrieval control nodes may be implemented in some embodiments. Each RG comprises a primary node (e.g., primary ingestion node 910A, primary storage node 920A, primary retrieval node 930A, and primary control node 940A) and at least one non-primary node (e.g., non-primary ingestion node 910B, non-primary storage node 920B, non-primary retrieval node 920C, and non-primary retrieval node 920D). The primary role may be revoked and granted to a current non-primary in accordance with respective failover policies 912 (for ingestion nodes), 922 (for storage nodes), 932 (for retrieval nodes) and 942 (for control nodes). The failover policies may, for example, govern: the triggering conditions that are to lead to a change in primary status, whether and how the health status of the primaries or non-primaries is to be monitored, the number of non-primaries that are to be configured in a given redundancy group, and so on. In at least some embodiments, a single RG may be established for multiple partitions; e.g., RG 905 may be responsible for handling ingestion of records of partition Sj-Pk as well as Sp-Pq. In some implementations, a node that is designated as primary for one partition may concurrently be designated as a non-primary for another partition. In one embodiment, multiple nodes may be designated concurrently as primary nodes within a given RG; e.g., the ingestion-related workload of a given partition may be distributed among two primary nodes, with one node designated as a non-primary in case of a failure at either primary. The number of nodes instantiated in a given RG may depend on the availability or resiliency level desired for the corresponding function (e.g., on how many concurrent or overlapping failures the group is intended to be able to withstand). In some embodiments, in addition to or instead of being used for SMS nodes, redundancy groups may be set up for worker nodes of SPS processing stages. The members of a given RG may sometimes be distributed geographically, e.g., across several data centers, as illustrated in Fig. 10. In some embodiments, selected control nodes may be configured to detect failover-triggering conditions, e.g., using heartbeat mechanisms or other health monitoring techniques, and such control nodes may orchestrate the failover by selecting the appropriate non-primary node as the replacement for a failed primary, notifying/activating the selected replacement node, and so on.
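A control node's role in detecting a failover-triggering condition and transferring the primary role might be sketched as follows. The heartbeat timeout test, the rule of promoting the first healthy non-primary, and the names `RedundancyGroup`, `primary_has_failed` and `fail_over` are assumptions of this sketch; actual failover policies may use different triggering conditions and selection criteria.

```python
from dataclasses import dataclass
from typing import List, Set


def primary_has_failed(last_heartbeat: float, now: float, timeout: float) -> bool:
    """Hypothetical trigger: no heartbeat received within `timeout` seconds."""
    return now - last_heartbeat > timeout


@dataclass
class RedundancyGroup:
    primary: str
    non_primaries: List[str]

    def fail_over(self, healthy: Set[str]) -> str:
        """Revoke the primary role and grant it to a healthy non-primary.

        Assumption: the first healthy non-primary is selected, and the
        failed primary simply leaves the group; configuring a fresh
        non-primary afterwards is outside this sketch.
        """
        for candidate in self.non_primaries:
            if candidate in healthy:
                self.non_primaries.remove(candidate)
                self.primary = candidate
                return candidate
        raise RuntimeError("no healthy non-primary node available")
```

In this sketch the group tolerates as many successive primary failures as it has healthy non-primary nodes, which reflects the dependence of group size on the desired resiliency level.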
In some embodiments a provider network may be organized into a plurality of geographical regions, and each region may include one or more availability containers, which may also be termed "availability zones" herein. An availability container in turn may comprise one or more distinct locations or data centers, engineered in such a way (e.g., with independent infrastructure components such as power-related equipment, cooling equipment, and physical security components) that the resources in a given availability container are insulated from failures in other availability containers. A failure in one availability container may not be expected to result in a failure in any other availability container; thus, the availability profile of a resource instance or control server is intended to be independent of the availability profile of resource instances or control servers in a different availability container. Various types of applications may be protected from failures at a single location by launching multiple application instances in respective availability containers, or (in the case of some SMSs and SPSs) by distributing the nodes of a given redundancy group across multiple availability containers. At the same time, in some implementations, inexpensive and low-latency network connectivity may be provided between resources (such as the hosts or compute instances used for SMS and SPS nodes) that reside within the same geographical region, and network transmissions between resources of the same availability container may be even faster. Some clients may wish to specify the locations at which their stream management or stream processing resources are reserved and/or instantiated, e.g., at the region level, the availability container level, or the data center level, to maintain a desired degree of control over exactly where the various components of their applications are run. Other clients may be less interested in the exact location where their resources are reserved or instantiated, as long as the resources meet the client requirements, e.g., for performance, high availability, and so on. In some embodiments, control nodes located in one availability container (or data center) may be able to remotely configure other SMS or SPS nodes in other availability containers (or other data centers); that is, a particular availability container or data center may not need to have local control nodes to manage the SMS/SPS nodes.
Figure 10 shows a provider network environment, according to at least some embodiments, in which the nodes of a given redundancy group may be distributed among multiple data centers. In the depicted embodiment, provider network 1002 comprises three availability containers 1003A, 1003B and 1003C. Each availability container includes part or all of one or more data centers; for example, availability container 1003A comprises data centers 1005A and 1005B, availability container 1003B includes data center 1005C, and availability container 1003C includes data center 1005D. A number of different redundancy groups 1012 of SMS and/or SPS nodes are shown. Some RGs 1012 may be implemented entirely within a single data center, as in the case of RG 1012A in data center 1005A. Other RGs may use resources of multiple data centers within a given availability container, such as RG 1012B, which spans data centers 1005A and 1005B of availability container 1003A. Yet other RGs may be implemented using resources spread across different availability containers. For example, RG 1012C uses resources located in data centers 1005B and 1005C of availability containers 1003A and 1003B respectively, and RG 1012D uses resources at data centers 1005B, 1005C and 1005D in availability containers 1003A, 1003B and 1003C respectively. In one example deployment, if an RG 1012 comprises one primary node and two non-primary nodes, each of the three nodes may be located in a different availability container, thereby ensuring that at least one node is highly likely to remain functional even if large-scale failure events occur at two different availability containers concurrently.
In the depicted embodiment, console services 1078 and 1076, associated with the SMS and the SPS respectively, may provide easy-to-use web-based interfaces for configuring stream-related settings in provider network 1002. A number of other services, at least some of which may be used by the SMS and/or the SPS, may be implemented in provider network 1002 using resources spread over one or more data centers or across one or more availability containers. For example, a virtual computing service 1072 may be implemented, enabling clients to use selected amounts of computing power packaged as compute instances of various capability levels, and such compute instances may be used to implement SMS and/or SPS nodes. One or more storage services 1070 may be implemented, enabling clients to store and access data objects with a desired data durability level, for example via a block-device volume interface or via a web services interface. In some embodiments, the storage objects may be attachable to, or accessible from, the compute instances of service 1072, and may be used to implement various stream persistence policies at SMS storage subsystems. In one embodiment, one or more database services, such as a high-performance key-value database management service 1074 or a relational database service, may be implemented at provider network 1002, and such a database service may be used by SMS storage subsystems for storing stream data records, and/or for storing metadata of control subsystems, ingestion subsystems, storage subsystems, retrieval subsystems, or processing stages.
Stream security options
In at least some embodiments, users of the SMS and/or the SPS may be provided with a number of security-related options for data streams, enabling clients to select the security profiles of the resources (for example, virtual or physical machines) to be used for the various functional categories, such as ingestion, storage, retrieval, processing and/or control. Such options may include, for example, choices regarding the types of physical locations of the resources used for various nodes (for example, whether provider network facilities are to be used, or whether client-owned facilities are to be used, which may have different security characteristics than provider network facilities), choices regarding encryption of stream data, and/or network isolation choices in various parts of the stream-handling infrastructure. Some clients may be concerned about the possibility of intruders or attackers obtaining access to valuable proprietary business logic or algorithms, for example, and may wish to implement stream processing worker nodes using computing devices located on client-owned premises. The types of resources to be used for implementing a set of SMS and/or SPS nodes may be referred to herein as the "placement destination types" for those nodes. Figure 11 shows a plurality of placement destination types that may be selected for the nodes of an SMS or an SPS, according to at least some embodiments.
In the depicted embodiment, placement destinations may be selected within provider network 1102 for some SMS/SPS functional categories (for example, ingestion, storage, retrieval, control or processing), and outside provider network 1102 for other SMS/SPS functional categories. Within provider network 1102, some resources, such as compute instances, storage instances or database instances, may be implemented using multi-tenant instance hosts 1103. Such multi-tenant instance hosts, at each of which SMS or SPS nodes for one or more clients may be instantiated, may form a first category "A" of placement destination types. To avoid having to share physical resources with other clients, some clients may request that their SMS/SPS nodes be implemented using instance hosts restricted to a single client. Such single-tenant instance hosts may form placement destination type "B". Single-tenant instance hosts may be preferable from the perspective of some clients for several reasons. Because multi-tenant instance hosts may include compute instances belonging to other clients, there may be a higher probability of security attacks from another client's instances at multi-tenant instance hosts than at single-tenant instance hosts. In addition, using single-tenant instance hosts may avoid the "noisy neighbor" phenomenon, in which one client's compute instance CI1 running on a multi-tenant host experiences a surge in workload and starts consuming a large proportion of the host's compute cycles or other resources, potentially affecting the performance of another client's applications running on a different compute instance CI2.
In the depicted embodiment, isolated virtual networks (IVNs) 1106 (such as IVN 1106A and 1106B) may represent another category "C" of placement destination types. In some embodiments, an IVN 1106 may be created at the request of a provider network client as the logical equivalent of a private network, built using provider network resources but with the network configuration controlled largely by the client. For example, the client may decide which IP addresses are to be used within an IVN 1106, without having to be concerned about duplicating IP addresses that may already be in use outside the IVN. In the depicted embodiment, implementing various types of SMS and SPS nodes in one or more IVNs may add an extra level of network security to the management and/or processing of a client's stream data. In some cases, a given client may wish to place SMS/SPS nodes of one functional category in one IVN 1106, and SMS/SPS nodes of a different functional category in a different IVN. In various embodiments, a given IVN 1106 may comprise single-tenant instance hosts, multi-tenant instance hosts, or both types of instance hosts. In some embodiments, another set of placement destination type choices (or security profile choices) using resources of the provider network, not shown in Figure 11, may be available to at least some clients. In embodiments in which clients can acquire and use compute instances from a provider network's virtualized computing service for stream-related operations, the compute instances may be used in one of two modes. In one mode, a client may provide, to an SPS or an SMS, the executable program or programs to be run at compute instances configured as SPS worker nodes (or at ingestion, storage or retrieval nodes), and let the SMS or SPS run the programs and manage the nodes. This first mode may be referred to as the "stream-service-managed" mode of using compute instances for stream operations. In the other mode, a client may wish to run the executable programs and manage the compute instances itself, with less support from the SPS or SMS. This second mode may be referred to as the "client-managed" mode of using compute instances for stream operations. These two modes of operation may thus represent additional choices with respect to client-selectable placement destination types or security profiles. A client may opt for the client-managed mode if, for example, the executable program is likely to require debugging (including single-stepping) that can best be performed by subject-matter experts from the client's organization, while the stream-service-managed mode may be a reasonable choice for more mature code that is unlikely to require debugging. In some embodiments, different pricing policies may apply to the two modes.
In the embodiment shown in Figure 11, a number of placement options may be supported at facilities external to the provider network. For example, hosts 1160 on which SMS libraries 1171 and/or SPS libraries 1172 are installed may be used for stream management or processing from within client facilities (such as client-owned data centers or premises) 1110A or 1110B, with the two types of client facilities differing in the manner in which they are connected to the provider network. Client facility 1110A is linked to provider network 1102 via at least some shared Internet links 1151 (that is, the network traffic of other entities may also flow over some of the links between client facility 1110A and provider network 1102). In contrast, some client facilities (such as 1110B) may be linked to the provider network via special unshared dedicated physical links 1106 (which may sometimes be referred to as "direct connect" links). In the terminology used in Figure 11, these two different types of client premises constitute placement destination options "D" and "E" respectively. In some embodiments, portions of the SMS and/or SPS may also be implementable at third-party facilities (for example, data centers used but not owned or managed by clients of the SMS/SPS), and such third-party premises may be designated as placement destination type "F". At at least some client and/or third-party premises, the SMS and/or SPS libraries may have to be obtained from the provider network and installed on the hosts to be used for the SMS/SPS nodes. In at least one embodiment, nodes of all the different functional categories may be implemented outside the provider network with the help of the appropriate libraries.
In different embodiments, the different placement destination types may differ from one another in various security-related aspects, such as the network isolation features implemented, the intrusion detection functionality supported, the physical security policies implemented, the encryption levels supported, and so on. Accordingly, each of the various destination types may be considered to have a respective security profile, which may differ from the security profiles of the other placement destinations in one or more ways. In some embodiments, clients of the SMS and/or SPS may select respective placement destination types for different subsystems or node sets programmatically, for example by sending a request to one or more control nodes of the SMS or SPS, as shown in Figures 12a and 12b. It is noted that in some embodiments, and for certain types of stream applications, clients may wish to control placement destination types not only for security reasons but also for performance and/or functionality reasons. For example, the noisy-neighbor phenomenon described above may be avoided by using dedicated client-premises resources or single-tenant instance hosts. In some embodiments, clients may have special-purpose or proprietary hardware and/or software that they wish to use for SPS stages or SMS nodes, where the functional capabilities or performance levels achievable with such components cannot easily be replicated at a provider network, or are simply not supported at the provider network. A client may, for example, have access at an external data center to a computer server with supercomputer-level processing capabilities, which may be able to perform SPS processing at a much higher rate than would be possible using provider network resources alone. Enabling a client to select the placement destinations for various nodes may allow such special-purpose devices or software to be used.
Figures 12a and 12b show examples of security option requests that may be submitted by SPS clients and SMS clients respectively, according to at least some embodiments. Figure 12a shows an SPS security option request 1200 in which a client indicates, for one or more processing stages with identifiers 1210, the placement destination types (PDTs) requested for the control nodes of the stage (element 1212) and the PDTs requested for the worker nodes (element 1214). In at least one embodiment, clients may also be able to submit requests to configure encryption settings for their stream data records or stream processing results, for example by requesting that data records be encrypted using a specified algorithm or protocol before they are transmitted over various network links, or that various control or administrative interactions be encrypted. For example, in Figure 12a, the encryption settings for the stage may indicate the encryption techniques to be applied to the results of the stage's processing operations, and/or the encryption to be used for communications between the control nodes of the stage and the worker nodes of the stage.
Similarly, in Figure 12b, a client's SMS security option request 1250 comprises a number of elements that indicate the client's security preferences for one or more streams with specified identifiers 1252. Placement destination type preferences for ingestion nodes, storage nodes and retrieval nodes may be indicated in elements 1254, 1258 and 1262 respectively. PDT preferences for ingestion control nodes, storage control nodes and retrieval control nodes may be indicated by elements 1256, 1260 and 1264 respectively. Encryption preferences for data records, for example whether and/or how encryption is to be applied to the data records as they are transmitted from one category of node to another, may be indicated by element 1266. Using security option requests such as those shown in Figures 12a and 12b, clients may be able to choose the locations (for example, within the provider network or outside it) and various other security profile components for the different parts of their stream management and processing environment.
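To make the structure of such a request concrete, the following sketch models an SMS security option request as a plain data object. It is only an illustration: the class and field names, the `PDT` enumeration labels, and the `external_node_sets` helper are hypothetical, and the letters "A" to "F" simply reuse the placement destination categories of Figure 11.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PDT(Enum):
    """Placement destination types, labeled with the Figure 11 categories."""
    MULTI_TENANT_HOST = "A"
    SINGLE_TENANT_HOST = "B"
    ISOLATED_VIRTUAL_NETWORK = "C"
    CLIENT_FACILITY_SHARED_LINK = "D"
    CLIENT_FACILITY_DEDICATED_LINK = "E"
    THIRD_PARTY_FACILITY = "F"


@dataclass
class SmsSecurityOptionRequest:
    """Hypothetical shape of request 1250; comments give the figure elements."""
    stream_ids: list                  # element 1252
    ingestion_nodes: PDT              # element 1254
    ingestion_control: PDT            # element 1256
    storage_nodes: PDT                # element 1258
    storage_control: PDT              # element 1260
    retrieval_nodes: PDT              # element 1262
    retrieval_control: PDT            # element 1264
    encryption: Optional[str] = None  # element 1266 (free-form in this sketch)


# Destination types that lie outside the provider network.
EXTERNAL = {
    PDT.CLIENT_FACILITY_SHARED_LINK,
    PDT.CLIENT_FACILITY_DEDICATED_LINK,
    PDT.THIRD_PARTY_FACILITY,
}


def external_node_sets(request):
    """List the node sets that the request places outside the provider network."""
    names = ["ingestion_nodes", "ingestion_control", "storage_nodes",
             "storage_control", "retrieval_nodes", "retrieval_control"]
    return [name for name in names if getattr(request, name) in EXTERNAL]
```

A control node receiving such a request could use a check like `external_node_sets` to decide which node sets require library installation on hosts outside the provider network.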
It is noted that, in at least some embodiments, the choice of node placement destinations may be offered for reasons other than security. For example, a client may wish to have some types of SMS or SPS nodes implemented at single-tenant hosts for performance reasons (for example, to avoid the "noisy neighbor" problem mentioned earlier, rather than primarily for security reasons). In at least some embodiments, placement choices may be changed during the lifetime of a stream; for example, a client may initially allow SMS nodes to be instantiated at multi-tenant instance hosts, but may wish to move at least some subset of the nodes to single-tenant instance hosts later. In at least some embodiments, different pricing policies may be applied to the different security-related options; for example, it may cost more to implement an SMS node of a particular functional category at an IVN than at a multi-tenant instance host outside IVNs, or it may cost more to implement SMS nodes at single-tenant instance hosts than at multi-tenant instance hosts.
Sequential storage and retrieval of stream records
For many types of stream applications, data records may be received at the SMS at very high rates from a plurality of data producers 120, and data consumers may typically wish to access the stored data records in the order in which the records were generated. As mentioned earlier, especially in environments in which rotating disks are used as the storage devices for stream data records, sequential I/O access patterns (for both reads and writes) may have significant performance advantages over random I/O access patterns. In several embodiments, stream-specific or partition-specific sequence numbers may be assigned to data records as they are received by the SMS, and ordered retrieval operations based on sequence numbers may be supported. Figure 13a shows example interactions between a stream data producer and the ingestion subsystem of an SMS, according to at least some embodiments. The stream data producer may submit data records 110 to the ingestion subsystem, and in the depicted embodiment the ingestion subsystem may respond with the sequence number 102 that has been chosen for the submitted record. In at least some embodiments, an ingestion node may obtain a portion of the sequence number from the storage subsystem; for example, in such embodiments the sequence number 102 may be determined after the received data record has been stored in accordance with the applicable persistence policy, and the storage subsystem may generate a numerical sequence indicator of its own for the data record and provide that indicator for inclusion in the larger sequence number assigned to the data record by the ingestion node.
Sequence numbers may be implemented in various embodiments to provide a stable, consistent ordering of data records, and to enable repeatable iteration over the records by data consumers. In at least some implementations, the sequence numbers assigned to the data records of a particular partition may increase monotonically over time, although they need not be consecutive. In various embodiments, sequence numbers may be assigned with at least some subset of the following semantics: (a) sequence numbers are unique within a stream, that is, no two data records of a given stream may be assigned the same sequence number; (b) sequence numbers may serve as indexes into the stream's data records, and may be used to iterate over the data records within a given stream partition; (c) for any given data producer, the order in which the data producer successfully submitted data records is reflected in the sequence numbers assigned to those data records; and (d) the sequence numbering for data records with a given partition key value retains its monotonically increasing semantics across dynamic repartitioning operations; for example, the sequence numbers assigned to data records with partition key value K1 after a repartitioning may each be larger than any of the sequence numbers that were assigned to data records with that partition key value K1 before the dynamic repartitioning. (Dynamic repartitioning is described in further detail below with reference to Figure 16.)
In some embodiments, a data producer may wish to influence the selection of the sequence number 102 chosen for at least some data records. For example, a data producer 120 may wish to demarcate boundaries or separators within the assigned sequence numbers of a stream, so that it becomes easier for data consumers of that stream to submit read requests targeted at particular subsets of the stream. In some implementations, the data producer 120 may submit an indication of a minimum sequence number together with a record, and the SMS may select a sequence number that honors the requested minimum and also conforms to the sequence number semantics discussed above.
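One simple way to honor a requested minimum while preserving monotonic growth is sketched below. The function name and the "last assigned plus one" rule are assumptions made for illustration; the text does not specify how the SMS actually picks the value.

```python
def assign_sequence_number(last_assigned, requested_minimum=None):
    """Pick the next sequence number for a partition.

    Assumed rule: the result is strictly greater than the last assigned number
    (keeping the ordering monotonic) and is never below the producer's
    requested minimum, if one was supplied.
    """
    candidate = last_assigned + 1
    if requested_minimum is not None:
        candidate = max(candidate, requested_minimum)
    return candidate
```

With this rule, a producer that asks for a minimum of 1000 after record 870 creates a visible gap that consumers can use as a boundary, while a requested minimum that is already in the past is simply ignored.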
Figure 13b shows example elements of a sequence number that may be generated for an ingested data record at an SMS, according to at least some embodiments. In the depicted embodiment, the sequence number may comprise four elements: an n1-bit SMS version number 1302, an n2-bit timestamp or epoch value 1304, an n3-bit subsequence number 1306, and an n4-bit partition number 1308. In some implementations, 128-bit sequence numbers may be used; for example, n1, n2, n3 and n4 may be 4, 44, 64 and 16 bits respectively. The version number 1302 may be used simply to avoid confusion across SMS software version rollouts, for example so that it is easy to tell which version of the SMS software was used to generate the sequence number. In at least some implementations, the version number 1302 is not expected to change frequently. The timestamp value 1304 may be obtained by an ingestion subsystem node, for example, from a local clock source or from a globally accessible clock source (such as a state management system of the provider network that implements a getCurrentEpoch or getCurrentTime API). In at least some implementations, an offset from a well-known point in time (for example, the number of seconds elapsed since 00:00:00 UTC on January 1, 1970, which can be obtained by invoking various time-related system calls in Unix™-based operating systems) may be used for the timestamp value 1304. In some embodiments, the subsequence number 1306 may be generated by the storage subsystem, and may indicate the order in which the data records of a particular partition are written to a storage device. Thus, in an implementation in which many data records are received within a given second and the timestamp values 1304 change only at approximately one-second intervals, the subsequence numbers 1306 may serve as indicators of the arrival (or storage) order of data records that happen to have arrived within the same second and have therefore been assigned the same timestamp value. In some embodiments, the partition number 1308 may uniquely identify a partition within a given stream. In at least some implementations in which the sequence number timestamps indicate (at least approximately) the clock times at which the corresponding data records were ingested, the sequence numbers may be used as an indexing mechanism for certain types of time-based retrieval requests. For example, a client may wish to retrieve the stream records generated or ingested on a particular day or during a specified time range, and the sequence numbers may be used as the keys of an implicit secondary index to retrieve the appropriate set of data records. Thus, in at least some embodiments, the use of timestamp-containing sequence numbers for ordered storage and retrieval may have the additional benefit of providing a temporal index into the set of stored data records.
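The 128-bit layout described above (4, 44, 64 and 16 bits) can be illustrated with simple bit packing. In this sketch the ordering of the fields from most to least significant bit (version 1302, timestamp 1304, per-partition sequence element 1306, partition number 1308) is an assumption, as are the function names; the text gives the field widths but not the bit positions.

```python
# Field widths from the example in the text: n1, n2, n3, n4 = 4, 44, 64, 16 bits.
N1_VERSION, N2_TIMESTAMP, N3_SUBSEQUENCE, N4_PARTITION = 4, 44, 64, 16


def pack_sequence_number(version, timestamp, subsequence, partition):
    """Combine the four elements into one 128-bit integer (assumed field order)."""
    fields = (
        (version, N1_VERSION),
        (timestamp, N2_TIMESTAMP),
        (subsequence, N3_SUBSEQUENCE),
        (partition, N4_PARTITION),
    )
    result = 0
    for value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"value {value} does not fit in {bits} bits")
        result = (result << bits) | value
    return result


def unpack_sequence_number(sequence_number):
    """Split a packed sequence number back into its four elements."""
    partition = sequence_number & ((1 << N4_PARTITION) - 1)
    sequence_number >>= N4_PARTITION
    subsequence = sequence_number & ((1 << N3_SUBSEQUENCE) - 1)
    sequence_number >>= N3_SUBSEQUENCE
    timestamp = sequence_number & ((1 << N2_TIMESTAMP) - 1)
    sequence_number >>= N2_TIMESTAMP
    version = sequence_number
    return version, timestamp, subsequence, partition
```

Placing the timestamp above element 1306 means that, for a fixed version, numeric comparison of two packed values orders records first by ingestion second and then by storage order within that second, which matches the ordering role the text assigns to those two elements.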
The data records of a given partition may typically be written (for example, to disk) in sequence number order, often using large sequential write operations. In some embodiments, as indicated earlier, iterator-based programmatic interfaces may be implemented to allow data consumers to read data records in sequence number order. Figure 14 shows examples of ordered storage and retrieval of stream data records at an SMS, according to at least some embodiments. Six data records 110A-110F of partition Sj-Pk (the k-th partition of stream Sj) are shown stored in sequence number order. As shown, the sequence numbers may not be consecutive in at least some embodiments, for example because the manner in which values are assigned to the timestamp portions 1304 or the subsequence numbers 1306 discussed above may not always produce consecutive values for those elements.
In the example shown in Figure 14, a data consumer has requested that an iterator be created, specifying the starting sequence number "865". In response to the request, the SMS has initialized Iterator1, positioned at the data record with the nearest sequence number that is greater than or equal to the requested starting sequence number. In this case, data record 110C with sequence number 870 has been selected as the iterator's starting position, because the next lower sequence number (860, assigned to data record 110B) is smaller than the starting sequence number in the consumer's request. The getIterator interface may be considered the logical equivalent of a request to set a cursor at a requested position within the partition, and the getNextRecords interface may then be used to read data records starting from the cursor position, for example to move the cursor along the stream in sequence number order. In the example shown, the data consumer has invoked the getNextRecords interface with the parameter "iterator" set to Iterator1 and "maxNumRecord" (the maximum number of data records to return) set to 3. Accordingly, the SMS retrieval subsystem returns data records 110C, 110D and 110E, in that order, to the data consumer. After the getNextRecords call completes, the iterator Iterator1 may be moved to a new position, for example to data record 110F, and subsequent getNextRecords invocations for the same iterator may return data records starting with 110F. The semantics of the getIterator call may differ in some embodiments; for example, instead of positioning the iterator at the data record with the nearest sequence number greater than or equal to the specified sequence number, the iterator may be positioned at the nearest data record with the highest sequence number that is less than or equal to the requested sequence number. In another embodiment, clients may have to specify an existing sequence number in the getIterator call; for example, if a record with the requested sequence number does not exist in the stream, an error may be returned.
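The iterator behavior walked through above (position at the first record whose sequence number is greater than or equal to the requested one, then read in order) can be sketched with a sorted list and binary search. The `Partition` class and its method names are hypothetical Python stand-ins for the two interfaces, and apart from 860 (record 110B) and 870 (record 110C) the sequence numbers used in the usage note are made up for the example.

```python
import bisect


class Partition:
    """Hypothetical in-memory partition holding records in sequence number order."""

    def __init__(self, records):
        # records: list of (sequence_number, payload) pairs, sorted by sequence number
        self._records = list(records)
        self._sequence_numbers = [seq for seq, _ in self._records]

    def get_iterator(self, starting_sequence_number):
        # Position the cursor at the first record whose sequence number is
        # greater than or equal to the requested starting sequence number.
        position = bisect.bisect_left(self._sequence_numbers, starting_sequence_number)
        return {"position": position}

    def get_next_records(self, iterator, max_num_records):
        # Return up to max_num_records records and advance the cursor past them.
        start = iterator["position"]
        batch = self._records[start:start + max_num_records]
        iterator["position"] = start + len(batch)
        return batch
```

For instance, with records `[(850, "110A"), (860, "110B"), (870, "110C"), (880, "110D"), (890, "110E"), (900, "110F")]`, an iterator requested at 865 starts at record 110C; a first read of three records yields 110C, 110D and 110E, and the next read starts at 110F. The alternative semantics mentioned in the text (nearest lower-or-equal record, or an error for a non-existent sequence number) would need a different `get_iterator`.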
Partition mappings
As described above, in various embodiments the workload related to the ingestion, storage, retrieval and processing of the records of a given stream may be subdivided and distributed among several nodes in accordance with various partitioning and repartitioning policies. Figure 15 shows an example of a stream partition mapping 1501 and the corresponding configuration decisions that may be made for SMS and SPS nodes, according to at least some embodiments. When a particular data stream is created or initialized, for example in response to a client's invocation of a createStream API, a partitioning policy may be activated for the stream; the policy may be used to determine the partition of which any given data record of the stream is to be considered a member. The particular nodes of the ingestion subsystem 204, the storage subsystem 206, the retrieval subsystem 208 and any relevant SPS stages 215 that are to perform operations for a given data record may be selected on the basis of the record's partition. In one embodiment, at least a subset of the control nodes used for a given data record may also be selected based on the partition. In at least some embodiments, dynamic repartitioning of a data stream may be supported as part of the partitioning policy, for example in response to triggering conditions indicated in the policy or in response to explicit requests.
In various embodiments, the partition selected for a given data record may depend on a partition key for the record, whose value may be supplied by the data producer either directly (for example, as a parameter of a write or put request) or indirectly (for example, the SMS may use metadata as the partition key, such as the identifier or name of the data producer client, the IP address of the data producer, or portions of the actual contents of the data record). In the embodiment shown in Figure 15, one or more mapping functions 1506 may be applied to the data record partition key or attribute 1502 to determine the data record partition identifier 1510. In one implementation, for example, a given partition identifier 1510 may represent a contiguous range over the space of 128-bit integer values, such that the union of the ranges of all the partitions of the stream covers all possible values that a 128-bit integer can assume. In such an example scenario, one simple mapping function 1506 may generate a 128-bit hash value from the partition key value(s) or selected attribute value(s) of the data record, and the partition identifier may be determined based on the particular contiguous range within which the hash value falls. In some implementations, the contiguous ranges may be equal in size, at least initially; in other implementations, different partitions may correspond to contiguous ranges that differ in size from one another. In one implementation, repartitioning may also result in adjustments to the range boundaries. Other partitioning functions 1506 may be used in different implementations.
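The range-based mapping just described can be sketched as follows. The text does not name a hash function, so the use of MD5 (chosen here only because its digest is exactly 128 bits) is an assumption, as are the function names and the representation of a partition by the start of its range.

```python
import bisect
import hashlib

HASH_SPACE = 1 << 128  # number of distinct 128-bit integer values


def hash_partition_key(partition_key):
    """Map a partition key to a 128-bit integer (MD5 is an illustrative choice)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def equal_range_starts(num_partitions):
    """Start values of num_partitions contiguous, near-equal ranges covering the space."""
    return [i * HASH_SPACE // num_partitions for i in range(num_partitions)]


def partition_for_key(partition_key, range_starts):
    """Return the index of the contiguous range that contains the key's hash value.

    range_starts must be sorted and begin with 0 so that the ranges cover every
    possible hash value; ranges of unequal size are supported as well.
    """
    hash_value = hash_partition_key(partition_key)
    return bisect.bisect_right(range_starts, hash_value) - 1
```

Because a partition is identified only by where its range starts, adjusting range boundaries during repartitioning amounts to replacing the `range_starts` list, and unequal ranges need no special handling.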
If the data stream undergoes dynamic repartitioning (as discussed in further detail below), the partition to which records with a particular key are mapped may change. Thus, in at least some embodiments, SMS and/or SPS control nodes may have to keep track of several different mappings that apply to a stream during its lifetime. In some embodiments, metadata such as a timestamp validity range 1511 or a sequence number validity range may be stored by the control nodes for each partition mapping. The timestamp validity range 1511 may, for example, indicate that a particular mapping M1 applies from the stream's creation time until time T1, that a different mapping M2 applies from T1 to T2, and so on. When responding to read requests directed at a stream, the retrieval nodes may first have to determine which mapping is to be used (depending, for example, on the sequence number indicated in the read request), and then use that mapping to identify the appropriate storage nodes.
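Tracking several mappings with validity ranges can be sketched as a sorted list of start times. The `MappingHistory` class is hypothetical, and treating each range as half-open (a mapping applies from its start time up to, but not including, the start of the next one) is an assumption; the text does not say which mapping applies exactly at a boundary such as T1.

```python
import bisect


class MappingHistory:
    """Hypothetical record of the partition mappings applied to one stream over time."""

    def __init__(self):
        self._start_times = []  # ascending start timestamps of each mapping
        self._mappings = []     # mapping objects, parallel to _start_times

    def add_mapping(self, start_time, mapping):
        # New mappings must start later than all previously recorded ones.
        if self._start_times and start_time <= self._start_times[-1]:
            raise ValueError("mappings must be added in increasing start-time order")
        self._start_times.append(start_time)
        self._mappings.append(mapping)

    def mapping_at(self, timestamp):
        # Find the latest mapping whose start time is at or before the timestamp.
        index = bisect.bisect_right(self._start_times, timestamp) - 1
        if index < 0:
            raise LookupError("timestamp precedes the creation of the stream")
        return self._mappings[index]
```

A retrieval node could derive the timestamp from the sequence number in a read request and call `mapping_at` before locating the storage node; a history keyed by sequence number validity ranges would work the same way.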
In at least some embodiments, the SMS and SPS control nodes may be responsible for mapping partitions to resources at several different granularities. For example, as shown in example implementation 1599 of Figure 15, in one implementation each ingestion, storage, retrieval or processing (worker) node may be implemented as a respective process or a respective thread of execution within a server virtual machine, such as a Java™ Virtual Machine (JVM) or a compute instance, and each JVM or compute instance may be instantiated at a particular physical host. In some embodiments, multiple JVMs may be launched within a single compute instance, adding another layer of resource mapping decisions. Thus, for a given partition, one or more control nodes may select which particular resources are to be used as ingestion nodes 1515, storage nodes 1520, retrieval nodes 1525 or processing stage worker nodes 1530 (for example, nodes 1530A or 1530B for stages PS1 or PS2 respectively). The control nodes may also determine the mappings of those nodes to servers (such as ingestion servers 1535, storage servers 1540, retrieval servers 1545 or processing servers 1550), and the mappings between servers and hosts (such as ingestion hosts 1555, storage hosts 1560, retrieval hosts 1565 or SPS hosts 1570A/1570B). In some implementations, a partition mapping may be considered to comprise identification information (for example, resource identifiers) at each of the resource granularities shown (for example, the node, server and host granularities), an indication of the data record attributes used as input to the function or functions 1506, and the functions 1506 themselves. The control servers may store representations of the partition mapping in a metadata store, and in some embodiments may expose various APIs (such as a getPartitionInfo API) or other programmatic interfaces to provide the mapping information to data producers, data consumers, or the nodes of the SMS subsystems or the SPS.
The mappings of data records to partitions, and of partitions to resources, may be further complicated in some embodiments by various factors, such as: (a) a given node, server or host may be designated as responsible for multiple partitions, or (b) failures or other triggers may result in new nodes, servers or hosts being assigned to a given partition or set of partitions. In addition, as indicated above and described below, the partition mapping for a given stream may be modified dynamically over time while stream records continue to be handled by the SMS and/or SPS nodes. As a result, in some embodiments several versions of mapping metadata may be retained for a given stream, at least temporarily, with each version corresponding to a different period of time.
Dynamic stream repartitioning
Figure 16 shows an example of dynamic stream repartitioning according to at least some embodiments. At time T1 of the timeline shown in Figure 16, a stream S1 is created or initialized. A partition mapping PM1 is created for stream S1 and remains in effect during the time interval T1 to T2. Three data records received by the SMS between T1 and T2 are shown by way of example. Data record 110A (DR 110A) is submitted with a client-supplied partition key value "Alice", DR 110B is submitted with a client-supplied partition key value "Bill", and DR 110C is submitted with a client-supplied partition key value "Charlie". In the initial mapping PM1, all three data records 110A, 110B and 110C are mapped to the same partition, with partition identifier "P1". For P1 data records, a single node I1 is configured to handle ingestion, a single node S1 is configured to handle storage, a single node R1 is configured to handle retrieval, and a single worker node W1 is configured to handle SPS processing. The start timestamp of the validity range for mapping PM1 is set to T1.
At time T2 in the example timeline of Figure 16, stream S1 is dynamically repartitioned. In the depicted embodiment, data records continue to arrive and to be handled by the SMS and the SPS regardless of when the repartitioning occurs; neither the SMS nor the SPS needs to be taken offline. The repartitioning may be initiated as a result of any of a number of factors, for example in response to detecting an overload condition at an ingestion, storage, retrieval or processing node, in response to detecting a skew or imbalance between the workload levels at different hosts of the various subsystems, or in response to a request from a data consumer or data producer client. In the depicted embodiment, a new mapping PM2 takes effect at time T2 (or shortly after T2), as indicated by the validity range start timestamp setting shown for PM2. In at least some implementations, a different set of data record attributes may be used for partitioning data records than was used before the repartitioning. In some cases, an additional partitioning attribute may be submitted by the data producer (e.g., at the request of the SMS), while in other cases the additional attribute may be generated by an SMS ingestion node. Such additional attributes may be referred to as "salted" attributes, and the technique of using additional attributes for repartitioning may be referred to as "salting". In one example implementation, an overloaded ingestion server may indicate to a data producer (e.g., to the SMS client library code being executed by the data producer) that, for repartitioning, a randomly selected small integer value should be provided in addition to the previously used partition key. The combination of the original partition key and the salted additional integer may then be used to distribute the ingestion workload among a different set of ingestion nodes. In some embodiments, the retrieval nodes and/or data consumers may have to be informed about the additional attributes being used for repartitioning. In at least some implementations, such additional attributes may not be used for repartitioning.
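The "salting" technique described above can be illustrated with a short sketch. The hash function (SHA-256), the `#` separator, the salt range of 4, and the function names are all assumptions made for the example; the described embodiments do not specify a particular partitioning function.

```python
import hashlib
import random


def partition_index(partition_key, num_partitions, salt=None):
    """Map a partition key, optionally combined with a salt, to a partition."""
    # Assumption: the salt is appended to the key before hashing.
    key = partition_key if salt is None else f"{partition_key}#{salt}"
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def salted_submission(partition_key, salt_range=4, rng=random):
    """Return the key plus a randomly selected small integer salt."""
    return partition_key, rng.randrange(salt_range)
```

Without a salt, every record with key "Alice" lands in one partition. With a salt drawn from a small range, records for the same key can be spread over several partitions, which is why retrieval nodes or data consumers may need to know which salt values are in use.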
In the embodiment shown in Figure 16, the new partition mapping results in different partitions being selected for at least some of the data records received after T2, relative to the partition selected for the same key before T2. DR 110P is submitted after T2 with the partition key value "Alice", DR 110Q is submitted after T2 with the partition key value "Bill", and DR 110R is submitted after T2 with the partition key value "Charlie". In the example scenario shown, using the PM2 mapping, DR 110P is designated a member of partition "P4", DR 110Q is designated a member of partition "P5", and DR 110R is designated a member of partition "P6". In the depicted embodiment, none of the example data records shown as being received after T2 is designated a member of the previously used partition "P1"; instead, completely new partitions may be used after the repartitioning. In some embodiments, at least some previously used partitions may continue to be used after repartitioning. For each of the new partitions P4, P5 and P6, different nodes may be designated for ingestion, storage, retrieval and/or processing. For example, nodes I4, S4, R4 and W4 may be configured for partition P4, nodes I5, S5, R5 and P5 may be configured for partition P5, and nodes I6, S6, R6 and P6 may be configured for partition P6. In some embodiments, the same storage node may be used for records with a particular partition key or attribute after the repartitioning as was used for such records before the repartitioning, but a different storage location within that node (e.g., a different disk, a different disk partition, or a different SSD) may be used after the repartitioning.
During at least some period after the dynamic repartitioning at T2, retrieval requests may continue to be received for data records that were processed by the SMS ingestion and/or storage subsystems before the repartitioning. In at least some cases, the requested data records may have to be retrieved based on the PM1 mapping, which was in effect when the data records were ingested. Accordingly, as indicated in Figure 16, both PM1 and PM2 may continue to be used for data retrieval for some time after T2. In at least some implementations, data records may eventually be deleted from the stream as they age, and the older partition mappings may eventually be discarded as well, e.g., when all the corresponding data records have themselves been deleted. In some embodiments, instead of (or before) being deleted, stream records may be archived (e.g., based on client-selected archival policies) to a different set of storage locations or devices, in such a way that the partition mappings used by the SMS remain usable for retrieving the records after archival. In such embodiments, partition mappings such as PM1 and PM2 may be retained for as long as they are needed to support retrieval requests directed to the archival storage. In some archival implementations, different retrieval approaches may be used that do not require the stream partition mappings to be retained (e.g., new indexes may be created for the archived data records). In some embodiments, a partition such as P2 that was in use before a repartitioning, but to which writes are no longer directed after the repartitioning, may be "closed" for reads at some point after the repartitioning; e.g., the equivalent of an "end of partition reached" error message may be provided in response to retrieval requests.
In some implementations, a given data stream may be divided into numerous (e.g., hundreds or thousands of) partitions. Consider an example case in which stream S1 is initially divided into 1000 partitions, P1, P2, ..., P1000. If an overload condition corresponding to one partition, say P7, is detected, it may be worthwhile to change the initial mapping of data records to P7, but the mapping of the other partitions need not change. In one approach, two new partitions P1001 and P1002 may be created by a repartitioning operation. Records received after the repartitioning whose attributes would originally (i.e., on the basis of the original mapping) have resulted in their membership in P7 may be mapped to either P1001 or P1002 after the repartitioning, thus distributing the workload of P7 between two partitions. The remaining partitions, e.g., P1-P6 and P8-P1000, may not need to be modified. When only a small subset of partitions is affected by such a repartitioning, in at least some embodiments a combined data structure, such as a directed acyclic graph of partition entries (or a tree of partition entries), may be generated and stored. Each entry may indicate a partitioning function output range and a validity time range (the period during which the entry's partitioning information is to be considered valid). In the example above, assume that the repartitioning involving P7 is performed at time T2, and that stream S1 (and its initial mapping) was created at time T1. In this scenario, the validity period for the entry for P7 would be "T1 to T2", the validity periods for P1001 and P1002 would be "T2 onwards", and the validity periods for the remaining partitions would be "T1 onwards". In at least some implementations, using such a combined data structure may substantially reduce the amount of memory or storage used for partition mapping metadata. The example above discussed splitting partition P7 into two new partitions. In at least some implementations, partitions may also be merged during repartitioning; for example, two adjacent partitions for which relatively few retrieval requests are received, or to which relatively few records are submitted, may be merged into a single partition. For any given point in time, the partition to which a data record belongs may be determined unambiguously using the partitioning function and the validity time range information. Over time, the combined data structure may evolve as more splits and/or merges are performed, but the total space required for the partitioning metadata may not increase dramatically (depending, of course, on how often splits occur and how many partitions are affected by splits on average). In contrast, in a different implementation, each time a repartitioning occurs, the entire set of unchanged metadata for the stream may be replicated and combined with entries for the partitions affected by the repartitioning. In this latter implementation, the storage and memory requirements for partition mapping metadata may increase at a much faster rate, especially if the older mappings have to be retained for at least some time after the repartitioning, as described above.
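The combined structure of partition entries, each carrying a partitioning function output range and a validity time range, can be sketched as below. For brevity this sketch uses a flat list rather than a directed acyclic graph or tree, and it assumes a partitioning function whose output range is divided evenly between partitions; the class, field and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PartitionEntry:
    name: str
    lo: int                          # inclusive start of function output range
    hi: int                          # exclusive end of function output range
    valid_from: int                  # start of validity time range
    valid_to: Optional[int] = None   # None means "onwards"

    def covers(self, output, ts):
        in_range = self.lo <= output < self.hi
        in_time = self.valid_from <= ts and (
            self.valid_to is None or ts < self.valid_to)
        return in_range and in_time


def initial_entries(count, width, created_at):
    """Create entries P1..P<count>, each covering `width` output values."""
    return [PartitionEntry(f"P{i + 1}", i * width, (i + 1) * width, created_at)
            for i in range(count)]


def split(entries, name, new_names, at):
    """Close the named entry at time `at` and add two entries for its halves."""
    old = next(e for e in entries if e.name == name and e.valid_to is None)
    old.valid_to = at
    mid = (old.lo + old.hi) // 2
    entries.append(PartitionEntry(new_names[0], old.lo, mid, at))
    entries.append(PartitionEntry(new_names[1], mid, old.hi, at))


def lookup(entries, output, ts):
    """Return the single partition covering this output value at time ts."""
    matches = [e.name for e in entries if e.covers(output, ts)]
    if len(matches) != 1:
        raise LookupError(f"expected one partition, found {matches}")
    return matches[0]
```

Splitting P7 adds only two entries and updates one, leaving the other 999 entries untouched, which is the source of the metadata savings relative to copying the whole mapping at each repartitioning.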
In at least some embodiments that use sequence numbers comprising timestamp values (such as the timestamp value 1304 shown in Figure 13b), a special type of sequence number transition may be implemented for dynamic repartitioning. Assume, by way of example, that a timestamp-based sequence number scheme similar to that shown in Figure 13b is being used for stream S1, in which a new timestamp value is generated every second for inclusion in the sequence numbers. In at least some implementations that support dynamic repartitioning, the sequence numbers assigned after the dynamic repartitioning may all use a different set of timestamp values (starting with a selected initial timestamp value corresponding to the repartition event) than were used before the dynamic repartitioning. For example, if the timestamp value in use at the time the dynamic repartitioning is committed (i.e., put into effect) is Tk, any new sequence numbers issued after the commit may be required to use timestamp values from Tk+1 onwards. Since, in the scheme of Figure 13b, sequence number values encode the timestamp value in at least some of their higher-order bits, ensuring that repartition events correspond to timestamp boundaries as described may in turn simplify the bookkeeping involved in identifying the mapping to be used in response to a retrieval request. Thus, in such implementations, when a retrieval request specifying a particular sequence number is received, the timestamp value may be extracted from the sequence number, and it may easily be determined whether the post-repartitioning mapping or the pre-repartitioning mapping should be used. If the extracted timestamp value is lower than the initial timestamp selected for the repartitioning, the pre-repartitioning mapping may be used; if the extracted timestamp value is equal to or higher than the initial timestamp value selected for the repartitioning, the post-repartitioning mapping may be used.
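The mapping selection described above reduces to one shift and one comparison. The following sketch assumes a hypothetical layout in which the timestamp occupies the bits above the low 64 bits of the sequence number; the actual bit layout of Figure 13b is not reproduced here, and the function names are illustrative.

```python
# Assumed layout: timestamp in the high-order bits, 64 low-order bits for
# the remaining sequence number components.
LOW_BITS = 64


def make_sequence_number(timestamp, low_part):
    """Compose a sequence number from a timestamp and a low-order part."""
    return (timestamp << LOW_BITS) | low_part


def extract_timestamp(sequence_number):
    """Recover the timestamp encoded in the high-order bits."""
    return sequence_number >> LOW_BITS


def choose_mapping(sequence_number, repartition_start_timestamp):
    """Pick the pre- or post-repartitioning mapping for a retrieval request."""
    if extract_timestamp(sequence_number) < repartition_start_timestamp:
        return "PM1"   # mapping in effect before the repartitioning
    return "PM2"       # mapping in effect after the repartitioning
```

If the repartitioning is committed while timestamp Tk is in use, the start timestamp passed to `choose_mapping` would be Tk+1, so no sequence number straddles the boundary.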
Methods for stream management and processing
Figure 17 is a flow diagram illustrating aspects of operations that may be performed to support respective sets of programmatic interfaces for data record ingestion and data record retrieval, according to at least some embodiments. As shown in element 1701, a request to create or initialize a data stream may be received, e.g., from an SMS client or a data producer client. The initial partition mapping to be used for the stream may be determined (element 1704); e.g., the function(s) to be used to identify the partition to which a particular data record belongs, and the input parameters to be used for the function(s), may be identified based on a partitioning policy. As mentioned earlier, in various embodiments control components of the SMS may be responsible for receiving and responding to stream creation requests. The manner in which stream creation and initialization (as well as other control-plane operations) are implemented may differ from one embodiment to another. In one embodiment, for example, a redundancy group of control servers may be established, and the primary control server of that redundancy group may respond to a stream creation request by generating and storing the appropriate metadata for the new stream (e.g., the initial partition mapping, the initial sets of ingestion, storage and retrieval nodes, and so on) in a persistent storage location. Responses to subsequent queries regarding the stream (e.g., a request from a front-end ingestion node regarding the back-end node responsible for a given partition) may be generated by the primary control server using the stored metadata. In another implementation of the SMS control-plane functionality, stream configuration metadata may be stored in a database that is directly accessible by at least some nodes of the ingestion, storage or retrieval subsystems. After a stream has been created and initialized, data-plane operations such as record submission, storage and retrieval may begin, and may be handled by the respective components of the corresponding subsystems, typically without additional interactions with the control components.
In some embodiments, data producers may be required to submit explicit partition keys with write requests, while in other embodiments the inputs to be used for the partitioning functions may be determined based on metadata associated with the write requests (such as the identity of the data producer or the IP address from which the data record is received), or from the contents of the data records themselves. In at least one implementation, clients may optionally supply partition identifiers in the data record submissions, and additional partitioning functions may not be required in such an implementation.
A number of different factors may be taken into account when determining or configuring the initial set of nodes for the ingestion, storage and retrieval functions for the stream (element 1707). For example, the partition mapping itself (which may determine how many partitions the stream is divided into, and the relative expected sizes of the partitions), information about the expected ingestion rates and/or retrieval rates (if such information is available), durability/persistence requirements for the stream data records, and/or high-availability requirements for the various subsystems (which may result in the establishment of redundancy groups similar to those shown in Figure 9 and Figure 10) may influence the number and placement of the nodes of the different subsystems. In addition, in embodiments in which clients may indicate placement destination type preferences for various categories of nodes (as shown in Figure 11, Figure 12a and Figure 12b), such preferences may also play a role in determining the resources to be used for the SMS and/or SPS nodes. In at least some embodiments, respective pools of nodes capable of performing ingestion, storage and/or retrieval functions may be set up in advance, and control components may assign selected members of such pools to each new stream that is created. In other embodiments, at least in some cases, new ingestion, storage or retrieval nodes may have to be instantiated when a stream is created or initialized.
In the depicted embodiment, records may be received at the ingestion nodes via any of a set of programmatic interfaces implemented for data record submission (element 1710), including for example in-line submission interfaces (in which the data is included in the submission request) and by-reference submission interfaces (in which the submission request provides an address from which the data can be retrieved by the SMS ingestion nodes or the SMS storage nodes, e.g., using web service requests or other interfaces). In different embodiments, any of a number of different types of programmatic interfaces may be provided for each of the ways of submitting records; e.g., respective application programming interfaces (APIs) may be supported for in-line versus by-reference submission, web pages or web sites may be established, graphical user interfaces may be implemented, or command-line tools may be developed. In at least some embodiments, the SMS may assign a sequence number to each ingested record, e.g., indicative of the order in which the records are ingested or stored, and the sequence numbers may be usable in retrieval requests by data consumers. At the retrieval subsystem nodes, record retrieval requests may be received via any of a set of implemented programmatic retrieval interfaces, and the contents of the requested data records may be provided in response (element 1713). For non-sequential access, the interfaces may include, for example, a get-iterator interface (requesting that an iterator be instantiated at a position within a partition selected based on the sequence number indicated in the invocation) or getRecordWithSequenceNumber (to obtain the data record with a specified sequence number). For sequential access, interfaces such as a get-next-records interface (requesting a number of records in order, starting from the current position of an iterator or from a specified sequence number) may be implemented. In at least some embodiments, different retrieval interfaces may have different billing rates associated with them; e.g., the per-record billing rate for sequential retrieval may be set lower than the per-record billing rate for non-sequential retrieval. In some embodiments, the different submission interfaces may also have different billing rates; e.g., by-reference submissions may cost more per record than in-line submissions.
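The difference between the non-sequential and sequential retrieval interfaces described above can be illustrated with a small in-memory stand-in for a partition. The class and the snake_case method names are hypothetical; in particular, representing an iterator as a plain list position is an assumption made to keep the sketch short.

```python
from bisect import bisect_left


class InMemoryPartition:
    """Toy partition holding records in ascending sequence-number order."""

    def __init__(self):
        self._seqs = []
        self._data = []

    def put(self, sequence_number, data):
        # Assumes records arrive in ascending sequence-number order.
        self._seqs.append(sequence_number)
        self._data.append(data)

    def get_record_with_sequence_number(self, sequence_number):
        """Non-sequential access: fetch exactly one record."""
        i = bisect_left(self._seqs, sequence_number)
        if i == len(self._seqs) or self._seqs[i] != sequence_number:
            raise KeyError(sequence_number)
        return self._data[i]

    def get_iterator(self, sequence_number):
        """Position an iterator at the first record >= sequence_number."""
        return bisect_left(self._seqs, sequence_number)

    def get_next_records(self, iterator, limit):
        """Sequential access: return up to `limit` records and a new iterator."""
        end = min(iterator + limit, len(self._seqs))
        return self._data[iterator:end], end
```

A single sequential call can return several records, which is why record counts and invocation counts may differ for such interfaces.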
Over time, control nodes or dedicated billing servers may collect usage metrics for the different programmatic interfaces implemented at the various subsystems of the stream management service (element 1716). The metrics may include, for example, invocation counts for the different programmatic interfaces, the total number of records ingested or retrieved (which may differ from the invocation counts for at least some interfaces, such as a get-next-records interface that can retrieve multiple records with a single invocation), the total amount of data ingested or retrieved, and so on. Billing amounts to be charged to the clients that own the stream, or to clients that produce and/or consume data from the stream, may optionally be generated based at least in part on the usage metrics and the respective billing rates associated with the programmatic interfaces (element 1719). In at least some embodiments, the billing activities may be asynchronous with respect to the stream ingestion/retrieval operations; e.g., a bill may be generated at the end of a monthly billing period based on the metrics collected during the month.
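As a worked illustration of combining usage metrics with per-interface billing rates, consider the following sketch. The interface names, the rates and the use of abstract integer billing units are all invented for the example; only the ordering of the rates (sequential lower than non-sequential, by-reference higher than in-line) follows the text above.

```python
# Hypothetical per-record rates in abstract billing units.
RATES = {
    "sequential_retrieval": 1,
    "non_sequential_retrieval": 3,
    "inline_submission": 2,
    "by_reference_submission": 4,
}


def billing_amount(record_counts, rates):
    """Sum, over interfaces, the records handled times the per-record rate."""
    return sum(rates[interface] * count
               for interface, count in record_counts.items())
```

For example, 100 records retrieved sequentially and 10 retrieved non-sequentially would be charged 100 × 1 + 10 × 3 = 130 units under these assumed rates.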
Figure 18a is a flow diagram illustrating aspects of operations that may be performed to configure stream processing (SPS) stages, according to at least some embodiments. As shown in element 1801, programmatic interfaces may be implemented that enable clients to configure a number of processing stages for stream data records. To configure a particular stage, for example, a client may indicate the processing operation(s) to be performed on partitioned stream data records at the stage, the distribution policy for the output of the processing operations, and other parameters such as the identity of the input streams from which the data to be processed is to be obtained. In some embodiments, processing operations at SPS stages may be required to be idempotent. In other embodiments, non-idempotent operations may also be supported for at least some stages. In some embodiments, if the processing to be performed at a given stage is non-idempotent, a client may still be able to obtain the recovery-related benefits of idempotency by configuring the worker nodes to periodically flush the output of the operations to some persistent external location, recording when the flush operations were performed relative to the record retrieval sequence, and later configuring replacement worker nodes to replay the flush operations during recovery. In at least some embodiments, clients may be able to configure directed acyclic graphs (DAGs) or other graphs of processing stages, with several different stages operating in parallel on stream data, and the results of some stages being used as input streams for other stages. In some embodiments, one or more ephemeral rather than persistent streams may be created between different stages; e.g., the data records output from one stage need not necessarily be stored on persistent storage devices before being fed as input to a different stage.
In some embodiments, any of a number of different recovery policies may be implemented for SPS stages, including for example a checkpoint-based recovery policy or a best-effort recovery policy. In one embodiment, a client may use a programmatic interface to select the recovery policies for different SPS stages. At stages for which checkpoint-based recovery is used, worker nodes may be configured to store progress records or checkpoints at intervals, indicating how far along in a stream partition they have reached (for example, the sequence numbers of the most recently processed records may be stored as indicators of progress). The progress records may then be used during recovery operations after a failure, as described below with reference to Figure 19. Under a best-effort recovery policy, progress records need not be stored, and a replacement worker node configured in response to a failure may simply process new data records as they are received. Within a given SPS stage graph or workflow, in some embodiments different recovery policies may be applied to different stages.
An SPS control server may receive, e.g., via one of the programmatic interfaces indicated in element 1801, an indication of an idempotent operation Op1 to be performed at a particular stage PS1 of stream S1 in accordance with a partitioning policy PPol1, with the results of the processing to be distributed in accordance with an output distribution descriptor DDesc1 (element 1804). The number of worker nodes to be configured for stage PS1, and the virtual or physical resources needed for the nodes, may be determined (element 1807), e.g., based on various factors such as PPol1, the complexity of the idempotent operation Op1, and the performance capabilities of the resources to be used for the worker nodes.
The worker nodes may then be instantiated and configured (element 1810), e.g., as processes or threads at selected virtual or physical machine resources. In one simple implementation, for example, one worker node may initially be assigned to each partition of S1. A given worker node may be configured to: (a) receive data records from the appropriate subset of S1's retrieval nodes, (b) perform Op1 on the received data records, (c) optionally, e.g., based on the recovery policy for PS1, store progress records/checkpoints indicating which set of partition records has been processed, and (d) transmit output to the destinations indicated by DDesc1 (e.g., as inputs to intermediate persistent or ephemeral streams, or directly to other processing stages or storage systems). It is noted that, at least in some embodiments, SPS processing may not necessarily generate any output that has to be transmitted elsewhere on an ongoing basis. For example, some SPS applications may simply serve as temporary repositories of data records, and/or may implement query interfaces enabling users to view the data records. Such an application may manage its own output; e.g., output may be generated in response to received queries rather than in accordance with a distribution descriptor. A logging-related SPS application may, for example, retain the last day's log records collected from a large-scale distributed system, enabling clients to view the logging data for debugging or analysis purposes. Accordingly, in some embodiments, output distribution descriptors need not be specified for at least some stages of an SPS, for at least some streams, or for at least some partitions. The worker nodes may then begin retrieving and processing data records according to their respective configuration settings (element 1813). In at least some embodiments, the SPS control nodes may monitor the health status of the worker nodes (e.g., using responsiveness checks such as heartbeat protocols), as well as various other metrics such as the resource utilization levels at the resources being used for the worker nodes (element 1816). The information collected from the worker nodes may be used to determine whether a failover is required, e.g., whether a worker node should be replaced and a recovery policy implemented as described below.
In some embodiments, an installable SPS client library may be provided to those clients that wish to implement SPS worker nodes at client-owned premises and/or at client-selected resources of the provider network. The client library may also allow SPS clients to select the extent to which they wish to use various control-plane features of the SPS managed service, such as health monitoring functions, automated workload monitoring and balancing, security management, dynamic repartitioning and the like. Figure 18b is a flow diagram illustrating aspects of operations that may be performed in response to invocations of components of a client library for the configuration of stream processing worker nodes, according to at least some embodiments. As shown in element 1851, an SPS client library may be provided (e.g., via download from a web site of a multi-tenant SPS managed service configured to perform the kinds of operations shown in Figure 18a). The library may include a number of executable components and/or components that can be linked to client applications. Some library components may enable clients to select, register with the SPS managed service, or specify desired properties of, the various worker nodes at which the stream processing operations of one or more SPS stages are to be performed. For example, one client may wish to use its own set of compute instances, implemented at a virtual computing service of a provider network, for the worker nodes, while another client may wish to use computing devices located at the client's own data center (such as special-purpose devices not supported by the provider network) for processing stream records. Clients may bring worker nodes online on an as-needed basis at their own premises, or using compute instances of the virtual computing service, as desired. In addition to or instead of such on-demand instantiation of worker nodes, in some embodiments clients may preconfigure pools of potentially reusable worker nodes that can be deployed when needed. In some implementations, a library component may be executed or invoked to enable a client to register, with the SPS managed service, a particular process or thread instantiated by the client as a worker node of a specified stage, for which subsequent control-plane operations may be handled by the SPS managed service. In one embodiment, the client may also be able to select from among different levels of control-plane responsibility to be handled by the SPS managed service for the worker nodes; for example, one client may wish to use its own custom modules to monitor worker node health, while another client may wish to rely on the SPS managed service to monitor worker node health and to take appropriate action if a failure is detected.
The SPS managed service may receive an indication that a particular client wishes to use the client library for configuring the worker nodes and/or control-plane operations of a particular SPS stage PS1 (element 1854). (PS1 itself may be designed using programmatic interfaces included in the library, or using programmatic interfaces exposed by the SPS managed service, similar to the web-based interface shown in Figure 4.) The client may also indicate the stream(s) whose data is to be retrieved for use as input by PS1. Optionally, in at least some embodiments, the client may indicate control-plane settings for PS1, e.g., whether the client wishes to use the service's health monitoring capabilities for the nodes or prefers to use custom health monitoring tools (element 1857). Depending on the preferences indicated by the client, one or more nodes of the SMS and/or SPS to be configured for the client's use may be determined (element 1860). Network connectivity may be established between the client's worker nodes and the SMS/SPS nodes, and/or other configuration operations may be performed to enable the flow of data records and processing results as desired. Data records may be provided to the PS1 worker nodes upon receiving retrieval requests, and the desired control-plane operations (if any were requested by the client) may be performed as needed. It is noted that, at least in some embodiments, a similar approach, enabling clients to control the extent to which they wish to use the control-plane functionality of the various subsystems of an SMS managed service, may also or instead be implemented.
Figure 19 is a flowchart showing aspects of operations that can be performed to implement one or more recovery policies for stream processing, according to at least some embodiments. As shown in element 1901, an SPS control node can determine that a trigger criterion for replacing a particular working node has been met. For example, the working node may have become unresponsive or unhealthy, the workload level of the current node may have reached a failover threshold, the number of errors detected at the working node may have exceeded a threshold, or some other unexpected state of the working node may have been identified. A replacement working node can be identified or instantiated (element 1904). In some embodiments, a pool of available worker threads can be established from which one can be selected as the replacement; alternatively, for example, a new thread or process can be started.
If a best-effort recovery policy is to be used at the SPS stage in which the particular working node was active (as determined in element 1907), the replacement working node can simply start processing additional data records as they become available (element 1916); for example, no record of the replaced working node's progress needs to be examined. If a checkpoint-based recovery policy is to be used, an indication of the location (for example, a storage device address or URL) at which the replacement working node can access the progress records stored by the replaced working node can be provided (element 1910). The replacement working node can retrieve the most recent progress record stored by the replaced node and use it to determine the set of data records on which the replacement working node should perform the stage's idempotent operations (element 1913). Under such a checkpoint-based recovery policy, some number of data records may be processed more than once, depending on the time between the last progress record and the instantiation of the replacement working node, and on the rate at which the replaced working node processed additional records after storing that progress record. In at least some embodiments, such repeated operations have no negative effect if the operations being performed are idempotent. After the replacement working node has performed the repeated recovery operations based on the previously stored progress record, in at least some embodiments it can store its own progress record indicating that recovery is complete and can begin normal worker thread operations on newly received data records (element 1916).
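As an illustration only, the checkpoint-based recovery path (elements 1910 to 1916) might be sketched as follows. The function name, the use of a dictionary as the progress-record location, and the representation of records as (sequence number, payload) pairs are all assumptions made for this sketch and are not taken from the description.

```python
def recover_from_checkpoint(records, progress_store, replaced_id, replacement_id, apply_op):
    """Replay the records that follow the replaced node's last progress record.

    records: list of (sequence_number, payload) tuples in sequence order.
    progress_store: dict mapping a node id to the last sequence number that
        node recorded as processed (hypothetical stand-in for the storage
        address or URL at which progress records are kept).
    apply_op: an idempotent operation, safe to run twice on the same record.
    """
    last_recorded = progress_store.get(replaced_id, -1)
    # Everything after the last progress record is replayed; some of these
    # records may already have been processed by the replaced node.
    replayed = [(seq, payload) for seq, payload in records if seq > last_recorded]
    for seq, payload in replayed:
        apply_op(seq, payload)
    # Store the replacement node's own progress record to mark recovery complete.
    progress_store[replacement_id] = replayed[-1][0] if replayed else last_recorded
    return [seq for seq, _ in replayed]
```

In this sketch, a record processed by the replaced node after its last progress record is simply processed again, which is harmless only because `apply_op` is assumed to be idempotent.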
Figure 20 is a flowchart showing aspects of operations that can be performed to implement a plurality of security options for data streams, according to at least some embodiments. As shown in element 2001, one or more programming interfaces can be implemented that enable clients to select from a variety of security options for data stream management and processing, including, for example, placement destination type options for nodes of different functional categories (for example, intake, storage, retrieval, processing, or control nodes). Placement destination types can differ from one another in various aspects of their security profiles. In some embodiments, the physical location of the resources to be used for SMS or SPS nodes can differ from one destination type to another. For example, resources such as instance hosts located at provider network data centers can be used for the nodes, or resources at client-owned facilities or third-party resources can be used. In at least some embodiments, network isolation levels or other networking characteristics can differ from one destination type to another. For example, some SMS or SPS nodes can be instantiated within isolated virtual networks, or some SMS or SPS nodes at client-owned facilities can be connected to the provider network through dedicated isolated physical links. In one embodiment, clients can indicate that certain types of SMS or SPS nodes are to be established at single-tenant instance hosts of the provider network rather than at the multi-tenant instance hosts that may also be available. In at least some embodiments, various types of encryption options can also be selectable through the security-related programming interfaces.
A client's security profile choices or preferences regarding nodes of one or more functional categories for a stream S1 can be received through the security-related programming interfaces. For example, the client can select one security profile for nodes of functional category FC1 (for example, the client may want SPS working nodes implemented at client-owned premises) and a different security profile for nodes of a different functional category FC2 (for example, the client may be willing to have SMS intake nodes or storage nodes implemented at provider network data centers) (element 2004). In some cases, a client may decide to establish nodes of all the different functional categories with the same security profile. In some embodiments, the SMS and/or the SPS can define default placement destination types for the various functional categories; for example, unless the client indicates otherwise, nodes of all functional categories can be established within isolated virtual networks of the provider network.
The nodes of the different functional categories can then be configured based on the client's preferences for security profiles and/or placement (or based on default settings for the functional categories for which the client provided no preference) (element 2007). The configuration can involve, for example, selecting appropriate physical hosts or machines, instantiating appropriate compute instances, virtual machines, processes, and/or threads for the nodes of the different functional categories, and establishing appropriate network connections between the nodes. In some embodiments, as part of the configuration, executable library components for the different stream management and processing functions can be provided for installation at hosts outside the provider network.
According to at least some embodiments, encryption modules can be activated at nodes of one or more categories, for example in accordance with the encryption preferences expressed by the client or based on default encryption settings (element 2010). The nodes of the various functional categories can then be activated, so that stream data is ingested, stored, retrieved, and/or processed as desired by the client (element 2013).
Figure 21 is a flowchart showing aspects of operations that can be performed to implement a partitioning policy for data streams, according to at least some embodiments. As shown in element 2101, a partitioning policy can be determined for a data stream. The policy can include, for example, an initial mapping of data records to partitions, based on keys supplied by data producers or on various attributes of the submitted data records, as well as one or more trigger criteria for repartitioning the data stream. In some embodiments, for example, a hash function can be applied to the partition key or keys to produce a 128-bit integer hash value. The range of possible 128-bit integers can be divided into N contiguous subranges, each representing one of N partitions of the stream. In some embodiments, the number of partitions and/or the relative sizes of the subranges can vary from one stream to another. In at least some embodiments, the client on whose behalf the stream is configured can provide input about the partitioning scheme to be used, such as the desired number of partitions or the desired characteristics of the partitioning function. In at least one embodiment, clients can provide the partition identifiers or names for some subset or all of the submitted data records.
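The hash-based mapping of partition keys to N contiguous subranges might be sketched as follows. This is a minimal sketch under stated assumptions: MD5 is chosen here only because it yields a 128-bit value (the description does not name a hash function), the subranges are assumed to be of equal width, and the function names are hypothetical.

```python
import hashlib

HASH_BITS = 128  # size of the integer hash space described above


def hash_partition_key(partition_key: str) -> int:
    """Map a partition key to an integer in [0, 2**128) (MD5 is an assumption)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def partition_for_hash(hash_value: int, num_partitions: int) -> int:
    """Return the index of the equal-width contiguous subrange containing hash_value.

    Partition i covers the hash values v for which floor(v * N / 2**128) == i.
    """
    return (hash_value * num_partitions) >> HASH_BITS


def partition_for_key(partition_key: str, num_partitions: int) -> int:
    return partition_for_hash(hash_partition_key(partition_key), num_partitions)
```

Unequal subrange sizes, which the description also allows, would require an explicit table of subrange boundaries instead of the arithmetic shortcut used here.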
When data records of the stream are received, their respective partitions can be determined based on the supplied keys and/or other attributes, and the appropriate set of intake, storage, and retrieval nodes for the identified partition can be selected (element 2104). In at least some embodiments, respective sequence numbers can be generated for the data records, for example indicating the order in which the records of a given partition were received (element 2107). In some implementations, a sequence number can comprise several elements, such as a timestamp value (for example, the number of seconds elapsed since a well-known epoch such as 00:00:00 UTC on January 1, 1970), a subsequence value obtained from a storage subsystem, a version number of the SMS software, and/or a partition identifier. In some embodiments, sequence numbers can be provided to data producers, for example to acknowledge the successful intake of the submitted data records. In some embodiments, sequence numbers can also be used by data consumers to retrieve the data records of a stream or partition in intake order.
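A composite sequence number of the kind described above might be represented as in the following sketch. The field order, the use of a simple per-partition counter as the subsequence value (the description obtains it from a storage subsystem), and the function name are assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class SequenceNumber(NamedTuple):
    """Tuples compare field by field, so ordering follows timestamp, then subsequence."""
    timestamp: int         # seconds since 00:00:00 UTC on January 1, 1970
    subsequence: int       # orders records that share a timestamp
    software_version: int  # version number of the SMS software
    partition_id: int


def next_sequence_number(previous: Optional[SequenceNumber], now_seconds: int,
                         software_version: int, partition_id: int) -> SequenceNumber:
    """Generate a sequence number greater than `previous` for the same partition."""
    if previous is None or now_seconds > previous.timestamp:
        return SequenceNumber(now_seconds, 0, software_version, partition_id)
    # Same second (or a clock that moved backwards): keep the previous
    # timestamp and advance the subsequence value instead.
    return SequenceNumber(previous.timestamp, previous.subsequence + 1,
                          software_version, partition_id)
```

Because each new value compares greater than the previous one, sorting a partition's records by sequence number reproduces the order in which they were received.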
In at least some embodiments, data records can be stored in sequence number order at the storage nodes to which they are directed based on the partitioning policy (element 2110). In embodiments that use rotating magnetic disk storage devices, sequential writes can typically be used to save the received data records to disk, thereby avoiding disk seek latencies. In at least some implementations, non-volatile buffers can be used as write caches before records are stored to disk, for example to further reduce the probability of disk seeks. In response to requests to read multiple data records ordered by sequence number (for example, calls to a get-next-records or similar interface), the data records can later be read from the storage devices using sequential reads (element 2113).
Figure 22 is a flowchart showing aspects of operations that can be performed to implement dynamic repartitioning of data streams, according to at least some embodiments. As shown in element 2201, a determination can be made (for example, at a control component of the SMS or the SPS) that a stream is to be dynamically repartitioned. Many different trigger conditions can lead to a decision to repartition a stream, such as detection of overload at one or more of the intake, storage, retrieval, processing, or control nodes, detection of an imbalance in the workload levels of different nodes, or a repartitioning request received from a client (for example, a data producer or a data consumer). In some implementations, a client's repartitioning request can include specific details of the requested repartitioning, such as various parameters of the modified mapping to be generated (for example, the number of partitions to be added or removed, or which specific partitions should be combined or split). In one implementation, a client's repartitioning request can indicate a problem state that the client wishes to resolve (such as a load imbalance), and the SMS or SPS can be responsible for translating the description of the problem state into an appropriate repartitioning operation. 
In some cases, instead of requesting a repartitioning or describing a problem state, a client can specify the trigger criteria to be used for repartitioning. In some embodiments, a determination that the data durability requirements of a data stream have changed can trigger repartitioning, which can, for example, result in the selection of a different set of storage devices or a different storage technology for the stream's records. In some cases, detection of a change in the usage pattern of a data stream (for example, the rate at which data records are produced or consumed) can also lead to repartitioning, and can also lead to the use of a different storage technology or a different set of storage devices better suited to the changed usage pattern. For example, a decision to repartition can be based on a determination that, for the read and write rates expected for a given partition or an entire stream, SSDs would be a more appropriate storage technology than rotating magnetic disks. In one embodiment, scheduled or impending software and/or hardware version changes can trigger repartitioning. In some cases, pricing or billing concerns can trigger repartitioning, as when a client indicates a budget constraint that can be met more effectively using a different partitioning approach or a different approach to storage. In at least some embodiments, changed performance targets can also trigger repartitioning. In the embodiment depicted in Fig. 22, an initial timestamp value to be used for the sequence numbers assigned after the repartitioning can be selected (element 2204), such as an offset in seconds from 00:00:00 UTC on January 1, 1970, an epoch value typically available through system calls in several operating systems. 
In some implementations, a global state manager implemented at the provider network can support a getEpochValue API, for example so that the various components of the SMS and/or the SPS can obtain consistent timestamp values to be used for sequence number generation. In other implementations, other time sources can be used; for example, an SMS or SPS control node can be designated to provide consistently ordered timestamp values to other components, or local system calls can be used. In some embodiments, timestamp values need not necessarily correspond to the wall-clock time at any particular host; for example, a monotonically increasing integer counter value can simply be used.
A modified partition mapping, different from the mapping in use when the repartitioning decision was made, can be generated for the stream (element 2207). In at least some embodiments, the modified mapping can map data records with a particular partition key to a partition different from the one to which data records with the same key were mapped before the repartitioning. Depending on the trigger condition for the repartitioning and/or on observed workload metrics, some partitions (typically heavily used ones) can be split, while others (typically lightly used ones) can be merged. In some embodiments, a different partitioning function can be used after the repartitioning than before it; for example, a different hash function, or a different approach to dividing the hash function results into partitions, can be used. In some implementations in which, for example, partitions correspond to contiguous ranges of 128-bit integers, the 128-bit integer space can be divided into a different set of subranges after the repartitioning. In at least some embodiments, new sets of intake, storage, retrieval, processing, or control nodes can be assigned to the newly created partitions. In some implementations, a space-efficient combined data structure can be used to represent both the initial mapping and the modified mapping (element 2208). For example, a directed acyclic graph or tree structure can be stored in which each entry contains an indication of a partitioning function output range (for example, the range of partitioning hash function results that corresponds to a given partition) and a validity time range, so that only the entries corresponding to the modified partitions need to be changed as a result of a repartitioning. Entries for partitions that remain unchanged during the repartitioning need not be modified in the data structure. New nodes can be configured to implement the modified partition mapping (element 2210). In at least some embodiments, because retrieval requests for data records stored on the basis of the previous mapping may continue to be received for at least some time, the previous nodes and the previous mapping can be retained for a period of time. When a read request specifying a particular sequence number or timestamp is received (element 2213), a determination can be made (for example, at a control node or at a retrieval node) as to whether the read request is to be satisfied using the new partition mapping or the previous partition mapping. The selected mapping can then be used to identify the appropriate storage node from which the requested data is to be obtained.
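The combined representation of the previous and modified mappings, with a hash output range and a validity time range per entry, might be sketched as follows. For brevity this sketch uses a flat list rather than the directed acyclic graph or tree mentioned above, splits a partition at the midpoint of its range, and uses hypothetical names throughout.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MapEntry:
    partition_id: str
    lo: int                          # inclusive lower bound of the hash range
    hi: int                          # exclusive upper bound of the hash range
    valid_from: int                  # timestamp at which the entry takes effect
    valid_to: Optional[int] = None   # None while the entry is still in effect


def split_partition(entries: List[MapEntry], partition_id: str, at_time: int) -> None:
    """Repartition by splitting one partition; other entries are left untouched."""
    old = next(e for e in entries
               if e.partition_id == partition_id and e.valid_to is None)
    mid = (old.lo + old.hi) // 2
    old.valid_to = at_time  # the old entry is retained for reads of older records
    entries.append(MapEntry(partition_id + "-a", old.lo, mid, at_time))
    entries.append(MapEntry(partition_id + "-b", mid, old.hi, at_time))


def lookup(entries: List[MapEntry], hash_value: int, timestamp: int) -> str:
    """Pick the partition whose hash range and validity period cover the request."""
    for e in entries:
        in_range = e.lo <= hash_value < e.hi
        in_time = e.valid_from <= timestamp and (e.valid_to is None or timestamp < e.valid_to)
        if in_range and in_time:
            return e.partition_id
    raise LookupError("no mapping entry covers this hash value and timestamp")
```

A read request carrying a timestamp from before the repartitioning resolves to the previous partition, while a later timestamp resolves to one of the new partitions, which corresponds to the mapping selection made in element 2213.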
Figure 23 is a flowchart showing aspects of operations that can be performed to implement an at-least-once record intake policy for data streams, according to at least some embodiments. As shown in element 2301, one or more programming interfaces can be implemented to enable clients to select a record intake policy for a data stream from among several intake policy options, including, for example, (a) an at-least-once policy, under which record submitters submit a record one or more times until a positive acknowledgement is received, or (b) a best-effort intake policy, under which acknowledgements are not provided for at least some record submissions. Some data-producing clients may be less concerned than others about the potential loss of a small fraction of their records, and may therefore choose the best-effort approach. In some implementations, even for streams configured for best-effort intake, the SMS can still provide acknowledgements for some subset of the data records, or may even attempt to provide acknowledgements for all data records, even though the best-effort policy does not require an acknowledgement for each data record.
A request indicating the particular intake policy to be used for a specified stream can be received through one of the programming interfaces (element 2304). Intake nodes can be instantiated in accordance with the policy in effect for the stream (element 2307). When one or more submissions of the same data record are received at an intake node (element 2310), different actions can be taken depending on the intake policy in effect. If the at-least-once intake policy is in use (as determined in element 2313), an acknowledgement can be sent to the data producer for each of the one or more submissions, but the data record can be saved only once at the storage subsystem (element 2316). (Note that, in accordance with the persistence policy in effect for the stream, N replicas of a given record can be stored in some cases, but if a given data record is submitted M times, replicas can be generated for only one of the submissions; that is, the total number of stored replicas of the record would still be N, not NxM.) If the best-effort intake policy is in effect (as also detected in element 2313), the data record can still be saved once at a storage device, but no acknowledgement needs to be sent to the data producer (element 2319). 
In at least some embodiments, client billing amounts can optionally be determined based at least in part on the selected intake policy (element 2322). As noted earlier, in some embodiments two versions of the at-least-once intake policy can be supported. In one version, similar to the one shown in Fig. 23, the SMS can be responsible for de-duplicating data records (that is, ensuring that data is stored at the SMS storage subsystem in response to only one of a set of two or more submissions). In a different version of at-least-once intake, duplication of data records by the SMS can be permitted. The latter approach may be useful for stream applications in which record duplication has few or no negative consequences, and/or for stream applications that perform their own duplicate elimination.
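The de-duplicating version of the intake policies can be illustrated with a small in-memory sketch. The class name, the use of a caller-supplied record identifier to detect repeated submissions, and the choice never to acknowledge under best-effort intake are assumptions of this sketch; the description does not specify how duplicates are recognized.

```python
class IntakeNode:
    """Toy intake node that stores each record once, however often it is submitted."""

    POLICIES = ("at-least-once", "best-effort")

    def __init__(self, policy: str, replicas: int = 1):
        if policy not in self.POLICIES:
            raise ValueError(f"unknown intake policy: {policy}")
        self.policy = policy
        self.replicas = replicas  # N, set by the persistence policy in effect
        self.storage = {}         # record id -> list of stored replicas

    def submit(self, record_id: str, payload: bytes) -> bool:
        """Handle one submission; return True if an acknowledgement is sent."""
        if record_id not in self.storage:
            # Only the first of M submissions generates the N replicas.
            self.storage[record_id] = [payload] * self.replicas
        # At-least-once: every submission is acknowledged.
        # Best-effort: this sketch sends no acknowledgement at all.
        return self.policy == "at-least-once"

    def replica_count(self, record_id: str) -> int:
        return len(self.storage.get(record_id, []))
```

With N = 3 replicas and M = 4 submissions of the same record, the node in this sketch holds 3 replicas rather than 12, matching the N versus NxM note above.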
Figure 24 is a flowchart showing aspects of operations that can be performed to implement a plurality of persistence policies for data streams, according to at least some embodiments. As shown in element 2401, one or more programming interfaces can be implemented that enable clients to select a persistence policy for stream data records from among a plurality of persistence policies. The persistence policies can differ from one another in any of various respects: for example, (a) the number of replicas to be saved can differ (for example, N-replica, 2-replica, and single-replica policies can be supported), (b) the storage location or device types to be used can differ (for example, rotating magnetic disk, SSD, RAM, a database service, or a multi-tenant storage device), and/or (c) the policies can differ in the expected extent of resilience to large-scale failures (for example, multi-data-center and single-data-center policies can be supported). A request indicating a client's selection of a particular persistence policy for a specified stream can be received (element 2404). In some embodiments, the persistence policy selected by a client can result in the use of different storage location types or device types for respective partitions of a given stream. In one embodiment, the SMS, rather than the client, can select the storage location type or device type, either at the stream level or at the partition level. In some embodiments, when selecting the persistence policy, clients can indicate data durability goals and/or performance goals (such as desired read or write throughput or latency), and these goals can be used by the SMS to select appropriate storage device types or locations. For example, if low latency is desired, SSDs can be used instead of rotating magnetic disks to store the data records of one or more partitions or streams.
A set of intake nodes can be determined or configured to receive the data records of the selected stream from data producers, and a set of storage nodes can be configured to implement the selected persistence policy (element 2407). When a data record is received at an intake node (element 2410), one or more replicas of the data record can be stored, based on the selected persistence policy, at the selected storage devices by the storage nodes responsible for the partition to which the data record belongs (element 2413). In at least some implementations, billing amounts can optionally (and/or asynchronously) be determined based on the specific persistence policy selected by the client (element 2416).
Decentralized workload management for stream processing
In some embodiments, a substantial portion or all of the control plane functionality of an SPS can be implemented in a decentralized manner, for example by having the working nodes within a given SPS stage coordinate various control operations (such as the assignment of partitions to working nodes, responses to dynamic repartitioning, health monitoring, and/or load balancing) through a shared data structure such as a database table. A given working node W1 can examine the entries in the shared data structure to determine, for example, which partitions of the stage's input streams (if any) are currently not being processed. If such a partition P1 is found, W1 can update an entry in the shared data structure to indicate that W1 will perform the stage's processing operations on the records of P1. Other working nodes can learn that W1 has been assigned to process the records of P1 and can therefore assign different partitions to themselves. Working nodes can periodically or occasionally submit queries to the SMS control plane to determine the partition mapping currently in effect for the input stream, and can update the shared data structure as necessary to reflect mapping changes (for example, those resulting from repartitioning). In various embodiments, load balancing and other operations can also be coordinated through the shared data structure, as described below. In some such decentralized implementations, dedicated control nodes may not be needed for the SPS, which reduces the overhead required to implement SPS workflows. Such decentralized SPS control plane implementations may be especially popular with budget-conscious customers who use SPS client libraries to implement various aspects of stream processing, for example at compute instances within the provider network that are assigned to the customer, or at locations outside the provider network. Decentralized SPS control plane techniques can also be used in embodiments in which client libraries are not used, for example when all the resources used for the SMS and the SPS are configured within the provider network. An SPS in which the working nodes implement some or all of the SPS control plane functionality for at least some processing stages may be referred to herein as a "decentralized-control SPS".
Figure 25 shows an example of a stream processing system in which the working nodes of a processing stage coordinate their workloads using a database table, according to at least some embodiments. In a decentralized-control SPS 2590, two stages 215A and 215B are defined, each with a respective set of working nodes. Stage 215A comprises working nodes 2540A and 2540B, and stage 215B comprises working nodes 2540K and 2540L. For each of the stages 215A and 215B, a corresponding partition assignment (PA) table 2550 is created at a database service 2520, such as PA table 2550A for stage 215A and PA table 2550B for stage 215B. In some embodiments, the PA table 2550 for a given stage can be created during stage initialization, for example in response to a call to a client library component or function. Each PA table 2550 can be populated with an initial set of entries or rows representing the unassigned partitions of the stage's input streams (that is, partitions to which no working node is currently assigned). Example columns or attributes of PA table entries are shown in Fig. 26 and described below. The working nodes 2540 started for the stage (for example, processes or threads started at compute instances or other servers) can be granted read/write access to the stage's PA table. Reads and writes directed to the PA tables from the working nodes are indicated in Fig. 25 by arrows 2564A, 2564B, 2564K, and 2564L for working nodes 2540A, 2540B, 2540K, and 2540L, respectively.
A given working node 2540 can be configured to select, by examining the entries in the PA table, a particular partition on which to perform the stage's processing operations. In one implementation, working node 2540A can scan the entries in PA table 2550A until it finds the entry of an unassigned partition Pk, and can then attempt to assign Pk to itself by updating that entry, for example by inserting the working node's identifier into one of the entry's columns. Such an insertion can be considered analogous to the working node locking the partition. Depending on the type of database service in use, different approaches can be used to manage potentially concurrent writes to PA table entries (for example, by two or more working nodes that identify an unassigned partition at nearly the same time).
In one embodiment, a non-relational multi-tenant database service of the provider network can be used that supports strong consistency and conditional write operations without necessarily supporting relational database transaction semantics. In this case, conditional write operations can be used by the working nodes for their updates. Consider an example in which a "working node ID" column of the PA table is used to indicate the identifier of the particular working node assigned to a partition, and in which the value of that column is set to "null" if no working node is assigned to the partition. In this scenario, a working node with identifier WID1 can request the logical equivalent of the following: "If the working node ID in the entry for partition Pk is null, set the working node ID of that entry to WID1". If this conditional write request succeeds, the working node with identifier WID1 can assume that partition Pk is assigned to it. The working node can then begin retrieving the data records of partition Pk, for example using the record retrieval interfaces of SMS retrieval subsystem 206, as indicated by arrows 2554 (for example, arrows 2554A, 2554B, 2554K, and 2554L for working nodes 2540A, 2540B, 2540K, and 2540L, respectively), and can perform the processing operations on the retrieved records. If the conditional write fails, the working node can resume its search for a different unassigned partition. 
In other embodiments, a database service that supports transactions (such as a relational database) can be used, and the transaction functionality can be used to implement the equivalent of conditional write operations, for example to ensure that only one of a plurality of concurrent (or nearly concurrent) attempts to assign a partition to a working node succeeds, and that the working nodes involved in such concurrent attempts are reliably informed of their success or failure. In some embodiments, synchronization techniques that rely on neither conditional writes nor transaction support can be used. In some implementations, a database service may not be used at all; instead, a locking service can be used by the working nodes to obtain exclusive access for updates to entries in a persistent data structure analogous to the PA table.
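The partition-claiming protocol built on conditional writes might be sketched as follows. This is an in-memory stand-in only: a lock-protected dictionary plays the role of the PA table and its conditional write, and the class and function names are hypothetical rather than the API of any particular database service.

```python
import threading


class PartitionAssignmentTable:
    """In-memory stand-in for a PA table that supports a conditional write."""

    def __init__(self, partition_ids):
        self._lock = threading.Lock()
        # None plays the role of the "null" working node ID of an unassigned partition.
        self._worker_by_partition = {pid: None for pid in partition_ids}

    def conditional_assign(self, partition_id, worker_id) -> bool:
        """Set the working node ID only if it is currently null; atomic."""
        with self._lock:
            if self._worker_by_partition[partition_id] is None:
                self._worker_by_partition[partition_id] = worker_id
                return True
            return False

    def scan(self):
        """Return a snapshot of the table's entries."""
        with self._lock:
            return dict(self._worker_by_partition)


def claim_partition(table, worker_id):
    """Scan for an unassigned partition and try to claim it; None if none is left."""
    for partition_id, owner in table.scan().items():
        if owner is None and table.conditional_assign(partition_id, worker_id):
            return partition_id
        # A failed conditional write means another node won the race; keep searching.
    return None
```

Because the check and the update happen under a single lock, only one of several concurrent claims for the same partition can succeed, which is the property the conditional write (or an equivalent transaction) is relied on to provide.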
Other working nodes 2540 can examine the entries in the PA table, determine which partitions are unassigned, and eventually succeed in assigning one or more partitions to themselves. In this way, the workload of processing the partitions of the stage's input stream or streams can eventually be distributed by the stage's working nodes among themselves.
The initial partition mapping of any given stream can change over time, for example as a result of the dynamic repartitioning operations described earlier. Therefore, in the embodiment depicted in Fig. 25, one or more of the working nodes 2540 can occasionally (or in response to the trigger conditions described below) submit requests to the SMS control subsystem 210 of their stage's input streams to obtain the current partition metadata. In some implementations, such requests can comprise calls to SMS control plane APIs, such as the calls to a get stream information API indicated by arrows 2544A, 2544B, 2544K, and 2544L. The SMS control subsystem can respond, for example, with an up-to-date list of the stream's partitions and/or other details, such as the validity periods of the partitions. If the partition information provided by SMS control subsystem 210 does not match the entries in the PA table, the PA table can be modified by the working node, for example by inserting or deleting entries for one or more partitions. In at least some embodiments, such requests 2544 to the SMS control subsystem can typically be much less frequent than the record retrieval requests 2554 (and/or the database read or write operations 2564), as indicated by the label "infrequently" on arrow 2544A. For example, once a working node has been assigned a partition, it can typically keep retrieving and processing the data records of that partition until the partition's data has been fully consumed (for example, if the owner of the stream closes the stream, or if the partition is closed as a result of dynamic repartitioning), or until some other low-probability circumstance is encountered (for example, as discussed below, if a different working node requests a transfer of the partition because of a detected load imbalance). Therefore, in various embodiments, the overhead associated with calling the get stream information or similar APIs can typically be quite small, even if a substantial amount of information is provided in response to any given call (as may be the case if hundreds or thousands of partitions are defined for the stage's input stream).
In the embodiment depicted in FIG. 25, some of the key workload-management operations of a decentralized-control SPS environment can therefore be summarized as follows: (a) a first worker node of a stream processing stage selects, based at least in part on accessing a database table, a particular partition of the stage's input data stream on which to implement the set of processing operations defined for that stage; (b) an indicator of the assignment of the particular partition to the first worker node is written to a particular entry stored in the table; (c) the first worker node retrieves records of the particular partition using a programmatic record retrieval interface implemented at a multi-tenant stream management service; (d) the first worker node implements the set of processing operations on the records of the particular partition; (e) a second worker node determines, based at least in part on the particular entry in the particular database table, that the first worker node is assigned to perform the set of processing operations on the particular partition; and (f) the second worker node selects a different partition on which to perform the set of processing operations. If and when a worker node determines that no more records remain in a partition assigned to it, the worker node can request metadata on the input stream from the SMS control subsystem, and can update the PA table if the metadata indicates a discrepancy.
FIG. 26 illustrates example entries that may be stored in a partition assignment table 2550 used for workload coordination, according to at least some embodiments. As shown, table 2550 may include four columns: a partition identifier column 2614, an assigned worker node identifier column 2618, a worker node health indicator column 2620, and a workload level indicator column 2622. Other implementations may use other sets of columns: for example, some embodiments may include a column indicating the partition creation time or the range of partitioning function output values, or may omit the workload level indicator column.
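The four columns described for FIG. 26 can be pictured as a simple record type. The following is a minimal sketch only: the class name, field names, timestamp values and record counts are illustrative assumptions, not part of the described service.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PAEntry:
    """One row of a partition assignment (PA) table; all names are hypothetical."""
    partition_id: str                        # partition identifier column
    assigned_worker: Optional[str] = None    # assigned worker node identifier; None = unassigned
    last_modified: Optional[float] = None    # health indicator, here a "last modified" timestamp
    records_processed: Optional[int] = None  # workload level indicator (optional column)


def unassigned(table):
    """Return the identifiers of partitions that no worker node currently owns."""
    return [entry.partition_id for entry in table if entry.assigned_worker is None]


# Illustrative rows: two assigned partitions and one unassigned partition.
pa_table = [
    PAEntry("P1", "W7", 1385864694.0, 1200),
    PAEntry("P2", "W3", 1385864700.0, 800),
    PAEntry("P5"),
]
print(unassigned(pa_table))
```

Keeping the health and workload fields optional reflects the statement that some implementations may omit or replace individual columns.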
It should be noted that in some embodiments, the partition list 2650 maintained by the SMS control subsystem (for example, as part of the partition entry tree, graph, or other combined data structure described earlier) may, at least at some points in time, include more partitions than are included in the PA table 2550. In the depicted example, partition list 2650 includes partitions P1, P2, P3, P4 and P5, of which P1 and P4 are shown in a closed state as a result of repartitioning, while P2, P3 and P5 are shown as active (that is, partitions whose data records are currently being retrieved and processed). In the depicted embodiment, PA table 2550 includes entries for the active partitions and does not include entries for the closed partitions (for example, the entries may have been deleted by worker nodes when they obtained responses to get stream information invocations after the repartitioning occurred). In at least some implementations, not all currently open partitions of a stream need have corresponding entries in the PA table at a given point in time; instead, only a subset of the partitions, such as those currently assigned or being processed, may be represented.
In the example scenario shown in FIG. 26, partitions P1 and P2 are assigned to worker nodes with identifiers W7 and W3 respectively, and P5 is currently unassigned. The health indicator column 2620 can store different types of values in different implementations. In some implementations, worker nodes may be responsible for updating, periodically (for example, once every N seconds, or according to a schedule based on some set of heuristics), the contents of the health indicator column in the PA entries of their assigned partitions, to indicate that the worker nodes are active and able to continue their retrieval and processing operations. In FIG. 26, an indication of the most recent time at which the worker node for an entry updated the health indicator column (the "last modified time") may be stored; for example, worker node W7 is shown as having modified its entry at 02:24:54 and 53 seconds on December 1, 2013. In some embodiments, other worker nodes can use the last modified time value to determine whether the assigned worker node is healthy: for example, if X seconds or minutes have elapsed, as defined in the stage's failover policy, the assigned worker node may be presumed to be unhealthy or inaccessible, and the partition can be reassigned. In other implementations, a counter can be used as the health indicator (for example, if the counter value has not changed in Y seconds, the assigned worker node may be deemed a candidate for failover), or a "last read time" value indicating when the assigned worker node last read the entry can be used.
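The staleness test on the "last modified time" value can be sketched as follows. This is a hedged illustration: the function names, the dictionary layout of an entry, and the timeout values are assumptions chosen for the example, and the failover timeout stands in for the X seconds defined by a stage's failover policy.

```python
def presumed_unhealthy(last_modified, now, failover_timeout):
    """True when the assignee has not refreshed its entry within the timeout (seconds)."""
    return (now - last_modified) > failover_timeout


def reassignable(entry, now, failover_timeout):
    """A partition may be picked up if it is unassigned or its assignee looks stale."""
    if entry["assigned_worker"] is None:
        return True
    return presumed_unhealthy(entry["last_modified"], now, failover_timeout)


# Hypothetical entry: worker W7 last refreshed its health indicator at t = 100 s.
entry = {"assigned_worker": "W7", "last_modified": 100.0}
print(reassignable(entry, now=130.0, failover_timeout=60.0))  # entry is still fresh
print(reassignable(entry, now=200.0, failover_timeout=60.0))  # entry is stale
```

A counter-based health indicator would follow the same pattern, comparing the time since the counter last changed against the Y-second limit.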
In at least some embodiments, a workload level indicator value 2622 may be stored in an entry, for example by the assigned worker node, such as the number of records processed during some recent time interval (for example, in the five minutes before the last modified time), or recent performance-related metrics of the worker node, such as CPU utilization, memory utilization, storage utilization, and the like. In some embodiments, worker nodes can use such workload level indicator values to determine whether a load imbalance exists, as described below with respect to FIG. 29, and to take action in response to a detected imbalance. For example, a worker node Wk may determine that its workload level is above the average workload level, and may unassign one of its partitions or request a dynamic repartitioning; alternatively, worker node Wk may determine that its workload is too low relative to that of other worker nodes, and may assign additional partitions to itself. Thus, in the depicted embodiment, using the columns of the PA table indicated in FIG. 26, worker nodes can perform some of the same types of control-plane functions that would typically be performed by dedicated SPS control nodes in centralized-control SPS implementations.
FIG. 27 illustrates aspects of operations that may be performed by worker nodes of a stream processing stage to select partitions on which to perform processing operations, according to at least some embodiments. As shown in element 2701, a PA table PAT1 may be initialized at a database service for a decentralized-control SPS processing stage SP1. The table may be created, for example, when an SPS client library component is invoked, for example from a host at a client facility or from a compute instance at a provider network data center. The client library can be used for various purposes: for example, to provide an executable component, such as a JAR (Java™ archive) file, for the particular processing operations to be implemented at the SPS stage; to indicate a label (such as a program name, a process name, or a compute instance name) that can be used to identify worker nodes; to indicate the stream to be used as input for the stage; to indicate the output destinations (if any) of the stage; and so on. In some embodiments, PAT1 may initially be populated with entries or rows for at least a subset of the partitions {P1, P2, ...} of the input stream defined for the stage. In some implementations, the table may initially be left empty, and one or more worker nodes may populate the table with rows for unassigned partitions, for example as a result of obtaining partition metadata from the SMS control subsystem. An initial set of worker nodes {W1, W2, ...} may be started (element 2704), for example at various compute instances within the provider network or at client-owned computing devices. In the depicted embodiment, the worker nodes may be granted read and write access to PAT1.
As worker nodes come online, each can access PAT1 to try to find an unassigned partition. For example, worker node W1 may examine PAT1 and find that partition P1 is unassigned (element 2707). W1 can then update P1's entry in PAT1 to indicate that P1 is assigned to W1 (element 2710), for example using a conditional write request or a transactional update request, depending on the type of database service used. Having updated the table, W1 can begin retrieving the data records of P1 using an SMS retrieval subsystem interface (element 2713), and can perform the processing operations of stage PS1 on the retrieved records.
Meanwhile at some time points, different working node W2 can access PAT1 with the trial of their own, to find Unappropriated subregion (element 2716).W2 can distribute P1 based on the determination of more newly arriving before W1, but unallocated different Subregion P2.In some embodiments, by the current of W2 (such as healthy indicator column in the entry based on the P2) P2 made Also bootable W2 selects P2 to the unhealthy or inactive determination of assignment node.Therefore, at least some embodiments, The determination of unallocated state or the unhealthy condition of work at present node can be used to select for redistributing (or initial point With) given subregion.W2 then can attempt update PAT1 to distribute to oneself (element 2719) P2.If be updated successfully, that W2 can begin to use SMS Retrieval Interfaces and record (element 2722) to retrieve P2, and executes and be defined for the suitable of the stage When processing operation.
As mentioned earlier, worker nodes in a decentralized-control SPS can (typically infrequently) obtain partition mapping information from the SMS, and use this information to update the PA table if necessary. FIG. 28 illustrates aspects of operations that may be performed by worker nodes of a stream processing stage to update a partition assignment table based on information obtained from a stream management service control subsystem, according to at least some embodiments. As shown in element 2801, during worker node initialization or in response to various trigger conditions (such as the closing of one of the partitions assigned to it), a worker node W1 can submit a request to the SMS control subsystem to obtain the latest or current partition list, or the list of active partitions. In some implementations, a get stream information or similar API may be invoked for this purpose. Other trigger conditions may be used in some embodiments: for example, each worker node may be configured to obtain a fresh partition list after a random amount of time, or in response to an unexpected decrease or increase in workload level. The partition list returned by the SMS can be compared with the entries in the PA table for the partitions (element 2807). In the depicted embodiment, if a discrepancy is found (for example, if the newly obtained partition list contains some partition that is not in the PA table, or if the PA table contains an entry that is not in the SMS's list), the worker node can insert or delete entries in the PA table to resolve the discrepancy (element 2810). (In some implementations, additional coordination may be required if an entry targeted for deletion currently has an assigned worker node; for example, the assigned worker node may be notified, either directly or through the PA table itself.)
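The comparison and repair steps of elements 2807 and 2810 amount to a set difference in each direction. The sketch below assumes, purely for illustration, that the PA table is a dictionary mapping partition identifiers to assignee identifiers and that the SMS response is a plain list of active partition identifiers; the function name is hypothetical.

```python
def reconcile(pa_table, sms_partitions):
    """Make the PA table's keys match the partition list reported by the SMS.

    Returns the identifiers that were inserted and deleted. A real system may
    need extra coordination before deleting an entry that still has an
    assigned worker node; this sketch simply removes it.
    """
    current = set(sms_partitions)
    known = set(pa_table)
    inserted = sorted(current - known)  # reported by the SMS, missing from the table
    deleted = sorted(known - current)   # in the table, no longer reported by the SMS
    for pid in inserted:
        pa_table[pid] = None            # new entries start out unassigned
    for pid in deleted:
        del pa_table[pid]
    return inserted, deleted


pa_table = {"P1": "W7", "P2": "W3"}
print(reconcile(pa_table, ["P2", "P3", "P5"]))
print(pa_table)
```

Returning the inserted and deleted identifiers lets the caller decide whether any follow-up, such as notifying a displaced worker node, is needed.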
After the discrepancy has been resolved, or if no discrepancy was detected, worker node W1 can select a set of partitions on which it should perform the stage's processing operations (element 2813), and can update the PA table accordingly. In some cases, depending on the trigger condition that led to the partition list being retrieved, W1 may already have one or more partitions assigned to it, and may not need to change its assignments or update the PA table. W1 can then proceed to retrieve the data records of its assigned partition or partitions and process the records (element 2816), without having to interact with the SMS control subsystem or change the number of entries in the PA table. Eventually, when a trigger condition is detected (for example, when the equivalent of an "end of partition reached" response is received to a retrieval request, indicating that a partition is closed), W1 can again send a request to the SMS control subsystem for the latest partition information, and the operations of element 2801 onward can be repeated.
FIG. 29 illustrates aspects of load balancing operations that may be performed by worker nodes of a stream processing stage, according to at least some embodiments. As shown in element 2901, a worker node W1 may determine that a load balancing analysis is to be performed on its stage when any of various trigger conditions is detected, such as a high resource utilization level, or based on a configurable schedule. W1 can examine the entries in the PA table (element 2904) to determine various workload metrics for the stage. Such metrics may include the average number of partitions assigned to worker nodes, the average workload level of worker nodes or of different partitions (in embodiments in which workload level indicators are stored in the table), the range or distribution of per-worker-node workloads, and so on.
W1 can then compare its own workload (for example, based on the number of partitions assigned to W1 and/or the per-partition workload level indicators) with some or all of the metrics. In general, any of three types of conclusions may be reached: that W1 is overloaded, that W1 is underloaded, or that W1's workload is neither too high nor too low. Workload levels that are "too high" or "too low" may be defined by policies selected by the client on whose behalf the stage is configured in some embodiments, or by some set of default heuristics in other embodiments. If W1 determines that its workload is too low (element 2907), for example below some minimum load threshold T1, a busier or more heavily loaded worker node Wk can be identified (element 2910). W1 can then initiate the process of transferring one or more partitions Pm from Wk to W1 (element 2913), for example by attempting to modify the Pm entry in the PA table, by requesting such a modification (which may result in a notification being generated for Wk), or by requesting Wk directly.
If W1 determines that its workload is too high (element 2916), for example above a maximum threshold T2, W1 can identify one or more of its assigned partitions Pn to relinquish (that is, to release for assignment by other worker nodes) (element 2919). W1 can then modify the appropriate entries in the PA table (element 2922), for example by removing its identifier from the assignee column of the entry for Pn. If W1's workload was neither too high nor too low, or after W1 has taken the kinds of actions described above to increase or decrease its workload, W1 can resume processing records of the partitions to which it is assigned (element 2925). If and when conditions triggering another load balancing analysis are met, the operations corresponding to element 2901 onward can be repeated. It should be noted that in the operations shown in FIG. 29, W1 is shown as initiating workload changes only when it detects an imbalance with respect to its own workload. In other embodiments, W1 may initiate rebalancing actions if it detects imbalances among worker nodes other than itself, for example if W1 determines that W2 has a much lower workload level than W3. In some implementations, if and when W1 detects a workload imbalance, W1 can request or initiate dynamic repartitioning (for example, by invoking a repartitionStream SMS API such as that shown in FIG. 3, or its equivalent). In some embodiments, the kinds of operations shown in FIG. 29 may be performed by a newly configured worker node: for example, when new nodes are added to a stage after the stage has been in operation for some time, the new nodes can indirectly notify the existing nodes of their presence by requesting reassignment of partitions from heavily loaded existing nodes. In some embodiments, decentralized control techniques similar to those described above for SPS worker nodes may also or instead be used at one or more SMS subsystems; for example, the nodes of the ingestion, storage or retrieval subsystems may coordinate their workloads using shared data structures similar to the PA table.
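The three-way decision of FIG. 29 can be sketched as a small planning function. This is an illustration under stated assumptions: the function names, the action labels, the per-worker load figures and the threshold values standing in for T1 and T2 are all hypothetical, and the workload is reduced to a single number per worker node.

```python
def classify_workload(own_load, t1_min, t2_max):
    """Classify a worker's load against a minimum threshold T1 and a maximum threshold T2."""
    if own_load < t1_min:
        return "underloaded"
    if own_load > t2_max:
        return "overloaded"
    return "balanced"


def plan_rebalance(worker_id, loads, t1_min, t2_max):
    """Return an (action, target) pair for the worker based on loads read from the PA table."""
    state = classify_workload(loads[worker_id], t1_min, t2_max)
    if state == "underloaded":
        # Identify the busiest other worker and ask it to hand over a partition.
        others = (w for w in loads if w != worker_id)
        busiest = max(others, key=loads.get, default=None)
        return ("request_transfer_from", busiest)
    if state == "overloaded":
        # Relinquish one of this worker's own partitions.
        return ("release_partition", worker_id)
    return ("continue", None)


loads = {"W1": 10, "W2": 90, "W3": 50}  # hypothetical workload levels
for worker in loads:
    print(worker, plan_rebalance(worker, loads, t1_min=20, t2_max=80))
```

The function only decides what to attempt; carrying out a transfer or release would still go through an update to the relevant PA table entry.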
It should be noted that in various embodiments, operations other than those shown in the flow diagrams of FIGS. 17-24 and 27-29 may be used to implement the stream management service and/or stream processing functionality described above. In some embodiments, some of the operations shown may not be implemented, or may be implemented in a different order, or in parallel rather than sequentially. It should also be noted that, for each of the SMS and SPS functions for which programmatic interfaces are supported in various embodiments, any combination of one or more techniques may be used to implement the interfaces, including web pages, web sites, web-service APIs, other APIs, command-line tools, graphical user interfaces, mobile applications (apps), tablet apps, and the like.
Use cases
The techniques described above, of establishing scalable, partition-based, dynamically configurable managed multi-tenant services for the collection, storage, retrieval and staged processing of stream data records, may be useful in a variety of scenarios. For example, a large provider network may include thousands of instance hosts implementing service instances of many different multi-tenant or single-tenant services for tens of thousands of clients simultaneously. Monitoring and/or billing agents installed on the various instances and hosts can rapidly generate thousands of metric records, which may need to be stored and analyzed to produce accurate billing records, to determine effective provisioning plans for the provider network's data centers, to detect network attacks, and so on. The monitoring records can form an input stream to an SMS for scalable ingestion and storage, and the SPS techniques described can be implemented for analysis of the collected metrics. Similarly, applications that collect and analyze large numbers of log records from many log sources (for example, application logs from the nodes of a distributed application, or system logs from the hosts or compute instances at a data center) may also be able to use SMS and SPS functionality. In at least some environments, the SPS processing operations may include a real-time ETL (extract-transform-load) processing operation (that is, an operation that transforms received data records in real time for loading into a destination, instead of performing the transformation offline), or a transformation of data records for insertion into a data warehouse. Using an SMS/SPS combination to load data into a data warehouse in real time can avoid the delays typically required to clean and curate data from one or more data sources before the data can be inserted into the warehouse for analysis.
A number of different "big data" applications can also be built using the SMS and SPS techniques. For example, the analysis of trends in various forms of social media interactions can be performed efficiently using streams. Data collected from mobile phones or tablet computers, such as users' location information, can be managed as stream records. Audio or video information, collected for example from a fleet of monitoring cameras, may represent another category of streaming data set that can be collected and processed in a scalable manner, potentially helping to prevent attacks of various kinds. Scientific applications that require analysis of ever-growing data sets, collected for example from weather satellites, ocean-based sensors, forest-based sensors, or astronomical telescopes, may also benefit from the stream management and processing capabilities described herein. Flexible policy-based configuration options and pricing options can help different types of users customize the streaming functionality to suit their specific budgets and data durability/availability requirements.
Embodiments of the disclosure can be described in view of the following clauses:
1. A system, comprising:
One or more computing devices configured to:
Implement one or more programmatic interfaces enabling clients of a multi-tenant stream processing service to indicate, for a particular processing stage associated with a specified data stream: (a) a processing operation to be performed on data records of the specified data stream in accordance with a partitioning policy, and (b) an output distribution descriptor for results of the processing operation;
Receive, from a particular client via the one or more programmatic interfaces, an indication of a particular processing operation to be performed at the particular processing stage on data records of a particular data stream, and a particular output distribution descriptor for results of the particular processing operation;
Determine, based at least in part on the partitioning policy and at least in part on an estimated performance capability of resources to be deployed as worker nodes of the processing stage, an initial number of worker nodes for the specified data stream;
Configure a particular worker node of the initial number of worker nodes to: (a) receive data records of one or more partitions of the particular data stream, (b) perform the particular processing operation on the received data records, (c) store a progress record indicating the portion of the one or more partitions that has been processed at the worker node, and (d) transmit results of the particular processing operation to one or more destinations in accordance with the particular output distribution descriptor;
Monitor the health state of the particular worker node; and
In response to a determination that the particular worker node is in an undesired state, configure a replacement worker node to replace the particular worker node, wherein the replacement worker node accesses the progress record stored by the particular worker node to identify at least one data record of the one or more partitions on which the particular processing operation is to be performed by the replacement worker node.
2. The system as recited in clause 1, wherein the particular output distribution descriptor indicates that results of the particular processing operation are to be distributed, as data records of a different data stream and in accordance with a different partitioning policy, to one or more ingestion nodes configured for the different data stream.
3. The system as recited in clause 1, wherein the one or more computing devices are further configured to:
Receive, from the particular client, an indication of another processing stage to which data records of the particular data stream are to be provided as input, wherein a different processing operation is to be performed at the other processing stage; and
Configure an additional set of worker nodes for the other processing stage.
4. The system as recited in clause 1, wherein the one or more computing devices are further configured to:
In response to a determination that a different worker node, configured to perform another processing operation for another processing stage, is in an undesired state, configure a different replacement worker node to perform the other processing operation on one or more subsequently received data records without accessing a progress record.
5. The system as recited in clause 1, wherein the one or more computing devices are further configured to:
In response to a determination that a workload level at a different worker node of the processing stage meets a triggering criterion, implement a stage reconfiguration operation comprising one or more of: (a) a dynamic repartitioning of the particular data stream, performed while additional data records of the stream continue to be processed, (b) an assignment, to a replacement worker node, of at least one partition previously processed at the different worker node, (c) a change to the number of worker nodes configured for the processing stage, or (d) a transfer of a worker node from one server to another server.
6. A method, comprising:
Performing, by one or more computing devices:
Receiving, at a multi-tenant stream processing service from a particular client, an indication of a particular operation to be performed at a specified processing stage on data records of a particular data stream, and a particular output distribution descriptor for results of the particular operation;
Determining, based at least in part on the particular operation, an initial number of worker nodes to be configured for the specified processing stage;
Configuring a particular worker node of the initial number of worker nodes to: (a) perform the particular operation on received data records of one or more partitions of the particular data stream, (b) store a progress record indicating the portion of the one or more partitions that has been processed at the worker node, and (c) transmit results of the particular operation to one or more destinations in accordance with the particular output distribution descriptor; and
In response to a determination that the particular worker node is in an unhealthy state, selecting a replacement worker node to replace the particular worker node, wherein the replacement worker node accesses the progress record stored by the particular worker node to identify at least one data record of the one or more partitions on which the particular operation is to be performed by the replacement worker node.
7. The method as recited in clause 6, further comprising performing, by the one or more computing devices:
Invoking one or more programmatic data record retrieval interfaces implemented by a multi-tenant stream management service to receive the data records of the one or more partitions, including a particular programmatic data record retrieval interface that takes, as a parameter, an indication of a sequence number within a partition of a requested data record.
8. The method as recited in clause 6, further comprising performing, by the one or more computing devices:
Implementing one or more programmatic interfaces enabling clients of the stream processing service to specify a directed acyclic graph of processing stages for data records of one or more data streams.
9. The method as recited in clause 6, further comprising performing, by the one or more computing devices:
Obtaining, from a multi-tenant stream management service responsible for storing data records of the particular data stream, an indication of a partitioning policy in use for the particular data stream; and
Determining the initial number of worker nodes based at least in part on the partitioning policy.
10. The method as recited in clause 6, wherein the particular output distribution descriptor indicates that results of the particular operation are to be distributed, as data records of a different data stream and in accordance with a different partitioning policy, to one or more ingestion nodes configured for the different data stream.
11. The method as recited in clause 6, further comprising performing, by the one or more computing devices:
In response to a determination that a different worker node, configured to perform another operation for another processing stage, is in an undesired state, configuring a different replacement worker node to perform the other operation on one or more subsequently received data records without accessing a progress record.
12. The method as recited in clause 6, further comprising performing, by the one or more computing devices:
In response to a determination that a workload level at a different worker node of the processing stage meets a triggering criterion, implementing one or more of: (a) a dynamic repartitioning of the particular data stream, (b) an assignment, to a replacement worker node, of at least one partition previously processed at the different worker node, (c) a change to the number of worker nodes configured for the processing stage, or (d) a transfer of a worker node from one server to another server.
13. The method as recited in clause 6, wherein the results of the particular operation are formatted in accordance with an input format compatible with another stream processing system, and wherein a particular destination of the one or more destinations comprises an input node of the other stream processing system.
14. The method as recited in clause 6, wherein the particular worker node is configured to store an entry in a persistent data repository, the entry representing accumulated application state information corresponding to a plurality of data records processed at the particular worker node, and wherein the particular worker node is configured to include an indication of the entry in the progress record.
15. The method as recited in clause 6, wherein the operation comprises one of: a log record analysis operation, a resource monitoring metrics analysis, a billing amount computation, a sensor data analysis, a social media interaction analysis, a real-time ETL (extract-transform-load) processing operation, or a transformation of data records prior to insertion of the data records into a data warehouse.
16. The method as recited in clause 6, further comprising performing, by the one or more computing devices:
In response to a stream processing configuration request submitted via an invocation of a client library component, registering, at the multi-tenant stream processing service, a specified resource as a worker node for a different processing stage.
17. The method as recited in clause 6, further comprising performing, by the one or more computing devices:
In response to a stream processing configuration request submitted via an invocation of a client library component, determining, at the multi-tenant stream processing service, one or more control-plane functions to be implemented at a different processing stage.
18. The method as recited in clause 6, wherein the operation is an idempotent operation.
19. The method as recited in clause 6, further comprising performing, by the one or more computing devices:
Receiving, at the multi-tenant stream processing service from the particular client, an indication of a particular non-idempotent operation to be performed at a different processing stage on data records of a different data stream; and
Configuring a first worker node of the different processing stage to perform the non-idempotent operation on received data records.
20. The method as recited in clause 19, further comprising performing, by the one or more computing devices:
Configuring the first worker node of the different processing stage to: (a) perform a flush operation to store results of the non-idempotent operation to one or more destinations, and (b) store an indication of the timing of the flush operation at a persistent storage location; and
Configuring a replacement worker node to use the indication of the timing of the flush operation to replay the flush operation during recovery after a failure of the first worker node.
21. A non-transitory computer-accessible storage medium storing program instructions that, when executed on one or more processors, implement a control node of a multi-tenant stream processing service, wherein the control node is operable to:
Receive, from a specific client via a programmatic interface, an indication of a specific operation to be performed on data records of a specific data stream;
Determine, based at least in part on a partitioning strategy associated with the specific data stream, an initial number of working nodes to be used for the specific data stream at a processing stage;
Configure a specific working node of the initial number of working nodes to perform the specific operation on received data records of one or more partitions of the specific data stream; and
In response to a determination that the specific working node is in an unhealthy state, configure a replacement working node to replace the specific working node.
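By way of illustration only, the control-node behavior of clause 21 might be sketched as follows. The class name `ControlNode`, the `partitions_per_worker` parameter, and the round-robin assignment are assumptions made for this sketch and are not taken from the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class ControlNode:
    """Hypothetical control node that sizes a processing stage and replaces unhealthy workers."""

    partitions_per_worker: int = 2  # assumed stand-in for the partitioning strategy
    assignments: dict = field(default_factory=dict)  # working node id -> partitions
    _next_id: int = 0

    def _new_worker(self):
        worker_id = f"worker-{self._next_id}"
        self._next_id += 1
        return worker_id

    def configure_stage(self, partitions):
        # Initial number of working nodes: ceiling of partitions / partitions_per_worker.
        count = -(-len(partitions) // self.partitions_per_worker)
        workers = [self._new_worker() for _ in range(count)]
        # Assign partitions round-robin (an assumption; any mapping would do).
        for index, partition in enumerate(partitions):
            self.assignments.setdefault(workers[index % count], []).append(partition)
        return workers

    def replace_if_unhealthy(self, worker_id, healthy):
        # A healthy working node keeps its partitions.
        if healthy:
            return worker_id
        # Otherwise a replacement working node inherits the partitions.
        replacement = self._new_worker()
        self.assignments[replacement] = self.assignments.pop(worker_id)
        return replacement
```

In this sketch the health determination is passed in as a boolean; how health is actually detected (heartbeats, monitoring agents, and so on) is left open.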
22. The non-transitory computer-accessible storage medium as described in clause 21, wherein the control node is operable to:
Configure a redundancy group comprising a plurality of working nodes that process data records of a different partition of a different data stream, wherein at least one working node of the plurality of working nodes is designated as a primary node that receives the data records of the different partition, and wherein at least one other working node of the plurality of working nodes is configured as a standby node that assumes the responsibilities of the primary node in response to a triggering event.
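The primary/standby arrangement of clause 22 can be illustrated with a short sketch. The class `RedundancyGroup` and its method names are hypothetical, and the triggering event is modeled simply as a method call.

```python
class RedundancyGroup:
    """Hypothetical redundancy group: one primary working node plus standbys for a partition."""

    def __init__(self, worker_ids):
        if len(worker_ids) < 2:
            raise ValueError("a redundancy group needs a primary and at least one standby")
        self.primary = worker_ids[0]
        self.standbys = list(worker_ids[1:])

    def route(self, record):
        # Data records of the partition are delivered to the current primary.
        return self.primary, record

    def on_trigger_event(self):
        # For example, the primary fails: the first standby assumes the primary role.
        if not self.standbys:
            raise RuntimeError("no standby node available")
        previous_primary = self.primary
        self.primary = self.standbys.pop(0)
        return previous_primary
```

A group built as `RedundancyGroup(["w1", "w2", "w3"])` routes records to `w1` until `on_trigger_event()` is called, after which `w2` receives them.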
23. The non-transitory computer-accessible storage medium as described in clause 21, wherein the control node is operable to:
In response to a determination that a workload level at a different working node of the processing stage meets a triggering criterion, implement one or more of the following: (a) dynamic repartitioning of the specific data stream, (b) assignment of a substitute working node to at least one partition previously processed at the different working node, (c) a change in the number of working nodes configured for the processing stage, or (d) a transfer of a working node from one server to another server.
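As a rough illustration of option (b) in clause 23, the sketch below moves one partition away from an overloaded working node. The function name `rebalance`, the use of a single numeric load per node, and the fixed `threshold` are all assumptions; the clause does not prescribe how workload is measured.

```python
def rebalance(load_by_worker, assignments, threshold):
    """Move one partition from the busiest working node to the least busy one.

    Returns (partition, source, target), or None if the trigger criterion is not met.
    """
    hot = max(load_by_worker, key=load_by_worker.get)
    cold = min(load_by_worker, key=load_by_worker.get)
    # Trigger criterion (assumed): the busiest node's load exceeds the threshold.
    if load_by_worker[hot] <= threshold or hot == cold:
        return None
    # A node with a single partition has nothing to give away; repartitioning
    # (option (a)) would be needed instead, which this sketch does not cover.
    if len(assignments[hot]) < 2:
        return None
    moved = assignments[hot].pop()
    assignments[cold].append(moved)
    return moved, hot, cold
```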
24. The non-transitory computer-accessible storage medium as described in clause 21, wherein the specific output distribution descriptor indicates that results of the specific operation are to be distributed, in accordance with a different partitioning strategy, as data records of an ephemeral data stream to one or more ingestion nodes configured for the ephemeral data stream, wherein storage at persistent storage devices is not required for the ephemeral data stream.
25. The non-transitory computer-accessible storage medium as described in clause 21, wherein the control node is operable to:
Implement one or more programmatic interfaces enabling clients of the stream processing service to specify a directed acyclic graph of processing stages for data records of one or more data streams.
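The directed acyclic graph of processing stages mentioned in clause 25 could, for example, be validated and ordered with Python's standard `graphlib` module (Python 3.9 or later). The function `stage_order` and the stage names used with it are hypothetical; only `TopologicalSorter` and `CycleError` are real library names.

```python
from graphlib import CycleError, TopologicalSorter


def stage_order(upstream_of):
    """Return processing stages in an order where every stage follows its inputs.

    `upstream_of` maps each stage name to the set of stages whose output it consumes.
    """
    try:
        return list(TopologicalSorter(upstream_of).static_order())
    except CycleError as exc:
        # A cycle means the client did not specify a directed acyclic graph.
        raise ValueError("processing stages must form a directed acyclic graph") from exc
```

For instance, a client-specified graph in which `enrich` consumes `parse` and `aggregate` consumes `enrich` yields an order with `parse` first; a graph in which two stages consume each other is rejected.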
Illustrative computer system
In at least some embodiments, a server that implements some or all of one or more of the techniques described herein, including the techniques to implement components of the SMS subsystems (e.g., the ingestion, storage, retrieval and control subsystems) as well as the SPS working nodes and control nodes, may include a general-purpose computer system that includes or is configured to access one or more computer-accessible media. Figure 30 illustrates such a general-purpose computing device 9000. In the illustrated embodiment, computing device 9000 includes one or more processors 9010 coupled to a system memory 9020 via an input/output (I/O) interface 9030. Computing device 9000 further includes a network interface 9040 coupled to I/O interface 9030.
In various embodiments, computing device 9000 may be a uniprocessor system including one processor 9010, or a multiprocessor system including several processors 9010 (e.g., two, four, eight, or another suitable number). Processors 9010 may be any processors capable of executing instructions. For example, in various embodiments, processors 9010 may be general-purpose or embedded processors implementing any of a variety of instruction set architectures (ISAs), such as the x86, PowerPC, SPARC, or MIPS ISAs, or any other suitable ISA. In multiprocessor systems, each of processors 9010 may commonly, but not necessarily, implement the same ISA. In some implementations, graphics processing units (GPUs) may be used instead of, or in addition to, conventional processors.
System memory 9020 may be configured to store instructions and data accessible by processor(s) 9010. In various embodiments, system memory 9020 may be implemented using any suitable memory technology, such as static random access memory (SRAM), synchronous dynamic RAM (SDRAM), nonvolatile/Flash-type memory, or any other type of memory. In the illustrated embodiment, program instructions and data implementing one or more desired functions, such as the methods, techniques, and data described above, are shown stored within system memory 9020 as code 9025 and data 9026.
In one embodiment, I/O interface 9030 may be configured to coordinate I/O traffic between processor 9010, system memory 9020, and any peripheral devices in the device, including network interface 9040 or other peripheral interfaces, such as the various types of persistent and/or volatile storage devices used to store physical replicas of data object partitions. In some embodiments, I/O interface 9030 may perform any necessary protocol, timing, or other data transformations to convert data signals from one component (e.g., system memory 9020) into a format suitable for use by another component (e.g., processor 9010). In some embodiments, I/O interface 9030 may include support for devices attached through various types of peripheral buses, such as a variant of the Peripheral Component Interconnect (PCI) bus standard or the Universal Serial Bus (USB) standard. In some embodiments, the function of I/O interface 9030 may be split into two or more separate components, such as a north bridge and a south bridge. Also, in some embodiments, some or all of the functionality of I/O interface 9030, such as an interface to system memory 9020, may be incorporated directly into processor 9010.
Network interface 9040 may be configured to allow data to be exchanged between computing device 9000 and other devices 9060 attached to one or more networks 9050, such as the other computer systems or devices illustrated in Figures 1 through 29. In various embodiments, network interface 9040 may support communication via any suitable wired or wireless general data networks, such as types of Ethernet network. Additionally, network interface 9040 may support communication via telecommunications/telephony networks such as analog voice networks or digital fiber communications networks, via storage area networks such as Fibre Channel SANs, or via any other suitable type of network and/or protocol.
In some embodiments, system memory 9020 may be one embodiment of a computer-accessible medium configured to store program instructions and data as described above with respect to Figures 1 through 29 for implementing embodiments of the corresponding methods and apparatus. However, in other embodiments, program instructions and/or data may be received, sent or stored upon different types of computer-accessible media. Generally speaking, a computer-accessible medium may include non-transitory storage media or memory media such as magnetic or optical media, e.g., a disk or DVD/CD coupled to computing device 9000 via I/O interface 9030. A non-transitory computer-accessible storage medium may also include any volatile or non-volatile media, such as RAM (e.g., SDRAM, DDR SDRAM, RDRAM, SRAM, etc.), ROM, etc., that may be included in some embodiments of computing device 9000 as system memory 9020 or another type of memory. Further, a computer-accessible medium may include transmission media or signals such as electrical, electromagnetic, or digital signals, conveyed via a communication medium such as a network and/or a wireless link, such as may be implemented via network interface 9040. Portions or all of multiple computing devices such as that illustrated in Figure 30 may be used to implement the described functionality in various embodiments; for example, software components running on a variety of different devices and servers may collaborate to provide the functionality. In some embodiments, portions of the described functionality may be implemented using storage devices, network devices, or special-purpose computer systems, in addition to or instead of being implemented using general-purpose computer systems. The term "computing device", as used herein, refers to at least all these types of devices, and is not limited to these types of devices.
Conclusion
Various embodiments may further include receiving, sending or storing, upon a computer-accessible medium, instructions and/or data implemented in accordance with the foregoing description. Generally speaking, a computer-accessible medium may include storage media or memory media such as magnetic or optical media (e.g., disk or DVD/CD-ROM), volatile or non-volatile media such as RAM (e.g., SDRAM, DDR, RDRAM, SRAM, etc.) and ROM, as well as transmission media or signals such as electrical, electromagnetic, or digital signals conveyed via a communication medium such as a network and/or a wireless link.
The various methods as illustrated in the figures and described herein represent exemplary embodiments of methods. The methods may be implemented in software, hardware, or a combination thereof. The order of the methods may be changed, and various elements may be added, reordered, combined, omitted, modified, etc.
Various modifications and changes may be made, as would be obvious to a person skilled in the art having the benefit of this disclosure. It is intended to embrace all such modifications and changes and, accordingly, the above description is to be regarded in an illustrative rather than a restrictive sense.

Claims (15)

1. A method of processing a data stream, comprising:
Performing, by one or more computing devices:
Receiving, at a multi-tenant stream processing service from a specific client, an indication of a specific operation to be performed, at a specified processing stage, on data records of a specific data stream, and a specific output distribution descriptor for results of the specific operation;
Determining, based at least in part on the specific operation, an initial number of working nodes to be configured for the specified processing stage;
Configuring a specific working node of the initial number of working nodes to: (a) perform the specific operation on received data records of one or more partitions of the specific data stream, (b) store a progress record indicating the portion of the one or more partitions that has been processed at the working node, and (c) transmit results of the specific operation to one or more destinations in accordance with the specific output distribution descriptor; and
In response to a determination that the specific working node is in an unhealthy state, selecting a replacement working node to replace the specific working node, wherein the replacement working node accesses the progress record stored by the specific working node to identify at least one data record of the one or more partitions on which the specific operation is to be performed by the replacement working node.
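The role of the progress record in claim 1 can be illustrated with a minimal sketch. The names `ProgressStore` and `process_partition`, the in-memory dictionary standing in for persistent storage, and the `fail_after` parameter used to simulate an unhealthy node are all assumptions of this sketch.

```python
class ProgressStore:
    """In-memory stand-in for the persistent store that holds progress records."""

    def __init__(self):
        self._last_processed = {}

    def save(self, partition, sequence_number):
        self._last_processed[partition] = sequence_number

    def load(self, partition):
        # -1 means that no record of the partition has been processed yet.
        return self._last_processed.get(partition, -1)


def process_partition(records, partition, operation, store, deliver, fail_after=None):
    """Apply `operation` to each unprocessed (sequence_number, payload) pair.

    `records` is assumed to be ordered by sequence number. `fail_after` makes the
    call raise after that many records, to simulate a working node failure.
    """
    last_done = store.load(partition)
    handled = 0
    for sequence_number, payload in records:
        if sequence_number <= last_done:
            continue  # already covered by the stored progress record
        if fail_after is not None and handled == fail_after:
            raise RuntimeError("simulated working node failure")
        deliver(operation(payload))            # step (c): send result to a destination
        store.save(partition, sequence_number)  # step (b): update the progress record
        handled += 1
```

A replacement working node simply calls `process_partition` with the same `store`; it skips every record at or below the saved sequence number. Because the result is delivered before the progress record is updated, a failure between those two steps would cause one record to be processed again, which is harmless only for idempotent operations.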
2. The method as described in claim 1, further comprising performing, by the one or more computing devices:
Invoking one or more programmatic data record retrieval interfaces associated with the multi-tenant stream processing service to receive the data records of the one or more partitions, including a specific programmatic data record retrieval interface that takes, as a parameter, an indication of a sequence number within a partition of a requested data record.
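A retrieval interface that takes a sequence number as a parameter, as recited in claim 2, might look like the following sketch. The class `PartitionLog`, the method `get_records`, and its `starting_sequence_number` and `limit` parameters are hypothetical names and do not describe any particular service API.

```python
import bisect


class PartitionLog:
    """Hypothetical per-partition record log addressed by sequence number."""

    def __init__(self):
        self._sequence_numbers = []
        self._payloads = []

    def append(self, sequence_number, payload):
        if self._sequence_numbers and sequence_number <= self._sequence_numbers[-1]:
            raise ValueError("sequence numbers must increase within a partition")
        self._sequence_numbers.append(sequence_number)
        self._payloads.append(payload)

    def get_records(self, starting_sequence_number, limit=100):
        # Binary search for the first record at or after the requested sequence number.
        start = bisect.bisect_left(self._sequence_numbers, starting_sequence_number)
        end = start + limit
        return list(zip(self._sequence_numbers[start:end], self._payloads[start:end]))
```

Sequence numbers need not be contiguous in this sketch: a request for a number that falls between two stored records returns the records that follow it.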
3. The method as described in claim 1, further comprising performing, by the one or more computing devices:
Implementing one or more programmatic interfaces enabling clients of the stream processing service to specify a directed acyclic graph of processing stages for data records of one or more data streams.
4. The method as described in claim 1, further comprising performing, by the one or more computing devices:
Obtaining, from an interface of the multi-tenant stream processing service, an indication of a partitioning strategy being used for the specific data stream; and
Determining the initial number of working nodes based at least in part on the partitioning strategy.
5. The method as described in claim 1, wherein the specific output distribution descriptor indicates that results of the specific operation are to be distributed, in accordance with a different partitioning strategy, as data records of a different data stream to one or more ingestion nodes configured for the different data stream.
6. The method as described in claim 1, further comprising performing, by the one or more computing devices:
In response to a determination that a workload level at a different working node of the processing stage meets a triggering criterion, implementing one or more of the following: (a) dynamic repartitioning of the specific data stream, (b) assignment of another working node to at least one partition previously processed at the different working node, (c) a change in the number of working nodes configured for the processing stage, or (d) a transfer of a working node from one server to another server.
7. The method as described in claim 1, wherein the specific working node is configured to store an entry in a persistent data repository, the entry representing accumulated application state information corresponding to a plurality of data records processed at the specific working node, and wherein the specific working node is configured to include an indication of the entry in the progress record.
8. The method as described in claim 1, further comprising performing, by the one or more computing devices:
In response to a stream processing configuration request invoked via a client library component, registering, at the multi-tenant stream processing service, a specified resource as a working node for a different processing stage.
9. The method as described in claim 1, further comprising performing, by the one or more computing devices:
Receiving, at the multi-tenant stream processing service from the specific client, an indication of a specific non-idempotent operation to be performed, at a different processing stage, on data records of a different data stream; and
Configuring a first working node of the different processing stage to perform the non-idempotent operation on received data records.
10. The method as claimed in claim 9, further comprising performing, by the one or more computing devices:
Configuring the first working node of the different processing stage to: (a) perform a flush operation to store results of the non-idempotent operation to one or more destinations, and (b) store an indication of the timing of the flush operation at a persistent storage location; and
Configuring a replacement working node to use the indication of the timing of the flush operation to replicate the flush operation during recovery after a failure of the first working node.
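One way to picture the recovery scheme of claim 10 is sketched below, under the assumption that the "timing" of an operation that stores results to destinations is recorded as the sequence number of the last record included in each stored batch. The names `FlushingWorker`, `replay_flushes`, and `flush_log` are hypothetical, and plain lists stand in for the destination and the persistent storage location.

```python
class FlushingWorker:
    """Hypothetical working node that buffers results and stores them in batches."""

    def __init__(self, operation, destination, flush_log):
        self.operation = operation
        self.destination = destination  # stand-in for the result destination
        self.flush_log = flush_log      # stand-in for persistent storage of flush timing
        self.buffer = []
        self.last_sequence_number = None

    def process(self, sequence_number, payload):
        self.buffer.append(self.operation(payload))
        self.last_sequence_number = sequence_number

    def flush(self):
        # (a) store the buffered results, (b) persist where in the stream this happened.
        self.destination.append(list(self.buffer))
        self.flush_log.append(self.last_sequence_number)
        self.buffer.clear()


def replay_flushes(records, operation, flush_points):
    """Rebuild the batches of a failed worker, cutting at the recorded sequence numbers.

    Returns (batches, pending), where `pending` holds results not yet stored.
    """
    points = set(flush_points)
    batches, pending = [], []
    for sequence_number, payload in records:
        pending.append(operation(payload))
        if sequence_number in points:
            batches.append(pending)
            pending = []
    return batches, pending
```

Because the replacement cuts its batches at the same points as the failed node, a destination can compare or deduplicate batches by their boundaries instead of applying the non-idempotent results twice.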
11. A system comprising one or more processors and one or more memories, the one or more memories comprising program instructions that, when executed on the one or more processors, implement a control node of a multi-tenant stream processing service, wherein the control node is operable to:
Receive, from a specific client via a programmatic interface, an indication of a specific operation to be performed on data records of a specific data stream;
Determine, based at least in part on a partitioning strategy associated with the specific data stream, an initial number of working nodes to be used for the specific data stream at a processing stage;
Configure a specific working node of the initial number of working nodes to perform the specific operation on received data records of one or more partitions of the specific data stream; and
In response to a determination that the specific working node is in an unhealthy state, configure a replacement working node to replace the specific working node.
12. The system as claimed in claim 11, wherein the control node is operable to:
Configure a redundancy group comprising a plurality of working nodes that process data records of a different partition of a different data stream, wherein at least one working node of the plurality of working nodes is designated as a primary node that receives the data records of the different partition, and wherein at least one other working node of the plurality of working nodes is configured as a standby node that assumes the responsibilities of the primary node in response to a triggering event.
13. The system as claimed in claim 11, wherein the control node is operable to:
In response to a determination that a workload level at a different working node of the processing stage meets a triggering criterion, implement one or more of the following: (a) dynamic repartitioning of the specific data stream, (b) assignment of another working node to at least one partition previously processed at the different working node, (c) a change in the number of working nodes configured for the processing stage, or (d) a transfer of a working node from one server to another server.
14. The system as claimed in claim 11, wherein a specific output distribution descriptor indicates that results of the specific operation are to be distributed, in accordance with a different partitioning strategy, as data records of an ephemeral data stream to one or more ingestion nodes configured for the ephemeral data stream, wherein storage at persistent storage devices is not required for the ephemeral data stream.
15. The system as claimed in claim 11, wherein the control node is operable to:
Implement one or more programmatic interfaces enabling clients of the stream processing service to specify a directed acyclic graph of processing stages for data records of one or more data streams.
CN201480061587.5A 2013-11-11 2014-11-11 Partition-based data stream processing framework Active CN105706047B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US14/077,167 2013-11-11
US14/077,167 US10635644B2 (en) 2013-11-11 2013-11-11 Partition-based data stream processing framework
PCT/US2014/065057 WO2015070236A1 (en) 2013-11-11 2014-11-11 Partition-based data stream processing framework

Publications (2)

Publication Number Publication Date
CN105706047A CN105706047A (en) 2016-06-22
CN105706047B true CN105706047B (en) 2018-08-31

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101395602A (en) * 2005-12-29 2009-03-25 亚马逊科技公司 Method and apparatus for a distributed file storage and indexing service
US8572091B1 (en) * 2011-06-27 2013-10-29 Amazon Technologies, Inc. System and method for partitioning and indexing table data using a composite primary key

Similar Documents

Publication Publication Date Title
CN105723679B (en) System and method for configuration node
CN105765575B (en) Data flow intake and persistence technology
CN105706086B (en) For obtaining, storing and consuming the management service of large-scale data stream
US10795905B2 (en) Data stream ingestion and persistence techniques
US10691716B2 (en) Dynamic partitioning techniques for data streams
AU2014346366B2 (en) Partition-based data stream processing framework
CN105706047B (en) Partition-based data stream processing framework

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant