US11416283B2 - Method and apparatus for processing data in process of expanding or reducing capacity of stream computing system - Google Patents
- Publication number: US11416283B2
- Authority: US (United States)
- Legal status (assumed by Google, not a legal conclusion): Active, expires
Classifications
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
- G06F9/4401—Bootstrapping
- G06F9/5027—Allocation of resources, e.g. of the CPU, to service a request, the resource being a machine, e.g. CPUs, servers, terminals
- G06F9/5061—Partitioning or combining of resources
- G06F9/5077—Logical partitioning of resources; management or configuration of virtualized resources
- G06F9/5088—Techniques for rebalancing the load in a distributed system involving task migration
- G06F9/544—Buffers; shared memory; pipes
- G06F2009/45579—I/O management, e.g. providing access to device drivers or storage
Definitions
- Embodiments of the present disclosure relate to the field of computer technology, and specifically to a method and apparatus for processing data.
- Stream computing is widely applied in large-scale distributed computing scenarios, such as information-flow processing, searching to construct a database, and charging for retrievals.
- Stream computing is a pipeline-like data processing mode.
- Stream computing comes from the concept that data is processed instantly as soon as an event occurs, instead of being buffered for batch processing.
- In a stream computing system, the traffic generally fluctuates noticeably and is greatly affected by unexpected events, seasonality and so on. In order to ensure service quality and make rational use of resources, it is required to expand capacity when traffic is at its peak and reduce capacity when traffic is at its trough.
- Embodiments of the present disclosure propose a method and apparatus for processing data.
- the embodiments of the present disclosure provide a method for processing data.
- the method includes: acquiring a to-be-adjusted number of target execution units, the target execution unit referring to a unit executing a target program segment in a stream computing system; adjusting a number of the target execution units in the stream computing system based on the to-be-adjusted number; determining, for a target execution unit in at least one target execution unit after the adjustment, an identifier set corresponding to the target execution unit, an identifier in the identifier set being used to indicate to-be-processed data; and processing, through the target execution unit, the to-be-processed data indicated by the identifier in the corresponding identifier set.
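The acquire-adjust-determine-process flow of the first aspect can be sketched in Python. This is a minimal illustration, not the patent's implementation: the function names, the hash choice, the fixed total of six identifier sets, and the contiguous allocation rule are all assumptions.

```python
import hashlib

TOTAL_SETS = 6  # the total number of identifier sets stays fixed (assumption)

def identifier_set_of(identifier: str) -> int:
    """Map a data identifier to one of the fixed identifier sets via hashing."""
    digest = hashlib.md5(identifier.encode("utf-8")).hexdigest()
    return int(digest, 16) % TOTAL_SETS

def assign_sets_to_units(num_units: int) -> dict:
    """Allocate the identifier sets to the target execution units in
    contiguous runs (assumes TOTAL_SETS divides evenly by num_units)."""
    per = TOTAL_SETS // num_units
    return {u: list(range(u * per, (u + 1) * per)) for u in range(num_units)}

def rescale(current_units: int, target_units: int) -> dict:
    """Adjust the number of target execution units based on the
    to-be-adjusted number, then recompute the set-to-unit mapping."""
    if current_units != target_units:
        current_units = target_units  # expand or reduce capacity
    return assign_sets_to_units(current_units)
```

For example, `rescale(2, 3)` expands the system from two units to three and redistributes the same six identifier sets among them.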
- the method before the processing, through the target execution unit, the to-be-processed data indicated by the identifier in the corresponding identifier set, the method further includes: persisting, according to an identifier set to which an identifier of to-be-processed data generated through running of an upstream execution unit of the target execution unit belongs, the generated to-be-processed data through the upstream execution unit of the target execution unit
- the method further includes: sending indication information to the upstream execution unit of the target execution unit through the target execution unit, the indication information being used to indicate the to-be-processed data generated through the running of the upstream execution unit of the target execution unit and processed by the target execution unit.
- the processing, through the target execution unit, the to-be-processed data indicated by the identifier in the corresponding identifier set includes: restarting the at least one target execution unit after the adjustment; and receiving and processing, through the restarted target execution unit, to-be-processed data not processed by the target execution unit, wherein the to-be-processed data is sent by the upstream execution unit of the target execution unit, is in the persisted to-be-processed data indicated by the identifier included in the identifier set corresponding to the target execution unit, and is determined according to the indication information
- the processing, through the target execution unit, the to-be-processed data indicated by the identifier in the corresponding identifier set includes: de-duplicating, according to a historical record of receiving the to-be-processed data by the target execution unit in the stream computing system, the to-be-processed data sent to the target execution unit by the upstream execution unit of the target execution unit; and processing, through the target execution unit, the de-duplicated to-be-processed data indicated by the identifier in the corresponding identifier set
- the embodiments of the present disclosure provide an apparatus for processing data.
- the apparatus includes: an acquiring unit, configured to acquire a to-be-adjusted number of target execution units, the target execution unit referring to a unit executing a target program segment in a stream computing system; an adjusting unit, configured to adjust a number of the target execution units in the stream computing system based on the to-be-adjusted number; and a processing unit, configured to determine, for a target execution unit in at least one target execution unit after the adjustment, an identifier set corresponding to the target execution unit, an identifier in the identifier set being used to indicate to-be-processed data; and process, through the target execution unit, the to-be-processed data indicated by the identifier in the corresponding identifier set.
- the processing unit includes: a persisting subunit, configured to persist, according to an identifier set to which an identifier of to-be-processed data generated through running of an upstream execution unit of the target execution unit belongs, the generated to-be-processed data through the upstream execution unit of the target execution unit.
- the processing unit includes: a sending subunit, configured to send indication information to the upstream execution unit of the target execution unit through the target execution unit, the indication information being used to indicate the to-be-processed data generated through the running of the upstream execution unit of the target execution unit and processed by the target execution unit.
- the processing unit includes: a starting subunit, configured to restart the at least one target execution unit after the adjustment; and a processing subunit, configured to receive and process, through the restarted target execution unit, to-be-processed data not processed by the target execution unit, wherein the to-be-processed data is sent by the upstream execution unit of the target execution unit, is in the persisted to-be-processed data indicated by the identifier included in the identifier set corresponding to the target execution unit, and is determined according to the indication information.
- the processing unit includes: a de-duplicating subunit, configured to de-duplicate, according to a historical record of receiving the to-be-processed data by the target execution unit in the stream computing system, the to-be-processed data sent to the target execution unit by the upstream execution unit of the target execution unit; and a processing subunit, configured to process, through the target execution unit, the de-duplicated to-be-processed data indicated by the identifier in the corresponding identifier set.
- a de-duplicating subunit configured to de-duplicate, according to a historical record of receiving the to-be-processed data by the target execution unit in the stream computing system, the to-be-processed data sent to the target execution unit by the upstream execution unit of the target execution unit
- a processing subunit configured to process, through the target execution unit, the de-duplicated to-be-processed data indicated by the identifier in the corresponding identifier set.
- the embodiments of the present disclosure provide a device.
- the device includes: one or more processors; and a storage device, configured to store one or more programs.
- the one or more programs when executed by the one or more processors, cause the one or more processors to implement the method described in the first aspect.
- the embodiments of the present disclosure provide a computer readable medium storing a computer program.
- the program when executed by a processor, implements the method described in the first aspect.
- the to-be-adjusted number of the target execution units is acquired, and then, the number of the target execution units in the stream computing system is adjusted based on the to-be-adjusted number. Finally, for the target execution unit in the at least one target execution unit after the adjustment, the identifier set corresponding to the target execution unit is determined, and the to-be-processed data indicated by the identifier in the corresponding identifier set is processed through the target execution unit.
- FIG. 1 is a diagram of an exemplary system architecture in which an embodiment of the present disclosure may be implemented
- FIG. 2 is a flowchart of an embodiment of a method for processing data according to the present disclosure
- FIG. 3 is a flowchart of another embodiment of the method for processing data according to the present disclosure.
- FIG. 4 is a schematic diagram of an application scenario of the method for processing data according to the present disclosure.
- FIG. 5 is a schematic structural diagram of an embodiment of an apparatus for processing data according to the present disclosure.
- FIG. 6 is a schematic structural diagram of a computer system adapted to implement a server according to embodiments of the present disclosure.
- FIG. 1 shows an exemplary system architecture 100 in which an embodiment of a method for processing data or an apparatus for processing data according to the present disclosure may be implemented.
- the system architecture 100 may include terminal devices 101, 102 and 103, a network 104 and a server 105.
- the network 104 serves as a medium providing a communication link between the terminal devices 101, 102 and 103 and the server 105.
- the network 104 may include various types of connections, for example, wired or wireless communication links, or optical fiber cables.
- a user may interact with the server 105 via the network 104 using the terminal devices 101, 102 and 103, to receive or send messages.
- Various applications (e.g., a social application, an image processing application, an e-commerce application and a search application) may be installed on the terminal devices 101, 102 and 103.
- the terminal devices 101, 102 and 103 may be hardware or software.
- when being hardware, the terminal devices 101, 102 and 103 may be various electronic devices having a display screen, including, but not limited to, a smart phone, a tablet computer, a laptop portable computer and a desktop computer.
- when being software, the terminal devices 101, 102 and 103 may be installed in the above-listed electronic devices.
- the terminal devices may be implemented as a plurality of pieces of software or a plurality of software modules, or as a single piece of software or a single software module, which will not be specifically defined here.
- the server 105 may be a server providing various services, for example, a backend server providing support for the applications installed on the terminal devices 101, 102 and 103.
- the server 105 may acquire a to-be-adjusted number of target execution units, the target execution unit referring to a unit executing a target program segment in a stream computing system; adjust a number of the target execution units in the stream computing system based on the to-be-adjusted number; determine, for a target execution unit in at least one target execution unit after the adjustment, an identifier set corresponding to the target execution unit, an identifier in the identifier set being used to indicate to-be-processed data; and process, through the target execution unit, the to-be-processed data indicated by the identifier in the corresponding identifier set.
- the method for processing data provided by the embodiments of the present disclosure may be generally performed by the server 105 .
- the apparatus for processing data may be provided in the server 105 .
- the server may be hardware or software.
- the server may be implemented as a distributed server cluster composed of a plurality of servers, or as a single server.
- the server may be implemented as a plurality of pieces of software or a plurality of software modules (e.g., software or software modules for providing a distributed service), or as a single piece of software or a single software module, which will not be specifically defined here.
- the numbers of the terminal devices, the networks, and the servers in FIG. 1 are merely illustrative. Any number of terminal devices, networks, and servers may be provided based on actual requirements.
- the method for processing data includes the following steps.
- Step 201 acquiring a to-be-adjusted number of target execution units.
- in this embodiment, an execution body of the method for processing data (e.g., the server shown in FIG. 1) may acquire the to-be-adjusted number of the target execution units.
- the target execution unit refers to a unit executing a target program segment in a stream computing system.
- the stream computing system may include a control node and a plurality of work nodes.
- a work node may also be referred to as an operator.
- the control node may send a corresponding control instruction to a subordinate work node, so that the work node invokes an execution unit to process a data stream generated by a service according to the control instruction.
- Each work node may include one or more execution units. When the work node is invoked to process the data stream, the data stream is processed by the execution units included in the work node, and the execution unit may be a thread or a process.
- the stream computing system may include several stream computing tasks (applications); each stream computing task is composed of some independent computational logics (processors) according to an upstream and downstream subscription relationship.
- the computational logics may be distributed on a plurality of servers in a multi-process mode.
- Data (tuples) flows between processes having the upstream and downstream subscription relationship through remote procedure calls (RPC); during data processing, to-be-processed data for a downstream execution unit is produced and a modification to an intermediate state is caused.
- the execution unit may be a thread or a process executing an independent computational logic, and the independent computational logic is embodied as a segment of a program.
- the target execution unit may be a unit in the stream computing system, the number of which needs to be modified. For example, the number of the target execution units may be increased when the load is too heavy.
- the to-be-adjusted number may be determined by the execution body according to a corresponding relationship between a pre-established load condition and the number of the target execution units, and the load condition may be reflected by traffic information or processing speed information.
- the to-be-adjusted number may also be determined according to a concurrency setting instruction after the concurrency setting instruction is acquired.
- the concurrency of a work node may represent the number of execution units included in the work node. For example, if the concurrency of a work node is 3, the work node may invoke 3 execution units to process the data stream.
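The relationship between a work node's concurrency and its execution units can be sketched as follows. This is a hedged illustration under the patent's note that an execution unit may be a thread; the `WorkNode` class and its method names are assumptions, not taken from the patent.

```python
from concurrent.futures import ThreadPoolExecutor

class WorkNode:
    """Illustrative work node: its concurrency is the number of execution
    units (modeled here as threads) it may invoke to process a data stream."""

    def __init__(self, concurrency: int):
        self.concurrency = concurrency
        self._pool = ThreadPoolExecutor(max_workers=concurrency)

    def process_stream(self, tuples, fn):
        # The execution units process the incoming tuples concurrently;
        # map() preserves the order of the input stream in its results.
        return list(self._pool.map(fn, tuples))

node = WorkNode(concurrency=3)  # the work node may invoke 3 execution units
results = node.process_stream([1, 2, 3, 4], lambda x: x * 2)
```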
- Step 202 adjusting a number of the target execution units in a stream computing system based on the to-be-adjusted number.
- the execution body may adjust the number of the target execution units in the stream computing system based on the to-be-adjusted number acquired in step 201. If the number of the target execution units actually running in the stream computing system is identical to the to-be-adjusted number, the number of the target execution units may not need to be adjusted. If the number of the target execution units actually running in the stream computing system is different from the to-be-adjusted number, the number of the target execution units in the stream computing system may be adjusted to the to-be-adjusted number.
- Step 203 determining, for a target execution unit in at least one target execution unit after the adjustment, an identifier set corresponding to the target execution unit.
- the execution body may first determine the identifier set corresponding to the target execution unit.
- An identifier in the identifier set is used to indicate to-be-processed data.
- the identifier may be generated according to a preset rule, for example, may be determined according to the generation order, the generation time, the storage location and the source of data.
- the identifier set includes the identifier for indicating the to-be-processed data of the execution unit.
- the identifier included in the identifier set may remain unchanged before and after the adjustment of the number of target execution units in the stream computing system.
- the identifier may be mapped to the identifier set using a hash algorithm, and the corresponding relationship between the identifier and the identifier set may also be pre-established by other means.
- a total number of identifier sets may also remain unchanged before and after the adjustment of the number of target execution units in the stream computing system. That is, the identifier sets corresponding to all of the target execution units in the stream computing system, taken together, remain unchanged.
- the execution body may adjust a mapping relationship between each target execution unit and each identifier set according to a preset rule.
- the specific rule may be set based on actual requirements. For example, in consideration of load balancing, the identifier sets may be allocated evenly to the target execution units.
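The key property described above, that the identifier-to-set mapping stays stable while the set-to-unit mapping is what changes during rescaling, can be sketched in Python. The hash choice (SHA-1) and the modulo allocation rule are illustrative assumptions.

```python
import hashlib

TOTAL_SETS = 6  # remains constant before and after the adjustment

def set_of(identifier: str) -> int:
    # Hash mapping from a data identifier to an identifier set.
    digest = hashlib.sha1(identifier.encode("utf-8")).hexdigest()
    return int(digest, 16) % TOTAL_SETS

def unit_of(set_id: int, num_units: int) -> int:
    # Even allocation of identifier sets to target execution units.
    return set_id % num_units

# The identifier-to-set mapping is unaffected by rescaling; only the
# set-to-unit mapping changes when execution units are added or removed.
set_id = set_of("tuple-7")
owner_before = unit_of(set_id, 2)  # owning unit with 2 execution units
owner_after = unit_of(set_id, 3)   # owning unit with 3 execution units
```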
- the method before the to-be-processed data indicated by the identifier in the corresponding identifier set is processed through the target execution unit, the method further includes: persisting, according to the identifier set to which the identifier of the to-be-processed data generated through running an upstream execution unit of the target execution unit belongs, the generated to-be-processed data through the upstream execution unit of the target execution unit.
- the upstream execution unit of the target execution unit may be a unit that provides the to-be-processed data to the target execution unit in the stream computing system.
- the persistence is a mechanism of converting program data between a persistent state and a transient state. That is, transient data (e.g., data in a memory, which cannot be permanently preserved) is persisted as persistent data, for example, persisted into the database, so that the data can be stored for a long time.
- the persistence may include full persistence and incremental persistence, and the incremental persistence may avoid the duplication of the data, to further improve the data processing efficiency.
- the to-be-processed data generated through the running of the upstream execution unit is persisted, which may avoid the loss of the data when the capacity of the stream computing system is expanded or reduced, thus further improving the data processing efficiency.
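Incremental persistence as described above can be sketched as follows: only tuples not already stored are written, so data is not duplicated. A dict stands in for durable storage, and all names are illustrative assumptions.

```python
class PersistentStore:
    """Minimal sketch of incremental persistence keyed by identifier set.
    A real system would write to durable storage such as a database."""

    def __init__(self):
        self._data = {}  # identifier set -> {identifier: tuple payload}

    def persist_incremental(self, set_id, batch):
        """Persist only the tuples in `batch` not already stored, avoiding
        duplication (incremental rather than full persistence)."""
        bucket = self._data.setdefault(set_id, {})
        new = {k: v for k, v in batch.items() if k not in bucket}
        bucket.update(new)
        return len(new)  # number of tuples actually written

store = PersistentStore()
first = store.persist_incremental(0, {"t1": "a", "t2": "b"})   # both written
second = store.persist_incremental(0, {"t2": "b", "t3": "c"})  # t2 skipped
```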
- Step 204 processing, through the target execution unit, to-be-processed data indicated by an identifier in the corresponding identifier set.
- the execution body may process, through the target execution unit, the to-be-processed data indicated by the identifier in the identifier set corresponding to the target execution unit and determined in step 203 .
- the method further includes: sending, through the target execution unit, indication information to the upstream execution unit of the target execution unit.
- the indication information is used to indicate the to-be-processed data generated through the running of the upstream execution unit of the target execution unit and processed by the target execution unit.
- the indication information may further be used to indicate the to-be-processed data generated through the running of the upstream execution unit of the target execution unit and successfully received by the target execution unit, for example, an acknowledgement (ACK).
- the acknowledgement may be a transmission control character sent to a sender by a receiver, representing that the receipt of the sent data is acknowledged without errors.
- the sending of the indication information may prevent the upstream execution unit from sending duplicated to-be-processed data to the target execution unit, thus further improving the data processing efficiency.
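The acknowledgement mechanism above can be sketched as follows: the upstream unit retains each generated tuple until the target unit acknowledges it, so acknowledged data is never resent. The class and method names are assumptions, not taken from the patent.

```python
class UpstreamUnit:
    """Illustrative upstream execution unit that keeps generated tuples
    pending until the downstream target unit acknowledges them."""

    def __init__(self):
        self.pending = {}  # identifier -> payload, awaiting an ACK

    def emit(self, identifier, payload):
        self.pending[identifier] = payload  # retain until acknowledged
        return identifier, payload

    def on_ack(self, identifier):
        # Indication information (e.g., an ACK) from the target unit means
        # the tuple was processed and need not be sent again.
        self.pending.pop(identifier, None)

    def resend_candidates(self):
        return dict(self.pending)  # unacknowledged tuples, e.g. after a restart

up = UpstreamUnit()
up.emit("t1", "a")
up.emit("t2", "b")
up.on_ack("t1")  # t1 was processed by the target unit
```

After the ACK for `t1`, only `t2` remains a candidate for resending, which prevents the upstream unit from sending duplicated to-be-processed data.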
- the processing, through the target execution unit, to-be-processed data indicated by an identifier in the corresponding identifier set includes: restarting the at least one target execution unit after the adjustment; and receiving and processing, through the restarted target execution unit, to-be-processed data not processed by the target execution unit, wherein the to-be-processed data is sent by the upstream execution unit of the target execution unit, is in the persisted to-be-processed data indicated by the identifier included in the identifier set corresponding to the target execution unit, and is determined according to the indication information.
- the target execution unit receives and processes the to-be-processed data not processed by the target execution unit, and the to-be-processed data is in the persisted to-be-processed data indicated by the identifier included in the identifier set corresponding to the target execution unit, and is determined according to the indication information. Accordingly, it is ensured that the to-be-processed data is neither duplicated nor lost when the capacity of the stream computing system is expanded or reduced, which further improves the data processing efficiency.
- the to-be-adjusted number of the target execution units is acquired, the target execution unit referring to the unit executing the target program segment in the stream computing system.
- the number of the target execution units in the stream computing system is adjusted based on the to-be-adjusted number.
- the identifier set corresponding to the target execution unit is determined, the identifier in the identifier set being used to indicate the to-be-processed data.
- the to-be-processed data indicated by the identifier in the corresponding identifier set is processed through the target execution unit. Accordingly, an identifier-based data processing mechanism is provided.
- An identifier set is a logical concept with little physical cost, and the setting of the identifier set is very flexible. Thus, the flexibility of the stream computing system is improved when the capacity of the system is expanded or reduced.
- FIG. 3 illustrates a flow 300 of another embodiment of the method for processing data.
- the flow 300 of the method for processing data includes the following steps.
- Step 301 acquiring a to-be-adjusted number of target execution units.
- in this embodiment, an execution body of the method for processing data (e.g., the server shown in FIG. 1) may first acquire the to-be-adjusted number of the target execution units.
- Step 302 adjusting a number of the target execution units in a stream computing system based on the to-be-adjusted number.
- the execution body may adjust the number of the target execution units in the stream computing system based on the to-be-adjusted number acquired in step 301 .
- Step 303 determining, for a target execution unit in at least one target execution unit after the adjustment, an identifier set corresponding to the target execution unit.
- the execution body may first determine the identifier set corresponding to the target execution unit.
- An identifier in the identifier set is used to indicate to-be-processed data.
- Step 304 de-duplicating, according to a historical record of receiving the to-be-processed data by the target execution unit in the stream computing system, to-be-processed data sent to the target execution unit by an upstream execution unit of the target execution unit.
- the execution body may de-duplicate the to-be-processed data sent to the target execution unit by the upstream execution unit of the target execution unit.
- specifically, the execution body may remove, from the to-be-processed data sent to the target execution unit by the upstream execution unit of the target execution unit, any data duplicating the received to-be-processed data recorded in the historical record.
- the historical record may include information such as the identifier of the received to-be-processed data.
- Step 305 processing, through the target execution unit, the de-duplicated to-be-processed data indicated by the identifier in the corresponding identifier set.
- the execution body may process the to-be-processed data indicated by the identifier in the corresponding identifier set after the de-duplication in step 304; thus, the target execution unit processes only the data that it has not already processed.
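Steps 304 and 305 can be sketched as a filter over the historical record of received identifiers. This is a minimal illustration; the function and variable names are assumptions.

```python
def deduplicate(incoming, history):
    """Drop tuples whose identifiers already appear in the target unit's
    historical record of received data, and record the rest."""
    fresh = [(ident, payload) for ident, payload in incoming
             if ident not in history]
    history.update(ident for ident, _ in fresh)
    return fresh

history = {"t1"}                       # t1 was received before the restart
incoming = [("t1", "a"), ("t2", "b")]  # the upstream unit resends t1
fresh = deduplicate(incoming, history)
```

Only `t2` survives the filter, so the target execution unit does not process `t1` a second time.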
- steps 301 , 302 and 303 are substantially the same as the operations in steps 201 , 202 and 203 , which will not be repeatedly described here.
- the execution body filters out the data that is in the to-be-processed data sent to the target execution unit by the upstream execution unit of the target execution unit and that duplicates the received data recorded in the historical record.
- thus, repeated processing of the data may be avoided, further improving the information processing efficiency.
- FIG. 4 is a schematic diagram of an application scenario of the method for processing data according to this embodiment.
- the stream computing system includes a target execution unit 402 , an upstream execution unit 401 of the target execution unit 402 , and a downstream execution unit 403 of the target execution unit 402 .
- the number of target execution units 402 in the stream computing system is 2 before being adjusted, and the to-be-adjusted number of the target execution units 402 is acquired as 3. Based on the to-be-adjusted number, the number of the target execution units 402 is adjusted to 3.
- before the adjustment, the identifier sets corresponding to the unit having the identifier of 0 are kg0, kg1 and kg2,
- and the identifier sets corresponding to the unit having the identifier of 1 are kg3, kg4 and kg5.
- after the adjustment, the identifier sets corresponding to the unit having the identifier of 0 are kg0 and kg1,
- the identifier sets corresponding to the unit having the identifier of 1 are kg2 and kg3,
- and the identifier sets corresponding to the unit having the identifier of 2 are kg4 and kg5.
- the to-be-processed data indicated by an identifier in the identifier sets kg0 and kg1 may be processed through the unit having the identifier of 0.
- the to-be-processed data indicated by an identifier in the identifier sets kg2 and kg3 may be processed through the unit having the identifier of 1.
- the to-be-processed data indicated by an identifier in the identifier sets kg4 and kg5 may be processed through the unit having the identifier of 2.
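The redistribution in the scenario above (six identifier sets kg0 through kg5, scaled from 2 to 3 units) can be reproduced with simple contiguous-range partitioning. The function below is a sketch consistent with the example, not taken from the patent; the contiguous-range assignment rule is an assumption.

```python
def assign_identifier_sets(num_sets, num_units):
    """Split identifier sets kg0..kg{num_sets-1} into contiguous ranges,
    one range per execution unit (earlier units get any larger ranges)."""
    base, extra = divmod(num_sets, num_units)
    mapping, start = {}, 0
    for unit_id in range(num_units):
        size = base + (1 if unit_id < extra else 0)
        mapping[unit_id] = [f"kg{i}" for i in range(start, start + size)]
        start += size
    return mapping

print(assign_identifier_sets(6, 2))
# {0: ['kg0', 'kg1', 'kg2'], 1: ['kg3', 'kg4', 'kg5']}
print(assign_identifier_sets(6, 3))
# {0: ['kg0', 'kg1'], 1: ['kg2', 'kg3'], 2: ['kg4', 'kg5']}
```

Because the identifier-set-to-data mapping is fixed, rescaling only moves whole identifier sets between units; no individual record needs to be re-keyed.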
- the data processed by the target execution units 402 may further be persisted into a storage device 404 .
- the present disclosure provides an embodiment of an apparatus for processing data.
- the embodiment of the apparatus corresponds to the embodiment of the method shown in FIG. 2 , and the apparatus may be applied in various electronic devices.
- the apparatus 500 for processing data in this embodiment includes: an acquiring unit 510 , an adjusting unit 520 and a processing unit 530 .
- the acquiring unit 510 is configured to acquire a to-be-adjusted number of target execution units, the target execution unit referring to a unit executing a target program segment in a stream computing system.
- the adjusting unit 520 is configured to adjust a number of the target execution units in the stream computing system based on the to-be-adjusted number.
- the processing unit 530 is configured to determine, for a target execution unit in at least one target execution unit after the adjustment, an identifier set corresponding to the target execution unit, an identifier in the identifier set being used to indicate to-be-processed data; and process, through the target execution unit, the to-be-processed data indicated by the identifier in the corresponding identifier set.
- for the specific processing of the acquiring unit 510, the adjusting unit 520 and the processing unit 530 in the apparatus 500 for processing data, and the technical effects thereof, reference may be made to the relevant descriptions of step 201, step 202 and step 203 in the corresponding embodiment of FIG. 2, respectively.
- the processing unit 530 includes a persisting subunit 531 .
- the persisting subunit is configured to persist, according to an identifier set to which an identifier of to-be-processed data generated through running of an upstream execution unit of the target execution unit belongs, the generated to-be-processed data through the upstream execution unit of the target execution unit.
- the processing unit 530 includes a sending subunit 532 .
- the sending subunit is configured to send indication information to the upstream execution unit of the target execution unit through the target execution unit.
- the indication information is used to indicate the to-be-processed data generated through the running of the upstream execution unit of the target execution unit and processed by the target execution unit.
- the processing unit 530 includes: a starting subunit 533 , configured to restart the at least one target execution unit after the adjustment; and a processing subunit 535 , configured to receive and process, through the restarted target execution unit, to-be-processed data not processed by the target execution unit, where the to-be-processed data is sent by the upstream execution unit of the target execution unit, is in the persisted to-be-processed data indicated by the identifier included in the identifier set corresponding to the target execution unit, and is determined according to the indication information.
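A toy model of this restart-and-resume behavior is sketched below: the upstream unit persists generated data per identifier set, records the target's indication information, and after a restart resends only the persisted data the target has not indicated as processed. Representing the indication information as a set of acknowledged identifiers is an assumption for illustration.

```python
class UpstreamUnit:
    """Persists generated data per identifier set and tracks which data the
    target unit has indicated as already processed (illustrative sketch)."""
    def __init__(self):
        self.persisted = {}   # identifier set -> list of (identifier, payload)
        self.acked = set()    # identifiers the target reported as processed

    def generate(self, kg, ident, payload):
        self.persisted.setdefault(kg, []).append((ident, payload))

    def on_indication(self, ident):
        self.acked.add(ident)   # indication information from the target unit

    def resend_after_restart(self, kgs_of_target):
        """After the target restarts, resend only persisted data in its
        identifier sets that was not indicated as processed."""
        return [(i, p)
                for kg in kgs_of_target
                for (i, p) in self.persisted.get(kg, [])
                if i not in self.acked]

up = UpstreamUnit()
up.generate("kg0", 1, "a"); up.generate("kg0", 2, "b"); up.generate("kg1", 3, "c")
up.on_indication(1)                             # target processed identifier 1
print(up.resend_after_restart(["kg0", "kg1"]))  # [(2, 'b'), (3, 'c')]
```

This is why the target unit receives no data it has already handled: the combination of persistence and indication information narrows the resend to unprocessed data only.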
- the processing unit 530 includes: a de-duplicating subunit 534 , configured to de-duplicate, according to a historical record of receiving the to-be-processed data by the target execution unit in the stream computing system, the to-be-processed data sent to the target execution unit by the upstream execution unit of the target execution unit; and a processing subunit 535 , configured to process, through the target execution unit, the de-duplicated to-be-processed data indicated by the identifier in the corresponding identifier set.
- the to-be-adjusted number of the target execution units is acquired, the target execution unit referring to the unit executing the target program segment in the stream computing system.
- the number of the target execution units in the stream computing system is adjusted based on the to-be-adjusted number.
- the identifier set corresponding to the target execution unit is determined, the identifier in the identifier set being used to indicate the to-be-processed data.
- the to-be-processed data indicated by the identifier in the corresponding identifier set is processed through the target execution unit. Accordingly, an identifier-based data processing mechanism is provided, thereby improving data processing efficiency.
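The acquire/adjust/process flow summarized above might be sketched end to end as follows. The function name, the CRC32-based mapping of a record key to an identifier set, and the contiguous-range ownership rule are all assumptions for illustration, not details from the patent.

```python
import zlib

def adjust_and_route(records, num_sets, new_unit_count):
    """Adjust the number of target execution units to new_unit_count, then
    route each record to the unit owning its identifier set (sketch)."""
    units = {uid: [] for uid in range(new_unit_count)}   # adjusted units
    sets_per_unit = -(-num_sets // new_unit_count)       # ceil division
    for key, payload in records:
        # Determine the identifier set of the record (stable hash partition),
        # then map that set to its owning unit by contiguous ranges.
        kg = zlib.crc32(key.encode()) % num_sets
        owner = min(kg // sets_per_unit, new_unit_count - 1)
        units[owner].append(payload)   # the unit processes its own data
    return units

routed = adjust_and_route([("a", 1), ("b", 2), ("c", 3)],
                          num_sets=6, new_unit_count=3)
print(sum(len(v) for v in routed.values()))  # 3: each record routed exactly once
```

The key point is that routing depends only on the identifier set, so changing the unit count changes which unit owns a set without changing which set a record belongs to.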
- FIG. 6 is a schematic structural diagram of a computer system 600 adapted to implement a server of the embodiments of the present disclosure.
- the server shown in FIG. 6 is merely an example, and should not bring any limitations to the functions and the scope of use of the embodiments of the present disclosure.
- the computer system 600 includes a central processing unit (CPU) 601 , which may execute various appropriate actions and processes in accordance with a program stored in a read-only memory (ROM) 602 or a program loaded into a random access memory (RAM) 603 from a storage portion 608 .
- the RAM 603 also stores various programs and data required by operations of the system 600 .
- the CPU 601 , the ROM 602 and the RAM 603 are connected to each other through a bus 604 .
- An input/output (I/O) interface 605 is also connected to the bus 604 .
- the following components are connected to the I/O interface 605 : an input portion 606 including a keyboard, a mouse etc.; an output portion 607 including a cathode ray tube (CRT), a liquid crystal display device (LCD), a speaker etc.; a storage portion 608 including a hard disk and the like; and a communication portion 609 including a network interface card such as a LAN card and a modem.
- the communication portion 609 performs communication processes via a network such as the Internet.
- a driver 610 is also connected to the I/O interface 605 as required.
- a removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, and a semiconductor memory may be installed on the driver 610 , to facilitate the retrieval of a computer program from the removable medium 611 , and the installation thereof on the storage portion 608 as needed.
- an embodiment of the present disclosure includes a computer program product, including a computer program hosted on a computer readable medium, the computer program including program codes for performing the method as illustrated in the flowchart.
- the computer program may be downloaded and installed from a network via the communication portion 609 , and/or may be installed from the removable medium 611 .
- the computer program when executed by the central processing unit (CPU) 601 , implements the above mentioned functionalities defined in the method of the present disclosure.
- the computer readable medium in the present disclosure may be a computer readable signal medium, a computer readable storage medium, or any combination of the two.
- the computer readable storage medium may be, but not limited to: an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or element, or any combination of the above.
- a more specific example of the computer readable storage medium may include, but not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read only memory (ROM), an erasable programmable read only memory (EPROM or flash memory), an optical fibre, a portable compact disk read only memory (CD-ROM), an optical memory, a magnetic memory, or any suitable combination of the above.
- the computer readable medium may be any physical medium containing or storing programs, which may be used by a command execution system, apparatus or element or incorporated thereto.
- the computer readable signal medium may include a data signal that is propagated in a baseband or as a part of a carrier wave, which carries computer readable program codes. Such a propagated data signal may take various forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the above.
- the computer readable signal medium may also be any computer readable medium other than the computer readable storage medium.
- the computer readable medium is capable of transmitting, propagating or transferring programs for use by, or used in combination with, a command execution system, apparatus or element.
- the program codes contained on the computer readable medium may be transmitted with any suitable medium including, but not limited to, wireless, wired, optical cable, RF medium, or any suitable combination of the above.
- a computer program code for executing the operations according to the present disclosure may be written in one or more programming languages or a combination thereof.
- the programming language includes an object-oriented programming language such as Java, Smalltalk and C++, and further includes a general procedural programming language such as “C” language or a similar programming language.
- the program codes may be executed entirely on a user computer, executed partially on the user computer, executed as a standalone package, executed partially on the user computer and partially on a remote computer, or executed entirely on the remote computer or a server.
- the remote computer may be connected to the user computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., through the Internet provided by an Internet service provider).
- each of the blocks in the flowcharts or block diagrams may represent a module, a program segment, or a code portion, the module, the program segment, or the code portion comprising one or more executable instructions for implementing specified logic functions.
- the functions denoted by the blocks may occur in a sequence different from the sequences shown in the figures. For example, any two blocks presented in succession may be executed substantially in parallel, or they may sometimes be executed in a reverse sequence, depending on the function involved.
- each block in the block diagrams and/or flowcharts as well as a combination of blocks may be implemented using a dedicated hardware-based system executing specified functions or operations, or by a combination of dedicated hardware and computer instructions.
- the units involved in the embodiments of the present disclosure may be implemented by means of software or hardware.
- the described units may also be provided in a processor.
- the processor may be described as: a processor comprising an acquiring unit, an adjusting unit and a processing unit.
- the names of these units do not in some cases constitute a limitation to such units themselves.
- the acquiring unit may also be described as “a unit for acquiring a number of targets.”
- the present disclosure further provides a computer readable medium.
- the computer readable medium may be the computer readable medium included in the apparatus described in the above embodiments, or a stand-alone computer readable medium not assembled into the apparatus.
- the computer readable medium carries one or more programs.
- the one or more programs when executed by the apparatus, cause the apparatus to: acquire a to-be-adjusted number of target execution units, the target execution unit referring to a unit executing a target program segment in a stream computing system; adjust a number of the target execution units in the stream computing system based on the to-be-adjusted number; determine, for a target execution unit in at least one target execution unit after the adjustment, an identifier set corresponding to the target execution unit, an identifier in the identifier set being used to indicate to-be-processed data; and process, through the target execution unit, the to-be-processed data indicated by the identifier in the corresponding identifier set.
Claims (9)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810812280.4A CN108984770A (en) | 2018-07-23 | 2018-07-23 | Method and apparatus for handling data |
CN201810812280.4 | 2018-07-23 |
Publications (2)
Publication Number | Publication Date |
---|---|
US20200026553A1 (en) | 2020-01-23 |
US11416283B2 true US11416283B2 (en) | 2022-08-16 |
Family
ID=64550176
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/503,145 Active 2040-01-06 US11416283B2 (en) | 2018-07-23 | 2019-07-03 | Method and apparatus for processing data in process of expanding or reducing capacity of stream computing system |
Country Status (2)
Country | Link |
---|---|
US (1) | US11416283B2 (en) |
CN (1) | CN108984770A (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111459667B (en) * | 2020-03-27 | 2024-01-05 | 深圳市梦网科技发展有限公司 | Data processing method, device, server and medium |
CN113764110A (en) * | 2021-01-29 | 2021-12-07 | 北京京东拓先科技有限公司 | Data processing method and device, electronic equipment and storage medium |
Citations (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101217491A (en) | 2008-01-04 | 2008-07-09 | 杭州华三通信技术有限公司 | A method of rectification processing unit load allocation method and device |
US20100070828A1 (en) * | 2006-11-02 | 2010-03-18 | Panasonic Corporation | Transmission method, transmitter apparatus and reception method |
US20120265890A1 (en) * | 2011-04-15 | 2012-10-18 | International Business Machines Corporation | Data streaming infrastructure for remote execution in a constrained environment |
US20130013901A1 (en) * | 1995-08-16 | 2013-01-10 | Microunity Systems Engineering, Inc. | System and apparatus for group floating-point inflate and deflate operations |
US20130159255A1 (en) * | 2011-12-20 | 2013-06-20 | Hitachi Computer Peripherals, Co., Ltd. | Storage system and method for controlling storage system |
CN103530189A (en) | 2013-09-29 | 2014-01-22 | 中国科学院信息工程研究所 | Automatic scaling and migrating method and device oriented to stream data |
CN103782270A (en) | 2013-10-28 | 2014-05-07 | 华为技术有限公司 | Method for managing stream processing system, and related apparatus and system |
CN104252466A (en) | 2013-06-26 | 2014-12-31 | 阿里巴巴集团控股有限公司 | Stream computing processing method, equipment and system |
CN104298556A (en) | 2013-07-17 | 2015-01-21 | 华为技术有限公司 | Allocation method and device for steam processing units |
CN104424186A (en) | 2013-08-19 | 2015-03-18 | 阿里巴巴集团控股有限公司 | Method and device for realizing persistence in flow calculation application |
US20150128150A1 (en) * | 2012-08-02 | 2015-05-07 | Fujitsu Limited | Data processing method and information processing apparatus |
CN104978232A (en) | 2014-04-09 | 2015-10-14 | 阿里巴巴集团控股有限公司 | Computation resource capacity expansion method for real-time stream-oriented computation, computation resource release method for real-time stream-oriented computation, computation resource capacity expansion device for real-time stream-oriented computation and computation resource release device for real-time stream-oriented computation |
US20160078093A1 (en) * | 2005-05-25 | 2016-03-17 | Experian Marketing Solutions, Inc. | Software and Metadata Structures for Distributed And Interactive Database Architecture For Parallel And Asynchronous Data Processing Of Complex Data And For Real-Time Query Processing |
US20160171009A1 (en) * | 2014-12-10 | 2016-06-16 | International Business Machines Corporation | Method and apparatus for data deduplication |
US20160248688A1 (en) | 2015-02-19 | 2016-08-25 | International Business Machines Corporation | Algorithmic changing in a streaming environment |
US20160373494A1 (en) * | 2014-03-06 | 2016-12-22 | Huawei Technologies Co., Ltd. | Data Processing Method in Stream Computing System, Control Node, and Stream Computing System |
US20170091011A1 (en) * | 2015-09-30 | 2017-03-30 | Robert Bosch Gmbh | Method and device for generating an output data stream |
CN106874133A (en) | 2017-01-17 | 2017-06-20 | 北京百度网讯科技有限公司 | The troubleshooting of calculate node in streaming computing system |
US20170201434A1 (en) * | 2014-05-30 | 2017-07-13 | Hewlett Packard Enterprise Development Lp | Resource usage data collection within a distributed processing framework |
US20170212894A1 (en) * | 2014-08-01 | 2017-07-27 | Hohai University | Traffic data stream aggregate query method and system |
US20180083839A1 (en) * | 2016-09-22 | 2018-03-22 | International Business Machines Corporation | Operator fusion management in a stream computing environment |
CN108073445A (en) | 2016-11-18 | 2018-05-25 | 腾讯科技(深圳)有限公司 | The back pressure processing method and system calculated based on distributed stream |
US10178021B1 (en) * | 2015-12-28 | 2019-01-08 | Amazon Technologies, Inc. | Clustered architecture design |
US11089076B1 (en) * | 2018-03-06 | 2021-08-10 | Amazon Technologies, Inc. | Automated detection of capacity for video streaming origin server |
- 2018-07-23: CN CN201810812280.4A patent/CN108984770A/en active Pending
- 2019-07-03: US US16/503,145 patent/US11416283B2/en active Active
Patent Citations (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130013901A1 (en) * | 1995-08-16 | 2013-01-10 | Microunity Systems Engineering, Inc. | System and apparatus for group floating-point inflate and deflate operations |
US20160078093A1 (en) * | 2005-05-25 | 2016-03-17 | Experian Marketing Solutions, Inc. | Software and Metadata Structures for Distributed And Interactive Database Architecture For Parallel And Asynchronous Data Processing Of Complex Data And For Real-Time Query Processing |
US20100070828A1 (en) * | 2006-11-02 | 2010-03-18 | Panasonic Corporation | Transmission method, transmitter apparatus and reception method |
CN101217491A (en) | 2008-01-04 | 2008-07-09 | 杭州华三通信技术有限公司 | A method of rectification processing unit load allocation method and device |
US20120265890A1 (en) * | 2011-04-15 | 2012-10-18 | International Business Machines Corporation | Data streaming infrastructure for remote execution in a constrained environment |
US20130159255A1 (en) * | 2011-12-20 | 2013-06-20 | Hitachi Computer Peripherals, Co., Ltd. | Storage system and method for controlling storage system |
US20150128150A1 (en) * | 2012-08-02 | 2015-05-07 | Fujitsu Limited | Data processing method and information processing apparatus |
CN104252466A (en) | 2013-06-26 | 2014-12-31 | 阿里巴巴集团控股有限公司 | Stream computing processing method, equipment and system |
US20150026347A1 (en) * | 2013-07-17 | 2015-01-22 | Huawei Technologies Co., Ltd. | Method and apparatus for allocating stream processing unit |
CN104298556A (en) | 2013-07-17 | 2015-01-21 | 华为技术有限公司 | Allocation method and device for steam processing units |
CN104424186A (en) | 2013-08-19 | 2015-03-18 | 阿里巴巴集团控股有限公司 | Method and device for realizing persistence in flow calculation application |
CN103530189A (en) | 2013-09-29 | 2014-01-22 | 中国科学院信息工程研究所 | Automatic scaling and migrating method and device oriented to stream data |
CN103782270A (en) | 2013-10-28 | 2014-05-07 | 华为技术有限公司 | Method for managing stream processing system, and related apparatus and system |
US20160373494A1 (en) * | 2014-03-06 | 2016-12-22 | Huawei Technologies Co., Ltd. | Data Processing Method in Stream Computing System, Control Node, and Stream Computing System |
US10097595B2 (en) * | 2014-03-06 | 2018-10-09 | Huawei Technologies Co., Ltd. | Data processing method in stream computing system, control node, and stream computing system |
CN104978232A (en) | 2014-04-09 | 2015-10-14 | 阿里巴巴集团控股有限公司 | Computation resource capacity expansion method for real-time stream-oriented computation, computation resource release method for real-time stream-oriented computation, computation resource capacity expansion device for real-time stream-oriented computation and computation resource release device for real-time stream-oriented computation |
US20150295970A1 (en) * | 2014-04-09 | 2015-10-15 | Alibaba Group Holding Limited | Method and device for augmenting and releasing capacity of computing resources in real-time stream computing system |
US20170201434A1 (en) * | 2014-05-30 | 2017-07-13 | Hewlett Packard Enterprise Development Lp | Resource usage data collection within a distributed processing framework |
US20170212894A1 (en) * | 2014-08-01 | 2017-07-27 | Hohai University | Traffic data stream aggregate query method and system |
US20160171009A1 (en) * | 2014-12-10 | 2016-06-16 | International Business Machines Corporation | Method and apparatus for data deduplication |
US20160248688A1 (en) | 2015-02-19 | 2016-08-25 | International Business Machines Corporation | Algorithmic changing in a streaming environment |
US20170091011A1 (en) * | 2015-09-30 | 2017-03-30 | Robert Bosch Gmbh | Method and device for generating an output data stream |
US10178021B1 (en) * | 2015-12-28 | 2019-01-08 | Amazon Technologies, Inc. | Clustered architecture design |
US20180083839A1 (en) * | 2016-09-22 | 2018-03-22 | International Business Machines Corporation | Operator fusion management in a stream computing environment |
CN108073445A (en) | 2016-11-18 | 2018-05-25 | 腾讯科技(深圳)有限公司 | The back pressure processing method and system calculated based on distributed stream |
CN106874133A (en) | 2017-01-17 | 2017-06-20 | 北京百度网讯科技有限公司 | The troubleshooting of calculate node in streaming computing system |
US11089076B1 (en) * | 2018-03-06 | 2021-08-10 | Amazon Technologies, Inc. | Automated detection of capacity for video streaming origin server |
Non-Patent Citations (1)
Title |
---|
Chinese Office Action for Chinese Application No. 201810812280.4, dated May 24, 2021, 10 pages. |
Also Published As
Publication number | Publication date |
---|---|
CN108984770A (en) | 2018-12-11 |
US20200026553A1 (en) | 2020-01-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20220253458A1 (en) | Method and device for synchronizing node data | |
CN109145023B (en) | Method and apparatus for processing data | |
US10122598B2 (en) | Subscription watch lists for event handling | |
CN111277639B (en) | Method and device for maintaining data consistency | |
US11416283B2 (en) | Method and apparatus for processing data in process of expanding or reducing capacity of stream computing system | |
CN113485962B (en) | Log file storage method, device, equipment and storage medium | |
US10601915B2 (en) | Data stream processor with both in memory and persisted messaging | |
EP3825865A2 (en) | Method and apparatus for processing data | |
CN111338834B (en) | Data storage method and device | |
CN113779452B (en) | Data processing method, device, equipment and storage medium | |
CN113282589A (en) | Data acquisition method and device | |
US11048555B2 (en) | Method, apparatus, and computer program product for optimizing execution of commands in a distributed system | |
US20240069991A1 (en) | Abnormal request processing method and apparatus, electronic device and storage medium | |
US11277300B2 (en) | Method and apparatus for outputting information | |
CN113760487B (en) | Service processing method and device | |
CN114115941A (en) | Resource sending method, page rendering method, device, electronic equipment and medium | |
CN114785770A (en) | Mirror layer file sending method and device, electronic equipment and computer readable medium | |
CN114051024A (en) | File background continuous transmission method and device, storage medium and electronic equipment | |
CN113760929A (en) | Data synchronization method and device, electronic equipment and computer readable medium | |
CN110019671B (en) | Method and system for processing real-time message | |
CN115250276A (en) | Distributed system and data processing method and device | |
CN112784139A (en) | Query method, query device, electronic equipment and computer readable medium | |
CN113761548B (en) | Data transmission method and device for Shuffle process | |
CN114268558B (en) | Method, device, equipment and medium for generating monitoring graph | |
CN116431523B (en) | Test data management method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | AS | Assignment | Owner name: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD., CHINA; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: GAO, WEIKANG; WANG, YANLIN; XING, YUE; AND OTHERS; REEL/FRAME: 049678/0817; Effective date: 20180827 |
 | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: ADVISORY ACTION MAILED |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
 | STCF | Information on status: patent grant | Free format text: PATENTED CASE |