US11416283B2 - Method and apparatus for processing data in process of expanding or reducing capacity of stream computing system - Google Patents

Method and apparatus for processing data in process of expanding or reducing capacity of stream computing system

Info

Publication number
US11416283B2
Authority
US
United States
Prior art keywords
execution unit
target execution
identifier
processed data
target
Prior art date
Legal status
Active, expires
Application number
US16/503,145
Other versions
US20200026553A1 (en)
Inventor
Weikang Gao
Yanlin Wang
Yue Xing
Jianwei Zhang
Yi Cheng
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Assigned to BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD. Assignors: CHENG, YI; GAO, WEIKANG; WANG, YANLIN; XING, YUE; ZHANG, JIANWEI
Publication of US20200026553A1
Application granted
Publication of US11416283B2

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 Arrangements for executing specific programs
    • G06F 9/4401 Bootstrapping
    • G06F 9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F 9/45533 Hypervisors; Virtual machine monitors
    • G06F 9/45558 Hypervisor-specific management and integration aspects
    • G06F 2009/45579 I/O management, e.g. providing access to device drivers or storage
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806 Task transfer initiation or dispatching
    • G06F 9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources, e.g. of the central processing unit [CPU], to service a request
    • G06F 9/5027 Allocation of resources to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F 9/5061 Partitioning or combining of resources
    • G06F 9/5077 Logical partitioning of resources; Management or configuration of virtualized resources
    • G06F 9/5083 Techniques for rebalancing the load in a distributed system
    • G06F 9/5088 Techniques for rebalancing the load in a distributed system involving task migration
    • G06F 9/54 Interprogram communication
    • G06F 9/544 Buffers; Shared memory; Pipes

Definitions

  • Embodiments of the present disclosure relate to the field of computer technology, and specifically to a method and apparatus for processing data.
  • Stream computing is widely applied in large-scale distributed computing scenarios such as a scenario of information flow, a scenario of searching to construct a database, and a scenario of charging for retrievals.
  • Stream computing is a pipeline-like data processing mode.
  • Stream computing comes from the concept that data is processed immediately as each event occurs, instead of being buffered for batch processing.
  • In a stream computing system, the traffic generally has obvious fluctuations, being greatly affected by unexpected events, seasonality and so on. In order to ensure the service quality and make rational use of resources, it is required to expand the capacity when the traffic is at its peak and reduce the capacity when the traffic is at its trough.
  • Embodiments of the present disclosure propose a method and apparatus for processing data.
  • the embodiments of the present disclosure provide a method for processing data.
  • the method includes: acquiring a to-be-adjusted number of target execution units, the target execution unit referring to a unit executing a target program segment in a stream computing system; adjusting a number of the target execution units in the stream computing system based on the to-be-adjusted number; determining, for a target execution unit in at least one target execution unit after the adjustment, an identifier set corresponding to the target execution unit, an identifier in the identifier set being used to indicate to-be-processed data; and processing, through the target execution unit, the to-be-processed data indicated by the identifier in the corresponding identifier set.
  • the method before the processing, through the target execution unit, the to-be-processed data indicated by the identifier in the corresponding identifier set, the method further includes: persisting, according to an identifier set to which an identifier of to-be-processed data generated through running of an upstream execution unit of the target execution unit belongs, the generated to-be-processed data through the upstream execution unit of the target execution unit
  • the method further includes: sending indication information to the upstream execution unit of the target execution unit through the target execution unit, the indication information being used to indicate the to-be-processed data generated through the running of the upstream execution unit of the target execution unit and processed by the target execution unit.
  • the processing, through the target execution unit, the to-be-processed data indicated by the identifier in the corresponding identifier set includes: restarting the at least one target execution unit after the adjustment; and receiving and processing, through the restarted target execution unit, to-be-processed data not processed by the target execution unit, wherein the to-be-processed data is sent by the upstream execution unit of the target execution unit, is in the persisted to-be-processed data indicated by the identifier included in the identifier set corresponding to the target execution unit, and is determined according to the indication information
  • the processing, through the target execution unit, the to-be-processed data indicated by the identifier in the corresponding identifier set includes: de-duplicating, according to a historical record of receiving the to-be-processed data by the target execution unit in the stream computing system, the to-be-processed data sent to the target execution unit by the upstream execution unit of the target execution unit; and processing, through the target execution unit, the de-duplicated to-be-processed data indicated by the identifier in the corresponding identifier set
  • the embodiments of the present disclosure provide an apparatus for processing data.
  • the apparatus includes: an acquiring unit, configured to acquire a to-be-adjusted number of target execution units, the target execution unit referring to a unit executing a target program segment in a stream computing system; an adjusting unit, configured to adjust a number of the target execution units in the stream computing system based on the to-be-adjusted number; and a processing unit, configured to determine, for a target execution unit in at least one target execution unit after the adjustment, an identifier set corresponding to the target execution unit, an identifier in the identifier set being used to indicate to-be-processed data; and process, through the target execution unit, the to-be-processed data indicated by the identifier in the corresponding identifier set.
  • the processing unit includes: a persisting subunit, configured to persist, according to an identifier set to which an identifier of to-be-processed data generated through running of an upstream execution unit of the target execution unit belongs, the generated to-be-processed data through the upstream execution unit of the target execution unit.
  • the processing unit includes: a sending subunit, configured to send indication information to the upstream execution unit of the target execution unit through the target execution unit, the indication information being used to indicate the to-be-processed data generated through the running of the upstream execution unit of the target execution unit and processed by the target execution unit.
  • the processing unit includes: a starting subunit, configured to restart the at least one target execution unit after the adjustment; and a processing subunit, configured to receive and process, through the restarted target execution unit, to-be-processed data not processed by the target execution unit, wherein the to-be-processed data is sent by the upstream execution unit of the target execution unit, is in the persisted to-be-processed data indicated by the identifier included in the identifier set corresponding to the target execution unit, and is determined according to the indication information.
  • the processing unit includes: a de-duplicating subunit, configured to de-duplicate, according to a historical record of receiving the to-be-processed data by the target execution unit in the stream computing system, the to-be-processed data sent to the target execution unit by the upstream execution unit of the target execution unit; and a processing subunit, configured to process, through the target execution unit, the de-duplicated to-be-processed data indicated by the identifier in the corresponding identifier set.
  • a de-duplicating subunit configured to de-duplicate, according to a historical record of receiving the to-be-processed data by the target execution unit in the stream computing system, the to-be-processed data sent to the target execution unit by the upstream execution unit of the target execution unit
  • a processing subunit configured to process, through the target execution unit, the de-duplicated to-be-processed data indicated by the identifier in the corresponding identifier set.
  • the embodiments of the present disclosure provide a device.
  • the device includes: one or more processors; and a storage device, configured to store one or more programs.
  • the one or more programs when executed by the one or more processors, cause the one or more processors to implement the method described in the first aspect.
  • the embodiments of the present disclosure provide a computer readable medium storing a computer program.
  • the program when executed by a processor, implements the method described in the first aspect.
  • the to-be-adjusted number of the target execution units is acquired, and then, the number of the target execution units in the stream computing system is adjusted based on the to-be-adjusted number. Finally, for the target execution unit in the at least one target execution unit after the adjustment, the identifier set corresponding to the target execution unit is determined, and the to-be-processed data indicated by the identifier in the corresponding identifier set is processed through the target execution unit.
  • FIG. 1 is a diagram of an exemplary system architecture in which an embodiment of the present disclosure may be implemented
  • FIG. 2 is a flowchart of an embodiment of a method for processing data according to the present disclosure
  • FIG. 3 is a flowchart of another embodiment of the method for processing data according to the present disclosure.
  • FIG. 4 is a schematic diagram of an application scenario of the method for processing data according to the present disclosure.
  • FIG. 5 is a schematic structural diagram of an embodiment of an apparatus for processing data according to the present disclosure.
  • FIG. 6 is a schematic structural diagram of a computer system adapted to implement a server according to embodiments of the present disclosure.
  • FIG. 1 shows an exemplary system architecture 100 in which an embodiment of a method for processing data or an apparatus for processing data according to the present disclosure may be implemented.
  • the system architecture 100 may include terminal devices 101 , 102 and 103 , a network 104 and a server 105 .
  • the network 104 serves as a medium providing a communication link between the terminal devices 101 , 102 and 103 and the server 105 .
  • the network 104 may include various types of connections, for example, wired or wireless communication links, or optical fiber cables.
  • a user may interact with the server 105 via the network 104 using the terminal devices 101 , 102 and 103 , to receive or send messages.
  • Various applications (e.g., a social application, an image processing application, an e-commerce application and a search application) may be installed on the terminal devices 101, 102 and 103.
  • the terminal devices 101 , 102 and 103 may be hardware or software.
  • the terminal devices 101 , 102 and 103 may be various electronic devices having a display screen, which include, but not limited to, a smart phone, a tablet computer, a laptop portable computer and a desktop computer.
  • the terminal devices 101 , 102 and 103 may be installed in the above listed electronic devices.
  • the terminal devices may be implemented as a plurality of pieces of software or a plurality of software modules, or as a single piece of software or a single software module, which will not be specifically defined here.
  • the server 105 may be a server providing various services, for example, a backend server providing a support for the applications installed on the terminal devices 101 , 102 and 103 .
  • the server 105 may acquire a to-be-adjusted number of target execution units, the target execution unit referring to a unit executing a target program segment in a stream computing system; adjust a number of the target execution units in the stream computing system based on the to-be-adjusted number; determine, for a target execution unit in at least one target execution unit after the adjustment, an identifier set corresponding to the target execution unit, an identifier in the identifier set being used to indicate to-be-processed data; and process, through the target execution unit, the to-be-processed data indicated by the identifier in the corresponding identifier set.
  • the method for processing data provided by the embodiments of the present disclosure may be generally performed by the server 105 .
  • the apparatus for processing data may be provided in the server 105 .
  • the server may be hardware or software.
  • the server may be implemented as a distributed server cluster composed of a plurality of servers, or as a single server.
  • the server may be implemented as a plurality of pieces of software or a plurality of software modules (e.g., software or software modules for providing a distributed service), or as a single piece of software or a single software module, which will not be specifically defined here.
  • The numbers of the terminal devices, the networks, and the servers in FIG. 1 are merely illustrative. Any number of terminal devices, networks, and servers may be provided based on actual requirements.
  • the method for processing data includes the following steps.
  • Step 201 acquiring a to-be-adjusted number of target execution units.
  • An execution body (e.g., the server shown in FIG. 1) of the method for processing data may first acquire the to-be-adjusted number of the target execution units.
  • the target execution unit refers to a unit executing a target program segment in a stream computing system.
  • the stream computing system may include a control node and a plurality of work nodes.
  • a work node may also be referred to as an operator.
  • the control node may send a corresponding control instruction to a subordinate work node, so that the work node invokes an execution unit to process a data stream generated by a service according to the control instruction.
  • Each work node may include one or more execution units. When the work node is invoked to process the data stream, the data stream is processed by the execution units included in the work node, and the execution unit may be a thread or a process.
  • the stream computing system may include several stream computing tasks (applications), and each stream computing task is composed of independent computational logics (processors) connected according to upstream-downstream subscription relationships.
  • the computational logics may be distributed on a plurality of servers in a multi-process mode.
  • Data (Tuple) flows between processes having the upstream and downstream subscription relationship through a remote procedure call (RPC), and during the data processing, to-be-processed data for a downstream execution unit is produced and an intermediate state is modified.
  • the execution unit may be a thread or a process executing an independent computational logic, and the independent computational logic is embodied as a segment of a program.
  • the target execution unit may be a unit in the stream computing system, the number of which needs to be modified. For example, the number of the target execution units may be increased when the load is too heavy.
  • the to-be-adjusted number may be determined by the execution body according to a corresponding relationship between a pre-established load condition and the number of the target execution units, and the load condition may be reflected by traffic information or processing speed information.
  • the to-be-adjusted number may also be determined according to a concurrency setting instruction after the concurrency setting instruction is acquired.
  • the concurrency of the work node may represent the number of the execution units included in the work node. For example, if the concurrency of the work node is 3, it means that the work node may invoke 3 execution units to process the data stream.
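  • As a minimal illustration (not part of the patent text), the to-be-adjusted number could come either from a pre-established correspondence between load conditions and concurrency, or directly from a concurrency setting instruction. The thresholds, names and table below are assumptions.

```python
# Hypothetical sketch: derive the to-be-adjusted number of target execution units
# either from a pre-established load -> concurrency table or from an explicit
# concurrency setting instruction. All values here are illustrative assumptions.

LOAD_TO_CONCURRENCY = [
    (10_000, 2),    # up to 10,000 events/s  -> 2 target execution units
    (50_000, 3),    # up to 50,000 events/s  -> 3 target execution units
    (200_000, 6),   # up to 200,000 events/s -> 6 target execution units
]

def to_be_adjusted_number(traffic_per_second, concurrency_instruction=None):
    """Return the desired number of target execution units."""
    if concurrency_instruction is not None:       # an explicit setting instruction wins
        return concurrency_instruction
    for upper_bound, concurrency in LOAD_TO_CONCURRENCY:
        if traffic_per_second <= upper_bound:
            return concurrency
    return LOAD_TO_CONCURRENCY[-1][1]             # saturate at the largest configured value

print(to_be_adjusted_number(42_000))                         # 3
print(to_be_adjusted_number(0, concurrency_instruction=5))   # 5
```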
  • Step 202 adjusting a number of the target execution units in a stream computing system based on the to-be-adjusted number.
  • the execution body may adjust the number of the target execution units in the stream computing system based on the to-be-adjusted number acquired in step 201. If the number of the target execution units actually running in the stream computing system is identical to the to-be-adjusted number, the number of the target execution units may not need to be adjusted. If the number of the target execution units actually running in the stream computing system is different from the to-be-adjusted number, the number of the target execution units in the stream computing system may be adjusted to the to-be-adjusted number.
  • Step 203 determining, for a target execution unit in at least one target execution unit after the adjustment, an identifier set corresponding to the target execution unit.
  • the execution body may first determine the identifier set corresponding to the target execution unit.
  • An identifier in the identifier set is used to indicate to-be-processed data.
  • the identifier may be generated according to a preset rule, for example, may be determined according to the generation order, the generation time, the storage location and the source of data.
  • the identifier set includes the identifier for indicating the to-be-processed data of the execution unit.
  • the identifier included in the identifier set may remain unchanged before and after the adjustment of the number of target execution units in the stream computing system.
  • the identifier may be mapped to the identifier set using a hash algorithm, and the corresponding relationship between the identifier and the identifier set may also be pre-established by other means.
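  • A minimal sketch of such a hash-based mapping, assuming the identifier is a string and a fixed total of 6 identifier sets (matching the kg0 to kg5 example later); the choice of CRC32 is an illustrative assumption, not mandated by the disclosure.

```python
import zlib

NUM_IDENTIFIER_SETS = 6   # total number of identifier sets stays fixed across capacity changes

def identifier_set_of(identifier):
    """Map a to-be-processed data identifier to an identifier set index (kg0..kg5)."""
    return zlib.crc32(identifier.encode("utf-8")) % NUM_IDENTIFIER_SETS

print(identifier_set_of("order-20180723-0001"))   # some value in 0..5, stable across runs
```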
  • a total number of identifier sets may also remain unchanged before and after the adjustment of the number of target execution units in the stream computing system. That is, the identifier sets corresponding to all of the target execution units in the stream computing system, taken as a whole, remain unchanged.
  • the execution body may adjust a mapping relationship between each target execution unit and each identifier set according to a preset rule.
  • The specific rule may be set based on actual requirements. For example, in consideration of load balance, the identifier sets may be allocated evenly among the target execution units.
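  • For instance, an even allocation of identifier sets to target execution units could be computed as in the sketch below; the contiguous-range scheme is only one assumed choice of "preset rule".

```python
def allocate_identifier_sets(num_identifier_sets, num_units):
    """Assign identifier set indexes to target execution units as evenly as possible,
    giving each unit a contiguous range of identifier sets."""
    base, extra = divmod(num_identifier_sets, num_units)
    mapping, start = {}, 0
    for unit in range(num_units):
        count = base + (1 if unit < extra else 0)   # spread any remainder over the first units
        mapping[unit] = list(range(start, start + count))
        start += count
    return mapping
```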
  • the method before the to-be-processed data indicated by the identifier in the corresponding identifier set is processed through the target execution unit, the method further includes: persisting, according to the identifier set to which the identifier of the to-be-processed data generated through running an upstream execution unit of the target execution unit belongs, the generated to-be-processed data through the upstream execution unit of the target execution unit.
  • the upstream execution unit of the target execution unit may be a unit that provides the to-be-processed data to the target execution unit in the stream computing system.
  • the persistence is a mechanism of converting program data between a persistent state and a transient state. That is, transient data (e.g., data in a memory, which cannot be permanently preserved) is persisted as persistent data, for example, persisted into the database, so that the data can be stored for a long time.
  • the persistence may include full persistence and incremental persistence, and the incremental persistence may avoid the duplication of the data, to further improve the data processing efficiency.
  • the to-be-processed data generated through the running of the upstream execution unit is persisted, which may avoid the loss of the data when the capacity of the stream computing system is expanded or reduced, thus further improving the data processing efficiency.
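  • A minimal sketch of such persistence, assuming the upstream execution unit appends each newly generated piece of to-be-processed data to durable storage grouped by identifier set (incremental persistence); the file-per-identifier-set layout and the paths are assumptions, and any database would serve equally well.

```python
import json
import pathlib
import zlib

STORE_DIR = pathlib.Path("/tmp/stream_state")   # illustrative durable location
NUM_IDENTIFIER_SETS = 6

def persist_generated_data(identifier, payload):
    """Upstream execution unit: persist one generated piece of to-be-processed data,
    keyed by the identifier set its identifier belongs to (append = incremental persistence)."""
    kg = zlib.crc32(identifier.encode("utf-8")) % NUM_IDENTIFIER_SETS   # same mapping as above
    STORE_DIR.mkdir(parents=True, exist_ok=True)
    with (STORE_DIR / f"kg{kg}.log").open("a", encoding="utf-8") as f:
        f.write(json.dumps({"id": identifier, "data": payload}) + "\n")
```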
  • Step 204 processing, through the target execution unit, to-be-processed data indicated by an identifier in the corresponding identifier set.
  • the execution body may process, through the target execution unit, the to-be-processed data indicated by the identifier in the identifier set corresponding to the target execution unit and determined in step 203 .
  • the method further includes: sending, through the target execution unit, indication information to the upstream execution unit of the target execution unit.
  • the indication information is used to indicate the to-be-processed data generated through the running of the upstream execution unit of the target execution unit and processed by the target execution unit.
  • the indication information may further be used to indicate the to-be-processed data generated through the running of the upstream execution unit of the target execution unit and successfully received by the target execution unit, for example, an acknowledgement (ACK).
  • the acknowledgement may be a transmission control character sent to a sender by a receiver, representing that the receipt of the sent data is acknowledged without errors.
  • the sending of the indication information may prevent the upstream execution unit from sending duplicated to-be-processed data to the target execution unit, thus further improving the data processing efficiency.
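  • One possible form of this bookkeeping on the upstream side (an illustrative sketch; the patent does not fix a record layout): the target execution unit reports the identifiers it has processed, and the upstream execution unit never re-sends an acknowledged identifier.

```python
class UpstreamAckTracker:
    """Illustrative bookkeeping kept by an upstream execution unit."""

    def __init__(self):
        self.acknowledged = set()   # identifiers confirmed processed/received downstream

    def on_indication(self, processed_identifiers):
        """Handle indication information (e.g. an ACK) sent back by a target execution unit."""
        self.acknowledged.update(processed_identifiers)

    def should_send(self, identifier):
        """Send only data that has not been acknowledged, so duplicates are not re-sent."""
        return identifier not in self.acknowledged
```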
  • the processing, through the target execution unit, to-be-processed data indicated by an identifier in the corresponding identifier set includes: restarting the at least one target execution unit after the adjustment; and receiving and processing, through the restarted target execution unit, to-be-processed data not processed by the target execution unit, wherein the to-be-processed data is sent by the upstream execution unit of the target execution unit, is in the persisted to-be-processed data indicated by the identifier included in the identifier set corresponding to the target execution unit, and is determined according to the indication information.
  • the target execution unit receives and processes the to-be-processed data not processed by the target execution unit, and the to-be-processed data is in the persisted to-be-processed data indicated by the identifier included in the identifier set corresponding to the target execution unit, and is determined according to the indication information. Accordingly, it is ensured that the to-be-processed data is neither duplicated nor lost when the capacity of the stream computing system is expanded or reduced, which further improves the data processing efficiency.
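  • Tying the two previous sketches together (again an assumption about storage layout, not the patent's mandated design), a restarted target execution unit could be fed only the persisted, not-yet-acknowledged data of the identifier sets it now owns:

```python
import json
import pathlib

def replay_unprocessed(store_dir, owned_identifier_sets, acknowledged):
    """Yield persisted to-be-processed data for the given identifier sets,
    skipping anything the target execution unit already acknowledged."""
    store_dir = pathlib.Path(store_dir)
    for kg in owned_identifier_sets:
        log_file = store_dir / f"kg{kg}.log"
        if not log_file.exists():
            continue
        for line in log_file.read_text(encoding="utf-8").splitlines():
            record = json.loads(line)
            if record["id"] not in acknowledged:
                yield record

# e.g. for a restarted unit that now owns kg2 and kg3:
# for record in replay_unprocessed("/tmp/stream_state", [2, 3], acknowledged=set()):
#     handle(record)   # hypothetical processing function
```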
  • the to-be-adjusted number of the target execution units is acquired, the target execution unit referring to the unit executing the target program segment in the stream computing system.
  • the number of the target execution units in the stream computing system is adjusted based on the to-be-adjusted number.
  • the identifier set corresponding to the target execution unit is determined, the identifier in the identifier set being used to indicate the to-be-processed data.
  • the to-be-processed data indicated by the identifier in the corresponding identifier set is processed through the target execution unit. Accordingly, an identifier-based data processing mechanism is provided.
  • An identifier set is a logical concept with little physical cost, and the setting of identifier sets is very flexible. Thus, the flexibility of the stream computing system is improved when its capacity is expanded or reduced.
  • FIG. 3 illustrates a flow 300 of another embodiment of the method for processing data.
  • the flow 300 of the method for processing data includes the following steps.
  • Step 301 acquiring a to-be-adjusted number of target execution units.
  • An execution body (e.g., the server shown in FIG. 1) of the method for processing data may first acquire the to-be-adjusted number of the target execution units.
  • Step 302 adjusting a number of the target execution units in a stream computing system based on the to-be-adjusted number.
  • the execution body may adjust the number of the target execution units in the stream computing system based on the to-be-adjusted number acquired in step 301 .
  • Step 303 determining, for a target execution unit in at least one target execution unit after the adjustment, an identifier set corresponding to the target execution unit.
  • the execution body may first determine the identifier set corresponding to the target execution unit.
  • An identifier in the identifier set is used to indicate to-be-processed data.
  • Step 304 de-duplicating, according to a historical record of receiving the to-be-processed data by the target execution unit in the stream computing system, to-be-processed data sent to the target execution unit by an upstream execution unit of the target execution unit.
  • the execution body may de-duplicate the to-be-processed data sent to the target execution unit by the upstream execution unit of the target execution unit.
  • the execution body may further remove, from the to-be-processed data sent to the target execution unit by the upstream execution unit of the target execution unit, the data that duplicates the received to-be-processed data recorded in the historical record.
  • the historical record may include information such as the identifier of the received to-be-processed data.
  • Step 305 processing, through the target execution unit, the de-duplicated to-be-processed data indicated by the identifier in the corresponding identifier set.
  • the execution body may process the to-be-processed data indicated by the identifier in the corresponding identifier set after the de-duplication in step 304 , and thus, the target execution unit may merely process the data that has not been processed by the target execution unit.
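  • A minimal de-duplication sketch, assuming the historical record is simply the set of identifiers the target execution unit has already received (in practice it could be a persisted index):

```python
def deduplicate(incoming, history):
    """Drop data whose identifier already appears in the historical record,
    then record the survivors so that later batches are filtered as well."""
    fresh = [record for record in incoming if record["id"] not in history]
    history.update(record["id"] for record in fresh)
    return fresh

history = set()
batch = [{"id": "a", "data": 1}, {"id": "b", "data": 2}]
print(deduplicate(batch, history))   # both records pass the first time
print(deduplicate(batch, history))   # [] -- duplicates are filtered out
```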
  • steps 301 , 302 and 303 are substantially the same as the operations in steps 201 , 202 and 203 , which will not be repeatedly described here.
  • In this embodiment, the execution body filters out, from the to-be-processed data sent to the target execution unit by the upstream execution unit of the target execution unit, the data that duplicates the received data recorded in the historical record.
  • Accordingly, repeated processing of the data may be avoided, to further improve the information processing efficiency.
  • FIG. 4 is a schematic diagram of an application scenario of the method for processing data according to this embodiment.
  • the stream computing system includes a target execution unit 402 , an upstream execution unit 401 of the target execution unit 402 , and a downstream execution unit 403 of the target execution unit 402 .
  • the number of target execution units 402 in the stream computing system is 2 before being adjusted, and the to-be-adjusted number of the target execution units 402 is acquired as 3. Based on the to-be-adjusted number, the number of the target execution units 402 is adjusted to 3.
  • Before the adjustment, the identifier sets corresponding to the unit having the identifier of 0 are kg0, kg1 and kg2, and the identifier sets corresponding to the unit having the identifier of 1 are kg3, kg4 and kg5.
  • After the adjustment, the identifier sets corresponding to the unit having the identifier of 0 are kg0 and kg1, the identifier sets corresponding to the unit having the identifier of 1 are kg2 and kg3, and the identifier sets corresponding to the unit having the identifier of 2 are kg4 and kg5.
  • the to-be-processed data indicated by an identifier in the identifier sets kg0 and kg1 may be processed through the unit having the identifier of 0.
  • the to-be-processed data indicated by an identifier in the identifier sets kg2 and kg3 may be processed through the unit having the identifier of 1.
  • the to-be-processed data indicated by an identifier in the identifier sets kg4 and kg5 may be processed through the unit having the identifier of 2.
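  • The reallocation in this scenario can be reproduced with the allocate_identifier_sets sketch given earlier (purely illustrative):

```python
print(allocate_identifier_sets(6, 2))   # {0: [0, 1, 2], 1: [3, 4, 5]}  i.e. kg0-kg2 on unit 0, kg3-kg5 on unit 1
print(allocate_identifier_sets(6, 3))   # {0: [0, 1], 1: [2, 3], 2: [4, 5]}  i.e. kg0-kg1, kg2-kg3, kg4-kg5
```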
  • the data processed by the target execution units 402 may further be persisted into a storage device 404 .
  • the present disclosure provides an embodiment of an apparatus for processing data.
  • the embodiment of the apparatus corresponds to the embodiment of the method shown in FIG. 2 , and the apparatus may be applied in various electronic devices.
  • the apparatus 500 for processing data in this embodiment includes: an acquiring unit 510 , an adjusting unit 520 and a processing unit 530 .
  • the acquiring unit 510 is configured to acquire a to-be-adjusted number of target execution units, the target execution unit referring to a unit executing a target program segment in a stream computing system.
  • the adjusting unit 520 is configured to adjust a number of the target execution units in the stream computing system based on the to-be-adjusted number.
  • the processing unit 530 is configured to determine, for a target execution unit in at least one target execution unit after the adjustment, an identifier set corresponding to the target execution unit, an identifier in the identifier set being used to indicate to-be-processed data; and process, through the target execution unit, the to-be-processed data indicated by the identifier in the corresponding identifier set.
  • For specific processing of the acquiring unit 510, the adjusting unit 520 and the processing unit 530 in the apparatus 500 for processing data, and the technical effects thereof, reference may be made to the relative descriptions of step 201, step 202 and step 203 in the corresponding embodiment of FIG. 2 respectively.
  • the processing unit 530 includes a persisting subunit 531 .
  • the persisting subunit is configured to persist, according to an identifier set to which an identifier of to-be-processed data generated through running of an upstream execution unit of the target execution unit belongs, the generated to-be-processed data through the upstream execution unit of the target execution unit.
  • the processing unit 530 includes a sending subunit 532 .
  • the sending subunit is configured to send indication information to the upstream execution unit of the target execution unit through the target execution unit.
  • the indication information is used to indicate the to-be-processed data generated through the running of the upstream execution unit of the target execution unit and processed by the target execution unit.
  • the processing unit 530 includes: a starting subunit 533 , configured to restart the at least one target execution unit after the adjustment; and a processing subunit 535 , configured to receive and process, through the restarted target execution unit, to-be-processed data not processed by the target execution unit, where the to-be-processed data is sent by the upstream execution unit of the target execution unit, is in the persisted to-be-processed data indicated by the identifier included in the identifier set corresponding to the target execution unit, and is determined according to the indication information.
  • the processing unit 530 includes: a de-duplicating subunit 534 , configured to de-duplicate, according to a historical record of receiving the to-be-processed data by the target execution unit in the stream computing system, the to-be-processed data sent to the target execution unit by the upstream execution unit of the target execution unit; and a processing subunit 535 , configured to process, through the target execution unit, the de-duplicated to-be-processed data indicated by the identifier in the corresponding identifier set.
  • the to-be-adjusted number of the target execution units is acquired, the target execution unit referring to the unit executing the target program segment in the stream computing system.
  • the number of the target execution units in the stream computing system is adjusted based on the to-be-adjusted number.
  • the identifier set corresponding to the target execution unit is determined, the identifier in the identifier set being used to indicate the to-be-processed data.
  • the to-be-processed data indicated by the identifier in the corresponding identifier set is processed through the target execution unit. Accordingly, an identifier-based data processing mechanism is provided, thus improving the efficiency of processing the data.
  • FIG. 6 is a schematic structural diagram of a computer system 600 adapted to implement a server of the embodiments of the present disclosure.
  • the server shown in FIG. 6 is merely an example, and should not bring any limitations to the functions and the scope of use of the embodiments of the present disclosure.
  • the computer system 600 includes a central processing unit (CPU) 601 , which may execute various appropriate actions and processes in accordance with a program stored in a read-only memory (ROM) 602 or a program loaded into a random access memory (RAM) 603 from a storage portion 608 .
  • the RAM 603 also stores various programs and data required by operations of the system 600 .
  • the CPU 601 , the ROM 602 and the RAM 603 are connected to each other through a bus 604 .
  • An input/output (I/O) interface 605 is also connected to the bus 604 .
  • the following components are connected to the I/O interface 605 : an input portion 606 including a keyboard, a mouse etc.; an output portion 607 including a cathode ray tube (CRT), a liquid crystal display device (LCD), a speaker etc.; a storage portion 608 including a hard disk and the like; and a communication portion 609 including a network interface card such as a LAN card and a modem.
  • the communication portion 609 performs communication processes via a network such as the Internet.
  • a driver 610 is also connected to the I/O interface 605 as required.
  • a removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, and a semiconductor memory may be installed on the driver 610 , to facilitate the retrieval of a computer program from the removable medium 611 , and the installation thereof on the storage portion 608 as needed.
  • an embodiment of the present disclosure includes a computer program product, including a computer program hosted on a computer readable medium, the computer program including program codes for performing the method as illustrated in the flowchart.
  • the computer program may be downloaded and installed from a network via the communication portion 609 , and/or may be installed from the removable medium 611 .
  • the computer program when executed by the central processing unit (CPU) 601 , implements the above mentioned functionalities defined in the method of the present disclosure.
  • the computer readable medium in the present disclosure may be a computer readable signal medium, a computer readable storage medium, or any combination of the two.
  • the computer readable storage medium may be, but not limited to: an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or element, or any combination of the above.
  • a more specific example of the computer readable storage medium may include, but not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read only memory (ROM), an erasable programmable read only memory (EPROM or flash memory), an optical fibre, a portable compact disk read only memory (CD-ROM), an optical memory, a magnetic memory or any suitable combination of the above.
  • the computer readable medium may be any physical medium containing or storing programs, which may be used by a command execution system, apparatus or element or incorporated thereto.
  • the computer readable signal medium may include a data signal that is propagated in a baseband or as a part of a carrier wave, which carries computer readable program codes. Such propagated data signal may be in various forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the above.
  • the computer readable signal medium may also be any computer readable medium other than the computer readable storage medium.
  • the computer readable medium is capable of transmitting, propagating or transferring programs for use by, or used in combination with, a command execution system, apparatus or element.
  • the program codes contained on the computer readable medium may be transmitted with any suitable medium including, but not limited to, wireless, wired, optical cable, RF medium, or any suitable combination of the above.
  • a computer program code for executing the operations according to the present disclosure may be written in one or more programming languages or a combination thereof.
  • the programming language includes an object-oriented programming language such as Java, Smalltalk and C++, and further includes a general procedural programming language such as “C” language or a similar programming language.
  • the program codes may be executed entirely on a user computer, executed partially on the user computer, executed as a standalone package, executed partially on the user computer and partially on a remote computer, or executed entirely on the remote computer or a server.
  • the remote computer may be connected to the user computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or be connected to an external computer (e.g., connected through the Internet provided by an Internet service provider).
  • each of the blocks in the flowcharts or block diagrams may represent a module, a program segment, or a code portion, the module, the program segment, or the code portion comprising one or more executable instructions for implementing specified logic functions.
  • the functions denoted by the blocks may occur in a sequence different from the sequences shown in the figures. For example, any two blocks presented in succession may be executed substantially in parallel, or they may sometimes be executed in a reverse sequence, depending on the function involved.
  • each block in the block diagrams and/or flowcharts as well as a combination of blocks may be implemented using a dedicated hardware-based system executing specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments of the present disclosure may be implemented by means of software or hardware.
  • the described units may also be provided in a processor.
  • the processor may be described as: a processor comprising an acquiring unit, an adjusting unit and a processing unit.
  • the names of these units do not in some cases constitute a limitation to such units themselves.
  • the acquiring unit may also be described as “a unit for acquiring a number of targets.”
  • the present disclosure further provides a computer readable medium.
  • the computer readable medium may be the computer readable medium included in the apparatus described in the above embodiments, or a stand-alone computer readable medium not assembled into the apparatus.
  • the computer readable medium carries one or more programs.
  • the one or more programs when executed by the apparatus, cause the apparatus to: acquire a to-be-adjusted number of target execution units, the target execution unit referring to a unit executing a target program segment in a stream computing system; adjust a number of the target execution units in the stream computing system based on the to-be-adjusted number; determine, for a target execution unit in at least one target execution unit after the adjustment, an identifier set corresponding to the target execution unit, an identifier in the identifier set being used to indicate to-be-processed data; and process, through the target execution unit, the to-be-processed data indicated by the identifier in the corresponding identifier set.

Abstract

A method and apparatus for processing stream data are provided. The method may include: acquiring a to-be-adjusted number of target execution units, the target execution unit referring to a unit executing a target program segment in a stream computing system; adjusting a number of the target execution units in the stream computing system based on the to-be-adjusted number; determining, for a target execution unit in at least one target execution unit after the adjustment, an identifier set corresponding to the target execution unit, an identifier in the identifier set being used to indicate to-be-processed data; and processing, through the target execution unit, the to-be-processed data indicated by the identifier in the corresponding identifier set.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to Chinese Application No. 201810812280.4, filed on Jul. 23, 2018 and entitled “Method and Apparatus for Processing Data,” the entire disclosure of which is hereby incorporated by reference.
TECHNICAL FIELD
Embodiments of the present disclosure relate to the field of computer technology, and specifically to a method and apparatus for processing data.
BACKGROUND
Stream computing is widely applied in large-scale distributed computing scenarios such as a scenario of information flow, a scenario of searching to construct a database, and a scenario of charging for retrievals. Stream computing is a pipeline-like data processing mode: it comes from the concept that data is processed immediately as each event occurs, instead of being buffered for batch processing.
In a stream computing system, the traffic generally has obvious fluctuations, being greatly affected by unexpected events, seasonality and so on. In order to ensure the service quality and make rational use of resources, it is required to expand the capacity when the traffic is at its peak and reduce the capacity when the traffic is at its trough.
When the capacity of a current stream computing system is expanded or reduced, it is required to first stop the tasks involved in the stream computing system, update the concurrency configuration, and then restart those tasks.
SUMMARY
Embodiments of the present disclosure propose a method and apparatus for processing data.
In a first aspect, the embodiments of the present disclosure provide a method for processing data. The method includes: acquiring a to-be-adjusted number of target execution units, the target execution unit referring to a unit executing a target program segment in a stream computing system; adjusting a number of the target execution units in the stream computing system based on the to-be-adjusted number; determining, for a target execution unit in at least one target execution unit after the adjustment, an identifier set corresponding to the target execution unit, an identifier in the identifier set being used to indicate to-be-processed data; and processing, through the target execution unit, the to-be-processed data indicated by the identifier in the corresponding identifier set.
In some embodiments, before the processing, through the target execution unit, the to-be-processed data indicated by the identifier in the corresponding identifier set, the method further includes: persisting, according to an identifier set to which an identifier of to-be-processed data generated through running of an upstream execution unit of the target execution unit belongs, the generated to-be-processed data through the upstream execution unit of the target execution unit
In some embodiments, after the processing, through the target execution unit, the to-be-processed data indicated by the identifier in the corresponding identifier set, the method further includes: sending indication information to the upstream execution unit of the target execution unit through the target execution unit, the indication information being used to indicate the to-be-processed data generated through the running of the upstream execution unit of the target execution unit and processed by the target execution unit.
In some embodiments, the processing, through the target execution unit, the to-be-processed data indicated by the identifier in the corresponding identifier set includes: restarting the at least one target execution unit after the adjustment; and receiving and processing, through the restarted target execution unit, to-be-processed data not processed by the target execution unit, wherein the to-be-processed data is sent by the upstream execution unit of the target execution unit, is in the persisted to-be-processed data indicated by the identifier included in the identifier set corresponding to the target execution unit, and is determined according to the indication information
In some embodiments, the processing, through the target execution unit, the to-be-processed data indicated by the identifier in the corresponding identifier set includes: de-duplicating, according to a historical record of receiving the to-be-processed data by the target execution unit in the stream computing system, the to-be-processed data sent to the target execution unit by the upstream execution unit of the target execution unit; and processing, through the target execution unit, the de-duplicated to-be-processed data indicated by the identifier in the corresponding identifier set
In a second aspect, the embodiments of the present disclosure provide an apparatus for processing data. The apparatus includes: an acquiring unit, configured to acquire a to-be-adjusted number of target execution units, the target execution unit referring to a unit executing a target program segment in a stream computing system; an adjusting unit, configured to adjust a number of the target execution units in the stream computing system based on the to-be-adjusted number; and a processing unit, configured to determine, for a target execution unit in at least one target execution unit after the adjustment, an identifier set corresponding to the target execution unit, an identifier in the identifier set being used to indicate to-be-processed data; and process, through the target execution unit, the to-be-processed data indicated by the identifier in the corresponding identifier set.
In some embodiments, the processing unit includes: a persisting subunit, configured to persist, according to an identifier set to which an identifier of to-be-processed data generated through running of an upstream execution unit of the target execution unit belongs, the generated to-be-processed data through the upstream execution unit of the target execution unit.
In some embodiments, the processing unit includes: a sending subunit, configured to send indication information to the upstream execution unit of the target execution unit through the target execution unit, the indication information being used to indicate the to-be-processed data generated through the running of the upstream execution unit of the target execution unit and processed by the target execution unit.
In some embodiments, the processing unit includes: a starting subunit, configured to restart the at least one target execution unit after the adjustment; and a processing subunit, configured to receive and process, through the restarted target execution unit, to-be-processed data not processed by the target execution unit, wherein the to-be-processed data is sent by the upstream execution unit of the target execution unit, is in the persisted to-be-processed data indicated by the identifier included in the identifier set corresponding to the target execution unit, and is determined according to the indication information.
In some embodiments, the processing unit includes: a de-duplicating subunit, configured to de-duplicate, according to a historical record of receiving the to-be-processed data by the target execution unit in the stream computing system, the to-be-processed data sent to the target execution unit by the upstream execution unit of the target execution unit; and a processing subunit, configured to process, through the target execution unit, the de-duplicated to-be-processed data indicated by the identifier in the corresponding identifier set.
In a third aspect, the embodiments of the present disclosure provide a device. The device includes: one or more processors; and a storage device, configured to store one or more programs. The one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method described in the first aspect.
In a fourth aspect, the embodiments of the present disclosure provide a computer readable medium storing a computer program. The program, when executed by a processor, implements the method described in the first aspect.
According to the method and apparatus for processing data provided by the embodiments of the present disclosure, the to-be-adjusted number of the target execution units is acquired, and then, the number of the target execution units in the stream computing system is adjusted based on the to-be-adjusted number. Finally, for the target execution unit in the at least one target execution unit after the adjustment, the identifier set corresponding to the target execution unit is determined, and the to-be-processed data indicated by the identifier in the corresponding identifier set is processed through the target execution unit. Thus, when the capacity of the stream computing system is expanded or reduced, it is not required to first stop the involved tasks in the stream computing system; the to-be-processed data of the target execution unit is instead re-determined based on the identifiers, thereby improving the efficiency of processing the data.
BRIEF DESCRIPTION OF THE DRAWINGS
After reading detailed descriptions of non-limiting embodiments given with reference to the following accompanying drawings, other features, objectives and advantages of the present disclosure will be more apparent:
FIG. 1 is a diagram of an exemplary system architecture in which an embodiment of the present disclosure may be implemented;
FIG. 2 is a flowchart of an embodiment of a method for processing data according to the present disclosure;
FIG. 3 is a flowchart of another embodiment of the method for processing data according to the present disclosure;
FIG. 4 is a schematic diagram of an application scenario of the method for processing data according to the present disclosure;
FIG. 5 is a schematic structural diagram of an embodiment of an apparatus for processing data according to the present disclosure; and
FIG. 6 is a schematic structural diagram of a computer system adapted to implement a server according to embodiments of the present disclosure.
DETAILED DESCRIPTION OF EMBODIMENTS
The present disclosure will be described below in detail with reference to the accompanying drawings and in combination with the embodiments. It should be appreciated that the specific embodiments described herein are merely used for explaining the relevant invention, rather than limiting the invention. In addition, it should be noted that, for the ease of description, only the parts related to the relevant invention are shown in the accompanying drawings.
It should also be noted that the embodiments in the present disclosure and the features in the embodiments may be combined with each other on a non-conflict basis. The present disclosure will be described below in detail with reference to the accompanying drawings and in combination with the embodiments.
FIG. 1 shows an exemplary system architecture 100 in which an embodiment of a method for processing data or an apparatus for processing data according to the present disclosure may be implemented.
As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102 and 103, a network 104 and a server 105. The network 104 serves as a medium providing a communication link between the terminal devices 101, 102 and 103 and the server 105. The network 104 may include various types of connections, for example, wired or wireless communication links, or optical fiber cables.
A user may interact with the server 105 via the network 104 using the terminal devices 101, 102 and 103, to receive or send messages. Various applications (e.g., a social application, an image processing application, an e-commerce application and a search application) may be installed on the terminal devices 101, 102 and 103.
The terminal devices 101, 102 and 103 may be hardware or software. When being the hardware, the terminal devices 101, 102 and 103 may be various electronic devices having a display screen, which include, but not limited to, a smart phone, a tablet computer, a laptop portable computer and a desktop computer. When being the software, the terminal devices 101, 102 and 103 may be installed in the above listed electronic devices. The terminal devices may be implemented as a plurality of pieces of software or a plurality of software modules, or as a single piece of software or a single software module, which will not be specifically defined here.
The server 105 may be a server providing various services, for example, a backend server providing a support for the applications installed on the terminal devices 101, 102 and 103. The server 105 may acquire a to-be-adjusted number of target execution units, the target execution unit referring to a unit executing a target program segment in a stream computing system; adjust a number of the target execution units in the stream computing system based on the to-be-adjusted number; determine, for a target execution unit in at least one target execution unit after the adjustment, an identifier set corresponding to the target execution unit, an identifier in the identifier set being used to indicate to-be-processed data; and process, through the target execution unit, the to-be-processed data indicated by the identifier in the corresponding identifier set.
It should be noted that the method for processing data provided by the embodiments of the present disclosure may be generally performed by the server 105. Correspondingly, the apparatus for processing data may be provided in the server 105.
It should be noted that the server may be hardware or software. When being the hardware, the server may be implemented as a distributed server cluster composed of a plurality of servers, or as a single server. When being the software, the server may be implemented as a plurality of pieces of software or a plurality of software modules (e.g., software or software modules for providing a distributed service), or as a single piece of software or a single software module, which will not be specifically defined here.
It should be appreciated that the numbers of the terminal devices, the networks, and the servers in FIG. 1 are merely illustrative. Any number of terminal devices, networks, and servers may be provided based on actual requirements.
Further referring to FIG. 2, a flow 200 of an embodiment of a method for processing data according to the present disclosure is illustrated. The method for processing data includes the following steps.
Step 201, acquiring a to-be-adjusted number of target execution units.
In this embodiment, an execution body (e.g., the server shown in FIG. 1) of the method for processing data may first acquire the to-be-adjusted number of the target execution units. The target execution unit refers to a unit executing a target program segment in a stream computing system. The stream computing system may include a control node and a plurality of work nodes. A work node may also be referred to as an operator. The control node may send a corresponding control instruction to a subordinate work node, so that the work node invokes an execution unit to process a data stream generated by a service according to the control instruction. Each work node may include one or more execution units. When the work node is invoked to process the data stream, the data stream is processed by the execution units included in the work node, and the execution unit may be a thread or a process.
The stream computing system may include several stream computing tasks (applications), and each of the stream computing tasks is composed of independent computational logics (processors) connected according to an upstream and downstream subscription relationship. In actual operation, by configuring a concurrency (parallelism), the computational logics may be distributed on a plurality of servers in a multi-process mode. Data (Tuple) flows between processes having the upstream and downstream subscription relationship through a remote procedure call (RPC). During data processing, to-be-processed data for a downstream execution unit is produced and an intermediate state is modified. The execution unit may be a thread or a process executing an independent computational logic, and the independent computational logic is embodied as a segment of a program.
The target execution unit may be a unit in the stream computing system, the number of which needs to be modified. For example, the number of the target execution units may be increased when the load is too heavy. The to-be-adjusted number may be determined by the execution body according to a pre-established corresponding relationship between a load condition and the number of the target execution units, and the load condition may be reflected by traffic information or processing speed information. The to-be-adjusted number may also be determined according to a concurrency setting instruction after the concurrency setting instruction is acquired. The concurrency of the work node may represent the number of the execution units included in the work node. For example, if the concurrency of the work node is 3, it means that the work node may invoke 3 execution units to process the data stream.
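The following is a minimal, hypothetical sketch (written in Python, which the present disclosure does not itself prescribe) of how an execution body might derive the to-be-adjusted number from a pre-established corresponding relationship between the load condition and the number of target execution units; the thresholds, the traffic metric and the function name to_be_adjusted_number are illustrative assumptions, not details taken from the embodiments.

def to_be_adjusted_number(tuples_per_second: float) -> int:
    # Pre-established corresponding relationship: load condition -> unit count.
    thresholds = [
        (1_000, 1),   # light traffic: one target execution unit suffices
        (5_000, 2),
        (20_000, 3),
    ]
    for limit, count in thresholds:
        if tuples_per_second <= limit:
            return count
    return 4          # heavier traffic: expand the capacity further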
Step 202, adjusting a number of the target execution units in a stream computing system based on the to-be-adjusted number.
In this embodiment, the execution body may adjust the number of the target execution units in the stream computing system based on the to-be-adjusted number acquired in step 201. If the number of the target execution units actually running in the stream computing system is identical to the to-be-adjusted number, the number of the target execution units may not need to be adjusted. If the number of the target execution units actually running in the stream computing system is different from the to-be-adjusted number, the number of the target execution units in the stream computing system may be adjusted to the to-be-adjusted number.
Step 203, determining, for a target execution unit in at least one target execution unit after the adjustment, an identifier set corresponding to the target execution unit.
In this embodiment, for the target execution unit in the at least one target execution unit after the adjustment in step 202, the execution body may first determine the identifier set corresponding to the target execution unit. An identifier in the identifier set is used to indicate to-be-processed data. The identifier may be generated according to a preset rule, for example, may be determined according to the generation order, the generation time, the storage location and the source of data.
Here, the identifier set includes the identifier for indicating the to-be-processed data of the execution unit. The identifier included in the identifier set may remain unchanged before and after the adjustment of the number of target execution units in the stream computing system. As an example, the identifier may be mapped to the identifier set using a hash algorithm, and the corresponding relationship between the identifier and the identifier set may also be pre-established by other means. A total number of identifier sets may also remain unchanged before and after the adjustment of the number of target execution units in the stream computing system. That is, the identifier sets corresponding to all of the target execution units in the stream computing system, taken as a whole, remain unchanged. The execution body may adjust a mapping relationship between each target execution unit and each identifier set according to a preset rule. The specific rule may be set based on actual requirements. For example, in consideration of load balance, the identifier sets may be evenly allocated to the target execution units.
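As an illustration of the identifier-set mechanism just described, the following Python sketch maps each identifier to one of a fixed number of identifier sets with a hash, and allocates the sets to the target execution units in near-even contiguous blocks; the particular hash function, the fixed total of six sets and the contiguous block allocation rule are assumptions made for illustration rather than requirements of the present disclosure.

import hashlib

NUM_IDENTIFIER_SETS = 6  # total number of identifier sets; unchanged by scaling

def identifier_set_of(identifier: str) -> int:
    # The identifier-to-set mapping does not depend on the number of target
    # execution units, so it stays stable before and after a capacity adjustment.
    digest = hashlib.md5(identifier.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_IDENTIFIER_SETS

def allocate_sets(num_target_units: int) -> dict:
    # Evenly allocate the fixed identifier sets to the target execution units
    # in contiguous blocks (a preset rule chosen here for load balance).
    base, extra = divmod(NUM_IDENTIFIER_SETS, num_target_units)
    mapping, start = {}, 0
    for unit in range(num_target_units):
        size = base + (1 if unit < extra else 0)
        mapping[unit] = list(range(start, start + size))
        start += size
    return mapping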
In some alternative implementations of this embodiment, before the to-be-processed data indicated by the identifier in the corresponding identifier set is processed through the target execution unit, the method further includes: persisting, according to the identifier set to which the identifier of the to-be-processed data generated through running an upstream execution unit of the target execution unit belongs, the generated to-be-processed data through the upstream execution unit of the target execution unit.
In this implementation, the upstream execution unit of the target execution unit may be a unit that provides the to-be-processed data to the target execution unit in the stream computing system. Persistence is a mechanism of converting program data between a transient state and a persistent state. That is, transient data (e.g., data in a memory, which cannot be permanently preserved) is persisted as persistent data, for example, persisted into a database, so that the data can be stored for a long time. The persistence may include full persistence and incremental persistence, and the incremental persistence may avoid the duplication of the data, to further improve the data processing efficiency. Persisting the to-be-processed data generated through the running of the upstream execution unit according to the identifier set to which the identifier of the to-be-processed data belongs may avoid the loss of the data when the capacity of the stream computing system is expanded or reduced, thus further improving the data processing efficiency.
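A minimal sketch of this persistence step is given below, assuming a simple key-value store standing in for the database and a key layout of pending/<set>/<identifier>; both the store interface and the key layout are illustrative assumptions.

class InMemoryStore:
    """A stand-in for the database used for persistence (illustrative only)."""
    def __init__(self):
        self.entries = {}

    def put(self, key: str, value: bytes) -> None:
        self.entries[key] = value

    def delete(self, key: str) -> None:
        self.entries.pop(key, None)

def persist_generated_data(store: InMemoryStore, set_id: int, identifier: str, payload: bytes) -> None:
    # Incremental persistence keyed by the identifier set the identifier belongs to,
    # so the generated to-be-processed data survives a capacity adjustment.
    store.put(f"pending/{set_id}/{identifier}", payload)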
Step 204, processing, through the target execution unit, to-be-processed data indicated by an identifier in the corresponding identifier set.
In this embodiment, for the target execution unit in the at least one target execution unit after the adjustment in step 202, the execution body may process, through the target execution unit, the to-be-processed data indicated by the identifier in the identifier set corresponding to the target execution unit and determined in step 203.
In some alternative implementations of this embodiment, after the to-be-processed data indicated by the identifier in the corresponding identifier set is processed through the target execution unit, the method further includes: sending, through the target execution unit, indication information to the upstream execution unit of the target execution unit. The indication information is used to indicate the to-be-processed data generated through the running of the upstream execution unit of the target execution unit and processed by the target execution unit.
In this implementation, the indication information may further be used to indicate the to-be-processed data generated through the running of the upstream execution unit of the target execution unit and successfully received by the target execution unit, for example, an acknowledgement (ACK). In data communication, the acknowledgement may be a transmission control character sent by a receiver to a sender, indicating that the sent data has been received without errors. Sending the indication information may prevent the upstream execution unit from sending duplicated to-be-processed data to the target execution unit, thus further improving the data processing efficiency.
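A minimal sketch of such indication information follows: after processing a piece of to-be-processed data, the target execution unit reports its identifier back to the upstream execution unit, which can then drop the corresponding persisted entry; the function name, the store's delete method and the key layout are assumptions made for illustration.

def acknowledge(store, set_id: int, identifier: str) -> None:
    # The indication information tells the upstream unit that this data has been
    # processed by the target unit, so it must not be re-sent after an adjustment.
    store.delete(f"pending/{set_id}/{identifier}")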
In some alternative implementations of this embodiment, the processing, through the target execution unit, to-be-processed data indicated by an identifier in the corresponding identifier set includes: restarting the at least one target execution unit after the adjustment; and receiving and processing, through the restarted target execution unit, to-be-processed data not processed by the target execution unit, wherein the to-be-processed data is sent by the upstream execution unit of the target execution unit, is in the persisted to-be-processed data indicated by the identifier included in the identifier set corresponding to the target execution unit, and is determined according to the indication information.
In this implementation, the target execution unit receives and processes the to-be-processed data not yet processed by the target execution unit, where the to-be-processed data is in the persisted to-be-processed data indicated by the identifier included in the identifier set corresponding to the target execution unit, and is determined according to the indication information. Accordingly, it is ensured that the to-be-processed data is neither duplicated nor lost when the capacity of the stream computing system is expanded or reduced, which further improves the data processing efficiency.
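A minimal sketch of this replay step after the adjusted target execution units are restarted: for each identifier set now owned by a restarted target unit, only the persisted entries that were never acknowledged are re-sent and processed. The flat dictionary of pending entries and the process callback are illustrative assumptions, not the disclosed implementation.

def replay_unprocessed(entries: dict, owned_sets: list, process) -> None:
    # `entries` maps "pending/<set>/<identifier>" keys to payloads; anything still
    # present was persisted but never acknowledged, i.e. not yet processed.
    for key, payload in sorted(entries.items()):
        set_part, identifier = key.split("/")[1:3]
        if int(set_part) in owned_sets:
            process(identifier, payload)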
According to the method provided by the above embodiment of the present disclosure, the to-be-adjusted number of the target execution units is acquired, the target execution unit referring to the unit executing the target program segment in the stream computing system. The number of the target execution units in the stream computing system is adjusted based on the to-be-adjusted number. For the target execution unit in the at least one target execution unit after the adjustment, the identifier set corresponding to the target execution unit is determined, the identifier in the identifier set being used to indicate the to-be-processed data. The to-be-processed data indicated by the identifier in the corresponding identifier set is processed through the target execution unit. Accordingly, an identifier-based data processing mechanism is provided. An identifier set is a logical concept with little physical cost, and the setting of the identifier set is very flexible. Thus, the flexibility of the stream computing system is improved when the capacity of the system is expanded or reduced.
Further referring to FIG. 3, FIG. 3 illustrates a flow 300 of another embodiment of the method for processing data. The flow 300 of the method for processing data includes the following steps.
Step 301, acquiring a to-be-adjusted number of target execution units.
In this embodiment, an execution body (e.g., the server shown in FIG. 1) of the method for processing data may first acquire the to-be-adjusted number of the target execution units.
Step 302, adjusting a number of the target execution units in a stream computing system based on the to-be-adjusted number.
In this embodiment, the execution body may adjust the number of the target execution units in the stream computing system based on the to-be-adjusted number acquired in step 301.
Step 303, determining, for a target execution unit in at least one target execution unit after the adjustment, an identifier set corresponding to the target execution unit.
In this embodiment, for the target execution unit in the at least one target execution unit after the adjustment in step 302, the execution body may first determine the identifier set corresponding to the target execution unit. An identifier in the identifier set is used to indicate to-be-processed data.
Step 304, de-duplicating, according to a historical record of receiving the to-be-processed data by the target execution unit in the stream computing system, to-be-processed data sent to the target execution unit by an upstream execution unit of the target execution unit.
In this embodiment, according to the historical record of receiving the to-be-processed data by the target execution unit in the stream computing system, the execution body may de-duplicate the to-be-processed data sent to the target execution unit by the upstream execution unit of the target execution unit. That is, the execution body may remove, from the to-be-processed data sent to the target execution unit by the upstream execution unit of the target execution unit, data that duplicates the received to-be-processed data recorded in the historical record. The historical record may include information such as the identifier of the received to-be-processed data.
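A minimal sketch of this de-duplication step follows; keeping the historical record as an in-memory set of identifiers is an assumption made for illustration, since the embodiment only requires that the received to-be-processed data be recorded.

def de_duplicate(incoming, history: set):
    # Drop any to-be-processed data whose identifier already appears in the
    # historical record; record the identifiers of newly accepted data.
    fresh = []
    for identifier, payload in incoming:
        if identifier in history:
            continue  # already received by the target execution unit: skip it
        history.add(identifier)
        fresh.append((identifier, payload))
    return fresh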
Step 305, processing, through the target execution unit, the de-duplicated to-be-processed data indicated by the identifier in the corresponding identifier set.
In this embodiment, through the target execution unit, the execution body may process the to-be-processed data that is indicated by the identifier in the corresponding identifier set and remains after the de-duplication in step 304. Thus, the target execution unit may process only the data that has not yet been processed by the target execution unit.
In this embodiment, the operations in steps 301, 302 and 303 are substantially the same as the operations in steps 201, 202 and 203, which will not be repeatedly described here.
It may be seen from FIG. 3 that, as compared with the embodiment corresponding to FIG. 2, in the flow 300 of the method for processing data in this embodiment, the execution body filters out the data that is in the to-be-processed data sent to the target execution unit by the upstream execution unit of the target execution unit and duplicates the received data recorded in the historical record. Thus, repeated processing of the data may be avoided, further improving the data processing efficiency.
Further referring to FIG. 4, FIG. 4 is a schematic diagram of an application scenario of the method for processing data according to this embodiment. In the application scenario of FIG. 4, the stream computing system includes a target execution unit 402, an upstream execution unit 401 of the target execution unit 402, and a downstream execution unit 403 of the target execution unit 402. The number of target execution units 402 in the stream computing system is 2 before being adjusted, and the to-be-adjusted number of the target execution units 402 is acquired as 3. Based on the to-be-adjusted number, the number of the target execution units 402 is adjusted to 3. Before the adjustment, in the target execution units 402, the identifier sets corresponding to the unit having the identifier of 0 are kg0, kg1 and kg2, and the identifier sets corresponding to the unit having the identifier of 1 are kg3, kg4 and kg5. After the adjustment, in the target execution units 402, the identifier sets corresponding to the unit having the identifier of 0 are kg0 and kg1, the identifier sets corresponding to the unit having the identifier of 1 are kg2 and kg3, and the identifier sets corresponding to the unit having the identifier of 2 are kg4 and kg5. The to-be-processed data indicated by an identifier in the identifier sets kg0 and kg1 may be processed through the unit having the identifier of 0. The to-be-processed data indicated by an identifier in the identifier sets kg2 and kg3 may be processed through the unit having the identifier of 1. The to-be-processed data indicated by an identifier in the identifier sets kg4 and kg5 may be processed through the unit having the identifier of 2. Moreover, the data processed by the target execution units 402 may further be persisted into a storage device 404.
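The re-allocation in FIG. 4 can be reproduced with the following small Python illustration; the contiguous block allocation rule is an assumption chosen here because it yields exactly the kg0 to kg5 mapping described above.

def allocate(num_units: int, num_sets: int = 6) -> dict:
    # Allocate the fixed identifier sets kg0..kg5 to the target execution units
    # in contiguous, near-even blocks.
    base, extra = divmod(num_sets, num_units)
    mapping, start = {}, 0
    for unit in range(num_units):
        size = base + (1 if unit < extra else 0)
        mapping[unit] = [f"kg{i}" for i in range(start, start + size)]
        start += size
    return mapping

print(allocate(2))  # {0: ['kg0', 'kg1', 'kg2'], 1: ['kg3', 'kg4', 'kg5']}
print(allocate(3))  # {0: ['kg0', 'kg1'], 1: ['kg2', 'kg3'], 2: ['kg4', 'kg5']}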
Further referring to FIG. 5, as an implementation of the method shown in the above drawings, the present disclosure provides an embodiment of an apparatus for processing data. The embodiment of the apparatus corresponds to the embodiment of the method shown in FIG. 2, and the apparatus may be applied in various electronic devices.
As shown in FIG. 5, the apparatus 500 for processing data in this embodiment includes: an acquiring unit 510, an adjusting unit 520 and a processing unit 530. Here, the acquiring unit 510 is configured to acquire a to-be-adjusted number of target execution units, the target execution unit referring to a unit executing a target program segment in a stream computing system. The adjusting unit 520 is configured to adjust a number of the target execution units in the stream computing system based on the to-be-adjusted number. The processing unit 530 is configured to determine, for a target execution unit in at least one target execution unit after the adjustment, an identifier set corresponding to the target execution unit, an identifier in the identifier set being used to indicate to-be-processed data; and process, through the target execution unit, the to-be-processed data indicated by the identifier in the corresponding identifier set.
In this embodiment, for the specific processing of the acquiring unit 510, the adjusting unit 520 and the processing unit 530 in the apparatus 500 for processing data, and the technical effects thereof, reference may be made to the relevant descriptions of step 201, step 202, and steps 203-204 in the corresponding embodiment of FIG. 2, respectively.
In some alternative implementations of this embodiment, the processing unit 530 includes a persisting subunit 531. The persisting subunit is configured to persist, according to an identifier set to which an identifier of to-be-processed data generated through running of an upstream execution unit of the target execution unit belongs, the generated to-be-processed data through the upstream execution unit of the target execution unit.
In some alternative implementations of this embodiment, the processing unit 530 includes a sending subunit 532. The sending subunit is configured to send indication information to the upstream execution unit of the target execution unit through the target execution unit. The indication information is used to indicate the to-be-processed data generated through the running of the upstream execution unit of the target execution unit and processed by the target execution unit.
In some alternative implementations of this embodiment, the processing unit 530 includes: a starting subunit 533, configured to restart the at least one target execution unit after the adjustment; and a processing subunit 535, configured to receive and process, through the restarted target execution unit, to-be-processed data not processed by the target execution unit, where the to-be-processed data is sent by the upstream execution unit of the target execution unit, is in the persisted to-be-processed data indicated by the identifier included in the identifier set corresponding to the target execution unit, and is determined according to the indication information.
In some alternative implementations of this embodiment, the processing unit 530 includes: a de-duplicating subunit 534, configured to de-duplicate, according to a historical record of receiving the to-be-processed data by the target execution unit in the stream computing system, the to-be-processed data sent to the target execution unit by the upstream execution unit of the target execution unit; and a processing subunit 535, configured to process, through the target execution unit, the de-duplicated to-be-processed data indicated by the identifier in the corresponding identifier set.
According to the apparatus provided by the above embodiment of the present disclosure, the to-be-adjusted number of the target execution units is acquired, the target execution unit referring to the unit executing the target program segment in the stream computing system. The number of the target execution units in the stream computing system is adjusted based on the to-be-adjusted number. For the target execution unit in the at least one target execution unit after the adjustment, the identifier set corresponding to the target execution unit is determined, the identifier in the identifier set being used to indicate the to-be-processed data. The to-be-processed data indicated by the identifier in the corresponding identifier set is processed through the target execution unit. Accordingly, an identifier-based data processing mechanism is provided, thus improving the efficiency of processing the data.
Referring to FIG. 6, FIG. 6 is a schematic structural diagram of a computer system 600 adapted to implement a server of the embodiments of the present disclosure. The server shown in FIG. 6 is merely an example, and should not bring any limitations to the functions and the scope of use of the embodiments of the present disclosure.
As shown in FIG. 6, the computer system 600 includes a central processing unit (CPU) 601, which may execute various appropriate actions and processes in accordance with a program stored in a read-only memory (ROM) 602 or a program loaded into a random access memory (RAM) 603 from a storage portion 608. The RAM 603 also stores various programs and data required by operations of the system 600. The CPU 601, the ROM 602 and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse etc.; an output portion 607 including a cathode ray tube (CRT), a liquid crystal display device (LCD), a speaker etc.; a storage portion 608 including a hard disk and the like; and a communication portion 609 including a network interface card such as a LAN card and a modem. The communication portion 609 performs communication processes via a network such as the Internet. A driver 610 is also connected to the I/O interface 605 as required. A removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, and a semiconductor memory may be installed on the driver 610, to facilitate the retrieval of a computer program from the removable medium 611, and the installation thereof on the storage portion 608 as needed.
In particular, according to the embodiments of the present disclosure, the process described above with reference to the flow chart may be implemented in a computer software program. For example, an embodiment of the present disclosure includes a computer program product, including a computer program hosted on a computer readable medium, the computer program including program codes for performing the method as illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 609, and/or may be installed from the removable medium 611. The computer program, when executed by the central processing unit (CPU) 601, implements the above mentioned functionalities defined in the method of the present disclosure. It should be noted that the computer readable medium in the present disclosure may be a computer readable signal medium, a computer readable storage medium, or any combination of the two. For example, the computer readable storage medium may be, but not limited to: an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or element, or any combination of the above. A more specific example of the computer readable storage medium may include, but not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read only memory (ROM), an erasable programmable read only memory (EPROM or flash memory), a fibre, a portable compact disk read only memory (CD-ROM), an optical memory, a magnet memory or any suitable combination of the above. In the present disclosure, the computer readable medium may be any physical medium containing or storing programs, which may be used by a command execution system, apparatus or element or incorporated thereto. In the present disclosure, the computer readable signal medium may include a data signal that is propagated in a baseband or as a part of a carrier wave, which carries computer readable program codes. Such propagated data signal may be in various forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the above. The computer readable signal medium may also be any computer readable medium other than the computer readable storage medium. The computer readable medium is capable of transmitting, propagating or transferring programs for use by, or used in combination with, a command execution system, apparatus or element. The program codes contained on the computer readable medium may be transmitted with any suitable medium including, but not limited to, wireless, wired, optical cable, RF medium, or any suitable combination of the above.
A computer program code for executing the operations according to the present disclosure may be written in one or more programming languages or a combination thereof. The programming language includes an object-oriented programming language such as Java, Smalltalk and C++, and further includes a general procedural programming language such as “C” language or a similar programming language. The program codes may be executed entirely on a user computer, executed partially on the user computer, executed as a standalone package, executed partially on the user computer and partially on a remote computer, or executed entirely on the remote computer or a server. When the remote computer is involved, the remote computer may be connected to the user computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or be connected to an external computer (e.g., connected through Internet provided by an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate architectures, functions and operations that may be implemented according to the system, the method, and the computer program product of the various embodiments of the present disclosure. In this regard, each of the blocks in the flowcharts or block diagrams may represent a module, a program segment, or a code portion, the module, the program segment, or the code portion comprising one or more executable instructions for implementing specified logic functions. It should also be noted that, in some alternative implementations, the functions denoted by the blocks may occur in a sequence different from the sequences shown in the figures. For example, any two blocks presented in succession may be executed substantially in parallel, or they may sometimes be executed in a reverse sequence, depending on the function involved. It should also be noted that each block in the block diagrams and/or flowcharts as well as a combination of blocks may be implemented using a dedicated hardware-based system executing specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented by means of software or hardware. The described units may also be provided in a processor. For example, the processor may be described as: a processor comprising an acquiring unit, an adjusting unit and a processing unit. The names of these units do not in some cases constitute a limitation to such units themselves. For example, the acquiring unit may also be described as “a unit for acquiring a number of targets.”
In another aspect, the present disclosure further provides a computer readable medium. The computer readable medium may be the computer readable medium included in the apparatus described in the above embodiments, or a stand-alone computer readable medium not assembled into the apparatus. The computer readable medium carries one or more programs. The one or more programs, when executed by the apparatus, cause the apparatus to: acquire a to-be-adjusted number of target execution units, the target execution unit referring to a unit executing a target program segment in a stream computing system; adjust a number of the target execution units in the stream computing system based on the to-be-adjusted number; determine, for a target execution unit in at least one target execution unit after the adjustment, an identifier set corresponding to the target execution unit, an identifier in the identifier set being used to indicate to-be-processed data; and process, through the target execution unit, the to-be-processed data indicated by the identifier in the corresponding identifier set.
The above description is only an explanation for the preferred embodiments of the present disclosure and the applied technical principles. It should be appreciated by those skilled in the art that the inventive scope of the present disclosure is not limited to the technical solution formed by the particular combinations of the above technical features. The inventive scope should also cover other technical solutions formed by any combinations of the above technical features or equivalent features thereof without departing from the concept of the invention, for example, technical solutions formed by replacing the features as disclosed in the present disclosure with (but not limited to) technical features with similar functions.

Claims (9)

What is claimed is:
1. A method for processing stream data, comprising:
acquiring a to-be-adjusted number of target execution units, the target execution unit referring to a unit executing a target program segment in a stream computing system;
adjusting a number of the target execution units in the stream computing system based on the to-be-adjusted number; and
determining, for a target execution unit in at least one target execution unit after the adjustment, an identifier set corresponding to the target execution unit, an identifier in the identifier set being used to indicate to-be-processed data; persisting, according to the identifier set to which the identifier of the to-be-processed data generated through running of an upstream execution unit of the target execution unit belongs, the to-be-processed data generated through the upstream execution unit of the target execution unit into a database; and processing, through the target execution unit, the to-be-processed data indicated by the identifier in the corresponding identifier set,
wherein determining, for the target execution unit in at least one target execution unit after the adjustment, the identifier set corresponding to the target execution unit, comprises: adjusting a mapping relationship between each target execution unit and each identifier set according to a preset rule, wherein a total number of identifier sets remains unchanged before and after the adjustment of the number of the target execution units in the stream computing system.
2. The method according to claim 1, wherein after the processing, through the target execution unit, the to-be-processed data indicated by the identifier in the corresponding identifier set, the method further comprises:
sending indication information to the upstream execution unit of the target execution unit through the target execution unit, the indication information being used to indicate the to-be-processed data having been processed by the target execution unit.
3. The method according to claim 2, wherein the processing, through the target execution unit, the to-be-processed data indicated by the identifier in the corresponding identifier set includes:
starting the at least one target execution unit after the adjustment; and
receiving and processing, through the started target execution unit, the to-be-processed data that is sent by the upstream execution unit of the target execution unit and has been determined, from the persisted to-be-processed data indicated by the identifier included in the identifier set corresponding to the target execution unit.
4. The method according to claim 1, wherein the processing, through the target execution unit, the to-be-processed data indicated by the identifier in the corresponding identifier set includes:
de-duplicating, according to a historical record of receiving the to-be-processed data by the target execution unit in the stream computing system, the to-be-processed data sent to the target execution unit by the upstream execution unit of the target execution unit; and processing, through the target execution unit, the de-duplicated to-be-processed data indicated by the identifier in the corresponding identifier set.
5. An apparatus for processing stream data, comprising:
at least one processor; and
a memory storing instructions, the instructions when executed by the at least one processor, cause the at least one processor to perform operations, the operations comprising:
acquiring a to-be-adjusted number of target execution units, the target execution unit referring to a unit executing a target program segment in a stream computing system;
adjusting a number of the target execution units in the stream computing system based on the to-be-adjusted number; and
determining, for a target execution unit in at least one target execution unit after the adjustment, an identifier set corresponding to the target execution unit, an identifier in the identifier set being used to indicate to-be-processed data; persisting, according to the identifier set to which the identifier of the to-be-processed data generated through running of an upstream execution unit of the target execution unit belongs, the to-be-processed data generated through the upstream execution unit of the target execution unit into a database; and processing, through the target execution unit, the to-be-processed data indicated by the identifier in the corresponding identifier set,
wherein determining, for the target execution unit in at least one target execution unit after the adjustment, the identifier set corresponding to the target execution unit, comprises: adjusting a mapping relationship between each target execution unit and each identifier set according to a preset rule, wherein a total number of identifier sets remains unchanged before and after the adjustment of the number of the target execution units in the stream computing system.
6. The apparatus according to claim 5, wherein after the processing, through the target execution unit, the to-be-processed data indicated by the identifier in the corresponding identifier set, the operations further comprise:
sending indication information to the upstream execution unit of the target execution unit through the target execution unit, the indication information being used to indicate the to-be-processed data having been processed by the target execution unit.
7. The apparatus according to claim 6, wherein the processing, through the target execution unit, the to-be-processed data indicated by the identifier in the corresponding identifier set includes:
starting the at least one target execution unit after the adjustment; and
receiving and processing, through the started target execution unit, the to-be-processed data that is sent by the upstream execution unit of the target execution unit and has been determined, from the persisted to-be-processed data indicated by the identifier included in the identifier set corresponding to the target execution unit.
8. The apparatus according to claim 5, wherein the processing, through the target execution unit, the to-be-processed data indicated by the identifier in the corresponding identifier set includes:
de-duplicating, according to a historical record of receiving the to-be-processed data by the target execution unit in the stream computing system, the to-be-processed data sent to the target execution unit by the upstream execution unit of the target execution unit; and
processing, through the target execution unit, the de-duplicated to-be-processed data indicated by the identifier in the corresponding identifier set.
9. A non-transitory computer readable medium, storing a computer program, wherein the computer program, when executed by a processor, causes the processor to perform operations, the operations comprising:
acquiring a to-be-adjusted number of target execution units, the target execution unit referring to a unit executing a target program segment in a stream computing system;
adjusting a number of the target execution units in the stream computing system based on the to-be-adjusted number; and
determining, for a target execution unit in at least one target execution unit after the adjustment, an identifier set corresponding to the target execution unit, an identifier in the identifier set being used to indicate to-be-processed data; persisting, according to the identifier set to which the identifier of the to-be-processed data generated through running of an upstream execution unit of the target execution unit belongs, the to-be-processed data generated through the upstream execution unit of the target execution unit into a database; and processing, through the target execution unit, the to-be-processed data indicated by the identifier in the corresponding identifier set,
wherein determining, for the target execution unit in at least one target execution unit after the adjustment, the identifier set corresponding to the target execution unit, comprises: adjusting a mapping relationship between each target execution unit and each identifier set according to a preset rule, wherein a total number of identifier sets remains unchanged before and after the adjustment of the number of the target execution units in the stream computing system.
US16/503,145 2018-07-23 2019-07-03 Method and apparatus for processing data in process of expanding or reducing capacity of stream computing system Active 2040-01-06 US11416283B2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810812280.4A CN108984770A (en) 2018-07-23 2018-07-23 Method and apparatus for handling data
CN201810812280.4 2018-07-23

Publications (2)

Publication Number Publication Date
US20200026553A1 US20200026553A1 (en) 2020-01-23
US11416283B2 true US11416283B2 (en) 2022-08-16

Family

ID=64550176

Family Applications (1)

Application Number Title Priority Date Filing Date
US16/503,145 Active 2040-01-06 US11416283B2 (en) 2018-07-23 2019-07-03 Method and apparatus for processing data in process of expanding or reducing capacity of stream computing system

Country Status (2)

Country Link
US (1) US11416283B2 (en)
CN (1) CN108984770A (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111459667B (en) * 2020-03-27 2024-01-05 深圳市梦网科技发展有限公司 Data processing method, device, server and medium
CN113764110A (en) * 2021-01-29 2021-12-07 北京京东拓先科技有限公司 Data processing method and device, electronic equipment and storage medium

Citations (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101217491A (en) 2008-01-04 2008-07-09 杭州华三通信技术有限公司 A method of rectification processing unit load allocation method and device
US20100070828A1 (en) * 2006-11-02 2010-03-18 Panasonic Corporation Transmission method, transmitter apparatus and reception method
US20120265890A1 (en) * 2011-04-15 2012-10-18 International Business Machines Corporation Data streaming infrastructure for remote execution in a constrained environment
US20130013901A1 (en) * 1995-08-16 2013-01-10 Microunity Systems Engineering, Inc. System and apparatus for group floating-point inflate and deflate operations
US20130159255A1 (en) * 2011-12-20 2013-06-20 Hitachi Computer Peripherals, Co., Ltd. Storage system and method for controlling storage system
CN103530189A (en) 2013-09-29 2014-01-22 中国科学院信息工程研究所 Automatic scaling and migrating method and device oriented to stream data
CN103782270A (en) 2013-10-28 2014-05-07 华为技术有限公司 Method for managing stream processing system, and related apparatus and system
CN104252466A (en) 2013-06-26 2014-12-31 阿里巴巴集团控股有限公司 Stream computing processing method, equipment and system
CN104298556A (en) 2013-07-17 2015-01-21 华为技术有限公司 Allocation method and device for steam processing units
CN104424186A (en) 2013-08-19 2015-03-18 阿里巴巴集团控股有限公司 Method and device for realizing persistence in flow calculation application
US20150128150A1 (en) * 2012-08-02 2015-05-07 Fujitsu Limited Data processing method and information processing apparatus
CN104978232A (en) 2014-04-09 2015-10-14 阿里巴巴集团控股有限公司 Computation resource capacity expansion method for real-time stream-oriented computation, computation resource release method for real-time stream-oriented computation, computation resource capacity expansion device for real-time stream-oriented computation and computation resource release device for real-time stream-oriented computation
US20160078093A1 (en) * 2005-05-25 2016-03-17 Experian Marketing Solutions, Inc. Software and Metadata Structures for Distributed And Interactive Database Architecture For Parallel And Asynchronous Data Processing Of Complex Data And For Real-Time Query Processing
US20160171009A1 (en) * 2014-12-10 2016-06-16 International Business Machines Corporation Method and apparatus for data deduplication
US20160248688A1 (en) 2015-02-19 2016-08-25 International Business Machines Corporation Algorithmic changing in a streaming environment
US20160373494A1 (en) * 2014-03-06 2016-12-22 Huawei Technologies Co., Ltd. Data Processing Method in Stream Computing System, Control Node, and Stream Computing System
US20170091011A1 (en) * 2015-09-30 2017-03-30 Robert Bosch Gmbh Method and device for generating an output data stream
CN106874133A (en) 2017-01-17 2017-06-20 北京百度网讯科技有限公司 The troubleshooting of calculate node in streaming computing system
US20170201434A1 (en) * 2014-05-30 2017-07-13 Hewlett Packard Enterprise Development Lp Resource usage data collection within a distributed processing framework
US20170212894A1 (en) * 2014-08-01 2017-07-27 Hohai University Traffic data stream aggregate query method and system
US20180083839A1 (en) * 2016-09-22 2018-03-22 International Business Machines Corporation Operator fusion management in a stream computing environment
CN108073445A (en) 2016-11-18 2018-05-25 腾讯科技(深圳)有限公司 The back pressure processing method and system calculated based on distributed stream
US10178021B1 (en) * 2015-12-28 2019-01-08 Amazon Technologies, Inc. Clustered architecture design
US11089076B1 (en) * 2018-03-06 2021-08-10 Amazon Technologies, Inc. Automated detection of capacity for video streaming origin server

Patent Citations (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130013901A1 (en) * 1995-08-16 2013-01-10 Microunity Systems Engineering, Inc. System and apparatus for group floating-point inflate and deflate operations
US20160078093A1 (en) * 2005-05-25 2016-03-17 Experian Marketing Solutions, Inc. Software and Metadata Structures for Distributed And Interactive Database Architecture For Parallel And Asynchronous Data Processing Of Complex Data And For Real-Time Query Processing
US20100070828A1 (en) * 2006-11-02 2010-03-18 Panasonic Corporation Transmission method, transmitter apparatus and reception method
CN101217491A (en) 2008-01-04 2008-07-09 杭州华三通信技术有限公司 A method of rectification processing unit load allocation method and device
US20120265890A1 (en) * 2011-04-15 2012-10-18 International Business Machines Corporation Data streaming infrastructure for remote execution in a constrained environment
US20130159255A1 (en) * 2011-12-20 2013-06-20 Hitachi Computer Peripherals, Co., Ltd. Storage system and method for controlling storage system
US20150128150A1 (en) * 2012-08-02 2015-05-07 Fujitsu Limited Data processing method and information processing apparatus
CN104252466A (en) 2013-06-26 2014-12-31 阿里巴巴集团控股有限公司 Stream computing processing method, equipment and system
US20150026347A1 (en) * 2013-07-17 2015-01-22 Huawei Technologies Co., Ltd. Method and apparatus for allocating stream processing unit
CN104298556A (en) 2013-07-17 2015-01-21 华为技术有限公司 Allocation method and device for steam processing units
CN104424186A (en) 2013-08-19 2015-03-18 阿里巴巴集团控股有限公司 Method and device for realizing persistence in flow calculation application
CN103530189A (en) 2013-09-29 2014-01-22 中国科学院信息工程研究所 Automatic scaling and migrating method and device oriented to stream data
CN103782270A (en) 2013-10-28 2014-05-07 华为技术有限公司 Method for managing stream processing system, and related apparatus and system
US20160373494A1 (en) * 2014-03-06 2016-12-22 Huawei Technologies Co., Ltd. Data Processing Method in Stream Computing System, Control Node, and Stream Computing System
US10097595B2 (en) * 2014-03-06 2018-10-09 Huawei Technologies Co., Ltd. Data processing method in stream computing system, control node, and stream computing system
CN104978232A (en) 2014-04-09 2015-10-14 阿里巴巴集团控股有限公司 Computation resource capacity expansion method for real-time stream-oriented computation, computation resource release method for real-time stream-oriented computation, computation resource capacity expansion device for real-time stream-oriented computation and computation resource release device for real-time stream-oriented computation
US20150295970A1 (en) * 2014-04-09 2015-10-15 Alibaba Group Holding Limited Method and device for augmenting and releasing capacity of computing resources in real-time stream computing system
US20170201434A1 (en) * 2014-05-30 2017-07-13 Hewlett Packard Enterprise Development Lp Resource usage data collection within a distributed processing framework
US20170212894A1 (en) * 2014-08-01 2017-07-27 Hohai University Traffic data stream aggregate query method and system
US20160171009A1 (en) * 2014-12-10 2016-06-16 International Business Machines Corporation Method and apparatus for data deduplication
US20160248688A1 (en) 2015-02-19 2016-08-25 International Business Machines Corporation Algorithmic changing in a streaming environment
US20170091011A1 (en) * 2015-09-30 2017-03-30 Robert Bosch Gmbh Method and device for generating an output data stream
US10178021B1 (en) * 2015-12-28 2019-01-08 Amazon Technologies, Inc. Clustered architecture design
US20180083839A1 (en) * 2016-09-22 2018-03-22 International Business Machines Corporation Operator fusion management in a stream computing environment
CN108073445A (en) 2016-11-18 2018-05-25 腾讯科技(深圳)有限公司 The back pressure processing method and system calculated based on distributed stream
CN106874133A (en) 2017-01-17 2017-06-20 北京百度网讯科技有限公司 The troubleshooting of calculate node in streaming computing system
US11089076B1 (en) * 2018-03-06 2021-08-10 Amazon Technologies, Inc. Automated detection of capacity for video streaming origin server

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Chinese Office Action for Chinese Application No. 201810812280.4, dated May 24, 2021, 10 pages.

Also Published As

Publication number Publication date
CN108984770A (en) 2018-12-11
US20200026553A1 (en) 2020-01-23

Similar Documents

Publication Publication Date Title
US20220253458A1 (en) Method and device for synchronizing node data
CN109145023B (en) Method and apparatus for processing data
US10122598B2 (en) Subscription watch lists for event handling
CN111277639B (en) Method and device for maintaining data consistency
US11416283B2 (en) Method and apparatus for processing data in process of expanding or reducing capacity of stream computing system
CN113485962B (en) Log file storage method, device, equipment and storage medium
US10601915B2 (en) Data stream processor with both in memory and persisted messaging
EP3825865A2 (en) Method and apparatus for processing data
CN111338834B (en) Data storage method and device
CN113779452B (en) Data processing method, device, equipment and storage medium
CN113282589A (en) Data acquisition method and device
US11048555B2 (en) Method, apparatus, and computer program product for optimizing execution of commands in a distributed system
US20240069991A1 (en) Abnormal request processing method and apparatus, electronic device and storage medium
US11277300B2 (en) Method and apparatus for outputting information
CN113760487B (en) Service processing method and device
CN114115941A (en) Resource sending method, page rendering method, device, electronic equipment and medium
CN114785770A (en) Mirror layer file sending method and device, electronic equipment and computer readable medium
CN114051024A (en) File background continuous transmission method and device, storage medium and electronic equipment
CN113760929A (en) Data synchronization method and device, electronic equipment and computer readable medium
CN110019671B (en) Method and system for processing real-time message
CN115250276A (en) Distributed system and data processing method and device
CN112784139A (en) Query method, query device, electronic equipment and computer readable medium
CN113761548B (en) Data transmission method and device for Shuffle process
CN114268558B (en) Method, device, equipment and medium for generating monitoring graph
CN116431523B (en) Test data management method, device, equipment and storage medium

Legal Events

Date Code Title Description
AS Assignment

Owner name: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD., CHINA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:GAO, WEIKANG;WANG, YANLIN;XING, YUE;AND OTHERS;REEL/FRAME:049678/0817

Effective date: 20180827

FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: ADVISORY ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED

STCF Information on status: patent grant

Free format text: PATENTED CASE