CN109145023A - Method and apparatus for handling data - Google Patents


Publication number
CN109145023A
Authority
CN
China
Prior art keywords
data
mark
processing node
pending
data processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811003311.8A
Other languages
Chinese (zh)
Other versions
CN109145023B (en
Inventor
徐德传
邢越
程怡
张建伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201811003311.8A (patent CN109145023B)
Publication of CN109145023A
Application granted
Publication of CN109145023B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

Embodiments of the present application disclose a method and apparatus for processing data. One specific embodiment of the method includes: obtaining to-be-processed data and an identifier of the to-be-processed data from an upstream data processing node of a target data processing node to which a data stream flows in a streaming computing system, where the streaming computing system includes a set of data processing nodes for processing the data stream, the data stream flows in through the data processing node in the set that serves as the entry and flows out of the system after passing through at least one data processing node in the set, and the target data processing node includes an execution unit that executes a program segment, submitted by a user of the streaming computing system, representing data processing logic; sending the to-be-processed data and its identifier to the execution unit; and obtaining the processing result data, corresponding to the to-be-processed data, generated by running the execution unit, together with an identifier of the processing result data. This embodiment improves data processing efficiency.

Description

Method and apparatus for handling data
Technical field
The embodiments of the present application relate to the field of computer technology, and more particularly to a method and apparatus for processing data.
Background
Streaming computing is widely used in large-scale distributed computing scenarios such as information feeds, search index building, and retrieval billing. Streaming computing is a pipeline-like data processing mode. Its core idea is that data is processed as soon as an event occurs, rather than being cached and processed in batches.
In existing streaming computing systems such as Storm (a distributed, fault-tolerant real-time computation system), the jobs submitted by users exchange data with the Storm platform based on JSON (JavaScript Object Notation). The data exchange protocol is complex, and users must understand both Storm and JSON before they can use the streaming computing system to process data.
Summary of the invention
The embodiments of the present application propose a method and apparatus for processing data.
In a first aspect, an embodiment of the present application provides a method for processing data, the method including: obtaining to-be-processed data and an identifier of the to-be-processed data from an upstream data processing node of a target data processing node to which a data stream flows in a streaming computing system, where the streaming computing system includes a set of data processing nodes for processing the data stream, the data stream flows in through the data processing node in the set that serves as the entry and flows out of the system after passing through at least one data processing node in the set, and the target data processing node includes an execution unit that executes a program segment, submitted by a user of the streaming computing system, representing data processing logic; sending the to-be-processed data and its identifier to the execution unit; and obtaining the processing result data, corresponding to the to-be-processed data, generated by running the execution unit, together with an identifier of the processing result data.
In some embodiments, after obtaining the processing result data corresponding to the to-be-processed data generated by running the execution unit, the method further includes: persisting the processing result data and the identifier of the processing result data; sending the processing result data and its identifier to a downstream data processing node of the target data processing node; and, in response to obtaining indication information indicating that the processing result data has been processed, clearing the persisted processing result data and its identifier.
In some embodiments, after sending the processing result data and its identifier to the downstream data processing node of the target data processing node, the method further includes: in response to obtaining indication information indicating that processing of the processing result data has failed, replaying the persisted processing result data and its identifier to the downstream data processing node of the target data processing node.
In some embodiments, sending the to-be-processed data and its identifier to the execution unit includes: sending the to-be-processed data and its identifier to the execution unit through a process pipe.
In some embodiments, sending the to-be-processed data and its identifier to the execution unit includes: sending the to-be-processed data and its identifier to the execution unit according to a preset protocol, where the preset protocol specifies the fields included in a row of data sent to the target data processing node and the separator used between fields, the fields including the identifier of the data, the key of the data, the value of the data, and a flag used to determine whether the data has already been processed.
In some embodiments, the method further includes: in response to obtaining an execution request that includes an identifier of a target program segment, creating the execution unit to execute the target program segment.
In some embodiments, the topological relationship of the data processing nodes in the streaming computing system is described by a preset configuration file, the configuration file including an Extensible Markup Language (XML) configuration file; and the method further includes: generating and displaying a topology diagram of the streaming computing system based on the configuration file.
In a second aspect, an embodiment of the present application provides an apparatus for processing data, the apparatus including: a first obtaining unit, configured to obtain to-be-processed data and an identifier of the to-be-processed data from an upstream data processing node of a target data processing node to which a data stream flows in a streaming computing system, where the streaming computing system includes a set of data processing nodes for processing the data stream, the data stream flows in through the data processing node in the set that serves as the entry and flows out of the system after passing through at least one data processing node in the set, and the target data processing node includes an execution unit that executes a program segment, submitted by a user of the streaming computing system, representing data processing logic; a first sending unit, configured to send the to-be-processed data and its identifier to the execution unit; and a second obtaining unit, configured to obtain the processing result data, corresponding to the to-be-processed data, generated by running the execution unit, together with an identifier of the processing result data.
In some embodiments, the apparatus further includes: a persistence unit, configured to persist the processing result data and the identifier of the processing result data; a second sending unit, configured to send the processing result data and its identifier to a downstream data processing node of the target data processing node; and a clearing unit, configured to clear the persisted processing result data and its identifier in response to obtaining indication information indicating that the processing result data has been processed.
In some embodiments, the apparatus further includes: a replay unit, configured to replay the persisted processing result data and its identifier to the downstream data processing node of the target data processing node in response to obtaining indication information indicating that processing of the processing result data has failed.
In some embodiments, the first sending unit is further configured to send the to-be-processed data and its identifier to the execution unit through a process pipe.
In some embodiments, the first sending unit is further configured to send the to-be-processed data and its identifier to the execution unit according to a preset protocol, where the preset protocol specifies the fields included in a row of data sent to the target data processing node and the separator used between fields, the fields including the identifier of the data, the key of the data, the value of the data, and a flag used to determine whether the data has already been processed.
In some embodiments, the apparatus further includes: a creating unit, configured to create the execution unit to execute a target program segment in response to obtaining an execution request that includes an identifier of the target program segment.
In some embodiments, the topological relationship of the data processing nodes in the streaming computing system is described by a preset configuration file, the configuration file including an Extensible Markup Language (XML) configuration file; and the apparatus further includes: a display unit, configured to generate and display a topology diagram of the streaming computing system based on the configuration file.
In a third aspect, an embodiment of the present application provides a device, including: one or more processors; and a storage apparatus storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method described in the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable medium storing a computer program which, when executed by a processor, implements the method described in the first aspect.
In the method and apparatus for processing data provided by the embodiments of the present application, to-be-processed data and its identifier are obtained from an upstream data processing node of the target data processing node to which the data stream flows in the streaming computing system, where the target data processing node includes an execution unit that executes a program segment, submitted by a user of the streaming computing system, representing data processing logic. The to-be-processed data and its identifier are then sent to the execution unit, and finally the processing result data, corresponding to the to-be-processed data, generated by running the execution unit is obtained together with its identifier. As a result, a user of the streaming computing system only needs to submit a program segment written in a programming language the user already knows in order to process the data, which improves data processing efficiency.
Brief description of the drawings
Other features, objects, and advantages of the present application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 is a diagram of an exemplary system architecture to which the present application can be applied;
Fig. 2 is a flowchart of one embodiment of the method for processing data according to the present application;
Fig. 3 is a schematic diagram of an application scenario of the method for processing data according to the present application;
Fig. 4 is a flowchart of another embodiment of the method for processing data according to the present application;
Fig. 5 is a schematic structural diagram of one embodiment of the apparatus for processing data according to the present application;
Fig. 6 is a schematic structural diagram of a computer system suitable for implementing the server of the embodiments of the present application.
Detailed description of the embodiments
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the relevant invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts relevant to the invention.
It should be noted that, where no conflict arises, the embodiments of the present application and the features in those embodiments may be combined with one another. The present application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the method for processing data or the apparatus for processing data of the present application can be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links or fiber optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 in order to receive or send messages and the like. Various client applications may be installed on the terminal devices 101, 102, 103, such as streaming computing applications, social applications, and search applications.
The terminal devices 101, 102, 103 may be hardware or software. When they are hardware, they may be various electronic devices with display screens, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers. When they are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (for example, to provide data processing services) or as a single piece of software or software module. No specific limitation is made here.
The server 105 may be a server that provides various services, for example a background server that supports the applications installed on the terminal devices 101, 102, 103. The server 105 may obtain to-be-processed data and an identifier of the to-be-processed data from an upstream data processing node of a target data processing node to which a data stream flows in a streaming computing system, where the streaming computing system includes a set of data processing nodes for processing the data stream, the data stream flows in through the data processing node in the set that serves as the entry and flows out of the system after passing through at least one data processing node in the set, and the target data processing node includes an execution unit that executes a program segment, submitted by a user of the streaming computing system through the terminal devices 101, 102, 103, representing data processing logic; send the to-be-processed data and its identifier to the execution unit; and obtain the processing result data, corresponding to the to-be-processed data, generated by running the execution unit, together with an identifier of the processing result data.
It should be noted that the method for processing data provided by the embodiments of the present application may be executed by the server 105; correspondingly, the apparatus for processing data may be provided in the server 105.
It should be noted that the server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative. Any number of terminal devices, networks, and servers may be provided according to implementation needs.
With continued reference to Fig. 2, a flow 200 of one embodiment of the method for processing data according to the present application is shown. The method for processing data includes the following steps:
Step 201: obtain to-be-processed data and an identifier of the to-be-processed data from an upstream data processing node of the target data processing node to which the data stream flows in the streaming computing system.
In this embodiment, the execution body of the method for processing data (for example, the server shown in Fig. 1) may first obtain the to-be-processed data and its identifier from an upstream data processing node of the target data processing node to which the data stream flows in the streaming computing system. The streaming computing system includes a set of data processing nodes for processing the data stream; the data stream flows in through the data processing node in the set that serves as the entry and flows out of the system after passing through at least one data processing node in the set. The target data processing node includes an execution unit that executes a program segment, submitted by a user of the streaming computing system, representing data processing logic. The programming language of this program segment may differ from the programming language used by the streaming computing system itself, and the program segment may be written in any of a variety of programming languages, for example a language the user already knows.
Here, the streaming computing system may include a control node and multiple worker nodes. A worker node, that is, a data processing node, may also be called an operator. The control node can send corresponding control instructions to its subordinate worker nodes, so that a worker node invokes an execution unit according to the control instruction to process the data stream generated by the business. Each worker node may include one or more execution units; when a worker node is invoked to process the data stream, the processing is actually performed by the execution units it contains. An execution unit may specifically be a thread or a process. As an example, the target data processing node may also include an execution unit that executes a program segment of the operating logic of the streaming computing system, and the execution body may specifically be that execution unit. The streaming computing system may contain several streaming computing jobs, each composed of independent pieces of computation logic connected according to upstream-downstream subscription relationships.
In this embodiment, an upstream data processing node of the target data processing node may be a data processing node that provides to-be-processed data to the target data processing node. The to-be-processed data may be generated by running the upstream data processing node, and its identifier may be generated according to a preset rule, for example from information such as the order in which the data was generated, the generation time, the storage location, or the source.
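As a non-limiting illustration of such a preset rule, the following Python sketch builds an identifier from the source, the generation time, and the generation order. The function name `make_id_factory`, the identifier layout, and the injectable `clock` parameter are assumptions made for this example; the application does not prescribe any particular format.

```python
import itertools
import time


def make_id_factory(source, clock=time.time):
    """Return a function that yields a new identifier on every call.

    Hypothetical rule: "<source>-<generation time in ms>-<sequence number>".
    """
    counter = itertools.count()  # generation order within this source

    def next_id():
        millis = int(clock() * 1000)  # generation time
        return f"{source}-{millis}-{next(counter)}"

    return next_id


if __name__ == "__main__":
    next_id = make_id_factory("upstream-node-1")
    print(next_id())
    print(next_id())
```

Including the sequence number keeps identifiers distinct even when two records are generated within the same millisecond.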
Step 202: send the to-be-processed data and its identifier to the execution unit.
In this embodiment, the execution body may send the to-be-processed data and its identifier obtained in step 201 to the execution unit. It may do so by means such as signals, pipes, message queues, or shared memory.
In some optional implementations of this embodiment, sending the to-be-processed data and its identifier to the execution unit includes: sending the to-be-processed data and its identifier to the execution unit through a process pipe. A process pipe can be created by calling the pipe function. A pipe is half-duplex, so data can flow in only one direction; when both sides need to communicate, two pipes must be set up. To the processes at its two ends, a pipe is just a file, but not an ordinary file: it does not belong to any particular file system, forms a separate file system of its own, and exists only in memory. Content that one process writes into the pipe is read by the process at the other end. Each write appends to the end of the pipe buffer, and each read takes data from the head of the buffer. Transmitting data through a pipe is simple and convenient, which further improves the convenience of data processing.
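The two-pipe arrangement described above can be sketched in Python as follows. This is only an illustrative sketch, not the implementation of the application: the user program segment `USER_SEGMENT`, the tab-separated line layout, and the helper `run_through_pipes` are assumptions introduced for the example. It uses the standard `subprocess` module, which creates one pipe for the child's standard input and another for its standard output.

```python
import subprocess
import sys

# Hypothetical user-submitted program segment. It reads "id<TAB>value" lines
# from stdin and writes "id<TAB>VALUE" lines to stdout.
USER_SEGMENT = r"""
import sys
for line in sys.stdin:
    record_id, value = line.rstrip("\n").split("\t", 1)
    sys.stdout.write(record_id + "\t" + value.upper() + "\n")
    sys.stdout.flush()
"""


def run_through_pipes(records):
    """Send (id, value) pairs to a child process and collect (id, result) pairs."""
    child = subprocess.Popen(
        [sys.executable, "-c", USER_SEGMENT],
        stdin=subprocess.PIPE,   # pipe 1: node -> execution unit
        stdout=subprocess.PIPE,  # pipe 2: execution unit -> node
        text=True,
    )
    payload = "".join(f"{rid}\t{value}\n" for rid, value in records)
    # communicate() writes the payload, closes stdin, and reads stdout to EOF.
    out, _ = child.communicate(payload)
    return [tuple(line.split("\t", 1)) for line in out.splitlines()]


if __name__ == "__main__":
    print(run_through_pipes([("1", "hello"), ("2", "world")]))
```

Because the child only reads standard input and writes standard output, the same arrangement would work for a program segment written in any language that can do line-oriented I/O.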
In some optional implementations of this embodiment, sending the to-be-processed data and its identifier to the execution unit includes: sending the to-be-processed data and its identifier to the execution unit according to a preset protocol, where the preset protocol specifies the fields included in a row of data sent to the target data processing node and the separator used between fields, the fields including the identifier of the data, the key of the data, the value of the data, and a flag used to determine whether the data has already been processed.
As an example, a row of data may use the following structure: identifier of the data // flag used to determine whether the data has already been processed // key of the data // value of the data. The separator "//" may be replaced by another separator, such as "/t/t", according to actual needs; the order of the fields may also be adjusted according to actual needs, and a row may further include a start marker or an end marker. Because row data is sent to the target data processing node according to the preset protocol, a single row of data is enough to process one piece of data, which further improves data processing efficiency.
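The example row structure above can be sketched in Python as a pair of encode and decode helpers. The names `Row`, `encode_row`, and `decode_row`, and the use of "1" and "0" for the processed flag, are assumptions made for illustration; the text only fixes the set of fields and the idea of a configurable separator.

```python
from dataclasses import dataclass

SEPARATOR = "//"  # example separator from the text; may be replaced as needed


@dataclass
class Row:
    data_id: str     # identifier of the data
    processed: bool  # flag: has the data already been processed?
    key: str         # key of the data
    value: str       # value of the data


def encode_row(row, sep=SEPARATOR):
    """Serialize a Row as: id <sep> flag <sep> key <sep> value."""
    head = [row.data_id, "1" if row.processed else "0", row.key]
    for field in head:
        if sep in field:
            raise ValueError(f"field {field!r} contains the separator {sep!r}")
    # The value goes last, so it may itself contain the separator.
    return sep.join(head + [row.value])


def decode_row(line, sep=SEPARATOR):
    """Parse one line back into a Row; split at most three times."""
    data_id, flag, key, value = line.split(sep, 3)
    return Row(data_id, flag == "1", key, value)


if __name__ == "__main__":
    line = encode_row(Row("42", False, "user", "a//b"))
    print(line)
    print(decode_row(line))
```

Placing the value in the last position and limiting the number of splits is one simple way to tolerate separators inside the value; an escaping scheme would be an alternative.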
Step 203: obtain the processing result data, corresponding to the to-be-processed data, generated by running the execution unit, together with the identifier of the processing result data.
In this embodiment, the execution body may likewise use means such as signals, pipes, message queues, or shared memory to obtain the processing result data generated by running the execution unit, which corresponds to the to-be-processed data sent in step 202, together with the identifier of the processing result data. The identifier of the processing result data may also be generated according to a preset rule, for example from information such as the order in which the data was generated, the generation time, the storage location, or the source.
In some optional implementations of this embodiment, the method further includes: in response to obtaining an execution request that includes an identifier of a target program segment, creating the execution unit to execute the target program segment. In contrast to Storm, which requires the process that executes the target program segment to exist before the user's job runs on the platform, this implementation creates the execution unit to execute the target program segment according to the execution request. This provides management of the life cycle of the execution unit and further improves the flexibility of data processing.
In some optional implementations of this embodiment, the topological relationship of the data processing nodes in the streaming computing system is described by a preset configuration file, the configuration file including an Extensible Markup Language (XML) configuration file; and the method further includes: generating and displaying a topology diagram of the streaming computing system based on the configuration file. This implementation presents the topology of the streaming computing system clearly. In addition, the user can control the data processing nodes of the streaming computing system from the displayed page, which further improves the flexibility of the streaming computing system.
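As a rough illustration of deriving a topology diagram from an XML configuration file, the Python sketch below parses a small configuration and emits a Graphviz DOT description of the node graph. The XML schema shown (`topology`, `node`, `upstream`) is entirely hypothetical, since the application does not specify the file layout, and rendering as DOT text is likewise just one possible way to produce a diagram.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration: each node lists the upstream nodes it subscribes to.
CONFIG = """
<topology>
  <node name="source"/>
  <node name="parse"><upstream>source</upstream></node>
  <node name="count"><upstream>parse</upstream></node>
</topology>
"""


def load_edges(xml_text):
    """Return (upstream, downstream) pairs described by the configuration."""
    root = ET.fromstring(xml_text.strip())
    edges = []
    for node in root.findall("node"):
        for upstream in node.findall("upstream"):
            edges.append((upstream.text.strip(), node.get("name")))
    return edges


def to_dot(edges):
    """Render the edges as Graphviz DOT text that a viewer can draw."""
    lines = ["digraph topology {"]
    lines += [f'  "{src}" -> "{dst}";' for src, dst in edges]
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot(load_edges(CONFIG)))
```

The same edge list could also drive the sending step of a node, since it records which downstream nodes subscribe to which upstream nodes.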
With continued reference to Fig. 3, Fig. 3 is a schematic diagram of an application scenario of the method for processing data according to this embodiment. In the application scenario of Fig. 3, the to-be-processed data and its identifier 302 are first obtained from the upstream data processing node 301 of the target data processing node to which the data stream flows in the streaming computing system. The streaming computing system includes a set of data processing nodes for processing the data stream; the data stream flows in through the data processing node in the set that serves as the entry and flows out of the system after passing through at least one data processing node in the set. The target data processing node includes an execution unit 303 that executes a program segment 304, submitted by a user of the streaming computing system through a device 305, representing data processing logic. The to-be-processed data and its identifier 302 are then sent to the execution unit 303. Finally, the processing result data, corresponding to the to-be-processed data, generated by running the execution unit 303 is obtained together with its identifier 306.
In the method provided by the above embodiment of the present application, to-be-processed data and its identifier are obtained from an upstream data processing node of the target data processing node to which the data stream flows in the streaming computing system, where the streaming computing system includes a set of data processing nodes for processing the data stream, the data stream flows in through the data processing node in the set that serves as the entry and flows out of the system after passing through at least one data processing node in the set, and the target data processing node includes an execution unit that executes a program segment, submitted by a user of the streaming computing system, representing data processing logic; the to-be-processed data and its identifier are sent to the execution unit; and the processing result data, corresponding to the to-be-processed data, generated by running the execution unit is obtained together with its identifier. This improves data processing efficiency.
With further reference to Fig. 4, a flow 400 of another embodiment of the method for processing data is shown. The flow 400 of the method for processing data includes the following steps:
Step 401: obtain to-be-processed data and an identifier of the to-be-processed data from an upstream data processing node of the target data processing node to which the data stream flows in the streaming computing system.
In this embodiment, the execution body of the method for processing data (for example, the server shown in Fig. 1) may first obtain the to-be-processed data and its identifier from an upstream data processing node of the target data processing node to which the data stream flows in the streaming computing system.
Step 402: send the to-be-processed data and its identifier to the execution unit.
In this embodiment, the execution body may send the to-be-processed data and its identifier obtained in step 401 to the execution unit.
Step 403: obtain the processing result data, corresponding to the to-be-processed data, generated by running the execution unit, together with the identifier of the processing result data.
In this embodiment, the execution body may obtain the processing result data generated by running the execution unit, which corresponds to the to-be-processed data sent in step 402, together with the identifier of the processing result data.
Step 404: persist the processing result data and the identifier of the processing result data.
In this embodiment, the execution body may persist the processing result data and its identifier obtained in step 403. Persistence is a mechanism for converting program data between a persistent state and a transient state; that is, transient data (for example, data in memory, which cannot be retained permanently) is turned into persistent data (for example, by being written to a database, where it can be kept permanently). Persistence may include full persistence and incremental persistence. Incremental persistence avoids duplicating data, which further improves data processing efficiency.
Step 405: send the processing result data and the identifier of the processing result data to a downstream data processing node of the target data processing node.
In this embodiment, the execution body may send the processing result data and its identifier obtained in step 403 to a downstream data processing node of the target data processing node according to the upstream-downstream subscription relationships indicated in the preset configuration file.
Step 405, the instruction information having been processed in response to getting instruction processing result data, removes persistence The mark of processing result data and processing result data.
In the present embodiment, above-mentioned executing subject can have been processed in response to getting instruction processing result data It indicates information, removes the processing result data of persistence and the mark of processing result data.Indicate that information may include by under The mark of data that trip data processing node is properly received and handles.As an example, can be by confirming character (Acknowledgement, ACK) is realized, in data communication, confirmation character can be a kind of biography that recipient issues sender Defeated class control character indicates that the data sent have confirmed that reception is errorless.By indicating that the data of erasing of information persistence can be into One step saves memory space.
In some optional implementations of this embodiment, after the processing result data and the identifier of the processing result data are sent to the downstream data processing node of the target data processing node, the method further includes: in response to obtaining indication information indicating that processing of the processing result data has failed, replaying the persisted processing result data and the identifier of the processing result data to the downstream data processing node of the target data processing node. The processing failure may be caused by, for example, a restart of the target data processing node. Replaying the data prevents data from being missed in the streaming computing system, which further improves data-processing efficiency.
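A minimal sketch of the replay step follows. The function name `replay_failed`, the dictionary used as the persisted store, and the `send` callback are illustrative assumptions; the text only states that persisted results are resent downstream when a failure is indicated.

```python
def replay_failed(persisted, failed_ids, send):
    """Resend persisted results named in a failure indication.

    persisted  : mapping of result identifier -> result payload
    failed_ids : identifiers reported as not successfully processed
    send       : callable(data_id, payload) that delivers one result downstream
    Returns the identifiers that were replayed.
    """
    replayed = []
    for data_id in failed_ids:
        if data_id in persisted:
            send(data_id, persisted[data_id])
            replayed.append(data_id)
    return replayed
```

Because the entries stay in the store until they are acknowledged, a replay can be repeated if the downstream node fails again.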
In this embodiment, the operations of steps 401, 402, and 403 are substantially the same as those of steps 201, 202, and 203, and are not repeated here.
As can be seen from Fig. 4, compared with the embodiment corresponding to Fig. 2, the flow 400 of the method for processing data in this embodiment persists the processing result data and its identifier, which can avoid data loss in the streaming computing system and further improve data-processing efficiency.
With further reference to Fig. 5, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an apparatus for processing data. The apparatus embodiment corresponds to the method embodiment shown in Fig. 2, and the apparatus may be applied in various electronic devices.
As shown in Fig. 5, the apparatus 500 for processing data of this embodiment includes a first acquisition unit 501, a first sending unit 502, and a second acquisition unit 503. The first acquisition unit is configured to obtain pending data and an identifier of the pending data from an upstream data processing node of a target data processing node to which a data stream flows in a streaming computing system. The streaming computing system includes a set of data processing nodes for processing the data stream; the data stream flows in through the data processing node in the set that serves as the entry and, after passing through at least one data processing node in the set, flows out of the streaming computing system. The target data processing node includes an execution unit that executes a program segment, submitted by a user of the streaming computing system, that characterizes data processing logic. The first sending unit is configured to send the pending data and the identifier of the pending data to the execution unit. The second acquisition unit is configured to obtain the processing result data, corresponding to the pending data, that is generated by running the execution unit, together with the identifier of the processing result data.
In this embodiment, for the specific processing of the first acquisition unit 501, the first sending unit 502, and the second acquisition unit 503 of the apparatus 500 for processing data, reference may be made to steps 201, 202, and 203 in the embodiment corresponding to Fig. 2.
In some optional implementations of this embodiment, the apparatus further includes: a persistence unit configured to persist the processing result data and the identifier of the processing result data; a second sending unit configured to send the processing result data and the identifier of the processing result data to the downstream data processing node of the target data processing node; and a clearing unit configured to, in response to obtaining indication information indicating that the processing result data has been processed, clear the persisted processing result data and the identifier of the processing result data.
In some optional implementations of this embodiment, the apparatus further includes: a replay unit configured to, in response to obtaining indication information indicating that processing of the processing result data has failed, replay the persisted processing result data and the identifier of the processing result data to the downstream data processing node of the target data processing node.
In some optional implementations of this embodiment, the first sending unit is further configured to send the pending data and the identifier of the pending data to the execution unit through a process pipe.
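Sending data to an execution unit through a process pipe can be sketched as follows. This is an illustration under stated assumptions: the execution unit is modelled as a child Python process whose user logic simply upper-cases each value, the tab-separated line layout is hypothetical, and `run_through_pipe` is not a name from the source.

```python
import subprocess
import sys

# Hypothetical user-submitted program segment: reads "id<TAB>value" lines
# from standard input and writes "id<TAB>RESULT" lines to standard output.
CHILD_SOURCE = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    data_id, value = line.rstrip('\\n').split('\\t', 1)\n"
    "    print(data_id + '\\t' + value.upper())\n"
)


def run_through_pipe(records):
    """Send (identifier, value) pairs to a child process over its stdin pipe
    and collect the (identifier, result) pairs it writes to its stdout pipe."""
    payload = "".join(f"{data_id}\t{value}\n" for data_id, value in records)
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        input=payload,
        capture_output=True,
        text=True,
        check=True,
    )
    return [tuple(line.split("\t", 1)) for line in proc.stdout.splitlines()]
```

Because the child only reads standard input and writes standard output, the user logic could be written in any language that can do the same.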
In some optional implementations of this embodiment, the first sending unit is further configured to send the pending data and the identifier of the pending data to the execution unit according to a preset protocol. The preset protocol specifies the fields included in each row of data sent to the target data processing node and the separator used between fields. The fields include the identifier of the data, the key of the data, the value of the data, and a flag for determining whether the data has been processed.
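A row format of this kind might be encoded and decoded as sketched below. The four fields come from the text; the tab separator, the field order, the "0"/"1" flag encoding, and the names `Row`, `encode_row`, and `decode_row` are assumptions made for illustration.

```python
from typing import NamedTuple

FIELD_SEPARATOR = "\t"  # assumption: the source does not name the separator


class Row(NamedTuple):
    data_id: str  # identifier of the data
    key: str      # key of the data
    value: str    # value of the data
    done: bool    # flag: has this data been processed?


def encode_row(row):
    """Serialize one row as a single separator-delimited line (no newline)."""
    fields = (row.data_id, row.key, row.value, "1" if row.done else "0")
    for field in fields:
        if FIELD_SEPARATOR in field or "\n" in field:
            raise ValueError("field contains a separator or newline")
    return FIELD_SEPARATOR.join(fields)


def decode_row(line):
    """Parse one line produced by encode_row back into a Row."""
    parts = line.rstrip("\n").split(FIELD_SEPARATOR)
    if len(parts) != 4:
        raise ValueError("expected exactly 4 fields")
    data_id, key, value, flag = parts
    if flag not in ("0", "1"):
        raise ValueError("flag must be '0' or '1'")
    return Row(data_id, key, value, flag == "1")
```

Rejecting fields that contain the separator keeps the format unambiguous without an escaping scheme; a real protocol might choose escaping instead.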
In some optional implementations of this embodiment, the apparatus further includes: a creating unit configured to, in response to obtaining an execution request that includes the identifier of a target program segment, create an execution unit to execute the target program segment.
In some optional implementations of this embodiment, the topological relationship of the data processing nodes in the streaming computing system is described by a preset configuration file, and the configuration file includes an Extensible Markup Language (XML) configuration file; and the apparatus further includes: a presentation unit configured to generate and present a topological diagram of the streaming computing system based on the configuration file.
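Deriving a topology diagram from an XML configuration file can be sketched as follows. The XML schema shown (`topology`, `node`, and `upstream` elements with `id` and `ref` attributes) is entirely hypothetical, since the source does not define one, and emitting Graphviz DOT text is just one possible way to present the diagram.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration: each node lists the nodes it subscribes to.
EXAMPLE_CONFIG = """<topology>
  <node id="source"/>
  <node id="parse"><upstream ref="source"/></node>
  <node id="count"><upstream ref="parse"/></node>
</topology>"""


def load_edges(xml_text):
    """Return (upstream, downstream) pairs described by the configuration."""
    root = ET.fromstring(xml_text)
    edges = []
    for node in root.findall("node"):
        for upstream in node.findall("upstream"):
            edges.append((upstream.get("ref"), node.get("id")))
    return edges


def to_dot(edges):
    """Render the edges as Graphviz DOT text for display as a diagram."""
    lines = ["digraph topology {"]
    lines += [f'  "{src}" -> "{dst}";' for src, dst in edges]
    lines.append("}")
    return "\n".join(lines)
```

The same edge list could also drive the upstream-downstream subscription routing, so the diagram and the running system stay consistent.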
The apparatus provided by the above embodiment of the present application obtains pending data and an identifier of the pending data from an upstream data processing node of a target data processing node to which a data stream flows in a streaming computing system, where the streaming computing system includes a set of data processing nodes for processing the data stream, the data stream flows in through the data processing node in the set that serves as the entry and, after passing through at least one data processing node in the set, flows out of the streaming computing system, and the target data processing node includes an execution unit that executes a program segment, submitted by a user of the streaming computing system, that characterizes data processing logic; sends the pending data and the identifier of the pending data to the execution unit; and obtains the processing result data, corresponding to the pending data, that is generated by running the execution unit, together with the identifier of the processing result data, thereby improving data-processing efficiency.
Referring now to Fig. 6, a schematic structural diagram is shown of a computer system 600 of a server suitable for implementing the embodiments of the present application. The server shown in Fig. 6 is merely an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in Fig. 6, the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the system 600. The CPU 601, the ROM 602, and the RAM 603 are connected to one another by a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a cathode-ray tube (CRT), a liquid crystal display (LCD), a loudspeaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage section 608 as needed.
In particular, according to embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product that includes a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the above functions defined in the method of the present application are performed. It should be noted that the computer-readable medium described herein may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this application, a computer-readable storage medium may be any tangible medium that contains or stores a program, where the program may be used by or in connection with an instruction execution system, apparatus, or device. Also in this application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and such a medium can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted over any suitable medium, including but not limited to: wireless, wire, optical cable, RF, and the like, or any suitable combination of the above.
Computer program code for performing the operations of the present application may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the C language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should further be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor; for example, this may be described as: a processor including a first acquisition unit, a first sending unit, and a second acquisition unit. The names of these units do not, in some cases, limit the units themselves; for example, the first sending unit may also be described as "a unit configured to send pending data and an identifier of the pending data to an execution unit".
In another aspect, the present application also provides a computer-readable medium, which may be included in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: obtain pending data and an identifier of the pending data from an upstream data processing node of a target data processing node to which a data stream flows in a streaming computing system, where the streaming computing system includes a set of data processing nodes for processing the data stream, the data stream flows in through the data processing node in the set that serves as the entry and, after passing through at least one data processing node in the set, flows out of the streaming computing system, and the target data processing node includes an execution unit that executes a program segment, submitted by a user of the streaming computing system, that characterizes data processing logic; send the pending data and the identifier of the pending data to the execution unit; and obtain the processing result data, corresponding to the pending data, that is generated by running the execution unit, together with the identifier of the processing result data.
The above description is only a preferred embodiment of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in this application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in this application.

Claims (16)

1. A method for processing data, comprising:
obtaining pending data and an identifier of the pending data from an upstream data processing node of a target data processing node to which a data stream flows in a streaming computing system, wherein the streaming computing system comprises a set of data processing nodes for processing the data stream, the data stream flows in through the data processing node in the set that serves as the entry and, after passing through at least one data processing node in the set, flows out of the streaming computing system, and the target data processing node comprises an execution unit that executes a program segment, submitted by a user of the streaming computing system, that characterizes data processing logic;
sending the pending data and the identifier of the pending data to the execution unit; and
obtaining processing result data, corresponding to the pending data, that is generated by running the execution unit, and an identifier of the processing result data.
2. The method according to claim 1, wherein, after obtaining the processing result data, corresponding to the pending data, that is generated by running the execution unit, the method further comprises:
persisting the processing result data and the identifier of the processing result data;
sending the processing result data and the identifier of the processing result data to a downstream data processing node of the target data processing node; and
in response to obtaining indication information indicating that the processing result data has been processed, clearing the persisted processing result data and the identifier of the processing result data.
3. The method according to claim 2, wherein, after sending the processing result data and the identifier of the processing result data to the downstream data processing node of the target data processing node, the method further comprises:
in response to obtaining indication information indicating that processing of the processing result data has failed, replaying the persisted processing result data and the identifier of the processing result data to the downstream data processing node of the target data processing node.
4. The method according to claim 1, wherein sending the pending data and the identifier of the pending data to the execution unit comprises:
sending the pending data and the identifier of the pending data to the execution unit through a process pipe.
5. The method according to claim 1, wherein sending the pending data and the identifier of the pending data to the execution unit comprises:
sending the pending data and the identifier of the pending data to the execution unit according to a preset protocol, wherein the preset protocol specifies the fields included in each row of data sent to the target data processing node and the separator used between fields, and the fields comprise the identifier of the data, the key of the data, the value of the data, and a flag for determining whether the data has been processed.
6. The method according to claim 1, wherein the method further comprises:
in response to obtaining an execution request that includes an identifier of a target program segment, creating an execution unit to execute the target program segment.
7. The method according to any one of claims 1-6, wherein the topological relationship of the data processing nodes in the streaming computing system is described by a preset configuration file, and the configuration file comprises an Extensible Markup Language configuration file; and
the method further comprises:
generating and presenting a topological diagram of the streaming computing system based on the configuration file.
8. An apparatus for processing data, comprising:
a first acquisition unit configured to obtain pending data and an identifier of the pending data from an upstream data processing node of a target data processing node to which a data stream flows in a streaming computing system, wherein the streaming computing system comprises a set of data processing nodes for processing the data stream, the data stream flows in through the data processing node in the set that serves as the entry and, after passing through at least one data processing node in the set, flows out of the streaming computing system, and the target data processing node comprises an execution unit that executes a program segment, submitted by a user of the streaming computing system, that characterizes data processing logic;
a first sending unit configured to send the pending data and the identifier of the pending data to the execution unit; and
a second acquisition unit configured to obtain processing result data, corresponding to the pending data, that is generated by running the execution unit, and an identifier of the processing result data.
9. The apparatus according to claim 8, wherein the apparatus further comprises:
a persistence unit configured to persist the processing result data and the identifier of the processing result data;
a second sending unit configured to send the processing result data and the identifier of the processing result data to a downstream data processing node of the target data processing node; and
a clearing unit configured to, in response to obtaining indication information indicating that the processing result data has been processed, clear the persisted processing result data and the identifier of the processing result data.
10. The apparatus according to claim 9, wherein the apparatus further comprises:
a replay unit configured to, in response to obtaining indication information indicating that processing of the processing result data has failed, replay the persisted processing result data and the identifier of the processing result data to the downstream data processing node of the target data processing node.
11. The apparatus according to claim 8, wherein the first sending unit is further configured to send the pending data and the identifier of the pending data to the execution unit through a process pipe.
12. The apparatus according to claim 8, wherein the first sending unit is further configured to send the pending data and the identifier of the pending data to the execution unit according to a preset protocol, wherein the preset protocol specifies the fields included in each row of data sent to the target data processing node and the separator used between fields, and the fields comprise the identifier of the data, the key of the data, the value of the data, and a flag for determining whether the data has been processed.
13. The apparatus according to claim 8, wherein the apparatus further comprises:
a creating unit configured to, in response to obtaining an execution request that includes an identifier of a target program segment, create an execution unit to execute the target program segment.
14. The apparatus according to any one of claims 8-13, wherein the topological relationship of the data processing nodes in the streaming computing system is described by a preset configuration file, and the configuration file comprises an Extensible Markup Language configuration file; and
the apparatus further comprises:
a presentation unit configured to generate and present a topological diagram of the streaming computing system based on the configuration file.
15. An electronic device, comprising:
one or more processors; and
a storage device on which one or more programs are stored,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-7.
16. A computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1-7.
CN201811003311.8A 2018-08-30 2018-08-30 Method and apparatus for processing data Active CN109145023B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811003311.8A CN109145023B (en) 2018-08-30 2018-08-30 Method and apparatus for processing data

Publications (2)

Publication Number Publication Date
CN109145023A true CN109145023A (en) 2019-01-04
CN109145023B CN109145023B (en) 2020-11-27

Family

ID=64829523

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811003311.8A Active CN109145023B (en) 2018-08-30 2018-08-30 Method and apparatus for processing data

Country Status (1)

Country Link
CN (1) CN109145023B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090063515A1 (en) * 2007-09-05 2009-03-05 International Business Machines Corporation Optimization model for processing hierarchical data in stream systems
CN107277087A (en) * 2016-04-06 2017-10-20 阿里巴巴集团控股有限公司 Data processing method and device
CN105959151A (en) * 2016-06-22 2016-09-21 中国工商银行股份有限公司 High availability stream processing system and method
CN107046510A (en) * 2017-01-13 2017-08-15 广西电网有限责任公司电力科学研究院 A kind of node and its system of composition suitable for distributed computing system
CN107229747A (en) * 2017-06-26 2017-10-03 湖南星汉数智科技有限公司 A kind of large-scale data processing unit and method based on Stream Processing framework

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111435939A (en) * 2019-01-14 2020-07-21 百度在线网络技术(北京)有限公司 Method and device for dividing storage space of node
CN110460495A (en) * 2019-08-01 2019-11-15 北京百度网讯科技有限公司 A kind of water level propulsion method, device, calculate node and storage medium
CN110460495B (en) * 2019-08-01 2024-02-23 北京百度网讯科技有限公司 Water level propelling method and device, computing node and storage medium
CN111142925A (en) * 2019-12-23 2020-05-12 山东浪潮通软信息科技有限公司 Pipeline type data processing method, equipment and storage medium
CN111324345A (en) * 2020-03-19 2020-06-23 北京奇艺世纪科技有限公司 Data processing mode generation method, data processing method and device and electronic equipment
CN111488495A (en) * 2020-04-14 2020-08-04 北京字节跳动网络技术有限公司 Information processing method and device
WO2021212385A1 (en) * 2020-04-22 2021-10-28 深圳市欢太科技有限公司 Data testing method and device, server, and data processing system
CN113965511A (en) * 2020-07-02 2022-01-21 北京瀚海云星科技有限公司 Tag data transmission method based on RDMA (remote direct memory Access), and related device and system
CN114090481A (en) * 2020-07-02 2022-02-25 北京瀚海云星科技有限公司 Data sending method, data receiving method and related device
CN111930748B (en) * 2020-08-07 2023-08-08 北京百度网讯科技有限公司 Method, device, equipment and storage medium for tracking data of streaming computing system
CN111930748A (en) * 2020-08-07 2020-11-13 北京百度网讯科技有限公司 Data tracking method, device, equipment and storage medium for streaming computing system
CN114328501A (en) * 2020-09-29 2022-04-12 华为技术有限公司 Data processing method, device and equipment
CN116662325A (en) * 2023-07-24 2023-08-29 宁波森浦信息技术有限公司 Data processing method and system
CN116662325B (en) * 2023-07-24 2023-11-10 宁波森浦信息技术有限公司 Data processing method and system

Also Published As

Publication number Publication date
CN109145023B (en) 2020-11-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant