WO2017058214A1 - Method and apparatus to manage multiple analytics applications - Google Patents

Method and apparatus to manage multiple analytics applications

Info

Publication number
WO2017058214A1
WO2017058214A1 (PCT/US2015/053298)
Authority
WO
WIPO (PCT)
Prior art keywords
data
analysis process
analysis
streaming
incoming data
Prior art date
Application number
PCT/US2015/053298
Other languages
French (fr)
Inventor
Hiroaki Shikano
Yukinori Sakashita
Yasutaka Kono
Original Assignee
Hitachi, Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi, Ltd. filed Critical Hitachi, Ltd.
Priority to PCT/US2015/053298 priority Critical patent/WO2017058214A1/en
Publication of WO2017058214A1 publication Critical patent/WO2017058214A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00Traffic control in data switching networks
    • H04L47/10Flow control; Congestion control
    • H04L47/24Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2416Real-time traffic
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00Traffic control in data switching networks
    • H04L47/10Flow control; Congestion control
    • H04L47/27Evaluation or update of window size, e.g. using information derived from acknowledged [ACK] packets

Definitions

  • the present disclosure relates generally to information technology (IT) systems, and more specifically, to management of IT systems based on analytics applications executed on the IT system.
  • a related art information processing apparatus includes a determination unit that determines whether to change a number of processes allocated to one or more program modules. This change is based on a measurement result of a load of each program module when the program modules that form an application are executed using scalable processing resources. Such related art implementations control processing resources based on the previous execution of the same program.
  • example implementations described herein utilize the amount of data and the characteristics of the analysis to conduct analysis management.
  • Example implementations described herein are directed to a method and apparatus to manage preprocessing of source data targeted for analytics to adjust the size of data obtained as a result of the preprocessing in advance of analytics processing in accordance with analytics resource requirement prediction.
  • the example implementations also involve a method and apparatus to adjust window size of streaming data processing in real-time analytics processing in accordance with the resource requirement prediction.
  • Example implementations can involve an apparatus, which can include a memory configured to store analysis information comprising a plurality of analysis processes, each of the plurality of analysis processes associated with data reduction information; and, a processor, configured to retrieve an analysis process from a queue; and determine available system resources to process the analysis process.
  • the processor can be configured to calculate processing time for the analysis process based on the determined available system resources.
  • the processor can be configured to adjust a data rate of incoming data for the analysis process.
  • the processor is configured to adjust the incoming data for the analysis process with a streaming process.
  • Example implementations can further include a method, which can involve storing analysis information comprising a plurality of analysis processes, each of the plurality of analysis processes associated with data reduction information; retrieving an analysis process from a queue; and determining available system resources to process the analysis process.
  • the method can involve calculating processing time for the analysis process based on the determined available system resources.
  • the method can include adjusting a data rate of incoming data for the analysis process.
  • the method can include adjusting the incoming data for the analysis process with a streaming process.
  • Example implementations can further include a computer program having instructions, which can involve storing analysis information comprising a plurality of analysis processes, each of the plurality of analysis processes associated with data reduction information; retrieving an analysis process from a queue; and determining available system resources to process the analysis process.
  • the instructions can involve calculating processing time for the analysis process based on the determined available system resources.
  • the instructions can include adjusting a data rate of incoming data for the analysis process.
  • the instructions can include adjusting the incoming data for the analysis process with a streaming process.
  • the computer program can be stored in a non-transitory computer readable medium.
  • FIG. 1 illustrates a system configuration, in accordance with an example implementation.
  • FIG. 2 illustrates example processing flows for hybrid analysis, in accordance with an example implementation.
  • FIG. 3 illustrates a conceptual diagram of data adjustment for batch analysis, in accordance with an example implementation.
  • FIG. 4 illustrates a conceptual diagram of data adjustment for streaming analysis, in accordance with an example implementation.
  • FIG. 5 illustrates a table for data adjustment for streaming and batch analysis, in accordance with an example implementation.
  • FIGS. 6(a) and 6(b) illustrate an example flow diagram for conducting hybrid analysis, in accordance with an example implementation.
  • FIG. 7 illustrates the processing flow of the streaming analysis management program, in accordance with an example implementation.
  • FIGS. 8(a) to 8(d) illustrate an example schedule of resource usage over time, in accordance with an example implementation.
  • FIG. 9 illustrates an example computer environment upon which example implementations may be implemented.
  • FIG. 1 illustrates a system configuration, in accordance with an example implementation.
  • the system may include servers, storage, management node and a network connecting each component.
  • Elements of the system include Servers (SRV), Analysis processes (ANL), Operating Systems (OS), Virtual machines (VM), Routers (RT), Networks (NW), Storage devices (STR), Management nodes (MNG), Job schedulers (JS), Batch processing management programs (BPMP), Stream processing management programs (SPMP), Sensors (SNSR), Terminals (TRM), and Controllers (CTRL).
  • Analytics and preprocessing processes are executed in virtual machines (VM), but example implementations are not limited thereto; bare metal servers and other environments may be substituted therefor without departing from the inventive scope.
  • Sensors are connected to the system via network (NW) and network routers (RT). Data generated from such sensors are stored into storage or directly transferred to preprocessing processes (PRP) on servers (SRV). The result of preprocessing (PRP) of the sensor data is stored into storage or directly transferred to analysis processes (ANL), and the analysis processes are executed on servers (SRV).
  • The analysis result is transmitted to terminals (TRM) for visualizing the analysis result on the user's display. It can also be transmitted to controllers (CTRL), which activate some actuators based on the analysis result.
  • Management node includes job scheduler (JS), batch processing management program (BPMP) and stream processing management program (SPMP).
  • Job scheduler assigns preprocessing processes (PRP) and analysis processes (ANL) to processing nodes of VM.
  • Batch processing management program estimates processing time of target preprocessing processes and analysis processes assigned by job scheduler (JS), and the program controls preprocessing processes (PRP) to adjust the size of data inputted to batch analysis processes.
  • Streaming processing management program monitors preprocessing processes and streaming analysis processes and adjusts the size of data inputted to streaming analysis processes, or the window size of the streaming analysis.
  • FIG. 2 illustrates example processing flows for hybrid analysis, in accordance with an example implementation.
  • the hybrid analysis of FIG. 2 is the collaboration of real-time analysis with historical analysis.
  • in the flow there is a network (NW) policy controller 201, network analysis 202, real-time packet processing 203, deep packet inspection 204, time-series database (DB) 205, customer usage analysis 206, customer relations analysis 207, and customer recommendation 208.
  • the NW packet 209 is sent to deep packet inspection 204 for processing.
  • Deep packet inspection 204 extracts packet header information, internet protocol (IP) flow data, or service session data (e.g., video) from the NW packet 209.
  • Real-time packet processing 203 receives the packet information from deep packet inspection 204 and produces statistical data of packet flows or service sessions.
  • Statistical data produced by real-time packet processing 203 is stored into time-series DB 205, which facilitates faster reading of time-indexed data.
  • Time-series DB has a data structure optimized for time-series data and creates a time-based index in the data store for faster reading. It may include special queries of time-related functions such as group-by-time.
  • Real-time packet processing 203 may also be configured to change the window size of the packets being processed.
  • Network analysis 202 detects congestion or service degradation based on processed packets from real-time packet processing 203.
  • Network analysis 202 notifies NW policy controller 201 to change network settings (e.g. route tables or router parameters such as buffer size and so on).
  • the network analysis 202 predicts future traffic based on current packet usage, past traffic usage stored in time-series DB 205, and recommendations for new content or new billing plan for customers.
  • Time-series DB 205 can send data samples to network analysis 202 and customer usage analysis 206, specified by time range and time step, depending on the desired implementation.
  • Customer usage analysis 206 analyzes what content is accessed by the customers based on analyzed packet data from time-series DB 205 and stored customer data 210. Customer relations analysis 207 analyzes customer behavior from their network usage and creates recommendations for content or a new billing plan as customer recommendation 208. Customer relations analysis 207 also uses the network traffic prediction from network analysis 202, which can facilitate better network use by giving, for example, some discounts to customers for unused time.
  • FIG. 3 illustrates a conceptual diagram of data adjustment for batch analysis, in accordance with an example implementation.
  • the data supplied for batch analysis is stored on storage.
  • Preprocessing process PRP reads the stored data and adjusts the data size by conducting sampling.
  • FIG. 4 illustrates a conceptual diagram of data adjustment for streaming analysis, in accordance with an example implementation.
  • the data can be directly supplied to the streaming analysis without requiring data storage.
  • the preprocessing process is also streaming processing, which samples data or creates statistical data.
  • the preprocessing process can be in the form of add-ons of new streaming analysis. For streaming analysis, changing the window size affects the processing performance requirements.
  • the window is defined as a period of recent data targeted for processing.
  • the window size can be either element-based or time-based.
  • Element-based window size is based on the number of data elements processed, and time-based window size is based on a period of time, so the number of data elements varies with the data arrival rate.
  • for analysis that requires a larger window size, larger memory capacity and additional computation resources may be required; however, more accurate calculation can be expected with a wider-range/period of data.
  • the window size is extended to take in five blocks of data in view of the incoming stream data.
  • FIG. 5 illustrates a table for data adjustment for streaming and batch analysis, in accordance with an example implementation.
  • FIG. 5 illustrates an analysis processing characteristics table.
  • the table can include performance requirement items for each analysis process such as central processing unit (CPU) usage 501, memory usage 502, and Disk input/output (I/O) 503, as well as a time period for batch execution 504.
  • CPU Usage 501 indicates the number of CPUs required for the analysis.
  • Memory usage 502 indicates the memory size required to conduct the analysis.
  • Disk I/O 503 indicates the minimum I/O throughput required to execute the analysis.
  • Time period for batch execution 504 indicates the time period within which one batch execution must be completed. For streaming analysis, there is no period because streaming analysis is in operation for processing the incoming sequence of data, as illustrated with ANLY C.
  • the table also has a processing time formulation 505 based on given IT resources such as CPU, memory, storage and network bandwidth, which can be used to calculate the processing time of an analysis process from how much data is supplied [Q_data] for batch analysis, or how much data throughput [T_data] is supplied (i.e., how often data arrives) for streaming analysis.
  • the table also includes data arrival conditions 506 to determine the data required to start the batch analysis processes.
  • Processing types 507 (e.g., batch, stream) indicate the type of processing to be used in the analysis.
  • for the batch type of analysis, processing starts when data is ready and data size can be measured.
  • for the streaming or real-time analysis, throughput of data arrival can be observed.
  • a management tool or a special program can be utilized to measure data size or throughput, depending on the desired implementation.
  • P_T = ( Na_CPU / N_CPU ) * ( Ta_DISK / T_DISK ) * ( Qa_data / Q_data ) * T_exec
  • the data reduction rate 508 indicates the allowed amount of data reduction permitted, if applicable, and the data reduction method 509 indicates the type of data reduction to use. For example, compression indicates that when a sequence of identical data arrives, the identical data is filtered for data size reduction. Preprocessing specific indicates that the users can adjust specific parameters that affect the data size of the target preprocessing result.
  • the data reduction rate 508 and data reduction method 509 can be set by the admin of the management server, or by the applications depending on the desired implementation.
  • FIGS. 6(a) and 6(b) illustrate an example flow diagram for conducting hybrid analysis, in accordance with an example implementation.
  • the hybrid analysis can include batch analysis and streaming analysis integrated in a hybrid approach.
  • the flow begins at 601 wherein the management program checks the batch analysis process queues. Each of the entries includes a batch to be executed at a particular time interval. At 602, a check is performed to determine if new analysis processing is ready. During this process, the management program may look at the data to be sent, and look at the process to be utilized on the data. If so (Yes) then the flow proceeds to 603, otherwise (No) the flow returns to 601. At 603, the management program reads the analysis application characteristics table as illustrated in FIG. 5. At 604, the management program determines the available resources by comparing, for example, the amount of data to be analyzed versus the Central Processing Unit (CPU)/Memory available. At 605, a determination is made if the processing is to be batch analysis (conducted in intervals at specified times). If so (Yes), then the flow proceeds to 606, otherwise (No), the flow proceeds to execute the real time analysis as illustrated in FIG. 7.
  • the batch program calculates the batch analysis processing time.
  • the management program refers to the allowed data reduction rate from the analysis application characteristics table.
  • a determination is made if data adjustment is allowed. If so (Yes), then the flow proceeds to 611 and sends a notification of the data adjustment to the user.
  • the management program sets the data adjustment to reduce the data rate based on the information of FIG. 5 (e.g., determining allowed data reduction rate and allowed reduction technique), and proceeds to the flow at 608 of FIG. 6(a) to execute the analysis processing.
  • an alert is sent to the user to indicate that overtime may be required to execute the requested batch analysis, as there are not enough resources to satisfy the required analysis within the desired time frame.
  • a check is performed to determine if the user permits overtime processing for the batch analysis. If so (Yes), then the process is canceled at 617 for later processing if requested. Otherwise (No), the flow proceeds to 608 for processing.
  • FIG. 7 illustrates the processing flow of the streaming analysis management program, in accordance with an example implementation.
  • the streaming analysis process begins.
  • the management program reads the analytics application characteristics table.
  • the management program monitors the batch analysis process management and determines the throughput of the system.
  • a check is made to determine if real time analytics adjustment is required. If so (Yes), then the flow proceeds to 704. Otherwise (No), the flow proceeds to 708.
  • the management program determines if data adjustment is allowed. If so (Yes), then the flow proceeds to 705, otherwise (No) the flow proceeds to 707 to send the user a notification for a failure of the real-time analysis adjustment and proceeds back to 702.
  • the management program sends a notification to the user regarding the data adjustment.
  • the management program sets the data adjustment, which can be in the form of data sampling or window size adjustments and then proceeds back to 702.
  • a check is performed to determine if real-time analysis adjustment is set. If so (Yes), then the flow proceeds to 709 to send a notification to the user and no adjustment is set. Otherwise (No), the flow proceeds back to 702.
  • FIG. 8(a) illustrates an example schedule of resource usage over time, in accordance with an example implementation.
  • the graph illustrates the case where resources are sufficient and batch analysis and real-time analysis are executed on the resources assigned as specified in the characteristics table.
  • resource allocation for batch analysis is limited first.
  • the batch analysis can be assigned under the limited resources.
  • FIG. 8(b) illustrates an example schedule of resource usage over time with extended resources, when overtime is given. In the situation where resources are further limited and the resource for batch analysis is limited first, and there is allowance for overtime of a given deadline for the batch analysis, the data reduction for batch analysis is applied.
  • FIG. 9 illustrates an example computing environment with an example computer device suitable for use in some example implementations, such as an apparatus to facilitate the functionality of navigating another movable apparatus.
  • Computer device 905 in computing environment 900 can include one or more processing units, cores, or processors 910, memory 915 (e.g., RAM, ROM, and/or the like), internal storage 920 (e.g., magnetic, optical, solid state storage, and/or organic), and/or I/O interface 925, any of which can be coupled on a communication mechanism or bus 930 for communicating information or embedded in the computer device 905.
  • Computer device 905 can be communicatively coupled to input/user interface 935 and output device/interface 940. Either one or both of input/user interface 935 and output device/interface 940 can be a wired or wireless interface and can be detachable.
  • Input/user interface 935 may include any device, component, sensor, or interface, physical or virtual, that can be used to provide input (e.g., buttons, touch-screen interface, keyboard, a pointing/cursor control, microphone, camera, braille, motion sensor, optical reader, and/or the like).
  • Output device/interface 940 may include a display, television, monitor, printer, speaker, braille, or the like.
  • input/user interface 935 and output device/interface 940 can be embedded with or physically coupled to the computer device 905.
  • other computer devices may function as or provide the functions of input/user interface 935 and output device/interface 940 for a computer device 905.
  • Examples of computer device 905 may include, but are not limited to, highly mobile devices (e.g., smartphones, devices in vehicles and other machines, devices carried by humans and animals, and the like), mobile devices (e.g., tablets, notebooks, laptops, personal computers, portable televisions, radios, and the like), and devices not designed for mobility (e.g., desktop computers, other computers, information kiosks, televisions with one or more processors embedded therein and/or coupled thereto, radios, and the like).
  • Computer device 905 can be communicatively coupled (e.g., via I/O interface 925) to external storage 945 and network 950 for communicating with any number of networked components, devices, and systems, including one or more computer devices of the same or different configuration.
  • Computer device 905 or any connected computer device can be functioning as, providing services of, or referred to as a server, client, thin server, general machine, special-purpose machine, or another label.
  • I/O interface 925 can include, but is not limited to, wired and/or wireless interfaces using any communication or I/O protocols or standards (e.g., Ethernet, 802.11x, Universal Serial Bus, WiMax, modem, a cellular network protocol, and the like) for communicating information to and/or from at least all the connected components, devices, and network in computing environment 900.
  • Network 950 can be any network or combination of networks (e.g., the Internet, local area network, wide area network, a telephonic network, a cellular network, satellite network, and the like).
  • Computer device 905 can use and/or communicate using computer-usable or computer-readable media, including transitory media and non-transitory media.
  • Transitory media include transmission media (e.g., metal cables, fiber optics), signals, carrier waves, and the like.
  • Non-transitory media include magnetic media (e.g., disks and tapes), optical media (e.g., CD ROM, digital video disks, Blu-ray disks), solid state media (e.g., RAM, ROM, flash memory, solid-state storage), and other non-volatile storage or memory.
  • Computer device 905 can be used to implement techniques, methods, applications, processes, or computer-executable instructions in some example computing environments.
  • Computer-executable instructions can be retrieved from transitory media, and stored on and retrieved from non-transitory media.
  • the executable instructions can originate from one or more of any programming, scripting, and machine languages (e.g., C, C++, C#, Java, Visual Basic, Python, Perl, JavaScript, and others).
  • Processor(s) 910 can execute under any operating system (OS) (not shown), in a native or virtual environment.
  • One or more applications can be deployed that include logic unit 960, application programming interface (API) unit 965, input unit 970, output unit 975, and inter-unit communication mechanism 995 for the different units to communicate with each other, with the OS, and with other applications (not shown).
  • the described units and elements can be varied in design, function, configuration, or implementation and are not limited to the descriptions provided.
  • when information or an execution instruction is received by API unit 965, it may be communicated to one or more other units (e.g., logic unit 960, input unit 970, output unit 975).
  • logic unit 960 may be configured to control the information flow among the units and direct the services provided by API unit 965, input unit 970, output unit 975, in some example implementations described above.
  • the flow of one or more processes or implementations may be controlled by logic unit 960 alone or in conjunction with API unit 965.
  • the input unit 970 may be configured to obtain input for the calculations described in the example implementations
  • the output unit 975 may be configured to provide output based on the calculations described in example implementations.
  • Memory 915 can be configured to store analysis information which can involve a plurality of analysis processes, each of the plurality of analysis processes associated with data reduction information such as the data reduction rate and the data reduction process as illustrated in FIG. 5, as well as resource usage information such as CPU and memory usage, conditions for processing the process and so on.
  • Processor(s) 910 can be configured to execute the flow diagrams as illustrated in FIG. 6(a), 6(b) and 7.
  • processor(s) 910 can be configured to retrieve an analysis process from a queue and determine available system resources to process the analysis process as illustrated in FIG. 6(a).
  • the processor(s) 910 can be configured to calculate processing time for the analysis process based on the determined available system resources as illustrated in FIG. 6(a) and 6(b).
  • the processor(s) 910 can be configured to adjust a data rate of incoming data for the analysis process as illustrated in FIG. 6(a) and 6(b).
  • processor(s) 910 may be further configured to adjust the incoming data for the analysis process with a streaming process, which can involve conducting at least one of data sampling and adjusting window size of the incoming data as the incoming data is received, as illustrated in FIG. 6(a), 6(b) and 7.
  • Processor(s) 910 may also be configured to, for the analysis process being a streaming process, monitor throughput rate of the analysis process. Further, for an adjustment of the throughput rate being triggered, the processor(s) 910 can be configured to conduct at least one of data sampling and adjusting window size of the streaming process.
  • for overtime processing indicated as being allowed, the processor(s) 910 may be configured to place the analysis process in the queue, and for overtime processing indicated as not being allowed, to execute the analysis process; and for the analysis process not meeting the processing time after adjustment of the incoming data for the analysis process with the streaming process, the processor(s) 910 may be configured to place the analysis process in the queue as illustrated in FIGS. 6(a), 6(b) and 7.
  • the processor(s) 910 may be configured to adjust a data rate of incoming data for the analysis process by utilizing a data compression process indicated by the data reduction information as illustrated in FIG. 5, and can be configured to calculate the processing time based on a function involving a number of assignable central processing unit cores, disk input/output throughput to be assigned, and size of the incoming data, as illustrated in the equation described above.
  • the effects can include more efficient use of the limited resources for processing multiple analytics applications, and processing of multiple analytics applications while keeping the analytics deadline within the allowed precision of the analytics.
  • Example implementations are directed to a method and apparatus to manage preprocessing of source data targeted for analytics to adjust the size of data obtained as a result of the preprocessing in advance of analytics processing in accordance with analytics resource requirement prediction.
  • Example implementations are also directed to the method and apparatus to adjust the window size of the streaming data processing in real-time analytics processing in accordance with the resource requirement prediction.
  • the effects of the example implementations of the present disclosure can include more efficient use of the limited resources for processing multiple analytics applications and processing of multiple analytics applications for keeping the analytics deadline within the allowed precision.
  • Example implementations may also relate to an apparatus for performing the operations herein.
  • This apparatus may be specially constructed for the required purposes, or it may include one or more general-purpose computers selectively activated or reconfigured by one or more computer programs.
  • Such computer programs may be stored in a computer readable medium, such as a computer-readable storage medium or a computer-readable signal medium.
  • a computer-readable storage medium may involve tangible mediums such as, but not limited to optical disks, magnetic disks, read-only memories, random access memories, solid state devices and drives, or any other types of tangible or non-transitory media suitable for storing electronic information.
  • a computer readable signal medium may include mediums such as carrier waves.
  • the algorithms and displays presented herein are not inherently related to any particular computer or other apparatus.
  • Computer programs can involve pure software implementations that involve instructions that perform the operations of the desired implementation.
  • the operations described above can be performed by hardware, software, or some combination of software and hardware.
  • Various aspects of the example implementations may be implemented using circuits and logic devices (hardware), while other aspects may be implemented using instructions stored on a machine-readable medium (software), which if executed by a processor, would cause the processor to perform a method to carry out implementations of the present application.
  • some example implementations of the present application may be performed solely in hardware, whereas other example implementations may be performed solely in software.
  • the various functions described can be performed in a single unit, or can be spread across a number of components in any number of ways.
  • the methods may be executed by a processor, such as a general purpose computer, based on instructions stored on a computer-readable medium. If desired, the instructions can be stored on the medium in a compressed and/or encrypted format.

Abstract

Example implementations involve a method and apparatus to manage preprocessing of source data targeted for analytics to adjust the size of data obtained as a result of the preprocessing in advance of analytics processing in accordance with analytics resource requirement prediction. Example implementations are also directed to a method and apparatus to adjust the window size of streaming data processing in real-time analytics processing in accordance with the resource requirement prediction. The effects of the example implementations can include efficient use of the resources for processing multiple analytics applications and processing of multiple analytics applications while maintaining analytics thresholds within an allowed precision.

Description

METHOD AND APPARATUS TO MANAGE MULTIPLE ANALYTICS
APPLICATIONS
BACKGROUND
Field
[0001] The present disclosure relates generally to information technology (IT) systems, and more specifically, to management of IT systems based on analytics applications executed on the IT system.
Related Art
[0002] Related art analytics applications exist for business and social infrastructure such as business intelligence, customer marketing, efficient system operations, and so on. In the related art, such multiple types of analytics applications are processed in cooperation on multiple systems dedicated to each type of such analytics applications. There is a demand for consolidating such multiple analytics applications on IT infrastructure; however, such infrastructure often has a limited amount of resources. Therefore, there is a need for management of such analytics applications.
[0003] A related art information processing apparatus includes a determination unit that determines whether to change a number of processes allocated to one or more program modules. This change is based on a measurement result of a load of each program module when the program modules that form an application are executed using scalable processing resources. Such related art implementations control processing resources based on the previous execution of the same program.
SUMMARY
[0004] In contrast to implementations of the related art, example implementations described herein utilize the amount of data and the characteristics of the analysis to conduct analysis management.
[0005] Example implementations described herein are directed to a method and apparatus to manage preprocessing of source data targeted for analytics to adjust the size of data obtained as a result of the preprocessing in advance of analytics processing in accordance with analytics resource requirement prediction. The example implementations also involve a method and apparatus to adjust window size of streaming data processing in real-time analytics processing in accordance with the resource requirement prediction.
[0006] Example implementations can involve an apparatus, which can include a memory configured to store analysis information comprising a plurality of analysis processes, each of the plurality of analysis processes associated with data reduction information; and, a processor, configured to retrieve an analysis process from a queue; and determine available system resources to process the analysis process. For the analysis process being a batch process, the processor can be configured to calculate processing time for the analysis process based on the determined available system resources. For the analysis process not meeting the processing time, and the data reduction information indicating adjustment of data is permitted for the analysis process, the processor can be configured to adjust a data rate of incoming data for the analysis process. Further, for the analysis process not meeting the processing time and the data reduction information indicating adjustment of data is not permitted for the analysis process, the processor is configured to adjust the incoming data for the analysis process with a streaming process.
[0007] Example implementations can further include a method, which can involve storing analysis information comprising a plurality of analysis processes, each of the plurality of analysis processes associated with data reduction information; retrieving an analysis process from a queue; and determining available system resources to process the analysis process. For the analysis process being a batch process, the method can involve calculating processing time for the analysis process based on the determined available system resources. For the analysis process not meeting the processing time, and the data reduction information indicating adjustment of data is permitted for the analysis process, the method can include adjusting a data rate of incoming data for the analysis process. For the analysis process not meeting the processing time and the data reduction information indicating adjustment of data is not permitted for the analysis process, the method can include adjusting the incoming data for the analysis process with a streaming process.
[0008] Example implementations can further include a computer program having instructions, which can involve storing analysis information comprising a plurality of analysis processes, each of the plurality of analysis processes associated with data reduction information; retrieving an analysis process from a queue; and determining available system resources to process the analysis process. For the analysis process being a batch process, the instructions can involve calculating processing time for the analysis process based on the determined available system resources. For the analysis process not meeting the processing time, and the data reduction information indicating adjustment of data is permitted for the analysis process, the instructions can include adjusting a data rate of incoming data for the analysis process. For the analysis process not meeting the processing time and the data reduction information indicating adjustment of data is not permitted for the analysis process, the instructions can include adjusting the incoming data for the analysis process with a streaming process. The computer program can be stored in a non-transitory computer readable medium.
BRIEF DESCRIPTION OF DRAWINGS
[0009] FIG. 1 illustrates a system configuration, in accordance with an example implementation.
[0010] FIG. 2 illustrates example processing flows for hybrid analysis, in accordance with an example implementation.
[0011] FIG. 3 illustrates a conceptual diagram of data adjustment for batch analysis, in accordance with an example implementation.
[0012] FIG. 4 illustrates a conceptual diagram of data adjustment for streaming analysis, in accordance with an example implementation.
[0013] FIG. 5 illustrates a table for data adjustment for streaming and batch analysis, in accordance with an example implementation.
[0014] FIGS. 6(a) and 6(b) illustrate an example flow diagram for conducting hybrid analysis, in accordance with an example implementation.
[0015] FIG. 7 illustrates the processing flow of the streaming analysis management program, in accordance with an example implementation. [0016] FIGS. 8(a) to 8(d) illustrate an example schedule of resource usage over time, in accordance with an example implementation.
[0017] FIG. 9 illustrates an example computer environment upon which example implementations may be implemented.
DETAILED DESCRIPTION OF THE DRAWINGS
[0018] The following detailed description provides further details of the figures and example implementations of the present application. Reference numerals and descriptions of redundant elements between figures are omitted for clarity. Terms used throughout the description are provided as examples and are not intended to be limiting. For example, the use of the term "automatic" may involve fully automatic or semiautomatic implementations involving user or operator control over certain aspects of the implementation, depending on the desired implementation of one of ordinary skill in the art practicing implementations of the present disclosure.
[0019] Analytics has been applied to related art applications within the business and social infrastructure such as business intelligence, customer marketing, efficient system operations, and so on. There has been demand for consolidating such multiple analytics applications on IT infrastructure; however, such infrastructure often has a limited amount of resources. Therefore, more efficient management of processing such multiple analytics applications on the aforesaid IT infrastructure can be required.
[0020] In example implementations, there is a management program that manages performance thresholds while considering the characteristics of the applications executed on the IT system.
[0021] FIG. 1 illustrates a system configuration, in accordance with an example implementation. The system may include servers, storage, management node and a network connecting each component. Elements of the system include Servers (SRV), Analysis processes (ANL), Operating Systems (OS), Virtual machines (VM), Routers (RT), Networks (NW), Storage devices (STR), Management nodes (MNG), Job schedulers (JS), Batch processing management programs (BPMP), Stream processing management programs (SPMP), Sensors (SNSR), Terminals (TRM), and Controllers (CTRL). [0022] Analytics and preprocessing processes are executed in virtual machine
(VM), but example implementations are not limited thereto; bare metal servers and other environments may be substituted therefor without departing from the inventive scope. Sensors (SNSR) are connected to the system via network (NW) and network routers (RT). Data generated from such sensors are stored into storage or directly transferred to preprocessing processes (PRP) on servers (SRV). The result of preprocessing (PRP) of the sensor data is stored into storage or directly transferred to analysis processes (ANL), and the analysis processes are executed on servers (SRV).
[0023] The analysis result is transmitted to terminals (TRM) for visualizing the analysis result on the user's display. It can also be transmitted to controllers (CTRL), which activate some actuators based on the analysis result.
[0024] Management node (MNG) includes job scheduler (JS), batch processing management program (BPMP) and stream processing management program (SPMP). Job scheduler assigns preprocessing processes (PRP) and analysis processes (ANL) to processing nodes of VM. Batch processing management program estimates processing time of target preprocessing processes and analysis processes assigned by job scheduler (JS), and the program controls preprocessing processes (PRP) to adjust the size of data inputted to batch analysis processes. Streaming processing management program monitors preprocessing processes and streaming analysis processes and adjusts the size of data inputted to streaming analysis processes, or the window size of the streaming analysis.
[0025] FIG. 2 illustrates example processing flows for hybrid analysis, in accordance with an example implementation. Specifically, the hybrid analysis of FIG. 2 is the collaboration of real-time analysis with historical analysis. In the flow there is a network (NW) policy controller 201, network analysis 202, real-time packet processing 203, deep packet inspection 204, time-series database (DB) 205, customer usage analysis 206, customer relations analysis 207, and customer recommendation 208.
[0026] In an example execution of the flow, the NW packet 209 is sent to deep packet inspection 204 for processing. Deep packet inspection 204 extracts packet header information, internet protocol (IP) flow data, or service session data (e.g., video) from the NW packet 209. Real-time packet processing 203 receives the packet information from deep packet inspection 204 and produces statistical data of packet flows or service sessions. Statistical data produced by real-time packet processing 203 is stored into time-series DB 205, which facilitates faster reading of time-indexed data. Time-series DB has a data structure optimized for time-series data and creates a time-based index in the data store for faster reading. It may include special queries of time-related functions such as group-by-time. Real-time packet processing 203 may also be configured to change the window size of the packets being processed.
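As a rough, non-authoritative illustration of the kind of "group-by-time" query such a time-series DB could expose, time-indexed statistics could be bucketed as in the sketch below; the record layout, bucket size, and function name are assumptions made for this example and are not part of the disclosure.

```python
from collections import defaultdict

def group_by_time(records, bucket_seconds):
    """Aggregate (timestamp_seconds, value) samples into fixed time buckets."""
    buckets = defaultdict(list)
    for ts, value in records:
        buckets[ts - (ts % bucket_seconds)].append(value)
    # Return the per-bucket average, keyed by bucket start time.
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}

# Example: 10-second averages of a packet-flow statistic.
samples = [(0, 120), (3, 140), (11, 90), (14, 100), (22, 300)]
print(group_by_time(samples, 10))   # {0: 130.0, 10: 95.0, 20: 300.0}
```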
[0027] Network analysis 202 detects congestion or service degradation based on processed packets from real-time packet processing 203. Network analysis 202 notifies NW policy controller 201 to change network settings (e.g., route tables or router parameters such as buffer size and so on). The network analysis 202 predicts future traffic based on current packet usage, past traffic usage stored in time-series DB 205, and recommendations for new content or new billing plan for customers. Time-series DB 205 can send data samples to network analysis 202 and customer usage analysis 206, specified by time range and time step, depending on the desired implementation.
[0028] Customer usage analysis 206 analyzes what content is accessed by the customers based on analyzed packet data from time-series DB 205 and stored customer data 210. Customer relations analysis 207 analyzes customer behavior from their network usage and creates recommendations for content or a new billing plan as customer recommendation 208. Customer relations analysis 207 also uses the network traffic prediction from network analysis 202, which can facilitate better network use by giving, for example, some discounts to customers for unused time.
[0029] FIG. 3 illustrates a conceptual diagram of data adjustment for batch analysis, in accordance with an example implementation. The data supplied for batch analysis is stored on storage. Preprocessing process PRP reads the stored data and adjusts the data size by conducting sampling. In the example depicted in FIG. 3, data blocks 1, 2,
3, 4, 5, 6 are sampled by PRP to provide the sample data of blocks 1, 3, and 5 to batch analysis. Other sampling methods are also possible, depending on the desired implementation. For example, sampling can include other statistical calculations such as taking the average or the median of a data block. [0030] FIG. 4 illustrates a conceptual diagram of data adjustment for streaming analysis, in accordance with an example implementation. In example implementations, the data can be directly supplied to the streaming analysis without requiring data storage. The preprocessing process is also streaming processing, which samples data or creates statistical data. The preprocessing process can be in the form of add-ons of new streaming analysis. For streaming analysis, changing the window size affects the processing performance requirements. The window is defined as a period of recent data targeted for processing. The window size can be either element-based or time-based. Element-based window size is based on the number of data elements processed, and time-based window size is based on a period of time, so the number of data elements varies with the data arrival rate. For analysis that requires a larger window size, larger memory capacity and additional computation resources may be required; however, more accurate calculation can be expected with a wider-range/period of data. By reducing the window size, the consumption of computational resources can be reduced but the accuracy of analysis may be lowered. In the example depicted in FIG. 4, the window size is extended to take in five blocks of data in view of the incoming stream data.
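The two adjustments described above can be sketched in a few lines of code; the block/element representation and the parameter names here are assumptions made only for illustration. The first function mirrors the sampling of FIG. 3, and the window helpers mirror the element-based and time-based windows of FIG. 4:

```python
from collections import deque

def sample_blocks(blocks, step=2):
    """Batch-side adjustment (FIG. 3): keep every 'step'-th data block."""
    return blocks[::step]

class ElementWindow:
    """Element-based window: holds the most recent 'size' elements."""
    def __init__(self, size):
        self.buf = deque(maxlen=size)

    def push(self, element):
        self.buf.append(element)
        return list(self.buf)   # current window contents

def time_window(elements, now, period):
    """Time-based window: elements newer than now - period; its length
    varies with the data arrival rate, as noted above."""
    return [(ts, v) for ts, v in elements if ts > now - period]

print(sample_blocks([1, 2, 3, 4, 5, 6]))   # [1, 3, 5], as in FIG. 3
window = ElementWindow(size=5)
for x in range(8):
    contents = window.push(x)
print(contents)                            # the five most recent elements
```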
[0031] FIG. 5 illustrates a table for data adjustment for streaming and batch analysis, in accordance with an example implementation. Specifically, FIG. 5 illustrates an analysis processing characteristics table. The table can include performance requirement items for each analysis process such as central processing unit (CPU) usage 501, memory usage 502, and Disk input/output (I/O) 503, as well as a time period for batch execution 504. CPU Usage 501 indicates the number of CPUs required for the analysis. Memory usage 502 indicates the memory size required to conduct the analysis. Disk I/O 503 indicates the minimum I/O throughput required to execute the analysis.
[0032] Time period for batch execution 504 indicates the time period within which one batch execution must be completed. For streaming analysis, there is no period because streaming analysis is in operation for processing the incoming sequence of data, as illustrated with ANLY C. The table also has a processing time formulation 505 based on given IT resources such as CPU, memory, storage and network bandwidth, which can be used to calculate the processing time of an analysis process from how much data is supplied [Q_data] for batch analysis, or how much data throughput [T_data] is supplied (i.e., how often data arrives) for streaming analysis.
[0033] The table also includes data arrival conditions 506 to determine the data required to start the batch analysis processes. Processing types 507 (e.g., batch, stream) indicate the type of processing to be used in the analysis. For the batch type of analysis, processing starts when data is ready and data size can be measured. For the streaming or real-time analysis, throughput of data arrival can be observed. In both cases, a management tool or a special program can be utilized to measure data size or throughput, depending on the desired implementation.
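For illustration only, one plausible in-memory representation of a characteristics table entry is shown below; the field names simply mirror columns 501-509, while the concrete values and the "ANLY A" label are assumptions rather than values taken from FIG. 5:

```python
# Hypothetical entry of the analysis processing characteristics table (FIG. 5).
characteristics = {
    "ANLY A": {
        "cpu_usage_cores": 4,                   # 501: CPUs required
        "memory_gb": 16,                        # 502: memory required
        "disk_io_mbps": 200,                    # 503: minimum I/O throughput
        "batch_period_hours": 24,               # 504: one batch execution must finish within this period
        "processing_time_formula": "P_T",       # 505: see the formulation below
        "data_arrival_condition": "daily upload complete",   # 506
        "processing_type": "batch",             # 507: batch or stream
        "data_reduction_rate": 0.5,             # 508: allowed reduction, if applicable
        "data_reduction_method": "sampling",    # 509
    },
}
```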
[0034] In an example for processing time formulation, assume that the memory size specified in the table is needed for executing a particular analysis. Assume the processing time to be calculated as P_T, the number of CPU cores as N_CPU, throughput of disk I/O as T_DISK, execution time with resources provided as specified in the table as T_exec, data size when T_exec measured as Q_data, number of CPU cores to be assigned as Na_CPU, throughput of disk I/O to be assigned as Ta_DISK, actual data size as Qa_data. From the above, the processing calculation time can be formulated as:
P_T = ( Na_CPU / N_CPU ) * ( Ta_DISK / T_DISK ) * ( Qa_data / Q_data ) * T_exec
[0035] Other formulations are also possible depending on the desired implementation.
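For reference, the formulation of paragraph [0034] can be transcribed directly into code; this is only a sketch, and the example numbers are arbitrary:

```python
def processing_time(na_cpu, n_cpu, ta_disk, t_disk, qa_data, q_data, t_exec):
    """P_T = (Na_CPU / N_CPU) * (Ta_DISK / T_DISK) * (Qa_data / Q_data) * T_exec."""
    return (na_cpu / n_cpu) * (ta_disk / t_disk) * (qa_data / q_data) * t_exec

# Reference execution took 60 time units with 4 cores, 200 units of disk throughput,
# and 10 units of data; evaluate the formula for 2 cores, 100 units of throughput,
# and 20 units of data.
print(processing_time(na_cpu=2, n_cpu=4, ta_disk=100, t_disk=200,
                      qa_data=20, q_data=10, t_exec=60))   # 30.0
```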
[0036] The data reduction rate 508 indicates the allowed amount of data reduction permitted, if applicable, and the data reduction method 509 indicates the type of data reduction to use. For example, compression indicates that when a sequence of identical data arrives, the identical data is filtered for data size reduction. Preprocessing specific indicates that the users can adjust specific parameters that affect the data size of the target preprocessing result. The data reduction rate 508 and data reduction method 509 can be set by the admin of the management server, or by the applications depending on the desired implementation. [0037] FIGS. 6(a) and 6(b) illustrate an example flow diagram for conducting hybrid analysis, in accordance with an example implementation. The hybrid analysis can include batch analysis and streaming analysis integrated in a hybrid approach. The flow begins at 601 wherein the management program checks the batch analysis process queues. Each of the entries includes a batch to be executed at a particular time interval. At 602, a check is performed to determine if new analysis processing is ready. During this process, the management program may look at the data to be sent, and look at the process to be utilized on the data. If so (Yes) then the flow proceeds to 603, otherwise (No) the flow returns to 601. At 603, the management program reads the analysis application characteristics table as illustrated in FIG. 5. At 604, the management program determines the available resources by comparing, for example, the amount of data to be analyzed versus the Central Processing Unit (CPU)/Memory available. At 605, a determination is made if the processing is to be batch analysis (conducted in intervals at specified times). If so (Yes), then the flow proceeds to 606, otherwise (No), the flow proceeds to execute the real time analysis as illustrated in FIG. 7.
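Before continuing with the flow at 606, note that the "compression" reduction method mentioned in paragraph [0036], which filters a sequence of identical data, can be sketched as follows (the record format and function name are assumptions for illustration):

```python
def drop_identical_runs(records):
    """Keep only the first record of each run of consecutive identical records."""
    reduced, previous = [], object()   # sentinel that compares unequal to any record
    for record in records:
        if record != previous:
            reduced.append(record)
            previous = record
    return reduced

print(drop_identical_runs([5, 5, 5, 7, 7, 5, 9]))   # [5, 7, 5, 9]
```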
[0038] At 606, the batch program calculates the batch analysis processing time. At
607, a determination is made as to whether the batch analysis processing time would satisfy the requirement of the analysis application characteristics table. If so (Yes), the flow proceeds to 608 to execute the analysis processing. Otherwise (No) the flow proceeds to 609 of FIG. 6(b).
[0039] At 609, as the batch analysis processing time does not satisfy the requirement, the management program refers to the allowed data reduction rate from the analysis application characteristics table. At 610, a determination is made if data adjustment is allowed. If so (Yes), then the flow proceeds to 611 and sends a notification of the data adjustment to the user. At 612, the management program sets the data adjustment to reduce the data rate based on the information of FIG. 5 (e.g., determining allowed data reduction rate and allowed reduction technique), and proceeds to the flow at 608 of FIG. 6(a) to execute the analysis processing.
[0040] If data adjustment is not allowed (No), then the flow proceeds to 613 to trigger the real time analytics adjustment as illustrated in FIG. 7. At 614, a determination is made as to whether the adjustment to real time analytics was effective. If so (Yes), the flow proceeds to 608 to execute the analysis processing. Otherwise (No), the flow proceeds to 615 for overflow processing.
[0041] At 615, an alert is sent to the user to indicate that overtime may be required to execute the requested batch analysis, as there are not enough resources to satisfy the required analysis within the desired time frame. At 616, a check is performed to determine if the user permits overtime processing for the batch analysis. If so (Yes), then the process is canceled at 617 for later processing if requested. Otherwise (No), the flow proceeds to 608 for processing.
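As a rough, non-authoritative sketch of the decision logic of FIGS. 6(a) and 6(b), the batch-side management could be organized as below; all function and field names are assumptions, and the user interaction and the real-time adjustment of FIG. 7 are reduced to stubs:

```python
def manage_batch_process(entry, resources, notify, allow_overtime):
    """Sketch of steps 603-617: decide whether to run, reduce data, or defer."""
    p_t = processing_time_estimate(entry, resources)           # 606: estimate processing time
    if p_t <= entry["batch_period"]:                           # 607: requirement satisfied?
        return "execute"                                       # 608
    if entry["data_reduction_rate"] is not None:               # 609-610: data adjustment allowed?
        notify("data adjustment applied")                      # 611
        entry["data_rate"] *= entry["data_reduction_rate"]     # 612: reduce the data rate
        return "execute"                                       # then 608
    if adjust_real_time_analytics():                           # 613-614: free resources via FIG. 7
        return "execute"
    notify("overtime may be required")                         # 615: alert the user
    return "cancel_and_requeue" if allow_overtime() else "execute"   # 616-617

# Stubs standing in for pieces described elsewhere in the disclosure.
def processing_time_estimate(entry, resources):
    return entry["q_data"] / resources["throughput"]

def adjust_real_time_analytics():
    return False

entry = {"batch_period": 60, "data_reduction_rate": 0.5, "data_rate": 100, "q_data": 9000}
print(manage_batch_process(entry, {"throughput": 100}, print, lambda: True))
```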
[0042] FIG. 7 illustrates the processing flow of the streaming analysis management program, in accordance with an example implementation. At 700, the streaming analysis process begins. At 701, the management program reads the analytics application characteristics table. At 702, the management program monitors the batch analysis process management and determines the throughput of the system. At 703, a check is made to determine if real time analytics adjustment is required. If so (Yes), then the flow proceeds to 704. Otherwise (No), the flow proceeds to 708.
[0043] At 704, the management program determines if data adjustment is allowed.
If so (Yes), then the flow proceeds to 705, otherwise (No) the flow proceeds to 707 to send the user a notification for a failure of the real-time analysis adjustment and proceeds back to 702. At 705, the management program sends a notification to the user regarding the data adjustment. At 706, the management program sets the data adjustment, which can be in the form of data sampling or window size adjustments and then proceeds back to 702.
[0044] At 708, a check is performed to determine if real-time analysis adjustment is set. If so (Yes), then the flow proceeds to 709 to send a notification to the user and no adjustment is set. Otherwise (No), the flow proceeds back to 702.
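A correspondingly small sketch of the monitoring loop of FIG. 7 is given below; the state dictionary, its keys, and the notification strings are placeholders chosen for this illustration only:

```python
def streaming_management_step(state, notify):
    """One pass over steps 702-709: monitor throughput, then set or clear adjustments."""
    throughput = state["measure_throughput"]()              # 702: observe system throughput
    if state["adjustment_required"](throughput):            # 703
        if state["data_adjustment_allowed"]:                # 704
            notify("real-time data adjustment applied")     # 705
            state["apply_adjustment"]()                     # 706: data sampling or window size change
            state["adjustment_set"] = True
        else:
            notify("real-time analysis adjustment failed")  # 707
    elif state["adjustment_set"]:                           # 708: an adjustment is still set
        notify("adjustment no longer required")             # 709
        state["adjustment_set"] = False

state = {
    "measure_throughput": lambda: 500,
    "adjustment_required": lambda t: t > 400,
    "data_adjustment_allowed": True,
    "apply_adjustment": lambda: None,
    "adjustment_set": False,
}
streaming_management_step(state, print)   # prints the step-705 notification
```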
[0045] FIG. 8(a) illustrates an example schedule of resource usage over time, in accordance with an example implementation. In the example of FIG. 8(a), the graph illustrates the case where resources are sufficient and batch analysis and real-time analysis are executed on the resources assigned as specified in the characteristics table. When resources are not sufficient for the analysis, resource allocation for batch analysis is limited first. When the deadline is still satisfied, the batch analysis can be assigned under the limited resources. FIG. 8(b) illustrates an example schedule of resource usage over time with extended resources, when overtime is given. In the situation where resources are further limited and the resource for batch analysis is limited first, and there is allowance for overtime of a given deadline for the batch analysis, the data reduction for batch analysis is applied. In FIG. 8(c), where the data reduction is applied to the customer and the available resources have been completely utilized, the analysis for customer usage and customer relations can be satisfied within the deadline. When resources are further limited and data reduction is applied but overtime of the deadline is caused (assuming overtime is not allowed), the window size adjustment for real-time analysis is applied. In FIG. 8(d), when there is a deadline to meet for the customer relations analysis, the window size for real-time packet processing is reduced so that more resources are allocated to customer relations analysis to meet its deadline.
[0046] FIG. 9 illustrates an example computing environment with an example computer device suitable for use in some example implementations, such as an apparatus to facilitate the management of multiple analytics applications as described herein. Computer device 905 in computing environment 900 can include one or more processing units, cores, or processors 910, memory 915 (e.g., RAM, ROM, and/or the like), internal storage 920 (e.g., magnetic, optical, solid state storage, and/or organic), and/or I/O interface 925, any of which can be coupled on a communication mechanism or bus 930 for communicating information or embedded in the computer device 905.
[0047] Computer device 905 can be communicatively coupled to input/user interface 935 and output device/interface 940. Either one or both of input/user interface 935 and output device/interface 940 can be a wired or wireless interface and can be detachable. Input/user interface 935 may include any device, component, sensor, or interface, physical or virtual, that can be used to provide input (e.g., buttons, touch-screen interface, keyboard, a pointing/cursor control, microphone, camera, braille, motion sensor, optical reader, and/or the like). Output device/interface 940 may include a display, television, monitor, printer, speaker, braille, or the like. In some example implementations, input/user interface 935 and output device/interface 940 can be embedded with or physically coupled to the computer device 905. In other example implementations, other computer devices may function as or provide the functions of input/user interface 935 and output device/interface 940 for a computer device 905.
[0048] Examples of computer device 905 may include, but are not limited to, highly mobile devices (e.g., smartphones, devices in vehicles and other machines, devices carried by humans and animals, and the like), mobile devices (e.g., tablets, notebooks, laptops, personal computers, portable televisions, radios, and the like), and devices not designed for mobility (e.g., desktop computers, other computers, information kiosks, televisions with one or more processors embedded therein and/or coupled thereto, radios, and the like).
[0049] Computer device 905 can be communicatively coupled (e.g., via I/O interface 925) to external storage 945 and network 950 for communicating with any number of networked components, devices, and systems, including one or more computer devices of the same or different configuration. Computer device 905 or any connected computer device can be functioning as, providing services of, or referred to as a server, client, thin server, general machine, special-purpose machine, or another label.
[0050] I/O interface 925 can include, but is not limited to, wired and/or wireless interfaces using any communication or I/O protocols or standards (e.g., Ethernet, 802.11x, Universal Serial Bus, WiMax, modem, a cellular network protocol, and the like) for communicating information to and/or from at least all the connected components, devices, and network in computing environment 900. Network 950 can be any network or combination of networks (e.g., the Internet, local area network, wide area network, a telephonic network, a cellular network, satellite network, and the like).
[0051] Computer device 905 can use and/or communicate using computer-usable or computer-readable media, including transitory media and non-transitory media. Transitory media include transmission media (e.g., metal cables, fiber optics), signals, carrier waves, and the like. Non-transitory media include magnetic media (e.g., disks and tapes), optical media (e.g., CD ROM, digital video disks, Blu-ray disks), solid state media (e.g., RAM, ROM, flash memory, solid-state storage), and other non-volatile storage or memory.

[0052] Computer device 905 can be used to implement techniques, methods, applications, processes, or computer-executable instructions in some example computing environments. Computer-executable instructions can be retrieved from transitory media, and stored on and retrieved from non-transitory media. The executable instructions can originate from one or more of any programming, scripting, and machine languages (e.g., C, C++, C#, Java, Visual Basic, Python, Perl, JavaScript, and others).
[0053] Processor(s) 910 can execute under any operating system (OS) (not shown), in a native or virtual environment. One or more applications can be deployed that include logic unit 960, application programming interface (API) unit 965, input unit 970, output unit 975, and inter-unit communication mechanism 995 for the different units to communicate with each other, with the OS, and with other applications (not shown). The described units and elements can be varied in design, function, configuration, or implementation and are not limited to the descriptions provided.
[0054] In some example implementations, when information or an execution instruction is received by API unit 965, it may be communicated to one or more other units (e.g., logic unit 960, input unit 970, output unit 975). In some instances, logic unit 960 may be configured to control the information flow among the units and direct the services provided by API unit 965, input unit 970, output unit 975, in some example implementations described above. For example, the flow of one or more processes or implementations may be controlled by logic unit 960 alone or in conjunction with API unit 965. The input unit 970 may be configured to obtain input for the calculations described in the example implementations, and the output unit 975 may be configured to provide output based on the calculations described in example implementations.
[0055] Memory 915 can be configured to store analysis information which can involve a plurality of analysis processes, each of the plurality of analysis processes associated with data reduction information such as the data reduction rate and the data reduction process as illustrated in FIG. 5, as well as resource usage information such as CPU and memory usage, conditions for processing the analysis process, and so on.
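By way of illustration only, one possible in-memory layout for such analysis information is sketched below. The field names are assumptions chosen for this example and are not taken from the characteristics table of FIG. 5.

```python
from dataclasses import dataclass

@dataclass
class AnalysisProcessInfo:
    name: str
    process_type: str            # "batch" or "streaming"
    data_reduction_allowed: bool
    data_reduction_rate: float   # e.g. 0.5 keeps half of the source data
    data_reduction_process: str  # e.g. "compression" or "sampling"
    cpu_cores: int               # CPU usage expected to be assigned
    memory_gb: float             # memory usage expected to be assigned
    deadline_seconds: float      # a condition for processing the analysis process

usage_analysis = AnalysisProcessInfo(
    name="customer-usage", process_type="batch", data_reduction_allowed=True,
    data_reduction_rate=0.5, data_reduction_process="compression",
    cpu_cores=4, memory_gb=16.0, deadline_seconds=3600.0)
```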
[0056] Processor(s) 910 can be configured to execute the flow diagrams as illustrated in FIGS. 6(a), 6(b) and 7. For example, processor(s) 910 can be configured to retrieve an analysis process from a queue and determine available system resources to process the analysis process as illustrated in FIG. 6(a). For the analysis process being a batch process, the processor(s) 910 can be configured to calculate processing time for the analysis process based on the determined available system resources as illustrated in FIGS. 6(a) and 6(b). For the analysis process not meeting the processing time, and the data reduction information indicating that adjustment of data is permitted for the analysis process, the processor(s) 910 can be configured to adjust a data rate of incoming data for the analysis process as illustrated in FIGS. 6(a) and 6(b). Further, for the analysis process not meeting the processing time and the data reduction information indicating that adjustment of data is not permitted for the analysis process, processor(s) 910 may be further configured to adjust the incoming data for the analysis process with a streaming process, which can involve conducting at least one of data sampling and adjusting the window size of the incoming data as the incoming data is received, as illustrated in FIGS. 6(a), 6(b) and 7.
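By way of illustration only, the batch-side decision described above can be condensed as follows. The callbacks estimate_time, reduce_data, and adjust_with_streaming are hypothetical stand-ins for the behaviors of FIGS. 6(a) and 6(b), and the toy numbers in the usage line are assumptions made for this example.

```python
def manage_batch_process(deadline_s: float, data_size_gb: float,
                         reduction_allowed: bool, reduction_rate: float,
                         estimate_time, reduce_data, adjust_with_streaming) -> str:
    """Decide how to run one batch analysis retrieved from the queue."""
    if estimate_time(data_size_gb) <= deadline_s:
        return "execute"                               # enough resources as-is
    if reduction_allowed:
        reduce_data()                                  # adjust the data rate of incoming data
        effective_size = data_size_gb * reduction_rate
    else:
        adjust_with_streaming()                        # sampling / window-size adjustment instead
        effective_size = data_size_gb
    return "execute" if estimate_time(effective_size) <= deadline_s else "overflow"

# Toy usage: assume the assignable resources process 1 GB of data every 10 seconds.
print(manage_batch_process(60.0, 8.0, True, 0.5, lambda gb: gb * 10.0,
                           reduce_data=lambda: None, adjust_with_streaming=lambda: None))  # -> execute
```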
[0057] Processor(s) 910 may also be configured to, for the analysis process being a streaming process, monitor throughput rate of the analysis process. Further, for an adjustment of the throughput rate being triggered, the processor(s) 910 can be configured to conduct at least one of data sampling and adjusting window size of the streaming process.
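By way of illustration only, the two streaming adjustments named above, data sampling and window-size reduction, can be sketched as follows; the proportional shrink rule and the one-second floor are assumptions made for this example, not rules stated in the specification.

```python
def sample(records: list, keep_every_nth: int) -> list:
    """Data sampling: keep every n-th record of the stream."""
    return records[::keep_every_nth]

def shrink_window(window_seconds: float, observed_tput: float, target_tput: float) -> float:
    """Window-size adjustment: shrink the window when observed throughput exceeds the target."""
    if observed_tput <= target_tput:
        return window_seconds
    return max(1.0, window_seconds * target_tput / observed_tput)  # keep at least a 1-second window

print(sample(list(range(10)), 2))          # -> [0, 2, 4, 6, 8]
print(shrink_window(60.0, 2000.0, 500.0))  # -> 15.0
```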
[0058] For the analysis process not meeting the processing time after adjustment of the data rate of incoming data for the analysis process, the processor(s) 910 may be configured to place the analysis process in the queue when overtime processing is indicated as being allowed, and to execute the analysis process when overtime processing is indicated as not being allowed; and for the analysis process not meeting the processing time after adjustment of the incoming data for the analysis process with the streaming process, the processor(s) 910 may be configured to place the analysis process in the queue, as illustrated in FIGS. 6(a), 6(b) and 7.
[0059] Further, the processor(s) 910 may be configured to adjust a data rate of incoming data for the analysis process by utilizing a data compression process indicated by the data reduction information as illustrated in FIG. 5, and can be configured to calculate the processing time based on a function involving a number of assignable central processing unit cores, disk input/output throughput to be assigned, and size of the incoming data, as illustrated in the equation described above.
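By way of illustration only, one plausible form of such a processing-time estimate is sketched below, treating the slower of compute and disk I/O as the bottleneck. The actual equation appears earlier in the specification and may differ; the per-core rate parameter and the numbers in the usage line are assumptions introduced for this example.

```python
def estimate_processing_time(data_size_gb: float, assignable_cores: int,
                             per_core_gb_per_s: float, disk_io_gb_per_s: float) -> float:
    """Seconds to process the incoming data, limited by the slower of compute and disk I/O."""
    compute_rate = assignable_cores * per_core_gb_per_s   # GB/s the assigned cores can analyze
    bottleneck = min(compute_rate, disk_io_gb_per_s)
    return data_size_gb / bottleneck

# 8 cores at 0.05 GB/s each gives 0.4 GB/s of compute, slower than 0.5 GB/s of disk I/O.
print(estimate_processing_time(100.0, 8, 0.05, 0.5))  # -> 250.0 seconds
```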
[0060] Through application of the example implementations described herein, the effects can include more efficient use of the limited resources for processing multiple analytics applications, and processing of multiple analytics applications in a manner that keeps the analytics deadline within the allowed analytics precision.
[0061] Example implementations are directed to a method and apparatus to manage preprocessing of source data targeted for analytics to adjust the size of data obtained as a result of the preprocessing in advance of analytics processing in accordance with analytics resource requirement prediction. Example implementations are also directed to the method and apparatus to adjust the window size of the streaming data processing in real-time analytics processing in accordance with the resource requirement prediction.
[0062] The effects of the example implementations of the present disclosure can include more efficient use of the limited resources for processing multiple analytics applications and processing of multiple analytics applications for keeping the analytics deadline within the allowed precision.
[0063] Some portions of the detailed description are presented in terms of algorithms and symbolic representations of operations within a computer. These algorithmic descriptions and symbolic representations are the means used by those skilled in the data processing arts to convey the essence of their innovations to others skilled in the art. An algorithm is a series of defined steps leading to a desired end state or result. In example implementations, the steps carried out require physical manipulations of tangible quantities for achieving a tangible result.
[0064] Unless specifically stated otherwise, as apparent from the discussion, it is appreciated that throughout the description, discussions utilizing terms such as "processing," "computing," "calculating," "determining," "displaying," or the like, can include the actions and processes of a computer system or other information processing device that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system's memories or registers or other information storage, transmission or display devices.
[0065] Example implementations may also relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may include one or more general-purpose computers selectively activated or reconfigured by one or more computer programs. Such computer programs may be stored in a computer readable medium, such as a computer-readable storage medium or a computer-readable signal medium. A computer-readable storage medium may involve tangible mediums such as, but not limited to, optical disks, magnetic disks, read-only memories, random access memories, solid state devices and drives, or any other types of tangible or non-transitory media suitable for storing electronic information. A computer-readable signal medium may include mediums such as carrier waves. The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Computer programs can involve pure software implementations that involve instructions that perform the operations of the desired implementation.
[0066] Various general-purpose systems may be used with programs and modules in accordance with the examples herein, or it may prove convenient to construct a more specialized apparatus to perform desired method steps. In addition, the example implementations are not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the example implementations as described herein. The instructions of the programming language(s) may be executed by one or more processing devices, e.g., central processing units (CPUs), processors, or controllers.
[0067] As is known in the art, the operations described above can be performed by hardware, software, or some combination of software and hardware. Various aspects of the example implementations may be implemented using circuits and logic devices (hardware), while other aspects may be implemented using instructions stored on a machine-readable medium (software), which if executed by a processor, would cause the processor to perform a method to carry out implementations of the present application. Further, some example implementations of the present application may be performed solely in hardware, whereas other example implementations may be performed solely in software. Moreover, the various functions described can be performed in a single unit, or can be spread across a number of components in any number of ways. When performed by software, the methods may be executed by a processor, such as a general purpose computer, based on instructions stored on a computer-readable medium. If desired, the instructions can be stored on the medium in a compressed and/or encrypted format.
[0068] Moreover, other implementations of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the teachings of the present application. Various aspects and/or components of the described example implementations may be used singly or in any combination. It is intended that the specification and example implementations be considered as examples only, with the true scope and spirit of the present application being indicated by the following claims.

Claims

CLAIMS

What is claimed is:
1. An apparatus, comprising: a memory configured to store analysis information comprising a plurality of analysis processes, each of the plurality of analysis processes associated with data reduction information; and, a processor, configured to: retrieve an analysis process from a queue; determine available system resources to process the analysis process; for the analysis process being a batch process: calculate processing time for the analysis process based on the determined available system resources; for the analysis process not meeting the processing time, and the data reduction information indicating adjustment of data is permitted for the analysis process, adjust a data rate of incoming data for the analysis process; and for the analysis process not meeting the processing time and the data reduction information indicating adjustment of data is not permitted for the analysis process, adjust the incoming data for the analysis process with a streaming process.
2. The apparatus of claim 1, wherein the processor is further configured to: for the analysis process being a streaming process: monitor a throughput rate of the analysis process; for an adjustment of the throughput rate being triggered, conduct at least one of data sampling and adjusting window size of the streaming process.
3. The apparatus of claim 1, wherein the processor is configured to adjust the incoming data for the analysis process with the streaming process by conducting at least one of data sampling and adjusting window size of the incoming data as the incoming data is received.
4. The apparatus of claim 1, wherein the processor is configured to: for the analysis process not meeting the processing time after adjustment of the data rate of incoming data for the analysis process, place the analysis process in the queue for overtime processing indicated as being allowed, and execute the analysis process for the overtime processing indicated as not being allowed; and for the analysis process not meeting the processing time after adjustment of the incoming data for the analysis process with the streaming process, place the analysis process in the queue.
5. The apparatus of claim 1, wherein the processor is configured to adjust a data rate of incoming data for the analysis process by utilizing a data compression process indicated by the data reduction information.
6. The apparatus of claim 1, wherein the processor is configured to calculate the processing time based on a function involving a number of assignable central processing unit cores, disk input/output throughput to be assigned, and size of the incoming data.
7. A method, comprising: storing analysis information comprising a plurality of analysis processes, each of the plurality of analysis processes associated with data reduction information; retrieving an analysis process from a queue; determining available system resources to process the analysis process; for the analysis process being a batch process: calculating processing time for the analysis process based on the determined available system resources; for the analysis process not meeting the processing time, and the data reduction information indicating adjustment of data is permitted for the analysis process, adjusting a data rate of incoming data for the analysis process; and for the analysis process not meeting the processing time and the data reduction information indicating adjustment of data is not permitted for the analysis process, adjusting the incoming data for the analysis process with a streaming process.
8. The method of claim 7, further comprising: for the analysis process being a streaming process: monitoring a throughput rate of the analysis process; for an adjustment of the throughput rate being triggered, conducting at least one of data sampling and adjusting window size of the streaming process.
9. The method of claim 7, wherein the adjusting the incoming data for the analysis process with a streaming process comprises conducting at least one of data sampling and adjusting window size of the incoming data as the incoming data is received.
10. The method of claim 7, further comprising: for the analysis process not meeting the processing time after adjustment of the data rate of incoming data for the analysis process, placing the analysis process in the queue for overtime processing indicated as being allowed, and executing the analysis process for the overtime processing indicated as not being allowed; and for the analysis process not meeting the processing time after adjustment of the incoming data for the analysis process with the streaming process, placing the analysis process in the queue.
11. The method of claim 7, wherein the adjusting a data rate of incoming data for the analysis process comprises utilizing a data compression process indicated by the data reduction information.
12. The method of claim 7, wherein the calculating the processing time is based on a function involving a number of assignable central processing unit cores, disk input/output throughput to be assigned, and size of the incoming data.
13. A computer program storing instructions for executing a process, the instructions, comprising: storing analysis information comprising a plurality of analysis processes, each of the plurality of analysis processes associated with data reduction information; retrieving an analysis process from a queue; determining available system resources to process the analysis process; for the analysis process being a batch process: calculating processing time for the analysis process based on the determined available system resources; for the analysis process not meeting the processing time, and the data reduction information indicating adjustment of data is permitted for the analysis process, adjusting a data rate of incoming data for the analysis process; and for the analysis process not meeting the processing time and the data reduction information indicating adjustment of data is not permitted for the analysis process, adjusting the incoming data for the analysis process with a streaming process.
14. The computer program of claim 13, wherein the instructions further comprise: for the analysis process being a streaming process: monitoring a throughput rate of the analysis process; for an adjustment of the throughput rate being triggered, conducting at least one of data sampling and adjusting window size of the streaming process.
15. The computer program of claim 13, wherein the adjusting the incoming data for the analysis process with a streaming process comprises conducting at least one of data sampling and adjusting window size of the incoming data as the incoming data is received.

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2015/053298 WO2017058214A1 (en) 2015-09-30 2015-09-30 Method and apparatus to manage multiple analytics applications


Publications (1)

Publication Number Publication Date
WO2017058214A1 true WO2017058214A1 (en) 2017-04-06

Family

ID=58427781


Country Status (1)

Country Link
WO (1) WO2017058214A1 (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2003021377A2 (en) * 2001-08-31 2003-03-13 Op40, Inc. Enterprise information system
US20130232249A1 (en) * 2003-04-28 2013-09-05 Akamai Technologies, Inc. Forward request queuing in a distributed edge processing environment
US20070053446A1 (en) * 2005-09-02 2007-03-08 Skipjam Corp. System and method for automatic adjustment of streaming video bit rate
WO2015131961A1 (en) * 2014-03-07 2015-09-11 Systema Systementwicklung Dip.-Inf. Manfred Austen Gmbh Real-time information systems and methodology based on continuous homomorphic processing in linear information spaces

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10007704B1 (en) 2017-02-03 2018-06-26 International Business Machines Corporation Window management in a stream computing environment
US10339142B2 (en) 2017-02-03 2019-07-02 International Business Machines Corporation Window management in a stream computing environment
US10346408B2 (en) 2017-02-03 2019-07-09 International Business Machines Corporation Window management in a stream computing environment
US10417235B2 (en) 2017-02-03 2019-09-17 International Business Machines Corporation Window management in a stream computing environment


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15905596

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15905596

Country of ref document: EP

Kind code of ref document: A1