US20200334293A1 - Computation platform agnostic data classification workflows - Google Patents

Computation platform agnostic data classification workflows

Info

Publication number
US20200334293A1
Authority
US
United States
Prior art keywords
classification
experiment
workflow
transformation
transformation block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US16/916,040
Inventor
Szymon Piechowicz
Barak Reuven Naveh
Annie Hsin-Wen Liu
Ashish Gupta
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Meta Platforms Inc
Original Assignee
Facebook Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Facebook Inc
Priority to US16/916,040
Publication of US20200334293A1
Assigned to FACEBOOK, INC. reassignment FACEBOOK, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: GUPTA, ASHISH, LIU, ANNIE HSIN-WEN, NAVEH, BARAK REUVEN, PIECHOWICZ, SZYMON
Assigned to META PLATFORMS, INC. reassignment META PLATFORMS, INC. CHANGE OF NAME (SEE DOCUMENT FOR DETAILS). Assignors: FACEBOOK, INC.

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists

Definitions

  • Data classifiers are popular tools for analyzing data produced or otherwise collected by large computer networks. Data classification enables a computer network to analyze and react to a large and evolving data set. Data classifiers can process large data sets (e.g., sometimes referred to as “big data”) that are so large and/or complex that manual data analysis is impracticable. For example, a social networking system can run several application services that continuously produce and collect data. Classifiers can be used to identify new correlations, statistics, trends, patterns, or any combination thereof in datasets of the social networking system. For example, data classification can rely on static rules or evolving machine learning models.
  • To complete a data classification experiment involving machine learning models, a computer system may need to extract and transform input data, train and update machine learning models, deliberate and execute machine learning models, compile and/or summarize the classification results, test or evaluate the classification results, or any combination thereof.
  • These actions often consume a large amount of computational resources (e.g., memory capacity, processor capacity, and/or network bandwidth) and require data scientists' or developers' involvement to repeatedly configure each operational step from one data classification experiment to another.
  • FIG. 1 is a data flow diagram illustrating an example of a classification experiment running on a computer system, in accordance with various embodiments.
  • FIG. 2 is a block diagram illustrating an example of a social networking system that incorporates a classification platform system to facilitate classification experiments, in accordance with various embodiments.
  • FIG. 3 is a block diagram illustrating a classification platform system that manages classification experiments, in accordance with various embodiments.
  • FIG. 4 is a block diagram illustrating an example of a classification experiment configuration, in accordance with various embodiments.
  • FIG. 5 is a block diagram illustrating an example of a transformation block memoization database, in accordance with various embodiments.
  • FIG. 6 is a flow chart illustrating a method of operating a classification platform system to create a classification experiment, in accordance with various embodiments.
  • FIG. 7 is a flow chart illustrating a method of operating a classification platform system to execute a classification experiment, in accordance with various embodiments.
  • FIG. 8 is a block diagram of an example of a computing device, which may represent one or more computing device or server described herein, in accordance with various embodiments.
  • Various embodiments are directed to a classification platform system that facilitates definition of a classification experiment by at least defining a directed graph (DG).
  • the classification platform system can provide a graphical user interface to graphically place, connect, and arrange transformation blocks within a DG.
  • the DG can be a directed acyclical graph (DAG).
  • a transformation block can be a data processing operator that defines the computation logic and executable instructions to transform input data into some sort of output data.
  • the classification platform system can be implemented in a social networking system.
  • a classification experiment can pertain to all or a subset of the process of analyzing data sets (e.g., heterogeneous data sets) to place them into categories.
  • a classification experiment can include a DG-represented workflow comprising one or more repeatable functions of feature extraction, data transformation, data labeling, machine learning model training, classifier deliberation (e.g., utilizing a classifier model to classify data), classification result summarizations, classification result evaluations, or any combination thereof.
  • Portions or the whole of a DG-represented workflow can be repeatedly used by the classification platform system using different input data without manual reconfiguration (e.g., by a developer account or a data scientist account).
  • Classification experiments can include, for example, content category classifiers, event classifiers, quality detection classifiers (e.g., junk/spam filters), content deduplication classifiers, or any combination thereof.
  • the classification platform system can be coupled to various data sources (e.g., static or live sources) from which a classification experiment can source its data in each run of the classification experiment.
  • the classification platform system can implement an experiment definition user interface accessible via a Web server or an application coupled to an application programming interface (API).
  • API application programming interface
  • the classification platform system can receive configuration parameters of a classification experiment via the experiment definition user interface.
  • the configuration parameters can define, for a classification experiment, a prediction space (e.g., which data sources to use and what features to extract to make classification predictions), a labeled data space (e.g., where to find pre-classified labeled data for classifier training and evaluation), a workflow configuration (e.g., parameters describing one or more DGs representing an experiment workflow), a domain configuration (e.g., parameters binding the prediction space, the label data source, and the workflow configuration together), or any combination thereof.
  • the classification platform system can utilize the labeled data space with supervised or semi-supervised machine learning algorithms.
  • the workflow configuration can be defined as a DG connecting transformation blocks (e.g., feature extractors, classifier trainers, classifier predictors, filters, data transformers, statistical functions, other logical transformation functions, or any combination thereof).
  • a classifier predictor can be a data classification algorithm configured by a trained classifier model.
  • a DG can be defined at least partially by incorporating one or more other existing DGs stored in the classification platform system.
  • the DG can specify how one or more outputs of a transformation block are fed into one or more other transformation blocks.
  • the DG and the domain configuration can further bind input data from the prediction space and/or labeled data space to at least one of the transformation blocks.
  • a transformation block can dynamically modify the DG during execution.
  • the DG can be graphically edited and created on a graphical user interface of the classification platform system.
  • the classification platform system reduces cycle time and avoids unnecessary computation by memoizing the outputs of transformation blocks.
  • During execution of numerous data transformation workflows within a single day, there can be simultaneously and/or consecutively executing workflows. These workflows can avoid re-computation by matching configurations of their transformation blocks against a memoization database to extract pre-computed results.
  • One or more inputs of a transformation block are not always directly from the prediction space and/or the labeled data space.
  • an output of a transformation block can be an input of another transformation block.
  • an output of a data classifier can be an input of another data classifier.
  • a compiler system on the classification platform system can inspect a DG representation and output a data structure representing an execution schedule.
  • One such data structure can include one or more pipelines for training one or more classifier models, making classification predictions, and/or transforming data into other intermediate representations that can be used in one or more online services of a social networking system.
  • the data structure can be interpretable by different distributed computation platforms. Each computation platform can be configured to execute at least part of the DG representing an experiment workflow under a different paradigm (e.g., data parallelism, task parallelism, declarative programming execution, imperative programming execution, or any combination thereof).
  • the classification platform system can modularize building and usage of classifiers to be used across classification experiments.
  • the classification platform system can also remove the dependency of classification workflows from the computation platforms that execute them.
  • the classification platform system can template transformation blocks to facilitate creation of a classification experiment workflow.
  • the classification platform system can store data features and transformation block outputs to be reused across various classification experiments.
  • the classification platform system can schedule a workflow for updating (e.g., training and/or retraining) of machine learning models and a workflow for making classifications and/or predictions utilizing the trained machine learning models.
  • the classification platform system can productionalize the classification experiments to deliver deliberation results directly to consumer entities (e.g., applications, user accounts, and/or services within a social networking system) of the classification experiments.
  • the consumer entities can include a developer interface, one or more application services, and/or one or more data repositories.
  • the classification platform system can reduce cycle time and computation resources via memoization.
  • FIG. 1 is a data flow diagram illustrating an example of a classification experiment 100 running on a computer system, in accordance with various embodiments.
  • the computer system is a social networking system.
  • the classification experiment 100 can be defined by various configuration parameters, including a workflow DG 102 .
  • the workflow DG 102 can be represented by one or more transformation blocks directionally connected to one another.
  • a transformation block is a data processing operator.
  • the workflow DG 102 can form a pipeline (e.g., represented by a directed graph) out of a plurality of transformation blocks.
  • the directed graph can be a directed acyclical graph.
  • a classification platform system of the computer system can facilitate definition of classification experiments, e.g., the classification experiment 100 .
  • the classification platform system can be connected to one or more data sources (e.g., a live data source 106 A, a static data source 106 B, etc., collectively as the “data sources 106 ”).
  • the classification experiment 100 can source input data from at least a subset of the data sources.
  • the configuration parameters can identify the selected data sources utilized by at least some of the transformation blocks in the workflow DG 102 .
  • the workflow DG 102 can include a first feature extraction block 110 , a second feature extraction block 112 , a feature transformation block 116 , a classifier trainer 120 , a first classifier deliberation block 124 , and a second classifier deliberation block 128 .
  • the first feature extraction block 110 can produce a first feature set 132 , and the first feature set 132 can be consumed by (e.g., act as an input for) the feature transformation block 116 .
  • the second feature extraction block 112 can produce a second feature set 136 , and the second feature set 136 can be consumed by the first classifier deliberation block 124 .
  • the feature transformation block 116 can produce a transformed feature set 140 , and the transformed feature set can be consumed by the classifier trainer 120 .
  • the classifier trainer 120 can produce a machine learning model 144 , and the machine learning model 144 can be used to configure the first classifier deliberation block 124 .
  • the first classifier deliberation block 124 can produce a first classifier result set 148 , and the first classifier result set 148 can be consumed by the second classifier deliberation block 128 .
  • the second classifier deliberation block 128 can produce a second classifier result set 152 , and the second classifier result set 152 can be the final output of the classification experiment 100 .
  • the first feature extraction block 110 and the second feature extraction block 112 output their respective feature sets by filtering and/or selecting data entries from the data sources 106 , according to a set of one or more criteria.
  • the feature transformation block 116 generates the transformed feature set 140 by reducing, extrapolating, summarizing, normalizing, converting, or any combination thereof, data features provided in the first feature set 132 .
  • the data sources 106 constitute the inputs to the workflow DG 102 of the classification experiment 100 .
  • the first feature set 132 and the second feature set 136 are pre-generated before the classification experiment 100 begins.
  • the classification platform system can store features generated by a previously run classification experiment for use by subsequent classification experiments.
  • the first feature set 132 and the second feature set 136 are the inputs to the workflow DG 102 .
  • the computer system can store a representation of the workflow DG 102 .
  • the workflow DG 102 is formed as a collection of transformation blocks and directed edges. Each edge can connect one transformation block to another directionally.
  • the workflow DG 102 can be acyclical.
  • the acyclical nature of the workflow DG 102 can ensure that there is no way to start at a transformation block and follow a sequence of edges that eventually loops back to the same transformation block. This prevents execution of a workflow from falling into an infinite/indefinite loop.
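  • As an illustration only (block names and data structures are hypothetical, not taken from the patent), the FIG. 1 workflow could be represented as a set of directed edges between transformation blocks, with an acyclicity check such as Kahn's algorithm guarding against loops:

```python
# Minimal sketch (names are illustrative, not from the patent): the FIG. 1
# workflow DG as directed edges between transformation blocks, plus an
# acyclicity check using Kahn's algorithm.
from collections import defaultdict, deque

# producer block -> consumer block, mirroring FIG. 1
edges = [
    ("feature_extraction_1", "feature_transformation"),     # first feature set 132
    ("feature_extraction_2", "classifier_deliberation_1"),  # second feature set 136
    ("feature_transformation", "classifier_trainer"),       # transformed feature set 140
    ("classifier_trainer", "classifier_deliberation_1"),    # machine learning model 144
    ("classifier_deliberation_1", "classifier_deliberation_2"),  # first result set 148
]

def is_acyclic(edges):
    """Return True if every block can be placed in a topological order."""
    graph, indegree, nodes = defaultdict(list), defaultdict(int), set()
    for src, dst in edges:
        graph[src].append(dst)
        indegree[dst] += 1
        nodes.update((src, dst))
    ready = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while ready:
        node = ready.popleft()
        visited += 1
        for nxt in graph[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    return visited == len(nodes)

assert is_acyclic(edges)  # no sequence of edges loops back to its starting block
```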
  • FIG. 2 is a block diagram illustrating an example of a social networking system 200 that incorporates a classification platform system 204 to facilitate classification experiments (e.g., the classification experiment 100 of FIG. 1 ), in accordance with various embodiments.
  • the social networking system 200 can include one or more application services (e.g., an application service 208 A and an application service 208 B, collectively as the “application services 208 ”).
  • the application services 208 can generate and/or collect data from one or more client devices 212 via one or more consumer interfaces 216 (e.g., a web interface 216 A, an application programming interface (API) 216 B, or a combination thereof, collectively as the “consumer interfaces 216 ”).
  • the application services 208 can process client interactions (e.g., content request, interaction with social network objects via a user interface of the social networking system 200 , user-generated content updates, etc.) in real-time.
  • client interactions can be considered “live traffic” that are logged in a data center 220 (e.g., including a data repository 222 A and a data repository 222 B, collectively as the “data repositories 222 ”).
  • the application services 208 can include a search engine, a photo editing tool, a location-based tool, an advertisement service, a media service, an interactive content service, a messaging service, a social networking service, or any combination thereof.
  • the application service 208 A is a location logger that stores geographical locations of user accounts of the social networking system 200 in a live location database in the data center 220 . Because the location logger continuously updates the live location database, the live location database can be considered a live data source (e.g., the live data source 106 A of FIG. 1 ).
  • the application service 208 B stores static user-generated content in a content database in the data center 220 . Because each unique piece of content, once uploaded, remains static, the content database can be considered a static data source (e.g., the static data source 106 B of FIG. 1 ).
  • the classification platform system 204 can facilitate privileged users (e.g., developer accounts and/or data scientist accounts) of the social networking system 200 to run classification experiments based on social network data collected by the application services 208 .
  • the classification platform system 204 can facilitate creation of new experiments on a user interface.
  • Each classification experiment can be repeatedly run with or without modifying its configuration parameters.
  • Each classification experiment can correspond to at least one workflow DG (e.g., the workflow DG 102 of FIG. 1 ).
  • the social networking system 200 can include distributed computation systems (e.g., a distributed computation system 230 A and a distributed computation system 230 B, collectively as the “distributed computation systems 230 ”).
  • the distributed computation systems 230 can include computation platforms of different types as well as similar computation platforms for redundancy.
  • Each of the distributed computation systems 230 can include one or more computing devices (e.g., computer servers).
  • the classification platform system 204 can schedule the classification experiments to be run on one of the distributed computation systems 230 . For example, based on the dependencies of transformation blocks indicated in a workflow DG, the classification platform system 204 can determine which of the transformation blocks to schedule on the selected distributed computation system before other transformation blocks of the workflow DG are scheduled.
  • the classification platform system 204 can provide input data or links to the input data of the scheduled transformation blocks to the selected distributed computation system.
  • the output results of the executed transformation blocks are stored back into a memoization repository 240 of the classification platform system 204 .
  • the final output of the last transformation block (e.g., a transformation block that is not attached to an outgoing edge and does not feed its output result to another transformation block) of a workflow DG is marked as the classification results of the classification experiment.
  • the results of a classification experiment can be fed back to the application services 208 to modify decision-making logic of at least one of the application services 208 .
  • FIG. 3 is a block diagram illustrating a classification platform system 300 that manages classification experiments (e.g., the classification experiment 100 of FIG. 1 ), in accordance with various embodiments.
  • the classification platform system 300 includes an experiment management engine 304 , a real-time workflow execution engine 308 , a batch workflow execution engine 312 , an experiment repository 316 , a workflow repository 320 , a feature repository 322 , a data source interface 324 , a transformation block repository 326 , a memoization database 330 , or any combination thereof.
  • the experiment management engine 304 facilitates the definition of one or more classification experiments (e.g., the classification experiment 100 of FIG. 1 ).
  • the experiment management engine 304 can include an experiment definition user interface 334 .
  • the experiment definition user interface 334 can be a user interface implemented as a webpage, a website comprising interconnected webpages, an API coupled to a computer application or mobile application, or any combination thereof.
  • the experiment definition user interface 334 can receive configuration parameters from a client device for defining and/or updating a classification experiment.
  • the configuration parameters can be received as text formatted in a markup language or as a series of user interactions with a graphical user interface of the experiment definition user interface 334 .
  • a workflow DG of the classification experiment can be defined by visually drawing, placing and arranging block shapes representing the transformation blocks of the workflow DG and directional connections (e.g., arrows) between the block shapes.
  • the experiment definition user interface 334 can also enable a client device to monitor the scheduling and execution of the classification experiment. In some embodiments, the experiment definition user interface 334 enables the client device to track the classification results of the classification experiment. In some embodiments, the experiment definition user interface 334 further receives instructions to connect the classification results to one or more external application services from a developer account.
  • the classification platform system 300 can include a plurality of workflow execution engines implemented to manage one or more distributed computation systems (e.g., the distributed computation systems 230 of FIG. 2 ).
  • the real-time workflow execution engine 308 can be configured to execute a classification experiment on a distributed computation system by classifying live data in substantially real-time.
  • the batch workflow execution engine 312 can be configured to execute a classification experiment by classifying batches of constant size data sets.
  • each of the workflow execution engines can facilitate execution of an experiment workflow of a classification experiment.
  • the workflow execution engines can manage and schedule execution of code packages associated with the transformation blocks in the experiment workflow.
  • a workflow execution engine can select computing devices to run specific transformation blocks of the experiment workflow, distribute code packages corresponding to the transformation blocks, distribute references or links to input datasets of the experiment workflow, and/or schedule execution of the code packages on the computing devices.
  • the workflow execution engine can also ensure load-balancing and resource consumption minimization when scheduling the experiment workflow for execution on the selected computing devices (e.g., by managing the selection of the computing devices, distributing appropriate code packages, and/or streaming the input datasets or links thereto ahead of execution schedule while minimizing network bandwidth).
  • the workflow execution engine can schedule execution of the workflow by analysis of the experiment workflow indicated by the workflow configuration (e.g., a DG of transformation blocks) to avoid bottlenecks, errors, and inconsistencies.
  • the workflow execution engine can also schedule the execution of the experiment workflow based on statuses of currently running classification experiments, health data and operation states of the data sources, and/or scheduled execution times of scheduled classification experiments. For example, the workflow execution engine can ensure that a transformation block that requires the output of another transformation block is not executed in parallel with that other transformation block.
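  • A hedged sketch of this dependency-aware scheduling: transformation blocks can be grouped into waves so that a block is never scheduled in parallel with a block whose output it needs. The function and block names below are illustrative assumptions, not the patent's scheduler:

```python
# Hedged sketch of dependency-aware scheduling: group transformation blocks
# into waves so a block never runs in parallel with a block it depends on.
# Block and function names are illustrative assumptions.
from collections import defaultdict

def schedule_in_waves(edges):
    """Each wave only depends on blocks from earlier waves."""
    dependents, indegree, nodes = defaultdict(list), defaultdict(int), set()
    for upstream, downstream in edges:
        dependents[upstream].append(downstream)
        indegree[downstream] += 1
        nodes.update((upstream, downstream))
    wave, waves = [n for n in nodes if indegree[n] == 0], []
    while wave:
        waves.append(sorted(wave))
        next_wave = []
        for block in wave:
            for dep in dependents[block]:
                indegree[dep] -= 1
                if indegree[dep] == 0:
                    next_wave.append(dep)
        wave = next_wave
    return waves

edges = [("extract", "transform"), ("transform", "train"),
         ("extract_labels", "train"), ("train", "deliberate")]
for i, wave in enumerate(schedule_in_waves(edges)):
    print(f"wave {i}: {wave}")  # blocks within a wave may run in parallel
```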
  • the classification platform system 300 includes the experiment repository 316 , the workflow repository 320 , and the transformation block repository 326 to facilitate definition of classification experiments.
  • the experiment repository 316 stores definitions of previous classification experiments. Any of the classification experiments in the experiment repository 316 can be executed again using different input data. A new classification experiment can also inherit the configuration and definition of a previous classification experiment.
  • the workflow repository 320 stores workflow configurations used in previous classification experiments. For example, the workflow configurations can be represented by a DG comprising transformation blocks connected to one another. A new classification experiment can import one or more DGs and/or workflow configurations from the workflow repository 320 .
  • the transformation block repository 326 stores definitions of transformation blocks.
  • the definition of a transformation block includes logic to transform input data into some sort of output data.
  • the experiment management engine 304 can import one or more transformation blocks into the new DG from the transformation block repository 326 .
  • the transformation blocks can be defined by at least an input data schema (e.g., one or more input data formats/types), a transformative function (e.g., executable instructions to transform the input data), and an output data schema (e.g., one or more output data format/types).
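  • For illustration, a transformation block defined by an input data schema, a transformative function, and an output data schema might look like the following sketch; the class name and schema representation are assumptions, not the patent's API:

```python
# Illustrative sketch only: a transformation block defined by an input data
# schema, a transformative function, and an output data schema, with schema
# checks around execution. The class and field names are assumptions.
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

@dataclass
class TransformationBlock:
    name: str
    input_schema: Dict[str, type]    # expected input field -> type
    output_schema: Dict[str, type]   # produced output field -> type
    transform: Callable[[List[Dict[str, Any]]], List[Dict[str, Any]]]

    def run(self, rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        for row in rows:
            for field, typ in self.input_schema.items():
                if not isinstance(row.get(field), typ):
                    raise TypeError(f"{self.name}: bad input field {field!r}")
        out = self.transform(rows)
        for row in out:
            for field, typ in self.output_schema.items():
                if not isinstance(row.get(field), typ):
                    raise TypeError(f"{self.name}: bad output field {field!r}")
        return out

normalize = TransformationBlock(
    name="normalize_score",
    input_schema={"score": float},
    output_schema={"score": float},
    transform=lambda rows: [{"score": min(r["score"] / 100.0, 1.0)} for r in rows],
)
print(normalize.run([{"score": 87.0}]))  # [{'score': 0.87}]
```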
  • the data source interface 324 can be an API for the classification platform system 300 to connect with one or more databases storing data features (e.g., input data for classifiers) and/or precursors to data features.
  • the data source interface 324 can also be coupled to one or more application services that generate data features and/or precursors to data features.
  • the classification platform system 300 includes the feature repository 322 and the memoization database 330 to reduce wasting of computational resources when executing a classification experiment.
  • the feature repository 322 stores one or more feature data sets that can be identified as the input data for a classification experiment.
  • the memoization database 330 stores one or more outputs of a known transformation block.
  • the real-time workflow execution engine 308 or the batch workflow execution engine 312 can determine whether to schedule the execution of a transformation block in the experiment workflow.
  • the workflow execution engine can determine whether to schedule the execution of the transformation block by matching the identity of the transformation block and its input data set against a lookup table of the memoization database 330 . This process enables the workflow execution engine to avoid having to execute the same transformation block more than once.
  • Functional components associated with the social networking system 200 and/or the classification platform system 300 can be implemented as a combination of circuitry, firmware, software, or other executable instructions.
  • the functional components can be implemented in the form of special-purpose circuitry, in the form of one or more appropriately programmed processors, a single board chip, a field programmable gate array, a network-capable computing device, a virtual machine, a cloud computing environment, or any combination thereof.
  • the functional components described can be implemented as instructions on a tangible storage memory capable of being executed by a processor or other integrated circuit chip.
  • the tangible storage memory may be volatile or non-volatile memory. In some embodiments, the volatile memory may be considered “non-transitory” in the sense that it is not a transitory signal. Memory space and storages described in the figures can be implemented with the tangible storage memory as well, including volatile or non-volatile memory.
  • Each of the functional components may operate individually and independently of other functional components. Some or all of the functional components may be executed on the same host device or on separate devices. The separate devices can be coupled through one or more communication channels (e.g., wireless or wired channel) to coordinate their operations. Some or all of the functional components may be combined as one component. A single functional component may be divided into sub-components, each sub-component performing a separate method step or method steps of the single component.
  • At least some of the functional components share access to a memory space.
  • one functional component may access data accessed by or transformed by another functional component.
  • the functional components may be considered “coupled” to one another if they share a physical connection or a virtual connection, directly or indirectly, allowing data accessed or modified by one functional component to be accessed in another functional component.
  • at least some of the functional components can be upgraded or modified remotely (e.g., by reconfiguring executable instructions that implement a portion of the functional components).
  • Other arrays, systems and devices described above may include additional, fewer, or different functional components for various applications.
  • FIG. 4 is a block diagram illustrating an example of a classification experiment configuration 400 , in accordance with various embodiments.
  • the classification experiment configuration 400 defines the parameters of a classification experiment.
  • the classification experiment configuration 400 can include a prediction space definition 404 , a labeled data space definition 408 , a workflow configuration 412 , a domain configuration 416 , or any combination thereof.
  • the prediction space definition 404 can define data sets or open-ended data streams to run the trained classifier model on.
  • the labeled data space definition 408 can define data sets for classifier training and/or classifier evaluation.
  • the workflow configuration 412 defines an experiment workflow of the classification experiment.
  • the workflow configuration 412 can define a DG representative of the experiment workflow.
  • the domain configuration 416 can bind the workflow configuration 412 with the labeled data space definition 408 and/or the prediction space definition 404 .
  • the domain configuration 416 can also enable various classification experiment execution features supported by a classification platform system (e.g., the classification platform system 300 of FIG. 3 ).
  • the execution features can include a checkpoint feature that prevents regression of a classifier model by evaluating, in real-time, whether a transformation block is malfunctioning, and whether to terminate a scheduled run or an executing run of an experiment workflow.
  • the execution features can also include an active learning feature that actively requests labeling of data entries from a user account (e.g., a user account of the social networking system 200 of FIG. 2 ). The active learning feature can occur after an experiment workflow has been executed or during the execution of the experiment workflow.
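  • A hypothetical configuration corresponding to FIG. 4 might bind these pieces together as follows; the field names and values are illustrative assumptions rather than a documented format:

```python
# Hypothetical experiment configuration mirroring FIG. 4. All keys and values
# are illustrative assumptions, not a documented format.
experiment_config = {
    "prediction_space": {                       # what to classify (404)
        "data_sources": ["live_location_db", "content_db"],
        "features": ["text", "geo_region", "timestamp"],
    },
    "labeled_data_space": {                     # training/evaluation labels (408)
        "data_source": "human_labeled_spam_examples",
        "label_field": "is_spam",
    },
    "workflow_configuration": {                 # DG of transformation blocks (412)
        "blocks": ["extract", "transform", "train", "deliberate"],
        "edges": [["extract", "transform"],
                  ["transform", "train"],
                  ["train", "deliberate"]],
    },
    "domain_configuration": {                   # binds the spaces to the workflow (416)
        "bindings": {"extract": "prediction_space", "train": "labeled_data_space"},
        "checkpoint_enabled": True,             # terminate runs when a block malfunctions
        "active_learning_enabled": False,       # request labels from user accounts
    },
}
```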
  • FIG. 5 is a block diagram illustrating an example of a transformation block memoization database 500 , in accordance with various embodiments.
  • the transformation block memoization database 500 can be a lookup table.
  • the keys to the lookup table can be an input identifier 502 , a transformation block identifier 506 , a transformation block version identifier 510 , a transformation block configuration identifier 512 , or any combination thereof.
  • the lookup table can include multiple rows.
  • the potential return values of the lookup table are memoized results 520 of previously executed transformation blocks.
  • a query request to the transformation block memoization database 500 can include one or more keys.
  • the transformation block memoization database 500 can match the requested keys against the known keys to return one of the memoized results 520 .
  • the transformation block identifier 506 can be a unique identification number assigned by a classification platform system (e.g., the classification platform system 300 of FIG. 3 ).
  • the transformation block version identifier 510 can be a version number.
  • the input identifier 502 can uniquely represent an input data space of the transformation block associated with the transformation block identifier 506 .
  • the input identifier 502 can be a hash of feature set identifiers.
  • the transformation block configuration identifier 512 can uniquely represent configuration parameters associated with the transformation block when the memoized results 520 were computed.
  • the transformation block configuration identifier 512 can be a hash of the configuration parameter values.
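  • A minimal sketch, under the assumption that the identifiers are derived by hashing, of how the memoization keys of FIG. 5 could be constructed and matched; all names and hash choices are illustrative:

```python
# Minimal sketch, assuming identifiers are derived by hashing: build the
# memoization keys of FIG. 5 and look up previously computed results.
# Names and hash choices are illustrative.
import hashlib
import json

def digest(obj) -> str:
    """Stable digest of JSON-serializable identifiers or parameter values."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()[:16]

def memo_key(block_id, block_version, feature_set_ids, config_params):
    return (digest(sorted(feature_set_ids)),  # input identifier 502
            block_id,                         # transformation block identifier 506
            block_version,                    # version identifier 510
            digest(config_params))            # configuration identifier 512

memoization_db = {}  # key tuple -> memoized result

key = memo_key("feature_transformation", "v3",
               ["feature_set_132"], {"normalize": True})
memoization_db[key] = {"rows": 10_000, "output_ref": "warehouse://results/abc"}

# A later run with the same block, version, inputs, and configuration hits the cache.
print(memoization_db.get(
    memo_key("feature_transformation", "v3",
             ["feature_set_132"], {"normalize": True})))
```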
  • FIG. 6 is a flow chart illustrating a method 600 of operating a classification platform system (e.g., the classification platform system 300 of FIG. 3 ) to create a classification experiment, in accordance with various embodiments.
  • the classification platform system can interface with one or more data sources in a social networking system.
  • the data sources can be a live data source or a static data source.
  • the data source can also be a feature bank (e.g., the feature repository 322 of FIG. 3 ) of the classification platform system.
  • the feature bank can cache feature sets generated in a previous classification experiment. These cached feature sets can be reused across multiple subsequent classification experiments.
  • the classification platform system can receive a command to define a classification experiment (e.g., by creating a new classification experiment or updating a stored classification experiment in an experiment repository, such as the experiment repository 316 of FIG. 3 ).
  • the classification platform system can define one or more input data spaces.
  • a user account with access to the classification platform system can select at least one of the data sources interfaced with the classification platform system to include in the input data spaces.
  • the input data spaces can include a prediction space and a labeled data space.
  • the selected data source can include a live data source from a social networking system.
  • the live data source can produce an open-ended stream of new data entries formatted according to one or more data formats of the defined input data space.
  • the selected data source can include a static data source from a social networking system.
  • the static data source can include a static data set with a constant data size formatted according to one or more data formats of the defined input data space.
  • the classification platform system can define, via a definition user interface (e.g., the experiment definition user interface 334 of FIG. 3 ), a workflow configuration of the classification experiment.
  • the definition user interface can be a graphical user interface.
  • defining the classification experiment can include inheriting a directed graph for the workflow configuration from a workflow repository (e.g., the workflow repository 320 of FIG. 3 ).
  • the definition user interface can receive commands to graphically arrange a DG connecting a plurality of transformation blocks to represent an experiment workflow of the classification experiment.
  • the graphical arrangement can include placing graphical representations of the transformation blocks and connecting the transformation blocks with one or more arrows or other directional graphical representations.
  • the DG can specify how one or more outputs of each of the transformation blocks are fed into one or more other transformation blocks.
  • Each transformation block can be selected from a transformation block repository (e.g., the transformation block repository 326 of FIG. 3 ).
  • the input data space can define one or more data sources to feed into at least a subset of the transformation blocks and one or more data fields to extract from the data sources.
  • the input data space can be a labeled data space that includes at least a parameter to locate labeled data for training a supervised classifier machine learning model or for evaluating classification precision or recall of a classifier model.
  • the input data space can be a prediction space that includes at least a parameter to locate input data to be classified in the classification experiment.
  • the input data space can select at least a live data source from the data sources to feed into at least one of the transformation blocks.
  • the transformation blocks can include a data feature extraction process, a data feature filtering process, a data feature transformation process, a classifier deliberation process, a classifier training process, a classifier evaluation process, or any combination thereof.
  • the transformation blocks can include a transformation block representative of a supervised machine learning training process that utilizes a labeled data source defined in the labeled data space for training.
  • the transformation blocks can include a transformation block representative of a classification evaluation process that utilizes a labeled data source defined in the labeled data space to evaluate classification precision or classification recall.
  • the transformation blocks can include a supervised machine learning prediction/deliberation process that utilizes data identified by the prediction space to run against one or more rules or trained machine learning models to produce a classification result.
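  • The trainer, evaluator, and deliberation blocks described above could be sketched as follows; scikit-learn is used purely as an example library, and the block and function names are assumptions rather than the patent's code:

```python
# Hedged illustration (not the patent's code): a supervised trainer block that
# consumes the labeled data space, an evaluator block that reports precision
# and recall, and a deliberation block that classifies prediction-space data.
# scikit-learn is used purely as an example library.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score

def training_block(labeled_features, labels):
    """Classifier training process: produce a trained model artifact."""
    return LogisticRegression().fit(labeled_features, labels)

def evaluation_block(model, held_out_features, held_out_labels):
    """Classifier evaluation process: precision and recall on labeled data."""
    predicted = model.predict(held_out_features)
    return {"precision": precision_score(held_out_labels, predicted),
            "recall": recall_score(held_out_labels, predicted)}

def deliberation_block(model, prediction_space_features):
    """Classifier deliberation process: classify unlabeled prediction-space data."""
    return model.predict(prediction_space_features)

# Tiny illustrative data set: two numeric features per entry, binary labels.
X_train, y_train = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1], [0.8, 0.9]], [0, 1, 0, 1]
model = training_block(X_train, y_train)
print(evaluation_block(model, [[0.15, 0.1], [0.85, 0.9]], [0, 1]))
print(deliberation_block(model, [[0.5, 0.7]]))
```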
  • a transformation block in the DG includes logic to dynamically modify the DG during execution of the experiment workflow.
  • the transformation block can include logic to dynamically modify input data of an existing transformation block in the DG.
  • the transformation block can include logic to change or remove an existing transformation block in the DG or to add a new transformation block to the DG.
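  • A speculative sketch of such dynamic DG modification: a transformation block returns graph edits that an imperative driver applies before scheduling continues. The edit operations and data structures are assumptions, not the patent's representation:

```python
# Speculative sketch of dynamic DG modification: a transformation block returns
# graph edits (add/remove blocks, add edges) that an imperative driver applies
# before scheduling continues. The edit operations and structures are assumptions.
def low_confidence_router(results):
    """If most predictions are low confidence, splice in a re-labeling block."""
    edits = []
    if sum(1 for r in results if r["confidence"] < 0.5) > len(results) // 2:
        edits.append(("add_block", "request_labels"))
        edits.append(("add_edge", ("low_confidence_router", "request_labels")))
    return edits

def apply_edits(graph, edits):
    for op, arg in edits:
        if op == "add_block":
            graph["blocks"].add(arg)
        elif op == "add_edge":
            graph["edges"].add(arg)
        elif op == "remove_block":
            graph["blocks"].discard(arg)
            graph["edges"] = {e for e in graph["edges"] if arg not in e}
    return graph

graph = {"blocks": {"deliberate", "low_confidence_router"},
         "edges": {("deliberate", "low_confidence_router")}}
results = [{"confidence": 0.3}, {"confidence": 0.4}, {"confidence": 0.9}]
print(apply_edits(graph, low_confidence_router(results)))
```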
  • the classification platform system can define a domain configuration of the classification experiment.
  • the domain configuration can include at least a parameter binding the input data space and the workflow configuration.
  • the classification platform system can format the workflow configuration, the input data space, and/or the domain configuration into a data structure such that the data structure is interpretable by a plurality of different computation platforms (e.g., computation platforms of different types operating under different programming paradigms).
  • Each of the computation platforms can be capable of executing the classification experiment.
  • the computation platforms can utilize one or more different programming paradigms (e.g., task parallelism, data parallelism, imperative workflow programming, declarative workflow programming) based on the workflow configuration of the classification experiment.
  • the data structure can be formatted to describe the dependency of transformation blocks as extracted from the DG and the input space, and thus enables a task parallel computation platform (e.g., real-time platform) to schedule transformation blocks based on dependency and interdependency of the transformation blocks.
  • the data structure can be formatted to describe the relative size of input data of the transformation blocks as extracted from the DG and the input space, and thus enables a data parallel computation platform (e.g., batch processing platform) to schedule instances of the transformation blocks to process different elements within a large dataset.
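  • One possible (hypothetical) shape for such a platform-agnostic data structure, carrying per-block dependencies for task-parallel platforms and input-size hints for data-parallel platforms; the field names are assumptions:

```python
# One possible (hypothetical) shape for a platform-agnostic execution plan:
# per-block dependencies for task-parallel platforms and input-size hints for
# data-parallel (batch) platforms. Field names are assumptions.
import json

def compile_execution_plan(edges, input_sizes):
    blocks = sorted({block for edge in edges for block in edge})
    plan = {"blocks": []}
    for block in blocks:
        plan["blocks"].append({
            "name": block,
            "depends_on": sorted(src for src, dst in edges if dst == block),
            "input_rows": input_sizes.get(block, 0),  # hint for data partitioning
        })
    return json.dumps(plan, indent=2)

edges = [("extract", "transform"), ("transform", "train"), ("train", "deliberate")]
print(compile_execution_plan(edges, {"extract": 5_000_000, "transform": 5_000_000}))
```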
  • FIG. 7 is a flow chart illustrating a method 700 of operating a classification platform system (e.g., the classification platform system 300 of FIG. 3 ) to execute a classification experiment, in accordance with various embodiments.
  • the classification platform system can select, from a plurality of different computation platforms, a distributed computation platform to execute at least part of the classification experiment.
  • the distributed computation platform can be selected based on a geographical or network location of the distributed computation platform relative to one or more geographical or network locations of input data specified in a defined input data space of the classification experiment.
  • the selected distributed computation platform can be configured to execute the classification experiment in substantially real-time in response to new data from the live data source.
  • the selected distributed computation platform can be configured to execute the classification experiment in batch (e.g., via recurring batch processing).
  • the classification platform system can schedule the selected distributed computation platform to execute at least part of the classification experiment according to the input data space and a workflow configuration (e.g., represented by a DG of transformation blocks) of the classification experiment.
  • the classification platform system can schedule a first part (e.g., a first set of transformation blocks) of the workflow configuration to execute on a first computation platform and a second part (e.g., a second set of transformation blocks) of the workflow configuration to execute on a second computation platform.
  • the first part and the second part are mutually exclusive.
  • scheduling the execution of the classification experiment includes imperatively programming computing nodes of the distributed computation platform to execute the transformation blocks based on the DG. This type of programming occurs when at least one of the transformation blocks in the DG can dynamically modify the DG during execution.
  • Imperative programming is a programming paradigm that uses statements that change a program workflow's state. Imperative programming can focus on describing how a program workflow operates. The term can be used in contrast to declarative programming, which focuses on what the program workflow should accomplish without specifying how the program workflow should achieve the result.
  • scheduling the execution of the classification experiment includes declaratively programming computing nodes of the distributed computation platform to execute the transformation blocks based on the DG. This type of programming occurs when the transformation blocks represent static functional blocks, where the transformation blocks in the DG do not explicitly manage the control flow of the distributed computation platform.
  • the classification platform system can prevent a transformation block in the DG from being executed by the distributed computation platform if the transformation block as defined by the workflow configuration matches an entry in a memoization database (e.g., the memoization database 330 of FIG. 3 ).
  • the entry can include a pre-computed output result of the transformation block given the same or substantially the same input and configuration.
  • the memoization database reduces wasted computation resources when a developer is rerunning a classification experiment multiple times with modifications and changes to a subset of its experiment workflow.
  • the memoization database enables the classification platform system to avoid having to re-execute a transformation block that is not modified.
  • the memoization database also reduces wasted computation resources when multiple developers are running related classification experiments sharing one or more transformation blocks.
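  • A hedged sketch of this re-execution guard: before dispatching a transformation block to the computation platform, the scheduler checks the memoization database for a matching entry and reuses the stored result. The function and key names are illustrative:

```python
# Hedged sketch of the re-execution guard: before dispatching a transformation
# block to the computation platform, check the memoization database for a
# matching entry and reuse the stored result. Names are illustrative.
def schedule_block(block, memoization_db, dispatch):
    key = (block["name"], block["version"],
           tuple(block["input_ids"]), tuple(sorted(block["config"].items())))
    if key in memoization_db:
        return memoization_db[key]   # unchanged block: reuse the pre-computed result
    result = dispatch(block)         # only unmatched blocks are actually executed
    memoization_db[key] = result
    return result

memo = {}
block = {"name": "transform", "version": "v3",
         "input_ids": ["feature_set_132"], "config": {"normalize": True}}
first = schedule_block(block, memo, dispatch=lambda b: {"status": "computed"})
second = schedule_block(block, memo, dispatch=lambda b: {"status": "recomputed"})
print(first, second)  # the second call returns the memoized result unchanged
```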
  • the classification platform system can pipe an output result of executing the classification experiment to an application service.
  • the classification platform system can pipe the output result to a social networking system (e.g., the social networking system 200 of FIG. 2 ) to re-configure at least one application service of the social networking system.
  • the classification experiment includes a comparison of multiple DGs representing data classification workflows.
  • a classification experiment can include a main DG with multiple child DGs representing data classification workflows.
  • the main DG can include a transformation block to evaluate the child DGs (e.g., by computing classification recall and classification precision of the data classification workflows).
  • the main DG can include a transformation block to compare the evaluative measures.
  • the result of the classification experiment includes a data classification workflow selected as the “best” according to the evaluative measures.
  • the data classification workflow can be marked in a workflow repository as having the best performance.
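  • For illustration only, a comparison step in a main DG might score each child workflow's classification results against the same labels and mark the best one; the combined metric (F1 over precision and recall) and all names are assumptions:

```python
# Illustrative only: a comparison step in a "main DG" that scores each child
# workflow's classification results against the same labels and marks the best
# one. The combined metric (F1 of precision and recall) is an assumption.
def precision_recall(predicted, actual):
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def select_best_workflow(child_outputs, labels):
    """Score each child DG's classification results and pick the highest F1."""
    scores = {}
    for name, predicted in child_outputs.items():
        p, r = precision_recall(predicted, labels)
        scores[name] = 2 * p * r / (p + r) if p + r else 0.0
    return max(scores, key=scores.get), scores

child_outputs = {"workflow_a": [1, 0, 1, 0], "workflow_b": [1, 1, 0, 0]}
labels = [1, 0, 1, 0]
print(select_best_workflow(child_outputs, labels))
```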
  • While processes or blocks are presented in a given order, alternative embodiments may perform routines having steps, or employ systems having blocks, in a different order, and some processes or blocks may be deleted, moved, added, subdivided, combined, and/or modified to provide alternatives or subcombinations. Each of these processes or blocks may be implemented in a variety of different ways. In addition, while processes or blocks are at times shown as being performed in series, these processes or blocks may instead be performed in parallel, or may be performed at different times. When a process or step is “based on” a value or a computation, the process or step should be interpreted as based at least on that value or that computation.
  • FIG. 8 is a block diagram of an example of a computing device 800 , which may represent one or more computing device or server described herein, in accordance with various embodiments.
  • the computing device 800 can be one or more computing devices that implement the social networking system 200 of FIG. 2 and/or the classification platform system 300 of FIG. 3 .
  • the computing device 800 can execute at least part of the method 600 of FIG. 6 or the method 700 of FIG. 7 .
  • the computing device 800 includes one or more processors 810 and memory 820 coupled to an interconnect 830 .
  • the interconnect 830 is an abstraction that represents any one or more separate physical buses, point-to-point connections, or both connected by appropriate bridges, adapters, or controllers.
  • the interconnect 830 may include, for example, a system bus, a Peripheral Component Interconnect (PCI) bus or PCI-Express bus, a HyperTransport or industry standard architecture (ISA) bus, a small computer system interface (SCSI) bus, a universal serial bus (USB), IIC (I2C) bus, or an Institute of Electrical and Electronics Engineers (IEEE) standard 1394 bus, also called “Firewire”.
  • the processor(s) 810 is/are the central processing unit (CPU) of the computing device 800 and thus controls the overall operation of the computing device 800 . In certain embodiments, the processor(s) 810 accomplishes this by executing software or firmware stored in memory 820 .
  • the processor(s) 810 may be, or may include, one or more programmable general-purpose or special-purpose microprocessors, digital signal processors (DSPs), programmable controllers, application specific integrated circuits (ASICs), programmable logic devices (PLDs), trusted platform modules (TPMs), or the like, or a combination of such devices.
  • the memory 820 is or includes the main memory of the computing device 800 .
  • the memory 820 represents any form of random access memory (RAM), read-only memory (ROM), flash memory, or the like, or a combination of such devices.
  • the memory 820 may contain a code 870 containing instructions.
  • the network adapter 840 provides the computing device 800 with the ability to communicate with remote devices over a network and may be, for example, an Ethernet adapter or Fibre Channel adapter.
  • the network adapter 840 may also provide the computing device 800 with the ability to communicate with other computers.
  • the storage adapter 850 enables the computing device 800 to access a persistent storage, and may be, for example, a Fibre Channel adapter or SCSI adapter.
  • the code 870 stored in memory 820 may be implemented as software and/or firmware to program the processor(s) 810 to carry out actions described above.
  • such software or firmware may be initially provided to the computing device 800 by downloading it from a remote system through the computing device 800 (e.g., via network adapter 840 ).
  • the techniques introduced here can be implemented by programmable circuitry (e.g., one or more microprocessors) programmed with software and/or firmware, entirely in special-purpose hardwired circuitry, or in a combination of such forms.
  • Special-purpose hardwired circuitry may be in the form of, for example, one or more application-specific integrated circuits (ASICs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), etc.
  • A machine-readable storage medium includes any mechanism that can store information in a form accessible by a machine (a machine may be, for example, a computer, a network device, a cellular phone, a personal digital assistant (PDA), a manufacturing tool, or any device with one or more processors).
  • a machine-accessible storage medium includes recordable/non-recordable media (e.g., read-only memory (ROM), random access memory (RAM), magnetic disk storage media, optical storage media, flash memory devices, etc.).
  • logic can include, for example, programmable circuitry programmed with specific software and/or firmware, special-purpose hardwired circuitry, or a combination thereof.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Various embodiments include a classification platform system. A user can define a classification experiment on the classification platform system. For example, the user can define an input data space by selecting at least one of data sources interfaced with the classification platform system and defining a workflow configuration including a directed graph (DG) connecting a plurality of transformation blocks to represent an experiment workflow. The DG can specify how one or more outputs of each of the transformation blocks are fed into one or more other transformation blocks. The DG can be executed by various types of computation platforms. The classification platform system can schedule the experiment workflow to be executed on a distributed computation platform according to the input data space and the workflow configuration.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application is a continuation of the co-pending U.S. patent application titled, “COMPUTATION PLATFORM AGNOSTIC DATA CLASSIFICATION WORKFLOWS,” filed on Jun. 30, 2016 and having Ser. No. 15/199,351. This application also relates to U.S. patent application Ser. No. 15/199,335, titled, “GRAPHICALLY MANAGING DATA CLASSIFICATION WORKFLOWS IN A SOCIAL NETWORKING SYSTEM WITH DIRECTED GRAPHS,” filed on Jun. 30, 2016 and U.S. patent application Ser. No. 15/199,403, titled, “DATA CLASSIFICATION WORKFLOWS IMPLEMENTED WITH DYNAMICALLY MODIFIABLE DIRECTED GRAPHS”, filed Jun. 30, 2016. The subject matter of these related applications is hereby incorporated herein by reference.
  • The figures depict various embodiments of this disclosure for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of embodiments described herein.
  • DETAILED DESCRIPTION
  • Various embodiments are directed to a classification platform system to facilitate definition of a classification experiment by defining a classification experiment by at least defining a directed graph (DG). For example, the classification platform system can provide a graphical user interface to graphically place, connect, and arrange transformation blocks within a DG. In some embodiments, the DG can be a directed acyclical graph (DAG). A transformation block can be a data processing operator that defines the computation logic and executable instructions to transform input data into some sort of output data.
  • In one example, the classification platform system can be implemented in a social networking system. A classification experiment can pertain to all or a subset of the process of analyzing data sets (e.g., heterogeneous data sets) to place them in categories. A classification experiment can include a DG-represented workflow comprising one or more repeatable functions of feature extraction, data transformation, data labeling, machine learning model training, classifier deliberation (e.g., utilizing a classifier model to classify data), classification result summarizations, classification result evaluations, or any combination thereof. Portions or the whole of a DG-represented workflow can be repeatedly used by the classification platform system using different input data without manual reconfiguration (e.g., by a developer account or a data scientist account).
  • Classification experiments can include, for example, content category classifiers, event classifiers, quality detection classifiers (e.g., junk/spam filters), content deduplication classifiers, or any combination thereof. The classification platform system can be coupled to various data sources (e.g., static or live sources) from which a classification experiment can source its data in each run of the classification experiment. The classification platform system can implement an experiment definition user interface accessible via a Web server or an application coupled to an application programming interface (API).
  • The classification platform system can receive configuration parameters of a classification experiment via the experiment definition user interface. The configuration parameters can define, for a classification experiment, a prediction space (e.g., which data sources to use and what features to extract to make classification predictions), a labeled data space (e.g., where to find pre-classified labeled data for classifier training and evaluation), a workflow configuration (e.g., parameters describing one or more DGs representing an experiment workflow), a domain configuration (e.g., parameters binding the prediction space, the labeled data space, and the workflow configuration together), or any combination thereof. The classification platform system can utilize the labeled data space with supervised or semi-supervised machine learning algorithms. The workflow configuration can be defined as a DG connecting transformation blocks (e.g., feature extractors, classifier trainers, classifier predictors, filters, data transformers, statistical functions, other logical transformation functions, or any combination thereof). A classifier predictor can be a data classification algorithm configured by a trained classifier model. A DG can be defined at least partially by incorporating one or more other existing DGs stored in the classification platform system. The DG can specify how one or more outputs of a transformation block are fed into one or more other transformation blocks. The DG and the domain configuration can further bind input data from the prediction space and/or labeled data space to at least one of the transformation blocks. In some implementations, a transformation block can dynamically modify the DG during execution. In some embodiments, the DG can be graphically edited and created on a graphical user interface of the classification platform system.
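  • A minimal sketch of how these configuration parameters might fit together is shown below, assuming a simple Python/JSON representation; the field names and block names are hypothetical and are not prescribed by this disclosure.

```python
# Hypothetical sketch of a classification experiment configuration.
# Field names are illustrative only; no particular serialization format is prescribed.
experiment_config = {
    "prediction_space": {
        # Which data sources to use and what features to extract for predictions.
        "data_sources": ["live_location_db"],
        "features": ["user_id", "latitude", "longitude"],
    },
    "labeled_data_space": {
        # Where to find pre-classified data for training and evaluation.
        "data_sources": ["labeled_spam_reports"],
        "label_field": "is_spam",
    },
    "workflow_configuration": {
        # Directed graph: each transformation block lists the blocks whose outputs it consumes.
        "blocks": {
            "extract_features": {"consumes": []},
            "train_classifier": {"consumes": ["extract_features"]},
            "classify": {"consumes": ["extract_features", "train_classifier"]},
        }
    },
    "domain_configuration": {
        # Binds the input data spaces to specific transformation blocks.
        "bindings": {
            "extract_features": "prediction_space",
            "train_classifier": "labeled_data_space",
        }
    },
}
```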
  • The classification platform system reduces cycle time and avoids unnecessary computation by memoizing the outputs of transformation blocks. During execution of numerous data transformation workflows within a single day, there can be simultaneously and/or consecutively executing workflows. These workflows can avoid re-computation by matching the configurations of their transformation blocks against a memoization database to retrieve pre-computed results.
  • One or more inputs of a transformation block are not always directly from the prediction space and/or the labeled data space. For example, an output of a transformation block can be an input of another transformation block. In one example, an output of a data classifier can be an input of another data classifier. A compiler system on the classification platform system can inspect a DG representation and output a data structure representing an execution schedule. One such data structure can include one or more pipelines for training one or more classifier models, making classification predictions, and/or transforming data into other intermediate representations that can be used in one or more online services of a social networking system. The data structure can be interpretable by different distributed computation platforms. Each computation platform can be configured to execute at least part of the DG representing an experiment workflow under a different paradigm (e.g., data parallelism, task parallelism, declarative programming execution, imperative programming execution, or any combination thereof).
  • The classification platform system can modularize building and usage of classifiers to be used across classification experiments. The classification platform system can also remove the dependency of classification workflows on the computation platforms that execute them. The classification platform system can template transformation blocks to facilitate creation of a classification experiment workflow. The classification platform system can store data features and transformation block outputs to be reused across various classification experiments. The classification platform system can schedule a workflow for updating (e.g., training and/or retraining) of machine learning models and a workflow for making classifications and/or predictions utilizing the trained machine learning models. The classification platform system can productionalize the classification experiments to deliver deliberation results directly to consumer entities (e.g., applications, user accounts, and/or services within a social networking system) of the classification experiments. For example, the consumer entities can include a developer interface, one or more application services, and/or one or more data repositories. The classification platform system can reduce cycle time and computation resources via memoization.
  • Referring now to the figures, FIG. 1 is a data flow diagram illustrating an example of a classification experiment 100 running on a computer system, in accordance with various embodiments. In some embodiments, the computer system is a social networking system. The classification experiment 100 can be defined by various configuration parameters, including a workflow DG 102. The workflow DG 102 can be represented by one or more transformation blocks directionally connected to one another. A transformation block is a data processing operator. The workflow DG 102 can form a pipeline (e.g., represented by a directed graph) out of a plurality of transformation blocks. The directed graph can be a directed acyclical graph.
  • A classification platform system of the computer system can facilitate definition of classification experiments, e.g., the classification experiment 100. The classification platform system can be connected to one or more data sources (e.g., a live data source 106A, a static data source 106B, etc., collectively as the “data sources 106”). The classification experiment 100 can source input data from at least a subset of the data sources. For example, the configuration parameters can identify the selected data sources utilized by at least some of the transformation blocks in the workflow DG 102.
  • In the illustrated example, the workflow DG 102 can include a first feature extraction block 110, a second feature extraction block 112, a feature transformation block 116, a classifier trainer 120, a first classifier deliberation block 124, and a second classifier deliberation block 128. For example, the first feature extraction block 110 can produce a first feature set 132, and the first feature set 132 can be consumed by (e.g., act as an input for) the feature transformation block 116. The second feature extraction block 112 can produce a second feature set 136, and the second feature set 136 can be consumed by the first classifier deliberation block 124. The feature transformation block 116 can produce a transformed feature set 140, and the transformed feature set can be consumed by the classifier trainer 120. The classifier trainer 120 can produce a machine learning model 144, and the machine learning model 144 can be used to configure the first classifier deliberation block 124. The first classifier deliberation block 124 can produce a first classifier result set 148, and the first classifier result set 148 can be consumed by the second classifier deliberation block 128. The second classifier deliberation block 128 can produce a second classifier result set 152, and the second classifier result set 152 can be the final output of the classification experiment 100.
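  • For illustration, the workflow DG 102 described above might be represented as a simple adjacency list; the following is a hypothetical sketch in Python, and the block names merely echo the reference numerals in FIG. 1.

```python
# Hypothetical adjacency-list representation of the FIG. 1 workflow DG.
# Keys are transformation blocks; values are the blocks that consume their output.
workflow_dg_102 = {
    "feature_extraction_110":      ["feature_transformation_116"],
    "feature_extraction_112":      ["classifier_deliberation_124"],
    "feature_transformation_116":  ["classifier_trainer_120"],
    "classifier_trainer_120":      ["classifier_deliberation_124"],
    "classifier_deliberation_124": ["classifier_deliberation_128"],
    "classifier_deliberation_128": [],  # terminal block: produces the experiment's final output
}
```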
  • In some embodiments, the first feature extraction block 110 and the second feature extraction block 112 output their respective feature sets by filtering and/or selecting data entries from the data sources 106, according to a set of one or more criteria. In some embodiments, the feature transformation block 116 generates the transformed feature set 140 by reducing, extrapolating, summarizing, normalizing, converting, or any combination thereof, data features provided in the first feature set 132.
  • In some embodiments, the data sources 106 constitute the inputs to the workflow DG 102 of the classification experiment 100. In some embodiments, the first feature set 132 and the second feature set 136 are pre-generated before the classification experiment 100 begins. For example, the classification platform system can store features generated by a previously run classification experiment for use by subsequent classification experiments. In these embodiments, the first feature set 132 and the second feature set 136 are the inputs to the workflow DG 102. The computer system can store a representation of the workflow DG 102. The workflow DG 102 is formed as a collection of transformation blocks and directed edges. Each edge can connect one transformation block to another directionally. The workflow DG 102 can be acyclical. The acyclical nature of the workflow DG 102 ensures that there is no way to start at a transformation block and follow a sequence of edges that eventually loops back to the same transformation block again. This prevents execution of a workflow from falling into an infinite or indefinite loop.
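  • The acyclicity property can be verified before execution; the following is a minimal sketch using Kahn's algorithm, assuming the hypothetical adjacency-list representation from the sketch above.

```python
# Minimal acyclicity check (Kahn's algorithm) over an adjacency list that maps each
# transformation block to the blocks consuming its output.
from collections import deque

def is_acyclic(dg):
    indegree = {block: 0 for block in dg}
    for consumers in dg.values():
        for consumer in consumers:
            indegree[consumer] += 1
    ready = deque(block for block, degree in indegree.items() if degree == 0)
    visited = 0
    while ready:
        block = ready.popleft()
        visited += 1
        for consumer in dg[block]:
            indegree[consumer] -= 1
            if indegree[consumer] == 0:
                ready.append(consumer)
    return visited == len(dg)  # False means at least one cycle exists

# is_acyclic(workflow_dg_102) -> True for the FIG. 1 sketch above.
```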
  • FIG. 2 is a block diagram illustrating an example of a social networking system 200 that incorporates a classification platform system 204 to facilitate classification experiments (e.g., the classification experiment 100 of FIG. 1), in accordance with various embodiments. The social networking system 200 can include one or more application services (e.g., an application service 208A and an application service 208B, collectively as the “application services 208”). The application services 208 can generate and/or collect data from one or more client devices 212 via one or more consumer interfaces 216 (e.g., a web interface 216A, an application programming interface (API) 216B, or a combination thereof, collectively as the “consumer interfaces 216”).
  • In some embodiments, the application services 208 can process client interactions (e.g., content request, interaction with social network objects via a user interface of the social networking system 200, user-generated content updates, etc.) in real-time. The client interactions can be considered “live traffic” that are logged in a data center 220 (e.g., including a data repository 222A and a data repository 222B, collectively as the “data repositories 222”). The application services 208, for example, can include a search engine, a photo editing tool, a location-based tool, an advertisement service, a media service, an interactive content service, a messaging service, a social networking service, or any combination thereof.
  • In one example, the application service 208A is a location logger that stores geographical locations of user accounts of the social networking system 200 in a live location database in the data center 220. Because the location logger continuously updates the live location database, the live location database can be considered a live data source (e.g., the live data source 106A of FIG. 1). In one example, the application service 208B stores static user-generated content in a content database in the data center 220. Because each unique content, once uploaded, remains static, the content database can be considered a static data source (e.g., the static data source 106B of FIG. 1).
  • The classification platform system 204 can facilitate privileged users (e.g., developer accounts and/or data scientist accounts) of the social networking system 200 to run classification experiments based on social network data collected by the application services 208. The classification platform system 204 can facilitate creation of new experiments on a user interface. Each classification experiment can be repeatedly run with or without modifying its configuration parameters. Each classification experiment can correspond to at least one workflow DG (e.g., the workflow DG 102 of FIG. 1).
  • The social networking system 200 can include distributed computation systems (e.g., a distributed computation system 230A and a distributed computation system 230B, collectively as the “distributed computation systems 230”). The distributed computation systems 230 can include computation platforms of different types as well as similar computation platforms for redundancy. Each of the distributed computation systems 230 can include one or more computing devices (e.g., computer servers). The classification platform system 204 can schedule the classification experiments to be run on one of the distributed computation systems 230. For example, based on the dependencies of transformation blocks indicated in a workflow DG, the classification platform system 204 can determine which of the transformation blocks to schedule on the selected distributed computation system before other transformation blocks of the workflow DG are scheduled. The classification platform system 204 can provide input data or links to the input data of the scheduled transformation blocks to the selected distributed computation system. The output results of the executed transformation blocks are stored back into a memoization repository 240 of the classification platform system 204. For example, the final output of the last transformation block (e.g., a transformation block that is not attached to an outgoing edge and does not feed its output result to another transformation block) of a workflow DG is marked as the classification results of the classification experiment. In some embodiments, the results of a classification experiment can be fed back to the application services 208 to modify decision-making logic of at least one of the application services 208.
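  • A small sketch of identifying such terminal blocks, whose outputs are marked as the experiment's classification results, is shown below; it assumes the same hypothetical adjacency-list representation used in the earlier sketches.

```python
# Terminal blocks are those whose output feeds no other transformation block.
def terminal_blocks(dg):
    return [block for block, consumers in dg.items() if not consumers]

# terminal_blocks(workflow_dg_102) -> ["classifier_deliberation_128"]
```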
  • FIG. 3 is a block diagram illustrating a classification platform system 300 that manages classification experiments (e.g., the classification experiment 100 of FIG. 1), in accordance with various embodiments. The classification platform system 300 includes an experiment management engine 304, a real-time workflow execution engine 308, a batch workflow execution engine 312, an experiment repository 316, a workflow repository 320, a feature repository 322, a data source interface 324, a transformation block repository 326, a memoization database 330, or any combination thereof.
  • The experiment management engine 304 facilitates the definition of one or more classification experiments (e.g., the classification experiment 100 of FIG. 1). The experiment management engine 304 can include an experiment definition user interface 334. The experiment definition user interface 334 can be a user interface implemented as a webpage, a website comprising interconnected webpages, an API coupled to a computer application or mobile application, or any combination thereof. The experiment definition user interface 334 can receive configuration parameters from a client device for defining and/or updating a classification experiment. The configuration parameters can be received as text formatted in a markup language or as a series of user interactions with a graphical user interface of the experiment definition user interface 334. For example, a workflow DG of the classification experiment can be defined by visually drawing, placing and arranging block shapes representing the transformation blocks of the workflow DG and directional connections (e.g., arrows) between the block shapes.
  • The experiment definition user interface 334 can also enable a client device to monitor the scheduling and execution of the classification experiment. In some embodiments, the experiment definition user interface 334 enables the client device to track the classification results of the classification experiment. In some embodiments, the experiment definition user interface 334 further receives instructions to connect the classification results to one or more external application services from a developer account.
  • In various embodiments, the classification platform system 300 can include a plurality of workflow execution engines implemented to manage one or more distributed computation systems (e.g., the distributed computation systems 230 of FIG. 2). For example, the real-time workflow execution engine 308 can be configured to execute a classification experiment on a distributed computation system by classifying live data in substantially real-time. The batch workflow execution engine 312 can be configured to execute a classification experiment by classifying batches of constant size data sets.
  • For example, each of the workflow execution engines can facilitate execution of an experiment workflow of a classification experiment. The workflow execution engines can manage and schedule execution of code packages associated with the transformation blocks in the experiment workflow. For example, a workflow execution engine can select computing devices to run specific transformation blocks of the experiment workflow, distribute code packages corresponding to the transformation blocks, distribute references or links to input datasets of the experiment workflow, and/or schedule execution of the code packages on the computing devices. The workflow execution engine can also ensure load-balancing and resource consumption minimization when scheduling the experiment workflow for execution on the selected computing devices (e.g., by managing the selection of the computing devices, distributing appropriate code packages, and/or streaming the input datasets or links thereto ahead of execution schedule while minimizing network bandwidth). The workflow execution engine can schedule execution of the workflow by analysis of the experiment workflow indicated by the workflow configuration (e.g., a DG of transformation blocks) to avoid bottlenecks, errors, and inconsistencies. The workflow execution engine can also schedule the execution of the experiment workflow based on statuses of currently running classification experiments, health data and operation states of the data sources, and/or scheduled execution times of scheduled classification experiments. For example, the workflow execution engine can ensure that a transformation block, which requires the output of another transformation block to execute, is not executed in parallel with that other transformation block.
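  • One way to honor these dependency constraints is to group transformation blocks into execution waves, where a block only runs after every block it depends on has completed. The sketch below is a simplified illustration of that idea under assumed input names, not the scheduling algorithm of any particular workflow execution engine.

```python
# Simplified dependency-aware scheduling: group blocks into waves so that a block
# never runs in parallel with a block whose output it requires.
# `deps` maps each transformation block to the blocks it depends on (hypothetical format).
def schedule_in_waves(deps):
    remaining = dict(deps)
    completed = set()
    waves = []
    while remaining:
        wave = [block for block, needs in remaining.items()
                if all(need in completed for need in needs)]
        if not wave:
            raise ValueError("cyclic dependency detected")
        waves.append(wave)
        completed.update(wave)
        for block in wave:
            del remaining[block]
    return waves

print(schedule_in_waves({
    "extract_features": [],
    "train_classifier": ["extract_features"],
    "classify": ["extract_features", "train_classifier"],
}))
# [['extract_features'], ['train_classifier'], ['classify']]
```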
  • In various embodiments, the classification platform system 300 includes the experiment repository 316, the workflow repository 320, and the transformation block repository 326 to facilitate definition of classification experiments. The experiment repository 316 stores definitions of previous classification experiments. Any of the classification experiments in the experiment repository 316 can be executed again using different input data. A new classification experiment can also inherit the configuration and definition of a previous classification experiment. The workflow repository 320 stores workflow configurations used in previous classification experiments. For example, the workflow configurations can be represented by a DG comprising transformation blocks connected to one another. A new classification experiment can import one or more DGs and/or workflow configurations from the workflow repository 320.
  • The transformation block repository 326 stores definitions of transformation blocks. For example, the definition of a transformation block includes logic to transform input data into some sort of output data. When creating a new DG for a workflow configuration of a classification experiment, the experiment management engine 304 can import one or more transformation blocks into the new DG from the transformation block repository 326. For example, the transformation blocks can be defined by at least an input data schema (e.g., one or more input data formats/types), a transformative function (e.g., executable instructions to transform the input data), and an output data schema (e.g., one or more output data format/types).
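  • A transformation block definition of this kind might look like the following minimal Python sketch; the class name, schema representation, and example block are hypothetical and only illustrate the input schema, transformative function, and output schema structure described above.

```python
# Hypothetical sketch of a transformation block: an input data schema, a transformative
# function, and an output data schema.
from abc import ABC, abstractmethod

class TransformationBlock(ABC):
    input_schema: dict   # e.g., {"text": str}
    output_schema: dict  # e.g., {"tokens": list}

    @abstractmethod
    def transform(self, record: dict) -> dict:
        """Transform one input record into one output record."""

class TokenizeBlock(TransformationBlock):
    input_schema = {"text": str}
    output_schema = {"tokens": list}

    def transform(self, record: dict) -> dict:
        return {"tokens": record["text"].lower().split()}

print(TokenizeBlock().transform({"text": "Spam or not spam"}))
# {'tokens': ['spam', 'or', 'not', 'spam']}
```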
  • The data source interface 324 can be an API for the classification platform system 300 to connect with one or more databases storing data features (e.g., input data for classifiers) and/or precursors to data features. The data source interface 324 can also be coupled to one or more application services that generate data features and/or precursors to data features.
  • In various embodiments, the classification platform system 300 includes the feature repository 322 and the memoization database 330 to reduce wasting of computational resources when executing a classification experiment. The feature repository 322 stores one or more feature data sets that can be identified as the input data for a classification experiment. The memoization database 330 stores one or more outputs of a known transformation block. When executing an experiment workflow of a classification experiment, the real-time workflow execution engine 308 or the batch workflow execution engine 312 can determine whether to schedule the execution of a transformation block in the experiment workflow. The workflow execution engine can determine whether to schedule the execution of the transformation block by matching the identity of the transformation block and its input data set against a lookup table of the memoization database 330. This process enables the workflow execution engine to avoid having to execute the same transformation block more than once.
  • Functional components (e.g., devices, engines, modules, and data repositories, etc.) associated with the social networking system 200 and/or the classification platform system 300 can be implemented as a combination of circuitry, firmware, software, or other executable instructions. For example, the functional components can be implemented in the form of special-purpose circuitry, in the form of one or more appropriately programmed processors, a single board chip, a field programmable gate array, a network-capable computing device, a virtual machine, a cloud computing environment, or any combination thereof. For example, the functional components described can be implemented as instructions on a tangible storage memory capable of being executed by a processor or other integrated circuit chip. The tangible storage memory may be volatile or non-volatile memory. In some embodiments, the volatile memory may be considered “non-transitory” in the sense that it is not a transitory signal. Memory space and storages described in the figures can be implemented with the tangible storage memory as well, including volatile or non-volatile memory.
  • Each of the functional components may operate individually and independently of other functional components. Some or all of the functional components may be executed on the same host device or on separate devices. The separate devices can be coupled through one or more communication channels (e.g., wireless or wired channel) to coordinate their operations. Some or all of the functional components may be combined as one component. A single functional component may be divided into sub-components, each sub-component performing a separate method step or method steps of the single component.
  • In some embodiments, at least some of the functional components share access to a memory space. For example, one functional component may access data accessed by or transformed by another functional component. The functional components may be considered “coupled” to one another if they share a physical connection or a virtual connection, directly or indirectly, allowing data accessed or modified by one functional component to be accessed in another functional component. In some embodiments, at least some of the functional components can be upgraded or modified remotely (e.g., by reconfiguring executable instructions that implements a portion of the functional components). Other arrays, systems and devices described above may include additional, fewer, or different functional components for various applications.
  • FIG. 4 is a block diagram illustrating an example of a classification experiment configuration 400, in accordance with various embodiments. The classification experiment configuration 400 defines the parameters of a classification experiment. The classification experiment configuration 400 can include a prediction space definition 404, a labeled data space definition 408, a workflow configuration 412, a domain configuration 416, or any combination thereof. The prediction space definition 404 can define data sets or open-ended data streams to run the trained classifier model on. The labeled data space definition 408 can define data sets for classifier training and/or classifier evaluation. The workflow configuration 412 defines an experiment workflow of the classification experiment. For example, the workflow configuration 412 can define a DG representative of the experiment workflow. The domain configuration 416 can bind the workflow configuration 412 with the labeled data space definition 408 and/or the prediction space definition 404.
  • The domain configuration 416 can also enable various classification experiment execution features supported by a classification platform system (e.g., the classification platform system 300 of FIG. 3). For example, the execution features can include a checkpoint feature that prevents regression of a classifier model by evaluating, in real-time, whether a transformation block is malfunctioning, and whether to terminate a scheduled run or an executing run of an experiment workflow. The execution features can also include an active learning feature that actively requests labeling of data entries from a user account (e.g., a user account of the social networking system 200 of FIG. 2). The active learning feature can occur after an experiment workflow has been executed or during the execution of the experiment workflow.
  • FIG. 5 is a block diagram illustrating an example of a transformation block memoization database 500, in accordance with various embodiments. For example, the transformation block memoization database 500 can be a lookup table. The keys to the lookup table can be an input identifier 502, a transformation block identifier 506, a transformation block version identifier 510, a transformation block configuration identifier 512, or any combination thereof. In the illustrated block diagram, only a single row of the lookup table is shown. However, in various embodiments, the lookup table can include multiple rows.
  • The potential return values of the lookup table are memoized results 520 of previously executed transformation blocks. A query request to the transformation block memoization database 500 can include one or more keys. The transformation block memoization database 500 can match the requested keys against the known keys to return one of the memoized results 520.
  • The transformation block identifier 506 can be a unique identification number assigned by a classification platform system (e.g., the classification platform system 300 of FIG. 3). The transformation block version identifier 510 can be a version number. The input identifier 502 can uniquely represent an input data space of the transformation block associated with the transformation block identifier 506. For example, the input identifier 502 can be a hash of features set identifiers. The transformation block configuration identifier 512 can uniquely represent configuration parameters associated with the transformation block when the memoized results 520 were computed. For example, the transformation block configuration identifier 512 can be a hash of the configuration parameter values.
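  • The lookup described above can be sketched as follows; the hashing scheme and function names are assumptions for illustration, since the disclosure only requires that the keys uniquely identify the transformation block, its version, its configuration, and its input data space.

```python
# Hypothetical sketch of building a memoization key and consulting the memoization database.
import hashlib
import json

def memo_key(block_id, block_version, config, input_feature_set_ids):
    config_hash = hashlib.sha256(json.dumps(config, sort_keys=True).encode()).hexdigest()
    input_hash = hashlib.sha256(",".join(sorted(input_feature_set_ids)).encode()).hexdigest()
    return (block_id, block_version, config_hash, input_hash)

memoization_db = {}  # stand-in for the transformation block memoization database 500

def run_or_reuse(block_id, block_version, config, input_ids, compute_fn):
    key = memo_key(block_id, block_version, config, input_ids)
    if key in memoization_db:
        return memoization_db[key]   # reuse a memoized result
    result = compute_fn()            # otherwise execute the transformation block
    memoization_db[key] = result
    return result
```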
  • FIG. 6 is a flow chart illustrating a method 600 of operating a classification platform system (e.g., the classification platform system 300 of FIG. 3) to create a classification experiment, in accordance with various embodiments. In block 602, the classification platform system can interface with one or more data sources in a social networking system. The data sources can be a live data source or a static data source. The data source can also be a feature bank (e.g., the feature repository 322 of FIG. 3) of the classification platform system. The feature bank can cache feature sets generated in a previous classification experiment. These cached feature sets can be reused across multiple subsequent classification experiments.
  • In block 604, the classification platform system can receive a command to define a classification experiment (e.g., by creating a new classification experiment or updating a stored classification experiment in an experiment repository, such as the experiment repository 316 of FIG. 3). In block 606, the classification platform system can define one or more input data spaces. For example, a user account with access to the classification platform system can select at least one of the data sources interfaced with the classification platform system to include in the input data spaces. The input data spaces can include a prediction space and a labeled data space. The selected data source can include a live data source from a social networking system. The live data source can produce an open-ended stream of new data entries formatted according to one or more data formats of the defined input data space. The selected data source can include a static data source from a social networking system. The static data source can include a static data set with a constant data size formatted according to one or more data formats of the defined input data space.
  • In block 608, the classification platform system can define, via a definition user interface (e.g., the experiment definition user interface 334 of FIG. 3), a workflow configuration of the classification experiment. In some embodiments, the definition user interface can be a graphical user interface. In one example, defining the classification experiment can include inheriting a directed graph for the workflow configuration from a workflow repository (e.g., the workflow repository 320 of FIG. 3).
  • In another example, the definition user interface can receive commands to graphically arrange a DG connecting a plurality of transformation blocks to represent an experiment workflow of the classification experiment. The graphical arrangement can include placing graphical representations of the transformation blocks and connecting the transformation blocks with one or more arrows or other directional graphical representations. The DG can specify how one or more outputs of each of the transformation blocks are fed into one or more other transformation blocks. Each transformation block can be selected from a transformation block repository (e.g., the transformation blocks repository 326 of FIG. 3).
  • The input data space, from block 606, can define one or more data sources to feed into at least a subset of the transformation blocks and one or more data fields to extract from the data sources. The input data space can be a labeled data space that includes at least a parameter to locate labeled data for training a supervised classifier machine learning model or for evaluating classification precision or recall of a classifier model. The input data space can be a prediction space that includes at least a parameter to locate input data to be classified in the classification experiment. The input data space can select at least a live data source from the data sources to feed into at least one of the transformation blocks.
  • The transformation blocks can include a data feature extraction process, data feature filtering process, a data feature transformation process, a classifier deliberation process, a classifier training process, a classifier evaluation process, or any combination thereof. For example, the transformation blocks can include a transformation block representative of a supervised machine learning training process that utilizes a labeled data source defined in the labeled data space for training. The transformation blocks can include a transformation block representative of a classification evaluation process that utilizes a labeled data source defined in the labeled data space to evaluate classification precision or classification recall. The transformation blocks can include a supervised machine learning prediction/deliberation process that utilizes data identified by the prediction space to run against one or more rules or trained machine learning models to produce a classification result.
  • In some embodiments, a transformation block in the DG includes logic to dynamically modify the DG during execution of the experiment workflow. The transformation block can include logic to dynamically modify input data of an existing transformation block in the DG. The transformation block can include logic to change or remove an existing transformation block in the DG or to add a new transformation block to the DG.
  • In block 610, the classification platform system can define a domain configuration of the classification experiment. The domain configuration can include at least a parameter binding the input data space and the workflow configuration.
  • In block 612, the classification platform system can format the workflow configuration, the input data space, and/or the domain configuration into a data structure such that the data structure is interpretable by a plurality of different computation platforms (e.g., computation platforms of different types operating under different programming paradigms). Each of the computation platforms can be capable of executing the classification experiment. The computation platforms can utilize one or more different programming paradigms (e.g., task parallelism, data parallelism, imperative workflow programming, declarative workflow programming) based on the workflow configuration of the classification experiment. For example, the data structure can be formatted to describe the dependency of transformation blocks as extracted from the DG and the input space, and thus enables a task parallel computation platform (e.g., real-time platform) to schedule transformation blocks based on dependency and interdependency of the transformation blocks. In another example, the data structure can be formatted to describe the relative size of input data of the transformation blocks as extracted from the DG and the input space, and thus enables a data parallel computation platform (e.g., batch processing platform) to schedule instances of the transformation blocks to process different elements within a large dataset.
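  • A minimal sketch of such a platform-agnostic data structure is shown below; the JSON layout is an assumption, and the point is only that each block's dependencies (useful to a task-parallel platform) and estimated input size (useful to a data-parallel platform) are described without reference to any concrete computation platform.

```python
# Hypothetical compilation of a workflow DG into a platform-agnostic execution plan.
import json

def compile_execution_plan(deps, input_size_estimates):
    # `deps` maps each transformation block to the blocks whose outputs it requires.
    plan = {
        "blocks": [
            {
                "name": block,
                "depends_on": sorted(needs),
                "estimated_input_rows": input_size_estimates.get(block, 0),
            }
            for block, needs in deps.items()
        ]
    }
    return json.dumps(plan, indent=2)

print(compile_execution_plan(
    {"extract": [], "train": ["extract"], "classify": ["train"]},
    {"extract": 1_000_000, "train": 50_000, "classify": 1_000_000},
))
```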
  • FIG. 7 is a flow chart illustrating a method 700 of operating a classification platform system (e.g., the classification platform system 300 of FIG. 3) to execute a classification experiment, in accordance with various embodiments. In block 702, the classification platform system can select, from a plurality of different computation platforms, a distributed computation platform to execute at least part of the classification experiment. The distributed computation platform can be selected based on a geographical or network location of the distributed computation platform relative to one or more geographical or network locations of input data specified in a defined input data space of the classification experiment. The selected distributed computation platform can be configured to execute the classification experiment in substantially real-time in response to new data from the live data source. The selected distributed computation platform can be configured to execute the classification experiment in batch (e.g., via recurring batch processing).
  • In block 704, the classification platform system can schedule the selected distributed computation platform to execute at least part of the classification experiment according to the input data space and a workflow configuration (e.g., represented by a DG of transformation blocks) of the classification experiment. The classification platform system can schedule a first part (e.g., a first set of transformation blocks) of the workflow configuration to execute on a first computation platform and a second part (e.g., a second set of transformation blocks) of the workflow configuration to execute on a second computation platform. In some embodiments, the first part and the second part are mutually exclusive.
  • In some embodiments, scheduling the execution of the classification experiment includes imperatively programming computing nodes of the distributed computation platform to execute the transformation blocks based on the DG. This type of programming occurs when at least one of the transformation blocks in the DG can dynamically modify the DG during execution. Imperative programming is a programming paradigm that uses statements that change a program workflow's state. Imperative programming can focus on describing how a program workflow operates. The term can be used in contrast to declarative programming, which focuses on what the program workflow should accomplish without specifying how the program workflow should achieve the result.
  • In some embodiments, scheduling the execution of the classification experiment includes declaratively programming computing nodes of the distributed computation platform to execute the transformation blocks based on the DG. This type of programming occurs when the transformation blocks represent static functional blocks, where the transformation blocks in the DG do not explicitly manage the control flow of the distributed computation platform.
  • In block 706, the classification platform system can prevent a transformation block in the DG from being executed by the distributed computation platform if the transformation block as defined by the workflow configuration matches an entry in a memoization database (e.g., the memoization database 330 of FIG. 3). The entry can include a pre-computed output result of the transformation block given the same or substantially the same input and configuration. The memoization database reduces wasted computation resources when a developer is rerunning a classification experiment multiple times with modifications and changes to a subset of its experiment workflow. The memoization database enables the classification platform system to avoid having to re-execute a transformation block that is not modified. The memoization database also reduces wasted computation resources when multiple developers are running related classification experiments sharing one or more transformation blocks.
  • In block 708, the classification platform system can pipe an output result of executing the classification experiment to an application service. For example, the classification platform system can pipe the output result to a social networking system (e.g., the social networking system 200 of FIG. 2) to re-configure at least an application service of the social network system. In some embodiments, the classification experiment includes a comparison of multiple DGs representing data classification workflows.
  • In some embodiments, the output result of a classification experiment is stored in a database (e.g., a workflow database). For example, a classification experiment can include a main DG with multiple child DGs representing data classification workflows. The main DG can include a transformation block to evaluate the child DGs (e.g., by computing classification recall and classification precision of the data classification workflows). The main DG can include a transformation block to compare the evaluative measures. In these embodiments, the result of the classification experiment includes a data classification workflow selected as the “best” according to the evaluative measures. At the conclusion of the classification experiment, the data classification workflow can be marked in a workflow repository as having the best performance.
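  • The comparison step can be illustrated with a short sketch; the use of F1 as the combined measure is an assumption for illustration, since the disclosure only states that evaluative measures such as classification precision and recall are computed and compared.

```python
# Hypothetical sketch of evaluating child workflows and marking the best-performing one.
def precision_recall(predicted, actual):
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

def best_workflow(child_results, labels):
    # `child_results` maps a child workflow name to its binary predictions on the labeled data.
    scores = {}
    for name, predicted in child_results.items():
        p, r = precision_recall(predicted, labels)
        scores[name] = 2 * p * r / (p + r) if (p + r) else 0.0  # F1 as the comparison measure
    return max(scores, key=scores.get)

labels = [1, 0, 1, 1, 0]
print(best_workflow({"workflow_a": [1, 0, 1, 0, 0], "workflow_b": [1, 1, 1, 1, 0]}, labels))
# 'workflow_b'
```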
  • While processes or blocks are presented in a given order, alternative embodiments may perform routines having steps, or employ systems having blocks, in a different order, and some processes or blocks may be deleted, moved, added, subdivided, combined, and/or modified to provide alternative or subcombinations. Each of these processes or blocks may be implemented in a variety of different ways. In addition, while processes or blocks are at times shown as being performed in series, these processes or blocks may instead be performed in parallel, or may be performed at different times. When a process or step is “based on” a value or a computation, the process or step should be interpreted as based at least on that value or that computation.
  • FIG. 8 is a block diagram of an example of a computing device 800, which may represent one or more computing devices or servers described herein, in accordance with various embodiments. The computing device 800 can be one or more computing devices that implement the social networking system 200 of FIG. 2 and/or the classification platform system 300 of FIG. 3. The computing device 800 can execute at least part of the method 600 of FIG. 6 or the method 700 of FIG. 7. The computing device 800 includes one or more processors 810 and memory 820 coupled to an interconnect 830. The interconnect 830 is an abstraction that represents any one or more separate physical buses, point-to-point connections, or both connected by appropriate bridges, adapters, or controllers. The interconnect 830, therefore, may include, for example, a system bus, a Peripheral Component Interconnect (PCI) bus or PCI-Express bus, a HyperTransport or industry standard architecture (ISA) bus, a small computer system interface (SCSI) bus, a universal serial bus (USB), IIC (I2C) bus, or an Institute of Electrical and Electronics Engineers (IEEE) standard 1394 bus, also called “Firewire”.
  • The processor(s) 810 is/are the central processing unit (CPU) of the computing device 800 and thus controls the overall operation of the computing device 800. In certain embodiments, the processor(s) 810 accomplishes this by executing software or firmware stored in memory 820. The processor(s) 810 may be, or may include, one or more programmable general-purpose or special-purpose microprocessors, digital signal processors (DSPs), programmable controllers, application specific integrated circuits (ASICs), programmable logic devices (PLDs), trusted platform modules (TPMs), or the like, or a combination of such devices.
  • The memory 820 is or includes the main memory of the computing device 800. The memory 820 represents any form of random access memory (RAM), read-only memory (ROM), flash memory, or the like, or a combination of such devices. In use, the memory 820 may contain a code 870 containing instructions.
  • Also connected to the processor(s) 810 through the interconnect 830 are a network adapter 840 and a storage adapter 850. The network adapter 840 provides the computing device 800 with the ability to communicate with remote devices, over a network and may be, for example, an Ethernet adapter or Fibre Channel adapter. The network adapter 840 may also provide the computing device 800 with the ability to communicate with other computers. The storage adapter 850 enables the computing device 800 to access a persistent storage, and may be, for example, a Fibre Channel adapter or SCSI adapter.
  • The code 870 stored in memory 820 may be implemented as software and/or firmware to program the processor(s) 810 to carry out actions described above. In certain embodiments, such software or firmware may be initially provided to the computing device 800 by downloading it from a remote system (e.g., via the network adapter 840).
  • The techniques introduced herein can be implemented by, for example, programmable circuitry (e.g., one or more microprocessors) programmed with software and/or firmware, or entirely in special-purpose hardwired circuitry, or in a combination of such forms. Special-purpose hardwired circuitry may be in the form of, for example, one or more application-specific integrated circuits (ASICs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), etc.
  • Software or firmware for use in implementing the techniques introduced here may be stored on a machine-readable storage medium and may be executed by one or more general-purpose or special-purpose programmable microprocessors. A “machine-readable storage medium,” as the term is used herein, includes any mechanism that can store information in a form accessible by a machine (a machine may be, for example, a computer, network device, cellular phone, personal digital assistant (PDA), manufacturing tool, any device with one or more processors, etc.). For example, a machine-accessible storage medium includes recordable/non-recordable media (e.g., read-only memory (ROM); random access memory (RAM); magnetic disk storage media; optical storage media; and/or flash memory devices), etc.
  • The term “logic,” as used herein, can include, for example, programmable circuitry programmed with specific software and/or firmware, special-purpose hardwired circuitry, or a combination thereof.
  • Some embodiments of the disclosure have other aspects, elements, features, and steps in addition to or in place of what is described above. These potential additions and replacements are described throughout the rest of the specification. Reference in this specification to “various embodiments” or “some embodiments” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosure. Alternative embodiments (e.g., referenced as “other embodiments”) are not mutually exclusive of other embodiments. Moreover, various features are described which may be exhibited by some embodiments and not by others. Similarly, various requirements are described which may be requirements for some embodiments but not other embodiments. Reference in this specification to where a result of an action is “based on” another element or feature means that the result produced by the action can change depending at least on the nature of the other element or feature.

Claims (20)

1. A computer-implemented method, comprising:
defining a classification experiment for executing tasks on a computation platform by at least:
defining an input data space by selecting at least one of data sources interfaced with a classification platform system; and
defining, via a definition user interface of the classification platform system, a workflow configuration of the classification experiment by defining a directed graph (DG) connecting a plurality of transformation blocks to represent an experiment workflow, wherein the DG specifies one or more connections from one or more outputs of each transformation block in the experiment workflow to one or more other transformation blocks;
determining, based on one or more previously executed classification experiments, that at least one transformation block included in the plurality of transformation blocks was previously executed;
generating an execution schedule that specifies which system component executes each transformation block in the plurality of transformation blocks excluding the at least one transformation block; and
scheduling the computation platform to execute the classification experiment by the system components under the execution schedule according to the input data space and the workflow configuration.
2. The computer-implemented method of claim 1, wherein the DG is arranged graphically via the definition user interface and wherein the definition user interface is a graphical user interface.
3. The computer-implemented method of claim 1, wherein said scheduling includes scheduling a first part of the workflow configuration to execute on a first computation platform and a second part of the workflow configuration to execute on a second computation platform.
4. The computer-implemented method of claim 1, further comprising selecting the computation platform based on a geographical or network location of the computation platform relative to one or more geographical or network locations of input data specified in the input data space.
5. The computer-implemented method of claim 1, further comprising maintaining a memoization database; wherein said scheduling includes preventing a transformation block from being executed by the computation platform when the transformation block as defined by the workflow configuration matches an entry in the memoization database; and wherein the entry includes a pre-computed output result of the transformation block given the same input and configuration.
6. The computer-implemented method of claim 1, wherein the at least one data source comprises a live data source from a social networking system, and wherein the live data source produces an open-ended stream of new data entries formatted according to one or more data formats of the input data space.
7. The computer-implemented method of claim 1, wherein the at least one data source comprises a static data source from a social networking system, and wherein the static data source includes a static data set with a constant data size formatted according to one or more data formats of the input data space.
8. The computer-implemented method of claim 1, wherein the DG is acyclical and thereby prevents execution of the classification experiment from entering an infinite loop.
9. The computer-implemented method of claim 1, wherein a transformation block in the DG includes logic to dynamically modify the DG during execution of the experiment workflow.
10. The computer-implemented method of claim 9, wherein the transformation block includes logic to dynamically modify input data of an existing transformation block in the DG.
11. The computer-implemented method of claim 9, wherein the transformation block includes logic to change or remove an existing transformation block in the DG or to add a new transformation block to the DG.
12. The computer-implemented method of claim 1, further comprising piping an output result of executing the classification experiment to a social networking system to re-configure at least an application service of the social network system.
13. The computer-implemented method of claim 1, wherein the input data space is a labeled data space that includes at least a parameter to locate labeled data for training a supervised classifier machine learning model or for evaluating classification precision or recall of a classifier model, wherein said training or said evaluating is represented in a transformation block in the DG.
14. The computer-implemented method of claim 1, wherein the input data space is a prediction space that includes at least a parameter to locate input data to be classified in the classification experiment.
15. The computer-implemented method of claim 1, wherein defining the classification experiment further includes defining a domain configuration that includes at least a parameter binding the input data space to the workflow configuration.
16. The computer-implemented method of claim 1, wherein defining the classification experiment includes inheriting a directed graph for the workflow configuration from a workflow repository.
17. One or more non-transitory computer readable storage media storing instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of:
defining a classification experiment for executing tasks on a computation platform by at least:
defining an input data space by selecting at least one of data sources interfaced with a classification platform system; and
defining, via a definition user interface of the classification platform system, a workflow configuration of the classification experiment by defining a directed graph (DG) connecting a plurality of transformation blocks to represent an experiment workflow, wherein the DG specifies one or more connections from one or more outputs of each transformation block in the experiment workflow to one or more other transformation blocks;
determining, based on one or more previously executed classification experiments, that at least one transformation block included in the plurality of transformation blocks was previously executed;
generating an execution schedule that specifies which system component executes each transformation block in the plurality of transformation blocks excluding the at least one transformation block; and
scheduling the computation platform to execute the classification experiment by the system components under the execution schedule according to the input data space and the workflow configuration.
18. The one or more non-transitory computer readable media of claim 17, wherein the instructions further cause the one or more processors to maintain a memoization database; wherein said scheduling includes preventing a transformation block from being executed by the computation platform when the transformation block as defined by the workflow configuration matches an entry in the memoization database; and wherein the entry includes a pre-computed output result of the transformation block given the same input and configuration.
19. A computer system, comprising:
one or more memories storing one or more instructions; and
one or more processors for executing the one or more instructions to:
define a classification experiment for executing tasks on a computation platform by at least:
defining an input data space by selecting at least one of data sources interfaced with a classification platform system; and
defining, via a definition user interface of the classification platform system, a workflow configuration of the classification experiment by defining a directed graph (DG) connecting a plurality of transformation blocks to represent an experiment workflow, wherein the DG specifies one or more connections from one or more outputs of each transformation block in the experiment workflow to one or more other transformation blocks;
determine, based on one or more previously executed classification experiments, that at least one transformation block included in the plurality of transformation blocks was previously executed;
generate an execution schedule that specifies which system component executes each transformation block in the plurality of transformation blocks excluding the at least one transformation block; and
schedule the computation platform to execute the classification experiment by the system components under the execution schedule according to the input data space and the workflow configuration.
20. The computer system of claim 19, wherein the one or more processors further maintain a memoization database; wherein said scheduling includes preventing a transformation block from being executed by the computation platform when the transformation block as defined by the workflow configuration matches an entry in the memoization database; and wherein the entry includes a pre-computed output result of the transformation block given the same input and configuration.
US16/916,040 2016-06-30 2020-06-29 Computation platform agnostic data classification workflows Abandoned US20200334293A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/916,040 US20200334293A1 (en) 2016-06-30 2020-06-29 Computation platform agnostic data classification workflows

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US15/199,351 US10698954B2 (en) 2016-06-30 2016-06-30 Computation platform agnostic data classification workflows
US16/916,040 US20200334293A1 (en) 2016-06-30 2020-06-29 Computation platform agnostic data classification workflows

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US15/199,351 Continuation US10698954B2 (en) 2016-06-30 2016-06-30 Computation platform agnostic data classification workflows

Publications (1)

Publication Number Publication Date
US20200334293A1 true US20200334293A1 (en) 2020-10-22

Family

ID=60807691

Family Applications (2)

Application Number Title Priority Date Filing Date
US15/199,351 Active 2038-08-01 US10698954B2 (en) 2016-06-30 2016-06-30 Computation platform agnostic data classification workflows
US16/916,040 Abandoned US20200334293A1 (en) 2016-06-30 2020-06-29 Computation platform agnostic data classification workflows

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US15/199,351 Active 2038-08-01 US10698954B2 (en) 2016-06-30 2016-06-30 Computation platform agnostic data classification workflows

Country Status (1)

Country Link
US (2) US10698954B2 (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9996804B2 (en) 2015-04-10 2018-06-12 Facebook, Inc. Machine learning model tracking platform
US10395181B2 (en) 2015-06-05 2019-08-27 Facebook, Inc. Machine learning system flow processing
US10643144B2 (en) 2015-06-05 2020-05-05 Facebook, Inc. Machine learning system flow authoring tool
US10417577B2 (en) 2015-06-05 2019-09-17 Facebook, Inc. Machine learning system interface
US10147041B2 (en) 2015-07-14 2018-12-04 Facebook, Inc. Compatibility prediction based on object attributes
US10229357B2 (en) 2015-09-11 2019-03-12 Facebook, Inc. High-capacity machine learning system
US10459979B2 (en) 2016-06-30 2019-10-29 Facebook, Inc. Graphically managing data classification workflows in a social networking system with directed graphs
US11544621B2 (en) * 2019-03-26 2023-01-03 International Business Machines Corporation Cognitive model tuning with rich deep learning knowledge
CN112825044B (en) * 2019-11-21 2023-06-13 杭州海康威视数字技术股份有限公司 Task execution method, device and computer storage medium
CN112231091B (en) * 2020-11-05 2022-08-23 北京理工大学 Parallel cloud workflow scheduling method based on reinforcement learning strategy
US12353413B2 (en) 2023-08-04 2025-07-08 Optum, Inc. Quality evaluation and augmentation of data provided by a federated query system
US12204538B1 (en) 2023-09-06 2025-01-21 Optum, Inc. Dynamically tailored time intervals for federated query system
US12393593B2 (en) 2023-09-12 2025-08-19 Optum, Inc. Priority-driven federated query-based data caching
CN118820910B (en) * 2024-09-19 2024-11-19 北京芯盾时代科技有限公司 Heterogeneous network security big data management method and system

Family Cites Families (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8010703B2 (en) * 2000-03-30 2011-08-30 Prashtama Wireless Llc Data conversion services and associated distributed processing system
US6946715B2 (en) * 2003-02-19 2005-09-20 Micron Technology, Inc. CMOS image sensor and method of fabrication
US7451432B2 (en) * 2004-10-01 2008-11-11 Microsoft Corporation Transformation of componentized and extensible workflow to a declarative format
BRPI0419152B1 (en) * 2004-10-28 2018-02-06 Telecom Italia S.P.A. Methods for managing resources in a telecommunication service and/or network management platform and for establishing and managing telecommunication services, platform for managing resources for telecommunication services and/or networks, and corresponding computer program product
GB0519981D0 (en) * 2005-09-30 2005-11-09 Ignios Ltd Scheduling in a multicore architecture
JP4908073B2 (en) * 2006-06-15 2012-04-04 株式会社日立製作所 Service-based software design support method and apparatus therefor
US8010534B2 (en) * 2006-08-31 2011-08-30 Orcatec Llc Identifying related objects using quantum clustering
US20090265290A1 (en) 2008-04-18 2009-10-22 Yahoo! Inc. Optimizing ranking functions using click data
US8150723B2 (en) 2009-01-09 2012-04-03 Yahoo! Inc. Large-scale behavioral targeting for advertising over a network
US8719769B2 (en) * 2009-08-18 2014-05-06 Hewlett-Packard Development Company, L.P. Quality-driven ETL design optimization
US8751521B2 (en) 2010-04-19 2014-06-10 Facebook, Inc. Personalized structured search queries for online social networks
US9087332B2 (en) 2010-08-30 2015-07-21 Yahoo! Inc. Adaptive targeting for finding look-alike users
US20120316981A1 (en) * 2011-06-08 2012-12-13 Accenture Global Services Limited High-risk procurement analytics and scoring system
US8510807B1 (en) * 2011-08-16 2013-08-13 Edgecast Networks, Inc. Real-time granular statistical reporting for distributed platforms
US8918387B1 (en) * 2012-04-04 2014-12-23 Symantec Corporation Systems and methods for classifying applications configured for cloud-based platforms
US20140108308A1 (en) 2012-07-13 2014-04-17 Social Data Technologies, LLC System and method for combining data for identifying compatibility
US10482482B2 (en) 2013-05-13 2019-11-19 Microsoft Technology Licensing, Llc Predicting behavior using features derived from statistical information
US10248675B2 (en) 2013-10-16 2019-04-02 University Of Tennessee Research Foundation Method and apparatus for providing real-time monitoring of an artifical neural network
US9672497B1 (en) * 2013-11-04 2017-06-06 Snap-On Incorporated Methods and systems for using natural language processing and machine-learning to produce vehicle-service content
US9542412B2 (en) * 2014-03-28 2017-01-10 Tamr, Inc. Method and system for large scale data curation
US10339465B2 (en) 2014-06-30 2019-07-02 Amazon Technologies, Inc. Optimized decision tree based models
US10467569B2 (en) * 2014-10-03 2019-11-05 Datameer, Inc. Apparatus and method for scheduling distributed workflow tasks
US9836591B2 (en) * 2014-12-16 2017-12-05 Qualcomm Incorporated Managing latency and power in a heterogeneous distributed biometric authentication hardware
US9135559B1 (en) 2015-03-20 2015-09-15 TappingStone Inc. Methods and systems for predictive engine evaluation, tuning, and replay of engine performance
US10713594B2 (en) 2015-03-20 2020-07-14 Salesforce.Com, Inc. Systems, methods, and apparatuses for implementing machine learning model training and deployment with a rollback mechanism
US9996804B2 (en) 2015-04-10 2018-06-12 Facebook, Inc. Machine learning model tracking platform
US10417577B2 (en) 2015-06-05 2019-09-17 Facebook, Inc. Machine learning system interface
US10147041B2 (en) 2015-07-14 2018-12-04 Facebook, Inc. Compatibility prediction based on object attributes
US10229357B2 (en) 2015-09-11 2019-03-12 Facebook, Inc. High-capacity machine learning system
US9965330B2 (en) * 2015-09-18 2018-05-08 Salesforce.Com, Inc. Maintaining throughput of a stream processing framework while increasing processing load
US10146584B2 (en) * 2016-01-28 2018-12-04 Ca, Inc. Weight adjusted dynamic task propagation
US20170262654A1 (en) * 2016-03-14 2017-09-14 Rita H. Wouhaybi Secure group data exchange
US10452677B2 (en) * 2016-06-19 2019-10-22 Data.World, Inc. Dataset analysis and dataset attribute inferencing to form collaborative datasets
US20180004835A1 (en) 2016-06-30 2018-01-04 Facebook, Inc. Data classification workflows implemented with dynamically modifiable directed graphs
US11256743B2 (en) * 2017-03-30 2022-02-22 Microsoft Technology Licensing, Llc Intermixing literal text and formulas in workflow steps

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060271528A1 (en) * 2003-09-10 2006-11-30 Exeros, Inc. Method and system for facilitating data retrieval from a plurality of data sources
US20130013539A1 (en) * 2011-01-13 2013-01-10 International Business Machines Corporation System and method for domain adaption with partial observation
US9235652B1 (en) * 2012-06-27 2016-01-12 Groupon, Inc. Optimizing a data integration process
US20140026144A1 (en) * 2012-07-23 2014-01-23 Brandon John Pack Systems And Methods For Load Balancing Of Time-Based Tasks In A Distributed Computing System
US20140032617A1 (en) * 2012-07-24 2014-01-30 Craig W. Stanfill Mapping entities in data models
US20150286748A1 (en) * 2014-04-08 2015-10-08 RedPoint Global Inc. Data Transformation System and Method
US20160011905A1 (en) * 2014-07-12 2016-01-14 Microsoft Technology Licensing, Llc Composing and executing workflows made up of functional pluggable building blocks
US20160072918A1 (en) * 2014-09-09 2016-03-10 Ashot Gabrelyanov System and Method for Acquisition, Management and Distribution of User-Generated Digital Media Content
US20160358103A1 (en) * 2015-06-05 2016-12-08 Facebook, Inc. Machine learning system flow processing
US10643144B2 (en) * 2015-06-05 2020-05-05 Facebook, Inc. Machine learning system flow authoring tool
US10733165B1 (en) * 2015-07-06 2020-08-04 Workiva Inc. Distributed processing using a node hierarchy
US20170039239A1 (en) * 2015-08-03 2017-02-09 Sap Se Distributed resource-aware task scheduling with replicated data placement in parallel database clusters
US9886336B2 (en) * 2015-10-28 2018-02-06 Facebook, Inc. Automatic filing of a task for application crashes
US20170337138A1 (en) * 2016-05-18 2017-11-23 International Business Machines Corporation Dynamic cache management for in-memory data analytic platforms
US10459979B2 (en) * 2016-06-30 2019-10-29 Facebook, Inc. Graphically managing data classification workflows in a social networking system with directed graphs

Cited By (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12058160B1 (en) 2017-11-22 2024-08-06 Lacework, Inc. Generating computer code for remediating detected events
US12284197B1 (en) 2017-11-27 2025-04-22 Fortinet, Inc. Reducing amounts of data ingested into a data warehouse
US12309185B1 (en) 2017-11-27 2025-05-20 Fortinet, Inc. Architecture for a generative artificial intelligence (AI)-enabled assistant
US12034750B1 (en) 2017-11-27 2024-07-09 Lacework Inc. Tracking of user login sessions
US12381901B1 (en) 2017-11-27 2025-08-05 Fortinet, Inc. Unified storage for event streams in an anomaly detection framework
US12034754B2 (en) 2017-11-27 2024-07-09 Lacework, Inc. Using static analysis for vulnerability detection
US11991198B1 (en) 2017-11-27 2024-05-21 Lacework, Inc. User-specific data-driven network security
US12095879B1 (en) 2017-11-27 2024-09-17 Lacework, Inc. Identifying encountered and unencountered conditions in software applications
US12095796B1 (en) 2017-11-27 2024-09-17 Lacework, Inc. Instruction-level threat assessment
US12095794B1 (en) 2017-11-27 2024-09-17 Lacework, Inc. Universal cloud data ingestion for stream processing
US12309181B1 (en) 2017-11-27 2025-05-20 Fortinet, Inc. Establishing a location profile for a user device
US12126695B1 (en) 2017-11-27 2024-10-22 Fortinet, Inc. Enhancing security of a cloud deployment based on learnings from other cloud deployments
US12126643B1 (en) 2017-11-27 2024-10-22 Fortinet, Inc. Leveraging generative artificial intelligence (‘AI’) for securing a monitored deployment
US12130878B1 (en) 2017-11-27 2024-10-29 Fortinet, Inc. Deduplication of monitored communications data in a cloud environment
US12206696B1 (en) 2017-11-27 2025-01-21 Fortinet, Inc. Detecting anomalies in a network environment
US12244621B1 (en) 2017-11-27 2025-03-04 Fortinet, Inc. Using activity monitored by multiple data sources to identify shadow systems
US12261866B1 (en) 2017-11-27 2025-03-25 Fortinet, Inc. Time series anomaly detection
US12267345B1 (en) 2017-11-27 2025-04-01 Fortinet, Inc. Using user feedback for attack path analysis in an anomaly detection framework
US11973784B1 (en) 2017-11-27 2024-04-30 Lacework, Inc. Natural language interface for an anomaly detection framework
US12021888B1 (en) 2017-11-27 2024-06-25 Lacework, Inc. Cloud infrastructure entitlement management by a data platform
US12309236B1 (en) 2017-11-27 2025-05-20 Fortinet, Inc. Analyzing log data from multiple sources across computing environments
US12120140B2 (en) 2017-11-27 2024-10-15 Fortinet, Inc. Detecting threats against computing resources based on user behavior changes
US12309182B1 (en) 2017-11-27 2025-05-20 Fortinet, Inc. Customer onboarding and integration with anomaly detection systems
US12323449B1 (en) 2017-11-27 2025-06-03 Fortinet, Inc. Code analysis feedback loop for code created using generative artificial intelligence (‘AI’)
US12335286B1 (en) 2017-11-27 2025-06-17 Fortinet, Inc. Compute environment security monitoring using data collected from a sub-kernel space
US12335348B1 (en) 2017-11-27 2025-06-17 Fortinet, Inc. Optimizing data warehouse utilization by a data ingestion pipeline
US12341797B1 (en) 2017-11-27 2025-06-24 Fortinet, Inc. Composite events indicative of multifaceted security threats within a compute environment
US12348545B1 (en) 2017-11-27 2025-07-01 Fortinet, Inc. Customizable generative artificial intelligence (‘AI’) assistant
US12355626B1 (en) 2017-11-27 2025-07-08 Fortinet, Inc. Tracking infrastructure as code (IaC) asset lifecycles
US12355793B1 (en) 2017-11-27 2025-07-08 Fortinet, Inc. Guided interactions with a natural language interface
US12355787B1 (en) 2017-11-27 2025-07-08 Fortinet, Inc. Interdependence of agentless and agent-based operations by way of a data platform
US12363148B1 (en) 2017-11-27 2025-07-15 Fortinet, Inc. Operational adjustment for an agent collecting data from a cloud compute environment monitored by a data platform
US12368746B1 (en) 2017-11-27 2025-07-22 Fortinet, Inc. Modular agentless scanning of cloud workloads
US12368745B1 (en) 2017-11-27 2025-07-22 Fortinet, Inc. Using natural language queries to conduct an investigation of a monitored system
US12375573B1 (en) 2017-11-27 2025-07-29 Fortinet, Inc. Container event monitoring using kernel space communication
US12368747B1 (en) 2019-12-23 2025-07-22 Fortinet, Inc. Using a logical graph to monitor an environment
US12032634B1 (en) 2019-12-23 2024-07-09 Lacework Inc. Graph reclustering based on different clustering criteria
US12395573B1 (en) 2019-12-23 2025-08-19 Fortinet, Inc. Monitoring communications in a containerized environment
US12401669B1 (en) 2022-01-31 2025-08-26 Fortinet, Inc. Container vulnerability management by a data platform

Also Published As

Publication number Publication date
US10698954B2 (en) 2020-06-30
US20180004859A1 (en) 2018-01-04

Similar Documents

Publication Publication Date Title
US20200334293A1 (en) Computation platform agnostic data classification workflows
US10459979B2 (en) Graphically managing data classification workflows in a social networking system with directed graphs
US20180004835A1 (en) Data classification workflows implemented with dynamically modifiable directed graphs
JP7715657B2 (en) Method, system, and computer readable program
US11645548B1 (en) Automated cloud data and technology solution delivery using machine learning and artificial intelligence modeling
JP6926047B2 (en) Methods and predictive modeling devices for selecting predictive models for predictive problems
US11276011B2 (en) Self-managed adaptable models for prediction systems
US10395181B2 (en) Machine learning system flow processing
US10643144B2 (en) Machine learning system flow authoring tool
US20200333772A1 (en) Semantic modeling and machine learning-based generation of conceptual plans for manufacturing assemblies
CN116508019A (en) Learning-based workload resource optimization for database management systems
CN118093962A (en) Data retrieval method, device, system, electronic equipment and readable storage medium
US12079214B2 (en) Estimating computational cost for database queries
Lugaresi et al. Generation and tuning of discrete event simulation models for manufacturing applications
US12254419B2 (en) Machine learning techniques for environmental discovery, environmental validation, and automated knowledge repository generation
CN113535804A (en) Business data processing method, device, equipment and system
CN115248815A (en) Predictive query processing
TWI879684B (en) High-performance resource and job scheduling
TW202522215A (en) High-performance resource and job scheduling
CN114385121B (en) Software design modeling method and system based on business layering
KR102651797B1 (en) Machine learning platform system based on software-defined manufacturing for AI non-technical
US20250173550A1 (en) Artificial intelligence-driven data classification
Anderson Deep Mining: scaling Bayesian auto-tuning of data science pipelines
Li Performance management of event processing systems
Roehl Cloud Based IoT Architecture

Legal Events

Date Code Title Description
STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

AS Assignment

Owner name: FACEBOOK, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:PIECHOWICZ, SZYMON;NAVEH, BARAK REUVEN;LIU, ANNIE HSIN-WEN;AND OTHERS;SIGNING DATES FROM 20160915 TO 20160916;REEL/FRAME:054259/0113

AS Assignment

Owner name: META PLATFORMS, INC., CALIFORNIA

Free format text: CHANGE OF NAME;ASSIGNOR:FACEBOOK, INC.;REEL/FRAME:058605/0840

Effective date: 20211028

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION