WO2022003737A1 - Platform and method for pluggable ai and machine learning cataloguing and prediction - Google Patents

Platform and method for pluggable AI and machine learning cataloguing and prediction

Info

Publication number
WO2022003737A1
Authority
WO
WIPO (PCT)
Prior art keywords
event
database
temporal
anyone
processes
Application number
PCT/IT2020/000055
Other languages
French (fr)
Inventor
Davide GIAROLO
Davide BONAMINI
Pietro SALA
Original Assignee
Giarolo Davide
Bonamini Davide
Application filed by Giarolo Davide, Bonamini Davide
Priority to PCT/IT2020/000055
Publication of WO2022003737A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0637 Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
    • G06Q10/06375 Prediction of business process outcome or impact based on a proposed change
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Definitions

  • the invention concerns a computer-implemented method for operating a system for customer relationship management and related systems involving artificial intelligence systems to classify and predict business process outcomes.
  • CRM Customer Relationship Management Software
  • IoT Internet of Things
  • CRM software no longer manages just human interactions but implements more and more automated, semi-automated, or supervised processes.
  • the correct execution of such processes usually needs the correct integration of heterogeneous sources of data.
  • data quality is crucial but often hindered by errors and missing values, which need (automatic) correction.
  • CRM solutions must be able to generate periodic reports, interactive dashboards and complex analysis often based on OLAP (Online Analytical Processing) cubes.
  • OLAP Online Analytical Processing
  • the scope of the present invention is to propose a CRM method and related systems to overcome the above drawbacks and which guarantee an economic, fast and reliable way to help companies guide business processes to a positive outcome, adapting computing methods and systems so that they minimize errors and need as little human intervention or individual skill as possible.
  • a further problem to be solved by the invention is to offer a system that replaces vague and incomplete guidelines with an automatic, method- and system-based approach that permits the technical implementation of artificial intelligence while avoiding human errors.
  • CRM customer relationship management
  • At least one explicit process is selected from defined process models already present in said temporal database;
  • the preparation steps (I) and (II) are executed by a temporal extraction module that is responsible for transforming all the data collected by the CRM system into a temporal database representation.
  • Such representation can form the input of the AI algorithms and every subsequent analysis to be performed.
  • the core of the process is the process discovery part that permits a process-driven management of business activities and decision making.
  • the process basis of the CRM system is more accurate than a purely data-based system, which needs a high degree of expert intervention. From a technical point of view, the challenge does not (only) arise in terms of interaction, as it can be easily overcome by using standard communication logic between systems, but rather in ensuring that the current approach in terms of prediction model is able to adapt independently to the data of companies operating in different sectors.
  • a great advantage of the system is the fact that the generated process models can be based on data already available to the company, already stored in the company’s database. It follows that the simple import of data present in the various company silos into the system, combined with the automatic generation of models, makes the solution ready to work on different business sectors. This means that the complete description of events is contained in the database(s) of the CRM software; such sets of events are called implicit processes. To the knowledge of the inventors, up to now process-driven systems have never been proposed due to complex processes that involve different kinds of customers.
  • the process discovery module active in step (III) discovers implicit processes related to various entities from the data produced by the temporal extraction. Implicit processes represent de-facto practices in the workplace or customer behaviour. Since such processes are supported by the data in the first case, they are useful for discovering loopholes and inefficiencies. In the latter case, implicit processes representing customer behaviour are not just an indication of how certain classes of customers behave in general, but actually also determine the exact points in which a decision has to be made in order to drive the behaviour either to a desired outcome (e.g., new purchase, license renewal, and so on) or away from an undesired one (e.g. customer abandonment). Such decision points are represented by exclusive gateways in the extracted process models, for example BPMN diagrams.
  • CRM users may extract BPMN processes from their data or formalize them by hand by importing them into CRM or using the CRM built-in editor.
  • CRM processes extracted from data are immediately available in the CRM editor for analysis and/or revision. Users may then adopt the hybrid approach of extracting processes from data, modifying the extracted processes to improve their execution flow and, finally, send the process to production for execution.
  • Any BPMN process whether discovered by process discovery or formalized/revised by a human, may be immediately instantiated for execution in the CRM engine.
  • the CRM engine already allows the insertion of custom code into BPMN elements like tasks or gateways (e.g. for automating mail notification, detecting IoT events on industrial machines, checking expiration dates for payments, and so on).
  • Such custom code is executed whenever the execution of a process instance reaches the enclosing element. Then, in its current stage of development the CRM engine is ready and perfectly capable of integrating AI algorithms that may be inserted into tasks or gateways. Establishing how such plugins are orchestrated inside a process is the responsibility of a classification manager module realizing step (IV).
  • decision points in a process model whether extracted or formalized by hand, are represented by exclusive gateways.
  • the importance of having a formal process representation, like BPMN, is that it allows determining where and when, during execution of a process, to take advantage of decision support.
  • process model and AI go hand in hand.
  • the process serves as a way to supervise the application of AI methods by providing suitable recovery mechanisms, still in the form of BPMN task exceptions and control-flow elements, to intercept the inevitable errors conveyed by statistical methods.
  • the classification manager module implements such statistical methods; it is responsible for testing and scoring a plethora of classification algorithms in order to find the best method (if any) to embed in an exclusive gateway for decision support. This happens almost automatically.
  • the user has to specify the goal by simply clicking on the corresponding part of the process model and stating whether he/she wishes to drive towards or drive away from such a component (e.g. you want to drive towards a renewal event and away from an abandonment event).
  • the classification manager for each exclusive gateway in the process performs feature selection, testing and scoring over all the data provided by the temporal extraction layer, and, if some accuracy threshold is complied with, the best classification method to apply in the gateway for deciding is found.
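  • As an illustration only (not part of the disclosure), the test-and-score loop of the classification manager can be sketched as follows; the candidate classifiers, the feature matrix and the accuracy threshold are assumptions standing in for the data provided by the temporal extraction layer.

```python
# Hypothetical sketch of the classification manager's test-and-score loop.
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

ACCURACY_THRESHOLD = 0.75  # accuracy the best classifier must comply with (illustrative)

# Stand-in for the features/labels produced by the temporal extraction layer
# for one exclusive gateway (label = branch taken / desired vs. undesired).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(),
    "random_forest": RandomForestClassifier(n_estimators=100),
}

# Test and score every candidate; keep the best one only if it clears the threshold.
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
best_name, best_score = max(scores.items(), key=lambda kv: kv[1])

if best_score >= ACCURACY_THRESHOLD:
    print(f"embed '{best_name}' in the exclusive gateway (accuracy {best_score:.2f})")
else:
    print("no classifier meets the accuracy threshold; gateway left to the user")
```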
  • an object classification maker configurator allows the system to be easily and flexibly configured via the interface; configurations are set up in self-service mode.
  • This approach ensures that the system administrator or data scientist can configure, for a given type of objects in the system, the characteristics of the type of data to be interpreted, indicating to the machine whether it is, for example, simple text or articulated text and, above all, which data must instead be guessed, without having to write programming code.
  • the system learns how to generate an anonymized classification model to be used later to guess the classification of a new object. In the same way it will be possible to download preconfigured models provided by the community.
  • a system library of ready-to-use classifiers can be implemented which, based on the metadata of the standard CRM modules, allows sharing standardized classifier models by industry type with the user community. These classifiers will be simply installable and usable. The corrections made by users to the predictions will subsequently be used, in an anonymized form, to power and improve the classification engine for the benefit of all the installations that use it, through a built-in update system.
  • the implicit process discovery integration advantageously takes place in order to maintain a no-code approach: the system administrator or data scientist will be able to select a series of modules and submit them in self-service mode to the machine, entrusting it with the task of discovering the inherent processes.
  • the dedicated configuration panel will allow the use of pre-packaged layouts for the classic modules present in the CRM system, but at the same time, in order to guarantee the current flexibility of the product, it will allow the configuration of new metadata for new or customized modules. Subsequently it will be possible to preview these processes in BPMN format and at the same time activate them following a possible modification, in order to optimize the identified workflows. There will also be some links with the classifier that, in simplified mode, will allow better identification of groups of similar objects.
  • said process models are in a BPMN format, optionally integrated with an artificial intelligence system to be executed at a certain stage of the process model.
  • a BPMN (business process model and notation) engine permits to systematically deal with the aforementioned requirements in a standard and replicable way.
  • a BPMN engine is a program that guarantees the correct execution of processes written in Business Process Model and Notation, that is, the de-facto standard for formalizing most of the processes in today’s corporate world.
  • the advantages of implementing company processes as BPMN and of delegating their execution to an engine are manifold: at any time, for every process instance (i.e., a single execution of a BPMN process), it is possible to monitor the execution.
  • the BPMN engine is often delegated to automatically start the next task in a workflow when the current one has been completed; in doing so, the engine usually notifies the correct person for the next task and provides all the data necessary to support their work.
  • a BPMN process implemented with an AI function at a certain stage of the process permits decisions at deviation points to be taken by AI systems, which may detect error sources and apply a more suitable solution.
  • a monolithic BPMN diagram would be too difficult to understand, maintain and execute; in particular, counter-intuitiveness is an unforgivable sin in a BPMN diagram, since BPMN notation has been adopted as a process-management language among people with different backgrounds inside a company and its success heavily relies on its simplicity. Thus it is easier to have a group of BPMN diagrams which are classified and selected on the basis of determined static features or trigger events. The most common BPMN diagrams used in CRM installations will be shared through an online library to facilitate their adoption.
  • diagrams can be downloadable and activated via the online library available for consultation directly in the software.
  • in step (V) the process models are further revised by an artificial intelligence system in a forecast manager module, which takes data and process history from the temporal database to identify, inside the process models, paths which have an increased probability of guiding the process to the desired outcome, and stores said revised process models in said database.
  • This module uses the data taken from the temporal extraction module to see if certain paths in the process may increase the probability of the desired or undesired outcome. This is done, for example, by testing and scoring various forecasting methods against the process history and selecting the most accurate for predicting the probability of the given outcome. If a user so desires he/she can activate the method and receive prompt warnings when such probability is too low (i.e., in the case of desired outcomes) or too high (i.e., in the case of undesired events).
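  • A minimal sketch of this forecast manager behaviour, assuming scikit-learn models and a made-up process history; the warning threshold and field layout are illustrative, not taken from the patent.

```python
# Illustrative sketch of the forecast manager: several probability models are
# scored on the process history and the most accurate one raises warnings.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

WARNING_THRESHOLD = 0.3  # warn when the probability of the desired outcome drops below this

# Stand-in process history: features per process instance (e.g. days since first
# ticket, number of tickets) and whether the desired outcome was reached.
rng = np.random.default_rng(0)
history_X = rng.random((300, 2))
history_y = (history_X[:, 0] + 0.2 * history_X[:, 1] > 0.6).astype(int)

models = [LogisticRegression(), GradientBoostingClassifier()]
best = max(models, key=lambda m: cross_val_score(m, history_X, history_y, cv=5).mean())
best.fit(history_X, history_y)

# New running process instances: warn when the predicted probability of the
# desired outcome is too low.
for instance in rng.random((5, 2)):
    p = best.predict_proba([instance])[0, 1]
    if p < WARNING_THRESHOLD:
        print(f"warning: probability of desired outcome only {p:.2f} for {instance}")
```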
  • said artificial intelligence systems to be applied in said classification, extraction, modification and/or forecasting operations are selected according to predefined criteria among a plurality of artificial intelligence systems which are tested against the corresponding process model.
  • the artificial intelligence systems applied are suitable for at least one member of the group consisting of fault prediction, natural language processing and understanding, and customer behaviour prediction based on collaborative filtering techniques.
  • AI technologies are the following: fault prediction for hardware/software systems [6] and failure prediction for industrial machines [3]; natural language processing (NLP) [1] and natural language understanding (NLU) [4] for customer support (e.g. ticket redirection, and chatbot assistant); customer behaviour prediction based on collaborative filtering techniques [2].
  • the above points usually imply at least the following steps to be completely integrated into the company’s day-to-day activities: expert consulting to address feasibility and the correct AI model to apply; tuning, testing and evaluation of the AI algorithm on the company data to achieve the desired performances; integration of the AI algorithm into the company’s software system.
  • the invention provides the small/medium-sized enterprises with an entry-level tool for consistently testing, tuning and integrating AI algorithms into their processes, simulating process outcomes and choosing the solution that best suits the needs.
  • the solution is a framework for integrating AI plugins over the CRM software which provides an extensible and full featured process engine in its core.
  • Processes are expressed, for example, using the Business Process Model and Notation (BPMN) [5], which is the de-facto standard for formalizing organizational processes.
  • BPMN Business Process Model and Notation
  • Current artificial intelligence solutions are vertical instruments for each function or sector; therefore their use depends on availability for the specific sector. Even when available, these tools are atomic entities and therefore provide for the feeding of their data and offer as output a result that must subsequently be worked on or made available for the appropriate assessments. Usually this activity is delegated to system integrators or data scientists, with a variable cost for each project and results that are not always certain.
  • Other CRM solutions have artificial intelligence on board but based on pre-established and static configurations.
  • said artificial intelligence systems are comprised in a market place, in particular a market place for open source systems.
  • plug-and-play AI modules permit testing several instruments against the process models to find out the best one.
  • a standardized (BPMN) process based CRM engine, with sector independent process models, opens the possibility to do so.
  • said databases contain data, event logs and process models in a generalized manner without sector or department specific attributes and preferably in an anonymized manner.
  • a freemium GDPR (General Data Protection Regulation) compliant cloud datacenter can be activated that conforms to the GDPR, where in complete autonomy and in self-service mode companies can map and import pseudo-anonymously the information and events present in their CRM.
  • the pseudo-anonymized databases loaded will contribute to the creation of classifiers and processes that can be downloaded from the marketplace.
  • AI elements thus come from a module called model marketplace.
  • AI models are very specialized and there is a flourishing of AI-focused start-ups in Europe that study and implement efficient and accurate solutions for specific needs.
  • This module allows the user to test and compare the best of the AI models on the market.
  • the data that normally feed the platform can be immediately used to be processed by artificial intelligence.
  • an interface system can be set up that allows selecting the fields that characterize the information to be treated and the field that must be guessed by the artificial intelligence system, so the solution is ready to allow immediate use of artificial intelligence in different situations.
  • the next step is to allow the interaction with different classifiers, activated for example through the marketplace, thus allowing evaluation of their actual quality in the field; their integration can take place via a microservice API (application programming interface) logic, as sketched below.
  • a microservice API application programming interface
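  • The following is a hypothetical sketch of such a microservice call; the endpoint, payload and response fields are assumptions for illustration.

```python
# Hedged sketch of the microservice API logic: the CRM posts the object's
# fields to a classifier service and reads back the guessed field.
import json
import urllib.request

def classify(fields: dict, endpoint: str = "http://classifier:8080/predict") -> str:
    payload = json.dumps({"fields": fields}).encode("utf-8")
    req = urllib.request.Request(endpoint, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["guessed_value"]

# Example: let the (hypothetical) service guess the ticket category from free text.
# print(classify({"subject": "cannot renew license", "body": "..."}))
```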
  • said temporal database comprises information regarding the moment in which a certain event occurs and the time interval for which a certain information is valid.
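  • A minimal sketch of what such temporal records could look like, assuming only what the sentence above states (an event timestamp plus a validity interval); field names are illustrative.

```python
# Illustrative temporal-database records: event instants and validity intervals.
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class TemporalEvent:
    lead_id: int
    event: str
    occurred_at: datetime          # the moment in which the event occurs

@dataclass
class TemporalFact:
    lead_id: int
    attribute: str
    value: str
    valid_from: datetime           # start of the interval in which the information is valid
    valid_to: Optional[datetime]   # None = still valid

renewal = TemporalEvent(lead_id=42, event="license_renewal",
                        occurred_at=datetime(2020, 6, 1, 9, 30))
plan = TemporalFact(lead_id=42, attribute="subscription_plan", value="community",
                    valid_from=datetime(2020, 1, 1), valid_to=datetime(2020, 6, 1))
```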
  • said discovery step (III-i) further comprises the identification of a trigger event within the process model. Identifying a trigger event in the process model means identifying exactly the event which drives the process (without intervention) into one or another direction, and selecting a certain path at this stage allows the process to be driven positively.
  • in step (III-i) said process models are further modified by a human user who selects, in candidate process models, desired and undesired outcomes to train the artificial intelligence system employed.
  • a comparison function evaluates the changes introduced in terms of process improvements and time flow.
  • the comparison function may be in the form of a discovered-processes performance checker: for the processes discovered by the method and subsequently modified by the expert or AI for better execution, a comparison function can be implemented between the previous executions and the current execution orchestrated according to the changes made. In this way, according to the identification of the new flow execution times and the steps involved, it will be shown graphically whether there have been performance improvements with the new adoption or whether the time flow remains unchanged and it is therefore necessary to intervene again to improve it.
  • said process models stored in said databases are periodically revised by new artificial intelligence systems which have not yet been applied.
  • Such a periodic revision function allows for having always the best fitting AI system available on the market accessed by the user and thus, in other words, permits substituting an old AI system by a new better one.
  • the metrics of the discovered process models, in particular fitness, precision and score, are tested against predefined thresholds and only the process models that pass the test are expressed in the event array. This step optimizes the process selection.
  • the execution in step (V) is automatically performed when a certain static feature and a trigger event coincide.
  • a certain process is started when the conditions of static and dynamic behaviour coincide.
  • a process driven method and system considers dynamic behaviour and reacts in a dynamic way.
  • step (IV) may comprise detecting a value to be controlled and driving the process to the desired outcome whenever the process reaches a given trigger event identified by this detected value.
  • said database comprises a change-log functionality regarding temporal entities and attributes for the discovery of trigger events, and optionally an XML description wherein the set of events to be considered when looking for trigger events can be restricted, and in particular thresholds for data mining parameters such as support, precision, fitness and score. XML description and thresholds help to target trigger events. For extracting such instances advantage is taken of the change-log feature that considers dynamic behaviour and helps to identify hidden instances which are the moment where a process is driven to a certain outcome. Usually software applications implement such change-log features for recovery purposes and thus, there is no need for additional plugins or custom functions implemented, it is possible to just start from what has been collected in the database so far.
  • said database comprises an XML metamodel containing lifespan and static features, temporal features, events, thresholds for mining and classifications, the latter ones being inserted at the end of the classification/mining step, an id, and a temporal XML metamodel to move inside the temporal database for extracting traces regarding corresponding leads, wherein said metamodel XML is general being applicable to every context which uses a database containing change-logs relative attributes and temporal entities with the possibility to personalize it, specifying hierarchies and alias with an option to exclude the source of relative changes and decide whether to consider or not the order number with respect to the temporal entity.
  • said event array has a structure corresponding to a vector which connects any master entity, i.e. event, with another array containing all the events belonging to the event identified for a lead and a timestamp associated to any event which indicates the exact moment when it took place, further counting the number of scores of a certain instance (singleton) exceeding a predefined threshold and defining a table of cardinalities connecting it to the possible instances of another attribute and creating possible combinations forming entity clusters.
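  • A toy sketch of the event array described above, mapping each lead to its timestamped events and counting supported instances; all values and the support threshold are made up.

```python
# Minimal sketch of the event-array structure: each master entity (lead) is
# mapped to the list of its events, each paired with the timestamp of the
# exact moment it took place.
from datetime import datetime

event_array = {
    "lead_001": [
        ("seminar_attended",   datetime(2020, 3, 2, 10, 0)),
        ("community_download", datetime(2020, 3, 2, 15, 40)),
        ("ticket_opened",      datetime(2020, 3, 10, 9, 12)),
    ],
    "lead_002": [
        ("community_download", datetime(2020, 4, 1, 11, 5)),
        ("training_requested", datetime(2020, 4, 20, 16, 30)),
    ],
}

# Counting how many instances of an event exceed a threshold, a first step
# towards the table of cardinalities mentioned in the text.
SUPPORT = 2
counts = {}
for events in event_array.values():
    for name, _ in events:
        counts[name] = counts.get(name, 0) + 1
supported = {name for name, c in counts.items() if c >= SUPPORT}
print(supported)  # {'community_download'}
```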
  • epsilon_events: for each group of attributes, only the events relative to the leads which satisfy the requirement of the static attribute are extracted from the event array.
  • for every extracted event it is verified whether it is supported by the group of traces present in the event array exceeding the corresponding threshold, i.e. epsilon_events.
  • the pairs of static-feature set and event are examined to find the first occurrence of the event among the traces belonging to a lead selected by the limits represented by the set of attributes, truncating the analyzed trace at this point and creating sub-traces having the event of interest (trigger) as start event.
  • step (III-i) comprises the following steps (see the sketch after this list):
  • an xml description extracts from the database an event array which is a representation of the set of traces representing the implicit processes;
  • a process discovery tool starts to enumerate all the combinations of static features which are sufficiently represented, i.e. above a certain, in particular user defined, support threshold, in the event array;
  • the process discovery tool filters the traces associated to customers with static features in the event array and on this set of traces the tool selects the events with higher occurrences, again according to some defined threshold;
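  • The three steps above can be sketched as follows; the lead data, thresholds and feature names are illustrative placeholders, not the patent’s data.

```python
# Hedged sketch of step (III-i): enumerate combinations of static features that
# are sufficiently supported in the event array, filter the traces of the
# matching customers, and keep only the most frequent events.
from itertools import combinations
from collections import Counter

SUPPORT_THRESHOLD = 2      # minimum number of leads sharing the feature combination
EVENT_THRESHOLD = 2        # minimum occurrences for an event to be kept

leads = {
    "lead_001": {"static": {("sector", "retail"), ("size", "small")},
                 "trace": ["seminar", "download", "ticket", "ticket"]},
    "lead_002": {"static": {("sector", "retail"), ("size", "small")},
                 "trace": ["download", "training", "offer_accepted"]},
    "lead_003": {"static": {("sector", "manufacturing"), ("size", "large")},
                 "trace": ["download", "ticket"]},
}

all_features = set().union(*(l["static"] for l in leads.values()))
for r in range(1, len(all_features) + 1):
    for combo in combinations(sorted(all_features), r):
        matching = [l for l in leads.values() if set(combo) <= l["static"]]
        if len(matching) < SUPPORT_THRESHOLD:
            continue                       # combination not sufficiently represented
        counts = Counter(e for l in matching for e in l["trace"])
        frequent = [e for e, c in counts.items() if c >= EVENT_THRESHOLD]
        print(combo, "->", frequent)
```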
  • the steps for process discovery are performed in a web-based application of a virtual net of containers which communicate with each other, wherein a first container communicates with a second container to have access to the databases with read and write permission to create XES files regarding lead clusters and then communicates with process mining tools contained in a fourth container, a web server, through a database and message broker contained in a third container, wherein preferably the first container extracts event logs in XES format and the mining tools work with process models in BPMN format.
  • the configuration is not necessarily inside the company’s system, but can be used from outside, such that the company does not have to install such a complex system while still keeping its data/processes under control, also in a codified, protected manner.
  • classification and/or forecasting steps are supported by a text miner.
  • a text miner helps to find inside complex texts (emails for example) key words or expressions which usually mean a certain customer attitude or behaviour and can help to identify trigger events to intervene and control the customer’s behaviour.
  • the text miner may be realized as a standalone container and may be used for similar situations on different data or it may be installed locally or in a cloud.
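  • A simple illustrative text-miner sketch (regex keyword spotting over e-mail bodies); the keyword lists and attitudes are assumptions, not the patent’s.

```python
# Keyword/expression spotting over complex texts such as e-mails.
import re

SIGNALS = {
    "churn_risk":   [r"\bcancel\b", r"\bunsubscribe\b", r"not\s+satisfied"],
    "upsell_ready": [r"\bupgrade\b", r"enterprise\s+edition", r"\bquote\b"],
}

def mine(text: str) -> set[str]:
    """Return the customer attitudes suggested by key words in the text."""
    found = set()
    for attitude, patterns in SIGNALS.items():
        if any(re.search(p, text, re.IGNORECASE) for p in patterns):
            found.add(attitude)
    return found

print(mine("Hi, we would like a quote for the enterprise edition."))
# {'upsell_ready'}  -> may identify a trigger event to intervene on
```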
  • the process models stored during step (IV) are integrated with an alert function.
  • an improved alert system is implemented to highlight these reports.
  • the suggested steps will therefore not be binding but highlighted, as if it were a navigator for the best route.
  • This panel of suggestions goes alongside the process assistant already in use for the processes implemented by the company.
  • this function will have the task of ranking among the possible options or rather engaging the user only for value-added activities, neglecting or minimizing the impact of those reports which, if not considered, would have an irrelevant impact in terms of overall business.
  • a computing system is foreseen which is adapted to perform the process according to the invention, in particular a computing system for customer relationship management (CRM), comprising:
  • a CRM database for recording customer data including static features and logs of events generated by company processes and optionally external processes and being associated with
  • a classification manager module for one or more of the following activities to be performed against the processes furnished by the process discovery module or directly from the temporal extraction module: selection and integration of AI plug-ins, searching for errors, implementing statistical methods for testing and scoring a classification algorithm to find the best mode for decision support for each gateway, and selecting the modality of execution, i.e. supervised, unsupervised, semi-supervised;
  • a forecast manager module that takes data from the temporal extraction module or the process discovery module to verify if certain paths have an increased probability to achieve the desired outcome, preferably by testing various forecast systems against process history and selecting the most accurate for predicting the probability of the given outcome;
  • the single elements are suitable to perform the single steps listed above, each one according to its functions.
  • the various modules and their plugins can be encapsulated in small stand-alone independent containers. Installation can be very modular and there is no particular problem in distributing such components.
  • a data processing system comprising means for carrying out the steps of the method according to the invention.
  • a computer program comprising instructions which, when the program is executed by a computer, cause the computer to carry out the steps of the method according to the invention.
  • a computer-readable storage medium comprising instructions which, when executed by a computer, cause the computer to carry out the steps of the method according to the invention.
  • the invention permits solving the problem initially stated and thus proposes a method for predicting and controlling business management processes in an efficient, fast and economic way, even considering dynamic behaviours, and this independently from sectors. It permits extracting a structured version of the implicit processes, in particular in the form of multiple BPMN diagrams, focussed on a combination of the customer static features and the suspected events that triggered the process, thus enabling the user to mark desired or undesired outcomes in the process.
  • AI systems for classification and/or forecasting are helpful, which can be sustained by human intervention to train the artificial system.
  • the continuous storing of identified and classified/modified systems in the database lets it become richer and usable (in an anonymous manner) in a shared form.
  • the start is a CRM database containing some implicit process already present and/or to be discovered.
  • a structured version of the implicit processes can be extracted, which usually is not represented by a single BPMN diagram but by a collection of BPMN diagrams according to some static features of the customer and some recurrent trigger events.
  • the process may be repeated for measuring improvements, detect critical situations and deal with them, evaluating/improving existing AI techniques, find new decisions where AI is applicable, and so on.
  • the invention involves both the development of a process discovery technique and the use of plugins for AI-driven decisions to be integrated into the process model.
  • Using the process-driven CRM logics permits increasing the return on investment, accelerating the implementations and dramatically reducing project failures.
  • the customers can use a new kind of technology, based on the business processes, breaking down the departmental silos and pushing the organization to truly work for the customer needs, without the need to split the CRM building into many areas (such as marketing, sales, services, ...) and without depending on the personal experience, character and points of view of a human being.
  • An open source platform which fully integrates an engine which may execute BPMN diagrams is Vtenext of Crmvillage.biz in Italy focusing on the integration of data and processes.
  • an effective use of the day-by-day more pervasive AI in the context of CRM by integrating it via BPMN guarantees sufficient levels of controllability, return on investment, validation, learning, privacy and independence.
  • Companies can preserve their data, working with the invention locally on their machines or via a cloud-based service where the AI can be provided with no third-party support.
  • a database with a change-log functionality stores the implicit process; and an xml description for obtaining a temporal view of the data, wherein this xml can then be specialized with user options, for instance, a user may restrict the set of events to be considered when looking for trigger events and/or user defined thresholds for data mining parameters such as support, precision, fitness, and score.
  • the method algorithm uses the specialized xml description for extracting from the database an event array which is a representation of the set of traces representing the implicit process, then the framework starts to enumerate all the combinations of static features which are sufficiently represented (i.e., above the support threshold defined by the user) in the event array; and for each combination of static features the framework filters the traces associated to customers with static features in the event array.
  • the framework selects the events with higher occurrences, again according to some user defined thresholds; for every pair (static feature, event) obtained before, a BPMN diagram is generated using for example Splitminer (an open-source process miner provided by the University of Melbourne), afterwards every BPMN diagram obtained is checked against the thresholds for the metrics fitness, precision, and score using, for example, the Markov Fitness Precision (an open-source tool provided by the University of Melbourne), and finally, if the BPMN diagram obtained successfully passes the test performed, the triple BPMN diagram/static feature/event is saved in the output database, otherwise it is discarded.
  • Splitminer an open-source process miner provided by the University of Melbourne
  • Markov Fitness Precision an open-source tool provided by the University of Melbourne
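  • The final filtering step can be sketched as below; the metrics are passed in directly rather than computed by the external mining tools, and the threshold values are illustrative.

```python
# Sketch of the threshold check: the triple BPMN diagram/static feature/event
# is saved in the output database only if every metric meets its threshold.
THRESHOLDS = {"fitness": 0.8, "precision": 0.7, "score": 0.75}  # illustrative values

output_db = []  # stand-in for the output database

def maybe_save(bpmn_xml: str, static_feature: str, event: str, metrics: dict) -> bool:
    if all(metrics.get(name, 0.0) >= value for name, value in THRESHOLDS.items()):
        output_db.append({"bpmn": bpmn_xml, "static_feature": static_feature,
                          "event": event, "metrics": metrics})
        return True
    return False   # diagram discarded

maybe_save("<bpmn .../>", "sector=retail", "ticket_opened",
           {"fitness": 0.91, "precision": 0.84, "score": 0.88})   # saved
maybe_save("<bpmn .../>", "size=large", "training_requested",
           {"fitness": 0.55, "precision": 0.80, "score": 0.60})   # discarded
print(len(output_db))  # 1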
  • every step can be realized as a distinct (Docker) container and, for example, Redis can be used as message broker.
  • the discovered diagrams may be viewed before all the computation has been terminated. It is possible to parallelize the process for additional speed up by simply replicating the containers for Splitminer and Markov Fitness Precision.
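  • One possible sketch of this parallelization through a message broker, assuming Redis lists; the queue name and payload layout are assumptions.

```python
# Producer pushes every supported (static feature, event) pair on a Redis list;
# replicated worker containers pop and process them.
import json
import redis

broker = redis.Redis(host="redis", port=6379)

def enqueue(pairs):
    """Producer side: one job per supported (static feature, event) pair."""
    for static_feature, event in pairs:
        broker.lpush("pd_jobs", json.dumps({"static_feature": static_feature,
                                            "event": event}))

def worker():
    """Worker side: each replicated container runs this loop and would invoke
    the mining and fitness/precision tools on the traces of its job."""
    while True:
        _, raw = broker.brpop("pd_jobs")
        job = json.loads(raw)
        print("mining traces for", job["static_feature"], "/", job["event"])
        # ... run process miner and metric check here ...
```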
  • advantageously, a standard representation is used for the event array (i.e., Open XES format) and the BPMN diagrams (i.e., the BPMN 2.0 specification).
  • the system permits testing several AI plugins against extracted process models, a solution that, in particular in the case of open-source AIs, is very economical and widens the range of testable models, finding the best fit to obtain optimal results.
  • the process-driven type can operate with a logic adapted to dynamic processes.
  • the solution can be applied cross-department and it does not operate on individual CRM areas but in a transversal way on customer management processes.
  • the AI engine positions the framework at the top of selection of potential customers, since no one up to now uses this technology in CRM processes.
  • the AI engine can introduce very easily the CRM process logics into the medium size companies with the process finder.
  • Fig. 1 illustrates a block diagram showing a process driven CRM system with a process discovery, classification and forecasting module.
  • Fig. 2 illustrates a flow chart of an example for a process model in form of a BPMN diagram.
  • Fig. 3 illustrates the process discovery part of a CRM system according to the invention in a computing environment.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
  • Fig. 1 illustrates a block diagram showing a process driven CRM system with a process discovery 54, a classification 62 and a forecasting module 64 and their interaction.
  • the CRM system comprises a database 50 containing explicit and implicit processes which are extracted by the temporal extraction module 52.
  • the process discovery module 54 extracts process models 56 in the form of implicit 56a and explicit processes 56b.
  • the user or an AI system selects candidate processes 58 for a further AI analysis, inside the candidate processes the user or an AI system selects desired or undesired outcomes 60, the process models with selected outcomes pass a classification 62 and optionally a forecasting managing module. Both modules work with scoring techniques and extract and modify processes 66.
  • the user or an AI algorithm can make choices of unsupervised or supervised process parts and insert warning thresholds obtaining production-ready Al-powered processes 68.
  • the processes 66, 68 are stored in the database 60 and can pass a new cycle as already described, undergoing classification and forecasting with new AI systems, or can be used directly.
  • the classification and forecasting modules, optionally, can be offered by a marketplace 67 which ideally is open source.
  • Fig. 2 illustrates a flow chart of a simplified example for a process model in form of a BPMN diagram.
  • the example encompasses all the interactions between the various modules.
  • a company is using CRM for managing the customers of a software solution.
  • the software has a community edition and the company promotes its use by seminars and conferences.
  • the BPMN diagram shown in Fig. 2 is obtained, which summarizes the common interactions between a lead and employees using CRM.
  • Following a seminar 3, a lead 1 usually downloads the community version 5 and either asks for a couple of training sessions 7 with the experts of the company or starts asking, by using the support ticket 9, for more information on how to use the software.
  • Upon a request for a training session 7, it is up to the manager to decide whether to charge the lead fully 11, to apply a discounted price 13, or to provide the training free of charge 15.
  • If a lead shows a high level of interest in the software by repeatedly asking for support 17a, a manager contacts him/her in order to check if he/she is interested in a training session 19; alternatively, the ticket, after the question has been answered, is simply closed 17b and no further ticket is opened; here the system can intervene and propose a training day 7.
  • At the end of a training period 23, the manager always asks the lead to become a customer 25 by purchasing the enterprise edition of the software.
  • the offer may be refused 21; this may already happen for the offer of a paid training day.
  • the desired outcome is clearly that of the offer accepted 27.
  • the CRM user who is managing the process simply clicks on it and marks it as desired goal.
  • the aim can be that it is desired to make the choice between full charge 11, discount price 13, and free training day 15 depending on the size of the client.
  • the classification manager module will help by simply taking the feature of the various leads (provided by the temporal extraction module) and, by testing and scoring a classifier, it detects the values to look at and drives the process to the desired outcome.
  • the user may decide whether the model proposed (the size-dependent choice) should be discarded (the decision stays in the manager’s hands without any influence from the classification outcome), be integrated in a supervised fashion (the decision remains with the manager, but the outcome, and possibly the explanation, are disclosed to the manager in order to motivate his choice), or become automatic (the decision is delegated to the process engine that applies the classification model whenever it reaches the given point and takes the course of action provided by it).
  • the forecasting module helps to find a solution to the following problem: How much time since the first ticket do we have to wait before proposing a training day in order to raise the probability that the lead will accept our offer in the end? Does the number of tickets opened matter?
  • the forecast managing module considers all the history of previous similar leads and builds a set of prediction models that are tested and scored. The winning model is then proposed to the user. The user may discard the model, use it as a simple warning/notification system that is limited to sending an alert when certain leads may be “ripe for being collected”, i.e., when the probability of success given by the model is above a certain threshold, or, when the probability is acceptable, contact the client directly with some message generator.
  • both the classification manager and the forecast manager are benchmarkers for the models, with the sole aim of selecting the best one for the specific goal; the models are retrieved from the model marketplace.
  • the process discovery part allows the user to select the CRM modules to be analyzed and, on the basis of the time series, highlights the relevant processes after defining the data structure. More generally, the data and processes are mapped at the API level to guarantee a complete interaction with internal and external artificial intelligence tools while maintaining a unique vision on the user side.
  • the invention’s approach involves: 1) simple feeding and use of data through the application interface or through data feed flows, all configurable and capable of being activated independently by the company's IT operators;
  • the risk associated with obsolescence becomes an opportunity for continuous improvement as the solution natively allows replacement of the various artificial intelligence algorithms with the most recent and performing ones without burdening the users, simply by activating alternative artificial intelligence solutions available in the marketplace.
  • the data feed takes place through the most common synchronization practices between the systems or through special APIs, and the outcome of artificial intelligence workings automatically becomes available through the interaction APIs using open standards.
  • the data supplied for AI analysis are obfuscated beforehand to guarantee the confidentiality of the information contained, thus overcoming the reticence of companies to provide such data flows externally.
  • the invention gives the possibility of using data already uploaded to the CRM platform by individuals and other application systems (e.g.: ERP (enterprise resource planning), e-commerce, HR (human resources) applications), proposes marketplace solutions to many users, and adopts a pay-per-use business model where companies can take advantage of accessible prices to use algorithms without having to face startup costs.
  • ERP enterprise resource planning
  • HR human resource
  • the AI or machine learning function can be of assistance, in particular for presale processes (classification of lead potential and automatic routing to the salesperson with the most appropriate skills); sale processes (classification of customers with the highest purchasing potential per new product/service inserted or customers with the highest churn rates); customer service processes (automatic identification of trouble tickets with associated routing towards the most suitable work teams); HR processes (identification of the most suitable personnel skills in relation to the type of activities to be carried out); logistic processes (identification of the most efficient routes in relation to the required transport type); supplier selection processes (classification of the most attractive vendors per type of product or service to purchase).
  • the invention offers access to multiple algorithms with infinite growth potential and with no risk of having to face the challenge of real competitors on the market, and makes it possible to find the best algorithm available on the market at that time.
  • the AI algorithms currently on the market are subject to very fast depreciation because algorithms offering vastly superior performance are developed on a daily basis.
  • the end user's possibility of adopting the algorithms of different vendors on the same platform at competitive prices and being able to change them in real time is not currently provided by the existing platforms.
  • FIG. 3 illustrates the implementation of the process discovery part of the CRM engine according to the invention in a computing environment.
  • the framework designed and developed provides a contribution in the field of process discovery or mining. More precisely, it classifies a company's customers, basing the analysis on their (dynamic) behavior. To do this, there is an input database 10a containing the event logs generated by the various business processes and the related XML (extensible Markup Language) metamodel 10b of the data. These are used as a starting point for the creation of a temporal database 16 containing only the information useful for the analysis of interest.
  • XML extensible Markup Language
  • the construction of the temporal database 16 is done by a Ruby import module 12, wherein certain limits 14 are imposed to make a selection.
  • the activities recorded during the year were then reconstructed in chronological order, showing the interactions between the company and the various leads. Finally, the latter are classified according to their own characteristics, also identifying the possible triggering events related to the observed behavior.
  • the framework is divided into four main sections:
  • the mining section C: it reconstructs the traces of the various types of supported leads from the event log, from which the process model and the possible trigger events are discovered;
  • the user provides a database 10a containing the event log and an entry in the table used as input called pd_mining_data.
  • pd_mining_data an entry in the table used as input.
  • the following fields are defined in the table: an id: that is an integer value that uniquely represents an instance of framework execution; xml_etl 10b that is the XML metamodel of the data describing the initial database structure and its translation into the new time database 16; xml_temp 20 that is an XML model that describes how to navigate the time database in order to extract the traces of the various types of leads; and params that is a JSON (JavaScript Object Notation) document containing the thresholds 14 for the customization of the analysis related to each instance; xml_etl 10b represents the XML metamodel used for the translation of a database containing a temporal event log.
  • JSON JavaScript Object Notation
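  • Purely as an illustration, one pd_mining_data entry with the four fields listed above could look as follows; the concrete values and thresholds are placeholders.

```python
# Illustrative content of one pd_mining_data row (id, xml_etl, xml_temp, params).
import json

pd_mining_data_row = {
    "id": 1,                                   # unique id of this framework execution
    "xml_etl": "<etl> ... </etl>",             # XML metamodel 10b: source DB -> time DB 16
    "xml_temp": "<temporal> ... </temporal>",  # XML model 20: how to navigate the time DB
    "params": json.dumps({                     # JSON document with the thresholds 14
        "support": 0.05,
        "precision": 0.8,
        "fitness": 0.8,
        "score": 0.8,
    }),
}
print(json.loads(pd_mining_data_row["params"])["support"])
```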
  • master_entity, containing the following properties: name: name of the master entity; source_relation: table in the source database; source_key: key in the source database; target_relation: table in the time database; target_key: key in the time database; target_back_key: integer value corresponding to the related key of the entry in the source database; changelog: reports all changes related to the main entity and its temporal properties. It contains the following mandatory fields: relation: corresponding table; external_key: identifier of the entity to which the change refers; field: field containing the change information; time: corresponding timestamp.
  • the changes can be found in the description field of the changelog table and refer to the main entity via a parent_id field. They can be divided into entities which were generated from an external element and entities not generated from an external element. There are fields in the source database and target fields in the time database which can have a value of 1 or 0.
  • the lifespan corresponds to the temporal interval wherein the entity interacts with the company. It contains a field corresponding to the initial moment of validity of the entity in the source database; a field corresponding to the initial moment of validity of the entity in the time database; a field corresponding to the final moment of validity of the entity in the source database; and a field corresponding to the final moment of validity of the entity in the time database.
  • Static_features are a list of aspects specific to the entity to which they relate that do not undergo changes over time, specified by the static_feature tag.
  • a static feature is not present in the database source, but only in the target field.
  • External entities generating the master entity contain a field for a parent condition, which indicates whether the entity's parent is an external element or not.
  • Each lead is connected with the internal resource that generated it.
  • Source root tags specify the fields containing the information related to this resource, while the tags with root target represent their translation into the time database.
  • join_relation tag represents the table that relates each lead with the internal resource that generated it.
  • Source root tags specify the fields containing the information related to this resource, while the tags with root target represent their translation into the time database. There is a mapping between the various source_fields and target_fields by matching the value of the order attribute.
  • tags and related fields: field of the master entity in the source database; table in the time database; key of the master entity; corresponding field in the time database; field corresponding to the initial moment of entity validity in the time database; field corresponding to the final moment of validity of the entity in the time database.
  • temporal_features: properties whose instances, varying over time, are associated with the reference entity with a time interval, specified by the temporal_feature tag.
  • Each of these contains the following mandatory fields: a field of the master entity in the source database; a target field that contains the tags to specify the information in the temporal database with a table containing the temporal feature variations, in the time database for the various master entities; a master entity key; the instance of the temporal feature in the temporal database; a field corresponding to the initial moment of the validity of the temporal feature in the temporal database; a field corresponding to the final moment of the validity of the temporal feature in the temporal database.
  • temporal_features not present in the source database.
  • Each of these has only the target tag, which in turn contains the following fields: a table containing the variations of the additional temporal feature, in the time database for the various master entities; a master entity key; instance of the additional temporal feature in the temporal database; a field corresponding to the initial moment of the validity of the additional temporal feature, in the temporal database; field corresponding to the final moment of validity of the temporal additional feature, in the time database.
  • temporal_external_ownerships, which describe temporal relationships of belonging of one entity to another, specified by the temporal_external_ownership tag.
  • Each of these contains the following mandatory fields: a table containing the owner entity in the source database; a key of the above table; a field containing the first owner entity in the source database; a field containing the last owner entity in the source database; a field corresponding to the initial moment of the validity of the ownership relationship in the source database; a field corresponding to the final moment of the range of ownership in the source database; fields of interest in the above table.
  • Each of these is identified by the source_field tag and the order attribute.
  • the target field contains the tags to specify the information in the time database with the following fields: table containing the owner entity in the time database; a key of the above table; an integer value corresponding to the related key of the entry in the source database; fields of interest in the above table.
  • each of these is identified by the target_field tag and the order attribute that must be equal to the corresponding attribute in the source fields.
  • the join_relation tag represents the table that relates each lead with all the internal resources in charge of managing it (with the relative time intervals).
  • Source root tags specify the fields containing the information related to the resources, while the target root tags represent the translation in the temporal database. Also here there is a mapping between the various source_fields and the target_field by matching the value of the order attribute.
  • the target field contains the tags to specify the information in the time database and contains a table containing the event in the time database; a key of the above table; an integer value corresponding to the relevant key of the entry in the source database; fields of interest in the above table.
  • Each of these is identified by the target_field tag and the order attribute that is equal to the corresponding attribute in the source fields.
  • One field indicates whether the event was generated by an external element or not.
  • the field needs only to be specified if there is a relationship between the event and the internal entity that generated it. It contains the following mandatory fields: corresponding field in the source database; corresponding field in the time database; internal entity that generated the event; fields for the true and false values.
  • Internal fields comprise internal entities that generate the event. The tag must be specified only if there is a relationship between events and the internal entities that generate them, specified using the external_entity tag.
  • Each of these contains the following mandatory fields: a field that indicates if the event parent is an external element or not. In the latter case the following fields identify the external entity: a table containing the internal entity in the source database; a key of the above table; fields of interest in the above table.
  • a target field contains the tags to specify the information in the time database with the following fields: a table containing the internal entity in the time database; a key of the above table; an integer value corresponding to the related key of the entry in the source database; fields of interest in the above table.
  • Each of these is identified by the target_field tag and the order attribute which must be equal to the corresponding attribute in the source fields.
  • the relationship in the time database that matches the event and the internal entity. It contains the following mandatory fields: relation: name of the table; event key; key of the internal entity; event timestamp.
  • the example shows the translation of the ticket comments table into comments and its related fields.
  • the externally_created tag defines the translation of the ownertype field in the Boolean externally_created one following the logic of value_true/false/default.
  • For comments generated by an internal resource, the analysis of the external_entity tags proceeds as for the master entity.
  • the join_relation related to the ticket comments table defines the relationship between the event and the entity to which it refers (ticket) while, that relating to comment_owners represents the relationship between the event and the internal resource that generated it.
  • Additional_events, with a related tag which specifies, in related fields, the tags for the information in the time database: a table containing the instances of the additional event, in the time database, for the various master entities; the table key; the target_field tags corresponding to the fields of interest in the event; the timestamp corresponding to the time at which the additional event occurred; the relationship in the time database that matches master entity and additional event, with the name of the table; the master entity key; the key of the additional event; the timestamp of the additional event.
  • the temporal entities in the tree are linked to the parent entity by an interval temporal_entity tag, specified by the temporal_entity tag. Each of them contains the following fields: table in the source database; key in the source database; key of the master entity to which it refers.
  • the tag must be returned to the master entity representing the key of the temporal entity; a table in the temporal database; key in the temporal database; integer value corresponding to the relative key of the entry in the source database; a field that indicates whether the entity was generated from an external element or not (This tag is handled like the homonym in the master entity.); time interval in which the entity interacts with the company (This tag is handled like the homonym in the master entity.); a relationship in the time database that matches the master entity and temporal entity, with the following mandatory fields: name of the table; master entity key; temporal entity key.
  • a static features tag is optional and managed like the homonym in the master entity.
  • Temporal_entities are handled exactly like the master entity, while still maintaining the link with it via the source_field and the join_relation related to the lead_tickets relationship. An illustrative fragment of such a mapping is sketched below.
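The bullets above describe the xml_etl mapping only in prose. The following fragment is a hedged sketch, not the actual metamodel: tag nesting, attribute names and literal values that are not explicitly cited in the text (e.g. the order, table and key attributes as written here) are assumptions.

    <event name="Ticket Comments">
      <source_fields>
        <source_field order="1" table="ticket_comments" key="commentid">text</source_field>
      </source_fields>
      <target_fields>
        <target_field order="1" table="comments" key="comment_id" source_key="commentid">text</target_field>
      </target_fields>
      <externally_created source_field="ownertype" target_field="externally_created"
                          value_true="customer" value_false="user" default="false"/>
      <join_relation relation="ticket_comments" event_key="comment_id" master_key="ticket_id"
                     timestamp="createdtime"/>
      <join_relation relation="comment_owners" event_key="comment_id" internal_entity_key="user_id"
                     timestamp="createdtime"/>
    </event>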
  • the model xml_temp 20 is defined to allow the navigation of the time database 16 in order to extract the information of interest for the reconstruction of the tracks related to the various types of leads.
  • One of the main aspects is its generality, since it can be applied to any context that makes use of a database containing changelogs related to the attributes and temporal entities.
  • the master_entity properties to specify are static features, temporal features, events, ownerships and temporal entities. Having previously described these concepts, only the tags contained within them, with an explanation, are shown below.
  • This model 20 has a tree structure with the selected main entity (master_entity) as root, containing the following information: name of the master entity; table in the time database; key in the time database.
  • the static features are specified by the static_feature tag and each one of these contains the following mandatory fields: name of the static feature; corresponding field in the time database.
  • the external_entities are specified using the external_entity tag and each of these contains the following mandatory fields: a parent_condition field indicating whether the entity's parent is an external element or not. In the latter case, the following fields identify the external entity: name of the external entity; a table containing the external entity in the time database; key of the above table; relation in the temporal database that matches master entity and external entity, with fields for the name of the table; external entity key; master entity key.
  • the static features are the static properties of the external entity.
  • the static_feature tag contains the following mandatory fields: name of the static property; field containing the static property; a nil_value as default value; a parent_feature that represents an alternative static property of the external entity and contains the following mandatory fields: name of the static property; field containing the static property; and the default value.
  • the join_relation tag represents the table that relates each lead with the internal resource that generated it.
  • the static_features tag specifies the static attributes of the external entity.
  • the classifications are specified using the classification tag and each of these contains the following mandatory fields: name of the classification; a table that keeps track of the classifications of each master entity, in the time database; a master entity key; a field containing the value of the classification in the time database.
  • the Lead Converted? classification of the lead lead_id is located in the converted field of the lead_converted table.
  • Temporal features are specified via the temporal_feature tag and each of these contains the following mandatory fields: name of the temporal feature; table containing the temporal feature variations for each master entity; master entity key; instance of the temporal feature.
  • the Business Unit variation of the lead_id is in the lead_business_unit field of the lead_business_unit table.
  • Each temporal_external_ownership tag contains the following mandatory fields: name of the owner entity; table that keeps track of owner changes for each master entity, in the time database; key to the table above; relationship in the temporal database that matches master entity and owner entity. It contains the following mandatory fields: name of the table; owner entity key; master entity key.
  • the static_features are the static properties of the owner entity. The following are specified through the static_feature tag and each of them contains the following mandatory fields: name of the static property; field containing the static property; the parent feature represents an alternative static property of the owner entity and contains the following mandatory fields: name of the static property; field containing the static property.
  • the join relation tag represents the table that relates each lead with the internal resources to which it is assigned.
  • the static_features tag specifies the static attributes of the owner entity.
  • Events are specified using the event tag and contain the following mandatory fields: name of the event; table containing the event in the time database; key of the above table; relationship in the temporal database that matches the master entity and event. It contains the following mandatory fields: name of the table; event key; master entity key.
  • the static_features are the static properties of the event. These are specified through the static_feature tag and each of them contains the following fields: name of the static property; field containing the static property; a parent_feature that represents an alternative static property of the event and contains the following mandatory fields: name of the static property; field containing the static property.
  • External_entities are specified using the external_entity tag and each of these contains the following mandatory fields: the parent_condition field, indicating whether the event's parent is an external element or not.
  • the following fields identify the external entity: name of the external entity; table containing the external entity in the time database; key of the above table; relation in the temporal database that matches the event and external entity. It contains the following mandatory fields: name of the table; external entity key; (parent) event key.
  • the static_features are the static properties of the external entity. They are specified by the static_feature tag and each of them contains the following mandatory fields: name of the static property; field containing the static property; default value;
  • The parent_feature represents an alternative static property of the external entity and contains the following mandatory fields: name of the static property; field containing the static property; default value.
  • the example indicates that the comments table contains the events of the type Ticket Comments that have two types of static attributes: the text or its classification (parent_feature).
  • the join_relation related to the ticket_comments table defines the relationship between the event and the entity to which it refers (the ticket), while the one relating to comment_owners represents the relationship between the event and the internal resource that generated it.
  • the temporal_entities are specified using the temporal_entity tag, and can possess in turn the characteristics just described. Each of them contains the following mandatory fields: name of the temporal entity; table in the time database; key of the above table; relation in the temporal database that matches the master entity and temporal entity. It contains the following mandatory fields: name of the table; temporal entity key; master entity key. The static_features tag is optional and handled like the homonym in the master entity. The same holds for the temporal_features tag, the external_entities tag, the temporal_external_ownerships tag and the events tag. In the example the temporal_entities are handled exactly like the master entity, while still maintaining the connection with it through the join_relation related to the lead_tickets relationship.
  • a generalized data model is used, and one of its features is its customization, i.e. the user has the ability to act on attributes, both static and temporal, specifying hierarchies and aliases, excluding the source of the relevant changes and indicating whether or not to take into account the order number for the temporal entities. Through these customizations it is possible to highlight or hide certain aspects in the final process model.
  • with hierarchies on static attributes this data model allows the user to define attributes at different levels of detail. There are, for example, two specified ones, i.e. the name and the role. To indicate the level of detail to be used in practice (in this case the role), the enabled="true" attribute must be used in one of the parent_features. An illustrative fragment of the xml_temp model is sketched below.
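The xml_temp model 20 is described above only field by field; the fragment below is a hedged sketch of how such a tree could look for the lead example. Element nesting, attribute names, table names such as leads and tickets, and the "Assigned User" feature are assumptions, while the cited names (lead_converted, converted, lead_business_unit, lead_tickets, comments, ticket_comments) come from the text.

    <master_entity name="lead" table="leads" key="lead_id">
      <static_features>
        <static_feature name="Assigned User Name" field="user_name">
          <parent_feature name="Assigned User Role" field="user_role" enabled="true"/>
        </static_feature>
      </static_features>
      <classifications>
        <classification name="Lead Converted?" table="lead_converted" master_key="lead_id" field="converted"/>
      </classifications>
      <temporal_features>
        <temporal_feature name="Business Unit" table="lead_business_unit" master_key="lead_id" field="lead_business_unit"/>
      </temporal_features>
      <events>
        <event name="Ticket Comments" table="comments" key="comment_id">
          <join_relation relation="ticket_comments" event_key="comment_id" master_key="ticket_id"/>
        </event>
      </events>
      <temporal_entities>
        <temporal_entity name="ticket" table="tickets" key="ticket_id">
          <join_relation relation="lead_tickets" temporal_key="ticket_id" master_key="lead_id"/>
        </temporal_entity>
      </temporal_entities>
    </master_entity>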
  • params contains the thresholds defined by the user, used by the framework to accept or not the discovered process templates.
  • the parameters are divided into four groups (distributed as an input during the import and mining phases B, C) and defined within a JSON document:
  • Thresholds 30 define the minimum (min) and maximum (max) thresholds related to the metrics that evaluate the process in terms of fitness, precision and fscore;
  • supports 22 represents the minimum thresholds for which a lead type with certain static attributes is considered supported by the log of events (min_support_features) and for which a certain event is supported by the number of tracks that contain it and can be considered a possible trigger event (min_support_events);
  • limits 14 allows defining the date from which to start the extraction of the main entities in the import phase (limit_date), the time order to follow (flag: "up/down") and the number of entities to extract (limit_process). In case such limits are not desired, the first two parameters must be set to "-1"; (4) splitminer 28 defines the following parameters to customize the process templates returned as output by the process discovery tool, Split Miner:
  • percentile for the frequency threshold, which varies in the range [0; 1]. It is calculated on the frequencies of the most frequent input and output arcs of each node, and retains only those arcs with a frequency that exceeds the indicated threshold. An illustrative JSON params document is sketched below.
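The params JSON document is only described in prose above; the following is a hedged sketch of what it could look like. The group and parameter names are those cited in the text, while the numeric values, the date format and the exact key layout are assumptions.

    {
      "thresholds": { "epsilon_fitness": 0.7, "epsilon_precision": 0.6, "epsilon_fscore": 0.65 },
      "supports":   { "min_support_features": 10, "min_support_events": 5 },
      "limits":     { "limit_date": "2019-01-01", "flag": "up", "limit_process": 500 },
      "splitminer": { "percentile": 0.4 }
    }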
  • the framework takes care of analyzing the tables contained in the initial database, navigating it through the xml_etl document 20.
  • the goal is to extract from them only the necessary information, organising it within a new temporal database 16 containing information relating to time. More precisely, it allows for storage of:
  • the public schema of the new database is populated by creating the various tables specified in the xml_etl document 10b. Subsequently, the table relating to the master entity within the original database 10a is taken into account and, at the same time, the user-defined limits 14 in the params field are extracted. In particular, limit_date indicates the date from which to start the extraction, a flag specifies the time order to follow and limits 14 the number of entities to extract. For each selected entry, the changes reported in the changelog register and all other entities related to it are extracted. These data are then organized within the new temporal database 16, always following the structure imposed by the xml_etl metamodel 10b.
  • the mining phase C is the phase of the event-array generation, that is, of an array of tracks (event-array), each corresponding to a master entity and containing in turn the events extracted from the database.
  • a Ruby library 18 is implemented that takes care of scrolling the tree structure of the xml_temp model 20, querying the database 16 step by step, in order to build in an incremental way the strings related to the recorded events.
  • each of the produced strings is associated to the relative timestamp to be then ordered temporally, thus creating the tracks that compose the event-array.
  • Each track is also preceded by the identifier of the entity that generated it (Table 1).
  • the leads are divided into clusters.
  • the structure of the event-array corresponds to a vector that puts in relation each master entity (corresponding to a track) with an additional array, containing in turn the following elements: all events belonging to the track identified by the lead in question and the timestamp associated with each event, which indicates the start time, or the exact time it occurred (in the case of messages and comments).
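To make the event-array structure just described more concrete, the following is a minimal Ruby sketch of how it could be assembled. The flattened lead_events view and its column names are assumptions: the real library 18 walks the xml_temp model 20 to decide which tables of the temporal database 16 to query.

    require 'pg'

    def build_event_array(conn)
      event_array = Hash.new { |h, k| h[k] = [] }    # master entity id => track
      conn.exec('SELECT lead_id, event_label, event_timestamp FROM lead_events').each do |row|
        event_array[row['lead_id']] << [row['event_label'], row['event_timestamp']]
      end
      # order each track temporally, as described above
      event_array.each_value { |track| track.sort_by! { |_, ts| ts } }
      event_array
    end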
  • each one generates a set of higher cardinality by associating it with all possible instances of another attribute, thus creating every permissible combination.
  • This step is performed recursively until maximum cardinality of the sets is achieved, which corresponds to the number of static features of the master entity. Note that attribute sets that during the analysis do not appear to be supported, are not further expanded in order to avoid an unnecessary waste of time and resources.
  • Each generated set of attributes corresponds to a cluster of entities, i.e. a group of main entities that share the same instances of the attributes that compose the set. Table 3:
  • the lead identifiers that match the attributes of the set are identified and, given the event array, only the corresponding tracks are extracted.
  • the following step takes care of recursively analyzing each of the tracks previously selected. The objective of this step is to extract from each of these traces all the events that compose them, bringing them back into a new table associated with the corresponding set. For each extracted event it is verified whether it is supported by the set of tracks present in the event array.
  • This analysis consists of counting the number of distinct tracks in which the event in question appears, verifying that it exceeds the threshold called epsilon_events (selected by the user). Thereby it is possible to customize the minimum degree of support required for what could be the trigger events for the extracted behaviors.
  • Table 4 shows the feature_event_sets.
  • the first step consists in scrolling through all the pairs (set, event) inside the feature_event_sets table, in order to search for the first occurrence of such an event within the tracks belonging to the leads selected through the constraints represented by the set of attributes. A sketch of this support check and first-occurrence scan is given below.
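The following Ruby sketch illustrates the support check against the user-selected epsilon_events threshold and the search for the first occurrence of each candidate trigger event. It is only a hedged in-memory approximation: the real data live in the feature_event_sets table, and the threshold value used here is an assumption.

    EPSILON_EVENTS = 5  # user-selected threshold (assumed value)

    def supported_trigger_events(feature_sets, event_array)
      results = []
      feature_sets.each do |attribute_set, lead_ids|
        tracks = event_array.values_at(*lead_ids).compact
        # count the distinct tracks in which each event appears
        support = Hash.new(0)
        tracks.each { |track| track.map(&:first).uniq.each { |ev| support[ev] += 1 } }
        support.each do |event, count|
          next if count < EPSILON_EVENTS
          # first occurrence of the candidate trigger event among the selected tracks
          first = tracks.flat_map { |t| t.select { |ev, _| ev == event } }.min_by { |_, ts| ts }
          results << { set: attribute_set, event: event, first_occurrence: first && first.last }
        end
      end
      results
    end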
  • the first step is the creation, for example, of a Docker application, represented by a virtual network of containers. This network has some characteristics common to a black box, since outside it will show only the input and output relations.
  • This application contains within it four services, organized among them and represented by the corresponding containers: Jupyter, Postgres, Redis and Sinatra. These containers are able to communicate with each other within the network through the port mapping defined in the docker-compose.yml (an illustrative sketch is given below).
  • the main container Jupyter communicates with Postgres to get access to the databases with read and write permissions.
  • it also communicates with the two process mining tools 28 by means of a Redis database (not represented) contained in a corresponding container.
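As an illustration only, a docker-compose.yml for the four containers mentioned above could look like the sketch below; image names, ports and credentials are assumptions and not taken from the text.

    version: "3"
    services:
      jupyter:
        image: jupyter/base-notebook
        ports: ["8888:8888"]
        depends_on: [postgres, redis, sinatra]
      postgres:
        image: postgres:12
        environment:
          POSTGRES_PASSWORD: example
        ports: ["5432:5432"]
      redis:
        image: redis:6
        ports: ["6379:6379"]
      sinatra:
        build: ./sinatra     # wraps Split Miner and MFP as web services
        ports: ["4567:4567"]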
  • a web server is created in Ruby 26 through the Sinatra 28 gem.
  • This gem was chosen for its intuitiveness in the creation of a minimal web server that allows you to use the Split Miner and Markovian Fitness and Precision (MFP) as web services 28.
  • Sinatra 28 was chosen since no complex web server is needed, but only a server that simply writes and starts commands on the terminal of the underlying machine and reports the results to the calling container. In addition, it is important to use a minimal web server as it is started on the same machine where Split Miner is present, which, during the process discovery phases, makes use of most of the available resources. A hedged sketch of such a server is given below.
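The following sketch of sinatra-main.rb shows how the two tools could be exposed as web services; the endpoint names, file names and the Split Miner / MFP command lines are hypothetical, since the actual invocations are not given in the text.

    require 'sinatra'

    set :bind, '0.0.0.0'   # reachable from the other containers of the network

    post '/splitminer' do
      File.write('input.xes', request.body.read)          # event log in XES format
      # hypothetical invocation of Split Miner on the underlying machine
      system('java', '-jar', 'splitminer.jar', 'input.xes', 'model')
      send_file 'model.bpmn', type: 'application/xml'
    end

    post '/mfp' do
      # hypothetical invocation of the Markovian Fitness and Precision tool
      system('java', '-jar', 'mfp.jar', 'input.xes', 'model.bpmn')
      File.read('metrics.json')                            # report metrics back
    end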
  • Another step in the mining section C is the process discovery.
  • the Jupyter container communicates with the Redis container by saving an event log in XES format.
  • the Sinatra container 28 extracts the document and passes it as input to the Split Miner tool, running inside the physical machine.
  • This last one, through process discovery techniques, is able to extract the process model corresponding to the XES document.
  • the XES document and the process model will be saved on the web server with the purpose of being used in the next step as input for the second tool.
  • the system proceeds with the extraction of the metrics to assess the actual quality of the model obtained. This procedure is performed using the Markovian Fitness and Precision (MFP) tool.
  • the Jupyter container, as soon as it receives the BPMN process generated by Split Miner, communicates again with the Sinatra container 28, requesting the calculation of the metrics related to the event log generated in the previous step and the extracted process model.
  • the server takes care of passing the two documents as input to the Markovian Fitness and Precision tool, also present inside the physical machine. The latter, applying conformance checking techniques, is able to provide the metrics that describe, in terms of fitness, precision and fscore, the quality of the model compared to the original event log.
  • the metrics obtained are then compared with the following thresholds 30: epsilon_fitness, epsilon_precision and epsilon_fscore (user-defined), with the aim of establishing when a process model can be considered as actually conforming to the event log from which it was created. A sketch of this acceptance test is given below.
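As a hedged sketch only, the acceptance test against the user-defined thresholds could be expressed as follows; the metric values and the structure of the two hashes are assumptions.

    def conforming?(metrics, thresholds)
      metrics[:fitness]   >= thresholds['epsilon_fitness'] &&
        metrics[:precision] >= thresholds['epsilon_precision'] &&
        metrics[:fscore]    >= thresholds['epsilon_fscore']
    end

    metrics    = { fitness: 0.82, precision: 0.71, fscore: 0.76 }   # illustrative values
    thresholds = { 'epsilon_fitness' => 0.7, 'epsilon_precision' => 0.6, 'epsilon_fscore' => 0.65 }
    puts conforming?(metrics, thresholds)   # => true, so the model is written to the output table 32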
  • the results obtained as a result of using the two process mining tools just mentioned are saved inside the output table 32, in the database, thus switching to the output section D.
  • Each instance of the framework execution produces as output a series of entries in the same output table, within the related database.
  • id: full identifier of each entry in the table; set: set of instances of static attributes that identify a type of master entity; event: event triggering the behaviours recorded for that type of entity; xes: event log reported in XES format; bpmn: process model extracted from the event log, reported in BPMN format; bpmn_translated: the same process model just mentioned, with some changes to the task labels in order to improve readability for the final user; metrics: metrics that describe the quality, in terms of fitness, precision and fscore, of the process model extracted from the event log; n_events: integer number representing the number of events present within the extracted process model; n_task: integer number representing the number of tasks/activities inside the extracted process model; n_gateways: integer number representing the number of gateways inside the extracted process model.
  • any of the three phases can only be started after having completed at least once all the ones before it.
  • the commands can be started from the command line only after having accessed the container containing the (running) Sinatra server, which can be started by means of the command ruby sinatra-main.rb from the directory that contains it.
  • the first step is to generate the time database 16 related to the identifier and the module for which it is desired to perform process mining.
  • This database will contain only the (empty) public (populated by the next phase) and mining (used in the last phase) schemes.
  • the name of the new database will be of the type "temporal_<id>_<module>" and, in case it is already present, it will be regenerated by eliminating its contents.
  • the second part deals with the import phase, described in detail above, creating, in the public scheme, the tables with the respective fields specified by the user in the xml_etl metamodel 10b. Finally, each of these tables is populated with the data extracted (and timed) from the source database 10a.
  • This last part deals with the mining phase, i.e. the extraction of tracks, the division of leads into clusters, the extraction of possible trigger events and the discovery (and evaluation) of process models. It then deals with populating the final output 32 table, making it available to the end user for their own analysis.
  • the machine learning technology starts with an input of structured and/or unstructured data. These data must be of good quality: according to the "garbage in, garbage out" logic, if the algorithms are trained with incomplete or incorrect data, too many errors will be generated for the results to be usable. The main problem is that these quality data are typically owned by the user companies (suppliers of products and services).
  • the user interface according to the invention allows end users to easily use algorithms from a marketplace and implement them in their organization. It has been explained above how the selection, testing and production process of the different algorithms chosen by the end user for their own purposes and business processes takes place. As explained before, AI algorithms on the market today are subject to very fast deterioration, since better-performing algorithms are invented every day.
  • the machine learning system aligns the appropriate type of learning system, for example in a supervised or unsupervised manner. At the end there is the data output in form of prediction, classification or in an exploratory manner.
  • testing a single AI algorithm for a given process implies license, expert consulting and even implementation costs (i.e., making a system talk with the API of the algorithm) as well as time consumption, without the assurance of certain results; small-medium companies will be driven away by the excessive risk of implementing it in their day-to-day activities.
  • the invention's solution is a framework for integrating AI plugins over the CRM software, preferably with an extensible and full-featured BPMN engine at its core.
  • the extension of this CRM machine offered by the invention adds the following main modules: a temporal extraction module that is responsible for turning all the data collected by the CRM system into a temporal database representation that is easy to manage for any kind of analysis to perform.
  • a process discovery module that allows to discover implicit processes relative to various entities from the data produced by the temporal extraction module. All the processes discovered are already integrated in CRM and may be modified, executed and integrated with the present and future plugins described in the classification manager module section below.
  • a forecast manager module helps to verify if certain paths in the process may increase the probability of the desired outcome.
  • the data thus imported/synchronized and fed from and to the outside can be used to instruct result classifiers.
  • the user defines an information silo and indicates which fields of a given object are those containing the information to be automatically catalogued, and indicates a field as a value to be predicted by artificial intelligence training.
  • the classifier is ready to be used through a properly configured BPMN process.
  • it will also be possible to produce other trainings for the same classifier as better-performing and improved artificial intelligence algorithms are periodically added; in case of increased reliability in the prediction, these will replace the existing classifiers for that particular service or data silo.
  • process discovery module allows to define agents to analyze the historical log of system interactions. Also in this case the interface used allows to select a data set, a particular silo of information, which under artificial intelligence can produce a series of processes in BPMN format and then display the operational flows and the lead time of the events themselves. This is done through the automatic transformation of all the data present in the system into an anonymous time database.
  • the processes highlighted can be used to analyze and improve the company's operational flow performance and can also be activated directly from the interface, i.e. they can be the starting point for the improvement of operational activities.
  • the transformation flow of data into a time database has been better specified before.
  • the CRM backoffice allows to activate artificial intelligence plugins (classifier and forecaster) and others that will be available through the dedicated marketplace in a simplified way.
  • the transformation, preparation, delivery and orchestration of data and flows can be made available through a simple configuration interface.
  • Each new artificial intelligence plugin connected to the CRM service is then modified or engineered in order to provide plug and play intelligence.
  • the user only has to worry about correctly importing the data into the CRM interface in use, activating the pre-configured agents according to his needs and activating the related automated flows that these automatisms will manage.
  • the computer-implemented method and related products may be subject to further modifications or variants not described. Should such modifications or variants fall within the scope of the following claims, they shall all be deemed to be protected by this patent.
  • the components used, as well as the dimensions, numbers and shapes as long as they are compatible with the specific use and function and unless otherwise specified, may be any, depending on the needs.
  • all details can be replaced by other technically equivalent elements.

Abstract

The invention relates to a CRM method and system that automatically discovers hidden and consolidated customer business processes based on the flow of events inside a company, helping to improve internal and customer processes, for example in a BPMN format. The tool can also automatically classify items to speed up the flow start and object categorization and to give decision support, by artificial-intelligence-aided classification (62) and optionally forecasting modules (64). The invention improves business proceedings in a way that mental work would not be able to achieve in reasonable time or in a reliable manner, the proposed solution avoiding human or interpretation errors through a technical implementation of different modules, without the need for a data scientist. The invention furnishes a platform that permits an easy comparison of AI systems without an extensive/expensive phase of trial and error, so that the AI system which best fits the CRM needs can be found.

Description

“PLATFORM AND METHOD FOR PLUGGABLE AI AND MACHINE LEARNING
CATALOGUING AND PREDICTION”
* * * * *
TECHNICAL FIELD The invention concerns a computer-implemented method for operating a system for customer relationship management and related systems involving artificial intelligence systems to classify and predict business process outcomes.
BACKGROUND ART
In recent years, the role of data shifting changed from being the raw material for performing analysis to a strategic asset for many companies. In this context Customer Relationship Management Software (CRM) like vtenext (https://www.vtenext.com; https://sourceforge.net/projects/vtecrm/) are of growing importance. CRM solutions are often intensively used as ticketing systems wherein its “history”, which comprises all the relative employee-employee and employee-customer interactions, has to be precisely tracked from the moment the ticket enters the system to the moment it is closed. Often CRM products are able to collect data and manage data coming from various sources in an IoT (Internet of Things) fashion. Then the CRM software no longer manages just human interactions but implements more and more automated, semi-automated, or supervised processes. The correct executions of such processes usually needs the correct integration of heterogeneous sources of data. In such integration, data quality is crucial but often obstacled by errors, missing values, which need (automatic) corrections. CRM solutions must be able to generate periodic reports, interactive dashboards and complex analysis often based on OLAP (Online Analytical Processing) cubes. Up to now data are seen as a mere output for a posterior analysis leading to a very limitative view of the application of CRM solutions. Many decisions in a certain stage of process are human made by simply taking a look at some report, analysis, or market study trying to make a decision made on a prediction of the possible outcomes. Some of these predictions are inherently complex and require sophisticated ad-hoc analysis involving, for instance, ever-changing corporate policies or the current socio-cultural state of the countries in which the company operates. For such decisions human intervention are still mandatory but there are many human decisions which are quite trivial like finding the right person for managing an incoming ticket according to certain basic metrics. For such decisions the form of the current input (e.g., the type of request contained in the ticket) and the statistics on the available data (e.g., the history of similar tickets) suffices to address most of them. The outcome of such decision has a structured form and it is promptly actionable (e.g., notify the fittest employee for taking care of the ticket in its current state). These kinds of decisions are ground decisions which usually may be taken by implementing Artificial Intelligence (AI) techniques but these AI systems are not fully satisfying in predicting the possible outcomes for every choice. Usually, this scenario, where the decision is validated by a human before it is taken, is referred as a semi-automatic decision or supervised decision depending on how the human intervention is required (i.e., it is semi-automatic when everything must be validated by a human, it is supervised when only “critical” cases are evaluated by a human). But this way of data driven analysis must be accelerated and error-reduced, and this in an economic way, not constraining a company to invest much money in experts or softwares which afterwards turn out not to fit the companies’ needs. 
A problem arises from the fact that, not least due to different classes of customers, static features (e.g., annual revenue, type of products the customer is selling, functions of the CRM the customer uses, and so on) are complicated by dynamic features (e.g., internal customer policies) which can hardly be represented. In such cases companies prefer to establish simple guidelines for the actors involved, leaving it to the opacity of the employees to work as a team for reaching the common goal. This approach works, but there are some important facts that may undermine its optimality in terms of costs and time. It is difficult to have a precise snapshot of how things are going, since guidelines usually have the form of an incomplete, simple set of constraints on the workflow (e.g., "activity A must be taken only if activity B has been completed successfully"), but they do not represent it in its entirety. For this reason, if at some point one wants to know if some activity that may be executed will effectively be executed in the future, one has to contact the person who is responsible for it in the current instance and it must be checked if this person is delaying the activity for some reason: it is difficult to decide if the instance is stuck in some loophole or if it is simply slower due to scarcity of resources or to problems with the customer.
Even under the optimistic assumption that all the people involved will behave correctly, their different personal interpretations of how the guideline has to be applied may result in unbearable inefficiencies and negative outcomes, and rely too much on the skills of the individuals involved. It is difficult and time-consuming to detect structural errors in the guideline or in its application. Even if it is possible to query the whole set of "bad instances" through the data stored in the CRM system in order to find similarities, like recurrent events or unnecessary delays, it is difficult to establish whether the problem belongs to one or a combination of the following error classes: a systematic employee error due to misunderstanding or errors in the separation of duties, excessive workload, structural inefficiencies in the guideline which requires too many validations or allows the procedure to be oversimplified by ignoring certain mandatory passages leading to poor results or the execution of expensive recovery procedures, insufficient resources to implement the guidelines for obtaining the desired throughput, and external interactions like erratic behaviour of some customer and/or some third party supplier.
DISCLOSURE OF THE INVENTION
The scope of the present invention is to propose a CRM method and related systems that overcome the above drawbacks and guarantee an economic, fast and reliable way to help companies guide business processes to a positive outcome, adapting computing methods and systems such that they minimize errors and need as little human intervention or individual skill as possible. A further problem to be solved by the invention is to offer a system that substitutes vague and incomplete guidelines with an automatic, method- and system-based solution that permits the technical implementation of artificial intelligence while avoiding human errors.
The problem is solved by a computer-implemented method for operating a system for customer relationship management (CRM) comprising the following steps:
(I) an input phase comprising the preparation of a CRM database containing static features and logs of events generated by company processes and optionally external processes;
(II) an import phase comprising the translation of said database into a corresponding temporal database;
(III) a process preparation phase wherein
(III-i) at least one implicit process is discovered in a process discovery phase mining said temporal database and expressing the discovered processes as explicit process models in an event array and/or
(III-ii) at least one explicit process is selected from defined process models already present in said temporal database;
(IV) classification of said selected and/or discovered process models and/or extracting of suitable process models and optionally modifying said process models by an artificial intelligence system and listing of said classified and/or extracted and/or modified process models in an output phase and storing them in said database;
(V) execution of at least one stored process selected among the classified and/or extracted and/or modified process models.
The preparation steps (I) and (II) are executed by a temporal extraction module that is responsible for transforming all the data collected by the CRM system into a temporal database representation. Such representation can form the input of the AI algorithms and every subsequent analysis to be performed.
The core of the process is the process discovery part that permits a process-driven management of business activities and decision making. The process-based CRM system is more accurate than a solely data-based system, which needs a high degree of expert intervention. From a technical point of view, the challenge does not (only) arise in terms of interaction as it can be easily overcome by using standard communication logic between systems but rather in ensuring that the current approach in terms of prediction model is able to adapt independently to the data of companies operating in different sectors. A great advantage of the system is the fact that the generated process models can be based on data already available to the company, already stored in the company's database. It follows that the simple import of data present in the various company silos into the system, combined with the automatic generation of models, makes the solution ready to work on different business sectors. This means that the complete description of events is contained in the database(s) of the CRM software; such sets of events are called implicit processes. To the knowledge of the inventors, up to now process-driven systems have never been proposed due to complex processes that involve different kinds of customers.
The process discovery module active in step (III) discovers implicit processes related to various entities from the data produced by the temporal extraction. Implicit processes represent de-facto practices in the workplace or customer behaviour. Since such processes are supported by the data in the first case, they are useful for discovering loopholes and inefficiencies. In the latter case, implicit processes representing customer behaviour are not just an indication of how certain classes of customers behave in general, but actually also determine the exact points in which a decision has to be made in order to drive the behaviour either to a desired outcome (e.g., new purchase, license renewal, and so on) or away from an undesired one (e.g. customer abandonment). Such decision points are represented by exclusive gateways in the extracted process models, for example BPMN diagrams.
CRM users may extract BPMN processes from their data or formalize them by hand by importing them into CRM or using the CRM built-in editor. CRM processes extracted from data are immediately available in the CRM editor for analysis and/or revision. Users may then adopt the hybrid approach of extracting processes from data, modifying the extracted processes to improve their execution flow and, finally, sending the process to production for execution. Any BPMN process, whether discovered by process discovery or formalized/revised by a human, may be immediately instantiated for execution in the CRM engine. The CRM engine already allows the insertion of custom code into BPMN elements like tasks or gateways (e.g. for automating mail notification, detecting IoT events on industrial machines, checking expiration dates for payments, and so on). Such custom code is executed whenever the execution of a process instance reaches the enclosing element. Then, in its current stage of development the CRM engine is ready and perfectly capable of integrating AI algorithms that may be inserted into tasks or gateways. Establishing how such plugins are orchestrated inside a process is the responsibility of a classification manager module realizing step (IV). In the process discovery step, decision points in a process model, whether extracted or formalized by hand, are represented by exclusive gateways. The importance of having a formal process representation, like BPMN, is that it allows to determine where and when, during execution of a process, to take advantage of decision support. Thus, process model and AI go hand in hand. Moreover, the process is a way to supervise the application of AI methods by providing suitable recovery mechanisms, still in the form of BPMN task exceptions and control flow elements, to intercept the inevitable errors conveyed by statistical methods.
The classification manager module implements such statistical methods; it is responsible for testing and scoring a plethora of classification algorithms in order to find the best method (if any) to embed in an exclusive gateway for decision support. This happens almost automatically. The user has to specify the goal by simply clicking on the corresponding part of the process model and stating whether he/she wishes to drive towards or drive away from such a component (e.g. you want to drive towards a renewal event and away from an abandonment event). Then the classification manager for each exclusive gateway in the process performs feature selection, testing and scoring over all the data provided by the temporal extraction layer, and, if some accuracy threshold is complied with, the best classification method to apply in the gateway for deciding is found. Finally, it is up to the user to decide whether or not to apply the method in an unsupervised manner (i.e., the gateway is automatically driven by the method without human intervention for all future process instances), in a supervised manner (i.e. the decision remains in the hands of the current user but the method solution is notified to the user for decision support), or to simply ignore the application of the method. Despite the fact that CRM software solutions enable and encourage companies to formalize/execute all their activities as processes, companies tend to formalize as explicit processes only a small part of their processes and most of them are minor processes not related to their core business. This is in contrast with the fact that the majority, if not the totality, of the business activities of many companies is recorded and managed by the CRM software. Reasons for such discrepancy may be found in the fact that the decisions to be taken depend too much on the dynamic behaviour of a customer, which is difficult to know or foresee, so they instinctively refuse to constrain every possible case into a huge monolithic process diagram.
Preferably, an object classification maker configurator allows the system to be easily and flexibly configured via the interface; configurations are set up in self-service mode. This approach ensures that the system administrator or data scientist can configure, for a given type of objects in the system, the characteristic of the type of data to be interpreted, to indicate to the machine whether it is for example simple text or articulated text and above all which data must instead be guessed, without having to write code. In this way the system learns how to generate an anonymized classification model to be used later to guess the classification of a new object. In the same way it will be possible to download preconfigured models provided by the community.
A system, a library of ready-to-use classifiers, can be implemented which, based on the metadata of the standard CRM modules, allows to share the standardized classifier models by industry type with the user community. These classifiers will be simply installable and usable. The corrections made by users to the predictions will subsequently be used, in an anonymized form, to power and improve the classification engine for the benefit of all the installations that use it, through a built-in update system. The implicit process discovery integration advantageously takes place in order to maintain an approach without code: the system administrator or data scientist will be able to select a series of modules and submit them in self-service mode to the machine, entrusting it with the task of discovering the inherent processes. The dedicated configuration panel will allow the use of pre-packaged layouts for the classic modules present in the CRM system, but at the same time, in order to guarantee the current flexibility of the product, it will allow the configuration of new metadata for new or customized modules. Subsequently it will be possible to preview these processes in BPMN format and at the same time activate them following a possible modification, in order to optimize the identified workflows. There will also be some links with the classifier that in simplified mode will allow to better identify the groups of similar objects.
In a preferred embodiment of the invention, said process models are in a BPMN format, optionally integrated with an artificial intelligence system to be executed at a certain stage of the process model. A BPMN (business process model and notation) engine permits to systematically deal with the aforementioned requirements in a standard and replicable way. A BPMN engine is a program that guarantees the correct execution of processes written in Business Process Management Notation, that is, the de-facto standard for formalizing most of the processes in today's company world. The advantages of implementing company processes as BPMN and of delegating their executions to an engine are manifold: at any time, for every process instance (i.e., a single execution of a BPMN process), it is possible to monitor the execution. It is possible to query the current status of all active process instances to deal, for example, with excessive workloads of a pool of resources or to detect bottlenecks in the execution of some process, or to detect unnecessary loops that uselessly increase the average time of completion of some process. The BPMN engine is often delegated to automatically start the next task in a workflow when the current one has been completed; in doing so, the engine usually automatically notifies the correct person for the next task and provides all the necessary data for supporting its work. A BPMN process implemented with an AI function at a certain stage of the process permits decisions to be taken at deviation points by AI systems which may detect error sources and apply a more suitable solution.
A monolithic BPMN diagram would be too difficult to understand, maintain and execute; in particular, counter-intuitiveness is an unforgivable sin in a BPMN diagram, since BPMN notation has been adopted for managing processes among people with different backgrounds inside a company and its success heavily relies on its simplicity. Thus it is easier to have a group of BPMN diagrams which are classified and are selected on the basis of determined static features or trigger events. The most common BPMN diagrams used in CRM installations will be shared through an online library to facilitate their adoption. The widespread use of similar diagrams will then allow them to be improved through the reiteration of similar operations between users and will form a common basis for sharing improved flows through the discovery of new processes; this function will allow the consolidation of common operations and will therefore better highlight the particular processes of the individual organizations. The diagrams can be downloaded and activated via the online library available for consultation directly in the software.
Preferably, before step (V) the process models are further revised by an artificial intelligence system in a forecast manager module taking data and process history from the temporal database to identify inside the process models paths which have an increased probability to guide the process to the desired outcome and storing said revised process models in said database.
There might further be a corrector of classifications to improve the efficiency of the classifier to allow the user to specify a better prediction on the object displayed. These corrections are subsequently taken up and reworked by the system to improve its reliability. Some processes that started or followed a certain course in front of an incorrect correction could then be rewound and re-run according to the most correct flow, if so requested by the user.
This module uses the data taken from the temporal extraction module to see if certain paths in the process may increase the probability of the desired or undesired outcome. This is done, for example, by testing and scoring various forecasting methods against the process history and selecting the most accurate for predicting the probability of the given outcome. If a user so desires he/she can activate the method and receive prompt warnings when such probability is too low (i.e., in the case of desired outcomes) or too high (i.e., in the case of undesired events).
Obviously, not all the modules need to be used together: for instance, if a process is already formalized as an explicit process in the database, the process discovery does not take place; it is simply possible to extract a process model from the database. To optimize the success rate of the system, advantageously, said artificial intelligence systems to be applied in said classification, extraction, modification and/or forecasting operation(s) are selected according to predefined criteria among a plurality of artificial intelligence systems which are tested against the corresponding process model. Preferably, the artificial intelligence systems applied are suitable for at least one member of the group consisting of fault prediction, natural language processing and understanding, and customer behaviour prediction based on collaborative filtering techniques.
Considering any production/commercial/maintenance process in a given company, it is desired that such a process performs better for various reasons (e.g. costs, time consumption, and resources availability). Examples of tasks that can be improved by AI technologies are the following: fault prediction for hardware/software systems[6] and failure prediction for industrial machines[3]; natural language processing (NLP)[1] and natural language understanding (NLU)[4] for customer support (e.g. ticket redirection, and chatbot assistant); customer behaviour prediction based on collaborative filtering techniques[2].
The above points usually imply at least the following steps to be completely integrated into the company’s day-to-day activities: expert consulting to address feasibility and the correct AI model to apply; tuning, testing and evaluation of the AI algorithm on the company data to achieve the desired performances; integration of the AI algorithm into the company’s software system. From the perspective of a small-medium enterprise, the costs, risks, and time involved in performing the above steps may be daunting and thus they are usually dissuaded from implementing it. The invention provides the small/medium-sized enterprises with an entry-level tool for consistently testing, tuning and integrating AI algorithms into their processes, simulating process outcomes and choosing the solution that best suits the needs. To this end, the solution, is a framework for integrating AI plugins over the CRM software which provides an extensible and full featured process engine in its core. Processes are expressed, for example, using the Business Process Management Notation (BPMN)[5] which is the de-facto standard for formalizing organizational processes. Current artificial intelligence solutions are vertical instruments for each function or sector, therefore their use is independent of availability for the specific sector. Even when available these tools are atomic entities and therefore provide for the feeding of their data and offer as output a result that must subsequently be worked on or made available for the appropriate assessments. Usually this activity is delegated to system integrators or data scientists with a variable cost for each project and results that are not always certain. Other CRM solutions have artificial intelligence on board but based on pre-established and static configurations. The limitations of this model are clear: fragmentation of applications and high activation costs. To overcome this problem, in a preferred embodiment of the invention, said artificial intelligence systems are comprised in a market place, in particular a market place for open source systems. AI plug and plays permit the test of several instruments against the process models for finding out the best one. A standardized (BPMN) process based CRM engine, with sector independent process models, opens the possibility to do so.
Preferably, said databases contain data, event logs and process models in a generalized manner, without sector- or department-specific attributes and preferably in an anonymized manner. Thus, one single system can be applied to different customers and different company departments. For all AI systems, classifiers and forecasters, a freemium GDPR (General Data Protection Regulation) Cloud may be made available: a datacenter can be activated that conforms to the GDPR, where in complete autonomy and in self-service mode companies can map and import pseudo-anonymously the information and events present in their CRM. In this way the appropriate call-back actions triggered by the processes discovered by the method and related computing system, and similarly those resulting from the classifications, can then be mapped from an interface. The pseudo-anonymized databases loaded will contribute to the creation of classifiers and processes that can be downloaded from the marketplace.
AI elements thus come from a module called model marketplace. AI models are very specialized and there is a flourishing of AI-focused start-ups in Europe that study and implement efficient and accurate solutions for specific needs. This module allows the user to test and compare the best of the AI models on the market. In a transparent way the data that normally feed the platform can be immediately used to be processed by artificial intelligence. For the automatic classification part an interface system can be set up that allows selecting the fields that characterize the information to be treated and the field that must be guessed by the artificial intelligence system, so the solution is ready to allow immediate use of artificial intelligence in different situations. The next step is to allow the interaction with different classifiers, activated, for example, through the marketplace, thus allowing to evaluate their actual goodness in the field; their integration can take place via a microservice API (application programming interface) logic. For the user, everything happens in a completely transparent way as the platform deals with dialogue between the database and the artificial intelligence in use.
To better individuate critical decisive moments in a process model and understand the time available to take a decision, said temporal database comprises information regarding the moment in which a certain event occurs and the time interval for which a certain information is valid.
In an advantageous embodiment of the invention said discovery step (III-i) further comprises the identification of a trigger event within the process model. Identifying a trigger event in the process model means identifying exactly the event which drives the process (without intervention) in one direction or another, and selecting a certain path at this stage allows to positively drive the process.
Alternatively, in step (III-i) said process models are further modified by a human user who selects in candidate process models desired and undesired outcomes to train the artificial intelligence system employed. To evaluate the human-made or machine-made selections, preferably, a comparison function evaluates the changes introduced in terms of process improvements and time flow.
The comparison function may be in the form of a discovered processes performance checker: for the processes discovered by the method and subsequently modified by the expert or AI for better execution, a comparison function can be implemented between the previous executions and the current execution orchestrated according to the changes made. In this way, according to the identification of the new flow execution times and the steps involved, it will be shown graphically if there have been performance improvements with the new adoption or if the time flow remains unchanged and it is therefore necessary to intervene again to improve it.
A great advantage can be obtained by the invention, if in an embodiment, said process models stored in said databases are periodically revised by new artificial intelligence systems which have not yet been applied. Such a periodic revision function, allows for having always the best fitting AI system available on the market accessed by the user and thus, in other words, permits substituting an old AI system by a new better one.
Preferably, at the end of step (III-i) the metrics, in particular fitness, precision and score, of the discovered process models are tested against predefined thresholds and only the process models that passed the test are expressed in the event array. This step optimizes the process selection.
In one embodiment of the invention, the execution in step (V) is automatically performed when a certain static feature and a trigger event fall together. Thus, a certain process is started when the conditions of static and dynamic behaviour coincide. A process-driven method and system considers dynamic behaviour and reacts in a dynamic way.
Therefore, step (IV) may comprise detecting a value to be controlled and driving the process to the desired outcome whenever the process reaches a given trigger event identified by this detected value. Advantageously, said database comprises a change-log functionality regarding temporal entities and attributes for the discovery of trigger events, and optionally an XML description wherein the set of events to be considered when looking for trigger events can be restricted, and in particular thresholds for data mining parameters such as support, precision, fitness and score. The XML description and the thresholds help to target trigger events. For extracting such instances, advantage is taken of the change-log feature, which considers dynamic behaviour and helps to identify hidden instances, i.e. the moments where a process is driven to a certain outcome. Software applications usually implement such change-log features for recovery purposes; thus, there is no need for additional plugins or custom functions, and it is possible to simply start from what has been collected in the database so far.
To fully consider static and dynamic features, in a preferred embodiment of the invention, said database comprises an XML metamodel containing lifespan and static features, temporal features, events, thresholds for mining and classifications (the latter being inserted at the end of the classification/mining step), an id, and a temporal XML metamodel to move inside the temporal database for extracting traces regarding corresponding leads, wherein said XML metamodel is general, being applicable to every context which uses a database containing change-logs relative to attributes and temporal entities, with the possibility to personalize it, specifying hierarchies and aliases, with an option to exclude the source of the relative changes and to decide whether or not to consider the order number with respect to the temporal entity.
To realize an event array that contains many or all of the information elements necessary to select a suitable process model and to drive it at the right moment (trigger event) to the right outcome, said event array has a structure corresponding to a vector which connects any master entity, i.e. event, with another array containing all the events belonging to the event identified for a lead and a timestamp associated with any event which indicates the exact moment when it took place, further counting the number of scores of a certain instance (singleton) exceeding a predefined threshold and defining a table of cardinalities connecting it to the possible instances of another attribute and creating possible combinations forming entity clusters.
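By way of purely illustrative example, the following Ruby sketch shows one possible in-memory representation of such an event array: one entry per master entity (lead), mapped to its chronologically ordered events with their timestamps, together with a simple support count. All names and values are merely exemplary and do not correspond to the actual implementation.

```ruby
# Illustrative sketch only: lead id => ordered list of [event, timestamp] pairs.
require 'time'

event_array = {
  42 => [
    ["Seminar Attended",   Time.parse("2019-03-01 10:00")],
    ["Community Download", Time.parse("2019-03-02 09:30")],
    ["Ticket Opened",      Time.parse("2019-03-10 14:12")]
  ],
  43 => [
    ["Seminar Attended",   Time.parse("2019-03-01 10:00")],
    ["Training Requested", Time.parse("2019-03-05 16:45")]
  ]
}

# Number of traces that contain a given event (to be compared against a
# threshold such as epsilon_events mentioned in the text).
def support(event_array, event)
  event_array.count { |_lead, trace| trace.any? { |name, _ts| name == event } }
end

puts support(event_array, "Seminar Attended")  # => 2
```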
Advantageously, to give a further selection criterion to the event, for each group of attributes only the events relative to the leads which satisfy the requirement of the static attribute are extracted from the event array. In a further step, preferably, for every extracted event it is verified whether it is supported by the group of traces present in the event array exceeding the corresponding threshold, i.e. epsilon_events.
Preferably, the pairs of set (static feature) and event are checked to find the first occurrence of the event among the traces belonging to a lead selected by the limits represented by the set of attributes, truncating the analyzed trace at this point and creating sub-traces having as start event the expressed event of interest (trigger).
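As a minimal, non-limiting illustration of this truncation, the following Ruby sketch cuts a trace at the first occurrence of a candidate trigger event so that the resulting sub-trace starts with the trigger; the function name and the sample data are merely exemplary.

```ruby
# Given a trace (ordered list of [event, timestamp] pairs) and a candidate
# trigger event, return the sub-trace starting at the first occurrence of
# that event, or nil if the trace does not contain the trigger.
def sub_trace_from_trigger(trace, trigger)
  idx = trace.index { |name, _ts| name == trigger }
  return nil if idx.nil?
  trace[idx..-1]
end

trace = [["Seminar Attended", 1], ["Ticket Opened", 2], ["Ticket Closed", 3]]
p sub_trace_from_trigger(trace, "Ticket Opened")
# => [["Ticket Opened", 2], ["Ticket Closed", 3]]
```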
In one embodiment of the invention, step (III-i) comprises the following steps:
(a) an XML description extracts from the database an event array which is a representation of the set of traces representing the implicit processes;
(b) a process discovery tool starts to enumerate all the combinations of static features which are sufficiently represented, i.e. above a certain, in particular user-defined, support threshold, in the event array;
(c) for each combination of static features obtained at (b), the process discovery tool filters the traces associated with customers having those static features in the event array and, on this set of traces, the tool selects the events with higher occurrences, again according to some defined threshold;
(d) for every pair static feature/event obtained at steps (b) and (c) a BPMN diagram is generated;
(e) every BPMN diagram obtained at step (d) is checked against the thresholds for the metrics fitness, precision, and score;
(f) if the BPMN diagram obtained at step (d) successfully passes the test performed at step (e), the triple BPMN diagram/static feature/event is saved in the database, otherwise it is discarded. Practically, the steps for process discovery are performed in a web-based application of a virtual net of containers which communicate with each other, wherein a first container communicates with a second container to have access to the databases with read and write permission to create XES files regarding lead clusters and then communicates with process mining tools contained in a fourth container, a web server, through a database and message broker contained in a third container, wherein preferably the first container extracts event logs in XES format and the mining tools work with process models in BPMN format. Such a configuration is not necessarily inside the company's system, but can be used from outside, such that the company does not have to install such a complex system, but at the same time can keep its data/processes under control, also in an encoded, protected manner.
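By way of non-limiting illustration, the Ruby sketch below outlines steps (a) to (f) at a high level. The two lambdas passed in stand in for the external tools (for example Splitminer and the metrics checker); all names, parameters and the use of absolute counts as thresholds are assumptions for illustration only, not the actual implementation.

```ruby
# Hedged sketch of steps (a)-(f); event_array: lead => [[event, ts], ...],
# features_of: lead => [static feature, ...]. `miner` and `checker` are
# stand-ins for the external process discovery and metrics tools.
def discover_processes(event_array, features_of, thresholds, miner:, checker:)
  discovered = []
  all_features = features_of.values.flatten.uniq

  # (b) enumerate combinations of static features with sufficient support
  1.upto(all_features.size) do |size|
    all_features.combination(size).each do |combo|
      leads = features_of.select { |_lead, fs| (combo - fs).empty? }.keys
      next if leads.size < thresholds[:min_support_features]

      # (c) filter the traces of those leads and keep the frequent events
      traces = event_array.slice(*leads)
      counts = traces.values.flatten(1).map(&:first).tally
      counts.select { |_ev, c| c >= thresholds[:min_support_events] }.each_key do |event|
        bpmn    = miner.call(traces, combo, event)   # (d) generate a BPMN diagram
        metrics = checker.call(bpmn, traces)         # (e) fitness / precision / score
        # (f) keep or discard the triple according to the thresholds
        discovered << [bpmn, combo, event] if metrics[:fscore] >= thresholds[:min_fscore]
      end
    end
  end
  discovered
end
```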
In a preferred embodiment of the invention, classification and/or forecasting steps are supported by a text miner. A text miner helps to find, inside complex texts (emails, for example), key words or expressions which usually indicate a certain customer attitude or behaviour and can help to identify trigger events in order to intervene and control the customer's behaviour. The text miner may be realized as a standalone container and may be used for similar situations on different data, or it may be installed locally or in a cloud.
Studying the text of the assistance requests, the inventors noticed that, after a certain number of interactions, it is possible to predict whether the text will lead to a conversion from an assistance request to a new feature request. The text miner issues a warning to the employee managing the request, alerting him when the conversation is likely to lead to a conversion, so that he may raise the priority for the requests of this particular lead in order to raise the probability of him becoming a customer. This helps to focus on the leads which are more prone to become customers.
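Purely as a simplified stand-in for such a text miner, the following Ruby fragment scores a conversation against a few conversion-related keywords and raises a warning above a threshold; the keyword list, function names and threshold are hypothetical, and the real component would rely on a trained text-classification model rather than a keyword count.

```ruby
# Naive, illustrative heuristic only; not the actual text miner.
CONVERSION_HINTS = ["new feature", "quote", "enterprise edition", "pricing"]

def conversion_warning?(messages, min_hits: 2)
  hits = messages.sum do |text|
    CONVERSION_HINTS.count { |kw| text.downcase.include?(kw) }
  end
  hits >= min_hits
end

msgs = ["Could you send pricing for the enterprise edition?",
        "We would also need a new feature for exports."]
puts conversion_warning?(msgs)   # => true
```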
To give a user the possibility to intervene at very crucial moments which might require a human evaluation, the stored process models during step (IV) are integrated with an alert function. To this extent, there might be a decision support integration: to effectively illustrate to the user the classifications performed by the system and, at the same time, to highlight particular moments in which a certain discovered process is about to undertake a path unfavourable to the company, an improved alert system is implemented to highlight these reports. The suggested steps will therefore not be binding but underlined, as if it were a navigator for the best route. This panel of suggestions goes alongside the process assistant already in use for the processes implemented by the company. Furthermore, this function will have the task of ranking among the possible options, or rather of engaging the user only for value-added activities, neglecting or minimizing the impact of those reports which, if not considered, would have an irrelevant impact in terms of overall business.
All the above steps can be performed on a computer and/or by cloud computing and can be implemented by hardware and software elements well known to the person skilled in the art.
In a further aspect of the invention a computing system is foreseen which is adapted to perform the process according to the invention, in particular a computing system for customer relationship management (CRM), comprising:
(a) a CRM database for recording customer data including static features and logs of events generated by company processes and optionally external processes and being associated with
(b) a temporal extraction module for translating the database into a temporal database as an input for AI algorithms;
(c) downstream of the temporal extraction module a process discovery module for discovering implicit processes related to various entities from the data produced by the temporal extraction module, and optionally for modifying the processes to improve the execution flow;
(d) a classification manager module for one or more of the following activities to be performed against the processes furnished by the process discovery module or directly from the temporal extraction module: selection and integration of AI plug-ins, searching for errors, implementing statistical methods for testing and scoring a classification algorithm to find the best mode for decision support for each gateway, and selecting the modality of execution, i.e. supervised, non-supervised, semi-supervised;
(e) optionally a forecast manager module, that takes data from the temporal extraction module or the process discovery module to verify if certain paths have an increased probability to achieve the desired outcome, preferably by testing various forecast systems against process history and selecting the most accurate for predicting the probability of the given outcome;
(f) optionally a library of AI plug-ins for classification, forecasting and/or process discovery purposes feeding the classification manager module and/or the process discovery module and, if present, the forecast manager module; (g) an interface to permit the user to interact with at least the process discovery module and the output of the classification manager and the forecast manager module, if present, and wherein the interface optionally comprises an alert system; wherein the above listed elements (a) to (g) are located on a computer or data processor and/or in a cloud computing system, and wherein the modules can be organised in containers including a web-based application.
The single elements are suitable to perform the single steps listed above, each one according to its functions. As mentioned above, the various modules and their plugins can be encapsulated in small, stand-alone, independent containers. Installation can be very modular and there is no particular problem in distributing such components.
Further aspects of the invention are related to:
- A data processing system comprising means for carrying out the steps of the method according to the invention.
- A computer program comprising instructions which, when the program is executed by a computer, cause the computer to carry out the steps of the method according to the invention.
- A data carrying signal carrying the computer-program according to the invention.
- A computer-readable storage medium comprising instructions which, when executed by a computer, cause the computer to carry out the steps of the method according to the invention.
The invention permits solving the problem initially stated and thus proposes a method for predicting and controlling business management processes in an efficient, fast and economic way, even considering dynamic behaviours, and this independently of sectors. It permits extracting a structured version of the implicit processes, in particular in the form of multiple BPMN diagrams, focussed on a combination of the customer's static features and the suspected events that triggered the process, thus enabling the user to mark desired or undesired outcomes in the process.
To find hidden processes and the related trigger events which are responsible for starting them, AI systems for classification and/or forecasting are helpful, which can be sustained by human intervention to train the artificial system. The continuous storing of identified and classified/modified processes in the database lets it become richer and usable (in an anonymous manner) in a shared form. Substantially, the start is a CRM database containing some implicit processes already present and/or to be discovered. Applying the invention, a structured version of the implicit processes can be extracted, which usually is not represented by a single BPMN diagram but by a collection of BPMN diagrams according to some static features of the customer and some recurrent trigger events. To enhance the performance of the AI engines applied, an expert review of the processes is possible, identifying possible errors and revising the process models, integrating them with the now evident decisions where and which AI may be beneficial to work alone, testing it perhaps against the current events and deciding the modality: fully automatic, semi-automatic, or supervised. The output is then an annotation of the revised BPMN diagrams obtained, possibly annotated with pairs of the form decision point/AI technique to be applied. The annotated BPMN diagrams are loaded into the CRM system, turning the implicit process into a set of explicit BPMN diagrams that will be manually or automatically executed when the relative pair (static features, trigger event) for the customer is satisfied, calling the AI techniques where indicated with the assigned modality. The process may be repeated for measuring improvements, detecting critical situations and dealing with them, evaluating/improving existing AI techniques, finding new decisions where AI is applicable, and so on. The invention involves both the development of a process discovery technique and the use of plugins for AI-driven decisions to be integrated in the process model. Using the process-driven CRM logics permits increasing the return on investment, accelerating the implementations and dramatically reducing project failures. On the other hand, the customers can use a new kind of technology, based on the business processes, breaking down the departmental silos and pushing the organization to really work for the customer needs, without the need to split the CRM building into many areas (such as marketing, sales, services, ...) and without depending on the personal experience, character and points of view of a human being.
An open source platform which fully integrates an engine able to execute BPMN diagrams is Vtenext of Crmvillage.biz in Italy, focusing on the integration of data and processes. As explained above, an effective use of the day-by-day more pervasive AI in the context of CRM by integrating it via BPMN guarantees sufficient levels of controllability, return on investment, validation, learning, privacy and independence. Companies can preserve their data, working with the invention locally on their machines or via a cloud-based service where the AI can be provided with no third-party support. In the process discovery part, a database with a change-log functionality stores the implicit process, and an XML description is used for obtaining a temporal view of the data, wherein this XML can then be specialized with user options; for instance, a user may restrict the set of events to be considered when looking for trigger events and/or set user-defined thresholds for data mining parameters such as support, precision, fitness, and score.
The method algorithm uses the specialized XML description for extracting from the database an event array which is a representation of the set of traces representing the implicit process; then the framework starts to enumerate all the combinations of static features which are sufficiently represented (i.e., above the support threshold defined by the user) in the event array, and for each combination of static features the framework filters the traces associated with customers having those static features in the event array. On this set of traces the framework selects the events with higher occurrences, again according to some user-defined thresholds; for every pair (static feature, event) obtained before, a BPMN diagram is generated using, for example, Splitminer (an open-source process miner provided by the University of Melbourne); afterwards, every BPMN diagram obtained is checked against the thresholds for the metrics fitness, precision, and score using, for example, the Markov Fitness Precision (an open-source tool provided by the University of Melbourne); and finally, if the BPMN diagram obtained successfully passes the test performed, the triple BPMN diagram/static feature/event is saved in the output database, otherwise it is discarded.
Advantageously, every step can be realized as a distinct (Docker) container and, for example, Redis can be used as message broker. Preferably, one supported pair (static feature, event) is created and checked at a time and, if successful, is stored with the relative diagram in the database. The discovered diagrams may then be viewed before all the computation has terminated. It is possible to parallelize the process for additional speed-up by simply replicating the containers for Splitminer and Markov Fitness Precision. Moreover, advantageously, standard representations are used for the event array (i.e., the Open XES format) and the BPMN diagrams (i.e., the BPMN 2.0 specification).
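By way of non-limiting illustration, such a container layout could be described with a docker-compose file like the hypothetical sketch below; all service and image names are merely exemplary. The mining containers can be replicated for parallelism, for instance with docker compose up --scale splitminer=3.

```yaml
# Hypothetical docker-compose sketch of the container layout; illustrative only.
version: "3.8"
services:
  webapp:                      # first container: extracts event logs in XES format
    build: ./webapp
    depends_on: [db, redis]
  db:                          # input/output database
    image: postgres:13
  redis:                       # message broker between the containers
    image: redis:6
  splitminer:                  # process discovery tool, replicable for speed-up
    build: ./splitminer
    depends_on: [redis]
  markov-fitness-precision:    # metrics check (fitness, precision, score), replicable
    build: ./mfp
    depends_on: [redis]
```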
The features described for one aspect of the invention can be transferred mutatis mutandis to the other aspects of the invention.
The embodiments of the invention described solve the problems initially stated and highlighted before. In particular, the system permits the testing of several AI plugins against extracted process models; such a solution, in particular in the case of open-source AIs, is very economic and widens the range of testable models, finding the best fit to obtain optimal results. The process-driven approach can operate with a logic suited to dynamic processes. The solution can be applied cross-department and does not operate on individual CRM areas but in a transversal way on customer management processes. The AI engine positions the framework at the top of the selection of potential customers, since no one up to now uses this technology in CRM processes. The AI engine can very easily introduce the CRM process logics into medium-size companies with the process finder.
Classical CRM solutions are all data-driven solutions. The processes are wired into the code and cannot be modified without using complex development tools. Furthermore, none of the known solutions is cross-department; that is to say, they are all divided by CRM area (marketing, services, sales, ...). This allows the various vendors to multiply the value of the licenses by selling new extensions from time to time as customer management processes are implemented. Moreover, almost all the offers of these vendors are now cloud-based without on-premise alternatives. This entails for customers a so-called lock-in choice: once they have started using the solution, they can no longer stop paying the fees, otherwise the services will be turned off and the database that may be returned to the user is unusable. So the whole system is made to lock the client in.
The above purposes and advantages will be further highlighted in the description of preferred embodiment examples of the invention, given by way of non-limiting example. Variants or embodiments of the invention are the subject of the dependent claims. The description of the preferred embodiment examples of the CRM method and related systems is given by way of non-limiting example with reference to the attached drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 illustrates a block diagram showing a process driven CRM system with a process discovery, classification and forecasting module,
Fig. 2 illustrates a flow chart of an example of a process model in the form of a BPMN diagram.
Fig. 3 illustrates the process discovery part of a CRM system according to the invention in a computing environment.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
Fig. 1 illustrates a block diagram showing a process-driven CRM system with a process discovery 54, a classification 62 and a forecasting module 64 and their interaction. The CRM system comprises a database 50 containing explicit and implicit processes which are extracted by the temporal extraction module 52. The process discovery module 54 extracts process models 56 in the form of implicit 56a and explicit processes 56b. The user or an AI system selects candidate processes 58 for a further AI analysis; inside the candidate processes the user or an AI system selects desired or undesired outcomes 60, and the process models with selected outcomes pass through a classification 62 and optionally a forecasting managing module 64. Both modules work with scoring techniques and extract and modify processes 66. The user or an AI algorithm can make choices of unsupervised or supervised process parts and insert warning thresholds, obtaining production-ready AI-powered processes 68. The processes 66, 68 are stored in the database 50 and can pass through a new cycle as already described, undergoing classification and forecasting with new AI systems, or can be used directly. The classification and forecasting modules, optionally, can be offered by a marketplace 67 which ideally is open source.
Fig. 2 illustrates a flow chart of a simplified example of a process model in the form of a BPMN diagram. The example encompasses all the interactions between the various modules. A company is using CRM for managing the customers of a software solution. The software has a community edition and the company promotes its use by seminars and conferences. By importing the data about leads via the temporal extraction module and launching the process discovery module, the BPMN diagram shown in Fig. 2 is obtained, which summarizes the common interactions between a lead and employees using CRM.
Following a seminar 3, a lead 1 usually downloads the community version 5 and either asks for a couple of training sessions 7 with the experts of the company or starts asking, by using the support ticket 9, for more information on how to use the software. When there is a request for a training session 7 it is up to the manager to decide whether to charge the lead fully 11, to apply a discounted price 13, or to provide the training free of charge 15. When leads show a high level of interest in the software by repeatedly asking for support 17a, a manager contacts him/her in order to check if he/she is interested in a training session 19; alternatively, the ticket, after the question has been answered, is simply closed 17b and no further ticket is opened; here the system can intervene and propose a training day 7. At the end of a training period 23 the manager always asks the lead to become a customer 25 by purchasing the enterprise edition of the software. The offer may be refused 21; this may happen already for the offer of a paid training day. In this context the desired outcome is clearly that of the offer accepted 27. The CRM user who is managing the process simply clicks on it and marks it as the desired goal.
Alternatively, the aim can be that it is desired to make the choice between full charge 11, discounted price 13, and free training day 15 depending on the size of the client. The classification manager module will help by simply taking the features of the various leads (provided by the temporal extraction module) and, by testing and scoring a classifier, it detects the values to look at and drives the process to the desired outcome. Now, by looking at the results of the classification, the user may decide whether the model proposed (the size-dependent choice) should be discarded (the decision stays in the manager's hands without any influence from the classification outcome), or be integrated in a supervised fashion (the decision remains with the manager, but the outcome, and possibly the explanation, are disclosed to the manager in order to motivate his choice), or become automatic (the decision is delegated to the process engine that applies the classification model whenever it reaches the given point and takes the course of action provided by it). The forecasting module helps to find a solution to the following problem: how much time since the first ticket do we have to wait before proposing a training day in order to raise the probability that the lead will accept our offer in the end? Does the number of tickets opened matter? There is a need for some kind of prediction. The forecast managing module considers all the history of previous similar leads and builds a set of prediction models that are tested and scored. The winning model is then proposed to the user. The user may discard the model, use it as a simple warning/notification system that is limited to sending an alert when certain leads may be "ripe for being collected", i.e., when the probability of success given by the model is above a certain threshold, or, when the probability is acceptable, contact the client directly with some message generator.
Preferably, both the classification managing and the forecasting managing modules are benchmarkers for the models, with the only aim of selecting the best one for the specific goal, and the models are retrieved from the model marketplace.
The process discovery part allows the user to select the CRM modules to be analyzed and, on the basis of the time series, highlights the relevant processes after defining the data structure. More generally, the data and processes are mapped at the API level to guarantee a complete interaction with internal and external artificial intelligence tools while maintaining a unique vision on the user side.
The invention’s approach involves: 1) simple feeding and use of data through the application interface or through data feed flows, all configurable and capable of being activated independently by the company's IT operators;
2) integration of artificial intelligence modules optimized in individual activities and potentially extendable indefinitely as a panorama of solutions; 3) activation of automation mechanisms at the process level through the native interaction of the BPMN engine and the artificial intelligence part;
4) the software adapts to the company and not vice versa: repetitive steps can be automated through the introduction of artificial intelligence and this intelligence is finally made available to other companies that use the service, with a view to continuous evolution; 5) the supply and development of these tools with an open-source logic, with all its advantages. The technological risk associated with the introduction of artificial intelligence often consists of the imminent obsolescence of the proposed solution as it is superseded by more recent algorithmic models. Information security should also be considered.
The risk associated with obsolescence becomes an opportunity for continuous improvement, as the solution natively allows the replacement of the various artificial intelligence algorithms with the most recent and best-performing ones without burdening the users, simply by activating alternative artificial intelligence solutions available in the marketplace. The data feed takes place through the most common synchronization practices between the systems or through special APIs, and the outcome of the artificial intelligence processing automatically becomes available through the interaction APIs using open standards. The data supplied for AI analysis are obfuscated beforehand to guarantee the confidentiality of the information contained, thus overcoming the reticence of companies to provide such data flows externally. Providing a system preconfigured in this way allows its use also by IT technicians lacking specialization in the field of artificial intelligence: expert data scientists will work to develop the artificial intelligence, while IT workers with more classical skills will have their work facilitated and limited to the activation of what is already available. The presence of ready-to-use modules or algorithms in the marketplace guarantees that the selection and validation work has been carried out upstream.
The invention gives the possibility of using data already uploaded to the CRM platform by individuals and other application systems (e.g. ERP (enterprise resource planning), e-commerce, HR (human resources) applications), of proposing marketplace solutions to many users, and of adopting a pay-per-use business model where companies can take advantage of accessible prices to use algorithms without having to face startup costs.
The AI or machine learning function can be of assistance, in particular for presale processes (classification of lead potential and automatic routing to the salesperson with the most appropriate skills); sale processes (classification of customers with the highest purchasing potential per new product/service inserted or customers with the highest churn rates); customer service processes (automatic identification of trouble tickets with associated routing towards the most suitable work teams); HR processes (identification of the most suitable personnel skills in relation to the type of activities to be carried out); logistic processes (identification of the most efficient routes in relation to the required transport type); supplier selection processes (classification of the most attractive vendors per type of product or service to purchase).
Similar services supplied by multinational players offer exclusively processing power using the AI applications already present on their marketplaces rather than the single algorithm. With respect to the end user, however, this solution fails to address the problem of usability and cost-effectiveness, requiring the services of a data scientist; moreover, there is no facility to make comparisons between solutions. On the other hand, the invention offers access to multiple algorithms with infinite growth potential, with no risk of having to face the challenge of real competitors on the market, and makes it possible to find the best algorithm available on the market at that time. The AI algorithms currently on the market are subject to very fast depreciation because algorithms offering vastly superior performance are developed on a daily basis. The end user's possibility of adopting the algorithms of different vendors on the same platform at competitive prices and being able to change them in real time is not currently provided by the existing platforms. In this marketplace, customers can test multiple algorithms easily: once the data have been uploaded they can run the above illustrated analysis engine that will automatically propose the best solutions, in accordance with a logic of merit.

Fig. 3 illustrates the implementation of the process discovery part of the CRM engine according to the invention in a computing environment. The framework designed and developed provides a contribution in the field of process discovery or mining. More precisely, it classifies a company's customers, basing the analysis on their (dynamic) behavior. To do this, there is an input database 10a containing the event logs generated by the various business processes and the related XML (eXtensible Markup Language) metamodel 10b of the data. These are used as a starting point for the creation of a temporal database 16 containing only the information useful for the analysis of interest. The construction of the temporal database 16 is done by a Ruby import module 12, wherein certain limits 14 are imposed to make a selection. The activities recorded during the year are then reconstructed in chronological order, showing the interactions between the company and the various leads. Finally, the latter are classified according to their own characteristics, also identifying the possible triggering events related to the observed behavior. In addition to the implementation development, use is made of some external tools useful for the process discovery and the related process-model quality analysis applied to the lead categories. The framework is divided into four main sections:
- the input section A: the inputs that the user will have to provide to the system are described in order to customize each execution;
- the import section B: phase in which the database 10a containing the event logs is translated into the corresponding time database 16;
- the mining section C: it reconstructs the traces of the various types of supported leads from the event log, from which the process model and the possible trigger events are discovered;
- the output section D: in the output table the process models whose metrics, obtained from the quality analysis, meet the user-defined thresholds are reported.
In the input step, the user provides a database 10a containing the event log and an entry in the table used as input called pd_mining_data. In particular, the following fields are defined in the table: an id, that is an integer value that uniquely represents an instance of framework execution; xml_etl 10b, that is the XML metamodel of the data describing the initial database structure and its translation into the new time database 16; xml_temp 20, that is an XML model that describes how to navigate the time database in order to extract the traces of the various types of leads; and params, that is a JSON (JavaScript Object Notation) document containing the thresholds 14 for the customization of the analysis related to each instance. xml_etl 10b represents the XML metamodel used for the translation of a database containing a temporal event log. Inside this metamodel, a tree structure is used, having as root the chosen main entity (master_entity), containing the following properties: name: name of the master entity; source_relation: table in the source database; source_key: key in the source database; target_relation: table in the time database; target_key: key in the time database; target_back_key: integer value corresponding to the related key of the entry in the source database; changelog: reports all changes related to the main entity and its temporal properties. It contains the following mandatory fields: relation: corresponding table; external_key: identifier of the entity to which the change refers; field: field containing the change information; time: corresponding timestamp. The changes can be found in the description field of the changelog table and refer to the main entity via a parent_id field. They can be divided into entities which were generated from an external element or entities not generated from an external element. There are fields in the source database and target fields in the time database which can have a value of 1 or 0. The lifespan corresponds to the temporal interval wherein the entity interacts with the company. It contains a field corresponding to the initial moment of validity of the entity in the source database; a field corresponding to the initial moment of validity of the entity in the time database; a field corresponding to the final moment of validity of the entity in the source database; and a field corresponding to the final moment of validity of the entity in the time database.
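By way of purely illustrative, non-limiting example, the properties just listed could be expressed in an xml_etl fragment such as the one below; tag nesting, attribute names and values are assumptions made only for illustration and are not taken from a real metamodel file. The description of the remaining elements of the metamodel (static features, external entities, temporal features, events, temporal entities) continues in the following paragraphs.

```xml
<!-- Hypothetical xml_etl fragment, for illustration only. -->
<master_entity name="lead">
  <source_relation>leads</source_relation>
  <source_key>lead_id</source_key>
  <target_relation>leads</target_relation>
  <target_key>lead_id</target_key>
  <target_back_key>source_lead_id</target_back_key>
  <changelog relation="changelog" external_key="parent_id"
             field="description" time="created_at"/>
  <lifespan>
    <source_start>created_at</source_start>
    <target_start>vts</target_start>
    <source_end>closed_at</source_end>
    <target_end>vte</target_end>
  </lifespan>
  <!-- static_features, external entities, temporal features, events and
       temporal entities would follow, as described in the text -->
</master_entity>
```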
Static_features are a list of aspects specific to the entity to which they relate that do not undergo changes over time, specified by the static_feature tag. A static feature is not present in the source database, but only in the target field. External entities generate the master entity and contain a field for a parent condition, which indicates whether the entity's parent is an external element or not. There are corresponding order attributes in the target and in the source field, and tags to specify the information in the time database. Further, there is a relationship in the time database that matches the master entity and the external entity. Each lead is connected with the internal resource that generated it. Source root tags specify the fields containing the information related to this resource, while the tags with root target represent their translation into the time database. There is a mapping between the various source_fields and target_fields by matching the value of the order attribute. Further, a table relates each lead with the internal resource that generated it; source root tags specify the fields containing the information related to this resource, while the tags with root target represent their translation into the time database.
In the example, the join_relation tag represents the table that relates each lead with the internal resource that generated it. Source root tags specify the fields containing the information related to this resource, while the tags with root target represent their translation into the time database. There is a mapping between the various source_fields and target_fields by matching the value of the order attribute.
For classification fields statically entered in the master entity, values are only associated with its end result. The following tags and related fields are specified: field of the master entity in the source database; table in the time database; key of the master entity; corresponding field in the time database; field corresponding to the initial moment of entity validity in the time database; field corresponding to the final moment of validity of the entity in the time database. Further, temporal_features are considered: properties whose instances, varying over time, are associated to the reference entity with a time interval, specified by the tag temporal_feature. Each of these contains the following mandatory fields: a field of the master entity in the source database; a target field that contains the tags to specify the information in the temporal database with a table containing the temporal feature variations, in the time database for the various master entities; a master entity key; the instance of the temporal feature in the temporal database; a field corresponding to the initial moment of the validity of the temporal feature in the temporal database; a field corresponding to the final moment of the validity of the temporal feature in the temporal database. There are additional temporal_features not present in the database. Each of these has only the target tag, which in turn contains the following fields: a table containing the variations of the additional temporal feature, in the time database for the various master entities; a master entity key; instance of the additional temporal feature in the temporal database; a field corresponding to the initial moment of the validity of the additional temporal feature, in the temporal database; field corresponding to the final moment of validity of the additional temporal feature, in the time database. There are temporal_external_ownerships, which describe temporal ownership relationships of one entity with respect to another, specified by the temporal_external_ownership tag. Each of these contains the following mandatory fields: a table containing the owner entity in the source database; a key of the above table; a field containing the first owner entity in the source database; a field containing the last owner entity in the source database; a field corresponding to the initial moment of the validity of the ownership relationship in the source database; a field corresponding to the final moment of the range of ownership in the source database; fields of interest in the above table. Each of these is identified by the source_field tag and the order attribute. The target field contains the tags to specify the information in the time database with the following fields: table containing the owner entity in the time database; a key of the above table; an integer value corresponding to the related key of the entry in the source database; fields of interest in the above table. Each of these is identified by the target_field tag and the order attribute that must be equal to the corresponding attribute in the source fields. Further, there is a relationship in the time database that matches the master entity and external entity and contains the following mandatory fields: name of the table; master entity key; owner entity key; field corresponding to the initial moment of validity of the ownership relationship in the time database; field corresponding to the final moment of the ownership interval in the time database.
In the example, the join_relation tag represents the table that relates each lead with all the internal resources in charge of managing it (with the relative time intervals). Source root tags specify the fields containing the information related to the resources, while the target root tags represent the translation into the temporal database. Also here there is a mapping between the various source_fields and the target_fields by matching the value of the order attribute.
Events are further considered: facts that happen instantly, specified by the event tag. Each one of these is linked to the entity to which it refers and contains the following mandatory fields: table containing the event in the source database; key of the above table; a parent_entity, which contains the tags to trace back to the key of the entity to which the event refers. If the parent_key is not in the source_relation, a parent_relation and a join_field are also indicated, that is, a table containing the parent key (key of the entity to which the event refers) and a common field between parent_relation and source_relation. Each of these is identified by the source_field tag and the order attribute; the target field contains the tags to specify the information in the time database and contains a table containing the event in the time database; a key of the above table; an integer value corresponding to the relevant key of the entry in the source database; fields of interest in the above table. Each of these is identified by the target_field tag and the order attribute that is equal to the corresponding attribute in the source fields. There may be an additional field in the temporal database that indicates the classification of the event. Further, there are timestamps corresponding to the time when the event occurred; relationships in the time database that match master entity and event, with the fields for the name of the table; the master entity key; the event key; the event timestamp. One field indicates whether the event was generated by an external element or not. This field needs to be specified only if there is a relationship between the event and the internal entity that generated it. It contains the following mandatory fields: corresponding field in the source database; corresponding field in the time database; internal entity that generated the event; value_true and value_false fields. Internal fields comprise internal entities that generate the event. The tag must be specified only if there is a relationship between events and the internal entities that generate them, specified using the external_entity tag. Each of these contains the following mandatory fields: a field that indicates if the event's parent is an external element or not. In the latter case the following fields identify the external entity: a table containing the internal entity in the source database; a key of the above table; fields of interest in the above table. Each of these is identified by the source_field tag and the order attribute. A target field contains the tags to specify the information in the time database with the following fields: a table containing the internal entity in the time database; a key of the above table; an integer value corresponding to the related key of the entry in the source database; fields of interest in the above table. Each of these is identified by the target_field tag and the order attribute, which must be equal to the corresponding attribute in the source fields.
Further to be considered is the relationship in the time database that matches the event and the internal entity. It contains the following mandatory fields: relation: name of the table; event key; key of the internal entity; event timestamp. The example shows the translation of the ticket comments table into comments and its related fields. The externally_created tag defines the translation of the ownertype field into the Boolean externally_created one, following the logic of value_true/false/default. For comments generated by an internal resource, the analysis of the external_entities tag proceeds as for the master entity. The join_relation related to the ticket comments table defines the relationship between the event and the entity to which it refers (ticket), while that relating to comment_owners represents the relationship between the event and the internal resource that generated it. There can be additional_events with a related tag which specifies, in related fields, the tags to specify the information in the time database with a table containing the instances of the additional event, in the time database, for the various master entities; table key; the target_field tags corresponding to the fields of interest in the event; timestamp corresponding to the time at which the additional event occurred; relationship in the time database that matches master entity and additional event with the name of the table; the master entity key; the key of the additional event; the timestamp of the additional event. The temporal entities in the tree are linked to the parent entity by an interval and are specified by the temporal_entity tag. Each of them contains the following fields: table in the source database; key in the source database; key of the master entity to which it refers. If the field is not present, the tag must be returned to the master entity representing the key of the temporal entity; a table in the temporal database; key in the temporal database; integer value corresponding to the relative key of the entry in the source database; a field that indicates whether the entity was generated from an external element or not (this tag is handled like the homonym in the master entity); time interval in which the entity interacts with the company (this tag is handled like the homonym in the master entity); a relationship in the time database that matches the master entity and temporal entity, with the following mandatory fields: name of the table; master entity key; temporal entity key. A static_features tag is optional and managed like the homonym in the master entity. Tags for temporal_features, external entities, temporal external ownerships and events are handled like the homonyms in the master entity. Temporal_entities are handled exactly like the master entity, while still maintaining the link with it via the source_field and the join_relation of the lead_tickets relationship.
Passing now to the import section B, the model xml_temp 20 is defined to allow the navigation of the time database 16 in order to extract the information of interest for the reconstruction of the traces related to the various types of leads. One of the main aspects is its generality, since it can be applied to any context that makes use of a database containing changelogs related to the attributes and temporal entities. The master_entity properties to specify are static features, temporal features, events, ownerships and temporal entities. Having previously described these concepts, only the tags contained within them, with an explanation, are shown below. This model 20 has a tree structure with the selected main entity (master_entity) as root, containing the following information: name of the master entity; table in the time database; key in the time database. The following examples show the tags related to the master entity lead. The static features are specified by the static_feature tag and each one of these contains the following mandatory fields: name of the static feature; corresponding field in the time database. The external_entities are specified using the external_entity tag and each of these contains the following mandatory fields: a parent_condition field indicating whether the entity's parent is an external element or not. In the latter case, the following fields identify the external entity: name of the external entity; a table containing the external entity in the time database; key of the above table; relation in the temporal database that matches master entity and external entity with fields for the name of the table; external entity key; master entity key. The static features are the static properties of the external entity. They are specified through the static_feature tag and each of them contains the following mandatory fields: name of the static property; field containing the static property; a nil_value as default value; a parent_feature that represents an alternative static property of the external entity and contains the following mandatory fields: name of the static property; field containing the static property; and the default value. In the example, the join_relation tag represents the table that relates each lead with the internal resource that generated it. The static_features tag specifies the static attributes of the external entity. The classifications are specified using the classification tag and each of these contains the following mandatory fields: name of the classification; a table that keeps track of the classifications of each master entity, in the time database; a master entity key; a field containing the value of the classification in the time database.
The "Lead Converted?" classification of the lead lead_id is located in the converted field of the lead_converted table. Temporal features are specified via the temporal_feature tag and each of these contains the following mandatory fields: name of the temporal feature; table containing the temporal feature variations for each master entity; master entity key; instance of the temporal feature. The Business Unit variation of the lead_id is in the lead_business_unit field of the lead_business_unit table. Each temporal_external_ownership tag contains the following mandatory fields: name of the owner entity; table that keeps track of owner changes for each master entity, in the time database; key of the above table; relationship in the temporal database that matches master entity and owner entity. It contains the following mandatory fields: name of the table; owner entity key; master entity key. The static_features are the static properties of the owner entity. They are specified through the static_feature tag and each of them contains the following mandatory fields: name of the static property; field containing the static property; the parent_feature represents an alternative static property of the owner entity and contains the following mandatory fields: name of the static property; field containing the static property. The join_relation tag represents the table that relates each lead with the internal resources to which it is assigned. The static_features tag specifies the static attributes of the owner entity.
Events are specified using the event tag and contain the following mandatory fields: name of the event; table containing the event in the time database; key of the above table; relationship in the temporal database that matches the master entity and event. It contains the following mandatory fields: name of the table; event key; master entity key. The static_features are the static properties of the event. They are specified through the static_feature tag and each of them contains the following fields: name of the static property; field containing the static property; a parent_feature that represents an alternative static property of the event and contains the following mandatory fields: name of the static property; field containing the static property. External_entities are specified using the external_entity tag and each of these contains the following mandatory fields: a parent_condition field indicating whether the event's parent is an external element or not. In the latter case, the following fields identify the external entity: name of the external entity; table containing the external entity in the time database; key of the above table; relation in the temporal database that matches the event and external entity. It contains the following mandatory fields: name of the table; external entity key; (parent) event key. The static_features are the static properties of the external entity. They are specified by the static_feature tag and each of them contains the following mandatory fields: name of the static property; field containing the static property; default value. The parent_feature represents an alternative static property of the external entity and contains the following mandatory fields: name of the static property; field containing the static property; default value. The example indicates that the comments table contains the events of the type Ticket Comments that have two types of static attributes: the text or its classification (parent_feature). For comments generated by an internal resource, the analysis of the external_entities tag proceeds as for the master entity. The join_relation related to the ticket_comments table defines the relationship between the event and the entity to which it refers (ticket), while that relating to comment_owners represents the relationship between the event and the internal resource that generated it.
The temporal_entities are specified using the temporal_entity tag, and can possess in turn the characteristics just described. Each of them contains the following mandatory fields: name of the temporal entity; table in the time database; key of the above table; relation in the temporal database that matches the master entity and temporal entity. It contains the following mandatory fields: name of the table; temporal entity key; master entity key. The static_features tag is optional and handled like the homonym in the master entity. The same holds for the temporal_features tag, the external_entities tag, the temporal_external_ownerships tag and the events tag. In the example the temporal_entities are handled exactly like the master entity, while still maintaining the connection with it through the join_relation related to the lead_tickets relationship.
A generalized data model is used; a feature of the data model is its customization, i.e. the user has the ability to act on attributes, both static and temporal, specifying hierarchies and aliases, excluding the source of the relevant changes and indicating whether or not to take into account the order number for the time entities. Through these customizations it is possible to highlight or not certain aspects in the final process model. Regarding hierarchies on static attributes, this data model allows the user to define attributes at different levels of detail. There are, for example, two specified ones, i.e. the name and the role. To indicate the level of detail to be used in practice (in this case the role), the enabled="true" attribute must be used in one of the parent_features. In particular, in case there is no parent_feature with the enabled="true" attribute, the name and field of the static feature are used; otherwise those of the first parent_feature with enabled="true". There can be an exclusion of the source in the changes: this data model allows the user to decide whether or not to remove the source from the generated strings related to the changes. This is possible by using the source="false" attribute. Often unnecessary changes are generated; for example, if the enabled="true" attribute is added and the two resources have the same role, no event will be generated. Aliases on temporal entities can be used, i.e. nicknames instead of the full name of the entity. This customization does not affect the creation of the process model, but only the labels of the time entities of which it is composed.
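By way of purely illustrative example, such a two-level hierarchy could be expressed as in the fragment below, where setting enabled="true" on the parent_feature makes the framework use the role instead of the name; tag nesting, attribute names and values are assumptions for illustration only and are not taken from a real model file.

```xml
<!-- Hypothetical static_feature with two levels of detail; illustrative only. -->
<static_feature name="generated_by" field="owner_name">
  <parent_feature name="generated_by_role" field="owner_role" enabled="true"/>
</static_feature>
```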
The last field of the entry provided in the input is called params and contains the thresholds defined by the user, used by the framework to accept or reject the discovered process templates. The parameters are divided into four groups (distributed as an input during the import and mining phases B, C) and defined within a JSON document:
(1) Thresholds 30 define the minimum (min) and maximum (max) thresholds related to the metrics that evaluate the process in terms of
- fitness: decimal value in the range [0; 1]; it measures how much of the behaviour recorded in an event log can be reproduced by the discovered process model;
- precision: decimal value in the range [0; 1]; it indicates how well the discovered process model adapts to the event log;
- fscore: decimal value in the range [0; 1]; it indicates the harmonic mean of fitness and precision:
fscore = 2 · (fitness · precision) / (fitness + precision)
- n_task: number of tasks contained in the discovered process model.
(2) supports 22: represent the minimum thresholds for which a lead type with certain static attributes is defined as supported by the log of events (min_support_features), and for which a certain event, supported by the number of tracks that contain it, can be defined as a possible trigger event (min_support_events);
(3) limits 14: allow to define the date from which to start the extraction of the main entities in the import phase (limit_date), the time order to follow (flag: "up/down") and the number of entities to extract (limit_process). If it is not desired to impose such limits, the first two parameters must be set to "-1";
(4) splitminer 28: defines the following parameters to customize the process templates returned as output by the process discovery tool, Split Miner:
- parallelism: decimal value that varies in the range [0; 1] and defines the threshold for the representation of parallelism (AND gateway);
- filtering: percentile for the frequency threshold, varying in the range [0; 1]. It is calculated on the frequencies of the most frequent input and output arcs of each node, and retains only those arcs with a frequency that exceeds the indicated threshold.
- remove_or_joins: Boolean value for removing OR gateways.
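A minimal sketch of a params document containing the four groups described above, read in Ruby; the nesting of the thresholds group and all concrete values are illustrative assumptions, not prescribed defaults:

```ruby
require 'json'

params = JSON.parse(<<~DOC, symbolize_names: true)
  {
    "thresholds": { "fitness":   { "min": 0.7,  "max": 1.0 },
                    "precision": { "min": 0.6,  "max": 1.0 },
                    "fscore":    { "min": 0.65, "max": 1.0 },
                    "n_task":    { "min": 3,    "max": 30  } },
    "supports":   { "min_support_features": 50, "min_support_events": 20 },
    "limits":     { "limit_date": "2019-01-01", "flag": "up", "limit_process": 1000 },
    "splitminer": { "parallelism": 0.4, "filtering": 0.5, "remove_or_joins": true }
  }
DOC

puts params[:splitminer][:parallelism]   # => 0.4
```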
In the import phase B, the framework takes care of analyzing the tables contained in the initial database, navigating it through the xml_etl document 20. The goal is to extract from them only the necessary information, organising it within a new temporal database 16 containing time-related information. More precisely, it allows for the storage of:
- the instant when a certain event occurs, by means of the vt (valid time) attribute;
- the time interval in which a certain piece of information is valid, by means of the pair of attributes vts (valid time start) and vte (valid time end);
both refer to the temporal database, as sketched below.
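A minimal sketch of the two kinds of temporal rows just described, with hypothetical table columns:

```ruby
# an instantaneous event, stamped with vt (valid time)
event_row = { comment_id: 101, ticket_id: 42, vt: Time.utc(2020, 3, 5, 10, 30) }

# a piece of information valid over an interval, stamped with vts/vte
state_row = { ticket_id: 42, status: "open",
              vts: Time.utc(2020, 3, 1), vte: Time.utc(2020, 3, 20) }

p event_row
p state_row
```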
Initially the public schema of the new database is populated by creating the various tables defined in the xml_etl document 10b. Subsequently, the table relating to the master entity within the original database 10a is taken into account and, at the same time, the user-defined limits 14 in the params field are extracted. In particular, limit_date indicates the date from which to start the extraction, a flag specifies the time order to follow and limit_process the number of entities to extract. For each selected entry, the changes reported in the changelog register are extracted, together with all the other entities related to it. These data are then organized within the new temporal database 16, always following the structure imposed by the xml_etl metamodel 10b.
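A minimal sketch of how the user-defined limits could drive the extraction query for the master entity table; the table and column names, and the use of a plain SQL string, are illustrative assumptions:

```ruby
# Build the extraction query for the master entity from the limits group of params.
def master_entity_query(table:, date_column:, limit_date:, flag:, limit_process:)
  sql = "SELECT * FROM #{table}"
  sql << " WHERE #{date_column} >= '#{limit_date}'" unless limit_date == "-1"   # limit_date = "-1" means no lower bound
  order = (flag == "up") ? "ASC" : "DESC"
  sql << " ORDER BY #{date_column} #{order}" unless flag == "-1"                # flag = "-1" means no imposed order
  sql << " LIMIT #{limit_process}" if limit_process.to_i > 0
  sql
end

puts master_entity_query(table: "leads", date_column: "created_at",
                         limit_date: "2019-01-01", flag: "up", limit_process: 1000)
```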
The mining phase C is the phase of the event-array generation, i.e. of an array of tracks (event-array), each corresponding to a master entity and containing in turn the events extracted from the database. To do this, a Ruby library 18 is implemented that takes care of scrolling the tree structure of the xml_temp model 20, querying the database 16 step by step in order to build incrementally the strings related to the recorded events. Finally, each of the produced strings is associated with the relative timestamp and then ordered temporally, thus creating the tracks that compose the event-array. Each track is also preceded by the identifier of the entity that generated it (Table 1).
Table 1:
The leads are then divided into clusters. The structure of the event-array corresponds to a vector that relates each master entity (corresponding to a track) with an additional array containing, in turn, the following elements: all events belonging to the track identified by the lead in question and the timestamp associated with each event, which indicates the start time or the exact time it occurred (in the case of messages and comments).
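A minimal sketch of this event-array structure, with hypothetical lead identifiers and event labels:

```ruby
raw_events = [
  { lead_id: 42, vt: Time.utc(2020, 1, 10), label: "ticket_opened" },
  { lead_id: 42, vt: Time.utc(2020, 1, 12), label: "comment_added_by_support" },
  { lead_id: 7,  vt: Time.utc(2020, 2, 1),  label: "ticket_opened" }
]

# map each master entity (lead) to its track of [timestamp, event-string] pairs
event_array = Hash.new { |h, k| h[k] = [] }
raw_events.each { |e| event_array[e[:lead_id]] << [e[:vt], e[:label]] }
event_array.each_value { |track| track.sort_by!(&:first) }   # order each track in time

event_array.each do |lead_id, track|
  puts "#{lead_id}: #{track.map(&:last).join(' -> ')}"        # track preceded by the lead identifier
end
```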
Before proceeding with the event-array analysis, all possible values of the static attributes of the master entity are extracted and, for each of them, it is checked whether it is supported by the set of tracks. This kind of analysis consists in counting the number of occurrences of a certain instance (singleton) of the attribute, checking that it exceeds the threshold called epsilon_features (chosen a priori by the user). Thereby it is possible to customize the minimum degree of support required to select the values of interest related to the set of static attributes of the master entity. The following Table 2 shows the feature_sets after the singletons have been inserted. Table 2:
Once the singletons have been inserted into the table, each one generates a set of higher cardinality by associating it with all possible instances of another attribute, thus creating every permissible combination. This step is performed recursively until the maximum cardinality of the sets is reached, which corresponds to the number of static features of the master entity. Note that attribute sets that during the analysis turn out not to be supported are not further expanded, in order to avoid an unnecessary waste of time and resources (a sketch of this expansion follows the description of Table 3 below). Each generated set of attributes corresponds to a cluster of entities, i.e. a group of main entities that share the same instances of the attributes that compose the set. Table 3:
Looking at Table 3 representing the complete feature sets, the following features can be seen:
- the first six records that correspond to the starting singletons;
- the eight records that follow the singletons correspond to all possible combinations, of cardinality two (i.e. the maximum attainable cardinality), generated by the individual attributes;
- for each set the number of occurrences within the event log is indicated, together with whether it is supported by the latter.
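A minimal sketch of this level-wise expansion of supported feature sets, with hypothetical static attributes; epsilon_features plays the role of the user-chosen minimum support:

```ruby
leads = [
  { country: "IT", channel: "web"   },
  { country: "IT", channel: "email" },
  { country: "IT", channel: "web"   },
  { country: "DE", channel: "web"   }
]
epsilon_features = 2

support = ->(set) { leads.count { |lead| set.all? { |k, v| lead[k] == v } } }

# level 1: supported singletons
singletons = leads.flat_map(&:to_a).uniq
                  .map { |k, v| { k => v } }
                  .select { |s| support.(s) >= epsilon_features }

# expand only the supported sets, up to the number of static attributes
feature_sets = singletons.dup
frontier = singletons
until frontier.empty?
  frontier = frontier.flat_map { |set|
    singletons.reject { |s| set.key?(s.keys.first) }.map { |s| set.merge(s) }
  }.uniq.select { |set| support.(set) >= epsilon_features }
  feature_sets.concat(frontier)
end

feature_sets.each { |set| puts "#{set.inspect} supported by #{support.(set)} leads" }
```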
Next follows the definition of possible trigger events. Once all possible attribute sets have been generated, for each of them only the traces related to the leads that meet the static attribute constraints described in the set are extracted from the event array.
To do this, the lead identifiers that match the attributes of the set are identified and, given the event array, only the corresponding tracks are extracted. The following step takes care of recursively analyzing each of the tracks previously selected. The objective of this step is to extract from each of these traces all the events that compose them, bringing them back into a new table associated with the corresponding set. For each extracted event it is verified whether it is supported by the set of tracks present in the event array. This analysis consists in counting the number of distinct tracks in which the event in question appears, verifying that it exceeds the threshold called epsilon_events (selected by the user). Thereby it is possible to customize the minimum degree of support required for what could be the trigger events for the extracted behaviours. Table 4 shows the feature_event_sets.
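A minimal sketch of the trigger-event support check for one feature set, with hypothetical tracks; epsilon_events is the user-chosen minimum number of distinct tracks:

```ruby
tracks = [
  %w[ticket_opened comment_added ticket_closed],
  %w[ticket_opened ticket_closed],
  %w[comment_added ticket_closed]
]
epsilon_events = 2

# an event is a candidate trigger if enough distinct tracks contain it
candidate_triggers = tracks.flatten.uniq.select do |event|
  tracks.count { |track| track.include?(event) } >= epsilon_events
end
p candidate_triggers   # => ["ticket_opened", "comment_added", "ticket_closed"]
```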
Table 4:
At this stage follows the event log generation, i.e. the generation of the documents in XES (eXtensible Event Stream) format 24 related to the traces that identify the observed behaviours of the various lead clusters, with the aim of identifying the possible triggers. The first step consists in scrolling through all the pairs (set, event) inside the feature_event_sets table, in order to search for the first occurrence of such an event within the tracks belonging to the leads selected through the constraints represented by the set of attributes. Once the location of the first occurrence of the event in question has been found, one proceeds by cutting the analysed trace at that point. In this way, the result obtained corresponds to a sub-track of the initial track, having the event of interest as its start event. At this point, it is possible to generate the XES document 24 related to the new sub-tracks corresponding to a certain cluster of master entities. Once all the event logs have been generated and represented using the XES standard, the relative process models are extracted from them using a tool that implements process discovery techniques. This last step allows to extract only the models of processes that represent the main behaviours of the leads in contact with the company, also identifying possible triggering events.
Then a web application structure is developed. The first step is the creation, for example, of a Docker application, represented by a virtual network of containers. This network has some characteristics in common with a black box, since from the outside it only shows the input and output relations. The application contains four services organized among themselves and represented by the relative containers: Jupyter, Postgres, Redis and Sinatra. These containers are able to communicate with each other within the network through the port mapping defined in the docker-compose.yml. In particular, the main container Jupyter communicates with Postgres to get access to the databases with read and write permissions. In this way, after generating the XES files 24 related to the lead clusters, it communicates with the two process mining tools 28 by means of a Redis database (not represented) contained in a corresponding container.
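Returning to the trace-cutting step, a minimal sketch of how each selected track could be truncated at the first occurrence of a candidate trigger event, with hypothetical event labels:

```ruby
# Return the sub-track starting at the first occurrence of the trigger,
# or nil when the track never contains the trigger.
def sub_track(track, trigger)
  idx = track.index(trigger)
  idx ? track[idx..] : nil
end

tracks = [
  %w[lead_created ticket_opened comment_added ticket_closed],
  %w[lead_created newsletter_sent]
]
sub_tracks = tracks.filter_map { |t| sub_track(t, "ticket_opened") }
p sub_tracks   # => [["ticket_opened", "comment_added", "ticket_closed"]]
```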
To make this last part possible, a web server is created in Ruby 26 through the Sinatra 28 gem. This gem was chosen for its intuitiveness in the creation of a minimal web server that allows using Split Miner and Markovian Fitness and Precision (MFP) as web services 28. These tools would otherwise be bootable only from the command line while, in this way, they can be used inside a container and communicate, in a simple way, with the other services.
Sinatra 28 was chosen since no complex web server is needed, but simply a server that writes and starts commands on the underlying machine terminal and reports the results to the calling container. In addition, it is important to use a minimal web server, as it is started on the same machine where Split Miner is present, which, during the process discovery phases, makes use of most of the available resources.
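A minimal sketch of the kind of Sinatra wrapper described above, which only receives an event log, runs a command-line tool on it and returns the output; the route, the file paths and the exact Split Miner invocation are assumptions, not the actual service:

```ruby
require 'sinatra'
require 'open3'

post '/discover' do
  xes_path = '/tmp/input.xes'
  File.write(xes_path, request.body.read)               # store the received event log
  stdout, stderr, status = Open3.capture3(
    'java', '-jar', '/opt/splitminer/splitminer.jar', xes_path, '/tmp/output'
  )
  halt 500, stderr unless status.success?
  send_file '/tmp/output.bpmn'                          # return the discovered model
end
```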
Another step in the mining section C is the process discovery. Once the virtual network of containers has been generated, it is possible to discover the process models or templates related to the XES documents 24 produced in the previous step. To do this, the Jupyter container communicates with the Redis container by saving an event log in XES format. At this point the Sinatra container 28 extracts the document and passes it as input to the Split Miner tool, running inside the physical machine. The latter, through process discovery techniques, is able to extract the process model corresponding to the XES document. The XES document and the process model are saved on the web server in order to be used in the next step as input for the second tool. Once the process model is discovered, the system proceeds with the extraction of the metrics to assess the actual quality of the model obtained. This procedure is performed using the Markovian Fitness and Precision (MFP) tool.
The Jupyter container, as soon as it receives the BPMN process generated by Split Miner, communicates again with the Sinatra container 28, requesting the calculation of the metrics related to the event log generated in the previous step and the extracted process model. The server takes care of passing the two documents as input to the Markovian Fitness and Precision tool, also present inside the physical machine. The latter, applying conformance checking techniques, is able to provide the metrics that describe, in terms of fitness, precision and fscore, the quality of the model compared to the original event log.
The metrics obtained are then compared with the following thresholds 30: epsilon_fitness, epsilon_precision and epsilon_fscore (user-defined), with the aim of defining when a process model can be considered as actually conforming to the event log from which it was created. Finally, the results obtained by using the two process mining tools just mentioned are saved inside the output table 32 in the database, thus switching to the output section D.
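A minimal sketch of this acceptance test on the conformance metrics; the metrics hash is assumed to come from the MFP tool and the threshold values are illustrative:

```ruby
def conforming?(metrics, thresholds)
  metrics[:fitness]   >= thresholds[:epsilon_fitness]   &&
  metrics[:precision] >= thresholds[:epsilon_precision] &&
  metrics[:fscore]    >= thresholds[:epsilon_fscore]
end

metrics    = { fitness: 0.92, precision: 0.81, fscore: 0.86 }
thresholds = { epsilon_fitness: 0.8, epsilon_precision: 0.7, epsilon_fscore: 0.75 }
puts conforming?(metrics, thresholds)   # => true; the model would be saved in the output table
```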
Each instance of the framework execution produces as output a series of entries in the same output table, within the related database. Below are the descriptions of the columns that make up this table, in order to facilitate the reading of the results produced (an illustrative entry is sketched below):
- id: identifier of each entry in the table;
- set: set of instances of static attributes that identify a type of master entity;
- event: event triggering the behaviours recorded for that type of entity;
- xes: event log reported in XES format;
- bpmn: process model extracted from the event log, reported in BPMN format;
- bpmn_translated: the same process model just mentioned, with some changes to task labels in order to improve readability for the final user;
- metrics: metrics that describe the quality, in terms of fitness, precision and fscore, of the process model extracted from the event log;
- n_events: integer representing the number of events present within the extracted process model;
- n_task: integer representing the number of tasks/activities inside the extracted process model;
- n_gateways: integer representing the number of gateways inside the extracted process model;
- module: field that indicates the name of the master entity on which the framework has been executed.
The overall execution of the framework is divided into three main parts. Each of them can be started individually by indicating the identifier of the entry, in the input table, for which it is desired to run the framework portion.
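An illustrative entry of the output table, written as a Ruby hash; all values are invented for the example:

```ruby
output_entry = {
  id: 1,
  set: { country: "IT", channel: "web" },
  event: "ticket_opened",
  xes:  "<log xmlns=\"http://www.xes-standard.org/\">...</log>",
  bpmn: "<definitions>...</definitions>",
  bpmn_translated: "<definitions>...</definitions>",
  metrics: { fitness: 0.92, precision: 0.81, fscore: 0.86 },
  n_events: 14, n_task: 9, n_gateways: 3,
  module: "leads"   # master entity on which the framework has been executed
}
```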
Given an id identifier, any of the three phases can only be started after having completed at least once all the ones before it. The commands can be started from the command line only after having accessed the container containing the (running) Sinatra server, which can be started by means of the command ruby sinatra-main.rb from the directory that contains it.
The first step is to generate the temporal database 16 related to the identifier and the module for which it is desired to perform process mining. This database will contain only the (empty) public schema (populated in the next phase) and the mining schema (used in the last phase). The name of the new database will be of the type "temporal_<id>_<module>" and, if it was already present, it will be regenerated by deleting its contents.
The second part deals with the import phase, described in detail above, creating, in the public schema, the tables with the respective fields specified by the user in the xml_etl metamodel 10b. Finally, each of these tables is populated with the data extracted (and timestamped) from the source database 10a.
The last part deals with the mining phase, i.e. the extraction of tracks, the division of leads into clusters, the extraction of possible trigger events and the discovery (and evaluation) of process models. It then deals with populating the final output table 32, making it available to the end user for their own analysis.
The machine learning technology starts with an input of structured and/or unstructured data. These data must be of good quality: according to the "garbage in, garbage out" logic, if the algorithms are instructed with incomplete or incorrect data, too many errors will be generated for the results to be usable. The main problem is that these quality data are typically owned by the user companies (suppliers of products and services). The user interface according to the invention allows end users to easily use algorithms from a marketplace and implement them in their organization. It has been explained above how the selection, testing and production process of the different algorithms chosen by the end user for their own purposes and business processes takes place. As explained before, AI algorithms on the market today are subject to very fast deterioration, since much better performing algorithms are invented every day. The possibility for the end user to use algorithms from different suppliers in the same platform at competitive prices, and to be able to change them in real time, has not been possible up to now with the pre-existing platforms. Furthermore, it is not possible to make comparisons between one solution and another and, not least, the prices of the proposed solutions are typically unattainable for the average end-user company. The growing complexity of processes involving multiple data sources and multiple choices under time-critical, resource-critical and mission-critical requirements can, according to the invention, be handled systematically by a framework that is:
a) flexible: it allows updates of the models, AI algorithms and feature selection algorithms in a fast, consistent and controllable way, in order to implement the constant changes in regulations, technologies and policies;
b) accessible: it is usable, at least in its basic functionalities, even by a domain expert, in an independent fashion, without the constant help of computer scientists. More precisely, integration, tuning and testing of a new model, AI algorithm or feature selection algorithm can be done without writing raw code;
c) affordable: AI integration for real-world productivity boosting often requires cherry picking, of both models and features. Roughly speaking, there is no philosopher's stone of AI models/algorithms that is able to automatically turn every kind of data into gold for the company. This is due to the fact that AI models work under different statistical assumptions, and thus selecting the best one for every goal cannot avoid an extensive phase of trial and error, which here is carried out by the proposed CRM framework.
The machine learning system selects the appropriate type of learning, for example supervised or unsupervised. At the end, the data output is produced in the form of a prediction, a classification or an exploratory analysis.
If testing a single AI algorithm for a given process implies license, expert consulting and even implementation costs (i.e., making a system talk to the API of the algorithm) and time consumption, without the assurance of certain results, small and medium companies will be driven away by the excessive risk of implementing it in their day-to-day activities.
The consequence is that access to the benefits of AI-enabling technologies in the state of the art will be granted only to big companies that can afford armies of top-notch consultants.
The invention's solution is a framework for integrating AI plugins over the CRM software, preferably with an extensible and full-featured BPMN engine at its core. The extension of this CRM machine offered by the invention adds the following main modules: a temporal extraction module that is responsible for turning all the data collected by the CRM into a temporal database representation that is easy to manage for any kind of analysis to be performed.
A process discovery module that allows to discover implicit processes relative to various entities from the data produced by the temporal extraction module. All the discovered processes are already integrated in the CRM and may be modified, executed and integrated with the present and future plugins described in the classification manager module section below.
In the classification manager module, decision points in a BPMN process are represented by exclusive gateways. The importance of having a formal process representation like BPMN is that it allows determining where and when, during the execution of a process, to take advantage of a decision support. Moreover, the process may be seen as a way to supervise the application of AI methods by providing suitable recovery mechanisms, still in the form of BPMN task exceptions and control flow elements, for the inevitable errors conveyed by statistical methods.
Finally, a forecast manager module helps to verify if certain paths in the process may increase the probability of the desired outcome.
Obviously, not all the modules must be used together; for instance, if a process is already formalized the process discovery module does not come into play. Practically, in a SaaS (software as a service) mode, a special online area is activated on cloud computing systems where, after registration, one can import and synchronize one's data through different methods, i.e. direct access to one's databases, REST (representational state transfer) services, flat files via FTP (file transfer protocol), Excel/CSV (comma-separated values) import and a series of pre-configured connectors for the most popular ERP, CRM and marketing suites. Alternatively, a plugin mode can be implemented, without having to activate the online area.
The data thus imported/synchronized and fed from and to the outside can be used to train result classifiers. The user defines an information silo, indicates which fields of a given object are those containing the information to be automatically catalogued and indicates a field as the value to be predicted by artificial intelligence training. In this way, as soon as the artificial intelligence training is finished, the classifier is ready to be used through a properly configured BPMN process. Subsequently it will also be possible to produce other trainings for the same classifier, as better performing and improved artificial intelligence algorithms are periodically added, which, in case of increased reliability in the prediction, will replace the existing classifiers for that particular service or data silo.
Further expedients adopted in the flow foresee the presence of algorithms aimed at optimizing the predictions based on the language used; moreover, the cleaning of words not useful for the cataloguing is carried out automatically, removing for example the introductory greetings in a text. The classified data is returned by the centralized cloud-based CRM backoffice system and directly updates the relevant field in the CRM system in use. This update, as well as other changes made to the records through external interactions, can be subject to the activation of a dedicated process, for example to update a third external CRM or ERP system, according to a microservice logic.
Another natively present functionality is represented by the process discovery module that allows defining agents to analyze the historical log of system interactions. Also in this case the interface used allows selecting a data set, a particular silo of information, from which artificial intelligence can produce a series of processes in BPMN format and then display the operational flows and the lead time of the events themselves. This is done through the automatic transformation of all the data present in the system into an anonymous temporal database.
The processes highlighted can be used to analyze and improve the company's operational flow performance and can also be activated directly from the interface, i.e. they can be the starting point for the improvement of operational activities. The transformation flow of data into a temporal database has been specified in more detail above.
Basically the CRM backoffice allows activating, in a simplified way, artificial intelligence plugins (classifier and forecaster) and others that will be available through the dedicated marketplace. The transformation, preparation, delivery and orchestration of data and flows can be made available through a simple configuration interface.
Each new artificial intelligence plugin connected to the CRM service is then modified or engineered in order to provide plug-and-play intelligence. The user only has to worry about correctly importing the data into the CRM interface in use, activating the pre-configured agents according to his needs and activating the related automated flows that these automatisms will manage.
During the implementation phase, the computer-implemented method and related products, like the computing system, which are the object of the invention, may be subject to further modifications or variants not described. Should such modifications or variants fall within the scope of the following claims, they shall all be deemed to be protected by this patent. In practice, the components used, as well as the dimensions, numbers and shapes, as long as they are compatible with the specific use and function and unless otherwise specified, may be any, depending on the needs. In addition, all details can be replaced by other technically equivalent elements.
REFERENCES
[1] Conneau, A., Schwenk, H., Barrault, L., & Lecun, Y. (2016). Very deep convolutional networks for text classification. arXiv preprint arXiv:1606.01781.
[2] Koren, Yehuda, and Robert Bell. "Advances in collaborative filtering." Recommender systems handbook. Springer, Boston, MA, 2015. 77-118.
[3] Kanawaday, Ameeth, and Aditya Sane. "Machine learning for predictive maintenance of industrial machines using IoT sensor data." 2017 8th IEEE International Conference on Software Engineering and Service Science (ICSESS). IEEE, 2017.
[4] Mishra, Amit, and Sanjay Kumar Jain. "A survey on question answering systems with classification." Journal of King Saud University - Computer and Information Sciences 28.3 (2016): 345-361.
[5] OMG (2011). Business Process Model and Notation (BPMN), Version 2.0. Object Management Group (Technical report, Object Management Group).
[6] Malhotra, Ruchika. "A systematic review of machine learning techniques for software fault prediction." Applied Soft Computing 27 (2015): 504-518.

Claims

1) A computer-implemented method for operating a system for customer relationship management comprising the following steps: (I) an input phase (A) comprising the preparation of a CRM database (10a) containing static features and logs of events generated by company processes and optionally external processes;
(II) an import phase (B) comprising the translation (52) of said database (10a) into a corresponding temporal database (16); (III) a process preparation phase wherein
(III-i) at least one implicit process (56a) is discovered in a process discovery phase by mining (C) said temporal database (16), expressing the discovered processes as explicit process models in an event array, and/or
(III-ii) at least one explicit process (56b) is selected from defined process models already present in said temporal database (16);
(IV) classification of said selected and/or discovered process models and/or extracting of suitable process models and optionally modifying said process models by an artificial intelligence system and listing of said classified and/or extracted and/or modified process models in an output phase (D) and storing them in said database (10a); (V) execution of at least one stored process selected among the classified and/or extracted and/or modified process models.
2) The method according to claim 1 characterized in that said process models are in a BPMN format, optionally integrated with an artificial intelligence system to be executed at a certain stage of the process model.
3) The method according to claim 1 or 2 characterized in that before step (V) the process models are revised by an artificial intelligence system in a forecast manager module (64) taking data and process history from the temporal database (16) to identify inside the process models paths which have an increased probability to guide the process to the desired outcome and storing said revised process models in said database.
4) The method according to anyone of the preceding claims characterized in that said artificial intelligence systems to be applied in said classification (62), extraction (52), modification (60, 68) and/or forecasting (64) operations are selected according to predefined criteria among a plurality of artificial intelligence systems which are tested against the corresponding process model.
5) The method according to claim 4 characterized in that said artificial intelligence systems are comprised in a market place (67) for open source systems.
6) The method according to anyone of the preceding claims characterized in that said databases (10a) contain the data and event logs and process models in a generalized manner without sector or department specific attributes and preferably in an anonymized manner.
7) The method according to anyone of the preceding claims characterized in that said temporal database (16) comprises information regarding the moment in which a certain event occurs and the time interval for which a certain information is valid.
8) The method according to anyone of the preceding claims characterized in that said discovery step (III-ii) further comprises the identification of a trigger event within the process model.
9) The method according to anyone of the preceding claims characterized in that in step (III-ii) said process models are further modified by a human user who selects in candidate process models (58) desired and undesired outcomes to train the artificial intelligence system employed and in that optionally a comparison function evaluates the changes introduced in terms of process improvements and time flow.
10) The method according to anyone of the preceding claims characterized in that said process models stored in said databases (10a, 50) are periodically revised by new artificial intelligence systems which have not yet been applied.
11) The method according to anyone of the preceding claims characterized in that at the end of step (III-ii) the metrics, in particular fitness, precision and score of the discovered process models are tested with predefined thresholds (30) and that only the process models that passed the test are expressed in the event array.
12) The method according to anyone of the claims from 8 to 11 characterized in that the execution in step (V) is automatically performed when a certain static feature and trigger event fall together.
13) The method according to anyone of the preceding claims characterized in that step (IV) comprises detecting a value to be controlled and driving the process to the desired outcome whenever the process reaches a given trigger event.
14) The method according to anyone of the preceding claims characterized in that said database (10a) comprises a change-log functionality regarding temporal entities and attributes for the discovery of trigger events, an XML description (10b) wherein the set of events to be considered when looking for trigger events can be restricted, and thresholds for data mining parameters such as support, precision, fitness and score.
15) The method according to anyone of the preceding claims characterized in that said database (10a) comprises an XML metamodel (10b) containing lifespan and static features, temporal features, events, thresholds for mining and classifications, and classifications, the latter ones being inserted at the end of the classification/mining step (C), an id, and a temporal XML metamodel (20) to move inside the temporal database (16) for extracting traces regarding corresponding leads, wherein said XML metamodel is general, being applicable to every context which uses a database containing change-logs relative to attributes and temporal entities, with the possibility to personalize it specifying hierarchies and aliases, with an option to exclude the source of relative changes and to decide whether or not to consider the order number with respect to the temporal entity.
16) The method according to anyone of the preceding claims characterized in that said event array has a structure corresponding to a vector which connects any master entity, i.e. lead, with another array containing all the events belonging to the track identified for a lead and a timestamp associated to any event which indicates the exact moment when it took place, further counting the number of occurrences of a certain instance (singleton) exceeding a predefined threshold and defining a table of cardinalities connecting it to the possible instances of another attribute and creating possible combinations forming entity clusters.
17) The method according to claim 16 characterized in that for each group of attributes only the events relative to the leads which satisfy the requirements of the static attributes are extracted from the event array.
18) The method according to claim 17 characterized in that for every extracted event it is verified whether it is supported by the group of traces present in the event array, exceeding the corresponding threshold (epsilon_events).
19) The method according to claim 18 characterized in that the pairs of sets and events are controlled to find the first occurrence of the event among the traces belonging to a selected lead through the limits (14) represented by the set of attributes, truncating the analyzed trace at this point and creating sub-traces having as a start event the event of interest (trigger) expressed.
20) The method according to anyone of the preceding claims characterized in that step (III-ii) comprises the following steps:
(a) an xml description (20) extracts from the database (16) an event array which is a representation of the set of traces representing the implicit processes; (b) a process discovery tool (28) starts to enumerate all the combinations of static features which are sufficiently represented, i.e. above a certain, in particular user defined, support threshold, in the event array;
(c) for each combination of static features obtained at (b) the process discovery tool (28) filters the traces associated to customers with static features in the event array and on this set of traces the tool selects the events with higher occurrences, again according to some defined threshold (30);
(d) for every pair static feature / event obtained at steps (b) and (c) a BPMN diagram is generated; (e) every BPMN diagram obtained at step (d) is checked against the thresholds for the metrics fitness, precision, and score;
(f) if the BPMN diagram obtained at step (d) successfully passes the test performed at step (e), the triple BPMN diagram/static feature/event is saved in the database, otherwise it is discarded.
21) The method according to anyone of the preceding claims characterized in that the steps for process discovery (C) are performed in a web-based application of a virtual net of containers which communicate with each other, wherein a first container communicates with a second container to have access to the databases with read and write permission to create XES files (24) regarding lead clusters and then communicates with process mining tools contained in a fourth container (28), a web server, through a database and message broker contained in a third container, wherein preferably the first container extracts event logs in XES format (24), and the mining tools (28) work with process models in BPMN format.
22) The method according to anyone of the preceding claims characterized in that the artificial intelligence systems applied are suitable for at least one member of the group consisting of fault prediction, natural language processing and understanding, and customer behaviour prediction based on collaborative filtering techniques.
23) The method according to anyone of the preceding claims characterized in that the classification and/or forecasting steps (62, 64) are supported by a text miner.
24) The method according to anyone of the preceding claims characterized in that the process models stored during step (IV) are integrated with an alert function.
25) A computing system for customer relationship management, comprising:
(a) a CRM database (10a) for recording customer data including static features and logs of events generated by company processes and optionally external processes and being associated with
(b) a temporal extraction module (52) adapted to translate the database into a temporal database (16) as an input for AI algorithms;
(c) downstream of the temporal extraction module (52), a process discovery module (54) adapted to discover implicit processes (56a) related to various entities from the data produced by the temporal extraction module (52), and optionally to modify the processes to improve the execution flow;
(d) a classification manager module (62) adapted to perform one or more of the following activities on the processes furnished by the process discovery module (54) or directly by the temporal extraction module (52): selection and integration of AI plugins, searching for errors, implementing statistical methods for testing and scoring a classification algorithm to find the best mode for decision support for each gateway, and selecting the modality of execution, i.e. supervised, unsupervised, semi-supervised;
(e) optionally a forecast manager module (64) adapted to take data from the temporal extraction module (52) or the process discovery module (54) to verify if certain paths have an increased probability to achieve the desired outcome, preferably by testing various forecast systems against process history and selecting the most accurate for predicting the probability of the given outcome;
(f) optionally a library (67) of AI plug-ins for classification, forecasting and/or process discovery purposes feeding the classification manager module (62) and, if present, the forecast manager module (64);
(g) an interface to permit the user to interact with at least the process discovery module (54) and the output of the classification manager module (62) and the forecast manager module (64), if present, and wherein the interface optionally comprises an alert system; wherein the above listed elements (a) to (g) are located on a computer or data processor and/or in a cloud computing system, and wherein the modules can be organised in containers including a web-based application.
26) A data processing system comprising means for carrying out the steps of the method according to anyone of the claims from 1 to 24.
27) Computer program comprising instructions which, when the program is executed by a computer, cause the computer to carry out the steps of the method of anyone of the claims from 1 to 24.
28) A data carrying signal carrying the computer program of claim 27.
29) Computer-readable storage medium comprising instructions which, when executed by a computer, cause the computer to carry out the steps of the method of anyone of the claims from 1 to 24.
PCT/IT2020/000055 2020-07-03 2020-07-03 Platform and method for pluggable ai and machine learning cataloguing and prediction WO2022003737A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/IT2020/000055 WO2022003737A1 (en) 2020-07-03 2020-07-03 Platform and method for pluggable ai and machine learning cataloguing and prediction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/IT2020/000055 WO2022003737A1 (en) 2020-07-03 2020-07-03 Platform and method for pluggable ai and machine learning cataloguing and prediction

Publications (1)

Publication Number Publication Date
WO2022003737A1 true WO2022003737A1 (en) 2022-01-06

Family

ID=72266601

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IT2020/000055 WO2022003737A1 (en) 2020-07-03 2020-07-03 Platform and method for pluggable ai and machine learning cataloguing and prediction

Country Status (1)

Country Link
WO (1) WO2022003737A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20150082864A (en) * 2014-01-08 2015-07-16 국립대학법인 울산과학기술대학교 산학협력단 Method and appartus for process model discovery using process mining
US20170111245A1 (en) * 2015-10-14 2017-04-20 International Business Machines Corporation Process traces clustering: a heterogeneous information network approach
US20180083825A1 (en) * 2016-09-20 2018-03-22 Xerox Corporation Method and system for generating recommendations associated with client process execution in an organization
WO2018138601A1 (en) * 2017-01-25 2018-08-02 Koninklijke Philips N.V. Generating a process model
US10592544B1 (en) * 2019-02-12 2020-03-17 Live Objects, Inc. Generation of process models in domains with unstructured data

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20150082864A (en) * 2014-01-08 2015-07-16 국립대학법인 울산과학기술대학교 산학협력단 Method and appartus for process model discovery using process mining
US20170111245A1 (en) * 2015-10-14 2017-04-20 International Business Machines Corporation Process traces clustering: a heterogeneous information network approach
US20180083825A1 (en) * 2016-09-20 2018-03-22 Xerox Corporation Method and system for generating recommendations associated with client process execution in an organization
WO2018138601A1 (en) * 2017-01-25 2018-08-02 Koninklijke Philips N.V. Generating a process model
US10592544B1 (en) * 2019-02-12 2020-03-17 Live Objects, Inc. Generation of process models in domains with unstructured data

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
"Technical report", 2011, OBJECT MANAGEMENT GROUP, article "Business Process Model and Notation (BPMN"
CONNEAU, A.; SCHWENK, H.; BARRAULT, L.; LECUN, Y.: "Very deep convolutional networks for text classification", ARXIV PREPRINT ARXIV:1606.01781, 2016
KANAWADAY, AMEETH; ADITYA SANE: "2017 8th IEEE International Conference on Software Engineering and Service Science (ICSESS)", 2017, IEEE, article "Machine learning for predictive maintenance of industrial machines using IoT sensor data"
KOREN, YEHUDA; ROBERT BELL: "Recommender systems handbook", 2015, SPRINGER, article "Advances in collaborative filtering", pages: 77 - 118
MALHOTRA, RUCHIKA: "A systematic review of machine learning techniques for software fault prediction", APPLIED SOFT COMPUTING, vol. 27, 2015, pages 504 - 518
MISHRA, AMIT; SANJAY KUMAR JAIN: "A survey on question answering systems with classification", JOURNAL OF KING SAUD UNIVERSITY - COMPUTER AND INFORMATION SCIENCES, vol. 28.3, 2016, pages 345 - 361

Similar Documents

Publication Publication Date Title
US7574379B2 (en) Method and system of using artifacts to identify elements of a component business model
Raisinghani Business intelligence in the digital economy: opportunities, limitations and risks
CA3051499C (en) Platform product recommender
Hull et al. Data management perspectives on business process management: tutorial overview
US20210342723A1 (en) Artificial Intelligence Techniques for Improving Efficiency
WO2018236886A1 (en) System and method for code and data versioning in computerized data modeling and analysis
Li et al. Digital Platform Ecosystem Dynamics: The Roles of Product Scope, Innovation, and Collaborative Network Centrality.
US20210398232A1 (en) System and method for implementing a market data contract analytics tool
US20230281212A1 (en) Generating smart automated data movement workflows
US20140149186A1 (en) Method and system of using artifacts to identify elements of a component business model
WO2022003737A1 (en) Platform and method for pluggable ai and machine learning cataloguing and prediction
Xu et al. Designing for unified experience: a new perspective and a case study
da Silva Process Mining: Application to a case study
Kumar Software Engineering for Big Data Systems
Szafir Digital Transformation Enabled by Big Data
Salgueiro The Impact of Microsoft Power Platform in Streamlining End-to-End Business Solutions: Internship Report at Microsoft Portugal, Specialist Team Unit
Edmondson Learning Google Analytics
KR102425519B1 (en) Method for enabling business process model transaction, computer readable recording medium, and computer system
Ta et al. A specification framework for big data initiatives
Chikh Component-based approach for requirements reuse
Wagner Platform Infrastructure for Agile Software Estimation
Bister The business intelligence transformation–A case study research
Jagare Operating AI: Bridging the Gap Between Technology and Business
Grohmann Fundamental Data Mining Techniques for Declarative Process Mining
Coll Ribas A DataOps reference architecture for Data Science

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20764172

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20764172

Country of ref document: EP

Kind code of ref document: A1