GB2611852A - A system and method for selecting a service supplier - Google Patents

A system and method for selecting a service supplier

Info

Publication number
GB2611852A
GB2611852A GB2211376.5A GB202211376A
Authority
GB
United Kingdom
Prior art keywords
data
computer system
customer
usage
tender
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
GB2211376.5A
Other versions
GB202211376D0 (en)
Inventor
Llewellyn Loveday Colin
Sebastian Hill Warrick
Mary Cooper Jessica
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
North Swell Tech Ltd
Original Assignee
North Swell Tech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by North Swell Tech Ltd filed Critical North Swell Tech Ltd
Publication of GB202211376D0
Publication of GB2611852A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06313Resource planning in a project environment
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/09Supervised learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0633Workflow analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0637Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data
    • G06Q30/0202Market predictions or forecasting for commercial activities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/06Buying, selling or leasing transactions
    • G06Q30/0601Electronic shopping [e-shopping]
    • G06Q30/0611Request for offers or quotes
    • G06Q50/60

Abstract

A computer system for selecting a service provider comprises a customer data processing module for receiving customer usage information and a machine learning module that comprises a trained data model which analyses the usage information and predicts future customer usage. The system further comprises a benchmarking module which compares the usage information with benchmarking data, a tender database which formats client tender information and creates an invitation to tender (ITT), a supplier database to which ITT responses are uploaded by suppliers, and a comparison module which compares and ranks the responses in accordance with one or more criteria and provides the ranking to the customer. The customer selects one of the responses, supported by the usage information and the ranking of responses. This allows the customer to make better-informed decisions about which supplier to use, by using an automated system to match customer needs to the best supplier solution.

Description

A System and Method for Selecting a Service Supplier
Introduction
The present invention relates to a system and method for selecting a service supplier and in particular to a system and method for selecting a service supplier by analysing service usage and supplier provided information.
Background
Businesses of all sizes use mobile devices for phone calls and mobile data. In a typical business, the mobile device contract will be a significant cost because of the high number of devices provided under the contract and the ever-increasing use of these devices. The market for providing mobile devices to businesses is dominated by a small number of service suppliers.
When purchasing a business mobile contract, a business (customer) will typically issue an invitation to tender (ITT) and seek responses from service suppliers. This requires the business to assess and identify its current device usage and create an ITT based on the information gathered. Such a process is often done by a business's IT director and finance team and places a significant resource burden on both. The process is very time consuming and difficult because of the high volume of data involved. In addition, most businesses do not have the expertise and market knowledge to find the best suppliers.
Once the ITT has been created, responses to the ITT are made by a range of service suppliers. It is common for these responses to be set up such that they emphasise the features of the service offering that the service supplier wishes to promote, at the expense of the business's requirements. Given the variety of tender responses and formats, it is often difficult and complex for the business to compare ITT responses from different suppliers. Managing the entire process in-house requires the management and analysis of large volumes of data. It is common for the customer to be overwhelmed by data and to be in receipt of much additional information, calls and meetings from service suppliers who are competing for their business. Therefore, the decision often ends up being a best guess, or they choose the safe option of retaining the incumbent service supplier.
In some cases, a business will use an online comparison website such as Compare the Market™ or a business comparison website such as Billmonitor™ to assist the in-house IT and finance departments with this process. Comparison websites come from the consumer market and do not have the business mobile expertise and data analytics to give a bespoke result. In addition, they can focus on price rather than taking into account other key business factors.
An alternative is to outsource the entire tender and procurement process to a cost reduction consultant. Cost reduction consultants tend to be generalists rather than business mobile experts, so are unlikely to have the expertise to get the best deal for the customer, and can be a costly option.
The solutions available at present require the manual processing of complex information, or present the illusion of a computer-generated solution which merely automates some aspects of the supplier selection process without reducing the complexity of the task, without providing the customer with any insight into the best solution for their business, and without reducing the risk that the customer has made the wrong decision or giving them confidence that they have made the correct decision.
Summary of the Invention
It is an object of the invention to provide an improved system and method for selecting a service provider which addresses the above problems.
In accordance with a first aspect of the invention there is provided a computer system for selecting a service provider, the system comprising: a customer data processing module for receiving customer usage information; a machine learning module which comprises a trained data model which analyses the customer usage information and predicts future customer usage; a benchmarking module which compares customer usage information with benchmarking data; a tender database which formats client tender information and which creates an invitation to tender (ITT); a supplier database to which ITT responses are uploaded by one or more suppliers; a comparison module which compares the ITT responses, ranks the ITT responses in accordance with one or more criteria and provides ranking information to the customer; wherein the customer selects one of the ITT responses supported by the customer usage information and ranking of the ITT responses.
Preferably, the customer data processing module reformats (and anonymises) the customer data to allow the benchmarking module to compare it with a suitable benchmark tariff to determine a savings calculation.
Preferably, the invitation to tender presents the customer with a series of graphical user interfaces which contain questions designed to extract information from the customer about their requirements in a suitable format.
Preferably, the suitable format is a standardised format which compels suppliers to provide information in a way which makes it readily comparable with information provided by alternative suppliers.
Preferably, the comparison module compares the ITT responses by calculating parameter values for parameters from categories of information defined in the tender document and ranking the tender based on the categories.
Preferably, a weighting is applied to the categories.
Preferably the weighting may be altered.
Preferably, a tool is provided to the customer to change the weightings.
Preferably, the ranking information is provided in a numerical or pictorial format.
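The comparison and weighting scheme described above can be sketched as follows (an illustrative Python sketch; the supplier names, category scores and weight values are hypothetical, not taken from the specification):

```python
def rank_responses(responses, weights):
    """Rank supplier ITT responses by weighted category scores.

    responses: {supplier_name: {category: score}}
    weights:   {category: weight}, e.g. adjusted by the customer's slider tool
    Returns a list of (supplier_name, total_score), best first.
    """
    ranked = []
    for supplier, scores in responses.items():
        # Sum each category score multiplied by its (customer-adjustable) weight.
        total = sum(scores.get(cat, 0.0) * w for cat, w in weights.items())
        ranked.append((supplier, round(total, 2)))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked

responses = {
    "Supplier A": {"price": 8, "coverage": 6, "support": 7},
    "Supplier B": {"price": 6, "coverage": 9, "support": 8},
}
weights = {"price": 0.5, "coverage": 0.3, "support": 0.2}
print(rank_responses(responses, weights))
```

Because the weights are an input rather than a constant, re-running the ranking after the customer moves a weighting slider immediately reorders the suppliers.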
Preferably, the machine learning module comprises a neural network.
Preferably, the trained data model is trained using sample data which comprises historical usage information.
Preferably, after pre-processing, each sample consists of a single row, with one column per input feature and one per output feature.
Preferably, the sample data is received in different categories.
Preferably, the sample data is provided in three categories: bill analysis, bill analysis summary and calls.
Optionally, different names are used for bill analysis, bill analysis summary and calls depending on the billing platform. In at least one embodiment, the information is contained within a summary report and a fully itemised report of every action from every user over the measured billing period. The customer is asked to upload summary data and itemised data, although these terms may not correspond to the customer-facing words that are coded into the platform.
Preferably, the sample data comprises usage data spanning a predetermined period to allow prediction of future usage.
Preferably, the sample data comprises usage data spanning the predetermined period for at least one category of sample data.
Preferably, the sample data comprises usage data spanning the predetermined period for each category.
Preferably, the sample data contains months 1 to 3 of usage data for each category per day to allow prediction of month 5 usage.
Preferably, the performance of the trained data model is evaluated by splitting the data into training data, validation data and testing data.
Preferably, the training data set is used to learn the network's weights.
Preferably, the validation data set is used to assess the trained data model's output after each training run to allow adjustment of hyperparameters and/or network topology in order to improve performance, without overfitting on the training set.
Preferably, the samples are partitioned into training data, validation data and test data sets.
Preferably, the samples are partitioned into training data, validation data and test data sets with a ratio of 70% training data, 15% validation data and 15% test data.
Preferably the partitioned samples are stratified using the customer ID to ensure even customer representation across the datasets.
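The preferred 70/15/15 partition, stratified by customer ID, can be sketched as follows (an illustrative Python sketch; the sample representation and the customer IDs are hypothetical):

```python
import random
from collections import defaultdict

def stratified_split(samples, ratios=(0.70, 0.15, 0.15), seed=0):
    """Partition samples into train/validation/test sets, stratified by
    customer ID so each customer is evenly represented in every set.
    Each sample is a (customer_id, payload) tuple; details are illustrative."""
    by_customer = defaultdict(list)
    for sample in samples:
        by_customer[sample[0]].append(sample)
    rng = random.Random(seed)
    train, val, test = [], [], []
    for _, group in sorted(by_customer.items()):
        rng.shuffle(group)                    # shuffle within each customer
        n_train = int(len(group) * ratios[0])
        n_val = int(len(group) * ratios[1])
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]       # remainder goes to testing
    return train, val, test

samples = [(cid, i) for cid in ("cust1", "cust2", "cust3") for i in range(20)]
train, val, test = stratified_split(samples)
print(len(train), len(val), len(test))  # 42 9 9
```

Splitting within each customer's group, rather than over the pooled samples, is what guarantees the even customer representation the specification asks for.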
Preferably, the neural network comprises a convolutional layer, a pooling layer, a dense layer with ReLU activation and a dense output layer with ReLU activation.
Preferably, the convolution layer has one filter for each of the features to be predicted.
Preferably, the convolution layer has 22 filters.
Preferably, the convolution layer has a kernel size of 31.
Preferably, the convolution layer has a stride of 1.
Preferably, the convolution layer has a leaky ReLU activation.
Preferably the pooling layer comprises max pooling with a kernel size of 7 and a stride of 1, followed by a flattening operation.
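The preferred topology can be sketched as a plain NumPy forward pass (illustrative only: the 90-day input window, the 22 usage categories and the 64-node dense layer are assumptions for the sake of the example, and the weights are random, so only the shapes are meaningful):

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, kernels):
    # x: (time, features); kernels: (n_filters, kernel_size, features)
    n_filters, k, _ = kernels.shape
    steps = x.shape[0] - k + 1                   # stride of 1
    out = np.empty((steps, n_filters))
    for t in range(steps):
        # Sum of element-wise products of the kernel and the input window.
        out[t] = np.tensordot(kernels, x[t:t + k], axes=([1, 2], [0, 1]))
    return out

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def max_pool1d(x, k=7):                          # kernel size 7, stride 1
    return np.stack([x[t:t + k].max(axis=0) for t in range(x.shape[0] - k + 1)])

T, F, n_filters = 90, 22, 22                     # 90 days, 22 usage categories
x = rng.standard_normal((T, F))
conv_k = rng.standard_normal((n_filters, 31, F)) * 0.05   # kernel size 31
h = leaky_relu(conv1d(x, conv_k))                # (60, 22)
h = max_pool1d(h).ravel()                        # pool then flatten: (54 * 22,)
w1 = rng.standard_normal((h.size, 64)) * 0.05
h = np.maximum(h @ w1, 0.0)                      # dense layer with ReLU
w2 = rng.standard_normal((64, F)) * 0.05
y = np.maximum(h @ w2, 0.0)                      # dense output with ReLU
print(y.shape)  # (22,)
```

The final ReLU guarantees non-negative outputs, which suits usage quantities that cannot go below zero.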
Preferably, an error at output nodes of the neural network is specified by a loss function.
Preferably, the loss function is defined with reference to the weighted absolute error and the percentage error per category.
Preferably, the loss function is weighted to improve prediction in categories for which there is limited data available.
Preferably, the weighted loss function is combined with a per category percentage error to drive the model to predict equally well across all categories, rather than learning to predict the easier features at the expense of the more difficult ones.
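A loss of this form might be sketched as follows (illustrative Python; the category weight values and the epsilon guard against division by zero are assumptions, not taken from the specification):

```python
import numpy as np

def weighted_loss(pred, target, category_weights, eps=1.0):
    """Combine a weighted absolute error with a per-category percentage
    error, so sparse categories still contribute to the training signal
    and the model cannot ignore the harder features."""
    abs_err = np.abs(pred - target)
    weighted_abs = (category_weights * abs_err).mean()
    pct_err = (abs_err / (np.abs(target) + eps)).mean()
    return weighted_abs + pct_err

pred = np.array([100.0, 2.0, 50.0])
target = np.array([110.0, 0.0, 55.0])
weights = np.array([1.0, 5.0, 1.0])   # up-weight the sparse middle category
print(round(weighted_loss(pred, target, weights), 3))
```

The percentage term penalises being relatively wrong on small-volume categories as heavily as being relatively wrong on large ones, which is the "predict equally well across all categories" behaviour described above.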
Preferably, network weights are updated in such a way that minimises the loss function.
Preferably, the change in the output of the loss function as the weights change is calculated using backpropagation, an algorithm which calculates the partial derivative of the loss function with respect to each weight; the weight is then adjusted such that the gradient of the slope decreases.
Preferably, the amount by which the weight changes is specified by a learning rate and momentum, and whether it is increased or decreased depends on the direction of the slope.
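The update rule described above can be sketched numerically (an illustrative Python sketch; the learning rate, momentum value and gradient values are arbitrary):

```python
def sgd_momentum_step(weights, grads, velocity, lr=0.01, momentum=0.9):
    """One gradient-descent update: backpropagation supplies the partial
    derivative of the loss with respect to each weight (grads); the
    learning rate scales the step, momentum carries over a fraction of
    the previous update, and the sign of the gradient decides whether
    each weight is increased or decreased."""
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity

weights = [0.5, -0.3]
velocity = [0.0, 0.0]
grads = [0.2, -0.1]          # dLoss/dWeight from backpropagation
weights, velocity = sgd_momentum_step(weights, grads, velocity)
print(weights)
```

With zero initial velocity the first step is a plain gradient-descent step; on later steps the momentum term accelerates movement along a consistent slope direction.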
Alternatively, the machine learning module comprises a multivariate multiple regression (MMR) model which is used to create a model to predict future data usage, in which multiple dependent variables Y1..n are modelled using multiple inputs X1..m.
Preferably, the machine learning module comprises a neural network with a single linear layer.
Preferably, the neural network is trained using an adaptive optimiser to optimise weights b such that the error e between the predicted data and the actual data is minimised.
Preferably, the machine learning module: categorises customer data; and apportions the categorised customer data into overlapping samples for a predetermined time period to create a base dataset from which the training, validation and test data are built.
Preferably the customer data is usage data.
Preferably the predetermined time period is 70-110 days.
More preferably the predetermined time period is 90 days.
Preferably the MMR uses call and data usage records for a given customer for the predetermined time period to predict that customer's usage for a future time period, to allow the system to recommend the most suitable plan for that customer one month ahead of time.
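As a sketch of this MMR alternative, the single linear layer can be fitted in closed form with least squares on synthetic data (the sample count, the 90 daily input values and the four output categories are illustrative; the adaptive optimiser described above would converge to the same minimum of the squared error e):

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_inputs, n_outputs = 200, 90, 4
# Synthetic stand-in data: X is ~90 days of usage, Y is next-period usage
# per category, generated from a known linear mapping plus small noise.
X = rng.standard_normal((n_samples, n_inputs))
true_b = rng.standard_normal((n_inputs, n_outputs))
Y = X @ true_b + 0.01 * rng.standard_normal((n_samples, n_outputs))

# Fit the weights b of the single linear layer by least squares.
b, *_ = np.linalg.lstsq(X, Y, rcond=None)
pred = X @ b
mse = float(((pred - Y) ** 2).mean())
print(mse < 0.001)  # the linear model recovers the mapping
```

Because every output category shares the same input vector, the multivariate fit is equivalent to one independent multiple regression per dependent variable.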
Brief Description of the Drawings
The invention will now be described by way of example only with reference to the accompanying drawings in which:
figure 1 is a block diagram which illustrates an example of a system in accordance with the present invention;
figure 2 is an illustration of a kernel as applied to a layer's input data in a convolutional neural network (CNN);
figure 3 is an illustration of the process of feature extraction and classification in a CNN;
figure 4 is an illustration of the partitioning of data used to train a model in a CNN;
figure 5 is a topology of a CNN in accordance with the present invention;
figure 6 is a diagram which shows input variables, operations and an output placeholder;
figure 7 is a graph of loss v weight which shows an optimal point for optimal weight and loss;
figure 8 is a graph of average per customer percentage validation batch error in a range of categories;
figure 9 is a graph of loss over a training run;
figure 10 is a graph of a sample dataset provided over a period of time;
figure 11 is an illustration of the MMR input and output data;
figure 12 is a graph which shows an example of the trained model predictions for customer data;
figure 13 is a graph which shows an example of the trained model predictions for customer data;
figure 14 is a graph which shows percentage category errors for a range of data categories for a token;
figure 15 is a graph which shows mean absolute error/category maximum error for a range of data categories for a token;
figure 16 is a schematic flow diagram which shows the process steps for the use of an example of a system in accordance with the present invention;
figure 17 is a schematic flow diagram which shows the process steps for a customer savings calculation in the example of the system of the present invention shown in figure 16;
figure 18 is a schematic flow diagram which shows the process steps for a customer creating a tender specification in the example of the system of the present invention shown in figure 16;
figure 19 is a schematic flow diagram which shows the process steps for a supplier creating a tender response in the example of the system of the present invention shown in figure 16;
figure 20 is a schematic flow diagram which shows the process steps for a customer selecting a supplier in the example of the system of the present invention shown in figure 16;
figure 21 is a screen shot from an example of a graphical user interface which shows the ranking of suppliers and their score in the example of the system of the present invention shown in figure 16;
figure 22 is a screen shot from an example of a graphical user interface which shows the side-by-side comparison of suppliers in the example of the system of the present invention shown in figure 16;
figure 23 is a screen shot from an example of a graphical user interface which shows a "health score" for several parameters contained in the tender response in the example of the system of the present invention shown in figure 16; and
figure 24 is a screen shot from an example of a graphical user interface which shows a slider tool which allows a customer to reprioritise parameters in the example of the system of the present invention shown in figure 16.
Detailed Description of the Drawings
The present invention comprises a software application which has been designed to allow businesses to accurately determine their mobile device usage and to allow them to select a service provider based on a standardised ITT.
The solution provided by the present invention had to address several technical problems which had limited the extent to which it was possible to automate the tender process. Typically, customers can provide only a limited amount of data; some customers have more historical data than others, and some have periods for which no data is available at all. In addition, different customers have different usage patterns depending on the size of the customer company, location, amount of travel undertaken by the employees, data use and talk time.
To address this, the present invention provides a process and method which includes a machine learning module which works for a wide range of company sizes with a wide range of usage, and therefore avoids the need to train separate systems for each customer company.
The system and method of the present invention provide accurate predictions via the machine learning module, which operates across a number of parameters/features/usage categories. This allows it to predict usage while taking into account factors such as the wide variation in sparsity of usage across different categories for different customers, and the fact that different suppliers classify usage differently.
In at least one aspect, the present invention provides a smart automated comparison platform made for business mobile communications, seamlessly matching customer needs to the best supplier solution. It automates the process of procurement so that the customer gets a robust, repeatable, and reliable process which uses machine learning and advanced analytics to ensure accurate tenders, tailored proposals and side-by-side comparison, resulting in clear and informed business choices.
Figure 1 shows a customer 3 who may access the system 1. The customer uploads usage information to a customer data processor 5 where the data is formatted and standardized. The standardized usage information is analysed by a machine learning module, referred to as the prediction module 7 which analyses the customer usage information and predicts future customer usage.
Benchmarking module 9 compares the standardized usage information with benchmark data to create a savings value as an indication of the level of savings available to the customer 3. The benchmarking module may also compare the standardized usage information with a predicted future use value which has been calculated by the prediction module 7.
Once the customer 3 has received savings information from the benchmarking module 9, the customer 3 may initiate a tender process by logging on to the system and creating an invitation to tender using the ITT database 11. The ITT database takes the customer 3 through a number of steps which capture customer information on parameters such as their budget, number of lines required and hardware.
Suppliers 19, 21, 23 may log in to the system and search for suitable ITTs or may specify the parameters which match the service they supply. Once they have found an ITT which they are interested in responding to, they complete a tender response online. The tender responses are stored in a supplier database 13 and then the comparison module 15 analyses and ranks the tender response based on several predefined criteria. The customer 3 may modify the weighting attached to each of the criteria to meet a range of changing circumstances or simply according to preference. The customer makes a tender selection 17 which is communicated to the successful supplier 23.
As described above, the system of the present invention uses machine learning to predict future call and data usage from historical usage information. As described in the following example, this result is achieved by the creation of a convolutional neural network which achieved an average per category validation batch error of under 10% across all customers.
The system creates a machine learning/prediction module from historical usage information provided by a customer, which has been uploaded and which represents three months of call and data usage.
This information is the input to a machine learning algorithm which then predicts future data and call usage. For example, the historical usage information for three months, defined as months 1 to 3, can be used to predict the sum of that customer's data and call usage for the period two months in the future (month 5). Once this prediction has been made, a recommendation as to the most suitable plan for that customer, which takes into account future usage, is made. The machine learning system has been designed to provide accurate and reliable results with a relatively small amount of data.
The machine learning model has been designed to operate as a single system which is capable of handling customer data which has different usage patterns depending on, for example, the size of the company, location, and amount of travel undertaken by the employees.
The machine learning model of the present invention is capable of generalising. In particular, it has been designed to accommodate a large number of features (usage categories) in proportion to the amount of data available, a wide variation in the sparsity of usage across different categories, and variation in the way in which usage is classified by different providers; for example, each provider has different countries which fall into a certain usage category.
In one embodiment of the present invention the machine learning module comprises an artificial neural network, which is a computational model consisting of a number of layers through which data is passed. It has an input layer, some number of hidden layers, and an output layer. Each layer contains several nodes, which take input data and transform it using an activation function in order to introduce non-linearity into the network. Each of these nodes is connected to the nodes in the previous layers by links, which multiply the output of the previous node by a weight. There is also a bias node in each layer, which is set to 1, and has weighted links to each node in the next layer. This bias essentially allows the network to shift the output of the activation function independent of the input data.
The neural network creates a model based upon sample data which is known as training data. The training data is used to optimize the weights during the training stage, such that a loss function (a measure of the difference between the actual prediction of the trained network and what we want to train it to predict) is minimised.
Figure 2 is an illustration of a kernel as applied to a layer's input data in a convolutional neural network. It shows input data 27 assembled as a row of different features 29 each feature being presented as a time series in a column 31. Convolution enables location independent recognition of features in the data. This is done by constructing a kernel 33 (also sometimes called a filter or feature detector) which is a matrix containing the network weights that we aim to learn. This kernel 33 is applied to that layer's input data a number of times using a sliding window, covering the entirety of the input. The distance from the centre of one kernel application to the next kernel application is called the stride. For every application, the results of element wise multiplication over the two matrices (the inputs at the kernel location, and the kernel weights) are summed to produce a single element of the layer's output 35.
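The sliding-window application of a kernel described above can be illustrated with a toy one-dimensional example (illustrative Python, not the patent's implementation):

```python
def convolve(inputs, kernel, stride=1):
    """Slide the kernel over the input; at each application, sum the
    element-wise products of the kernel weights and the input window to
    produce one element of the layer's output."""
    k = len(kernel)
    return [sum(a * b for a, b in zip(inputs[i:i + k], kernel))
            for i in range(0, len(inputs) - k + 1, stride)]

# A difference kernel detects the same rising trend wherever it occurs.
print(convolve([1, 2, 3, 4, 5], [1, 0, -1]))  # [-2, -2, -2]
```

Increasing the stride moves the kernel further between applications and shrinks the output accordingly, which is the location-independent feature detection the text describes.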
Figure 3 is an illustration 41 of the process of feature extraction 43 and classification 45 in a convolutional neural network (CNN). In this example, the neural network is a convolutional neural network (CNN) which receives an input 47, contains convolutional layers in which each node is not connected to every other node in the previous layer and in which some weights are shared. The convolutional layers are interspersed with pooling layers 51 in which the matrix passed from the previous convolutional layer is down sampled and which are followed by a few dense layers which are fully connected layers 53 which map the features identified by the preceding operations to the desired output format.
The convolutional and pooling layers are collectively responsible for detecting high level features in the input data and the fully connected layers then use those features to output a class prediction. In this example, a one-dimensional convolution is used because the input data takes the form of a time series rather than a two-dimensional image.
Pooling layers are used to reduce the dimensions of the feature maps. Thus, it reduces the number of parameters to learn and the amount of computation performed in the network. The pooling layer summarizes the features present in a region of the feature map generated by a convolution layer.
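The down-sampling performed by a pooling layer can be shown with a similarly small sketch (illustrative Python; max pooling over a one-dimensional feature map):

```python
def max_pool(feature_map, k=2, stride=2):
    """Summarise each region of the feature map by its maximum value,
    reducing the dimensions (and hence the parameters and computation)
    passed on to the next layer."""
    return [max(feature_map[i:i + k])
            for i in range(0, len(feature_map) - k + 1, stride)]

print(max_pool([1, 3, 2, 5, 4, 0]))  # [3, 5, 4]
```

Here a six-element map is reduced to three elements while the strongest activation in each region is preserved.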
When training a machine learning system such as a neural network it is important to avoid overfitting or underfitting to the training data. Underfitting is when a model fails to learn optimal weights, usually because it has an insufficiently complex topology, too few epochs, or a too small or noisy dataset. Underfitting can be addressed by a variety of techniques, including adjusting model topology, increasing the number of layers or nodes in each layer, procuring more data or augmenting the existing data.
Conversely, overfitting is when the model learns to perform very well on the data it is trained with but fails to work as well when previously unseen data is used. This can be due to an overly complex model which identifies noise in the training data as salient features, over-optimisation of the model parameters over a large number of epochs, or a biased, inadequate or unrepresentative training dataset. Overfitting can be addressed by stopping training before the error completely converges, simplifying the model, or procuring or generating better data.
In this example of the present invention, the training data comprises approximately two years' worth of historical data for 15 customers from three different providers.
After pre-processing, each training sample consists of a single row, with one column per input feature and one per output feature.
The usage data is received in the form of three tables: bill analysis; bill analysis summary; and call categories. Bill_analysis maps various csv files containing customer records to the ID of that customer and their provider.
Bill analysis summary contains all call and data usage records from those files, with each row assigned the appropriate bill analysis ID according the source of its data. Call categories simply maps the category ID to its name.
A selected_summary_with_customer_ids table is constructed, which selects the relevant columns from bill analysis summary and merges them on the bill analysis ID and customer name to form a single table in preparation for building the datasets. From this, each customer's usage per category per day is aggregated to form a per-day data table. For days where no information is present, 0.0 values are inserted for each category. This table is then apportioned into samples, each containing months 1 to 3 of usage data for each category per day (the model input) and the sum of usage for each category for month 5 (the quantity to be predicted). This is the base dataset from which the training, validation and test data are constructed.
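The aggregation and windowing steps above can be sketched with pandas; the table and column names here are illustrative stand-ins, not the production schema:

```python
import pandas as pd

# Illustrative usage records: one row per (customer, day, category).
records = pd.DataFrame({
    "customer": ["a"] * 4,
    "day": pd.to_datetime(["2021-01-01", "2021-01-01", "2021-01-03", "2021-05-02"]),
    "category": ["uk_calls", "data_uk", "uk_calls", "uk_calls"],
    "usage": [10.0, 500.0, 5.0, 7.0],
})

# Aggregate to usage per category per day, inserting 0.0 for days
# with no records.
per_day = (records.pivot_table(index="day", columns="category",
                               values="usage", aggfunc="sum")
           .reindex(pd.date_range("2021-01-01", "2021-05-31"))
           .fillna(0.0))

# Model input: daily usage for months 1 to 3; target: summed month-5 usage.
x = per_day.loc["2021-01-01":"2021-03-31"]
y = per_day.loc["2021-05-01":"2021-05-31"].sum()
print(x.shape, dict(y))
```

The input window covers 90 days across two categories in this toy case, and the target is one summed value per category, mirroring the sample layout described in the text.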
In this example, the model focuses on predicting usage rather than cost, since cost is likely to change over time and for different plans. In addition, cost may be calculated directly from usage as necessary when predictions are made. The available data allowed the generation of approximately 6000 samples.
In order to evaluate the performance of the neural network objectively and identify overfit or underfit, the data was typically split into three distinct sets: training, validation and testing. Each set was created to be roughly representative of the dataset as a whole and for this reason the samples were randomly shuffled before partitioning.
The training set is used to learn the network's weights. The validation set is not used for training, but rather to assess the trained model's output after each training run to allow adjustment of hyperparameters and/or network topology in order to improve performance without overfitting on the training set. The testing set is not used during this process, but only once a final, trained model has been developed. This avoids overfitting of hyperparameters and topology to the validation set and ensures that our evaluation metrics are based on classifying unknown data.
Figure 4 is a diagram 61 which shows the partitioning of data used to train a model in a CNN. The entire dataset 63 is partitioned into training data 65, validation data 67 and test data 69 sets with a ratio of 70% training, 15% validation and 15% test. The data is stratified using the customer ID to ensure even customer representation across the datasets. The training data 65 and validation data 67 are used to train the model, try different models and adjust hyperparameters. The test data 69 is only used when testing final models and parameters.
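The shuffled, customer-stratified 70/15/15 partition can be sketched with scikit-learn; the two-step split below is an illustrative approach, not the authors' code:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
samples = np.arange(1000).reshape(-1, 1)        # stand-in feature rows
customer_id = rng.integers(0, 15, size=1000)    # one of 15 customers per sample

# First carve off 70% for training, stratified on customer ID ...
x_train, x_rest, id_train, id_rest = train_test_split(
    samples, customer_id, train_size=0.70, stratify=customer_id, random_state=0)

# ... then split the remaining 30% evenly into validation and test sets.
x_val, x_test, id_val, id_test = train_test_split(
    x_rest, id_rest, test_size=0.50, stratify=id_rest, random_state=0)

print(len(x_train), len(x_val), len(x_test))
```

Stratifying on the customer ID at both steps keeps each customer represented in roughly the same proportion across all three sets.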
The design of the network requires the specification of a number of variables including the number of layers, the number of nodes in each layer, which activation functions are used, how the nodes are linked together and any other transformations applied to the data. This is referred to as the network's topology and is illustrated in figure 5, which shows an input matrix 73, output features 83 and a four-layer neural network specified as follows:

Layer 1: Convolutional layer 75 with 22 filters (the number of features we wish to predict), a kernel size of 31, a stride of 1 and a leaky ReLU activation.

Layer 2: Max pooling layer 77 with a kernel size of 7 and a stride of 1, followed by a flattening operation.

Layer 3: Dense layer 79 with 24 nodes and a ReLU activation.

Layer 4: Dense layer 81 to the output with a ReLU activation.

Hyperparameters refer to the variables which must be set when training the network. The optimal values for these were selected via heuristics and iterative grid search. These include:

* Learning Rate: how much the model weights are updated by at each batch. This, along with epsilon, is used for Adam optimisation (a form of stochastic gradient descent which is used to train the model).
* Epsilon: a small constant for numerical stability which is added to the denominator when computing gradients for backpropagation; this is necessary to avoid division by zero errors.
* Maximum Number of Epochs: this was set at 1500.
* Stagnation Threshold: after how many epochs with no loss improvement to quit training.
* Batch Size: how many samples we pass the model before the error is calculated and the weights updated.
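The four-layer topology and the Adam hyperparameters above can be sketched with the Keras API; the original used a lower-level TensorFlow graph, and the 90-day input window length is an assumption:

```python
import tensorflow as tf

NUM_CATEGORIES = 22   # features to predict, one filter each
INPUT_DAYS = 90       # assumed input window length (months 1 to 3, daily)

# A sketch of the described topology, not the authors' implementation.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(INPUT_DAYS, NUM_CATEGORIES)),
    tf.keras.layers.Conv1D(filters=NUM_CATEGORIES, kernel_size=31, strides=1),
    tf.keras.layers.LeakyReLU(),                          # leaky ReLU activation
    tf.keras.layers.MaxPooling1D(pool_size=7, strides=1),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(24, activation="relu"),
    tf.keras.layers.Dense(NUM_CATEGORIES, activation="relu"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001,
                                                 epsilon=1e-7),
              loss="mse")
model.summary()
```

The final dense layer emits one value per usage category, matching the 22 output features the network is asked to predict.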
How the error at the output nodes of a network is calculated is specified by the loss function, which takes Y, the actual output of the sample, and Ŷ, the model's prediction. How the loss function is defined depends upon the task at hand; in this case, it was found that a custom loss function taking into account the weighted absolute error and the percentage error per category was most effective. Due to imbalance in the proportion of minutes/bytes usage per category (e.g. UK data usage is frequently present in the daily data and can be up to 500,000 bytes per day, while directory calls are far rarer and range in length from 0 to just 14 minutes) it was necessary to weight the loss to ensure that the model did not simply learn to predict well on the categories for which a lot of data is available. Combining this with the per-category percentage error ensured that the model was driven to predict equally well across all categories, rather than learning to predict the easier features at the expense of the more difficult ones.
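A loss of the kind described, a weighted absolute error scaled up by the per-category percentage error, might be sketched as follows; this is a guess at the general shape only, and the weighting scheme and zero guard are assumptions, not the published formula:

```python
import numpy as np

def combined_loss(y_true, y_pred, weights):
    """Weighted absolute error per category, scaled by the per-category
    percentage error. Illustrative sketch, not the published loss."""
    abs_err = np.abs(y_true - y_pred)            # per-sample, per-category
    denom = np.maximum(np.abs(y_true), 1.0)      # guard against division by zero
    pct_err = 100.0 * abs_err / denom            # percentage error
    per_category = (weights * abs_err * (pct_err / 100.0 + 1.0)).mean(axis=0)
    return per_category.sum()

y_true = np.array([[100.0, 10.0], [200.0, 0.0]])
y_pred = np.array([[90.0, 10.0], [210.0, 5.0]])
w = np.array([0.1, 1.0])   # up-weight the sparse second category
print(combined_loss(y_true, y_pred, w))
```

Because each category's error is weighted and scaled separately before being summed, a model cannot minimise the total simply by fitting the high-volume categories.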
Combined weighted absolute and per-category percentage loss:

L(Y, Ŷ) = Σ_c w_c |Y_c - Ŷ_c| ((P_c / 100) + 1)

where w_c is the weight applied to category c and P_c is the percentage error for category c.

Training and validation scripts are written using TensorFlow, a convenient Python framework for building systems to train and test machine learning models and for running those systems in high-performance C++. It facilitates the creation of directed graphs to control the flow of data (in the form of multidimensional matrices, also called tensors) through operations.
A placeholder is a variable that has data assigned to it at a later date. It allows for the creation of operations and the building of the computation graph without needing the data in advance; data is fed into the graph through these placeholders. TensorFlow also includes a tool called TensorBoard, a visualisation suite which is used to plot quantitative metrics and generate visualisations of the computational graph. This is illustrated in the diagram 91 of figure 6, which shows a placeholder input 93, variable inputs 95, operations 97 and output 99.
In order to update the network weights in such a way that minimises the loss function, the change in the output of the loss function as the weights change is determined. This is calculated using backpropagation, an algorithm which calculates the partial derivative (i.e. the slope) of the loss function with respect to each weight. Each weight is then adjusted in such a way that the gradient of the slope decreases, a technique more commonly known as gradient descent. The amount by which the weight changes is specified by the learning rate and the momentum, and whether it is increased or decreased depends on the direction of the slope.
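The weight update of plain gradient descent (before Adam's per-weight adaptation, described next) can be illustrated with a one-dimensional loss; the example is a minimal sketch:

```python
def gradient_descent(grad, w0, learning_rate, steps):
    """Repeatedly step a weight against the slope of the loss."""
    w = w0
    for _ in range(steps):
        w -= learning_rate * grad(w)   # move downhill along the loss curve
    return w

# Loss L(w) = (w - 3)^2 has its minimum at w = 3, and dL/dw = 2(w - 3).
w_final = gradient_descent(lambda w: 2.0 * (w - 3.0), w0=0.0,
                           learning_rate=0.1, steps=100)
print(round(w_final, 4))
```

Each step moves the weight in the direction that reduces the loss, with the step size controlled by the learning rate, exactly the behaviour figure 7 depicts on its loss-versus-weight curve.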
Figure 7 shows a graph 101 which plots loss 103 against weight 105. Curve 107 shows the initial weight 113, weight updates 115, calculated gradients 111 and a minimum value 109. Adam (Adaptive Moment Estimation) optimisation, a specialised form of gradient descent, is used to compute the learning rate and momentum for each weight individually.
When running the training script, optional arguments may be specified from the command line. (See script or run python run.py -h for details.) If these are not specified, the script runs with default values selected to give the best performance.
When the script is run, an exit handler is initiated which evaluates and saves information about the model and training performance upon exit. This is so that it is possible to kill a running script from the command line without losing any information; the model's parameters are logged and evaluation metrics are calculated and saved to file.
A unique ID is generated for each training run, so that the same models training on the same data but running with different parameters can be distinguished for later evaluation. The data is read in from the database and standardised before being batched. Other variables are also initialised at this point to control how often the model is saved and how many times per epoch a summary is printed. TensorFlow variables and operations are also created, along with summary operations to log the loss and accuracy for later visualisation in TensorBoard.
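Generating the unique run ID might look like the following sketch; the use of a UUID and the path formats are assumptions, as the actual scheme is not specified:

```python
import uuid

# Each training run gets a unique ID so runs of the same model on the same
# data but with different parameters can be told apart later (illustrative).
run_id = uuid.uuid4().hex
checkpoint_path = f"checkpoints/run_{run_id}"   # where model weights are saved
summary_tag = f"loss/run_{run_id}"              # tag for TensorBoard summaries
print(run_id)
```

Deriving the checkpoint path and summary tag from the same ID keeps a run's saved weights and its logged metrics linked for later evaluation.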
When the session is run, each epoch and each batch is looped through. At each batch, a number of samples equal to the batch size are passed to the model and its predictions are calculated. These are compared to the known output, and the current loss is stored. The Adam optimiser then updates the network weights and the next training batch is generated.
The default value for the number of epochs is 10,000, but in practice training ends before that number is reached. At each epoch, we check whether the number of epochs in a row with no loss decrease has reached the stagnation threshold; if so, we reduce the learning rate. If there are twice the stagnation threshold epochs without a loss decrease, or if the error decreases past the convergence threshold, training is stopped early. This is partly to avoid models overfitting, and partly to save time if a model or a certain set of hyperparameters is clearly not performing well.
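The stopping rule described above, halt after twice the stagnation threshold of epochs without improvement or once the error passes the convergence threshold, can be sketched as follows; the learning-rate reduction step is omitted and the names are illustrative:

```python
def should_stop(loss_history, stagnation_threshold, convergence_threshold):
    """Decide whether to halt training early (illustrative sketch)."""
    if loss_history[-1] < convergence_threshold:
        return True   # error has converged past the threshold
    best = min(loss_history)
    # Count epochs since the best loss was last improved upon.
    epochs_since_improvement = len(loss_history) - 1 - loss_history.index(best)
    return epochs_since_improvement >= 2 * stagnation_threshold

# Loss stops improving after epoch 2; with a stagnation threshold of 2,
# training halts once 4 stagnant epochs have accumulated.
history = [5.0, 4.0, 3.9, 3.9, 3.9, 3.9, 3.9]
print(should_stop(history, stagnation_threshold=2, convergence_threshold=0.1))
```

Checking a rule like this at the end of every epoch is what lets runs with poorly chosen hyperparameters terminate long before the 10,000-epoch default.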
Each time a training run is completed, the model's performance over the entire run is logged for viewing in TensorBoard, and its parameters are saved so that it can be loaded at a later date for testing or to make predictions. The training script is also saved. After each training run, the model's performance on both training and validation sets is logged along with its hyperparameters and saved to the database. (See training_validation_results, validation_results and test_results.) We primarily used the average percentage batch error per category on validation to rank potential models and hyperparameter performance. The best performing model as described above achieved an average per category training batch error of 6.05% and an average per category validation batch error of 8.76%.
A benchmark dataset consisting of uniformly distributed random noise produced an average per category training batch error of 46.79% and an average per category validation batch error of > 100%, confirming the good fit of our model to the data. A random sample from the final batch of training predictions of the most successful model is shown below.
Random Training Sample:
Avg per category training batch error: 6.56%
Total training batch error: 4.00%

CATEGORY                        PRED    ACTUAL  ERROR   % BATCH ERR
calls_from_europe               221     163     -58     8.5
calls_from_row                  0       0       0       9.9
calls_from_traveller            129     140     11      4.9
chargeable_calls                0       0       0       9.2
data_europe                     1027    1207    180     4.8
data_row                        459     734     275     8.6
data_traveller                  0       0       0       5.2
data_uk                         85410   86970   1560    5.7
directory                       0       21      21      6.4
free_calls                      0       0       0       8
idd_uk_mins_to_apac             0       43      43      5.6
idd_uk_mins_to_europe           0       539     539     6.2
idd_uk_mins_to_row              0       7       7       6.7
idd_uk_mins_to_usa_canada       0       390     390     5.3
other                           0       108     108     6.3
picture_msg                     0       0       0       10.6
texts_while_roaming             0       0       0       4.5
uk_calls                        13618   13586   -32     5.7
uk_texts                        0       0       0       5.6
uk_to_international_texts       0       0       0       7

Training & Validation Results
BATCH_SIZE: 48
TRAIN_BATCHES_PER_EPOCH: 87
NUM_OF_EPOCHS: 1502
TRAINING_TIME: 12 minutes
TRAIN_AVG_LOSS: 2306
TRAIN_TOTAL_BATCH_ERROR: 7.11%
TRAIN_AVG_CAT_BATCH_ERROR: 6.05%
EVAL_AVG_LOSS: 2306
EVAL_TOTAL_BATCH_ERROR: 6.05%
EVAL_AVG_CAT_BATCH_ERROR: 8.76%
LEARNING_RATE: 0.001 + decay
EPSILON: 0.000

Per category avg batch errors:
calls_from_europe               5.94%
calls_from_row                  9.98%
calls_from_traveller            4.83%
chargeable_calls                11.9%
data_europe                     5.07%
data_row                        6.75%
data_traveller                  6.67%
data_uk                         4.96%
directory                       6.36%
free_calls                      7.36%
idd_uk_mins_to_apac             7.59%
idd_uk_mins_to_europe           9.21%
idd_uk_mins_to_row              13.68%
idd_uk_mins_to_usa_and_canada   5.12%
other                           11.30%
picture_msg                     7.54%
texts_while_roaming             11.79%
uk_calls                        10.36%
uk_texts                        8.08%
uk_to_international_texts       11.43%

Per customer avg batch errors:
1.0     8.45%
2.0     6.79%
3.0     5.76%
4.0     10.2%
5.0     2.99%
6.0     14.6%
7.0     6.97%
8.0     4.59%
10.0    4.17%
11.0    3.51%
12.0    10.9%
14.0    5.84%
15.0    12.5%
16.0    6.23%

Figure 8 is a graph of average per customer percentage validation batch error in a range of categories and figure 9 is a graph of loss over the training run, as related to the above tables.
In another embodiment, an alternative machine learning algorithm was used to improve the quality of prediction of future call and data usage from historical information. In this example, the system used call and data usage records for a given customer for the past 90 days to predict that customer's usage for the next 60 days, allowing the system of the present invention to recommend the most suitable plan for that customer one month ahead of time. This example is illustrated in figures 10 to 14. The machine learning algorithm was designed to work with a limited data set and variable amounts of data, to address the technical problem that some customers have more historical data than others and some have periods for which no data exists. In addition, different customers have different usage patterns depending on the size of the company, location, amount of travel undertaken by the employees et cetera.
The machine learning algorithm was designed to create a generalised model to address the technical problem that some customers do not have sufficient data to train separate systems. It also has a relatively large number of features (usage categories) that must be predicted in proportion to the amount of data which is available to work with, and the sparsity of usage across different categories varies widely.
In this example, approximately 18 months of historical usage data for five customers was used. Each was assigned an identification token. For each day, usage data for the following categories was available.
'iddUkMinsToEurope', 'iddUkMinsToRow', 'iddUkMinsToUsaCanada', 'iddUkMinsToApac', 'iddMinsTotal', 'dataUk', 'ukToInternationalTexts', 'pictureMessages', 'chargeableCalls', 'callsFromEurope', 'dataEurope', 'roamingDaysEU', 'roamingDaysROW', 'roamingDaysTraveller', 'directory', 'ukCalls', 'ukTexts', 'freeCalls', 'other', 'callsFromTraveller', 'dataTraveller', 'textsWhileRoaming', 'callsFromRow', 'dataRow', 'oobCost'

For each customer, this data was then apportioned into overlapping samples, each containing 90 days of usage data for each category per day (the model input) and the following 60 days of usage data (the usage prediction) to create a base dataset from which the training, validation and test data are built. In this example, usage rather than cost is predicted because cost is likely to change over time and for different plans. In addition, cost may be calculated directly from usage as necessary when predictions are made.
Figure 10 is a graph 100 which shows data related to the customer tokens labelled 102, 104, 106, 108 and 110.

Per customer sample count:
Customer Token                          Number of Samples (90 + 30 days)
67d4ad8e-baa1-4140-bf9d-c53c0fde8f23    110
123b5671-f3be-4865-9312-f7c107086a4e    308
06caf371-a1b5-45ff-9612-bad3ece84c2b
33b64c7d-5d46-4e8f-874f-e4383b939b2f    286
b0391f16-ebd7-4ee4-9999-714f614425c1

This dataset presents challenges as the majority of customers have large periods for which data is unavailable. For this reason, the analysis herein focuses on customers b0391f16-ebd7-4ee4-9999-714f614425c1 and 06caf371-a1b5-45ff-9612-bad3ece84c2b, for which we have complete data.
In this example multivariate multiple regression (MMR) was used in preference to larger models with higher data requirements. MMR is a statistical machine learning technique in which multiple dependent variables Y1, ..., Yn are modelled using multiple inputs X1, ..., Xm. It can also be understood as a neural network with a single linear layer.
Since the data is continuous, the mean square error (MSE) is used as the criterion.
Ŷ_i = b_0 + b_1x_1 + b_2x_2 + ... + b_nx_n for each sample i in X

e = (1/n) Σ_i (Ŷ_i - Y_i)²

A stochastic gradient descent optimiser (AdamW) is used to optimise the weights b such that the error e between the predicted data and the actual data is minimised. Figure 11 is an illustration 120 of the MMR input 122 and output 124 data and learned weights 126.
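MMR's single linear layer can be sketched in NumPy; here the weights are fit by ordinary least squares rather than the AdamW optimiser the embodiment uses, purely to keep the sketch dependency-free, and all data is synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 samples, 3 input features, 2 dependent variables.
x = rng.normal(size=(200, 3))
true_b = np.array([[1.0, -2.0], [0.5, 0.0], [3.0, 1.0]])
y = x @ true_b + 0.01 * rng.normal(size=(200, 2))

# MMR is a single linear layer: fit weights b minimising the mean square
# error (closed-form least squares stands in for gradient descent here).
x1 = np.hstack([np.ones((200, 1)), x])            # prepend an intercept column
b, *_ = np.linalg.lstsq(x1, y, rcond=None)

mse = np.mean((x1 @ b - y) ** 2)
print(b.shape, float(mse))
```

The fitted weight matrix has one column per dependent variable, which is exactly the "multiple outputs from multiple inputs" structure that distinguishes MMR from ordinary multiple regression.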
The MMR model was trained individually for each customer dataset for 1000 epochs, with a learning rate of 0.001. Each dataset was normalised between 0 and 1 prior to training. We also train a base model on all samples from all existing customers, which can be used to make predictions for new customers with little historical data available.
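The 0-to-1 normalisation applied to each dataset prior to training might look like the following minimal sketch (the guard for constant columns is an assumption):

```python
import numpy as np

def min_max_normalise(data):
    """Scale each column of a dataset into the range 0..1."""
    lo, hi = data.min(axis=0), data.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard constant columns
    return (data - lo) / span

d = np.array([[0.0, 10.0], [5.0, 10.0], [10.0, 10.0]])
print(min_max_normalise(d))
```

Normalising every category onto the same 0-to-1 scale stops large-magnitude categories such as data bytes from dominating the loss relative to call minutes.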
This training resulted in validation losses as follows:

b0391f16-ebd7-4ee4-9999-714f614425c1    0.004099701996892691
33b64c7d-5d46-4e8f-874f-e4383b939b2f    0.0030984063632786274
06caf371-a1b5-45ff-9612-bad3ece84c2b    0.009908866137266159
123b5671-f3be-4865-9312-f7c107086a4e    0.004331550560891628
67d4ad8e-baa1-4140-bf9d-c53c0fde8f23    0.0013555206824094057
BASE (all training samples)             0.00632152333855629

Figures 12 and 13 are graphs which show examples of the trained model predictions for customer data. Figure 12 shows a graph 130 with a numerical y axis 140 and a temporal (days) x axis 138, and trained model predictions: actual 132 (green), predicted 136 (purple), and error 134 (red) for customers b0391f16-ebd7-4ee4-9999-714f614425c1 and 06caf371-a1b5-45ff-9612-bad3ece84c2b.
Figure 13 shows a graph 150 with a numerical y axis 160 and a temporal (days) x axis 158, and trained model predictions: actual 152 (green), predicted 156 (purple), and error 154 (red) for customers b0391f16-ebd7-4ee4-9999-714f614425c1 and 06caf371-a1b5-45ff-9612-bad3ece84c2b.
The performance of the model is quantified for each class c using the absolute error (prediction minus actual usage) divided by the maximum actual usage value in each category, averaged over the total number of days d for which a prediction may be made:

E_c = (100 / d) Σ_i |prediction_c,i - actual_c,i| / max(actual_c)

For example, if actual usage for some category ranged between 0 and 10 units, and on average the prediction was incorrect by 3 units, a percentage error of 30% would be apparent.
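The metric can be sketched directly from the worked example above; the code is illustrative, not the authors' implementation, and the guard for all-zero categories is an assumption:

```python
import numpy as np

def category_error(pred, actual):
    """Mean absolute error divided by the category's maximum actual usage,
    expressed as a percentage (illustrative sketch of the metric)."""
    max_usage = max(actual.max(), 1e-9)   # guard all-zero categories
    return 100.0 * np.mean(np.abs(pred - actual)) / max_usage

actual = np.array([0.0, 5.0, 10.0])   # usage ranges between 0 and 10 units
pred = actual + 3.0                   # off by 3 units on every day
print(category_error(pred, actual))   # 30.0, matching the worked example
```

Dividing by each category's own maximum keeps the metric comparable between high-volume categories (data bytes) and sparse ones (directory calls).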
Figure 14 is a graph which shows percentage category errors for a range of data categories for a token and figure 15 is a graph which shows mean absolute error/category maximum error for a range of data categories for a token.
As shown in figures 14 and 15, this embodiment of the present invention provides predictions which are generally very reliable, falling within 8% of the maximum usage for each category for both tokens.
Figures 16 to 24 presented below illustrate an example of the customer and service provider experience when using a system in accordance with the present invention.
Figure 16 is a schematic flow diagram 141 which shows the process steps for the use of an example of a system in accordance with the present invention. The customer uploads their usage data 143 in a suitable format. The data may be collected automatically from the current service provider, or the client may upload the data from billing and usage information they have received from the service provider.
The platform takes the usage data, reformats it and compares it with a benchmark tariff to determine a savings calculation 145. At this part of the process, the system may use real current data or may use data which has been created as a prediction of future use to provide a benchmark comparison with future use, as shown in figure 1.
In order to create an invitation to tender (ITT), the customer completes a questionnaire and the ITT is published on the platform and is viewable by service providers 147.
Suppliers who view the tender may wish to complete a tender response 149. The tender responses are submitted and the system marks and ranks the tender responses and produces a recommendation based on the marks and rankings 151.
In addition, the system provides various ways in which a visual comparison may be made between different tender responses 153 in order to assist the customer in making a decision and to allow them to alter the scoring of the parameters defined in the ITT.
Thereafter, the customer chooses a supplier 155 and contracts are negotiated and signed. There is also the facility to provide automatic feedback to the suppliers who were unsuccessful 157.
Figure 17 is a schematic flow diagram 161 which shows the process steps for a customer savings calculation. The customer firstly goes to the URL 163 then enters a username/password or registers 165. The user then uploads usage information 167. This may include the details of the network currently used by the customer. The system will ask the customer to provide login details for the current supplier's network billing engine, or the customer can manually add billing information.
The customer then adds company details 169 and initiates the benchmarking process which can be almost immediate or may take several hours depending upon the customer's current network.
Once the results have been calculated, the customer will receive notification that they are available. Typically, the notification is via an email inviting the customer to see the minimum savings they will be able to achieve on the platform. This is based on the benchmark tariff the system has created based on customer requirements.
The savings are based on the active lines on the account over the last 3 months.
Figure 18 is a schematic flow diagram 191 which shows the process steps for a customer creating a tender specification. The customer firstly goes to the URL 193 then enters a username/password or registers 195. The customer is then presented with a series of graphical user interfaces which contain questions with drop-down boxes designed to extract information from the customer about their requirements, which the customer must complete to proceed. This may include expiry of the current contract, preferred length of new contract, number of voice lines, number of data lines, usage growth, current spend, key post codes for signal, EU usage, non-EU usage and device/hardware requirements. The customer may add additional information which is relevant to the ITT in free form text boxes.
When completed the customer will finish the process, receive an acknowledgement and obtain a timeline for response 199.
The customer may check and edit the ITT prior to publication. The tender will close in a minimum of 2 weeks, or 2 weeks after the month in which the current contract finishes.
Once published the customer cannot make any changes to the tender.
Figure 19 is a schematic flow diagram 201 which shows the process steps for a supplier creating a tender response. The supplier firstly goes to the URL 203 then enters a username/password or registers 205.
Once the supplier has accessed their account, they can search for tenders 207. The graphical user interface allows the supplier to view the tender 209 in order to access details of a customer's requirements including important free form text for additional customer requirements.
The next stage 211 is an eight-step process to complete and submit a tender response. In doing so, the supplier must provide specific cost and service level information relating to tariff, OOB, hardware fund, network, additional lines, signal and customer service. Thereafter the tender response is submitted on the system 213.
The above information is submitted using templates for general tariff requirements for the various items, such as monthly line discount, usage (which is pre-categorised for ease of analysis), traveller days (which correspond to the daily roaming allowances) and ROW (the actual amounts of usage for countries which are outside of any packages offered by the networks).
Costs may be inputted based on the normal per-usage cost. If packages are available to cover usage in a more cost-effective way, there is a Bundle Wizard which allows a supplier to add the specific pack and apply it to the usage type. Bundles are applied and the total cost is detailed against each category; however, the total for each category is only counted once so there is no double counting in the OOB totals.
The cost of devices which are to be part of the hardware fund is taken from the total hardware fund value, not provided in addition to it. It is assumed that the hardware will be deducted from the total fund at cost. Postcodes selected by the customer are used to check signal coverage, which is important if the customer has confirmed a possible network change.
A summary of all the responses in the tender is provided for suppliers to review. If correct, suppliers can post their tender using the submit tender response button 213.
Figure 20 is a schematic flow diagram 221 which shows the process steps for a customer selecting a supplier. The customer firstly goes to the URL 223 then enters a username/password or registers 225. Thereafter the customer views the tender responses 227, selects and modifies tender metrics 229 and selects the most suitable tender 231.
A selection of the graphical user interfaces presented to the customer are provided in figures 21 to 24. The GUIs assist the customer in comparing the tender responses, reprioritising the parameters and ultimately selecting the best tender.
Figure 21 is a screen shot 241 from an example of a graphical user interface which shows the ranking of suppliers and their score 243. The categories 245 are (from left to right) average per line per month cost, monthly cost, total fixed and OOB cost and contract saving. As can be seen, four responses are listed and ranked in order of score.
Figure 22 is a screen shot from an example of a graphical user interface which shows the side-by-side comparison of suppliers, which is continued in figure 23 which shows a "health score" for the parameters. The score 253 is shown at the top of the table for each tender response, with the categories 255 arranged in columns. A summary breakdown is provided 257, along with a health check which is a colour code going from green, light green, yellow, orange to red, signifying the quality of the response in relation to the following parameters: value, tariff, customer service, signal and convenience, green being best and red being worst.
Figure 24 is a screen shot 271 from an example of a graphical user interface which shows a slider tool which allows a customer to reprioritise parameters. The parameters of tariff, value, customer service, network and convenience 273 each have a slider 275 with a maximum and minimum value represented by the line 277, which allows the customer to assign different weights (importances) to these parameters. Reprioritisation can lead to a change in the ranking of the suppliers and result in a different outcome.
The present invention provides a system and method for selecting a service supplier which makes it easier for a customer to get the optimum business mobile contract.
In at least one example, the system of the present invention comprises an automated comparison platform made for business mobile communications which seamlessly matches customer needs to the best supplier solution. The system has automated the process of procurement so that the customer gets a robust, repeatable, and reliable process in which machine learning and advanced analytics are used to ensure accurate tenders, tailored proposals and side-by-side comparison resulting in clear and informed business choices. The system aims to save a customer time and money. In addition, it gives a reliable and systematic basis for their purchase decision.
It makes the process automated, streamlined, and secure, helps a customer to find the best value, tailored solution and provides a smart comparison platform which expertly ranks, compares and recommends, giving the customer confidence in their chosen solution.
From the initial automatic bill upload and analysis to the creation of the tender, the present invention cuts out unnecessary supplier interactions. It provides a customer with a free, competitive environment where they can find the best supplier solution and realise significant cost savings. The comparison platform provides side-by-side comparison, with ranked attributes which can be weighted depending on customer requirements. In at least one aspect of the present invention, the system matches individual customer needs to the best service supplier selected from a number of alternatives.
This will help bring transparency to the tender process, cutting out the guesswork, enabling real, informed choice, and saving time, money and resources on both sides.
The description of the invention including that which describes examples of the invention with reference to the drawings may comprise a computer apparatus and/or processes performed in a computer apparatus. However, the invention also extends to computer programs, particularly computer programs stored on or in a carrier adapted to bring the invention into practice. The program may be in the form of source code, object code, or a code intermediate source and object code, such as in partially compiled form or in any other form suitable for use in the implementation of the method according to the invention. The carrier may comprise a storage medium such as ROM, e.g. CD ROM, or magnetic recording medium, e.g. a memory stick or hard disk. The carrier may be an electrical or optical signal which may be transmitted via an electrical or an optical cable or by radio or other means.
In the specification the terms "comprise", "comprises", "comprised" and "comprising" or any variation thereof and the terms "include", "includes", "included" and "including" or any variation thereof are considered to be totally interchangeable and they should all be afforded the widest possible interpretation and vice versa.
Improvements and modifications may be incorporated herein without deviating from the scope of the invention.

Claims (1)

Claims

1. A computer system for selecting a service provider, the system comprising: a customer data processing module for receiving customer usage information; a machine learning module which comprises a trained data model which analyses the customer usage information and predicts future customer usage; a benchmarking module which compares customer usage information with benchmarking data; a tender database which formats client tender information and which creates an invitation to tender (ITT); a supplier database to which ITT responses are uploaded by one or more suppliers; and a comparison module which compares the ITT responses, ranks the ITT responses in accordance with one or more criteria and provides ranking information to the customer; wherein the customer selects one of the ITT responses supported by the customer usage information and ranking of the ITT responses.

2. The computer system as claimed in claim 1 wherein, the customer data processing module reformats the customer data to allow the benchmarking module to compare it with a suitable benchmark tariff to determine a savings calculation.

3. The computer system as claimed in claim 1 or claim 2 wherein, the invitation to tender presents the customer with a series of graphical user interfaces which contain questions designed to extract information from the customer about their requirements in a suitable format.

4. The computer system as claimed in claim 3 wherein, the suitable format is a standardised format which compels suppliers to provide information in a way which makes it readily comparable with information provided by alternative suppliers.

5. The computer system as claimed in any preceding claim wherein, the comparison module compares the ITT responses by calculating parameter values for parameters from categories of information defined in the tender document and ranking the tender based on the categories.

6.
The computer system as claimed in claim 5 wherein, a weighting is applied to the categories.7. The computer system as claimed in claim 6 wherein, the weighting may be altered.8. The computer system as claimed in claim 7 wherein, a tool is provided to the customer to change the weightings.9. The computer system as claimed in claim 5 wherein, the ranking information is provided in a numerical or pictorial format.10. The computer system as claimed in any preceding claim wherein, the 15 machine learning module comprises a neural network 11. The computer system as claimed in any preceding claim wherein, the trained data model is trained using sample data which comprises historical usage information.12. The computer system as claimed in claim 11 wherein, after a pre-processing step, the sample data consists of a single row, with one column per input feature and one per output feature.13. The computer system as claimed in claim 11 and claim 12 wherein, the sample data is received in different categories.14. The computer system as claimed in any of claims 11 to 13 wherein, the sample data is provided in three categories, bill analysis, bill analysis summary and calls.15. The computer system as claimed in claims 11 to 14 wherein, the sample data comprises a usage data spanning a predetermined period to allow prediction for future usage.16. The computer system as claimed in any of claims 11 to 15 wherein, the sample data comprises a usage data spanning for at least one category of sample data.17. The computer system as claimed in any of claims 11 to 16 wherein, the sample data comprises a usage data spanning for each category.18. The computer system as claimed in any of claims 11 to 17 wherein, the to sample data contains months 1 to 3 of usage data for each category per day to allow prediction of month 5 usage.19. 
The computer system as claimed in any preceding claim wherein, the performance of the trained data model is evaluated by splitting the data into training data, validation data and testing data.20. The computer system as claimed in claim 19 wherein, the training data set is used to learn the network's weights.zo 21. The computer system as claimed in claim 19 wherein, the validation data set is used to assess the trained data model's output after each training run to allow adjustment of hyperparameters and/or network topology in order to improve performance, without overfilling on the training set.22. The computer system as claimed in claim 11 wherein, the sample data is partitioned into a training data, validation data and test data sets.23. The computer system as claimed in claim 21 wherein, the samples are partitioned into a training data, validation data and test data sets with a ratio of 70% training data, 15% validation data and 15% test data.24. The computer system as claimed in claim 22 wherein, the partitioned samples are stratified using the customer ID to ensure even customer representation across the datasets.25. The computer system as claimed in claim 10 wherein, the neural network comprises a convolutional layer, a pooling layer a dense layer with ReLu activatuion and a dense layer to output with ReLU activation.26. The computer system as claimed in claim 25 wherein, the convolution layer has one filter for each of the features to be predicted.27. The computer system as claimed in claim 25 wherein the pooling layer comprises max pooling with a kernel size of 7 and a stride of 1, followed by a flattening operation 28. The computer system as claimed in claim 10 wherein, an error at output nodes of the neural network is specified by a loss function.29. The computer system as claimed in claim 28 wherein, the loss function is defined with reference to the weighted absolute error and the percentage error per category.zo 30. 
The computer system as claimed in claim 28 or claim 29 wherein, the loss function is a weighted loss function which is weighted to improve prediction in categories for which there is limited data available.31. The computer system as claimed in claim 30 wherein, the weighted loss 25 function is combined with a per category percentage error to drive the model to predict equally well across all categories, rather than learning to predict the easier features at the expense of the more difficult ones.32. The computer system as claimed in claim 28 wherein, one or more network weight is updated in such a way that minimises the loss function.33. The computer system as claimed in claim 28 wherein, a change in the output of the loss function changes as the weights change is calculated using backpropagation, an algorithm which calculates the partial derivative of the loss function with respect to each weight, the weight is then adjusted such that the gradient of the slope decreases.34. The computer system as claimed in claim 28 wherein, the amount by which the weight changes is specified by a learning rate and momentum, and whether it is increased or decreased depends on the direction of the slope.35. The computer system as claimed in claim 1 wherein, the machine learning module comprises a multivariate multiple regression (MMR) model is used create a model to predict future data usage in which multiple dependent variables Yi. n are modelled using multiple inputs Xi...a.36. The computer system as claimed in claim 35 wherein, the machine learning module comprises a neural network with a single linear layer.37. The computer system as claimed in claim 35 or 36 wherein, the neural network is trained using an adaptive optimiser to optimise weights b such that the error e between the predicted data and actual data is minimized.38. 
The computer system as claimed in any of claims 35 to 36 wherein, the machine learning module: categorises customer data; and apportions the categorised customer data into overlapping samples for a predetermined time period to create a base dataset from which the training, validation and test data is built.39. The computer system as claimed in claim 38 wherein, the customer data is usage data.40. The computer system as claimed in claim 38 wherein, the predetermined time period is 70-110 days.41. The computer system as claimed in claim 35 wherein, the predetermined time period is 90 days.42. The computer system as claimed in claim 35 wherein, the MMR uses call and data usage records for a given customer for the predetermined time period to predict that customer's usage for a future time period days to allow the system to recommend the most suitable plan for that customer one month ahead of time.
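The stratified 70/15/15 partition and the combined loss described in claims 22 to 24 and 28 to 31 can be sketched in a few lines. This is an illustrative reading only, not the patented implementation: the function names, the NumPy formulation, and the exact way the weighted absolute error is combined with the per-category percentage error are assumptions.

```python
import numpy as np

def stratified_split(sample_ids, customer_ids, ratios=(0.70, 0.15, 0.15), seed=0):
    """Partition samples into training/validation/test sets, stratified by
    customer ID so every customer is represented in each subset."""
    rng = np.random.default_rng(seed)
    train, val, test = [], [], []
    for cust in np.unique(customer_ids):
        ids = rng.permutation(sample_ids[customer_ids == cust])
        n_train = int(round(ratios[0] * len(ids)))
        n_val = int(round(ratios[1] * len(ids)))
        train.extend(ids[:n_train])
        val.extend(ids[n_train:n_train + n_val])
        test.extend(ids[n_train + n_val:])
    return np.array(train), np.array(val), np.array(test)

def weighted_loss(pred, target, category_weights, eps=1e-8):
    """A loss in the spirit of claims 28-31: weighted absolute error
    (up-weighting data-poor categories) combined with a per-category
    percentage error, so no single category dominates training."""
    abs_err = np.abs(pred - target)            # shape (samples, categories)
    wae = (abs_err * category_weights).mean()  # weighted absolute error
    pct = abs_err / (np.abs(target) + eps)     # percentage error per element
    return wae + pct.mean(axis=0).mean()       # combine the two terms
```

Note that with small per-customer sample counts the realised split ratios are only approximately 70/15/15, since each customer's samples are rounded to whole counts.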
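Claims 35 to 37 describe a multivariate multiple regression in which every output category is predicted from the same inputs. A minimal sketch, assuming an ordinary least-squares fit for brevity (the claims instead train a single linear layer with an adaptive optimiser, which for this model converges to the same solution):

```python
import numpy as np

def fit_mmr(X, Y):
    """Multivariate multiple regression: each output column Y[:, j] is
    regressed on the same inputs X, with an intercept term. Returns the
    coefficient matrix B such that [1, X] @ B approximates Y."""
    X1 = np.hstack([np.ones((len(X), 1)), X])  # prepend intercept column
    B, *_ = np.linalg.lstsq(X1, Y, rcond=None)
    return B

def predict_mmr(B, X):
    """Predict all output categories at once for new input data X."""
    X1 = np.hstack([np.ones((len(X), 1)), X])
    return X1 @ B
```

In the claimed system the inputs would be a customer's call and data usage records over the predetermined window (90 days per claim 41) and the outputs the predicted per-category usage one month ahead, from which a suitable plan is recommended (claim 42).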
GB2211376.5A 2021-08-11 2022-08-04 A system and method for selecting a service supplier Pending GB2611852A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB202111538 2021-08-11

Publications (2)

Publication Number Publication Date
GB202211376D0 GB202211376D0 (en) 2022-09-21
GB2611852A true GB2611852A (en) 2023-04-19

Family

ID=83270737

Family Applications (1)

Application Number Title Priority Date Filing Date
GB2211376.5A Pending GB2611852A (en) 2021-08-11 2022-08-04 A system and method for selecting a service supplier

Country Status (2)

Country Link
GB (1) GB2611852A (en)
WO (1) WO2023017244A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116151895B (en) * 2023-04-19 2023-07-21 广州小加传媒有限公司 Decentralised marketing system and method

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US20200302359A1 (en) * 2019-03-22 2020-09-24 Wipro Limited Method and system for determining a potential supplier for a project
WO2021092263A1 (en) * 2019-11-05 2021-05-14 Strong Force Vcn Portfolio 2019, Llc Control tower and enterprise management platform for value chain networks
CN113240270B (en) * 2021-05-12 2022-08-26 青岛网信信息科技有限公司 Method and system for selecting suppliers by channel traders by using big data and prediction algorithm

Non-Patent Citations (1)

Title
None *

Also Published As

Publication number Publication date
WO2023017244A1 (en) 2023-02-16
GB202211376D0 (en) 2022-09-21

Similar Documents

Publication Publication Date Title
US10504024B2 (en) Normalization of predictive model scores
US8103530B2 (en) Enhancing insight-driven customer interactions with an optimizing engine
US8533222B2 (en) Updateable predictive analytical modeling
EP2668545B1 (en) Dynamic predictive modeling platform
US8311967B1 (en) Predictive analytical model matching
CA2561887C (en) A method of controlling a multi-purpose re-configurable information query computer system
US20170220943A1 (en) Systems and methods for automated data analysis and customer relationship management
US20140046880A1 (en) Dynamic Predictive Modeling Platform
EP1738310A1 (en) A method of controlling a multi-purpose information query computer system
US11934971B2 (en) Systems and methods for automatically building a machine learning model
CN111429054A (en) Product inventory prompting method, device, equipment and storage medium
JP6906810B2 (en) Sales support equipment, programs, and sales support methods
GB2611852A (en) A system and method for selecting a service supplier
GB2611853A (en) A system and method for monitoring system usage
GB2611854A (en) A system and method for automating a tender process
US20220027977A1 (en) Self-improving, automated, intelligent product finder and guide
Gillain et al. Planning optimal agile releases via requirements optimization
Fallah Nezhad et al. A recursive approach for the lot sentencing problem in the presence of inspection errors
Najdenov et al. Predictive Analytics–Examining the Effects on Decision Making in Organizations
CN113674013A (en) Advertisement bidding adjustment method and system based on merchant self-defined rules
OLSSON Developing a framework for opportunity assessment of when to utilize machine learning to create data-driven products
CN117520648A (en) Preference policy determination method and device, processor and electronic equipment
Raden Get Analytics Right from the Start