OA19113A - Large-scale processing of data records with efficient retrieval. - Google Patents

Large-scale processing of data records with efficient retrieval.

Info

Publication number
OA19113A
Authority
OA
OAPI
Application number
OA1201900296
Inventor
Stelios LELIS
Antonios CHATZISTAMATIOU
Original Assignee
Channel Technologies Limited
Application filed by Channel Technologies Limited

Abstract

A method and system are provided for large-scale processing of data records. The method includes: processing data records including a first type of data records having static values and a second type of data records relating to transactional records with timestamps of events. The data records are filtered and transformed to standardized formats and persisted into two different data stores that support a large amount of records. Different categories of transactional records are grouped according to the identifiers. The categories are persisted to different tables including maintaining a table of stable size for efficient access of a category of transactional records that is most frequently accessed. The persisted data is used for retrieving features relating to an entity from multiple tables for processing to provide an output relating to the entity.

Description

LARGE-SCALE PROCESSING OF DATA RECORDS WITH EFFICIENT RETRIEVAL
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority from South African patent application number 2019/01131 filed on 22 February 2019, which is incorporated by reference herein.
This invention relates to large-scale processing of data records with efficient retrieval. In particular, the invention relates to processing data records in the field of telecommunications.
BACKGROUND TO THE INVENTION
Mobile Network Operator (MNO) subscribers regularly consume services in advance of payment and/or receive loans through the operator. For example, prepaid subscribers (i.e. subscribers who have to first prepay airtime to use the network) are, in many MNOs, offered the option to consume airtime and mobile bundles in advance of payment, and pay back with a next recharge. Other types of loans that MNO subscribers may receive are the crediting of money to their mobile wallet, the provisioning of a good (e.g. a mobile phone) in advance of payment, etc.
The consumption of services in advance and loans incur, in most cases, a cost to the subscriber. The cost is a fee or interest that is paid upon repaying the advance or loan. This cost is a gain for the party distributing the advance/loan (either the MNO or a third party). On the other hand, if a subscriber does not pay back the advance or loan, the party realizes losses.
The entity distributing the advances or loans seeks to generate profit by maximizing the gains while minimizing the losses. A way to achieve this goal is to perform credit analysis for each subscriber, determine his/her credit worthiness, and assign an appropriate credit limit to each subscriber, including a credit limit of zero indicating that the subscriber will not be able to borrow.
Usually credit analysis is one of the functions of banking institutions and is based on financial data, such as banking transactions or information on assets held by the client. In general, the MNOs do not possess such data for the subscribers of their network. MNOs have only data about the usage of their network and in some cases basic demographic data about their subscribers.
The data that the MNOs can provide, namely Call Detail Records (CDR), also called Event Data Records (EDR), and the limited demographic data in the form of Know Your Customer (KYC) data, are difficult to analyse for credit analysis and other uses, as these data, CDR and KYC, do not usually follow a well-defined format and require large-scale processing.
The preceding discussion of the background to the invention is intended only to facilitate an understanding of the present invention. It should be appreciated that the discussion is not an acknowledgment or admission that any of the material referred to was part of the common general knowledge in the art as at the priority date of the application.
SUMMARY OF THE INVENTION
According to a first aspect of the present invention there is provided a computer-implemented method for large-scale processing of data records, comprising: processing large-scale raw data records having no well-defined format and including a first type of data records having static values and a second type of data records relating to transactional records with timestamps of events; filtering and transforming the data records to standardized formats; persisting the transformed data records into two different data stores that support a large amount of records, a first data store for static data relating to entities for efficient retrieval based on static values including a unique identifier of an entity, and a second data store for transactional data for efficient retrieval based on timestamps of events; grouping different categories of transactional records according to the unique identifiers of entities and persisting the categories to different tables including maintaining a table of stable size for efficient access of a category of transactional records that is most frequently accessed; and retrieving features relating to an entity from multiple tables for processing to provide an output relating to the entity.
Retrieving features relating to an entity may include feature generation with dimensions of behaviour of the entity at a given reference point in time, and may include: retrieving features from different tables; joining the features; and transforming the features to the reference point in time.
The second data store may be partitioned for time periods and/or by type of event. The category of transactional records that is most frequently accessed may be open transactions that are ongoing. The method may process the static data records first to obtain identifying information of the entities.
In one embodiment, the raw data records relate to telecommunications and the entities are users, and wherein the first type of data records are user data records with static values of user attributes and the second type of data records are call detail records of events. The method may include generating a unique identifier for each user as a combination of a Mobile Station International Subscriber Directory Number (MSISDN) and a Subscriber Identity Module (SIM) activation date.
In this embodiment, filtering and transforming the data records to standardized formats may include applying filtering rules to extract relevant records from transactional records including one or more of the group of: records relating to events including: calls, messages, monetary events, data usage events; lifecycle events; advance usage events; loan events; mobile wallet events. Grouping different categories of transactional records according to the unique identifiers of an entity may include categories of open credit advances and closed credit advances, and a category of transactional records that is most frequently accessed are open credit advances available for subsequent analysis. The output relating to an entity may be a user profile for subsequent credit risk analysis.
According to another aspect of the present invention there is provided a system for large-scale processing of data records, the system including a memory for storing computer-readable program code and a processor for executing the computer-readable program code, the system comprising: a data receiving component for processing large-scale raw data records having no well-defined format and including a first type of data records having static values and a second type of data records relating to transactional records with timestamps of events; a transforming component for transforming data records to standardized formats; a filtering component for filtering relevant data records; a data store persisting component for persisting the transformed data records into two different data stores that support a large amount of records, a first data store for static data relating to entities for efficient retrieval based on static values including a unique identifier of an entity, and a second data store for transactional data for efficient retrieval based on timestamps of events; a grouping component for grouping different categories of transactional records according to the unique identifiers of entities and a table persisting component for persisting the categories to different tables including maintaining a table of stable size for efficient access of a category of transactional records that is most frequently accessed; and an entity profile component for retrieving features relating to an entity from multiple tables for processing to provide an output relating to the entity.
The entity profile component for retrieving features relating to an entity may include feature generation with dimensions of behaviour of the entity at a given reference point in time, and may include: a feature retrieving component for retrieving features from different tables; a feature joining component for joining the features; and a feature transforming component for transforming the features to the reference point in time.
In one embodiment, the raw data records relate to telecommunications and the entities are users, and wherein the first type of data records are user data records with static values of user attributes and the second type of data records are call detail records of events. In such an embodiment, the system may include a unique identifier component for generating a unique identifier for each user as a combination of a Mobile Station International Subscriber Directory Number (MSISDN) and a Subscriber Identity Module (SIM) activation date. The filtering component may apply filtering rules to extract relevant records from transactional records including one or more of the group of: records relating to events including: calls, messages, monetary events, data usage events; lifecycle events; advance usage events; loan events; mobile wallet events. The grouping component may include grouping categories of open credit advances and closed credit advances, and a category of transactional records that is most frequently accessed are open credit advances available for subsequent analysis.
According to a further aspect of the present invention there is provided a computer program product for large-scale processing of data records, the computer program product comprising a computer-readable medium having stored computer-readable program code for performing the steps of: processing large-scale raw data records having no well-defined format and including a first type of data records having static values and a second type of data records relating to transactional records with timestamps of events; filtering and transforming the data records to standardized formats; persisting the transformed data records into two different data stores that support a large amount of records, a first data store for static data relating to entities for efficient retrieval based on static values including a unique identifier of an entity, and a second data store for transactional data for efficient retrieval based on timestamps of events; grouping different categories of transactional records according to the unique identifiers of entities and persisting the categories to different tables including maintaining a table of stable size for efficient access of a category of transactional records that is most frequently accessed; and retrieving features relating to an entity from multiple tables for processing to provide an output relating to the entity.
Further features provide for the computer-readable medium to be a non-transitory computer-readable medium and for the computer-readable program code to be executable by a processing circuit.
According to a further aspect of the present invention there is provided a computer-implemented method for large-scale processing of telecommunication data records, comprising: processing large-scale raw data records having no well-defined format and including a first type of data records having static values relating to a user entity and a second type of data records relating to transactional records of telecommunication events with timestamps of events; filtering and transforming the data records to standardized formats; persisting the transformed data records into two different data stores that support a large amount of records, a first data store for static data relating to entities for efficient retrieval based on static values including a unique identifier of an entity, and a second data store for transactional data for efficient retrieval based on timestamps of events; grouping different categories of transactional records according to the unique identifiers of entities and persisting the categories to different tables including maintaining a table of stable size for efficient access of a category of transactional records that is most frequently accessed relating to open credit advances to an entity; and retrieving features relating to an entity from multiple tables for processing to provide an output in the form of a credit score relating to the entity.
Embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings.
In the drawings:
Figure 1 is a flow diagram showing an example embodiment of a method in accordance with the invention;
Figure 2 is a flow diagram showing a further example embodiment of an aspect of the method of Figure 1;
Figure 3 is a block diagram of an example embodiment of a system in accordance with the invention;
Figure 4 illustrates an example of a computing device in which various aspects of the disclosure may be implemented.
DETAILED DESCRIPTION WITH REFERENCE TO THE DRAWINGS
The described method and system provide for large-scale processing of data records, in which the data records include a first type of data records having static values and a second type of data records relating to transactional records with timestamps of recorded events. In the described example embodiments, the data records are in the form of telecommunication records including a first type of data records being customer demographic records such as Know Your Customer (KYC) records and a second type of data records being Call Detail Records (CDR). However, the described method and system may equally apply to other forms of data records in which there are similar first and second types of data records.
An application of the described method and system is to utilize the telecommunications data for credit scoring subscribers of Mobile Network Operators (MNO), and subsequently using the credit score to assign appropriate credit limits. This may be applied to any type of lending at the MNO including, but not limited to, the provisioning of network usage advances, monetary loans credited on mobile wallets, consumer loans for the purchase of goods (e.g., mobile phones), etc.
Referring to Figure 1, a flow diagram (100) shows an example embodiment of the described method as carried out by a computer-implemented data processing system. The method is described in general terms that are applicable to different forms of data records, with examples given in the field of telecommunication data records. The method incorporates or ingests raw input data records to a data store that is usable for further analysis of the records.
The method may receive and process (101) large-scale raw data input records including a first type of data records having static values and a second type of data records relating to transactional records with timestamps of events. The data input records in the form of telecommunication records may include the first type of data records being customer demographic records such as Know Your Customer (KYC) records and the second type of data records being Call Detail Records (CDR).
The raw telecommunication data which are the input to the system may be provided in a text format (e.g., comma-separated files) or in a binary format. CDRs are of a transactional nature; they describe an event that took place at a specific moment in time, and its attributes (e.g., timestamp) are immutable. KYC records, on the other hand, refer to attributes of a subscriber that are mostly static (e.g., the date a subscriber joined a Mobile Network is a static piece of information), while new attributes may become known in the future with new records.
The method may process (102) the first type of data records having static values first as they contain identifying information for entities to which the data records relate. In the example where data records relate to telecommunications, entities may be users in the form of customers or subscribers to mobile telecommunication services and the first type of data records may be customer data records (e.g., KYC records) with static values of user attributes. The system may perform many passes over the data if needed and may also combine data from different sources and in different formats.
The method may configure (103) data record fields to a standard format. This may be applied to both the first and second types of data records to ensure that fields use a consistent, standard format. For example, configuration may be specific to different MNOs and such configuration items include the time-zone(s) in the country served by the MNO, the currency used, the country calling code, and others.
The method may extract (104) specific values from the raw data in a usable format whilst removing information that is not needed in the subsequent analysis. The purpose of this step is to extract, in a usable format, all the necessary values from the input records. Values include the MSISDN (or other identifier) of the subscriber, the SIM activation date for the subscriber, the timestamp of the event, the duration of the event, if applicable, the monetary amount involved, and other event-specific values. This may extract a unique identifier for an entity, such as a user identifier, if available in the first type of data records.
The method may filter (105) the data records to determine the event type of each input record by applying filtering rules. These rules are either provided by the MNO along with the data or they are constructed after analysis of the data. The event type in telecommunications data may be Recharge, Network Usage Advance, Repayment, Bundle purchase, P2P transfer, etc. The raw input data records may include records that are not needed in the subsequent analysis (such as records regarding the internal workings of the systems of the MNO). The purpose of this step is to classify each record and reject any unnecessary records.
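By way of illustration only, the filtering step (105) may be sketched as an ordered list of rules, each pairing a predicate with an event type. The field name "service", the rule contents and the function names below are hypothetical; actual rules would be provided by the MNO or constructed after analysis of the data.

```python
from typing import Callable, Iterable, Iterator, Optional

# Hypothetical rule set: (predicate over an extracted record, event type).
# The "service" field and its codes are illustrative assumptions only.
Rule = tuple[Callable[[dict], bool], str]

RULES: list[Rule] = [
    (lambda r: r.get("service") == "RECHARGE", "Recharge"),
    (lambda r: r.get("service") == "ADVANCE", "Network Usage Advance"),
    (lambda r: r.get("service") == "REPAY", "Repayment"),
    (lambda r: r.get("service") == "BUNDLE", "Bundle purchase"),
    (lambda r: r.get("service") == "P2P", "P2P transfer"),
]


def classify(record: dict) -> Optional[str]:
    """Return the event type of the first matching rule, or None to reject."""
    for predicate, event_type in RULES:
        if predicate(record):
            return event_type
    return None


def filter_records(records: Iterable[dict]) -> Iterator[tuple[str, dict]]:
    """Yield (event_type, record) pairs, dropping records that match no rule."""
    for record in records:
        event_type = classify(record)
        if event_type is not None:
            yield event_type, record
```

Keeping the rules as data rather than as branching code is one possible design choice; it would allow a different rule list to be supplied per MNO without changing the classification loop.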
The method may transform (106) the filtered data records into a standard format for each type of data record. After the transformation all records of a specific type (e.g., all Recharge records) have the same schema (i.e., the same set of fields) and all values conform to a standardized format. The same schema/format combination is used for all records of a specific type from all MNOs. Example transformations may include:
• the appropriate country calling code is prepended to the MSISDN;
• timestamps are converted to the ISO 8601 format using UTC as time-zone;
• amounts are converted to the appropriate currency unit;
• event-specific data are converted to standardized values.
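A minimal sketch of the first three example transformations is given below, under stated assumptions: national numbers are assumed to carry a trunk prefix "0", the MNO time-zone is modelled as a fixed UTC offset (a real deployment may need daylight-saving-aware zones), and raw amounts are assumed to be expressed in minor currency units. All function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal


def normalise_msisdn(msisdn: str, country_code: str) -> str:
    """Prepend the country calling code to a national number.

    Assumption: national numbers start with a trunk prefix "0";
    anything else is treated as already being in international form.
    """
    digits = msisdn.lstrip("+")
    if digits.startswith("0"):
        return country_code + digits[1:]
    return digits


def to_iso_utc(local_ts: str, fmt: str, utc_offset: timedelta) -> str:
    """Parse a local timestamp and return it in ISO 8601 format in UTC."""
    local = datetime.strptime(local_ts, fmt).replace(tzinfo=timezone(utc_offset))
    return local.astimezone(timezone.utc).isoformat()


def to_major_units(minor_amount: str, minor_units_per_major: int) -> Decimal:
    """Convert an amount in minor units (e.g. cents) to the major currency unit."""
    return Decimal(minor_amount) / Decimal(minor_units_per_major)
```

For instance, with a country calling code of "27" and an offset of two hours, `normalise_msisdn("0821234567", "27")` yields `"27821234567"` and `to_iso_utc("2019-02-22 14:30:00", "%Y-%m-%d %H:%M:%S", timedelta(hours=2))` yields `"2019-02-22T12:30:00+00:00"`.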
The method may generate (107) a unique identifier for each entity, if this is not obtained from the static data values in the first type of data records in step (104) above. For example, the unique identifier for each user entity may be a combination of Mobile Station International Subscriber Directory Number (MSISDN) and Subscriber Identity Module (SIM) activation date. The MSISDN alone is not sufficient, as it may be reused (e.g., when a subscriber terminates its subscription, the MSISDN in question may be assigned to a new subscriber). Therefore, the method may generate a combination of MSISDN with the SIM activation date, if available, which uniquely identifies each subscriber. The SIM activation date should be available from a previous processing step. The generated unique identifier is added to the stored data records.
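The identifier generation of step (107) may be sketched as follows. The underscore separator and the ISO date rendering are illustrative choices, not taken from the description, and the function name is hypothetical.

```python
from datetime import date


def subscriber_id(msisdn: str, sim_activation: date) -> str:
    """Combine the MSISDN with the SIM activation date into a single key.

    The separator and date format are assumptions made for this sketch.
    """
    return f"{msisdn}_{sim_activation.isoformat()}"


# The same MSISDN reassigned to a new subscriber gives a different key.
first_owner = subscriber_id("27821234567", date(2015, 3, 1))
second_owner = subscriber_id("27821234567", date(2019, 1, 15))
```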
The method may persist (108) the transformed records to an appropriate data store that supports a large amount of records, in the order of billions or more.
The first type of data records with static values (e.g., the first KYC data records) including a generated or extracted unique identifier for entities may be persisted in a data store that provides efficient retrieval based on the unique identifier for the entity (e.g., the MSISDN concatenated with the SIM activation date).
The second type of data records in the form of transactional records (e.g., the CDR data records) may be persisted in a data store that provides efficient retrieval based on the timestamp of the event. The records may be partitioned according to their type. For more efficient retrieval of information, the records may be further partitioned per day using the partitioning methods provided by the data store (e.g., if the data store consists of comma-separated files in a typical filesystem, the partitioning may be implemented by using a different file or folder for each day).
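For the filesystem example mentioned above, partitioning by event type and by day may be sketched as a path scheme. The layout `root/event_type/YYYY-MM-DD.csv` and the function names are assumptions made for illustration; other data stores would use their own partitioning facilities.

```python
from datetime import date, datetime, timedelta
from pathlib import PurePosixPath


def partition_path(root: str, event_type: str, iso_timestamp: str) -> PurePosixPath:
    """Return the per-type, per-day file a record belongs to."""
    day = datetime.fromisoformat(iso_timestamp).date()
    return PurePosixPath(root) / event_type / f"{day.isoformat()}.csv"


def partitions_between(root: str, event_type: str, start: date, end: date) -> list[PurePosixPath]:
    """List the daily partitions covering the inclusive range [start, end]."""
    days = (end - start).days
    return [
        PurePosixPath(root) / event_type / f"{(start + timedelta(days=d)).isoformat()}.csv"
        for d in range(days + 1)
    ]
```

With such a layout, a query over a time window only needs to open the files returned by `partitions_between` rather than scanning every record of that type.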
The method may group (109) different categories of transactional records according to the unique identifiers of an entity and persist the categories to different tables, including maintaining a table of stable size for efficient access of a category of transactional records that is most frequently accessed. One category may be ongoing, open, transactional records and another category may be closed transactional records. The closed transactional records will keep growing in size, whereas the open transactional records will be generally stable in size as some transactions close and some new ones open.
In the telecommunication data example, this step may examine all advances and recoveries of credit to a subscriber in order to determine which advances have been repaid, termed "closed", and which advances have not been repaid, termed "open". Advances and repayments may be retrieved from the data store and are grouped together per subscriber (using the unique subscriber identifier). The input to this grouping includes data from previous executions of the method, namely any advances that are still open. Each such group is sorted according to its timestamp, from the least to most recent.
Each group is then examined separately. If an advance is encountered it is recorded for further examination, while if a repayment is encountered it is matched, partially or fully, with a recorded advance, according to the rules set forth by the MNO. A possible rule is that a repayment matches the earliest open advance. An advance that has been fully matched with repayments is marked as closed.
At the end of this process advances that are still open are persisted in one table of the data store and advances that have been closed are persisted in a different table. In one embodiment, there may be one table with all open advances, and in another embodiment, there may be one document table with one document per subscriber. The reason for this split is to ensure efficient retrieval of open advances for subsequent analysis. The count of the open advances is quite stable since in normal operation, at each execution of the system, the count of new advances is in the same order of magnitude as the count of older advances that are closed in said execution. Therefore, the table holding the open advances remains stable in size (which translates to more efficient access) while the table holding the closed advances grows in size after each execution of the system. A separate table is used in order to efficiently update the table. As it is only open advances that are updated, the retrieval and update is faster if there is a table with only open advances.
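The grouping step for one subscriber may be sketched as below, using the possible rule named above that a repayment matches the earliest open advance. This is a simplified sketch: fees and interest are ignored, amounts are integers in minor units, any repayment surplus beyond the open advances is discarded, and new events are assumed to be later than the advances carried over from previous executions. The class and function names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Advance:
    timestamp: str      # ISO 8601 in UTC, so string order equals time order
    amount: int         # original advance, in minor currency units
    outstanding: int    # part of the advance not yet repaid


def match_events(open_advances, events):
    """Match repayments against the earliest open advance first.

    open_advances: advances still open from previous executions.
    events: (timestamp, kind, amount) tuples for one subscriber,
            where kind is "advance" or "repayment".
    Returns (still_open, newly_closed).
    """
    queue = deque(sorted(open_advances, key=lambda a: a.timestamp))
    closed = []
    for ts, kind, amount in sorted(events):
        if kind == "advance":
            queue.append(Advance(ts, amount, amount))
        elif kind == "repayment":
            remaining = amount
            while remaining > 0 and queue:
                earliest = queue[0]
                paid = min(remaining, earliest.outstanding)
                earliest.outstanding -= paid
                remaining -= paid
                if earliest.outstanding == 0:
                    closed.append(queue.popleft())
    return list(queue), closed
```

After each run, `still_open` would be written back to the table of open advances and `newly_closed` appended to the table of closed advances, which is what keeps the former roughly constant in size.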
The method may retrieve (110) features relating to an entity from one or more of the multiple tables for processing to provide an output relating to the entity. Feature generation is carried out with dimensions of behaviour of the entity at a given time including: retrieving features from different tables, joining these, and transforming the extracted features to a reference point.
An output relating to an entity may relate to a user and may include a list with entity identifiers and credit limits with optional additional information. The output may be an extraction of potential borrowers [...] scoring subscribers and assigning credit limits.
Referring to Figure 2, a flow diagram (200) shows an example embodiment of an aspect of processing data records in accordance with the method of Figure 1.
The method may receive (201) a next transactional data record from the raw input data records 15 and may extract (202) record values. Filtering rules may be applied (203) and the type of record determined (204).
It may be determined (205) if the record should be persisted to a data store. If the record is not persisted to a data store, the method may loop to process a next record (201). If the record is 20 persisted to a data store, the record may be transformed (206) to a standard format and a unique identifier may be used (207) for the record. The unique identifier may be generated (207) from the record if this is not available from the extracted record values. The record may be persisted (208) to the data store.
It may be determined (209) if there is another record to process. If so, the method loops to process a next record (201). If there are no further records to process, the method may update (210) an output status such as a loan status for telecommunication subscribers.
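The per-record loop of Figure 2 may be sketched as a single function in which each step is supplied as a callable. The function name, parameter names and the "subscriber_id" field are hypothetical, the data store is modelled as a plain list, and the final status update (210) is left out of the sketch.

```python
def ingest(raw_records, extract, classify, transform, make_id, store):
    """Process raw records one by one; return the number persisted."""
    persisted = 0
    for raw in raw_records:                         # (201) receive next record
        values = extract(raw)                       # (202) extract record values
        event_type = classify(values)               # (203)-(204) apply rules, determine type
        if event_type is None:                      # (205) record is not to be persisted
            continue
        record = transform(event_type, values)      # (206) standard format
        if "subscriber_id" not in record:           # (207) generate identifier if missing
            record["subscriber_id"] = make_id(record)
        store.append(record)                        # (208) persist
        persisted += 1
    return persisted                                # (209) no further records
```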
The method may extract the following event records from the telecommunications data.
• From CDR records:
  ◦ Call events, which are events where a subscriber of the MNO makes or receives a call.
  ◦ Short Message Service (SMS) events, which are events where a subscriber of the MNO sends or receives an SMS.
  ◦ Monetary Recharge events (also called "top-ups"), which are events where a subscriber of a pre-paid mobile service spends an amount of money in order to increase his account balance with the Mobile Network Operator and in so doing keep using the service in the future.
  ◦ Person to Person (P2P) transfers, which are events where a subscriber transfers a [...]
  ◦ Bundle purchases, which are events where a subscriber buys a product offered by the Operator that combines a number of services (voice, data, etc.) with specific volumes for each product. Such products are commonly referred to as bundles.
  ◦ Bundle activations, which are events where a subscriber buys with money from his main account (and therefore already credited to the MNO) a product offered by the Operator that combines a number of services (voice, data, etc.) with specific volumes for each product.
  ◦ Data usage, which are events of data consumption.
  ◦ Lifecycle events, which are events of changes in the status of the subscribers at the MNO.
  ◦ Network airtime usage advances, which are events where airtime/bundles are granted to a subscriber beyond their account balance or contract. These advances are paid back by the subscriber in a future Recharge event.
  ◦ Network usage advance repayments, which are events where network usage advances are repaid.
  ◦ Mobile wallet transactions, which are debit and credit events on the subscriber's mobile wallet.
  ◦ Mobile wallet loans, which are events where a monetary amount is credited on the mobile wallet of the subscriber with the condition to be paid later, commonly along with a fee.
  ◦ Mobile wallet loan repayments, which are events where mobile wallet loans are repaid.
  ◦ Any other type of records that can be extracted from the CDRs.
• From Know Your Customer records: demographic information about subscribers, such as the subscriber's name, date of birth, or address, and data about the relationship of the subscriber to the network, such as price plan selected or subscription date.
Given the input telecommunication data, the output of the method consists of a list where each list item consists of an identifier that uniquely identifies each subscriber associated with a credit limit for this subscriber, and additional information that encapsulates the analysis that led to the specific credit limit. The method may provide a list of any length ranging from one subscriber to millions, and provides the output in real time for small lists and offline for large lists.
The method described with reference to Figure 1 is a method of efficient data ingestion, including incorporating the provided raw input records into a data store that is usable for analysis. This may be used for different applications, one of which is entity profiling (including user profiling), credit scoring and limit setting for telecommunication subscribers.
An entity profiling method, described here in the context of user profiling, may be provided by the described method; it returns user profiles for subsequent credit risk analysis and limit assignment. A user profile is a collection of aggregate and static values derived from the user data processed in the data ingestion method described with reference to Figure 1. These values are called features and describe the user's behaviour given a reference time point. The method may provide user profiles in the format required for credit risk analysis and limit assignment.
The method generates thousands of features describing the behaviour of users at a given reference point in time. The features are aggregates of transactional events (recharges, advances, etc.), categorical and cardinal features based on KYC, and several combinations of these. The features describe several dimensions of user behaviour on different time frames. Dimensions of user behaviour include recency, loyalty, frequency, and aggregates of amounts and durations. They also include minimum, maximum, counts, means, standard deviations, medians, quartiles and trends of counts, amounts and duration. In addition, they include ratios, and average amounts and activity. Time frames at which dimensions are expressed before the reference point are recent weeks, fortnights and months, as well as quarters, semesters and years. In addition, several features describe the dimensions across the full duration of a service/product usage by the subscriber.
The table below shows a small subset of features relevant to a user’s recharge pattern.
Short Name     Description
lrdt           Last Recharge Date (before RD, or last day of RM if RD is missing)
lrd[...]       Days passed from [...]
rvXtoYma       Total recharge value in X to Y calendar months ago
mrvXtoYma      Mean of monthly (calendar) recharge values of X to Y months ago
rvstdXtoYma    Standard deviation of monthly (calendar) recharge values of X to Y months ago
rnXma          Number of recharges X calendar months ago
mrnXtoYma      Mean of monthly (calendar) recharge number of X to Y months ago
rnstdXtoYma    Standard deviation of monthly (calendar) recharge number of X to Y months ago
mxrXma         Maximum Recharge Amount X calendar months ago
mxrXtoYma      Maximum Recharge Amount in X to Y calendar months ago
arvXma         Average Recharge Value (the average value of top-ups) X months ago
arvXtoYma      Average Recharge Value (the average value of top-ups) X to Y months ago
mwrXtoYma      Months with Recharges in X to Y months ago
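The naming pattern of the recharge features can be illustrated with a minimal sketch. It assumes recharges are available as (date, amount) pairs and that "X to Y months ago" means whole calendar months counted back from the reference date; the function names, the use of a population standard deviation, and the treatment of months without recharges as zero are assumptions made for illustration, not details taken from the described method.

```python
from datetime import date
from statistics import mean, pstdev


def months_ago(ref: date, day: date) -> int:
    """Whole calendar months between `day` and the reference month."""
    return (ref.year - day.year) * 12 + (ref.month - day.month)


def recharge_features(recharges, ref: date, x: int, y: int) -> dict:
    """Aggregate recharges that fall X to Y calendar months before `ref`."""
    # One bucket per calendar month in the window, so empty months count as 0.
    value = {m: 0.0 for m in range(x, y + 1)}
    count = {m: 0 for m in range(x, y + 1)}
    largest = 0.0
    for day, amount in recharges:
        m = months_ago(ref, day)
        if x <= m <= y:
            value[m] += amount
            count[m] += 1
            largest = max(largest, amount)
    monthly_values = list(value.values())
    monthly_counts = list(count.values())
    suffix = f"{x}to{y}ma"
    return {
        f"rv{suffix}": sum(monthly_values),        # total recharge value
        f"mrv{suffix}": mean(monthly_values),      # mean monthly value
        f"rvstd{suffix}": pstdev(monthly_values),  # std of monthly values (population, assumed)
        f"mrn{suffix}": mean(monthly_counts),      # mean monthly number of recharges
        f"mxr{suffix}": largest,                   # maximum single recharge
        f"mwr{suffix}": sum(1 for c in monthly_counts if c > 0),  # months with recharges
    }


example = recharge_features(
    [(date(2019, 1, 5), 10.0), (date(2019, 1, 20), 30.0),
     (date(2019, 2, 11), 20.0), (date(2018, 12, 1), 50.0)],
    ref=date(2019, 4, 15), x=1, y=3,
)
```

The December recharge in the example lies four calendar months before the reference date, so it falls outside the 1 to 3 month window and does not contribute to any of the aggregates.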
Most of the features on the user profile have a temporal dimension; e.g. rv1ma, recharge value one month ago. Retrieving aggregate features in bulk mode (i.e. for many users) is faster when data are partitioned (i.e. indexed according to time) as described in Figure 1.
The method of user profiling includes handling configuration that is specific to each source of data, such as an MNO. The configuration items include the time-zone, the list of features that can be retrieved, and the data store and tables they are to be retrieved from. For optimized execution several features can be retrieved from different tables.
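A per-source configuration of this kind might be sketched as follows. The class names, field names, store and table names are all hypothetical; the sketch only shows how a time-zone, a list of retrievable features, and their store and table locations could be held together, and how requested features could be grouped so that each table is read once.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FeatureLocation:
    """Where a feature lives (hypothetical store and table names)."""
    store: str
    table: str


@dataclass
class SourceConfig:
    """Configuration specific to one data source, such as an MNO."""
    source: str
    time_zone: str
    features: Dict[str, FeatureLocation] = field(default_factory=dict)

    def retrieval_plan(self, wanted: List[str]) -> Dict[Tuple[str, str], List[str]]:
        """Group requested features by (store, table) so each table is read once."""
        plan: Dict[Tuple[str, str], List[str]] = {}
        for name in wanted:
            if name not in self.features:
                raise KeyError(f"feature {name!r} is not configured for {self.source}")
            loc = self.features[name]
            plan.setdefault((loc.store, loc.table), []).append(name)
        return plan


config = SourceConfig(
    source="example_mno",          # hypothetical source name
    time_zone="Africa/Johannesburg",
    features={
        "rv1to3ma": FeatureLocation("transactional", "recharges_monthly"),
        "mrv1to3ma": FeatureLocation("transactional", "recharges_monthly"),
        "tariff_plan": FeatureLocation("static", "subscribers"),
    },
)
plan = config.retrieval_plan(["rv1to3ma", "tariff_plan", "mrv1to3ma"])
```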
The method then retrieves the features from the persisted data of the data ingestion method and joins the features retrieved from all different data stores and tables. The extracted features are then joined to the reference point in time.
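The retrieve-and-join step can be pictured with a small in-memory sketch. It assumes each table has already been read into a mapping from user identifier to feature values; the function name, the table names and the `reference_date` key are illustrative assumptions rather than part of the described method.

```python
from datetime import date


def join_features(tables: dict, user_ids: list, reference: date) -> dict:
    """Join per-table feature rows into one profile per user."""
    # Start every profile with the reference point so it travels with the features.
    profiles = {uid: {"reference_date": reference} for uid in user_ids}
    for table_name, rows in tables.items():
        for uid, feats in rows.items():
            if uid in profiles:  # ignore users that were not requested
                profiles[uid].update(feats)
    return profiles


tables = {
    "subscribers": {"u1": {"tariff_plan": "prepaid"}, "u2": {"tariff_plan": "hybrid"}},
    "recharges_monthly": {"u1": {"rv1to3ma": 60.0}, "u3": {"rv1to3ma": 5.0}},
}
profiles = join_features(tables, ["u1", "u2"], date(2019, 4, 15))
```

A user that is missing from one table simply lacks those features in the joined profile; how such gaps should be filled is left open here.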
A method of credit scoring and limit setting is an example use of the user profiles generated above to develop credit scoring models, to credit score the subscribers, and to assign credit limits.
Credit scoring models predict with high accuracy the probability that a subscriber will default on a loan or a credit service. Credit scoring models can be developed following statistical, Machine Learning and Artificial Intelligence methods and are selected based on their performance. Either one model can be developed for the total base of the MNO, or different models can be developed for different segments of the base.
The credit scoring model can then be used to calculate the credit score of each subscriber and assign an appropriate credit limit. Given the credit score, multiple ways can be used to assign a credit limit, with the rationale being that, everything else being equal, subscribers with a higher credit score (and therefore more creditworthy) will receive a higher credit limit, while subscribers with a lower score will receive a lower credit limit, and subscribers with the lowest score may not be allowed to borrow.
An example of a method for the assignment of a credit limit given the credit score is the threshold-based method. This method consists of an ordered list of score thresholds and associated credit limits. The score of the subscriber is checked against this list, and the credit limit associated with the highest threshold that is smaller than said score is used as the credit limit of said subscriber.
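The threshold-based method lends itself to a short sketch. The threshold and limit values below are invented for illustration, and returning a limit of zero when no threshold lies below the score is an assumption that mirrors the remark that the lowest-scoring subscribers may not be allowed to borrow.

```python
from bisect import bisect_left


def credit_limit(score: float, thresholds: list, limits: list) -> float:
    """Return the limit tied to the highest threshold strictly below `score`.

    `thresholds` must be sorted in ascending order and aligned with `limits`.
    """
    # bisect_left gives the number of thresholds strictly smaller than the score.
    below = bisect_left(thresholds, score)
    if below == 0:
        return 0.0  # assumed behaviour: no threshold is below the score, so no credit
    return limits[below - 1]


THRESHOLDS = [300, 500, 700]   # illustrative score thresholds
LIMITS = [5.0, 20.0, 50.0]     # illustrative credit limits
```

With these values, a score of 650 maps to a limit of 20.0, while a score of exactly 500 maps to 5.0 because the threshold must be smaller than the score.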
Referring to Figure 3, a block diagram illustrates an example embodiment of the described system (300) for large-scale processing of data records. The system (300) includes at least one processor (301) or multiple processors running in parallel and memory (302) for storing computer-readable program code in the form of computer instructions (303).
The system (300) includes a data ingesting system (310), an entity profile component (330) with outputs to applications such as a model development component (340) and a credit score and limit assignment component (350).
The data ingesting system (310) may include a data receiving component for processing large-scale raw data records having no well-defined format and including a first type of data records having static values and a second type of data records relating to transactional records with timestamps of events. In an example embodiment, the raw data records relate to telecommunications data.
The data ingesting system (310) may include a configuration component (312) for configuring the data record fields to standard formats and an extracting component (313) for extracting specific values from the raw data in usable format.
The data ingesting system (310) may include a filtering component (315) for filtering relevant data records by applying filtering rules to extract relevant records from transactional records and a transforming component (314) for transforming data records to standardized formats.
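A minimal sketch of filtering and transforming raw records is given here. The raw field names, the field mapping, the set of relevant event types and the epoch-seconds timestamp format are all assumptions chosen for illustration; real sources would need their own configured mappings and rules.

```python
from datetime import datetime, timezone

# Hypothetical mapping from source-specific field names to standard names.
FIELD_MAP = {"MSISDN": "user", "msisdn": "user", "ts": "timestamp",
             "event_time": "timestamp", "type": "event", "amt": "amount"}
# Hypothetical filtering rule: keep only these event types.
RELEVANT_EVENTS = {"recharge", "advance"}


def standardize(raw: dict) -> dict:
    """Rename fields and coerce values to a standard format."""
    rec = {FIELD_MAP.get(key, key): value for key, value in raw.items()}
    rec["event"] = str(rec["event"]).strip().lower()
    # Timestamps are assumed to arrive as epoch seconds, as text or as a number.
    rec["timestamp"] = datetime.fromtimestamp(int(rec["timestamp"]), tz=timezone.utc)
    rec["amount"] = float(rec.get("amount", 0))
    return rec


def ingest(raw_records: list) -> list:
    """Standardize each record, then keep only the relevant event types."""
    kept = []
    for raw in raw_records:
        try:
            rec = standardize(raw)
        except (KeyError, ValueError, TypeError):
            continue  # malformed record: skipped in this sketch
        if rec["event"] in RELEVANT_EVENTS:
            kept.append(rec)
    return kept


cleaned = ingest([
    {"MSISDN": "27820000001", "ts": "1550836800", "type": " Recharge ", "amt": "10"},
    {"msisdn": "27820000002", "event_time": 1550840400, "type": "voicemail"},
    {"msisdn": "27820000003", "type": "recharge"},  # no timestamp, so it is dropped
])
```

The sketch normalizes the event type before applying the filter so that the rule can be written once against the standard vocabulary; applying the rules before transformation is equally possible.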
The data ingesting system (310) may include a data store persisting component (316) for persisting the transformed data records into two different data stores (320, 321) that support a large amount of records. A first data store (320) is for static data relating to entities for efficient retrieval based on static values including a unique identifier of an entity, and a second data store (321) is for transactional data for efficient retrieval based on timestamps of events. The second data store (321) may be partitioned for time periods and/or by type of event.
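The split into two stores can be sketched with in-memory stand-ins. The class names and the choice of calendar month as the partition period are assumptions; an actual deployment would use storage systems suited to large record volumes, which this sketch does not attempt to model.

```python
from collections import defaultdict
from datetime import date


class StaticStore:
    """Static attributes keyed by the unique identifier of an entity."""

    def __init__(self):
        self._rows = {}

    def put(self, entity_id: str, attributes: dict) -> None:
        self._rows[entity_id] = dict(attributes)

    def get(self, entity_id: str):
        return self._rows.get(entity_id)


class TransactionStore:
    """Transactional rows partitioned by event type and calendar month."""

    def __init__(self):
        self._partitions = defaultdict(list)

    def put(self, event: str, day: date, row: dict) -> None:
        self._partitions[(event, day.year, day.month)].append(row)

    def scan(self, event: str, year: int, month: int) -> list:
        # Only the requested partition is touched; other periods are never read.
        return list(self._partitions.get((event, year, month), []))


static_store = StaticStore()
static_store.put("u1", {"tariff_plan": "prepaid"})

tx_store = TransactionStore()
tx_store.put("recharge", date(2019, 1, 5), {"user": "u1", "amount": 10.0})
tx_store.put("recharge", date(2019, 2, 11), {"user": "u1", "amount": 20.0})
tx_store.put("advance", date(2019, 1, 7), {"user": "u1", "amount": 5.0})
```

Reading one month of one event type then costs a single partition lookup regardless of how many other months or event types have been stored.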
The data ingesting system (310) may include a grouping component (318) for grouping different categories of transactional records according to the unique identifiers of an entity and a table persisting component (319) is provided for persisting the categories to different tables including maintaining a table (322) of stable size for efficient access of a category of transactional records that is most frequently accessed.
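One way to picture a table of stable size is to keep ongoing (open) transactions apart from completed (closed) ones, so that the frequently read table is bounded by the number of ongoing transactions. The sketch below uses credit advances as the example category; the class and method names are hypothetical, and treating the open table as the one of stable size is an interpretation made for illustration.

```python
class AdvanceTables:
    """Open advances in a small table, closed advances in a growing archive."""

    def __init__(self):
        self.open = {}    # advance_id -> record; holds only ongoing advances
        self.closed = []  # append-only history of completed advances

    def open_advance(self, advance_id: str, user: str, amount: float) -> None:
        self.open[advance_id] = {"user": user, "amount": amount}

    def close_advance(self, advance_id: str) -> None:
        # Moving the record out keeps the open table from growing with history.
        record = self.open.pop(advance_id)
        self.closed.append({"advance_id": advance_id, **record})

    def open_for(self, user: str) -> list:
        return [aid for aid, rec in self.open.items() if rec["user"] == user]


advances = AdvanceTables()
advances.open_advance("a1", "u1", 5.0)
advances.open_advance("a2", "u1", 3.0)
advances.open_advance("a3", "u2", 8.0)
advances.close_advance("a1")
```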
The data ingesting system (310) may include a unique identifier component (317) for generating a unique identifier for entities of the data records, if this is not extracted by the extracting component (313). For example, entities may be telecommunication subscribers. The data ingesting system (310) may include a status update component (360) for updating status results further to processing data records.
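For telecommunication subscribers, a unique identifier may be formed as a combination of the MSISDN and the SIM activation date. A minimal sketch of such a combination is shown here; the function name, the normalization of the MSISDN to digits only, and the underscore-separated output format are assumptions.

```python
from datetime import date


def entity_id(msisdn: str, sim_activation: date) -> str:
    """Combine an MSISDN and a SIM activation date into one identifier."""
    # Keep digits only so that "+27 82 ..." and "2782..." yield the same key.
    digits = "".join(ch for ch in msisdn if ch.isdigit())
    if not digits:
        raise ValueError("MSISDN contains no digits")
    return f"{digits}_{sim_activation:%Y%m%d}"


first_owner = entity_id("+27 82 000 0001", date(2018, 3, 9))
second_owner = entity_id("27820000001", date(2019, 1, 15))
```

Including the activation date distinguishes successive holders of a recycled number, which the MSISDN alone would not do.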
The entity profile component (330) may retrieve features relating to an entity from multiple tables of the data stores (320, 321). The entity profile component (330) may generate user profiles when the entity is a user. The entity profile component (330) for retrieving features relating to an entity includes feature generation with dimensions of behaviour of the entity at a given time and, in one embodiment, includes a feature retrieving component (331) for retrieving features from different tables, a feature joining component (332) for joining the features, and a feature transforming component (333) for transforming the features to a reference point.
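What transforming features to a reference point involves is not spelled out in detail. One possible reading, sketched here purely as an assumption, is that date-valued features are re-expressed relative to the reference date (for example, a last recharge date becomes a number of days passed), while other features pass through unchanged. The function name and the `_days` suffix are hypothetical.

```python
from datetime import date


def to_reference_point(features: dict, reference: date) -> dict:
    """Express date-valued features as days elapsed before the reference date."""
    transformed = {}
    for name, value in features.items():
        if isinstance(value, date):
            # Assumed convention: positive numbers mean "days before the reference".
            transformed[name + "_days"] = (reference - value).days
        else:
            transformed[name] = value
    return transformed


profile = to_reference_point(
    {"lrdt": date(2019, 2, 1), "rv1to3ma": 60.0},
    reference=date(2019, 2, 22),
)
```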
The entity profile component (330) may provide an output from an output component (334) relating to the entity. The output relating to an entity may be a user profile for subsequent credit risk analysis by use of the model development component (340) and the credit score and limit assignment component (350).
Figure 4 illustrates an example of a computing device (400) in which various aspects of the disclosure may be implemented. The computing device (400) may be embodied as any form of data processing device including a personal computing device (e.g. laptop or desktop computer), a server computer (which may be self-contained or physically distributed over a number of locations), a client computer, or a communication device, such as a mobile phone (e.g. cellular telephone), satellite phone, tablet computer, personal digital assistant or the like. Different embodiments of the computing device may dictate the inclusion or exclusion of various components or subsystems described below.
The computing device (400) may be suitable for storing and executing computer program code. The various participants and elements in the previously described system diagrams may use any suitable number of subsystems or components of the computing device (400) to facilitate the functions described herein. The computing device (400) may include subsystems or components interconnected via a communication infrastructure (405) (for example, a communications bus, a network, etc.). The computing device (400) may include one or more processors (410) and at least one memory component in the form of computer-readable media. The one or more processors (410) may include one or more of: CPUs, graphical processing units (GPUs), microprocessors, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs) and the like. In some configurations, a number of processors may be provided and may be arranged to carry out calculations simultaneously. In some implementations various subsystems or components of the computing device (400) may be distributed over a number of physical locations (e.g. in a distributed, cluster or cloud-based computing configuration) and appropriate software units may be arranged to manage and/or process data on behalf of remote devices.
The memory components may include system memory (415), which may include read only memory (ROM) and random access memory (RAM). A basic input/output system (BIOS) may be stored in ROM. System software may be stored in the system memory (415) including operating system software. The memory components may also include secondary memory (420). The secondary memory (420) may include a fixed disk (421), such as a hard disk drive, and, optionally, one or more storage interfaces (422) for interfacing with storage components (423), such as removable storage components (e.g. magnetic tape, optical disk, flash memory drive, external hard drive, removable memory chip, etc.), network attached storage components (e.g. NAS drives), remote storage components (e.g. cloud-based storage) or the like.
The computing device (400) may include an external communications interface (430) for operation of the computing device (400) in a networked environment enabling transfer of data between multiple computing devices (400) and/or the Internet. Data transferred via the external communications interface (430) may be in the form of signals, which may be electronic, electromagnetic, optical, radio, or other types of signal. The external communications interface (430) may enable communication of data between the computing device (400) and other computing devices including servers and external storage facilities. Web services may be accessible by and/or from the computing device (400) via the communications interface (430).
The external communications interface (430) may be configured for connection to wireless communication channels (e.g., a cellular telephone network, wireless local area network (e.g. using Wi-Fi™), satellite-phone network, Satellite Internet Network, etc.) and may include an associated wireless transfer element, such as an antenna and associated circuitry.
The computer-readable media in the form of the various memory components may provide [...] and other data. A computer program product may be provided by a computer-readable medium having stored computer-readable program code executable by the central processor (410). A computer program product may be provided by a non-transient computer-readable medium, or may be provided via a signal or other transient means via the communications interface (430).
Interconnection via the communication infrastructure (405) allows the one or more processors (410) to communicate with each subsystem or component and to control the execution of instructions from the memory components, as well as the exchange of information between subsystems or components. Peripherals (such as printers, scanners, cameras, or the like) and input/output (I/O) devices (such as a mouse, touchpad, keyboard, microphone, touch-sensitive display, input buttons, speakers and the like) may couple to or be integrally formed with the computing device (400) either directly or via an I/O controller (435). One or more displays (445) (which may be touch-sensitive displays) may be coupled to or integrally formed with the computing device (400) via a display (445) or video adapter (440).
[...] be exhaustive or to limit the invention to the precise forms disclosed. Persons skilled in the relevant art can appreciate that many modifications and variations are possible in light of the above disclosure.
Any of the steps, operations, components or processes described herein may be performed or implemented with one or more hardware or software units, alone or in combination with other devices. In one embodiment, a software unit is implemented with a computer program product comprising a non-transient computer-readable medium containing computer program code, which can be executed by a processor for performing any or all of the steps, operations, or processes described. Software units or functions described in this application may be implemented as computer program code using any suitable computer language such as, for example, Java™, C++, or Perl™ using, for example, conventional or object-oriented techniques. The computer program code may be stored as a series of instructions, or commands on a non-transitory computer-readable medium, such as a random access memory (RAM), a read-only memory (ROM), a magnetic medium such as a hard-drive, or an optical medium such as a CD-ROM. Any such computer-readable medium may also reside on or within a single computational apparatus, and may be present on or within different computational apparatuses within a system or network.
Flowchart illustrations and block diagrams of methods, systems, and computer program products according to embodiments are used herein. Each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, may provide functions which may be implemented by computer readable program instructions. In some alternative implementations, the functions identified by the blocks may take place in a different order to that shown in the flowchart illustrations.
Some portions of this description describe the embodiments of the invention in terms of algorithms and symbolic representations of operations on information. These algorithmic descriptions and representations are commonly used by those skilled in the data processing arts to convey the substance of their work effectively to others skilled in the art. These operations, while described functionally, computationally, or logically, are understood to be implemented by computer programs or equivalent electrical circuits, microcode, or the like. The described operations may be embodied in software, firmware, hardware, or any combinations thereof.
The language used in the specification has been principally selected for readability and instructional purposes, and it may not have been selected to delineate or circumscribe the inventive subject matter. It is therefore intended that the scope of the invention be limited not by this detailed description, but rather by any claims that issue on an application based hereon. Accordingly, the disclosure of the embodiments of the invention is intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the following claims.
Finally, throughout the specification and claims, unless the context requires otherwise, the word ‘comprise’ or variations such as ‘comprises’ or ‘comprising’ will be understood to imply the inclusion of a stated integer or group of integers but not the exclusion of any other integer or group of integers.

Claims (12)

1. A computer-implemented method for large-scale processing of data records, comprising:
processing large-scale raw data records having no well-defined format and including a first type of data records having static values and a second type of data records relating to transactional records with timestamps of events;
filtering and transforming the data records to standardized formats;
persisting the transformed data records into two different data stores that support a large amount of records, a first data store for static data relating to entities for efficient retrieval based on static values including a unique identifier of an entity, and a second data store for transactional data for efficient retrieval based on timestamps of events;
grouping different categories of transactional records according to the unique identifiers of entities and persisting the categories to different tables including maintaining a table of stable size for efficient access of a category of transactional records that is most frequently accessed; and
retrieving features relating to an entity from multiple tables for processing to provide an output relating to the entity.
2. The method as claimed in claim 1, wherein retrieving features relating to an entity includes feature generation with dimensions of behaviour of the entity at a given reference point in time, and includes:
retrieving features from different tables;
joining the features; and
transforming the features to the reference point in time.
3. The method as claimed in claim 1 or claim 2, wherein the second data store is partitioned for time periods and/or by type of event.
4. The method as claimed in any one of claims 1 to 3, wherein the category of transactional records that is most frequently accessed is open transactions that are ongoing.
5. The method as claimed in any one of the preceding claims including: processing the static data records first to obtain identifying information of the entities.
6. The method as claimed in any one of the preceding claims, wherein the raw data records relate to telecommunications and the entities are users, and wherein the first type of data records are user data records with static values of user attributes and the second type of data records are call detail records of events.
7. [...] as a combination of a Mobile Station International Subscriber Directory Number (MSISDN) and a Subscriber Identity Module (SIM) activation date.
8. The method as claimed in claim 6 or claim 7, wherein filtering and transforming the data records to standardized formats include applying filtering rules to extract relevant records from transactional records including one or more of the group of: records relating to events including: calls, messages, monetary events, data usage events; lifecycle events; advance usage events; loan events; mobile wallet events.
9. The method as claimed in any one of claims 6 to 8, wherein grouping different categories of transactional records according to the unique identifiers of an entity includes categories of open credit advances and closed credit advances, and a category of transactional records that is most frequently accessed are open credit advances available for subsequent analysis.
10. The method as claimed in any one of the preceding claims, wherein the output relating to an entity is a user profile for subsequent credit risk analysis.
11. A system for large-scale processing of data records, the system including a memory for storing computer-readable program code and a processor for executing the computer-readable program code, the system comprising:
a data receiving component for processing large-scale raw data records having no well-defined format and including a first type of data records having static values and a second type of data records relating to transactional records with timestamps of events;
a transforming component for transforming data records to standardized formats;
a filtering component for filtering relevant data records;
a data store persisting component for persisting the transformed data records into two different data stores that support a large amount of records, a first data store for static data relating to entities for efficient retrieval based on static values including a unique identifier of an entity, and a second data store for transactional data for efficient retrieval based on timestamps of events;
a grouping component for grouping different categories of transactional records [...] categories to different tables including maintaining a table of stable size for efficient access of a category of transactional records that is most frequently accessed; and
an entity profile component for retrieving features relating to an entity from multiple tables for processing to provide an output relating to the entity.
12. A computer program product for large-scale processing of data records, the computer program product comprising a computer-readable medium having stored computer-readable program code for performing the steps of:
processing large-scale raw data records having no well-defined format and including a first type of data records having static values and a second type of data records relating to transactional records with timestamps of events;
filtering and transforming the data records to standardized formats;
persisting the transformed data records into two different data stores that support a large amount of records, a first data store for static data relating to entities for efficient retrieval based on static values including a unique identifier of an entity, and a second data store for transactional data for efficient retrieval based on timestamps of events;
grouping different categories of transactional records according to the unique identifiers of entities and persisting the categories to different tables including maintaining a table of stable size for efficient access of a category of transactional records that is most frequently accessed; and
retrieving features relating to an entity from multiple tables for processing to provide an output relating to the entity.
13. A computer-implemented method for large-scale processing of telecommunication data records, comprising:
processing large-scale raw data records having no well-defined format and including a first type of data records having static values relating to a user entity and a second type of data records relating to transactional records of telecommunication events with timestamps of events;
filtering and transforming the data records to standardized formats;
persisting the transformed data records into two different data stores that support a large amount of records, a first data store for static data relating to entities for efficient retrieval based on static values including a unique identifier of an entity, and a second data store for transactional data for efficient retrieval based on timestamps of events;
grouping different categories of transactional records according to the unique identifiers of entities and persisting the categories to different tables including maintaining a table of stable size for efficient access of a category of transactional records that is most frequently accessed relating to open credit advances to an entity; and
retrieving features relating to an entity from multiple tables for processing to provide an output in the form of a credit score relating to the entity.
OA1201900296 2019-02-22 2019-07-22 Large-scale processing of data records with efficient retrieval. OA19113A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
ZA2019/01131 2019-02-22

Publications (1)

Publication Number Publication Date
OA19113A true OA19113A (en) 2020-01-20
