CN113849514A - Method and device for generating multidimensional key value pair data - Google Patents

Info

Publication number
CN113849514A
Authority
CN
China
Prior art keywords
attribute
key
group
value
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111134047.3A
Other languages
Chinese (zh)
Inventor
叶睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202111134047.3A
Publication of CN113849514A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • G06F16/284 Relational databases

Abstract

Embodiments of this specification provide a method and a device for generating multidimensional key-value pair data. In the method, an execution subject obtains a plurality of initial service data, where each initial service data comprises attribute values of a plurality of attributes. A first attribute group and the specified attribute values of the attributes contained in the first attribute group are then determined, and the key in the key-value pair data to be generated is determined based on the first attribute group and the corresponding specified attribute values. The first attribute group includes one or more attributes. Next, the plurality of initial service data are screened based on the first attribute group and the corresponding specified attribute values to obtain selected service data. Finally, data processing is performed on the attribute values in the selected service data based on a preset data processing mode, and the value in the key-value pair data is determined from the data processing result.

Description

Method and device for generating multidimensional key value pair data
Technical Field
One or more embodiments of the present disclosure relate to the field of data processing technologies, and in particular, to a method and an apparatus for generating multidimensional key value pair data.
Background
With the development of technology, service platforms can provide more and more services. A service platform records a large amount of service data related to its services, and analyzing this data can further improve the service level. Business data resembles log data and usually records several aspects of a transaction. For example, in a bank loan scenario, one business datum may include the borrower's name, sex, age, and city, the loan term, the loan amount, whether the loan has been repaid, whether it is overdue, and the like. Such service data carries some information but has weak expressive power. There is currently a need to extract deeper information from these business data.
Therefore, improved schemes are desired that extract deeper features from the business data and improve the expressive power of the data.
Disclosure of Invention
One or more embodiments of the present specification describe a method and an apparatus for generating multidimensional key value pair data to extract deeper features from business data and improve the expression capability of the data. The specific technical scheme is as follows.
In a first aspect, an embodiment provides a method for generating multidimensional key-value pair data, including:
acquiring a plurality of initial service data, wherein one initial service data comprises attribute values of a plurality of attributes;
determining a first attribute group and designated attribute values of attributes contained in the first attribute group, and determining keys in key value pair data to be generated based on the first attribute group and the corresponding designated attribute values;
screening the plurality of initial service data based on the first attribute group and the corresponding designated attribute value to obtain selected service data;
and performing data processing on the attribute values in the selected service data based on a preset data processing mode, and determining the values in the key value pair data based on the data processing result.
In one embodiment, the key includes a field part and a field value part, wherein one group of field values corresponds to one sub-key; the key is determined in the following way:
determining the attributes in the first attribute group as the fields contained in the key;
determining, based on the specified attribute values of the attributes in the first attribute group, one or more groups of field values contained in the key, each group serving as a corresponding sub-key of the key.
In one embodiment, the first attribute group includes a plurality of attributes, and any attribute in the first attribute group has one or more specified attribute values;
the step of determining the attributes in the first attribute group as the fields contained in the key includes:
determining the plurality of attributes in the first attribute group as a plurality of fields contained in the key, respectively;
the step of determining one or more groups of field values contained in the key based on the specified attribute values of the attributes in the first attribute group includes:
for the plurality of attributes in the first attribute group, when each of the attributes has exactly one specified attribute value, determining the specified attribute values of the attributes as the field values of the corresponding fields contained in the key, to obtain one group of field values;
for the plurality of attributes in the first attribute group, when at least one of the attributes has multiple specified attribute values, combining the specified attribute values of the attributes to obtain a plurality of attribute value combinations, and determining each attribute value combination as the field values of the corresponding fields contained in the key, to obtain a plurality of groups of field values.
In one embodiment, the step of filtering the plurality of initial service data based on the first attribute group and the corresponding specified attribute value includes:
for any sub-key contained in the key, screening the plurality of initial service data by using the attributes and specified attribute values corresponding to the sub-key, to obtain a group of selected service data corresponding to the sub-key.
In one embodiment, the step of performing data processing on the attribute values in the selected service data based on a preset data processing manner and determining the values in the key value pair data based on the data processing result includes:
for any sub-key contained in the key, performing data processing on the attribute values in the group of selected service data corresponding to the sub-key based on a preset data processing mode, and determining the obtained data processing result as the value in the key-value pair data corresponding to the sub-key.
In one embodiment, the step of performing data processing on the attribute values in the selected service data includes:
determining a second attribute group;
performing, based on a preset data processing mode, data processing on the attribute values of the second attribute group in the selected service data.
In one embodiment, the value includes a field portion and a field value portion; the data processing mode comprises one or more modes; any first data processing mode corresponds to the first field of the value;
the step of performing data processing on the attribute values in the selected service data based on a preset data processing mode and determining the values in the key value pair data based on the data processing result includes:
performing data processing on the attribute values in the selected service data based on the first data processing mode, and determining the obtained data processing result as the field value of the first field.
In one embodiment, the data processing means includes at least one of the following means:
summing, averaging, counting the set attribute values, calculating the variance, calculating the standard deviation and calculating the covariance.
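The processing modes listed above can be sketched as a small registry in which each mode name becomes one field of the value. This is only an illustration under stated assumptions: the names `PROCESSING_MODES` and `covariance`, the sample numbers, and the reading of "counting" as the number of values are hypothetical and do not come from the specification.

```python
import statistics

def covariance(xs, ys):
    """Sample covariance of two equally long sequences (needs at least 2 points)."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)

# Hypothetical registry: one entry per single-attribute processing mode.
PROCESSING_MODES = {
    "sum": sum,
    "average": statistics.fmean,
    "count": len,                          # assumed meaning of "counting"
    "variance": statistics.variance,       # sample variance
    "standard_deviation": statistics.stdev,
}

# Hypothetical attribute values taken from a group of selected service data.
amounts = [5000, 5000, 4000]
terms = [12, 6, 6]

# Each mode name is a field of the value; each result is that field's value.
value = {name: fn(amounts) for name, fn in PROCESSING_MODES.items()}
value["covariance_amount_term"] = covariance(amounts, terms)
```

Covariance is kept outside the registry because it takes two attributes rather than one.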
In one embodiment, there are a plurality of first attribute groups; the method further includes:
determining an association relationship among the attributes contained in the plurality of first attribute groups, wherein the association relationship is used to identify the relationship between different key-value pair data.
In one embodiment, after obtaining the key-value pair data, the method further comprises:
storing the key-value pair data in a database;
when data retrieval is required, data is retrieved from the database using a number of data engines.
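As a rough sketch of the storage and retrieval step, the following uses an in-memory SQLite table as a stand-in database. The specification does not name a database or data engine, so SQLite, the table name `kv`, the JSON encoding of sub-keys, and the helpers `put` and `get` are all assumptions made for illustration.

```python
import json
import sqlite3

# Assumed stand-in for "a database": an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)")

def encode_key(sub_key):
    """Serialize a sub-key deterministically so field order does not matter."""
    return json.dumps(sub_key, sort_keys=True, ensure_ascii=False)

def put(sub_key, value):
    """Store one key-value pair; an existing pair with the same sub-key is replaced."""
    conn.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)",
                 (encode_key(sub_key), json.dumps(value)))

def get(sub_key):
    """Return the stored value for a sub-key, or None if it is absent."""
    row = conn.execute("SELECT v FROM kv WHERE k = ?", (encode_key(sub_key),)).fetchone()
    return json.loads(row[0]) if row else None

put({"city": "Beijing", "borrowing_purpose": "1", "borrowing_amount": "5k"}, {"count": 2})
found = get({"borrowing_amount": "5k", "city": "Beijing", "borrowing_purpose": "1"})
missing = get({"city": "Shanghai"})
```

Sorting the fields before serialization means the same sub-key is found regardless of the order in which its fields are written.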
In a second aspect, an embodiment provides an apparatus for generating multidimensional key-value pair data, including:
the acquisition module is configured to acquire a plurality of initial service data, and one initial service data comprises attribute values of a plurality of attributes;
the determining module is configured to determine a first attribute group and the specified attribute values of the attributes contained in the first attribute group, and to determine a key in the key-value pair data to be generated based on the first attribute group and the corresponding specified attribute values;
the screening module is configured to screen the plurality of initial service data based on the first attribute group and the corresponding designated attribute value to obtain selected service data;
and the processing module is configured to perform data processing on the attribute values in the selected service data based on a preset data processing mode, and determine the values in the key value pair data based on the data processing result.
In one embodiment, the key includes a field part and a field value part, wherein one group of field values corresponds to one sub-key; the determining module is further configured to determine the key in the following way:
determining the attributes in the first attribute group as the fields contained in the key;
determining, based on the specified attribute values of the attributes in the first attribute group, one or more groups of field values contained in the key, each group serving as a corresponding sub-key of the key.
In one embodiment, the first attribute group includes a plurality of attributes, and any attribute in the first attribute group has one or more specified attribute values;
the determining module, when determining the attributes in the first attribute group as the fields contained in the key, is configured for:
determining the plurality of attributes in the first attribute group as a plurality of fields contained in the key, respectively;
the determining module, when determining one or more groups of field values contained in the key based on the specified attribute values of the attributes in the first attribute group, is configured for:
for the plurality of attributes in the first attribute group, when each of the attributes has exactly one specified attribute value, determining the specified attribute values of the attributes as the field values of the corresponding fields contained in the key, to obtain one group of field values;
for the plurality of attributes in the first attribute group, when at least one of the attributes has multiple specified attribute values, combining the specified attribute values of the attributes to obtain a plurality of attribute value combinations, and determining each attribute value combination as the field values of the corresponding fields contained in the key, to obtain a plurality of groups of field values.
In one embodiment, the screening module is specifically configured to:
for any sub-key contained in the key, screening the plurality of initial service data by using the attributes and specified attribute values corresponding to the sub-key, to obtain a group of selected service data corresponding to the sub-key.
In one embodiment, the processing module is specifically configured to:
for any sub-key contained in the key, performing data processing on the attribute values in the group of selected service data corresponding to the sub-key based on a preset data processing mode, and determining the obtained data processing result as the value in the key-value pair data corresponding to the sub-key.
In one embodiment, the processing module is specifically configured to:
determining a second attribute group;
performing, based on a preset data processing mode, data processing on the attribute values of the second attribute group in the selected service data.
In one embodiment, the value includes a field portion and a field value portion; the data processing mode comprises one or more modes; any first data processing mode corresponds to the first field of the value;
the processing module is specifically configured to:
performing data processing on the attribute values in the selected service data based on the first data processing mode, and determining the obtained data processing result as the field value of the first field.
In a third aspect, embodiments provide a computer-readable storage medium having a computer program stored thereon, which, when executed in a computer, causes the computer to perform the method of any of the first aspect.
In a fourth aspect, an embodiment provides a computing device, including a memory and a processor, where the memory stores executable code, and the processor executes the executable code to implement the method of any one of the first aspect.
In the method and apparatus provided in the embodiments of this specification, multidimensional key-value pair data can be determined from a plurality of initial service data. The plurality of service data are screened based on a first attribute group and specified attribute values; data processing is performed on the selected service data obtained by screening; the value in the key-value pair data is determined from the data processing result; and the key in the key-value pair data is determined from the first attribute group and the specified attribute values, so that the processed data is expressed as key-value pair data. In the embodiments of this specification, deep features are extracted from the service data through screening and data processing and are expressed as multidimensional key-value pair data, which improves the expressive power of the data.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
FIG. 1 is a schematic diagram illustrating an implementation scenario of an embodiment disclosed herein;
FIG. 2 is a schematic flowchart of a method for generating multidimensional key-value pair data according to an embodiment;
FIG. 3 is a schematic diagram of keys in key-value pair data;
FIG. 4 is a schematic block diagram of an apparatus for generating multidimensional key-value pair data according to an embodiment.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
FIG. 1 is a schematic view of an implementation scenario of an embodiment disclosed in this specification. In FIG. 1, a service platform (e.g., a platform provided by a banking institution, a platform provided by a communications operator, a shopping platform, or another institutional platform) records a large amount of business data, and new business data is generated continuously over time. Each service data contains attribute values of several attributes. The processing platform can acquire a large amount of service data, extract deep features from it, and express the extracted features as key-value pair data. The extracted features reflect deeper and richer information about the service data and are easy to store and retrieve; related personnel can subsequently analyze them to arrive at solutions that improve the service level of the service platform.
The service platform may also implement the functions of the processing platform. The service data may be obtained from log data. For example, in a bank loan scenario, the business data may take the form shown in Table 1.
TABLE 1
[Table 1 appears as an image in the original publication (Figure BDA0003281413130000051) and is not reproduced here.]
Table 1 lists only 6 service data as an example. Row 1 lists the attributes, and rows 2 to 7 are the 6 service data, each of which contains attribute values of a plurality of attributes. In other cases, a service data may contain only one attribute.
As can be seen from Table 1, the service data records basic information related to the service, and the information that an individual service data can express is limited. To enhance the expressive power of the data and extract deeper features from business data, the embodiments of this specification provide a method for generating multidimensional key-value pair data, which comprises the following steps: step S210, acquiring a plurality of initial service data; step S220, determining a first attribute group and the specified attribute values of the attributes contained in the first attribute group, and determining a key in the key-value pair data to be generated based on the first attribute group and the corresponding specified attribute values; step S230, screening the plurality of initial service data based on the first attribute group and the corresponding specified attribute values to obtain selected service data; step S240, performing data processing on the attribute values in the selected service data based on a preset data processing mode, and determining the value in the key-value pair data based on the data processing result. The initial service data may be, but is not limited to, the service data illustrated in Table 1. Screening the initial service data yields selected service data whose attributes in the first attribute group take the specified attribute values. Performing data processing on the attribute values of the selected service data allows data in a specific range to be processed and aggregated in a targeted way, so that different data processing results are obtained from different dimensions and the data carries richer information. Expressing the results as multidimensional key-value pairs makes the data more expressive. The present application is described below with reference to specific examples.
Fig. 2 is a flowchart illustrating a method for generating multidimensional key-value pair data according to an embodiment. The execution subject of the method can be any device, equipment, platform, equipment cluster and the like with computing and processing capabilities. The method includes the following steps S210 to S240.
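Steps S210 to S240 can be sketched end to end as follows. This is a minimal illustration rather than the implementation described in the specification: the function `generate_key_value_data`, the sample records, the attribute `term_months`, and the chosen processing modes are hypothetical, and the sub-keys are assumed to have been determined already in step S220.

```python
from statistics import mean

def generate_key_value_data(records, sub_keys, value_attribute, modes):
    """Return a list of (sub_key, value) pairs for one key."""
    pairs = []
    for sub_key in sub_keys:  # sub-keys are the output of S220
        # S230: keep records whose attributes equal the sub-key's specified values.
        selected = [r for r in records
                    if all(r.get(attr) == val for attr, val in sub_key.items())]
        # S240: apply every processing mode to the chosen attribute's values.
        numbers = [r[value_attribute] for r in selected]
        value = {name: fn(numbers) for name, fn in modes.items()} if numbers else {}
        pairs.append((sub_key, value))
    return pairs

# S210: hypothetical initial service data (Table 1 is not reproduced in this text).
records = [
    {"city": "Beijing", "borrowing_purpose": "1", "borrowing_amount": "5k", "term_months": 12},
    {"city": "Shanghai", "borrowing_purpose": "2", "borrowing_amount": "5k", "term_months": 6},
    {"city": "Beijing", "borrowing_purpose": "1", "borrowing_amount": "5k", "term_months": 6},
    {"city": "Beijing", "borrowing_purpose": "1", "borrowing_amount": "4k", "term_months": 3},
]
sub_keys = [
    {"city": "Beijing", "borrowing_purpose": "1", "borrowing_amount": "5k"},
    {"city": "Beijing", "borrowing_purpose": "1", "borrowing_amount": "4k"},
]
modes = {"count": len, "sum": sum, "average": mean}

pairs = generate_key_value_data(records, sub_keys, "term_months", modes)
```

A sub-key that matches no record gets an empty value here; that choice is an assumption, since the text does not say how empty groups are handled.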
Step S210, a plurality of initial service data are obtained, where each initial service data includes attribute values of a plurality of attributes. The attributes contained in each initial service data may be the same. An initial service data may contain attribute values of one or more attributes, as illustrated in Table 1.
The plurality of initial service data may be obtained from the service platform by the processing platform acting as the execution subject, or from a device hosting a database that stores the data. The plurality of initial service data may also be derived from log data. For example, the log data may be used directly as the initial service data, or the initial service data may be obtained by initializing the log data. Initialization includes, for example, cleaning and screening.
The number of acquired initial service data may be preset or indeterminate. Step S210 may be executed repeatedly at a preset period. For example, in a monitoring scenario, log data is generated continuously, and the log data obtained over multiple acquisitions can together be used as the initial service data; alternatively, the log data acquired in each preset period can be used as the plurality of initial service data.
Step S220, determining a first attribute group and the designated attribute values of the attributes contained in the first attribute group, and determining the key1 in the key-value pair data1 to be generated based on the first attribute group and the corresponding designated attribute values.
The first attribute group contains one or more attributes. Each attribute has a corresponding specified attribute value, and each attribute may have one or more specified attribute values. When determining the first attribute group and the corresponding specified attribute values, the execution subject may directly acquire a preset first attribute group and its specified attribute values, or may determine them according to input operations by related personnel.
The attributes and corresponding specified attribute values in the first attribute group may be selected from the plurality of initial service data acquired in step S210. The attributes in the first attribute group may be selected from a plurality of attributes of the initial service data as needed. The specified attribute value of the attribute (for example, attribute 1) included in the first attribute group may be selected from a plurality of attribute values corresponding to attribute 1 of the plurality of initial service data. The attributes of the plurality of initial service data may include attributes in the first attribute group, and the attribute values of the attributes of the plurality of initial service data may include specified attribute values of corresponding attributes in the first attribute group.
For example, the first attribute group includes the 3 attributes "city", "borrowing purpose", and "borrowing amount" from Table 1, and the specified attribute values are "Beijing" for "city", "1" for "borrowing purpose", and "5k" and "4k" for "borrowing amount". See Table 2 below for details.
TABLE 2
City | Borrowing purpose | Borrowing amount
Beijing | 1 | 5k, 4k
Here, the attribute "borrowing amount" has 2 specified attribute values. Setting the first attribute group and its specified attribute values identifies a specific, meaningful group of data together with the attributes and attribute values of interest.
A Key-Value pair (Key-Value) is a data representation form, and a Key and a Value are correspondingly associated. In step S220, based on the first attribute group and the corresponding specified attribute value, a key1 in the key-value pair data1 to be generated may be determined. Wherein the key1 can be determined in advance based on the first property group and the corresponding specified property value, and the key1 is directly obtained in step S220; it may also be determined based on the first property group and the corresponding specified property value when performed in step S220.
In the present embodiment, the key1 may include a field part and a field value part. Wherein a group of field values corresponds to a sub-key. In determining the key1 in the key-value pair data1 based on the first attribute group and the corresponding specified attribute value, the following steps 1a and 2a may be taken.
Step 1a, the attributes in the first attribute group are determined as the fields contained in key1.
Step 2a, based on the specified attribute values of the attributes in the first attribute group, one or more groups of field values contained in key1 are determined, each group serving as a corresponding sub-key of key1.
In step 1a, when the first attribute group contains one attribute, that attribute can be used directly as the field of key1. When the first attribute group includes a plurality of attributes, the plurality of attributes in the first attribute group may be respectively determined as a plurality of fields contained in key1.
In step 2a, any attribute in the first attribute group may have one or more specified attribute values.
For a plurality of attributes in the first attribute group, when each of the attributes has exactly one specified attribute value, the specified attribute values of the attributes are respectively determined as the field values of the corresponding fields contained in key1, so as to obtain one group of field values.
For example, the first attribute group includes the 3 attributes "city", "borrowing purpose", and "borrowing amount", and the specified attribute values are "Beijing" for "city", "1" for "borrowing purpose", and "5k" for "borrowing amount". Then "city", "borrowing purpose", and "borrowing amount" can each be used as a field in key1, and "Beijing", "1", and "5k" are determined as the field values of the corresponding fields in key1, forming one group of field values. See Table 3 below for details.
TABLE 3
City | Borrowing purpose | Borrowing amount
Beijing | 1 | 5k
Row 1 of the table is the field part, which includes the 3 fields of key1; row 2 is the field value part, which includes the field values of the 3 fields and forms one group of field values. "City" is Beijing, "borrowing purpose" is 1, and "borrowing amount" is 5k; this is the content of one sub-key of key1.
For a plurality of attributes in the first attribute group, when at least one of the attributes has multiple specified attribute values, the specified attribute values of the attributes are combined to obtain a plurality of attribute value combinations, and each attribute value combination is determined as the field values of the corresponding fields contained in key1, so as to obtain a plurality of groups of field values.
For example, taking the first attribute group and the specified attribute values of Table 2, the attribute "borrowing amount" has two specified attribute values, 5k and 4k. When determining the sub-keys, the specified attribute values of the 3 attributes can be combined to obtain 2 attribute value combinations, "Beijing, 1, 5k" and "Beijing, 1, 4k". These two combinations are used as two groups of field values and correspond to 2 sub-keys, respectively. FIG. 3 is a schematic diagram of keys in key-value pair data, in which the field part and field value part of the key and the 2 sub-keys are marked.
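The combination of specified attribute values into sub-keys can be read as a Cartesian product over the per-attribute value lists. The sketch below assumes that reading and uses the Table 2 values; the variable names and the dictionary layout of `key1` are hypothetical.

```python
from itertools import product

# First attribute group with its specified attribute values (Table 2).
first_attribute_group = {
    "city": ["Beijing"],
    "borrowing_purpose": ["1"],
    "borrowing_amount": ["5k", "4k"],
}

# Field part of the key: the attributes themselves.
fields = tuple(first_attribute_group)

# Field value part: one group of field values (one sub-key) per combination.
sub_keys = [dict(zip(fields, combo))
            for combo in product(*first_attribute_group.values())]

key1 = {"fields": fields, "sub_keys": sub_keys}
```

When every attribute has a single specified value, the product contains exactly one combination, which matches the single sub-key case of Table 3.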
In step S220, the first attribute group with its specified attribute values and key1 in the key-value pair data1 may be one and the same data, or may be two pieces of data acquired separately. In the former case, the first attribute group, the specified attribute values, and key1 are determined together. In the latter case, the first attribute group and the specified attribute values are determined, and then the correspondence between them and key1 is determined; alternatively, key1 is determined, and then the correspondence between the first attribute group, the specified attribute values, and key1 is determined. This correspondence can be seen from Table 2 and FIG. 3.
Step S230, based on the first attribute group and the corresponding designated attribute value, screening multiple initial service data to obtain selected service data. And screening the plurality of initial service data, namely screening the initial service data with the attribute value of the attribute in the first attribute group as the designated attribute value from the plurality of initial service data as the selected service data.
For example, when the first attribute group and the designated attribute value are the data in table 2, the initial service data shown in row 2, row 5, and row 6 may be screened from the plurality of initial service data in table 1 as the selected service data.
When key1 includes one or more sub-keys (see FIG. 3), the plurality of initial service data may be screened, for any sub-key included in key1, using the attributes and designated attribute values corresponding to that sub-key, to obtain a group of selected service data corresponding to the sub-key. In this way, a corresponding group of selected service data is screened out for each sub-key in key1.
One sub-key corresponds to a group of field values, and the group of field values can be used as a group of screening conditions to screen the corresponding selected service data from the plurality of initial service data. For example, for sub-key 1 in FIG. 3, row 2 and row 5 may be screened from the initial service data in Table 1 using the attributes and designated attribute values corresponding to sub-key 1, giving the 1st group of selected service data; for sub-key 2 in FIG. 3, row 6 may be screened from the initial service data in Table 1 using the attributes and designated attribute values corresponding to sub-key 2, giving the 2nd group of selected service data. The attributes and designated attribute values corresponding to sub-key 1 are: the city is Beijing, the borrowing purpose is 1, and the borrowing amount is 5k. The attributes and designated attribute values corresponding to sub-key 2 are: the city is Beijing, the borrowing purpose is 1, and the borrowing amount is 4k.
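Per-sub-key screening can be sketched as an equality filter over the records. The rows below are hypothetical stand-ins for Table 1 (only the row numbers 2, 5, and 6 and their city, purpose, and amount values follow the text; the other row and the `overdue` values are invented for illustration), and the function name `screen` is likewise an assumption.

```python
# Hypothetical rows standing in for Table 1.
initial_data = [
    {"row": 2, "city": "Beijing", "purpose": 1, "amount": "5k", "overdue": "no"},
    {"row": 3, "city": "Shanghai", "purpose": 2, "amount": "5k", "overdue": "no"},
    {"row": 5, "city": "Beijing", "purpose": 1, "amount": "5k", "overdue": "yes"},
    {"row": 6, "city": "Beijing", "purpose": 1, "amount": "4k", "overdue": "no"},
]

fields = ("city", "purpose", "amount")
sub_keys = [("Beijing", 1, "5k"), ("Beijing", 1, "4k")]

def screen(records, fields, sub_key):
    """Keep the records whose values for `fields` equal the sub-key's field values."""
    return [r for r in records if all(r[f] == v for f, v in zip(fields, sub_key))]

# One group of selected service data per sub-key.
selected = {sk: screen(initial_data, fields, sk) for sk in sub_keys}
for sk, group in selected.items():
    print(sk, [r["row"] for r in group])
```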
Step S240: perform data processing on the attribute values in the selected service data based on a preset data processing mode, and determine value1 in key-value pair data 1 based on the data processing result. Here, there may be one or more attribute values in the selected service data.
When the first attribute group contains one or more attributes and each of them has exactly one specified attribute value, key1 includes one sub-key, and a single group of selected service data is obtained. The sub-key has one corresponding value. In this embodiment, value1 in key-value pair data 1 may be determined directly from the data processing result obtained by performing data processing on that group of selected service data.
When the first attribute group contains one or more attributes and at least one of them has multiple specified attribute values, key1 includes multiple sub-keys, multiple groups of selected service data are obtained, and each sub-key corresponds to one group of selected service data. Each of the sub-keys has its own corresponding value. In this embodiment, for any sub-key k1 included in key1, data processing is performed on the attribute values in the group of selected service data corresponding to sub-key k1 based on the preset data processing mode, and the obtained data processing result is determined as the value v1 corresponding to sub-key k1 in key-value pair data 1. That is, the selected service data is processed group by group, and each group's data processing result determines the value corresponding to that group (i.e., that sub-key).
For example, sub-key 1 in fig. 3 corresponds to group 1 selected service data, and sub-key 2 in fig. 3 corresponds to group 2 selected service data. When data processing is carried out, for the sub-key 1, data processing is carried out on the attribute values in the 1 st group of selected service data based on a preset data processing mode, and the value of the sub-key 1 is obtained; and for the sub-key 2, performing data processing on the attribute values in the 2 nd group of selected service data based on a preset data processing mode to obtain the value of the sub-key 2.
The data processing mode may be preset according to needs, and may be one or more, for example, at least one of the following modes may be included: summing, averaging, counting the set attribute values, calculating the variance, calculating the standard deviation and calculating the covariance. The set attribute value may be preset as needed, and the number thereof may be one or more. The set attribute value may be selected among a plurality of attributes included in the initial service data.
In step S240, the attribute values in the selected service data may be the attribute values of one attribute or of a plurality of attributes. For example, when the data processing mode is summation, there may be one attribute, that is, the attribute values of one attribute in the selected service data are summed; when the data processing mode is calculating the covariance, there may be two attributes, that is, the covariance between the two sets of attribute values corresponding to the two attributes is calculated.
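The listed data processing modes can be sketched with Python's standard `statistics` module. This is an illustrative sketch only: the specification does not say whether variance, standard deviation, and covariance are sample or population statistics, so the sample versions are assumed here, and the names `PROCESSING_MODES`, `count_value`, and `covariance` are hypothetical.

```python
import statistics

def count_value(values, target):
    """Count how many attribute values equal the set attribute value `target`."""
    return sum(1 for v in values if v == target)

def covariance(xs, ys):
    """Sample covariance between the attribute values of two attributes."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)

# Single-attribute modes, keyed by a hypothetical mode name.
PROCESSING_MODES = {
    "sum": sum,
    "mean": statistics.fmean,
    "variance": statistics.variance,  # sample variance (assumption)
    "stdev": statistics.stdev,        # sample standard deviation (assumption)
}

amounts = [5000, 5000, 4000]
print(PROCESSING_MODES["sum"](amounts))
print(PROCESSING_MODES["mean"](amounts))
print(count_value(["yes", "no", "no"], "no"))
print(covariance([1, 2, 3], [2, 4, 6]))
```

Summation, averaging, variance, and standard deviation take the values of one attribute, while covariance takes two sequences, which mirrors the one-attribute and two-attribute cases described above.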
In another embodiment of the present specification, step S240, when executed, may include the following steps 1b and 2 b.
Step 1b, determining a second attribute group.
Wherein, the second attribute group may contain one or more attributes. The second set of attributes may be selected from the attributes contained in the initial service data. When the second attribute group is determined, the execution subject may acquire the set second attribute group, or may determine the second attribute group according to an input operation of a relevant person.
The second attribute group may contain attributes different from those in the first attribute group. Step 1b and step S220 may be executed simultaneously or in either order.
And 2b, based on a preset data processing mode, carrying out data processing on the attribute values of the second attribute group in the selected service data.
For example, for the initial service data shown in Table 1, suppose the selected service data includes the data in row 2 and row 5. When data processing is performed, the attribute values in the selected service data that correspond to the attribute "presence or absence of overdue" in the second attribute group may be processed to obtain value1 in key-value pair data 1.
When the key1 includes one or more sub-keys, for any one sub-key k1 included in the key1, data processing is performed on the attribute values of the second attribute group in a group of selected service data corresponding to the sub-key k1 based on a preset data processing mode, and the obtained data processing result is determined as the value v1 in the key-value pair data1 corresponding to the sub-key k 1.
In another embodiment of the present specification, to make the information in the value of the key-value pair data richer, value1 may include a field part and a field value part. There may be one or more data processing modes. Each data processing mode has a corresponding field in value1; for example, any first data processing mode corresponds to a first field of value1. The data processing result of the first data processing mode may serve as the field value of the first field.
In step S240, data processing may be performed on the attribute values in the selected service data based on the first data processing manner, and the obtained data processing result may be determined as the field value of the first field.
When the key1 includes one or more sub-keys, for any one sub-key k1 included in the key1, data processing is performed on attribute values of a second attribute group in a group of selected service data corresponding to the sub-key k1 based on a first data processing mode, and an obtained data processing result is determined as a field value of a first field in a value v1 corresponding to the sub-key k 1.
Specifically, when the attribute in the first attribute group is one or more and the assigned attribute value of each attribute is one, that is, when the key1 includes one sub-key, the value corresponding to the sub-key may also include a plurality of fields and corresponding field values.
For example, data processing mode 1 computes the proportion of the set attribute value "yes" for the attribute "presence or absence of overdue" and corresponds to the field "overdue rate" in value1; data processing mode 2 computes the proportion of the set attribute value "no" for the attribute "presence or absence of overdue" and corresponds to the field "non-overdue rate" in value1. See Table 4 below for details.
TABLE 4
Figure BDA0003281413130000111
When the assigned attribute value of at least one attribute in the first attribute group is plural, that is, when the key1 includes plural sub-keys, the value corresponding to each sub-key may also include plural fields and corresponding field values, respectively.
When key1 includes a plurality of sub-keys, data processing may be performed, based on each data processing mode, on the group of selected service data corresponding to each sub-key, and the data processing result is determined as the field value of the field corresponding to that data processing mode in the value corresponding to the sub-key.
For example, take Table 1 as part of the initial service data, the sub-keys in FIG. 3 as the sub-keys of key1, and the two fields "overdue rate" and "non-overdue rate" in Table 4 as the fields of value1, corresponding to data processing mode 1 and data processing mode 2, respectively. For sub-key 1 and sub-key 2, the attribute values of the attribute "presence or absence of overdue" in the 1st and 2nd groups of selected service data are processed based on data processing mode 1, giving overdue rates of 0.4% and 1.0%, respectively; the same attribute values are processed based on data processing mode 2, giving non-overdue rates of 99.6% and 99.0%, respectively. See Table 5 below for details.
TABLE 5
Figure BDA0003281413130000121
In Table 5, the key1 part may be referred to as a grouping, and each sub-key may be referred to as a group; Table 5 shows the case of multiple groups. The first group (i.e., row 3 of Table 5) describes, for those initial service data (e.g., Table 1) in which the city is Beijing, the borrowing purpose is 1, and the borrowing amount is 5k, the specific data for the attribute "presence or absence of overdue". In this way, the key-value pair data extracts deeper features from the initial service data, which benefits subsequent analysis and processing of the data and improves the expressive power of the data.
In this embodiment, the value of the key-value pair data contains a field part and a field value part, so the value is no longer limited to a single piece of data and can carry richer information. Thus, the expressive power of the key-value pair data can be further improved, and the data cost is lower.
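A value with a field part and a field value part can be sketched as a small mapping per sub-key, with one data processing mode per field. The records, the field names `overdue_rate` and `non_overdue_rate`, and the helpers `rate` and `build_values` are assumptions for illustration; the toy groups are far smaller than real data, so the resulting rates do not reproduce the figures of Table 5.

```python
def rate(records, attribute, target):
    """Share of records whose `attribute` equals `target` (0.0 for an empty group)."""
    if not records:
        return 0.0
    return sum(1 for r in records if r[attribute] == target) / len(records)

# One data processing mode per field of the value (hypothetical field names).
VALUE_FIELDS = {
    "overdue_rate": lambda rs: rate(rs, "overdue", "yes"),
    "non_overdue_rate": lambda rs: rate(rs, "overdue", "no"),
}

def build_values(selected_by_sub_key):
    """Apply every data processing mode to every sub-key's group of selected data."""
    return {
        sub_key: {field: mode(records) for field, mode in VALUE_FIELDS.items()}
        for sub_key, records in selected_by_sub_key.items()
    }

# Hypothetical groups of selected service data, one per sub-key.
selected = {
    ("Beijing", 1, "5k"): [{"overdue": "yes"}, {"overdue": "no"},
                           {"overdue": "no"}, {"overdue": "no"}],
    ("Beijing", 1, "4k"): [{"overdue": "no"}, {"overdue": "no"}],
}
values = build_values(selected)
print(values)
```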
Referring back to steps S210 to S240, in step S210, a plurality of initial service data may be acquired according to a preset period, and for the plurality of initial service data acquired in each preset period, step S230 and step S240 are executed, so that key-value pair data corresponding to the preset period may be obtained.
The attribute groups (including the first attribute group and the second attribute group) used in steps S230 and S240 may differ between preset periods, in which case the specific form of the resulting key-value pair data also differs. The specific arrangement can be set flexibly according to service requirements. In this way, a large amount of multi-group, multi-period, multi-key, multi-value "multidimensional" key-value pair data can be obtained.
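Period-by-period generation with a different first attribute group per period might look like the sketch below. The `period` field, the date strings, the per-period configuration `attribute_groups`, and the choice of counting overdue records as the data processing mode are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import product

def count_overdue(records, attribute_group):
    """Key-value pair data for one period: sub-key -> number of overdue records."""
    fields = tuple(attribute_group)
    pairs = {}
    for sub_key in product(*(attribute_group[f] for f in fields)):
        group = [r for r in records if all(r[f] == v for f, v in zip(fields, sub_key))]
        pairs[sub_key] = sum(1 for r in group if r["overdue"] == "yes")
    return pairs

# Hypothetical initial service data, tagged with the period in which it was acquired.
records = [
    {"period": "2021-09-01", "city": "Beijing", "amount": "5k", "overdue": "yes"},
    {"period": "2021-09-01", "city": "Beijing", "amount": "4k", "overdue": "no"},
    {"period": "2021-09-02", "city": "Beijing", "amount": "5k", "overdue": "yes"},
    {"period": "2021-09-02", "city": "Hangzhou", "amount": "5k", "overdue": "yes"},
]

# A different first attribute group may be configured for each period.
attribute_groups = {
    "2021-09-01": {"city": ["Beijing"], "amount": ["5k", "4k"]},
    "2021-09-02": {"city": ["Beijing", "Hangzhou"]},
}

by_period = defaultdict(list)
for r in records:
    by_period[r["period"]].append(r)

results = {p: count_overdue(rs, attribute_groups[p]) for p, rs in by_period.items()}
print(results)
```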
After key-value pair data is obtained, the key-value pair data may also be stored in a database.
Such "multidimensional" key-value pair data is itself rich in information. When data retrieval is required, data may be retrieved from the database using several data engines, that is, one or more data engines. When at least two data engines are used, the data they retrieve may differ, which makes the retrieval result richer. For example, data retrieval may be performed using one or more of OpenTSDB, PromQL, and SQL. OpenTSDB is suited to retrieval by whitelist and blacklist regular expressions; PromQL provides a flexible post-computation query mode in which post-processing such as summation and difference can be applied directly to the initial retrieval results; and SQL supports general-purpose queries across data. These data engines also offer different programming languages to choose from.
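Storage followed by SQL retrieval can be sketched with Python's built-in `sqlite3` module. SQLite is used here purely as a convenient stand-in; the specification does not name a particular database, and the table name `kv`, its columns, and the comma-joined text encoding of sub-keys are assumptions. The rates are the ones quoted for Table 5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per (sub-key, field) pair of the value; hypothetical schema.
conn.execute("CREATE TABLE kv (sub_key TEXT, field TEXT, field_value REAL)")
conn.executemany(
    "INSERT INTO kv VALUES (?, ?, ?)",
    [
        ("Beijing,1,5k", "overdue_rate", 0.004),
        ("Beijing,1,5k", "non_overdue_rate", 0.996),
        ("Beijing,1,4k", "overdue_rate", 0.010),
        ("Beijing,1,4k", "non_overdue_rate", 0.990),
    ],
)

# Retrieve the overdue rate of every sub-key, highest first.
rows = conn.execute(
    "SELECT sub_key, field_value FROM kv WHERE field = ? ORDER BY field_value DESC",
    ("overdue_rate",),
).fetchall()
print(rows)
conn.close()
```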
In the above embodiments, there may be a plurality of first attribute groups, and an association relationship between the attributes contained in the plurality of first attribute groups may be determined, where the association relationship is used to identify the relationship between different key-value pair data. For example, if first attribute group 1 includes the attribute "city of location" and first attribute group 2 includes the attribute "province of location", an association relationship between "city of location" and "province of location" may be established and recorded in an association relationship table. Establishing an association relationship between different attributes thus establishes an association relationship between the corresponding key-value pair data.
The above-described association relationship may also be stored in a database together with the key-value pair data. When retrieving data in the database, relevant key-value pair data may be retrieved based on the association. Therefore, the expandability of the key value pair data is improved, and the retrieval efficiency can be improved.
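An association relationship table and association-based retrieval can be sketched as follows. The in-memory `store`, the attribute names, the rate values, and the helper functions are hypothetical; the specification only states that the association between attributes is recorded and used to retrieve related key-value pair data.

```python
# Association relationship table: pairs of associated attributes.
association_table = [("city", "province")]

# Key-value pair data indexed by the attribute of its first attribute group
# (all values are invented for illustration).
store = {
    "city": {("Hangzhou",): {"overdue_rate": 0.004}},
    "province": {("Zhejiang",): {"overdue_rate": 0.006}},
}

def related_attributes(attribute):
    """Attributes linked to `attribute` in the association relationship table."""
    related = []
    for a, b in association_table:
        if a == attribute:
            related.append(b)
        elif b == attribute:
            related.append(a)
    return related

def retrieve_with_related(attribute):
    """Key-value pair data for `attribute` plus data for associated attributes."""
    wanted = [attribute, *related_attributes(attribute)]
    return {attr: store[attr] for attr in wanted if attr in store}

print(retrieve_with_related("city"))
```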
In this specification, the "first" in the first attribute group and the first data processing mode, and the corresponding "second" in the text, are used only for convenience of distinction and description and do not have any limiting meaning.
The foregoing describes certain embodiments of the present specification, and other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily have to be in the particular order shown or in sequential order to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Fig. 4 is a schematic block diagram of an apparatus for generating multidimensional key-value pair data according to an embodiment. The apparatus 400 may be deployed in any apparatus, device, platform, cluster of devices, etc. having computing, processing capabilities. This embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2. The apparatus 400 comprises:
an obtaining module 410 configured to obtain a plurality of initial service data, one initial service data including attribute values of a plurality of attributes;
a determining module 420 configured to determine a first attribute group and designated attribute values of the attributes contained in the first attribute group, and to determine a key in the key-value pair data to be generated based on the first attribute group and the corresponding designated attribute values;
a screening module 430 configured to screen the plurality of initial service data based on the first attribute group and the corresponding designated attribute value to obtain selected service data;
a processing module 440 configured to perform data processing on the attribute values in the selected service data based on a preset data processing mode, and to determine the value in the key-value pair data based on the data processing result.
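How the four modules might cooperate end to end can be sketched with one method per module. The class name `KeyValueGenerator`, the method names, and the sample rows are hypothetical and are not taken from the apparatus description; the sketch covers only the simplest case, in which each value is a single data processing result.

```python
from itertools import product

class KeyValueGenerator:
    """Hypothetical stand-in for apparatus 400, with one method per module."""

    def obtain(self, source):                      # obtaining module 410
        return [dict(r) for r in source]

    def determine(self, attribute_group):          # determining module 420
        fields = tuple(attribute_group)
        return fields, list(product(*(attribute_group[f] for f in fields)))

    def screen(self, records, fields, sub_key):    # screening module 430
        return [r for r in records if all(r[f] == v for f, v in zip(fields, sub_key))]

    def process(self, selected, attribute, mode):  # processing module 440
        return mode([r[attribute] for r in selected])

    def run(self, source, attribute_group, attribute, mode):
        records = self.obtain(source)
        fields, sub_keys = self.determine(attribute_group)
        return {
            sk: self.process(self.screen(records, fields, sk), attribute, mode)
            for sk in sub_keys
        }

# Hypothetical initial service data with numeric borrowing amounts.
rows = [
    {"city": "Beijing", "purpose": 1, "amount": 5000},
    {"city": "Beijing", "purpose": 1, "amount": 4000},
    {"city": "Beijing", "purpose": 1, "amount": 5000},
    {"city": "Shanghai", "purpose": 1, "amount": 5000},
]
result = KeyValueGenerator().run(rows, {"city": ["Beijing"], "purpose": [1]}, "amount", sum)
print(result)
```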
In one embodiment, the key includes a field portion and a field value portion, wherein a group of field values corresponds to a sub-key; the determining module 420 is further configured to determine a key using the following operations:
determining attributes in the first attribute group as fields contained by the key;
based on the assigned attribute values of the attributes in the first attribute group, one or more sets of field values contained by the key are determined and correspondingly serve as one or more sub-keys of the key.
In one embodiment, the first attribute group includes a plurality of attributes; the designated attribute value of any one attribute in the first attribute group is one or more;
the determining module 420, when determining the attribute in the first attribute group as the field included in the key, includes:
determining a plurality of attributes in the first attribute group as a plurality of fields contained in the key respectively;
the determining module 420, when determining one or more groups of field values included in the key based on the specified attribute values of the attributes in the first attribute group, includes:
for a plurality of attributes in the first attribute group, when the assigned attribute value of each attribute in the plurality of attributes is one, respectively determining the assigned attribute values of the plurality of attributes as corresponding field values of fields contained in the key to obtain a group of field values;
for a plurality of attributes in the first attribute group, when the assigned attribute value of at least one attribute in the plurality of attributes is multiple, combining the assigned attribute values of the plurality of attributes to obtain a plurality of attribute value combinations; and respectively determining the attribute value combinations as the corresponding field values of the fields contained in the keys to obtain a plurality of groups of field values.
In one embodiment, the screening module 430 is specifically configured to:
for any sub-key contained in the key, screen the plurality of initial service data using the attributes and designated attribute values corresponding to the sub-key, to obtain a group of selected service data corresponding to the sub-key.
In one embodiment, the processing module 440 is specifically configured to:
for any sub-key contained in the key, perform data processing on the attribute values in the group of selected service data corresponding to the sub-key based on a preset data processing mode, and determine the obtained data processing result as the value corresponding to the sub-key in the key-value pair data.
In one embodiment, the processing module 440 is specifically configured to:
determine a second attribute group;
perform, based on a preset data processing mode, data processing on the attribute values of the second attribute group in the selected service data.
In one embodiment, the value includes a field portion and a field value portion; the data processing mode comprises one or more modes; any first data processing mode corresponds to the first field of the value;
the processing module 440 is specifically configured to:
perform data processing on the attribute values in the selected service data based on the first data processing mode, and determine the obtained data processing result as the field value of the first field.
In one embodiment, the data processing means includes at least one of the following means: summing, averaging, counting the set attribute values, calculating the variance, calculating the standard deviation and calculating the covariance.
In one embodiment, there are a plurality of first attribute groups; the apparatus 400 further comprises:
an association module (not shown in the figure) configured to determine an association relationship between the attributes contained in the plurality of first attribute groups; wherein the association relationship is used for identifying the relationship between different key-value pair data.
In one embodiment, the apparatus 400 further comprises:
a storage module (not shown in the figure) configured to store the key-value pair data into a database after obtaining the key-value pair data;
a retrieval module (not shown) configured to retrieve data from the database using a number of data engines when data retrieval is required.
The above apparatus embodiments correspond to the method embodiments and are obtained based on them; they have the same technical effects as the corresponding method embodiments. For specific descriptions, reference may be made to the corresponding method embodiments, which are not repeated here.
Embodiments of the present specification also provide a computer-readable storage medium having a computer program stored thereon, which, when executed in a computer, causes the computer to perform the method of any one of fig. 1 to 3.
The embodiment of the present specification further provides a computing device, which includes a memory and a processor, where the memory stores executable code, and the processor executes the executable code to implement the method described in any one of fig. 1 to 3.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the storage medium and the computing device embodiments, since they are substantially similar to the method embodiments, they are described relatively simply, and reference may be made to some descriptions of the method embodiments for relevant points.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in connection with the embodiments of the invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The above-mentioned embodiments further describe the objects, technical solutions and advantages of the embodiments of the present invention in detail. It should be understood that the above description is only exemplary of the embodiments of the present invention, and is not intended to limit the scope of the present invention, and any modification, equivalent replacement, or improvement made on the basis of the technical solutions of the present invention should be included in the scope of the present invention.

Claims (19)

1. A method of generating multidimensional key-value pair data, comprising:
acquiring a plurality of initial service data, wherein one initial service data comprises attribute values of a plurality of attributes;
determining a first attribute group and designated attribute values of attributes contained in the first attribute group, and determining keys in key value pair data to be generated based on the first attribute group and the corresponding designated attribute values;
screening the plurality of initial service data based on the first attribute group and the corresponding designated attribute value to obtain selected service data;
and performing data processing on the attribute values in the selected service data based on a preset data processing mode, and determining the values in the key value pair data based on the data processing result.
2. The method of claim 1, the key comprising a field portion and a field value portion, wherein a group of field values corresponds to a sub-key; the key is determined in the following way:
determining attributes in the first attribute group as fields contained by the key;
based on the assigned attribute values of the attributes in the first attribute group, one or more sets of field values contained by the key are determined and correspondingly serve as one or more sub-keys of the key.
3. The method of claim 2, said first attribute group comprising a plurality of attributes; the designated attribute value of any one attribute in the first attribute group is one or more;
the step of determining the attributes in the first attribute group as the fields contained by the key includes:
determining a plurality of attributes in the first attribute group as a plurality of fields contained in the key respectively;
said step of determining one or more sets of field values contained by said key based on specified attribute values of attributes in said first set of attributes comprises:
for a plurality of attributes in the first attribute group, when the assigned attribute value of each attribute in the plurality of attributes is one, respectively determining the assigned attribute values of the plurality of attributes as corresponding field values of fields contained in the key to obtain a group of field values;
for a plurality of attributes in the first attribute group, when the assigned attribute value of at least one attribute in the plurality of attributes is multiple, combining the assigned attribute values of the plurality of attributes to obtain a plurality of attribute value combinations; and respectively determining the attribute value combinations as the corresponding field values of the fields contained in the keys to obtain a plurality of groups of field values.
4. The method of claim 2, wherein the step of filtering the plurality of initial traffic data based on the first set of attributes and corresponding specified attribute values comprises:
and aiming at any sub-key contained in the key, screening the plurality of initial service data by using the attribute and the designated attribute value corresponding to the sub-key to obtain a group of selected service data corresponding to the sub-key.
5. The method according to claim 4, wherein the step of performing data processing on the attribute values in the selected service data based on a preset data processing manner and determining the values in the key value pair data based on the data processing result comprises:
and aiming at any sub-key contained in the key, performing data processing on the attribute values in a group of selected service data corresponding to the sub-key based on a preset data processing mode, and determining the obtained data processing result as the value in the key value pair data corresponding to the sub-key.
6. The method of claim 1, wherein the step of performing data processing on the attribute values in the selected service data comprises:
determining a second attribute group;
and based on a preset data processing mode, carrying out data processing on the attribute value of the second attribute group in the selected service data.
7. The method of claim 1, the value comprising a field portion and a field value portion; the data processing mode comprises one or more modes; any first data processing mode corresponds to the first field of the value;
the step of performing data processing on the attribute values in the selected service data based on a preset data processing mode and determining the values in the key value pair data based on the data processing result includes:
and performing data processing on the attribute values in the selected service data based on the first data processing mode, and determining an obtained data processing result as a field value of the first field.
8. The method of claim 1, the data processing mode comprising at least one of:
summing, averaging, counting the set attribute values, calculating the variance, calculating the standard deviation and calculating the covariance.
9. The method of claim 1, wherein there are a plurality of said first attribute groups; the method further comprises the following step:
determining the association relationship among the attributes contained in the plurality of first attribute groups; wherein the association relationship is used for identifying the relationship between different key-value pair data.
10. The method of claim 1, after obtaining the key-value pair data, further comprising:
storing the key-value pair data in a database;
when data retrieval is required, data is retrieved from the database using a number of data engines.
11. An apparatus for generating multidimensional key-value pair data, comprising:
the acquisition module is configured to acquire a plurality of initial service data, and one initial service data comprises attribute values of a plurality of attributes;
the determining module is configured to determine a first attribute group and designated attribute values of attributes contained in the first attribute group, and to determine a key in the key-value pair data to be generated based on the first attribute group and the corresponding designated attribute values;
the screening module is configured to screen the plurality of initial service data based on the first attribute group and the corresponding designated attribute value to obtain selected service data;
and the processing module is configured to perform data processing on the attribute values in the selected service data based on a preset data processing mode, and determine the values in the key value pair data based on the data processing result.
12. The apparatus of claim 11, the key comprising a field portion and a field value portion, wherein a group of field values corresponds to a sub-key; the determination module is further configured to determine a key using:
determining attributes in the first attribute group as fields contained by the key;
based on the assigned attribute values of the attributes in the first attribute group, one or more sets of field values contained by the key are determined and correspondingly serve as one or more sub-keys of the key.
13. The apparatus of claim 12, said first attribute group comprising a plurality of attributes; the designated attribute value of any one attribute in the first attribute group is one or more;
the determining module, when determining the attribute in the first attribute group as the field included in the key, includes:
determining a plurality of attributes in the first attribute group as a plurality of fields contained in the key respectively;
the determining module, when determining one or more groups of field values included in the key based on the specified attribute values of the attributes in the first attribute group, includes:
for a plurality of attributes in the first attribute group, when the assigned attribute value of each attribute in the plurality of attributes is one, respectively determining the assigned attribute values of the plurality of attributes as corresponding field values of fields contained in the key to obtain a group of field values;
for a plurality of attributes in the first attribute group, when the assigned attribute value of at least one attribute in the plurality of attributes is multiple, combining the assigned attribute values of the plurality of attributes to obtain a plurality of attribute value combinations; and respectively determining the attribute value combinations as the corresponding field values of the fields contained in the keys to obtain a plurality of groups of field values.
14. The apparatus of claim 12, wherein the screening module is specifically configured to:
for any sub-key contained in the key, screen the plurality of initial service data using the attributes and the specified attribute values corresponding to that sub-key, to obtain a group of selected service data corresponding to the sub-key.
15. The apparatus of claim 14, wherein the processing module is specifically configured to:
for any sub-key contained in the key, perform data processing on the attribute values in the group of selected service data corresponding to that sub-key based on a preset data processing mode, and determine the obtained data processing result as the value in the key-value pair data corresponding to the sub-key.
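The per-sub-key screening and processing of claims 14 and 15 can be sketched as follows. This is an illustration under stated assumptions, not the applicant's implementation: service data are modeled as dictionaries, the preset data processing mode is assumed to be a sum, and the names `select_records`, `build_key_value_pairs`, and the `amount` attribute are hypothetical.

```python
def select_records(records, fields, sub_key):
    """Keep the records whose attribute values match the sub-key."""
    return [
        r for r in records
        if all(r.get(f) == v for f, v in zip(fields, sub_key))
    ]


def build_key_value_pairs(records, fields, sub_keys, attribute, process=sum):
    """Map each sub-key to the processing result over its selected records."""
    result = {}
    for sub_key in sub_keys:
        selected = select_records(records, fields, sub_key)
        # Assumption: the preset processing mode aggregates one attribute.
        result[sub_key] = process(r[attribute] for r in selected)
    return result


records = [
    {"city": "Hangzhou", "channel": "app", "amount": 10},
    {"city": "Hangzhou", "channel": "web", "amount": 5},
    {"city": "Shanghai", "channel": "app", "amount": 7},
    {"city": "Hangzhou", "channel": "app", "amount": 3},
]
pairs = build_key_value_pairs(
    records,
    ("city", "channel"),
    [("Hangzhou", "app"), ("Shanghai", "app")],
    "amount",
)
```

Each sub-key yields its own value, so the resulting mapping holds one key-value entry per group of field values.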
16. The apparatus of claim 11, wherein the processing module is specifically configured to:
determine a second attribute group; and
perform, based on a preset data processing mode, data processing on the attribute values of the second attribute group in the selected service data.
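As a hedged illustration of claim 16, the sketch below restricts processing to the attributes named in a second attribute group and ignores all other attributes of the selected service data. The function name `process_second_group`, the attributes `amount` and `fee`, and the choice of a sum as the processing mode are assumptions made only for this example.

```python
def process_second_group(selected, second_group, process=sum):
    """Apply the processing mode to each attribute of the second group."""
    return {
        attr: process(r[attr] for r in selected)
        for attr in second_group
    }


selected = [
    {"city": "Hangzhou", "amount": 10, "fee": 1},
    {"city": "Hangzhou", "amount": 3, "fee": 2},
]
# "city" belongs to the key side here, so it is not processed.
totals = process_second_group(selected, ["amount", "fee"])
```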
17. The apparatus of claim 11, wherein the value comprises a field portion and a field value portion; there are one or more data processing modes; and any first data processing mode corresponds to a first field of the value;
the processing module is specifically configured to:
perform data processing on the attribute values in the selected service data based on the first data processing mode, and determine the obtained data processing result as the field value of the first field.
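The correspondence in claim 17 between data processing modes and fields of the value can be sketched as a mapping from field name to function. The field names `total`, `count`, and `largest`, the function `build_value`, and the `amount` attribute are hypothetical; the claim itself does not name any particular processing mode.

```python
def build_value(selected, attribute, modes):
    """Return a multi-field value: one field per data processing mode."""
    values = [r[attribute] for r in selected]
    # Each mode's result becomes the field value of its corresponding field.
    return {field: mode(values) for field, mode in modes.items()}


# Assumed processing modes, chosen only for illustration.
modes = {"total": sum, "count": len, "largest": max}
value = build_value([{"amount": 10}, {"amount": 3}], "amount", modes)
```

Keeping the modes in a dictionary makes the field portion of the value explicit: the dictionary keys are the fields and the computed results are the field values.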
18. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of any one of claims 1-10.
19. A computing device comprising a memory having executable code stored therein and a processor that, when executing the executable code, implements the method of any of claims 1-10.
CN202111134047.3A 2021-09-27 2021-09-27 Method and device for generating multidimensional key value pair data Pending CN113849514A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111134047.3A CN113849514A (en) 2021-09-27 2021-09-27 Method and device for generating multidimensional key value pair data

Publications (1)

Publication Number Publication Date
CN113849514A (en) 2021-12-28

Family

ID=78980561

Country Status (1)

Country Link
CN (1) CN113849514A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination