CN111930756A - Feature construction method and device for source data, electronic equipment and medium - Google Patents

Info

Publication number
CN111930756A
Authority
CN
China
Prior art keywords
target
feature
dimension
source data
operator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010987617.2A
Other languages
Chinese (zh)
Other versions
CN111930756B (en)
Inventor
王兴武
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Bodun Xiyan Technology Co.,Ltd.
Original Assignee
Tongdun Holdings Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongdun Holdings Co Ltd filed Critical Tongdun Holdings Co Ltd
Priority to CN202010987617.2A priority Critical patent/CN111930756B/en
Publication of CN111930756A publication Critical patent/CN111930756A/en
Application granted granted Critical
Publication of CN111930756B publication Critical patent/CN111930756B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2272 Management thereof
    • G06F16/2264 Multidimensional index structures
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G06F16/24554 Unary operations; Data partitioning operations
    • G06F16/24556 Aggregation; Duplicate elimination
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Abstract

The invention discloses a feature construction method and device for source data, an electronic device, and a storage medium, and relates to the technical field of data processing. The method acquires source data to be processed and a corresponding feature construction target, where the feature construction target comprises a target measure, a target operator name, and a target dimension; parses the feature construction target and constructs a feature list from the target operator name, the target measure, and the target dimension, the feature list comprising a plurality of target features; and, for each target feature in the feature list, calls the corresponding operator function from a pre-generated operator library, extracts the metric value of the corresponding measure under the corresponding dimension value from the source data to be processed, and applies the operator function to the metric value to obtain the feature value of the target feature. This addresses the low efficiency of extracting feature values from a data set comprising a large amount of raw data and improves the efficiency of feature value extraction.

Description

Feature construction method and device for source data, electronic equipment and medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a feature construction method and apparatus for source data, an electronic device, and a storage medium.
Background
The challenge of the data age is not only the explosive growth of data volume but, more importantly, how to manage, govern, and utilize the data. Raw data sits at the bottom layer of a big data platform and is of low value on its own. As data grows explosively, enterprises find it difficult to apply raw data directly; raw data usually has to be processed into feature values for models or policy engines before the business value of data mining can be realized.
In the related art, feature extraction from a data set comprising a large amount of raw data is usually driven by custom, per-requirement development: results are fed back against the data requirements and development requirements, and feature engineering is then performed to obtain the feature values.
Extracting feature values from such a data set in this way is inefficient, and no effective solution has yet been proposed.
Disclosure of Invention
In order to overcome the defects of the prior art, an object of the present invention is to provide a feature construction method for source data, so as to solve at least the problem of low extraction efficiency in the related art of extracting feature values from a data set including a large amount of raw data.
One of the purposes of the invention is realized by adopting the following technical scheme:
acquiring source data to be processed and a corresponding feature construction target, wherein the feature construction target comprises a target measure, a target operator name and a target dimension;
analyzing the feature construction target, and constructing a feature list according to a target operator name, a target measure and a target dimension, wherein the feature list comprises a plurality of target features;
for each target feature in the feature list: calling a corresponding operator function from a pre-generated operator library, and extracting a metric value of corresponding measurement under a corresponding dimension value from the source data to be processed; and calculating the metric value by using the operator function to obtain a characteristic value of the target characteristic.
Further, parsing the feature construction target and constructing a feature list according to the target operator name, the target measure and the target dimension includes: arbitrarily combining the target operator name, the target measure and the dimension values under the target dimension to obtain the feature list.
Further, the target dimension comprises a main dimension, and the main dimension is used for identifying identity information in the source data to be processed.
Further, the target dimension further comprises a conditional dimension, and the conditional dimension is used for performing dimension screening on the source data to be processed.
Further, in a case that the condition dimension includes a time dimension, after the obtaining the feature construction target, the method includes: aggregating the source data to be processed according to the dimension values under the main dimension and the time dimension to obtain an intermediate data table;
for each target feature in the feature list: calling corresponding operator functions from the operator library, and extracting corresponding metric values of corresponding metrics under corresponding dimensional values from the intermediate data table; and calculating the metric value by using the operator function to obtain a characteristic value of the target characteristic.
Another object of the present invention is to provide a feature constructing apparatus for source data, so as to solve at least the problem of low extraction efficiency in extracting feature values from a data set including a large amount of raw data in the related art.
The second purpose of the invention is realized by adopting the following technical scheme:
a feature construction device for source data comprises an acquisition module, a feature list construction module and a first feature value calculation module;
the acquisition module is used for acquiring source data to be processed and a corresponding feature construction target, wherein the feature construction target comprises a target measure, a target operator name and a target dimension;
the characteristic list building module is used for analyzing the characteristic building target and building a characteristic list according to a target operator name, a target measurement and a target dimension, wherein the characteristic list comprises a plurality of target characteristics;
the first feature value calculation module is configured to, for each target feature in the feature list: calling a corresponding operator function from a pre-generated operator library, and extracting a metric value of corresponding measurement under a corresponding dimension value from the source data to be processed; and calculating the metric value by using the operator function to obtain a characteristic value of the target characteristic.
Further, in a case where a condition dimension of the target dimensions includes a time dimension, the apparatus includes an intermediate data table module and a second feature value calculation module;
the intermediate data table module is used for aggregating the source data to be processed according to the dimension values under the main dimension and the time dimension to obtain an intermediate data table;
the second feature value calculation module is configured to, for each target feature in the feature list: calling corresponding operator functions from the operator library, and extracting corresponding metric values of corresponding metrics under corresponding dimensional values from the intermediate data table; and calculating the metric value by using the operator function to obtain a characteristic value of the target characteristic.
A third object of the invention is to provide an electronic device comprising a processor, a storage medium and a computer program stored in the storage medium, wherein the computer program, when executed by the processor, implements the feature construction method for source data of the first object of the invention.
A fourth object of the invention is to provide a computer-readable storage medium having a computer program stored thereon which, when executed by a processor, implements the feature construction method for source data of the first object of the invention.
Compared with the related art, the feature construction method provided by the invention acquires source data to be processed and a corresponding feature construction target, parses the feature construction target, and constructs a feature list from the target operator name, the target measure and the target dimension, the feature list comprising a plurality of target features. For each target feature in the feature list, the method calls the corresponding operator function from a pre-generated operator library, extracts the metric value of the corresponding measure under the corresponding dimension value from the source data to be processed, and applies the operator function to the metric value to obtain the feature value of the target feature. This addresses the low efficiency of extracting feature values from a data set comprising a large amount of raw data and improves the efficiency of feature value extraction.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flow diagram of a feature construction method for source data according to an embodiment of the application;
FIG. 2 is a flow diagram of another method for feature construction of source data according to an embodiment of the present application;
FIG. 3 is a block diagram of a feature construction apparatus for source data according to an embodiment of the present application;
FIG. 4 is a block diagram of another feature construction apparatus for source data according to an embodiment of the present application;
FIG. 5 is an internal structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms referred to herein shall have the ordinary meaning as understood by those of ordinary skill in the art to which this application belongs. Reference to "a," "an," "the," and similar words throughout this application are not to be construed as limiting in number, and may refer to the singular or the plural. The present application is directed to the use of the terms "including," "comprising," "having," and any variations thereof, which are intended to cover non-exclusive inclusions; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. Reference to "connected," "coupled," and the like in this application is not intended to be limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. Reference herein to "a plurality" means greater than or equal to two. "and/or" describes an association relationship of associated objects, meaning that three relationships may exist, for example, "A and/or B" may mean: a exists alone, A and B exist simultaneously, and B exists alone. Reference herein to the terms "first," "second," "third," and the like, are merely to distinguish similar objects and do not denote a particular ordering for the objects.
The present embodiment provides a feature construction method for source data. FIG. 1 is a flowchart of a feature construction method for source data according to an embodiment of the present application. As shown in FIG. 1, the flow includes the following steps:
S110, obtaining source data to be processed and a corresponding feature construction target, wherein the feature construction target comprises a target measure, a target operator name and a target dimension.
The source data to be processed is a large structured raw data set, and comprises raw data and a corresponding data table structure. The source data to be processed can be directly acquired in a specific service scene, for example, when the call information of the user is analyzed, structured call information data including a mobile phone number of the user, a timestamp, a called number, call time consumption, a call area and a call date is acquired; or obtained from a third-party database according to the relevant regulations according to the data service scene.
The feature construction target is target content input by the client and corresponds to the source data to be processed. The target measure, the target operator name and the target dimension in the feature construction target may each comprise a plurality of values. The target operator name is the name of an operator function; operator functions include, but are not limited to, one or more of summing, averaging, maximizing, minimizing, scaling, and rounding. Target operator names can be offered by the client as a set of options or can be user-defined. For a user-defined operator, the user must provide the source code, which can be maintained through the server-side background or uploaded by the user.
S120, parsing the feature construction target, and constructing a feature list according to the target operator name, the target measure and the target dimension, wherein the feature list comprises a plurality of target features. Parsing the feature construction target yields the target operator name, the target measure and the dimension values under the target dimension, and these are combined to form each target feature in the feature list. A target feature is the name of a feature value. The target feature names obtained in step S120 follow a unified format and contain the name of the operator function, which makes them easy for a user to inspect and use.
S130, for each target feature in the feature list: calling the corresponding operator function from a pre-generated operator library, and extracting the metric value of the corresponding measure under the corresponding dimension value from the source data to be processed; and applying the operator function to the metric value to obtain the feature value of the target feature.
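As a rough illustration of the lookup-then-apply pattern in step S130, the following Scala sketch models an operator library as a map from operator name to function. The object name `OperatorLibrarySketch`, the operator names `sum`, `avg`, `max`, `min`, and the choice of `Seq[Double]` for metric values are assumptions made for this example and are not taken from the patent.

```scala
object OperatorLibrarySketch {
  // Hypothetical registry: operator name -> function over the extracted metric values.
  // max and min assume a non-empty input; avg returns 0.0 for an empty input by convention here.
  val operators: Map[String, Seq[Double] => Double] = Map(
    "sum" -> ((xs: Seq[Double]) => xs.sum),
    "avg" -> ((xs: Seq[Double]) => if (xs.isEmpty) 0.0 else xs.sum / xs.size),
    "max" -> ((xs: Seq[Double]) => xs.max),
    "min" -> ((xs: Seq[Double]) => xs.min)
  )

  // Look up the operator by name and apply it; None when the name is not registered.
  def featureValue(operatorName: String, metricValues: Seq[Double]): Option[Double] =
    operators.get(operatorName).map(op => op(metricValues))
}
```

Keeping the operators behind a name-keyed lookup is one way to let the same calculation code serve different source data sets, since only the extracted metric values change.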
The operator library comprises an operator library interface and the operator functions packaged in the library. The operator functions are packaged in a unified way with unified input and output formats, and the computing capability is exposed as a library, so the operator library can be reused and the overall efficiency of extracting feature values from different source data to be processed is improved. The operator library interface is the interface through which the metric value of the corresponding measure under the corresponding dimension value is extracted from the source data to be processed, so the calculation inside the operator library does not change with the source data to be processed, which helps improve the reliability of feature value extraction. The operator library interface is defined as follows:
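trait Calculator {
  // Obtain the dimension values carried by a target feature name.
  def parse(indexName: String): IndexItem
  // Extract the metric value of the corresponding measure from one source record.
  def extractData(record: Map[String, Any]): Unit
  // Apply the operator function and return the feature value.
  def calculate(): String
}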
The parse() function obtains the corresponding dimension values in each target feature; the extractData() function extracts the metric value of the corresponding measure under the corresponding dimension value from the source data to be processed; and the calculate() function implements the operator functions packaged in the operator library, including the operator functions corresponding to the target operator names.
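To make the three interface methods concrete, here is a minimal sketch of one possible implementation for a frequency operator. The patent does not define the fields of `IndexItem`, so the case class below is hypothetical, as are the class name `FreqCalculator`, the assumed name layout `main_operator_measure_condition...`, and the assumption that each record already carries its dimension values as plain strings.

```scala
// Hypothetical result of parsing a feature name; the source does not define IndexItem's fields.
final case class IndexItem(mainValue: String, operator: String, measure: String, conditions: List[String])

trait Calculator {
  def parse(indexName: String): IndexItem
  def extractData(record: Map[String, Any]): Unit
  def calculate(): String
}

// Assumed "freq" operator: counts the records that carry every dimension value of the feature.
final class FreqCalculator(featureName: String) extends Calculator {
  private var count: Long = 0L
  private val item: IndexItem = parse(featureName)

  // Assumed name layout: main_operator_measure_condition..., e.g. "m1_freq_acount_day_7day".
  def parse(indexName: String): IndexItem = {
    val parts = indexName.split("_").toList
    require(parts.length >= 3, s"unexpected feature name: $indexName")
    IndexItem(parts(0), parts(1), parts(2), parts.drop(3))
  }

  // A record matches when its values include the main-dimension value and all condition values.
  def extractData(record: Map[String, Any]): Unit = {
    val values = record.values.map(_.toString).toSet
    if ((item.mainValue :: item.conditions).forall(values.contains)) count += 1
  }

  def calculate(): String = count.toString
}
```

In this sketch a caller would construct one calculator per target feature, feed it every record through `extractData`, and read the feature value from `calculate()`.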
In the embodiments of the present application, the feature construction target is parsed and the feature list is constructed from the target operator name, the target measure and the target dimension, and the feature value of each target feature in the feature list is calculated through the operator library. Custom development from scratch for each different set of source data to be processed is no longer needed, the target features in the feature list are defined consistently, the efficiency of feature value extraction is improved, and development cost can be reduced.
In some embodiments, parsing the feature construction target and constructing the feature list according to the target operator name, the target measure and the target dimension specifically includes: arbitrarily combining the target operator name, the target measure and the dimension values under the target dimension to obtain the feature list. In a feature list obtained this way, no two target features share a name while differing in meaning, or share a meaning while differing in name, so target features are neither repeated nor omitted, which helps improve the reliability of feature value extraction.
In some embodiments, the target dimension comprises a primary dimension that is used to identify identity information in the source data to be processed. The target dimension only comprises one main dimension, such as a field of 'user mobile phone number', 'user ID' and the like which uniquely represent identity information.
In some embodiments, the target dimension comprises a conditional dimension, which is used to filter the source data to be processed by dimension. A conditional dimension is optional in the target dimension. A conditional dimension includes a finite number of dimension values, and when it contains no dimension value, an agreed special marker such as the English string "all" may be used to represent it.
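The filtering role of a conditional dimension, including the agreed "all" marker, might look like the following sketch. The record type `Call`, its fields, and the function name `filterByCondition` are hypothetical names chosen for illustration.

```scala
object ConditionFilterSketch {
  // Hypothetical record: a main-dimension value plus one conditional-dimension value.
  final case class Call(phone: String, period: String)

  // "all" is the agreed marker meaning the conditional dimension imposes no filter.
  def filterByCondition(calls: Seq[Call], conditionValue: String): Seq[Call] =
    if (conditionValue == "all") calls
    else calls.filter(_.period == conditionValue)
}
```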
In some embodiments, FIG. 2 is a flowchart of another feature construction method for source data according to an embodiment of the present application. As shown in FIG. 2, in a case where a condition dimension in the target dimension includes a time dimension, the method includes the following steps:
S210, obtaining source data to be processed and a corresponding feature construction target, wherein the feature construction target comprises a target measure, a target operator name and a target dimension.
S220, aggregating the source data to be processed according to the dimension values under the main dimension and the time dimension to obtain an intermediate data table. The dimension values under the main dimension are the specific user identities in the source data to be processed; for example, if the main dimension is "user ID", its dimension values include the ID number of user 1, the ID number of user 2, the ID number of user 3, and so on. The dimension values under the time dimension are time slices obtained by dividing the time in the source data to be processed; for example, if whole-day time slices of 1 day, 7 days and 30 days are used, the dimension values under the time dimension are 1 day, 7 days and 30 days. Specifically, the source data to be processed is aggregated according to the dimension values under the main dimension and the time dimension to obtain the intermediate data table, and those dimension values are stored as the index of the data in the intermediate data table.
S230, analyzing the feature construction target, and constructing a feature list according to the target operator name, the target measurement and the target dimension, wherein the feature list comprises a plurality of target features.
S240, aiming at each target feature in the feature list: calling corresponding operator functions from the operator library, and extracting corresponding metric values under corresponding dimension values from the intermediate data table; and calculating the metric value by using an operator function to obtain a characteristic value of the target characteristic.
Each feature value could be calculated directly from the source data to be processed, but recomputing from all of the raw source data every time wastes a great deal of computing resources. In practice, feature values are usually updated offline on a periodic basis, and aggregating the source data to be processed according to the dimension values under the main dimension and the time dimension improves the overall efficiency of feature value calculation. In the embodiments of the present application, extracting metric values from the intermediate data table through its index is more efficient than extracting them from the large-scale source data to be processed.
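The aggregation in step S220 can be sketched as a one-time pass that produces a table keyed by the main-dimension value and the time slice. Everything below is an assumption for illustration: the record type `Call`, the `daysAgo` field standing in for a timestamp, the call count as the stored measure, and the use of an in-memory `Map` as the "intermediate data table".

```scala
object IntermediateTableSketch {
  // Hypothetical raw record: who called, and how many whole days ago.
  final case class Call(phone: String, daysAgo: Int)

  // Aggregate once into a table indexed by (main-dimension value, time slice).
  // The measure kept here is the call count; window sizes are given in days.
  def build(calls: Seq[Call], windows: Seq[Int]): Map[(String, String), Int] = {
    val rows = for {
      w <- windows
      (phone, group) <- calls.filter(_.daysAgo < w).groupBy(_.phone).toSeq
    } yield (phone, s"${w}day") -> group.size
    rows.toMap
  }
}
```

Later feature calculations can then read a value by key, for example `table(("m1", "7day"))`, instead of rescanning every raw record.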
In some embodiments, the condition dimensions may include other condition dimensions in addition to the time dimension, with the number of other condition dimensions ranging from 0 to 3 in view of computational performance. And under the condition that the condition dimension comprises at least one other condition dimension besides the time dimension, aggregating the source data to be processed according to the main dimension, the time dimension and the dimension value under at least one other condition dimension to obtain an intermediate data table.
In some embodiments, after the feature construction target is obtained, the feature construction method further includes: establishing a feature model according to the feature construction target. Specifically, the target dimension, the target operator name and the target measure in the feature construction target are assembled into a feature model according to a preset format, for example: target dimension_target operator name_target measure. This briefly describes the target content input by the client, so that a user can intuitively understand the meaning of the corresponding feature value. For example, if the feature construction target includes a target dimension "user ID", a target dimension "time slice", a target dimension "call area", and a target operator name, the feature model is: user ID_time slice_call area_target operator name, which concisely describes the target content input by the client.
The invention is described below using the construction of feature values from call information data as an example. The call information data is the source data to be processed, which comprises the raw data and the original call information table shown in Table 1. The acquired feature construction target comprises a main dimension "mobile phone number", a time dimension "time slice", a first condition dimension "call time interval type", a target measure and a target operator name, so as to construct feature values representing the number of calls made by each user in different call time intervals (the daytime interval is 05:00-19:00, and the night interval is 20:00-04:00) within the past 7 days and the past 30 days.
TABLE 1 Original call information table
[Table 1 is an image in the original publication and is not reproduced here.]
Parsing the feature construction target yields the dimension values under the main dimension, the time dimension and the first condition dimension, together with the target measure and the target operator name. Arbitrarily combining the target operator name, the target measure and the dimension values under the target dimensions gives a feature list comprising a plurality of target features. For example, the dimension values under the main dimension are the mobile phone numbers of the users, denoted mi, i = 1, ..., n, where n is the number of users; the dimension values under the time dimension are 7 days and 30 days, denoted 7day and 30day; the dimension values under the first condition dimension are daytime and night, denoted day and night; the target measure is the number of calls, denoted acount; and the target operator name is the frequency operator, denoted freq. Arbitrarily combining these values gives the feature list shown in Table 2:
TABLE 2 Feature list
[Table 2 is an image in the original publication and is not reproduced here.]
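The arbitrary combination described above amounts to a Cartesian product over the value lists. The sketch below enumerates it with a for-comprehension, using the value names from the call example; the object and parameter names are hypothetical, and the `_` join order (main dimension, operator, measure, call interval, time slice) is inferred from the feature name m1_freq_acount_day_7day rather than stated as a rule in the source.

```scala
object FeatureListSketch {
  // Every combination of main-dimension value, operator name, measure and condition values
  // becomes one target feature name, joined with "_".
  def build(mains: Seq[String], operators: Seq[String], measures: Seq[String],
            periods: Seq[String], windows: Seq[String]): Seq[String] =
    for {
      m  <- mains
      op <- operators
      me <- measures
      p  <- periods
      w  <- windows
    } yield Seq(m, op, me, p, w).mkString("_")
}
```

With two users, one operator, one measure, two call intervals and two time slices, this yields 2 x 1 x 1 x 2 x 2 = 8 distinct feature names.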
For a target feature in the feature list, for example m1_freq_acount_day_7day, the operator function corresponding to freq is called from the pre-generated operator library, the metric values of the measure acount under the dimension values m1, day and 7day are extracted from the source data to be processed, and the operator function corresponding to freq is applied to the extracted metric values to obtain the feature value of the target feature m1_freq_acount_day_7day. The feature value of every target feature in the feature list can be obtained in the same way.
It should be noted that the steps illustrated in the above-described flow diagrams or in the flow diagrams of the figures may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flow diagrams, in some cases, the steps illustrated or described may be performed in an order different than here.
The present embodiment further provides a feature construction apparatus for source data, which is used to implement the foregoing embodiments and preferred embodiments; what has already been described is not repeated here. As used hereinafter, the terms "module," "unit," "subunit," and the like may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the embodiments below is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 3 is a block diagram of a feature construction apparatus for source data according to an embodiment of the present application, and as shown in fig. 3, the apparatus includes an obtaining module 310, a feature list construction module 320, and a first feature value calculation module 330:
an obtaining module 310, configured to obtain source data to be processed and a corresponding feature construction target, where the feature construction target includes a target metric, a target operator name, and a target dimension; the feature list construction module 320 is configured to analyze a feature construction target, and construct a feature list according to a target operator name, a target metric, and a target dimension, where the feature list includes a plurality of target features; a first feature value calculation module 330, configured to, for each target feature in the feature list: calling a corresponding operator function from a pre-generated operator library, and extracting a metric value of corresponding measurement under a corresponding dimension value from source data to be processed; and calculating the metric value by using an operator function to obtain a characteristic value of the target characteristic.
Fig. 4 is a block diagram of another feature construction apparatus for source data according to an embodiment of the present application, and as shown in fig. 4, in the case that the condition dimension in the target dimension includes a time dimension, the apparatus includes an obtaining module 410, a feature list construction module 420, an intermediate data table module 430, and a second feature value calculation module 440:
an obtaining module 410, configured to obtain source data to be processed and a corresponding feature construction target, where the feature construction target includes a target metric, a target operator name, and a target dimension; the feature list building module 420 is configured to parse a feature building target, and build a feature list according to a target operator name, a target metric, and a target dimension, where the feature list includes a plurality of target features; the intermediate data table module 430 is configured to aggregate the source data to be processed according to the dimension values in the main dimension and the time dimension to obtain an intermediate data table; a second feature value calculation module 440, configured to, for each target feature in the feature list: calling corresponding operator functions from the operator library, and extracting corresponding metric values under corresponding dimension values from the intermediate data table; and calculating the metric value by using an operator function to obtain a characteristic value of the target characteristic.
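One possible reading of the intermediate data table is sketched below: the source rows are grouped once by (main dimension value, time window), so that each later feature calculation reads a small pre-filtered group instead of rescanning the full source data. The text does not specify the table layout, so the dictionary structure, the field names, and the helper names `build_intermediate_table` and `metric_values_from_table` are all hypothetical.

```python
from collections import defaultdict


def build_intermediate_table(rows, time_windows):
    """Group rows by (main-dimension value, time-window name).

    time_windows maps a window name such as "7day" to its length in days.
    A row is stored under every window it falls inside.
    """
    table = defaultdict(list)
    for row in rows:
        for window_name, days in time_windows.items():
            if row["days_ago"] < days:
                table[(row["user"], window_name)].append(row)
    return dict(table)


def metric_values_from_table(table, main, window_name, condition, metric):
    """Read metric values for one feature from the intermediate table."""
    return [
        row[metric]
        for row in table.get((main, window_name), [])
        if row["period"] == condition
    ]


# Made-up source data, purely for illustration.
source_rows = [
    {"user": "m1", "period": "day", "days_ago": 1, "acount": 3},
    {"user": "m1", "period": "day", "days_ago": 5, "acount": 2},
    {"user": "m1", "period": "night", "days_ago": 2, "acount": 4},
    {"user": "m1", "period": "day", "days_ago": 20, "acount": 1},
    {"user": "m2", "period": "day", "days_ago": 3, "acount": 7},
]

intermediate = build_intermediate_table(source_rows, {"7day": 7, "30day": 30})
day_values = metric_values_from_table(intermediate, "m1", "7day", "day", "acount")
print(day_values)
```

The extracted values would then be passed to the operator function called from the operator library, exactly as in the case without an intermediate table.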
The above modules may be functional modules or program modules, and may be implemented by software or hardware. For a module implemented by hardware, the modules may be located in the same processor; or the modules can be respectively positioned in different processors in any combination.
The present embodiment also provides an electronic device comprising a memory having a computer program stored therein and a processor configured to execute the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic device may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
It should be noted that, for specific examples in this embodiment, reference may be made to examples described in the foregoing embodiments and optional implementations, and details of this embodiment are not described herein again.
In addition, in combination with the feature construction method for source data in the foregoing embodiments, an embodiment of the present application may provide a storage medium for implementation. The storage medium has a computer program stored thereon; when executed by a processor, the computer program implements any of the feature construction methods for source data in the above embodiments.
In one embodiment, an electronic device is provided, which may be a server. Fig. 5 is a schematic diagram of the internal structure of the electronic device according to an embodiment of the present application. As shown in fig. 5, the electronic device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the electronic device is used for storing data. The network interface of the electronic device is used for connecting to and communicating with an external terminal through a network. The computer program, when executed by the processor, implements a feature construction method for source data.
Those skilled in the art will appreciate that the configuration shown in fig. 5 is a block diagram of only a portion of the configuration associated with the present application, and does not constitute a limitation on the electronic device to which the present application is applied, and a particular electronic device may include more or less components than those shown in the drawings, or may combine certain components, or have a different arrangement of components.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be understood by those skilled in the art that various features of the above-described embodiments can be combined in any combination, and for the sake of brevity, all possible combinations of features in the above-described embodiments are not described in detail, but rather, all combinations of features which are not inconsistent with each other should be construed as being within the scope of the present disclosure.
The above-mentioned embodiments express only several implementations of the present application, and although their description is relatively specific and detailed, it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (9)

1. A method for feature construction of source data, the method comprising:
acquiring source data to be processed and a corresponding feature construction target, wherein the feature construction target comprises a target measure, a target operator name and a target dimension;
analyzing the feature construction target, and constructing a feature list according to a target operator name, a target measure and a target dimension, wherein the feature list comprises a plurality of target features;
for each target feature in the feature list: calling a corresponding operator function from a pre-generated operator library, and extracting a metric value of corresponding measurement under a corresponding dimension value from the source data to be processed; and calculating the metric value by using the operator function to obtain a characteristic value of the target characteristic.
2. The feature construction method for source data of claim 1, wherein parsing the feature construction target and constructing a feature list according to a target operator name, a target measure and a target dimension comprises: arbitrarily combining the target operator name, the target measure and the dimension values under the target dimension to obtain the feature list.
3. The feature construction method for source data of claim 1, wherein the target dimension comprises a primary dimension for identifying identity information in the source data to be processed.
4. The feature construction method for source data of claim 3, wherein the target dimension further comprises a conditional dimension, the conditional dimension being used for dimension screening of the source data to be processed.
5. The feature construction method for source data according to claim 4, wherein in a case where the condition dimension includes a time dimension, after the feature construction target is acquired, the method includes: aggregating the source data to be processed according to the dimension values under the main dimension and the time dimension to obtain an intermediate data table;
for each target feature in the feature list: calling corresponding operator functions from the operator library, and extracting corresponding metric values of corresponding metrics under corresponding dimensional values from the intermediate data table; and calculating the metric value by using the operator function to obtain a characteristic value of the target characteristic.
6. A feature construction device for source data is characterized by comprising an acquisition module, a feature list construction module and a first feature value calculation module;
the acquisition module is used for acquiring source data to be processed and a corresponding feature construction target, wherein the feature construction target comprises a target measure, a target operator name and a target dimension;
the characteristic list building module is used for analyzing the characteristic building target and building a characteristic list according to a target operator name, a target measurement and a target dimension, wherein the characteristic list comprises a plurality of target characteristics;
the first feature value calculation module is configured to, for each target feature in the feature list: calling a corresponding operator function from a pre-generated operator library, and extracting a metric value of corresponding measurement under a corresponding dimension value from the source data to be processed; and calculating the metric value by using the operator function to obtain a characteristic value of the target characteristic.
7. The feature construction device for source data of claim 6, wherein in a case that a condition dimension in the target dimension comprises a time dimension, the device comprises an intermediate data table module and a second feature value calculation module;
the intermediate data table module is used for aggregating the source data to be processed according to the dimension values under the main dimension and the time dimension to obtain an intermediate data table;
the second feature value calculation module is configured to, for each target feature in the feature list: calling corresponding operator functions from the operator library, and extracting corresponding metric values of corresponding metrics under corresponding dimensional values from the intermediate data table; and calculating the metric value by using the operator function to obtain a characteristic value of the target characteristic.
8. An electronic device comprising a processor, a storage medium, and a computer program, the computer program being stored in the storage medium, wherein the computer program, when executed by the processor, performs the feature construction method for source data of any one of claims 1 to 5.
9. A computer storage medium having a computer program stored thereon, characterized in that: the computer program, when executed by a processor, implements the feature construction method for source data of any one of claims 1 to 5.
CN202010987617.2A 2020-09-18 2020-09-18 Feature construction method and device for source data, electronic equipment and medium Active CN111930756B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010987617.2A CN111930756B (en) 2020-09-18 2020-09-18 Feature construction method and device for source data, electronic equipment and medium

Publications (2)

Publication Number Publication Date
CN111930756A true CN111930756A (en) 2020-11-13
CN111930756B CN111930756B (en) 2021-02-12

Family

ID=73333931

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010987617.2A Active CN111930756B (en) 2020-09-18 2020-09-18 Feature construction method and device for source data, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN111930756B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2804137A1 (en) * 2013-05-16 2014-11-19 Orange Vertical social network
CN105307121A (en) * 2015-10-16 2016-02-03 上海晶赞科技发展有限公司 Information processing method and device
CN109977151A (en) * 2019-03-28 2019-07-05 北京九章云极科技有限公司 A kind of data analysing method and system

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112732759A (en) * 2020-12-31 2021-04-30 青岛海尔科技有限公司 Data processing method and device, storage medium and electronic device
CN112732759B (en) * 2020-12-31 2023-02-03 青岛海尔科技有限公司 Data processing method and device, storage medium and electronic device
CN113837604A (en) * 2021-09-23 2021-12-24 万申科技股份有限公司 Multi-source heterogeneous data fusion and multi-dimensional data correlation analysis system

Also Published As

Publication number Publication date
CN111930756B (en) 2021-02-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210922

Address after: 311121 room 210, building 18, No. 998, Wenyi West Road, Wuchang Street, Yuhang District, Hangzhou City, Zhejiang Province

Patentee after: Hangzhou Bodun Xiyan Technology Co.,Ltd.

Address before: Room 704, building 18, No. 998, Wenyi West Road, Wuchang Street, Yuhang District, Hangzhou City, Zhejiang Province

Patentee before: TONGDUN HOLDINGS Co.,Ltd.