CN117575040A - Feature online development method and system - Google Patents


Info

Publication number
CN117575040A
Authority
CN
China
Prior art keywords
features
feature
data
sets
type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311509330.9A
Other languages
Chinese (zh)
Inventor
唐科伟
叶剑涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Fulin Technology Co ltd
Original Assignee
Zhejiang Fulin Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Fulin Technology Co ltd filed Critical Zhejiang Fulin Technology Co ltd
Priority to CN202311509330.9A priority Critical patent/CN117575040A/en
Publication of CN117575040A publication Critical patent/CN117575040A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 - Selection of the most significant subset of features
    • G06F18/2113 - Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a feature online development method and system. A plurality of data are acquired and formed into a data set; a plurality of feature sets are acquired based on the data set; the feature sets are displayed in a Web page in the form of a table, and the continuous features in the feature sets are binned to form binned features; among the binned features, a representative feature is taken as the primary feature and other similar features are merged with it into a corresponding type feature; and each type feature is defined as an editing resource and developed online in the page to construct a machine learning model. Because the feature sets are processed online in the Web page and their continuous features are binned, the continuous features are optimized and the data volume of the features is reduced.

Description

Feature online development method and system
Technical Field
The invention relates to the technical field of feature development, in particular to a feature online development method and system.
Background
With the development of science and technology, data volumes are growing geometrically. Traditional feature engineering processes large amounts of data by selecting and constructing features manually. Because the data are not screened first, they easily contain data of different types, which are then combined indiscriminately to form a machine learning model, so the accuracy of the resulting model is low.
Disclosure of Invention
The invention aims to overcome the defects of the prior art, and provides a method and a system for on-line feature development.
In order to solve the above technical problems, an embodiment of the present invention provides a feature online development method, including:
acquiring a plurality of data, and forming a data set from the plurality of data;
acquiring a plurality of feature sets based on the data set;
displaying the plurality of feature sets in a Web page in the form of a table, and binning the continuous features in the plurality of feature sets to form binned features;
among the binned features, taking a representative feature as the primary feature and merging other similar features with it into a corresponding type feature;
and defining each type feature as an editing resource, and developing online based on the page to construct a machine learning model.
Optionally, the acquiring a plurality of data and forming the plurality of data into a data set includes:
acquiring a plurality of data;
classifying the plurality of data to determine original data and derivative data;
forming an original data set from the original data, and forming a derivative data set from the derivative data;
and forming the data set from the original data set and the derivative data set, and associating index information with the data set.
Optionally, acquiring a plurality of feature sets based on the data set includes:
determining original features based on the original data set in the data set, and determining derived features based on the derivative data set in the data set;
associating the original features with the derived features to form a plurality of feature sets;
screening the plurality of feature sets to determine abnormal features;
and locating the abnormal features and optimizing the plurality of feature sets.
Optionally, displaying the plurality of feature sets in a Web page in the form of a table, and binning the continuous features in the plurality of feature sets to form binned features, includes:
tabulating the plurality of feature sets to form a table;
entering the table into a Web page and editing it there, wherein the Web page adopts a paging design, each page contains a feature set, and each feature set has its own number and title;
traversing the continuous features of the plurality of feature sets in the table;
and binning the continuous features using equal-width binning or equal-frequency binning to form the binned features, dividing the continuous features into discrete intervals so as to reduce the number of features.
Optionally, displaying the plurality of feature sets in a Web page in the form of a table, and binning the continuous features in the plurality of feature sets to form binned features, further includes:
acquiring each binned feature;
traversing the binned features synchronously, and checking each binned feature for anomalies;
during the anomaly check, detecting the feature content of each binned feature;
and comparing the feature content with preset parameter content to determine abnormal sub-features, and modifying the abnormal sub-features.
Optionally, among the binned features, taking a representative feature as the primary feature and merging other similar features with it into a corresponding type feature includes:
traversing the representative features among the binned features based on the model type;
taking each representative feature as a key feature;
comparing the representative feature with the other features synchronously, and determining the corresponding similarity;
comparing the similarity with a preset similarity to determine the other similar features;
and merging the other similar features with the representative feature to form the corresponding type feature.
Optionally, among the binned features, taking a representative feature as the primary feature and merging other similar features with it into a corresponding type feature further includes:
optimizing the type features to obtain the type parameters of each type feature;
sorting the type parameters, and prioritizing them according to the model type;
and redefining the type of each type feature according to its type parameters and the corresponding priorities.
Optionally, defining each type feature as an editing resource and developing online based on the page to construct a machine learning model includes:
acquiring each type feature;
defining the type features as editing resources, and editing the type features;
loading the type features into the page, and developing the type features online based on the page;
and, during the online development of the type features, training on the plurality of type features and constructing a machine learning model.
Optionally, defining each type feature as an editing resource and developing online based on the page to construct a machine learning model further includes:
acquiring a machine learning model;
defining learning parameters in a machine learning model;
positioning environmental factors of the machine learning model, and defining environmental parameters based on the environmental factors;
adding the environmental parameters to the learning parameters so that the learning parameters are learned further;
and upgrading and iterating the machine learning model based on the learning parameters.
In addition, an embodiment of the invention also provides a feature online development system, including:
an acquisition module, configured to acquire a plurality of data and form a data set from the plurality of data;
a feature module, configured to acquire a plurality of feature sets based on the data set;
a binning module, configured to display the plurality of feature sets in a Web page in the form of a table, and to bin the continuous features in the plurality of feature sets to form binned features;
a merging module, configured to take, among the binned features, a representative feature as the primary feature and merge other similar features with it into a corresponding type feature;
and a construction module, configured to define each type feature as an editing resource and develop online based on the page to construct a machine learning model.
In the embodiment of the invention, a plurality of data are acquired and formed into a data set; a plurality of feature sets are acquired based on the data set; the feature sets are displayed in a Web page in the form of a table, and the continuous features in the feature sets are binned to form binned features; among the binned features, a representative feature is taken as the primary feature and other similar features are merged with it into a corresponding type feature; and each type feature is defined as an editing resource and developed online in the page to construct a machine learning model. Because the feature sets are processed online in the Web page and their continuous features are binned, the continuous features are optimized and the data volume of the features is reduced.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings which are required in the description of the embodiments or the prior art will be briefly described, it being obvious that the drawings in the description below are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow diagram of a feature online development method in an embodiment of the invention;
FIG. 2 is a schematic flow chart of S12 in the feature online development method in the embodiment of the invention;
FIG. 3 is a schematic flow chart of S13 in the feature online development method in the embodiment of the invention;
FIG. 4 is a schematic flow chart of S14 in the feature online development method in the embodiment of the invention;
FIG. 5 is a schematic flow chart of S15 in the feature online development method in the embodiment of the invention;
FIG. 6 is a schematic diagram of the structural composition of an online development system of features in an embodiment of the invention;
FIG. 7 is a hardware diagram of an electronic device according to an example embodiment.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
Referring to FIG. 1 to FIG. 7, a feature online development method includes:
S11: acquiring a plurality of data, and forming a data set from the plurality of data;
S12: acquiring a plurality of feature sets based on the data set;
S13: displaying the plurality of feature sets in a Web page in the form of a table, and binning the continuous features in the plurality of feature sets to form binned features;
S14: among the binned features, taking a representative feature as the primary feature and merging other similar features with it into a corresponding type feature;
S15: defining each type feature as an editing resource, and developing online based on the page to construct a machine learning model.
S11: acquiring a plurality of data, and forming a data set from the plurality of data;
In an embodiment of the present application, a plurality of data are acquired; the data are classified to determine original data and derivative data; an original data set is formed from the original data and a derivative data set from the derivative data; and the data set is formed from the original data set and the derivative data set, with index information associated with it.
In the embodiment of the application, the plurality of data are acquired and classified into original data and derivative data. Original data is data that has not undergone any processing; it may take the form of text, images and the like, and is the most basic data unit in the invention. Derivative data is obtained by processing, converting and calculating the original data, and includes statistical indexes, feature engineering results, model prediction results and the like.
Next, an original data set is formed from the original data to facilitate further processing, and a derivative data set is formed from the derivative data. The data set is formed from these two sets and therefore contains both the original data set and the derivative data set.
The index information is then associated with the data set: MySQL stores the index information of the features, while a non-relational database (HBase in this embodiment) stores the feature content. This makes full use of the query optimization and transaction support of the relational database together with the large-scale data storage and horizontal scaling capacity of the non-relational database, improving overall query efficiency and processing capacity.
Optionally, the codes of the required data, such as ds_2001, ds_2002 and ds_2003, are input through the data interface, together with the number of data samples and the degree of data missingness (which represents data integrity). The data service receives the data request from the data interface and queries MySQL for the data index. MySQL looks up the required data index and returns it to the data service as an index set. The data service then uses this index to send a data query request to HBase. HBase looks up the specific feature content required and returns it to the data service as a feature set. Finally, the data service filters invalid values out of the feature set and returns the filtered feature set to the data interface.
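The retrieval flow above can be sketched as follows. This is a minimal illustration only: the two dictionaries are hypothetical in-memory stand-ins for MySQL (index) and HBase (feature content), and the function names, row keys and feature names are assumptions that do not appear in the patent.

```python
from typing import Dict, List, Optional

# Hypothetical stand-ins for the two stores described above.
# INDEX_STORE plays the role of MySQL (data code -> row keys);
# FEATURE_STORE plays the role of HBase (row key -> feature content).
INDEX_STORE: Dict[str, List[str]] = {
    "ds_2001": ["row-1", "row-2"],
    "ds_2002": ["row-3"],
}
FEATURE_STORE: Dict[str, Dict[str, Optional[float]]] = {
    "row-1": {"spend_freq": 3.0, "avg_amount": 120.5},
    "row-2": {"spend_freq": None, "avg_amount": 80.0},
    "row-3": {"spend_freq": 7.0, "avg_amount": float("nan")},
}


def lookup_index(codes: List[str]) -> List[str]:
    """Resolve data codes to an index set (the MySQL role)."""
    keys: List[str] = []
    for code in codes:
        keys.extend(INDEX_STORE.get(code, []))
    return keys


def fetch_features(row_keys: List[str]) -> List[Dict[str, Optional[float]]]:
    """Fetch feature content for each index entry (the HBase role)."""
    return [FEATURE_STORE[k] for k in row_keys if k in FEATURE_STORE]


def is_valid(value: Optional[float]) -> bool:
    """Treat None and NaN as invalid values (an assumed definition)."""
    return value is not None and value == value  # NaN is not equal to itself


def data_service(codes: List[str]) -> List[Dict[str, float]]:
    """Query the index, fetch the features, then filter invalid values."""
    rows = fetch_features(lookup_index(codes))
    return [{k: v for k, v in row.items() if is_valid(v)} for row in rows]


result = data_service(["ds_2001", "ds_2002", "ds_2003"])
```

A code with no index entry (ds_2003 here) simply contributes no rows; a real service would probably report it to the caller.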
S12: acquiring a plurality of feature sets based on the data sets;
In an embodiment of the present application, the data set is processed further to obtain a plurality of feature sets, on which the subsequent binning is performed.
In the implementation process of the invention, the specific steps can be as follows:
S121: determining original features based on the original data set in the data set, and determining derived features based on the derivative data set in the data set;
S122: associating the original features with the derived features to form a plurality of feature sets;
S123: screening the plurality of feature sets to determine abnormal features;
S124: locating the abnormal features and optimizing the plurality of feature sets.
In an embodiment of the present application, original features are determined from the original data set in the data set, and derived features are determined from the derivative data set. With both defined, the original features are associated with the derived features to form a plurality of feature sets, each of which contains a plurality of features.
Feature screening is then performed on the feature sets to determine abnormal features. The abnormal features are located and the feature sets are optimized, which ensures that the features in the feature sets are reasonable and that subsequent development is accurate. Performing this screening before feature development also reduces the number of features and yields more finely classified features.
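The patent does not state which screening criteria identify an abnormal feature. The sketch below assumes two common ones, a high missing rate and a constant value; the threshold and all names are illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple

Values = List[Optional[float]]


def screen_features(
    feature_set: Dict[str, Values], max_missing_rate: float = 0.5
) -> Tuple[Dict[str, Values], Dict[str, str]]:
    """Split a feature set into kept features and abnormal features.

    Assumed criteria (not taken from the patent):
      * missing rate above max_missing_rate
      * every non-missing value is identical
    """
    kept: Dict[str, Values] = {}
    abnormal: Dict[str, str] = {}
    for name, values in feature_set.items():
        present = [v for v in values if v is not None]
        missing_rate = 1 - len(present) / len(values) if values else 1.0
        if missing_rate > max_missing_rate:
            abnormal[name] = "too many missing values"
        elif len(set(present)) <= 1:
            abnormal[name] = "constant value"
        else:
            kept[name] = values
    return kept, abnormal


kept, abnormal = screen_features(
    {
        "a": [1.0, 2.0, 3.0, None],
        "b": [None, None, None, 1.0],
        "c": [5.0, 5.0, 5.0, 5.0],
    }
)
```

Returning the reason alongside each abnormal feature makes it possible to locate the problem before the feature set is optimized.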
S13: displaying the plurality of feature sets in a Web page in the form of a table, and binning the continuous features in the plurality of feature sets to form binned features;
In the embodiment of the application, the feature sets are displayed as a table in a Web page so that they can be processed online using the online processing functions of the page; the continuous features in the feature sets are then binned to form binned features.
In the implementation process of the invention, the specific steps can be as follows:
S131: tabulating the plurality of feature sets to form a table;
S132: entering the table into a Web page and editing it there, wherein the Web page adopts a paging design, each page contains a feature set, and each feature set has its own number and title;
S133: traversing the continuous features of the plurality of feature sets in the table;
S134: binning the continuous features using equal-width binning or equal-frequency binning, dividing the continuous features into discrete intervals so as to reduce the number of features.
In the embodiment of the application, the feature sets are acquired and tabulated: the feature sets are ordered in sequence and formed into a table.
The table is then entered into the Web page and edited there, using the online processing functions of the page to process the feature sets online. The Web page adopts a paging design: each page contains a feature set, and each feature set has its own number and title. At the top of the page, a navigation bar lets the user switch quickly between feature sets. At the bottom of the page, a paging control lets the user jump quickly to a specified page.
Each feature set is shown in a table containing the following columns:
Feature name: the name of the feature.
Feature description: a brief description of the meaning and use of the feature.
Data type: the data type of the feature, such as integer, floating point number or string.
Missing value processing: how missing values of the feature are handled, such as filling or ignoring.
Feature importance: the degree of importance of the feature to the target variable.
Drawing example: an example plot of the feature that helps the user understand its meaning and distribution intuitively.
The table also includes operation buttons, such as sorting, filtering and exporting, so that the user can operate on the data. A text description at the top or bottom of the page briefly introduces the meaning, application scenarios and usage notes of the feature set. Answers to common problems and related links are also provided to help users better understand and use the feature set.
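One way to model the paged table described above is sketched here. The class and field names are assumptions chosen to mirror the listed columns; the drawing example column is omitted because it is an image rather than a value.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FeatureRow:
    """One table row; fields mirror the columns listed above."""
    name: str
    description: str
    data_type: str
    missing_value_handling: str
    importance: float


@dataclass
class FeatureSetPage:
    """One page of the paged design: a numbered, titled feature set."""
    number: int
    title: str
    rows: List[FeatureRow] = field(default_factory=list)

    def sorted_by_importance(self) -> List[FeatureRow]:
        """Back the 'sorting' button: most important feature first."""
        return sorted(self.rows, key=lambda r: r.importance, reverse=True)


def build_pages(feature_sets: Dict[str, List[FeatureRow]]) -> List[FeatureSetPage]:
    """Create one page per feature set, numbered from 1 in input order."""
    return [
        FeatureSetPage(number=i, title=title, rows=rows)
        for i, (title, rows) in enumerate(feature_sets.items(), start=1)
    ]


pages = build_pages(
    {
        "Spending": [
            FeatureRow("spend_freq", "monthly card uses", "integer", "fill", 0.4),
            FeatureRow("avg_amount", "mean spend", "float", "ignore", 0.9),
        ],
        "Profile": [],
    }
)
```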
Finally, the continuous features of the feature sets in the table are traversed and binned using equal-width or equal-frequency binning to form the binned features. Dividing the continuous features into discrete intervals reduces the number of features for subsequent development.
Optionally, the continuous features are binned using methods such as equal-width binning and equal-frequency binning, which divide continuous values into discrete intervals and thereby reduce the number of features. For example, suppose the frequency of a user's bank card spending ranges from 0 to 10 and is divided into 12 bins according to the month of consumption. The width of each bin is calculated from the range of the recorded data and the number of bins: (maximum frequency - minimum frequency) / 12. The boundaries of each bin are then calculated: the lower boundary of a bin is the minimum value plus the bin width multiplied by the bin number (counting from 0), and the upper boundary is one bin width higher. Finally, each data point is mapped into the corresponding bin according to its value. In this way, tens of thousands to hundreds of thousands of distinct values are mapped into 12 bins, which greatly reduces the total data volume.
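Both binning methods can be sketched in a few lines. The example uses the range 0 to 10 and 12 bins from the text; the function names are assumptions, and bins are numbered from 0.

```python
from typing import List


def equal_width_edges(lo: float, hi: float, n_bins: int) -> List[float]:
    """Return the n_bins + 1 boundaries; bin width is (hi - lo) / n_bins."""
    return [lo + (hi - lo) * i / n_bins for i in range(n_bins + 1)]


def assign_bin(value: float, lo: float, hi: float, n_bins: int) -> int:
    """Map a value in [lo, hi] to a bin index in 0..n_bins-1."""
    if not lo <= value <= hi:
        raise ValueError("value outside the binning range")
    index = int((value - lo) * n_bins / (hi - lo))
    return min(index, n_bins - 1)  # the maximum falls into the last bin


def equal_frequency_bins(values: List[float], n_bins: int) -> List[int]:
    """Assign bins by rank so that each bin holds a nearly equal count.

    Tied values may land in different bins in this simple version.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


edges = equal_width_edges(0.0, 10.0, 12)
```

Equal-width binning keeps interval sizes uniform, while equal-frequency binning keeps the number of samples per bin uniform, which tends to suit skewed data better.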
Displaying the plurality of feature sets in a Web page in the form of a table, and binning the continuous features in the feature sets to form binned features, further includes: acquiring each binned feature; traversing the binned features synchronously and checking each for anomalies; detecting the feature content of each binned feature during the check; and comparing the feature content with preset parameter content to determine abnormal sub-features, which are then modified. This ensures the accuracy of the binned features, optimizes them in depth, and keeps the development process under control step by step.
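The patent leaves the "preset parameter content" unspecified. The sketch below assumes the simplest possible preset, the valid range of bin indexes, and assumes that modification means clamping out-of-range indexes and marking unreadable entries as missing. All names are hypothetical.

```python
from typing import List, Optional


def is_bin_index(value: object) -> bool:
    """True for plain integers (bool is excluded on purpose)."""
    return isinstance(value, int) and not isinstance(value, bool)


def check_binned(binned: List[object], n_bins: int) -> List[int]:
    """Return positions whose content is not a bin index in 0..n_bins-1."""
    return [
        i for i, b in enumerate(binned)
        if not (is_bin_index(b) and 0 <= b < n_bins)
    ]


def repair_binned(binned: List[object], n_bins: int) -> List[Optional[int]]:
    """Clamp out-of-range indexes; mark non-integer content as missing."""
    repaired: List[Optional[int]] = []
    for b in binned:
        if is_bin_index(b):
            repaired.append(min(max(b, 0), n_bins - 1))
        else:
            repaired.append(None)
    return repaired


raw = [0, 3, 12, -1, None]
bad_positions = check_binned(raw, 12)
fixed = repair_binned(raw, 12)
```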
S14: among the binned features, taking a representative feature as the primary feature and merging other similar features with it into a corresponding type feature;
In the embodiment of the application, similar features are divided into clusters using a clustering method (hierarchical clustering), a representative feature is selected for each cluster, and the other similar features are merged with it into one feature, which further reduces the number of features. In this way the representative feature is taken as the primary feature and the other similar features are merged into the corresponding type feature.
In the implementation process of the invention, the specific steps can be as follows:
S141: traversing the representative features among the binned features based on the model type;
S142: taking each representative feature as a key feature;
S143: comparing the representative feature with the other features synchronously, and determining the corresponding similarity;
S144: comparing the similarity with a preset similarity to determine the other similar features;
S145: merging the other similar features with the representative feature to form the corresponding type feature.
In the embodiment of the application, the representative features among the binned features are traversed based on the model type, so that the model type guides the choice of representative features. Once located, each representative feature is taken as a key feature and used for the subsequent merging: it is compared synchronously with the other features to determine the corresponding similarity, and the similarity is compared with a preset similarity to determine the other similar features. Optionally, when the similarity is greater than the preset similarity, the feature is identified as a similar feature and merged with the representative feature.
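The threshold comparison in S143 to S145 can be illustrated as follows. The patent does not name a similarity measure; the absolute Pearson correlation is assumed here, and the threshold value and feature names are illustrative.

```python
from math import sqrt
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def merge_similar(
    features: Dict[str, List[float]], representative: str, threshold: float = 0.9
) -> Dict[str, object]:
    """Group every feature whose similarity to the representative
    exceeds the preset threshold into one type feature."""
    rep_values = features[representative]
    merged = [
        name
        for name, values in features.items()
        if name != representative and abs(pearson(rep_values, values)) > threshold
    ]
    return {"representative": representative, "merged": merged}


type_feature = merge_similar(
    {
        "spend": [1.0, 2.0, 3.0, 4.0],
        "spend_x2": [2.0, 4.0, 6.0, 8.0],
        "noise": [4.0, 1.0, 3.0, 2.0],
    },
    representative="spend",
)
```

Here spend_x2 is perfectly correlated with the representative and is merged, while noise (correlation -0.4) stays separate.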
Therefore, with the representative features as the primary features, other similar features are merged with them to form the corresponding type features, which further reduces the number of features and optimizes the feature sets. For example, suppose the consumption preferences of a user are to be extracted from all the goods the user has purchased. The goods are classified into 10 categories defined in advance, and the data are preprocessed to remove outliers and fill in missing values. The distances between different goods are calculated using the Manhattan distance, the goods are merged according to these distances using a minimum-variance criterion, and the user's goods preferences are finally aggregated into 10 types.
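A small agglomerative clustering sketch with the Manhattan distance is shown below. Note the substitution: the text merges by minimum variance (Ward-style linkage), which is normally defined for Euclidean distances, so this sketch uses average linkage instead. The points and names are made up for illustration.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def manhattan(a: Point, b: Point) -> float:
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def average_linkage(c1: List[int], c2: List[int], points: List[Point]) -> float:
    """Mean pairwise Manhattan distance between two clusters."""
    total = sum(manhattan(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(points: List[Point], k: int) -> List[List[int]]:
    """Merge the two closest clusters repeatedly until k clusters remain."""
    clusters: List[List[int]] = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best: Tuple[float, int, int] = (float("inf"), -1, -1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], points)
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


goods = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0)]
clusters = agglomerate(goods, k=3)
```

With k set to 10, the same loop would aggregate goods into the 10 preference types mentioned in the example. The quadratic search per merge is fine for a sketch but slow for large catalogues.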
In addition, a browser-based online code editor built on ACE-Editor provides the user with a convenient code writing environment. The editor supports the Groovy scripting language, so users can write and debug feature development code directly in the browser. More specifically, it uses the data acquired in the feature merging step S30 as a programming resource, giving feature engineering teams an efficient way to develop and optimize custom features.
The model module uses the GBDT algorithm, improving the accuracy of the features by reducing the variance between classifiers and improving their stability by reducing bias. Finally, the model evaluation is stored in the data storage module, in regression form, together with the developed features; it is recorded as a derived feature and used for future feature development.
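To make the GBDT idea concrete, here is a toy gradient boosting regressor built from depth-1 trees (stumps) on a single feature with squared-error loss. It is a teaching sketch under those assumptions, not the model module of the patent; a production system would normally rely on an established library.

```python
from typing import List, Tuple

Stump = Tuple[float, float, float]  # (threshold, left value, right value)


def fit_stump(x: List[float], residuals: List[float]) -> Stump:
    """Find the single split that minimizes squared error on the residuals."""
    best = None
    for t in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2], best[3]


def fit_gbdt(x: List[float], y: List[float], n_rounds: int = 50, lr: float = 0.1):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        t, lm, rm = fit_stump(x, residuals)
        stumps.append((t, lm, rm))
        pred = [p + lr * (lm if xi <= t else rm) for xi, p in zip(x, pred)]
    return base, stumps, lr


def predict(model, xi: float) -> float:
    """Sum the base value and the scaled contribution of every stump."""
    base, stumps, lr = model
    return base + sum(lr * (lm if xi <= t else rm) for t, lm, rm in stumps)


model = fit_gbdt([1, 2, 3, 4, 5, 6], [1, 1, 1, 5, 5, 5])
```

Each round corrects a fraction (the learning rate) of the remaining error, so the fit approaches the step in the training data gradually rather than in one jump.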
Among the binned features, taking a representative feature as the primary feature and merging other similar features with it into a corresponding type feature further includes: optimizing the type features to obtain the type parameters of each type feature; sorting the type parameters and prioritizing them according to the model type; and redefining the type of each type feature according to its type parameters and the corresponding priorities. Making full use of the type parameters and their priorities in this way ensures the accuracy of the type features.
S15: defining each type feature as an editing resource, and performing online development based on the page to construct a machine learning model;
In the embodiment of the application, the plurality of feature sets are processed online in the Web page, and the continuous features in them are binned so as to optimize the continuous features and reduce the data volume of the features. The type features are then developed online based on the page to construct a machine learning model, which improves the accuracy of the machine learning model.
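Claim 4 names equal-width and equal-frequency binning as the two ways of binning continuous features. A minimal sketch of both follows; the function names and the sample `ages` values are assumptions for illustration.

```python
def equal_width_bins(values, n_bins):
    """Assign each value a bin index, using n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:  # all values identical
        return [0] * len(values)
    # The maximum would fall into bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def equal_frequency_bins(values, n_bins):
    """Assign bin indices so each bin holds roughly the same number of values.

    Tied values may land in different bins in this simple rank-based version.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


ages = [18, 22, 25, 31, 40, 58, 90]  # hypothetical continuous feature
print(equal_width_bins(ages, 3))      # [0, 0, 0, 0, 0, 1, 2]
print(equal_frequency_bins(ages, 3))  # [0, 0, 0, 1, 1, 2, 2]
```

On skewed data such as `ages`, equal-width binning leaves most values in the first bin, while equal-frequency binning spreads them evenly; which is preferable depends on the feature's distribution.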
In the implementation process of the invention, the specific steps can be as follows:
S151: acquiring each type feature;
S152: defining editing resources based on the type features, and editing the type features;
S153: loading the type features into the page, and developing the type features online based on the page;
S154: during the online development of the type features, training the plurality of type features and building a machine learning model.
In the embodiment of the application, the type features are acquired and defined as editing resources so that they can be edited, which ensures that the type features can be processed online. The type features are then loaded into the page and developed online based on the page; during this online development, the plurality of type features are trained and a machine learning model is built. Developing the type features online based on the page in this way improves the accuracy of the machine learning model.
The feature online development method can also automatically extract useful features from the data set in order to construct an accurate machine learning model, which improves the performance and accuracy of the model and helps data scientists and machine learning experts build more accurate models. The method also automates the feature development process, reducing the time and workload of manually selecting and constructing features, so that accurate machine learning models can be built faster and work efficiency improves.
In addition, because useful features are extracted automatically, the machine learning process is simplified. This lets data scientists and machine learning experts focus more on building and evaluating machine learning models.
Automatic extraction of useful features also improves the scalability of machine learning, making it easier to build and evaluate machine learning models.
Defining each type feature as an editing resource and performing online development based on the page to construct a machine learning model further comprises: acquiring the machine learning model; defining learning parameters in the machine learning model; locating environmental factors of the machine learning model, and defining environmental parameters based on the environmental factors; adding the environmental parameters to the learning parameters so that the learning parameters are self-learned; and upgrading and iterating the machine learning model based on the learning parameters. This allows the machine learning model to upgrade autonomously and further associates the environmental parameters with the learning parameters.
In the embodiment of the invention, a plurality of data are acquired and formed into a data set; a plurality of feature sets are acquired based on the data set; the plurality of feature sets are displayed in a Web page in the form of a table, and the continuous features in them are binned to form binned features; among the binned features, the representative features are taken as primary and other similar features are merged into corresponding type features; and each type feature is defined as an editing resource and developed online based on the page to construct a machine learning model. Because the feature sets are processed online in the Web page and their continuous features are binned, the continuous features are optimized and the data volume of the features is reduced.
Examples
Referring to fig. 6, fig. 6 is a schematic structural diagram of an online development system of features in an embodiment of the invention.
As shown in fig. 6, the feature online development system comprises:
an acquisition module 21, configured to acquire a plurality of data and form a data set from the plurality of data;
a feature module 22, configured to acquire a plurality of feature sets based on the data set;
a binning module 23, configured to display the plurality of feature sets in a Web page in the form of a table, and to bin the continuous features in the plurality of feature sets to form binned features;
a merging module 24, configured to take the representative features among the binned features as primary and merge other similar features into corresponding type features;
a construction module 25, configured to define each type feature as an editing resource and perform online development based on the page to construct a machine learning model.
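The data flow between the five modules listed above can be sketched as a simple chain of callables. The class name, field names, and the stand-in functions in the usage example are hypothetical; the specification does not prescribe any particular interface between the modules.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class FeatureDevelopmentSystem:
    """Hypothetical wiring of modules 21 to 25; each field is one module."""
    acquire: Callable[[], Any]          # acquisition module 21
    extract: Callable[[Any], Any]       # feature module 22
    bin_features: Callable[[Any], Any]  # binning module 23
    merge: Callable[[Any], Any]         # merging module 24
    build: Callable[[Any], Any]         # construction module 25

    def run(self):
        data_set = self.acquire()
        feature_sets = self.extract(data_set)
        binned = self.bin_features(feature_sets)
        type_features = self.merge(binned)
        return self.build(type_features)


# Stand-in implementations, for illustration only.
system = FeatureDevelopmentSystem(
    acquire=lambda: [3, 7, 12, 18],
    extract=lambda data: {"amount": data},
    bin_features=lambda fs: {k: [v // 10 for v in vals] for k, vals in fs.items()},
    merge=lambda fs: {k: sorted(set(vals)) for k, vals in fs.items()},
    build=lambda tf: {"inputs": sorted(tf)},
)
result = system.run()
```

Keeping each module behind a plain callable makes it possible to replace one stage, for example the binning strategy, without touching the others.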
Example 2
An electronic device 40 according to this embodiment of the present invention is described below with reference to fig. 7. The electronic device 40 shown in fig. 7 is only an example and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.
As shown in fig. 7, the electronic device 40 takes the form of a general-purpose computing device. Components of the electronic device 40 may include, but are not limited to: at least one processing unit 41, at least one storage unit 42, and a bus 43 connecting the different system components (including the storage unit 42 and the processing unit 41).
The storage unit 42 stores program code that is executable by the processing unit 41, such that the processing unit 41 performs the steps according to various exemplary embodiments of the present invention described in the "example methods" section of this specification.
The storage unit 42 may include readable media in the form of volatile memory, such as random access memory (RAM) 421 and/or cache memory 422, and may further include read-only memory (ROM) 423.
The storage unit 42 may also include a program/utility 424 having a set (at least one) of program modules 425, such program modules 425 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
The bus 43 may be one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures.
Electronic device 40 may also communicate with one or more external devices (e.g., keyboard, pointing device, bluetooth device, etc.), one or more devices that enable a user to interact with electronic device 40, and/or any device (e.g., router, modem, etc.) that enables electronic device 40 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 44. Also, electronic device 40 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet, via network adapter 45. As shown in fig. 7, the network adapter 45 communicates with other modules of the electronic device 40 over the bus 43. It should be appreciated that although not shown in fig. 7, other hardware and/or software modules may be used in connection with electronic device 40, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup planning systems, and the like.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, including several instructions to cause a computing device (may be a personal computer, a server, a terminal device, or a network device, etc.) to perform the method according to the embodiments of the present disclosure.
Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments may be implemented by a program instructing the related hardware. The program may be stored in a computer-readable storage medium, which may include read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disk, and the like. The storage medium stores computer program instructions which, when executed by a computer, cause the computer to perform the method described above.
The feature online development method and system provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the invention, and the description of the above embodiments is intended only to help in understanding the method and core idea of the invention. Meanwhile, those skilled in the art may, in accordance with the ideas of the invention, make changes to the specific embodiments and the scope of application. In view of the above, the contents of this description should not be construed as limiting the invention.

Claims (10)

1. A feature online development method, characterized by comprising the following steps:
acquiring a plurality of data, and forming a data set from the plurality of data;
acquiring a plurality of feature sets based on the data set;
displaying the plurality of feature sets in a Web page in the form of a table, and binning the continuous features in the plurality of feature sets to form binned features;
among the binned features, taking the representative features as primary and merging other similar features into corresponding type features;
defining each type feature as an editing resource, and performing online development based on the page to construct a machine learning model.
2. The feature online development method according to claim 1, wherein acquiring the plurality of data and forming the data set from the plurality of data comprises:
acquiring a plurality of data;
classifying the plurality of data, and determining original data and derivative data;
forming an original data set from the original data, and forming a derivative data set from the derivative data;
forming the data set from the original data set and the derivative data set, and associating index information with the data set.
3. The feature online development method according to claim 2, wherein acquiring the plurality of feature sets based on the data set comprises:
determining original features based on the original data set in the data set; determining derived features based on the derivative data set in the data set;
associating the original features with the derived features to form a plurality of feature sets;
performing feature screening on the plurality of feature sets to determine abnormal features;
locating the abnormal features and optimizing the plurality of feature sets.
4. The feature online development method according to claim 1, wherein displaying the plurality of feature sets in a Web page in the form of a table and binning the continuous features in the plurality of feature sets to form binned features comprises:
tabulating the plurality of feature sets to form a table;
entering the table into a Web page and editing it based on the Web page, wherein the Web page adopts a paging design, each page contains one feature set, and each feature set has its own number and title;
traversing the continuous features in the plurality of feature sets in the table;
binning the continuous features using equal-width binning or equal-frequency binning to form the binned features, whereby the continuous features are divided into discrete intervals so as to reduce the number of features.
5. The feature online development method according to claim 4, wherein displaying the plurality of feature sets in a Web page in the form of a table and binning the continuous features in the plurality of feature sets to form binned features further comprises:
acquiring each binned feature;
synchronously traversing the binned features, and performing an anomaly check on each binned feature;
during the anomaly check, detecting the feature content of each binned feature;
comparing the feature content with preset parameter content to determine abnormal binned features, and performing modification based on the abnormal binned features.
6. The feature online development method according to claim 5, wherein taking the representative features among the binned features as primary and merging other similar features into corresponding type features comprises:
traversing the representative features among the binned features based on the model type;
taking the representative features as key features;
synchronously comparing the representative features with each feature, and determining the corresponding similarity;
comparing the similarity with a preset similarity to determine the other similar features;
merging the other similar features with the representative features to form the corresponding type features.
7. The feature online development method according to claim 6, wherein taking the representative features among the binned features as primary and merging other similar features into corresponding type features further comprises:
optimizing the type features to obtain the type parameters in the type features;
sorting the type parameters according to the model type;
redefining the types of the type features according to the type parameters and their corresponding priorities.
8. The feature online development method according to claim 7, wherein defining each type feature as an editing resource and performing online development based on the page to construct a machine learning model comprises:
acquiring each type feature;
defining editing resources based on the type features, and editing the type features;
loading the type features into the page, and developing the type features online based on the page;
during the online development of the type features, training the plurality of type features and building a machine learning model.
9. The feature online development method according to claim 8, wherein defining each type feature as an editing resource and performing online development based on the page to construct a machine learning model further comprises:
acquiring the machine learning model;
defining learning parameters in the machine learning model;
locating environmental factors of the machine learning model, and defining environmental parameters based on the environmental factors;
adding the environmental parameters to the learning parameters so that the learning parameters are self-learned;
upgrading and iterating the machine learning model based on the learning parameters.
10. A feature online development system, wherein the feature online development system is applied to the feature online development method according to any one of claims 1 to 9, the feature online development system comprising:
an acquisition module, configured to acquire a plurality of data and form a data set from the plurality of data;
a feature module, configured to acquire a plurality of feature sets based on the data set;
a binning module, configured to display the plurality of feature sets in a Web page in the form of a table, and to bin the continuous features in the plurality of feature sets to form binned features;
a merging module, configured to take the representative features among the binned features as primary and merge other similar features into corresponding type features;
a construction module, configured to define each type feature as an editing resource and perform online development based on the page to construct a machine learning model.
CN202311509330.9A 2023-11-10 2023-11-10 Feature online development method and system Pending CN117575040A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311509330.9A CN117575040A (en) 2023-11-10 2023-11-10 Feature online development method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311509330.9A CN117575040A (en) 2023-11-10 2023-11-10 Feature online development method and system

Publications (1)

Publication Number Publication Date
CN117575040A true CN117575040A (en) 2024-02-20

Family

ID=89894643

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311509330.9A Pending CN117575040A (en) 2023-11-10 2023-11-10 Feature online development method and system

Country Status (1)

Country Link
CN (1) CN117575040A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination