CN116628465B - Feature selection method based on screening machine learning user - Google Patents


Info

Publication number
CN116628465B
Authority
CN
China
Prior art keywords
user
user characteristic
data
characteristic data
selection
Prior art date
Legal status
Active
Application number
CN202310596434.1A
Other languages
Chinese (zh)
Other versions
CN116628465A (en)
Inventor
阮宁
曹天赐
倪萌
Current Assignee
Henan Shunying Data Technology Co ltd
Henan Normal University
Original Assignee
Henan Shunying Data Technology Co ltd
Henan Normal University
Priority date
Filing date
Publication date
Application filed by Henan Shunying Data Technology Co ltd and Henan Normal University
Priority to CN202310596434.1A
Publication of CN116628465A
Application granted
Publication of CN116628465B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • G06F18/2113 Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/10 Pre-processing; Data cleansing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a feature selection method based on screening machine learning users, in the technical field of machine learning. The method comprises: collecting user registration information in real time through a user registration window and acquiring user feature data from that information in real time; preprocessing the acquired user feature data to obtain user feature data with distribution characteristics; and analyzing and evaluating that data to produce a corresponding analysis and evaluation report. The invention addresses the low screening precision and poor feature selection results of prior-art user feature selection based on machine learning.

Description

Feature selection method based on screening machine learning user
Technical Field
The invention relates to the technical field of machine learning, and in particular to a feature selection method based on screening machine learning users.
Background
Machine learning is a multidisciplinary field drawing on probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It studies how a computer can simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve its own performance.
Chinese patent publication No. CN114970312A discloses a method and apparatus for screening machine learning features, comprising a mounting base, an equipment box, a detection and screening computer, a keyboard, a data interface, a data connection line, a support frame, a fixing frame and a display screen. The method is simple and convenient to operate, allows several groups of machines to be tested simultaneously, shows the learning characteristics of the machines intuitively, reduces surplus resources and improves the accuracy with which the machines are used. However, that patent has the following drawback in practical use:
at present, user feature selection based on machine learning has low screening precision, which leads to poor feature selection results.
Disclosure of Invention
The invention aims to provide a feature selection method based on screening machine learning users that improves screening precision and the quality of user feature selection when that selection is performed with machine learning, thereby solving the problem described in the background.
In order to achieve the above purpose, the present invention provides the following technical solutions:
A feature selection method based on screening machine learning users comprises the following steps:
S1: collecting user registration information in real time through a user registration window and acquiring user feature data from that information in real time, the user feature data including, but not limited to, user name, gender, age, region, preference, payment and social features;
S2: preprocessing the user feature data acquired in real time: converting it into a form the processor can receive, as required by machine learning user feature selection, and then filtering, denoising and sorting the converted data to obtain user feature data with distribution characteristics;
S3: analyzing and evaluating the user feature data with distribution characteristics: training a user feature model on that data based on a linear model to determine a linear-model-based user feature selection model, then analyzing and evaluating the data with this model to produce a corresponding analysis and evaluation report;
S4: performing selection control on the user feature data with distribution characteristics: obtaining the analysis and evaluation report from the user feature selection model, determining a user feature selection strategy from the report using data mining and correlation analysis, and intelligently selecting user feature data according to that strategy.
Preferably, in step S1, the user registration information is collected in real time through the user registration window as follows:
obtaining the user registration window;
the user enters, following the specific prompts in the registration window, the corresponding registration information, including but not limited to user name, gender, age, region, preference, payment and social features;
after the user registration information has been entered, acquiring it in real time and extracting user feature data from it.
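As a minimal sketch of this step, assuming the registration window submits its fields as a key-value mapping, extracting user feature data could look like the Python below. The `UserFeatures` record, the `extract_features` function and every form key (`username`, `payment_method`, `social_account` and so on) are hypothetical; the patent names the feature types but not a data schema.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class UserFeatures:
    """Hypothetical record holding the seven feature types named in step S1."""
    name: str
    gender: str
    age: int
    region: str
    preferences: list = field(default_factory=list)
    payment: str = ""
    social: str = ""


def extract_features(form: dict) -> UserFeatures:
    """Extract user feature data from one submitted registration form.

    The form keys are assumptions; the patent does not specify them.
    """
    raw_prefs = str(form.get("preferences", ""))
    return UserFeatures(
        name=str(form["username"]).strip(),
        gender=str(form.get("gender", "unknown")).strip().lower(),
        age=int(form["age"]),
        region=str(form.get("region", "")).strip(),
        # Comma-separated preferences become a clean list of tags.
        preferences=[p.strip() for p in raw_prefs.split(",") if p.strip()],
        payment=str(form.get("payment_method", "")),
        social=str(form.get("social_account", "")),
    )
```

Normalizing text fields at extraction time (stripping whitespace, lower-casing gender) is an assumed design choice; it keeps later filtering and correlation steps from treating `F` and `f` as different values.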
Preferably, in step S2, the user feature data acquired in real time is preprocessed as follows:
obtaining the user feature data to be preprocessed;
converting the user feature data acquired in real time into a form the processor can receive, as required by machine learning user feature selection;
obtaining the converted user feature data and filtering it;
identifying the user feature data with a statistical method based on a data filtering model, filtering out data that is useless for machine learning user feature selection and retaining data that is useful for it;
obtaining the filtered user feature data and denoising it;
processing the user feature data with a data noise reduction method to remove noisy feature data, yielding user feature data free of noise;
obtaining the denoised user feature data and sorting it;
sorting the noise-free user feature data with an internal sorting method to obtain user feature data with distribution characteristics.
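The patent leaves the statistical filter, the noise reduction method and the internal sorting method unspecified. The sketch below fills them with common, plainly swapped-in choices, assuming the converted data is a numeric matrix with one column per feature: a variance filter, median/MAD clipping, and sorting by variance. `preprocess`, `min_var` and `k` are hypothetical names.

```python
import numpy as np


def preprocess(X, names, min_var=1e-8, k=3.5):
    """Filter, denoise and sort numeric user feature columns (assumed methods)."""
    X = np.asarray(X, dtype=float)
    # 1. Filtering: drop near-constant columns, which cannot help selection.
    keep = X.var(axis=0) > min_var
    X = X[:, keep]
    names = [n for n, m in zip(names, keep) if m]
    # 2. Noise reduction: clip each column to median +/- k * scaled MAD.
    med = np.median(X, axis=0)
    mad = 1.4826 * np.median(np.abs(X - med), axis=0)
    band = np.where(mad > 0, k * mad, np.inf)  # MAD of 0: leave column unclipped
    X = np.clip(X, med - band, med + band)
    # 3. Sorting: order columns by descending variance (stable for ties).
    order = np.argsort(-X.var(axis=0), kind="stable")
    return X[:, order], [names[i] for i in order]
```

Clipping rather than deleting outliers keeps every user row, so the matrix stays aligned with any labels used in later steps.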
Preferably, converting the user feature data acquired in real time into a form the processor can receive comprises:
setting a unit time length in the range 3 s to 5 s;
monitoring in real time the volume of user feature data acquired in each unit time;
determining, according to the requirements of machine learning user feature selection, the types of form in which the processor can receive the user feature data, and taking each such type as a target data form;
extracting, for each target data form, the maximum amount of user feature data that can be converted into that form within one unit time;
obtaining the fit parameter between the user feature data and the processor from the maximum data conversion amount of each target data form; the fit parameter is obtained through the following formula:
where W is the fit parameter; C is the average volume of user feature data generated per unit time; C_zi is the maximum data conversion amount of the i-th target data form per unit time; n is the number of target data form types; λ is an adjustment coefficient; W_0 is the fit parameter reference constant; m is the number of unit times elapsed so far; and C_j is the volume of user feature data generated in the j-th unit time;
determining the data conversion time interval for each target data form according to the fit parameter.
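The formula for W is not reproduced in this text, so it is not implemented here. The quantities it consumes can still be sketched: the Python below buckets incoming data into unit-time windows of 3 s to 5 s and reports C_j, m and C. The class `UnitTimeMonitor` and its methods are hypothetical, and timestamps are assumed to arrive in non-decreasing order.

```python
from typing import List, Optional


class UnitTimeMonitor:
    """Track the volume C_j of user feature data produced in each unit time."""

    def __init__(self, unit_seconds: float = 4.0) -> None:
        if not 3.0 <= unit_seconds <= 5.0:
            raise ValueError("unit time length must be between 3 s and 5 s")
        self.unit_seconds = unit_seconds
        self.volumes: List[float] = []  # completed windows: C_1 ... C_m
        self._window_start: Optional[float] = None
        self._current = 0.0

    def record(self, timestamp: float, amount: float = 1.0) -> None:
        """Add `amount` of data arriving at `timestamp` (seconds)."""
        if self._window_start is None:
            self._window_start = timestamp
        # Close every unit window that ended before this arrival.
        while timestamp >= self._window_start + self.unit_seconds:
            self.volumes.append(self._current)
            self._current = 0.0
            self._window_start += self.unit_seconds
        self._current += amount

    @property
    def m(self) -> int:
        """Number of completed unit times."""
        return len(self.volumes)

    @property
    def mean_volume(self) -> float:
        """C: average data volume per completed unit time."""
        return sum(self.volumes) / len(self.volumes) if self.volumes else 0.0
```

Only completed windows count toward m and C, so a partially filled current window never drags the average down.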
Preferably, determining the data conversion time interval for each target data form according to the fit parameter comprises:
extracting the fit parameter between the user feature data and the processor;
applying the fit parameter in a configuration model to determine the data conversion time interval for each target data form, the configuration model being:
where T_i is the data conversion time interval of the i-th target data form; T_d is the length of the unit time; C_zi is the maximum data conversion amount of the i-th target data form per unit time; C is the average volume of user feature data generated per unit time; W is the fit parameter; and W_0 is the fit parameter reference constant.
Preferably, in step S3, the user feature data with distribution characteristics is analyzed and evaluated as follows:
obtaining the user feature data with distribution characteristics;
training a user feature model on that data based on a linear model and determining a linear-model-based user feature selection model;
analyzing and evaluating the user feature data with distribution characteristics using the user feature selection model and producing a corresponding analysis and evaluation report;
if the user feature data with distribution characteristics contains redundant user features, the report states that the data requires machine-learning-based user feature selection;
if it contains no redundant user features, the report states that no machine-learning-based user feature selection is required.
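The patent does not say how the linear model detects redundancy. One linear-model-based criterion that fits the description, swapped in here as an assumption, is the variance inflation factor (VIF): each feature is regressed on all the others by ordinary least squares, and VIF_j = 1 / (1 - R_j^2). The sketch flags features whose VIF exceeds a hypothetical threshold of 10 and turns the result into a simple report; it assumes constant columns were already removed during preprocessing.

```python
import numpy as np


def vif(X):
    """Variance inflation factor of each column via least-squares regression."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        # Design matrix: intercept plus every other feature.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        out.append(np.inf if r2 >= 1.0 - 1e-12 else 1.0 / (1.0 - r2))
    return np.array(out)


def evaluation_report(X, names, threshold=10.0):
    """Hypothetical report: does the data need feature selection, and why?"""
    v = vif(X)
    redundant = [n for n, val in zip(names, v) if val > threshold]
    return {
        "needs_selection": bool(redundant),
        "redundant": redundant,
        "vif": dict(zip(names, v.round(2))),
    }
```

VIF flags every member of a collinear group, not just one, so it answers the yes/no question of S3 while leaving the choice of which feature to drop to S4.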
Preferably, in step S4, selection control is performed on the user feature data with distribution characteristics as follows:
obtaining the analysis and evaluation report from the user feature selection model;
if the report states that the user feature data requires machine-learning-based user feature selection, determining a user feature selection strategy from the report using data mining and association analysis, and intelligently selecting user feature data according to that strategy.
Preferably, in step S4, the user feature selection strategy is determined from the analysis and evaluation report as follows:
obtaining the analysis and evaluation report from the user feature selection model;
analyzing the user feature data with distribution characteristics in depth using data mining and correlation analysis to identify the redundant user features it contains;
obtaining the redundant user features, sorting them with an internal sorting method, and determining a redundant user feature set according to the requirements of machine learning user feature selection;
performing correlation analysis on the redundant user feature set, determining a user feature selection strategy from it, and intelligently selecting user feature data according to that strategy.
Preferably, in step S4, user feature data is intelligently selected according to the user feature selection strategy as follows:
obtaining the user feature selection strategy derived from the redundant user feature set and using it to select among the redundant user features in the user feature data;
extracting the user feature data item by item, performing correlation analysis on it to identify highly correlated redundant user features, and applying a data redundancy elimination method to the data containing them so that the redundant user features are removed.
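A minimal sketch of correlation-based redundancy removal, assuming numeric features and using absolute Pearson correlation as the correlation analysis (the patent does not fix the measure). Features are visited in their current order, so whatever ranking the internal sorting produced decides which member of a highly correlated pair survives. The function name and the 0.9 threshold are hypothetical.

```python
import numpy as np


def remove_redundant(X, names, threshold=0.9):
    """Greedily drop features highly correlated with an already-kept feature."""
    X = np.asarray(X, dtype=float)
    corr = np.abs(np.corrcoef(X, rowvar=False))  # feature-by-feature |r|
    kept, dropped = [], []
    for j in range(X.shape[1]):
        # Earlier (higher-ranked) features win; later duplicates are dropped.
        if any(corr[j, k] > threshold for k in kept):
            dropped.append(names[j])
        else:
            kept.append(j)
    return X[:, kept], [names[k] for k in kept], dropped
```

The greedy pass is order-dependent by design: it lets a prior ranking, rather than chance, decide which of two near-duplicate features is kept.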
Preferably, in step S4, after the redundant user features are removed, the following operations are also performed:
obtaining the user feature data from which redundant user features have been removed;
analyzing that data comprehensively to check whether it contains low-value user features;
if it does, removing the low-value user features and retaining the high-value ones;
if it does not, retaining the user feature data in full.
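The patent does not define "low-value". The sketch below assumes a supervised setting with a target vector y and treats a feature as low-value when its absolute Pearson correlation with y falls below a hypothetical threshold; when no feature falls below it, every feature is kept, as the last step requires. `remove_low_value` and `min_relevance` are illustrative names only.

```python
import numpy as np


def remove_low_value(X, y, names, min_relevance=0.05):
    """Drop features whose |correlation| with target y is below min_relevance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # A zero denominator (constant column) gives relevance 0, i.e. low-value.
    relevance = np.abs(Xc.T @ yc) / np.where(denom == 0, np.inf, denom)
    keep = relevance >= min_relevance
    kept = [n for n, k in zip(names, keep) if k]
    removed = [n for n, k in zip(names, keep) if not k]
    return X[:, keep], kept, removed
```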
Compared with the prior art, the invention has the following beneficial effects:
1. The invention collects user registration information in real time through a user registration window, acquires user feature data from it in real time, converts that data into a form the processor can receive as required by machine learning user feature selection, and filters, denoises and sorts the converted data to obtain user feature data with distribution characteristics.
2. The invention trains a linear-model-based user feature selection model on the user feature data with distribution characteristics, uses it to analyze and evaluate the data and produce an analysis and evaluation report, derives a user feature selection strategy from the report using data mining and association analysis, and intelligently selects user feature data according to that strategy, thereby improving screening precision and the quality of user feature selection based on machine learning.
Drawings
FIG. 1 is a flow chart of the feature selection method based on screening machine learning users according to the invention;
FIG. 2 is an algorithm flow chart of analyzing, evaluating, and performing selection control on user feature data according to the invention;
FIG. 3 is an algorithm flow chart of processing the low-value user features contained in the user feature data according to the invention.
Detailed Description
The technical solutions in the embodiments of the invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art from these embodiments without inventive effort fall within the scope of protection of the invention.
To solve the problem that user feature selection based on machine learning has low screening precision and therefore gives poor results, and referring to FIGS. 1-3, this embodiment provides the following technical solution:
A feature selection method based on screening machine learning users comprises the following steps:
S1: collecting user registration information in real time through a user registration window and acquiring user feature data from that information in real time, the user feature data including, but not limited to, user name, gender, age, region, preference, payment and social features;
S2: preprocessing the user feature data acquired in real time: converting it into a form the processor can receive, as required by machine learning user feature selection, and then filtering, denoising and sorting the converted data to obtain user feature data with distribution characteristics;
S3: analyzing and evaluating the user feature data with distribution characteristics: training a user feature model on that data based on a linear model to determine a linear-model-based user feature selection model, then analyzing and evaluating the data with this model to produce a corresponding analysis and evaluation report;
S4: performing selection control on the user feature data with distribution characteristics: obtaining the analysis and evaluation report from the user feature selection model, determining a user feature selection strategy from the report using data mining and correlation analysis, and intelligently selecting user feature data according to that strategy.
In step S1, the user registration information is collected in real time through the user registration window as follows:
obtaining the user registration window;
the user enters, following the specific prompts in the registration window, the corresponding registration information, including but not limited to user name, gender, age, region, preference, payment and social features;
after the user registration information has been entered, acquiring it in real time and extracting user feature data from it.
In step S2, the user feature data acquired in real time is preprocessed as follows:
obtaining the user feature data to be preprocessed;
converting the user feature data acquired in real time into a form the processor can receive, as required by machine learning user feature selection;
obtaining the converted user feature data and filtering it;
identifying the user feature data with a statistical method based on a data filtering model, filtering out data that is useless for machine learning user feature selection and retaining data that is useful for it;
obtaining the filtered user feature data and denoising it;
processing the user feature data with a data noise reduction method to remove noisy feature data, yielding user feature data free of noise;
obtaining the denoised user feature data and sorting it;
sorting the noise-free user feature data with an internal sorting method to obtain user feature data with distribution characteristics.
Specifically, converting the user feature data acquired in real time into a form the processor can receive comprises:
setting a unit time length in the range 3 s to 5 s;
monitoring in real time the volume of user feature data acquired in each unit time;
determining, according to the requirements of machine learning user feature selection, the types of form in which the processor can receive the user feature data, and taking each such type as a target data form;
extracting, for each target data form, the maximum amount of user feature data that can be converted into that form within one unit time;
obtaining the fit parameter between the user feature data and the processor from the maximum data conversion amount of each target data form; the fit parameter is obtained through the following formula:
where W is the fit parameter; C is the average volume of user feature data generated per unit time; C_zi is the maximum data conversion amount of the i-th target data form per unit time; n is the number of target data form types; λ is an adjustment coefficient; W_0 is the fit parameter reference constant; m is the number of unit times elapsed so far; and C_j is the volume of user feature data generated in the j-th unit time;
determining the data conversion time interval for each target data form according to the fit parameter.
The technical effects of this scheme are as follows. The scheme monitors and processes user feature data in real time and converts it into a form the processor can receive, as required by machine learning user feature selection. Setting the unit time length controls the interval at which data is processed. Monitoring the volume of user feature data acquired per unit time allows that data to be processed accordingly, so the system can respond in real time to changes in user features and requirements. Determining the forms in which the processor can receive the user feature data allows the data to be converted to suit different processor requirements, so that the processor can receive and process it. Extracting the maximum amount of data that can be converted into each target data form per unit time, and deriving the fit parameter between the user feature data and the processor from it, improves the efficiency and performance of data conversion. Determining the data conversion time interval for each target data form from the fit parameter schedules data processing sensibly, meets the processor's requirements and keeps processing real-time.
In addition, the fit parameter W can be adjusted to the actual data volume so that the average volume C of user feature data generated per unit time stays matched to the maximum data conversion amount C_zi of each target data form. Adjusting the fit parameter speeds up data conversion when the data volume is large and slows it down when the volume is small, so that data processing requirements are met under different conditions.
Introducing the adjustment coefficient λ allows the fit parameter to be tuned flexibly to further optimize data conversion. λ can be set according to specific requirements: increasing λ increases the data conversion speed and decreasing it reduces the speed, so that better data processing performance is achieved.
Combining the fit parameter reference constant W_0 with the number m of unit times elapsed so far takes into account the influence of historical information on the fit parameter. W_0 provides an initial reference point, and m reflects the history of data processing. Considering both lets the fit parameter adapt to dynamic changes in data processing.
In summary, setting the fit parameter from the above factors makes data conversion adaptive, flexible and well tuned, better meets data processing requirements, and accounts for the influence of historical information on data conversion.
Specifically, determining the data conversion time interval for each target data form according to the fit parameter comprises:
extracting the fit parameter between the user feature data and the processor;
applying the fit parameter in a configuration model to determine the data conversion time interval for each target data form, the configuration model being:
where T_i is the data conversion time interval of the i-th target data form; T_d is the length of the unit time; C_zi is the maximum data conversion amount of the i-th target data form per unit time; C is the average volume of user feature data generated per unit time; W is the fit parameter; and W_0 is the fit parameter reference constant.
The technical scheme has the effects that: extracting fit parameters between the user characteristic data and the processor may help optimize the process of data transformation. According to the setting of the fit parameters, the speed and the efficiency of data conversion can be adjusted so as to meet the receiving and processing requirements of the processor on the data to the greatest extent. By optimizing the data conversion process, the efficiency and performance of data processing can be improved.
The fit parameter is set through the configuration model, and the data conversion time interval for each target data form is determined from it. This schedules data conversion at reasonable intervals so that data is transmitted and processed in time; an appropriate interval balances real-time processing against the use of system resources.
Extracting and configuring the fit parameter also improves the stability and compatibility of the system. The fit parameter can be adjusted to the processor's performance and the data-processing requirements so that data conversion never exceeds the processor's load capacity, which helps avoid system crashes, data loss and processing delays and improves system reliability. The specific technical effects still need to be evaluated and verified against the details of an actual implementation and application.
Meanwhile, the speed of data conversion can be controlled by setting the data conversion time interval for each target data form. T_i can be adjusted to actual requirements and the processor's processing power so that the amount of data converted per unit time does not exceed the maximum data conversion amount C_zi. This avoids data accumulation and processing delay and keeps data conversion timely and stable.
In this way, the data conversion time for each target data form is set more rationally, and data processing becomes more real-time and efficient. By combining the average amount C of user characteristic data generated per unit time and the maximum data conversion amount C_zi of each target data form with the fit parameter W and the reference constant value W_0, the data conversion delay can be minimized while preserving conversion accuracy, improving the real-time performance and efficiency of data processing.
Adjusting the data conversion time for each target data form also balances the system load. From the average amount C of user characteristic data generated per unit time and the maximum data conversion amount C_zi of each target data form, T_i can be set so that the data conversion process does not exceed the processor's load capacity. This helps avoid system crashes and processor overload, improving the stability and reliability of the system.
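The configuration model for T_i is not reproduced in this text, so the following sketch does not compute T_i. It only illustrates, assuming a simple per-unit-time queue for one target data form, why conversion must keep pace with the incoming data: at most C_zi can be converted in each unit time, and any excess carries over as backlog. The function names are hypothetical.

```python
def simulate_backlog(arrivals: list[float], c_zi: float) -> list[float]:
    """Hypothetical illustration of data accumulation for one target data form.

    arrivals[j] is the data amount generated in unit time j (C_j); at most
    c_zi of the pending data can be converted per unit time. Returns the
    backlog left over at the end of each unit time.
    """
    if c_zi <= 0:
        raise ValueError("maximum data conversion amount must be positive")
    backlog = 0.0
    history = []
    for c_j in arrivals:
        pending = backlog + c_j
        converted = min(pending, c_zi)  # never exceed the per-unit capacity
        backlog = pending - converted
        history.append(backlog)
    return history


def utilisation(c_avg: float, c_zi: float) -> float:
    """Ratio C / C_zi; a value above 1 means data accumulates over time."""
    return c_avg / c_zi
```

With 100 units arriving per unit time and C_zi = 80, the backlog grows as 20, 40, 60 (utilisation 1.25), which is the data accumulation described above; with C_zi = 120 the backlog stays at zero.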
S3, analyzing and evaluating the user characteristic data with distribution characteristics, performing the following operations:
Acquiring the user characteristic data with distribution characteristics;
Training a linear-model-based user characteristic model on the user characteristic data, and determining a linear-model-based user feature selection model;
Analyzing and evaluating the user characteristic data with distribution characteristics based on the user feature selection model, and determining a corresponding analysis and evaluation report;
Where the user characteristic data with distribution characteristics contains redundant user features, the analysis and evaluation report states that the user characteristic data requires machine-learning-based user feature selection processing;
Where the user characteristic data with distribution characteristics contains no redundant user features, the analysis and evaluation report states that no machine-learning-based user feature selection processing is required.
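The text does not say how the linear-model-based user feature selection model decides that redundant user features are present. One common linear-model test that could fill this role is the variance inflation factor (VIF): each feature is regressed by ordinary least squares on all the others, and VIF = 1 / (1 - R^2). The sketch below swaps in that technique as an assumption; the threshold of 10 is a conventional rule of thumb rather than a value from this patent, and the function names are hypothetical.

```python
def _solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are exactly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(y: list[float], xs: list[list[float]]) -> float:
    """R^2 of an ordinary least-squares fit y ~ 1 + xs (xs is a list of columns)."""
    rows = [[1.0] + [float(col[k]) for col in xs] for k in range(len(y))]
    p = len(rows[0])
    mean_y = sum(y) / len(y)
    ss_tot = sum((yk - mean_y) ** 2 for yk in y)
    if ss_tot == 0:
        return 1.0  # a constant feature is fully explained by the intercept
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yk for r, yk in zip(rows, y)) for i in range(p)]
    beta = _solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    ss_res = sum((yk - fk) ** 2 for yk, fk in zip(y, fitted))
    return 1.0 - ss_res / ss_tot


def vif_report(columns: dict[str, list[float]], threshold: float = 10.0) -> dict:
    """Flag features whose variance inflation factor exceeds threshold."""
    vifs = {}
    for name, y in columns.items():
        others = [col for other, col in columns.items() if other != name]
        try:
            r2 = r_squared(y, others)
        except ValueError:
            r2 = 1.0  # exactly collinear with the remaining features
        vifs[name] = float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2)
    redundant = sorted(n for n, v in vifs.items() if v > threshold)
    return {"vif": vifs, "redundant": redundant, "selection_required": bool(redundant)}
```

A report whose selection_required flag is True corresponds to the case in which machine-learning-based user feature selection processing is needed.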
S4, selecting and controlling the user characteristic data with distribution characteristics, performing the following operations:
Acquiring the analysis and evaluation report from the user feature selection model;
Where the analysis and evaluation report states that the user characteristic data requires machine-learning-based user feature selection processing, determining a user feature selection strategy from the report using data mining techniques and an association analysis method, and intelligently selecting the user characteristic data according to that strategy.
In S4, determining the user feature selection strategy based on the analysis and evaluation report includes the following operations:
Acquiring the analysis and evaluation report from the user feature selection model;
Performing in-depth analysis of the user characteristic data with distribution characteristics using data mining techniques and correlation analysis, and identifying the redundant user features it contains;
Obtaining the redundant user features, ranking them with an internal sorting method, and determining a redundant user feature set according to the machine learning user feature selection requirements;
Performing correlation analysis on the redundant user feature set, determining a user feature selection strategy from it, and intelligently selecting user characteristic data according to that strategy.
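The correlation analysis and the internal sorting of redundant user features are not specified further. A minimal sketch, assuming pairwise Pearson correlation and treating any feature pair with |r| at or above a hypothetical threshold of 0.9 as redundant, ranks the features involved by their strongest absolute correlation to form an ordered redundant user feature set. Function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant feature carries no correlation signal
    return sxy / sqrt(sxx * syy)


def ranked_redundant_set(
    columns: dict[str, list[float]], threshold: float = 0.9
) -> list[tuple[str, float]]:
    """Features involved in at least one pair with |r| >= threshold,
    sorted by their strongest absolute correlation (highest first, then by name)."""
    strongest: dict[str, float] = {}
    for a, b in combinations(columns, 2):
        r = abs(pearson(columns[a], columns[b]))
        if r >= threshold:
            strongest[a] = max(strongest.get(a, 0.0), r)
            strongest[b] = max(strongest.get(b, 0.0), r)
    return sorted(strongest.items(), key=lambda kv: (-kv[1], kv[0]))
```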
In S4, intelligently selecting the user characteristic data according to the user feature selection strategy includes the following operations:
Acquiring the user feature selection strategy derived from the redundant user feature set, and intelligently selecting the redundant user features in the user characteristic data based on that strategy;
Extracting the user characteristic data one by one, performing correlation analysis on it to identify highly correlated redundant user features, and applying a redundancy elimination method to the user characteristic data containing them, so that the redundant user features are removed.
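The method for eliminating data redundancy is not named. The sketch below assumes a simple greedy strategy: features are visited one by one in a given priority order, and a feature is removed when its absolute Pearson correlation with any already-kept feature reaches a hypothetical threshold of 0.9. The function names are hypothetical.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant feature carries no correlation signal
    return sxy / sqrt(sxx * syy)


def drop_correlated(
    columns: dict[str, list[float]], threshold: float = 0.9
) -> tuple[list[str], list[str]]:
    """Greedy redundancy elimination: keep a feature only if its |r| with
    every feature kept so far is below threshold; dict order is the priority."""
    kept: list[str] = []
    dropped: list[str] = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
        else:
            dropped.append(name)
    return kept, dropped
```

Which member of a correlated pair survives depends on the visiting order: with age listed before age_months, age is kept and age_months is removed; reversing the order would keep age_months instead.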
In S4, after the redundant user features are removed, the following operations are further performed:
Acquiring user characteristic data from which redundant user characteristics are removed;
Comprehensively analyzing the user characteristic data, and checking whether the user characteristic data contains low-value user characteristics;
Where the user characteristic data contains low-value user features, removing those low-value user features and retaining the high-value user features;
Where the user characteristic data contains no low-value user features, retaining the user characteristic data in full.
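What makes a user feature "low-value" is not defined in the text. One plausible criterion, assumed here for illustration, is a quasi-constant feature: if a single value covers at least a hypothetical 95% share of the records, the feature carries little information for selection. When no feature meets that criterion, every feature is retained, matching the last step above. The function name is hypothetical.

```python
from collections import Counter


def split_low_value(
    columns: dict[str, list], max_dominant_share: float = 0.95
) -> tuple[dict[str, list], list[str]]:
    """Treat a feature as low-value when one value covers at least
    max_dominant_share of its records (a quasi-constant feature)."""
    retained: dict[str, list] = {}
    removed: list[str] = []
    for name, values in columns.items():
        if not values:
            removed.append(name)  # an empty feature carries no information
            continue
        top_count = Counter(values).most_common(1)[0][1]
        if top_count / len(values) >= max_dominant_share:
            removed.append(name)
        else:
            retained[name] = values
    return retained, removed
```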
In summary, the feature selection method based on screening machine learning users of the invention collects user registration information in real time through a user registration window and acquires user characteristic data from it in real time; preprocesses that data by converting it, according to the machine learning user feature selection requirements, into a form the processor can receive, and then filtering, denoising and sorting the converted data to obtain user characteristic data with distribution characteristics; trains a linear-model-based user feature selection model on that data, uses the model to analyze and evaluate the data, and produces a corresponding analysis and evaluation report; and, using data mining techniques and an association analysis method, determines a user feature selection strategy from the report and intelligently selects the user characteristic data accordingly. This can improve screening precision and the effectiveness of user feature selection in machine learning.
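The preprocessing chain summarized above (filtering, noise reduction, sorting) leaves the data filtering model, the noise reduction method and the sort key unspecified. The sketch below fills each with a stand-in that is an assumption, not the patent's method: fields missing in more than half of the records are filtered out, records whose value has a modified z-score (based on the median absolute deviation) above 3.5 are treated as noise and dropped, and the result is sorted in memory by a chosen key. The record layout and function names are hypothetical.

```python
from statistics import median


def filter_sparse_fields(records: list[dict], max_missing: float = 0.5) -> list[dict]:
    """Drop fields that are missing (None or absent) in more than max_missing of records."""
    fields = {f for r in records for f in r}
    keep = {f for f in fields
            if sum(r.get(f) is None for r in records) / len(records) <= max_missing}
    return [{f: v for f, v in r.items() if f in keep} for r in records]


def drop_outliers(records: list[dict], field: str, cutoff: float = 3.5) -> list[dict]:
    """Remove records whose value in `field` has a modified z-score above cutoff."""
    values = [r[field] for r in records if r.get(field) is not None]
    if not values:
        return records
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return records  # no spread to judge outliers against
    return [r for r in records
            if r.get(field) is None or 0.6745 * abs(r[field] - med) / mad <= cutoff]


def preprocess(records: list[dict], numeric_fields: list[str], sort_key: str) -> list[dict]:
    """Filtering, then noise reduction, then in-memory sorting."""
    out = filter_sparse_fields(records)
    for field in numeric_fields:
        if any(field in r for r in out):
            out = drop_outliers(out, field)
    return sorted(out, key=lambda r: r[sort_key])
```

For instance, with ages 25, 31, 28, 240 and 35, the median is 31 and the median absolute deviation is 4, so the record with age 240 (modified z-score about 35) is dropped as noise.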
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (2)

1. A feature selection method based on screening machine learning users, comprising the steps of:
S1: collecting user registration information in real time according to a user registration window, and acquiring user characteristic data in real time based on the user registration information, wherein the user characteristic data comprises, but is not limited to, user name characteristics, user gender characteristics, user age characteristics, user region characteristics, user preference characteristics, user payment characteristics and user social characteristics;
wherein, the user registration information is collected in real time according to the user registration window, and the following operations are executed:
acquiring a user registration window;
Following the specific prompts of the user registration window, the user enters in that window the corresponding user registration information, including but not limited to user name characteristics, user gender characteristics, user age characteristics, user region characteristics, user preference characteristics, user payment characteristics and user social characteristics;
After the user registration information is input, acquiring the user registration information in real time, extracting the acquired user registration information, and determining user characteristic data;
Preprocessing the user characteristic data acquired in real time, and executing the following operations:
acquiring user characteristic data, and preprocessing the user characteristic data;
Based on the machine learning user feature selection requirement, converting the user feature data acquired in real time into a form which can be received by a processor;
obtaining converted user characteristic data, and filtering the user characteristic data;
Based on the data filtering model, identifying user characteristic data by adopting a statistical method, filtering out user characteristic data which is useless for machine learning user characteristic selection, and retaining user characteristic data which is useful for machine learning user characteristic selection;
Acquiring filtered user characteristic data, and carrying out noise reduction treatment on the user characteristic data;
Processing the user characteristic data based on a data noise reduction method, removing noise characteristic data in the user characteristic data, and determining user characteristic data without the noise characteristic data;
acquiring user characteristic data after noise reduction, and sequencing the user characteristic data;
Based on an internal sorting method, effectively sorting the user characteristic data from which the noise characteristic data has been removed, and determining user characteristic data with distribution characteristics;
S2: preprocessing the user characteristic data acquired in real time, converting the user characteristic data acquired in real time into a form which can be received by a processor based on the selection requirement of the machine learning user characteristic, and performing filtering, noise reduction and sequencing on the converted user characteristic data to determine the user characteristic data with distributed characteristics;
the method for converting the user characteristic data acquired in real time into a form which can be received by a processor comprises the following steps:
setting a unit time length, wherein the unit time length ranges from 3 s to 5 s;
monitoring the data volume of the user characteristic data acquired in unit time in real time;
Acquiring, based on the machine learning user feature selection requirements, the types of data forms that the processor can receive for the user characteristic data, and taking each receivable form type as a target data form;
extracting the maximum data conversion amount capable of converting the user characteristic data into each target data form in unit time;
acquiring fit parameters between the user characteristic data and the processor according to the maximum data conversion quantity of each target data form; the fit parameters are obtained through the following formula:
Where W denotes the fit parameter; C denotes the average amount of user characteristic data generated per unit time; C_zi denotes the maximum data conversion amount for the i-th target data form per unit time; n denotes the number of target data form types; λ denotes an adjustment coefficient; W_0 denotes the fit parameter reference constant value; m denotes the number of unit times elapsed so far; and C_j denotes the amount of user characteristic data generated in the j-th unit time;
Determining a data conversion time interval for each target data form according to the fit parameter, which comprises the following steps:
extracting fit parameters between the user characteristic data and the processor;
And setting the fit parameters through a configuration model to determine a data conversion time interval corresponding to each target data form, wherein the configuration model is as follows:
Where T_i denotes the data conversion time interval for the i-th target data form; T_d denotes the length of the unit time; C_zi denotes the maximum data conversion amount for the i-th target data form per unit time; C denotes the average amount of user characteristic data generated per unit time; W denotes the fit parameter; and W_0 denotes the fit parameter reference constant value;
S3: analyzing and evaluating the user characteristic data with distribution characteristics: acquiring that data, training a linear-model-based user characteristic model on it, determining a linear-model-based user feature selection model, analyzing and evaluating the data with that model, and determining a corresponding analysis and evaluation report;
the user characteristic data with the distributed characteristics are analyzed and evaluated, and the following operations are executed:
acquiring user characteristic data with distribution characteristics;
Training a user characteristic model based on the linear model for the user characteristic data, and determining a user characteristic selection model based on the linear model;
Based on the user feature selection model, analyzing and evaluating the user feature data with the distributed features, and determining a corresponding analysis and evaluation report;
Where the user characteristic data with distribution characteristics contains redundant user features, the analysis and evaluation report states that the user characteristic data requires machine-learning-based user feature selection processing;
Where the user characteristic data with distribution characteristics contains no redundant user features, the analysis and evaluation report states that the user characteristic data does not require machine-learning-based user feature selection processing;
S4: selecting and controlling the user characteristic data with distribution characteristics: acquiring the analysis and evaluation report from the user feature selection model, determining a user feature selection strategy from the report using data mining techniques and an association analysis method, and intelligently selecting the user characteristic data according to that strategy;
The user characteristic data with the distribution characteristics are selected and controlled, and the following operations are executed:
Acquiring an analysis evaluation report based on a user feature selection model;
Where the analysis and evaluation report states that the user characteristic data requires machine-learning-based user feature selection processing, determining a user feature selection strategy from the report using data mining techniques and an association analysis method, and intelligently selecting the user characteristic data according to that strategy;
Determining a user feature selection policy based on the analysis evaluation report, performing the following operations:
Acquiring an analysis evaluation report based on a user feature selection model;
Based on a data mining technology and a correlation analysis method, carrying out deep analysis on user characteristic data with distributed characteristics, and determining redundant user characteristics in the user characteristic data;
Obtaining the redundant user features, ranking them with an internal sorting method, and determining a redundant user feature set according to the machine learning user feature selection requirements;
Performing correlation analysis on the redundant user feature sets, determining a user feature selection strategy based on the redundant user feature sets, and performing intelligent selection on user feature data according to the user feature selection strategy;
Intelligent selection is carried out on the user characteristic data according to the user characteristic selection strategy, and the following operations are executed:
acquiring a user feature selection strategy based on the redundant user feature set, and intelligently selecting redundant user features in the user feature data based on the user feature selection strategy;
Extracting the user characteristic data one by one, performing correlation analysis on it to identify highly correlated redundant user features, and applying a redundancy elimination method to the user characteristic data containing them, so that the redundant user features are removed.
2. The feature selection method based on screening machine learning users according to claim 1, wherein in step S4, after the redundant user features are removed, the following operations are further performed:
Acquiring user characteristic data from which redundant user characteristics are removed;
Comprehensively analyzing the user characteristic data, and checking whether the user characteristic data contains low-value user characteristics;
Where the user characteristic data contains low-value user features, removing those low-value user features and retaining the high-value user features;
Where the user characteristic data contains no low-value user features, retaining the user characteristic data in full.
CN202310596434.1A 2023-05-25 2023-05-25 Feature selection method based on screening machine learning user Active CN116628465B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310596434.1A CN116628465B (en) 2023-05-25 2023-05-25 Feature selection method based on screening machine learning user

Publications (2)

Publication Number Publication Date
CN116628465A CN116628465A (en) 2023-08-22
CN116628465B true CN116628465B (en) 2024-07-12

Family

ID=87620823

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310596434.1A Active CN116628465B (en) 2023-05-25 2023-05-25 Feature selection method based on screening machine learning user

Country Status (1)

Country Link
CN (1) CN116628465B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110276369A (en) * 2019-04-24 2019-09-24 武汉众邦银行股份有限公司 Feature selection approach, device, equipment and storage medium based on machine learning
CN116105217A (en) * 2023-01-09 2023-05-12 宁夏韧恒科技有限公司 Low-energy-consumption building system for combined application of photovoltaic energy and air heat source pump

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108595267A (en) * 2018-04-18 2018-09-28 中国科学院重庆绿色智能技术研究院 A kind of resource regulating method and system based on deeply study
US11615347B2 (en) * 2019-12-31 2023-03-28 Paypal, Inc. Optimizing data processing and feature selection for model training
CN111797313A (en) * 2020-06-23 2020-10-20 深圳壹账通智能科技有限公司 Self-learning recommendation method and device, computer equipment and storage medium
WO2022109812A1 (en) * 2020-11-24 2022-06-02 曹庆恒 Operation teaching system and usage method thereof, operation device, and computer readable storage medium

Also Published As

Publication number Publication date
CN116628465A (en) 2023-08-22

Similar Documents

Publication Publication Date Title
CN110990159B (en) Historical data analysis-based container cloud platform resource quota prediction method
CN108874959B (en) User dynamic interest model building method based on big data technology
US7110913B2 (en) Apparatus and method for managing the performance of an electronic device
EP2698712B1 (en) Computer program, method, and information processing apparatus for analyzing performance of computer system
CN103873528A (en) Method and equipment for distributing system resources for user
CN114048436A (en) Construction method and construction device for forecasting enterprise financial data model
CN110046889A (en) A kind of detection method, device and the server of abnormal behaviour main body
CN110164539A (en) The analysis method of the scheduling of Medical Devices
CN109815855A (en) A kind of electronic equipment automatic test approach and system based on machine learning
CN116628465B (en) Feature selection method based on screening machine learning user
CN115883392B (en) Data perception method and device of computing power network, electronic equipment and storage medium
CN115098238B (en) Application program task scheduling method and device
CN116187803A (en) Enterprise innovation capability evaluation system based on big data
CN112396313B (en) Method for optimizing telephone sales performance by using smart watch
CN115827232A (en) Method, device, system and equipment for determining configuration for service model
CN113033938B (en) Method, device, terminal equipment and storage medium for determining resource allocation strategy
CN114529225A (en) Intelligent secondary pressurized water supply method and device and readable storage medium
CN112631882A (en) Capacity estimation method combined with online service index characteristics
CN112800140A (en) High-reliability data acquisition method based on block chain prediction machine
CN102567024A (en) Script executing system and method
CN117235519B (en) Energy data processing method, device and storage medium
CN112633622B (en) Smart power grid operation index screening method
CN116957306B (en) User side response potential evaluation method and system based on resource collaborative interaction
CN118012574B (en) Virtual machine service condition grading method under private cloud platform
CN118249339A (en) Method and device for predicting power demand information, storage medium and processor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant