CN113609121A - Target data processing method, device, equipment and medium based on artificial intelligence - Google Patents

Publication number
CN113609121A
Authority
CN
China
Prior art keywords
index data
data
index
current
screening
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110942022.XA
Other languages
Chinese (zh)
Inventor
亓宁
崔雪宁
张又允
李燕婷
刘剑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Asset Management Co Ltd
Original Assignee
Ping An Asset Management Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Asset Management Co Ltd filed Critical Ping An Asset Management Co Ltd
Priority to CN202110942022.XA priority Critical patent/CN113609121A/en
Publication of CN113609121A publication Critical patent/CN113609121A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/215Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a target data processing method, device, equipment and medium based on artificial intelligence. The method comprises the following steps: acquiring target data, and calculating the target data to obtain corresponding index data; screening the index data by using at least two preset index screening methods to obtain current index data; performing a multicollinearity test on the current index data to judge whether secondary index data exists in the current index data; when the secondary index data exists in the current index data, deleting the secondary index data, and repeatedly screening the current index data from which the secondary index data has been deleted to update the current index data; when no secondary index data meeting the collinearity requirement exists in the current index data, taking the current index data as the remaining index data; and training a data processing model according to the remaining index data, and processing newly added data according to the trained data processing model. The method can ensure data accuracy.

Description

Target data processing method, device, equipment and medium based on artificial intelligence
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method, an apparatus, a device, and a medium for processing target data based on artificial intelligence.
Background
With the development of big data technology, artificial intelligence technology has emerged, which learns the rules in big data through artificial intelligence to obtain valuable information from the data.
In the conventional technology, for the processing of big data, data cleaning is usually performed by using a set rule, and then the data is input into a model, so that corresponding valuable information is obtained.
However, such rules for data cleansing are not constant, and the rules typically include only simple data padding or the like, resulting in errors in subsequently derived valuable information.
Disclosure of Invention
In view of the above, it is necessary to provide a method, an apparatus, a device and a medium for processing target data based on artificial intelligence, which can ensure the accuracy of data.
An artificial intelligence based target data processing method, comprising:
acquiring target data, and calculating the target data to obtain corresponding index data;
screening the index data by using at least two preset index screening methods to obtain current index data;
performing a multicollinearity test on the current index data to judge whether secondary index data exists in the current index data, wherein the secondary index data is index data whose collinearity meets the collinearity requirement;
when the secondary index data exists in the current index data, deleting the secondary index data, and repeatedly screening the current index data from which the secondary index data has been deleted to update the current index data;
when no secondary index data meeting the collinearity requirement exists in the current index data, taking the current index data as the remaining index data;
and training a data processing model according to the remaining index data, and processing newly added data according to the trained data processing model.
In one embodiment, the screening of the index data by using at least two preset index screening methods to obtain the current index data includes at least two of the following:
adding the index data one by one to a logistic-regression-based index data screening model, so that the number of index data in the model grows from small to large, and selecting the index data with the largest contribution degree each time;
inputting all the index data into a logistic-regression-based index data screening model, and successively deleting the index data with the smallest contribution degree;
performing logistic regression on each index data and the target in turn, sorting the index data according to their contribution degree to the target, introducing the index data one by one in the sorted order, and, after each introduction, checking all introduced index data to delete the index data with a low contribution degree, until all the index data have been processed.
In one embodiment, after the taking the current index data as the remaining index data, the method further includes:
matching the remaining index data against a pre-generated index data and business meaning table;
when remaining index data does not match the pre-generated index data and business meaning table, outputting the unmatched remaining index data, and receiving a processing instruction for the unmatched remaining index data;
when the processing instruction is deletion, deleting the unmatched remaining index data;
when the processing instruction is retention, establishing a correspondence between the retained remaining index data and the newly input business meaning, and storing the correspondence in the pre-generated index data and business meaning table;
and updating the remaining index data according to the matched remaining index data and the retained remaining index data.
In one embodiment, the training of the data processing model according to the remaining index data includes:
sorting the remaining index data according to the contribution degree;
selecting different numbers of the index data ranked highest by contribution degree to train models respectively, and testing the accuracy of each trained model;
and selecting the model with the maximum accuracy as a final model.
In one embodiment, the method further comprises:
receiving an index data distribution instruction;
and acquiring and outputting the remaining index data and the contribution degree corresponding to each remaining index data according to the index data distribution instruction.
An artificial intelligence based target data processing apparatus, the artificial intelligence based target data processing apparatus comprising:
the data acquisition module is used for acquiring target data and calculating the target data to obtain corresponding index data;
the screening module is used for screening the index data by using at least two preset index screening methods to obtain current index data;
the inspection module is used for performing a multicollinearity test on the currently screened index data to judge whether secondary index data exists in the current index data, wherein the secondary index data is index data whose collinearity meets the collinearity requirement;
the loop module is used for deleting the secondary index data when the secondary index data exists in the current index data, and repeatedly screening the current index data from which the secondary index data has been deleted to update the current index data;
the index data acquisition module is used for taking the current index data as the remaining index data when no secondary index data meeting the collinearity requirement exists in the current index data;
and the model processing module is used for training a data processing model according to the remaining index data and processing newly added data according to the trained data processing model.
In one embodiment, the screening module is configured to screen the index data according to at least two of the following to obtain screened index data:
adding the index data one by one to a logistic-regression-based index data screening model, so that the number of index data in the model grows from small to large, and selecting the index data with the largest contribution degree each time;
inputting all the index data into a logistic-regression-based index data screening model, and successively deleting the index data with the smallest contribution degree;
performing logistic regression on each index data and the target in turn, sorting the index data according to their contribution degree to the target, introducing the index data one by one in the sorted order, and, after each introduction, checking all introduced index data to delete the index data with a low contribution degree, until all the index data have been processed.
In one embodiment, the artificial intelligence based target data processing apparatus further comprises:
the matching module is used for matching the remaining index data against a pre-generated index data and business meaning table;
the post-processing module is used for outputting the unmatched remaining index data when remaining index data does not match the pre-generated index data and business meaning table, and receiving a processing instruction for the unmatched remaining index data; when the processing instruction is deletion, deleting the unmatched remaining index data; when the processing instruction is retention, establishing a correspondence between the retained remaining index data and the newly input business meaning, and storing the correspondence in the pre-generated index data and business meaning table; and updating the remaining index data according to the matched remaining index data and the retained remaining index data.
A computer device comprising a memory storing a computer program and a processor implementing the steps of the method when executing the computer program.
A computer storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method.
According to the above target data processing method, device, equipment and medium based on artificial intelligence, after the target data is acquired, index data is obtained by calculation and then screened. A collinearity calculation is performed on the screened index data so that collinear secondary index data is deleted, and the at least two preset index screening methods are then applied again to the index data from which the secondary index data has been deleted, so as to update it. Through this cyclic processing the index data is screened at least twice, which ensures the accuracy of the index data used for subsequent model training and, in turn, the accuracy of the model.
Drawings
FIG. 1 is a diagram illustrating an exemplary implementation of an artificial intelligence based target data processing method;
FIG. 2 is a schematic flow diagram of an artificial intelligence based target data processing method in one embodiment;
FIG. 3 is an architecture diagram of an artificial intelligence based target data processing method in another embodiment;
FIG. 4 is a block diagram of an artificial intelligence based target data processing apparatus in one embodiment;
FIG. 5 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The artificial intelligence based target data processing method can be applied to the application environment shown in fig. 1, in which the database 102 communicates with the server 104 over a network. The server 104 may obtain target data from the database 102 and calculate the target data to obtain corresponding index data; screen the index data by using at least two preset index screening methods to obtain current index data; perform a multicollinearity test on the current index data to judge whether secondary index data exists in the current index data, the secondary index data being index data whose collinearity meets the collinearity requirement; when the secondary index data exists in the current index data, delete the secondary index data and repeatedly screen the current index data from which the secondary index data has been deleted to update the current index data; when no secondary index data meeting the collinearity requirement exists in the current index data, take the current index data as the remaining index data; and train a data processing model according to the remaining index data and process newly added data according to the trained data processing model. The server 104 may be implemented as a stand-alone server or as a server cluster composed of a plurality of servers.
In one embodiment, as shown in fig. 2, there is provided an artificial intelligence based target data processing method, which is described by taking its application to the server in fig. 1 as an example, and which includes the following steps:
S202: acquiring target data, and calculating the target data to obtain corresponding index data.
Specifically, referring to fig. 3, the target data may be user-defined data obtained from a plurality of external resources, and the server may calculate the index data from the target data.
The index calculation logic corresponding to the index data can be selected in advance according to business requirements. The server first determines each industry and collects the corresponding indexes and index calculation logic for that industry; after acquiring the target data, the server classifies the target data by industry and then applies the corresponding index calculation logic to the classified target data in parallel to obtain the index data.
Optionally, the indexes may include financial indexes, industry data and the like, and the index calculation logic is accordingly the logic for calculating those indexes, for example calculating the average profitability of a product. The server collects the corresponding indexes and index calculation logic by industry and caches them. Optionally, the server may establish dependency relationships among the index calculation logics, which avoids calculating any index repeatedly. After acquiring the target data, the server classifies it by industry and generates index calculation tasks from the industry classification and the corresponding target data. The server then obtains the previously established dependency relationships among the calculation logics, determines the calculation order of the indexes in each index calculation task according to those dependency relationships, and executes the corresponding index calculation logic in that order on the target data to obtain the index data.
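By way of illustration only, the dependency-ordered calculation described above can be sketched with a topological sort. The index names, the raw data fields and the `compute_indexes` helper below are hypothetical and are not taken from the application; the sketch only shows how a dependency map lets each index be calculated once, after the indexes it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each index lists the indexes it is computed from.
dependencies = {
    "revenue": set(),
    "cost": set(),
    "profit": {"revenue", "cost"},
    "profit_margin": {"profit", "revenue"},
}

# Hypothetical calculation logic; each function may read already computed indexes.
logic = {
    "revenue": lambda raw, done: sum(raw["sales"]),
    "cost": lambda raw, done: sum(raw["expenses"]),
    "profit": lambda raw, done: done["revenue"] - done["cost"],
    "profit_margin": lambda raw, done: done["profit"] / done["revenue"],
}


def compute_indexes(raw, dependencies, logic):
    """Evaluate every index exactly once, in an order that respects the dependencies."""
    done = {}
    for name in TopologicalSorter(dependencies).static_order():
        done[name] = logic[name](raw, done)
    return done


result = compute_indexes({"sales": [60, 40], "expenses": [30, 20]}, dependencies, logic)
```

Because `profit` is stored after its first evaluation, `profit_margin` reuses it instead of recalculating revenue and cost.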
S204: screening the index data by using at least two preset index screening methods to obtain the current index data.
Specifically, the index screening methods are described below. The server screens the index data with each index screening method to obtain screened index data. Preferably, to reduce the processing load, the server may take the intersection of the index data obtained by the individual index screening methods and rank the index data outside the intersection by contribution degree, and then determine the screened index data from the intersection and the number of index data within each contribution degree range. For example, when the number of index data in the intersection meets the requirement, only the index data in the intersection is used as the screened index data; when the number of index data in the intersection is smaller than a preset number, enough index data is taken from the ranked index data, in order of contribution degree, to reach the preset number, so as to ensure that the range of index data remains valid.
Specifically, when the number of index data in the intersection meets the requirement, the index data is sufficient and the screening range is wide enough to ensure subsequent accuracy. If the number of index data in the intersection is smaller than a certain value, index data outside the intersection can be added to it in order of contribution degree, which likewise keeps the screening range wide enough and ensures subsequent accuracy.
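By way of illustration only, the intersection-and-top-up rule can be sketched as follows. The function name `merge_screening_results`, the example index names and the contribution values are assumptions made for this sketch.

```python
def merge_screening_results(results, contribution, min_count):
    """Combine the outputs of several screening methods.

    results: list of sets of index names, one set per screening method.
    contribution: dict mapping index name to its contribution degree.
    min_count: the preset number of indexes that should be kept.
    """
    common = set.intersection(*results)
    if len(common) >= min_count:
        # The intersection alone is large enough.
        return sorted(common)
    # Rank the indexes outside the intersection by contribution degree
    # and take as many as are needed to reach the preset number.
    outside = set.union(*results) - common
    ranked = sorted(outside, key=lambda name: contribution[name], reverse=True)
    return sorted(common) + ranked[: min_count - len(common)]


results = [{"a", "b", "c"}, {"b", "c", "d"}, {"b", "c", "e"}]
contribution = {"a": 0.1, "b": 0.9, "c": 0.8, "d": 0.4, "e": 0.2}
enough = merge_screening_results(results, contribution, min_count=2)
topped_up = merge_screening_results(results, contribution, min_count=3)
```

In this example the intersection is {b, c}; with a preset number of 3, index d is added because it has the highest contribution degree among the indexes outside the intersection.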
The contribution degree can be calculated by a method based on information gain and PCA.
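By way of illustration only, an information-gain score for a single discretized index against a class label can be sketched as below. The application does not specify how information gain and PCA are combined, so this sketch covers only the information-gain part, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


labels = [1, 1, 0, 0]
gain_informative = information_gain(["hi", "hi", "lo", "lo"], labels)
gain_uninformative = information_gain(["x", "y", "x", "y"], labels)
```

An index that separates the two classes perfectly receives the full label entropy as its gain, while an index that is independent of the label receives a gain of zero.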
Optionally, before the index data is screened, extreme-value processing may be applied to the index data to remove the interference of abnormal values, and the processed index data may then be normalized, converting data with units into dimensionless data to eliminate the influence of differing units on the model. In addition, before the index data is screened by the index screening methods, the server may also analyze the correlation between each index data and the target, so as to delete index data whose correlation does not meet the requirement.
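By way of illustration only, the preprocessing can be sketched as clipping followed by standardization. The application does not name a specific extreme-value rule; clipping at the median plus or minus three median absolute deviations and z-score standardization are assumptions chosen for this sketch.

```python
import statistics


def clip_extremes(values, n_mad=3.0):
    """Clip values lying more than n_mad median absolute deviations from the median."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    lower, upper = med - n_mad * mad, med + n_mad * mad
    return [min(max(v, lower), upper) for v in values]


def standardize(values):
    """Convert values to dimensionless z-scores (zero mean, unit variance)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


raw = [1, 2, 3, 4, 100]
clipped = clip_extremes(raw)
scaled = standardize(clipped)
```

Here the outlier 100 is pulled back to the upper bound 6 before standardization, so it no longer dominates the mean and variance.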
S206: performing a multicollinearity test on the current index data to judge whether secondary index data exists in the current index data, wherein the secondary index data is index data whose collinearity meets the collinearity requirement.
S208: when the secondary index data exists in the current index data, deleting the secondary index data, and repeatedly screening the current index data from which the secondary index data has been deleted to update the current index data; and when no secondary index data meeting the collinearity requirement exists in the current index data, taking the current index data as the remaining index data.
Specifically, the collinearity test calculates the correlation coefficient between every two index data and its corresponding P value, and uses the variance inflation factor (VIF) to diagnose multicollinearity among the factors, so that secondary factors with strong collinearity are deleted.
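By way of illustration only, the variance inflation factor of each index can be obtained as the corresponding diagonal entry of the inverse of the correlation matrix. The sketch below uses only the standard library, omits the P value calculation, and uses a threshold of 10, which is a common rule of thumb and not a value stated in the application. All function and variable names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: some indexes are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def variance_inflation_factors(columns):
    """Return {index name: VIF}, using VIF_j = j-th diagonal of the inverse correlation matrix."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}


columns = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2.1, 3.9, 6.2, 7.8, 10.1, 12.0],  # almost exactly 2 * x1
    "x3": [5, 1, 4, 2, 6, 3],
}
vif = variance_inflation_factors(columns)
collinear = {name for name, value in vif.items() if value > 10}  # assumed threshold
```

In this example x1 and x2 are flagged as collinear; which of the two counts as the secondary index to delete would be decided by a further rule, for example the lower contribution degree.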
To ensure the accuracy of the remaining index data, the server returns to step S204 to re-screen the index data. When the index data meets the requirement, for example when no index data meeting the collinearity requirement exists or the number of cycles reaches a preset number, the server obtains the remaining index data.
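By way of illustration only, the screen, test, delete and re-screen cycle can be sketched as a bounded loop. The `screen` and `find_secondary` callables stand in for the screening methods and the multicollinearity test; both, together with the toy data, are hypothetical.

```python
def select_indexes(indexes, screen, find_secondary, max_rounds=10):
    """Screen, test for collinearity, delete secondary indexes, and repeat."""
    current = screen(indexes)
    for _ in range(max_rounds):  # stop after a preset number of cycles
        secondary = find_secondary(current)
        if not secondary:  # no index meets the collinearity requirement
            break
        current = screen([name for name in current if name not in secondary])
    return current


# Toy stand-ins: the screen drops indexes named "noise*", and the collinearity
# test reports "b_copy" as secondary whenever "b" is also present.
def toy_screen(names):
    return [n for n in names if not n.startswith("noise")]


def toy_find_secondary(names):
    return {"b_copy"} if "b" in names and "b_copy" in names else set()


remaining = select_indexes(["a", "b", "b_copy", "noise1"], toy_screen, toy_find_secondary)
```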
In one embodiment, when the secondary index data exists in the current index data, the secondary index data is deleted and the current index data from which the secondary index data has been deleted is repeatedly screened to update the current index data. The repeated screening may consist of again screening the index data with the at least two preset index screening methods to obtain screened index data, until the remaining index data meets a preset requirement.
Specifically, the pre-generated index data and business meaning table is a table, preset by the user, of the correspondence between each index data and a specific business meaning. The server can match the remaining index data against the standard index data in the table. If the matching succeeds, the index data has a specific business meaning and can therefore be retained. If the matching fails, the index data can be output for the user to process. For example, if the user determines that the index has no specific business meaning, a deletion instruction is input to delete the unmatched index data; if the user determines that the index has a specific business meaning, the user inputs that business meaning, so that the server can establish a correspondence between the input business meaning and the corresponding unmatched index data and store it in the pre-generated index data and business meaning table to facilitate subsequent matching. The remaining index data is then updated according to the matched remaining index data and the retained remaining index data.
In addition, the server can also receive an input instruction to add index data, so that index data that was not automatically selected by the server but is nevertheless important can be introduced.
Finally, the server repeats the step of screening the index data until the remaining index data meets the preset requirement, for example until a stop instruction input by the user is received.
In the above embodiment, after the index data is automatically acquired by the server, the index data is also analyzed manually to ensure that each index data has a specific business meaning, thereby ensuring the accuracy of the finally selected index data.
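By way of illustration only, the matching against the business meaning table can be sketched with a dictionary lookup. The `ask_user` callback, which returns the user's processing instruction and any newly input business meaning, and the example index names are hypothetical.

```python
def review_remaining(remaining, meaning_table, ask_user):
    """Keep indexes that have a business meaning; ask the user about the rest.

    meaning_table: dict mapping index name to business meaning (updated in place).
    ask_user: callable returning ("keep", meaning) or ("delete", None) for an index.
    """
    kept = []
    for name in remaining:
        if name in meaning_table:  # matched: the index has a business meaning
            kept.append(name)
            continue
        action, meaning = ask_user(name)  # unmatched: output for manual review
        if action == "keep":
            meaning_table[name] = meaning  # store the new correspondence
            kept.append(name)
        # on "delete" the unmatched index is simply dropped
    return kept


table = {"roe": "return on equity"}


def toy_decisions(name):
    return ("keep", "debt ratio") if name == "debt_ratio" else ("delete", None)


kept = review_remaining(["roe", "debt_ratio", "x17"], table, toy_decisions)
```

Because the table is updated in place, an index the user has explained once is matched automatically the next time it appears.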
S210: training a data processing model according to the remaining index data, and processing newly added data according to the trained data processing model.
Specifically, after the server obtains the remaining index data, it trains the model on that index data. When newly added data subsequently arrives, the newly added data is processed according to the remaining index data to obtain corresponding target index data; that is, the calculation logic of the remaining index data is applied to the newly added data to obtain the target index data, and the target index data is input into the trained data processing model to obtain a model processing result.
Specifically, as shown in the diagram, the server downloads the new data in real time and processes the new data and the corresponding historical data using the remaining index data to obtain the corresponding target index, inputs the target index into the model to obtain a result, and writes the result back.
In the above embodiment, after the target data is acquired, the index data is obtained by calculation and then screened, and a collinearity calculation is performed on the screened index data so that collinear index data is deleted. This processing is performed cyclically, so that the index data is screened repeatedly, which ensures the accuracy of the index data used for subsequent model training and therefore the accuracy of the model.
In one embodiment, screening the index data by using at least two preset index screening methods to obtain the current index data includes at least two of the following: adding the index data one by one to a logistic-regression-based index data screening model, so that the number of index data in the model grows from small to large, and selecting the index data with the largest contribution degree each time; inputting all the index data into a logistic-regression-based index data screening model, and successively deleting the index data with the smallest contribution degree; and performing logistic regression on each index data and the target in turn, sorting the index data according to their contribution degree to the target, introducing the index data one by one in the sorted order, and, after each introduction, checking all introduced index data to delete the index data with a low contribution degree, until all the index data have been processed.
Specifically, the server may screen the index data with different index screening methods in a multithreaded manner. For example, different threads each execute a different index screening method, and at least two threads are used, which guarantees that at least two index screening methods are executed.
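By way of illustration only, running each screening method in its own thread can be sketched with a thread pool. The method names and the toy screening functions are hypothetical placeholders for the logistic-regression-based methods.

```python
from concurrent.futures import ThreadPoolExecutor


def run_screening_methods(indexes, methods):
    """Run every screening method in its own thread and collect the results.

    methods: dict mapping a method name to a callable that takes the index
    list and returns the screened index list.
    """
    if len(methods) < 2:
        raise ValueError("at least two index screening methods are required")
    with ThreadPoolExecutor(max_workers=len(methods)) as pool:
        futures = {name: pool.submit(method, indexes) for name, method in methods.items()}
        return {name: future.result() for name, future in futures.items()}


# Toy screening methods used only to exercise the helper.
toy_methods = {
    "forward": lambda idx: idx[:2],
    "backward": lambda idx: idx[1:],
}
screened = run_screening_methods(["a", "b", "c"], toy_methods)
```

In CPython, threads mainly help when the screening work releases the global interpreter lock, as many numerical libraries do; a process pool is an alternative for pure-Python workloads.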
Specifically, the server may first add the first index data to the logistic-regression-based index data screening model and calculate its contribution degree, then obtain the next index data, input it into the model together with the first index data, calculate the contribution degree of each index data, and select the index data with the larger contribution degree for output. The server then introduces the next index data, inputs it into the logistic-regression-based index data screening model together with the previously introduced index data, calculates the contribution degree of each index data, and selects the index data whose contribution degrees rank highest.
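By way of illustration only, one common reading of this procedure is greedy forward selection, sketched below. The `score` callable stands in for a fit measure of the logistic regression model on a given index subset; it, the `min_gain` stopping rule and the toy weights are assumptions and not details given in the application.

```python
def forward_select(candidates, score, min_gain=1e-6):
    """Greedy forward selection: repeatedly add the index that improves the score most."""
    selected = []
    remaining = list(candidates)
    best = score(selected)
    while remaining:
        # Gain obtained by adding each remaining candidate to the current model.
        gains = {name: score(selected + [name]) - best for name in remaining}
        winner = max(gains, key=gains.get)
        if gains[winner] <= min_gain:  # no candidate contributes enough
            break
        selected.append(winner)
        remaining.remove(winner)
        best = score(selected)
    return selected


# Toy score: an additive stand-in for the fit of a logistic regression model.
toy_weights = {"a": 0.5, "b": 0.3, "c": 0.0}


def toy_score(subset):
    return sum(toy_weights[name] for name in subset)


chosen = forward_select(["c", "b", "a"], toy_score)
```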
In addition, in another thread, the server may first build a full-variable model, that is, input all the index data into the logistic-regression-based index data screening model and calculate the contribution degree of each index data, then sort the contribution degrees and delete the index data with the smallest contribution degree, repeating this cycle until all index data whose contribution degree is smaller than a preset threshold has been deleted.
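By way of illustration only, this backward procedure can be sketched as follows. The `contributions` callable stands in for refitting the logistic regression model on the current index set and returning each index's contribution degree; it, the threshold value and the toy weights are hypothetical.

```python
def backward_eliminate(indexes, contributions, threshold):
    """Start from all indexes and repeatedly delete the weakest one below the threshold."""
    current = list(indexes)
    while current:
        scores = contributions(current)  # recomputed on the current index set
        weakest = min(current, key=scores.__getitem__)
        if scores[weakest] >= threshold:  # every index left contributes enough
            break
        current.remove(weakest)
    return current


# Toy contribution degrees that do not change as indexes are removed.
toy_contrib = {"a": 0.5, "b": 0.02, "c": 0.2, "d": 0.01}
survivors = backward_eliminate(
    ["a", "b", "c", "d"],
    lambda names: {n: toy_contrib[n] for n in names},
    threshold=0.05,
)
```

Removing one index per cycle matters in practice because the contribution degrees of the other indexes can change once a correlated index has been deleted.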
In addition, in another thread, the server may further screen the index data by stepwise regression. Specifically, the server may perform logistic regression on each index data and the target in turn to obtain the contribution degree of each index data, then, starting from the index data with the highest contribution degree, introduce the other index data one by one, and check all introduced index data each time a new index data is introduced so as to delete index data whose significance is lower than a preset threshold. This is repeated over multiple cycles until no further significant independent variable can be introduced.
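By way of illustration only, a single pass of this stepwise procedure can be sketched as below; the multiple cycles mentioned above are omitted for brevity. The `single_score` and `significance` callables stand in for the single-index and joint logistic regression fits, and they, the threshold and the toy data are hypothetical.

```python
def stepwise_select(candidates, single_score, significance, threshold):
    """Introduce indexes in order of single-index contribution, re-checking after each step.

    single_score: callable giving the contribution of one index on its own.
    significance: callable returning {index: significance} for the joint model
    fitted on the given list of indexes.
    """
    ranked = sorted(candidates, key=single_score, reverse=True)
    selected = []
    for name in ranked:
        selected.append(name)
        sig = significance(selected)  # re-check every index introduced so far
        selected = [n for n in selected if sig[n] >= threshold]
    return selected


toy_single = {"a": 0.9, "b": 0.7, "b_copy": 0.6, "c": 0.1}


def toy_significance(selected):
    sig = {n: toy_single[n] for n in selected}
    # "b_copy" adds nothing once "b" is already in the model.
    if "b" in selected and "b_copy" in selected:
        sig["b_copy"] = 0.0
    return sig


stepwise = stepwise_select(["c", "b_copy", "a", "b"], toy_single.get, toy_significance, 0.3)
```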
In one embodiment, after taking the current index data as the remaining index data, the method further includes: matching the remaining index data against the pre-generated index data and business meaning table; when remaining index data does not match the pre-generated index data and business meaning table, outputting the unmatched remaining index data and receiving a processing instruction for the unmatched remaining index data; when the processing instruction is deletion, deleting the unmatched remaining index data; when the processing instruction is retention, establishing a correspondence between the retained remaining index data and the newly input business meaning, and storing the correspondence in the pre-generated index data and business meaning table; and updating the remaining index data according to the matched remaining index data and the retained remaining index data.
In one embodiment, training the data processing model based on the remaining index data comprises: sorting the remaining index data according to the contribution degree; selecting different numbers of the index data ranked highest by contribution degree to train models respectively, and testing the accuracy of each trained model; and selecting the model with the maximum accuracy as the final model.
Specifically, during model training, the server ranks the remaining index data by contribution degree, so that different numbers of top-ranked index data can be selected for model training. Because the remaining index data was obtained through logistic regression and already largely meets the requirements, training the artificial intelligence model on it helps ensure the accuracy of the artificial intelligence model.
The server can train a model on each of several differently sized selections of index data and test its accuracy, and then select the model with the highest accuracy as the final model.
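By way of illustration only, choosing the number of top-ranked indexes by tested accuracy can be sketched as follows. The `train_and_score` callable, which is assumed to train a model on the given indexes and return it together with its test accuracy, and the toy accuracy values are hypothetical.

```python
def choose_best_model(ranked_indexes, train_and_score, sizes):
    """Train one model per subset size and keep the most accurate one.

    ranked_indexes: index names sorted by contribution degree, highest first.
    sizes: the numbers of top-ranked indexes to try.
    """
    best = None
    for k in sizes:
        subset = ranked_indexes[:k]
        model, accuracy = train_and_score(subset)
        if best is None or accuracy > best[2]:
            best = (subset, model, accuracy)
    return best


# Toy trainer: the accuracy depends only on how many indexes are used.
toy_accuracy = {1: 0.70, 2: 0.85, 3: 0.80}


def toy_train_and_score(subset):
    return f"model-{len(subset)}", toy_accuracy[len(subset)]


best_subset, best_model, best_accuracy = choose_best_model(
    ["a", "b", "c"], toy_train_and_score, sizes=[1, 2, 3]
)
```

In this toy example the two highest-ranked indexes give the best accuracy, so adding the third index would not be worthwhile.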
In the above embodiment, the logistic regression and the artificial intelligence model, that is, the linear analysis and the nonlinear model are combined, so that the accuracy of the model is ensured.
In one embodiment, the artificial intelligence based target data processing method further includes: receiving an index data distribution instruction; and acquiring and outputting the remaining index data and the contribution degree corresponding to each remaining index data according to the index data distribution instruction.
Specifically, the server may also store the remaining index data and the contribution degree corresponding to each remaining index data. The server can then receive an index data distribution instruction and query the distribution of the corresponding index data, so that the user can learn the state of the current index data in time.
It is emphasized that the target data, the index data and the model can also be stored in a node of a block chain in order to further ensure the privacy and security of the target data, the index data and the model.
It should be understood that, although the steps in the flowchart of fig. 2 are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and described, and may be performed in other orders. Moreover, at least a portion of the steps in fig. 2 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 4, there is provided an artificial intelligence based target data processing apparatus, including: a data acquisition module 100, a screening module 200, an inspection module 300, a loop module 400, an index data acquisition module 500, and a model processing module 600, wherein:
the data acquisition module 100 is configured to acquire target data and calculate the target data to obtain corresponding index data;
the screening module 200 is configured to screen the index data by using at least two preset index screening methods to obtain current index data;
the inspection module 300 is configured to perform multiple collinearity inspection on the screened index data to determine whether secondary index data exists in the current index data, where the secondary index data is index data whose collinearity meets a collinearity requirement;
a loop module 400, configured to delete the secondary index data when the secondary index data exists in the current index data, and repeatedly screen the current index data from which the secondary index data is deleted, so as to update the current index data;
the index data obtaining module 500 is configured to, when there is no secondary index data satisfying the collinearity requirement in the current index data, take the current index data as remaining index data;
and the model processing module 600 is configured to train a data processing model according to the remaining index data, and process newly-added data according to the trained data processing model.
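The interplay of the screening, inspection, and loop modules can be sketched as a simple loop. This is a sketch under stated assumptions, not the patented implementation: `screen` and `find_secondary` are hypothetical callables standing in for the screening methods and the collinearity test (a variance inflation factor threshold would be one common choice, but the text does not name a specific test), and the toy stand-ins exist only to make the example runnable.

```python
from typing import Callable, List, Optional


def select_remaining_indices(
    indices: List[str],
    screen: Callable[[List[str]], List[str]],
    find_secondary: Callable[[List[str]], Optional[str]],
) -> List[str]:
    """Screen, test for collinearity, drop one secondary index, and repeat."""
    current = screen(indices)
    while True:
        secondary = find_secondary(current)
        if secondary is None:
            # No index meets the collinearity requirement: these remain.
            return current
        if secondary not in current:
            raise ValueError(f"unknown secondary index: {secondary!r}")
        # Delete the secondary index, then screen the reduced set again.
        current = screen([name for name in current if name != secondary])


# Toy stand-ins (assumptions): "noise" never survives screening, and
# "b" is treated as collinear with "a" whenever both are present.
def toy_screen(names: List[str]) -> List[str]:
    return [name for name in names if name != "noise"]


def toy_find_secondary(names: List[str]) -> Optional[str]:
    return "b" if "a" in names and "b" in names else None


remaining = select_remaining_indices(
    ["a", "b", "c", "noise"], toy_screen, toy_find_secondary
)
print(remaining)  # prints ['a', 'c']
```

Each pass removes one secondary index before screening again, so the loop ends once the test reports no further secondary index data.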
In one embodiment, the screening module 200 is further configured to screen the index data using at least two of the following methods: sequentially adding the index data, from small to large, into a logistic-regression-based index data screening model, and each time selecting the index data with the largest contribution degree; inputting all the index data into a logistic-regression-based index data screening model, and deleting, one at a time, the index data with the smallest contribution degree; and performing logistic regression of the target on each index data individually, ranking the index data by contribution degree to the target, introducing the index data one at a time in ranked order, and, after each introduction, checking all introduced index data and deleting those with a low contribution degree, until all the index data have been processed.
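The first two screening methods correspond to what is commonly called forward selection and backward elimination. The sketch below illustrates both under loud assumptions: `score` is a hypothetical callable that rates a subset of indices (in practice it might be a fit statistic of a logistic regression, which the text does not specify), the stopping thresholds `min_gain` and `max_loss` are invented for illustration, and the toy weights are made-up values. The third, stepwise method is not shown.

```python
from typing import Callable, List, Sequence

Score = Callable[[Sequence[str]], float]


def forward_select(candidates: Sequence[str], score: Score, min_gain: float = 1e-9) -> List[str]:
    """Add, one at a time, the index that raises the score the most."""
    selected: List[str] = []
    remaining = list(candidates)
    best = score(selected)
    while remaining:
        top = max(remaining, key=lambda name: score(selected + [name]))
        new_score = score(selected + [top])
        if new_score - best <= min_gain:
            break  # no candidate contributes enough
        selected.append(top)
        remaining.remove(top)
        best = new_score
    return selected


def backward_eliminate(candidates: Sequence[str], score: Score, max_loss: float = 1e-9) -> List[str]:
    """Start from all indices and drop, one at a time, the least useful one."""
    selected = list(candidates)
    best = score(selected)
    while len(selected) > 1:
        # The weakest index is the one whose removal leaves the highest score.
        weakest = max(selected, key=lambda name: score([n for n in selected if n != name]))
        reduced = [n for n in selected if n != weakest]
        reduced_score = score(reduced)
        if best - reduced_score > max_loss:
            break  # every remaining index contributes too much to drop
        selected, best = reduced, reduced_score
    return selected


# Toy additive score with made-up integer weights (assumption, illustration only).
TOY_WEIGHTS = {"income": 5, "age": 3, "noise": 0}


def toy_score(subset: Sequence[str]) -> float:
    return float(sum(TOY_WEIGHTS[name] for name in subset))


print(forward_select(["noise", "age", "income"], toy_score))
print(backward_eliminate(["income", "age", "noise"], toy_score))
```

With this toy score both procedures keep `income` and `age` and discard `noise`; with a real model the two methods can disagree, which is one plausible reason for using more than one.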
In one embodiment, the target data processing apparatus further includes:
the matching module is used for matching the remaining index data against a pre-generated index data and business meaning table;
the post-processing module is used for outputting the unmatched remaining index data when remaining index data does not match the pre-generated index data and business meaning table, and receiving a processing instruction for the unmatched remaining index data; when the processing instruction is deletion, deleting the unmatched remaining index data; when the processing instruction is retention, establishing a correspondence between the retained remaining index data and the newly input business meaning, and storing the correspondence in the pre-generated index data and business meaning table; and updating the remaining index data according to the matched remaining index data and the retained remaining index data.
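The matching and post-processing behaviour can be sketched with a dictionary standing in for the index data and business meaning table. The names here are hypothetical: `decide` represents the processing instruction received for an unmatched index and returns `None` for deletion or a new business meaning for retention; the table contents are invented.

```python
from typing import Callable, Dict, List, Optional


def reconcile_with_meaning_table(
    remaining: List[str],
    meaning_table: Dict[str, str],
    decide: Callable[[str], Optional[str]],
) -> List[str]:
    """Keep matched indices; delete or register unmatched ones as instructed."""
    kept: List[str] = []
    for name in remaining:
        if name in meaning_table:
            kept.append(name)  # matched: already has a business meaning
            continue
        meaning = decide(name)  # processing instruction for an unmatched index
        if meaning is not None:
            # Retention: record the new correspondence in the table.
            meaning_table[name] = meaning
            kept.append(name)
        # Deletion: the index is simply not carried over.
    return kept


# Hypothetical table and instruction source, for illustration only.
table = {"debt_ratio": "leverage"}


def toy_decide(name: str) -> Optional[str]:
    return "liquidity" if name == "cash_ratio" else None


updated = reconcile_with_meaning_table(["debt_ratio", "cash_ratio", "x17"], table, toy_decide)
print(updated)  # prints ['debt_ratio', 'cash_ratio']
```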
In one embodiment, the model processing module 600 includes:
the sorting unit is used for sorting the remaining index data by contribution degree;
the test unit is used for selecting different numbers of the index data ranked highest by contribution degree, performing model training on each selection, and testing the accuracy of each trained model;
and the selecting unit is used for selecting the model with the highest accuracy as the final model.
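The sorting, test, and selecting units together amount to a search over how many top-ranked indices to use. A minimal sketch follows, assuming a hypothetical `train_and_score` callable that trains on a given subset and returns the model with its test accuracy; the accuracy figures in the toy stand-in are invented purely to make the example runnable.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple


def select_best_model(
    contributions: Dict[str, float],
    candidate_sizes: Iterable[int],
    train_and_score: Callable[[List[str]], Tuple[Any, float]],
) -> Tuple[List[str], Any, float]:
    """Train on the top-k indices for each k and keep the most accurate model."""
    # Sort index names by contribution degree, largest first.
    ranked = sorted(contributions, key=lambda name: contributions[name], reverse=True)
    best: Optional[Tuple[List[str], Any, float]] = None
    for k in candidate_sizes:
        subset = ranked[:k]
        model, accuracy = train_and_score(subset)
        if best is None or accuracy > best[2]:
            best = (subset, model, accuracy)
    if best is None:
        raise ValueError("candidate_sizes must not be empty")
    return best


# Toy trainer with made-up accuracies per subset size (assumption).
ACCURACY_BY_SIZE = {1: 0.80, 2: 0.91, 3: 0.88}


def toy_train_and_score(subset: List[str]) -> Tuple[str, float]:
    return f"model_top{len(subset)}", ACCURACY_BY_SIZE[len(subset)]


best_subset, best_model, best_accuracy = select_best_model(
    {"a": 0.1, "b": 0.7, "c": 0.4}, [1, 2, 3], toy_train_and_score
)
print(best_subset, best_model, best_accuracy)
```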
In one embodiment, the artificial intelligence based target data processing apparatus further includes:
the receiving module is used for receiving an index data distribution instruction;
and the output module is used for acquiring and outputting the remaining index data and the contribution degree corresponding to each remaining index data according to the index data distribution instruction.
For specific limitations of the artificial intelligence based target data processing apparatus, reference may be made to the above limitations of the artificial intelligence based target data processing method, which are not repeated here. The modules in the artificial intelligence based target data processing apparatus can be implemented wholly or partially by software, hardware, or a combination thereof. The modules can be embedded in, or independent of, a processor in the computer device in hardware form, or can be stored in a memory in the computer device in software form, so that the processor can call them and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and whose internal structure may be as shown in fig. 5. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used for storing target data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements an artificial intelligence based target data processing method.
Those skilled in the art will appreciate that the architecture shown in fig. 5 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or fewer components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, there is provided a computer device comprising a memory storing a computer program and a processor, the processor implementing the following steps when executing the computer program: acquiring target data, and calculating the target data to obtain corresponding index data; screening the index data by using at least two preset index screening methods to obtain current index data; performing a multiple collinearity test on the current index data to judge whether secondary index data exists in the current index data, wherein the secondary index data is index data whose collinearity meets the collinearity requirement; when the secondary index data exists in the current index data, deleting the secondary index data, and re-screening the current index data from which the secondary index data has been deleted, so as to update the current index data; when no secondary index data meeting the collinearity requirement exists in the current index data, taking the current index data as the remaining index data; and training a data processing model according to the remaining index data, and processing newly-added data according to the trained data processing model.
In one embodiment, the screening of the index data by using at least two preset index screening methods to obtain current index data, as implemented when the processor executes the computer program, includes at least two of the following: sequentially adding the index data, from small to large, into a logistic-regression-based index data screening model, and each time selecting the index data with the largest contribution degree; inputting all the index data into a logistic-regression-based index data screening model, and deleting, one at a time, the index data with the smallest contribution degree; and performing logistic regression of the target on each index data individually, ranking the index data by contribution degree to the target, introducing the index data one at a time in ranked order, and, after each introduction, checking all introduced index data and deleting those with a low contribution degree, until all the index data have been processed.
In one embodiment, after taking the current index data as the remaining index data, the processor, when executing the computer program, further implements: matching the remaining index data against a pre-generated index data and business meaning table; when remaining index data does not match the pre-generated index data and business meaning table, outputting the unmatched remaining index data and receiving a processing instruction for the unmatched remaining index data; when the processing instruction is deletion, deleting the unmatched remaining index data; when the processing instruction is retention, establishing a correspondence between the retained remaining index data and the newly input business meaning, and storing the correspondence in the pre-generated index data and business meaning table; and updating the remaining index data according to the matched remaining index data and the retained remaining index data.
In one embodiment, the training of the data processing model according to the remaining index data, as implemented when the processor executes the computer program, comprises: sorting the remaining index data by contribution degree; selecting different numbers of the index data ranked highest by contribution degree, performing model training on each selection, and testing the accuracy of each trained model; and selecting the model with the highest accuracy as the final model.
In one embodiment, the processor, when executing the computer program, further performs the steps of: receiving an index data distribution instruction; and acquiring and outputting the remaining index data and the contribution degree corresponding to each remaining index data according to the index data distribution instruction.
In one embodiment, a computer storage medium is provided, having a computer program stored thereon, the computer program, when executed by a processor, implementing the steps of: acquiring target data, and calculating the target data to obtain corresponding index data; screening the index data by using at least two preset index screening methods to obtain current index data; performing a multiple collinearity test on the current index data to judge whether secondary index data exists in the current index data, wherein the secondary index data is index data whose collinearity meets the collinearity requirement; when the secondary index data exists in the current index data, deleting the secondary index data, and re-screening the current index data from which the secondary index data has been deleted, so as to update the current index data; when no secondary index data meeting the collinearity requirement exists in the current index data, taking the current index data as the remaining index data; and training a data processing model according to the remaining index data, and processing newly-added data according to the trained data processing model.
In one embodiment, the screening of the index data by using at least two preset index screening methods to obtain current index data, as implemented when the computer program is executed by a processor, includes at least two of the following: sequentially adding the index data, from small to large, into a logistic-regression-based index data screening model, and each time selecting the index data with the largest contribution degree; inputting all the index data into a logistic-regression-based index data screening model, and deleting, one at a time, the index data with the smallest contribution degree; and performing logistic regression of the target on each index data individually, ranking the index data by contribution degree to the target, introducing the index data one at a time in ranked order, and, after each introduction, checking all introduced index data and deleting those with a low contribution degree, until all the index data have been processed.
In one embodiment, after taking the current index data as the remaining index data, the computer program, when executed by the processor, further implements: matching the remaining index data against a pre-generated index data and business meaning table; when remaining index data does not match the pre-generated index data and business meaning table, outputting the unmatched remaining index data and receiving a processing instruction for the unmatched remaining index data; when the processing instruction is deletion, deleting the unmatched remaining index data; when the processing instruction is retention, establishing a correspondence between the retained remaining index data and the newly input business meaning, and storing the correspondence in the pre-generated index data and business meaning table; and updating the remaining index data according to the matched remaining index data and the retained remaining index data.
In one embodiment, the training of the data processing model according to the remaining index data, as implemented when the computer program is executed by the processor, comprises: sorting the remaining index data by contribution degree; selecting different numbers of the index data ranked highest by contribution degree, performing model training on each selection, and testing the accuracy of each trained model; and selecting the model with the highest accuracy as the final model.
In one embodiment, the computer program, when executed by the processor, further performs the steps of: receiving an index data distribution instruction; and acquiring and outputting the remaining index data and the contribution degree corresponding to each remaining index data according to the index data distribution instruction.
The block chain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
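The idea of data blocks linked by a cryptographic method can be illustrated with a toy hash chain. This is only a sketch of the linking principle using SHA-256 from the Python standard library; it has no consensus mechanism, networking, or encryption, and the block layout and function names are assumptions, not a description of any particular blockchain platform.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS_PREV_HASH = "0" * 64  # placeholder predecessor for the first block


def block_hash(block: Dict[str, Any]) -> str:
    """Hash the block's index, data, and predecessor hash."""
    payload = json.dumps(
        {key: block[key] for key in ("index", "data", "prev_hash")}, sort_keys=True
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain: List[Dict[str, Any]], data: Dict[str, Any]) -> Dict[str, Any]:
    """Create a block that references the hash of the previous block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    block: Dict[str, Any] = {"index": len(chain), "data": data, "prev_hash": prev_hash}
    block["hash"] = block_hash(block)
    chain.append(block)
    return block


def is_valid(chain: List[Dict[str, Any]]) -> bool:
    """Check every stored hash and every link to the preceding block."""
    for position, block in enumerate(chain):
        expected_prev = chain[position - 1]["hash"] if position else GENESIS_PREV_HASH
        if block["prev_hash"] != expected_prev or block["hash"] != block_hash(block):
            return False
    return True


# Hypothetical records, for illustration only.
chain: List[Dict[str, Any]] = []
append_block(chain, {"index_name": "debt_ratio", "contribution": 0.5})
append_block(chain, {"index_name": "cash_ratio", "contribution": 0.2})
print(is_valid(chain))  # prints True
```

Because each block stores the hash of its predecessor, altering the data of an earlier block makes the stored hashes inconsistent, which `is_valid` detects.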
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).
The technical features of the above embodiments can be arbitrarily combined. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered to be within the scope of the present specification.
The above-mentioned embodiments express only several embodiments of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. An artificial intelligence based target data processing method, characterized in that the artificial intelligence based target data processing method comprises:
acquiring target data, and calculating the target data to obtain corresponding index data;
screening the index data by using at least two preset index screening methods to obtain current index data;
performing multiple collinearity test on the current index data to judge whether secondary index data exists in the current index data, wherein the secondary index data is index data of which the collinearity meets the collinearity requirement;
when the secondary index data exists in the current index data, deleting the secondary index data, and repeatedly screening the current index data with the secondary index data deleted to update the current index data;
when secondary index data meeting the collinearity requirement does not exist in the current index data, taking the current index data as the remaining index data;
and training a data processing model according to the remaining index data, and processing newly-added data according to the trained data processing model.
2. The artificial intelligence based target data processing method according to claim 1, wherein the screening of the index data by using at least two preset index screening methods to obtain current index data comprises at least two of the following:
sequentially adding the index data into an index data screening model based on logistic regression from small to large, and selecting the index data with the largest contribution degree each time;
inputting all the index data into an index data screening model based on logistic regression, and sequentially deleting the index data with the minimum contribution degree;
sequentially performing logistic regression on each index data and the target, sorting the index data according to the contribution degree to the target, sequentially introducing the index data according to the sorting, and after introducing the index data, checking all introduced index data to delete the index data with low contribution degree until all the index data are processed.
3. The artificial intelligence based target data processing method according to claim 1, wherein, after the taking of the current index data as the remaining index data, the method further comprises:
matching the remaining index data against a pre-generated index data and business meaning table;
when remaining index data does not match the pre-generated index data and business meaning table, outputting the unmatched remaining index data, and receiving a processing instruction for the unmatched remaining index data;
when the processing instruction is deletion, deleting the unmatched remaining index data;
when the processing instruction is retention, establishing a correspondence between the retained remaining index data and the newly input business meaning, and storing the correspondence in the pre-generated index data and business meaning table;
and updating the remaining index data according to the matched remaining index data and the retained remaining index data.
4. The artificial intelligence based target data processing method of claim 1, wherein the training of a data processing model according to the remaining index data comprises:
sorting the remaining index data by contribution degree;
selecting different numbers of the index data ranked highest by contribution degree, performing model training on each selection, and testing the accuracy of each trained model;
and selecting the model with the highest accuracy as the final model.
5. The artificial intelligence based target data processing method of claim 1, further comprising:
receiving an index data distribution instruction;
and acquiring and outputting the remaining index data and the contribution degree corresponding to each remaining index data according to the index data distribution instruction.
6. An artificial intelligence based target data processing apparatus, characterized in that the artificial intelligence based target data processing apparatus comprises:
the data acquisition module is used for acquiring target data and calculating the target data to obtain corresponding index data;
the screening module is used for screening the index data by using at least two preset index screening methods to obtain current index data;
the inspection module is used for performing multiple collinearity inspection on the currently screened index data to judge whether secondary index data exists in the current index data or not, wherein the secondary index data is index data of which the collinearity meets the collinearity requirement;
the circulation module is used for deleting the secondary index data when the secondary index data exists in the current index data, and repeatedly screening the current index data with the secondary index data deleted to update the current index data;
the index data acquisition module is used for taking the current index data as the remaining index data when the secondary index data meeting the collinearity requirement does not exist in the current index data;
and the model processing module is used for training a data processing model according to the remaining index data and processing newly-added data according to the trained data processing model.
7. The artificial intelligence based target data processing apparatus according to claim 6, wherein the screening module is configured to screen the index data according to at least two of the following types to obtain the screened index data:
sequentially adding the index data into an index data screening model based on logistic regression from small to large, and selecting the index data with the largest contribution degree each time;
inputting all the index data into an index data screening model based on logistic regression, and sequentially deleting the index data with the minimum contribution degree;
sequentially performing logistic regression on each index data and a target, sequencing the index data by using the contribution degree of the target, sequentially introducing the index data by using the sequencing, and inspecting all introduced index data after introducing the index data to delete the index data with low contribution degree until all the index data are processed.
8. The artificial intelligence based target data processing apparatus of claim 6, wherein the artificial intelligence based target data processing apparatus further comprises:
the matching module is used for matching the remaining index data against a pre-generated index data and business meaning table;
the post-processing module is used for outputting the unmatched remaining index data when remaining index data does not match the pre-generated index data and business meaning table, and receiving a processing instruction for the unmatched remaining index data; when the processing instruction is deletion, deleting the unmatched remaining index data; when the processing instruction is retention, establishing a correspondence between the retained remaining index data and the newly input business meaning, and storing the correspondence in the pre-generated index data and business meaning table; and updating the remaining index data according to the matched remaining index data and the retained remaining index data.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 5 when executing the computer program.
10. A computer storage medium on which a computer program is stored, characterized in that the computer program, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 5.
CN202110942022.XA 2021-08-17 2021-08-17 Target data processing method, device, equipment and medium based on artificial intelligence Pending CN113609121A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110942022.XA CN113609121A (en) 2021-08-17 2021-08-17 Target data processing method, device, equipment and medium based on artificial intelligence

Publications (1)

Publication Number Publication Date
CN113609121A true CN113609121A (en) 2021-11-05

Family

ID=78308802

Country Status (1)

Country Link
CN (1) CN113609121A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105956364A (en) * 2016-04-20 2016-09-21 云南中烟工业有限责任公司 Tobacco leaf distinguishing grouping method based on characteristic chemical component
CN108257675A (en) * 2018-02-07 2018-07-06 平安科技(深圳)有限公司 Chronic obstructive pulmonary disease onset risk Forecasting Methodology, server and computer readable storage medium
CN110309219A (en) * 2019-06-20 2019-10-08 吉旗物联科技(上海)有限公司 The generation method and device of credit scoring model
CN110929524A (en) * 2019-10-16 2020-03-27 平安科技(深圳)有限公司 Data screening method, device, equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned (effective date of abandoning: 20240614)