CN113609121A - Target data processing method, device, equipment and medium based on artificial intelligence - Google Patents
- Publication number
- CN113609121A (application number CN202110942022.XA)
- Authority
- CN
- China
- Prior art keywords
- index data
- data
- index
- current
- screening
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application relates to an artificial-intelligence-based target data processing method, apparatus, device and medium. The method comprises: acquiring target data, and calculating the target data to obtain corresponding index data; screening the index data by using at least two preset index screening methods to obtain current index data; performing a multicollinearity test on the current index data to judge whether secondary index data exists in the current index data; when secondary index data exists in the current index data, deleting the secondary index data, and screening again the current index data from which the secondary index data has been deleted, so as to update the current index data; when no secondary index data meeting the collinearity requirement exists in the current index data, taking the current index data as the remaining index data; and training a data processing model according to the remaining index data, and processing newly added data according to the trained data processing model. The method can ensure data accuracy.
Description
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method, an apparatus, a device, and a medium for processing target data based on artificial intelligence.
Background
With the development of big data technology, artificial intelligence technology has emerged; it learns the patterns in big data in order to obtain valuable information from the data.
In the conventional technology, for the processing of big data, data cleaning is usually performed by using a set rule, and then the data is input into a model, so that corresponding valuable information is obtained.
However, such data cleaning rules are not constant, and they typically cover only simple operations such as filling in missing data, which leads to errors in the valuable information subsequently derived.
Disclosure of Invention
In view of the above, it is necessary to provide a method, an apparatus, a device and a medium for processing target data based on artificial intelligence, which can ensure the accuracy of data.
An artificial intelligence based target data processing method, comprising:
acquiring target data, and calculating the target data to obtain corresponding index data;
screening the index data by using at least two preset index screening methods to obtain current index data;
performing a multicollinearity test on the current index data to judge whether secondary index data exists in the current index data, wherein the secondary index data is index data whose collinearity meets the collinearity requirement;
when the secondary index data exists in the current index data, deleting the secondary index data, and screening again the current index data from which the secondary index data has been deleted, so as to update the current index data;
when no secondary index data meeting the collinearity requirement exists in the current index data, taking the current index data as the remaining index data;
and training a data processing model according to the remaining index data, and processing newly added data according to the trained data processing model.
In one embodiment, the screening of the index data by using at least two preset index screening methods to obtain the current index data includes at least two of the following:
adding the index data one at a time to an index data screening model based on logistic regression, so that the model grows from few index data to many, and selecting the index data with the largest contribution degree each time;
inputting all the index data into an index data screening model based on logistic regression, and sequentially deleting the index data with the minimum contribution degree;
sequentially performing logistic regression on each index data and the targets, sorting the index data according to the contribution degree of the targets, sequentially introducing the index data according to the sorting, and after introducing the index data, checking all introduced index data to delete the index data with low contribution degree until all the index data are processed.
In one embodiment, after the taking the current index data as the remaining index data, the method further includes:
matching the remaining index data against a pre-generated index data and business meaning table;
when remaining index data does not match the pre-generated index data and business meaning table, outputting the unmatched remaining index data, and receiving a processing instruction for the unmatched remaining index data;
when the processing instruction is deletion, deleting the unmatched remaining index data;
when the processing instruction is retention, establishing a correspondence between the retained remaining index data and a newly input business meaning, and storing the correspondence in the pre-generated index data and business meaning table;
and updating the remaining index data according to the matched remaining index data and the retained remaining index data.
In one embodiment, the training of the data processing model according to the remaining index data includes:
sorting the remaining index data according to the contribution degree;
respectively selecting different amounts of index data with contribution degrees ranked in the front for model training, and testing the accuracy of the trained model;
and selecting the model with the maximum accuracy as a final model.
In one embodiment, the method further comprises:
receiving an index data distribution instruction;
and acquiring and outputting the remaining index data and the contribution degree corresponding to each piece of remaining index data according to the index data distribution instruction.
An artificial intelligence based target data processing apparatus, the artificial intelligence based target data processing apparatus comprising:
the data acquisition module is used for acquiring target data and calculating the target data to obtain corresponding index data;
the screening module is used for screening the index data by using at least two preset index screening methods to obtain current index data;
the inspection module is used for performing a multicollinearity test on the currently screened index data to judge whether secondary index data exists in the current index data, wherein the secondary index data is index data whose collinearity meets the collinearity requirement;
the circulation module is used for deleting the secondary index data when the secondary index data exists in the current index data, and screening again the current index data from which the secondary index data has been deleted, so as to update the current index data;
the index data acquisition module is used for taking the current index data as the remaining index data when no secondary index data meeting the collinearity requirement exists in the current index data;
and the model processing module is used for training a data processing model according to the remaining index data and processing newly added data according to the trained data processing model.
In one embodiment, the screening module is configured to screen the index data according to at least two of the following types to obtain screened index data: sequentially adding the index data into an index data screening model based on logistic regression from small to large, and selecting the index data with the largest contribution degree each time;
inputting all the index data into an index data screening model based on logistic regression, and sequentially deleting the index data with the minimum contribution degree;
sequentially performing logistic regression on each index data and a target, sequencing the index data by using the contribution degree of the target, sequentially introducing the index data by using the sequencing, and inspecting all introduced index data after introducing the index data to delete the index data with low contribution degree until all the index data are processed.
In one embodiment, the artificial intelligence based target data processing apparatus further comprises:
the matching module is used for matching the remaining index data against a pre-generated index data and business meaning table;
the post-processing module is used for outputting unmatched remaining index data when remaining index data does not match the pre-generated index data and business meaning table, and receiving a processing instruction for the unmatched remaining index data; when the processing instruction is deletion, deleting the unmatched remaining index data; when the processing instruction is retention, establishing a correspondence between the retained remaining index data and a newly input business meaning, and storing the correspondence in the pre-generated index data and business meaning table; and updating the remaining index data according to the matched remaining index data and the retained remaining index data.
A computer device comprising a memory storing a computer program and a processor implementing the steps of the method when executing the computer program.
A computer storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method.
According to the above artificial-intelligence-based target data processing method, apparatus, device and medium, after the target data is acquired, index data is obtained through calculation and then screened. A collinearity calculation is performed on the index data that remains after screening, so that collinear secondary index data is deleted. The at least two preset index screening methods are then applied again to the screened index data from which the secondary index data has been deleted, so as to update it. Processing is repeated in this cycle, so that the index data is screened at least twice, which safeguards the accuracy of the index data used for subsequent model training and, in turn, the accuracy of the model.
Drawings
FIG. 1 is a diagram illustrating an exemplary implementation of an artificial intelligence based target data processing method;
FIG. 2 is a schematic flow diagram of an artificial intelligence based target data processing method in one embodiment;
FIG. 3 is an architecture diagram of an artificial intelligence based target data processing method in another embodiment;
FIG. 4 is a block diagram of an artificial intelligence based target data processing apparatus in one embodiment;
FIG. 5 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The target data processing method based on the artificial intelligence model can be applied to the application environment shown in fig. 1, in which the database 102 communicates with the server 104 over a network. The server 104 may obtain target data from the database 102 and calculate the target data to obtain corresponding index data; screen the index data by using at least two preset index screening methods to obtain current index data; and perform a multicollinearity test on the current index data to judge whether secondary index data exists in the current index data, wherein the secondary index data is index data whose collinearity meets the collinearity requirement. When the secondary index data exists in the current index data, the server deletes it and screens again the current index data from which the secondary index data has been deleted, so as to update the current index data; when no secondary index data meeting the collinearity requirement exists in the current index data, the server takes the current index data as the remaining index data. The server then trains a data processing model according to the remaining index data and processes newly added data according to the trained data processing model. The server 104 may be implemented as a stand-alone server or as a server cluster composed of a plurality of servers.
In one embodiment, as shown in fig. 2, there is provided an artificial intelligence model-based target data processing method, which is described by taking the method as an example applied to the server in fig. 1, and includes the following steps:
S202: Acquire target data, and calculate the target data to obtain corresponding index data.
Specifically, referring to fig. 3, the target data may be user-customized data obtained from a plurality of external resources, and the server may calculate the index data from the target data.
The index calculation logic corresponding to the index data can be selected in advance according to business requirements. The server first acquires each industry and then collects the indexes and index calculation logic corresponding to that industry. After acquiring the target data, the server classifies it by industry and then applies the corresponding index calculation logic to the classified target data in parallel to obtain the index data.
Optionally, the indexes may include financial indexes, industry data, and the like, and the index calculation logic is accordingly the logic for calculating an index, such as calculating the average profitability of a product. The server collects the indexes and index calculation logic corresponding to each industry and caches them. Optionally, the server may establish dependency relationships between the individual index calculation logics, which avoids calculating any index repeatedly. After acquiring the target data, the server classifies it by industry and generates an index calculation task from the industry classification and the corresponding target data. The server then obtains the dependency relationships established earlier between the calculation logics, determines the calculation order of the indexes in the index calculation task according to those dependency relationships, and executes the corresponding index calculation logics in that order on the target data to obtain the index data.
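By way of non-limiting illustration, the dependency-ordered calculation described above could be sketched as follows. The index names, the `INDEX_LOGIC` table and the `compute_indexes` helper are hypothetical and are not taken from the application; the sketch only shows that a topological ordering lets each index be calculated once and reused by the indexes that depend on it.

```python
from graphlib import TopologicalSorter

# Hypothetical index definitions: name -> (dependencies, calculation logic).
# Each logic receives the raw target data and the indexes computed so far.
INDEX_LOGIC = {
    "revenue": ((), lambda data, done: sum(data["sales"])),
    "cost": ((), lambda data, done: sum(data["costs"])),
    "profit": (("revenue", "cost"),
               lambda data, done: done["revenue"] - done["cost"]),
    "profit_margin": (("profit", "revenue"),
                      lambda data, done: done["profit"] / done["revenue"]),
}

def compute_indexes(data, index_logic=INDEX_LOGIC):
    """Run every index calculation exactly once, in dependency order."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {name: set(deps) for name, (deps, _) in index_logic.items()}
    done = {}
    for name in TopologicalSorter(graph).static_order():
        _, logic = index_logic[name]
        done[name] = logic(data, done)
    return done

result = compute_indexes({"sales": [120.0, 80.0], "costs": [90.0, 60.0]})
```

In this sketch `profit` is calculated once and then reused by `profit_margin`, which corresponds to the avoidance of repeated calculation mentioned above.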
S204: Screen the index data by using at least two preset index screening methods to obtain the current index data.
Specifically, the index screening methods are described below. The server screens the index data with each index screening method to obtain the screened index data. Preferably, in order to reduce the processing load, the server may take the intersection of the index data obtained by the individual index screening methods and rank the index data outside the intersection by contribution degree. The server can then determine the screened index data from the intersection and the number of index data within each contribution range: for example, when the number of index data in the intersection meets the requirement, only the index data in the intersection is used as the screened index data; when the number of index data in the intersection is smaller than a preset number, enough index data is selected from the ranked index data, in order of contribution degree, to reach the preset number, which ensures that the range of index data remains valid.
Specifically, when the number of index data in the intersection meets the requirement, the index data is sufficient and the screening range is wide enough, which safeguards subsequent accuracy. If the number of index data in the intersection is smaller than a certain value, index data outside the intersection can be added to it in order of contribution degree, so that the screening range is again wide enough.
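The intersection-and-supplement rule described above might be sketched as follows. The function name `combine_screened`, the parameter `min_count` (standing in for the preset number) and the example index names and contribution degrees are assumptions made purely for illustration.

```python
def combine_screened(results, contribution, min_count):
    """Merge the outputs of several screening methods.

    results: list of sets of index names, one set per screening method.
    contribution: dict mapping index name -> contribution degree.
    min_count: assumed preset minimum number of indexes to keep.
    """
    common = set.intersection(*results)
    if len(common) >= min_count:
        return sorted(common)
    # Rank the indexes chosen by only some of the methods, best first.
    outside = sorted(set.union(*results) - common,
                     key=lambda name: contribution[name], reverse=True)
    # Top up the intersection until the preset number is reached.
    return sorted(common) + outside[:min_count - len(common)]

selected = combine_screened(
    [{"roe", "debt_ratio", "margin"}, {"roe", "margin", "turnover"}],
    {"roe": 0.9, "margin": 0.7, "debt_ratio": 0.4, "turnover": 0.5},
    min_count=3,
)
```

Here the intersection holds only two indexes, so the highest-contribution index outside it (`turnover`) is added to reach the assumed preset number of three.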
The contribution degree can be calculated by a method based on information gain and PCA.
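As one possible reading of the information-gain part of this calculation, the sketch below scores a discrete index by how much it reduces the entropy of the target. It is an illustrative assumption rather than the calculation used in the application: the PCA part is not shown, and a continuous index would first have to be binned.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy reduction of `labels` after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

target = [1, 1, 0, 0]
# An index that separates the two classes perfectly.
gain_informative = information_gain(["high", "high", "low", "low"], target)
# An index whose values are unrelated to the target.
gain_useless = information_gain(["a", "b", "a", "b"], target)
```

An index that splits the target perfectly receives the full entropy of the target as its gain, while an unrelated index receives a gain of zero, so the gain can serve as a ranking key for the contribution degree.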
Optionally, before the screening of the index data, the corresponding index data may be subjected to variable extremum processing to remove interference of an abnormal value, the processed index data is subjected to normalization processing, and the data with dimensions is converted into dimensionless data to eliminate the influence of different dimensions on the model. In addition, before the index data is screened by the various index screening methods, the server can also analyze the relevance between each index data and the target so as to delete the index data with the relevance not meeting the requirement.
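A minimal sketch of this preprocessing is given below, assuming that extreme values are handled by clamping to chosen quantiles (a simple winsorization) and that normalization means conversion to z-scores. Both choices, the quantile levels and the function names are assumptions; the application does not specify them.

```python
from statistics import mean, pstdev

def clip_extremes(values, lower_q=0.05, upper_q=0.95):
    """Clamp values outside the chosen quantiles (nearest-rank quantiles)."""
    ordered = sorted(values)
    lo = ordered[round(lower_q * (len(ordered) - 1))]
    hi = ordered[round(upper_q * (len(ordered) - 1))]
    return [min(max(v, lo), hi) for v in values]

def standardize(values):
    """Convert to dimensionless z-scores (zero mean, unit variance)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

raw = [10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 10.0, 13.0, 12.0, 11.0, 500.0]
clipped = clip_extremes(raw, lower_q=0.0, upper_q=0.9)
scaled = standardize(clipped)
```

In this example the abnormal value 500.0 is clamped to the 90th-percentile value before scaling, so it no longer dominates the mean and standard deviation used for normalization.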
S206: Perform a multicollinearity test on the current index data to judge whether secondary index data exists in the current index data, wherein the secondary index data is index data whose collinearity meets the collinearity requirement.
S208: Delete the secondary index data when the secondary index data exists in the current index data, and screen again the current index data from which the secondary index data has been deleted, so as to update the current index data; when no secondary index data meeting the collinearity requirement exists in the current index data, take the current index data as the remaining index data.
Specifically, the collinearity test calculates the correlation coefficient between every two index data and the corresponding P value, and uses the variance inflation factor (VIF) to diagnose multicollinearity among the factors, so that secondary factors with strong collinearity are deleted.
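The sketch below illustrates one way such a test could be computed. It relies on the known identity that the variance inflation factor of each variable equals the corresponding diagonal entry of the inverse of the correlation matrix. The P-value calculation is omitted, and the threshold of 10 and the example data are assumptions, not values from the application.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def invert(matrix):
    """Gauss-Jordan matrix inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(columns):
    """VIF per index: the diagonal of the inverse correlation matrix."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}

columns = {
    "revenue": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    # Near-duplicate of revenue, so the two are strongly collinear.
    "revenue_noisy": [1.1, 1.9, 2.9, 4.1, 5.1, 5.9, 6.9, 8.1],
    # Constructed to be uncorrelated with both revenue columns.
    "headcount": [5.0, 5.0, 3.0, 3.0, 3.0, 3.0, 5.0, 5.0],
}
factors = vif(columns)
VIF_THRESHOLD = 10.0  # assumed rule-of-thumb cut-off
flagged = [name for name, value in factors.items() if value > VIF_THRESHOLD]
```

Both revenue columns are flagged in this example. In practice only one index of a collinear pair would usually be deleted as the secondary one, for instance the one with the lower contribution degree; that selection rule is likewise an assumption here.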
To ensure the accuracy of the remaining index data, the server loops back to step S204 to screen the index data again. When the index data meets the requirement, for example when no index data meeting the collinearity requirement exists or the number of cycles reaches a preset number, the server obtains the remaining index data.
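The loop over steps S204 to S208 could be outlined as in the following sketch. The callables `screen` and `find_secondary` stand in for the screening methods and the collinearity test, and `max_cycles` stands in for the preset number of cycles; all three, together with the toy data, are placeholders for illustration only.

```python
def select_remaining(indexes, screen, find_secondary, max_cycles=10):
    """Alternate screening and collinearity checks until nothing is removed.

    screen(indexes) -> list of indexes kept by the screening methods.
    find_secondary(indexes) -> list of collinear indexes to delete.
    """
    current = screen(indexes)
    for _ in range(max_cycles):
        secondary = set(find_secondary(current))
        if not secondary:
            break  # no collinear index left: these are the remaining indexes
        current = screen([name for name in current if name not in secondary])
    return current

def toy_screen(indexes):
    # Placeholder screening: drop anything labelled as noise.
    return [name for name in indexes if not name.startswith("noise")]

def toy_find_secondary(indexes):
    # Placeholder collinearity test: the copy is secondary to the original.
    if "revenue" in indexes and "revenue_copy" in indexes:
        return ["revenue_copy"]
    return []

remaining = select_remaining(
    ["revenue", "revenue_copy", "noise_1", "margin"],
    toy_screen, toy_find_secondary,
)
```

The loop ends either when the collinearity test finds nothing to delete or when the assumed cycle limit is reached, matching the two example stopping conditions given above.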
In one embodiment, when the secondary index data exists in the current index data, the secondary index data is deleted and the current index data from which the secondary index data has been deleted is screened again to update the current index data. This repeated screening may consist of continuing to screen the index data with the at least two preset index screening methods to obtain the screened index data, until the remaining index data meets the preset requirement.
Specifically, the pre-generated index data and business meaning table is a table, preset by the user, of the correspondence between each index data and a specific business meaning. The server can match the remaining index data against the standard index data in the table. If the match succeeds, the index data has a specific business meaning and can be retained. If the match fails, the index data can be output for the user to handle: for example, if the user determines that the index has no specific business meaning, a deletion instruction is input to delete the unmatched index data; if the user determines that the index does have a specific business meaning, that business meaning is input, so that the server can establish a correspondence between the input business meaning and the unmatched index data and store it in the pre-generated index data and business meaning table to facilitate subsequent matching. The remaining index data is then updated to consist of the matched index data and the retained index data.
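A simplified sketch of this matching step follows. The table is modelled as a dictionary and the user's processing instruction as a callback named `decide`; both are illustrative assumptions, as are the index names.

```python
def reconcile(remaining, meaning_table, decide):
    """Keep indexes with a known business meaning; ask about the rest.

    meaning_table: dict index name -> business meaning (updated in place).
    decide(name) -> ("delete", None) or ("keep", "<new business meaning>");
    it stands in for the processing instruction a user would enter.
    """
    kept = []
    for name in remaining:
        if name in meaning_table:
            kept.append(name)  # matched: already has a business meaning
            continue
        action, meaning = decide(name)
        if action == "keep":
            meaning_table[name] = meaning  # remembered for later matching
            kept.append(name)
    return kept

table = {"roe": "return on equity"}

def decide(name):
    # Hypothetical user decisions for the unmatched indexes.
    if name == "margin":
        return ("keep", "net profit margin")
    return ("delete", None)

kept = reconcile(["roe", "margin", "x17"], table, decide)
```

After the call, the retained index `margin` is stored in the table with its newly entered business meaning, so it would be matched automatically the next time.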
In addition, the server can also receive an input index data increasing instruction, so that index data which is not automatically selected by the server but is more important is introduced.
And finally, the server repeats the step of screening the index data until the remaining index data meet the preset requirement, for example, until a stop instruction input by a user is received.
In the above embodiment, after the index data is automatically acquired by the server, the index data is also analyzed manually to ensure that each index data has a specific business meaning, thereby ensuring the accuracy of the finally selected index data.
S210: Train a data processing model according to the remaining index data, and process newly added data according to the trained data processing model.
Specifically, after the server obtains the remaining index data, it trains the model on that index data. When newly added data subsequently arrives, it is processed according to the remaining index data to obtain the corresponding target index data; that is, the newly added data is calculated using the calculation logic of the remaining index data. The target index data is then input into the trained data processing model to obtain a model processing result.
Specifically, as shown in the above diagram, the server downloads the new data in real time, and processes the new data and the corresponding history data by using the remaining index data to obtain the corresponding target index, so as to input the target index into the model to obtain a result, and write the result back.
In the above embodiment, after the target data is obtained, the index data is obtained by calculation, then the index data is screened, and the remaining index data after screening is subjected to collinearity calculation, so that the collinearity index data is deleted, and thus, cyclic processing is performed, so that the index data can be screened, and further, the accuracy of the index data subjected to subsequent model training is ensured, and thus, the accuracy of the model is ensured.
In one embodiment, the current index data is obtained by screening the index data by using at least two preset index screening methods, including at least two of the following methods: sequentially adding the index data into an index data screening model based on logistic regression from small to large, and selecting the index data with the largest contribution degree each time; all the index data are input into an index data screening model based on logistic regression, and the index data with the minimum contribution degree are deleted in sequence; and sequentially performing logistic regression on each index data and the target, sequencing the index data according to the contribution degree of the target, sequentially introducing the index data according to the sequencing, and inspecting all introduced index data after introducing the index data to delete the index data with low contribution degree until all the index data are processed.
Specifically, the server may filter the index data by using different index filtering methods in a multithreading manner. For example, different threads are used for executing different index data screening methods respectively, and the threads at least comprise two threads, so that at least two index screening methods are guaranteed to be executed.
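One way to run the screening methods in separate threads is sketched below using the Python standard library. The two toy screening functions and the `CONTRIBUTION` values are hypothetical stand-ins for the logistic-regression-based methods described in this embodiment.

```python
from concurrent.futures import ThreadPoolExecutor

def run_screeners(indexes, screeners):
    """Run each screening method in its own thread and collect the results."""
    if len(screeners) < 2:
        raise ValueError("at least two screening methods are required")
    with ThreadPoolExecutor(max_workers=len(screeners)) as pool:
        futures = {name: pool.submit(fn, indexes)
                   for name, fn in screeners.items()}
        return {name: set(f.result()) for name, f in futures.items()}

# Hypothetical contribution degrees used by the toy screening methods.
CONTRIBUTION = {"roe": 0.9, "margin": 0.7, "turnover": 0.5, "noise": 0.1}

def top_two(indexes):
    return sorted(indexes, key=CONTRIBUTION.get, reverse=True)[:2]

def above_threshold(indexes):
    return [name for name in indexes if CONTRIBUTION[name] >= 0.5]

screened = run_screeners(
    list(CONTRIBUTION),
    {"top_two": top_two, "above_threshold": above_threshold},
)
```

The check on the number of screening methods mirrors the requirement that at least two methods be executed; each result set can then be combined with the others, for example by intersection.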
Specifically, the server may first add the first index data to the logistic regression-based index data screening model and calculate the contribution degree of the index data, then obtain the next index data, input the next index data and the first index data into the logistic regression-based index data screening model, calculate the contribution degree of each index data, and select the index data with a large contribution degree to output. And then introducing next index data, inputting the index data and the past index data into an index data screening model based on logistic regression together, calculating the contribution degree of each index data, and then selecting the index data with the contribution degree ranked in the front.
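The incremental procedure above resembles greedy forward selection, which can be outlined as in the sketch below. This is the standard textbook technique, not necessarily the exact procedure of the application. The `score` callable is an assumption: in practice it might be the validated accuracy of a logistic regression fitted on the given subset, whereas here a toy additive score is used so that the example stays self-contained.

```python
def forward_select(candidates, score, min_gain=1e-9):
    """Greedy forward selection.

    At each step, add the candidate whose inclusion raises `score` the most,
    and stop when no candidate improves the score by more than `min_gain`.
    """
    selected = []
    best = score(selected)
    remaining = list(candidates)
    while remaining:
        gains = {name: score(selected + [name]) - best for name in remaining}
        top = max(gains, key=gains.get)
        if gains[top] <= min_gain:
            break
        selected.append(top)
        remaining.remove(top)
        best = score(selected)
    return selected

# Toy stand-in for a model-based score: each index adds a fixed amount.
VALUE = {"roe": 0.30, "margin": 0.20, "noise": 0.0}

def toy_score(subset):
    return sum(VALUE[name] for name in subset)

chosen = forward_select(["noise", "margin", "roe"], toy_score)
```

The index that contributes nothing to the score is never added, while the others enter in decreasing order of their contribution.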
In addition, in another thread, the server may first establish an all-variable model, that is, input all the index data into the logistic-regression-based index data screening model and calculate the contribution degree of each index data. The server then sorts the contribution degrees, deletes the index data with the smallest contribution degree, and repeats this cycle until all index data whose contribution degree is smaller than the preset threshold has been deleted.
In addition, in another thread, the server may screen the index data by stepwise regression. Specifically, the server may perform logistic regression on each index data and the target in turn to obtain the contribution degree of each index data, and then, starting from the index data with the highest contribution degree, introduce the other index data one by one. Each time new index data is introduced, the server checks all the introduced index data and deletes any whose significance is lower than the preset threshold, and it repeats this cycle until no further significant independent variable can be introduced.
In one embodiment, after taking the current index data as the remaining index data, the method further includes: matching the remaining index data against the pre-generated index data and business meaning table; when remaining index data does not match the pre-generated index data and business meaning table, outputting the unmatched remaining index data and receiving a processing instruction for the unmatched remaining index data; when the processing instruction is deletion, deleting the unmatched remaining index data; when the processing instruction is retention, establishing a correspondence between the retained remaining index data and a newly input business meaning, and storing the correspondence in the pre-generated index data and business meaning table; and updating the remaining index data according to the matched remaining index data and the retained remaining index data.
In one embodiment, training the data processing model based on the remaining index data comprises: sorting the remaining index data according to the contribution degree; respectively selecting different amounts of index data with contribution degrees ranked in the front for model training, and testing the accuracy of the trained model; and selecting the model with the maximum accuracy as a final model.
Specifically, during model training, the server ranks the remaining index data by contribution degree, so that different numbers of top-ranked index data can be selected for model training. Because the remaining index data was obtained through logistic regression, it already basically meets the requirements; training the artificial intelligence model on it then helps ensure the accuracy of that model.
The server can respectively select different amounts of index data to perform model training and test accuracy, so that the model with the highest accuracy is selected as the final model.
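The selection of the final model might be sketched as follows. The `train` and `accuracy` callables, the candidate sizes and the toy accuracy figures are placeholders invented for illustration; a real implementation would fit and evaluate an actual model for each subset.

```python
def pick_best_model(ranked_indexes, sizes, train, accuracy):
    """Train one model per top-k subset and keep the most accurate one.

    ranked_indexes: remaining indexes sorted by contribution, best first.
    sizes: the different numbers of top-ranked indexes to try.
    train(subset) -> model; accuracy(model) -> float on held-out test data.
    """
    best = None
    for k in sizes:
        subset = ranked_indexes[:k]
        model = train(subset)
        acc = accuracy(model)
        if best is None or acc > best[2]:
            best = (subset, model, acc)
    return best

ranked = ["roe", "margin", "turnover", "noise"]
# Invented accuracies keyed by the number of indexes used.
toy_accuracy = {1: 0.71, 2: 0.80, 3: 0.84, 4: 0.82}

subset, model, acc = pick_best_model(
    ranked, [1, 2, 3, 4],
    train=lambda chosen: tuple(chosen),            # stand-in "model"
    accuracy=lambda fitted: toy_accuracy[len(fitted)],
)
```

In this toy run the model built on the top three indexes is kept, because adding the fourth index lowers the measured accuracy.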
In the above embodiment, the logistic regression and the artificial intelligence model, that is, the linear analysis and the nonlinear model are combined, so that the accuracy of the model is ensured.
In one embodiment, the artificial intelligence based target data processing method further includes: receiving an index data distribution instruction; and acquiring and outputting the residual index data and the contribution degree corresponding to each residual index data according to the index data distribution instruction.
Specifically, the server may store the remaining index data together with the contribution degree corresponding to each. On receiving an index data distribution instruction, the server queries and outputs the distribution of the corresponding index data, so that the user can learn the current index data in a timely manner.
It is emphasized that the target data, the index data and the model can also be stored in a node of a block chain in order to further ensure the privacy and security of the target data, the index data and the model.
It should be understood that, although the steps in the flowchart of fig. 2 are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated otherwise herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least a portion of the steps in fig. 2 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 4, there is provided an artificial intelligence based target data processing apparatus, including: a data acquisition module 100, a screening module 200, an inspection module 300, a loop module 400, an index data obtaining module 500, and a model processing module 600, wherein:
the data acquisition module 100 is configured to acquire target data and calculate the target data to obtain corresponding index data;
the screening module 200 is configured to screen the index data by using at least two preset index screening methods to obtain current index data;
the inspection module 300 is configured to perform a multicollinearity test on the current index data to determine whether secondary index data exists in the current index data, where the secondary index data is index data whose collinearity meets a collinearity requirement;
a loop module 400, configured to delete the secondary index data when the secondary index data exists in the current index data, and repeatedly screen the current index data from which the secondary index data is deleted, so as to update the current index data;
the index data obtaining module 500 is configured to, when there is no secondary index data satisfying the collinearity requirement in the current index data, take the current index data as remaining index data;
and the model processing module 600 is configured to train a data processing model according to the remaining index data, and process newly-added data according to the trained data processing model.
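The interaction between the screening, inspection, and loop modules can be sketched as a simple loop. The callbacks `screen` and `find_secondary` are hypothetical placeholders for the screening methods and the collinearity test, and the `max_rounds` guard is an assumption added for safety in this example.

```python
def screen_until_no_collinearity(indices, screen, find_secondary, max_rounds=100):
    """Repeat screening until no secondary (collinear) index data remains.

    indices:        iterable of index names
    screen:         callable(list) -> list, the combined screening step (assumed)
    find_secondary: callable(list) -> set of names judged collinear (assumed)
    max_rounds:     safety limit, not part of the described method
    """
    current = screen(list(indices))
    for _ in range(max_rounds):
        secondary = find_secondary(current)
        if not secondary:
            # No secondary index data: current becomes the remaining index data.
            return current
        # Delete the secondary index data, then screen again to update current.
        pruned = [name for name in current if name not in secondary]
        current = screen(pruned)
    raise RuntimeError("screening did not converge within max_rounds")
```

The returned list corresponds to the remaining index data that is handed to model training.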
In one embodiment, the screening module 200 is further configured to screen the index data using at least two of the following methods: sequentially adding the index data, from small to large, into a logistic-regression-based index data screening model, each time selecting the index data with the largest contribution degree; inputting all the index data into a logistic-regression-based index data screening model and sequentially deleting the index data with the smallest contribution degree; and performing logistic regression of each index data against the target in turn, ranking the index data by contribution degree to the target, introducing the index data in ranked order, and, after each introduction, checking all introduced index data to delete those with low contribution degree, until all the index data have been processed.
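The first two screening methods resemble forward selection and backward elimination, which can be sketched as below. This is an illustration under several assumptions: `score` is a hypothetical callback returning a goodness measure for a subset of index names (for example, one derived from a fitted logistic regression), the stopping thresholds `min_gain` and `max_loss` are invented for the example, and combining the two results by intersection is one possible choice that the text does not prescribe.

```python
def forward_selection(candidates, score, min_gain=0.0):
    """Add, one at a time, the index whose inclusion improves the score most."""
    selected, remaining = [], list(candidates)
    best_score = score(selected)
    while remaining:
        gain, name = max((score(selected + [n]) - best_score, n) for n in remaining)
        if gain <= min_gain:
            break  # no candidate contributes enough any more
        selected.append(name)
        remaining.remove(name)
        best_score = score(selected)
    return selected


def backward_elimination(candidates, score, max_loss=0.0):
    """Start from all indices and drop the one whose removal costs the least."""
    selected = list(candidates)
    while len(selected) > 1:
        current = score(selected)
        loss, name = min(
            (current - score([n for n in selected if n != m]), m) for m in selected
        )
        if loss > max_loss:
            break  # every remaining index contributes too much to drop
        selected.remove(name)
    return selected


def screen_with_two_methods(candidates, score):
    """Keep indices chosen by both methods (intersection is an assumption)."""
    backward = set(backward_elimination(candidates, score))
    return [name for name in forward_selection(candidates, score) if name in backward]
```

A stepwise variant, matching the third method, would combine the two loops by re-checking already introduced indices after each addition.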
In one embodiment, the artificial intelligence based target data processing apparatus further includes:
the matching module is used for matching the remaining index data against a pre-generated table of index data and business meanings;
the post-processing module is used for outputting unmatched remaining index data when remaining index data does not match the pre-generated table of index data and business meanings, and receiving a processing instruction for the unmatched remaining index data; when the processing instruction is deletion, deleting the unmatched remaining index data; when the processing instruction is retention, establishing a correspondence between the retained index data and a newly input business meaning, and storing the correspondence in the pre-generated table of index data and business meanings; and updating the remaining index data according to the matched remaining index data and the retained index data.
In one embodiment, the model processing module 600 includes:
the sorting unit is used for sorting the rest index data according to the contribution degree;
the test unit is used for respectively selecting different amounts of index data with contribution degrees sequenced in the front to carry out model training and testing the accuracy of the trained model;
and the selecting unit is used for selecting the model with the maximum accuracy as the final model.
In one embodiment, the artificial intelligence based target data processing apparatus further includes:
the receiving module is used for receiving an index data distribution instruction;
and the output module is used for acquiring and outputting the residual index data and the contribution degree corresponding to each residual index data according to the index data distribution instruction.
For specific limitations of the artificial intelligence based target data processing apparatus, reference may be made to the above limitations of the artificial intelligence based target data processing method, which are not repeated here. The modules in the apparatus can be implemented wholly or partially by software, hardware, or a combination thereof. The modules can be embedded in, or independent of, a processor in the computer device in hardware form, or stored in a memory in the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 5. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing target data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement an artificial intelligence based targeted data processing method.
Those skilled in the art will appreciate that the architecture shown in fig. 5 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, there is provided a computer device comprising a memory storing a computer program and a processor implementing the following steps when the processor executes the computer program: acquiring target data, and calculating the target data to obtain corresponding index data; screening the index data by using at least two preset index screening methods to obtain current index data; performing multiple collinearity inspection on the currently screened index data to judge whether secondary index data exists in the current index data, wherein the secondary index data is index data of which the collinearity meets the collinearity requirement; when the secondary index data exists in the current index data, deleting the secondary index data, and repeatedly screening the current index data with the secondary index data deleted to update the current index data; when the secondary index data meeting the colinearity requirement does not exist in the current index data, taking the current index data as the remaining index data; and training a data processing model according to the residual index data, and processing newly-added data according to the trained data processing model.
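The collinearity check in the steps above can be illustrated with a simplified proxy. This passage does not state which statistic or threshold is used, so the sketch below makes explicit assumptions: it uses the pairwise Pearson correlation instead of a full multicollinearity measure such as the variance inflation factor, the threshold of 0.9 is arbitrary, and treating the lower-contribution member of a correlated pair as the secondary index data is an interpretation made for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear relationship
    return cov / math.sqrt(var_x * var_y)


def find_secondary(columns, contributions, threshold=0.9):
    """Return the names flagged as secondary index data.

    columns:       dict mapping index name -> list of values
    contributions: dict mapping index name -> contribution degree
    threshold:     absolute correlation treated as collinear (assumed value)
    """
    names = list(columns)
    secondary = set()
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if abs(pearson(columns[first], columns[second])) >= threshold:
                # Flag the member of the pair that contributes less.
                secondary.add(min((first, second), key=lambda n: contributions[n]))
    return secondary
```

An empty result corresponds to the case where the current index data is taken as the remaining index data.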
In one embodiment, the processor, when executing the computer program, performs a screening process on the index data using at least two preset index screening methods to obtain current index data, where the screening process includes at least two of the following steps: sequentially adding the index data into an index data screening model based on logistic regression from small to large, and selecting the index data with the largest contribution degree each time; all the index data are input into an index data screening model based on logistic regression, and the index data with the minimum contribution degree are deleted in sequence; and sequentially performing logistic regression on each index data and the target, sequencing the index data according to the contribution degree of the target, sequentially introducing the index data according to the sequencing, and inspecting all introduced index data after introducing the index data to delete the index data with low contribution degree until all the index data are processed.
In one embodiment, when executing the computer program, the processor further implements the following steps after taking the current index data as the remaining index data: matching the remaining index data against a pre-generated table of index data and business meanings; when remaining index data does not match the table, outputting the unmatched remaining index data and receiving a processing instruction for the unmatched remaining index data; when the processing instruction is deletion, deleting the unmatched remaining index data; when the processing instruction is retention, establishing a correspondence between the retained index data and a newly input business meaning, and storing the correspondence in the pre-generated table of index data and business meanings; and updating the remaining index data according to the matched remaining index data and the retained index data.
In one embodiment, training the data processing model based on the remaining metric data, as implemented by the processor executing the computer program, comprises: sorting the rest index data according to the contribution degree; respectively selecting different amounts of index data with contribution degrees ranked in the front for model training, and testing the accuracy of the trained model; and selecting the model with the maximum accuracy as a final model.
In one embodiment, the processor, when executing the computer program, further performs the steps of: receiving an index data distribution instruction; and acquiring and outputting the residual index data and the contribution degree corresponding to each residual index data according to the index data distribution instruction.
In one embodiment, a computer storage medium is provided, having a computer program stored thereon, the computer program, when executed by a processor, implementing the steps of: acquiring target data, and calculating the target data to obtain corresponding index data; screening the index data by using at least two preset index screening methods to obtain current index data; performing multiple collinearity inspection on the currently screened index data to judge whether secondary index data exists in the current index data, wherein the secondary index data is index data of which the collinearity meets the collinearity requirement; when the secondary index data exists in the current index data, deleting the secondary index data, and repeatedly screening the current index data with the secondary index data deleted to update the current index data; when the secondary index data meeting the colinearity requirement does not exist in the current index data, taking the current index data as the remaining index data; and training a data processing model according to the residual index data, and processing newly-added data according to the trained data processing model.
In one embodiment, when the computer program is executed by a processor, screening the index data by using at least two preset index screening methods to obtain current index data comprises at least two of the following: sequentially adding the index data into an index data screening model based on logistic regression from small to large, and selecting the index data with the largest contribution degree each time; inputting all the index data into an index data screening model based on logistic regression, and sequentially deleting the index data with the minimum contribution degree; and sequentially performing logistic regression on each index data and the target, ranking the index data according to the contribution degree to the target, sequentially introducing the index data according to the ranking, and checking all introduced index data after each introduction to delete the index data with low contribution degree, until all the index data are processed.
In one embodiment, when executed by the processor, the computer program further implements the following steps after taking the current index data as the remaining index data: matching the remaining index data against a pre-generated table of index data and business meanings; when remaining index data does not match the table, outputting the unmatched remaining index data and receiving a processing instruction for the unmatched remaining index data; when the processing instruction is deletion, deleting the unmatched remaining index data; when the processing instruction is retention, establishing a correspondence between the retained index data and a newly input business meaning, and storing the correspondence in the pre-generated table of index data and business meanings; and updating the remaining index data according to the matched remaining index data and the retained index data.
In one embodiment, training the data processing model based on the remaining metric data, as implemented when the computer program is executed by the processor, comprises: sorting the rest index data according to the contribution degree; respectively selecting different amounts of index data with contribution degrees ranked in the front for model training, and testing the accuracy of the trained model; and selecting the model with the maximum accuracy as a final model.
In one embodiment, the computer program when executed by the processor further performs the steps of: receiving an index data distribution instruction; and acquiring and outputting the residual index data and the contribution degree corresponding to each residual index data according to the index data distribution instruction.
The block chain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
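The idea of data blocks linked by a cryptographic method can be illustrated with a minimal hash chain. This sketch is an assumption-laden illustration, not the storage scheme of the application: the block fields, the use of SHA-256 over a JSON encoding, and the function names are chosen for the example, and it omits consensus, networking, and encryption entirely.

```python
import hashlib
import json

GENESIS_PREVIOUS_HASH = "0" * 64  # placeholder predecessor for the first block


def block_hash(index, payload, previous_hash):
    """Hash a block's contents together with its predecessor's hash."""
    body = json.dumps(
        {"index": index, "payload": payload, "previous_hash": previous_hash},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain, payload):
    """Append a block whose hash depends on the previous block."""
    previous_hash = chain[-1]["hash"] if chain else GENESIS_PREVIOUS_HASH
    index = len(chain)
    block = {
        "index": index,
        "payload": payload,
        "previous_hash": previous_hash,
        "hash": block_hash(index, payload, previous_hash),
    }
    chain.append(block)
    return block


def is_valid(chain):
    """Check that every block's hash and back-link are consistent."""
    expected_previous = GENESIS_PREVIOUS_HASH
    for index, block in enumerate(chain):
        if block["index"] != index or block["previous_hash"] != expected_previous:
            return False
        if block["hash"] != block_hash(index, block["payload"], block["previous_hash"]):
            return False
        expected_previous = block["hash"]
    return True
```

Because each hash covers the previous block's hash, altering stored index data in an earlier block makes validation fail from that block onward.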
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any combination that involves no contradiction should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention patent. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.
Claims (10)
1. An artificial intelligence based target data processing method, characterized in that the artificial intelligence based target data processing method comprises:
acquiring target data, and calculating the target data to obtain corresponding index data;
screening the index data by using at least two preset index screening methods to obtain current index data;
performing multiple collinearity test on the current index data to judge whether secondary index data exists in the current index data, wherein the secondary index data is index data of which the collinearity meets the collinearity requirement;
when the secondary index data exists in the current index data, deleting the secondary index data, and repeatedly screening the current index data with the secondary index data deleted to update the current index data;
when secondary index data meeting the colinearity requirement does not exist in the current index data, taking the current index data as the remaining index data;
and training a data processing model according to the residual index data, and processing newly-added data according to the trained data processing model.
2. The artificial intelligence based target data processing method according to claim 1, wherein the screening of the index data by using at least two preset index screening methods to obtain current index data comprises at least two of the following:
sequentially adding the index data into an index data screening model based on logistic regression from small to large, and selecting the index data with the largest contribution degree each time;
inputting all the index data into an index data screening model based on logistic regression, and sequentially deleting the index data with the minimum contribution degree;
sequentially performing logistic regression on each index data and the targets, sorting the index data according to the contribution degree of the targets, sequentially introducing the index data according to the sorting, and after introducing the index data, checking all introduced index data to delete the index data with low contribution degree until all the index data are processed.
3. The artificial intelligence based target data processing method according to claim 1, wherein after the taking the current index data as the remaining index data, further comprising:
matching the remaining index data against a pre-generated table of index data and business meanings;
when remaining index data does not match the pre-generated table of index data and business meanings, outputting the unmatched remaining index data, and receiving a processing instruction for the unmatched remaining index data;
when the processing instruction is deletion, deleting the unmatched remaining index data;
when the processing instruction is retention, establishing a correspondence between the retained index data and a newly input business meaning, and storing the correspondence in the pre-generated table of index data and business meanings;
and updating the remaining index data according to the matched remaining index data and the retained index data.
4. The artificial intelligence based target data processing method of claim 1, wherein the training of a data processing model according to the remaining metric data comprises:
sorting the rest index data according to the contribution degree;
respectively selecting different amounts of index data with contribution degrees ranked in the front for model training, and testing the accuracy of the trained model;
and selecting the model with the maximum accuracy as a final model.
5. The artificial intelligence based target data processing method of claim 1, further comprising:
receiving an index data distribution instruction;
and acquiring and outputting the rest index data and the contribution degree corresponding to each rest index data according to the index data distribution instruction.
6. An artificial intelligence based target data processing apparatus, characterized in that the artificial intelligence based target data processing apparatus comprises:
the data acquisition module is used for acquiring target data and calculating the target data to obtain corresponding index data;
the screening module is used for screening the index data by using at least two preset index screening methods to obtain current index data;
the inspection module is used for performing multiple collinearity inspection on the currently screened index data to judge whether secondary index data exists in the current index data or not, wherein the secondary index data is index data of which the collinearity meets the collinearity requirement;
the circulation module is used for deleting the secondary index data when the secondary index data exists in the current index data, and repeatedly screening the current index data with the secondary index data deleted to update the current index data;
the index data acquisition module is used for taking the current index data as the remaining index data when the secondary index data meeting the collinearity requirement does not exist in the current index data;
and the model processing module is used for training a data processing model according to the residual index data and processing newly-added data according to the trained data processing model.
7. The artificial intelligence based target data processing apparatus according to claim 6, wherein the screening module is configured to screen the index data according to at least two of the following types to obtain the screened index data:
sequentially adding the index data into an index data screening model based on logistic regression from small to large, and selecting the index data with the largest contribution degree each time;
inputting all the index data into an index data screening model based on logistic regression, and sequentially deleting the index data with the minimum contribution degree;
sequentially performing logistic regression on each index data and a target, sequencing the index data by using the contribution degree of the target, sequentially introducing the index data by using the sequencing, and inspecting all introduced index data after introducing the index data to delete the index data with low contribution degree until all the index data are processed.
8. The artificial intelligence based target data processing apparatus of claim 6, wherein the artificial intelligence based target data processing apparatus further comprises:
the matching module is used for matching the residual index data with pre-generated index data and a service meaning table;
the post-processing module is used for outputting unmatched remaining index data when remaining index data does not match the pre-generated table of index data and business meanings, and receiving a processing instruction for the unmatched remaining index data; when the processing instruction is deletion, deleting the unmatched remaining index data; when the processing instruction is retention, establishing a correspondence between the retained index data and a newly input business meaning, and storing the correspondence in the pre-generated table of index data and business meanings; and updating the remaining index data according to the matched remaining index data and the retained index data.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 5 when executing the computer program.
10. A computer storage medium on which a computer program is stored, characterized in that the computer program, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110942022.XA CN113609121A (en) | 2021-08-17 | 2021-08-17 | Target data processing method, device, equipment and medium based on artificial intelligence |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113609121A (en) | 2021-11-05 |
Family
ID=78308802
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110942022.XA Pending CN113609121A (en) | 2021-08-17 | 2021-08-17 | Target data processing method, device, equipment and medium based on artificial intelligence |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113609121A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105956364A (en) * | 2016-04-20 | 2016-09-21 | 云南中烟工业有限责任公司 | Tobacco leaf distinguishing grouping method based on characteristic chemical component |
CN108257675A (en) * | 2018-02-07 | 2018-07-06 | 平安科技(深圳)有限公司 | Chronic obstructive pulmonary disease onset risk Forecasting Methodology, server and computer readable storage medium |
CN110309219A (en) * | 2019-06-20 | 2019-10-08 | 吉旗物联科技(上海)有限公司 | The generation method and device of credit scoring model |
CN110929524A (en) * | 2019-10-16 | 2020-03-27 | 平安科技(深圳)有限公司 | Data screening method, device, equipment and computer readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109598095B (en) | Method and device for establishing scoring card model, computer equipment and storage medium | |
CN110929036A (en) | Electric power marketing inspection management method and device, computer equipment and storage medium | |
US20150199224A1 (en) | Method and Apparatus for Detection of Anomalies in Integrated Parameter Systems | |
CN110929879A (en) | Business decision logic updating method based on decision engine and model platform | |
CN110941555B (en) | Test case recommendation method and device, computer equipment and storage medium | |
CN111176990A (en) | Test data generation method and device based on data decision and computer equipment | |
CN112559365A (en) | Test case screening method and device, computer equipment and storage medium | |
CN109712716B (en) | Disease influence factor determination method, system and computer equipment | |
CN109978261A (en) | Determine method, apparatus, readable medium and the electronic equipment of load forecasting model | |
CN112559364B (en) | Test case generation method and device, computer equipment and storage medium | |
CN112149909A (en) | Ship oil consumption prediction method and device, computer equipment and storage medium | |
CN112232951B (en) | Credit evaluation method, device, equipment and medium based on multi-dimensional cross feature | |
CN115204536A (en) | Building equipment fault prediction method, device, equipment and storage medium | |
CN115062734A (en) | Wind control modeling method, device, equipment and medium capable of outputting explanatory information | |
CN111767192A (en) | Service data detection method, device, equipment and medium based on artificial intelligence | |
CN113110961B (en) | Equipment abnormality detection method and device, computer equipment and readable storage medium | |
CN114169460A (en) | Sample screening method, sample screening device, computer equipment and storage medium | |
CN112527573B (en) | Interface testing method, device and storage medium | |
CN110852384A (en) | Medical image quality detection method, device and storage medium | |
CN113609121A (en) | Target data processing method, device, equipment and medium based on artificial intelligence | |
Mendes et al. | Investigating the use of chronological splitting to compare software cross-company and single-company effort predictions: a replicated study | |
Wirawan et al. | Application of data mining to prediction of timeliness graduation of students (a case study) | |
CN114860608A (en) | Scene construction based system automation testing method, device, equipment and medium | |
CN114372867A (en) | User credit verification and evaluation method and device and computer equipment | |
CN110865939B (en) | Application program quality monitoring method, device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
AD01 | Patent right deemed abandoned | Effective date of abandoning: 20240614 |