CN110019172B - Data processing method and device, storage medium and electronic equipment - Google Patents

Data processing method and device, storage medium and electronic equipment

Info

Publication number
CN110019172B
Authority
CN
China
Prior art keywords
variable
target
value
characteristic
variables
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810962547.8A
Other languages
Chinese (zh)
Other versions
CN110019172A (en)
Inventor
江期武
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd
Priority to CN201810962547.8A
Publication of CN110019172A
Application granted
Publication of CN110019172B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/248 Presentation of query results
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Abstract

The present disclosure relates to the field of big data analysis, and in particular to a data processing method, a data processing apparatus, a computer-readable storage medium, and an electronic device. The data processing method provided by the embodiments of the disclosure comprises the following steps: reading a database according to a selected target variable, wherein the database includes a plurality of characteristic variables associated with the target variable; obtaining, for each characteristic variable, the values taken by the target variable at different values or within different value ranges of that characteristic variable; calculating, from those target variable values, the predictive value of each characteristic variable for the target variable; and generating, according to the predictive values, a visual chart that displays the characteristic variables in a differentiated manner. By calculating the predictive value of each characteristic variable for the target variable, the method obtains the importance distribution of the characteristic variables, which reduces the analysis workload of business personnel and improves data processing efficiency.

Description

Data processing method and device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of big data analysis technologies, and in particular, to a data processing method, a data processing apparatus, a computer readable storage medium, and an electronic device.
Background
With the rapid development of computer technology, the insurance industry has accumulated a large amount of customer information, agent information, and related business data. Owing to the particular nature of the insurance industry, data is collected across an extremely broad range and at a fine granularity, so the resulting data resources span a wide range of indicators and carry a large amount of associated information.
Conventional statistical tools can assist in screening and analyzing such massive data, but they replace manual work only at the level of simple numerical processing; handling the genuinely important indicator data still takes a great deal of manpower. In particular, when a large amount of interfering information is present, effective data is hard to extract, and staff with strong business expertise generally have to process the data item by item. Efficiency is therefore low, and substantial labor and time costs are incurred.
It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the present disclosure and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
An object of the present disclosure is to provide a data processing method, a data processing apparatus, a computer-readable storage medium, and an electronic device, thereby overcoming, at least to some extent, the technical problem of low data processing efficiency caused by the limitations and disadvantages of the related art.
According to an aspect of the present disclosure, there is provided a data processing method, comprising:
reading a database according to a selected target variable, wherein the database includes a plurality of characteristic variables associated with the target variable;
obtaining, for each characteristic variable, the values taken by the target variable at different values or within different value ranges of that characteristic variable;
calculating, from those target variable values, the predictive value of each characteristic variable for the target variable;
and generating, according to the predictive values, a visual chart that displays the characteristic variables in a differentiated manner.
In one exemplary embodiment of the present disclosure, after reading the database according to the selected target variable, the method further comprises:
filtering out noise variables among the characteristic variables.
In an exemplary embodiment of the disclosure, the filtering out of noise variables among the characteristic variables includes:
acquiring the missing-data rate of each characteristic variable;
marking each characteristic variable whose missing-data rate exceeds a preset threshold as a noise variable;
and filtering out the noise variables among the characteristic variables.
In an exemplary embodiment of the disclosure, the filtering out of noise variables among the characteristic variables includes:
setting a filtering field according to the target variable;
marking each characteristic variable that contains the filtering field as a noise variable;
and filtering out the noise variables among the characteristic variables.
In an exemplary embodiment of the present disclosure, the obtaining of the target variable values at different values or within different value ranges of each characteristic variable includes:
judging whether the target variable is a binary variable;
if it is not, converting the target variable into a binary variable, wherein the values of the binary variable comprise a target value and a non-target value;
and acquiring the number of target values and the number of non-target values at different values or within different value ranges of each characteristic variable.
In an exemplary embodiment of the disclosure, the acquiring of the number of target values and the number of non-target values at different values or within different value ranges of each characteristic variable includes:
judging whether each characteristic variable is a discrete variable or a continuous variable;
if the characteristic variable is a discrete variable, acquiring the number of target values and the number of non-target values at each value of the discrete variable;
and if the characteristic variable is a continuous variable, binning the continuous variable by value range, and then acquiring the number of target values and the number of non-target values in each bin.
In an exemplary embodiment of the present disclosure, the calculating of the predictive value of each characteristic variable for the target variable according to the target variable values includes:
calculating the predictive value of each characteristic variable for the target variable by the formula
[Formula image BDA0001774138930000031 is not reproduced in this text.]
where S is the predictive value of the characteristic variable for the target variable; i is the index of a value or bin of the characteristic variable; $m_i$ is the number of target values at the i-th value or in the i-th bin of the characteristic variable; $n_i$ is the number of non-target values at the i-th value or in the i-th bin of the characteristic variable; m is the total number of target values in the target variable; and n is the total number of non-target values in the target variable.
According to an aspect of the present disclosure, there is provided a data processing apparatus, comprising:
a data reading module configured to read a database according to a selected target variable, wherein the database includes a plurality of characteristic variables associated with the target variable;
a variable value module configured to obtain, for each characteristic variable, the values taken by the target variable at different values or within different value ranges of that characteristic variable;
a value prediction module configured to calculate, from those target variable values, the predictive value of each characteristic variable for the target variable;
and a chart generation module configured to generate, according to the predictive values, a visual chart that displays the characteristic variables in a differentiated manner.
According to one aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program, characterized in that the computer program, when executed by a processor, implements any of the above described data processing methods.
According to one aspect of the present disclosure, there is provided an electronic device, characterized by comprising a processor and a memory; wherein the memory is for storing executable instructions of the processor, the processor being configured to perform any of the data processing methods described above via execution of the executable instructions.
The data processing method provided by the embodiments of the disclosure obtains the importance distribution of the characteristic variables by calculating the predictive value of each characteristic variable for the target variable, and displays the result in a differentiated manner. The method narrows the scope of data analysis, reduces interference from noise data, and presents the importance distribution of the relevant data intuitively as a visual chart, which reduces the analysis workload of business personnel and improves data processing efficiency.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure. It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived from them without undue effort.
Fig. 1 schematically shows a flow chart of steps of a data processing method in an exemplary embodiment of the present disclosure.
Fig. 2 schematically shows a flow chart of steps of a data processing method in another exemplary embodiment of the present disclosure.
Fig. 3 schematically shows a flow chart of steps of a data processing method in another exemplary embodiment of the present disclosure.
Fig. 4 schematically shows a flow chart of steps of a data processing method in another exemplary embodiment of the present disclosure.
Fig. 5 schematically shows a block diagram of the components of a data processing apparatus in an exemplary embodiment of the present disclosure.
Fig. 6 schematically illustrates a schematic diagram of a program product in an exemplary embodiment of the present disclosure.
Fig. 7 schematically illustrates a block diagram of an electronic device in an exemplary embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor devices and/or microcontroller devices.
In an exemplary embodiment of the present disclosure, there is provided a data processing method, referring to fig. 1, which may mainly include the steps of:
Step S110: reading a database according to the selected target variable, wherein the database includes a plurality of characteristic variables associated with the target variable.
In this step, a target variable is first selected as required, and the database is then read according to the selected target variable; the database includes a plurality of characteristic variables related to the target variable. For example, if the user wants to know which indicators affect the retention of insurance agents and how strongly each of them does so, this step may select the employment status of the insurance agent as the target variable and then read from the database a plurality of characteristic variables related to that status, such as the agent's gender, age, education level, length of service, and the distance between the agent's home address and the company.
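The reading step can be illustrated with a minimal sketch. It assumes, purely for illustration, that the data sits in a single SQLite table and that every column other than the target is treated as a characteristic variable; the table name `agents`, the column names, and the helper `read_variables` are hypothetical and do not come from the disclosure.

```python
import sqlite3


def read_variables(conn, table, target):
    """Read all rows of `table` and split the columns into target and characteristic variables.

    Assumption: every column except `target` is a characteristic variable.
    The table name is interpolated directly, so it must come from trusted code.
    """
    cur = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cur.description]
    if target not in columns:
        raise ValueError(f"target variable {target!r} not found in {table!r}")
    rows = [dict(zip(columns, r)) for r in cur.fetchall()]
    features = [c for c in columns if c != target]
    return rows, features


# Hypothetical in-memory data standing in for the insurer's database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agents (gender TEXT, age INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO agents VALUES (?, ?, ?)",
    [("M", 25, "in_service"), ("F", 31, "departed")],
)
rows, features = read_variables(conn, "agents", "status")
print(features)
```

A production system would more likely query several joined tables; the single-table layout only keeps the sketch short.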
Step S120: filtering out noise variables among the characteristic variables.
To improve data processing efficiency and reduce the influence of interfering information, this step may filter out the noise variables among the characteristic variables. Alternatively, the present exemplary embodiment may skip the filtering and perform the subsequent analysis directly on all characteristic variables; the present disclosure places no particular limitation on this.
Step S130: obtaining, for each characteristic variable, the values taken by the target variable at different values or within different value ranges of that characteristic variable.
Once the target variable and the characteristic variables have been determined, this step obtains the values taken by the target variable at different values or within different value ranges of each characteristic variable. When a characteristic variable is a discrete variable, the target variable values are obtained for each of its values. When a characteristic variable is a continuous variable, the target variable values are obtained for each of its value ranges. As shown in Table 1, taking the employment status of insurance agents as an example, this step can obtain the employment status of agents for indicators such as gender, age, education level, and the distance between home address and the company; for example, the numbers of in-service and departed agents among male and female insurance agents, and the numbers of in-service and departed agents aged 18-22, 23-27, 28-32, and over 32.
Table 1. Statistics on the employment status of insurance agents
[Table image BDA0001774138930000061 is not reproduced in this text.]
Step S140: calculating, from the target variable values, the predictive value of each characteristic variable for the target variable.
After the target variable values have been obtained in step S130, this step uses them as input parameters to calculate the predictive value of each characteristic variable for the target variable. The predictive value measures how well each characteristic variable predicts the value of the target variable; for example, the predictive value of characteristic variables such as gender, age, and education level for the employment status of insurance agents can be calculated separately.
Step S150: generating, according to the predictive values, a visual chart that displays the characteristic variables in a differentiated manner.
After the predictive value of each characteristic variable has been calculated, this step displays the characteristic variables in a differentiated manner according to their predictive values and generates a visual chart for operators to inspect and analyze. For example, the characteristic variables may be ranked by predictive value, and those whose predictive value exceeds a certain threshold may be highlighted.
The data processing method provided by this exemplary embodiment obtains the importance distribution of the characteristic variables by calculating the predictive value of each characteristic variable for the target variable, and displays the result in a differentiated manner. The method narrows the scope of data analysis, reduces interference from noise data, and presents the importance distribution of the relevant data intuitively as a visual chart, which reduces the analysis workload of business personnel and improves data processing efficiency.
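The ranking and highlighting described for step S150 could be sketched as follows. The disclosure does not specify a chart type, so this example renders a plain-text bar chart as a stand-in for a graphical one; the scores, the threshold, and the function `render_chart` are hypothetical illustrations.

```python
def render_chart(scores, threshold, width=30):
    """Rank characteristic variables by predictive value and draw text bars.

    Variables whose predictive value exceeds `threshold` are marked with '*'.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    # Scale bars against the largest score; fall back to 1.0 to avoid dividing by zero.
    top = ranked[0][1] if ranked and ranked[0][1] > 0 else 1.0
    lines = []
    for name, score in ranked:
        bar = "#" * max(0, round(width * score / top))
        mark = " *" if score > threshold else ""
        lines.append(f"{name:<12}{bar} {score:.3f}{mark}")
    return lines


# Hypothetical predictive values for three characteristic variables.
scores = {"gender": 0.02, "age": 0.35, "education": 0.12}
chart = render_chart(scores, threshold=0.1)
print("\n".join(chart))
```

Sorting first and highlighting second means the reader sees the most informative variables at the top, which matches the stated goal of reducing the analyst's screening work.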
As shown in Fig. 2, building on the above exemplary embodiment, step S120 of filtering out noise variables among the characteristic variables may further include the following steps:
Step S221: acquiring the missing-data rate of each characteristic variable.
Each value of a characteristic variable is in one of two states, null or non-null. A null value (the data was not collected, or the collected data is invalid) is counted as missing, and the missing-data rate = number of missing values / (number of missing values + number of non-null values).
Step S222: marking each characteristic variable whose missing-data rate exceeds a preset threshold as a noise variable.
If the missing-data rate of a characteristic variable exceeds the preset threshold, the variable can be considered to have little reference value and can therefore be marked as a noise variable.
Step S223: filtering out the noise variables among the characteristic variables.
After the noise variables have been marked in step S222, this step filters them out of the characteristic variables, which improves data processing efficiency and keeps the noise variables from interfering with the data processing result.
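Steps S221 to S223 amount to a simple column filter, sketched below. The sketch assumes that `None` and the empty string represent null values and uses an illustrative threshold of 0.5; neither choice is stated in the disclosure, and the function names are hypothetical.

```python
def missing_rate(values):
    """Missing-data rate = missing count / (missing count + non-null count)."""
    total = len(values)
    if total == 0:
        return 0.0  # assumption: an empty column is treated as having no missing data
    missing = sum(1 for v in values if v is None or v == "")
    return missing / total


def filter_by_missing_rate(columns, threshold=0.5):
    """Split columns into kept variables and noise variables (rate above threshold)."""
    kept, noise = {}, []
    for name, values in columns.items():
        if missing_rate(values) > threshold:
            noise.append(name)
        else:
            kept[name] = values
    return kept, noise


# Hypothetical characteristic variables, stored column-wise.
columns = {
    "age": [25, None, 31, 40],           # 1 of 4 missing -> 0.25
    "hobby": [None, None, "golf", None],  # 3 of 4 missing -> 0.75
}
kept, noise = filter_by_missing_rate(columns, threshold=0.5)
print(sorted(kept), noise)
```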
As shown in Fig. 3, in another exemplary embodiment of the present disclosure, step S120 of filtering out noise variables among the characteristic variables may further include the following steps:
Step S321: setting a filtering field according to the target variable.
This step sets one or more filtering fields, according to the operator's requirements and the characteristics of the target variable, for use in filtering the characteristic variables.
Step S322: marking each characteristic variable that contains the filtering field as a noise variable.
If a characteristic variable contains a filtering field that has been set, it can be marked as a noise variable.
Step S323: filtering out the noise variables among the characteristic variables.
After the noise variables have been marked in step S322, this step filters them out of the characteristic variables, which improves data processing efficiency and keeps the noise variables from interfering with the data processing result.
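One possible reading of steps S321 to S323 is sketched below. The disclosure does not say how a characteristic variable "contains" a filtering field, so the sketch assumes a case-insensitive substring match on the variable name; the variable names and the filtering field `departure` are hypothetical.

```python
def filter_by_fields(feature_names, filter_fields):
    """Mark variables whose name contains any filtering field as noise, then drop them.

    Assumption: 'contains' means a case-insensitive substring match on the name.
    """
    fields = [f.lower() for f in filter_fields]
    noise = [n for n in feature_names if any(f in n.lower() for f in fields)]
    kept = [n for n in feature_names if n not in noise]
    return kept, noise


# Hypothetical example: with employment status as the target variable, a column
# recording the departure date would trivially reveal the target, so it is filtered.
names = ["gender", "age", "departure_date", "Departure_Reason"]
kept_names, noise_names = filter_by_fields(names, ["departure"])
print(kept_names, noise_names)
```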
As shown in Fig. 4, in another exemplary embodiment of the present disclosure, step S130 of obtaining the target variable values at different values or within different value ranges of each characteristic variable may further include the following steps:
Step S431: judging whether the target variable is a binary variable.
This step first judges whether the target variable is a binary variable. For example, the employment status of an agent takes one of two values, in service or departed, so that target variable can be judged to be a binary variable.
Step S432: if the target variable is not a binary variable, converting it into a binary variable, wherein the values of the binary variable include a target value and a non-target value.
If the judgment in step S431 is negative, that is, the target variable is not a binary variable, this step converts the target variable. For example, if the target variable is a discrete variable with three values A, B, and C, this step may set A as the target value and B and C as non-target values. As another example, if the target variable is a continuous variable whose values range from D to F, this step may set values in the range from D to E as the target value and values in the range from E to F as the non-target value.
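The two conversions in step S432 can be sketched as below. The disclosure does not say which side the boundary value E belongs to, so the sketch assumes a half-open interval in which E itself counts as non-target; the function names are hypothetical.

```python
def binarize_discrete(values, target_values):
    """True where the value is one of the chosen target values (e.g. {"A"})."""
    return [v in target_values for v in values]


def binarize_continuous(values, low, high):
    """True where low <= value < high (assumption: the upper boundary is non-target)."""
    return [low <= v < high for v in values]


# Discrete target with values A, B, C: A is the target value, B and C are non-target.
discrete_flags = binarize_discrete(["A", "B", "C", "A"], {"A"})

# Continuous target on [0, 10): values in [0, 5) are target, the rest non-target.
continuous_flags = binarize_continuous([1.0, 4.9, 5.0, 9.0], 0.0, 5.0)
print(discrete_flags, continuous_flags)
```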
Step S433: acquiring the number of target values and the number of non-target values at different values or within different value ranges of each characteristic variable.
Similar to step S432, this step may first judge whether each characteristic variable is a discrete variable or a continuous variable. If the characteristic variable is a discrete variable, the number of target values and the number of non-target values are acquired for each of its values. If the characteristic variable is a continuous variable, it is binned by value range, and the number of target values and the number of non-target values are then acquired for each bin.
By judging the types of the target variable and the characteristic variables, the present exemplary embodiment can simplify continuous and discrete variables, thereby reducing the difficulty of data processing and improving data processing efficiency.
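The counting in step S433 can be sketched with one function that handles both variable types. Bin edges are supplied by the caller here; the edges `[23, 28, 33]` reproduce the age groups 18-22, 23-27, 28-32, and over 32 used in the earlier example, while the function name and sample data are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def count_by_group(feature_values, is_target, edges=None):
    """Return {group: [target_count, non_target_count]}.

    Discrete variable: `edges` is None and each distinct value is its own group.
    Continuous variable: `edges` is a sorted list of bin boundaries and the group
    key is the bin index, where bin k holds values v with edges[k-1] <= v < edges[k].
    """
    counts = defaultdict(lambda: [0, 0])
    for value, target in zip(feature_values, is_target):
        key = value if edges is None else bisect_right(edges, value)
        counts[key][0 if target else 1] += 1
    return dict(counts)


# Continuous variable: ages binned as <23, 23-27, 28-32, >=33.
ages = [20, 25, 30, 40, 22]
in_service = [True, False, True, False, False]
age_counts = count_by_group(ages, in_service, edges=[23, 28, 33])

# Discrete variable: one group per distinct value.
gender_counts = count_by_group(["M", "F", "M"], [True, True, False])
print(age_counts, gender_counts)
```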
As a preferred embodiment, step S140 of calculating, from the target variable values, the predictive value of each characteristic variable for the target variable may further include the following:
calculating the predictive value of each characteristic variable for the target variable by the formula
[Formula image BDA0001774138930000091 is not reproduced in this text.]
where S is the predictive value of the characteristic variable for the target variable; i is the index of a value or bin of the characteristic variable; $m_i$ is the number of target values at the i-th value or in the i-th bin of the characteristic variable; $n_i$ is the number of non-target values at the i-th value or in the i-th bin of the characteristic variable; m is the total number of target values in the target variable; and n is the total number of non-target values in the target variable. In other exemplary embodiments of the disclosure, any other method of calculating the predictive value may be used, for example a Kalman filter prediction model or a neural network prediction model; the present disclosure places no particular limitation on this.
The larger the predictive value S of a characteristic variable, the greater the difference in the distribution of target variable values across that characteristic variable; in other words, the stronger the characteristic variable's ability to predict the value of the target variable. Calculating the predictive value S therefore yields the importance distribution of the characteristic variables with respect to the target variable, which effectively helps operators complete data screening, reduces data redundancy, and improves data processing efficiency.
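Because the formula is available only as an image, the following sketch does not claim to reproduce it. It assumes, as one plausible form consistent with the symbols $m_i$, $n_i$, m, and n defined above, the Information Value statistic widely used in scorecard modeling, S = Σ_i (m_i/m − n_i/n) · ln((m_i/m) / (n_i/n)). The smoothing constant `eps` for empty cells and the function name are also assumptions.

```python
import math


def predictive_value(counts, eps=0.5):
    """Information-Value-style score from {group: [m_i, n_i]} counts.

    ASSUMPTION: this is the standard Information Value form, not a formula
    confirmed by the source text. Zero counts are replaced by `eps` so that
    the logarithm stays defined.
    """
    m = sum(mi for mi, _ in counts.values())  # total target values
    n = sum(ni for _, ni in counts.values())  # total non-target values
    if m == 0 or n == 0:
        raise ValueError("both target and non-target values must be present")
    score = 0.0
    for mi, ni in counts.values():
        p = max(mi, eps) / m  # share of target values falling in this group
        q = max(ni, eps) / n  # share of non-target values falling in this group
        score += (p - q) * math.log(p / q)
    return score


# Identical distributions carry no predictive information.
flat = predictive_value({"a": [10, 20], "b": [10, 20]})
# Strongly different distributions give a larger score.
skewed = predictive_value({"a": [30, 10], "b": [10, 30]})
print(flat, skewed)
```

Under this assumed form every term is non-negative, so S grows as the target and non-target distributions diverge, which agrees with the interpretation of S given above.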
It should be noted that while the above exemplary embodiments describe the steps of the methods in this disclosure in a particular order, this does not require or imply that the steps must be performed in that particular order or that all of the steps must be performed in order to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform, etc.
In an exemplary embodiment of the present disclosure, there is also provided a data processing apparatus. As shown in Fig. 5, the data processing apparatus 50 may mainly include a data reading module 51, a variable value module 52, a value prediction module 53, and a chart generation module 54. The data reading module 51 is configured to read a database according to a selected target variable, wherein the database includes a plurality of characteristic variables associated with the target variable. The variable value module 52 is configured to obtain, for each characteristic variable, the values taken by the target variable at different values or within different value ranges of that characteristic variable. The value prediction module 53 is configured to calculate, from those target variable values, the predictive value of each characteristic variable for the target variable. The chart generation module 54 is configured to generate, according to the predictive values, a visual chart that displays the characteristic variables in a differentiated manner.
The details of the data processing apparatus are described in detail in the corresponding data processing method, and thus are not described herein.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
In an exemplary embodiment of the present disclosure, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, can implement the above-described data processing method of the present disclosure. In some possible implementations, aspects of the disclosure may also be implemented in the form of a program product including program code; the program product may be stored on a non-volatile storage medium (which may be a CD-ROM, a U-disk or a removable hard disk, etc.) or on a network; when the program product is run on a computing device (which may be a personal computer, a server, a terminal device or a network device, etc.), the program code is for causing the computing device to carry out the method steps in the above-mentioned exemplary embodiments of the present disclosure.
Referring to Fig. 6, a program product 60 for implementing the above-described methods according to embodiments of the present disclosure may take the form of a portable compact disc read-only memory (CD-ROM) that includes program code, and may run on a computing device (e.g., a personal computer, a server, a terminal device, or a network device). However, the program product of the present disclosure is not limited thereto. In the present exemplary embodiment, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may take the form of any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium.
The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java or C++ and conventional procedural programming languages such as the C programming language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's computing device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or it may be connected to an external computing device, for example through the Internet using an Internet service provider.
In an exemplary embodiment of the present disclosure, there is also provided an electronic device including at least one processor and at least one memory for storing executable instructions of the processor; wherein the processor is configured to perform the method steps in the above-described exemplary embodiments of the present disclosure via execution of the executable instructions.
An electronic device 700 in the present exemplary embodiment is described below with reference to fig. 7. The electronic device 700 is merely an example and should not be construed to limit the functionality and scope of use of the disclosed embodiments.
Referring to fig. 7, the electronic device 700 is embodied in the form of a general purpose computing device. Components of electronic device 700 may include, but are not limited to: at least one processing unit 710, at least one storage unit 720, a bus 730 connecting the different system components (including the processing unit 710 and the storage unit 720), and a display unit 740.
The storage unit 720 stores program code executable by the processing unit 710, such that the processing unit 710 performs the method steps in the above-described exemplary embodiments of the present disclosure.
The storage unit 720 may include readable media in the form of volatile memory, such as a random access memory unit 721 (RAM) and/or a cache memory unit 722, and may further include a read only memory unit 723 (ROM).
The storage unit 720 may also include a program/utility 724 having a set (at least one) of program modules 725, including but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
Bus 730 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, and a processing unit bus or local bus using any of a variety of bus architectures.
The electronic device 700 may also communicate with one or more external devices 800 (e.g., keyboard, pointing device, Bluetooth device, etc.), with one or more devices that allow a user to interact with the electronic device 700, and/or with any device (e.g., router, modem, etc.) that allows the electronic device 700 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 750. In addition, electronic device 700 may communicate with one or more networks, such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet, through network adapter 760. As shown in fig. 7, network adapter 760 may communicate with other modules of electronic device 700 via bus 730. It should be appreciated that, although not shown, other hardware and/or software modules may be used in connection with electronic device 700, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
Those skilled in the art will appreciate that the various aspects of the present disclosure may be implemented as a system, method, or program product. Accordingly, various aspects of the disclosure may be embodied in the following forms: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, any of which may be referred to herein as a "circuit," "module," or "system."
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure that follow the general principles of the disclosure and include such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.
The features, structures, or characteristics described above may be combined in any suitable manner in one or more embodiments, and, where possible, the features discussed in connection with the various embodiments are interchangeable. In the above description, numerous specific details are provided to give a thorough understanding of embodiments of the present disclosure. One skilled in the relevant art will recognize, however, that the disclosed aspects may be practiced without one or more of the specific details, or with other methods, components, materials, and so forth. In other instances, well-known structures, materials, or operations are not shown or described in detail to avoid obscuring aspects of the disclosure.

Claims (7)

1. A method of data processing, comprising:
reading a database according to a selected target variable, wherein the database includes a plurality of characteristic variables associated with the target variable;
acquiring, for each characteristic variable, the distribution of target variable values at different values or in different value ranges of that characteristic variable, comprising:
judging whether the target variable is a binary variable and, if it is not, converting the target variable into a binary variable, wherein the values of the binary variable comprise target values and non-target values;
judging whether each characteristic variable is a discrete variable or a continuous variable;
if the characteristic variable is a discrete variable, acquiring the number of target values and the number of non-target values at each value of the discrete variable;
if the characteristic variable is a continuous variable, binning the continuous variable according to different value ranges, and then acquiring the number of target values and the number of non-target values in each bin of the continuous variable;
calculating, from the target variable value distribution, the predictive value of each characteristic variable for the target variable, comprising:
by the formula
Figure QLYQS_1
calculating the predictive value of each characteristic variable for the target variable;
wherein S is the predictive value of a characteristic variable for the target variable, i is the index of a value or of a bin of the characteristic variable, m_i is the number of target values at the i-th value or in the i-th bin of the characteristic variable, n_i is the number of non-target values at the i-th value or in the i-th bin of the characteristic variable, m is the total number of target values in the target variable, and n is the total number of non-target values in the target variable;
and generating, according to the predictive values, a visual chart that displays the characteristic variables in a differentiated manner.
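As an illustration only, the counting steps recited in claim 1 might be sketched in Python as follows. The function names, the equal-width binning rule, and the default of four bins are assumptions made for this sketch and are not taken from the claim. The formula shown in Figure QLYQS_1 is not reproduced in this text, so the sketch stops at the counts m_i, n_i, m, and n that the formula takes as inputs.

```python
from collections import defaultdict


def binarize(target_values, target_value):
    """Convert a raw target column to 1 (target value) / 0 (non-target value)."""
    return [1 if v == target_value else 0 for v in target_values]


def equal_width_bin(x, lo, hi, n_bins):
    """Return the 0-based index of the equal-width bin containing x (assumed rule)."""
    if hi == lo:
        return 0
    idx = int((x - lo) / (hi - lo) * n_bins)
    return min(idx, n_bins - 1)  # x == hi falls into the last bin


def count_by_group(feature_values, binary_targets, continuous=False, n_bins=4):
    """Return ({group: (m_i, n_i)}, m, n) for one characteristic variable.

    A group is a distinct value (discrete variable) or a bin index
    (continuous variable). Assumes the column has no missing entries.
    """
    if not feature_values:
        return {}, 0, 0
    if continuous:
        lo, hi = min(feature_values), max(feature_values)
        key = lambda f: equal_width_bin(f, lo, hi, n_bins)
    else:
        key = lambda f: f
    counts = defaultdict(lambda: [0, 0])  # [target count, non-target count]
    for f, t in zip(feature_values, binary_targets):
        counts[key(f)][0 if t == 1 else 1] += 1
    m = sum(c[0] for c in counts.values())
    n = sum(c[1] for c in counts.values())
    return {k: tuple(v) for k, v in counts.items()}, m, n
```

A caller would first pass the raw target column through `binarize`, then call `count_by_group` once per characteristic variable, with `continuous=True` for numeric columns that need binning.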
2. The data processing method of claim 1, wherein after reading the database according to the selected target variable, the method further comprises:
filtering out noise variables among the characteristic variables.
3. The data processing method of claim 2, wherein filtering out noise variables among the characteristic variables comprises:
acquiring the data missing rate of each characteristic variable;
marking each characteristic variable whose data missing rate exceeds a preset threshold as a noise variable;
and filtering out the noise variables among the characteristic variables.
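The missing-rate filter of claim 3 might be sketched as follows. This is a minimal Python illustration in which the column-dictionary layout, the treatment of `None` and empty strings as missing, and the default threshold of 0.5 are all assumptions rather than details given in the claim.

```python
def missing_rate(values):
    """Fraction of entries that are missing (None or empty string, by assumption)."""
    if not values:
        return 1.0  # an empty column is treated as entirely missing
    missing = sum(1 for v in values if v is None or v == "")
    return missing / len(values)


def drop_high_missing(columns, threshold=0.5):
    """Keep only the columns whose missing rate does not exceed the threshold.

    `columns` maps a characteristic-variable name to its list of values.
    """
    return {
        name: vals
        for name, vals in columns.items()
        if missing_rate(vals) <= threshold
    }
```

The comparison uses "exceeds" as in the claim, so a column whose missing rate equals the threshold is kept.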
4. The data processing method of claim 2, wherein filtering out noise variables among the characteristic variables comprises:
setting a filtering field according to the target variable;
marking each characteristic variable that contains the filtering field as a noise variable;
and filtering out the noise variables among the characteristic variables.
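One possible reading of claim 4 is that a characteristic variable "contains" the filtering field when the field appears as a substring of the variable's name. Under that assumption (the claim does not say how containment is tested, and the function name and case-insensitive matching are likewise hypothetical), the filter might be sketched as:

```python
def drop_by_filter_fields(feature_names, filter_fields):
    """Return the feature names that contain none of the filtering fields.

    Matching is a case-insensitive substring test on the variable name,
    which is an assumption made for this sketch.
    """
    lowered = [field.lower() for field in filter_fields]
    return [
        name
        for name in feature_names
        if not any(field in name.lower() for field in lowered)
    ]
```

For example, with a target variable about claims, a filtering field such as "claim" would remove variables like `claim_amount` that are recorded only after the target outcome is known.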
5. A data processing apparatus, comprising:
a data reading module configured to read a database according to a selected target variable, wherein the database includes a plurality of characteristic variables associated with the target variable;
a variable value acquisition module configured to acquire, for each characteristic variable, the distribution of target variable values at different values or in different value ranges of that characteristic variable, comprising:
judging whether the target variable is a binary variable and, if it is not, converting the target variable into a binary variable, wherein the values of the binary variable comprise target values and non-target values;
judging whether each characteristic variable is a discrete variable or a continuous variable;
if the characteristic variable is a discrete variable, acquiring the number of target values and the number of non-target values at each value of the discrete variable;
if the characteristic variable is a continuous variable, binning the continuous variable according to different value ranges, and then acquiring the number of target values and the number of non-target values in each bin of the continuous variable;
a value prediction module configured to calculate, from the target variable value distribution, the predictive value of each characteristic variable for the target variable, comprising:
by the formula
Figure QLYQS_2
calculating the predictive value of each characteristic variable for the target variable;
wherein S is the predictive value of a characteristic variable for the target variable, i is the index of a value or of a bin of the characteristic variable, m_i is the number of target values at the i-th value or in the i-th bin of the characteristic variable, n_i is the number of non-target values at the i-th value or in the i-th bin of the characteristic variable, m is the total number of target values in the target variable, and n is the total number of non-target values in the target variable;
and a chart generation module configured to generate, according to the predictive values, a visual chart that displays the characteristic variables in a differentiated manner.
6. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the data processing method of any of claims 1-4.
7. An electronic device, comprising:
a processor;
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the data processing method of any of claims 1-4 via execution of the executable instructions.
CN201810962547.8A 2018-08-22 2018-08-22 Data processing method and device, storage medium and electronic equipment Active CN110019172B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810962547.8A CN110019172B (en) 2018-08-22 2018-08-22 Data processing method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810962547.8A CN110019172B (en) 2018-08-22 2018-08-22 Data processing method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN110019172A CN110019172A (en) 2019-07-16
CN110019172B true CN110019172B (en) 2023-05-30

Family

ID=67188400

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810962547.8A Active CN110019172B (en) 2018-08-22 2018-08-22 Data processing method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN110019172B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113488184B (en) * 2021-07-07 2023-09-22 天津开心生活科技有限公司 Method and device for inputting data, computer readable storage medium and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005122509A (en) * 2003-10-17 2005-05-12 Hitachi Ltd Program, system and method for analyzing hierarchical structure data
CN107369095A (en) * 2017-06-15 2017-11-21 阿里巴巴集团控股有限公司 A kind of data processing method of vehicle insurance business, apparatus and system
CN107590735A (en) * 2017-09-04 2018-01-16 深圳市华傲数据技术有限公司 Data digging method and device for credit evaluation
WO2018080522A1 (en) * 2016-10-28 2018-05-03 Hewlett-Packard Development Company, L.P. Target class feature model
US10025813B1 (en) * 2017-04-13 2018-07-17 Sas Institute Inc. Distributed data transformation system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170084187A1 (en) * 2011-11-21 2017-03-23 Pulsar Informatics, Inc. Systems and methods for improved scoring on stimulus-response tests
US10394871B2 (en) * 2016-10-18 2019-08-27 Hartford Fire Insurance Company System to predict future performance characteristic for an electronic record

Also Published As

Publication number Publication date
CN110019172A (en) 2019-07-16

Similar Documents

Publication Publication Date Title
CN109360012B (en) Advertisement delivery channel selection method and device, storage medium and electronic equipment
CA2935281C (en) A multidimensional recursive learning process and system used to discover complex dyadic or multiple counterparty relationships
CN102117443A (en) Analyzing anticipated value and effort in using cloud computing to process a specified workload
CN110738527A (en) feature importance ranking method, device, equipment and storage medium
CN113837596B (en) Fault determination method and device, electronic equipment and storage medium
CN109559054B (en) Electric power engineering construction information processing system
Kim Considerations for generating meaningful HRA data: Lessons learned from HuREX data collection
CN115423429A (en) Multimode integrated distribution network operation system based on image and sound information
CN110019172B (en) Data processing method and device, storage medium and electronic equipment
CN115034596A (en) Risk conduction prediction method, device, equipment and medium
CN111179051A (en) Financial target customer determination method and device and electronic equipment
CN112488865A (en) Financial risk prediction method and device based on financial time nodes and electronic equipment
CN112860672A (en) Method and device for determining label weight
CN117314347A (en) Project management method, system, terminal equipment and storage medium
CN111815435A (en) Visualization method, device, equipment and storage medium for group risk characteristics
CN114862282B (en) Business and financial cooperative management method and system based on data analysis
CN115796665A (en) Multi-index carbon efficiency grading evaluation method and device for green energy power generation project
CN113298120B (en) Fusion model-based user risk prediction method, system and computer equipment
US20100100410A1 (en) Systems and Methods for Ecological Evaluation and Analysis of an Enterprise
CN114511174A (en) Service index map construction method and device
Decker et al. The Thousand Faces of Explainable AI Along the Machine Learning Life Cycle: Industrial Reality and Current State of Research
CN112419025A (en) User data processing method and device, storage medium and electronic equipment
CN111062816B (en) Account asset supervision method and device
CN116703614A (en) Vehicle risk wind control method, device, electronic equipment and computer readable medium
CN110458473B (en) Dynamic decision analysis method and terminal for electric billboard

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant