CN111414398A - Data analysis model determination method and device and storage medium - Google Patents
- Publication number
- CN111414398A (application number CN202010110683.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- data set
- analysis
- stationary
- analysis model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2462—Approximate or statistical queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2474—Sequence data queries, e.g. querying versioned data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/08—Insurance
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- Probability & Statistics with Applications (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Fuzzy Systems (AREA)
- Accounting & Taxation (AREA)
- Finance (AREA)
- Development Economics (AREA)
- Economics (AREA)
- Marketing (AREA)
- Strategic Management (AREA)
- Technology Law (AREA)
- General Business, Economics & Management (AREA)
- Tests Of Electronic Circuits (AREA)
- Automatic Analysis And Handling Materials Therefor (AREA)
- Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
Abstract
The invention relates to a data processing technology, and discloses a data analysis model determining method, which comprises the following steps: acquiring a time series data set; performing stationarity check on the time series data set to obtain an analysis data set, wherein the analysis data set comprises a stationary data set in the time series data set and a non-stationary data set in the time series data set; judging whether the non-stationary data set contains other stationary data; if the non-stationary data set contains other stationary data, performing fitting optimization on the stationary data set and the other stationary data contained in the non-stationary data set to generate a data analysis model; and acquiring an original data set to be analyzed, and analyzing and calculating the original data set by using the data analysis model to obtain an analysis result. The invention also provides a data analysis model determining device, electronic equipment and a storage medium. The method can improve the stability of the data analysis model and is beneficial to improving the accuracy of data analysis.
Description
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method and an apparatus for determining a data analysis model, an electronic device, and a readable storage medium.
Background
Currently, data for product pricing is analyzed, for example, through calculations performed by an actuary or with the help of a pre-constructed model. In the prior art, most evaluation models for data analysis adopt a traditional prediction algorithm or data classification calculation. Although these can achieve the purposes of data analysis and evaluation, the prediction algorithm or data classification calculation is too simplistic and lacks the necessary market data as support, so the stability of the evaluation model is low, which in turn leads to low accuracy of data analysis by the evaluation model.
Disclosure of Invention
The invention provides a data analysis model determining method and device, electronic equipment and a computer readable storage medium, and mainly aims to improve the stability of a data analysis model and improve the accuracy of data analysis.
In order to achieve the above object, the present invention provides a method for determining a data analysis model, comprising:
acquiring a time series data set;
performing stationarity check on the time series data set to obtain an analysis data set, wherein the analysis data set comprises a stationary data set in the time series data set and a non-stationary data set in the time series data set;
judging whether the non-stationary data set contains other stationary data;
if the non-stationary data set contains other stationary data, performing fitting optimization on the stationary data set and the other stationary data contained in the non-stationary data set to generate a data analysis model;
and acquiring an original data set to be analyzed, and analyzing and calculating the original data set by using the data analysis model to obtain an analysis result of a preset type of the original data set to be analyzed.
Optionally, the performing stationarity check on the time series data set to obtain an analysis data set includes:
performing a stationarity check on the time series data set by using an analysis function;
and carrying out stationarity classification on the result of the stationarity check to obtain an analysis data set.
Optionally, the determining whether the non-stationary data set contains other stationary data includes:
carrying out differential calculation on data contained in the non-stationary data set by using a differential function;
judging whether the difference calculation result contains a non-empty stationary data subset;
if so, determining that the non-stationary data set contains other stationary data, and determining that the stationary data subset is other stationary data.
Optionally, the performing fitting optimization on the stationary data set and other stationary data included in the non-stationary data set to generate a data analysis model includes:
performing a fitting calculation on the stationary data set and the other stationary data contained in the non-stationary data set by using a linear regression function to obtain a fitting data set;
mapping the fitted data set to the linear regression function to generate an analysis function;
carrying out logarithmic calculation on the analysis function to obtain a likelihood function;
and combining the likelihood function and the analysis function to generate a data analysis model.
Optionally, the linear regression function is:
wherein z represents data in the stationary data set and other stationary data included in the non-stationary data set, and the value range of g(z) is the interval [0, 1].
Optionally, the obtaining of the original data set to be analyzed, and performing analysis calculation on the original data set by using the data analysis model to obtain an analysis result of a preset type of the original data set to be analyzed includes:
acquiring an original data set to be analyzed, wherein the original data set comprises insurance policy data of a user and historical insurance policy data of the user;
analyzing and calculating the insurance policy data and the historical insurance policy data by using the data analysis model to obtain an insurance policy analysis set and a historical insurance policy analysis set;
and combining the insurance policy analysis set and the historical insurance policy analysis set by using a data warehouse scheme to obtain an insurance premium analysis result of the user.
Optionally, the performing, by using the data analysis model, analysis and calculation on the insurance policy data and the historical insurance policy data respectively includes:
calculating the insurance policy data by using the likelihood function in the data analysis model to generate an insurance policy analysis set, wherein the insurance policy analysis set comprises premium analysis and premium occurrence probability;
and calculating the historical policy data by using a likelihood function in the data analysis model to generate a historical policy analysis set, wherein the historical policy analysis set comprises historical premium analysis and historical premium occurrence probability.
In order to solve the above problem, the present invention also provides a data analysis model determination apparatus, including:
the data acquisition module is used for acquiring a time series data set;
the data inspection module is used for carrying out stationarity inspection on the time sequence data set to obtain an analysis data set, wherein the analysis data set comprises a stationary data set in the time sequence data set and a non-stationary data set in the time sequence data set;
the data judgment module is used for judging whether the non-stationary data set contains other stationary data;
the model calculation module is used for fitting and optimizing the stationary data set and other stationary data contained in the non-stationary data set to generate a data analysis model if the non-stationary data set contains other stationary data;
the data acquisition module is also used for acquiring an original data set to be analyzed;
the model calculation module is further configured to perform analysis calculation on the original data set by using the data analysis model to obtain a preset type of analysis result on the original data set to be analyzed.
In order to solve the above problem, the present invention also provides an electronic device, including:
a memory storing at least one instruction; and
a processor executing instructions stored in the memory to implement the data analysis model determination method of any of the above.
In order to solve the above problem, the present invention further provides a computer-readable storage medium, in which at least one instruction is stored, and the at least one instruction is executed by a processor in an electronic device to implement the data analysis model determination method described in any one of the above.
In the embodiment of the invention, a time series data set is acquired; a stationarity check is performed on the time series data set to obtain an analysis data set, wherein the analysis data set comprises a stationary data set in the time series data set and a non-stationary data set in the time series data set; whether the non-stationary data set contains other stationary data is judged; and if the non-stationary data set contains other stationary data, fitting optimization is performed on the stationary data set and the other stationary data contained in the non-stationary data set to generate a data analysis model. The stationarity check on the time series data set, together with the secondary judgment of whether the non-stationary data set contains other stationary data, effectively improves the stationarity and accuracy of the data used to build the model and eliminates the influence of redundant data on the construction of the data analysis model, which further improves the accuracy of the model's data analysis. Analyzing and calculating the original data set to be analyzed with the data analysis model then yields an analysis result of a preset type for that data set, achieving the aim of improving the accuracy of data analysis by the data analysis model.
Drawings
Fig. 1 is a schematic flow chart of a data analysis model determination method according to an embodiment of the present invention;
Fig. 2 is a block diagram of a data analysis model determination apparatus according to an embodiment of the present invention;
Fig. 3 is a schematic internal structural diagram of an electronic device implementing a data analysis model determination method according to an embodiment of the present invention.
The objects, features and advantages of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Fig. 1 is a schematic flow chart of a data analysis model determination method according to an embodiment of the present invention. The method may be performed by an apparatus, which may be implemented by software and/or hardware.
In this embodiment, the data analysis model determining method includes:
S1, acquiring a time series data set. In an alternative embodiment, the time series data set may be obtained from a network channel sales platform.
The time sequence data set comprises data corresponding to different times and different time points. For example, the time series data set includes policy data for sample users and non-policy data for sample users, as well as other policy-related data and the like. The policy data includes, but is not limited to, historical policy data and policy data; the non-policy data comprises identity information of profession, age, gender and the like of the sample user; the other policy related data includes collected market policy research data, and policy related statistical data for each insurance company.
S2, performing a stationarity check on the time series data set to obtain an analysis data set, wherein the analysis data set comprises a stationary data set in the time series data set and a non-stationary data set in the time series data set.
The stationarity check examines whether the data sequences in the time series data set change over time.
Further, performing stationarity test on the time series data set to obtain an analysis data set includes:
performing stationarity check on the time sequence data set by using an analysis function;
and carrying out stationarity classification on the result of the stationarity test to obtain an analysis data set, wherein the analysis data set comprises a stationary data set and a non-stationary data set.
In this embodiment, the formula of the analysis function may be:
$F_{t_1,t_2,\ldots,t_m}(x_1,x_2,\ldots,x_m) = F_{t_1+\tau,\,t_2+\tau,\,\ldots,\,t_m+\tau}(x_1,x_2,\ldots,x_m)$
wherein x represents a data value in the sample data set, t represents time, and τ represents a preset probability factor.
For example, the time series data set of the sample users includes the data of Zhang San, specifically: name Wang Wu, sex male, age 20, annual income 200,000, insurance scheme A101, and insurance price 5000. Performing the stationarity check on Zhang San's data with the analysis function yields a stationary data set and a non-stationary data set, wherein the stationary data set comprises the data of the name, sex, and age items, and the non-stationary data set comprises the data of the annual income, insurance scheme, and insurance price items.
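The classification into stationary and non-stationary items can be illustrated with a minimal Python sketch. It does not implement the analysis function above: as an assumption for illustration, it uses a simple heuristic that compares the mean and variance of the two halves of each series. The function names, the tolerance `tol`, and the sample values are hypothetical.

```python
from statistics import mean, pvariance


def is_roughly_stationary(series, tol=0.25):
    """Heuristic stand-in: mean and variance of both halves must be close."""
    if len(series) < 4:
        raise ValueError("need at least 4 observations")
    half = len(series) // 2
    first, second = series[:half], series[half:]
    # Relative shift of the mean between the two halves.
    scale = max(abs(mean(series)), 1e-12)
    mean_shift = abs(mean(first) - mean(second)) / scale
    # Relative shift of the variance between the two halves.
    var_first, var_second = pvariance(first), pvariance(second)
    var_scale = max(var_first, var_second, 1e-12)
    var_shift = abs(var_first - var_second) / var_scale
    return mean_shift <= tol and var_shift <= tol


def split_by_stationarity(columns, tol=0.25):
    """Split named series into (stationary, non_stationary) dictionaries."""
    stationary, non_stationary = {}, {}
    for name, series in columns.items():
        target = stationary if is_roughly_stationary(series, tol) else non_stationary
        target[name] = series
    return stationary, non_stationary


# Hypothetical observations of two items over six time points.
columns = {
    "age": [20, 20, 20, 20, 20, 20],
    "annual_income": [10, 12, 14, 16, 18, 20],
}
stationary_set, non_stationary_set = split_by_stationarity(columns)
print(sorted(stationary_set), sorted(non_stationary_set))
```

In practice a statistical unit-root test would likely replace the half-split heuristic; the sketch only shows how the check result drives the split into the two data sets.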
S3, judging whether the non-stationary data set contains other stationary data.
In this embodiment, the determining whether the non-stationary data set contains other stationary data includes:
and judging whether the non-stationary data set contains other stationary data or not by carrying out differential calculation on the non-stationary data set.
Further, the determining whether the non-stationary data set contains other stationary data includes:
carrying out differential calculation on data contained in the non-stationary data set by using a differential function;
judging whether the difference calculation result contains a non-empty stationary data subset;
if so, determining that the non-stationary data set contains other stationary data, and determining that the stationary data subset is other stationary data.
Optionally, the difference function includes a first order difference function and a second order difference function, and the first order difference function is used for subtracting the value of the previous item from each item of the data in the non-stationary data set; the second order difference function is used for carrying out a difference again on the basis of the first order difference function.
Differential calculation is carried out on the non-stationary data set through the first-order difference function and the second-order difference function to obtain a stationary data subset and a non-stationary data subset within the non-stationary data set, that is, the stationary sequence items and the non-stationary sequence items.
By checking a second time whether the non-stationary data set contains stationary data, more comprehensive stationary data is obtained.
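The first-order and second-order differencing described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the helper names are hypothetical, and "constant after differencing" is used as a deliberately crude proxy for stationarity.

```python
def first_difference(series):
    """Subtract the previous item from each item (first-order difference)."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


def second_difference(series):
    """Difference the first-order differences once more."""
    return first_difference(first_difference(series))


def is_constant(series, eps=1e-9):
    """Crude stationarity proxy for this sketch: the series does not vary."""
    return len(series) > 0 and max(series) - min(series) <= eps


def stationary_after_differencing(series):
    """Return the lowest order (1 or 2) whose difference is constant, else None."""
    for order, diff in ((1, first_difference), (2, second_difference)):
        if is_constant(diff(series)):
            return order
    return None


linear_trend = [10, 12, 14, 16]      # constant after one difference
quadratic_trend = [1, 4, 9, 16]      # constant after two differences
print(first_difference(quadratic_trend))   # [3, 5, 7]
print(second_difference(quadratic_trend))  # [2, 2]
print(stationary_after_differencing(linear_trend))
print(stationary_after_differencing(quadratic_trend))
```

A series for which neither difference order passes the check would remain in the non-stationary subset.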
S4, if the non-stationary data set contains other stationary data, fitting and optimizing the stationary data set and the other stationary data contained in the non-stationary data set to generate a data analysis model.
The embodiment of the invention can carry out fitting optimization through a curve fitting method.
In detail, the S4 includes:
performing a fitting calculation on the stationary data set and the other stationary data contained in the non-stationary data set by using a linear regression function to obtain a fitting data set;
mapping the fitted data set to the linear regression function to generate an analysis function;
carrying out logarithmic derivation on the analysis function to obtain a likelihood function;
and combining the likelihood function and the analysis function to generate a data analysis model.
In detail, the linear regression function is:
wherein z represents data in the stationary data set and other stationary data contained in the non-stationary data set, and g(z) has a value range of [0, 1].
For example, the data in the stationary data set and the other stationary data contained in the non-stationary data set (for instance, the data of three features: the annual income item, the insurance scheme item, and the insurance price item) are calculated through the linear regression function, and a fitting data set containing feature probabilities is output. The feature probabilities that satisfy a preset linear relation are mapped to the linear regression function to obtain an analysis function containing feature values. Logarithmic derivation is then performed on the analysis function to obtain a likelihood function, and the likelihood function and the analysis function are combined based on the preset linear relation to generate a data analysis model.
For example, the data of Zhang San's three features (the annual income item, the insurance scheme item, and the insurance price item) are calculated through the linear regression function to obtain Zhang San's fitting data set, which contains the feature probabilities of these three features. The feature probabilities that satisfy a preset linear relation are mapped to the linear regression function to obtain an analysis function containing Zhang San's feature values. Logarithmic derivation is then performed on the analysis function to obtain a likelihood function, and the likelihood function and the analysis function are combined based on the preset linear relation to generate a data analysis model.
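The fitting steps can be illustrated with a minimal Python sketch. The formula of g(z) is not reproduced in the text, so the sketch assumes, purely for illustration, the logistic sigmoid g(z) = 1/(1 + e^(-z)), whose values lie between 0 and 1, and fits it by gradient ascent on a log-likelihood. The function names, learning rate, step count, and toy data are hypothetical.

```python
import math


def sigmoid(z):
    """Assumed form of g(z): the logistic function, mapping real z into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(weights, bias, features):
    """Analysis function in this sketch: g(w . x + b)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


def log_likelihood(weights, bias, rows, labels):
    """Sum of log-probabilities of the observed 0/1 labels."""
    eps = 1e-12
    total = 0.0
    for x, y in zip(rows, labels):
        p = min(max(predict(weights, bias, x), eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def fit(rows, labels, lr=0.1, steps=500):
    """Gradient ascent on the log-likelihood; returns (weights, bias)."""
    weights, bias = [0.0] * len(rows[0]), 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(rows, labels):
            err = y - predict(weights, bias, x)  # gradient of log-likelihood w.r.t. z
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        weights = [w + lr * g for w, g in zip(weights, grad_w)]
        bias += lr * grad_b
    return weights, bias


# Toy data: one scaled feature per row and a binary outcome.
rows = [[0.0], [1.0], [2.0], [3.0]]
labels = [0, 0, 1, 1]
weights, bias = fit(rows, labels)
print(log_likelihood([0.0], 0.0, rows, labels), log_likelihood(weights, bias, rows, labels))
```

Here the fitted `predict` plays the role of the analysis function and `log_likelihood` the role of the likelihood function; how the two are combined under the preset linear relation is not specified in the text and is left out of the sketch.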
S5, obtaining an original data set to be analyzed, and analyzing and calculating the original data set by using the data analysis model to obtain an analysis result of a preset type of the original data set to be analyzed. In detail, the S5 includes:
acquiring an original data set to be analyzed, wherein the original data set comprises insurance policy data of a user and historical insurance policy data of the user;
analyzing and calculating the insurance policy data and the historical insurance policy data by using the data analysis model to obtain an insurance policy analysis set and a historical insurance policy analysis set;
and combining the insurance policy analysis set and the historical insurance policy analysis set by using a data warehouse scheme to obtain an insurance premium analysis result of the user.
Further, the analyzing and calculating the insurance policy data and the historical insurance policy data by using the data analysis model respectively comprises:
calculating the insurance policy data by using the likelihood function in the data analysis model to generate an insurance policy analysis set, wherein the insurance policy analysis set comprises premium analysis and premium occurrence probability;
and calculating the historical policy data by using a likelihood function in the data analysis model to generate a historical policy analysis set, wherein the historical policy analysis set comprises historical premium analysis and historical premium occurrence probability.
In the data warehouse scheme, data conversion and data integration are required before data is loaded, so that the loaded data is unified under a data model, and association of multiple data types is realized according to operations such as matching and retention.
In the embodiment of the invention, the insurance premium analysis, the insurance premium occurrence probability, the historical insurance premium analysis and the historical insurance premium occurrence probability are combined through the data warehouse scheme to obtain the insurance premium analysis result of the user.
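The combination step can be sketched as a keyed join of the two analysis sets under one unified record layout. This is only an illustration of the matching idea, not a data warehouse implementation; the record layout, field names, and sample values are hypothetical.

```python
def combine_analysis_sets(current, historical):
    """Join current and historical analysis records on user id.

    Each input maps user_id -> {"premium": float, "probability": float}.
    Only users present in both sets are kept (matching); the output record
    uses one unified layout for both sources.
    """
    combined = {}
    for user_id in current.keys() & historical.keys():
        cur, hist = current[user_id], historical[user_id]
        combined[user_id] = {
            "premium": cur["premium"],
            "premium_probability": cur["probability"],
            "historical_premium": hist["premium"],
            "historical_premium_probability": hist["probability"],
        }
    return combined


# Hypothetical analysis sets for two users; only u1 has history.
current_set = {
    "u1": {"premium": 5000.0, "probability": 0.8},
    "u2": {"premium": 3200.0, "probability": 0.6},
}
historical_set = {"u1": {"premium": 4800.0, "probability": 0.7}}
result = combine_analysis_sets(current_set, historical_set)
print(result)
```

Keeping only matched users is one possible design choice; a scheme that retains unmatched records with empty historical fields would be an equally plausible reading of "matching and retention".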
In the method of the embodiment of the invention, a time series data set is acquired; a stationarity check is performed on the time series data set to obtain an analysis data set, wherein the analysis data set comprises a stationary data set in the time series data set and a non-stationary data set in the time series data set; whether the non-stationary data set contains other stationary data is judged; and if the non-stationary data set contains other stationary data, fitting optimization is performed on the stationary data set and the other stationary data contained in the non-stationary data set to generate a data analysis model. The stationarity check on the time series data set, together with the secondary judgment of whether the non-stationary data set contains other stationary data, effectively improves the stationarity and accuracy of the data used to build the model and eliminates the influence of redundant data on the construction of the data analysis model, which improves the accuracy of the model's data analysis. Analyzing and calculating the original data set to be analyzed with the data analysis model then yields an analysis result of a preset type for that data set, achieving the aim of improving the accuracy of data analysis by the data analysis model.
Fig. 2 is a functional block diagram of the data analysis model determination device according to the present invention.
The data analysis model determination apparatus 100 according to the present invention may be installed in an electronic device. According to the realized functions, the data analysis model determination device may include a data acquisition module 101, a data verification module 102, a data judgment module 103, and a model calculation module 104. A module according to the present invention, which may also be referred to as a unit, refers to a series of computer program segments that can be executed by a processor of an electronic device and that can perform a fixed function, and that are stored in a memory of the electronic device.
In the present embodiment, the functions regarding the respective modules/units are as follows:
the data acquisition module 101 is configured to acquire a time series data set;
the data checking module 102 is configured to perform stationarity checking on the time series data set to obtain an analysis data set, where the analysis data set includes a stationary data set in the time series data set and a non-stationary data set in the time series data set;
the data judgment module 103 is configured to determine whether the non-stationary data set includes other stationary data;
the model calculation module 104 is configured to, if the non-stationary data set includes other stationary data, perform fitting optimization on the stationary data set and the other stationary data included in the non-stationary data set to generate a data analysis model;
the data acquisition module 101 is further configured to acquire an original data set to be analyzed;
the model calculation module 104 is further configured to perform analysis calculation on the raw data set by using the data analysis model to obtain a preset type of analysis result on the raw data set to be analyzed.
In detail, the specific implementation steps of each module of the data analysis model determination device are as follows:
the data acquisition module 101 acquires a time series data set.
In an alternative embodiment, the time series data set may be obtained from a network channel sales platform.
The time sequence data set comprises data corresponding to different times and different time points. For example, the time series data set includes policy data for sample users and non-policy data for sample users, as well as other policy-related data and the like. The policy data includes, but is not limited to, historical policy data and policy data; the non-policy data comprises identity information of profession, age, gender and the like of the sample user; the other policy related data includes collected market policy research data, and policy related statistical data for each insurance company.
The data checking module 102 performs stationarity checking on the time series data set to obtain an analysis data set, where the analysis data set includes a stationary data set in the time series data set and a non-stationary data set in the time series data set.
The stationarity check examines whether the data sequences in the time series data set change over time.
Further, performing stationarity test on the time series data set to obtain an analysis data set includes:
performing stationarity check on the time sequence data set by using an analysis function;
and carrying out stationarity classification on the result of the stationarity test to obtain an analysis data set, wherein the analysis data set comprises a stationary data set and a non-stationary data set.
In this embodiment, the formula of the analysis function may be:
$F_{t_1,t_2,\ldots,t_m}(x_1,x_2,\ldots,x_m) = F_{t_1+\tau,\,t_2+\tau,\,\ldots,\,t_m+\tau}(x_1,x_2,\ldots,x_m)$
wherein x represents a data value in the sample data set, t represents time, and τ represents a preset probability factor.
For example, the time series data set of the sample users includes the data of Zhang San, specifically: name Wang Wu, sex male, age 20, annual income 200,000, insurance scheme A101, and insurance price 5000. Performing the stationarity check on Zhang San's data with the analysis function yields a stationary data set and a non-stationary data set, wherein the stationary data set comprises the data of the name, sex, and age items, and the non-stationary data set comprises the data of the annual income, insurance scheme, and insurance price items.
The data determination module 103 determines whether the non-stationary data set contains other stationary data.
In this embodiment, the determining whether the non-stationary data set contains other stationary data includes:
judging whether the non-stationary data set contains other stationary data by carrying out difference calculation on the non-stationary data set.
Further, the determining whether the non-stationary data set contains other stationary data includes:
carrying out difference calculation on the data contained in the non-stationary data set by using a difference function;
judging whether the difference calculation result contains a non-empty stationary data subset;
if so, determining that the non-stationary data set contains other stationary data, and determining that the stationary data subset is the other stationary data.
Optionally, the difference function includes a first order difference function and a second order difference function, and the first order difference function is used for subtracting the value of the previous item from each item of the data in the non-stationary data set; the second order difference function is used for carrying out a difference again on the basis of the first order difference function.
Difference calculation is carried out on the non-stationary data set through the first-order difference function and the second-order difference function to obtain a stationary data subset and a non-stationary data subset of the non-stationary data set, that is, the stationary sequence items and the non-stationary sequence items.
By checking again whether the non-stationary data set contains stationary data, more comprehensive stationary data is obtained.
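The first-order and second-order difference functions described above can be sketched as follows. The function names and the sample premium series are hypothetical; the sketch only shows the arithmetic of differencing, after which the differenced series would be checked for stationarity again.

```python
def first_order_difference(series):
    """Subtract the previous item from each item: d[t] = x[t] - x[t-1]."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


def second_order_difference(series):
    """Difference the first-order differences once more."""
    return first_order_difference(first_order_difference(series))


# Hypothetical premium series with linear growth.
premiums = [5000, 5200, 5400, 5600, 5800]
d1 = first_order_difference(premiums)   # constant increments
d2 = second_order_difference(premiums)  # all zeros
```

A series with a linear trend becomes constant after one difference, and a series with a quadratic trend becomes constant after two, which is why differencing can expose stationary data inside a non-stationary set.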
If the non-stationary data set contains other stationary data, the data analysis module 104 performs fitting optimization on the stationary data set and the other stationary data contained in the non-stationary data set to generate a data analysis model, and performs analysis calculation on the original data set to be analyzed by using the data analysis model to obtain an analysis result of a preset type of the original data set to be analyzed.
The embodiment of the invention can carry out fitting optimization through a curve fitting method.
In detail, a linear regression function is used for carrying out fitting calculation on the stationary data set and the other stationary data contained in the non-stationary data set to obtain a fitting data set;
mapping the fitted data set to the linear regression function to generate an analysis function;
carrying out logarithmic derivation on the analysis function to obtain a likelihood function;
and combining the likelihood function and the analysis function to generate a data analysis model.
In detail, the linear regression function is:
wherein z represents the data in the stationary data set and the other stationary data contained in the non-stationary data set, and g(z) has a value range of [0, 1].
For example, the data in the stationary data set and the other stationary data contained in the non-stationary data set (for example, data of three features: the annual income item, the insurance scheme item, and the insurance price item) are calculated through the linear regression function, and a fitting data set containing feature probabilities is output; the feature probabilities that satisfy a preset linear relation are mapped to the linear regression function to obtain an analysis function containing feature values; logarithmic derivation is performed on the analysis function to obtain a likelihood function; and the likelihood function and the analysis function are combined based on the preset linear relation to generate the data analysis model.
For example, the data of the three features of Zhang San (the annual income item, the insurance scheme item, and the insurance price item) are calculated through the linear regression function to obtain the fitting data set of Zhang San, which contains the feature probabilities of these three features; the feature probabilities that satisfy the preset linear relation are mapped to the linear regression function to obtain an analysis function containing the feature values of Zhang San; logarithmic derivation is performed on the analysis function to obtain a likelihood function; and the likelihood function and the analysis function are combined based on the preset linear relation to generate the data analysis model.
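The fitting and likelihood steps can be illustrated with a small sketch. The formula of g(z) is not reproduced in this text; because g(z) is described as taking values between 0 and 1 and is followed by a likelihood function, the sketch assumes the logistic (sigmoid) function, which is only one possible reading. The gradient-ascent fitting loop, the learning rate, and the toy data are likewise assumptions for illustration.

```python
import math


def g(z):
    """Logistic (sigmoid) function, assumed here; values lie strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(weights, rows, labels):
    """Sum of log-probabilities of the observed binary labels under the model."""
    total = 0.0
    for x, y in zip(rows, labels):
        p = g(sum(w * xi for w, xi in zip(weights, x)))
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def fit(rows, labels, lr=0.1, steps=500):
    """Maximize the log-likelihood by plain gradient ascent."""
    weights = [0.0] * len(rows[0])
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = y - g(sum(w * xi for w, xi in zip(weights, x)))
            for j, xi in enumerate(x):
                grad[j] += err * xi
        weights = [w + lr * gj for w, gj in zip(weights, grad)]
    return weights


# Toy data: [bias term, scaled feature]; labels are hypothetical outcomes.
rows = [[1.0, 0.1], [1.0, 0.4], [1.0, 0.6], [1.0, 0.9]]
labels = [0, 0, 1, 1]
weights = fit(rows, labels)
```

Under this reading, the fitted weights together with g play the role of the analysis function, and the log-likelihood is the quantity obtained by taking logarithms of the predicted probabilities.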
In this embodiment, after the data analysis model is generated, the original data set to be analyzed may be further processed and evaluated by the data analysis model to obtain the calculation result for that data set.
Further, the analyzing and calculating the original data set to be analyzed by using the data analysis model to obtain an analysis result of a preset type of the original data set to be analyzed includes:
acquiring an original data set to be analyzed, wherein the original data set comprises insurance policy data of a user and historical insurance policy data of the user;
analyzing and calculating the insurance policy data and the historical insurance policy data by using the data analysis model to obtain an insurance policy analysis set and a historical insurance policy analysis set;
and combining the insurance policy analysis set and the historical insurance policy analysis set by using a data warehouse scheme to obtain an insurance premium analysis result of the user.
Further, the analyzing and calculating the insurance policy data and the historical insurance policy data by using the data analysis model respectively comprises:
calculating the insurance policy data by using the likelihood function in the data analysis model to generate an insurance policy analysis set, wherein the insurance policy analysis set comprises insurance premium analysis and insurance premium occurrence probability;
and calculating the historical policy data by using a likelihood function in the data analysis model to generate a historical policy analysis set, wherein the historical policy analysis set comprises historical premium analysis and historical premium occurrence probability.
In the data warehouse scheme, data conversion and data integration are required before data is loaded, so that the loaded data is unified under a data model, and association of multiple data types is realized according to operations such as matching and retention.
In the embodiment of the invention, the insurance premium analysis, the insurance premium occurrence probability, the historical insurance premium analysis and the historical insurance premium occurrence probability are combined through the data warehouse scheme to obtain the insurance premium analysis result of the user.
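A minimal sketch of the conversion, integration, and matching steps is shown below. The record fields (`user_id`, `analysis`, `probability`), the source labels, and the inner-join style retention rule are hypothetical and chosen only to make the idea concrete; the text does not specify a schema.

```python
def to_unified(record, source):
    """Convert a source-specific record into one shared schema (data conversion)."""
    return {
        "user_id": record["user_id"],
        "source": source,
        "analysis": record["analysis"],
        "probability": float(record["probability"]),
    }


def combine(policy_set, history_set):
    """Integrate both analysis sets and associate their rows by matching user_id."""
    unified = [to_unified(r, "policy") for r in policy_set]
    unified += [to_unified(r, "history") for r in history_set]
    merged = {}
    for row in unified:
        entry = merged.setdefault(row["user_id"], {})
        entry[row["source"]] = {
            "analysis": row["analysis"],
            "probability": row["probability"],
        }
    # Retain only users present in both sets (an assumed retention rule).
    return {uid: e for uid, e in merged.items() if "policy" in e and "history" in e}


# Hypothetical analysis sets produced by the data analysis model.
policy_set = [{"user_id": "u1", "analysis": "A101", "probability": "0.8"}]
history_set = [
    {"user_id": "u1", "analysis": "A100", "probability": 0.6},
    {"user_id": "u2", "analysis": "B200", "probability": 0.3},
]
result = combine(policy_set, history_set)
```

Converting every record to one schema before merging is what allows values that arrive in different types (here a string and a float probability) to be associated under a single data model.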
In the embodiment of the invention, a data acquisition module acquires a time series data set; the data inspection module performs a stationarity test on the time series data set to obtain an analysis data set, wherein the analysis data set comprises a stationary data set in the time series data set and a non-stationary data set in the time series data set; the data judgment module judges whether the non-stationary data set contains other stationary data; and if the non-stationary data set contains other stationary data, the model calculation module performs fitting optimization on the stationary data set and the other stationary data contained in the non-stationary data set to generate a data analysis model. The stationarity test on the time series data set, together with the secondary judgment of whether the non-stationary data set contains other stationary data, improves the validity and accuracy of the data used in model building and eliminates the influence of redundant data on the construction of the data analysis model, which further improves the accuracy of the model. The original data set to be analyzed is then analyzed and calculated by using the data analysis model to obtain an analysis result of a preset type, thereby achieving the purpose of improving the accuracy of the data analysis model.
Furthermore, the original data set to be analyzed is efficiently analyzed through a high-precision data analysis model, so that an accurate measurement and calculation result is obtained, and the purpose of improving the accuracy of policy maintenance analysis is achieved.
Fig. 3 is a schematic structural diagram of an electronic device implementing the data analysis model determination method according to the present invention.
The electronic device 1 may comprise a processor 10, a memory 11 and a bus, and may further comprise a computer program, such as a data analysis model determination program 12, stored in the memory 11 and executable on the processor 10.
The memory 11 includes at least one type of readable storage medium, which includes flash memory, removable hard disk, multimedia card, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a removable hard disk of the electronic device 1. The memory 11 may also be an external storage device of the electronic device 1 in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only to store application software installed in the electronic device 1 and various types of data, such as codes of a data analysis model determination program, but also to temporarily store data that has been output or is to be output.
The processor 10 may be composed of an integrated circuit in some embodiments, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital Processing chips, graphics processors, and combinations of various control chips. The processor 10 is a Control Unit (Control Unit) of the electronic device, connects various components of the electronic device by using various interfaces and lines, and executes various functions and processes data of the electronic device 1 by running or executing programs or modules (for example, executing a data analysis model determination program, etc.) stored in the memory 11 and calling data stored in the memory 11.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The bus is arranged to enable connection communication between the memory 11 and at least one processor 10 or the like.
Fig. 3 shows only an electronic device with components, and it will be understood by those skilled in the art that the structure shown in fig. 3 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than those shown, or some components may be combined, or a different arrangement of components.
For example, although not shown, the electronic device 1 may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so as to implement functions of charge management, discharge management, power consumption management, and the like through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device 1 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
Further, the electronic device 1 may further include a network interface, and optionally, the network interface may include a wired interface and/or a wireless interface (such as a WI-FI interface, a bluetooth interface, etc.), which are generally used for establishing a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may further comprise a user interface, which may be a display or an input unit (such as a keyboard); optionally, the user interface may also be a standard wired interface or a wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch-sensitive device, etc.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The data analysis model determination program 12 stored in the memory 11 of the electronic device 1 is a combination of a plurality of instructions that, when executed in the processor 10, may implement:
acquiring a time series data set;
performing stationarity check on the time series data set to obtain an analysis data set, wherein the analysis data set comprises a stationary data set in the time series data set and a non-stationary data set in the time series data set;
judging whether the non-stationary data set contains other stationary data;
if the non-stationary data set contains other stationary data, performing fitting optimization on the stationary data set and the other stationary data contained in the non-stationary data set to generate a data analysis model;
and acquiring an original data set to be analyzed, and analyzing and calculating the original data set by using the data analysis model to obtain an analysis result of a preset type of the original data set to be analyzed.
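The control flow of these instructions can be sketched as a single function that wires the steps together. The callables passed in (`is_stationary`, `difference`, `fit`) and the toy implementations are placeholders, not the patent's functions; returning `None` when no other stationary data is found is also an assumption, since the text does not describe that branch.

```python
def determine_model(series_by_field, is_stationary, difference, fit):
    """Run the steps in order; returns None when no other stationary data is found."""
    # Stationarity check and classification.
    stationary = {k: v for k, v in series_by_field.items() if is_stationary(v)}
    non_stationary = {k: v for k, v in series_by_field.items() if k not in stationary}
    # Difference the non-stationary series and check them again.
    other = {}
    for name, series in non_stationary.items():
        diffed = difference(series)
        if is_stationary(diffed):
            other[name] = diffed
    # Fit only if other stationary data exists.
    if not other:
        return None
    return fit({**stationary, **other})


def toy_is_stationary(series):
    return max(series) - min(series) <= 1  # placeholder rule


def toy_difference(series):
    return [b - a for a, b in zip(series, series[1:])]


def toy_fit(data):
    return {"fields": sorted(data)}  # placeholder "model"


model = determine_model(
    {"flat": [3, 3, 3, 3], "trend": [1, 3, 5, 7]},
    toy_is_stationary,
    toy_difference,
    toy_fit,
)
```

Passing the individual steps in as callables keeps the orchestration independent of whichever stationarity test, difference order, or fitting routine is actually chosen.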
Specifically, the specific implementation method of the processor 10 for the instruction may refer to the description of the relevant steps in the embodiment corresponding to fig. 1, which is not described herein again.
Further, the integrated modules/units of the electronic device 1, if implemented in the form of software functional units and sold or used as separate products, may be stored in a non-volatile computer-readable storage medium. The computer-readable medium may include: any entity or device capable of carrying said computer program code, recording medium, U-disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM).
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.
Claims (10)
1. A method for determining a data analysis model, the method comprising:
acquiring a time series data set;
performing stationarity check on the time series data set to obtain an analysis data set, wherein the analysis data set comprises a stationary data set in the time series data set and a non-stationary data set in the time series data set;
judging whether the non-stationary data set contains other stationary data;
if the non-stationary data set contains other stationary data, performing fitting optimization on the stationary data set and the other stationary data contained in the non-stationary data set to generate a data analysis model;
and acquiring an original data set to be analyzed, and analyzing and calculating the original data set by using the data analysis model to obtain an analysis result of a preset type of the original data set to be analyzed.
2. The method of determining a data analysis model of claim 1, wherein the stationarity checking the time series data set to obtain an analysis data set comprises:
performing stationarity test on the time sequence data set by using an analysis function;
and carrying out stationarity classification on the result of the stationarity test to obtain an analysis data set.
3. The data analysis model determination method of claim 1, wherein said determining whether the non-stationary data set contains other stationary data comprises:
carrying out differential calculation on data contained in the non-stationary data set by using a differential function;
judging whether the difference calculation result contains a non-empty stationary data subset;
and if the difference calculation result contains a non-empty stationary data subset, determining that the non-stationary data set contains other stationary data, and determining that the stationary data subset is the other stationary data.
4. The method of determining a data analysis model of claim 1, wherein fitting the stationary data set and other stationary data contained in the non-stationary data set to optimize, and generating the data analysis model comprises:
performing fitting calculation on the stationary data set and the other stationary data contained in the non-stationary data set by using a linear regression function to obtain a fitting data set;
mapping the fitted data set to the linear regression function to generate an analysis function;
carrying out logarithmic calculation on the analysis function to obtain a likelihood function;
and combining the likelihood function and the analysis function to generate a data analysis model.
6. The method for determining the data analysis model according to claim 1, wherein the obtaining of the raw data set to be analyzed and the performing of the analysis calculation on the raw data set by using the data analysis model to obtain the preset type of analysis result on the raw data set to be analyzed comprises:
acquiring an original data set to be analyzed, wherein the original data set comprises insurance policy data of a user and historical insurance policy data of the user;
analyzing and calculating the insurance policy data and the historical insurance policy data by using the data analysis model to obtain an insurance policy analysis set and a historical insurance policy analysis set;
and combining the insurance policy analysis set and the historical insurance policy analysis set by using a data warehouse scheme to obtain an insurance premium analysis result for the user.
7. The method of determining a data analysis model of claim 6, wherein said using said data analysis model to perform analytical calculations on said insurance policy data and said historical insurance policy data, respectively, comprises:
calculating the insurance policy data by using the likelihood function in the data analysis model to generate an insurance policy analysis set, wherein the insurance policy analysis set comprises insurance premium analysis and insurance premium occurrence probability;
and calculating the historical policy data by using a likelihood function in the data analysis model to generate a historical policy analysis set, wherein the historical policy analysis set comprises historical premium analysis and historical premium occurrence probability.
8. A data analysis model determination apparatus, characterized in that the apparatus comprises:
the data acquisition module is used for acquiring a time series data set;
the data inspection module is used for carrying out stationarity inspection on the time series data set to obtain an analysis data set, wherein the analysis data set comprises a stationary data set in the time series data set and a non-stationary data set in the time series data set;
the data judgment module is used for judging whether the non-stationary data set contains other stationary data;
the model calculation module is used for fitting and optimizing the stationary data set and other stationary data contained in the non-stationary data set to generate a data analysis model if the non-stationary data set contains other stationary data;
the data acquisition module is also used for acquiring an original data set to be analyzed;
the model calculation module is further configured to perform analysis calculation on the original data set by using the data analysis model to obtain a preset type of analysis result on the original data set to be analyzed.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a data analysis model determination method as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out a data analysis model determination method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010110683.1A CN111414398B (en) | 2020-02-22 | 2020-02-22 | Data analysis model determining method, device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111414398A (en) | 2020-07-14 |
CN111414398B (en) | 2023-05-30 |
Family
ID=71492760
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010110683.1A Active CN111414398B (en) | 2020-02-22 | 2020-02-22 | Data analysis model determining method, device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111414398B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160034615A1 (en) * | 2014-08-01 | 2016-02-04 | Tata Consultancy Services Limited | System and method for forecasting a time series data |
CN107577648A (en) * | 2017-09-04 | 2018-01-12 | 北京京东尚科信息技术有限公司 | For handling the method and device of multivariate time series data |
CN108684051A (en) * | 2018-05-11 | 2018-10-19 | 广东南方通信建设有限公司 | A kind of wireless network performance optimization method, electronic equipment and storage medium based on cause and effect diagnosis |
Non-Patent Citations (2)
Title |
---|
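尚君; 陈艺源; 马捷; 任燕燕: "An Empirical Study of the Factors Influencing Insurance Demand in China: Quantile Regression Based on Time Series" * |
范涛涛; 寇艳廷; 刘晨; 阎红灿: "Research on Determining the Stationarity of Data in Time Series Analysis" * |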
Also Published As
Publication number | Publication date |
---|---|
CN111414398B (en) | 2023-05-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||