CN113449024B - Insurance data analysis method, device, equipment and medium based on big data

Info

Publication number
CN113449024B
Authority: CN (China)
Prior art keywords: data, target, value, processed, field
Legal status: Active
Application number: CN202110696233.XA
Other languages: Chinese (zh)
Other versions: CN113449024A
Inventor: 吴先祥
Current Assignee: Ping An Puhui Enterprise Management Co Ltd
Original Assignee: Ping An Puhui Enterprise Management Co Ltd
Application filed by Ping An Puhui Enterprise Management Co Ltd
Priority to CN202110696233.XA
Publication of CN113449024A
Application granted
Publication of CN113449024B

Classifications

    • G06F16/25: Information retrieval of structured data; integrating or interfacing systems involving database management systems
    • G06F16/214: Design, administration or maintenance of databases; database migration support
    • G06F16/2282: Indexing and storage structures; tablespace storage structures and management thereof
    • G06F16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; distributed database system architectures therefor
    • G06Q10/10: Administration; management; office automation; time management
    • G06Q40/08: Finance; insurance


Abstract

The invention relates to the field of big data, and provides an insurance data analysis method, device, equipment and medium based on big data. The method calls Sqoop to extract data to be processed from upstream data and write it into a Hive table, calls Hive to process the data to be processed to obtain a flag bit, and executes the calculation only when the flag bit is detected to meet the configuration condition, effectively saving the computing resources of the system. Hive is then called to extract the calculation factors under each target dimension from the data to be processed and to calculate the UPR (Unexpired Premium Reserve) value of each target dimension, and Sqoop is called to synchronize the UPR value of each target dimension to a local database. By combining big data calculation with a unified operation mode, the waste of system resources caused by adopting different calculation modes for different data volumes is avoided, labor cost is effectively reduced, the larger errors introduced by manual calculation are avoided, the calculated UPR value is more accurate, and calculation efficiency is improved. In addition, the invention also relates to blockchain technology, and the UPR value can be stored in a blockchain node.

Description

Insurance data analysis method, device, equipment and medium based on big data
Technical Field
The invention relates to the technical field of big data, and in particular to an insurance data analysis method, device, equipment and medium based on big data.
Background
At present, in companies and enterprises on the market, the operation mode for calculating the UPR (Unexpired Premium Reserve) is generally divided into manual processing and SAS (Statistical Analysis System) system processing, depending on the data volume.
In actual operation, the income details of the previous month are extracted from the finance department, the UPR of each period is then calculated according to the guarantee fee (premium) income and the unexpired risk period of each period, and finally the accounted data is sent back to the finance department for posting.
However, whether manual processing or SAS system processing is used, the labor cost is high, the accuracy of the calculation result is poor, and the process is time-consuming.
Disclosure of Invention
In view of the above, it is necessary to provide a method, an apparatus, a device and a medium for analyzing insurance data based on big data, which can combine big data calculation and unify operation modes, avoid waste of system resources and manpower due to different calculation modes caused by differences in data amount, automatically calculate the UPR value in each dimension by a Hive data warehouse tool, effectively reduce labor cost, avoid introducing higher errors due to manual calculation, make the calculated UPR value more accurate, and improve calculation efficiency.
An insurance data analysis method based on big data, comprising:
scanning upstream data at preset time intervals, calling a Sqoop data migration tool to extract data to be processed from the upstream data, and writing the data to be processed into a Hive table;
calling a Hive data warehouse tool to process the data to be processed stored in the Hive table to obtain a flag bit;
starting an Azkaban task scheduler to detect whether the flag bit meets configuration conditions or not at regular time;
when the flag bit is detected to meet the configuration condition, determining at least one target dimension according to a received requirement table;
calling the Hive data warehouse tool to extract a calculation factor under each target dimension from the data to be processed, and calculating a UPR value of each target dimension according to the calculation factor under each target dimension;
and calling the Sqoop data migration tool to synchronize the UPR value of each target dimension to a local database.
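For orientation only, the following Python sketch shows one way the six steps above could be chained into a single job; the connection strings, table names, file paths and the flag query are assumptions introduced here for illustration and are not part of the disclosed method.

```python
import subprocess

# Hypothetical orchestration of the pipeline; every name below is an assumption.
SQOOP_IMPORT = [
    "sqoop", "import",
    "--connect", "jdbc:mysql://upstream-db:3306/insurance",   # assumed upstream database
    "--username", "etl", "--password-file", "/user/etl/.pwd",
    "--table", "premium_income",                              # assumed upstream table
    "--hive-import", "--hive-table", "ods.premium_income",    # assumed Hive table
]

FLAG_SQL = "SELECT max(flag) FROM ods.premium_income_flag"    # assumed flag table written by Hive

def run_pipeline() -> None:
    # 1. Sqoop extracts the data to be processed and writes it into the Hive table.
    subprocess.run(SQOOP_IMPORT, check=True)
    # 2./3. Hive derives the flag bit; Azkaban would normally schedule this check periodically.
    # Simplified: assumes the query result is the only line on stdout.
    flag = subprocess.run(["hive", "-e", FLAG_SQL],
                          capture_output=True, text=True, check=True).stdout.strip()
    if flag != "1":   # configuration condition not met, so the calculation is skipped
        return
    # 4./5. Hive computes the UPR value for each target dimension (HiveQL kept in a separate file).
    subprocess.run(["hive", "-f", "/opt/etl/upr_by_dimension.hql"], check=True)
    # 6. Sqoop synchronizes the per-dimension UPR values back to the local database.
    subprocess.run(["sqoop", "export",
                    "--connect", "jdbc:mysql://local-db:3306/report",
                    "--username", "etl", "--password-file", "/user/etl/.pwd",
                    "--table", "upr_result",
                    "--export-dir", "/user/hive/warehouse/dm.db/upr_result"], check=True)

if __name__ == "__main__":
    run_pipeline()
```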
According to a preferred embodiment of the present invention, the invoking the Sqoop data migration tool to extract the data to be processed from the upstream data includes:
acquiring a current timestamp;
determining a target time range according to the preset time interval and the current timestamp;
determining a designated database storing the upstream data;
starting the Sqoop data migration tool to be connected to the specified database, and extracting data in the target time range from the specified database as the data to be processed; and/or
And detecting data with a specified format generated in the target time range in the upstream data, and determining the detected data as the data to be processed.
According to a preferred embodiment of the present invention, the calling the Hive data warehouse tool to process the data to be processed stored in the Hive table to obtain the flag bit includes:
acquiring a first field, a second field and a third field which are configured in advance;
acquiring data matched with the first field from the data to be processed as guarantee fee income data, acquiring data matched with the second field as a guarantee period, and acquiring data matched with the third field as an evaluation time point;
when the guarantee fee income data is less than zero and/or the guarantee period is less than or equal to the evaluation time point, determining that the Flag bit is Flag =0;
and when the guarantee fee income data is greater than or equal to zero and the guarantee period is greater than the evaluation time point, determining that the Flag bit is Flag =1.
According to a preferred embodiment of the present invention, the starting of the Azkaban task scheduler to regularly detect whether the flag bit meets the configuration condition includes:
starting the Azkaban task scheduler to detect the value of the flag bit at fixed time;
when the Flag bit is detected to be Flag =1, determining that the configuration condition is satisfied; or
When the Flag bit is detected to be Flag =0, it is determined that the configuration condition is not satisfied.
According to a preferred embodiment of the present invention, the invoking the Hive data warehouse tool to extract a calculation factor in each target dimension from the data to be processed, and calculating the UPR value in each target dimension according to the calculation factor in each target dimension includes:
for each target dimension, acquiring at least one field name configured in advance, wherein the at least one field name comprises the first field, the third field and a fourth field corresponding to a guarantee period;
extracting a calculation factor under each target dimension from the data to be processed according to the at least one field name, wherein the calculation factor comprises target guarantee fee income data corresponding to the first field, a target evaluation time point corresponding to the third field and a target guarantee period corresponding to the fourth field;
calculating target passing time according to the target evaluation time point and the target guarantee period;
calculating a target current insurance start period according to the target guarantee period and the target elapsed time;
calculating the sum of the target current insurance start period and a preset value as a target current insurance end period;
calculating the difference value between the target current insurance expiration date and the target evaluation time point and the preset value as a first target value;
calculating the difference value between the target current insurance deadline and the target current insurance start period as a second target value;
calculating a product of the second target value and the target premium revenue data as a third target value;
and calculating the quotient of the first target value and the third target value as the UPR value of the target dimension.
According to a preferred embodiment of the present invention, after the Sqoop data migration tool is called to synchronize the UPR value of each target dimension to the local database, the method further includes:
when the preset identification is detected, the synchronization is determined to be completed;
transmitting the preset identification to Kafka;
connecting to a mail notification interface;
and when the mail notification interface monitors that the kafak consumes the preset identifier, sending a prompt mail to a specified terminal device through the mail notification interface, wherein the prompt mail is used for prompting that the UPR value of each target dimension is successfully synchronized to the local database.
According to a preferred embodiment of the present invention, after the Sqoop data migration tool is called to synchronize the UPR value of each target dimension to the local database, the method further includes:
monitoring a pre-established query interface and a pre-established download interface in real time;
when the inquiry interface and/or the download interface is detected to be triggered, acquiring a requester triggering the inquiry interface and/or the download interface;
verifying the authority of the requester;
when the authority of the requester is verified, determining target data requested by the requester through the query interface and/or the download interface;
and feeding back the target data to the terminal equipment of the requester through the query interface and/or the download interface.
A big-data based insurance data analysis apparatus, the big-data based insurance data analysis apparatus comprising:
the extracting unit is used for scanning upstream data at preset time intervals, calling a Sqoop data migration tool to extract data to be processed from the upstream data, and writing the data to be processed into a Hive table;
the processing unit is used for calling a Hive data warehouse tool to process the data to be processed stored in the Hive table to obtain a flag bit;
the detection unit is used for starting the Azkaban task scheduler to detect whether the flag bit meets the configuration condition or not at regular time;
the determining unit is used for determining at least one target dimension according to the received requirement table when the flag bit is detected to meet the configuration condition;
the calculating unit is used for calling the Hive data warehouse tool to extract a calculation factor under each target dimension from the data to be processed and calculating a UPR value of each target dimension according to the calculation factor under each target dimension;
and the synchronization unit is used for calling the Sqoop data migration tool to synchronize the UPR value of each target dimension to a local database.
A computer device, the computer device comprising:
a memory storing at least one instruction; and
a processor executing instructions stored in the memory to implement the big-data based insurance data analysis method.
A computer-readable storage medium having stored therein at least one instruction for execution by a processor in a computer device to implement the big-data based insurance data analysis method.
It can be seen from the above technical solutions that the present invention can scan upstream data at preset time intervals, invoke a Sqoop data migration tool to extract data to be processed from the upstream data, and write the data to be processed into a Hive table; invoke a Hive data warehouse tool to process the data to be processed stored in the Hive table to obtain a flag bit; start an Azkaban task scheduler to periodically detect whether the flag bit satisfies a configuration condition; and, when it is detected that the flag bit satisfies the configuration condition, determine at least one target dimension according to a received requirement table. Because subsequent calculations are executed only when the flag bit is detected to satisfy the configuration condition, the computing resources of the system are effectively saved. The Hive data warehouse tool is then called to extract a calculation factor under each target dimension from the data to be processed, the UPR value of each target dimension is calculated according to the calculation factor under each target dimension, and the Sqoop data migration tool is called to synchronize the UPR value of each target dimension to a local database. In this way, big data calculation is combined with a unified operation mode, which avoids the waste of system resources and manpower caused by adopting different calculation modes for different data volumes; the UPR value under each dimension is calculated automatically by the Hive data warehouse tool, which effectively reduces labor cost, avoids the larger errors introduced by manual calculation, makes the calculated UPR value more accurate, and improves calculation efficiency.
Drawings
FIG. 1 is a flow chart of a preferred embodiment of the big data based insurance data analysis method of the present invention.
FIG. 2 is a functional block diagram of a preferred embodiment of the big data based insurance data analysis apparatus of the present invention.
FIG. 3 is a schematic structural diagram of a computer device for implementing a big-data-based insurance data analysis method according to a preferred embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
FIG. 1 is a flow chart of a method for analyzing insurance data based on big data according to a preferred embodiment of the present invention. The order of the steps in the flow chart may be changed and some steps may be omitted according to different needs.
The insurance data analysis method based on big data is applied to one or more computer devices, wherein the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and the hardware thereof includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device and the like.
The computer device may be any electronic product capable of performing human-computer interaction with a user, for example, a Personal computer, a tablet computer, a smart phone, a Personal Digital Assistant (PDA), a game machine, an Internet Protocol Television (IPTV), an intelligent wearable device, and the like.
The computer device may also include a network device and/or a user device. The network device includes, but is not limited to, a single network server, a server group consisting of a plurality of network servers, or a Cloud Computing (Cloud Computing) based Cloud consisting of a large number of hosts or network servers.
The Network in which the computer device is located includes, but is not limited to, the internet, a wide area Network, a metropolitan area Network, a local area Network, a Virtual Private Network (VPN), and the like.
S10, scanning upstream data at preset time intervals, calling a Sqoop data migration tool to extract data to be processed from the upstream data, and writing the data to be processed into a Hive table.
In this embodiment, the preset time interval may be configured by a user, for example, 1 day.
In this embodiment, the upstream data refers to the raw data records of system operation, and the upstream data is usually stored in a relational database, such as a MySQL database, an Oracle database, and the like.
In at least one embodiment of the present invention, the invoking the Sqoop data migration tool to extract the data to be processed from the upstream data includes:
acquiring a current timestamp;
determining a target time range according to the preset time interval and the current timestamp;
determining a designated database storing the upstream data;
starting the Sqoop data migration tool to be connected to the specified database, and extracting data in the target time range from the specified database as the data to be processed; and/or
And detecting data with a specified format generated in the target time range in the upstream data, and determining the detected data as the data to be processed.
For example: when the current timestamp is day 1, 12 of 1990 and the preset time interval is one month, the target time range is one month forward from day 1, 12 of 1990, i.e., day 1, 11 of 1990.
In this embodiment, the designated database is generally a relational database for storing the upstream data.
In this embodiment, the specified format includes a csv format.
In this embodiment, when the data that meets the target time range is stored in the specified database, the data is directly acquired from the specified database as the to-be-processed data by using the Sqoop data migration tool.
Meanwhile, for data which is not stored in the specified database, the data with the specified format generated in the target time range is detected, so that the obtained data is more comprehensive, and data omission is avoided.
Further, in this embodiment, after the Sqoop data migration tool is called to extract the to-be-processed data from the upstream data, the to-be-processed data is written into the Hive table, so that a Hive data warehouse tool is called subsequently to perform data calculation.
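As a hedged sketch of this step (the JDBC URL, credentials, table and column names are all assumptions), the target time range derived from the current timestamp and the preset time interval can be turned into a Sqoop --where filter so that only data generated inside that window is written to the Hive table:

```python
import subprocess
from datetime import datetime, timedelta

PRESET_INTERVAL = timedelta(days=30)          # assumed "one month" preset time interval

def import_window_to_hive(now: datetime) -> None:
    """Extract upstream rows created inside the target time range and write them to Hive."""
    start = now - PRESET_INTERVAL              # target time range: [now - interval, now]
    where = (f"create_time >= '{start:%Y-%m-%d %H:%M:%S}' "
             f"AND create_time < '{now:%Y-%m-%d %H:%M:%S}'")   # assumed timestamp column
    cmd = [
        "sqoop", "import",
        "--connect", "jdbc:oracle:thin:@upstream-db:1521/ins",  # assumed upstream database
        "--username", "etl", "--password-file", "/user/etl/.pwd",
        "--table", "PREMIUM_INCOME",                            # assumed upstream table
        "--where", where,                                       # only the target time range
        "--hive-import", "--hive-table", "ods.premium_income",  # assumed Hive table
        "-m", "1",
    ]
    subprocess.run(cmd, check=True)

import_window_to_hive(datetime.now())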
And S11, calling a Hive data warehouse tool to process the data to be processed stored in the Hive table to obtain a flag bit.
The flag bit may be used as a determination flag to determine whether to perform a calculation.
In at least one embodiment of the present invention, the invoking the Hive data warehouse tool to process the data to be processed stored in the Hive table to obtain a flag bit includes:
acquiring a first field, a second field and a third field which are configured in advance;
acquiring data matched with the first field from the data to be processed as guarantee fee income data, acquiring data matched with the second field as a guarantee period, and acquiring data matched with the third field as an evaluation time point;
when the guarantee fee income data is less than zero and/or the guarantee period is less than or equal to the evaluation time point, determining that the Flag bit is Flag =0;
and when the guarantee fee income data is larger than or equal to zero and the guarantee period is larger than the evaluation time point, determining that the Flag bit is Flag =1.
The first field, the second field and the third field can be configured by self-definition and used for acquiring corresponding data.
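A minimal HiveQL sketch of this flag derivation, submitted through the Hive command line from Python, might look as follows; the table name and the column names premium_income, guarantee_period and eval_point are assumptions standing in for the first, second and third fields:

```python
import subprocess

# Assumed table and column names; the CASE expression mirrors the two rules above:
# Flag = 0 when the income is negative or the guarantee period is not later than the
# evaluation time point, Flag = 1 otherwise.
FLAG_SQL = """
INSERT OVERWRITE TABLE ods.premium_income_flag
SELECT policy_id,
       CASE
         WHEN premium_income < 0 OR guarantee_period <= eval_point THEN 0
         ELSE 1
       END AS flag
FROM ods.premium_income;
"""

subprocess.run(["hive", "-e", FLAG_SQL], check=True)
```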
And S12, starting the Azkaban task scheduler to detect whether the flag bit meets the configuration condition at regular time.
In at least one embodiment of the present invention, the starting of the Azkaban task scheduler to periodically detect whether the flag bit meets the configuration condition includes:
starting the Azkaban task scheduler to detect the value of the flag bit at fixed time;
when the Flag bit is detected to be Flag =1, determining that the configuration condition is met; or
When the Flag bit is detected to be Flag =0, determining that the configuration condition is not satisfied.
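Azkaban itself is driven by job definition files rather than application code; the sketch below writes two hypothetical .job definitions in which a periodically scheduled check_flag job exits non-zero when the flag is not 1, so that the dependent compute_upr job only runs when the configuration condition is met (the file names, script paths and query are assumptions):

```python
from pathlib import Path

# check_flag.job: scheduled periodically from the Azkaban web UI; the referenced shell
# script (an assumption) runs "hive -e 'SELECT max(flag) ...'" and exits non-zero when
# the flag is not 1, so the dependent job is not started.
CHECK_FLAG_JOB = """\
type=command
command=bash /opt/etl/check_flag.sh
"""

# compute_upr.job: runs the UPR calculation only after check_flag has succeeded.
COMPUTE_UPR_JOB = """\
type=command
dependencies=check_flag
command=hive -f /opt/etl/upr_by_dimension.hql
"""

flow_dir = Path("azkaban_flow")
flow_dir.mkdir(exist_ok=True)
(flow_dir / "check_flag.job").write_text(CHECK_FLAG_JOB)
(flow_dir / "compute_upr.job").write_text(COMPUTE_UPR_JOB)
```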
S13, when the flag bit is detected to meet the configuration condition, determining at least one target dimension according to the received requirement table.
In at least one embodiment of the present invention, when it is detected that the Flag bit does not satisfy the configuration condition, i.e., when it is detected that the Flag bit is Flag =0, no calculation is performed.
By the embodiment, when the acquired data do not meet the configuration condition, namely the acquired data are dirty data, the calculation is not executed, the waste of calculation resources is avoided, and the performance of the system is effectively ensured.
In this embodiment, the requirement table may be uploaded by a relevant worker (e.g., a developer, a project manager, etc.), and all dimensions that may involve calculations are stored in the requirement table.
The target dimension can be configured by self-definition, and the invention is not limited.
By way of example, the target dimensions include, but are not limited to, one or a combination of the following: date, year and month, legal subject, company section, cost center, product section and business section.
For example: the target dimension may be: UPR value at xxx cost center under xxx legal subjects at time 2021-03.
In the above embodiment, only when the flag bit is detected to satisfy the configuration condition, the subsequent calculation is performed, which effectively saves the calculation resources of the system.
S14, calling the Hive data warehouse tool to extract a calculation factor under each target dimension from the data to be processed, and calculating a UPR (Unexpired Premium Reserve) value of each target dimension according to the calculation factor under each target dimension.
Taking the above embodiment into consideration, the invoking the Hive data warehouse tool to extract the calculation factor in each target dimension from the data to be processed, and calculating the UPR value in each target dimension according to the calculation factor in each target dimension includes:
for each target dimension, acquiring at least one field name configured in advance, wherein the at least one field name comprises the first field, the third field and a fourth field corresponding to a guarantee period;
extracting a calculation factor under each target dimension from the data to be processed according to the at least one field name, wherein the calculation factor comprises target guarantee fee income data corresponding to the first field, a target evaluation time point corresponding to the third field and a target guarantee period corresponding to the fourth field;
calculating target passing time according to the target evaluation time point and the target guarantee period;
calculating the target current insurance start period according to the target guarantee period and the target elapsed time;
calculating the sum of the target current insurance start period and a preset value as a target current insurance end period;
calculating the difference value between the target current insurance deadline and the target evaluation time point and the preset value as a first target value;
calculating the difference value between the target current insurance deadline and the target current insurance start period as a second target value;
calculating a product of the second target value and the target premium revenue data as a third target value;
and calculating the quotient of the first target value and the third target value as the UPR value of the target dimension.
Specifically, the calculation is as follows:
(1) Target elapsed time (unit: month) = (target evaluation time point (year) - target guarantee period (year)) × 12 + (target evaluation time point (month) - target guarantee period (month));
wherein the target evaluation time point (year, month and day) is the last natural day of the previous month;
wherein the target elapsed time is a term concept, indicating how many periods, i.e. how many months, the client has passed through;
(2) Target current insurance start period (year, month and day) = target guarantee period (year, month and day) + target elapsed time (unit: month);
wherein the target current insurance period refers to the insurance period of each client in the current period, and because the dates of the clients' target guarantee periods differ, the target current insurance period of each client is also different;
(3) Target current insurance end period (year, month and day) = target current insurance start period (year, month and day) + 1;
wherein 1 is the preset value, namely one month;
wherein the target current insurance end period (deadline) represents the end of the client's current guarantee period;
(4) UPR value = ((target current insurance end period (year, month and day) - target evaluation time point (year, month and day)) - 1) / (target current insurance end period (year, month and day) - target current insurance start period (year, month and day)) × target guarantee fee income data.
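A small worked example in Python, following formulas (1) to (4) as given above with the preset value of one month; the sample policy values, the month-shifting helper and the day-count convention are assumptions made purely for illustration:

```python
from datetime import date

def elapsed_months(eval_point: date, guarantee_start: date) -> int:
    # formula (1): year difference * 12 + month difference
    return (eval_point.year - guarantee_start.year) * 12 + (eval_point.month - guarantee_start.month)

def add_months(d: date, months: int) -> date:
    # shift a date by whole months, keeping the same day-of-month (sufficient for this example)
    y, m = divmod(d.month - 1 + months, 12)
    return date(d.year + y, m + 1, d.day)

premium_income = 1200.0                 # assumed target guarantee fee income data
guarantee_start = date(2020, 12, 15)    # assumed target guarantee period (policy start date)
eval_point = date(2021, 3, 31)          # target evaluation time point: last natural day of the previous month

elapsed = elapsed_months(eval_point, guarantee_start)        # (1) -> 3 months
current_start = add_months(guarantee_start, elapsed)         # (2) -> 2021-03-15
current_end = add_months(current_start, 1)                   # (3) preset value = 1 month -> 2021-04-15

# formula (4): UPR = ((end - eval) - 1) / (end - start) * premium income
upr = ((current_end - eval_point).days - 1) / (current_end - current_start).days * premium_income
print(round(upr, 2))                                         # about 541.94 for these assumed inputs
```

Under these assumed inputs, 14 of the 31 days of the current insurance period remain unexpired at the evaluation point after the one-day adjustment in formula (4), so roughly 14/31 of the premium income is held as the UPR value.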
Through the embodiment, the UPR value under each dimensionality can be automatically calculated through a Hive data warehouse tool, the labor cost is effectively reduced, the phenomenon that a higher error is introduced due to manual calculation is avoided, the calculated UPR value is more accurate, and meanwhile the calculation efficiency is improved.
And S15, calling the Sqoop data migration tool to synchronize the UPR value of each target dimension to a local database.
The local database is a database deployed locally, from which data can be acquired directly, making the data acquisition process more efficient and convenient.
The embodiment combines big data calculation, unifies operation modes, and avoids waste of system resources and human resources caused by different calculation modes due to differences of data quantity and the like.
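One hedged way to express this synchronization is a Sqoop export from the Hive result directory into the local relational table; using an update key with allowinsert makes a re-run of the flow idempotent. Every connection string, path and key column below is an assumption:

```python
import subprocess

EXPORT_CMD = [
    "sqoop", "export",
    "--connect", "jdbc:mysql://local-db:3306/report",          # assumed local database
    "--username", "etl", "--password-file", "/user/etl/.pwd",
    "--table", "upr_result",                                   # assumed target table
    "--export-dir", "/user/hive/warehouse/dm.db/upr_result",   # assumed Hive warehouse path
    "--input-fields-terminated-by", "\\001",                   # default Hive field delimiter
    "--update-key", "target_dimension",                        # assumed key column
    "--update-mode", "allowinsert",                            # insert new rows, update existing ones
]
subprocess.run(EXPORT_CMD, check=True)
```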
In at least one embodiment of the present invention, after the Sqoop data migration tool is called to synchronize the UPR value of each target dimension to the local database, the method further includes:
when the preset identification is detected, the synchronization is determined to be completed;
transmitting the preset identification to Kafka;
connecting to a mail notification interface;
and when the mail notification interface monitors that Kafka has consumed the preset identifier, sending a prompt mail to a specified terminal device through the mail notification interface, wherein the prompt mail is used for prompting that the UPR value of each target dimension is successfully synchronized to the local database.
Wherein the preset identification is used for marking whether synchronization is completed or not.
The specified terminal device may include, but is not limited to: terminal equipment of relevant financial staff and terminal equipment of sales personnel.
Through the implementation mode, after the calculated UPR value is successfully synchronized to the local database, the automatic mail can inform relevant workers so as to prompt the relevant workers to check in time, and the work efficiency of the relevant workers is improved in an auxiliary manner.
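As an illustrative sketch only (the broker address, topic name, mail server and addresses are assumptions), the completion identifier could be produced to Kafka with kafka-python and the prompt mail sent with smtplib once the identifier has been consumed:

```python
import smtplib
from email.mime.text import MIMEText
from kafka import KafkaConsumer, KafkaProducer

BROKER, TOPIC = "kafka:9092", "upr-sync-done"         # assumed broker and topic

# Producer side: emit the preset identifier once the Sqoop export has finished.
producer = KafkaProducer(bootstrap_servers=BROKER)
producer.send(TOPIC, b"UPR_SYNC_DONE")                # assumed preset identifier
producer.flush()

# Consumer side (the "mail notification interface"): send a prompt mail on consumption.
consumer = KafkaConsumer(TOPIC, bootstrap_servers=BROKER,
                         auto_offset_reset="earliest", consumer_timeout_ms=60000)
for record in consumer:
    if record.value == b"UPR_SYNC_DONE":
        msg = MIMEText("The UPR value of each target dimension has been synchronized to the local database.")
        msg["Subject"] = "UPR synchronization completed"
        msg["From"], msg["To"] = "etl@example.com", "finance@example.com"   # assumed addresses
        with smtplib.SMTP("mail.example.com") as smtp:                      # assumed mail server
            smtp.send_message(msg)
        break
```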
In at least one embodiment of the present invention, after the Sqoop data migration tool is called to synchronize the UPR value of each target dimension to the local database, the method further includes:
monitoring a pre-established query interface and a pre-established download interface in real time;
when the inquiry interface and/or the download interface is detected to be triggered, acquiring a requester triggering the inquiry interface and/or the download interface;
verifying the authority of the requester;
when the authority of the requester is verified, determining target data requested by the requester through the query interface and/or the download interface;
and feeding the target data back to the terminal equipment of the requester through the query interface and/or the download interface.
For example: when the button of the query interface is detected to be clicked, a target user clicking the button of the query interface is obtained, the authority of the target user can be queried from a pre-configured authority list, when the target user has the query authority, the target user is determined to pass verification, data queried by the target user through the query interface is further detected to serve as target data, and the target data is transmitted through the query interface and displayed on the terminal device of the requester.
Through the implementation mode, the local inquiry and downloading can be supported after the UPR value obtained through calculation is successfully synchronized to the local database, so that the data can be conveniently read and checked by related workers, the working efficiency is further improved, and the user experience is better.
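A minimal sketch of such a query interface with a permission check, written here with Flask purely for illustration; the route, the permission list and the in-memory stand-in for the local database are assumptions and not part of the disclosed method:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

PERMISSIONS = {"alice": {"query"}, "bob": {"query", "download"}}   # assumed pre-configured permission list
UPR_RESULTS = {"2021-03/cost_center_A": 541.94}                    # stand-in for the local database

@app.route("/upr/query")
def query_upr():
    requester = request.args.get("requester", "")
    if "query" not in PERMISSIONS.get(requester, set()):           # verify the authority of the requester
        return jsonify({"error": "permission denied"}), 403
    dimension = request.args.get("dimension", "")
    return jsonify({"dimension": dimension,                        # feed the target data back
                    "upr": UPR_RESULTS.get(dimension)})

if __name__ == "__main__":
    app.run()
```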
It should be noted that, in order to further improve the security of the data and avoid malicious tampering of the data, the UPR value may be stored in the blockchain node.
It can be seen from the above technical solutions that the present invention can scan upstream data at preset time intervals, invoke a Sqoop data migration tool to extract data to be processed from the upstream data, and write the data to be processed into a Hive table; invoke a Hive data warehouse tool to process the data to be processed stored in the Hive table to obtain a flag bit; start an Azkaban task scheduler to periodically detect whether the flag bit satisfies a configuration condition; and, when it is detected that the flag bit satisfies the configuration condition, determine at least one target dimension according to a received requirement table. Because subsequent calculations are executed only when the flag bit is detected to satisfy the configuration condition, the computing resources of the system are effectively saved. The Hive data warehouse tool is then called to extract a calculation factor under each target dimension from the data to be processed, the UPR value of each target dimension is calculated according to the calculation factor under each target dimension, and the Sqoop data migration tool is called to synchronize the UPR value of each target dimension to a local database. In this way, big data calculation is combined with a unified operation mode, which avoids the waste of system resources and manpower caused by adopting different calculation modes for different data volumes; the UPR value under each dimension is calculated automatically by the Hive data warehouse tool, which effectively reduces labor cost, avoids the larger errors introduced by manual calculation, makes the calculated UPR value more accurate, and improves calculation efficiency.
Fig. 2 is a functional block diagram of an insurance data analysis device based on big data according to a preferred embodiment of the present invention. The big-data-based insurance data analysis device 11 comprises an extraction unit 110, a processing unit 111, a detection unit 112, a determination unit 113, a calculation unit 114 and a synchronization unit 115. The module/unit referred to in the present invention refers to a series of computer program segments that can be executed by the processor 13, that can perform a fixed function, and that are stored in the memory 12. In the present embodiment, the functions of the modules/units will be described in detail in the following embodiments.
The extraction unit 110 scans upstream data at preset time intervals, invokes a Sqoop data migration tool to extract data to be processed from the upstream data, and writes the data to be processed into the Hive table.
In this embodiment, the preset time interval may be configured by a user, for example, 1 day.
In this embodiment, the upstream data refers to the raw data records of system operation, and the upstream data is usually stored in a relational database, such as a MySQL database, an Oracle database, and the like.
In at least one embodiment of the present invention, the extracting unit 110 invoking the Sqoop data migration tool to extract the data to be processed from the upstream data includes:
acquiring a current timestamp;
determining a target time range according to the preset time interval and the current timestamp;
determining a designated database storing the upstream data;
starting the Sqoop data migration tool to be connected to the specified database, and extracting data in the target time range from the specified database to serve as the data to be processed; and/or
And detecting data with a specified format generated in the target time range in the upstream data, and determining the detected data as the data to be processed.
For example: when the current timestamp is December 1, 1990 and the preset time interval is one month, the target time range extends one month back from December 1, 1990, i.e., from November 1, 1990 to December 1, 1990.
In this embodiment, the designated database is generally a relational database for storing the upstream data.
In this embodiment, the specified format includes a csv format.
In this embodiment, when the data corresponding to the target time range is stored in the specified database, the data is directly acquired from the specified database as the data to be processed by using the Sqoop data migration tool.
Meanwhile, the data which is not stored in the specified database is detected to be generated in the target time range and has the specified format, so that the acquired data is more comprehensive, and data omission is avoided.
Further, in this embodiment, after the Sqoop data migration tool is called to extract the to-be-processed data from the upstream data, the to-be-processed data is written into the Hive table, so that a Hive data warehouse tool is called subsequently to perform data calculation.
The processing unit 111 calls a Hive data warehouse tool to process the data to be processed stored in the Hive table to obtain a flag bit.
The flag bit may be used as a determination flag to determine whether to perform a calculation.
In at least one embodiment of the present invention, the processing unit 111 invokes a Hive data warehouse tool to process the to-be-processed data stored in the Hive table, and obtaining the flag bit includes:
acquiring a first field, a second field and a third field which are configured in advance;
acquiring data matched with the first field from the data to be processed as guarantee fee income data, acquiring data matched with the second field as a guarantee period, and acquiring data matched with the third field as an evaluation time point;
when the guarantee fee income data is less than zero and/or the guarantee period is less than or equal to the evaluation time point, determining that the Flag bit is Flag =0;
and when the guarantee fee income data is larger than or equal to zero and the guarantee period is larger than the evaluation time point, determining that the Flag bit is Flag =1.
The first field, the second field and the third field can be configured by self-definition and used for acquiring corresponding data.
The detection unit 112 starts the Azkaban task scheduler to detect whether the flag bit meets the configuration condition or not at regular time.
In at least one embodiment of the present invention, the starting, by the detection unit 112, of the Azkaban task scheduler to periodically detect whether the flag bit meets the configuration condition includes:
starting the Azkaban task scheduler to detect the value of the flag bit at fixed time;
when the Flag bit is detected to be Flag =1, determining that the configuration condition is satisfied; or
When the Flag bit is detected to be Flag =0, determining that the configuration condition is not satisfied.
When detecting that the flag bit satisfies the configuration condition, the determining unit 113 determines at least one target dimension according to the received requirement table.
In at least one embodiment of the present invention, when it is detected that the Flag does not satisfy the configuration condition, that is, when it is detected that the Flag is Flag =0, no calculation is performed.
By the embodiment, when the acquired data do not meet the configuration condition, namely the acquired data are dirty data, the calculation is not executed, the waste of calculation resources is avoided, and the performance of the system is effectively ensured.
In this embodiment, the requirement table may be uploaded by a relevant worker (e.g., a developer, a project manager, etc.), and all dimensions that may involve calculations are stored in the requirement table.
The target dimension can be configured by self-definition, and the invention is not limited.
For example, the target dimensions include, but are not limited to, one or a combination of the following dimensions: date, year and month, legal body, company section, cost center, product section and service section.
For example: the target dimension may be: UPR values for xxx cost centers under xxx jurisdictions at times 2021-03.
In the above embodiment, only when the flag bit is detected to satisfy the configuration condition, the subsequent calculation is performed, which effectively saves the calculation resources of the system.
The calculating unit 114 calls the Hive data warehouse tool to extract a calculation factor in each target dimension from the data to be processed, and calculates a UPR (Unexpired Premium Reserve) value in each target dimension according to the calculation factor in each target dimension.
Taking the above embodiment into consideration, the invoking the Hive data warehouse tool to extract the calculation factor in each target dimension from the data to be processed, and calculating the UPR value in each target dimension according to the calculation factor in each target dimension includes:
for each target dimension, acquiring at least one field name configured in advance, wherein the at least one field name comprises the first field, the third field and a fourth field corresponding to a guarantee period;
extracting a calculation factor under each target dimension from the data to be processed according to the at least one field name, wherein the calculation factor comprises target guarantee fee income data corresponding to the first field, a target evaluation time point corresponding to the third field and a target guarantee period corresponding to the fourth field;
calculating target passing time according to the target evaluation time point and the target guarantee period;
calculating the target current insurance start period according to the target guarantee period and the target elapsed time;
calculating the sum of the target current insurance start period and a preset value as a target current insurance end period;
calculating the difference value between the target current insurance deadline and the target evaluation time point and the preset value as a first target value;
calculating the difference value between the target current insurance deadline and the target current insurance start period as a second target value;
calculating a product of the second target value and the target premium revenue data as a third target value;
and calculating the quotient of the first target value and the third target value as the UPR value of the target dimension.
Specifically, the calculation is as follows:
(1) Target elapsed time (unit: month) = (target evaluation time point (year) - target guarantee period (year)) × 12 + (target evaluation time point (month) - target guarantee period (month));
wherein the target evaluation time point (year, month and day) is the last natural day of the previous month;
wherein the target elapsed time is a term concept, indicating how many periods, i.e. how many months, the client has passed through;
(2) Target current insurance start period (year, month and day) = target guarantee period (year, month and day) + target elapsed time (unit: month);
wherein the target current insurance period refers to the insurance period of each client in the current period, and because the dates of the clients' target guarantee periods differ, the target current insurance period of each client is also different;
(3) Target current insurance end period (year, month and day) = target current insurance start period (year, month and day) + 1;
wherein 1 is the preset value, namely one month;
wherein the target current insurance end period (deadline) represents the end of the client's current guarantee period;
(4) UPR value = ((target current insurance end period (year, month and day) - target evaluation time point (year, month and day)) - 1) / (target current insurance end period (year, month and day) - target current insurance start period (year, month and day)) × target guarantee fee income data.
Through the embodiment, the UPR value under each dimensionality can be automatically calculated through a Hive data warehouse tool, the labor cost is effectively reduced, the phenomenon that a higher error is introduced due to manual calculation is avoided, the calculated UPR value is more accurate, and meanwhile the calculation efficiency is improved.
The synchronization unit 115 invokes the Sqoop data migration tool to synchronize the UPR value of each target dimension to the local database.
The local database is a database deployed locally, from which data can be acquired directly, making the data acquisition process more efficient and convenient.
The embodiment combines big data calculation, unifies operation modes, and avoids waste of system resources and human resources caused by different calculation modes due to differences of data quantity and the like.
In at least one embodiment of the invention, after the Sqoop data migration tool is called to synchronize the UPR value of each target dimension to a local database, when a preset identifier is detected, synchronization is determined to be completed;
transmitting the preset identification to Kafka;
connecting to a mail notification interface;
and when the mail notification interface monitors that Kafka has consumed the preset identifier, sending a prompt mail to a specified terminal device through the mail notification interface, wherein the prompt mail is used for prompting that the UPR value of each target dimension is successfully synchronized to the local database.
Wherein the preset identification is used for marking whether synchronization is completed or not.
Wherein, the specified terminal device may include, but is not limited to: terminal equipment of related financial staff and terminal equipment of salesmen.
Through the implementation mode, after the calculated UPR value is successfully synchronized to the local database, the automatic mail can inform relevant workers so as to prompt the relevant workers to check in time, and the work efficiency of the relevant workers is improved in an auxiliary manner.
In at least one embodiment of the invention, after the Sqoop data migration tool is called to synchronize the UPR value of each target dimension to a local database, a pre-established query interface and a pre-established download interface are monitored in real time;
when the inquiry interface and/or the download interface is detected to be triggered, acquiring a requester triggering the inquiry interface and/or the download interface;
verifying the authority of the requester;
when the authority of the requester is verified, determining target data requested by the requester through the query interface and/or the download interface;
and feeding back the target data to the terminal equipment of the requester through the query interface and/or the download interface.
For example: when the button of the query interface is detected to be clicked, a target user clicking the button of the query interface is obtained, the authority of the target user can be queried from a pre-configured authority list, when the target user has the query authority, the target user is determined to pass verification, data queried by the target user through the query interface is further detected to serve as target data, and the target data is transmitted through the query interface and displayed on the terminal device of the requester.
Through the embodiment, after the UPR value obtained through calculation is successfully synchronized to the local database, local query and downloading are supported, so that relevant workers can conveniently read and check data, the working efficiency is further improved, and the user experience is better.
It should be noted that, in order to further improve the security of the data and avoid malicious tampering of the data, the UPR value may be stored in the blockchain node.
According to the above technical solution, upstream data can be scanned at preset time intervals, the Sqoop data migration tool is called to extract the data to be processed from the upstream data, and the data to be processed is written into the Hive table; the Hive data warehouse tool is called to process the data to be processed stored in the Hive table to obtain the flag bit; the Azkaban task scheduler is started to periodically detect whether the flag bit meets the configuration condition; and, when the flag bit is detected to meet the configuration condition, at least one target dimension is determined according to the received requirement table. Because subsequent calculation is executed only when the flag bit meets the configuration condition, the computing resources of the system are effectively saved. The Hive data warehouse tool is then called to extract the calculation factor under each target dimension from the data to be processed, the UPR value of each target dimension is calculated according to the calculation factor under each target dimension, and the Sqoop data migration tool is called to synchronize the UPR value of each target dimension to the local database. In this way, big data calculation is combined with a unified operation mode, the waste of system resources and manpower caused by adopting different calculation modes for different data volumes is avoided, the UPR value under each dimension is calculated automatically by the Hive data warehouse tool, labor cost is effectively reduced, the larger errors introduced by manual calculation are avoided, the calculated UPR value is more accurate, and calculation efficiency is improved.
Fig. 3 is a schematic structural diagram of a computer device for implementing the insurance data analysis method based on big data according to the preferred embodiment of the present invention.
The computer device 1 may comprise a memory 12, a processor 13 and a bus, and may further comprise a computer program, such as a big data based insurance data analysis program, stored in the memory 12 and executable on the processor 13.
It will be appreciated by those skilled in the art that the schematic diagram is merely an example of the computer device 1, and does not constitute a limitation to the computer device 1, the computer device 1 may be in a bus structure or a star structure, the computer device 1 may include more or less hardware or software than those shown, or different component arrangements, for example, the computer device 1 may further include an input and output device, a network access device, and the like.
It should be noted that the computer device 1 is only an example, and other electronic products that are currently available or may come into existence in the future, such as electronic products that can be adapted to the present invention, should also be included in the scope of the present invention, and are included herein by reference.
The memory 12 includes at least one type of readable storage medium, which includes flash memory, removable hard disks, multimedia cards, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disks, optical disks, etc. The memory 12 may in some embodiments be an internal storage unit of the computer device 1, for example a removable hard disk of the computer device 1. The memory 12 may also be an external storage device of the computer device 1 in other embodiments, such as a plug-in removable hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), etc. provided on the computer device 1. Further, the memory 12 may also include both an internal storage unit and an external storage device of the computer device 1. The memory 12 can be used not only for storing application software installed in the computer device 1 and various types of data such as codes of insurance data analysis programs based on big data, etc., but also for temporarily storing data that has been output or is to be output.
The processor 13 may be composed of an integrated circuit in some embodiments, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same function or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips. The processor 13 is a Control Unit (Control Unit) of the computer device 1, connects various components of the whole computer device 1 by various interfaces and lines, and executes various functions of the computer device 1 and processes data by running or executing programs or modules stored in the memory 12 (for example, executing an insurance data analysis program based on big data, etc.) and calling data stored in the memory 12.
The processor 13 executes the operating system of the computer device 1 and various installed application programs. The processor 13 executes the application program to implement the steps of the various big-data based insurance data analysis method embodiments described above, such as the steps shown in fig. 1.
Illustratively, the computer program may be divided into one or more modules/units, which are stored in the memory 12 and executed by the processor 13 to accomplish the present invention. The one or more modules/units may be a series of computer readable instruction segments capable of performing certain functions, which are used to describe the execution of the computer program in the computer device 1. For example, the computer program may be divided into an extraction unit 110, a processing unit 111, a detection unit 112, a determination unit 113, a calculation unit 114, a synchronization unit 115.
The integrated unit implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a computer device, or a network device) or a processor (processor) to execute the portions of the big data based insurance data analysis method according to the embodiments of the present invention.
The modules/units integrated by the computer device 1 may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as separate products. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments described above may be implemented.
Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, U-disk, removable hard disk, magnetic disk, optical disk, computer Memory, read-Only Memory (ROM), random-access Memory, or the like.
Further, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.
The blockchain is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain (Blockchain) is essentially a decentralized database: a chain of data blocks associated with one another by cryptographic methods, in which each data block contains the information of a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
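As a minimal illustration only (not part of the claimed method), the chained structure described above can be sketched in a few lines: each block carries a batch of transaction records together with the cryptographic hash of its predecessor, so altering any earlier block invalidates every later hash. All class and field names below are hypothetical.

```python
import hashlib
import json
import time

class Block:
    """A data block holding a batch of transaction records and the hash of its predecessor."""
    def __init__(self, transactions, previous_hash):
        self.timestamp = time.time()
        self.transactions = transactions        # information of a batch of network transactions
        self.previous_hash = previous_hash      # cryptographic link to the previous block
        self.hash = self.compute_hash()

    def compute_hash(self):
        payload = json.dumps({"timestamp": self.timestamp,
                              "transactions": self.transactions,
                              "previous_hash": self.previous_hash}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# The second block's hash covers the first block's hash, which is what supports
# the validity (anti-counterfeiting) check described above.
genesis = Block(transactions=["policy #1 issued"], previous_hash="0" * 64)
next_block = Block(transactions=["policy #2 issued"], previous_hash=genesis.hash)
assert next_block.previous_hash == genesis.compute_hash()
```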
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one line is shown in fig. 3, but this does not mean that there is only one bus or only one type of bus. The bus is arranged to enable communication between the memory 12, the at least one processor 13, and the other components.
Although not shown, the computer device 1 may further include a power supply (such as a battery) for supplying power to the components. Preferably, the power supply may be logically connected to the at least one processor 13 through a power management device, so that functions such as charge management, discharge management, and power consumption management are realized through the power management device. The power supply may also include one or more DC or AC power sources, recharging devices, power failure detection circuits, power converters or inverters, power status indicators, and any other components. The computer device 1 may further include various sensors, a Bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
Further, the computer device 1 may also include a network interface, which may optionally include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface), generally used for establishing a communication connection between the computer device 1 and other computer devices.
Optionally, the computer device 1 may further comprise a user interface, which may include a display (Display) and an input unit such as a keyboard (Keyboard), and may optionally include a standard wired interface and a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is used for displaying the information processed in the computer device 1 and for displaying a visualized user interface.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
Fig. 3 shows only the computer device 1 with components 12-13, and it will be understood by a person skilled in the art that the structure shown in fig. 3 does not constitute a limitation of the computer device 1, which may comprise fewer or more components than shown, combine certain components, or have a different arrangement of components.
Referring to fig. 1, the memory 12 of the computer device 1 stores a plurality of instructions to implement a big data based insurance data analysis method, and the processor 13 can execute the plurality of instructions to implement:
scanning upstream data at preset time intervals, calling a Sqoop data migration tool to extract data to be processed from the upstream data, and writing the data to be processed into a Hive table;
calling a Hive data warehouse tool to process the data to be processed stored in the Hive table to obtain a flag bit;
starting an Azkaban task scheduler to periodically detect whether the flag bit satisfies a configuration condition;
when the flag bit is detected to meet the configuration condition, determining at least one target dimension according to a received requirement table;
calling the Hive data warehouse tool to extract a calculation factor under each target dimension from the data to be processed, and calculating a UPR value of each target dimension according to the calculation factor under each target dimension;
and calling the Sqoop data migration tool to synchronize the UPR value of each target dimension to a local database.
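By way of a non-limiting sketch of how the above instructions could be chained together, the following Python driver invokes Sqoop and Hive from the command line; the JDBC URL, table and column names, and the timed loop (which stands in for the Azkaban schedule) are illustrative assumptions rather than the claimed implementation.

```python
import subprocess
import time

INTERVAL_SECONDS = 3600  # the "preset time interval"; the value is illustrative

def extract_to_hive(jdbc_url, user, password, source_table, hive_table):
    """Step 1: call Sqoop to pull the data to be processed into a Hive table."""
    subprocess.run([
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", user, "--password", password,
        "--table", source_table,
        "--hive-import", "--hive-table", hive_table,
    ], check=True)

def flag_bit_satisfied(hive_table):
    """Steps 2-3: derive the flag bit in Hive and test the configuration condition (Flag=1)."""
    query = ("SELECT MIN(CASE WHEN premium_income >= 0 AND guarantee_period > eval_time "
             f"THEN 1 ELSE 0 END) FROM {hive_table}")  # column names are assumptions
    result = subprocess.run(["hive", "-e", query],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip().endswith("1")

def run_pipeline(target_dimensions):
    while True:  # a plain loop standing in for the Azkaban schedule
        extract_to_hive("jdbc:mysql://upstream-db/insurance", "etl", "secret",
                        "policy_records", "ods.policy_records")
        if flag_bit_satisfied("ods.policy_records"):
            for dim in target_dimensions:
                # Steps 4-5: the per-dimension UPR calculation of claim 1 would run here.
                print(f"computing UPR for dimension {dim}")
            # Step 6: a sqoop export back to the local database would run here.
        time.sleep(INTERVAL_SECONDS)
```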
For the specific implementation of the above instructions by the processor 13, reference may be made to the description of the relevant steps in the embodiment corresponding to fig. 1, which is not repeated here.
In the several embodiments provided by the present invention, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into modules is only one kind of logical functional division, and other divisions may be adopted in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the present invention may also be implemented by one unit or means through software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit the same, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (9)

1. An insurance data analysis method based on big data is characterized by comprising the following steps:
scanning upstream data at preset time intervals, calling a Sqoop data migration tool to extract data to be processed from the upstream data, and writing the data to be processed into a Hive table;
calling a Hive data warehouse tool to process the data to be processed stored in the Hive table to obtain a flag bit;
starting an Azkaban task scheduler to detect whether the flag bit meets configuration conditions at regular time;
when the flag bit is detected to meet the configuration condition, determining at least one target dimension according to a received requirement table;
calling the Hive data warehouse tool to extract a calculation factor under each target dimension from the data to be processed, and calculating a UPR value of each target dimension according to the calculation factor under each target dimension, which comprises the following steps: for each target dimension, acquiring at least one field name configured in advance, wherein the at least one field name comprises a first field, a third field and a fourth field corresponding to a guarantee period; extracting a calculation factor under each target dimension from the data to be processed according to the at least one field name, wherein the calculation factor comprises target guarantee fee income data corresponding to the first field, a target evaluation time point corresponding to the third field and a target guarantee period corresponding to the fourth field; calculating a target elapsed time according to the target evaluation time point and the target guarantee period; calculating a target current insurance start period according to the target guarantee period and the target elapsed time; calculating the sum of the target current insurance start period and a preset value as a target current insurance end period; calculating the difference value between the target current insurance end period and the target evaluation time point and the preset value as a first target value; calculating the difference value between the target current insurance end period and the target current insurance start period as a second target value; calculating the product of the second target value and the target guarantee fee income data as a third target value; and calculating the quotient of the first target value and the third target value as the UPR value of the target dimension;
and calling the Sqoop data migration tool to synchronize the UPR value of each target dimension to a local database.
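Read purely as arithmetic, the UPR steps recited in claim 1 can be sketched as follows. Claim 1 does not fix the functional form of the elapsed-time and start-period steps, nor the exact reading of the three-term difference, so those points are left as explicit assumptions; every identifier is illustrative and not claim language.

```python
def upr_value(premium_income, evaluation_time, guarantee_period, preset_value,
              elapsed_time_fn, current_start_fn):
    """Arithmetic skeleton of the UPR steps in claim 1.

    The derivations of the target elapsed time and of the target current
    insurance start period are not fixed by the claim, so they are passed in
    as functions; the three-term difference is given one plausible reading.
    """
    elapsed_time = elapsed_time_fn(evaluation_time, guarantee_period)   # target elapsed time
    current_start = current_start_fn(guarantee_period, elapsed_time)    # target current insurance start period
    current_end = current_start + preset_value                          # sum with the preset value
    first_target = (current_end - evaluation_time) - preset_value       # assumed reading of the difference
    second_target = current_end - current_start                         # end period minus start period
    third_target = second_target * premium_income                       # product with the premium income data
    return first_target / third_target                                  # quotient = UPR value of the dimension
```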
2. The big-data-based insurance data analysis method according to claim 1, wherein the invoking of the Sqoop data migration tool to extract the data to be processed from the upstream data comprises:
acquiring a current timestamp;
determining a target time range according to the preset time interval and the current timestamp;
determining a designated database storing the upstream data;
starting the Sqoop data migration tool to connect to the specified database, and extracting the data within the target time range from the specified database as the data to be processed; and/or
detecting data in a specified format generated within the target time range in the upstream data, and determining the detected data as the data to be processed.
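A sketch of the incremental extraction of claim 2: the target time range is derived from the current timestamp and the preset interval and pushed down to the source database as a predicate. The Sqoop invocation, connection details, and the timestamp column name are illustrative assumptions.

```python
import subprocess
import time

def extract_window(jdbc_url, user, password, source_table, hive_table,
                   interval_seconds, time_column="created_at"):
    """Pull only the upstream rows generated inside the target time range."""
    now = int(time.time())                      # current timestamp
    window_start = now - interval_seconds       # target time range: [now - interval, now]
    predicate = f"{time_column} >= {window_start} AND {time_column} <= {now}"
    subprocess.run([
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", user, "--password", password,
        "--table", source_table,
        "--where", predicate,                   # restricts extraction to the target time range
        "--hive-import", "--hive-table", hive_table,
    ], check=True)
```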
3. The big-data-based insurance data analysis method according to claim 1, wherein the calling Hive data warehouse tool to process the data to be processed stored in the Hive table to obtain a flag bit comprises:
acquiring the first field, the second field and the third field which are configured in advance;
acquiring data matched with the first field from the data to be processed as guarantee fee income data, acquiring data matched with the second field as a guarantee period, and acquiring data matched with the third field as an evaluation time point;
when the guarantee fee income data is less than zero and/or the guarantee period is less than or equal to the evaluation time point, determining that the Flag bit is Flag=0;
and when the guarantee fee income data is greater than or equal to zero and the guarantee period is greater than the evaluation time point, determining that the Flag bit is Flag=1.
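Per record, the flag-bit rule of claim 3 reduces to a simple predicate over the three extracted values; in the sketch below the parameter names merely stand in for the data matched to the configured first, second, and third fields.

```python
def flag_bit(premium_income, guarantee_period, evaluation_time):
    """Claim 3: Flag=0 for negative premium income or an expired guarantee period, otherwise Flag=1."""
    if premium_income < 0 or guarantee_period <= evaluation_time:
        return 0
    return 1
```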
4. The big-data-based insurance data analysis method according to claim 1, wherein the starting of the Azkaban task scheduler to periodically detect whether the flag bit satisfies a configuration condition comprises:
starting the Azkaban task scheduler to periodically detect the value of the flag bit;
when the Flag bit is detected to be Flag=1, determining that the configuration condition is satisfied; or
when the Flag bit is detected to be Flag=0, determining that the configuration condition is not satisfied.
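One way to realize the timed check of claim 4 is to let the scheduler run a small job on a fixed schedule that queries the flag value and reports, via its exit code, whether the configuration condition is met; the Hive query, column name, and table name below are illustrative assumptions, and the scheduling itself is left to the task scheduler.

```python
import subprocess
import sys

def flag_condition_met(hive_table):
    """Query the stored flag bit and report whether the configuration condition (Flag=1) holds."""
    result = subprocess.run(
        ["hive", "-e", f"SELECT MIN(flag) FROM {hive_table}"],  # the flag column name is an assumption
        capture_output=True, text=True, check=True)
    return result.stdout.strip().endswith("1")

if __name__ == "__main__":
    # A non-zero exit code tells the scheduler the condition is not yet met,
    # so the downstream UPR-calculation step is not triggered on this run.
    sys.exit(0 if flag_condition_met("ods.policy_flags") else 1)
```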
5. The big-data-based insurance data analysis method according to claim 1, wherein after invoking the Sqoop data migration tool to synchronize the UPR value of each target dimension to the local database, the method further comprises:
when a preset identifier is detected, determining that the synchronization is completed;
transmitting the preset identifier to Kafka;
connecting to a mail notification interface;
and when the mail notification interface monitors that Kafka has consumed the preset identifier, sending a prompt mail to a specified terminal device through the mail notification interface, wherein the prompt mail is used for prompting that the UPR value of each target dimension has been successfully synchronized to the local database.
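A sketch of the notification chain of claim 5, assuming the kafka-python client and an SMTP relay: a producer transmits the preset identifier to Kafka once synchronization completes, and a consumer-side hook sends the prompt mail when that identifier is consumed. Topic names, hosts, and mail addresses are placeholders.

```python
import smtplib
from email.mime.text import MIMEText
from kafka import KafkaConsumer, KafkaProducer  # kafka-python client (assumed dependency)

SYNC_DONE_MARKER = b"UPR_SYNC_DONE"             # stands in for the "preset identifier"

def publish_sync_marker(bootstrap_servers="kafka:9092", topic="upr-sync"):
    """After the Sqoop export completes, transmit the preset identifier to Kafka."""
    producer = KafkaProducer(bootstrap_servers=bootstrap_servers)
    producer.send(topic, value=SYNC_DONE_MARKER)
    producer.flush()

def mail_on_sync(bootstrap_servers="kafka:9092", topic="upr-sync",
                 smtp_host="mail.example.com", recipient="ops@example.com"):
    """Consume the identifier and send the prompt mail to the specified terminal device."""
    consumer = KafkaConsumer(topic, bootstrap_servers=bootstrap_servers,
                             auto_offset_reset="earliest")
    for message in consumer:
        if message.value == SYNC_DONE_MARKER:
            mail = MIMEText("The UPR value of each target dimension was successfully "
                            "synchronized to the local database.")
            mail["Subject"] = "UPR synchronization completed"
            mail["From"] = "etl@example.com"
            mail["To"] = recipient
            with smtplib.SMTP(smtp_host) as server:
                server.sendmail("etl@example.com", [recipient], mail.as_string())
            break
```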
6. The big-data-based insurance data analysis method according to claim 1, wherein after invoking the Sqoop data migration tool to synchronize the UPR value of each target dimension to a local database, the method further comprises:
monitoring a pre-established query interface and a pre-established download interface in real time;
when it is detected that the query interface and/or the download interface is triggered, acquiring the requester that triggered the query interface and/or the download interface;
verifying the authority of the requester;
when the authority of the requester is verified, determining target data requested by the requester through the query interface and/or the download interface;
and feeding back the target data to the terminal equipment of the requester through the query interface and/or the download interface.
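The monitored query and download interfaces of claim 6 can be sketched as two HTTP endpoints guarded by a shared authority check; the use of Flask, the token header, and the permission table are illustrative assumptions, and the returned literals stand in for the target data read from the local database.

```python
from flask import Flask, abort, jsonify, request

app = Flask(__name__)

AUTHORIZED_TOKENS = {"token-abc": "analyst"}    # placeholder permission table

def verify_requester():
    """Verify the authority of the requester before serving any target data."""
    token = request.headers.get("X-Auth-Token", "")
    if token not in AUTHORIZED_TOKENS:
        abort(403)

@app.route("/upr/query")
def query_interface():
    verify_requester()
    dimension = request.args.get("dimension", "all")
    # The target data would be read from the local database here; the literal is a stand-in.
    return jsonify({"dimension": dimension, "upr": 0.0})

@app.route("/upr/download")
def download_interface():
    verify_requester()
    # Feed the target data back to the requester's terminal device as a downloadable file.
    return app.response_class("dimension,upr\nall,0.0\n", mimetype="text/csv")
```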
7. An insurance data analysis apparatus based on big data, characterized in that the insurance data analysis apparatus based on big data comprises:
the extraction unit is used for scanning upstream data at preset time intervals, calling a Sqoop data migration tool to extract data to be processed from the upstream data, and writing the data to be processed into a Hive table;
the processing unit is used for calling a Hive data warehouse tool to process the data to be processed stored in the Hive table to obtain a flag bit;
the detection unit is used for starting the Azkaban task scheduler to periodically detect whether the flag bit satisfies the configuration condition;
the determining unit is used for determining at least one target dimension according to the received requirement table when the flag bit is detected to meet the configuration condition;
the calculating unit is used for calling the Hive data warehouse tool to extract a calculation factor under each target dimension from the data to be processed, and calculating a UPR value of each target dimension according to the calculation factor under each target dimension, which comprises: for each target dimension, acquiring at least one field name configured in advance, wherein the at least one field name comprises a first field, a third field and a fourth field corresponding to a guarantee period; extracting a calculation factor under each target dimension from the data to be processed according to the at least one field name, wherein the calculation factor comprises target guarantee fee income data corresponding to the first field, a target evaluation time point corresponding to the third field and a target guarantee period corresponding to the fourth field; calculating a target elapsed time according to the target evaluation time point and the target guarantee period; calculating a target current insurance start period according to the target guarantee period and the target elapsed time; calculating the sum of the target current insurance start period and a preset value as a target current insurance end period; calculating the difference value between the target current insurance end period and the target evaluation time point and the preset value as a first target value; calculating the difference value between the target current insurance end period and the target current insurance start period as a second target value; calculating the product of the second target value and the target guarantee fee income data as a third target value; and calculating the quotient of the first target value and the third target value as the UPR value of the target dimension;
and the synchronization unit is used for calling the Sqoop data migration tool to synchronize the UPR value of each target dimension to a local database.
8. A computer device, characterized in that the computer device comprises:
a memory storing at least one instruction; and
a processor executing the instructions stored in the memory to implement the big-data-based insurance data analysis method of any one of claims 1 to 6.
9. A computer-readable storage medium, characterized in that: the computer-readable storage medium has stored therein at least one instruction that is executed by a processor in a computer device to implement the big data-based insurance data analysis method of any one of claims 1 to 6.
CN202110696233.XA 2021-06-23 2021-06-23 Insurance data analysis method, device, equipment and medium based on big data Active CN113449024B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110696233.XA CN113449024B (en) 2021-06-23 2021-06-23 Insurance data analysis method, device, equipment and medium based on big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110696233.XA CN113449024B (en) 2021-06-23 2021-06-23 Insurance data analysis method, device, equipment and medium based on big data

Publications (2)

Publication Number Publication Date
CN113449024A CN113449024A (en) 2021-09-28
CN113449024B true CN113449024B (en) 2023-02-14

Family

ID=77812326

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110696233.XA Active CN113449024B (en) 2021-06-23 2021-06-23 Insurance data analysis method, device, equipment and medium based on big data

Country Status (1)

Country Link
CN (1) CN113449024B (en)

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120197666A1 (en) * 2011-01-28 2012-08-02 Assurant, Inc. Method and apparatus for providing unemployment insurance
CN107133727A (en) * 2017-04-22 2017-09-05 南昌航空大学 Dimensionality optimization method is assessed based on the failure effect for differentiating force coefficient
CN107977897A (en) * 2017-12-28 2018-05-01 平安健康保险股份有限公司 Insurance business data analysis method, system and computer-readable recording medium
CA3046542A1 (en) * 2018-06-15 2019-12-15 Bank Of Montreal New issue management system
CN111695751A (en) * 2019-03-15 2020-09-22 北京京东尚科信息技术有限公司 Data processing method and data processing system
CN110716989A (en) * 2019-08-27 2020-01-21 苏宁云计算有限公司 Dimension data processing method and device, computer equipment and storage medium
CN112579586A (en) * 2020-12-23 2021-03-30 平安普惠企业管理有限公司 Data processing method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN113449024A (en) 2021-09-28

Similar Documents

Publication Publication Date Title
CN111754110A (en) Method, device, equipment and medium for evaluating operation index based on artificial intelligence
WO2022147908A1 (en) Table association-based lost data recovery method and apparatus, device, and medium
CN112015663A (en) Test data recording method, device, equipment and medium
CN111754123A (en) Data monitoring method and device, computer equipment and storage medium
CN112256783A (en) Data export method and device, electronic equipment and storage medium
CN111985545A (en) Target data detection method, device, equipment and medium based on artificial intelligence
CN110647409A (en) Message writing method, electronic device, system and medium
CN114185776A (en) Big data point burying method, device, equipment and medium for application program
CN112866285B (en) Gateway interception method and device, electronic equipment and storage medium
CN111858604B (en) Data storage method and device, electronic equipment and storage medium
CN113449024B (en) Insurance data analysis method, device, equipment and medium based on big data
CN115147031B (en) Clearing workflow execution method, device, equipment and medium
CN111950707A (en) Behavior prediction method, apparatus, device and medium based on behavior co-occurrence network
CN114399397A (en) Renewal tracking method, device, equipment and medium
CN114626948A (en) Block chain transaction accounting method and device, electronic equipment and storage medium
CN115145870A (en) Method and device for positioning reason of failed task, electronic equipment and storage medium
CN113469649A (en) Project progress analysis method and device, electronic equipment and storage medium
CN110941536B (en) Monitoring method and system, and first server cluster
CN113723813A (en) Performance ranking method and device, electronic equipment and readable storage medium
CN113240351A (en) Business data consistency checking method and device, electronic equipment and medium
CN113419916B (en) Wind control inspection program uninterrupted operation method, device, equipment and storage medium
CN113360505B (en) Time sequence data-based data processing method and device, electronic equipment and readable storage medium
CN115065642B (en) Code table request method, device, equipment and medium under bandwidth limitation
CN113297228B (en) MySQL writing method, device, equipment and medium based on multiple live instances
CN113419718A (en) Data transmission method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant