CN112883091A

CN112883091A - Factor data acquisition method and device, computer equipment and storage medium

Info

Publication number: CN112883091A
Application number: CN202110036556.6A
Authority: CN
Inventors: 亓宁; 杨斐然; 刘剑
Original assignee: Ping An Asset Management Co Ltd
Current assignee: Ping An Asset Management Co Ltd
Priority date: 2021-01-12
Filing date: 2021-01-12
Publication date: 2021-06-01

Abstract

The application relates to the technical field of artificial intelligence, and in particular to a factor data acquisition method and device, computer equipment and a storage medium. The method comprises the following steps: extracting source data with different data structures from a plurality of data sources at a preset frequency, and taking the extracted source data as heterogeneous multi-source data; loading the heterogeneous multi-source data into a data lake, and performing layered processing on the loaded heterogeneous multi-source data in the data lake by using a distributed processing component to obtain factor key value data; and deploying the factor service corresponding to the factor key value data to a container and exposing a corresponding calling interface, wherein the calling interface is used by a terminal to call the corresponding demand factor data. The application also relates to the technical field of blockchain: the demand factor data may be stored in a blockchain. Adopting the method can improve the efficiency of acquiring demand factors.

Description

Factor data acquisition method and device, computer equipment and storage medium

Technical Field

The present application relates to the field of artificial intelligence technologies, and in particular, to a factor data obtaining method, an apparatus, a computer device, and a storage medium.

Background

In actual business scenarios, demand factors often have to be acquired from several data sources. Because the data sources differ in their data storage structures, a separate algorithm has to be written for each source to process the data obtained from it and derive the demand factors.

However, as the number of data sources grows, more algorithms are needed to extract the demand factor data, which reduces the efficiency of acquiring demand factors from the plurality of data sources.

Disclosure of Invention

In view of the above, it is necessary to provide a method, an apparatus, a computer device and a storage medium capable of improving factor data acquisition efficiency.

A factor data acquisition method, the method comprising:

extracting source data with different data structures from a plurality of data sources according to preset frequency, and taking the extracted source data as heterogeneous multi-source data;

loading heterogeneous multi-source data to a data lake, and performing layered processing on the loaded heterogeneous multi-source data in the data lake by using a distributed processing component to obtain factor key value data;

and deploying the factor service corresponding to the factor key value data to the container and exposing a corresponding calling interface, wherein the calling interface is used for indicating the terminal to call the corresponding demand factor data.

In one embodiment, performing hierarchical processing on the loaded heterogeneous multi-source data by using a distributed processing component in a data lake to obtain factor key value data includes:

extracting target source data from heterogeneous multi-source data, filtering the target source data according to a preset filtering algorithm to obtain filtered target data, and loading the filtered target data to a data warehouse;

dividing the data in the data warehouse into different theme databases according to a preset theme type;

and performing characterization processing on the basis of different theme databases to obtain model data, and performing mixed processing on the model data to obtain a data mart, wherein the data mart comprises factor key value data.

In one embodiment, dividing the data in the data warehouse into different theme databases according to a preset theme type includes:

acquiring a service demand, and determining a theme type based on the service demand;

and clustering the data in the data warehouse according to the theme types to obtain theme databases respectively corresponding to different theme types.

In one embodiment, performing characterization processing based on the different theme databases to obtain model data, and performing mixed processing on the model data to obtain a data mart, includes:

performing data extraction on data in the theme databases based on business requirements to obtain extracted business data;

acquiring a data conversion algorithm, and performing data conversion processing on the extracted business data according to the data conversion algorithm to obtain converted data;

and loading the converted data to obtain loaded data, and using the loaded data as the data mart.

In one embodiment, deploying factor services corresponding to factor key value data to a container includes:

acquiring factor key value data and factor management configuration corresponding to the factor key value data;

determining factor service according to the factor key value data and the factor management configuration;

factor services are deployed to the service container.

In one embodiment, the method further comprises:

receiving a user service request, wherein the user service request carries user information and an address to be requested;

and checking the network flow of the user in unit time according to the user information, calling a corresponding calling interface based on the address to be requested when the network flow meets a preset condition, and calling the demand factor data according to the calling interface.

In one embodiment, the method further comprises: storing the demand factor data in a blockchain.

A factor data acquisition apparatus, the apparatus comprising:

the extraction module is used for extracting source data with different data structures from a plurality of data sources according to preset frequency, and taking the extracted source data as heterogeneous multi-source data;

the processing module is used for loading the heterogeneous multi-source data into the data lake and carrying out layered processing on the loaded heterogeneous multi-source data in the data lake by utilizing the distributed processing component to obtain factor key value data;

and the deployment module is used for deploying the factor service corresponding to the factor key value data to the container and exposing the corresponding calling interface, and the calling interface is used for indicating the terminal to call the corresponding demand factor data.

A computer device comprising a memory storing a computer program and a processor implementing the steps of the method of any of the embodiments described above when the computer program is executed by the processor.

A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any of the above embodiments.

According to the factor data acquisition method, the factor data acquisition device, the computer equipment and the storage medium, source data with different data structures are extracted from a plurality of data sources at a preset frequency, and the extracted source data are used as heterogeneous multi-source data; the heterogeneous multi-source data is loaded into a data lake, and layered processing is performed on the loaded heterogeneous multi-source data in the data lake by using a distributed processing component to obtain factor key value data; and the factor service corresponding to the factor key value data is deployed to a container and a corresponding calling interface is exposed, wherein the calling interface is used by the terminal to call the corresponding demand factor data. Because the acquired heterogeneous multi-source data is processed uniformly in the data lake, there is no need to code a different algorithm for each data structure and run those algorithms separately; data processing efficiency is thereby improved, and with it the efficiency of factor data acquisition.

Drawings

FIG. 1 is a diagram of an application environment of a factor data acquisition method in one embodiment;

FIG. 2 is a schematic flow chart diagram illustrating a factor data acquisition method in one embodiment;

FIG. 3 is a schematic diagram of data flow provided in one embodiment;

FIG. 4 is a technical architecture diagram provided in one embodiment;

FIG. 5 is a block diagram showing the construction of a factor data obtaining apparatus according to an embodiment;

FIG. 6 is a diagram illustrating an internal structure of a computer device according to an embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

The factor data acquisition method provided by the application can be applied to the application environment shown in fig. 1. Wherein the terminal 102 communicates with the server 104 via a network. The server 104 extracts source data with different data structures from a plurality of data sources according to preset frequency, and takes the extracted source data as heterogeneous multi-source data; loading heterogeneous multi-source data to a data lake, and performing layered processing on the loaded heterogeneous multi-source data in the data lake by using a distributed processing component to obtain factor key value data; and deploying the factor service corresponding to the factor key value data to the container and exposing a corresponding calling interface, wherein the calling interface is used for indicating the terminal 102 to call the corresponding demand factor data. The terminal 102 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices, and the server 104 may be implemented by an independent server or a server cluster formed by a plurality of servers.

In one embodiment, as shown in fig. 2, a factor data obtaining method is provided, which is described by taking the method as an example applied to the server in fig. 1, and includes the following steps:

Step 202, extracting source data with different data structures from multiple data sources according to preset frequency, and taking the extracted source data as heterogeneous multi-source data.

The heterogeneous multi-source data refers to data with different structure types, and specifically may include structured data, semi-structured data, unstructured data, and the like. In one embodiment, obtaining heterogeneous multi-source data comprises: source data is extracted from a plurality of types of data sources at a predetermined frequency, the source data including at least one of structured data, semi-structured data, unstructured data, and binary data.

The preset frequency is the timing with which the source data is obtained. For example, the source data may be obtained once a day, once every preset period, or at a fixed time point; this is not limited herein. Moreover, because the source data may come from different databases, its data structure may differ from source to source.
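As a rough illustration of the timing options mentioned above (once a day, once every preset period, or at a fixed time point), the following sketch computes when the next extraction would run. The function names and parameters are hypothetical and are not taken from the application.

```python
from datetime import datetime, timedelta, time


def next_interval_run(last_run, interval):
    """Next run for 'once every preset period' (a daily run is interval = 1 day)."""
    return last_run + interval


def next_fixed_time_run(now, at):
    """Next run for 'at a fixed time point' each day."""
    candidate = datetime.combine(now.date(), at)
    # If today's slot has already passed, schedule the run for tomorrow.
    return candidate if candidate > now else candidate + timedelta(days=1)
```

A scheduler loop would call one of these functions after each extraction to decide how long to wait before the next one.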

Specifically, the heterogeneous multi-source data may be obtained from a plurality of different data sources, and different data sources may correspond to different databases: some data sources may correspond to a relational database and others to a non-relational database. The database types may include Oracle, MySQL, and the like. In a specific application scenario, the data sources may include data sources internal to a department, data sources external to the company, and the like, and external data may also be obtained through an API. Referring to fig. 3, which is a schematic diagram of data flow provided in an embodiment, the data sources may include data inside a resource, data internal to the group, data external to the group, and the like.
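The extraction step can be pictured with a minimal sketch that reads from a relational source, a JSON payload and a CSV file, and tags every record with its origin. It uses an in-memory SQLite database as a stand-in for Oracle or MySQL; the table, source names and sample values are made up for illustration.

```python
import csv
import io
import json
import sqlite3


def extract_relational(conn, table):
    """Read every row of a table as a dict (structured source)."""
    conn.row_factory = sqlite3.Row
    # The table name is interpolated only because it is a trusted demo constant.
    return [dict(row) for row in conn.execute(f"SELECT * FROM {table}")]


def extract_json(text):
    """Parse a JSON array of objects (semi-structured source, e.g. an API reply)."""
    return json.loads(text)


def extract_csv(text):
    """Parse CSV text into dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def tag(source, structure, records):
    """Wrap each record so its origin and structure type stay visible downstream."""
    return [{"source": source, "structure": structure, "payload": r} for r in records]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bond (code TEXT, price REAL)")
conn.execute("INSERT INTO bond VALUES ('B001', 101.5)")

batch = (
    tag("internal_db", "structured", extract_relational(conn, "bond"))
    + tag("vendor_api", "semi-structured", extract_json('[{"code": "B002", "rating": "AA"}]'))
    + tag("file_drop", "semi-structured", extract_csv("code,yield\nB003,3.2\n"))
)
```

The resulting `batch` mixes records with different fields, which is the sense in which the data is heterogeneous.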

Step 204, loading the heterogeneous multi-source data to a data lake, and performing layered processing on the loaded heterogeneous multi-source data in the data lake by using the distributed processing component to obtain factor key value data.

Here, a data lake is a way of storing data in its natural format in a system or repository, typically as object blobs or files, which makes it easy to hold data of various schemas and structural forms. The main idea of the data lake is to store all the data of the actual business in one place, and to convert the raw data into transformed data for tasks such as reporting, visualization, analysis and machine learning. The data in the data lake includes structured data from relational databases (rows and columns), semi-structured data (CSV, logs, XML, JSON), unstructured data (emails, documents, PDFs) and binary data (images, audio, video), forming one centralized data store that holds all forms of data. Because data with different structures is stored uniformly, different data share a consistent storage mode and are easy to join when used, which addresses the problem of data integration.
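To make the idea of storing everything in its natural format concrete, the toy class below keeps raw bytes untouched and records a small catalog entry for each object. It is an in-memory sketch only; the class, its methods and the key layout are assumptions for illustration, not a description of any particular data lake product.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class DataLake:
    objects: dict = field(default_factory=dict)  # object key -> raw bytes, stored as-is
    catalog: list = field(default_factory=list)  # metadata describing each stored object

    def put(self, source, fmt, raw):
        """Store raw bytes unchanged and register them in the catalog."""
        key = f"{source}/{hashlib.sha256(raw).hexdigest()[:12]}.{fmt}"
        self.objects[key] = raw
        self.catalog.append({"key": key, "source": source, "format": fmt, "size": len(raw)})
        return key

    def find(self, fmt):
        """List the keys of all objects stored with the given format."""
        return [entry["key"] for entry in self.catalog if entry["format"] == fmt]


lake = DataLake()
csv_key = lake.put("crm_db", "csv", b"id,name\n1,A\n")
pdf_key = lake.put("research", "pdf", b"%PDF-1.4 sample bytes")
```

Structured, semi-structured and binary inputs all go through the same `put` call, which is the uniform storage mode the paragraph above describes.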

Layered processing means processing the loaded heterogeneous multi-source data in several successive stages. With continued reference to fig. 3, the layered processing may specifically be implemented with an original layer (STG), a summary layer (ODS) and a data model layer (DW). The factor key value data refers to factor data stored in a key-value pair format. In this step, the heterogeneous multi-source data is processed layer by layer so that it is formatted uniformly; data of various structure types can then be processed in a unified way, which provides the data basis for subsequent factor data processing.
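One way to picture the three layers is as three small functions chained together. The sketch below assumes that STG lands records unchanged, ODS maps each source's fields onto one common schema, and DW emits key-value pairs; this division of work, the field mappings and the key format are all assumptions made for illustration.

```python
# Hypothetical per-source field mappings onto one common schema.
FIELD_MAP = {
    "db_a": {"bond_code": "code", "px": "price"},
    "api_b": {"id": "code", "lastPrice": "price"},
}


def stg_layer(raw_records):
    """STG: land the records unchanged (copied so later layers cannot mutate the input)."""
    return [dict(r) for r in raw_records]


def ods_layer(stg_records):
    """ODS: rename source-specific fields and unify types."""
    unified = []
    for record in stg_records:
        mapping = FIELD_MAP[record["source"]]
        row = {target: record[src] for src, target in mapping.items()}
        row["price"] = float(row["price"])
        unified.append(row)
    return unified


def dw_layer(ods_records):
    """DW: emit factor data as key-value pairs."""
    return {f"price:{r['code']}": r["price"] for r in ods_records}


raw = [
    {"source": "db_a", "bond_code": "B001", "px": "101.5"},
    {"source": "api_b", "id": "B002", "lastPrice": 99.8},
]
factor_kv = dw_layer(ods_layer(stg_layer(raw)))
```

In the embodiment the layers run on distributed components rather than in a single process, but the data flow has the same shape.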

The factor data may be native index factor data extracted directly from the heterogeneous multi-source data: representative and general original indexes are screened in the data lake, and the resulting indexes are used directly as part of the content of the factor library. Alternatively, the factor data may be derivative factor data obtained through multi-layer data processing in the data lake: a new factor is constructed by performing derivative calculations on original data such as the heterogeneous multi-source data. For example, a new factor can be obtained by calculating the rate of change of an original index, calculating the ratio of original indexes, calculating a time lag of an index, or by a mixed calculation based on these methods.
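The derivative calculations named above (rate of change, ratio, time lag) can be sketched on plain Python lists as follows. The function names are illustrative, and the exact formulas used in practice are not specified in the application; the rate of change here is assumed to be the period-over-period relative change.

```python
def change_rate(series):
    """Period-over-period rate of change: (x_t - x_{t-1}) / x_{t-1}."""
    return [(cur - prev) / prev for prev, cur in zip(series, series[1:])]


def ratio(numerator, denominator):
    """Element-wise ratio of two original indexes."""
    return [n / d for n, d in zip(numerator, denominator)]


def lag(series, periods):
    """Shift a series by `periods` steps; the leading values are unknown (None)."""
    n = min(periods, len(series))
    return [None] * n + series[: len(series) - n]
```

A mixed calculation simply composes these, for example `lag(change_rate(prices), 1)` for a lagged rate of change.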

The types of factor data may include industry factors. An industry factor is composed of indexes that are representative of an industry and that contribute significantly to explaining target variables (e.g., industry differences). For example, the industry factors may include real estate factors (real estate loans, real estate prices, etc.), clothing retail factors (product prices, product sales, import and export statistics, etc.), steel factors (product output, product prices, etc.), and the like. The types of factor data may include macro factors, which are made up of multidimensional data reflecting macroeconomic conditions such as inflation, imports and exports, money and credit, and business climate. The macro factors may specifically include inflation factors, business climate factors, interest rate factors, and money and credit factors. The types of factor data may include issuer factors, which are constructed to characterize the production and operating performance of an enterprise and the risk of the enterprise, and specifically include an enterprise valuation factor, a growth factor, a profitability factor, an asset structure factor, an operating efficiency factor, a scale factor, and the like. The types of factor data may include market factors. The planned content of the market factor library includes transaction data reflecting market fundamentals, rating data, market public opinion, sentiment data, and the like. Taking the funding sentiment index as an example, the data reflects how urgent funding needs are across the whole market, large banks, small and medium-sized banks, and non-bank institutions in the financial market.

In further embodiments, the product applications to which the factor key value data corresponds may include data products, factor tags, and synchronized data. In essence, a data product service builds a data pipeline that feeds data into the user's system according to the user's requirements. The data product service thus supports user systems, and the users it mainly serves are those who have customized requirements for data and who need secondary or deep processing of the data in their business. The data products specifically comprise core models, special topics, factor tags and in-depth analysis. The core models comprise a default early-warning model, a quantitative credit evaluation model and a bond pricing model; the special topics comprise industry portraits, bond portraits, enterprise portraits and an urban investment topic; the factor tags may specifically comprise public opinion factor tags, industry business climate factors, credit evaluation factor tags, risk attribution factors and the like; the in-depth analysis comprises in-depth bond fundamentals, interest rate timing, in-depth enterprise fundamentals, fund prediction and the like. In this embodiment, data productization means carrying out planned and organized product research and development according to the actual business requirements of users. Through the factor data service API, a user can build its own management platform, such as a financial management platform, segment markets and products, and select the information it needs, while continuously updated products are provided in line with the changing requirements of target user groups.

Performing layered processing on the loaded heterogeneous multi-source data by using a distributed processing component in the data lake to obtain factor key value data includes: using the distributed processing components Hive and Flink as processing components, processing the obtained heterogeneous multi-source data layer by layer with these components to obtain processed data, and storing the processed data as factor key value data in a key-value format, so that all factor key value data is stored in a consistent manner.

Step 206, deploying the factor service corresponding to the factor key value data to the container and exposing a corresponding calling interface, wherein the calling interface is used by a terminal to call the corresponding demand factor data.

The factor service includes factor data and management authority, and the management authority is used to manage authority for accessing corresponding factor data, and specifically may include whether the factor service has access authority, access traffic, access IP, and the like, and may further include information such as some preset configuration or log. In particular, the corresponding factor service may be deployed into a container, where the container may be specifically a K8S container, etc., without limitation.

The calling interface is provided for the user and can be called by the user to access specific data.

In one embodiment, performing hierarchical processing on the loaded heterogeneous multi-source data by using a distributed processing component in a data lake to obtain factor key value data includes: extracting target source data from heterogeneous multi-source data, filtering the target source data according to a preset filtering algorithm to obtain filtered target data, and loading the filtered target data to a data warehouse; dividing the data in the data warehouse into different theme databases according to a preset theme type; and performing characterization processing on the basis of different theme databases to obtain model data, and performing mixed processing on the model data to obtain a data mart, wherein the data mart comprises factor key value data.

The target source data is data extracted from the database corresponding to a data source. The filtering algorithm cleans and filters the obtained target source data, and may specifically filter out error data, control data, data that does not meet the specification, and the like. In one embodiment, filtering and cleaning the target source data according to a preset filtering algorithm to obtain filtered target data includes: filtering and cleaning the target source data based on a machine cleaning algorithm to obtain filtered target data, wherein the machine cleaning algorithm comprises at least one of automatic verification, logical comparison, missing value supplement, format cleaning and the like.
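A minimal sketch of such machine cleaning is given below. It assumes records with `code`, `date` and `value` fields, treats missing fields and unparseable dates as failed automatic verification, normalizes codes and dates as format cleaning, and fills a missing value with the last value seen for the same code. These rules and field names are assumptions chosen for illustration.

```python
from datetime import datetime


def clean(records):
    """Filter and clean records; returns only rows that pass every check."""
    cleaned, last_seen = [], {}
    for record in records:
        # Automatic verification: the identifying fields must be present.
        if "code" not in record or "date" not in record:
            continue
        # Format cleaning: trim and upper-case codes, normalize dates to ISO format.
        code = str(record["code"]).strip().upper()
        try:
            text = str(record["date"]).strip().replace("/", "-")
            day = datetime.strptime(text, "%Y-%m-%d").date()
        except ValueError:
            continue
        # Missing value supplement: reuse the last value seen for this code.
        value = record.get("value")
        if value is None:
            if code not in last_seen:
                continue
            value = last_seen[code]
        value = float(value)
        last_seen[code] = value
        cleaned.append({"code": code, "date": day.isoformat(), "value": value})
    return cleaned


rows = [
    {"code": " b001 ", "date": "2021/01/11", "value": "3.2"},
    {"code": "B001", "date": "2021-01-12", "value": None},
    {"code": "B002", "date": "not a date", "value": 1.0},
    {"date": "2021-01-12", "value": 2.0},
]
filtered = clean(rows)
```

Only the first two rows survive here: the third fails date validation and the fourth lacks a code.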

In one embodiment, performing hierarchical processing on loaded heterogeneous multi-source data by using a distributed processing component in a big data lake to obtain factor key value data includes: and performing at least one layer of processing such as data source access, data ETL conversion, theme library construction, unstructured data extraction, characteristic engineering processing and the like on the heterogeneous multi-source data to obtain processed data.

ETL (Extract-Transform-Load) describes the process of extracting (Extract) data from a source end, transforming (Transform) it, and loading (Load) it into a destination end. The ETL process is essentially a data flow in which data flows from different data sources to different targets. ETL is an important step in building a big data lake: the required data is extracted from the data sources, cleaned, and finally loaded into a data warehouse according to a predefined data warehouse model. In a specific embodiment, the transformation stage of ETL may include null value processing, normalizing data formats, splitting data, verifying data correctness, replacing data and the like. The factor library is obtained through ETL processing, and the steps of feature engineering, model establishment and model testing can further be executed on the data in the factor library, so that the model can finally be applied.
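The transformation operations listed above can be illustrated with a small extract, transform and load chain. The record layout (an `issuer_code` field holding "issuer|code"), the rating alias table and the dictionary standing in for the warehouse are all hypothetical.

```python
# Hypothetical replacement table for inconsistent rating spellings.
RATING_ALIASES = {"aa+": "AA+", "double-a-plus": "AA+"}


def extract(source):
    """Extract: pull all rows from an iterable source."""
    return list(source)


def transform(rows):
    """Transform: split, fill nulls, replace values, and verify each row."""
    out = []
    for row in rows:
        issuer, _, code = row["issuer_code"].partition("|")  # splitting data
        amount = row.get("amount")
        amount = 0.0 if amount in (None, "") else float(amount)  # null value processing
        raw_rating = row["rating"]
        rating = RATING_ALIASES.get(raw_rating.lower(), raw_rating.upper())  # replacing data
        if not code:  # verifying correctness: a row without a code is rejected
            continue
        out.append({"code": code, "issuer": issuer, "rating": rating, "amount": amount})
    return out


def load(rows, warehouse):
    """Load: write rows into the target, keyed by code."""
    for row in rows:
        warehouse[row["code"]] = row
    return warehouse


source = [
    {"issuer_code": "Issuer A|B001", "rating": "aa+", "amount": ""},
    {"issuer_code": "Issuer B", "rating": "A", "amount": "5"},
]
warehouse = load(transform(extract(source)), {})
```

The second row is dropped because its combined field contains no code to split out.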

In one embodiment, dividing the data in the data warehouse into different theme databases according to a preset theme type includes: acquiring a service demand, and determining a theme type based on the service demand; and clustering the data in the data warehouse according to the theme types to obtain theme databases respectively corresponding to different theme types.

The theme types may specifically include a bond library, an enterprise library, a market library, an industry library, a macro library, and the like. It should be noted that the theme types may be adapted to the service scenario, which is not limited herein. In one embodiment, performing characterization processing based on the different theme databases to obtain model data includes: performing characterization processing based on a preset AI model algorithm, a preset BI model algorithm, a preset credit evaluation model algorithm and the like to obtain model data.
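The division of warehouse data into theme databases can be sketched as a grouping step. The application speaks of clustering by theme type; the example below substitutes a simple rule-based grouping on a hypothetical `category` field, with made-up library names, rather than any specific clustering algorithm.

```python
from collections import defaultdict

# Hypothetical rule table mapping a record's category to a theme type.
THEME_RULES = {
    "bond": "bond_library",
    "issuer": "enterprise_library",
    "quote": "market_library",
    "gdp": "macro_library",
}


def build_theme_databases(warehouse_rows, rules=THEME_RULES):
    """Group warehouse rows into theme databases; unknown categories are kept aside."""
    themes = defaultdict(list)
    for row in warehouse_rows:
        theme = rules.get(row["category"], "unclassified")
        themes[theme].append(row)
    return dict(themes)
```

Changing the rule table is then enough to adapt the theme types to a different service scenario.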

The characterization processing may specifically include processing the data in the different theme databases according to business requirements; its modes include characterization processing by an AI model, characterization processing based on a BI model, characterization processing based on a credit evaluation model, and the like. In a particular embodiment, the characterization processing includes calculating data such as a rate-of-return factor. The characterization processing may further include an AI label extraction step, which may specifically use algorithms such as OCR image analysis, knowledge distillation models, NLP semantic analysis and event correlation capture. Feature engineering may also include generating a single-factor report, which includes the single-factor rate-of-return curve and the IR, IC and RIC history curves, among others. Feature engineering may further include factor validity screening, which specifically includes grouping verification and IC, IR and RIC verification, as well as weighted combination of factors and the like. After the factor feature engineering is executed, models can be established, specifically including random forest, GBDT, logistic regression and linear regression models. The models are then tested, where the model tests include strategy backtesting, ROC testing and the like. Finally, the models can be applied, where model applications include risk attribution, return attribution, stock selection strategies and the like.
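The application does not define IC, RIC or IR. Under the common quantitative-finance reading, which is an assumption here, IC is the correlation between factor values and subsequent returns, RIC is the same correlation computed on ranks, and IR is the mean of an IC series divided by its standard deviation. A plain-Python sketch under that reading:

```python
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def ranks(values):
    """Ranks 1..n in ascending order; ties are not averaged in this simplified sketch."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def ic(factor, next_returns):
    """Information coefficient: correlation of factor values with next-period returns."""
    return pearson(factor, next_returns)


def rank_ic(factor, next_returns):
    """Rank IC: the same correlation computed on ranks."""
    return pearson(ranks(factor), ranks(next_returns))


def ir(ic_series):
    """Information ratio of a factor: mean IC over its sample standard deviation."""
    return mean(ic_series) / stdev(ic_series)
```

Validity screening could then keep only the factors whose IC and IR exceed thresholds chosen by the analyst.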

A data mart meets the requirements of a specific department or group of users. It is stored in a multidimensional way, including the defined dimensions, the indexes to be calculated, the hierarchy of the dimensions and the like, and generates a data cube oriented to decision analysis. The factor key value data is factor data stored in a key-value pair format; specifically, the data mart corresponds to a data product mart APP, in which the fields related to the data products are stored in KeyValue form. The data product mart APP and the external data tables share the same set of files, so only one file has to be modified when a change is made, and expired data can be deleted by configuring a TTL, which saves storage space and simplifies management.
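The effect of key-value storage with a TTL can be illustrated with a small in-memory store. This is a toy stand-in and not the storage engine used in the embodiment; the class name, the lazy-deletion strategy and the key layout in the usage note are assumptions.

```python
import time


class TTLKeyValueStore:
    """Key-value store whose entries expire `ttl_seconds` after they are written."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._data = {}     # key -> (value, expiry time)

    def put(self, key, value):
        self._data[key] = (value, self.clock() + self.ttl)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self.clock() >= expires_at:
            del self._data[key]  # expired data is deleted lazily on access
            return None
        return value

    def __len__(self):
        return len(self._data)
```

A factor value might be written as `store.put("profitability:ISSUER1:2021-01-12", 0.18)`; after the TTL elapses, `get` returns `None` and the entry is removed.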

In the embodiment, in a specific application scenario, data unification processing is performed on the obtained heterogeneous multi-source data by combining a big data lake technology, and finally factor key value data is obtained, so that unification processing on different data source data is realized, and the data processing efficiency is improved.

In one embodiment, performing characterization processing based on the different theme databases to obtain model data, and performing mixed processing on the model data to obtain a data mart, includes: performing data extraction on data in the theme databases based on business requirements to obtain extracted business data; acquiring a data conversion algorithm, and performing data conversion processing on the extracted business data according to the data conversion algorithm to obtain converted data; and loading the converted data to obtain loaded data, and using the loaded data as the data mart.

In one embodiment, deploying factor services corresponding to factor key value data to a container includes: acquiring factor key value data and factor management configuration corresponding to the factor key value data; determining factor service according to the factor key value data and the factor management configuration; factor services are deployed to the service container.

In one embodiment, deploying the factor service corresponding to the factor key value data to the container and exposing the corresponding calling interface includes: deploying the factor service corresponding to the factor key value data to a K8S container; configuring management settings in the service, wherein the management settings comprise permission settings, access tracking settings, IP binding settings, flow control settings, application settings and the like, together with configurations, logs and the like; and exposing a RESTful calling interface to the outside.

K8S (Kubernetes) deploys applications in containers. Containers are isolated from one another: each container has its own file system, processes in different containers cannot affect one another, and computing resources can be partitioned. Compared with a virtual machine, a container can be deployed rapidly, and because it is decoupled from the underlying infrastructure and the host file system, it can be migrated between different clouds and different versions of operating systems. Moreover, a container occupies fewer resources and is fast to deploy. Each application can be packaged into a container image, and the one-to-one relationship between application and container gives containers a further advantage: the container image can be created for the application at the build or release stage, and because each application does not need to be combined with the rest of the application stack and does not depend on the production environment infrastructure, a consistent environment can be provided from development through testing to production. Containers are also lighter and more transparent than virtual machines, which makes them more convenient to monitor and manage.

When the factor service is deployed, data security management is improved by configuring the deployment's management settings, specifically permission settings, access tracking, IP binding, flow control, application settings and the like.
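Three of the management settings listed above (permission settings, IP binding and access tracking) can be sketched as a small configuration object that decides whether a call is allowed. The class and field names are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass, field


@dataclass
class FactorServiceConfig:
    name: str
    allowed_users: set = field(default_factory=set)  # permission setting
    allowed_ips: set = field(default_factory=set)    # IP binding
    access_log: list = field(default_factory=list)   # access tracking

    def authorize(self, user, ip):
        """Allow the call only if both the user and the caller IP are registered."""
        allowed = user in self.allowed_users and ip in self.allowed_ips
        # Every attempt is recorded, whether or not it was allowed.
        self.access_log.append((user, ip, allowed))
        return allowed


config = FactorServiceConfig("industry-factors", {"alice"}, {"10.0.0.5"})
```

Flow control is a separate concern that would sit alongside this check.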

Moreover, by exposing a RESTful interface to the outside, users can call the factor data quickly and simply. Specifically, RESTful is a design style and development approach for web applications that is based on HTTP and may use XML or JSON formats. RESTful is suitable for scenarios in which a mobile internet vendor provides a service-enabling interface, so that third-party OTT applications can call mobile network resources; the action types are creating, modifying and deleting the called resources.
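A read-only RESTful endpoint for factor data might look like the sketch below, built on Python's standard `http.server` module. The URL layout `/factors/<key>`, the sample key and the sample value are assumptions for illustration; a production service would sit behind the gateway and management settings described elsewhere.

```python
import json
from http.server import BaseHTTPRequestHandler

FACTOR_STORE = {"inflation:CN:2021-01": 0.3}  # made-up sample value


def handle_get(path, store=FACTOR_STORE):
    """Map GET /factors/<key> to a (status code, JSON body) pair."""
    prefix = "/factors/"
    if not path.startswith(prefix):
        return 404, json.dumps({"error": "unknown resource"})
    key = path[len(prefix):]
    if key not in store:
        return 404, json.dumps({"error": "factor not found"})
    return 200, json.dumps({"key": key, "value": store[key]})


class FactorHandler(BaseHTTPRequestHandler):
    """HTTP handler that serves factor values as JSON."""

    def do_GET(self):
        status, body = handle_get(self.path)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)
```

To try it locally, the handler can be served with `http.server.HTTPServer(("127.0.0.1", 8000), FactorHandler).serve_forever()`. Keeping the routing in `handle_get` makes it testable without opening a socket.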

In one embodiment, the method further comprises: receiving a user service request, wherein the user service request carries user information and an address to be requested; and checking the network flow of the user in unit time according to the user information, calling a corresponding calling interface based on the address to be requested when the network flow meets a preset condition, and calling the demand factor data according to the calling interface.

In one embodiment, the method further comprises: setting a single-user unit-time flow limiting value, so that the user executes the step of calling the calling interface in accordance with the set single-user unit-time flow limiting value.

The single-user unit-time flow limiting value refers to the data flow allowed for each user per unit time. By applying flow limiting control to a single user, different users access data more evenly and resources are distributed in a balanced way.
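A minimal sketch of per-user flow limiting is given below, assuming the limit is expressed as a maximum number of requests within a sliding time window. The application does not state whether the limit counts requests or bytes, so the request-count interpretation, the class name `PerUserRateLimiter`, and its parameters are all assumptions.

```python
from collections import defaultdict, deque


class PerUserRateLimiter:
    """Sliding-window limiter: at most max_requests per user per window."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)  # user_id -> timestamps of accepted requests

    def allow(self, user_id: str, now: float) -> bool:
        """Return True and record the request if the user is under the limit."""
        history = self._history[user_id]
        # Discard timestamps that have fallen out of the window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.max_requests:
            return False
        history.append(now)
        return True
```

A gateway would call `allow(user_id, time.monotonic())` before forwarding a request to the calling interface and reject the request when it returns `False`. Passing the timestamp in explicitly keeps the limiter deterministic and easy to test.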

Fig. 4 is a technical architecture diagram provided in an embodiment. As shown in fig. 4, the architecture includes a gateway layer, an application layer, a data integration layer, and data sources. Users execute the data calling service at the gateway layer, where flow limiting control is applied to each single user, so that different users access data more evenly and resources are distributed in a balanced way.

In one embodiment, the factor data obtaining device further comprises a storage module, and the storage module is used for storing the demand factor data into the block chain.

It is emphasized that, to further ensure the privacy and security of the demand factor data, the demand factor data may also be stored in a node of a block chain.
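The tamper-evidence property that motivates block chain storage can be illustrated with a simple hash chain. This is only a sketch: it is a single in-process list rather than a distributed ledger, and the block layout and the function names `append_block` and `verify_chain` are assumptions, not part of the application.

```python
import hashlib
import json


def block_hash(index: int, prev_hash: str, payload: dict) -> str:
    """Hash the block contents deterministically with SHA-256."""
    body = json.dumps(
        {"index": index, "prev_hash": prev_hash, "payload": payload},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain: list, payload: dict) -> dict:
    """Append a block that links to the hash of the previous block."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    block = {
        "index": index,
        "prev_hash": prev_hash,
        "payload": payload,
        "hash": block_hash(index, prev_hash, payload),
    }
    chain.append(block)
    return block


def verify_chain(chain: list) -> bool:
    """Recompute every hash and link; any edited block makes this False."""
    prev_hash = "0" * 64
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(i, prev_hash, block["payload"]):
            return False
        prev_hash = block["hash"]
    return True
```

Because each block commits to the hash of its predecessor, changing stored factor data in any block invalidates that block and every block after it.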

With continued reference to fig. 3, the server obtains heterogeneous multi-source data, that is, data from different data sources whose databases differ in data content and data storage structure, including both structured and unstructured data. For example, in an embodiment, the target demand factors include 10 factors to be obtained, of which 3 may need to be extracted from data source 1, 2 from data source 2, 5 from data source 3, and so on. Under this scheme, HBase only needs to interface with a single data product interface table rather than with each heterogeneous data source separately, which greatly improves data processing efficiency.

Specifically, the technical implementation steps of the data integration layer include the following. The heterogeneous multi-source data is loaded into a big data lake, and the distributed processing components HIVE and Flink are used as processing tools to perform layered data processing, specifically in three layers: STG, ODS, and DW. The processing covers 5 modes: data source access, data ETL conversion, theme library building, unstructured data extraction, and feature engineering processing, so that both structured and unstructured data are processed in these 5 modes. In addition, data cluster generation is implemented with distributed computing: DASK ETL and the like are implemented in Python, and technologies such as PYSPARK and PYHIVE are used for distributed computing, which improves the efficiency of data cluster generation. The processed factor data is stored in HBase in KEY-VALUE form, and the factor service is deployed to K8S and exposes a RESTful interface for external users to call. The RESTful interface is simple in form and convenient for users to call directly.
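The layered flow from raw source records to factor key-value data can be sketched in plain Python as follows. The layer names STG, ODS, and DW come from the text; the per-source field mappings in `FIELD_MAPS`, the sample field names, and the `entity:date:factor` key format are hypothetical, and a real deployment would run equivalent logic on the distributed components rather than on in-memory lists.

```python
# Hypothetical per-source field mappings onto one common schema.
FIELD_MAPS = {
    "source_1": {"code": "entity", "trade_date": "date", "pe": "pe_ratio"},
    "source_2": {"ticker": "entity", "dt": "date", "pb": "pb_ratio"},
}


def to_stg(source_name: str, records: list) -> list:
    """STG layer: land raw records unchanged, tagged with their origin."""
    return [{"source": source_name, "raw": r} for r in records]


def to_ods(stg_rows: list) -> list:
    """ODS layer: rename source-specific fields to the common schema."""
    out = []
    for row in stg_rows:
        mapping = FIELD_MAPS[row["source"]]
        out.append({mapping[k]: v for k, v in row["raw"].items() if k in mapping})
    return out


def to_dw(ods_rows: list) -> dict:
    """DW layer: merge rows that describe the same entity and date."""
    merged = {}
    for row in ods_rows:
        key = (row["entity"], row["date"])
        factors = {k: v for k, v in row.items() if k not in ("entity", "date")}
        merged.setdefault(key, {}).update(factors)
    return merged


def to_key_value(dw: dict) -> dict:
    """Flatten the merged rows into factor key-value pairs."""
    return {
        f"{entity}:{date}:{name}": value
        for (entity, date), factors in dw.items()
        for name, value in factors.items()
    }
```

With this shape, a downstream store only ever sees one uniform key-value layout, regardless of how many differently structured sources fed the STG layer.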

In a specific embodiment, with continued reference to fig. 3, in the data source access step shown in fig. 3, the database types corresponding to different data sources differ, so the obtained data structures differ, and the accessed heterogeneous multi-source data therefore needs to be processed. Specifically, for data access, the system performs data protocol conversion on the accessed data and sends the data downstream after data monitoring and data sorting. The data access API supports multiple programming languages and access modes, such as JAVA, C++, Restful, and batch file import over FTP/SFTP, and also supports controlling the data release frequency through configuration, so that adverse effects on the system are avoided. The accessed native data of the multiple data sources is structurally standardized, the quality of the data sources is checked and cleaned, and the permissions of the data sources are controlled and managed.

The method also includes a data processing step. For data processing, a data standardization system is established to standardize and integrate the obtained heterogeneous multi-source data, so as to process and monitor the heterogeneous multi-source data and achieve real-time data interaction. Through certain computational logic, the native data of the multiple data sources is aggregated and processed a second time to supplement data fields, such as the daily high, the daily volume, and the inner-market volume, and the processed data is checked and cleaned again. A standardized data format is generated based on a data model, and data is cached according to the service scenario, including caching of time series data, end-of-day processing, and TICK data storage related to historical data. Each product is customized using a product model (model concept), in which each piece of data corresponds to information in the underlying basic database and is integrated for unified management.

The method also includes a data publishing step. For data publishing, data is published downstream in a targeted manner according to the application scenario. Data publishing includes establishing the data sending API interface system, data security management, data encryption/decryption, certificate and user verification, data authorization management, flow monitoring, access tracking, event-tracking monitoring, application settings, and establishing a unified internal and external data protocol.
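The secondary processing that supplements derived fields can be illustrated with a small aggregation over tick records, computing a daily high and a daily volume per symbol. The tick record layout (`symbol`, `date`, `price`, `volume`) and the function name `daily_summary` are assumptions made for this sketch.

```python
from collections import defaultdict


def daily_summary(ticks: list) -> dict:
    """Aggregate tick records into daily high and daily volume per (symbol, date)."""
    summary = defaultdict(lambda: {"daily_high": float("-inf"), "daily_volume": 0})
    for tick in ticks:
        row = summary[(tick["symbol"], tick["date"])]
        row["daily_high"] = max(row["daily_high"], tick["price"])
        row["daily_volume"] += tick["volume"]
    return dict(summary)
```

The result is keyed by symbol and date, so the derived fields can be merged back into the standardized records before the repeat quality check.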

Meanwhile, in production practice, the behavior of each user can be tracked through API event-tracking monitoring, and the usage of key flows can be counted. Once infrastructure and product monitoring are relatively complete, BI can be brought into the monitoring system; the analysis of business data can be submitted to management as a value-added service and used as a reference indicator for business matters such as business growth models, product pressure, feature preference, and behavior profile construction.
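A minimal sketch of such usage tracking is shown below: each API call is recorded against the calling user and endpoint, and totals per endpoint can then be reported. The `UsageTracker` class and its method names are hypothetical; a production system would typically emit these events to a monitoring backend instead of keeping them in memory.

```python
from collections import Counter


class UsageTracker:
    """Counts API calls per (user, endpoint) for usage statistics."""

    def __init__(self):
        self.by_user_endpoint = Counter()

    def record(self, user_id: str, endpoint: str) -> None:
        """Record one call made by a user to an endpoint."""
        self.by_user_endpoint[(user_id, endpoint)] += 1

    def endpoint_totals(self) -> Counter:
        """Total calls per endpoint across all users."""
        totals = Counter()
        for (_, endpoint), count in self.by_user_endpoint.items():
            totals[endpoint] += count
        return totals
```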

In an actual business scenario, such as the correction and optimization of investment strategies and models, how to achieve standardized data output in order to meet the goals of strategy correction and model optimization is a technical problem that urgently needs to be solved. In this embodiment, the obtained heterogeneous multi-source data is processed in layers to obtain factor key value data with a uniform structure, and the factor service corresponding to the factor key value data is deployed to provide the corresponding calling interface. When a user needs factor data, the user only needs to call the corresponding calling interface directly, without separately interfacing with different types of source databases, which improves the efficiency of obtaining factor data. The API factor library provides the factor label transmission and model result output needed by various models and business scenarios; combined with the static databases corresponding to the data sources, these jointly meet actual business requirements such as backtesting of user investments, valuation analysis, and performance evaluation of investment portfolios.

According to the method and the device, unified factor key value data is obtained by processing the acquired heterogeneous multi-source data, so that non-uniform heterogeneous multi-source data is handled in a unified way, subsequent users can call the required data from the unified factor key value data, and the efficiency with which users acquire data is improved. Data is processed in the big data lake in a targeted manner according to specific service requirements, so that the resulting target data suits the specific application scenario, and processing through the distributed components also improves data processing efficiency. By applying flow limiting control to a single user, different users access data more evenly and resources are distributed in a balanced way. By exposing the RESTful interface externally, users can call the data quickly and simply.

It should be understood that, although the steps in the flowchart of fig. 2 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in fig. 2 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

In one embodiment, as shown in fig. 5, there is provided a factor data acquisition apparatus including:

the extracting module 502 is configured to extract source data with different data structures from multiple data sources according to a preset frequency, and use the extracted source data as heterogeneous multi-source data.

The processing module 504 is configured to load the heterogeneous multi-source data into the data lake, and perform layered processing on the loaded heterogeneous multi-source data by using the distributed processing component in the data lake to obtain factor key value data.

A deployment module 506, configured to deploy the factor service corresponding to the factor key value data to the container and expose a corresponding call interface, where the call interface is used to instruct the terminal to call the corresponding demand factor data.

In one embodiment, the processing module 504 is further configured to extract target source data from the heterogeneous multi-source data, filter the target source data according to a preset filtering algorithm to obtain filtered target data, and load the filtered target data into the data warehouse; dividing the data in the data warehouse into different theme databases according to a preset theme type; and performing characterization processing on the basis of different theme databases to obtain model data, and performing mixed processing on the model data to obtain a data mart, wherein the data mart comprises factor key value data.

In one embodiment, the processing module 504 is further configured to obtain a business requirement, and determine a topic type based on the business requirement; and clustering the data in the data warehouse according to the theme types to obtain theme databases respectively corresponding to different theme types.
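The division of warehouse data into theme databases can be sketched as follows. This sketch assumes each warehouse row already carries a `topic` label and simply groups rows by it; it is a rule-based grouping, not a clustering algorithm, and the `TOPICS_BY_REQUIREMENT` mapping and its entries are hypothetical.

```python
# Hypothetical mapping from a business requirement to the theme types it needs.
TOPICS_BY_REQUIREMENT = {
    "valuation_analysis": ["valuation", "fundamentals"],
    "performance_evaluation": ["returns"],
}


def build_topic_databases(warehouse_rows: list, requirement: str) -> dict:
    """Determine theme types from the requirement, then group rows by theme."""
    topics = TOPICS_BY_REQUIREMENT[requirement]
    databases = {topic: [] for topic in topics}
    for row in warehouse_rows:
        if row["topic"] in databases:
            databases[row["topic"]].append(row)
    return databases
```

Rows whose theme is not needed by the requirement are skipped, so each requirement yields only the theme databases it actually uses.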

In one embodiment, the processing module 504 is further configured to perform data extraction on data in the theme database based on business requirements to obtain extracted business data; acquire a data conversion algorithm, and perform data conversion processing on the extracted business data according to the data conversion algorithm to obtain converted data; and load the converted data to obtain loaded data, and use the loaded data as a data mart.

In one embodiment, the deployment module 506 is further configured to obtain factor key value data and a factor management configuration corresponding to the factor key value data; determining factor service according to the factor key value data and the factor management configuration; factor services are deployed to the service container.

In one embodiment, the factor data obtaining device further comprises a checking module, wherein the checking module is used for receiving a user service request, and the user service request carries user information and an address to be requested; and checking the network flow of the user in unit time according to the user information, calling a corresponding calling interface based on the address to be requested when the network flow meets a preset condition, and calling the demand factor data according to the calling interface.

For the specific definition of the factor data obtaining device, reference may be made to the above definition of the factor data obtaining method, which is not repeated here. The modules in the factor data acquisition device can be implemented wholly or partially by software, hardware, or a combination thereof. The modules can be embedded in, or independent of, a processor in the computer device in hardware form, or can be stored in a memory in the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.

In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 6. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used to store metadata. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a factor data acquisition method.

Those skilled in the art will appreciate that the architecture shown in fig. 6 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or fewer components than those shown, or may combine certain components, or have a different arrangement of components.

In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program: extracting source data with different data structures from a plurality of data sources according to preset frequency, and taking the extracted source data as heterogeneous multi-source data; loading heterogeneous multi-source data to a data lake, and performing layered processing on the loaded heterogeneous multi-source data in the data lake by using a distributed processing component to obtain factor key value data; and deploying the factor service corresponding to the factor key value data to the container and exposing a corresponding calling interface, wherein the calling interface is used for indicating the terminal to call the corresponding demand factor data.

In one embodiment, the processor, when executing the computer program, performs the steps of: extracting target source data from heterogeneous multi-source data, filtering the target source data according to a preset filtering algorithm to obtain filtered target data, and loading the filtered target data to a data warehouse; dividing the data in the data warehouse into different theme databases according to a preset theme type; and performing characterization processing on the basis of different theme databases to obtain model data, and performing mixed processing on the model data to obtain a data mart, wherein the data mart comprises factor key value data.

In one embodiment, the processor, when executing the computer program, performs the steps of: acquiring a service demand, and determining a theme type based on the service demand; and clustering the data in the data warehouse according to the theme types to obtain theme databases respectively corresponding to different theme types.

In one embodiment, the processor, when executing the computer program, performs the steps of: performing data extraction on data in the theme database based on business requirements to obtain extracted business data; acquiring a data conversion algorithm, and performing data conversion processing on the extracted business data according to the data conversion algorithm to obtain converted data; and loading the converted data to obtain loaded data, and using the loaded data as a data mart.

In one embodiment, the processor, when executing the computer program, performs the steps of: acquiring factor key value data and factor management configuration corresponding to the factor key value data; determining factor service according to the factor key value data and the factor management configuration; factor services are deployed to the service container.

In one embodiment, the processor, when executing the computer program, performs the steps of: receiving a user service request, wherein the user service request carries user information and an address to be requested; and checking the network flow of the user in unit time according to the user information, calling a corresponding calling interface based on the address to be requested when the network flow meets a preset condition, and calling the demand factor data according to the calling interface.

In one embodiment, the processor, when executing the computer program, performs the steps of: and storing the demand factor data into the block chain.

In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of: extracting source data with different data structures from a plurality of data sources according to preset frequency, and taking the extracted source data as heterogeneous multi-source data; loading heterogeneous multi-source data to a data lake, and performing layered processing on the loaded heterogeneous multi-source data in the data lake by using a distributed processing component to obtain factor key value data; and deploying the factor service corresponding to the factor key value data to the container and exposing a corresponding calling interface, wherein the calling interface is used for indicating the terminal to call the corresponding demand factor data.

In one embodiment, the computer program when executed by the processor further performs the steps of: extracting target source data from heterogeneous multi-source data, filtering the target source data according to a preset filtering algorithm to obtain filtered target data, and loading the filtered target data to a data warehouse; dividing the data in the data warehouse into different theme databases according to a preset theme type; and performing characterization processing on the basis of different theme databases to obtain model data, and performing mixed processing on the model data to obtain a data mart, wherein the data mart comprises factor key value data.

In one embodiment, the computer program when executed by the processor further performs the steps of: acquiring a service demand, and determining a theme type based on the service demand; and clustering the data in the data warehouse according to the theme types to obtain theme databases respectively corresponding to different theme types.

In one embodiment, the computer program when executed by the processor further performs the steps of: performing data extraction on data in the theme database based on business requirements to obtain extracted business data; acquiring a data conversion algorithm, and performing data conversion processing on the extracted business data according to the data conversion algorithm to obtain converted data; and loading the converted data to obtain loaded data, and using the loaded data as a data mart.

In one embodiment, the computer program when executed by the processor further performs the steps of: acquiring factor key value data and factor management configuration corresponding to the factor key value data; determining factor service according to the factor key value data and the factor management configuration; factor services are deployed to the service container.

In one embodiment, the computer program when executed by the processor further performs the steps of: receiving a user service request, wherein the user service request carries user information and an address to be requested; and checking the network flow of the user in unit time according to the user information, calling a corresponding calling interface based on the address to be requested when the network flow meets a preset condition, and calling the demand factor data according to the calling interface.

In one embodiment, the computer program when executed by the processor further performs the steps of: and storing the demand factor data into the block chain.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware related to instructions of a computer program, which can be stored in a non-volatile computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical storage, or the like. Volatile Memory can include Random Access Memory (RAM) or external cache Memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.

The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

The above examples only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. A factor data acquisition method, the method comprising:

loading the heterogeneous multi-source data to a data lake, and performing layered processing on the loaded heterogeneous multi-source data in the data lake by using a distributed processing component to obtain factor key value data;

and deploying the factor service corresponding to the factor key value data to a container and exposing a corresponding calling interface, wherein the calling interface is used for indicating a terminal to call the corresponding demand factor data.

2. The method of claim 1, wherein the performing hierarchical processing on the loaded heterogeneous multi-source data in the data lake by using distributed processing components to obtain factor key value data comprises:

extracting target source data from the heterogeneous multi-source data, filtering the target source data according to a preset filtering algorithm to obtain filtered target data, and loading the filtered target data to a data warehouse;

3. The method according to claim 2, wherein the dividing the data in the data warehouse into different topic databases according to a preset topic type comprises:

4. The method of claim 3, wherein the characterizing the different topic databases to obtain model data and the blending the model data to obtain data marts comprises:

performing data extraction on the data in the theme database based on the business requirement to obtain extracted business data;

5. The method of claim 1, wherein deploying the factor service corresponding to the factor key value data to a container comprises:

deploying the factor service to a service container.

6. The method according to any one of claims 1 to 5, further comprising:

7. The method according to any one of claims 1 to 5, further comprising:

and storing the demand factor data into a block chain.

8. A factor data acquisition apparatus, characterized in that the apparatus comprises:

the processing module is used for loading the heterogeneous multi-source data to a data lake and carrying out layered processing on the loaded heterogeneous multi-source data in the data lake by utilizing a distributed processing component to obtain factor key value data;

and the deployment module is used for deploying the factor service corresponding to the factor key value data to a container and exposing a corresponding calling interface, and the calling interface is used for indicating a terminal to call the corresponding demand factor data.

9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 7.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.