Detailed Description
For the understanding of those skilled in the art, the present invention will be further described with reference to the following examples and drawings, which are not intended to limit the present invention.
An embodiment of the invention provides a multi-source data integration system based on big data technology. As shown in Fig. 1, the system comprises a service end processing device 1, an interface processing device 2, a big data cluster device 3, a task monitoring device 4 and an external data source 5.
Specifically, the service end processing device 1 may comprise a plurality of service ends, each serving as the data interface of a specific service. A user initiates a data query request through a service end and may attach one or more parameters to the request. The service end processing device 1 may also generate a data query request table and record each data query request initiated by a user in that table; the data query request is then sent to the interface processing device 2.
The interface processing device 2 is used for receiving a data query request from a service end and analyzing it; determining, according to the analysis result, whether a new data retrieval request needs to be generated for the data query request (deduplication is required) and, if so, recording the new data retrieval request in a data retrieval request table; analyzing the association conditions of the data retrieval request to determine the corresponding data interface or data crawling channel; and then initiating the corresponding data interface request or data crawling request and starting an interface data acquisition program and/or a data crawling program to acquire the corresponding data.
For different data, different acquisition channels exist; some external data sources open data interfaces for the big data integrated system, and the interface processing device 2 can be directly connected to a database of the external data sources through the interfaces to acquire corresponding data; some external data sources are public data and require crawling of the data by a crawler.
After acquiring the corresponding data, the interface processing device 2 associates the data, according to the records in the data calling request table, with the corresponding data calling request in that table, indicating that the data was acquired for that request. At the same time, it reads the rule conditions relevant to the data calling request from the task monitoring device 4, establishes associations among the data according to those rule conditions, and stores the data association relationships, so that the relevant data can subsequently be called directly through the association relationships without acquiring the external data again. The data can also be used for subsequent analysis, mining, integration and the like, which enhances the reusability and high availability of the data and reduces data redundancy.
After establishing the data associations, the interface processing device 2 sends the acquired data to the big data cluster device 3, which analyzes and processes it into a data result that the service end requires and can accept.
That is, after acquiring the relevant data from the external data source 5, the interface processing device 2 stores the data only temporarily: it neither keeps the data long-term nor analyzes it. It merely establishes the data relationship associations and then sends the data to the big data cluster device 3 for in-depth analysis and processing. The memory space is released after the data has been held for a short period, so that the interface processing device 2 remains lightly loaded.
After the big data cluster device 3 has analyzed the data and formed a data result, it associates the data result with the corresponding data calling request in the data calling request table. When the task monitoring device 4, monitoring in real time, detects that the monitored conditions and states meet the preset conditions, its triggering device triggers the interface processing device 2 to request the data result from the big data cluster device 3. After receiving the data result, the interface processing device 2 uses the data association relationships recorded in the data calling request table to associate the data result with the corresponding data query request, and sends the data result to the corresponding service end. After receiving the data result, the service end processing device 1 associates it with the corresponding data query request in the data query request table and may, at the same time, notify the user and display the data result.
The big data cluster device 3: as described above, the big data cluster device 3 is directly connected to the interface processing device 2. It receives the data and the data calling request table from the interface processing device 2, analyzes and processes the data into a data result that the service end requires and can accept, stores the data result, records the result state in the data calling request table, and associates the data result with the corresponding data calling request in that table.
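The temporary-storage behaviour described above can be illustrated with a minimal sketch. The class name `StagingBuffer`, the retention window `ttl_seconds` and the `send` callback are hypothetical names introduced only for illustration; the embodiment does not specify how the temporary storage is implemented or how long data is held.

```python
from typing import Any, Callable


class StagingBuffer:
    """Holds acquired data briefly; never analyzes it or keeps it long-term."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds  # assumed retention window
        # request_id -> [stored_at, data, forwarded?]
        self._items: dict[str, list] = {}

    def put(self, request_id: str, data: Any, now: float) -> None:
        """Temporarily store data acquired for one data calling request."""
        self._items[request_id] = [now, data, False]

    def forward(self, request_id: str, send: Callable[[str, Any], None]) -> bool:
        """Send the stored data to the big data cluster side via `send`."""
        entry = self._items.get(request_id)
        if entry is None:
            return False
        send(request_id, entry[1])
        entry[2] = True
        return True

    def release_expired(self, now: float) -> int:
        """Free entries that were forwarded and have been held for the full window."""
        expired = [
            rid
            for rid, (stored_at, _, forwarded) in self._items.items()
            if forwarded and now - stored_at >= self.ttl_seconds
        ]
        for rid in expired:
            del self._items[rid]
        return len(expired)

    def __len__(self) -> int:
        return len(self._items)
```

In this sketch only data that has already been forwarded is released, so that nothing is dropped before the big data cluster device has received it; that ordering is a design assumption, not something the text states.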
The task monitoring device 4 monitors the data states in the data calling request table in real time (for example, data result formed, data acquisition successful, or data acquisition failed) and triggers the corresponding processing mechanism: for example, triggering the interface processing device 2 to request the data result so that the big data cluster device 3 returns it, triggering the interface processing device 2 to send data to the big data cluster device 3 for processing, or triggering the interface processing device 2 to acquire data again.
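The state-driven triggering described above can be sketched as a lookup from data state to processing mechanism. The enum `Status`, the mechanism names in `MECHANISMS` and the polling function `poll` are hypothetical; the embodiment does not say whether monitoring is implemented by polling or by events.

```python
from enum import Enum


class Status(Enum):
    ACQUIRED = "data acquisition success"
    ACQUISITION_FAILED = "data acquisition failure"
    RESULT_READY = "data result formed"


# Assumed mapping from a monitored state to the mechanism it triggers.
MECHANISMS = {
    Status.ACQUIRED: "send_data_to_cluster",
    Status.ACQUISITION_FAILED: "reacquire_data",
    Status.RESULT_READY: "request_result_from_cluster",
}


def poll(table: dict[str, Status],
         handled: set[tuple[str, Status]]) -> list[tuple[str, str]]:
    """One monitoring pass over the request table.

    Returns (request_id, mechanism) for every state that has not been
    acted on yet, and remembers it in `handled` so it fires only once.
    """
    triggered = []
    for request_id, status in table.items():
        key = (request_id, status)
        if key in handled:
            continue
        handled.add(key)
        triggered.append((request_id, MECHANISMS[status]))
    return triggered
```

Calling `poll` repeatedly on the same table yields each trigger once; when a request moves to a new state, the next pass yields the mechanism for that new state.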
In addition, the big data cluster device 3, as a big data warehouse, can actively extract various data from the interface processing device 2 or the service end in real time or at regular time, perform deep learning, analysis, mining, integration, and the like on the data, and can classify the data processing result for subsequent calling or continuous mining, and the like.
In general, the big data cluster device 3 is used for integrally processing data of the service end and the interface processing device 2, and automatically executing data flow in real time according to a pre-established data analysis model, a data mining model, a data integration logic model, a service association logic model and the like, and according to a pre-set task flow and a data processing mechanism, and forming and storing a corresponding data result, so as to facilitate subsequent direct calling of related interface data without repeatedly acquiring external data; meanwhile, the method can be used for subsequent data analysis, mining, integration and the like, so that the reusability and high availability of the data are enhanced, and the redundancy of the data is reduced.
The task monitoring device 4: the task monitoring device 4 stores the pre-trigger conditions set in the processing programs of the service end processing devices 1, the interface processing device 2, the big data cluster device 3 and the like (such as trigger conditions, time conditions, operation conditions, data states, system states and operation states), together with the corresponding processing mechanisms. It monitors the relevant trigger conditions in real time and sends the corresponding processing mechanism to the corresponding device; on receiving it, the service end processing device 1, the interface processing device 2 or the big data cluster device 3 executes the relevant program, such as data result display, data acquisition, data ETL, data stream execution or data output. Data ETL refers to extracting data from a source end, transforming it, and loading it into a destination end.
For example, when a user clicks a query button on a service end, a task is in effect triggered. On detecting the task, the task monitoring device 4, according to the pre-stored trigger conditions and processing mechanisms, triggers the service end processing device 1 to first determine whether the user has initiated the same data query request before. Suppose the query request is a repeated one (that is, the same query request was initiated before), a data result was already generated for the earlier request, and the user now wishes to view that data result again; the service end, however, stores only basic data, and the data result is stored in the big data cluster device 3. In that case, the task monitoring device 4 triggers the interface processing device 2 to request the data result from the big data cluster device 3; the big data cluster device 3 retrieves the data result and sends it to the interface processing device 2, which forwards it to the corresponding service end.
For another example, as described above, when the big data cluster device 3 parses data into a data result and records the state of the data result in the data retrieval request table, and the task monitoring device 4 detects that the state is "generated data result", the triggering device of the task monitoring device 4 triggers the interface processing device 2 to request the data result from the big data cluster device 3 and to send it to the corresponding service end according to the data association relationships in the data retrieval request table.
The big data cluster device 3 will actively extract data from the interface processing device 2 or the business end processing device 1 to analyze and process, but will not actively send data to the interface processing device 2; therefore, the task monitoring device 4 can monitor the data state and conditions in the big data cluster device 3 in real time, and further trigger the corresponding execution mechanism according to the monitoring result.
External data source 5: provides external data for the interface processing device 2, the external data comprising data from a plurality of interface channels and a plurality of crawling channels. The interface processing device 2 is connected to both the external data source 5 and the service end processing device 1; after receiving a data query request from the service end processing device 1, it analyzes the parameters and association conditions of the request to determine through which data interface or data crawling channel the data is to be acquired.
In an application scenario, the interface processing apparatus 2 may acquire the required external data through one or more interface channels or/and a plurality of crawling channels.
Meanwhile, the interface processing device 2 records the completeness of external data acquisition in the data calling request table: if the acquired data is complete, the request is marked "data acquisition success"; if it is incomplete, the request is marked "data acquisition failure". When the task monitoring device 4 reads a data acquisition failure state from the data retrieval request table, it triggers the interface processing device 2, according to a preset processing mechanism, to continue acquiring the remaining data so that the complete data is eventually obtained.
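The completeness marking and the re-acquisition of only the remaining data can be sketched as follows. The function `acquire`, the `Fetcher` type and the convention that a channel returns `None` on failure are assumptions made for illustration; real interface and crawling channels would have their own error handling.

```python
from typing import Callable, Optional

# A channel fetcher returns the acquired data, or None when acquisition fails.
Fetcher = Callable[[str], Optional[dict]]


def acquire(subject: str,
            channels: dict[str, Fetcher],
            collected: dict[str, dict]) -> str:
    """Fetch from every channel that is still missing and return the status to record.

    `collected` persists between attempts, so a retry touches only the
    channels whose data has not been acquired yet.
    """
    for name, fetch in channels.items():
        if name in collected:
            continue  # acquired in an earlier attempt; do not fetch again
        data = fetch(subject)
        if data is not None:
            collected[name] = data
    complete = all(name in collected for name in channels)
    return "data acquisition success" if complete else "data acquisition failure"
```

A monitoring pass that sees "data acquisition failure" would simply call `acquire` again with the same `collected` mapping; channels that already succeeded are skipped.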
According to the multi-source data integration system based on the big data technology, the business end processing device 1, the big data cluster device 3 and the external data source 5 are effectively linked through the interface processing device 2, and a real and effective big data integration system is formed; the system can be suitable for various different service scenes; when a new service end or a service system needs to be added into the data integration system, only an interface needs to be newly developed in the interface processing device 2 to the new service end processing device 1, so that the interface cost can be reduced; the big data cluster device 3 only needs to provide one uniform interface to transmit data with the interface processing device 2, and various different service ends do not need to be directly connected, so that the risk of the big data cluster device 3 can be greatly reduced; therefore, the integrated system can be compatible and suitable for various different service systems, saves data calling resources, and simultaneously can ensure the safety of the big data cluster processing device.
In addition, for the acquisition and processing of data, the interface processing device 2 acquires data according to the data calling request, and sends the acquired data to the big data cluster device 3 for analysis processing, that is, the interface processing device 2 is responsible for acquiring data, establishing data relation association, and releasing the data after temporarily storing the data, while the big data cluster device 3 is mainly responsible for deep processing of the data, forming a data result required by a service end, and storing the data result; the two are mutually matched to form a big data system together; because the interface processing device 2 only needs to transmit data and does not store data for a long time, even if the interface processing device 2 is connected with a plurality of service terminals at the same time, the interface processing device can work under light load, has low requirement on system hardware, and can save related cost.
Correspondingly, the embodiment of the invention provides a multi-source data integration method based on a big data technology, and the method specifically comprises the following steps.
A user initiates a data query request through the service end processing device 1 and may attach one or more parameters to the request; a data query request table may also be generated, in which each data query request initiated by a user is recorded. The data query request is then sent to the interface processing device 2.
The interface processing device 2 receives the data query request from the service end processing device 1 and analyzes it; determines, according to the analysis result, whether a new data retrieval request needs to be generated for the data query request (deduplication is required) and, if so, records the new data retrieval request in a data retrieval request table; analyzes the association conditions of the data retrieval request to determine the corresponding data interface or data crawling channel; and then initiates the corresponding data interface request or data crawling request and starts an interface data acquisition program and/or a data crawling program to acquire the corresponding data.
After acquiring the corresponding data, the interface processing device 2 associates the data, according to the records in the data retrieval request table, with the corresponding data retrieval request in that table, indicating that the data was acquired for that request, and marks the request in the table as data successfully acquired. At the same time, it reads the rule conditions relevant to the data calling request from the task monitoring device 4, establishes associations among the data according to those rule conditions, and stores the data association relationships, so that the relevant data can subsequently be called directly through the association relationships without acquiring the external data again. The data can also be used for subsequent analysis, mining, integration and the like, which enhances the reusability and high availability of the data and reduces data redundancy.
After establishing the preliminary data associations, the interface processing device 2 sends the acquired data and the data calling request table to the big data cluster device 3, where the data is analyzed and processed into a data result that the service end requires and can accept.
The big data cluster device 3 analyzes and processes the data, forms and stores data results, and establishes data association between the data results and corresponding data calling requests in the data calling request table. The big data cluster device 3 can further analyze, mine and integrate the data to enhance the reusability and high availability of the data and reduce the redundancy of the data; subsequent direct calls to the relevant data can also be facilitated without repeated retrieval of external data.
Specifically, the analysis of the data query request by the interface processing device 2 includes determining whether a data result has already been generated for the data query request. If so, the corresponding data result is called directly and associated with the data query request. If not, the device determines whether the same data query request is already associated in the data calling request table; if so, the data query request is associated with that existing request in the data calling request table, and if not, a new data calling request corresponding to the data query request is generated in the data calling request table.
In addition, as a preferred embodiment, before determining whether a data result has been generated for the data query request, the interface processing device 2 may further analyze whether the user has initiated the same data query request before. If so, the user is prompted to choose between initiating the latest data query request and viewing the data result of the historical query: if the user chooses to initiate the latest data query request, the device goes on to determine whether a data result has been generated for the data query request; if the user chooses to view the data result of the historical query, retrieval of that data result is triggered. If the user has not initiated the same request before, the device likewise goes on to determine whether a data result has been generated for the data query request.
Specifically, establishing the data association relationships includes: associating the data query request table of the service end processing device with the data retrieval request table of the interface processing device; associating the acquired data with the corresponding data calling request in the data calling request table; associating the data result with the corresponding data calling request in the data calling request table; and associating the data result with the corresponding data query request through the data retrieval request table.
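The three-way decision described above (reuse an existing data result, attach to an existing data calling request, or create a new one) can be sketched as follows. The function `resolve_query`, the request key and the two dictionaries standing in for the stored data results and the data calling request table are hypothetical; how two requests are judged to be "the same" is not specified in the text and is represented here simply by an equal key.

```python
from typing import Any, Hashable, Optional


def resolve_query(query_id: str,
                  key: Hashable,
                  results: dict[Hashable, Any],
                  calling_table: dict[Hashable, list[str]]) -> tuple[str, Optional[Any]]:
    """Decide how to serve one data query request.

    `key` identifies what is being queried (assumed: equal keys mean
    "the same data query request"). `calling_table` maps each pending
    data calling request to the query requests associated with it.
    """
    if key in results:
        # A data result already exists: call it directly.
        return "reuse_result", results[key]
    if key in calling_table:
        # The same request is already pending: associate, do not duplicate.
        calling_table[key].append(query_id)
        return "attached_to_existing_request", None
    # Nothing exists yet: generate a new data calling request.
    calling_table[key] = [query_id]
    return "created_new_request", None
```

Two users asking for the same enterprise while acquisition is pending therefore end up on one entry of `calling_table`, which is the deduplication the text calls for.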
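The four association relationships listed above can be pictured as three stored mappings plus one derived lookup. The class `Associations` and its field names are hypothetical; the embodiment describes the relationships but not their storage layout.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Associations:
    # (1) data query request -> data retrieval request
    query_to_retrieval: dict[str, str] = field(default_factory=dict)
    # (2) data retrieval request -> acquired raw data
    retrieval_to_data: dict[str, list] = field(default_factory=dict)
    # (3) data retrieval request -> data result
    retrieval_to_result: dict[str, dict] = field(default_factory=dict)

    def result_for_query(self, query_id: str) -> Optional[dict]:
        """(4) Reach the data result of a query through the retrieval request table."""
        retrieval_id = self.query_to_retrieval.get(query_id)
        if retrieval_id is None:
            return None
        return self.retrieval_to_result.get(retrieval_id)

    def queries_for_retrieval(self, retrieval_id: str) -> list[str]:
        """All query requests that should receive the result of one retrieval request."""
        return [q for q, r in self.query_to_retrieval.items() if r == retrieval_id]
```

Because the result is reached only through the retrieval request, several query requests that share one retrieval request all resolve to the same stored result.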
When the task monitoring device 4 detects in real time that the data conditions and states meet the preset conditions, it triggers the corresponding device to execute the corresponding processing mechanism according to the preset trigger conditions and their processing mechanisms.
For example, if the task monitoring device 4 detects in real time that a data query request has been recorded in the data query request table of the service end processing device 1, it triggers the service end processing device 1 to determine an attribute of the request, such as whether it is a new or an old data query request. If it is a new data query request, the service end processing device 1 is triggered to send it to the interface processing device 2, which then processes it.
For another example, the task monitoring device 4 monitors in real time the data acquisition state that the interface processing device 2 marks against each data retrieval request in the data retrieval request table. If the state is that data acquisition was successful, the task monitoring device 4 triggers the interface processing device 2 to send the data to the big data cluster device 3, to be analyzed and processed into the data result required by the service end. If the interface processing device 2 cannot acquire the corresponding data, the corresponding data retrieval request is marked as data acquisition failed; on detecting this state, the task monitoring device 4 may trigger the interface processing device 2 to act according to a preset rule condition, for example to acquire the data again when a preset time is reached.
For another example, after analyzing the received data, the big data cluster device 3 generates a data result, associates it with the corresponding data retrieval request in the data retrieval request table, and marks a data result generation state against that request, for example that the data result was generated successfully. On detecting this state in real time, the task monitoring device 4 triggers the interface processing device 2 to request the data result from the big data cluster device 3. If the state is instead that data result generation failed, the task monitoring device 4 triggers the big data cluster device 3 to analyze the cause of the failure and then triggers the next processing flow accordingly; for example, if the cause is that the interface processing device 2 acquired only part of the interface data, the interface processing device 2 is triggered to re-acquire the interface data whose acquisition failed; and so on.
In practical application, a user may need to query various different data in different service scenarios, and then, according to the scheme provided by the embodiment of the present invention, the user can query different data only through the service terminal of the user. The scheme of the invention is further explained by taking the example that the user A needs to inquire the enterprise data.
1. When enterprise data needs to be queried, user A can initiate a data query request through user A's own service end. The request contains a query object, for example enterprise A.
It should be noted that user A may be an individual user or an enterprise user. In this embodiment, user A is an enterprise user, and query operations by the enterprise's main account or any of its sub-accounts all count as query operations of the same enterprise user.
When user A initiates a data query request through a service end, a task has in effect been triggered. The task monitoring device 4 monitors the data query request task in real time and, through its triggering device, triggers the service end processing device 1 to determine whether the query request is a new query request or an old one (an old query request being one that the user has made before and is now making again).
The service end processing device 1 may analyze the association between user A and the queried object (enterprise A) in the data query request and determine, from the records in the data query request table, whether user A has queried the data of enterprise A before (within a certain time limit), whether a corresponding query result has been generated, and so on.
2. In an actual service scenario, the analysis result of the data query request falls into one of several cases, as follows.
2.1. Scene 1: according to the records in the data query request table of the service end processing device 1, user A has queried enterprise A before and a data result has been generated. When the task monitoring device 4 detects this analysis result in real time, it may trigger the service end processing device 1 to return a prompt letting user A choose between "initiate a latest query request" and "view a data result of a historical query".
If the user A selects to initiate the latest query request, the task monitoring device 4 triggers the service end processing device 1 to initiate a data query request to the interface processing device 2, and records the data query request in a data query request table;
after receiving the data query request, the interface processing device 2 determines whether an associated record exists for the currently queried enterprise A, for example a data result record associated with enterprise A (in practical applications, this may be defined as a data result generated within one month before the current time, including data results for enterprise A requested by other users);
if the data result exists, the interface processing device 2 initiates a data result retrieval request to the big data cluster device 3 without generating a new data result, and retrieves the data result; the big data cluster device 3 searches the data result from the database thereof according to the association record of the data calling request table and sends the data result to the interface processing device 2, then the interface processing device 2 forwards the data result to the corresponding service end processing device 1, and the service end processing device 1 displays the data result to the user;
if not, a new data result needs to be generated; the interface processing device 2 further analyzes whether a data retrieval request needs to be generated for the currently queried enterprise A and recorded in the data retrieval request table.
Specifically, the interface processing device 2 determines whether a data retrieval request concerning enterprise A is already recorded in the current, latest data retrieval request table (other users may also have initiated a data query request for enterprise A, in which case the interface processing device 2 will already have generated a data retrieval request and recorded it in the table). If the table already records a data retrieval request for enterprise A, no further record is made, so as to avoid repeated queries, and the current data query request for enterprise A is directly associated with the existing data retrieval request in the table; if no such record exists, a data retrieval request for enterprise A is recorded in the current, latest data retrieval request table.
That is, identical query requests are associated with a single data calling request, so that completing the data call once satisfies every query carrying the same data query request, which reduces the waste of data calling resources.
Meanwhile, the interface processing device 2 associates the data retrieval request of the enterprise a in the data retrieval request table with the corresponding data query request in the data query request table of the business end processing device 1, so that the subsequently generated data result is conveniently associated with the data retrieval request table and the data query request table.
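The associated-record check in the latest-query branch, where a data result for the queried enterprise counts if it was generated within about one month (including results requested by other users), can be sketched as follows. The 30-day constant, the record fields `enterprise`, `generated_at` and `requested_by`, and the function name are assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumed stand-in for the "one month" window mentioned in the text.
FRESHNESS_WINDOW = timedelta(days=30)


def find_fresh_result(results: list[dict],
                      enterprise: str,
                      now: datetime) -> Optional[dict]:
    """Return the newest data result for `enterprise` inside the window, if any.

    Who requested the result is deliberately ignored: a result generated
    for another user's query on the same enterprise also qualifies.
    """
    fresh = [
        r for r in results
        if r["enterprise"] == enterprise
        and timedelta(0) <= now - r["generated_at"] <= FRESHNESS_WINDOW
    ]
    return max(fresh, key=lambda r: r["generated_at"], default=None)
```

When the function returns a record, the existing data result can be retrieved from the big data cluster device; when it returns `None`, a new data result has to be generated.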
If the user A selects to check the data result of the historical query, the task monitoring device 4 triggers the service end processing device 1 to search whether the data result of the historical query exists in the service end;
if so, the service end processing device 1 can retrieve and display the data result directly from the service end; because the service end stores data results previously retrieved from the big data cluster device, it can read them quickly, and the big data cluster device avoids spending resources on executing the data stream again.
If not, the service end processing device 1 initiates a data query request to the interface processing device 2, and the interface processing device 2 requests the big data cluster device 3 to invoke and send the data result of the historical query according to the association record of the data invoking request table, and forwards the data result to the corresponding service end processing device 1 for display after receiving the data result.
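The historical-query branch above amounts to a local lookup with a fallback through the interface processing device. A minimal sketch, assuming a dictionary `local_store` for results kept at the service end and a hypothetical callback `fetch_via_interface` standing in for the request to the big data cluster device:

```python
from typing import Any, Callable, Hashable


def view_historical_result(key: Hashable,
                           local_store: dict[Hashable, Any],
                           fetch_via_interface: Callable[[Hashable], Any]) -> tuple[Any, str]:
    """Return (data result, where it came from) for a historical query."""
    if key in local_store:
        # Found at the service end: no call to the cluster is needed.
        return local_store[key], "service end"
    # Not stored locally: ask for it through the interface processing device.
    result = fetch_via_interface(key)
    # Keep a copy so that the next view is served locally (assumed behaviour,
    # based on the statement that the service end stores retrieved results).
    local_store[key] = result
    return result, "big data cluster"
```

The second view of the same result is then served from `local_store`, which is the saving in cluster resources that the text describes.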
In addition, in an actual service application scenario, the case where user A chooses to initiate the latest data query request may be extended as follows: the interface processing device 2 further determines whether user A has obtained the authorization of the queried object; if so, it may further determine whether user A has paid; and if so, it carries out the subsequent data acquisition processing.
2.2. Scene 2: according to the data association records, user A has never queried the currently queried enterprise A; the task monitoring device 4 then triggers the service end processing device 1 to send a data query request to the interface processing device 2 and records the data query request in the data query request table;
after the interface processing device 2 receives the data query request, preferably, the interface processing device 2 may first determine whether the user a has obtained the authorization of the queried enterprise a and has paid the fee;
if the user a has been authorized and paid the fee, the interface processing device 2 continues to execute the processing flow downwards, and the specific processing flow is the same as the processing flow of the user a initiating the latest data query request in the scenario 1, which is not described herein again.
2.3. Scene 3: according to the data association records, user A has previously requested the data of enterprise A, but that request did not obtain the authorization of enterprise A and no data was acquired. Specifically, the data query request of user A for enterprise A recorded in the data query request table of the service end processing device 1 is marked: user A has not obtained the authorization of enterprise A. The task monitoring device 4 may trigger the service end processing device 1 to return to the user a notification that the authorization of enterprise A must be obtained first. When user A obtains the authorization of enterprise A and then issues the query request again, the processing flow follows scene 1 and scene 2 and is not described again here.
In addition, as an example, with respect to the process of the user a obtaining the authorization of the enterprise a, the following processing flow of 3.3 may be referred to; after the user a obtains the authorization of the enterprise a and calls the authorization data to the service end through the interface processing device 2, the task monitoring device 4 triggers the service end processing device 1 to associate the authorization data with the data query request corresponding to the data query request table, and updates and marks the state of the corresponding data query request as: user a has obtained the authorization of enterprise a. Then, when the user a issues a data query request to the enterprise a again, the service-side processing device 1 sends the data query request to the interface processing device 2 together with the authorization status, so that the interface processing device 2 performs the next analysis processing accordingly.
2.4. Scene 4: according to the data association record, user A has queried the data of the current enterprise A and has obtained the authorization of enterprise A, but the data was not acquired because the fee deduction failed due to insufficient balance. Specifically, the data query request of user A for enterprise A recorded in the data query request table of the service-side processing device 1 is marked: user A has obtained the authorization of enterprise A, but user A has not paid. The task monitoring device 4 can trigger the service-side processing device 1 to return to the user a notification prompt that the account must first be recharged and the fee paid. When user A successfully pays the fee and then issues a query request, the processing flow follows scenario 1 and scenario 2 and is not repeated here.
2.5. Scene 5: according to the data association record, user A has queried the data of the current enterprise A and the data result is still being generated; the task monitoring device 4 can trigger the service-side processing device 1 to return a corresponding notification prompt, so that the user can query the data result after waiting for a certain time.
The above describes the processing operations performed by this embodiment for different service scenarios in practical application. Those skilled in the art will understand that practical application is not limited to the above operations, and that corresponding adjustments can be made according to the actual situation.
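As a non-limiting illustration, the status marks described in scenes 3 to 5 can be thought of as a small lookup from the recorded request state to the notification prompt returned to the user. The sketch below is only one possible arrangement; the names `RequestStatus`, `NOTIFICATIONS` and `notification_for`, as well as the notification wording, are assumptions introduced for this example and are not part of the embodiment.

```python
from enum import Enum
from typing import Optional


class RequestStatus(Enum):
    # Hypothetical status marks for a row in the data query request table.
    UNAUTHORIZED = "unauthorized"            # scene 3: no authorization yet
    AUTHORIZED_UNPAID = "authorized_unpaid"  # scene 4: authorized, fee not paid
    GENERATING = "generating"                # scene 5: result still being generated
    READY = "ready"                          # authorized and paid; flow may continue


# Illustrative notification texts; the exact wording is assumed.
NOTIFICATIONS = {
    RequestStatus.UNAUTHORIZED: "Authorization from the enterprise must be obtained first.",
    RequestStatus.AUTHORIZED_UNPAID: "Please recharge and pay before querying.",
    RequestStatus.GENERATING: "The data result is being generated; please query again later.",
}


def notification_for(status: RequestStatus) -> Optional[str]:
    """Return the prompt to send to the user, or None if processing can continue."""
    return NOTIFICATIONS.get(status)
```

In this sketch a return value of `None` means that no prompt is needed and the flow of scenario 1 or scenario 2 can proceed.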
3. The interface processing device 2 records the retrieval request for the currently queried enterprise a in the data retrieval request table, and further analyzes the association condition of the data retrieval request to determine the corresponding retrieval channel, that is, from which interface or crawling channel the data should be retrieved.
3.1 For example, if the retrieval request needs the basic business information of enterprise A, it can preferably be obtained from channel one (a certain data platform); if the retrieval request needs the relationship nodes in the enterprise relationship network of enterprise A and their business information, it can preferably be obtained from channel two (another data platform); if the user needs to obtain the authorization of enterprise A, it can be obtained from channel three (yet another data platform); and so on.
The data acquisition process is explained in more detail below.
3.1.1 When the user wants to query the business information of enterprise A, the relevant parameters are carried in the data query request, such as enterprise A, business information, etc.; after the request is entered into the data retrieval request table, the interface processing device 2 analyzes the retrieval request and determines, according to the request parameters, that the business information of enterprise A can be acquired from channel one.
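By way of illustration only, the channel judgment described above can be sketched as a lookup from the requested data type to a preferred channel. The keys `business_info`, `relationship_network` and `authorization`, the channel labels, and the function `select_channel` are hypothetical names chosen for this example; an actual implementation may analyze more request parameters.

```python
# Hypothetical mapping from the requested data type to the preferred channel.
PREFERRED_CHANNEL = {
    "business_info": "channel_1",         # basic business information of the enterprise
    "relationship_network": "channel_2",  # relationship nodes and their business information
    "authorization": "channel_3",         # authorization data of the enterprise
}


def select_channel(data_type: str) -> str:
    """Return the preferred acquisition channel for the given data type."""
    try:
        return PREFERRED_CHANNEL[data_type]
    except KeyError:
        # No channel is configured for this data type in the sketch.
        raise ValueError(f"no channel configured for data type {data_type!r}") from None
```

For instance, `select_channel("business_info")` returns `"channel_1"` under this assumed mapping.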
The interface processing device 2 initiates a data retrieval request to the interface of channel one of the external data source 5.
After the related data is successfully acquired, the interface processing device 2 reads related rule conditions from the task monitoring device 4 (because the trigger conditions, the processing mechanisms and other rule conditions of the service end processing device 1, the interface processing device 2 and the big data cluster device 3 are all stored in the task monitoring device 4), establishes a preliminary association for the data according to the rule conditions, for example, records the related serial number of the acquired data under the corresponding data request in the data retrieval request table, so as to indicate that the data is acquired corresponding to the data retrieval request, and marks that the data retrieval is successful corresponding to the data retrieval request in the data retrieval request table;
then, the task monitoring device 4 triggers the interface processing device 2 to send the retrieved data to the big data cluster device 3 for analysis processing; the big data cluster device 3 analyzes and processes the data, establishes data association for the processed data result, associates the data result with the corresponding data retrieval request in the data retrieval request table, and marks in the data retrieval request table that the data result has been generated for that data retrieval request;
when the task monitoring device 4, monitoring in real time, detects the state indicating that the data result has been generated, it triggers the interface processing device 2 to request the big data cluster device 3 to send the data result;
after receiving the data retrieval request, the big data cluster device 3 sends the data result to the interface processing device 2 according to the association record of the data retrieval request table;
after receiving the data result, the interface processing device 2 associates the data result with the corresponding data query request according to the data association relationship recorded in the data retrieval request table, and sends the data result to the corresponding service-side processing device 1. After receiving the data result, the service-side processing device 1 associates the data result with the corresponding data query request in the data query request table, and at the same time may send a notification to the user and display the data result.
If the data acquisition fails, the data can be retrieved again. In this embodiment, the retrieval is preferably repeated up to 5 times, and a retrieval time can also be set: the task monitoring device 4 monitors the time and, once the preset time is reached, triggers the interface processing device 2 to retrieve the data. If the data still cannot be acquired after 5 retrieval attempts, the interface processing device 2 records a data retrieval failure for the corresponding data retrieval request in the data retrieval request table, so that the management background can later manually trigger the abnormal interface to acquire the data.
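The repeated retrieval described above can be sketched as follows. This is a minimal sketch under stated assumptions: it treats the preferred count of 5 as the total number of calls, represents a row of the data retrieval request table as a plain dictionary, and omits the preset retrieval time monitored by the task monitoring device 4. The names `retrieve_with_retry`, `fetch`, `status` and `attempts` are hypothetical.

```python
from typing import Callable, Optional

MAX_ATTEMPTS = 5  # the embodiment prefers 5 calls; treated here as the total


def retrieve_with_retry(fetch: Callable[[], Optional[dict]],
                        request_row: dict,
                        max_attempts: int = MAX_ATTEMPTS) -> Optional[dict]:
    """Call fetch() until it yields data or the attempts are used up.

    request_row stands in for one row of the data retrieval request table
    and is marked with the outcome.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            data = fetch()
        except Exception:
            data = None  # a raised error counts as a failed attempt
        if data is not None:
            request_row["status"] = "retrieval_succeeded"
            request_row["attempts"] = attempt
            return data
    # All attempts failed: record the failure for later manual handling.
    request_row["status"] = "retrieval_failed"
    request_row["attempts"] = max_attempts
    return None
```

A row left with the mark `retrieval_failed` corresponds to the case in which the management background later triggers the abnormal interface manually.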
If the interface processing device 2 does not retrieve the data for reasons such as authorization or cost, the interface processing device 2 marks in the data retrieval request table that the data was not retrieved, together with the reason (such as authorization or cost);
after detecting this mark through real-time monitoring, the task monitoring device 4 triggers the related devices to perform the next processing step according to the rule conditions stored in advance. For example, the task monitoring device 4 may trigger the interface processing device 2 to return to the business-end processing device 1 a notification that the data was not retrieved, with the reason attached, such as unauthorized or unpaid. The business-end processing device 1 then performs the corresponding processing, for example initiating an authorization data query request, and the task monitoring device 4 triggers the next processing mechanism according to the processing result.
3.1.2 If the retrieved data result contains list page data and the user needs the related data information to be displayed, in this embodiment the interface processing device 2 can preferably request the data from the interface of external data channel one in real time, where the number of pages requested is preferably 10. After the data is successfully acquired, the interface processing device 2 likewise reads the relevant rule conditions from the task monitoring device 4, establishes a preliminary association for the data according to the rule conditions, and marks in the data retrieval request table that the data was successfully acquired for the corresponding data request;
if the data acquisition fails, the data may be retrieved again, preferably up to 5 times. If it still fails after 5 attempts, the interface processing device 2 records the interface data acquisition failure for this retrieval request in the data retrieval request table, so that the abnormal interface can later be triggered manually to acquire the data.
3.1.3 For some details in the enterprise information, such as the details of lawsuits, court announcements, and delivery announcements: when the user clicks such a detail item in the queried enterprise information (that is, the corresponding detail interface data is needed), the business-end processing device 1 first determines whether the corresponding detail page data exists in the interface database of the business end. If it does, the business-end processing device 1 displays the data directly at the business end; if not, the business-end processing device 1 requests the interface processing device 2 to obtain the detail interface data from the corresponding channel. After receiving the request, the interface processing device 2 obtains the data from the corresponding interface, and the acquisition flow is similar to the corresponding flow described above.
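For illustration, the check of the business-end interface database before requesting the interface processing device 2 can be sketched as a simple look-up with a fallback. The dictionary `local_db`, the callable `fetch_from_channel`, and the step of storing the fetched data locally for later display are assumptions made for this example only.

```python
from typing import Callable, Dict


def get_detail_page(detail_id: str,
                    local_db: Dict[str, dict],
                    fetch_from_channel: Callable[[str], dict]) -> dict:
    """Return detail page data, using the channel only when it is not stored locally."""
    if detail_id in local_db:
        # Data already exists in the business-end interface database.
        return local_db[detail_id]
    # Otherwise request the data through the interface processing device.
    data = fetch_from_channel(detail_id)
    local_db[detail_id] = data  # assumed: keep a local copy for later display
    return data
```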
After the data is successfully acquired, the interface processing device 2 likewise reads the relevant rule conditions from the task monitoring device 4, establishes a preliminary association for the data according to the rule conditions, and marks in the data retrieval request table that the data was successfully acquired for the corresponding data request;
if data acquisition fails for an interface, the interface can be called again, preferably up to 5 times. If it still fails after 5 attempts, the interface processing device 2 records the interface data acquisition failure for this retrieval request in the data retrieval request table, so that the abnormal interface can later be triggered manually to acquire the data.
The same request operation as described above may be performed for other business lists in the data retrieval request table.
3.2. When a user needs to further query the relevant information of the relation node in the enterprise relation network after obtaining the business information and the enterprise relation network information of the queried enterprise, the user can initiate a request to the interface processing device 2 through the business end processing device 1 to query the business information of the relation node;
the interface processing device 2 analyzes the data query request, and determines whether the interface database of the big data cluster device 3 stores the business information of the required relationship node within a certain time (for example, within one month) according to the data association record.
If the business information of the required relationship node has not been stored in the interface database of the big data cluster device 3 within that time, the interface processing device 2 needs to acquire the data again; it determines that the business information of the relationship node can preferably be acquired from channel two, and the acquisition process is the same as the corresponding process described above;
if it has been stored within that time, the interface processing device 2 directly requests the big data cluster device 3 to retrieve the required business information from the interface database.
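The time-window judgment described above can be sketched as follows, purely as an example. The window of "one month" is approximated here as 30 days, and the function name `needs_refetch` and its parameters are hypothetical; `stored_at` stands for the time at which the business information was last stored, or `None` if no record exists.

```python
from datetime import datetime, timedelta
from typing import Optional

FRESHNESS_WINDOW = timedelta(days=30)  # "within one month", approximated as 30 days


def needs_refetch(stored_at: Optional[datetime],
                  now: datetime,
                  window: timedelta = FRESHNESS_WINDOW) -> bool:
    """Return True if the data must be acquired again from the external channel."""
    if stored_at is None:
        return True  # nothing stored for this relationship node
    return now - stored_at > window
```

When the function returns `False`, the stored data is recent enough and can be retrieved from the interface database of the big data cluster device 3 directly.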
After the data is successfully acquired, the interface processing device 2 likewise reads the relevant rule conditions from the task monitoring device 4, establishes a preliminary association for the data according to the rule conditions, and marks in the data retrieval request table that the data was successfully acquired for the corresponding data retrieval request;
if data acquisition fails for an interface, the interface can be called again, preferably up to 5 times. If it still fails after 5 attempts, the interface processing device 2 records the interface data acquisition failure for this retrieval request in the data retrieval request table, so that the abnormal interface can later be triggered manually to acquire the data.
3.3. When a user needs to acquire authorization data, an authorization data query request can be initiated through a service end. The interface processing device 2 receives and analyzes the data query request, determines that it is an authorization data query request, and further determines that the related authorization data can be obtained through interface channel three. It then obtains the state information of the enterprise authorization from the authorization interface of channel three in real time and matches the state information with the corresponding request record. Specifically, the request record can be matched using three condition parameters: the enterprise name of the user (the name of the enterprise being authorized), the searched enterprise (the name of the authorizing enterprise), and the authorization state record.
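One possible reading of the three-parameter matching is sketched below for illustration. The record type `RequestRecord`, its field names, and the function `find_matching_records` are hypothetical, and the sketch simply selects the request records whose three fields all equal the supplied values; an actual implementation may match differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RequestRecord:
    # Hypothetical fields of a request record used for matching.
    requester_enterprise: str  # enterprise name of the user
    searched_enterprise: str   # enterprise whose data is searched
    auth_status: str           # recorded authorization state


def find_matching_records(records: List[RequestRecord],
                          requester_enterprise: str,
                          searched_enterprise: str,
                          auth_status: str) -> List[RequestRecord]:
    """Return the records that match all three condition parameters."""
    return [
        r for r in records
        if r.requester_enterprise == requester_enterprise
        and r.searched_enterprise == searched_enterprise
        and r.auth_status == auth_status
    ]
```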
Further, for the enterprise list in the data retrieval request table, an interface data request can be initiated to channel three in real time.
In an application scenario, the enterprise list in the data retrieval request table may preferably be set to obtain data from the interface of channel three after 12 o'clock and 18 o'clock each day. When the task monitoring device 4 detects that the set time is reached, it triggers the interface processing device 2 to acquire data from the interface of channel three.
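The timed trigger can be illustrated with the following sketch, which checks whether one of the two daily time points has passed since the last retrieval. It is only an example under stated assumptions: it considers the time points of the current day only, uses naive local datetimes, and the names `SCHEDULED_TIMES` and `is_retrieval_due` are hypothetical.

```python
from datetime import datetime, time
from typing import Optional

# Daily time points after which data is obtained from channel three.
SCHEDULED_TIMES = (time(12, 0), time(18, 0))


def is_retrieval_due(now: datetime, last_run: Optional[datetime]) -> bool:
    """Return True if a time point of today has passed and has not been served yet."""
    for slot in SCHEDULED_TIMES:
        slot_dt = datetime.combine(now.date(), slot)
        if slot_dt <= now and (last_run is None or last_run < slot_dt):
            return True
    return False
```

A monitoring loop could call `is_retrieval_due` periodically and, when it returns `True`, trigger the retrieval and record the current time as the new `last_run`.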
After the data is successfully acquired, the interface processing device 2 likewise reads the relevant rule conditions from the task monitoring device 4, establishes a preliminary association for the data according to the rule conditions, and marks in the data retrieval request table that the data was successfully acquired for the corresponding data request;
if data acquisition fails for an interface, the interface can be called again, preferably up to 5 times. If it still fails after 5 attempts, the interface processing device 2 records the interface data acquisition failure for this retrieval request in the data retrieval request table, so that the abnormal interface can later be triggered manually to acquire the data.
4. According to the data retrieval request table, the big data cluster device 3 preferably screens out the data retrieval requests whose data retrieval succeeded and runs the data processing flow on the associated, successfully retrieved data in real time. Preferably, the big data cluster device 3 may generate a data result report for the acquired data; correspondingly, after the data is successfully acquired and the data result report is generated, the data result report status of the corresponding data retrieval request in the data retrieval request table is updated to "generated". When the task monitoring device 4 detects this data result report status through real-time monitoring, it triggers the interface processing device 2 to request the big data cluster device 3 to send the data result report, and the interface processing device 2 sends the data result report to the corresponding service end processing device 1 according to the rule conditions read from the task monitoring device 4.
The service end processing device 1 displays data for the corresponding user through the record of the data query request table.
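As an illustration of the status-driven trigger in step 4, the following sketch scans rows standing in for the data retrieval request table and forwards every report whose status is "generated". The row keys `request_id`, `report_status` and `delivered`, the callable `send_report`, and the use of a `delivered` flag to avoid sending a report twice are assumptions made for this example.

```python
from typing import Callable, Iterable, List


def dispatch_generated_reports(request_rows: Iterable[dict],
                               send_report: Callable[[str], None]) -> List[str]:
    """Send each report marked "generated" once and return the handled request ids."""
    delivered = []
    for row in request_rows:
        if row.get("report_status") == "generated" and not row.get("delivered"):
            send_report(row["request_id"])  # forward toward the service end
            row["delivered"] = True         # assumed flag to prevent repeats
            delivered.append(row["request_id"])
    return delivered
```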
The above embodiment takes the user request to query the related data of the enterprise as an example to illustrate the solution of the present invention, but it should be clear to those skilled in the art that the solution of the present invention can be applied to various business scenarios, such as querying credit data, tax payment data, etc. of the user, therefore, the above specific query steps regarding the enterprise data are only used for understanding the solution of the present invention, and should not be construed as limiting the idea of the solution of the present invention.
The multi-source data integration method based on the big data technology, and the system formed by the corresponding devices, provided by this embodiment can effectively combine a plurality of service ends with a big data system and integrate a plurality of third-party data interfaces, forming an effective system through which a user end can query data. The scheme of the invention is suitable for various service scenarios. When a user needs to query various data, the user only needs to initiate a data query request through the user's own service end. The request is then processed uniformly by the interface processing device 2, which selects an optimal acquisition channel; after the corresponding data is obtained, the data is associated with the corresponding data retrieval request and data query request. The data is further processed uniformly by the big data cluster device 3, which performs data analysis, mining, integration and association and obtains various business data through ETL, and the data result is returned to the user's service end through the interface processing device 2. A true big data system is thus formed, which facilitates subsequent direct retrieval of the related data. The big data system can enhance the reusability and availability of data and reduce data redundancy. It can also save data retrieval resources and reduce interface cost: because the interface processing device 2 uniformly interfaces with the various service ends, the big data cluster device 3 only needs to open one interface to the interface processing device 2. Moreover, the user end can initiate a query request through its own service end alone, which saves query resources for the user.
It should be noted that, as will be understood by those skilled in the art: all or part of the steps for implementing the method can be completed by hardware related to program instructions, the program instructions can be stored in a computer readable storage medium or storage device, and when the program instructions are executed, the steps of the multi-source data integration method based on the big data technology are executed; and the aforementioned storage media or storage devices include, but are not limited to: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Accordingly, the embodiment of the present invention further provides a computer-readable storage device, which stores a computer program, where the computer program is executed by a processor to implement the above-mentioned multi-source data integration method based on big data technology.
Further, the present invention also provides a corresponding mobile terminal and system to implement the above multi-source data integration method based on big data technology, specifically:
a mobile terminal, comprising:
a processor adapted to execute program instructions;
and the storage device is suitable for storing program instructions which are suitable for being loaded and executed by a processor to realize the multi-source data integration method based on the big data technology.
A multi-source data integration system based on big data technology comprises a server; the server comprises a processor and a storage device;
a processor adapted to execute program instructions;
and the storage device is suitable for storing program instructions which are suitable for being loaded and executed by a processor to realize the multi-source data integration method based on the big data technology.
The above description is only a preferred embodiment of the present invention, and for those skilled in the art, the content of this description should not be construed as limiting the present invention.