CN109634951B

CN109634951B - Big data acquisition method, device, computer equipment and storage medium

Info

Publication number: CN109634951B
Application number: CN201811239711.9A
Authority: CN
Inventors: 陈文端
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2018-10-23
Filing date: 2018-10-23
Publication date: 2023-12-22
Anticipated expiration: 2038-10-23
Also published as: CN109634951A

Abstract

The invention discloses a big data acquisition method, a big data acquisition device, computer equipment and a storage medium. The method comprises the following steps: receiving service data uploaded by a plurality of service databases, and storing the service data, according to its source identifier, into the corresponding sub-source data table of a local source database; acquiring the sub-source data tables included in the source database, and generating multi-level theme domains from the source identifier of each sub-source data table and fields selected in that table; and establishing a theme domain database according to the multi-level theme domains, and storing into it the data screened from the sub-source data tables according to the multi-level theme domains. The method concentrates the data on a single platform, so that data analysis only needs to fetch data from one data source; this enables unified data management and use and reduces repeated processing of the data.

Description

Big data acquisition method, device, computer equipment and storage medium

Technical Field

The present invention relates to the field of data processing technologies, and in particular, to a method and apparatus for collecting big data, a computer device, and a storage medium.

Background

At present, enterprise databases are generally organized by service system, with each system's service data stored in its own database. As a result, data from different service systems cannot be integrated for joint analysis and application, which creates data silos (the "data island" problem).

Disclosure of Invention

Embodiments of the invention provide a big data acquisition method, a big data acquisition device, computer equipment and a storage medium, which aim to solve the prior-art problem that storing different service data in separate databases prevents the data of different service systems from being integrated for analysis and application, resulting in data silos.

In a first aspect, an embodiment of the present invention provides a big data acquisition method, including:

receiving service data uploaded by a plurality of service databases, and storing the service data, according to its source identifier, into the corresponding sub-source data table of a local source database;

acquiring the sub-source data tables included in the source database, and generating multi-level theme domains from the source identifier of each sub-source data table and fields selected in that table; and

establishing a theme domain database according to the multi-level theme domains, and storing into the theme domain database the data screened from the sub-source data tables of the source database according to the multi-level theme domains.

In a second aspect, an embodiment of the present invention provides a big data acquisition apparatus, including:

a service data receiving unit, configured to receive service data uploaded by a plurality of service databases and store the service data, according to its source identifier, into the corresponding sub-source data table of a local source database;

a multi-level theme domain acquisition unit, configured to acquire the sub-source data tables included in the source database and generate multi-level theme domains from the source identifier of each sub-source data table and fields selected in that table; and

a theme domain database processing unit, configured to establish a theme domain database according to the multi-level theme domains and store into it the data screened from the sub-source data tables of the source database according to the multi-level theme domains.

In a third aspect, an embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor executes the computer program to implement the big data collection method described in the first aspect.

In a fourth aspect, an embodiment of the present invention further provides a storage medium, where the storage medium stores a computer program, where the computer program when executed by a processor causes the processor to perform the big data collection method according to the first aspect.

Embodiments of the invention provide a big data acquisition method, a big data acquisition device, computer equipment and a storage medium. Theme domain processing is performed, field by field, on the service data uploaded by a plurality of service databases to obtain a theme domain database. The data is thereby concentrated on one platform, data analysis only needs to draw on one data source, data management and use are unified, and repeated processing of the data is reduced.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic flow chart of a big data acquisition method according to an embodiment of the present invention;

FIG. 2 is another flow chart of the big data collection method according to the embodiment of the invention;

FIG. 3 is a schematic sub-flowchart of a big data acquisition method according to an embodiment of the present invention;

FIG. 4 is another flow chart of the big data collection method according to the embodiment of the present invention;

FIG. 5 is a schematic diagram of another sub-flow of the big data collection method according to the embodiment of the present invention;

FIG. 6 is a schematic block diagram of a big data acquisition device provided by an embodiment of the present invention;

FIG. 7 is another schematic block diagram of a big data acquisition device provided by an embodiment of the present invention;

FIG. 8 is a schematic block diagram of a subunit of the big data acquisition device according to an embodiment of the present invention;

FIG. 9 is another schematic block diagram of a big data acquisition device provided by an embodiment of the present invention;

FIG. 10 is a schematic block diagram of another subunit of the big data collection device according to an embodiment of the present invention;

FIG. 11 is a schematic block diagram of a computer device according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

It should be understood that the terms "comprises" and "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

It should be further understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.

Referring to FIG. 1, FIG. 1 is a flowchart of a big data acquisition method according to an embodiment of the present invention. The method is applied to a management server and is executed by application software installed on it; the management server is an enterprise-side terminal that collects and processes the service data uploaded by each service system.

As shown in fig. 1, the method includes steps S101 to S103.

S101, receiving service data uploaded by a plurality of service databases, and storing the service data into a sub-source data table corresponding to a local source database according to a source identifier.

In this embodiment, the management server acts as the data-fusion database for big data collection and processing. Several service systems are connected to the management server (in an insurance enterprise, for example, these typically include an underwriting database, a claims database, a channel database, a customer information database and external databases); the data in these service systems can be uploaded to the management server, which then processes it.

The service data in the service databases is uploaded either on a specified schedule (for example, automatically once every 24 hours) or when manually triggered. A storage space in the management server serves as the source database, and each batch of uploaded service data is stored, according to its source, into the corresponding sub-table (that is, the sub-source data table) of the local source database.

In one embodiment, step S101 includes:

receiving service data uploaded by a service database, and analyzing a source identifier of the service data;

and positioning a corresponding sub-source data table in the local source database according to the source identification, and storing the service data into the sub-source data table.

When a service database first establishes a communication channel with the management server and transmits data, a sub-source data table corresponding to that service database is created in the local source database of the management server, so that service data from different service databases is stored separately. For example, service data uploaded by the underwriting database carries the underwriting database's source identifier and is sent, over the communication channel between the underwriting database and the management server, to the underwriting sub-source data table in the source database. If that table already exists in the source database, it is reused; if not, it is created according to the source identifier of the underwriting database. Thereafter, whenever the management server receives service data from the underwriting database, it parses the source identifier, identifies the source as the underwriting database, and stores the data in the underwriting sub-source data table.
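
The routing logic of step S101 can be sketched as follows. This is a minimal in-memory illustration, not the patented implementation: the `SourceDatabase` class, the payload keys `source_id` and `rows`, and modelling a sub-source data table as a list of row dictionaries are all hypothetical choices made for the example.

```python
class SourceDatabase:
    """Hypothetical in-memory stand-in for the local source database."""

    def __init__(self):
        # Maps a source identifier (e.g. "underwriting") to its sub-source data
        # table, modelled here as a list of row dictionaries.
        self.sub_tables = {}

    def receive(self, payload):
        # Parse the source identifier attached by the uploading service database
        # (the "source_id" key is an assumption of this sketch).
        source_id = payload["source_id"]
        # Locate the sub-source data table; create it on first contact only.
        table = self.sub_tables.setdefault(source_id, [])
        # Store the uploaded service data in that table.
        table.extend(payload["rows"])
        return source_id


db = SourceDatabase()
db.receive({"source_id": "underwriting", "rows": [{"name": "Zhang San"}]})
db.receive({"source_id": "underwriting", "rows": [{"name": "Li Si"}]})
db.receive({"source_id": "claims", "rows": [{"claim_no": "C001"}]})
print(sorted(db.sub_tables))  # ['claims', 'underwriting']
```

The second upload from the underwriting source reuses the existing table instead of creating a new one, mirroring the "create only if absent" rule described above.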

In one embodiment, the local source database is a Hadoop database, and the service data in each service database is automatically uploaded to it through Sqoop scripts. Hadoop is an open-source distributed computing platform from the Apache Software Foundation. Built around the HDFS distributed file system and the MapReduce programming model, it provides users with a distributed infrastructure that keeps the low-level system details transparent to them. Sqoop is mainly used to transfer data between Hadoop and traditional relational databases (such as MySQL and PostgreSQL): it can import data from a relational database into Hadoop HDFS, or export data from HDFS into a relational database.
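
As an illustration of the Sqoop-based upload, the following Python sketch assembles a `sqoop import` command line for one service-database table. The flags used (`--connect`, `--username`, `--password-file`, `--table`, `--target-dir`, `--num-mappers`, `--incremental`, `--check-column`, `--last-value`) are standard Sqoop 1 import options; the helper name, host name, table name and HDFS paths are hypothetical, and the patent does not specify how its Sqoop scripts are written. Nothing is executed: the function only builds the argument list.

```python
import shlex


def build_sqoop_import(jdbc_url, table, target_dir, username, password_file,
                       check_column=None, last_value=None, mappers=1):
    """Return the argv list for a Sqoop 1 import of one table into HDFS."""
    args = [
        "sqoop", "import",
        "--connect", jdbc_url,              # JDBC URI of the service database
        "--username", username,
        "--password-file", password_file,   # keeps the password off the command line
        "--table", table,                   # source table in the relational database
        "--target-dir", target_dir,         # HDFS directory for the sub-source data
        "--num-mappers", str(mappers),
    ]
    if check_column is not None:
        # Incremental append mode: only rows whose check column value exceeds
        # last_value are imported, which suits a periodic (e.g. daily) upload.
        args += ["--incremental", "append",
                 "--check-column", check_column,
                 "--last-value", str(last_value)]
    return args


cmd = build_sqoop_import(
    "jdbc:mysql://uw-db.example:3306/underwriting",  # hypothetical host/database
    table="policy",                                  # hypothetical table
    target_dir="/source_db/underwriting/policy",     # hypothetical HDFS path
    username="etl",
    password_file="/user/etl/.sqoop.pw",
    check_column="id",
    last_value=0,
)
print(shlex.join(cmd))
```

Building an argument list rather than a single string avoids shell-quoting problems if the command is later launched with `subprocess.run(cmd)`.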

S102, obtaining the sub-source data tables included in the source database, and generating multi-level theme domains from the source identifier of each sub-source data table and the fields selected in that table.

In this embodiment, once the service data from all service databases has been uploaded to the management server, the management server can enumerate the sub-source data tables included in the local source database and use them as the base data for processing. The base data can be organized into theme domains, and can also be processed in other forms such as data marts and data applications, which process data at different granularities and for different themes to meet the analysis requirements of various scenarios along different dimensions. For theme domain processing, multi-level theme domains are generated from the source identifier of each sub-source data table and the selected fields, so that users can retrieve data by theme.

In one embodiment, step S102 further includes:

generating the multi-level theme domains from the source identifier of each sub-source data table and the fields selected by a first SQL script.

That is, as shown in FIG. 2, another embodiment of step S102 is as follows:

S102a, obtaining the sub-source data tables included in the source database, and generating multi-level theme domains from the source identifier of each sub-source data table and the fields selected by the first SQL script.

In this embodiment, after the sub-source data tables included in the source database are acquired, the source identifier of each sub-source data table can serve as a first-level theme domain. For example, the source identifier of the underwriting sub-source data table is the underwriting database. After the first SQL script automatically selects fields from the underwriting sub-source data table, the selected fields can serve as second-level theme domains. Once the first-level and second-level theme domains have been obtained in this way, the theme domain database can be established according to the multi-level theme domains. The hierarchy is not limited to the two levels in this example: for each second-level theme domain, further fields can be selected from the sub-source data table to serve as next-level theme domains.
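
The level-by-level construction described above can be pictured as a nested mapping, with the source identifier at the top and each further list of selected fields forming the next level. The sketch below is an illustrative assumption about the data structure only; the function name and the English field names are hypothetical, and the patent performs the field selection with its first SQL script rather than with a Python list.

```python
def build_theme_domains(source_id, selected_fields):
    """Build a nested multi-level theme domain hierarchy.

    source_id       -- source identifier of a sub-source data table (level 1)
    selected_fields -- one list of selected field names per lower level,
                       e.g. [["name"], ["phone", "amount", "period"]]
    """
    def nest(levels):
        if not levels:
            return {}
        # Each field at this level carries the same lower levels beneath it.
        return {field: nest(levels[1:]) for field in levels[0]}

    return {source_id: nest(selected_fields)}


domains = build_theme_domains("underwriting", [["name"], ["phone", "amount", "period"]])
print(domains)
# {'underwriting': {'name': {'phone': {}, 'amount': {}, 'period': {}}}}
```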

For example, after the first SQL script selects the name field as a second-level theme domain, it can further select the mobile phone number, underwriting amount and underwriting period fields as third-level theme domains. When a user selects the first-level theme domain "underwriting database" and the second-level theme domain "name" (for example, with the value Zhang San), the management server screens out Zhang San's data for the three third-level theme domains (mobile phone number, underwriting amount and underwriting period) and compiles it into a data table that is returned to the user for viewing. Building a multi-level theme domain database thus supports multi-level processing of the raw data and meets users' actual data requirements.
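
The screening in this example amounts to a parameterized query: the first-level theme domain names the table, the second-level theme domain and the user's chosen value form the filter, and the third-level theme domains are the returned columns. The sketch below uses Python's built-in `sqlite3` module as a stand-in for the management server's store; the function name, table layout, English column names and sample rows are invented for illustration.

```python
import sqlite3


def screen(conn, level1, level2_field, level2_value, level3_fields):
    """Return the level-3 columns of rows whose level-2 field equals the value."""
    # Table and column names come from the theme domain definition, not from
    # user input, so they are interpolated; the user's value is bound safely.
    columns = ", ".join(level3_fields)
    sql = f"SELECT {columns} FROM {level1} WHERE {level2_field} = ?"
    return conn.execute(sql, (level2_value,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE underwriting "
             "(name TEXT, phone TEXT, amount REAL, period INTEGER, agent TEXT)")
conn.executemany("INSERT INTO underwriting VALUES (?, ?, ?, ?, ?)", [
    ("Zhang San", "13800000000", 500000.0, 20, "A01"),   # illustrative rows
    ("Li Si", "13900000000", 200000.0, 10, "A02"),
])
rows = screen(conn, "underwriting", "name", "Zhang San", ["phone", "amount", "period"])
print(rows)  # [('13800000000', 500000.0, 20)]
```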

S103, establishing a theme domain database according to the multi-level theme domains, and storing into it the data screened from the sub-source data tables of the source database according to the multi-level theme domains.

In this embodiment, after the theme domain database is built according to the multi-level theme domains, the sub-source data tables of the local source database serve as the data basis: the data screened from them by the fields in the multi-level theme domains is stored into the corresponding places in the theme domain database. The sub-source data tables thus hold the original, unprocessed data in the management server, while the theme domain database meets users' need to retrieve data by theme.

In one embodiment, as shown in FIG. 3, step S103 includes:

S1031, creating one table for each first-level theme domain among the multi-level theme domains, using the first-level theme domain as the table name, and creating that table's fields from the remaining (lower-level) theme domains under the first-level theme domain;

S1032, generating a second SQL script from the fields included in the theme domain database, and filling the data of the corresponding fields in the sub-source data tables into the corresponding tables of the theme domain database through the second SQL script.

In this embodiment, when the theme domain database is built, one table is created per first-level theme domain, named after it, and the table's fields are created from the remaining theme domains beneath that first-level theme domain. The resulting theme domain database is therefore a pared-down version of the source database: only the selected fields are kept, and fields that need not be kept are dropped. Keeping only selected tables and fields improves users' retrieval efficiency in practical application scenarios. The second SQL script then fills the data of the corresponding fields in the sub-source data tables into the corresponding tables of the theme domain database, so that the data is screened and filled automatically and data processing efficiency is improved.
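
Steps S1031 and S1032 can be sketched with SQLite: a DDL statement creates one table per first-level theme domain, with its columns taken from the lower-level theme domains, and an `INSERT ... SELECT` statement plays the role of the second SQL script. Attaching a second in-memory database named `theme` to stand for the theme domain database, the helper names, and the nested-dictionary layout of the theme domains are assumptions of this sketch; the patent does not show its actual SQL.

```python
import sqlite3


def flatten_fields(tree):
    """List every field below the first level, depth first."""
    fields = []
    for name, children in tree.items():
        fields.append(name)
        fields.extend(flatten_fields(children))
    return fields


def build_theme_scripts(domains, schema="theme"):
    """Return (DDL statements, fill statements) for each first-level theme domain."""
    ddl, fill = [], []
    for table, subtree in domains.items():
        col_list = ", ".join(flatten_fields(subtree))
        # S1031: the first-level theme domain names the table; the remaining
        # theme domains become its columns, so unselected fields are left out.
        ddl.append(f"CREATE TABLE {schema}.{table} ({col_list})")
        # S1032: the "second SQL script" copies only the retained columns.
        fill.append(f"INSERT INTO {schema}.{table} ({col_list}) "
                    f"SELECT {col_list} FROM main.{table}")
    return ddl, fill


conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS theme")  # stands in for the theme domain DB
conn.execute("CREATE TABLE underwriting (name, phone, amount, period, agent)")
conn.execute("INSERT INTO underwriting VALUES "
             "('Zhang San', '13800000000', 500000, 20, 'A01')")

domains = {"underwriting": {"name": {"phone": {}, "amount": {}, "period": {}}}}
ddl, fill = build_theme_scripts(domains)
for statement in ddl + fill:
    conn.execute(statement)

cur = conn.execute("SELECT * FROM theme.underwriting")
print([d[0] for d in cur.description])  # ['name', 'phone', 'amount', 'period']
print(cur.fetchall())  # [('Zhang San', '13800000000', 500000, 20)]
```

The `agent` column of the sub-source table is not part of any theme domain, so it never reaches the theme domain table, which is the pared-down effect the paragraph above describes.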

In one embodiment, as shown in FIG. 4, the method further includes, after step S103:

S104, obtaining the sub-source data tables included in the source database, and fusing the data in the sub-source data tables according to their common fields to obtain a data mart database.

In this embodiment, all the fields appearing in the sub-source data tables of the source database are tallied, and the data of fields that appear in more than one table is fused. The result is a data mart database that consolidates all the fields, achieving data fusion across the sub-source data tables.

In one embodiment, as shown in FIG. 5, step S104 includes:

S1041, counting the fields included in each sub-source data table;

S1042, if a field occurs more than once across the sub-source data tables, de-duplicating it so that it occurs exactly once, and using it as a field of the data mart database;

S1043, if a field occurs exactly once across the sub-source data tables, keeping it as a field of the data mart database;

S1044, filling data into the fields of the data mart database to obtain the data mart database.
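
Steps S1041 to S1043 reduce to counting field names and keeping each distinct name once. The following Python sketch assumes each sub-source data table is described only by its ordered list of field names; the function name and the English field names (modelled on the example in this embodiment) are illustrative.

```python
from collections import Counter


def mart_fields(table_fields):
    """Derive the data mart schema from {table name: [field names]}."""
    # S1041: count how many sub-source data tables contain each field.
    counts = Counter(f for fields in table_fields.values() for f in fields)
    # S1042/S1043: fields occurring more than once are de-duplicated and fields
    # occurring once are kept, so each distinct field enters the schema exactly
    # once (dict.fromkeys preserves first-seen order).
    schema = list(dict.fromkeys(f for fields in table_fields.values() for f in fields))
    shared = [f for f in schema if counts[f] > 1]
    return schema, shared


tables = {
    "table_1": ["user_id", "id_card_no", "user_name",
                "insurance_type", "insured_amount", "insurance_term"],
    "table_2": ["user_id", "id_card_no", "user_name",
                "phone", "loan_amount", "loan_term"],
}
schema, shared = mart_fields(tables)
print(shared)       # ['user_id', 'id_card_no', 'user_name']
print(len(schema))  # 9
```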

In this embodiment, the sub-source data tables generally share some fields, such as the user identity card number. When the data in the sub-source data tables is fused, each shared field is kept only once, while all the other, distinct fields are kept, yielding the fused data. For example, sub-source data table 1 has the fields user ID, user identity card number, user name, insurance type, insured amount and insurance term; sub-source data table 2 has the fields user ID, user identity card number, user name, mobile phone number, loan amount and loan term. After fusion, the two records are merged into one record in the data mart database with the fields user ID, user identity card number, user name, insurance type, insured amount, insurance term, mobile phone number, loan amount and loan term. Fusing the data of all sub-source data tables in this way yields the most comprehensive record for each user.
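
The record-level fusion in this example (step S1044) can be sketched as grouping rows from all sub-source data tables by a shared key and merging their fields. Choosing the identity card number as the join key, and letting the first non-empty value win when two tables disagree on a shared field, are assumptions of this sketch; the patent does not say how such conflicts are resolved. The function name, identifiers and values are invented.

```python
def fuse_rows(tables, key):
    """Merge rows from several sub-source tables into one record per key value."""
    fused = {}
    for rows in tables.values():
        for row in rows:
            record = fused.setdefault(row[key], {})
            for field, value in row.items():
                # A shared field is stored once; the first non-empty value wins.
                if record.get(field) is None:
                    record[field] = value
    return list(fused.values())


table_1 = [{"user_id": "U1", "id_card_no": "ID-0001", "user_name": "Zhang San",
            "insurance_type": "life", "insured_amount": 500000, "insurance_term": 20}]
table_2 = [{"user_id": "U1", "id_card_no": "ID-0001", "user_name": "Zhang San",
            "phone": "13800000000", "loan_amount": 100000, "loan_term": 3}]

records = fuse_rows({"table_1": table_1, "table_2": table_2}, key="id_card_no")
print(len(records), len(records[0]))  # 1 9
```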

With this method, theme domain processing is performed, field by field, on the service data uploaded by a plurality of service databases to obtain a theme domain database. The data is thereby concentrated on one platform, data analysis only needs to draw on one data source, data management and use are unified, and repeated processing of the data is reduced.

An embodiment of the invention further provides a big data acquisition device for executing any embodiment of the big data acquisition method described above. Referring to FIG. 6, FIG. 6 is a schematic block diagram of a big data acquisition device according to an embodiment of the present invention. The big data acquisition device 100 may be configured in a management server.

As shown in FIG. 6, the big data acquisition apparatus 100 includes a service data receiving unit 101, a multi-level theme domain acquisition unit 102, and a theme domain database processing unit 103.

The service data receiving unit 101 is configured to receive service data uploaded by the plurality of service databases, and store the service data into a sub-source data table corresponding to the local source database according to the source identifier.

The service data in the service databases is uploaded either on a specified schedule (for example, automatically once every 24 hours) or when manually triggered. A storage space in the management server serves as the source database, and the uploaded service data is stored, according to its source, into the corresponding sub-table of the local source database.

In an embodiment, the service data receiving unit 101 includes:

the source identification analysis unit is used for receiving the service data uploaded by the service database and analyzing the source identification of the service data;

and the sub-source data table positioning unit is used for positioning the corresponding sub-source data table in the local source database according to the source identifier and storing the service data into the sub-source data table.

The multi-level topic domain obtaining unit 102 is configured to obtain a sub-source data table included in the source database, and generate a multi-level topic domain according to a source identifier of the sub-source data table and a field selected in the sub-source data table.

In one embodiment, as shown in FIG. 7, as an alternative to the multi-level theme domain acquisition unit 102, the big data acquisition apparatus 100 further includes:

a data script processing unit 102a, configured to obtain the sub-source data tables included in the source database and generate multi-level theme domains from the source identifier of each sub-source data table and the fields selected by the first SQL script.

The theme domain database processing unit 103 is configured to establish a theme domain database according to the multi-level theme domains, and to store into it the data screened from the sub-source data tables of the source database according to the multi-level theme domains.

In one embodiment, as shown in FIG. 8, the theme domain database processing unit 103 includes:

a theme domain table establishing unit 1031, configured to create one table for each first-level theme domain among the multi-level theme domains, using the first-level theme domain as the table name, and to create that table's fields from the remaining theme domains under the first-level theme domain; and

a theme domain table data filling unit 1032, configured to generate a second SQL script from the fields included in the theme domain database and fill the data of the corresponding fields in the sub-source data tables into the corresponding tables of the theme domain database through the second SQL script.

In one embodiment, as shown in FIG. 9, the big data acquisition apparatus 100 further includes:

a data mart database processing unit 104, configured to obtain the sub-source data tables included in the source database and fuse the data in the sub-source data tables according to their common fields to obtain a data mart database.

In one embodiment, as shown in FIG. 10, the data mart database processing unit 104 includes:

a field statistics unit 1041, configured to count the fields included in each sub-source data table;

a deduplication unit 1042, configured to de-duplicate a field that occurs more than once across the sub-source data tables so that it occurs exactly once, and use it as a field of the data mart database;

a field retaining unit 1043, configured to keep a field that occurs exactly once across the sub-source data tables as a field of the data mart database; and

a data filling unit 1044, configured to fill data into the fields of the data mart database to obtain the data mart database.

The device performs theme domain processing, field by field, on the service data uploaded by a plurality of service databases to obtain a theme domain database. The data is thereby concentrated on one platform, data analysis only needs to draw on one data source, data management and use are unified, and repeated processing of the data is reduced.

The big data acquisition apparatus described above may be implemented as a computer program that runs on a computer device such as the one shown in FIG. 11.

Referring to fig. 11, fig. 11 is a schematic block diagram of a computer device according to an embodiment of the present invention.

With reference to FIG. 11, the computer device 500 includes a processor 502, memory, and a network interface 505 connected by a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.

The non-volatile storage medium 503 may store an operating system 5031 and a computer program 5032. The computer program 5032, when executed, may cause the processor 502 to perform a big data acquisition method.

The processor 502 is used to provide computing and control capabilities to support the operation of the overall computer device 500.

The internal memory 504 provides an environment for the execution of a computer program 5032 in the non-volatile storage medium 503, which computer program 5032, when executed by the processor 502, causes the processor 502 to perform a big data acquisition method.

The network interface 505 is used for network communication, such as providing for transmission of data information, etc. It will be appreciated by those skilled in the art that the structure shown in FIG. 11 is merely a block diagram of some of the structures associated with the present inventive arrangements and does not constitute a limitation of the computer device 500 to which the present inventive arrangements may be applied, and that a particular computer device 500 may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.

The processor 502 is configured to execute the computer program 5032 stored in the memory to perform the following functions: receiving service data uploaded by a plurality of service databases, and storing the service data, according to its source identifier, into the corresponding sub-source data table of a local source database; acquiring the sub-source data tables included in the source database, and generating multi-level theme domains from the source identifier of each sub-source data table and fields selected in that table; and establishing a theme domain database according to the multi-level theme domains, and storing into it the data screened from the sub-source data tables according to the multi-level theme domains.

In one embodiment, after establishing the theme domain database according to the multi-level theme domains and storing into it the data screened from the sub-source data tables of the source database, the processor 502 further performs the following operation: acquiring the sub-source data tables included in the source database, and fusing the data in the sub-source data tables according to their common fields to obtain a data mart database.

In one embodiment, when generating the multi-level theme domains from the source identifier of each sub-source data table and the fields selected in that table, the processor 502 generates them from the source identifier of each sub-source data table and the fields selected by the first SQL script.

In one embodiment, when establishing the theme domain database according to the multi-level theme domains and storing into it the data screened from the sub-source data tables of the source database, the processor 502 performs the following operations: creating one table for each first-level theme domain among the multi-level theme domains, using the first-level theme domain as the table name, and creating that table's fields from the remaining theme domains under the first-level theme domain; and generating a second SQL script from the fields included in the theme domain database, and filling the data of the corresponding fields in the sub-source data tables into the corresponding tables of the theme domain database through the second SQL script.

In one embodiment, when performing the step of fusing the data in the sub-source data tables according to their common fields to obtain the bazaar database, the processor 502 performs the following operations: counting the fields included in each sub-source data table; if a field occurs more than once across the sub-source data tables, de-duplicating it so that it occurs exactly once, and using it as a field of the bazaar database; if a field occurs exactly once, retaining it as a field of the bazaar database; and filling the corresponding data into the fields of the bazaar database to obtain the bazaar database.
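The field-counting rule above can be sketched in a few lines. This is an illustration under stated assumptions: each sub-source data table is represented as a list of Python dicts, the shared key `customer_id` used to line rows up is hypothetical, and the patent does not say how rows are matched when the bazaar database is filled. The helper names `bazaar_fields` and `fill_bazaar` are also hypothetical.

```python
from collections import Counter


def bazaar_fields(table_fields: dict[str, list[str]]) -> tuple[list[str], list[str]]:
    """Return (fields of the bazaar database, fields common to several tables)."""
    all_fields = [f for fields in table_fields.values() for f in fields]
    counts = Counter(all_fields)
    # Fields counted more than once are the common fields
    common = [f for f, n in counts.items() if n > 1]
    # dict.fromkeys keeps first-seen order and stores each field once:
    # a field seen n > 1 times is de-duplicated, a field seen once is kept as-is
    return list(dict.fromkeys(all_fields)), common


def fill_bazaar(tables: dict[str, list[dict]], fields: list[str], key: str) -> list[dict]:
    """Merge rows that share the same value of `key` (hypothetical matching rule)."""
    merged: dict[object, dict] = {}
    for rows in tables.values():
        for row in rows:
            if row[key] not in merged:
                merged[row[key]] = dict.fromkeys(fields)  # fields with no data stay None
            merged[row[key]].update(row)
    return list(merged.values())


tables = {
    "src_policy": [{"customer_id": 1, "policy_no": "P1"}],
    "src_claims": [{"customer_id": 1, "claim_amt": 500}, {"customer_id": 2, "claim_amt": 80}],
}
fields, common = bazaar_fields({name: list(rows[0]) for name, rows in tables.items()})
print(fields, common)  # ['customer_id', 'policy_no', 'claim_amt'] ['customer_id']
rows = fill_bazaar(tables, fields, key="customer_id")
```

Using a single common field as the merge key is the simplest reading of "fusing according to common fields"; with several common fields, a tuple of their values could serve as the key instead.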

Those skilled in the art will appreciate that the embodiment of the computer device shown in fig. 11 does not limit the specific structure of the computer device. In other embodiments, the computer device may include more or fewer components than shown, combine certain components, or arrange the components differently. For example, in some embodiments the computer device may include only a memory and a processor; in such embodiments, the structure and function of the memory and the processor are consistent with the embodiment shown in fig. 11 and are not described again.

It should be appreciated that, in the embodiments of the invention, the processor 502 may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.

In another embodiment of the present invention, a storage medium is provided. The storage medium may be a non-volatile computer-readable storage medium and stores a computer program which, when executed by a processor, performs the following steps: receiving service data uploaded by a plurality of service databases, and storing the service data, according to its source identifier, into the corresponding sub-source data table of a local source database; acquiring the sub-source data tables included in the source database, and generating multi-level topic domains according to the source identifier of each sub-source data table and the fields selected in that table; and establishing a topic domain database according to the multi-level topic domains, and storing, into the topic domain database, the data obtained by screening the sub-source data tables of the source database according to the multi-level topic domains.
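The first of these steps, routing uploaded service data into per-source sub-source data tables, might look like the following sketch. It assumes SQLite through Python's `sqlite3`, that each record arrives as a `(source identifier, row dict)` pair, that rows from the same source share one set of columns, and that sub-source tables are named `src_<source identifier>`; all of these, and the helper `store_service_data`, are hypothetical.

```python
import sqlite3


def store_service_data(conn: sqlite3.Connection, records) -> None:
    """Store each (source_id, row) record in the sub-source table src_<source_id>."""
    for source_id, row in records:
        table = f"src_{source_id}"
        # Names are interpolated into SQL, so reject anything that is not a plain identifier
        if not table.isidentifier() or not all(c.isidentifier() for c in row):
            raise ValueError("unsafe table or column name")
        cols = ", ".join(f'"{c}"' for c in row)
        # One sub-source table per source identifier, created on first use
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
        placeholders = ", ".join("?" for _ in row)
        conn.execute(f'INSERT INTO "{table}" ({cols}) VALUES ({placeholders})', list(row.values()))
    conn.commit()


conn = sqlite3.connect(":memory:")
store_service_data(conn, [
    ("policy", {"customer_id": 1, "policy_no": "P1"}),
    ("claims", {"customer_id": 1, "claim_amt": 500}),
    ("claims", {"customer_id": 2, "claim_amt": 80}),
])
print(conn.execute("SELECT COUNT(*) FROM src_claims").fetchone())  # (2,)
```

Keying the table name on the source identifier is what later lets the first level of each topic domain point back to exactly one sub-source table.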

In an embodiment, after establishing the topic domain database according to the multi-level topic domains and storing, into the topic domain database, the data screened from the sub-source data tables of the source database according to the multi-level topic domains, the steps further include: acquiring the sub-source data tables included in the source database, and fusing the data in the sub-source data tables according to their common fields to obtain the bazaar database.

In one embodiment, generating the multi-level topic domains according to the source identifier of the sub-source data table and the fields selected in the sub-source data table includes: generating the multi-level topic domains according to the source identifier of the sub-source data table and the fields selected by a first SQL script.

In an embodiment, establishing the topic domain database according to the multi-level topic domains and storing, into the topic domain database, the data obtained by screening the sub-source data tables of the source database according to the multi-level topic domains includes: establishing one table for each first-level topic domain in the multi-level topic domains, using that first-level topic domain as the table name, and establishing fields in the table corresponding to the topic domains that remain after the first-level topic domain is removed from each multi-level topic domain; and generating a second SQL script corresponding to the fields included in the topic domain database, and filling the data of the corresponding fields in the sub-source data tables into the corresponding tables of the topic domain database through the second SQL script.

In an embodiment, fusing the data in the sub-source data tables according to their common fields to obtain the bazaar database includes: counting the fields included in each sub-source data table; if a field occurs more than once across the sub-source data tables, de-duplicating it so that it occurs exactly once, and using it as a field of the bazaar database; if a field occurs exactly once, retaining it as a field of the bazaar database; and filling the corresponding data into the fields of the bazaar database to obtain the bazaar database.

It will be clearly understood by those skilled in the art that, for convenience and brevity of description, reference may be made to the corresponding processes in the foregoing method embodiments for the specific working processes of the apparatus, device, and units described above, which are not repeated here. Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of the two; to clearly illustrate this interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementation decisions should not be interpreted as departing from the scope of the present invention.

In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus, device, and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into units is merely a division by logical function, and other divisions are possible in an actual implementation. Units having the same function may be integrated into one unit, multiple units or components may be combined or integrated into another system, and some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.

The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment of the present invention.

In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

If the integrated units are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a storage medium. Based on such an understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a magnetic disk, or an optical disk.

While the invention has been described with reference to certain preferred embodiments, those skilled in the art will understand that various changes may be made and equivalents substituted without departing from the scope of the invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims

1. A big data acquisition method, comprising:

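receiving service data uploaded by a plurality of service databases, and storing the service data, according to its source identifier, into the corresponding sub-source data table of a local source database;

acquiring the sub-source data tables included in the source database, and generating multi-level topic domains according to the source identifier of each sub-source data table and the fields selected in that table;

establishing a topic domain database according to the multi-level topic domains, and storing, into the topic domain database, the data obtained by screening the sub-source data tables included in the source database according to the multi-level topic domains;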

acquiring the sub-source data tables included in the source database, and fusing the data in the sub-source data tables according to their common fields to obtain a bazaar database;

wherein establishing the topic domain database according to the multi-level topic domains, and storing, into the topic domain database, the data obtained by screening the sub-source data tables included in the source database according to the multi-level topic domains, comprises:

establishing one table for each first-level topic domain in the multi-level topic domains, using the first-level topic domain as the table name, and establishing fields corresponding to the topic domains that remain after the first-level topic domain is removed from each multi-level topic domain;

and generating a second SQL script corresponding to the fields included in the topic domain database, and filling the data of the corresponding fields in the sub-source data tables into the corresponding tables of the topic domain database through the second SQL script.

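2. The big data acquisition method according to claim 1, wherein generating the multi-level topic domains according to the source identifier of the sub-source data table and the fields selected in the sub-source data table comprises:

generating the multi-level topic domains according to the source identifier of the sub-source data table and the fields selected by a first SQL script.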

3. The big data acquisition method according to claim 1, wherein fusing the data in the sub-source data tables according to their common fields to obtain the bazaar database comprises:

counting the fields included in each sub-source data table;

if a field occurs more than once among the fields included in the sub-source data tables, de-duplicating the field so that it occurs exactly once, and using it as a field of the bazaar database;

if a field occurs exactly once among the fields included in the sub-source data tables, retaining the field as a field of the bazaar database; and

filling the corresponding data into the fields of the bazaar database to obtain the bazaar database.

4. A big data acquisition device, comprising:

a topic domain database processing unit, configured to establish a topic domain database according to the multi-level topic domains, and to store, into the topic domain database, the data obtained by screening the sub-source data tables included in the source database according to the multi-level topic domains;

a bazaar database processing unit, configured to acquire the sub-source data tables included in the source database, and to fuse the data in the sub-source data tables according to their common fields to obtain the bazaar database;

wherein the topic domain database processing unit includes:

a topic domain table establishing unit, configured to establish one table for each first-level topic domain in the multi-level topic domains, using the first-level topic domain as the table name, and to establish fields corresponding to the topic domains that remain after the first-level topic domain is removed from each multi-level topic domain;

and a topic domain table data filling unit, configured to generate a second SQL script corresponding to the fields included in the topic domain database, and to fill the data of the corresponding fields in the sub-source data tables into the corresponding tables of the topic domain database through the second SQL script.

5. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the big data acquisition method according to any one of claims 1 to 3 when executing the computer program.

6. A storage medium storing a computer program which, when executed by a processor, causes the processor to perform the big data acquisition method according to any one of claims 1 to 3.