CN110633315A - Data processing method and device and computer storage medium - Google Patents


Info

Publication number
CN110633315A
Authority
CN
China
Prior art keywords
data
database
subdata
calling
index
Legal status
Pending
Application number
CN201810639187.8A
Other languages
Chinese (zh)
Inventor
蔡金锴
Current Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Heilongjiang Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Heilongjiang Co Ltd
Application filed by China Mobile Communications Group Co Ltd and China Mobile Group Heilongjiang Co Ltd
Priority to CN201810639187.8A
Publication of CN110633315A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/12 Accounting
    • G06Q40/125 Finance or payroll

Landscapes

  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Engineering & Computer Science (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the invention disclose a data processing method, which includes: storing, in a first database, first sub-data of first data whose data attribute conforms to a preset attribute, and storing second sub-data of the first data in a second database, where the first data includes data associated with an audit project and the second sub-data is the data in the first data other than the first sub-data; receiving a call instruction for calling data, where the call instruction carries a data identifier; in response to the call instruction, determining whether the first database includes second data corresponding to the data identifier; and, if the first database includes the second data, calling the second data from the first database. The embodiments of the invention also disclose a data processing device and a computer storage medium.

Description

Data processing method and device and computer storage medium
Technical Field
The present invention relates to, but is not limited to, the field of computer technology, and in particular to a data processing method, a data processing device, and a computer storage medium.
Background
Electronic auditing is gradually replacing manual auditing. At present, the audit data required for electronic auditing is mainly stored in traditional centralized databases.
However, as audit data grows rapidly, it occupies ever more storage space, so traversing the audit data in a database takes longer and longer. This further increases the time needed to process audit services, and audit data processing efficiency becomes low.
Disclosure of Invention
In view of this, embodiments of the present invention provide a data processing method, a device, and a computer storage medium, which solve the problem of low audit data processing efficiency in the prior art, improve audit data processing efficiency, and reduce system overhead.
To achieve the above purpose, the technical solutions of the embodiments of the present invention are implemented as follows:
An embodiment of the invention provides a data processing method, which includes the following steps:
storing, in a first database, first sub-data of first data whose data attribute conforms to a preset attribute, and storing second sub-data of the first data in a second database, where the first data includes data associated with an audit project and the second sub-data is the data in the first data other than the first sub-data;
receiving a call instruction for calling data, where the call instruction carries a data identifier;
in response to the call instruction, determining whether the first database includes second data corresponding to the data identifier;
and if the first database includes the second data, calling the second data from the first database.
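The four steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: both databases are modelled as in-memory dicts keyed by data identifier, and the "preset attribute" is supplied by the caller as a predicate. The names `AuditStore`, `matches_preset`, `store`, and `call` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional

# Hypothetical record type: one piece of audit-project data.
Record = Dict[str, Any]


@dataclass
class AuditStore:
    """Sketch of the two-database layout; both 'databases' are plain dicts."""
    matches_preset: Callable[[Record], bool]  # assumed test for the preset attribute
    first_db: Dict[str, Record] = field(default_factory=dict)   # stands in for e.g. Oracle
    second_db: Dict[str, Record] = field(default_factory=dict)  # stands in for e.g. Hadoop

    def store(self, first_data: Dict[str, Record]) -> None:
        # Records matching the preset attribute form the first sub-data;
        # all remaining records form the second sub-data.
        for data_id, record in first_data.items():
            if self.matches_preset(record):
                self.first_db[data_id] = record
            else:
                self.second_db[data_id] = record

    def call(self, data_id: str) -> Optional[Record]:
        # Respond to a call instruction carrying a data identifier by
        # checking the first database only; None means "not found there".
        return self.first_db.get(data_id)
```

In this sketch `call` returns `None` when the first database has no match; the optional fallback to the second database is deliberately left out.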
Optionally, the method further includes:
if the first database does not include the second data, querying the second data from the second database according to an index, and calling the second data.
Optionally, after the second sub-data of the first data is stored in the second database, the method further includes:
acquiring the data category of each piece of data in the second sub-data;
and generating a first index for each piece of data in the second sub-data according to its data category.
Optionally, if the first database does not include the second data, querying the second data from the second database according to an index and calling the second data includes:
acquiring the data category of the second data;
and querying the second data from the second database according to the data category of the second data and the first index, and calling the second data.
Optionally, before the first sub-data, whose data attribute conforms to the preset attribute, is stored in the first database, the method further includes:
acquiring the historical query frequency and the data characteristics of each piece of data in the first data;
and searching the first data for data whose historical query frequency is greater than a preset frequency and/or whose data characteristics conform to structured data characteristics, and determining that data as the first sub-data.
Optionally, after the first index is generated for each piece of data in the second sub-data according to the data category, the method further includes:
generating, according to the data category and the first index, a second index for data of the same data category in the second database.
Optionally, if the first database does not include the second data, querying the second data from the second database according to an index and calling the second data includes:
acquiring the data category of the second data;
and querying the second data from the second database according to the data category of the second data and the second index, and calling the second data.
An embodiment of the present invention provides a data processing device, including:
a processor, a memory, and a communication bus;
the communication bus is configured to implement a communication connection between the processor and the memory;
the processor is configured to execute a data processing program stored in the memory to implement the following steps:
storing, in a first database, first sub-data of first data whose data attribute conforms to a preset attribute, and storing second sub-data of the first data in a second database, where the first data includes data associated with an audit project and the second sub-data is the data in the first data other than the first sub-data;
receiving a call instruction for calling data, where the call instruction carries a data identifier;
in response to the call instruction, determining whether the first database includes second data corresponding to the data identifier;
and if the first database includes the second data, calling the second data from the first database.
Optionally, before the first sub-data, whose data attribute conforms to the preset attribute, is stored in the first database, the processor is further configured to execute the data processing program to implement the following steps:
acquiring the historical query frequency and the data characteristics of each piece of data in the first data;
and searching the first data for data whose historical query frequency is greater than a preset frequency and/or whose data characteristics conform to structured data characteristics, and determining that data as the first sub-data.
An embodiment of the present invention provides a computer storage medium storing one or more programs, which can be executed by one or more processors to implement the steps of the data processing method described above.
With the data processing method, device, and computer storage medium provided by the embodiments of the invention, first sub-data of first data whose data attribute conforms to the preset attribute is stored in a first database, and second sub-data of the first data is stored in a second database, where the first data includes data associated with an audit project and the second sub-data is the data in the first data other than the first sub-data. When a call instruction carrying a data identifier is received, it is determined, in response to the instruction, whether the first database includes second data corresponding to the data identifier, and if so, the second data is called from the first database. In other words, on receiving a call instruction, the second data is first queried in the first database, which holds only the first sub-data, that is, the data associated with the audit project whose data attribute conforms to the preset attribute, rather than in all of the audit data.
Drawings
FIG. 1 is a schematic diagram of an audit system in the related art;
FIG. 2 is a flow chart illustrating a data processing method according to an embodiment of the present invention;
FIG. 3 is a flow chart illustrating another data processing method according to an embodiment of the present invention;
FIG. 4 is a flow chart illustrating another data processing method according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the architecture of an auditing system provided by an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a data processing device according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
An embodiment of the present invention provides a data processing method that may be applied to a data processing device, which includes at least a processor and a storage medium configured to store executable instructions. The core idea of the invention is to improve on the centralized storage used in the related art by classifying data related to an audit project according to its data attributes and storing each class separately. This improves audit data processing efficiency, solves the related-art problem of low processing efficiency, and reduces system overhead. The data processing method provided by the embodiments can be applied to audit services to process the data they involve.
To briefly explain the architecture of an auditing system that implements auditing services in the related art, refer to FIG. 1: the auditing system 1 includes a service database 11, a data collection module 12, and a database analysis platform 13. The data collection module extracts data from the original log library through a unified interface and stores it in the service database; the database analysis platform analyzes and processes the data, and fixed reports and fixed images are output by applying simple strategies to the processing results.
In the related art, audit data required for electronic auditing is mainly stored in a centralized database. As electronic auditing gradually replaces manual auditing, the volume of audit data sources stored in the centralized database keeps growing, while the servers carrying the data are limited in basic capacity such as memory and processors. The current storage mode therefore faces a serious bottleneck in two respects. On one hand, as data volume rises sharply, storage requires ever more space and data traversal takes ever longer, and because the data keeps growing, manual maintenance costs are extremely high. On the other hand, as electronic auditing services become widespread, the online database stays under high load for long periods, the risk of database downtime and data loss rises sharply, and the excessive load increases the processing time per service, so the auditing system becomes unstable.
As the above shows, when an auditing system in the related art stores a large amount of data centrally in a service database, resources run short and the database becomes overloaded first, so improvement is urgently needed.
With the rapid development of operator services and the full adoption of electronic auditing, the external systems connected to the 4A auditing system are increasingly complex, and the data access modes they provide are increasingly diverse. The distributed deployment, database sharding, and table sharding used for database storage in the original electronic audit system cannot cope with growing practical problems such as complex business logic in application systems, huge audit data volumes, complex audit analysis strategies, and long audit report export times. Traditional techniques therefore cannot meet audit requirements in a big data context with high benefit and low cost, and a new technical breakthrough is needed.
Based on the foregoing embodiments, an embodiment of the present invention provides a data processing method applied to a data processing device. As shown in FIG. 2, the method includes the following steps:
S201, storing, in a first database, first sub-data of first data whose data attribute conforms to a preset attribute, and storing second sub-data of the first data in a second database.
The first data includes data associated with an audit project, and the second sub-data is the data in the first data other than the first sub-data.
Here, the data processing device may collect data associated with the audit project, such as cloud audit data, to obtain the first data. The first data includes dynamic data associated with an enterprise, and the data processing device may update the first data in real time. A key improvement of this embodiment is that it collects not only the enterprise's historical static data related to the audit project but also, through a big data system, various dynamic data related to the audit project. This enables online dynamic auditing and provides better-grounded data to guide the enterprise's production, operation, and management decisions.
Here, the data processing device may set a preset attribute that characterizes data carrying high-value information. The preset attribute may also represent other content; for example, it may characterize data as having structured data characteristics. The embodiments of the present invention do not specifically limit the preset attribute, as long as the data processing method provided by the embodiments can be implemented.
For example, the preset attribute may characterize data as having structured data characteristics. Structured data can be represented and stored in a relational database, that is, represented in two dimensions: data is organized in rows, each row represents the information of one entity, and every row has the same attributes. Semi-structured data does not conform to the data model of a relational database or other data table, but contains tags that separate semantic elements and layer records and fields, so it is also called a self-describing structure; entities of the same class may have different attributes, and even when they are grouped together, the order of those attributes does not matter. Unstructured data has no fixed structure; documents, pictures, and video/audio, for example, are unstructured data. It is generally stored whole, usually in a binary format.
Then, when the preset attribute is set to characterize data as having structured data characteristics, after obtaining the first data the data processing device analyzes the attribute of each piece of data in the first data, takes the data whose attribute conforms to the preset attribute as the first sub-data, and stores it in the first database. The second sub-data, that is, the data in the first data other than the first sub-data, which includes semi-structured and unstructured data, is stored in the second database.
In the embodiments of the present invention, the first database differs from the second database; for example, the first database may be an Oracle database and the second database may be a Hadoop database. That is, after collecting the audit data, the data processing device classifies the data according to the data attributes of the collected data and stores it in different databases, such as the first database and the second database.
In another embodiment of the invention, the Oracle database stores structured data such as result data, thesaurus data, and category library data, while the Hadoop database mainly stores semi-structured and unstructured data, such as Internet web pages, device logs, and network element data. Because only structured data is stored in the first database, DML operations on it can be greatly reduced, which substantially relieves the pressure on the traditional database and reduces the host load.
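A rough heuristic for telling the three data forms apart might look like the following Python sketch. It is an assumption for illustration only, since the patent does not say how data characteristics are detected: a dict whose keys exactly match a fixed column set (`TABLE_COLUMNS`, hypothetical) counts as a structured row, any other dict or JSON object counts as semi-structured, and bytes or free text count as unstructured.

```python
import json
from typing import Any, Sequence

# Hypothetical column set shared by every structured row (illustrative assumption).
TABLE_COLUMNS = ("id", "account", "amount")


def classify(item: Any, columns: Sequence[str] = TABLE_COLUMNS) -> str:
    """Return "structured", "semi-structured" or "unstructured" using a simple heuristic."""
    if isinstance(item, dict):
        # Same attributes as every other row: fits a two-dimensional relational table.
        if set(item) == set(columns):
            return "structured"
        # Self-describing keys whose set may differ from entity to entity.
        return "semi-structured"
    if isinstance(item, str):
        try:
            parsed = json.loads(item)
        except json.JSONDecodeError:
            return "unstructured"  # free text with no fixed structure
        # Tagged JSON text: classify the decoded object; other JSON values are treated as unstructured.
        return classify(parsed, columns) if isinstance(parsed, dict) else "unstructured"
    # Bytes (documents, pictures, audio/video) and any other type are stored whole.
    return "unstructured"
```

Under this heuristic, only items classified as `"structured"` would be routed to the first database when the preset attribute is "has structured data characteristics".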
S202, receiving a call instruction for calling data.
The call instruction carries a data identifier; for example, the data identifier may be a keyword of the data to be called. In this way, when the data processing device receives a call instruction, it can determine which data is to be called.
S203, in response to the call instruction, determining whether the first database includes second data corresponding to the data identifier.
Here, upon receiving a call instruction, the data processing device responds by querying the first database for the second data according to the data identifier, to determine whether the first database includes the second data corresponding to that identifier.
S204, if the first database includes the second data, calling the second data from the first database.
Here, when the data processing device determines that the first database includes the second data, it calls the second data directly from the first database. That is, in response to a call instruction, the data processing device does not query all of the first data, i.e., all of the audit data; it queries only the data in the first database. Once it finds the second data there, it calls the second data directly without querying the second database, which improves query efficiency. The embodiment of the invention thus overcomes the shortcomings of performing audit analysis directly against the database. Its key point is to adjust how the data is stored, which removes a large number of frequent query operations on the database, relieves database pressure, and reduces the load on the host carrying the database, thereby improving efficiency, reducing load, and lowering risk.
In the data processing method provided by the embodiment of the invention, first sub-data of first data whose data attribute conforms to the preset attribute is stored in a first database, and second sub-data, that is, the data in the first data other than the first sub-data, is stored in a second database, where the first data includes data associated with an audit project. When a call instruction carrying a data identifier is received, it is determined, in response to the instruction, whether the first database includes second data corresponding to the data identifier, and if so, the second data is called from the first database. In other words, on receiving a call instruction, the second data is first queried in the first database, which holds the first sub-data, that is, the data associated with the audit project whose data attribute conforms to the preset attribute.
Based on the foregoing embodiments, an embodiment of the present invention provides a data processing method which, as shown in FIG. 3, includes the following steps:
S301, the data processing device acquires the historical query frequency and the data characteristics of each piece of data in the first data.
Here, the data processing device may set a preset frequency and acquire the historical query frequency and the data characteristics of each piece of data in the first data.
S302, searching the first data for data whose historical query frequency is greater than the preset frequency and/or whose data characteristics conform to structured data characteristics, and determining that data as the first sub-data.
The data processing device compares each acquired historical query frequency with the preset frequency and compares the data characteristics with the structured data characteristics. If the historical query frequency of a piece of data is greater than the preset frequency and/or its data characteristics conform to the structured data characteristics, the device determines that the data belongs to the first sub-data, whose data attribute conforms to the preset attribute.
S303, the data processing device stores the first sub-data of the first data, whose data attribute conforms to the preset attribute, in the first database, and stores the second sub-data of the first data in the second database.
The first data includes data associated with an audit project, and the second sub-data is the data in the first data other than the first sub-data.
S304, the data processing device receives a call instruction for calling data.
The call instruction carries a data identifier.
S305, in response to the call instruction, the data processing device determines whether the first database includes second data corresponding to the data identifier.
S306, if the first database includes the second data, the data processing device calls the second data from the first database.
S307, if the first database does not include the second data, the data processing device queries the second data from the second database according to the index and calls the second data.
It should be noted that for steps and content in this embodiment that are the same as in other embodiments, reference may be made to the descriptions in those embodiments; they are not repeated here.
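The FIG. 3 flow (S301 to S307) can be sketched in Python as below. The sketch assumes that query frequencies and "is structured" flags are already available per data identifier, that the first database is a dict, that the second database is a list of records, and that the index maps a data identifier to a row position. All function and parameter names, including the `require_both` switch that selects the "and" rather than the "or" reading of "and/or", are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

Record = Dict[str, Any]


def select_first_subdata(
    query_freq: Dict[str, int],
    is_structured: Dict[str, bool],
    preset_frequency: int,
    require_both: bool = False,
) -> List[str]:
    """S301/S302: pick ids queried more often than the preset frequency and/or structured."""
    selected = []
    for data_id, freq in query_freq.items():
        hot = freq > preset_frequency
        structured = is_structured.get(data_id, False)
        if (hot and structured) if require_both else (hot or structured):
            selected.append(data_id)
    return selected


def call_data(
    data_id: str,
    first_db: Dict[str, Record],
    second_db: List[Record],
    index: Dict[str, int],
) -> Tuple[Optional[Record], str]:
    """S305-S307: check the first database, then fall back to the second via the index."""
    if data_id in first_db:
        return first_db[data_id], "first"
    position = index.get(data_id)  # assumed index layout: data id -> row position
    if position is None:
        return None, "missing"
    return second_db[position], "second"
```

Returning the source alongside the record is only a convenience for inspecting which database answered; the patent does not prescribe a return format.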
Based on the foregoing embodiments, an embodiment of the present invention provides a data processing method which, as shown in FIG. 4, includes the following steps:
S401, the data processing device stores the first sub-data of the first data, whose data attribute conforms to the preset attribute, in the first database, and stores the second sub-data of the first data in the second database.
The first data includes data associated with an audit project, and the second sub-data is the data in the first data other than the first sub-data.
S402, the data processing device acquires the data category of each piece of data in the second sub-data.
Here, different pieces of data in the second sub-data may have the same data category or different data categories.
S403, the data processing device generates a first index for each piece of data in the second sub-data according to its data category.
Here, the first index identifies each piece of data in the second sub-data, and the data category of each piece of data can be determined from the first index.
S404, the data processing device receives a call instruction for calling data.
The call instruction carries a data identifier.
S405, in response to the call instruction, determining whether the first database includes second data corresponding to the data identifier.
S406, if the first database includes the second data, calling the second data from the first database.
S407, if the first database does not include the second data, the data processing device acquires the data category of the second data.
Here, the data category of the second data may be the same as the data category of at least one piece of data in the first database.
S408, the data processing device queries the second data from the second database according to the data category of the second data and the first index, and calls the second data.
Here, after acquiring the data category of the second data, the data processing device can query the second data from the second database according to that data category and the first index, and call the second data.
It should be noted that the first index and the data in the second database have a one-to-one relationship. Therefore, when the data processing device queries the second data from the second database according to the first index and the data category of the second data, each first index locates exactly one piece of data in the second database.
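The one-to-one first index of S402/S403 and its use in S407/S408 might be sketched in Python as follows. The layout is an assumption, not taken from the patent: the second database is a list of records with `id` and `category` fields, and the first index is a dict keyed by `(category, id)` pairs that maps to a row position. Function names are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

Record = Dict[str, Any]
FirstIndex = Dict[Tuple[str, str], int]  # (data category, data id) -> row position


def build_first_index(second_db: List[Record]) -> FirstIndex:
    """S402/S403: read each record's category and create exactly one entry per record."""
    return {(record["category"], record["id"]): pos for pos, record in enumerate(second_db)}


def query_by_first_index(
    second_db: List[Record], first_index: FirstIndex, category: str, data_id: str
) -> Optional[Record]:
    """S407/S408: combine the wanted data's category with its id to hit a single entry."""
    position = first_index.get((category, data_id))
    return None if position is None else second_db[position]
```

Because every key includes the category, a lookup with the wrong category misses even when the identifier exists, which mirrors the requirement that the category of the second data be acquired before the query.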
In another embodiment of the present invention, after generating the first index for each piece of data in the second sub-data according to the data category in S403, the data processing device may instead perform steps A1 to A6:
A1, the data processing device generates, according to the data category and the first index, a second index for data of the same data category in the second database.
Here, the data processing device may integrate the search engine Solr, which provides fast indexing for search, into the second database. In the embodiments of the present invention, to improve indexing accuracy, a secondary index may be set up in the second database. The secondary index can be generated by a secondary index engine integrated into Solr; this engine is built on the primary index and includes a resource grouping component, a complementary cluster index, an on-chip secondary index, and the like.
A2, the data processing device receives a call instruction for calling data.
The call instruction carries a data identifier.
A3, in response to the call instruction, determining whether the first database includes second data corresponding to the data identifier.
A4, if the first database includes the second data, calling the second data from the first database.
A5, if the first database does not include the second data, the data processing device acquires the data category of the second data.
A6, the data processing device queries the second data from the second database according to the data category of the second data and the second index, and calls the second data.
Here, after acquiring the data category of the second data, the data processing device can query the second data from the second database according to that data category and the second index, and call the second data.
The data category of the second data may also be the same as the data category of at least one piece of data in the second database.
It should be noted that the second index and the data in the second database have a one-to-many relationship. Therefore, when the data processing device queries the second data from the second database according to the second index and the data category of the second data, a single second index locates multiple pieces of data in the second database.
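Building the one-to-many second index from the first index (A1) and querying with it (A6) could look like the following Python sketch. It assumes a hypothetical layout in which the first index maps `(category, id)` to a row position in a list of records; a second-index entry then maps a category to all row positions in that category, and the lookup picks the row whose `id` matches the data identifier. This is an in-memory illustration, not Solr's secondary-index mechanism.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional, Tuple

Record = Dict[str, Any]
FirstIndex = Dict[Tuple[str, str], int]  # (data category, data id) -> row position
SecondIndex = Dict[str, List[int]]       # data category -> every row position in it


def build_second_index(first_index: FirstIndex) -> SecondIndex:
    """A1: group first-index entries sharing a data category; one entry covers many rows."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for (category, _data_id), position in first_index.items():
        grouped[category].append(position)
    return {category: sorted(positions) for category, positions in grouped.items()}


def query_by_second_index(
    second_db: List[Record], second_index: SecondIndex, category: str, data_id: str
) -> Optional[Record]:
    """A6: one second-index entry yields several candidate rows; keep the matching id."""
    for position in second_index.get(category, []):
        if second_db[position]["id"] == data_id:
            return second_db[position]
    return None
```

The trade-off shown here is a much smaller index (one entry per category) at the cost of scanning the candidate rows of that category during the query.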
In an embodiment of the present invention, referring to FIG. 5, a search engine such as Solr is embedded in the Hadoop database; Solr can automatically integrate indexes over unstructured data to form an index cluster, and the integrated index cluster greatly improves indexing efficiency. For example, when the data processing method provided by the invention is used to process audit services, the data related to the audit project collected by the data processing device includes service data, communication data, Internet data, and log data, which together form the first data. After collecting the first data, the data processing device classifies it, storing structured data in the Oracle database and unstructured and semi-structured data in the Hadoop database.
In the embodiment of the present invention, the indexing function of the second database is described in detail in three aspects:
in a first aspect: after the index clusters are formed by the Solr, the secondary retrieval technology of the data processing equipment can perform secondary grouping according to specific data and can perform indexing again according to key information in the clusters. By the aid of secondary retrieval of the index clusters, retrieval precision can be greatly improved, and precision data are provided for the model.
In a second aspect: for frequently used models, the retrieval conditions of the models can be solidified and processed in batch, so that the retrieval precision is improved, and the system resource overhead is reduced.
In a third aspect: by analyzing query plans prepared in advance in Structured Query Language (SQL), the data processing device can run queries during idle time slices of the processor, which improves processing efficiency and makes effective use of otherwise idle resources.
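The idle-time pre-query idea can be sketched as a queue of prepared SQL statements that only runs when the system reports spare capacity, with results cached for later use. In this Python sketch, SQLite stands in for the audit databases, and `load_probe` is a hypothetical callable returning the current load in [0, 1] (a real system would read an actual CPU or database load metric). The class name, threshold and table are assumptions for illustration.

```python
import sqlite3
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple


class IdlePreQuery:
    """Queue of prefabricated SQL queries that run only in idle time slices."""

    def __init__(self, conn: sqlite3.Connection, load_probe: Callable[[], float],
                 idle_threshold: float = 0.3) -> None:
        self.conn = conn
        self.load_probe = load_probe  # hypothetical load metric in [0, 1]
        self.idle_threshold = idle_threshold
        self.pending: Deque[Tuple[str, str]] = deque()
        self.cache: Dict[str, List[tuple]] = {}

    def submit(self, name: str, sql: str) -> None:
        self.pending.append((name, sql))

    def tick(self) -> None:
        """Called periodically; runs at most one pending query if the system is idle."""
        if self.pending and self.load_probe() < self.idle_threshold:
            name, sql = self.pending.popleft()
            self.cache[name] = self.conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bills (account TEXT, amount REAL)")
conn.executemany("INSERT INTO bills VALUES (?, ?)", [("a", 10.0), ("a", 5.0), ("b", 7.0)])

loads = iter([0.9, 0.1])  # simulated load: busy, then idle
runner = IdlePreQuery(conn, lambda: next(loads))
runner.submit("totals", "SELECT account, SUM(amount) FROM bills GROUP BY account ORDER BY account")
runner.tick()
after_busy_tick = dict(runner.cache)  # still empty: the system was busy
runner.tick()
print(runner.cache["totals"])  # [('a', 15.0), ('b', 7.0)]
```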
Data blocks that grow quickly and are processed frequently are migrated to the Hadoop database; Hadoop's high reliability, scalability, efficiency and fault tolerance meet these requirements. However, Hadoop has no advantage in real-time processing, while electronic audit requires a large number of create, read, update and delete operations on data through models, so Solr is introduced as a supplement. Solr is highly scalable, provides distributed search and index replication, integrates with Hadoop, and can effectively compensate for Hadoop's weakness in real-time retrieval.
In this embodiment of the present invention, referring to fig. 5, after the data processing device integrates the indexes of the data in the second database, the data are stored in an ORACLE database and in a Massively Parallel Processing (MPP) database, respectively (in fig. 5, the arrow between the search engine Solr and the ORACLE database and the arrow between Solr and the MPP database represent data migration). The MPP database stores high-value structured stream data; the data held in the MPP database, also referred to herein as intermediate data, are lost when the data processing device restarts. In this case, the second database comprises the Hadoop database and the MPP database. Accordingly, the step "if the query result indicates that the first database does not include the second data, querying the second data according to the second database and calling the second data" may comprise: if the query result indicates that the first database does not include the second data, querying the MPP database in the second database for the second data and generating a query result; if this query result indicates that the MPP database includes the second data, calling the second data; and if it indicates that the MPP database does not include the second data, querying the Hadoop database for the second data and calling the second data.
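The lookup order described above (first database, then MPP, then Hadoop) can be sketched in Python as follows. Plain dictionaries stand in for the three stores, and `call_data` is a hypothetical helper; clearing the MPP dictionary models the loss of intermediate data on restart.

```python
from typing import Any, Dict, Optional, Tuple

Store = Dict[str, Any]


def call_data(data_id: str, oracle: Store, mpp: Store, hadoop: Store) -> Tuple[Optional[str], Any]:
    """Try the first database (Oracle), then the MPP database, then Hadoop.

    Returns (name of the store that answered, value), or (None, None) on a miss.
    """
    for name, store in (("oracle", oracle), ("mpp", mpp), ("hadoop", hadoop)):
        if data_id in store:
            return name, store[data_id]
    return None, None


oracle = {"bill-1": {"amount": 30.0}}
mpp = {"stream-7": {"cell": "A12"}}  # intermediate data, lost on restart
hadoop = {"log-3": "login failure"}
print(call_data("stream-7", oracle, mpp, hadoop))  # ('mpp', {'cell': 'A12'})
```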
It should be noted that, for the descriptions of the same steps and the same contents in this embodiment as those in other embodiments, reference may be made to the descriptions in other embodiments, which are not described herein again.
As can be seen from the foregoing, by using the data processing method described above, the data processing apparatus provided in this embodiment of the present invention solves the following problems:
1) The original centralized storage mode is improved: data are classified by access frequency, data type and data value, and each class receives its own storage solution. High-value data continue to be stored in the ORACLE database, while high-density data are handled by a Hadoop storage scheme;
2) To balance audit accuracy against resource overhead, Solr and a secondary retrieval technique are introduced on top of the Hadoop database, yielding a multi-level, fine-grained retrieval method;
3) For commonly used models, an SQL-based idle-time pre-retrieval model is constructed that performs retrieval in the machine's idle time slices, reducing host resource overhead and improving database efficiency.
Furthermore, a traditional electronic audit system database is constrained to storing structured data and is under resource-overhead pressure, so it can support only simple fixed-report presentation and simple model-based statistical analysis; its processing capacity is limited and its presentation is restricted to fixed reports and fixed charts.
In contrast, in this embodiment of the invention, the Hadoop platform and the ORACLE database together provide ample elastic computing capacity, and the optimized big-data intelligent audit platform can support complex policy configuration and result display. The platform supports the formulation of audit policies for different services, including data acquisition, data processing and data presentation schemes, and can present multidimensional value information according to user requirements.
In this embodiment of the invention, the improved audit data analysis can collect data from both the Oracle database and the Hadoop database and is not limited by the resources of a single data source, so more complex service models can be built for data analysis, such as a neural network model that requires substantial data processing overhead. With such models, data can be analyzed comprehensively to form multidimensional display views (multidimensional risk views) with multidimensional indicators, which overcomes the single presentation mode of the original fixed reports and fixed charts and supports a multidimensional, end-to-end business audit process.
In another embodiment of the invention, auditors shift from passive audit to active audit. Under the traditional electronic audit mode, constrained by the traditional audit data sources and system resource overhead, auditors could not implement all of their audit ideas in the audit system. By introducing the idea of interactive data, the improved big-data intelligent audit platform can readily encode an auditor's audit logic into a model, greatly improving audit completeness.
In practical application, a data processing device using the data processing method provided by this embodiment of the invention produced the following performance test results. Micro-level effects: on data processing devices of the same configuration, after split storage was adopted, row storage saved 20% of space, column storage loaded data 23% faster, and online query analysis performance (time) improved 11.4-fold. Macro-level effects: with the hardware scale unchanged, the time span of data that the device can process grew from 7 days to 3 months, the volume of processed data grew by an order of magnitude, system throughput increased 7-fold, and the response time of query requests that originally took more than 1 s was reduced by 57.4% on average.
The data processing method provided by this embodiment of the invention also offers the following technical advantages. First, audit data analysis: the improved analysis draws on both the Oracle and Hadoop databases rather than the resources of a single data source, enabling complex models (such as neural network models) and multidimensional risk views in place of fixed reports and fixed charts, and supporting a multidimensional, end-to-end business audit process. Second, the narrow scope of traditional audit is addressed: building a secondary index and embedding audit models expands the audit scope and increases audit flexibility. Third, auditors shift from passive to active audit: the platform can encode an auditor's audit logic into a model, whereas the traditional electronic audit mode, constrained by its data sources and system resources, could not accommodate all audit ideas. Fourth, integrating the primary and secondary index techniques greatly reduces system resource overhead while improving audit efficiency; independent index techniques are also supported, improving audit flexibility and the practical efficiency of auditors.
In addition, management of platform resource overhead is optimized: by analyzing query plans prepared in advance, the data processing device can run queries during idle processor time slices, improving machine efficiency and making effective use of idle resources.
Based on the foregoing embodiments, an embodiment of the present invention provides a data processing apparatus 6, which can implement the data processing method provided in the embodiments corresponding to figs. 2 to 4. As shown in fig. 6, the data processing apparatus includes a processor 61, a memory 62 and a communication bus 63, wherein:
the communication bus 63 is used for realizing communication connection between the processor 61 and the memory 62;
the processor 61 is configured to execute the data processing program stored in the memory 62 to implement the following steps:
storing, in a first database, first sub-data of first data whose data attribute matches a preset attribute, and storing second sub-data of the first data in a second database; wherein the first data comprise data associated with an audit item, and the second sub-data are the data in the first data other than the first sub-data;
receiving a calling instruction for calling data, wherein the calling instruction carries a data identifier;
in response to the calling instruction, determining whether the first database includes second data corresponding to the data identifier;
and if the first database includes the second data, calling the second data from the first database.
In other embodiments of the present invention, when the processor 61 is used to execute the program for processing data in the memory 62, the following steps can be implemented:
if the first database does not include the second data, querying the second database for the second data according to the index, and calling the second data.
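The apparatus steps above, including the fallback to the second database, can be sketched in Python. The sketch assumes, hypothetically, that the "preset attribute" test is supplied as a predicate over each record and that dictionaries stand in for both databases; `CallInstruction` and `AuditStore` are illustrative names only.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class CallInstruction:
    data_id: str  # the data identifier carried by the calling instruction


class AuditStore:
    def __init__(self, matches_preset: Callable[[Dict[str, Any]], bool]) -> None:
        self.matches_preset = matches_preset  # hypothetical "preset attribute" test
        self.first_db: Dict[str, Dict[str, Any]] = {}
        self.second_db: Dict[str, Dict[str, Any]] = {}

    def store(self, first_data: Dict[str, Dict[str, Any]]) -> None:
        """First sub-data go to the first database; the rest (second sub-data) to the second."""
        for data_id, record in first_data.items():
            target = self.first_db if self.matches_preset(record) else self.second_db
            target[data_id] = record

    def handle(self, instr: CallInstruction) -> Tuple[str, Optional[Dict[str, Any]]]:
        """Check the first database first; fall back to the second (None if absent from both)."""
        if instr.data_id in self.first_db:
            return "first", self.first_db[instr.data_id]
        return "second", self.second_db.get(instr.data_id)


audit = AuditStore(lambda record: record.get("structured", False))
audit.store({"a": {"structured": True}, "b": {"structured": False}})
print(audit.handle(CallInstruction("a")))  # ('first', {'structured': True})
print(audit.handle(CallInstruction("b")))  # ('second', {'structured': False})
```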
In other embodiments of the present invention, after executing the step of storing the second sub-data of the first data in the second database, the processor 61 may further execute the program in the memory 62 to implement the following steps:
acquiring the data category of each piece of data in the second sub-data;
and generating a first index for each piece of data in the second sub-data according to the data category.
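The patent does not define the layout of the first index. The Python sketch below assumes one plausible form: each record is appended to a per-category block, and its first-index entry, keyed by a hypothetical `"category:data_id"` string, records the block name and offset where it was stored.

```python
from typing import Any, Dict, List, Tuple

Location = Tuple[str, int]  # hypothetical (block name, offset) inside the second database


def store_with_first_index(
    second_sub_data: List[Tuple[str, str, Any]]
) -> Tuple[Dict[str, List[Any]], Dict[str, Location]]:
    """second_sub_data holds (data_id, data_category, payload) triples.

    Each payload is appended to a per-category block, and a first-index
    entry keyed by "category:data_id" records where it landed.
    """
    blocks: Dict[str, List[Any]] = {}
    first_index: Dict[str, Location] = {}
    for data_id, category, payload in second_sub_data:
        block = blocks.setdefault(category, [])
        first_index[f"{category}:{data_id}"] = (category, len(block))
        block.append(payload)
    return blocks, first_index


blocks, first_index = store_with_first_index([
    ("r1", "log", "login failure"),
    ("r2", "internet", "portal visit"),
    ("r3", "log", "password reset"),
])
print(first_index)
# {'log:r1': ('log', 0), 'internet:r2': ('internet', 0), 'log:r3': ('log', 1)}
```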
In other embodiments of the present invention, when the processor 61 executes the program in the memory 62 to perform the step of querying the second database for the second data according to the index and calling the second data if the first database does not include the second data, the following steps are implemented:
if the first database does not include the second data, acquiring the data category of the second data;
and querying the second database for the second data according to the data category of the second data and the first index, and calling the second data.
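Continuing the same assumed index layout, the query can be sketched in Python as three steps: acquire the data category of the requested data, build the first-index key from the category and identifier, then read the record at the stored location. `category_of` is a hypothetical catalog mapping identifiers to categories; the patent does not say where the category is obtained from.

```python
from typing import Any, Dict, List, Optional, Tuple

Location = Tuple[str, int]  # hypothetical (block name, offset), as stored in the first index


def query_by_first_index(
    data_id: str,
    category_of: Dict[str, str],
    first_index: Dict[str, Location],
    blocks: Dict[str, List[Any]],
) -> Optional[Any]:
    """Acquire the data category, build the first-index key, then read the record."""
    category = category_of.get(data_id)  # step 1: data category of the requested data
    if category is None:
        return None
    location = first_index.get(f"{category}:{data_id}")  # step 2: first-index lookup
    if location is None:
        return None
    block_name, offset = location
    return blocks[block_name][offset]  # step 3: call (read) the data


blocks = {"log": ["login failure", "password reset"]}
first_index = {"log:r1": ("log", 0), "log:r3": ("log", 1)}
category_of = {"r1": "log", "r3": "log"}  # hypothetical category catalog
print(query_by_first_index("r3", category_of, first_index, blocks))  # password reset
```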
In other embodiments of the present invention, before executing the step of storing, in the first database, the first sub-data of the first data whose data attribute matches the preset attribute, the processor 61 may further execute the program in the memory 62 to implement the following steps:
acquiring the historical query frequency and the data characteristics of each piece of data in the first data;
and selecting, from the first data, the data whose historical query frequency exceeds a preset frequency and/or whose data characteristics conform to structured-data characteristics, and determining them as the first sub-data.
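The selection rule can be sketched in Python as a filter over the first data. Because the text says "and/or", the sketch exposes both readings through a hypothetical `require_both` flag; how the query frequency is counted and how "structured" is decided are assumptions left to the caller.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Datum:
    data_id: str
    query_count: int     # historical query frequency over some look-back window (assumed)
    is_structured: bool  # whether the datum conforms to structured-data characteristics


def select_first_sub_data(first_data: List[Datum], preset_frequency: int,
                          require_both: bool = False) -> List[Datum]:
    """Pick the data destined for the first database.

    require_both=False applies the "or" reading of "and/or"; True applies "and".
    """
    selected = []
    for d in first_data:
        hot = d.query_count > preset_frequency
        keep = (hot and d.is_structured) if require_both else (hot or d.is_structured)
        if keep:
            selected.append(d)
    return selected


data = [Datum("a", 50, True), Datum("b", 50, False), Datum("c", 2, True), Datum("d", 2, False)]
print([d.data_id for d in select_first_sub_data(data, preset_frequency=10)])  # ['a', 'b', 'c']
print([d.data_id for d in select_first_sub_data(data, 10, require_both=True)])  # ['a']
```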
In other embodiments of the present invention, after executing the step of generating the first index for each piece of data in the second sub-data according to the data category, the processor 61 may further execute the program in the memory 62 to implement the following step:
generating, according to the data category and the first index, a second index for the data in the second database that share the same data category.
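Assuming first-index keys of the hypothetical form `"category:data_id"`, the second index can be sketched in Python as a grouping of those keys by category, which produces the one-to-many structure described earlier in this section.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

Location = Tuple[str, int]  # hypothetical (block name, offset) from the first index


def build_second_index(first_index: Dict[str, Location]) -> Dict[str, List[str]]:
    """Group first-index keys of the form "category:data_id" by category."""
    grouped: DefaultDict[str, List[str]] = defaultdict(list)
    for key in first_index:
        category, _, _ = key.partition(":")
        grouped[category].append(key)
    return dict(grouped)


first_index = {"log:r1": ("log", 0), "internet:r2": ("internet", 0), "log:r3": ("log", 1)}
print(build_second_index(first_index))
# {'log': ['log:r1', 'log:r3'], 'internet': ['internet:r2']}
```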
In other embodiments of the present invention, when the processor 61 executes the program in the memory 62 to perform the step of querying the second database for the second data according to the index and calling the second data if the first database does not include the second data, the following steps are implemented:
if the first database does not include the second data, acquiring the data category of the second data;
and querying the second database for the second data according to the data category of the second data and the second index, and calling the second data.
It should be noted that, for a specific implementation process of the steps executed by the processor in this embodiment, reference may be made to an implementation process in the data processing method provided in the embodiment corresponding to fig. 2 to 4, and details are not described here again.
With the data processing device provided by this embodiment of the invention, first sub-data of the first data whose data attribute matches a preset attribute are stored in a first database, and second sub-data of the first data are stored in a second database, wherein the first data comprise data associated with an audit item and the second sub-data are the data in the first data other than the first sub-data; a calling instruction carrying a data identifier is received; in response to the calling instruction, it is determined whether the first database includes second data corresponding to the data identifier; and if so, the second data are called from the first database. In other words, when a calling instruction is received, the second data are first looked up in the first database, which holds the first sub-data, namely the audit-related data whose data attribute matches the preset attribute.
In accordance with the foregoing embodiments, embodiments of the present application also provide a computer storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the steps of:
storing, in a first database, first sub-data of first data whose data attribute matches a preset attribute, and storing second sub-data of the first data in a second database; wherein the first data comprise data associated with an audit item, and the second sub-data are the data in the first data other than the first sub-data;
receiving a calling instruction for calling data, wherein the calling instruction carries a data identifier;
in response to the calling instruction, determining whether the first database includes second data corresponding to the data identifier;
and if the first database includes the second data, calling the second data from the first database.
In other embodiments of the invention, the one or more programs are executable by the one or more processors to implement the following step:
if the first database does not include the second data, querying the second database for the second data according to the index, and calling the second data.
In other embodiments of the present invention, after the one or more programs are executed by the one or more processors to store the second sub-data of the first data (the data other than the first sub-data) in the second database, the following steps may further be implemented:
acquiring the data category of each piece of data in the second sub-data;
and generating a first index for each piece of data in the second sub-data according to the data category.
In other embodiments of the present invention, when the one or more programs are executed by the one or more processors to query the second database for the second data according to the index and call the second data if the first database does not include the second data, the following steps are implemented:
if the first database does not include the second data, acquiring the data category of the second data;
and querying the second database for the second data according to the data category of the second data and the first index, and calling the second data.
In other embodiments of the present invention, before the one or more programs are executed by the one or more processors to store, in the first database, the first sub-data of the first data whose data attribute matches the preset attribute, the following steps may further be implemented:
acquiring the historical query frequency and the data characteristics of each piece of data in the first data;
and selecting, from the first data, the data whose historical query frequency exceeds a preset frequency and/or whose data characteristics conform to structured-data characteristics, and determining them as the first sub-data.
In other embodiments of the present invention, after the one or more programs are executed by the one or more processors to generate the first index for each piece of data in the second sub-data according to the data category, the following step may further be implemented:
generating, according to the data category and the first index, a second index for the data in the second database that share the same data category.
In other embodiments of the present invention, when the one or more programs are executed by the one or more processors to query the second database for the second data according to the index and call the second data if the first database does not include the second data, the following steps are implemented:
if the first database does not include the second data, acquiring the data category of the second data;
and querying the second database for the second data according to the data category of the second data and the second index, and calling the second data.
It should be noted that, for a specific implementation process of the steps executed by the processor in this embodiment, reference may be made to an implementation process in the data processing method provided in the embodiment corresponding to fig. 2 to 4, and details are not described here again.
The computer storage medium may be a memory such as a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Ferroelectric Random Access Memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM); it may also be any of various electronic devices that include one or any combination of the above memories, such as mobile phones, computers, tablet devices and personal digital assistants.
In this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware, and of course also by hardware, although in many cases the former is the preferred implementation. Based on this understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disc) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the methods described in the embodiments of the present invention.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A method of data processing, the method comprising:
storing, in a first database, first sub-data of first data whose data attribute matches a preset attribute, and storing second sub-data of the first data in a second database; wherein the first data comprise data associated with an audit item, and the second sub-data are the data in the first data other than the first sub-data;
receiving a calling instruction for calling data, wherein the calling instruction carries a data identifier;
in response to the calling instruction, determining whether the first database includes second data corresponding to the data identifier;
and if the first database includes the second data, calling the second data from the first database.
2. The method of claim 1, further comprising:
if the first database does not include the second data, querying the second database for the second data according to the index, and calling the second data.
3. The method of claim 2, wherein after storing the second sub-data of the first data in the second database, the method further comprises:
acquiring the data category of each piece of data in the second sub-data;
and generating a first index for each piece of data in the second sub-data according to the data category.
4. The method of claim 3, wherein querying the second database for the second data according to the index and calling the second data if the first database does not include the second data comprises:
acquiring the data category of the second data;
and querying the second database for the second data according to the data category of the second data and the first index, and calling the second data.
5. The method of claim 1, wherein before storing, in the first database, the first sub-data of the first data whose data attribute matches the preset attribute, the method further comprises:
acquiring the historical query frequency and the data characteristics of each piece of data in the first data;
and selecting, from the first data, data whose historical query frequency exceeds a preset frequency and/or whose data characteristics conform to structured-data characteristics, and determining them as the first sub-data.
6. The method of claim 3, wherein after generating the first index for each piece of data in the second sub-data according to the data category, the method further comprises:
generating, according to the data category and the first index, a second index for the data in the second database that share the same data category.
7. The method of claim 6, wherein querying the second database for the second data according to the index and calling the second data if the first database does not include the second data comprises:
if the first database does not include the second data, acquiring the data category of the second data;
and querying the second database for the second data according to the data category of the second data and the second index, and calling the second data.
8. A data processing apparatus, characterized by comprising: a processor, a memory, and a communication bus;
the communication bus is used for realizing communication connection between the processor and the memory;
the processor is configured to execute the data processing program in the memory to implement the following steps:
storing, in a first database, first sub-data of first data whose data attribute matches a preset attribute, and storing second sub-data of the first data in a second database; wherein the first data comprise data associated with an audit item, and the second sub-data are the data in the first data other than the first sub-data;
receiving a calling instruction for calling data, wherein the calling instruction carries a data identifier;
in response to the calling instruction, determining whether the first database includes second data corresponding to the data identifier;
and if the first database includes the second data, calling the second data from the first database.
9. The data processing device according to claim 8, wherein before storing, in the first database, the first sub-data of the first data whose data attribute matches the preset attribute, the processor is further configured to execute the data processing program to implement the following steps:
acquiring the historical query frequency and the data characteristics of each piece of data in the first data;
and selecting, from the first data, data whose historical query frequency exceeds a preset frequency and/or whose data characteristics conform to structured-data characteristics, and determining them as the first sub-data.
10. A computer storage medium, characterized in that the computer storage medium stores one or more programs executable by one or more processors to implement the steps of the data processing method according to any one of claims 1 to 7.
CN201810639187.8A 2018-06-20 2018-06-20 Data processing method and device and computer storage medium Pending CN110633315A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810639187.8A CN110633315A (en) 2018-06-20 2018-06-20 Data processing method and device and computer storage medium

Publications (1)

Publication Number Publication Date
CN110633315A true CN110633315A (en) 2019-12-31

Family

ID=68967501

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810639187.8A Pending CN110633315A (en) 2018-06-20 2018-06-20 Data processing method and device and computer storage medium

Country Status (1)

Country Link
CN (1) CN110633315A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114510481A (en) * 2022-02-15 2022-05-17 中国农业银行股份有限公司 User credit data information storage method, device and equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100011010A1 (en) * 2003-09-05 2010-01-14 Oracle International Corporation Method and mechanism for efficient storage and query of xml documents based on paths
CN102915373A (en) * 2012-11-06 2013-02-06 无锡江南计算技术研究所 Data storage method and device
CN107341756A (en) * 2017-06-29 2017-11-10 北京公科飞达交通工程发展有限公司 Traffic Heterogeneous Information accesses and interoperability service-specific system
CN107506383A (en) * 2017-07-25 2017-12-22 中国建设银行股份有限公司 A kind of audit data processing method and computer equipment
CN107704601A (en) * 2017-10-13 2018-02-16 中国人民解放军第三军医大学第附属医院 Big data search method and system, computer-readable storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191231