CN110442565A - Data processing method, device, computer equipment and storage medium - Google Patents


Info

Publication number
CN110442565A
Authority
CN
China
Prior art keywords
data
database
processing
phase
completion
Legal status
Granted
Application number
CN201910730003.3A
Other languages
Chinese (zh)
Other versions
CN110442565B (en)
Inventor
邵健锋
崔巍
Current Assignee
SHENZHEN NEW TREND INTERNATIONAL LOGISTICS TECHNOLOGY Co Ltd
Original Assignee
SHENZHEN NEW TREND INTERNATIONAL LOGISTICS TECHNOLOGY Co Ltd
Application filed by SHENZHEN NEW TREND INTERNATIONAL LOGISTICS TECHNOLOGY Co Ltd
Priority to CN201910730003.3A
Publication of CN110442565A
Application granted
Publication of CN110442565B
Status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/217 Database tuning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a data processing method, a device, computer equipment and a storage medium. The data processing method includes: partitioning databases; when a database reaches the crawl phase, writing the crawled data into that database and, once crawling is complete, writing to the next database that enters the crawl phase; when a database reaches the processing phase, locally processing the data in that database and, once processing is complete, locally processing the data in the next database that enters the processing phase; and, after local processing is complete, exporting the data in the database. By partitioning databases, the method separates the database that receives written data from the database that performs data processing, which lowers the performance requirements on any single database and also allows massive volumes of data to be stored effectively.

Description

Data processing method, device, computer equipment and storage medium
Technical field
The present invention relates to the technical field of data processing, and in particular to a data processing method, a data processing device, a computer-readable storage medium and a computer device.
Background art
At present, most data acquisition software collects data mainly to display real-time data on a user interface, with data storage as a secondary function. As science and technology progress, artificial intelligence has risen, and one important foundation of artificial intelligence is data. Data acquisition software is therefore also needed here, but its purpose is to accumulate data for data modeling and AI analysis, for example data trend analysis that predicts the operating status of equipment over a short period ahead. Existing data acquisition software cannot meet the needs of data modeling and AI analysis because it places high performance requirements on the database.
Therefore, how to reduce the performance requirements on the database during data processing, and how to optimize the data processing procedure so as to reduce the cost of using the database, are technical problems that those skilled in the art currently need to solve.
Summary of the invention
Embodiments of the invention provide a data processing method, a data processing device, a computer-readable storage medium and a computer device, which can reduce the performance requirements on the database during data processing and optimize the data processing procedure so as to reduce the cost of using the database.
In a first aspect, an embodiment of the invention provides a data processing method, including:
Partitioning databases;
When a database reaches the crawl phase, writing the crawled data into that database, and after crawling is complete, writing to the next database that enters the crawl phase;
When a database reaches the processing phase, locally processing the data in that database, and after processing is complete, locally processing the data in the next database that enters the processing phase;
After local processing is complete, exporting the data in the database.
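The phase lifecycle of the first aspect can be pictured as a small state machine. The Python sketch below is only an illustration under assumed names (`Phase`, `Partition` and `rotate` are hypothetical, not taken from the patent): a partition accepts writes only while it is in the crawl phase, and each rotation advances every partition by one phase while a fresh partition opens for crawling, so writing and local processing never target the same database.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Phase(Enum):
    CRAWL = auto()       # crawled data is being written into the database
    PROCESSING = auto()  # local processing only; no new writes
    EXPORT = auto()      # data is exported and storage can be released


# Each phase may only advance to the one after it; EXPORT is terminal here.
NEXT_PHASE = {Phase.CRAWL: Phase.PROCESSING, Phase.PROCESSING: Phase.EXPORT}


@dataclass
class Partition:
    name: str
    phase: Phase = Phase.CRAWL
    rows: list = field(default_factory=list)

    def write(self, row) -> None:
        # Writing is allowed only in the crawl phase, which keeps writes and
        # local processing on different databases.
        if self.phase is not Phase.CRAWL:
            raise RuntimeError(f"{self.name} is not in the crawl phase")
        self.rows.append(row)

    def advance(self) -> None:
        if self.phase not in NEXT_PHASE:
            raise RuntimeError(f"{self.name} has no phase after {self.phase.name}")
        self.phase = NEXT_PHASE[self.phase]


def rotate(partitions: list, new_name: str) -> list:
    """Advance every non-exported partition one phase, then open a new crawl partition."""
    for p in partitions:
        if p.phase is not Phase.EXPORT:
            p.advance()
    partitions.append(Partition(new_name))
    return partitions
```

For example, after one `rotate`, `db1` sits in `PROCESSING` and rejects writes while the newly opened `db2` accepts crawled rows.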
In a second aspect, an embodiment of the invention provides a data processing device, including:
A division module, configured to partition databases;
A writing module, configured to write the crawled data into a database when that database reaches the crawl phase, and after crawling is complete, to write to the next database that enters the crawl phase;
A local processing module, configured to locally process the data in a database when that database reaches the processing phase, and after processing is complete, to locally process the data in the next database that enters the processing phase;
A data export module, configured to export the data in the database after local processing is complete.
In a third aspect, an embodiment of the invention further provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the data processing method of the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the invention further provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the data processing method of the first aspect.
Embodiments of the invention provide a data processing method that includes: partitioning databases; when a database reaches the crawl phase, writing the crawled data into that database, and after crawling is complete, writing to the next database that enters the crawl phase; when a database reaches the processing phase, locally processing the data in that database, and after processing is complete, locally processing the data in the next database that enters the processing phase; and, after local processing is complete, exporting the data in the database. Because the method separates the database being written from the database being processed, the performance requirements on any single database drop significantly. In addition, the indexes used when reading data are not continuously rebuilt by frequent writes, which further reduces the performance requirements on the database. The invention also provides a data processing device, a computer-readable storage medium and a computer device with the same beneficial effects, which are not repeated here.
Description of the drawings
To describe the technical solutions of the embodiments of the invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a data processing method provided by an embodiment of the invention;
Fig. 2 is another flowchart of a data processing method provided by an embodiment of the invention;
Fig. 3 is another flowchart of a data processing method provided by an embodiment of the invention;
Fig. 4 is a schematic structural diagram of a data processing device provided by an embodiment of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the invention.
It should be understood that, when used in this specification and the appended claims, the terms "include" and "comprise" indicate the presence of the described features, integers, steps, operations, elements and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or sets thereof.
It should also be understood that the terms used in this specification are for the purpose of describing specific embodiments only and are not intended to limit the invention. As used in the specification and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms unless the context clearly indicates otherwise.
It should further be understood that the term "and/or" used in the specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations.
Refer to Fig. 1, which is a flowchart of a data processing method provided by an embodiment of the invention.
The specific steps may include:
S101: partition databases;
Database licensing fees account for a considerable share of the deployment cost of data processing, so most users prefer to replace a paid database with a free one. Free databases are more limited in database size, performance and functionality. However, because complex data modeling is generally not performed locally, the functional limitations have little influence on the data processing software. To make the data exported from an ordinary database meet the requirements of data modeling and AI analysis, the embodiment of the invention improves the database structure accordingly, so that the database suits the application scenarios of data modeling and AI analysis.
In a specific application scenario, the databases mentioned in the embodiments are local databases, and the data processing in the embodiments is also performed locally, so that the data is prepared locally in advance and uploaded to the cloud when needed for data modeling, AI analysis and so on.
This step divides (or segments) the databases. The purpose of the division is to separate data writing from data processing, which reduces the performance requirements on the database.
Different division modes can be used as needed. For example, databases can be divided according to a predetermined time length or according to a predetermined size; other division modes are of course also possible, and the embodiments do not specifically limit this. In the embodiments, division or segmentation means creating databases such that the data written into each database satisfies the chosen division mode.
Because data processing mainly handles the different data of the same period in units of time, databases are preferably segmented using time as the criterion. In a specific application scenario, databases can be divided according to a predetermined time length. If the predetermined time length is a fixed value, every divided database covers the same time span; if it is a variable value, the time span of each divided database is determined by how the value varies.
Specifically, dividing databases by a predetermined time length means determining the crawl time period of the data to be written into each database; that is, each database is given an attribute containing the crawl period of the data it accepts. For example, if a database is designated for data crawled between 1:00 AM and 2:00 AM on the same day, its predetermined time length is 1 hour. As explained above, when databases are divided by a predetermined time length, the crawl start time and crawl end time of the data to be written should also be specified in order to fix the crawl period. In the example above, the crawl start time is 1:00 AM and the crawl end time is 2:00 AM, and the divided database is dedicated to data crawled in the period between these two time points.
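Assigning a crawled record to its time-based database reduces to finding the window that contains its crawl time. The sketch below assumes windows of one fixed predetermined length laid out back to back from a reference point `origin` (for example midnight), and treats each window as half-open, [start, end), so a record crawled exactly at 2:00 AM belongs to the next hour's database; the function name `crawl_window` and the half-open convention are assumptions, not details from the patent.

```python
from datetime import datetime, timedelta
from typing import Tuple


def crawl_window(crawl_time: datetime, length: timedelta, origin: datetime) -> Tuple[datetime, datetime]:
    """Return the [start, end) crawl window of the given length that contains crawl_time.

    origin is an assumed reference point from which windows are laid out
    back to back; crawl times before origin map to earlier windows.
    """
    if length <= timedelta(0):
        raise ValueError("the predetermined time length must be positive")
    index = (crawl_time - origin) // length  # whole windows elapsed since origin
    start = origin + index * length
    return start, start + length
```

With a 1-hour length and midnight as the origin, a record crawled at 1:30 AM maps to the 1:00 AM to 2:00 AM database from the example above.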
One benefit of dividing databases by a predetermined time length is that data can conveniently be written in chronological order: every record carries its crawl time when it is crawled, so only the data whose crawl time meets the requirement needs to be written. Moreover, subsequent data processing is also usually performed in units of time, batch-processing the data within a period, which improves data processing efficiency. Another benefit is that when data for a set time is requested later, the set time can be looked up among the predetermined time spans, its position located quickly, and the corresponding data returned quickly. Because the total amount of data crawled within a predetermined time length usually stays within a certain range without large variation, a database of suitable size can be created according to the total amount of data written in past periods of that length.
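The fast lookup of a set time can be illustrated with a binary search over the sorted window start times. This is a sketch under assumptions: `PartitionIndex` and `hourly_windows` are hypothetical names, windows are taken as non-overlapping and half-open, and their lengths may differ, so the same index also covers a variable predetermined time length.

```python
from bisect import bisect_right
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Window = Tuple[datetime, datetime]  # [start, end) crawl window of one database


def hourly_windows(base: datetime, hours: int) -> List[Window]:
    """Build back-to-back one-hour windows starting at base (illustrative helper)."""
    return [(base + timedelta(hours=h), base + timedelta(hours=h + 1)) for h in range(hours)]


class PartitionIndex:
    """Locate the database whose window holds a requested time in O(log n)."""

    def __init__(self, windows: List[Window]) -> None:
        # windows must be sorted by start time and must not overlap.
        self.starts = [start for start, _ in windows]
        self.ends = [end for _, end in windows]

    def find(self, t: datetime) -> Optional[int]:
        i = bisect_right(self.starts, t) - 1  # last window starting at or before t
        if i >= 0 and t < self.ends[i]:
            return i
        return None  # t falls before the first window, after the last, or in a gap
```

The lookup arrays are built once, so each request costs one binary search regardless of how many databases have been created.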
Specifically, dividing databases by a predetermined size means giving each database a predetermined data capacity. In this case it must be defined when writing starts and when it ends. In a specific application scenario, the start of writing can still be determined from the crawl time: the crawl time of the last data written into the previous database determines the crawl time from which the current database starts writing. For example, if the last data written into the previous database has a crawl time of 12:35:45 on the same day, the current database starts writing from the data crawled at 12:35:45 and then writes data in chronological order until the total amount written approaches or reaches the database's predetermined data capacity. The specific judgment method is described in detail below.
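The hand-off between size-based databases can be sketched as follows. The record shape `(crawl time, payload bytes)`, the byte-based capacity, and the names `SizePartition` and `next_partition` are all assumptions for illustration; the sketch only shows the two rules stated above: a database never exceeds its capacity, and the next database starts at the crawl time of the last record written to the previous one.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Tuple

Record = Tuple[datetime, bytes]  # (crawl time, payload): an assumed record shape


@dataclass
class SizePartition:
    capacity: int                   # predetermined data capacity, in bytes here
    start_time: Optional[datetime]  # crawl time from which writing starts
    records: List[Record] = field(default_factory=list)
    used: int = 0

    @property
    def last_crawl_time(self) -> Optional[datetime]:
        return self.records[-1][0] if self.records else self.start_time

    def fits(self, record: Record) -> bool:
        return self.used + len(record[1]) <= self.capacity

    def write(self, record: Record) -> None:
        # Records arrive in chronological order, starting at start_time.
        if self.start_time is not None and record[0] < self.start_time:
            raise ValueError("record predates this database's start time")
        if not self.fits(record):
            raise ValueError("predetermined data capacity exceeded")
        self.records.append(record)
        self.used += len(record[1])


def next_partition(prev: SizePartition) -> SizePartition:
    """Open the next database, starting at the last crawl time written to prev."""
    return SizePartition(capacity=prev.capacity, start_time=prev.last_crawl_time)
```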
S102: when a database reaches the crawl phase, write the crawled data into that database, and after crawling is complete, write to the next database that enters the crawl phase;
In a specific application scenario, after a database has been created according to a predetermined time length, its reaching the crawl phase means that the crawled data must be written into it; in the embodiments, writing to a database (write processing) simply means writing data into the database. The amount of data written is determined by the predetermined time length: the longer the time length, the longer the time span of the written data and the larger the writable data volume; the shorter the time length, the shorter the time span and the smaller the writable data volume. The trigger for ending writing is whether the crawl time of the data to be written has reached the end time: if it has, writing ends; if not, writing continues. Once writing ends, writing moves to the next database that enters the crawl phase, that is, the above process is repeated to create a new database, and after the new database reaches the crawl phase, data is written into it.
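The time-based stop condition can be sketched as a split of a chronologically ordered record stream at the database's end time. The function name `fill_until` and the `(crawl time, payload)` record shape are assumptions; the rule shown is the one in the text: writing stops at the first record whose crawl time has reached the end time, and that record and everything after it go to the next database in the crawl phase.

```python
from datetime import datetime
from typing import Any, List, Tuple

Record = Tuple[datetime, Any]  # (crawl time, payload): an assumed record shape


def fill_until(records: List[Record], end_time: datetime) -> Tuple[List[Record], List[Record]]:
    """Split time-ordered records into (written, remaining) at end_time."""
    written: List[Record] = []
    for i, (ts, payload) in enumerate(records):
        if ts >= end_time:
            # End time reached: stop writing; the rest belongs to the next database.
            return written, records[i:]
        written.append((ts, payload))
    return written, []
```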
In a specific application scenario, after a database has been created according to a predetermined size, its reaching the crawl phase means that the crawled data must be written into it. The total amount of data written from the start to the end of writing should not exceed, and should be close to, the preset data capacity; that is, the amount of data written into the database is determined by the preset data capacity. The trigger for ending writing is whether the difference between the preset data capacity and the amount of data already written is within a preset threshold: if it is, writing ends; if it is not, writing continues. For example, if the preset threshold is 5 Mb, writing ends when the preset data capacity minus the amount written is 2 Mb; at that point the data written into the database does not exceed the preset data capacity, and its difference from the capacity is within the preset threshold. If the difference is exactly equal to the preset threshold, writing also ends. Once writing ends, writing moves to the next database that enters the crawl phase, that is, the above process is repeated to create a new database, and after the new database reaches the crawl phase, data is written into it.
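The size-based stop condition can be written as a one-line check plus a fill loop. In this sketch all quantities share one unit (for example the Mb of the example above), and the names `should_stop` and `fill_by_size` are hypothetical. The text does not say what happens when the next record would overflow the capacity before the threshold is reached; stopping in that case, so the capacity is never exceeded, is an assumption.

```python
from typing import Iterable, Tuple


def should_stop(capacity: int, written: int, threshold: int) -> bool:
    """True once the remaining space is within the preset threshold (equality included)."""
    return capacity - written <= threshold


def fill_by_size(sizes: Iterable[int], capacity: int, threshold: int) -> Tuple[int, int]:
    """Return (records written, amount written) for a stream of record sizes."""
    written = 0
    count = 0
    for size in sizes:
        if should_stop(capacity, written, threshold):
            break
        if written + size > capacity:  # assumption: never exceed the capacity
            break
        written += size
        count += 1
    return count, written
```

With a capacity of 100 and a threshold of 5, writing 50 and then 48 leaves 2 units free, so the loop stops before the third record, mirroring the 2 Mb example.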
S103: when a database reaches the processing phase, locally process the data in that database, and after processing is complete, locally process the data in the next database that enters the processing phase;
A database reaching the processing phase means that it is to be processed locally; in the embodiments, locally processing a database means locally processing the data in it. In the embodiments, only local processing is performed on databases in the processing phase, and only write processing is performed on databases in the crawl phase. In other words, local processing and write processing are separated: one database only receives (batch) writes while another only undergoes local processing. Because local processing and write processing do not interfere with each other, the performance requirements on each single database drop significantly over the whole procedure. In addition, during local processing, the indexes used for reading are not continuously rebuilt by frequent writes, which also reduces the performance requirements on the database.
Likewise, once local processing of the data in one database is complete, the next database that enters the processing phase is processed locally, that is, the data in it is processed locally. In this way, writing and local processing proceed continuously and in sequence across the databases.
S104: after local processing is complete, export the data in the database.
After local processing of the data in a database is complete, the database enters the export phase for export processing. The data in the database can be exported during the export phase so that the database's storage space can subsequently be released, and a new database can be created and re-enter the crawl phase to receive new data.
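Export followed by release can be sketched with an in-memory stand-in for the local databases. Everything here is assumed for illustration: databases are modeled as a `dict` from a hypothetical name to a list of row dictionaries, the export format is JSON Lines, and `export_and_recycle` is not a name from the patent.

```python
import json
from typing import Dict, List


def export_and_recycle(partitions: Dict[str, List[dict]], name: str, new_name: str) -> str:
    """Export database `name` as JSON Lines, release it, and open `new_name` for crawling."""
    rows = partitions.pop(name)  # releases the exported database's storage
    exported = "\n".join(json.dumps(row, sort_keys=True) for row in rows)
    partitions[new_name] = []    # a fresh, empty database re-enters the crawl phase
    return exported
```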
This embodiment includes: partitioning databases; when a database reaches the crawl phase, writing the crawled data into that database, and after crawling is complete, writing to the next database that enters the crawl phase; when a database reaches the processing phase, locally processing the data in that database, and after processing is complete, locally processing the data in the next database that enters the processing phase; and, after local processing is complete, exporting the data in the database. The method separates the database that receives bulk writes from the database that performs data processing, which significantly lowers the performance requirements on any single database; and because the indexes used when reading data are not continuously rebuilt by frequent writes, the performance requirements on the database drop further.
An embodiment of the invention further provides a data processing method which, as shown in Fig. 2, includes the following steps:
S201: partition databases;
S202: when a database reaches the crawl phase, write the crawled data into that database, and after crawling is complete, write to the next database that enters the crawl phase;
S203: when a database reaches the cooling phase, perform cooling processing on it, so that crawling can continue when needed and the database is prepared to enter the processing phase;
S204: when a database reaches the processing phase, locally process the data in that database, and after processing is complete, locally process the data in the next database that enters the processing phase;
S205: after local processing is complete, export the data in the database.
In the above steps, S201, S202, S204 and S205 are implemented in the same way as S101, S102, S103 and S104, respectively; for implementation details, refer to the data processing method of the previous embodiment, which is not repeated here.
The step S203 added in this embodiment is described in detail below. After the data of a database has been written in S202, the database enters the cooling phase in S203 for cooling processing. The purpose of cooling processing is to allow crawling to continue when needed and to prepare the database for the processing phase.
In S202, data is written into the database in the crawl phase; the database then enters the cooling phase, while new databases continue to be generated and enter the crawl phase to receive data. The original database enters the cooling phase of S203 for cooling processing. The embodiment may use multiple grabbers (devices that crawl data and then write the crawled data into a database), whose clocks may differ slightly, and the connection between a grabber and the database may be briefly interrupted. The database in the cooling phase is therefore kept writable (grabbers can still write data into it) to ensure that the writing process for the database remains complete. The cooling phase also serves to confirm that the data crawled by all grabbers has been written before subsequent operations proceed.
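One way to picture the cooling phase is a database that keeps accepting late rows and becomes ready for processing only once every grabber has confirmed it is caught up. The sketch assumes each grabber reports a "written everything crawled before this time" watermark; the patent does not say how confirmation is obtained, and `CoolingPartition`, `report` and `ready_for_processing` are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Set, Tuple


@dataclass
class CoolingPartition:
    end_time: datetime  # end of this database's crawl window
    grabbers: Set[str]  # ids of every grabber expected to report
    rows: List[Tuple[str, datetime, object]] = field(default_factory=list)
    watermarks: Dict[str, datetime] = field(default_factory=dict)

    def write(self, grabber: str, ts: datetime, value: object) -> bool:
        """Accept a row that belongs to this window; return False otherwise."""
        # The database stays writable while cooling, so rows delayed by clock
        # skew or a brief disconnect still land here.
        if ts >= self.end_time:
            return False  # belongs to a later database
        self.rows.append((grabber, ts, value))
        return True

    def report(self, grabber: str, flushed_up_to: datetime) -> None:
        """Record that a grabber has written everything crawled before flushed_up_to."""
        if flushed_up_to > self.watermarks.get(grabber, datetime.min):
            self.watermarks[grabber] = flushed_up_to

    def ready_for_processing(self) -> bool:
        # Cooling ends only when every grabber has confirmed it is past end_time.
        return all(self.watermarks.get(g, datetime.min) >= self.end_time for g in self.grabbers)
```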
An embodiment of the invention further provides a data processing method which, as shown in Fig. 3, includes:
S301: partition databases;
S302: when a database reaches the crawl phase, write the crawled data into that database, and after crawling is complete, write to the next database that enters the crawl phase;
S303: when a database reaches the current processing phase, locally process the data in that database, and after processing is complete, locally process the data in the next database that enters the current processing phase;
S304: when a database reaches the next processing phase, locally process the data in that database, and after processing is complete, locally process the data in the next database that enters that next processing phase;
S305: after local processing is complete, export the data in the database.
In this embodiment, S301, S302 and S305 are implemented in the same way as the corresponding steps of the above embodiments; for implementation details, refer to the data processing method of the embodiment corresponding to Fig. 1, which is not repeated here. In addition, this embodiment can be implemented on the basis of the embodiment corresponding to Fig. 2, that is, the two can be combined by adding, between S302 and S303, the step "when a database reaches the cooling phase, perform cooling processing on it, so that crawling can continue when needed and the database is prepared to enter the processing phase", which yields a preferred embodiment.
S303 and S304 of this embodiment are described in detail below.
In this embodiment, multiple processing phases are provided: a database first enters one processing phase for local processing and, after that processing is complete, enters the next processing phase for further local processing, until all processing phases have been completed.
Specifically, in S303, when a database reaches the current processing phase, the data in it is processed locally, and after processing is complete, the next database that enters the current processing phase is processed locally. In S304, after the database completes the current processing phase, it enters the next processing phase and continues local processing; after that processing is complete, the next database that enters the next processing phase is processed locally.
In a specific application scenario, N processing phases are provided: processing phase 1, processing phase 2, processing phase 3, ..., processing phase N. If data is processed in the order phase 1, phase 2, phase 3, ..., phase N, then database A first enters phase 1 for local processing, enters phase 2 once that is complete, then phase 3, and so on until it finally enters phase N. Meanwhile, when database A enters phase 2, database B enters phase 1; when database A enters phase 3, database B enters phase 2 and database C enters phase 1. In this way each database passes through the processing phases in sequence, in a pipelined manner, which greatly reduces the performance requirements on the database. More importantly, depending on the requirements of the data processing algorithm, local data processing may need to reference data spanning several times a single database's time span, or may need the newest data at the time of processing. The embodiments therefore provide multiple processing phases that can process different databases at the same time when needed, with the databases entering the processing phases one after another in chronological order.
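The pipelined schedule described above can be reproduced with a short simulation. It assumes, for illustration only, that every processing phase takes one time step, that database number i enters phase 1 at step i, and that it then advances one phase per step; `pipeline_schedule` is a hypothetical name.

```python
from typing import Dict, List


def pipeline_schedule(databases: List[str], stages: int, ticks: int) -> List[Dict[int, str]]:
    """Return, for each time step, a mapping {processing phase number -> database}.

    Database i enters phase 1 at step i and advances one phase per step, so at
    most `stages` databases are being processed at once, one per phase.
    """
    schedule = []
    for t in range(ticks):
        occupancy = {}
        for i, db in enumerate(databases):
            stage = t - i + 1
            if 1 <= stage <= stages:
                occupancy[stage] = db
        schedule.append(occupancy)
    return schedule
```

For databases A, B and C with three phases, step 0 gives `{1: "A"}`, step 1 gives `{2: "A", 1: "B"}` and step 2 gives `{3: "A", 2: "B", 1: "C"}`, matching the walkthrough in the text. Real phases would rarely take equal time, so a production scheduler would advance each database when its current phase finishes rather than in lockstep.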
Since the data to database are modeled and are analyzed, apparent return is not had in a short time, so generally needing System deployment cost is reduced, up-front investment is reduced, and most important one approach is exactly to store sea by the way of low cost Measure data.Mass data is stored in a manner of low cost in order to realize, needs to be effectively treated the data in database, with Reduce lower deployment cost.
In a concrete application scene, the data in the database carry out processing locality and include:
It determines required derived data in database, and required derived data is replicated.
In the aforementioned embodiment, database is divided, and database division be used only to mass data is effective Carry out save.In order to handle data beyond the clouds, the data after preservation must carry out effective compression processing, on reducing To the demand of network when biography.
Compressing the data in a database reduces its data volume. Only a smaller amount of data then needs to be uploaded to the cloud for subsequent processing.
In a concrete application scenario, determining the data in the database that need to be exported includes the following. First, determine the data in the database that do not need to be exported. Then treat all other data in the database as the data to be exported.

Determining the data that do not need to be exported includes one or more of:
- determining the redundant data in the database;
- determining the transient data in the database;
- determining the data in the database that can be discarded when converting data dimensions.
Embodiments of this local processing are described separately below. The embodiments of the present invention may also perform local processing in other ways not limited to these embodiments, and only several typical ones are listed. Those skilled in the art could, based on this description and without creative effort, conceive of other processing approaches, and such schemes fall within the scope of protection of the claims of the present invention.
A database in a process phase is read-only by default, so no substantive compression is performed on it in place. Instead, the system determines one or more of the following: its redundant data, its transient data, and the data that can be discarded when converting data dimensions. None of these needs to be exported. Once they are excluded, the data that need to be exported can be determined and copied.

At export time only the copied data are exported and the unneeded data are dropped. The exported data exclude the redundant data, the transient data, the discardable data and so on, so these data are effectively deleted and substantive compression is achieved.

In other words, no data are deleted from a database during its process phases. This avoids affecting the processing in other process phases and keeps the data processing of each database complete.
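The "copy what is needed, never delete in place" approach can be sketched with SQLite. This is an assumption-laden sketch. The table layout (an `id` column identifying each row), the function name `copy_needed_rows` and the use of SQLite are illustrative choices, not taken from the patent. The source table is only read, and the excluded rows are simply never copied.

```python
import sqlite3


def copy_needed_rows(conn: sqlite3.Connection, source: str, target: str,
                     excluded_ids: set[int]) -> int:
    """Copy all rows of `source` except `excluded_ids` into a new table `target`.

    `source` is only read, never modified. Table names are assumed to be trusted
    identifiers because they are interpolated into the SQL text.
    """
    # Empty table with the same columns as the source (WHERE 0 selects no rows).
    conn.execute(f"CREATE TABLE {target} AS SELECT * FROM {source} WHERE 0")
    conn.execute("CREATE TEMP TABLE excluded(id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO excluded(id) VALUES (?)", [(i,) for i in excluded_ids])
    cur = conn.execute(
        f"INSERT INTO {target} SELECT * FROM {source} "
        "WHERE id NOT IN (SELECT id FROM excluded)"
    )
    copied = cur.rowcount  # number of rows inserted into the export table
    conn.execute("DROP TABLE excluded")
    return copied
```

Deleting rows from the source is deliberately avoided. Per the text, other process phases may still need the full database.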
Redundant data: multiple grabbers may be used during data capture, so the captured data can contain some redundancy, that is, duplicate records. Redundant data are identified by customizable conditions such as the capture time and the data item. During local processing the redundant data are identified first, after which the data that need to be exported can be determined.

One specific implementation works per data item. Using the corresponding data-item name as the key, it copies the data to be exported under that data item. The copied data can be saved separately at some location, either inside the database or elsewhere, for convenient export.
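Identifying redundant records by a customizable condition can be sketched as follows. This minimal sketch assumes each record is a dict carrying `item` and `ts` fields. The default key (data item plus capture time) and the name `find_redundant` are hypothetical.

```python
from typing import Callable, Hashable, Iterable


def find_redundant(records: Iterable[dict],
                   key: Callable[[dict], Hashable] = lambda r: (r["item"], r["ts"])) -> list[int]:
    """Return the positions of records whose key has already been seen.

    The first record for each key is kept. Later ones (e.g. the same reading
    captured by a second grabber) are reported as redundant.
    """
    seen = set()
    redundant = []
    for pos, rec in enumerate(records):
        k = key(rec)
        if k in seen:
            redundant.append(pos)
        else:
            seen.add(k)
    return redundant
```

Passing a different `key` function changes the deduplication condition without changing the scan.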
Transient data: to reconstruct the environment as fully as possible when a fault occurs, the system needs to collect a large volume of data samples covering a period before the fault, for example 5 minutes. With these samples the pre-fault environment can be restored. During normal operation, however, these data contribute little to building mathematical models.

Hardware sensors can only provide real-time data, so the system has to collect all of these transient data. Some time after collection, once the system has been confirmed to be working normally, they can be discarded.

As with redundant data, one specific implementation works per data item. Using the corresponding data-item name as the key, it copies the data to be exported under that data item. The copied data can be saved separately, inside the database or elsewhere, for convenient export.
Data that can be discarded when converting data dimensions: sensors report the current state, whereas for some data the model only needs to record the times at which the working state changes. Take the rotation state of a motor. Each sample reports whether the motor is currently running and in which direction. Data modeling, however, only needs the moment of each change of working state, its duration, and the current state.

During data processing these raw data must therefore be converted into the form required for data modeling, and this conversion produces a large amount of data that can be discarded. As in the other cases, one specific implementation works per data item. Using the corresponding data-item name as the key, it copies the data to be exported under that data item. The copied data can be saved separately, inside the database or elsewhere, for convenient export.
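The conversion from per-sample state readings to state changes is essentially run-length encoding. The sketch below assumes samples arrive sorted by time as `(timestamp, state)` pairs, and that the end of the database's time range is passed in to close the last span. `StateSpan` and `to_state_spans` are hypothetical names. Every sample that repeats the previous state is what becomes discardable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateSpan:
    state: str       # e.g. "stopped", "forward", "reverse"
    start: float     # time the state began
    duration: float  # how long it lasted


def to_state_spans(samples: list[tuple[float, str]], end_time: float) -> list[StateSpan]:
    """Collapse periodic (timestamp, state) samples into one span per state change."""
    spans: list[StateSpan] = []
    current, start = None, None
    for ts, state in samples:
        if state != current:
            if current is not None:
                spans.append(StateSpan(current, start, ts - start))
            current, start = state, ts
    if current is not None:
        spans.append(StateSpan(current, start, end_time - start))
    return spans


samples = [(0, "forward"), (1, "forward"), (2, "stopped"), (3, "stopped"), (4, "reverse")]
print(to_state_spans(samples, end_time=5))
# three spans: forward for 2, stopped for 2, reverse for 1
```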
Note that the present invention allows several arrangements of these three kinds of local processing. All three can be applied to the same database in one process phase, or only one of them can be applied. Alternatively, one kind can be applied in each of three process phases. A further option is to apply one kind to a database in one process phase and the other two kinds to the same database in the next process phase. These embodiments are all specific instances of the scope of the present invention and clearly fall within its scope of protection.
In a concrete application scenario, exporting the data in the database includes:
exporting the copied data.
In the present invention, exporting the data in a database therefore means exporting the copied data, that is, the data that need to be exported. Substantive compression happens only at this step: the data that do not need to be exported are left behind, and only the needed data are exported.
The embodiment of the present invention thus deletes data indirectly, by copying the useful data and exporting only those.
After the data in the database are exported, the data processing method further includes:
uploading the exported data to the cloud, and deleting the local exported data and/or the corresponding database.
At final export, the data to be exported are saved as an independent local file. This file is much smaller than the original database, so it is easy to store locally or transfer elsewhere, and it needs less bandwidth to upload to the cloud.
After final export, the database corresponding to the exported data can be deleted to free its storage space. This makes room for the next round: creating a database, writing new data, processing and export. Likewise, once the local file has been uploaded, it (the exported data) can be deleted. This reduces local storage usage and makes room for subsequent storage and uploads.
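The export, upload and cleanup sequence can be sketched as follows. The assumptions are these. The exported data live in an SQLite table, the file format is JSON Lines, and `upload` is a caller-supplied, hypothetical function that returns `True` on success (no real cloud API is implied). The table here stands in for "the corresponding database". Local data are deleted only after a successful upload.

```python
import json
import os
import sqlite3
from typing import Callable


def export_upload_cleanup(conn: sqlite3.Connection, table: str, path: str,
                          upload: Callable[[str], bool]) -> bool:
    """Export `table` to a JSON-lines file, upload it, then free local space.

    The file and table are deleted only after a successful upload, so a failed
    upload can be retried from the same file.
    """
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    with open(path, "w", encoding="utf-8") as f:
        for row in cur:
            f.write(json.dumps(dict(zip(columns, row))) + "\n")
    if not upload(path):
        return False
    os.remove(path)                      # delete the local exported file
    conn.execute(f"DROP TABLE {table}")  # delete the corresponding data
    conn.commit()
    return True
```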
The embodiment of the present invention slims down massive data so that it can be stored at low cost, which reduces system deployment cost and investment. In addition, the collected data ultimately have to be uploaded to the cloud for analysis. Because transient data are not transmitted, the embodiment reduces the volume of data sent to the cloud, which speeds up transmission and lowers network bandwidth requirements.
In embodiments of the present invention, one or more processors can be configured for use in a process phase. Each processor corresponds to a specific mathematical algorithm or business function, and specific business logic is realized by combining processors. A process phase may also use no processor at all. In that case it serves only as a time interval, during which the database waits until it can enter the next process phase.
A variable, as used in the embodiments of the present invention, can be understood as the name of one class of data produced by the data-capture equipment or by a processor, for example "motor [No. 15] temperature".
A processor can be a specific program. The system supports obtaining processors from any source the customer trusts, so processors can be provided by many parties. The processor interface includes input values and output values. A processor can also retain state data across runs, so that it can process multiple databases continuously.
A processor can designate one or more variables as inputs. A variable can come from the captured data or from the output of another processor. The number and type of variables depend on the processor's settings.
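A processor interface of this kind can be sketched as an abstract class. It has named input and output variables, plus state that survives from one run to the next. This is a hypothetical shape written for illustration. The class names, the `on_trigger` method and the `RunningMax` example are assumptions, not the patent's interface.

```python
from abc import ABC, abstractmethod
from typing import Any


class Processor(ABC):
    """Hypothetical processor interface: named input and output variables,
    plus a state dict that persists across runs (and so across databases)."""

    inputs: tuple[str, ...] = ()
    outputs: tuple[str, ...] = ()

    def __init__(self) -> None:
        self.state: dict[str, Any] = {}

    @abstractmethod
    def on_trigger(self, t: float, values: dict[str, Any]) -> dict[str, Any]:
        """Read the input values at data time t, update state, return outputs."""


class RunningMax(Processor):
    """Example processor: tracks the highest temperature seen so far."""

    inputs = ("motor [No. 15] temperature",)
    outputs = ("motor [No. 15] max temperature",)

    def on_trigger(self, t, values):
        temp = values[self.inputs[0]]
        self.state["max"] = max(self.state.get("max", temp), temp)
        return {self.outputs[0]: self.state["max"]}
```

Because the same instance keeps `state`, a running maximum carries over when the processor moves on to the next database.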
After the system starts, all data at the first time value of the data currently to be processed are written into the corresponding processor inputs, according to the needs of the process phase. Once writing is complete, the processors are triggered in turn. The processor's internal logic determines whether it is triggered by a change in any input or only by a change in specific variables. At each trigger, a processor carries out its own defined algorithm: it reads the values of all its input variables at the current time, updates its own state, and optionally produces output values.
In particular, a processor can request a time delay during processing. Once the requested delay has elapsed, the system triggers the processor once. The delay is measured in data time, not in the actual running time of the system.
After all trigger actions have been performed, the system processes all data at the next time value and repeats the above steps until the entire database has been processed.
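The time-stepped triggering loop, including delays measured in data time, can be sketched as follows. This is a minimal sketch with several simplifications, all assumptions. A processor is triggered whenever any of its inputs is present at a time value, rather than on a configurable change. A delayed trigger fires at the first time value at or after its due time. Delays falling past the last time value are dropped. `SimpleProcessor` and `run_database` are hypothetical names.

```python
import heapq
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(eq=False)  # identity comparison, so `in` checks match the same object
class SimpleProcessor:
    name: str
    inputs: tuple[str, ...]
    # step(t, values) runs the processor's logic. It may return a delay in data
    # time after which it asks to be triggered once more, or None.
    step: Callable[[float, dict[str, Any]], Optional[float]]


def run_database(data: dict[float, dict[str, Any]],
                 processors: list[SimpleProcessor]) -> list[tuple[float, str]]:
    """Process one database time value at a time; return (time, name) per trigger."""
    trace: list[tuple[float, str]] = []
    pending: list[tuple[float, int, SimpleProcessor]] = []  # (due time, seq, processor)
    seq = 0
    for t in sorted(data):
        values = data[t]
        # Delayed triggers whose due data time has been reached fire first.
        due: list[SimpleProcessor] = []
        while pending and pending[0][0] <= t:
            due.append(heapq.heappop(pending)[2])
        # Then every processor with at least one input present at this time value.
        fresh = [p for p in processors if p not in due and any(v in values for v in p.inputs)]
        for p in due + fresh:
            trace.append((t, p.name))
            delay = p.step(t, values)
            if delay is not None:
                heapq.heappush(pending, (t + delay, seq, p))
                seq += 1
    # Any delays still pending here fall beyond the last time value and are dropped.
    return trace
```

The sequence counter in each heap entry breaks ties between equal due times, so processors themselves are never compared.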
A processor can designate one or more variables as outputs. The number and type of these variables depend on the processor's settings. An output value is stamped with the time of the current input data.
An output variable can be set either to be written back to the database (see the stability safety mechanism below) or to serve only as input to other processors running at this moment. If it feeds other processors, the output likewise triggers those processors. To prevent cyclic deadlocks, whether the corresponding processors are triggered can be configured both in the processor's internal logic and externally.
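Output propagation with a cycle guard can be sketched as a breadth-first cascade. Two things here are assumptions of this sketch. First, a per-(variable, processor) `triggers` switch stands in for the configurable trigger setting. Second, the additional rule "each processor fires at most once per time value" guards against cycles, whereas the text itself only says triggering can be configured. `cascade` is a hypothetical name.

```python
from collections import deque
from typing import Callable


def cascade(initial_vars: list[str],
            consumers: dict[str, list[str]],
            produces: dict[str, list[str]],
            triggers: Callable[[str, str], bool] = lambda var, proc: True) -> list[str]:
    """Return the processors fired, in order, when `initial_vars` are output at one time value.

    consumers: variable name -> processors that take it as an input
    produces:  processor name -> variables it outputs
    triggers:  per-(variable, processor) switch, standing in for the trigger setting
    Each processor fires at most once per time value, which stops output cycles
    from looping forever.
    """
    fired: list[str] = []
    seen: set[str] = set()
    queue = deque(initial_vars)
    while queue:
        var = queue.popleft()
        for proc in consumers.get(var, []):
            if proc in seen or not triggers(var, proc):
                continue
            seen.add(proc)
            fired.append(proc)
            queue.extend(produces.get(proc, []))  # its outputs may trigger others
    return fired
```

For example, suppose p1 outputs y, p2 consumes y and outputs z, and p1 consumes z. That forms a cycle, but the cascade stops after firing p1 and then p2 once each.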
The embodiment of the present invention provides an interface for setting a time offset on a processor input. The time offset is a non-negative time value and defaults to 0, meaning no offset. Once a time offset is set, that input is supplied with newer data. For example, if one input of a processor has a time offset of 1 minute, then when data at time t are processed, this input receives the data at time t + 1 minute.

To guarantee that the data exist, a processor with a time offset cannot work in the first process phase. Its working phase cannot be earlier than the earliest phase in which the required data are ready. For example, suppose databases are cut into 5-minute lengths and one input of a processor has a time offset of 8 minutes. That processor can then work no earlier than the 3rd process phase, which can supply data with a time offset of up to 10 minutes.
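The earliest usable process phase for an input with a time offset can be written as a small formula. This is a generalization of the single worked example, and that generalization is an assumption. If process phase k can supply data up to (k - 1) * L ahead of the processed time, where L is the database length, then an input with offset Δ needs k_min = ceil(Δ / L) + 1. With L = 5 minutes and Δ = 8 minutes, this gives ceil(1.6) + 1 = 3, matching the text. The name `earliest_phase` is hypothetical.

```python
import math


def earliest_phase(offset: float, db_length: float) -> int:
    """Earliest process phase (1-based) in which an input with `offset` can be served.

    Assumes phase k can supply data up to (k - 1) * db_length ahead of the time
    being processed, which matches the 5-minute / 8-minute example in the text.
    """
    if offset < 0:
        raise ValueError("time offset must be non-negative")
    return math.ceil(offset / db_length) + 1


print(earliest_phase(8, 5))  # 3, as in the example
```

Any positive offset yields phase 2 or later, which is consistent with the rule that an offset processor cannot work in the first process phase.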
In particular, when a processor's internal logic requires it, the processor can request a batch read of data directly. The request specifies a list of variables, a start time and an end time, and all matching data are returned at once. This read is not restricted to the time bounds of the process phase in which the processor runs, and it does not trigger any processor.
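A batch read of this kind maps naturally onto a single range query. The sketch assumes a hypothetical SQLite table `samples(var TEXT, ts REAL, value REAL)` and a hypothetical function name `read_batch`. Being a plain query, it does not trigger any processor and takes no account of phase boundaries.

```python
import sqlite3


def read_batch(conn: sqlite3.Connection, variables: list[str],
               start: float, end: float) -> list[tuple]:
    """Return (var, ts, value) rows for the given variables with start <= ts <= end.

    Assumes a hypothetical table samples(var TEXT, ts REAL, value REAL). This is a
    plain query, so it does not trigger any processor and ignores phase boundaries.
    """
    if not variables:
        return []
    placeholders = ", ".join("?" for _ in variables)
    sql = (f"SELECT var, ts, value FROM samples WHERE var IN ({placeholders}) "
           "AND ts BETWEEN ? AND ? ORDER BY ts, var")
    return conn.execute(sql, [*variables, start, end]).fetchall()
```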
In the embodiments of the present invention, the stability safety mechanism addresses a specific risk. Processor logic can be complex and its authors are outside the system's control, so the system cannot guarantee that a processor will execute stably.

When execution begins, a temporary database is created in each database to hold any data to be written and the processor's state data. During execution, the processor runs in an independent host process and sends heartbeat data back at regular intervals. Data written back by the processor can only be stored in the temporary database.

- After fault-free execution completes, the independent host process reports that the run has finished. The data in the temporary database are then saved back to the database currently being processed, and the temporary database is deleted.
- If execution fails, the independent host process reports the failure, so the temporary database can be deleted directly and the processing re-executed later.
- If no heartbeat is received within a set time, the independent host process is likewise regarded as failed and the temporary database is deleted.

Each temporary database is created under a different name. Even if processing is re-executed later, a previous host process that timed out therefore cannot write data again and interfere with the new execution.
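The staging, heartbeat and commit-or-discard behavior can be sketched with a uniquely named staging table standing in for the per-run temporary database. Several details are assumptions of this sketch: SQLite as the store, the `samples` target table, the 30-second heartbeat timeout and the injectable `clock`. It also does not model the separate host process. Once a run's staging table is dropped, any late write from that run fails, which illustrates why unique names protect a re-execution. `StagingRun` is a hypothetical name.

```python
import sqlite3
import time
import uuid
from typing import Callable


class StagingRun:
    """One processor execution whose writes go to a uniquely named staging table."""

    def __init__(self, conn: sqlite3.Connection, heartbeat_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.conn = conn
        self.timeout = heartbeat_timeout
        self.clock = clock
        # A fresh name per run: a stale, timed-out run cannot write into a newer one.
        self.table = f"staging_{uuid.uuid4().hex}"
        conn.execute(f"CREATE TABLE {self.table}(var TEXT, ts REAL, value REAL)")
        self.last_beat = clock()

    def heartbeat(self) -> None:
        self.last_beat = self.clock()

    def timed_out(self) -> bool:
        return self.clock() - self.last_beat > self.timeout

    def write(self, var: str, ts: float, value: float) -> None:
        # Raises sqlite3.OperationalError once the staging table has been dropped.
        self.conn.execute(f"INSERT INTO {self.table} VALUES (?, ?, ?)", (var, ts, value))

    def commit(self, target: str = "samples") -> None:
        """Fault-free completion: copy staged rows into the database, then drop them."""
        self.conn.execute(f"INSERT INTO {target} SELECT * FROM {self.table}")
        self._drop()

    def abort(self) -> None:
        """Failure or missed heartbeat: discard everything the run wrote."""
        self._drop()

    def _drop(self) -> None:
        self.conn.execute(f"DROP TABLE {self.table}")
        self.conn.commit()
```

A supervisor would poll `timed_out()` and call `abort()` when it returns `True`, then schedule a fresh `StagingRun` for the retry.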
Referring to Fig. 4, Fig. 4 is a schematic structural diagram of a data processing apparatus provided by an embodiment of the present invention.
The apparatus may include:
a division module 401, configured to divide databases;
a writing module 402, configured to write the captured data into a database when the database reaches the crawl phase, and, once capture is complete, to write data into the next database that enters the crawl phase;
a local processing module 403, configured to perform local processing on the data in a database when the database reaches a process phase, and, once processing is complete, to perform local processing on the data in the next database that enters the process phase;
a data export module 404, configured to export the data in the database after local processing is complete.
The apparatus works as follows:
- It divides databases according to a predetermined time length.
- When a database reaches the crawl phase, it writes the captured data into that database. Once capture is complete, it writes into the next database that enters the crawl phase.
- When a database reaches a process phase, it performs local processing on that database's data. Once processing is complete, it processes the next database that enters the process phase.
- After local processing is complete, it exports the data in the database.

The apparatus separates the database receiving high-volume writes from the database being processed, which markedly lowers the performance required of any single database. Moreover, the data being read are not being written frequently, so read indexes do not have to be rebuilt continually. This further lowers the performance required of the database.
Further, the apparatus also includes:
a cooling module, configured to perform cooling processing on a database when the database reaches the cooling phase, so that the database can continue to receive captured data when needed and is prepared to enter a process phase.
Further, multiple process phases are provided; correspondingly, the local processing module 403 includes:
a current processing unit, configured to perform local processing on the data in a database when the database reaches the current process phase, and, once processing is complete, to perform local processing on the data in the next database that enters the current process phase;
a next processing unit, configured to perform local processing on the data in a database when the database reaches the next process phase, and, once processing is complete, to perform local processing on the data in the next database that enters the next process phase.
Further, the local processing module 403 includes:
a copy processing unit, configured to determine the data in the database that need to be exported and to copy the data that need to be exported.
Further, the copy processing unit is specifically configured to determine the data in the database that do not need to be exported, and to treat the data in the database other than those as the data to be exported. Determining the data that do not need to be exported includes one or more of: determining redundant data in the database, determining transient data in the database, and determining data in the database that can be discarded when converting data dimensions.
Further, the data export module 404 is specifically configured to perform one or more of: exporting the data in the database other than redundant data, exporting the data in the database other than transient data, and exporting the data in the database other than the data that can be discarded when converting data dimensions.
Further, the division module 401 includes:
a division unit, configured to divide databases according to a predetermined time length.
Further, the apparatus also includes:
an upload processing module, configured to upload exported data to the cloud and to delete the local exported data and/or the corresponding database.
Since the apparatus embodiments correspond to the method embodiments, refer to the description of the method embodiments for the apparatus embodiments; details are not repeated here.
The present invention also provides a computer-readable storage medium storing a computer program which, when executed, implements the steps provided by the above embodiments. The storage medium may include a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, or any other medium that can store program code.
The present invention also provides a computer device, which may include a memory and a processor. The memory stores a computer program, and when the processor calls the computer program in the memory, the steps provided by the above embodiments are implemented. The computer device may of course also include components such as various network interfaces and a power supply.
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments can be cross-referenced. Because the apparatus disclosed in the embodiments corresponds to the disclosed method, its description is relatively brief; for relevant details, see the description of the method. Those of ordinary skill in the art can make several improvements and modifications to the present invention without departing from its principle, and these improvements and modifications also fall within the scope of protection of the claims of the present invention.
It should also be noted that, in this specification, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another. They do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion. A process, method, article or device that includes a series of elements therefore includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes it.

Claims (10)

1. A data processing method, characterized by comprising:
dividing databases;
when a database reaches the crawl phase, writing the captured data into the database, and after capture is complete, writing data into the next database that enters the crawl phase;
when a database reaches a process phase, performing local processing on the data in the database, and after processing is complete, performing local processing on the data in the next database that enters the process phase;
after local processing is complete, exporting the data in the database.
2. The data processing method according to claim 1, characterized in that, before the step of performing local processing on the data in the database when the database reaches a process phase and, after processing is complete, performing local processing on the data in the next database that enters the process phase, the data processing method further includes:
when a database reaches the cooling phase, performing cooling processing on the database, so that the database can continue to receive captured data when needed and is prepared to enter the process phase.
3. The data processing method according to claim 1, characterized in that multiple process phases are provided, and correspondingly, the step of performing local processing on the data in the database when the database reaches a process phase and, after processing is complete, performing local processing on the data in the next database that enters the process phase includes:
when a database reaches the current process phase, performing local processing on the data in the database, and after processing is complete, performing local processing on the data in the next database that enters the current process phase;
when a database reaches the next process phase, performing local processing on the data in the database, and after processing is complete, performing local processing on the data in the next database that enters the next process phase.
4. The data processing method according to claim 1, characterized in that performing local processing on the data in the database includes:
determining the data in the database that need to be exported, and copying the data that need to be exported.
5. The data processing method according to claim 4, characterized in that determining the data in the database that need to be exported includes: determining the data in the database that do not need to be exported, and treating the data in the database other than those as the data to be exported; wherein determining the data in the database that do not need to be exported includes one or more of determining redundant data in the database, determining transient data in the database, and determining data in the database that can be discarded when converting data dimensions.
6. The data processing method according to claim 1, characterized in that dividing databases includes:
dividing databases according to a predetermined time length.
7. The data processing method according to claim 1, characterized in that, after exporting the data in the database, the data processing method further includes:
uploading the exported data to the cloud, and deleting the local exported data and/or the corresponding database.
8. A data processing apparatus, characterized by comprising:
a division module, configured to divide databases;
a writing module, configured to write the captured data into a database when the database reaches the crawl phase, and, once capture is complete, to write data into the next database that enters the crawl phase;
a local processing module, configured to perform local processing on the data in a database when the database reaches a process phase, and, once processing is complete, to perform local processing on the data in the next database that enters the process phase;
a data export module, configured to export the data in the database after local processing is complete.
9. A computer device, characterized by comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the data processing method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the data processing method according to any one of claims 1 to 7.
CN201910730003.3A 2019-08-08 2019-08-08 Data processing method, device, computer equipment and storage medium Active CN110442565B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910730003.3A CN110442565B (en) 2019-08-08 2019-08-08 Data processing method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110442565A true CN110442565A (en) 2019-11-12
CN110442565B CN110442565B (en) 2023-06-30

Family

ID=68433980

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH07325839A (en) * 1994-06-02 1995-12-12 Mitsubishi Electric Corp Time series data processor
CN101697152A (en) * 2009-10-23 2010-04-21 金蝶软件(中国)有限公司 Database storage system and method and device for splitting data thereof
CN105069134A (en) * 2015-08-18 2015-11-18 上海新炬网络信息技术有限公司 Method for automatically collecting Oracle statistical information
CN105159925A (en) * 2015-08-04 2015-12-16 北京京东尚科信息技术有限公司 Database cluster data distribution method and system
CN105608202A (en) * 2015-12-25 2016-05-25 北京奇虎科技有限公司 Data packet analysis method and device
US20170031994A1 (en) * 2015-07-27 2017-02-02 Datrium, Inc. System and Methods for Storage Data Deduplication
CN106649857A (en) * 2016-12-30 2017-05-10 北京恒华伟业科技股份有限公司 Reading and writing separation-based database operation method and apparatus
US20170140021A1 (en) * 2015-11-13 2017-05-18 Sap Se Efficient partitioning of related database tables
US20170262232A1 (en) * 2016-03-11 2017-09-14 EMC IP Holding Company LLC Method and apparatus for optimizing data storage based on application
US20180121135A1 (en) * 2016-11-01 2018-05-03 SK Hynix Inc. Data processing system and data processing method
CN108073703A (en) * 2017-12-14 2018-05-25 郑州云海信息技术有限公司 A kind of comment information acquisition methods, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant