CN107766572A - Distributed extraction and visual analysis method and system based on economic field data
- Publication number
- CN107766572A (application number CN201711113558.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- task
- distributed
- field
- extraction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/248—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2471—Distributed queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/5072—Grid computing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/48—Indexing scheme relating to G06F9/48
- G06F2209/484—Precedence
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Mathematical Physics (AREA)
- Fuzzy Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
This application provides a distributed extraction and visual analysis method and system based on economic field data. The method comprises: a distributed data extraction step, in which a back-end server receives a user's instruction to extract big data and sends it to a master node; according to the received instruction, the master node splits the large extraction task into small tasks along one or more field dimensions of the task and distributes the small tasks to different processing nodes; each processing node issues requests to a text retrieval system according to its assigned small tasks; and the master node stores the generated small tasks in a database and synchronizes task status during execution; a distributed storage step; a distributed computing and analysis step; a data loading and caching step; and a result visualization step. By these means the application improves the efficiency of big data extraction while lowering the threshold for users to perform big data analysis.
Description
Technical field
The application relates to the technical field of data processing, and in particular to a distributed extraction and visual analysis method and system based on economic field data.
Background art
In the current era of rapidly expanding data volumes, "big data" has become a ubiquitous term. The big data era has no shortage of quantity; what matters is finding overall patterns in that quantity, which requires analyzing the data.
The use of big data has become a key factor in improving core competitiveness. Decision-making in every industry is shifting from "business-driven" to "data-driven". Big data analysis lets retailers track market trends in real time and respond quickly; it provides decision support that helps merchants formulate more precise and effective marketing strategies; and it helps enterprises offer consumers more timely and personalized services. In the medical field it can improve diagnostic accuracy and drug effectiveness; in the public sector, big data is beginning to play an important role in promoting economic development and maintaining social stability. Applying data to daily life and production helps individuals and enterprises judge information accurately and take appropriate action. Data analysis is the process by which an organization purposefully collects and analyzes data and turns it into information; that is, the process in which an individual or enterprise applies analytical methods to data in order to solve decision-making, marketing, or similar problems arising in life and production.
Take macroeconomic field data as an example. Its analysis covers Internet content from WeChat official accounts, academic think-tank websites, financial news websites, industry portals, and the like. It provides data analysis support for the positioning of government online services, for service strategy, and for assessing the effect of Internet commentary on business operations. With the rapid development of technology, people analyze data by means of big data visual analysis, which presents data more intuitively and lets it be observed from different dimensions, enabling deeper observation and analysis.
Big data visual analysis refers to combining automatic big data analysis and mining methods with user interfaces that support information visualization and with human-computer interaction techniques that support the analysis process, effectively integrating the computing power of computers with the cognitive abilities of people in order to gain insight into large, complex data sets.
Designing and building a visual analysis platform based on macroeconomic field data currently runs into several technical difficulties. The first is the extraction requirement for macroeconomic field data: hit data must be extracted from the text retrieval system according to the configured query conditions (required conditions: keyword group + time range). The second is the technical threshold: how a user who does not understand SQL query statements can still perform big data analysis. The last is user interaction and the effect and manner of presentation: customizable display effects, flexibility, and diverse ways of presentation.
At present, DataX and Sqoop, the data synchronization tools for heterogeneous big data environments, both address the problem of data exchange between heterogeneous environments. DataX is a tool for high-speed data exchange between heterogeneous databases and file systems; it implements data exchange between arbitrary data processing systems (RDBMS/HDFS/local filesystem) and was developed by Taobao's data platform department. Sqoop is a tool for transferring data between Hadoop and relational databases: it can import data from a relational database (for example MySQL, Oracle, or Postgres) into Hadoop's HDFS, and can also export HDFS data into a relational database.
However, these tools have the following problem. The extraction requirement for macroeconomic field data is that hit data be extracted from the text retrieval system according to the configured query conditions (keyword group + time range). The biggest difference between macroeconomic field data and general public-opinion data is that the macroeconomic field mainly studies macroeconomic trends and the factors that influence the macroeconomy. The source data to be extracted therefore span a long period, involve many industries, and are large in volume. Open-source tools such as DataX and Sqoop cannot meet the extraction requirements for macroeconomic field data, so a distributed extraction method needs to be designed to solve this problem.
A distributed system is a set of machines connected by a network that serve upper layers as a single whole. Specifically, a problem that requires massive computing power is split into many small pieces, the pieces are distributed to different computing nodes within the same system for processing, and, where necessary, the separately computed results are merged into the final result; such a system is called a distributed system. A node is a program instance that can independently carry out a set of logic according to the distributed protocol; in engineering practice it usually means a process. Nodes are fully independent and isolated from one another, and the only way they communicate is over an unreliable network.
Hive is a data warehouse tool built on the distributed system Hadoop. It maps structured data files to database tables and provides simple SQL query functionality, converting SQL statements into MapReduce jobs for execution. Its strength is that simple MapReduce statistics can be implemented quickly with SQL-like statements, without developing a dedicated MapReduce application, which makes it well suited to statistical analysis in a data warehouse. Spark SQL, part of the Apache Spark big data framework, is mainly used to process structured data and to run SQL-like queries on Spark data. With Spark SQL, ETL operations can be performed on data in different formats (such as JSON, Parquet, or a database) and specific queries then carried out.
However, the user interaction provided by open-source tools such as Hive and Spark SQL requires the user to have some SQL background. To let analysts without an SQL background still carry out data analysis, a new distributed visual analysis method needs to be designed.
Summary of the invention
The application provides a distributed extraction and visual analysis method and system based on economic field data, to solve the problems in the prior art that big data extraction is inefficient, user operation is too difficult, and data analysis is not intuitive enough.
The distributed extraction and visual analysis method based on economic field data disclosed in the application comprises:
Distributed data extraction step: a back-end server receives a user's instruction to extract big data and sends it to a master node; according to the received instruction, the master node splits the large extraction task into small tasks along one or more field dimensions of the task and distributes the small tasks to different processing nodes; each processing node issues requests to the text retrieval system according to its assigned small tasks; the master node stores the generated small tasks in a database and synchronizes task status during execution;
Distributed storage step: the processing nodes store the data sets returned by the text retrieval system in a database cluster;
Distributed computing and analysis step: the back-end server receives a user instruction, loads the required data sets from the database cluster accordingly, then filters, analyzes, and statistically analyzes the data, and writes the result sets to the database cluster;
Data loading and caching step: after receiving a client request to load data, the back-end server reads the task's metadata from the database according to the request, creates an in-memory table, loads data from the database cluster into the in-memory table according to the parameters, and returns the result once loading completes;
Result visualization step: the data are presented in intuitive forms such as charts.
Preferably, in the distributed data extraction step, the generated small tasks are assigned priorities in a certain proportion; the higher a task's priority, the earlier it runs, and tasks of the same level are executed under a first-in, first-out (FIFO) scheduling policy. According to the configuration parameters of the processing nodes, tasks of different priorities are handed to different processing nodes in proportion. After the receiving thread of a processing node receives tasks, the scheduling thread adds the received tasks to the task queues using a scheduling algorithm that combines priority scheduling, FIFO scheduling, and fair scheduling; data extraction is then performed according to each task's parameters and the data are received.
Preferably, in the distributed computing and analysis step, after a user's analysis task instruction carrying query parameters is received, the parameters are parsed according to the mapping between table fields and entity fields and spliced into an SQL query statement.
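The mapping-based splicing described above might look like the following minimal sketch. It assumes that only equality filters are needed and that values are passed as bind parameters; the class `QueryAssembler`, the method `build`, and the example field and column names are hypothetical and do not come from the application.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

public class QueryAssembler {

    /** An assembled statement plus the values to bind to its "?" placeholders. */
    public static final class Query {
        public final String sql;
        public final List<Object> args;

        public Query(String sql, List<Object> args) {
            this.sql = sql;
            this.args = args;
        }
    }

    /**
     * Translates entity-field filters into a parameterized SQL query.
     * Assumption: the table name comes from trusted task metadata, never from user input.
     */
    public static Query build(String table,
                              Map<String, String> fieldToColumn,
                              Map<String, ?> equalsFilters) {
        StringJoiner where = new StringJoiner(" AND ");
        List<Object> args = new ArrayList<>();
        for (Map.Entry<String, ?> filter : equalsFilters.entrySet()) {
            // Resolve the entity field chosen in the UI to its table column.
            String column = fieldToColumn.get(filter.getKey());
            if (column == null) {
                throw new IllegalArgumentException("unknown field: " + filter.getKey());
            }
            where.add(column + " = ?");
            args.add(filter.getValue());
        }
        String sql = "SELECT * FROM " + table + (args.isEmpty() ? "" : " WHERE " + where);
        return new Query(sql, args);
    }
}
```

Rejecting unmapped fields and binding values instead of concatenating them keeps user-supplied text out of the SQL string, which matters when the statement is assembled on behalf of users who never see it.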
Preferably, in the visualization step, the front end requests only the data needed at the current stage through on-demand loading, and caches already-requested data through a front-end caching mechanism.
Preferably, the visualization step comprises the following sub-steps:
according to the user's action of dragging an analysis field onto the dimension axis or the value axis, sending a request to the back end to obtain the data corresponding to that field;
displaying the data in table form once obtained;
determining and displaying the selectable chart types according to the number of dimension-axis fields and the number of value-axis fields;
displaying the configurable parameters of the chart type selected by the user, and generating and displaying the chart according to the parameters the user configures.
The distributed extraction and visual analysis system based on economic field data disclosed in the application comprises:
Distributed data extraction module: for receiving a user's instruction to extract big data and sending it to a master node; according to the received instruction, the master node splits the large extraction task into small tasks along one or more field dimensions of the task and distributes the small tasks to different processing nodes; each processing node issues requests to the text retrieval system according to its assigned small tasks; the master node stores the generated small tasks in a database and synchronizes task status during execution;
Distributed data storage module: for storing the data sets returned by the text retrieval system in a database cluster;
Distributed data computing and analysis module: for receiving a user instruction, loading the required data sets from the database cluster accordingly, then filtering, analyzing, and statistically analyzing the data, and writing the result sets to the database cluster;
Data loading and caching module: for, after receiving a client request to load data, reading the task's metadata from the database according to the request, creating an in-memory table, loading data from the database cluster into the in-memory table according to the parameters, and returning the result once loading completes;
Result visualization module: for presenting the data in intuitive forms such as charts.
Preferably, in the distributed data extraction module, the generated small tasks are assigned priorities in a certain proportion; the higher a task's priority, the earlier it runs, and tasks of the same level are executed under a first-in, first-out (FIFO) scheduling policy. According to the configuration parameters of the processing nodes, tasks of different priorities are handed to different processing nodes in proportion. After the receiving thread of a processing node receives tasks, the scheduling thread adds the received tasks to the task queues using a scheduling algorithm that combines priority scheduling, FIFO scheduling, and fair scheduling; data extraction is then performed according to each task's parameters and the data are received.
Preferably, in the distributed data computing and analysis module, after a user's analysis task instruction carrying query parameters is received, the parameters are parsed according to the mapping between table fields and entity fields and spliced into an SQL query statement.
Preferably, in the visualization module, the front end requests only the data needed at the current stage through on-demand loading, and caches already-requested data through a front-end caching mechanism.
Preferably, in the visualization module:
(1) according to the user's action of dragging an analysis field onto the dimension axis or the value axis, a request is sent to the back end to obtain the data corresponding to that field;
(2) the data are displayed in table form once obtained;
(3) the selectable chart types are determined and displayed according to the number of dimension-axis fields and the number of value-axis fields;
(4) the configurable parameters of the chart type selected by the user are displayed, and the chart is generated and displayed according to the parameters the user configures.
Compared with the prior art, the application has the following advantages:
The application provides a distributed data extraction method, together with a method based on a big data framework by which users can perform big data visual analysis without writing SQL query statements. (1) The task splitting and allocation algorithm makes distributed extraction of large data volumes possible; (2) pre-partitioning of database tables, dynamic storage, and filter optimization speed up parallel processing; (3) dynamically splicing SQL query statements lowers the difficulty of analysis for users; (4) simple operations in the user interface keep analysis flexible for business staff, and customizable chart presentation makes the visualized analysis results friendlier.
The application is suitable for distributed extraction and visualization of big data, and is particularly suitable for distributed extraction and visualization of economic field data.
Brief description of the drawings
The drawings serve only to illustrate preferred embodiments and are not to be considered limiting of the application. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 is a schematic diagram of the first embodiment, the distributed extraction and visual analysis method based on economic field data of the application;
Fig. 2 is a schematic diagram of distributed data extraction in the first embodiment of the distributed extraction and visual analysis method based on economic field data of the application;
Fig. 3 is a schematic diagram of the second embodiment, the distributed extraction and visual analysis system based on economic field data of the application.
Detailed description
To make the above objects, features, and advantages of the application clearer and easier to understand, the application is described in further detail below with reference to the drawings and specific embodiments.
In the description of the application, it should be understood that the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of technical features referred to. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include one or more such features. "Multiple" means two or more, unless specifically defined otherwise. The terms "include", "comprise", and similar terms are to be understood as open-ended, that is, "including but not limited to". The term "based on" means "based at least in part on". The term "an embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one further embodiment". Definitions of other terms are given in the description below.
Referring to Fig. 1, the flow of the first embodiment of the distributed extraction and visual analysis method based on economic field data of the application is shown. The whole process of data extraction and visual analysis involves four aspects: data extraction, storage, computation, and presentation. All four are indispensable and together form an analysis system covering the whole process from extraction to presentation. The main steps of the process are: distributed data extraction, distributed storage, distributed computing and statistical analysis, data loading and caching, and result visualization.
This preferred method embodiment comprises the following steps:
Step S101: the user's instruction for a task extracting macroeconomic field data is received (for example, the query condition entered by the user is a keyword group + time range) and sent to the master node. According to the received instruction, the master node splits the large extraction task into small tasks along one or more field dimensions of the task and distributes the small tasks to different processing nodes. Each processing node issues requests to the text retrieval system according to its assigned small tasks. The master node stores the generated small tasks in a database and synchronizes task status during execution.
Specifically, the text retrieval system is a system built on a Solr or Elasticsearch full-text search cluster that stores massive amounts of data. Elasticsearch is a real-time distributed search and analytics engine built on the full-text search engine library Apache Lucene. It can process large-scale data at high speed and can be used for full-text search, structured search, analytics, or any combination of the three. Solr is the open-source enterprise search platform of the Apache Lucene project. Its main features include full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich-document (such as Word and PDF) handling.
As shown in Fig. 2 the framework of distributed data extraction includes three parts:Between host node, processing node, node
Communication.Host node and processing node contacts are got up to be formed the entirety for externally providing service by the communication between node.
Host node(master)Function include:The metadata of maintenance and management middle table(Table name, literary name section and type),
Including mapping relations and record information;Generation and distribution task, monitor task running status, the management of history log are abnormal to accuse
It is alert;The communication information of calculation procedure is handled, updates task status.
Handle node(slave)Function include:Specifying for task is received and read, efficiently and stably performs task;Deposit
Store up in data to data storehouse.
Communication between node is primarily referred to as the socket communications based on tcp agreements and sends with realizing high-performance, receives life
Order and data.Wherein, the message of communication needs definition command format standard.Also realize the work(such as heartbeat detection and disconnection reconnecting in addition
Energy.
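The application does not specify the command format, so the following is only one illustrative way to frame commands on a TCP stream: a one-byte command type, a four-byte big-endian body length, and a UTF-8 body. The class `CommandFrame`, the three command type constants, and the layout itself are all assumptions made for this sketch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class CommandFrame {
    // Hypothetical command types; the application only says a command format must be defined.
    public static final byte HEARTBEAT = 0;
    public static final byte ASSIGN_TASK = 1;
    public static final byte TASK_STATUS = 2;

    public final byte type;
    public final String body;

    public CommandFrame(byte type, String body) {
        this.type = type;
        this.body = body;
    }

    /** Assumed layout: 1-byte type, 4-byte big-endian body length, UTF-8 body. */
    public byte[] encode() {
        byte[] payload = body.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(1 + 4 + payload.length)
                .put(type)
                .putInt(payload.length)
                .put(payload)
                .array();
    }

    /** Decodes exactly one complete frame; rejects a length field that does not match. */
    public static CommandFrame decode(byte[] frame) {
        ByteBuffer buf = ByteBuffer.wrap(frame);
        byte type = buf.get();
        int length = buf.getInt();
        if (length < 0 || length != buf.remaining()) {
            throw new IllegalArgumentException("bad frame length: " + length);
        }
        byte[] payload = new byte[length];
        buf.get(payload);
        return new CommandFrame(type, new String(payload, StandardCharsets.UTF_8));
    }
}
```

A length prefix lets the receiver find message boundaries in the TCP byte stream, and a heartbeat can then be an ordinary frame with an empty body.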
The key points of this step are task splitting and distribution and the task execution strategy. Specifically, as shown in Fig. 2, this step comprises the following sub-steps:
(1) Split the task evenly: after the master node receives the user's instruction for a data extraction task (for example, a task extracting macroeconomic field data), it relies on the fact that the task's query condition consists of a keyword group and a time range (start time and end time), and that the time range is a continuous, divisible dimension, to split the large task into small tasks along the time dimension at equal time intervals (the minimum time granularity, for example seven days). The time ranges of the small tasks are distinct, do not overlap, and leave no gaps between tasks; all other dimensions remain the same as in the large task. The ranges of all small tasks together equal the range of the large task.
(2) The generated small tasks are assigned run priorities in a certain proportion. There are five priority levels: 1, 3, 5, 7, and 9; the higher the level number, the earlier the task runs. Tasks of the same level are executed under the FIFO scheduling policy.
(3) According to the configuration parameters of the processing nodes (the number of CPU cores), tasks of different priorities are handed to different nodes in proportion; at this point task assignment is complete. The small tasks generated by the master node are stored in a relational database; during execution, task status must be synchronized and logs recorded. If a processing node or the master node fails or goes down, task information is not lost.
(4) Send and receive tasks: the master node distributes tasks (in batches) to each processing node; each processing node listens on a port and receives tasks.
(5) After the receiving thread of a processing node receives a batch of tasks, it places them in a candidate task pool. The scheduling thread, using a scheduling algorithm that combines priority scheduling, FIFO scheduling, and fair scheduling, selects a small number of tasks from the candidate pool and adds them to the task queues.
Specifically, given a fixed total number of processing threads, the processing node divides the threads in a certain proportion to handle tasks of different levels; threads of the same level form a thread group. Each thread group claims and executes tasks from the task queue of its level. For example, with high, medium, and low priorities in a 3:2:1 ratio, 3 threads handle high-priority tasks, 2 handle medium-priority tasks, and 1 handles low-priority tasks. After the processing node receives tasks, it sorts them by priority: high-priority tasks are added to the high-level task queue first, and lower-priority tasks are then added to the queues of their levels. Each queue has a length threshold (default 10). When tasks of the same level are added to a task queue (to wait for processing), they are ordered by generation time, and the earlier a task was generated the earlier it is enqueued. When a queue is full, the scheduler sleeps for a while (seconds or milliseconds) and then rechecks whether tasks can be added, or waits for a slot to free up. When several large tasks need to extract data in the same period, each produces many small tasks whose generation times differ. If FIFO scheduling alone were used, some large tasks might go unprocessed for a long time while the small tasks of other large tasks (those generated earlier) are always handled first; this is a starvation problem. To avoid starvation, when tasks are added to a task queue, all large tasks that still have unprocessed small tasks are polled, and a small number of small tasks is selected from each of them in a certain proportion and added to the queue.
(6) A processing thread takes a task from the queue and performs the extraction according to the task's parameters (sending query statements, with paged queries), receives the business data set returned by the full-text search cluster (raw text data collected from the Internet, converted by preprocessing into structured or semi-structured business data), and writes the business data set to the database. Once the business data of a task has been fully extracted, the thread moves on to the next task. A sending thread asynchronously reports the status of each task to the master node.
(7) The master node receives the run-status messages of small tasks, aggregates the status of the small tasks on each processing node, counts the amount of data extracted, and determines whether all small tasks of a large task have finished; if so, it updates the large task's status and total count.
(8) If the master node receives a new data extraction instruction, return to step (1).
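The even splitting in sub-step (1) can be sketched as follows. This is a minimal illustration that assumes day granularity and half-open ranges [start, end), so that adjacent small tasks share a boundary without overlapping; the class `TimeRangeSplitter` and its members are hypothetical names, not taken from the application.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

public class TimeRangeSplitter {

    /** One small task: the big task's keywords plus a half-open range [start, end). */
    public static final class SmallTask {
        public final String keywords;
        public final LocalDate start;
        public final LocalDate end;

        public SmallTask(String keywords, LocalDate start, LocalDate end) {
            this.keywords = keywords;
            this.start = start;
            this.end = end;
        }
    }

    /** Splits [start, end) into consecutive slices of intervalDays; the last slice may be shorter. */
    public static List<SmallTask> split(String keywords, LocalDate start, LocalDate end, int intervalDays) {
        if (intervalDays <= 0 || !start.isBefore(end)) {
            throw new IllegalArgumentException("need intervalDays > 0 and start < end");
        }
        List<SmallTask> tasks = new ArrayList<>();
        LocalDate cursor = start;
        while (cursor.isBefore(end)) {
            LocalDate next = cursor.plusDays(intervalDays);
            if (next.isAfter(end)) {
                next = end; // clamp so the union of slices equals the big task's range
            }
            // Only the time dimension changes; the keywords are copied unchanged.
            tasks.add(new SmallTask(keywords, cursor, next));
            cursor = next;
        }
        return tasks;
    }
}
```

Because each slice starts exactly where the previous one ends, the slices have no overlap and no gap, and together they cover the whole range of the large task.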
This step meets the requirement of extracting data efficiently, and it offers a method and design for distributed extraction systems with similar requirements, giving it strong practical value.
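The combined priority, FIFO, and fair scheduling used when selecting tasks from the candidate pool could be approximated as in the sketch below. It assumes the simplest fair share, one small task per large task per polling round; the application only says tasks are selected "in a certain proportion". The class `FairTaskSelector`, its fields, and the method `select` are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FairTaskSelector {

    public static final class SmallTask {
        public final String bigTaskId;
        public final int priority;   // 1, 3, 5, 7 or 9; higher runs first
        public final long createdAt; // generation time; earlier runs first

        public SmallTask(String bigTaskId, int priority, long createdAt) {
            this.bigTaskId = bigTaskId;
            this.priority = priority;
            this.createdAt = createdAt;
        }
    }

    /** Picks up to limit tasks from the pool, polling every large task in turn. */
    public static List<SmallTask> select(List<SmallTask> pool, int limit) {
        // Priority scheduling first, FIFO (generation time) within the same priority.
        List<SmallTask> sorted = new ArrayList<>(pool);
        sorted.sort(Comparator.comparingInt((SmallTask t) -> -t.priority)
                .thenComparingLong(t -> t.createdAt));

        // Group by large task, keeping the sorted order inside each group.
        Map<String, Deque<SmallTask>> byBigTask = new LinkedHashMap<>();
        for (SmallTask t : sorted) {
            byBigTask.computeIfAbsent(t.bigTaskId, k -> new ArrayDeque<>()).add(t);
        }

        // Fair scheduling: one task per large task per round, so no large task starves.
        List<SmallTask> picked = new ArrayList<>();
        while (picked.size() < limit && !byBigTask.isEmpty()) {
            Iterator<Deque<SmallTask>> it = byBigTask.values().iterator();
            while (it.hasNext() && picked.size() < limit) {
                Deque<SmallTask> queue = it.next();
                picked.add(queue.poll());
                if (queue.isEmpty()) {
                    it.remove();
                }
            }
        }
        return picked;
    }
}
```

With pure FIFO, three small tasks of an earlier large task A would all be chosen before any task of a later large task B; with the round-based polling here, the selection alternates between A and B.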
Step S102: the processing nodes store the data sets returned by the text retrieval system in the database cluster, using pre-partitioning of database tables and dynamic storage to speed up parallel processing.
Specifically, the table is pre-partitioned according to a preset number of partitions, and each data record is written to the database partition determined by a random number generated from the record's primary key.
Take HBase as an example. One HBase table corresponds to one data source or one function. The contents of all fields in an HBase table are stored in JSON format; that is, all fields and values of a record are serialized into a JSON string that is stored in a single all-fields field of the table, which makes dynamic parsing easy. The HBase table is pre-partitioned. By default, the primary keys of a task's data are stored with the prefix 0#; once the data volume reaches a certain level, the primary keys of the HBase table must take a prefix of the form [1-9]|[a-z]+#.
The HBase primary key format is: prefix code # task id @ record primary key.
Here the prefix code is a single digit or letter, "#" and "@" are delimiters, the task id is the primary key of a data extraction task, and the record primary key is the primary key (MD5) of one record of that task.
For example: a#1000@1d128b617546ee05e126ad0b33381248
The prefix code is generated by the data positioning formula preCode = remarksCode[hash(key) % partionNum].
The prefix code array is:
String[] remarksCode= {"0","1","2","3","4","5","6","7","8","9","a","b",
"c","d","e","f","g","h","i","j","k","l","m","n",
"o","p","q","r","s","t","u","v","w","x","y","z"};
In the data positioning formula, preCode is the prefix code, remarksCode is the array that stores the prefix codes, key is the primary key of the record, partionNum is the number of partitions required, and hash refers to the hash function of a hash table. A hash table is a data structure that is accessed directly by key: it maps each key to a position in a table so that records can be found quickly. The mapping function is called the hash function, and the array that holds the records is called the hash table. The hash function converts the key into an integer, the integer is taken modulo the array length, and the remainder is used as the array index at which the value is stored. On lookup, the same hash function converts the key into the array index again and the value is read from that slot, so the fast indexed access of an array is used to locate the data. Here the hash value of the key is taken directly modulo the number of partitions to locate the prefix code of the partition; the method used to compute the hash value can be chosen freely, for example CRC32, MD5, or even a native hash facility such as Java's hashCode. This algorithm is simple and also distributes the keys randomly and evenly.
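The positioning formula and the primary key format above can be sketched as follows. This is a minimal illustration that assumes CRC32 as the hash (one of the options the description names); the class name RowKeys and its method names are hypothetical, while the prefix codes and the spelling partionNum follow the description.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Sketch of prefix-code generation; class and method names are hypothetical. */
final class RowKeys {
    // Prefix codes from the description: digits 0-9 followed by letters a-z (36 in total).
    static final String REMARKS_CODE = "0123456789abcdefghijklmnopqrstuvwxyz";

    /** preCode = remarksCode[hash(key) % partionNum], with CRC32 chosen as the hash. */
    static String prefixCode(String recordKey, int partionNum) {
        if (partionNum < 1 || partionNum > REMARKS_CODE.length()) {
            throw new IllegalArgumentException("partionNum must be between 1 and 36");
        }
        CRC32 crc = new CRC32();
        crc.update(recordKey.getBytes(StandardCharsets.UTF_8));
        // CRC32.getValue() is a non-negative long, so the remainder is a valid index.
        int index = (int) (crc.getValue() % partionNum);
        return String.valueOf(REMARKS_CODE.charAt(index));
    }

    /** Primary key layout: prefix code # task id @ record primary key. */
    static String rowKey(long taskId, String recordKey, int partionNum) {
        return prefixCode(recordKey, partionNum) + "#" + taskId + "@" + recordKey;
    }
}
```

If Java's hashCode were used instead, the remainder would need Math.floorMod, because hashCode can be negative.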
The table is partitioned by prefix in advance, and each data record is written to a specified partition by hashing its MD5 primary key, which effectively avoids hotspots (uneven writes and reads).
Because a data source is usually very large (millions of records or more), the default setting stores the data evenly across 36 partitions (or more; a partition is automatically split in two when its storage exceeds a threshold). The data produced in the analysis stage, however, usually contains fewer records than the data source. Since the result of an analysis task is relatively small, a small data set can be stored in only some of the partitions instead of in all 36 partitions with different prefixes; this dynamic partitioned storage benefits subsequent data loading. For example, when the data volume is below 50,000 records it is stored in only 5 partitions, so partionNum = 5 and the prefix code computed by the data positioning formula is one of 0 to 4.
The HBase table design and dynamic storage method applied in this step solve the storage hotspot problem and the fast-read problem of databases such as HBase.
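The dynamic choice of partionNum can be sketched as follows. The two values (5 partitions below 50,000 records, 36 partitions otherwise) come from the example in the description; the class name PartitionPlanner and the assumption of a single fixed threshold are hypothetical simplifications.

```java
/** Sketch of dynamic partition selection; names and the single threshold are assumptions. */
final class PartitionPlanner {
    static final int DEFAULT_PARTITIONS = 36;        // default from the description
    static final long SMALL_RESULT_LIMIT = 50_000L;  // example threshold from the description
    static final int SMALL_RESULT_PARTITIONS = 5;    // example value from the description

    /** Returns partionNum for a data set with the given expected number of records. */
    static int partitionCount(long expectedRecords) {
        if (expectedRecords < 0) {
            throw new IllegalArgumentException("expectedRecords must not be negative");
        }
        return expectedRecords < SMALL_RESULT_LIMIT
                ? SMALL_RESULT_PARTITIONS
                : DEFAULT_PARTITIONS;
    }
}
```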
Step S103: The back-end server receives the user's instruction and loads the required data sets from the database cluster accordingly, then filters the data, analyzes it and computes statistics, and writes the result set to the database cluster.
Specifically, the received user instruction conforms to a defined specification. The specification includes: (1) the meta-attributes of a field: Chinese field name, English field name and field data type; (2) the defined data types: date, datetime, long, double, string, text; (3) the defined filter operations: equal to, not equal to, greater than, less than, greater than or equal to, less than or equal to, range, is empty, is not empty, contains, does not contain, regular expression, and so on.
Specifically, the filter conditions support AND, OR and NOT combinations between fields, and also regular expressions on fields. The purpose of filtering is to remove, after the data has been fetched from the distributed database and before analysis starts, the records that do not need to take part in the analysis, which effectively reduces the data set to be analyzed.
The priority order for processing data types is:
int (integer) > double (floating point) > date > datetime > string > text (long text).
Integer fields are processed first, and long-text fields are processed last.
Filtering the fields of different data types in priority order before the statistical analysis effectively speeds up computation and reduces memory usage.
Priority filtering is used because applying the same kind of filter condition to data of different types consumes different amounts of CPU, and when the fields in a filter condition are related by OR or NOT, unnecessary CPU overhead can be avoided. When the filter conditions on field 1 and field 2 are related by OR, a record is a hit (it must take part in the analysis) as soon as the content of either field satisfies its condition, and it is filtered out as a miss (it does not take part in the analysis) only if none of the conditions is satisfied. On a miss, both comparisons are made in turn (field 1 and field 2 are both checked); a hit needs fewer comparisons than a miss, so it costs less resource and time. When field 1 has data type long with the filter operation "greater than" and field 2 has data type text with the filter operation "regular expression", filtering on field 1 costs less resource and time than filtering on field 2. When a large volume of data is processed, the accumulated difference in cost and time becomes significant.
For example, when the same regular expression is applied to the contents of a string field and of a text field, the string content is shorter than the text content, so far fewer CPU cycles are needed.
Therefore, filtering the fields of different data types in priority order before the statistical analysis effectively speeds up computation and reduces memory usage.
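The effect of priority filtering on an OR combination can be sketched as follows. This is a minimal illustration: the class PriorityFilter, its nested types and the use of java.util.function.Predicate are hypothetical, and the enum simply lists the types in the priority order given above (the description's long type is assumed to fall under the integer priority).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Sketch of priority-ordered OR filtering; all names are hypothetical. */
final class PriorityFilter {
    /** Data types in the processing priority order of the description (cheapest first). */
    enum FieldType { INT, DOUBLE, DATE, DATETIME, STRING, TEXT }

    /** One filter condition on one field; the predicate receives the field value. */
    static final class Condition {
        final String field;
        final FieldType type;
        final Predicate<Object> test;

        Condition(String field, FieldType type, Predicate<Object> test) {
            this.field = field;
            this.type = type;
            this.test = test;
        }
    }

    private final List<Condition> ordered;

    PriorityFilter(List<Condition> conditions) {
        ordered = new ArrayList<>(conditions);
        // Enum declaration order encodes the priority; List.sort is stable for ties.
        ordered.sort(Comparator.comparing((Condition c) -> c.type));
    }

    /** OR combination: stops at the first satisfied condition, so cheap checks run first. */
    boolean matchesAny(Map<String, Object> record) {
        for (Condition c : ordered) {
            if (c.test.test(record.get(c.field))) {
                return true;
            }
        }
        return false;
    }
}
```

With a long "greater than" condition and a text regular-expression condition, a record that satisfies the numeric condition is accepted without the regular expression ever being evaluated.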
Data analysis uses the mapping between business table fields (English field names) and entity fields. Data analysis operates on fields of the programming language rather than on table fields in the database, so the data in the database must be converted into a collection of entity objects. During data exchange (reading data from the database and writing data to it), the fields in JSON format are parsed and each value is assigned to the corresponding field of the entity class. Defining the mapping between business table fields and entity fields uniformly helps standardize field management. In short, table fields and entity fields correspond one to one.
In a traditional relational database, the most basic SQL query statement, such as SELECT field A, field B, field C FROM table A WHERE field A > 10, consists of three parts: the projection (field A, field B, field C), the data source (table A) and the filter (field A > 10, the condition). These correspond to the result, the data source and the operation of an SQL query; that is, an SQL statement is written in the order result -> data source -> operation, but it is actually executed in the order operation -> data source -> result.
In the complete query form "select fields from table where query-condition group by grouping-condition order by sort-condition", the "where", "group by" and "order by" clauses are all optional; only "select fields from table" is required. Statements that conform to standard SQL query syntax are therefore spliced together dynamically, following the way SQL query statements work.
Specifically, after receiving the user's analysis task instruction (with query parameters), the back-end program parses the JSON parameters (which cover the three parts of the query statement) and assembles the SQL query statement. During assembly it must determine which of the user's settings are expressions, which are result columns, which are data sources and which are filter conditions, and then convert the corresponding parameter content, according to the mapping between table fields and entity fields, into syntactically valid statement fragments in a standard style that together form the complete statement the user expressed. Most filter conditions (on fields of the data source) can be moved forward to the filtering stage and need not be spliced into the query statement, which effectively reduces the resources used during analysis. The query statement is then executed with the Spark SQL component and the result is written to the HBase database. In this way the distributed advantages of Spark are fully used and the shortcomings of Spark SQL are worked around.
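The splicing of a query from parsed parameters can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's implementation: the class SqlAssembler, the Filter type and the operator whitelist are hypothetical, JSON parsing is assumed to have happened already, values are left as "?" placeholders for the caller to bind, and the table name is assumed to come from trusted task metadata.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;

/** Sketch of dynamic SQL assembly; all names are hypothetical. */
final class SqlAssembler {
    private static final Set<String> OPERATORS =
            new HashSet<>(Arrays.asList("=", "<>", ">", "<", ">=", "<="));

    /** One filter condition: a business table field and a comparison operator. */
    static final class Filter {
        final String field;
        final String operator;

        Filter(String field, String operator) {
            this.field = field;
            this.operator = operator;
        }
    }

    /** Maps a business table field to its entity field; unknown fields are rejected. */
    private static String column(Map<String, String> fieldMap, String businessField) {
        String entityField = fieldMap.get(businessField);
        if (entityField == null) {
            throw new IllegalArgumentException("unmapped field: " + businessField);
        }
        return entityField;
    }

    private static String columns(Map<String, String> fieldMap, List<String> fields) {
        StringJoiner joined = new StringJoiner(", ");
        for (String f : fields) {
            joined.add(column(fieldMap, f));
        }
        return joined.toString();
    }

    /** Builds SELECT ... FROM ... [WHERE ...] [GROUP BY ...]; only SELECT and FROM are required. */
    static String build(Map<String, String> fieldMap, String table, List<String> resultFields,
                        List<Filter> filters, List<String> groupBy) {
        if (resultFields.isEmpty()) {
            throw new IllegalArgumentException("at least one result field is required");
        }
        StringBuilder sql = new StringBuilder("SELECT ")
                .append(columns(fieldMap, resultFields))
                .append(" FROM ").append(table);
        if (!filters.isEmpty()) {
            StringJoiner where = new StringJoiner(" AND ");
            for (Filter f : filters) {
                if (!OPERATORS.contains(f.operator)) {
                    throw new IllegalArgumentException("unsupported operator: " + f.operator);
                }
                where.add(column(fieldMap, f.field) + " " + f.operator + " ?");
            }
            sql.append(" WHERE ").append(where.toString());
        }
        if (!groupBy.isEmpty()) {
            sql.append(" GROUP BY ").append(columns(fieldMap, groupBy));
        }
        return sql.toString();
    }
}
```

Rejecting unmapped fields and unknown operators keeps user-supplied parameters from being copied into the statement unchecked.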
In this step the user does not need to write SQL statements. On the front-end page the user simply selects one or more data sources (data sets returned by data extraction and stored in the database), chooses the fields to analyze, may select several fields as dimensions for grouped statistics, and may select filter conditions (regular expressions are supported) to filter the data. The analysis and statistics are then run and the results are shown in near real time, which lowers the threshold for big data analysis.
In addition, the data filtering method of this step solves the problem of complex data filtering.
Step S104: After receiving a client's request to load data, the back-end server parses the parameters, reads the metadata of the task from the relational database, creates a memory table, loads the data from the database cluster into the memory table according to the parameters, and returns the result when loading is complete.
Specifically, when the back-end server receives a show-result command sent by a client user, it queries the relational database through the JDBC interface for the result-set metadata of an analysis task (one whose analysis has completed), and queries the HBase cluster through the HBase client interface for the result-set data of that task. After the result set is returned, the web server parses the JSON data according to the metadata and sends it to the front end.
Specifically, this step also uses a metadata caching technique, a distributed data loading method, an interface for querying a progress bar, and a preloading technique.
Metadata caching: because server memory is limited, not all data can be kept in memory, so a least-recently-used algorithm retains a certain number of data sets, or the data sets accessed within a certain time window. When a user sends repeated requests, the server must check whether the data and the metadata (table information) have been deleted, and reload the data if they have. The server periodically checks whether any data meets the deletion condition; if it does, a processing thread is started to perform the deletion, and the metadata and the corresponding data are deleted within a transaction, so that clients do not encounter abnormal conditions when accessing data. Caching the metadata in the program's memory speeds up metadata access.
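A least-recently-used cache of the kind described above can be sketched with the standard LinkedHashMap class. This is a minimal illustration: the class name MetadataCache and the size-only eviction rule are assumptions, and the periodic, transactional deletion of the underlying data described in the text is not shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of an LRU metadata cache; the name and the size-only eviction are assumptions. */
final class MetadataCache<K, V> {
    private final LinkedHashMap<K, V> entries;

    MetadataCache(final int maxEntries) {
        if (maxEntries < 1) {
            throw new IllegalArgumentException("maxEntries must be at least 1");
        }
        // accessOrder = true keeps the least recently accessed entry first.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached metadata, or null if it was never cached or has been evicted. */
    synchronized V get(K taskId) {
        return entries.get(taskId);
    }

    /** Stores metadata; the least recently used entry is evicted when the cache is full. */
    synchronized void put(K taskId, V metadata) {
        entries.put(taskId, metadata);
    }

    /** Checks presence without counting as an access. */
    synchronized boolean contains(K taskId) {
        return entries.containsKey(taskId);
    }
}
```

A null result from get tells the caller that the metadata must be reloaded, which matches the reload-on-miss behaviour described above.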
Distributed data loading: when the metadata of a task records more than one partition (the number of HBase table partitions in which the task's data is stored), Spark is used to read the data from the HBase table in a distributed manner (each reader reads the specified data of a partition), and the data is then loaded in a distributed manner into the memory table of the relational database; this approach is fast and highly concurrent. When there is only one partition (the data volume is small), the data is loaded by a local program instead. Spark is not used in that case because connecting to ZooKeeper takes a certain amount of time when Spark requests the metadata of the HBase table, and communication between Spark nodes also takes time, which works against completing the loading task quickly and efficiently. When the back-end program of the server starts, the Spark context is prepared in advance (resources have been allocated to each compute node and the executor processes have been started), which speeds up the first data load.
The interface for querying the progress bar reports the progress of the loading process in real time.
Preloading means loading the data into the memory table in advance, before the chart function is used, instead of loading it only when it is needed.
Step S105: The data is presented in intuitive forms such as charts.
The focus of macro big-data visualization is to present digitized economic information to data users through straightforward forms such as charts and tables. Two points need attention in this process: first, how to request and display large volumes of data quickly; second, how to meet the different presentation needs of different data users, so that existing statistics can be shown flexibly. To address these two needs, the front-end visualization uses a large-data caching mechanism and a flexible visualization method based on dynamic drag-and-drop.
1. Large-data caching mechanism
All data requested for big-data visualization is returned in JSON form. During requests, the front end on the one hand loads on demand only the data needed at the current stage, to reduce the time spent on requests and improve loading speed. On the other hand, it caches the data that has already been requested through the front-end caching mechanism, reducing the number of repeated requests within a given period and thereby speeding up data loading.
Shortcomings of the existing basic cache: the browser itself applies a caching policy to HTTP requests, but this kind of caching has two defects:
(1) Only GET requests can be cached.
(2) The cache settings are all specified in the headers of the back-end response. Much business logic is now concentrated on the front end, which makes this kind of caching hard for front-end developers to use.
The web front-end caching mechanism used in this embodiment has the following aspects:
(1) Caching locally loaded JS files. On the one hand, local JS data is cached by setting the options cache: true and dataType: "script" in the jQuery ajax method. On the other hand, the Application Cache mechanism is used in addition to the browser cache. It caches in units of files, and the files have an update mechanism. Concretely, the HTML tag references a file ending in .appcache through its manifest attribute. AppCache rests on two key points: the manifest attribute and the manifest file. The HTML page that references the manifest file and the files listed in the manifest file are all cached by the browser.
(2) Data that has been requested is kept in the front-end page in hidden fields. Subsequent requests within a certain period read the data from the hidden fields for real-time processing, which reduces requests to the back end.
(3) Locally requested files are compressed and compiled to reduce file size.
The caching mechanisms above on the one hand reduce redundant data transfer and save traffic, and on the other hand relieve momentary congestion and reduce the load on the origin server.
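The idea of reusing already-requested data for a limited time, as in aspect (2), can be sketched as follows. This is a language-neutral illustration written in Java, not the hidden-field JavaScript mechanism of the embodiment: the class RequestCache, the time-to-live parameter and the injected clock and back-end function are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Sketch of a time-limited request cache; all names are hypothetical. */
final class RequestCache {
    private static final class Entry {
        final String json;
        final long storedAt;

        Entry(String json, long storedAt) {
            this.json = json;
            this.storedAt = storedAt;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock;               // supplies the current time in milliseconds
    private final Function<String, String> backend; // performs the real request

    RequestCache(long ttlMillis, LongSupplier clock, Function<String, String> backend) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
        this.backend = backend;
    }

    /** Returns cached JSON if it is younger than ttlMillis; otherwise asks the back end. */
    synchronized String fetch(String requestKey) {
        long now = clock.getAsLong();
        Entry cached = entries.get(requestKey);
        if (cached != null && now - cached.storedAt < ttlMillis) {
            return cached.json;
        }
        String json = backend.apply(requestKey);
        entries.put(requestKey, new Entry(json, now));
        return json;
    }
}
```

In production code the clock would typically be System::currentTimeMillis; injecting it keeps the expiry logic easy to check.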
2. Flexible visualization method based on dynamic drag-and-drop
Data visualization is based on the dimensions and the values of the data to be presented. Every data source to be visualized has a number of corresponding fields. To create a visualization, the user drags a field to be analyzed onto the dimension axis or the value axis. When the drag completes, the information of that field is sent to the back end, the display data for the field is requested in real time, and the data is first parsed and shown on the page as a table, giving the data user a basic visualization. Each drag of a field fetches the information of that field in real time.
Fields dragged onto the dimension axis or the value axis are checked against a set of data types. The existing field data types are: date, datetime, long, double, string, text. Fields of different types have different drop-down options that are specific to the field. By requesting validation data in real time during the drag, this method greatly reduces the amount of data processing, and it can dynamically and flexibly present different styles for different data types.
To present the serialized data in various forms, such as line charts, bar charts, stacked charts, pie charts and maps, different visualizations are produced by dynamically configuring options in JS on top of a visualization component. This method uses the ECharts visualization component and adds a number of configurable options to the front-end page, so that users can flexibly select the functions they want without writing code, which widens the group of people who can use the system. While customizing the configuration, users can choose what to show and remove redundant displays, which gives a clear visualization.
Charts saved to a dashboard form a thematic visualization. Each chart is an independent module that can be laid out freely by drag-and-drop, so that the layout reflects the aspects of the economic data the user wants to emphasize. The method dynamically computes the position of the dragged module relative to the browser and the width and height of the element itself to determine where the module is stored after the drag, and on release it sends a real-time request to save the data.
The steps for editing a chart are as follows:
(1) Drag an analysis field onto the dimension axis or the value axis. During the drag, a field validation request is sent and the available drop-down options are listed. When the drag completes and the field is released, a request is sent to obtain the data corresponding to that field.
(2) After the JSON data is received, it is converted into table rows and columns and shown as a table. The table is the most basic form of visualization.
(3) After the fields to analyze have been chosen, the number of dimension-axis fields and the number of value-axis fields are used to determine which chart types can display the current data; the applicable chart types are highlighted to show that they can be used. After the user selects one of them, the configurable parameters for that chart type appear, for example the maximum and minimum of an axis, whether to add guide lines, whether to enable zooming, and whether to add labels.
(4) The generated chart is saved to the dashboard of the folder where the data source resides, forming a data visualization chart. The saved data is stored in the back-end database. A generated chart can also be modified; when it is modified, the display data of the chart is requested by its data ID and shown again in the chart editing workspace.
This step has the following advantages:
(1) Different charts can be generated from different fields of one data source, showing different points of emphasis in the analysis of the economic field.
(2) Chart modules can be dragged freely into any layout, highlighting the focus of the visualization theme.
(3) Flexible layouts are saved in real time, which reduces user operations and is quick and convenient.
(4) The threshold for big data analysis is lowered.
For simplicity of description, each of the foregoing method embodiments is expressed as a series of combined actions, but those skilled in the art should understand that the present application is not limited by the order of the actions described, because according to the present application some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the above method embodiments are all preferred embodiments, and that the actions and modules involved are not necessarily required by the present application.
Referring to Fig. 3, a structural block diagram of one embodiment of the distributed extraction and visual analysis system based on economic field data of the present application is shown, including:
Distributed data extraction module 301: receives the user's instruction to extract big data and sends it to the master node. According to the instruction received, the master node splits the large data extraction task into small tasks along one or more field dimensions of the task. The generated small tasks are assigned priorities in a certain proportion; the higher the priority of a task, the earlier it runs, and tasks of the same priority are executed under a first-in-first-out (FIFO) scheduling policy. According to the configuration parameters of the processing nodes, tasks of different priorities are distributed to different processing nodes in proportion. After the receiving thread of a processing node receives a task, the scheduling thread adds the task to the task queue using a scheduling algorithm that combines priority scheduling, FIFO scheduling and fair scheduling, and a request is sent to the text retrieval system according to the parameters of the task and the data is received. The master node stores the generated small tasks in the database and synchronizes the task states during execution;
Distributed data storage module 302: stores the data sets returned by the text retrieval system in the database cluster;
Distributed data computation and analysis module 303: receives the user's analysis task command with query parameters, parses the parameters and assembles an SQL query statement according to the mapping between table fields and entity fields, loads the required data sets from the database cluster according to the SQL query statement, then filters the data, analyzes it and computes statistics, and writes the result set to the database;
Data loading and caching module 304: after receiving a client's request to load data, parses the parameters, reads the metadata of the task from the relational database, creates a memory table, loads the data from the database into the memory table according to the parameters, and returns the result when loading is complete;
Result visualization display module 305: requests the data needed at the current stage through on-demand loading in the front end, caches the data that has already been requested through the front-end caching mechanism, and provides the following functions:
(1) In response to the user dragging an analysis field onto the dimension axis or the value axis, send a request to the back end to obtain the data corresponding to the field;
(2) Display the data in table form after it is obtained;
(3) Determine, from the number of dimension-axis fields and the number of value-axis fields, the chart types available for selection, and display them;
(4) Show the configurable parameters of the chart type selected by the user, then generate and display the chart according to the parameters the user configures.
It should be noted that the above device embodiments are preferred embodiments, and the units and modules involved are not necessarily required by the present application.
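The scheduling rule described for module 301 (higher priority runs first, equal priorities run first-in-first-out) can be sketched with the standard PriorityBlockingQueue class. This is a minimal illustration: the class TaskQueue and its members are hypothetical, and the fair-scheduling component and the proportional distribution across processing nodes mentioned in the description are not modeled.

```java
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Sketch of priority-then-FIFO task ordering; all names are hypothetical. */
final class TaskQueue {
    static final class SmallTask implements Comparable<SmallTask> {
        final String id;
        final int priority;   // a larger value means a higher priority
        final long sequence;  // arrival order, used to break ties

        SmallTask(String id, int priority, long sequence) {
            this.id = id;
            this.priority = priority;
            this.sequence = sequence;
        }

        @Override
        public int compareTo(SmallTask other) {
            int byPriority = Integer.compare(other.priority, priority); // higher priority first
            return byPriority != 0 ? byPriority : Long.compare(sequence, other.sequence); // FIFO
        }
    }

    private final PriorityBlockingQueue<SmallTask> queue = new PriorityBlockingQueue<>();
    private final AtomicLong arrivals = new AtomicLong();

    /** Called by the receiving thread when a small task arrives. */
    void submit(String id, int priority) {
        queue.put(new SmallTask(id, priority, arrivals.getAndIncrement()));
    }

    /** Called by the scheduling thread; returns the next task id, or null if the queue is empty. */
    String next() {
        SmallTask task = queue.poll();
        return task == null ? null : task.id;
    }
}
```

The explicit sequence number is needed because a priority queue on its own does not preserve insertion order among elements of equal priority.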
The embodiments in this specification are described progressively; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another. The device embodiments of the present application are described relatively briefly because they are substantially similar to the method embodiments; for related details, see the description of the method embodiments. The device embodiments described above are merely illustrative. The modules described as separate components may or may not be physically separate; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
A ... method and apparatus provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and embodiments of the present application, and the description of the above embodiments is only intended to help in understanding the method of the present application and its core idea. Meanwhile, those of ordinary skill in the art may, following the idea of the present application, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.
Claims (10)
1. A distributed extraction and visual analysis method based on economic field data, characterized by comprising:
a distributed data extraction step: a back-end server receives a user's instruction to extract big data and sends it to a master node; according to the instruction received, the master node splits the large data extraction task into small tasks along one or more field dimensions of the task and distributes the small tasks to different processing nodes; each processing node sends requests to a text retrieval system according to the small tasks distributed to it; the master node stores the generated small tasks in a database and synchronizes the task states during execution;
a distributed storage step: the processing nodes store the data sets returned by the text retrieval system in a database cluster;
a distributed computation and analysis step: the back-end server receives the user's instruction and loads the required data sets from the database cluster accordingly, then filters the data, analyzes it and computes statistics, and writes the result set to the database cluster;
a data loading and caching step: after receiving a client's request to load data, the back-end server reads the metadata of the task from the database according to the request, creates a memory table, loads the data from the database cluster into the memory table according to the parameters, and returns the result when loading is complete;
a result visualization step: the data is presented in intuitive forms such as charts.
2. The distributed extraction and visual analysis method based on economic field data according to claim 1, characterized in that, in the distributed data extraction step, the generated small tasks are assigned priorities in a certain proportion; the higher the priority of a task, the earlier it runs, and tasks of the same priority are executed under a first-in-first-out (FIFO) scheduling policy; according to the configuration parameters of the processing nodes, tasks of different priorities are distributed to different processing nodes in proportion; after the receiving thread of a processing node receives a task, the scheduling thread adds the received task to the task queue using a scheduling algorithm that combines priority scheduling, FIFO scheduling and fair scheduling, performs the data extraction operation according to the parameters of the task, and receives the data.
3. The distributed extraction and visual analysis method based on economic field data according to claim 1 or 2, characterized in that, in the distributed computation and analysis step, after the user's analysis task instruction with query parameters is received, the parameters are parsed and an SQL query statement is assembled according to the mapping between table fields and entity fields.
4. The distributed extraction and visual analysis method based on economic field data according to claim 1 or 2, characterized in that, in the visualization step, the data needed at the current stage is requested through on-demand loading in the front end, and the data that has already been requested is cached through a front-end caching mechanism.
5. The distributed extraction and visual analysis method based on economic field data according to claim 1 or 2, characterized in that the visualization step comprises the following sub-steps:
in response to the user dragging an analysis field onto the dimension axis or the value axis, sending a request to the back end to obtain the data corresponding to the field;
displaying the data in table form after it is obtained;
determining, from the number of dimension-axis fields and the number of value-axis fields, the chart types available for selection, and displaying them;
showing the configurable parameters of the chart type selected by the user, and generating and displaying the chart according to the parameters the user configures.
6. A distributed extraction and visual analysis system based on economic field data, characterized by comprising:
a distributed data extraction module: configured to receive a user's instruction to extract big data and send it to a master node; according to the instruction received, the master node splits the large data extraction task into small tasks along one or more field dimensions of the task and distributes the small tasks to different processing nodes; each processing node sends requests to a text retrieval system according to the small tasks distributed to it; the master node stores the generated small tasks in a database and synchronizes the task states during execution;
a distributed data storage module: configured to store the data sets returned by the text retrieval system in a database cluster;
a distributed data computation and analysis module: configured to receive the user's instruction and load the required data sets from the database cluster accordingly, then filter the data, analyze it and compute statistics, and write the result set to the database cluster;
a data loading and caching module: configured to, after receiving a client's request to load data, read the metadata of the task from the database according to the request, create a memory table, load the data from the database cluster into the memory table according to the parameters, and return the result when loading is complete;
a result visualization display module: configured to present the data in intuitive forms such as charts.
7. The distributed extraction and visual analysis system based on economic field data according to claim 6, characterized in that, in the distributed data extraction module, the generated small tasks are assigned task priorities in a certain proportion; the higher a task's priority, the earlier it runs, and tasks of the same priority are executed under a FIFO (first-come, first-served) scheduling policy; according to the configuration parameters of the processing nodes, tasks of different priorities are distributed to different processing nodes in proportion; after the receiving thread of a processing node receives tasks, the scheduling thread adds the received tasks to a task queue using a scheduling algorithm that combines priority scheduling, FIFO scheduling and fair scheduling, performs the data extraction operation according to the parameters of each task, and receives the data.
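The ordering rule in claim 7 (higher priority first, first come first served within one priority level) can be sketched with a heap-based queue. This is only an illustration under stated assumptions: the class name `TaskQueue` is hypothetical, priorities are plain integers where a larger number means more urgent, and the fair-scheduling component and the proportional distribution across nodes are not modeled.

```python
import heapq
import itertools


class TaskQueue:
    """Priority queue with FIFO ordering among tasks of equal priority."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # monotonically increasing arrival number

    def put(self, task, priority):
        # heapq is a min-heap, so the priority is negated to pop the
        # highest priority first; the arrival number breaks ties in FIFO order
        # and also prevents Python from ever comparing the task objects.
        heapq.heappush(self._heap, (-priority, next(self._arrival), task))

    def get(self):
        _, _, task = heapq.heappop(self._heap)
        return task

    def __len__(self):
        return len(self._heap)


queue = TaskQueue()
queue.put("extract-2015-east", priority=1)
queue.put("extract-2016-east", priority=3)
queue.put("extract-2016-west", priority=3)
queue.put("extract-2015-west", priority=2)

while queue:
    print(queue.get())
```

A scheduling thread on a processing node would call `put` for every task handed over by the receiving thread and `get` whenever a worker is free.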
8. The distributed extraction and visual analysis system based on economic field data according to claim 6 or 7, characterized in that, in the distributed data computation and analysis module: after an analysis task instruction carrying query parameters is received from the user, the parameters are parsed according to the mapping between table fields and entity fields and concatenated into an SQL query statement.
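A minimal sketch of the query assembly described in claim 8 follows. The mapping `FIELD_TO_COLUMN`, the table name `economy_stats`, and the function `build_query` are all hypothetical; the claim does not say how values are bound, so the use of `?` placeholders (rather than splicing values into the string) is an assumption made here to avoid SQL injection.

```python
# Hypothetical mapping from entity fields (as sent by the client)
# to table columns (as stored in the database).
FIELD_TO_COLUMN = {
    "regionName": "region_name",
    "year": "stat_year",
    "gdp": "gdp_value",
}

# Table names cannot be bound as parameters, so only known tables are accepted.
ALLOWED_TABLES = {"economy_stats"}


def build_query(table, select_fields, filters):
    """Translate entity-level parameters into an SQL string plus bound values."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unknown table: {table}")
    # A KeyError here means the client sent a field with no mapping.
    columns = [FIELD_TO_COLUMN[f] for f in select_fields]
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    clauses, params = [], []
    for field, value in filters.items():
        clauses.append(f"{FIELD_TO_COLUMN[field]} = ?")
        params.append(value)
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


sql, params = build_query("economy_stats", ["regionName", "gdp"], {"year": 2016})
print(sql)
print(params)
```

The returned pair can be passed directly to a DB-API driver, for example `cursor.execute(sql, params)` with a driver that uses the `qmark` parameter style.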
9. The distributed extraction and visual analysis system based on economic field data according to claim 6 or 7, characterized in that, in the visualization display module, the data needed at the current stage is requested through front-end on-demand loading, and data that has already been requested is cached through a front-end caching mechanism.
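The combination of on-demand loading and caching in claim 9 can be illustrated independently of any front-end framework. The sketch below is written in Python for consistency with the other examples, although a real front end would typically be JavaScript; `OnDemandLoader`, the `(dataset, page)` cache key, and the `fetch` callback are assumptions, not details from the claims.

```python
class OnDemandLoader:
    """Request a piece of data only when it is needed, and only once."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable that performs the back-end request
        self._cache = {}         # (dataset, page) -> previously returned data
        self.backend_calls = 0   # counter kept only to make the effect visible

    def load(self, dataset, page):
        key = (dataset, page)
        if key not in self._cache:
            # Cache miss: this is the only path that reaches the back end.
            self.backend_calls += 1
            self._cache[key] = self._fetch(dataset, page)
        return self._cache[key]


def fake_backend(dataset, page):
    # Stand-in for an HTTP request returning one page of rows.
    return [f"{dataset}-row-{page * 2 + i}" for i in range(2)]


loader = OnDemandLoader(fake_backend)
loader.load("gdp", 0)   # first request goes to the back end
loader.load("gdp", 0)   # repeated request is served from the cache
loader.load("gdp", 1)   # a new page is fetched only when it is asked for
print(loader.backend_calls)
```

A production cache would also need an eviction or invalidation policy, which the claim does not describe and this sketch omits.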
10. The distributed extraction and visual analysis system based on economic field data according to claim 6 or 7, characterized in that, in the visualization display module:
(1) According to the user's instruction of dragging an analysis field onto the dimension axis or the value axis, a request is sent to the back end to obtain the data corresponding to that field;
(2) After the data is obtained, it is displayed in table form;
(3) The selectable chart types are determined and displayed according to the number of dimension-axis fields and the number of value-axis fields;
(4) The configurable parameters of the chart type selected by the user are displayed, and a chart is generated according to the parameters configured by the user and then displayed.
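Step (3) above, choosing the selectable chart types from the field counts, might look like the following sketch. The claims do not state the actual rules, so every rule and chart name below is an invented example of the general idea; only the two inputs (number of dimension-axis fields and number of value-axis fields) come from the text.

```python
def candidate_chart_types(n_dimensions, n_values):
    """Return the chart types offered for a given number of fields.

    The table view is always available, matching step (2), where data is
    first shown in table form. All other rules are illustrative only.
    """
    if n_dimensions < 0 or n_values < 0:
        raise ValueError("field counts must be non-negative")
    if n_values == 0:
        return ["table"]                      # nothing numeric to plot yet
    if n_dimensions == 0:
        return ["table", "indicator card"]    # a single aggregated number
    if n_dimensions == 1 and n_values == 1:
        return ["table", "bar", "line", "pie"]
    if n_dimensions == 1:
        return ["table", "grouped bar", "multi-line"]
    if n_dimensions == 2 and n_values == 1:
        return ["table", "stacked bar", "heat map"]
    return ["table"]                          # fall back for larger combinations


print(candidate_chart_types(1, 1))
print(candidate_chart_types(2, 1))
```

Once the user picks one of the returned types, step (4) would show that type's configurable parameters (for example axis titles or colors) before rendering.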
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711113558.0A CN107766572A (en) | 2017-11-13 | 2017-11-13 | Distributed extraction and visual analysis method and system based on economic field data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711113558.0A CN107766572A (en) | 2017-11-13 | 2017-11-13 | Distributed extraction and visual analysis method and system based on economic field data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107766572A (en) | 2018-03-06 |
Family
ID=61272268
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711113558.0A Pending CN107766572A (en) | 2017-11-13 | 2017-11-13 | Distributed extraction and visual analysis method and system based on economic field data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107766572A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103631922A (en) * | 2013-12-03 | 2014-03-12 | 南通大学 | Hadoop cluster-based large-scale Web information extraction method and system |
US20170140064A1 (en) * | 2015-11-13 | 2017-05-18 | International Business Machines Corporation | Query processing for xml data using big data technology |
CN107291948A (en) * | 2016-09-21 | 2017-10-24 | 广州特道信息科技有限公司 | A kind of access method of distributed newSQL databases |
Cited By (36)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108536778B (en) * | 2018-03-29 | 2020-10-30 | 时时同云科技(成都)有限责任公司 | Data application sharing platform and method |
CN108536778A (en) * | 2018-03-29 | 2018-09-14 | 客如云科技(成都)有限责任公司 | A kind of data application shared platform and method |
CN110309214B (en) * | 2018-04-10 | 2023-06-23 | 腾讯科技(深圳)有限公司 | Instruction execution method and equipment, storage medium and server thereof |
CN110309214A (en) * | 2018-04-10 | 2019-10-08 | 腾讯科技(深圳)有限公司 | A kind of instruction executing method and its equipment, storage medium, server |
CN108595574A (en) * | 2018-04-16 | 2018-09-28 | 上海达梦数据库有限公司 | Connection method, device, equipment and the storage medium of data-base cluster |
CN108763527A (en) * | 2018-05-31 | 2018-11-06 | 北京明朝万达科技股份有限公司 | A kind of searching method and device of business datum |
CN108829884A (en) * | 2018-06-27 | 2018-11-16 | 中国建设银行股份有限公司 | data mapping method and device |
CN109271428A (en) * | 2018-09-11 | 2019-01-25 | 北京市计算中心 | Data pick-up method and method for exhibiting data based on geography information |
CN113407633A (en) * | 2018-09-13 | 2021-09-17 | 华东交通大学 | Distributed data source heterogeneous synchronization method |
CN109241085A (en) * | 2018-09-20 | 2019-01-18 | 潘丽华 | A kind of big data SQL query method for SolrCloud |
CN109241085B (en) * | 2018-09-20 | 2022-06-21 | 郴州职业技术学院 | Big data SQL query method for SolrCloud |
CN109214132A (en) * | 2018-10-30 | 2019-01-15 | 中国运载火箭技术研究院 | A kind of big Throughput Asynchronous task processing system of non-coupled streaming towards LVC emulation |
CN109214132B (en) * | 2018-10-30 | 2023-06-30 | 中国运载火箭技术研究院 | LVC simulation-oriented uncoupled streaming type large-flux asynchronous task processing system |
CN109783717B (en) * | 2018-12-04 | 2022-02-01 | 北京奇艺世纪科技有限公司 | Query task processing method, system, server cluster, device and computer readable storage medium |
CN109783717A (en) * | 2018-12-04 | 2019-05-21 | 北京奇艺世纪科技有限公司 | Query task processing method, system, server cluster and device, computer readable storage medium |
CN109657184A (en) * | 2018-12-19 | 2019-04-19 | 北京创鑫旅程网络技术有限公司 | Rich text processing method, device, server and computer-readable medium |
CN109684399A (en) * | 2018-12-24 | 2019-04-26 | 成都四方伟业软件股份有限公司 | Data bank access method, database access device and Data Analysis Platform |
CN110209512A (en) * | 2019-05-30 | 2019-09-06 | 口碑(上海)信息技术有限公司 | Verification of data method and device based on multi-data source |
CN110222113A (en) * | 2019-06-20 | 2019-09-10 | 中国人民解放军陆军特种作战学院 | A kind of data extraction process visualization method for early warning |
CN110515990B (en) * | 2019-07-23 | 2021-10-01 | 华信永道(北京)科技股份有限公司 | Data query display method and query display system |
CN110515990A (en) * | 2019-07-23 | 2019-11-29 | 华信永道(北京)科技股份有限公司 | Data query methods of exhibiting and inquiry display systems |
CN110457371A (en) * | 2019-08-13 | 2019-11-15 | 杭州有赞科技有限公司 | Data managing method, device, storage medium and system |
CN110636164A (en) * | 2019-09-10 | 2019-12-31 | 广东小天才科技有限公司 | Strange number matching method, device, equipment and storage medium |
CN110636164B (en) * | 2019-09-10 | 2022-07-22 | 广东小天才科技有限公司 | Strange number matching method, device, equipment and storage medium |
CN110851465A (en) * | 2019-11-15 | 2020-02-28 | 腾讯科技(深圳)有限公司 | Data query method and system |
CN111125553B (en) * | 2019-11-22 | 2022-05-31 | 中国科学院城市环境研究所 | Intelligent urban built-up area extraction method supporting multi-source data |
CN111125553A (en) * | 2019-11-22 | 2020-05-08 | 中国科学院城市环境研究所 | Intelligent urban built-up area extraction method supporting multi-source data |
CN111125063A (en) * | 2019-12-20 | 2020-05-08 | 无线生活(杭州)信息科技有限公司 | Method and device for rapidly verifying data migration among clusters |
CN111125063B (en) * | 2019-12-20 | 2023-09-26 | 无线生活(杭州)信息科技有限公司 | Method and device for rapidly checking data migration among clusters |
CN111914008A (en) * | 2020-06-20 | 2020-11-10 | 中国建设银行股份有限公司 | Method and device for batch export of work order data, electronic equipment and medium |
CN112328668A (en) * | 2020-09-10 | 2021-02-05 | 北京锐安科技有限公司 | Data visualization implementation method, device, equipment and storage medium |
CN112347103A (en) * | 2020-11-05 | 2021-02-09 | 深圳市极致科技股份有限公司 | Data synchronization method and device, electronic equipment and storage medium |
CN112347103B (en) * | 2020-11-05 | 2024-04-12 | 深圳市极致科技股份有限公司 | Data synchronization method, device, electronic equipment and storage medium |
CN113158064A (en) * | 2021-05-11 | 2021-07-23 | 两比特(北京)科技有限公司 | Cloud data short video data capturing and statistical summarization analysis algorithm |
CN114741080A (en) * | 2022-04-24 | 2022-07-12 | 北京格睿德思信息科技有限公司 | Big data display method based on artificial intelligence |
CN115549862A (en) * | 2022-12-05 | 2022-12-30 | 大方智造(天津)科技有限公司 | MES system concurrency performance test data receiving method based on dynamic analysis |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107766572A (en) | Distributed extraction and visual analysis method and system based on economic field data | |
US11120344B2 (en) | Suggesting follow-up queries based on a follow-up recommendation machine learning model | |
Emani et al. | Understandable big data: a survey | |
US11461320B2 (en) | Determining a user-specific approach for disambiguation based on an interaction recommendation machine learning model | |
US10885026B2 (en) | Translating a natural language request to a domain-specific language request using templates | |
Mishne et al. | Fast data in the era of big data: Twitter's real-time related query suggestion architecture | |
Kamat et al. | Distributed and interactive cube exploration | |
US7912812B2 (en) | Smart data caching using data mining | |
US11055270B2 (en) | Trash daemon | |
Duggal et al. | Big Data analysis: Challenges and solutions | |
Ma et al. | Big graph search: challenges and techniques | |
US20190034498A1 (en) | Determining a presentation format for search results based on a presentation recommendation machine learning model | |
US11170016B2 (en) | Navigating hierarchical components based on an expansion recommendation machine learning model | |
CN110245178A (en) | Marketing automation management platform system and its management method | |
CN106528169B (en) | A kind of Web system exploitation reusable method based on AnGo Dynamic Evolution Model | |
US10901811B2 (en) | Creating alerts associated with a data storage system based on natural language requests | |
WO2019142052A2 (en) | Elastic distribution queuing of mass data for the use in director driven company assessment | |
Tu et al. | IoT streaming data integration from multiple sources | |
Mohbey et al. | The impact of big data in predictive analytics towards technological development in cloud computing | |
Moulaison et al. | The disruptive qualities of Linked Data in the library environment: Analysis and recommendations | |
CN114996549A (en) | Intelligent tracking method and system based on active object information mining | |
Yan et al. | G-thinker: a general distributed framework for finding qualified subgraphs in a big graph with load balancing | |
US11803543B2 (en) | Lossless switching between search grammars | |
Yu | Data processing and development of big data system: a survey | |
Ahmed et al. | A study of big data and classification of nosql databases |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180306 |