CN106611013A - Information searching method and system - Google Patents
- Publication number
- CN106611013A (application number CN201510705372.9A)
- Authority
- CN
- China
- Prior art keywords
- data
- distributed
- query
- file system
- size
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The invention discloses an information searching method and system. The method comprises: after receiving a search statistics request from a searching user, decomposing the request into tasks to obtain corresponding MapReduce tasks; according to the obtained MapReduce tasks, reading data from the corresponding distributed data storage nodes in a distributed file system, where data in the Hive data warehouse on the distributed file system is stored in RCFile format; performing distributed computation on the data read from each distributed data storage node; merging the computation results of the distributed data storage nodes to obtain a search result; and providing the search result to the searching user. The method and system automatically add cloud computing nodes through automatic resource adaptation, improving the efficiency of searching and analyzing mobile Internet access logs.
Description
Technical field
The present invention relates to the field of mobile communications, and more particularly to an information query method and system.
Background art
In the mobile Internet, mobile terminals such as mobile phones and tablets (PADs) access the network wirelessly through telecom operators. To safeguard public network security, telecom operators retain trace data for Internet service access made via CTNET, CTWAP, or WLAN. This trace data falls mainly into two types: traces left during the authentication and login process when a mobile user accesses the Internet, and traces left while the user uses the Internet after gaining access.
With the rapid development of the mobile Internet and the spread of smartphones, the volume of retained mobile Internet access trace data has grown from the GB scale to the TB scale. For example, Fujian Telecom, with 9 million C-network users, generated 700 GB of raw online trace data per day in January 2014. Because the Ministry of Industry and Information Technology requires this data to be kept for at least three months, the total volume is about 70 TB and still growing.
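The retention figure can be sanity-checked with simple arithmetic. This sketch assumes a constant daily volume, approximates three months as 90 days, and uses decimal units (1 TB = 1000 GB); it shows that the 700 GB/day baseline alone accounts for most of the roughly 70 TB quoted.

```python
# Daily raw trace volume reported for Fujian Telecom in January 2014.
DAILY_VOLUME_GB = 700
# Minimum retention period; assumption: three months is approximated as 90 days.
RETENTION_DAYS = 90


def retained_volume_tb(daily_gb: float, days: int) -> float:
    """Total retained volume in TB (decimal units) for a constant daily volume."""
    return daily_gb * days / 1000


total_tb = retained_volume_tb(DAILY_VOLUME_GB, RETENTION_DAYS)
print(f"{total_tb:.0f} TB")  # prints "63 TB"
```

The gap between about 63 TB and the quoted 70 TB is consistent with the growth trend the text describes.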
In the existing architecture, online trace data is matched and associated and then loaded into a relational database, which supports queries and statistical analysis of users' Internet behavior. When the volume of data to be traversed exceeds 10 TB, the centralized relational database system becomes slow at locating data: retrieving one user's online trace data for a single week takes nearly 6 hours, and macro-level analysis of user network behavior cannot be completed under this architecture at all. Although the Internet industry already uses Hadoop big data technology extensively for user behavior analysis, the rapid growth of business and the resulting massive increase in data to be analyzed still force operators to adjust cloud computing and storage resources manually and reactively.
Therefore, an efficient method for retaining and retrieving mobile Internet access trace data is needed to solve the above technical problems.
Summary of the invention
The technical problem addressed by the present disclosure is how to provide an efficient method for retaining and retrieving mobile Internet access trace data that solves the problems of mass data storage and retrieval in the prior art.
The disclosure provides an information query method, comprising: after receiving a query statistics request sent by a querying user, decomposing the request into tasks to obtain corresponding MapReduce tasks; according to the obtained MapReduce tasks, reading data from the corresponding distributed data storage nodes in a distributed file system, where data in the Hive data warehouse on the distributed file system is stored in RCFile format; performing distributed computation on the data read from each distributed data storage node; merging the computation results of the distributed data storage nodes to obtain a query result; and providing the query result to the querying user.
Further, the method includes: collecting mobile users' online trace data in real time; and loading the collected online trace data into the Hive data warehouse in the distributed file system.
Further, the step of loading the collected online trace data into the Hive data warehouse in the distributed file system also includes:
when creating a data table in the Hive data warehouse, determining the number of buckets according to the number of tasks into which query statistics requests are decomposed and the system capacity.
Further, the number of buckets, Buckets, is calculated using the formula
Buckets = min(data_total_size / dfs.block.size, map_count)
where min() returns the smaller of its arguments, data_total_size is the total volume of online trace data, dfs.block.size is the file block size configured in the distributed file system, and map_count is the number of tasks into which a query statistics request is decomposed.
Further, the online trace data includes authentication information and Internet access information uploaded by DPI devices, authentication information and Internet access information uploaded by WAP gateways, and NAT (network address translation) information uploaded by firewall SYSLOG log servers.
The present invention also provides an information query system comprising an interface unit, a query driving unit, a data processing unit, and a distributed file system, wherein: the interface unit is configured to receive a query statistics request sent by a querying user; the query driving unit is configured to, after the interface unit receives the request, decompose it into tasks to obtain corresponding MapReduce tasks and provide the obtained MapReduce tasks to the data processing unit; the data processing unit is configured to read data from the corresponding distributed data storage nodes in the distributed file system according to the obtained MapReduce tasks, perform distributed computation on the data read from each node, merge the computation results of the nodes to obtain a query result, and instruct the interface unit to provide the query result to the querying user; and the distributed file system is configured to store data in a distributed manner, wherein data in the Hive data warehouse in the distributed file system is stored in RCFile format.
Further, the system also includes a collecting unit and a data loading unit, wherein:
the collecting unit is configured to collect mobile users' online trace data in real time;
the data loading unit is configured to load the online trace data collected by the collecting unit into the Hive data warehouse in the distributed file system.
Further, when creating a data table in the Hive data warehouse, the data loading unit determines the number of buckets according to the number of tasks into which query statistics requests are decomposed and the system capacity.
Further, the data loading unit calculates the number of buckets, Buckets, using the formula
Buckets = min(data_total_size / dfs.block.size, map_count)
where min() returns the smaller of its arguments, data_total_size is the total volume of online trace data, dfs.block.size is the file block size configured in the distributed file system, and map_count is the number of tasks into which a query statistics request is decomposed.
Further, the online trace data includes authentication information and Internet access information uploaded by DPI devices, authentication information and Internet access information uploaded by WAP gateways, and NAT information uploaded by firewall SYSLOG log servers.
The information query method and system provided by the disclosure automatically add cloud computing nodes through automatic resource adaptation, improving the efficiency of querying and analyzing mobile Internet access logs.
Description of the drawings
Fig. 1 is a flowchart of an information query method according to an embodiment of the invention.
Fig. 2 is a schematic structural diagram of an information query system according to an embodiment of the invention.
Fig. 3 is a schematic flowchart of an information query method according to an embodiment of the invention.
Fig. 4 illustrates the results of an information query method according to an embodiment of the invention.
Fig. 5 is a block diagram of an information query system according to an embodiment of the invention.
Fig. 6 is a block diagram of an information query system according to another embodiment of the invention.
Detailed description of embodiments
The present invention will now be described more fully with reference to the accompanying drawings, in which exemplary embodiments of the invention are shown.
Fig. 1 is a flowchart of an information query method according to an embodiment of the invention. As shown in Fig. 1, the method mainly includes the following steps.
Step 100: after receiving a query statistics request sent by a querying user, decompose the request into tasks to obtain the corresponding MapReduce tasks.
Step 102: according to the obtained MapReduce tasks, read data from the corresponding distributed data storage nodes in the distributed file system. Data in the Hive data warehouse on the distributed file system is stored in RCFile format.
In one embodiment, mobile users' online trace data is collected in real time, and the collected data is loaded into the Hive data warehouse in the distributed file system.
In one embodiment, the online trace data includes authentication information and Internet access information uploaded by DPI devices, authentication information and Internet access information uploaded by WAP gateways, and NAT information uploaded by firewall SYSLOG log servers.
In one embodiment, when a data table is created in the Hive data warehouse, the number of buckets is determined according to the number of tasks into which query statistics requests are decomposed and the system capacity.
In one embodiment, the number of buckets, Buckets, can be calculated using the formula
Buckets = min(data_total_size / dfs.block.size, map_count)
where min() returns the smaller of its arguments, data_total_size is the total volume of online trace data, dfs.block.size is the file block size configured in the distributed file system, and map_count is the number of tasks into which a query statistics request is decomposed. In this way, cloud computing nodes are added automatically through resource adaptation, which facilitates later retrieval.
Step 104: perform distributed computation on the data read from each distributed data storage node. Specifically, the computation can be distributed across the storage nodes formed by the bucketing algorithm, which improves the efficiency of querying and analyzing mobile Internet access logs.
Step 106: merge the computation results of the distributed data storage nodes to obtain the query result.
Step 108: provide the query result to the querying user.
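Steps 100 to 108 can be sketched as a single-process map/reduce over bucketed data. This is a minimal sketch under stated assumptions: each bucket stands in for one distributed storage node's data, the record layout (user_id, destination host) and the example statistic (hosts visited by one user) are hypothetical, and the map tasks run sequentially here rather than in parallel on cluster nodes.

```python
from collections import Counter
from typing import Iterable

# Hypothetical record layout: (user_id, destination_host).
Record = tuple[str, str]


def decompose(buckets: dict[int, list[Record]]) -> list[int]:
    """Step 100: one map task per bucket (assumption: a bucket is one input split)."""
    return sorted(buckets)


def map_task(records: Iterable[Record], user_id: str) -> Counter:
    """Steps 102-104: scan one bucket and count hosts visited by user_id."""
    return Counter(host for uid, host in records if uid == user_id)


def merge(partials: Iterable[Counter]) -> Counter:
    """Step 106: merge per-node partial counts into one query result."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # update() adds counts rather than replacing them
    return total


def run_query(buckets: dict[int, list[Record]], user_id: str) -> Counter:
    tasks = decompose(buckets)
    partials = [map_task(buckets[b], user_id) for b in tasks]
    return merge(partials)  # Step 108: this result is returned to the user


buckets = {
    0: [("u1", "a.example"), ("u2", "b.example")],
    1: [("u1", "a.example"), ("u1", "c.example")],
}
print(run_query(buckets, "u1"))  # Counter({'a.example': 2, 'c.example': 1})
```

Because counting is associative, partial results from any number of nodes can be merged in any order, which is what makes step 106 safe to run after independent per-node computation.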
The information query method provided by this embodiment makes full use of the technical advantages of both the cloud computing resource pool and Hadoop distributed big data processing. It applies an optimized data bucketing algorithm in the Hive data warehouse and stores online trace data in the compressed RCFile format. Compared with Hadoop's default TextFile format, the compressed RCFile storage format in Hive saves about 2/3 of the storage space for the same amount of data; compared with Hive's default unbucketed storage, the optimized bucketing algorithm markedly reduces business query time.
Fig. 2 is a schematic structural diagram of an information query system according to an embodiment of the invention. The invention builds a Hadoop distributed big data processing system on an x86 cloud computing resource pool, which automatically adds cloud computing nodes through automatic resource adaptation and improves the efficiency of querying and analyzing mobile Internet access logs. As shown in Fig. 2, the system includes a log collection module 27, an x86 cloud computing resource pool 21, an HDFS distributed file system 22, MapReduce 23, a PIC query module 25, and a HIVE statistical analysis module 26. The log collection module 27 is responsible for collecting mobile Internet access trace data. It is deployed on physical equipment with high-speed network access and collects and associates the authentication information and Internet access information uploaded by packet-domain DPI devices, the authentication information and Internet access information (including proxy information) uploaded by WAP gateways, and the NAT information uploaded by firewall SYSLOG log servers.
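The source does not say how the log collection module associates NAT records with authentication records. One plausible approach, shown here purely as an assumption, is a time-window join: a NAT translation is attributed to whichever subscriber held the same private address at that moment. The record classes and all field names below are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AuthSession:
    """Hypothetical authentication-log record (e.g. from a DPI device or WAP gateway)."""
    user_id: str
    private_ip: str
    start: int  # login time, epoch seconds
    end: int    # logout time, epoch seconds


@dataclass(frozen=True)
class NatEvent:
    """Hypothetical firewall SYSLOG NAT record."""
    private_ip: str
    public_ip: str
    public_port: int
    ts: int  # translation time, epoch seconds


def associate(event: NatEvent, sessions: list[AuthSession]) -> Optional[str]:
    """Return the user whose session on event.private_ip covers event.ts, else None.

    A linear scan keeps the sketch short; a real loader would index sessions by IP.
    """
    for s in sessions:
        if s.private_ip == event.private_ip and s.start <= event.ts <= s.end:
            return s.user_id
    return None


sessions = [
    AuthSession("u1", "10.0.0.5", start=100, end=200),
    AuthSession("u2", "10.0.0.5", start=201, end=300),  # address reused later
]
nat = NatEvent("10.0.0.5", "203.0.113.7", 40001, ts=250)
print(associate(nat, sessions))  # u2
```

The time check matters because private addresses are reassigned between sessions; matching on IP alone would attribute the example event to the wrong subscriber.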
The big data query analysis module is deployed on the x86 cloud computing resource pool 21 and makes full use of the elastic computing resource scheduling of cloud IaaS. It loads the associated trace data from the mobile online trace data collection module into the Hive data warehouse, where data is stored in RCFile format. File storage formats commonly used on the Hadoop platform include TextFile, which supports text, and SequenceFile, which supports binary data; both are row-oriented storage formats. The RCFile (Record Columnar File) storage structure follows the design principle of "first partition horizontally, then partition vertically".
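The "horizontal first, then vertical" principle can be illustrated with a toy layout: rows are first cut into row groups, and each row group is then stored column by column. This is a simplified model for illustration only, not the real RCFile on-disk format, which also stores per-group metadata and compresses each column; the function names are hypothetical.

```python
def rcfile_layout(rows: list[tuple], row_group_size: int) -> list[list[list]]:
    """Partition rows horizontally into row groups, then store each group by column."""
    groups = []
    for start in range(0, len(rows), row_group_size):
        group = rows[start:start + row_group_size]    # horizontal split
        columns = [list(col) for col in zip(*group)]  # vertical split (transpose)
        groups.append(columns)
    return groups


def read_column(groups: list[list[list]], col_index: int) -> list:
    """Read one column across all row groups without touching the other columns."""
    return [value for group in groups for value in group[col_index]]


rows = [("u1", "a.example", 10), ("u2", "b.example", 20), ("u3", "c.example", 30)]
groups = rcfile_layout(rows, row_group_size=2)
print(groups)
# [[['u1', 'u2'], ['a.example', 'b.example'], [10, 20]], [['u3'], ['c.example'], [30]]]
print(read_column(groups, 1))  # ['a.example', 'b.example', 'c.example']
```

Keeping whole rows inside one row group preserves row-store style loading, while the per-column lists within each group let a scan skip unneeded columns and let similar values sit together for compression.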
First, RCFile offers data loading speed and workload adaptability comparable to row storage. Second, RCFile's read optimization avoids reading unnecessary columns during table scans, and tests show that in most cases it performs better than other structures. Third, RCFile compresses data column by column, which effectively improves storage utilization; in general, compared with Hadoop's default TextFile format, the compressed RCFile storage format in Hive saves about 2/3 of the storage space for the same amount of data.
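Applying the reported saving of about 2/3 to the roughly 70 TB retention volume in the Fujian Telecom example gives a rough estimate of the RCFile footprint. This assumes the saving holds uniformly across the whole dataset, which the source does not state explicitly; the helper name is hypothetical.

```python
def rcfile_size_tb(textfile_tb: float, saving_fraction: float = 2 / 3) -> float:
    """Estimate storage after compression, given the fraction of space saved."""
    return textfile_tb * (1 - saving_fraction)


TEXTFILE_TB = 70  # total retained volume quoted for Fujian Telecom
print(f"{rcfile_size_tb(TEXTFILE_TB):.1f} TB")  # prints "23.3 TB"
```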
When a data table is created, the number of buckets is computed as follows:
Buckets = min(data_total_size / dfs.block.size, map_count)
where Buckets is the number of buckets; data_total_size is the total data size; dfs.block.size is the file block size configured in HDFS; and map_count is the number of tasks into which a business query is decomposed.
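The formula can be written as a small helper that also emits the corresponding HiveQL table definition. Assumptions: the source does not say how the division is rounded, so this sketch rounds up (a partial block still counts as one) and never returns fewer than one bucket; the table name, columns, and clustering column are hypothetical. `CLUSTERED BY ... INTO n BUCKETS` and `STORED AS RCFILE` are standard HiveQL clauses.

```python
def bucket_count(data_total_size: int, block_size: int, map_count: int) -> int:
    """Buckets = min(data_total_size / dfs.block.size, map_count).

    Assumption: the division is rounded up and the result is at least 1.
    """
    blocks = -(-data_total_size // block_size)  # ceiling division on integers
    return max(1, min(blocks, map_count))


def create_table_ddl(table: str, columns: dict[str, str],
                     cluster_col: str, buckets: int) -> str:
    """Build a HiveQL CREATE TABLE statement for a bucketed RCFile table."""
    cols = ", ".join(f"{name} {col_type}" for name, col_type in columns.items())
    return (f"CREATE TABLE {table} ({cols}) "
            f"CLUSTERED BY ({cluster_col}) INTO {buckets} BUCKETS "
            f"STORED AS RCFILE")


MiB = 1024 ** 2
n = bucket_count(data_total_size=300 * MiB, block_size=128 * MiB, map_count=60)
print(n)  # 3: ceil(300 / 128) = 3 blocks, fewer than the 60 map tasks
print(create_table_ddl("trace_log",  # hypothetical table and schema
                       {"user_id": "STRING", "host": "STRING", "ts": "BIGINT"},
                       "user_id", n))
```

For 1 TiB of data with the same 128 MiB blocks, the block count (8192) exceeds 60, so map_count caps the result at 60 buckets. The min() thus keeps buckets from being smaller than a block on small tables and from outnumbering the available parallel tasks on large ones.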
Determining the number of buckets with this formula when creating a data table in the Hive data warehouse takes full account of how subsequent business queries will be decomposed and of the existing system capacity. Repeated tests on equal amounts of data showed that it achieves an optimal balance between time and resources and effectively shortens subsequent retrieval time.
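One reason bucketing can shorten retrieval is that rows with the same key always land in the same bucket, so a lookup on that key needs to scan only one bucket file. The sketch below shows the idea with hash(key) mod Buckets. Hive uses its own hash function; zlib.crc32 is substituted here only because it is deterministic across Python runs (Python's built-in hash of a string is randomized per process). The function names are hypothetical.

```python
import zlib


def bucket_of(key: str, buckets: int) -> int:
    """Assign a key to a bucket: hash(key) mod buckets (CRC32 as a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % buckets


def partition_into_buckets(records: list[tuple[str, str]],
                           buckets: int) -> list[list[tuple[str, str]]]:
    """Write each (user_id, host) record into the bucket chosen by its user_id."""
    files: list[list[tuple[str, str]]] = [[] for _ in range(buckets)]
    for rec in records:
        files[bucket_of(rec[0], buckets)].append(rec)
    return files


def lookup_user(files: list[list[tuple[str, str]]], user_id: str) -> list[tuple[str, str]]:
    """Bucket pruning: scan only the single bucket that can contain user_id."""
    target = files[bucket_of(user_id, len(files))]
    return [rec for rec in target if rec[0] == user_id]


records = [("u1", "a.example"), ("u2", "b.example"), ("u1", "c.example")]
files = partition_into_buckets(records, buckets=4)
print(lookup_user(files, "u1"))  # [('u1', 'a.example'), ('u1', 'c.example')]
```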
Fig. 3 is a schematic flowchart of an information query method according to an embodiment of the invention. As shown in Fig. 3, the method includes the following steps.
Step 301: the user sends a query statistics request through Web interface 31.
Web interface 31 processes the query statistics request sent by the user and forwards it to Hive Drive 32.
Step 302: Hive Drive 32 acts as the task decomposition engine.
Specifically, Hive Drive 32 decomposes and translates the query statistics request into MapReduce tasks.
Step 303: MapReduce 33 executes the various MapReduce tasks according to the dependencies among them.
Specifically, each MapReduce task is serialized into a plan.xml file, which is then loaded into the job cache. Each part deserializes plan.xml, performs the relevant operations, and writes its results to a temporary location; a DML (data manipulation language) operation then moves the results to the specified location.
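The serialize, deserialize, then execute-in-dependency-order sequence of step 303 can be sketched with Python's standard graphlib. Hive's real plan.xml is a serialized Java object graph; the XML layout, stage names, and helper functions below are hypothetical and only mirror the sequence described above.

```python
import xml.etree.ElementTree as ET
from graphlib import TopologicalSorter


def to_plan_xml(deps: dict[str, set[str]]) -> str:
    """Serialize stage dependencies to a toy plan document."""
    root = ET.Element("plan")
    for stage, parents in sorted(deps.items()):
        node = ET.SubElement(root, "stage", id=stage)
        for parent in sorted(parents):
            ET.SubElement(node, "dependsOn", id=parent)
    return ET.tostring(root, encoding="unicode")


def from_plan_xml(text: str) -> dict[str, set[str]]:
    """Deserialize the toy plan back into {stage: {prerequisite stages}}."""
    root = ET.fromstring(text)
    return {node.get("id"): {d.get("id") for d in node.findall("dependsOn")}
            for node in root.findall("stage")}


def execution_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group stages into waves; every stage in a wave depends only on earlier waves."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


deps = {
    "map_bucket_0": set(),
    "map_bucket_1": set(),
    "merge": {"map_bucket_0", "map_bucket_1"},
    "move_to_target": {"merge"},  # the final DML move of the result
}
plan = to_plan_xml(deps)
print(execution_waves(from_plan_xml(plan)))
# [['map_bucket_0', 'map_bucket_1'], ['merge'], ['move_to_target']]
```

Stages within one wave have no mutual dependencies, so in a cluster they could run in parallel; only the wave boundaries impose ordering.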
Step 304: HDFS (Hadoop Distributed File System) provides the data on the distributed data storage nodes for distributed computation; the data on these nodes is organized according to the bucketing algorithm.
Step 305: MapReduce 33 merges the computation results of the nodes.
Step 306: Hive Drive 32 returns the result for display to Web interface 31, which presents it to the querying user.
Fig. 4 illustrates the results of the information query method according to an embodiment of the invention, taking the collection and analysis of mobile Internet access trace data at Fujian Telecom as an example. Fujian Telecom's 9 million C-network users generated 700 GB of raw online trace data per day in January 2014; under the Ministry of Industry and Information Technology requirement of at least three months' retention, the total volume is about 70 TB. In the architecture of the invention, the log collection module is connected to each data source over high-speed optical fiber and runs on physical servers to aggregate and associate the data.
The core modules of the invention are hosted on Fujian Telecom's business cloud computing resource pool, using the VMware vSphere virtualization platform with six processing nodes. With data stored according to the bucketing algorithm of this embodiment, behavior analysis was performed with models such as the user roaming access model, the hand-basket proximity model, and the top access destination model. For a query over 10 billion data entries, the test query took 1201 seconds, a substantial improvement over the original centralized relational database system.
Fig. 5 is a block diagram of an information query system according to an embodiment of the invention. System 500 includes an interface unit 501, a query driving unit 502, a data processing unit 503, and a distributed file system 504, wherein: interface unit 501 is configured to receive a query statistics request sent by a querying user; query driving unit 502 is configured to, after interface unit 501 receives the request, decompose it into tasks to obtain corresponding MapReduce tasks and provide the obtained MapReduce tasks to data processing unit 503; data processing unit 503 is configured to read data from the corresponding distributed data storage nodes in the distributed file system according to the obtained MapReduce tasks, perform distributed computation on the data read from each node, merge the computation results of the nodes to obtain a query result, and instruct interface unit 501 to provide the query result to the querying user; and distributed file system 504 is configured to store data in a distributed manner, wherein data in the Hive data warehouse in the distributed file system is stored in RCFile format.
In one embodiment, interface unit 501 may be a Web interface, query driving unit 502 may be Hive Drive, data processing unit 503 may be MapReduce, and distributed file system 504 may be HDFS.
In one embodiment, the system further includes a collecting unit 505 and a data loading unit 506, wherein collecting unit 505 is configured to collect mobile users' online trace data in real time, and data loading unit 506 is configured to load the online trace data collected by collecting unit 505 into the Hive data warehouse in the distributed file system.
In one embodiment, when creating a data table in the Hive data warehouse, data loading unit 506 determines the number of buckets according to the number of tasks into which query statistics requests are decomposed and the system capacity.
In one embodiment, data loading unit 506 calculates the number of buckets, Buckets, using the formula
Buckets = min(data_total_size / dfs.block.size, map_count)
where min() returns the smaller of its arguments, data_total_size is the total volume of online trace data, dfs.block.size is the file block size configured in the distributed file system, and map_count is the number of tasks into which a query statistics request is decomposed.
In one embodiment, the online trace data includes authentication information and Internet access information uploaded by DPI devices, authentication information and Internet access information uploaded by WAP gateways, and NAT information uploaded by firewall SYSLOG log servers.
Fig. 6 is a block diagram of an information query system according to another embodiment of the invention. Information query system 600 may be a host server with computing capability, a personal computer (PC), a portable computer, a mobile terminal, or another terminal. The specific embodiments of the invention do not limit the concrete implementation of the computing node.
Information query system 600 includes a processor 601, a communications interface 602, a memory 603, and a bus 604. Processor 601, communications interface 602, and memory 603 communicate with one another through bus 604.
Communications interface 602 is used to communicate with network devices, such as a virtual machine management center or shared storage.
Processor 601 is used to execute a program. Processor 601 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the invention.
Memory 603 is used to store files. Memory 603 may include high-speed RAM and may also include non-volatile memory, for example at least one disk storage device. Memory 603 may also be a memory array, and may be partitioned into blocks that can be combined into virtual volumes according to certain rules.
In one embodiment, the program may be program code including computer operation instructions. The program may specifically be used to: after receiving a query statistics request sent by a querying user, decompose the request into tasks to obtain corresponding MapReduce tasks; according to the obtained MapReduce tasks, read data from the corresponding distributed data storage nodes in the distributed file system, where data in the Hive data warehouse in the distributed file system is stored in RCFile format; perform distributed computation on the data read from each distributed data storage node; merge the computation results of the distributed data storage nodes to obtain a query result; and provide the query result to the querying user.
In a specific embodiment, the method further includes: collecting mobile users' online trace data in real time; and loading the collected online trace data into the Hive data warehouse in the distributed file system.
In a specific embodiment, the step of loading the collected online trace data into the Hive data warehouse in the distributed file system also includes: when creating a data table in the Hive data warehouse, determining the number of buckets according to the number of tasks into which query statistics requests are decomposed and the system capacity.
In a specific embodiment, the number of buckets, Buckets, is calculated using the formula
Buckets = min(data_total_size / dfs.block.size, map_count)
where min() returns the smaller of its arguments, data_total_size is the total volume of online trace data, dfs.block.size is the file block size configured in the distributed file system, and map_count is the number of tasks into which a query statistics request is decomposed.
In a specific embodiment, the online trace data includes authentication information and Internet access information uploaded by DPI devices, authentication information and Internet access information uploaded by WAP gateways, and NAT information uploaded by firewall SYSLOG log servers.
Those of ordinary skill in the art will appreciate that the exemplary units and algorithm steps described in connection with the embodiments herein can be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the invention.
If the functions are implemented in the form of computer software and sold or used as an independent product, all or part of the technical solution (for example, the part that contributes over the prior art) can to some extent be regarded as embodied in the form of a computer software product. The computer software product is typically stored in a computer-readable non-volatile storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods of the embodiments of the invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a portable hard drive, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
The description of the invention is given for purposes of illustration and description and is not intended to be exhaustive or to limit the invention to the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiments were chosen and described to better explain the principles and practical application of the invention and to enable others of ordinary skill in the art to understand the invention and design various embodiments with various modifications suited to particular uses.
Claims (10)
1. An information query method, characterized by comprising:
after receiving a query statistics request sent by a querying user, decomposing the query statistics request into tasks to obtain corresponding MapReduce tasks;
according to the obtained MapReduce tasks, reading data from corresponding distributed data storage nodes in a distributed file system, wherein data in a Hive data warehouse in the distributed file system is stored in RCFile format;
performing distributed computation on the data read from each distributed data storage node;
merging the computation results of the distributed data storage nodes to obtain a query result;
providing the query result to the querying user.
2. The method according to claim 1, characterized by further comprising:
collecting online trace data of mobile users in real time;
loading the collected online trace data into the Hive data warehouse in the distributed file system.
3. The method according to claim 2, characterized in that
the step of loading the collected online trace data into the Hive data warehouse in the distributed file system further comprises:
when creating a data table in the Hive data warehouse, determining a number of buckets according to the number of tasks into which query statistics requests are decomposed and the system capacity.
4. The method according to claim 3, characterized in that
the number of buckets, Buckets, is calculated using the formula
Buckets = min(data_total_size / dfs.block.size, map_count)
wherein min() returns the smaller of its arguments, data_total_size is the total volume of online trace data, dfs.block.size is the file block size configured in the distributed file system, and map_count is the number of tasks into which a query statistics request is decomposed.
5. The method according to claim 2, characterized in that
the online trace data comprises authentication information and Internet access information uploaded by DPI devices, authentication information and Internet access information uploaded by WAP gateways, and NAT information uploaded by firewall SYSLOG log servers.
6. An information query system, characterized by comprising an interface unit, a query driving unit, a data processing unit, and a distributed file system, wherein:
the interface unit is configured to receive a query statistics request sent by a querying user;
the query driving unit is configured to, after the interface unit receives the query statistics request sent by the querying user, decompose the query statistics request into tasks to obtain corresponding MapReduce tasks, and provide the obtained MapReduce tasks to the data processing unit;
the data processing unit is configured to read data from corresponding distributed data storage nodes in the distributed file system according to the obtained MapReduce tasks, perform distributed computation on the data read from each distributed data storage node, merge the computation results of the distributed data storage nodes to obtain a query result, and instruct the interface unit to provide the query result to the querying user;
the distributed file system is configured to store data in a distributed manner, wherein data in a Hive data warehouse in the distributed file system is stored in RCFile format.
7. The system according to claim 6, characterized by further comprising a collecting unit and a data loading unit, wherein:
the collecting unit is configured to collect online trace data of mobile users in real time;
the data loading unit is configured to load the online trace data collected by the collecting unit into the Hive data warehouse in the distributed file system.
8. The system according to claim 7, characterised in that
when creating a table in the Hive data warehouse, the data loading unit determines the number of buckets according to the number of tasks into which the query statistic request is decomposed and the system capability.
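Claim 8 fixes the bucket count when the Hive table is created. In HiveQL, that count appears in the `CLUSTERED BY (...) INTO n BUCKETS` clause, and `STORED AS RCFILE` selects the RcFile format named in claim 6. The Python helper below only assembles such a statement. The helper name, the table name, the columns, and the choice of the subscriber number (`msisdn`) as the bucketing column are hypothetical. The bucket count 40 stands in for whatever value the system computes.

```python
def build_bucketed_table_ddl(table: str, columns: dict, cluster_col: str, buckets: int) -> str:
    """Return a HiveQL CREATE TABLE statement for a bucketed RCFile table.

    Identifiers are not quoted or escaped, so this is only for trusted names.
    """
    if cluster_col not in columns:
        raise ValueError(f"bucketing column {cluster_col!r} is not a table column")
    if buckets < 1:
        raise ValueError("bucket count must be at least 1")
    column_defs = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns.items())
    return (
        f"CREATE TABLE {table} (\n  {column_defs}\n)\n"
        f"CLUSTERED BY ({cluster_col}) INTO {buckets} BUCKETS\n"
        "STORED AS RCFILE"
    )


print(build_bucketed_table_ddl(
    "online_trace",
    {"msisdn": "STRING", "event_time": "BIGINT", "url": "STRING"},
    "msisdn",
    40,
))
```

Bucketing on a per-user key places all of one user's records in the same bucket file. That layout suits per-user trace lookups, but it is a design assumption rather than something the claim specifies.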
9. The system according to claim 8, characterised in that
the data loading unit uses the formula
Buckets = min(data_total_size / dfs.block.size, map_count)
to calculate the number of buckets, Buckets, where min() returns the smaller of its arguments, data_total_size is the total amount of online trace data, dfs.block.size is the file block size configured in the distributed file system, and map_count is the number of tasks into which the query statistic request is decomposed.
10. The method according to claim 7, characterised in that
the online trace data includes authentication information and Internet access information uploaded by DPI-type devices, authentication information and Internet access information uploaded by WAP-gateway-type devices, and network address translation (NAT) information uploaded by the firewall SYSLOG log server.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510705372.9A CN106611013A (en) | 2015-10-27 | 2015-10-27 | Information searching method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106611013A | 2017-05-03 |
Family ID: 58615498
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510705372.9A Pending CN106611013A (en) | 2015-10-27 | 2015-10-27 | Information searching method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106611013A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109408559A (en) * | 2018-10-09 | 2019-03-01 | 北京易观智库网络科技有限公司 | Method, apparatus and storage medium for retention analysis |
CN110222533A (en) * | 2019-06-17 | 2019-09-10 | 英联(厦门)智能数据有限公司 | Distributed data security application method, system and electronic equipment |
CN110929081A (en) * | 2019-11-28 | 2020-03-27 | 浙江大华技术股份有限公司 | Picture query method, computer equipment and storage medium |
CN111046013A (en) * | 2019-11-12 | 2020-04-21 | 上海麦克风文化传媒有限公司 | Cold data full storage and query architecture |
CN111797310A (en) * | 2020-06-19 | 2020-10-20 | 北京达佳互联信息技术有限公司 | Behavior review method and device, electronic equipment and storage medium |
WO2021148014A1 (en) * | 2020-01-23 | 2021-07-29 | 飞诺门阵(北京)科技有限公司 | Task processing method and apparatus, and electronic device |
RU2794969C1 (en) * | 2020-01-23 | 2023-04-26 | Новнет Компютинг Систем Тек Ко., Лтд. | Method, device and electronic device for task processing |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101183368A (en) * | 2007-12-06 | 2008-05-21 | 华南理工大学 | Method and system for distributed computing and querying of massive data in online analytical processing |
CN103678520A (en) * | 2013-11-29 | 2014-03-26 | 中国科学院计算技术研究所 | Multi-dimensional interval query method and system based on cloud computing |
CN104462609A (en) * | 2015-01-06 | 2015-03-25 | 福州大学 | RDF data storage and query method combined with star figure coding |
2015
- 2015-10-27: Application CN201510705372.9A filed in CN (CN106611013A), active, Pending
Non-Patent Citations (1)
Title |
---|
TOM WHITE: "Hadoop: The Definitive Guide (3rd Edition)" (《Hadoop权威指南(第3版)》), 31 January 2015, Tsinghua University Press (清华大学出版社) * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109408559A (en) * | 2018-10-09 | 2019-03-01 | 北京易观智库网络科技有限公司 | Method, apparatus and storage medium for retention analysis |
CN110222533A (en) * | 2019-06-17 | 2019-09-10 | 英联(厦门)智能数据有限公司 | Distributed data security application method, system and electronic equipment |
CN110222533B (en) * | 2019-06-17 | 2021-08-13 | 英联(厦门)金融技术服务股份有限公司 | Distributed data security application method and system and electronic equipment |
CN111046013A (en) * | 2019-11-12 | 2020-04-21 | 上海麦克风文化传媒有限公司 | Cold data full storage and query architecture |
CN111046013B (en) * | 2019-11-12 | 2024-04-12 | 上海麦克风文化传媒有限公司 | Cold data full-quantity storage and query architecture |
CN110929081A (en) * | 2019-11-28 | 2020-03-27 | 浙江大华技术股份有限公司 | Picture query method, computer equipment and storage medium |
WO2021148014A1 (en) * | 2020-01-23 | 2021-07-29 | 飞诺门阵(北京)科技有限公司 | Task processing method and apparatus, and electronic device |
RU2794969C1 (en) * | 2020-01-23 | 2023-04-26 | Новнет Компютинг Систем Тек Ко., Лтд. | Method, device and electronic device for task processing |
US11706097B2 (en) | 2020-01-23 | 2023-07-18 | Novnet Computing System Tech Co., Ltd. | Task processing method applied to network topology, electronic device and storage medium |
CN111797310A (en) * | 2020-06-19 | 2020-10-20 | 北京达佳互联信息技术有限公司 | Behavior review method and device, electronic equipment and storage medium |
Similar Documents
Publication | Title |
---|---|
CN106611013A | Information searching method and system |
CN102436513B | Distributed search method and system |
CN103177055B | Hybrid database table stored as both row store and column store |
CN110431545A | Executing queries over structured and unstructured data |
CN107038207A | Data query method, data processing method and device |
CN109902216A | Data collection and analysis method based on social networks |
CN106209989B | Spark-platform-based concurrent computing system and method for spatial data |
CN104268143B | Method and apparatus for processing XML data |
CN104516979A | Data query method and data query system based on quadratic search |
CN104598557A | Method and device for data rasterization and method and device for user behavior analysis |
CN104408100B | Compression method for structured website logs |
CN103514205A | Mass data processing method and system |
CN107391502A | Time-interval data query method and apparatus, and index construction method and device |
CN109151824A | Library data service extension system and method based on a 5G architecture |
CN107480205A | Method and apparatus for data partitioning |
CN106021583A | Statistical method and system for page flow data |
CN106570153A | Data extraction method and system for mass URLs |
CN105786941B | Information mining method and device |
CN109903122A | Real estate transaction information processing method, device, equipment and storage medium |
CN116166191A | Integrated lakehouse system |
Xia et al. | Optimizing an index with spatiotemporal patterns to support GEOSS Clearinghouse |
CN108021607A | Offline analysis method for wireless city audit data based on a big data platform |
CN108804502A | Big data query system, method, computer equipment and storage medium |
CN106156021A | Spatio-temporal correlation information generation method and server performing the same |
CN106570152A | Mobile phone number volume extracting method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20170503 |