CN104298771B - Method for querying and analyzing massive web log data - Google Patents

Method for querying and analyzing massive web log data

Info

Publication number
CN104298771B
CN104298771B (application CN201410596395.6A / CN201410596395A)
Authority
CN
China
Prior art keywords
data
hive
log
analysis
massive
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201410596395.6A
Other languages
Chinese (zh)
Other versions
CN104298771A (en)
Inventor
马廷淮
瞿晶晶
田伟
薛羽
曹杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhirong Shidai Information Technology Co ltd
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN201410596395.6A priority Critical patent/CN104298771B/en
Publication of CN104298771A publication Critical patent/CN104298771A/en
Application granted granted Critical
Publication of CN104298771B publication Critical patent/CN104298771B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/951Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention discloses a method for querying and analyzing massive web log data based on Hadoop and Hive, exploiting the high reliability, scalability, efficiency and fault tolerance of the Hadoop/Hive distributed computing platform. The method comprises the following steps: parsing the data of each data source; loading the data into a data warehouse; receiving HiveQL statements; optimizing the received statements to obtain preliminary map results; converting the received statements into MapReduce tasks, executing them, and storing the query results; partitioning the data; performing analysis and mining on the data; and loading the results into a MySQL database. Aimed at massive web log data, the invention achieves accurate querying and analysis, provides the scalability and efficiency needed for storing, querying and analyzing massive data, and avoids the overall performance degradation caused by uneven job distribution under data skew.

Description

Method for querying and analyzing massive web log data
Technical field
The invention belongs to the technical field of computer information processing, and in particular relates to a method for querying and analyzing massive web log data based on Hadoop and Hive.
Background technology
With the rapid development of Internet technology, a large number of applications and services running on the Internet have emerged, and the era of big data has arrived. Each website is an independent information system in itself; once these websites are interconnected through the network, the entire Internet becomes one enormous information system. Visitors leave traces as they browse websites, and these traces are preserved in the form of web log files. Collecting the logs of systems, programs, operations and transactions has become increasingly important, because logs are essential evidence for operations such as system recovery, error tracking and security auditing.
Because data sources are numerous and each system has many users operating frequently, terabytes or even petabytes of web log data can be produced every day. Owing to limitations in scalability and processing performance, traditional databases can no longer meet the storage and analysis requirements of data volumes that now routinely reach tens or hundreds of gigabytes, or even terabytes. Moreover, with so many unstructured log files, how to retrieve data quickly, how to find useful data, and how to perform statistical analysis on logs have become urgent problems. Existing big-data query methods can only perform simple row-key searches directly through HBase, or retrieval via Hive's HQL; the retrieval latency is large and the analysis results are inaccurate, so current demands cannot be met.
Summary of the invention
To solve the above problems, the present invention exploits the high reliability, scalability, efficiency and fault tolerance of the Hadoop/Hive distributed computing platform and discloses a method for querying and analyzing massive web log data based on Hadoop and Hive.
The open-source framework Hadoop is a widely used and distinctive tool. Users write their own MapReduce programs, and the scheduler divides a job into many fine-grained subtasks, distributes them to different nodes in the cluster, and runs them in parallel, so that even on large data sets results are obtained within a time the user can accept. Hadoop lets users who know nothing about distributed computing still enjoy its benefits. Hive was first open-sourced by Facebook in 2008 and became very popular as soon as it was released; Hadoop users can develop with Hive according to their own data-processing needs. Hive defines a simple SQL-like query language called HiveQL, which allows users familiar with SQL to query the data. The language also allows developers familiar with MapReduce to plug in custom mappers and reducers for complex analysis work that the built-in mappers and reducers cannot handle. Hive mainly consists of user interfaces, a metadata store, an interpreter, a compiler, an optimizer and an executor. The plans generated by the interpreter, compiler and optimizer are stored in the Hadoop distributed file system HDFS, and the executor invokes MapReduce programs to execute and analyze the statements.
Aiming at the massive nature of web log data, the present invention queries and analyzes massive web log data according to actual conditions, uses optimized HiveQL as the main means of querying, and analyzes the massive log data with a combination of data partitioning and a genetic algorithm, achieving efficient mining of big data.
In order to achieve the above object, the present invention provides the following technical scheme:
A method for querying and analyzing massive web log data comprises the following steps:
Step (1): parse the data of each data source with the ETL facilities in Hive; the parsing process comprises the four steps of extraction, cleaning, transformation and loading, and when the data are cleaned, the useful information in them is extracted in a distributed manner by MapReduce programs;
Step (2): load the extracted data into the data warehouse;
Step (3): the Driver component of Hive receives HiveQL statements;
Step (4): optimize the received statements for skewed data and obtain preliminary map results after performing table join operations;
Step (5): convert the received HiveQL statements into MapReduce tasks, execute them, and store the query results;
Step (6): partition the massive web log data;
Step (7): perform analysis and mining on the data with a highly parallel, globally randomized genetic algorithm;
Step (8): load the data produced by the query and analysis stages into a MySQL database.
Further, the optimization in step (4) includes joining data tables with a map join for skewed data and with a common join for non-skewed data.
Further, in step (5) a combiner function is introduced during the map phase to perform local aggregation by key: the keys output by the map are sorted and their values are iterated over.
Further, the combiner function is set to run before or after the merge operation on the results produced by the map.
Compared with the prior art, the invention has the following advantages and beneficial effects:
Aimed at massive web log data, the present invention takes into account the scalability of systems that store massive data and the unstructured nature of the data, as well as the strengths and weaknesses of existing data-processing methods. Based on the high-performance computing of Hadoop/Hive distributed systems and on data analysis built on data partitioning and a genetic algorithm, it supports querying and analysis of massive web log data and achieves accurate queries and analysis. For example, the log data of a search-engine website can be analyzed to obtain the order of user clicks and the ranking of URLs. The method optimizes Hive, remedying the large retrieval latency of earlier approaches that simply performed row-key searches through HBase or retrieval via Hive's HQL. Meanwhile, by analyzing partitioned data and analyzing the records in the log data with a genetic algorithm, the analysis results become more accurate. Combining the two achieves the scalability and efficiency of massive data storage, querying and analysis, and avoids the overall performance degradation caused by uneven job distribution under data skew. Compared with traditional log-data query and analysis methods, a company or client performing log analysis can accurately understand the state of its web presence; for example, popular websites can be found from the order of user clicks and the ranking of URLs, enabling targeted advertisement placement for merchants. The invention realizes data mining over big data and can be applied, for example, to web page recommendation and e-commerce marketing.
Brief description of the drawings
Fig. 1 is a schematic flow chart of the steps of the method of the invention;
Fig. 2 is the linked-list structure of web pages.
Embodiment
The technical scheme provided by the present invention is described in detail below with reference to specific embodiments. It should be understood that the following specific embodiments only illustrate the invention and do not limit its scope.
Visitors leave traces as they browse websites, and these traces are preserved in the form of web log files. This example targets such data and uses the ETL facilities in Hive, optimized Hive SQL queries, MapReduce with a combiner function, and a genetic algorithm based on data partitioning to provide accurate log-data query and analysis results. As shown in Fig. 1, the method proceeds as follows:
Step 10: parse the data of each data source with the ETL facilities in Hive. The ETL process comprises the four steps of extracting, cleaning, transforming and loading the data. In the extraction phase, the source data are parsed and stored into Hive, and Hadoop and Hive programs extract the potentially useful data from the source data into the Transform layer. In the cleaning phase, Hive programs extract the fields that may be used later into the Load layer and discard unused and duplicated data. In the loading phase, the processed data are stored into tables in Hive and the source data are deleted. However, extracting data with ETL tools alone cannot meet the speed requirement, so the present invention uses MapReduce programming for the cleaning process: each record is read and its fields are extracted. When the raw data are processed, distributed cleaning is performed with MapReduce programs: one NameNode (JobTracker) is set up in the cluster to serve as the data-distribution server, and DataNodes (TaskTrackers) are set up to store and process the data distributed by the NameNode. The NameNode divides the data to be processed into 128 MB blocks, each block is given two replicas, and according to its placement algorithm the Hadoop system stores the blocks on the DataNodes, i.e. the data-processing servers, for further processing. This step involves the application of Hive and MapReduce, built on top of the distributed file system HDFS; the subsequent data warehouse models the massive data along multiple dimensions and queries or analyzes the data as required.
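The distributed cleaning described in step 10 can be sketched as a map-side function that keeps only the useful fields and discards records it cannot parse. This is an illustrative sketch, not the patent's code: the Apache Common Log Format regex and the choice of retained fields are assumptions based on the field list given in step 20.

```python
import re

# Common Log Format pattern (assumed layout; the patent only lists the
# fields: visitor IP, visitor identifier, user name, access time, method,
# requested document).
LOG_RE = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<doc>\S+)[^"]*"'
)

def clean_map(line):
    """Map-side cleaning: emit the useful fields, drop malformed records."""
    m = LOG_RE.match(line)
    if m is None:
        return None  # discarded, as in the cleaning phase
    return (m.group('ip'), m.group('time'), m.group('method'), m.group('doc'))

lines = [
    '1.2.3.4 - alice [30/Oct/2014:10:00:00 +0800] "GET /index.html HTTP/1.1"',
    'garbage line that cannot be parsed',
]
records = [r for r in (clean_map(l) for l in lines) if r is not None]
```

In the real cluster each map task would run this function over one 128 MB block; here the two sample lines stand in for a block.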
Step 20: store the data parsed in step 10 into the tables designed and built in the data warehouse Hive. The tables in Hive are designed according to the actual conditions of the data; several tables need to be created, and each table is created in essentially the same way. For example, a table storing Apache-format logs has fields such as visitor IP, visitor identifier, user name, access time, access method, and the document requested. The present invention also sets up a relational database, the metastore, dedicated to storing the metadata of the tables.
Step 30: the Driver that comes with the Hive system receives the HiveQL statements and governs their life cycle, including the compilation, optimization and execution of the HiveQL statements. The detailed process is as follows:
Step 40: for the data-skew problem, optimize the received statements and obtain preliminary map results after the table join (join) operations. Invalid ids cause data skew during joins. For example, in roughly two billion site-wide log entries per day, the visitor IP serving as the key can be lost during log collection, leaving the key null; if the visitor IP is then joined against the visitor identifier, data skew is encountered. The reason is that in Hive all records whose key is null are treated as the same key and dispatched to the same map computation, creating a computational bottleneck. Since the distribution of the data follows the usual statistical laws, there will not be too many skewed keys. The present invention therefore optimizes Hive join statements: skewed data use a map join, i.e. the keys of the skewed data are split so that a skewed key is not dispatched entirely to a single computation and a distributed table join is performed; non-skewed data use a common join, i.e. the tables are joined directly on the key; finally the two partial results are merged into the complete result.
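The skew-handling join of step 40 can be sketched as follows: rows with a skewed key are joined map-join style against a broadcast copy of the small table, while the remaining rows go through an ordinary hash (common) join, and the two partial results are merged. This is a single-process illustration of the idea only; the name `skew_aware_join` and the toy tables are invented for the example.

```python
from collections import defaultdict

def skew_aware_join(logs, visitors, skewed_keys):
    """Join logs with visitors on visitor IP, treating skewed keys separately.

    Non-skewed rows go through an ordinary hash (common) join; skewed rows are
    handled map-join style against a 'broadcast' dict of the small table, so a
    hot key is never funneled into a single reducer.
    """
    vis = dict(visitors)                       # small table, broadcast copy
    normal = [r for r in logs if r[0] not in skewed_keys]
    skewed = [r for r in logs if r[0] in skewed_keys]

    # common join: build a hash table on the key, probe with each log row
    buckets = defaultdict(list)
    for ip, name in visitors:
        buckets[ip].append(name)
    out = [(ip, page, name) for ip, page in normal for name in buckets.get(ip, [])]

    # map join for the skewed part: per-row lookup against the broadcast dict
    out += [(ip, page, vis[ip]) for ip, page in skewed if ip in vis]
    return out

logs = [('1.1.1.1', '/a'), ('1.1.1.1', '/b'), ('2.2.2.2', '/c'), (None, '/d')]
visitors = [('1.1.1.1', 'u1'), ('2.2.2.2', 'u2')]
joined = skew_aware_join(logs, visitors, skewed_keys={'1.1.1.1', None})
```

Note how the null-keyed row simply drops out of the map-join branch instead of stalling one reducer, which mirrors the bottleneck the text describes.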
Step 50: from the preliminary map results obtained in step 40, the Driver invokes the compiler, which converts the received HiveQL statements into a plan consisting of a DAG of MapReduce tasks; the plan is composed of metadata operations and HDFS operations, and the tasks are finally submitted to the execution engine in topological order to complete the analytical computation, i.e. a distributed query according to the query conditions. The input of MapReduce comes from files already imported into the HDFS cluster; these files are evenly distributed over all nodes. Running a MapReduce program first runs map tasks on some or all of the nodes; all map tasks are equivalent, and no map task exchanges information with, or is even aware of, any other map task. After the map phase, the intermediate key-value pairs generated on the nodes may be exchanged, and pairs with the same key, e.g. the same visitor IP, are delivered to the same reducer; in the whole of MapReduce, communication between nodes can only happen in this step. Like map tasks, reduce tasks do not communicate with other reduce tasks. Hadoop MapReduce guarantees reliable task execution by performing the data transfer automatically and restarting tasks on failed nodes. On this basis, after the map phase and before the reduce phase, a combiner function may be introduced to optimize the data output by the map phase: it performs local aggregation by key, sorting the keys output by the map and iterating over their values. The data produced during the map phase undergo a merge operation that merges them by key, and the combiner function can be set to run before or after this merge of the map output as needed. Especially for large results, this greatly reduces the data copied from the map tasks to the reduce tasks.
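The combiner of step 50 performs local aggregation of a single map task's output before anything crosses the network. A minimal sketch, assuming the familiar counting use case (hits per visitor IP); the data are invented for illustration:

```python
from itertools import groupby

def combiner(map_output):
    """Local aggregation on one map task's output: sort by key, then fold the
    values of each key into a single pair before anything is sent to reducers."""
    combined = []
    for key, group in groupby(sorted(map_output), key=lambda kv: kv[0]):
        combined.append((key, sum(v for _, v in group)))
    return combined

# one map task counted hits per visitor IP as (ip, 1) pairs
map_output = [('1.2.3.4', 1), ('5.6.7.8', 1), ('1.2.3.4', 1), ('1.2.3.4', 1)]
local = combiner(map_output)
# without the combiner 4 pairs cross the network; with it only 2 do
```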
Step 60: data partitioning. First the sample set is divided into M equal parts (InputFormat is responsible for dividing the data into InputSplits), and the data format is unified as <id, <X, Y>>, where id denotes a number composed of the visitor IP and the access date, Y denotes the page the user currently accesses, and X denotes the referrer, i.e. the page the user was on before visiting page Y. The map operation then scans each input record and initializes the data set into the above format. After the map operation, the intermediate result <<X, Y>, 1> is obtained, i.e. one user went from page X to page Y. The reduce operation then merges the intermediate results by identical <X, Y> page-transition patterns and outputs <<X, Y>, n>, where n denotes the frequency of the access path X->Y. Next, the reduce output of each sub-group (i.e. each data block obtained by the earlier partitioning) is converted into a linked-list structure whose head stores the value k. The linked-list structure is shown in Fig. 2, where k denotes the chromosome linked-list length and X, Y, Z, R denote web pages.
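The partition-and-count of step 60 can be sketched as a map phase that emits ((X, Y), 1) per transition and a reduce phase that merges identical (X, Y) patterns into ((X, Y), n). The record layout below mirrors the <id, <X, Y>> format of the text; the concrete ids and pages are invented for illustration:

```python
from collections import Counter

def map_phase(records):
    """Each record is (id, (X, Y)): referrer X, current page Y.
    Emit ((X, Y), 1) for every observed transition."""
    return [((x, y), 1) for _, (x, y) in records]

def reduce_phase(pairs):
    """Merge identical (X, Y) transition patterns into ((X, Y), n)."""
    counts = Counter()
    for key, one in pairs:
        counts[key] += one
    return dict(counts)

records = [
    ('1.2.3.4|2014-10-30', ('/x', '/y')),
    ('5.6.7.8|2014-10-30', ('/x', '/y')),
    ('1.2.3.4|2014-10-30', ('/y', '/z')),
]
freq = reduce_phase(map_phase(records))
# freq[('/x', '/y')] == 2: two users followed the path /x -> /y
```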
Step 70: inside each sub-group, perform genetic evolutionary operations such as selection and crossover with the highly parallel, globally randomized genetic algorithm. First, 2 chromosomes are randomly selected from the parent chromosomes, and then an insertion position Ins, a deletion position Del, and an insertion/deletion length Len are generated at random. The 2 chromosome segments are then compared for equal length. If they are of equal length, it is checked whether their ends overlap; if so, they are connected to generate a new chromosome, otherwise no child chromosome is generated. If they are of unequal length, it is checked whether the 2 inserted and deleted gene segments are identical; if so, the chromosomes are merged into a new chromosome, otherwise no child chromosome is generated. Whenever the generation number is a multiple of 50, a marriage operation is performed between the sub-populations. Each sub-group repeats the above operations until the value of k no longer changes, at which point the genetic algorithm exits. The above operations yield the page access paths, and the size of the web log files processed does not affect the validity of the algorithm.
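The chromosome-merging rule of step 70 is described only informally, so the following is a loose sketch of one reading of it: chromosomes are page paths (the linked lists of Fig. 2), and two equal-length parents whose ends coincide are connected into a longer child path. The unequal-length insert/delete comparison and the marriage operation are omitted for brevity; all names here are invented for the example.

```python
import random

def try_merge(a, b):
    """Crossover sketch: if the two chromosomes have equal length and the
    tail of one coincides with the head of the other, connect them into a
    longer child path; otherwise generate no child (one reading of the
    patent's informal description)."""
    if len(a) != len(b):
        return None
    for k in range(len(a) - 1, 0, -1):
        if a[-k:] == b[:k]:            # overlapping ends
            return a + b[k:]
    return None

def select_parents(population, rng):
    """Randomly select 2 parent chromosomes, as in step 70."""
    return rng.sample(population, 2)

rng = random.Random(0)
population = [['/x', '/y'], ['/y', '/z'], ['/z', '/r']]
parents = select_parents(population, rng)
child = try_merge(['/x', '/y'], ['/y', '/z'])  # ends overlap at '/y'
```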
Steps 60 and 70 above combine data partitioning and the genetic algorithm in the data analysis process, specifically for web log analysis in a Hadoop/Hive cluster environment.
Step 80: load the data produced by the query and analysis stages into a MySQL database, and present the results of the data analysis to users in a friendly interface as required. For example, the number of accesses to a website, a page or a data center, analyses of visitor behavior, the proportion of failed accesses to a certain web page during some past period, or the order of user clicks and the ranking of URLs can all be queried and analyzed.
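The final loading step can be sketched as plain SQL inserts followed by the kind of ranking query the text mentions. The patent loads into MySQL; sqlite3 stands in below so the sketch is self-contained (with MySQL the same statements would run through a driver), and the table and column names are invented:

```python
import sqlite3

# In-memory stand-in for the MySQL database of step 80.
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE page_transitions (src TEXT, dst TEXT, hits INTEGER)'
)
results = [('/x', '/y', 2), ('/y', '/z', 1)]   # output of the analysis stage
conn.executemany('INSERT INTO page_transitions VALUES (?, ?, ?)', results)
conn.commit()

# a friendly front end would now query this table, e.g. a URL ranking:
ranking = conn.execute(
    'SELECT dst, SUM(hits) AS n FROM page_transitions '
    'GROUP BY dst ORDER BY n DESC'
).fetchall()
```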
The technical means disclosed in the scheme of the present invention are not limited to those disclosed in the above embodiments, but also include technical schemes composed of any combination of the above technical features. It should be pointed out that, for those skilled in the art, several improvements and modifications can be made without departing from the principles of the invention, and these improvements and modifications are also regarded as falling within the protection scope of the present invention.

Claims (4)

1. A method for querying and analyzing massive web log data, characterized by comprising the following steps:
Step (1): parse the data of each data source with the ETL facilities in Hive, the parsing process comprising the four steps of extraction, cleaning, transformation and loading; in the extraction phase, the source data are parsed and stored into Hive, and Hadoop and Hive programs extract the potentially useful data from the source data into the Transform layer; in the cleaning phase, Hive programs extract the fields that may be used later into the Load layer and discard unused and duplicated data; in the loading phase, the processed data are stored into tables in Hive and the source data are deleted; when the data are cleaned, the useful information in them is extracted in a distributed manner by MapReduce programs;
Step (2): load the extracted data into the data warehouse;
Step (3): the Driver component of Hive receives HiveQL statements;
Step (4): optimize the received statements for skewed data and obtain preliminary map results after performing table join operations;
Step (5): convert the received HiveQL statements into MapReduce tasks, execute them, and store the query results;
Step (6): partition the massive web log data;
Step (7): perform analysis and mining on the data with a highly parallel, globally randomized genetic algorithm: first randomly select 2 chromosomes from the parent chromosomes, then randomly generate an insertion position Ins, a deletion position Del, and an insertion/deletion length Len; then compare the 2 chromosome segments for equal length; if they are of equal length, check whether their ends overlap, and if so, connect them to generate a new chromosome, otherwise generate no child chromosome; if they are of unequal length, check whether the 2 inserted and deleted gene segments are identical, and if so, merge the chromosomes into a new chromosome, otherwise generate no child chromosome; whenever the generation number is a multiple of 50, perform a marriage operation between the sub-populations; each sub-group repeats the above operations until the value of k no longer changes;
Step (8): load the data produced by the query and analysis stages into a MySQL database.
2. The method for querying and analyzing massive web log data according to claim 1, characterized in that the optimization in step (4) includes joining data tables with a map join for skewed data and with a common join for non-skewed data.
3. The method for querying and analyzing massive web log data according to claim 1 or 2, characterized in that in step (5) a combiner function is introduced during the map phase to perform local aggregation by key: the keys output by the map are sorted and their values are iterated over.
4. The method for querying and analyzing massive web log data according to claim 3, characterized in that the combiner function is set to run before or after the merge operation on the results produced by the map.
CN201410596395.6A 2014-10-30 2014-10-30 Method for querying and analyzing massive web log data Expired - Fee Related CN104298771B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410596395.6A CN104298771B (en) 2014-10-30 2014-10-30 Method for querying and analyzing massive web log data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410596395.6A CN104298771B (en) 2014-10-30 2014-10-30 Method for querying and analyzing massive web log data

Publications (2)

Publication Number Publication Date
CN104298771A CN104298771A (en) 2015-01-21
CN104298771B true CN104298771B (en) 2017-09-05

Family

ID=52318496

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410596395.6A Expired - Fee Related CN104298771B (en) 2014-10-30 2014-10-30 Method for querying and analyzing massive web log data

Country Status (1)

Country Link
CN (1) CN104298771B (en)

Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104809231A (en) * 2015-05-11 2015-07-29 浪潮集团有限公司 Mass web data mining method based on Hadoop
CN104866608B (en) * 2015-06-05 2018-01-09 中国人民大学 Enquiring and optimizing method based on join index in a kind of data warehouse
CN105512315B (en) * 2015-12-12 2019-04-30 天津南大通用数据技术股份有限公司 A kind of distributed data base SQL execute in INNER JOIN intelligent evaluation method
EP3182288B1 (en) 2015-12-15 2019-02-13 Tata Consultancy Services Limited Systems and methods for generating performance prediction model and estimating execution time for applications
CN106897293B (en) * 2015-12-17 2020-09-11 中国移动通信集团公司 Data processing method and device
CN105608203B (en) * 2015-12-24 2019-09-17 Tcl集团股份有限公司 A kind of Internet of Things log processing method and device based on Hadoop platform
CN105677842A (en) * 2016-01-05 2016-06-15 北京汇商融通信息技术有限公司 Log analysis system based on Hadoop big data processing technique
CN105787009A (en) * 2016-02-23 2016-07-20 浪潮软件集团有限公司 Hadoop-based mass data mining method
CN106874322A (en) * 2016-06-27 2017-06-20 阿里巴巴集团控股有限公司 A kind of data table correlation method and device
CN106301892A (en) * 2016-08-02 2017-01-04 浪潮电子信息产业股份有限公司 Hue service arrangement based on Apache Ambari and configuration and surveillance method
CN106547883B (en) * 2016-11-03 2021-02-19 北京集奥聚合科技有限公司 Method and system for processing User Defined Function (UDF) running condition
CN106599244B (en) * 2016-12-20 2024-01-05 飞狐信息技术(天津)有限公司 General original log cleaning device and method
CN106709029A (en) * 2016-12-28 2017-05-24 上海斐讯数据通信技术有限公司 File hierarchical processing method and processing system based on Hadoop and MySQL
CN107818181A (en) * 2017-11-27 2018-03-20 深圳市华成峰科技有限公司 Indexing means and its system based on Plcient interactive mode engines
CN108182596A (en) * 2017-12-22 2018-06-19 合肥天源迪科信息技术有限公司 One kind is based on enterprise marketing management method under big data environment
CN108133043B (en) * 2018-01-12 2022-07-29 福建星瑞格软件有限公司 Structured storage method for server running logs based on big data
CN108520071A (en) * 2018-04-13 2018-09-11 航天科技控股集团股份有限公司 A kind of log searching system and method based on recorder platform
CN108509648A (en) * 2018-04-13 2018-09-07 航天科技控股集团股份有限公司 A kind of log searching system based on recorder platform
CN108595578A (en) * 2018-04-17 2018-09-28 曙光信息产业(北京)有限公司 Data processing method, device and the storage system of high-performance calculation Historical Jobs data
CN108664657A (en) * 2018-05-20 2018-10-16 湖北九州云仓科技发展有限公司 A kind of big data method for scheduling task, electronic equipment, storage medium and platform
CN109918349B (en) * 2019-02-25 2021-05-25 网易(杭州)网络有限公司 Log processing method, log processing device, storage medium and electronic device
CN111125149B (en) * 2019-12-19 2024-01-26 广州品唯软件有限公司 Hive-based data acquisition method, hive-based data acquisition device and storage medium
CN112346672B (en) * 2020-11-06 2023-01-03 深圳市同行者科技有限公司 Log dyeing method, device, equipment and storage medium
CN113434376B (en) * 2021-06-24 2023-04-11 山东浪潮科学研究院有限公司 Web log analysis method and device based on NoSQL
CN113836431A (en) * 2021-10-19 2021-12-24 中国平安人寿保险股份有限公司 User recommendation method, device, equipment and medium based on user duration
CN116644039B (en) * 2023-05-25 2023-12-19 安徽继远软件有限公司 Automatic acquisition and analysis method for online capacity operation log based on big data

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9754050B2 (en) * 2012-02-28 2017-09-05 Microsoft Technology Licensing, Llc Path-decomposed trie data structures

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"基于hive的性能优化方法的研究与实践";叶文宸;《中国优秀硕士学位论文全文数据库 信息科技辑》;20111015(第10期);第17页第3.1.1节,第43页第4.5.11节 *
"基于海量查询日志的数据挖掘及用户行为分析";周婷婷;《中国优秀硕士学位论文全文数据库 信息科技辑》;20131115(第11期);第13-18页第2.3-2.4节,第30-31页第4.2节,图2-5 *

Also Published As

Publication number Publication date
CN104298771A (en) 2015-01-21

Similar Documents

Publication Publication Date Title
CN104298771B (en) Method for querying and analyzing massive web log data
Rao et al. The big data system, components, tools, and technologies: a survey
CN106649455B (en) Standardized system classification and command set system for big data development
Zaharia et al. Fast and interactive analytics over Hadoop data with Spark
EP2780834B1 (en) Processing changes to distributed replicated databases
CN105989150B (en) A kind of data query method and device based on big data environment
EP2572289B1 (en) Data storage and processing service
CN101593200A (en) Chinese Web page classification method based on the keyword frequency analysis
Hariharakrishnan et al. Survey of pre-processing techniques for mining big data
CN109086573B (en) Multi-source biological big data fusion system
Nikhil et al. A survey on text mining and sentiment analysis for unstructured web data
Savitha et al. Mining of web server logs in a distributed cluster using big data technologies
Sethy et al. Big data analysis using Hadoop: a survey
Benny et al. Hadoop framework for entity resolution within high velocity streams
Nagdive et al. Web server log analysis for unstructured data using apache flume and pig
Ennaji et al. Social intelligence framework: Extracting and analyzing opinions for social CRM
CN103488741A (en) Online semantic excavation system of Chinese polysemic words and based on uniform resource locator (URL)
KR20140076010A (en) A system for simultaneous and parallel processing of many twig pattern queries for massive XML data and method thereof
Ravichandran Big Data processing with Hadoop: a review
Sudha et al. A survey paper on map reduce in big data
De Bonis et al. Graph-based methods for Author Name Disambiguation: a survey
He et al. The high-activity parallel implementation of data preprocessing based on MapReduce
Priya et al. Entity resolution for high velocity streams using semantic measures
Mangla et al. IPB-Implementation of Parallel Mining for Big Data
Vissamsetti et al. Twitter Data Analysis for Live Streaming by Using Flume Technology

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20181128

Address after: 412300 Room 807, Youzhou Internet Financial Innovation Center, Yongjia Community Jiayuan Group, Lianxing Street, Youxian County, Zhuzhou City, Hunan Province

Patentee after: Zhixin Financial Information Service (Youxian) Co.,Ltd.

Address before: 210044 Ning six road, Nanjing, Jiangsu Province, No. 219

Patentee before: Nanjing University of Information Science and Technology

TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20200715

Address after: No.383, commercial building 3, building 1, jianxiyuan Zhongli, Haidian District, Beijing 100043

Patentee after: BEIJING ZHIRONG SHIDAI INFORMATION TECHNOLOGY Co.,Ltd.

Address before: 412300 Room 807, Youzhou Internet Financial Innovation Center, Yongjia Community Jiayuan Group, Lianxing Street, Youxian County, Zhuzhou City, Hunan Province

Patentee before: Zhixin Financial Information Service (Youxian) Co.,Ltd.

TR01 Transfer of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170905

Termination date: 20211030

CF01 Termination of patent right due to non-payment of annual fee