CN107153710A - Big data processing method and system
- Publication number
- CN107153710A CN201710356324.2A
- Authority
- CN
- China
- Prior art keywords
- data
- big data
- database
- data processing
- cloud computing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0241—Advertisements
- G06Q30/0251—Targeted advertisements
- G06Q30/0253—During e-commerce, i.e. online transactions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0241—Advertisements
- G06Q30/0251—Targeted advertisements
- G06Q30/0255—Targeted advertisements based on user history
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Accounting & Taxation (AREA)
- Finance (AREA)
- Development Economics (AREA)
- Theoretical Computer Science (AREA)
- Strategic Management (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Entrepreneurship & Innovation (AREA)
- Marketing (AREA)
- General Business, Economics & Management (AREA)
- Economics (AREA)
- Game Theory and Decision Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
An embodiment of the invention discloses a big data processing method and system. The method includes: collecting data on user behavior such as browsing history and purchase records; filtering the collected data in Hadoop distributed mode to obtain complete, non-duplicated data; converting the filtered data into a computer language and storing it in a database; and retrieving the information stored in the database and processing the retrieved data with cloud computing. With this embodiment, distributed data mining is performed on big data by means of cloud computing, so that website user behavior data can be mined effectively and processed in the cloud in real time.
Description
Technical field
The present invention relates to the technical field of big data processing, and in particular to a big data processing method and system.
Background
In recent years the internet has developed ever more rapidly and its use has become ever more widespread. When people use the internet for everyday activities such as online shopping or viewing programs, information, and goods, they generate large amounts of data. These data are very valuable to e-commerce websites and internet media websites, and processing them can yield considerable commercial value.
Big data is widely used in internet projects and is of great significance to websites. Mass data processing and cloud computing can bring the greatest possible improvement to the advertising systems of internet media websites and to the big data product push systems of e-commerce websites. On internet media websites, big data advertising is pushed according to users' reading preferences: after cloud computing over mass data, advertisements in various forms are pushed to users browsing the site. On e-commerce websites, big data product recommendations are pushed to online shoppers: click behavior, purchase behavior, product correlation, preferences, and usage-time patterns are processed in order to push matching products and promotional information.
The emergence of big data is triggering profound technological and business change worldwide. Technically, big data has changed the conventional way of extracting information from data. Machine learning, which plays an important role in search engines and online advertising, is regarded as the field in which big data shows its real value. Patterns such as people's behavior and habits are derived statistically from massive data, helping advertisers find precise potential customers as far as possible and thereby improving advertising results and subsequent purchases.
However, current big data applications have many shortcomings, for example:
1. Data processing depends on the accumulation of massive data. Current big data processing must draw on millions of users and their historical behavior, while the vast majority of platforms and enterprises lack such a base and often hold only small or medium data sets; data such as behavioral habits, purchase records, and browsing records are scarcer still.
2. Data processing requires powerful software and hardware. Big data computation currently has a high threshold and is therefore not yet widespread. At present it falls mainly into two ecosystems: the open-source big data ecosystem and the commercial big data ecosystem.
3. Data processing relies on interpretation by many professionals. Building behavior models for big data places strong demands on mathematical statistics and computer modeling, for example proficiency with database management systems and with probability and statistics, and such talent is currently scarce domestically.
4. Data processing results can still be misjudged. Big data results often lack real-time capability and specificity; differences in the sampling precision of the raw data and in statistical methods, as well as errors in model structure, can all lead to incorrect processing. Different usage scenarios can also produce entirely different results.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a big data processing method and system that rely on cloud computing to perform distributed data mining on big data, so that website user behavior data can be mined effectively and processed in the cloud in real time.
To achieve the above purpose, an embodiment of the invention discloses a big data processing method, the method including:
collecting data on user behavior such as browsing history and purchase records;
filtering the data collected by the data acquisition module in Hadoop distributed mode to obtain complete, non-duplicated data;
converting the complete, non-duplicated data produced by the data filtering module into a computer language and storing it in a database;
retrieving the information stored in the database, and processing the retrieved data with cloud computing.
To achieve the above purpose, an embodiment of the invention also discloses a big data processing system, the system including:
a big data acquisition module, used to collect data on user behavior such as browsing history and purchase records;
a big data filtering module, used to filter the data collected by the data acquisition module in Hadoop distributed mode to obtain complete, non-duplicated data;
a collector, used to convert the complete, non-duplicated data produced by the data filtering module into a computer language;
a database, in which the computer-language data converted by the collector can be stored;
an operating system, through which the data stored in the database can be retrieved;
a cloud computing module, which can process the data in the database.
Optionally, the data processing system may also include:
a web server, through which the data in multiple databases can be connected to provide larger data.
Optionally, the operating system is a Linux operating system.
Optionally, the web server is an Apache web server.
Optionally, the database is a MySQL database.
Optionally, the collector is the Perl, PHP, or Python programming language.
Optionally, the data collected by the data acquisition module undergo distributed data mining through the cloud computing, so that the required data are mined effectively.
Optionally, the data processing system may also include:
a Storm topology architecture, through which deviations in data processing can be corrected in real time without requiring professionals.
Optionally, the data processing system may also include:
a simple Storm topology with MapReduce functions, which can correct deviations in data processing in real time.
It can be seen that the big data processing method and system provided by the embodiments of the present invention can improve the precision of a website's advertising and of a store's product display. The big data processing technology lets the platform quickly understand users' behavioral habits and preferences and their real-time dynamic interaction during use, so that advertisements and products of interest are shown at the appropriate time in a user-friendly form on the website, solving the problem that conventional advertising and product display are imprecise. It addresses the software and hardware deficiencies of domestic enterprises and the inexperience of their operators, helps platforms overcome problems such as disordered raw data, big data model building, and data processing and prediction, and provides real-time and relatively efficient data support. Relying on cloud computing, distributed data mining can be performed on big data, so that website user behavior data are mined effectively and processed in the cloud in real time. In addition, the included Storm topology can correct data processing deviations in real time without requiring professionals.
Of course, a product or method implementing the present invention need not achieve all of the above advantages at the same time.
Brief description of the drawings
To explain the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a big data processing method provided by an embodiment of the present invention.
Fig. 2 is a schematic diagram of distributed data mining provided by an embodiment of the present invention.
Fig. 3 is a schematic diagram of a Storm topology architecture provided by an embodiment of the present invention.
Fig. 4 is a schematic diagram of a simple Storm topology with MapReduce functions provided by an embodiment of the present invention.
Fig. 5 is a schematic diagram of a Hadoop cloud architecture configuration scheme provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the invention without creative effort fall within the scope of protection of the invention.
Referring to Fig. 1, a schematic flow chart of a big data processing method provided by an embodiment of the invention, the method may include the following steps:
S101: collect data on user behavior such as browsing history and purchase records;
S102: filter the data collected by the data acquisition module in Hadoop distributed mode to obtain complete, non-duplicated data; Hadoop distributed mode is prior art and is not described in detail here;
S103: convert the complete, non-duplicated data produced by the data filtering module into a computer language and store it in a database;
S104: retrieve the information stored in the database, and process the retrieved data with cloud computing.
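Steps S101 to S104 can be illustrated with a minimal single-machine sketch. Everything specific here is an assumption made for illustration and does not come from the patent: the record fields in `REQUIRED`, the sample records, JSON as the stored representation, and an in-memory SQLite database standing in for MySQL. The filtering runs in one process rather than on Hadoop.

```python
import json
import sqlite3

# Hypothetical fields that make a behavior record "complete".
REQUIRED = ("user_id", "action", "item")


def collect():
    """S101: stand-in for collecting browsing and purchase records."""
    return [
        {"user_id": 1, "action": "view", "item": "phone"},
        {"user_id": 1, "action": "view", "item": "phone"},  # duplicate
        {"user_id": 2, "action": "buy", "item": "laptop"},
        {"user_id": 3, "action": "buy"},                     # incomplete
    ]


def filter_records(records):
    """S102: keep only complete, non-duplicated records."""
    seen, kept = set(), []
    for rec in records:
        if any(rec.get(field) is None for field in REQUIRED):
            continue  # drop incomplete records
        key = tuple(rec[field] for field in REQUIRED)
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept


def store(conn, records):
    """S103: serialize each record and store it in the database."""
    conn.execute("CREATE TABLE IF NOT EXISTS behavior (payload TEXT)")
    conn.executemany(
        "INSERT INTO behavior (payload) VALUES (?)",
        [(json.dumps(rec, sort_keys=True),) for rec in records],
    )
    conn.commit()


def process(conn):
    """S104: read the stored data back and count records per action."""
    counts = {}
    for (payload,) in conn.execute("SELECT payload FROM behavior"):
        action = json.loads(payload)["action"]
        counts[action] = counts.get(action, 0) + 1
    return counts


def run_pipeline():
    conn = sqlite3.connect(":memory:")
    store(conn, filter_records(collect()))
    return process(conn)
```

In this sketch the duplicate and the incomplete record are dropped in S102, so only two records reach the database.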
The big data processing system can be applied to e-commerce websites, for example to store A. The big data processing system of store A mainly processes large volumes of user behavior data, such as browsing history and purchase records, in a timely manner to form a large dynamic data warehouse for the store. Based on purchase preference and purchase frequency, data mining is used to push product information to users promptly and to send product advertisements automatically and periodically in forms such as EDM, SMS, and in-site messages. The big data processing system also serves as a basis for assessing the popularity and layout of the store's products: after processing by the system, popular and frequently purchased products can be automatically ordered into the most prominent positions. (Users are generally identified by IP address, by access channel, or by account, in strict compliance with security and confidentiality principles.) Recommended and popular products on the website can be updated and adjusted quickly as users act, matching the products a user is interested in and thereby maximizing precise sales of the site's products. To realize the functions of store A's big data processing system, the big data processing system provided by the invention uses a distributed computing architecture (LAMP). The LAMP architecture includes the Linux operating system, the Apache web server, the MySQL database, and the Perl, PHP, or Python programming language; all components are open-source software, and the architecture is internationally mature. Compared with the Java/J2EE architecture, LAMP offers rich web resources, light weight, and security; compared with Microsoft's .NET architecture, LAMP is general-purpose, cross-platform, and high-performance. Store data are also backed up in real time, transactions are processed quickly, and complete data processing functions are available. Through cloud computing, working with massively parallel processing (MPP) databases, distributed databases, and the like, the buying habits of store users can be processed quickly, in large volume, and accurately, and matching products can be pushed and presented to buyers in diverse forms, effectively raising the probability and frequency of purchases.
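As a rough illustration of pushing products by purchase preference and frequency, the sketch below ranks the categories a user has bought most often. The patent does not specify a ranking rule; the function name `recommend`, the `(user_id, category)` record shape, and the sample data are all hypothetical.

```python
from collections import Counter


def recommend(purchases, user_id, top_n=2):
    """Return the user's most frequently purchased categories, most frequent first."""
    counts = Counter(cat for uid, cat in purchases if uid == user_id)
    # Sort by descending count, then by name, so that ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [cat for cat, _ in ranked[:top_n]]


# Hypothetical purchase records: (user_id, product category).
purchases = [
    (1, "books"), (1, "shoes"), (1, "books"), (2, "toys"),
    (1, "games"), (1, "shoes"), (1, "books"),
]
```

Here `recommend(purchases, 1)` considers only user 1's records, so another user's purchases do not affect the ranking.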
The big data processing system can also be applied to internet media websites, for example to website a. The big data processing system of website a, and in particular its big data advertising system, automatically helps paying advertisers on the site match potential customers as closely as possible: large volumes of user behavior data are processed through cloud computing so that relevant advertising is pushed to visitors browsing the site within a short time. This encourages online users to browse, click, and view advertisements in categories they are interested in, and is the core internet technology for maximizing advertising value. The advertising system of website a also supports the vast majority of internet advertising formats, including text links, display advertising, and video advertising. It has a sound ad scheduling mechanism and can accurately track advertisement page views (PV), click performance, and other statistics. It also has an advertiser bidding system and can charge in several ways, such as CPC, CPM, CPA, CPS, and CPV. To realize the functions of website a's big data advertising system, the big data processing system provided by the invention uses a distributed computing architecture (LAMP). The LAMP architecture includes the Linux operating system, the Apache web server, the MySQL database, and the Perl, PHP, or Python programming language; all components are open-source software, and the architecture is internationally mature. Compared with the Java/J2EE architecture, LAMP offers rich web resources, light weight, and security; compared with Microsoft's .NET architecture, LAMP is general-purpose, cross-platform, and high-performance. Through cloud computing, working with MPP databases, distributed databases, and the like, advertising information can be processed quickly, in large volume, and accurately, and displayed to users in diverse forms.
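The charging models mentioned above differ only in what is counted. The following sketch, with hypothetical rates and a hypothetical function name `charge`, covers three of them using their usual definitions: CPM (cost per thousand impressions), CPC (cost per click), and CPA (cost per action). CPS and CPV are omitted, and the patent does not describe how its bidding system computes charges.

```python
from decimal import Decimal


def charge(model, rate, impressions=0, clicks=0, actions=0):
    """Compute an advertiser's charge under one billing model.

    `rate` is a decimal string such as "2.50"; Decimal avoids float rounding.
    """
    rate = Decimal(rate)
    if model == "cpm":   # cost per thousand impressions
        return rate * impressions / 1000
    if model == "cpc":   # cost per click
        return rate * clicks
    if model == "cpa":   # cost per action (for example a sign-up)
        return rate * actions
    raise ValueError(f"unsupported billing model: {model}")
```

For example, 40,000 impressions at a CPM rate of 2.50 cost 100, and 50 clicks at a CPC rate of 0.30 cost 15.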
In the distributed data mining shown in Fig. 2, distributed data mining relies on cloud computing's distributed processing, distributed databases (PaaS), cloud storage, and virtualization technology (IaaS). The results computed in the cloud are presented on mobile terminals and PCs. Website user behavior data can be mined effectively and processed in the cloud in real time, and the advertisements and products a user is interested in are fed back.
With the arrival of the cloud era, big data has attracted increasing attention. The term big data is commonly used to describe the large volumes of unstructured and semi-structured data a company creates, which would take too much time and money to load into a relational database for analysis. Big data processing is often linked with cloud computing, because processing large data sets in real time requires a framework such as MapReduce to distribute work across tens, hundreds, or even thousands of computers.
Big data requires special technologies to process large amounts of data effectively within a tolerable time. Technologies applicable to big data include massively parallel processing (MPP) databases, data mining grids, distributed file systems, distributed databases, cloud computing platforms, the internet, and scalable storage systems.
In the Storm topology architecture shown in Fig. 3, the fast and efficient Storm framework can correct deviations in big data processing in real time, and more accurate results can be obtained without professional personnel. Storm is more than a traditional big data processing system; it is an example of a complex event processing (CEP) system. CEP systems are generally classified as computation-oriented or detection-oriented, and either kind can be implemented in Storm through user-defined algorithms. Notably, one of Storm's main features is its focus on fault tolerance and management. Storm implements guaranteed message processing, so every tuple is fully processed through the Storm topology; if a tuple is found not to have been processed, it is automatically replayed from the spout. Storm also implements fault detection at the task level: when a task fails, its messages are automatically reassigned so that processing restarts quickly. Storm includes more intelligent process management than Hadoop; processes are managed by supervisors to ensure that resources are fully used.
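The replay behavior described above can be imitated in plain Python. This is a simulation of the idea only and does not use the Storm API: the function names, the retry limit `max_attempts`, and the deliberately flaky processor are all assumptions made for the sketch.

```python
from collections import deque


def run_with_replay(messages, process, max_attempts=3):
    """Process every message, re-queueing any message whose processing fails."""
    pending = deque((msg, 1) for msg in messages)
    done, dropped = [], []
    while pending:
        msg, attempt = pending.popleft()
        try:
            done.append(process(msg))  # success plays the role of an acknowledgement
        except Exception:
            if attempt < max_attempts:
                pending.append((msg, attempt + 1))  # failure: replay the message
            else:
                dropped.append(msg)  # give up after max_attempts (sketch-only limit)
    return done, dropped


def make_flaky_processor():
    """Hypothetical processor that fails the first time it sees the message 'b'."""
    failed_once = set()

    def process(msg):
        if msg == "b" and msg not in failed_once:
            failed_once.add(msg)
            raise RuntimeError("transient failure")
        return msg.upper()

    return process
```

With the messages "a", "b", "c", the failed "b" is re-queued behind "c" and succeeds on its second attempt, so nothing is lost but the output order changes.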
Specifically, Storm implements a data flow model in which data flow continuously through a network of transformation entities, as shown in Fig. 3. The abstraction of a data flow is called a stream, which is an unbounded sequence of tuples. A tuple is like a structure that can represent standard data types (such as integers, floats, and byte arrays) or, with some additional serialization code, user-defined types. Each stream is defined by a unique ID, which can be used to build topologies of data sources and sinks. Streams originate from spouts (message sources), which bring data from external sources into the Storm topology. A spout emits a tuple stream to bolts (message processors); a bolt can perform operations such as filtering, aggregation, and database queries, can process the tuple stream stage by stage, and can carry out stream transformations.
Fig. 4 shows a simple Storm topology with MapReduce functions. For ordinary platforms or enterprises, a simpler Storm model is better suited to processing small and medium data flows and has a wider range of application. Sinks (or entities that provide transformations) are called bolts. A bolt implements a single transformation on a stream, and bolts carry out all processing in a Storm topology. A bolt can implement traditional functions such as MapReduce as well as more complex operations (single-step functions), such as filtering, aggregation, or communication with external entities such as databases. A typical Storm topology implements multiple transformations and therefore needs multiple bolts with independent tuple streams. Spouts and bolts are implemented as one or more tasks in the system.
Notably, Storm makes it easy to implement a MapReduce function for word frequency. As shown in Fig. 4, a spout generates a stream of text data, and one bolt implements the Map function (tokenizing the stream into individual words). The stream from the "map" bolt then flows into a bolt that implements the Reduce function (aggregating the words into counts).
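The word-frequency topology can be mimicked with Python generators, one per component. This is a plain-Python sketch of the data flow, not Storm code; the function names are hypothetical, and lower-casing and whitespace splitting are assumptions about tokenization.

```python
from collections import Counter


def text_spout(lines):
    """Spout: emit lines of text into the topology one at a time."""
    for line in lines:
        yield line


def map_bolt(stream):
    """Map bolt: tokenize each line and emit a (word, 1) tuple per word."""
    for line in stream:
        for word in line.lower().split():
            yield (word, 1)


def reduce_bolt(stream):
    """Reduce bolt: aggregate the (word, count) tuples into totals."""
    totals = Counter()
    for word, count in stream:
        totals[word] += count
    return dict(totals)


def word_frequency(lines):
    """Wire the components together: spout -> map bolt -> reduce bolt."""
    return reduce_bolt(map_bolt(text_spout(lines)))
```

Because each stage consumes the previous one lazily, tuples flow through one at a time, which loosely mirrors how a stream passes from a spout through successive bolts.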
Fig. 5 shows the Hadoop cloud architecture configuration scheme, which mainly illustrates how cloud computing is realized: efficient data processing is achieved through cloud-side configuration. Hadoop MapReduce uses a master/slave structure. The master is the sole global manager of the whole cluster; its functions include job management, status monitoring, and task scheduling, and it corresponds to the JobTracker (job controller) in MapReduce. The slaves are responsible for executing tasks and reporting task status, and correspond to the TaskTracker (task executor) in MapReduce.
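The division of labor between master and slaves can be shown with a toy simulation. The classes below are hypothetical and are not Hadoop's JobTracker or TaskTracker; round-robin assignment is an assumption chosen for simplicity, whereas real scheduling also considers data locality and load.

```python
from itertools import cycle


class Slave:
    """Executes a task and reports its status (the TaskTracker role)."""

    def __init__(self, name):
        self.name = name

    def execute(self, task):
        return {"task": task, "worker": self.name, "status": "done"}


class Master:
    """Schedules tasks and collects status reports (the JobTracker role)."""

    def __init__(self, slaves):
        self.slaves = slaves

    def run_job(self, tasks):
        reports = []
        # Assign tasks to slaves in round-robin order.
        for task, slave in zip(tasks, cycle(self.slaves)):
            reports.append(slave.execute(task))
        return reports
```

With two slaves and three tasks, the first slave receives the first and third tasks and the second slave receives the second.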
Hadoop core is write using Java language, but supports the data processing write using various language to answer
Use program.The realization of newest application program employs more abstruse route, to make full use of modern languages and their spy
Property.
The specific operating steps are as follows. The Hadoop architecture is first implemented on five machines.
The IP addresses are, in order:
192.168.1.199 (master)
192.168.1.200 (slave)
192.168.1.201 (slave)
192.168.1.202 (slave)
192.168.1.203 (slave)
First log in to the 199 server:
[root@localhost ~]# uname -ar
Linux localhost 2.6.18-92.el5 #1 SMP Tue Jun 10 18:49:47 EDT 2008 i686 i686 i386 GNU/Linux
Ensure that each computer name is globally unique:
hadoop1.test.com ----- 192.168.1.203
hadoop2.test.com ----- 192.168.1.202
hadoop3.test.com ----- 192.168.1.201
hadoop4.test.com ----- 192.168.1.200
hadoop5.test.com ----- 192.168.1.199
Set the hostname:
hostname hadoop5.test.com
[root@localhost ~]# vi /etc/hosts
127.0.0.1 localhost.localdomain localhost
192.168.1.199 hadoop5.test.com
[root@localhost ~]# uname -ar
Linux hadoop5.test.com 2.6.18-92.el5 #1 SMP Tue Jun 10 18:49:47 EDT 2008 i686 i686 i386 GNU/Linux
[root@localhost ~]# vi /etc/sysconfig/network
NETWORKING=yes
NETWORKING_IPV6=no
#HOSTNAME=localhost.localdomain
HOSTNAME=hadoop5.test.com
GATEWAY=192.168.1.254
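The hostname-to-address mapping above lends itself to a small consistency check before any file is edited by hand. The sketch below builds /etc/hosts-style lines from the five machines and rejects duplicate addresses. Listing all five hosts in one file is an assumption of this sketch (the transcript above adds only the master's own entry), and `hosts_entries` is a hypothetical helper.

```python
import ipaddress

# Hostname-to-address mapping taken from the list above.
HOSTS = {
    "hadoop1.test.com": "192.168.1.203",
    "hadoop2.test.com": "192.168.1.202",
    "hadoop3.test.com": "192.168.1.201",
    "hadoop4.test.com": "192.168.1.200",
    "hadoop5.test.com": "192.168.1.199",
}


def hosts_entries(hosts):
    """Return /etc/hosts-style lines, sorted by address.

    Dictionary keys already guarantee unique hostnames; addresses are
    checked explicitly so that no two names share one IP.
    """
    if len(set(hosts.values())) != len(hosts):
        raise ValueError("duplicate IP address in host mapping")
    lines = ["127.0.0.1 localhost.localdomain localhost"]
    ordered = sorted(hosts.items(), key=lambda kv: ipaddress.ip_address(kv[1]))
    lines.extend(f"{ip} {name}" for name, ip in ordered)
    return lines
```

The function only returns the lines; writing them to /etc/hosts is left to the operator.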
Setting up passwordless ssh login:
Establish SSH trust from the master to each slave. Because the master starts Hadoop on all slaves through SSH, one-way or two-way trust must be set up so that no password has to be entered when commands are executed. Run on the master and on all slave machines: ssh-keygen -t rsa.
When running this command, simply press Enter at each prompt. A public key file, id_rsa.pub, is then generated under /root/.ssh/. Copy this file from the master to each slave with scp (remember to change the file name), for example:
scp /root/.ssh/id_rsa.pub root@192.168.1.200:/root/.ssh/authorized_keys
This creates the authorized_keys file. Opening the file shows the RSA public key as the key and user@IP as the value. You can now test that ssh from the master to the slave no longer requires a password. Setting up trust in the reverse direction, from a slave, works the same way. As for why the reverse direction might be needed: if Hadoop is always started and stopped from the master, there is no need for it; it is required only if you also want to be able to shut down Hadoop from a slave.
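Whether passwordless login actually works can be checked non-interactively. The sketch below builds an ssh command with the OpenSSH options BatchMode=yes (fail instead of prompting for a password) and ConnectTimeout, and treats exit status 0 as success. The helper names and the five-second timeout are assumptions; the `runner` parameter exists only so the logic can be exercised without a real cluster.

```python
import subprocess


def ssh_check_command(host, user="root"):
    """Build an ssh command that fails rather than prompting for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",      # never ask for a password interactively
        "-o", "ConnectTimeout=5",   # give up quickly on unreachable hosts
        f"{user}@{host}",
        "true",                     # trivial remote command; only the exit status matters
    ]


def passwordless_ok(host, user="root", runner=subprocess.run):
    """Return True if the remote command ran without a password prompt."""
    result = runner(ssh_check_command(host, user), capture_output=True)
    return result.returncode == 0
```

Run from the master, `passwordless_ok("192.168.1.200")` would attempt a real connection to the first slave.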
The specific steps for implementing advertisement push on internet media websites and product display on e-commerce websites are as follows:
(a) collect user behavior information, such as browsing history and purchase records, through the data acquisition module;
(b) convert the collected information into a computer language and store it in a database; and
(c) when the user clicks a web page, perform distributed data mining on the information in the database by cloud computing, and feed back the advertisements and products the user is interested in.
Step (b) includes the steps of:
(b.1) converting the collected information into a computer language through a collector;
(b.2) storing the information converted into a computer language in a database;
(b.3) connecting the information in multiple databases through a web server to form big data; and
(b.4) retrieving the collected information at any time through an operating system.
In step (b), the operating system is preferably a Linux operating system, the web server is preferably an Apache web server, the database is preferably a MySQL database, and the collector is preferably the Perl, PHP, or Python programming language.
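Step (b.3), combining the contents of several databases, can be sketched as follows. In-memory SQLite databases stand in for the MySQL instances, and the table name `behavior`, its columns, and both helper functions are hypothetical. The patent does not say how overlapping rows are handled; dropping exact duplicates is an assumption of this sketch.

```python
import sqlite3


def make_db(rows):
    """Create an in-memory database holding (user_id, item) behavior rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE behavior (user_id INTEGER, item TEXT)")
    conn.executemany("INSERT INTO behavior VALUES (?, ?)", rows)
    return conn


def merge_databases(connections):
    """Combine rows from several databases, dropping exact duplicates."""
    merged = set()
    for conn in connections:
        merged.update(conn.execute("SELECT user_id, item FROM behavior"))
    return sorted(merged)
```

For example, merging one database holding rows for users 1 and 2 with another holding rows for users 2 and 3 yields three distinct rows when the user 2 row is identical in both.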
In summary, the big data processing system builds on the various current big data technology solutions to form a concise and efficient technical approach. It is suitable for small and medium-sized enterprises, media platforms, and e-commerce platforms, is cost-effective, and can provide the data processing support needed for day-to-day operations, helping enterprises earn better returns.
It can be seen that the big data processing method and system provided by the embodiments of the present invention can improve the precision of a website's advertising and of a store's product display. The big data processing technology lets the platform quickly understand users' behavioral habits and preferences and their real-time dynamic interaction during use, so that advertisements and products of interest are shown at the appropriate time in a user-friendly form on the website, solving the problem that conventional advertising and product display are imprecise. It addresses the software and hardware deficiencies of domestic enterprises and the inexperience of their operators, helps platforms overcome problems such as disordered raw data, big data model building, and data processing and prediction, and provides real-time and relatively efficient data support. Relying on cloud computing, distributed data mining can be performed on big data, so that website user behavior data are mined effectively and processed in the cloud in real time. In addition, the included Storm topology can correct data processing deviations in real time without requiring professionals.
It should be noted that relational terms such as "first" and "second" are used herein only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The embodiments in this specification are described in a related manner; identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the device embodiment is described relatively briefly because it is substantially similar to the method embodiment; for related details, refer to the description of the method embodiment.
Those of ordinary skill in the art will understand that all or part of the steps in the above method embodiments can be completed by a program instructing the relevant hardware; the program can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc.
The above are merely preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention falls within its scope of protection.
Claims (10)
1. A big data processing method, characterized in that the method comprises:
collecting data according to user behaviors such as past browsing history and purchase records;
filtering the collected data using Hadoop in distributed mode to obtain complete and non-duplicated data;
converting the complete and non-duplicated data obtained by filtering into computer language and storing it in a database;
retrieving the information stored in the database, and processing the retrieved data using cloud computing.
2. A big data processing system, characterized in that the system comprises:
a big data acquisition module, configured to collect data according to user behaviors such as past browsing history and purchase records;
a big data filtering module, configured to filter the data collected by the data acquisition module using Hadoop in distributed mode, to obtain complete and non-duplicated data;
a collector, configured to convert the complete and non-duplicated data obtained by the data filtering module into computer language;
a database, in which the computer language produced by the collector from the complete and non-duplicated data can be stored;
an operating system, through which the data stored in the database can be retrieved;
a cloud computing module, which can process the data in the database.
3. The big data processing system according to claim 2, characterized in that the data processing system may further comprise:
a web server, through which the data in multiple databases can be linked together to provide a larger body of data.
4. The big data processing system according to claim 2, characterized in that the operating system is the Linux operating system.
5. The big data processing system according to claim 3, characterized in that the web server is an Apache web server.
6. The big data processing system according to claim 2, characterized in that the database is a MySQL database.
7. The big data processing system according to claim 2, characterized in that the collector is the Perl, PHP, or Python programming language.
8. The big data processing system according to any one of claims 2 to 7, characterized in that the data collected by the data acquisition module undergoes distributed data mining by means of the cloud computing, so that the required data is mined effectively.
9. The big data processing system according to any one of claims 2 to 7, characterized in that the data processing system may further comprise:
a Storm topology framework, through which deviations in data processing can be corrected in real time without requiring a specialist.
10. The big data processing system according to any one of claims 2 to 7, characterized in that the data processing system may further comprise:
a simple Storm topology with MapReduce functionality, which can correct deviations in data processing in real time.
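Claims 8 and 10 refer to distributed data mining and MapReduce functionality without giving detail. Purely as an illustration of the map, shuffle, and reduce pattern those terms usually denote, the sketch below counts purchases per item in plain Python. It uses no Hadoop or Storm API, it runs on a single machine, and the record fields and function names are assumptions made for the example rather than anything specified by the claims.

```python
from collections import defaultdict
from functools import reduce


def map_phase(record):
    """Map: emit one (item, 1) pair for each purchase record."""
    if record["action"] == "buy":
        yield record["item"], 1


def shuffle(pairs):
    """Shuffle: group emitted values by key, as a framework would do
    between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(values):
    """Reduce: sum the values collected for one key."""
    return reduce(lambda a, b: a + b, values, 0)


def purchase_counts(records):
    """Run map, shuffle, and reduce over an iterable of records."""
    pairs = (pair for rec in records for pair in map_phase(rec))
    return {key: reduce_phase(vals) for key, vals in shuffle(pairs).items()}


# Hypothetical sample data for the sketch.
RECORDS = [
    {"user_id": "u1", "action": "buy", "item": "book"},
    {"user_id": "u2", "action": "view", "item": "book"},
    {"user_id": "u2", "action": "buy", "item": "book"},
    {"user_id": "u3", "action": "buy", "item": "lamp"},
]

if __name__ == "__main__":
    print(purchase_counts(RECORDS))
```

In a real cluster the map and reduce functions would run on different nodes and the grouping step would be handled by the framework; only the division of work into the three phases carries over from this sketch.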
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710356324.2A CN107153710A (en) | 2017-05-19 | 2017-05-19 | A kind of big data processing method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710356324.2A CN107153710A (en) | 2017-05-19 | 2017-05-19 | A kind of big data processing method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107153710A (en) | 2017-09-12 |
Family
ID=59793610
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710356324.2A Withdrawn CN107153710A (en) | 2017-05-19 | 2017-05-19 | A kind of big data processing method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107153710A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107797768A (en) * | 2017-10-11 | 2018-03-13 | 南京东方金信数据服务有限公司 | A kind of method and system for handling big data |
CN108334599A (en) * | 2018-01-31 | 2018-07-27 | 佛山市聚成知识产权服务有限公司 | A kind of analysis system based on big data |
CN108446306A (en) * | 2018-01-31 | 2018-08-24 | 佛山市聚成知识产权服务有限公司 | A kind of processing equipment of big data |
CN111416889A (en) * | 2020-01-16 | 2020-07-14 | 重庆大学 | Communication method and system adapted through GATT and exception handling |
CN114144773A (en) * | 2019-08-01 | 2022-03-04 | 国际商业机器公司 | Adjusting conversational flow based on behavior in human-machine cognitive interactions |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104361110A (en) * | 2014-12-01 | 2015-02-18 | 广东电网有限责任公司清远供电局 | Mass electricity consumption data analysis system as well as real-time calculation method and data mining method |
CN104376053A (en) * | 2014-11-04 | 2015-02-25 | 南京信息工程大学 | Storage and retrieval method based on massive meteorological data |
CN105205055A (en) * | 2014-06-06 | 2015-12-30 | 上海商会网网络信息技术有限公司 | Big data analyzing system |
CN105812394A (en) * | 2016-05-24 | 2016-07-27 | 王四春 | Novel application of cloud computing to cross-border electronic commerce |
CN106599174A (en) * | 2016-12-12 | 2017-04-26 | 国云科技股份有限公司 | Real-time news recommendation system and method thereof |
2017-05-19: CN application CN201710356324.2A (CN107153710A), status: Withdrawn
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105205055A (en) * | 2014-06-06 | 2015-12-30 | 上海商会网网络信息技术有限公司 | Big data analyzing system |
CN104376053A (en) * | 2014-11-04 | 2015-02-25 | 南京信息工程大学 | Storage and retrieval method based on massive meteorological data |
CN104361110A (en) * | 2014-12-01 | 2015-02-18 | 广东电网有限责任公司清远供电局 | Mass electricity consumption data analysis system as well as real-time calculation method and data mining method |
CN105812394A (en) * | 2016-05-24 | 2016-07-27 | 王四春 | Novel application of cloud computing to cross-border electronic commerce |
CN106599174A (en) * | 2016-12-12 | 2017-04-26 | 国云科技股份有限公司 | Real-time news recommendation system and method thereof |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107797768A (en) * | 2017-10-11 | 2018-03-13 | 南京东方金信数据服务有限公司 | A kind of method and system for handling big data |
CN108334599A (en) * | 2018-01-31 | 2018-07-27 | 佛山市聚成知识产权服务有限公司 | A kind of analysis system based on big data |
CN108446306A (en) * | 2018-01-31 | 2018-08-24 | 佛山市聚成知识产权服务有限公司 | A kind of processing equipment of big data |
CN114144773A (en) * | 2019-08-01 | 2022-03-04 | 国际商业机器公司 | Adjusting conversational flow based on behavior in human-machine cognitive interactions |
CN114144773B (en) * | 2019-08-01 | 2022-10-28 | 国际商业机器公司 | Adjusting conversational flow based on behavior in human-machine cognitive interactions |
CN111416889A (en) * | 2020-01-16 | 2020-07-14 | 重庆大学 | Communication method and system adapted through GATT and exception handling |
CN111416889B (en) * | 2020-01-16 | 2022-03-04 | 重庆大学 | Communication method and system adapted through GATT and exception handling |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN105205055A (en) | Big data analyzing system | |
CN107153710A (en) | A kind of big data processing method and system | |
JP7465939B2 (en) | A Novel Non-parametric Statistical Behavioral Identification Ecosystem for Power Fraud Detection | |
Zheng et al. | Real-time intelligent big data processing: technology, platform, and applications | |
US10110687B2 (en) | Session based web usage reporter | |
CN111949834B (en) | Site selection method and site selection platform system | |
Bordin et al. | Dspbench: A suite of benchmark applications for distributed data stream processing systems | |
US20140172506A1 (en) | Customer segmentation | |
CN108885627A (en) | Inquiry, that is, service system of query result data is provided to Terminal Server Client | |
CN107590681A (en) | A kind of big data analysis system | |
CN105976242A (en) | Transaction fraud detection method and system based on real-time streaming data analysis | |
CN108268565B (en) | Method and system for processing user browsing behavior data based on data warehouse | |
CN104394118A (en) | User identity identification method and system | |
WO2018223672A1 (en) | Data processing method and device | |
WO2019242343A1 (en) | Marketing information release platform construction method and apparatus | |
WO2016127632A1 (en) | Method, system, and computer device for electronic payment behavior-based data processing | |
Balar et al. | Forecasting consumer behavior with innovative value proposition for organizations using big data analytics | |
CN105976226A (en) | Internet E-commerce platform | |
US20140280237A1 (en) | Method and system for identifying sets of social look-alike users | |
CN115168460A (en) | Data processing method, data transaction system, device and storage medium | |
US11494788B1 (en) | Triggering supplemental channel communications based on data from non-transactional communication sessions | |
WO2015041950A1 (en) | Method and system for determining a next best offer | |
Xu et al. | Novel model of e-commerce marketing based on big data analysis and processing | |
CN108446306A (en) | A kind of processing equipment of big data | |
CN108334599A (en) | A kind of analysis system based on big data |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WW01 | Invention patent application withdrawn after publication | Application publication date: 2017-09-12 |