CN108664549A - Big data processing system, method and apparatus


Info

Publication number
CN108664549A
Authority
CN
China
Prior art keywords
data
user
user behavior
behavior data
index
Legal status
Pending
Application number
CN201810268396.6A
Other languages
Chinese (zh)
Inventor
林炳文
Current Assignee
Advanced Nova Technology Singapore Holdings Ltd
Original Assignee
Alibaba Group Holding Ltd
Application filed by Alibaba Group Holding Ltd
Priority to CN201810268396.6A
Publication of CN108664549A
Legal status: Pending (current)


Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

This application discloses a big data processing system, method and apparatus. The system includes: a data acquisition platform that obtains the user behavior data generated while a user uses Internet resources; a data processing platform that performs distributed processing on the user behavior data to obtain feature indicators of that data, the feature indicators characterizing the user's usage of the Internet resources; and a data storage platform that stores the feature indicators using distributed storage.

Description

Big data processing system, method and apparatus
Technical field
This application relates to the field of computer technology, and in particular to a big data processing system, method and apparatus.
Background
With the rapid development of Internet technology, more and more resource providers recommend various resources to users over the Internet, for example lending resources and sales resources. Once these resources have been recommended, users can use them (including viewing, obtaining, and consuming them) over the Internet. After users have used the resources, the resource provider can analyze the user behavior data they generated and evaluate how the resources are being used based on the results.
As the number of Internet users grows, so does the volume of user behavior data generated after a resource provider recommends resources over the Internet. When this data is analyzed, its sheer volume slows data processing, so the usage of the resources cannot be evaluated effectively.
Summary
The embodiments of this application provide a big data processing system, method and apparatus to solve the following problem: after resources are recommended to users, the large volume of user behavior data they generate slows data processing, which prevents the usage of the resources from being evaluated effectively.
To solve the above technical problem, the embodiments of this application are implemented as follows:
In a first aspect, a big data processing system is provided, including:
A data acquisition platform that obtains the user behavior data generated while a user uses Internet resources;
A data processing platform that performs distributed processing on the user behavior data to obtain feature indicators of the user behavior data, the feature indicators characterizing the user's usage of the Internet resources;
A data storage platform that stores the feature indicators using distributed storage.
In a second aspect, a big data processing method is provided, including:
Obtaining the user behavior data generated while a user uses Internet resources;
Performing distributed processing on the user behavior data to obtain feature indicators of the user behavior data, the feature indicators characterizing the user's usage of the Internet resources;
Storing the feature indicators using distributed storage.
In a third aspect, a data processing apparatus is provided, including:
An acquisition unit that obtains the user behavior data generated while a user uses Internet resources;
A processing unit that performs distributed processing on the user behavior data to obtain feature indicators of the user behavior data, the feature indicators characterizing the user's usage of the Internet resources;
A storage unit that stores the feature indicators using distributed storage.
In a fourth aspect, an electronic device is provided, the electronic device including:
A processor; and
A memory arranged to store computer-executable instructions that, when executed, cause the processor to perform the following operations:
Obtaining the user behavior data generated while a user uses Internet resources;
Performing distributed processing on the user behavior data to obtain feature indicators of the user behavior data, the feature indicators characterizing the user's usage of the Internet resources;
Storing the feature indicators using distributed storage.
In a fifth aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores one or more programs which, when executed by an electronic device including multiple application programs, cause the electronic device to perform the following method:
Obtaining the user behavior data generated while a user uses Internet resources;
Performing distributed processing on the user behavior data to obtain feature indicators of the user behavior data, the feature indicators characterizing the user's usage of the Internet resources;
Storing the feature indicators using distributed storage.
At least one of the above technical solutions adopted in the embodiments of this application achieves the following beneficial effects:
In the technical solutions provided by the embodiments of this application, the data acquisition platform obtains the user behavior data generated while users use Internet resources; the data processing platform processes this data to obtain feature indicators characterizing users' usage of the Internet resources; and the data storage platform stores the feature indicators using distributed storage. Because the data processing platform can process the user behavior data in a distributed manner, data is processed faster and the indicators characterizing users' usage of Internet resources are obtained sooner. In addition, storing data in a distributed manner also effectively increases storage capacity.
Brief description of the drawings
To describe the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of this application, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of a big data processing system according to an embodiment of this application;
Fig. 2 is a schematic diagram of a big data processing method according to an embodiment of this application;
Fig. 3 is a schematic flowchart of a big data processing method according to an embodiment of this application;
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of this application;
Fig. 5 is a schematic structural diagram of a big data processing apparatus according to an embodiment of this application.
Detailed description
Generally, after recommending Internet resources to users over the Internet, a resource provider analyzes the user behavior data generated while users use those resources, obtains various indicators characterizing users' usage of the resources, and evaluates that usage according to these indicators.
Take a lending-resource provider recommending a loan product to users over the Internet as an example. The provider can recommend the loan product through channels such as its official website, mobile app, WeChat official account, and service window. After browsing to the loan product, a user can click to view it, register for it, borrow money through it, and so on. While the user uses the loan product (clicking, viewing, registering, borrowing), the provider can collect the user behavior data generated and, from that data, compute indicators such as the product's page views, number of registered users, number of users granted credit, and total loan amount. The provider can then evaluate the usage of the recommended loan product from these indicators, for example its marketing effectiveness and the cost and revenue of each recommendation channel.
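As an illustration of how such indicators can be derived, the following Python sketch computes the four loan-product indicators named above from a list of events. The `Event` record, its field names, and the action labels (`view`, `register`, `credit`, `borrow`) are assumptions made for this example, not a format defined by the application.

```python
from dataclasses import dataclass

# Hypothetical event record; field names and action labels are assumptions.
@dataclass(frozen=True)
class Event:
    user_id: str
    action: str          # e.g. "view", "register", "credit", "borrow"
    amount: float = 0.0  # loan amount, only meaningful for "borrow"

def loan_product_metrics(events):
    """Compute the example indicators named in the text from raw events."""
    page_views = sum(1 for e in events if e.action == "view")
    registered = {e.user_id for e in events if e.action == "register"}
    credited = {e.user_id for e in events if e.action == "credit"}
    total_loan = sum(e.amount for e in events if e.action == "borrow")
    return {
        "page_views": page_views,
        "registered_users": len(registered),  # distinct users, not events
        "credit_users": len(credited),
        "total_loan_amount": total_loan,
    }

events = [
    Event("u1", "view"), Event("u1", "view"), Event("u2", "view"),
    Event("u1", "register"), Event("u2", "register"),
    Event("u1", "credit"),
    Event("u1", "borrow", 500.0), Event("u1", "borrow", 250.0),
]
print(loan_product_metrics(events))
# {'page_views': 3, 'registered_users': 2, 'credit_users': 1, 'total_loan_amount': 750.0}
```

User counts are taken over distinct user ids, so repeated registrations by the same user are not double-counted; whether the application counts users or events this way is not stated.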
In the prior art, after obtaining the user behavior data generated by users of Internet resources, resource providers generally use a relational database to compute statistics on and analyze that data. Specifically, the relational database first caches the user behavior data in its internal two-dimensional tables, with the data in multiple tables related to one another; the data is then analyzed based on the relationships between the tables to obtain the various indicators of users' usage of the Internet resources; finally, the resulting indicators are stored in the relational database. The resource recommender can then evaluate users' usage of the Internet resources from the indicators stored in the relational database.
However, as the number of Internet users grows, so does the volume of user behavior data generated while using Internet resources, and analyzing that data with a relational database has at least the following drawbacks:
(1) A relational database cannot support storage of user behavior data at the scale of tens of millions of records or more;
(2) When analyzing user behavior data, a relational database must join many two-dimensional tables, so when the data volume is large, processing is slow;
(3) Because of its insufficient storage capacity, a relational database easily loses user behavior data, for example the data generated by users within a certain time period. Once that data is lost, the user's usage of the Internet resources in that period cannot be analyzed, and neither can the usage over the user's entire life cycle with the resources (for a loan product, for example, the life cycle may include clicking, registering, being granted credit, and borrowing).
It can be seen that when existing approaches analyze user behavior data with a relational database and the data volume is large, the database's insufficient storage and processing capacity makes data processing difficult, so the indicators for Internet resources cannot be obtained effectively and the usage of the resources cannot be evaluated effectively.
In view of this, the embodiments of this application provide a big data processing system, method and apparatus. The big data processing system includes: a data acquisition platform that obtains the user behavior data generated while a user uses Internet resources; a data processing platform that performs distributed processing on the user behavior data to obtain feature indicators of the data, the feature indicators characterizing the user's usage of the Internet resources; and a data storage platform that stores the feature indicators using distributed storage.
Compared with the prior art, the technical solutions in the embodiments of this application have at least the following beneficial effects:
(1) since the method for distributed treatment may be used to user in the mistake using Internet resources in data processing platform (DPP) The user behavior data generated in journey is handled, without being associated with multiple tables, it is thus possible to improve data processing speed, in turn The indices for characterizing user to the service condition of Internet resources can quickly be obtained;
(2) due to being stored to data by the way of distributed storage, the storage energy of data can also be effectively improved Power can support the data of up to ten million grades or more to store;
(3) since the data storage capacities of the embodiment of the present application are stronger, data will not lose easily, therefore, can obtain User in the entire service life of Internet resources to the service condition of Internet resources, and then can be to Internet resources Service condition is effectively assessed.
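To make point (3) concrete, the sketch below counts how many distinct users reach each stage of the loan-product life cycle mentioned earlier (click, register, credit, borrow). The `(user_id, action)` event shape and the stage names are illustrative assumptions. If the events of one stage are lost, that stage is simply undercounted, which is the failure mode the application attributes to insufficient storage.

```python
# Hypothetical life-cycle stages for a loan product, in order.
STAGES = ["click", "register", "credit", "borrow"]

def funnel(events):
    """Count distinct users reaching each stage; events are (user_id, action) pairs."""
    reached = {stage: set() for stage in STAGES}
    for user_id, action in events:
        if action in reached:  # ignore actions outside the life cycle
            reached[action].add(user_id)
    return [(stage, len(reached[stage])) for stage in STAGES]

events = [("u1", "click"), ("u2", "click"), ("u3", "click"),
          ("u1", "register"), ("u2", "register"),
          ("u1", "credit"), ("u1", "borrow")]
print(funnel(events))
# [('click', 3), ('register', 2), ('credit', 1), ('borrow', 1)]
```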
To enable those skilled in the art to better understand the technical solutions in this application, the technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort shall fall within the protection scope of this application.
It should be noted that, in the technical solutions provided by the embodiments of this application, the resource provider can recommend Internet resources to users over the Internet, specifically through channels such as its official website, mobile app, WeChat official account, and service window. The Internet resources may be lending resources, sales resources, or other resources that can be recommended over the Internet; these are not enumerated one by one here.
After recommending Internet resources to users, the resource recommender can use the technical solutions provided by the embodiments of this application to process the user behavior data generated while users use the resources, quickly obtain the indicators characterizing users' usage of the resources, and then use these indicators to evaluate that usage effectively.
It should also be noted that the platforms for data acquisition, processing, and storage described in the embodiments of this application can be understood as functionally integrated systems, which may be embodied as clusters or systems composed of one or more servers, computers, and the like.
The technical solutions provided by the embodiments of this application are described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic structural diagram of a big data processing system 10 according to an embodiment of this application. The big data processing system 10 may include a data acquisition platform 11, a data processing platform 12, and a data storage platform 13, where:
The data acquisition platform 11 obtains the user behavior data generated while a user uses Internet resources;
The data processing platform 12 performs distributed processing on the user behavior data to obtain feature indicators of the user behavior data, the feature indicators characterizing the user's usage of the Internet resources;
The data storage platform 13 stores the feature indicators using distributed storage.
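The division of labor among platforms 11, 12 and 13 can be sketched in a single process, as below. This is only an illustrative model under stated assumptions: the `user_id,action` input format, the partition count, crc32 hash partitioning, and the in-memory `ShardedStore` are stand-ins chosen for this example; a real deployment would use the clusters named later in the description.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

NUM_PARTITIONS = 4  # assumed degree of parallelism / number of shards

def stable_hash(key):
    """Deterministic hash (Python's built-in hash() of str is salted per process)."""
    return zlib.crc32(key.encode("utf-8"))

def acquire(raw_lines):
    """Platform 11 (sketch): parse hypothetical 'user_id,action' lines."""
    return [tuple(line.split(",", 1)) for line in raw_lines]

def partition(events, n=NUM_PARTITIONS):
    """Route events by user id so each user's events land in one partition."""
    parts = [[] for _ in range(n)]
    for user_id, action in events:
        parts[stable_hash(user_id) % n].append((user_id, action))
    return parts

def count_actions(part):
    """Map step: count actions within one partition."""
    return Counter(action for _, action in part)

def process(events):
    """Platform 12 (sketch): map partitions in parallel, then merge partial counts."""
    with ThreadPoolExecutor(max_workers=NUM_PARTITIONS) as pool:
        partials = list(pool.map(count_actions, partition(events)))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts
    return dict(total)

class ShardedStore:
    """Platform 13 (sketch): key-value pairs spread across in-memory shards."""

    def __init__(self, n=NUM_PARTITIONS):
        self.shards = [{} for _ in range(n)]

    def _shard(self, key):
        return self.shards[stable_hash(key) % len(self.shards)]

    def put(self, key, value):
        self._shard(key)[key] = value

    def get(self, key):
        return self._shard(key).get(key)

store = ShardedStore()
indicators = process(acquire(["u1,view", "u2,view", "u1,register"]))
for name, value in indicators.items():
    store.put(name, value)
print(store.get("view"), store.get("register"))  # 2 1
```

Threads keep the sketch portable; under CPython's GIL they show the map/merge structure rather than a real speed-up, which in the described system comes from spreading partitions across machines.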
In this embodiment of the application, after the resource recommender recommends Internet resources to users, the data acquisition platform 11 can obtain the user behavior data generated while users use those resources (including operations such as clicking, viewing, and obtaining them). The data obtained by the data acquisition platform 11 can be user behavior data generated in real time.
When obtaining a user's behavior data, the data acquisition platform 11 can specifically obtain at least one of the following kinds of user behavior data:
(1) Logs generated while the user uses the Internet resources through web pages.
After the resource recommender recommends Internet resources to users through a browser, users can use the resources (click, view, obtain them, and so on) in the browser's web pages, and the data acquisition platform 11 can obtain the logs generated while users use the resources through web pages.
(2) Logs generated while the user uses the Internet resources through application software.
After the resource recommender recommends Internet resources to users through application software (its official app or another app), users can use the resources within the app, and the data acquisition platform 11 can obtain the logs generated while users use the resources through the app.
(3) Logs generated by the user in other channels.
After the resource recommender recommends Internet resources through channels other than browsers and apps (for example, official accounts), the data acquisition platform 11 can obtain the logs generated while users use the resources through those channels.
(4) Logs generated by the user in the activity systems for the Internet resources.
The resource recommender can promote Internet resources through promotional campaigns or advertisements. While users use the resources through the promotion system or the advertising system, the data acquisition platform 11 can obtain the logs they generate.
(5) The user's business information for the Internet resources.
After obtaining at least one of the four kinds of logs described above, the data acquisition platform 11 can parse the logs to obtain the user's service request serial number and, using that serial number, retrieve the user's business information for the Internet resources from the user behavior linking system, which stores service request serial numbers together with the business information corresponding to each.
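Source (5) amounts to a join from logs to business records via the service request serial number. The sketch below assumes a log line containing `req_seq=<serial>` and models the user behavior linking system as a plain dictionary; both the log format and the field name `req_seq` are hypothetical.

```python
import re

# Hypothetical log format: whitespace-separated key=value pairs.
SEQ_PATTERN = re.compile(r"\breq_seq=(\S+)")

def extract_request_seq(log_line):
    """Return the service request serial number in a log line, or None."""
    match = SEQ_PATTERN.search(log_line)
    return match.group(1) if match else None

def attach_business_info(log_lines, linking_store):
    """Join each log line to business info via its request serial number.

    linking_store stands in for the user behavior linking system: a mapping
    from request serial number to business information.
    """
    joined = []
    for line in log_lines:
        seq = extract_request_seq(line)
        if seq is not None and seq in linking_store:
            joined.append((seq, linking_store[seq]))
    return joined

logs = [
    "ts=2018-03-29T10:00:00 channel=web user=u1 req_seq=R001 action=register",
    "ts=2018-03-29T10:05:00 channel=app user=u2 action=view",
]
linking = {"R001": {"product": "loan_a", "status": "registered"}}
print(attach_business_info(logs, linking))
# [('R001', {'product': 'loan_a', 'status': 'registered'})]
```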
In the embodiments of this application, the data acquisition platform 11 can be a cluster capable of collecting users' action stream data. Preferably, to make it easier to obtain the user behavior data generated while users use Internet resources, the data acquisition platform 11 can be a Kafka cluster. Kafka is a high-throughput distributed publish-subscribe messaging system that can handle all of a user's action stream data on the Internet.
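Kafka keeps per-partition, append-only logs and, for keyed records, routes all records with the same key to the same partition, which preserves the order of one user's events. The toy class below models only that idea in memory; it is not the Kafka client API. Kafka's default Java partitioner hashes keys with murmur2, whereas this sketch uses `zlib.crc32` to stay dependency-free, and the key-to-partition mapping holds only while the partition count is unchanged.

```python
import zlib

class InMemoryTopic:
    """Toy stand-in for a partitioned topic (not the Kafka client API)."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key, value):
        """Append a record to the partition chosen by hashing its key."""
        p = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[p].append(value)          # append-only log per partition
        return p, len(self.partitions[p]) - 1     # (partition, offset)

    def consume(self, partition, offset):
        """Read records from a partition starting at a given offset."""
        return self.partitions[partition][offset:]

topic = InMemoryTopic(num_partitions=3)
p1, _ = topic.produce("u1", "view")
p2, _ = topic.produce("u1", "register")
assert p1 == p2  # same key, same partition
print(topic.consume(p1, 0))  # ['view', 'register']
```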
It should be noted that the specific contents of the five kinds of user behavior data above may overlap. Nevertheless, to avoid omitting user behavior data, which would prevent the usage of the Internet resources from being evaluated effectively, the data acquisition platform 11 preferably obtains all five kinds of user behavior data described above.
After the data acquisition platform 11 obtains user behavior data, it can send the data to the data processing platform 12 so that the data processing platform 12 can process it in a distributed manner; alternatively, the data processing platform 12 can actively pull user behavior data from the data acquisition platform 11 and process it in a distributed manner. No specific limitation is imposed here.
Before processing the user behavior data, the data processing platform 12 can clean (preprocess) it, which includes removing erroneous records and removing duplicate records. After cleaning, it can perform distributed processing on the cleaned user behavior data.
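A minimal cleaning step of the kind described above might look like the following. The record schema, the validity rules (required fields present and non-empty, non-negative integer timestamp) and deduplication by `event_id` are all assumptions for illustration; the application does not specify what counts as erroneous or duplicate data.

```python
REQUIRED = ("event_id", "user_id", "action", "ts")  # assumed schema

def is_valid(record):
    """Assumed rule: required fields present and non-empty, ts a non-negative int."""
    if any(record.get(field) in (None, "") for field in REQUIRED):
        return False
    return isinstance(record["ts"], int) and record["ts"] >= 0

def clean(records):
    """Drop invalid records and duplicates (same event_id), keeping the first seen."""
    seen, cleaned = set(), []
    for record in records:
        if not is_valid(record) or record["event_id"] in seen:
            continue
        seen.add(record["event_id"])
        cleaned.append(record)
    return cleaned

raw = [
    {"event_id": "e1", "user_id": "u1", "action": "view", "ts": 100},
    {"event_id": "e1", "user_id": "u1", "action": "view", "ts": 100},   # duplicate
    {"event_id": "e2", "user_id": "", "action": "view", "ts": 101},     # missing user
    {"event_id": "e3", "user_id": "u2", "action": "view", "ts": -5},    # bad timestamp
    {"event_id": "e4", "user_id": "u2", "action": "register", "ts": 102},
]
print([r["event_id"] for r in clean(raw)])  # ['e1', 'e4']
```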
After performing distributed processing on the user behavior data, the data processing platform 12 can obtain at least one feature indicator characterizing the user's usage of the Internet resources. For example, if the Internet resource is a loan product, the feature indicators can be the product's page views, number of registered users, number of users granted credit, total loan amount, and so on.
Compared with the prior art, because the data processing platform 12 in the embodiments of this application can process the user behavior data generated while users use Internet resources in a distributed manner, without joining multiple tables, the user behavior data is processed faster and the indicators characterizing users' usage of Internet resources are obtained sooner.
In one embodiment of this application, the data processing platform 12 can specifically include a first processing platform 121 and a second processing platform 122.
The first processing platform 121 can perform distributed online processing on the user behavior data to obtain first feature indicators. The first feature indicators can characterize the user's real-time usage of the Internet resources, specifically real-time usage at minute, hour, or day granularity.
The second processing platform 122 can perform distributed offline processing on the user behavior data to obtain second feature indicators. The second feature indicators can characterize the user's historical usage of the Internet resources, specifically historical usage at week, month, or quarter granularity.
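The difference between the two platforms is largely one of window size. The sketch below assigns Unix timestamps to tumbling windows at the granularities listed above (minute, hour and day for online processing; week, month and quarter for offline processing) and counts events per window. Using UTC and ISO week numbering are assumptions of this example.

```python
from collections import Counter
from datetime import datetime, timezone

def bucket(ts, granularity):
    """Map a Unix timestamp (seconds) to a UTC tumbling-window label."""
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    if granularity == "minute":
        return dt.strftime("%Y-%m-%d %H:%M")
    if granularity == "hour":
        return dt.strftime("%Y-%m-%d %H:00")
    if granularity == "day":
        return dt.strftime("%Y-%m-%d")
    if granularity == "week":
        iso_year, iso_week = dt.isocalendar()[:2]
        return f"{iso_year}-W{iso_week:02d}"
    if granularity == "month":
        return dt.strftime("%Y-%m")
    if granularity == "quarter":
        return f"{dt.year}-Q{(dt.month - 1) // 3 + 1}"
    raise ValueError(f"unsupported granularity: {granularity}")

def count_events(timestamps, granularity):
    """Count events per window at the requested granularity."""
    return Counter(bucket(ts, granularity) for ts in timestamps)

t0 = int(datetime(2018, 3, 29, tzinfo=timezone.utc).timestamp())
ts = [t0 + 10, t0 + 50, t0 + 70, t0 + 86400, t0 + 3 * 86400]
print(count_events(ts, "minute"))   # fine-grained, as for online processing
print(count_events(ts, "quarter"))  # coarse-grained, as for offline processing
```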
Compared with the first feature indicators, the second feature indicators can be regarded as coarse-grained indicators and the first feature indicators as fine-grained indicators. In the embodiments of this application, the first and second feature indicators can be cross-verified against each other, and the real-time and historical usage of the Internet resources can be evaluated according to the verification results.
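One possible reading of this cross-verification, offered only as an assumption, is a reconciliation check: fine-grained counts rolled up to the coarse granularity should equal the coarse-grained counts. The sketch below reports every coarse bucket where the two disagree; the bucket labels and the `coarse_of` mapping are illustrative.

```python
def verify(fine_counts, coarse_counts, coarse_of):
    """Check that fine-grained buckets roll up to the coarse-grained totals.

    fine_counts / coarse_counts: dict mapping bucket label -> count.
    coarse_of: function mapping a fine bucket label to its coarse bucket label.
    Returns {coarse_bucket: (rolled_up_fine_total, coarse_total)} for mismatches.
    """
    rolled = {}
    for label, n in fine_counts.items():
        key = coarse_of(label)
        rolled[key] = rolled.get(key, 0) + n
    mismatches = {}
    for key in set(rolled) | set(coarse_counts):
        fine_total, coarse_total = rolled.get(key, 0), coarse_counts.get(key, 0)
        if fine_total != coarse_total:
            mismatches[key] = (fine_total, coarse_total)
    return mismatches

daily = {"2018-03-29": 120, "2018-03-30": 80, "2018-04-01": 40}
monthly = {"2018-03": 200, "2018-04": 35}  # April total disagrees
print(verify(daily, monthly, lambda day: day[:7]))  # {'2018-04': (40, 35)}
```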
Whereas the prior art uses a relational database to process user behavior data offline, the embodiments of this application use the first processing platform 121 to process the user behavior data online and the second processing platform 122 to process it offline. This adds a processing mode: in addition to the second feature indicators, the first feature indicators are also obtained. From the first and second feature indicators, the user's historical and real-time usage of the Internet resources can be derived, so the usage of the resources can be evaluated more effectively from these different views.
It should be noted that, in practical applications, the data processing platform 12 may include at least one of the first processing platform 121 and the second processing platform 122, and process the user behavior data with whichever of them it includes. Compared with the prior art, because data is processed in a distributed manner, processing is faster and the indicators characterizing users' usage of Internet resources are obtained sooner.
After the data processing platform 12 processes the user behavior data as described above and obtains the feature indicators, the data storage platform 13 can store the feature indicators using distributed storage. Whereas relational databases in the prior art have insufficient storage capacity, the embodiments of this application store data in a distributed manner and therefore have greater storage capacity, supporting data at the scale of tens of millions of records or more.
In the embodiments of this application, the data storage platform 13 can be a database with distributed storage capability. The data storage platform 13 can specifically include a first database 131 and a second database 132, both of which can be used to store the first feature indicators using distributed storage. Specifically:
The second database 132 caches the first feature indicators and, at every preset interval, synchronizes the first feature indicators cached during that interval to the first database 131;
The first database 131 stores the first feature indicators in a distributed manner as unstructured data.
After performing online processing on the user behavior data, the first processing platform 121 can cache the resulting real-time first feature indicators in the second database 132. At every preset interval (for example, every minute or every half hour), the second database 132 can synchronize the first feature indicators cached during that interval to the first database 131.
For example, the first processing platform 121 processes the user behavior data in a distributed manner using N processors. After computing its first feature indicators, each processor caches them in the second database 132, which accumulates the first feature indicators from all processors and synchronizes the indicators accumulated within the preset interval (for example, one minute or one hour) to the first database 131. The second database 132 then repeats this cycle: it caches the first feature indicators produced by the N processors and synchronizes those accumulated within the next preset interval to the first database 131.
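The cache-then-synchronize flow can be modeled as follows, with in-memory stand-ins for the second database 132 (a counter that many workers increment) and the first database 131 (a row-keyed store). The class names, the row-key format and the manual call to `flush` are hypothetical; in practice the flush would run on a timer at the preset interval.

```python
import threading
from collections import Counter

class CacheLayer:
    """Stand-in for the second database 132: accumulates increments from workers."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = Counter()

    def incr(self, key, n=1):
        with self._lock:
            self._counts[key] += n

    def drain(self):
        """Atomically take the accumulated counts and reset the cache."""
        with self._lock:
            snapshot, self._counts = self._counts, Counter()
        return snapshot

class WideStore:
    """Stand-in for the first database 131: row key -> {column: value}."""

    def __init__(self):
        self.rows = {}

    def write(self, row_key, columns):
        self.rows.setdefault(row_key, {}).update(columns)

def flush(cache, store, window_label):
    """Run once per preset interval: sync that interval's totals to the store."""
    snapshot = cache.drain()
    if snapshot:
        store.write(window_label, dict(snapshot))

cache, store = CacheLayer(), WideStore()
# N = 3 hypothetical processors each report partial results for the same minute.
workers = [threading.Thread(target=cache.incr, args=("page_views", 10)) for _ in range(3)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
flush(cache, store, "2018-03-29 10:00")
print(store.rows)  # {'2018-03-29 10:00': {'page_views': 30}}
```

Draining under a lock means increments that arrive during a flush are counted in the next interval rather than lost.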
In the embodiments of this application, the first feature indicators are first cached in the second database 132 and then synchronized from the second database 132 to the first database 131. This effectively avoids problems such as data congestion and slow storage that the large volume of data to be written would cause if the first feature indicators were stored directly in the first database 131.
After the second database 132 synchronizes the first feature indicators to the first database 131, the first database 131 can store them using distributed storage. In this embodiment, to make it easy to store more data, the first database 131 can store the first feature indicators as unstructured data.
In the embodiments of this application, the first processing platform 121 can be a stream processing platform. In one implementation, to facilitate online processing of the user behavior data and storage of the first feature indicators, the first processing platform 121 can be Storm; the first database 131 can be a distributed database capable of storing unstructured data, specifically HBase; and the second database 132 can be a distributed database that stores data as key-value pairs, specifically Redis. Storm is a distributed real-time big data processing framework; HBase is a distributed, column-oriented open-source database; and Redis is an in-memory key-value database that also supports persistence.
In the embodiments of this application, the data storage platform 13 can further include a third database 133 and a fourth database 134. The third database 133 can be used to store the second feature indicators, and the fourth database 134 can be used to store user behavior data; both use distributed storage. Specifically:
The fourth database 134 stores, in a distributed manner, the user behavior data obtained by the data acquisition platform 11;
The second processing platform 122 obtains from the fourth database 134 the user behavior data of users within a set time period, and processes the user behavior data within that time period to obtain the second feature indicators;
The third database 133 stores the second feature indicators in a distributed manner as structured data.
When the second processing platform 122 processes user behavior data offline, the data to be processed is historical, whereas the data acquisition platform 11 obtains real-time user behavior data. To facilitate offline processing by the second processing platform 122, the fourth database 134 can therefore store the real-time user behavior data as it arrives, so that the second processing platform 122 can perform offline processing on the user behavior data stored in the fourth database 134.
When performing offline processing, the second processing platform 122 can obtain the user behavior data within a set time period from the fourth database 134, for example the data from the month preceding the current moment. After obtaining it, the platform can clean the data and perform distributed processing on the cleaned data to obtain the second feature indicators, which reflect the user's usage of the Internet resources within the set time period.
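A sketch of this offline step, assuming the fourth database 134 is represented as a list of `(timestamp, user_id, action)` rows and approximating "the previous month" as the 30 days before `now`. Cleaning is omitted for brevity, and the output fields are illustrative choices, not indicators defined by the application.

```python
from datetime import datetime, timedelta, timezone

def offline_batch(raw_rows, now, days=30):
    """Aggregate raw behavior rows in the window [now - days, now).

    raw_rows stands in for the fourth database 134 and holds
    (timestamp, user_id, action) tuples with timezone-aware timestamps.
    """
    start = now - timedelta(days=days)
    window = [row for row in raw_rows if start <= row[0] < now]
    per_action = {}
    for _, _, action in window:
        per_action[action] = per_action.get(action, 0) + 1
    return {
        "period_start": start.date().isoformat(),
        "period_end": now.date().isoformat(),
        "events": len(window),
        "distinct_users": len({user for _, user, _ in window}),
        "per_action": per_action,
    }

utc = timezone.utc
now = datetime(2018, 4, 1, tzinfo=utc)
rows = [
    (datetime(2018, 3, 5, 9, 0, tzinfo=utc), "u1", "view"),
    (datetime(2018, 3, 20, 14, 0, tzinfo=utc), "u2", "view"),
    (datetime(2018, 3, 20, 14, 1, tzinfo=utc), "u2", "register"),
    (datetime(2018, 2, 1, 8, 0, tzinfo=utc), "u3", "view"),  # outside the window
]
print(offline_batch(rows, now))
```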
After obtaining the second feature index, the second processing platform 122 may store it in the third database 133. In the embodiments of the present application, because the second feature index is a coarse-grained index compared with the first feature index, its data volume is relatively small; the third database 133 can therefore store the second feature index in a distributed manner in the form of structured data. Of course, to store more data, the second feature index may also be stored in a distributed manner in the form of unstructured data.
In the embodiments of the present application, the second processing platform 122 may be a platform capable of processing large volumes of data. In one implementation, to facilitate offline processing of user behavior data and storage of the second feature index, the second processing platform 122 may be Spark; the third database 133 may be a distributed database capable of storing structured data, specifically Hive; and the fourth database 134 may be a distributed database that stores data in database tables, which may also be Hive. Spark is a fast, general-purpose computing engine designed for large-scale data processing, and Hive is a data warehouse tool that can map structured data files to database tables.
In another embodiment of the present application, the big data processing system may further include a data query platform 14, where:
The data query platform 14 provides an interactive interface and displays the first feature index and the second feature index through the interactive interface, so that the usage of the Internet resources can be evaluated according to the displayed feature indexes.
In the embodiments of the present application, the various indexes of users' usage of Internet resources, which may be the first feature index or the second feature index, can be queried through the interactive interface provided by the data query platform 14. The data query platform 14 displays the feature indexes obtained by the query, so that the resource provider can understand from them how users use the Internet resources and then evaluate that usage.
In the embodiments of the present application, the big data processing system may further include an embedded data storage platform 15, where:
The embedded data storage platform 15 stores the behavior detail data of different users and displays it through an interactive interface; the behavior detail data of the different users is obtained by the data processing platform 12 analyzing the user behavior data.
The embedded data storage platform 15 can provide both data storage and data display. Specifically, it can store users' behavior detail data and display it through an interactive interface. A user's behavior detail data may be the details of that user's behavior data, and may be obtained by the data processing platform 12 analyzing the user behavior data.
In the embodiments of the present application, users' behavior detail data may be real-time data or historical data. Real-time behavior detail data may be obtained by the first processing platform 121 analyzing user behavior data, and historical behavior detail data may be obtained by the second processing platform 122 analyzing user behavior data. After the real-time and historical behavior detail data are obtained, they may be stored in the embedded data storage platform 15.
When a user's behavior detail data needs to be viewed, it can be queried in the embedded data storage platform 15 and then displayed through the interactive interface, so that the provider of the Internet resources can evaluate users' usage of the Internet resources according to that detail data.
In the embodiments of the present application, evaluating the usage of Internet resources according to the feature indexes may include, for example: evaluating the cost and benefit of recommending Internet resources over the Internet according to users' daily browsing indexes for those resources; or evaluating the recommendation effect of each recommendation channel according to the weekly indexes of users obtaining Internet resources through each channel. Other examples are not enumerated here.
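The channel-level evaluation mentioned above could, for example, start from a weekly count of "obtain" events per recommendation channel. The sketch below assumes records shaped as `(channel, day, action)` tuples and groups them by ISO week; these shapes and the ranking rule are hypothetical and are not taken from the application.

```python
from collections import defaultdict
from datetime import date


def weekly_obtains_by_channel(records):
    """records: iterable of (channel, day, action); counts "obtain" events per ISO week."""
    counts = defaultdict(int)
    for channel, day, action in records:
        if action == "obtain":
            iso_year, iso_week, _ = day.isocalendar()
            counts[(channel, iso_year, iso_week)] += 1
    return dict(counts)


def rank_channels(counts, iso_year, iso_week):
    """Return [(channel, count), ...] for one ISO week, highest count first."""
    week = {ch: n for (ch, y, w), n in counts.items() if (y, w) == (iso_year, iso_week)}
    return sorted(week.items(), key=lambda kv: kv[1], reverse=True)


records = [
    ("app", date(2018, 3, 5), "obtain"),
    ("app", date(2018, 3, 6), "obtain"),
    ("web", date(2018, 3, 7), "obtain"),
    ("web", date(2018, 3, 7), "view"),
]
counts = weekly_obtains_by_channel(records)
print(rank_channels(counts, 2018, 10))  # [('app', 2), ('web', 1)]
```

A raw count is only one possible effect measure; dividing by per-channel impressions would give a conversion rate instead.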
Furthermore, because the embodiments of the present application can evaluate users' usage of Internet resources in real time according to the first feature index, that usage can be monitored in real time after Internet resources are recommended to users, making it possible to respond proactively and promptly to emergencies that arise during the recommendation process.
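One simple way to turn a stream of first feature index values into real-time alerts is to compare each new per-minute count against a trailing average. The class below is a hypothetical sketch; the window length and the drop/spike ratios are arbitrary assumptions, not values from the application.

```python
from collections import deque


class RealTimeMonitor:
    """Flags a per-minute count that deviates sharply from the recent average."""

    def __init__(self, window=5, low_ratio=0.5, high_ratio=2.0):
        self.history = deque(maxlen=window)  # most recent per-minute counts
        self.low_ratio = low_ratio
        self.high_ratio = high_ratio

    def observe(self, count):
        """Return "drop", "spike", or None for the new count, then remember it."""
        alert = None
        if len(self.history) == self.history.maxlen:  # only alert once the window is full
            baseline = sum(self.history) / len(self.history)
            if baseline > 0 and count < self.low_ratio * baseline:
                alert = "drop"
            elif baseline > 0 and count > self.high_ratio * baseline:
                alert = "spike"
        self.history.append(count)
        return alert
```

For example, with `window=3` the sequence `10, 10, 10, 10, 3, 100` produces alerts only for the last two values (`"drop"`, then `"spike"`).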
In one implementation, the data query platform 14 may be Presto, which can display the first feature index and the second feature index through a browser-based interactive interface. The embedded data storage platform 15 may be the ES embedded storage system, which can display users' behavior detail data through a browser-based interactive interface. Presto is a data query engine developed by Facebook that can quickly perform interactive analysis on data at the scale of 250 PB and above.
It should be noted that, based on the above, when the second processing platform 122 performs offline processing on user behavior data, it may also use the first feature index and the user behavior detail data produced by the first processing platform 121 to jointly determine the second feature index. Specifically, the second processing platform 122 may obtain, from the first database 131, the first feature index and the user behavior detail data corresponding to the user behavior data within the set time period (alternatively, the first database 131 may periodically send the first feature index and the user behavior detail data within the set time period to the second processing platform 122), and then jointly process the obtained first feature index, the user behavior detail data, and the user behavior data within the set time period obtained from the fourth database 134 to obtain the second feature index.
Fig. 2 is a schematic diagram of a big data processing method according to an embodiment of the present application. The big data processing method shown in Fig. 2 can be implemented by the big data processing system described in the embodiment shown in Fig. 1.
In Fig. 2, the Kafka cluster can be regarded as the data acquisition platform 11 described in the embodiment shown in Fig. 1. After the resource provider recommends Internet resources to users over the Internet, the Kafka cluster can obtain the user behavior data generated while users use the Internet resources. The user behavior data may include at least one of: logs generated while users use the Internet resources through web pages; logs generated while users use the Internet resources through application software; logs generated by users in other channels; logs generated by users in the activity system of the Internet resources; and users' business information for the Internet resources.
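For concreteness, the five kinds of user behavior data listed above could be modeled as a single record type tagged with its source. The sketch assumes each Kafka message is a JSON object with `source`, `user`, `resource`, `action`, and `ts` fields; this message layout and the source codes are assumptions for illustration only. Malformed messages are mapped to `None` so that a later cleaning step can discard them.

```python
import json
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    WEB_LOG = "web"               # logs from use through web pages
    APP_LOG = "app"               # logs from use through application software
    OTHER_CHANNEL_LOG = "channel" # logs generated in other channels
    ACTIVITY_SYSTEM_LOG = "activity"  # logs from the resource's activity system
    BUSINESS_INFO = "business"    # users' business information for the resource


@dataclass(frozen=True)
class BehaviorRecord:
    source: Source
    user_id: str
    resource_id: str
    action: str
    ts_ms: int  # event time in epoch milliseconds


def parse_record(line):
    """Parse one JSON message into a BehaviorRecord; return None if it is malformed."""
    try:
        d = json.loads(line)
        return BehaviorRecord(
            Source(d["source"]),
            str(d["user"]),
            str(d["resource"]),
            str(d["action"]),
            int(d["ts"]),
        )
    except (ValueError, KeyError, TypeError):
        # JSONDecodeError and unknown Source values are both ValueError subclasses/instances.
        return None
```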
After obtaining the user behavior data, the Kafka cluster may, on the one hand, send it to Storm (which can be regarded as the first processing platform described in the embodiment shown in Fig. 1) for online processing; on the other hand, it may also send the user behavior data to Hive (which can be regarded as the fourth database described in the embodiment shown in Fig. 1) for storage, so that Spark can perform offline processing on the user behavior data stored in Hive.
Storm is a stream processing platform. When performing online processing on user behavior data, it may first clean (pre-process) the data, including removing erroneous and duplicate user behavior data. After cleaning, it may perform distributed processing on the user behavior data to obtain the first feature index, which characterizes users' real-time usage of the Internet resources.
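The cleaning and real-time counting described for Storm can be sketched without Storm itself. In the hypothetical code below, an event is an `(event_id, user_id, resource_id, ts_ms)` tuple; events with missing fields or a non-positive timestamp are treated as erroneous, duplicates are detected by `event_id` within a bounded memory of recent ids, and the first feature index is taken to be a per-resource count per minute. All of these choices are assumptions.

```python
from collections import Counter, OrderedDict


class StreamCleaner:
    """Rejects erroneous and duplicate events, remembering at most max_ids recent ids."""

    def __init__(self, max_ids=100_000):
        self._seen = OrderedDict()
        self._max_ids = max_ids

    def accept(self, event):
        event_id, user_id, resource_id, ts_ms = event
        if not user_id or not resource_id or ts_ms <= 0:
            return False  # erroneous event
        if event_id in self._seen:
            return False  # duplicate event
        self._seen[event_id] = None
        if len(self._seen) > self._max_ids:
            self._seen.popitem(last=False)  # forget the oldest id to bound memory
        return True


def minute_index(events, cleaner):
    """Count accepted events per (resource_id, minute since epoch)."""
    counts = Counter()
    for ev in events:
        if cleaner.accept(ev):
            _, _, resource_id, ts_ms = ev
            counts[(resource_id, ts_ms // 60_000)] += 1
    return counts
```

Bounding the remembered ids trades exactness for memory: a duplicate arriving after `max_ids` newer events would slip through.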
After obtaining the first feature index, Storm may temporarily store it in Redis (which can be regarded as the second database described in the embodiment shown in Fig. 1). At every preset time interval, Redis may synchronize the first feature index cached during that interval to HBase (which can be regarded as the first database described in the embodiment shown in Fig. 1).
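The cache-then-synchronize behavior (Redis buffering, periodic synchronization to HBase) can be sketched with in-memory stand-ins. The class below does not call any Redis or HBase client: a `Counter` plays the role of the cache, a plain `dict` plays the role of the HBase table, and the caller supplies the current time so the flush interval can be tested deterministically. These are illustrative assumptions.

```python
from collections import Counter


class PeriodicFlusher:
    """Buffers index increments and flushes them to a backing store every interval_s seconds."""

    def __init__(self, store, interval_s, now_s):
        self.store = store            # stand-in for the HBase table
        self.interval_s = interval_s  # the "preset time interval"
        self.cache = Counter()        # stand-in for the Redis cache
        self.last_flush = now_s

    def add(self, key, count, now_s):
        self.cache[key] += count
        if now_s - self.last_flush >= self.interval_s:
            self.flush(now_s)

    def flush(self, now_s):
        # Merge cached counts into the store, then start a new interval.
        for key, n in self.cache.items():
            self.store[key] = self.store.get(key, 0) + n
        self.cache.clear()
        self.last_flush = now_s
```

Writing to the store in batches rather than per event is the point of the cache: the backing store sees one write per key per interval.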
HBase may store the first feature index at minute granularity, at hour granularity, or at day granularity; specifically, the first feature index may be stored in a distributed manner in the form of unstructured data.
It should be noted that Storm can also actively fetch user behavior data from the Kafka cluster for distributed processing. As shown in Fig. 2, Storm can actively fetch users' business information for the Internet resources from the Kafka cluster according to the users' service request serial numbers, perform distributed processing on that business information to obtain the corresponding first feature index, cache the first feature index in Redis, and then synchronize it to HBase.
In addition, Storm can analyze user behavior data to obtain users' behavior detail data, and store that detail data in HBase via Redis.
In Fig. 2, for Spark to perform offline processing on user behavior data, the user behavior data obtained by the Kafka cluster must first be stored in Hive. For this data caching step, the Kafka cluster may use a Flume sink (Flume being a log collection system) as an intermediary to store the user behavior data in Hive.
After the user behavior data has been stored in Hive, Spark can obtain the user behavior data within a set time period from Hive, clean it, and perform distributed processing on the cleaned data to obtain the second feature index, which characterizes users' historical usage of the Internet resources.
After obtaining the second feature index, Spark may store it in Hive as well. Hive may store the second feature index at week granularity, month granularity, or quarter granularity; specifically, the second feature index may be stored in a distributed manner in the form of structured data.
It should be noted that, when performing offline processing on user behavior data, Spark can also obtain from HBase the first feature index and users' behavior detail data within the set time period, and jointly process the first feature index, the behavior detail data, and the user behavior data within the set time period to obtain the second feature index.
In addition, Spark can analyze the user behavior data within the set time period to obtain users' behavior detail data, and store that detail data in Hive.
In Fig. 2, once the first feature index and users' behavior detail data are stored in HBase, and the second feature index and users' behavior detail data are stored in Hive, these feature indexes and behavior detail data can be displayed through Presto (which can be regarded as the data query platform described in the embodiment shown in Fig. 1).
Presto can display the feature indexes produced by Storm and Spark through a browser-based interactive interface. Specifically, the resource provider enters the content to be queried in the browser's interactive interface; Presto runs the query based on that input and displays the results through the same interface.
In addition, the behavior detail data of users obtained by online and/or offline analysis can also be stored in the ES embedded storage system, which can display it directly through a browser-based interactive interface. Specifically, the resource provider enters the content to be queried in the browser's interactive interface; ES searches its own store based on the input and displays the results through the interface. Fig. 2 shows only the storage and display of the behavior detail data obtained by online analysis.
After viewing the feature indexes or users' behavior detail data, the resource provider can evaluate the usage of the Internet resources.
The big data processing system provided by the embodiments of the present application includes: a data acquisition platform, which obtains the user behavior data generated while users use Internet resources; a data processing platform, which performs distributed processing on the user behavior data to obtain feature indexes of the user behavior data, the feature indexes characterizing users' usage of the Internet resources; and a data storage platform, which stores the feature indexes using distributed storage. Because the data processing platform can use distributed processing to handle the user behavior data generated while users use Internet resources, data processing speed is improved, and the indexes characterizing users' usage of Internet resources can be obtained quickly. Furthermore, because data is stored using distributed storage, storage capacity is also effectively improved.
Fig. 3 is a schematic flowchart of a big data processing method according to an embodiment of the present application. The method can be executed by the big data processing system shown in Fig. 1 or Fig. 2, and is described as follows.
S302: Obtain the user behavior data generated while users use Internet resources.
In S302, after the resource provider recommends Internet resources to users over the Internet, users can use those resources (including clicking, viewing, and obtaining them). The user behavior data generated by users can be obtained while they use the Internet resources.
The user behavior data generated by users may include at least one of: logs generated while users use the Internet resources through web pages; logs generated while users use the Internet resources through application software; logs generated by users in other channels; logs generated by users in the activity system of the Internet resources; and users' business information for the Internet resources. For details, see the embodiment shown in Fig. 1; the description is not repeated here.
After the user behavior data generated while users use Internet resources is obtained, S304 can be performed.
S304: Perform distributed processing on the user behavior data to obtain the feature indexes of the user behavior data.
In S304, distributed processing can be performed on the user behavior data obtained in S302 to obtain feature indexes characterizing users' usage of the Internet resources.
In the embodiments of the present application, performing distributed processing on the user behavior data to obtain the feature indexes of the user behavior data may include:
Performing online processing on the user behavior data to obtain a first feature index;
Performing offline processing on the user behavior data to obtain a second feature index.
Specifically, distributed processing of user behavior data may be performed both online and offline, yielding the first feature index and the second feature index respectively. The first feature index characterizes users' real-time usage of the Internet resources, specifically at minute, hour, or day granularity, and can thus be regarded as a fine-grained index; the second feature index characterizes users' historical usage of the Internet resources, specifically at week, month, or quarter granularity, and can thus be regarded as a coarse-grained index.
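The fine-grained (minute, hour, day) and coarse-grained (week, month, quarter) granularities can be expressed as a single bucketing function that maps a timestamp to a key; counting events per key then yields an index at the chosen granularity. This is a hypothetical sketch; the key formats and the use of ISO weeks are assumptions.

```python
from collections import Counter
from datetime import datetime

FINE = ("minute", "hour", "day")        # first feature index granularities
COARSE = ("week", "month", "quarter")   # second feature index granularities


def bucket_key(ts, granularity):
    """Map a datetime to the key of the time bucket it falls in."""
    if granularity == "minute":
        return ts.strftime("%Y-%m-%d %H:%M")
    if granularity == "hour":
        return ts.strftime("%Y-%m-%d %H:00")
    if granularity == "day":
        return ts.strftime("%Y-%m-%d")
    if granularity == "week":
        iso_year, iso_week, _ = ts.isocalendar()
        return f"{iso_year}-W{iso_week:02d}"
    if granularity == "month":
        return ts.strftime("%Y-%m")
    if granularity == "quarter":
        return f"{ts.year}-Q{(ts.month - 1) // 3 + 1}"
    raise ValueError(f"unknown granularity: {granularity}")


def aggregate(timestamps, granularity):
    """Count events per bucket at the given granularity."""
    return Counter(bucket_key(t, granularity) for t in timestamps)
```

For instance, `bucket_key(datetime(2018, 3, 28, 14, 7), "quarter")` gives `"2018-Q1"`, and the same timestamp falls in week `"2018-W13"`.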
The embodiments of the present application process user behavior data both online and offline. Compared with prior-art approaches that only process user behavior data offline using a relational database, this adds a processing mode and yields both users' historical and real-time usage of Internet resources, so that the usage of Internet resources can be evaluated more effectively from these different views.
In one implementation, user behavior data may be processed only online, only offline, or both online and offline simultaneously. In any of these modes, because the data is processed using distributed processing without joining multiple tables, processing is faster, and the indexes characterizing users' usage of Internet resources can be obtained quickly.
Performing offline processing on the user behavior data to obtain the second feature index may specifically include:
Storing the user behavior data in a distributed manner;
Obtaining, from the stored user behavior data, the user behavior data within a set time period;
Processing the user behavior data within the set time period to obtain the second feature index.
For the specific implementation, see the related content of the embodiment shown in Fig. 1; the description is not repeated here.
Optionally, when performing online and offline analysis on user behavior data, the embodiments of the present application can also obtain users' behavior detail data through analysis; for details, see the related content of the embodiment shown in Fig. 1, which is not repeated here.
After the first feature index and the second feature index are obtained, S306 can be performed.
S306: Store the feature indexes using distributed storage.
In S306, the first feature index and the second feature index obtained in S304 can be stored in a distributed manner.
Storing the first feature index using distributed storage may include:
Caching the first feature index obtained by online processing;
At every preset time interval, storing the first feature index cached during that interval in a distributed manner in the form of unstructured data.
Storing the second feature index using distributed storage may include:
Storing the second feature index in a distributed manner in the form of structured data.
For the specific implementation of the distributed storage of the first feature index and the second feature index, see the related content of the embodiment shown in Fig. 1; the description is not repeated here.
After the first feature index and the second feature index are stored in a distributed manner, the method further includes:
Displaying the first feature index and the second feature index, so that the usage of the Internet resources can be evaluated according to the displayed feature indexes.
For the specific implementation, see the related content of the embodiment shown in Fig. 1; the description is not repeated here.
Optionally, after users' behavior detail data is obtained in S304, the method further includes:
Displaying the behavior detail data of different users, so that the usage of the Internet resources can be evaluated according to that detail data, the behavior detail data of the different users being obtained by performing distributed processing on the user behavior data.
For the specific implementation, see the related content of the embodiment shown in Fig. 1; the description is not repeated here.
The big data processing method provided by the embodiments of the present application obtains the user behavior data generated while users use Internet resources; performs distributed processing on the user behavior data to obtain feature indexes of the user behavior data, the feature indexes characterizing users' usage of the Internet resources; and stores the feature indexes using distributed storage. Because distributed processing can be used to handle the user behavior data generated while users use Internet resources, without joining multiple tables, data processing speed is improved, and the indexes characterizing users' usage of Internet resources can be obtained quickly. Furthermore, because data is stored using distributed storage, storage capacity is also effectively improved.
Specific embodiments of this specification are described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular or sequential order shown to achieve the desired results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application. Referring to Fig. 4, at the hardware level the electronic device includes a processor and, optionally, an internal bus, a network interface, and a memory. The memory may include internal memory, such as high-speed random-access memory (RAM), and may further include non-volatile memory, such as at least one disk storage. Of course, the electronic device may also include hardware required by other services.
The processor, network interface, and memory may be interconnected through the internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one double-headed arrow is used in Fig. 4, which does not mean that there is only one bus or one type of bus.
The memory is used to store programs. Specifically, a program may include program code, and the program code includes computer operation instructions. The memory may include internal memory and non-volatile memory, and provides instructions and data to the processor.
The processor reads the corresponding computer program from the non-volatile memory into internal memory and then runs it, forming a data processing apparatus at the logical level. The processor executes the program stored in the memory and is specifically configured to perform the following operations:
Obtaining the user behavior data generated while users use Internet resources;
Performing distributed processing on the user behavior data to obtain the feature indexes of the user behavior data, the feature indexes characterizing users' usage of the Internet resources;
Storing the feature indexes using distributed storage.
The method executed by the data processing apparatus disclosed in the embodiment shown in Fig. 4 of the present application may be applied to, or implemented by, the processor. The processor may be an integrated circuit chip with signal processing capability. During implementation, each step of the above method may be completed by an integrated logic circuit of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of the present application may be executed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well established in the art, such as random-access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
The electronic device can also perform the method of Fig. 3 and implement the functions of the data processing apparatus in the embodiment shown in Fig. 3; details are not repeated here.
Of course, in addition to software implementations, the electronic device of the present application does not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the entity executing the following processing flow is not limited to logic units and may also be hardware or a logic device.
An embodiment of the present application further provides a computer-readable storage medium storing one or more programs. The one or more programs include instructions that, when executed by a portable electronic device including multiple application programs, cause the portable electronic device to perform the method of the embodiment shown in Fig. 3, specifically the following operations:
Obtaining the user behavior data generated while users use Internet resources;
Performing distributed processing on the user behavior data to obtain the feature indexes of the user behavior data, the feature indexes characterizing users' usage of the Internet resources;
Storing the feature indexes using distributed storage.
Fig. 5 is a schematic structural diagram of a big data processing apparatus 50 according to an embodiment of the present application. Referring to Fig. 5, in a software implementation, the big data processing apparatus 50 may include an acquiring unit 51, a processing unit 52, and a storage unit 53, where:
The acquiring unit 51 obtains the user behavior data generated while users use Internet resources;
The processing unit 52 performs distributed processing on the user behavior data to obtain the feature indexes of the user behavior data, the feature indexes characterizing users' usage of the Internet resources;
The storage unit 53 stores the feature indexes using distributed storage.
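To illustrate how the three units might cooperate, the following sketch wires a hypothetical acquiring unit, processing unit, and storage unit together. "Distributed" processing is simulated on one machine by hash-partitioning events by user, counting each partition in a thread pool, and merging the partial counts; storage is simulated by hash-sharding keys across several dicts. None of these class names or strategies come from the application, and events are assumed to be `(user_id, resource_id)` pairs.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


class AcquiringUnit:
    """Collects user behavior events from some iterable source."""

    def __init__(self, source):
        self.source = source

    def acquire(self):
        return list(self.source)


class ProcessingUnit:
    """Partitions events by user, counts each partition concurrently, merges results."""

    def __init__(self, partitions=4):
        self.partitions = partitions

    def _count(self, chunk):
        return Counter(resource_id for _user_id, resource_id in chunk)

    def process(self, events):
        chunks = [[] for _ in range(self.partitions)]
        for user_id, resource_id in events:
            chunks[hash(user_id) % self.partitions].append((user_id, resource_id))
        total = Counter()
        with ThreadPoolExecutor(max_workers=self.partitions) as pool:
            for partial in pool.map(self._count, chunks):
                total.update(partial)  # merge step
        return total


class StorageUnit:
    """Spreads feature index entries over several shards by key hash."""

    def __init__(self, shards=3):
        self.shards = [dict() for _ in range(shards)]

    def store(self, index):
        for key, value in index.items():
            self.shards[hash(key) % len(self.shards)][key] = value

    def get(self, key):
        return self.shards[hash(key) % len(self.shards)].get(key)
```

Partitioning by user keeps each user's events together, which would matter for per-user features; for the simple per-resource count shown here any partitioning would give the same merged result.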
Optionally, the processing unit 52 performing distributed processing on the user behavior data to obtain the feature indexes of the user behavior data includes:
Performing online processing on the user behavior data to obtain a first feature index;
Performing offline processing on the user behavior data to obtain a second feature index.
Optionally, the storage unit 53 storing the first feature index using distributed storage includes:
Caching the first feature index obtained by online processing;
At every preset time interval, storing the first feature index cached during that interval in a distributed manner in the form of unstructured data.
Optionally, the processing unit 52 performing offline processing on the user behavior data to obtain the second feature index includes:
Storing the user behavior data in a distributed manner;
Obtaining, from the stored user behavior data, the user behavior data within a set time period;
Processing the user behavior data within the set time period to obtain the second feature index.
Optionally, the storage unit 53 storing the second feature index using distributed storage includes:
Storing the second feature index in a distributed manner in the form of structured data.
Optionally, the big data processing apparatus 50 may further include a display unit 54, where:
The display unit 54 displays the first feature index and the second feature index, so that the usage of the Internet resources can be evaluated according to the displayed feature indexes.
Optionally, the display unit 54 may also display the behavior detail data of different users, so that the usage of the Internet resources can be evaluated according to that detail data, the behavior detail data of the different users being obtained by performing distributed processing on the user behavior data.
The big data processing apparatus 50 can also perform the method of Fig. 3 and implement the functions of the data processing apparatus in the embodiment shown in Fig. 3; details are not repeated here.
In short, the foregoing describes merely preferred embodiments of the present application and is not intended to limit its scope of protection. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within its scope of protection.
System, device, module or the unit that above-described embodiment illustrates can specifically realize by computer chip or entity, Or it is realized by the product with certain function.It is a kind of typically to realize that equipment is computer.Specifically, computer for example may be used Think personal computer, laptop computer, cellular phone, camera phone, smart phone, personal digital assistant, media play It is any in device, navigation equipment, electronic mail equipment, game console, tablet computer, wearable device or these equipment The combination of equipment.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to:

* phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), and other types of random access memory (RAM);
* read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory, and other memory technologies;
* compact disc read-only memory (CD-ROM), digital versatile discs (DVD), and other optical storage;
* magnetic cassettes, magnetic tape or magnetic disk storage, and other magnetic storage devices;
* any other non-transmission medium that can be used to store information accessible by a computing device.

As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion. A process, method, commodity, or device that includes a series of elements therefore includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, commodity, or device that includes that element.
The embodiments in this specification are described progressively. For identical or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiment is substantially similar to the method embodiment and is therefore described relatively briefly. For related details, refer to the corresponding parts of the method embodiment.

Claims (18)

1. A big data processing system, including:
A data acquisition platform, which obtains user behavior data generated by a user while using Internet resources;
A data processing platform, which performs distributed processing on the user behavior data to obtain a feature index of the user behavior data, the feature index characterizing the user's usage of the Internet resources;
A data storage platform, which stores the feature index in a distributed manner.
2. The system of claim 1, wherein the data processing platform includes a first processing platform and a second processing platform, wherein:
The first processing platform performs online processing on the user behavior data to obtain a first feature index;
The second processing platform performs offline processing on the user behavior data to obtain a second feature index.
3. The system of claim 2, wherein the data storage platform includes a first database and a second database, wherein:
The second database caches the first feature index and, at every preset time interval, synchronizes the first feature index cached during that interval to the first database;
The first database stores the first feature index in a distributed manner as unstructured data.
4. The system of claim 2, wherein the data storage platform further includes a third database and a fourth database, wherein:
The fourth database stores, in a distributed manner, the user behavior data obtained by the data acquisition platform;
The second processing platform obtains the user's behavior data within a set time period from the fourth database and processes it to obtain the second feature index;
The third database stores the second feature index in a distributed manner as structured data.
5. The system of claim 2, further including a data query platform, wherein:
The data query platform provides an interactive interface and displays the first feature index and the second feature index through it, so that the usage of the Internet resources can be assessed according to the displayed feature indexes.
6. The system of claim 5, further including an embedded data storage platform, wherein:
The embedded data storage platform stores detailed behavior data of different users and displays it through the interactive interface, the detailed behavior data of the different users being obtained by the data processing platform analyzing the user behavior data.
7. The system of claim 1, wherein
The user behavior data obtained by the data acquisition platform includes at least one of: logs generated while the user uses the Internet resources through a web page; logs generated while the user uses the Internet resources through application software; logs generated by the user in an activity system of the Internet resources; and business information of the user with respect to the Internet resources.
8. The system of claim 2, wherein
The data acquisition platform is a cluster that collects user behavior stream data, the first processing platform is a stream processing platform, the second processing platform is a platform capable of processing large volumes of data, and the data storage platform is a database with distributed storage capability.
9. A big data processing method, including:
Obtaining user behavior data generated by a user while using Internet resources;
Performing distributed processing on the user behavior data to obtain a feature index of the user behavior data, the feature index characterizing the user's usage of the Internet resources;
Storing the feature index in a distributed manner.
10. The method of claim 9, wherein performing distributed processing on the user behavior data to obtain the feature index of the user behavior data includes:
Performing online processing on the user behavior data to obtain a first feature index;
Performing offline processing on the user behavior data to obtain a second feature index.
11. The method of claim 10, wherein storing the first feature index in a distributed manner includes:
Caching the first feature index obtained by online processing;
At every preset time interval, storing the first feature index cached during that interval in a distributed manner as unstructured data.
12. The method of claim 10, wherein performing offline processing on the user behavior data to obtain the second feature index includes:
Storing the user behavior data in a distributed manner;
Obtaining, from the stored user behavior data, the user behavior data within a set time period;
Processing the user behavior data within the set time period to obtain the second feature index.
13. The method of claim 12, wherein storing the second feature index in a distributed manner includes:
Storing the second feature index in a distributed manner as structured data.
14. The method of claim 10, further including:
Displaying the first feature index and the second feature index, so that the usage of the Internet resources can be assessed according to the displayed feature indexes.
15. The method of claim 14, further including:
Displaying detailed behavior data of different users, so that the usage of the Internet resources can be assessed according to that data, the detailed behavior data of the different users being obtained by performing distributed processing on the user behavior data.
16. A big data processing apparatus, including:
An acquisition unit, which obtains user behavior data generated by a user while using Internet resources;
A processing unit, which performs distributed processing on the user behavior data to obtain a feature index of the user behavior data, the feature index characterizing the user's usage of the Internet resources;
A storage unit, which stores the feature index in a distributed manner.
17. An electronic device, including:
A processor; and
A memory arranged to store computer-executable instructions that, when executed, cause the processor to perform the following operations:
Obtaining user behavior data generated by a user while using Internet resources;
Performing distributed processing on the user behavior data to obtain a feature index of the user behavior data, the feature index characterizing the user's usage of the Internet resources;
Storing the feature index in a distributed manner.
18. A computer-readable storage medium storing one or more programs that, when executed by an electronic device including a plurality of application programs, cause the electronic device to perform the following method:
Obtaining user behavior data generated by a user while using Internet resources;
Performing distributed processing on the user behavior data to obtain a feature index of the user behavior data, the feature index characterizing the user's usage of the Internet resources;
Storing the feature index in a distributed manner.
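Claims 3 and 11 describe two steps: caching the first (online) feature index, then periodically writing what was cached during each interval to the first database as unstructured data. Below is a minimal sketch under these assumptions:

* The online feature index is a per-resource event count, updated as each event arrives.
* A Python list of JSON strings stands in for the first database.
* The clock is injectable, so the interval logic can be exercised.

`OnlineFeatureCache` and its methods are hypothetical names, not part of the application.

```python
import json
import time
from collections import defaultdict


class OnlineFeatureCache:
    """Caches the online feature index and syncs it to a sink every `interval_s` seconds."""

    def __init__(self, sink, interval_s, clock=time.monotonic):
        self.sink = sink              # stand-in for the first database
        self.interval_s = interval_s  # the "preset time"
        self.clock = clock
        self.buffer = defaultdict(int)
        self.last_flush = clock()

    def on_event(self, resource_id):
        """Online processing: update the cached index as each event arrives."""
        self.buffer[resource_id] += 1
        if self.clock() - self.last_flush >= self.interval_s:
            self.flush()

    def flush(self):
        """Write everything cached in this interval as one schemaless JSON document."""
        if self.buffer:
            doc = {"flushed_at": self.clock(), "counts": dict(self.buffer)}
            self.sink.append(json.dumps(doc))
        self.buffer.clear()
        self.last_flush = self.clock()


class FakeClock:
    """Manually advanced clock for demonstration."""

    def __init__(self):
        self.t = 0.0

    def __call__(self):
        return self.t


clock = FakeClock()
first_db = []
cache = OnlineFeatureCache(first_db, interval_s=60, clock=clock)
cache.on_event("page_a")  # t = 0: cached only
clock.t = 30
cache.on_event("page_a")  # t = 30: cached only
clock.t = 61
cache.on_event("page_b")  # t = 61: interval elapsed, so the cache is synced
```

This sketch only checks the interval when an event arrives, so an idle stream would never be synced. A real deployment would more likely also flush from a timer or scheduler.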
CN201810268396.6A 2018-03-29 2018-03-29 A kind of big data processing system, method and apparatus Pending CN108664549A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810268396.6A CN108664549A (en) 2018-03-29 2018-03-29 A kind of big data processing system, method and apparatus

Publications (1)

Publication Number Publication Date
CN108664549A true CN108664549A (en) 2018-10-16

Family

ID=63782770

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810268396.6A Pending CN108664549A (en) 2018-03-29 2018-03-29 A kind of big data processing system, method and apparatus

Country Status (1)

Country Link
CN (1) CN108664549A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109582693A (en) * 2018-11-26 2019-04-05 成都四方伟业软件股份有限公司 Mathematical logic expression processing method and device based on WEB
CN109766363A (en) * 2019-01-08 2019-05-17 北京江融信科技有限公司 Stream data processing method, system, electronic equipment and storage medium
CN110781238A (en) * 2019-10-08 2020-02-11 中国建设银行股份有限公司 Client view caching method and device based on combination of Redis and Hbase
CN111125042A (en) * 2019-11-13 2020-05-08 中国建设银行股份有限公司 Method and device for determining risk operation event
CN111899047A (en) * 2020-07-14 2020-11-06 拉扎斯网络科技(上海)有限公司 Resource recommendation method and device, computer equipment and computer-readable storage medium
CN113434376A (en) * 2021-06-24 2021-09-24 山东浪潮科学研究院有限公司 Web log analysis method and device based on NoSQL

Citations (2)

Publication number Priority date Publication date Assignee Title
CN106126641A (en) * 2016-06-24 2016-11-16 中国科学技术大学 A kind of real-time recommendation system and method based on Spark
CN106709003A (en) * 2016-12-23 2017-05-24 长沙理工大学 Hadoop-based mass log data processing method

Cited By (9)

Publication number Priority date Publication date Assignee Title
CN109582693A (en) * 2018-11-26 2019-04-05 成都四方伟业软件股份有限公司 Mathematical logic expression processing method and device based on WEB
CN109766363A (en) * 2019-01-08 2019-05-17 北京江融信科技有限公司 Stream data processing method, system, electronic equipment and storage medium
CN109766363B (en) * 2019-01-08 2021-06-11 北京江融信科技有限公司 Streaming data processing method, system, electronic device and storage medium
CN110781238A (en) * 2019-10-08 2020-02-11 中国建设银行股份有限公司 Client view caching method and device based on combination of Redis and Hbase
CN110781238B (en) * 2019-10-08 2022-09-13 中国建设银行股份有限公司 Client view caching method and device based on combination of Redis and Hbase
CN111125042A (en) * 2019-11-13 2020-05-08 中国建设银行股份有限公司 Method and device for determining risk operation event
CN111899047A (en) * 2020-07-14 2020-11-06 拉扎斯网络科技(上海)有限公司 Resource recommendation method and device, computer equipment and computer-readable storage medium
CN113434376A (en) * 2021-06-24 2021-09-24 山东浪潮科学研究院有限公司 Web log analysis method and device based on NoSQL
CN113434376B (en) * 2021-06-24 2023-04-11 山东浪潮科学研究院有限公司 Web log analysis method and device based on NoSQL

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200922

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Applicant before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20200922

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: P.O. Box 847, Fourth Floor, One Capital Place, Grand Cayman, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

TA01 Transfer of patent application right

Effective date of registration: 20240222

Address after: #20-01 Guoco Midtown, 128 Beach Road, Singapore

Applicant after: Advanced Nova Technology (Singapore) Holdings Ltd.

Country or region after: Singapore

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Applicant before: Innovative advanced technology Co.,Ltd.

Country or region before: Cayman Islands
