CN106844588A - Method and system for analyzing user behavior data based on web crawlers - Google Patents


Info

Publication number
CN106844588A
Authority
CN
China
Prior art keywords
data
analysis
user
module
url
Prior art date
Legal status
Pending
Application number
CN201710017268.XA
Other languages
Chinese (zh)
Inventor
欧阳涛 (Ouyang Tao)
Current Assignee
Taizhou Jiji Intellectual Property Operation Co.,Ltd.
Original Assignee
Shanghai Feixun Data Communication Technology Co Ltd
Priority date
2017-01-11
Filing date
2017-01-11
Application filed by Shanghai Feixun Data Communication Technology Co Ltd
Priority to CN201710017268.XA
Publication of CN106844588A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G06F11/3438 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment monitoring of user actions
    • G06F11/3452 Performance evaluation by statistical analysis
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G06F11/3476 Data logging
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/535 Tracking the activity of the user

Abstract

The present invention relates to the field of data processing, and specifically to a method and system for analyzing user behavior data based on web crawlers. The method comprises the following steps: Step 1, obtain raw data on users' Internet behavior; Step 2, use a web crawler to perform intermediate analysis on the data obtained in Step 1 and output standardized user behavior data results; Step 3, perform inter-field correlation analysis and commercial value information extraction on the standardized user behavior data results from Step 2. The invention offers high efficiency and high analysis precision.

Description

Method and system for analyzing user behavior data based on web crawlers
Technical field
The present invention relates to the technical field of data processing, and specifically to a method and system for analyzing user behavior data based on web crawlers.
Background
Smart mobile devices let consumers shop online and seek entertainment anytime and anywhere, free of the constraints of region and time. By analyzing a user's web browsing, we can infer how strongly the user depends on the Internet and classify how the user uses the network: for example, as a potential online consumer, a loyal online consumer, or another type. Analyzing the behavior data of a single user has little value, but applying the same analysis to users nationwide, or even worldwide, reveals enormous commercial value. At the same time, as the volume of Internet behavior data grows, the pressure on data collection, storage, and analysis keeps increasing. Currently, enterprises such as large online stores and home hardware equipment providers can, thanks to their products' market share and technology, capture users' data on such devices. They copy, back up, clean, and parse this data, extract the web pages users browse (i.e., URLs), and from these URLs extract users' network behavior features. However, they cannot yet analyze the data well.
For example, the patent with publication No. CN 101192227B discloses a big-data-based user behavior analysis method and system. A client collects user behavior data in real time and combines the context of each behavior with the page URL, reproducing as fully as possible the real scene of a user browsing a web page and extracting a comprehensive user behavior track, which provides effective data support for analyzing user behavior. A security analysis module safeguards the user behavior data, and a user behavior data ontology model is used to model user behavior, enabling semantic-level sharing and reuse of behavior information and improving model interoperability and reliability. Real-time collection and analysis of user behavior and context data make the results more reliable. A column-oriented database stores the ontology and behavior information, laying the foundation for managing massive data. By combining the processing power and large-scale storage of cloud computing with ontologies, reasoning, and knowledge discovery methods, it analyzes massive user behavior data in real time and obtains user interests promptly, enabling effective and accurate pushes to users. However, the technique used in that disclosure is inadequate for collecting, storing, and analyzing massive data: its efficiency is low and its accuracy is relatively low.
Summary of the invention
An object of the present invention is to provide a method and system for analyzing user behavior data based on web crawlers that offer high efficiency and high analysis precision.
The above technical object of the present invention is achieved by the following technical solution:
A method for analyzing user behavior data based on web crawlers comprises the following steps:
Step 1: obtain raw data on users' Internet behavior;
Step 2: use a web crawler to perform intermediate analysis on the data obtained in Step 1, and output standardized user behavior data results;
Step 3: perform inter-field correlation analysis and commercial value information extraction on the standardized user behavior data results from Step 2.
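The three steps can be pictured as a small pipeline. The following is only a minimal Python sketch under stated assumptions: the `(user_id, url)` record shape, the function names, and the toy rule that takes the first host label as the page category are all hypothetical and are not defined in this patent. Each function stands in for one step so that the data flow between them is visible.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical raw record: (user_id, url). The patent does not define a record format.
RawRecord = tuple[str, str]

def acquire_raw_data() -> list[RawRecord]:
    """Step 1 (stand-in): return raw Internet-behavior records copied from devices."""
    return [
        ("u1", "http://news.example.com/world"),
        ("u1", "http://shop.example.com/item/1"),
        ("u2", "http://news.example.com/tech"),
    ]

def intermediate_analysis(records: list[RawRecord]) -> list[dict]:
    """Step 2 (stand-in): classify each URL and emit one standardized row per visit."""
    rows = []
    for user_id, url in records:
        host = urlsplit(url).hostname or ""
        # Toy classification rule: the first label of the host name is the category.
        category = host.split(".")[0] or "unknown"
        rows.append({"user_id": user_id, "category": category})
    return rows

def analyze(rows: list[dict]) -> Counter:
    """Step 3 (stand-in): aggregate visits per category as input for correlation analysis."""
    return Counter(row["category"] for row in rows)

result = analyze(intermediate_analysis(acquire_raw_data()))
print(result)  # Counter({'news': 2, 'shop': 1})
```

In a real system each stand-in would be replaced by the corresponding module; keeping the steps as separate functions mirrors the patent's division into acquisition, intermediate analysis, and analysis.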
Preferably, the data acquisition in Step 1 includes copying and backing up the data that users transmit through hardware devices.
Preferably, the intermediate analysis in Step 2 includes:
Step 1) URL information extraction;
Step 2) URL information qualitative analysis;
Step 3) source file keyword frequency analysis.
Preferably, step 1) includes:
decompression of the raw data;
decryption of the raw data;
protocol parsing of the processed data;
extraction of URLs and the corresponding user information.
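A minimal sketch of the four sub-steps of step 1), assuming the captured data is gzip-compressed HTTP/1.x request text. The patent names no compression format, cipher, or protocol, so `gzip`, the pass-through `decrypt_placeholder` function, and HTTP parsing are all assumptions, and the helper names are hypothetical. The scheme is assumed to be `http` because a plaintext request does not reveal whether TLS was used.

```python
import gzip
from typing import Callable, Optional

def decrypt_placeholder(data: bytes) -> bytes:
    """Hypothetical stand-in: the patent does not name a cipher, so data is returned unchanged."""
    return data

def parse_http_request(payload: bytes) -> Optional[str]:
    """Protocol parsing: rebuild the URL from an HTTP/1.x request line and Host header."""
    head = payload.decode("iso-8859-1").split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    parts = lines[0].split(" ")
    if len(parts) != 3 or not parts[2].startswith("HTTP/"):
        return None  # not an HTTP/1.x request line
    target = parts[1]
    if target.startswith(("http://", "https://")):
        return target  # absolute-form target, as sent to proxies
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "host":
            # Assumption: plaintext requests are HTTP, so the scheme is "http".
            return f"http://{value.strip()}{target}"
    return None

def extract_url_record(
    user_id: str,
    raw: bytes,
    decrypt: Callable[[bytes], bytes] = decrypt_placeholder,
) -> Optional[dict]:
    """Decompress, decrypt, parse, then extract the URL with its user information."""
    payload = decrypt(gzip.decompress(raw))  # order follows the patent: decompress, then decrypt
    url = parse_http_request(payload)
    return None if url is None else {"user_id": user_id, "url": url}

raw = gzip.compress(
    b"GET /item/42?ref=home HTTP/1.1\r\nHost: shop.example.com\r\nUser-Agent: demo\r\n\r\n"
)
print(extract_url_record("u1", raw))
# {'user_id': 'u1', 'url': 'http://shop.example.com/item/42?ref=home'}
```

Decryption is passed in as a parameter so that a real decryption routine could be substituted without touching the parsing logic.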
Preferably, step 2) is implemented by analyzing the features of existing web pages and building a URL classification and identification model, which mines and classifies the information contained in the URL itself.
Preferably, step 3) is performed when the URL information qualitative analysis in step 2) succeeds; otherwise, step 3) is not performed.
Preferably, step 3) is implemented as follows:
First, representative keywords are extracted from web page information to build a keyword library;
Then, the source file of each web page is obtained from the user's URL, and the frequency with which each keyword in the library appears in the source file is counted and analyzed to perform qualitative analysis of the web user's behavior data.
A system for analyzing user behavior data based on web crawlers comprises a user raw data acquisition module, a URL information extraction module, a URL information qualitative analysis module, a source file keyword frequency analysis module, a behavior data standardization output module, and a data analysis module, wherein:
the user raw data acquisition module is used to obtain raw data on users' Internet behavior;
the URL information extraction module is used to extract, from the raw Internet behavior data, the URL information of the web pages the user browses;
the URL information qualitative analysis module is used to perform qualitative analysis on the URL information extracted by the URL information extraction module;
the source file keyword frequency analysis module is used to perform, by means of a web crawler, qualitative analysis on the URL information extracted by the URL information extraction module;
the behavior data standardization output module is used to output standardized user behavior data results after qualitative analysis by the URL information qualitative analysis module and the source file keyword frequency analysis module;
the data analysis module is used to perform inter-field correlation analysis and commercial value information extraction on the results output by the behavior data standardization output module.
Preferably, the URL information extraction module includes a raw data decompression submodule, a raw data decryption submodule, a processed-data protocol parsing submodule, and a URL and corresponding user information extraction module, wherein the raw data decompression submodule is used to decompress the raw Internet behavior data;
the raw data decryption submodule is used to decrypt the raw Internet behavior data;
the processed-data protocol parsing submodule is used to parse the protocols of the data processed by the raw data decompression submodule and the raw data decryption submodule;
the URL and corresponding user information extraction module is used to extract URLs and the corresponding user information from the data parsed by the processed-data protocol parsing submodule.
Preferably, the data analysis module is a manual analysis module.
The present invention uses existing technology and hardware to collect and store users' raw Internet behavior data; builds a Hadoop cluster to store the acquired data and, following the Spark big-data processing approach, applies suitable algorithms to analyze the massive data; uses protocol parsing software to decompress, decrypt, and parse the acquired data layer by layer, obtaining the URL information and other relevant information in the user behavior data; uses web crawler technology and URL information analysis software to automatically classify, locate, and analyze user behavior data and output standardized user behavior data analysis results; and finally performs inter-field correlation analysis and commercial value information extraction on the standardized results, yielding higher efficiency and more reliable analysis precision.
Brief description of the drawings
Fig. 1 is a schematic diagram of the overall processing flow for raw user Internet behavior data in Embodiment 1 of the present invention;
Fig. 2 is a schematic flow chart of the intermediate analysis in Embodiment 1 of the present invention;
Fig. 3 is a schematic flow chart of the correlation analysis and commercial value extraction for standardized user Internet behavior results in Embodiment 1 of the present invention;
Fig. 4 is a system module diagram of Embodiment 2 of the present invention.
Detailed description of the embodiments
The following specific embodiments merely explain the present invention and do not limit it. After reading this specification, those skilled in the art may modify these embodiments as needed without making any creative contribution, and all such modifications are protected by patent law as long as they fall within the scope of the claims of the present invention.
Embodiment 1
A method for analyzing user behavior data based on web crawlers comprises the following steps:
Step 1: obtain raw data on users' Internet behavior;
Step 2: use a web crawler to perform intermediate analysis on the data obtained in Step 1, and output standardized user behavior data results;
Step 3: perform inter-field correlation analysis and commercial value information extraction on the standardized user behavior data results from Step 2.
The present application may use web crawler technology. A web crawler, also known as a web spider or web robot, is a program or script that automatically captures web information according to certain rules. With the development of smart mobile devices, people depend more and more on the Internet; meeting everyday needs and consuming through the network has become a popular consumer behavior, and its market share grows year by year.
The present application may use a big data framework: a web crawler performs qualitative analysis on the web pages users visit, and the results are counted to reveal patterns in users' network behavior from which valuable information can be extracted. Examples include forecasting the sales demand for a company's product in each region of the world, and forecasting demand for each model of that product in each region, which helps reduce unsold inventory and improve the company's profit margin. Using web crawler technology, the web behavior data of a large number of users is automatically analyzed, classified, counted, and analyzed again, so that users' network behavior is output as standardized results from which business departments can formulate marketing strategies. The present application also processes the massive data on a Hadoop cluster, analyzes it with Spark big-data algorithms, and exploits the cluster's parallel processing to improve data processing speed and accuracy.
As a further refinement, the data processing in Step 1 includes copying and backing up the data that users transmit through hardware devices. When users go online through such hardware devices, their data passes through the devices, so it can be copied and backed up from them, or obtained directly by applying to the back-end data service centers of online stores. Acquiring and using this data requires signing a confidentiality agreement to prevent malicious disclosure of user data.
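One way to implement the copy-and-backup step, assuming captured transmission data arrives as files: copy each file into backup storage and verify the copy with a SHA-256 checksum. The file-based layout, the checksum check, and the name `backup_capture` are assumptions added for illustration; the patent only states that the data is copied and backed up.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large captures need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def backup_capture(src: Path, backup_dir: Path) -> Path:
    """Copy a captured transmission-data file into backup storage and verify the copy."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    dst = backup_dir / src.name
    shutil.copy2(src, dst)  # copies file data plus metadata such as timestamps
    if sha256_of(src) != sha256_of(dst):
        dst.unlink()
        raise OSError(f"backup of {src} failed checksum verification")
    return dst

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "capture.bin"
    src.write_bytes(b"raw transmission data")
    backup_path = backup_capture(src, Path(tmp) / "backup")
    verified = backup_path.read_bytes() == src.read_bytes()
print(verified)  # True
```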
Step 2 is further refined as follows.
The intermediate analysis in Step 2 includes:
Step 1), URL information extraction;
Step 2), URL information qualitative analysis;
Step 3), source file keyword frequency analysis.
Step 1) includes:
decompression of the raw data;
decryption of the raw data;
protocol parsing of the processed data;
extraction of URLs and the corresponding user information.
Note in particular that step 3) is performed when the URL information qualitative analysis in step 2) succeeds; otherwise, step 3) is not performed. Step 2) is implemented by analyzing the features of existing web pages and building a URL classification and identification model that mines and classifies the information in the URL itself. For example, keywords are extracted from the URLs of mainstream websites to build a keyword library of URL content, which serves as a matching library for qualitative analysis of users' web page behavior data; matching against this library realizes qualitative analysis of users' URL information.
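The URL keyword library can be sketched as a token-to-category map built from labeled mainstream website URLs, with a user's URL classified by majority vote over matched tokens. This is a minimal sketch under assumptions: the tokenization, the `GENERIC_TOKENS` stop list, dropping tokens shared by several categories, and the voting rule are hypothetical choices, and the `.test` URLs are placeholders. A `None` result corresponds to an unsuccessful qualitative analysis.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical stop list of tokens that carry no category information.
GENERIC_TOKENS = {"www", "m", "com", "cn", "net", "org", "test", "html", "htm", "index"}

def url_tokens(url: str) -> set[str]:
    """Split the host and path of a URL into lowercase alphanumeric tokens."""
    parts = urlsplit(url)
    text = f"{parts.hostname or ''} {parts.path}".lower()
    return {t for t in re.split(r"[^a-z0-9]+", text) if t}

def build_keyword_library(labeled_urls: dict[str, list[str]]) -> dict[str, str]:
    """Map each informative token from labeled mainstream URLs to its category."""
    seen: dict[str, set[str]] = {}
    for category, urls in labeled_urls.items():
        for url in urls:
            for token in url_tokens(url) - GENERIC_TOKENS:
                seen.setdefault(token, set()).add(category)
    # Drop tokens shared by several categories; they cannot discriminate.
    return {tok: next(iter(cats)) for tok, cats in seen.items() if len(cats) == 1}

def classify_url(url: str, library: dict[str, str]) -> Optional[str]:
    """Majority vote over matched tokens; None means the qualitative analysis failed."""
    votes: dict[str, int] = {}
    for token in url_tokens(url):
        if token in library:
            votes[library[token]] = votes.get(library[token], 0) + 1
    if not votes:
        return None
    return max(sorted(votes), key=votes.get)  # sorting makes ties deterministic

library = build_keyword_library({
    "news": ["http://www.dailynews.test/world/politics.html"],
    "shopping": ["http://www.bigmall.test/cart/checkout"],
    "sports": ["http://sports.dailynews.test/football/"],
})
print(classify_url("http://m.bigmall.test/cart/add?id=7", library))  # shopping
print(classify_url("http://unknown.test/page", library))  # None
```

Here `dailynews` appears under both news and sports, so it is excluded from the library rather than biasing the vote.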
Step 3) is implemented as follows. First, representative keywords are extracted from web page information to build a keyword library. Then, the source file of each web page is obtained from the user's URL, and the frequency with which each keyword in the library appears in the source file is counted and analyzed to perform qualitative analysis of the web user's behavior data. This step uses crawler technology to obtain the source file of the page at each URL. Representative keywords such as news, shopping, and sports are extracted to build the keyword library, for example a website-type word-frequency library; word frequencies are counted against this library, and qualitative analysis of the user's URL behavior data is performed from the statistics using a web page determination algorithm.
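Step 3) might look like the following sketch: fetch the page source with a crawler request, count occurrences of each website type's keywords, and pick the dominant type. The keyword sets, the crude tag stripping, and the thresholds `min_hits` and `min_share` are assumptions standing in for the unspecified web page determination algorithm. `fetch_source` uses the standard `urllib.request` module with a hypothetical User-Agent string and is not exercised in the example, which analyzes an inline page instead.

```python
import re
import urllib.request
from collections import Counter
from typing import Optional

# Hypothetical website-type keyword library; a real one would be far larger.
TYPE_KEYWORDS = {
    "news": {"news", "report", "breaking", "editor"},
    "shopping": {"cart", "price", "buy", "checkout"},
    "sports": {"match", "score", "league", "football"},
}

def fetch_source(url: str, timeout: float = 10.0) -> str:
    """Crawler step: download the page source (not called in the example below)."""
    req = urllib.request.Request(url, headers={"User-Agent": "behavior-analysis-sketch"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")

def keyword_frequencies(source: str) -> dict[str, int]:
    """Count each type's keyword occurrences in the page text (tags stripped crudely)."""
    text = re.sub(r"<[^>]+>", " ", source).lower()
    words = Counter(re.findall(r"[a-z]+", text))
    return {t: sum(words[k] for k in kws) for t, kws in TYPE_KEYWORDS.items()}

def classify_source(source: str, min_hits: int = 2, min_share: float = 0.5) -> Optional[str]:
    """Hypothetical decision rule: the top type wins if it has enough hits and share."""
    freq = keyword_frequencies(source)
    total = sum(freq.values())
    best = max(sorted(freq), key=freq.get)
    # If total is 0 the first test fails, so the division is never reached.
    if freq[best] >= min_hits and freq[best] / total >= min_share:
        return best
    return None

page = "<html><h1>Football league</h1><p>Final score of the match: 2-1. Buy tickets.</p></html>"
print(keyword_frequencies(page))  # {'news': 0, 'shopping': 1, 'sports': 4}
print(classify_source(page))  # sports
```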
The standardized user behavior data results are output on the basis of both analyses, the URL information qualitative analysis and the source file keyword frequency analysis, which carry out the data standardization statistics, recording, and output.
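A possible shape for the standardized output, assuming each record keeps the page type and which analysis determined it, and that the output is a per-user visit-frequency table in CSV. The `StandardRecord` fields, the `method` labels, and the CSV columns are hypothetical; the patent does not fix an output format.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class StandardRecord:
    """Hypothetical standardized row; the patent does not fix the field list."""
    user_id: str
    url: str
    category: Optional[str]  # None when neither analysis determined a type
    method: str              # "url", "source" or "none": which analysis decided

def standardize(user_id: str, url: str,
                url_category: Optional[str], source_category: Optional[str]) -> StandardRecord:
    """Combine the URL-based and source-keyword-based results into one record."""
    if url_category is not None:
        return StandardRecord(user_id, url, url_category, "url")
    if source_category is not None:
        return StandardRecord(user_id, url, source_category, "source")
    return StandardRecord(user_id, url, None, "none")

def export_frequencies(records: list[StandardRecord]) -> str:
    """Count visits per (user, page type) and return them as CSV text."""
    counts = Counter((r.user_id, r.category) for r in records if r.category is not None)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["user_id", "page_type", "visits"])
    for (user_id, category), visits in sorted(counts.items()):
        writer.writerow([user_id, category, visits])
    return buf.getvalue()

records = [
    standardize("u1", "http://a.test/1", "news", None),
    standardize("u1", "http://b.test/2", None, "shopping"),
    standardize("u1", "http://a.test/3", "news", None),
    standardize("u2", "http://c.test/4", None, None),
]
print(export_frequencies(records), end="")
```

Records whose type could not be determined are kept with `category=None` but left out of the frequency table, so they remain available for later inspection.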
The inter-field correlation analysis and commercial value information extraction on the standardized user behavior data results from Step 2 may be performed manually: analysts examine the standardized output of the qualitative analysis and extract the relationships between its fields and the commercial value information it contains.
A more complete analysis process is described below. Fig. 1 is a schematic diagram of the overall processing flow for raw user Internet behavior data. The raw data is first acquired and then processed; after processing, it is reduced so that it can be easily extracted during later analysis. The user behavior data must also be standardized before it is output.
Fig. 2 is a schematic flow chart of the intermediate analysis. The following intermediate analysis is performed on the normalized user Internet behavior data:
1. Extract URL information;
2. Perform qualitative analysis of the URL's own content;
In process 2, if the qualitative analysis succeeds, the standardized user behavior data is recorded directly and the user behavior data analysis result is then saved;
In process 2, if the qualitative analysis fails, process 3 is performed;
3. Process 3 is the source file keyword frequency analysis. If its qualitative analysis succeeds, the standardized user behavior data is recorded first and the analysis result is then saved;
If the qualitative analysis in process 3 fails, the user behavior data analysis result is saved directly;
4. Finally, determine whether access to the user Internet behavior data has finished; if so, output the standardized user Internet behavior results directly;
If access continues in process 4, return to the steps above to access the next user behavior record and perform intermediate analysis on it.
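The loop in Fig. 2 can be sketched as follows, with the two qualitative analyses passed in as functions that return a page type or `None` on failure. Following the process described for Fig. 2, source file keyword analysis runs only when URL analysis fails, successful results are recorded as standardized records, and every result is saved. The function names and the dictionary-based stand-in classifiers are hypothetical.

```python
from typing import Callable, Iterable, Optional

# A qualitative analysis: returns a page type, or None when it cannot decide.
Classifier = Callable[[str], Optional[str]]

def run_intermediate_analysis(
    visits: Iterable[tuple[str, str]],
    classify_by_url: Classifier,
    classify_by_source: Classifier,
) -> tuple[list[dict], list[dict]]:
    """Follow the Fig. 2 flow: URL analysis first, source keyword analysis as a fallback."""
    standardized: list[dict] = []  # records whose type was determined
    saved: list[dict] = []         # every analysis result, successful or not
    for user_id, url in visits:    # process 4: continue until all records are accessed
        category = classify_by_url(url)           # process 2
        method = "url"
        if category is None:
            category = classify_by_source(url)    # process 3 (crawls the page)
            method = "source"
        if category is not None:
            standardized.append({"user_id": user_id, "url": url,
                                 "category": category, "method": method})
        saved.append({"user_id": user_id, "url": url, "category": category})
    return standardized, saved

url_library = {"http://news.test/a": "news"}       # hypothetical URL-analysis results
source_library = {"http://blog.test/b": "sports"}  # hypothetical crawler-analysis results
source_calls: list[str] = []

def by_source(url: str) -> Optional[str]:
    source_calls.append(url)  # record which URLs needed the costlier crawl
    return source_library.get(url)

visits = [("u1", "http://news.test/a"), ("u1", "http://blog.test/b"), ("u2", "http://x.test/c")]
standardized, saved = run_intermediate_analysis(visits, url_library.get, by_source)
print(source_calls)  # ['http://blog.test/b', 'http://x.test/c']
```

Running the cheap URL analysis first means the crawler is only invoked for pages the URL alone cannot classify.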
Fig. 3 is a schematic flow chart of the correlation analysis and commercial value extraction for standardized user Internet behavior results. The following work is done on the standardized statistics obtained:
1. Inter-field correlation analysis;
2. Commercial value analysis and mining;
3. Writing a summary document.
Through these three kinds of work, highly precise analytical conclusions can be reached in little time.
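The patent does not specify how inter-field correlation is computed. As one hedged option, the sketch below computes the Pearson correlation coefficient between per-user visit counts for each pair of page types; the table layout (each user mapped to `{page_type: visits}`) and the sample numbers are invented for illustration. Such a computation could support, but not replace, the manual analysis the patent describes.

```python
from math import sqrt

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson r: summed co-deviations over sqrt(product of summed squared deviations)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least two points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(vx * vy)

def field_correlations(table: dict[str, dict[str, int]],
                       fields: list[str]) -> dict[tuple[str, str], float]:
    """Correlate every pair of fields across users; missing counts are treated as 0."""
    users = sorted(table)
    columns = {f: [table[u].get(f, 0) for u in users] for f in fields}
    return {
        (a, b): pearson(columns[a], columns[b])
        for i, a in enumerate(fields)
        for b in fields[i + 1:]
    }

visits = {
    "u1": {"sports": 10, "shopping": 8, "news": 1},
    "u2": {"sports": 2, "shopping": 1, "news": 5},
    "u3": {"sports": 6, "shopping": 5, "news": 3},
}
corr = field_correlations(visits, ["sports", "shopping", "news"])
for pair, r in corr.items():
    print(pair, round(r, 3))
```

In this toy table, users who visit many sports pages also visit many shopping pages (r close to 1), which is the kind of relationship a business analyst might then investigate manually.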
Embodiment 2
A system for analyzing user behavior data based on web crawlers comprises a user raw data acquisition module, a URL information extraction module, a URL information qualitative analysis module, a source file keyword frequency analysis module, a behavior data standardization output module, and a data analysis module, wherein:
the user raw data acquisition module is used to obtain raw data on users' Internet behavior;
the URL information extraction module is used to extract, from the raw Internet behavior data, the URL information of the web pages the user browses;
the URL information qualitative analysis module is used to perform qualitative analysis on the URL information extracted by the URL information extraction module;
the source file keyword frequency analysis module is used to perform, by means of a web crawler, qualitative analysis on the URL information extracted by the URL information extraction module;
the behavior data standardization output module is used to output standardized user behavior data results after qualitative analysis by the URL information qualitative analysis module and the source file keyword frequency analysis module;
the data analysis module is used to perform inter-field correlation analysis and commercial value information extraction on the results output by the behavior data standardization output module.
The overall functionality of the system can be divided into six functional modules: user raw data acquisition, URL information extraction, URL information qualitative analysis, source file keyword frequency analysis, behavior data standardization output, and data analysis. User raw data acquisition and URL information extraction are techniques that enterprises already have; the particular design of this application lies in URL information qualitative analysis, source file keyword frequency analysis, behavior data standardization output, and data analysis.
The system can be designed to carry out the analysis method of Embodiment 1, and the functions of the modules can be configured as follows:
the user raw data acquisition module automatically acquires the raw data users generate on devices such as enterprise routers;
the URL information extraction module extracts the URLs of the web pages users browse by decompressing, decrypting, and cleaning the raw data;
the URL information qualitative analysis module extracts keywords from the URLs of mainstream websites to build a keyword library of URL content, which serves as a matching library for qualitative analysis of users' web page behavior data, realizing qualitative analysis of users' URL information;
the source file keyword frequency analysis module uses crawler technology to obtain the source file of the page at each URL, counts word frequencies against the website-type word-frequency library, and performs qualitative analysis of users' URL behavior data from the statistics using a web page determination algorithm;
the behavior data standardization output module counts and analyzes the URL information and extracts the key information in it, for example the types of web pages a user visits and their frequencies, and the types of goods a user views and their visit frequencies;
the data analysis module analyzes user behavior data from the tables output in standardized form and performs user behavior data correlation analysis.
The URL information extraction module includes a raw data decompression submodule, a raw data decryption submodule, a processed-data protocol parsing submodule, and a URL and corresponding user information extraction module, wherein the raw data decompression submodule is used to decompress the raw Internet behavior data;
the raw data decryption submodule is used to decrypt the raw Internet behavior data;
the processed-data protocol parsing submodule is used to parse the protocols of the data processed by the raw data decompression submodule and the raw data decryption submodule;
the URL and corresponding user information extraction module is used to extract URLs and the corresponding user information from the data parsed by the processed-data protocol parsing submodule.
The data analysis module is a manual analysis module.
The whole system has the following functions and advantages:
1) It uses existing technology and hardware to collect and store users' raw Internet behavior data.
2) It builds a Hadoop cluster to store the acquired data and, following the Spark big-data processing approach, applies suitable algorithms to analyze the massive data.
3) It can use protocol parsing software to decompress, decrypt, and parse the acquired data layer by layer, obtaining the URL information and other relevant information in the user behavior data.
4) It can use web crawler technology and URL information analysis software to automatically classify, locate, and analyze user behavior data and output standardized user behavior data analysis results.
5) Finally, it performs inter-field correlation analysis and commercial value information extraction on the standardized results.
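The Hadoop/Spark processing in item 2) follows a map/reduce pattern. The sketch below illustrates that pattern in plain Python with a thread pool, as a swapped-in stand-in for a real cluster: each partition is counted independently (map) and the partial counts are summed (reduce). Threads in one process show the structure only and give no real CPU parallelism. On Spark the same aggregation would typically be written with an RDD, for example `sc.parallelize(records).map(lambda r: (r, 1)).reduceByKey(add)`; the `(user_id, page_type)` record shape here is hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from operator import add

Record = tuple[str, str]  # hypothetical (user_id, page_type) pair

def partition(records: list[Record], n: int) -> list[list[Record]]:
    """Split records round-robin into at most n non-empty partitions."""
    return [p for p in (records[i::n] for i in range(n)) if p]

def map_partition(part: list[Record]) -> Counter:
    """Map phase: count (user, page type) pairs inside one partition."""
    return Counter(part)

def parallel_count(records: list[Record], workers: int = 4) -> Counter:
    """Run the map phase on a thread pool, then reduce by summing partial counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_partition, partition(records, workers)))
    return reduce(add, partials, Counter())

records = [("u1", "news"), ("u1", "news"), ("u2", "sports"), ("u1", "shopping"), ("u2", "sports")]
counts = parallel_count(records, workers=2)
print(counts[("u1", "news")], counts[("u2", "sports")])  # 2 2
```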
The specific embodiments described herein merely illustrate the spirit of the present invention. Those skilled in the art may make various modifications or additions to the described embodiments, or substitute them in similar ways, without departing from the spirit of the invention or exceeding the scope defined by the appended claims.

Claims (10)

1. A method for analyzing user behavior data based on web crawlers, characterized by comprising the following steps:
Step 1: obtain raw data on users' Internet behavior;
Step 2: use a web crawler to perform intermediate analysis on the data obtained in Step 1, and output standardized user behavior data results;
Step 3: perform inter-field correlation analysis and commercial value information extraction on the standardized user behavior data results from Step 2.
2. The method for analyzing user behavior data based on web crawlers according to claim 1, characterized in that the data acquisition in Step 1 includes copying and backing up the data that users transmit through hardware devices.
3. The method for analyzing user behavior data based on web crawlers according to claim 1, characterized in that the intermediate analysis in Step 2 includes:
Step 1), URL information extraction;
Step 2), URL information qualitative analysis;
Step 3), source file keyword frequency analysis.
4. The method for analyzing user behavior data based on web crawlers according to claim 3, characterized in that step 1) includes:
decompression of the raw data;
decryption of the raw data;
protocol parsing of the processed data;
extraction of URLs and the corresponding user information.
5. The method for analyzing user behavior data based on web crawlers according to claim 4, characterized in that step 2) is implemented by analyzing the features of existing web pages and building a URL classification and identification model that mines and classifies the information in the URL itself.
6. The method for analyzing user behavior data based on web crawlers according to claim 5, characterized in that step 3) is performed when the URL information qualitative analysis in step 2) succeeds; otherwise, step 3) is not performed.
7. The method for analyzing user behavior data based on web crawlers according to claim 5, characterized in that step 3) is implemented as follows: first, representative keywords are extracted from web page information to build a keyword library; then, the source file of each web page is obtained from the user's URL, and the frequency with which each keyword in the library appears in the source file is counted and analyzed to perform qualitative analysis of the web user's behavior data.
8. A system for analyzing user behavior data based on web crawlers, characterized in that it comprises a user raw data acquisition module, a URL information extraction module, a URL information qualitative analysis module, a source file keyword frequency analysis module, a behavior data standardization output module, and a data analysis module, wherein:
The user raw data acquisition module is used to obtain raw data on the user's Internet behavior;
The URL information extraction module is used to extract, from the raw Internet behavior data, the URL information of the web pages browsed by the user;
The URL information qualitative analysis module is used to perform qualitative analysis on the URL information extracted by the URL information extraction module;
The source file keyword frequency analysis module is used to perform qualitative analysis, by means of a web crawler, on the URL information extracted by the URL information extraction module;
The behavior data standardization output module is used to output standardized user behavior data results after qualitative analysis by the URL information qualitative analysis module and the source file keyword frequency analysis module;
The data analysis module is used to perform data-domain correlation analysis and commercial value information extraction on the results output by the behavior data standardization output module.
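The module structure of claim 8 can be sketched as a small pipeline in which each module is an injected callable. This lets the URL classifier and the keyword analysis be swapped independently. The following are all hypothetical:

- the record fields;
- the class and method names;
- the per-user category count used as the data analysis stand-in.

Following the wording of claim 6, the page is only crawled when the URL analysis returns a category.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class BehaviorRecord:
    """Standardized behavior record (field names are hypothetical)."""
    user_id: str
    url: str
    url_category: Optional[str]      # result of URL qualitative analysis
    keyword_category: Optional[str]  # result of source keyword frequency analysis


class BehaviorAnalysisSystem:
    """Wires the modules of claim 8 together; each module is an injected callable."""

    def __init__(
        self,
        acquire: Callable[[], bytes],                           # raw data acquisition module
        extract: Callable[[bytes], Iterable[Tuple[str, str]]],  # URL information extraction module
        classify_url: Callable[[str], Optional[str]],           # URL qualitative analysis module
        classify_source: Callable[[str], Optional[str]],        # source keyword frequency module (crawler)
    ) -> None:
        self.acquire = acquire
        self.extract = extract
        self.classify_url = classify_url
        self.classify_source = classify_source

    def standardized_output(self) -> List[BehaviorRecord]:
        """Behavior data standardization output module."""
        records: List[BehaviorRecord] = []
        for user_id, url in self.extract(self.acquire()):
            url_category = self.classify_url(url)
            # Crawl the page only when the URL analysis succeeded (claim 6 wording).
            keyword_category = self.classify_source(url) if url_category is not None else None
            records.append(BehaviorRecord(user_id, url, url_category, keyword_category))
        return records

    @staticmethod
    def analyze(records: Iterable[BehaviorRecord]) -> Dict[str, Counter]:
        """Data analysis module stand-in: per-user counts of visited categories."""
        profile: Dict[str, Counter] = defaultdict(Counter)
        for r in records:
            category = r.keyword_category or r.url_category
            if category:
                profile[r.user_id][category] += 1
        return dict(profile)
```

When both analyses return a category, `analyze` counts the keyword-based one. That preference is an arbitrary choice for the sketch.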
9. The system for analyzing user behavior data based on web crawlers according to claim 8, characterized in that: the URL information extraction module comprises a raw data decompression submodule, a raw data decryption submodule, a processed-data protocol parsing submodule, and a URL and corresponding user information extraction submodule, wherein:
The raw data decompression submodule is used to decompress the raw data on the user's Internet behavior;
The raw data decryption submodule is used to decrypt the raw data on the user's Internet behavior;
The processed-data protocol parsing submodule is used to perform protocol parsing on the data processed by the raw data decompression submodule and the raw data decryption submodule;
The URL and corresponding user information extraction submodule is used to extract URLs and the corresponding user information from the data parsed by the processed-data protocol parsing submodule.
10. The system for analyzing user behavior data based on web crawlers according to claim 8, characterized in that: the data analysis module is a manual analysis module.
CN201710017268.XA 2017-01-11 2017-01-11 A kind of analysis method and system of the user behavior data based on web crawlers Pending CN106844588A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710017268.XA CN106844588A (en) 2017-01-11 2017-01-11 A kind of analysis method and system of the user behavior data based on web crawlers

Publications (1)

Publication Number Publication Date
CN106844588A 2017-06-13

Family

ID=59118551

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710017268.XA Pending CN106844588A (en) 2017-01-11 2017-01-11 A kind of analysis method and system of the user behavior data based on web crawlers

Country Status (1)

Country Link
CN (1) CN106844588A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102855248A (en) * 2011-06-29 2013-01-02 中国移动通信集团广西有限公司 Determination method, apparatus and system for user characteristic information
CN102946319A (en) * 2012-09-29 2013-02-27 焦点科技股份有限公司 System and method for analyzing network user behavior information
CN103646119A (en) * 2013-12-26 2014-03-19 北京西塔网络科技股份有限公司 Method and device for generating user behavior record
CN104750704A (en) * 2013-12-26 2015-07-01 中国移动通信集团河南有限公司 Webpage uniform resource locator (URL) classification and identification method and device
CN105786965A (en) * 2016-01-27 2016-07-20 久远谦长(北京)技术服务有限公司 URL-based user behavior analysis method and device
CN105893583A (en) * 2016-04-01 2016-08-24 北京鼎泰智源科技有限公司 Data acquisition method and system based on artificial intelligence

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109416700A (en) * 2017-09-30 2019-03-01 深圳市得道健康管理有限公司 A kind of the classification based training method and the network terminal of internet behavior
WO2019071966A1 (en) * 2017-10-13 2019-04-18 平安科技(深圳)有限公司 Crawler data-based user behavior analysis method, application server and readable storage medium
CN108540314A (en) * 2018-03-22 2018-09-14 微梦创科网络科技(中国)有限公司 The restoring method and system of user behavior
CN108536804A (en) * 2018-03-30 2018-09-14 掌阅科技股份有限公司 Information-pushing method, electronic equipment based on e-book and computer storage media
CN108536804B (en) * 2018-03-30 2021-06-29 掌阅科技股份有限公司 Information pushing method based on electronic book, electronic equipment and computer storage medium
CN109361564A (en) * 2018-11-01 2019-02-19 清华大学 Internet data acquisition method and device based on the passive data fusion of master

Similar Documents

Publication Publication Date Title
Nguyen et al. Automatic image filtering on social networks using deep learning and perceptual hashing during crises
Tanwar et al. Unravelling unstructured data: A wealth of information in big data
CN106844588A (en) A kind of analysis method and system of the user behavior data based on web crawlers
CN103914478B (en) Webpage training method and system, webpage Forecasting Methodology and system
CN105468744B (en) Big data platform for realizing tax public opinion analysis and full text retrieval
CN102915335B (en) Based on the information correlation method of user operation records and resource content
CN106815307A (en) Public Culture knowledge mapping platform and its use method
CN101751458A (en) Network public sentiment monitoring system and method
Jayaweera et al. Crime analytics: Analysis of crimes through newspaper articles
CN102542061B (en) Intelligent product classification method
CN103605738A (en) Webpage access data statistical method and webpage access data statistical device
CN102473190A (en) Keyword assignment to a web page
CN104899229A (en) Swarm intelligence based behavior clustering system
US10467255B2 (en) Methods and systems for analyzing reading logs and documents thereof
CN104598536B (en) A kind of distributed network information structuring processing method
CN114915468B (en) Intelligent analysis and detection method for network crime based on knowledge graph
CN112328806A (en) Data processing method, system, computer equipment and storage medium
CN107086925B (en) Deep learning-based internet traffic big data analysis method
Bhardwaj et al. Web scraping using summarization and named entity recognition (ner)
Arora et al. Big data: A review of analytics methods & techniques
CN106874368B (en) RTB bidding advertisement position value analysis method and system
Goele et al. Data Mining Trend in Past, Current and Future
CN111447575A (en) Short message pushing method, device, equipment and storage medium
Gaurav et al. An outline on big data and big data analytics
EP4248325A1 (en) Multi-cache based digital output generation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 2020-11-02

Address after: No. 2-3167, Zone A, Nonggang City, No. 2388 Donghuan Avenue, Hongjia Street, Jiaojiang District, Taizhou City, Zhejiang Province

Applicant after: Taizhou Jiji Intellectual Property Operation Co.,Ltd.

Address before: No. 3666 Sixian Road, Songjiang District, Shanghai 201616

Applicant before: Phicomm (Shanghai) Co.,Ltd.