CN106844588A - Analysis method and system for user behavior data based on web crawlers - Google Patents
- Publication number
- CN106844588A (application number CN201710017268.XA)
- Authority
- CN
- China
- Prior art keywords
- data
- analysis
- user
- module
- url
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3409—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3438—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment monitoring of user actions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3452—Performance evaluation by statistical analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3466—Performance evaluation by tracing or monitoring
- G06F11/3476—Data logging
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/535—Tracking the activity of the user
Abstract
The invention relates to the field of data processing, and specifically to an analysis method and system for user behavior data based on web crawlers. The method comprises the following steps: Step 1, obtain raw data on users' internet behavior; Step 2, use a web crawler to perform intermediate analysis on the data obtained in Step 1 and output a normalized user behavior data result; Step 3, perform correlation analysis between data fields and extract commercial value information from the normalized result of Step 2. The method offers high efficiency and high analysis precision.
Description
Technical field
The invention relates to the technical field of data processing, and specifically to a data analysis method and system for user behavior data based on web crawlers.
Background
Smart mobile devices free consumers from constraints of place and time, so that they can shop online, seek entertainment and so on whenever and wherever they like. By analyzing the web pages a user browses, we can infer how dependent the user is on the internet and classify the user's behavior on the network, for example as a potential online consumer, a loyal online consumer, or some other type. Analyzing the behavior data of a single user is of little value, but applying the same analysis to the behavior data of users nationwide or even worldwide can uncover huge commercial value. At the same time, as the volume of internet behavior data grows, the pressure on data collection, storage and analysis keeps increasing. At present, enterprises such as large online stores and hardware equipment providers can, by virtue of their products' market share and technology, capture users' data on such devices. They copy and back up, clean and parse these data to extract the web pages the user browsed, that is, the URLs, and derive the user's network behavior characteristics from these URLs, but they cannot yet analyze the data well.
For example, the patent with publication number CN 101192227B discloses a user behavior analysis method and system based on big data. A client collects user behavior data in real time and combines the context of the user behavior with the page URL, so as to reproduce as faithfully as possible the real scene in which the user browsed the web page and to extract a comprehensive user behavior track, which provides effective data support for analyzing user behavior. A security analysis module provides security for the user behavior data. A user behavior data ontology model is used to model user behavior, which allows behavior information to be shared and reused at the semantic level and improves the interoperability and reliability of the model. Analyzing user behavior and context data collected in real time makes the results more reliable. A column-oriented database stores the ontology and behavior information, which lays the foundation for managing massive data. The processing power and large-scale storage capacity of cloud computing are combined with the ontology, its reasoning, and knowledge discovery methods to analyze massive user behavior data in real time and obtain user interests promptly, so that content can be pushed to users effectively and accurately. However, the technology used in that patent appears inadequate for collecting, storing and analyzing massive data: its efficiency is low, and its accuracy is likely to be low as well.
Summary of the invention
The object of the invention is to provide an analysis method and system for user behavior data based on web crawlers that offer high efficiency and high analysis precision.
The above technical object of the invention is achieved by the following technical solution:
An analysis method for user behavior data based on web crawlers comprises the following steps:
Step 1, obtain raw data on users' internet behavior;
Step 2, use a web crawler to perform intermediate analysis on the data obtained in Step 1 and output a normalized user behavior data result;
Step 3, perform correlation analysis between data fields and extract commercial value information from the normalized user behavior data result of Step 2.
Preferably, the data acquisition in Step 1 includes copying, and storing a backup of, the data that users transmit through hardware devices.
Preferably, the intermediate analysis in Step 2 includes:
Step 1) URL information extraction;
Step 2) URL information qualitative analysis;
Step 3) source file keyword frequency analysis.
Preferably, Step 1) includes:
decompressing the raw data;
decrypting the raw data;
protocol parsing of the processed data;
extracting the URL and the corresponding user information.
Preferably, Step 2) is implemented by analyzing the features of existing web pages and building a URL classification and recognition model that mines and classifies the content of the URL itself.
Preferably, if the URL information qualitative analysis of Step 2) succeeds, Step 3) is performed; otherwise, Step 3) is not performed.
Preferably, Step 3) is implemented as follows:
First, representative keywords are extracted from web page information to build a keyword library;
Then, the source file of the web page is obtained from the user's URL, the frequency with which the keywords in the keyword library occur in the source file is counted and analyzed, and a qualitative analysis of the page user's behavior data is derived.
An analysis system for user behavior data based on web crawlers comprises a raw user data acquisition module, a URL information extraction module, a URL information qualitative analysis module, a source file keyword frequency analysis module, a behavior data normalization output module and a data analysis module, wherein:
The raw user data acquisition module obtains the raw data on users' internet behavior;
The URL information extraction module extracts, from the raw user internet behavior data, the URL information of the web pages users browse;
The URL information qualitative analysis module performs qualitative analysis on the URL information extracted by the URL information extraction module;
The source file keyword frequency analysis module uses a web crawler to perform qualitative analysis on the URL information extracted by the URL information extraction module;
The behavior data normalization output module outputs the normalized user behavior data result after the qualitative analysis by the URL information qualitative analysis module and the source file keyword frequency analysis module;
The data analysis module performs correlation analysis between data fields and commercial value information extraction on the result output by the behavior data normalization output module.
Preferably, the URL information extraction module includes a raw data decompression submodule, a raw data decryption submodule, a post-processing protocol parsing submodule, and a URL and corresponding user information extraction module, wherein the raw data decompression submodule decompresses the raw user internet behavior data;
The raw data decryption submodule decrypts the raw user internet behavior data;
The post-processing protocol parsing submodule performs protocol parsing on the data processed by the raw data decompression submodule and the raw data decryption submodule;
The URL and corresponding user information extraction module extracts the URL and the corresponding user information from the data parsed by the post-processing protocol parsing submodule.
Preferably, the data analysis module is a manual analysis module.
The invention uses existing technology and hardware environments to collect and store the raw data on users' internet behavior; builds a Hadoop cluster architecture to store the acquired data and, following the Spark big data processing approach, analyzes the massive data with appropriate algorithms; uses protocol analysis software to decompress, decrypt and protocol-parse the acquired data layer by layer, obtaining the URL information and other related information of the user behavior data; uses web crawler technology and URL information analysis software to classify, characterize and analyze the user behavior data automatically and to output a normalized user behavior data analysis result; and finally performs correlation analysis between data fields and commercial value information extraction on the normalized result data, with higher efficiency and more reliable analysis precision.
Brief description of the drawings
Fig. 1 is a schematic diagram of the overall processing flow for raw user internet behavior data in Embodiment 1 of the invention;
Fig. 2 is a flow chart of the intermediate analysis in Embodiment 1 of the invention;
Fig. 3 is a flow chart of the correlation analysis of the normalized user internet behavior result and the acquisition of commercial value in Embodiment 1 of the invention;
Fig. 4 is a system module diagram of Embodiment 2 of the invention.
Detailed description
The following specific embodiments merely explain the invention and do not limit it. After reading this specification, those skilled in the art may modify the embodiments as needed without making any creative contribution, and all such modifications are protected by patent law as long as they fall within the scope of the claims of the invention.
Embodiment 1
An analysis method for user behavior data based on web crawlers comprises the following steps:
Step 1, obtain raw data on users' internet behavior;
Step 2, use a web crawler to perform intermediate analysis on the data obtained in Step 1 and output a normalized user behavior data result;
Step 3, perform correlation analysis between data fields and extract commercial value information from the normalized user behavior data result of Step 2.
The application may use web crawler technology. A web crawler, also known as a web spider or web robot, is a program or script that automatically fetches information from the web according to certain rules. With the development of smart mobile devices, people depend more and more on the internet; buying everyday necessities over the network has become a popular consumer behavior, and its market share is growing year by year.
The application may use a big data architecture. A web crawler performs qualitative analysis on the web pages users visit and compiles statistics, so as to find patterns in users' network behavior and extract valuable information from them. Examples include forecasting the sales demand for a certain product of a certain enterprise in each region of the world, and forecasting the consumer demand for each model of that product in each region, which reduces the amount of unsold stock and improves the enterprise's profit margin. Web crawler technology automatically analyzes, qualitatively classifies, counts and re-analyzes the web behavior data of a large number of users, so that users' network behavior is output as a normalized result, and the business department can formulate marketing strategies from that normalized result data. The application also needs to process the massive data on a Hadoop cluster, analyze it with Spark big data algorithms, and use the parallel processing of the cluster to improve the speed and accuracy of data processing.
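The parallel processing mentioned above can be pictured as a partition-then-merge count. The following is only a minimal sketch in plain Python, not the Hadoop or Spark API: the thread pool stands in for the cluster, and the names `count_partition` and `parallel_counts` and the `(user, category)` row layout are assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_partition(rows):
    """Map step: count page categories within one partition of (user, category) rows."""
    return Counter(category for _user, category in rows)


def parallel_counts(partitions):
    """Count each partition concurrently, then merge the partial counts (reduce step)."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(count_partition, partitions))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical data: two partitions of already classified page visits.
partitions = [
    [("u1", "news"), ("u2", "shopping")],
    [("u1", "news"), ("u3", "sports")],
]
print(parallel_counts(partitions))
```

Because each partition is counted independently and the merge is a simple sum, the same shape of computation can be spread over many machines.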
As a further refinement of the above scheme, the data acquisition in Step 1 includes copying, and storing a backup of, the data that users transmit through hardware devices. When a user goes online through such a hardware device, the data pass through the device, so they can be copied and backed up from the device; alternatively, the data can be requested directly from the back-end data service centers of some online stores. A confidentiality agreement must be signed before these data are obtained and used, to prevent malicious disclosure of user data.
Step 2 is then refined further.
The intermediate analysis in Step 2 includes:
Step 1) URL information extraction;
Step 2) URL information qualitative analysis;
Step 3) source file keyword frequency analysis.
Step 1) includes:
decompressing the raw data;
decrypting the raw data;
protocol parsing of the processed data;
extracting the URL and the corresponding user information.
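The four sub-steps of Step 1) can be sketched as a small pipeline. This is an illustrative sketch under stated assumptions, not the patent's implementation: it assumes the captured payload is a gzip-compressed plain-text HTTP/1.x request, uses a no-op `decrypt` placeholder because the source names no cipher, rebuilds the URL with an assumed `http://` scheme, and treats the client address as the user identifier. All function and type names are hypothetical.

```python
import gzip
from typing import NamedTuple, Optional


class UrlRecord(NamedTuple):
    user_id: str  # hypothetical user identifier, here a client IP address
    url: str


def decrypt(payload: bytes) -> bytes:
    """Placeholder: the source does not name a cipher, so this returns the input unchanged."""
    return payload


def parse_http_request(user_id: str, raw: bytes) -> Optional[UrlRecord]:
    """Protocol parsing: read the request line and Host header of an HTTP/1.x request."""
    head = raw.decode("iso-8859-1").split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    parts = lines[0].split(" ")
    if len(parts) != 3 or not parts[2].startswith("HTTP/"):
        return None  # not an HTTP request line
    path = parts[1]
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "host" and value.strip():
            # Assumption: plain HTTP, so the scheme is hard-coded.
            return UrlRecord(user_id, f"http://{value.strip()}{path}")
    return None  # no Host header, so the URL cannot be rebuilt


def extract_url(user_id: str, captured: bytes) -> Optional[UrlRecord]:
    """Decompress, decrypt, parse the protocol, then extract URL and user information."""
    data = gzip.decompress(captured)
    data = decrypt(data)
    return parse_http_request(user_id, data)


request = b"GET /news/today HTTP/1.1\r\nHost: example.com\r\n\r\n"
record = extract_url("192.0.2.10", gzip.compress(request))
print(record)
```

Keeping each sub-step in its own function mirrors the order given in the text and lets a real decryption routine replace the placeholder without touching the parser.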
Note in particular that if the URL information qualitative analysis of Step 2) succeeds, Step 3) is performed; otherwise, Step 3) is not performed. Step 2) is implemented by analyzing the features of existing web pages and building a URL classification and recognition model that mines and classifies the content of the URL itself. For example, keywords are extracted from mainstream website addresses to build a keyword library for URL content, which serves as a matching library for the qualitative analysis of users' web page behavior data and thereby enables qualitative analysis of users' URL information.
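Matching a URL against a keyword library might look like the following. This is a minimal sketch with assumptions: the `URL_KEYWORDS` table and its categories are invented for illustration, and simple substring matching stands in for whatever classification and recognition model is actually built.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical keyword library: category -> keywords expected in the URL itself.
URL_KEYWORDS = {
    "news": ("news", "press"),
    "shopping": ("shop", "cart", "item"),
    "sports": ("sports", "football"),
}


def classify_url(url: str) -> Optional[str]:
    """Return the first category whose keyword occurs in the host or path, else None."""
    parts = urlsplit(url)
    haystack = f"{parts.netloc}{parts.path}".lower()
    for category, keywords in URL_KEYWORDS.items():
        if any(keyword in haystack for keyword in keywords):
            return category
    return None  # the qualitative analysis did not succeed for this URL


print(classify_url("http://news.example.com/today"))
print(classify_url("http://example.com/about"))
```

Substring matching is deliberately crude (for instance, `press` also matches `express`), so a practical matching library would likely need word boundaries or per-site rules.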
Step 3) is implemented as follows. First, representative keywords are extracted from web page information to build a keyword library. Then, the source file of the web page is obtained from the user's URL, the frequency with which the keywords in the keyword library occur in the source file is counted and analyzed, and a qualitative analysis of the page user's behavior data is derived. This step uses crawler technology to obtain the source file of the web page corresponding to the URL. Representative keywords such as news, shopping and sports are extracted from web page information to build a keyword library, for example a website-type word-frequency library. Word frequencies are then counted against this library, and the statistics, together with a page classification algorithm, yield the qualitative analysis of the user's URL behavior data.
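The word-frequency step can be sketched as follows. The example works on a page source that has already been fetched (a crawler would download it first), and everything specific in it is an assumption: the `TYPE_KEYWORDS` library, the regular-expression tag stripping, and the `min_hits` threshold that stands in for the page classification algorithm.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical website-type word-frequency library.
TYPE_KEYWORDS = {
    "news": {"news", "report", "headline"},
    "shopping": {"cart", "price", "buy"},
    "sports": {"match", "score", "league"},
}


def strip_tags(html: str) -> str:
    """Crude tag removal; a real crawler would use an HTML parser."""
    return re.sub(r"<[^>]+>", " ", html)


def classify_source(html: str, min_hits: int = 2) -> Optional[str]:
    """Count library keywords in the page text and return the best-scoring type."""
    words = re.findall(r"[a-z]+", strip_tags(html).lower())
    counts = Counter(words)
    hits = {
        category: sum(counts[word] for word in keywords)
        for category, keywords in TYPE_KEYWORDS.items()
    }
    best = max(hits, key=hits.get)
    # Below the threshold the page is left unclassified.
    return best if hits[best] >= min_hits else None


page = "<html><body><h1>Buy now</h1><p>Price drop: add to cart, best price.</p></body></html>"
print(classify_source(page))
```

Requiring a minimum number of hits keeps pages with only one stray keyword from being assigned a type.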
The normalized user behavior data result is output on the basis of the outcomes of both the URL information qualitative analysis and the source file keyword frequency analysis, through normalized statistics, recording and output of the data.
The correlation analysis between data fields and the commercial value information extraction performed on the normalized user behavior data result of Step 2 may be done manually: an analyst examines the normalized output of the qualitative analysis and extracts the relationships among its fields and the commercial value information it contains.
A fairly complete analysis flow is described below. Fig. 1 is a schematic diagram of the overall processing flow for raw user internet behavior data. The raw data are first acquired and then undergo a certain amount of data processing, after which they are restored so that they are easy to extract during later analysis. The user behavior data likewise need to be normalized before they are output.
Fig. 2 is a flow chart of the intermediate analysis. The normalized user internet behavior data then go through the intermediate analysis:
1. Extract the URL information.
2. Perform qualitative analysis on the content of the URL itself.
If the qualitative analysis in process 2 succeeds, record the normalized user behavior data directly and then save the user behavior data analysis result.
If the qualitative analysis in process 2 does not succeed, perform process 3.
3. Perform source file keyword frequency analysis. If the qualitative analysis in process 3 succeeds, first record the normalized user behavior data and then save the user behavior data analysis result.
If the qualitative analysis in process 3 does not succeed, save the user behavior data analysis result directly.
4. Finally, check whether all the user internet behavior data have been accessed. If so, output the normalized user internet behavior data result directly.
If not, return to access the next user behavior record and perform the intermediate analysis on it.
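The control flow described for Fig. 2 can be sketched as a loop with a fallback. This is an illustrative sketch only: the two classifiers are passed in as plain functions, the stub classifiers and the record fields (`user`, `url`, `category`, `method`) are assumptions, and the loop follows the Fig. 2 description in which the source file analysis runs when the URL analysis fails.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Classifier = Callable[[str], Optional[str]]


def intermediate_analysis(
    records: Iterable[Tuple[str, str]],
    classify_by_url: Classifier,
    classify_by_source: Classifier,
) -> List[Dict[str, Optional[str]]]:
    """Run process 2, fall back to process 3, and save a result for every record."""
    results = []
    for user_id, url in records:  # process 4: continue until all records are accessed
        category = classify_by_url(url)  # process 2: URL content analysis
        method = "url"
        if category is None:
            category = classify_by_source(url)  # process 3: keyword frequency analysis
            method = "source"
        if category is None:
            method = "none"  # neither analysis succeeded; the result is still saved
        results.append(
            {"user": user_id, "url": url, "category": category, "method": method}
        )
    return results


# Stub classifiers, assumptions that stand in for the real analyses.
def by_url(url: str) -> Optional[str]:
    return "news" if "news" in url else None


def by_source(url: str) -> Optional[str]:
    return "shopping" if url.endswith("/p/1") else None


rows = [("u1", "http://news.example.com/"), ("u2", "http://example.com/p/1"),
        ("u3", "http://example.com/x")]
for row in intermediate_analysis(rows, by_url, by_source):
    print(row)
```

Recording which method produced each category makes it easy to see later how often the slower page download was needed.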
Fig. 3 is a flow chart of the correlation analysis of the normalized user internet behavior result and the acquisition of commercial value. Based on the normalized statistical result data obtained, the following work is done:
1. correlation analysis between fields;
2. commercial value analysis and mining;
3. writing a summary document.
These three tasks yield analytical conclusions that are highly precise yet take little time to produce.
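The source leaves the correlation analysis between fields to the analyst. One way an analyst might quantify the relationship between two numeric fields of the normalized result is the Pearson correlation coefficient; the sketch below is an assumption about how that could be done, and the field names and sample values are invented.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long numeric fields."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two fields of equal length with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant field")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-user fields taken from a normalized result table.
shopping_visits = [1, 2, 3, 4]
evening_sessions = [2, 4, 6, 8]
print(pearson(shopping_visits, evening_sessions))
```

A coefficient near 1 or -1 suggests a strong linear relationship between the two fields, while a value near 0 suggests none; it says nothing about causation.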
Embodiment 2
An analysis system for user behavior data based on web crawlers comprises a raw user data acquisition module, a URL information extraction module, a URL information qualitative analysis module, a source file keyword frequency analysis module, a behavior data normalization output module and a data analysis module, wherein:
The raw user data acquisition module obtains the raw data on users' internet behavior;
The URL information extraction module extracts, from the raw user internet behavior data, the URL information of the web pages users browse;
The URL information qualitative analysis module performs qualitative analysis on the URL information extracted by the URL information extraction module;
The source file keyword frequency analysis module uses a web crawler to perform qualitative analysis on the URL information extracted by the URL information extraction module;
The behavior data normalization output module outputs the normalized user behavior data result after the qualitative analysis by the URL information qualitative analysis module and the source file keyword frequency analysis module;
The data analysis module performs correlation analysis between data fields and commercial value information extraction on the result output by the behavior data normalization output module.
The overall function of the system can be divided into six functional modules: raw user data acquisition, URL information extraction, URL information qualitative analysis, source file keyword frequency analysis, normalized behavior data output, and data analysis. Raw user data acquisition and URL information extraction are technical means that enterprises already have; the distinctive design of the application lies in URL information qualitative analysis, source file keyword frequency analysis, normalized behavior data output, and data analysis.
The system can be designed to suit the analysis method of Embodiment 1. The functions of the modules may be set as follows:
The raw user data acquisition module automatically obtains the raw data that users generate on devices such as enterprise routers;
The URL information extraction module extracts the URL information of the web pages users browse through operations such as decompressing, decrypting and cleaning the raw data;
The URL information qualitative analysis module extracts keywords from mainstream website addresses to build a keyword library for URL content, which serves as a matching library for the qualitative analysis of users' web page behavior data, and thereby performs qualitative analysis of users' URL information;
The source file keyword frequency analysis module uses crawler technology to obtain the source file of the web page corresponding to the URL, counts word frequencies against the website-type word-frequency library that has been built, and, from the statistics and a page classification algorithm, performs qualitative analysis of the user's URL behavior data;
The behavior data normalization output module extracts key information through statistics and analysis of the URL information, for example the types of web pages a user visits and their frequencies, and the types of goods a user views and how often;
The data analysis module analyzes the user behavior data from the normalized output result tables and performs correlation analysis on the user behavior data.
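The output of the behavior data normalization output module could take the shape of one row per user and page type with a visit count. The sketch below is an assumption about that shape, not a format given in the source: the column names `user`, `category` and `visits` and the CSV layout are hypothetical.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def normalize(visits: Iterable[Tuple[str, str]]) -> List[Dict[str, object]]:
    """Aggregate (user, category) visits into sorted rows with a visit count."""
    counts = Counter(visits)
    return [
        {"user": user, "category": category, "visits": n}
        for (user, category), n in sorted(counts.items())
    ]


def to_csv(rows: List[Dict[str, object]]) -> str:
    """Render the normalized rows as a CSV table with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["user", "category", "visits"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical classified visits produced by the two analysis modules.
visits = [("u1", "news"), ("u1", "news"), ("u2", "shopping")]
print(to_csv(normalize(visits)))
```

A fixed column order and sorted rows give the data analysis module a stable table to work from, whether the analysis is manual or scripted.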
The URL information extraction module includes a raw data decompression submodule, a raw data decryption submodule, a post-processing protocol parsing submodule, and a URL and corresponding user information extraction module, wherein the raw data decompression submodule decompresses the raw user internet behavior data;
The raw data decryption submodule decrypts the raw user internet behavior data;
The post-processing protocol parsing submodule performs protocol parsing on the data processed by the raw data decompression submodule and the raw data decryption submodule;
The URL and corresponding user information extraction module extracts the URL and the corresponding user information from the data parsed by the post-processing protocol parsing submodule.
The data analysis module is a manual analysis module.
The whole system has the following functions and advantages:
1) It uses existing technology and hardware environments to collect and store the raw data on users' internet behavior.
2) It builds a Hadoop cluster architecture to store the acquired data and, following the Spark big data processing approach, analyzes the massive data with appropriate algorithms.
3) It can use protocol analysis software to decompress, decrypt and protocol-parse the acquired data layer by layer, obtaining the URL information and other related information of the user behavior data.
4) It can use web crawler technology and URL information analysis software to classify, characterize and analyze the user behavior data automatically and to output a normalized user behavior data analysis result.
5) It finally performs correlation analysis between data fields and commercial value information extraction on the normalized result data.
The specific embodiments described herein merely illustrate the spirit of the invention. Those skilled in the art to which the invention belongs may make various modifications or additions to the described embodiments, or substitute similar approaches, without departing from the spirit of the invention or exceeding the scope defined by the appended claims.
Claims (10)
1. An analysis method for user behavior data based on web crawlers, characterized in that it comprises the following steps:
Step 1, obtain raw data on users' internet behavior;
Step 2, use a web crawler to perform intermediate analysis on the data obtained in Step 1 and output a normalized user behavior data result;
Step 3, perform correlation analysis between data fields and extract commercial value information from the normalized user behavior data result of Step 2.
2. The analysis method for user behavior data based on web crawlers according to claim 1, characterized in that the data acquisition in Step 1 includes copying, and storing a backup of, the data that users transmit through hardware devices.
3. The analysis method for user behavior data based on web crawlers according to claim 1, characterized in that the intermediate analysis in Step 2 includes:
Step 1) URL information extraction;
Step 2) URL information qualitative analysis;
Step 3) source file keyword frequency analysis.
4. The analysis method for user behavior data based on web crawlers according to claim 3, characterized in that Step 1) includes:
decompressing the raw data;
decrypting the raw data;
protocol parsing of the processed data;
extracting the URL and the corresponding user information.
5. The analysis method for user behavior data based on web crawlers according to claim 4, characterized in that Step 2) is implemented by analyzing the features of existing web pages and building a URL classification and recognition model that mines and classifies the content of the URL itself.
6. The data analysis method for user behavior data based on web crawlers according to claim 5, characterized in that if the URL information qualitative analysis of Step 2) succeeds, Step 3) is performed; otherwise, Step 3) is not performed.
7. The analysis method for user behavior data based on web crawlers according to claim 5, characterized in that Step 3) is implemented as follows: first, representative keywords are extracted from web page information to build a keyword library; then, the source file of the web page is obtained from the user's URL, the frequency with which the keywords in the keyword library occur in the source file is counted and analyzed, and a qualitative analysis of the page user's behavior data is derived.
8. An analysis system for user behavior data based on web crawlers, characterized in that it comprises a raw user data acquisition module, a URL information extraction module, a URL information qualitative analysis module, a source file keyword frequency analysis module, a behavior data normalization output module and a data analysis module, wherein:
The raw user data acquisition module obtains the raw data on users' internet behavior;
The URL information extraction module extracts, from the raw user internet behavior data, the URL information of the web pages users browse;
The URL information qualitative analysis module performs qualitative analysis on the URL information extracted by the URL information extraction module;
The source file keyword frequency analysis module uses a web crawler to perform qualitative analysis on the URL information extracted by the URL information extraction module;
The behavior data normalization output module outputs the normalized user behavior data result after the qualitative analysis by the URL information qualitative analysis module and the source file keyword frequency analysis module;
The data analysis module performs correlation analysis between data fields and commercial value information extraction on the result output by the behavior data normalization output module.
9. The system for analyzing user behavior data based on a web crawler according to claim 8, characterized in that the URL information extraction module comprises a raw data decompression submodule, a raw data decryption submodule, a post-processing protocol parsing submodule, and a URL and corresponding user information extraction module, wherein:
The raw data decompression submodule is used to decompress the raw internet behavior data;
The raw data decryption submodule is used to decrypt the raw internet behavior data;
The post-processing protocol parsing submodule is used to perform protocol parsing on the data processed by the raw data decompression submodule and the raw data decryption submodule;
The URL and corresponding user information extraction module is used to extract URLs and the corresponding user information from the data parsed by the post-processing protocol parsing submodule.
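The four submodules of claim 9 can be sketched as a small Python pipeline. Everything specific here is an assumption for illustration: the claim does not name a compression format (gzip is assumed), a cipher (a repeating-key XOR is used purely as a placeholder and is not real encryption), the order of decompression and decryption, the protocol (a plain HTTP/1.x request is assumed), or how the user is identified (a hypothetical `X-User-Id` header is used).

```python
import gzip


def decompress(raw: bytes) -> bytes:
    """Raw data decompression submodule (gzip is an assumption)."""
    return gzip.decompress(raw)


def decrypt(data: bytes, key: bytes) -> bytes:
    """Raw data decryption submodule.

    Repeating-key XOR is a placeholder only; it is symmetric, so the same
    function also 'encrypts' the sample data below.
    """
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def parse_http_request(payload: bytes) -> dict:
    """Protocol parsing submodule: split an HTTP/1.x request into its
    request line and a dict of lower-cased header names."""
    head = payload.decode("iso-8859-1").split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    method, target, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {"method": method, "target": target, "headers": headers}


def extract_url_and_user(parsed: dict) -> dict:
    """URL and corresponding user information extraction.

    The x-user-id header is hypothetical; a real capture might identify
    the user by account, device, or source address instead.
    """
    host = parsed["headers"].get("host", "")
    return {
        "url": f"http://{host}{parsed['target']}",
        "user": parsed["headers"].get("x-user-id"),
    }


def process(raw: bytes, key: bytes) -> dict:
    """Run the submodules in the order assumed here: decompress, decrypt,
    parse, extract."""
    return extract_url_and_user(parse_http_request(decrypt(decompress(raw), key)))
```

A round trip on a synthetic record shows the intended use:

```python
request = b"GET /item?id=7 HTTP/1.1\r\nHost: shop.example.com\r\nX-User-Id: u42\r\n\r\n"
raw = gzip.compress(decrypt(request, b"k"))  # build an encrypted, compressed sample
record = process(raw, b"k")
```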
10. The system for analyzing user behavior data based on a web crawler according to claim 8, characterized in that the data analysis module is a manual analysis module.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710017268.XA CN106844588A (en) | 2017-01-11 | 2017-01-11 | A kind of analysis method and system of the user behavior data based on web crawlers |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710017268.XA CN106844588A (en) | 2017-01-11 | 2017-01-11 | A kind of analysis method and system of the user behavior data based on web crawlers |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106844588A (en) | 2017-06-13 |
Family ID: 59118551
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710017268.XA Pending CN106844588A (en) | 2017-01-11 | 2017-01-11 | A kind of analysis method and system of the user behavior data based on web crawlers |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106844588A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108540314A (en) * | 2018-03-22 | 2018-09-14 | 微梦创科网络科技(中国)有限公司 | The restoring method and system of user behavior |
CN108536804A (en) * | 2018-03-30 | 2018-09-14 | 掌阅科技股份有限公司 | Information-pushing method, electronic equipment based on e-book and computer storage media |
CN109361564A (en) * | 2018-11-01 | 2019-02-19 | 清华大学 | Internet data acquisition method and device based on the passive data fusion of master |
CN109416700A (en) * | 2017-09-30 | 2019-03-01 | 深圳市得道健康管理有限公司 | A kind of the classification based training method and the network terminal of internet behavior |
WO2019071966A1 (en) * | 2017-10-13 | 2019-04-18 | 平安科技(深圳)有限公司 | Crawler data-based user behavior analysis method, application server and readable storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102855248A (en) * | 2011-06-29 | 2013-01-02 | 中国移动通信集团广西有限公司 | Determination method, apparatus and system for user characteristic information |
CN102946319A (en) * | 2012-09-29 | 2013-02-27 | 焦点科技股份有限公司 | System and method for analyzing network user behavior information |
CN103646119A (en) * | 2013-12-26 | 2014-03-19 | 北京西塔网络科技股份有限公司 | Method and device for generating user behavior record |
CN104750704A (en) * | 2013-12-26 | 2015-07-01 | 中国移动通信集团河南有限公司 | Webpage uniform resource locator (URL) classification and identification method and device |
CN105786965A (en) * | 2016-01-27 | 2016-07-20 | 久远谦长(北京)技术服务有限公司 | URL-based user behavior analysis method and device |
CN105893583A (en) * | 2016-04-01 | 2016-08-24 | 北京鼎泰智源科技有限公司 | Data acquisition method and system based on artificial intelligence |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109416700A (en) * | 2017-09-30 | 2019-03-01 | 深圳市得道健康管理有限公司 | A kind of the classification based training method and the network terminal of internet behavior |
WO2019071966A1 (en) * | 2017-10-13 | 2019-04-18 | 平安科技(深圳)有限公司 | Crawler data-based user behavior analysis method, application server and readable storage medium |
CN108540314A (en) * | 2018-03-22 | 2018-09-14 | 微梦创科网络科技(中国)有限公司 | The restoring method and system of user behavior |
CN108536804A (en) * | 2018-03-30 | 2018-09-14 | 掌阅科技股份有限公司 | Information-pushing method, electronic equipment based on e-book and computer storage media |
CN108536804B (en) * | 2018-03-30 | 2021-06-29 | 掌阅科技股份有限公司 | Information pushing method based on electronic book, electronic equipment and computer storage medium |
CN109361564A (en) * | 2018-11-01 | 2019-02-19 | 清华大学 | Internet data acquisition method and device based on the passive data fusion of master |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Nguyen et al. | Automatic image filtering on social networks using deep learning and perceptual hashing during crises | |
Tanwar et al. | Unravelling unstructured data: A wealth of information in big data | |
CN106844588A (en) | A kind of analysis method and system of the user behavior data based on web crawlers | |
CN103914478B (en) | Webpage training method and system, webpage Forecasting Methodology and system | |
CN105468744B (en) | Big data platform for realizing tax public opinion analysis and full text retrieval | |
CN102915335B (en) | Based on the information correlation method of user operation records and resource content | |
CN106815307A (en) | Public Culture knowledge mapping platform and its use method | |
CN101751458A (en) | Network public sentiment monitoring system and method | |
Jayaweera et al. | Crime analytics: Analysis of crimes through newspaper articles | |
CN102542061B (en) | Intelligent product classification method | |
CN103605738A (en) | Webpage access data statistical method and webpage access data statistical device | |
CN102473190A (en) | Keyword assignment to a web page | |
CN104899229A (en) | Swarm intelligence based behavior clustering system | |
US10467255B2 (en) | Methods and systems for analyzing reading logs and documents thereof | |
CN104598536B (en) | A kind of distributed network information structuring processing method | |
CN114915468B (en) | Intelligent analysis and detection method for network crime based on knowledge graph | |
CN112328806A (en) | Data processing method, system, computer equipment and storage medium | |
CN107086925B (en) | Deep learning-based internet traffic big data analysis method | |
Bhardwaj et al. | Web scraping using summarization and named entity recognition (ner) | |
Arora et al. | Big data: A review of analytics methods & techniques | |
CN106874368B (en) | RTB bidding advertisement position value analysis method and system | |
Goele et al. | Data Mining Trend in Past, Current and Future | |
CN111447575A (en) | Short message pushing method, device, equipment and storage medium | |
Gaurav et al. | An outline on big data and big data analytics | |
EP4248325A1 (en) | Multi-cache based digital output generation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right | Effective date of registration: 20201102. Address after: No. 2-3167, Zone A, Nonggang City, No. 2388, Donghuan Avenue, Hongjia Street, Jiaojiang District, Taizhou City, Zhejiang Province. Applicant after: Taizhou Jiji Intellectual Property Operation Co.,Ltd. Address before: 201616 Shanghai city Songjiang District Sixian Road No. 3666. Applicant before: Phicomm (Shanghai) Co.,Ltd. |