CN109583472A - A kind of web log user identification method and system - Google Patents

A kind of web log user identification method and system

Info

Publication number
CN109583472A
Authority
CN
China
Prior art keywords
user
log
behavior
user action
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811276191.9A
Other languages
Chinese (zh)
Inventor
张梦菲
方金云
肖茁建
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN201811276191.9A priority Critical patent/CN109583472A/en
Publication of CN109583472A publication Critical patent/CN109583472A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3438Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment monitoring of user actions

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention provides a web log user identification method and system. The method comprises: extracting log key fields from user behavior logs, wherein the log key fields include at least a uniform resource locator (URL); constructing, from the URL, multiple behavior features that reflect the user's behavior motivation; and calculating a user behavior motivation similarity from the multiple behavior features and identifying users based on that similarity. The method and system of the invention can identify users accurately and effectively from web logs.

Description

A kind of web log user identification method and system
Technical field
The present invention relates to the field of information technology, and more particularly to a web log user identification method and system.
Background art
User identification, the basis of Web log mining, analyzes the independent behavior trajectories and features of anonymous users from a large amount of unordered data and ultimately identifies unique individual users. In the prior art, heuristic methods are typically used to track users based on information such as the user's IP address, cookie identifier and user-agent. Because Internet service providers often assign IP addresses to users at random when they go online, one user may have many IP addresses, so this approach faces the following problems: 1) The "multi-user problem" and the "single-user problem". The "multi-user problem" means that the same user who, at different times, enters a URL in the address bar or opens a Web page from bookmarks may be identified as multiple users. The "single-user problem" means that multiple users who share one IP address, or even use the same kind of device and browser, may be identified as one user. 2) Efficiency. When the number of web user visits exceeds the order of millions, current user identification algorithms are inefficient.
Therefore, the prior art needs to be improved to provide a user identification method and system with high processing efficiency and high identification accuracy.
Summary of the invention
The object of the present invention is to overcome the above defects of the prior art and to provide a web log user identification method and system.
According to a first aspect of the present invention, a web log user identification method is provided. The method comprises the following steps:
Step 1: extracting log key fields from user behavior logs, wherein the log key fields include at least a uniform resource locator (URL);
Step 2: constructing, from the URL, multiple behavior features that reflect the user's behavior motivation;
Step 3: calculating a user behavior motivation similarity from the multiple behavior features and identifying users based on the user behavior motivation similarity.
In one embodiment, the multiple behavior features include at least one of: access type, accessed website column, shop access behavior, product access behavior, and search behavior.
In one embodiment, step 3 further comprises:
Step 31: identifying whether logs belong to the same user based on the log key fields extracted from the user behavior logs;
Step 32: for user behavior logs whose user has not been identified, performing further identification based on the user behavior motivation similarity.
In one embodiment, the log key fields further include at least the uniform resource locator of the previously accessed page (referrer URL), a user identifier, a user-agent, a cookie identifier and a session identifier. In step 31, two user behavior logs that satisfy any one of the following conditions are determined to belong to the same user:
The user identifiers of the two user behavior logs are non-empty and identical;
The cookie identifiers of the two user behavior logs are non-empty and identical;
The session identifiers of the two user behavior logs are non-empty and identical; or
The URL and the referrer URL of the two user behavior logs conform to the topology of the website.
In one embodiment, for two user behavior logs, step 32 comprises:
Step 321: if the multiple behavior features extracted from the two user behavior logs include shop access behavior, extracting the shop's main business description; if they include product access behavior, extracting the product title; and if they include search behavior, extracting the search keyword;
Step 322: calculating the user behavior motivation similarity for the main business descriptions, product titles or search keywords extracted from the two user behavior logs, and, if the similarity difference is less than a threshold, determining that the two user behavior logs belong to the same user.
In one embodiment, step 32 further comprises:
Step 323: if the multiple behavior features extracted from the two user behavior logs include the access type and the accessed website column, comparing the combination of access type and accessed website column of the two user behavior logs, and, if both are identical, determining that the two user behavior logs belong to the same user.
In one embodiment, in step 322, the cosine similarity between the word2vec word embedding vector of any one of the main business description, product title or search keyword of the first user behavior log and the word embedding vector of any one of the main business description, product title or search keyword of the second user behavior log is used as the user behavior motivation similarity; if the user behavior motivation similarity is less than the threshold, the two logs are determined to belong to the same user.
According to a second aspect of the present invention, a user identification system is provided. The system comprises:
User behavior log acquisition module: configured to extract log key fields from user behavior logs, wherein the log key fields include at least a uniform resource locator (URL);
User behavior feature extraction module: configured to construct, from the URL, multiple behavior features that reflect the user's behavior motivation;
User identification module: configured to calculate a user behavior motivation similarity from the multiple behavior features and to identify users based on the user behavior motivation similarity.
In one embodiment, the system further comprises:
User behavior log preprocessing module: configured to filter or format the collected user behavior logs;
Bucket division module: configured to divide the user behavior logs into data buckets of different sizes according to the log key fields;
Identifier generation module: configured to generate user identifiers for the identified users.
In one embodiment, the system is based on the Spark platform.
Compared with the prior art, the advantages of the present invention are as follows: user behavior motivation is represented by behavior features extracted from web logs, and users are identified by the similarity of their behavior motivation, which improves the accuracy of user identification. In addition, implementing the method of the invention on the Spark platform improves data processing efficiency and better meets the real-time requirements of data processing.
Brief description of the drawings
The following drawings merely illustrate and explain the present invention schematically and do not limit its scope, in which:
Fig. 1 shows a flowchart of a web log user identification method according to an embodiment of the present invention;
Fig. 2 shows a schematic diagram of a web log user identification system according to an embodiment of the present invention.
Detailed description of the embodiments
To make the purpose, technical solution, design method and advantages of the present invention clearer, the invention is described in more detail below through specific embodiments with reference to the drawings. It should be understood that the specific embodiments described herein only explain the present invention and do not limit it.
According to one embodiment of the present invention, a web log user identification method is provided. The method takes the log files on a web server as the basis for analysis and identifies users according to the similarity of their behavior motivation. Specifically, referring to Fig. 1, the method comprises the following steps:
Step S110: preprocess the user behavior logs to obtain the log key fields.
User behavior logs can be collected from the web server that records user behavior. They contain the user's behavior trajectory, for example the IP address, time of going online, destination website IP, URL, traffic and connection type.
The required log key fields are obtained by preprocessing the logs. Preprocessing includes filtering out crawler records to remove irrelevant page information, filtering out abnormal data so that it does not affect identification accuracy, and formatting the filtered logs to extract the required fields.
In one embodiment, the log key fields extracted from the web log include the IP address, the uniform resource locator (URL), the uniform resource locator of the previously accessed page (referer URL), the log access time, the user identifier, the user-agent, the cookie identifier and the session identifier.
It should be noted that, for any given log, the user identifier, cookie identifier or session identifier may be empty. As the following description shows, one object of the present invention is to identify logs whose identification information is empty and to fill in a unique identifier for them.
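The key fields of step S110 can be pictured as one record per log line. The following is a minimal sketch rather than the patent's implementation: the `LogRecord` class, its field names, and the user-agent keyword list used to drop crawler traffic are all assumptions made for illustration, since the source does not specify a log format or a crawler detection rule.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Hypothetical substrings that mark crawler traffic; the source does not
# specify how crawlers are detected.
CRAWLER_MARKERS = ("bot", "spider", "crawler")


@dataclass(frozen=True)
class LogRecord:
    """One user behavior log reduced to its key fields (names are assumed)."""
    ip: str
    url: str
    referer_url: Optional[str]
    access_time: float                # seconds since the epoch
    user_agent: str
    user_id: Optional[str] = None     # may be empty in the raw log
    cookie_id: Optional[str] = None   # may be empty in the raw log
    session_id: Optional[str] = None  # may be empty in the raw log


def is_crawler(record: LogRecord) -> bool:
    """Return True if the user-agent contains a known crawler marker."""
    agent = record.user_agent.lower()
    return any(marker in agent for marker in CRAWLER_MARKERS)


def preprocess(records: Iterable[LogRecord]) -> List[LogRecord]:
    """Drop crawler records and records without a URL."""
    return [r for r in records if r.url and not is_crawler(r)]
```

The identifier fields default to `None` so that logs with missing identification information stay in the data set and can be filled in later.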
Step S120: obtain the user behavior motivation based on the log key fields.
In one embodiment, five behavior features are constructed from the URL in the log key fields to represent the user behavior motivation: access type, accessed website column, shop access behavior, product access behavior and search behavior.
According to one embodiment of the present invention, the access type and the accessed website column are obtained by analyzing the components of the URL separated by "/". The access type may be an off-site page, the site home page, a search page, a website column, the shopping cart page, an order page, a shop detail page, a product detail page and so on. The website column may be any of the site's columns, such as the forum, product, map, crowdfunding, purchasing or subletting columns. Further, if the URL contains shop access behavior, the shop ID and its corresponding main business description are extracted; if the URL contains product access behavior, the product ID and its corresponding product title are extracted, together with the main business description of the shop that sells the product; if the URL contains search behavior, the search string in the URL is decoded to obtain the search keyword.
For example, the URL of one log is "/product/detail2.htm?productId=928685056". Analyzing this URL shows that the access type is a product detail page, the accessed website column is the product column, and the accessed product ID is 928685056; the product title is "wholesale glasses promote children's money sunglasses sunglasses ellipse children mirror", the accessed shop ID is 056034, and the shop's main business description is "wholesale sunglasses, goggles spectacle-frame, polariscope, Kids eyeglasses". As another example, the URL of another log is "/search/s.html?q=%e7%ba%a2%e9%85%92"; the access type indicates user search behavior, the accessed website column is the search column, and the extracted search keyword is "sunglasses".
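The URL analysis described in step S120 can be sketched with Python's standard `urllib.parse` module. This is an illustration under stated assumptions: the `SEGMENT_RULES` table, the feature names and the query parameter names `productId` and `q` are taken from the two example URLs only, and looking up the product title or shop description from an ID would need a site database that is not shown here.

```python
from typing import Dict, Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical mapping from the first path segment to
# (access type, website column); a real site would supply its own table.
SEGMENT_RULES = {
    "product": ("product detail page", "product column"),
    "search": ("search page", "search column"),
}


def extract_features(url: str) -> Dict[str, Optional[str]]:
    """Split a URL on "/" and parse its query string into behavior features."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    query = parse_qs(parts.query)  # parse_qs also percent-decodes values

    first = segments[0] if segments else ""
    access_type, column = SEGMENT_RULES.get(first, ("other", None))
    return {
        "access_type": access_type,
        "column": column,
        "product_id": query.get("productId", [None])[0],
        "search_keyword": query.get("q", [None])[0],
    }
```

Calling `extract_features("/product/detail2.htm?productId=928685056")` yields the product ID `928685056`. Note that `parse_qs` percent-decodes the `q` value of the second example URL as UTF-8, which yields the string 红酒.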
In this embodiment, the five features extracted from the URL in the log key fields reflect the user's behavior motivation, for example the user's interest as shown by browsing or searching. In this way, the user behavior motivation reflected by each log can be obtained.
Step S130: identify users by calculating the user behavior motivation similarity.
In this step, users are identified by calculating the user behavior motivation similarity between different logs.
In one embodiment, the user identification process includes a preliminary matching process based on the log key fields and a secondary matching process based on the user behavior motivation similarity.
Specifically, in the preliminary matching process, two logs that satisfy any one of the following conditions are judged to belong to the same user:
The user identifiers of the two logs are non-empty and identical;
The cookie identifiers of the two logs are non-empty and identical;
The session identifiers of the two logs are non-empty and identical;
The URL and referer URL of the two logs conform to the topology of the website, that is, the URL of one log can be reached from the URL of the other log.
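The four preliminary matching conditions can be expressed as a single predicate. The sketch below is illustrative only: the `Log` type and its field names are assumptions, and the topology condition is simplified to "the referer URL of one log equals the URL of the other", whereas a full implementation would presumably consult the site's link structure.

```python
from typing import NamedTuple, Optional


class Log(NamedTuple):
    """Minimal stand-in for a user behavior log (field names are assumed)."""
    url: str
    referer_url: Optional[str] = None
    user_id: Optional[str] = None
    cookie_id: Optional[str] = None
    session_id: Optional[str] = None


def _same_non_empty(a: Optional[str], b: Optional[str]) -> bool:
    """True when both values are present and equal."""
    return bool(a) and a == b


def reachable(a: Log, b: Log) -> bool:
    """Simplified topology check: one log's referer is the other's URL."""
    return (_same_non_empty(a.referer_url, b.url)
            or _same_non_empty(b.referer_url, a.url))


def same_user_preliminary(a: Log, b: Log) -> bool:
    """Apply the four preliminary matching conditions in order."""
    return (
        _same_non_empty(a.user_id, b.user_id)
        or _same_non_empty(a.cookie_id, b.cookie_id)
        or _same_non_empty(a.session_id, b.session_id)
        or reachable(a, b)
    )
```

Empty strings and `None` are both treated as missing, so two logs whose cookie identifiers are both empty are not matched on that field.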
After preliminary matching, the logs whose user has not been identified are placed in a candidate pool so that users can be further identified based on the user behavior motivation similarity. For example, the user behavior motivation similarity is calculated from the five extracted behavior features, and logs whose similarity is less than a set threshold are determined to belong to the same user.
According to one embodiment of the present invention, the user behavior motivation similarity is calculated over the URL features extracted within a certain period of time. Taking the five features above as an example, the calculation proceeds as follows. If the two URLs contain any one of shop access behavior, product access behavior or search behavior, a text similarity is calculated between the extracted main business descriptions, product titles or search keywords, and two logs whose similarity value is less than the threshold are determined to belong to the same user. Otherwise, the combination of access type and accessed website column from the five features is compared: if both the access type and the accessed website column are identical, the two logs belong to the same user; otherwise they belong to different users.
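The two-branch decision of the secondary matching process can be sketched as follows. The feature dictionary keys, the order in which text features are tried, and the `text_distance` callback are all hypothetical. The source states that logs whose value is "less than the threshold" match, which reads most naturally as a difference or distance measure, so the sketch assumes a caller-supplied function where a smaller value means the texts are more alike.

```python
from typing import Callable, Dict, Optional

Features = Dict[str, Optional[str]]

# Feature keys that carry free text, checked in this (assumed) order.
TEXT_KEYS = ("shop_description", "product_title", "search_keyword")


def motivation_text(features: Features) -> Optional[str]:
    """Return the first available text feature, or None if there is none."""
    for key in TEXT_KEYS:
        if features.get(key):
            return features[key]
    return None


def same_user_secondary(
    a: Features,
    b: Features,
    text_distance: Callable[[str, str], float],
    threshold: float,
) -> bool:
    """Secondary matching on behavior motivation.

    `text_distance` is a caller-supplied measure where smaller means more
    alike; treating the source's "similarity value" this way is an assumption.
    """
    text_a, text_b = motivation_text(a), motivation_text(b)
    if text_a is not None and text_b is not None:
        return text_distance(text_a, text_b) < threshold
    # Fallback: compare the (access type, website column) combination.
    return (a.get("access_type"), a.get("column")) == (
        b.get("access_type"), b.get("column"))
```

Passing the text measure in as a parameter keeps the branching logic separate from the choice of similarity model.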
In one embodiment, the text similarity can be calculated with the word2vec algorithm. For example, word vectors are trained on the segmented Chinese Wikipedia corpus merged with a previously built small-commodity dictionary; the extracted main business description, product title or search keyword is segmented and stop words are removed to obtain the word embedding for the URL, and the cosine similarity is then calculated. For example, for the log whose URL is "/product/detail2.htm?productId=928685056" and the log whose URL is "/search/s.html?q=%e7%ba%a2%e9%85%92", the word2vec algorithm is used to calculate the cosine similarity between the word embedding vector of the segmented product title of the first URL and the word embedding vector of the search keyword of the second URL. If the similarity of the two is less than the threshold, they reflect the same user motivation, indicating the same user.
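To make the cosine similarity step concrete, the sketch below computes it in plain Python over toy vectors. It does not train or load a word2vec model: `TOY_VECTORS` is made-up data standing in for trained embeddings, and averaging the token vectors to get one vector per text is an assumption, since the source does not say how the vectors of individual words are combined.

```python
import math
from typing import Dict, List, Sequence

# Toy 3-dimensional embeddings standing in for trained word2vec vectors;
# the numbers are made up purely for illustration.
TOY_VECTORS: Dict[str, List[float]] = {
    "children": [1.0, 0.0, 0.0],
    "sunglasses": [0.0, 1.0, 0.0],
    "wine": [0.0, 0.0, 1.0],
}


def text_vector(tokens: Sequence[str],
                vectors: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of the tokens that have an embedding."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("no token has an embedding")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)
```

The function returns values near 1 for texts pointing in the same direction and near 0 for unrelated ones; how that value is compared with the threshold is left to the caller.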
Step S140: generate user identifiers.
After user identification, a new cookie identifier is generated for each missing cookie identifier, and the user identifier is generated from the cookie identifier.
In one embodiment, for two logs judged to belong to the same user, the rule for generating the new cookie identifier is as follows: if the cookie identifiers of both logs exist, nothing is done; if the cookie identifier of only one log is missing, the cookie identifier of the other log is assigned to it; otherwise, a unique key is generated from three key fields of the log (IP, user-agent and access time) and assigned to both logs with missing cookie identifiers. Finally, the same user identifier is generated for logs of the same user, and for different users a unique user identifier is generated from a key built from the three key fields IP, user-agent and access time. The specific method of generating the user identifier belongs to the prior art and is not described here.
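The three-case cookie rule can be written as a small function. This is a sketch under assumptions: the function names are hypothetical, SHA-256 is one possible way to derive a key (the source only says a unique key is generated from IP, user-agent and access time), and when both cookies are missing the key is derived from a single set of fields supplied by the caller.

```python
import hashlib
from typing import Optional, Tuple


def fallback_key(ip: str, user_agent: str, access_time: str) -> str:
    """Derive a key from IP, user-agent and access time.

    SHA-256 is an assumed choice; the source only says a unique key is
    generated from these three fields.
    """
    raw = "|".join((ip, user_agent, access_time))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def fill_cookie_ids(
    cookie_a: Optional[str],
    cookie_b: Optional[str],
    ip: str,
    user_agent: str,
    access_time: str,
) -> Tuple[str, str]:
    """Return the cookie identifiers of two logs judged to be one user."""
    if cookie_a and cookie_b:
        return cookie_a, cookie_b          # both present: leave untouched
    if cookie_a or cookie_b:
        present = cookie_a or cookie_b
        return present, present            # copy the existing one over
    key = fallback_key(ip, user_agent, access_time)
    return key, key                        # both missing: share a new key
```

A deterministic hash gives the same key for the same three field values on every run, which helps when logs are reprocessed.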
According to a second aspect of the present invention, a user identification system is provided, which implements the user identification method of the present invention on the Spark platform. Spark is a scalable data analysis platform that uses resilient distributed datasets (RDDs) and provides a distributed in-memory parallel computing engine that supports fast iterative computation. Computation on the Spark platform is performed by operating on RDDs; RDD operations include operators such as map, mapPartitions, flatMap, join, repartition, filter, union and groupBy.
Referring to the user identification system 200 shown in Fig. 2, the system includes a user behavior log acquisition module 210, a user behavior log preprocessing module 220, a user behavior feature extraction module 230, a bucket division module 240, a user identification module 250 and an identifier generation module 260.
The user behavior log acquisition module 210 collects user behavior logs from the web log server. For example, the logs can be stored on the HDFS (Hadoop Distributed File System) of the computing cluster, and Spark reads the user behavior logs into memory to form an RDD.
The user behavior log preprocessing module 220 preprocesses the user behavior logs to obtain normalized log key fields. For example, the user behavior logs are filtered with Spark's filter operator to remove crawler records and abnormal data, and the log key fields are then obtained, including the IP, the uniform resource locator (URL), the uniform resource locator of the previously accessed page (referer URL), the log access time, the user identifier, the user-agent, the cookie identifier and the session identifier.
The user behavior feature extraction module 230 extracts features that reflect user behavior from the log key fields. For example, five features are constructed from the URL in the log key fields to represent the user behavior motivation: access type, accessed website column, shop access behavior, product access behavior and search behavior. In addition, the user behavior feature extraction module is configured so that, if the URL contains shop access behavior, the shop ID and its corresponding main business description are extracted; if the URL contains product access behavior, the product ID and its corresponding product title are extracted, together with the main business description of the shop that sells the product; and if the URL contains search behavior, the search string in the URL is decoded to obtain the search keyword.
The bucket division module 240 divides the user behavior logs into data buckets of different sizes according to the normalized log key fields.
In one embodiment, the bucket division module uses distributed computing to divide the user behavior logs into data buckets of different sizes, using the string formed by concatenating the IP and user-agent from the log key fields as the key and the user behavior log as the value. For example, Spark's mapPartitionsToPair operator converts the user behavior logs into a pairRDD whose key is the concatenated IP and user-agent string, and this structure serves as the data buckets of different sizes.
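The bucketing idea can be illustrated without a cluster. The sketch below uses a plain Python dictionary in place of a Spark pair RDD, so it shows the grouping logic only and not the distributed execution; the tuple layout of a log row and the "|" separator in the key are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

LogRow = Tuple[str, str, str]  # (ip, user_agent, url); layout is assumed


def bucket_key(ip: str, user_agent: str) -> str:
    """Concatenate IP and user-agent into one bucket key."""
    return ip + "|" + user_agent   # the "|" separator is an assumption


def divide_into_buckets(rows: Iterable[LogRow]) -> Dict[str, List[LogRow]]:
    """Group logs so that only logs sharing a key are compared later."""
    buckets: Dict[str, List[LogRow]] = defaultdict(list)
    for row in rows:
        ip, user_agent, _url = row
        buckets[bucket_key(ip, user_agent)].append(row)
    return dict(buckets)
```

Because pairwise comparison only happens inside a bucket, the number of comparisons depends on the size of each bucket rather than on the size of the whole log set.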
The user identification module 250 applies the user identification method of the present invention to each data bucket obtained by the bucket division module 240.
For example, user identification is performed on each data bucket obtained by the bucket division module using Spark's groupByKey and mapValues operators. The detailed identification process and the text similarity calculation are as described above for the user identification method; SparkML can be used to calculate the text similarity.
The identifier generation module 260 generates a user identifier for every log after identification; the generation process is as described above.
In summary, the present invention collects Web user behavior logs, filters out crawler and noise records, formats the logs into normalized log key fields, and extracts multiple URL features from each log to represent the user's behavior motivation. Using distributed computing, the user behavior logs are divided by IP and user-agent into data buckets of different sizes. Preliminary user matching is then performed on the user identifier, cookie identifier, session identifier, and URL and referer URL, after which the same user is determined by calculating the user behavior motivation similarity. Finally, the same user identifier is generated for logs of the same user, and for different users a unique user identifier is generated from a key built from IP, user-agent and access time. The method and system provided by the invention effectively address two problems: traditional algorithms mistakenly identifying one user as multiple users when the cookie or the source URL is missing, and traditional algorithms being unable to distinguish multiple users who share one IP or even the same device. This improves the accuracy of user identification. In addition, by using Spark-based distributed in-memory computing, the invention can identify users quickly and with high efficiency.
The above embodiments only represent preferred embodiments of the present invention and do not limit its patent scope. Those skilled in the art may split or combine the modules in equivalent ways; all equivalent changes made under the concept of the present invention fall within its scope of patent protection.
It should be noted that, although the steps are described above in a particular order, this does not mean that they must be executed in that order. In fact, some of these steps can be executed concurrently, or their order can even be changed, as long as the required functions are achieved.
The present invention may be a system, a method and/or a computer program product. The computer program product may include a computer-readable storage medium containing computer-readable program instructions that cause a processor to implement various aspects of the present invention.
The computer-readable storage medium may be a tangible device that holds and stores instructions used by an instruction execution device. It may include, for example but without limitation, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or raised structures in a groove with instructions stored thereon, and any suitable combination of the above.
Various embodiments of the present invention have been described above. The description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terms used herein were chosen to best explain the principles of the embodiments, their practical application or their technical improvement over technologies in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (12)

1. a kind of web log user identification method, comprising the following steps:
Step 1: log critical field is extracted from User action log, wherein the log critical field includes at least unified Resource Locator URL;
Step 2: multiple behavioural characteristics of building reflection user behavior motivation from the uniform resource position mark URL;
Step 3: user behavior motivation similarity being calculated according to the multiple behavioural characteristic and is based on the user behavior motivation phase User is identified like degree.
2. according to the method described in claim 1, wherein, the multiple behavioural characteristic include access type, access website column, Access retail shop's behavior, access commodity behavior, in search behavior at least one of.
3. according to the method described in claim 1, wherein, step 3 further comprises:
Step 31: identifying whether it is the same use based on the log critical field extracted from the User action log Family;
Step 32: the User action log for not identifying user is based further on the user behavior motivation similarity To be identified.
4. according to the method described in claim 3, wherein, the log critical field also includes at least an accession page Uniform resource locator referrer URL, user identifier, user-agent, cookie mark and session mark, in step In 31, two User action logs for meeting any of the following conditions are determined as the same user:
The user identifier of two User action logs is not null field and identical;
The cookie mark of two user behaviors, two logs is not null field and identical;
The session mark of two User action logs is not null field and identical;Or
The uniform resource locator of the uniform resource position mark URL of two User action logs and a upper accession page Referrer URL meets the topological structure of website.
5. according to the method described in claim 3, wherein, for two User action logs, step 32 includes:
Step 321: if the multiple behavioural characteristics extracted from two User action logs include access retail shop's behavior, mentioning Main business is taken to describe, if extracting commodity title comprising accessing commodity behavior, if extraction is searched comprising search behavior Rope keyword;
Step 322: for main business description, commodity title or the search key extracted from two User action logs User behavior motivation similarity calculation is carried out, if similarity difference is less than threshold value, determines that two User action logs are The same user.
6. according to the method described in claim 5, wherein, step 32 further include:
Step 323, if the multiple behavioural characteristics extracted from two User action logs include access type and access net It stands column, then the access type of two User action logs and access website column is subjected to comparison of combined, if the two phase It is same then determine two User action logs be the same user.
7. according to the method described in claim 6, wherein, in step 322, first user of word2vec calculating will be utilized The word of any one of the main business description of user behaviors log or commodity title or search key is embedded in vector sum Article 2 user The cosine phase of the word insertion vector of any one of the main business description of user behaviors log or commodity title or search key It is used as the user behavior motivation similarity like degree, if the user behavior motivation similarity is less than threshold value, judgement is same One user.
8. a kind of user's identifying system, comprising:
User action log acquisition module: for extracting log critical field from User action log, wherein the log is closed Key field includes at least uniform resource position mark URL;
User behavior characteristics extraction module: for the building reflection user behavior motivation from the uniform resource position mark URL Multiple behavioural characteristics;
Subscriber identification module: for calculating user behavior motivation similarity according to the multiple behavioural characteristic and being based on the user Behavior motive similarity identifies user.
9. The system according to claim 8, further comprising:
User behavior log preprocessing module: for filtering or formatting the collected user behavior logs;
Bucket division module: for dividing the user behavior logs into data buckets of different sizes according to the log key fields;
Identifier generation module: for generating a user identifier for each identified user.
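Bucket division and identifier generation can be sketched as below. This is only one plausible reading: the key fields `ip` and `user_agent` used in the usage note, the function names, and the use of random UUIDs as identifiers are assumptions, since the claims state only that bucketing follows the log key fields and that each identified user receives an identifier.

```python
import uuid
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Sequence, Tuple

Log = Mapping[str, str]


def divide_into_buckets(
    logs: Iterable[Log], key_fields: Sequence[str]
) -> Dict[Tuple[str, ...], List[Log]]:
    """Group logs that share the same values for the chosen key fields."""
    buckets: Dict[Tuple[str, ...], List[Log]] = defaultdict(list)
    for log in logs:
        key = tuple(log.get(field, "") for field in key_fields)
        buckets[key].append(log)
    return dict(buckets)


def generate_user_ids(identified_users: Sequence[Sequence[Log]]) -> List[str]:
    """Create one random identifier per group of logs judged to be one user."""
    return [uuid.uuid4().hex for _ in identified_users]
```

For example, `divide_into_buckets(logs, ("ip", "user_agent"))` yields buckets whose sizes depend on how many logs share each key, so pairwise similarity comparison only has to run inside a bucket rather than across the whole log set.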
10. The system according to claim 8 or 9, wherein the system is based on the Spark platform.
11. A computer-readable storage medium storing a computer program, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
12. A computer device, comprising a memory and a processor, the memory storing a computer program executable on the processor, wherein the processor, when executing the program, implements the steps of the method according to any one of claims 1 to 7.
CN201811276191.9A 2018-10-30 2018-10-30 A kind of web log user identification method and system Pending CN109583472A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811276191.9A CN109583472A (en) 2018-10-30 2018-10-30 A kind of web log user identification method and system

Publications (1)

Publication Number Publication Date
CN109583472A true CN109583472A (en) 2019-04-05

Family

ID=65921317

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811276191.9A Pending CN109583472A (en) 2018-10-30 2018-10-30 A kind of web log user identification method and system

Country Status (1)

Country Link
CN (1) CN109583472A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1804844A (en) * 2006-01-10 2006-07-19 西安交通大学 Web page metadata based formalized description method for user access behaviors
CN103051637A (en) * 2012-12-31 2013-04-17 北京亿赞普网络技术有限公司 User identification method and device
US20130346447A1 (en) * 2012-06-21 2013-12-26 Xerox Corporation Systems and methods for behavioral pattern mining
CN104217030A (en) * 2014-09-28 2014-12-17 北京奇虎科技有限公司 Method and device for classifying users according to search log data of server
CN104731914A (en) * 2015-03-24 2015-06-24 浪潮集团有限公司 Method for detecting user abnormal behavior based on behavior similarity
CN105574200A (en) * 2015-12-29 2016-05-11 成都陌云科技有限公司 User interest extraction method based on historical record
CN105786965A (en) * 2016-01-27 2016-07-20 久远谦长(北京)技术服务有限公司 URL-based user behavior analysis method and device
US9998598B1 (en) * 2017-02-02 2018-06-12 Conduent Business Services, Llc Methods and systems for automatically recognizing actions in a call center environment using screen capture technology
CN108200101A (en) * 2018-03-13 2018-06-22 河南工学院 A kind of computer system and its personal identification method and device of user

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
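Zhou Sanduo (周三多): "Research on Search Log Analysis Technology Based on a Big Data Platform", China Master's Theses Full-text Database, Information Science and Technology *
Zhou Songsong (周松松) et al.: "Session Identification Method Based on URL Similarity", Computer Systems & Applications (《计算机系统应用》) *
Tang Wei (汤伟) et al.: "Web Log User Identification Algorithm Based on Behavior Analysis", Software Industry and Engineering (《软件产业与工程》) *
Xiao Hui (肖慧) et al.: "User Identification Algorithm in Web Log Mining", Computer Systems & Applications (《计算机系统应用》) *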

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110096499A (en) * 2019-04-10 2019-08-06 华南理工大学 User object identification method and system based on behavior time series big data
CN110096499B (en) * 2019-04-10 2021-08-10 华南理工大学 User object identification method and system based on behavior time series big data
CN110059141A (en) * 2019-04-22 2019-07-26 珠海网博信息科技股份有限公司 A method of relationship analysis is carried out to different acquisition feature by log track
WO2021012483A1 (en) * 2019-07-23 2021-01-28 平安科技(深圳)有限公司 Information identification method and apparatus, and computer device and storage medium
CN110602038A (en) * 2019-08-01 2019-12-20 中国科学院信息工程研究所 Abnormal UA detection and analysis method and system based on rules
CN113556368A (en) * 2020-04-23 2021-10-26 北京达佳互联信息技术有限公司 User identification method, device, server and storage medium
CN114021014A (en) * 2021-11-04 2022-02-08 山东库睿科技有限公司 Single-equipment multi-user recommendation method, device, equipment and storage medium
CN117708863A (en) * 2024-02-05 2024-03-15 四川集鲜数智供应链科技有限公司 Equipment data encryption processing method based on Internet of things
CN117708863B (en) * 2024-02-05 2024-04-19 四川集鲜数智供应链科技有限公司 Equipment data encryption processing method based on Internet of things

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination