CN109583472A - A kind of web log user identification method and system - Google Patents
- Publication number
- CN109583472A (application CN201811276191.9A)
- Authority
- CN
- China
- Prior art keywords
- user
- log
- behavior
- user action
- similarity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3438—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment monitoring of user actions
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The present invention provides a web log user identification method and system. The method comprises: extracting key log fields from user behavior logs, where the key log fields include at least a uniform resource locator (URL); constructing, from the URL, multiple behavioral features that reflect the user's behavioral motivation; and computing a user behavior motivation similarity from those features and identifying users based on that similarity. The method and system of the invention can identify users accurately and efficiently from web logs.
Description
Technical field
The present invention relates to the field of information technology, and in particular to a web log user identification method and system.
Background
User identification is the foundation of web log mining: it analyzes the independent behavior trajectories and features of anonymous visitors in a large volume of unordered data and ultimately identifies unique individual users. In the prior art, heuristic methods are typically used to track users based on information such as the IP address, cookie identifier and user-agent. Because Internet service providers often assign IP addresses at random when a user goes online, a single user may hold many IP addresses. These methods therefore face the following problems: 1) The "multi-user problem" and the "single-user problem". The multi-user problem means that one user who, at different times, enters a URL in the address bar or opens a web page from bookmarks may be identified as multiple users. The single-user problem means that multiple users who share one IP address, possibly even with the same type of device and browser, may be identified as a single user. 2) Efficiency. When the number of web visits exceeds the million level, current user identification algorithms are inefficient.
Therefore, the prior art needs to be improved to provide a user identification method and system with high processing efficiency and high identification accuracy.
Summary of the invention
The object of the invention is to overcome the above defects of the prior art and to provide a web log user identification method and system.
According to a first aspect of the invention, a web log user identification method is provided. The method comprises the following steps:
Step 1: extracting key log fields from user behavior logs, where the key log fields include at least a uniform resource locator (URL);
Step 2: constructing, from the URL, multiple behavioral features that reflect the user's behavioral motivation;
Step 3: computing a user behavior motivation similarity from the multiple behavioral features and identifying users based on that similarity.
In one embodiment, the multiple behavioral features include at least one of: access type, website column accessed, shop access behavior, commodity access behavior, and search behavior.
In one embodiment, step 3 further comprises:
Step 31: determining whether two user behavior logs belong to the same user based on the key log fields extracted from the user behavior logs;
Step 32: for user behavior logs whose user has not been identified, further identifying the user based on the user behavior motivation similarity.
In one embodiment, the key log fields further include at least the uniform resource locator of the previously visited page (referer URL), a user identifier, the user-agent, a cookie identifier and a session identifier. In step 31, two user behavior logs that satisfy any one of the following conditions are determined to belong to the same user:
The user identifiers of the two user behavior logs are non-empty and identical;
The cookie identifiers of the two user behavior logs are non-empty and identical;
The session identifiers of the two user behavior logs are non-empty and identical; or
The URL and the referer URL of the two user behavior logs are consistent with the topology of the website.
In one embodiment, for two user behavior logs, step 32 comprises:
Step 321: if the behavioral features extracted from the two user behavior logs include shop access behavior, extracting the main business description; if they include commodity access behavior, extracting the commodity title; if they include search behavior, extracting the search keyword;
Step 322: computing the user behavior motivation similarity for the main business descriptions, commodity titles or search keywords extracted from the two user behavior logs, and determining that the two user behavior logs belong to the same user if the similarity difference is less than a threshold.
In one embodiment, step 32 further comprises:
Step 323: if the behavioral features extracted from the two user behavior logs include access type and website column accessed, comparing the combination of access type and website column of the two user behavior logs, and determining that the two logs belong to the same user if both are identical.
In one embodiment, in step 322, word embedding vectors are computed with word2vec for whichever of the main business description, commodity title or search keyword is present in the first user behavior log and in the second user behavior log, and the cosine similarity of the two vectors is used as the user behavior motivation similarity; if the user behavior motivation similarity is less than the threshold, the two logs are determined to belong to the same user.
According to a second aspect of the invention, a user identification system is provided. The system comprises:
User behavior log acquisition module: used to extract key log fields from user behavior logs, where the key log fields include at least a uniform resource locator (URL);
User behavior feature extraction module: used to construct, from the URL, multiple behavioral features that reflect the user's behavioral motivation;
User identification module: used to compute a user behavior motivation similarity from the multiple behavioral features and to identify users based on that similarity.
In one embodiment, the system further comprises:
User behavior log preprocessing module: used to filter or format the collected user behavior logs;
Bucket division module: used to divide the user behavior logs into data buckets of different sizes according to the key log fields;
Identifier generation module: used to generate user identifiers for the identified users.
In one embodiment, the system is based on the Spark platform.
Compared with the prior art, the advantages of the invention are as follows: the user's behavioral motivation is represented by behavioral features extracted from web logs, and users are identified by the similarity of their behavioral motivation, which improves the accuracy of user identification. In addition, implementing the method on the Spark platform improves data processing efficiency and better meets the real-time requirements of data processing.
Brief description of the drawings
The following drawings only describe and explain the invention schematically and do not limit its scope:
Fig. 1 shows a flow chart of a web log user identification method according to an embodiment of the invention;
Fig. 2 shows a schematic diagram of a web log user identification system according to an embodiment of the invention.
Detailed description
To make the purpose, technical solution, design method and advantages of the invention clearer, the invention is described in more detail below through specific embodiments with reference to the drawings. It should be understood that the specific embodiments described here only explain the invention and do not limit it.
According to one embodiment of the invention, a web log user identification method is provided. The method takes the log files on the web server as the basis for analysis and identifies users according to the similarity of their behavioral motivation. Specifically, as shown in Fig. 1, the method comprises the following steps:
Step S110: preprocess the user behavior logs to obtain the key log fields.
User behavior logs can be collected from the web server that records user behavior. They contain the user's behavior trajectory, for example the IP address, time online, destination website IP, URL, traffic volume and connection type.
The required key log fields are obtained by preprocessing the logs. Preprocessing includes filtering crawlers out of the logs to remove irrelevant page information, filtering abnormal data so that it does not affect identification accuracy, and formatting the filtered logs to extract the required fields.
In one embodiment, the key log fields extracted from the web logs include the IP address, the uniform resource locator (URL), the uniform resource locator of the previously visited page (referer URL), the log access time, the user identifier, the user-agent, the cookie identifier and the session identifier.
It should be noted that, for any given log, the user identifier, cookie identifier and session identifier may be empty. As the following description shows, one object of the invention is to identify logs whose identification information is empty and to fill in a unique identifier for them.
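The preprocessing described in step S110 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the field names, the "-" placeholder for empty values and the crawler markers in the user-agent are all hypothetical, since the text does not specify a log format or a crawler filter.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Hypothetical substrings used to spot crawler user-agents.
CRAWLER_MARKERS = ("bot", "spider", "crawler")


@dataclass
class LogRecord:
    """Key log fields; the three identifiers may legitimately be empty."""
    ip: str
    url: str
    access_time: str
    user_agent: str
    referer_url: Optional[str] = None
    user_id: Optional[str] = None
    cookie_id: Optional[str] = None
    session_id: Optional[str] = None


def _clean(value: Optional[str]) -> Optional[str]:
    """Map missing, blank or '-' values to None."""
    if value is None:
        return None
    value = value.strip()
    return None if value in ("", "-") else value


def preprocess(raw_logs: Iterable[dict]) -> List[LogRecord]:
    """Drop crawler and malformed entries, then format the rest."""
    records = []
    for raw in raw_logs:
        ip, url = _clean(raw.get("ip")), _clean(raw.get("url"))
        user_agent = _clean(raw.get("user_agent")) or ""
        if not ip or not url:
            continue  # abnormal record: mandatory field missing
        if any(marker in user_agent.lower() for marker in CRAWLER_MARKERS):
            continue  # crawler traffic
        records.append(LogRecord(
            ip=ip,
            url=url,
            access_time=_clean(raw.get("access_time")) or "",
            user_agent=user_agent,
            referer_url=_clean(raw.get("referer_url")),
            user_id=_clean(raw.get("user_id")),
            cookie_id=_clean(raw.get("cookie_id")),
            session_id=_clean(raw.get("session_id")),
        ))
    return records
```

Keeping empty identifiers as `None` rather than as empty strings makes the later "non-empty and identical" checks straightforward.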
Step S120: obtain the user's behavioral motivation from the key log fields.
In one embodiment, five behavioral features are constructed from the URL in the key log fields to represent the user's behavioral motivation: access type, website column accessed, shop access behavior, commodity access behavior and search behavior.
According to one embodiment of the invention, the access type and the website column accessed can be obtained by analyzing the components of the URL separated by "/". The access type may be, for example, an off-site page, the website home page, a search page, a website column, the shopping cart page, the order page, a shop detail page or a commodity detail page. The website column may be any of the site's columns, such as the forum, commodity column, map, crowdfunding, purchasing or subleasing. Further, if the URL contains shop access behavior, the shop ID and its corresponding main business description are extracted; if the URL contains commodity access behavior, the commodity ID, its corresponding commodity title and the main business description of the shop that sells the commodity are extracted; if the URL contains search behavior, the search string in the URL is decoded to obtain the search keyword.
For example, the URL of one log is "/product/detail2.htm?productId=928685056". Analyzing this URL shows that the access type is a commodity detail page, the website column accessed is the commodity column, the commodity ID is 928685056, the commodity title is "wholesale glasses, promotional children's sunglasses, oval children's glasses", the shop ID is 056034, and the shop's main business description is "wholesale sunglasses, goggles, spectacle frames, polarized glasses, children's glasses". As another example, the URL of another log is "/search/s.html?q=%e7%ba%a2%e9%85%92"; the access type indicates search behavior, the website column accessed is the search column, and the extracted search keyword is "sunglasses".
In this embodiment, the five features extracted from the URL key field reflect the user's behavioral motivation, for example the user's interest in actions such as browsing and searching. In this way, the behavioral motivation reflected by each log can be obtained.
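A minimal sketch of this URL analysis, using only the Python standard library, might look like the following. The mapping from path segments to access types, the feature names and the `productId` / `q` parameter names are taken from the two example URLs or are hypothetical; looking up a commodity title or a shop's main business description would need a catalog that the text does not describe, so it is omitted. Note that percent-decoding the `q` value shown in the example URL yields "红酒", so that is the value the sketch returns.

```python
from urllib.parse import parse_qs, urlsplit


def extract_features(url: str) -> dict:
    """Derive behavioral features from the path and query of a URL."""
    parts = urlsplit(url)
    # Path components separated by "/"; the first one is treated as the column.
    segments = [segment for segment in parts.path.split("/") if segment]
    query = parse_qs(parts.query)  # also percent-decodes values as UTF-8

    features = {
        "access_type": "home" if not segments else "other",
        "column": segments[0] if segments else None,
        "commodity_id": None,
        "search_keyword": None,
    }
    if features["column"] == "product" and "productId" in query:
        features["access_type"] = "commodity_detail"
        features["commodity_id"] = query["productId"][0]
    elif features["column"] == "search" and "q" in query:
        features["access_type"] = "search"
        features["search_keyword"] = query["q"][0]
    return features
```

For instance, `extract_features("/product/detail2.htm?productId=928685056")` reports a commodity detail page in the `product` column with commodity ID `928685056`.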
Step S130: identify users by computing the user behavior motivation similarity.
In this step, users are identified by computing the user behavior motivation similarity between different logs.
In one embodiment, the user identification process includes a preliminary matching stage based on the key log fields and a secondary matching stage based on the user behavior motivation similarity.
Specifically, in the preliminary matching stage, two logs that satisfy any one of the following conditions are judged to belong to the same user:
The user identifiers of the two logs are non-empty and identical;
The cookie identifiers of the two logs are non-empty and identical;
The session identifiers of the two logs are non-empty and identical;
The URL and referer URL of the two logs are consistent with the topology of the website, i.e. the URL of one log can be reached from the URL of the other log.
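The four preliminary matching conditions can be sketched as a single predicate. The dictionary keys are hypothetical field names. The topology condition is approximated here by the simplest case, in which the referer URL of one log equals the URL of the other; a fuller check against the site's link graph is not described in the text and is left out.

```python
def same_nonempty(a, b) -> bool:
    """True only when both values are present and equal."""
    return bool(a) and a == b


def preliminary_match(log_a: dict, log_b: dict) -> bool:
    """Return True if any preliminary matching condition holds."""
    for field in ("user_id", "cookie_id", "session_id"):
        if same_nonempty(log_a.get(field), log_b.get(field)):
            return True
    # Assumed approximation of "reachable in the site topology":
    # one log was reached directly from the page recorded in the other.
    return (
        same_nonempty(log_a.get("referer_url"), log_b.get("url"))
        or same_nonempty(log_b.get("referer_url"), log_a.get("url"))
    )
```

Because empty identifiers never match each other, two logs that both lack a cookie identifier fall through to the later similarity-based stage instead of being merged by accident.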
After preliminary matching, logs whose user has not been identified are placed in a candidate pool so that users can be further identified based on the user behavior motivation similarity. For example, the user behavior motivation similarity is computed from the five extracted behavioral features, and logs whose similarity is less than a set threshold are judged to belong to the same user.
According to one embodiment of the invention, the user behavior motivation similarity is computed from the URL features extracted within a certain time period. Taking the five features above as an example, the computation proceeds as follows: if the two URLs contain any one of shop access behavior, commodity access behavior or search behavior, a text similarity is computed between the extracted main business descriptions, commodity titles or search keywords to obtain a similarity value, and two logs whose similarity value is less than the threshold are judged to belong to the same user; otherwise, the combination of access type and website column among the five features is compared, and the two logs are judged to belong to the same user if both the access type and the website column are identical, and to different users otherwise.
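The branching in this secondary matching stage can be sketched as below. The feature names are hypothetical. The text compares the score against the threshold with "less than", so the sketch assumes the score behaves like a difference (smaller means closer) and plugs in a token-level Jaccard distance purely as a stand-in for the text similarity computation; the default threshold of 0.5 is likewise an arbitrary illustration.

```python
def jaccard_distance(text_a: str, text_b: str) -> float:
    """Stand-in text score: 0.0 for identical token sets, 1.0 for disjoint."""
    tokens_a, tokens_b = set(text_a.split()), set(text_b.split())
    if not tokens_a and not tokens_b:
        return 0.0
    return 1.0 - len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def motive_text(features: dict):
    """Pick the first available text among the three textual features."""
    for key in ("main_business", "commodity_title", "search_keyword"):
        if features.get(key):
            return features[key]
    return None


def secondary_match(features_a: dict, features_b: dict,
                    threshold: float = 0.5, score_fn=jaccard_distance) -> bool:
    """Decide whether two unidentified logs belong to the same user."""
    text_a, text_b = motive_text(features_a), motive_text(features_b)
    if text_a is not None and text_b is not None:
        # Same user when the score is below the threshold, as in the text.
        return score_fn(text_a, text_b) < threshold
    # Otherwise compare the (access type, website column) combination.
    combo_a = (features_a.get("access_type"), features_a.get("column"))
    combo_b = (features_b.get("access_type"), features_b.get("column"))
    return None not in combo_a and combo_a == combo_b
```

Passing the scoring function as a parameter keeps the decision logic independent of how the text similarity is actually computed.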
In one embodiment, the text similarity computation above can use the word2vec algorithm. For example, word vectors are trained on the combination of segmented Chinese Wikipedia data and a previously built small commodity dictionary; the extracted main business description, commodity title or search keyword is segmented and stop words are removed to obtain the word embedding for the URL, and the cosine similarity is then computed. For example, for the log whose URL is "/product/detail2.htm?productId=928685056" and the log whose URL is "/search/s.html?q=%e7%ba%a2%e9%85%92" above, the word2vec-based computation takes the word embedding vector of the segmented commodity title of the first URL and the word embedding vector of the search keyword of the second URL and computes their cosine similarity. If the similarity of the two is less than the threshold, the logs reflect the same user motivation and are taken to belong to the same user.
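The embedding and cosine similarity steps can be illustrated with a small self-contained sketch. The three-dimensional vectors below are made-up toy values standing in for trained word2vec vectors, whitespace splitting stands in for Chinese word segmentation, and averaging the word vectors of a text is an assumption, since the text does not say how the word vectors of a title are combined.

```python
import math

# Toy word vectors (hypothetical values, not trained word2vec output).
TOY_VECTORS = {
    "kids": [0.9, 0.1, 0.0],
    "sunglasses": [0.8, 0.2, 0.1],
    "wholesale": [0.5, 0.5, 0.0],
    "wine": [0.0, 0.1, 0.9],
}
STOP_WORDS = {"the", "for", "of"}


def embed(text: str):
    """Average the vectors of known, non-stop-word tokens; None if none."""
    tokens = [t for t in text.lower().split()
              if t not in STOP_WORDS and t in TOY_VECTORS]
    if not tokens:
        return None
    dim = len(TOY_VECTORS[tokens[0]])
    return [sum(TOY_VECTORS[t][i] for t in tokens) / len(tokens)
            for i in range(dim)]


def cosine_similarity(u, v) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

With these toy vectors, a commodity title and a search keyword about the same kind of product score much closer to 1 than an unrelated pair. How that score is compared with the threshold follows the rule stated in the text.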
Step S140: generate user identifiers.
After user identification, new cookie identifiers are generated for logs whose cookie identifier is missing, and user identifiers are generated from the cookie identifiers.
In one embodiment, for two logs judged to belong to the same user, the rule for generating a new cookie identifier is as follows: if the cookie identifiers of both logs exist, nothing is done; if the cookie identifier of only one log is missing, it is assigned the cookie identifier of the other log; otherwise, a unique key is generated from three key fields of the log (IP, user-agent and access time) and assigned to both logs with missing cookie identifiers. Finally, the same user identifier is generated for the same user, and for different users a unique user identifier is generated for each from a key built from the three key fields IP, user-agent and access time. The specific method of generating user identifiers belongs to the prior art and is not repeated here.
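The cookie-filling rule for two logs of the same user can be sketched as follows. The text only says that a unique key is generated from IP, user-agent and access time; hashing those fields with SHA-256, truncating to 16 hex characters and taking the fields from the first log are hypothetical choices made for illustration.

```python
import hashlib


def make_key(ip: str, user_agent: str, access_time: str) -> str:
    """Build a key from the three fields (hash choice is an assumption)."""
    raw = "|".join((ip, user_agent, access_time))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]


def fill_cookie_ids(log_a: dict, log_b: dict) -> None:
    """Fill missing cookie identifiers in place for two same-user logs."""
    cookie_a, cookie_b = log_a.get("cookie_id"), log_b.get("cookie_id")
    if cookie_a and cookie_b:
        return  # both present: nothing to do
    # One present: reuse it. Both missing: generate a key from the first log.
    shared = cookie_a or cookie_b or make_key(
        log_a["ip"], log_a["user_agent"], log_a["access_time"])
    log_a["cookie_id"] = shared
    log_b["cookie_id"] = shared
```

After this step both logs carry the same cookie identifier, from which a common user identifier can then be derived.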
According to a second aspect of the invention, a user identification system is provided, which implements the user identification method of the invention on the Spark platform. Spark is a scalable data analysis platform that uses resilient distributed datasets (RDDs), provides a distributed in-memory parallel computing engine and supports fast iterative computation. Computation on the Spark platform is carried out by operating on RDDs; RDD operations include operators such as map, mapPartitions, flatMap, join, repartition, filter, union and groupBy.
Referring to the user identification system 200 shown in Fig. 2, the system includes a user behavior log acquisition module 210, a user behavior log preprocessing module 220, a user behavior feature extraction module 230, a bucket division module 240, a user identification module 250 and an identifier generation module 260.
The user behavior log acquisition module 210 is used to collect user behavior logs from the web log server. For example, the logs can be stored on the HDFS (distributed file system) of the computing cluster, and Spark reads the user behavior logs into memory to form an RDD.
The user behavior log preprocessing module 220 is used to preprocess the user behavior logs to obtain normalized key log fields. For example, the user behavior logs are filtered with Spark's filter operator to remove crawlers and abnormal data, and the key log fields are then extracted, including the IP, the uniform resource locator (URL), the uniform resource locator of the previously visited page (referer URL), the log access time, the user identifier, the user-agent, the cookie identifier and the session identifier.
The user behavior feature extraction module 230 is used to extract features that reflect user behavior from the key log fields. For example, five features are constructed from the URL in the key log fields to represent the user's behavioral motivation: access type, website column accessed, shop access behavior, commodity access behavior and search behavior. In addition, the user behavior feature extraction module is also used to: extract the shop ID and its corresponding main business description if the URL contains shop access behavior; extract the commodity ID, its corresponding commodity title and the main business description of the shop that sells the commodity if the URL contains commodity access behavior; and decode the search string in the URL to obtain the search keyword if the URL contains search behavior.
The bucket division module 240 is used to divide the user behavior logs into data buckets of different sizes according to the normalized key log fields.
In one embodiment, the bucket division module uses distributed computing to divide the user behavior logs into data buckets of different sizes, taking the string formed by concatenating the IP and user-agent key log fields as the key and the user behavior log as the value. For example, Spark's mapPartitionsToPair operator is used to convert the user behavior logs into a pair RDD whose key is the concatenated IP and user-agent string, and this structure serves as the data buckets of different sizes.
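The bucketing idea can be illustrated without a cluster. The sketch below is a single-process stand-in for the keyed pair RDD, not Spark code: it groups logs in a dictionary under the concatenated IP and user-agent string. The field names and the "|" separator are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def bucket_key(log: dict) -> str:
    """Concatenate IP and user-agent into the bucket key."""
    return f"{log['ip']}|{log['user_agent']}"


def divide_into_buckets(logs: Iterable[dict]) -> Dict[str, List[dict]]:
    """Group logs by key; bucket sizes follow the traffic per key."""
    buckets = defaultdict(list)
    for log in logs:
        buckets[bucket_key(log)].append(log)
    return dict(buckets)
```

Pairwise matching then only has to compare logs inside the same bucket, which keeps the number of comparisons far below that of comparing every log with every other log.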
The user identification module 250 is used to identify users in each data bucket obtained by the bucket division module 240 according to the user identification method of the invention.
For example, Spark's groupByKey and mapValues operators are used to perform user identification on each data bucket obtained by the bucket division module. The detailed user identification process and the text similarity computation are described above in connection with the user identification method; SparkML can be used to compute the text similarity.
The identifier generation module 260 is used to generate a user identifier for each log after identification; the generation process is described above.
In summary, the invention collects web user behavior logs, filters out crawlers and noisy records, formats the logs into normalized key log fields, and extracts multiple URL features from each log to represent the user's behavioral motivation. Using distributed computing, it divides the user behavior logs into data buckets of different sizes by IP and user-agent, performs preliminary user matching on the user identifier, cookie identifier, session identifier, and URL and referer URL, and then determines identical users by computing the user behavior motivation similarity. Finally, it generates the same user identifier for the same user and, for different users, a unique user identifier for each from a key built from IP, user-agent and access time. The method and system provided by the invention effectively address two problems: traditional algorithms mistakenly identify one user as multiple users when the cookie or the source URL is missing, and traditional algorithms cannot effectively distinguish multiple users who share one IP or even the same device. The accuracy of user identification is thereby improved. In addition, by using Spark-based distributed in-memory computing, the invention identifies users quickly and with high efficiency.
The above embodiments represent only preferred embodiments of the invention and do not limit its patent scope. Those skilled in the art can split or combine the modules in equivalent ways; all equivalent changes made under the idea of the invention fall within the scope of patent protection of the invention.
It should be noted that although the steps are described above in a particular order, this does not mean that they must be executed in that order. In fact, some of these steps can be executed concurrently or even in a different order, as long as the required functions are achieved.
The invention can be a system, a method and/or a computer program product. The computer program product may include a computer-readable storage medium containing computer-readable program instructions for causing a processor to implement aspects of the invention.
The computer-readable storage medium can be a tangible device that holds and stores instructions used by an instruction execution device. The computer-readable storage medium can be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or raised structures in a groove with instructions stored thereon, and any suitable combination of the above.
Various embodiments of the invention have been described above. The description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and changes will be obvious to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application or their technical improvement over technologies in the market, or to enable other persons of ordinary skill in the art to understand the embodiments disclosed herein.
Claims (12)
1. A web log user identification method, comprising the following steps:
Step 1: extracting key log fields from user behavior logs, where the key log fields include at least a uniform resource locator (URL);
Step 2: constructing, from the URL, multiple behavioral features that reflect the user's behavioral motivation;
Step 3: computing a user behavior motivation similarity from the multiple behavioral features and identifying users based on the user behavior motivation similarity.
2. The method according to claim 1, wherein the multiple behavioral features include at least one of: access type, website column accessed, shop access behavior, commodity access behavior, and search behavior.
3. The method according to claim 1, wherein step 3 further comprises:
Step 31: determining whether two user behavior logs belong to the same user based on the key log fields extracted from the user behavior logs;
Step 32: for user behavior logs whose user has not been identified, further identifying the user based on the user behavior motivation similarity.
4. The method according to claim 3, wherein the key log fields further include at least the uniform resource locator of the previously visited page (referer URL), a user identifier, the user-agent, a cookie identifier and a session identifier, and in step 31 two user behavior logs that satisfy any one of the following conditions are determined to belong to the same user:
The user identifiers of the two user behavior logs are non-empty and identical;
The cookie identifiers of the two user behavior logs are non-empty and identical;
The session identifiers of the two user behavior logs are non-empty and identical; or
The URL and the referer URL of the two user behavior logs are consistent with the topology of the website.
5. The method according to claim 3, wherein, for two user behavior logs, step 32 comprises:
Step 321: if the behavioral features extracted from the two user behavior logs include shop access behavior, extracting the main business description; if they include commodity access behavior, extracting the commodity title; if they include search behavior, extracting the search keyword;
Step 322: computing the user behavior motivation similarity for the main business descriptions, commodity titles or search keywords extracted from the two user behavior logs, and determining that the two user behavior logs belong to the same user if the similarity difference is less than a threshold.
6. The method according to claim 5, wherein step 32 further comprises:
Step 323: if the behavioral features extracted from the two user behavior logs include access type and website column accessed, comparing the combination of access type and website column of the two user behavior logs, and determining that the two user behavior logs belong to the same user if both are identical.
7. The method according to claim 6, wherein, in step 322, word embedding vectors are computed with word2vec for whichever of the main business description, commodity title or search keyword is present in the first user behavior log and in the second user behavior log, and the cosine similarity of the two vectors is used as the user behavior motivation similarity; if the user behavior motivation similarity is less than the threshold, the two logs are determined to belong to the same user.
8. A user identification system, comprising:
User behavior log acquisition module: used to extract key log fields from user behavior logs, where the key log fields include at least a uniform resource locator (URL);
User behavior feature extraction module: used to construct, from the URL, multiple behavioral features that reflect the user's behavioral motivation;
User identification module: used to compute a user behavior motivation similarity from the multiple behavioral features and to identify users based on the user behavior motivation similarity.
9. The system according to claim 8, characterized in that it further comprises:
User behavior log preprocessing module: used to filter or format the collected user behavior logs;
Bucket division module: used to divide the user behavior logs into data buckets of different sizes according to the key log fields;
Identifier generation module: used to generate user identifiers for the identified users.
10. The system according to claim 8 or 9, characterized in that the system is based on the Spark platform.
11. A computer-readable storage medium storing a computer program, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
12. A computer device, comprising a memory and a processor, the memory storing a computer program that can run on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 7 when executing the program.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811276191.9A CN109583472A (en) | 2018-10-30 | 2018-10-30 | A kind of web log user identification method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811276191.9A CN109583472A (en) | 2018-10-30 | 2018-10-30 | A kind of web log user identification method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109583472A (en) | 2019-04-05 |
Family
ID=65921317
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811276191.9A Pending CN109583472A (en) | 2018-10-30 | 2018-10-30 | A kind of web log user identification method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109583472A (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1804844A (en) * | 2006-01-10 | 2006-07-19 | 西安交通大学 | Web page metadata based formalized description method for user access behaviors |
CN103051637A (en) * | 2012-12-31 | 2013-04-17 | 北京亿赞普网络技术有限公司 | User identification method and device |
US20130346447A1 (en) * | 2012-06-21 | 2013-12-26 | Xerox Corporation | Systems and methods for behavioral pattern mining |
CN104217030A (en) * | 2014-09-28 | 2014-12-17 | 北京奇虎科技有限公司 | Method and device for classifying users according to search log data of server |
CN104731914A (en) * | 2015-03-24 | 2015-06-24 | 浪潮集团有限公司 | Method for detecting user abnormal behavior based on behavior similarity |
CN105574200A (en) * | 2015-12-29 | 2016-05-11 | 成都陌云科技有限公司 | User interest extraction method based on historical record |
CN105786965A (en) * | 2016-01-27 | 2016-07-20 | 久远谦长(北京)技术服务有限公司 | URL-based user behavior analysis method and device |
US9998598B1 (en) * | 2017-02-02 | 2018-06-12 | Conduent Business Services, Llc | Methods and systems for automatically recognizing actions in a call center environment using screen capture technology |
CN108200101A (en) * | 2018-03-13 | 2018-06-22 | 河南工学院 | A kind of computer system and its personal identification method and device of user |
Non-Patent Citations (4)
Title |
---|
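Zhou Sanduo: "Research on Search Log Analysis Technology Based on a Big Data Platform" (基于大数据平台的搜索日志分析技术研究), China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库信息科技辑》) * |
Zhou Songsong et al.: "Session Identification Method Based on URL Similarity" (基于URL相似度的会话识别方法), Computer Systems & Applications (《计算机系统应用》) * |
Tang Wei et al.: "Web Log User Identification Algorithm Based on Behavior Analysis" (基于行为分析的web日志用户识别算法), Software Industry and Engineering (《软件产业与工程》) * |
Xiao Hui et al.: "User Identification Algorithm in Web Log Mining" (Web日志挖掘中的用户识别算法), Computer Systems & Applications (《计算机系统应用》) * |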
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110096499A (en) * | 2019-04-10 | 2019-08-06 | 华南理工大学 | User object identification method and system based on behavior time series big data |
CN110096499B (en) * | 2019-04-10 | 2021-08-10 | 华南理工大学 | User object identification method and system based on behavior time series big data |
CN110059141A (en) * | 2019-04-22 | 2019-07-26 | 珠海网博信息科技股份有限公司 | A method of relationship analysis is carried out to different acquisition feature by log track |
WO2021012483A1 (en) * | 2019-07-23 | 2021-01-28 | 平安科技(深圳)有限公司 | Information identification method and apparatus, and computer device and storage medium |
CN110602038A (en) * | 2019-08-01 | 2019-12-20 | 中国科学院信息工程研究所 | Abnormal UA detection and analysis method and system based on rules |
CN113556368A (en) * | 2020-04-23 | 2021-10-26 | 北京达佳互联信息技术有限公司 | User identification method, device, server and storage medium |
CN114021014A (en) * | 2021-11-04 | 2022-02-08 | 山东库睿科技有限公司 | Single-equipment multi-user recommendation method, device, equipment and storage medium |
CN117708863A (en) * | 2024-02-05 | 2024-03-15 | 四川集鲜数智供应链科技有限公司 | Equipment data encryption processing method based on Internet of things |
CN117708863B (en) * | 2024-02-05 | 2024-04-19 | 四川集鲜数智供应链科技有限公司 | Equipment data encryption processing method based on Internet of things |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109583472A (en) | A kind of web log user identification method and system | |
US11463476B2 (en) | Character string classification method and system, and character string classification device | |
US8190621B2 (en) | Method, system, and computer readable recording medium for filtering obscene contents | |
CN108280114B (en) | Deep learning-based user literature reading interest analysis method | |
CN105183781B (en) | Information recommendation method and device | |
CN104899508B (en) | A kind of multistage detection method for phishing site and system | |
KR101029160B1 (en) | Method, system and computer-readable recording medium for writing new image and its information onto image database | |
CN103544255A (en) | Text semantic relativity based network public opinion information analysis method | |
US8825620B1 (en) | Behavioral word segmentation for use in processing search queries | |
WO2008106668A1 (en) | User query mining for advertising matching | |
WO2017084205A1 (en) | Network user identity authentication method and system | |
CN105574200A (en) | User interest extraction method based on historical record | |
Katragadda et al. | Framework for real-time event detection using multiple social media sources | |
CN103530429A (en) | Webpage content extracting method | |
CN110825941A (en) | Content management system identification method, device and storage medium | |
CN108292408A (en) | The method for detecting WEB follow-up services | |
CN107527289B (en) | Investment portfolio industry configuration method, device, server and storage medium | |
Chen et al. | Finding keywords in blogs: Efficient keyword extraction in blog mining via user behaviors | |
CN106844588A (en) | A kind of analysis method and system of the user behavior data based on web crawlers | |
CN110795613A (en) | Commodity searching method, device and system and electronic equipment | |
CN105404697A (en) | Social interaction behavior collection and detection method | |
CN106997350A (en) | A kind of method and device of data processing | |
CN112035723A (en) | Resource library determination method and device, storage medium and electronic device | |
CN110019659B (en) | Method and device for searching referee document | |
CN105512334A (en) | Data mining method based on search words |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||