CN109783460A - Method and system for user behavior profiling and prediction based on network logs - Google Patents

Method and system for user behavior profiling and prediction based on network logs

Info

Publication number
CN109783460A
Authority
CN
China
Prior art keywords
user
vector
feature vector
character feature
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910089017.1A
Other languages
Chinese (zh)
Inventor
康海燕
王紫豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Information Science and Technology University
Original Assignee
Beijing Information Science and Technology University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Information Science and Technology University
Priority to CN201910089017.1A
Publication of CN109783460A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and system for user behavior profiling and prediction based on network logs. The method comprises: obtaining a user's network log; extracting the user's behavior feature vector from the network log; obtaining standard personality feature vectors; calculating the similarity between the user's behavior feature vector and each standard personality feature vector; determining the personality trait represented by the standard personality feature vector with the highest similarity as the user's personality trait; determining the number of science-category keywords and the number of liberal-arts-category keywords in the user's behavior feature vector; and predicting the user's behavior from the ratio of these two numbers. The method and system provided by the invention can predict a user's personality and behavior and thereby provide data support for harm prevention.

Description

Method and system for user behavior profiling and prediction based on network logs
Technical field
The present invention relates to a method and system for user behavior profiling and prediction based on network logs.
Background
With the rapid growth of network information resources, web search engines have become the main way people obtain information. Web search logs record users' behavior and needs; a person's personality can be inferred from network logs, and it is even possible to predict what the user will do next. This is especially important in the security field, where a user's likely next action can be used to judge which users should be classified as a high-risk group. For example, hackers often use social engineering to exploit human weaknesses. If user information is leaked, criminals can search the network for the user's identity information, mobile phone number, and similar data in order to steal account funds. A hacker first performs reconnaissance, collecting information such as name, telephone number, and ID card number, and then impersonates the user to deceive the server and steal the user's account. Therefore, if a security department can identify dangerous groups by analyzing network logs, and can even learn what such a group or a specific person will do next, it can issue early warnings and prevent harm.
Summary of the invention
The object of the present invention is to provide a method and system for user behavior profiling and prediction based on network logs that can profile and predict a user's personality and, in turn, predict the user's risk from that personality, providing data support for harm prevention.
To achieve the above object, the present invention provides the following solutions:
A method for user behavior profiling and prediction based on network logs, comprising:
Obtaining the network log of a user;
Extracting the user's behavior feature vector from the network log, the behavior feature vector being the vector formed by the proportion of keywords in each field to the total number of keywords in the user's network log, the fields being divided into natural science fields and social science fields, the natural science fields including military affairs, science and technology, sports, tourism, and food, and the social science fields including current affairs, literature and art, society, entertainment, and beauty;
Obtaining standard personality feature vectors, each standard personality feature vector being the vector formed by the proportion of keywords in each field to the total number of keywords for a standard personality, the fields being divided into natural science fields and social science fields, the natural science fields including military affairs, science and technology, sports, tourism, and food, and the social science fields including current affairs, literature and art, society, entertainment, and beauty;
Calculating the similarity between the user's behavior feature vector and each standard personality feature vector;
Determining the personality trait represented by the standard personality feature vector with the highest similarity as the user's personality trait.
Optionally, determining the number of science-category keywords and the number of liberal-arts-category keywords in the user's behavior feature vector;
Predicting the user's behavior from the ratio of the number of science-category keywords to the number of liberal-arts-category keywords in the user's behavior feature vector.
Optionally, calculating the similarity between the user's behavior feature vector and each standard personality feature vector specifically comprises:
Calculating the cosine similarity between the user's behavior feature vector and each standard personality feature vector;
Determining the standard personality feature vector with the largest cosine value (that is, the smallest angle) to be the standard personality feature vector most similar to the user's behavior feature vector.
Optionally, determining the personality trait represented by the standard personality feature vector with the highest similarity as the user's personality trait specifically comprises:
Dividing the standard personality feature vectors into three types: positive personality, intermediate personality, and negative personality;
Determining the type of the standard personality feature vector most similar to the user's behavior feature vector as the user's personality type.
Optionally, predicting the user's behavior from the ratio of the number of science-category keywords to the number of liberal-arts-category keywords in the user's behavior feature vector specifically comprises:
When the ratio of the number of science-category keywords to the number of liberal-arts-category keywords in the user's behavior feature vector is 3:1, predicting that the user may cause harm to others.
The present invention also provides a system for user behavior profiling and prediction based on network logs, comprising:
A network log obtaining module, for obtaining the network log of a user;
A user behavior feature vector extraction module, for extracting the user's behavior feature vector from the network log, the behavior feature vector being the vector formed by the proportion of keywords in each field to the total number of keywords in the user's network log, the fields being divided into natural science fields and social science fields, the natural science fields including military affairs, science and technology, sports, tourism, and food, and the social science fields including current affairs, literature and art, society, entertainment, and beauty;
A standard personality feature vector obtaining module, for obtaining standard personality feature vectors, each standard personality feature vector being the vector formed by the proportion of keywords in each field to the total number of keywords for a standard personality, the fields being divided into natural science fields and social science fields, the natural science fields including military affairs, science and technology, sports, tourism, and food, and the social science fields including current affairs, literature and art, society, entertainment, and beauty;
A similarity calculation module, for calculating the similarity between the user's behavior feature vector and each standard personality feature vector;
A user personality profiling module, for determining the personality trait represented by the standard personality feature vector with the highest similarity as the user's personality trait.
Optionally, a keyword quantity determining module, for determining the number of science-category keywords and the number of liberal-arts-category keywords in the user's behavior feature vector;
A user behavior prediction module, for predicting the user's behavior from the ratio of the number of science-category keywords to the number of liberal-arts-category keywords in the user's behavior feature vector.
Optionally, the similarity calculation module specifically comprises:
A similarity calculation unit, for calculating the cosine similarity between the user's behavior feature vector and each standard personality feature vector;
A personality determination unit, for determining the standard personality feature vector with the largest cosine value (that is, the smallest angle) to be the standard personality feature vector most similar to the user's behavior feature vector.
Optionally, the user personality profiling module specifically comprises:
A personality type division unit, for dividing the standard personality feature vectors into three types: positive personality, intermediate personality, and negative personality;
A user personality profiling unit, for determining the type of the standard personality feature vector most similar to the user's behavior feature vector as the user's personality type.
Optionally, the user behavior prediction module specifically comprises:
A user behavior prediction unit, for predicting that the user may cause harm to others when the ratio of the number of science-category keywords to the number of liberal-arts-category keywords in the user's behavior feature vector is 3:1.
According to the specific embodiments provided by the present invention, the invention discloses the following technical effects: the method and system for user behavior profiling and prediction based on network logs extract the user's behavior feature vector from the user's network log, calculate the similarity between the user's behavior feature vector and each standard personality feature vector, determine the personality trait represented by the most similar standard personality feature vector as the user's personality trait, and determine the user's risk from the obtained personality trait. In addition, the numbers of science-category and liberal-arts-category keywords in the user's behavior feature vector are determined, and the user's behavioral risk is predicted from their ratio, so that early warnings can be issued and harm prevented.
Brief description of the drawings
To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of the method for user behavior profiling and prediction based on network logs according to an embodiment of the present invention;
Fig. 2 is another flowchart of the method for user behavior profiling and prediction based on network logs according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the upper approximation, lower approximation, and boundary region of a set X according to an embodiment of the present invention;
Fig. 4 is a structural diagram of the system for user behavior profiling and prediction based on network logs according to an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The object of the present invention is to provide a method and system for user behavior profiling and prediction based on network logs that can predict a user's personality and behavior and thereby provide data support for harm prevention.
To make the above objects, features, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and specific embodiments.
As shown in Fig. 1, the method for user behavior profiling and prediction based on network logs provided by the present invention includes the following steps:
Step 101: obtain the network log of the user;
Step 102: extract the user's behavior feature vector from the network log, the behavior feature vector being the vector formed by the proportion of keywords in each field to the total number of keywords in the user's network log, the fields being divided into natural science fields and social science fields, the natural science fields including military affairs, science and technology, sports, tourism, and food, and the social science fields including current affairs, literature and art, society, entertainment, and beauty;
Step 103: obtain standard personality feature vectors, each standard personality feature vector being the vector formed by the proportion of keywords in each field to the total number of keywords for a standard personality, the fields being divided into natural science fields and social science fields, the natural science fields including military affairs, science and technology, sports, tourism, and food, and the social science fields including current affairs, literature and art, society, entertainment, and beauty;
Step 104: calculate the similarity between the user's behavior feature vector and each standard personality feature vector;
Step 105: determine the personality trait represented by the standard personality feature vector with the highest similarity as the user's personality trait;
As an embodiment of the present invention, on the basis of the above embodiment, the method further includes:
Step 106: determine the number of science-category keywords and the number of liberal-arts-category keywords in the user's behavior feature vector;
Step 107: predict the user's behavior from the ratio of the number of science-category keywords to the number of liberal-arts-category keywords in the user's behavior feature vector.
Step 104 specifically includes:
Calculating the cosine similarity between the user's behavior feature vector and each standard personality feature vector;
Determining the standard personality feature vector with the largest cosine value (that is, the smallest angle) to be the standard personality feature vector most similar to the user's behavior feature vector.
Step 105 specifically includes:
Dividing the standard personality feature vectors into three types: positive personality, intermediate personality, and negative personality;
Determining the type of the standard personality feature vector most similar to the user's behavior feature vector as the user's personality type.
Step 107 specifically includes:
When the ratio of the number of science-category keywords to the number of liberal-arts-category keywords in the user's behavior feature vector is 3:1, predicting that the user may cause harm to others.
As another embodiment of the present invention, as shown in Fig. 2, the method for user behavior profiling and prediction based on network logs provided by the present invention includes the following:
1.1 Log acquisition
Source logs mostly come from search engine servers or web crawlers; a crawler system connects to each website to obtain network logs. Commonly used crawler systems include Baidu Tongji and CNZZ. The present invention uses the web query logs released by Sogou Labs, which cover web query requests and user clicks for June 2008. The data format is: "user | query word | rank of the URL in the returned results | sequence number of the user's click | URL clicked by the user". A log sample is shown in Table 1. Note: users with more than 8 individual query records are selected as experimental subjects.
Table 1 Log sample
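The record format described above can be read with a few lines of Python. This is a minimal sketch under stated assumptions: the `|` separator is taken literally from the format string (the actual log files may use a different delimiter), and the names `LogRecord`, `parse_line`, and `select_subjects`, together with the sample line in the usage note, are hypothetical and do not come from the source.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LogRecord:
    user: str          # user identifier
    query: str         # query word
    rank: int          # rank of the URL in the returned results
    click_order: int   # sequence number of the user's click
    url: str           # URL clicked by the user


def parse_line(line: str, sep: str = "|") -> LogRecord:
    """Split one log line into its five fields (the separator is an assumption)."""
    parts = [p.strip() for p in line.split(sep, 4)]
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}")
    user, query, rank, click_order, url = parts
    return LogRecord(user, query, int(rank), int(click_order), url)


def select_subjects(records, min_records: int = 8):
    """Group records by user and keep users with more than `min_records` records."""
    by_user = defaultdict(list)
    for record in records:
        by_user[record.user].append(record)
    return {u: rs for u, rs in by_user.items() if len(rs) > min_records}
```

For example, `parse_line("125254918559 | example query | 3 | 1 | http://www.example.com/page")` yields a record whose `rank` is 3; the query text and URL in that line are invented for illustration.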
1.2 Log preprocessing
Definition 1 (synonym set): a synonym set contains a word together with the set of words whose meaning is the same as or similar to that word.
Definition 2 (hypernym): a hypernym is a word with a broader concept and scope. For example, "vehicle" is a hypernym of "automobile", and "means of transport" is a hypernym of "vehicle". The synonym set is contained in the hypernym.
1.2.1 Construction of the personality model
The construction of the personality model has two parts: (1) the division of personality; (2) the classification and selection of hypernyms. The present invention refers to the five-factor (Big Five) personality theory, in which personality comprises openness, conscientiousness, extraversion, agreeableness, and neuroticism. People are influenced by the social environment, social culture, and various ideological trends, so personality traits are diverse; following the five-factor theory, this diversity can be broadly divided into positive, intermediate, and negative aspects. Personality is evaluated objectively by combining a person's inherent strengths and weaknesses. People are exposed to pluralistic values and accept mainstream ideas, and they show traits such as active thinking, willingness to express themselves, enthusiasm, a strong sense of social responsibility, team spirit, creativity, and gratitude. The specific personality manifestations and related classifications are shown in Table 2. Hypernyms are classified into natural-science vocabulary and social-science vocabulary. Natural-science vocabulary covers five categories: military affairs, science and technology, sports, tourism, and food. Social-science vocabulary covers five categories: current affairs, literature and art, society, entertainment, and beauty. The division is based on: 1) the characteristics of natural science and social science themselves, for clearly marked vocabulary such as science and technology, current affairs, and literature and art; 2) personality traits: people with a strong sense of social responsibility tend to discuss and analyze national current events and therefore have some probability of paying attention to military content; people with broad interests are usually cheerful and have some probability of following sports and tourism content, and sports and tourism require group cooperation and careful planning, which matches the characteristics of the natural-science category. Social-science vocabulary tends toward divergent thinking that takes many aspects into account, such as society and entertainment. People with social-science thinking dislike constraint and are casual, so they have some probability of paying attention to the beauty category. Their interests range widely, which matches the personality traits of the social-science category. The user standard personality feature vector library, including hypernyms, is shown in Table 3. The numbers in Table 3 are the statistical averages of the percentages of keywords sharing the same hypernym that were searched by groups with different personalities.
1.2.2 Preprocessing algorithm (Algorithm 1)
The preprocessing algorithm is as follows (Algorithm 1):
Input: the user's original log
Output: the user's feature vector
Step 1: crawl the URLs in the original log, obtain the web page summaries, and add them to the user log
Step 2: find the keywords in the log and count the number of times each keyword is searched
Step 3: count the number of occurrences of each keyword's hypernym
Step 4: compute the proportion of each hypernym category among all hypernyms and form the feature vector from these proportions expressed as percentages
Illustration of Algorithm 1: for users 125254918559 and 828687165269 in the Table 2 sample, the output vectors formed by the proportions of the searched hypernyms are, respectively:
125254918559: (0.0, 20.0, 0.0, 0.0, 0.0, 40.0, 20.0, 0.0, 0.0, 20.0)
828687165269: (25.0, 56.25, 0.0, 0.0, 0.0, 0.0, 12.5, 6.25, 0.0, 0.0)
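Steps 3 and 4 of Algorithm 1 can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: the order of the ten categories is assumed to follow the order in which the description lists them, the identifier names (`CATEGORIES`, `feature_vector`) are hypothetical, and the input is assumed to be the list of hypernym categories already assigned to each extracted keyword (steps 1 and 2, crawling and keyword extraction, are not shown).

```python
from collections import Counter

# Assumed order: five natural-science categories, then five social-science ones.
CATEGORIES = [
    "military", "sci_tech", "sports", "tourism", "food",
    "current_affairs", "arts", "society", "entertainment", "beauty",
]


def feature_vector(keyword_categories):
    """Return the percentage of keywords falling under each hypernym category."""
    counts = Counter(keyword_categories)
    total = sum(counts[c] for c in CATEGORIES)
    if total == 0:
        # No recognised keywords: return an all-zero vector.
        return [0.0] * len(CATEGORIES)
    return [100.0 * counts[c] / total for c in CATEGORIES]
```

Under the assumed category order, five keywords with one in `sci_tech`, two in `current_affairs`, one in `arts`, and one in `beauty` give the same proportions as the first vector listed above. That particular keyword breakdown is one combination consistent with the vector, not data from the source.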
Table 2 Personality model and personality keyword classification
Core algorithm
Step 1: similarity matching
The personality analysis algorithm based on cosine-similarity feature clustering builds on the standard personality feature vector library constructed above. It computes the proportion of each hypernym category in a new log and compares the resulting vector with each row of the vector library by cosine similarity: the smaller the angle, the closer the user is to that personality feature vector. Finding the component with the largest cosine value, that is, the smallest angle, yields the user's personality trait.
For a new user of unknown personality, the keywords in the log are extracted, the frequency of each hypernym is counted, and the statistical average is taken. A feature vector is built from the proportions of the keywords' hypernyms and compared for similarity with the components of the standard personality feature vector library. The smaller the difference, the closer the user is to that row vector; the closest personality in the standard personality feature vector library is taken as the personality of the user.
The personality analysis algorithm based on cosine-similarity feature clustering is as follows (Algorithm 2):
Input: the user's feature vector (output of Algorithm 1)
Output: the user's personality trait
Step 1: build the personality model vector set
Step 2: build the user test vector
Step 3: compare similarity and find the component of the feature vector library with the largest cosine value
Step 4: output the personality trait corresponding to that component
Illustration of Algorithm 2: in the output for the Table 2 sample, the personality trait of user 125254918559 is a positive personality: strong family values and diligent self-improvement (good family values / cares for parents / strives for self-improvement / understands parents' hardships and respects the old and cherishes the young / hardworking and resilient, highly adaptable); the personality trait of user 828687165269 is a negative personality: lacks communication skills (unclear communication / poor ability to express oneself / prone to awkward situations / strained relations with others / small circle of friends).
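Steps 3 and 4 of Algorithm 2 amount to picking the library row with the largest cosine value. The sketch below shows that selection; it is an illustration only. The function names are hypothetical, and the library passed in is assumed to be a mapping from a trait label to a ten-element vector, because the actual contents of the standard personality feature vector library (Table 3) are not reproduced in this text.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def match_personality(user_vec, library):
    """Return the label of the library row with the largest cosine value."""
    return max(library, key=lambda label: cosine(user_vec, library[label]))
```

Because cosine similarity ignores vector length, a user vector expressed in percentages and a library row expressed as fractions of 1 compare the same way, so the two need not share a scale.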
A user's personality tendency is reflected by the similarity between the user's personality feature vector and the row vectors of the standard personality feature vector library. Matrix M is the standard personality feature vector library, C1 to Cn denote the hypernyms, and W1 to Wn denote the average hypernym proportion weights. Each row of M is one component of the standard personality feature vector library. Such a row vector and the current user's personality feature vector are the two vectors used to calculate the user's personality tendency.
In the present invention, one vector represents a component of the standard personality feature vector library and the other represents the current user's personality feature vector; the cosine of the angle between these two vectors gives their degree of similarity.
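For reference, the cosine of the angle between the two vectors can be written out as follows. The symbols are assumptions, since the original notation did not survive in this text: $\vec{a} = (W_1, \dots, W_n)$ stands for one row of the library M and $\vec{b} = (b_1, \dots, b_n)$ for the current user's vector.

```latex
\cos\theta
  = \frac{\vec{a}\cdot\vec{b}}{\lVert\vec{a}\rVert\,\lVert\vec{b}\rVert}
  = \frac{\sum_{i=1}^{n} W_i\, b_i}
         {\sqrt{\sum_{i=1}^{n} W_i^{2}}\;\sqrt{\sum_{i=1}^{n} b_i^{2}}}
```

Since all entries are non-negative proportions, $\cos\theta$ lies in $[0, 1]$, and a larger value means a smaller angle and therefore a closer match.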
Step 2: hypernym classification of keywords, personality classification, and construction of the behavior library and behavior vector
The behavior prediction algorithm based on rough-set fuzzy analysis uses knowledge reduction to further divide the personalities in the personality feature vector library into three grades: positive, intermediate, and negative.
The hypernyms of the keywords are further divided into natural-science and social-science vocabulary, and the search proportion of each of the two vocabulary classes is divided into three grades. The different combinations of the three factors (personality grade, natural-science vocabulary grade, social-science vocabulary grade) are analyzed; there are 3^3 = 27 combinations. By referring these 27 combinations to Holland vocational aptitude test results, a behavior library with 27 behavior directions is obtained. The three factors in the user log are processed in the same way and compared with the behavior library, and the matching behavior is output.
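The count of 27 follows from three factors with three grades each. A short sketch of the enumeration is given below; the function name and the 1-to-3 grade encoding are assumptions for illustration, and the order in which combinations are listed is arbitrary rather than the order used in the patent's behavior library.

```python
from itertools import product

GRADES = (1, 2, 3)  # assumed encoding of the three grades


def all_combinations():
    """Enumerate (personality, natural-science, social-science) grade triples."""
    return list(product(GRADES, repeat=3))
```

Calling `all_combinations()` returns every triple from `(1, 1, 1)` to `(3, 3, 3)`, one per behavior direction in the library.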
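Definition 3 (approximation): let K = (U, S) be a given knowledge base, where U denotes the universe and S is a family of equivalence relations on U. Then, for any subset $X \subseteq U$ and an equivalence relation $R \in IND(K)$ on U, as shown in Fig. 3, the upper and lower approximations of X with respect to knowledge R are: $\overline{R}X = \bigcup \{ Y \in U/R : Y \cap X \neq \varnothing \}$ and $\underline{R}X = \bigcup \{ Y \in U/R : Y \subseteq X \}$.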
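A small worked example may help with Definition 3. The sketch assumes the standard rough-set definitions: the lower approximation is the union of the equivalence classes wholly contained in X, the upper approximation is the union of the classes that intersect X, and the boundary region is their difference. The function names and the toy universe are hypothetical.

```python
def lower_approximation(partition, X):
    """Union of equivalence classes that are entirely inside X."""
    X = set(X)
    result = set()
    for block in partition:
        if set(block) <= X:
            result |= set(block)
    return result


def upper_approximation(partition, X):
    """Union of equivalence classes that share at least one element with X."""
    X = set(X)
    result = set()
    for block in partition:
        if set(block) & X:
            result |= set(block)
    return result


def boundary_region(partition, X):
    """Elements that cannot be classified as inside or outside X using R alone."""
    return upper_approximation(partition, X) - lower_approximation(partition, X)
```

With the universe {1, ..., 6} partitioned by R into {1, 2}, {3, 4}, {5, 6} and X = {1, 2, 3}, the lower approximation is {1, 2}, the upper approximation is {1, 2, 3, 4}, and the boundary region is {3, 4}.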
Definition 4 (knowledge reduction): the knowledge in a knowledge base is not all equally important, and some of it is redundant. Knowledge reduction discards irrelevant or redundant attributes, reducing the amount of information without affecting the original analysis. Without affecting the original knowledge classification, the n-dimensional information space {x1, x2, ..., xn} is reduced to the m-dimensional information space {x1, x2, ..., xm} (m < n). New decision rules are generated through knowledge reduction.
A knowledge representation system is the main knowledge representation method in rough set theory, written S = (U, A, V, f), where U is a non-empty finite set of objects (the universe) and A is a non-empty finite set of attributes (the attribute set). $V = \bigcup_{\alpha \in A} V_\alpha$, where $V_\alpha$ is the value domain of attribute $\alpha$. $f: U \times A \to V$ is the information function, which assigns an information value to each attribute of each object, i.e., $f(x, \alpha) \in V_\alpha$ for all $\alpha \in A$ and $x \in U$. Knowledge can therefore be represented in tabular form, as shown in Table 3, where U = {x1, x2, ..., xn} is the universe and A = {P1, P2, ..., Pn} is the set of all attributes.
Table 3 Knowledge representation system
Step 3: match for equality and output the user behavior
First, the personality and the search percentage of each hypernym are taken as the knowledge base. Then the personality is reduced to positive, intermediate, or negative, and the hypernyms are reduced to the percentage of natural-science vocabulary and the percentage of social-science vocabulary. Finally, natural-science vocabulary and social-science vocabulary are each reduced to 3 grades according to the interval in which the percentage falls.
The behavior prediction algorithm based on rough-set fuzzy analysis is as follows (Algorithm 3):
Input: the user's personality trait (output of Algorithm 2) and the user's feature vector (output of the preprocessing algorithm)
Output: the user's behavior prediction
Step 1: build an index table, a two-dimensional array of length 4, as the behavior analysis library
Step 2: build a one-dimensional array of length 3 whose entries represent the personality grade, the natural-science vocabulary grade, and the social-science vocabulary grade
Step 3: for the keywords in the user test vector, compute the statistical averages for the natural-science and social-science classes separately, grade them, and store the grades in the corresponding positions of the one-dimensional array
Step 4: grade the user's personality and store the grade in the corresponding position of the one-dimensional array
Step 5: match the one-dimensional array against the rows of the behavior analysis library and output the behavior
Illustration of Algorithm 3: for the Table 1 sample, the prediction obtained by Algorithm 3 for the behavior of user 125254918559 is: the ratio of science-category to liberal-arts-category interest is about 1:3 (behavior explanation: strong sense of social responsibility / good at analyzing the political situation / follows national policy / likes political news, newspapers, and celebrity autobiographies); the prediction obtained by Algorithm 3 for the behavior of user 828687165269 is: the ratio of science-category to liberal-arts-category interest is about 3:1 (behavior explanation: withdrawn / has a relatively high understanding of science subjects such as computing, physics, and biology / may take technically extreme actions that cause personal or property harm to others).
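The interest ratios quoted above can be checked against the two feature vectors given in the illustration of Algorithm 1. The sketch assumes that the first five vector entries are the natural-science (science-category) fields and the last five the social-science (liberal-arts-category) fields, in the order the description lists them; the function name is hypothetical.

```python
def science_and_arts_share(vec):
    """Sum the first five entries (assumed science side) and the last five (arts side)."""
    science = sum(vec[:5])
    arts = sum(vec[5:])
    return science, arts


user_a = [0.0, 20.0, 0.0, 0.0, 0.0, 40.0, 20.0, 0.0, 0.0, 20.0]   # 125254918559
user_b = [25.0, 56.25, 0.0, 0.0, 0.0, 0.0, 12.5, 6.25, 0.0, 0.0]  # 828687165269
```

Under that assumption the shares are 20 to 80 for the first user and 81.25 to 18.75 for the second, which the text rounds to "about 1:3" and "about 3:1" respectively.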
Personality is first divided into three grades: positive, intermediate, and negative. Then the proportions of the 5 natural-science vocabulary categories searched in the user's log are summed to obtain the proportion of natural-science vocabulary. The percentage is divided into three grades according to the intervals [0,33), [33,66), and [66,100]. Social-science vocabulary is handled in the same way. Therefore, A = {personality grade, natural-science vocabulary grade, social-science vocabulary grade} and U = {x1, x2, ..., x27}. For the 27 combinations of the different elements of the attribute set, each case is analyzed according to Holland vocational aptitude test results to establish the behavior analysis library; the behavior set is U, as shown in Table 4. Negative personality is denoted 1, intermediate personality 2, and positive personality 3. For natural-science and social-science word frequency, a percentage in [0,33) is denoted 1, a percentage in [33,66) is denoted 2, and a percentage in [66,100] is denoted 3. x1 to x27 denote the behavior numbers.
Finally the attribute set of user and behavioural analysis library are compared, found so that in user property collection and behavioural analysis library The equal record of property set, export the behavioural characteristic met.
Table 4 User behavior analysis table
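The grading and lookup steps described above can be sketched as follows. This is only an illustrative sketch: the function names, the field labels, and in particular the order in which the 27 attribute combinations are numbered x1 to x27 are assumptions, since the contents of Table 4 are not reproduced here.

```python
from itertools import product

# Hypothetical labels for the five natural-science categories.
NATURAL_FIELDS = ("military", "science_technology", "sports", "travel", "food")


def grade(percent: float) -> int:
    """Map a percentage to grade 1, 2 or 3 using [0,33), [33,66), [66,100]."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must lie in [0, 100]")
    if percent < 33:
        return 1
    if percent < 66:
        return 2
    return 3


def natural_science_percent(field_percent: dict) -> float:
    """Sum the percentages of the five natural-science categories."""
    return sum(field_percent.get(field, 0.0) for field in NATURAL_FIELDS)


# Assumed numbering: combinations are enumerated in lexicographic order of
# (personality grade, natural-science grade, social-science grade).
# The real assignment is defined by Table 4, which is not available here.
BEHAVIOR_LIBRARY = {
    combo: f"x{i}"
    for i, combo in enumerate(product((1, 2, 3), repeat=3), start=1)
}


def lookup_behavior(personality_grade: int,
                    natural_percent: float,
                    social_percent: float) -> str:
    """Return the behavior number whose attribute set equals the user's."""
    key = (personality_grade, grade(natural_percent), grade(social_percent))
    return BEHAVIOR_LIBRARY[key]
```

In a real behavior analysis library each behavior number would map to the descriptive text of the corresponding row of Table 4 rather than to a bare label.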
The present invention also provides a system for user behavior profiling and prediction based on network logs. As shown in Figure 4, the system includes:
A network log acquisition module 401, configured to obtain the network log of a user;
A user behavior feature vector extraction module 402, configured to extract the behavior feature vector of the user from the network log, where the behavior feature vector is the vector formed by the proportion of each field's keywords to the total number of keywords in the user's network log; the fields are divided into natural science fields and social science fields; the natural science fields include military, science and technology, sports, travel, and food; and the social science fields include history and politics, literature and art, society, entertainment, and beauty;
A standard personality feature vector acquisition module 403, configured to obtain standard personality feature vectors, where a standard personality feature vector is the vector formed by the proportion of each field's keywords to the total number of keywords for a standard personality; the fields are divided into natural science fields and social science fields; the natural science fields include military, science and technology, sports, travel, and food; and the social science fields include history and politics, literature and art, society, entertainment, and beauty;
A similarity calculation module 404, configured to calculate the similarity between the behavior feature vector of the user and each standard personality feature vector;
A user personality profiling module 405, configured to determine the personality trait represented by the standard personality feature vector with the highest similarity as the personality trait of the user;
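As an illustration of what module 402 computes, the following minimal sketch builds a behavior feature vector from keywords that have already been assigned to fields. The field labels, the fixed field order, and the assumption that an upstream step has labeled every keyword with its field are hypothetical; the text above does not specify how keywords are extracted or classified.

```python
from collections import Counter

# Hypothetical field labels, natural science first, then social science.
NATURAL_FIELDS = ["military", "science_technology", "sports", "travel", "food"]
SOCIAL_FIELDS = ["history_politics", "literature_art", "society",
                 "entertainment", "beauty"]
FIELDS = NATURAL_FIELDS + SOCIAL_FIELDS


def behavior_feature_vector(keyword_fields):
    """Return the proportion of keywords in each field.

    keyword_fields: iterable with one field label per keyword found in the
    user's network log. Labels outside FIELDS are ignored.
    """
    counts = Counter(f for f in keyword_fields if f in FIELDS)
    total = sum(counts.values())
    if total == 0:
        # No usable keywords: return an all-zero vector.
        return [0.0] * len(FIELDS)
    return [counts[f] / total for f in FIELDS]
```

A standard personality feature vector has the same shape, so the same function could in principle be applied to a keyword list that represents a standard personality.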
As an embodiment of the present invention, on the basis of the above embodiment, the system further includes:
A keyword quantity determination module 406, configured to determine the number of science-category keywords and the number of liberal-arts-category keywords in the behavior feature vector of the user;
A user behavior prediction module 407, configured to predict the behavior of the user according to the ratio of the number of science-category keywords to the number of liberal-arts-category keywords in the behavior feature vector of the user.
The similarity calculation module 404 specifically includes:
A similarity calculation unit, configured to calculate the cosine similarity between the behavior feature vector of the user and each standard personality feature vector;
A personality determination unit, configured to determine the standard personality feature vector with the smallest cosine similarity as the standard personality feature vector most similar to the user behavior feature vector.
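The similarity calculation can be sketched as below. One point is an assumption and should be read with care: the text above selects the vector with the "smallest" cosine similarity, whereas under the usual definition a larger cosine value means a smaller angle between two vectors. This sketch assumes the intended criterion is the smallest angle, that is, the largest cosine value. The function names and the dictionary of standard vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Convention chosen for this sketch: a zero vector matches nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def closest_standard(user_vector, standard_vectors):
    """Return the name of the standard vector closest to the user vector.

    standard_vectors: dict mapping a (hypothetical) personality name to its
    standard personality feature vector. Assumes the largest cosine value
    marks the most similar vector.
    """
    return max(
        standard_vectors,
        key=lambda name: cosine_similarity(user_vector, standard_vectors[name]),
    )
```

If the opposite reading were intended, replacing `max` with `min` would select the vector with the smallest cosine value instead.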
The user personality profiling module 405 specifically includes:
A personality type division unit, configured to divide the standard personality feature vectors into three types: positive personality, intermediate personality, and passive personality;
A user personality profiling unit, configured to determine the type of the standard personality feature vector most similar to the user behavior feature vector as the personality type of the user.
The user behavior prediction module 407 specifically includes:
A user behavior prediction unit, configured to predict that the user may cause harm to others when the ratio of the number of science-category keywords to the number of liberal-arts-category keywords in the behavior feature vector of the user is 3:1.
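The ratio used by the prediction unit can be computed as in the following sketch. Several points are assumptions: that the first five entries of the vector are the science-category (natural science) fields and the last five the liberal-arts-category (social science) fields, and that "is 3:1" is tested with a tolerance, since the worked example elsewhere in the description speaks of ratios of "about" 3:1. The tolerance value is purely illustrative.

```python
import math


def science_to_arts_ratio(vector):
    """Ratio of science-category to liberal-arts-category keyword share.

    Assumes a 10-entry vector whose first five entries are the science
    fields and whose last five entries are the liberal-arts fields.
    """
    if len(vector) != 10:
        raise ValueError("expected 10 field proportions")
    science = sum(vector[:5])
    arts = sum(vector[5:])
    if science == 0 and arts == 0:
        raise ValueError("vector contains no keywords")
    return math.inf if arts == 0 else science / arts


def is_about(ratio, target=3.0, tolerance=0.5):
    """True if the ratio lies within the (assumed) tolerance of the target."""
    return abs(ratio - target) <= tolerance
```

Because proportions and raw counts differ only by a common factor, the same ratio results whether the vector holds keyword counts or keyword proportions.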
The method and system for user behavior profiling and prediction based on network logs provided by the present invention extract the behavior feature vector of a user from the user's network log, calculate the similarity between the behavior feature vector and each standard personality feature vector, determine the personality trait represented by the standard personality feature vector with the highest similarity as the personality trait of the user, and determine the risk posed by the user from the personality trait so obtained. In addition, the number of science-category keywords and the number of liberal-arts-category keywords in the behavior feature vector of the user are determined, and the behavioral risk of the user is predicted from the ratio of the two, so that early warning can be given and harm prevented.
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be cross-referenced. Because the system disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for the relevant details, refer to the description of the method.
Specific examples are used herein to explain the principle and implementation of the invention. The description of the above embodiments is intended only to aid understanding of the method of the invention and its core idea. Those of ordinary skill in the art may, following the idea of the invention, make changes to the specific implementation and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.

Claims (10)

1. A method for user behavior profiling and prediction based on network logs, characterized by comprising:
Obtaining the network log of a user;
Extracting the behavior feature vector of the user from the network log, where the behavior feature vector is the vector formed by the proportion of each field's keywords to the total number of keywords in the user's network log; the fields are divided into natural science fields and social science fields; the natural science fields include military, science and technology, sports, travel, and food; and the social science fields include history and politics, literature and art, society, entertainment, and beauty;
Obtaining standard personality feature vectors, where a standard personality feature vector is the vector formed by the proportion of each field's keywords to the total number of keywords for a standard personality; the fields are divided into natural science fields and social science fields; the natural science fields include military, science and technology, sports, travel, and food; and the social science fields include history and politics, literature and art, society, entertainment, and beauty;
Calculating the similarity between the behavior feature vector of the user and each standard personality feature vector;
Determining the personality trait represented by the standard personality feature vector with the highest similarity as the personality trait of the user.
2. The method for user behavior profiling and prediction based on network logs according to claim 1, characterized in that the method further comprises:
Determining the number of science-category keywords and the number of liberal-arts-category keywords in the behavior feature vector of the user;
Predicting the behavior of the user according to the ratio of the number of science-category keywords to the number of liberal-arts-category keywords in the behavior feature vector of the user.
3. The method for user behavior profiling and prediction based on network logs according to claim 1, characterized in that calculating the similarity between the behavior feature vector of the user and each standard personality feature vector specifically comprises:
Calculating the cosine similarity between the behavior feature vector of the user and each standard personality feature vector;
Determining the standard personality feature vector with the smallest cosine similarity as the standard personality feature vector most similar to the user behavior feature vector.
4. The method for user behavior profiling and prediction based on network logs according to claim 1, characterized in that determining the personality trait represented by the standard personality feature vector with the highest similarity as the personality trait of the user specifically comprises:
Dividing the standard personality feature vectors into three types: positive personality, intermediate personality, and passive personality;
Determining the type of the standard personality feature vector most similar to the user behavior feature vector as the personality type of the user.
5. The method for user behavior profiling and prediction based on network logs according to claim 2, characterized in that predicting the behavior of the user according to the ratio of the number of science-category keywords to the number of liberal-arts-category keywords in the behavior feature vector of the user specifically comprises:
When the ratio of the number of science-category keywords to the number of liberal-arts-category keywords in the behavior feature vector of the user is 3:1, predicting that the user may cause harm to others.
6. A system for user behavior profiling and prediction based on network logs, characterized by comprising:
A network log acquisition module, configured to obtain the network log of a user;
A user behavior feature vector extraction module, configured to extract the behavior feature vector of the user from the network log, where the behavior feature vector is the vector formed by the proportion of each field's keywords to the total number of keywords in the user's network log; the fields are divided into natural science fields and social science fields; the natural science fields include military, science and technology, sports, travel, and food; and the social science fields include history and politics, literature and art, society, entertainment, and beauty;
A standard personality feature vector acquisition module, configured to obtain standard personality feature vectors, where a standard personality feature vector is the vector formed by the proportion of each field's keywords to the total number of keywords for a standard personality; the fields are divided into natural science fields and social science fields; the natural science fields include military, science and technology, sports, travel, and food; and the social science fields include history and politics, literature and art, society, entertainment, and beauty;
A similarity calculation module, configured to calculate the similarity between the behavior feature vector of the user and each standard personality feature vector;
A user personality profiling module, configured to determine the personality trait represented by the standard personality feature vector with the highest similarity as the personality trait of the user.
7. The system for user behavior profiling and prediction based on network logs according to claim 6, characterized in that the system further comprises:
A keyword quantity determination module, configured to determine the number of science-category keywords and the number of liberal-arts-category keywords in the behavior feature vector of the user;
A user behavior prediction module, configured to predict the behavior of the user according to the ratio of the number of science-category keywords to the number of liberal-arts-category keywords in the behavior feature vector of the user.
8. The system for user behavior profiling and prediction based on network logs according to claim 6, characterized in that the similarity calculation module specifically comprises:
A similarity calculation unit, configured to calculate the cosine similarity between the behavior feature vector of the user and each standard personality feature vector;
A personality determination unit, configured to determine the standard personality feature vector with the smallest cosine similarity as the standard personality feature vector most similar to the user behavior feature vector.
9. The system for user behavior profiling and prediction based on network logs according to claim 6, characterized in that the user personality profiling module specifically comprises:
A personality type division unit, configured to divide the standard personality feature vectors into three types: positive personality, intermediate personality, and passive personality;
A user personality profiling unit, configured to determine the type of the standard personality feature vector most similar to the user behavior feature vector as the personality type of the user.
10. The system for user behavior profiling and prediction based on network logs according to claim 7, characterized in that the user behavior prediction module specifically comprises:
A user behavior prediction unit, configured to predict that the user may cause harm to others when the ratio of the number of science-category keywords to the number of liberal-arts-category keywords in the behavior feature vector of the user is 3:1.
CN201910089017.1A 2019-01-30 2019-01-30 User behavior based on network log is portrayed and prediction technique and system Pending CN109783460A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910089017.1A CN109783460A (en) 2019-01-30 2019-01-30 User behavior based on network log is portrayed and prediction technique and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910089017.1A CN109783460A (en) 2019-01-30 2019-01-30 User behavior based on network log is portrayed and prediction technique and system

Publications (1)

Publication Number Publication Date
CN109783460A true CN109783460A (en) 2019-05-21

Family

ID=66503698

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910089017.1A Pending CN109783460A (en) 2019-01-30 2019-01-30 User behavior based on network log is portrayed and prediction technique and system

Country Status (1)

Country Link
CN (1) CN109783460A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110457590A (en) * 2019-06-25 2019-11-15 华院数据技术(上海)有限公司 Intelligent subscriber portrait method based on small data input
CN110457590B (en) * 2019-06-25 2021-08-27 华院计算技术(上海)股份有限公司 Intelligent user portrait drawing method based on small data input
CN110825824A (en) * 2019-10-16 2020-02-21 天津大学 User relation portrayal method based on semantic visual/non-visual user character expression
CN116451087A (en) * 2022-12-20 2023-07-18 石家庄七彩联创光电科技有限公司 Character matching method, device, terminal and storage medium
CN116451087B (en) * 2022-12-20 2023-12-26 石家庄七彩联创光电科技有限公司 Character matching method, device, terminal and storage medium

Similar Documents

Publication Publication Date Title
Kaleel et al. Cluster-discovery of Twitter messages for event detection and trending
CN106383887A (en) Environment-friendly news data acquisition and recommendation display method and system
US9069880B2 (en) Prediction and isolation of patterns across datasets
CN109783460A (en) User behavior based on network log is portrayed and prediction technique and system
Zielinski et al. Computing controversy: Formal model and algorithms for detecting controversy on Wikipedia and in search queries
Nasution Singleton: A role of the search engine to reveal the existence of something in information space
CN111259220A (en) Data acquisition method and system based on big data
Whitmore Extracting knowledge from US department of defense freedom of information act requests with social media
Gkoulalas-Divanis et al. Large-Scale Data Analytics
Tchuente et al. Visualizing the relevance of social ties in user profile modeling
Zhukovskii et al. URL redirection accounting for improving link-based ranking methods
Rollo et al. Knowledge graphs for community detection in textual data
Kim A document ranking method with query-related web context
Santoso et al. An Ontological Crawling Approach for Improving Information Aggregation over eGovernment Websites.
CN114880540A (en) Intelligent reminding method based on intelligent financial text comments
US20200226159A1 (en) System and method of generating reading lists
Dashdorj et al. High‐level event identification in social media
Yu et al. Friend recommendation mechanism for social media based on content matching
Jelodar et al. Evaluation and analysis of popular decision tree algorithms for annoying advertisement websites classification
Xu et al. The study of content security for mobile internet
Kanagasabai et al. Classification of massive mobile web log URLs for customer profiling & analytics
Mundhe et al. Continuous top-k monitoring on document streams
Suguna et al. Association rule mining for web recommendation
Sreeja et al. Review of web crawlers
Martindale Detecting bias in news article content with machine learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190521