CN114626340B - Behavior feature extraction method based on mobile phone signaling and related device - Google Patents
- Publication number
- CN114626340B (application CN202210266442.5A)
- Authority
- CN
- China
- Prior art keywords
- preset
- behavior
- distribution
- user
- parameter
- Legal status: Active (as listed by Google Patents; an assumption, not a legal conclusion)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/126—Character encoding
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Probability & Statistics with Applications (AREA)
- Mathematical Analysis (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- Algebra (AREA)
- Computational Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Mobile Radio Communication Systems (AREA)
- Telephonic Communication Services (AREA)
Abstract
The invention relates to the technical field of big data and provides a behavior feature extraction method based on mobile phone signaling, together with a related device. The method comprises: acquiring behavior codes of a user, the behavior codes being obtained by encoding the user's residence information in order of residence time within a preset time period, where the residence information is derived from the user's mobile phone signaling; determining the probability of the behavior codes under preset topics by using a preset probability distribution; and calculating the user's behavior features from the probability of the behavior codes under the preset topics, where the dimensionality of the behavior features equals the number of preset topics. Behavior features obtained in this way are more accurate and comprehensive and fully reflect the user's travel behavior.
Description
Technical Field
The invention relates to the technical field of big data, and in particular to a behavior feature extraction method based on mobile phone signaling and a related device.
Background
In the prior art, a user's travel behavior is described from mobile phone signaling data mainly by statistical methods: features such as travel mileage, number of trips and travel time are derived from the signaling data to describe the user's travel behavior; rough information about the user's residence, workplace, visited places and the like is obtained by summarizing the places the user frequents; and regularities such as the times of going to and leaving work and the number of weekend trips are counted and used as the user's behavior features.
These statistical approaches cannot fully mine the useful information in mobile phone signaling data, so the resulting behavior features do not fully reflect the user's travel behavior.
Disclosure of Invention
The object of the invention is to provide a behavior feature extraction method based on mobile phone signaling and a related device, which use a preset probability distribution to extract features from a user's behavior codes ordered by residence time, so as to obtain behavior features that fully reflect the user's travel behavior.
To achieve the above object, the embodiments of the present invention adopt the following technical solutions:
In a first aspect, an embodiment of the present invention provides a behavior feature extraction method based on mobile phone signaling. The method includes: acquiring behavior codes of a user, the behavior codes being obtained by encoding the user's residence information in order of residence time within a preset time period, where the residence information is obtained from the user's mobile phone signaling; determining the probability of the behavior codes under preset topics by using a preset probability distribution; and calculating the user's behavior features from the probability of the behavior codes under the preset topics, where the dimensionality of the behavior features equals the number of preset topics.
Further, the residence information includes a residence location, a residence start time, a residence duration and a residence date, and the step of acquiring the behavior codes of the user includes:
determining a target preset location type corresponding to the residence location according to the residence location and the preset location types included in the preset area to which the residence location belongs;
determining a target interval segment corresponding to the residence start time according to preset interval segments, the preset interval segments being obtained by dividing a day into time intervals;
determining a target duration segment corresponding to the residence duration according to preset duration segments, the preset duration segments being obtained by dividing the durations possible within one day;
determining a target date feature corresponding to the residence date according to a preset mapping between dates and date features;
and generating the behavior code according to the target preset location type, the target interval segment, the target duration segment and the target date feature.
Further, the step of determining the target preset location type corresponding to the residence location, according to the residence location and the preset location types included in the preset area to which it belongs, includes:
determining the target grid to which the residence location belongs;
and taking the preset location type corresponding to the target grid as the target preset location type.
Further, there are multiple behavior codes and multiple preset topics, the number of preset topics being smaller than the number of behavior codes; the probability of the behavior codes under the preset topics includes the probability of each behavior code under each preset topic; and the step of calculating the user's behavior features according to the probability of the behavior codes under the preset topics includes:
calculating, for each preset topic, the average of the probabilities of all behavior codes under that topic to obtain the behavior feature corresponding to that topic;
and taking the behavior features corresponding to all preset topics together as the behavior features of the user.
Further, the method also includes:
obtaining a corpus, the corpus including behavior codes of a plurality of users;
performing Gibbs sampling on the corpus to determine a first preset parameter and a second preset parameter of a preset joint distribution, where the preset joint distribution represents the joint distribution of the preset topics and the behavior codes in the corpus, the first preset parameter characterizes the distribution of the preset topics in the corpus, and the second preset parameter characterizes the distribution of the behavior codes under the preset topics in the corpus;
summing the preset joint distribution over the preset topics, using the first preset parameter and the second preset parameter, to obtain the marginal distribution;
and determining the preset probability distribution according to the preset joint distribution and the marginal distribution.
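The last two sub-steps (marginalizing over topics, then combining joint and marginal) can be sketched in Python as below. This assumes the preset probability distribution is the conditional p(z | w), and that sampling has already produced corpus-level topic probabilities `p_z` (p(z)) and a K × V matrix `phi` of code probabilities per topic (p(w | z)); the function and variable names are hypothetical.

```python
def topic_posterior(p_z, phi):
    """Return post[v][k] = p(z=k | w=v) for every behavior code v.

    p_z: length-K list of topic probabilities p(z=k).
    phi: K x V matrix with phi[k][v] = p(w=v | z=k).
    """
    K, V = len(phi), len(phi[0])
    # Joint distribution p(z=k, w=v) = p(z=k) * p(w=v | z=k).
    joint = [[p_z[k] * phi[k][v] for v in range(V)] for k in range(K)]
    # Marginal distribution of each code: sum the joint over all topics.
    p_w = [sum(joint[k][v] for k in range(K)) for v in range(V)]
    # Conditional distribution p(z | w) = joint / marginal.
    return [[joint[k][v] / p_w[v] for k in range(K)] for v in range(V)]


# Toy example: 2 topics, 3 behavior codes.
p_z = [0.6, 0.4]
phi = [[0.5, 0.5, 0.0],
       [0.0, 0.5, 0.5]]
post = topic_posterior(p_z, phi)
```

Code 1 is equally likely under both topics, so its posterior simply reproduces the topic priors (0.6, 0.4). A code with zero probability under every topic would make its marginal zero; smoothed estimates of `phi` keep every entry positive and avoid that case.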
Further, the first preset parameter includes a first distribution parameter and a second distribution parameter: the first distribution parameter is the parameter of the multinomial distribution of the preset topics, and the second distribution parameter is the parameter of the Dirichlet distribution of the first distribution parameter.
Further, the number of preset topics is K, the number of distinct behavior codes in the corpus after deduplication is V, and the second preset parameter is a K × V matrix.
In a second aspect, an embodiment of the present invention provides a behavior feature extraction device based on mobile phone signaling. The device includes an acquisition module, a determination module and a calculation module. The acquisition module is configured to acquire behavior codes of a user, the behavior codes being obtained by encoding the user's residence information in order of residence time within a preset time period, where the residence information is obtained from the user's mobile phone signaling; the determination module is configured to determine the probability of the behavior codes under preset topics by using a preset probability distribution; and the calculation module is configured to calculate the user's behavior features from the probability of the behavior codes under the preset topics, where the dimensionality of the behavior features equals the number of preset topics.
In a third aspect, an embodiment of the present invention provides an electronic device including a controller and a memory. The memory stores a program, and the controller, when executing the program, implements the behavior feature extraction method based on mobile phone signaling according to the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a controller, implements the behavior feature extraction method based on mobile phone signaling according to the first aspect.
In the method and device provided by the embodiments of the present invention, behavior codes are first obtained by encoding the residence information derived from the user's mobile phone signaling in order of residence time within a preset time period; the probability of the behavior codes under preset topics is then determined using a preset probability distribution; and the user's behavior features are finally calculated from these probabilities, with a dimensionality equal to the number of preset topics. Compared with the prior art, the behavior features are obtained by encoding behaviors in residence-time order and computing probabilities with a preset probability distribution, so they are more accurate and comprehensive and fully reflect the user's travel behavior.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings show only some embodiments of the present invention and should not be regarded as limiting its scope; those skilled in the art can derive other related drawings from them without inventive effort.
Fig. 1 is a flowchart of a behavior feature extraction method based on mobile phone signaling according to an embodiment of the present invention.
Fig. 2 is a flowchart of another behavior feature extraction method based on mobile phone signaling according to an embodiment of the present invention.
Fig. 3 is a flowchart of another behavior feature extraction method based on mobile phone signaling according to an embodiment of the present invention.
Fig. 4 is a flowchart of another behavior feature extraction method based on mobile phone signaling according to an embodiment of the present invention.
Fig. 5 is a schematic block diagram of a behavior feature extraction apparatus based on mobile phone signaling according to an embodiment of the present invention.
Fig. 6 is a block diagram of an electronic device according to an embodiment of the present invention.
Reference numerals: 10 - electronic device; 11 - processor; 12 - memory; 13 - bus; 100 - behavior feature extraction device based on mobile phone signaling; 110 - acquisition module; 120 - determination module; 130 - calculation module.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. The components of the embodiments generally described and illustrated in the drawings may be arranged and designed in a wide variety of configurations.
Thus, the following detailed description of the embodiments shown in the drawings is not intended to limit the claimed scope of the invention but merely represents selected embodiments. All other embodiments obtained by those skilled in the art from these embodiments without creative effort fall within the protection scope of the present invention.
It should be noted that like reference numbers and letters denote like items in the following figures; once an item is defined in one figure, it need not be further defined or explained in subsequent figures.
In the description of the present invention, terms such as "upper", "lower", "inner" and "outer", if used, indicate orientations or positional relationships based on those shown in the drawings or on the usual placement of the product in use. They serve only to simplify the description and do not indicate or imply that the referenced device or element must have, or be constructed and operated in, a particular orientation; they should therefore not be construed as limiting the present invention.
Furthermore, the terms "first", "second" and the like, if used, only distinguish one item from another and are not to be construed as indicating or implying relative importance.
It should be noted that, where no conflict arises, the features of the embodiments of the present invention may be combined with each other.
Through research on the prior art, the inventor found that features such as the user's behavior sequence, its time distribution and the combination of the two are not well extracted, and the order of a user's behaviors at different times is not fully considered; for example, going to place A and then to place B is not distinguished from going to place B and then to place A. As a result, the useful information in mobile phone signaling data cannot be fully mined, and the resulting behavior features do not fully reflect the user's travel behavior.
In view of this, embodiments of the present invention provide a behavior feature extraction method based on mobile phone signaling and a related device, which can fully mine the useful information in mobile phone signaling data and obtain behavior features that fully reflect the user's travel behavior. They are described in detail below.
Referring to Fig. 1, which is a flowchart of a behavior feature extraction method based on mobile phone signaling according to an embodiment of the present invention, the method includes the following steps:
Step S101: acquire the behavior codes of the user, where the behavior codes are obtained by encoding the user's residence information in order of residence time within a preset time period, and the residence information is obtained from the user's mobile phone signaling.
In this embodiment, the residence information describes where and when the user stayed, and may include, but is not limited to, the residence location, residence start time and residence date. The user's mobile phone signaling records information such as the time and location at which each signaling message was sent, so analyzing the user's signaling over a period of time yields residence information such as which places the user stayed at, when the user arrived at each place, and how long the user stayed there.
In this embodiment, the preset time period may be set in advance, for example to one week, one month or one year. The behavior codes are arranged in order of residence time; for example, if there are 5 behavior codes, they are arranged in the order in which the corresponding stays occurred.
Step S102: determine the probability of the behavior codes under the preset topics by using the preset probability distribution.
In this embodiment, the preset probability distribution may be obtained by analyzing the behavior codes of multiple users, for example as follows: first, select an original probability distribution expression containing unknown parameters; then derive inference formulas for the unknown parameters from the behavior codes of the multiple users and estimate their values; finally, take the original expression with the estimated parameter values substituted as the preset probability distribution.
In this embodiment, the number of preset topics may be specified in advance when the preset probability distribution is determined, and the preset probability distribution gives the probability of the behavior codes under the preset topics. The specific meaning of each preset topic can be further analyzed; for example, the preset topics may include at least one of commuting to work, working in the office during working hours, going out for lunch, going out on errands at noon, returning home after work, going out for leisure after work, travelling, and the like, so that the user's behavior can be described in more detail.
Step S103, calculating the behavior characteristics of the user according to the probability of the behavior code under the preset theme, wherein the dimensionality of the behavior characteristics of the user is equal to the number of the preset themes.
In this embodiment, the user's behavior features are obtained by re-expressing the behavior codes along the dimensions of the preset topics. Different users may have different numbers of behavior codes, but the resulting behavior features all have the same dimensionality, so the features of different users are unified in dimension and can be processed together, for example for model training or for comparing behavior features.
With the method of this embodiment, the behavior features are obtained by applying the preset probability distribution to behavior codes ordered by residence time; they are therefore more accurate and comprehensive and fully reflect the user's travel behavior.
On the basis of Fig. 1, when the residence information includes a residence location, a residence start time, a residence duration and a residence date, an embodiment of the present invention further provides a specific implementation for acquiring the user's behavior codes. Referring to Fig. 2, which is a flowchart of another behavior feature extraction method based on mobile phone signaling according to an embodiment of the present invention, step S101 includes the following sub-steps:
Sub-step S1010: determine the target preset location type corresponding to the residence location according to the residence location and the preset location types included in the preset area to which it belongs.
In this embodiment, as one specific implementation, the preset location types included in the preset area may be determined from the location types of the Points of Interest (POIs) in the area. A POI generally refers to any geographic object that can be abstracted as a point, especially geographic entities closely related to daily life, such as schools, banks, restaurants, gas stations, hospitals and supermarkets. The location type of a POI may be defined as needed; for example, if the POI is building xx, its type may be set to business district or commercial office area. A preset area may contain many POIs; in that case the POIs can be clustered by geographic location into several POI sets, and within each set the POI that best represents the set's location type determines the location type of the set. For example, a POI set may include 5 POIs whose location types are: … . Finally, the location types of all sets are deduplicated, and the deduplicated location types are used as the preset location types included in the preset area.
In this embodiment, as another specific implementation, the preset area may first be divided into multiple grids, and the preset location type of each grid is then determined from the location types of the POIs in it; for example, the location type of the POI that best represents the grid's location-type characteristics may be used as the preset location type of the grid. Alternatively, all grids may be clustered into n types identified by n codes, denoted C1 to Cn, where n can be chosen according to the specific business or usage scenario.
In this embodiment, as a specific implementation, the target preset location type, that is, the preset location type corresponding to the residence location, may be determined as follows:
First, determine the target grid to which the residence location belongs.
In this embodiment, the target grid may be determined from the position of the residence location and the geographic range of each grid in the preset area: the target grid is the grid whose geographic range contains the residence location.
Second, take the preset location type corresponding to the target grid as the target preset location type.
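The grid-based variant of sub-step S1010 can be sketched in Python as follows. Assumptions not in the text: square cells of a fixed size in degrees (`CELL_DEG` is hypothetical), each grid's preset location type taken as its most frequent POI type (a simple stand-in for "the POI that best represents the grid"), and a hypothetical fallback code `C0` for grids without POIs.

```python
from collections import Counter

CELL_DEG = 0.005  # hypothetical cell size in degrees of latitude/longitude


def grid_of(lat, lon, cell=CELL_DEG):
    """Return the (row, col) index of the grid cell that contains a point."""
    return int(lat // cell), int(lon // cell)


def grid_types(pois, cell=CELL_DEG):
    """pois: iterable of (lat, lon, place_type).

    Assign each grid cell the most frequent place type among its POIs.
    """
    counts = {}
    for lat, lon, place_type in pois:
        counts.setdefault(grid_of(lat, lon, cell), Counter())[place_type] += 1
    return {g: c.most_common(1)[0][0] for g, c in counts.items()}


def target_place_type(lat, lon, types, cell=CELL_DEG, default="C0"):
    """Find the target grid of a residence location, then return its preset type."""
    return types.get(grid_of(lat, lon, cell), default)


pois = [
    (30.0012, 120.0011, "C1"),
    (30.0013, 120.0012, "C1"),
    (30.0014, 120.0013, "C2"),
    (30.0101, 120.0101, "C3"),
]
types = grid_types(pois)
```

Here a residence at (30.0015, 120.0014) falls into the first cell, where `C1` outnumbers `C2`, so its target preset location type is `C1`. A production system would more likely use a projected grid in metres, but the lookup logic stays the same.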
Sub-step S1011: determine the target interval segment corresponding to the residence start time according to preset interval segments, which are obtained by dividing a day into time intervals.
In this embodiment, as a specific implementation, the 24 hours of a day may be divided into 24 preset interval segments, each identified by one code, giving 24 codes in total. Other segmentations may of course be used as needed, for example dividing the day into 12 preset interval segments.
Sub-step S1012: determine the target duration segment corresponding to the residence duration according to preset duration segments, which are obtained by dividing the durations possible within one day.
In this embodiment, as a specific implementation, residence durations may be divided into 7 preset duration segments: under 1 hour, 1-2 hours, 2-3 hours, 3-5 hours, 5-9 hours, 9-12 hours and 12-24 hours. Each segment is identified by one code, giving 7 codes in total, denoted D1 to D7.
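Sub-steps S1011 and S1012 amount to bucketing. A Python sketch using the 24 hourly segments and the seven duration segments D1 to D7 described above; the `H1` to `H24` labels are hypothetical, and placing an exact boundary such as 2 hours into the higher segment is an assumption, since the text does not say which side a boundary belongs to.

```python
import bisect


def start_segment(hour_of_day: float) -> str:
    """Map a residence start time (hour of day, 0 <= h < 24) to H1..H24."""
    if not 0 <= hour_of_day < 24:
        raise ValueError("hour_of_day must be in [0, 24)")
    return f"H{int(hour_of_day) + 1}"


# Upper bounds, in hours, of segments D1..D7: <1, 1-2, 2-3, 3-5, 5-9, 9-12, 12-24.
DURATION_BOUNDS = [1, 2, 3, 5, 9, 12, 24]


def duration_segment(hours: float) -> str:
    """Map a residence duration in hours (0 to 24) to D1..D7."""
    if not 0 <= hours <= 24:
        raise ValueError("duration must be within [0, 24] hours")
    # bisect_right sends a value equal to a bound (e.g. 2.0) to the higher
    # segment; min() keeps exactly 24 hours inside D7.
    idx = min(bisect.bisect_right(DURATION_BOUNDS, hours), len(DURATION_BOUNDS) - 1)
    return f"D{idx + 1}"
```

For instance, a stay that starts at 08:30 and lasts 4 hours maps to `H9` and `D4`. Switching to 12 two-hour interval segments would only change `start_segment` to `int(hour_of_day) // 2 + 1`.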
Sub-step S1013: determine the target date feature corresponding to the residence date according to a preset mapping between dates and date features.
In this embodiment, date features may be defined according to the needs of the actual scenario; for example, three date features may be defined, namely workday, weekend and holiday, coded W1, W2 and W3 respectively.
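A Python sketch of this date mapping, assuming holidays take priority over the weekday/weekend split and using a hypothetical one-day holiday set; a real calendar, including any rescheduled working weekends, would have to be supplied separately.

```python
import datetime

# Hypothetical holiday calendar, for illustration only.
HOLIDAYS = {datetime.date(2022, 10, 3)}


def date_feature(day: datetime.date, holidays=HOLIDAYS) -> str:
    """Map a residence date to W1 (workday), W2 (weekend) or W3 (holiday)."""
    if day in holidays:
        return "W3"            # holidays override the weekday/weekend split
    if day.weekday() >= 5:     # Monday is 0, so 5 and 6 are Saturday and Sunday
        return "W2"
    return "W1"
```

With this set, Monday 14 March 2022 maps to `W1`, Saturday 19 March 2022 to `W2`, and Monday 3 October 2022 to `W3`.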
Sub-step S1014: generate the behavior code according to the target preset location type, the target interval segment, the target duration segment and the target date feature.
In this embodiment, the behavior code may be obtained by concatenating the target preset location type, the target interval segment, the target duration segment and the target date feature; by encoding the concatenated result; or by directly re-encoding the four items.
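A Python sketch of the first option (concatenation), combined with the ordering by residence time from step S101. The `Stay` record, its field names and the `-` separator are hypothetical, and the four parts are assumed to have been segmented already.

```python
from typing import NamedTuple


class Stay(NamedTuple):
    """One residence record whose fields have already been segmented."""
    start: float    # residence start, e.g. hours since the period began
    place: str      # target preset location type, e.g. "C3"
    interval: str   # target interval segment, e.g. "H9"
    duration: str   # target duration segment, e.g. "D5"
    date: str       # target date feature, e.g. "W1"


def behavior_code(s: Stay) -> str:
    """Concatenate the four segmented parts into one behavior code."""
    return f"{s.place}-{s.interval}-{s.duration}-{s.date}"


def behavior_sequence(stays):
    """Return the user's behavior codes ordered by residence start time."""
    return [behavior_code(s) for s in sorted(stays, key=lambda s: s.start)]


stays = [
    Stay(18.5, "C1", "H19", "D5", "W1"),  # evening stay, listed first
    Stay(8.7, "C3", "H9", "D5", "W1"),    # morning stay
]
codes = behavior_sequence(stays)
```

Here `codes` is `["C3-H9-D5-W1", "C1-H19-D5-W1"]`: the morning stay comes first regardless of input order. The other two options in the text would replace `behavior_code` with a lookup into an integer vocabulary.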
With the method of this embodiment, the residence location, residence start time, residence duration and residence date are all encoded into the behavior code, so the behavior code carries more comprehensive information about the user's travel behavior.
On the basis of Fig. 1, this embodiment further provides a specific implementation for calculating the user's behavior features. Referring to Fig. 3, which is a flowchart of another behavior feature extraction method based on mobile phone signaling according to an embodiment of the present invention, step S103 includes the following sub-steps:
Sub-step S1030: for each preset topic, calculate the average of the probabilities of all behavior codes under that topic to obtain the behavior feature corresponding to that topic.
In this embodiment, there may be multiple behavior codes and multiple preset topics, with fewer preset topics than behavior codes, and the probability of the behavior codes under the preset topics includes the probability of each behavior code under each preset topic. The behavior feature corresponding to each preset topic is calculated first, and the features for all preset topics together form the user's behavior feature, which may be represented as a vector. For example, suppose a user has 3 behavior codes A, B and C and there are 3 preset topics, with topic-probability vectors $(p_{A1}, p_{A2}, p_{A3})$ for code A, $(p_{B1}, p_{B2}, p_{B3})$ for code B and $(p_{C1}, p_{C2}, p_{C3})$ for code C. The behavior features for the 3 preset topics are then $f_k = (p_{Ak} + p_{Bk} + p_{Ck})/3$ for $k = 1, 2, 3$, and $(f_1, f_2, f_3)$ is the vectorized representation of the user's behavior features.
In this embodiment, as another specific implementation, the average may instead be a geometric mean, a weighted mean, or the like.
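A Python sketch of sub-steps S1030 and S1031, assuming each behavior code's topic-probability vector is already available; the function and parameter names are hypothetical. The arithmetic mean matches the A, B, C example in the text, and the weighted and geometric variants mentioned as alternatives are included for comparison.

```python
import math


def user_feature(code_topic_probs, weights=None, mean="arithmetic"):
    """Average per-code topic vectors into one K-dimensional user feature.

    code_topic_probs: one length-K probability vector per behavior code.
    weights: per-code weights, used only when mean == "weighted".
    """
    n, K = len(code_topic_probs), len(code_topic_probs[0])
    if mean == "arithmetic":
        return [sum(p[k] for p in code_topic_probs) / n for k in range(K)]
    if mean == "weighted":
        total = sum(weights)
        return [sum(w * p[k] for w, p in zip(weights, code_topic_probs)) / total
                for k in range(K)]
    if mean == "geometric":
        # exp of the mean log-probability; every probability must be > 0.
        return [math.exp(sum(math.log(p[k]) for p in code_topic_probs) / n)
                for k in range(K)]
    raise ValueError(f"unknown mean: {mean}")


A, B, C = [0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.4, 0.4, 0.2]
f = user_feature([A, B, C])
```

Whatever the number of codes, `f` always has K entries, which is what makes features of different users comparable. The arithmetic and weighted means of probability vectors still sum to 1; the geometric mean generally does not, so it may need renormalizing.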
Sub-step S1031: take the behavior features corresponding to all preset topics as the behavior features of the user.
By calculating a behavior feature for each preset topic, the method of this embodiment converts the user's behavior codes into a feature vector whose dimensions are the preset topics. This preserves the information in the behavior codes while effectively reducing the dimensionality of the behavior features, and hence the volume of data.
To obtain the preset probability distribution, the embodiment of the invention draws an analogy between behavior codes and Chinese characters or words: each code is like a character or word that carries a certain meaning, and similar codes have similar meanings, just as similar words do. A user's behavior codes arranged in order of residence time are analogous to an article: the user's residence information from the start to the end of one day can be expressed by several words forming a paragraph, and the user's behavior over a month or longer forms an article of many paragraphs. Extracting a user's behavior features is then similar to finding the topics an article is about, such as philosophy, sociology, emotion, culture, sports or people; likewise, a user's behavior may be characterized by nine-to-five commuting, occasional visits to the supermarket, socializing, business trips, square dancing in the park, ordering takeaway, and so on. The user's behavior is thus attributed to a number of preset topics, and finding the probability of each behavior code under each preset topic yields a vector in topic space that represents the user's behavior features.
The embodiment of the invention introduces a topic space as a numerical vector space: each behavior code is represented by a topic-space vector, and the user's behavior codes as a whole are represented by the average topic-space vector. In a multidimensional space, the value along one dimension gives the position along that dimension's direction, and the values of all dimensions together form coordinates that locate a point in the space. That a behavior code belongs to a particular preset topic is a probabilistic event, and its probability can serve as the coordinate along that topic's dimension in topic space.
Based on this analogy, and in order to calculate the probabilistic relationship between behavior codes and preset topics, the embodiment of the invention draws on the Bayesian approach. Denoting a preset topic by z and a behavior code by w, the classic Bayes formula is
$p(z \mid w) = \dfrac{p(w \mid z)\, p(z)}{p(w)}$,
that is, the posterior probability $p(z \mid w)$ can be obtained given the prior probabilities $p(z)$ and $p(w)$ and the conditional probability $p(w \mid z)$. However, although the number of preset topics is fixed in advance, the probability $p(z)$ is unknown, so the conditional probability $p(w \mid z)$ cannot be calculated either; since both $p(z)$ and $p(w \mid z)$ are unknown, the classic formula cannot be applied directly. In this embodiment, the joint probability of preset topics and behavior codes is instead calculated through a preset joint distribution, and the marginal probability of behavior codes through a marginal distribution. An embodiment of the present invention further provides a specific implementation that determines the preset joint distribution and the marginal distribution and finally determines the preset probability distribution from them. Referring to Fig. 4, which is a flowchart of another behavior feature extraction method based on mobile phone signaling according to an embodiment of the present invention, the method further includes the following steps:
Step S201, obtaining a corpus, where the corpus includes the behavior codes of a plurality of users.
In this embodiment, the corpus may include a plurality of documents, each corresponding to all behavior codes of one user within the preset time period. Each document contains a plurality of words, and each word is one behavior code.
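As an illustration of how such a corpus could be assembled, the sketch below groups (user, behavior code) records into one document per user and builds the deduplicated vocabulary whose size is V. The record format, the code strings and the function name `build_corpus` are hypothetical; the records are assumed to be already restricted to the preset time period and listed in residence time order.

```python
from collections import defaultdict


def build_corpus(records):
    """Group (user_id, behavior_code) records into one document per user.

    Returns (docs, vocab, users): docs[i] is the list of word ids of users[i],
    and vocab is the sorted list of distinct behavior codes, so V = len(vocab).
    """
    docs_by_user = defaultdict(list)
    for user_id, code in records:          # record order is kept inside each document
        docs_by_user[user_id].append(code)
    vocab = sorted({c for doc in docs_by_user.values() for c in doc})  # deduplicated codes
    index = {code: i for i, code in enumerate(vocab)}
    users = sorted(docs_by_user)
    docs = [[index[c] for c in docs_by_user[u]] for u in users]
    return docs, vocab, users
```

Mapping codes to integer ids keeps the later K × V matrix indexable by position.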
Step S202, performing Gibbs sampling on the corpus to determine a first preset parameter and a second preset parameter of a preset joint distribution. The preset joint distribution represents the joint distribution of the preset topics and the behavior codes in the corpus; the first preset parameter represents the distribution parameters of the preset topics in the corpus, and the second preset parameter represents the distribution parameters relating the behavior codes to the preset topics in the corpus.
In this embodiment, the first preset parameter includes a first distribution parameter and a second distribution parameter: the first distribution parameter is the parameter of the multinomial distribution of the preset topics, and the second distribution parameter is the parameter of the Dirichlet distribution governing the first distribution parameter. The second preset parameter is a K × V matrix, where K is the number of preset topics and V is the number of distinct behavior codes in the corpus after deduplication.
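The patent does not spell out its sampler. The sketch below swaps in the standard collapsed Gibbs sampler for latent Dirichlet allocation (LDA) as an assumption: it estimates the per-document topic multinomials θ (the first distribution parameter) and a K × V topic-by-code matrix β (the second preset parameter). The Dirichlet parameter α is held fixed here rather than estimated, and the smoothing prior `eta` on each row of β, the default values and the function name are all hypothetical.

```python
import random


def lda_gibbs(docs, V, K, alpha=0.1, eta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs  -- list of documents, each a list of behavior-code ids in range(V)
    alpha -- symmetric Dirichlet prior on each document's topic mixture (fixed)
    eta   -- symmetric Dirichlet prior on each topic's code distribution
    Returns (theta, beta): theta[d][k] estimates p(topic k | document d) and
    beta[k][v] estimates p(code v | topic k), a K x V matrix.
    """
    rng = random.Random(seed)
    n_dk = [[0] * K for _ in docs]        # topic counts per document
    n_kv = [[0] * V for _ in range(K)]    # code counts per topic
    n_k = [0] * K                         # total code count per topic
    z = []                                # current topic assignment of every word

    def update(d, k, v, delta):
        n_dk[d][k] += delta
        n_kv[k][v] += delta
        n_k[k] += delta

    # Random initial topic assignments.
    for d, doc in enumerate(docs):
        z.append([rng.randrange(K) for _ in doc])
        for k, v in zip(z[d], doc):
            update(d, k, v, +1)

    topics = range(K)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, v in enumerate(doc):
                update(d, z[d][i], v, -1)   # remove the word's current assignment
                # Full conditional p(z_i = k | other assignments, w), unnormalised.
                weights = [(n_dk[d][k] + alpha) * (n_kv[k][v] + eta) / (n_k[k] + V * eta)
                           for k in topics]
                z[d][i] = rng.choices(topics, weights=weights)[0]
                update(d, z[d][i], v, +1)   # add the resampled assignment back

    # Smoothed point estimates from the final sample; every row sums to 1.
    theta = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in topics]
             for d, doc in enumerate(docs)]
    beta = [[(n_kv[k][v] + eta) / (n_k[k] + V * eta) for v in range(V)]
            for k in topics]
    return theta, beta
```

For example, `theta, beta = lda_gibbs(docs, V=len(vocab), K=10)` would return one length-K row per user and a K × V matrix. Collapsing θ and β out of the sampler means only the topic assignments are sampled, which is why the estimates are read off the count tables at the end.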
In this embodiment, the preset joint distribution may be expressed as:
$p(\theta, z, w \mid \alpha, \beta) = p(z \mid \alpha)\, p(w \mid z, \beta)$, where $z$ denotes the preset topic vector, $w$ denotes the set of behavior codes, $\theta$ denotes the first distribution parameter, $\alpha$ denotes the second distribution parameter, and $\beta$ denotes the second preset parameter. Here $p(z \mid \alpha) = \int p(z \mid \theta)\, p(\theta \mid \alpha)\, d\theta$, $z_n$ denotes the preset topic of the $n$-th word, and $w_n$ denotes the behavior code distribution under preset topic $z_n$. Further derivation is as follows:
In formula (1), $\int p(z \mid \theta)\, d\theta = 1$, i.e., the probabilities over all preset topics integrate to 1.
It should be noted that the preset joint distribution is derived as follows.
First, for the $d$-th of the $M$ documents, which holds the behavior codes of one user within the preset time period, the conditional distribution of its preset topics $z_d$ can be expressed as:

$$p(z_d \mid \alpha) = \int p(z_d \mid \theta_d)\, p(\theta_d \mid \alpha)\, d\theta_d,$$

where $\theta_d$ is the parameter of the multinomial distribution of the preset topics $z_d$.
From the preset topics $z_d$ of each of the $M$ documents, the conditional distribution of the preset topics of all documents is obtained as:

$$p(z \mid \alpha) = \prod_{d=1}^{M} \int p(z_d \mid \theta_d)\, p(\theta_d \mid \alpha)\, d\theta_d$$
The corresponding general expression is: $p(z \mid \alpha) = \int p(z \mid \theta)\, p(\theta \mid \alpha)\, d\theta$.
Second, the distribution of the behavior codes is introduced on the basis of this general expression; the conditional distribution of the behavior codes given the preset topics is:

$$p(w \mid z, \beta) = \prod_{n=1}^{N} p(w_n \mid z_n, \beta),$$

where $N$ is the number of words.
Finally, the final expression of the preset joint distribution is obtained from $p(\theta, z, w \mid \alpha, \beta) = p(z \mid \alpha)\, p(w \mid z, \beta)$.
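For reference, and assuming the preset joint distribution is the standard latent Dirichlet allocation (LDA) model (which the notation matches: Dirichlet parameter α, multinomial θ, K × V matrix β), the per-document joint distribution for document $d$ with $N_d$ behavior codes can be written as below. This is a sketch of the standard form, not a reproduction of the patent's formula (1).

$$p(\theta_d, \mathbf{z}_d, \mathbf{w}_d \mid \alpha, \beta) = p(\theta_d \mid \alpha) \prod_{n=1}^{N_d} p(z_{d,n} \mid \theta_d)\, \beta_{z_{d,n},\, w_{d,n}}, \qquad p(\theta_d \mid \alpha) = \frac{\Gamma\!\left(\sum_{k=1}^{K} \alpha_k\right)}{\prod_{k=1}^{K} \Gamma(\alpha_k)} \prod_{k=1}^{K} \theta_{d,k}^{\alpha_k - 1}$$

$$\int p(\theta_d, \mathbf{z}_d, \mathbf{w}_d \mid \alpha, \beta)\, d\theta_d = p(\mathbf{z}_d \mid \alpha)\, p(\mathbf{w}_d \mid \mathbf{z}_d, \beta)$$

Here $p(z_{d,n} = k \mid \theta_d) = \theta_{d,k}$ and $p(w_{d,n} \mid z_{d,n}, \beta) = \beta_{z_{d,n},\, w_{d,n}}$ (row $z_{d,n}$, column $w_{d,n}$ of β). The factorization $p(z \mid \alpha)\, p(w \mid z, \beta)$ used above is therefore what remains once θ is integrated out, and the corpus-level distribution is the product of these terms over the $M$ documents.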
Step S203, summing over the preset topics according to the preset joint distribution, the first preset parameter and the second preset parameter to obtain the marginal distribution.
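In this embodiment, the preset topics may be summed over as follows: integrating $\theta$ out of formula (1) and summing over the preset topics gives the marginal distribution:

$$p(w \mid \alpha, \beta) = \int p(\theta \mid \alpha) \prod_{n=1}^{N} \sum_{z_n} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)\, d\theta$$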
Step S204, determining the preset probability distribution according to the preset joint distribution and the marginal distribution.
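In this embodiment, the preset probability distribution can be expressed as:

$$p(\theta, z \mid w, \alpha, \beta) = \frac{p(\theta, z, w \mid \alpha, \beta)}{p(w \mid \alpha, \beta)}$$

where $p(\theta, z, w \mid \alpha, \beta)$ is the preset joint distribution and $p(w \mid \alpha, \beta)$ is the marginal distribution.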
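For LDA the marginal distribution couples the integral over θ with the sum over topic assignments, so this posterior is generally not computed exactly, which is one reason sampling is used. Once point estimates of θ_d and β are available (for example from the Gibbs sampling of step S202), conditioning on them gives a simple per-code topic distribution, p(z = k | w = v) ∝ θ_{d,k} β_{k,v}. The sketch below assumes this per-code conditional is what "the probability of the behavior code under a preset topic" means; that reading and the function name are assumptions, not statements from the patent.

```python
def code_topic_probs(theta_d, beta, v):
    """Topic distribution of behavior code v in document d, given point estimates.

    theta_d -- length-K list, estimated p(topic k | document d)
    beta    -- K x V matrix, estimated p(code v | topic k)
    Returns the length-K list p(z = k | w = v, theta_d, beta).
    """
    scores = [theta_d[k] * beta[k][v] for k in range(len(theta_d))]
    total = sum(scores)                 # normalising constant over the K topics
    return [s / total for s in scores]
```

Applied to every code a user generated, this yields one K-dimensional point per code in the topic space described at the start of this section.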
It should further be noted that steps S201 to S204 may be executed when needed, for example before step S102, or may be executed in advance with the resulting preset probability distribution stored, so that it can be retrieved and used directly whenever behavior features need to be extracted.
It should also be noted that steps S201 to S204 may be executed on the same electronic device as steps S101 to S103 (and their sub-steps), or on a different electronic device.
To execute the corresponding steps of the above embodiments and their possible implementations, an implementation of the behavior feature extraction apparatus 100 based on mobile phone signaling is given below. Referring to Fig. 5, Fig. 5 is a block diagram of a behavior feature extraction apparatus 100 based on mobile phone signaling according to an embodiment of the present invention. The basic principle and technical effects of the apparatus 100 provided in this embodiment are the same as those of the foregoing embodiments; for brevity, for anything not mentioned in this embodiment, refer to the corresponding content of the foregoing embodiments.
The behavior feature extraction apparatus 100 based on mobile phone signaling includes an obtaining module 110, a determining module 120 and a calculating module 130.
The obtaining module 110 is configured to obtain a behavior code of a user, where the behavior code is obtained by encoding the user's residence information in residence time order within a preset time period, and the user's residence information is obtained from the user's mobile phone signaling.
Further, the residence information includes a residence location, a residence start time, a residence duration and a residence date, and the obtaining module 110 is specifically configured to: determine a target preset place type corresponding to the residence location according to the residence location and the preset place types included in the preset area to which the residence location belongs; determine a target interval segment corresponding to the residence start time according to preset interval segments, which are obtained by dividing a day into time intervals; determine a target duration segment corresponding to the residence duration according to preset duration segments, which are obtained by dividing the durations within a day into ranges; determine a target date characteristic corresponding to the residence date according to a preset mapping between dates and date characteristics; and generate the behavior code from the target preset place type, the target interval segment, the target duration segment and the target date characteristic.
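One way the four target features could be combined into a behavior code is sketched below. The segment boundaries, the holiday set, the one-letter date characteristics (W workday, E weekend, H holiday) and the `place-T<interval>-D<duration>-<date>` string format are all hypothetical; the patent only states that the segments and the date mapping are preset. The place type is assumed to have been determined already, for example by the grid lookup of the determining module.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical presets; the patent leaves the actual boundaries open.
START_BOUNDS = [6, 9, 17, 22]   # start-time segments (hour of day): [0,6) [6,9) [9,17) [17,22) [22,24)
DURATION_BOUNDS = [1, 3, 8]     # duration segments (hours): <1, [1,3), [3,8), >=8
HOLIDAYS = {date(2022, 1, 1)}   # hypothetical dates mapped to the "holiday" characteristic


def date_feature(day):
    """Map a residence date to a date characteristic: holiday, weekend or workday."""
    if day in HOLIDAYS:
        return "H"
    return "E" if day.weekday() >= 5 else "W"   # weekday() is 5 or 6 on Sat/Sun


def behavior_code(place_type, start_hour, duration_hours, day):
    """Concatenate the four target features into one behavior code string."""
    start_seg = bisect_right(START_BOUNDS, start_hour)        # target interval segment
    dur_seg = bisect_right(DURATION_BOUNDS, duration_hours)   # target duration segment
    return f"{place_type}-T{start_seg}-D{dur_seg}-{date_feature(day)}"
```

For example, `behavior_code("home", 7.5, 1.5, date(2022, 3, 17))` returns `"home-T1-D1-W"`, since 17 March 2022 was a Thursday. `bisect_right` puts a value equal to a boundary into the upper segment. A user's codes would then be listed in residence time order to form that user's document.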
Further, the obtaining module 110 is further configured to obtain a corpus, where the corpus includes the behavior codes of a plurality of users.
The determining module 120 is configured to determine, using a preset probability distribution, the probability of the behavior code under a preset topic.
Further, the preset area includes a plurality of grids, each grid corresponding to one preset place type, and the determining module 120 is specifically configured to: determine the target grid to which the residence location belongs; and take the preset place type corresponding to the target grid as the target preset place type.
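A minimal sketch of the grid lookup, assuming a regular square grid laid over longitude and latitude with a hypothetical origin and cell size in degrees, and a dictionary from cell index to preset place type. Real deployments may use projected coordinates or irregular grids instead; the fallback type `"other"` and both function names are hypothetical.

```python
import math


def grid_cell(lon, lat, origin=(116.0, 39.5), cell=0.01):
    """Index (column, row) of the square grid cell containing the point."""
    return (math.floor((lon - origin[0]) / cell), math.floor((lat - origin[1]) / cell))


def place_type(lon, lat, grid_types, default="other"):
    """Preset place type of the target grid the residence location falls in."""
    return grid_types.get(grid_cell(lon, lat), default)
```

Because each grid carries exactly one preset place type, the lookup reduces to computing the cell index and reading one dictionary entry.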
Further, the determining module 120 is further configured to: perform Gibbs sampling on the corpus to determine a first preset parameter and a second preset parameter of a preset joint distribution, where the preset joint distribution represents the joint distribution of the preset topics and the behavior codes in the corpus, the first preset parameter represents the distribution parameter of the preset topics in the corpus, and the second preset parameter represents the distribution parameter of the behavior codes and the preset topics in the corpus; sum over the preset topics according to the preset joint distribution, the first preset parameter and the second preset parameter to obtain a marginal distribution; and determine the preset probability distribution according to the preset joint distribution and the marginal distribution.
Further, the first preset parameter includes a first distribution parameter and a second distribution parameter, where the first distribution parameter represents the parameter of the multinomial distribution of the preset topics, and the second distribution parameter represents the parameter of the Dirichlet distribution of the first distribution parameter.
Further, the number of preset topics is K, the number of distinct behavior codes in the corpus after deduplication is V, and the second preset parameter is a K × V matrix.
The calculating module 130 is configured to calculate the behavior feature of the user according to the probability of the behavior code under the preset topic, where the dimension of the user's behavior feature is equal to the number of preset topics.
Further, there are a plurality of behavior codes and a plurality of preset topics, the number of preset topics is smaller than the number of behavior codes, and the probability of the behavior codes under the preset topics includes the probability of each behavior code under each preset topic. The calculating module 130 is specifically configured to: calculate, for each preset topic, the average of the probabilities of all behavior codes under that topic to obtain the behavior feature corresponding to that preset topic; and take the behavior features corresponding to all preset topics as the behavior features of the user.
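The averaging step can be sketched as follows, assuming each of the user's behavior codes already has its topic probabilities as a length-K list. The function name is hypothetical; the result has exactly K entries, matching the statement that the feature dimension equals the number of preset topics.

```python
def user_feature(code_topic_probs):
    """Average the K-dimensional topic probability vectors of a user's behavior codes."""
    if not code_topic_probs:
        raise ValueError("user has no behavior codes")
    n = len(code_topic_probs)
    K = len(code_topic_probs[0])
    # Entry k is the mean probability of all the user's codes under preset topic k.
    return [sum(p[k] for p in code_topic_probs) / n for k in range(K)]
```

For example, `user_feature([[0.75, 0.25], [0.25, 0.75], [0.5, 0.5]])` returns `[0.5, 0.5]`, the average topic-space vector of the three codes.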
Referring to fig. 6, fig. 6 is a block schematic diagram of an electronic device 10 according to an embodiment of the present invention, where the electronic device 10 may be a physical host, or may be a virtual machine that implements the same function as the physical host, or may be a server, a server cluster, a cloud server, or a mobile terminal. The electronic device 10 includes a processor 11, a memory 12, and a bus 13. The processor 11 and the memory 12 communicate via a bus 13.
The processor 11 may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be performed by integrated hardware logic circuits in the processor 11 or by instructions in the form of software. The processor 11 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The memory 12 is used to store programs, for example the behavior feature extraction apparatus 100 based on mobile phone signaling in Fig. 5 of the embodiment of the present invention. The apparatus 100 includes at least one software functional module that may be stored in the memory 12 in the form of software or firmware, and the processor 11 executes the program after receiving an execution instruction to implement the behavior feature extraction method based on mobile phone signaling in the embodiments of the present invention.
The memory 12 may include high-speed Random Access Memory (RAM) and may also include non-volatile memory. Optionally, the memory 12 may be a storage device built into the processor 11 or a storage device independent of the processor 11.
The bus 13 may be an ISA bus, a PCI bus, an EISA bus, or the like. For ease of illustration, Fig. 6 shows only one double-headed arrow, which does not mean that there is only one bus or one type of bus.
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a controller, implements the behavior feature extraction method based on mobile phone signaling in the foregoing embodiments.
In summary, the embodiments of the present invention provide a behavior feature extraction method based on mobile phone signaling and a related device. The method includes: acquiring a behavior code of a user, where the behavior code is obtained by encoding the user's residence information in residence time order within a preset time period, and the residence information is obtained from the user's mobile phone signaling; determining the probability of the behavior code under preset topics using a preset probability distribution; and calculating the user's behavior features from those probabilities, where the dimension of the behavior features equals the number of preset topics. Compared with the prior art, because the embodiments encode behaviors in residence time order and compute probabilities with the preset probability distribution, the resulting behavior features are more accurate and comprehensive and can fully reflect the user's travel behavior.
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (9)
1. A behavior feature extraction method based on mobile phone signaling is characterized by comprising the following steps:
acquiring a behavior code of a user, wherein the behavior code is obtained by encoding the user's residence information in residence time order within a preset time period, and the user's residence information is obtained from the user's mobile phone signaling;
determining the probability of the behavior code under a preset topic by using a preset probability distribution;
calculating the behavior feature of the user according to the probability of the behavior code under the preset topic, wherein the dimension of the user's behavior feature is equal to the number of preset topics;
the method further comprises the following steps:
obtaining a corpus, wherein the corpus comprises behavior codes of a plurality of users;
performing Gibbs sampling on the corpus to determine a first preset parameter and a second preset parameter of a preset joint distribution, wherein the preset joint distribution represents the joint distribution of the preset topics and the behavior codes in the corpus, the first preset parameter represents the distribution parameter of the preset topics in the corpus, and the second preset parameter represents the distribution parameter of the behavior codes and the preset topics in the corpus;
summing over the preset topics according to the preset joint distribution, the first preset parameter and the second preset parameter to obtain a marginal distribution;
and determining the preset probability distribution according to the preset joint distribution and the marginal distribution.
2. The behavior feature extraction method based on mobile phone signaling according to claim 1, wherein the residence information includes a residence location, a residence start time, a residence duration and a residence date, and the step of acquiring the behavior code of the user includes:
determining a target preset place type corresponding to the residence location according to the residence location and the preset place types included in the preset area to which the residence location belongs;
determining a target interval segment corresponding to the residence starting time according to a preset interval segment, wherein the preset interval segment is obtained by dividing time intervals in one day;
determining a target duration segment corresponding to the residence duration according to a preset duration segment, wherein the preset duration segment is obtained by dividing the duration in one day;
determining a target date characteristic corresponding to the residence date according to a preset mapping between dates and date characteristics;
and generating the behavior code according to the target preset place type, the target interval segment, the target duration segment and the target date characteristic.
3. The method according to claim 2, wherein the preset area includes a plurality of grids, each grid corresponding to one preset place type, and the step of determining the target preset place type corresponding to the residence location according to the residence location and the preset place types included in the preset area to which the residence location belongs includes:
determining a target grid to which the residence location belongs;
and taking the preset place type corresponding to the target grid as the target preset place type.
4. The method according to claim 1, wherein there are a plurality of behavior codes and a plurality of preset topics, the number of preset topics is smaller than the number of behavior codes, the probability of the behavior code under the preset topic includes the probability of each behavior code under each preset topic, and the step of calculating the behavior feature of the user according to the probability of the behavior code under the preset topic includes:
calculating, for each preset topic, the average of the probabilities of all behavior codes under that topic to obtain the behavior feature corresponding to that preset topic;
and taking the behavior features corresponding to all preset topics as the behavior features of the user.
5. The method according to claim 1, wherein the first preset parameter includes a first distribution parameter and a second distribution parameter, the first distribution parameter characterizing the parameter of the multinomial distribution of the preset topics, and the second distribution parameter characterizing the parameter of the Dirichlet distribution of the first distribution parameter.
7. A behavior feature extraction device based on mobile phone signaling is characterized in that the device comprises:
an obtaining module, configured to obtain a behavior code of a user, wherein the behavior code is obtained by encoding the user's residence information in residence time order within a preset time period, and the user's residence information is obtained from the user's mobile phone signaling;
the obtaining module is further configured to obtain a corpus, wherein the corpus includes behavior codes of a plurality of users;
a determining module, configured to determine the probability of the behavior code under a preset topic by using a preset probability distribution;
a calculating module, configured to calculate the behavior feature of the user according to the probability of the behavior code under the preset topic, wherein the dimension of the user's behavior feature is equal to the number of preset topics;
the determining module is further configured to: perform Gibbs sampling on the corpus to determine a first preset parameter and a second preset parameter of a preset joint distribution, wherein the preset joint distribution represents the joint distribution of the preset topics and the behavior codes in the corpus, the first preset parameter represents the distribution parameter of the preset topics in the corpus, and the second preset parameter represents the distribution parameter of the behavior codes and the preset topics in the corpus; sum over the preset topics according to the preset joint distribution, the first preset parameter and the second preset parameter to obtain a marginal distribution; and determine the preset probability distribution according to the preset joint distribution and the marginal distribution.
8. An electronic device comprising a controller and a memory; the memory is used for storing programs; the controller is used for realizing the behavior feature extraction method based on the mobile phone signaling according to any one of claims 1-6 when executing the program.
9. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a controller, implements the behavior feature extraction method based on handset signaling according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210266442.5A CN114626340B (en) | 2022-03-17 | 2022-03-17 | Behavior feature extraction method based on mobile phone signaling and related device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114626340A CN114626340A (en) | 2022-06-14 |
CN114626340B true CN114626340B (en) | 2023-02-03 |
Family ID: 81901486
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110442662A (en) * | 2019-07-08 | 2019-11-12 | 清华大学 | A kind of method and information-pushing method of determining customer attribute information |
CN111159583A (en) * | 2019-12-31 | 2020-05-15 | 中国联合网络通信集团有限公司 | User behavior analysis method, device, equipment and storage medium |
CN112836507A (en) * | 2021-01-13 | 2021-05-25 | 哈尔滨工程大学 | Method for extracting domain text theme |
CN112836121A (en) * | 2021-01-28 | 2021-05-25 | 北京市城市规划设计研究院 | Travel purpose identification method and system |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8223626B2 (en) * | 2008-01-11 | 2012-07-17 | Yim Tu Investments Ltd., Llc | Linear precoding for MIMO channels with outdated channel state information in multiuser space-time block coded systems with multi-packet reception |
CN102945223B (en) * | 2012-11-21 | 2015-05-20 | 华中科技大学 | Method for constructing joint probability distribution function of output of a plurality of wind power plants |
CN107391565B (en) * | 2017-06-13 | 2020-11-03 | 东南大学 | Matching method of cross-language hierarchical classification system based on topic model |
CN110868689B (en) * | 2019-11-25 | 2020-12-08 | 智慧足迹数据科技有限公司 | Standing population determining method and device and electronic equipment |
Non-Patent Citations (1)
Title |
---|
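Analysis of the knowledge network and hot research areas of user profiling research in China (我国用户画像研究的知识网络与热点领域分析); 吴加琪; 《现代情报》; 2018-08-15 (No. 08); full text * |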
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN106649818B (en) | Application search intention identification method and device, application search method and server | |
CN110019794B (en) | Text resource classification method and device, storage medium and electronic device | |
EP3608799A1 (en) | Search method and apparatus, and non-temporary computer-readable storage medium | |
CN110209809B (en) | Text clustering method and device, storage medium and electronic device | |
CN113505272B (en) | Control method and device based on behavior habit, electronic equipment and storage medium | |
CN110046251A (en) | Community content methods of risk assessment and device | |
CN114780746A (en) | Knowledge graph-based document retrieval method and related equipment thereof | |
CN108512883A (en) | A kind of information-pushing method, device and readable medium | |
CN116795947A (en) | Document recommendation method, device, electronic equipment and computer readable storage medium | |
CN112070550A (en) | Keyword determination method, device and equipment based on search platform and storage medium | |
CN110147535A (en) | Similar Text generation method, device, equipment and storage medium | |
CN110674301A (en) | Emotional tendency prediction method, device and system and storage medium | |
CN107563394B (en) | Method and system for predicting popularity of picture | |
CN114330335A (en) | Keyword extraction method, device, equipment and storage medium | |
CN110019763B (en) | Text filtering method, system, equipment and computer readable storage medium | |
CN107665222B (en) | Keyword expansion method and device | |
CN114626340B (en) | Behavior feature extraction method based on mobile phone signaling and related device | |
CN115374793B (en) | Voice data processing method based on service scene recognition and related device | |
CN115130455A (en) | Article processing method and device, electronic equipment and storage medium | |
WO2021228220A1 (en) | Triggering method and triggering apparatus of intervention prompt on the basis of user smoking behavior records | |
CN111339287B (en) | Abstract generation method and device | |
CN113934842A (en) | Text clustering method and device and readable storage medium | |
CN111552850A (en) | Type determination method and device, electronic equipment and computer readable storage medium | |
CN109947947B (en) | Text classification method and device and computer readable storage medium | |
CN113065025A (en) | Video duplicate checking method, device, equipment and storage medium |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |