CN117094360A

CN117094360A - User characterization extraction method, device, equipment and storage medium

Info

Publication number: CN117094360A
Application number: CN202311345459.0A
Authority: CN
Inventors: 王路路; 徐超
Original assignee: Hangzhou Tonghuashun Data Development Co ltd
Current assignee: Hangzhou Tonghuashun Data Development Co ltd
Priority date: 2023-10-18
Filing date: 2023-10-18
Publication date: 2023-11-21

Abstract

The application discloses a user characterization extraction method, device, equipment and storage medium, relating to the technical field of computers. The method comprises: acquiring a plurality of information data of a user; preprocessing each item of information data with the preprocessing module corresponding to its feature type to obtain feature sequences; and inputting the feature sequences into a preset language model encoder for training to obtain a corresponding target large model, wherein during training the association among the feature sequences is learned through implicit feature extraction by the large model, so as to obtain the user characterization corresponding to each feature sequence, on the basis of which each application can complete its corresponding task. The application thus trains a large model from which each application can obtain user characterizations and complete its tasks accordingly.

Description

User characterization extraction method, device, equipment and storage medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a storage medium for extracting a user representation.

Background

With the development of NLP (Natural Language Processing) and CV (Computer Vision), large language models represented by ChatGPT (Chat Generative Pre-trained Transformer) and various visual models such as Stable Diffusion have emerged and achieved great success in their respective fields. Inspired by this, research on large user models for recommendation systems has been actively pursued over the past two years, with related papers from Google, Microsoft, Alibaba and Tencent. These works aim to use large models to learn general user characterizations and deep models from an understanding of user behavior, and then apply them to various downstream recommendation scenarios.

Related study from Alibaba: Perceive Your Users in Depth: Learning Universal User Representations from Multiple E-commerce Tasks, published in 2018, is one of the earliest works on user characterization. Its main technical point is multi-task, multi-objective learning that combines a CTR (Click-Through Rate) objective, an L2R (Learning to Rank) objective and others. Integrating multiple objectives can in theory yield better user characterizations, but it introduces the problem that the objectives are difficult to balance.

Related study from Tencent and Google: Parameter-Efficient Transfer from Sequential Behaviors for User Modeling and Recommendation (2020) proposes PeterRec, a general transfer learning algorithm for user representations. As a relatively early method, it tests the effectiveness of user representations on several downstream tasks, including recommendation and user profiling, and verifies their transferability and generality.

Related study from Tencent and Google: building on PeterRec, One Person, One Model, One World: Learning Continual User Representation without Forgetting introduces the idea and mechanism of lifelong learning for user representations, further improving the transferability and generality of the large model.

Related study from bloom and congress: to address the sparsity of user behavior data, the authors propose a pre-training mechanism that improves on existing user behavior pre-training techniques, extending them from using behavior-sequence information alone to also incorporating heterogeneous user information, and thereby realize a pre-training model based on heterogeneous user behaviors.

Related study from Alibaba: a lifelong learning method that first clusters the items in the behavior sequence and then performs pre-training, with its effectiveness verified in an e-commerce scenario.

However, most current user pre-training models generate a user embedding from the user's behavior information, which is then used in the recall and ranking stages of a recommendation system; such models are therefore scene-specific. How to obtain a general user model, and how to use it to extract user characterizations that solve a variety of problems, is an urgent problem to be solved.

Disclosure of Invention

In view of the above, the present application aims to provide a user characterization extraction method, device, equipment and storage medium that can obtain a general user model and use it to extract user characterizations for solving various application problems. The specific scheme is as follows:

In a first aspect, the application discloses a user characterization extraction method, which comprises the following steps:

acquiring a plurality of information data of a user;

preprocessing each item of information data with the preprocessing module corresponding to its feature type, so as to obtain each feature sequence;

inputting each feature sequence into a preset language model encoder for training to obtain a corresponding target large model, wherein during training the association among the feature sequences is learned by implicit feature extraction based on the large model, so as to obtain the user characterization corresponding to each feature sequence, such that each application can complete its corresponding task based on the user characterization.

Optionally, the information data includes any one or a combination of several of the user's basic information, the user's asset information, the user's investment concept, and the user's behavior data.

Optionally, the feature type includes any one or a combination of several of a continuous type, a key value type, a sequence type and a table type.

Optionally, the preprocessing with a corresponding preprocessing module according to the feature type of each information data comprises:

preprocessing continuous-type information data with a preset deep feature extractor;

and/or, preprocessing key-value-type information data with a preset cross network;

and/or, preprocessing sequence-type information data based on a preset Transformer structure;

and/or, preprocessing table-type information data based on a preset convolutional network.

Optionally, the collection process of the sequence-type information data comprises:

collecting behavior data of the user at different levels, respectively, to obtain user behavior data sequences of the corresponding dimensions;

collecting feedback results of the behavior data of the user to obtain corresponding feedback result sequences; the feedback result comprises any one or a combination of more of preset explicit feedback of the user, preset implicit feedback of the user, preset positive feedback of the user and preset negative feedback of the user;

acquiring side information corresponding to the behavior of the user to obtain a corresponding side information sequence; the side information comprises environmental characteristics of the user when the behavior of the user occurs and attributes of the behavior interaction object of the user.

Optionally, the method further comprises:

performing periodic division on the user behavior data sequence based on a preset rule to obtain a target periodic sequence;

performing corresponding processing on the target periodic sequence based on a preset processing mode corresponding to the type of the target periodic sequence; the types of the target periodic sequences include long periodic sequences in units of months or years, and short periodic sequences in units of days or weeks.

Optionally, performing corresponding processing on the target periodic sequence based on the preset processing mode corresponding to the type of the target periodic sequence comprises:

if the target periodic sequence is the short periodic sequence, processing the target periodic sequence based on a preset Transformer structure;

and if the target periodic sequence is the long periodic sequence, performing multidimensional statistics on the number of the user's behaviors before processing the target periodic sequence, so as to determine the preference of the user.

In a second aspect, the present application discloses a user characterization extraction device, including:

the data information acquisition module is used for acquiring a plurality of information data of a user;

the feature sequence acquisition module is used for preprocessing each item of information data with the preprocessing module corresponding to its feature type, so as to acquire each feature sequence;

the user characterization acquisition module is used for inputting each feature sequence into a preset language model encoder for training to obtain a corresponding target large model, wherein during training the association among the feature sequences is learned by implicit feature extraction based on the large model, so as to obtain the user characterization corresponding to each feature sequence, such that each application can complete its corresponding task based on the user characterization.

In a third aspect, the present application discloses an electronic device, comprising:

a memory for storing a computer program;

a processor for executing the computer program to implement the user characterization extraction steps of the foregoing disclosure.

In a fourth aspect, the present application discloses a computer-readable storage medium for storing a computer program; wherein the computer program when executed by a processor implements the user characterization extraction steps of the foregoing disclosure.

When extracting user characterizations, a plurality of information data of a user are first acquired; each item of information data is preprocessed with the preprocessing module corresponding to its feature type to obtain feature sequences; the feature sequences are then input into a preset language model encoder for training to obtain a target large model, during which the association among the feature sequences is learned by implicit feature extraction, yielding the user characterization corresponding to each feature sequence so that each application can complete its task based on it. In other words, the application derives feature sequences from the information data, trains a large model on them, and extracts the learned associations among the feature sequences as user characterizations. A general large model is thereby obtained, and the user characterizations extracted from it allow various application tasks to be solved quickly and efficiently.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of a user characterization extraction method disclosed by the application;

FIG. 2 is a flowchart of a specific user characterization extraction method disclosed in the present application;

FIG. 3 is a schematic diagram of a user characterization extraction device according to the present disclosure;

FIG. 4 is a block diagram of an electronic device according to the present disclosure.

Detailed Description

The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

Most current user pre-training models generate a user embedding from the user's behavior information, which is then used for recall and ranking in a recommendation system, so these models are scene-specific. To solve this technical problem, the application discloses a user characterization extraction method that trains a general large model, uses it to extract user characterizations, and thereby completes various application tasks.

Referring to fig. 1, the embodiment of the application discloses a user characterization extraction method, which comprises the following steps:

Step S11: acquire a plurality of information data of the user.

In this embodiment, a plurality of information data of the user are first acquired, including any one or a combination of the user's basic information, asset information, investment concept and behavior data. The basic information may include gender, age, zodiac sign, Chinese zodiac, marital status, resident province, resident city, place of birth, place of residence, city of most recent login, education level, industry, etc. The investment concept may include the desired investment motivation, anti-interference ratio preference, and the like.

Step S12: preprocess each item of information data with the preprocessing module corresponding to its feature type to obtain the feature sequences.

In this embodiment, the feature type includes any one or a combination of a continuous type, a key-value type, a sequence type and a table type. For example: continuous type, such as a preference for a certain stock; key-value type, such as the number of times a user viewed a certain stock within a period; sequence type, such as the user's click sequence in an information feed; table type, such as the user's asset allocation table. Each of these four types is preprocessed with its corresponding preprocessing module to obtain the feature sequences: continuous-type information data with a preset deep feature extractor; and/or key-value-type information data with a preset cross network; and/or sequence-type information data based on a preset Transformer structure; and/or table-type information data based on a preset convolutional network. In a specific embodiment: for the continuous type, the deep feature extractor is an MLP (Multi-Layer Perceptron), which is best suited to continuous features; for the key-value type, the cross network is an FM (Factorization Machine), which performs best on sparse features; for the sequence type, a Transformer structure treats the behavior sequence as a language sequence and captures how it changes over time; for the table type, a CNN (Convolutional Neural Network) treats the table as an image-like feature and captures the correlations within it.
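As a minimal sketch of the type-based routing described above, the Python below registers one preprocessing function per feature type and implements the FM second-order interaction term for the key-value case. The type labels, the `preprocess` function and the dictionary registry are hypothetical illustrations; only the pairing of feature types with MLP, FM, Transformer and CNN comes from the text, and the MLP, Transformer and CNN modules are not shown.

```python
from typing import Any, Callable, Dict, List, Sequence

# Hypothetical labels for the four feature types named in the text.
CONTINUOUS, KEY_VALUE, SEQUENCE, TABLE = "continuous", "key_value", "sequence", "table"

Encoder = Callable[[Any], List[float]]


def fm_second_order(x: Sequence[float], v: Sequence[Sequence[float]]) -> float:
    """FM pairwise term: sum over i < j of <v_i, v_j> * x_i * x_j.

    Uses the O(k * n) identity
    0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2].
    """
    k = len(v[0])
    total = 0.0
    for f in range(k):
        linear = sum(v[i][f] * x[i] for i in range(len(x)))
        squares = sum((v[i][f] * x[i]) ** 2 for i in range(len(x)))
        total += linear * linear - squares
    return 0.5 * total


def preprocess(feature_type: str, data: Any, encoders: Dict[str, Encoder]) -> List[float]:
    """Route one piece of information data to the encoder registered for its feature type."""
    if feature_type not in encoders:
        raise ValueError(f"no preprocessing module for feature type {feature_type!r}")
    return encoders[feature_type](data)


# Example registry: only the key-value (FM) branch is filled in here.
encoders: Dict[str, Encoder] = {
    KEY_VALUE: lambda d: [fm_second_order(d["x"], d["v"])],
}

print(preprocess(KEY_VALUE, {"x": [1.0, 2.0], "v": [[1.0, 0.0], [0.5, 1.0]]}, encoders))
# [1.0]
```

The O(k·n) identity is the standard reason FM scales to the sparse, high-dimensional key-value features the text mentions: it avoids enumerating every feature pair explicitly.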

Step S13: input each feature sequence into a preset language model encoder for training to obtain a corresponding target large model, wherein during training the association among the feature sequences is learned by implicit feature extraction based on the large model, so as to obtain the user characterization corresponding to each feature sequence, such that each application can complete its corresponding task based on the user characterization.

In this embodiment, the feature sequences are treated as a language sequence and input into a GPT encoder, currently the most efficient choice, and the associations among the features are implicitly extracted by the large model to obtain the user characterization. The user's features and behavior sequence are treated analogously to the tokens of a language model, so that a general LUM (Large User Model) is trained as the counterpart of an LLM (Large Language Model):

Table 1

The encoder automatically captures the relations among user features and the patterns common to user behaviors, so that user information can be compressed well into a low-dimensional user characterization. The user characterization can be provided to multiple downstream applications, such as prediction tasks, interpretive generation of recommendations, the yield of recommended solutions, and the optimization of existing solutions.
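To make the language-model analogy concrete, the sketch below flattens a user's profile fields and behavior sequence into a token list and maps the tokens to integer ids, which is the form a language model encoder consumes; it then mean-pools per-token vectors into one fixed-size vector as a simple stand-in for the low-dimensional user characterization. The `[CLS]`/`[SEP]` special tokens, the `field=value` token format and mean pooling are assumptions for illustration, not details given in the text.

```python
from typing import Dict, List, Sequence


def to_tokens(profile: Dict[str, str], behaviors: Sequence[str]) -> List[str]:
    """Serialize profile fields, then behaviors in time order, as one token sequence."""
    tokens = ["[CLS]"]
    tokens += [f"{key}={value}" for key, value in sorted(profile.items())]
    tokens.append("[SEP]")
    tokens += list(behaviors)
    tokens.append("[SEP]")
    return tokens


def encode(tokens: Sequence[str], vocab: Dict[str, int]) -> List[int]:
    """Map tokens to ids, giving unseen tokens the next free id (vocab is updated in place)."""
    ids = []
    for token in tokens:
        if token not in vocab:
            vocab[token] = len(vocab)
        ids.append(vocab[token])
    return ids


def mean_pool(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average per-token vectors into a single fixed-size user vector."""
    if not vectors:
        raise ValueError("cannot pool an empty sequence")
    dim = len(vectors[0])
    return [sum(vec[d] for vec in vectors) / len(vectors) for d in range(dim)]


vocab: Dict[str, int] = {}
tokens = to_tokens({"gender": "F", "age": "30-39"}, ["page:quotes", "page:discovery"])
print(tokens)
print(encode(tokens, vocab))  # [0, 1, 2, 3, 4, 5, 3]
```

In a real encoder the per-token vectors would come from the trained model's hidden states rather than a lookup; pooling is one common way to reduce them to a single user vector.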

Based on the above embodiments, the application covers various types of behavior data. The acquisition of sequence-type information data is described in detail below. Referring to FIG. 2, an embodiment of the application discloses a specific user characterization extraction method, which comprises the following steps:

Step S21: collect behavior data of the user at different levels, respectively, to obtain user behavior data sequences of the corresponding dimensions.

In this embodiment, each behavior in the sequence carries rich information, first of all scene information. Because the downstream tasks connected to the large user model are of different types, sequences of different dimensions need to be considered for the characterization, so the user's behaviors at different levels are collected separately. In a specific embodiment, the behaviors at different levels are as follows:

1. First-level page exposure sequence: home page, quotes, watchlist, trading, discovery, wealth management, feed, intraday chart, K-line chart, etc.; this mainly depicts the sequence in which the user jumps between pages in the mobile app.

2. Second-level function click sequence:

a. Quotes (first-level page): global, A-shares, futures, etc.

b. Discovery (first-level page): following, recommendations, leaderboards, roundups, information, etc.

c. Home page (first-level page): wealth management, grid, feed, advertisements, search, etc.

...

Each first-level page has second-level functions beneath it, which can link to other first-level or second-level pages.

3. Sequence of items consumed in each scene:

a. Feed/Discovery: information, results pages, news, advertisements (item id on the recommendation platform).

b. Robot: questions (item id on the recommendation platform).

c. Intraday chart - announcements: announcements.

d. Intraday chart - stock diagnosis: technical analysis, news analysis, historical backtesting, comprehensive financial score, etc.
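The three levels above can be kept as separate, time-ordered sequences. A minimal sketch, assuming each logged event carries a timestamp, a level label and a value; the `BehaviorEvent` record and the level names `page`, `function` and `item` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class BehaviorEvent:
    ts: int      # event time as a Unix timestamp in seconds (assumed)
    level: str   # hypothetical labels: "page", "function" or "item"
    value: str   # page name, "page/function" path, or item id


def split_by_level(events: Iterable[BehaviorEvent]) -> Dict[str, List[str]]:
    """Build one time-ordered sequence per behavior level."""
    sequences: Dict[str, List[str]] = defaultdict(list)
    for event in sorted(events, key=lambda e: e.ts):
        sequences[event.level].append(event.value)
    return dict(sequences)


log = [
    BehaviorEvent(30, "item", "news:123"),
    BehaviorEvent(10, "page", "home"),
    BehaviorEvent(20, "function", "discovery/recommendation"),
    BehaviorEvent(25, "page", "discovery"),
]
print(split_by_level(log))
# {'page': ['home', 'discovery'], 'function': ['discovery/recommendation'], 'item': ['news:123']}
```

Keeping the levels apart lets a downstream task pick the granularity it needs, for example page jumps for navigation prediction or item ids for content recommendation.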

Step S22, collecting feedback results of the behavior data of the user to obtain corresponding feedback result sequences; the feedback result comprises any one or a combination of more than one of preset explicit feedback of the user, preset implicit feedback of the user, preset positive feedback of the user and preset negative feedback of the user.

In this embodiment, both the explicit and implicit feedback of the user reflect the user's preferences for consumed items/functions, and such feedback information is valuable in expert operation and in recommendation systems. For example, current large-scale deep recommendation models often optimize for clicks, focusing only on the implicit positive feedback represented by click behavior and ignoring other valid user feedback. The present application considers a variety of explicit/implicit and positive/negative feedback from the user in order to characterize the user's interest preferences without bias. Therefore, feedback results on the user's behavior data are collected to obtain a corresponding feedback result sequence. In a specific embodiment, the user's explicit, implicit, positive and negative feedback are as follows:

Implicit positive feedback: clicking, reading, actively entering a question.

Implicit negative feedback: exposed during browsing but not clicked.

Explicit positive feedback: adding to the watchlist, liking, sharing, favoriting, paying.

Explicit negative feedback: thumbs-down (dislike).
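The four feedback categories can be encoded as a lookup table that turns a sequence of raw actions into an aligned feedback result sequence. In this sketch the action names are hypothetical identifiers for the behaviors listed above, and unknown actions map to `None` so that the output stays position-aligned with the behavior sequence.

```python
from enum import Enum
from typing import Dict, List, Optional, Sequence


class Feedback(Enum):
    IMPLICIT_POSITIVE = "implicit_positive"
    IMPLICIT_NEGATIVE = "implicit_negative"
    EXPLICIT_POSITIVE = "explicit_positive"
    EXPLICIT_NEGATIVE = "explicit_negative"


# Hypothetical action identifiers for the behaviors listed in the text.
FEEDBACK_OF: Dict[str, Feedback] = {
    "click": Feedback.IMPLICIT_POSITIVE,
    "read": Feedback.IMPLICIT_POSITIVE,
    "ask_question": Feedback.IMPLICIT_POSITIVE,
    "exposed_not_clicked": Feedback.IMPLICIT_NEGATIVE,
    "add_to_watchlist": Feedback.EXPLICIT_POSITIVE,
    "like": Feedback.EXPLICIT_POSITIVE,
    "share": Feedback.EXPLICIT_POSITIVE,
    "favorite": Feedback.EXPLICIT_POSITIVE,
    "pay": Feedback.EXPLICIT_POSITIVE,
    "dislike": Feedback.EXPLICIT_NEGATIVE,
}


def feedback_sequence(actions: Sequence[str]) -> List[Optional[Feedback]]:
    """Label each action; unknown actions become None so positions stay aligned."""
    return [FEEDBACK_OF.get(action) for action in actions]


print(feedback_sequence(["click", "exposed_not_clicked", "like", "scroll"]))
```

Recording negative signals such as "exposed but not clicked" alongside clicks is what lets the training data go beyond the click-only objective criticized in the text.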

Step S23, acquiring side information corresponding to the behavior of the user to obtain a corresponding side information sequence; the side information comprises environmental characteristics of the user when the behavior of the user occurs and attributes of the behavior interaction object of the user.

In this embodiment, the behavior side information mainly comprises the environmental features present when the user's behavior occurs and the attributes of the objects (items and functions) the user interacts with. Collecting both allows the model to be trained more comprehensively, so the side information corresponding to the user's behavior is collected to obtain a corresponding side information sequence, which is then used in training. In a specific embodiment, the environmental features at the time of the behavior include the position of the object in the sequence, the time period, the rise or fall of the overall market, the rise or fall of the individual stock, the scene of the currently consumed item, and the like; the attributes of the interaction objects (items, functions) include the attribute of the current first-level page, the attribute of the second-level function, and the labels/features of the consumed item in the ES (search server) item pool (item id, index, author, investment link, item type).
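A side information sequence can be built position by position alongside the behavior sequence. The sketch below derives a few of the environmental features mentioned above (position in the sequence, time period, overall market and individual stock moves, scene) for each behavior. The `SideInfo` fields, the percentage-change inputs and the time-period buckets are assumptions; the buckets follow the China A-share continuous trading sessions (9:30 to 11:30 and 13:00 to 15:00, UTC+8) and ignore weekends and holidays.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Sequence, Tuple

CST = timezone(timedelta(hours=8))  # China Standard Time (UTC+8)


@dataclass
class SideInfo:
    position: int                      # index of the behavior in its sequence
    time_period: str                   # coarse trading-day bucket
    market_change_pct: float           # overall market rise/fall when the behavior occurred
    stock_change_pct: Optional[float]  # individual stock rise/fall, if the item concerns a stock
    scene: str                         # scene of the currently consumed item


def time_period(ts: int) -> str:
    """Bucket a Unix timestamp by A-share trading session (UTC+8)."""
    t = datetime.fromtimestamp(ts, tz=CST)
    minutes = t.hour * 60 + t.minute
    if minutes < 9 * 60 + 30:
        return "pre_open"
    if minutes < 11 * 60 + 30:
        return "morning_session"
    if minutes < 13 * 60:
        return "midday_break"
    if minutes < 15 * 60:
        return "afternoon_session"
    return "after_close"


def side_info_sequence(
    behaviors: Sequence[Tuple[int, str, float, Optional[float]]],
) -> List[SideInfo]:
    """behaviors: (timestamp, scene, market_change_pct, stock_change_pct), in time order."""
    return [
        SideInfo(i, time_period(ts), market, stock, scene)
        for i, (ts, scene, market, stock) in enumerate(behaviors)
    ]


ts = int(datetime(2023, 10, 18, 10, 0, tzinfo=CST).timestamp())
print(side_info_sequence([(ts, "feed", -0.8, 1.2)]))
```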

It should be noted that a user's sequence may be long, and the resulting data volume causes many engineering problems. Moreover, some particularly early behaviors are of little significance to current downstream tasks: to predict the next click, for example, there is no need to consider what the user clicked several years ago. Long-period behavior, however, often reflects long-term user preferences that still occasionally influence the next click, so long-period and short-period sequences are handled differently. First, the user behavior data sequence is divided into periods based on a preset rule to obtain a target periodic sequence; the target periodic sequence is then processed in the preset manner corresponding to its type. The types of target periodic sequence include long periodic sequences in units of months or years and short periodic sequences in units of days or weeks. Under the division rule, a short periodic sequence is at the day/week level, generally 2 to 4 weeks; a long periodic sequence is at the month/year level, generally taken within 1 year. The short periodic sequence is processed with a Transformer structure, similar to NLP processing. The long periodic sequence is first summarized statistically: the number of the user's behaviors is counted along several dimensions, such as per stock. Taking the stock of company A as an example, the user's preference for the stock of company A may be calculated using the following formula:

[Formula not reproduced in this text.]

where i denotes the i-th of the user's n behaviors, and the remaining terms are: a temporal penalty coefficient; the time difference between this behavior and now; a forgetting coefficient; the emotional intensity of this behavior (e.g., 1 for a click, 10 for a like); and the association between the interacted item and the stock of company A (e.g., 1 when the clicked content is a commentary analyzing the stock price of company A).
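The exact formula is not available here, so the following is only a hedged sketch of a preference score built from the same ingredients: for each behavior, an intensity (click 1, like 10) multiplied by the item-to-stock association, discounted by elapsed time. The exponential form `penalty * exp(-days_ago / forgetting)` and the default values of `penalty` and `forgetting` are assumptions chosen for illustration, not the patent's formula.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Interaction:
    days_ago: float   # time elapsed between the behavior and now, in days
    intensity: float  # emotional intensity, e.g. click = 1, like = 10
    relevance: float  # association of the interacted item with the stock, e.g. 1.0


def preference(
    interactions: Iterable[Interaction],
    penalty: float = 1.0,      # assumed temporal penalty coefficient
    forgetting: float = 30.0,  # assumed forgetting scale, in days
) -> float:
    """Time-decayed sum of intensity * relevance (ASSUMED exponential decay form)."""
    return sum(
        penalty * math.exp(-it.days_ago / forgetting) * it.intensity * it.relevance
        for it in interactions
    )


history = [Interaction(0, 10, 1.0), Interaction(30, 1, 1.0)]
print(round(preference(history), 4))  # 10.3679
```

With `forgetting = 30`, a behavior from 30 days ago keeps about 37% (1/e) of the weight of an identical behavior today; other decay shapes, such as power-law forgetting, would fit the same description.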

That is, if the target periodic sequence is a short periodic sequence, it is processed based on a preset Transformer structure; if it is a long periodic sequence, multidimensional statistics on the number of the user's behaviors are performed before processing, so as to determine the user's preferences.
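The short/long split can be sketched as follows, assuming events are `(timestamp, stock, action)` tuples. The 28-day short window (within the stated 2 to 4 weeks) and the 365-day long window follow the text; treating the long window as the part of the year older than the short window, and counting behaviors per `(stock, action)` pair as the multidimensional statistics, are assumptions.

```python
from collections import Counter
from typing import Iterable, List, Tuple

DAY = 86_400  # seconds per day
Event = Tuple[int, str, str]  # (unix_ts, stock_code, action)


def split_periods(
    events: Iterable[Event], now: int, short_days: int = 28, long_days: int = 365
) -> Tuple[List[Event], List[Event]]:
    """Return (short_period, long_period) events in time order; older events are dropped."""
    short, long_ = [], []
    for event in sorted(events):
        age = now - event[0]
        if age < 0:
            continue  # ignore events timestamped in the future
        if age <= short_days * DAY:
            short.append(event)
        elif age <= long_days * DAY:
            long_.append(event)
    return short, long_


def long_period_stats(long_events: Iterable[Event]) -> Counter:
    """Count behaviors per (stock, action) before any further modeling."""
    return Counter((stock, action) for _, stock, action in long_events)


now = 400 * DAY
events = [
    (now - 1 * DAY, "A", "click"),
    (now - 60 * DAY, "A", "click"),
    (now - 90 * DAY, "A", "like"),
    (now - 120 * DAY, "B", "click"),
    (now - 500 * DAY, "A", "click"),
]
short, long_ = split_periods(events, now)
print(len(short), len(long_))  # 1 3
print(long_period_stats(long_))
```

The short list would then go to the sequence (Transformer) branch as-is, while only the compact counts from the long window need to be kept, which addresses the data-volume concern raised above.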

As can be seen from the above, the application collects the user's behavior data at different levels, the side information corresponding to the user's behaviors, and the feedback results on the behavior data to obtain the corresponding sequences, and trains the model on these sequences to obtain a more accurate general model. This model can solve various user understanding and generation tasks, such as KYC (Know Your Customer) feature completion for new users, user profile summarization and recommendation interpretability, so its usage scenarios are no longer scene-specific.

Referring to fig. 3, an embodiment of the present application discloses a user characterization extraction device, including:

a data information acquisition module 11, configured to acquire several information data of a user;

a feature sequence obtaining module 12, configured to perform preprocessing by using a corresponding preprocessing module according to the feature type of each information data, so as to obtain each feature sequence;

the user representation obtaining module 13 is configured to input each feature sequence into a preset language model encoder for training to obtain a corresponding target large model, and learn the association between each feature sequence in a manner of implicitly extracting features based on the large model in the training process to obtain a user representation corresponding to each feature sequence, so that each application completes a corresponding task based on the user representation.

In some specific embodiments, the feature sequence obtaining module 12 may specifically include:

a first preprocessing unit, configured to preprocess continuous-type information data with a preset deep feature extractor;

and/or a second preprocessing unit, configured to preprocess key-value-type information data with a preset cross network;

and/or a third preprocessing unit, configured to preprocess sequence-type information data based on a preset Transformer structure;

and/or a fourth preprocessing unit, configured to preprocess table-type information data based on a preset convolutional network.

In some specific embodiments, the apparatus may specifically include:

the behavior data acquisition module is used for acquiring the behavior data of different levels of the user respectively so as to obtain corresponding user behavior data sequences of all dimensions;

the feedback result acquisition module is used for acquiring feedback results of the behavior data of the user so as to obtain a corresponding feedback result sequence; the feedback result comprises any one or a combination of more of preset explicit feedback of the user, preset implicit feedback of the user, preset positive feedback of the user and preset negative feedback of the user;

the side information acquisition module is used for acquiring side information corresponding to the behavior of the user so as to obtain a corresponding side information sequence; the side information comprises environmental characteristics of the user when the behavior of the user occurs and attributes of the behavior interaction object of the user.

In some specific embodiments, the apparatus may further include:

the sequence dividing module is used for dividing the user behavior data sequence periodically based on a preset rule so as to obtain a target periodic sequence;

the sequence processing module is used for carrying out corresponding processing on the target periodic sequence based on a preset processing mode corresponding to the type of the target periodic sequence; the types of the target periodic sequences include long periodic sequences in units of months or years, and short periodic sequences in units of days or weeks.

In some specific embodiments, the sequence processing module may specifically include:

the first processing unit is used for processing the target periodic sequence based on a preset Transformer structure if the target periodic sequence is the short periodic sequence;

and the second processing unit is used for, if the target periodic sequence is the long periodic sequence, performing multidimensional statistics on the number of the user's behaviors before processing the target periodic sequence, so as to determine the preference of the user.

Further, an embodiment of the present application also discloses an electronic device. FIG. 4 is a block diagram of an electronic device 20 according to an exemplary embodiment, and the content of the diagram should not be considered as limiting the scope of use of the present application in any way.

Fig. 4 is a schematic structural diagram of an electronic device 20 according to an embodiment of the present application. The electronic device 20 may specifically include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input output interface 25, and a communication bus 26. Wherein the memory 22 is used for storing a computer program, which is loaded and executed by the processor 21 to implement the relevant steps in the user characterization extraction method disclosed in any of the foregoing embodiments. In addition, the electronic device 20 in the present embodiment may be specifically an electronic computer.

In this embodiment, the power supply 23 is configured to provide an operating voltage for each hardware device on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and an external device, and the communication protocol it follows may be any protocol applicable to the technical solution of the present application, which is not specifically limited herein; the input/output interface 25 is used for acquiring external input data or outputting data externally, and its specific interface type may be selected according to the application requirements, which is not limited herein.

The memory 22 may be a carrier for storing resources, such as a read-only memory, a random access memory, a magnetic disk, or an optical disk, and the resources stored thereon may include an operating system 221, a computer program 222, and the like, and the storage may be temporary storage or permanent storage.

The operating system 221 manages and controls the hardware devices on the electronic device 20 and the computer program 222, and may be Windows Server, NetWare, Unix, Linux, or the like. In addition to the computer program that performs the user characterization extraction method executed by the electronic device 20 as disclosed in any of the foregoing embodiments, the computer program 222 may further include computer programs that perform other specific tasks.

Further, the present application also discloses a computer-readable storage medium for storing a computer program, where the computer program, when executed by a processor, implements the user characterization extraction method disclosed above. For the specific steps of the method, refer to the corresponding content disclosed in the foregoing embodiments; they are not repeated here.

The embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for relevant details, refer to the description of the method.

Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

Finally, it should also be noted that relational terms such as first and second are used herein only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between such entities or operations. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.

The present application has been described in detail above, and specific examples have been used herein to explain its principles and embodiments; the description of the above embodiments is intended only to help in understanding the method of the present application and its core idea. Meanwhile, those skilled in the art may make changes to the specific embodiments and the scope of application in accordance with the ideas of the present application. In summary, the content of this specification should not be construed as limiting the present application.

Claims

1. A user characterization extraction method, comprising:

acquiring a plurality of information data of a user;

2. The user characterization extraction method according to claim 1, wherein the information data includes any one or a combination of several of basic profile information of the user, asset information of the user, investment concepts of the user, and behavior data of the user.

3. The user characterization extraction method according to claim 1 or 2, wherein the feature type includes any one or a combination of several of a continuous type, a key value type, a sequence type, and a tabular type.

4. The user characterization extraction method according to claim 3, wherein the preprocessing, by using a corresponding preprocessing module, according to the feature type of each information data includes:

5. The user characterization extraction method according to claim 3, wherein the process of collecting sequence-type information data includes:

6. The user characterization extraction method of claim 5, further comprising:

7. The user characterization extraction method according to claim 6, wherein the performing the corresponding processing on the target periodic sequence based on the preset processing manner corresponding to the type of the target periodic sequence includes:

8. A user characterization extraction device, comprising:

9. An electronic device, comprising:

a memory for storing a computer program;

a processor for executing the computer program to implement the steps of the user characterization extraction method as claimed in any one of claims 1 to 7.

10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the user characterization extraction method according to any one of claims 1 to 7.
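The abstract and claims 3 and 4 describe routing each piece of user information data to a preprocessing module chosen by its feature type (continuous, key-value, sequence, or tabular) to obtain the feature sequences fed to the language model encoder. The document does not specify what each module does, so the following Python sketch is only an illustration under stated assumptions: the function names, the token formats, and the bucket width of 10 for continuous values are all hypothetical, and the encoder training step is not shown.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


# Hypothetical preprocessing modules, one per feature type listed in claim 3.
def preprocess_continuous(value: float) -> List[str]:
    # Assumption: discretize a continuous value into a bucket token of width 10.
    return [f"cont_bucket_{int(value // 10)}"]


def preprocess_key_value(data: Dict[str, Any]) -> List[str]:
    # Assumption: one "key=value" token per entry, sorted by key for determinism.
    return [f"{k}={v}" for k, v in sorted(data.items())]


def preprocess_sequence(items: List[Any]) -> List[str]:
    # Assumption: keep events in their original (chronological) order.
    return [f"event_{item}" for item in items]


def preprocess_tabular(rows: List[Dict[str, Any]]) -> List[str]:
    # Assumption: flatten each row into "column:value" tokens, row by row.
    tokens: List[str] = []
    for row in rows:
        tokens.extend(f"{col}:{val}" for col, val in sorted(row.items()))
    return tokens


# Dispatch table: feature type -> preprocessing module.
PREPROCESSORS: Dict[str, Callable[[Any], List[str]]] = {
    "continuous": preprocess_continuous,
    "key_value": preprocess_key_value,
    "sequence": preprocess_sequence,
    "tabular": preprocess_tabular,
}


@dataclass
class InformationData:
    name: str          # e.g. "profile", "asset", "behavior"
    feature_type: str  # one of the keys in PREPROCESSORS
    payload: Any


def to_feature_sequences(items: List[InformationData]) -> Dict[str, List[str]]:
    """Preprocess each information data item with the module for its feature type."""
    sequences: Dict[str, List[str]] = {}
    for item in items:
        preprocess = PREPROCESSORS.get(item.feature_type)
        if preprocess is None:
            raise ValueError(f"unsupported feature type: {item.feature_type}")
        sequences[item.name] = preprocess(item.payload)
    return sequences


if __name__ == "__main__":
    user_data = [
        InformationData("asset", "continuous", 125.0),
        InformationData("profile", "key_value", {"risk": "medium", "age_band": "30-39"}),
        InformationData("behavior", "sequence", ["view_fund", "buy_stock"]),
    ]
    for name, seq in to_feature_sequences(user_data).items():
        print(name, seq)
```

A dispatch table keeps each preprocessing module independent, so supporting a new feature type only requires registering one more function; how the resulting sequences are tokenized for the encoder is left open here, as in the source.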