CN111210326A - Method and system for constructing user portrait - Google Patents

Publication number
CN111210326A
Authority
CN
China
Prior art keywords
data
user
portrait
constructing
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911379627.1A
Other languages
Chinese (zh)
Inventor
刘宇
陈皓
郑海洋
陈东至
季京生
董小康
张瑜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ele Cloud Information Technology Co ltd
Original Assignee
Ele Cloud Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ele Cloud Information Technology Co ltd
Priority to CN201911379627.1A
Publication of CN111210326A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/10 Tax strategies

Landscapes

  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a system for constructing a user portrait. The method comprises the following steps: acquiring original data for constructing a user portrait; preprocessing the acquired original data according to a preset processing rule to acquire a user portrait data source in a standard format, and storing and transmitting the user portrait data source according to a preset storage and transmission mode; and determining a service classification label of the user to be portrayed according to the user information of that user, and performing behavior analysis according to the determined service classification label to obtain the user label. First, the method supports tax control equipment integration and improves the accuracy of the constructed user portrait data by combining it with geographic position data; second, it combines software and hardware, introduces blockchain technology for storage and analysis, and uses local area network communication to ensure the security of the portrait data; finally, the device is deployed independently in a distributed manner and can communicate in a point-to-point mode, which improves data analysis and aggregation capability.

Description

Method and system for constructing user portrait
Technical Field
The present invention relates to the field of user portrait technology, and more particularly, to a method and system for constructing a user portrait.
Background
A user portrait, also called a user persona, is an effective tool for delineating target users and for connecting user needs with design direction, and it is widely applied in various fields. User portraits were first applied in the e-commerce field. In the big data era, user information floods the network; each concrete piece of user information is abstracted into a label, and the labels are used to make the image of the user concrete so that targeted services can be provided. In practice, the attributes, behaviors and expectations of the user are often described with the plainest, most everyday words. As a virtual representation of actual users, the personas formed by user portraits are not constructed in isolation from products and markets; they need to represent the main audience and target groups of the product.
Disclosure of Invention
The invention provides a method and a system for constructing a user portrait, which aim to solve the problem of how to construct the user portrait.
To solve the above problem, according to an aspect of the present invention, there is provided a method for constructing a user portrait, the method comprising:
acquiring original data for constructing a user portrait; wherein the original data comprises: tax data, behavior information generated by tax control equipment, position information of the tax control equipment and enterprise network data;
preprocessing the acquired original data for constructing the user portrait according to a preset processing rule to acquire a user portrait data source in a standard format, and storing and transmitting the user portrait data source according to a preset storage and transmission mode;
and determining a service classification label of the user to be portrayed according to the user information of the user to be portrayed, and performing behavior analysis according to the determined service classification label to obtain the user label.
Preferably, wherein the tax data comprises:
registration and identification data, declaration and collection data, violation data, preferential deduction data, invoice data, evaluation certification data, tax payment credit data and tax-related risk data; the behavior information includes: invoicing behavior, tax copying behavior, card clearing behavior and invoice drawing behavior.
Preferably, the preprocessing of the acquired original data for constructing the user portrait according to a preset processing rule includes:
handling missing values, abnormal values, duplicate records and noise in the acquired original data for constructing the user portrait, so as to acquire a user portrait data source in a standard format.
Preferably, the storing and transmitting of the user portrait data source according to a preset storage and transmission mode includes:
performing hardware encryption on data of a standard structure, and storing the data in a block structure using blockchain technology; wherein the data of the standard structure includes: the user portrait data source, a transaction date, and link information to a previous block;
transmitting the user portrait data source within a local area network in a point-to-point mode based on the TCP/IP protocol.
Preferably, the determining of a service classification label of the user to be portrayed according to the user information of the user to be portrayed, and the performing of behavior analysis according to the determined service classification label to obtain the user label, includes:
determining the service classification label of the user to be portrayed according to different dimension information; wherein the dimension information comprises: basic conditions, business requirements and demand concerns;
extracting the service classification labels of the users in an unsupervised manner, establishing a bag of words for the label of each sample, and training the bags of words, wherein the training process is the process of collecting the bag of words of each label; vectorizing the training samples and training with a machine learning classification model or a seq2seq deep model to obtain more accurate bags of words, and labeling according to the portrait dimensions of the user to obtain the user label.
In accordance with another aspect of the present invention, there is provided a system for constructing a user portrait, the system comprising:
a data acquisition unit, used for acquiring original data for constructing a user portrait; wherein the original data comprises: tax data, behavior information generated by tax control equipment, position information of the tax control equipment and enterprise network data;
a data processing unit, used for preprocessing the acquired original data for constructing the user portrait according to a preset processing rule to acquire a user portrait data source in a standard format, and for storing and transmitting the user portrait data source according to a preset storage and transmission mode;
and a portrait analysis unit, used for determining a service classification label of the user to be portrayed according to the user information of the user to be portrayed, and for performing behavior analysis according to the determined service classification label to obtain the user label.
Preferably, wherein the tax data comprises:
registration and identification data, declaration and collection data, violation data, preferential deduction data, invoice data, evaluation certification data, tax payment credit data and tax-related risk data; the behavior information includes: invoicing behavior, tax copying behavior, card clearing behavior and invoice drawing behavior.
Preferably, the data processing unit preprocessing the acquired original data for constructing the user portrait according to a preset processing rule includes:
handling missing values, abnormal values, duplicate records and noise in the acquired original data for constructing the user portrait, so as to acquire a user portrait data source in a standard format.
Preferably, the data processing unit storing and transmitting the user portrait data source according to a preset storage and transmission mode includes:
performing hardware encryption on data of a standard structure, and storing the data in a block structure using blockchain technology; wherein the data of the standard structure includes: the user portrait data source, a transaction date, and link information to a previous block;
transmitting the user portrait data source within a local area network in a point-to-point mode based on the TCP/IP protocol.
Preferably, the portrait analysis unit determining a service classification label of the user to be portrayed according to the user information of the user to be portrayed, and performing behavior analysis according to the determined service classification label to obtain the user label, includes:
determining the service classification label of the user to be portrayed according to different dimension information; wherein the dimension information comprises: basic conditions, business requirements and demand concerns;
extracting the service classification labels of the users in an unsupervised manner, establishing a bag of words for the label of each sample, and training the bags of words, wherein the training process is the process of collecting the bag of words of each label; vectorizing the training samples and training with a machine learning classification model or a seq2seq deep model to obtain more accurate bags of words, and labeling according to the portrait dimensions of the user to obtain the user label.
The invention provides a method and a system for constructing a user portrait, comprising: acquiring original data for constructing a user portrait; preprocessing the acquired original data according to a preset processing rule to acquire a user portrait data source in a standard format; and determining a service classification label of the user to be portrayed according to the user information of that user, and performing behavior analysis according to the determined service classification label to obtain the user label. First, the system supports tax control equipment integration and improves the accuracy of the constructed user portrait data by combining it with geographic position data; second, it combines software and hardware, introduces blockchain technology for storage and analysis, and uses local area network communication to ensure the security of the portrait data; finally, the device is deployed independently in a distributed manner and can communicate in a point-to-point mode, which improves data analysis and aggregation capability.
Drawings
A more complete understanding of exemplary embodiments of the present invention may be had by reference to the following drawings in which:
FIG. 1 is a flow diagram of a method 100 for constructing a user portrait according to an embodiment of the present invention; and
FIG. 2 is a block diagram of a system 200 for constructing a user portrait according to an embodiment of the present invention.
Detailed Description
The exemplary embodiments of the present invention will now be described with reference to the accompanying drawings. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described herein, which are provided so that this disclosure is thorough and complete and fully conveys the scope of the present invention to those skilled in the art. The terminology used in the exemplary embodiments illustrated in the accompanying drawings is not intended to limit the invention. In the drawings, the same units/elements are denoted by the same reference numerals.
Unless otherwise defined, terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Further, it will be understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense.
FIG. 1 is a flow diagram of a method 100 for constructing a user portrait in accordance with an embodiment of the present invention. As shown in FIG. 1, the method for constructing a user portrait provided by the embodiment of the present invention first supports tax control equipment integration and improves the accuracy of the constructed user portrait data by combining it with geographic position data; second, it combines software and hardware, introduces blockchain technology for storage and analysis, and uses local area network communication to ensure the security of the portrait data; finally, the device is deployed independently in a distributed manner and can communicate in a point-to-point mode, which improves data analysis and aggregation capability. The method 100 starts with step 101, in which original data for constructing a user portrait is acquired; wherein the original data comprises: tax data, behavior information generated by tax control equipment, position information of the tax control equipment and enterprise network data.
Preferably, the tax data comprises: registration and identification data, declaration and collection data, violation data, preferential deduction data, invoice data, evaluation certification data, tax payment credit data and tax-related risk data; the behavior information includes: invoicing behavior, tax copying behavior, card clearing behavior and invoice drawing behavior.
In an embodiment of the invention, the collected data comprises: registration and identification data, declaration and collection data, violation data, preferential deduction data, invoice data, evaluation certification data, tax payment credit data, tax-related risk data and the like. For tax control equipment connected to the system, the behavior information generated by the tax control equipment is collected automatically; the position information of the tax control equipment is acquired according to a preset time threshold; and enterprise network data is collected by a crawler. The behavior information includes: invoicing, tax copying, card clearing and invoice drawing.
In step 102, the acquired original data for constructing the user portrait is preprocessed according to a preset processing rule to acquire a user portrait data source in a standard format, and the user portrait data source is stored and transmitted according to a preset storage and transmission mode.
Preferably, the preprocessing of the acquired original data for constructing the user portrait according to a preset processing rule includes:
handling missing values, abnormal values, duplicate records and noise in the acquired original data for constructing the user portrait, so as to acquire a user portrait data source in a standard format.
Preferably, the storing and transmitting of the user portrait data source according to a preset storage and transmission mode includes:
performing hardware encryption on data of a standard structure, and storing the data in a block structure using blockchain technology; wherein the data of the standard structure includes: the user portrait data source, a transaction date, and link information to a previous block;
transmitting the user portrait data source within a local area network in a point-to-point mode based on the TCP/IP protocol.
In the embodiment of the invention, exploratory analysis is first carried out on the acquired data. During data mining, Python scientific computing libraries are mainly used for preliminary data exploration, covering data types, missing values, data set size and the distribution of the data under each feature; a third-party plotting library is used for visual inspection to obtain the basic attributes and distribution of the data. In addition, the relationships among the features in the data set can be preliminarily explored through univariate and multivariate analysis to verify the hypotheses put forward in the business analysis stage.
During data processing, missing values first need to be identified. Missing values in a data set can be found directly with the methods provided by pandas. Most data sets contain missing values, so the quality of missing-value handling directly affects the final result of the model. Missing values are therefore handled mainly according to the importance of the attribute in which they occur and the distribution of the missing values.
① When the missing rate is low and the attribute is of low importance: if the attribute is numerical, it can simply be filled according to the data distribution. For example, if the data is evenly distributed it can be filled with the mean, and if the distribution is skewed it can be filled with the median. If the attribute is categorical, it can be filled with a global constant such as 'Unknown', but this is often less effective, because the algorithm may treat the constant as a brand-new category, so it is rarely used.
② When the missing rate is high (> 95%) and the attribute is of low importance, the attribute can simply be deleted. However, when both the missing rate and the importance of the attribute are high, deleting the attribute directly will adversely affect the result of the algorithm.
③ When both the missing rate and the attribute importance are high, the data is filled in by interpolation or by modeling.
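The simple cases ① and ② above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the implementation of the invention; the function names, the use of None to mark a missing value, and the column-dictionary layout are assumptions made for the example.

```python
from statistics import mean, median

def fill_numeric(values, skewed=False):
    """Fill None entries with the mean (even distribution) or the median (skewed)."""
    observed = [v for v in values if v is not None]
    fill = median(observed) if skewed else mean(observed)
    return [fill if v is None else v for v in values]

def fill_categorical(values, constant="Unknown"):
    """Fill None entries of a categorical attribute with a global constant."""
    return [constant if v is None else v for v in values]

def missing_rate(values):
    """Fraction of entries that are missing (values must be non-empty)."""
    return sum(v is None for v in values) / len(values)

def drop_sparse_columns(table, threshold=0.95):
    """Drop low-importance columns whose missing rate exceeds the threshold (case 2)."""
    return {name: col for name, col in table.items()
            if missing_rate(col) <= threshold}

table = {"revenue": [1, 2, None, 4], "fax": [None, None, None, None]}
print(list(drop_sparse_columns(table)))        # the all-missing column is dropped
print(fill_numeric([1, None, 2, 100], skewed=True))
```

Whether a distribution counts as skewed, and whether an attribute is important enough to keep, are decisions the caller has to make; the sketch only applies the chosen rule.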
The interpolation methods mainly include random interpolation, multiple imputation, hot-deck imputation, Lagrange interpolation and Newton interpolation. Random interpolation randomly draws samples from the population to replace the missing samples. Multiple imputation predicts the missing data from the relationships between variables, generates several complete data sets using a Monte Carlo method, analyzes each data set, and finally pools the analysis results. Hot-deck imputation finds, in the non-missing data, a sample similar to the one containing the missing value (a matching sample) and fills in the missing value with the value observed in that sample. The advantages of interpolation are that it is simple, easy to implement and highly accurate; the disadvantage is that when there are many variables it is often difficult to find a sample that exactly matches the one to be filled. In that case, the data can be stratified by some variable and mean imputation applied to the missing values within each layer.
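The stratified mean imputation mentioned at the end of the paragraph above might look like the sketch below. The record layout and the field names "size" and "revenue" are hypothetical, and None is assumed to mark a missing value.

```python
from collections import defaultdict
from statistics import mean

def stratified_mean_impute(records, layer_key, target_key):
    """Fill missing target values with the mean of the records in the same layer."""
    by_layer = defaultdict(list)
    for rec in records:
        if rec[target_key] is not None:
            by_layer[rec[layer_key]].append(rec[target_key])
    layer_means = {layer: mean(vals) for layer, vals in by_layer.items()}

    filled = []
    for rec in records:
        rec = dict(rec)  # copy so the input records are left untouched
        if rec[target_key] is None and rec[layer_key] in layer_means:
            rec[target_key] = layer_means[rec[layer_key]]
        filled.append(rec)
    return filled

records = [
    {"size": "small", "revenue": 10},
    {"size": "small", "revenue": None},
    {"size": "small", "revenue": 20},
    {"size": "large", "revenue": 100},
    {"size": "large", "revenue": None},
]
for row in stratified_mean_impute(records, "size", "revenue"):
    print(row)
```

A layer with no observed values has no mean, so its missing entries are left as None rather than guessed.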
The modeling method predicts the missing data using models such as regression, Bayesian models, random forests and decision trees. For example, a decision tree can be built that predicts the missing values from the other attributes in the data set. In general, there is no uniform procedure for handling missing values; the method must be chosen according to the distribution of the actual data, its degree of skew, the proportion of missing values, and so on. In data preprocessing, apart from simple filling and deletion, modeling is used in most cases, mainly because it predicts the unknown values from the existing values and is therefore more accurate. However, modeling may also increase the correlation between attributes, which may affect the training of the final model.
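As a small illustration of the modeling approach, the sketch below uses the simplest of the listed models, a one-variable least-squares regression, instead of a decision tree. The helper names and the keys "x" and "y" are hypothetical, and at least two distinct x values are assumed among the complete rows.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def impute_by_regression(rows, x_key, y_key):
    """Predict missing y values from x using a line fitted on the complete rows."""
    known = [(r[x_key], r[y_key]) for r in rows if r[y_key] is not None]
    slope, intercept = fit_line([x for x, _ in known], [y for _, y in known])
    result = []
    for r in rows:
        r = dict(r)  # copy so the input rows are left untouched
        if r[y_key] is None:
            r[y_key] = slope * r[x_key] + intercept
        result.append(r)
    return result

rows = [{"x": 1, "y": 3}, {"x": 2, "y": 5}, {"x": 3, "y": None}, {"x": 4, "y": 9}]
print(impute_by_regression(rows, "x", "y")[2])
```

Because the filled value is computed from x, the imputed rows lie exactly on the fitted line, which is one way the correlation between attributes can be inflated, as noted above.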
Abnormal values (outliers) can be handled in the following ways: 1> Delete the outliers: values that are obviously abnormal and few in number can be deleted directly. 2> Leave them untreated: if the algorithm is not sensitive to outliers, they may be left as they are; if the algorithm is sensitive to outliers, for example distance-based algorithms such as k-means and kNN, this approach should be avoided. 3> Replace with the mean: little information is lost, and the approach is simple and efficient. 4> Treat as missing values: handle them with the missing-value methods described above.
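Option 3> (mean substitution) can be sketched as below. The text does not say how outliers are detected, so the rule used here, a distance from the median greater than k times the median absolute deviation, is purely an assumption for the example, as are the function name and the default k.

```python
from statistics import mean, median

def replace_outliers_with_mean(values, k=5.0):
    """Replace values far from the median with the mean of the remaining values."""
    center = median(values)
    mad = median([abs(v - center) for v in values])  # median absolute deviation
    if mad == 0:
        return list(values)  # no spread to measure against; leave data unchanged
    inliers = [v for v in values if abs(v - center) <= k * mad]
    substitute = mean(inliers)
    return [v if abs(v - center) <= k * mad else substitute for v in values]

print(replace_outliers_with_mean([10, 12, 11, 13, 12, 11, 300]))
```

The substitute is the mean of the values that were not flagged, so the outlier itself does not distort the replacement.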
For duplicate records, the basic idea is "sort and merge": the records in the data set are first sorted according to a certain rule, and duplicates are then detected by checking whether neighboring records are similar. This involves two operations: sorting and similarity computation. In data science competitions, duplicates are currently mostly identified with a duplicate-detection method (for example, duplicated in pandas), and the repeated samples are then simply deleted.
The blogs and overseas competition cases seen so far basically handle duplicates by direct deletion; no more creative method has been observed.
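The "sort and merge" idea described above can be sketched as follows. The similarity measure (difflib.SequenceMatcher from the standard library) and the 0.9 threshold are assumptions chosen for the example; the text does not prescribe either.

```python
from difflib import SequenceMatcher

def is_similar(a, b, threshold=0.9):
    """Treat two strings as duplicates when their similarity ratio is high enough."""
    return SequenceMatcher(None, a, b).ratio() >= threshold

def sort_and_merge(records, threshold=0.9):
    """Sort the records, then drop each record that resembles the last kept one."""
    kept = []
    for rec in sorted(records):
        if kept and is_similar(kept[-1], rec, threshold):
            continue  # near-duplicate of the previously kept neighbor
        kept.append(rec)
    return kept

names = [
    "Acme Trading Co Ltd",
    "Beta Logistics",
    "Acme Trading Co Ltd",
    "Acme Trading Co. Ltd",
]
print(sort_and_merge(names))
```

Sorting brings likely duplicates next to each other, so only neighboring records need to be compared instead of every pair.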
Noise is the random error or variance of a measured variable, and it should be distinguished from outliers. The relationship is given by the formula: Measurement = True Data + Noise. Outliers are observations; they may be produced by the true data or caused by noise, but in general they are observations that differ significantly from most other observations. Noise includes erroneous values and isolated values that deviate from what is expected, but noise points cannot be said to contain the outliers, even though most data mining methods discard outliers as noise or anomalies. In some applications (e.g., fraud detection), however, outlier analysis or anomaly mining is performed specifically on the outliers. In addition, some points are outliers locally but are normal from a global perspective.
Noise is mainly handled by binning and by regression. Binning smooths ordered data values by consulting their "neighbors", that is, the values around them. The ordered values are distributed into a number of "buckets", or bins. Because binning consults neighboring values, it performs local smoothing. Smoothing by bin means: each value in a bin is replaced by the mean of the bin. Smoothing by bin medians: each value in a bin is replaced by the median of the bin. Smoothing by bin boundaries: the minimum and maximum values in a bin are taken as the bin boundaries, and each value in the bin is replaced by the nearest boundary value. In general, the greater the bin width, the more pronounced the smoothing effect. Bins may also be of equal width, in which case the interval range of each bin is a constant. Binning can also be used as a discretization technique. With regression, the data is smoothed by fitting it to a function. Linear regression finds the "best" straight line fitting two attributes (or variables) so that one attribute can be used to predict the other. Multiple linear regression is an extension of linear regression that involves more than two attributes and fits the data to a multidimensional surface. Regression yields a mathematical equation that fits the data, which helps to eliminate noise.
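The three binning variants described above can be illustrated with a short sketch. It assumes equal-frequency bins of a fixed size; the function names and the sample values are made up for the example.

```python
from statistics import mean, median

def equal_frequency_bins(values, bin_size):
    """Sort the values and split them into consecutive bins of bin_size items."""
    ordered = sorted(values)
    return [ordered[i:i + bin_size] for i in range(0, len(ordered), bin_size)]

def smooth_by_means(bins):
    """Replace every value in a bin by the bin mean."""
    return [[mean(b)] * len(b) for b in bins]

def smooth_by_medians(bins):
    """Replace every value in a bin by the bin median."""
    return [[median(b)] * len(b) for b in bins]

def smooth_by_boundaries(bins):
    """Replace every value by the nearer of the bin minimum and maximum."""
    smoothed = []
    for b in bins:
        low, high = b[0], b[-1]  # bins are sorted, so these are min and max
        smoothed.append([low if v - low <= high - v else high for v in b])
    return smoothed

bins = equal_frequency_bins([4, 8, 15, 21, 21, 24, 25, 28, 34], 3)
print(smooth_by_means(bins))
print(smooth_by_medians(bins))
print(smooth_by_boundaries(bins))
```

A value exactly halfway between the two boundaries is assigned to the lower one here; that tie-breaking rule is a choice of the sketch, not something the text specifies.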
The user portrait data source obtained after preprocessing is hardware-encrypted and stored in a block structure using blockchain technology. Each block also contains details such as the transaction date and the link to the previous block. The system implementing the method of the invention is deployed in a distributed manner, uses the TCP/IP protocol, communicates in a point-to-point mode, and transmits data within a local area network (LAN).
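The block structure described above (portrait data, transaction date, link to the previous block) can be sketched as follows. The use of a SHA-256 hash as the link, the JSON serialization and all field names are assumptions made for illustration; hardware encryption and the point-to-point transmission are outside the scope of this sketch.

```python
import hashlib
import json

FIELDS = ("data", "transaction_date", "previous_hash")

def _digest(body):
    """SHA-256 over a canonical JSON serialization of the block body."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def make_block(portrait_data, transaction_date, previous_hash):
    """Build a block holding the data, the date and the link to the previous block."""
    body = {
        "data": portrait_data,
        "transaction_date": transaction_date,
        "previous_hash": previous_hash,
    }
    return dict(body, hash=_digest(body))

def verify_chain(chain):
    """Check every block's own hash and its link to the preceding block."""
    for i, block in enumerate(chain):
        if block["hash"] != _digest({k: block[k] for k in FIELDS}):
            return False
        if i > 0 and block["previous_hash"] != chain[i - 1]["hash"]:
            return False
    return True

first = make_block({"labels": ["small enterprise"]}, "2020-01-01", None)
second = make_block({"labels": ["low noise"]}, "2020-01-02", first["hash"])
print(verify_chain([first, second]))
```

Because each block stores the hash of its predecessor, changing the data in an earlier block invalidates that block's hash and breaks verification of the chain.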
In step 103, a service classification label of the user to be portrayed is determined according to the user information of the user to be portrayed, and behavior analysis is performed according to the determined service classification label to obtain a user label.
Preferably, the determining of a service classification label of the user to be portrayed according to the user information of the user to be portrayed, and the performing of behavior analysis according to the determined service classification label to obtain the user label, includes:
determining the service classification label of the user to be portrayed according to different dimension information; wherein the dimension information comprises: basic conditions, business requirements and demand concerns;
extracting the service classification labels of the users in an unsupervised manner, establishing a bag of words for the label of each sample, and training the bags of words, wherein the training process is the process of collecting the bag of words of each label; vectorizing the training samples and training with a machine learning classification model or a seq2seq deep model to obtain more accurate bags of words, and labeling according to the portrait dimensions of the user to obtain the user label.
In the embodiment of the present invention, portrait analysis mainly consists of two processes: behavior analysis and portrait generation. Behavior analysis determines the basic class labels and the analysis class labels of the user to be portrayed and performs behavior analysis; at the same time, data of related enterprises (upstream and downstream of the industrial chain, companies in the same group) is combined in the analysis. The set classification labels cover the following dimensions: basic conditions, such as company nature, establishment time, enterprise size, registered capital, investment situation, patents, trademarks and office location; operating conditions, such as business income and products and services; service requirements, such as "medium-performance equipment needed"; and demand concerns, such as "device durability" and "low noise". The analysis process is as follows: enterprise service labels are extracted in an unsupervised manner; a bag of words is established for the label of each sample and the bags of words are trained, the training process being the process of collecting the bag of words of each label; the training samples are then vectorized and trained with a machine learning classification model or a seq2seq deep model to obtain more accurate bags of words. The analysis results are labeled according to the dimensions of the enterprise portrait and finally collated, the user labels are output, and the user portrait is completed.
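The bag-of-words collection step can be illustrated with a deliberately simplified sketch. It only collects word counts per label and assigns the label whose bag overlaps most with a new text; it is not the machine learning classifier or seq2seq model mentioned above, and the sample texts, function names and whitespace tokenization are assumptions.

```python
from collections import Counter

def build_bags(samples):
    """Collect a bag of words (word counts) for each label from (label, text) pairs."""
    bags = {}
    for label, text in samples:
        bags.setdefault(label, Counter()).update(text.lower().split())
    return bags

def assign_label(text, bags):
    """Return the label whose bag overlaps most with the text, or None if no overlap."""
    words = text.lower().split()
    scores = {label: sum(bag[w] for w in words) for label, bag in bags.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

samples = [
    ("service requirements", "medium performance equipment needed"),
    ("demand concerns", "device durability low noise"),
]
bags = build_bags(samples)
print(assign_label("low noise device", bags))
```

In a fuller pipeline the per-label bags would serve as a starting vocabulary for vectorizing the samples before a classifier is trained on them.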
According to the embodiment of the invention, user tax data is collected, combined with associated user data and analyzed to construct a basic user portrait, which can improve the accuracy of user portrait construction. The system combines software and hardware, uses blockchain technology for storage and analysis, supports local area network communication, uses independent devices in a distributed deployment, and transmits in a point-to-point mode, which can improve the security and efficiency of user portrait construction.
FIG. 2 is a block diagram of a system 200 for constructing a user portrait according to an embodiment of the invention. As shown in FIG. 2, embodiments of the present invention provide a system 200 for constructing a user portrait, comprising: a data acquisition unit 201, a data processing unit 202, and a portrait analysis unit 203.
Preferably, the data acquisition unit 201 is configured to acquire original data for constructing a user portrait; wherein the original data comprises: tax data, behavior information generated by tax control equipment, position information of the tax control equipment and enterprise network data.
Preferably, the tax data comprises: registration and identification data, declaration and collection data, violation data, preferential deduction data, invoice data, evaluation certification data, tax payment credit data and tax-related risk data; the behavior information includes: invoicing behavior, tax copying behavior, card clearing behavior and invoice drawing behavior.
Preferably, the data processing unit 202 is configured to pre-process the acquired raw data for constructing the user portrait according to a preset processing rule to acquire a user portrait data source in a standard format, and store and transmit the user portrait data source according to a preset storage and transmission manner.
Preferably, the data processing unit 202 preprocessing the acquired original data for constructing the user portrait according to a preset processing rule includes:
handling missing values, abnormal values, duplicate records and noise in the acquired original data for constructing the user portrait, so as to acquire a user portrait data source in a standard format.
Preferably, the data processing unit 202 storing and transmitting the user portrait data source according to a preset storage and transmission mode includes:
performing hardware encryption on data of a standard structure, and storing the data in a block structure using blockchain technology; wherein the data of the standard structure includes: the user portrait data source, a transaction date, and link information to a previous block;
transmitting the user portrait data source within a local area network in a point-to-point mode based on the TCP/IP protocol.
Preferably, the portrait analysis unit 203 is configured to determine a service classification label of the user to be portrayed according to the user information of the user to be portrayed, and to perform behavior analysis according to the determined service classification label to obtain the user label.
Preferably, the determining, by the portrait analysis unit 203, of a service classification label of the user to be portrayed according to the user information of the user to be portrayed, and the performing of behavior analysis according to the determined service classification label to obtain the user label, include:
determining the service classification label of the user to be portrayed according to information in different dimensions, wherein the dimension information comprises: basic conditions, business requirements, and demand concerns; and
extracting the service classification labels of users in an unsupervised manner, establishing a bag of words for the label of each sample, and training the bags of words, wherein the training process is the process of collecting the bag of words for each label; vectorizing the training samples and training them with a machine learning classification model or a seq2seq deep model to obtain more accurate bags of words; and labeling according to the portrait dimensions of the user to obtain the user label.
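The idea of collecting a bag of words per label and vectorizing samples can be sketched in Python as follows. This is a simplified illustration, not the method of the invention: whitespace tokenization, the sample texts, and the function names are assumptions, and a plain word-overlap score stands in for the machine learning classification model or seq2seq deep model named in the text.

```python
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokenization; real text would need a
    # language-appropriate word segmenter.
    return text.lower().split()


def collect_bags(samples):
    """samples: iterable of (text, label). Returns {label: Counter of words}."""
    bags = {}
    for text, label in samples:
        bags.setdefault(label, Counter()).update(tokenize(text))
    return bags


def vectorize(text, vocabulary):
    """Turn a text into a count vector over a fixed, ordered vocabulary."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


def predict_label(text, bags):
    """Pick the label whose bag of words overlaps most with the text."""
    counts = Counter(tokenize(text))

    def score(label):
        bag = bags[label]
        total = sum(bag.values()) or 1  # avoid division by zero
        return sum(n * bag[word] / total for word, n in counts.items())

    # Sorting makes tie-breaking deterministic.
    return max(sorted(bags), key=score)
```

In a fuller pipeline the vectors produced by `vectorize` would be the input to a trained classifier; the overlap score here only shows how the per-label bags of words can drive label assignment.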
The system 200 for constructing a user portrait according to this embodiment of the present invention corresponds to the method 100 for constructing a user portrait according to another embodiment of the present invention, and is not described again here.
The invention has been described with reference to a few embodiments. However, other embodiments of the invention than the one disclosed above are equally possible within the scope of the invention, as would be apparent to a person skilled in the art from the appended patent claims.
Generally, all terms used in the claims are to be interpreted according to their ordinary meaning in the technical field, unless explicitly defined otherwise herein. All references to "a/an/the [ device, component, etc ]" are to be interpreted openly as referring to at least one instance of said device, component, etc., unless explicitly stated otherwise. The steps of any method disclosed herein do not have to be performed in the exact order disclosed, unless explicitly stated.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.

Claims (10)

1. A method for constructing a user portrait, the method comprising:
acquiring raw data for constructing a user portrait, wherein the raw data comprises: tax data, behavior information generated by tax control equipment, position information of the tax control equipment, and enterprise network data;
preprocessing the acquired raw data for constructing the user portrait according to a preset processing rule to obtain a user portrait data source in a standard format, and storing and transmitting the user portrait data source according to a preset storage and transmission manner; and
determining a service classification label of the user to be portrayed according to the user information of the user to be portrayed, and performing behavior analysis according to the determined service classification label to obtain the user label.
2. The method of claim 1, wherein the tax data comprises:
registration and identification data, declaration and collection data, violation data, preferential deduction data, invoice data, evaluation and certification data, tax payment credit data, and tax-related risk data; and the behavior information comprises: billing behavior, tax copying behavior, card clearing behavior, and invoice drawing behavior.
3. The method of claim 1, wherein preprocessing the acquired raw data for constructing the user portrait according to a preset processing rule comprises:
performing missing value processing, abnormal value processing, de-duplication processing, and noise processing on the acquired raw data for constructing the user portrait to obtain a user portrait data source in a standard format.
4. The method of claim 1, wherein storing and transmitting the user portrait data source according to a preset storage and transmission manner comprises:
applying hardware encryption to data having a standard structure, and storing the data in a block structure using blockchain technology, wherein the data of the standard structure includes: the user portrait data source, a transaction date, and link information to the previous block; and
transmitting the user portrait data source within a local area network in a point-to-point mode based on the TCP/IP protocol.
5. The method of claim 1, wherein determining a service classification label of the user to be portrayed according to the user information of the user to be portrayed, and performing behavior analysis according to the determined service classification label to obtain the user label, comprises:
determining the service classification label of the user to be portrayed according to information in different dimensions, wherein the dimension information comprises: basic conditions, business requirements, and demand concerns; and
extracting the service classification labels of users in an unsupervised manner, establishing a bag of words for the label of each sample, and training the bags of words, wherein the training process is the process of collecting the bag of words for each label; vectorizing the training samples and training them with a machine learning classification model or a seq2seq deep model to obtain more accurate bags of words; and labeling according to the portrait dimensions of the user to obtain the user label.
6. A system for constructing a user portrait, the system comprising:
a data acquisition unit configured to acquire raw data for constructing a user portrait, wherein the raw data comprises: tax data, behavior information generated by tax control equipment, position information of the tax control equipment, and enterprise network data;
a data processing unit configured to preprocess the acquired raw data for constructing the user portrait according to a preset processing rule to obtain a user portrait data source in a standard format, and to store and transmit the user portrait data source according to a preset storage and transmission manner; and
a portrait analysis unit configured to determine a service classification label of the user to be portrayed according to the user information of the user to be portrayed, and to perform behavior analysis according to the determined service classification label to obtain the user label.
7. The system of claim 6, wherein the tax data comprises:
registration and identification data, declaration and collection data, violation data, preferential deduction data, invoice data, evaluation and certification data, tax payment credit data, and tax-related risk data; and the behavior information comprises: billing behavior, tax copying behavior, card clearing behavior, and invoice drawing behavior.
8. The system of claim 6, wherein the preprocessing, by the data processing unit, of the acquired raw data for constructing the user portrait according to a preset processing rule comprises:
performing missing value processing, abnormal value processing, de-duplication processing, and noise processing on the acquired raw data for constructing the user portrait to obtain a user portrait data source in a standard format.
9. The system of claim 6, wherein the storing and transmitting, by the data processing unit, of the user portrait data source according to a preset storage and transmission manner comprises:
applying hardware encryption to data having a standard structure, and storing the data in a block structure using blockchain technology, wherein the data of the standard structure includes: the user portrait data source, a transaction date, and link information to the previous block; and
transmitting the user portrait data source within a local area network in a point-to-point mode based on the TCP/IP protocol.
10. The system of claim 6, wherein the determining, by the portrait analysis unit, of a service classification label of the user to be portrayed according to the user information of the user to be portrayed, and the performing of behavior analysis according to the determined service classification label to obtain the user label, comprise:
determining the service classification label of the user to be portrayed according to information in different dimensions, wherein the dimension information comprises: basic conditions, business requirements, and demand concerns; and
extracting the service classification labels of users in an unsupervised manner, establishing a bag of words for the label of each sample, and training the bags of words, wherein the training process is the process of collecting the bag of words for each label; vectorizing the training samples and training them with a machine learning classification model or a seq2seq deep model to obtain more accurate bags of words; and labeling according to the portrait dimensions of the user to obtain the user label.
CN201911379627.1A 2019-12-27 2019-12-27 Method and system for constructing user portrait Pending CN111210326A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911379627.1A CN111210326A (en) 2019-12-27 2019-12-27 Method and system for constructing user portrait

Publications (1)

Publication Number Publication Date
CN111210326A true CN111210326A (en) 2020-05-29

Family

ID=70785821

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911379627.1A Pending CN111210326A (en) 2019-12-27 2019-12-27 Method and system for constructing user portrait

Country Status (1)

Country Link
CN (1) CN111210326A (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060085201A1 (en) * 2002-02-25 2006-04-20 Tarek Sultan Customs inspection and data processing system and method thereof for web-based processing of customs information
US20130151439A1 (en) * 2011-12-07 2013-06-13 Daniel O. Galaska Systems and Methods for Automated, User-Specific, Location-Based, Comprehensive Tax Burden Calculation, Analysis, and Display from a Personal Financial Profile
CN105608171A (en) * 2015-12-22 2016-05-25 青岛海贝易通信息技术有限公司 User portrait construction method
CN106874266A (en) * 2015-12-10 2017-06-20 中国电信股份有限公司 User's portrait method and the device for user's portrait
CN106897402A (en) * 2017-02-13 2017-06-27 山大地纬软件股份有限公司 The method and user's portrait maker of user's portrait are built based on social security data
CN107784532A (en) * 2017-10-23 2018-03-09 百望金赋科技有限公司 High in the clouds billing system and billing method based on integrated tax control machine
CN108460100A (en) * 2018-02-02 2018-08-28 方欣科技有限公司 A kind of user draws a portrait construction method and device
CN108596679A (en) * 2018-04-27 2018-09-28 中国联合网络通信集团有限公司 Construction method, device, terminal and the computer readable storage medium of user's portrait
CN109658478A (en) * 2017-10-10 2019-04-19 爱信诺征信有限公司 It is a kind of that the method and system of enterprise's portrait are provided
CN109872173A (en) * 2017-12-04 2019-06-11 北京京东尚科信息技术有限公司 Construct method, system and the terminal device of user's portrait label
CN109903097A (en) * 2019-03-05 2019-06-18 云南电网有限责任公司信息中心 A kind of user draws a portrait construction method and user draws a portrait construction device
CN109901869A (en) * 2019-01-25 2019-06-18 中国电子科技集团公司第三十研究所 A kind of computer program classification method based on bag of words
CN109934619A (en) * 2019-02-13 2019-06-25 北京三快在线科技有限公司 User's portrait tag modeling method, apparatus, electronic equipment and readable storage medium storing program for executing
CN109993644A (en) * 2017-12-29 2019-07-09 航天信息股份有限公司 A kind of portrait determines method, apparatus, electronic equipment and storage medium

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111626792A (en) * 2020-06-01 2020-09-04 长沙理工大学 Technology for accurately portraying load storage of comprehensive energy source in distribution network
CN111723257B (en) * 2020-06-24 2023-05-02 山东建筑大学 User portrayal method and system based on water usage rule
CN111723257A (en) * 2020-06-24 2020-09-29 山东建筑大学 User portrait drawing method and system based on water usage law
CN112036997A (en) * 2020-08-28 2020-12-04 山东浪潮商用系统有限公司 Method and device for predicting abnormal user in taxpayer
CN112036997B (en) * 2020-08-28 2023-08-04 浪潮软件科技有限公司 Method and device for predicting abnormal users in taxpayers
CN112131475A (en) * 2020-09-25 2020-12-25 重庆邮电大学 Interpretable and interactive user portrait method and device
CN112131475B (en) * 2020-09-25 2023-10-10 重庆邮电大学 Interpretable and interactive user portrayal method and device
CN112256640A (en) * 2020-09-28 2021-01-22 福建慧政通信息科技有限公司 File user portrait information processing method and storage device based on service scene
CN112182391A (en) * 2020-09-30 2021-01-05 北京神州泰岳智能数据技术有限公司 User portrait drawing method and device
CN112257803A (en) * 2020-10-30 2021-01-22 青岛东软载波科技股份有限公司 Intelligent analysis method and system for transformer area faults
CN112560054A (en) * 2020-12-14 2021-03-26 珠海格力电器股份有限公司 User data processing method and device, electronic equipment and storage medium
CN112613902A (en) * 2020-12-15 2021-04-06 航天信息股份有限公司 Method and system for establishing user portrait
CN112613902B (en) * 2020-12-15 2024-06-07 航天信息股份有限公司 Method and system for establishing user portrait
CN112861003A (en) * 2021-02-19 2021-05-28 杭州谐云科技有限公司 User portrait construction method and system based on cloud edge collaboration
CN113486041A (en) * 2021-08-02 2021-10-08 南京邮电大学 Client portrait management method and system based on block chain
CN113486041B (en) * 2021-08-02 2022-04-15 南京邮电大学 Client portrait management method and system based on block chain
CN114119058A (en) * 2021-08-10 2022-03-01 国家电网有限公司 User portrait model construction method and device and storage medium
CN114119058B (en) * 2021-08-10 2023-09-26 国家电网有限公司 User portrait model construction method, device and storage medium
CN114841589A (en) * 2022-05-17 2022-08-02 国网浙江省电力有限公司舟山供电公司 Potential safety hazard information code generation method for electric power member violation portrait and safety portrait
CN114841589B (en) * 2022-05-17 2022-12-06 国网浙江省电力有限公司舟山供电公司 Potential safety hazard information code generation method for electric power member violation portrait and safety portrait

Similar Documents

Publication Publication Date Title
CN111210326A (en) Method and system for constructing user portrait
CN110223168B (en) Label propagation anti-fraud detection method and system based on enterprise relationship map
US10367888B2 (en) Cloud process for rapid data investigation and data integrity analysis
CN106022900B (en) User risk data mining method and device
CN110400215B (en) Method and system for constructing enterprise family-oriented small micro enterprise credit assessment model
CN111222976B (en) Risk prediction method and device based on network map data of two parties and electronic equipment
CN101493913A (en) Method and system for assessing user credit in internet
CN110060087B (en) Abnormal data detection method, device and server
CN110647522A (en) Data mining method, device and system
CN112070615A (en) Financial product recommendation method and device based on knowledge graph
US20190080352A1 (en) Segment Extension Based on Lookalike Selection
CN112329874A (en) Data service decision method and device, electronic equipment and storage medium
Kwon et al. User profiling via application usage pattern on digital devices for digital forensics
CN111738843A (en) Quantitative risk evaluation system and method using running water data
CN112217908B (en) Information pushing method and device based on transfer learning and computer equipment
CN112581291B (en) Risk assessment change detection method, apparatus, device and storage medium
CN113570437A (en) Product recommendation method and device
CN116739605A (en) Transaction data detection method, device, equipment and storage medium
CN110796381A (en) Method and device for processing evaluation indexes of modeling data, terminal equipment and medium
CN115471258A (en) Violation behavior detection method and device, electronic equipment and storage medium
CN116126642A (en) Information processing method, device, equipment and storage medium
CN115358878A (en) Financing user risk preference level analysis method and device
CN114723554A (en) Abnormal account identification method and device
CN112506930B (en) Data insight system based on machine learning technology
CN114331463A (en) Risk identification method based on linear regression model and related equipment thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination