CN113901318A - User portrait construction system - Google Patents
- Publication number
- CN113901318A
- Authority
- CN
- China
- Prior art keywords
- data
- user
- word
- database
- construction system
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3346—Query execution using probabilistic model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Business, Economics & Management (AREA)
- Computational Linguistics (AREA)
- Strategic Management (AREA)
- Artificial Intelligence (AREA)
- Finance (AREA)
- Development Economics (AREA)
- Accounting & Taxation (AREA)
- Entrepreneurship & Innovation (AREA)
- General Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Probability & Statistics with Applications (AREA)
- Health & Medical Sciences (AREA)
- Game Theory and Decision Science (AREA)
- General Business, Economics & Management (AREA)
- Marketing (AREA)
- Economics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a user portrait construction system comprising: a data acquisition unit for acquiring data from different data sources and establishing a user set according to user IDs; a data processing unit for processing the user data in the data set and storing the processed data in a word segmentation database; a feature extraction unit for extracting features from the data stored in the word segmentation database and storing them in a feature database; and a semantic conversion unit for converting the natural-language vocabulary in the feature database into a computer language. A user portrait is then generated by defining labels and constructing model features, and an overall portrait model is built through data association and rule definition. The visual feature set of the user set makes it convenient to subsequently build user portraits along different dimensions, so that more accurate data can be obtained; the hidden feature set gives further insight into users' hidden needs, which makes it easier to adjust sales strategy and makes risk control more accurate and scientific.
Description
Technical Field
The invention relates to the field of data processing, in particular to a user portrait construction system.
Background
A user portrait outlines a user's background, characteristics, personality, behavior scenarios and the like. It aims to "refine silver and dig for gold" in massive user behavior data: through tag models covering the basic attributes, purchasing power, behavioral characteristics, social networks, psychological characteristics, interests and hobbies obtained from data analysis, it abstracts a compact overview of one user or one class of users, thereby helping internet enterprises convert data into commercial value.
New retail is a brand-new business format that, with the development of the mobile internet, goes beyond e-commerce, m-commerce and brick-and-mortar retail. It is a data-driven, pan-retail form centered on the consumer experience, and covers new retail, new e-commerce, local O2O, pan-entertainment, new media, new finance, new logistics, new manufacturing and other categories. Its most prominent characteristics are online-offline integration, multiple scenarios and dimensions, enablement by technology and data, and mutual traffic diversion and reinforcement between online and offline channels, which together bring about a brand-new scenario model. User portrait systems currently applied in the consumer field still suffer from inaccurate positioning. Given the rapid growth in the number of new retail users and in the volume of information, how to better provide personalized service in the new retail era, so that users obtain the best experience, is a problem that needs to be solved.
Disclosure of Invention
The technical problem to be solved by the invention is to overcome the defects of the prior art and provide a user portrait construction system.
In order to solve the above technical problem, the invention provides the following technical solution:
The invention discloses a user portrait construction system, comprising:
a data acquisition unit, which acquires data from different data sources through threads and establishes a user set according to user IDs;
a data processing unit for processing the user data in the data set and storing the processed data in a word segmentation database;
a feature extraction unit for extracting features from the data stored in the word segmentation database and storing the extracted features in a feature database;
a semantic conversion unit for converting the natural-language vocabulary in the feature database into a computer language;
and generating a user portrait by defining labels and constructing model features, an overall portrait model being built through data association and rule definition.
As a preferred technical solution of the present invention, the data sources of the data acquisition unit include industry data, overall user data, overall browsing data, overall content data, user attribute data, user behavior data, user growth data, access depth, questionnaire surveys, user interviews, user participation data and user click data; the access depth, questionnaire surveys and user interviews are encrypted and stored as internal data.
As a preferred technical solution of the present invention, the data of the user set is divided into a visual feature set and a hidden feature set; the visual feature set comprises basic features, network features and use features, and the hidden feature set comprises purpose, preference, demand, frequency, scene and a historical search word list.
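The split into a visual feature set and a hidden feature set can be pictured as a small record type keyed by user ID. The following is only a sketch under stated assumptions: the class names, field names and the dictionary-based user set are illustrative and do not come from the patent text.

```python
from dataclasses import dataclass, field


@dataclass
class VisualFeatures:
    """Directly observable features (grouping follows the text; keys are illustrative)."""
    basic: dict = field(default_factory=dict)    # e.g. age, gender, occupation, region
    network: dict = field(default_factory=dict)  # e.g. time spent online
    usage: dict = field(default_factory=dict)    # e.g. frequency of use


@dataclass
class HiddenFeatures:
    """Features that have to be inferred rather than read off directly."""
    purpose: str = ""
    preference: str = ""
    demand: str = ""
    frequency: str = ""
    scene: str = ""
    search_history: list = field(default_factory=list)  # historical search word list


@dataclass
class UserRecord:
    user_id: str
    visual: VisualFeatures = field(default_factory=VisualFeatures)
    hidden: HiddenFeatures = field(default_factory=HiddenFeatures)


# The "user set" is assumed here to be a plain dict keyed by user ID.
user_set = {}
record = UserRecord(user_id="u001")
record.visual.basic["age"] = 30
record.hidden.search_history.append("gold")
user_set[record.user_id] = record
```

Keeping the two feature sets as separate objects means later stages can build portraits from the visual set alone and consult the hidden set only where inferred needs matter.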
As a preferred technical solution of the present invention, the data processing unit includes:
a data cleaning module for deleting empty words in the user set, detecting errors and inconsistencies in the word segmentation database, and removing or correcting erroneous data;
a word segmentation processing module for segmenting the described document into words;
and a custom word module, which automatically divides words into classes according to the classification requirements and exchanges data using an XML data exchange framework.
As a preferred technical solution of the present invention, in the word segmentation processing module, a final word segmentation is applied first, then regular expressions and stop words are used for precise word segmentation, and the homophones (fuzzy-sound words), synonyms and associated words among the custom words are all grouped into the same class.
As a preferred technical solution of the present invention, the feature extraction unit uses the TF-IDF algorithm to select the words with the highest frequency of occurrence from the word segmentation library,
wherein TF(c, y) is the frequency of the word c within the segmentation class y;
w is the total number of word classes in the word segmentation library;
df(c) is the number of segmentation entries containing the word c.
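The formula that these symbols belong to is not reproduced in this text. Assuming the standard TF-IDF weighting, which is consistent with the three symbols defined above (with df(c) read as the number of classes containing c), the score of a word c in class y would be:

```latex
\mathrm{TFIDF}(c, y) = \mathrm{TF}(c, y) \times \log\frac{w}{\mathrm{df}(c)}
```

Under this reading, a word that appears in every class has w / df(c) = 1 and therefore a score of zero, so only words that distinguish a class are kept as features.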
As a preferred technical solution of the present invention, the portrait model includes a portrait of basic attributes, a portrait based on consumption behavior, a portrait based on time and space, a portrait based on usage motivation, and a portrait based on usage behavior.
Compared with the prior art, the invention has the following beneficial effects:
The visual feature set of the user set makes it convenient to subsequently build user portraits along different dimensions, so that more accurate data can be obtained. The hidden feature set gives further insight into users' hidden needs, which makes it easier to adjust sales strategy and makes risk control more accurate and scientific. In addition, the feature data can be adjusted automatically as required, improving the accuracy and efficiency of user portrait construction.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a system framework diagram of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
Example 1
As shown in FIG. 1, the present invention provides a user portrait construction system, comprising:
a data acquisition unit, which acquires data from different data sources through threads and establishes a user set according to user IDs;
a data processing unit for processing the user data in the data set and storing the processed data in a word segmentation database;
a feature extraction unit for extracting features from the data stored in the word segmentation database and storing the extracted features in a feature database;
a semantic conversion unit for converting the natural-language vocabulary in the feature database into a computer language;
and generating a user portrait by defining labels and constructing model features, an overall portrait model being built through data association and rule definition.
Further, the data sources of the data acquisition unit comprise industry data, overall user data, overall browsing data, overall content data, user attribute data, user behavior data, user growth data, access depth, questionnaire surveys, user interviews, user participation data and user click data; the access depth, questionnaire surveys and user interviews are encrypted and stored as internal data.
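Thread-based acquisition from several sources, merged into a user set by user ID, might look like the sketch below. It assumes that each source can be wrapped in a function returning a list of records with a "user_id" key; the two fetch functions and their data are hypothetical stand-ins, and the encryption of internal data is not shown.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def fetch_behavior_data():
    # Hypothetical source standing in for a real behavior log.
    return [{"user_id": "u001", "clicks": 12}, {"user_id": "u002", "clicks": 3}]


def fetch_attribute_data():
    # Hypothetical source standing in for a user attribute store.
    return [{"user_id": "u001", "age": 30}]


def build_user_set(sources):
    """Run each source in a worker thread, then merge the records by user ID."""
    user_set = defaultdict(dict)
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        # pool.map yields results in source order; merging happens in the
        # calling thread, so no locking is needed.
        for records in pool.map(lambda fetch: fetch(), sources):
            for rec in records:
                fields = {k: v for k, v in rec.items() if k != "user_id"}
                user_set[rec["user_id"]].update(fields)
    return dict(user_set)


user_set = build_user_set([fetch_behavior_data, fetch_attribute_data])
```

Fetching in parallel but merging in one place keeps the user set consistent even when sources respond at different speeds.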
The data of the user set is divided into a visual feature set and a hidden feature set; the visual feature set comprises basic features, network features and use features, and the hidden feature set comprises purpose, preference, demand, frequency, scene and a historical search word list.
The data processing unit includes:
a data cleaning module for deleting empty words in the user set, detecting errors and inconsistencies in the word segmentation database, and removing or correcting erroneous data;
a word segmentation processing module for segmenting the described document into words;
and a custom word module, which automatically divides words into classes according to the classification requirements and exchanges data using an XML data exchange framework.
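The text does not specify the XML data exchange framework used by the custom word module. As a rough illustration only, word classes could be serialized and read back with Python's standard xml.etree.ElementTree; the element names wordClasses, class and word are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET


def classes_to_xml(word_classes):
    """Serialize {class name: [words]} into an XML string (schema is illustrative)."""
    root = ET.Element("wordClasses")
    for name, words in word_classes.items():
        cls = ET.SubElement(root, "class", name=name)
        for word in words:
            ET.SubElement(cls, "word").text = word
    return ET.tostring(root, encoding="unicode")


def xml_to_classes(xml_text):
    """Parse the XML produced by classes_to_xml back into a dict."""
    root = ET.fromstring(xml_text)
    return {
        cls.get("name"): [w.text for w in cls.findall("word")]
        for cls in root.findall("class")
    }


custom_classes = {"precious_metal": ["gold", "silver"]}
payload = classes_to_xml(custom_classes)
restored = xml_to_classes(payload)
```

A round trip through a neutral format like this is what lets the custom word table be shared between components that do not share code.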
In the word segmentation processing module, a final word segmentation is applied first, then regular expressions and stop words are used for precise word segmentation, and the homophones (fuzzy-sound words), synonyms and associated words among the custom words are all grouped into the same class.
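The refinement stage described above (regular expression, stop words, then grouping custom words into one class) can be sketched as follows. This is a simplified stand-in, not the patent's segmenter: it tokenizes English text with a regular expression instead of segmenting Chinese, and the stop-word list and custom-word table are invented for illustration.

```python
import re

STOP_WORDS = {"the", "of", "a", "in", "and"}  # illustrative stop-word list
CUSTOM_CLASSES = {                            # hypothetical custom-word table
    "gold": "gold",
    "bullion": "gold",                        # synonym mapped to the same class
    "au": "gold",                             # associated word mapped to the same class
}
TOKEN_RE = re.compile(r"[A-Za-z]+")


def segment(text):
    """Tokenize, drop stop words, then map custom words to their class name."""
    tokens = [t.lower() for t in TOKEN_RE.findall(text)]
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [CUSTOM_CLASSES.get(t, t) for t in tokens]


words = segment("The trend of international gold and bullion")
```

Here words is ["trend", "international", "gold", "gold"]: the stop words are removed and "bullion" is folded into the "gold" class, so later frequency counts treat the two as one feature.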
The feature extraction unit uses the TF-IDF algorithm to select the words with the highest frequency of occurrence from the word segmentation library,
wherein TF(c, y) is the frequency of the word c within the segmentation class y;
w is the total number of word classes in the word segmentation library;
df(c) is the number of segmentation entries containing the word c.
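A minimal sketch of this selection step follows. It assumes the common weighting TF(c, y) × log(w / df(c)), with w taken as the number of classes and df(c) as the number of classes containing c; the text defines the symbols but not the exact formula, so this weighting, the function names and the sample data are all assumptions.

```python
import math
from collections import Counter


def tf_idf(classes):
    """classes maps a class name y to its list of segmented words."""
    w = len(classes)                   # total number of classes
    df = Counter()                     # df[c]: number of classes containing word c
    for words in classes.values():
        df.update(set(words))
    scores = {}
    for y, words in classes.items():
        counts = Counter(words)
        total = len(words)
        scores[y] = {
            c: (n / total) * math.log(w / df[c])  # TF(c, y) * log(w / df(c))
            for c, n in counts.items()
        }
    return scores


def top_keyword(scores, y):
    """Return the highest-scoring word of class y."""
    return max(scores[y], key=scores[y].get)


library = {
    "finance": ["gold", "gold", "price"],
    "sports": ["price", "match"],
}
scores = tf_idf(library)
keyword = top_keyword(scores, "finance")
```

In this toy library "price" occurs in both classes and scores zero, while "gold" is frequent in and specific to "finance", so it is the word selected as that class's feature.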
The portrait model includes a portrait of basic attributes, a portrait based on consumption behavior, a portrait based on time and space, a portrait based on usage motivation, and a portrait based on usage behavior.
Specifically, in the first step, the data acquisition unit acquires the relevant data, encrypting and storing industry data, overall user data, overall browsing data, overall content data, user attribute data, user behavior data, user growth data, access depth, questionnaire surveys, user interviews, user participation data and user click data as internal data. Each user in the user set corresponds to a different ID. The basic features in the visual feature set comprise age, gender, occupation and region; the network features comprise time spent online, time of access and influencing factors; the use features comprise frequency, duration and time of use; and the hidden feature set comprises purpose, preference, demand, frequency, scene and a historical search word list.
In the second step, the data processing unit cleans the data, segments it into words and applies custom-word processing. For example, before word segmentation: "international gold trend"; after word segmentation: "gold". The classified feature words are stored in the word segmentation database.
In the third step, features are extracted from the words in the word segmentation database, with keywords selected as features through the TF-IDF algorithm.
In the fourth step, the semantic conversion unit converts the feature text into a computer language.
In the fifth step, for each label, the user features related to that label are selected from the feature database to generate different types of user portraits.
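The final step, attaching labels to a user through rule definitions over that user's features, can be sketched as below. The label names, thresholds and feature keys are hypothetical; the text only states that labels are defined and that portraits are built through data association and rule definition.

```python
# Label name -> rule over one user's feature dict (all rules are hypothetical).
LABEL_RULES = {
    "frequent_user": lambda f: f.get("usage_frequency", 0) >= 5,
    "gold_interest": lambda f: "gold" in f.get("keywords", []),
}


def build_portrait(user_id, features, rules=LABEL_RULES):
    """Return the labels whose rule holds for this user's features."""
    labels = sorted(name for name, rule in rules.items() if rule(features))
    return {"user_id": user_id, "labels": labels}


portrait = build_portrait(
    "u001",
    {"usage_frequency": 7, "keywords": ["gold", "price"]},
)
```

Keeping the rules in a table separate from the features is one way to let different portrait types (consumption behavior, usage motivation and so on) be generated from the same feature database by swapping the rule set.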
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (7)
1. A user portrait construction system, comprising:
a data acquisition unit, which acquires data from different data sources through threads and establishes a user set according to user IDs;
a data processing unit for processing the user data in the data set and storing the processed data in a word segmentation database;
a feature extraction unit for extracting features from the data stored in the word segmentation database and storing the features in a feature database;
a semantic conversion unit for converting the natural-language vocabulary in the feature database into a computer language;
and generating a user portrait by defining labels and constructing model features, an overall portrait model being built through data association and rule definition.
2. The user portrait construction system according to claim 1, wherein the data sources of the data acquisition unit comprise industry data, overall user data, overall browsing data, overall content data, user attribute data, user behavior data, user growth data, access depth, questionnaire surveys, user interviews, user participation data and user click data, the access depth, questionnaire surveys and user interviews being encrypted and stored as internal data.
3. The user portrait construction system according to claim 2, wherein the data of the user set is divided into a visual feature set and a hidden feature set, the visual feature set includes basic features, network features and use features, and the hidden feature set includes purpose, preference, demand, frequency, scene and a historical search word list.
4. The user portrait construction system according to claim 1, wherein the data processing unit comprises:
a data cleaning module for deleting empty words in the user set, detecting errors and inconsistencies in the word segmentation database, and removing or correcting erroneous data;
a word segmentation processing module for segmenting the described document into words;
and a custom word module, which automatically divides words into classes according to the classification requirements and exchanges data using an XML data exchange framework.
5. The user portrait construction system according to claim 4, wherein the word segmentation processing module first applies a final word segmentation and then uses regular expressions and stop words to perform precise word segmentation, and the homophones (fuzzy-sound words), synonyms and associated words among the custom words are all grouped into the same class.
6. The user portrait construction system according to claim 1, wherein the feature extraction unit uses the TF-IDF algorithm to select the words with the highest frequency of occurrence from the word segmentation library,
wherein TF(c, y) is the frequency of the word c within the segmentation class y;
w is the total number of word classes in the word segmentation library;
df(c) is the number of segmentation entries containing the word c.
7. The user portrait construction system according to claim 1, wherein the portrait model comprises a portrait of basic attributes, a portrait based on consumption behavior, a portrait based on time and space, a portrait based on usage motivation, and a portrait based on usage behavior.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111191345.6A CN113901318A (en) | 2021-10-13 | 2021-10-13 | User portrait construction system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111191345.6A CN113901318A (en) | 2021-10-13 | 2021-10-13 | User portrait construction system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113901318A (en) | 2022-01-07 |
Family
ID=79191772
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111191345.6A Pending CN113901318A (en) | 2021-10-13 | 2021-10-13 | User portrait construction system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113901318A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114048283A (en) * | 2022-01-11 | 2022-02-15 | 北京仁科互动网络技术有限公司 | User portrait generation method and device, electronic equipment and storage medium |
CN116821287A (en) * | 2023-08-28 | 2023-09-29 | 湖南创星科技股份有限公司 | Knowledge graph and large language model-based user psychological portrait system and method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107578292A (en) * | 2017-09-19 | 2018-01-12 | 上海财经大学 | A kind of user's portrait constructing system |
CN111597330A (en) * | 2019-02-21 | 2020-08-28 | 中国科学院信息工程研究所 | Intelligent expert recommendation-oriented user image drawing method based on support vector machine |
CN112990973A (en) * | 2021-03-22 | 2021-06-18 | 山东顺能网络科技有限公司 | Online shop portrait construction method and system |
CN113032556A (en) * | 2019-12-25 | 2021-06-25 | 厦门铠甲网络股份有限公司 | Method for forming user portrait based on natural language processing |
Non-Patent Citations (2)
Title |
---|
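Wang Ying (王英): "Feature analysis and case study of user portraits in university scientific research" (高校科研用户画像特征分析及案例研究), Library Theory and Practice (《图书馆理论与实践》), no. 4, 31 August 2020 (2020-08-31), pages 2 - 3 * |
Xu Pengcheng (许鹏程): "Data-driven construction of a user portrait model for digital libraries" (数据驱动下数字图书馆用户画像模型构建), Library and Information Service (《图书情报工作》), vol. 63, no. 3, 31 March 2019 (2019-03-31), pages 2 - 3 * |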
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114048283A (en) * | 2022-01-11 | 2022-02-15 | 北京仁科互动网络技术有限公司 | User portrait generation method and device, electronic equipment and storage medium |
CN116821287A (en) * | 2023-08-28 | 2023-09-29 | 湖南创星科技股份有限公司 | Knowledge graph and large language model-based user psychological portrait system and method |
CN116821287B (en) * | 2023-08-28 | 2023-11-17 | 湖南创星科技股份有限公司 | Knowledge graph and large language model-based user psychological portrait system and method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109359244B (en) | Personalized information recommendation method and device | |
CN111753060B (en) | Information retrieval method, apparatus, device and computer readable storage medium | |
CN107705066B (en) | Information input method and electronic equipment during commodity warehousing | |
US9348898B2 (en) | Recommendation system with dual collaborative filter usage matrix | |
US8380727B2 (en) | Information processing device and method, program, and recording medium | |
CN113011186B (en) | Named entity recognition method, named entity recognition device, named entity recognition equipment and computer readable storage medium | |
CN110728541A (en) | Information stream media advertisement creative recommendation method and device | |
CN107291840B (en) | User attribute prediction model construction method and device | |
US11741094B2 (en) | Method and system for identifying core product terms | |
CN113901318A (en) | User portrait construction system | |
CN112231569A (en) | News recommendation method and device, computer equipment and storage medium | |
CN110633398A (en) | Method for confirming central word, searching method, device and storage medium | |
CN112926308B (en) | Method, device, equipment, storage medium and program product for matching text | |
CN112990973A (en) | Online shop portrait construction method and system | |
CN105701182A (en) | Information pushing method and apparatus | |
CN115147130A (en) | Problem prediction method, apparatus, storage medium, and program product | |
CN110795613A (en) | Commodity searching method, device and system and electronic equipment | |
CN116975615A (en) | Task prediction method and device based on video multi-mode information | |
CN111460267B (en) | Object identification method, device and system | |
CN116756281A (en) | Knowledge question-answering method, device, equipment and medium | |
CN116484872A (en) | Multi-modal aspect emotion judging method and system based on pre-training and attention | |
CN116127013A (en) | Personal sensitive information knowledge graph query method and device | |
CN115391522A (en) | Text topic modeling method and system based on social platform metadata | |
Al-Saffar et al. | Survey on Implicit Feedbacks Extraction based on Yelp Dataset using Collaborative Filtering | |
CN110413899B (en) | Storage resource optimization method and system for server storage news |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||