CN113901318A - User portrait construction system - Google Patents

Info

Publication number
CN113901318A
Authority
CN
China
Prior art keywords
data
user
word
database
construction system
Prior art date
Legal status
Pending
Application number
CN202111191345.6A
Other languages
Chinese (zh)
Inventor
王宏艳
张超英
Current Assignee
Yanshan University
Original Assignee
Yanshan University
Priority date
Filing date
Publication date
Application filed by Yanshan University
Priority to CN202111191345.6A
Publication of CN113901318A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3346 Query execution using probabilistic model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Computational Linguistics (AREA)
  • Strategic Management (AREA)
  • Artificial Intelligence (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a user portrait construction system comprising: a data acquisition unit for acquiring data from different data sources and establishing a user set according to user IDs; a data processing unit for processing the user data in the data set and storing the processed data in a word segmentation database; a feature extraction unit for extracting features from the data distributed and stored in the database and storing the features in a feature database; a semantic conversion unit for converting the language vocabulary in the feature database into a computer language; and user portrait generation, in which labels are defined, model features are constructed, and an overall portrait model is built through data association and rule definition. The visual feature set of the user set makes it easier to subsequently build user portraits along different dimensions, so that more accurate data can be obtained. The hidden feature set further reveals users' latent needs, which makes it easier to adjust sales strategy and makes risk control more accurate and systematic.

Description

User portrait construction system
Technical Field
The invention relates to the field of data processing, in particular to a user portrait construction system.
Background
A user portrait outlines a user's background, characteristics, personality, behavior scenarios, and similar content. Its purpose is to mine value from massive user behavior data ("panning for silver and gold"): after data analysis, tag models covering the user's basic attributes, purchasing power, behavioral characteristics, social network, psychological characteristics, interests, and hobbies are used to abstract an information overview of a single user or a class of users, which helps internet enterprises convert data into commercial value.
New retail is a new business form that has emerged with the development of the mobile internet and goes beyond e-commerce, m-commerce, and physical retail. It is a data-driven pan-retail form centered on the consumer experience, and it covers categories such as new retail, new e-commerce, local O2O, pan-entertainment, new media, new finance, new logistics, and new manufacturing. Its most prominent characteristics are online-offline integration, multiple scenarios and dimensions, enablement through technology and data, and mutual diversion and reinforcement between online and offline channels, which together bring a new scenario model. User portrait systems applied in the consumer field still position users inaccurately. As the number of new retail users and the volume of information grow rapidly, how to better provide personalized service in the new retail era and give users the best experience is a problem that needs to be solved.
Disclosure of Invention
The technical problem to be solved by the invention is to overcome the defects of the prior art and provide a user portrait construction system.
In order to solve the technical problems, the invention provides the following technical scheme:
The invention discloses a user portrait construction system, comprising:
a data acquisition unit, which acquires data from different data sources through threads and establishes a user set according to user IDs;
a data processing unit for processing the user data in the data set and storing the processed data in a word segmentation database;
a feature extraction unit for extracting features from the data distributed and stored in the database and storing the extracted features in a feature database;
a semantic conversion unit for converting the language vocabulary in the feature database into a computer language;
and user portrait generation, in which labels are defined, model features are constructed, and an overall portrait model is built through data association and rule definition.
As a preferred technical solution of the invention, the data sources of the data acquisition unit include industry data, overall user data, overall browsing data, overall content data, user attribute data, user behavior data, user growth data, access depth, questionnaire surveys, user interviews, user participation data, and user click data. The access depth, questionnaire surveys, and user interviews are stored in encrypted form as internal data.
As a preferred technical solution of the invention, the data of the user set is divided into a visual feature set and a hidden feature set. The visual feature set comprises basic features, network features, and use features; the hidden feature set comprises purpose, preference, demand, frequency, scene, and a historical search word list.
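The split into a visual feature set and a hidden feature set can be pictured as a small record type keyed by user ID. The following is only a sketch under assumptions: the class and field names (`VisualFeatures`, `HiddenFeatures`, `UserRecord`) are hypothetical and do not appear in the patent, which does not specify a storage layout.

```python
from dataclasses import dataclass, field


@dataclass
class VisualFeatures:
    # The three groups named in the text; the keys inside each dict are left open.
    basic: dict = field(default_factory=dict)
    network: dict = field(default_factory=dict)
    use: dict = field(default_factory=dict)


@dataclass
class HiddenFeatures:
    # The six hidden items named in the text; the value types are assumptions.
    purpose: str = ""
    preference: str = ""
    demand: str = ""
    frequency: str = ""
    scene: str = ""
    search_history: list = field(default_factory=list)


@dataclass
class UserRecord:
    # One entry of the user set, identified by the user's ID.
    user_id: str
    visual: VisualFeatures = field(default_factory=VisualFeatures)
    hidden: HiddenFeatures = field(default_factory=HiddenFeatures)


record = UserRecord(user_id="u1")
record.visual.basic["age"] = 30
record.hidden.search_history.append("gold")
```

Using `default_factory` keeps each record's dictionaries and lists independent, so updating one user never changes another.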
As a preferred technical solution of the present invention, the data processing unit includes:
a data cleaning module for deleting empty words in the user set, detecting errors and inconsistencies in the word segmentation database, and removing or correcting erroneous data;
a word segmentation processing module for segmenting the described documents into words;
and a custom word module, which automatically divides words into classes according to classification requirements and exchanges data using an XML data exchange framework.
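A minimal sketch of what the data cleaning module might do is shown here. The patent does not define "empty words" or "erroneous data" precisely, so the rules used (non-string entries count as errors, whitespace-only entries count as empty, repeated entries are dropped) and the function name `clean_words` are assumptions made for illustration.

```python
def clean_words(words):
    """Return (cleaned, rejected) for one user's list of raw word entries."""
    cleaned, rejected, seen = [], [], set()
    for entry in words:
        if not isinstance(entry, str):
            rejected.append(entry)  # wrong type: treated as erroneous data
            continue
        # Correct inconsistent spacing and letter case.
        word = " ".join(entry.split()).lower()
        if not word:
            rejected.append(entry)  # empty word
            continue
        if word not in seen:        # drop repeats of an already kept word
            seen.add(word)
            cleaned.append(word)
    return cleaned, rejected


cleaned, rejected = clean_words(["Gold", "  ", "gold ", None, "price  index"])
```

Returning the rejected entries alongside the cleaned ones keeps the removal auditable, which is useful when the cleaning rules themselves are still being tuned.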
As a preferred technical solution of the invention, the word segmentation processing module first applies a word segmenter and then uses regular expressions and stop words to perform precise word segmentation; similar-sounding (fuzzy-sound) words, near-synonyms, and associated words among the custom words are all grouped into the same class.
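The segmentation steps described above can be sketched as follows. This is an illustration only: it tokenizes English text with a regular expression, whereas the patent targets Chinese text and would need a dedicated segmenter. The stop-word list and the `CUSTOM_CLASSES` mapping are hypothetical sample data, not taken from the patent.

```python
import re

# Hypothetical stop words to discard after tokenization.
STOP_WORDS = {"the", "a", "of", "and", "in"}

# Hypothetical custom-word table: each variant maps to the class it belongs to,
# so near-synonyms and associated words end up in the same class.
CUSTOM_CLASSES = {"gold": "gold", "golden": "gold", "bullion": "gold"}

TOKEN_RE = re.compile(r"[A-Za-z]+")


def segment(text):
    """Tokenize, remove stop words, then collapse custom words into their class."""
    tokens = [t.lower() for t in TOKEN_RE.findall(text)]
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [CUSTOM_CLASSES.get(t, t) for t in tokens]


result = segment("The price of gold and bullion")
```

Words without an entry in the custom table pass through unchanged, so the table only needs to list the variants that should be merged.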
As a preferred technical solution of the invention, the feature extraction unit uses the TF-IDF algorithm to select the words with the highest frequency of occurrence from the word segmentation library:
[Formula: figure BDA0003301227620000031, not reproduced in this text]
where TF(c, y) is the frequency of word c in segmentation class y;
w is the total number of word classes in the word segmentation library;
df(c) is the number of segmentation classes containing word c.
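The formula itself appears in this text only as an image reference, so its exact form cannot be confirmed here. For orientation, the textbook TF-IDF weight written with the symbols defined above would be the following; this is an assumption, and the patent's formula may differ, for example by adding smoothing terms or using a different logarithm base.

```latex
\mathrm{TFIDF}(c, y) = \mathrm{TF}(c, y) \times \log \frac{w}{\mathrm{df}(c)}
```

Under this form, a word that occurs in every class has \(\mathrm{df}(c) = w\) and therefore weight zero, while a word concentrated in a single class receives the largest weight for a given frequency.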
As a preferred technical solution of the invention, the portrait model includes a portrait of base attributes, a portrait based on consumption behavior, a portrait based on time and space, a portrait based on usage motivation, and a portrait based on usage behavior.
Compared with the prior art, the invention has the following beneficial effects:
With this system, the visual feature set of the user set makes it easier to subsequently build user portraits along different dimensions, so that more accurate data can be obtained. The hidden feature set further reveals users' latent needs, which makes it easier to adjust sales strategy and makes risk control more accurate and systematic. In addition, the feature data can be adjusted automatically as required, which improves the accuracy and efficiency of user portrait construction.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a system framework diagram of the present invention;
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
Example 1
As shown in FIG. 1, the invention provides a user portrait construction system, comprising:
a data acquisition unit, which acquires data from different data sources through threads and establishes a user set according to user IDs;
a data processing unit for processing the user data in the data set and storing the processed data in a word segmentation database;
a feature extraction unit for extracting features from the data distributed and stored in the database and storing the extracted features in a feature database;
a semantic conversion unit for converting the language vocabulary in the feature database into a computer language;
and user portrait generation, in which labels are defined, model features are constructed, and an overall portrait model is built through data association and rule definition.
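The first unit in the list above, threaded acquisition followed by grouping on the user ID, might look like the following sketch. The two source functions and their fields are hypothetical stand-ins for real data sources; the patent does not say how the threads are organized, so a standard thread pool is assumed.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


# Hypothetical data sources; a real system would query logs, databases or APIs.
def fetch_behavior():
    return [{"user_id": "u1", "clicks": 3}, {"user_id": "u2", "clicks": 1}]


def fetch_attributes():
    return [{"user_id": "u1", "age": 30}]


def build_user_set(sources):
    """Fetch every source in its own thread, then merge records by user ID."""
    user_set = defaultdict(dict)
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        # pool.map yields results in the order of `sources`; the merge itself
        # runs in the calling thread, so no locking is needed.
        for records in pool.map(lambda fetch: fetch(), sources):
            for record in records:
                fields = {k: v for k, v in record.items() if k != "user_id"}
                user_set[record["user_id"]].update(fields)
    return dict(user_set)


user_set = build_user_set([fetch_behavior, fetch_attributes])
```

Only the fetching is parallel here; keeping the merge single-threaded is a deliberate simplification that avoids shared-state races.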
Further, the data sources of the data acquisition unit include industry data, overall user data, overall browsing data, overall content data, user attribute data, user behavior data, user growth data, access depth, questionnaire surveys, user interviews, user participation data, and user click data. The access depth, questionnaire surveys, and user interviews are stored in encrypted form as internal data.
The data of the user set is divided into a visual feature set and a hidden feature set. The visual feature set comprises basic features, network features, and use features; the hidden feature set comprises purpose, preference, demand, frequency, scene, and a historical search word list.
The data processing unit includes:
a data cleaning module for deleting empty words in the user set, detecting errors and inconsistencies in the word segmentation database, and removing or correcting erroneous data;
a word segmentation processing module for segmenting the described documents into words;
and a custom word module, which automatically divides words into classes according to classification requirements and exchanges data using an XML data exchange framework.
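The XML data exchange used by the custom word module is not specified further. As a hedged illustration, the sketch below serializes custom word classes to XML and reads them back with Python's standard `xml.etree.ElementTree`; the element names `customWords`, `wordClass`, and `word` are invented for this example.

```python
import xml.etree.ElementTree as ET


def classes_to_xml(classes):
    """Serialize {class name: [words]} into an XML string."""
    root = ET.Element("customWords")
    for class_name in sorted(classes):
        node = ET.SubElement(root, "wordClass", name=class_name)
        for word in classes[class_name]:
            ET.SubElement(node, "word").text = word
    return ET.tostring(root, encoding="unicode")


def xml_to_classes(xml_text):
    """Parse the XML produced by classes_to_xml back into a dict."""
    root = ET.fromstring(xml_text)
    return {
        node.get("name"): [w.text for w in node.findall("word")]
        for node in root.findall("wordClass")
    }


classes = {"gold": ["gold", "bullion"], "price": ["price"]}
xml_text = classes_to_xml(classes)
restored = xml_to_classes(xml_text)
```

A round trip through both functions returns the original mapping, which is the property an exchange format between modules needs.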
The word segmentation processing module first applies a word segmenter and then uses regular expressions and stop words to perform precise word segmentation; similar-sounding (fuzzy-sound) words, near-synonyms, and associated words among the custom words are all grouped into the same class.
The feature extraction unit uses the TF-IDF algorithm to select the words with the highest frequency of occurrence from the word segmentation library:
[Formula: figure BDA0003301227620000041, not reproduced in this text]
where TF(c, y) is the frequency of word c in segmentation class y;
w is the total number of word classes in the word segmentation library;
df(c) is the number of segmentation classes containing word c.
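As a rough illustration of keyword selection with TF-IDF, the sketch below scores each word per segmentation class and returns the top-scoring ones. It assumes the textbook form TF(c, y) × log(w / df(c)) with TF taken as relative frequency; the patent's own formula is not reproduced in this text and may differ. The function names and the sample data are hypothetical.

```python
import math
from collections import Counter


def tf_idf(classes):
    """classes: {class name y: [words]} -> {y: {word c: score}}."""
    w = len(classes)                      # total number of classes
    df = Counter()                        # df[c]: classes containing word c
    for words in classes.values():
        df.update(set(words))
    scores = {}
    for y, words in classes.items():
        counts = Counter(words)
        total = len(words)
        scores[y] = {
            c: (n / total) * math.log(w / df[c]) for c, n in counts.items()
        }
    return scores


def top_keywords(scores, y, k=3):
    """Return the k highest-scoring words of class y, ties broken by name."""
    return sorted(scores[y], key=lambda c: (-scores[y][c], c))[:k]


sample = {"finance": ["gold", "gold", "price"], "sports": ["match", "price"]}
scores = tf_idf(sample)
keywords = top_keywords(scores, "finance", k=1)
```

In this sample, "price" occurs in both classes and therefore scores zero, while "gold" is specific to one class and is selected as its keyword.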
The portrait model includes a portrait of base attributes, a portrait based on consumption behavior, a portrait based on time and space, a portrait based on usage motivation, and a portrait based on usage behavior.
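Label definition and rule definition across these portrait dimensions could be sketched as a list of rules, each tying a label to one dimension and a condition on the user's features. The rules, thresholds, and feature keys below are all hypothetical; the patent does not disclose concrete rules.

```python
# Hypothetical label rules: (portrait dimension, label, condition on features).
RULES = [
    ("base attributes", "young adult", lambda f: 18 <= f.get("age", 0) < 30),
    ("consumption behavior", "frequent buyer",
     lambda f: f.get("orders_per_month", 0) >= 4),
    ("usage behavior", "night user", lambda f: f.get("active_hour", 12) >= 22),
]


def build_portrait(features):
    """Apply every rule and group the matching labels by portrait dimension."""
    portrait = {}
    for dimension, label, condition in RULES:
        if condition(features):
            portrait.setdefault(dimension, []).append(label)
    return portrait


portrait = build_portrait({"age": 25, "orders_per_month": 5, "active_hour": 9})
```

Keeping rules as data rather than hard-coded branches makes it straightforward to add a dimension or retune a threshold without touching the matching logic.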
Specifically, the system operates in five steps.
In the first step, the data acquisition unit acquires the relevant data (industry data, overall user data, overall browsing data, overall content data, user attribute data, user behavior data, user growth data, access depth, questionnaire surveys, user interviews, user participation data, and user click data) and stores it in encrypted form as internal data. Each user in the user set corresponds to a different ID. The basic features in the visual feature set comprise age, gender, occupation, and region; the network features comprise time online, duration, and influencing factors; the use features comprise frequency of use, time, and duration. The hidden feature set comprises purpose, preference, demand, frequency, scene, and a historical search word list.
In the second step, the data processing unit cleans the data, segments it into words, and applies custom-word processing. For example, the text "international gold trend" before word segmentation becomes "gold" after word segmentation. The classified feature words are stored in the word segmentation database.
In the third step, features are extracted from the words in the word segmentation database: keywords are selected from them as features using the TF-IDF algorithm.
In the fourth step, the semantic conversion unit converts the feature text into a computer language.
In the fifth step, for each label, the user features in the feature database that relate to that label are selected to generate different types of user portraits.
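The patent does not say what "converting the feature text into a computer language" involves. One plausible reading, assumed here, is that feature words are encoded as numeric vectors that later stages can compare and match against labels. The sketch below uses a simple count vector over a fixed vocabulary; the function names are hypothetical.

```python
def build_vocabulary(feature_words):
    """Assign each distinct feature word a stable index (alphabetical order)."""
    return {word: i for i, word in enumerate(sorted(set(feature_words)))}


def encode(words, vocabulary):
    """Turn a user's words into a count vector; unknown words are ignored."""
    vector = [0] * len(vocabulary)
    for word in words:
        index = vocabulary.get(word)
        if index is not None:
            vector[index] += 1
    return vector


vocabulary = build_vocabulary(["gold", "price", "match"])
vector = encode(["gold", "gold", "price", "unknown"], vocabulary)
```

Sorting the vocabulary before indexing keeps the encoding reproducible across runs, so vectors built at different times remain comparable.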
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (7)

1. A user portrait construction system, comprising:
a data acquisition unit, which acquires data from different data sources through threads and establishes a user set according to user IDs;
a data processing unit for processing the user data in the data set and storing the processed data in a word segmentation database;
a feature extraction unit for extracting features from the data distributed and stored in the database and storing the features in a feature database;
a semantic conversion unit for converting the language vocabulary in the feature database into a computer language;
and user portrait generation, in which labels are defined, model features are constructed, and an overall portrait model is built through data association and rule definition.
2. The user portrait construction system of claim 1, wherein the data sources of the data acquisition unit comprise industry data, overall user data, overall browsing data, overall content data, user attribute data, user behavior data, user growth data, access depth, questionnaire surveys, user interviews, user participation data, and user click data, the access depth, questionnaire surveys, and user interviews being stored in encrypted form as internal data.
3. The user portrait construction system of claim 2, wherein the data of the user set is divided into a visual feature set and a hidden feature set, the visual feature set comprising basic features, network features, and use features, and the hidden feature set comprising purpose, preference, demand, frequency, scene, and a historical search word list.
4. The user portrait construction system of claim 1, wherein the data processing unit comprises:
a data cleaning module for deleting empty words in the user set, detecting errors and inconsistencies in the word segmentation database, and removing or correcting erroneous data;
a word segmentation processing module for segmenting the described documents into words;
and a custom word module, which automatically divides words into classes according to classification requirements and exchanges data using an XML data exchange framework.
5. The user portrait construction system of claim 4, wherein the word segmentation processing module first applies a word segmenter and then uses regular expressions and stop words to perform precise word segmentation, and similar-sounding (fuzzy-sound) words, near-synonyms, and associated words among the custom words are all grouped into the same class.
6. The user portrait construction system of claim 1, wherein the feature extraction unit uses the TF-IDF algorithm to select the words with the highest frequency of occurrence from the word segmentation library:
[Formula: figure FDA0003301227610000021, not reproduced in this text]
where TF(c, y) is the frequency of word c in segmentation class y;
w is the total number of word classes in the word segmentation library;
df(c) is the number of segmentation classes containing word c.
7. The user portrait construction system of claim 1, wherein the portrait model comprises a portrait of base attributes, a portrait based on consumption behavior, a portrait based on time and space, a portrait based on usage motivation, and a portrait based on usage behavior.
CN202111191345.6A 2021-10-13 2021-10-13 User portrait construction system Pending CN113901318A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111191345.6A CN113901318A (en) 2021-10-13 2021-10-13 User portrait construction system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111191345.6A CN113901318A (en) 2021-10-13 2021-10-13 User portrait construction system

Publications (1)

Publication Number Publication Date
CN113901318A true CN113901318A (en) 2022-01-07

Family

ID=79191772

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111191345.6A Pending CN113901318A (en) 2021-10-13 2021-10-13 User portrait construction system

Country Status (1)

Country Link
CN (1) CN113901318A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107578292A (en) * 2017-09-19 2018-01-12 上海财经大学 A kind of user's portrait constructing system
CN111597330A (en) * 2019-02-21 2020-08-28 中国科学院信息工程研究所 Intelligent expert recommendation-oriented user image drawing method based on support vector machine
CN112990973A (en) * 2021-03-22 2021-06-18 山东顺能网络科技有限公司 Online shop portrait construction method and system
CN113032556A (en) * 2019-12-25 2021-06-25 厦门铠甲网络股份有限公司 Method for forming user portrait based on natural language processing

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
王英: "高校科研用户画像特征分析及案例研究", 《图书馆理论与实践》, no. 4, 31 August 2020 (2020-08-31), pages 2 - 3 *
许鹏程: "数据驱动下数字图书馆用户画像模型构建", 《图书情报工作》, vol. 63, no. 3, 31 March 2019 (2019-03-31), pages 2 - 3 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114048283A (en) * 2022-01-11 2022-02-15 北京仁科互动网络技术有限公司 User portrait generation method and device, electronic equipment and storage medium
CN116821287A (en) * 2023-08-28 2023-09-29 湖南创星科技股份有限公司 Knowledge graph and large language model-based user psychological portrait system and method
CN116821287B (en) * 2023-08-28 2023-11-17 湖南创星科技股份有限公司 Knowledge graph and large language model-based user psychological portrait system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination