CN111309913A - Method for analyzing gender by name - Google Patents
Method for analyzing gender by name
- Publication number
- CN111309913A (application CN202010118259.1A)
- Authority
- CN
- China
- Prior art keywords
- gender
- data
- probability
- model
- establishing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/01—Customer relationship services
- G06Q30/015—Providing customer assistance, e.g. assisting a customer within a business location or via helpdesk
- G06Q30/016—After-sales
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Accounting & Taxation (AREA)
- Theoretical Computer Science (AREA)
- Development Economics (AREA)
- Finance (AREA)
- Strategic Management (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- Marketing (AREA)
- Economics (AREA)
- Entrepreneurship & Innovation (AREA)
- Data Mining & Analysis (AREA)
- Game Theory and Decision Science (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention provides a method for analyzing gender by name, comprising the following steps: (1) acquiring basic data; (2) cleaning the database and establishing a modeling set and a verification set; (3) calculating prior probabilities; (4) establishing a Bayesian model; (5) correcting the probabilities according to the results; (6) substituting the model into the verification set; (7) verifying in actual application. In industries where commodity marketing correlates strongly with gender, the method allows marketing to be targeted according to the analysis results.
Description
Technical Field
The invention belongs to the technical field of data analysis, and particularly relates to a method for analyzing gender through names.
Background
In advertisement push, promoting commodities accurately to designated groups of people improves the conversion rate. The relationship between commodity attributes and gender is particularly important, so predicting gender from a data set when customers' names are known is especially valuable.
Customized, accurate advertisement reach is an effective means for merchants to increase sales, and the gender attribute of the person reached is widely used in precision marketing, so demand for it is strong. The company's surveys found that in different sales fields the genders differ markedly in item selection, points of emphasis, price acceptance and the like. Starting from this practical need, the present technology uses a Bayesian algorithm and the company's accumulated internal data to identify a user's gender from the name the user gives when purchasing.
In the prior art, for example NFT (name prediction for gender), the overall probability and the probability of a name given a gender are obtained from a data set, the gender is predicted by the Bayesian principle, and a weighted gender prediction model is built on that principle. However, an insufficient data set cardinality strongly affects the result of such a method; a large accumulation of real gender data is needed, so ordinary companies and individuals cannot build the model.
Disclosure of Invention
In view of the problems in the prior art, the invention provides a method for analyzing gender by name. The method analyzes the prior probabilities of all names and establishes a prior probability database, establishes a basic gender prediction probability model using a Bayesian model, and corrects the basic model according to the positions at which Chinese characters appear in names and the combinations of characters, thereby improving the model's accuracy.
To achieve this purpose, the invention adopts the following technical scheme:
A method for analyzing gender by name, comprising the following steps:
(1) Basic data acquisition
The data come from the labels that partner merchants record when they are in contact with users, so the data are real and valid;
(2) Cleaning the database and establishing a modeling set and a verification set
I. The raw data are not all real names; some entries are aliases. The data are normalized by establishing rules, and the real names are extracted as the data set;
II. The cleaned data are randomly divided into a modeling set and a verification set at a ratio of 7:3 for establishing the model;
(3) Calculating prior probabilities
The data are grouped by gender, and the proportion of each Chinese character within the same gender is counted and recorded as its prior probability;
(4) Establishing a Bayesian model
The prior probabilities obtained in step (3) are substituted into a Bayesian model, and the Bayesian probability of each Chinese character with respect to gender is calculated;
(5) Correcting the probabilities according to the results
Through fitting, a weight is found for the Bayesian probability of the Chinese character at each position, and a corrected Bayesian probability model is established;
(6) Substituting into the verification set
The established model is applied to the verification set from step (2);
(7) Verification in practical application
On the premise of protecting personal privacy, the data are encrypted and delivered to a business department for verification.
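The source states steps (3) to (5) only in words. One plausible reading, written out as formulas, is given below. All symbols are assumptions introduced here for illustration: g is a gender, c a Chinese character, n_g(c) the number of times c occurs in names of gender g in the modeling set, and w_k the fitted weight for the character at position k of a name c_1 ... c_m. The weighted sum in the last line is one possible form of the correction; the source does not specify how the weights are combined.

```latex
% Sketch under stated assumptions; the patent text gives no explicit formulas.
\begin{align}
P(c \mid g) &= \frac{n_g(c)}{\sum_{c'} n_g(c')}
  && \text{step (3): share of character } c \text{ within gender } g \\
P(g \mid c) &= \frac{P(c \mid g)\,P(g)}{\sum_{g'} P(c \mid g')\,P(g')}
  && \text{step (4): Bayes' rule per character} \\
S_g(c_1 \dots c_m) &= \sum_{k=1}^{m} w_k\,P(g \mid c_k), \qquad w_k \ge 0
  && \text{step (5): position-weighted score}
\end{align}
% The predicted gender would then be \arg\max_g S_g(c_1 \dots c_m).
```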
The method for analyzing gender by name provided by the invention allows marketing to be targeted according to the analysis results in industries where commodity marketing correlates strongly with gender. It has the following technical effects:
1. In precise recommendation, adding gender improves recommendation accuracy.
2. In marketing copy, different outreach copy can be designed for different genders, with differentiated key points selected so that the copy is more attractive.
3. At the industry level, once investigation shows how receptive each gender is to marketing in a given industry, marketing can focus on the more receptive gender, saving promotion expenditure.
4. In marketing timing, the marketing effect for each gender differs from one time node to another. For example, women ordinarily show a stronger preference for ornament commodities, but men's purchasing power rises explosively around Valentine's Day, so the marketing effect can be improved by adapting to the time node.
5. In after-sales service, in most industries the after-sales style best suited to receiving male customers differs from the style best suited to receiving female customers, so after-sales satisfaction can be improved by tailoring the service to the customer's gender.
Drawings
FIG. 1 is a schematic flow diagram of the present invention.
Detailed Description
The specific technical scheme of the invention is described below with reference to an embodiment.
As shown in FIG. 1, the method for analyzing gender by name comprises the following steps:
(1) Basic data acquisition
The data come from the labels that partner merchants record when they are in contact with users, so the data are real and valid;
(2) Cleaning the database and establishing a modeling set and a verification set
I. The raw data are not all real names; some entries are aliases (such as Rough, pig, Xiaoxian girl and the like). The data are normalized by establishing rules, and the real names are extracted as the data set;
II. The cleaned data are randomly divided into a modeling set and a verification set at a ratio of 7:3 for establishing the model;
(3) Calculating prior probabilities
The data are grouped by gender, and the proportion of each Chinese character within the same gender is counted and recorded as its prior probability;
(4) Establishing a Bayesian model
The prior probabilities obtained in step (3) are substituted into a Bayesian model, and the Bayesian probability of each Chinese character with respect to gender is calculated;
(5) Correcting the probabilities according to the results
When the results obtained are substituted into the verification set, it is found that the position of a Chinese character within the name affects the accuracy of the result. Through fitting, a weight is found for the Bayesian probability of the Chinese character at each position, and a corrected Bayesian probability model is established;
(6) Substituting into the verification set
The established model is applied to the verification set from step (2), and the verification result is good.
(7) Verification in practical application
On the premise of protecting personal privacy, the data are encrypted and delivered to a business department for verification, and the verification result shows that the model's predictions are highly accurate.
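The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the cleaning rule (2 to 4 Chinese characters), the treatment of the first character as the surname, the add-alpha smoothing, the weighted-sum form of the correction, and the grid search over two position weights are all hypothetical choices, since the source does not specify its rules or fitting procedure. All function names are invented for this sketch.

```python
import random
import re
from collections import Counter

GENDERS = ("M", "F")
# Assumed cleaning rule: a "real name" is 2-4 Chinese characters.
REAL_NAME = re.compile(r"^[\u4e00-\u9fff]{2,4}$")

def clean(records):
    """Step (2) I: keep (name, gender) pairs that pass the assumed rule."""
    return [(n, g) for n, g in records if REAL_NAME.match(n) and g in GENDERS]

def split(records, ratio=0.7, seed=0):
    """Step (2) II: random 7:3 split into modeling and verification sets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * ratio)
    return shuffled[:cut], shuffled[cut:]

def char_counts(modeling):
    """Step (3): per-gender character counts (proportions are derived later)."""
    counts = {g: Counter() for g in GENDERS}
    for name, gender in modeling:
        counts[gender].update(name[1:])  # assumption: first character is the surname
    return counts

def class_prior(modeling):
    """Share of each gender in the modeling set."""
    n = Counter(g for _, g in modeling)
    total = sum(n.values())
    return {g: n[g] / total for g in GENDERS}

def posterior(ch, counts, prior, alpha=1.0):
    """Step (4): P(gender | character) by Bayes' rule with add-alpha smoothing."""
    vocab = len(set(counts["M"]) | set(counts["F"])) + 1  # +1 for unseen characters
    joint = {}
    for g in GENDERS:
        likelihood = (counts[g][ch] + alpha) / (sum(counts[g].values()) + alpha * vocab)
        joint[g] = likelihood * prior[g]
    total = sum(joint.values())
    return {g: joint[g] / total for g in GENDERS}

def predict(name, counts, prior, weights):
    """Step (5): combine per-character posteriors with position weights."""
    score = {g: 0.0 for g in GENDERS}
    for k, ch in enumerate(name[1:]):
        w = weights[min(k, len(weights) - 1)]  # reuse last weight for longer names
        p = posterior(ch, counts, prior)
        for g in GENDERS:
            score[g] += w * p[g]
    return max(score, key=score.get)

def accuracy(records, counts, prior, weights):
    """Steps (6)/(7): fraction of records whose gender is predicted correctly."""
    hits = sum(predict(n, counts, prior, weights) == g for n, g in records)
    return hits / len(records) if records else 0.0

def fit_weights(records, counts, prior, steps=10):
    """Assumed fitting: grid search over (w, 1 - w) for the two given-name positions."""
    candidates = [(i / steps, 1 - i / steps) for i in range(steps + 1)]
    return max(candidates, key=lambda w: accuracy(records, counts, prior, w))
```

In use, `clean` and `split` would be applied to the labeled records, `char_counts` and `class_prior` computed on the modeling set, `fit_weights` run on labeled data, and `accuracy` reported on the verification set. A grid search is only one simple way to do the "fitting"; a regression over position-wise posteriors would be another.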
Claims (2)
1. A method for analyzing gender by name, comprising the following steps:
(1) Basic data acquisition
The data come from the labels that partner merchants record when they are in contact with users, so the data are real and valid;
(2) Cleaning the database and establishing a modeling set and a verification set;
(3) Calculating prior probabilities
The data are grouped by gender, and the proportion of each Chinese character within the same gender is counted and recorded as its prior probability;
(4) Establishing a Bayesian model
The prior probabilities obtained in step (3) are substituted into a Bayesian model, and the Bayesian probability of each Chinese character with respect to gender is calculated;
(5) Correcting the probabilities according to the results
Through fitting, a weight is found for the Bayesian probability of the Chinese character at each position, and a corrected Bayesian probability model is established;
(6) Substituting into the verification set
The established model is applied to the verification set from step (2);
(7) Verification in practical application
On the premise of protecting personal privacy, the data are encrypted and delivered to a business department for verification.
2. The method for analyzing gender by name as claimed in claim 1, wherein step (2) comprises the following sub-steps:
I. The raw data are not all real names; some entries are aliases. The data are normalized by establishing rules, and the real names are extracted as the data set;
II. The cleaned data are randomly divided into a modeling set and a verification set at a ratio of 7:3 for establishing the model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010118259.1A CN111309913A (en) | 2020-02-26 | 2020-02-26 | Method for analyzing gender by name |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010118259.1A CN111309913A (en) | 2020-02-26 | 2020-02-26 | Method for analyzing gender by name |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111309913A (en) | 2020-06-19 |
Family
ID=71146452
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010118259.1A Pending CN111309913A (en) | 2020-02-26 | 2020-02-26 | Method for analyzing gender by name |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111309913A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113312905A (en) * | 2021-06-23 | 2021-08-27 | 北京有竹居网络技术有限公司 | Information prediction method, information prediction device, storage medium and electronic equipment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102411665A (en) * | 2010-09-21 | 2012-04-11 | 腾讯科技(深圳)有限公司 | Method and device for analyzing names |
CN103389973A (en) * | 2013-07-23 | 2013-11-13 | 安阳师范学院 | Method for judging gender by utilizing Chinese name |
CN104598452A (en) * | 2013-10-30 | 2015-05-06 | 北京思博途信息技术有限公司 | Method and device for analyzing user gender |
CN110442709A (en) * | 2019-06-24 | 2019-11-12 | 厦门美域中央信息科技有限公司 | A kind of file classification method based on model-naive Bayesian |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7328169B2 (en) | Method and system for purchase-based segmentation | |
US8571919B2 (en) | System and method for identifying attributes of a population using spend level data | |
US20080086365A1 (en) | Method of analyzing credit card transaction data | |
CN112418932B (en) | Marketing information pushing method and device based on user tag | |
US20130325567A1 (en) | System and method for creating a virtual coupon | |
US20050149398A1 (en) | Media targeting system and method | |
US20110178849A1 (en) | System and method for matching merchants based on consumer spend behavior | |
US20110178841A1 (en) | System and method for clustering a population using spend level data | |
US20110178844A1 (en) | System and method for using spend behavior to identify a population of merchants | |
US20060122886A1 (en) | Media targeting system and method | |
CN116862592B (en) | Automatic push method for SOP private marketing information based on user behavior | |
Lipyanina et al. | Targeting Model of HEI Video Marketing based on Classification Tree. | |
Zheng et al. | A scalable purchase intention prediction system using extreme gradient boosting machines with browsing content entropy | |
Saragih et al. | Analysis of brand experience and brand satisfaction with brand loyalty through brand trust as a variable mediation | |
CN116308556A (en) | Advertisement pushing method and system based on Internet of things | |
CN103577472A (en) | Method and system for obtaining and presuming personal information as well as method and system for classifying and retrieving commodities | |
US20110178843A1 (en) | System and method for using spend behavior to identify a population of consumers that meet a specified criteria | |
Huseynov et al. | Behavioural segmentation analysis of online consumer audience in Turkey by using real e-commerce transaction data | |
CN111309913A (en) | Method for analyzing gender by name | |
CN116777562A (en) | Electronic commerce AI system based on big data | |
CN116797290A (en) | Intelligent advertisement delivery system and method thereof | |
CN111539782A (en) | Merchant information data processing method and system based on deep learning | |
Mohamad et al. | To what extent are credibility and attractiveness of social media influencer important in developing positive brand image and customer attitude? | |
KR102404247B1 (en) | Customer management system | |
CN116091171A (en) | Member statistics and management system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200619 |