CN112307744A - Method for judging gender of Chinese name based on multilayer perceptron - Google Patents


Info

Publication number
CN112307744A
Authority
CN
China
Prior art keywords
word
name
chinese
gender
chinese name
Prior art date
Legal status
Pending
Application number
CN202011204834.6A
Other languages
Chinese (zh)
Inventor
于江德
李学钰
王继鹏
李娜
翁晓茹
白香凝
Current Assignee
Anyang Normal University
Original Assignee
Anyang Normal University
Priority date
Filing date
Publication date
Application filed by Anyang Normal University
Priority to CN202011204834.6A
Publication of CN112307744A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/048 Activation functions
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

The invention belongs to the technical field of artificial intelligence and discloses a method for judging the gender of a Chinese name based on a multilayer perceptron. The method comprises the following steps: using word2vec on a vector training corpus to obtain initial character vectors and word vectors for the characters and words used in Chinese names; dividing a Chinese name corpus into a training corpus and a test corpus in a set proportion, and further dividing the training corpus into a training set and a validation set in a set proportion; constructing and training a multilayer perceptron model for judging gender from Chinese names; and inputting a Chinese name whose gender is to be judged, then performing gender judgment and subsequent statistical processing. The method judges a user's gender from the user's name alone, and such data are easy to obtain. The multilayer perceptron automatically learns the character-usage features that distinguish male and female Chinese names, so no manual feature engineering is needed, which saves a great deal of labor.

Description

Method for judging gender of Chinese name based on multilayer perceptron
Technical Field
The invention belongs to the technical field of artificial intelligence, and particularly relates to a method for judging gender of a Chinese name based on a multilayer perceptron.
Background
A user's gender is very important information in a network environment: in many scenarios, such as content push, online marketing, product recommendation and advertisement placement, users of different genders need to be treated differently. For example, online shopping is the choice of more and more consumers, and the online shopping behavior of male and female consumers differs greatly, so online shopping platforms need to differentiate between male and female users in marketing, product recommendation and advertisement delivery. An online shopping platform can easily obtain a user's real name but generally cannot easily obtain the user's gender. Can the gender be predicted from the name? The answer is yes, and this is the technical solution provided by the invention. Analysis shows that Chinese names are strongly gender-distinctive: people can usually tell from a stranger's name alone whether the person is male or female, with high accuracy.
Second Hand Information Technology Limited, in the patent document "User gender analysis method and apparatus" (application No. 201310526980.4, publication No. CN104598452A), discloses a method that judges a user's gender by analyzing the user's personal domain name. Using the personal domain names in a sample data set, the method counts, separately for each gender, the probability of each letter appearing at each position and of letter combinations appearing at several adjacent positions; these probabilities then serve as reference data for judging the gender of a user whose gender is unknown from that user's personal domain name. Beijing Heisui Line Robotics Research and Development Ltd., in the patent document "Sex recognition method, apparatus and electronic device" (application No. 201810900838.4, publication No. CN109190495A), discloses a gender recognition method in which a first recognition is performed by face recognition and a second recognition by thermal imaging. Suzhou Samsung Computer Co., Ltd., in "Gender identification method and apparatus for an intelligent device" (application No. 201711024078.7, publication No. CN107862263A), discloses a method that performs gender identification from an image of the target person above the shoulders. All three techniques must first acquire the user's personal domain name, an image of a body part or even a whole-body image, and then identify gender from the acquired data.
The paper "Gender determination method based on Chinese name character-usage features", published in the Journal of Shandong University in 2014, proposes judging the gender of a Chinese name with a naive Bayes classifier. That method requires statistical analysis of the characters used in names and of their combinations, and the manual feature engineering it involves is time-consuming and labor-intensive. Because a user's personal domain name or body image cannot be obtained in many scenarios, and manual feature engineering is costly, the invention provides a method for judging the gender of a Chinese name based on a multilayer perceptron, which can be used to determine users' gender.
Disclosure of Invention
The purpose of the invention is to provide a method that uses a multilayer perceptron to automatically learn the character-usage features of male and female Chinese names, without manual feature engineering, thereby saving a large amount of labor.
The technical scheme adopted by the invention is as follows:
A method for judging the gender of a Chinese name based on a multilayer perceptron comprises the following steps:
(1) using word2vec on a vector training corpus, obtain initial character vectors and word vectors for the characters and words in Chinese names, specifically:
(11) preprocess the training corpus segmented into single characters, build a character dictionary, and then obtain the character vectors of the corpus with word2vec;
(12) preprocess the word-segmented training corpus, build a word dictionary, and then obtain the word vectors of the segmented corpus with word2vec;
(2) divide the Chinese name corpus into a training corpus and a test corpus in a set proportion, and further divide the training corpus into a training set and a validation set in a set proportion;
(3) construct and train a multilayer perceptron model for judging gender from Chinese names, specifically:
(31) construct the multilayer perceptron model (the model structure is shown in FIG. 1): the leftmost layer is the input layer, which receives the data of one Chinese name or a batch of names. The input covers only the given name, not the surname, and consists of the initial character vectors of the first and second characters of the given name and the word vector of the combination of these two characters;
the middle of the multilayer perceptron consists of several hidden layers, which extract and compute features from the input name data;
the rightmost layer is the output layer, which outputs the probabilities, as judged by the model, that the input Chinese name belongs to a male and to a female; the gender of the name is determined from these probability values;
(32) train the multilayer perceptron model for judging gender from Chinese names, obtaining the character vectors of all characters in the Chinese name training corpus, the word vectors of the two-character combinations, and the weight parameters and corresponding bias terms of each layer;
(4) input a Chinese name whose gender is to be judged, and perform gender judgment and post-processing, specifically:
(41) build the input-layer data for the name by concatenating character and word vectors: the character vector of the first character of the given name, the character vector of the second character and the word vector of their combination are concatenated end to end, and the result is input into the multilayer perceptron model trained in step (3);
(42) run a forward pass of the trained multilayer perceptron model on the input data, output through a Sigmoid activation function the probabilities that the name belongs to a male and to a female, and, according to these two probability values, output the judgment result for the name: male or female.
Further, in step (11), each character in the character dictionary is assigned a serial number starting from 1; the number 0 is reserved for characters that do not appear in the character dictionary.
Further, in step (12), each word in the word dictionary is assigned a serial number starting from 1; the number 0 is reserved for words that do not appear in the word dictionary.
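The numbering scheme of steps (11) and (12) can be sketched in a few lines of Python. This is a minimal sketch, assuming serial numbers are handed out in order of first appearance (the patent does not say how dictionary entries are ordered); `build_index` and `lookup` are hypothetical helper names.

```python
from typing import Dict, Iterable


def build_index(items: Iterable[str]) -> Dict[str, int]:
    """Number dictionary entries from 1 in order of first appearance; 0 stays reserved."""
    index: Dict[str, int] = {}
    for item in items:
        if item not in index:
            index[item] = len(index) + 1  # first entry gets 1, so 0 is never assigned
    return index


def lookup(index: Dict[str, int], item: str) -> int:
    """Serial number of item, or the reserved 0 if it is not in the dictionary."""
    return index.get(item, 0)
```

The same two functions would serve both the character dictionary of step (11), with single characters as items, and the word dictionary of step (12), with segmented words as items.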
Further, in step (2), each row of the Chinese name corpus has two columns: the first column is the person's full name, including the surname, and the second column is the person's gender, male or female.
Further, in step (31), the input-layer data is obtained by concatenating character and word vectors: looking up the initial character vectors and word vectors obtained in step (1) gives the character vectors of the first and second characters of the input given name and the word vector of their combination, and these three vectors (the character vector of the first character, the character vector of the second character and the word vector of the combination) are concatenated end to end to form the input-layer data.
Further, when the given name input to the input layer is a single character, the first character is treated as "NULL", which is assigned its own character vector, and the second character is the single character itself.
Further, when the given name input to the input layer has three or more characters, the first character is the first character of the given name and the second character is its last character.
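The character-selection rules in the paragraphs above can be sketched as follows. The patent does not say how the surname is removed from the full name stored in the corpus, so the hypothetical `strip_surname` helper simply assumes the surname length is known (1 for most surnames, 2 for compound surnames); `name_chars` is likewise an illustrative name.

```python
from typing import Tuple

NULL = "<NULL>"  # placeholder first character for single-character given names


def strip_surname(full_name: str, surname_len: int = 1) -> str:
    """Hypothetical helper: drop a surname whose length is known in advance."""
    return full_name[surname_len:]


def name_chars(given_name: str) -> Tuple[str, str]:
    """Map a given name (surname already removed) to its (first, second) characters.

    1 character : first is the NULL placeholder, second is that character.
    2 characters: the two characters in order.
    3 or more   : the first character and the last character.
    """
    if not given_name:
        raise ValueError("given name must contain at least one character")
    if len(given_name) == 1:
        return NULL, given_name
    return given_name[0], given_name[-1]
```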
Further, if no word vector for the combination of the first and second characters is found in step (1), the combination's vector is the average of the two character vectors.
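A sketch of this fallback rule, assuming the character and word vectors are held in plain dictionaries of NumPy arrays (for example, copied out of gensim's `KeyedVectors`). `combo_vector` is a hypothetical name, and characters with no character vector at all are not handled here; the patent only reserves dictionary index 0 for them.

```python
from typing import Mapping

import numpy as np


def combo_vector(first: str, second: str,
                 char_vecs: Mapping[str, np.ndarray],
                 word_vecs: Mapping[str, np.ndarray]) -> np.ndarray:
    """Word vector of first+second, or the mean of the two character vectors if it is missing."""
    word = first + second
    if word in word_vecs:
        return word_vecs[word]
    return (char_vecs[first] + char_vecs[second]) / 2.0
```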
Further, in step (31), the number of hidden layers and the number of neurons in each layer can be set according to the Chinese name training data, and the activation function of each hidden layer is ReLU.
Further, in step (31), because judging gender from a Chinese name is a binary classification problem, the output layer uses the Sigmoid function as its activation function.
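The forward computation described above (configurable hidden layers with ReLU, Sigmoid output, decision from the two probabilities) can be sketched in NumPy. This sketch assumes a single output unit giving P(male), with P(female) = 1 - P(male); the patent does not state whether one or two Sigmoid units are used. The He-style random initialization is also an assumption; in practice the weights would come from the training in step (32).

```python
from typing import List, Sequence, Tuple

import numpy as np

Layer = Tuple[np.ndarray, np.ndarray]  # (weight matrix, bias vector)


def init_mlp(sizes: Sequence[int], seed: int = 0) -> List[Layer]:
    """He-style random weights and zero biases for layer sizes such as [768, 128, 64, 64, 1]."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def forward(layers: List[Layer], x: np.ndarray) -> np.ndarray:
    """ReLU after every hidden layer, Sigmoid on the output layer; returns P(male) per row."""
    h = x
    for W, b in layers[:-1]:
        h = np.maximum(0.0, h @ W + b)
    W, b = layers[-1]
    return 1.0 / (1.0 + np.exp(-(h @ W + b)))


def judge(p_male: float) -> str:
    """Pick the larger of P(male) and P(female) = 1 - P(male)."""
    return "male" if p_male >= 1.0 - p_male else "female"
```

With one output unit, comparing the two probabilities reduces to a 0.5 threshold on P(male); ties are assigned to "male" here, an arbitrary choice.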
In summary, the invention has the following beneficial effect: with the multilayer-perceptron-based method for judging gender from Chinese names, a user's gender can be obtained from the user's name alone.
Drawings
FIG. 1 is a schematic diagram of a multi-layered perceptron model structure according to the present invention;
FIG. 2 is a flow chart of an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Examples
The characters of Chinese names are strongly gender-distinctive. A person's name is shaped by many cultural factors, such as history, era, society, ethnicity and family. Chinese names carry extremely rich cultural connotations: they condense thousands of years of Huaxia (Chinese) culture, preserve the wisdom and spirit of the Chinese nation, and fully display the profound background of Chinese civilization. Because Chinese names inherit this rich cultural content, their characters are strongly gender-distinctive, and one can usually tell from a name whether the person is male or female. Male names tend to express strength and vigor, ambition and career prospects: the hope that a son will stand firm like a mountain is often expressed with characters such as "mountain" and "peak"; the hope that he will be as tough as diamond, with characters such as Xin, Lei and Gang ("firm"); and the hope that he will build a career and achieve success, with characters such as Cheng, Gong, Zi and Jian. Female names express the wish for beauty like flowers and the moon, a gentle, water-like temperament and fine looks, so they often use characters meaning "plum", "osmanthus", "fragrant", "orchid", "pure", "elegant", "beautiful", "delicate", "pearl" and "jade".
The invention uses a multilayer perceptron to automatically extract gender-distinguishing character features from a Chinese name corpus. Gender judgment from Chinese names can also be realized with traditional machine learning models, such as a naive Bayes classifier or a maximum entropy model, but traditional machine learning requires extensive manual feature engineering, which is time-consuming and labor-intensive. The multilayer perceptron adopted by the invention is essentially a multilayer fully connected neural network. Such networks, loosely modeled on the way the human brain analyzes and learns, can automatically learn feature representations from data to complete a specific task, often matching or even exceeding human recognition accuracy, and are therefore widely used in artificial intelligence. The basic structure of a multilayer perceptron comprises an input layer, one or more hidden layers and an output layer.
Based on the above, this embodiment provides a method for judging the gender of a Chinese name based on a multilayer perceptron, comprising the following steps:
(1) Use word2vec on a vector training corpus to obtain initial character vectors and word vectors for the characters of Chinese names. The training corpus is a daily-newspaper corpus covering the full years of 1998 and 2000. It is first preprocessed, mainly to remove English letters, Arabic numerals, pinyin and other non-Chinese characters, and then:
(11) build a character dictionary from the preprocessed corpus and obtain character vectors with word2vec. Specifically, the corpus is segmented into single characters and preprocessed, and the word2vec tool in the gensim open-source library is trained on it, with the vector dimension set to 256 and min_count set to 1, which yields a low-dimensional vector representation of every Chinese character in the corpus. Each character in the character dictionary is assigned a serial number starting from 1; the number 0 is reserved for characters that do not appear in the dictionary;
(12) build a word dictionary from the word-segmented, preprocessed corpus and obtain word vectors with word2vec. Specifically, the word2vec tool in the gensim library is trained on the segmented and preprocessed corpus, with the vector dimension set to 256 and min_count set to 5, which yields low-dimensional vector representations of all words occurring at least 5 times in the corpus. Each word in the word dictionary is assigned a serial number starting from 1; the number 0 is reserved for words that do not appear in the dictionary;
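The preprocessing and training of step (1) can be sketched with gensim's `Word2Vec` class (gensim 4.x, where the dimension parameter is named `vector_size`). Only the dimension (256) and `min_count` (1 for characters, 5 for words) come from the text. The character range in the regular expression (basic CJK Unified Ideographs only), the assumption that the word corpus arrives already segmented, and leaving every other Word2Vec parameter at its default are assumptions of this sketch.

```python
import re
from typing import List

# Keep only basic CJK Unified Ideographs (U+4E00 to U+9FFF); this strips Latin
# letters (including pinyin), Arabic digits, punctuation and other non-Chinese text.
NON_HAN = re.compile(r"[^\u4e00-\u9fff]+")


def to_char_sentence(line: str) -> List[str]:
    """Step (11) input: one corpus line, cleaned and split into single characters."""
    return list(NON_HAN.sub("", line))


def to_word_sentence(tokens: List[str]) -> List[str]:
    """Step (12) input: an already word-segmented line, cleaned token by token."""
    cleaned = (NON_HAN.sub("", tok) for tok in tokens)
    return [w for w in cleaned if w]  # drop tokens that were entirely non-Chinese


def train_vectors(sentences: List[List[str]], min_count: int):
    """Train 256-dimensional vectors; min_count=1 for characters, 5 for words."""
    from gensim.models import Word2Vec  # lazy import: the helpers above need only the stdlib
    model = Word2Vec(sentences=sentences, vector_size=256, min_count=min_count)
    return model.wv  # KeyedVectors: wv[token] is a 256-dimensional NumPy array
```

For example, `train_vectors([to_char_sentence(s) for s in lines], min_count=1)` would give the character vectors and `train_vectors([to_word_sentence(t) for t in segmented_lines], min_count=5)` the word vectors, where `lines` and `segmented_lines` are hypothetical names for the raw and pre-segmented corpus.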
(2) Divide the Chinese name corpus into a training corpus and a test corpus in the ratio 9:1, and further divide the training corpus into a training set and a validation set in the ratio 5:1. Each row of the Chinese name corpus has two columns: the first is the person's full name, including the surname, and the second is the person's gender, male or female;
although the first column of the corpus contains the surname, the technical solution uses only the given name. By number of characters, Chinese given names can be single-character, two-character, three-character or longer. Statistics show that two-character given names dominate, single-character names come second, and names of three or more characters are rare. In this embodiment, the first and second characters of the given name are denoted char₁ and char₂ respectively.
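The two-stage split of step (2) works out to 75% training, 15% validation and 10% test data (0.9 × 5/6 and 0.9 × 1/6). A minimal sketch, assuming the (full name, gender) rows are shuffled with a fixed seed before splitting; the patent states only the ratios, not whether the split is random or stratified by gender. `split_corpus` is a hypothetical name.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

Row = TypeVar("Row")


def split_corpus(rows: Sequence[Row], seed: int = 42) -> Tuple[List[Row], List[Row], List[Row]]:
    """Split 9:1 into (train + validation) and test, then split the first part 5:1."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)      # assumed: random split with a fixed seed
    n_test = len(shuffled) // 10               # 9:1, so one tenth is held out for testing
    test, rest = shuffled[:n_test], shuffled[n_test:]
    n_val = len(rest) // 6                     # 5:1, so one sixth of the rest is for validation
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test
```

For a corpus of 600 names this gives 450 training, 90 validation and 60 test rows.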
(3) Construct and train a multilayer perceptron model for judging gender from Chinese names, specifically:
(31) construct the multilayer perceptron model; the model structure is shown in FIG. 1. The leftmost layer is the input layer, which receives the data of one Chinese name or a batch of names. The input covers only the given name, not the surname, and consists of the initial character vectors of char₁ and char₂ (the first and second characters of the given name) and the word vector of their combination char₁char₂;
the input-layer data is obtained by concatenating character and word vectors. Specifically, looking up the initial character vectors and word vectors obtained in step (1) gives the character vectors of the first and second characters of the input given name and the word vector of their combination; these three vectors are concatenated end to end to form the input-layer data. If no word vector for the combination is found in step (1), the combination's vector is the average of the two character vectors;
for example, for the name "Li Zhiqiang", the given name is "Zhiqiang", so the first character, the second character and their combination are "zhi", "qiang" and "zhiqiang" respectively. Looking up the vectors trained in step (1) gives the character vectors of "zhi" and "qiang" and the word vector of "zhiqiang", and the three vectors are concatenated end to end to form the input-layer data. If the word vectors trained in step (1) contain no vector for "zhiqiang", its vector is the sum of the character vectors of "zhi" and "qiang" divided by 2;
when the given name input to the input layer is a single character, the first character is treated as "<NULL>", which is assigned its own character vector, and the second character is the single character itself;
for example, for the name "Li Na", the given name "Na" is a single-character name, so the first character, the second character and their combination are "<NULL>", "na" and "<NULL>na" respectively. The character vector assigned to "<NULL>" is a 256-dimensional all-zero vector, and looking up the character vectors trained in step (1) gives the vector of "na". The word vectors trained in step (1) obviously contain no vector for "<NULL>na", so its vector is the sum of the vectors of "<NULL>" and "na" divided by 2. The three vectors are then concatenated end to end to form the input-layer data;
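Putting the input-layer rules together, the following sketch turns one given name into a 3 × 256 = 768-dimensional input vector. It assumes the vectors are stored in dictionaries of NumPy arrays; `name_input` is a hypothetical name, the all-zero `<NULL>` vector follows the "Li Na" example, and characters missing from the character vectors are not handled.

```python
from typing import Mapping

import numpy as np

NULL = "<NULL>"  # stand-in first character for single-character given names


def name_input(given_name: str,
               char_vecs: Mapping[str, np.ndarray],
               word_vecs: Mapping[str, np.ndarray],
               dim: int = 256) -> np.ndarray:
    """Concatenate vec(first char), vec(second char) and vec(combination) into 3 * dim values."""
    if len(given_name) == 1:
        first, second = NULL, given_name                 # single character: first is <NULL>
    else:
        first, second = given_name[0], given_name[-1]    # two chars, or first and last of 3+

    def char_vec(c: str) -> np.ndarray:
        # <NULL> is given an all-zero vector, as in the "Li Na" example.
        return np.zeros(dim) if c == NULL else np.asarray(char_vecs[c], dtype=float)

    v1, v2 = char_vec(first), char_vec(second)
    combo = first + second
    # Fallback for combinations with no word vector: average of the two character vectors.
    v12 = np.asarray(word_vecs[combo], dtype=float) if combo in word_vecs else (v1 + v2) / 2.0
    return np.concatenate([v1, v2, v12])
```

With toy 2-dimensional vectors, "Zhiqiang" yields [vec(zhi), vec(qiang), vec(zhiqiang)], while "Na" yields [0, 0, vec(na), vec(na) / 2], because the missing "<NULL>na" vector is the average of a zero vector and vec(na).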
the middle of the multilayer perceptron consists of 3 hidden layers with 128, 64 and 64 neurons respectively, which extract and compute features from the input name data; the activation function of each hidden layer is ReLU;
the rightmost layer is the output layer, which outputs the probabilities, as judged by the model, that the input Chinese name belongs to a male and to a female; the gender of the name is determined from these probability values. Because judging gender from a Chinese name is a binary classification problem, the output layer uses the Sigmoid function as its activation function;
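With 256-dimensional vectors the input layer has 3 × 256 = 768 units, so the number of weights and biases in this embodiment's layer stack can be checked by hand. The count below assumes a single Sigmoid output unit; a two-unit output layer would add 65 parameters, giving 110 978.

```latex
\begin{aligned}
d_{\text{in}} &= 3 \times 256 = 768 \\
N &= \underbrace{(768 \cdot 128 + 128)}_{98\,432}
   + \underbrace{(128 \cdot 64 + 64)}_{8\,256}
   + \underbrace{(64 \cdot 64 + 64)}_{4\,160}
   + \underbrace{(64 \cdot 1 + 1)}_{65} \\
  &= 110\,913
\end{aligned}
```

The character and word vectors that step (32) also updates come on top of this count.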
(32) Train the multilayer perceptron model for judging gender from Chinese names, obtaining the character vectors of all characters in the Chinese name training corpus, the word vectors of the two-character combinations, and the weight parameters and corresponding bias terms of each layer;
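Step (32) can be sketched in NumPy as mini-batch gradient descent on binary cross-entropy. Several simplifications here are assumptions, not the patent's: the input vectors are held fixed (the patent also updates the character and word vectors during training), there is a single Sigmoid output unit with label 1 for male and 0 for female, and the learning rate, epoch count and batch size are arbitrary. Only the default hidden sizes (128, 64, 64) follow the embodiment; `train_mlp` and `predict` are hypothetical names.

```python
import numpy as np


def predict(Ws, bs, X):
    """Forward pass: ReLU hidden layers, Sigmoid output giving P(male) for each row of X."""
    h = X
    for W, b in zip(Ws[:-1], bs[:-1]):
        h = np.maximum(0.0, h @ W + b)
    return 1.0 / (1.0 + np.exp(-(h @ Ws[-1] + bs[-1])))


def train_mlp(X, y, sizes=(128, 64, 64), lr=0.05, epochs=100, batch=32, seed=0):
    """Mini-batch gradient descent on mean binary cross-entropy (y: 1 = male, 0 = female)."""
    rng = np.random.default_rng(seed)
    dims = [X.shape[1], *sizes, 1]
    Ws = [rng.normal(0.0, np.sqrt(2.0 / a), size=(a, b)) for a, b in zip(dims[:-1], dims[1:])]
    bs = [np.zeros(b) for b in dims[1:]]
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    losses = []
    for _ in range(epochs):
        order = rng.permutation(len(X))
        for start in range(0, len(X), batch):
            idx = order[start:start + batch]
            # Forward pass, keeping each layer's input activation for backpropagation.
            acts = [X[idx]]
            for W, b in zip(Ws[:-1], bs[:-1]):
                acts.append(np.maximum(0.0, acts[-1] @ W + b))
            p = 1.0 / (1.0 + np.exp(-(acts[-1] @ Ws[-1] + bs[-1])))
            # For Sigmoid + cross-entropy, d(loss)/d(output pre-activation) = (p - y) / batch size.
            delta = (p - y[idx]) / len(idx)
            for layer in range(len(Ws) - 1, -1, -1):
                grad_W = acts[layer].T @ delta
                grad_b = delta.sum(axis=0)
                if layer > 0:
                    # Back through this layer's weights (before updating them) and the previous ReLU.
                    delta = (delta @ Ws[layer].T) * (acts[layer] > 0)
                Ws[layer] -= lr * grad_W
                bs[layer] -= lr * grad_b
        p_all = np.clip(predict(Ws, bs, X), 1e-12, 1 - 1e-12)
        losses.append(float(-np.mean(y * np.log(p_all) + (1 - y) * np.log(1 - p_all))))
    return Ws, bs, losses
```

A call such as `train_mlp(X_train, y_train)`, where each row of the hypothetical `X_train` is a 768-dimensional name input, returns the weights, biases and per-epoch training loss; `predict(Ws, bs, X_val)` on the validation split could then guide when to stop training.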
(4) Input a Chinese name whose gender is to be judged, and perform gender judgment and post-processing, specifically:
(41) build the input-layer data for the name by concatenating character and word vectors: the character vector of the first character of the given name, the character vector of the second character and the word vector of their combination are concatenated end to end, and the result is input into the multilayer perceptron model trained in step (3);
(42) run a forward pass of the trained multilayer perceptron model on the input data, output through a Sigmoid activation function the probabilities that the name belongs to a male and to a female, and, according to these two probability values, output the judgment result for the name: male or female.
For example: input "Li Zhiqiang"; the gender judgment result is "male";
for example: input "Li Na"; the gender judgment result is "female".
The above descriptions are only preferred embodiments of the invention and are not intended to limit it; any modifications, equivalent substitutions and improvements made within the spirit and principle of the invention shall fall within its scope of protection.

Claims (10)

1. A method for judging the gender of a Chinese name based on a multilayer perceptron, characterized by comprising the following steps:
(1) using word2vec on a vector training corpus, obtaining initial character vectors and word vectors for the characters and words in Chinese names, specifically:
(11) preprocessing the training corpus segmented into single characters, building a character dictionary, and then obtaining the character vectors of the corpus with word2vec;
(12) preprocessing the word-segmented training corpus, building a word dictionary, and then obtaining the word vectors of the segmented corpus with word2vec;
(2) dividing the Chinese name corpus into a training corpus and a test corpus in a set proportion, and further dividing the training corpus into a training set and a validation set in a set proportion;
(3) constructing and training a multilayer perceptron model for judging gender from Chinese names, specifically:
(31) constructing the multilayer perceptron model: the leftmost layer is the input layer, which receives the data of one Chinese name or a batch of names; the input covers only the given name, not the surname, and consists of the initial character vectors of the first and second characters of the given name and the word vector of the combination of these two characters;
the middle of the multilayer perceptron consists of several hidden layers, which extract and compute features from the input name data;
the rightmost layer is the output layer, which outputs the probabilities, as judged by the model, that the input Chinese name belongs to a male and to a female; the gender of the name is determined from these probability values;
(32) training the multilayer perceptron model for judging gender from Chinese names, obtaining the character vectors of all characters in the Chinese name training corpus, the word vectors of the two-character combinations, and the weight parameters and corresponding bias terms of each layer;
(4) inputting a Chinese name whose gender is to be judged, and performing gender judgment and post-processing, specifically:
(41) building the input-layer data for the name by concatenating character and word vectors: the character vector of the first character of the given name, the character vector of the second character and the word vector of their combination are concatenated end to end, and the result is input into the multilayer perceptron model trained in step (3);
(42) running a forward pass of the trained multilayer perceptron model on the input data, outputting through a Sigmoid activation function the probabilities that the name belongs to a male and to a female, and, according to these two probability values, outputting the judgment result for the name: male or female.
2. The method for determining gender of Chinese name based on multi-layered perceptron as claimed in claim 1, wherein in step (11), each word in the dictionary of words is assigned a sequence number, the sequence number starting with 1 and the number 0 reserved to indicate that no word is present in the dictionary of words.
3. The method for determining gender of Chinese name based on multi-layered perceptron as claimed in claim 1, wherein in step (12), each word in the dictionary of words is assigned a sequence number, the sequence number starting with 1 and the number 0 being reserved to indicate that no word is present in the dictionary of words.
4. The method for determining the gender of a Chinese name based on a multilayer perceptron as claimed in claim 1, wherein in step (2), each row of the Chinese name corpus contains two columns: the first column is a person's name, including the surname, and the second column is that person's gender, male or female.
5. The method for determining the gender of a Chinese name based on a multilayer perceptron as claimed in claim 1, wherein in step (31), the Chinese name data input at the input layer is obtained by concatenating character and word vectors; specifically, the initial character vectors and word vectors obtained in step (1) are queried to obtain the character vectors of the first and second characters of the input Chinese name and the word vector of the combination of the first and second characters, and these three vectors are concatenated end to end to form the data input at the input layer.
6. The method as claimed in claim 5, wherein, when the Chinese name input at the input layer consists of a single character, the first character of the name is taken to be "NULL", which is assigned its own character vector, and the second character is that single character.
7. The method as claimed in claim 5, wherein, when the Chinese name input at the input layer consists of three or more characters, the first character of the name is used as the first character and the last character of the name is used as the second character.
8. The method as claimed in claim 5, wherein, if the word vector of the combination of the first and second characters is not found among the vectors obtained in step (1), the average of the character vectors of the first and second characters is used as the combined word vector.
9. The method for determining the gender of a Chinese name based on a multilayer perceptron as claimed in claim 1, wherein in step (31), the number of hidden layers and the number of neurons in each layer can be set according to the Chinese name training data, and the activation function of each hidden layer is the ReLU function.
10. The method for determining the gender of a Chinese name based on a multilayer perceptron as claimed in claim 1, wherein in step (31), the output layer uses the Sigmoid function as its activation function, because determining gender from a Chinese name is a binary classification problem.
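The data conventions in claims 2 to 4 can be illustrated with a minimal Python sketch. It is not the applicants' implementation: the whitespace separator, the literal labels `male`/`female`, and the helper names `parse_corpus`, `build_dictionary` and `lookup` are hypothetical. It only shows sequence numbers starting at 1, with 0 reserved for entries missing from the dictionary; the same numbering would apply to the word dictionary.

```python
# Minimal sketch of claims 2-4 (hypothetical helper names; not the patent's code).
# Assumptions: each corpus row is "<name> <gender>" separated by whitespace,
# and the gender column uses the literal labels "male" / "female".

UNKNOWN_ID = 0  # index 0 is reserved for entries absent from the dictionary


def parse_corpus(lines):
    """Return (name, gender) pairs from two-column corpus rows."""
    samples = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or parts[1] not in ("male", "female"):
            continue  # skip blank or malformed rows
        samples.append((parts[0], parts[1]))
    return samples


def build_dictionary(items):
    """Give each distinct item a sequence number starting at 1, in first-seen order."""
    index = {}
    for item in items:
        if item not in index:
            index[item] = len(index) + 1
    return index


def lookup(index, item):
    """Return the item's sequence number, or UNKNOWN_ID if it is not in the dictionary."""
    return index.get(item, UNKNOWN_ID)


# Toy corpus; names and labels are illustrative only.
corpus = ["张伟 male", "王芳 female", "李强 male", "malformed row here"]
samples = parse_corpus(corpus)
char_dict = build_dictionary(ch for name, _ in samples for ch in name)
print(len(samples), len(char_dict))  # 3 6
print(lookup(char_dict, "芳"), lookup(char_dict, "娜"))  # 4 0
```

Here 芳 receives 4 because it is the fourth distinct character seen, while 娜 does not occur in the toy corpus and maps to the reserved 0.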
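Claims 5 to 8 describe how the input-layer vector is assembled. A minimal sketch, assuming vectors are plain Python lists and that `char_vecs` contains entries for the `"NULL"` placeholder and for a hypothetical `"<unk>"` key standing in for characters outside the dictionary (the claims reserve index 0 for these but do not say which vector it maps to). The function names are illustrative, not taken from the patent.

```python
# Minimal sketch of claims 5-8 (hypothetical names; not the patent's code).
# Assumptions: vectors are plain lists of floats; char_vecs holds an entry for
# the "NULL" placeholder and for a hypothetical "<unk>" key used for characters
# that are not in the character dictionary.

NULL = "NULL"   # claim 6: stand-in first character for one-character names
UNK = "<unk>"   # hypothetical key for out-of-dictionary characters


def pick_characters(name):
    """Return the (first, second) characters fed to the model."""
    if not name:
        raise ValueError("empty name")
    if len(name) == 1:
        return NULL, name        # claim 6
    return name[0], name[-1]     # two characters, or claim 7 for three or more


def char_vector(ch, char_vecs):
    """Look up a character vector, falling back to the assumed <unk> entry."""
    return char_vecs.get(ch, char_vecs[UNK])


def build_input(name, char_vecs, word_vecs):
    first, second = pick_characters(name)
    v1 = char_vector(first, char_vecs)
    v2 = char_vector(second, char_vecs)
    combo = word_vecs.get(first + second)
    if combo is None:
        # claim 8: average the two character vectors when the combination is missing
        combo = [(a + b) / 2 for a, b in zip(v1, v2)]
    # claim 5: concatenate the three vectors end to end
    return v1 + v2 + combo


# Toy 2-dimensional vectors, for illustration only.
char_vecs = {NULL: [0.0, 0.0], UNK: [0.0, 0.0],
             "子": [1.0, 0.0], "涵": [0.0, 1.0], "伟": [0.5, 0.5]}
word_vecs = {"子涵": [0.2, 0.8]}

print(build_input("子涵", char_vecs, word_vecs))  # [1.0, 0.0, 0.0, 1.0, 0.2, 0.8]
print(build_input("伟", char_vecs, word_vecs))    # [0.0, 0.0, 0.5, 0.5, 0.25, 0.25]
```

For the one-character name 伟, the combination "NULL伟" has no word vector, so the claim 8 fallback averages the "NULL" and 伟 vectors.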
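Claims 9 and 10, together with step (42), describe a forward pass through ReLU hidden layers and a Sigmoid output. The sketch below assumes a single Sigmoid output unit read as the probability of "male", with the probability of "female" taken as its complement; the claims state that both probabilities are output but do not fix this encoding. The weights are toy values rather than trained parameters, and `forward` and `predict_gender` are hypothetical names.

```python
import math

# Minimal sketch of claims 9-10 and step (42) (hypothetical names; toy weights).
# Assumption: the output layer is one Sigmoid unit read as P(male), and
# P(female) = 1 - P(male).


def dense(x, weights, bias):
    """Affine layer: one weight row and one bias per output neuron."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(z):
    # Numerically stable logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(x, hidden_layers, output_layer):
    """Any number of ReLU hidden layers (claim 9), then one Sigmoid unit (claim 10)."""
    h = x
    for weights, bias in hidden_layers:
        h = relu(dense(h, weights, bias))
    weights, bias = output_layer
    (z,) = dense(h, weights, bias)
    return sigmoid(z)


def predict_gender(x, hidden_layers, output_layer):
    p_male = forward(x, hidden_layers, output_layer)
    p_female = 1.0 - p_male
    return ("male" if p_male >= p_female else "female"), p_male, p_female


# Toy network: 4 inputs -> 2 ReLU units -> 1 Sigmoid unit.
hidden = [([[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]], [0.0, 0.0])]
output = ([[2.0, -1.0]], [0.0])
label, p_male, p_female = predict_gender([1.0, 0.0, 0.0, 1.0], hidden, output)
print(label, round(p_male, 4), round(p_female, 4))  # male 0.7311 0.2689
```

Because the two probabilities sum to 1 under this assumption, comparing them is equivalent to testing whether the male probability is at least 0.5; the sketch resolves an exact tie as "male", which is an arbitrary choice.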
CN202011204834.6A 2020-11-02 2020-11-02 Method for judging gender of Chinese name based on multilayer perceptron Pending CN112307744A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011204834.6A CN112307744A (en) 2020-11-02 2020-11-02 Method for judging gender of Chinese name based on multilayer perceptron

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011204834.6A CN112307744A (en) 2020-11-02 2020-11-02 Method for judging gender of Chinese name based on multilayer perceptron

Publications (1)

Publication Number Publication Date
CN112307744A true CN112307744A (en) 2021-02-02

Family

ID=74333737

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011204834.6A Pending CN112307744A (en) 2020-11-02 2020-11-02 Method for judging gender of Chinese name based on multilayer perceptron

Country Status (1)

Country Link
CN (1) CN112307744A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113312905A (en) * 2021-06-23 2021-08-27 Beijing Youzhuju Network Technology Co., Ltd. Information prediction method, information prediction device, storage medium and electronic equipment

Citations (2)

Publication number Priority date Publication date Assignee Title
CN103389973A (en) * 2013-07-23 2013-11-13 Anyang Normal University Method for judging gender by utilizing Chinese name
CN107391603A (en) * 2017-06-30 2017-11-24 Beijing Qihoo Technology Co., Ltd. User's portrait method for building up and device for mobile terminal

Non-Patent Citations (1)

Title
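Yu Jiangde et al.: "Gender determination method based on character-usage features of Chinese personal names", Journal of Shandong University (Engineering Science) *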

Similar Documents

Publication Publication Date Title
CN110287320B (en) Deep learning multi-classification emotion analysis model combining attention mechanism
CN109948165B (en) Fine granularity emotion polarity prediction method based on mixed attention network
CN106980683B (en) Blog text abstract generating method based on deep learning
CN107818138B (en) Case law regulation recommendation method and system
CN107168945B (en) Bidirectional cyclic neural network fine-grained opinion mining method integrating multiple features
WO2018120899A1 (en) Trademark inquiry result proximity evaluating and sorting method and device
CN107391485A (en) Entity recognition method is named based on the Korean of maximum entropy and neural network model
CN109460737A (en) A kind of multi-modal speech-emotion recognition method based on enhanced residual error neural network
CN108492118A (en) The two benches abstracting method of text data is paid a return visit in automobile after-sale service quality evaluation
CN110096575B (en) Psychological portrait method facing microblog user
CN110196906A (en) Towards financial industry based on deep learning text similarity detection method
CN107315738A (en) A kind of innovation degree appraisal procedure of text message
CN112905739B (en) False comment detection model training method, detection method and electronic equipment
CN110335653A (en) Non-standard case history analytic method based on openEHR case history format
CN110750646B (en) Attribute description extracting method for hotel comment text
Li et al. Dating ancient paintings of Mogao Grottoes using deeply learnt visual codes
CN114937182B (en) Image emotion distribution prediction method based on emotion wheel and convolutional neural network
Ruwa et al. Affective visual question answering network
CN116052858A (en) Intelligent diagnosis guiding method based on BERT and feature fusion
CN112307744A (en) Method for judging gender of Chinese name based on multilayer perceptron
Inunganbi et al. Recognition of handwritten Meitei Mayek script based on texture feature
CN117236338B (en) Named entity recognition model of dense entity text and training method thereof
CN116779177A (en) Endocrine disease classification method based on unbiased mixed tag learning
CN108717450B (en) Analysis algorithm for emotion tendentiousness of film comment
CN113239277A (en) Probability matrix decomposition recommendation method based on user comments

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination