CN113779191A - User identification method based on user joint information super vector and joint information model - Google Patents

User identification method based on user joint information super vector and joint information model

Info

Publication number
CN113779191A
Authority
CN
China
Prior art keywords
user
information
vector
joint
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110839516.5A
Other languages
Chinese (zh)
Other versions
CN113779191B (en)
Inventor
刘巍巍
李亚楠
甘颖新
祁正伟
姜卫军
黄玉彬
刘建中
常海峰
柏华英
成静
马慧敏
谌秋实
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Pla 61623
Original Assignee
Pla 61623
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pla 61623 filed Critical Pla 61623
Priority to CN202110839516.5A priority Critical patent/CN113779191B/en
Publication of CN113779191A publication Critical patent/CN113779191A/en
Application granted granted Critical
Publication of CN113779191B publication Critical patent/CN113779191B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3347Query execution using vector based model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/335Filtering based on additional data, e.g. user or group profiles
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/10Office automation; Time management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/26Government or public services

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Tourism & Hospitality (AREA)
  • Data Mining & Analysis (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a user identification method based on a user joint information supervector and a joint information model, which comprises the following steps: mapping user information from a one-dimensional feature space to a high-dimensional feature space and constructing a user joint information feature supervector; constructing a user extended joint information feature supervector from the various names and forms of address under which the user information appears; in the training stage, training on the user joint information feature supervectors to obtain a user joint information model; and in the identification stage, performing model matching on an input user joint information feature supervector, or converting an input user extended joint information feature supervector into a user joint information feature supervector before model matching, and identifying the user according to the matching result. The invention can find a user more accurately and quickly when the user's name, unit, address and other information appear under multiple names and forms of address, and offers very high real-time performance.

Description

User identification method based on user joint information super vector and joint information model
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a user identification method based on a user joint information supervector and a joint information model, applied to an artificial-intelligence directory assistance robot.
Background
The user information models commonly used at present are generally built on commercial customer information, that is, basic data about customers such as customer preferences, customer segments, customer demands and customer contact details. Such information mainly comes from customers' registration data and from basic customer information collected by an enterprise's operation and management systems, and its most important element is a dynamic commercial user profile. Its purpose is to accurately identify target customer groups, expand transaction volume, and maximize and optimize market transactions.
An existing commercial user profile only needs to find a group that matches the target with high probability: its accuracy requirements are low, its queries have no real-time requirement, and the customer information may be processed, mined and extracted.
A user information view for government offices differs from a commercial information view in that it is more concerned with static data, the user information is obtained from user files that are not published externally, and high accuracy and real-time performance are required, so a new solution is needed.
Disclosure of Invention
The object of the invention is to provide a user identification method based on a user joint information supervector and a joint information model for the artificial-intelligence directory assistance (number-searching) robot of government office departments. It makes it convenient to find the department or individual to be contacted and is suited to settings with many people, low personnel mobility, and easily obtainable personnel information. The invention can accurately find the contact information of the department or individual the user is looking for, has high real-time performance, and can find a user more accurately and quickly when the user's name, unit and address appear under multiple names and forms of address.
To solve the above technical problem, an embodiment of the present invention provides the following solutions:
a user identification method based on a user joint information super vector and a joint information model comprises the following steps:
S1, mapping user information from a one-dimensional feature space to a high-dimensional feature space, and constructing a user joint information feature supervector;
S2, constructing a user extended joint information feature supervector from the various names and forms of address under which the user information appears;
S3, in the training stage, training on the user joint information feature supervectors to obtain a user joint information model;
and S4, in the identification stage, performing model matching on the input user joint information feature supervector, or converting the input user extended joint information feature supervector into a user joint information feature supervector before model matching, and performing user identification according to the matching result.
Preferably, in step S1, the user joint information feature supervector V(x_i) of user x_i is calculated as the concatenation (cascade) of the component vectors of user x_i:

V(x_i) = [v_name(x_i), v_unit(x_i), v_post(x_i), v_rank(x_i), v_gender(x_i), v_tel(x_i), v_addr(x_i), v_code(x_i), v_vp(x_i), v_auth(x_i), v_time(x_i)]

where v_name(x_i) is the name information vector, v_unit(x_i) the affiliated unit information vector, v_post(x_i) the post information vector, v_rank(x_i) the rank information vector, v_gender(x_i) the gender information vector, v_tel(x_i) the contact telephone information vector, v_addr(x_i) the address information vector, v_code(x_i) the access code information vector, v_vp(x_i) the voiceprint feature information vector, v_auth(x_i) the call authority information vector, and v_time(x_i) the information modification time vector.
Preferably, each component vector is calculated as follows:
The name information vector v_name(x_i) of user x_i is a 3-dimensional vector consisting of a surname component taken from the surname set S_surname and a two-dimensional component for the remaining name characters.
The affiliated unit information vector v_unit(x_i) of user x_i is a 5-dimensional vector [u_1, u_2, u_3, u_4, u_5], where u_1 ∈ S_DepL_1 (the set of national-level departments), u_2 ∈ S_DepL_2 (provincial-level departments), u_3 ∈ S_DepL_3 (prefecture-level departments), u_4 ∈ S_DepL_4 (county-level departments), and u_5 ∈ S_DepL_5 (township-level departments).
The post information vector v_post(x_i) of user x_i satisfies v_post(x_i) ∈ P, where P is the space of user posts.
The rank information vector v_rank(x_i) of user x_i encodes the user's rank level.
The gender information vector v_gender(x_i) of user x_i encodes the user's gender.
The contact telephone information vector v_tel(x_i) of user x_i is an 8-dimensional vector whose components include the office fixed telephone number, the home fixed telephone number, the encrypted mobile phone number, the non-encrypted mobile phone number, and other contact numbers.
The address information vector v_addr(x_i) of user x_i is composed of components a_1 ∈ S_Ad_1 (first-level administrative division), a_2 ∈ S_Ad_2 (second-level administrative division), a_3 ∈ S_Ad_3 (third-level administrative division), a_4 ∈ S_Ad_4 (fourth-level administrative division), and a_5 ∈ S_Ad_5 (house number).
The access code information vector v_code(x_i) of user x_i is an 8-digit integer.
The voiceprint feature information vector v_vp(x_i) of user x_i adopts the speaker factor vector (i-vector) of the speaker as the user's voiceprint feature; 600 dimensions are used here.
The call authority information vector v_auth(x_i) of user x_i is a 3-dimensional vector [auth_city(x_i), auth_prov(x_i), auth_intl(x_i)], where auth_city(x_i) is the authority of calling user x_i to make inter-city calls, auth_prov(x_i) the authority to make inter-province calls, and auth_intl(x_i) the authority to make international calls; each component is 0 for no authority and 1 for authority.
The information modification time vector v_time(x_i) of user x_i is the time of the last modification of the user's information, expressed as a 12-digit integer.
Preferably, in step S2, the user extended joint information feature supervector of user x_i is obtained from the user joint information feature supervector by expanding each element of its textual component vectors into the set of that element's alternative forms: for the elements of the 3-dimensional component vectors (1 ≤ j ≤ 3) the numbers of alternative forms are n_j, m_j, o_j ∈ N, and for the elements of the 5-dimensional component vectors (1 ≤ j ≤ 5) they are s_j, t_j ∈ N; every alternative form (indexed by 0 < k ≤ n_j, m_j, o_j, s_j or t_j) is an alias, a synonym, a dialect name, or an approximate word recognized from a voice abbreviation arising from human pronunciation habits.
Preferably, in step S3, in the training stage, all user joint information feature supervectors that do not belong to user x_i are combined to form a non-target training sample set, and a one-versus-rest (one-to-many) method is used to train on the user joint information feature supervector of user x_i, obtaining the user joint information model of user x_i.
Preferably, in step S4, user identification is based on the output of an SVM classifier. For any input user joint information feature supervector V — in the identification stage this is the supervector V(x_calling) of the calling user x_calling — the output function of the SVM classifier is

f(V) = Σ_j α_j · y_j · K(V_j, V) + b

where K(·,·) is the intelligent kernel function, which changes when the user selects a different intelligent service; y_j denotes the ideal output (label) associated with the support vector V_j; and α_j and b are the coefficients and bias obtained in training. If the value of the output function of the SVM classifier exceeds a preset threshold, the corresponding input user is considered to be the target user.
Preferably, for the vector space model, the decision expression reduces to an inner product between the input supervector and W_P, the user joint information model parameter of user P:

f(V) = ⟨W_P, V⟩
The technical solution provided by the embodiments of the invention has at least the following beneficial effects:
In the embodiments of the invention, a user joint information supervector and a user extended joint information supervector are defined over the user information data, a user joint information model is constructed, and a user is identified by model-matching the joint information supervector of an input user. Used in the artificial-intelligence directory assistance (number-searching) robot of government office departments, the model helps the robot find a user more accurately and quickly when the user's name, unit, address and other information appear under multiple names and forms of address, and it offers very high real-time performance.
Drawings
To explain the technical solutions in the embodiments of the present invention more clearly, the drawings required for describing the embodiments are briefly introduced below. The drawings in the following description are only some embodiments of the present invention, and other drawings can be derived from them by those skilled in the art without creative effort.
FIG. 1 is a flowchart of a user identification method based on a user joint information supervector and a joint information model according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the model training process provided by an embodiment of the invention;
FIG. 3 is a schematic diagram of the user identification process provided by an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
An embodiment of the present invention provides a user identification method based on a user joint information supervector and a joint information model. As shown in FIG. 1, the method comprises the following steps:
S1, mapping user information from a one-dimensional feature space to a high-dimensional feature space, and constructing a user joint information feature supervector;
S2, constructing a user extended joint information feature supervector from the various names and forms of address under which the user information appears;
S3, in the training stage, training on the user joint information feature supervectors to obtain a user joint information model, as shown in FIG. 2;
and S4, in the identification stage, performing model matching on the input user joint information feature supervector, or converting the input user extended joint information feature supervector into a user joint information feature supervector before model matching, and performing user identification according to the matching result, as shown in FIG. 3.
Specifically, in the intelligent service a calling user and a called user need to be distinguished. To describe a user's feature supervector, the user is mapped into a user feature supervector space: user x_i is mapped from the one-dimensional user space X into a high-dimensional feature space F, where F is a Euclidean space. That is, a mapping φ: X → F is defined under which user x_i (i a natural number) is mapped to its user joint information feature supervector V(x_i) = φ(x_i).
In step S1, the user joint information feature supervector V(x_i) of user x_i is calculated as the concatenation (cascade) of the component vectors of user x_i:

V(x_i) = [v_name(x_i), v_unit(x_i), v_post(x_i), v_rank(x_i), v_gender(x_i), v_tel(x_i), v_addr(x_i), v_code(x_i), v_vp(x_i), v_auth(x_i), v_time(x_i)]

where v_name(x_i) is the name information vector, v_unit(x_i) the affiliated unit information vector, v_post(x_i) the post information vector, v_rank(x_i) the rank information vector, v_gender(x_i) the gender information vector, v_tel(x_i) the contact telephone information vector, v_addr(x_i) the address information vector, v_code(x_i) the access code information vector, v_vp(x_i) the voiceprint feature information vector, v_auth(x_i) the call authority information vector, and v_time(x_i) the information modification time vector.
Further, each component vector is calculated as follows:
(1) The name information vector v_name(x_i) of user x_i is a 3-dimensional vector consisting of a surname component taken from the surname set S_surname and a two-dimensional component for the remaining name characters.
(2) The affiliated unit information vector v_unit(x_i) of user x_i is a 5-dimensional vector [u_1, u_2, u_3, u_4, u_5], where u_1 ∈ S_DepL_1 (the set of national-level departments), u_2 ∈ S_DepL_2 (provincial-level departments), u_3 ∈ S_DepL_3 (prefecture-level departments), u_4 ∈ S_DepL_4 (county-level departments), and u_5 ∈ S_DepL_5 (township-level departments).
specifically, by 2018, 19.06.19, it is common nationwide (the following administrative divisions are counted and do not include hong kong and australian districts):
first-class administrative district (provincial administrative district): 34 (23 provinces, 5 autonomous regions, 4 municipalities in direct jurisdiction and 2 special administrative districts);
second-level administrative districts (ground-level administrative districts): 334 (294 municipalities, 7 regions, 30 autonomous states, 3 alliances);
third-level administrative districts (county-level administrative districts): 2851 (966 prefectures, 367 prefectures, 1347 prefectures, 117 autonomous prefectures, 49 flags, 3 autonomous flags, 1 special district, 1 forest district);
four-level administrative district (rural administrative district): 39888 (2 prefectures, 21116 towns, 9392 villages, 152 sappan wood, 984 ethnic villages, 1 ethnic sappan wood, 8241 streets);
(3) The post information vector v_post(x_i) of user x_i satisfies v_post(x_i) ∈ P, where P is the space of user posts.
(4) The rank information vector v_rank(x_i) of user x_i encodes the user's rank level.
specifically, the level settings are as shown in the following table:
grade Description of the invention
Level 0 Telephone with various service classes
Level 1 Unregistered subscriber
Stage 2 Clerk (twenty-seven to nineteen grade)
Grade 3 Scientists (twenty-six to eighteen grade)
4 stage Subsidiary of the department, subsidiary of the countryside, subsidiary of the chief and ren (twenty-four to seventy)
Grade 5 Department grade due time, country grade due time, chief and ren officer (twenty-two to sixteen grades)
Grade 6 Assistant investigator assistant subsidiary everywhere (twenty to fourteen grade)
Stage 7 Grade department, county, investigator (eighteen to twelve grade)
Stage 8 Department level and assistant staff, hall level and assistant patrolman (fifteen to ten levels)
Grade 9 Department level due, hall level due, patrol member (thirteen to eight level)
Grade 10 Department, province, and subsidiary (ten to six levels)
11 stage Department level due, provincial due (eight to four levels)
(5) The gender information vector v_gender(x_i) of user x_i encodes the user's gender.
(6) The contact telephone information vector v_tel(x_i) of user x_i is an 8-dimensional vector whose components include the office fixed telephone number, the home fixed telephone number, the encrypted mobile phone number, the non-encrypted mobile phone number, and other contact numbers.
(7) The address information vector v_addr(x_i) of user x_i is composed of components a_1 ∈ S_Ad_1 (first-level administrative division), a_2 ∈ S_Ad_2 (second-level administrative division), a_3 ∈ S_Ad_3 (third-level administrative division), a_4 ∈ S_Ad_4 (fourth-level administrative division), and a_5 ∈ S_Ad_5 (house number).
(8) The access code information vector v_code(x_i) of user x_i is an 8-digit integer.
(9) The voiceprint feature information vector v_vp(x_i) of user x_i adopts the speaker factor vector (i-vector) of the speaker as the user's voiceprint feature; 600 dimensions are used here.
(10) The call authority information vector v_auth(x_i) of user x_i is a 3-dimensional vector [auth_city(x_i), auth_prov(x_i), auth_intl(x_i)], where auth_city(x_i) is the authority of calling user x_i to make inter-city calls, auth_prov(x_i) the authority to make inter-province calls, and auth_intl(x_i) the authority to make international calls; each component is 0 for no authority and 1 for authority.
(11) The information modification time vector v_time(x_i) of user x_i is the time of the last modification of the user's information, expressed as a 12-digit integer.
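As an illustration of step S1, the sketch below assembles the component vectors into a user joint information feature supervector by simple concatenation. It is a minimal sketch, not the patent's prescribed implementation: the record fields, the hashing of text elements into numeric IDs, and the assumption of a 600-dimensional voiceprint vector supplied by an external speaker-recognition front end are all illustrative choices.

```python
import hashlib
import numpy as np

def _text_id(s: str) -> float:
    """Map a text element (name character, department, district, ...) to a stable numeric ID."""
    return int(hashlib.md5(s.encode("utf-8")).hexdigest()[:8], 16) / 1e9

def joint_supervector(user: dict, voiceprint: np.ndarray) -> np.ndarray:
    """Concatenate one user's component vectors into a joint information feature supervector.

    `user` is a hypothetical record with the fields named below; `voiceprint` is assumed
    to be a 600-dimensional speaker vector (e.g. an i-vector) computed elsewhere.
    """
    name = ([_text_id(c) for c in user["name"]] + [0.0] * 3)[:3]            # 3-dim name vector
    unit = ([_text_id(d) for d in user["departments"]] + [0.0] * 5)[:5]     # 5-dim unit vector, national..township
    post = [_text_id(user["post"])]                                         # post information
    rank = [float(user["rank_level"])]                                      # rank level 0..11
    gender = [1.0 if user["gender"] == "F" else 0.0]
    phones = ([_text_id(p) for p in user["phones"]] + [0.0] * 8)[:8]        # 8-dim contact telephone vector
    addr = ([_text_id(a) for a in user["address_levels"]] + [0.0] * 5)[:5]  # 4 division levels + house number
    access = [float(user["access_code"])]                                   # 8-digit access code
    auth = [float(x) for x in user["call_authority"]]                       # [inter-city, inter-province, international]
    mtime = [float(user["modified"])]                                       # 12-digit yyyymmddhhmm integer

    assert voiceprint.shape == (600,)
    return np.concatenate([name, unit, post, rank, gender, phones, addr,
                           access, auth, np.asarray(voiceprint, dtype=float), mtime])
```

The same routine can be reused in the identification stage to turn an incoming query into a supervector before matching.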
Further, in step S2, the user extended joint information feature supervector of user x_i is obtained from the user joint information feature supervector by expanding each element of its textual component vectors into the set of that element's alternative forms: for the elements of the 3-dimensional component vectors (1 ≤ j ≤ 3) the numbers of alternative forms are n_j, m_j, o_j ∈ N, and for the elements of the 5-dimensional component vectors (1 ≤ j ≤ 5) they are s_j, t_j ∈ N; every alternative form (indexed by 0 < k ≤ n_j, m_j, o_j, s_j or t_j) is an alias, a synonym, a dialect name, or an approximate word recognized from a voice abbreviation arising from human pronunciation habits.
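A minimal sketch of the step S2 idea follows: each textual element keeps a set of alternative forms (aliases, synonyms, dialect names, phonetic approximations), and an input that arrives under any alternative form is normalized back to the canonical element before the step S1 supervector is built. The alias table, its entries, and the helper names are hypothetical.

```python
# Hypothetical alias table: canonical element -> alternative names / forms of address.
ALIASES = {
    "Ministry of Finance": {"MoF", "the finance ministry"},
    "Zhongguancun Street": {"ZGC Street", "Zhongguancun Ave"},
}

# Invert the table once so any alternative form resolves to its canonical element.
_CANONICAL = {alt.lower(): canon for canon, alts in ALIASES.items() for alt in alts}

def canonicalize(element: str) -> str:
    """Map an alias, synonym, dialect name or phonetic approximation back to the canonical element."""
    return _CANONICAL.get(element.lower(), element)

def canonicalize_user(user: dict) -> dict:
    """Convert an extended (alias-bearing) user record into canonical form, so that the
    step S1 construction can be applied to it (the step S2 -> S1 conversion of the method)."""
    out = dict(user)
    out["departments"] = [canonicalize(d) for d in user.get("departments", [])]
    out["address_levels"] = [canonicalize(a) for a in user.get("address_levels", [])]
    return out
```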
Further, in step S3, in the training stage, all user joint information feature supervectors that do not belong to user x_i are combined to form a non-target training sample set, and a one-versus-rest (one-to-many) method is used to train on the user joint information feature supervector of user x_i, obtaining the user joint information model of user x_i.
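The one-versus-rest training of step S3 can be sketched as follows: for each user, that user's supervectors are the target samples and all other users' supervectors form the non-target training sample set, and one binary classifier is trained per user. scikit-learn's SVC is used here only as an illustrative stand-in, and the kernel choice is an assumption.

```python
import numpy as np
from sklearn.svm import SVC

def train_joint_information_models(supervectors: dict) -> dict:
    """Train one user joint information model per user with a one-versus-rest scheme.

    `supervectors` maps a user id to a list of that user's joint information feature supervectors.
    """
    models = {}
    for target_id, target_vecs in supervectors.items():
        # Non-target training set: every supervector that does not belong to the target user.
        non_target = [v for uid, vecs in supervectors.items() if uid != target_id for v in vecs]
        X = np.vstack(target_vecs + non_target)
        y = np.array([1] * len(target_vecs) + [0] * len(non_target))
        model = SVC(kernel="rbf")  # illustrative kernel; the patent's "intelligent kernel" is service-dependent
        model.fit(X, y)
        models[target_id] = model
    return models
```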
Further, in step S4, user identification is based on the output of an SVM classifier. For any input user joint information feature supervector V — in the identification stage this is the supervector V(x_calling) of the calling user x_calling — the output function of the SVM classifier is

f(V) = Σ_j α_j · y_j · K(V_j, V) + b

where K(·,·) is the intelligent kernel function, which changes when the user selects a different intelligent service; y_j denotes the ideal output (label) associated with the support vector V_j; and α_j and b are the coefficients and bias obtained in training. If the value of the output function of the SVM classifier exceeds a preset threshold, the corresponding input user is considered to be the target user.
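For step S4, matching amounts to evaluating each user model's output on the calling user's supervector and accepting the best match only if it exceeds the preset threshold. In the sketch below, decision_function plays the role of the output function f(V) = Σ_j α_j·y_j·K(V_j, V) + b; the threshold value and function names are assumptions.

```python
import numpy as np

def svm_output(model, v_calling: np.ndarray) -> float:
    """Output of one user's SVM model on the calling user's joint information feature supervector.

    For a trained SVC, decision_function evaluates the kernel expansion over the
    support vectors, i.e. f(V) = sum_j alpha_j * y_j * K(V_j, V) + b.
    """
    return float(model.decision_function(v_calling.reshape(1, -1))[0])

def identify(models: dict, v_calling: np.ndarray, threshold: float = 0.0):
    """Match the calling user's supervector against every user joint information model and
    return the best-matching user id if its score exceeds the preset threshold, else None."""
    best_id, best_score = None, -np.inf
    for uid, model in models.items():
        score = svm_output(model, v_calling)
        if score > best_score:
            best_id, best_score = uid, score
    return best_id if best_score > threshold else None
```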
For the vector space model, the decision expression reduces to an inner product between the input supervector and W_P, the user joint information model parameter of user P:

f(V) = ⟨W_P, V⟩
In summary, the present invention provides a user joint information supervector and a user extended joint information supervector for user information data, constructs a user joint information model, and identifies a user by model-matching the joint information supervector of an input user. Used in the artificial-intelligence directory assistance (number-searching) robot of government office departments, the model helps the robot find a user more accurately and quickly when the user's name, unit, address and other information appear under multiple names and forms of address, and it offers very high real-time performance.
The above description covers only preferred embodiments of the present invention and is not intended to limit the invention; any modifications, equivalent replacements, improvements and the like made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (7)

1. A user identification method based on a user joint information super vector and a joint information model is characterized by comprising the following steps:
S1, mapping user information from a one-dimensional feature space to a high-dimensional feature space, and constructing a user joint information feature supervector;
S2, constructing a user extended joint information feature supervector from the various names and forms of address under which the user information appears;
S3, in the training stage, training on the user joint information feature supervectors to obtain a user joint information model;
and S4, in the identification stage, performing model matching on the input user joint information feature supervector, or converting the input user extended joint information feature supervector into a user joint information feature supervector before model matching, and performing user identification according to the matching result.
2. The user identification method based on the user joint information supervector and joint information model according to claim 1, wherein in step S1, the user joint information feature supervector V(x_i) of user x_i is calculated as the concatenation (cascade) of the component vectors of user x_i:

V(x_i) = [v_name(x_i), v_unit(x_i), v_post(x_i), v_rank(x_i), v_gender(x_i), v_tel(x_i), v_addr(x_i), v_code(x_i), v_vp(x_i), v_auth(x_i), v_time(x_i)]

where v_name(x_i) is the name information vector, v_unit(x_i) the affiliated unit information vector, v_post(x_i) the post information vector, v_rank(x_i) the rank information vector, v_gender(x_i) the gender information vector, v_tel(x_i) the contact telephone information vector, v_addr(x_i) the address information vector, v_code(x_i) the access code information vector, v_vp(x_i) the voiceprint feature information vector, v_auth(x_i) the call authority information vector, and v_time(x_i) the information modification time vector.
3. The method of claim 2, wherein each component vector is calculated as follows:
the name information vector v_name(x_i) of user x_i is a 3-dimensional vector consisting of a surname component taken from the surname set S_surname and a two-dimensional component for the remaining name characters;
the affiliated unit information vector v_unit(x_i) of user x_i is a 5-dimensional vector [u_1, u_2, u_3, u_4, u_5], where u_1 ∈ S_DepL_1 (national-level departments), u_2 ∈ S_DepL_2 (provincial-level departments), u_3 ∈ S_DepL_3 (prefecture-level departments), u_4 ∈ S_DepL_4 (county-level departments), and u_5 ∈ S_DepL_5 (township-level departments);
the post information vector v_post(x_i) of user x_i satisfies v_post(x_i) ∈ P, where P is the space of user posts;
the rank information vector v_rank(x_i) of user x_i encodes the user's rank level;
the gender information vector v_gender(x_i) of user x_i encodes the user's gender;
the contact telephone information vector v_tel(x_i) of user x_i is an 8-dimensional vector whose components include the office fixed telephone number, the home fixed telephone number, the encrypted mobile phone number, the non-encrypted mobile phone number, and other contact numbers;
the address information vector v_addr(x_i) of user x_i is composed of components a_1 ∈ S_Ad_1 (first-level administrative division), a_2 ∈ S_Ad_2 (second-level administrative division), a_3 ∈ S_Ad_3 (third-level administrative division), a_4 ∈ S_Ad_4 (fourth-level administrative division), and a_5 ∈ S_Ad_5 (house number);
the access code information vector v_code(x_i) of user x_i is an 8-digit integer;
the voiceprint feature information vector v_vp(x_i) of user x_i adopts the speaker factor vector (i-vector) of the speaker as the user's voiceprint feature, taking 600 dimensions here;
the call authority information vector v_auth(x_i) of user x_i is a 3-dimensional vector [auth_city(x_i), auth_prov(x_i), auth_intl(x_i)], where auth_city(x_i) is the authority of calling user x_i to make inter-city calls, auth_prov(x_i) the authority to make inter-province calls, and auth_intl(x_i) the authority to make international calls, each component being 0 for no authority and 1 for authority;
and the information modification time vector v_time(x_i) of user x_i is the time of the last modification of the user's information, expressed as a 12-digit integer.
4. The user identification method based on the user joint information supervector and joint information model according to claim 1, wherein in step S2, the user extended joint information feature supervector of user x_i is obtained from the user joint information feature supervector by expanding each element of its textual component vectors into the set of that element's alternative forms: for the elements of the 3-dimensional component vectors (1 ≤ j ≤ 3) the numbers of alternative forms are n_j, m_j, o_j ∈ N, and for the elements of the 5-dimensional component vectors (1 ≤ j ≤ 5) they are s_j, t_j ∈ N; every alternative form (indexed by 0 < k ≤ n_j, m_j, o_j, s_j or t_j) is an alias, a synonym, a dialect name, or an approximate word recognized from a voice abbreviation arising from human pronunciation habits.
5. The user identification method based on the user joint information supervector and joint information model according to claim 1, wherein in step S3, in the training stage, all user joint information feature supervectors that do not belong to user x_i are combined to form a non-target training sample set, and a one-versus-rest (one-to-many) method is used to train on the user joint information feature supervector of user x_i, obtaining the user joint information model of user x_i.
6. The user identification method based on the user joint information supervector and joint information model according to claim 1, wherein in step S4, user identification is based on the output of an SVM classifier, and for any input user joint information feature supervector V — in the identification stage the supervector V(x_calling) of the calling user x_calling — the output function of the SVM classifier is

f(V) = Σ_j α_j · y_j · K(V_j, V) + b

where K(·,·) is the intelligent kernel function, which changes when the user selects a different intelligent service, y_j denotes the ideal output (label) associated with the support vector V_j, and α_j and b are the coefficients and bias obtained in training; if the value of the output function of the SVM classifier exceeds a preset threshold, the corresponding input user is considered to be the target user.
7. The method of claim 6, wherein for the vector space model the decision expression reduces to an inner product between the input supervector and W_P, the user joint information model parameter of user P:

f(V) = ⟨W_P, V⟩
CN202110839516.5A 2021-07-23 2021-07-23 User identification method based on user joint information supervector and joint information model Active CN113779191B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110839516.5A CN113779191B (en) 2021-07-23 2021-07-23 User identification method based on user joint information supervector and joint information model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110839516.5A CN113779191B (en) 2021-07-23 2021-07-23 User identification method based on user joint information supervector and joint information model

Publications (2)

Publication Number Publication Date
CN113779191A true CN113779191A (en) 2021-12-10
CN113779191B CN113779191B (en) 2024-03-05

Family

ID=78836046

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110839516.5A Active CN113779191B (en) 2021-07-23 2021-07-23 User identification method based on user joint information supervector and joint information model

Country Status (1)

Country Link
CN (1) CN113779191B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101535945A (en) * 2006-04-25 2009-09-16 英孚威尔公司 Full text query and search systems and method of use
CN101640043A (en) * 2009-09-01 2010-02-03 清华大学 Speaker recognition method based on multi-coordinate sequence kernel and system thereof
US20140244257A1 (en) * 2013-02-25 2014-08-28 Nuance Communications, Inc. Method and Apparatus for Automated Speaker Parameters Adaptation in a Deployed Speaker Verification System
US20190087529A1 (en) * 2014-03-24 2019-03-21 Imagars Llc Decisions with Big Data
CN106448681A (en) * 2016-09-12 2017-02-22 南京邮电大学 Super-vector speaker recognition method
US20190341057A1 (en) * 2018-05-07 2019-11-07 Microsoft Technology Licensing, Llc Speaker recognition/location using neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG CAIXIA; ZHANG ZHIGANG: "On the Simulation of Rapid Identification of Wireless Network User Demand Information", Computer Simulation (计算机仿真), no. 04 *

Also Published As

Publication number Publication date
CN113779191B (en) 2024-03-05

Similar Documents

Publication Publication Date Title
US20060112133A1 (en) System and method for creating and maintaining data records to improve accuracy thereof
CN109658062A (en) A kind of electronic record intelligent processing method based on deep learning
Christen et al. A probabilistic geocoding system based on a national address file
CN108197319A (en) A kind of audio search method and system of the characteristic point based on time-frequency local energy
CN110837568A (en) Entity alignment method and device, electronic equipment and storage medium
CN116414823A (en) Address positioning method and device based on word segmentation model
CN114780680A (en) Retrieval and completion method and system based on place name and address database
CN113672718A (en) Dialog intention recognition method and system based on feature matching and field self-adaption
US20180260473A1 (en) Full text retrieving and matching method and system based on lucene custom lexicon
CN107577744A (en) Nonstandard Address automatic matching model, matching process and method for establishing model
Duan et al. Building knowledge graph from public data for predictive analysis: a case study on predicting technology future in space and time
CN112836008B (en) Index establishing method based on decentralized storage data
CN113779191A (en) User identification method based on user joint information super vector and joint information model
CN116561327A (en) Government affair data management method based on clustering algorithm
CN110175219A (en) A kind of K12 stage repeats school's recognition methods, device, equipment and storage medium
CN115048682B (en) Safe storage method for land circulation information
CN113505863B (en) Image multistage classification method and system based on cascade mean vector comprehensive scoring
CN116662643A (en) Legal recommendation method, legal recommendation system, electronic device and storage medium
CN115719289A (en) House data processing method, device, equipment and medium
Christen et al. A probabilistic deduplication, record linkage and geocoding system
CN113792081B (en) Method and system for automatically checking data assets
CN112989811B (en) History book reading auxiliary system based on BiLSTM-CRF and control method thereof
CN109919811B (en) Insurance agent culture scheme generation method based on big data and related equipment
CN113535883A (en) Business place entity linking method, system, electronic device and storage medium
CN113065354A (en) Method for identifying geographic position in corpus and related equipment thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant