CN108763246B - Personnel grouping method and device, storage medium and electronic equipment - Google Patents
Personnel grouping method and device, storage medium and electronic equipment
- Publication number: CN108763246B (application CN201810273041.6A)
- Authority: CN (China)
- Legal status: Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Abstract
The disclosure provides a personnel grouping method and device, a storage medium and an electronic device. The method includes: obtaining attention information of persons to be distributed, splitting the attention information into at least one phrase, and obtaining a vectorized representation of each phrase; clustering based on the vectorized representations of the phrases to obtain M1 first classes, each first class representing one piece of dimension information, M1 ≥ 1; setting a weight corresponding to each piece of dimension information, and obtaining the grouping result of the persons to be distributed by using the dimension information and the corresponding weights. The scheme improves the accuracy and rationality of the grouping result and thereby the satisfaction of the persons to be distributed with the grouping result.
Description
Technical Field
The present disclosure relates to the field of information processing technologies, and in particular, to a method and an apparatus for grouping persons, a storage medium, and an electronic device.
Background
In daily life, person grouping problems are often encountered. For example, schools and enterprises assigning dormitories to students or employees, classes dividing students into study groups, and departments dividing employees into work groups all involve a person grouping problem.
To improve the rationality of grouping, some dimension information related to grouping is usually preset, and cluster analysis is then performed on that dimension information to obtain the final grouping result. Taking a school assigning dormitories to students as an example, the dimension information considered may be: grade, faculty, gender, age, work-and-rest schedule, and hobbies. Weights can be set manually for each piece of dimension information, a weighted sum over all dimension information is computed for each student, and the students are clustered according to the weighted sums; the resulting clusters are the dormitory assignment result.
In such an assignment scheme, some common information is usually selected as the dimension information, which is relatively fixed and does not take into account the individual dimension information that the persons to be distributed actually care about. The accuracy and rationality of the grouping result are therefore low, which affects the persons' satisfaction with the grouping result.
Disclosure of Invention
The main purpose of the present disclosure is to provide a method and an apparatus for grouping persons, a storage medium, and an electronic device, which help improve the accuracy and rationality of the grouping result and thereby the satisfaction of the persons to be distributed with the grouping result.
In order to achieve the above object, the present disclosure provides a person grouping method, the method including:
acquiring attention information of persons to be distributed, splitting the attention information into at least one phrase, and acquiring a vectorized representation of each phrase;
clustering based on the vectorized representations of the phrases to obtain M1 first classes, each first class representing one piece of dimension information, M1 ≥ 1;
setting a weight corresponding to each piece of dimension information, and obtaining the grouping result of the persons to be distributed by using the dimension information and the corresponding weights.
Optionally, the obtaining a vectorized representation of each phrase includes:
obtaining an initial vectorized representation of each phrase, and clustering based on the initial vectorized representations of the phrases to obtain M2 second classes, M2 > 1;
combining, pairwise, first sample phrases selected from the M2 second classes into first sample phrase pairs, and acquiring labeling information of each first sample phrase pair, the labeling information being either similar or dissimilar;
training to obtain a phrase classification model by using the first sample phrase pair and the labeling information of the first sample phrase pair, wherein the phrase classification model comprises a phrase representation layer for vectorization processing;
and taking the phrase obtained by splitting the attention information as an input, and outputting a new vectorization representation through the phrase representation layer to serve as the vectorization representation of the phrase.
Optionally, after obtaining a new vectorized representation of the phrase, the obtaining a vectorized representation of each phrase further includes:
clustering based on the new vectorized representations of the phrases to obtain M3 third classes, M3 > 1;
selecting the N1 second sample phrases with the smallest distance difference (d2 - d1), where d1 is the distance of a second sample phrase from the class center of its nearest third class and d2 is its distance from the class center of the second-nearest third class;
combining, pairwise, the second sample phrases with the class centers of their nearest third classes and/or with the class centers of their second-nearest third classes into second sample phrase pairs, and acquiring the labeling information of each second sample phrase pair;
updating the phrase classification model by using the second sample phrase pair and the labeling information of the second sample phrase pair until the updated phrase classification model meets a preset condition, wherein the updated phrase classification model comprises an updated phrase representation layer;
and taking the phrase obtained by splitting the attention information as an input, and outputting an updated vectorization representation through the updated phrase representation layer to serve as the vectorization representation of the phrase.
Optionally, the method further comprises:
acquiring feedback information of the persons to be distributed regarding the grouping result and a vectorized representation of each piece of feedback information;
clustering based on the vectorized representations of the feedback information to obtain M4 fourth classes, each fourth class corresponding to a satisfaction level, M4 ≥ 1;
judging whether the M4 fourth classes include a class to be adjusted, where the satisfaction level corresponding to the class to be adjusted indicates that the persons to be adjusted belonging to that class are not satisfied with the grouping result, the persons to be adjusted being among the persons to be distributed;
if the M4 fourth classes include the class to be adjusted, acquiring grouping update information of the persons to be adjusted;
and adjusting the grouping result of the personnel to be adjusted by utilizing the grouping update information.
Optionally, the obtaining a vectorized representation of each piece of feedback information includes:
obtaining an initial vectorized representation of each piece of feedback information, and clustering based on the initial vectorized representations of the feedback information to obtain M5 fifth classes, M5 > 1;
selecting first sample feedback information from the M5 fifth classes, and labeling the satisfaction level of each piece of first sample feedback information;
training to obtain a satisfaction degree classification model by using the first sample feedback information and the satisfaction degree grade of the first sample feedback information, wherein the satisfaction degree classification model comprises a text representation layer for vectorization processing;
and taking the feedback information as input, and outputting a new vectorization representation through the text representation layer to serve as the vectorization representation of the feedback information.
Optionally, after obtaining the new vectorized representation of the feedback information, the obtaining the vectorized representation of each piece of feedback information further includes:
clustering based on the new vectorized representations of the feedback information to obtain M6 sixth classes, M6 > 1;
selecting the N2 pieces of second sample feedback information with the smallest distance difference (d4 - d3) and labeling the satisfaction level of each piece, where d3 is the distance of a piece of second sample feedback information from the class center of its nearest sixth class and d4 is its distance from the class center of the second-nearest sixth class;
calculating an update class center based on the satisfaction degree level of the first sample feedback information, the satisfaction degree level of the second sample feedback information, and the satisfaction degree levels of the remaining feedback information except the first sample feedback information and the second sample feedback information, wherein the satisfaction degree levels of the remaining feedback information are determined by the category to which the remaining feedback information belongs;
clustering the feedback information again based on the update class centers to determine the update class to which each piece of feedback information belongs, wherein each update class corresponds to one update class center;
updating the satisfaction degree classification model by using the feedback information and the update classification to which the feedback information belongs until the updated satisfaction degree classification model meets a preset condition, wherein the updated satisfaction degree classification model comprises an updated text representation layer;
and taking the feedback information as input, and outputting an updated vectorization representation through the updated text representation layer to serve as the vectorization representation of the feedback information.
Optionally, after obtaining the M6 sixth classes, the obtaining a vectorized representation of each piece of feedback information further includes:
determining M7 available class centers, where the distance between the M7 available class centers and the class centers of the M6 sixth classes is not less than a preset distance;
clustering the feedback information again based on new class centers to determine the new class to which each piece of feedback information belongs, where each new class corresponds to one new class center and the new class centers comprise the M7 available class centers and the class centers of the M6 sixth classes;
correspondingly, the second sample feedback information is then selected as follows: selecting the N2 pieces of second sample feedback information with the smallest distance difference (d6 - d5), where d5 is the distance of a piece of second sample feedback information from the nearest new class center and d6 is its distance from the second-nearest new class center.
The present disclosure provides a people grouping apparatus, the apparatus comprising:
the concerned information splitting module is used for acquiring the concerned information of the personnel to be distributed and splitting the concerned information into at least one phrase;
a phrase vectorization representation acquisition module for acquiring a vectorization representation of each phrase;
a phrase clustering module for clustering based on the vectorized representations of the phrases to obtain M1 first classes, each first class representing one piece of dimension information, M1 ≥ 1;
and a grouping result obtaining module for setting a weight corresponding to each piece of dimension information and obtaining the grouping result of the persons to be distributed by using the dimension information and the corresponding weights.
Optionally, the phrase vectorization representation obtaining module includes:
a first vectorized representation obtaining module for obtaining an initial vectorized representation of each phrase;
a first clustering module for clustering based on the initial vectorized representations of the phrases to obtain M2 second classes, M2 > 1;
a first sample phrase selection module for combining, pairwise, first sample phrases selected from the M2 second classes into first sample phrase pairs and acquiring the labeling information of each first sample phrase pair, the labeling information being either similar or dissimilar;
the phrase classification model training module is used for training to obtain a phrase classification model by utilizing the first sample phrase pair and the labeling information of the first sample phrase pair, and the phrase classification model comprises a phrase representation layer used for vectorization processing;
and the second vectorization representation output module is used for taking the phrase split from the attention information as input and outputting a new vectorization representation as the vectorization representation of the phrase through the phrase representation layer.
Optionally, the phrase vectorization representation obtaining module further includes:
a second clustering module, configured to, after obtaining the new vectorized representation of the phrase, perform clustering based on the new vectorized representation of the phrase to obtain M3A third class, M3>1;
A second sample phrase selecting module for selecting the distance difference (d)2-d1) Minimum N1A second sample phrase, d1Distance of the second sample phrase from the nearest class center of the third class, d2Distance of the second sample phrase from a class center of a next closest third class; the second sample phrase is associated with the nearest class center of the third category, and/or the first categoryCombining every two of the second sample phrases and the second nearest class center of the third class into second sample phrase pairs, and acquiring the labeling information of each second sample phrase pair;
the phrase classification model updating module is used for updating the phrase classification model by using the second sample phrase pair and the labeling information of the second sample phrase pair until the updated phrase classification model meets a preset condition, and the updated phrase classification model comprises an updated phrase representation layer;
and the third vectorization representation output module is used for taking the phrase split from the attention information as input, and outputting an updated vectorization representation through the updated phrase representation layer to serve as the vectorization representation of the phrase.
Optionally, the apparatus further comprises:
the feedback information acquisition module is used for acquiring feedback information of the personnel to be distributed aiming at the grouping result;
the feedback information vectorization representation acquisition module is used for acquiring the vectorization representation of each piece of feedback information;
a feedback information clustering module for clustering based on the vectorized representations of the feedback information to obtain M4 fourth classes, each fourth class corresponding to a satisfaction level, M4 ≥ 1;
a category judging module for judging whether the M4 fourth classes include a class to be adjusted, where the satisfaction level corresponding to the class to be adjusted indicates that the persons to be adjusted belonging to that class are not satisfied with the grouping result, the persons to be adjusted being among the persons to be distributed;
a grouping update information acquisition module for acquiring grouping update information of the persons to be adjusted when the M4 fourth classes include the class to be adjusted;
and the grouping result adjusting module is used for adjusting the grouping result of the personnel to be adjusted by utilizing the grouping updating information.
Optionally, the feedback information vectorization representation obtaining module includes:
the fourth-direction quantization representation acquisition module is used for acquiring the initial quantization representation of each piece of feedback information;
a third clustering processing module for clustering based on the initial vectorization representation of the feedback information to obtain M5A fifth category, M5>1;
A first sample feedback information selection module for selecting M5Selecting first sample feedback information from the fifth category, and marking the satisfaction degree grade of each first sample feedback information;
the satisfaction degree classification model training module is used for training to obtain a satisfaction degree classification model by utilizing the first sample feedback information and the satisfaction degree grade of the first sample feedback information, and the satisfaction degree classification model comprises a text representation layer used for vectorization processing;
and a fifth vectorization representation output module, configured to output a new vectorization representation as the vectorization representation of the feedback information through the text representation layer, with the feedback information as an input.
Optionally, the feedback information vectorization representation obtaining module further includes:
a fourth clustering module, configured to, after obtaining the new vectorized representation of the feedback information, perform clustering based on the new vectorized representation of the feedback information to obtain M6A sixth class, M6>1;
A second sample feedback information selection module for selecting the distance difference (d)4-d3) Minimum N2Second sample feedback information and marking the satisfaction degree grade of each second sample feedback information, d3Distance of the second sample feedback information from the nearest class center of the sixth class, d4Feeding back the distance between the information and the class center of the second sample in the sixth next category;
an update class center calculation module, configured to calculate an update class center based on the satisfaction level of the first sample feedback information, the satisfaction level of the second sample feedback information, and the satisfaction levels of remaining feedback information except the first sample feedback information and the second sample feedback information, where the satisfaction levels of the remaining feedback information are determined by a category to which the remaining feedback information belongs;
the fifth clustering processing module is used for clustering the feedback information again based on the update class center to determine the update class to which each piece of feedback information belongs, and each update class corresponds to one update class center;
the satisfaction degree classification model updating module is used for updating the satisfaction degree classification model by using the feedback information and the update classification to which the feedback information belongs until the updated satisfaction degree classification model meets a preset condition, and the updated satisfaction degree classification model comprises an updated text representation layer;
and a sixth vectorization representation output module, configured to output, as the vectorization representation of the feedback information, an updated vectorization representation through the updated text representation layer, with the feedback information as an input.
Optionally, the feedback information vectorization representation obtaining module further includes:
an available class center determination module for determining, after the M6 sixth classes are obtained, M7 available class centers, where the distance between the M7 available class centers and the class centers of the M6 sixth classes is not less than a preset distance;
a sixth clustering module for clustering the feedback information again based on new class centers to determine the new class to which each piece of feedback information belongs, where each new class corresponds to one new class center and the new class centers include the M7 available class centers and the class centers of the M6 sixth classes;
correspondingly, the second sample feedback information selection module is then used for selecting the N2 pieces of second sample feedback information with the smallest distance difference (d6 - d5), where d5 is the distance of a piece of second sample feedback information from the nearest new class center and d6 is its distance from the second-nearest new class center.
The present disclosure provides a storage medium having stored therein a plurality of instructions, which are loaded by a processor, for performing the steps of the above-described person grouping method.
The present disclosure provides an electronic device, comprising:
the storage medium described above; and
a processor to execute the instructions in the storage medium.
According to the scheme, the attention information of the personnel to be distributed can be analyzed, the dimension information which can reflect the requirements of the personnel to be distributed is determined in a clustering mode, and then the personnel are grouped based on the determined dimension information.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting the disclosure. In the drawings:
FIG. 1 is a schematic flow chart of an embodiment 1 of a person grouping method according to the present disclosure;
FIG. 2 is a schematic flow diagram of embodiment 1 of obtaining a vectorized representation of a phrase in the disclosed solution;
FIG. 3 is a network diagram of a phrase classification model in accordance with the disclosed aspects;
FIG. 4 is a schematic flow diagram of embodiment 2 of obtaining a vectorized representation of a phrase in the disclosed solution;
FIG. 5 is a schematic flow chart of embodiment 2 of the person grouping method according to the present disclosure;
FIG. 6 is a schematic flow diagram of embodiment 1 of obtaining a vectorized representation of feedback information in the disclosed solution;
FIG. 7 is a network diagram of a satisfaction classification model in accordance with aspects of the present disclosure;
FIG. 8 is a schematic diagram of the node corresponding to an intermediate node in the disclosed solution;
FIG. 9 is a schematic flow diagram of embodiment 2 of obtaining a vectorized representation of feedback information in the disclosed solution;
FIG. 10 is a schematic diagram of a person grouping apparatus according to an embodiment of the present disclosure;
fig. 11 is a schematic structural diagram of an electronic device for grouping people according to the present disclosure.
Detailed Description
The following detailed description of specific embodiments of the present disclosure is provided in connection with the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present disclosure, are given by way of illustration and explanation only, not limitation.
Referring to FIG. 1, a flow diagram of embodiment 1 of the person grouping method of the present disclosure is shown. It may include the following steps:
S101, obtaining attention information of persons to be distributed, splitting the attention information into at least one phrase, and obtaining a vectorized representation of each phrase.
To improve the accuracy and rationality of the grouping result, dimension analysis can be performed on the information that the persons to be distributed care about, so as to determine dimension information that better reflects their needs. It should be understood that, compared with the common dimension information of the prior art, the dimension information determined by the disclosed scheme does not represent the individual needs of a single person to be distributed; rather, it also reflects needs shared by the persons to be distributed, except that the prior art may not have used such information as a factor influencing the grouping result.
As an example, the attention information of the persons to be distributed may be obtained by questionnaire, which is not specifically limited by the present disclosure. After the attention information is obtained, it can be split into at least one phrase, and dimension information reflecting the needs of the persons to be distributed is found by analyzing these phrases. It should be understood that, in practical application, when splitting phrases, stop words that carry no explicit meaning of their own, such as "and" or "in", may be filtered out; the disclosure is not limited in this respect.
S102, clustering based on the vectorized representations of the phrases to obtain M1 first classes, each first class representing one piece of dimension information, M1 ≥ 1.
The disclosed scheme clusters the vectorized representations of the phrases to obtain M1 first classes and thereby discovers M1 pieces of dimension information. For example, the clustering may be performed with the K-means algorithm, the K-nearest-neighbor algorithm, and the like, which is not particularly limited in the present disclosure.
In practical application, many synonyms may exist among the phrases split from the attention information. To improve the clustering accuracy for synonymous phrases, vectorized representations of the phrases can be obtained by word embedding, and clustering is then performed on those vectorized representations.
As an example, a vectorized representation of each word in a phrase may be obtained from a general corpus; the mean of these word vectors is taken as the initial vectorized representation of the phrase, and clustering is then performed on the initial vectorized representations of the phrases to obtain the M1 first classes.
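As an illustration of this step, the following is a minimal sketch (not the patent's reference implementation) of averaging general-corpus word vectors into initial phrase representations and clustering them with K-means to discover dimension information; the toy embedding table, phrase list, and cluster count are assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

# assumed general-corpus word vectors (toy 4-dimensional examples)
word_vecs = {
    "quiet":      np.array([0.9, 0.1, 0.0, 0.0]),
    "early-rise": np.array([0.8, 0.2, 0.1, 0.0]),
    "basketball": np.array([0.1, 0.9, 0.0, 0.1]),
    "sports":     np.array([0.0, 0.8, 0.1, 0.2]),
    "gaming":     np.array([0.1, 0.1, 0.9, 0.0]),
}

def phrase_vector(words):
    """Initial vectorized representation of a phrase: mean of its word vectors."""
    return np.mean([word_vecs[w] for w in words], axis=0)

phrases = [["quiet", "early-rise"], ["early-rise"], ["basketball", "sports"], ["gaming"]]
X = np.stack([phrase_vector(p) for p in phrases])

M1 = 3  # number of first classes; could instead be found by automatic clustering
labels = KMeans(n_clusters=M1, n_init=10, random_state=0).fit_predict(X)
for phrase, label in zip(phrases, labels):
    print(phrase, "-> dimension", label)  # each first class stands for one piece of dimension information
```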
In addition, considering that there is a certain difference between the general corpus and the corpus formed by the phrases split from the attention information, and in order to improve the accuracy of the phrase vectorized representations, the present disclosure further provides a new method for obtaining the vectorized representation of a phrase, which is described with reference to FIG. 2 and FIG. 4 below and is not detailed here.
S103, setting a weight corresponding to each piece of dimension information, and obtaining the grouping result of the persons to be distributed by using the dimension information and the corresponding weights.
After the M1 pieces of dimension information are obtained based on the phrases split from the attention information, a weight can be set for each piece of dimension information, a weighted sum is then calculated from the dimension information and the corresponding weights, and the persons to be distributed are clustered according to the weighted sums; the clustering result is the grouping result of the persons to be distributed. As an example, the weight of each piece of dimension information may be set manually, which is not particularly limited in the present disclosure.
In practical application, persons can be grouped based only on the discovered dimension information, or by combining the discovered dimension information with existing common dimension information; for the discovered dimension information, grouping can be based on all M1 pieces of dimension information or on a part selected from the M1 pieces. The specific way the dimension information is used for person grouping is not limited by the disclosed scheme and can be determined according to actual application requirements.
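A minimal sketch of the weighting-and-grouping step under assumed data is given below: each person to be distributed receives a quantized score on every discovered dimension, a manually set weight is applied per dimension, and the persons are clustered on the weighted values to produce the grouping result. The score matrix, weights, and group count are illustrative assumptions; clustering on a single scalar weighted sum per person would work analogously.

```python
import numpy as np
from sklearn.cluster import KMeans

# rows: persons to be distributed; columns: discovered dimension information (assumed quantized scores)
scores = np.array([
    [0.9, 0.1, 0.8],   # person A
    [0.8, 0.2, 0.7],   # person B
    [0.1, 0.9, 0.2],   # person C
    [0.2, 0.8, 0.1],   # person D
])
weights = np.array([0.5, 0.3, 0.2])   # one manually set weight per piece of dimension information

weighted = scores * weights           # weighted dimension information per person
groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(weighted)
print(groups)                         # persons sharing a label are placed in the same group
```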
In summary, the disclosed scheme analyzes the attention information of the persons to be distributed and determines M1 pieces of dimension information by clustering. Compared with the prior art, in which persons are grouped only through common dimension information, the dimension information so determined better reflects the needs of the persons to be distributed, which helps improve the accuracy and rationality of the grouping result and thereby the satisfaction of the persons to be distributed with the grouping result.
Referring to FIG. 2, a flow diagram of embodiment 1 of obtaining a vectorized representation of a phrase according to the present disclosure is shown. It may include the following steps:
S201, obtaining an initial vectorized representation of each phrase, and clustering based on the initial vectorized representations of the phrases to obtain M2 second classes, M2 > 1.
S202, combining, pairwise, first sample phrases selected from the M2 second classes into first sample phrase pairs, and acquiring the labeling information of each first sample phrase pair, the labeling information being either similar or dissimilar.
As introduced above, after the initial vectorized representation of each phrase is obtained from the general corpus, a first round of clustering can be performed on these initial representations to obtain M2 second classes. Since the initial vectorized representations of the phrases are relatively inaccurate, the M2 second classes derived from them are also relatively inaccurate. In view of this, the present disclosure may train a phrase classification model including a phrase representation layer and use that layer to vectorize the phrases split from the attention information, thereby improving the accuracy of the vectorized representations and, in turn, the accuracy of the clustering.
Specifically, the sample data for training the phrase classification model may include:
(1) First sample phrase pairs. Specifically, first sample phrases may be selected from the M2 second classes and combined pairwise to obtain the first sample phrase pairs.
(2) Labeling information of the first sample phrase pairs. Specifically, the labeling information may be either similar or dissimilar, and the labeling information of each first sample phrase pair may be set manually.
As an example, the first sample phrases may be selected randomly from the M2 second classes; alternatively, they may be selected from the aliasing-prone phrases located at the boundaries of the second classes. Specifically, the distance dM21 of each phrase from the class center of its nearest second class and the distance dM22 from the class center of the second-nearest second class may be calculated. In general, the larger the distance difference (dM22 - dM21) of a phrase, the less likely inter-class aliasing is, so the phrases can be sorted by (dM22 - dM21) and a certain number of phrases with the smallest distance difference selected as the first sample phrases.
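This boundary-based selection can be sketched as follows, assuming the phrase vectors and class centers are plain NumPy arrays; the helper name and toy data are illustrative only.

```python
import numpy as np

def boundary_samples(vectors, centers, n_samples):
    """Return indices of the samples with the smallest (second-nearest - nearest) center distance gap."""
    dists = np.linalg.norm(vectors[:, None, :] - centers[None, :, :], axis=2)  # (n_items, n_centers)
    sorted_d = np.sort(dists, axis=1)
    gap = sorted_d[:, 1] - sorted_d[:, 0]        # distance difference per item
    return np.argsort(gap)[:n_samples]           # the most aliasing-prone items sit near a class boundary

vectors = np.array([[0.0, 0.0], [1.0, 0.0], [0.45, 0.0], [0.9, 0.1]])
centers = np.array([[0.0, 0.0], [1.0, 0.0]])
print(boundary_samples(vectors, centers, n_samples=2))
```

The same routine applies wherever the scheme selects samples by the nearest / second-nearest class-center distance difference (first or second sample phrases, first or second sample feedback information).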
The way of selecting the first sample phrases, the number of first sample phrases, and the number of first sample phrase pairs are not particularly limited by the disclosed scheme. To keep the training samples balanced, the number of first sample phrase pairs composed of intra-class phrases and the number composed of inter-class phrases can be kept comparable, so that the trained phrase classification model has good discriminability.
As an example, the clustering based on the initial vectorized representations of the phrases may be automatic clustering, i.e. the number of second classes is not limited; alternatively, to improve clustering efficiency, the number of second classes may be determined in advance by setting a hyperparameter before clustering. The specific implementation of the clustering is not limited by the disclosed scheme.
S203, training to obtain a phrase classification model by using the first sample phrase pair and the labeling information of the first sample phrase pair, wherein the phrase classification model comprises a phrase representation layer for vectorization processing.
As an example, the phrase classification model may be trained using the network shown in FIG. 3. The phrase representation layer comprises a left part and a right part with identical network layers whose weights are kept consistent, i.e. the two parts share parameters. The input of the model is a first sample phrase pair and the output is the labeling information of that pair. For example, the topology of the phrase representation layer may be a CNN (Convolutional Neural Network), an RNN (Recurrent Neural Network), a Recursive Autoencoder, and the like, which is not specifically limited in this disclosure.
Specifically, the two phrases of a first sample phrase pair are input into the left and right parts of the phrase representation layer respectively and pass through a convolution layer, a pooling layer, and a fully connected layer to obtain their vectorized representations; the two vectorized representations output by the phrase representation layer are then concatenated by the concatenation layer, and the labeling information of the first sample phrase pair is output by the feedforward layer. It can be understood that when the labeling information output by the phrase classification model is consistent with the manually set labeling information, the model training can be considered complete.
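A minimal PyTorch sketch of such a Siamese-style network follows; it is not the patent's reference implementation, and the vocabulary size, embedding width, and layer sizes are assumptions. The shared `represent` method plays the role of the phrase representation layer, and the concatenation plus feedforward layer outputs the similar/dissimilar label.

```python
import torch
import torch.nn as nn

class PhraseClassifier(nn.Module):
    def __init__(self, vocab_size=1000, emb_dim=32, repr_dim=64):
        super().__init__()
        # phrase representation layer, shared by the left and right inputs (parameter sharing)
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, repr_dim, kernel_size=2, padding=1)
        self.fc = nn.Linear(repr_dim, repr_dim)
        # concatenation + feedforward layer producing the similar / dissimilar logits
        self.out = nn.Linear(2 * repr_dim, 2)

    def represent(self, token_ids):                      # vectorized representation of one phrase
        x = self.embed(token_ids).transpose(1, 2)         # (batch, emb_dim, seq_len)
        x = torch.relu(self.conv(x))                      # convolution layer
        x = torch.max(x, dim=2).values                    # pooling over the phrase
        return torch.relu(self.fc(x))                     # fully connected layer

    def forward(self, left_ids, right_ids):
        pair = torch.cat([self.represent(left_ids), self.represent(right_ids)], dim=1)
        return self.out(pair)

model = PhraseClassifier()
left = torch.randint(0, 1000, (4, 3))                     # 4 sample phrase pairs, 3 tokens each
right = torch.randint(0, 1000, (4, 3))
labels = torch.tensor([1, 0, 1, 0])                       # 1 = similar, 0 = dissimilar (annotated)
loss = nn.CrossEntropyLoss()(model(left, right), labels)
loss.backward()                                            # one training step of the phrase classification model
```

After training, `model.represent(...)` would supply the new vectorized representation used for the subsequent clustering.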
S204, taking the phrases obtained by splitting the attention information as input, and outputting a new vectorized representation through the phrase representation layer as the vectorized representation of each phrase.
Compared with using the general corpus alone, vectorizing with the phrase classification model trained on the phrases split from the attention information is more accurate, so the new vectorized representation output by the phrase representation layer can be used as the vectorized representation of each phrase. Correspondingly, S102 may perform clustering based on the new vectorized representations of the phrases to obtain the M1 first classes.
Referring to FIG. 4, a flow diagram of embodiment 2 of obtaining a vectorized representation of a phrase according to the present disclosure is shown. It may include the following steps:
S301, obtaining an initial vectorized representation of each phrase, and clustering based on the initial vectorized representations of the phrases to obtain M2 second classes, M2 > 1.
S302, combining, pairwise, first sample phrases selected from the M2 second classes into first sample phrase pairs, and acquiring the labeling information of each first sample phrase pair, the labeling information being either similar or dissimilar.
And S303, training to obtain a phrase classification model by using the first sample phrase pair and the labeling information of the first sample phrase pair, wherein the phrase classification model comprises a phrase representation layer for vectorization processing.
S304, taking the phrase obtained by splitting the attention information as input, and outputting a new vectorization representation through the phrase representation layer.
The implementation process of steps S301 to S304 can refer to the descriptions of steps S201 to S204 above, and will not be described herein again.
S305, clustering based on the new vectorized representations of the phrases to obtain M3 third classes, M3 > 1.
S306, selecting the N1 second sample phrases with the smallest distance difference (d2 - d1), where d1 is the distance of a second sample phrase from the class center of its nearest third class and d2 is its distance from the class center of the second-nearest third class.
S307, combining the second sample phrase and the nearest class center of the third class and/or the second sample phrase and the second nearest class center of the third class into a second sample phrase pair in pairs, and acquiring the labeling information of each second sample phrase pair.
In order to further improve the accuracy of the phrase vectorized representations, the disclosed scheme also provides a way to optimize the phrase classification model. Specifically, the sample data for optimizing the phrase classification model may include:
(1) a second sample phrase pair.
Specifically, clustering may be performed based on the new vectorized representations of the phrases to obtain M3 third classes; second sample phrases are then selected from the aliasing-prone phrases located at the boundaries of the third classes; and the second sample phrases are combined pairwise with the corresponding class centers to obtain the second sample phrase pairs.
As an example, the second sample phrases may be selected as follows: calculate, for each phrase, the distance d1 from the class center of its nearest third class and the distance d2 from the class center of the second-nearest third class. In general, the larger the distance difference (d2 - d1) of a phrase, the less likely inter-class aliasing is, so the phrases can be sorted by (d2 - d1) and the N1 phrases with the smallest difference selected as the second sample phrases. In practical application, the second sample phrases may be selected from phrases other than the first sample phrases, which is not specifically limited in this disclosure.
The above-mentioned combining the second sample phrase and the corresponding class center two by two means that the second sample phrase and the class center of the nearest third class and/or the second sample phrase and the class center of the second nearest third class are combined two by two to obtain the second sample phrase pair.
In addition, the M3 third classes may be obtained by automatic clustering or by setting a hyperparameter, which is not specifically limited in the present disclosure.
(2) Labeling information of the second sample phrase pair. Specifically, the labeling information may be embodied as similar or dissimilar, and the labeling information of the second sample phrase pair may be artificially set.
And S308, updating the phrase classification model by using the second sample phrase pair and the labeling information of the second sample phrase pair until the updated phrase classification model meets a preset condition, wherein the updated phrase classification model comprises an updated phrase representation layer.
In the present disclosure, satisfying the preset condition may mean, for example: the number of update iterations of the phrase classification model is not less than a preset number; or, when clustering is performed using the updated vectorized representations output by the updated phrase representation layer, (d2 - d1) is not less than a preset distance difference, or the number of phrases whose (d2 - d1) is not less than the preset distance difference is not less than a preset count, and so on. The disclosed scheme does not limit the preset condition, the preset number of iterations, the preset distance difference, the preset count, etc., which can be set according to actual application requirements.
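One possible reading of such a preset condition is sketched below; the iteration cap, distance-difference threshold, and count threshold are assumed parameters, not values fixed by the disclosure.

```python
import numpy as np

def should_stop(vectors, centers, iteration, max_iters=10, min_gap=0.5, min_count=None):
    """Stop updating when enough phrases have (d2 - d1) >= min_gap, or after max_iters iterations."""
    if iteration >= max_iters:
        return True
    dists = np.linalg.norm(vectors[:, None, :] - centers[None, :, :], axis=2)
    sorted_d = np.sort(dists, axis=1)
    gap = sorted_d[:, 1] - sorted_d[:, 0]                 # (d2 - d1) per phrase
    min_count = len(vectors) if min_count is None else min_count
    return int((gap >= min_gap).sum()) >= min_count
```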
S309, taking the phrase obtained by splitting the attention information as an input, and outputting an updated vectorization representation through the updated phrase representation layer to serve as the vectorization representation of the phrase.
Optimizing the phrase classification model with the second sample phrase pairs further improves the accuracy of the vectorized representations, so the updated vectorized representation output by the updated phrase representation layer can be used as the vectorized representation of each phrase. Correspondingly, S102 may perform clustering based on the updated vectorized representations of the phrases to obtain the M1 first classes.
As an example, in order to further improve the accuracy and rationality of the grouping result, the present disclosure further provides a scheme for adjusting the grouping result based on feedback information from the persons to be distributed. Referring to FIG. 5, a flow diagram of embodiment 2 of the person grouping method of the present disclosure is shown. It may include the following steps:
S401, obtaining feedback information of the persons to be distributed regarding the grouping result and a vectorized representation of each piece of feedback information.
After the grouping result of the to-be-distributed personnel is obtained according to the method shown in fig. 1, feedback information of the to-be-distributed personnel for the grouping result can be obtained, the grouping result is optimized based on the feedback information, and the satisfaction degree of the to-be-distributed personnel for the grouping result is further improved.
As an example, the feedback information of the person to be allocated may be obtained through questionnaire survey, periodic return visit, and the like, which may not be specifically limited by the present disclosure.
S402, clustering based on the vectorized representations of the feedback information to obtain M4 fourth classes, each fourth class corresponding to a satisfaction level, M4 ≥ 1.
The disclosed scheme clusters the vectorized representations of the feedback information; after the M4 fourth classes are obtained, the satisfaction level corresponding to each fourth class may be labeled manually. For example, the satisfaction level may be satisfied, neutral, or unsatisfied, and the number of satisfaction levels is not specifically limited in the present disclosure. The clustering may be performed with the K-means algorithm, the K-nearest-neighbor algorithm, and the like, which is not particularly limited in the present disclosure.
As an example, a vectorized representation of each word in the feedback information may be obtained from the general corpus; the mean of these word vectors is taken as the initial vectorized representation of the feedback information, and clustering is then performed on the initial vectorized representations of the feedback information to obtain the M4 fourth classes.
In addition, in consideration of a certain difference between the general corpus and the corpus formed by the feedback information, in order to improve accuracy of vectorization representation of the feedback information, the present disclosure further provides a new method for obtaining vectorization representation of the feedback information, which is described in fig. 6 and 9 below and will not be described in detail here.
S403, judging whether the M4 fourth classes include a class to be adjusted, where the satisfaction level corresponding to the class to be adjusted indicates that the persons to be adjusted belonging to that class are not satisfied with the grouping result, the persons to be adjusted being among the persons to be distributed.
After the M4 fourth classes and the satisfaction level corresponding to each fourth class are obtained based on the feedback information, whether a class to be adjusted is included can be judged. If no class to be adjusted exists, all persons to be distributed are satisfied with the grouping result obtained by the method shown in FIG. 1; otherwise, some of the persons to be distributed are not satisfied with that grouping result, i.e. persons to be adjusted exist.
For example, if clustering yields 2 fourth classes with satisfaction levels satisfied and unsatisfied, the fourth class indicating dissatisfaction may be determined as the class to be adjusted. If clustering yields 4 fourth classes with satisfaction levels satisfied, neutral, unsatisfied, and very unsatisfied, the classes indicating unsatisfied and/or very unsatisfied may be determined as classes to be adjusted, depending on requirements. The satisfaction level corresponding to the class to be adjusted can be determined by actual application requirements and is not limited in the disclosed scheme.
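A minimal sketch of this judgment, with invented person identifiers and class labels, might look as follows.

```python
# map each fourth class to its manually labeled satisfaction level (assumed labels)
satisfaction_of_class = {0: "satisfied", 1: "neutral", 2: "unsatisfied", 3: "very unsatisfied"}
to_adjust_levels = {"unsatisfied", "very unsatisfied"}            # chosen per application needs

person_class = {"person_A": 0, "person_B": 2, "person_C": 3}      # fourth class of each person's feedback
classes_to_adjust = {c for c, lvl in satisfaction_of_class.items() if lvl in to_adjust_levels}
persons_to_adjust = [p for p, c in person_class.items() if c in classes_to_adjust]
print(persons_to_adjust)                                           # these grouping results get re-adjusted
```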
S404, if the M4 fourth classes include the class to be adjusted, acquiring grouping update information of the persons to be adjusted.
S405, the grouping result of the personnel to be adjusted is adjusted by utilizing the grouping updating information.
When it is determined by S403 that there is a category to be adjusted, the grouping update information of the persons to be adjusted belonging to the category may be acquired, and the grouping result of each person to be adjusted is adjusted according to the grouping update information, so as to improve the satisfaction of the person to be adjusted with the grouping result. As an example, the group update information of the person to be adjusted may be input by an external person, and this may not be specifically limited by the present disclosure.
Referring to FIG. 6, a flow diagram of embodiment 1 of obtaining a vectorized representation of feedback information according to the present disclosure is shown. It may include the following steps:
S501, obtaining an initial vectorized representation of each piece of feedback information, and clustering based on the initial vectorized representations of the feedback information to obtain M5 fifth classes, M5 > 1.
S502, selecting first sample feedback information from the M5 fifth classes, and labeling the satisfaction level of each piece of first sample feedback information.
As introduced above, after the initial vectorized representation of the feedback information is obtained from the general corpus, a first round of clustering can be performed on these initial representations to obtain M5 fifth classes. Since the initial vectorized representations of the feedback information are relatively inaccurate, the M5 fifth classes derived from them are also relatively inaccurate. To address this, the disclosed scheme may train a satisfaction classification model including a text representation layer and use that layer to vectorize the feedback information, thereby improving the accuracy of the vectorized representations and, in turn, the accuracy of the clustering.
Specifically, the sample data for training the satisfaction classification model may include:
(1) First sample feedback information. Specifically, the first sample feedback information may be selected from the M5 fifth classes.
(2) The satisfaction level of the first sample feedback information. Specifically, the satisfaction level may be at least one of satisfied, neutral, unsatisfied, and very unsatisfied, and the satisfaction level of the first sample feedback information may be set manually.
As an example, the first sample feedback information may be selected randomly from the M5 fifth classes; alternatively, it may be selected from the aliasing-prone feedback information located at the boundaries of the fifth classes. Specifically, the distance dM51 of each piece of feedback information from the class center of its nearest fifth class and the distance dM52 from the class center of the second-nearest fifth class may be calculated. In general, the larger the distance difference (dM52 - dM51) of a piece of feedback information, the less likely inter-class aliasing is, so the feedback information can be sorted by (dM52 - dM51) and a certain number of pieces with the smallest distance difference selected as the first sample feedback information.
The way of selecting the first sample feedback information and its quantity are not particularly limited. To keep the training samples balanced, the numbers of pieces of first sample feedback information with different satisfaction levels can be kept comparable, so that the trained satisfaction classification model has good discriminability.
As an example, the M5 fifth classes may be obtained by automatic clustering or by setting a hyperparameter, which is not specifically limited in the present disclosure.
S503, training to obtain a satisfaction degree classification model by using the first sample feedback information and the satisfaction degree grade of the first sample feedback information, wherein the satisfaction degree classification model comprises a text representation layer for vectorization processing.
As an example, the satisfaction classification model may be trained using the phrase structure grammar tree shown in FIG. 7, where the input of the model is the first sample feedback information and the output is its satisfaction level. After the first sample feedback information is input, the grammar tree of the text representation can be generated automatically by the text representation layer; FIG. 7 shows a binary phrase structure grammar tree, so the input of each intermediate node is divided into a left node and a right node. For the specifics of phrase structure grammar trees, reference may be made to the Penn Chinese Treebank (https://catalog.ldc.upenn.edu/LDC2016T13), which is not described in detail in the present disclosure. Generally, the number of intermediate nodes in the grammar tree is mainly influenced by the number of words split from the first sample feedback information, its content, and other factors.
In practical applications, the intermediate node in FIG. 7 may be embodied as the node diagram shown in FIG. 8, and the corresponding formulas can be described as follows, where capital letters denote matrices and lowercase letters denote vectors:
the input gate (i) transforms the input information of the current node;
the output gate (o) controls the transmission of information from the current node to its parent node;
u_j transforms and merges the information of the two input child nodes and the current input;
f_k = tanh(W_f * x_j + U_f * h_k + b_f) is the forget gate (f), used to filter or transform the child-node information;
c_j = i_j * u_j + Σ_{k∈C(j)} f_k * c_k is the memory cell (c), which controls the information admitted through the forget gate and the input gate, retains the appropriate information, and passes it to subsequent nodes;
h_j = tanh(c_j) * o_j is the hidden layer (h) of the current node.
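A minimal PyTorch sketch of one such node is given below. The forget gate, memory cell, and hidden layer follow the formulas preserved above; the exact forms of the input gate, output gate, and u_j are not preserved in this text, so the standard sigmoid/tanh forms used here are assumptions, as are the layer sizes.

```python
import torch
import torch.nn as nn

class TreeLSTMNode(nn.Module):
    def __init__(self, in_dim=16, hid_dim=16):
        super().__init__()
        self.W_i, self.U_i = nn.Linear(in_dim, hid_dim), nn.Linear(hid_dim, hid_dim, bias=False)
        self.W_o, self.U_o = nn.Linear(in_dim, hid_dim), nn.Linear(hid_dim, hid_dim, bias=False)
        self.W_u, self.U_u = nn.Linear(in_dim, hid_dim), nn.Linear(hid_dim, hid_dim, bias=False)
        self.W_f, self.U_f = nn.Linear(in_dim, hid_dim), nn.Linear(hid_dim, hid_dim, bias=False)

    def forward(self, x_j, children):
        """x_j: current node input; children: list of (h_k, c_k) pairs from the child nodes."""
        h_sum = sum(h for h, _ in children)
        i_j = torch.sigmoid(self.W_i(x_j) + self.U_i(h_sum))   # input gate (assumed form)
        o_j = torch.sigmoid(self.W_o(x_j) + self.U_o(h_sum))   # output gate (assumed form)
        u_j = torch.tanh(self.W_u(x_j) + self.U_u(h_sum))      # merged child/current information (assumed form)
        # forget gate, memory cell, and hidden layer as in the preserved formulas
        f = [torch.tanh(self.W_f(x_j) + self.U_f(h_k)) for h_k, _ in children]
        c_j = i_j * u_j + sum(f_k * c_k for f_k, (_, c_k) in zip(f, children))
        h_j = torch.tanh(c_j) * o_j
        return h_j, c_j

node = TreeLSTMNode()
h_l, c_l = torch.zeros(1, 16), torch.zeros(1, 16)               # left child state
h_r, c_r = torch.zeros(1, 16), torch.zeros(1, 16)               # right child state
h, c = node(torch.randn(1, 16), [(h_l, c_l), (h_r, c_r)])        # binary node as in FIG. 7
```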
For example, the topology of the text representation layer may be a Tree-LSTM, a Bi-LSTM, a CNN, a Recursive Autoencoder, etc., which is not specifically limited by the disclosed scheme. It is understood that when the satisfaction level of the first sample feedback information output by the satisfaction classification model is consistent with the manually set satisfaction level, the model training can be considered complete.
S504, taking the feedback information as input, and outputting a new vectorization representation through the text representation layer as the vectorization representation of the feedback information.
Compared with a model trained on a general corpus, the satisfaction degree classification model trained on the feedback information of the personnel to be distributed yields more accurate vectorization, so the new vectorized representation output by the text representation layer can be used as the vectorized representation of the feedback information. Correspondingly, S402 may perform clustering based on the new vectorized representation of the feedback information to obtain the M4 fourth classes.
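As an illustration of this clustering step, the sketch below groups the new vectorized representations into M4 fourth classes with k-means; the choice of k-means, the value of M4 and the placeholder data are assumptions, since the disclosure does not fix a particular clustering algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans

# new_vectors: one row per piece of feedback information, produced by the
# text representation layer (shape: [num_feedback, dim]).
new_vectors = np.random.rand(200, 64)   # placeholder data for illustration only

M4 = 4                                  # number of fourth classes (assumed value)
kmeans = KMeans(n_clusters=M4, n_init=10, random_state=0).fit(new_vectors)
fourth_class_labels = kmeans.labels_          # class index of each piece of feedback
fourth_class_centers = kmeans.cluster_centers_
```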
Referring to FIG. 9, which shows a flow diagram of embodiment 2 of obtaining the vectorized representation of feedback information in the present disclosure. It may include the following steps:
S601, obtaining the initial vectorized representation of each piece of feedback information, and clustering based on the initial vectorized representations to obtain M5 fifth classes, M5 > 1.
S602, selecting first sample feedback information from the M5 fifth classes, and labeling the satisfaction degree grade of each piece of first sample feedback information.
S603, training to obtain a satisfaction degree classification model by using the first sample feedback information and the satisfaction degree grade of the first sample feedback information, wherein the satisfaction degree classification model comprises a text representation layer for vectorization processing.
S604, taking the feedback information as input, and outputting a new vectorization representation through the text representation layer.
The implementation processes of steps S601 to S604 can refer to the descriptions of steps S501 to S504, and are not described herein again.
S605, clustering based on the new vectorized representation of the feedback information to obtain M6 sixth classes, M6 > 1.
S606, selecting the N2 pieces of second sample feedback information with the smallest distance difference (d4 - d3), and labeling the satisfaction degree grade of each piece of second sample feedback information, where d3 is the distance from the second sample feedback information to the class center of the nearest sixth class, and d4 is the distance from the second sample feedback information to the class center of the next-nearest sixth class.
In order to further improve the accuracy of the vectorized representation of the feedback information, the present disclosure also provides a scheme for optimizing the satisfaction degree classification model. Specifically, the sample data for optimizing the satisfaction degree classification model may include:
(1) second sample feedback information.
Specifically, clustering may be performed based on the new vectorized representation of the feedback information to obtain M6 sixth classes; the second sample feedback information is then selected from the easily confused feedback information located at the boundaries of the sixth classes.
As an example, the second sample feedback information may be selected as follows: for each piece of feedback information, calculate the distance d3 to the class center of the nearest sixth class and the distance d4 to the class center of the next-nearest sixth class. Generally, the larger the distance difference (d4 - d3) of a piece of feedback information, the less likely inter-class confusion is, so the feedback information can be sorted by (d4 - d3) and the N2 pieces with the smallest difference can be selected as the second sample feedback information. In practical applications, the second sample feedback information may be selected preferentially from the feedback information other than the first sample feedback information, which is not specifically limited by the present disclosure.
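A minimal sketch of this boundary-based selection is given below; the use of Euclidean distance and the function and variable names are assumptions for illustration only.

```python
import numpy as np

def select_second_samples(vectors, centers, n2):
    """Pick the n2 feedback items with the smallest (d4 - d3) difference,
    i.e. the items lying closest to a boundary between two class centers."""
    # Pairwise distances between every feedback vector and every class center.
    dists = np.linalg.norm(vectors[:, None, :] - centers[None, :, :], axis=-1)
    sorted_d = np.sort(dists, axis=1)
    d3 = sorted_d[:, 0]             # distance to the nearest class center
    d4 = sorted_d[:, 1]             # distance to the next-nearest class center
    margin = d4 - d3
    return np.argsort(margin)[:n2]  # indices of the most confusable items
```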
In addition, the M6 sixth classes may be obtained by automatic clustering or by setting a hyperparameter, which is not specifically limited in the present disclosure.
(2) The satisfaction degree grade of the second sample feedback information. Specifically, the satisfaction degree grade may be at least one of satisfied, average, unsatisfied, and extremely unsatisfied, and the satisfaction degree grade of the second sample feedback information may be set manually.
As an example, the M6 sixth classes obtained by clustering may not cover all satisfaction degree grades. Accordingly, the present disclosure may further provide a new clustering approach to obtain more classes and cover all satisfaction degree grades as far as possible.
Specifically, M7 available class centers may be determined first, where the distance between each available class center and the class centers of the M6 sixth classes is not less than a preset distance; the M7 available class centers and the class centers of the M6 sixth classes are then collectively called new class centers, and based on the new class centers the feedback information of the personnel to be distributed is clustered again to determine the new class to which each piece of feedback information belongs, each new class corresponding to one new class center.
As an example, the M7 available class centers may be determined in the following ways, one of which is sketched after this passage:
Mode one: directly select M7 pieces of feedback information as the available class centers, ensuring that the distance between each selected piece of feedback information and the class centers of the M6 sixth classes is not less than the preset distance.
Mode two: first select M7 pieces of feedback information as initial available class centers, then select a certain amount of available feedback information near each initial available class center and take the average of the initial available class center and this available feedback information to obtain the M7 available class centers. The amount of available feedback information is not limited in the present disclosure and may be determined according to actual application requirements.
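The following sketch illustrates mode one under the assumption of Euclidean distance; the function name and parameters are hypothetical and not part of the disclosure.

```python
import numpy as np

def find_available_centers(vectors, sixth_centers, m7, preset_distance):
    """Mode one: pick up to m7 feedback vectors whose distance to every
    existing sixth-class center is at least preset_distance."""
    available = []
    for v in vectors:
        d = np.linalg.norm(sixth_centers - v, axis=1)
        if d.min() >= preset_distance:
            available.append(v)
            if len(available) == m7:
                break
    return np.array(available)
```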
It can be understood that the larger the preset distance, the farther the available class centers are from the class centers of the sixth classes and the more likely a new satisfaction degree grade is to be found. The present disclosure does not limit the specific value of the preset distance; generally, the preset distance is larger than the distance from the class center of a sixth class to its boundary.
As an example, the class center of a sixth class may be a class center calculated from all the feedback information contained in that sixth class, or a class center calculated from part of the feedback information contained in that sixth class.
In summary, after the (M6 + M7) new class centers are obtained, the feedback information can be clustered again. Specifically, the new class to which a piece of feedback information belongs may be determined according to its distance to the new class centers; generally, the feedback information belongs to the new class corresponding to the nearest new class center.
Correspondingly, the second sample feedback information may be selected from the easily confused feedback information located at the boundaries of the new classes. Specifically, the distance d5 between each piece of feedback information and the nearest new class center and the distance d6 between that feedback information and the next-nearest new class center may be calculated first, and then the N2 pieces of second sample feedback information are selected according to the distance difference (d6 - d5).
S607, calculating an update class center based on the satisfaction degree grade of the first sample feedback information, the satisfaction degree grade of the second sample feedback information and the satisfaction degree grade of the residual feedback information except the first sample feedback information and the second sample feedback information, wherein the satisfaction degree grade of the residual feedback information is determined by the category to which the residual feedback information belongs.
The scheme of the disclosure can divide the feedback information of the personnel to be distributed into three parts: first sample feedback information, second sample feedback information, remaining feedback information. The satisfaction degree levels of the first sample feedback information and the second sample feedback information can be set manually, and the satisfaction degree levels of the remaining feedback information can be determined according to the category to which the remaining feedback information belongs.
For example, if the second sample feedback information is selected from the easily confused feedback information at the boundaries of the sixth classes, the satisfaction degree grade of the remaining feedback information can be determined by the sixth class to which it belongs; if the second sample feedback information is selected from the easily confused feedback information at the boundaries of the new classes, the satisfaction degree grade of the remaining feedback information can be determined by the new class to which it belongs.
As an example, the update class centers may be calculated with the following quantities: μ_k is the k-th update class center; r_nk and w_nk are 0/1 matrices of dimension N × K that encode the satisfaction degree grade of the n-th piece of feedback information, where, when the n-th piece of feedback information is first or second sample feedback information, w_nk = 1 if its satisfaction degree grade is labeled as class k and w_nk = 0 otherwise, and, when the n-th piece of feedback information is remaining feedback information, r_nk = 1 if its satisfaction degree grade is class k and r_nk = 0 otherwise; N is the total number of pieces of feedback information and K is the total number of update classes; α is the weight of the remaining feedback information, 0 < α < 1; f(S_n) is the output of the S layer of the network shown in FIG. 7.
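Because the exact formula is not reproduced in this text, the sketch below only illustrates one plausible reading of the update: each update class center as a weighted mean of the representations f(S_n), with labeled samples weighted by 1 (via w_nk) and the remaining feedback information weighted by α (via r_nk). This is an assumption, not the disclosure's definitive formula.

```python
import numpy as np

def update_class_centers(f_s, r, w, alpha):
    """Weighted-mean sketch of the update class centers (assumed form).

    f_s   : [N, dim] outputs f(S_n) of the text representation layer
    r     : [N, K] 0/1 assignment of the remaining feedback information
    w     : [N, K] 0/1 labels of the first/second sample feedback information
    alpha : weight of the remaining feedback information (0 < alpha < 1)
    """
    weights = alpha * r + w                       # [N, K]
    num = weights.T @ f_s                         # [K, dim] weighted sum per class
    den = weights.sum(axis=0, keepdims=True).T    # [K, 1] total weight per class
    return num / np.maximum(den, 1e-12)           # mu_k for each class k
```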
S608, clustering the feedback information again based on the update class centers, and determining the update class to which each piece of feedback information belongs, each update class corresponding to one update class center.
After the update class centers are obtained, the feedback information can be clustered again. Specifically, the update class to which a piece of feedback information belongs may be determined according to its distance to the update class centers; generally, the feedback information belongs to the update class corresponding to the nearest update class center. It can be understood that re-clustering the feedback information is equivalent to updating r_nk and w_nk of the n-th piece of feedback information.
S609, updating the satisfaction degree classification model by using the feedback information and the update class to which it belongs, until the updated satisfaction degree classification model meets a preset condition, the updated satisfaction degree classification model comprising an updated text representation layer.
After the feedback information and its satisfaction degree grades based on the update classes are obtained, the text representation network in FIG. 7, i.e. f(S_n), n = 1, 2, ..., N, may be updated with back-propagation training until convergence, where the loss function J_semi may be embodied as follows.
The loss function J_semi consists of three terms: the first term represents the cost of the remaining feedback information; the second term represents the cost of the first and second sample feedback information; the third term, when a piece of feedback information is nearly equidistant from two update class centers, pushes it toward the center of its own update class and away from the centers of the other update classes; α is the weight of the remaining feedback information, 0 < α < 1; L is a hyperparameter representing how close two such distances must be.
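The loss formula itself is likewise not reproduced here; the sketch below is only one plausible realization consistent with the three terms described above (an α-weighted clustering cost for the remaining feedback information, a clustering cost for the labeled samples, and a hinge-style margin term with hyperparameter L), and every detail of it should be read as an assumption.

```python
import numpy as np

def j_semi(f_s, mu, r, w, alpha, L):
    """Illustrative three-term semi-supervised loss (assumed form).

    Term 1: alpha-weighted distance of remaining feedback to its class center.
    Term 2: distance of labeled (first/second sample) feedback to its center.
    Term 3: hinge-style margin pushing items away from the runner-up center
            when the two nearest centers are within L of each other.
    """
    dists = np.linalg.norm(f_s[:, None, :] - mu[None, :, :], axis=-1)  # [N, K]
    term1 = alpha * np.sum(r * dists ** 2)
    term2 = np.sum(w * dists ** 2)

    sorted_d = np.sort(dists, axis=1)
    margin = sorted_d[:, 1] - sorted_d[:, 0]        # gap to the runner-up center
    term3 = np.sum(np.maximum(0.0, L - margin))
    return term1 + term2 + term3
```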
S610, taking the feedback information as input, and outputting an updated vectorization representation through the updated text representation layer to serve as the vectorization representation of the feedback information.
Optimizing the satisfaction degree classification model with the second sample feedback information further improves the accuracy of the vectorized representation, so the updated vectorized representation output by the updated text representation layer can be used as the vectorized representation of the feedback information. Correspondingly, S402 may perform clustering based on the updated vectorized representation of the feedback information to obtain the M4 fourth classes.
Referring to fig. 10, a schematic diagram of the grouping apparatus of the present disclosure is shown. The apparatus may include:
an attention information splitting module 701, configured to obtain attention information of a person to be allocated, and split the attention information into at least one phrase;
a phrase vectorization representation acquisition module 702 for acquiring a vectorization representation of each phrase;
a phrase clustering module 703, configured to perform clustering based on the vectorized representation of the phrases to obtain M1 first classes, each first class representing one piece of dimension information, M1 ≥ 1;
A grouping result obtaining module 704, configured to set a weight corresponding to each dimension information, and obtain a grouping result of the to-be-assigned person by using the dimension information and the weight corresponding to the dimension information.
Optionally, the phrase vectorization representation obtaining module includes:
a first vectorized representation obtaining module for obtaining an initial vectorized representation of each phrase;
a first clustering module, configured to perform clustering based on the initial vectorized representation of the phrases to obtain M2 second classes, M2 > 1;
a first sample phrase selection module, configured to pair the first sample phrases selected from the M2 second classes into first sample phrase pairs, and obtain the labeling information of each first sample phrase pair, the labeling information being similar or dissimilar;
the phrase classification model training module is used for training to obtain a phrase classification model by utilizing the first sample phrase pair and the labeling information of the first sample phrase pair, and the phrase classification model comprises a phrase representation layer used for vectorization processing;
and the second vectorization representation output module is used for taking the phrase split from the attention information as input and outputting a new vectorization representation as the vectorization representation of the phrase through the phrase representation layer.
Optionally, the phrase vectorization representation obtaining module further includes:
a second clustering module, configured to, after the new vectorized representation of the phrases is obtained, perform clustering based on the new vectorized representation of the phrases to obtain M3 third classes, M3 > 1;
a second sample phrase selection module, configured to select the N1 second sample phrases with the smallest distance difference (d2 - d1), where d1 is the distance from the second sample phrase to the class center of the nearest third class, and d2 is the distance from the second sample phrase to the class center of the next-nearest third class; pair each second sample phrase with the class center of its nearest third class and/or with the class center of its next-nearest third class to form second sample phrase pairs, and obtain the labeling information of each second sample phrase pair;
the phrase classification model updating module is used for updating the phrase classification model by using the second sample phrase pair and the labeling information of the second sample phrase pair until the updated phrase classification model meets a preset condition, and the updated phrase classification model comprises an updated phrase representation layer;
and the third vectorization representation output module is used for taking the phrase split from the attention information as input, and outputting an updated vectorization representation through the updated phrase representation layer to serve as the vectorization representation of the phrase.
Optionally, the apparatus further comprises:
the feedback information acquisition module is used for acquiring feedback information of the personnel to be distributed aiming at the grouping result;
the feedback information vectorization representation acquisition module is used for acquiring the vectorization representation of each piece of feedback information;
a feedback information clustering module, configured to perform clustering based on the vectorized representation of the feedback information to obtain M4 fourth classes, each fourth class corresponding to a satisfaction level, M4 ≥ 1;
a category judging module, configured to judge whether the M4 fourth classes include a class to be adjusted, where the satisfaction degree grade corresponding to the class to be adjusted indicates that the personnel to be adjusted belonging to the class to be adjusted are not satisfied with the grouping result, and the personnel to be adjusted belong to the personnel to be distributed;
a grouping update information acquisition module, configured to acquire the grouping update information of the personnel to be adjusted when the M4 fourth classes include the class to be adjusted;
and the grouping result adjusting module is used for adjusting the grouping result of the personnel to be adjusted by utilizing the grouping updating information.
Optionally, the feedback information vectorization representation obtaining module includes:
a fourth vectorized representation acquisition module, configured to acquire the initial vectorized representation of each piece of feedback information;
a third clustering processing module, configured to perform clustering based on the initial vectorized representation of the feedback information to obtain M5 fifth classes, M5 > 1;
a first sample feedback information selection module, configured to select first sample feedback information from the M5 fifth classes and label the satisfaction degree grade of each piece of first sample feedback information;
the satisfaction degree classification model training module is used for training to obtain a satisfaction degree classification model by utilizing the first sample feedback information and the satisfaction degree grade of the first sample feedback information, and the satisfaction degree classification model comprises a text representation layer used for vectorization processing;
and a fifth vectorization representation output module, configured to output a new vectorization representation as the vectorization representation of the feedback information through the text representation layer, with the feedback information as an input.
Optionally, the feedback information vectorization representation obtaining module further includes:
a fourth clustering module, configured to, after the new vectorized representation of the feedback information is obtained, perform clustering based on the new vectorized representation of the feedback information to obtain M6 sixth classes, M6 > 1;
a second sample feedback information selection module, configured to select the N2 pieces of second sample feedback information with the smallest distance difference (d4 - d3) and label the satisfaction degree grade of each piece of second sample feedback information, where d3 is the distance from the second sample feedback information to the class center of the nearest sixth class, and d4 is the distance from the second sample feedback information to the class center of the next-nearest sixth class;
an update class center calculation module, configured to calculate an update class center based on the satisfaction level of the first sample feedback information, the satisfaction level of the second sample feedback information, and the satisfaction levels of remaining feedback information except the first sample feedback information and the second sample feedback information, where the satisfaction levels of the remaining feedback information are determined by a category to which the remaining feedback information belongs;
the fifth clustering processing module is used for clustering the feedback information again based on the update class center to determine the update class to which each piece of feedback information belongs, and each update class corresponds to one update class center;
the satisfaction degree classification model updating module is used for updating the satisfaction degree classification model by using the feedback information and the update classification to which the feedback information belongs until the updated satisfaction degree classification model meets a preset condition, and the updated satisfaction degree classification model comprises an updated text representation layer;
and a sixth vectorization representation output module, configured to output, as the vectorization representation of the feedback information, an updated vectorization representation through the updated text representation layer, with the feedback information as an input.
Optionally, the feedback information vectorization representation obtaining module further includes:
an available class center determination module, configured to determine M7 available class centers after the M6 sixth classes are obtained, where the distance between each of the M7 available class centers and the class centers of the M6 sixth classes is not less than a preset distance;
a sixth clustering module, configured to perform clustering on the feedback information again based on new class centers to determine the new class to which each piece of feedback information belongs, each new class corresponding to one new class center, the new class centers including the M7 available class centers and the class centers of the M6 sixth classes;
correspondingly, the second sample feedback information selection module is configured to select the N2 pieces of second sample feedback information with the smallest distance difference (d6 - d5), where d5 is the distance from the second sample feedback information to the nearest new class center, and d6 is the distance from the second sample feedback information to the next-nearest new class center.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Referring to FIG. 11, a schematic structural diagram of an electronic device 800 for personnel grouping of the present disclosure is shown. The electronic device 800 may include at least a processor 801 and a storage medium 802. As an example, the processor 801 and the storage medium 802 may be connected via a bus or other means, the bus connection shown in FIG. 11 being taken as an example. There may be one or more processors 801; one processor is taken as an example in FIG. 11. The storage medium 802 represents a storage resource for storing instructions, such as application programs, executable by the processor 801. Further, the processor 801 may be configured to load the instructions in the storage medium to perform the personnel grouping method described above.
The preferred embodiments of the present disclosure are described in detail with reference to the accompanying drawings, however, the present disclosure is not limited to the specific details of the above embodiments, and various simple modifications may be made to the technical solution of the present disclosure within the technical idea of the present disclosure, and these simple modifications all belong to the protection scope of the present disclosure.
It should be noted that, in the foregoing embodiments, various features described in the above embodiments may be combined in any suitable manner, and in order to avoid unnecessary repetition, various combinations that are possible in the present disclosure are not described again.
In addition, any combination of various embodiments of the present disclosure may be made, and the same should be considered as the disclosure of the present disclosure, as long as it does not depart from the spirit of the present disclosure.
Claims (16)
1. A method of people grouping, the method comprising:
acquiring attention information of a person to be distributed, splitting the attention information into at least one phrase, and acquiring vectorization representation of each phrase;
clustering based on the vectorized representation of the phrases to obtain M1 first classes, each first class representing one piece of dimension information, M1 ≥ 1; wherein the dimension information represents the commonality requirements of the personnel to be distributed that are to be resolved;
setting the weight corresponding to each dimension information, and obtaining the grouping result of the personnel to be distributed by using the dimension information and the weight corresponding to the dimension information.
2. The method of claim 1, wherein obtaining a vectorized representation of each phrase comprises:
obtaining an initial vectorized representation of each phrase, and clustering based on the initial vectorized representation of the phrases to obtain M2 second classes, M2 > 1;
pairing the first sample phrases selected from the M2 second classes into first sample phrase pairs, and obtaining the labeling information of each first sample phrase pair, the labeling information being similar or dissimilar;
training to obtain a phrase classification model by using the first sample phrase pair and the labeling information of the first sample phrase pair, wherein the phrase classification model comprises a phrase representation layer for vectorization processing;
and taking the phrase obtained by splitting the attention information as an input, and outputting a new vectorization representation through the phrase representation layer to serve as the vectorization representation of the phrase.
3. The method of claim 2, wherein after obtaining a new vectorized representation of the phrase, the obtaining a vectorized representation of each phrase further comprises:
clustering based on the new vectorized representation of the phrases to obtain M3 third classes, M3 > 1;
selecting the N1 second sample phrases with the smallest distance difference (d2 - d1), where d1 is the distance from the second sample phrase to the class center of the nearest third class, and d2 is the distance from the second sample phrase to the class center of the next-nearest third class;
combining the second sample phrase and the nearest class center of the third class and/or the second sample phrase and the second nearest class center of the third class into a second sample phrase pair in pairs, and acquiring the labeling information of each second sample phrase pair;
updating the phrase classification model by using the second sample phrase pair and the labeling information of the second sample phrase pair until the updated phrase classification model meets a preset condition, wherein the updated phrase classification model comprises an updated phrase representation layer;
and taking the phrase obtained by splitting the attention information as an input, and outputting an updated vectorization representation through the updated phrase representation layer to serve as the vectorization representation of the phrase.
4. The method according to any one of claims 1 to 3, further comprising:
acquiring feedback information of the personnel to be distributed aiming at the grouping result and vectorization representation of each piece of feedback information;
clustering based on the vectorized representation of the feedback information to obtain M4 fourth classes, each fourth class corresponding to a satisfaction level, M4 ≥ 1;
judging whether the M4 fourth classes include a class to be adjusted, where the satisfaction degree grade corresponding to the class to be adjusted indicates that the personnel to be adjusted belonging to the class to be adjusted are not satisfied with the grouping result, and the personnel to be adjusted belong to the personnel to be distributed;
if the M4 fourth classes include the class to be adjusted, acquiring the grouping update information of the personnel to be adjusted;
and adjusting the grouping result of the personnel to be adjusted by utilizing the grouping update information.
5. The method of claim 4, wherein obtaining the vectorized representation of each piece of feedback information comprises:
obtaining an initial vectorized representation of each piece of feedback information, and clustering based on the initial vectorized representation of the feedback information to obtain M5 fifth classes, M5 > 1;
selecting first sample feedback information from the M5 fifth classes, and labeling the satisfaction degree grade of each piece of first sample feedback information;
training to obtain a satisfaction degree classification model by using the first sample feedback information and the satisfaction degree grade of the first sample feedback information, wherein the satisfaction degree classification model comprises a text representation layer for vectorization processing;
and taking the feedback information as input, and outputting a new vectorization representation through the text representation layer to serve as the vectorization representation of the feedback information.
6. The method of claim 5, wherein after obtaining the new vectorized representation of the feedback information, the obtaining the vectorized representation of each piece of feedback information further comprises:
clustering based on the new vectorized representation of the feedback information to obtain M6 sixth classes, M6 > 1;
selecting the N2 pieces of second sample feedback information with the smallest distance difference (d4 - d3), and labeling the satisfaction degree grade of each piece of second sample feedback information, where d3 is the distance from the second sample feedback information to the class center of the nearest sixth class, and d4 is the distance from the second sample feedback information to the class center of the next-nearest sixth class;
calculating an update class center based on the satisfaction degree level of the first sample feedback information, the satisfaction degree level of the second sample feedback information, and the satisfaction degree levels of the remaining feedback information except the first sample feedback information and the second sample feedback information, wherein the satisfaction degree levels of the remaining feedback information are determined by the category to which the remaining feedback information belongs;
clustering the feedback information again based on the update class centers to determine the update class to which each piece of feedback information belongs, wherein each update class corresponds to one update class center;
updating the satisfaction degree classification model by using the feedback information and the update classification to which the feedback information belongs until the updated satisfaction degree classification model meets a preset condition, wherein the updated satisfaction degree classification model comprises an updated text representation layer;
and taking the feedback information as input, and outputting an updated vectorization representation through the updated text representation layer to serve as the vectorization representation of the feedback information.
7. The method of claim 6, wherein after the M6 sixth classes are obtained, the obtaining the vectorized representation of each piece of feedback information further comprises:
determining M7 available class centers, where the distance between each of the M7 available class centers and the class centers of the M6 sixth classes is not less than a preset distance;
clustering the feedback information again based on new class centers to determine the new class to which each piece of feedback information belongs, each new class corresponding to one new class center, the new class centers comprising the M7 available class centers and the class centers of the M6 sixth classes;
correspondingly, the second sample feedback information is selected as follows: selecting the N2 pieces of second sample feedback information with the smallest distance difference (d6 - d5), where d5 is the distance from the second sample feedback information to the nearest new class center, and d6 is the distance from the second sample feedback information to the next-nearest new class center.
8. A people grouping apparatus, characterized in that the apparatus comprises:
the concerned information splitting module is used for acquiring the concerned information of the personnel to be distributed and splitting the concerned information into at least one phrase;
a phrase vectorization representation acquisition module for acquiring a vectorization representation of each phrase;
a phrase clustering processing module, configured to perform clustering based on the vectorized representation of the phrases to obtain M1 first classes, each first class representing one piece of dimension information, M1 ≥ 1; the dimension information represents the commonality requirements of the personnel to be distributed;
and the grouping result obtaining module is used for setting the weight corresponding to each dimension information and obtaining the grouping result of the personnel to be distributed by using the dimension information and the weight corresponding to the dimension information.
9. The apparatus of claim 8, wherein the phrase vectorized representation acquisition module comprises:
a first vectorized representation obtaining module for obtaining an initial vectorized representation of each phrase;
a first clustering module, configured to perform clustering based on the initial vectorized representation of the phrases to obtain M2 second classes, M2 > 1;
a first sample phrase selection module, configured to pair the first sample phrases selected from the M2 second classes into first sample phrase pairs, and obtain the labeling information of each first sample phrase pair, the labeling information being similar or dissimilar;
the phrase classification model training module is used for training to obtain a phrase classification model by utilizing the first sample phrase pair and the labeling information of the first sample phrase pair, and the phrase classification model comprises a phrase representation layer used for vectorization processing;
and the second vectorization representation output module is used for taking the phrase split from the attention information as input and outputting a new vectorization representation as the vectorization representation of the phrase through the phrase representation layer.
10. The apparatus of claim 9, wherein the phrase vectorized representation acquisition module further comprises:
a second clustering module, configured to, after the new vectorized representation of the phrases is obtained, perform clustering based on the new vectorized representation of the phrases to obtain M3 third classes, M3 > 1;
a second sample phrase selection module, configured to select the N1 second sample phrases with the smallest distance difference (d2 - d1), where d1 is the distance from the second sample phrase to the class center of the nearest third class, and d2 is the distance from the second sample phrase to the class center of the next-nearest third class; pair each second sample phrase with the class center of its nearest third class and/or with the class center of its next-nearest third class to form second sample phrase pairs, and obtain the labeling information of each second sample phrase pair;
the phrase classification model updating module is used for updating the phrase classification model by using the second sample phrase pair and the labeling information of the second sample phrase pair until the updated phrase classification model meets a preset condition, and the updated phrase classification model comprises an updated phrase representation layer;
and the third vectorization representation output module is used for taking the phrase split from the attention information as input, and outputting an updated vectorization representation through the updated phrase representation layer to serve as the vectorization representation of the phrase.
11. The apparatus of any one of claims 8 to 10, further comprising:
the feedback information acquisition module is used for acquiring feedback information of the personnel to be distributed aiming at the grouping result;
the feedback information vectorization representation acquisition module is used for acquiring the vectorization representation of each piece of feedback information;
a feedback information clustering module, configured to perform clustering based on the vectorized representation of the feedback information to obtain M4 fourth classes, each fourth class corresponding to a satisfaction level, M4 ≥ 1;
a category judging module, configured to judge whether the M4 fourth classes include a class to be adjusted, where the satisfaction degree grade corresponding to the class to be adjusted indicates that the personnel to be adjusted belonging to the class to be adjusted are not satisfied with the grouping result, and the personnel to be adjusted belong to the personnel to be distributed;
a grouping update information acquisition module, configured to acquire the grouping update information of the personnel to be adjusted when the M4 fourth classes include the class to be adjusted;
and the grouping result adjusting module is used for adjusting the grouping result of the personnel to be adjusted by utilizing the grouping updating information.
12. The apparatus of claim 11, wherein the feedback information vectorized representation obtaining module comprises:
a fourth vectorized representation acquisition module, configured to acquire the initial vectorized representation of each piece of feedback information;
a third clustering processing module, configured to perform clustering based on the initial vectorized representation of the feedback information to obtain M5 fifth classes, M5 > 1;
a first sample feedback information selection module, configured to select first sample feedback information from the M5 fifth classes and label the satisfaction degree grade of each piece of first sample feedback information;
the satisfaction degree classification model training module is used for training to obtain a satisfaction degree classification model by utilizing the first sample feedback information and the satisfaction degree grade of the first sample feedback information, and the satisfaction degree classification model comprises a text representation layer used for vectorization processing;
and a fifth vectorization representation output module, configured to output a new vectorization representation as the vectorization representation of the feedback information through the text representation layer, with the feedback information as an input.
13. The apparatus of claim 12, wherein the feedback information vectorized representation obtaining module further comprises:
a fourth clustering module, configured to, after the new vectorized representation of the feedback information is obtained, perform clustering based on the new vectorized representation of the feedback information to obtain M6 sixth classes, M6 > 1;
a second sample feedback information selection module, configured to select the N2 pieces of second sample feedback information with the smallest distance difference (d4 - d3) and label the satisfaction degree grade of each piece of second sample feedback information, where d3 is the distance from the second sample feedback information to the class center of the nearest sixth class, and d4 is the distance from the second sample feedback information to the class center of the next-nearest sixth class;
an update class center calculation module, configured to calculate an update class center based on the satisfaction level of the first sample feedback information, the satisfaction level of the second sample feedback information, and the satisfaction levels of remaining feedback information except the first sample feedback information and the second sample feedback information, where the satisfaction levels of the remaining feedback information are determined by a category to which the remaining feedback information belongs;
the fifth clustering processing module is used for clustering the feedback information again based on the update class center to determine the update class to which each piece of feedback information belongs, and each update class corresponds to one update class center;
the satisfaction degree classification model updating module is used for updating the satisfaction degree classification model by using the feedback information and the update classification to which the feedback information belongs until the updated satisfaction degree classification model meets a preset condition, and the updated satisfaction degree classification model comprises an updated text representation layer;
and a sixth vectorization representation output module, configured to output, as the vectorization representation of the feedback information, an updated vectorization representation through the updated text representation layer, with the feedback information as an input.
14. The apparatus of claim 13, wherein the feedback information vectorized representation obtaining module further comprises:
an available class center determination module, configured to determine M7 available class centers after the M6 sixth classes are obtained, where the distance between each of the M7 available class centers and the class centers of the M6 sixth classes is not less than a preset distance;
a sixth clustering module, configured to perform clustering on the feedback information again based on new class centers to determine the new class to which each piece of feedback information belongs, each new class corresponding to one new class center, the new class centers including the M7 available class centers and the class centers of the M6 sixth classes;
correspondingly, the second sample feedback information selection module is configured to select the N2 pieces of second sample feedback information with the smallest distance difference (d6 - d5), where d5 is the distance from the second sample feedback information to the nearest new class center, and d6 is the distance from the second sample feedback information to the next-nearest new class center.
15. A storage medium having stored thereon a plurality of instructions, wherein the instructions are loadable by a processor and adapted to cause execution of the steps of the method according to any of claims 1 to 7.
16. An electronic device, characterized in that the electronic device comprises:
the storage medium of claim 15; and
a processor to execute the instructions in the storage medium.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810273041.6A CN108763246B (en) | 2018-03-29 | 2018-03-29 | Personnel grouping method and device, storage medium and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108763246A CN108763246A (en) | 2018-11-06 |
CN108763246B true CN108763246B (en) | 2022-04-22 |
Family
ID=63980772
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810273041.6A Active CN108763246B (en) | 2018-03-29 | 2018-03-29 | Personnel grouping method and device, storage medium and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108763246B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113609295A (en) * | 2021-08-11 | 2021-11-05 | 平安科技(深圳)有限公司 | Text classification method and device and related equipment |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6751621B1 (en) * | 2000-01-27 | 2004-06-15 | Manning & Napier Information Services, Llc. | Construction of trainable semantic vectors and clustering, classification, and searching using trainable semantic vectors |
CN103823809B (en) * | 2012-11-16 | 2018-06-08 | 百度在线网络技术(北京)有限公司 | A kind of method, the method for Classified optimization and its device to query phrase classification |
CN106355449B (en) * | 2016-08-31 | 2021-09-07 | 腾讯科技(深圳)有限公司 | User selection method and device |
CN106897384B (en) * | 2017-01-23 | 2020-09-11 | 科大讯飞股份有限公司 | Method and device for automatically evaluating key points |
CN107169001A (en) * | 2017-03-31 | 2017-09-15 | 华东师范大学 | A kind of textual classification model optimization method based on mass-rent feedback and Active Learning |
Also Published As
Publication number | Publication date |
---|---|
CN108763246A (en) | 2018-11-06 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |