CN112650841A - Information processing method and device and electronic equipment - Google Patents

Information processing method and device and electronic equipment

Info

Publication number
CN112650841A
CN112650841A
Authority
CN
China
Prior art keywords
cluster center
determining
clustered
center
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011432971.5A
Other languages
Chinese (zh)
Inventor
吴培昊
谭言信
雷孝钧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Youzhuju Network Technology Co Ltd
Original Assignee
Beijing Youzhuju Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Youzhuju Network Technology Co Ltd filed Critical Beijing Youzhuju Network Technology Co Ltd
Priority to CN202011432971.5A priority Critical patent/CN112650841A/en
Publication of CN112650841A publication Critical patent/CN112650841A/en
Priority to PCT/CN2021/135402 priority patent/WO2022121801A1/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present disclosure disclose an information processing method and apparatus, and an electronic device. One embodiment of the method comprises: importing at least two problems to be clustered into a clustering model to obtain at least one target cluster center, wherein a target cluster center indicates a cluster; and determining the at least two problems to be clustered as at least one cluster based on the at least one target cluster center. A new way of clustering problems is thereby provided.

Description

Information processing method and device and electronic equipment
Technical Field
The present disclosure relates to the field of internet technologies, and in particular, to an information processing method and apparatus, and an electronic device.
Background
With the development of the internet, users increasingly use terminal devices to browse various kinds of information. While browsing, users may raise various questions. With the development of intelligent customer service technology, a machine can automatically reply to a user's questions.
In an intelligent customer service scenario, answering frequently asked questions (FAQ) is an important basic capability, which relies on a standard question-answer library in the background. The contents of the question-answer library come from offline manual curation and from collecting high-frequency online questions; the latter can greatly enrich the standard question-answer library and improve FAQ coverage. High-frequency online questions are usually found through data analysis of online questions, so the ability to analyze and process online questions is crucial.
Disclosure of Invention
This Summary is provided to introduce concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In a first aspect, an embodiment of the present disclosure provides an information processing method, where the method includes: importing at least two problems to be clustered into a clustering model to obtain at least one target cluster center, wherein the target cluster center indicates a cluster; and determining the at least two problems to be clustered as at least one cluster based on the at least one target cluster center.
In a second aspect, an embodiment of the present disclosure provides an information processing apparatus, including: a generating unit, configured to import at least two problems to be clustered into a clustering model to obtain at least one target cluster center, wherein a target cluster center indicates a cluster; and a determining unit, configured to determine the at least two problems to be clustered as at least one cluster based on the at least one target cluster center.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the information processing method according to the first aspect.
In a fourth aspect, the disclosed embodiments provide a computer readable medium, on which a computer program is stored, which when executed by a processor, implements the steps of the information processing method according to the first aspect.
It should be noted that, in the information processing method and apparatus and the electronic device provided in the embodiments of the present disclosure, the problems to be clustered are imported into the clustering model to obtain at least one target cluster center, and the at least two problems to be clustered are then determined as at least one cluster according to the target cluster centers. A new clustering approach is thereby provided, improving both the speed and the accuracy of problem clustering.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.
FIG. 1 is a flow diagram of one embodiment of an information processing method according to the present disclosure;
FIG. 2A is a schematic flow chart of a training process for a classification model according to the present disclosure;
FIG. 2B is a schematic diagram of an application scenario of an information processing method according to the present disclosure;
FIG. 3 is a schematic diagram of an alternative implementation of step 202 according to the present disclosure;
FIG. 4 is a schematic diagram of an alternative implementation of step 101 of an information processing method according to the present disclosure;
FIG. 5 is a schematic diagram of an alternative implementation of step 402 of the information processing method of the present disclosure;
FIG. 6 is a schematic diagram of another alternative implementation of step 402 of the information processing method of the present disclosure;
FIG. 7 is a schematic block diagram of one embodiment of an information processing apparatus according to the present disclosure;
FIG. 8 is an exemplary system architecture to which the information processing method of one embodiment of the present disclosure may be applied;
fig. 9 is a schematic diagram of a basic structure of an electronic device provided according to an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that references to "a", "an", and "the" modifications in this disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
Referring to fig. 1, a flow of one embodiment of an information processing method according to the present disclosure is shown. The information processing method as shown in fig. 1 includes the steps of:
step 101, at least two problems to be clustered are led into a clustering model to obtain at least one target cluster center.
In this embodiment, an executing subject (e.g., a server) of the information processing method may import at least two problems to be clustered into the clustering model, and obtain at least one target cluster center.
Here, the number of problems to be clustered may be at least two. The problem to be clustered may be textual information.
Here, the fields related to the problem to be clustered may be various fields, and are not limited herein.
In this embodiment, each of the at least one target cluster center may be used to indicate a cluster, which may also be understood as indicating a problem type of the problems to be clustered. Problems to be clustered that belong to the same cluster can be understood as belonging to the same type of problem.
It can be understood that the problem types corresponding to the class clusters exist objectively; however, the name of the problem type corresponding to the class cluster may be determined prior to the occurrence of the class cluster, or may be determined after the class cluster is determined.
In some application scenarios, after the problems to be clustered are clustered to obtain clusters of the same type of problems, the problems in the clusters can be further analyzed to obtain corresponding analysis results.
By way of example, the problems in a cluster can be analyzed to find problems that do not appear in the existing problem set, thereby enabling the mining of new problems.
As an example, the problem types corresponding to the various class clusters may be analyzed to find a problem type that does not appear in the included problem set. In other words, it is possible that the problem types involved in the entire class cluster are not previously included, and thus, mining for new problem types can be implemented.
In some embodiments, the clustering model may include a feature extraction sub-model. The feature extraction sub-model can generate feature vectors corresponding to the problems to be clustered, and the feature vectors are used for clustering to determine the center of the target cluster.
In some embodiments, the feature extraction submodel is obtained based on a feature extraction layer of a pre-trained classification model.
In some embodiments, the classification model may be trained in advance. The classification model may include a feature extraction layer and a classification layer. Then, the feature extraction layer of the trained classification model can be used as a feature extraction submodel.
It should be noted that the feature extraction layer in the classification model has the capability of extracting type features; that is, it can enlarge the differences between problems to be clustered of different types and reduce the differences between problems to be clustered of the same type.
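As a rough illustration of this structure, the following PyTorch sketch defines a classification network whose feature extraction layer can later be detached and reused as the feature extraction submodel. The class and attribute names (TextClassifier, encoder) are illustrative assumptions, not identifiers from the patent.

```python
import torch.nn as nn

class TextClassifier(nn.Module):
    """Classification network: a feature extraction layer followed by a classification layer."""
    def __init__(self, encoder: nn.Module, hidden_dim: int, num_types: int):
        super().__init__()
        self.encoder = encoder                               # feature extraction layer (e.g. a BERT-style encoder)
        self.classifier = nn.Linear(hidden_dim, num_types)   # classification layer mapping features to types

    def forward(self, x):
        features = self.encoder(x)        # assumed to return one sentence-level feature vector per sample
        return self.classifier(features)

# After training on labeled problem types, only the feature extraction layer is kept:
# feature_extraction_submodel = trained_classifier.encoder
```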
Step 102, determining the at least two problems to be clustered as at least one cluster based on the at least one target cluster center.
In this embodiment, the executing entity may determine the at least two problems to be clustered as at least one cluster according to the at least one target cluster center determined in step 101.
Here, the cluster center to which each problem to be clustered belongs may be determined by computing the distance between the problem to be clustered and each target cluster center. The problems to be clustered assigned to each target cluster center can then be taken as one cluster, so that the at least two problems to be clustered are divided into at least one cluster.
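A minimal sketch of this assignment step, assuming Euclidean distance and nearest-center assignment (PyTorch is used only for illustration):

```python
import torch

def assign_clusters(features: torch.Tensor, target_centers: torch.Tensor) -> torch.Tensor:
    """Assign each problem's feature vector to the nearest target cluster center;
    problems sharing a center form one cluster."""
    distances = torch.cdist(features, target_centers)   # shape (num_problems, num_centers)
    return distances.argmin(dim=1)                       # cluster index for each problem to be clustered
```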
It should be noted that, in the information processing method provided in this embodiment, the problem to be clustered is introduced into the clustering model, and then at least one target cluster center is obtained; and then determining the at least two problems to be clustered as at least one cluster according to the center of the target cluster. Therefore, a new clustering mode can be provided, and the clustering speed and the clustering accuracy for problems are improved.
It should be noted that, in some application scenarios, the feature extraction submodel in the clustering model for determining the center of the target cluster has the capability of extracting the type features, so that the feature vector serving as the clustering basis has a better type representation capability, thereby improving the clustering efficiency and reducing the time consumed by clustering; also, the accuracy of clustering can be improved.
In some embodiments, the classification model may be obtained by the first step. Here, the first step may be realized by a flow shown in fig. 2A. The flow shown in fig. 2A may include step 201 and step 202.
Step 201, a training sample is obtained.
Here, the training sample may have a label, and the label may indicate a text content type. The text content type may relate to various fields, and is not limited herein.
Step 202, training the classification network to be trained based on the training samples and the corresponding labels to obtain the classification model.
Here, the classification network to be trained may include a feature extraction layer to be trained and a classification layer to be trained. The feature extraction layer in the classification model can be obtained by training the feature extraction layer to be trained.
Here, the specific structure of the feature extraction layer to be trained may be set according to the actual application scenario, and is not limited herein. As an example, the feature extraction layer to be trained may include a convolutional neural network. As an example, the feature extraction layer to be trained may adopt a BERT (Bidirectional Encoder Representations from Transformers) structure.
Here, the specific structure of the classification layer to be trained may be set according to the actual application scenario, and is not limited herein. As an example, the classification layer to be trained may include a pooling layer and a fully connected layer; the fully connected layer is used to map features to types.
Here, the training samples may be imported into the classification network to be trained to obtain classification results. The classification results are then compared with the labels corresponding to the training samples to determine a loss value. The loss value can then be used for back propagation to adjust the parameters of the classification network to be trained. Through multiple iterations, until a stop condition is met, the trained classification network is determined as the classification model.
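A minimal PyTorch sketch of this training loop; the data loader, optimizer choice, learning rate, and epoch-based stop condition are illustrative assumptions rather than details given in the patent.

```python
import torch
import torch.nn.functional as F

def train_classifier(model, data_loader, num_epochs=3, lr=2e-5):
    """Train the classification network; its encoder later serves as the feature extraction submodel."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(num_epochs):                       # iterate until the stop condition (here: an epoch budget)
        for samples, labels in data_loader:           # training samples and their type labels
            logits = model(samples)                   # import samples into the network -> classification results
            loss = F.cross_entropy(logits, labels)    # compare results with labels to get a loss value
            optimizer.zero_grad()
            loss.backward()                           # back propagation
            optimizer.step()                          # adjust the parameters of the network
    return model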
Please refer to fig. 2B, which illustrates a schematic diagram of an exemplary application scenario according to an embodiment of the present application.
Firstly, in the training stage of the classification model, a training sample classification task can be adopted to train a pre-established classification network to be trained to obtain the classification model. The classification network to be trained may include a feature extraction layer to be trained and a classification layer to be trained.
Then, the trained feature extraction layer can be taken out from the classification model to be used as a feature extraction sub-model in the clustering model.
Then, in the clustering stage, the problem to be clustered can be led into a clustering model to obtain the center of the target cluster. The clustering model may include a feature extraction sub-model and a clustering sub-model.
Finally, in the inference stage, the distance between each problem to be clustered and each target cluster center can be determined, so as to determine the target cluster center to which the problem to be clustered belongs. The problems to be clustered assigned to each target cluster center can then be taken as one cluster, so that the at least two problems to be clustered are divided into at least one cluster.
In some embodiments, step 202 described above may include the steps shown in FIG. 3. The steps shown in fig. 3 may include step 301, step 302, step 303 and step 304.
Step 301, at least two training samples are led into a classification network to be trained, and prediction types corresponding to the at least two training samples are obtained.
Here, the labels of the at least two training samples are different from each other.
Step 302, determining a single sample loss value of each training sample according to the prediction type and the label of each training sample.
Here, various loss calculation methods may be employed to determine a single sample loss value.
As an example, a single sample loss value may be determined using a cross entropy loss function.
Step 303, determining a total loss value of the samples according to the determined loss value of the single samples.
Here, the individual sample loss values may be combined in various ways to determine a sample total loss value. As an example, the determined individual sample loss values may be added, and the resulting sum may be taken as the sample total loss value.
Step 304, adjusting parameters of the classification network to be trained based on the total sample loss value.
In this case, the total loss value of the sample may be used for back propagation to adjust the parameters of the classification network to be trained.
In some application scenarios, two problem sample sets of different types may be taken as the training set for the classification model. One training sample is drawn from each problem sample set to form a pair of training samples. Each training sample is then vectorized with BERT, and a pooling operation is applied to the vectorized output to obtain a sentence-level overall representation. The pooled output is mapped to the type dimensions through a linear layer for classification, yielding classification results. A single sample loss value is computed from each classification result and the corresponding label, the single sample loss values are added to obtain a total sample loss value, back propagation is performed with the total sample loss value, and the parameters of BERT are updated.
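A hedged sketch of one such training step. The encoder is treated as a black box that returns per-token embeddings for one sample (a stand-in for BERT), mean pooling is assumed as the pooling operation, and the helper names are illustrative.

```python
import torch
import torch.nn.functional as F

def train_step(encoder, linear, optimizer, sample_a, label_a, sample_b, label_b):
    """One step with a pair of samples drawn from two differently-labeled problem sample sets.
    Labels are assumed to be 0-dim long tensors."""
    losses = []
    for tokens, label in ((sample_a, label_a), (sample_b, label_b)):
        token_vecs = encoder(tokens)              # BERT-style vectorization, assumed shape (seq_len, hidden)
        sentence_vec = token_vecs.mean(dim=0)     # pooling -> sentence-level overall representation
        logits = linear(sentence_vec)             # map to type dimensions for classification
        losses.append(F.cross_entropy(logits.unsqueeze(0), label.unsqueeze(0)))  # single sample loss
    total_loss = sum(losses)                      # total sample loss = sum of single sample losses
    optimizer.zero_grad()
    total_loss.backward()                         # back propagation with the total loss
    optimizer.step()                              # update encoder (BERT) and linear-layer parameters
    return total_loss.item()
```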
It should be noted that training the classification network to be trained with at least two training samples at a time, where the labels of the samples input together differ from each other, gives the network better generalization during training; that is, it acquires more accurate feature extraction capability across the various types of training samples. In contrast, if the parameters of the classification network to be trained are adjusted with a single sample loss value, it may be difficult for the network to take the various types of problem samples into account. For example, after the network is adjusted on the problem sample set of type A and its parameters are then updated on the problem sample set of type B, the updated network's ability to characterize problem samples of type A may deteriorate.
In some embodiments, step 101 may include steps in the flowchart shown in fig. 4. The flow shown in fig. 4 may include step 401 and step 402.
Step 401, importing the problem to be clustered into a feature extraction submodel to obtain a first feature vector.
Step 402, updating the initial cluster center based on a back propagation algorithm and the first feature vector to obtain the at least one target cluster center.
Here, the initial cluster center may be determined in a randomly set manner.
It should be noted that the initial cluster center is updated through the first feature vector and the back propagation algorithm, which can be understood as determining the target cluster center in a deep learning manner, so that the accuracy of determining the target cluster center can be improved.
In some embodiments, the initial cluster center may be obtained by clustering the first feature vector by using a mean clustering algorithm.
Here, the mean clustering algorithm may include a K-means clustering (K-means) algorithm. The principle of the K-means algorithm is briefly described as follows: firstly, randomly selecting K objects as initial clustering centers. The distance between each object and the respective seed cluster center is then calculated, and each object is assigned to the cluster center closest to it. The cluster centers and the objects assigned to them represent a cluster. Once all objects are assigned, the cluster center for each cluster is recalculated based on the objects existing in the cluster. This process will be repeated until some termination condition is met.
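A minimal sketch of this initialization, assuming scikit-learn's KMeans is available; the choice of K and the other settings are placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans

def init_cluster_centers(first_feature_vectors: np.ndarray, k: int) -> np.ndarray:
    """Cluster the first feature vectors with K-means and return the resulting centers
    as the initial cluster centers for the subsequent back-propagation-based refinement."""
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
    kmeans.fit(first_feature_vectors)      # one row per problem to be clustered
    return kmeans.cluster_centers_         # shape (k, feature_dim)
```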
It should be noted that the initial cluster center is generated by the mean clustering algorithm, so that the initial cluster center is more suitable for the actual scene of the current clustering, the accuracy of the initial cluster center is improved, and the time and the calculation amount for obtaining the target cluster center based on the initial cluster center can be reduced.
In some embodiments, step 402 may include the steps shown in FIG. 5. The steps shown in fig. 5 may include step 501, step 502, and step 503.
Step 501, determining an initial cluster center as a first candidate cluster center.
Step 502, based on the first candidate cluster center, performing the following first iteration step: determining a first probability value of the problem to be clustered belonging to each first candidate cluster center based on the first candidate cluster center and the first feature vector; strengthening each first probability value to obtain a first strengthened value; generating a first loss value according to the first enhancement value and the first probability value; and in response to determining that the first stopping condition is met, determining the first candidate cluster center as the target cluster center and outputting.
Here, the first candidate cluster centers may be continuously updated. The first candidate cluster centers may differ each time the iteration step is performed. The number of first candidate cluster centers may be at least one, that is, one or at least two.
Here, determining, based on the first candidate cluster centers and the first feature vectors, the first probability value that a problem to be clustered belongs to each first candidate cluster center may be implemented in various ways, which are not limited herein.
As an example, for each first feature vector, the distance between the feature vector and each first candidate cluster center is calculated; the ratio of the distance between the first feature vector and a given first candidate cluster center to the sum of its distances to all first candidate cluster centers is then computed and determined as the first probability value.
As an example, for each first feature vector, the square of its distance to each first candidate cluster center is calculated and 1 is added to obtain a first sum, so that each first candidate cluster center corresponds to one first sum; the ratio of the first sum corresponding to a given first candidate cluster center to the total of the first sums is then computed and determined as the first probability value.
Here, the strengthening process is used to widen the gap between the first probability values. In other words, the strengthening process can emphasize the more confident components among the first probability values.
Here, the specific manner of the enhancement processing may be set according to an actual application scenario, and is not limited herein.
As an example, the first enhancement value may be determined by raising the first probability value to a power (for example, squaring it) and taking the ratio of this value to the sum of the corresponding powers of all the first probability values.
Here, the generation of the first loss value based on the first reinforcement value and the first probability value may be implemented in various ways, and is not limited herein.
As an example, a logarithm may be taken of a ratio of the first reinforcement value and the first probability value as the first loss value.
It should be noted that, as the first probability values move toward the first enhancement values, the first loss becomes smaller and smaller and approaches convergence (e.g., converges to a constant). Thus, as the first iteration step progresses, iteration of the cluster centers can be achieved.
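One way to realize these examples is sketched below in PyTorch. A Student's-t style kernel and a KL-divergence-style loss are assumed (similar in spirit to deep embedded clustering); the patent's own examples describe the probabilities, enhancement values, and loss only in terms of distances, squares, and ratios, so the exact formulas here are illustrative assumptions.

```python
import torch

def soft_assign(features: torch.Tensor, centers: torch.Tensor) -> torch.Tensor:
    """First probability values: soft assignment of each feature vector to each candidate cluster center,
    assuming a 1 / (1 + squared distance) kernel normalized over the candidate centers."""
    d2 = torch.cdist(features, centers).pow(2)      # squared distances, shape (n, k)
    q = 1.0 / (1.0 + d2)
    return q / q.sum(dim=1, keepdim=True)

def sharpen(q: torch.Tensor) -> torch.Tensor:
    """First enhancement values: squaring widens the gap between probabilities,
    emphasizing the more confident assignments."""
    p = q.pow(2)
    return p / p.sum(dim=1, keepdim=True)

def clustering_loss(q: torch.Tensor) -> torch.Tensor:
    """First loss value: a KL-divergence-style loss built from log-ratios of the
    enhancement values and the probability values."""
    p = sharpen(q).detach()                          # treat the sharpened target as fixed
    return (p * (p.log() - q.log())).sum(dim=1).mean()
```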
Step 503, in response to determining that the first stopping condition is not satisfied, performing back propagation based on the generated first loss value, adjusting the first candidate cluster center to obtain a new first candidate cluster center, and skipping to execute the first iteration step.
Here, the first stop condition may be set according to the actual application scenario. As an example, the first stop condition may include, but is not limited to, at least one of the following: the number of iterations is not less than a preset count threshold; the first loss value is not greater than a preset loss threshold.
Here, the back propagation may be performed based on the first loss value, and the value of the first candidate cluster center may be adjusted to obtain a new first candidate cluster center. Then, jumping to the first iteration step, and continuing to execute the first iteration step (the first candidate cluster center based on which the first iteration step is executed is different relative to the first iteration step in the previous round).
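A hedged sketch of this first iteration step: the cluster centers are treated as learnable parameters and adjusted by back propagation while the feature extraction submodel stays frozen. It reuses the soft_assign and clustering_loss sketches above; the optimizer, learning rate, and fixed iteration budget stand in for the first stop condition and are assumptions.

```python
import torch

def refine_centers(features: torch.Tensor, initial_centers: torch.Tensor,
                   max_iters: int = 100, lr: float = 1e-2) -> torch.Tensor:
    """Update the first candidate cluster centers by back propagation; the first feature vectors
    (and the feature extraction submodel that produced them) are not updated."""
    centers = torch.nn.Parameter(initial_centers.clone())
    optimizer = torch.optim.Adam([centers], lr=lr)    # only the cluster centers are optimized
    for _ in range(max_iters):                        # stand-in for the first stop condition
        q = soft_assign(features, centers)            # first probability values
        loss = clustering_loss(q)                     # first loss value
        optimizer.zero_grad()
        loss.backward()                               # back propagation adjusts the centers
        optimizer.step()
    return centers.detach()                           # target cluster centers
```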
In some embodiments, step 102 may include: and determining the cluster to which the problem to be clustered belongs according to the first feature vector and the center of the target cluster.
Here, the distance between the first feature vector and each target cluster center may be determined, and the target cluster center at the smallest distance is determined as the target cluster center to which the problem to be clustered belongs. It can be understood that, after a target cluster center is determined for each problem to be clustered, each target cluster center may have its own set of problems to be clustered; that is, the at least two problems to be clustered are divided into clusters.
It should be noted that, the iteration step is performed by updating the first candidate cluster center without updating the feature extraction submodel to continuously determine the new cluster center, so that on one hand, the feature characterization capability of the pre-trained feature extraction submodel can be retained, on the other hand, the calculation amount can be reduced, and the calculation speed can be increased.
In some embodiments, step 402 may include the steps shown in FIG. 6. The steps shown in fig. 6 may include step 601, step 602, and step 603.
Step 601, determining the initial cluster center as a second candidate cluster center, and determining the first feature vector as a second feature vector.
Here, the first feature vector is a first feature vector generated by the initial feature extraction submodel.
Here, the second feature vector may be understood as a name for distinguishing from the first feature vector, and does not change a specific value representing the first feature vector.
Step 602, based on the second candidate cluster center and the second feature vector, performing the following second iteration step: determining a second probability value of the problem to be clustered belonging to each second candidate cluster center based on the second candidate cluster centers and the second feature vectors; performing strengthening treatment on each second probability value to obtain a second strengthened value; generating a second loss value according to the second strengthening value and the second probability value; in response to determining that the second stop condition is satisfied, determining and outputting a second candidate cluster center as a target cluster center, and determining the feature extraction submodel as an adjusted feature extraction submodel.
Here, the second candidate cluster centers may be continuously updated. The second candidate cluster centers may differ each time the iteration step is performed. The number of second candidate cluster centers may be at least one, that is, one or at least two.
Here, determining, based on the second candidate cluster centers and the second feature vectors, the second probability value that a problem to be clustered belongs to each second candidate cluster center may be implemented in various ways, which are not limited herein.
As an example, for each second feature vector, the distance between the feature vector and each second candidate cluster center is calculated; the ratio of the distance between the second feature vector and a given second candidate cluster center to the sum of its distances to all second candidate cluster centers is then computed and determined as the second probability value.
As an example, for each second feature vector, the square of its distance to each second candidate cluster center is calculated and 1 is added to obtain a second sum, so that each second candidate cluster center corresponds to one second sum; the ratio of the second sum corresponding to a given second candidate cluster center to the total of the second sums is then computed and determined as the second probability value.
Here, the strengthening process is used to widen the gap between the second probability values. In other words, the strengthening process can emphasize the more confident components among the second probability values.
Here, the specific manner of the enhancement processing may be set according to an actual application scenario, and is not limited herein.
As an example, the second enhancement value may be determined by squaring the second probability value and taking the ratio of this square to the sum of the squares of all the second probability values.
Here, the generating of the second loss value according to the second reinforcement value and the second probability value may be implemented in various ways, and is not limited herein.
As an example, a logarithm may be taken of a ratio of the second reinforcement value and the second probability value as the second loss value.
It should be noted that, as the second probability values move toward the second enhancement values, the second loss approaches zero more and more. Thus, performing the second iteration step achieves iteration of the clustering.
Step 603, in response to determining that the second stopping condition is not satisfied, adjusting a second candidate cluster center based on the generated second loss value to obtain a new second candidate cluster center, adjusting a parameter of the feature extraction submodel based on the generated second loss value, importing the problem to be clustered into the adjusted feature extraction submodel to obtain a new second feature vector, and skipping to execute the second iteration step.
Here, the second stop condition may be set according to the actual application scenario. As an example, the second stop condition may include, but is not limited to, at least one of the following: the number of iterations is not less than a preset count threshold; the second loss value is not greater than a preset loss threshold.
Here, the back propagation may be performed based on the second loss value, and the value of the second candidate cluster center may be adjusted to obtain a new second candidate cluster center. Then, jump to the second iteration step, and continue to execute the second iteration step (the second candidate cluster center based on which the second iteration step is executed this time is different from the second iteration step in the previous round).
Here, the parameters of the feature extraction submodel may be adjusted based on the second loss value. In other words, as the iteration steps progress, the feature extraction submodel is also continuously updated. For the second feature vectors in each second iteration step, the updated feature extraction submodel can be used to obtain new feature vectors for the problems to be clustered.
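A sketch of this second iteration step, reusing the soft_assign and clustering_loss sketches above; here both the second candidate cluster centers and the feature extraction submodel are optimized. The encoder interface, learning rate, and fixed iteration budget are assumptions.

```python
import torch

def refine_centers_and_encoder(encoder, problems, initial_centers,
                               max_iters: int = 100, lr: float = 1e-4):
    """Jointly update the second candidate cluster centers and the feature extraction submodel;
    each iteration recomputes the second feature vectors with the adjusted submodel."""
    centers = torch.nn.Parameter(initial_centers.clone())
    optimizer = torch.optim.Adam(list(encoder.parameters()) + [centers], lr=lr)
    for _ in range(max_iters):                   # stand-in for the second stop condition
        features = encoder(problems)             # new second feature vectors from the adjusted submodel
        q = soft_assign(features, centers)       # second probability values
        loss = clustering_loss(q)                # second loss value
        optimizer.zero_grad()
        loss.backward()                          # back propagation adjusts both centers and encoder
        optimizer.step()
    return centers.detach(), encoder             # target cluster centers and adjusted submodel
```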
In some embodiments, the step 102 may include: importing the problem to be clustered into the adjusted feature extraction sub-model to obtain a third feature vector; and determining the cluster to which the problem to be clustered belongs according to the third feature vector and the center of the target cluster.
Here, since in step 603 back propagation based on the second loss value adjusts the parameters of the feature extraction submodel, the feature extraction submodel is continuously updated. As the updating proceeds, the feature extraction submodel used in each second iteration step is the latest one retained after updating; therefore, importing the problems to be clustered into "the adjusted feature extraction submodel" can be understood by those skilled in the art as using the latest feature extraction submodel retained after updating. The resulting third feature vector can therefore express the type features of the problems to be clustered more accurately.
Here, the distance between the third feature vector and each target cluster center may be determined, and the target cluster center at the smallest distance is determined as the target cluster center to which the problem to be clustered belongs. It can be understood that, after a target cluster center is determined for each problem to be clustered, each target cluster center may have its own set of problems to be clustered; that is, the at least two problems to be clustered are partitioned, thereby determining at least one cluster.
It should be noted that the iteration step is performed by updating the second candidate cluster center and updating the feature extraction submodel to continuously determine the new cluster center and the second feature vector, so that the feature characterization capability of the feature extraction submodel can be further improved, and the clustering accuracy is improved.
With further reference to fig. 7, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an information processing apparatus, which corresponds to the embodiment of the method shown in fig. 1, and which is particularly applicable in various electronic devices.
As shown in fig. 7, the information processing apparatus of the present embodiment includes: a generating unit 701 and a determining unit 702. The generating unit is configured to import at least two problems to be clustered into the clustering model to obtain at least one target cluster center, wherein a target cluster center indicates a cluster; the determining unit is configured to determine the at least two problems to be clustered as at least one cluster based on the at least one target cluster center.
In this embodiment, for the specific processing of the generating unit 701 and the determining unit 702 of the information processing apparatus and the technical effects thereof, reference may be made to the descriptions of step 101 and step 102 in the embodiment corresponding to fig. 1, which are not repeated here.
In some embodiments, the clustering model includes a feature extraction sub-model, where the feature extraction sub-model is obtained based on a feature extraction layer of a pre-trained classification model, the feature extraction sub-model is used to generate a feature vector corresponding to a problem to be clustered, and the feature vector is used for clustering to determine a target cluster center. In some embodiments, the classification model is obtained by a first step, wherein the first step comprises: acquiring a training sample, wherein a label of the training sample indicates a text content type; and training a classification network to be trained based on the training samples and the corresponding labels to obtain the classification model, wherein the classification network to be trained comprises a feature extraction layer to be trained and a classification layer, and the feature extraction layer in the classification model is obtained by training the feature extraction layer to be trained.
In some embodiments, the training a classification network to be trained based on the training samples and the corresponding labels to obtain the classification model includes: importing at least two training samples into a classification network to be trained to obtain prediction types corresponding to the at least two training samples, wherein labels of the at least two training samples are different; determining a single sample loss value of each training sample according to the prediction type and the label of each training sample; determining a total loss value of the samples according to the determined loss value of the single samples; and adjusting parameters of the classification network to be trained based on the sample total loss value.
In some embodiments, the importing at least two problems to be clustered into a clustering model to obtain at least one target cluster center includes: importing the problem to be clustered into the feature extraction submodel to obtain a first feature vector; and updating the initial cluster center based on a back propagation algorithm and the first feature vector to obtain the at least one target cluster center.
In some embodiments, the initial cluster-like center is obtained by clustering the first feature vector by using a mean clustering algorithm.
In some embodiments, the updating the initial cluster center based on the back propagation algorithm and the first feature vector to obtain the at least one target cluster center includes: determining the initial cluster center as a first candidate cluster center; based on the first candidate cluster center, performing the following first iteration step: determining a first probability value of the problem to be clustered belonging to each first candidate cluster center based on the first candidate cluster centers and the first feature vectors; strengthening each first probability value to obtain a first enhancement value; generating a first loss value according to the first enhancement value and the first probability value; in response to determining that the first stopping condition is satisfied, determining and outputting the first candidate cluster center as a target cluster center; and in response to determining that the first stopping condition is not satisfied, performing back propagation based on the generated first loss value, adjusting the first candidate cluster center to obtain a new first candidate cluster center, and jumping back to the first iteration step.
In some embodiments, the determining the at least two questions to be clustered as at least one class cluster based on the at least one target class cluster center includes: and determining the cluster to which the problem to be clustered belongs according to the first feature vector and the center of the target cluster.
In some embodiments, the updating the initial cluster center based on the back propagation algorithm and the first feature vector to obtain the at least one target cluster center includes: determining the initial cluster center as a second candidate cluster center, and determining the first feature vector as a second feature vector; based on the second candidate cluster center and the second feature vector, performing the following second iteration step: determining a second probability value of the problem to be clustered belonging to each second candidate cluster center based on the second candidate cluster centers and the second feature vectors; performing strengthening treatment on each second probability value to obtain a second strengthened value; generating a second loss value according to the second strengthening value and the second probability value; in response to determining that the second stopping condition is satisfied, determining and outputting a second candidate cluster center as a target cluster center; and in response to the fact that the second stopping condition is not met, performing back propagation based on the generated second loss value, adjusting the center of the second candidate cluster to obtain a new center of the second candidate cluster, performing back propagation based on the generated second loss value to adjust the parameters of the feature extraction submodel, introducing the problem to be clustered into the adjusted feature extraction submodel to obtain a new second feature vector, and skipping to execute a second iteration step.
In some embodiments, the determining the at least two questions to be clustered as at least one class cluster based on the at least one target class cluster center includes: importing the problem to be clustered into the adjusted feature extraction sub-model to obtain a third feature vector; and determining the cluster to which the problem to be clustered belongs according to the third feature vector and the center of the target cluster.
Referring to fig. 8, fig. 8 illustrates an exemplary system architecture to which the information processing method of one embodiment of the present disclosure may be applied.
As shown in fig. 8, the system architecture may include terminal devices 801, 802, 803, a network 804, and a server 805. The network 804 serves to provide a medium for communication links between the terminal devices 801, 802, 803 and the server 805. Network 804 may include various types of connections, such as wire, wireless communication links, or fiber optic cables, to name a few.
The terminal devices 801, 802, 803 may interact with a server 805 over a network 804 to receive or send messages or the like. The terminal devices 801, 802, 803 may have various client applications installed thereon, such as a web browser application, a search-type application, and a news-information-type application. The client application in the terminal device 801, 802, 803 may receive the instruction of the user, and complete the corresponding function according to the instruction of the user, for example, add the corresponding information in the information according to the instruction of the user.
The terminal devices 801, 802, 803 may be hardware or software. When the terminal devices 801, 802, 803 are hardware, they may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop portable computers, desktop computers, and the like. When the terminal devices 801, 802, 803 are software, they can be installed in the electronic devices listed above. They may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services) or as a single piece of software or software module. This is not particularly limited herein.
The server 805 may be a server providing various services, for example, receiving an information acquisition request sent by the terminal devices 801, 802, and 803, and acquiring presentation information corresponding to the information acquisition request in various ways according to the information acquisition request. And the relevant data of the presentation information is sent to the terminal devices 801, 802, 803.
It should be noted that the information processing method provided by the embodiment of the present disclosure may be executed by a terminal device, and accordingly, the information processing apparatus may be provided in the terminal devices 801, 802, and 803. Furthermore, the information processing method provided by the embodiment of the present disclosure may also be executed by the server 805, and accordingly, an information processing apparatus may be provided in the server 805.
It should be understood that the number of terminal devices, networks, and servers in fig. 8 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to fig. 9, shown is a schematic diagram of an electronic device (e.g., a terminal device or a server of fig. 8) suitable for use in implementing embodiments of the present disclosure. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 9 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 9, the electronic device may include a processing means (e.g., a central processing unit, a graphic processor, etc.) 901, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)902 or a program loaded from a storage means 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data necessary for the operation of the electronic apparatus 900 are also stored. The processing apparatus 901, the ROM 902, and the RAM 903 are connected to each other through a bus 904. An input/output (I/O) interface 905 is also connected to bus 904.
Generally, the following devices may be connected to the I/O interface 905: input devices 906 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, and the like; output devices 907 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, and the like; storage devices 908 including, for example, magnetic tape, hard disk, and the like; and a communication device 909. The communication device 909 may allow the electronic device to perform wireless or wired communication with other devices to exchange data. While fig. 9 illustrates an electronic device having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device 909, or installed from the storage device 908, or installed from the ROM 902. The computer program performs the above-described functions defined in the methods of the embodiments of the present disclosure when executed by the processing apparatus 901.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may interconnect with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: importing at least two problems to be clustered into a clustering model to obtain at least one target cluster center, wherein the target cluster center indicates a cluster; and determining the at least two problems to be clustered as at least one cluster based on the at least one target cluster center.
Computer program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including but not limited to object oriented programming languages such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The name of a unit does not, in some cases, constitute a limitation on the unit itself; for example, the generation unit may also be described as a "unit that generates a target cluster center".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, the clustering model includes a feature extraction submodel, where the feature extraction submodel is used to generate a feature vector corresponding to a problem to be clustered, and the feature vector is used for clustering to determine a target cluster center.
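By way of a hedged illustration only, the sketch below shows one way such a feature extraction submodel could be realized in PyTorch: a small encoder that maps tokenized problems to be clustered to fixed-length feature vectors. The vocabulary size, embedding width, mean pooling, and output dimension are assumptions introduced for this example and are not prescribed by the present disclosure.

import torch
import torch.nn as nn

class FeatureExtractionSubmodel(nn.Module):
    """Hypothetical encoder: token ids of a problem to be clustered -> feature vector."""
    def __init__(self, vocab_size=30000, embed_dim=128, feat_dim=64):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.proj = nn.Linear(embed_dim, feat_dim)

    def forward(self, token_ids):                 # token_ids: (batch, seq_len)
        embedded = self.embedding(token_ids)      # (batch, seq_len, embed_dim)
        pooled = embedded.mean(dim=1)             # simple mean pooling over tokens
        return self.proj(pooled)                  # (batch, feat_dim) feature vectors

# Two problems to be clustered, already tokenized to (made-up) ids.
problems = torch.tensor([[12, 48, 7, 0], [93, 5, 21, 6]])
encoder = FeatureExtractionSubmodel()
feature_vectors = encoder(problems)               # used downstream to determine cluster centers
print(feature_vectors.shape)                      # torch.Size([2, 64])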
According to one or more embodiments of the present disclosure, the feature extraction submodel is obtained based on a feature extraction layer of a pre-trained classification model.
According to one or more embodiments of the present disclosure, the classification model is obtained by a first step, wherein the first step includes: acquiring a training sample, wherein a label of the training sample indicates a text content type; and training a classification network to be trained based on the training samples and the corresponding labels to obtain the classification model, wherein the classification network to be trained comprises a feature extraction layer to be trained and a classification layer, and the feature extraction layer in the classification model is obtained by training the feature extraction layer to be trained.
According to one or more embodiments of the present disclosure, training the classification network to be trained based on the training samples and the corresponding labels to obtain the classification model includes: importing at least two training samples into the classification network to be trained to obtain prediction types corresponding to the at least two training samples, wherein the labels of the at least two training samples are different; determining a single-sample loss value for each training sample according to the prediction type and the label of that training sample; determining a total sample loss value according to the determined single-sample loss values; and adjusting parameters of the classification network to be trained based on the total sample loss value.
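A minimal sketch of this pre-training loop is given below, assuming a PyTorch network composed of a feature extraction layer to be trained plus a classification layer, cross-entropy as the single-sample loss, the mean as the total sample loss, and SGD for the parameter adjustment; these concrete choices are illustrative assumptions rather than requirements of the present disclosure.

import torch
import torch.nn as nn

# Classification network to be trained = feature extraction layer + classification layer.
feature_layer = nn.Sequential(
    nn.Embedding(30000, 128),      # token ids -> embeddings
    nn.Flatten(start_dim=1),       # (batch, seq_len * 128), assuming seq_len = 4
    nn.Linear(4 * 128, 64),        # feature layer output (feature vectors)
)
classification_layer = nn.Linear(64, 3)            # 3 text content types, for illustration
network = nn.Sequential(feature_layer, classification_layer)

optimizer = torch.optim.SGD(network.parameters(), lr=0.1)
single_sample_loss = nn.CrossEntropyLoss(reduction="none")

# At least two training samples whose labels (text content types) differ.
samples = torch.tensor([[12, 48, 7, 0], [93, 5, 21, 6]])
labels = torch.tensor([0, 2])

predictions = network(samples)                            # prediction type scores for each sample
single_losses = single_sample_loss(predictions, labels)   # one loss value per training sample
total_sample_loss = single_losses.mean()                  # total sample loss value
total_sample_loss.backward()                              # back propagation
optimizer.step()                                          # adjust the network parameters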
According to one or more embodiments of the present disclosure, the importing at least two problems to be clustered into a clustering model to obtain at least one target cluster center includes: importing the problem to be clustered into the feature extraction submodel to obtain a first feature vector; and updating the initial cluster center based on a back propagation algorithm and the first feature vector to obtain the at least one target cluster center.
According to one or more embodiments of the present disclosure, the initial cluster center is obtained by clustering the first feature vector using a mean clustering algorithm.
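As a sketch of this mean clustering initialization, scikit-learn's KMeans can cluster the first feature vectors and the resulting centroids can be taken as the initial cluster centers; the use of scikit-learn, the number of clusters, and the random stand-in features below are assumptions made for illustration.

import numpy as np
from sklearn.cluster import KMeans

# Stand-in for the first feature vectors produced by the feature extraction submodel.
first_feature_vectors = np.random.rand(20, 64).astype(np.float32)

# Mean clustering (k-means) of the first feature vectors yields the initial cluster centers.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(first_feature_vectors)
initial_cluster_centers = kmeans.cluster_centers_   # shape (2, 64)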
According to one or more embodiments of the present disclosure, the updating an initial cluster center based on a back propagation algorithm and a first feature vector to obtain the at least one target cluster center includes: determining the initial cluster center as a first candidate cluster center; based on the first candidate cluster center, performing the following first iteration step: determining a first probability value of the problem to be clustered belonging to each first candidate cluster center based on the first candidate cluster center and the first feature vector; strengthening each first probability value to obtain a first strengthened value; generating a first loss value according to the first strengthened value and the first probability value; in response to determining that a first stopping condition is satisfied, determining and outputting the first candidate cluster center as a target cluster center; and in response to determining that the first stopping condition is not satisfied, performing back propagation based on the generated first loss value, adjusting the first candidate cluster center to obtain a new first candidate cluster center, and returning to execute the first iteration step.
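One way to read this first iteration step is as a soft-assignment refinement of the cluster centers alone, in the spirit of deep embedded clustering: compute the probability that each problem to be clustered belongs to each candidate center, strengthen (sharpen) those probabilities, measure the divergence between the strengthened and the original distributions as the loss, and back-propagate into the centers. The Student's t similarity, squared-probability sharpening, KL-divergence loss, and fixed iteration budget in the sketch below are assumptions made for illustration; the disclosure does not commit to these particular formulas.

import torch

features = torch.randn(20, 64)                        # first feature vectors (held fixed here)
centers = torch.nn.Parameter(torch.randn(2, 64))      # first candidate cluster centers
optimizer = torch.optim.SGD([centers], lr=0.01)

def soft_assignment(feats, cents):
    """First probability values: Student's t similarity, normalized over centers."""
    dist_sq = torch.cdist(feats, cents) ** 2
    q = 1.0 / (1.0 + dist_sq)
    return q / q.sum(dim=1, keepdim=True)

def strengthen(q):
    """First strengthened values: square and renormalize to emphasize confident assignments."""
    weight = q ** 2 / q.sum(dim=0)
    return weight / weight.sum(dim=1, keepdim=True)

for step in range(50):                                # first stopping condition: iteration budget
    q = soft_assignment(features, centers)
    p = strengthen(q).detach()
    first_loss = torch.nn.functional.kl_div(q.log(), p, reduction="batchmean")
    optimizer.zero_grad()
    first_loss.backward()                             # back propagation adjusts only the centers
    optimizer.step()

target_cluster_centers = centers.detach()             # output once the stopping condition is met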
According to one or more embodiments of the present disclosure, the determining the at least two problems to be clustered as at least one cluster based on the at least one target cluster center includes: determining the cluster to which the problem to be clustered belongs according to the first feature vector and the target cluster center.
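Continuing the sketch above, assigning each problem to a cluster then amounts to taking, for each first feature vector, the target cluster center with the highest membership probability:

# Continuation of the previous sketch: determine the cluster each problem belongs to.
memberships = soft_assignment(features, target_cluster_centers)
cluster_ids = memberships.argmax(dim=1)               # index of the target cluster center per problem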
According to one or more embodiments of the present disclosure, the updating an initial cluster center based on a back propagation algorithm and a first feature vector to obtain the at least one target cluster center includes: determining the initial cluster center as a second candidate cluster center, and determining the first feature vector as a second feature vector; based on the second candidate cluster center and the second feature vector, performing the following second iteration step: determining a second probability value of the problem to be clustered belonging to each second candidate cluster center based on the second candidate cluster center and the second feature vector; strengthening each second probability value to obtain a second strengthened value; generating a second loss value according to the second strengthened value and the second probability value; in response to determining that a second stopping condition is satisfied, determining and outputting the second candidate cluster center as a target cluster center; and in response to determining that the second stopping condition is not satisfied, performing back propagation based on the generated second loss value to adjust the second candidate cluster center to obtain a new second candidate cluster center, performing back propagation based on the generated second loss value to adjust parameters of the feature extraction submodel, importing the problem to be clustered into the adjusted feature extraction submodel to obtain a new second feature vector, and returning to execute the second iteration step.
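The second iteration step differs from the first in that back propagation also adjusts the parameters of the feature extraction submodel, so the second feature vectors are recomputed each round. The sketch below reuses the FeatureExtractionSubmodel, soft_assignment, and strengthen helpers introduced in the earlier sketches; the joint SGD update and iteration budget are again illustrative assumptions.

import torch

encoder = FeatureExtractionSubmodel()                  # feature extraction submodel (sketched earlier)
centers = torch.nn.Parameter(torch.randn(2, 64))       # second candidate cluster centers
optimizer = torch.optim.SGD(list(encoder.parameters()) + [centers], lr=0.01)

problems = torch.randint(0, 30000, (20, 4))            # stand-in tokenized problems to be clustered

for step in range(50):                                 # second stopping condition: iteration budget
    second_features = encoder(problems)                # new second feature vectors each round
    q = soft_assignment(second_features, centers)
    p = strengthen(q).detach()
    second_loss = torch.nn.functional.kl_div(q.log(), p, reduction="batchmean")
    optimizer.zero_grad()
    second_loss.backward()                             # adjusts both the centers and the encoder
    optimizer.step()

target_cluster_centers = centers.detach()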
According to one or more embodiments of the present disclosure, the determining the at least two problems to be clustered as at least one cluster based on the at least one target cluster center includes: importing the problem to be clustered into the adjusted feature extraction submodel to obtain a third feature vector; and determining the cluster to which the problem to be clustered belongs according to the third feature vector and the target cluster center.
According to one or more embodiments of the present disclosure, an information processing apparatus includes: a generation unit configured to import at least two problems to be clustered into a clustering model to obtain at least one target cluster center, wherein the target cluster center indicates a cluster; and a determination unit configured to determine the at least two problems to be clustered as at least one cluster based on the at least one target cluster center.
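As a structural sketch only, the units of such an apparatus might be wired as below; the clustering_model interface (fit_centers, assign) is a hypothetical helper invented for this illustration and does not correspond to any API defined in the present disclosure.

class GenerationUnit:
    """Imports the problems to be clustered into the clustering model and returns target cluster centers."""
    def __init__(self, clustering_model):
        self.clustering_model = clustering_model

    def generate(self, problems):
        return self.clustering_model.fit_centers(problems)              # hypothetical helper

class DeterminationUnit:
    """Determines, from the target cluster centers, the cluster to which each problem belongs."""
    def __init__(self, clustering_model):
        self.clustering_model = clustering_model

    def determine(self, problems, target_centers):
        return self.clustering_model.assign(problems, target_centers)   # hypothetical helper

class InformationProcessingApparatus:
    def __init__(self, clustering_model):
        self.generation_unit = GenerationUnit(clustering_model)
        self.determination_unit = DeterminationUnit(clustering_model)

    def process(self, problems):
        centers = self.generation_unit.generate(problems)
        return self.determination_unit.determine(problems, centers)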
According to one or more embodiments of the present disclosure, the clustering model includes a feature extraction submodel, where the feature extraction submodel is obtained based on a feature extraction layer of a pre-trained classification model, the feature extraction submodel is used to generate a feature vector corresponding to a problem to be clustered, and the feature vector is used for clustering to determine a target cluster center.
According to one or more embodiments of the present disclosure, the classification model is obtained by a first step, wherein the first step includes: acquiring a training sample, wherein a label of the training sample indicates a text content type; and training a classification network to be trained based on the training samples and the corresponding labels to obtain the classification model, wherein the classification network to be trained comprises a feature extraction layer to be trained and a classification layer, and the feature extraction layer in the classification model is obtained by training the feature extraction layer to be trained.
According to one or more embodiments of the present disclosure, training the classification network to be trained based on the training samples and the corresponding labels to obtain the classification model includes: importing at least two training samples into the classification network to be trained to obtain prediction types corresponding to the at least two training samples, wherein the labels of the at least two training samples are different; determining a single-sample loss value for each training sample according to the prediction type and the label of that training sample; determining a total sample loss value according to the determined single-sample loss values; and adjusting parameters of the classification network to be trained based on the total sample loss value.
According to one or more embodiments of the present disclosure, the importing at least two problems to be clustered into a clustering model to obtain at least one target cluster center includes: importing the problem to be clustered into the feature extraction submodel to obtain a first feature vector; and updating the initial cluster center based on a back propagation algorithm and the first feature vector to obtain the at least one target cluster center.
According to one or more embodiments of the present disclosure, the initial cluster center is obtained by clustering the first feature vector using a mean clustering algorithm.
According to one or more embodiments of the present disclosure, the updating an initial cluster center based on a back propagation algorithm and a first feature vector to obtain the at least one target cluster center includes: determining the initial cluster center as a first candidate cluster center; based on the first candidate cluster center, performing the following first iteration step: determining a first probability value of the problem to be clustered belonging to each first candidate cluster center based on the first candidate cluster center and the first feature vector; strengthening each first probability value to obtain a first strengthened value; generating a first loss value according to the first strengthened value and the first probability value; in response to determining that a first stopping condition is satisfied, determining and outputting the first candidate cluster center as a target cluster center; and in response to determining that the first stopping condition is not satisfied, performing back propagation based on the generated first loss value, adjusting the first candidate cluster center to obtain a new first candidate cluster center, and returning to execute the first iteration step.
According to one or more embodiments of the present disclosure, the determining the at least two problems to be clustered as at least one cluster based on the at least one target cluster center includes: determining the cluster to which the problem to be clustered belongs according to the first feature vector and the target cluster center.
According to one or more embodiments of the present disclosure, the updating an initial cluster center based on a back propagation algorithm and a first feature vector to obtain the at least one target cluster center includes: determining the initial cluster center as a second candidate cluster center, and determining the first feature vector as a second feature vector; based on the second candidate cluster center and the second feature vector, performing the following second iteration step: determining a second probability value of the problem to be clustered belonging to each second candidate cluster center based on the second candidate cluster center and the second feature vector; strengthening each second probability value to obtain a second strengthened value; generating a second loss value according to the second strengthened value and the second probability value; in response to determining that a second stopping condition is satisfied, determining and outputting the second candidate cluster center as a target cluster center; and in response to determining that the second stopping condition is not satisfied, performing back propagation based on the generated second loss value to adjust the second candidate cluster center to obtain a new second candidate cluster center, performing back propagation based on the generated second loss value to adjust parameters of the feature extraction submodel, importing the problem to be clustered into the adjusted feature extraction submodel to obtain a new second feature vector, and returning to execute the second iteration step.
According to one or more embodiments of the present disclosure, the determining the at least two problems to be clustered as at least one cluster based on the at least one target cluster center includes: importing the problem to be clustered into the adjusted feature extraction submodel to obtain a third feature vector; and determining the cluster to which the problem to be clustered belongs according to the third feature vector and the target cluster center.
According to one or more embodiments of the present disclosure, an electronic device includes: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement any of the information processing methods described above.
According to one or more embodiments of the present disclosure, a computer-readable medium has stored thereon a computer program which, when executed by a processor, implements any of the information processing methods described above.
The foregoing description is only a description of the preferred embodiments of the present disclosure and of the technical principles employed. It will be appreciated by those skilled in the art that the scope of the disclosure is not limited to technical solutions formed by the particular combination of features described above, but also encompasses other technical solutions formed by any combination of the above features or their equivalents without departing from the concept of the disclosure, for example, technical solutions formed by interchanging the above features with (but not limited to) features having similar functions disclosed in the present disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (14)

1. An information processing method characterized by comprising:
importing at least two problems to be clustered into a clustering model to obtain at least one target cluster center, wherein the target cluster center indicates a cluster;
and determining the at least two problems to be clustered as at least one cluster based on the at least one target cluster center.
2. The method of claim 1, wherein the clustering model comprises a feature extraction submodel, the feature extraction submodel is used for generating feature vectors corresponding to the problems to be clustered, and the feature vectors are used for clustering to determine the target cluster center.
3. The method of claim 2, wherein the feature extraction submodel is derived based on a feature extraction layer of a pre-trained classification model.
4. The method according to claim 3, wherein the classification model is obtained by a first step, wherein the first step comprises:
acquiring a training sample, wherein a label of the training sample indicates a text content type;
and training a classification network to be trained based on the training samples and the corresponding labels to obtain the classification model, wherein the classification network to be trained comprises a feature extraction layer to be trained and a classification layer, and the feature extraction layer in the classification model is obtained by training the feature extraction layer to be trained.
5. The method of claim 4, wherein training the classification network to be trained based on the training samples and the corresponding labels to obtain the classification model comprises:
importing at least two training samples into a classification network to be trained to obtain prediction types corresponding to the at least two training samples, wherein labels of the at least two training samples are different;
determining a single-sample loss value for each training sample according to the prediction type and the label of the training sample;
determining a total sample loss value according to the determined single-sample loss values;
and adjusting parameters of the classification network to be trained based on the total sample loss value.
6. The method according to any one of claims 1 to 5, wherein the importing at least two problems to be clustered into a clustering model to obtain at least one target cluster center comprises:
importing the problem to be clustered into the feature extraction submodel to obtain a first feature vector;
and updating the initial cluster center based on a back propagation algorithm and the first feature vector to obtain the at least one target cluster center.
7. The method of claim 6, wherein the initial cluster center is obtained by clustering the first feature vector using a mean clustering algorithm.
8. The method according to claim 6, wherein the updating the initial cluster center based on the back propagation algorithm and the first feature vector to obtain the at least one target cluster center comprises:
determining the initial cluster center as a first candidate cluster center;
based on the first candidate cluster center, performing the following first iteration step: determining a first probability value of the problem to be clustered belonging to each first candidate cluster center based on the first candidate cluster center and the first feature vector; strengthening each first probability value to obtain a first strengthened value; generating a first loss value according to the first strengthened value and the first probability value; in response to determining that a first stopping condition is satisfied, determining and outputting the first candidate cluster center as a target cluster center;
and in response to determining that the first stopping condition is not satisfied, performing back propagation based on the generated first loss value, adjusting the first candidate cluster center to obtain a new first candidate cluster center, and returning to execute the first iteration step.
9. The method according to claim 8, wherein the determining the at least two problems to be clustered as at least one cluster based on the at least one target cluster center comprises:
determining the cluster to which the problem to be clustered belongs according to the first feature vector and the target cluster center.
10. The method according to claim 6, wherein the updating the initial cluster center based on the back propagation algorithm and the first feature vector to obtain the at least one target cluster center comprises:
determining the initial cluster center as a second candidate cluster center, and determining the first feature vector as a second feature vector;
based on the second candidate cluster center and the second feature vector, performing the following second iteration step: determining a second probability value of the problem to be clustered belonging to each second candidate cluster center based on the second candidate cluster center and the second feature vector; strengthening each second probability value to obtain a second strengthened value; generating a second loss value according to the second strengthened value and the second probability value; in response to determining that a second stopping condition is satisfied, determining and outputting the second candidate cluster center as a target cluster center;
and in response to determining that the second stopping condition is not satisfied, performing back propagation based on the generated second loss value to adjust the second candidate cluster center to obtain a new second candidate cluster center, performing back propagation based on the generated second loss value to adjust parameters of the feature extraction submodel, importing the problem to be clustered into the adjusted feature extraction submodel to obtain a new second feature vector, and returning to execute the second iteration step.
11. The method according to claim 10, wherein the determining the at least two problems to be clustered as at least one cluster based on the at least one target cluster center comprises:
importing the problem to be clustered into the adjusted feature extraction submodel to obtain a third feature vector;
and determining the cluster to which the problem to be clustered belongs according to the third feature vector and the target cluster center.
12. An information processing apparatus characterized by comprising:
a generation unit for importing at least two problems to be clustered into a clustering model to obtain at least one target cluster center, wherein the target cluster center indicates a cluster;
and a determination unit for determining the at least two problems to be clustered as at least one cluster based on the at least one target cluster center.
13. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
which, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-11.
14. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-11.
CN202011432971.5A 2020-12-07 2020-12-07 Information processing method and device and electronic equipment Pending CN112650841A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011432971.5A CN112650841A (en) 2020-12-07 2020-12-07 Information processing method and device and electronic equipment
PCT/CN2021/135402 WO2022121801A1 (en) 2020-12-07 2021-12-03 Information processing method and apparatus, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011432971.5A CN112650841A (en) 2020-12-07 2020-12-07 Information processing method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN112650841A true CN112650841A (en) 2021-04-13

Family

ID=75350578

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011432971.5A Pending CN112650841A (en) 2020-12-07 2020-12-07 Information processing method and device and electronic equipment

Country Status (2)

Country Link
CN (1) CN112650841A (en)
WO (1) WO2022121801A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022121801A1 (en) * 2020-12-07 2022-06-16 北京有竹居网络技术有限公司 Information processing method and apparatus, and electronic device
CN115495793A (en) * 2022-11-17 2022-12-20 中关村科学城城市大脑股份有限公司 Multi-set problem secure transmission method, device, equipment and medium

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115709356A (en) * 2022-08-31 2023-02-24 深圳前海瑞集科技有限公司 Welding process parameter acquisition method and device, electronic equipment and storage medium
CN115422480B (en) * 2022-10-31 2023-03-24 荣耀终端有限公司 Method, apparatus and storage medium for determining region of event venue
CN116340831B (en) * 2023-05-24 2024-02-06 京东科技信息技术有限公司 Information classification method and device, electronic equipment and storage medium
CN117273313B (en) * 2023-09-08 2024-05-24 中关村科学城城市大脑股份有限公司 Water network regulation method, device, electronic equipment and computer readable medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107656948A (en) * 2016-11-14 2018-02-02 平安科技(深圳)有限公司 The problem of in automatically request-answering system clustering processing method and device
US20190019061A1 (en) * 2017-06-06 2019-01-17 Sightline Innovation Inc. System and method for increasing data quality in a machine learning process
CN109344154A (en) * 2018-08-22 2019-02-15 中国平安人寿保险股份有限公司 Data processing method, device, electronic equipment and storage medium
CN109388674A (en) * 2018-08-31 2019-02-26 阿里巴巴集团控股有限公司 Data processing method, device, equipment and readable storage medium storing program for executing
CN109389166A (en) * 2018-09-29 2019-02-26 聚时科技(上海)有限公司 The depth migration insertion cluster machine learning method saved based on partial structurtes
CN110020022A (en) * 2019-01-03 2019-07-16 阿里巴巴集团控股有限公司 Data processing method, device, equipment and readable storage medium storing program for executing

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10755082B2 (en) * 2016-10-25 2020-08-25 Deep North, Inc. Point to set similarity comparison and deep feature learning for visual recognition
CN108564102A (en) * 2018-01-04 2018-09-21 百度在线网络技术(北京)有限公司 Image clustering evaluation of result method and apparatus
CN108764319A (en) * 2018-05-21 2018-11-06 北京京东尚科信息技术有限公司 A kind of sample classification method and apparatus
CN112650841A (en) * 2020-12-07 2021-04-13 北京有竹居网络技术有限公司 Information processing method and device and electronic equipment


Also Published As

Publication number Publication date
WO2022121801A1 (en) 2022-06-16

Similar Documents

Publication Publication Date Title
CN109902186B (en) Method and apparatus for generating neural network
CN108416310B (en) Method and apparatus for generating information
CN112650841A (en) Information processing method and device and electronic equipment
CN110688528A (en) Method, apparatus, electronic device, and medium for generating classification information of video
CN111414543B (en) Method, device, electronic equipment and medium for generating comment information sequence
CN112364860A (en) Training method and device of character recognition model and electronic equipment
CN110619078B (en) Method and device for pushing information
CN111104599B (en) Method and device for outputting information
CN111340220A (en) Method and apparatus for training a predictive model
CN111897950A (en) Method and apparatus for generating information
CN112836128A (en) Information recommendation method, device, equipment and storage medium
CN115908640A (en) Method and device for generating image, readable medium and electronic equipment
CN113051933B (en) Model training method, text semantic similarity determination method, device and equipment
CN111008213A (en) Method and apparatus for generating language conversion model
CN112241761B (en) Model training method and device and electronic equipment
CN111858916B (en) Method and device for clustering sentences
CN114926234A (en) Article information pushing method and device, electronic equipment and computer readable medium
CN113220922A (en) Image searching method and device and electronic equipment
CN112860999A (en) Information recommendation method, device, equipment and storage medium
CN111797263A (en) Image label generation method, device, equipment and computer readable medium
CN111897951A (en) Method and apparatus for generating information
CN111680754A (en) Image classification method and device, electronic equipment and computer-readable storage medium
CN111611420A (en) Method and apparatus for generating image description information
CN111339770A (en) Method and apparatus for outputting information
CN111753111A (en) Picture searching method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination