CN112269860A

CN112269860A - Automatic response processing method and device, electronic equipment and readable storage medium

Info

Publication number: CN112269860A
Application number: CN202010797069.7A
Authority: CN
Inventors: 王阳阳
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Wodong Tianjun Information Technology Co Ltd
Current assignee: Beijing Huijun Technology Co ltd
Priority date: 2020-08-10
Filing date: 2020-08-10
Publication date: 2021-01-26
Anticipated expiration: 2040-08-10
Also published as: CN112269860B

Abstract

The application provides an automatic response processing method, an automatic response processing device, an electronic device and a readable storage medium. The method comprises the following steps: acquiring a text of query information input by a user; inputting the text of the query information into a target text classification model to obtain text classification information of the query information, wherein the target text classification model is obtained by training an intermediate text classification model by using a plurality of target training texts, the target training texts comprise a plurality of corpus information and target classification information of each corpus information, and the target classification information is obtained by processing the corpus information based on an initial text classification model in advance; determining response information of the query information according to the text classification information of the query information; and outputting the response information of the query information. Because the classification information of the corpus is generated automatically, the method can reduce the time consumed by the training stage and greatly improve its efficiency.

Description

Automatic response processing method and device, electronic equipment and readable storage medium

Technical Field

The present application relates to computer technologies, and in particular, to an automatic response processing method and apparatus, an electronic device, and a readable storage medium.

Background

With the continuous development of artificial intelligence and big data technology, these technologies play an important role in various fields. For example, in the field of unmanned customer service, unmanned customer service systems may implement their functionality based on artificial intelligence and big data technology. The unmanned customer service system can automatically recognize the user intention based on the information input by the user, divide user intentions into several categories, and output a response to the user by matching the corresponding response scheme under each category. In the unmanned customer service system, the user intention may be recognized by performing text classification on the information input by the user.

In the prior art, text classification in the unmanned customer service system is mainly customized based on the services that the unmanned customer service system needs to undertake. Specifically, the unmanned customer service system may use a text classification model to perform text classification on texts corresponding to the customized services. The text classification model needs to be trained in advance. Before training a text classification model, an operator of an unmanned customer service system needs to first sort out the service points of the services it undertakes, label the corpora used for training the text classification model based on those service points, and then train the text classification model by using the labeled corpora.

However, the prior-art method makes the training phase of the text classification model time-consuming, which in turn makes building the unmanned customer service system inefficient.

Disclosure of Invention

The application provides an automatic response processing method and device, electronic equipment and a readable storage medium, which are used for solving the problem of low efficiency of building an unmanned customer service system caused by long time consumption in a model training stage in the prior art.

In a first aspect, an embodiment of the present application provides an automatic response processing method, including:

acquiring a text of inquiry information input by a user; inputting a text of the query information into a target text classification model to obtain text classification information of the query information, wherein the target text classification model is obtained by training an intermediate text classification model by using a plurality of target training texts, the target training texts comprise a plurality of corpus information and target classification information of each corpus information, the target classification information is obtained by processing the corpus information based on an initial text classification model in advance, and the intermediate text classification model is obtained by updating the initial text classification model when the initial text classification model processes the corpus information; determining response information of the inquiry information according to the text classification information of the inquiry information; and outputting response information of the inquiry information.

In a possible implementation manner, before the inputting the text of the query information into the target text classification model, the method further includes:

determining target classification information of the corpus information according to the initial classification information of the corpus information and reference classification information of the corpus information output by the initial text classification model, and updating the initial text classification model to obtain an intermediate text classification model; and training the intermediate text classification model by using the corpus information and the target classification information of the corpus information to obtain the target text classification model.

In a possible implementation manner, the determining, according to the initial classification information of the corpus information and the reference classification information of the corpus information output by the initial text classification model, the target classification information of the corpus information and updating the initial text classification model includes:

dividing the plurality of corpus information into a preset number of corpus sets, wherein the preset number is greater than or equal to 2, and taking a first corpus set in the preset number of corpus sets as a to-be-classified set.

clustering the to-be-classified set into at least one subset to obtain initial classification information of each corpus information in the to-be-classified set, wherein the initial classification information of each corpus information in the same subset is the same.

A. Updating the initial text classification model according to the initial classification information of each corpus information in the to-be-classified set and the reference classification information of each corpus information in the to-be-classified set output by the initial text classification model, and determining whether the initial classification information of the corpus information needs to be corrected; if yes, executing step B, otherwise, executing step C.

B. Correcting the initial classification information of the corpus information according to the initial classification information and the reference classification information to obtain new initial classification information of each corpus information of the to-be-classified set, and then executing step A.

C. If the to-be-classified set does not include all of the plurality of corpus information, adding a second corpus set in the preset number of corpus sets into the to-be-classified set to obtain a new to-be-classified set, wherein the initial classification information of each corpus information in the second corpus set is obtained by processing the second corpus set based on the initial text classification model, and then executing step A.

If the to-be-classified set includes all of the corpus information, taking the initial classification information of each corpus information in the to-be-classified set as the target classification information of each corpus information, and ending.
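The control flow of steps A, B and C can be sketched as follows. This is only a minimal illustration under stated assumptions: the function name `label_corpus` and all of its callback parameters (`cluster`, `update_and_predict`, `needs_correction`, `correct`, `label_new_set`) are hypothetical placeholders for operations the application describes in words, and the `max_rounds` safety cap is an addition that the application does not mention.

```python
from typing import Callable, Dict, List

Labels = Dict[str, str]  # corpus text -> classification value


def label_corpus(
    corpus_sets: List[List[str]],
    cluster: Callable[[List[str]], Labels],
    update_and_predict: Callable[[Labels], Labels],
    needs_correction: Callable[[Labels, Labels], bool],
    correct: Callable[[Labels, Labels], Labels],
    label_new_set: Callable[[List[str]], Labels],
    max_rounds: int = 100,
) -> Labels:
    """Return target classification information for every corpus item."""
    if len(corpus_sets) < 2:
        raise ValueError("the preset number of corpus sets must be >= 2")

    # Cluster the first corpus set to obtain initial classification information.
    initial = cluster(corpus_sets[0])
    next_set = 1

    for _ in range(max_rounds):  # safety cap: an assumption, not from the source
        # Step A: update the model and obtain reference classification information.
        reference = update_and_predict(initial)
        if needs_correction(initial, reference):
            # Step B: correct the initial labels, then go back to step A.
            initial = correct(initial, reference)
            continue
        # Step C: finish when every corpus set has been absorbed ...
        if next_set == len(corpus_sets):
            return initial
        # ... otherwise add the next corpus set and go back to step A.
        initial = {**initial, **label_new_set(corpus_sets[next_set])}
        next_set += 1

    raise RuntimeError("labeling did not converge within max_rounds")
```

Passing the individual operations in as callbacks keeps the sketch independent of any particular clustering algorithm or classifier, neither of which is fixed by the steps above.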

In a possible implementation manner, the modifying the initial classification information of the corpus information according to the initial classification information and the reference classification information to obtain new initial classification information of each corpus information of the to-be-classified collection includes:

modifying the classification value of the initial classification information of the corpus information meeting a first condition into no classification, wherein the first condition comprises the following steps: the probability value of the reference classification information of the corpus information is smaller than a first threshold, the initial classification information of the corpus information is different from the reference classification information, and the classification value of the initial classification information of the corpus information is not classified; if the number of the linguistic data information with the classification value of the initial classification information in the to-be-classified set as no classification is larger than a second threshold value, clustering the linguistic data information with the classification value of no classification into at least one set, wherein the classification values of the linguistic data information in the same set are the same, and correcting the initial classification information of the linguistic data information with the classification value of no classification in the to-be-classified set according to the clustering result; and combining the corrected initial classification information of each corpus information in the to-be-classified collection according to the confusion parameter among the corrected initial classification information of each corpus information in the to-be-classified collection to obtain new initial classification information of each corpus information in the to-be-classified collection.

In a possible implementation manner, the modifying, according to the clustering result, the initial classification information of the corpus information whose classification value is not classified in the to-be-classified collection includes:

and if clusters with the number of the corpus information larger than a third threshold exist in the clustering result, modifying the classification value of the initial classification information of the corpus information in the clusters into the same value.

In a possible implementation manner, the merging the corrected initial classification information of each corpus information in the to-be-classified collection according to the confusion parameter between the corrected initial classification information of each corpus information in the to-be-classified collection to obtain new initial classification information of each corpus information in the to-be-classified collection includes:

if the value of the confusion parameter of two classes to which corpus information in the corrected to-be-classified set belongs is larger than a third threshold value, merging the initial classification information of the corpus information belonging to the two classes into the same initial classification information, wherein the confusion parameter is used for representing the confusion degree between the two classes.
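The merging rule can be illustrated with a small sketch. The application does not define how the confusion parameter is computed, so the definition used here is purely an assumption made for illustration: the share of items in two classes whose initial label is one class while the reference label is the other. The function name `merge_confused_classes` and its parameters are likewise hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict


def merge_confused_classes(
    initial: Dict[str, str],
    reference: Dict[str, str],
    threshold: float,
) -> Dict[str, str]:
    """Merge pairs of classes whose confusion value exceeds the threshold."""
    # Count (initial label, reference label) pairs over all corpus items.
    crossed = Counter((initial[t], reference[t]) for t in initial)
    sizes = Counter(initial.values())

    merged_into: Dict[str, str] = {}  # class -> class it has been merged into

    def root(label: str) -> str:
        # Follow the merge chain to the surviving class.
        while label in merged_into:
            label = merged_into[label]
        return label

    for a, b in combinations(sorted(sizes), 2):
        # Assumed confusion parameter: fraction of items in a or b that are
        # labeled as one class initially but as the other by the model.
        confusion = (crossed[(a, b)] + crossed[(b, a)]) / (sizes[a] + sizes[b])
        if confusion > threshold and root(a) != root(b):
            merged_into[root(b)] = root(a)

    return {text: root(label) for text, label in initial.items()}
```

With this assumed definition, two classes that the model frequently swaps collapse into one, while classes that the model separates cleanly are left untouched.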

In a possible implementation manner, before adding the second corpus set of the preset number of corpus sets to the to-be-classified set to obtain a new to-be-classified set, the method further includes:

inputting each corpus information of the second corpus set into the initial text classification model to obtain reference classification information of each corpus information of the second corpus set and a probability value of the reference classification information; if the probability value is smaller than a fourth threshold value, modifying the classification value of the reference classification information into no classification;

and if the number of the corpus information with the classification value of the reference classification information being no classification is larger than a fifth threshold value, clustering the corpus information with the classification value being no classification into at least one set, wherein the classification values of the corpus information in the same set are the same, and obtaining the initial classification information of each corpus information in the second corpus set according to the clustering result.
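The pre-labeling of the second corpus set can be sketched as follows. This is a minimal illustration, assuming a hypothetical `predict` callback that returns a classification value together with its probability, and a hypothetical `cluster` callback that assigns one classification value per cluster; the string used for `NO_CLASS` is also an arbitrary choice.

```python
from typing import Callable, Dict, List, Tuple

NO_CLASS = "no classification"  # placeholder value, chosen for illustration


def label_second_set(
    texts: List[str],
    predict: Callable[[str], Tuple[str, float]],
    cluster: Callable[[List[str]], Dict[str, str]],
    fourth_threshold: float,
    fifth_threshold: int,
) -> Dict[str, str]:
    """Return initial classification information for a newly added corpus set."""
    labels: Dict[str, str] = {}
    for text in texts:
        label, probability = predict(text)
        # A probability below the fourth threshold is treated as unclassified.
        labels[text] = label if probability >= fourth_threshold else NO_CLASS

    unclassified = [t for t in texts if labels[t] == NO_CLASS]
    # Re-cluster only when more items than the fifth threshold are unclassified.
    if len(unclassified) > fifth_threshold:
        labels.update(cluster(unclassified))
    return labels
```

When the number of unclassified items does not exceed the fifth threshold, they simply keep the unclassified value in this sketch.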

In a possible implementation manner, before dividing the corpus information into a preset number of corpus sets, the method further includes:

and randomly dividing the plurality of corpus information to obtain the corpus sets with the preset number.

In a second aspect, an embodiment of the present application provides an automatic response processing apparatus, including:

An acquisition module, configured to acquire the text of the query information input by the user.

A processing module, configured to input a text of the query information into a target text classification model, so as to obtain text classification information of the query information, where the target text classification model is obtained by using a plurality of target training texts to train an intermediate text classification model, where the target training texts include a plurality of corpus information and target classification information of each corpus information, the target classification information is obtained by processing the corpus information based on an initial text classification model in advance, and the intermediate text classification model is obtained by updating the initial text classification model when the initial text classification model processes the corpus information; determining response information of the inquiry information according to the text classification information of the inquiry information; and outputting response information of the inquiry information.

In one possible implementation, the processing module is further configured to:

In a possible implementation manner, the processing module is specifically configured to:

dividing the plurality of corpus information into a preset number of corpus sets, wherein the preset number is more than or equal to 2, and taking a first corpus set in the preset number of corpus sets as a to-be-classified set; and clustering the to-be-classified set into at least one subset to obtain initial classification information of each corpus information in the to-be-classified set, wherein the initial classification information of each corpus information in the same subset is the same.

If the to-be-classified set comprises all the corpus information, ending the circulation, and taking the initial classification information of each corpus information in the to-be-classified set as the target classification information of each corpus information.

and when clusters with the number of the corpus information larger than a third threshold exist in the clustering result, modifying the classification value of the initial classification information of the corpus information in the clusters into the same value.

and when the value of the confusion parameter of the two classes to which the corpus information in the corrected to-be-classified collection belongs is larger than a third threshold value, merging the initial classification information of the corpus information belonging to the two classes into the same initial classification information, wherein the confusion parameter is used for representing the confusion degree between the two classes.

In one possible implementation, the processing module is further configured to:

inputting each corpus information of the second corpus set into the initial text classification model to obtain reference classification information of each corpus information of the second corpus set and a probability value of the reference classification information; if the probability value is smaller than a fourth threshold value, modifying the classification value of the reference classification information into no classification; and if the number of the corpus information with the classification value of the reference classification information being no classification is larger than a fifth threshold value, clustering the corpus information with the classification value being no classification into at least one set, wherein the classification values of the corpus information in the same set are the same, and obtaining the initial classification information of each corpus information in the second corpus set according to the clustering result.

In one possible implementation, the processing module is further configured to:

In a third aspect, an embodiment of the present application provides an electronic device, including:

a memory for storing program instructions.

A processor for calling and executing the program instructions in the memory to perform the method steps of the first aspect.

In a fourth aspect, an embodiment of the present application provides a readable storage medium, where a computer program is stored, where the computer program is used to execute the method in the first aspect.

With the automatic response processing method, the automatic response processing device, the electronic device and the readable storage medium provided by the embodiments of the application, after the query information input by the user is acquired, the text classification information of the query information can be obtained by using the target text classification model, so that the intention of the user is identified, and response information is then output to the user based on the text classification information. The target text classification model used in this process is obtained by training the intermediate text classification model with the target training texts, and the target classification information of the target training texts is obtained by processing the corpus information based on the initial text classification model. In other words, the text classification information of the corpus is obtained automatically based on the initial text classification model, without sorting out all possible service classifications and manually labeling data in advance. Therefore, the time consumed by the model training phase can be greatly reduced, and the efficiency of the model training phase can be greatly improved.

Drawings

In order to more clearly illustrate the technical solutions in the present application or the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without inventive effort.

FIG. 1 is a diagram of an exemplary system architecture according to an embodiment of the present application;

FIG. 2 is a schematic flowchart of an automatic response processing method according to an embodiment of the present application;

FIG. 3 is a schematic flowchart of an automatic response processing method according to an embodiment of the present application;

FIG. 4 is a schematic flowchart of an automatic response processing method according to an embodiment of the present application;

FIG. 5 is an exemplary diagram of determining target classification information for corpus information using the process of FIG. 4;

FIG. 6 is a block diagram of an automatic response processing apparatus according to an embodiment of the present application;

FIG. 7 is a schematic structural diagram of an electronic device 700 according to an embodiment of the present application.

Detailed Description

In order to make the purpose, technical solutions and advantages of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

In the prior art, text classification is mainly customized based on the services to be undertaken by an unmanned customer service system. Before training a text classification model, an operator of an unmanned customer service system needs to first sort out the service points of the services it undertakes, label the corpora used for training the text classification model based on those service points, and then train the text classification model by using the labeled corpora. In the prior art, all possible service points, i.e. service classes, to which the corpus may belong must be known, and these service points need to be sorted out manually in advance. For example, if the unmanned customer service system belongs to a certain bank, all possible service points of the bank must be sorted out, such as transfer and remittance, where transfer may further include personal-to-business transfer, business-to-business transfer and the like. Sorting out the service points manually in advance tends to be time-consuming and inefficient. Moreover, after the service points are sorted out, the corpora used to train the text classification model are labeled purely manually or with the help of tools. Because the training corpus is huge, labeling it takes a long time. Therefore, training a text classification model with the prior-art method is time-consuming, and building the unmanned customer service system is inefficient. In addition, manual sorting of service points and manual data labeling may suffer from omitted or incorrect service points, labeling errors and similar problems.

In view of the prior-art problem that manually sorting out service points and manually labeling corpora makes the training stage of the text classification model time-consuming and inefficient, the embodiments of the present application automatically identify the service classification of each corpus and automatically label the corpora. The training data can thus be labeled without manual participation, which greatly reduces the time consumed by the model training stage while ensuring the accuracy of the data labeling.

Fig. 1 is a diagram of an exemplary system architecture according to an embodiment of the present application. As shown in fig. 1, the method according to an embodiment of the present application involves an initial text classification model, an intermediate text classification model, and a target text classification model. The relationship among the three is as follows. First, an initial text classification model is constructed and used to label the corpus automatically; the initial text classification model is updated during this process. After all the corpora have been labeled, the updated initial text classification model is taken as the intermediate text classification model. The intermediate text classification model is then trained by using the labeled corpus, and after the training is finished, the trained intermediate text classification model is taken as the target text classification model. The target text classification model is then used in the automatic response processing method of the embodiments of the application.

In a specific implementation process, the initial text classification model, the intermediate text classification model, and the target text classification model may run on the same electronic device; for example, the same server uses the initial text classification model to complete corpus labeling, trains the intermediate text classification model, and uses the target text classification model to perform response processing. Alternatively, the models may run on different electronic devices. The embodiments of the present application do not specifically limit this.

The automatic response processing method of the embodiments of the application can be applied to unmanned customer service systems, such as those of banks, online shopping platforms and the like. Alternatively, the method may also be applied to other scenarios that require automatic response processing, for example, an automatic response application (APP) on a mobile phone. The following embodiments are all described by taking an unmanned customer service system scenario as an example.

Fig. 2 is a schematic flowchart of an automatic response processing method according to an embodiment of the present application, in which an execution subject of the method is an electronic device running a target text classification model, for example, a server of an unmanned customer service system of a certain bank. As shown in fig. 2, the method includes:

S201, acquiring a text of query information input by a user.

Illustratively, a user uses an unmanned customer service system of a certain bank for problem consultation. The user can input inquiry information on the client of the unmanned customer service system in a voice or text mode and the like. If the user input is speech, the client may first convert the speech to text. The client further sends the text of the query information to the server, and the server receives the text of the query information.

S202, inputting the text of the query information into a target text classification model to obtain text classification information of the query information, wherein the target text classification model is obtained by training an intermediate text classification model by using a plurality of target training texts, the target training texts comprise a plurality of corpus information and target classification information of each corpus information, and the target classification information is obtained by processing the corpus information based on an initial text classification model in advance.

After the query information is input into the target text classification model, the target text classification model may output text classification information of the query information, which may characterize the intention of the user. Illustratively, if the query information input by the user is 'What is the process for making a transfer?' and the target text classification model outputs the text classification information 'transfer', this indicates that the user's intention relates to transfers.

The text classification information may be a business classification, for example, for an unmanned customer service system of a bank, the business classification may include: money transfer, etc.

The target text classification model is obtained by training the intermediate text classification model with the plurality of target training texts, and the target training texts comprise corpus information and target classification information of the corpus information. The corpus information may be query information. For example, when training the intermediate text classification model of an unmanned customer service system of a bank, a large amount of historical query information from the bank's customers may be collected in advance and used as the corpus information. By processing the corpus information based on the constructed initial text classification model, the target classification information of all corpus information can be generated automatically.

It should be noted that the target classification information, and the initial classification information and the reference classification information described below in the embodiments of the present application are all referred to as classification information. For example, in a bank unmanned customer service system, the classification information may be money transfer, remittance, or the like.

S203, determining response information of the query information according to the text classification information of the query information.

Optionally, the server may pre-store a mapping table between the text classification information and the response information, and for a certain text classification information, the optimal response information may be queried through the mapping table.

S204, outputting response information of the query information.

For example, after the server of the unmanned customer service system of the bank obtains the response information through the above process, the response information may be sent to the terminal device where the client of the unmanned customer service system is located, and the terminal device performs text display or voice broadcast at the client.

In this embodiment, after the query information input by the user is obtained, the text classification information of the query information can be obtained by using the target text classification model, so as to obtain the intention of the user, and then the response information can be output to the user based on the text classification information.
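The flow of S201 to S204 can be sketched as follows. This is a minimal illustration only: the contents of the `RESPONSES` mapping table, the `FALLBACK` message used when no entry matches, and the `classify` callback standing in for the target text classification model are all hypothetical and are not specified by the application.

```python
from typing import Callable, Dict

# Hypothetical mapping table between text classification information
# and response information (S203).
RESPONSES: Dict[str, str] = {
    "transfer": "To make a transfer, open the Transfer page and follow the prompts.",
    "remittance": "To send a remittance, open the Remittance page and follow the prompts.",
}

# Assumed behavior when the class has no entry; the source does not cover this case.
FALLBACK = "Sorry, I could not find an answer to your question."


def respond(query_text: str, classify: Callable[[str], str]) -> str:
    """Return the response information for one query text (already acquired, S201)."""
    category = classify(query_text)           # S202: text classification information
    return RESPONSES.get(category, FALLBACK)  # S203: look up the mapping table
```

The caller is responsible for S204, for example by sending the returned string to the terminal device for text display or voice broadcast.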

The following describes a process of determining text classification information of corpus information and training an intermediate text classification model using the text classification information.

Fig. 3 is a schematic flow chart of an automatic response processing method according to an embodiment of the present application, and as shown in fig. 3, before the target text classification model is used for the automatic response of fig. 2, the target text classification model may be obtained through the following processes:

S301, determining target classification information of the corpus information according to the initial classification information of the corpus information and the reference classification information of the corpus information output by the initial text classification model, and updating the initial text classification model to obtain the intermediate text classification model.

Optionally, the initial classification information of the corpus information is classification information obtained by using a specific classification manner, and the specific classification manner may be, for example, a clustering manner.

In addition, the constructed initial text classification model can obtain the reference classification information of the corpus information. Based on the classification information obtained by the two methods, the target classification information of the corpus information can be obtained.

When the reference classification information is output based on the initial text classification model, the initial text classification model is updated accordingly. After the target classification information of all the corpus information is determined, the updated initial text classification model can be used as the intermediate text classification model.

S302, training the intermediate text classification model by using the corpus information and the target classification information of the corpus information to obtain the target text classification model.

After the target classification information of the corpus information is obtained, the classification labeling of the corpus information is complete. The labeled corpus information is then used as training text to train the intermediate text classification model. The training process may include multiple rounds of training; when the result of a given round is sufficiently close to the target classification result of the corpus information, the training may be ended, and the trained intermediate text classification model is taken as the target text classification model.

In this embodiment, based on the initial classification information of the corpus information and the reference classification information output by the initial text classification model, automatic labeling of the target classification information of the corpus information can be achieved, and the two types of text classification information are classification information obtained by different methods, so that the target classification information with high accuracy can be obtained by processing the two types of classification information.

The following is a process of obtaining the target classification information based on the initial classification information and the reference classification information in the above-described step S301.

Fig. 4 is a flowchart illustrating an automatic response processing method according to an embodiment of the present application, and as shown in fig. 4, a process of obtaining target classification information based on initial classification information and reference classification information may include:

S401, dividing the plurality of corpus information into a preset number of corpus sets, and taking a first corpus set in the preset number of corpus sets as a to-be-classified set.

Wherein the preset number is greater than or equal to 2.

In this embodiment, all corpus information is divided into a plurality of corpus sets. The classification information of one corpus set is determined first, and the remaining corpus sets are then added one at a time in a loop. In each cycle, all corpus information of one new corpus set is added, and the enlarged set is processed again as a whole. Over the multiple cycles, the classification information of each piece of corpus information is continuously adjusted and optimized.

In this step, all the corpus information is first divided into a plurality of corpus sets, and the division may be uniform or non-uniform. One of the corpus sets is selected as the initial to-be-classified set, and the remaining corpus sets are subsequently added one by one.

For example, if there are 5000 pieces of corpus information in total, they may be uniformly divided into 5 corpus sets of 1000 pieces each, and one of the 5 corpus sets is selected as the initial to-be-classified set.

Optionally, when the plurality of corpus sets are divided, the corpus information may be randomly divided to obtain a plurality of corpus sets. By means of random division, the actual classification distribution of the corpora of each corpus set can be uniform, and inaccurate processing results during subsequent circulation processing are avoided.
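The random, uniform division described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation; the function name `split_corpus`, the fixed `seed`, and the placeholder query strings are assumptions introduced here.

```python
import random

def split_corpus(corpus, num_sets, seed=0):
    """Shuffle the corpus and deal it into num_sets roughly equal corpus sets."""
    if num_sets < 2:
        raise ValueError("the preset number must be at least 2")
    items = list(corpus)
    random.Random(seed).shuffle(items)  # random division evens out class mix
    # Striding deals items round-robin, so set sizes differ by at most one.
    return [items[i::num_sets] for i in range(num_sets)]

# 5000 placeholder corpus items split into 5 corpus sets of 1000 each.
corpus_sets = split_corpus([f"query {i}" for i in range(5000)], num_sets=5)
to_be_classified = list(corpus_sets[0])  # initial to-be-classified set
remaining_sets = corpus_sets[1:]         # added one by one in later cycles
```

Shuffling before splitting is what keeps the class distribution of each corpus set close to that of the whole corpus, which is the property the paragraph above relies on.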

In addition, before the corpus information is divided, the corpus information may be preprocessed. The preprocessing may include: removing stop words, converting full-width characters to half-width characters, removing emoticons, removing forms of address and invalid questions, replacing Chinese punctuation with English punctuation, and removing common punctuation.
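A few of these preprocessing operations can be sketched with the Python standard library. The stop-word list is a tiny hypothetical placeholder, and splitting on whitespace is an assumption that suits space-delimited text only; Chinese text would need a word segmenter, which is outside this sketch.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny stop-word list; a real system would load one.
STOP_WORDS = {"please", "the", "a"}

def preprocess(text, stop_words=STOP_WORDS):
    # NFKC normalization folds full-width forms to half-width, which also
    # turns full-width punctuation such as "，" and "？" into "," and "?".
    text = unicodedata.normalize("NFKC", text)
    # Replace every character that is neither a word character nor whitespace
    # (punctuation, symbols, most emoticons) with a space.
    text = re.sub(r"[^\w\s]", " ", text)
    tokens = [t for t in text.lower().split() if t not in stop_words]
    return " ".join(tokens)

cleaned = preprocess("Ｗｈｅｒｅ is the order？")
```

Removing forms of address and invalid questions depends on domain rules that the text does not specify, so it is left out here.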

S402, clustering the to-be-classified set into at least one subset to obtain initial classification information of each corpus information in the to-be-classified set, wherein the initial classification information of each corpus information in the same subset is the same.

Optionally, the to-be-classified set may be clustered into at least one subset by using a specific clustering algorithm, and after clustering, the corpus information in the same subset has the same classification information, which may be used as the initial classification information of each corpus information in the to-be-classified set.

Optionally, in a cluster of the clustering result, if the number of corpus information in a certain cluster is greater than a certain threshold, an identifier of the cluster, for example, an ID of the cluster, may be used as initial classification information of the corpus information in the cluster. If the number of the corpus information in a certain cluster is less than or equal to the threshold value, the classification value of the initial classification information of the corpus information in the cluster is marked as no classification (other classification).
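The cluster-size rule above can be sketched as follows, assuming the cluster IDs have already been produced by some clustering algorithm. The marker `NO_CLASS` and the function name are hypothetical names chosen for illustration.

```python
from collections import Counter

NO_CLASS = "other"  # hypothetical marker for "no classification"

def initial_labels(cluster_ids, size_threshold):
    """Keep the cluster ID as the label only for sufficiently large clusters."""
    sizes = Counter(cluster_ids)
    return [cid if sizes[cid] > size_threshold else NO_CLASS
            for cid in cluster_ids]

# Cluster 0 has 3 items, cluster 1 has 1 item, cluster 2 has 2 items.
demo_initial = initial_labels([0, 0, 0, 1, 2, 2], size_threshold=2)
```

With a threshold of 2, only cluster 0 keeps its ID; clusters 1 and 2 are at or below the threshold and fall into the no-classification value.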

The above steps S401 to S402 are processing procedures for the initial to-be-classified set, and after the initial classification information of the to-be-classified set is obtained, the following step S403 is performed. Steps S403 to S405 described below are a loop execution process.

S403, updating the initial text classification model according to the initial classification information of each corpus information in the to-be-classified collection and the reference classification information of each corpus information in the to-be-classified collection output by the initial text classification model, and determining whether the initial classification information of the corpus information needs to be corrected, if so, executing step S404, and if not, executing step S405.

If the to-be-classified set is the initial to-be-classified set, the initial classification information is the cluster identifier or the no-classification value obtained in the clustering of step S402. If this step is executed after looping back from step S405, the initial classification information consists of the classification information of the previous to-be-classified set together with the classification information obtained by processing the newly added corpus set with the initial text classification model.

In this step, the initial text classification model is used to obtain the reference classification information of each piece of corpus information in the to-be-classified set. Specifically, the to-be-classified set may be divided evenly into a plurality of parts according to the initial classification information, so that the classification information is distributed fairly uniformly within each part. Of these parts, one part is used for prediction and the other parts are used for training. This process serves as an evaluation of the initial text classification model; when it is finished, the initial text classification model has been updated, and each piece of corpus information in the to-be-classified set has obtained its reference classification information.
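The split into evenly distributed parts, with one part predicted and the others used for training, resembles stratified k-fold cross-validation. The sketch below follows that reading, which is an interpretation rather than something the text states. The callables `fit` and `predict` are hypothetical hooks, and the toy "model" is only the majority label of its training data.

```python
from collections import defaultdict

def stratified_folds(labels, k):
    """Return a fold index per item so that each label is spread evenly."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    fold_of = [0] * len(labels)
    for indices in by_label.values():
        for position, idx in enumerate(indices):
            fold_of[idx] = position % k
    return fold_of

def out_of_fold_predictions(texts, labels, k, fit, predict):
    """Train on k-1 parts, predict the held-out part, repeat for every part."""
    fold_of = stratified_folds(labels, k)
    reference = [None] * len(texts)
    for fold in range(k):
        train = [i for i in range(len(texts)) if fold_of[i] != fold]
        model = fit([texts[i] for i in train], [labels[i] for i in train])
        for i in range(len(texts)):
            if fold_of[i] == fold:
                reference[i] = predict(model, texts[i])  # (label, probability)
    return reference

# Toy stand-ins: the "model" is just the majority label of its training data.
def fit(train_texts, train_labels):
    return max(set(train_labels), key=train_labels.count)

def predict(model, text):
    return (model, 1.0)

demo_labels = ["a"] * 6 + ["b"] * 3
demo_folds = stratified_folds(demo_labels, 3)
demo_refs = out_of_fold_predictions(list("abcdefghi"), demo_labels, 3, fit, predict)
```

Every item receives a reference label from a model that did not see that item during training, which is what makes comparing the reference label with the initial label meaningful.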

The initial text classification model may output reference classification information for each corpus information and a probability value of the reference classification information.

Based on the initial classification information and the reference classification information, it can be determined whether the accuracy of the initial classification information meets a preset requirement, that is, whether the initial classification information needs to be corrected. If correction is needed, the accuracy of the initial classification information is not high enough, and step S404 is performed to correct it. Otherwise, the initial classification information has met the preset requirement in the current cycle, and step S405 is performed so that new corpus information can be added and the determination and correction can continue.

Optionally, the PRF of the classification may be calculated based on the initial classification information and the reference classification information, where P is precision, R is recall, and F is the F-measure, and whether the initial classification information meets the preset requirement is determined according to the PRF.
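One way to compute the PRF is sketched below. It assumes that the initial classification information plays the role of the "actual" label and the reference classification information plays the role of the prediction. Using the macro-averaged F-measure against a threshold `min_f1` is likewise an assumption, since the text does not say how the PRF is turned into a decision.

```python
def prf(initial, reference, label):
    """Precision, recall and F-measure for one label (initial = actual)."""
    tp = sum(1 for i, r in zip(initial, reference) if i == label and r == label)
    predicted = sum(1 for r in reference if r == label)
    actual = sum(1 for i in initial if i == label)
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def needs_correction(initial, reference, min_f1):
    """True when the macro-averaged F-measure falls below min_f1."""
    labels = set(initial)
    macro_f1 = sum(prf(initial, reference, l)[2] for l in labels) / len(labels)
    return macro_f1 < min_f1

demo_initial = ["a", "a", "b", "b"]
demo_reference = ["a", "b", "b", "b"]
p_a, r_a, f_a = prf(demo_initial, demo_reference, "a")
```

In the demo data, label "a" has precision 1.0 and recall 0.5, and label "b" has precision 2/3 and recall 1.0, giving a macro F-measure of about 0.73.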

S404, correcting the initial classification information of the corpus information according to the initial classification information and the reference classification information to obtain new initial classification information of each corpus information of the to-be-classified collection, and executing the step S403.

Entering this step indicates that the initial classification information does not meet the preset requirement. The initial classification information may therefore be corrected based on the initial classification information and the reference classification information; the specific correction process is described in detail in the following embodiments.

After the correction, the initial classification information of part or all of the corpus information in the to-be-classified set has changed. Step S403 may therefore be performed again to verify whether the corrected initial classification information is reasonable; if not, the correction continues until the initial classification information is reasonable.

S405, if the to-be-classified collection comprises all the corpus information, executing a step S407, otherwise, executing a step S406.

S406, adding the second corpus set in the preset number of corpus sets into the to-be-classified set to obtain a new to-be-classified set, and executing the step S403, wherein the initial classification information of each corpus information in the second corpus set is obtained by processing the second corpus set based on the initial text classification model.

S407, taking the initial classification information of each corpus information in the to-be-classified collection as the target classification information of each corpus information, and ending.

If the to-be-classified set does not include all of the corpus information, the processing of all the corpus information has not been completed. One of the remaining corpus sets divided in step S401 is therefore added to the current to-be-classified set to form a new to-be-classified set, and step S403 is performed again. The initial classification information of the second corpus set can be obtained from the initial text classification model; the specific process is described in detail in the following embodiments.

If all the corpora in the corpus information are included in the to-be-classified set, it is indicated that the processing of all the corpus information is completed, and the initial classification information of all the corpus information meets the preset requirement, and at this time, the initial classification information of each corpus can be used as the target classification information. And, the initial text classification model after the loop is finished may be used as the aforementioned intermediate text classification model.

In the cyclic process, part of the whole corpus is first selected as the processing object, and the classification information of this part is determined using its initial classification information and reference classification information. Part of the remaining corpus is then added to the corpus information that already has classification information, and the classification of all the corpus information after the addition is again determined using the initial classification information and the reference classification information. The cycle continues until all corpus information has been added and the initial classification information of all corpus information is reasonable. Through this cycle, the classification information of all corpus information is continuously adjusted and optimized, which helps ensure the accuracy of the target classification information obtained when the cycle ends. Meanwhile, the initial text classification model is continuously evaluated during the process and is thereby updated and optimized, so the training time in the subsequent training can be reduced.

Fig. 5 is an exemplary diagram of determining target classification information for corpus information using the process of fig. 4. As shown in fig. 5, assuming that there are 5000 pieces of corpus information in total, they may be divided into 5 corpus sets of 1000 pieces each. The corrected initial classification information of the first 1000 pieces is obtained based on their reference classification information and initial classification information. A second set of 1000 pieces is then added to form 2000 pieces, and the corrected initial classification information of the 2000 pieces is obtained based on their reference classification information and initial classification information. The cycle continues until the corrected initial classification information of all 5000 pieces, that is, the target classification information of the 5000 pieces, is obtained.
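The control flow of steps S401 to S407 can be sketched as a loop over hypothetical hooks. The four callables and the `max_rounds` safety limit are assumptions added for illustration; the text itself only says that correction continues until the initial classification information is reasonable.

```python
def label_corpus(corpus_sets, cluster_labels, evaluate, correct,
                 label_new_set, max_rounds=1000):
    """Grow the to-be-classified set one corpus set at a time.

    Hypothetical hooks:
      cluster_labels(texts)             -> initial labels            (S402)
      evaluate(texts, labels)           -> (reference, ok)           (S403)
      correct(texts, labels, reference) -> corrected labels          (S404)
      label_new_set(texts)              -> labels for an added set   (S406)
    """
    texts = list(corpus_sets[0])                       # S401
    labels = list(cluster_labels(texts))               # S402
    pending = [list(s) for s in corpus_sets[1:]]
    for _ in range(max_rounds):
        reference, ok = evaluate(texts, labels)        # S403
        if not ok:
            labels = list(correct(texts, labels, reference))  # S404
        elif not pending:                              # S405: all corpus added
            return texts, labels                       # S407: target labels
        else:
            new_set = pending.pop(0)                   # S406
            labels = labels + list(label_new_set(new_set))
            texts = texts + new_set
    raise RuntimeError("labels did not stabilise within max_rounds")

# Toy hooks: every label starts as "bad" and one correction makes it "good".
demo_texts, demo_targets = label_corpus(
    [["q1", "q2"], ["q3"], ["q4", "q5"]],
    cluster_labels=lambda ts: ["bad"] * len(ts),
    evaluate=lambda ts, ls: (["good"] * len(ts), all(l == "good" for l in ls)),
    correct=lambda ts, ls, ref: ref,
    label_new_set=lambda ts: ["bad"] * len(ts),
)
```

Each newly added corpus set triggers another evaluation of the whole enlarged set, so earlier labels can still change in later cycles.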

The following describes the process of modifying the initial classification information of the corpus information according to the initial classification information and the reference classification information in step S404 to obtain new initial classification information of each corpus information of the to-be-classified collection.

The initial classification information can be corrected by the following three steps.

Step one, the classification value of the initial classification information of the corpus information meeting a first condition is modified to no classification. The first condition comprises: the probability value of the reference classification information of the corpus information is smaller than a first threshold, the initial classification information of the corpus information is different from the reference classification information, and the classification value of the initial classification information of the corpus information is not no classification.

If a piece of corpus information satisfies the first condition, its current initial classification information is not no classification (the "other" classification), its reference classification information has low reliability (the probability value is smaller than the first threshold), and the initial classification information and the reference classification information disagree. In this case, the classification value of the initial classification information of the corpus information is modified to no classification so that it can be processed further.
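Step one can be sketched as a single filter over the to-be-classified set. The sketch reads the third clause of the first condition as "the initial value is not already no classification", which is an interpretation of the translated text. `NO_CLASS` and the function name are hypothetical.

```python
NO_CLASS = "other"  # hypothetical marker for "no classification"

def demote_unreliable(initial, reference, probs, first_threshold):
    """Set a label to NO_CLASS when all three clauses of the first condition hold."""
    return [
        NO_CLASS
        if (prob < first_threshold and init != ref and init != NO_CLASS)
        else init
        for init, ref, prob in zip(initial, reference, probs)
    ]

demo_step_one = demote_unreliable(
    initial=["a", "b", "other", "c"],
    reference=["a", "x", "y", "d"],
    probs=[0.2, 0.3, 0.1, 0.9],
    first_threshold=0.5,
)
```

Only the second item changes: the first agrees with its reference label, the third is already unclassified, and the fourth has a probability value above the threshold.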

Step two, if the number of pieces of corpus information in the to-be-classified set whose initial classification value is no classification is larger than a second threshold, the corpus information whose classification value is no classification is clustered into at least one set, where the classification values of the corpus information in the same set are the same, and the initial classification information of the corpus information whose classification value is no classification in the to-be-classified set is corrected according to the clustering result.

The above number is greater than the second threshold, which may mean that the actual number is greater than a certain threshold, or may also mean that the ratio of the number of the non-classified corpus information to the number of the corpus information of the to-be-classified collection is greater than a certain threshold.

If the number of the non-classified corpus information is larger than the second threshold, it indicates that the number of the non-classified corpus information is larger, and at this time, the corpus information can be divided through clustering.

Optionally, if a cluster with the number of corpus information greater than the third threshold exists in the clustering result, the classification value of the initial classification information of the corpus information in the cluster is modified to the same value. The same modified value is a new classification value, i.e. different from the classification values of other initial classification information in the to-be-classified set.

Optionally, the identification of the cluster, for example, the cluster ID, may be used as the initial classification information of the corpus information in the cluster.
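Step two can be sketched as follows, with the clustering algorithm passed in as a hypothetical callable `cluster_fn` that maps a list of texts to a list of cluster IDs. The label naming scheme (`cluster_<id>`, extended with underscores until unused) is an assumption made only to guarantee that the new classification value differs from existing ones.

```python
from collections import Counter

NO_CLASS = "other"  # hypothetical marker for "no classification"

def relabel_unclassified(texts, labels, cluster_fn, second_threshold, size_threshold):
    """Re-cluster unclassified items and promote large clusters to new classes."""
    unclassified = [i for i, label in enumerate(labels) if label == NO_CLASS]
    if len(unclassified) <= second_threshold:
        return list(labels)                    # too few items to re-cluster
    cluster_ids = cluster_fn([texts[i] for i in unclassified])
    sizes = Counter(cluster_ids)
    existing = set(labels)
    fresh = {}
    for cid, size in sizes.items():
        if size > size_threshold:
            name = f"cluster_{cid}"
            while name in existing:            # keep the new class value unique
                name += "_"
            fresh[cid] = name
            existing.add(name)
    new_labels = list(labels)
    for idx, cid in zip(unclassified, cluster_ids):
        if cid in fresh:
            new_labels[idx] = fresh[cid]
    return new_labels

# Toy clustering: group texts by their first character.
demo_texts = ["refund a", "refund b", "refund c", "ship x", "hello"]
demo_labels = ["other", "other", "other", "other", "greet"]
demo_step_two = relabel_unclassified(
    demo_texts, demo_labels, lambda ts: [t[0] for t in ts],
    second_threshold=2, size_threshold=2,
)
```

Items in clusters that stay at or below the size threshold remain unclassified and can be picked up again in a later cycle.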

Step three, the corrected initial classification information of each corpus information in the to-be-classified set is merged according to the confusion parameter among the corrected initial classification information of each corpus information in the to-be-classified set, to obtain new initial classification information of each corpus information in the to-be-classified set.

Specifically, if the value of the confusion parameter of the two classes to which the corpus information in the corrected to-be-classified collection belongs is greater than the third threshold, the initial classification information of the corpus information belonging to the two classes is merged into the same initial classification information, and the confusion parameter is used for representing the confusion degree between the two classes.

Optionally, the confusion parameter may be calculated using formula (1).

In formula (1), $N_{cate_i, cate_j}$ represents the number of pieces of corpus information whose actual intent is $cate_i$ but which are misclassified as $cate_j$, $N_{cate_i}$ represents the actual number of $cate_i$, and the remaining term represents the number predicted as $cate_i$.
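The merging in step three can be sketched as below. Formula (1) itself is not reproduced in this text, so `stand_in_confusion` is explicitly NOT formula (1): it is a placeholder ratio built from the same kinds of counts, included only so that the merge logic can run. All function names are hypothetical.

```python
from collections import Counter

def confusion_counts(actual, predicted):
    """Pairwise (actual, predicted) counts plus per-class totals."""
    return Counter(zip(actual, predicted)), Counter(actual), Counter(predicted)

def stand_in_confusion(pair, n_actual, n_predicted, a, b):
    # NOT formula (1): a placeholder ratio of cross-errors to class sizes.
    return (pair[(a, b)] + pair[(b, a)]) / (n_actual[a] + n_actual[b])

def merge_confused(actual, predicted, threshold, confusion=stand_in_confusion):
    """Merge every pair of classes whose confusion value exceeds the threshold."""
    pair, n_actual, n_predicted = confusion_counts(actual, predicted)
    classes = sorted(n_actual)
    parent = {c: c for c in classes}

    def find(c):
        while parent[c] != c:      # follow links to the representative class
            c = parent[c]
        return c

    for i, a in enumerate(classes):
        for b in classes[i + 1:]:
            if confusion(pair, n_actual, n_predicted, a, b) > threshold:
                parent[find(b)] = find(a)
    return [find(c) for c in actual]

demo_actual = ["a", "a", "a", "b", "b", "c", "c", "c"]
demo_predicted = ["a", "b", "b", "a", "b", "c", "c", "c"]
demo_merged = merge_confused(demo_actual, demo_predicted, threshold=0.5)
```

In the demo, classes "a" and "b" are confused in 3 of 5 items and are merged into one class, while "c" is never confused and stays separate. Any other confusion measure with the same call signature can be passed through the `confusion` argument.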

The following describes a process of obtaining initial classification information of each corpus information of the second corpus set related to the above step S406.

Firstly, inputting each corpus information of the second corpus set into the initial text classification model to obtain reference classification information of each corpus information of the second corpus set and a probability value of the reference classification information.

The initial text classification model refers to the initial text classification model updated in step S403.

Secondly, the initial classification information of each corpus information of the second corpus set is obtained according to the reference classification information and the probability value of each corpus information of the second corpus set. Specifically, for reference classification information whose probability value is greater than a fourth threshold, the reference classification information may be used directly as the initial classification information of the corpus information. For reference classification information whose probability value is less than or equal to the fourth threshold, the classification value of the reference classification information is modified to no classification.

Further, if the number of pieces of corpus information whose reference classification value is no classification is greater than a fifth threshold, the corpus information whose classification value is no classification is clustered into at least one set, where the classification values of the corpus information in the same set are the same, and the initial classification information of each corpus information in the second corpus set is obtained according to the clustering result. Optionally, if the clustering result contains a cluster in which the number of pieces of corpus information is greater than the third threshold, the classification value of the reference classification information of the corpus information in that cluster is modified to the same value. This value is a new classification value, that is, it differs from the classification values of the other initial classification information in the to-be-classified set, and it is used as the initial classification information of the corpus information in the cluster. For corpus information in the other sets of the clustering result, the reference classification information is used as the initial classification information.
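The probability-threshold part of this procedure can be sketched as follows; the subsequent clustering of the unclassified items is omitted. `NO_CLASS` and the function name are hypothetical.

```python
NO_CLASS = "other"  # hypothetical marker for "no classification"

def labels_for_new_set(reference, probs, fourth_threshold):
    """Accept confident reference labels; mark the rest as unclassified."""
    return [ref if prob > fourth_threshold else NO_CLASS
            for ref, prob in zip(reference, probs)]

demo_new_labels = labels_for_new_set(
    reference=["refund", "shipping", "refund"],
    probs=[0.95, 0.40, 0.80],
    fourth_threshold=0.80,
)
```

A probability value exactly equal to the threshold is treated as not confident, matching the "less than or equal to" wording above.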

Fig. 6 is a block diagram of an automatic response processing apparatus according to an embodiment of the present application, and as shown in fig. 6, the apparatus includes:

the obtaining module 601 is configured to obtain a text of query information input by a user.

A processing module 602, configured to input a text of the query information into a target text classification model, so as to obtain text classification information of the query information, where the target text classification model is obtained by using a plurality of target training texts to train an intermediate text classification model, where the target training text includes a plurality of corpus information and target classification information of each corpus information, the target classification information is obtained by processing the corpus information based on an initial text classification model in advance, and the intermediate text classification model is obtained by updating the initial text classification model when the initial text classification model processes the corpus information; determining response information of the inquiry information according to the text classification information of the inquiry information; and outputting response information of the inquiry information.

In an alternative embodiment, the processing module 602 is further configured to:

In an optional implementation, the processing module 602 is specifically configured to:

modifying the classification value of the initial classification information of the corpus information meeting a first condition to no classification, wherein the first condition comprises: the probability value of the reference classification information of the corpus information is smaller than a first threshold, the initial classification information of the corpus information is different from the reference classification information, and the classification value of the initial classification information of the corpus information is not no classification.

If the number of pieces of corpus information in the to-be-classified set whose initial classification value is no classification is larger than a second threshold, clustering the corpus information whose classification value is no classification into at least one set, wherein the classification values of the corpus information in the same set are the same, and correcting the initial classification information of the corpus information whose classification value is no classification in the to-be-classified set according to the clustering result.

Merging the corrected initial classification information of each corpus information in the to-be-classified set according to the confusion parameter among the corrected initial classification information of each corpus information in the to-be-classified set, to obtain new initial classification information of each corpus information in the to-be-classified set.

The automatic response processing apparatus provided in the embodiment of the present application may perform the method steps in the foregoing method embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.

It should be noted that the division of the above apparatus into modules is only a logical division; in an actual implementation the modules may be wholly or partially integrated into one physical entity, or may be physically separated. The modules may all be implemented as software called by a processing element, or all be implemented in hardware; alternatively, some modules may be implemented as software called by a processing element and others in hardware. For example, the processing module may be a separately provided processing element, may be integrated in a chip of the apparatus, or may be stored in a memory of the apparatus in the form of program code whose function is called and executed by a processing element of the apparatus. Other modules are implemented similarly. In addition, all or part of the modules may be integrated together or implemented independently. The processing element described herein may be an integrated circuit having signal processing capabilities. In implementation, each step of the above method or each of the above modules may be implemented by an integrated logic circuit of hardware in a processor element or by instructions in the form of software.

For example, the above modules may be one or more integrated circuits configured to implement the above methods, such as one or more Application Specific Integrated Circuits (ASICs), one or more digital signal processors (DSPs), or one or more Field Programmable Gate Arrays (FPGAs), among others. For another example, when some of the above modules are implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a Central Processing Unit (CPU) or another processor that can call program code. As another example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SOC).

The above embodiments may be implemented wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a network of computers, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.

Fig. 7 is a schematic structural diagram of an electronic device 700 according to an embodiment of the present application. As shown in fig. 7, the electronic device may include a processor 71, a memory 72, a communication interface 73 and a system bus 74. The memory 72 and the communication interface 73 are connected to the processor 71 through the system bus 74 and communicate with one another through it. The memory 72 is used for storing computer-executable instructions, the communication interface 73 is used for communicating with other devices, and the processor 71 implements the scheme of the embodiments shown in fig. 2 to 5 when executing the computer-executable instructions.

The system bus mentioned in fig. 7 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The system bus may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus. The communication interface is used for realizing communication between the database access device and other equipment (such as a client, a read-write library and a read-only library). The memory may comprise Random Access Memory (RAM) and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.

The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components.

Optionally, an embodiment of the present application further provides a storage medium, where instructions are stored in the storage medium, and when the instructions are run on a computer, they cause the computer to execute the method according to the embodiments shown in fig. 2 to 5.

Optionally, an embodiment of the present application further provides a chip for executing the instruction, where the chip is configured to execute the method in the embodiment shown in fig. 2 to 5.

The embodiment of the present application further provides a program product, where the program product includes a computer program, where the computer program is stored in a storage medium, and the computer program can be read from the storage medium by at least one processor, and when the computer program is executed by the at least one processor, the method of the embodiment shown in fig. 2 to 5 may be implemented.

In the embodiments of the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes the association relationship of the associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone, where A and B may be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship; in a formula, the character "/" indicates that the associated objects before and after it are in a "division" relationship. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of single or plural items. For example, at least one of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may each be single or multiple.

It is to be understood that the various numerical references referred to in the embodiments of the present application are merely for convenience of description and distinction and are not intended to limit the scope of the embodiments of the present application.

It should be understood that, in the embodiment of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiment of the present application.

Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims

1. An automatic response processing method, comprising:

acquiring a text of query information input by a user;

inputting the text of the query information into a target text classification model to obtain text classification information of the query information, wherein the target text classification model is obtained by training an intermediate text classification model by using a plurality of target training texts, the target training texts comprise a plurality of corpus information and target classification information of each corpus information, the target classification information is obtained by processing the corpus information based on an initial text classification model in advance, and the intermediate text classification model is obtained by updating the initial text classification model when the initial text classification model processes the corpus information;

determining response information of the query information according to the text classification information of the query information;

and outputting the response information of the query information.

2. The method of claim 1, wherein, before inputting the text of the query information into the target text classification model, the method further comprises:

determining the target classification information of the corpus information according to initial classification information of the corpus information and reference classification information of the corpus information output by the initial text classification model, and updating the initial text classification model to obtain the intermediate text classification model;

and training the intermediate text classification model by using the corpus information and the target classification information of the corpus information to obtain the target text classification model.

3. The method according to claim 2, wherein said determining target classification information of said corpus information and updating said initial text classification model according to initial classification information of said corpus information and reference classification information of said corpus information outputted by said initial text classification model comprises:

dividing the plurality of corpus information into a preset number of corpus sets, wherein the preset number is greater than or equal to 2, and taking a first corpus set in the preset number of corpus sets as a to-be-classified set;

clustering the to-be-classified set into at least one subset to obtain initial classification information of each corpus information in the to-be-classified set, wherein the initial classification information of each corpus information in the same subset is the same;

A. updating the initial text classification model according to the initial classification information of each corpus information in the to-be-classified set and the reference classification information of each corpus information in the to-be-classified set output by the initial text classification model, and determining whether the initial classification information of the corpus information needs to be corrected; if yes, executing step B, otherwise, executing step C;

B. correcting the initial classification information of the corpus information according to the initial classification information and the reference classification information to obtain new initial classification information of each corpus information in the to-be-classified set, and executing step A;

C. if the to-be-classified set does not include all the corpus information, adding a second corpus set in the corpus sets with the preset number into the to-be-classified set to obtain a new to-be-classified set, wherein initial classification information of each corpus information in the second corpus set is obtained by processing the second corpus set based on the initial text classification model, and executing the step A; and if the to-be-classified set comprises all the corpus information, taking the initial classification information of each corpus information in the to-be-classified set as the target classification information of each corpus information, and ending.

4. The method according to claim 3, wherein said correcting the initial classification information of the corpus information according to the initial classification information and the reference classification information to obtain new initial classification information of each corpus information in the to-be-classified set comprises:

modifying, into no classification, the classification value of the initial classification information of the corpus information that meets a first condition, wherein the first condition comprises: the probability value of the reference classification information of the corpus information is smaller than a first threshold, the initial classification information of the corpus information is different from the reference classification information, and the classification value of the initial classification information of the corpus information is not classified;

if the number of pieces of corpus information in the to-be-classified set whose initial classification information has the classification value of no classification is greater than a second threshold, clustering the corpus information whose classification value is no classification into at least one set, wherein the classification values of the corpus information in the same set are the same, and correcting, according to the clustering result, the initial classification information of the corpus information whose classification value is no classification in the to-be-classified set;

5. The method according to claim 4, wherein said correcting, according to the clustering result, the initial classification information of the corpus information whose classification value is no classification in the to-be-classified set comprises:

6. The method according to claim 4, wherein said merging the corrected initial classification information of each corpus information in the to-be-classified set according to the confusion parameter between the corrected initial classification information of each corpus information in the to-be-classified set, to obtain new initial classification information of each corpus information in the to-be-classified set, comprises:

7. The method according to claim 3, wherein before adding the second corpus set of the preset number of corpus sets to the to-be-classified set to obtain a new to-be-classified set, the method further comprises:

inputting each corpus information of the second corpus set into the initial text classification model to obtain reference classification information of each corpus information of the second corpus set and a probability value of the reference classification information;

if the probability value is smaller than a fourth threshold value, modifying the classification value of the reference classification information into no classification;

8. The method according to any one of claims 3-7, wherein before dividing the corpus information into a preset number of corpus sets, the method further comprises:

9. An automatic response processing apparatus, comprising:

an acquisition module, configured to acquire a text of query information input by a user;

a processing module, configured to input the text of the query information into a target text classification model, so as to obtain text classification information of the query information, where the target text classification model is obtained by using a plurality of target training texts to train an intermediate text classification model, where the target training texts include a plurality of corpus information and target classification information of each corpus information, the target classification information is obtained by processing the corpus information based on an initial text classification model in advance, and the intermediate text classification model is obtained by updating the initial text classification model when the initial text classification model processes the corpus information; and

determine response information of the query information according to the text classification information of the query information; and

output the response information of the query information.

10. An electronic device, comprising:

a memory for storing program instructions;

a processor for invoking and executing the program instructions in said memory to perform the method steps of any one of claims 1-8.

11. A readable storage medium, characterized in that a computer program is stored in the readable storage medium, the computer program being used to perform the method of any one of claims 1-8.
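
For illustration only, the response flow recited in claim 1 and the low-probability relabeling recited in claim 7 might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the application: the names `respond`, `label_with_threshold`, `toy_classifier`, the `NO_CLASS` label string, the keyword-based stand-in for a trained model, and the fallback reply are all hypothetical and do not appear in the application.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical label standing in for the "no classification" value.
NO_CLASS = "no classification"

# Assumed interface: a classifier maps a text to (label, probability).
Classifier = Callable[[str], Tuple[str, float]]


def label_with_threshold(
    texts: List[str], classify: Classifier, threshold: float
) -> List[Tuple[str, str]]:
    """Label each text with the model output; a label whose probability
    is below the threshold is changed to NO_CLASS (the claim 7 idea)."""
    labeled = []
    for text in texts:
        label, prob = classify(text)
        if prob < threshold:
            label = NO_CLASS
        labeled.append((text, label))
    return labeled


def respond(
    query: str, classify: Classifier, responses: Dict[str, str], fallback: str
) -> str:
    """Classify the query text, then look up the response for that class
    (the claim 1 flow). The fallback reply is an assumption of this sketch."""
    label, _ = classify(query)
    return responses.get(label, fallback)


def toy_classifier(text: str) -> Tuple[str, float]:
    """Keyword-based stand-in for a trained text classification model."""
    lowered = text.lower()
    if "refund" in lowered:
        return ("refund", 0.9)
    if "ship" in lowered:
        return ("shipping", 0.8)
    return ("shipping", 0.3)  # low-confidence guess


if __name__ == "__main__":
    replies = {"refund": "Refunds take 3-5 days.", "shipping": "Orders ship daily."}
    print(respond("How do I get a refund?", toy_classifier, replies, "Please rephrase."))
    print(label_with_threshold(["hello"], toy_classifier, threshold=0.5))
```

In this sketch the threshold passed to `label_with_threshold` plays the role of the fourth threshold of claim 7; its value here is arbitrary.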