CN111159360B - Method and device for obtaining query topic classification model and query topic classification - Google Patents


Info

Publication number
CN111159360B
Authority
CN
China
Prior art keywords
sample
question
answer
topic
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911422174.6A
Other languages
Chinese (zh)
Other versions
CN111159360A (en)
Inventor
杨帆
方磊
方四安
方昕
徐承
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Ustc Iflytek Co ltd
Original Assignee
Hefei Ustc Iflytek Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Ustc Iflytek Co ltd
Priority to CN201911422174.6A
Publication of CN111159360A
Application granted
Publication of CN111159360B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Tourism & Hospitality (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Educational Administration (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Human Computer Interaction (AREA)
  • Development Economics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a method and a device for obtaining a query topic classification model and for query topic classification. The method comprises: first, performing natural language preprocessing on the sample question-answer pairs in a sample inquiry record to obtain the word segments of each sample question sentence and of each sample answer sentence; then, training an attention-based convolutional neural network according to the sample question word segments and their corresponding contribution scores, the sample answer word segments and their corresponding contribution scores, and the labeled topic category of each sample question-answer pair, to obtain the query topic classification model. The contribution scores add topic category information to the sample question-answer pairs and strengthen the training for query topic classification, and the attention-based convolutional neural network can fully learn the relationship between "strong topic" word segments and topic categories. As a result, the query topic classification model classifies topics better, and the accuracy of subsequent query topic classification is improved.

Description

Method and device for obtaining query topic classification model and query topic classification
Technical Field
The application relates to the technical field of data analysis, and in particular to a method and a device for obtaining a query topic classification model and for query topic classification.
Background
Inquiry records are an important basis for solving cases and securing convictions. To spare staff from manually extracting and labeling element information from an inquiry record, the content of the record needs to be structured automatically; that is, the question-answer pairs in the inquiry record need to be classified by topic automatically.
Currently, question-answer pairs in an inquiry record are generally classified with a machine-learning-based topic classification method: features are extracted from the question-answer pairs, and the extracted features are input into a simple classification model for parameter training and classification prediction.
However, the inventors found that this machine-learning-based topic classification method has low accuracy. In tests on question-answer pairs from actual inquiry records, the classification accuracy was low; for some topics it reached only about 40%, far short of practical usability.
Disclosure of Invention
In view of this, the embodiments of the present application provide a method and a device for obtaining a query topic classification model and for query topic classification, so that the query topic classification model classifies topics better and the accuracy of subsequent query topic classification is improved.
In a first aspect, an embodiment of the present application provides a method for obtaining a query topic classification model, where the method includes:
performing natural language preprocessing on the sample question-answer pairs in a sample inquiry record to obtain the sample question word segments and the sample answer word segments;
and training an attention-based convolutional neural network according to the sample question word segments and their corresponding contribution scores, the sample answer word segments and their corresponding contribution scores, and the labeled topic category of the sample question-answer pair, to obtain the query topic classification model.
Optionally, training the attention-based convolutional neural network according to the sample question word segments and their corresponding contribution scores, the sample answer word segments and their corresponding contribution scores, and the labeled topic category of the sample question-answer pair, to obtain the query topic classification model, includes:
obtaining a first matrix based on the sample question word segments and their corresponding contribution scores; obtaining a second matrix based on the sample answer word segments and their corresponding contribution scores;
splicing the first matrix and the second matrix to obtain a third matrix;
obtaining a feature vector of the sample question-answer pair based on the third matrix and a weight vector, the weight vector being obtained based on the transpose of the third matrix;
obtaining the predicted topic category of the sample question-answer pair based on the feature vector and a preset activation function;
and training the network parameters of the attention-based convolutional neural network according to the predicted topic category and the labeled topic category to obtain the query topic classification model.
Optionally, obtaining the first matrix based on the sample question word segments and their corresponding contribution scores includes:
obtaining a word vector for each sample question word segment;
splicing the word vector of each sample question word segment with its corresponding contribution score to obtain the first matrix;
and obtaining the second matrix based on the sample answer word segments and their corresponding contribution scores includes:
obtaining a word vector for each sample answer word segment;
and splicing the word vector of each sample answer word segment with its corresponding contribution score to obtain the second matrix.
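The matrix steps above can be illustrated with a small pure-Python sketch. It is not the patented network: the convolutional layers are omitted, and the parameter vector `w`, the dot-product-plus-softmax form of the weight vector, and the toy word vectors are all assumptions made for illustration. The text only states that the weight vector is derived from the transpose of the third matrix.

```python
import math

def splice(word_vectors, scores):
    """Append each word segment's contribution score to its word vector (one row per segment)."""
    return [vec + [score] for vec, score in zip(word_vectors, scores)]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(matrix, w):
    """Assumed attention form: weights = softmax(w . row), feature = weighted sum of rows."""
    logits = [sum(wi * xi for wi, xi in zip(w, row)) for row in matrix]
    weights = softmax(logits)
    dim = len(matrix[0])
    feature = [sum(weights[i] * matrix[i][j] for i in range(len(matrix)))
               for j in range(dim)]
    return feature, weights

# Toy 2-dimensional word vectors and contribution scores (hypothetical values).
q_vecs, q_scores = [[0.1, 0.2], [0.3, 0.4]], [0.5, 0.0]
a_vecs, a_scores = [[0.5, 0.6]], [0.9]

first = splice(q_vecs, q_scores)    # first matrix: question word segments
second = splice(a_vecs, a_scores)   # second matrix: answer word segments
third = first + second              # third matrix: row-wise splice of the two

w = [0.0, 0.0, 1.0]                 # hypothetical parameter; would be learned in training
feature, weights = attention_pool(third, w)
```

With this choice of `w`, the attention weight of a row depends only on its contribution score, so the answer word segment with score 0.9 dominates the pooled feature vector.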
Optionally, the step of obtaining each contribution score includes:
obtaining a contribution score for each word segment from the sample question word segments and the sample answer word segments by using the term frequency-inverse document frequency (TF-IDF) algorithm;
and removing, based on a preset word list, the contribution scores of topic-irrelevant word segments from the contribution scores of the word segments, to obtain each contribution score.
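As a rough illustration of these two steps, the sketch below computes a plain TF-IDF score per word segment and then neutralizes the words found in a preset list. The exact TF-IDF variant is not specified in the text, so the unsmoothed formula here is an assumption; treating "removing" a score as setting it to zero is also an assumption, chosen so that every word segment still has a score to splice onto its word vector. The function names and sample tokens are hypothetical.

```python
import math
from collections import Counter

def tfidf_scores(documents):
    """documents: list of token lists. Returns one {token: score} dict per document."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))      # count each token once per document
    scores = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            tok: (cnt / total) * math.log(n_docs / doc_freq[tok])
            for tok, cnt in counts.items()
        })
    return scores

def drop_irrelevant(scores, irrelevant_words):
    """Zero the score of every token that appears in the preset word list."""
    return {tok: (0.0 if tok in irrelevant_words else s) for tok, s in scores.items()}

# Two toy question-answer pairs, already segmented (hypothetical data).
docs = [["you", "steal", "phone"], ["you", "drive", "car"]]
scores = tfidf_scores(docs)
filtered = drop_irrelevant(scores[0], {"phone"})
```

A word that occurs in every pair, such as "you" above, receives a score of zero from the inverse document frequency term alone; the preset word list handles topic-irrelevant words that TF-IDF would otherwise score highly.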
Optionally, performing natural language preprocessing on the sample question-answer pairs in the sample inquiry record to obtain the sample question word segments and the sample answer word segments includes:
performing word segmentation on the sample question-answer pairs in the sample inquiry record to obtain question word segments and answer word segments;
performing preset entity type character replacement on the entity nouns in the question word segments and the answer word segments that match a preset entity type, to obtain the sample question word segments and the sample answer word segments; the preset entity types comprise numbers, times, person names, place names and/or organization names.
Optionally, if the preset entity type is a number and/or a time, the preset entity type character replacement is based on rule matching; and if the preset entity type is a person name, a place name and/or an organization name, the preset entity type character replacement is based on named entity prediction.
In a second aspect, an embodiment of the present application provides a method for categorizing a query topic, where the method includes:
performing natural language preprocessing on the question-answer pairs to be classified in an inquiry record to be classified, to obtain the question word segments to be classified and the answer word segments to be classified;
inputting the question word segments to be classified and their corresponding contribution scores, and the answer word segments to be classified and their corresponding contribution scores, into a query topic classification model to obtain the predicted topic category and the predicted probability of the question-answer pair to be classified;
determining a target topic category of the question-answer pair to be classified based on the predicted topic category and the predicted probability of the question-answer pair to be classified;
wherein the query topic classification model is obtained according to the method of any one of the first aspect.
Optionally, the method further includes:
obtaining a topic category set corresponding to case information of the to-be-classified inquiry record;
correspondingly, the step of determining the target topic category of the question-answer pair to be classified based on the predicted topic category and the predicted probability of the question-answer pair to be classified specifically includes:
and determining the target topic category of the question-answer pair to be classified by combining the topic category set based on the predicted topic category and the predicted probability of the question-answer pair to be classified.
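One possible reading of this optional step is sketched below. The decision rule is not given in this part of the text, so the probability threshold, the fallback label, and the function name are all hypothetical: the prediction is kept only when it belongs to the topic category set for the case and its probability is high enough.

```python
def choose_target_category(predicted, probability, allowed_categories,
                           threshold=0.5, fallback="other"):
    """Keep the prediction only if the case's topic category set allows it
    and the model is confident enough; otherwise return a fallback label.
    The threshold and the fallback label are illustrative assumptions."""
    if predicted in allowed_categories and probability >= threshold:
        return predicted
    return fallback

# Hypothetical topic category set for a theft case.
theft_topics = {"time of offence", "stolen goods", "accomplices"}
kept = choose_target_category("stolen goods", 0.82, theft_topics)
rejected = choose_target_category("drug source", 0.91, theft_topics)
```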
In a third aspect, an embodiment of the present application provides an apparatus for obtaining a query topic classification model, where the apparatus includes:
a first obtaining unit, configured to perform natural language preprocessing on the sample question-answer pairs in a sample inquiry record to obtain the sample question word segments and the sample answer word segments;
and a second obtaining unit, configured to train an attention-based convolutional neural network according to the sample question word segments and their corresponding contribution scores, the sample answer word segments and their corresponding contribution scores, and the labeled topic category of the sample question-answer pair, to obtain the query topic classification model.
In a fourth aspect, an embodiment of the present application provides an apparatus for classifying a query topic, where the apparatus includes:
a third obtaining unit, configured to perform natural language preprocessing on the question-answer pairs to be classified in an inquiry record to be classified, to obtain the question word segments to be classified and the answer word segments to be classified;
a fourth obtaining unit, configured to input the question word segments to be classified and their corresponding contribution scores, and the answer word segments to be classified and their corresponding contribution scores, into the query topic classification model, and obtain the predicted topic category and the predicted probability of the question-answer pair to be classified;
a first determining unit, configured to determine a target topic category of the question-answer pair to be classified based on the predicted topic category and the predicted probability of the question-answer pair to be classified;
wherein the query topic classification model is obtained according to the method of any one of the first aspect.
In a fifth aspect, an embodiment of the present application provides a terminal device, where the terminal device includes a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to execute, according to instructions in the program code, the method for obtaining a query topic classification model according to any one of the above first aspect, or the query topic classification method according to any one of the above second aspect.
In a sixth aspect, an embodiment of the present application provides a computer-readable storage medium for storing program code, the program code being used to execute the method for obtaining a query topic classification model according to any one of the above first aspect, or the query topic classification method according to any one of the above second aspect.
Compared with the prior art, the method has at least the following advantages:
with the technical solution of the embodiments of the present application, natural language preprocessing is first performed on the sample question-answer pairs in a sample inquiry record to obtain the sample question word segments and the sample answer word segments; then an attention-based convolutional neural network is trained according to the sample question word segments and their corresponding contribution scores, the sample answer word segments and their corresponding contribution scores, and the labeled topic category of the sample question-answer pair, to obtain the query topic classification model. The contribution scores of the sample question word segments and of the sample answer word segments add topic category information to the sample question-answer pairs and strengthen the training for query topic classification, and the attention-based convolutional neural network can fully learn the relationship between "strong topic" word segments and topic categories. As a result, the query topic classification model classifies topics better, and the accuracy of subsequent query topic classification is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the description of the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present application, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic diagram of a system framework related to an application scenario in an embodiment of the present application;
fig. 2 is a schematic flowchart of a method for obtaining a topic classification model according to an embodiment of the present application;
fig. 3 is a schematic diagram of obtaining sample question word segments and sample answer word segments from the sample question-answer pairs in a sample inquiry record according to an embodiment of the present application;
fig. 4 is a schematic diagram of a new word vector corresponding to a sample question word segment or to a sample answer word segment provided in an embodiment of the present application;
fig. 5 is a schematic diagram of obtaining feature vectors of sample question-answer pairs according to an embodiment of the present application;
fig. 6 is a schematic diagram of a weight vector a according to an embodiment of the present disclosure;
fig. 7 is a flowchart illustrating a method for classifying query topics according to an embodiment of the present application;
fig. 8 is a schematic diagram illustrating a method for determining a target topic category of a question-answer pair to be classified according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of an apparatus for obtaining a topic classification model according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a device for classifying query topics according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Inquiry records are an important basis for solving cases and securing convictions, so automatic topic classification of the question-answer pairs in an inquiry record is particularly important. The inventors found that a topic classification method based on keywords and rules requires a large number of "strong topic" words and syntactic rules to be defined, which consumes a great deal of manpower, effort and time and inevitably requires manual intervention. The existing machine-learning-based topic classification method, in turn, has low accuracy: in tests on question-answer pairs from actual inquiry records, the classification accuracy was low, and for some topics it reached only about 40%, far short of practical usability.
To solve the above problems, in the embodiments of the present application, natural language preprocessing is performed on the sample question-answer pairs in a sample inquiry record to obtain the sample question word segments and the sample answer word segments; an attention-based convolutional neural network is then trained according to the sample question word segments and their corresponding contribution scores, the sample answer word segments and their corresponding contribution scores, and the labeled topic category of the sample question-answer pair, to obtain the query topic classification model. The contribution scores of the sample question word segments and of the sample answer word segments add topic category information to the sample question-answer pairs and strengthen the training for query topic classification, and the attention-based convolutional neural network can fully learn the relationship between "strong topic" word segments and topic categories. As a result, the query topic classification model classifies topics better, and the accuracy of subsequent query topic classification is improved.
For example, the embodiments of the present application may be applied to the scenario shown in fig. 1, which includes a user terminal 101 and a server 102 that interact with each other. The user terminal 101 obtains a sample inquiry record and sends it to the server 102, and the server 102 obtains the query topic classification model using the implementation for obtaining a query topic classification model in the embodiments of the present application. When the user terminal 101 obtains an inquiry record to be classified and sends it to the server 102, the server 102 determines the target topic category of each question-answer pair to be classified in that record, using the implementation for query topic classification in the embodiments of the present application, and sends the target topic category to the user terminal 101, so that the user terminal 101 displays it to the user.
It is to be understood that, in the above application scenario, although the actions of the embodiments of the present application are described as being performed by the server 102, the present application is not limited in terms of the execution subject as long as the actions disclosed in the embodiments of the present application are performed.
It can be understood that the foregoing scenario is only one example of the scenario provided in the embodiment of the present application, and the embodiment of the present application is not limited to this scenario.
The following describes in detail specific implementations of a method and an apparatus for obtaining a query topic classification model and a query topic classification in the embodiments of the present application by embodiments in combination with the accompanying drawings.
Exemplary method
Method embodiment one
Referring to fig. 2, a schematic flow chart of a method for obtaining a query topic classification model in an embodiment of the present application is shown. In this embodiment, the method may include, for example, the steps of:
Step 201: performing natural language preprocessing on the sample question-answer pairs in a sample inquiry record to obtain the sample question word segments and the sample answer word segments.
It can be understood that, before query topic classification training can be performed on the sample inquiry record to obtain the query topic classification model, the sample question-answer pairs in the sample inquiry record must be processed with natural language preprocessing techniques to obtain the sample question word segments and the sample answer word segments.
In practical applications, most inquiry record data is stored as images. Such data can be converted into text with optical character recognition (OCR) to obtain the sample inquiry record. For example, an OCR model is trained in advance; the inquiry record data in image form is input into the OCR model, and the sample inquiry record is output.
First, the sample question-answer pairs are obtained from the sample inquiry record. Because the sample inquiry record has a specific preset format and a specific preset syntactic pattern of one question followed by one answer, the preset format can first be used to delete record information unrelated to the inquiry, such as the date at the beginning of the record; the preset syntactic pattern is then used to segment the sample inquiry record into question-answer pairs, yielding sample question-answer pairs arranged in order. Therefore, in an optional implementation of the embodiments of the present application, the step of obtaining the sample question-answer pairs may include the following steps:
Step A: deleting the record information unrelated to the inquiry from the sample inquiry record, based on the preset format of the sample inquiry record.
Step B: segmenting the sample inquiry record based on its preset syntactic pattern to obtain the sample question-answer pairs.
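Steps A and B can be sketched as follows. The preset format and syntactic pattern are not spelled out in the text, so this example assumes that every question line starts with `Q:` and every answer line with `A:` (a Chinese record would more plausibly use markers such as `问:` and `答:`), and that any other line is record information unrelated to the inquiry. The function name and sample text are hypothetical.

```python
def split_qa_pairs(transcript, q_prefix="Q:", a_prefix="A:"):
    """Return ordered (question, answer) pairs from a one-question-one-answer record.

    Lines with neither prefix (header date, place, signatures) are dropped,
    which plays the role of step A; pairing the remaining lines is step B.
    """
    pairs = []
    question = None
    for line in transcript.splitlines():
        line = line.strip()
        if line.startswith(q_prefix):
            question = line[len(q_prefix):].strip()
        elif line.startswith(a_prefix) and question is not None:
            pairs.append((question, line[len(a_prefix):].strip()))
            question = None
    return pairs

transcript = """Date: 2019-01-01
Q: What is your name?
A: Zhang San.
Q: Where were you last night?
A: At home.
Signature"""
pairs = split_qa_pairs(transcript)
```

This sketch keeps only single-line questions and answers; a record with answers that wrap across lines would need the continuation lines appended to the current answer.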
Next, the sample question word segments and the sample answer word segments are obtained from the sample question-answer pairs. The sample question-answer pairs in the sample inquiry record are processed with a word segmentation technique to obtain question word segments and answer word segments. These word segments may include entity nouns of preset entity types such as numbers, times, person names, place names and organization names. Such entity nouns take many different forms, and training on them directly would make the training samples too sparse and reduce the classification accuracy of the model. Before subsequent training, the entity nouns that match a preset entity type therefore need to be replaced with the corresponding preset entity type character; that is, a preset entity type character replacement technique is applied to the entity nouns in the question word segments and the answer word segments to obtain the sample question word segments and the sample answer word segments. Therefore, in an optional implementation of the embodiments of the present application, step 201 may include, for example, the following steps:
Step C: performing word segmentation on the sample question-answer pairs in the sample inquiry record to obtain question word segments and answer word segments.
Step D: performing preset entity type character replacement on the entity nouns in the question word segments and the answer word segments that match a preset entity type, to obtain the sample question word segments and the sample answer word segments; the preset entity types comprise numbers, times, person names, place names and/or organization names.
Specifically, numeric nouns may be replaced with the preset number character "num", time nouns with the preset time character "time", person names with the preset person name character "name", place names with the preset place name character "loc", and organization names with the preset organization name character "org". Numeric and time nouns are easy to identify and can be handled with rule-matching-based replacement. Person names, place names and organization names, however, are named entities: they cannot be enumerated exhaustively and are hard to identify, so they need to be handled with replacement based on named entity prediction, for example using a neural network predictor for named entity recognition. Therefore, in an optional implementation of the embodiments of the present application, if the preset entity type is a number and/or a time, the preset entity type character replacement is based on rule matching; and if the preset entity type is a person name, a place name and/or an organization name, the preset entity type character replacement is based on named entity prediction.
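The replacement step can be sketched as below, using the placeholder characters named above. The regular expressions are illustrative assumptions that cover only a few numeric and time formats, and the `ner_tags` dictionary is a hypothetical stand-in for the output of a named entity predictor, which the sketch does not implement.

```python
import re

# Assumed rule patterns: clock times like 18:30, dates like 2019-12-31, plain numbers.
TIME_RE = re.compile(r"^\d{1,2}:\d{2}$|^\d{4}-\d{1,2}-\d{1,2}$")
NUM_RE = re.compile(r"^\d+(\.\d+)?$")

def replace_entities(tokens, ner_tags=None):
    """Replace entity tokens with their preset entity type characters.

    ner_tags maps a token to "name", "loc" or "org"; it stands in for the
    prediction of a named entity recognition model (hypothetical input).
    """
    ner_tags = ner_tags or {}
    out = []
    for tok in tokens:
        if tok in ner_tags:          # named entity prediction
            out.append(ner_tags[tok])
        elif TIME_RE.match(tok):     # rule matching: time
            out.append("time")
        elif NUM_RE.match(tok):      # rule matching: number
            out.append("num")
        else:
            out.append(tok)
    return out

tokens = ["I", "met", "Zhang San", "at", "18:30", "with", "3", "phones"]
replaced = replace_entities(tokens, {"Zhang San": "name"})
```

Collapsing every concrete name, time and number into a handful of placeholder tokens is what reduces the sparsity described above: the model sees "name" thousands of times instead of each individual name once.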
As an example, fig. 3 is a schematic diagram of going from a sample question-answer pair in a sample query transcript to the sample question words and sample answer words. The sample question-answer pair in the figure undergoes natural language preprocessing, specifically word segmentation and preset entity type character replacement, which yields each sample question word and each sample answer word.
Step 202: train an attention-based convolutional neural network according to each sample question word and its corresponding contribution score, each sample answer word and its corresponding contribution score, and the labeled topic category of the sample question-answer pair, to obtain a query topic classification model.
It should be noted that the topic classification method based on keywords and rules requires a large number of "strong topic" words and syntactic rules to be set, which consumes considerable manpower, effort and time and inevitably requires manual intervention. The topic classification method based on machine learning has low accuracy: when actually tested on question-answer pairs from query transcripts, its classification accuracy was too low to be usable in practice. Therefore, in the embodiment of the present application, after each sample question word and each sample answer word of the sample question-answer pair are obtained in step 201, the sample question words, the sample answer words, the contribution scores representing how much each sample question word contributes to the topic, the contribution scores representing how much each sample answer word contributes to the topic, and the labeled topic category of the sample question-answer pair are all input into an attention-based convolutional neural network for training. This network can fully learn the relationship between "strong topic" words and topic categories, so the contribution scores add topic category information to the sample question-answer pair, strengthen the training of query topic classification, and yield a query topic classification model with a better classification effect.
First, each sample question word and each sample answer word contributes differently to the topic category of the sample question-answer pair. To add topic category information to the sample question-answer pairs in the query transcript and strengthen the training of query topic classification, different contribution scores need to be assigned to the sample question words and sample answer words. To avoid some "strong topic" words receiving small contribution scores while some "irrelevant topic" words receive large ones, in the embodiment of the present application the contribution score of every word among the sample question words and sample answer words is first obtained with the term frequency-inverse document frequency (TF-IDF) algorithm; the contribution scores of irrelevant-topic words are then removed using a preset word list made up of common irrelevant-topic words; and the contribution scores corresponding to the sample question words and to the sample answer words are finally obtained. Therefore, in an optional implementation of this embodiment, the step of obtaining the contribution scores may include the following steps:
Step E: based on each sample question word and each sample answer word, obtain the contribution score of each word using the term frequency-inverse document frequency algorithm.
For example, the contribution score of each participle is calculated using the following formula:
Figure BDA0002352671490000101
where $N_{i,j}$ denotes the number of occurrences of word $i$ in query transcript $j$, $D$ denotes the total number of query transcripts in the query transcript library, $D_{i \in j}$ denotes the total number of query transcripts in the library that contain word $i$, and $\mathrm{Score}_i$ denotes the contribution score of word $i$ in query transcript $j$.
Step F: based on a preset word list, remove the contribution scores of irrelevant-topic words from the contribution scores of the words, to obtain each contribution score.
Owing to the characteristics of the term frequency-inverse document frequency algorithm, some "irrelevant topic" words, such as "our", "goodbye" and "their", receive high contribution scores even though those scores are useless for the subsequent query topic classification. Common "irrelevant topic" words are therefore collected into a preset word list in advance, and the list is used to remove their contribution scores, so that every remaining contribution score is meaningful for the subsequent query topic classification.
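Steps E and F can be sketched as follows. The formula itself appears only as an image in the source, so this sketch assumes the textbook form Score_i = N_{i,j} × log(D / D_{i∈j}); the actual formula may differ (for example in smoothing or normalization). The function name and the sample data are hypothetical.

```python
import math
from collections import Counter


def contribution_scores(transcripts, j, irrelevant_words):
    """Score the words of transcript j against the whole transcript library.

    transcripts: list of token lists, one per query transcript.
    j: index of the transcript whose words are scored.
    irrelevant_words: preset word list of "irrelevant topic" words.
    """
    total = len(transcripts)                  # D
    doc_freq = Counter()                      # D_{i in j} for every word i
    for tokens in transcripts:
        doc_freq.update(set(tokens))
    counts = Counter(transcripts[j])          # N_{i,j}

    scores = {}
    for word, n in counts.items():
        if word in irrelevant_words:          # Step F: drop listed words
            continue
        # Step E: assumed textbook TF-IDF form
        scores[word] = n * math.log(total / doc_freq[word])
    return scores
```

For instance, with a three-transcript library in which "knife" occurs twice in the first transcript and nowhere else, the call `contribution_scores(library, 0, {"our"})` scores "knife" as 2 × log(3) and returns no entry for "our".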
Next, it should be noted that the topic categories of query transcripts are set in advance in light of actual case-handling practice. The topic categories may include three major categories: interrogation/inquiry subject, party situation, and case situation. The interrogation/inquiry subject category may include two minor categories, namely interrogation subject and inquiry subject. The party situation category may include five minor categories, such as personal basic information, organization basic information, family situation, physical and appearance characteristics, and relationship between the party and the victim. The case situation category may include nine minor categories, such as reason for the crime, tools used in the crime, course of the crime, behavior before the crime, behavior after the crime, whereabouts of stolen goods, injuries of the victim, and the criminal suspect's explanation. This gives sixteen minor categories in total. Correspondingly, the labeled topic category of a sample question-answer pair is one of these sixteen minor categories.
Finally, it should be noted that step 202 is implemented as follows. Each sample question word is combined with its corresponding contribution score to obtain a first matrix corresponding to the sample question, and each sample answer word is combined with its corresponding contribution score to obtain a second matrix corresponding to the sample answer. The first and second matrices are concatenated to obtain a third matrix corresponding to the sample question-answer pair. From the third matrix, a feature vector representing the sample question-answer pair is obtained by applying the attention mechanism with a weight vector. The topic category of the sample question-answer pair is predicted from the feature vector and a preset activation function. The predicted topic category is then compared with the labeled topic category to train the network parameters, yielding the query topic classification model. Therefore, in an optional implementation of this embodiment of the present application, step 202 may include the following steps:
Step G: obtain a first matrix based on each sample question word and its corresponding contribution score, and obtain a second matrix based on each sample answer word and its corresponding contribution score.
Obtaining the first matrix means the following: the word vector of each sample question word is obtained first; each word vector is then concatenated with the corresponding contribution score to obtain a new word vector for that sample question word; and the new word vectors of the sample question words form the first matrix. Therefore, in an optional implementation of the embodiment of the present application, the step in step G of obtaining the first matrix based on each sample question word and its corresponding contribution score may include the following steps:
Step G1: obtain the word vector of each sample question word.
Step G2: concatenate the word vector of each sample question word with the corresponding contribution score to obtain the first matrix.
Similarly, obtaining the second matrix means the following: the word vector of each sample answer word is obtained first; each word vector is then concatenated with the corresponding contribution score to obtain a new word vector for that sample answer word; and the new word vectors of the sample answer words form the second matrix. The step in step G of obtaining the second matrix based on each sample answer word and its corresponding contribution score may include the following steps:
Step G3: obtain the word vector of each sample answer word.
Step G4: concatenate the word vector of each sample answer word with the corresponding contribution score to obtain the second matrix.
In the embodiment of the present application, steps G1 to G2 and steps G3 to G4 may be performed simultaneously. As an example, fig. 4 is a schematic diagram of the new word vector corresponding to a sample question word or to a sample answer word: if the word vector of a sample question word has u dimensions, the new word vector of that sample question word has u + 1 dimensions; likewise, if the word vector of a sample answer word has u dimensions, the new word vector of that sample answer word has u + 1 dimensions.
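Steps G1 to G4 amount to appending one score column to each word vector. The following sketch shows this with plain Python lists; the `embeddings` lookup, the zero vector for unknown words, and the score of 0.0 for words without a contribution score are all assumptions for illustration, since the description does not say how word vectors are obtained or how removed scores are represented.

```python
def build_matrix(words, embeddings, scores):
    """Return one (u + 1)-dimensional row per word.

    words: segmented words of a question (or an answer).
    embeddings: dict mapping a word to its u-dimensional word vector
        (hypothetical; any word-vector source could be used).
    scores: dict mapping a word to its contribution score.
    """
    u = len(next(iter(embeddings.values())))
    matrix = []
    for word in words:
        vector = embeddings.get(word, [0.0] * u)   # assumption: zeros if unknown
        score = scores.get(word, 0.0)              # assumption: 0.0 if no score
        matrix.append(list(vector) + [score])      # u dims + 1 score dim
    return matrix
```

Calling the function once on the question words and once on the answer words yields the first and second matrices, each with u + 1 columns.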
Step H: concatenate the first matrix and the second matrix to obtain a third matrix.
Step I: obtain a feature vector of the sample question-answer pair based on the third matrix and a weight vector, where the weight vector is obtained based on the transpose of the third matrix.
For example, fig. 5 is a schematic diagram of obtaining the feature vector of a sample question-answer pair. If the number of sample question words is n and the number of sample answer words is n, the new word vectors of the sample question words form a first matrix of size n × (u + 1), the new word vectors of the sample answer words form a second matrix of size n × (u + 1), and the third matrix is of size 2n × (u + 1). The feature vector of the sample question-answer pair can be obtained using the following formulas:
$M_{(1 \times n)} = \mathrm{softmax}(A \times H \times W_{s})$;
$A = \mathrm{softmax}(W_{s2} \tanh(W_{s1} H^{T}))$;
where $H$ denotes the third matrix, $A$ denotes the weight vector, $M$ denotes the feature vector of the sample question-answer pair, $W_{s}$ denotes a $(u + 1) \times n$ parameter matrix, $W_{s1}$ denotes a $d_a \times (u + 1)$ randomly initialized parameter matrix whose $(u + 1)$-th column is 1 and whose remaining columns are random numbers in (0, 1), and $W_{s2}$ is a $d_a$-dimensional random parameter vector. It should be noted that the weight vector $A$ is continuously optimized over repeated training iterations, so that the relationship between "strong topic" words and topic categories is learned better; fig. 6 is a schematic diagram of the weight vector $A$.
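The two formulas above can be traced with a small forward-pass sketch that checks the shapes: H is 2n × (u + 1), A is 1 × 2n, and M is 1 × n. This is only an illustration under stated assumptions: the helper names are hypothetical, the initialization of W_s is not given in the description and is assumed random here, and no training (updating of the parameters) is shown.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    peak = max(row)                               # subtract max for stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def init_params(n, u, d_a, seed=0):
    """Random parameters; the last column of W_s1 is fixed to 1."""
    rng = random.Random(seed)                     # rng.random() lies in [0, 1)
    w_s1 = [[rng.random() for _ in range(u)] + [1.0] for _ in range(d_a)]
    w_s2 = [rng.random() for _ in range(d_a)]
    w_s = [[rng.random() for _ in range(n)] for _ in range(u + 1)]  # assumed init
    return w_s1, w_s2, w_s


def feature_vector(first, second, w_s1, w_s2, w_s):
    h = first + second                            # Step H: 2n x (u + 1)
    hidden = [[math.tanh(v) for v in row]
              for row in matmul(w_s1, transpose(h))]      # d_a x 2n
    a = softmax(matmul([w_s2], hidden)[0])        # weight vector A, 1 x 2n
    m = softmax(matmul(matmul([a], h), w_s)[0])   # feature vector M, 1 x n
    return a, m
```

Because both A and M pass through softmax, each sums to 1; A can be read as how much attention each of the 2n words receives.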
Step J: obtain the predicted topic category of the sample question-answer pair based on the feature vector and a preset activation function.
Step K: train the network parameters of the attention-based convolutional neural network according to the predicted topic category and the labeled topic category, to obtain the query topic classification model.
With the implementations provided by this embodiment, natural language preprocessing is first performed on the sample question-answer pairs in a sample query transcript to obtain each sample question word and each sample answer word; an attention-based convolutional neural network is then trained according to each sample question word and its corresponding contribution score, each sample answer word and its corresponding contribution score, and the labeled topic category of the sample question-answer pair, to obtain a query topic classification model. The contribution scores of the sample question words and sample answer words thus add topic category information to the sample question-answer pairs and strengthen the training of query topic classification. In addition, the attention-based convolutional neural network can fully learn the relationship between "strong topic" words and topic categories, so the query topic classification model classifies topics better and the accuracy of subsequent query topic classification is improved.
Method embodiment two
It should be noted that, for the query topic classification model obtained in the above method embodiment, topic category information was added to the sample question-answer pairs during training, which strengthened the training of query topic classification, and the attention-based convolutional neural network fully learned the relationship between "strong topic" words and topic categories, so the model classifies topics well. In practical application, therefore, natural language preprocessing is applied to the question-answer pair to be classified in the query transcript to be classified, to obtain each question word to be classified and each answer word to be classified. Each question word to be classified, each answer word to be classified, the contribution scores representing how much each question word to be classified contributes to the topic, and the contribution scores representing how much each answer word to be classified contributes to the topic are input into the query topic classification model, which yields the predicted topic categories and predicted probabilities of the question-answer pair to be classified. The target topic category of the question-answer pair to be classified is then determined from them.
Referring to fig. 7, a flowchart of a query topic classification method in the embodiment of the present application is shown. In this embodiment, using the query topic classification model described in the first method embodiment, the method may include the following steps:
Step 701: perform natural language preprocessing on the question-answer pair to be classified in the query transcript to be classified, to obtain each question word to be classified and each answer word to be classified.
Step 702: input each question word to be classified and its corresponding contribution score, and each answer word to be classified and its corresponding contribution score, into the query topic classification model, to obtain the predicted topic categories and predicted probabilities of the question-answer pair to be classified.
Step 703: determine the target topic category of the question-answer pair to be classified based on its predicted topic categories and predicted probabilities.
Here, the query topic classification model is obtained according to the method of the first method embodiment.
For example, the predicted topic categories of the question-answer pair to be classified are sorted by predicted probability from high to low, and the predicted topic category with the highest predicted probability is selected and determined as the target topic category of the question-answer pair to be classified.
It should be noted that the case information of the query transcript to be classified is related to the topic categories: once the case information is determined, the topic category set made up of the specific topic categories corresponding to that case information can be determined. Therefore, in an optional implementation of the embodiment of the present application, the method may further include, for example, step L: obtain the topic category set corresponding to the case information of the query transcript to be classified. Correspondingly, step 703 may specifically be, for example: determine the target topic category of the question-answer pair to be classified based on its predicted topic categories and predicted probabilities, in combination with the topic category set.
That is, after the predicted topic categories of the question-answer pair to be classified are sorted by predicted probability from high to low, the topic category set corresponding to the case information of the query transcript to be classified is obtained. If the predicted topic category with the highest predicted probability is not in the topic category set, the predicted topic category with the next highest predicted probability is selected and checked against the set, and so on, until a selected predicted topic category is found to be in the topic category set; that predicted topic category is determined as the target topic category of the question-answer pair to be classified.
As an example, fig. 8 is a schematic diagram of determining the target topic category of a question-answer pair to be classified. The upper diagram shows step 703 performed directly after step 702, in which case the target topic category is "whereabouts of stolen goods". The lower diagram shows step L performed after step 702, followed by the corresponding step 703: because the topic category set corresponding to the case information "assaulting others" of the query transcript to be classified does not include "whereabouts of stolen goods", the target topic category is "reason for the crime".
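The selection logic of step 703, with and without the topic category set of step L, can be sketched as below. The function name, the topic labels and the probabilities are illustrative assumptions, and returning `None` when no predicted category lies in the set is a choice made for this sketch; the description does not cover that case.

```python
def select_target_topic(predictions, allowed_topics=None):
    """Pick the target topic category of one question-answer pair.

    predictions: dict mapping a predicted topic category to its probability.
    allowed_topics: optional topic category set for the case information
        (step L); when None, the top-ranked category is returned directly.
    """
    ranked = sorted(predictions.items(), key=lambda item: item[1], reverse=True)
    for topic, probability in ranked:
        if allowed_topics is None or topic in allowed_topics:
            return topic, probability
    return None  # assumption: nothing predicted lies in the allowed set


# Illustrative values only, loosely following the situation in fig. 8.
predictions = {"stolen_goods": 0.55, "crime_reason": 0.30, "crime_tools": 0.15}
print(select_target_topic(predictions))
print(select_target_topic(predictions, {"crime_reason", "crime_tools"}))
```

Without the set, the highest-probability category wins; with the set, the ranking is walked downward until a permitted category is found.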
With the implementations provided by this embodiment, natural language preprocessing is performed on the question-answer pair to be classified in the query transcript to be classified, to obtain each question word to be classified and each answer word to be classified; each question word to be classified and its corresponding contribution score, and each answer word to be classified and its corresponding contribution score, are input into the query topic classification model to obtain the predicted topic categories and predicted probabilities of the question-answer pair to be classified; and the target topic category of the question-answer pair to be classified is determined based on those predicted topic categories and predicted probabilities. For the query topic classification model, the contribution scores of the sample question words and sample answer words add topic category information to the sample question-answer pairs and strengthen the training of query topic classification. In addition, the attention-based convolutional neural network can fully learn the relationship between "strong topic" words and topic categories, so the query topic classification model classifies topics better, which improves the accuracy of query topic classification for the question-answer pairs to be classified in the query transcript to be classified.
Exemplary devices
Apparatus embodiment one
Referring to fig. 9, a schematic structural diagram of an apparatus for obtaining a query topic classification model in an embodiment of the present application is shown. In this embodiment, the apparatus may specifically include:
a first obtaining unit 901, configured to perform natural language preprocessing on the sample question-answer pairs in a sample query transcript to obtain each sample question word and each sample answer word;
a second obtaining unit 902, configured to train an attention-based convolutional neural network according to each sample question word and its corresponding contribution score, each sample answer word and its corresponding contribution score, and the labeled topic category of the sample question-answer pair, to obtain a query topic classification model.
In an optional implementation manner of the embodiment of the present application, the second obtaining unit 902 includes:
a first obtaining subunit, configured to obtain a first matrix based on each sample question word and its corresponding contribution score;
a second obtaining subunit, configured to obtain a second matrix based on each sample answer word and its corresponding contribution score;
a third obtaining subunit, configured to concatenate the first matrix and the second matrix to obtain a third matrix;
a fourth obtaining subunit, configured to obtain a feature vector of the sample question-answer pair based on the third matrix and a weight vector, where the weight vector is obtained based on the transpose of the third matrix;
a fifth obtaining subunit, configured to obtain the predicted topic category of the sample question-answer pair based on the feature vector and a preset activation function;
and a sixth obtaining subunit, configured to train the network parameters of the attention-based convolutional neural network according to the predicted topic category and the labeled topic category, to obtain the query topic classification model.
In an optional implementation manner of the embodiment of the present application, the first obtaining subunit includes:
a first obtaining module, configured to obtain the word vector of each sample question word;
a second obtaining module, configured to concatenate the word vector of each sample question word with the corresponding contribution score to obtain the first matrix;
the second obtaining subunit includes:
a third obtaining module, configured to obtain the word vector of each sample answer word;
and a fourth obtaining module, configured to concatenate the word vector of each sample answer word with the corresponding contribution score to obtain the second matrix.
In an optional implementation manner of the embodiment of the present application, the apparatus further includes a contribution degree score obtaining unit, where the contribution degree score obtaining unit includes:
a seventh obtaining subunit, configured to obtain, based on each sample question word and each sample answer word, the contribution score of each word using the term frequency-inverse document frequency algorithm;
and an eighth obtaining subunit, configured to remove, based on a preset word list, the contribution scores of irrelevant-topic words from the contribution scores of the words, to obtain each contribution score.
In an optional implementation manner of this embodiment of this application, the first obtaining unit 901 includes:
a ninth obtaining subunit, configured to perform word segmentation on the sample question-answer pairs in the sample query transcript to obtain each question word and each answer word;
a tenth obtaining subunit, configured to perform preset entity type character replacement processing on the entity nouns that conform to a preset entity type among the question words and answer words, to obtain each sample question word and each sample answer word; the preset entity types include number, time, person name, place name and/or organization name.
In an optional implementation manner of the embodiment of the present application, if the preset entity type is a number and/or time, the preset entity type character replacement processing is preset entity type character replacement processing based on rule matching; and if the preset entity type is a name of a person, a place and/or an organization, the preset entity type character replacement processing is preset entity type character replacement processing based on named entity prediction.
With the implementations provided by this embodiment, natural language preprocessing is first performed on the sample question-answer pairs in a sample query transcript to obtain each sample question word and each sample answer word; an attention-based convolutional neural network is then trained according to each sample question word and its corresponding contribution score, each sample answer word and its corresponding contribution score, and the labeled topic category of the sample question-answer pair, to obtain a query topic classification model. The contribution scores of the sample question words and sample answer words thus add topic category information to the sample question-answer pairs and strengthen the training of query topic classification. In addition, the attention-based convolutional neural network can fully learn the relationship between "strong topic" words and topic categories, so the query topic classification model classifies topics better and the accuracy of subsequent query topic classification is improved.
Apparatus embodiment two
Referring to fig. 10, a schematic structural diagram of a query topic classification apparatus in the embodiment of the present application is shown. In this embodiment, the apparatus may specifically include:
a third obtaining unit 1001, configured to perform natural language preprocessing on the question-answer pair to be classified in the query transcript to be classified, to obtain each question word to be classified and each answer word to be classified;
a fourth obtaining unit 1002, configured to input each question word to be classified and its corresponding contribution score, and each answer word to be classified and its corresponding contribution score, into the query topic classification model, to obtain the predicted topic categories and predicted probabilities of the question-answer pair to be classified;
a determining unit 1003, configured to determine the target topic category of the question-answer pair to be classified based on its predicted topic categories and predicted probabilities;
where the query topic classification model is obtained according to the method of the first method embodiment.
In an optional implementation manner of the embodiment of the present application, the apparatus further includes a topic category set obtaining unit;
the topic category set obtaining unit is configured to obtain the topic category set corresponding to the case information of the query transcript to be classified;
correspondingly, the determining unit 1003 is specifically configured to:
determine the target topic category of the question-answer pair to be classified based on its predicted topic categories and predicted probabilities, in combination with the topic category set.
With the implementations provided by this embodiment, natural language preprocessing is performed on the question-answer pair to be classified in the query transcript to be classified, to obtain each question word to be classified and each answer word to be classified; each question word to be classified and its corresponding contribution score, and each answer word to be classified and its corresponding contribution score, are input into the query topic classification model to obtain the predicted topic categories and predicted probabilities of the question-answer pair to be classified; and the target topic category of the question-answer pair to be classified is determined based on those predicted topic categories and predicted probabilities. For the query topic classification model, the contribution scores of the sample question words and sample answer words add topic category information to the sample question-answer pairs and strengthen the training of query topic classification. In addition, the attention-based convolutional neural network can fully learn the relationship between "strong topic" words and topic categories, so the query topic classification model classifies topics better, which improves the accuracy of query topic classification for the question-answer pairs to be classified in the query transcript to be classified.
In addition, an embodiment of the present application further provides a terminal device, where the terminal device includes a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to execute the method for obtaining a query topic classification model according to the first method embodiment according to the instructions in the program code.
An embodiment of the present application further provides a terminal device, where the terminal device includes a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to execute the query topic classification method according to the second method embodiment according to the instructions in the program code.
An embodiment of the present application further provides a computer-readable storage medium for storing program code, where the program code is used to execute the method for obtaining a query topic classification model according to the first method embodiment.
The embodiment of the present application further provides a computer-readable storage medium, which is used for storing a program code, where the program code is used for executing the method for classifying the query topics described in the second method embodiment.
The embodiments in this description are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Because the apparatus disclosed in an embodiment corresponds to the method disclosed in an embodiment, its description is brief, and the relevant details can be found in the description of the method.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in a process, method, article, or apparatus that comprises the element.
The foregoing is merely a preferred embodiment of the present application and is not intended to limit the present application in any way. Although the present application has been described with reference to the preferred embodiments, it is not intended to limit the present application to the details shown. Those skilled in the art can now make numerous possible variations and modifications to the disclosed embodiments, or modify equivalent embodiments, using the methods and techniques disclosed above, without departing from the scope of the claimed embodiments. Therefore, any simple modification, equivalent change and modification made to the above embodiments according to the technical essence of the present application still fall within the protection scope of the technical solution of the present application without departing from the content of the technical solution of the present application.

Claims (10)

1. A method for obtaining a query topic classification model, comprising:
performing natural language preprocessing on sample question-answer pairs in a sample query record to obtain sample question word segments and sample answer word segments;
obtaining a word vector of each sample question word segment, and splicing each such word vector with its corresponding contribution degree score to obtain a first matrix; obtaining a word vector of each sample answer word segment, and splicing each such word vector with its corresponding contribution degree score to obtain a second matrix;
splicing the first matrix and the second matrix to obtain a third matrix;
obtaining a feature vector of the sample question-answer pair based on the third matrix and a weight vector, wherein the weight vector is obtained based on a transpose of the third matrix;
obtaining a predicted topic category of the sample question-answer pair based on the feature vector and a preset activation function;
training network parameters of an attention-based convolutional neural network according to the predicted topic category and a labeled topic category to obtain the query topic classification model;
wherein the step of obtaining the sample question-answer pairs comprises:
deleting, based on a preset format of the sample query record, record information in the sample query record that is unrelated to the sample query;
and segmenting the sample query record based on a preset syntax pattern of the sample query record to obtain the sample question-answer pairs.
2. The method of claim 1, wherein the step of obtaining each contribution degree score comprises:
obtaining a contribution degree score of each word segment from the sample question word segments and the sample answer word segments by using a term frequency-inverse document frequency (TF-IDF) algorithm;
and removing, based on a preset word list, the contribution degree scores of topic-irrelevant words from the contribution degree scores of the word segments to obtain each contribution degree score.
3. The method of claim 1, wherein performing natural language preprocessing on the sample question-answer pairs in the sample query record to obtain the sample question word segments and the sample answer word segments comprises:
performing word segmentation on the sample question-answer pairs in the sample query record to obtain question word segments and answer word segments;
performing preset entity type character replacement on entity nouns in the question word segments and the answer word segments that match a preset entity type, to obtain the sample question word segments and the sample answer word segments; the preset entity types comprise number, time, person name, place name and/or organization name.
4. The method according to claim 3, wherein if the preset entity type is number and/or time, the preset entity type character replacement is based on rule matching; and if the preset entity type is person name, place name and/or organization name, the preset entity type character replacement is based on named entity prediction.
5. A method for query topic classification, comprising:
performing natural language preprocessing on a question-answer pair to be classified in a query record to be classified to obtain question word segments to be classified and answer word segments to be classified;
inputting each question word segment to be classified and its corresponding contribution degree score into a query topic classification model to obtain a predicted topic category and a predicted probability of the question-answer pair to be classified;
determining a target topic category of the question-answer pair to be classified based on the predicted topic category and the predicted probability of the question-answer pair to be classified;
wherein the query topic classification model is obtained according to the method of any one of claims 1-4.
6. The method of claim 5, further comprising:
obtaining a topic category set corresponding to the case information of the query record to be classified;
correspondingly, the step of determining the target topic category of the question-answer pair to be classified based on the predicted topic category and the predicted probability of the question-answer pair to be classified specifically includes:
and determining the target topic category of the question-answer pair to be classified by combining the topic category set based on the predicted topic category and the predicted probability of the question-answer pair to be classified.
7. An apparatus for obtaining a query topic classification model, comprising:
a first obtaining unit, configured to perform natural language preprocessing on sample question-answer pairs in a sample query record to obtain sample question word segments and sample answer word segments;
a second obtaining unit, configured to obtain a word vector of each sample question word segment and splice each such word vector with its corresponding contribution degree score to obtain a first matrix, and to obtain a word vector of each sample answer word segment and splice each such word vector with its corresponding contribution degree score to obtain a second matrix;
a third obtaining unit, configured to splice the first matrix and the second matrix to obtain a third matrix;
a fourth obtaining unit, configured to obtain a feature vector of the sample question-answer pair based on the third matrix and a weight vector, wherein the weight vector is obtained based on a transpose of the third matrix;
a fifth obtaining unit, configured to obtain a predicted topic category of the sample question-answer pair based on the feature vector and a preset activation function;
a sixth obtaining unit, configured to train network parameters of an attention-based convolutional neural network according to the predicted topic category and a labeled topic category to obtain the query topic classification model;
the apparatus further comprises:
a sample question-answer pair obtaining unit, configured to delete, based on a preset format of the sample query record, record information in the sample query record that is unrelated to the sample query, and to segment the sample query record based on a preset syntax pattern of the sample query record to obtain the sample question-answer pairs.
8. An apparatus for query topic classification, comprising:
a third obtaining unit, configured to perform natural language preprocessing on a question-answer pair to be classified in a query record to be classified, to obtain question word segments to be classified and answer word segments to be classified;
a fourth obtaining unit, configured to input each question word segment to be classified and its corresponding contribution degree score into the query topic classification model, and obtain a predicted topic category and a predicted probability of the question-answer pair to be classified;
a determining unit, configured to determine a target topic category of the question-answer pair to be classified based on the predicted topic category and the predicted probability of the question-answer pair to be classified;
wherein the query topic classification model is obtained according to the method of any one of claims 1-4.
9. A terminal device, comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to execute, according to instructions in the program code, the method for obtaining a query topic classification model according to any one of claims 1 to 4, or the method for query topic classification according to any one of claims 5 to 6.
10. A computer-readable storage medium for storing a program code for performing the method of obtaining a query topic classification model of any one of claims 1-4 or for performing the method of classifying a query topic of any one of claims 5-6.
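For orientation only, the preprocessing and feature construction recited in claims 1 to 4 can be sketched in plain Python. This is an illustrative reading, not the patented implementation. The placeholder strings `<TIME>` and `<NUM>`, the regular expressions, the smoothed TF-IDF variant, the zero score given to removed words, and the parameter vector `v` used to derive attention weights are all assumptions. The convolution layers and training loop are omitted.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Sequence, Set

# Hypothetical rule-matching patterns for the number and time entity types.
_TIME = re.compile(r"\d{1,2}:\d{2}")
_NUM = re.compile(r"\d+(?:\.\d+)?")


def replace_entities(tokens: Sequence[str]) -> List[str]:
    """Replace time and number tokens with placeholder characters."""
    out = []
    for tok in tokens:
        if _TIME.fullmatch(tok):
            out.append("<TIME>")
        elif _NUM.fullmatch(tok):
            out.append("<NUM>")
        else:
            out.append(tok)
    return out


def tf_idf(doc: Sequence[str],
           corpus: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Contribution score per token: one common smoothed TF-IDF variant."""
    counts = Counter(doc)
    scores = {}
    for tok, count in counts.items():
        df = sum(1 for d in corpus if tok in d)  # document frequency
        idf = math.log((1 + len(corpus)) / (1 + df))
        scores[tok] = (count / len(doc)) * idf
    return scores


def drop_irrelevant(scores: Dict[str, float],
                    word_list: Set[str]) -> Dict[str, float]:
    """Remove scores of topic-irrelevant words named in a preset word list."""
    return {t: s for t, s in scores.items() if t not in word_list}


def score_matrix(tokens: Sequence[str],
                 vectors: Dict[str, Sequence[float]],
                 scores: Dict[str, float]) -> List[List[float]]:
    """One row per token: its word vector with the score appended.

    Tokens whose score was removed get 0.0 (an assumption).
    """
    return [list(vectors[t]) + [scores.get(t, 0.0)] for t in tokens]


def attention_pool(matrix: List[List[float]],
                   v: Sequence[float]) -> List[float]:
    """Softmax of v . row over all rows, then the weighted sum of rows."""
    logits = [sum(a * b for a, b in zip(v, row)) for row in matrix]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(matrix[0])
    return [sum(w * row[j] for w, row in zip(weights, matrix))
            for j in range(dim)]
```

Under these assumptions, the first and second matrices come from calling `score_matrix` on the question and answer word segments, the third matrix is their row-wise concatenation (`first + second`), and `attention_pool` returns the feature vector that a classification layer would consume.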
CN201911422174.6A 2019-12-31 2019-12-31 Method and device for obtaining query topic classification model and query topic classification Active CN111159360B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911422174.6A CN111159360B (en) 2019-12-31 2019-12-31 Method and device for obtaining query topic classification model and query topic classification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911422174.6A CN111159360B (en) 2019-12-31 2019-12-31 Method and device for obtaining query topic classification model and query topic classification

Publications (2)

Publication Number Publication Date
CN111159360A (en) 2020-05-15
CN111159360B (en) 2022-12-02

Family

ID=70560684

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911422174.6A Active CN111159360B (en) 2019-12-31 2019-12-31 Method and device for obtaining query topic classification model and query topic classification

Country Status (1)

Country Link
CN (1) CN111159360B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112559748A (en) * 2020-12-18 2021-03-26 厦门市法度信息科技有限公司 Method for classifying stroke record data records, terminal equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108763326A (en) * 2018-05-04 2018-11-06 南京邮电大学 A kind of sentiment analysis model building method of the diversified convolutional neural networks of feature based
CN108959482A (en) * 2018-06-21 2018-12-07 北京慧闻科技发展有限公司 Single-wheel dialogue data classification method, device and electronic equipment based on deep learning

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107220231A (en) * 2016-03-22 2017-09-29 索尼公司 Electronic equipment and method and training method for natural language processing
US11568855B2 (en) * 2017-08-29 2023-01-31 Tiancheng Zhao System and method for defining dialog intents and building zero-shot intent recognition models
CN109815492A (en) * 2019-01-04 2019-05-28 平安科技(深圳)有限公司 A kind of intension recognizing method based on identification model, identification equipment and medium
CN110516070B (en) * 2019-08-28 2022-09-30 上海海事大学 Chinese question classification method based on text error correction and neural network

Also Published As

Publication number Publication date
CN111159360A (en) 2020-05-15

Similar Documents

Publication Publication Date Title
CN112800170A (en) Question matching method and device and question reply method and device
CN110347787B (en) Interview method and device based on AI auxiliary interview scene and terminal equipment
CN107729468A (en) Answer extracting method and system based on deep learning
CN111209384A (en) Question and answer data processing method and device based on artificial intelligence and electronic equipment
CN112214593A (en) Question and answer processing method and device, electronic equipment and storage medium
CN104471568A (en) Learning-based processing of natural language questions
CN112468659B (en) Quality evaluation method, device, equipment and storage medium applied to telephone customer service
CN115599902B (en) Oil-gas encyclopedia question-answering method and system based on knowledge graph
CN112613293B (en) Digest generation method, digest generation device, electronic equipment and storage medium
CN112256845A (en) Intention recognition method, device, electronic equipment and computer readable storage medium
CN113742733A (en) Reading understanding vulnerability event trigger word extraction and vulnerability type identification method and device
CN110597968A (en) Reply selection method and device
CN111369980A (en) Voice detection method and device, electronic equipment and storage medium
CN111241397A (en) Content recommendation method and device and computing equipment
CN111159405B (en) Irony detection method based on background knowledge
CN112287197A (en) Method for detecting sarcasm of case-related microblog comments described by dynamic memory cases
CN112069312A (en) Text classification method based on entity recognition and electronic device
CN114756675A (en) Text classification method, related equipment and readable storage medium
KR102206781B1 (en) Method of fake news evaluation based on knowledge-based inference, recording medium and apparatus for performing the method
CN117520503A (en) Financial customer service dialogue generation method, device, equipment and medium based on LLM model
CN110852071A (en) Knowledge point detection method, device, equipment and readable storage medium
CN111159360B (en) Method and device for obtaining query topic classification model and query topic classification
CN115293142A (en) Common sense question-answering method based on dictionary enhanced pre-training model
CN110941713A (en) Self-optimization financial information plate classification method based on topic model
CN114969347A (en) Defect duplication checking implementation method and device, terminal equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant