CN111274382A - Text classification method, device, equipment and storage medium - Google Patents


Info

Publication number
CN111274382A
Authority
CN
China
Legal status
Pending
Application number
CN201811383555.3A
Other languages
Chinese (zh)
Inventor
王颖帅
李晓霞
苗诗雨
Current Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201811383555.3A
Publication of CN111274382A

Abstract

The invention provides a text classification method, apparatus, device and storage medium. A query request input by a user is received, the query request comprising a query text; text features are extracted from the query text; and the text features are input into a scene classification model to obtain the service scenario corresponding to the query request. By applying a trained scene classification model, the method classifies service scenarios from the text features of the query text, which lowers maintenance cost, reduces reliance on manual work, allows user intent to be understood and predicted flexibly with higher accuracy and reliability, and improves the user experience.

Description

Text classification method, device, equipment and storage medium
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a text classification method, apparatus, device, and storage medium.
Background
As machine learning research and applications grow in importance in industry, artificial intelligence has been widely applied in fields such as speech recognition, image processing, text semantic understanding and personalized recommendation, and the arrival of the big data era enables machines to acquire new knowledge and skills. On e-commerce platforms, users increasingly prefer to shop online in a more intelligent way, so building an intelligent shopping assistant is critical.
In the prior art, an e-commerce platform generally provides query and search services, for example for commodities, brands, coupons and services. Phrasing templates must be designed in advance; after a query request input by a user is received, the query text is matched against the templates with regular expressions so that the user's text is assigned to the corresponding service scenario category. Such template-based regular matching depends heavily on manual work: the phrasings must be preset, maintenance cost is high, sentence-pattern classification is rigid, and user intent cannot be understood and predicted flexibly.
Disclosure of Invention
The invention provides a text classification method, apparatus, device and storage medium, which are used to reduce reliance on manual work, lower maintenance cost, and flexibly understand and predict user intent from the query requests input by users.
In a first aspect, an embodiment of the present invention provides a text classification method, including:
receiving a query request input by a user, wherein the query request comprises query text;
extracting text features from the query text;
and inputting the text features into a scene classification model to obtain a service scene corresponding to the query request.
In a second aspect, an embodiment of the present invention provides a text classification apparatus, including:
the receiving module is used for receiving a query request input by a user, wherein the query request comprises query text;
the feature extraction module is used for extracting text features from the query text;
and the service scenario classification module is used for inputting the text features into a scene classification model to obtain a service scenario corresponding to the query request.
In a third aspect, an embodiment of the present invention provides a text classification device, including:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored;
which when executed by a processor implements the method according to the first aspect.
The text classification method, apparatus, device and storage medium provided by the invention receive a query request input by a user, the query request comprising a query text; extract text features from the query text; and input the text features into the scene classification model to obtain the service scenario corresponding to the query request. By applying the trained scene classification model, the invention classifies service scenarios from the text features of the query text, which lowers maintenance cost, reduces reliance on manual work, allows user intent to be understood and predicted flexibly with higher accuracy and reliability, and improves the user experience.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a flowchart of a text classification method according to an embodiment of the present invention;
Fig. 2 is a flowchart of a text classification method according to another embodiment of the present invention;
Fig. 3 is a block diagram of a text classification apparatus according to an embodiment of the present invention;
Fig. 4 is a structural diagram of a text classification device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a text classification method according to an embodiment of the present invention. The embodiment provides a text classification method, which comprises the following specific steps:
S101, receiving a query request input by a user, wherein the query request comprises query text.
In this embodiment, the user may speak at the terminal and the speech is converted into query text, or the user may type the query text directly at the terminal; a query request is then generated from the query text and sent to the text classification server. The first sentence of the interaction between the user and the terminal can be extracted as the query text by regular-expression matching. The query text may be, for example, "I want to control the ceiling fan remotely", "What underwear should I buy", "How do I get the coupon", or "Which courier is delivering the milk powder I bought". In this embodiment, the service the user wants to use must be determined from the query text, that is, the service scenario corresponding to the query request must be determined. The service scenarios may include a specific commodity query scenario, an order query scenario, a fuzzy discount query scenario, a specific discount query scenario, an after-sale service scenario, a channel query scenario, an unknown scenario, and the like.
Further, after the query request input by the user is received, it may be determined whether the query text is a term in a product thesaurus or a brand thesaurus; if so, the service scenario corresponding to the query request is determined to be the commodity query service scenario; otherwise, S102 is performed.
In this embodiment, the query text input by the user may be an ultra-short text such as a brand word or a product word, for example "Qingdao beer", "sweeping robot" or "Adidas". Such text may first be matched by full-text matching in Redis against the product thesaurus and/or the brand thesaurus; if it directly hits a word in either thesaurus, the service scenario corresponding to the query request is determined to be the commodity query service scenario, and the user may further be redirected to the corresponding commodity page. The product thesaurus contains existing product names together with new product names discovered during labeling, evaluation and similar processes; the brand thesaurus likewise contains existing brand names and newly discovered ones; both thesauri can be updated continuously. In addition, keywords for other service scenarios can be collected in advance and matched in the same full-text manner; on a direct hit, the service scenario is determined immediately without the subsequent steps, which simplifies the text classification process and improves classification efficiency.
S102, extracting text features from the query text.
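The thesaurus shortcut described above can be sketched as follows. This is a minimal illustration only: it assumes plain in-memory Python sets in place of the Redis full-text matching the embodiment describes, and the thesaurus contents, the `SCENARIO_KEYWORDS` table and the function name `shortcut_scenario` are hypothetical.

```python
# Hypothetical in-memory stand-ins for the product and brand thesauri.
# The embodiment matches against Redis; sets keep this sketch self-contained.
PRODUCT_THESAURUS = {"sweeping robot", "milk powder"}
BRAND_THESAURUS = {"qingdao beer", "adidas"}

# Hypothetical keywords that map directly to other service scenarios.
SCENARIO_KEYWORDS = {
    "my orders": "order query",
    "return goods": "after-sale",
}


def shortcut_scenario(query_text):
    """Return a service scenario if the whole query hits a lexicon, else None.

    None means the caller should fall through to feature extraction (S102)
    and the scene classification model (S103).
    """
    key = query_text.strip().lower()
    if key in PRODUCT_THESAURUS or key in BRAND_THESAURUS:
        return "commodity query"
    return SCENARIO_KEYWORDS.get(key)
```

Returning `None` rather than a default scenario keeps the fast path separate from the model path, so a thesaurus miss simply falls through to S102.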
In this embodiment, an existing text feature extraction tool such as TF-IDF, one-hot encoding or a bag-of-words model may be used to obtain valuable information with high feature utilization; TF-IDF is preferred. TF-IDF evaluates how important a word is to a document in a document set or corpus, where TF denotes term frequency and IDF denotes inverse document frequency: the importance of a word increases in proportion to the number of times it appears in a document but decreases with its frequency across the corpus. The query text may first be preprocessed, which may specifically include: case conversion, for example converting all capital letters in the corpus to lowercase; stop-word removal, filtering out common words that carry no real information, such as Chinese particles like "的" and "呢"; and word segmentation, which can use an existing tool such as jieba, with the brand thesaurus and product thesaurus added to its dictionary in advance and its word-frequency settings fine-tuned and stored offline and updated periodically. After preprocessing, text features are extracted with TF-IDF. The main idea is that the importance of a word is proportional to its frequency within a service scenario category and inversely proportional to how often it occurs across all service scenario categories. TF-IDF thus surfaces high-weight keywords and ranks them first; when a common user dictionary is built, the several highest-weight words in each sentence are taken, which ensures that the words entering the dictionary are useful, commonly used words.
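A minimal sketch of the preprocessing and category-level TF-IDF weighting described above follows. It assumes whitespace tokenization as a stand-in for jieba segmentation, a hypothetical English stop-word list, and one smoothed IDF variant, log((1 + N) / (1 + df)), where N is the number of service scenario categories and df is the number of categories containing the word. The embodiment does not specify its exact TF-IDF formula, and all function names are illustrative.

```python
import math
from collections import Counter

# Hypothetical stop-word list standing in for a Chinese function-word list.
STOP_WORDS = {"the", "a", "to", "i", "my", "how", "do", "for", "is", "where"}


def preprocess(text):
    """Lowercase, split on whitespace (stand-in for jieba), drop stop words."""
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]


def category_tfidf(samples):
    """samples: list of (text, category). Returns {category: {word: weight}}.

    All texts of a category are pooled into one "document", so a word's
    weight rises with its frequency inside the category and falls with the
    number of categories it appears in.
    """
    pooled = {}
    for text, cat in samples:
        pooled.setdefault(cat, []).extend(preprocess(text))
    n_cats = len(pooled)
    df = Counter()
    for tokens in pooled.values():
        df.update(set(tokens))
    weights = {}
    for cat, tokens in pooled.items():
        counts = Counter(tokens)
        total = len(tokens)
        weights[cat] = {
            w: (c / total) * math.log((1 + n_cats) / (1 + df[w]))
            for w, c in counts.items()
        }
    return weights


def top_keywords(weights, k=2):
    """Return the k highest-weight words per category (ties broken alphabetically)."""
    return {
        cat: [w for w, _ in sorted(ws.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for cat, ws in weights.items()
    }
```

With this IDF variant, a word that occurs in every category receives a weight of exactly 0 and therefore never ranks as a category keyword.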
S103, inputting the text features into a scene classification model to obtain a service scene corresponding to the query request.
In this embodiment, a scene classification model may be trained in advance, where the scene classification model may be a neural network model, or may be another model, where an input of the model is a text feature and an output is a service scene.
In the text classification method provided by this embodiment, a query request input by a user is received, the query request including a query text; text features are extracted from the query text; and the text features are input into a scene classification model to obtain the service scenario corresponding to the query request. Applying the trained scene classification model allows service scenarios to be classified from the text features of the query text, lowering maintenance cost, reducing reliance on manual work, allowing user intent to be understood and predicted flexibly with high accuracy and reliability, and improving the user experience.
On the basis of the foregoing embodiment, after obtaining the service scenario corresponding to the query request in S103, the method may further include:
and if the service scene corresponding to the query request is channel query, determining a channel corresponding to the query request and entering the channel.
In this embodiment, scenarios such as the specific commodity query, order query, fuzzy discount query, specific discount query, after-sale and unknown scenarios have no further branches, so once one of them is determined to correspond to the query request it can be entered directly. The channel query scenario, by contrast, may cover multiple channels, for example Jingdong Member, Jingdong Baitiao ("white bar"), Jingdong Seckill (flash sales), Jingdong Recycling and real-name authentication. Therefore, if the service scenario corresponding to the query request is determined to be channel query, it is further necessary to determine from the query request which channel the user wants, and then enter that channel.
Specifically, the determining a channel corresponding to the query request includes:
S201, determining whether the query text satisfies a preset sentence format and contains a preset channel keyword;
if yes, executing S202; if not, go to S203.
In this embodiment, checking whether the query text satisfies a preset sentence format and contains a preset channel keyword allows the channel corresponding to the query request to be determined accurately. For example, the preset sentence formats may be "I want to go to …", "I want to …", "I want …", and the like. A query text such as "I want to go to Jingdong Member" satisfies a preset sentence format and contains the preset channel keyword "Jingdong Member", so the query request can be directly determined to target the "Jingdong Member" channel. As another example, for the "Jingdong Baitiao" channel the channel keywords may include "Baitiao", "Baitiao repayment", "Baitiao activation", "Baitiao flash payment", "Baitiao installments", and the like; other channels and their preset channel keywords are not listed one by one here. The preset channel keywords may be stored in a channel keyword library in association with their corresponding channels.
S202, if yes, determining the channel corresponding to the query request according to the preset channel keyword.
In this embodiment, if it is determined that the query text satisfies the preset sentence format and includes the preset channel keyword, the channel corresponding to the preset channel keyword is used as the channel corresponding to the query request.
S203, if not, inputting the query text into a fuzzy semantic classification model, and determining a channel corresponding to the query request.
In this embodiment, if the query text does not both satisfy the preset sentence format and contain a preset channel keyword, that is, the user's query intent regarding channels cannot be determined directly, the channel may be determined by the fuzzy semantic classification model.
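The rule-based check in S201 and S202 might look like the sketch below. The sentence formats are rendered as hypothetical English regular expressions, and the keyword-to-channel table, the function name and the longest-keyword-first rule are assumptions made for illustration; a return value of `None` corresponds to falling through to S203.

```python
import re

# Hypothetical preset sentence formats ("I want to go to ...", "I want ...").
SENTENCE_FORMATS = [
    re.compile(r"^i want to go to (?P<target>.+)$"),
    re.compile(r"^i want (?P<target>.+)$"),
]

# Hypothetical channel keyword library: keyword -> channel.
CHANNEL_KEYWORDS = {
    "jingdong member": "Jingdong Member",
    "baitiao": "Jingdong Baitiao",
    "baitiao repayment": "Jingdong Baitiao",
}


def match_channel_by_format(query_text):
    """S201/S202: return a channel if the query fits a preset sentence format
    and its target part contains a preset channel keyword, else None (S203)."""
    text = query_text.strip().lower()
    for pattern in SENTENCE_FORMATS:
        m = pattern.match(text)
        if m is None:
            continue
        target = m.group("target")
        # Check longer keywords first so the most specific keyword wins.
        for kw in sorted(CHANNEL_KEYWORDS, key=len, reverse=True):
            if kw in target:
                return CHANNEL_KEYWORDS[kw]
    return None
```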
Specifically, the step of inputting the query text into the fuzzy semantic classification model in S203 to determine a channel corresponding to the query request includes:
S2031, obtaining the similarity between the query text and the corpus of each channel corpus.
S2032, acquiring the channel with the maximum similarity as the channel corresponding to the query request.
In this embodiment, a corpus can be collected in advance for each channel. For example, the corpus for the "Jingdong Baitiao" channel may include "go to Baitiao", "refund from Jingdong Baitiao", "how to cancel Baitiao", "how to make a Baitiao payment", "how to repay Baitiao", "what to repay first on Baitiao", and so on. For a given query text, its similarity to the corpus of each channel is computed, and the channel with the greatest similarity is taken as the channel corresponding to the query request.
Obtaining the similarity between the query text and each channel corpus in S2031 may specifically include:
acquiring a space vector of the query text;
and acquiring the similarity between the space vector of the query text and the space vector of each channel corpus, wherein the space vector of a channel corpus is the space vector of a single document formed by combining all the corpora in that channel corpus.
In this embodiment, the vector space model is an algebraic model that represents a text document as a vector of identifiers (for example, index terms); it is used in information filtering, information retrieval, indexing and relevance ranking. After features are extracted with TF-IDF, the query text is processed with an existing tool for obtaining text space vectors (such as the word2vec algorithm) to obtain the space vector of the query text. For each channel corpus, all corpora of the channel are combined in advance into one document, and the space vector of that document is obtained in the same way and used as the space vector of the channel corpus. The similarity between the space vector of the query text and the space vector of each channel corpus is then computed. The similarity may be cosine similarity, which follows from the Euclidean dot product formula A·B = ‖A‖‖B‖cos θ; the cosine similarity of vectors A and B is defined as follows:
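$$\text{similarity} = \cos(\theta) = \frac{\mathbf{A}\cdot\mathbf{B}}{\|\mathbf{A}\|\,\|\mathbf{B}\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\,\sqrt{\sum_{i=1}^{n} B_i^{2}}}$$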
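The similarity step of S2031 and S2032 can be sketched as below. As an assumption to keep the example self-contained, bag-of-words count vectors stand in for the word2vec-based space vectors; the cosine formula is the standard one for vectors A and B, and the corpora and function names are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """cos(A, B) = A·B / (||A|| ||B||) for sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def to_vector(text):
    """Bag-of-words count vector (stand-in for the word2vec space vector)."""
    return Counter(text.lower().split())


def classify_channel(query_text, channel_corpora):
    """S2031/S2032: merge each channel's corpus into one document, then pick
    the channel whose document vector is most similar to the query vector."""
    query_vec = to_vector(query_text)
    scores = {
        channel: cosine_similarity(query_vec, to_vector(" ".join(corpus)))
        for channel, corpus in channel_corpora.items()
    }
    best = max(scores, key=scores.get)
    return best, scores
```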
Further, on the basis of the foregoing embodiment, inputting the query text into a fuzzy semantic classification model to determine the channel corresponding to the query request may further include:
determining the channel corresponding to the query request by using a support vector machine (SVM); or
And combining the result obtained according to the similarity with the result obtained according to the support vector machine to determine the channel corresponding to the query request.
The SVM is a supervised learning model commonly used for pattern recognition, classification and regression analysis; as a traditional text classification algorithm, it performs well on small datasets. In this embodiment, because the number of channels is limited and each channel corpus contains relatively few corpora, the SVM may be used alone to determine the channel corresponding to the query request, or it may be combined with the similarity-based method described above, that is, the result obtained from the similarity and the result obtained from the SVM are integrated to determine the channel. The core idea of the SVM is to map vectors into a higher-dimensional space and construct a maximum-margin hyperplane there that separates the data; the larger the distance between the parallel hyperplanes bounding the margin, the smaller the overall classification error tends to be.
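A hedged sketch of an SVM channel classifier, and of one way to fuse its scores with the similarity scores, is shown below. It trains a linear (non-kernel) one-vs-rest SVM by stochastic subgradient descent on the hinge loss, so it omits the higher-dimensional kernel mapping mentioned above. The min-max normalisation and the weighting parameter `alpha` in `combine` are hypothetical choices, since the embodiment does not say how the two results are integrated; all function names are illustrative.

```python
def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def train_linear_svm(X, y, epochs=200, lr=0.1, lam=0.01):
    """Binary linear SVM (labels +1/-1) trained by stochastic subgradient
    descent on lam/2 * ||w||^2 + hinge loss max(0, 1 - y * (w·x + b))."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:
                # Inside the margin: step along the hinge-loss subgradient.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Outside the margin: only the L2 regulariser shrinks w.
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def train_ovr(X, labels, **kwargs):
    """One-vs-rest: one binary SVM per channel."""
    return {
        c: train_linear_svm(X, [1 if lab == c else -1 for lab in labels], **kwargs)
        for c in sorted(set(labels))
    }


def svm_scores(models, x):
    """Signed decision value w·x + b for every channel."""
    return {c: dot(w, x) + b for c, (w, b) in models.items()}


def _minmax(scores):
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {c: 0.0 for c in scores}
    return {c: (s - lo) / (hi - lo) for c, s in scores.items()}


def combine(sim_scores, svm_score_map, alpha=0.5):
    """Hypothetical fusion: weighted sum of min-max normalised scores."""
    sim_n, svm_n = _minmax(sim_scores), _minmax(svm_score_map)
    fused = {c: alpha * sim_n[c] + (1 - alpha) * svm_n[c] for c in sim_scores}
    return max(fused, key=fused.get)
```

A linear kernel is a common choice for sparse text features, and keeping the fusion weight explicit makes it easy to tune how much the similarity result and the SVM result each contribute.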
Further, on the basis of the above embodiment, before the inputting the query text into the fuzzy semantic classification model, the method may further include:
judging whether the query text is a preset fuzzy sentence or not, if so, determining a channel corresponding to the query request according to a channel corresponding to the preset fuzzy sentence; and/or
And judging whether the query text contains words in a channel keyword library, if so, determining a channel corresponding to the query request according to a channel corresponding to the words in the channel keyword library.
In this embodiment, before the query text is input into the fuzzy semantic classification model, whether the query text is a preset fuzzy sentence may be determined by full-text matching in Redis. For example, the preset fuzzy sentences for the "Jingdong Wallet" channel may include "Jingdong balance", "withdraw balance", "why can't I pay with my Jingdong balance", "how do I use my Jingdong balance", "how do I withdraw the money in my wallet", and the like; when the query text is one of these preset fuzzy sentences, the channel corresponding to the query request is determined to be the "Jingdong Wallet" channel. In addition, keyword-containment logic may be applied using the channel keyword library mentioned in S201, that is, determining whether the query text contains a term from the keyword library; for example, if it contains any of "Jingdong Member", "member" or "my member", the channel corresponding to the query request may be determined to be the "Jingdong Member" channel. A fuzzy channel keyword library can also be constructed from the channel keyword library, and the query text checked for words in the fuzzy channel keyword library.
As an optional embodiment, on the basis of the foregoing embodiment, determining the channel corresponding to the query request may specifically include the following steps:
S301, determining whether the query text satisfies a preset sentence format and contains a preset channel keyword;
if yes, determining the channel corresponding to the query request according to the preset channel keyword; if not, go to S302.
S302, judging whether the query text is a preset fuzzy sentence or not;
if so, determining a channel corresponding to the query request according to a channel corresponding to the preset fuzzy sentence; if not, go to S303.
S303, judging whether the query text contains words in a channel keyword library;
if so, determining a channel corresponding to the query request according to a channel corresponding to the terms in the channel keyword library; if not, go to step S304.
S304, inputting the query text into a fuzzy semantic classification model, and determining a channel corresponding to the query request.
It should be noted that the order of S302 and S303 may also be swapped; this is not described further here.
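The S301 to S304 cascade can be expressed as a single function, sketched below under the assumption that each stage is supplied by the caller: `match_format` for S301, a dictionary of preset fuzzy sentences for S302, a keyword-to-channel dictionary for S303, and a callable standing in for the fuzzy semantic classification model in S304. All names are hypothetical.

```python
from typing import Callable, Dict, Optional


def determine_channel(
    query_text: str,
    match_format: Callable[[str], Optional[str]],
    fuzzy_sentences: Dict[str, str],
    channel_keywords: Dict[str, str],
    fuzzy_model: Callable[[str], str],
) -> str:
    """Run the rule stages in order and fall back to the model only if all miss."""
    text = query_text.strip().lower()
    # S301: preset sentence format + preset channel keyword.
    channel = match_format(text)
    if channel is not None:
        return channel
    # S302: exact (full-text) hit on a preset fuzzy sentence.
    if text in fuzzy_sentences:
        return fuzzy_sentences[text]
    # S303: query contains a word from the channel keyword library
    # (longer keywords checked first).
    for keyword in sorted(channel_keywords, key=len, reverse=True):
        if keyword in text:
            return channel_keywords[keyword]
    # S304: fuzzy semantic classification model.
    return fuzzy_model(text)
```

Because each stage is a separate argument or block, swapping the order of S302 and S303 only requires exchanging two blocks of the function body.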
On the basis of the above embodiment, the method may further include:
obtaining a training sample of a scene classification model;
and constructing a neural network, and training the neural network by adopting the training sample so as to obtain the scene classification model.
In this embodiment, the 5000 most common high-frequency Chinese characters are used as character-level features and mapped to numeric indices, yielding character vector features suitable for scene classification. A word-position feature is also provided: after word segmentation, each character is given one of four tags, B, I, E and S, where B denotes begin, I denotes intermediate, E denotes end and S denotes single, describing the position of the character within its word in the sentence. After the text of the training samples is segmented, the word2vec algorithm is used to assign a vector to each word, and key parameters such as the vector dimension and sliding window size are tuned to make the word vectors better suited to scene classification.
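The character-index mapping and the B/I/E/S word-position tags described above can be sketched as follows. Reserving ids 0 and 1 for padding and unknown characters is an assumption, as are the function names; the segmented words would come from a tool such as jieba.

```python
from collections import Counter


def build_char_vocab(texts, max_size=5000):
    """Map the max_size most frequent characters to integer ids.
    Ids 0 and 1 are reserved for padding and unknown characters (assumption)."""
    counts = Counter(ch for text in texts for ch in text)
    vocab = {"<pad>": 0, "<unk>": 1}
    for ch, _ in counts.most_common(max_size):
        vocab[ch] = len(vocab)
    return vocab


def encode_chars(text, vocab):
    """Turn a string into a list of character ids."""
    return [vocab.get(ch, vocab["<unk>"]) for ch in text]


def bies_tags(words):
    """Tag every character of a segmented sentence with its position in its
    word: B = begin, I = intermediate, E = end, S = single-character word."""
    tags = []
    for word in words:
        if not word:
            continue
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["I"] * (len(word) - 2) + ["E"])
    return tags
```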
Further, a convolutional neural network (CNN) is constructed, comprising a convolutional layer with multiple convolution kernels, a pooling layer (Pool), a fully connected layer (Full_dense), a dropout layer (Dropout), an activation function layer (Relu), a classification layer (Classifier), an optimization layer (Optimizer) and an accuracy calculation layer (Accuracy). The convolutional layer supports multiple convolution kernels, and the training sample text is convolved with 1 × 1, 3 × 3 and 5 × 5 kernels respectively; a switch controls whether a custom highway layer is enabled, and the highway layer itself has two sub-switches that select whether it is implemented with conv1d or with an MLP; another switch controls whether a custom attention layer is enabled; and the convolution results are concatenated (Concat). The Pool layer acts as a weak form of attention. The Full_dense layer adds batch normalization and implements full connection. The Dropout layer randomly discards some neurons to prevent overfitting. The Relu layer applies an activation function that adds nonlinearity and improves the expressive power of the model. The Classifier layer uses Softmax activation to predict the category. The Optimizer layer uses the Adam optimizer. The Accuracy layer outputs the model accuracy. In this embodiment, the neural network is trained on the training samples and its parameters are tuned repeatedly to obtain the final scene classification model; the specific parameters are not described in detail here.
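To make the multi-kernel convolution, pooling and concatenation concrete, the sketch below implements only the inference path of a tiny text CNN in pure Python. It is an illustration under stated assumptions, not the embodiment's network: the "1 × 1, 3 × 3 and 5 × 5" kernels are interpreted as 1-D kernels of width 1, 3 and 5 over the character sequence, max pooling over time is used, weights are random rather than trained, and the highway, attention, batch normalization, dropout and Adam components are omitted. The class and parameter names are hypothetical.

```python
import math
import random


def conv1d_max(seq, filt, bias):
    """Slide one filter (width = len(filt)) over a sequence of embedding
    vectors, apply ReLU, and max-pool over time."""
    width = len(filt)
    best = 0.0  # ReLU output is never below 0
    for start in range(len(seq) - width + 1):
        s = bias
        for offset in range(width):
            s += sum(f * x for f, x in zip(filt[offset], seq[start + offset]))
        best = max(best, s)
    return best


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class TinyTextCNN:
    """Embedding -> parallel conv widths -> max pool -> concat -> dense -> softmax."""

    def __init__(self, vocab_size, emb_dim, n_classes,
                 widths=(1, 3, 5), n_filters=2, seed=0):
        rng = random.Random(seed)

        def rand():
            return rng.uniform(-0.1, 0.1)

        self.emb = [[rand() for _ in range(emb_dim)] for _ in range(vocab_size)]
        # One (weights, bias) pair per filter; weights have shape width x emb_dim.
        self.filters = [
            ([[rand() for _ in range(emb_dim)] for _ in range(w)], 0.0)
            for w in widths
            for _ in range(n_filters)
        ]
        n_feat = len(self.filters)
        self.W = [[rand() for _ in range(n_feat)] for _ in range(n_classes)]
        self.b = [0.0] * n_classes

    def predict_proba(self, token_ids):
        seq = [self.emb[t] for t in token_ids]
        # Concatenate the pooled output of every filter of every width.
        feats = [conv1d_max(seq, f, b) for f, b in self.filters]
        logits = [sum(w * x for w, x in zip(row, feats)) + bias
                  for row, bias in zip(self.W, self.b)]
        return softmax(logits)
```

In practice such a model would be built and trained in a deep learning framework; the pure-Python version only shows how the parallel kernel widths produce a fixed-length feature vector regardless of input length.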
Further, on the basis of the above embodiment, the method further includes a testing step: each service scenario is tested with a common test sample set, and model quality is ensured through joint debugging and testing by multiple parties; the offline targets require a model accuracy of at least 0.92 and an F1 score of at least 0.91. After the model goes online, comprehensive data analysis is performed and an analysis report is generated.
In the text classification method provided by this embodiment, a query request input by a user is received, the query request including a query text; text features are extracted from the query text; and the text features are input into a scene classification model to obtain the service scenario corresponding to the query request. Applying the trained scene classification model allows service scenarios to be classified from the text features of the query text, lowering maintenance cost, reducing reliance on manual work, allowing user intent to be understood and predicted flexibly with high accuracy and reliability, and improving the user experience. Moreover, combining deep learning with machine learning covers more sentence patterns beyond the templates, which makes classification more flexible and predicts service scenarios more effectively.
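The offline acceptance check can be sketched as follows. The embodiment does not state how F1 is averaged across service scenarios; this sketch assumes a macro-averaged F1 (the unweighted mean of per-class F1), and the function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over classes seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


def meets_offline_targets(y_true, y_pred, min_acc=0.92, min_f1=0.91):
    """True if both offline thresholds from the testing step are met."""
    return accuracy(y_true, y_pred) >= min_acc and macro_f1(y_true, y_pred) >= min_f1
```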
Fig. 3 is a structural diagram of a text classification apparatus according to an embodiment of the present invention. The text classification apparatus provided in this embodiment can execute the processing flow of the foregoing text classification method embodiment. As shown in Fig. 3, the apparatus includes a receiving module 41, a feature extraction module 42 and a service scenario classification module 43:
the receiving module 41 is configured to receive a query request input by a user, where the query request includes query text;
a feature extraction module 42, configured to extract text features from the query text;
and a service scene classification module 43, configured to input the text feature into a scene classification model, and obtain a service scene corresponding to the query request.
Further, the apparatus further comprises a channel classification module 44 configured to:
and if the service scene corresponding to the query request is channel query, determining a channel corresponding to the query request and entering the channel.
Further, the channel classification module 44 is configured to:
judging whether the query text meets a preset sentence format and contains preset channel keywords;
if yes, determining a channel corresponding to the query request according to the preset channel key words;
if not, inputting the query text into a fuzzy semantic classification model, and determining a channel corresponding to the query request.
Further, the channel classification module 44 is specifically configured to:
if the query text does not both satisfy the preset sentence format and contain a preset channel keyword, acquiring the similarity between the query text and the corpus of each channel corpus;
and acquiring the channel with the maximum similarity as the channel corresponding to the query request.
Further, the channel classification module 44 is specifically configured to:
acquiring a space vector of a query text according to the query text;
and acquiring the similarity between the space vector of the query text and the space vector of each channel corpus, wherein the space vector of a channel corpus is the space vector of a single document formed by combining all the corpora in that channel corpus.
Further, the channel classification module 44 is further configured to:
determining a channel corresponding to the query request by using a support vector machine; or
And combining the result obtained according to the similarity with the result obtained according to the support vector machine to determine the channel corresponding to the query request.
Further, the channel classification module 44 is further configured to:
before the query text is input into a fuzzy semantic classification model, judging whether the query text is a preset fuzzy sentence or not, if so, determining a channel corresponding to the query request according to a channel corresponding to the preset fuzzy sentence; and/or
And judging whether the query text contains words in a channel keyword library, if so, determining a channel corresponding to the query request according to a channel corresponding to the words in the channel keyword library.
Further, the service scenario classification module 43 is further configured to:
before extracting text features from the query text, determining whether the query text is a word in a product lexicon or a brand lexicon and, if so, determining that the service scene corresponding to the query request is the commodity query service scene.
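The lexicon short-circuit above can be sketched as follows. It assumes the check is an exact match of the whole query against either lexicon. The lexicon contents, the `"product_query"` label, and the `scene_model` callable are hypothetical stand-ins for the real word banks and the trained scene classification model.

```python
from typing import Callable, Set

PRODUCT_WORDS: Set[str] = {"milk", "rice", "headphones"}  # hypothetical product lexicon
BRAND_WORDS: Set[str] = {"acme", "contoso"}               # hypothetical brand lexicon


def classify_scene(query: str, scene_model: Callable[[str], str]) -> str:
    text = query.strip().lower()
    # A bare product or brand word is routed to the commodity query scene
    # without feature extraction or a model call.
    if text in PRODUCT_WORDS or text in BRAND_WORDS:
        return "product_query"
    return scene_model(text)
```

The design choice is that very short queries give a classifier little context, so a dictionary hit is likely both cheaper and more reliable for them.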
Further, the apparatus further comprises a training module configured to:
obtaining training samples for the scene classification model;
and constructing a neural network and training the neural network with the training samples to obtain the scene classification model.
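The patent does not describe the network architecture, features, or training setup. The sketch below therefore shows only the simplest possible instance of "construct a neural network and train it on samples": a single dense layer with softmax output over bag-of-words features, trained with stochastic gradient descent on cross-entropy loss. The function name, hyperparameters, and scene labels are hypothetical, and a production system would likely use a deeper network.

```python
import math
import random
from typing import Callable, List, Tuple


def train_scene_classifier(samples: List[Tuple[str, str]], epochs: int = 200,
                           lr: float = 0.5, seed: int = 0) -> Callable[[str], str]:
    """Train a one-layer softmax network on bag-of-words features with SGD."""
    rng = random.Random(seed)
    vocab = sorted({tok for text, _ in samples for tok in text.lower().split()})
    labels = sorted({label for _, label in samples})
    index = {tok: i for i, tok in enumerate(vocab)}
    W = [[0.0] * len(vocab) for _ in labels]  # weights: one row per scene label
    b = [0.0] * len(labels)                   # biases

    def forward(text: str):
        x = [0.0] * len(vocab)
        for tok in text.lower().split():
            if tok in index:
                x[index[tok]] += 1.0
        logits = [sum(w * xi for w, xi in zip(W[k], x)) + b[k] for k in range(len(labels))]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return x, [e / total for e in exps]

    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)
        for text, label in data:
            x, probs = forward(text)
            y = labels.index(label)
            for k in range(len(labels)):
                # Cross-entropy gradient w.r.t. logit k is p_k - 1[k == y].
                grad = probs[k] - (1.0 if k == y else 0.0)
                b[k] -= lr * grad
                for j, xj in enumerate(x):
                    if xj:
                        W[k][j] -= lr * grad * xj

    def predict(text: str) -> str:
        _, probs = forward(text)
        return labels[max(range(len(labels)), key=probs.__getitem__)]

    return predict
```

The returned `predict` function maps a query text to a scene label and can be passed wherever a scene model callable is expected.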
The text classification device provided in this embodiment of the present invention may be used to execute the method embodiments shown in Fig. 1 and Fig. 2; its specific functions are not repeated here.
The text classification device provided in this embodiment of the present invention receives a query request input by a user, the query request including a query text; extracts text features from the query text; and inputs the text features into a scene classification model to obtain the service scene corresponding to the query request. Because a trained scene classification model is applied, the service scene can be classified directly from the text features of the query text. This reduces maintenance cost and reliance on manual effort, allows user intent to be understood and predicted flexibly, achieves high accuracy and reliability, and improves the user experience.
Fig. 4 is a schematic structural diagram of a text classification device according to an embodiment of the present invention. The text classification device provided by this embodiment may execute the processing flow of the text classification method embodiments. As shown in Fig. 4, the text classification device 50 includes a memory 51, a processor 52, a computer program, and a communication interface 53. The computer program is stored in the memory 51 and is configured to be executed by the processor 52 to implement the text classification method described in the above embodiments.
The text classification device of the embodiment shown in Fig. 4 may be used to implement the technical solutions of the above method embodiments; its implementation principle and technical effects are similar and are not repeated here.
In addition, this embodiment further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the text classification method described in the above embodiments.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a logical functional division, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or of other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
An integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute some of the steps of the methods in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the above division into functional modules is merely an example. In practical applications, the above functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to perform all or some of the functions described above. For the specific working process of the device described above, refer to the corresponding process in the foregoing method embodiments; it is not repeated here.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (20)

1. A method of text classification, comprising:
receiving a query request input by a user, wherein the query request comprises a query text;
extracting text features from the query text; and
inputting the text features into a scene classification model to obtain a service scene corresponding to the query request.
2. The method according to claim 1, wherein after the service scene corresponding to the query request is obtained, the method further comprises:
if the service scene corresponding to the query request is a channel query, determining a channel corresponding to the query request and entering the channel.
3. The method according to claim 2, wherein determining the channel corresponding to the query request comprises:
determining whether the query text conforms to a preset sentence format and contains a preset channel keyword;
if so, determining the channel corresponding to the query request according to the preset channel keyword; and
if not, inputting the query text into a fuzzy semantic classification model to determine the channel corresponding to the query request.
4. The method according to claim 3, wherein inputting the query text into the fuzzy semantic classification model to determine the channel corresponding to the query request comprises:
obtaining a similarity between the query text and the corpora in each channel corpus; and
taking the channel with the greatest similarity as the channel corresponding to the query request.
5. The method according to claim 4, wherein obtaining the similarity between the query text and the corpora in each channel corpus comprises:
obtaining a space vector of the query text from the query text; and
obtaining the similarity between the space vector of the query text and the space vector of each channel corpus, wherein the space vector of a channel corpus is the space vector of a single article formed by combining all corpora in that channel corpus.
6. The method according to claim 4, wherein inputting the query text into the fuzzy semantic classification model to determine the channel corresponding to the query request further comprises:
determining the channel corresponding to the query request by using a support vector machine; or
combining the result obtained from the similarity with the result obtained from the support vector machine to determine the channel corresponding to the query request.
7. The method according to claim 3, further comprising, before inputting the query text into the fuzzy semantic classification model:
determining whether the query text is a preset fuzzy sentence and, if so, determining the channel corresponding to the query request according to the channel corresponding to the preset fuzzy sentence; and/or
determining whether the query text contains a word in a channel keyword library and, if so, determining the channel corresponding to the query request according to the channel corresponding to that word in the channel keyword library.
8. The method according to claim 1, further comprising, before extracting the text features from the query text:
determining whether the query text is a word in a product lexicon or a brand lexicon and, if so, determining that the service scene corresponding to the query request is a commodity query service scene.
9. The method according to any one of claims 1-8, further comprising:
obtaining training samples for the scene classification model; and
constructing a neural network and training the neural network with the training samples to obtain the scene classification model.
10. A text classification apparatus, comprising:
a receiving module, configured to receive a query request input by a user, wherein the query request comprises a query text;
a feature extraction module, configured to extract text features from the query text; and
a service scene classification module, configured to input the text features into a scene classification model to obtain a service scene corresponding to the query request.
11. The apparatus according to claim 10, further comprising a channel classification module configured for:
if the service scene corresponding to the query request is a channel query, determining a channel corresponding to the query request and entering the channel.
12. The apparatus according to claim 11, wherein the channel classification module is configured for:
determining whether the query text conforms to a preset sentence format and contains a preset channel keyword;
if so, determining the channel corresponding to the query request according to the preset channel keyword; and
if not, inputting the query text into a fuzzy semantic classification model to determine the channel corresponding to the query request.
13. The apparatus according to claim 12, wherein the channel classification module is specifically configured for:
if the query text does not both conform to the preset sentence format and contain a preset channel keyword, obtaining a similarity between the query text and the corpora in each channel corpus; and
taking the channel with the greatest similarity as the channel corresponding to the query request.
14. The apparatus according to claim 13, wherein the channel classification module is specifically configured for:
obtaining a space vector of the query text from the query text; and
obtaining the similarity between the space vector of the query text and the space vector of each channel corpus, wherein the space vector of a channel corpus is the space vector of a single article formed by combining all corpora in that channel corpus.
15. The apparatus according to claim 13, wherein the channel classification module is further configured for:
determining the channel corresponding to the query request by using a support vector machine; or
combining the result obtained from the similarity with the result obtained from the support vector machine to determine the channel corresponding to the query request.
16. The apparatus according to claim 12, wherein the channel classification module is further configured for:
before the query text is input into the fuzzy semantic classification model, determining whether the query text is a preset fuzzy sentence and, if so, determining the channel corresponding to the query request according to the channel corresponding to the preset fuzzy sentence; and/or
determining whether the query text contains a word in a channel keyword library and, if so, determining the channel corresponding to the query request according to the channel corresponding to that word in the channel keyword library.
17. The apparatus according to claim 10, wherein the service scene classification module is further configured for:
before extracting text features from the query text, determining whether the query text is a word in a product lexicon or a brand lexicon and, if so, determining that the service scene corresponding to the query request is a commodity query service scene.
18. The apparatus according to any one of claims 10-17, further comprising a training module configured for:
obtaining training samples for the scene classification model; and
constructing a neural network and training the neural network with the training samples to obtain the scene classification model.
19. A text classification apparatus, comprising:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of any one of claims 1-9.
20. A computer-readable storage medium having a computer program stored thereon,
wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-9.
CN201811383555.3A 2018-11-20 2018-11-20 Text classification method, device, equipment and storage medium Pending CN111274382A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811383555.3A CN111274382A (en) 2018-11-20 2018-11-20 Text classification method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811383555.3A CN111274382A (en) 2018-11-20 2018-11-20 Text classification method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111274382A true CN111274382A (en) 2020-06-12

Family

ID=71001312

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811383555.3A Pending CN111274382A (en) 2018-11-20 2018-11-20 Text classification method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111274382A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112100390A (en) * 2020-11-18 2020-12-18 智者四海(北京)技术有限公司 Scene-based text classification model, text classification method and device

Citations (6)

Publication number Priority date Publication date Assignee Title
CN101499277A (en) * 2008-07-25 2009-08-05 中国科学院计算技术研究所 Service intelligent navigation method and system
CN103188408A (en) * 2011-12-29 2013-07-03 上海博泰悦臻电子设备制造有限公司 Voice auto-answer cloud server, voice auto-answer system and voice auto-answer method
US20150052115A1 (en) * 2013-08-15 2015-02-19 Google Inc. Query response using media consumption history
CN106815198A (en) * 2015-11-27 2017-06-09 北京国双科技有限公司 The recognition methods of model training method and device and sentence type of service and device
CN108427722A (en) * 2018-02-09 2018-08-21 卫盈联信息技术(深圳)有限公司 intelligent interactive method, electronic device and storage medium
CN108597519A (en) * 2018-04-04 2018-09-28 百度在线网络技术(北京)有限公司 A kind of bill classification method, apparatus, server and storage medium

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN112100390A (en) * 2020-11-18 2020-12-18 智者四海(北京)技术有限公司 Scene-based text classification model, text classification method and device
CN112100390B (en) * 2020-11-18 2021-05-07 智者四海(北京)技术有限公司 Scene-based text classification model, text classification method and device

Similar Documents

Publication Publication Date Title
CN108363790B (en) Method, device, equipment and storage medium for evaluating comments
CN109101537B (en) Multi-turn dialogue data classification method and device based on deep learning and electronic equipment
CN110704641B (en) Ten-thousand-level intention classification method and device, storage medium and electronic equipment
CN111753060B (en) Information retrieval method, apparatus, device and computer readable storage medium
CN106951422B (en) Webpage training method and device, and search intention identification method and device
CN106202010B (en) Method and apparatus based on deep neural network building Law Text syntax tree
KR102288249B1 (en) Information processing method, terminal, and computer storage medium
CN111241237B (en) Intelligent question-answer data processing method and device based on operation and maintenance service
CN111177569A (en) Recommendation processing method, device and equipment based on artificial intelligence
CN106095845B (en) Text classification method and device
CN109086265B (en) Semantic training method and multi-semantic word disambiguation method in short text
WO2019133506A1 (en) Intelligent routing services and systems
CN108875065B (en) Indonesia news webpage recommendation method based on content
Gao et al. Text classification research based on improved Word2vec and CNN
CN112287672A (en) Text intention recognition method and device, electronic equipment and storage medium
CN111753082A (en) Text classification method and device based on comment data, equipment and medium
CN110858226A (en) Conversation management method and device
CN112330455A (en) Method, device, equipment and storage medium for pushing information
Athindran et al. Comparative analysis of customer sentiments on competing brands using hybrid model approach
CN112464655A (en) Word vector representation method, device and medium combining Chinese characters and pinyin
CN110610003B (en) Method and system for assisting text annotation
CN107545505A (en) Insure recognition methods and the system of finance product information
CN114417823A (en) Aspect level emotion analysis method and device based on syntax and graph convolution network
CN111680501B (en) Query information identification method and device based on deep learning and storage medium
TWI734085B (en) Dialogue system using intention detection ensemble learning and method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination