CN114706956A - Classification information obtaining method, classification information obtaining device, classification information classifying device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN114706956A
Authority
CN
China
Prior art keywords
term
word
query
classification information
words
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210399188.6A
Other languages
Chinese (zh)
Inventor
韩钊
王晓元
姜杰
李玉婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210399188.6A priority Critical patent/CN114706956A/en
Publication of CN114706956A publication Critical patent/CN114706956A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

The disclosure provides a classification information obtaining method, a classification information obtaining device, an electronic device and a storage medium, and relates to the technical field of data processing, in particular to the technical fields of big data and artificial intelligence. The specific implementation scheme is as follows: acquiring a first term; determining, in a query statement, a second term corresponding to the first term, and establishing a correlation between the first term and the second term; and determining the correlation, the first term and the second term as query classification information for classifying query statements. Embodiments of the disclosure can enrich the classification information and improve classification accuracy.

Description

Classification information obtaining method, classification information obtaining device, classification information classifying device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of data processing technologies, in particular to the fields of big data and artificial intelligence, and specifically to methods and apparatuses for obtaining classification information and for classification, an electronic device, and a storage medium.
Background
In search technology, intent recognition is performed on a query statement, a search is performed based on the recognized intent, and the search results are presented to the user.
Query statements may generally be classified using pre-generated terms to determine the intent of the query statement.
Disclosure of Invention
The disclosure provides a classification information obtaining method, a classification method, a classification information obtaining apparatus, a classification apparatus, an electronic device and a storage medium.
According to an aspect of the present disclosure, there is provided a classification information obtaining method, including:
acquiring a first term;
determining, in a query statement, a second term corresponding to the first term, and establishing a correlation between the first term and the second term; and
determining the correlation, the first term and the second term as query classification information for classifying query statements.
According to another aspect of the present disclosure, there is provided a classification method including:
acquiring an input sentence input by a user;
querying, in query classification information, a target term corresponding to the input sentence and a term related to the target term, and determining the type of the input sentence, wherein the query classification information is obtained by the classification information obtaining method according to any embodiment of the disclosure.
According to another aspect of the present disclosure, there is provided a classification information obtaining apparatus including:
a first term acquisition module, configured to acquire a first term;
a term and relation determining module, configured to determine, in a query statement, a second term corresponding to the first term, and to establish a correlation between the first term and the second term; and
a query classification information generation module, configured to determine the correlation, the first term and the second term as query classification information for classifying query statements.
According to another aspect of the present disclosure, there is provided a classification apparatus including:
an input sentence acquisition module, configured to acquire an input sentence input by a user; and
an input sentence classification module, configured to query, in query classification information, a target term corresponding to the input sentence and a term related to the target term, and to determine the type of the input sentence, wherein the query classification information is obtained by the classification information obtaining method according to any embodiment of the present disclosure.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the classification information obtaining method according to any one of the embodiments of the present disclosure or the classification method according to any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the classification information acquisition method according to any one of the embodiments of the present disclosure or the classification method according to any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the classification information acquisition method according to any one of the embodiments of the present disclosure, or the classification method according to any one of the embodiments of the present disclosure.
Embodiments of the disclosure can enrich the classification information and improve classification accuracy.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. In the drawings:
FIG. 1 is a flowchart of a classification information obtaining method disclosed according to an embodiment of the present disclosure;
FIG. 2 is a flowchart of another classification information obtaining method disclosed according to an embodiment of the present disclosure;
FIG. 3 is a flowchart of a classification method disclosed according to an embodiment of the present disclosure;
FIG. 4 is a schematic illustration of another application scenario disclosed according to an embodiment of the present disclosure;
FIG. 5 is a structural diagram of a classification information obtaining apparatus disclosed according to an embodiment of the present disclosure;
FIG. 6 is a structural diagram of a classification apparatus disclosed according to an embodiment of the present disclosure;
FIG. 7 is a block diagram of an electronic device for implementing the classification information obtaining method or the classification method according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of embodiments of the present disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a flowchart of a classification information obtaining method disclosed in an embodiment of the present disclosure, and this embodiment may be applied to a case of generating classification information. The method of this embodiment may be executed by a classification information obtaining apparatus, which may be implemented in a software and/or hardware manner, and is specifically configured in an electronic device with certain data operation capability, where the electronic device may be a client device or a server device, and the client device is, for example, a mobile phone, a tablet computer, a vehicle-mounted terminal, a desktop computer, and the like.
S101, acquiring a first term.
The first term serves as a standard classification term of the query classification information and is expanded into terms of more finely subdivided fields, so that query statements can be classified more accurately. The first term may be extracted from a query statement or may be input by a user. Optionally, acquiring the first term includes at least one of the following: obtaining interest information and extracting the first term from it; and obtaining mid-to-long-tail query statements and extracting the first term from them. The interest information may refer to information that a user is interested in. A mid-to-long-tail query statement is a query statement whose search volume is small but persists over a long time; its search volume is smaller than that of a hot-spot query statement. A search volume that persists over a long time may mean that searches occur throughout a time period of a preset length. Specifically, the interest information of a plurality of users may be clustered, a term representing each cluster may be extracted, and the first term may be determined from the extracted terms. Alternatively, the collected mid-to-long-tail query statements may be screened to obtain those that generate no clicks and have a large page view (PV) count, and word segmentation or entity extraction may be performed on them to obtain the first term. The interest information may be information input by enterprise users. The mid-to-long-tail query statements may be screened from the query statements of enterprise users collected in a search system.
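The screening of mid-to-long-tail query statements described above can be sketched as follows. This is only a minimal illustration: the `QueryStat` record, its field names and the `min_pv` threshold are hypothetical and are not specified in the disclosure, and the subsequent word segmentation or entity extraction step is not shown.

```python
from dataclasses import dataclass


@dataclass
class QueryStat:
    """Hypothetical per-query statistics collected from a search system."""
    text: str
    page_views: int  # PV count of the query statement
    clicks: int      # number of result clicks the query statement produced


def screen_unclicked_high_pv(stats, min_pv=50):
    """Keep query statements that produced no clicks but have a large PV count.

    `min_pv` is an assumed cut-off for what counts as a "large" PV.
    """
    return [s.text for s in stats if s.clicks == 0 and s.page_views >= min_pv]
```

The retained query statements would then be passed to word segmentation or entity extraction to obtain candidate first terms.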
S102, in the query statement, determining a second term corresponding to the first term, and establishing a correlation between the first term and the second term.
A query statement (query) is a statement that a user inputs in order to perform a query. Here, the query statements are input by individual users. A large number of query statements may be collected in advance, and a second term may be determined for each query statement. It should be noted that the query statements are collected with user authorization, in compliance with relevant laws and regulations, and without violating public order and good morals. The second term is a term in the query statement that is related to the first term; specifically, it is a term that expands the first term and has a certain degree of distinction. The second term is used to further subdivide the classification information represented by the first term. That the second term corresponds to the first term may mean that the two are similar but have different semantics. The correlation refers to the relationship between the first term and the second term. It is used to determine one term from the other, for example, to determine the first term from the second term, or to classify further from the first term and determine the second term. The second term may be understood as more finely classified query classification information associated with the first term. The first term and the second term do not exist in isolation: the correlation between them may be represented by a tree structure, in which the first term is the root node and the second terms corresponding to it are its child nodes.
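One possible in-memory representation of this tree-shaped correlation is sketched below. The class name `QueryClassificationInfo` and its methods are hypothetical; the disclosure only states that the correlation can be represented as a tree and used to look up one term from the other.

```python
from collections import defaultdict


class QueryClassificationInfo:
    """Hypothetical container: first terms are roots, second terms are children."""

    def __init__(self):
        self._children = defaultdict(set)  # first term -> its second terms
        self._parent = {}                  # second term -> its first term

    def add_correlation(self, first_term, second_term):
        """Record that `second_term` expands `first_term`."""
        self._children[first_term].add(second_term)
        self._parent[second_term] = first_term

    def second_terms(self, first_term):
        """Look up the second terms from a first term (sorted for stable output)."""
        return sorted(self._children.get(first_term, ()))

    def first_term_of(self, second_term):
        """Look up the first term from a second term, or None if unknown."""
        return self._parent.get(second_term)
```

Keeping both directions of the mapping makes the two lookups named in the text (first term from second term, and second terms from first term) constant-time dictionary accesses.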
S103, determining the correlation, the first terms and the second terms as query classification information for classifying the query sentences.
The query classification information includes terms and the correlations between terms. To classify a query statement, the target term corresponding to the query statement may be looked up in the query classification information, the terms related to the target term may be determined according to the correlations in the query classification information, and the expansion terms corresponding to the query statement may be screened from them. The type of the query statement can thus be determined as the target term together with the expansion terms, which enriches the types of query statements and classifies them more accurately.
In the prior art, query statements are generally classified according to existing classification terms. When a new industry needs to be covered, a large amount of resources must be invested to extract terms of that industry as classification terms before query statements can be classified according to them.
According to the technical solution of the present disclosure, a first term is acquired, a second term corresponding to the first term is extracted from a query statement, a correlation between the first term and the second term is established, and the terms together with the correlation between them are determined as query classification information for classifying query statements. This increases the number of classification terms in the query classification information, widens the range over which query statements can be classified, and improves the classification accuracy of query statements. In addition, terms used for classification can be added according to query statements obtained in real time, which keeps the query classification information up to date.
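The lookup described above might be sketched as follows. This is an assumption-laden illustration: the function name `classify_query` is hypothetical, the correlations are held in a plain dictionary, and plain substring matching stands in for whatever matching the actual system uses.

```python
def classify_query(query, correlations):
    """Return the types of `query` as target terms plus matching expansion terms.

    `correlations` maps a term to the list of terms related to it.
    Substring matching is an assumed, simplified matching rule.
    """
    types = []
    for term, related in correlations.items():
        if term in query:
            if term not in types:
                types.append(term)  # target term found in the query
            for expansion in related:
                # Screen the related terms: keep those that also occur in the query.
                if expansion in query and expansion not in types:
                    types.append(expansion)
    return types
```

For the query "which glucometer measures blood glucose best" and the correlation `{"blood glucose": ["glucometer", "hypoglycemic agent"]}`, the function returns the target term "blood glucose" and the expansion term "glucometer".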
FIG. 2 is a flowchart of another classification information obtaining method disclosed in an embodiment of the present disclosure, which is further optimized and expanded based on the foregoing technical solution, and may be combined with each of the foregoing optional embodiments. In this embodiment, determining, in the query statement, the second term corresponding to the first term is specified as: identifying a first entity in the query statement; acquiring a target keyword corresponding to the first term according to the first entity; and determining the second term according to the target keyword.
S201, acquiring a first word.
S202, a first entity is identified in the query statement.
An entity is identified in the query statement and determined to be a first entity. The first entity is usually a proper noun. Specifically, the first entity may be a noun in the query statement. The entity recognition method may be a dictionary-based method, a statistics-based method, an understanding-based method, and the like. More specifically, a dictionary-based method obtains entities by matching character strings in a dictionary against character strings in the query statement. A statistics-based method, for example one based on a hidden Markov model (HMM), determines a plurality of words with a high probability of occurring adjacently as an entity. An understanding-based method recognizes text based on semantic and syntactic information, for example by inputting the query statement into a pre-trained neural network model that outputs the entities in the query statement. The first entity is used to screen out terms related to the first term so that they can be added to the query classification information.
Illustratively, the query statement is: How effective is hypoglycemic agent A? The first entities identified are: hypoglycemic agent and A.
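The dictionary-based method can be illustrated with a small sketch. The function name `recognize_entities` and the longest-match-first scanning strategy are assumptions made for illustration; the disclosure only says that dictionary strings are matched against strings in the query statement.

```python
def recognize_entities(query, dictionary):
    """Scan `query` left to right and collect dictionary entries that match.

    At each position the longest matching entry wins (an assumed strategy).
    Empty entries are ignored so the scan always advances.
    """
    entries = sorted((e for e in dictionary if e), key=len, reverse=True)
    found = []
    i = 0
    while i < len(query):
        for entry in entries:
            if query.startswith(entry, i):
                found.append(entry)
                i += len(entry)  # skip past the matched entity
                break
        else:
            i += 1  # no entry matches here; move one character forward
    return found
```

With the dictionary `{"hypoglycemic agent", "A"}` and the query "How effective is hypoglycemic agent A?", the scan yields the two entities of the example above. A production system would also need word-boundary handling, which this sketch omits.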
In addition, some query statements are unrelated to the first term, and no target keyword can be extracted from them. Illustratively, the first term is blood glucose, and the query statement is: Where is there a toilet near the XX intersection? This query statement is unrelated to the first term, so no term corresponding to the first term can be extracted from it. Optionally, identifying the first entity in the query statement may include: screening a plurality of query statements collected in advance, and identifying the first entity in the screened query statements. The screening may consist of determining the query statements similar to the first term as the screened query statements. To find them, the similarity between the first term and each query statement may be calculated by a pre-trained deep learning model, or by extracting and comparing the text features of the first term and of the query statement; query statements whose similarity is greater than or equal to a preset similarity threshold are determined to be similar to the first term. The deep learning model may be a neural network model, for example a convolutional neural network model, or a language model such as the Enhanced Representation through Knowledge Integration (ERNIE) model or the Bidirectional Encoder Representations from Transformers (BERT) model. The similarity threshold may be 0.7, where the highest similarity is 1 and the lowest is 0. It should be noted that the similarity threshold should not be too high, because a query statement similar to the first term is usually one that resembles the first term while retaining a certain degree of distinction.
The query statements collected in advance are screened, and the first entity is identified in the screened query statements so that target keywords can then be screened. This reduces the amount of data examined when detecting expansion terms of the first term and improves the accuracy of that detection.
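The threshold-based screening can be sketched as below. The similarity function is passed in by the caller, since the disclosure computes it with a pre-trained model; `token_containment` is only a toy stand-in invented for this example and is not the similarity measure of the disclosure.

```python
def screen_queries(first_term, queries, similarity, threshold=0.7):
    """Keep the query statements whose similarity to `first_term` meets the threshold.

    `similarity` is any callable returning a score in [0, 1].
    """
    return [q for q in queries if similarity(first_term, q) >= threshold]


def token_containment(term, query):
    """Toy stand-in for a model: share of the term's tokens that occur in the query."""
    term_tokens = term.lower().split()
    query_tokens = set(query.lower().replace("?", " ").split())
    if not term_tokens:
        return 0.0
    return sum(t in query_tokens for t in term_tokens) / len(term_tokens)
```

In practice the `similarity` argument would wrap a call to a language model such as ERNIE or BERT; keeping it as a parameter lets the screening logic stay independent of the model.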
S203, according to the first entity, obtaining a target keyword corresponding to the first word.
Among the plurality of first entities, the target keyword is a term that is associated with the first term and has a certain degree of distinction. The target keyword may be an expansion term of the first term, and is used to determine the second term. Illustratively, the first term is blood glucose; with the query statement of the previous example, the target keywords corresponding to the first term are hypoglycemic agent and A. As another example, the first term is mobile phone, and the query statement is: How is the performance of XX-brand mobile phones? The first entities include XX brand and mobile phone, and the target keyword corresponding to the first term is mobile phone.
According to the similarity between the first term and each first entity, the first entities similar to the first term may be screened out from the plurality of first entities and determined as the target keywords corresponding to the first term. Further, the target keyword is different from the first term, so entities identical to the first term may also be removed from the first entities.
Optionally, the obtaining, according to the first entity, a target keyword corresponding to the first term includes: expanding the first word to obtain a similar sentence; respectively extracting the features of the first word and the similar sentence to form a first feature vector; obtaining an average feature vector according to each first feature vector; performing feature extraction on the first entity to form a second feature vector; and screening target keywords corresponding to the first words in each first entity according to the average feature vector and each second feature vector.
A similar sentence refers to text similar to the first term, where the text includes at least one of words and sentences. Expanding the first term may be: obtaining a set of sentences, finding among them the sentences similar to the first term, and determining these as similar sentences of the first term; or obtaining sentences input by a user and determining them as similar sentences; or obtaining historically determined second terms and query statements similar to the first term, and determining them as similar sentences of the first term; and the like. In addition, the query statements similar to the first term may be the query statements obtained by the aforementioned screening.
Feature extraction is performed on the first term to obtain a first feature vector, on each similar sentence to obtain a further first feature vector, and on each first entity to obtain a second feature vector. The average feature vector may be the average of the first feature vectors and is used to describe the features of the first term. A feature vector represents features of text, specifically features describing the semantics and the written form of a word. By performing feature extraction on text, the text can be represented in a standardized form. The feature extraction may be implemented by a feature extraction model, which may be, for example, a support vector machine, a convolutional neural network model, a BERT model, or an ERNIE model.
In fact, the first term consists of a single word, so a first feature vector extracted from it alone can hardly represent its features. If only that vector were matched against the second feature vector of each first entity, the accuracy of the matching result would be low. By adding similar sentences of the first term, extracting a first feature vector from each, and calculating the average feature vector, the average feature vector becomes more representative of the first term, the semantic information of the first term is generalized, and the feature information of the first term is enriched.
Screening the target keywords from the first entities according to the average feature vector and the second feature vectors may be: calculating the similarity between the average feature vector and each second feature vector, and determining as target keywords the first entities whose second feature vectors have a similarity greater than or equal to a preset similarity threshold. The similarity of two vectors may be calculated from the distance between them.
Similar sentences of the first term are obtained, features are extracted from the first term and from each similar sentence to obtain first feature vectors, and these are averaged to obtain the average feature vector. This enriches the feature information of the first term and makes the average feature vector more representative of it. The target keywords are then screened from the first entities based on this enriched average feature vector and on the second feature vectors extracted from each first entity, which enlarges the detection range of target keywords related to the first term and improves the accuracy with which they are detected.
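The averaging and screening steps can be sketched as follows, taking already-extracted feature vectors as input. Cosine similarity is an assumed choice here (the disclosure only says the similarity may be derived from the distance between two vectors), and the function names and the 0.7 default threshold are illustrative.

```python
import math


def mean_vector(vectors):
    """Element-wise average of equally sized feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_similarity(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_target_keywords(first_vectors, entity_vectors, threshold=0.7):
    """Return the first entities whose vector is close to the averaged first-term vector.

    `first_vectors`: vectors of the first term and of its similar sentences.
    `entity_vectors`: dict mapping each first entity to its second feature vector.
    """
    average = mean_vector(first_vectors)
    return [
        entity
        for entity, vector in entity_vectors.items()
        if cosine_similarity(average, vector) >= threshold
    ]
```

In a real pipeline the vectors would come from a feature extraction model such as BERT or ERNIE; two-dimensional toy vectors are enough to exercise the logic.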
S204, determining a second word according to the target keyword, and establishing a correlation between the first word and the second word.
Determining the second term according to the target keyword may be determining the target keyword itself as the second term, or further processing the target keyword to obtain the second term. A correlation is established between the first term and the second term determined from the query statement. In practice, a plurality of query statements may be collected, and different query statements may yield target keywords that have the same written form or the same semantics. The target keywords determined from the plurality of query statements may be deduplicated to reduce redundancy, and the second term may be determined according to the deduplicated target keywords.
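A minimal sketch of the deduplication step is given below. It only handles keywords with the same written form, using case and whitespace normalization as an assumed notion of "same form"; merging keywords with the same semantics would need a similarity model and is not shown. The function name is hypothetical.

```python
def deduplicate_keywords(keywords):
    """Drop keywords whose normalized written form was already seen, keeping order."""
    seen = set()
    unique = []
    for keyword in keywords:
        key = " ".join(keyword.casefold().split())  # normalize case and spacing
        if key and key not in seen:
            seen.add(key)
            unique.append(keyword)
    return unique
```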
Optionally, the determining a second term according to the target keyword includes: extracting a second entity corresponding to the target keyword from the query statement; and determining a second word according to the target keyword and the second entity.
The second entity is usually a proper noun. Specifically, the second entity may be a product name. The second entity corresponding to the target keyword may be a second entity that belongs to the type of the target keyword. Determining the second term according to the target keyword and the second entity may be determining both the target keyword and the second entity as second terms, or determining only the second entity as the second term. Illustratively, the query statement is: Can A lower blood glucose? The target keyword is hypoglycemic agent, and the second entity corresponding to the target keyword is A. As another example, the query statement is: How effective is a certain glucometer? The target keyword is glucometer, and the extracted second entity is that certain glucometer. The second entity may also be, for example, a photoelectric glucometer, a photochemical glucometer, or an XX-brand glucometer.
For example, the second entity corresponding to the target keyword may be extracted from the query statement by a pre-trained neural network model. The input of the model is the query statement and the target keyword, and its output is the second entity in the query statement. The neural network model may be, for example, a convolutional neural network, a generative adversarial network, a graph neural network, or the like. More specifically, the neural network model may be a mixture density network (MDN) model.
Extracting the second entity corresponding to the target keyword from the query statement in effect expands the target keyword further: the second entity to be queried is determined and a correlation is established between it, the target keyword and the first term, which further enriches the query classification information.
There is at least one query statement, and at least one target keyword is determined from each query statement. The target keywords may be pooled, and each keyword may be used to extract entities from each query statement to obtain the second entities.
By identifying the second entity in the query statement based on the target keyword, the target keyword is further expanded, and the second entity is added to the query classification information as a second term related to the first term. This increases the classification information related to the first term and widens the range and improves the precision of the query classification information, so that the classification accuracy of query statements is improved.
Optionally, the establishing a correlation between the first word and the second word includes: establishing a first-level correlation between the first words and the target keywords; and establishing a second-level correlation between the target keywords and the corresponding second entities.
The correlations may include multiple levels of correlations. The first level of correlation is used to indicate that the target keyword is an expanded classified word of the first word, and the second level of correlation is used to indicate that the second entity is an expanded classified word of the target keyword. In practice, the first term is subdivided into a more specific plurality of target keywords, each of which may in turn be subdivided into a more specific plurality of second entities. The first term may be understood as a parent, the target keyword is a child of the first term, the second entity is a child of the target keyword, and the target keyword is a parent of the second entity.
For example, when the type of a query statement is determined to be a target second entity, the type of the query statement may further include the target keyword related to that second entity and the first term related to that target keyword, so that the classification information of the query statement is increased. In addition, when the type of a query statement is determined to be a target keyword, the query statement may be further checked against the second entities related to that target keyword. This increases the classification information of the query statement, classifies it more accurately, refines the classification granularity, and allows the classification of query statements to be adjusted flexibly.
By establishing the correlation among a plurality of hierarchies of the first term, the target keyword and the second entity, the correlation among the classified terms of the query classification information can be enriched so as to accurately classify the query sentence.
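The two-level structure described above can be sketched as a small data structure. This is a minimal illustration only: the function name `build_classification_info`, the dictionary layout and the tuple form `(level, parent, child)` are assumptions made for the example and are not taken from the disclosure.

```python
def build_classification_info(first_term, keyword_to_entities):
    """Collect terms and leveled correlations for one first term.

    keyword_to_entities maps each target keyword to its second entities.
    Level 1 links the first term to a target keyword; level 2 links a
    target keyword to one of its second entities.
    """
    terms = {first_term}
    relations = []
    for keyword, entities in keyword_to_entities.items():
        terms.add(keyword)
        relations.append((1, first_term, keyword))
        for entity in entities:
            terms.add(entity)
            relations.append((2, keyword, entity))
    return {"terms": terms, "relations": relations}


# Toy data reusing the blood glucose example from the description.
info = build_classification_info(
    "blood glucose",
    {
        "glucometer": ["photoelectric glucometer"],
        "hypoglycemic agent": ["A"],
    },
)
```

Storing the level with each relation keeps the parent/child direction explicit, so a later lookup can move from a second entity up to its target keyword and first term.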
S205, determining the correlation, the first terms and the second terms as query classification information for classifying the query sentences.
According to the technical scheme of the present disclosure, the query sentence is segmented to obtain first entities, the target keywords corresponding to the first term are screened from the first entities, and the second terms are determined from them. In this way, target keywords related to the first term can be obtained accurately from real-time query sentences and used as classification words for classifying query sentences. The classification words expanded from the first term are obtained accurately, the specific classification branches of the first term are enriched, the classification range is widened, and the classification accuracy of query sentences is improved.
Fig. 3 is a flowchart of a classification method disclosed in an embodiment of the present disclosure, which may be applied to a case where query statements are classified according to query classification information. The method of this embodiment may be executed by a classification apparatus, which may be implemented in a software and/or hardware manner and is specifically configured in an electronic device with certain data operation capability, where the electronic device may be a client device or a server device, and the client device may be, for example, a mobile phone, a tablet computer, a vehicle-mounted terminal, a desktop computer, and the like.
S301, acquiring an input sentence input by a user.
The input sentence is a query sentence input by the user. A large number of user input sentences can be acquired. The input sentences are obtained in compliance with relevant laws and regulations and do not violate public order and good customs.
S302, in the query classification information, querying a target term corresponding to the input sentence and a term related to the target term, and determining the type of the input sentence, wherein the query classification information is obtained according to the classification information obtaining method according to any embodiment of the present disclosure.
The query classification information includes terms and the correlations between them. Note that only some of the terms in the query classification information need to have a correlation. Specifically, the query classification information includes a first term that has a correlation with a second term, and may further include a third term that has no correlation with any other term. The target term is a term that is similar to the input sentence and is used to determine the type of the input sentence. The query classification information can be understood as a classification lexicon: querying the target term of an input sentence in it amounts to classifying the input sentence, and the type of the input sentence is determined to be the target term.
The target term corresponding to the input sentence is queried in the query classification information, and the terms related to the target term are determined according to the correlations of the target term. The target term and its related terms are determined as the type of the input sentence. In addition, the query can continue along the correlations of those related terms; the terms found this way are also treated as related terms of the target term and used to determine the type of the input sentence.
For example, the target term corresponding to the input sentence may be queried using a rule-based classification method and/or a model-based classification method. In the rule-based method, the input sentence is segmented, each resulting segment is matched against the terms in the query classification information, and the terms identical to a segment, together with their related terms, are determined as the target terms of the input sentence. In the model-based method, a machine learning model is trained in advance, and the input sentence is fed to the trained model to obtain the corresponding target term. When both methods are used, each yields its own target terms for the input sentence; the combined target terms can then be post-processed, for example de-duplicated, to produce the updated target terms of the input sentence.
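Combining the two methods and removing duplicates can be sketched as follows. This is a hedged illustration: `rule_based_targets`, `merge_targets`, the whitespace segmenter and the hard-coded model output are all assumptions for the example; the disclosure does not specify a segmenter or a merge order.

```python
def rule_based_targets(sentence, lexicon, segment=str.split):
    """Segment the sentence and keep segments identical to a lexicon term."""
    return [token for token in segment(sentence) if token in lexicon]


def merge_targets(*target_lists):
    """Merge target terms from several methods, dropping repeats in order."""
    seen = set()
    merged = []
    for targets in target_lists:
        for term in targets:
            if term not in seen:
                seen.add(term)
                merged.append(term)
    return merged


# Hypothetical lexicon and a stand-in for the output of a trained model.
lexicon = {"glucometer", "insulin"}
rule_hits = rule_based_targets("glucometer price", lexicon)
model_hits = ["glucometer", "insulin"]  # assumed model prediction
targets = merge_targets(rule_hits, model_hits)
```

Keeping insertion order makes the result deterministic, which is convenient when the merged target terms are later used as labels.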
Optionally, the classification method further includes: and classifying the user according to the type of the input statement.
The target terms corresponding to an input sentence are taken as the target terms of the user who entered it. The target terms can be understood as label information of the user, and users are classified according to their target terms. Classifying a large number of users yields user clusters of different types, and the type of each cluster can be represented by the target term shared by the users in it. That is, the target term shared by the users in a cluster can be determined as the label information of the cluster, so that users are classified accurately and their types are determined. A user cluster represents users who follow the same topic, and its users can be processed as the application scenario requires. For example, information associated with the cluster's target term can be pushed to the users in the cluster, and the push frequency can be determined by the number of users in the cluster. As another example, when there is information to be pushed, the user cluster corresponding to that information is looked up by the clusters' target terms, and the information is sent to each user in the cluster, which achieves accurate information pushing. Other application scenarios exist and can be handled according to the actual situation.
Classifying users by the types of their input sentences achieves accurate user classification, allows users who follow the same topic to be processed as suits the application scenario, and improves the accuracy of data processing.
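The grouping of users into clusters by shared target term can be sketched as an inverted index. The names `cluster_users` and `audience_for` and the sample user identifiers are hypothetical and only illustrate the idea.

```python
from collections import defaultdict


def cluster_users(user_targets):
    """Map each target term to the set of users whose sentences matched it."""
    clusters = defaultdict(set)
    for user, targets in user_targets.items():
        for term in targets:
            clusters[term].add(user)
    return dict(clusters)


def audience_for(clusters, push_term):
    """Return the users in the cluster labeled with push_term, sorted."""
    return sorted(clusters.get(push_term, ()))


# Hypothetical users and the target terms derived from their input sentences.
user_targets = {
    "user_1": ["blood glucose", "glucometer"],
    "user_2": ["blood glucose"],
    "user_3": ["hair loss"],
}
clusters = cluster_users(user_targets)
recipients = audience_for(clusters, "blood glucose")
```

The size of each cluster, `len(clusters[term])`, is then available for decisions such as the push frequency mentioned above.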
Optionally, the query classification information includes terms and correlation relationships between terms; the querying of the target terms corresponding to the input sentence and the terms related to the target terms comprises: determining terms to be updated in terms included in the query classification information according to term length and term semantics; adding related words in the words to be updated according to the related relations among the words, and updating the words to be updated; and inputting the input sentence into a pre-trained classification model, and outputting a target word corresponding to the input sentence according to the updated word to be updated.
In practice, the terms suited to rule-based classification differ from those suited to model-based classification. Generally, terms with simple wording and a single meaning are suitable as classification words for the rule-based classification method, whereas longer terms and polysemous terms are suitable as classification words for the model-based classification method. For example, XX may be a brand and at the same time be understood as a food. Classification words suited to the rule-based method may be designated rule words, and those suited to the model-based method may be designated model words. The model-based method generally classifies according to the semantic information of a term, and its accuracy on polysemous terms is low. Constraint information can therefore be added to a model word so that it has a single, clearer meaning, which improves the classification accuracy of the model. Illustratively, adding "food" to XX yields "XX food", which unambiguously denotes a food.
The terms in the query classification information are divided, according to each term's length and semantics, into terms to be updated and terms not to be updated. The terms to be updated are those that are shorter and/or have multiple meanings; the terms not to be updated are those that are longer and have a single meaning. The terms to be updated can be understood as the aforementioned model words, and the terms not to be updated as the aforementioned rule words.
For each term to be updated, a related term can be added according to the correlations between terms in the query classification information, so that a semantic constraint is attached and the semantics of the term become more precise. Specifically, the correlations include first-level and second-level correlations, and their priority may be preset; the correlation with the higher priority is selected, the related term to be added is determined from it, and that related term is added to the term to be updated. Illustratively, the term linked to the term to be updated by its high-priority correlation is taken as the related term. It is understood that the term of the parent node of the term to be updated is added, not the term of a child node.
Illustratively, a first-level correlation exists between blood glucose and glucometer and between blood glucose and hypoglycemic agent, and a second-level correlation exists between glucometer and photoelectric glucometer and between hypoglycemic agent and A. For example, if the term to be updated is A, its related term is hypoglycemic agent, and adding it to A yields "hypoglycemic agent A". As another example, if the term to be updated is glucometer and the first-level correlation has a higher priority than the second-level correlation, the related term in the first-level correlation of glucometer is blood glucose, so blood glucose is added to glucometer to obtain "blood glucose glucometer".
The pre-trained classification model is used to find, among a plurality of terms, the target term corresponding to the input sentence and to determine that target term as the type of the input sentence. In the embodiment of the present disclosure, the classification model queries the target term corresponding to the input sentence among the terms of the query classification information and the updated terms. When the target term of the input sentence is an updated term, the terms correlated with it can be determined from the correlations that the term had in the query classification information before updating, and those terms are also determined as target terms of the input sentence. The classification model may be a deep learning model, for example an ERNIE model.
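The update step in this example can be sketched as prefixing each term with its parent term. This is an assumption-laden illustration: the `PARENT` mapping is toy data built from the example, `update_term` is a hypothetical name, and joining with a space is one possible way to splice the two terms. Because the description states that parent-node terms are added rather than child-node terms, the sketch only follows child-to-parent links.

```python
# Child -> parent links taken from the blood glucose example (toy data).
PARENT = {
    "glucometer": "blood glucose",                # first-level correlation
    "hypoglycemic agent": "blood glucose",        # first-level correlation
    "photoelectric glucometer": "glucometer",     # second-level correlation
    "A": "hypoglycemic agent",                    # second-level correlation
}


def update_term(term, parent=PARENT):
    """Prefix a term with its parent term as a semantic constraint.

    Terms without a parent (for example a first term) are returned unchanged.
    """
    parent_term = parent.get(term)
    return f"{parent_term} {term}" if parent_term else term


updated = {term: update_term(term) for term in ["A", "glucometer", "blood glucose"]}
```

The updated strings, rather than the bare terms, would then be the labels that a classification model compares against the input sentence.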
Terms in the query classification information are classified by term length and semantics to determine the terms to be updated. Terms related to them in the query classification information are then added to the terms to be updated as semantic constraints, so that their semantics become more precise. Classifying input sentences based on the updated terms improves the classification accuracy of the classification model.
In addition, when querying the terms related to the target term, the high-priority correlation of the target term may be queried to determine its related terms. Illustratively, a first-level correlation exists between blood glucose and glucometer and between blood glucose and hypoglycemic agent, and a second-level correlation exists between glucometer and photoelectric glucometer and between hypoglycemic agent and A. The first-level correlation has a higher priority than the second-level correlation. If the target term is A, its related term is hypoglycemic agent, and the related term in the first-level correlation of hypoglycemic agent is blood glucose, so blood glucose, hypoglycemic agent and A can all be determined as the type of the input sentence. As another example, if the target term is glucometer, the related term in its first-level correlation is blood glucose, so blood glucose is the related term of glucometer.
According to the technical scheme of the present disclosure, query classification information is obtained and the target term of an input sentence is determined as the type of the input sentence; because the input sentence is classified against the rich set of terms in the query classification information, the classification accuracy of input sentences is improved.
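Resolving the full type of an input sentence from a single target term can be sketched as a walk from the term up to its first term. The mapping and the function name `sentence_types` are hypothetical and reuse the toy hierarchy from the example.

```python
# Child -> parent links from the example (toy data, not from a real lexicon).
PARENT_OF = {
    "glucometer": "blood glucose",
    "hypoglycemic agent": "blood glucose",
    "photoelectric glucometer": "glucometer",
    "A": "hypoglycemic agent",
}


def sentence_types(target_term, parent_of=PARENT_OF):
    """Return the target term followed by every ancestor up to the first term."""
    types = [target_term]
    while types[-1] in parent_of:
        types.append(parent_of[types[-1]])
    return types


types_for_a = sentence_types("A")
types_for_meter = sentence_types("glucometer")
```

A sentence matched to a specific product thus also counts toward the broader keyword and first-term clusters.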
Fig. 4 is a schematic diagram of another application scenario disclosed in accordance with an embodiment of the present disclosure. The method can comprise the following steps:
First, construction of the query classification information:
Obtaining a first term: from the recent search query sentences associated with the enterprise user, collect the medium- and long-tail query sentences that were not clicked, and screen out those whose page view count is greater than a preset page view threshold. Acquire the interest information entered by the enterprise user. Extract the first term from at least one of the screened query sentences, the interest information, and the like. The following explanation uses the interest group for the first term "blood glucose" as an example.
In practice, if the first term "blood glucose" alone is compared for similarity with query sentences, recall is insufficient: terms such as "insulin" and "diabetes", or certain hypoglycemic drug products, are missed. This leads to insufficient coverage of the user clusters and inaccurate user classification. The first term therefore needs to be semantically expanded.
A plurality of query sentences collected in advance are screened, and first entities are identified in the screened query sentences. Again taking "blood glucose" as an example: similarity is evaluated between "blood glucose" and the full set of query sentences from a single day. Specifically, a task-oriented ERNIE-sim model can be used to compute the similarity between the first term and each collected query sentence; query sentences whose similarity exceeds a similarity threshold (for example, 0.7) are selected as the basic expanded query sentences of the first term for subsequent first-entity identification. To obtain expanded terms that are reasonably distinct from the first term "blood glucose", the similarity threshold should not be too high. After the basic expanded query sentences are obtained, first entities are extracted from them. For example, from the three query sentences "current blood glucose standard", "hyperglycemia hypoglycemic agent" and "price of electronic glucometer", two first entities can be identified: "hypoglycemic drugs" and "blood glucose meters".
A target keyword corresponding to the first term is then acquired according to the first entities. Specifically, the first term is expanded to obtain similar sentences; features are extracted from the first term and from each similar sentence to form first feature vectors; an average feature vector is obtained from the first feature vectors; features are extracted from each first entity to form a second feature vector; and the target keywords corresponding to the first term are screened from the first entities according to the average feature vector and each second feature vector.
Similarity is evaluated between each first entity and the first term "blood glucose"; the purpose of this step is to pull in other entities related to "blood glucose". The feature vector of the first term alone can hardly express all the information the term carries, so the screened query sentences that have a certain similarity to "blood glucose", that is, the similar sentences, are used as supplementary information, and their feature vectors are extracted as well. To keep the vector dimension consistent for the subsequent similarity calculation, all these feature vectors are summed and averaged, and the resulting average feature vector is used as the feature vector of the first term "blood glucose". Similarity is then evaluated between the average feature vector and the second feature vector extracted from each first entity. The similarity method and the similarity threshold may be the ERNIE-sim model and the similarity threshold described above.
Illustratively, when no similar sentences are expanded and similarity is evaluated directly between the first feature vector of the first term "blood glucose" and the second feature vectors, the target keywords screened for "blood glucose" are referred to as the target keywords before transformation. When similar sentences are expanded and similarity is evaluated between the average feature vector of "blood glucose" and the second feature vectors, the target keywords screened for "blood glucose" are referred to as the target keywords after transformation. Table 1 compares the target keywords obtained for the first term "blood glucose" before and after transformation:
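The averaging and screening step can be sketched with plain vectors. This is only a stand-in: the description uses an ERNIE-sim model for similarity, whereas the sketch substitutes cosine similarity over hand-written toy vectors, and the names `mean_vector`, `cosine` and `screen_keywords` are hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise average of equally sized vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def screen_keywords(first_vectors, entity_vectors, threshold=0.7):
    """Keep first entities whose vector is close to the averaged first-term vector."""
    average = mean_vector(first_vectors)
    return [
        name
        for name, vector in entity_vectors.items()
        if cosine(average, vector) > threshold
    ]


# Toy 2-d vectors: the first term plus one similar sentence (assumed values).
first_vectors = [[1.0, 0.0], [0.8, 0.6]]
entity_vectors = {"glucometer": [1.0, 0.2], "weather": [0.0, 1.0]}
keywords = screen_keywords(first_vectors, entity_vectors)
```

Averaging keeps a single fixed-size vector for the first term regardless of how many similar sentences were collected, which is the dimension-consistency point made above.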
TABLE 1
Before transformation | After transformation
Hypoglycaemia | Hypoglycaemia
Hypoglycemic agent | Hypoglycemic agent
Hyperglycemia | Hyperglycemia
Symptoms of hyperglycemia | Symptoms of hyperglycemia
Measuring blood sugar | Blood sugar measurement
Fasting blood glucose | Fasting blood sugar
Diabetes mellitus | Diabetes mellitus
Treatment of diabetes | Treatment of diabetes
Glucose | Blood glucose meter
As can be seen from Table 1, the target keywords obtained before and after transformation differ, and the last row shows that the target keyword after transformation is more closely related to blood glucose.
A second entity corresponding to the target keyword is extracted from the query sentences, and a second term is determined according to the target keyword and the second entity. This step mainly uses entity extraction to identify, in the screened query sentences, the product entities corresponding to a target keyword and determines them as second entities. For example, for the target keyword "hypoglycemic agent", the corresponding specific product names, such as A and B, can be extracted. Illustratively, custom entity recognition can be implemented with the entity recognition operator NLPC-MONET in a Natural Language Processing task (NLPC) based on a C Language: the screened query sentences and the target keyword are input, and the NLPC-MONET operator recognizes the second entities in the screened query sentences according to the target keyword. For example, the second entity "A" may be extracted from the query sentence "A can lower blood glucose". Table 2 lists second entities identified in the query sentences according to the target keywords:
TABLE 2
Entity word | Brand word
Blood glucose meter | Three-type blood glucose meter
Blood glucose meter | Sannuo blood glucose meter
Blood glucose meter | Lep glucometer
Blood glucose meter | Photochemical glucometer
Blood glucose meter | Photoelectric blood glucose meter
Blood glucose meter | Photoelectric blood glucose meter
Hypoglycemic agent | Dola glycopeptides
Hypoglycemic agent | Pioglitazone
Hypoglycemic agent | Anda Tang Dynasty
The embodiment of the disclosure can automatically detect whether a certain communication tool exists on the website and monitor corresponding communication behavior data.
Establishing a first-level correlation between the first words and the target keywords; establishing a second-level correlation between the target keywords and the corresponding second entities; and determining the correlation relation, the first terms and the second terms as query classification information for classifying the query sentences. A set of target keywords and second entities corresponding to the first term are obtained according to the first term. In addition, a plurality of first terms can be obtained, and a batch of corresponding target keywords and second entities can be expanded for each first term.
After the query classification information is constructed, users can be classified according to the input sentences they enter. In the classification process, a rule-based classification method and a model-based classification method can be used to query the target term corresponding to the input sentence and the terms related to it, determine the type of the input sentence, and classify the user according to that type. Users of the same class are determined as a user cluster, and the type or label information of the cluster is determined according to the types of the input sentences entered by its users.
In the query classification information, the terms suited to rule-based classification differ from the terms suited to model-based classification. In general, the terms suited to model-based classification include the terms suited to rule-based classification.
For rule words: in the example of the first term "blood glucose", the target keywords can essentially all be used for rule-based judgment, and among the second entities, certain hypoglycemic drug products or glucometer products can also serve as rule words.
For example, when constructing a "hair loss" interest group with a second entity such as "hair growth", using substring containment (in) as the matching criterion easily pulls in false matches (bad cases) such as "student development", whose character string contains the entity as a substring, and this harms the actual effect. Therefore, for rule-based classification, the query sentence must first be segmented into words. "Student development" is segmented into "student" and "development" and then matched against the rule words, which filters out false matches that arise only because word boundaries are ignored. That is, the input sentence is segmented, the resulting segments are looked up in the query classification information, the terms identical to a segment are found, and those terms are determined as the target terms of the input sentence.
For model words: the target keywords can be used as model words. Note, however, that because many second entities are identified, terms that are easily misjudged (bad cases) readily arise at this step. Therefore, when a second entity is used, the extracted target keyword is added as a prefix and spliced with the second entity to serve as the basis for similarity judgment, which reduces the influence of easily misjudged terms. Equivalently, the terms to be updated, namely the model words, are determined in the constructed query classification information, the terms related to them are determined according to the correlations in the query classification information, and those related terms are added to the terms to be updated. The classification model then classifies input sentences based on the updated terms.
The model-based classification method can determine the target term corresponding to the input sentence based on the updated terms and the query classification information, determine a first related term according to the correlations of the target term in the query classification information, and determine the first related term as the type of the input sentence.
Through the above operations, the type of a user can be determined from the query sentences the user enters, and it can be detected whether the user belongs to the user cluster that follows a specific topic (term).
According to the technical scheme of the present disclosure, classification information can be added so that the resulting user clusters cover many subdivided fields. Fields corresponding to trending topics are generated automatically in real time from trending-topic information, users of the corresponding types are quickly identified, and user clusters following those topics are generated. This improves the accuracy, timeliness and flexibility of user classification.
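The difference between substring matching and segment-based matching can be sketched as follows. Several things here are assumptions: the Chinese strings 生发 ("hair growth") and 学生发展 ("student development") are an assumed reconstruction of the example, the small vocabulary is invented, and forward maximum matching is used only as a simple stand-in segmenter; the disclosure does not name a segmentation algorithm.

```python
def forward_max_match(text, vocab, max_len=4):
    """Greedy left-to-right segmentation; unknown characters become single tokens."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def substring_match(sentence, rule_words):
    """Naive containment check: prone to false matches across word boundaries."""
    return sorted(word for word in rule_words if word in sentence)


def segmented_match(sentence, rule_words, vocab):
    """Match rule words only against whole segments of the sentence."""
    return [token for token in forward_max_match(sentence, vocab) if token in rule_words]


# Assumed toy vocabulary and rule word (hypothetical data).
VOCAB = {"学生", "发展", "生发", "产品"}
RULE_WORDS = {"生发"}

naive = substring_match("学生发展", RULE_WORDS)            # false match
strict = segmented_match("学生发展", RULE_WORDS, VOCAB)    # no match
valid = segmented_match("生发产品", RULE_WORDS, VOCAB)     # genuine match
```

In practice the segmenter would be whatever word segmentation tool the system already uses; the point of the sketch is only that matching happens on whole segments.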
Fig. 5 is a structural diagram of a classification information acquisition apparatus according to an embodiment of the present disclosure, which is suitable for a case of generating classification information. The device is realized by software and/or hardware and is specifically configured in electronic equipment with certain data operation capacity.
A classification information acquisition apparatus 500 shown in fig. 5 includes: a first term obtaining module 501, a term and relationship determining module 502 and a query classification information generating module 503, wherein:
a first word obtaining module 501, configured to obtain a first word;
a term and relationship determining module 502, configured to determine, in the query statement, a second term corresponding to the first term, and establish a correlation relationship between the first term and the second term;
a query classification information generating module 503, configured to determine the correlation relationship, the first term, and the second term as query classification information for classifying the query statement.
According to the technical scheme of the present disclosure, the first term is obtained, the second term corresponding to the first term is extracted from the query sentences, the correlation between the first term and the second term is established, and the terms and their correlations are determined as the query classification information for classifying query sentences. This increases the number of classification words in the query classification information, widens the classification range, and improves the classification accuracy of query sentences. Classification words can also be added from query sentences obtained in real time, which improves the timeliness of the query classification information.
Further, the term and relationship determination module includes: a first entity obtaining unit, configured to identify a first entity in the query statement; the keyword screening unit is used for acquiring a target keyword corresponding to the first word according to the first entity; and the second word determining unit is used for determining a second word according to the target keyword.
Further, the second word determination unit includes: a second entity obtaining unit, configured to extract a second entity corresponding to the target keyword from the query statement; and the second word generation subunit is used for determining a second word according to the target keyword and the second entity.
Further, the term and relationship determination module includes: the first-level correlation relationship establishing unit is used for establishing a first-level correlation relationship between the first word and the target keyword; and the second-level correlation relationship establishing unit is used for establishing a second-level correlation relationship between the target key words and the corresponding second entities.
Further, the keyword screening unit includes: the first word expansion subunit is used for expanding the first word to obtain a similar sentence; the first feature extraction subunit is used for respectively extracting features of the first word and the similar sentence to form a first feature vector; the average vector calculation subunit is used for obtaining an average characteristic vector according to each first characteristic vector; the second feature extraction subunit is used for performing feature extraction on the first entity to form a second feature vector; and the target keyword determining subunit is configured to, according to the average feature vector and each of the second feature vectors, obtain, in each of the first entities, a target keyword corresponding to the first word by screening.
The classification information acquisition device can execute the classification information acquisition method provided by any embodiment of the disclosure, and has the corresponding functional modules and beneficial effects of executing the classification information acquisition method.
Fig. 6 is a structural diagram of a classification device in an embodiment of the present disclosure, and the embodiment of the present disclosure is applied to a case of classifying an input sentence. The device is realized by software and/or hardware and is specifically configured in electronic equipment with certain data operation capacity.
A classification apparatus 600 shown in fig. 6 includes: an input sentence acquisition module 601 and an input sentence classification module 602, wherein:
an input sentence acquisition module 601, configured to acquire an input sentence input by a user;
an input sentence classification module 602, configured to query, in query classification information, a target term corresponding to the input sentence and a term related to the target term, and determine a type of the input sentence, where the query classification information is obtained according to the classification information obtaining method according to any embodiment of the present disclosure.
According to this technical scheme, the query classification information is obtained and the target word corresponding to the input sentence is determined as the type of the input sentence. Because the input sentence is classified against the rich set of words included in the query classification information, the classification accuracy can be improved.
Further, the classification device further includes: and the user classification module is used for classifying the users according to the types of the input sentences.
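A minimal sketch of the lookup performed by module 602 and of the user classification module follows. It assumes the query classification information can be represented as a mapping from each target word to its related words, and that a sentence matches a word when the word appears as a substring; both the data layout and the matching rule are illustrative assumptions, not details given in the disclosure. The function names are hypothetical.

```python
from collections import Counter


def classify_sentence(sentence, classification_info):
    """Return the target word whose own text or related words best match."""
    text = sentence.lower()
    best_term, best_hits = None, 0
    for target, related in classification_info.items():
        # Count how many of the target word and its related words occur.
        hits = sum(1 for term in [target, *related] if term.lower() in text)
        if hits > best_hits:
            best_term, best_hits = target, hits
    return best_term


def classify_user(sentences, classification_info):
    """Assumed rule: label a user with the most frequent sentence type."""
    types = [classify_sentence(s, classification_info) for s in sentences]
    counts = Counter(t for t in types if t is not None)
    return counts.most_common(1)[0][0] if counts else None


# Hypothetical query classification information.
info = {
    "loan": ["interest rate", "repayment"],
    "insurance": ["premium", "claim"],
}
print(classify_sentence("What is the interest rate on a car loan?", info))
print(classify_user(["car loan rate", "loan repayment plan", "insurance premium"], info))
```

Counting matches over both the target word and its related words is what lets a sentence that never mentions the target word itself still be assigned to it.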
Further, the query classification information includes words and correlation relations between the words. The input sentence classification module 602 includes: a to-be-updated word acquiring unit, configured to determine, among the words included in the query classification information, the words to be updated according to word length and word semantics; a classification information updating unit, configured to add related words to the words to be updated according to the correlation relations between the words, thereby updating the words to be updated; and a sentence classification unit, configured to input the input sentence into a pre-trained classification model and output the target word corresponding to the input sentence according to the updated words.
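The update step can be sketched as follows. This passage does not say how word length and word semantics are evaluated, so the sketch assumes a word needs updating when it is shorter than a hypothetical `min_length` or appears in a caller-supplied set of ambiguous words, and it assumes "adding related words" means appending them to the label text. The pre-trained classification model itself is not sketched.

```python
def expand_labels(terms, relations, min_length=4, ambiguous=frozenset()):
    """Return a mapping from each word to its (possibly enriched) label text.

    terms:      words included in the query classification information
    relations:  word -> list of related words (the correlation relations)
    min_length: assumed length criterion; shorter words get updated
    ambiguous:  assumed semantic criterion; listed words get updated
    """
    expanded = {}
    for term in terms:
        needs_update = len(term) < min_length or term in ambiguous
        related = relations.get(term, [])
        if needs_update and related:
            # Enrich a short or ambiguous label with its related words.
            expanded[term] = f"{term} ({', '.join(related)})"
        else:
            expanded[term] = term
    return expanded


labels = expand_labels(
    ["ETF", "mortgage"],
    {"ETF": ["fund", "index"], "mortgage": ["house"]},
)
print(labels)
```

The enriched labels would then serve as the candidate outputs handed to the classification model, giving it more context for short or ambiguous words.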
The classification device can execute the classification method provided by any embodiment of the disclosure, and has corresponding functional modules and beneficial effects for executing the classification method.
In the technical scheme of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other handling of the personal information of the users involved all comply with the relevant laws and regulations and do not violate public order and good customs.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 7 illustrates a schematic block diagram of an example electronic device 700 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processors, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the device 700 comprises a computing unit 701, which may perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM)702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 can also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
Various components in the device 700 are connected to the I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, or the like; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 701 may be any of a variety of general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 701 executes the respective methods and processes described above, such as the classification information acquisition method or the classification method. For example, in some embodiments, the classification information acquisition method or the classification method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of a computer program may be loaded onto and/or installed onto the device 700 via the ROM 702 and/or the communication unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the classification information acquisition method or the classification method described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the classification information acquisition method or the classification method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), system on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims (19)

1. A classification information acquisition method includes:
acquiring a first word;
in a query statement, determining a second term corresponding to the first term, and establishing a correlation relationship between the first term and the second term;
and determining the correlation relation, the first term and the second term as query classification information for classifying the query sentences.
2. The method of claim 1, wherein the determining, in the query statement, a second term that corresponds to the first term comprises:
identifying a first entity in a query statement;
acquiring a target keyword corresponding to the first word according to the first entity;
and determining a second word according to the target keyword.
3. The method of claim 2, wherein said determining a second term from said target keyword comprises:
extracting a second entity corresponding to the target keyword from the query statement;
and determining a second word according to the target keyword and the second entity.
4. The method of claim 3, wherein said establishing a correlation between said first term and said second term comprises:
establishing a first-level correlation between the first words and the target keywords;
and establishing a second-level correlation between the target keywords and the corresponding second entities.
5. The method of claim 2, wherein said obtaining, according to the first entity, a target keyword corresponding to the first term comprises:
expanding the first word to obtain a similar sentence;
respectively extracting features of the first words and the similar sentences to form first feature vectors;
obtaining an average feature vector according to each first feature vector;
performing feature extraction on the first entity to form a second feature vector;
and screening target keywords corresponding to the first words in each first entity according to the average feature vector and each second feature vector.
6. A method of classification, comprising:
acquiring an input sentence input by a user;
in query classification information, a target term corresponding to the input sentence and a term related to the target term are queried to determine the type of the input sentence, wherein the query classification information is obtained according to the classification information obtaining method of any one of claims 1 to 5.
7. The method of claim 6, further comprising:
and classifying the user according to the type of the input statement.
8. The method of claim 6, wherein the query classification information includes terms and correlation relationships between terms;
the querying of the target terms corresponding to the input sentence and the terms related to the target terms comprises:
determining terms to be updated in terms included in the query classification information according to term length and term semantics;
adding related words in the words to be updated according to the related relations among the words, and updating the words to be updated;
and inputting the input sentence into a pre-trained classification model, and outputting a target word corresponding to the input sentence according to the updated word to be updated.
9. A classification information acquisition apparatus comprising:
the first word acquisition module is used for acquiring a first word;
the term and relation determining module is used for determining a second term corresponding to the first term in the query statement and establishing a correlation between the first term and the second term;
and the query classification information generation module is used for determining the correlation, the first term and the second term as query classification information for classifying the query sentences.
10. The apparatus of claim 9, wherein the term and relationship determination module comprises:
a first entity obtaining unit, configured to identify a first entity in the query statement;
the keyword screening unit is used for acquiring a target keyword corresponding to the first word according to the first entity;
and the second word determining unit is used for determining a second word according to the target keyword.
11. The apparatus of claim 10, wherein the second term determination unit comprises:
a second entity obtaining unit, configured to extract a second entity corresponding to the target keyword from the query statement;
and the second word generation subunit is used for determining a second word according to the target keyword and the second entity.
12. The apparatus of claim 11, wherein the term and relationship determination module comprises:
the first-level correlation relationship establishing unit is used for establishing a first-level correlation relationship between the first word and the target keyword;
and the second-level correlation relationship establishing unit is used for establishing a second-level correlation relationship between the target keyword and the corresponding second entity.
13. The apparatus of claim 10, wherein the keyword filtering unit comprises:
the first word expansion subunit is used for expanding the first word to obtain a similar sentence;
the first feature extraction subunit is used for respectively extracting features of the first word and the similar sentence to form a first feature vector;
the average vector calculation subunit is used for obtaining an average feature vector according to each first feature vector;
the second feature extraction subunit is used for performing feature extraction on the first entity to form a second feature vector;
and the target keyword determining subunit is configured to, according to the average feature vector and each of the second feature vectors, obtain, in each of the first entities, a target keyword corresponding to the first word by screening.
14. A classification apparatus comprising:
the input sentence acquisition module is used for acquiring an input sentence input by a user;
an input sentence classification module, configured to query a target word corresponding to the input sentence and a word related to the target word in query classification information, and determine a type of the input sentence, where the query classification information is obtained according to the classification information obtaining method according to any one of claims 1 to 5.
15. The apparatus of claim 14, further comprising:
and the user classification module is used for classifying the users according to the types of the input sentences.
16. The apparatus of claim 14, wherein the query classification information includes terms and correlation relationships between terms;
the input sentence classification module comprises:
a word to be updated acquiring unit, configured to determine a word to be updated in the words included in the query classification information according to the word length and the word semantics;
the classification information updating unit is used for adding related words in the words to be updated according to the related relations among the words and updating the words to be updated;
and the sentence classification unit is used for inputting the input sentences into a pre-trained classification model and outputting the target words corresponding to the input sentences according to the updated words to be updated.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the classification information acquisition method of any one of claims 1-5 or to perform the classification method of any one of claims 6-8.
18. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the classification information acquisition method according to any one of claims 1 to 5 or the classification method according to any one of claims 6 to 8.
19. A computer program product comprising a computer program which, when executed by a processor, implements the classification information acquisition method according to any one of claims 1 to 5, or the classification method according to any one of claims 6 to 8.
CN202210399188.6A 2022-04-15 2022-04-15 Classification information obtaining method, classification information obtaining device, classification information classifying device, electronic equipment and storage medium Pending CN114706956A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210399188.6A CN114706956A (en) 2022-04-15 2022-04-15 Classification information obtaining method, classification information obtaining device, classification information classifying device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210399188.6A CN114706956A (en) 2022-04-15 2022-04-15 Classification information obtaining method, classification information obtaining device, classification information classifying device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114706956A true CN114706956A (en) 2022-07-05

Family

ID=82173924

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210399188.6A Pending CN114706956A (en) 2022-04-15 2022-04-15 Classification information obtaining method, classification information obtaining device, classification information classifying device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114706956A (en)

Similar Documents

Publication Publication Date Title
US11544459B2 (en) Method and apparatus for determining feature words and server
CN107301170B (en) Method and device for segmenting sentences based on artificial intelligence
CN113986864A (en) Log data processing method and device, electronic equipment and storage medium
CN112148881A (en) Method and apparatus for outputting information
CN114579104A (en) Data analysis scene generation method, device, equipment and storage medium
CN107526721A (en) A kind of disambiguation method and device to electric business product review vocabulary
CN112989235B (en) Knowledge base-based inner link construction method, device, equipment and storage medium
CN113836316A (en) Processing method, training method, device, equipment and medium for ternary group data
US20210216710A1 (en) Method and apparatus for performing word segmentation on text, device, and medium
CN116467461A (en) Data processing method, device, equipment and medium applied to power distribution network
CN114491232B (en) Information query method and device, electronic equipment and storage medium
CN115952258A (en) Generation method of government affair label library, and label determination method and device of government affair text
CN112560425B (en) Template generation method and device, electronic equipment and storage medium
CN115292506A (en) Knowledge graph ontology construction method and device applied to office field
CN114647727A (en) Model training method, device and equipment applied to entity information recognition
KR20220024251A (en) Method and apparatus for building event library, electronic device, and computer-readable medium
CN114528378A (en) Text classification method and device, electronic equipment and storage medium
CN114860872A (en) Data processing method, device, equipment and storage medium
CN112926297A (en) Method, apparatus, device and storage medium for processing information
CN114706956A (en) Classification information obtaining method, classification information obtaining device, classification information classifying device, electronic equipment and storage medium
CN113792546A (en) Corpus construction method, apparatus, device and storage medium
CN112528644A (en) Entity mounting method, device, equipment and storage medium
CN114186552B (en) Text analysis method, device and equipment and computer storage medium
CN113377921B (en) Method, device, electronic equipment and medium for matching information
CN113377922B (en) Method, device, electronic equipment and medium for matching information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination