CN112069324A - Classified label adding method, device, equipment and storage medium - Google Patents

Publication number
CN112069324A
Authority
CN
China
Prior art keywords
classified
phrase
event
text
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010879905.6A
Other languages
Chinese (zh)
Inventor
郭明坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Lynxi Technology Co Ltd
Original Assignee
Beijing Lynxi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Lynxi Technology Co Ltd filed Critical Beijing Lynxi Technology Co Ltd
Priority to CN202010879905.6A priority Critical patent/CN112069324A/en
Publication of CN112069324A publication Critical patent/CN112069324A/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/30 Semantic analysis

Abstract

The embodiments of the present disclosure provide a method, an apparatus, a device, and a storage medium for adding a classification label. The method comprises: acquiring an alert text to be classified and a target phrase corresponding to at least one classification label, wherein the target phrase comprises an event phrase and near-synonyms of the event phrase; matching the alert text to be classified against the target phrase; and, if the matching succeeds, adding the classification label corresponding to the target phrase to the alert text to be classified. This technical solution classifies alert texts automatically and improves classification efficiency.

Description

Classified label adding method, device, equipment and storage medium
Technical Field
The present disclosure relates to computer technologies, and in particular, to a method, an apparatus, a device, and a storage medium for adding a classification tag.
Background
Faced with massive volumes of alert texts, manual statistics and analysis fall far short of the practical needs of urban public security, and alert classification has considerable room for improvement in both timeliness and accuracy. A method is therefore needed that adds classification labels to alert texts so that the texts can be classified automatically according to those labels.
Disclosure of Invention
The embodiments of the present disclosure provide a classification label adding method, apparatus, device, and storage medium that classify alert texts automatically and improve classification efficiency.
In a first aspect, an embodiment of the present disclosure provides a classification label adding method, including:
acquiring an alert text to be classified and a target phrase corresponding to at least one classification label, wherein the target phrase comprises: an event phrase and near-synonyms of the event phrase;
matching the alert text to be classified against the target phrase;
and if the matching succeeds, adding the classification label corresponding to the target phrase to the alert text to be classified.
Further, after the alert text to be classified is matched with the target phrase, the method further includes:
if the matching does not succeed, determining a target word corresponding to the alert text to be classified according to the alert text to be classified and a database, wherein the database is constructed from keywords extracted from the event phrases corresponding to the at least one classification label, and the database comprises: at least one of a keyword, a subclass word of the keyword, a near-synonym of the keyword, and a near-synonym of the subclass word of the keyword, wherein the keyword comprises: at least one of a behavior verb, a physical-object noun, and a place noun;
inputting the target word into a trained neural network model to obtain an event phrase corresponding to the target word;
and adding the classification label corresponding to the event phrase to the alert text to be classified.
Further, the training method of the neural network model comprises the following steps:
acquiring sample words and sample event phrases corresponding to the sample words;
inputting the sample word into a neural network model to be trained to obtain a predicted event phrase;
and training the model parameters of the neural network model to be trained according to an objective function formed from the predicted event phrase and the sample event phrase, until the trained neural network model is obtained.
Further, matching the alert text to be classified with the target phrase includes:
performing word segmentation processing on the warning situation text to be classified;
extracting word segmentation vectors of the word segmentation results;
and matching the word segmentation vector with the target phrase.
Further, the subclass words of the keyword include: subclass words of the physical-object noun and/or subclass words of the place noun.
Further, the near-synonyms of the keyword include: at least one of near-synonyms of the behavior verb, near-synonyms of the physical-object noun, and near-synonyms of the place noun.
Further, the method further comprises:
and updating the target phrase corresponding to the classification label according to the alert text to be classified and the classification label of the alert text to be classified.
Further, the event phrase includes: a combination of at least two of a behavior verb, a physical-object noun, and a place noun.
In a second aspect, an embodiment of the present disclosure further provides a classification label adding apparatus, where the apparatus includes:
the acquiring module is used for acquiring an alert text to be classified and a target phrase corresponding to at least one classification label, wherein the target phrase comprises: an event phrase and near-synonyms of the event phrase;
the matching module is used for matching the alert text to be classified against the target phrase;
and the adding module is used for adding the classification label corresponding to the target phrase to the alert text to be classified if the matching succeeds.
In a third aspect, an embodiment of the present disclosure further provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the computer program to implement the method for adding a classification tag according to any one of the embodiments of the present disclosure.
In a fourth aspect, the disclosed embodiments also provide a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the classification label adding method according to any one of the disclosed embodiments.
The classification label adding method provided by the embodiments of the present disclosure acquires an alert text to be classified and a target phrase corresponding to at least one classification label, wherein the target phrase comprises an event phrase and near-synonyms of the event phrase; matches the alert text to be classified against the target phrase; and, if the matching succeeds, adds the classification label corresponding to the target phrase to the alert text to be classified. The method thus adds classification labels to alert texts automatically and classifies the texts automatically according to those labels, which improves classification efficiency.
Drawings
To more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present disclosure and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings may be obtained from the drawings without inventive effort.
FIG. 1 is a flowchart of a classification label adding method in an exemplary embodiment of the present disclosure;
FIG. 1a is a schematic diagram of database construction in an exemplary embodiment of the present disclosure;
FIG. 1b is a flowchart of adding classification labels to text in an exemplary embodiment of the present disclosure;
FIG. 2 is a schematic structural diagram of a classification label adding apparatus in an exemplary embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a computer device in an exemplary embodiment of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the disclosure and are not limiting of the disclosure. It should be further noted that, for the convenience of description, only some of the structures relevant to the present disclosure are shown in the drawings, not all of them.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present disclosure, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
As public security problems become increasingly prominent, public security information centers receive a large volume of alert information every day. To store alert texts by category, officers either retrieve specific records by keyword search and then manually screen them and judge the case category, or classify the texts manually one by one. This approach consumes a great deal of staff time and is therefore inefficient; moreover, because alert texts fall into many categories, manual classification is error-prone and its accuracy is low.
The training data of a neural network strongly affects its recognition accuracy. Because each caller describes an incident differently, it is difficult for a machine to pin down the actual situation, and when a single algorithm model is used to analyze text data of such complexity and volume, the accuracy of the recognition result is low.
A traditional knowledge graph is built on an entity-relation-entity architecture and is mainly suited to information retrieval and question-answering systems. It performs poorly for event classification: when alert texts are classified, the extracted event information exists mainly as relations between nodes and is difficult to use as the focus of analysis, while entities have little influence on alert classification yet occupy a large amount of node space.
Fig. 1 is a flowchart of a classification label adding method according to an exemplary embodiment of the present disclosure, where this embodiment is applicable to a case of adding a classification label to an alert text, and the method may be executed by a classification label adding apparatus in an embodiment of the present disclosure, where the apparatus may be implemented in a software and/or hardware manner, as shown in fig. 1, the method includes the following steps:
Step 110, acquiring an alert text to be classified and a target phrase corresponding to at least one classification label, wherein the target phrase comprises: an event phrase and near-synonyms of the event phrase.
The event phrase is a phrase extracted from a classification label and used to represent an event. It can be obtained by splitting the classification label directly. For example, if the classification label is "stealing and selling bicycles", the three words "steal", "sell", and "bicycle" are extracted from it and then combined: the part of speech of each word is determined ("steal" and "sell" are verbs, "bicycle" is a noun), and the words are expanded and combined according to the verb-object structure, yielding the event phrases "steal bicycle" and "sell bicycle".
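The verb-object expansion described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name `build_event_phrases`, the `"v"`/`"n"` tagging scheme, and the hand-tagged input are assumptions, and a real system would obtain the words and tags from a word segmenter and part-of-speech tagger.

```python
from itertools import product


def build_event_phrases(tagged_words):
    """Combine every verb with every noun into a verb-object event phrase.

    tagged_words: list of (word, part_of_speech) pairs, where the part of
    speech is "v" for a verb or "n" for a noun (an assumed tagging scheme).
    """
    verbs = [word for word, pos in tagged_words if pos == "v"]
    nouns = [word for word, pos in tagged_words if pos == "n"]
    return [f"{verb} {noun}" for verb, noun in product(verbs, nouns)]


# Hand-tagged words for the label "stealing and selling bicycles".
label_words = [("steal", "v"), ("sell", "v"), ("bicycle", "n")]
event_phrases = build_event_phrases(label_words)
print(event_phrases)
```

A label that contains no verb or no noun yields no phrase under this sketch, so other combinations (for example noun + verb) would need their own rule.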
It should be understood that a classification label may contain words of various parts of speech, and the event phrases extracted from it may be composed of words of various parts of speech, for example a combination of at least two of a behavior verb, a physical-object noun, and a place noun.
The target phrase also includes near-synonyms of the event phrase. These can be obtained by expanding the event phrase, or determined from the alert text to be classified and added; the present disclosure does not limit this.
The at least one classification label may be obtained by constructing a knowledge graph framework of classification labels in advance and taking the last-level labels in that framework as the classification labels. There may be several last-level labels, and therefore several classification labels.
The alert text to be classified is an alert text that is input by a user and needs to be classified; an alert text is a text relating to an alert.
In a possible implementation, an alert text to be classified and a plurality of classification labels, both input by the user, are obtained in advance. The labels are divided into several levels, a knowledge graph framework of the classification labels is constructed from the labels and their hierarchical relationships, the last-level labels in the framework are taken as the classification labels, and target phrases are extracted from them. For example, the first level includes public order management, industry and venue management, dangerous goods management, public security crackdown, dog-related, security, and so on. Second-level labels sit below the first-level labels; for example, the second-level labels below security are complaint and dispute. Third-level labels may sit below a second-level label; for example, the third-level labels below security complaint include security guard, security theft, security gambling, and so on. Fourth-level or even fifth-level labels may sit below the third-level labels; the number of levels is determined by requirements. The knowledge graph framework of the classification labels is constructed according to the hierarchical relationships of the labels.
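As a rough sketch of how last-level labels might be read out of such a hierarchy, the labels can be held in a nested dictionary in which a label with no children is a last-level label. The nested-dictionary representation, the function name `leaf_labels`, and the small example tree are assumptions made for illustration; the disclosure does not specify how the knowledge graph framework is stored.

```python
def leaf_labels(tree, path=()):
    """Yield (last_level_label, higher_level_labels) for every leaf of the tree."""
    for label, children in tree.items():
        if children:
            # Descend, remembering the labels passed on the way down.
            yield from leaf_labels(children, path + (label,))
        else:
            yield label, path


# A small, illustrative part of a label hierarchy.
label_tree = {
    "public order management": {
        "public security": {
            "stealing and selling mobile phones": {},
            "stealing and selling bicycles": {},
        },
    },
    "security": {
        "complaint": {"security theft": {}, "security gambling": {}},
    },
}

classification_labels = dict(leaf_labels(label_tree))
for label, parents in classification_labels.items():
    print(label, "<-", " / ".join(parents))
```

Keeping the path alongside each leaf means the higher-level labels of a classified text can be recovered later without searching the tree again.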
Step 120, matching the alert text to be classified against the target phrase.
The target phrase comprises the event phrase and near-synonyms of the event phrase. The event phrase may be obtained by establishing in advance a database of correspondences between classification labels and event phrases and querying it by classification label, or by looking up the event phrase corresponding to the classification label in a table.
For example, the alert text to be classified may be matched against the target phrase in any of the following ways. A database of correspondences between alert texts and event phrases may be established in advance and queried with the alert text to be classified, so that the alert text is matched against the event phrases in the database. Alternatively, the pre-established database may record correspondences between alert texts and both event phrases and their near-synonyms, in which case the alert text is matched against the event phrases and near-synonyms in the database. Alternatively, the alert text may be segmented into words, word vectors may be extracted from the segmentation result with Word2vec and vector-matched against the event phrases and their near-synonyms, and the text may additionally be string-matched against the event phrases and their near-synonyms. Alternatively, the alert text may be segmented, word vectors extracted with Word2vec, and only vector matching performed against the event phrases and their near-synonyms. The embodiments of the present disclosure do not limit this.
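The string-matching variant can be illustrated with a short sketch. It assumes that a match means a target phrase occurs verbatim in the alert text, which is only one possible reading; the function name `match_by_string` and the English example phrases are hypothetical.

```python
def match_by_string(alert_text, target_phrases):
    """Return the labels whose target phrases occur verbatim in the text.

    target_phrases maps a classification label to its event phrases and
    their near-synonyms.
    """
    text = alert_text.lower()
    return [
        label
        for label, phrases in target_phrases.items()
        if any(phrase.lower() in text for phrase in phrases)
    ]


target_phrases = {
    "stealing and selling bicycles": ["bicycle was stolen", "bike was stolen"],
    "stealing and selling mobile phones": ["phone was stolen"],
}
matched = match_by_string(
    "Caller reports his bike was stolen outside the market.", target_phrases
)
print(matched)
```

Exact substring matching is cheap but brittle, which is one reason the text also allows vector matching on segmented words.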
Step 130, if the matching succeeds, adding the classification label corresponding to the target phrase to the alert text to be classified.
A successful match means that a target phrase corresponding to the alert text to be classified exists. This may mean that there is a target phrase whose matching degree with the alert text exceeds a matching-degree threshold, or that there is a target phrase whose matching degree with the alert text is 100%; the embodiments of the present disclosure do not limit this. What counts as success depends on the matching mode. With a pre-established database of correspondences between alert texts and event phrases, a successful match means that an event phrase corresponding to the alert text is found in the database. With a database that also covers near-synonyms of the event phrases, it means that an event phrase and/or a near-synonym of an event phrase corresponding to the alert text is found in the database. With combined vector and string matching, it means that an event phrase and/or near-synonym matches the word vectors of the alert text, and/or an event phrase and/or near-synonym matches the character string of the alert text. With vector matching alone, it means that an event phrase and/or near-synonym matches the word vectors of the alert text.
If the matching succeeds, the classification label corresponding to the target phrase is added to the alert text to be classified, and the alert text can then be classified according to that label. For example, with combined vector and string matching, if an event phrase and/or near-synonym matches the word vectors of the alert text, and/or an event phrase and/or near-synonym matches its character string, the classification label corresponding to that event phrase and/or near-synonym is added to the alert text. With vector matching alone, if an event phrase and/or near-synonym matches the word vectors of the alert text, the classification label corresponding to it is added to the alert text.
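Threshold-based vector matching might look like the sketch below. Several things here are assumptions rather than details from the disclosure: the matching degree is taken to be cosine similarity between averaged word vectors, the threshold value 0.9 is arbitrary, and `TOY_VECTORS` is a hand-made stand-in for a trained Word2vec model.

```python
import math

# Toy 3-dimensional word vectors standing in for a trained Word2vec model.
TOY_VECTORS = {
    "steal": (1.0, 0.1, 0.0),
    "stolen": (0.9, 0.3, 0.0),
    "bicycle": (0.0, 1.0, 0.1),
    "bike": (0.1, 0.9, 0.1),
    "phone": (0.0, 0.1, 1.0),
}


def text_vector(words):
    """Average the vectors of the known words; return None if none is known."""
    known = [TOY_VECTORS[w] for w in words if w in TOY_VECTORS]
    if not known:
        return None
    return tuple(sum(dim) / len(known) for dim in zip(*known))


def cosine(a, b):
    """Cosine similarity of two vectors, or 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(alert_words, target_phrases, threshold=0.9):
    """Return (label, score) of the best phrase at or above the threshold, else None."""
    alert_vec = text_vector(alert_words)
    if alert_vec is None:
        return None
    best = None
    for label, phrases in target_phrases.items():
        for phrase in phrases:
            phrase_vec = text_vector(phrase.split())
            if phrase_vec is None:
                continue
            score = cosine(alert_vec, phrase_vec)
            if score >= threshold and (best is None or score > best[1]):
                best = (label, score)
    return best


target_phrases = {
    "stealing and selling bicycles": ["steal bicycle"],
    "stealing and selling mobile phones": ["steal phone"],
}
result = best_match(["bike", "stolen"], target_phrases)
print(result)
```

Returning the score together with the label lets a caller distinguish an exact match from a partial one that merely cleared the threshold.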
In an optional example, the labels provided by the user include first-level labels such as public order management, industry and venue management, dangerous goods management, public security crackdown, dog-related, and security. Second-level labels sit below the first-level labels; for example, public safety and public security sit below public order management. Third-level labels sit below the second-level labels; for example, the second-level label public security contains stealing and selling mobile phones, stealing and selling bicycles, and so on, and these third-level labels are the last-level labels. An alert text to be classified is obtained, together with the event phrases extracted from the third-level labels (stealing and selling mobile phones, stealing and selling bicycles). The event phrases take the form noun + verb (for example, mobile phone stolen, mobile phone sold, mobile phone lost) or verb + noun (for example, steal bicycle, sell bicycle, lose bicycle). The alert text is segmented into words, word vectors are extracted from the segmentation result with Word2vec, and these vectors are matched against the vectors obtained from phrases such as steal mobile phone, steal bicycle, and steal wallet. If the alert text matches steal bicycle, the label stealing and selling bicycles is added to it. From this it can be inferred that the second-level label to which the alert text belongs is public security and that the corresponding first-level label is public order management.
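Inferring the higher-level labels from a matched last-level label can be sketched with a parent map over the label hierarchy. The nested-dictionary tree and the names `build_parent_map` and `ancestors` are assumptions for illustration only.

```python
def build_parent_map(tree, parent=None, parents=None):
    """Record the parent of every label in a nested label hierarchy."""
    if parents is None:
        parents = {}
    for label, children in tree.items():
        parents[label] = parent
        build_parent_map(children, label, parents)
    return parents


def ancestors(label, parents):
    """Return the chain of higher-level labels, nearest first."""
    chain = []
    while parents.get(label) is not None:
        label = parents[label]
        chain.append(label)
    return chain


label_tree = {
    "public order management": {
        "public safety": {},
        "public security": {
            "stealing and selling mobile phones": {},
            "stealing and selling bicycles": {},
        },
    },
}
parents = build_parent_map(label_tree)
chain = ancestors("stealing and selling bicycles", parents)
print(chain)
```

Only the last-level label needs to be stored on the alert text; the second-level and first-level labels follow from the hierarchy.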
Optionally, after the alert text to be classified is matched with the target phrase, the method further includes:
if the matching does not succeed, determining a target word corresponding to the alert text to be classified according to the alert text to be classified and a database, wherein the database is constructed from keywords extracted from the event phrases corresponding to the at least one classification label, and the database comprises: at least one of a keyword, a subclass word of the keyword, a near-synonym of the keyword, and a near-synonym of the subclass word of the keyword, wherein the keyword comprises: at least one of a behavior verb, a physical-object noun, and a place noun;
inputting the target word into a trained neural network model to obtain an event phrase corresponding to the target word;
and adding the classification label corresponding to the event phrase to the alert text to be classified.
The keywords may be extracted from the event phrases corresponding to the classification labels, for example by segmenting those event phrases into words and selecting keywords from the segmentation result. For example, if the classification label is stealing and selling bicycles and its event phrases are steal bicycle, sell stolen bicycle, and bicycle lost, the keywords extracted from the event phrases are: steal, bicycle, sell, lose.
The database comprises keywords, subclass words of the keywords, near-synonyms of the keywords, and/or near-synonyms of the subclass words of the keywords. For example, if the keyword is bicycle, its near-synonyms include bike and pedal cycle, and its subclass words may include brand A bicycles, brand B bicycles, and brand C bicycles. If the keyword is hotel, its near-synonyms are other words that denote a hotel. Subclass words may be distinguished by brand, kind, model, and so on, for a keyword such as mobile phone. If the keyword is drugs, its subclass words may include heroin, methamphetamine, marijuana, and the like; if the keyword is hotel, its subclass words may include brand A hotels, brand B hotels, brand C hotels, and so on. The embodiments of the present disclosure do not limit this.
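One way to picture such a database is as a lookup table from every surface form (keyword, subclass word, or near-synonym) back to its keyword. The sketch below assumes this flat-dictionary layout; the function name `build_keyword_database` and the near-synonym "meth" are hypothetical, and the disclosure does not say how the database is actually stored.

```python
def build_keyword_database(keywords, subclasses, near_synonyms):
    """Map every surface form to the keyword it stands for."""
    database = {}
    for keyword in keywords:
        forms = {keyword}
        forms.update(subclasses.get(keyword, []))
        # Add near-synonyms of the keyword and of each of its subclass words.
        for form in list(forms):
            forms.update(near_synonyms.get(form, []))
        for form in forms:
            database[form] = keyword
    return database


database = build_keyword_database(
    keywords=["bicycle", "drug", "steal"],
    subclasses={
        "bicycle": ["brand A bicycle"],
        "drug": ["heroin", "methamphetamine", "marijuana"],
    },
    near_synonyms={
        "bicycle": ["bike", "pedal cycle"],
        "methamphetamine": ["meth"],  # hypothetical near-synonym of a subclass word
        "steal": ["theft"],
    },
)
print(database["heroin"], database["bike"], database["meth"])
```

If the same surface form were listed under two keywords, the later keyword would overwrite the earlier one in this sketch, so a real database would need a rule for such conflicts.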
The target words comprise at least one word and are the keywords that were matched successfully.
The target words corresponding to the alert text to be classified may be determined from the alert text and the database as follows: the alert text is segmented into words, word vectors are extracted from the segmentation result with Word2vec, and the database is queried with the extracted word vectors to obtain the corresponding target words. Alternatively, the alert text may be segmented and the database queried directly with the segmentation result to obtain the target words. The embodiments of the present disclosure do not limit this.
For example, if the alert text to be classified is matched against the target phrases and the matching does not succeed, the event phrases corresponding to the at least one classification label are obtained and keywords are extracted from them. Near-synonyms and subclass words of the keywords are obtained, as are near-synonyms of the subclass words, and the keywords, their subclass words, their near-synonyms, and the near-synonyms of the subclass words are stored in a database. The database is queried with the alert text to be classified to obtain the corresponding target words, the target words are input into a pre-trained neural network model to obtain the corresponding event phrase, and the classification label corresponding to that event phrase is added to the alert text so that it can be classified automatically according to the label.
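The two-stage flow (direct phrase match first, keyword database plus model as a fallback) can be sketched as below. Everything named here is illustrative: `classify`, `stub_model`, and the example data are assumptions, substring lookup stands in for word segmentation, and the stub function stands in for the trained neural network model.

```python
def classify(alert_text, target_phrases, keyword_db, predict_phrase, phrase_to_label):
    """Return a classification label for the text, or None if nothing applies."""
    text = alert_text.lower()
    # Stage 1: direct match against event phrases and their near-synonyms.
    for label, phrases in target_phrases.items():
        if any(phrase in text for phrase in phrases):
            return label
    # Stage 2: collect the keywords whose surface forms appear in the text.
    target_words = sorted({kw for form, kw in keyword_db.items() if form in text})
    if not target_words:
        return None
    # The model maps the target words to an event phrase, which maps to a label.
    phrase = predict_phrase(target_words)
    return phrase_to_label.get(phrase)


def stub_model(words):
    """Stand-in for the trained neural network model."""
    return "steal bicycle" if set(words) == {"bicycle", "steal"} else None


target_phrases = {"stealing and selling bicycles": ["steal bicycle", "bicycle was stolen"]}
keyword_db = {"bike": "bicycle", "bicycle": "bicycle", "theft": "steal", "steal": "steal"}
phrase_to_label = {"steal bicycle": "stealing and selling bicycles"}

label = classify(
    "My bike is gone, I think it was theft.",
    target_phrases, keyword_db, stub_model, phrase_to_label,
)
print(label)
```

The example text contains no target phrase verbatim, so it is labeled only through the fallback path.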
Optionally, the training method of the neural network model includes:
acquiring sample words and sample event phrases corresponding to the sample words;
inputting the sample word into a neural network model to be trained to obtain a predicted event phrase;
and training the model parameters of the neural network model to be trained according to the objective function formed by the predicted event phrase and the sample event phrase until the trained neural network model is obtained.
The sample words may comprise only the keywords, which reduces complexity and the amount of computation and increases the training speed of the neural network.
The objective function is a loss function.
For example, a sample word is input into the neural network model to be trained to obtain a predicted event phrase; the model parameters of the neural network model to be trained are trained according to the objective function formed by the predicted event phrase and the sample event phrase; and the operation of inputting the sample word into the neural network model to be trained to obtain the predicted event phrase is executed again, until the trained neural network model is obtained.
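The training loop above can be illustrated with a deliberately small model. The sketch below assumes a single-layer softmax classifier trained with a cross-entropy loss and a fixed number of epochs as the stopping rule; the text specifies neither the network architecture, the loss function, nor the stopping condition, and the sample data are hypothetical.

```python
import math
import random

# Hypothetical training pairs: sample words (keywords) -> sample event phrase.
SAMPLES = [
    (["steal", "bicycle"], "steal bicycle"),
    (["steal", "phone"], "steal phone"),
    (["gas", "poison"], "gas poisoning"),
]

VOCAB = sorted({word for words, _ in SAMPLES for word in words})
PHRASES = sorted({phrase for _, phrase in SAMPLES})


def encode(words):
    """Bag-of-words vector over the keyword vocabulary."""
    return [1.0 if entry in words else 0.0 for entry in VOCAB]


def predict_probs(weights, features):
    """Softmax over one linear score per event phrase."""
    scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def train(samples, epochs=200, lr=0.5, seed=0):
    """Repeat predict -> compare with sample phrase -> update, for a fixed number of epochs."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in VOCAB] for _ in PHRASES]
    for _ in range(epochs):
        for words, phrase in samples:
            features = encode(words)
            probs = predict_probs(weights, features)  # predicted event phrase distribution
            target = PHRASES.index(phrase)
            # Gradient of the cross-entropy loss w.r.t. each score is (prob - one_hot).
            for k, row in enumerate(weights):
                error = probs[k] - (1.0 if k == target else 0.0)
                for j, x in enumerate(features):
                    row[j] -= lr * error * x
    return weights


def predict(weights, words):
    """Return the event phrase with the highest predicted probability."""
    probs = predict_probs(weights, encode(words))
    return PHRASES[probs.index(max(probs))]


weights = train(SAMPLES)
print(predict(weights, ["steal", "phone"]))
```

A real system would replace the bag-of-words encoding and single layer with whatever network the implementer chooses; only the predict, compare and update cycle is taken from the text.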
In a possible implementation manner, the target phrase corresponding to the classification label may be updated according to the alert text to be classified and the classification label of the alert text to be classified.
For example, when the classification label corresponding to the alert text to be classified has been determined, a phrase related to the classification label, or a near-synonym of such a phrase, may be determined according to the alert text to be classified and the classification label, and added to the target phrase corresponding to the classification label.
For example, if the matching degree between the alert text to be classified and the target phrase corresponding to classification label M is less than 100% but greater than or equal to a matching-degree threshold, the label of the alert text to be classified may be determined to be classification label M. A phrase related to classification label M, or a near-synonym of such a phrase, may then be determined according to the alert text to be classified: for example, word segmentation is performed on the alert text to be classified and the segmented words are combined to obtain the phrase or near-synonym. The phrase or near-synonym obtained is added to the target phrase corresponding to classification label M, thereby updating that target phrase.
In this way, the event phrases and the near-synonyms of the event phrases in the target phrase corresponding to each classification label are continuously enriched, which improves the accuracy and efficiency of subsequently adding classification labels to alert texts to be classified.
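One way to read the threshold rule and the update step is sketched below. The text does not define how the matching degree is computed or how segmented words are combined, so both are assumptions here: the degree is taken as the fraction of a target phrase's words found in the text, and the new phrase is formed from the text's matching words. The label name and the threshold of 0.6 are hypothetical.

```python
def matching_degree(text_tokens, phrase_tokens):
    """Fraction of the phrase's words that also occur in the text (an assumed definition)."""
    if not phrase_tokens:
        return 0.0
    present = sum(1 for token in phrase_tokens if token in text_tokens)
    return present / len(phrase_tokens)


def label_and_update(text_tokens, target_phrases, threshold=0.6):
    """Return the best label at or above the threshold and enrich its phrase list.

    target_phrases maps a label to a list of phrases, each a tuple of words.
    """
    best_label, best_degree, best_phrase = None, 0.0, None
    for label, phrases in target_phrases.items():
        for phrase in phrases:
            degree = matching_degree(text_tokens, phrase)
            if degree > best_degree:
                best_label, best_degree, best_phrase = label, degree, phrase
    if best_label is None or best_degree < threshold:
        return None
    if best_degree < 1.0:
        # Partial match: combine the text's matching words into a new phrase.
        new_phrase = tuple(dict.fromkeys(t for t in text_tokens if t in best_phrase))
        if new_phrase not in target_phrases[best_label]:
            target_phrases[best_label].append(new_phrase)
    return best_label


phrases = {"M": [("steal", "electric", "bicycle")]}
print(label_and_update(["someone", "steal", "my", "bicycle"], phrases))  # prints M
print(phrases["M"])
# prints [('steal', 'electric', 'bicycle'), ('steal', 'bicycle')]
```

Because the phrase list grows only on partial matches, a later text containing just the shorter phrase matches at 100% and triggers no further update.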
Optionally, matching the alert text to be classified with the target phrase includes:
performing word segmentation processing on the warning situation text to be classified;
extracting word segmentation vectors of the word segmentation results;
and matching the word segmentation vector with the target phrase.
The word segmentation vectors of the segmentation result may be extracted by an NLP technique, for example by word2vec; the embodiment of the present invention does not limit this.
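Vector matching of this kind is commonly done with cosine similarity. The sketch below assumes that measure and a threshold of 0.9, neither of which is stated in the text, and uses a small table of made-up three-dimensional vectors in place of real word2vec output.

```python
import math

# Toy vectors standing in for word2vec output (hypothetical values).
TOY_VECTORS = {
    "steal": [0.9, 0.1, 0.0],
    "rob": [0.8, 0.2, 0.1],
    "bicycle": [0.0, 0.9, 0.2],
    "bike": [0.1, 0.8, 0.3],
    "hotel": [0.1, 0.1, 0.9],
}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def phrase_matches(tokens, phrase, vectors, threshold=0.9):
    """True if every word of the phrase has a sufficiently similar token in the text."""
    for word in phrase:
        if word not in vectors:
            return False
        best = max(
            (cosine(vectors[token], vectors[word]) for token in tokens if token in vectors),
            default=0.0,
        )
        if best < threshold:
            return False
    return True


print(phrase_matches(["rob", "bike"], ("steal", "bicycle"), TOY_VECTORS))   # prints True
print(phrase_matches(["rob", "hotel"], ("steal", "bicycle"), TOY_VECTORS))  # prints False
```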
Optionally, the sub-category words of the keyword include: sub-category words of object-class nouns and/or sub-category words of place-class nouns.
Optionally, the near-synonyms of the keyword include: at least one of near-synonyms of behavior-class verbs, near-synonyms of object-class nouns and near-synonyms of place-class nouns.
Optionally, the event phrase includes: a combination of at least two of a behavior-class verb, an object-class noun and a place-class noun.
For example, as shown in Fig. 1a, a user provides classification labels in advance, and the classification labels are divided into a plurality of levels. A knowledge-graph framework of the labels is constructed according to the relationships between the labels. The last-level labels are obtained, each associated with its upper-level label; the event phrases corresponding to each last-level label and the near-synonyms of those event phrases are obtained; and keywords are extracted from the event phrases. The keywords include behavior keywords, object keywords and place keywords. Near-synonyms of the behavior keywords, of the object keywords and of the place keywords are then obtained, together with the sub-category words of the object keywords and of the place keywords and the near-synonyms of those sub-category words.
In an alternative example, the user provides the classification labels in advance, and the labels are divided into a plurality of levels. For example, the first level includes public order management, industrial site management, dangerous goods management, public security attack, dog-related, and security. Second-level labels are arranged under a first-level label; for example, complaint and dispute are arranged under security. Third-level labels are arranged under a second-level label; for example, security offender, security theft and security gambling are arranged under security complaint. Fourth-level or even fifth-level labels may be arranged under security offender; the labels are determined by the user's requirements. A knowledge-graph framework of the labels is constructed according to the classification labels, and an upper-level label can be derived from a lower-level label according to this framework, so that once the text to be classified is determined to belong to a last-level label, its upper-level labels can be derived. Event phrases are extracted from the last-level labels in the form noun + verb (e.g., gas poisoning) or verb + noun (e.g., stealing bicycles); for some labels, such as labels under the industry and place category, information indicating the place or place category must be added to the event phrase in addition to the verb and the noun. The alert text to be classified is then matched with the event phrases and/or the near-synonyms of the event phrases.
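Deriving upper-level labels from a last-level label amounts to walking parent links. The sketch below assumes the knowledge-graph framework can be reduced to a child-to-parent dictionary; the dictionary and the function name are illustrative, and the label names are taken loosely from the example above.

```python
# Hypothetical child -> parent links between labels.
PARENT = {
    "security theft": "security complaint",
    "security gambling": "security complaint",
    "security complaint": "security",
}


def label_path(last_level_label, parent=PARENT):
    """Return the labels from the first level down to the given last-level label."""
    path = [last_level_label]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return list(reversed(path))


print(label_path("security theft"))
# prints ['security', 'security complaint', 'security theft']
```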
If the matching is successful, the classification label corresponding to the target phrase is added to the alert text to be classified. If the matching is not successful, core keywords are extracted from the event phrase corresponding to the label and divided into three types: behavior-class verbs, object-class nouns and place-class nouns. Depending on the label, the keywords may cover one, two or three of these types. Sub-categories are continuously expanded from the object keywords and the place keywords (for example, vehicles are divided into motor vehicles and non-motor vehicles), and the keywords, the near-synonyms of the sub-categories and the events are continuously supplemented from a large number of data samples. A near-synonym is an alternative word for the same thing, for example another word for bicycle or for hotel. The sub-categories of a keyword include brand, kind, model and the like. If the keyword is a mobile phone, the sub-categories may include particular brands or models of mobile phone; if the keyword is drugs, the sub-categories may include heroin, methamphetamine, marijuana and the like; and if the keyword is a hotel, the sub-categories may include a brand A hotel, a brand B hotel, a brand C hotel and the like. A deep neural network model is trained on a large amount of data; its input is keywords of the three types (behavior, object and place), and its output is an event phrase, which corresponds to the last-level label information.
In an optional example, as shown in Fig. 1b, an alert text to be classified is input and word segmentation is performed on it. Word segmentation vectors are extracted from the segmentation result by word2vec, and character-string matching and vector matching are performed against the event phrases and the near-synonyms of the event phrases. If either matching succeeds, the label corresponding to the matched event phrase is obtained, and the multi-level labels are obtained according to the label associations. If neither matching succeeds, keywords are extracted from the event phrases corresponding to the labels, and a keyword library is constructed from the keywords, the near-synonyms of the keywords, the sub-category words of the keywords and the near-synonyms of those sub-category words. Vector matching and character-string matching are performed between the alert text to be classified and the keyword library to obtain the matched keywords; the matched keywords are input into the neural network to obtain the corresponding event phrase; the label corresponding to that event phrase is obtained; and the multi-level labels are obtained according to the label associations.
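The control flow of Fig. 1b, a direct match followed by a keyword-plus-model fallback, can be sketched as below. Everything named here is hypothetical: the phrase table, the keyword table and `stub_model`, which is a lookup table standing in for the trained neural network. Only string matching is shown; vector matching would slot into the same two stages.

```python
# Hypothetical data: last-level label -> event phrases and their near-synonyms.
TARGET_PHRASES = {"bicycle theft": ["steal bicycle", "bicycle stolen"]}
PHRASE_TO_LABEL = {"steal bicycle": "bicycle theft"}
# Hypothetical keyword library: surface form -> keyword.
KEYWORDS = {"pilfer": "steal", "bike": "bicycle"}


def lookup_keywords(text):
    """Return the sorted keywords whose surface forms occur in the text."""
    return sorted({keyword for form, keyword in KEYWORDS.items() if form in text})


def stub_model(keywords):
    """Stand-in for the trained model: maps matched keywords to an event phrase."""
    return {("bicycle", "steal"): "steal bicycle"}.get(tuple(keywords))


def classify(text, target_phrases, keyword_lookup, phrase_model, phrase_to_label):
    """Return the last-level label for the text, or None if nothing matches."""
    lowered = text.lower()
    # Stage 1: string match against event phrases and their near-synonyms.
    for label, phrases in target_phrases.items():
        if any(phrase in lowered for phrase in phrases):
            return label
    # Stage 2: match keywords, then let the model propose an event phrase.
    keywords = keyword_lookup(lowered)
    if not keywords:
        return None
    return phrase_to_label.get(phrase_model(keywords))


for report in ("My bicycle stolen last night", "Someone tried to pilfer a bike"):
    print(classify(report, TARGET_PHRASES, lookup_keywords, stub_model, PHRASE_TO_LABEL))
# prints "bicycle theft" twice
```

Passing the keyword lookup and the model in as callables keeps the two stages independent, so either can be replaced without touching the control flow.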
The embodiment of the invention can classify alert texts automatically, which greatly improves classification efficiency and is of significance for alert classification, the linking of serial cases, and in-depth case analysis.
In the classification label adding method provided by this embodiment, the alert text to be classified and the target phrase corresponding to at least one classification label are acquired; the alert text to be classified is matched with the target phrase; and if the matching is successful, the classification label corresponding to the target phrase is added to the alert text to be classified. Classification labels can thus be added to alert texts automatically, automatic classification can be performed according to the labels, and classification efficiency is improved.
Fig. 2 is a schematic structural diagram of a classification label adding apparatus according to an exemplary embodiment of the present disclosure. This embodiment is applicable to the case of adding a classification label to an alert text. The apparatus may be implemented in software and/or hardware and may be integrated into any device that provides the function of adding classification labels to alert texts. As shown in Fig. 2, the classification label adding apparatus includes: an obtaining module 210, a matching module 220, and an adding module 230.
The obtaining module 210 is configured to obtain an alert text to be classified and a target phrase corresponding to at least one classification label, where the target phrase includes: event phrases and near-synonyms of the event phrases;
the matching module 220 is configured to match the alert text to be classified with the target phrase;
and the adding module 230 is configured to add the classification label corresponding to the target phrase to the alert text to be classified if the matching is successful.
The device can execute the methods provided by all the embodiments of the disclosure, and has corresponding functional modules and beneficial effects for executing the methods. For technical details that are not described in detail in this embodiment, reference may be made to the methods provided in all the foregoing embodiments of the disclosure.
Fig. 3 is a schematic structural diagram of a computer device in an exemplary embodiment of the present disclosure. Fig. 3 illustrates a block diagram of an exemplary computer device 12 suitable for implementing embodiments of the present disclosure. The computer device 12 shown in Fig. 3 is only one example and should not impose any limitation on the functionality or scope of use of the embodiments of the present disclosure.
As shown in FIG. 3, computer device 12 is in the form of a general purpose computing device. The components of computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Computer device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. Computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in Fig. 3, and commonly referred to as a "hard drive"). Although not shown in Fig. 3, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a Compact Disc Read-Only Memory (CD-ROM), a Digital Video Disc Read-Only Memory (DVD-ROM), or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the disclosure.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data, each of which, or some combination of which, may include an implementation of a networking environment. Program modules 42 generally perform the functions and/or methodologies of the embodiments described in this disclosure.
Computer device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with computer device 12, and/or with any devices (e.g., network card, modem, etc.) that enable computer device 12 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 22. In the computer device 12 of the present embodiment, the display 24 is not provided as a separate body but is embedded in the mirror surface, and when the display 24 is not displaying, its display surface and the mirror surface are visually integrated. Moreover, computer device 12 may also communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network such as the Internet) via network adapter 20. As shown, network adapter 20 communicates with the other modules of computer device 12 via bus 18. It should be understood that, although not shown in the figures, other hardware and/or software modules may be used in conjunction with computer device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
The processing unit 16 executes various functional applications and data processing by executing programs stored in the system memory 28, for example, implementing the classification label adding method provided by the embodiment of the present disclosure:
acquiring an alert text to be classified and a target phrase corresponding to at least one classification label, wherein the target phrase comprises: event phrases and near-synonyms of the event phrases;
matching the alert text to be classified with the target phrase;
and if the matching is successful, adding the classification label corresponding to the target phrase to the alert text to be classified.
The disclosed embodiments also provide a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the classification tag adding method provided by all the disclosed embodiments of the present application:
acquiring an alert text to be classified and a target phrase corresponding to at least one classification label, wherein the target phrase comprises: event phrases and near-synonyms of the event phrases;
matching the alert text to be classified with the target phrase;
and if the matching is successful, adding the classification label corresponding to the target phrase to the alert text to be classified.
Any combination of one or more computer-readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
In some embodiments, clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communications network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The name of a unit does not, in some cases, constitute a limitation on the unit itself.
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present disclosure and the technical principles employed. Those skilled in the art will appreciate that the present disclosure is not limited to the particular embodiments described herein, and that various obvious changes, adaptations, and substitutions are possible, without departing from the scope of the present disclosure. Therefore, although the present disclosure has been described in greater detail with reference to the above embodiments, the present disclosure is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present disclosure, the scope of which is determined by the scope of the appended claims.

Claims (10)

1. A classification label adding method is characterized by comprising the following steps:
acquiring an alert text to be classified and a target phrase corresponding to at least one classification label, wherein the target phrase comprises: event phrases and near-synonyms of the event phrases;
matching the alert text to be classified with the target phrase;
and if the matching is successful, adding the classification label corresponding to the target phrase to the alert text to be classified.
2. The method according to claim 1, wherein after matching the alert text to be classified with the target phrase, the method further comprises:
if the matching is not successful, determining a target word corresponding to the alert text to be classified according to the alert text to be classified and a database, wherein the database is constructed according to keywords extracted from the event phrases corresponding to the at least one classification label, and the database comprises: at least one of a keyword, a sub-category word of the keyword, a near-synonym of the keyword, and a near-synonym of the sub-category word of the keyword, wherein the keyword comprises: at least one of a behavior-class verb, an object-class noun, and a place-class noun;
inputting the target word into a trained neural network model to obtain an event phrase corresponding to the target word;
and adding the classification label corresponding to the event phrase to the alert text to be classified.
3. The method of claim 2, wherein the training method of the neural network model comprises:
acquiring sample words and sample event phrases corresponding to the sample words;
inputting the sample word into a neural network model to be trained to obtain a predicted event phrase;
and training the model parameters of the neural network model to be trained according to the objective function formed by the predicted event phrase and the sample event phrase until the trained neural network model is obtained.
4. The method of claim 1, wherein matching the alert text to be classified with the target phrase comprises:
performing word segmentation processing on the alert text to be classified;
extracting word segmentation vectors of the word segmentation results;
and matching the word segmentation vector with the target phrase.
5. The method of claim 2, wherein the sub-category words of the keyword comprise: sub-category words of object-class nouns and/or sub-category words of place-class nouns.
6. The method according to claim 1 or 2, characterized in that the method further comprises:
and updating the target phrase corresponding to the classification label according to the alert text to be classified and the classification label of the alert text to be classified.
7. The method of claim 1, wherein the event phrase comprises: a combination of at least two of a behavior-class verb, an object-class noun, and a place-class noun.
8. An apparatus for adding a classification label, comprising:
the obtaining module is used for obtaining an alert text to be classified and a target phrase corresponding to at least one classification label, wherein the target phrase comprises: event phrases and near-synonyms of the event phrases;
the matching module is used for matching the alert text to be classified with the target phrase;
and the adding module is used for adding the classification label corresponding to the target phrase to the alert text to be classified if the matching is successful.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1-7 when executing the program.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-7.
CN202010879905.6A 2020-08-27 2020-08-27 Classified label adding method, device, equipment and storage medium Pending CN112069324A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010879905.6A CN112069324A (en) 2020-08-27 2020-08-27 Classified label adding method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112069324A true CN112069324A (en) 2020-12-11

Family

ID=73659518

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010879905.6A Pending CN112069324A (en) 2020-08-27 2020-08-27 Classified label adding method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112069324A (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105893606A (en) * 2016-04-25 2016-08-24 深圳市永兴元科技有限公司 Text classifying method and device
CN108334605A (en) * 2018-02-01 2018-07-27 腾讯科技(深圳)有限公司 File classification method, device, computer equipment and storage medium
WO2019174423A1 (en) * 2018-03-16 2019-09-19 北京国双科技有限公司 Entity sentiment analysis method and related apparatus
CN111191445A (en) * 2018-11-15 2020-05-22 北京京东金融科技控股有限公司 Advertisement text classification method and device
CN109542830A (en) * 2018-11-21 2019-03-29 北京灵汐科技有限公司 A kind of data processing system and data processing method
CN109284391A (en) * 2018-12-07 2019-01-29 吉林大学 A kind of document automatic classification method
CN109840280A (en) * 2019-03-05 2019-06-04 百度在线网络技术(北京)有限公司 A kind of file classification method, device and computer readable storage medium
CN110580288A (en) * 2019-08-23 2019-12-17 腾讯科技(深圳)有限公司 text classification method and device based on artificial intelligence
CN110597988A (en) * 2019-08-28 2019-12-20 腾讯科技(深圳)有限公司 Text classification method, device, equipment and storage medium
CN110796160A (en) * 2019-09-16 2020-02-14 腾讯科技(深圳)有限公司 Text classification method, device and storage medium
CN110837601A (en) * 2019-10-25 2020-02-25 杭州叙简科技股份有限公司 Automatic classification and prediction method for alarm condition

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
沈佳杰, 江红, 王肃: "Keyword-based adaptive classification of cloud computing semantic text", Computer Engineering (计算机工程), vol. 40, no. 7, 31 July 2014 (2014-07-31), pages 247-253 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112989050A (en) * 2021-03-31 2021-06-18 建信金融科技有限责任公司 Table classification method, device, equipment and storage medium
CN112989050B (en) * 2021-03-31 2023-05-30 建信金融科技有限责任公司 Form classification method, device, equipment and storage medium
CN113704458A (en) * 2021-10-29 2021-11-26 江铃汽车股份有限公司 Vehicle instrument character display method, system, storage medium and equipment

Similar Documents

Publication Publication Date Title
CN108038183B (en) Structured entity recording method, device, server and storage medium
CN109189942B (en) Construction method and device of patent data knowledge graph
CN109657054B (en) Abstract generation method, device, server and storage medium
US10025819B2 (en) Generating a query statement based on unstructured input
US20150310096A1 (en) Comparing document contents using a constructed topic model
CN110717049A (en) Text data-oriented threat information knowledge graph construction method
US20150081277A1 (en) System and Method for Automatically Classifying Text using Discourse Analysis
EP3832488A2 (en) Method and apparatus for generating event theme, device and storage medium
TW202020691A (en) Feature word determination method and device and server
CN113158653B (en) Training method, application method, device and equipment for pre-training language model
US20100023505A1 (en) Search method, similarity calculation method, similarity calculation, same document matching system, and program thereof
US11321580B1 (en) Item type discovery and classification using machine learning
JP2023519049A (en) Method and apparatus for obtaining POI status information
CN114722137A (en) Security policy configuration method and device based on sensitive data identification and electronic equipment
CN112069324A (en) Classified label adding method, device, equipment and storage medium
CN111259262A (en) Information retrieval method, device, equipment and medium
CN115017425B (en) Location search method, location search device, electronic device, and storage medium
CN113282754A (en) Public opinion detection method, device, equipment and storage medium for news events
US20170140010A1 (en) Automatically Determining a Recommended Set of Actions from Operational Data
CN112256765A (en) Data mining method, system and computer readable storage medium
CN116776881A (en) Active learning-based domain entity identification system and identification method
CN109446318A (en) A kind of method and relevant device of determining auto repair document subject matter
CN115129913A (en) Sensitive word mining method and device, equipment and medium thereof
CN114528908A (en) Network request data classification model training method, classification method and storage medium
CN113505117A (en) Data quality evaluation method, device, equipment and medium based on data indexes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination