CN115827864A - Processing method for automatic classification of bulletins - Google Patents


Info

Publication number
CN115827864A
Authority
CN
China
Prior art keywords
classification
classified
word bank
bulletin
announcement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211554771.6A
Other languages
Chinese (zh)
Inventor
赵晶晶
朱玥
孙超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qichacha Technology Co ltd
Original Assignee
Qichacha Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qichacha Technology Co ltd filed Critical Qichacha Technology Co ltd
Priority to CN202211554771.6A priority Critical patent/CN115827864A/en
Publication of CN115827864A publication Critical patent/CN115827864A/en
Pending legal-status Critical Current

Abstract

The application relates to a processing method for automatic classification of announcements. The method comprises the following steps: acquiring an announcement to be classified, wherein the announcement to be classified comprises an announcement title and an announcement body; determining a first category of the announcement to be classified, and determining, according to the first category, a target word bank corresponding to the announcement to be classified in a classification configuration table, wherein the classification configuration table stores preset word banks corresponding to announcements of different categories, and each word bank comprises a keyword word bank and an irrelevant word bank; matching the announcement title of the announcement to be classified against the target word bank, and judging whether the announcement title conforms to a classification rule, wherein the classification rule requires that the announcement title include at least one keyword from the keyword word bank and include no irrelevant word from the irrelevant word bank; and if the announcement title conforms to the classification rule, determining the classification result of the announcement to be classified based on the matched keywords in the target word bank. Fast classification of announcements can thereby be achieved.

Description

Processing method for automatic classification of bulletins
Technical Field
The application relates to the technical field of natural language processing, in particular to a processing method for automatic classification of announcements.
Background
As markets develop, announcements have become an important channel for releasing information, and users increasingly demand that announcements be assigned to accurate categories. At present, however, large companies operate across a wide range of industries, announcement types are disorganized and the content is voluminous, so users cannot quickly screen out the announcements in their own field.
In the related art, the text content of an announcement is extracted and analyzed, and high-frequency words are extracted to classify the announcement automatically. However, announcement content is lengthy and wide-ranging, so this automated classification method has a high error rate.
Disclosure of Invention
Therefore, it is necessary to provide a processing method for automatic classification of announcements that determines a first category of the announcement to be classified and then classifies the announcement title and the announcement body according to the keyword word bank and the irrelevant word bank to obtain the category of the announcement to be classified.
In a first aspect, the present application provides a processing method for automatic classification of announcements. The method comprises the following steps:
acquiring an announcement to be classified, wherein the announcement to be classified comprises an announcement title and an announcement body; determining a first category of the announcement to be classified, and determining, according to the first category, a target word bank corresponding to the announcement to be classified in a classification configuration table, wherein the classification configuration table stores preset word banks corresponding to announcements of different categories, and each word bank comprises a keyword word bank and an irrelevant word bank;
matching the announcement title of the announcement to be classified against the target word bank, and judging whether the announcement title conforms to a classification rule, wherein the classification rule requires that the announcement title include at least one keyword from the keyword word bank and include no irrelevant word from the irrelevant word bank;
and if the announcement title conforms to the classification rule, determining the classification result of the announcement to be classified based on the matched keywords in the target word bank.
In one embodiment, the method further comprises:
if the announcement title does not conform to the classification rule, matching the announcement body of the announcement to be classified against the target word bank, and judging whether the announcement body conforms to the classification rule, wherein the classification rule requires that the announcement body include at least one keyword from the keyword word bank and include no irrelevant word from the irrelevant word bank;
and if the announcement body conforms to the classification rule, determining the classification result of the announcement to be classified based on the matched keywords in the target word bank.
In one embodiment, if the judgment result does not conform to the classification rule, word segmentation is performed again, and the corresponding target word bank is obtained according to the classification configuration table.
In one embodiment, the word segmentation comprises taking a single character, or combining at least two positionally adjacent characters, in the announcement title or the announcement body to obtain candidate words, and determining the category of the announcement to be classified according to the target word bank.
In one embodiment, the first category of the announcement to be classified comprises at least one category, and the category of the announcement to be classified is obtained according to the degree of matching between the announcement to be classified and the classification configuration table.
In a second aspect, the application further provides a processing apparatus for automatic classification of announcements. The apparatus comprises:
an acquisition module, configured to acquire an announcement to be classified, wherein the announcement to be classified comprises an announcement title and an announcement body;
a classification module, configured to determine a first category of the announcement to be classified, and to determine, according to the first category, a target word bank corresponding to the announcement to be classified in a classification configuration table, wherein the classification configuration table stores preset word banks corresponding to announcements of different categories, and each word bank comprises a keyword word bank and an irrelevant word bank;
a judging module, configured to match the announcement title of the announcement to be classified against the target word bank and to judge whether the announcement title conforms to a classification rule, wherein the classification rule requires that the announcement title include at least one keyword from the keyword word bank and include no irrelevant word from the irrelevant word bank;
and a matching module, configured to determine the classification result of the announcement to be classified based on the matched keywords in the target word bank if the announcement title conforms to the classification rule.
In one embodiment, the apparatus is further configured to:
if the announcement title does not conform to the classification rule, match the announcement body of the announcement to be classified against the target word bank, and judge whether the announcement body conforms to the classification rule, wherein the classification rule requires that the announcement body include at least one keyword from the keyword word bank and include no irrelevant word from the irrelevant word bank;
and if the announcement body conforms to the classification rule, determine the classification result of the announcement to be classified based on the matched keywords in the target word bank.
In one embodiment, if the judgment result does not conform to the classification rule, word segmentation is performed again, and the corresponding target word bank is obtained according to the classification configuration table.
In one embodiment, the word segmentation comprises taking a single character, or combining at least two positionally adjacent characters, in the announcement title or the announcement body to obtain candidate words, and determining the category of the announcement to be classified according to the target word bank.
In one embodiment, the first category of the announcement to be classified comprises at least one category, and the category of the announcement to be classified is obtained according to the degree of matching between the announcement to be classified and the classification configuration table.
In a third aspect, the present disclosure also provides a computer device. The computer device comprises a memory storing a computer program and a processor that implements the steps of the processing method for automatic classification of announcements when executing the computer program.
In a fourth aspect, the present disclosure also provides a computer-readable storage medium. The computer-readable storage medium stores a computer program which, when executed by a processor, carries out the steps of the processing method for automatic classification of announcements.
In a fifth aspect, the present disclosure also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, carries out the steps of the processing method for automatic classification of announcements.
The processing method for automatically classifying the announcements at least comprises the following beneficial effects:
According to the embodiments provided by the present disclosure, the first category of the announcement to be classified can be determined. The announcement title is segmented according to the word bank to obtain keywords and irrelevant words; if no keyword is obtained or an irrelevant word is included, the announcement body is segmented, and the category of the announcement to be classified is obtained according to the classification configuration table.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
In order to illustrate the embodiments of the present disclosure or the technical solutions in the conventional technologies more clearly, the drawings used in the description of the embodiments or the conventional technologies are briefly introduced below. The drawings in the following description show only some embodiments of the present disclosure, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a diagram of an application environment for a processing method for automatic classification of announcements in one embodiment;
FIG. 2 is a flow diagram that illustrates a processing method for automated classification of announcements, according to one embodiment;
FIG. 3 is a diagram of a classification configuration table in one embodiment;
FIG. 4 is a flowchart illustrating a processing method of automated classification of announcements in another embodiment;
FIG. 5 is a block diagram of a processing device for automated classification of announcements in one embodiment;
FIG. 6 is a diagram illustrating an internal structure of a computer device according to an embodiment;
FIG. 7 is an internal block diagram of a server in one embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims. The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the presence of additional identical or equivalent elements in a process, method, article, or apparatus that comprises the recited elements is not excluded. For example, if the terms first, second, etc. are used to denote names, they do not denote any particular order.
The embodiment of the disclosure provides a processing method for automatic classification of announcements, which can be applied to the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 via a network. A data storage system may store the data that the server 104 needs to process. The data storage system may be integrated on the server 104, or may be located on the cloud or on another network server. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smart phone, tablet computer, internet of things device or portable wearable device. The internet of things device may be a smart speaker, smart television, smart air conditioner, smart vehicle-mounted device, or the like. The portable wearable device may be a smart watch, a smart bracelet, a head-mounted device, or the like. The server 104 may be implemented as a stand-alone server or as a server cluster composed of multiple servers.
In some embodiments of the present disclosure, as shown in FIG. 2, a processing method for automatic classification of announcements is provided. The method is described by taking as an example its application to the server in FIG. 1 to process an announcement to be classified. It is understood that the method can be applied to a server, and can also be applied to a system comprising a terminal and a server, in which case it is realized through the interaction of the terminal and the server. In a specific embodiment, the method may include the following steps:
S202: Acquire the announcement to be classified, wherein the announcement to be classified comprises an announcement title and an announcement body.
The announcement to be classified may comprise an announcement title and an announcement body. The title and the body may contain keywords, related keywords and irrelevant words, each consisting of at least one character. The keyword word bank stores the keywords together with their related keywords, namely near-synonyms, synonyms and the like, and the irrelevant word bank stores words that the title or body of an announcement in the category should not contain. For example, if the category of the announcement to be classified is annual financial report, an announcement in that category generally does not contain words such as "semi-annual financial report" or "quarterly financial report". The keyword word bank of the first category "annual financial report" may therefore store words such as "annual financial report" and its synonyms, and the irrelevant word bank may store words such as "semi-annual financial report" and "quarterly financial report".
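The word bank described above can be pictured as a pair of word sets. The following is a minimal sketch only: the `WordBank` name, its field names and the example entries are assumptions made for illustration and do not come from the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordBank:
    """Hypothetical container for the word bank of one announcement category."""
    keywords: frozenset    # keywords plus their synonyms and near-synonyms
    irrelevant: frozenset  # words that rule the category out when present


# Illustrative entries only; real word bank contents are not given in the source.
ANNUAL_FINANCIAL_REPORT = WordBank(
    keywords=frozenset({"annual financial report", "annual report"}),
    irrelevant=frozenset({"semi-annual financial report", "quarterly financial report"}),
)

# The keyword is a substring of an irrelevant word, which is why a keyword
# hit alone is not enough to accept the category.
keyword_hidden_in_irrelevant = "annual financial report" in "semi-annual financial report"
```

Under this sketch, a title containing "semi-annual financial report" would also contain the keyword "annual financial report", so the irrelevant word bank is what prevents the wrong category from being accepted.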
S204: Determine a first category of the announcement to be classified, and determine, according to the first category, a target word bank corresponding to the announcement to be classified in a classification configuration table, wherein the classification configuration table stores preset word banks corresponding to announcements of different categories, and each word bank comprises a keyword word bank and an irrelevant word bank.
The first category of the announcement to be classified may be determined by methods such as original classification or manual classification. The original classification may be a classification system formed in a general-purpose manner and may contain misclassified or imprecise categories; starting from it, categories that do not conform to the rules may be deleted and insufficiently precise categories may be reclassified, thereby obtaining the first category of the announcement to be classified. The classification configuration table may store preset word banks corresponding to announcements of different categories. The announcement title may be segmented according to a word bank, which may comprise a keyword word bank and an irrelevant word bank. During word segmentation, the range of segment lengths may be determined from the lengths of the entries in the word bank. For example, if the longest entry in the word bank of the first category of the announcement to be classified has 5 characters, each segment produced from the announcement to be classified has at most 5 characters, which helps to improve classification speed.
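The lookup in S204 and the length bound on segments can be sketched as follows. This is a sketch under stated assumptions: the table layout, the names `CLASSIFICATION_CONFIG`, `target_word_bank` and `max_segment_length`, and all entries are hypothetical.

```python
# Hypothetical classification configuration table: first category -> word bank.
# The entries are illustrative; the source does not list real word bank contents.
CLASSIFICATION_CONFIG = {
    "annual financial report": {
        "keywords": {"annual financial report", "annual report"},
        "irrelevant": {"semi-annual", "quarterly"},
    },
    "shareholder meeting": {
        "keywords": {"general meeting", "shareholder meeting"},
        "irrelevant": {"postponed"},
    },
}


def target_word_bank(first_category):
    """Look up the word bank for a first category; None if it is not configured."""
    return CLASSIFICATION_CONFIG.get(first_category)


def max_segment_length(word_bank):
    """Length in characters of the longest entry across both word lists."""
    entries = word_bank["keywords"] | word_bank["irrelevant"]
    return max(len(entry) for entry in entries)
```

Bounding segments by the longest entry means no candidate longer than any word bank entry is ever generated, which is the speed benefit the paragraph above refers to.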
S206: Match the announcement title of the announcement to be classified against the target word bank, and judge whether the announcement title conforms to a classification rule, wherein the classification rule requires that the announcement title include at least one keyword from the keyword word bank and include no irrelevant word from the irrelevant word bank.
Word segmentation is performed on the announcement title according to the target word bank to obtain first candidate words, which fall into three cases: they include a keyword and no irrelevant word; they include a keyword and an irrelevant word; or they include no keyword. If the first candidate words include an irrelevant word or include no keyword, a classification result cannot be obtained from the announcement title.
In some embodiments of the present disclosure, if the first candidate words include a keyword and no irrelevant word, the classification result of the announcement to be classified can be obtained from the announcement title. If the first candidate words include both a keyword and an irrelevant word, the classification result cannot be obtained. If the first candidate words include no keyword, the classification result cannot be obtained regardless of whether they include an irrelevant word.
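The three cases above can be written as a small decision function. This is a minimal sketch; the `MatchOutcome` enum, the `evaluate_candidates` function and the sample words are names chosen for illustration, not taken from the application.

```python
from enum import Enum


class MatchOutcome(Enum):
    KEYWORD_ONLY = "keyword, no irrelevant word"       # rule met
    KEYWORD_AND_IRRELEVANT = "keyword and irrelevant"  # rule not met
    NO_KEYWORD = "no keyword"                          # rule not met


def evaluate_candidates(candidates, keywords, irrelevant):
    """Classify a list of candidate words into one of the three cases."""
    has_keyword = any(word in keywords for word in candidates)
    has_irrelevant = any(word in irrelevant for word in candidates)
    if not has_keyword:
        # Without a keyword the irrelevant words do not matter.
        return MatchOutcome.NO_KEYWORD
    if has_irrelevant:
        return MatchOutcome.KEYWORD_AND_IRRELEVANT
    return MatchOutcome.KEYWORD_ONLY


def conforms(outcome):
    """Only the keyword-only case satisfies the classification rule."""
    return outcome is MatchOutcome.KEYWORD_ONLY
```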
S208: If the announcement title conforms to the classification rule, determine the classification result of the announcement to be classified based on the matched keywords in the target word bank.
If the first candidate words conform to the classification rule, that is, they include a keyword and no irrelevant word, the classification result of the announcement to be classified can be determined from the matched keywords in the target word bank.
In the above processing method for automatic classification of announcements, a first category of the announcement to be classified is determined, and the target word bank corresponding to the announcement in the classification configuration table is then determined, the word bank comprising a keyword word bank and an irrelevant word bank. The announcement title is matched against the target word bank, and if the matching result conforms to the classification rule, the classification result of the announcement to be classified is determined based on the matched keywords in the target word bank.
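The title stage summarized above (S206 and S208) might be sketched as follows. Two points are assumptions rather than statements from the application: matching is done by plain substring search instead of a segmenter, and each keyword maps to a result label. The names `matched_words`, `classify_by_title` and `WORD_BANK` are hypothetical.

```python
def matched_words(text, words):
    """Return the words that occur in text, in sorted order."""
    return sorted(w for w in words if w in text)


def classify_by_title(title, word_bank):
    """Title-stage check, assuming substring matching.

    word_bank["keywords"] maps each keyword to a result label (an assumption;
    the source only says the result is based on the matched keywords).
    Returns the sorted result labels, or None if the rule is not met.
    """
    hits = matched_words(title, word_bank["keywords"])
    blocked = matched_words(title, word_bank["irrelevant"])
    if not hits or blocked:
        return None
    return sorted({word_bank["keywords"][k] for k in hits})


# Illustrative word bank for one first category.
WORD_BANK = {
    "keywords": {
        "annual report": "annual report",
        "annual financial report": "annual report",
    },
    "irrelevant": {"semi-annual", "quarterly"},
}
```

A return value of `None` signals that the title alone is not sufficient and a later stage has to decide.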
In some embodiments of the present disclosure, the method further comprises:
if the announcement title does not conform to the classification rule, matching the announcement body of the announcement to be classified against the target word bank, and judging whether the announcement body conforms to the classification rule, wherein the classification rule requires that the announcement body include at least one keyword from the keyword word bank and include no irrelevant word from the irrelevant word bank;
and if the announcement body conforms to the classification rule, determining the classification result of the announcement to be classified based on the matched keywords in the target word bank.
If the result of matching the target word bank against the announcement title does not conform to the classification rule, word segmentation is performed on the announcement body according to the word bank to obtain second candidate words, and the announcement to be classified is judged again. The second candidate words may include keywords and irrelevant words.
FIG. 3 is a diagram illustrating a classification configuration table in one embodiment. The classification configuration table comprises a plurality of word banks, and the word banks can comprise keyword word banks and irrelevant word banks.
In some embodiments of the present disclosure, if the announcement body includes a keyword but no irrelevant word, the announcement body conforms to the classification rule for the target word bank, and the category of the announcement to be classified is obtained. If the announcement body includes both a keyword and an irrelevant word, the category of the announcement to be classified cannot be obtained from the announcement body.
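The fallback from title to body can be sketched as a two-stage check that applies the same rule to each text in turn. Substring matching and the names `satisfies_rule` and `classify_title_then_body` are assumptions for illustration.

```python
def satisfies_rule(text, keywords, irrelevant):
    """Return the matched keywords if text meets the rule, else None."""
    matched = {w for w in keywords if w in text}
    if not matched or any(w in text for w in irrelevant):
        return None
    return matched


def classify_title_then_body(title, body, keywords, irrelevant):
    """Try the title first; fall back to the body only if the title fails.

    Returns (stage, matched_keywords), or None when neither text conforms.
    """
    matched = satisfies_rule(title, keywords, irrelevant)
    if matched is not None:
        return ("title", matched)
    matched = satisfies_rule(body, keywords, irrelevant)
    if matched is not None:
        return ("body", matched)
    return None
```

Checking the short title first keeps the common case cheap; the longer body is only scanned when the title is inconclusive.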
For example, when the first category of the announcement to be classified is determined to be important report, the corresponding target word bank can be found in the classification configuration table. A first classification result may further correspond to a plurality of second classification results, and the second classification result can be located according to the keyword word bank. If the second classification result is annual report, the irrelevant words may include "half-year", "entrusted" and the like, which improves classification accuracy.
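One way to picture a first category that fans out into several second classification results is a nested table, as in the sketch below. The nesting, the `SECOND_LEVEL_CONFIG` and `second_level_results` names and the "half-year report" entry are assumptions; only the "annual report" irrelevant words follow the example in the text.

```python
# Hypothetical nested table: first category -> second category -> word bank.
SECOND_LEVEL_CONFIG = {
    "important report": {
        "annual report": {
            "keywords": {"annual report"},
            "irrelevant": {"half-year", "entrusted"},
        },
        "half-year report": {  # illustrative sibling category
            "keywords": {"half-year report"},
            "irrelevant": set(),
        },
    },
}


def second_level_results(first_category, text):
    """Return the second-level categories whose rule the text satisfies."""
    results = []
    for name, bank in SECOND_LEVEL_CONFIG.get(first_category, {}).items():
        has_keyword = any(k in text for k in bank["keywords"])
        has_irrelevant = any(w in text for w in bank["irrelevant"])
        if has_keyword and not has_irrelevant:
            results.append(name)
    return results
```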
In some embodiments of the present disclosure, if the judgment result does not conform to the classification rule, word segmentation is performed again, and the corresponding target word bank is obtained according to the classification configuration table.
The word segmentation may include taking a single character, or combining at least two positionally adjacent characters, in the announcement title and the announcement body to obtain candidate words, and determining the category of each candidate word according to the word bank. If the judgment result does not conform to the classification rule, word segmentation may be performed again on the announcement title and the announcement body, which improves classification accuracy.
In some embodiments of the present disclosure, the word segmentation includes taking a single character, or combining at least two positionally adjacent characters, in the announcement title or the announcement body to obtain candidate words, and determining the category of the announcement to be classified according to the target word bank.
When word segmentation is performed, the range of segment lengths may be determined from the lengths of the entries in the word bank: the shortest segment is a single character, and the longest segment has the length of the longest entry in the word bank of the category determined from the classification configuration table. Adjacent characters may also be combined in pairs during word segmentation, which improves recognition accuracy.
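The segmentation described above, from single characters up to the longest word bank entry, amounts to enumerating runs of adjacent characters. The following sketch assumes exactly that reading; the `candidate_words` name and the sample strings are hypothetical.

```python
def candidate_words(text, max_len):
    """Return every run of 1..max_len adjacent characters in text.

    Candidates are listed by start position, shortest first.
    """
    candidates = []
    for start in range(len(text)):
        for length in range(1, max_len + 1):
            end = start + length
            if end > len(text):
                break  # longer runs from this position would overrun the text
            candidates.append(text[start:end])
    return candidates


def hits_in_word_bank(text, entries):
    """Segment text with the length bound taken from the entries, then match."""
    max_len = max(len(entry) for entry in entries)
    return sorted(set(candidate_words(text, max_len)) & set(entries))
```

For a text of n characters and a bound of m, this produces at most n times m candidates, so a tight bound from the word bank keeps the work small.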
FIG. 4 is a flow diagram illustrating a processing method for automatic classification of announcements in one embodiment. The target word bank corresponding to the first category of the announcement to be classified is determined, and the announcement title and the announcement body are then classified according to the keyword word bank and the irrelevant word bank to obtain the category of the announcement to be classified.
In some embodiments of the present disclosure, the first category of the announcement to be classified includes at least one category, and the category of the announcement to be classified is obtained according to the degree of matching between the announcement to be classified and the classification configuration table.
When the first category of the announcement to be classified is determined, a plurality of first categories may be obtained, so the judgment during classification can be made against the word banks corresponding to each of these first categories. The best-matching target word bank is obtained according to the rules of the classification configuration table, which may comprise keyword rules and irrelevant-word rules, and the category of the announcement to be classified is obtained according to the degree of matching between the announcement and these rules. The degree of matching can be determined from the keywords and the irrelevant words. For example, if word segmentation shows that the announcement contains an irrelevant word of one first category, the judgment is made according to the other first categories. If the announcement contains no irrelevant word of any of the first categories, the classification result can be determined according to the number of matched keywords.
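The selection among several candidate first categories can be sketched as: discard any category whose irrelevant words appear, then prefer the category with the most keyword hits. Substring matching, the tie-breaking behaviour and the `best_category` name are assumptions; the application does not specify how ties are resolved.

```python
def best_category(text, word_banks):
    """Pick the first category that best matches text, or None.

    word_banks maps a category name to {"keywords": set, "irrelevant": set}.
    Categories with an irrelevant-word hit are skipped; among the rest, the
    one with the most matched keywords wins. On a tie the category listed
    first is kept (an assumption of this sketch).
    """
    best, best_count = None, 0
    for category, bank in word_banks.items():
        if any(w in text for w in bank["irrelevant"]):
            continue  # judge according to the other first categories
        count = sum(1 for k in bank["keywords"] if k in text)
        if count > best_count:
            best, best_count = category, count
    return best


# Illustrative candidate first categories.
CANDIDATES = {
    "financial report": {
        "keywords": {"annual report", "audited"},
        "irrelevant": {"semi-annual"},
    },
    "shareholder meeting": {
        "keywords": {"meeting"},
        "irrelevant": set(),
    },
}
```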
It should be understood that, although the steps in the flowcharts related to the embodiments described above are displayed sequentially as indicated by the arrows, the steps are not necessarily performed in that sequence. Unless explicitly stated otherwise, the steps are not restricted to the exact order shown and described and may be performed in other orders. Moreover, at least a part of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least a part of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiment of the present disclosure further provides a processing apparatus for automatic classification of announcements, which is used for implementing the above processing method. The solution provided by the apparatus is similar to the solution described in the method, so for specific limitations in the following apparatus embodiments, reference may be made to the limitations on the processing method above, and details are not repeated here.
The apparatus may include systems (including distributed systems), software (applications), modules, components, servers, clients and the like that use the methods described in the embodiments of the present specification, in conjunction with any necessary hardware for implementing them. Based on the same inventive concept, the embodiments of the present disclosure provide an apparatus in one or more embodiments as described below. Since the apparatus solves the problem in a manner similar to the method, the specific implementation of the apparatus may refer to the implementation of the foregoing method, and repeated details are omitted. As used hereinafter, the term "unit" or "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the embodiments below is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
In one embodiment, as shown in fig. 5, a processing apparatus 500 for automated classification of announcements is provided, and the apparatus may be the aforementioned server, or a module, component, device, unit, etc. integrated with the server. The apparatus 500 may comprise:
an obtaining module 502, configured to obtain a to-be-classified bulletin, where the to-be-classified bulletin includes a bulletin title and a bulletin text;
a classification module 504, configured to determine a first category of the to-be-classified bulletin, and determine, according to the first category, a target word bank corresponding to the to-be-classified bulletin in a classification configuration table, where the classification configuration table stores word banks corresponding to preset bulletins of different categories, and each word bank includes a keyword word bank and an irrelevant word bank;
a judging module 506, configured to match the bulletin title of the to-be-classified bulletin using the target word bank, and judge whether the bulletin title meets a classification rule, where the classification rule includes that the bulletin title includes a keyword in at least one keyword word bank and does not include an irrelevant word in the irrelevant word bank;
a matching module 508, configured to determine a classification result of the to-be-classified bulletin based on the matched keywords in the target word bank if the judgment result is that the classification rule is met.
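By way of illustration only, the cooperation of the classification, judging, and matching modules on a bulletin title can be sketched as follows. The layout of the classification configuration table, the sample categories and words, and the names WordBank, CONFIG_TABLE, and classify_title are assumptions made for this sketch and are not specified by the present disclosure; simple substring matching stands in for whatever matching technique an implementation actually uses.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class WordBank:
    """One entry of the classification configuration table (hypothetical layout)."""
    # Keyword word bank: keyword -> classification result (assumed mapping).
    keywords: Dict[str, str]
    # Irrelevant word bank: words that disqualify a title.
    irrelevant: Set[str] = field(default_factory=set)


# Hypothetical configuration table: first category -> target word bank.
CONFIG_TABLE: Dict[str, WordBank] = {
    "finance": WordBank(
        keywords={
            "annual report": "periodic report",
            "dividend": "profit distribution",
        },
        irrelevant={"summary", "correction"},
    ),
}


def classify_title(title: str, first_category: str) -> Optional[str]:
    """Return a classification result if the title meets the classification rule."""
    bank = CONFIG_TABLE.get(first_category)
    if bank is None:
        return None
    # The rule fails if any irrelevant word appears in the title.
    if any(word in title for word in bank.irrelevant):
        return None
    # The rule is met if at least one keyword appears; the result is
    # derived from the matched keyword.
    for keyword, label in bank.keywords.items():
        if keyword in title:
            return label
    return None
```

In this sketch a title such as "2022 annual report" in the assumed "finance" category yields "periodic report", while "2022 annual report summary" is rejected because it contains an irrelevant word.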
In one embodiment, if the judgment result is that the bulletin title does not meet the classification rule, the bulletin text of the to-be-classified bulletin is matched using the target word bank, and it is judged whether the bulletin text meets the classification rule, where the classification rule includes that the bulletin text includes a keyword in at least one keyword word bank and does not include an irrelevant word in the irrelevant word bank;
and if the judgment result is that the classification rule is met, the classification result of the to-be-classified bulletin is determined based on the matched keywords in the target word bank.
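The title-first, text-second order of this embodiment can be sketched as follows. This is a minimal illustration under stated assumptions: the function names match_rule and classify, the keyword-to-result mapping, and the use of substring matching are hypothetical and are not taken from the present disclosure.

```python
from typing import Dict, Optional, Set, Tuple


def match_rule(text: str, keywords: Dict[str, str], irrelevant: Set[str]) -> Optional[str]:
    """Apply the classification rule to one piece of text (title or body)."""
    # Any irrelevant word means the rule is not met.
    if any(word in text for word in irrelevant):
        return None
    # The first keyword found determines the classification result.
    for keyword, label in keywords.items():
        if keyword in text:
            return label
    return None


def classify(
    title: str, body: str, keywords: Dict[str, str], irrelevant: Set[str]
) -> Tuple[Optional[str], Optional[str]]:
    """Try the bulletin title first, then fall back to the bulletin text.

    Returns (classification result, source), where source is "title" or "body".
    """
    label = match_rule(title, keywords, irrelevant)
    if label is not None:
        return label, "title"
    label = match_rule(body, keywords, irrelevant)
    if label is not None:
        return label, "body"
    return None, None
```

Matching the short title before the longer text keeps the common case cheap; the text is scanned only when the title gives no result.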
In one embodiment, if the judgment result is that the classification rule is not met, word segmentation is performed again, and the corresponding target word bank is obtained according to the classification configuration table.
In one embodiment, the word segmentation processing includes taking single characters, or combining at least two positionally adjacent characters, in the bulletin title or the bulletin text to obtain candidate words, and determining the category of the to-be-classified bulletin according to the target word bank.
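One possible reading of this word segmentation step is the generation of all single characters and all runs of adjacent characters up to some maximum length. The following sketch shows that reading; the function name candidate_words and the max_len limit are assumptions introduced here, since the present disclosure does not state an upper bound on the number of combined characters.

```python
from typing import List


def candidate_words(text: str, max_len: int = 4) -> List[str]:
    """Return single characters and combinations of adjacent characters.

    Candidates are listed in order of first appearance, without duplicates.
    max_len is an assumed cap on how many adjacent characters are combined.
    """
    candidates: List[str] = []
    seen = set()
    for start in range(len(text)):
        for length in range(1, max_len + 1):
            end = start + length
            if end > len(text):
                break
            word = text[start:end]
            if word not in seen:
                seen.add(word)
                candidates.append(word)
    return candidates


def matched_keywords(text: str, keyword_bank: set, max_len: int = 4) -> List[str]:
    """Keep only the candidate words that appear in the keyword word bank."""
    return [w for w in candidate_words(text, max_len) if w in keyword_bank]
```

For example, under these assumptions a title containing 年度报告 produces the candidate 年度报告 itself along with shorter combinations such as 年度 and 报告, each of which can then be looked up in the target word bank.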
In one embodiment, the first category of the to-be-classified bulletin includes at least one category, and the category of the to-be-classified bulletin is obtained according to the matching degree of the to-be-classified bulletin and the classification configuration table.
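The present disclosure does not define how the matching degree is computed. As a sketch under the assumption that the matching degree is the fraction of a category's keywords found in the bulletin, candidate first categories could be ranked as follows; the function name rank_first_categories and the simplified table layout (category mapped to a set of keywords) are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def rank_first_categories(
    text: str, config: Dict[str, Set[str]]
) -> List[Tuple[str, float]]:
    """Rank categories by an assumed matching degree.

    The matching degree here is (keywords found in text) / (keywords in bank).
    Categories with no hit are dropped, so the result may contain one or
    more first categories, or none.
    """
    scores: List[Tuple[str, float]] = []
    for category, keywords in config.items():
        if not keywords:
            continue
        hits = sum(1 for keyword in keywords if keyword in text)
        if hits:
            scores.append((category, hits / len(keywords)))
    # Highest matching degree first; ties broken by category name.
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores
```

Each returned category can then supply its own target word bank for the title and text matching described above.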
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
The modules in the above processing apparatus for automated classification of announcements may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in the form of hardware, or may be stored in a memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a server, and its internal structure may be as shown in fig. 6. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used for storing the categories of the announcements to be classified. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a processing method for automated classification of announcements.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 7. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operating system and the computer program to run on the non-volatile storage medium. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless communication can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a processing method for automated classification of announcements. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the configurations shown in fig. 6 and 7 are merely block diagrams of partial configurations relevant to the present disclosure, and do not constitute a limitation on the computer apparatus to which the present disclosure may be applied, and that a particular computer apparatus may include more or fewer components than those shown in the figures, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which, when executed by a processor, implements the method of any of the embodiments of the present disclosure.
In one embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method of any of the embodiments of the present disclosure.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, databases, or other media used in the embodiments provided by the present disclosure may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, Resistive Random Access Memory (ReRAM), Magnetic Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, and the like. Volatile memory can include Random Access Memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others. The databases involved in the embodiments provided by the present disclosure may include at least one of relational and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processors referred to in the embodiments provided in this disclosure may be, without limitation, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, data processing logic devices based on quantum computing, and the like.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above embodiments express only several implementations of the present disclosure, and although their description is relatively specific and detailed, they should not be construed as limiting the scope of the present disclosure. It should be noted that those skilled in the art can make various changes and modifications without departing from the concept of the present disclosure, and these changes and modifications all fall within the scope of the present disclosure. Therefore, the protection scope of the present disclosure should be subject to the appended claims.

Claims (13)

1. A processing method for automated classification of announcements, the method comprising:
acquiring a notice to be classified, wherein the notice to be classified comprises a notice title and a notice text; determining a first category of the bulletins to be classified, and determining a target word bank corresponding to the bulletins to be classified in a classification configuration table according to the first category, wherein word banks corresponding to preset bulletins of different categories are stored in the classification configuration table, and each word bank comprises a keyword word bank and an irrelevant word bank;
matching the announcement titles of the announcement texts to be classified by using the target word bank, and judging whether the announcement titles accord with classification rules, wherein the classification rules comprise that the announcement titles comprise keywords in at least one keyword word bank and do not comprise irrelevant words in the irrelevant word bank;
and if the judgment result is that the classification rule is met, determining the classification result of the bulletin text to be classified based on the matched keywords in the target word bank.
2. The method of claim 1, further comprising:
if the judgment result does not accord with the classification rule, matching the bulletin text of the bulletin text to be classified by using the target word bank, and judging whether the bulletin text accords with the classification rule, wherein the classification rule comprises that the bulletin text comprises keywords in at least one keyword word bank and does not comprise irrelevant words in the irrelevant word bank;
and if the judgment result is that the classification rule is met, determining the classification result of the bulletin text to be classified based on the matched keywords in the target word bank.
3. The method of claim 1, wherein if the determination result does not meet the classification rule, performing segmentation again to obtain a corresponding target thesaurus according to the classification configuration table.
4. The method according to claim 3, wherein the word segmentation processing includes taking single characters, or combining at least two positionally adjacent characters, in the bulletin title or the bulletin text to obtain candidate words, and determining the category of the to-be-classified bulletin according to the target word bank.
5. The method according to claim 1, wherein the first category of the to-be-classified bulletin includes at least one category, and the category of the to-be-classified bulletin is obtained according to a matching degree between the to-be-classified bulletin and the classification configuration table.
6. A processing apparatus for automated classification of announcements, the apparatus comprising:
the acquisition module is used for acquiring the bulletin to be classified, wherein the bulletin to be classified comprises a bulletin title and a bulletin text;
the classification module is used for determining a first category of the bulletin to be classified, and determining a target word bank corresponding to the bulletin to be classified in a classification configuration table according to the first category, wherein word banks corresponding to preset bulletins of different categories are stored in the classification configuration table, and each word bank comprises a keyword word bank and an irrelevant word bank;
the judging module is used for matching the announcement titles of the announcement texts to be classified by using the target word bank and judging whether the announcement titles accord with the classification rules, wherein the classification rules comprise that the announcement titles comprise keywords in at least one keyword word bank and do not comprise irrelevant words in the irrelevant word bank;
and the matching module is used for determining the classification result of the bulletin text to be classified based on the matched keywords in the target word bank if the judgment result is that the bulletin text conforms to the classification rule.
7. The apparatus of claim 6, further comprising:
if the judgment result does not accord with the classification rule, matching the bulletin text of the bulletin text to be classified by using the target word bank, and judging whether the bulletin text accords with the classification rule, wherein the classification rule comprises that the bulletin text comprises keywords in at least one keyword word bank and does not comprise irrelevant words in the irrelevant word bank;
and if the judgment result is that the classification rule is met, determining the classification result of the bulletin text to be classified based on the matched keywords in the target word bank.
8. The apparatus of claim 6, wherein if the determination result does not meet the classification rule, performing segmentation again to obtain a corresponding target lexicon according to the classification configuration table.
9. The apparatus of claim 8, wherein the word segmentation processing includes taking single characters, or combining at least two positionally adjacent characters, in the bulletin title or the bulletin text to obtain candidate words, and determining the category of the to-be-classified bulletin according to the target word bank.
10. The apparatus of claim 6, wherein the first category of the to-be-classified bulletin includes at least one category, and the category of the to-be-classified bulletin is obtained according to a matching degree between the to-be-classified bulletin and the classification configuration table.
11. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 5.
12. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 5.
13. A computer program product comprising a computer program, characterized in that the computer program realizes the steps of the method of any one of claims 1 to 5 when executed by a processor.
CN202211554771.6A 2022-12-06 2022-12-06 Processing method for automatic classification of bulletins Pending CN115827864A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211554771.6A CN115827864A (en) 2022-12-06 2022-12-06 Processing method for automatic classification of bulletins

Publications (1)

Publication Number Publication Date
CN115827864A true CN115827864A (en) 2023-03-21

Family

ID=85545168

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211554771.6A Pending CN115827864A (en) 2022-12-06 2022-12-06 Processing method for automatic classification of bulletins

Country Status (1)

Country Link
CN (1) CN115827864A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116737926A (en) * 2023-06-07 2023-09-12 北京天融信网络安全技术有限公司 Method, device, equipment and storage medium for classifying threat information text

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination