CN115994534A - Government scene hot word mining method, device, equipment and storage medium - Google Patents

Info

Publication number
CN115994534A
Authority
CN
China
Prior art keywords
hotword
hot
preset
words
hot word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211656838.7A
Other languages
Chinese (zh)
Inventor
汪永清
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202211656838.7A
Publication of CN115994534A
Legal status: Pending

Abstract

The disclosure provides a government scene hot word mining method, device, equipment and storage medium. It relates to the technical field of artificial intelligence, in particular to technical fields such as data analysis and text recognition, and can be applied to scenarios such as follow-up prompting and public opinion situation awareness. The specific implementation scheme comprises the following steps: acquiring words contained in work order data; determining first keywords whose word frequency meets a preset frequency requirement; determining, in a preset corpus, second keywords whose similarity with the first keywords meets a preset similarity requirement; clustering the first keywords and the second keywords to obtain a hot word clustering result, wherein the hot word clustering result comprises hot words, and the hot words in the hot word clustering result are synonyms of one another; and generating a hot word list according to the hot word clustering result. With this method, hot words can be intelligently mined from the work order data to generate a hot word list, the hot word list can be updated promptly according to the work order data, and the hot words in the hot word list are rich in quantity and type.

Description

Government scene hot word mining method, device, equipment and storage medium
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to technical fields such as data analysis and text recognition, and more specifically to a government scene hot word mining method, device, equipment and storage medium.
Background
Hot vocabulary, abbreviated as hot words, reflects a class of problems and matters of general public concern in a given region and period, or reflects the hot topics and livelihood issues of a period. In a government scenario, analyzing and mining hot words makes it possible to promptly discover hot topics in a city, residents' demand hot spots, and core problems that have arisen recently, thereby improving the quality of government services.
Currently, hot word mining on government work orders is generally performed as follows: keywords in the work order data are determined through manual word segmentation and labeling, and the keywords are then manually screened to obtain a manually configured hot word list, which includes one or more manually mined hot words.
Disclosure of Invention
The disclosure provides a government scene hot word mining method, device, equipment and storage medium, which can intelligently mine hot words from work order data to generate a hot word list. The hot word list can be updated promptly according to the work order data, the hot words in the hot word list are rich in quantity and type, and more effective data support can be provided for hot word applications in government scenarios.
According to a first aspect of the present disclosure, there is provided a government scene hot word mining method, the method including:
acquiring words whose degree of freedom and degree of solidification meet preset conditions in texts contained in work order data in a government scenario; determining, among the words, first keywords whose word frequency meets a preset frequency requirement; determining, among preset words contained in a preset corpus, second keywords whose similarity with the first keywords meets a preset similarity requirement; clustering the first keywords and the second keywords to obtain at least one hot word clustering result, wherein each hot word clustering result comprises at least one hot word, the hot words included in each hot word clustering result are synonyms, and the hot words are the first keywords or the second keywords; and generating a hotword vocabulary according to the hotword clustering result, wherein the hotword vocabulary comprises hotwords and synonyms of the hotwords.
According to a second aspect of the present disclosure, there is provided a government affair scene hot word mining apparatus, the apparatus comprising:
an acquiring unit, configured to acquire words whose degree of freedom and degree of solidification meet preset conditions in texts contained in work order data in a government scenario; a screening unit, configured to determine, among the words, first keywords whose word frequency meets a preset frequency requirement; a recall unit, configured to determine, among preset words contained in a preset corpus, second keywords whose similarity with the first keywords meets a preset similarity requirement; a mining unit, configured to cluster the first keywords and the second keywords to obtain at least one hot word clustering result, wherein each hot word clustering result comprises at least one hot word, the hot words included in each hot word clustering result are synonyms, and the hot words are the first keywords or the second keywords; and a hot word configuration unit, configured to generate a hotword vocabulary according to the hot word clustering result, wherein the hotword vocabulary comprises hot words and synonyms of the hot words.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method according to the first aspect.
According to a fifth aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method according to the first aspect.
According to a sixth aspect of the present disclosure, there is provided a hotword system comprising: a hotword parsing module, an Elasticsearch database and a service module; the hotword parsing module obtains a hotword vocabulary according to the method of the first aspect; the Elasticsearch database is used for storing the hotword vocabulary; the service module is connected with the Elasticsearch database and comprises at least one hotword application program interface.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow chart of a government scene hot word mining method provided by an embodiment of the disclosure;
FIG. 2 is another flow chart of a government scene hot word mining method provided by an embodiment of the disclosure;
FIG. 3 is a schematic configuration flow chart of a hotword blacklist provided by an embodiment of the disclosure;
FIG. 4 is a further schematic flow chart of a government scene hot word mining method provided by an embodiment of the disclosure;
FIG. 5 is a schematic diagram of the composition of a hotword system provided by an embodiment of the disclosure;
FIG. 6 is a schematic diagram of a government scene hot word mining device provided by an embodiment of the disclosure;
FIG. 7 is a schematic block diagram of an example electronic device 700 that may be used to implement embodiments of the disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be appreciated that in embodiments of the present disclosure, the character "/" generally indicates an "or" relationship between the objects before and after it. The terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated.
Hot vocabulary, abbreviated as hot words, reflects a class of problems and matters of general public concern in a given region and period, or reflects the hot topics and livelihood issues of a period. In a government scenario, analyzing and mining hot words makes it possible to promptly discover hot topics in a city, residents' demand hot spots, and core problems that have arisen recently, thereby improving the quality of government services.
For example, hot word mining can be performed on a government work order of public complaints, so that the complaint hot spots of residents and core problems occurring recently can be known in time.
Currently, hot word mining on government work orders is generally performed as follows: keywords in the work order data are determined through manual word segmentation and labeling, and the keywords are then manually screened to obtain a manually configured hot word list, which includes one or more manually mined hot words.
However, such a hot word list cannot be updated promptly as the work order data changes, and the hot words in a manually configured hot word list are limited in number and type and are not comprehensive enough.
The present disclosure provides a government scene hot word mining method, which can intelligently mine hot words from work order data to generate a hot word list. The hot word list can be updated promptly according to the work order data, the hot words in the hot word list are rich in quantity and type, and more effective data support can be provided for hot word applications in government scenarios.
For example, according to the government affair scene hotword mining method provided by the embodiment of the disclosure, richer hotword data can be provided, and government affair service capability is improved.
The execution subject of the method may be, for example, a computer or a server, or another device with data processing capabilities. The execution subject of the method is not limited herein.
In some embodiments, the server may be a single server, or may be a server cluster formed by a plurality of servers. In some implementations, the server cluster may also be a distributed cluster. The present disclosure is not limited to a specific implementation of the server.
The government affair scene hot word mining method is exemplified below.
Fig. 1 is a flow chart of a government scenario hot word mining method provided in an embodiment of the present disclosure. As shown in FIG. 1, the government scene hot word mining method may include:
S101, acquiring words whose degree of freedom and degree of solidification meet preset conditions in texts contained in work order data in a government scenario.
As an example, the work order data may be work order data in the 12345 government scenario, or work order data in other government scenarios, which is not limited here.
The degree of solidification and the degree of freedom may represent a degree of association between two words. The solidification degree can be obtained through the probability of adjacent word combination fragments, and represents the tightness degree between words in one word combination fragment. The degree of freedom can be obtained through the information entropy of the adjacent word combination fragments, and represents the fixed degree between one word combination fragment and the adjacent word combination fragment.
For example, for a piece of text in the work order data, the words in the text may be combined in sequential order to obtain multiple word combination segments. For each word combination segment, its degree of freedom and degree of solidification can be obtained. When the degree of freedom and the degree of solidification of a word combination segment meet the preset conditions, the segment is taken as a word.
The degree of freedom and the degree of solidification of a word combination segment meeting the preset conditions may include: the degree of freedom of the word combination segment is smaller than a degree of freedom threshold, and the degree of solidification is greater than a degree of solidification threshold. Both thresholds may be manually set values, and their sizes are not limited here.
In other words, S101 may determine which words are included in the text included in the work order data.
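The scoring described for S101 can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the text is assumed to be already split into tokens, only two-token segments are scored, the degree of solidification is taken as the ratio of the segment probability to the product of its parts' probabilities, and the degree of freedom is taken as the smaller of the left and right neighbour entropies. The function names, sample tokens, and thresholds `SOLID_MIN` and `FREE_MAX` are hypothetical; the comparison directions follow the preset conditions stated above.

```python
import math
from collections import Counter, defaultdict


def entropy(counter):
    """Shannon entropy (natural log) of a Counter of neighbouring tokens."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log(c / total) for c in counter.values())


def score_fragments(sentences):
    """Return {(a, b): (solidification, freedom)} for adjacent token pairs."""
    unigrams, bigrams = Counter(), Counter()
    left, right = defaultdict(Counter), defaultdict(Counter)
    for tokens in sentences:
        unigrams.update(tokens)
        for i in range(len(tokens) - 1):
            pair = (tokens[i], tokens[i + 1])
            bigrams[pair] += 1
            if i > 0:
                left[pair][tokens[i - 1]] += 1   # token just before the pair
            if i + 2 < len(tokens):
                right[pair][tokens[i + 2]] += 1  # token just after the pair
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
    scores = {}
    for (a, b), count in bigrams.items():
        p_pair = count / n_bi
        solidification = p_pair / ((unigrams[a] / n_uni) * (unigrams[b] / n_uni))
        freedom = min(entropy(left[(a, b)]), entropy(right[(a, b)]))
        scores[(a, b)] = (solidification, freedom)
    return scores


# Hypothetical tokenized work order texts and thresholds.
sentences = [["road", "lamp", "broken"], ["road", "lamp", "dark"], ["fix", "road", "lamp"]]
scores = score_fragments(sentences)
SOLID_MIN, FREE_MAX = 4.0, 0.5
words = [pair for pair, (s, f) in scores.items() if s > SOLID_MIN and f < FREE_MAX]
```

In practice the thresholds would be tuned on real work order data, and longer segments could be scored the same way.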
S102, determining a first keyword with word frequency meeting a preset frequency requirement in words.
For example, the preset frequency requirement may include that the word frequency is greater than a frequency threshold, or that the word is among a preset number or a preset proportion of words with the highest word frequency. The preset frequency requirement (including the size of the frequency threshold, the preset number, or the preset proportion) is not limited here.
For example, if 100 words are obtained in S101 and 20 of them have a word frequency greater than the frequency threshold, those 20 words may be used as the first keywords.
For another example, if 100 words are obtained in S101, the 20 words with the highest word frequency (for example, with a preset number of 20 or a preset proportion of 20%) may be used as the first keywords.
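The three variants of the preset frequency requirement (threshold, preset number, preset proportion) can be sketched as below. The function name `first_keywords` and its parameters are hypothetical, and the input is assumed to be the list of word occurrences produced by S101.

```python
from collections import Counter


def first_keywords(word_occurrences, min_freq=None, top_n=None, top_ratio=None):
    """Select first keywords by frequency threshold, preset number, or preset proportion."""
    freq = Counter(word_occurrences)
    if min_freq is not None:
        # Variant 1: word frequency greater than the frequency threshold.
        return [w for w, c in freq.most_common() if c > min_freq]
    if top_n is None:
        if top_ratio is None:
            raise ValueError("one of min_freq, top_n, top_ratio is required")
        # Variant 3: preset proportion of the distinct words.
        top_n = int(len(freq) * top_ratio)
    # Variant 2 (or 3 after conversion): the top_n most frequent words.
    return [w for w, _ in freq.most_common(top_n)]


occurrences = ["street lamp"] * 5 + ["parking"] * 3 + ["noise"]
by_threshold = first_keywords(occurrences, min_freq=2)
by_number = first_keywords(occurrences, top_n=1)
by_ratio = first_keywords(occurrences, top_ratio=0.7)
```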
S103, determining second keywords with similarity meeting the preset similarity requirement with the first keywords from preset words contained in a preset corpus.
The preset corpus may be a full-volume corpus, and the preset corpus may include a large number of existing words. The word sources of the predetermined corpus are not limited herein.
Illustratively, the predetermined similarity requirement may include that the similarity is greater than a predetermined similarity threshold, e.g., the similarity threshold may be 90%, 95%, etc., and the size of the similarity threshold is not limited.
Optionally, in some implementations, in S103, whether the similarity between a first keyword and a preset word contained in the preset corpus meets the preset similarity requirement may be determined by calculating the cosine similarity between the word vector of the first keyword and the word vector of the preset word.
In other implementations, the similarity between the first keyword and the preset word included in the preset corpus may also be calculated through other similarity algorithms, which is not limited by the disclosure.
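A minimal sketch of the cosine-similarity recall in S103 follows. It assumes word vectors are already available as plain lists of floats; how the vectors are obtained is not specified in the text. The function names and the toy two-dimensional vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recall_second_keywords(first_vectors, corpus_vectors, threshold=0.9):
    """Return corpus words whose similarity to any first keyword exceeds the threshold."""
    recalled = []
    for word, vec in corpus_vectors.items():
        if word in first_vectors:
            continue  # already a first keyword
        if any(cosine(vec, fv) > threshold for fv in first_vectors.values()):
            recalled.append(word)
    return recalled


first_vectors = {"streetlight": [1.0, 0.0]}
corpus_vectors = {
    "streetlight": [1.0, 0.0],
    "street lamp": [0.98, 0.1],
    "parking": [0.0, 1.0],
}
second = recall_second_keywords(first_vectors, corpus_vectors, threshold=0.9)
```

For a large corpus, the pairwise loop would normally be replaced by a vector index; that choice is outside what the text describes.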
S104, clustering the first keywords and the second keywords to obtain at least one hot word clustering result, wherein each hot word clustering result comprises at least one hot word, the hot words included in each hot word clustering result are synonyms, and the hot words are the first keywords or the second keywords.
For example, both the first keywords and the second keywords may serve as hot words. If the first keywords include 50 words and the second keywords include 200 words, all 250 words are hot words.
In S104, the first keyword and the second keyword may be clustered, and the clustering algorithm may be a k-means clustering algorithm, or other clustering algorithms, which is not limited herein. After the first keyword and the second keyword are clustered, one or more clustering results, which may be referred to as hot word clustering results, may be obtained.
One or more hotwords may be included in each hotword clustering result. For each hotword clustering result, the hotwords it contains are synonyms. In the embodiments of the disclosure, synonyms may refer to two words with similar semantics or with high word vector similarity.
It may be understood that, for each hotword clustering result, the hotword included in the hotword clustering result may be the first keyword or the second keyword.
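The grouping in S104 can be illustrated with a small dependency-free sketch. Note that this is not k-means: since the text allows other clustering algorithms, the sketch swaps in a simple similarity-threshold grouping (words are merged into one cluster whenever their cosine similarity exceeds a threshold), which keeps the example deterministic. The function names, the threshold, and the toy vectors are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_hotwords(vectors, threshold=0.9):
    """Group words into clusters by linking pairs whose similarity exceeds the threshold."""
    words = list(vectors)
    parent = {w: w for w in words}

    def find(w):
        # Union-find root lookup with path halving.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if cosine(vectors[a], vectors[b]) > threshold:
                parent[find(b)] = find(a)

    clusters = {}
    for w in words:
        clusters.setdefault(find(w), []).append(w)
    return list(clusters.values())


vectors = {
    "streetlight": [1.0, 0.0],
    "street lamp": [0.98, 0.1],
    "parking": [0.0, 1.0],
    "car park": [0.1, 0.97],
}
clusters = cluster_hotwords(vectors)
```

With a k-means implementation, the number of clusters would be chosen in advance instead of a similarity threshold.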
S105, generating a hotword list according to the hotword clustering result, wherein the hotword list comprises hotwords and synonyms of the hotwords.
For example, the hotword vocabulary may include all the words in the first keyword and the second keyword, each word is a hotword, and for each hotword, a synonym list of the hotword may be maintained according to the hotword clustering result. Each hotword and the synonyms of the hotword can be queried according to the hotword vocabulary.
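As a sketch of S105, the hotword vocabulary can be represented as a mapping from each hotword to the other members of its cluster. The function name and the dictionary representation are assumptions; the text only requires that each hotword and its synonyms can be queried.

```python
def build_hotword_vocabulary(clusters):
    """Map every hotword to the list of its synonyms (the other words in its cluster)."""
    vocabulary = {}
    for cluster in clusters:
        for word in cluster:
            vocabulary[word] = [other for other in cluster if other != word]
    return vocabulary


clusters = [["streetlight", "street lamp"], ["parking"]]
vocabulary = build_hotword_vocabulary(clusters)
synonyms_of_streetlight = vocabulary["streetlight"]
```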
In the embodiments of the present disclosure, words whose degree of freedom and degree of solidification meet preset conditions are acquired from the texts contained in the work order data in a government scenario, and first keywords whose word frequency meets a preset frequency requirement are determined among those words. In this way, hot words in the work order data can be mined, and the hot word list can be updated promptly along with the work order data. For example, when a new hot event appears in the work orders, the newly added words can be discovered in time. By determining, among the preset words contained in the preset corpus, second keywords whose similarity with the first keywords meets the preset similarity requirement, recall of the hot word corpus is realized, the corpus coverage of hot word mining is broadened, and rich hot word data are provided for the subsequent hotword vocabulary. By clustering the first keywords and the second keywords and generating the hotword vocabulary according to the hotword clustering result, the hotword vocabulary can be expanded based on hotword synonyms, which can effectively improve the efficiency and accuracy of hotword queries based on the hotword vocabulary.
Illustratively, the hotword vocabulary according to the embodiments of the present disclosure may be used in a variety of government scenarios to analyze and query hotword data, for example to realize trend early warning, public sentiment surveys and the like.
In addition, in the embodiment of the disclosure, the flow of manually marking hot words and configuring hot word lists is omitted, and the labor cost is greatly reduced.
Fig. 2 is another flow chart of a government scenario hot word mining method according to an embodiment of the present disclosure. As shown in fig. 2, in some embodiments, the government affair scene hot word mining method may further include:
S201, receiving a hotword blacklist configuration operation of a user.
Illustratively, the user may manage and configure the hotword blacklist through an application programming interface (API). For example, the API that manages and configures the hotword blacklist may be a blacklist library management API.
The hotword blacklist configuration operation of the user may be an operation of adding hotwords to the hotword blacklist through the blacklist library management API.
S202, in response to the hotword blacklist configuration operation, adding the target hotword corresponding to the hotword blacklist configuration operation to the hotword blacklist.
For example, after the hotword blacklist configuration operation is received, the target hotword corresponding to the hotword blacklist configuration operation may be added to the hotword blacklist.
The target hotword corresponding to the hotword blacklist configuration operation may include one or more hotwords. When the target hotword includes a plurality of hotwords, the plurality of hotwords may be added to the hotword blacklist in a batch.
Optionally, the user may also execute a hotword blacklist cancellation operation, and delete the target hotword corresponding to the hotword blacklist cancellation operation from the hotword blacklist.
In the embodiment, a blacklist filtering function can be added for the hot word list, so that the efficiency and the flexibility of hot word inquiry can be improved. For example, querying only hotwords in the hotword blacklist may be supported by the hotword blacklist, or the hotwords in the hotword blacklist may not be allowed to be queried or presented.
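The blacklist operations described above (batch add, cancel, and the two query modes) can be sketched as follows. The class `HotwordBlacklist` and its method names are hypothetical and do not represent the actual blacklist library management API; the blacklist is held in an in-memory set purely for illustration.

```python
class HotwordBlacklist:
    """In-memory stand-in for a hotword blacklist with add, cancel and filter operations."""

    def __init__(self):
        self._words = set()

    def add(self, *hotwords):
        """Blacklist configuration operation: add one or more target hotwords in a batch."""
        self._words.update(hotwords)

    def cancel(self, *hotwords):
        """Blacklist cancellation operation: remove target hotwords from the blacklist."""
        self._words.difference_update(hotwords)

    def visible(self, hotwords):
        """Query mode in which blacklisted hotwords are not returned."""
        return [w for w in hotwords if w not in self._words]

    def only_blacklisted(self, hotwords):
        """Query mode in which only blacklisted hotwords are returned."""
        return [w for w in hotwords if w in self._words]


blacklist = HotwordBlacklist()
blacklist.add("hello", "thanks")
blacklist.cancel("thanks")
all_hotwords = ["street lamp", "hello", "thanks"]
shown = blacklist.visible(all_hotwords)
hidden = blacklist.only_blacklisted(all_hotwords)
```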
Fig. 3 is a schematic diagram of a configuration flow of a hotword blacklist according to an embodiment of the disclosure. As shown in fig. 3, in some embodiments, adding the target hotword corresponding to the hotword blacklist configuration operation to the hotword blacklist may include:
S301, acquiring a mode field of a target hotword.
Illustratively, a pattern (schema) field of the target hotword may be acquired in S301. The schema field of the target hotword records attribute information of the target hotword, such as a document type, a document identification, text information of the target hotword, and the like of a document in which the target hotword is located.
S302, adding a target field in a mode field of a target hotword.
For example, the target field may be used to identify whether the target hotword is a word in a hotword blacklist. For example, the target field may be an "is_in_blacklist" field.
It should be noted that the specific implementation of the target field is not limited by the present disclosure.
S303, setting the value of the target field to the first value.
Illustratively, in some implementations, after the target field is added in the schema field of the target hotword, it indicates that the target hotword is a word in the hotword blacklist.
In other implementations, after the target field is newly added in the schema field of the target hotword, the value of the target field may also be set, and when the value of the target field is 1, it indicates that the target hotword is a word in the hotword blacklist; when the value of the target field is 0, it indicates that the target hotword is not a word in the hotword blacklist. Wherein 1 is the first value of the target field and 0 is the second value of the target field.
In still other implementations, the first value and the second value may be other values, without limitation.
Illustratively, taking the target hotword "Jinniu District" as an example, the schema field of the target hotword "Jinniu District" may be as follows:
[Figures SMS_1 and SMS_2: schema field listing of the target hotword; the images are not reproduced in this text]
Referring to the above schema field of the target hotword "Jinniu District" and the comment "// not being the blacklist", an "is_in_blacklist" field may be newly added to the schema field of the target hotword "Jinniu District". When the value of the "is_in_blacklist" field is 0, it indicates that the target hotword "Jinniu District" is not a word in the hotword blacklist; when the value of the "is_in_blacklist" field is 1, it indicates that the target hotword "Jinniu District" is a word in the hotword blacklist.
In this embodiment, the target hotword is added to the hotword blacklist by newly adding the target field to the mode field of the target hotword and setting the value of the target field to the first value. There is thus no need to separately maintain a hotword blacklist to store the information of whether a hotword is blacklisted, so the hotword blacklist function can be achieved in a simpler way and the amount of stored data is reduced.
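Steps S301 to S303 can be sketched on a plain dictionary standing in for the schema field. Only the "is_in_blacklist" name and the values 1 and 0 come from the text; the other field names (`doc_type`, `doc_id`, `text`) and their values are hypothetical placeholders for the attribute information the schema records.

```python
FIRST_VALUE = 1   # the hotword is in the blacklist
SECOND_VALUE = 0  # the hotword is not in the blacklist


def set_blacklist_flag(schema, in_blacklist=True):
    """Return a copy of the schema with the is_in_blacklist target field added or updated."""
    updated = dict(schema)
    updated["is_in_blacklist"] = FIRST_VALUE if in_blacklist else SECOND_VALUE
    return updated


def is_blacklisted(schema):
    """A missing target field is treated as not blacklisted (an assumption of this sketch)."""
    return schema.get("is_in_blacklist", SECOND_VALUE) == FIRST_VALUE


# Hypothetical schema field of a target hotword.
schema = {"doc_type": "work_order", "doc_id": "doc-001", "text": "street lamp"}
blacklisted = set_blacklist_flag(schema)
restored = set_blacklist_flag(blacklisted, in_blacklist=False)
```

Because the flag lives on the hotword record itself, a query can filter on it directly without consulting a separate blacklist store.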
Fig. 4 is a schematic flow chart of a government scenario hot word mining method according to an embodiment of the present disclosure. As shown in fig. 4, in some embodiments, before the generating the hotword vocabulary according to the hotword clustering result, the government affair scene hotword mining method may further include:
S401, determining the classification category of the hotword through a preset classification model.
The classification model is obtained by training a neural network by adopting a sample hotword and a sample classification label corresponding to the sample hotword.
In this embodiment, based on different government affairs scenarios, different classification tags may be set. For example, the category labels may include: "organization," "city life," "city component," "geographic location," and the like. A class label corresponds to a class category. The sample classification label refers to a classification label corresponding to a classification category marked after the sample hotword is manually classified.
It should be noted that the present disclosure is not limited by the classification basis and the specific type of the classification label.
After the hotword is input into the classification model, the classification model may output a classification category corresponding to the hotword, which may specifically be a classification label.
S402, labeling category labels for the hotwords according to the classification categories of the hotwords.
Illustratively, after the classification category of the hotword is obtained, a category label may be labeled for the hotword according to the classification category of the hotword. For example, a classification identification field, i.e., a class label, may be added to the pattern field of the hotword, and the classification identification field may be used to indicate the classification class of the hotword.
In this embodiment, the classification category of the hotword is determined through a preset classification model, and the category label is labeled for the hotword according to the classification category of the hotword, so that the hotword query efficiency can be further improved. For example, when querying hotwords, hotwords under a particular category may be filtered based on category.
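The labeling and category-based filtering in S401 and S402 can be sketched as below. The trained neural network is not reproduced: `classify` is any callable that returns a classification label, and the dictionary lookup used here is only a hypothetical stand-in for the preset classification model. The record field names are likewise assumptions.

```python
def label_hotwords(hotwords, classify):
    """Attach a category label to each hotword using the supplied classifier."""
    return [{"text": word, "category": classify(word)} for word in hotwords]


def filter_by_category(records, category):
    """Return the hotwords labeled with the given category."""
    return [r["text"] for r in records if r["category"] == category]


# Hypothetical stand-in for the trained classification model.
LOOKUP = {"street lamp": "city component", "tax bureau": "organization"}


def classify(word):
    return LOOKUP.get(word, "city life")


records = label_hotwords(["street lamp", "tax bureau", "noise"], classify)
components = filter_by_category(records, "city component")
```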
In some embodiments, before determining, among the preset words contained in the preset corpus, the second keywords whose similarity with the first keywords meets the preset similarity requirement, the method further includes: filtering the first keywords to screen out the first keywords that do not meet preset requirements.
Illustratively, the preset requirements may include: the word is not an entertainment word, the word is related to government affairs, the word is not a sensitive word, and the like.
The first keywords which do not meet the preset requirements are filtered, so that the mined hot words can be filtered, the quality of the mining result of the hot words is improved, and more effective hot words are provided for the subsequent hot word use scenes.
In some embodiments, before the generating the hotword vocabulary according to the hotword clustering result, the method further includes: filtering the hot words to screen out the hot words that do not meet the preset requirements.
For the preset requirements, refer to the preset requirements for screening the first keywords, which are not repeated here.
By filtering the hot words and screening out those that do not meet the preset requirements, a second round of screening is performed after the hot words have been expanded by recall. This provides supplementary filtering of the recalled second keywords and further improves the quality of the hot word mining results.
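The two filtering passes (once on the first keywords, once on the hot words after recall) can share one helper, as in this sketch. The word lists `ENTERTAINMENT_WORDS` and `SENSITIVE_WORDS` and the predicate names are hypothetical; how the preset requirements are actually checked is not specified in the text.

```python
ENTERTAINMENT_WORDS = {"celebrity gossip"}  # hypothetical list
SENSITIVE_WORDS = {"forbidden term"}        # hypothetical list


def not_entertainment(word):
    return word not in ENTERTAINMENT_WORDS


def not_sensitive(word):
    return word not in SENSITIVE_WORDS


def filter_words(words, requirements):
    """Keep only the words that satisfy every preset requirement."""
    return [w for w in words if all(req(w) for req in requirements)]


requirements = [not_entertainment, not_sensitive]

# First pass: filter the first keywords before recall.
first = filter_words(["street lamp", "celebrity gossip"], requirements)

# Second pass: filter again after second keywords have been recalled.
recalled = ["streetlight", "forbidden term"]
hotwords = filter_words(first + recalled, requirements)
```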
In some embodiments, the method further comprises: storing the hotword vocabulary and the work order data in an Elasticsearch database, wherein the Elasticsearch database is connected with at least one hotword application program interface.
Illustratively, the Elasticsearch (es) database is abbreviated as the es library. The hotword application program interface may include: a hotword search API, a case hotword search API, a blacklist library management API, a hotword query API, a hotword trend analysis API, a keyword search API, a hotword association API, and the like.
The hot word related fields stored in the es library may include: the classification category of the hot word, the synonyms of the hot word, whether the hot word is a blacklisted hot word, and the like. For details, refer to the foregoing embodiments, which are not repeated here.
The hotword application API connected with the es library can access the es library to realize one or more functions such as hotword trend early warning and analysis of the main events occurring in work orders. That is, the es library may support the hotword application requirements of back-end application scenarios.
In this embodiment, by storing the hotword vocabulary and the work order data in the Elasticsearch database, data support can be provided more conveniently for back-end hotword application scenarios.
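As a sketch of how the hotword vocabulary might be prepared for storage, the code below builds a request body in the newline-delimited JSON format accepted by the Elasticsearch bulk API (an action line followed by a document line, with a trailing newline). The index name `hotwords` and the document field names are assumptions, and sending the request is left out so the example stays self-contained.

```python
import json


def to_bulk_ndjson(vocabulary, index="hotwords"):
    """Build a bulk request body: one action line and one document line per hotword."""
    lines = []
    for word, synonyms in vocabulary.items():
        action = {"index": {"_index": index, "_id": word}}
        document = {"text": word, "synonyms": synonyms, "is_in_blacklist": 0}
        lines.append(json.dumps(action, ensure_ascii=False))
        lines.append(json.dumps(document, ensure_ascii=False))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


vocabulary = {"streetlight": ["street lamp"], "street lamp": ["streetlight"]}
payload = to_bulk_ndjson(vocabulary)
```

The payload would then be posted to the `_bulk` endpoint with the `application/x-ndjson` content type, for example through an HTTP client or an Elasticsearch client library.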
In an exemplary embodiment, the present disclosure further provides a hotword system, including: a hotword parsing module, an Elasticsearch database and a service module. The hotword parsing module obtains the hotword vocabulary according to the method described in the above embodiments. The Elasticsearch database is used for storing the hotword vocabulary. The service module is connected with the Elasticsearch database and comprises at least one hotword application program interface.
Illustratively, fig. 5 is a schematic diagram of the composition of a hotword system provided in an embodiment of the disclosure. As shown in fig. 5, the hotword system includes: a hotword parsing module 501, an es library 502, and a service module 503.
The hotword parsing module may include function modules such as hotword mining, filtering analysis, blacklist analysis, synonym analysis, and part-of-speech mining.
The hotword parsing module 501 may read work order data on a routine basis.
The hot word mining function module can acquire, from the text contained in the work order data in a government affairs scene, words whose degree of freedom and degree of solidification meet preset conditions, and determine, among those words, first keywords whose word frequency meets a preset frequency requirement.
The synonym analysis function module can determine, from the preset words contained in a preset corpus, second keywords whose similarity with the first keywords meets a preset similarity requirement, and cluster the first keywords and the second keywords to obtain at least one hot word clustering result. Each hot word clustering result includes at least one hot word, the hot words contained in each hot word clustering result are synonyms of one another, and each hot word is either a first keyword or a second keyword.
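The recall and clustering steps can be sketched as follows, under stated assumptions: each word is represented by a hypothetical embedding vector, similarity is cosine similarity, and clustering links any two words whose similarity reaches a threshold (single-link grouping via union-find). The disclosure does not fix these choices, and the names `recall`, `cluster`, `vectors`, and `threshold` are illustrative only.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recall(first_keywords, corpus_words, vectors, threshold=0.8):
    """Second keywords: corpus words similar enough to at least one first keyword."""
    recalled = []
    for w in corpus_words:
        if w in first_keywords:
            continue
        if any(cosine(vectors[w], vectors[k]) >= threshold for k in first_keywords):
            recalled.append(w)
    return recalled


def cluster(words, vectors, threshold=0.8):
    """Group words into synonym clusters by linking pairs above the threshold."""
    parent = {w: w for w in words}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)
    groups = {}
    for w in words:
        groups.setdefault(find(w), []).append(w)
    return list(groups.values())
```

Each returned group corresponds to one hot word clustering result, and a word with no sufficiently similar neighbour forms a cluster of its own.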
The hot word mining function module can also generate a hot word list according to the hot word clustering result, wherein the hot word list comprises hot words and synonyms of the hot words.
The blacklist analysis function module can receive a hot word blacklist configuration operation from a user and, in response to the operation, add the target hot word corresponding to the operation to the hot word blacklist.
The filtering analysis function module can filter the first keywords to screen out those that do not meet preset requirements, and can filter the hot words to screen out those that do not meet preset requirements.
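A filtering pass of this kind can be sketched as follows. The preset requirements are not specified here, so the three rules used (minimum length, no purely numeric strings, no stop words) and the name `filter_words` are assumptions chosen only for illustration.

```python
def filter_words(words, stop_words=frozenset(), min_length=2):
    """Keep words that satisfy a set of hypothetical preset requirements."""
    kept = []
    for w in words:
        if len(w) < min_length:   # too short to be a meaningful keyword
            continue
        if w.isdigit():           # purely numeric strings such as years
            continue
        if w in stop_words:       # configured stop words
            continue
        kept.append(w)
    return kept
```

The same function can be applied once to the first keywords before recall and once to the hot words before the vocabulary is generated.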
The part-of-speech mining function module can determine the classification category of the hot word through a preset classification model, and label the hot word with a category label according to the classification category of the hot word.
The hotword parsing module 501 may store the hotword vocabulary and the work order data in the es library 502, and the es library 502 is connected to the service module 503.
The service module 503 may be a business encapsulation service, and includes a hotword application API such as a hotword search API, a case hotword search API, a blacklist library management API, a hotword query API, a hotword trend analysis API, a keyword search API, and a hotword association API.
The hotword application API can access the es library to implement one or more functions such as hotword trend early warning and analysis of the main events occurring in work orders.
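One way a trend early-warning function could work is sketched below. It assumes per-day hot word counts are already available and flags a hot word when its latest count reaches a multiple of its recent average. The rule, the function name `trend_alerts`, and the parameters `window` and `ratio` are hypothetical and are not described in the disclosure.

```python
from collections import Counter


def trend_alerts(daily_counts, window=3, ratio=2.0):
    """Flag hot words whose latest daily count is at least `ratio` times
    the mean of the previous `window` days (with a floor of 1 on the mean).

    daily_counts: list of Counter objects, oldest day first.
    """
    if len(daily_counts) < window + 1:
        return []  # not enough history to form a baseline
    today = daily_counts[-1]
    history = daily_counts[-1 - window:-1]
    alerts = []
    for word, count in today.items():
        baseline = sum(day[word] for day in history) / window
        if count >= ratio * max(baseline, 1.0):
            alerts.append(word)
    return sorted(alerts)
```

The floor on the baseline keeps a word that appears once after a silent period from raising an alert by itself.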
In an exemplary embodiment, the embodiment of the present disclosure further provides a government affair scene hot word mining apparatus, which may be used to implement the government affair scene hot word mining method described in the foregoing embodiment. Fig. 6 is a schematic diagram of a government affair scene hot word mining device according to an embodiment of the disclosure. As shown in fig. 6, the apparatus may include: an acquisition unit 601, a screening unit 602, a recall unit 603, a mining unit 604 and a hotword configuration unit 605.
The acquiring unit 601 is configured to acquire, from the text contained in the work order data in a government affairs scene, words whose degree of freedom and degree of solidification meet preset conditions.
The screening unit 602 is configured to determine, among the words, a first keyword whose word frequency meets a preset frequency requirement.
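The frequency screening can be sketched in a few lines. The example assumes the preset frequency requirement is a minimum occurrence count; the name `select_first_keywords` and the parameter `min_count` are hypothetical.

```python
from collections import Counter


def select_first_keywords(words, min_count=3):
    """Return words whose frequency meets the threshold, most frequent first."""
    counts = Counter(words)
    return [w for w, c in counts.most_common() if c >= min_count]
```

A top-N cut-off would be an equally plausible reading of the requirement and could be obtained with `counts.most_common(n)`.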
The recall unit 603 is configured to determine, from the preset words included in the preset corpus, a second keyword whose similarity with the first keyword meets a preset similarity requirement.
The mining unit 604 is configured to cluster the first keyword and the second keyword to obtain at least one hotword clustering result, where each hotword clustering result includes at least one hotword, and the hotwords included in each hotword clustering result are synonyms, and the hotwords are the first keyword or the second keyword.
The hotword configuration unit 605 is configured to generate a hotword vocabulary according to the hotword clustering result, where the hotword vocabulary includes hotwords and synonyms of the hotwords.
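A minimal sketch of vocabulary generation, assuming each clustering result is a list of mutually synonymous hot words and the vocabulary maps every hot word to the other members of its cluster. The function name `build_vocabulary` and the dictionary layout are assumptions.

```python
def build_vocabulary(clusters):
    """Map each hot word to the list of its synonyms from the same cluster."""
    vocabulary = {}
    for group in clusters:
        for word in group:
            vocabulary[word] = [w for w in group if w != word]
    return vocabulary
```

A hot word that forms a cluster by itself is kept with an empty synonym list, so no mined hot word is lost.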
Optionally, the hotword configuration unit 605 is further configured to: receive a hotword blacklist configuration operation from a user; and, in response to the hotword blacklist configuration operation, add the target hotword corresponding to the operation to the hotword blacklist.
Optionally, the hotword configuration unit 605 is specifically configured to: acquire the mode field of the target hotword; add a target field to the mode field of the target hotword; and set the value of the target field to a first value.
Optionally, the mining unit 604 is further configured to: determine the classification category of the hotword through a preset classification model; and label the hotword with a category label according to its classification category. The classification model is obtained by training a neural network with sample hotwords and the sample classification labels corresponding to the sample hotwords.
Optionally, the screening unit 602 is further configured to: before the recall unit 603 determines, among the preset words contained in the preset corpus, the second keywords whose similarity with the first keywords meets the preset similarity requirement, filter the first keywords and screen out the first keywords that do not meet preset requirements.
Optionally, the screening unit 602 is further configured to: before the hotword configuration unit 605 generates the hotword vocabulary according to the hotword clustering result, filter the hotwords and screen out the hotwords that do not meet preset requirements.
Optionally, the hotword configuration unit 605 is further configured to: store the hot word list and the work order data in an Elasticsearch database, where the Elasticsearch database is connected to at least one hotword application program interface.
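The three steps above can be sketched as follows, reading the "mode field" as the set of fields stored for a hot word. That reading, the target field name `is_blacklisted`, and the first value `1` are all assumptions made for illustration.

```python
def add_to_blacklist(hotword_fields, target_field="is_blacklisted", first_value=1):
    """Return a copy of a hot word's fields with the blacklist marker set."""
    updated = dict(hotword_fields)        # acquire the existing fields
    updated[target_field] = first_value   # add the target field and set its value
    return updated
```

Returning a copy rather than mutating the input keeps the original record available if the configuration operation has to be rolled back.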
In the technical solution of the present disclosure, the acquisition, storage, and application of any user personal information involved comply with the provisions of relevant laws and regulations and do not violate public order and good customs.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
In an exemplary embodiment, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described in the above embodiments. The electronic device may be the computer or server described above.
In an exemplary embodiment, the readable storage medium may be a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method according to the above embodiment.
In an exemplary embodiment, the computer program product comprises a computer program which, when executed by a processor, implements the method according to the above embodiments.
Fig. 7 illustrates a schematic block diagram of an example electronic device 700 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the electronic device 700 includes a computing unit 701 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 may also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
Various components in the electronic device 700 are connected to the I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, etc.; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, an optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the electronic device 700 to exchange information/data with other devices through a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 701 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 701 performs the respective methods and processes described above, such as the government scene hotword mining method. For example, in some embodiments, the government scene hot word mining method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 700 via the ROM 702 and/or the communication unit 709. When the computer program is loaded into RAM 703 and executed by computing unit 701, one or more steps of the government scenario hotword mining method described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the government scene hotword mining method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (18)

1. A government scene hot word mining method, the method comprising:
acquiring, from the text contained in work order data in a government affairs scene, words whose degree of freedom and degree of solidification meet preset conditions;
determining, among the words, a first keyword whose word frequency meets a preset frequency requirement;
determining, among preset words contained in a preset corpus, a second keyword whose similarity with the first keyword meets a preset similarity requirement;
clustering the first keywords and the second keywords to obtain at least one hot word clustering result, wherein each hot word clustering result comprises at least one hot word, the hot words included in each hot word clustering result are synonyms, and the hot words are the first keywords or the second keywords;
and generating a hotword vocabulary according to the hotword clustering result, wherein the hotword vocabulary comprises the hotwords and synonyms of the hotwords.
2. The method of claim 1, the method further comprising:
receiving hot word blacklist configuration operation of a user;
and responding to the hotword blacklist configuration operation, and adding target hotword settings corresponding to the hotword blacklist configuration operation to a hotword blacklist.
3. The method of claim 2, the adding the target hotword setting corresponding to the hotword blacklist configuration operation to a hotword blacklist, comprising:
acquiring a mode field of the target hotword;
adding a target field in the mode field of the target hotword;
the value of the target field is set to a first value.
4. A method according to any one of claims 1-3, further comprising, prior to generating a hotword vocabulary from the hotword clustering result:
determining the classification category of the hotword through a preset classification model;
labeling category labels for the hot words according to the classification categories of the hot words;
the classification model is obtained by training a neural network by adopting a sample hotword and a sample classification label corresponding to the sample hotword.
5. The method according to any one of claims 1-4, wherein before determining, among the preset terms included in the preset corpus, a second keyword whose similarity with the first keyword meets a preset similarity requirement, the method further includes:
and filtering the first keywords, and screening out the first keywords which do not meet the preset requirements.
6. The method of claim 5, wherein before generating the hotword vocabulary according to the hotword clustering result, the method further comprises:
and filtering the hot words, and screening out the hot words which do not meet the preset requirements.
7. The method of any one of claims 1-6, further comprising:
and storing the hot word list and the work order data in an Elasticsearch database, wherein the Elasticsearch database is connected with at least one hot word application program interface.
8. A government scene hot word mining device, the device comprising:
the acquiring unit is used for acquiring, from the text contained in work order data in a government affairs scene, words whose degree of freedom and degree of solidification meet preset conditions;
the screening unit is used for determining a first keyword with word frequency meeting the preset frequency requirement in the words;
a recall unit, configured to determine, from preset words included in a preset corpus, a second keyword whose similarity with the first keyword meets a preset similarity requirement;
the mining unit is used for clustering the first keywords and the second keywords to obtain at least one hot word clustering result, wherein each hot word clustering result comprises at least one hot word, the hot words included in each hot word clustering result are synonyms, and the hot words are the first keywords or the second keywords;
and the hot word configuration unit is used for generating a hot word list according to the hot word clustering result, wherein the hot word list comprises the hot words and synonyms of the hot words.
9. The apparatus of claim 8, the hotword configuration unit to further:
receiving hot word blacklist configuration operation of a user;
and responding to the hotword blacklist configuration operation, and adding target hotword settings corresponding to the hotword blacklist configuration operation to a hotword blacklist.
10. The apparatus of claim 9, the hotword configuration unit is specifically configured to:
acquiring a mode field of the target hotword;
adding a target field in the mode field of the target hotword;
the value of the target field is set to a first value.
11. The apparatus of any of claims 8-10, the mining unit further configured to:
determining the classification category of the hotword through a preset classification model;
labeling category labels for the hot words according to the classification categories of the hot words;
the classification model is obtained by training a neural network by adopting a sample hotword and a sample classification label corresponding to the sample hotword.
12. The apparatus of any one of claims 8-11, the screening unit further to:
before the recall unit determines a second keyword, the similarity of which meets the preset similarity requirement, from preset words contained in a preset corpus, the first keyword is filtered, and the first keyword which does not meet the preset requirement is screened out.
13. The apparatus of claim 12, the screening unit further to:
before the hotword configuration unit generates a hotword list according to the hotword clustering result, filtering the hotwords, and screening out the hotwords which do not meet the preset requirements.
14. The apparatus of any of claims 8-13, the hotword configuration unit to further:
and storing the hot word list and the work order data in an Elasticsearch database, wherein the Elasticsearch database is connected with at least one hot word application program interface.
15. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
16. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-7.
17. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-7.
18. A hotword system, comprising: a hotword parsing module, an Elasticsearch database, and a service module;
the hotword parsing module obtains a hotword vocabulary according to the method of any one of claims 1-7;
the Elasticsearch database is used for storing the hot word list;
the service module is connected with the Elasticsearch database and comprises at least one hotword application program interface.
CN202211656838.7A 2022-12-22 2022-12-22 Government scene hot word mining method, device, equipment and storage medium Pending CN115994534A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211656838.7A CN115994534A (en) 2022-12-22 2022-12-22 Government scene hot word mining method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211656838.7A CN115994534A (en) 2022-12-22 2022-12-22 Government scene hot word mining method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115994534A true CN115994534A (en) 2023-04-21

Family

ID=85989858

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211656838.7A Pending CN115994534A (en) 2022-12-22 2022-12-22 Government scene hot word mining method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115994534A (en)

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination