CN112597772A - Hotspot information determination method, computer equipment and device - Google Patents


Info

Publication number
CN112597772A
Authority
CN
China
Prior art keywords
text
elements
determining
sharing
synonymous
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011632878.9A
Other languages
Chinese (zh)
Inventor
卜民
周维
陈志刚
谭昶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Iflytek Information Technology Co Ltd
Original Assignee
Iflytek Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Iflytek Information Technology Co Ltd
Priority to CN202011632878.9A
Publication of CN112597772A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a hotspot information determination method, computer equipment and a device. The hotspot information determination method comprises the following steps: acquiring a text set to be processed; extracting at least one element of each text in the text set to be processed; determining a shared key corresponding to each element, wherein each shared key corresponds to one element or to a plurality of semantically identical elements; combining different shared keys to obtain shared key combinations; and determining hotspot information according to the occurrence frequency of the shared key combinations in the text set to be processed. In this way, hotspot information can be determined accurately and efficiently.

Description

Hotspot information determination method, computer equipment and device
Technical Field
The present application relates to the field of information analysis, and in particular, to a hotspot information determining method, a computer device, and an apparatus.
Background
With the continuous development of the internet, information spreads ever faster, and massive amounts of news and information, much of it complex and disordered, can appear on the internet within a short time. Although acquiring information has become increasingly convenient, reading it carefully still takes time, and it is difficult to identify the current hotspot information among a large number of texts. To make network services more real-time and timely, it is important to determine the hotspot information in the current network accurately and efficiently.
Disclosure of Invention
The main technical problem solved by the application is to provide a hotspot information determination method, computer equipment and a device that can determine hotspot information accurately and efficiently.
To solve the above technical problem, one technical solution adopted by the application is a hotspot information determination method comprising the following steps: acquiring a text set to be processed; extracting at least one element of each text in the text set to be processed; determining a shared key corresponding to each element, wherein each shared key corresponds to one element or to a plurality of semantically identical elements; combining different shared keys to obtain shared key combinations; and determining hotspot information according to the occurrence frequency of the shared key combinations in the text set to be processed.
In order to solve the above technical problem, another technical solution adopted by the present application is: there is provided a computer device comprising a processor for executing instructions to implement the hotspot information determination method described above.
In order to solve the above technical problem, another technical solution adopted by the present application is: there is provided an apparatus having a storage function, which stores program data that can be read by a computer and that can be executed by a processor, to implement the above-described hotspot information determination method.
The beneficial effects of this application are as follows. Unlike the prior art, the application provides a hotspot information determination method that determines the shared key of each element and combines shared keys to determine hotspot information. Identical content described in different ways can therefore be merged, so the popularity of a hotspot is not split across different wordings. The method can accurately determine how often events occur in a text set, improve the recall rate of hotspot information, and avoid missing hotspot information because it is described in different ways.
Drawings
Fig. 1 is a schematic flowchart of a hotspot information determination method according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of a hotspot information determination method according to another embodiment of the present application;
Fig. 3 is a schematic flowchart of a method for updating a synonymous element library according to an embodiment of the present application;
Fig. 4 is a schematic diagram of a hotspot information push interface according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of a hotspot information determination system according to an embodiment of the present application;
Fig. 6 is a schematic block diagram of a computer device according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of a device with a storage function according to an embodiment of the present application.
Detailed Description
To make the purpose, technical solution and effects of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments.
With the hotspot information determination method of the present application, similar information expressed in different ways can be extracted from a large amount of text, and the hotspot information in that text can be determined, effectively improving the accuracy and recall rate of hotspot determination. The embodiments can be applied to various information push systems, such as web page, news, or article push systems. The application scenarios described here are intended to illustrate the technical solution of the embodiments more clearly and do not limit it. Those skilled in the art will appreciate that the technical solution provided in the embodiments is equally applicable to similar technical problems in other application scenarios without creative effort. The method may be implemented on a terminal device, or on a network back-end server, a server cluster, a service site, or similar equipment.
Different texts may describe the same event in different ways, which splits the popularity of the corresponding hotspot across several wordings. The hotspot then becomes hard to detect, resulting in a low recall rate of hotspot information. To solve this problem, the present application provides a hotspot information determination method, described in detail below.
Referring to Fig. 1, Fig. 1 is a schematic flowchart of a hotspot information determination method according to an embodiment of the present application. The method can be implemented by a hotspot information determination system. Provided that substantially the same result is obtained, this embodiment is not limited to the order of steps shown in Fig. 1. As shown in Fig. 1, the method includes:
Step 110: Acquire a text set to be processed.
In one embodiment, the text set to be processed may be the set of texts, collected by the hotspot information determination system, that appeared within a preset time period. The system can acquire this text set from data sources, which may include, but are not limited to, authors on news websites, review websites, or social networking platforms. Whenever a data source publishes a text, the system acquires the newly published text. The preset time period may be set according to user requirements, for example one hour, one day, one week, or one month. A text may be a letter, a news article, a commentary article, a social network post (e.g. a microblog post, a WeChat official account article, or a WeChat Moments post), web page content, and so on.
Step 130: Extract at least one element of each text in the text set to be processed.
In one embodiment, an element may be a constituent of an event. Element types may include time, person, government agency name, general organization name, disease, group name, address, laws and regulations, document, item or brand, and the like. Elements may also be domain-specific, for example in the medical, engineering, or judicial fields.
The system may extract the elements of a text by any method, for example with an element extraction model.
Step 150: Determine the shared key corresponding to each element.
Each shared key corresponds to one element or to a plurality of elements with the same meaning. In one embodiment, each element corresponds to exactly one shared key, while one shared key may correspond to one or more synonymous elements. Table 1 shows the correspondence between elements and shared keys. For example, "transfer soldier" corresponds to shared key "46", and shared key "46" in turn corresponds to several synonymous elements, such as "transfer soldier", "transfer officer", and "transfer old soldier". Synonymous elements are elements with the same meaning that are expressed in different ways.
Table 1: Example correspondence between elements and shared keys
[Table 1 image not reproduced]
The shared key of an element may be an existing (stored) key or a newly created one. The system may obtain the shared key of an element from another system, look it up in a stored database, or create a new shared key when no stored key exists for the element.
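A minimal sketch of Step 150, assuming the synonymous elements are held in an in-memory dictionary and that new shared keys are drawn from an integer counter. Both are assumptions: the text does not say how keys are stored or generated, and values such as 32 and 46 are only examples. The hypothetical class `SharedKeyRegistry` looks up an element's key and creates a fresh key for an unknown element; the Fig. 3 update procedure would first look for a synonym before creating a key.

```python
from itertools import count


class SharedKeyRegistry:
    """Hypothetical in-memory element -> shared key table (cf. Table 1)."""

    def __init__(self, synonym_sets=()):
        self._key_of = {}        # element -> shared key
        self._elements = {}      # shared key -> set of synonymous elements
        self._next_key = count(1)
        for elements in synonym_sets:
            key = next(self._next_key)
            for element in elements:
                self._bind(element, key)

    def _bind(self, element, key):
        self._key_of[element] = key
        self._elements.setdefault(key, set()).add(element)

    def key_for(self, element):
        """Return the stored shared key, creating a new key if the element is unknown."""
        if element not in self._key_of:
            self._bind(element, next(self._next_key))
        return self._key_of[element]

    def elements_of(self, key):
        """Return a copy of the synonymous element set behind a shared key."""
        return set(self._elements.get(key, ()))


registry = SharedKeyRegistry([["transfer soldier", "transfer officer", "transfer old soldier"]])
print(registry.key_for("transfer officer"))  # 1
print(registry.key_for("Anhui AA"))          # 2 (newly created)
```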
Step 170: Combine different shared keys to obtain shared key combinations.
In one embodiment, any number of different shared keys may be combined arbitrarily. A single shared key corresponds to a single word and cannot accurately describe an event; combining shared keys forms phrases or sentences that can. Combining shared keys arbitrarily yields a variety of candidate events. Because each shared key may stand for several elements, combining shared keys also combines the elements corresponding to the different keys, so one shared key combination corresponds to a set of element combinations. Specifically, N different shared keys may be combined into one shared key combination. For example, combining shared key 32 with shared key 46 gives the shared key combination [32&46], whose element combinations include ["Anhui AA Co., Ltd." & "transfer soldier", "AA Company" & "transfer soldier", "Anhui AA" & "transfer soldier", "AA" & "transfer soldier", "Anhui AA" & "transfer officer", "AA" & "transfer officer", "Anhui AA" & "transfer old soldier", "AA" & "transfer old soldier", ...].
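Expanding a shared key combination into its element combinations amounts to a Cartesian product of the synonymous element sets. The sketch below assumes a plain dictionary `elements_of` from shared key to its elements (a hypothetical structure) and uses `itertools.combinations` and `itertools.product` from the Python standard library; the key values and element strings are illustrative only.

```python
from itertools import combinations, product


def key_combinations(keys, size):
    """All combinations of `size` distinct shared keys; order does not matter."""
    return list(combinations(sorted(keys), size))


def element_combinations(key_combo, elements_of):
    """Expand a shared key combination into every element combination it covers.

    `elements_of` maps each shared key to its set of synonymous elements.
    """
    return list(product(*(sorted(elements_of[key]) for key in key_combo)))


elements_of = {
    32: {"Anhui AA", "AA"},
    46: {"transfer soldier", "transfer officer", "transfer old soldier"},
}
for combo in key_combinations(elements_of, 2):
    print(combo, len(element_combinations(combo, elements_of)))  # (32, 46) 6
```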
Step 190: Determine hotspot information according to the occurrence frequency of the shared key combinations in the text set to be processed.
The occurrence frequency of a shared key combination in the text set may be the number, or the proportion, of texts that contain any element combination corresponding to that shared key combination. The element combinations corresponding to shared key combinations whose occurrence frequency meets a preset condition can be taken as hotspot information. Hotspot information may also refer to the texts that contain a shared key combination meeting the preset condition.
Determining the shared keys of elements and combining them to find hotspot information merges identical content described in different ways, so the popularity of a hotspot is not split across wordings. The method can therefore accurately determine how often events occur in the text set, improve the recall rate of hotspot information, and avoid losing hotspots that are described in different ways.
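One way to read the frequency definition above, sketched under the assumption that each text has already been reduced to the set of shared keys of its elements: a text contains some element combination of a shared key combination exactly when it contains at least one element of every key, so counting reduces to a subset test per text. The function names and the `min_count` threshold standing in for the "preset condition" are hypothetical.

```python
def occurrence_frequency(key_combo, text_keys):
    """Return (count, ratio) of texts that contain every shared key in key_combo.

    text_keys holds one set of shared keys per text (the keys of its elements).
    """
    needed = set(key_combo)
    hits = sum(1 for keys in text_keys if needed <= keys)
    return hits, (hits / len(text_keys) if text_keys else 0.0)


def hotspots(key_combos, text_keys, min_count):
    """Keep the combinations whose text count reaches min_count (assumed preset condition)."""
    result = {}
    for combo in key_combos:
        hits, _ = occurrence_frequency(combo, text_keys)
        if hits >= min_count:
            result[combo] = hits
    return result


texts = [{32, 46}, {32, 46, 7}, {46}, {32}]
print(occurrence_frequency((32, 46), texts))              # (2, 0.5)
print(hotspots([(32, 46), (7, 46)], texts, min_count=2))  # {(32, 46): 2}
```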
Referring to Fig. 2, Fig. 2 is a schematic flowchart of a hotspot information determination method according to another embodiment of the present application. The method can be implemented by a hotspot information determination system. Provided that substantially the same result is obtained, this embodiment is not limited to the order of steps shown in Fig. 2. As shown in Fig. 2, the method includes:
Step 210: Acquire a text set to be processed.
The text set to be processed comprises a plurality of texts.
Step 220: Perform reading comprehension and sequence labeling on the texts to extract at least one element of each text.
Reading comprehension can extract elements that are associated with one another in a text, such as elements linked by an affiliation relationship (for example, a person, that person's employer, and that person's family members). Specifically, reading comprehension of a text includes: determining the number of answers to a preset question in the text and the start position of each answer, and extracting the element at the corresponding position in the text according to each answer's start position. Reading comprehension can be performed with a reading comprehension model, a pre-trained machine learning model whose training samples are texts annotated with multiple elements that have affiliation relationships.
Sequence labeling extracts conventional elements, such as organization names, document names, laws and regulations, and project names. Specifically, sequence labeling classifies each character of the text; if several consecutive characters belong to the same category, they are treated as one element and extracted. Sequence labeling can be performed with a sequence labeling model, a pre-trained machine learning model whose training samples are texts annotated with elements.
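The grouping rule of sequence labeling (consecutive characters with the same category form one element) can be sketched as follows. The per-character labels would come from the trained sequence labeling model; here they are supplied by hand, the `"O"` label for characters outside any element is an assumption borrowed from common tagging practice, and English text stands in for Chinese characters. Under this rule two adjacent elements of the same category would merge; tagging schemes such as BIO avoid that, but the text does not say which scheme is used.

```python
def group_labeled_characters(text, labels, outside="O"):
    """Merge runs of consecutive characters that share a category into elements.

    labels[i] is the category predicted for text[i]; characters labeled
    `outside` belong to no element. Returns (element, category, start) tuples.
    """
    if len(text) != len(labels):
        raise ValueError("one label per character is required")
    elements = []
    start = 0
    for i in range(1, len(text) + 1):
        # Close the current run at the end of the text or where the label changes.
        if i == len(text) or labels[i] != labels[start]:
            if labels[start] != outside:
                elements.append((text[start:i], labels[start], start))
            start = i
    return elements


text = "in Hefei now"
labels = ["O"] * 3 + ["ADDR"] * 5 + ["O"] * 4
print(group_labeled_characters(text, labels))  # [('Hefei', 'ADDR', 3)]
```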
The training texts for the reading comprehension model and the sequence labeling model may come from various fields, such as medicine, engineering, the judiciary, or education. Training both models on text samples from many fields makes them suitable for extracting elements from articles in those fields.
Elements may be of various types, for example address, name, or time. The name type may include organization names, person names, project names, and the like. The address type may include addresses at multiple levels; for example, Anhui Province, Hefei City, and Yaohai District are all address elements.
In one embodiment, the elements are normalized to obtain normalized elements. After the elements in a text are extracted, elements of certain types can be normalized. Normalization may mean fully expanding an element into its full name, which simplifies the subsequent steps. For example, to normalize address elements, an extracted address can be split into an administrative division name and a specific address name. An abbreviation mapping table for administrative division names can be constructed from map data, as shown in Table 2, and used to normalize abbreviated division names into full names. For example, the elements "Hefei", "Hefei City", "Anhui Hefei", and "Hefei City, Anhui Province" all normalize to "Hefei City, Anhui Province".
Table 2: Example normalization mapping table
[Table 2 image not reproduced]
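The abbreviation mapping idea can be sketched as below. The `DIVISIONS` table is a hypothetical three-entry stand-in for a table built from map data. Addresses are given as space-separated English words (real Chinese addresses would first need word segmentation), and the output lists divisions from the top level down, so every variant from the example above normalizes to the same string.

```python
# Hypothetical abbreviation table: short name -> (full name, parent's short name or None).
DIVISIONS = {
    "Anhui": ("Anhui Province", None),
    "Hefei": ("Hefei City", "Anhui"),
    "Yaohai": ("Yaohai District", "Hefei"),
}
SUFFIXES = {"Province", "City", "District"}  # administrative suffixes to drop


def division_path(short_name):
    """Expand a division's short name into its full path, top level first."""
    path = []
    while short_name is not None:
        full_name, parent = DIVISIONS[short_name]
        path.append(full_name)
        short_name = parent
    return path[::-1]


def normalize_address(address):
    """Replace the division names in an address with the full path of the deepest one."""
    words = address.replace(",", " ").split()
    divisions = [w for w in words if w in DIVISIONS]
    detail = [w for w in words if w not in DIVISIONS and w not in SUFFIXES]
    if not divisions:
        return address  # nothing to normalize
    deepest = max(divisions, key=lambda name: len(division_path(name)))
    return " ".join(division_path(deepest) + detail)


for variant in ["Hefei", "Hefei City", "Anhui Hefei", "Hefei City, Anhui Province"]:
    print(normalize_address(variant))  # Anhui Province Hefei City
```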
Step 230: Determine the shared key corresponding to each element.
In one embodiment, the shared key of an element is determined by searching a synonymous element library, which contains the correspondence between elements and shared keys. Each shared key in the library corresponds to a synonymous element set containing one or more elements with the same meaning. The library makes it possible to retrieve different descriptions of the same meaning quickly.
The element can be searched and matched directly in the synonymous element library to determine its shared key. When the library does not contain the element, the library can be updated to determine the element's shared key. The update method is described in detail in Fig. 3 and the related description.
In the initialization stage, the system can load a stored initial synonymous element library, which may be constructed manually with a certain number of synonym samples. With this library, the shared key of an element is determined by a simple lookup, which greatly reduces repeated matching; the library also groups words with the same meaning into one category, providing data support for subsequent hotspot determination.
Step 240: Construct a text index library.
In one embodiment, the text cluster corresponding to each shared key in the text set to be processed is determined, where every text in the cluster contains at least one element corresponding to that shared key. In other words, a text index library is constructed that contains the mapping between shared keys and texts: each text mapped to a shared key contains at least one element under that key. Further, the texts may be numbered, so that the text index library only needs to store the correspondence between shared keys and text numbers, as shown in Table 3.
Table 3: Example mapping between shared keys and text numbers
[Table 3 image not reproduced]
As shown in Table 3, texts No. 33 and No. 156 contain "transfer soldier", and text No. 378 contains "transfer old soldier". These elements all correspond to shared key 46, so the text number list of shared key 46 is [33, 156, 378]; that is, the text cluster of shared key 46 contains texts 33, 156, and 378.
Building the text index library establishes the correspondence between shared keys and texts, which simplifies subsequent hotspot determination.
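The text index library is essentially an inverted index from shared keys to text numbers. A minimal sketch, assuming the extracted elements of each text and the element-to-key dictionary are already available (hypothetical input shapes), reproduces the shared key 46 row of the example above.

```python
from collections import defaultdict


def build_text_index(texts_elements, key_of):
    """Map each shared key to the sorted list of text numbers containing it.

    texts_elements: {text_number: iterable of extracted elements}
    key_of:         {element: shared key}, i.e. the synonymous element library
    """
    index = defaultdict(set)
    for text_no, elements in texts_elements.items():
        for element in elements:
            index[key_of[element]].add(text_no)
    return {key: sorted(numbers) for key, numbers in index.items()}


key_of = {"transfer soldier": 46, "transfer old soldier": 46, "Anhui AA": 32}
texts = {
    33: ["transfer soldier"],
    156: ["transfer soldier", "Anhui AA"],
    378: ["transfer old soldier"],
}
print(build_text_index(texts, key_of)[46])  # [33, 156, 378]
```

Storing text numbers rather than texts keeps the index small, and sorted lists make the intersections in the later steps cheap to compute.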
Step 250: Combine different shared keys to obtain shared key combinations.
Any number of shared keys may be combined; preferably, a combination contains no more than 5 shared keys. Combining more than 5 shared keys significantly increases the processing time of subsequent steps while adding little meaning.
In one embodiment, the shared keys are classified, and shared keys of different categories are combined. For example, shared key categories may include address, person, and time. Combining keys of different categories yields combinations that contain, for example, an address, a person, and a time. Phrases formed from keys of the same category are largely meaningless and cannot describe an event, so restricting combinations to different categories reduces the number of meaningless combinations and improves processing efficiency.
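A sketch of category-aware combination with the 5-key cap described above. It assumes that each shared key carries a single category label and, as a further assumption, starts from pairs because a single key is not a combination. Enumerating every subset grows exponentially with the number of keys, which is one reason a size cap matters.

```python
from itertools import combinations


def typed_key_combinations(key_category, max_size=5):
    """Combine shared keys so that every key in a combination has a different category.

    key_category: {shared_key: category}, e.g. "address", "person", "time".
    Produces combinations of 2..max_size keys in ascending size.
    """
    keys = sorted(key_category)
    result = []
    for size in range(2, max_size + 1):
        for combo in combinations(keys, size):
            categories = {key_category[key] for key in combo}
            if len(categories) == size:  # skip combinations that repeat a category
                result.append(combo)
    return result


key_category = {1: "address", 2: "address", 3: "person", 4: "time"}
print(typed_key_combinations(key_category))
# [(1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (1, 3, 4), (2, 3, 4)]
```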
Step 260: Determine the confidence of each shared key combination.
The confidence of a shared key combination is the probability that the other shared keys in the combination occur given that one shared key in the combination occurs. It can be obtained by search statistics in the text index library. Taking the shared key combination "A & B" as an example: count the text numbers that appear in both the text number list of shared key A and that of shared key B, and divide this count by the length of the text number list of A; the result is the confidence of "A & B". This is only an example, and the confidence may be calculated in any other way. In one embodiment, the occurrence frequency in the text set to be processed is counted only for shared key combinations whose confidence is greater than a first threshold; that is, only those combinations proceed to the subsequent steps, and combinations with lower confidence are discarded. The first threshold may be a preset fixed value or a value set according to the features of the shared key combination. Filtering by confidence removes some meaningless shared key combinations, reducing the processing time of subsequent steps and improving efficiency.
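Written as a formula, the confidence described above is conf(A & B) = |T_A ∩ T_B| / |T_A|, where T_A and T_B are the text number lists of shared keys A and B in the text index library. Like association-rule confidence, it is therefore not symmetric in A and B. A minimal sketch, with hypothetical function names and an index shaped like Table 3:

```python
def confidence(index, key_a, key_b):
    """conf(A & B): share of the texts containing key A that also contain key B."""
    texts_a = set(index.get(key_a, ()))
    if not texts_a:
        return 0.0
    return len(texts_a & set(index.get(key_b, ()))) / len(texts_a)


def filter_by_confidence(index, pairs, first_threshold):
    """Keep only key pairs whose confidence is greater than the first threshold."""
    return [(a, b) for a, b in pairs if confidence(index, a, b) > first_threshold]


index = {32: [156, 200], 46: [33, 156, 378]}
print(confidence(index, 32, 46))  # 0.5
print(filter_by_confidence(index, [(32, 46), (46, 32)], first_threshold=0.4))  # [(32, 46)]
```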
Step 270: Count the occurrence frequency of each shared key combination in the text set to be processed.
In one embodiment, the number of texts shared by the text clusters of the shared keys in the combination is determined, and the occurrence frequency of the combination is determined from this number. The number of overlapping texts is the number of texts that appear simultaneously in the text clusters of all the different shared keys in the combination. Alternatively, it can be obtained directly by search statistics in the text index library, as the number of texts common to the text number sets of the combined shared keys.
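For combinations of more than two keys, the overlap count of Step 270 is the size of the intersection of all the keys' text clusters. A sketch over the same hypothetical index shape as in Table 3 (function names are illustrative):

```python
from functools import reduce


def overlapping_texts(index, key_combo):
    """Text numbers that appear in the text cluster of every key in the combination."""
    clusters = [set(index.get(key, ())) for key in key_combo]
    if not clusters:
        return set()
    return reduce(set.intersection, clusters)


def combination_frequency(index, key_combo):
    """Occurrence frequency taken as the number of mutually overlapping texts."""
    return len(overlapping_texts(index, key_combo))


index = {32: [156, 200], 46: [33, 156, 378], 7: [156, 378]}
print(combination_frequency(index, (46, 7)))      # 2
print(combination_frequency(index, (32, 46, 7)))  # 1
```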
Step 280: Determine hotspot information.
In one embodiment, the element combinations corresponding to shared key combinations whose occurrence frequency meets a second preset condition are taken as hotspot information. The second preset condition may be that the occurrence frequency exceeds a second threshold, or that it ranks among the top N. The second threshold may be a preset fixed value or a value set according to the features of the shared key combination.
The second preset condition can also be configured as needed. For example, a shared key combination with certain specific features may be determined as hotspot information once its occurrence frequency exceeds a smaller set value; alternatively, shared key combinations with certain features may be given priority.
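The two forms of the second preset condition (above a threshold, or among the top N) can be sketched in one hypothetical helper over a `{key combination: occurrence count}` dictionary. The deterministic tie-break is an added assumption.

```python
def select_hotspots(frequencies, second_threshold=None, top_n=None):
    """Apply the second preset condition to {key combination: occurrence count}.

    With second_threshold, keep combinations whose count exceeds it; otherwise
    keep the top_n most frequent (all of them if top_n is None). Ties are broken
    by comparing the combinations themselves so the result is deterministic.
    """
    if second_threshold is not None:
        return {combo: f for combo, f in frequencies.items() if f > second_threshold}
    ranked = sorted(frequencies.items(), key=lambda item: (-item[1], item[0]))
    return dict(ranked[:top_n])


freqs = {(32, 46): 3, (7, 46): 1, (7, 32): 2}
print(select_hotspots(freqs, second_threshold=1))  # {(32, 46): 3, (7, 32): 2}
print(select_hotspots(freqs, top_n=1))             # {(32, 46): 3}
```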
Step 290: Push the texts corresponding to the hotspot information to the user.
Texts containing hotspot information can be pushed to the user so that the user sees hotspot information first. The hotspot information to push can also be selected according to each user's requirements.
In one embodiment, the shared key combinations determined as hotspot information are classified to obtain the category of each combination; the user's requirement category is acquired; and the texts corresponding to shared key combinations whose category matches the user's requirement are pushed to the user. Shared key combinations can be classified by features such as time, place, or task, and texts are pushed according to the user's requirement category. Texts can also be pushed according to other rules, for example: push everything; push by category on demand; push by frequency and confidence; push by address on demand; push by organization on demand; push by person; and so on. Taking push by address as an example, hotspot information can be broken down by the required address level, with an event at a lower-level place counted toward the higher-level place that contains it, so that texts corresponding to hotspot information are pushed hierarchically. For example, Fig. 4 is a schematic diagram of a hotspot information push interface according to an embodiment of the present application. In Fig. 4, road widening in Anhui Province is a hotspot: clicking Anhui Province shows the road widening hotspot events in each city, clicking a city shows those in each county and district, and the number of texts containing the road widening hotspot event is displayed for each address level.
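The hierarchical roll-up behind a Fig. 4 style interface can be sketched as prefix counting over normalized address paths: each text is counted at its own address and at every higher level that contains it. The input shape and function name are hypothetical, and the addresses are illustrative.

```python
from collections import Counter


def count_by_address_level(text_addresses):
    """Count hotspot texts at every address level so lower-level events roll up.

    text_addresses: one address path per text, top level first,
    e.g. ("Anhui Province", "Hefei City", "Yaohai District").
    Returns {address prefix: number of texts}.
    """
    counts = Counter()
    for path in text_addresses:
        for depth in range(1, len(path) + 1):
            counts[tuple(path[:depth])] += 1
    return dict(counts)


paths = [
    ("Anhui Province", "Hefei City", "Yaohai District"),
    ("Anhui Province", "Hefei City", "Shushan District"),
    ("Anhui Province", "Wuhu City"),
]
counts = count_by_address_level(paths)
print(counts[("Anhui Province",)])                # 3
print(counts[("Anhui Province", "Hefei City")])   # 2
```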
When an element cannot be found in the synonymous element library, the library needs to be updated, as shown in Fig. 3. Fig. 3 is a schematic flowchart of a method for updating a synonymous element library according to an embodiment of the present application. The update method can be implemented by the hotspot information determination system. Provided that substantially the same result is obtained, this embodiment is not limited to the order of steps shown in Fig. 3. As shown in Fig. 3, the method includes:
Step 310: Perform a fuzzy search to determine a candidate synonymous element list.
An element that cannot be found in the synonymous element library is a new element. A fuzzy search for the new element is performed in the synonymous element library to determine a candidate synonymous element list. A fuzzy search tolerates some difference between the query and the retrieved information; this tolerance is what "fuzzy" means here. For example, the search may retrieve library elements that only partially match the searched element. The fuzzy search returns several library elements with some degree of similarity to the new element, and these form the candidate synonymous element list.
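The text does not name a fuzzy search technique. As a stand-in, Python's standard `difflib.get_close_matches` can retrieve library elements whose character sequences resemble the new element. The `n` and `cutoff` values below are arbitrary assumptions.

```python
import difflib


def candidate_synonyms(new_element, library_elements, n=5, cutoff=0.5):
    """Return up to n library elements resembling new_element, best match first.

    difflib scores candidates with SequenceMatcher.ratio(); anything scoring
    below `cutoff` is dropped.
    """
    return difflib.get_close_matches(new_element, list(library_elements), n=n, cutoff=cutoff)


library = ["transfer soldier", "transfer officer", "road widening"]
print(candidate_synonyms("transfer soldiers", library))  # ['transfer soldier', 'transfer officer']
```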
Step 320: the similarity of the element and the candidate synonymous elements in the candidate synonymous element list is determined. And carrying out similarity calculation on the element and each candidate synonymous element so as to obtain the similarity of the element and each candidate synonymous element. The similarity calculation may be any calculation method, such as cosine similarity or cross similarity. The process of determining similarity may be implemented by matching models. The matching model may be a machine learning model. The matching model has better generalization capability by fusing DSSM, ESIM, Simase Network and Decomposable Attention together through a Blending model fusion method. The training samples of the matching model can be a plurality of groups of element combinations with different similarities.
Step 330: determine whether a synonymous element of the element exists in the synonymous element library.
In one embodiment, a candidate synonymous element whose similarity satisfies a first preset condition is determined to be a synonymous element of the element (here, the new element). The first preset condition may be that the similarity is the highest among all candidates and greater than a third threshold. The third threshold may be a fixed value set in advance, or a value set according to the features of the element.
If so, that is, a synonymous element exists, proceed to step 340; if not, proceed to step 350.
Step 340: use the shared key corresponding to the synonymous element as the shared key of the element.
The synonymous element already exists in the synonymous element library, so the shared key corresponding to it in the library is used as the shared key of the new element.
Step 350: generate a new shared key, associate the element with it, and add the newly generated shared key to the synonymous element library.
If no synonymous element of the element exists in the synonymous element library, a new shared key is generated and associated with the element (the new element), so that the newly generated key becomes the element's shared key. The new shared key and its corresponding new element are stored in the synonymous element library.
Step 360: add the element to the synonymous element library.
The element here is the new element described above, that is, an element that could not be found in the synonymous element library. The element and its correspondence with the shared key are added to the library, which completes the update of the synonymous element library.
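Steps 330 to 360 can be sketched as one lookup-or-insert routine. This is an illustration under assumptions, not the patent's implementation: the library is a hypothetical in-memory dict, `difflib.SequenceMatcher.ratio` stands in for the matching model, the `threshold` parameter plays the role of the third threshold, and new shared keys are minted as sequential `K1`, `K2`, ... labels.

```python
import difflib
import itertools


def ratio(a, b):
    """Stand-in similarity in [0, 1]; the patent uses a trained matching model."""
    return difflib.SequenceMatcher(None, a, b).ratio()


class SynonymLibrary:
    """Hypothetical in-memory synonymous element library: element -> shared key."""

    def __init__(self, threshold=0.8):
        self.key_of = {}
        self.threshold = threshold  # plays the role of the "third threshold"
        self._ids = itertools.count(1)  # source of new shared key numbers

    def shared_key(self, element, candidates, similarity=ratio):
        """Return the element's shared key, updating the library if the element is new."""
        if element in self.key_of:  # element already in the library
            return self.key_of[element]
        # Steps 320/330: score only candidates that are actually in the library.
        scored = [(similarity(element, c), c) for c in candidates if c in self.key_of]
        best_score, best = max(scored, default=(0.0, None))
        if best is not None and best_score > self.threshold:
            key = self.key_of[best]  # Step 340: reuse the synonym's shared key
        else:
            key = f"K{next(self._ids)}"  # Step 350: generate a new shared key
        self.key_of[element] = key  # Step 360: add the element to the library
        return key


lib = SynonymLibrary(threshold=0.6)
print(lib.shared_key("Hefei", []))              # K1
print(lib.shared_key("Hefei City", ["Hefei"]))  # K1
print(lib.shared_key("rainstorm", ["Hefei"]))   # K2
```

In practice the `candidates` argument would come from the fuzzy search of step 310.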
The present application further provides a hotspot information determination system, shown in fig. 5. Fig. 5 is a schematic structural diagram of a hotspot information determination system according to an embodiment of the present application. In this embodiment, the hotspot information determination system 500 includes an acquisition module 510, an extraction module 520, a shared key determination module 530, a combination module 540, and a hotspot information determination module 550.
The acquisition module 510 is configured to acquire a text set to be processed.
The extraction module 520 is configured to extract at least one element from each text in the text set to be processed. The extraction module 520 may further be configured to input each text into a pre-trained element extraction model to extract at least one element of the text, where the element extraction model includes a reading comprehension model and a sequence labeling model.
The shared key determination module 530 is configured to determine the shared key corresponding to each element, where each shared key corresponds to one element or to several semantically identical elements. The module 530 may further be configured to obtain a synonymous element library, which contains the correspondence between elements and shared keys, and to determine the shared key corresponding to an element from that library. If an element does not exist in the synonymous element library, the module 530 may perform a fuzzy search to determine a candidate synonymous element list; use a matching model to determine the similarity between the element and each candidate synonymous element in the list; determine a candidate whose similarity satisfies a first preset condition as a synonymous element of the element, and use that synonymous element's shared key as the element's shared key; and add the element to the synonymous element library. If no synonymous element of the element exists in the library, the module 530 may generate a new shared key, associate the element with it, and add the new shared key and the element to the library. The module 530 may also perform regularization processing on the elements to obtain regularized elements and determine the shared keys corresponding to the regularized elements.
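The description does not define the regularization step. One plausible reading, sketched below purely as an assumption, is surface-form normalization before the library lookup: Unicode NFKC folding (which maps full-width letters and the ideographic space to their ASCII forms), whitespace collapsing and lowercasing. The helper names and the library contents are hypothetical.

```python
import re
import unicodedata


def regularize(element):
    """Hypothetical regularization: NFKC folding, whitespace collapsing, lowercasing."""
    text = unicodedata.normalize("NFKC", element)
    text = re.sub(r"\s+", " ", text).strip()
    return text.lower()


def lookup_shared_key(element, library):
    """Return the shared key of the regularized element, or None if it is not in the library."""
    return library.get(regularize(element))


# Hypothetical library whose elements are stored in regularized form.
library = {"hefei city": "K1", "heavy rain": "K2"}
print(lookup_shared_key("ＨＥＦＥＩ\u3000 City", library))  # K1
```

A `None` result corresponds to the "element not in the library" branch, which triggers the update flow of fig. 3.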
The combination module 540 is configured to combine different shared keys to obtain shared key combinations, where each combination contains at most 5 shared keys.
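The description caps a combination at 5 shared keys but does not say how combinations are enumerated. A minimal sketch, assuming combinations are built from the keys that co-occur in a single text (to avoid enumerating every subset of the whole key vocabulary) and that a combination holds at least 2 keys, uses `itertools.combinations`; `key_combinations` and `MAX_KEYS` are hypothetical names.

```python
from itertools import combinations

MAX_KEYS = 5  # upper bound on keys per combination, from the description


def key_combinations(text_keys, min_size=2, max_size=MAX_KEYS):
    """Yield combinations of distinct shared keys found together in one text.

    Assumptions: combinations are built per text and contain at least
    `min_size` keys; neither choice is specified in the source.
    """
    keys = sorted(set(text_keys))  # deduplicate and fix an order
    for size in range(min_size, min(max_size, len(keys)) + 1):
        yield from combinations(keys, size)


print(list(key_combinations(["K2", "K1", "K3", "K1"])))
# [('K1', 'K2'), ('K1', 'K3'), ('K2', 'K3'), ('K1', 'K2', 'K3')]
```

Even with the cap, a text carrying 7 distinct keys already yields 112 combinations of size 2 to 5, which suggests why a pre-filter such as a confidence threshold can be helpful.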
The hotspot information determination module 550 is configured to determine hotspot information according to the occurrence frequency of the shared key combinations in the text set to be processed. The module 550 may count the occurrence frequency of each shared key combination in the text set, and take the element combination corresponding to a shared key combination whose occurrence frequency satisfies a second preset condition as hotspot information. To count the frequency, the module 550 may determine the text cluster corresponding to each shared key in the text set, where every text in the cluster contains at least one element corresponding to that shared key; determine the number of texts that overlap among the text clusters of the shared keys in the combination; and determine the occurrence frequency of the combination based on that number of overlapping texts. The module 550 may also determine a confidence for each shared key combination and count the occurrence frequency only for combinations whose confidence is greater than a first threshold.
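The frequency counting above can be sketched with sets of text IDs: each shared key's text cluster is the set of texts containing one of its elements, and a combination's occurrence frequency is the size of the intersection of its keys' clusters. The patent does not define the confidence or the second preset condition, so the sketch assumes, hypothetically, that confidence is the overlap divided by the smallest cluster size and that the second condition is a minimum frequency. Mapping the surviving key combinations back to element combinations is omitted.

```python
from collections import defaultdict


def text_clusters(texts_keys):
    """Map each shared key to the set of IDs of texts containing one of its elements."""
    clusters = defaultdict(set)
    for text_id, keys in texts_keys.items():
        for key in keys:
            clusters[key].add(text_id)
    return clusters


def combination_frequency(combo, clusters):
    """Occurrence frequency: number of texts present in every key's cluster."""
    return len(set.intersection(*(clusters[k] for k in combo)))


def confidence(combo, clusters):
    """Hypothetical confidence: overlap size divided by the smallest cluster size."""
    smallest = min(len(clusters[k]) for k in combo)
    return combination_frequency(combo, clusters) / smallest if smallest else 0.0


def hotspots(combos, clusters, first_threshold=0.5, min_frequency=2):
    """Filter by confidence (first threshold), then by a minimum frequency (second condition)."""
    result = {}
    for combo in combos:
        if confidence(combo, clusters) > first_threshold:
            freq = combination_frequency(combo, clusters)
            if freq >= min_frequency:
                result[combo] = freq
    # Most frequent combinations first.
    return dict(sorted(result.items(), key=lambda kv: kv[1], reverse=True))


texts = {"t1": {"K1", "K2"}, "t2": {"K1", "K2", "K3"}, "t3": {"K1", "K3"}, "t4": {"K2"}}
clusters = text_clusters(texts)
combos = [("K1", "K2"), ("K1", "K3"), ("K2", "K3"), ("K1", "K2", "K3")]
print(hotspots(combos, clusters))  # {('K1', 'K2'): 2, ('K1', 'K3'): 2}
```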
The hotspot information determination system 500 further includes a pushing module (not shown), configured to classify the shared key combinations taken as hotspot information to obtain the category of each combination, acquire the category required by a user, and push to the user the texts corresponding to the shared key combinations whose category matches the user's requirement.
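Once each hotspot combination has a category, the pushing step reduces to a category filter. In this sketch the categories are given as a precomputed dict standing in for the unspecified classifier; the function name, argument names and sample data are hypothetical.

```python
def push_texts(combo_category, combo_texts, user_category):
    """Collect texts of hotspot combinations whose category matches the user's category."""
    selected = set()
    for combo, category in combo_category.items():
        if category == user_category:
            selected.update(combo_texts.get(combo, ()))
    return sorted(selected)


combo_category = {("K1", "K2"): "weather", ("K1", "K3"): "traffic"}
combo_texts = {("K1", "K2"): ["t1", "t2"], ("K1", "K3"): ["t2", "t3"]}
print(push_texts(combo_category, combo_texts, "weather"))  # ['t1', 't2']
```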
Fig. 6 is a schematic structural diagram of a computer device according to an embodiment of the present application. In this embodiment, the computer device 600 includes a processor 610.
The processor 610 may also be referred to as a CPU (Central Processing Unit). The processor 610 may be an integrated circuit chip with signal processing capabilities. It may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general-purpose processor may be a microprocessor or any conventional processor.
The computer device 600 may further include a memory (not shown) for storing instructions and data required for the processor 610 to operate.
The processor 610 is configured to execute instructions to implement the method provided by any of the embodiments of the hotspot information determination method described above, or by any non-conflicting combination of them.
Fig. 7 is a schematic structural diagram of an apparatus with a storage function according to an embodiment of the present application. The apparatus 700 with a storage function stores instructions that, when executed, implement the method provided by any of the embodiments of the hotspot information determination method of the present application, or by any non-conflicting combination of them. The instructions may form a program file stored in the apparatus in the form of a software product, enabling a computer device (a personal computer, a server, a network device, or the like) or a processor to execute all or some of the steps of the methods of the embodiments of the present application. The apparatus 700 with a storage function includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc, or terminal devices such as a computer, a server, a mobile phone or a tablet.
The beneficial effects of the embodiments of the present application include, but are not limited to: (1) a high recall rate for hotspot information: because synonymous elements share one shared key, synonymous events described in different ways can be grouped into a single event, which prevents their popularity from being split across several events; (2) a wide range of application: training samples from multiple fields are used in model training, so the method can determine hotspots in multiple fields; (3) an improved user experience: users can view hotspot information for the fields or regions they care about according to their own needs. Note that different embodiments may produce different advantages; a given embodiment may produce any one or any combination of the above advantages, or other advantages.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into units is merely a logical division, and an actual implementation may divide them differently; multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or of another form.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The above description is only for the purpose of illustrating embodiments of the present application and is not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application or are directly or indirectly applied to other related technical fields, are also included in the scope of the present application.

Claims (12)

1. A hotspot information determination method is characterized by comprising the following steps:
acquiring a text set to be processed;
extracting at least one element of each text in the text set to be processed;
determining a sharing key corresponding to each element; wherein each shared key corresponds to one element or a plurality of semantically identical elements;
combining different sharing keys to obtain a sharing key combination;
and determining hotspot information according to the occurrence frequency of the sharing key combination in the text set to be processed.
2. The method according to claim 1, wherein the determining the shared key corresponding to each of the elements includes:
searching a synonymous element library to determine the shared key corresponding to the element, wherein the synonymous element library includes a correspondence between the element and the shared key.
3. The method according to claim 2, wherein the determining a shared key corresponding to each of the elements further comprises:
if the element does not exist in the synonymous element library, performing a fuzzy search to determine a candidate synonymous element list;
determining a similarity between the element and each candidate synonymous element in the candidate synonymous element list;
determining a candidate synonymous element whose similarity satisfies a first preset condition as a synonymous element of the element;
taking the shared key corresponding to the synonymous element as the shared key of the element; and
adding the element to the synonymous element library.
4. The method according to claim 3, wherein the determining a shared key corresponding to each of the elements further comprises:
if no synonymous element of the element exists in the synonymous element library, generating a new shared key;
associating the element with the newly generated shared key; and
adding the newly generated shared key and the element to the synonymous element library.
5. The method according to claim 1, wherein the determining a shared key corresponding to each of the elements further comprises:
performing regularization processing on the elements to obtain regularized elements, and determining the shared keys corresponding to the regularized elements.
6. The method according to claim 1, wherein the determining hotspot information according to the occurrence frequency of the shared key combination in the text set to be processed comprises:
counting the occurrence frequency of the sharing key combination in the text set to be processed; and
taking, as the hotspot information, the element combination corresponding to a shared key combination whose occurrence frequency satisfies a second preset condition.
7. The method according to claim 6, wherein counting the frequency of occurrence of the sharing key combination in the to-be-processed text set includes:
determining a text cluster corresponding to each shared key in the text set to be processed, wherein each text in the text cluster contains at least one of the elements corresponding to the shared key;
determining the number of texts that overlap among the text clusters corresponding to the shared keys in the shared key combination; and
determining the occurrence frequency of the shared key combination based on the number of overlapping texts.
8. The method according to claim 6, wherein the counting the frequency of occurrence of the sharing key combination in the text set to be processed comprises:
determining a confidence of the shared key combination; and
for a shared key combination whose confidence is greater than a first threshold, counting the occurrence frequency of the shared key combination in the text set to be processed.
9. The method according to claim 1, further comprising:
classifying the shared key combinations serving as the hotspot information to obtain a category of each shared key combination;
acquiring a user requirement category; and
pushing, to the user, the texts corresponding to the shared key combinations whose category is the same as the user requirement category.
10. The method according to claim 1, wherein the extracting at least one element of each text in the set of texts to be processed comprises:
performing reading comprehension and sequence labeling on the texts to extract at least one element of each text.
11. A computer device comprising a processor for executing instructions to implement the hotspot information determination method of any one of claims 1-10.
12. An apparatus having a storage function, storing program data that can be read by a computer and executed by a processor to implement the hotspot information determination method according to any one of claims 1 to 10.
CN202011632878.9A 2020-12-31 2020-12-31 Hotspot information determination method, computer equipment and device Pending CN112597772A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011632878.9A CN112597772A (en) 2020-12-31 2020-12-31 Hotspot information determination method, computer equipment and device

Publications (1)

Publication Number Publication Date
CN112597772A true CN112597772A (en) 2021-04-02

Family

ID=75206900

Country Status (1)

Country Link
CN (1) CN112597772A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113779983A (en) * 2021-04-16 2021-12-10 Nanjing Qingdun Information Technology Co., Ltd. Text data processing method and device, storage medium and electronic device

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090103721A1 (en) * 2005-10-17 2009-04-23 Tomokazu Sada Data transmitting apparatus, data receiving apparatus and data communication apparatus
CN103886077A (en) * 2014-03-24 2014-06-25 Guangdong Planning and Designing Institute of Telecommunications Co., Ltd. Short text clustering method and system
US20150046791A1 (en) * 2013-08-08 2015-02-12 Palantir Technologies, Inc. Template system for custom document generation
CN104850574A (en) * 2015-02-15 2015-08-19 Beyondsoft Corporation Sensitive word filtering method for text information
CN105389354A (en) * 2015-11-02 2016-03-09 Southeast University Unsupervised event extraction and ranking method for social media text
CN106484767A (en) * 2016-09-08 2017-03-08 Institute of Information Engineering, Chinese Academy of Sciences Cross-media event extraction method
CN109670163A (en) * 2017-10-17 2019-04-23 Alibaba Group Holding Limited Information identification method, information recommendation method, template construction method and computing device
CN109740152A (en) * 2018-12-25 2019-05-10 Tencent Technology (Shenzhen) Co., Ltd. Text classification determination method and apparatus, storage medium and computer device
CN110020424A (en) * 2019-01-04 2019-07-16 Alibaba Group Holding Limited Contract information extraction method and device, and text information extraction method
CN110443586A (en) * 2019-08-12 2019-11-12 OPPO (Chongqing) Intelligent Technology Co., Ltd. Shared schedule information processing method and device, terminal and storage medium
CN110517082A (en) * 2019-08-29 2019-11-29 Shenzhen Qianhai WeBank Co., Ltd. Advertisement sending method, device, equipment and computer readable storage medium
CN111198946A (en) * 2019-12-25 2020-05-26 Beijing University of Posts and Telecommunications Network news hotspot mining method and device
CN111507083A (en) * 2020-06-19 2020-08-07 iFLYTEK (Suzhou) Technology Co., Ltd. Text analysis method, device, equipment and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
XIANGFENG LUO等: "Power Series Representation Model of Text Knowledge Based on Human Concept Learning", IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS, vol. 44, no. 1, 29 July 2013 (2013-07-29), pages 86, XP011535117, DOI: 10.1109/TSMCC.2012.2231674 *
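Diao Hong: "Visual analysis of domestic film and television translation research based on CiteSpace", Journal of Chongqing Technology and Business University, vol. 34, no. 6, 15 December 2017 (2017-12-15), pages 115 *
Fu Yaling et al.: "Burst event detection method based on key burst words", Application of Electronic Technique, vol. 46, no. 11, 6 November 2020 (2020-11-06), pages 82 *
Guo Junfeng; Zhao Renliang; Zheng Jiaolong: "Discovery of geographic feature changes from web page text", Geomatics World, vol. 22, no. 01, 25 February 2015 (2015-02-25), pages 52 *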

Similar Documents

Publication Publication Date Title
CN106649818B (en) Application search intention identification method and device, application search method and server
Santos et al. Learning to combine multiple string similarity metrics for effective toponym matching
CN109885773B (en) Personalized article recommendation method, system, medium and equipment
Pu et al. Subject categorization of query terms for exploring Web users' search interests
US20170316519A1 (en) Mutually reinforcing ranking of social media accounts and contents
CN111460798A (en) Method and device for pushing similar meaning words, electronic equipment and medium
CN106970991B (en) Similar application identification method and device, application search recommendation method and server
CN113312461A (en) Intelligent question-answering method, device, equipment and medium based on natural language processing
US20170235836A1 (en) Information identification and extraction
CN112100396A (en) Data processing method and device
Vick et al. The effects of standardizing names for record linkage: Evidence from the United States and Norway
US20150206101A1 (en) System for determining infringement of copyright based on the text reference point and method thereof
CN111447575A (en) Short message pushing method, device, equipment and storage medium
CN112149422A (en) Enterprise news dynamic monitoring method based on natural language
US20110264683A1 (en) System and method for managing information map
US20170235835A1 (en) Information identification and extraction
Singh et al. Mining the blogosphere from a socio-political perspective
US10853429B2 (en) Identifying domain-specific accounts
CN112597772A (en) Hotspot information determination method, computer equipment and device
CN112434126B (en) Information processing method, device, equipment and storage medium
CN110222156B (en) Method and device for discovering entity, electronic equipment and computer readable medium
CN110968691B (en) Judicial hotspot determination method and device
CN113590792A (en) User problem processing method and device and server
Samah et al. TF-IDF and Data Visualization For Syafie Madhhab Hadith Scriptures Authenticity
CN111291248A (en) Searching method and system based on intelligent agent knowledge base

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination