WO2023125589A1 - Emergency monitoring method and device - Google Patents

Emergency monitoring method and device

Info

Publication number
WO2023125589A1
Authority
WO
WIPO (PCT)
Prior art keywords
entity set
entity
similarity
emergency
associated text
Prior art date
Application number
PCT/CN2022/142554
Other languages
English (en)
French (fr)
Inventor
陈建国
陈涛
黄丽达
刘一青
陈杨
史盼盼
王晓萌
刘春慧
赵晨阳
狄文杰
刘连顺
秦阳阳
Original Assignee
北京辰安科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京辰安科技股份有限公司
Publication of WO2023125589A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A10/00 TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE at coastal zones; at river basins
    • Y02A10/40 Controlling or monitoring, e.g. of flood or hurricane; Forecasting, e.g. risk assessment or mapping

Definitions

  • the present disclosure relates to the technical field of information management, and in particular to an emergency monitoring method, device, electronic equipment and storage medium.
  • the present disclosure aims to solve one of the technical problems in the related art at least to a certain extent.
  • the embodiment of the first aspect of the present disclosure proposes a method for monitoring emergencies, including:
  • the embodiment of the second aspect of the present disclosure proposes a monitoring device for emergencies, including:
  • the first acquisition module is used to traverse the network information based on the reference words contained in the thesaurus, so as to extract candidate texts containing the reference words therefrom;
  • a first determination module configured to perform semantic analysis on the candidate texts, so as to determine the associated texts contained in the candidate texts that are associated with emergencies;
  • the second determination module is configured to perform entity extraction on the associated text, so as to determine a first entity set corresponding to the associated text;
  • a third determining module configured to determine a first similarity between the first entity set and a second entity set corresponding to each emergency event in the emergency event data set;
  • a fourth determination module configured to determine, when the first similarity between the first entity set and any second entity set is greater than a first threshold, that the associated text is the associated text of the first emergency event corresponding to that second entity set.
  • the embodiment of the third aspect of the present disclosure proposes an electronic device, including: a memory, a processor, and a computer program stored in the memory and operable on the processor.
  • When the processor executes the program, it implements the emergency monitoring method proposed in the embodiment of the first aspect.
  • the embodiment of the fourth aspect of the present disclosure provides a computer-readable storage medium storing a computer program.
  • When the computer program is executed by a processor, the emergency monitoring method proposed in the embodiment of the first aspect of the present disclosure is implemented.
  • the embodiment of the fifth aspect of the present disclosure provides a computer program product, including a computer program.
  • When the computer program is executed by a processor, the emergency monitoring method proposed in the embodiment of the first aspect of the present disclosure is implemented.
  • the emergency monitoring method, device, electronic equipment, and storage medium provided by the present disclosure have the following beneficial effects:
  • First, based on the reference words contained in the thesaurus, the network information is traversed to extract candidate texts containing the reference words. The candidate texts are then semantically analyzed to determine the associated texts that are associated with emergencies, and entity extraction is performed on each associated text to determine its first entity set. Next, the first similarity between the first entity set and the second entity set corresponding to each emergency event in the emergency data set is determined. Finally, when the first similarity between the first entity set and any second entity set is greater than the first threshold, the associated text is determined to be the associated text of the first emergency event corresponding to that second entity set. Therefore, analyzing and sorting the emergency texts contained in the network information not only mines emergency-related information from massive network information in a timely and accurate manner, but also clusters the texts that describe the same emergency, so that new emergencies can be discovered in time.
  • FIG. 1 is a schematic flowchart of a monitoring method for an emergency provided by an embodiment of the present disclosure
  • FIG. 2 is a schematic flowchart of a method for monitoring emergencies provided by another embodiment of the present disclosure
  • FIG. 3 is a schematic diagram of semantic analysis of candidate texts provided by an embodiment of the present disclosure.
  • FIG. 4 is a schematic flowchart of a method for monitoring an emergency provided by another embodiment of the present disclosure.
  • FIG. 5 is a schematic flowchart of a method for monitoring an emergency provided by another embodiment of the present disclosure.
  • FIG. 6 is a schematic structural diagram of an emergency monitoring device provided by an embodiment of the present disclosure.
  • FIG. 7 shows a block diagram of an exemplary electronic device suitable for use in implementing embodiments of the present disclosure.
  • FIG. 1 is a schematic flowchart of a method for monitoring an emergency provided by an embodiment of the present disclosure.
  • the embodiment of the present disclosure is illustrated by taking as an example the case in which the emergency monitoring method is configured in an emergency monitoring device.
  • the emergency monitoring device can be applied to any electronic device, so that the electronic device can perform the emergency monitoring function.
  • the electronic device can be a personal computer (Personal Computer, referred to as PC), cloud device, mobile device, etc.
  • the mobile device can be a mobile phone, tablet computer, personal digital assistant, wearable device, vehicle-mounted device, or another hardware device with an operating system, a touch screen, and/or a display.
  • the emergency monitoring method may include the following steps 101 to 105 .
  • Step 101: based on the reference words contained in the thesaurus, traverse the network information to extract candidate texts containing the reference words.
  • the thesaurus may contain reference words corresponding to each type of emergency, and the reference words contained in the thesaurus may be predetermined.
  • emergencies may include natural disasters, accident disasters, public health events, and social security events.
  • natural disasters may also include: heavy rain, tornado, earthquake, etc.
  • Accident disasters can include: car accidents, fires, etc.
  • Public health events can include infectious diseases, food poisoning, etc.
  • Social security incidents may include: terrorist attacks, large-scale crowd gatherings, and the like.
  • each type of emergency has its corresponding reference words.
  • For example, a rainstorm is usually accompanied by strong wind, so "strong wind" may be a reference word corresponding to the rainstorm event.
  • the candidate text containing "strong wind" extracted from network information may be "a yellow warning for strong wind was issued in a certain area today".
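  • The keyword filtering in step 101 can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name `extract_candidate_texts`, the dictionary layout of the thesaurus, and the sample posts are all assumptions, and plain substring matching stands in for whatever traversal of network information is actually used.

```python
from typing import Dict, Iterable, List, Set


def extract_candidate_texts(
    texts: Iterable[str], thesaurus: Dict[str, Set[str]]
) -> List[dict]:
    """Keep only the texts that contain at least one reference word."""
    candidates = []
    for text in texts:
        lowered = text.lower()
        # For every event type, collect the reference words found in the text.
        hits = {
            event_type: sorted(w for w in words if w.lower() in lowered)
            for event_type, words in thesaurus.items()
        }
        hits = {etype: found for etype, found in hits.items() if found}
        if hits:
            candidates.append({"text": text, "matched": hits})
    return candidates


# Hypothetical thesaurus: event type -> reference words.
thesaurus = {
    "rainstorm": {"strong wind", "heavy rain"},
    "fire": {"blaze", "smoke"},
}
posts = [
    "A yellow warning for strong wind was issued in a certain area today",
    "The new cafe downtown opens next week",
]
candidates = extract_candidate_texts(posts, thesaurus)
```

  • A candidate found this way is only a keyword hit; whether it actually describes an emergency is decided by the semantic analysis in step 102.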
  • Step 102: perform semantic analysis on the candidate texts to determine the associated texts contained in the candidate texts that are associated with emergencies.
  • emergencies are unconventional events that do not occur frequently. Therefore, even if an acquired candidate text contains reference words, it is not necessarily related to an emergency. Thus, after the candidate text is acquired, semantic analysis can be performed on it to determine whether it describes an emergency.
  • an attention-enhanced bidirectional long-short-term memory model (BERT-Att-BiLSTM model) can be used to perform semantic analysis on the candidate texts, so as to determine the associated texts contained in the candidate texts that are associated with the emergency.
  • alternatively, a latent Dirichlet allocation (LDA) model can be used to perform semantic analysis on the candidate texts, so as to determine the associated texts contained in the candidate texts that are associated with emergencies.
  • any other suitable manner may also be used to perform semantic analysis on the candidate texts, so as to determine the associated texts contained in the candidate texts that are associated with emergencies.
  • Step 103: perform entity extraction on the associated text to determine a first entity set corresponding to the associated text.
  • the first entity set may include a first event type, a first geographic location, and a first occurrence time.
  • the first event type may be the type of the emergency described in the associated text.
  • the first event type corresponding to the associated text may be rainstorm, tornado, and the like.
  • the first geographic location may be the geographic location where the emergency event described in the associated text occurs.
  • the first geographic location corresponding to the associated text may be as detailed as XX County, XX City, XX Province, or as coarse as XX Province.
  • the first occurrence time may be the occurrence time of the emergency described in the associated text.
  • the first occurrence time corresponding to the associated text may be a date such as October 20, 2020, a year such as 2008, and so on.
  • If the associated text does not contain the occurrence time of the emergency, the first occurrence time in the first entity set corresponding to the associated text is empty; likewise, if the associated text does not contain the geographic location of the emergency, the first geographic location in the first entity set is empty.
  • Step 104: determine a first similarity between the first entity set and the second entity set corresponding to each emergency event in the emergency event data set.
  • the second entity set corresponding to each emergency event may include a second event type, a second geographic location, and a second occurrence time.
  • the first similarity between the first entity set and the second entity set corresponding to each emergency event in the emergency event data set may be determined based on the second similarity between the first event type and the second event type in the second entity set, the third similarity between the first geographic location and the second geographic location in the second entity set, and the fourth similarity between the first occurrence time and the second occurrence time in the second entity set.
  • alternatively, the Euclidean distance formula or the Manhattan distance formula may be used to calculate the distance between the first entity set and the second entity set, and the first similarity between the two entity sets may be determined from that distance; or the cosine similarity between the first entity set and the second entity set may be calculated and used as the first similarity between the first entity set and the second entity set.
  • Step 105: when the first similarity between the first entity set and any second entity set is greater than a first threshold, determine that the associated text is the associated text of the first emergency event corresponding to that second entity set.
  • the associated text can then be associated with the first emergency event corresponding to that second entity set, that is, the associated text can be stored in the collection corresponding to that first emergency event in the emergency data set.
  • When every first similarity is less than or equal to the first threshold, the emergency data set does not contain the emergency described by the associated text. In that case, the associated text and the first entity set can be associated and stored in the emergency data set as a new emergency. The occurrence of new emergencies can therefore be monitored accurately, so that emergency management personnel learn of new emergencies in a timely manner and can take emergency measures promptly according to the first event type, the first geographic location, and the first occurrence time in the first entity set corresponding to the emergency.
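  • The decision in step 105 can be sketched as follows. The function name `assign_text`, the dictionary layout of the events, the threshold value 0.6, and the toy similarity function are assumptions made for illustration. The source says the text is attached when the similarity to any second entity set exceeds the threshold; picking the best-scoring set among those is a design choice of this sketch.

```python
from typing import Callable, List


def assign_text(
    text: str,
    first_entities: dict,
    events: List[dict],
    similarity: Callable[[dict, dict], float],
    first_threshold: float = 0.6,  # hypothetical value
) -> int:
    """Attach the text to a matching event, or store it as a new event.

    Returns the index of the event that now holds the text.
    """
    best_index, best_score = -1, first_threshold
    for index, event in enumerate(events):
        score = similarity(first_entities, event["entities"])
        if score > best_score:  # strictly greater than the first threshold
            best_index, best_score = index, score
    if best_index >= 0:
        events[best_index]["texts"].append(text)
        return best_index
    # No existing event is similar enough: record a new emergency.
    events.append({"entities": dict(first_entities), "texts": [text]})
    return len(events) - 1


def toy_similarity(a: dict, b: dict) -> float:
    """Fraction of identical fields; a stand-in for the first similarity."""
    keys = ("type", "place", "time")
    return sum(a.get(k) == b.get(k) for k in keys) / len(keys)


events = [
    {
        "entities": {"type": "rainstorm", "place": "XX City", "time": "2020-10-20"},
        "texts": ["t0"],
    }
]
same = {"type": "rainstorm", "place": "XX City", "time": "2020-10-20"}
other = {"type": "fire", "place": "YY City", "time": "2020-10-21"}
i1 = assign_text("t1", same, events, toy_similarity)
i2 = assign_text("t2", other, events, toy_similarity)
```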
  • In the embodiment of the present disclosure, the network information is first traversed, based on the reference words contained in the thesaurus, to extract candidate texts containing the reference words. The candidate texts are then semantically analyzed to determine the associated texts that are associated with emergencies, and entity extraction is performed on each associated text to determine its first entity set. Next, the first similarity between the first entity set and the second entity set corresponding to each emergency event in the emergency data set is determined. Finally, when the first similarity between the first entity set and any second entity set is greater than the first threshold, the associated text is determined to be the associated text of the first emergency event corresponding to that second entity set. Therefore, analyzing and sorting the emergency texts contained in the network information not only mines emergency-related information from massive network information in a timely and accurate manner, but also clusters the texts that describe the same emergency, so that new emergencies can be discovered in time.
  • FIG. 2 is a schematic flowchart of a method for monitoring an emergency provided by an embodiment of the present disclosure. As shown in FIG. 2 , the method for monitoring an emergency may include the following steps 201 to 209 .
  • Step 201: based on the reference words contained in the thesaurus, traverse the network information to extract candidate texts containing the reference words.
  • to build the thesaurus, multiple texts related to each type of emergency can first be obtained at random and merged into one text, and word segmentation is then performed on the merged text to obtain the words it contains.
  • Word frequency statistics are then performed on each word, that is, the number of times each word appears in the merged text is counted, and the words whose word frequency is greater than a preset threshold are used as the reference words corresponding to this type of emergency.
  • the weight corresponding to each reference word may be determined according to the word frequency corresponding to each reference word.
  • the weight is positively correlated with the word frequency, that is, the greater the word frequency corresponding to the reference word, the greater the corresponding weight.
  • the reference words and corresponding weights may also be adjusted according to the candidate texts.
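  • The thesaurus construction described above can be sketched as follows. The function name `build_reference_words` and the sample texts are hypothetical. Whitespace splitting is a stand-in for a real word segmenter, and normalising each frequency by the total is just one way to make the weight increase with word frequency; the source only requires a positive correlation.

```python
from collections import Counter
from typing import Dict, Iterable


def build_reference_words(
    texts: Iterable[str], min_frequency: int
) -> Dict[str, float]:
    """Return reference words for one event type, mapped to their weights."""
    merged = " ".join(texts)        # merge the sample texts into one text
    words = merged.lower().split()  # stand-in for real word segmentation
    counts = Counter(words)         # word frequency statistics
    selected = {w: c for w, c in counts.items() if c > min_frequency}
    total = sum(selected.values())
    # Weight grows with frequency; dividing by the total is an assumption.
    return {w: c / total for w, c in selected.items()}


rainstorm_texts = [
    "rainstorm warning strong wind",
    "rainstorm flooding in the city",
    "wind and rainstorm tonight",
]
weights = build_reference_words(rainstorm_texts, min_frequency=1)
```

  • In practice, common function words would also pass a pure frequency threshold, so a stop-word filter would likely be needed; that step is not described in the source and is omitted here.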
  • UTF-8 (8-bit Universal Character Set/Unicode Transformation Format) is a variable-length character encoding for Unicode.
  • Step 202: perform semantic analysis on the candidate texts to determine the associated texts contained in the candidate texts that are associated with emergencies.
  • FIG. 3 is a schematic diagram of a semantic analysis of candidate texts provided by an embodiment of the present disclosure.
  • the attention-enhanced bidirectional long short-term memory model (BERT-Att-BiLSTM model) can be used to perform semantic analysis on the candidate texts to determine the associated texts that are associated with emergencies. That is, the candidate text is first input into BERT to obtain its semantic representation, and the semantic representation is then input into Att-BiLSTM for further semantic analysis to judge whether the candidate text is associated with an emergency.
  • BERT stands for Bidirectional Encoder Representations from Transformers.
  • the pre-trained BERT Chinese base model (BERT-Base-Chinese) is used here, which can include a 12-layer transformer structure with 12 self-attention heads and a vector dimension of 768.
  • the number of transformer layers, the number of self-attention heads, and the vector dimension can be selected according to actual needs.
  • the bidirectional long short-term memory (Bi-directional Long Short-Term Memory, BiLSTM) layer includes a forward LSTM and a reverse LSTM. Its output is expressed as:
  • V is the weight
  • V^T is the transpose of V
  • ω_s is the weight corresponding to the sentence s in the attention mechanism
  • b_s is the bias corresponding to the sentence s in the attention mechanism
  • tanh(·) is the hyperbolic tangent function
  • T is the number of characters
  • F is the semantic feature of the candidate text.
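  • One common attention-pooling formulation that is consistent with the symbols defined above is sketched here. The hidden state $h_t$, the score $u_t$, and the attention weight $\alpha_t$ are notation assumed for this sketch, and the attention weight for sentence s is written as $\omega_s$; the exact form used in the disclosure may differ.

```latex
\begin{aligned}
h_t &= [\overrightarrow{h_t};\ \overleftarrow{h_t}] \\
u_t &= V^{T}\tanh(\omega_s h_t + b_s) \\
\alpha_t &= \frac{\exp(u_t)}{\sum_{j=1}^{T}\exp(u_j)} \\
F &= \sum_{t=1}^{T}\alpha_t h_t
\end{aligned}
```

  • Under this reading, the forward and reverse LSTM states are concatenated for each of the T characters, scored, normalised into attention weights, and summed into the semantic feature F that is passed to the Softmax layer.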
  • the logistic regression (Softmax) layer is used to generate the conditional probability on the class space according to the semantic features of the candidate text, so as to judge whether the candidate text is the relevant text of the emergency.
  • Step 203: perform entity extraction on the associated text to determine a first entity set corresponding to the associated text.
  • the reference word set corresponding to each event type can first be obtained from the thesaurus. Then, according to the number of times each reference word in the reference word set of each event type occurs in the associated text, and the weight of each reference word, the association probability value between the associated text and each event type is determined. The first event type corresponding to the associated text is then determined according to each association probability value and the second threshold corresponding to each event type.
  • the calculation formula of the association probability value can be:
  • P_e is the association probability value between the associated text and event type e
  • C_k is the number of occurrences of the reference word k in the associated text
  • W_k is the weight corresponding to the reference word k.
  • When P_e is greater than the second threshold corresponding to event type e, the first event type corresponding to the associated text includes event type e.
  • For example, the associated text may correspond to both the rainstorm event type and the typhoon event type.
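  • The event type determination can be sketched as follows, assuming the association probability value takes the weighted-sum form P_e = Σ_k C_k · W_k over the reference words k of event type e. That form is inferred from the symbol definitions above and is an assumption, as are the function names, the sample thesaurus, and the threshold values.

```python
from typing import Dict, List


def association_scores(
    text: str, thesaurus: Dict[str, Dict[str, float]]
) -> Dict[str, float]:
    """Compute P_e = sum_k C_k * W_k for every event type e (assumed form)."""
    lowered = text.lower()
    return {
        event_type: sum(
            lowered.count(word.lower()) * weight  # C_k * W_k
            for word, weight in words.items()
        )
        for event_type, words in thesaurus.items()
    }


def first_event_types(
    scores: Dict[str, float], second_thresholds: Dict[str, float]
) -> List[str]:
    """Keep every event type whose score exceeds its own second threshold."""
    return [e for e, p in scores.items() if p > second_thresholds[e]]


# Hypothetical thesaurus: event type -> {reference word: weight}.
thesaurus = {
    "rainstorm": {"heavy rain": 0.6, "strong wind": 0.4},
    "typhoon": {"typhoon": 0.7, "strong wind": 0.3},
    "fire": {"smoke": 0.5, "flames": 0.5},
}
text = "Typhoon brings heavy rain and strong wind; heavy rain expected overnight"
scores = association_scores(text, thesaurus)
types = first_event_types(
    scores, {"rainstorm": 0.5, "typhoon": 0.5, "fire": 0.5}
)
```

  • Because each event type is tested against its own threshold, one text can be assigned several event types, as in the rainstorm and typhoon example.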
  • location extraction may be performed on the associated text according to the location entity contained in the location entity library, so as to determine the first geographic location corresponding to the associated text.
  • the associated text may contain a specific location, such as XX County, XX City, XX Province. Alternatively, it may not contain a specific location but mention a building that represents a specific location, in which case the location of the emergency described by the associated text can be determined from the building.
  • specific location entities may first be extracted from the associated text according to the first location entity library, so as to obtain the specific occurrence location of the emergency described by the associated text. If no specific location is extracted, building entities are further extracted from the associated text according to the second location entity library, and the specific occurrence location of the emergency is then obtained from the geographic location of the building entity.
  • the first location entity library includes specific geographic locations
  • the second location entity library includes buildings that can represent geographic locations.
  • after the specific occurrence location of the emergency described in the associated text is determined, it can be expressed in a structured manner, that is, in the form of XX Village/Town, XX County/District, XX City, XX Province.
  • if the associated text does not contain location information, the associated text is deleted, and the emergency event associated with it is no longer determined.
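  • The two-stage location lookup can be sketched as follows. Both libraries are modelled as plain dictionaries that map a surface name to a structured location; the function name `extract_location`, the library contents, and the place names are hypothetical, and substring matching stands in for the actual entity extraction.

```python
from typing import Dict, Optional


def extract_location(
    text: str,
    place_library: Dict[str, str],
    building_library: Dict[str, str],
) -> Optional[str]:
    """Return a structured location, or None if the text has no location."""
    # Try specific place names first, then buildings that imply a place.
    for library in (place_library, building_library):
        # Longer names first, so the most specific entry wins.
        for name in sorted(library, key=len, reverse=True):
            if name in text:
                return library[name]
    return None


# Hypothetical libraries: surface name -> structured location.
place_library = {
    "Riverside County": "Example Province/Example City/Riverside County",
}
building_library = {
    "Central Railway Station": "Example Province/Example City/Station District",
}

loc_place = extract_location(
    "Flooding reported in Riverside County", place_library, building_library
)
loc_building = extract_location(
    "Smoke seen near Central Railway Station", place_library, building_library
)
loc_none = extract_location(
    "Something happened somewhere", place_library, building_library
)
```

  • A `None` result corresponds to the case in which the associated text carries no location information and is dropped.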
  • time extraction may be performed on the associated text based on a preset algorithm, so as to determine the first occurrence time corresponding to the associated text.
  • any desirable manner may be used to perform time extraction on the associated text, so as to determine the first occurrence time corresponding to the associated text.
  • regular expressions can be used to extract the time information contained in the associated text.
  • if the time information extracted from the associated text is an absolute time, such as XX:XX on XX XX, XXXX, then the absolute time can be directly used as the first occurrence time corresponding to the associated text.
  • if the time information extracted from the associated text is information such as yesterday, early morning, or three days ago, then the first occurrence time corresponding to the associated text is determined according to the release time of the associated text, and the first occurrence time in this case is a relative time. For example, if the release time of the associated text is March 5, 2020, and the time information contained in the associated text is yesterday, then the first occurrence time corresponding to the associated text is March 4, 2020.
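  • Time extraction with regular expressions can be sketched as follows. The date pattern (`YYYY-MM-DD` with an optional `HH:MM`), the English relative phrases, and the function name `extract_occurrence_time` are assumptions made for illustration; real texts would need patterns for the date formats and languages actually encountered.

```python
import re
from datetime import datetime, timedelta
from typing import Optional, Tuple

# Assumed absolute format: 2016-10-22 or 2016-10-22 10:00.
ABSOLUTE = re.compile(r"(\d{4})-(\d{2})-(\d{2})(?: (\d{2}):(\d{2}))?")
# Assumed relative phrases, mapped to a number of days before release.
RELATIVE_DAYS = {"today": 0, "yesterday": 1, "three days ago": 3}


def extract_occurrence_time(
    text: str, release_time: datetime
) -> Optional[Tuple[datetime, bool]]:
    """Return (occurrence_time, is_absolute), or None if no time is found."""
    match = ABSOLUTE.search(text)
    if match:
        year, month, day = (int(match.group(i)) for i in (1, 2, 3))
        hour = int(match.group(4) or 0)
        minute = int(match.group(5) or 0)
        return datetime(year, month, day, hour, minute), True
    lowered = text.lower()
    for phrase, days in RELATIVE_DAYS.items():
        if phrase in lowered:
            # Resolve the relative phrase against the release time.
            day = release_time - timedelta(days=days)
            return datetime(day.year, day.month, day.day), False
    return None


release = datetime(2020, 3, 5, 9, 30)
relative = extract_occurrence_time("A fire broke out yesterday", release)
absolute = extract_occurrence_time("Earthquake at 2016-10-22 10:00", release)
missing = extract_occurrence_time("A fire broke out", release)
```

  • The returned flag records whether the time was absolute or derived from the release time, which matters later when an existing entity set is updated.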
  • Step 204: determine the second similarity between the first event type and the second event type in the second entity set, the third similarity between the first geographic location and the second geographic location in the second entity set, and the fourth similarity between the first occurrence time and the second occurrence time in the second entity set.
  • if the associated text describes a first emergency event, the first event type corresponding to the associated text should be the same as the second event type corresponding to that first emergency event. Therefore, if the first event type is the same as the second event type, the second similarity is 1; if the first event type is different from the second event type, the second similarity is 0.
  • if the first event type is the same as the second event type, the difference between the first geographic location and the second geographic location is small, and the difference between the first occurrence time and the second occurrence time is also small, then the associated text may describe the first emergency event corresponding to the second entity set. Therefore, the third similarity between the first geographic location and the second geographic location in the second entity set, and the fourth similarity between the first occurrence time and the second occurrence time in the second entity set, need to be calculated only when the first event type is the same as the second event type, which reduces the amount of calculation.
  • the third similarity between the first geographic location and the second geographic location in the second entity set may be determined according to the level of the first geographic location and the level of the second geographic location. Among them, the more detailed the geographic location, the higher the level. For example, if the first geographic location includes XX village/town in XX county/district of XX city in XX province, its level is the highest.
  • the third similarity may be 0.8. If the first geographic location and the second geographic location are in the same county/district, and the first geographic location or the second geographic location lacks village/town information, the third similarity may be 0.6. If the first geographic location and the second geographic location are in the same city, and the first geographic location or the second geographic location lacks county/district information, the third similarity may be 0.4. If the first geographic location and the second geographic location are in the same province, and there is no urban area information in the first geographic location or the second geographic location, the third similarity may be 0.2. Otherwise, the third similarity may be 0.
  • the fourth similarity may be determined according to the first time difference between the first occurrence time and the second occurrence time.
  • the fourth similarity may be 0.9. If the first time difference is less than 1 hour, the fourth similarity may be 0.7. If the first time difference is less than 1 day, the fourth similarity may be 0.5. If the first time difference is less than 3 days, the fourth similarity may be 0.3. Otherwise, the fourth similarity can be 0.
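  • The tiered third and fourth similarities can be sketched as follows. Locations are modelled as (province, city, county/district, village/town) tuples with `None` for missing levels, which is an assumed representation. The conditions for the top tiers (0.8 for location, 0.9 for time) are not stated above, so this sketch assumes they apply to a match at every level and to identical times respectively; treating a conflict at any level as the "otherwise" case is also an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

Location = Tuple[Optional[str], Optional[str], Optional[str], Optional[str]]


def geo_similarity(first: Location, second: Location) -> float:
    """Third similarity from (province, city, county, town) tuples."""
    tier_scores = [0.2, 0.4, 0.6, 0.8]  # deepest level on which both agree
    score = 0.0
    for level, (a, b) in enumerate(zip(first, second)):
        if a is None or b is None:
            break              # one side lacks finer information
        if a != b:
            return 0.0         # conflicting locations: the "otherwise" case
        score = tier_scores[level]
    return score


def time_similarity(first: datetime, second: datetime) -> float:
    """Fourth similarity from the first time difference."""
    diff = abs(first - second)
    if diff == timedelta(0):
        return 0.9             # assumed condition for the top tier
    if diff < timedelta(hours=1):
        return 0.7
    if diff < timedelta(days=1):
        return 0.5
    if diff < timedelta(days=3):
        return 0.3
    return 0.0


s3 = geo_similarity(("P", "C", "D", "T"), ("P", "C", "D", None))
s4 = time_similarity(datetime(2020, 3, 5, 10, 0), datetime(2020, 3, 5, 10, 30))
```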
  • Step 205: according to the second similarity, the third similarity, and the fourth similarity, determine the first similarity between the first entity set and the second entity set.
  • if the associated text describes a first emergency event, the first event type corresponding to the associated text should be exactly the same as the second event type corresponding to that first emergency event. Therefore, the second similarity plays a decisive role in determining the first similarity.
  • the calculation formula of the first similarity can be:
  • S_1 is the first similarity
  • S_2 is the second similarity
  • S_3 is the third similarity
  • S_4 is the fourth similarity
  • a is the weight corresponding to the third similarity
  • b is the weight corresponding to the fourth similarity.
  • the value of a may be 0.5, and the value of b may be 0.5.
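  • A form of the first similarity that is consistent with the symbol definitions and with the decisive role of the second similarity is S_1 = S_2 · (a · S_3 + b · S_4). This form is an assumption of the sketch below, not a formula confirmed by the text; the function name is also hypothetical.

```python
def first_similarity(
    s2: float, s3: float, s4: float, a: float = 0.5, b: float = 0.5
) -> float:
    """Assumed form: S1 = S2 * (a * S3 + b * S4).

    Because S2 is 0 or 1, a mismatch in event type forces S1 to 0.
    """
    return s2 * (a * s3 + b * s4)


same_type = first_similarity(1, 0.6, 0.7)
different_type = first_similarity(0, 0.8, 0.9)
```

  • With a = b = 0.5, location and time contribute equally, and the event type acts as a gate on the whole score.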
  • Step 206: if the first similarity between the first entity set and any second entity set is greater than a first threshold, determine that the associated text is the associated text of the first emergency event corresponding to that second entity set.
  • Step 207: update that second entity set according to the first entity set, so as to obtain the updated second entity set.
  • the second occurrence time and the second geographic location in the second entity set corresponding to each first emergency event in the emergency data set may not be particularly detailed. If the newly detected associated text contains a more specific first occurrence time and first geographic location, the second entity set corresponding to the first emergency event described in the associated text may be updated to make the information of that first emergency event more accurate.
  • If the level of the first geographic location is higher than the level of the second geographic location in that second entity set, the second geographic location in that second entity set is updated to the first geographic location to obtain the updated second entity set.
  • For example, if the first geographic location corresponding to the associated text is XX Town, XX County, XX City, XX Province, and the second geographic location contained in that second entity set is XX Province, then the level of the first geographic location is higher than the level of the second geographic location, and the second geographic location in the second entity set may be updated to XX Town, XX County, XX City, XX Province.
  • If the first occurrence time is an absolute time and the second occurrence time in that second entity set is a relative time, the second occurrence time in that second entity set is updated to the first occurrence time to obtain the updated second entity set.
  • For example, if the first occurrence time corresponding to the associated text is the absolute time 10:00 on October 22, 2016, and the second occurrence time included in that second entity set is a relative time estimated from time information such as "early morning" or "yesterday", the second occurrence time in the second entity set may be updated to 10:00 on October 22, 2016.
  • In other cases, the second entity set does not need to be updated according to the first entity set.
  • After the associated text is determined, the second entity set corresponding to the first emergency it describes is updated according to the first entity set corresponding to the associated text, so as to obtain the updated second entity set. This makes the occurrence time and location of each emergency in the emergency data set more accurate, so that relevant staff can take handling measures according to the specific information of the emergency.
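  • The update in step 207 can be sketched as follows. The `EntitySet` class, its field names, and the use of tuple length as the location level are assumptions made for illustration: a longer tuple means a more detailed, higher-level location.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import Optional, Tuple


@dataclass(frozen=True)
class EntitySet:
    event_type: str
    location: Tuple[str, ...]       # province first; longer = more detailed
    time: Optional[datetime] = None
    time_is_absolute: bool = False  # False: derived from a relative phrase


def update_entity_set(second: EntitySet, first: EntitySet) -> EntitySet:
    """Refine the stored (second) entity set with a newly matched (first) one."""
    updated = second
    # A more detailed location replaces a coarser one.
    if len(first.location) > len(second.location):
        updated = replace(updated, location=first.location)
    # An absolute time replaces a relative one.
    if (
        first.time is not None
        and first.time_is_absolute
        and not second.time_is_absolute
    ):
        updated = replace(updated, time=first.time, time_is_absolute=True)
    return updated


stored = EntitySet("rainstorm", ("XX Province",), datetime(2016, 10, 22), False)
incoming = EntitySet(
    "rainstorm",
    ("XX Province", "XX City", "XX County", "XX Town"),
    datetime(2016, 10, 22, 10, 0),
    True,
)
updated = update_entity_set(stored, incoming)
unchanged = update_entity_set(incoming, stored)
```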
  • Step 208: determine a fifth similarity between the updated second entity set and each of the remaining second entity sets.
  • For the specific manner of determining the fifth similarity in step 208, reference may be made to the description of determining the first similarity in the embodiments of the present disclosure, which is not repeated here.
  • Step 209: in response to any fifth similarity being greater than the first threshold, associate the emergency corresponding to the second entity set corresponding to that fifth similarity with the emergency corresponding to the updated second entity set.
  • If any fifth similarity is greater than the first threshold, it means that the emergency event data set contains another first emergency that is identical to the first emergency described in the associated text, so the two collections describing the same emergency are merged.
  • After that, the second entity set corresponding to that fifth similarity may also be updated according to the updated second entity set, that is, steps 207, 208, and 209 are repeated until the entity sets corresponding to the two emergencies are identical.
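  • Steps 208 and 209 can be sketched as a single merging pass, shown below. The function name `merge_duplicates`, the dictionary layout of the events, the toy similarity function, and the threshold 0.6 are hypothetical. The source repeats steps 207 to 209 until the entity sets are identical; this sketch performs one pass only and does not re-update entity sets.

```python
from typing import Callable, List


def merge_duplicates(
    events: List[dict],
    updated_index: int,
    similarity: Callable[[dict, dict], float],
    first_threshold: float,
) -> List[dict]:
    """Merge every event that now matches the updated event into it."""
    target = events[updated_index]
    kept = []
    for index, event in enumerate(events):
        if index == updated_index:
            kept.append(target)
            continue
        # Fifth similarity: updated entity set against each remaining one.
        fifth = similarity(target["entities"], event["entities"])
        if fifth > first_threshold:
            target["texts"].extend(event["texts"])  # same emergency: merge
        else:
            kept.append(event)
    return kept


def toy_similarity(a: dict, b: dict) -> float:
    """Fraction of identical fields; a stand-in for the real similarity."""
    keys = ("type", "place")
    return sum(a.get(k) == b.get(k) for k in keys) / len(keys)


events = [
    {"entities": {"type": "fire", "place": "A"}, "texts": ["t0"]},
    {"entities": {"type": "fire", "place": "A"}, "texts": ["t1"]},
    {"entities": {"type": "flood", "place": "B"}, "texts": ["t2"]},
]
merged = merge_duplicates(events, 0, toy_similarity, 0.6)
```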
  • In the embodiment of the present disclosure, the network information is first traversed, based on the reference words contained in the thesaurus, to extract candidate texts containing the reference words, and the candidate texts are then semantically analyzed to determine the associated texts that are associated with emergencies.
  • Next, entity extraction is performed on the associated text to determine the first entity set corresponding to the associated text, and the first similarity between the first entity set and the second entity set corresponding to each first emergency event is obtained. When the first similarity between the first entity set and any second entity set is greater than the first threshold, the associated text is determined to be the associated text of the first emergency event corresponding to that second entity set. That second entity set is then updated according to the first entity set to obtain the updated second entity set, and the fifth similarity between the updated second entity set and each of the remaining second entity sets is determined. In response to any fifth similarity being greater than the first threshold, the emergency corresponding to the second entity set corresponding to that fifth similarity is associated with the emergency corresponding to the updated second entity set.
  • FIG. 4 is a schematic flowchart of a method for monitoring an emergency provided by another embodiment of the present disclosure. As shown in FIG. 4 , the method for monitoring an emergency may include the following steps 401 to 406 .
  • Step 401 based on the reference words contained in the thesaurus, traverse the network information to extract candidate texts containing the reference words.
  • Step 402 perform semantic analysis on the candidate texts to determine the associated texts contained in the candidate texts that are associated with the emergency events.
  • Step 403 performing entity extraction on the associated text to determine a first entity set corresponding to the associated text.
  • Step 404: in response to the first entity set not containing the first occurrence time, obtain, according to the second occurrence time corresponding to each second emergency in the emergency event data set, multiple second emergencies within a preset period whose event type is the same as the first event type contained in the first entity set.
  • The first entity set not containing the first occurrence time, that is, the first occurrence time being empty, means that the associated text does not contain the occurrence time of the emergency; in other words, the first entity set corresponding to the associated text contains only the first event type and the first geographic location.
  • When the associated text does not contain the occurrence time of the emergency, multiple second emergencies that are close in time to the release time of the associated text and of the same event type can be obtained from the emergencies in the emergency event data set, according to the type of emergency described in the associated text (that is, the first event type in the first entity set) and the release time of the associated text. It is then determined whether the associated text describes one of these second emergencies.
  • the preset time period may be 5 days, 10 days and so on.
  • For example, if the release time of the associated text is September 15, 2021, the event type corresponding to the associated text is rainstorm, and the preset time period is 5 days, then multiple second emergencies whose second occurrence time falls in September 2021 within the preset period and whose second event type is rainstorm can be obtained from the emergency event data set.
  • When the associated text does not contain the occurrence time of the emergency, multiple second emergencies that are close to the release time of the associated text, have the same event type, and are close in geographic location can also be obtained from the emergencies in the emergency event data set, according to the first event type corresponding to the associated text, the first geographic location, and the release time of the associated text.
  • For example, if the release time of the associated text is August 10, 2021, the first event type corresponding to the associated text is rainstorm, the first geographic location is XX City, XX Province, and the preset time period is 3 days, then multiple second emergencies whose second occurrence time is from August 7, 2021 to August 10, 2021, whose second geographic location is XX City, XX Province, or XX Province, and whose second event type is rainstorm can be obtained from the emergency event data set.
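  • The candidate selection in step 404 can be sketched as a simple filter. This is a minimal sketch under stated assumptions: the `Event` record, its field names, and the substring test used as a stand-in for "close in geographic location" are all hypothetical and are not specified by the disclosure.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Event:
    """Hypothetical record for one emergency in the data set."""
    event_type: str
    location: str
    occurred_on: date


def candidate_events(events, event_type, release_date, window_days, location=None):
    """Return events of the same type that occurred within `window_days`
    before the release date of the associated text.

    If `location` is given, also require that one location string contains
    the other (a crude, assumed stand-in for geographic closeness).
    """
    earliest = release_date - timedelta(days=window_days)
    selected = []
    for ev in events:
        if ev.event_type != event_type:
            continue
        if not (earliest <= ev.occurred_on <= release_date):
            continue
        if location is not None and not (
            ev.location in location or location in ev.location
        ):
            continue
        selected.append(ev)
    return selected
```

  • With a release date of August 10, 2021 and a 3-day window, the filter keeps events dated August 7 to August 10, 2021, which matches the example above.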
  • Step 405: obtain the total number of texts associated with each second emergency, the number of identical characters between the associated text and the texts associated with each second emergency, and the second time difference between the release time of the associated text and the second occurrence time corresponding to each second emergency.
  • each emergency event in the emergency event data set may be associated with multiple texts describing the emergency event.
  • A third-threshold number of the texts associated with the emergency can be randomly selected for calculating the number of identical characters between the texts associated with the second emergency and the associated text, which reduces the amount of computation.
  • the third threshold may be 60, 80 and so on.
  • For example, 60 texts can be randomly selected from the 200 texts associated with the second emergency and used to calculate the number of identical characters with the associated text.
  • Specifically, the 60 texts associated with the second emergency are merged into one text; word segmentation is then performed on the merged text and on the associated text; each character in the associated text is checked for whether it appears in the merged text; and finally the total number of characters of the associated text that appear in the merged text is determined, which is the number of identical characters between the associated text and the texts associated with the second emergency.
  • word segmentation processing may be performed on the merged text and associated text in any preferred manner.
  • a Jieba tokenizer can be used to perform word segmentation processing on the merged text and associated text respectively.
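  • The counting procedure above can be sketched as follows. The function name, parameters, and the default character-level tokenizer are assumptions made for illustration; a word segmenter such as Jieba could be passed as `tokenize` instead, which is an assumed way of wiring it in and is not shown in the disclosure.

```python
import random


def identical_token_count(associated_text, event_texts, sample_size=60,
                          tokenize=list, seed=None):
    """Count tokens of `associated_text` that also occur in a sample of
    the texts already associated with an emergency.

    `tokenize` defaults to `list`, i.e. one token per character.
    """
    rng = random.Random(seed)
    if len(event_texts) > sample_size:
        # Random subset to bound the amount of computation.
        sample = rng.sample(event_texts, sample_size)
    else:
        sample = list(event_texts)
    merged_tokens = set(tokenize("".join(sample)))
    return sum(1 for token in tokenize(associated_text) if token in merged_tokens)
```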
  • Step 406: according to the total number of texts associated with the second emergency, the number of identical characters, and the second time difference, determine the second emergency associated with the associated text.
  • the total number of texts associated with the second emergency, the same number of characters, and the second time difference can be input into a logistic regression model (Logistic Regression, LR) to determine whether the associated text describes the second emergency .
  • the calculation formula of the logistic regression model can be:
  • N_W is the number of identical characters between the associated text and the texts associated with the second emergency,
  • Δt is the second time difference,
  • N_P is the total number of texts associated with the second emergency.
  • β_0, β_1, β_2, and β_3 are the parameters of the logistic regression model, which can be determined during training of the model.
  • Logit(P) is the output of the logistic regression model. If P is 0, the associated text does not describe the second emergency, and the associated text is deleted; if P is 1, the associated text describes the second emergency, and the associated text is then associated with the second emergency, that is, merged into the text set corresponding to the second emergency.
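  • The formula itself is not reproduced in this text. As a hedged illustration only, a standard logistic regression over the three inputs would take the linear form Logit(P) = b0 + b1·N_W + b2·Δt + b3·N_P; the pairing of each coefficient with each input, the coefficient values, and the 0.5 cutoff used below are all assumptions.

```python
import math


def describes_event(n_w, delta_t, n_p, beta, cutoff=0.5):
    """Return (probability, decision) for one associated-text/event pair.

    `beta` is a 4-tuple (b0, b1, b2, b3) of trained coefficients; the
    assignment of b1..b3 to n_w, delta_t, n_p is assumed for this sketch.
    """
    b0, b1, b2, b3 = beta
    logit = b0 + b1 * n_w + b2 * delta_t + b3 * n_p
    # Numerically stable sigmoid for large positive or negative logits.
    if logit >= 0:
        p = 1.0 / (1.0 + math.exp(-logit))
    else:
        e = math.exp(logit)
        p = e / (1.0 + e)
    return p, p >= cutoff
```

  • A decision of True corresponds to merging the associated text into the text set of the second emergency; False corresponds to discarding it.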
  • In this embodiment, the network information is first traversed, based on the reference words contained in the thesaurus, to extract candidate texts containing the reference words. Semantic analysis is then performed on the candidate texts to determine the associated text that is associated with an emergency, and entity extraction is performed on the associated text to determine the first entity set corresponding to the associated text. When the first entity set does not contain the first occurrence time, multiple second emergencies within the preset period whose event type is the same as the first event type contained in the first entity set are obtained according to the second occurrence time corresponding to each second emergency in the emergency event data set, and the second emergency associated with the associated text is determined from the total number of texts associated with each second emergency, the number of identical characters, and the second time difference. Thus, even when the associated text does not contain an occurrence time, the second emergency described in the associated text can still be accurately determined, so that the texts describing emergencies in the network information are organized.
  • Fig. 5 is a schematic flowchart of a method for monitoring an emergency provided by another embodiment of the present disclosure. As shown in FIG. 5 , the emergency monitoring method includes Phase 1 to Phase 3 .
  • Phase 1: based on the reference words contained in the thesaurus, traverse the network information to extract candidate texts containing the reference words, and then perform semantic analysis on the candidate texts to determine whether each candidate text is associated with an emergency. If it is, the candidate text is an associated text of the emergency.
  • Phase 2: perform entity extraction on the associated text to obtain the first event type, first geographic location, and first occurrence time corresponding to the associated text.
  • Phase 3: after the three entities (first event type, first geographic location, and first occurrence time) are successfully extracted from the associated text, obtain the first similarity between the first entity set and the second entity set corresponding to each first emergency in the emergency event data set, where the first entity set contains the first event type, the first geographic location, and the first occurrence time, and the second entity set contains the second event type, the second geographic location, and the second occurrence time. Then determine the first emergency described by the associated text according to the first similarity, and store the associated text under the corresponding first emergency in the emergency event data set.
  • When the first occurrence time is not extracted from the associated text, the logistic regression model judges which second emergency the associated text describes, and the associated text is associated with the corresponding second emergency once that second emergency has been determined.
  • the present disclosure also proposes a device for monitoring emergencies.
  • Fig. 6 is a schematic structural diagram of an emergency monitoring device provided by an embodiment of the present disclosure.
  • the emergency event monitoring apparatus 600 may include: a first acquisition module 610 , a first determination module 620 , a second determination module 630 , a third determination module 640 , and a fourth determination module 650 .
  • the first obtaining module 610 is configured to traverse the network information based on the reference words contained in the thesaurus, so as to extract candidate texts containing the reference words therefrom.
  • the first determination module 620 is configured to perform semantic analysis on the candidate texts, so as to determine the associated text contained in the candidate texts that is associated with the emergency event.
  • the second determination module 630 is configured to perform entity extraction on the associated text, so as to determine the first entity set corresponding to the associated text.
  • the third determining module 640 is configured to determine a first similarity between the first entity set and the second entity set corresponding to each emergency event in the emergency event data set.
  • the fourth determination module 650 is configured to determine, when the first similarity between the first entity set and any second entity set is greater than the first threshold, that the associated text is the associated text of the first emergency corresponding to that second entity set.
  • the second determination module 630 is specifically configured to: acquire a reference word set corresponding to each event type from the reference thesaurus; determine the association probability value between the associated text and each event type according to the number of occurrences in the associated text of each reference word in the reference word set corresponding to that event type and the weight of each reference word; determine the first event type corresponding to the associated text according to each association probability value and the second threshold corresponding to each event type; perform location extraction on the associated text according to the location entities contained in the location entity library, to determine the first geographic location corresponding to the associated text; and perform time extraction on the associated text based on a preset algorithm, to determine the first occurrence time corresponding to the associated text.
  • the third determination module 640 is specifically configured to: determine the second similarity between the first event type and the second event type in the second entity set, the first geographic location and the second geographic location in the second entity set The third similarity between locations, the fourth similarity between the first occurrence time and the second occurrence time in the second entity set; according to the second similarity, third similarity and fourth similarity, determine the first entity The first similarity between the set and the second entity set.
  • the third determining module 640 is specifically configured to: determine the fourth similarity according to the first time difference between the first occurrence time and the second occurrence time.
  • an update module is further included, specifically configured to: in response to the level of the first geographic location being higher than the level of the second geographic location in any second entity set, update the second geographic location in that second entity set according to the first geographic location, to obtain the updated second entity set;
  • the update module is further configured to: in response to the first occurrence time being an absolute time and the second occurrence time in any second entity set being a relative time, update the second occurrence time in that second entity set according to the first occurrence time, to obtain the updated second entity set.
  • an association module is further included, specifically configured to: determine the fifth similarity between the updated second entity set and each of the remaining second entity sets; and, in response to any fifth similarity being greater than the first threshold, associate the emergency corresponding to the second entity set that produced that fifth similarity with the emergency corresponding to the updated second entity set.
  • a storage module is further included, configured to store the associated text and the first entity set, in association with each other, into the emergency event data set in response to every first similarity being less than or equal to the first threshold.
  • a fifth determination module is further included, specifically configured to: in response to the first entity set not containing the first occurrence time, obtain, according to the second occurrence time corresponding to each second emergency in the emergency event data set, multiple second emergencies within a preset period whose event type is the same as the first event type contained in the first entity set; obtain the total number of texts associated with each second emergency, the number of identical characters between the associated text and the texts associated with each second emergency, and the second time difference between the release time of the associated text and the second occurrence time corresponding to each second emergency; and determine the second emergency associated with the associated text according to the total number of texts associated with the second emergency, the number of identical characters, and the second time difference.
  • the emergency monitoring device of the embodiment of the present disclosure first traverses the network information based on the reference words contained in the thesaurus to extract candidate texts containing the reference words, then performs semantic analysis on the candidate texts to determine the associated text contained in the candidate texts that is associated with an emergency, then performs entity extraction on the associated text to determine the first entity set corresponding to the associated text, then determines the first similarity between the first entity set and the second entity set corresponding to each emergency in the emergency event data set, and finally, when the first similarity between the first entity set and any second entity set is greater than the first threshold, determines that the associated text is the associated text of the first emergency corresponding to that second entity set. Analyzing and organizing the emergency texts contained in the network information in this way not only makes it possible to mine information about emergencies from massive network information promptly and accurately, but also clusters the texts describing the same emergency, so that new emergencies can be discovered in time.
  • the present disclosure also proposes an electronic device, including: a memory, a processor, and a computer program stored on the memory and operable on the processor. When the processor executes the program, the emergency monitoring method proposed in the foregoing embodiments of the present disclosure is implemented.
  • the present disclosure also proposes a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, the emergency monitoring method proposed in the foregoing embodiments of the present disclosure is implemented.
  • the present disclosure also proposes a computer program product, including a computer program, when the computer program is executed by a processor, it implements the method for monitoring emergencies as proposed in the foregoing embodiments of the present disclosure.
  • FIG. 7 shows a block diagram of an exemplary electronic device suitable for use in implementing embodiments of the present disclosure.
  • the electronic device 12 shown in FIG. 7 is only an example, and should not limit the functions and scope of use of the embodiments of the present disclosure.
  • electronic device 12 takes the form of a general-purpose computing device.
  • Components of computer device 12 may include, but are not limited to: one or more processors or processing units 16 , system memory 28 , bus 18 connecting various system components including system memory 28 and processing unit 16 .
  • Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus structures.
  • For example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
  • Computer device 12 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by computer device 12 and include both volatile and nonvolatile media, removable and non-removable media.
  • the memory 28 may include a computer system readable medium in the form of a volatile memory, such as a random access memory (Random Access Memory; hereinafter referred to as: RAM) 30 and/or a cache memory 32 .
  • Computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media.
  • storage system 34 may be used to read and write to non-removable, non-volatile magnetic media (not shown in FIG. 7, commonly referred to as a "hard drive").
  • a magnetic disk drive for reading from and writing to a removable nonvolatile magnetic disk may be provided, as well as an optical disc drive for reading from and writing to a removable nonvolatile optical disc (such as a Compact Disc Read-Only Memory (CD-ROM), a Digital Video Disc Read-Only Memory (DVD-ROM), or other optical media).
  • each drive may be connected to bus 18 via one or more data media interfaces.
  • Memory 28 may include at least one program product having a set (eg, at least one) of program modules configured to perform the functions of various embodiments of the present disclosure.
  • a program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28, such program modules 42 including but not limited to an operating system, one or more application programs, other program modules, and program data , each or some combination of these examples may include implementations of network environments.
  • the program modules 42 generally perform the functions and/or methods of the embodiments described in this disclosure.
  • the computer device 12 may also communicate with one or more external devices 14 (e.g., a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with the computer device 12, and/or with any device (e.g., a network card, a modem, etc.) that enables the computer device 12 to communicate with one or more other computing devices. Such communication may occur through the input/output (I/O) interface 22.
  • the computer device 12 can also communicate with one or more networks, such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet, through the network adapter 20.
  • the network adapter 20 communicates with the other modules of the computer device 12 via the bus 18.
  • It should be appreciated that, although not shown in FIG. 7, other hardware and/or software modules may be used in conjunction with the computer device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
  • the processing unit 16 executes various functional applications and data processing by running the programs stored in the system memory 28 , such as implementing the methods mentioned in the foregoing embodiments.
  • In the embodiments of the present disclosure, the network information is first traversed, based on the reference words contained in the thesaurus, to extract candidate texts containing the reference words; the candidate texts are then semantically analyzed to determine the associated text contained in the candidate texts that is associated with an emergency; entity extraction is performed on the associated text to determine the first entity set corresponding to the associated text; the first similarity between the first entity set and the second entity set corresponding to each emergency in the emergency event data set is determined; and finally, when the first similarity between the first entity set and any second entity set is greater than the first threshold, the associated text is determined to be the associated text of the first emergency corresponding to that second entity set. Analyzing and organizing the emergency texts contained in the network information in this way not only makes it possible to mine information about emergencies from massive network information promptly and accurately, but also clusters the texts describing the same emergency, so that new emergencies can be discovered in time.
  • first and second are used for descriptive purposes only, and cannot be interpreted as indicating or implying relative importance or implicitly specifying the quantity of indicated technical features.
  • the features defined as “first” and “second” may explicitly or implicitly include at least one of these features.
  • “plurality” means at least two, such as two, three, etc., unless otherwise specifically defined.
  • a "computer-readable medium" may be any apparatus that can contain, store, communicate, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
  • computer-readable media include the following: electrical connection with one or more wires (electronic device), portable computer disk case (magnetic device), random access memory (RAM), Read Only Memory (ROM), Erasable and Editable Read Only Memory (EPROM or Flash Memory), Fiber Optic Devices, and Portable Compact Disc Read Only Memory (CDROM).
  • the computer-readable medium may even be paper or another suitable medium on which the program can be printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise suitably processing it if necessary, and then stored in computer memory.
  • various parts of the present disclosure may be implemented in hardware, software, firmware or a combination thereof.
  • various steps or methods may be implemented by software or firmware stored in memory and executed by a suitable instruction execution system.
  • For example, if implemented in hardware, as in another embodiment, they can be implemented by any one or a combination of the following techniques known in the art: discrete logic circuits, application-specific integrated circuits (ASICs) with suitable combinational logic gates, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), and so on.
  • each functional unit in each embodiment of the present disclosure may be integrated into one processing module, each unit may exist separately physically, or two or more units may be integrated into one module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or in the form of software function modules. If the integrated modules are implemented in the form of software function modules and sold or used as independent products, they can also be stored in a computer-readable storage medium.
  • the storage medium mentioned above may be a read-only memory, a magnetic disk or an optical disk, and the like.

Abstract

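The present disclosure proposes an emergency event monitoring method, apparatus, electronic device, and storage medium, and relates to the technical field of information management. The method includes: traversing network information based on reference words to extract candidate texts containing the reference words; performing semantic analysis on the candidate texts to determine associated text that is associated with an emergency event; performing entity extraction on the associated text to determine a first entity set corresponding to the associated text; determining a first similarity between the first entity set and the second entity set corresponding to each emergency event in an emergency event data set; and, when the first similarity between the first entity set and any second entity set is greater than a first threshold, determining that the associated text is an associated text of the first emergency event corresponding to that second entity set.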

Description

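Emergency event monitoring method and apparatus
Cross-reference to related applications
This application is based on, and claims priority to, Chinese patent application No. 202111640233.4, filed on December 29, 2021, the entire content of which is incorporated herein by reference.
Technical field
The present disclosure relates to the technical field of information management, and in particular to an emergency event monitoring method, apparatus, electronic device, and storage medium.
Background
With the development of Internet technology, more and more users publish all kinds of information on the Internet, including information about emergency events such as infectious diseases, typhoons, floods, explosions, and nuclear accidents. Emergency management personnel can learn about an emergency event in time from information on the network. However, mining information about emergency events from massive network information promptly and accurately is not easy. How to obtain information about emergency events from a large amount of network information has therefore become an important research direction.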
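Summary
The present disclosure aims to solve, at least to some extent, one of the technical problems in the related art.
An embodiment of the first aspect of the present disclosure proposes an emergency event monitoring method, including:
traversing network information based on reference words contained in a thesaurus, to extract candidate texts containing the reference words;
performing semantic analysis on the candidate texts to determine associated text, contained in the candidate texts, that is associated with an emergency event;
performing entity extraction on the associated text to determine a first entity set corresponding to the associated text;
determining a first similarity between the first entity set and the second entity set corresponding to each emergency event in an emergency event data set;
when the first similarity between the first entity set and any second entity set is greater than a first threshold, determining that the associated text is an associated text of the first emergency event corresponding to that second entity set.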
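An embodiment of the second aspect of the present disclosure proposes an emergency event monitoring apparatus, including:
a first acquisition module, configured to traverse network information based on reference words contained in a thesaurus, to extract candidate texts containing the reference words;
a first determination module, configured to perform semantic analysis on the candidate texts to determine associated text, contained in the candidate texts, that is associated with an emergency event;
a second determination module, configured to perform entity extraction on the associated text to determine a first entity set corresponding to the associated text;
a third determination module, configured to determine a first similarity between the first entity set and the second entity set corresponding to each emergency event in an emergency event data set;
a fourth determination module, configured to determine, when the first similarity between the first entity set and any second entity set is greater than a first threshold, that the associated text is an associated text of the first emergency event corresponding to that second entity set.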
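An embodiment of the third aspect of the present disclosure proposes an electronic device, including: a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the program, implements the emergency event monitoring method proposed in the embodiment of the first aspect of the present disclosure.
An embodiment of the fourth aspect of the present disclosure proposes a computer-readable storage medium storing a computer program that, when executed by a processor, implements the emergency event monitoring method proposed in the embodiment of the first aspect of the present disclosure.
An embodiment of the fifth aspect of the present disclosure proposes a computer program product, including a computer program that, when executed by a processor, implements the emergency event monitoring method proposed in the embodiment of the first aspect of the present disclosure.
The emergency event monitoring method, apparatus, electronic device, and storage medium provided by the present disclosure have the following beneficial effects:
In the embodiments of the present disclosure, network information is first traversed, based on the reference words contained in the thesaurus, to extract candidate texts containing the reference words; semantic analysis is then performed on the candidate texts to determine the associated text, contained in the candidate texts, that is associated with an emergency event; entity extraction is performed on the associated text to determine the first entity set corresponding to the associated text; the first similarity between the first entity set and the second entity set corresponding to each emergency event in the emergency event data set is determined; and finally, when the first similarity between the first entity set and any second entity set is greater than the first threshold, the associated text is determined to be an associated text of the first emergency event corresponding to that second entity set. Analyzing and organizing the emergency event texts contained in the network information in this way not only makes it possible to mine information about emergency events from massive network information promptly and accurately, but also clusters the texts describing the same emergency event, so that new emergency events can be discovered in time.
Additional aspects and advantages of the present disclosure will be given in part in the following description, and in part will become apparent from the following description or be learned through practice of the present disclosure.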
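Brief description of the drawings
The above and/or additional aspects and advantages of the present disclosure will become apparent and easy to understand from the following description of the embodiments in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic flowchart of an emergency event monitoring method provided by an embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of an emergency event monitoring method provided by another embodiment of the present disclosure;
FIG. 3 is a schematic diagram of semantic analysis of candidate texts provided by an embodiment of the present disclosure;
FIG. 4 is a schematic flowchart of an emergency event monitoring method provided by another embodiment of the present disclosure;
FIG. 5 is a schematic flowchart of an emergency event monitoring method provided by another embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of an emergency event monitoring apparatus provided by an embodiment of the present disclosure;
FIG. 7 shows a block diagram of an exemplary electronic device suitable for implementing embodiments of the present disclosure.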
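Detailed description
Embodiments of the present disclosure are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present disclosure and should not be construed as limiting it.
The emergency event monitoring method, apparatus, electronic device, and storage medium of the embodiments of the present disclosure are described below with reference to the accompanying drawings.
FIG. 1 is a schematic flowchart of an emergency event monitoring method provided by an embodiment of the present disclosure.
The embodiments of the present disclosure are described by taking as an example the case where the emergency event monitoring method is configured in an emergency event monitoring apparatus; the apparatus can be applied to any electronic device so that the electronic device can perform the emergency event monitoring function.
The electronic device may be a personal computer (PC), a cloud device, a mobile device, or the like; the mobile device may be, for example, a mobile phone, a tablet computer, a personal digital assistant, a wearable device, a vehicle-mounted device, or another hardware device having an operating system, a touch screen, and/or a display screen.
As shown in FIG. 1, the emergency event monitoring method may include the following steps 101 to 105.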
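Step 101: based on the reference words contained in the thesaurus, traverse the network information to extract candidate texts containing the reference words.
In some embodiments, the thesaurus may contain the reference words corresponding to each type of emergency event, and the reference words contained in the thesaurus may be predetermined.
Emergency events may include natural disasters, accident disasters, public health events, social security events, and the like. For example, natural disasters may include rainstorms, tornadoes, earthquakes, and so on; accident disasters may include traffic accidents, fires, and so on; public health events may include infectious diseases, food poisoning, and so on; and social security events may include terrorist attacks, large-scale gatherings, and so on.
It can be understood that each type of emergency event has its corresponding reference words. For example, a rainstorm is usually accompanied by strong wind, so "strong wind" can be a reference word corresponding to rainstorm events. A candidate text containing "strong wind" extracted from the network information might be "A yellow strong-wind warning was issued for a certain area today".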
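Step 102: perform semantic analysis on the candidate texts to determine the associated text, contained in the candidate texts, that is associated with an emergency event.
It should be noted that an emergency event is an unconventional event that does not occur often; therefore, even if an obtained candidate text contains a reference word, the candidate text is not necessarily related to an emergency event. Accordingly, after the candidate texts are obtained, semantic analysis can be performed on them to determine whether each candidate text describes an emergency event.
In some embodiments, an attention-enhanced bidirectional long short-term memory model (the BERT-Att-BiLSTM model) may be used to perform semantic analysis on the candidate texts, to determine the associated text, contained in the candidate texts, that is associated with an emergency event.
In some embodiments, latent Dirichlet allocation (LDA) may be used to perform semantic analysis on the candidate texts, to determine the associated text, contained in the candidate texts, that is associated with an emergency event.
It should be noted that, in the embodiments of the present disclosure, any other suitable method may also be used to perform semantic analysis on the candidate texts, to determine the associated text, contained in the candidate texts, that is associated with an emergency event.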
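Step 103: perform entity extraction on the associated text to determine the first entity set corresponding to the associated text.
In some embodiments, the first entity set may include a first event type, a first geographic location, and a first occurrence time.
The first event type may be the type of the emergency event described by the associated text. For example, the first event type corresponding to the associated text may be rainstorm, tornado, and so on.
The first geographic location may be the geographic location where the emergency event described by the associated text occurred. For example, the first geographic location corresponding to the associated text may be XX County, XX City, XX Province, or XX Province, and so on.
The first occurrence time may be the time at which the emergency event described by the associated text occurred. For example, the first occurrence time corresponding to the associated text may be October 20, 2020, or the year 2008, and so on.
It should be noted that if the associated text does not contain the occurrence time of the emergency event, the first occurrence time in the first entity set corresponding to the associated text is empty; likewise, if the associated text does not contain the geographic location of the emergency event, the first geographic location in the first entity set corresponding to the associated text is empty.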
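Step 104: determine the first similarity between the first entity set and the second entity set corresponding to each emergency event in the emergency event data set.
In some embodiments, the second entity set corresponding to each emergency event may include a second event type, a second geographic location, and a second occurrence time.
In some embodiments, when the first event type, the first geographic location, and the first occurrence time contained in the first entity set are all non-empty, the first similarity between the first entity set and the second entity set corresponding to each emergency event in the emergency event data set can be determined from the second similarity between the first event type and the second event type, the third similarity between the first geographic location and the second geographic location in the second entity set, and the fourth similarity between the first occurrence time and the second occurrence time in the second entity set.
In some embodiments, when the first event type, the first geographic location, and the first occurrence time contained in the first entity set are all non-empty, the first similarity between the first entity set and the second entity set is calculated with the Euclidean distance formula or the Manhattan distance formula; alternatively, the cosine similarity between the first entity set and the second entity set can be calculated and used as the first similarity between the two.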
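One way to combine the three component similarities into the first similarity is sketched below. The disclosure does not fix how the components are computed or weighted, so the `EntitySet` record, the exact-match type score, the shared-prefix location score, the reciprocal time decay, and the weights are all assumptions chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple


@dataclass
class EntitySet:
    """Hypothetical entity set: type, hierarchical location, occurrence date."""
    event_type: str
    location: Tuple[str, ...]  # e.g. ("Province A", "City B")
    occurred_on: date


def type_similarity(a: str, b: str) -> float:
    return 1.0 if a == b else 0.0


def location_similarity(a: Tuple[str, ...], b: Tuple[str, ...]) -> float:
    """Fraction of administrative levels shared from the top down."""
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    longest = max(len(a), len(b))
    return shared / longest if longest else 0.0


def time_similarity(a: date, b: date, scale_days: float = 3.0) -> float:
    """Decreases as the time difference grows; equals 1.0 for equal dates."""
    return 1.0 / (1.0 + abs((a - b).days) / scale_days)


def first_similarity(first: EntitySet, second: EntitySet,
                     weights=(0.4, 0.3, 0.3)) -> float:
    """Weighted sum of the type, location, and time similarities."""
    parts = (
        type_similarity(first.event_type, second.event_type),
        location_similarity(first.location, second.location),
        time_similarity(first.occurred_on, second.occurred_on),
    )
    return sum(w * s for w, s in zip(weights, parts))
```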
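Step 105: when the first similarity between the first entity set and any second entity set is greater than the first threshold, determine that the associated text is an associated text of the first emergency event corresponding to that second entity set.
It can be understood that if the first similarity between the first entity set and any second entity set is greater than the first threshold, the associated text and the first emergency event corresponding to that second entity set describe the same emergency event. The associated text can therefore be associated with the first emergency event corresponding to that second entity set, that is, the associated text can be stored in the set that corresponds, in the emergency event data set, to the first emergency event of that second entity set.
In some embodiments, if every first similarity is less than or equal to the first threshold, the emergency event data set does not contain the emergency event described by the associated text; the associated text and the first entity set can therefore be stored in association in the emergency event data set as a new emergency event. In this way, the occurrence of a new emergency event can be detected accurately, so that emergency management personnel can learn of it in time and, based on the first event type, first geographic location, and first occurrence time in the first entity set corresponding to the emergency event, take emergency measures promptly.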
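The threshold rule of step 105, together with the new-event case, can be sketched as below. The dictionary layout, the function name, and the choice to attach the text to the best-scoring event (rather than to any event above the threshold) are assumptions of this sketch.

```python
def assign_or_create(first_entities, associated_text, events, similarity, threshold):
    """Attach the text to the best-matching event, or register a new event.

    `events` is a list of dicts with hypothetical keys "entities" and
    "texts"; it is modified in place. Returns the index of the event that
    now holds the text.
    """
    best_index, best_score = None, threshold
    for i, event in enumerate(events):
        score = similarity(first_entities, event["entities"])
        if score > best_score:  # strictly greater than the first threshold
            best_index, best_score = i, score
    if best_index is None:
        # No existing event is similar enough: treat as a new emergency event.
        events.append({"entities": first_entities, "texts": [associated_text]})
        return len(events) - 1
    events[best_index]["texts"].append(associated_text)
    return best_index
```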
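In the embodiments of the present disclosure, network information is first traversed, based on the reference words contained in the thesaurus, to extract candidate texts containing the reference words; semantic analysis is then performed on the candidate texts to determine the associated text, contained in the candidate texts, that is associated with an emergency event; entity extraction is performed on the associated text to determine the first entity set corresponding to the associated text; the first similarity between the first entity set and the second entity set corresponding to each emergency event in the emergency event data set is determined; and finally, when the first similarity between the first entity set and any second entity set is greater than the first threshold, the associated text is determined to be an associated text of the first emergency event corresponding to that second entity set. Analyzing and organizing the emergency event texts contained in the network information in this way not only makes it possible to mine information about emergency events from massive network information promptly and accurately, but also clusters the texts describing the same emergency event, so that new emergency events can be discovered in time.
FIG. 2 is a schematic flowchart of an emergency event monitoring method provided by an embodiment of the present disclosure. As shown in FIG. 2, the emergency event monitoring method may include the following steps 201 to 209.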
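Step 201: based on the reference words contained in the thesaurus, traverse the network information to extract candidate texts containing the reference words.
In the embodiments of the present disclosure, multiple texts related to each type of emergency event can first be obtained at random and merged into one text. Word segmentation is then performed on the merged text to obtain the words it contains, and word frequency statistics are computed for each word, that is, the number of times each word appears in the merged text is obtained. Words whose frequency is greater than a preset threshold are then taken as the reference words corresponding to that type of emergency event.
It should be noted that each type of emergency event may correspond to one reference word or to multiple reference words.
In some embodiments, after the reference words are obtained, the weight corresponding to each reference word can be determined from its word frequency. The weight is positively correlated with the word frequency, that is, the higher the word frequency of a reference word, the larger its weight.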
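The frequency-based selection and weighting of reference words can be sketched as follows. The normalized-frequency weight is only one assumed way to make weight grow with frequency, and the whitespace tokenizer is a placeholder; a Chinese word segmenter would be passed as `tokenize` in practice.

```python
from collections import Counter


def reference_words(texts, min_count, tokenize=str.split):
    """Return {word: weight} for words whose frequency exceeds `min_count`.

    Counting tokens over all texts is equivalent to merging the texts
    first and counting once. Weights are frequencies normalized to sum
    to 1 (an assumption of this sketch).
    """
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    kept = {word: n for word, n in counts.items() if n > min_count}
    total = sum(kept.values())
    return {word: n / total for word, n in kept.items()}
```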
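In some embodiments, after candidate texts have been captured using the reference words, the reference words and their corresponding weights can also be adjusted according to the candidate texts.
In the embodiments of the present disclosure, after the multiple texts related to each type of emergency event are obtained, each text can be preprocessed by removing uniform resource locators (URLs), spaces, punctuation marks, and the like, and each preprocessed text is then encoded in the same encoding format. For example, every text is unified to the UTF-8 (Universal Character Set/Unicode Transformation Format) encoding, where UTF-8 is a variable-length character encoding for Unicode.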
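The preprocessing described above can be sketched with the standard library. The URL pattern and the list of full-width punctuation marks are assumptions of this sketch and are not exhaustive.

```python
import re
import string

# ASCII-only URL pattern so that adjacent Chinese characters are not swallowed.
URL_RE = re.compile(r"https?://[\w\-./?%&=#:~+@!$',;*()]+", re.ASCII)
# ASCII punctuation plus a few common full-width marks (not exhaustive).
PUNCT = string.punctuation + "，。！？；：、“”‘’（）《》【】…"
DROP_PUNCT = str.maketrans("", "", PUNCT)


def preprocess(raw):
    """Remove URLs, punctuation, and whitespace from one text."""
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8", errors="replace")
    text = URL_RE.sub("", raw)
    text = text.translate(DROP_PUNCT)
    return "".join(text.split())


def to_utf8(raw):
    """Preprocess a text and return it uniformly encoded as UTF-8 bytes."""
    return preprocess(raw).encode("utf-8")
```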
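Step 202: perform semantic analysis on the candidate texts to determine the associated text, contained in the candidate texts, that is associated with an emergency event.
In the embodiments of the present disclosure, FIG. 3 is a schematic diagram of semantic analysis of candidate texts provided by an embodiment of the present disclosure. As shown in FIG. 3, an attention-enhanced bidirectional long short-term memory model (the BERT-Att-BiLSTM model) may be used to perform semantic analysis on the candidate texts, to determine the associated text, contained in the candidate texts, that is associated with an emergency event. That is, a candidate text is first input into BERT to obtain its semantic representation; the semantic representation is then input into Att-BiLSTM for further semantic analysis, to judge whether the candidate text is an associated text of an emergency event.
In the embodiments of the present disclosure, BERT (Bidirectional Encoder Representation from Transformers) is a word-vector generation model that can adopt a bidirectional transformer architecture and can be trained jointly on the context in all layers of the model. The pre-trained BERT Chinese base model (BERT-Base-Chinese) is used here, which may include 12 transformer layers and 12 self-attention heads, with a vector dimension of 768.
It should be noted that, in the embodiments of the present disclosure, the number of transformer layers, the number of self-attention heads, and the vector dimension can be chosen according to actual needs.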
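In the embodiments of the present disclosure, the bidirectional long short-term memory (Bi-directional Long Short-Term Memory, BiLSTM) layer includes a forward
Figure PCTCN2022142554-appb-000001
and a backward
Figure PCTCN2022142554-appb-000002
and its output is expressed as:
Figure PCTCN2022142554-appb-000003
where
Figure PCTCN2022142554-appb-000004
is the forward information of character i of sentence s, and
Figure PCTCN2022142554-appb-000005
is the backward information of character i of sentence s. Both are hidden vectors.
The attention weight of each character is expressed as follows:
Figure PCTCN2022142554-appb-000006
Figure PCTCN2022142554-appb-000007
where
Figure PCTCN2022142554-appb-000008
is the attention score, V is a weight, V with superscript T is the transpose of V, ω_s is the weight corresponding to sentence s in the attention mechanism, b_s is the bias corresponding to sentence s in the attention mechanism, tanh(·) is the hyperbolic tangent function, T as a count is the number of characters, and
Figure PCTCN2022142554-appb-000009
is the attention weight of character i in sentence s.
The output of the context representation is:
Figure PCTCN2022142554-appb-000010
where F is the semantic feature of the candidate text.
The Softmax (logistic regression) layer generates conditional probabilities over the class space from the semantic feature of the candidate text, and thereby determines whether the candidate text is an associated text of an emergency.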
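The formula images referenced above are not reproduced in this text. One reconstruction that is consistent with the symbol definitions given, assuming the standard additive attention over BiLSTM hidden states, is shown below. The symbol names h, u and α are assumptions, and the published figures may differ in notation or detail.

```latex
\begin{aligned}
h_i^{s} &= \left[\overrightarrow{h_i^{s}} \,;\, \overleftarrow{h_i^{s}}\right]
  && \text{(concatenated forward and backward hidden vectors)} \\
u_i^{s} &= V^{\top} \tanh\!\left(\omega_s h_i^{s} + b_s\right)
  && \text{(attention score of character } i\text{)} \\
\alpha_i^{s} &= \frac{\exp\!\left(u_i^{s}\right)}{\sum_{t=1}^{T} \exp\!\left(u_t^{s}\right)}
  && \text{(attention weight, normalized over } T \text{ characters)} \\
F &= \sum_{i=1}^{T} \alpha_i^{s}\, h_i^{s}
  && \text{(semantic feature of the candidate text)}
\end{aligned}
```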
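Step 203: perform entity extraction on the associated text to determine the first entity set corresponding to the associated text.
In some embodiments, the reference word set of each event type may first be obtained from the reference lexicon. The association probability value between the associated text and each event type is then determined from the number of occurrences in the associated text of each reference word in that event type's reference word set and from the weight of each reference word. Finally, the first event type of the associated text is determined from each association probability value and the second threshold of each event type.
The association probability value may be calculated as:
$P_e = \sum_{k} C_k W_k$
where P_e is the association probability value between the associated text and event type e, C_k is the number of occurrences of reference word k, and W_k is the weight of reference word k.
If the value of P_e is greater than the second threshold of event type e, the first event type of the associated text is event type e.
It should be noted that natural disasters often trigger chain reactions. A typhoon, for example, is usually accompanied by a rainstorm, so one associated text may correspond to several event types. If an associated text describes both a typhoon and the rainstorm it caused, that text may correspond to both the rainstorm event type and the typhoon event type.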
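The association probability and the per-type thresholding can be sketched as follows. The sketch counts occurrences with plain substring matching, which is an assumption, and the names `event_types`, `reference_sets` and `second_thresholds` are hypothetical.

```python
from typing import Dict, List


def event_types(text: str,
                reference_sets: Dict[str, Dict[str, float]],
                second_thresholds: Dict[str, float]) -> List[str]:
    """Return every event type e whose P_e exceeds its second threshold."""
    matched = []
    for event_type, weights in reference_sets.items():
        # P_e = sum over reference words k of C_k * W_k
        p_e = sum(text.count(word) * weight for word, weight in weights.items())
        if p_e > second_thresholds[event_type]:
            matched.append(event_type)
    return matched
```

Because each event type is tested independently, a text that mentions both a typhoon and the rainstorm it caused can be returned with both types.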
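In some embodiments, location extraction may be performed on the associated text according to the location entities contained in a location entity library, to determine the first geographic location of the associated text.
In some embodiments, the associated text may contain a specific location, such as XX county, XX city, XX province. It may also contain no specific location but mention a building that identifies one, in which case the location of the described emergency can be determined from the building.
In the embodiments of the present disclosure, specific location entities may first be extracted from the associated text according to a first location entity library, to obtain the specific place where the described emergency occurred. If no specific location is extracted, building entities are extracted from the content of the associated text according to a second location entity library, and the place of occurrence is obtained from the geographic location of the building entity.
The first location entity library contains specific geographic locations, and the second location entity library contains buildings and similar entities that can represent a geographic location.
In some embodiments, after the specific place of occurrence is determined, it may be expressed in a structured form, namely XX province, XX city, XX county/district, XX village/town.
In the embodiments of the present disclosure, if the associated text contains no location information, the associated text is deleted and no associated emergency is determined for it.
In some embodiments, time extraction may be performed on the associated text based on a preset algorithm, to determine the first occurrence time of the associated text.
In the embodiments of the present disclosure, any suitable method may be used for this time extraction. For example, a regular expression may be used to extract the time information contained in the associated text.
In the embodiments of the present disclosure, if the time information extracted from the associated text is an absolute time, such as XX:XX on day XX of month XX of year XXXX, that absolute time may be used directly as the first occurrence time. If the extracted time information is an expression such as "yesterday", "early morning" or "three days ago", the first occurrence time is determined from the publication time of the associated text, and the first occurrence time is then a relative time. For example, if the associated text was published on March 5, 2020 and contains the time expression "yesterday", its first occurrence time is March 4, 2020.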
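The regular-expression approach to time extraction can be sketched as below. The sketch handles only day-level dates, and the small table of relative expressions (`RELATIVE_DAYS`) is an assumed vocabulary chosen for illustration. A fuller implementation would also cover hours, minutes and expressions such as "three days ago".

```python
import re
from datetime import date, timedelta
from typing import Optional, Tuple

ABSOLUTE = re.compile(r"(\d{4})年(\d{1,2})月(\d{1,2})日")
# Assumed vocabulary: expression -> days before the publication date.
RELATIVE_DAYS = {"今天": 0, "昨天": 1, "昨日": 1, "前天": 2}


def occurrence_date(text: str, published: date) -> Optional[Tuple[date, str]]:
    """Return (occurrence date, "absolute" | "relative"), or None."""
    match = ABSOLUTE.search(text)
    if match:
        year, month, day = map(int, match.groups())
        return date(year, month, day), "absolute"
    for word, days in RELATIVE_DAYS.items():
        if word in text:
            # Resolve the relative expression against the publication time.
            return published - timedelta(days=days), "relative"
    return None
```

Returning the kind of time alongside the date keeps the information needed later, when an absolute time is allowed to overwrite a relative one.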
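Step 204: determine a second similarity between the first event type and the second event type in the second entity set, a third similarity between the first geographic location and the second geographic location in the second entity set, and a fourth similarity between the first occurrence time and the second occurrence time in the second entity set.
It can be understood that, if the associated text and a first emergency in the emergency data set describe the same emergency, the first event type of the associated text should be the same as the second event type of that first emergency. Therefore, if the first event type and the second event type are the same, the second similarity is 1. If they are different, the second similarity is 0.
In the embodiments of the present disclosure, if the first event type is the same as the second event type, and the first geographic location differs little from the second geographic location and the first occurrence time differs little from the second occurrence time, the associated text probably describes the first emergency corresponding to the second entity set. The third similarity and the fourth similarity may therefore be calculated only when the first event type is the same as the second event type, which reduces the amount of computation.
In some embodiments, the third similarity may be determined according to the level of the first geographic location and the level of the second geographic location. The more detailed a geographic location is, the higher its level. For example, a first geographic location that specifies XX province, XX city, XX county/district and XX village/town has the highest level.
For example, if the first and second geographic locations are in the same village/town, the third similarity may be 0.8. If they are in the same county/district and either of them lacks village/town information, the third similarity may be 0.6. If they are in the same city and either of them lacks county/district information, the third similarity may be 0.4. If they are in the same province and either of them lacks city information, the third similarity may be 0.2. Otherwise, the third similarity may be 0.
It should be noted that the above is only a simple example and does not limit the third similarity in the embodiments of the present disclosure.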
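The tiered example for the third similarity can be sketched as follows, assuming each location is a 4-tuple (province, city, county/district, village/town) with `None` marking a level that is not known. The tuple layout and the name `location_similarity` are assumptions. The scores are the example values from the text.

```python
from typing import Optional, Sequence

# Example scores for a match down to province, city, county/district, village/town.
LEVEL_SCORES = (0.2, 0.4, 0.6, 0.8)


def location_similarity(first: Sequence[Optional[str]],
                        second: Sequence[Optional[str]]) -> float:
    """Third similarity between two structured locations."""
    score = 0.0
    for level, (a, b) in enumerate(zip(first, second)):
        if a is None or b is None:
            break          # one side lacks this level: keep the score so far
        if a != b:
            return 0.0     # both specify this level and disagree
        score = LEVEL_SCORES[level]
    return score
```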
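In some embodiments, the fourth similarity may be determined according to a first time difference between the first occurrence time and the second occurrence time.
For example, if the first time difference is less than 1 minute, the fourth similarity may be 0.9. If it is less than 1 hour, the fourth similarity may be 0.7. If it is less than 1 day, the fourth similarity may be 0.5. If it is less than 3 days, the fourth similarity may be 0.3. Otherwise, the fourth similarity may be 0.
It should be noted that the above is only a simple example and does not limit the fourth similarity in the embodiments of the present disclosure.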
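The example tiers for the fourth similarity translate directly into a lookup over increasing time limits. The sketch below uses the example values from the text. The names `TIME_TIERS` and `time_similarity` are hypothetical.

```python
from datetime import datetime, timedelta

# (upper bound on the first time difference, fourth similarity)
TIME_TIERS = (
    (timedelta(minutes=1), 0.9),
    (timedelta(hours=1), 0.7),
    (timedelta(days=1), 0.5),
    (timedelta(days=3), 0.3),
)


def time_similarity(first: datetime, second: datetime) -> float:
    """Fourth similarity from the absolute time difference."""
    gap = abs(first - second)
    for limit, score in TIME_TIERS:
        if gap < limit:
            return score
    return 0.0
```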
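Step 205: determine the first similarity between the first entity set and the second entity set according to the second similarity, the third similarity and the fourth similarity.
It should be noted that, if the associated text and a first emergency in the emergency data set describe the same emergency, the first event type of the associated text should be exactly the same as the second event type of that first emergency. The second similarity therefore plays a decisive role in determining the first similarity.
The first similarity may be calculated as:
$S_1 = S_2 \times (a S_3 + b S_4)$
where S_1 is the first similarity, S_2 is the second similarity, S_3 is the third similarity, S_4 is the fourth similarity, a is the weight of the third similarity, and b is the weight of the fourth similarity.
The value of a may be 0.5, and the value of b may be 0.5.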
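As a worked instance of the formula for the first similarity, the following short sketch evaluates it with the example weights a = b = 0.5. The function name `first_similarity` is hypothetical.

```python
def first_similarity(s2: float, s3: float, s4: float,
                     a: float = 0.5, b: float = 0.5) -> float:
    """S1 = S2 * (a * S3 + b * S4)."""
    # S2 is 0 or 1, so a mismatched event type forces S1 to 0.
    return s2 * (a * s3 + b * s4)


# Same event type, same village/town (0.8), under one minute apart (0.9):
# S1 = 1 * (0.5 * 0.8 + 0.5 * 0.9) = 0.85
same_event = first_similarity(1.0, 0.8, 0.9)
```

Because S_2 multiplies the whole expression, no amount of agreement in location and time can make two emergencies of different types match.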
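Step 206: when the first similarity between the first entity set and any second entity set is greater than the first threshold, determine that the associated text is an associated text of the first emergency corresponding to that second entity set.
Step 207: update that second entity set according to the first entity set, to obtain an updated second entity set.
It can be understood that the second occurrence time and the second geographic location in the second entity set of each first emergency in the emergency data set may not be very detailed. If a newly detected associated text contains a more specific first occurrence time or first geographic location, the second entity set of the first emergency described by the associated text may be updated so that the information about that first emergency becomes more accurate.
In some embodiments, when the level of the first geographic location is higher than the level of the second geographic location in that second entity set, the second geographic location is updated according to the first geographic location, to obtain the updated second entity set.
For example, if the first geographic location of the associated text is XX town, XX county, XX city, XX province, and the second geographic location in the second entity set is XX province, the level of the first geographic location is higher, so the second geographic location in the second entity set may be updated to XX town, XX county, XX city, XX province.
In some embodiments, when the first occurrence time is an absolute time and the second occurrence time in that second entity set is a relative time, the second occurrence time is updated according to the first occurrence time, to obtain the updated second entity set.
For example, if the first occurrence time of the associated text is the absolute time 10:00 on October 22, 2016, and the second occurrence time in the second entity set is a relative time inferred from expressions such as "early morning" or "yesterday", the second occurrence time in the second entity set may be updated to 10:00 on October 22, 2016.
In the embodiments of the present disclosure, if the first entity set is identical to that second entity set, the second entity set does not need to be updated.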
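The two update rules of step 207 can be sketched as below. The entity-set layout is an assumption made for illustration: `location` is a 4-tuple with `None` for unknown levels, and `time_kind` is `"absolute"` or `"relative"`. The function returns a new dictionary rather than modifying its input.

```python
from typing import Optional, Sequence


def location_level(location: Sequence[Optional[str]]) -> int:
    """More specified levels means a more detailed, higher-level location."""
    return sum(1 for part in location if part is not None)


def update_entities(first: dict, second: dict) -> dict:
    """Return the second entity set refined with details from the first."""
    updated = dict(second)
    if location_level(first["location"]) > location_level(second["location"]):
        updated["location"] = first["location"]
    if first["time_kind"] == "absolute" and second["time_kind"] == "relative":
        updated["time"] = first["time"]
        updated["time_kind"] = "absolute"
    return updated
```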
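In the embodiments of the present disclosure, the second entity set of the first emergency described by the associated text is updated according to the first entity set of the associated text. The occurrence time and location recorded for each emergency in the emergency data set thus become more accurate, which helps the relevant staff take measures based on the specific information about the emergency.
Step 208: determine a fifth similarity between the updated second entity set and each of the remaining second entity sets.
For the specific way of determining the fifth similarity in step 208, reference may be made to the description of determining the first similarity in the embodiments of the present disclosure, which is not repeated here.
Step 209: in response to any fifth similarity being greater than the first threshold, associate the emergency corresponding to the second entity set that has that fifth similarity with the emergency corresponding to the updated second entity set.
It can be understood that a fifth similarity greater than the first threshold indicates that the emergency data set contains another first emergency that is the same as the first emergency described by the associated text, so the two collections describing the same emergency may be merged.
In the embodiments of the present disclosure, after the two emergencies are associated, the second entity set that has that fifth similarity may further be updated according to the updated second entity set. That is, steps 207, 208 and 209 are repeated until the entity sets of the two emergencies are identical.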
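A single pass of steps 208 and 209 can be sketched as follows. The sketch merges the text collections of matching emergencies into the updated one and drops the duplicates. It does not perform the repeated entity-set update described above. The data layout and the name `merge_matching` are assumptions.

```python
from typing import Callable, List


def merge_matching(dataset: List[dict],
                   updated_index: int,
                   similarity: Callable[[dict, dict], float],
                   first_threshold: float) -> List[dict]:
    """Merge every emergency whose fifth similarity exceeds the threshold."""
    target = dataset[updated_index]
    kept = []
    for index, emergency in enumerate(dataset):
        is_other = index != updated_index
        if is_other and similarity(target["entities"], emergency["entities"]) > first_threshold:
            # Same emergency recorded twice: fold its texts into the target.
            target["texts"].extend(emergency["texts"])
        else:
            kept.append(emergency)
    return kept
```

Note that the target's text list is extended in place, so the returned list shares that object with the input.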
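In the embodiments of the present disclosure, network information is first traversed based on the reference words contained in the lexicon to extract candidate texts containing the reference words. Semantic analysis is performed on the candidate texts to determine the associated text related to an emergency, and entity extraction is performed on the associated text to determine its first entity set. The first similarity between the first entity set and the second entity set of each first emergency is then obtained. When the first similarity between the first entity set and any second entity set is greater than the first threshold, the associated text is determined to be an associated text of the corresponding first emergency, and that second entity set is updated according to the first entity set. The fifth similarity between the updated second entity set and each remaining second entity set is then determined, and, in response to any fifth similarity being greater than the first threshold, the corresponding emergencies are associated. By analyzing and organizing emergency texts in this way, the information recorded in the emergency data set can be updated with the specific details found in newly discovered associated texts, so that the information obtained about emergencies becomes more accurate.
FIG. 4 is a schematic flowchart of an emergency monitoring method provided by another embodiment of the present disclosure. As shown in FIG. 4, the method may include the following steps 401 to 406.
Step 401: based on the reference words contained in the lexicon, traverse network information to extract candidate texts containing the reference words.
Step 402: perform semantic analysis on the candidate texts to determine the associated text related to an emergency.
Step 403: perform entity extraction on the associated text to determine the first entity set corresponding to the associated text.
For the specific implementation of steps 401 to 403, reference may be made to the detailed descriptions in the other embodiments of the present disclosure, which are not repeated here.
Step 404: in response to the first entity set not containing a first occurrence time, obtain, according to the second occurrence time of each second emergency in the emergency data set, multiple second emergencies within a preset period whose event type is the same as the first event type contained in the first entity set.
It can be understood that a first entity set without a first occurrence time, that is, with an empty first occurrence time, means that the associated text does not state when the emergency occurred. The first entity set then contains only the first event type and the first geographic location.
In some embodiments, when the associated text does not contain the occurrence time of the emergency, multiple second emergencies that are close to the publication time of the associated text and have the same event type may be obtained from the emergency data set, according to the type of emergency described by the associated text (the first event type in the first entity set) and the publication time of the associated text. It is then determined whether the associated text describes one of these second emergencies.
The preset period may be 5 days, 10 days, and so on.
For example, if the associated text was published on September 15, 2021, its event type is rainstorm, and the preset period is 5 days, the second emergencies whose second occurrence time falls between September 10, 2021 and September 15, 2021 and whose second event type is rainstorm may be obtained from the emergency data set.
In some embodiments, when the associated text does not contain the occurrence time of the emergency, multiple second emergencies that are close to the publication time, have the same event type and are geographically close may also be obtained from the emergency data set, according to the first event type, the first geographic location and the publication time of the associated text.
For example, if the associated text was published on August 10, 2021, its first event type is rainstorm, its first geographic location is XX city, XX province, and the preset period is 3 days, the second emergencies whose second occurrence time falls between August 7, 2021 and August 10, 2021, whose second geographic location is XX city, XX province or XX province, and whose second event type is rainstorm may be obtained from the emergency data set.
It should be noted that the above are only simple examples and do not limit the publication time of the associated text, the first event type, the first geographic location and the like in the embodiments of the present disclosure.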
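The candidate selection of step 404 amounts to a filter on event type and a date window that ends at the publication time. A minimal sketch follows, assuming each stored emergency is a dictionary with `type` and `time` keys. Both the layout and the name `candidate_emergencies` are hypothetical.

```python
from datetime import date, timedelta
from typing import List


def candidate_emergencies(dataset: List[dict],
                          event_type: str,
                          published: date,
                          window_days: int) -> List[dict]:
    """Second emergencies of the same type inside the preset period."""
    start = published - timedelta(days=window_days)
    return [
        emergency for emergency in dataset
        if emergency["type"] == event_type
        and start <= emergency["time"] <= published
    ]
```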
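Step 405: obtain the total number of texts associated with each second emergency, the number of characters shared between the associated text and the texts associated with each second emergency, and a second time difference between the publication time of the associated text and the second occurrence time of each second emergency.
It should be noted that each emergency in the emergency data set may be associated with multiple texts describing it.
In the embodiments of the present disclosure, if the total number of texts associated with a second emergency exceeds a third threshold, a number of those texts equal to the third threshold may be selected at random for calculating the number of shared characters, which reduces the amount of computation.
The third threshold may be 60, 80, and so on.
For example, if 200 texts are associated with a second emergency and the third threshold is 60, 60 of the 200 texts may be selected at random for calculating the number of characters shared with the associated text. The 60 texts are merged into one text, and the merged text and the associated text are each segmented. Each character of the associated text is then checked for occurrence in the merged text, and the total number of characters of the associated text that occur in the merged text is the number of characters shared between the associated text and the texts associated with the second emergency.
In the embodiments of the present disclosure, any suitable method may be used to segment the merged text and the associated text. For example, the Jieba tokenizer may be used for both.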
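The shared-character count with random down-sampling can be sketched as follows. For simplicity the sketch compares characters directly and skips word segmentation, which is an assumption: the text describes segmenting both texts first, for example with Jieba. The `rng` parameter is added only so that the sampling can be made reproducible.

```python
import random
from typing import List, Optional


def shared_character_count(text: str,
                           emergency_texts: List[str],
                           third_threshold: int,
                           rng: Optional[random.Random] = None) -> int:
    """Count characters of `text` that occur in the emergency's texts."""
    rng = rng or random.Random()
    if len(emergency_texts) > third_threshold:
        # Too many texts: use a random subset of size third_threshold.
        emergency_texts = rng.sample(emergency_texts, third_threshold)
    merged = set("".join(emergency_texts))
    return sum(1 for ch in text if ch in merged)
```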
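Step 406: determine the second emergency associated with the associated text according to the total number of texts associated with the second emergency, the number of shared characters and the second time difference.
In some embodiments, the total number of texts associated with the second emergency, the number of shared characters and the second time difference may be input into a logistic regression (LR) model to determine whether the associated text describes that second emergency.
In some embodiments, the logistic regression model may be calculated as:
$\mathrm{Logit}(P) = \beta_0 + \beta_1 N_w + \beta_2 \Delta t + \beta_3 N_p$
where N_w is the number of characters shared between the associated text and the texts associated with the second emergency, Δt is the second time difference, and N_p is the total number of texts associated with the second emergency. β_0, β_1, β_2 and β_3 are the parameters of the logistic regression model and may be determined during its training.
Logit(P) is the output of the logistic regression model. If P is 0, the associated text does not describe the second emergency, and the associated text is deleted. If P is 1, the associated text describes the second emergency, so the associated text may be associated with that second emergency, that is, merged into the text collection of the second emergency.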
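The logistic regression decision can be sketched as below. The coefficients are placeholders that would come from training, and the 0.5 cutoff used to turn the probability into the 0/1 decision is an assumption, since the text only states that P is 0 or 1.

```python
import math
from typing import Sequence, Tuple


def describes_emergency(n_w: float, delta_t: float, n_p: float,
                        betas: Sequence[float],
                        cutoff: float = 0.5) -> Tuple[bool, float]:
    """Evaluate Logit(P) = b0 + b1*N_w + b2*dt + b3*N_p and threshold P."""
    b0, b1, b2, b3 = betas
    logit = b0 + b1 * n_w + b2 * delta_t + b3 * n_p
    # Inverse of the logit: P = 1 / (1 + exp(-Logit(P))).
    probability = 1.0 / (1.0 + math.exp(-logit))
    return probability >= cutoff, probability
```

With trained coefficients one would expect a positive weight on the shared-character count and a negative weight on the time difference, though the disclosure does not state their signs.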
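In the embodiments of the present disclosure, network information is first traversed based on the reference words contained in the lexicon to extract candidate texts containing the reference words. Semantic analysis is performed on the candidate texts to determine the associated text related to an emergency, and entity extraction is performed on the associated text to determine its first entity set. When the first entity set does not contain a first occurrence time, multiple second emergencies within the preset period whose event type is the same as the first event type are obtained according to the second occurrence time of each second emergency in the emergency data set. The total number of texts associated with each second emergency, the number of characters shared between the associated text and those texts, and the second time difference between the publication time of the associated text and the second occurrence time of each second emergency are then obtained. Finally, the second emergency associated with the associated text is determined from these three quantities. Thus, even when the associated text does not contain a first occurrence time, the second emergency it describes can be determined accurately, and the texts describing emergencies in network information can be organized.
FIG. 5 is a schematic flowchart of an emergency monitoring method provided by another embodiment of the present disclosure. As shown in FIG. 5, the method includes stages 1 to 3.
Stage 1: based on the reference words contained in the lexicon, traverse network information to extract candidate texts containing the reference words, then perform semantic analysis on each candidate text to determine whether it is related to an emergency. If it is, the candidate text is an associated text of the emergency.
Stage 2: perform entity extraction on the associated text to obtain its first event type, first geographic location and first occurrence time.
Stage 3: after the first event type, first geographic location and first occurrence time have all been extracted from the associated text, obtain the first similarity between the first entity set and the second entity set of each first emergency in the emergency data set. The first entity set contains the first event type, first geographic location and first occurrence time, and the second entity set contains the second event type, second geographic location and second occurrence time. The first emergency described by the associated text is then determined according to the first similarity, and the associated text is stored with that first emergency in the emergency data set.
When no first occurrence time is extracted from the associated text, multiple second emergencies close to the publication time of the associated text are obtained from the emergency data set according to that publication time. A logistic regression model is then used to determine which second emergency the associated text describes, and the associated text is associated with that second emergency.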
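To implement the above embodiments, the present disclosure further provides an emergency monitoring device.
FIG. 6 is a schematic structural diagram of an emergency monitoring device provided by an embodiment of the present disclosure. As shown in FIG. 6, the emergency monitoring device 600 may include a first acquisition module 610, a first determination module 620, a second determination module 630, a third determination module 640 and a fourth determination module 650.
The first acquisition module 610 is configured to traverse network information based on the reference words contained in the lexicon, to extract candidate texts containing the reference words.
The first determination module 620 is configured to perform semantic analysis on the candidate texts to determine the associated text related to an emergency.
The second determination module 630 is configured to perform entity extraction on the associated text to determine the first entity set corresponding to the associated text.
The third determination module 640 is configured to determine the first similarity between the first entity set and the second entity set of each emergency in the emergency data set.
The fourth determination module 650 is configured to determine, when the first similarity between the first entity set and any second entity set is greater than the first threshold, that the associated text is an associated text of the first emergency corresponding to that second entity set.
In some embodiments, the second determination module 630 is specifically configured to: obtain the reference word set of each event type from the reference lexicon; determine the association probability value between the associated text and each event type according to the number of occurrences in the associated text of each reference word in that event type's reference word set and the weight of each reference word; determine the first event type of the associated text according to each association probability value and the second threshold of each event type; perform location extraction on the associated text according to the location entities contained in the location entity library, to determine the first geographic location of the associated text; and perform time extraction on the associated text based on a preset algorithm, to determine the first occurrence time of the associated text.
In some embodiments, the third determination module 640 is specifically configured to: determine the second similarity between the first event type and the second event type in the second entity set, the third similarity between the first geographic location and the second geographic location in the second entity set, and the fourth similarity between the first occurrence time and the second occurrence time in the second entity set; and determine the first similarity between the first entity set and the second entity set according to the second similarity, the third similarity and the fourth similarity.
In some embodiments, the third determination module 640 is specifically configured to determine the fourth similarity according to the first time difference between the first occurrence time and the second occurrence time.
In some embodiments, the device further includes an update module, specifically configured to: in response to the level of the first geographic location being higher than the level of the second geographic location in that second entity set, update the second geographic location according to the first geographic location, to obtain the updated second entity set.
In some embodiments, the update module is further configured to: in response to the first occurrence time being an absolute time and the second occurrence time in that second entity set being a relative time, update the second occurrence time according to the first occurrence time, to obtain the updated second entity set.
In some embodiments, the device further includes an association module, specifically configured to: determine the fifth similarity between the updated second entity set and each remaining second entity set; and, in response to any fifth similarity being greater than the first threshold, associate the emergency corresponding to the second entity set that has that fifth similarity with the emergency corresponding to the updated second entity set.
In some embodiments, the device further includes a storage module configured to store the associated text and the first entity set together in the emergency data set in response to every first similarity being less than or equal to the first threshold.
In some embodiments, the device further includes a fifth determination module, specifically configured to: in response to the first entity set not containing a first occurrence time, obtain, according to the second occurrence time of each second emergency in the emergency data set, multiple second emergencies within a preset period whose event type is the same as the first event type contained in the first entity set; obtain the total number of texts associated with each second emergency, the number of characters shared between the associated text and the texts associated with each second emergency, and the second time difference between the publication time of the associated text and the second occurrence time of each second emergency; and determine the second emergency associated with the associated text according to the total number of texts, the number of shared characters and the second time difference.
For the functions and implementation principles of the above modules, reference may be made to the method embodiments above, which are not repeated here.
The emergency monitoring device of the embodiments of the present disclosure first traverses network information based on the reference words contained in the lexicon to extract candidate texts containing the reference words, then performs semantic analysis on the candidate texts to determine the associated text related to an emergency, and performs entity extraction on the associated text to determine its first entity set. It then determines the first similarity between the first entity set and the second entity set of each emergency in the emergency data set, and finally, when the first similarity between the first entity set and any second entity set is greater than the first threshold, determines that the associated text is an associated text of the corresponding first emergency. By analyzing and organizing emergency texts in this way, information about emergencies can be mined from massive network information promptly and accurately, texts describing the same emergency can be clustered, and new emergencies can be discovered in time.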
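To implement the above embodiments, the present disclosure further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, the emergency monitoring method of the foregoing embodiments is implemented.
To implement the above embodiments, the present disclosure further provides a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, the emergency monitoring method of the foregoing embodiments is implemented.
To implement the above embodiments, the present disclosure further provides a computer program product including a computer program. When the computer program is executed by a processor, the emergency monitoring method of the foregoing embodiments is implemented.
FIG. 7 shows a block diagram of an exemplary electronic device suitable for implementing embodiments of the present disclosure. The electronic device 12 shown in FIG. 7 is only an example and should not limit the functions or scope of use of the embodiments of the present disclosure.
As shown in FIG. 7, the electronic device 12 takes the form of a general-purpose computing device, referred to below as computer device 12. Its components may include, but are not limited to, one or more processors or processing units 16, a system memory 28, and a bus 18 connecting the different system components (including the system memory 28 and the processing unit 16).
The bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus structures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The computer device 12 typically includes a variety of computer-system-readable media. These media may be any available media that can be accessed by the computer device 12, including volatile and non-volatile media and removable and non-removable media.
The memory 28 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. The computer device 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, the storage system 34 may be used to read from and write to a non-removable, non-volatile magnetic medium (not shown in FIG. 7, commonly called a "hard disk drive"). Although not shown in FIG. 7, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") and an optical disc drive for reading from and writing to a removable non-volatile optical disc (such as a Compact Disc Read Only Memory (CD-ROM), a Digital Video Disc Read Only Memory (DVD-ROM) or other optical media) may be provided. In these cases, each drive may be connected to the bus 18 through one or more data media interfaces. The memory 28 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present disclosure.
A program/utility 40 having a set of (at least one) program modules 42 may be stored, for example, in the memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules and program data, and each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods of the embodiments described in the present disclosure.
The computer device 12 may also communicate with one or more external devices 14 (such as a keyboard, a pointing device, a display 24 and the like), with one or more devices that enable a user to interact with the computer device 12, and/or with any device (such as a network card or a modem) that enables the computer device 12 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 22. The computer device 12 may also communicate with one or more networks, such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet, through a network adapter 20. As shown in the figure, the network adapter 20 communicates with the other modules of the computer device 12 through the bus 18. It should be understood that, although not shown in FIG. 7, other hardware and/or software modules may be used with the computer device 12, including but not limited to microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
The processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example implementing the methods mentioned in the foregoing embodiments.
In the technical solution of the present disclosure, network information is first traversed based on the reference words contained in the lexicon to extract candidate texts containing the reference words. Semantic analysis is then performed on the candidate texts to determine the associated text related to an emergency, and entity extraction is performed on the associated text to determine its first entity set. The first similarity between the first entity set and the second entity set of each emergency in the emergency data set is then determined, and, when the first similarity between the first entity set and any second entity set is greater than the first threshold, the associated text is determined to be an associated text of the corresponding first emergency. By analyzing and organizing emergency texts in this way, information about emergencies can be mined from massive network information promptly and accurately, texts describing the same emergency can be clustered, and new emergencies can be discovered in time.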
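In this specification, descriptions referring to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" mean that specific features, structures, materials or characteristics described in connection with that embodiment or example are included in at least one embodiment or example of the present disclosure. Such expressions do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine different embodiments or examples described in this specification, and features of different embodiments or examples, provided they do not contradict each other.
In addition, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly specifying the number of the technical features indicated. A feature qualified by "first" or "second" may therefore explicitly or implicitly include at least one such feature. In the description of the present disclosure, "multiple" means at least two, such as two or three, unless specifically defined otherwise.
Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment or portion of code that includes one or more executable instructions for implementing the steps of a custom logic function or process. The scope of the preferred embodiments of the present disclosure includes other implementations in which functions may be performed out of the order shown or discussed, including substantially simultaneously or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present disclosure belong.
The logic and/or steps represented in a flowchart or otherwise described herein may, for example, be regarded as an ordered list of executable instructions for implementing logic functions, and may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus or device (such as a computer-based system, a system including a processor, or another system that can fetch and execute instructions from an instruction execution system, apparatus or device). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate or transport a program for use by, or in connection with, an instruction execution system, apparatus or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) with one or more wires, a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), a fiber-optic device, and a portable compact disc read-only memory (CDROM). The computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting or otherwise processing it as necessary, and then stored in a computer memory.
It should be understood that the parts of the present disclosure may be implemented in hardware, software, firmware or a combination of them. In the above embodiments, multiple steps or methods may be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented with any one or a combination of the following techniques known in the art: a discrete logic circuit with logic gates for implementing logic functions on data signals, an application-specific integrated circuit with suitable combinational logic gates, a programmable gate array (PGA), a field programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will understand that all or part of the steps of the methods in the above embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination of them.
In addition, the functional units in the embodiments of the present disclosure may be integrated in one processing module, may each exist physically alone, or two or more units may be integrated in one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc or the like. Although embodiments of the present disclosure have been shown and described above, it can be understood that they are exemplary and are not to be construed as limiting the present disclosure. Those of ordinary skill in the art may change, modify, replace and vary the above embodiments within the scope of the present disclosure.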

Claims (19)

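  1. An emergency monitoring method, comprising:
    traversing network information based on reference words contained in a lexicon, to extract candidate texts containing the reference words;
    performing semantic analysis on the candidate texts to determine an associated text, contained in the candidate texts, that is related to an emergency;
    performing entity extraction on the associated text to determine a first entity set corresponding to the associated text;
    determining a first similarity between the first entity set and a second entity set corresponding to each emergency in an emergency data set;
    when the first similarity between the first entity set and any second entity set is greater than a first threshold, determining that the associated text is an associated text of a first emergency corresponding to said any second entity set.
  2. The method according to claim 1, wherein performing entity extraction on the associated text to determine the first entity set corresponding to the associated text comprises:
    obtaining a reference word set corresponding to each event type from the reference lexicon;
    determining an association probability value between the associated text and each event type according to the number of occurrences in the associated text of each reference word in the reference word set corresponding to each event type, and the weight of each reference word;
    determining a first event type corresponding to the associated text according to each association probability value and a second threshold corresponding to each event type;
    performing location extraction on the associated text according to location entities contained in a location entity library, to determine a first geographic location corresponding to the associated text;
    performing time extraction on the associated text based on a preset algorithm, to determine a first occurrence time corresponding to the associated text.
  3. The method according to claim 2, wherein determining the first similarity between the first entity set and the second entity set corresponding to each first emergency in the emergency data set comprises:
    determining a second similarity between the first event type and a second event type in the second entity set, a third similarity between the first geographic location and a second geographic location in the second entity set, and a fourth similarity between the first occurrence time and a second occurrence time in the second entity set;
    determining the first similarity between the first entity set and the second entity set according to the second similarity, the third similarity and the fourth similarity.
  4. The method according to claim 3, wherein determining the fourth similarity between the first occurrence time and the second occurrence time in the second entity set comprises:
    determining the fourth similarity according to a first time difference between the first occurrence time and the second occurrence time.
  5. The method according to claim 3, characterized in that, after determining that the associated text is an associated text of the first emergency corresponding to said any second entity set, the method further comprises:
    in response to the level of the first geographic location being higher than the level of the second geographic location in said any second entity set, updating the second geographic location in said any second entity set according to the first geographic location, to obtain an updated second entity set;
    or,
    in response to the first occurrence time being an absolute time and the second occurrence time in said any second entity set being a relative time, updating the second occurrence time in said any second entity set according to the first occurrence time, to obtain an updated second entity set.
  6. The method according to claim 5, wherein, after obtaining the updated second entity set, the method further comprises:
    determining a fifth similarity between the updated second entity set and each of the remaining second entity sets;
    in response to any fifth similarity being greater than the first threshold, associating the emergency corresponding to the second entity set that corresponds to said any fifth similarity with the emergency corresponding to the updated second entity set.
  7. The method according to any one of claims 1 to 6, wherein, after determining the first similarity between the first entity set and the second entity set corresponding to each emergency in the emergency data set, the method further comprises:
    in response to each first similarity being less than or equal to the first threshold, storing the associated text and the first entity set in association in the emergency data set.
  8. The method according to claim 2, wherein, after performing entity extraction on the associated text to determine the first entity set corresponding to the associated text, the method further comprises:
    in response to the first entity set not containing a first occurrence time, obtaining, according to the second occurrence time corresponding to each second emergency in the emergency data set, a plurality of second emergencies within a preset period that have the same event type as the first event type contained in the first entity set;
    obtaining the total number of texts associated with each second emergency, the number of characters shared between the associated text and the texts associated with each second emergency, and a second time difference between the publication time of the associated text and the second occurrence time corresponding to each second emergency;
    determining the second emergency associated with the associated text according to the total number of texts associated with the second emergency, the number of shared characters and the second time difference.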
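  9. An emergency monitoring device, comprising:
    a first acquisition module, configured to traverse network information based on reference words contained in a lexicon, to extract candidate texts containing the reference words;
    a first determination module, configured to perform semantic analysis on the candidate texts to determine an associated text, contained in the candidate texts, that is related to an emergency;
    a second determination module, configured to perform entity extraction on the associated text to determine a first entity set corresponding to the associated text;
    a third determination module, configured to determine a first similarity between the first entity set and a second entity set corresponding to each emergency in an emergency data set;
    a fourth determination module, configured to determine, when the first similarity between the first entity set and any second entity set is greater than a first threshold, that the associated text is an associated text of a first emergency corresponding to said any second entity set.
  10. The device according to claim 9, wherein the second determination module is specifically configured to:
    obtain a reference word set corresponding to each event type from the reference lexicon;
    determine an association probability value between the associated text and each event type according to the number of occurrences in the associated text of each reference word in the reference word set corresponding to each event type, and the weight of each reference word;
    determine a first event type corresponding to the associated text according to each association probability value and a second threshold corresponding to each event type;
    perform location extraction on the associated text according to location entities contained in a location entity library, to determine a first geographic location corresponding to the associated text;
    perform time extraction on the associated text based on a preset algorithm, to determine a first occurrence time corresponding to the associated text.
  11. The device according to claim 10, wherein the third determination module is specifically configured to:
    determine a second similarity between the first event type and a second event type in the second entity set, a third similarity between the first geographic location and a second geographic location in the second entity set, and a fourth similarity between the first occurrence time and a second occurrence time in the second entity set;
    determine the first similarity between the first entity set and the second entity set according to the second similarity, the third similarity and the fourth similarity.
  12. The device according to claim 11, wherein the third determination module is specifically configured to:
    determine the fourth similarity according to a first time difference between the first occurrence time and the second occurrence time.
  13. The device according to claim 11, characterized by further comprising an update module, specifically configured to:
    in response to the level of the first geographic location being higher than the level of the second geographic location in said any second entity set, update the second geographic location in said any second entity set according to the first geographic location, to obtain an updated second entity set;
    or,
    in response to the first occurrence time being an absolute time and the second occurrence time in said any second entity set being a relative time, update the second occurrence time in said any second entity set according to the first occurrence time, to obtain an updated second entity set.
  14. The device according to claim 13, further comprising an association module, specifically configured to:
    determine a fifth similarity between the updated second entity set and each of the remaining second entity sets;
    in response to any fifth similarity being greater than the first threshold, associate the emergency corresponding to the second entity set that corresponds to said any fifth similarity with the emergency corresponding to the updated second entity set.
  15. The device according to any one of claims 9 to 14, further comprising:
    a storage module, configured to store the associated text and the first entity set in association in the emergency data set in response to each first similarity being less than or equal to the first threshold.
  16. The device according to claim 10, further comprising a fifth determination module, specifically configured to:
    in response to the first entity set not containing a first occurrence time, obtain, according to the second occurrence time corresponding to each second emergency in the emergency data set, a plurality of second emergencies within a preset period that have the same event type as the first event type contained in the first entity set;
    obtain the total number of texts associated with each second emergency, the number of characters shared between the associated text and the texts associated with each second emergency, and a second time difference between the publication time of the associated text and the second occurrence time corresponding to each second emergency;
    determine the second emergency associated with the associated text according to the total number of texts associated with the second emergency, the number of shared characters and the second time difference.
  17. An electronic device, comprising:
    a memory;
    a processor; and
    a computer program stored in the memory and executable on the processor;
    wherein, when the processor executes the program, the emergency monitoring method according to any one of claims 1 to 8 is implemented.
  18. A computer-readable storage medium storing a computer program, wherein, when the computer program is executed by a processor, the emergency monitoring method according to any one of claims 1 to 8 is implemented.
  19. A computer program product comprising a computer program, wherein, when the computer program is executed by a processor, the emergency monitoring method according to any one of claims 1 to 8 is implemented.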
PCT/CN2022/142554 2021-12-29 2022-12-27 突发事件的监测方法及装置 WO2023125589A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111640233.4 2021-12-29
CN202111640233.4A CN114528396A (zh) 2021-12-29 2021-12-29 突发事件的监测方法、装置、电子设备及存储介质

Publications (1)

Publication Number Publication Date
WO2023125589A1 true WO2023125589A1 (zh) 2023-07-06

Family

ID=81620207

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/142554 WO2023125589A1 (zh) 2021-12-29 2022-12-27 突发事件的监测方法及装置

Country Status (2)

Country Link
CN (1) CN114528396A (zh)
WO (1) WO2023125589A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117713386A (zh) * 2024-02-05 2024-03-15 国网山东省电力公司东营市河口区供电公司 电网智能监测控制方法、装置

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114528396A (zh) * 2021-12-29 2022-05-24 北京辰安科技股份有限公司 突发事件的监测方法、装置、电子设备及存储介质
CN117557946B (zh) * 2024-01-10 2024-05-17 中国科学技术大学 视频事件描述与归因生成方法、系统、设备及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106886567A (zh) * 2017-01-12 2017-06-23 北京航空航天大学 基于语义扩展的微博突发事件检测方法及装置
WO2019184217A1 (zh) * 2018-03-26 2019-10-03 平安科技(深圳)有限公司 热点事件分类方法、装置及存储介质
CN112100374A (zh) * 2020-08-28 2020-12-18 清华大学 文本聚类方法、装置、电子设备及存储介质
CN113836267A (zh) * 2021-09-24 2021-12-24 国家市场监督管理总局信息中心 一种突发事件检测方法及装置
CN114528396A (zh) * 2021-12-29 2022-05-24 北京辰安科技股份有限公司 突发事件的监测方法、装置、电子设备及存储介质

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117713386A (zh) * 2024-02-05 2024-03-15 国网山东省电力公司东营市河口区供电公司 电网智能监测控制方法、装置
CN117713386B (zh) * 2024-02-05 2024-04-16 国网山东省电力公司东营市河口区供电公司 电网智能监测控制方法、装置

Also Published As

Publication number Publication date
CN114528396A (zh) 2022-05-24

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22914841

Country of ref document: EP

Kind code of ref document: A1