CN112541075A - Method and system for extracting standard case time of warning situation text - Google Patents


Info

Publication number
CN112541075A
Authority
CN
China
Prior art keywords
time
text
case
elements
clauses
Legal status
Granted
Application number
CN202011195667.3A
Other languages
Chinese (zh)
Other versions
CN112541075B (en)
Inventor
叶恺翔
吕晓宝
王坚
胡祥月
宋剑锋
王元兵
王海荣
Current Assignee
Sugon Nanjing Research Institute Co ltd
Original Assignee
Sugon Nanjing Research Institute Co ltd
Application filed by Sugon Nanjing Research Institute Co ltd
Priority to CN202011195667.3A
Publication of CN112541075A
Application granted
Publication of CN112541075B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Abstract

The invention discloses a method and a system for extracting the standard case time of an alert text, belonging to the technical field of public security alert text extraction. The method comprises: extracting the time elements in the alert text in order by named entity recognition; splitting the alert text into text clauses and constructing key-value pairs of text clauses and time elements; building and training a case time recognition model and using it to recognize what each text clause expresses, in order to determine the case time; standardizing the determined case times; and merging the standardized case times and marking the merged case times. By adding a case time recognition model on top of named entity recognition of time elements, the invention accurately recognizes and extracts case time information, helping police officers analyze and review alerts quickly and accurately.

Description

Method and system for extracting standard case time of warning situation text
Technical Field
The invention belongs to the technical field of public security alert text extraction, and particularly relates to a method and a system for extracting the standard case time of an alert text.
Background
Techniques for extracting time elements from text are mature: treated as a named entity recognition task, the problem is handled well by regular expressions, sequence labeling models and similar methods. A regular expression matches text against fixed templates of time expressions; a sequence labeling model relies on pre-annotated text data, from whose manual labels the machine learns the features of time elements in a text sequence.
However, in public security alert systems, current technology does not address how to distinguish the attribute of each time element in an alert text, convert it into a standard time format, and reason about the relationships among multiple times. The time elements in an alert text fall into the alarm time, the case time, other background times and so on; the case time is a time period or time point in a specific scenario. Existing models struggle to extract the case time from alert texts accurately, which greatly increases the workload of police officers.
Disclosure of Invention
The invention provides a method and a system for extracting the standard case time of an alert text, to solve the above problems in the prior art.
In order to achieve the purpose, the invention adopts the following technical scheme:
A method for extracting the standard case time of an alert text comprises the following steps:
Step 1: extract the time elements in the alert text in order by named entity recognition;
Step 2: split the alert text into text clauses and construct key-value pairs of text clauses and time elements;
Step 3: build and train a case time recognition model, and use it to recognize what each text clause expresses in order to determine the case time;
Step 4: standardize the determined case times;
Step 5: merge the standardized case times and mark the merged case times.
In a further embodiment, step 1 extracts the time elements with a regular expression, as follows:
Step 11: first, remove the content inside parentheses in the alert text, so that time elements inside parentheses do not interfere with extraction;
Step 12: then, extract the time elements from the text with the following regular expression, in which the Chinese tokens are rendered by their English glosses: ([0-9]{4}year)?([0-9]{1,2}month)?([0-9]{1,2}day)?(today|yesterday|the day before yesterday)?[\u4E00-\u9FA5]?(night|morning|a.m.|afternoon|evening)?[\u4E00-\u9FA5]?([0-9]{0,2}(hour|o'clock))?([0-9]{0,2}minute);
where:
([0-9]{4}year): four digits followed by "year", matching the year;
([0-9]{1,2}month): one or two digits followed by "month", matching the month;
([0-9]{1,2}day): one or two digits followed by "day", matching the day;
(today|yesterday|the day before yesterday)[\u4E00-\u9FA5]: matches the relative date expressions "today", "yesterday" and "the day before yesterday";
(night|morning|afternoon|evening)[\u4E00-\u9FA5]: matches period-of-day expressions such as "night", "morning", "afternoon" and "evening";
([0-9]{1,2}(hour|o'clock)): one or two digits followed by "hour" or "o'clock", matching a specific hour;
([0-9]{1,2}minute): one or two digits followed by "minute", matching a specific minute.
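The expression above can be sketched in Python. This is a minimal sketch, not the patent's implementation: the Chinese tokens (年, 月, 日, 今天/昨天/前天, 晚上/早上/上午/下午/傍晚/晚, 时/点, 分) are an assumed reconstruction of the English glosses, the hour alternative is written as the character class [时点] (a "|" inside square brackets would match a literal pipe), the hour and minute groups require one or two digits as the glosses describe, and the minute group is made optional so that expressions without minutes, such as 晚7点 ("7 in the evening"), still match. Because every group is optional, matches made up only of wildcard characters are discarded. The name extract_time_elements is hypothetical.

```python
import re

# Assumed Chinese tokens: 年 year, 月 month, 日 day, 今天/昨天/前天 today/yesterday/
# day before yesterday, 晚上/早上/上午/下午/傍晚/晚 periods of the day, 时/点 hour, 分 minute.
TIME_PATTERN = re.compile(
    r"([0-9]{4}年)?"
    r"([0-9]{1,2}月)?"
    r"([0-9]{1,2}日)?"
    r"(今天|昨天|前天)?"
    r"[\u4e00-\u9fa5]?"            # single-Chinese-character wildcard
    r"(晚上|早上|上午|下午|傍晚|晚)?"
    r"[\u4e00-\u9fa5]?"            # single-Chinese-character wildcard
    r"([0-9]{1,2}[时点])?"
    r"([0-9]{1,2}分)?"
)

# Groups that carry actual date/time content; wildcard-only matches are noise.
CONTENT_GROUPS = (1, 2, 3, 4, 6, 7)


def extract_time_elements(text):
    """Return the time expressions in the text in order of appearance."""
    elements = []
    for match in TIME_PATTERN.finditer(text):
        if any(match.group(g) for g in CONTENT_GROUPS):
            elements.append(match.group(0))
    return elements


sample = "2020年07月27日10时40分，报警人称：2020年7月21日晚7点许，被盗刷1次，损失100元。"
print(extract_time_elements(sample))
# ['2020年07月27日10时40分', '2020年7月21日晚7点']
```

Note that the wildcard may absorb a period word such as 晚 instead of the period group; the full matched text is kept, so later normalization can still look for those words in it.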
In a further embodiment, step 2 further comprises:
first, arrange the extracted time elements in the order in which they appear in the alert text, and take the first as the alarm time;
then, split the alert text into text clauses by regular-expression matching of punctuation marks;
finally, locate the text clauses containing the time elements other than the alarm time; if a text clause contains a time element while its left and right neighbouring clauses contain none, merge those neighbouring clauses with it into a new text clause; then construct key-value pairs that map each time element to its text clause one to one.
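Step 2 can be sketched as follows, assuming the time elements arrive from step 1 in order of appearance and that a time element belongs to the first clause containing its text. The delimiter set and the helper names split_clauses and build_clause_pairs are hypothetical, and merging each time-free neighbour independently is one reading of the merging rule (it could also be read as requiring both neighbours to be free of time elements).

```python
import re

# Assumed clause delimiters: full-width and ASCII commas, full stops, semicolons, ! and ?
CLAUSE_DELIMITERS = r"[，。；！？,;!?]"


def split_clauses(text):
    """Split on punctuation and drop empty pieces."""
    return [c for c in re.split(CLAUSE_DELIMITERS, text) if c.strip()]


def build_clause_pairs(text, time_elements):
    """Return (alarm_time, {time_element: clause}) for the non-alarm time elements."""
    if not time_elements:
        return None, {}
    clauses = split_clauses(text)
    has_time = [any(t in c for t in time_elements) for c in clauses]
    pairs = {}
    for element in time_elements[1:]:          # the first element is the alarm time
        idx = next((i for i, c in enumerate(clauses) if element in c), None)
        if idx is None:
            continue
        parts = [clauses[idx]]
        # Pull in a neighbouring clause only if it carries no time element itself.
        if idx > 0 and not has_time[idx - 1]:
            parts.insert(0, clauses[idx - 1])
        if idx + 1 < len(clauses) and not has_time[idx + 1]:
            parts.append(clauses[idx + 1])
        pairs[element] = "，".join(parts)
    return time_elements[0], pairs


text = "2020年07月27日10时40分，报警人称：2020年7月21日晚7点许，银行卡被盗刷1次，损失100元。"
alarm, pairs = build_clause_pairs(text, ["2020年07月27日10时40分", "2020年7月21日晚7点"])
print(alarm)   # 2020年07月27日10时40分
print(pairs)   # {'2020年7月21日晚7点': '报警人称：2020年7月21日晚7点许，银行卡被盗刷1次'}
```

Only the immediate neighbours are merged, so the time-free clause 损失100元 two positions away is not attached.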
In a further embodiment, the case time recognition model in step 3 comprises a pre-training model and a discriminant model;
the pre-training model first builds a database whose training data come from historical alert data in which the case times have been manually annotated; it determines the case time in an alert text by comparing the text clauses containing time elements with the training data, and the judged text clauses are labelled automatically and added to the database;
the discriminant model comprises an input layer, a hidden layer and an output layer. The input layer takes the text clauses, segmented from the alert text, that contain time elements, and its number of nodes equals the number of such clauses; the hidden layer holds the original data in the database together with the data added during pre-training; the output layer determines by comparison whether the time element in each text clause is the case time, and its number of nodes equals the number of text clauses to be judged. When the data encountered during judgment fall outside the coverage of the training database, the input text clauses are processed manually and the results are added to the database, so the hidden-layer data grow as training proceeds;
the discriminant model measures the error of its judgments as follows:
(Equation image for H(P, Q) not reproduced.)
where $X_{ij}$ is a text clause sample containing a time element; $P(X_{ij})$ is the probability that the time element in the text clause is the case time; $Q(X_{ij})$ is the probability that it is not, with $P(X_{ij}) + Q(X_{ij}) = 1$; $M$ is the number of hidden-layer nodes; and $N$ is the number of text clause samples containing time elements. The smaller $H(P, Q)$, the smaller the error of the judgment.
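Elsewhere in the description this error measure is identified as a cross-entropy loss, but the formula image is not reproduced, so its exact indexing over $M$ and $N$ cannot be recovered. The sketch below assumes the standard binary cross-entropy averaged over the N clause samples, with $Q = 1 - P$ and manually annotated labels; it illustrates cross-entropy as an error measure and is not the patent's formula. The name binary_cross_entropy is hypothetical.

```python
import math


def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean binary cross-entropy over N clause samples.

    labels: 1 if the clause's time element is the case time (manual annotation), else 0.
    probs:  predicted P(X) that it is the case time; Q(X) = 1 - P(X).
    """
    if not labels or len(labels) != len(probs):
        raise ValueError("labels and probs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)   # clamp to avoid log(0)
        q = 1.0 - p
        total -= y * math.log(p) + (1 - y) * math.log(q)
    return total / len(labels)


print(round(binary_cross_entropy([1, 0], [0.9, 0.2]), 4))  # 0.1643
```

Confident, correct predictions drive the value toward zero, which matches the statement that a smaller H(P, Q) means a smaller judgment error.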
In a further embodiment, step 4 further comprises:
Step 41: determine the year, month, day and hour directly with the regular expressions "[0-9]{4}year", "[0-9]{1,2}month", "[0-9]{1,2}day" and "[0-9]{0,2}(hour|o'clock)", then proceed to the next step;
Step 42: if "night", "afternoon" or "evening" appears in the time element and the hour matched by "[0-9]{0,2}(hour|o'clock)" is less than 12, add 12 to the hour;
Step 43: if the day is missing and the time element contains "today", "yesterday" or "the day before yesterday", count back 0, 1 or 2 days from the alarm time to obtain the day;
Step 44: if any single element among year, month, day and hour is missing, fill it with the corresponding element of the preceding time element;
Step 45: normalize the time elements into standard case times in the 10-digit "yyyymmddhh" format.
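Steps 41 to 45 can be sketched as a single function. This is a minimal sketch under stated assumptions: the Chinese tokens (年, 月, 日, 时/点, 晚/下午/傍晚, 今天/昨天/前天) are an assumed reconstruction of the glossed expressions; in step 43 the year and month are also taken from the shifted alarm date so that month boundaries are handled, which goes slightly beyond the text's wording; the preceding element is passed in explicitly (for the first case-time element it can be the alarm time); and normalize_time is a hypothetical name.

```python
import re
from datetime import date, timedelta

# Assumed Chinese tokens: 年 year, 月 month, 日 day, 时/点 hour.
FIELD_PATTERNS = (
    ("year", r"([0-9]{4})年"),
    ("month", r"([0-9]{1,2})月"),
    ("day", r"([0-9]{1,2})日"),
    ("hour", r"([0-9]{1,2})[时点]"),
)
PM_WORDS = ("晚", "下午", "傍晚")                  # night / afternoon / evening (assumed)
RELATIVE_DAYS = {"今天": 0, "昨天": 1, "前天": 2}  # today / yesterday / day before yesterday


def normalize_time(element, alarm_date, previous):
    """Return a 10-digit yyyymmddhh string for one time element.

    alarm_date: datetime.date of the alarm time (used in step 43).
    previous:   (year, month, day, hour) of the preceding normalized element (step 44).
    """
    fields = {}
    for name, pattern in FIELD_PATTERNS:                        # step 41
        match = re.search(pattern, element)
        fields[name] = int(match.group(1)) if match else None

    hour = fields["hour"]
    if hour is not None and hour < 12 and any(w in element for w in PM_WORDS):
        fields["hour"] = hour + 12                              # step 42

    if fields["day"] is None:                                   # step 43
        for word, offset in RELATIVE_DAYS.items():
            if word in element:
                d = alarm_date - timedelta(days=offset)
                fields["year"], fields["month"], fields["day"] = d.year, d.month, d.day
                break

    for i, (name, _) in enumerate(FIELD_PATTERNS):              # step 44
        if fields[name] is None:
            fields[name] = previous[i]

    return "{:04d}{:02d}{:02d}{:02d}".format(                   # step 45
        fields["year"], fields["month"], fields["day"], fields["hour"])


alarm = date(2020, 7, 27)
print(normalize_time("2020年7月21日晚7点", alarm, (2020, 7, 27, 10)))  # 2020072119
print(normalize_time("昨天下午3点", alarm, (2020, 7, 27, 10)))         # 2020072615
```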
In a further embodiment, step 5 further comprises:
Step 51: determine whether two adjacent time elements appear in the same text clause; if they do and the former time is earlier than the latter, merge their standard case times into a case time period; otherwise, proceed to the next step;
Step 52: compute the difference in hours between the two adjacent time elements; if it is less than 24 hours and the former time is earlier than the latter, merge their standard case times into a case time period; otherwise, proceed to the next step;
Step 53: search the text clauses for keywords; if the clause of the former time element contains a keyword meaning "start", the clause of the latter contains a keyword meaning "end", and the former time is earlier than the latter, merge their standard case times into a case time period; otherwise, proceed to the next step;
Step 54: designate the standard case times of the remaining time elements as case time points;
Step 55: mark the case time periods and case time points in chronological order.
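Steps 51 to 55 can be sketched as follows, taking the standardized times paired with their text clauses. The Chinese keywords standing in for "start" and "end" are hypothetical, adjacent elements are paired greedily from left to right with each element used in at most one period (the text does not say how overlapping pairs are resolved), and all comparisons are at the hour resolution of the yyyymmddhh format. The function names are hypothetical.

```python
from datetime import datetime

START_WORDS = ("开始", "起")   # hypothetical Chinese "start" keywords
END_WORDS = ("结束", "止")     # hypothetical Chinese "end" keywords


def _to_datetime(stamp):
    return datetime.strptime(stamp, "%Y%m%d%H")


def should_merge(former, latter):
    """former, latter: (yyyymmddhh, clause) of two adjacent case-time elements."""
    (t1, c1), (t2, c2) = former, latter
    start, end = _to_datetime(t1), _to_datetime(t2)
    if start >= end:                                  # every rule requires former < latter
        return False
    if c1 == c2:                                      # step 51: same text clause
        return True
    if (end - start).total_seconds() < 24 * 3600:     # step 52: less than 24 hours apart
        return True
    return (any(w in c1 for w in START_WORDS)         # step 53: start/end keywords
            and any(w in c2 for w in END_WORDS))


def merge_case_times(items):
    """items: (yyyymmddhh, clause) pairs in order of appearance.

    Returns ("period", start, end) and ("point", time) tuples sorted by time (step 55).
    """
    marked, i = [], 0
    while i < len(items):
        if i + 1 < len(items) and should_merge(items[i], items[i + 1]):
            marked.append(("period", items[i][0], items[i + 1][0]))
            i += 2                                    # both elements consumed by the period
        else:
            marked.append(("point", items[i][0]))     # step 54
            i += 1
    return sorted(marked, key=lambda m: m[1])


clause_a = "报警人称：2020年7月21日晚7点许，银行卡被盗刷1次"
clause_b = "2020年7月25日6时49分至2020年7月25日8时10分被盗刷2笔"
print(merge_case_times([("2020072119", clause_a), ("2020072506", clause_b), ("2020072508", clause_b)]))
# [('point', '2020072119'), ('period', '2020072506', '2020072508')]
```

Sorting on the yyyymmddhh string is enough for step 55, because that fixed-width format orders lexicographically in time order.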
A system for extracting the standard case time of an alert text comprises:
a first module for extracting the time elements in the alert text in order by named entity recognition;
a second module for splitting the alert text into text clauses and constructing key-value pairs of text clauses and time elements;
a third module for building and training a case time recognition model and using it to recognize what each text clause expresses, in order to determine the case time;
a fourth module for standardizing the determined case times;
and a fifth module for merging the standardized case times and marking the merged case times.
In a further embodiment, the first module extracts the time elements with a regular expression: it first removes the content inside parentheses in the alert text, so that time elements inside parentheses do not interfere, and then extracts the time elements with the following regular expression, in which the Chinese tokens are rendered by their English glosses:
([0-9]{4}year)?([0-9]{1,2}month)?([0-9]{1,2}day)?(today|yesterday|the day before yesterday)?[\u4E00-\u9FA5]?(night|morning|a.m.|afternoon|evening)?[\u4E00-\u9FA5]?([0-9]{0,2}(hour|o'clock))?([0-9]{0,2}minute);
where:
([0-9]{4}year): four digits followed by "year", matching the year;
([0-9]{1,2}month): one or two digits followed by "month", matching the month;
([0-9]{1,2}day): one or two digits followed by "day", matching the day;
(today|yesterday|the day before yesterday)[\u4E00-\u9FA5]: matches the relative date expressions "today", "yesterday" and "the day before yesterday";
(night|morning|afternoon|evening)[\u4E00-\u9FA5]: matches period-of-day expressions such as "night", "morning", "afternoon" and "evening";
([0-9]{1,2}(hour|o'clock)): one or two digits followed by "hour" or "o'clock", matching a specific hour;
([0-9]{1,2}minute): one or two digits followed by "minute", matching a specific minute;
the second module first arranges the extracted time elements in the order in which they appear in the alert text and takes the first as the alarm time; it then splits the alert text into text clauses by regular-expression matching of punctuation marks; finally, it locates the text clauses containing the time elements other than the alarm time, and if a text clause contains a time element while its left and right neighbouring clauses contain none, merges those neighbouring clauses with it into a new text clause; it then constructs key-value pairs that map each time element to its text clause one to one.
The third module builds and trains a case time recognition model comprising a pre-training model and a discriminant model;
the pre-training model first builds a database whose training data come from historical alert data in which the case times have been manually annotated; it determines the case time in an alert text by comparing the text clauses containing time elements with the training data, and the judged text clauses are labelled automatically and added to the database;
the discriminant model comprises an input layer, a hidden layer and an output layer. The input layer takes the text clauses, segmented from the alert text, that contain time elements, and its number of nodes equals the number of such clauses; the hidden layer holds the original data in the database together with the data added during pre-training; the output layer determines by comparison whether the time element in each text clause is the case time, and its number of nodes equals the number of text clauses to be judged. When the data encountered during judgment fall outside the coverage of the training database, the input text clauses are processed manually and the results are added to the database, so the hidden-layer data grow as training proceeds;
the discriminant model measures the error of its judgments as follows:
(Equation image for H(P, Q) not reproduced.)
where $X_{ij}$ is a text clause sample containing a time element; $P(X_{ij})$ is the probability that the time element in the text clause is the case time; $Q(X_{ij})$ is the probability that it is not, with $P(X_{ij}) + Q(X_{ij}) = 1$; $M$ is the number of hidden-layer nodes; and $N$ is the number of text clause samples containing time elements. The smaller $H(P, Q)$, the smaller the error of the judgment.
In a further embodiment, the fourth module first determines the year, month, day and hour directly with the regular expressions "[0-9]{4}year", "[0-9]{1,2}month", "[0-9]{1,2}day" and "[0-9]{0,2}(hour|o'clock)"; when "night", "afternoon" or "evening" appears in the time element and the hour matched by "[0-9]{0,2}(hour|o'clock)" is less than 12, it adds 12 to the hour; when the day is missing and the time element contains "today", "yesterday" or "the day before yesterday", it counts back 0, 1 or 2 days from the alarm time to obtain the day; if any single element among year, month, day and hour is missing, it fills it with the corresponding element of the preceding time element; finally, it normalizes the time elements into standard case times in the 10-digit "yyyymmddhh" format;
the fifth module first determines whether two adjacent time elements appear in the same text clause, and if they do and the former time is earlier than the latter, merges their standard case times into a case time period; otherwise, it computes the difference in hours between the two adjacent time elements and, if the difference is less than 24 hours and the former time is earlier than the latter, merges their standard case times into a case time period; otherwise, it searches the text clauses for keywords and, if the clause of the former time element contains a keyword meaning "start", the clause of the latter contains a keyword meaning "end", and the former time is earlier than the latter, merges their standard case times into a case time period; it designates the standard case times of the remaining time elements as case time points; finally, it marks the case time periods and case time points in chronological order.
A computer processing system comprises a storage module that stores a computer program implementing the method for extracting the standard case time of an alert text according to any of the above embodiments.
Advantages: first, a case time recognition model is added on top of named entity recognition of time elements, so that case time information is accurately recognized and extracted, making it easier for police officers to analyze and review alerts quickly;
second, the alert text is split into text clauses containing time elements, key-value pairs of text clauses and time elements are constructed, and each clause is semantically recognized to judge whether its time element is the case time, which reduces the difficulty of recognizing and extracting case times from complex alert texts;
finally, the case times are merged and marked as case time points and case time periods, which supports police officers in analyzing alerts quickly and accurately.
Drawings
Fig. 1 is a flow chart of extracting the standard case time of an alert text according to the present invention.
Fig. 2 is a schematic diagram of the structure of the discriminant model of the present invention.
Detailed Description
The technical solution of the present invention is described clearly and completely below with reference to the drawings and embodiments. All other embodiments obtained by a person skilled in the art from these embodiments without inventive effort fall within the scope of the present invention.
Research shows that the time elements in alert texts fall into the alarm time, the case time, other background times and so on, but existing public security alert systems can hardly distinguish these attributes or reason about the relationships among multiple times, and therefore cannot accurately identify and extract the case time. This work usually requires police officers to identify and mark the times manually, which greatly increases their workload.
Example 1: As shown in Fig. 1, to solve the problems in the prior art, Embodiment 1 of the present invention provides a method for extracting the standard case time of an alert text, comprising the following steps:
Step 1: extract the time elements in the alert text in order by named entity recognition;
Step 2: split the alert text into text clauses and construct key-value pairs of text clauses and time elements;
Step 3: build and train a case time recognition model, and use it to recognize what each text clause expresses in order to determine the case time;
Step 4: standardize the determined case times;
Step 5: merge the standardized case times and mark the merged case times.
Alert texts are complex and contain many time elements with different attributes, mainly the alarm time, the case time and other background times. These differing attributes make the case time considerably harder to extract. To illustrate the embodiments in more detail, consider the following simple alert text: "At 10:40 on 27 July 2020, the caller reported: at around 7 in the evening on 21 July 2020, the bank card of the reporting person Zhang San (household registration address: xxx, ID number: xxx, date of birth: xxx) was fraudulently used once, with a loss of 100 yuan; from 6:49 on 25 July 2020 to 8:10 on 25 July 2020, the card was fraudulently used in 2 more transactions, with a total loss of 200 yuan. At 9 a.m. on 26 July 2020, the reporting person went to the bank to report the card lost." In this text, "10:40 on 27 July 2020" is the alarm time; "around 7 in the evening on 21 July 2020" and "6:49 on 25 July 2020 to 8:10 on 25 July 2020" are case times; and the reporting person's date of birth is other background time. These complex attributes greatly increase the difficulty of identifying the case time. Moreover, some time elements in alert texts are in non-standard formats, such as "7 in the evening", which further complicates recognition.
Therefore, to discriminate the attributes of time elements accurately, the useful time elements in the alert text must first be extracted accurately. Step 1 uses a regular expression to extract the time elements, as follows:
first, remove the content inside parentheses in the alert text to exclude interfering time elements. In the example text above, the parentheses contain the reporting person's date of birth; in practice, parenthesized content often includes such birth dates, which interfere with the extraction of time elements and should be excluded first;
then, extract the time elements from the text with the following regular expression, in which the Chinese tokens are rendered by their English glosses:
([0-9]{4}year)?([0-9]{1,2}month)?([0-9]{1,2}day)?(today|yesterday|the day before yesterday)?[\u4E00-\u9FA5]?(night|morning|a.m.|afternoon|evening)?[\u4E00-\u9FA5]?([0-9]{0,2}(hour|o'clock))?([0-9]{0,2}minute);
where:
([0-9]{4}year): four digits followed by "year", matching the year;
([0-9]{1,2}month): one or two digits followed by "month", matching the month;
([0-9]{1,2}day): one or two digits followed by "day", matching the day;
(today|yesterday|the day before yesterday)[\u4E00-\u9FA5]: matches the relative date expressions "today", "yesterday" and "the day before yesterday"; the single-Chinese-character wildcard broadens matching and brings it closer to how reporting persons describe times in everyday speech;
(night|morning|afternoon|evening)[\u4E00-\u9FA5]: matches period-of-day expressions such as "night", "morning", "afternoon" and "evening"; the single-character wildcard broadens matching and handles irregular, colloquial time expressions in alert texts, for example avoiding extracting "7 in the evening" as plain "7 o'clock", which keeps the extraction accurate;
([0-9]{1,2}(hour|o'clock)): one or two digits followed by "hour" or "o'clock", matching a specific hour;
([0-9]{1,2}minute): one or two digits followed by "minute", matching a specific minute;
applying the regular expression to the example alert text extracts, in order, the time elements "10:40 on 27 July 2020", "7 in the evening on 21 July 2020", "6:49 on 25 July 2020", "8:10 on 25 July 2020" and "9 a.m. on 26 July 2020".
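The parenthesis removal in step 11 deserves care: Chinese alert texts normally use full-width brackets （ ）, and a simple non-greedy pattern such as \(.*?\) would miss them and mishandle nested brackets. The sketch below tracks nesting depth across both bracket styles. The function name remove_parenthesized is hypothetical; a bracket that is never closed removes the rest of the text, and mixed styles (an ASCII bracket closed by a full-width one) are treated as a pair.

```python
OPENING = "(（"   # ASCII and full-width opening brackets
CLOSING = ")）"   # ASCII and full-width closing brackets


def remove_parenthesized(text):
    """Drop everything inside (possibly nested) ASCII or full-width parentheses."""
    kept, depth = [], 0
    for ch in text:
        if ch in OPENING:
            depth += 1
        elif ch in CLOSING and depth > 0:
            depth -= 1
        elif depth == 0:
            kept.append(ch)     # outside any bracket; stray closing brackets are kept as-is
    return "".join(kept)


print(remove_parenthesized("报警人张三（户籍地址：xxx，出生日期：xxx）的银行卡被盗刷1次"))
# 报警人张三的银行卡被盗刷1次
```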
Because alert texts are complex and contain many time elements, judging the case time directly is difficult. The alert text is therefore split into text clauses, and each clause is semantically recognized to judge whether its time element is the case time. In a further embodiment, step 2 specifically comprises:
first, arrange the extracted time elements in the order in which they appear in the alert text and take the first as the alarm time; in the example text, "10:40 on 27 July 2020" is the alarm time;
then, split the alert text into text clauses by regular-expression matching of punctuation marks. Some clauses will then contain no time element. Their content is still needed to judge whether a time element is the case time, so they cannot simply be discarded; instead, they are merged into the neighbouring clauses that contain time elements, completing the context before and after those clauses so that the case time can be judged accurately;
finally, locate the text clauses containing the time elements other than the alarm time; if a text clause contains a time element while its left and right neighbouring clauses contain none, merge those neighbouring clauses with it into a new text clause; then construct key-value pairs that map each time element to its text clause one to one. The key-value pairs constructed from the example alert text are as follows:
(Table of key-value pairs for the example alert text; image not reproduced.)
after the time elements are extracted, if the time elements need to be judged to be the case time, the accurate judgment can be carried out according to the semantics of the text clauses where the time elements are located. In a further embodiment, the case time recognition model is established and trained to recognize the expression content of the text clause, so as to determine whether the time element is the case time. The pattern time recognition model comprises a pre-training model and a discrimination model.
Firstly, a database is established through a pre-training model, and training data in the database is derived from historical warning situation data of artificially marked case sending time. Then comparing the text clauses containing time elements in the warning situation text with the training data to determine case time in the warning situation text; and the distinguished text clause data is automatically marked and then is supplemented into the database, so that the content of the database is further enriched, and the case time is rapidly distinguished in the actual distinguishing process.
With reference to Fig. 2, the discriminant model comprises an input layer, a hidden layer, and an output layer. The input layer receives the text clauses that are segmented from the alert text and contain time elements, with one node per clause. The hidden layer holds the original data in the database together with the data added during pre-training. The output layer decides, by comparison, whether the time element in each clause is the case time, and its number of nodes equals the number of clauses to be judged. Each clause from the input layer is compared for similarity against the hidden-layer data, and the output layer emits the result. When the data encountered during discrimination extend beyond the training database, the input clause can be processed manually and the result added to the database, so that the hidden-layer data grow as training proceeds. Consequently, the more text clauses the discriminant model processes, the easier discrimination becomes.
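The patent does not state how the similarity comparison is computed. The sketch below substitutes a simple nearest-neighbour lookup using `difflib.SequenceMatcher.ratio()` from the Python standard library; the function name `discriminate`, the `min_sim` threshold and the in-memory list standing in for the database are all assumptions. It mirrors the behaviour described above: a sufficiently similar stored clause supplies the label and the new clause is fed back automatically, while an unmatched clause is returned for manual handling.

```python
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical in-memory stand-in for the labelled database:
# (clause text, True if its time element was labelled as the case time).
Database = list[tuple[str, bool]]


def discriminate(clause: str, db: Database, min_sim: float = 0.6) -> tuple[Optional[bool], float]:
    """Label a clause by its most similar stored clause (nearest neighbour)."""
    best_label: Optional[bool] = None
    best_sim = 0.0
    for stored, label in db:
        sim = SequenceMatcher(None, clause, stored).ratio()
        if sim > best_sim:
            best_label, best_sim = label, sim
    if best_sim < min_sim:
        # Outside the database's coverage: leave for manual labelling.
        return None, best_sim
    # Automatically label the clause and feed it back into the database.
    db.append((clause, best_label))
    return best_label, best_sim
```

Because accepted clauses are appended to `db`, later lookups have more neighbours to compare against, which is the growth effect the paragraph above attributes to the hidden layer.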
A cross-entropy loss function is used to measure the error of the discriminant model's results, so as to improve their accuracy:
[Figure BDA0002753938350000101: equation for the cross-entropy loss H(P, Q)]
where X_ij is a text clause sample containing a time element; P(X_ij) is the probability that the time element in the text clause is the case time; Q(X_ij) is the probability that it is not the case time, with P(X_ij) + Q(X_ij) = 1; M is the number of hidden-layer nodes; and N is the number of text clause samples containing time elements. A smaller value of H(P, Q) indicates a smaller error in the discrimination result, so whether to adopt the determined case time can be decided by comparing the measured error with a set value. For example, if H(P, Q) is less than the set value, the result is adopted and the time element in the judged text clause is taken as the case time; if H(P, Q) is not less than the set value, the result is not adopted, and whether that time element is the case time is judged manually.
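The exact expression for H(P, Q) is given only as an image in the source. As an assumption, the sketch below uses the conventional mean binary cross-entropy, H = -(1/N) Σ [y log P(X) + (1 - y) log Q(X)] with Q(X) = 1 - P(X) and y = 1 when a verified label says the time element is the case time. It omits the sum over the M hidden-layer nodes, so it may differ from the patented formula; the function names and threshold values are hypothetical.

```python
import math


def binary_cross_entropy(labels: list[int], probs: list[float], eps: float = 1e-12) -> float:
    """Mean binary cross-entropy.

    probs[i] plays the role of P(X) (time element is the case time) and
    1 - probs[i] the role of Q(X); labels[i] is 1 if a verified label says
    the time element is the case time, else 0.
    """
    if not labels or len(labels) != len(probs):
        raise ValueError("labels and probs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clip to keep log() finite
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(labels)


def accept(h: float, threshold: float) -> bool:
    """Adopt the discriminated case times only if the error is below the set value."""
    return h < threshold
```

For instance, predicting 0.5 for every sample gives H = ln 2 (about 0.693), which a set value of 0.5 would reject and route to manual judgment.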
Because the extracted time elements contain Chinese characters, the same kind of information can be written in different ways. For example, "7 o'clock on the evening of July 21, 2020" and "6:49 on July 25, 2020" are both time elements, yet "evening, 7 o'clock" and "6 h 49 min" are two different ways of expressing a time of day. If such representations are not unified, digital archiving is hindered and police officers find it harder to analyze and handle cases. The case time therefore needs to be standardized. In a further embodiment, step 4 is as follows:
step 41: directly determining the "year", "month", "day" and "hour" of the time element with the regular expressions "[0-9]{4}year", "[0-9]{1,2}month", "[0-9]{1,2}day" and "[0-9]{0,2}[hour|point]", and proceeding to the next step;
step 42: if "night", "afternoon" or "evening" appears in the time element text and the hour matched by "[0-9]{0,2}[hour|point]" is less than 12, adding 12 to the hour;
step 43: if the "day" element is missing and the time element contains "today", "yesterday" or "the day before yesterday", counting back 0, 1 or 2 days respectively from the alarm time to obtain the corresponding "day" element;
step 44: if any single element among "year", "month", "day" and "hour" is missing, filling it with the corresponding element of the preceding time element;
step 45: standardizing the time element into a standard case time in the 10-digit "yyyymmddhh" format, where digits 1 to 4 give the year, digits 5 to 6 the month, digits 7 to 8 the day, and digits 9 to 10 the hour.
Accordingly, the case times in the example alert text, namely 7 o'clock on the evening of July 21, 2020, 6:49 on July 25, 2020, and 8:10 on July 25, are normalized to "2020072119", "2020072506" and "2020072508" respectively; the missing year of the last one is filled from the preceding time element according to step 44.
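Steps 41 to 45 can be sketched in Python as below. The Chinese keywords (年, 月, 日, 时/点, 晚/下午/晚上, 今天/昨天/前天) are assumptions about the original terms behind the translated words "year", "month", "day", "hour/point", "night/afternoon/evening" and "today/yesterday/the day before yesterday"; the function name `normalize` and the alarm time used below are hypothetical. The hour pattern requires at least one digit, a small deviation from the `{0,2}` quantifier in the text, and the sketch fills every missing field in step 44 rather than only a single one.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Assumed Chinese keywords behind the translated terms of steps 41 to 43.
FIELD_PATTERNS = {
    "year": re.compile(r"(\d{4})年"),
    "month": re.compile(r"(\d{1,2})月"),
    "day": re.compile(r"(\d{1,2})日"),
    "hour": re.compile(r"(\d{1,2})[时点]"),
}
EVENING_WORDS = ("晚", "下午", "晚上")            # night, afternoon, evening
RELATIVE_DAYS = {"今天": 0, "昨天": 1, "前天": 2}  # today, yesterday, day before yesterday


def normalize(element: str, alarm: datetime, prev: Optional[dict] = None) -> tuple[str, dict]:
    """Return (yyyymmddhh, fields) for one time element."""
    # Step 41: read whichever fields are written explicitly.
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        m = pattern.search(element)
        fields[name] = int(m.group(1)) if m else None
    # Step 42: move afternoon/evening hours onto the 24-hour clock.
    hour = fields["hour"]
    if hour is not None and hour < 12 and any(w in element for w in EVENING_WORDS):
        fields["hour"] = hour + 12
    # Step 43: resolve relative days against the alarm date.
    if fields["day"] is None:
        for word, offset in RELATIVE_DAYS.items():
            if word in element:
                d = alarm - timedelta(days=offset)
                for name, value in (("year", d.year), ("month", d.month), ("day", d.day)):
                    if fields[name] is None:
                        fields[name] = value
                break
    # Step 44: fill remaining gaps from the preceding time element
    # (the alarm time itself when there is no earlier element).
    if prev is None:
        prev = {"year": alarm.year, "month": alarm.month, "day": alarm.day, "hour": alarm.hour}
    for name in fields:
        if fields[name] is None:
            fields[name] = prev[name]
    # Step 45: fixed-width 10-digit code.
    code = f"{fields['year']:04d}{fields['month']:02d}{fields['day']:02d}{fields['hour']:02d}"
    return code, fields
```

Passing each call's returned `fields` as `prev` to the next call reproduces the worked example above: with a hypothetical alarm time of `datetime(2020, 7, 26, 10)`, the elements 2020年7月21日晚7点, 2020年7月25日6时49分 and 7月25日8时10分 normalize to 2020072119, 2020072506 and 2020072508.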
In an alert text, the harm reported by the caller may correspond to several time points, separated by intervals that range from a few hours to several days. If each time point were treated as a separate case time, closely spaced events would lose their connection to one another, making the case harder for police to solve. Therefore, in a further embodiment, the standardized case times are merged: time points separated by short intervals, or time points that are clearly related, are merged, and the merged case times are labeled, which makes the case easier for police to analyze. The specific process of step 5 is:
step 51: judging whether two adjacent time elements appear in the same text clause; if they do and the former time is earlier than the latter, merging the standard case times corresponding to the two time elements into a case time period; otherwise, executing the next step;
step 52: calculating the difference in hours between the two adjacent time elements; if it is less than 24 hours and the former time is earlier than the latter, merging the corresponding standard case times into a case time period; otherwise, executing the next step;
step 53: searching the text clauses for keywords; if the clause corresponding to the former time element contains a start-type keyword (such as "start"), the clause corresponding to the latter time element contains an end-type keyword (such as "end"), and the former time is earlier than the latter, merging the corresponding standard case times into a case time period; otherwise, executing the next step;
step 54: taking the standard case times corresponding to the remaining time elements as case time points;
step 55: labeling the case time periods and case time points in chronological order, for example as "first case time", "second case time", "third case time", and so on.
In the example alert text, the case time of 7 o'clock on the evening of July 21, 2020 is a case time point, while 6:49 on July 25, 2020 and 8:10 on July 25 are merged into a case time period. They can be labeled "first case time: 2020072119" and "second case time: 2020072506 to 2020072508". This not only facilitates digital archiving but also labels each case time so that officers need not memorize the times; the labels show the order in which events occurred, which makes cases considerably easier to handle and allows officers to analyze them quickly and accurately.
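The merging and labeling of steps 51 to 55 can be sketched as below. The keyword tuples stand in for the start-type and end-type keywords and are assumptions, as are the function names and the English labels. Adjacent case times are merged greedily, so a run of three or more mergeable times forms a single period; this is one reasonable reading of the steps, not necessarily the patented one.

```python
from datetime import datetime

# Hypothetical stand-ins for the start-type and end-type keywords of step 53.
START_WORDS = ("开始", "起始")
END_WORDS = ("结束", "终止")


def _to_dt(code: str) -> datetime:
    return datetime.strptime(code, "%Y%m%d%H")


def should_merge(a: tuple[str, str], b: tuple[str, str]) -> bool:
    """Steps 51 to 53 for two adjacent (yyyymmddhh, clause) items."""
    (code_a, clause_a), (code_b, clause_b) = a, b
    ta, tb = _to_dt(code_a), _to_dt(code_b)
    if not ta < tb:  # every merge rule requires the former time to be earlier
        return False
    if clause_a == clause_b:  # step 51: same text clause
        return True
    if (tb - ta).total_seconds() < 24 * 3600:  # step 52: less than 24 hours apart
        return True
    # Step 53: start keyword before, end keyword after.
    return any(w in clause_a for w in START_WORDS) and any(w in clause_b for w in END_WORDS)


def merge_and_label(items: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return (label, case time point or period) pairs in chronological order."""
    groups: list[list[tuple[str, str]]] = []
    for item in items:
        if groups and should_merge(groups[-1][-1], item):
            groups[-1].append(item)  # extend the current case time period
        else:
            groups.append([item])  # a group left with one item is a time point (step 54)
    # Step 55: order by start time; fixed-width codes sort correctly as strings.
    groups.sort(key=lambda g: g[0][0])
    labelled = []
    for k, g in enumerate(groups, start=1):
        span = g[0][0] if len(g) == 1 else f"{g[0][0]}-{g[-1][0]}"
        labelled.append((f"case time {k}", span))
    return labelled
```

Applied to the example above with abbreviated clause texts, `merge_and_label([("2020072119", "A"), ("2020072506", "B"), ("2020072508", "C")])` returns `[("case time 1", "2020072119"), ("case time 2", "2020072506-2020072508")]`: the first two times are more than 24 hours apart, while the last two are only two hours apart.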
Example 2: Embodiment 2 of the invention provides a standard case time extraction system for alert text, comprising a first module, a second module, a third module, a fourth module and a fifth module;
the first module is used for sequentially extracting the time elements in the alert text by named entity recognition;
the second module is used for splitting the alert text into a plurality of text clauses and constructing key-value pairs of the text clauses and time elements;
the third module is used for establishing and training a case time recognition model, which recognizes the content expressed in the text clauses in order to determine the case time;
the fourth module is used for standardizing the determined case time;
the fifth module is used for merging the standardized case times and labeling the merged case times;
the five modules of the system implement the standard case time extraction method for alert text of Embodiment 1; since that method has the technical effects described above, the system has the same advantages.
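The data flow between the five modules can be sketched as below, assuming each module is a plain Python callable. The class name, field names and call signatures are hypothetical; they only fix the order in which each module's output feeds the next.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class CaseTimeExtractionSystem:
    """Hypothetical wiring of the five modules; each field is one module."""
    extract_times: Callable[[str], list]       # first module: time elements
    build_pairs: Callable[[str, list], dict]   # second module: clause key-value pairs
    recognize: Callable[[dict], list]          # third module: case time recognition
    standardize: Callable[[list], list]        # fourth module: yyyymmddhh codes
    merge_and_label: Callable[[list], Any]     # fifth module: periods, points, labels

    def run(self, alert_text: str) -> Any:
        times = self.extract_times(alert_text)
        pairs = self.build_pairs(alert_text, times)
        case_times = self.recognize(pairs)
        codes = self.standardize(case_times)
        return self.merge_and_label(codes)
```

Keeping each module as an injected callable lets any one stage, for example the recognition model, be replaced or retrained without touching the others.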
Example 3: Embodiment 3 of the invention provides a computer processing system comprising a storage module, in which a computer program implementing the standard case time extraction method for alert text of any of the above embodiments is stored. Because the computer processing system implements this method, it has the same technical effects.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (10)

1. A method for extracting the standard case time of an alert text, characterized by comprising the following steps:
step 1: sequentially extracting the time elements in the alert text by named entity recognition;
step 2: splitting the alert text into a plurality of text clauses, and constructing key-value pairs of the text clauses and time elements;
step 3: establishing and training a case time recognition model, and recognizing the content expressed in the text clauses through the case time recognition model to determine the case time;
step 4: standardizing the determined case time;
step 5: merging the standardized case times, and labeling the merged case times.
2. The method for extracting the standard case time of an alert text according to claim 1, wherein step 1 extracts the time elements with a regular expression, the specific process being as follows:
step 11: first, removing the content in parentheses from the alert text, so as to eliminate interfering time information contained in the parentheses;
step 12: then, extracting the time elements in the text with the regular expression: ([0-9]{4}year)?([0-9]{1,2}month)?([0-9]{1,2}day)?(today|yesterday|day before yesterday)?[\\u4E00-\\u9FA5]?(night|morning|am|afternoon|evening)?[\\u4E00-\\u9FA5]?([0-9]{0,2}[hour|point])?([0-9]{0,2}minute);
where:
([0-9]{4}year) represents four digits followed by "year", matching the year;
([0-9]{1,2}month) represents one or two digits followed by "month", matching the month;
([0-9]{1,2}day) represents one or two digits followed by "day", matching the day;
(today|yesterday|day before yesterday)[\\u4E00-\\u9FA5] matches the relative date descriptions "today", "yesterday" and "the day before yesterday";
(night|morning|afternoon|evening)[\\u4E00-\\u9FA5] matches the period-of-day descriptions "night", "morning", "afternoon" and "evening";
([0-9]{1,2}[hour|point]) represents one or two digits followed by "hour" or "point" (o'clock), matching a specific hour;
([0-9]{1,2}minute) represents one or two digits followed by "minute", matching a specific minute.
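The regular-expression extraction of steps 11 and 12 can be sketched in Python as follows. The Chinese literals are an assumed reconstruction of the pattern whose literal words appear in English translation above, and the names `TIME_PATTERN`, `BRACKETS` and `extract_time_elements` are hypothetical. Unlike the claimed pattern, the sketch makes the hour group mandatory so that the otherwise all-optional groups cannot produce empty or spurious matches, and it writes the hour suffix as the character class `[时点]`, since a class written as `[hour|point]` would also match a literal `|`.

```python
import re

# Assumed Chinese reconstruction of the claimed pattern; the hour group is
# mandatory here so that the otherwise all-optional groups cannot match
# empty strings or lone Chinese characters.
TIME_PATTERN = re.compile(
    r"(\d{4}年)?(\d{1,2}月)?(\d{1,2}日)?(今天|昨天|前天)?[\u4e00-\u9fa5]?"
    r"(晚上|凌晨|上午|下午|晚)?[\u4e00-\u9fa5]?(\d{1,2}[时点])(\d{1,2}分)?"
)
# Step 11: full-width or ASCII parentheses and everything between them.
BRACKETS = re.compile(r"[（(][^）)]*[）)]")


def extract_time_elements(text: str) -> list[str]:
    """Steps 11 and 12: drop parenthesised content, then collect time elements in order."""
    cleaned = BRACKETS.sub("", text)
    return [m.group(0) for m in TIME_PATTERN.finditer(cleaned)]
```

Removing the parenthesised aside first keeps a date that merely appears inside brackets (for example, when a document was issued) from being mistaken for a time element of the case.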
3. The method for extracting the standard case time of an alert text according to claim 1, wherein step 2 further comprises:
first, arranging the extracted time elements in the order in which they appear in the alert text, and setting the first time as the alarm time;
then, dividing the alert text into a plurality of text clauses by regular-expression matching of punctuation marks;
finally, determining the text clauses in which the time elements other than the alarm time are located; if a text clause contains a time element and its left and right clauses do not, merging those left and right clauses with the clause containing the time element to form a new text clause; and constructing key-value pairs in which the time elements correspond one-to-one to the text clauses.
4. The method for extracting the standard case time of an alert text according to claim 1, wherein the case time recognition model in step 3 comprises a pre-training model and a discriminant model;
the pre-training model first establishes a database whose training data are derived from historical alert data in which the case time has been manually labeled, and determines the case time in the alert text by comparing the text clauses of the alert text that contain time elements with the training data; the discriminated text clause data are labeled automatically and then added to the database;
the discriminant model comprises an input layer, a hidden layer and an output layer; the input layer receives the text clauses that are segmented from the alert text and contain time elements, its number of nodes being the number of text clauses; the hidden layer consists of the data newly added to the database during pre-training and the original data in the database; the output layer determines by comparison whether the time element in each text clause is the case time, its number of nodes being equal to the number of text clauses to be judged; when the data encountered during discrimination extend beyond the training database, the input text clauses are processed manually and the processed data are added to the database, so that the hidden-layer data grow as training proceeds;
the discriminant model measures the error of the discrimination result as follows:
[Figure FDA0002753938340000021: equation for the cross-entropy loss H(P, Q)]
where X_ij is a text clause sample containing a time element, P(X_ij) is the probability that the time element in the text clause is the case time, Q(X_ij) is the probability that the time element in the text clause is not the case time, P(X_ij) + Q(X_ij) = 1, M is the number of hidden-layer nodes, and N is the number of text clause samples containing time elements; a smaller value of H(P, Q) indicates a smaller error in the discrimination result.
5. The method for extracting the standard case time of an alert text according to claim 1, wherein step 4 further comprises:
step 41: directly determining the "year", "month", "day" and "hour" of the time element with the regular expressions "[0-9]{4}year", "[0-9]{1,2}month", "[0-9]{1,2}day" and "[0-9]{0,2}[hour|point]", and proceeding to the next step;
step 42: if "night", "afternoon" or "evening" appears in the time element text and the hour matched by "[0-9]{0,2}[hour|point]" is less than 12, adding 12 to the hour;
step 43: if the "day" element is missing and the time element contains "today", "yesterday" or "the day before yesterday", counting back 0, 1 or 2 days respectively from the alarm time to obtain the corresponding "day" element;
step 44: if any single element among "year", "month", "day" and "hour" is missing, filling it with the corresponding element of the preceding time element;
step 45: standardizing the time element into a standard case time in the 10-digit "yyyymmddhh" format.
6. The method for extracting the standard case time of an alert text according to claim 1, wherein step 5 further comprises:
step 51: judging whether two adjacent time elements appear in the same text clause; if they do and the former time is earlier than the latter, merging the standard case times corresponding to the two time elements into a case time period; otherwise, executing the next step;
step 52: calculating the difference in hours between the two adjacent time elements; if it is less than 24 hours and the former time is earlier than the latter, merging the corresponding standard case times into a case time period; otherwise, executing the next step;
step 53: searching the text clauses for keywords; if the clause corresponding to the former time element contains a start-type keyword (such as "start"), the clause corresponding to the latter time element contains an end-type keyword (such as "end"), and the former time is earlier than the latter, merging the corresponding standard case times into a case time period; otherwise, executing the next step;
step 54: taking the standard case times corresponding to the remaining time elements as case time points;
step 55: labeling the case time periods and case time points in chronological order.
7. A standard case time extraction system for an alert text, characterized by comprising:
a first module for sequentially extracting the time elements in the alert text by named entity recognition;
a second module for splitting the alert text into a plurality of text clauses and constructing key-value pairs of the text clauses and time elements;
a third module for establishing and training a case time recognition model, and recognizing the content expressed in the text clauses through the case time recognition model to determine the case time;
a fourth module for standardizing the determined case time;
and a fifth module for merging the standardized case times and labeling the merged case times.
8. The standard case time extraction system of an alert text according to claim 7,
the first module extracts the time elements with a regular expression: it first removes the content in parentheses from the alert text to eliminate interfering time information contained in the parentheses, and then extracts the time elements in the text with the regular expression:
([0-9]{4}year)?([0-9]{1,2}month)?([0-9]{1,2}day)?(today|yesterday|day before yesterday)?[\\u4E00-\\u9FA5]?(night|morning|am|afternoon|evening)?[\\u4E00-\\u9FA5]?([0-9]{0,2}[hour|point])?([0-9]{0,2}minute);
where:
([0-9]{4}year) represents four digits followed by "year", matching the year;
([0-9]{1,2}month) represents one or two digits followed by "month", matching the month;
([0-9]{1,2}day) represents one or two digits followed by "day", matching the day;
(today|yesterday|day before yesterday)[\\u4E00-\\u9FA5] matches the relative date descriptions "today", "yesterday" and "the day before yesterday";
(night|morning|afternoon|evening)[\\u4E00-\\u9FA5] matches the period-of-day descriptions "night", "morning", "afternoon" and "evening";
([0-9]{1,2}[hour|point]) represents one or two digits followed by "hour" or "point" (o'clock), matching a specific hour;
([0-9]{1,2}minute) represents one or two digits followed by "minute", matching a specific minute;
the second module first arranges the extracted time elements in the order in which they appear in the alert text and sets the first time as the alarm time; then divides the alert text into a plurality of text clauses by regular-expression matching of punctuation marks; and finally determines the text clauses in which the time elements other than the alarm time are located, where, if a text clause contains a time element and its left and right clauses do not, those left and right clauses are merged with the clause containing the time element to form a new text clause, and constructs key-value pairs in which the time elements correspond one-to-one to the text clauses;
the third module establishes and trains a case time recognition model comprising a pre-training model and a discriminant model;
the pre-training model first establishes a database whose training data are derived from historical alert data in which the case time has been manually labeled, and determines the case time in the alert text by comparing the text clauses of the alert text that contain time elements with the training data; the discriminated text clause data are labeled automatically and then added to the database;
the discriminant model comprises an input layer, a hidden layer and an output layer; the input layer receives the text clauses that are segmented from the alert text and contain time elements, its number of nodes being the number of text clauses; the hidden layer consists of the data newly added to the database during pre-training and the original data in the database; the output layer determines by comparison whether the time element in each text clause is the case time, its number of nodes being equal to the number of text clauses to be judged; when the data encountered during discrimination extend beyond the training database, the input text clauses are processed manually and the processed data are added to the database, so that the hidden-layer data grow as training proceeds;
the discriminant model measures the error of the discrimination result as follows:
[Figure FDA0002753938340000051: equation for the cross-entropy loss H(P, Q)]
where X_ij is a text clause sample containing a time element, P(X_ij) is the probability that the time element in the text clause is the case time, Q(X_ij) is the probability that the time element in the text clause is not the case time, P(X_ij) + Q(X_ij) = 1, M is the number of hidden-layer nodes, and N is the number of text clause samples containing time elements; a smaller value of H(P, Q) indicates a smaller error in the discrimination result.
9. The standard case time extraction system of an alert text according to claim 7,
the fourth module first determines the "year", "month", "day" and "hour" of a time element directly with the regular expressions "[0-9]{4}year", "[0-9]{1,2}month", "[0-9]{1,2}day" and "[0-9]{0,2}[hour|point]"; when "night", "afternoon" or "evening" appears in the time element text and the hour matched by "[0-9]{0,2}[hour|point]" is less than 12, it adds 12 to the hour; when the "day" element is missing and the time element contains "today", "yesterday" or "the day before yesterday", it counts back 0, 1 or 2 days respectively from the alarm time to obtain the corresponding "day" element; if any single element among "year", "month", "day" and "hour" is missing, it fills that element from the preceding time element; finally, it standardizes the time element into a standard case time in the 10-digit "yyyymmddhh" format;
the fifth module first judges whether two adjacent time elements appear in the same text clause, and if they do and the former time is earlier than the latter, merges their standard case times into a case time period; it calculates the difference in hours between two adjacent time elements, and if the difference is less than 24 hours and the former time is earlier than the latter, merges their standard case times into a case time period; it searches the text clauses for keywords, and if the clause corresponding to the former time element contains a start-type keyword (such as "start"), the clause corresponding to the latter time element contains an end-type keyword (such as "end"), and the former time is earlier than the latter, merges their standard case times into a case time period; it takes the standard case times corresponding to the remaining time elements as case time points; and finally, it labels the case time periods and case time points in chronological order.
10. A computer processing system comprising a storage module, wherein a computer program for implementing the standard case time extraction method of the alert text according to any one of claims 1 to 6 is stored in the storage module.
CN202011195667.3A 2020-10-30 2020-10-30 Standard case sending time extraction method and system for alert text Active CN112541075B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011195667.3A CN112541075B (en) 2020-10-30 2020-10-30 Standard case sending time extraction method and system for alert text

Publications (2)

Publication Number Publication Date
CN112541075A true CN112541075A (en) 2021-03-23
CN112541075B CN112541075B (en) 2024-04-05

Family

ID=75013660

Country Status (1)

Country Link
CN (1) CN112541075B (en)

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JING LI: ""A Survey on Deep Learning for Named Entity Recognition"", 《IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING》, 30 October 2020 (2020-10-30), pages 50 - 70 *
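YANG Feng et al.: "Implementation Method of Emergency Intelligence Perception Based on Scenario Similarity" (基于情景相似度的突发事件情报感知实现方法), Journal of the China Society for Scientific and Technical Information (情报学报), 31 May 2019 (2019-05-31), pages 525 - 533 *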

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116108163A (en) * 2023-04-04 2023-05-12 之江实验室 Text matching method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN112541075B (en) 2024-04-05

Similar Documents

Publication Publication Date Title
EP3846048A1 (en) Online log analysis method, system, and electronic terminal device thereof
CN107766371B (en) Text information classification method and device
CN110413787B (en) Text clustering method, device, terminal and storage medium
CN110175334B (en) Text knowledge extraction system and method based on custom knowledge slot structure
CN111259160B (en) Knowledge graph construction method, device, equipment and storage medium
CN110941720A (en) Knowledge base-based specific personnel information error correction method
CN111259951A (en) Case detection method and device, electronic equipment and readable storage medium
CN115168345B (en) Database classification method, system, device and storage medium
CN113268615A (en) Resource label generation method and device, electronic equipment and storage medium
CN111859070A (en) Mass internet news cleaning system
CN114090736A (en) Enterprise industry identification system and method based on text similarity
CN112069383A (en) News text event and time extraction and normalization system for event tracking
CN112328792A (en) Optimization method for recognizing credit events based on DBSCAN clustering algorithm
CN113268982A (en) Network table structure identification method and device, computer device and computer readable storage medium
CN115936624A (en) Basic level data management method and device
CN112541075B (en) Standard case sending time extraction method and system for alert text
CN109960707B (en) College recruitment data acquisition method and system based on artificial intelligence
CN109542845B (en) Text metadata extraction method based on keyword expression
CN113761137A (en) Method and device for extracting address information
CN111291535A (en) Script processing method and device, electronic equipment and computer readable storage medium
CN110765107A (en) Question type identification method and system based on digital coding
CN113468315B (en) Vulnerability vendor name matching method
CN115994531A (en) Multi-dimensional text comprehensive identification method
CN114298041A (en) Network security named entity identification method and identification device
CN113609864A (en) Text semantic recognition processing system and method based on industrial control system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant