CN112541075A - Method and system for extracting standard case time of warning situation text - Google Patents
Method and system for extracting standard case time of warning situation text
- Publication number
- CN112541075A (CN202011195667.3A)
- Authority
- CN
- China
- Prior art keywords
- time
- text
- case
- elements
- clauses
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Business, Economics & Management (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Tourism & Hospitality (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Marketing (AREA)
- Data Mining & Analysis (AREA)
- Human Resources & Organizations (AREA)
- Educational Administration (AREA)
- Primary Health Care (AREA)
- Strategic Management (AREA)
- General Business, Economics & Management (AREA)
- Economics (AREA)
- Databases & Information Systems (AREA)
- Development Economics (AREA)
- Document Processing Apparatus (AREA)
Abstract
The invention discloses a method and a system for extracting the standard case time from an alert text, and belongs to the technical field of public security alert text extraction. The method comprises the following steps: sequentially extracting the time elements in the alert text by named entity recognition; splitting the alert text into a plurality of text clauses, and constructing key-value pairs of text clauses and time elements; establishing and training a case time recognition model, and recognizing the content expressed in the text clauses with the model to determine the case time; standardizing the determined case time; and merging the standardized case times and marking the merged case times. By adding a case time recognition model on top of named-entity recognition of time elements, the invention accurately recognizes and extracts case time information and supports police officers in analyzing and checking alerts quickly and accurately.
Description
Technical Field
The invention belongs to the technical field of public security alert text extraction, and particularly relates to a method and a system for extracting the standard case time from an alert text.
Background
Techniques for extracting time elements from text are mature. Treated as a named entity recognition task, the extraction achieves good results with methods such as regular expressions and sequence labeling models. A regular expression matches the text against fixed time expression templates; a sequence labeling model relies on text data labeled in advance, so that the machine learns the characteristics of time elements in a text sequence from manual labels.
However, current techniques do not address how, in a public security alert system, to distinguish the attribute of each time element in an alert text and convert it into a standard time format so that the relationships among multiple times can be inferred. The time elements in an alert text are divided into alarm time, case time, other background time and the like, where the case time is a time period or a time point in a specific scenario. Existing models have difficulty accurately extracting the case time from an alert text, which greatly increases the workload of police officers.
Disclosure of Invention
The invention provides a method and a system for extracting the standard case time from an alert text, so as to solve the above problems in the prior art.
To achieve this purpose, the invention adopts the following technical scheme:
a method for extracting the standard case time from an alert text comprises the following steps:
step 1: sequentially extracting the time elements in the alert text by named entity recognition;
step 2: splitting the alert text into a plurality of text clauses, and constructing key-value pairs of text clauses and time elements;
step 3: establishing and training a case time recognition model, and recognizing the content expressed in the text clauses with the case time recognition model to determine the case time;
step 4: standardizing the determined case time;
step 5: merging the standardized case times, and marking the merged case times.
In a further embodiment, step 1 extracts the time elements with a regular expression, and the specific process is as follows:
step 11: first, removing the content in parentheses from the alert text, so as to exclude interfering time elements contained in the parenthesized content;
step 12: then, extracting the time elements in the text with a regular expression, in which the English words stand for the corresponding Chinese characters of the original expression: ([0-9]{4}year)?([0-9]{1,2}month)?([0-9]{1,2}day)?(today|yesterday|day before)?[\\u4E00-\\u9FA5]?(night|morning|am|afternoon|evening)?[\\u4E00-\\u9FA5]?([0-9]{0,2}[hour|point])?([0-9]{0,2}minute);
in the formula:
([0-9]{4}year) represents four digits plus "year", to match the year;
([0-9]{1,2}month) represents one or two digits plus "month", to match the month;
([0-9]{1,2}day) represents one or two digits plus "day", to match the day;
(today|yesterday|day before)?[\\u4E00-\\u9FA5]? matches the relative date descriptions "today", "yesterday" and "day before", optionally followed by one Chinese character;
(night|morning|am|afternoon|evening)?[\\u4E00-\\u9FA5]? matches period descriptions such as "night", "morning", "afternoon" and "evening", optionally followed by one Chinese character;
([0-9]{1,2}[hour|point]) represents one or two digits plus "hour" or "point", to match a particular hour;
([0-9]{1,2}minute) represents one or two digits plus "minute", to match a particular minute.
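Steps 11 and 12 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the Chinese tokens (年, 月, 日, 今/昨/前, 上午/下午/晚上/晚/早, 时/点, 分) are assumed originals of the translated words in the expression above, the final minute group is made optional so that elements without minutes still match, the digit filter and the function names `strip_parentheses` and `extract_time_elements` are hypothetical, and the sample sentence is invented.

```python
import re

# Assumed Chinese form of the translated expression. Every group is optional
# here (an assumption), so candidates are filtered afterwards.
TIME_PATTERN = re.compile(
    r"(?:[0-9]{4}年)?(?:[0-9]{1,2}月)?(?:[0-9]{1,2}日)?"
    r"(?:今|昨|前)?[\u4e00-\u9fa5]?"
    r"(?:上午|下午|晚上|晚|早)?[\u4e00-\u9fa5]?"
    r"(?:[0-9]{0,2}[时点])?(?:[0-9]{0,2}分)?"
)

def strip_parentheses(text):
    """Step 11: drop parenthesized content (half- and full-width brackets)."""
    return re.sub(r"[((][^()()]*[))]", "", text)

def extract_time_elements(text):
    """Step 12: return matched time elements in order of appearance.

    Matches without any digit are discarded; this filter is an assumption
    added because the single-character wildcards also match ordinary words.
    """
    cleaned = strip_parentheses(text)
    candidates = (m.group() for m in TIME_PATTERN.finditer(cleaned))
    return [c for c in candidates if any(ch.isdigit() for ch in c)]

sample = "2020年7月21日晚7时许,银行卡被盗刷(出生日期:1990年1月1日)。2020年7月26日上午9时挂失。"
elements = extract_time_elements(sample)
# elements == ["2020年7月21日晚7时", "2020年7月26日上午9时"]
```

The date of birth inside the parentheses is removed before matching, so it never appears among the extracted elements.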
In a further embodiment, step 2 further comprises:
first, arranging the extracted time elements in the order in which they occur in the alert text, and setting the first time element as the alarm time;
then, splitting the alert text into a plurality of text clauses by regular-expression matching of punctuation marks;
finally, determining the text clauses in which the time elements other than the alarm time are located; if a text clause contains a time element and its left and right neighboring clauses contain no time element, merging those neighboring clauses with the clause containing the time element to form a new text clause; and constructing key-value pairs in which the time elements correspond one-to-one to the text clauses.
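The clause splitting and key-value construction described above can be sketched as follows. This is an illustration only: the function name `build_clause_pairs`, the punctuation set, the English sample text and the choice to merge each neighbor independently whenever it exists and carries no time element are all assumptions.

```python
import re

def build_clause_pairs(text, time_elements):
    """Map each non-alarm time element to its context-extended clause."""
    # Order elements by first appearance; the first one is taken as the alarm time.
    ordered = sorted(time_elements, key=text.index)
    alarm_time, others = ordered[0], ordered[1:]

    # Split on punctuation marks; the delimiter set is an assumption.
    clauses = [c.strip() for c in re.split(r"[,;.!?,;。!?]", text) if c.strip()]

    def has_time(clause):
        return any(t in clause for t in ordered)

    pairs = {}
    for element in others:
        idx = next(i for i, c in enumerate(clauses) if element in c)
        parts = [clauses[idx]]
        # Attach neighbors that carry no time element of their own.
        if idx > 0 and not has_time(clauses[idx - 1]):
            parts.insert(0, clauses[idx - 1])
        if idx + 1 < len(clauses) and not has_time(clauses[idx + 1]):
            parts.append(clauses[idx + 1])
        pairs[element] = ", ".join(parts)
    return alarm_time, pairs

text = ("At 10:40 on July 27 an alarm was received; the caller was at home, "
        "at 19:00 on July 21 the card was charged once, the loss was 100 yuan; "
        "at 09:00 on July 26 the card was reported lost.")
alarm, pairs = build_clause_pairs(
    text, ["19:00 on July 21", "10:40 on July 27", "09:00 on July 26"])
```

Here `alarm` is the first time element in the text, and each remaining element maps to a clause that carries its surrounding context.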
In a further embodiment, the case time recognition model in step 3 comprises a pre-training model and a discrimination model;
the pre-training model first establishes a database whose training data are derived from historical alert data with manually labeled case times; the case time in an alert text is determined by comparing the text clauses containing time elements with the training data; the judged text clause data are automatically labeled and then added to the database;
the discrimination model comprises an input layer, a hidden layer and an output layer; the input layer consists of the text clauses that are obtained by splitting the alert text and contain time elements, and its number of nodes is the number of such text clauses; the hidden layer consists of the data newly added to the database during pre-training together with the original data in the database; the output layer determines by comparison whether the time element in each text clause is the case time, and its number of nodes equals the number of text clauses to be judged; when input data fall outside the scope of the training database during discrimination, the input text clauses are processed manually and the processed data are added to the database, so that the data of the hidden layer grow as training proceeds;
the discrimination model measures the error of the discrimination result by a value H(P, Q), in which X_{ij} is a text clause sample containing a time element, P(X_{ij}) is the probability that the time element in the text clause is the case time, Q(X_{ij}) is the probability that the time element in the text clause is not the case time, and P(X_{ij}) + Q(X_{ij}) = 1; M is the number of hidden-layer nodes, and N is the number of text clause samples containing time elements; the smaller the value of H(P, Q), the smaller the error of the discrimination result.
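The expression for H(P, Q) is not reproduced in this text. For orientation only, the textbook definition of cross-entropy written with the symbols defined above reads as follows; the assumption that the sums run over the N samples and the M hidden-layer nodes is an inference from the symbol definitions, and the patent's own expression may differ in form or normalization.

```latex
H(P, Q) = -\sum_{i=1}^{N} \sum_{j=1}^{M} P(X_{ij}) \, \log Q(X_{ij})
```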
In a further embodiment, step 4 further comprises:
step 41: directly determining the time elements "year, month, day, hour" with the regular expressions "[0-9]{4}year", "[0-9]{1,2}month", "[0-9]{1,2}day" and "[0-9]{0,2}[hour|point]", and proceeding to the next step;
step 42: if "night", "afternoon" or "evening" appears in the text of the time element and the hour matched by "[0-9]{0,2}[hour|point]" is less than 12, adding 12 to the hour;
step 43: if the "day" element is missing and the time element contains "today", "yesterday" or "day before", deriving the corresponding "day" element by counting back 0, 1 or 2 days from the alarm time;
step 44: if any single one of the elements "year, month, day, hour" is missing, filling it with the corresponding element of the preceding time element;
step 45: normalizing the time elements into the standard case time in the 10-digit format "yyyymmddhh".
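Steps 41 to 45 can be sketched in Python as below. This is a hedged illustration rather than the patented implementation: the Chinese tokens are assumed originals of the translated words, the hour pattern requires at least one digit (unlike the `{0,2}` quantifier quoted above) so that the capture is never empty, and the function name `normalize` and its return shape are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Step 41 patterns; Chinese tokens are assumed originals of year/month/day/hour.
FIELDS = {
    "year": r"([0-9]{4})年",
    "month": r"([0-9]{1,2})月",
    "day": r"([0-9]{1,2})日",
    "hour": r"([0-9]{1,2})[时点]",
}
PM_WORDS = ("晚", "下午")                      # assumed: night/evening, afternoon
RELATIVE_DAYS = {"今": 0, "昨": 1, "前": 2}    # assumed: today, yesterday, day before

def normalize(element, alarm, previous=None):
    """Return (yyyymmddhh string, parsed fields) for one time element.

    alarm is the alarm time as a datetime; previous holds the fields of the
    preceding time element and is needed when a field cannot be derived.
    """
    fields = {}
    for name, pattern in FIELDS.items():                      # step 41
        match = re.search(pattern, element)
        if match:
            fields[name] = int(match.group(1))
    if fields.get("hour", 12) < 12 and any(w in element for w in PM_WORDS):
        fields["hour"] += 12                                  # step 42
    if "day" not in fields:                                   # step 43
        for word, offset in RELATIVE_DAYS.items():
            if word in element:
                day = alarm - timedelta(days=offset)
                fields["day"] = day.day
                fields.setdefault("year", day.year)
                fields.setdefault("month", day.month)
                break
    for name in FIELDS:                                       # step 44
        if name not in fields and previous:
            fields[name] = previous[name]
    stamp = "{year:04d}{month:02d}{day:02d}{hour:02d}".format(**fields)  # step 45
    return stamp, fields

alarm = datetime(2020, 7, 27, 10, 40)
first, first_fields = normalize("2020年7月21日晚7时", alarm)                    # "2020072119"
second, _ = normalize("7月25日8时10分", alarm, previous=first_fields)          # "2020072508"
third, _ = normalize("昨晚8点", alarm)                                         # "2020072620"
```

The second call shows step 44 (the year is taken from the preceding element), and the third shows steps 42 and 43 together.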
In a further embodiment, step 5 further comprises:
step 51: judging whether two adjacent time elements appear in the same text clause; when they do and the former time is earlier than the latter, merging the standard case times corresponding to the two time elements into a case time period; otherwise, executing the next step;
step 52: calculating the difference in hours between the two adjacent time elements; when the difference is less than 24 hours and the former time is earlier than the latter, merging the standard case times corresponding to the two time elements into a case time period; otherwise, executing the next step;
step 53: searching for keywords in the text clauses; when a "start"-type keyword exists in the text clause corresponding to the former of the two adjacent time elements, an "end"-type keyword exists in the text clause corresponding to the latter, and the former time is earlier than the latter, merging the standard case times corresponding to the two time elements into a case time period; otherwise, executing the next step;
step 54: designating the standard case times corresponding to the remaining time elements as case time points;
step 55: marking the case time periods and case time points in chronological order.
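The merging rules of steps 51 to 55 can be sketched as follows. This is a minimal sketch under assumptions: the input is a list of `(yyyymmddhh, clause)` pairs in text order, the English keywords stand in for the original keywords, adjacent pairs are consumed greedily from left to right, and the function name `merge_case_times` is hypothetical.

```python
from datetime import datetime

START_KEYWORDS = ("start",)   # stand-ins for the original "start"-type keywords
END_KEYWORDS = ("end",)       # stand-ins for the original "end"-type keywords

def _parse(stamp):
    return datetime.strptime(stamp, "%Y%m%d%H")

def merge_case_times(items):
    """Merge adjacent case times into periods; the rest become points.

    items: list of (yyyymmddhh, clause) in text order.
    Returns ("period", start, end) and ("point", time) tuples sorted by time.
    """
    results, i = [], 0
    while i < len(items):
        if i + 1 < len(items):
            (t1, c1), (t2, c2) = items[i], items[i + 1]
            gap = _parse(t2) - _parse(t1)
            earlier = gap.total_seconds() > 0
            same_clause = c1 == c2                                    # step 51
            within_day = gap.total_seconds() < 24 * 3600              # step 52
            keyworded = (any(k in c1 for k in START_KEYWORDS)
                         and any(k in c2 for k in END_KEYWORDS))      # step 53
            if earlier and (same_clause or within_day or keyworded):
                results.append(("period", t1, t2))
                i += 2
                continue
        results.append(("point", items[i][0]))                        # step 54
        i += 1
    return sorted(results, key=lambda r: r[1])                        # step 55

merged = merge_case_times([
    ("2020072119", "clause about the first charge"),
    ("2020072506", "clause about the second charge"),
    ("2020072508", "clause about the second charge"),
])
# merged == [("point", "2020072119"), ("period", "2020072506", "2020072508")]
```

Because the stamps are fixed-width digit strings, sorting them as strings is equivalent to sorting chronologically.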
A system for extracting the standard case time from an alert text comprises:
a first module for sequentially extracting the time elements in the alert text by named entity recognition;
a second module for splitting the alert text into a plurality of text clauses and constructing key-value pairs of text clauses and time elements;
a third module for establishing and training a case time recognition model, and recognizing the content expressed in the text clauses with the case time recognition model to determine the case time;
a fourth module for standardizing the determined case time;
a fifth module for merging the standardized case times and marking the merged case times.
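How the five modules hand data to one another can be sketched as a simple pipeline. The class name `CaseTimePipeline`, the field names and the callable signatures are hypothetical, and the stub functions in the usage only demonstrate the wiring; they do not implement any of the modules.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class CaseTimePipeline:
    """Chains five module callables in the order listed above."""
    extract: Callable[[str], List[str]]                    # first module
    pair: Callable[[str, List[str]], Dict[str, str]]       # second module
    recognize: Callable[[Dict[str, str]], List[str]]       # third module
    standardize: Callable[[List[str]], List[str]]          # fourth module
    merge: Callable[[List[str]], List[Tuple[str, ...]]]    # fifth module

    def run(self, text: str) -> List[Tuple[str, ...]]:
        elements = self.extract(text)
        pairs = self.pair(text, elements)
        case_elements = self.recognize(pairs)
        stamps = self.standardize(case_elements)
        return self.merge(stamps)

# Wiring demonstration with placeholder stubs (not real module logic).
pipeline = CaseTimePipeline(
    extract=lambda text: text.split("|"),
    pair=lambda text, elements: {e: text for e in elements},
    recognize=lambda pairs: list(pairs)[1:],   # stub: drop the first (alarm) element
    standardize=lambda elements: [e.strip() for e in elements],
    merge=lambda stamps: [("point", s) for s in stamps],
)
result = pipeline.run("a| b| c")
```

Keeping each module behind a plain callable makes it possible to swap, for example, the recognition model without touching the other stages.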
In a further embodiment, the first module extracts the time elements with a regular expression: it first removes the content in parentheses from the alert text, so as to exclude interfering time elements contained in the parenthesized content, and then extracts the time elements in the text with the regular expression:
([0-9]{4}year)?([0-9]{1,2}month)?([0-9]{1,2}day)?(today|yesterday|day before)?[\\u4E00-\\u9FA5]?(night|morning|am|afternoon|evening)?[\\u4E00-\\u9FA5]?([0-9]{0,2}[hour|point])?([0-9]{0,2}minute);
in the formula:
([0-9]{4}year) represents four digits plus "year", to match the year;
([0-9]{1,2}month) represents one or two digits plus "month", to match the month;
([0-9]{1,2}day) represents one or two digits plus "day", to match the day;
(today|yesterday|day before)?[\\u4E00-\\u9FA5]? matches the relative date descriptions "today", "yesterday" and "day before", optionally followed by one Chinese character;
(night|morning|am|afternoon|evening)?[\\u4E00-\\u9FA5]? matches period descriptions such as "night", "morning", "afternoon" and "evening", optionally followed by one Chinese character;
([0-9]{1,2}[hour|point]) represents one or two digits plus "hour" or "point", to match a particular hour;
([0-9]{1,2}minute) represents one or two digits plus "minute", to match a particular minute;
the second module first arranges the extracted time elements in the order in which they occur in the alert text, and sets the first time element as the alarm time; then splits the alert text into a plurality of text clauses by regular-expression matching of punctuation marks; and finally determines the text clauses in which the time elements other than the alarm time are located; if a text clause contains a time element and its left and right neighboring clauses contain no time element, those neighboring clauses are merged with the clause containing the time element to form a new text clause; key-value pairs are then constructed in which the time elements correspond one-to-one to the text clauses.
The third module establishes and trains a case time recognition model, which comprises a pre-training model and a discrimination model;
the pre-training model first establishes a database whose training data are derived from historical alert data with manually labeled case times; the case time in an alert text is determined by comparing the text clauses containing time elements with the training data; the judged text clause data are automatically labeled and then added to the database;
the discrimination model comprises an input layer, a hidden layer and an output layer; the input layer consists of the text clauses that are obtained by splitting the alert text and contain time elements, and its number of nodes is the number of such text clauses; the hidden layer consists of the data newly added to the database during pre-training together with the original data in the database; the output layer determines by comparison whether the time element in each text clause is the case time, and its number of nodes equals the number of text clauses to be judged; when input data fall outside the scope of the training database during discrimination, the input text clauses are processed manually and the processed data are added to the database, so that the data of the hidden layer grow as training proceeds;
the discrimination model measures the error of the discrimination result by a value H(P, Q), in which X_{ij} is a text clause sample containing a time element, P(X_{ij}) is the probability that the time element in the text clause is the case time, Q(X_{ij}) is the probability that the time element in the text clause is not the case time, and P(X_{ij}) + Q(X_{ij}) = 1; M is the number of hidden-layer nodes, and N is the number of text clause samples containing time elements; the smaller the value of H(P, Q), the smaller the error of the discrimination result.
In a further embodiment, the fourth module first determines the time elements "year, month, day, hour" directly with the regular expressions "[0-9]{4}year", "[0-9]{1,2}month", "[0-9]{1,2}day" and "[0-9]{0,2}[hour|point]"; when "night", "afternoon" or "evening" appears in the text of the time element and the hour matched by "[0-9]{0,2}[hour|point]" is less than 12, 12 is added to the hour; when the "day" element is missing and the time element contains "today", "yesterday" or "day before", the corresponding "day" element is derived by counting back 0, 1 or 2 days from the alarm time; if any single one of the elements "year, month, day, hour" is missing, it is filled with the corresponding element of the preceding time element; finally, the time elements are normalized into the standard case time in the 10-digit format "yyyymmddhh";
the fifth module first judges whether two adjacent time elements appear in the same text clause; when they do and the former time is earlier than the latter, the standard case times corresponding to the two time elements are merged into a case time period; it then calculates the difference in hours between two adjacent time elements, and when the difference is less than 24 hours and the former time is earlier than the latter, merges the corresponding standard case times into a case time period; it then searches for keywords in the text clauses, and when a "start"-type keyword exists in the text clause corresponding to the former of two adjacent time elements, an "end"-type keyword exists in the text clause corresponding to the latter, and the former time is earlier than the latter, merges the corresponding standard case times into a case time period; the standard case times corresponding to the remaining time elements are designated as case time points; finally, the case time periods and case time points are marked in chronological order.
A computer processing system comprises a storage module, in which a computer program implementing the method for extracting the standard case time from an alert text according to any of the above embodiments is stored.
Beneficial effects: firstly, a case time recognition model is added on top of named-entity recognition of time elements, so that case time information is accurately recognized and extracted, making it easier for police officers to analyze and check alerts quickly;
secondly, the alert text is split into a plurality of text clauses containing time elements, key-value pairs of text clauses and time elements are constructed, and semantic recognition is performed on the text clauses to judge whether the time elements in them are case times, which reduces the difficulty of recognizing and extracting the case time caused by complex alert text content;
finally, the case times are merged, and the case time points and case time periods are marked, which supports police officers in analyzing alerts quickly and accurately.
Drawings
Fig. 1 is a flow chart of extracting standard case time of the alert text of the present invention.
FIG. 2 is a schematic diagram of the structure of the discriminant model of the present invention.
Detailed Description
The technical solution of the present invention will be clearly and completely described below with reference to the accompanying drawings and embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
Research shows that the time elements in an alert text are divided into alarm time, case time, other background time and the like. Existing public security alert systems, however, have difficulty distinguishing the attributes of the time elements in an alert text and inferring the relationships among multiple times so as to accurately identify and extract the case time. This work usually requires manual identification and labeling by police officers, which greatly increases their workload.
Example 1: as shown in Fig. 1, in order to solve the problems in the prior art, embodiment 1 of the present invention provides a method for extracting the standard case time from an alert text, including the following steps:
step 1: sequentially extracting the time elements in the alert text by named entity recognition;
step 2: splitting the alert text into a plurality of text clauses, and constructing key-value pairs of text clauses and time elements;
step 3: establishing and training a case time recognition model, and recognizing the content expressed in the text clauses with the case time recognition model to determine the case time;
step 4: standardizing the determined case time;
step 5: merging the standardized case times, and marking the merged case times.
The content of an alert text is complex: it contains many time elements with different attributes. For example, the time elements in an alert text are mainly classified into alarm time, case time, other background time and the like. Time elements with different attributes greatly increase the difficulty of extracting the case time. To illustrate the embodiments of the present invention in more detail, the present application provides a simple alert text: "At 10:40 on July 27, 2020, the police station received a report from the reporting person: at about 7 o'clock in the evening of July 21, 2020, the bank card of the reporting person Zhang San (household address: xxx, ID number: xxx, date of birth: xxx) was fraudulently charged once, with a loss of 100 yuan; from 6 o'clock on July 25, 2020 to 8:10 on July 25, it was fraudulently charged again in 2 transactions, with a total loss of 200 yuan. At 9 a.m. on July 26, 2020, the reporting person went to the bank to report the card lost." In this text, "10:40 on July 27, 2020" is the alarm time; "about 7 o'clock in the evening of July 21, 2020" and "from 6 o'clock on July 25, 2020 to 8:10 on July 25" are case times; and the date of birth of the reporting person is other background time. The attributes of the time elements are complex, which greatly increases the difficulty of identifying the case time. Moreover, some time elements in the alert text are in a non-standard time format, such as "about 7 o'clock in the evening", which further increases the difficulty of recognizing time elements.
Therefore, in order to accurately determine the attributes of the time elements, the useful time elements in the alert text must first be extracted accurately. Further, step 1 extracts the time elements with a regular expression, and the specific process is as follows:
first, the content in parentheses is removed from the alert text, so as to exclude interfering time elements contained in the parenthesized content. In the above alert text, for example, the parentheses contain the date of birth of the reporting person; in practical applications the parenthesized content usually includes such dates, which interfere with the extraction of time elements from the text, so these interfering time elements should be excluded first;
then, the time elements in the text are extracted with a regular expression, in which the English words stand for the corresponding Chinese characters of the original expression:
([0-9]{4}year)?([0-9]{1,2}month)?([0-9]{1,2}day)?(today|yesterday|day before)?[\\u4E00-\\u9FA5]?(night|morning|am|afternoon|evening)?[\\u4E00-\\u9FA5]?([0-9]{0,2}[hour|point])?([0-9]{0,2}minute);
in the formula:
([0-9]{4}year) represents four digits plus "year", to match the year;
([0-9]{1,2}month) represents one or two digits plus "month", to match the month;
([0-9]{1,2}day) represents one or two digits plus "day", to match the day;
(today|yesterday|day before)?[\\u4E00-\\u9FA5]? matches the relative date descriptions "today", "yesterday" and "day before"; the single-Chinese-character wildcard enriches the word matching and brings it closer to how a reporting person describes things in daily life;
(night|morning|am|afternoon|evening)?[\\u4E00-\\u9FA5]? matches period descriptions such as "night", "morning", "afternoon" and "evening"; the single-Chinese-character wildcard enriches the word matching, effectively handles irregular time elements caused by colloquial wording in alert texts, avoids extracting "7 o'clock in the evening" merely as "7 o'clock", and ensures the accuracy of time element extraction;
([0-9]{1,2}[hour|point]) represents one or two digits plus "hour" or "point", to match a particular hour;
([0-9]{1,2}minute) represents one or two digits plus "minute", to match a particular minute;
after the above alert text is processed with the regular expression, the time elements "10:40 on July 27, 2020", "7 o'clock in the evening of July 21, 2020", "6 o'clock on July 25, 2020", "8:10 on July 25" and "9 a.m. on July 26, 2020" are extracted in sequence.
Because the content of an alert text is complex and contains many time elements, judging the case time directly on the whole text is difficult. The alert text is therefore split into a plurality of text clauses, and semantic recognition is performed on each text clause to judge whether the time element in it is the case time. In a further embodiment, step 2 specifically comprises the following steps:
first, arranging the extracted time elements in the order in which they occur in the alert text, and setting the first time element as the alarm time; in the above alert text, "10:40 on July 27, 2020" is the alarm time;
then, splitting the alert text into a plurality of text clauses by regular-expression matching of punctuation marks; at this point some text clauses contain no time element, yet their content is needed to analyze whether a time element is the case time, so they cannot simply be discarded; instead, they are merged into the text clauses that contain time elements, so that the context before and after each time element is complete and the case time can be judged accurately;
finally, determining the text clauses in which the time elements other than the alarm time are located; if a text clause contains a time element and its left and right neighboring clauses contain no time element, merging those neighboring clauses with the clause containing the time element to form a new text clause; and constructing key-value pairs in which the time elements correspond one-to-one to the text clauses, wherein the key-value pairs of time elements and text clauses constructed from the above alert text are as follows:
After the time elements are extracted, whether a time element is the case time can be judged accurately from the semantics of the text clause in which it is located. In a further embodiment, a case time recognition model is established and trained to recognize the content expressed in each text clause, and thereby determine whether the time element is the case time. The case time recognition model comprises a pre-training model and a discrimination model.
First, a database is established through the pre-training model; the training data in the database are derived from historical alert data with manually labeled case times. The text clauses containing time elements in the alert text are then compared with the training data to determine the case time in the alert text. The judged text clause data are automatically labeled and added to the database, which further enriches the database and speeds up case time discrimination in actual use.
With reference to Fig. 2, the discrimination model includes an input layer, a hidden layer and an output layer. The input layer consists of the text clauses that are obtained by splitting the alert text and contain time elements, and its number of nodes is the number of such text clauses. The hidden layer consists of the data newly added to the database during pre-training together with the original data in the database. The output layer determines by comparison whether the time element in each text clause is the case time, and its number of nodes equals the number of text clauses to be judged. The text clauses containing time elements that enter through the input layer are compared for similarity with the comparison data of the hidden layer, and the output layer finally outputs the judgment result. When input data fall outside the scope of the training database during discrimination, the input text clauses can be processed manually and the processed data added to the database, so that the data of the hidden layer grow as training proceeds. Therefore, as the number of text clauses processed by the discrimination model increases, discrimination becomes easier.
A cross-entropy loss function is used to measure the error of the discrimination result of the discrimination model, so as to improve the accuracy of the discrimination result:
in the formula, X_ij is a text clause sample containing a time element; P(X_ij) is the probability that the time element in the text clause is the case time; Q(X_ij) is the probability that the time element in the text clause is not the case time, and P(X_ij) + Q(X_ij) = 1; M is the number of nodes of the hidden layer; N is the number of text clause samples containing time elements; the smaller the value of H(P, Q), the smaller the error of the discrimination result. Whether to accept the discriminated case time can therefore be decided by comparing the measured error with a set value. For example, if H(P, Q) is less than the set value, the result is accepted and the time element in the text clause is taken to be the case time; if H(P, Q) is not less than the set value, the result is not accepted, and whether the time element is the case time is judged manually.
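The accept-or-review rule can be sketched in Python under stated assumptions. The function below uses the standard mean binary cross-entropy against reference labels as a stand-in; it is not claimed to be the patent's exact H(P, Q), and the names `cross_entropy`, `accept`, the labels and the set value 0.3 are hypothetical.

```python
import math


def cross_entropy(p_case, labels, eps=1e-12):
    """Mean binary cross-entropy (assumed stand-in for H(P, Q)).

    p_case: predicted probabilities P that each time element is the case time.
    labels: reference labels, 1 if the element is the case time, else 0.
    """
    total = 0.0
    for p, y in zip(p_case, labels):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        q = 1.0 - p                      # Q = 1 - P, as in the definitions above
        total += -(y * math.log(p) + (1 - y) * math.log(q))
    return total / len(p_case)


def accept(h, set_value):
    """Accept the discrimination result only when the error is below the set value."""
    return h < set_value


h = cross_entropy([0.9, 0.2], [1, 0])
decision = "accept" if accept(h, 0.3) else "manual review"
```

A result that fails the check is routed to manual judgment rather than discarded, which matches the fallback described in the text.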
The extracted time elements contain Chinese characters and are written in different ways. For example, "7 o'clock in the evening of July 21, 2020" and "6:49 on July 25, 2020" are both time elements, but "7 o'clock in the evening" and "6:49" are two different ways of expressing a time. If the two are not unified, digital archiving is hindered and it becomes less convenient for police officers to analyze and handle the case. Therefore, the case time needs to be standardized so that it can be processed uniformly. In a further embodiment, step 4 is further:
step 41: directly determining the time elements "year, month, day, hour" by the regular expressions "[0-9]{4} year", "[0-9]{1,2} month", "[0-9]{1,2} day" and "[0-9]{0,2}[hour | point]", and performing the next step;
step 42: if "night", "afternoon" or "evening" appears in the time element text and the hour value matched by "[0-9]{0,2}[hour | point]" is less than 12, adding 12 to the hour value;
step 43: if the "day" element is missing from the time element and the time element contains "today", "yesterday" or "the day before yesterday", deriving the corresponding "day" element by counting back 0, 1 or 2 days from the alarm time;
step 44: if any single element of "year, month, day, hour" is missing from the time element, filling it with the corresponding element of the preceding time element;
step 45: standardizing the time elements to form the standard case time in the 10-digit numeric format "yyyymmddhh", in which digits 1 to 4 represent the year, digits 5 to 6 the month, digits 7 to 8 the day, and digits 9 to 10 the hour.
Therefore, the case times in the alert text, "7 o'clock in the evening of July 21, 2020", "6:49 on July 25, 2020" and "8:10 on July 25", become "2020072119", "2020072506" and "2020072508" after standardization.
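Steps 41 to 45 can be sketched in Python as below. This is a minimal sketch, not the patent's implementation: the Chinese input strings and marker characters (年, 月, 日, 时, 点, 晚, 下午, 今, 昨, 前) are assumed reconstructions of the original-language text, and the function name `normalize` and its arguments are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Step 41 patterns; the Chinese unit characters are assumed originals.
FIELDS = {
    "year": r"([0-9]{4})年",
    "month": r"([0-9]{1,2})月",
    "day": r"([0-9]{1,2})日",
    "hour": r"([0-9]{1,2})[时点]",
}
PM_WORDS = ("晚", "下午")                # assumed "night/evening", "afternoon"
RELATIVE = {"今": 0, "昨": 1, "前": 2}   # today, yesterday, day before yesterday


def normalize(element, alarm, previous=None):
    """Return (yyyymmddhh string, parsed fields) for one time element.

    alarm: datetime of the alarm time; previous: fields of the preceding element.
    Raises KeyError if a field is missing and no previous element supplies it.
    """
    fields = {}
    for name, pattern in FIELDS.items():           # step 41
        match = re.search(pattern, element)
        if match:
            fields[name] = int(match.group(1))
    if fields.get("hour", 12) < 12 and any(w in element for w in PM_WORDS):
        fields["hour"] += 12                       # step 42
    if "day" not in fields:                        # step 43
        for word, days_back in RELATIVE.items():
            if word in element:
                d = alarm - timedelta(days=days_back)
                fields.update(year=d.year, month=d.month, day=d.day)
                break
    for name in FIELDS:                            # step 44
        if name not in fields and previous:
            fields[name] = previous[name]
    standard = "{year:04d}{month:02d}{day:02d}{hour:02d}".format(**fields)  # step 45
    return standard, fields


alarm = datetime(2020, 7, 27, 10, 40)
first, f1 = normalize("2020年7月21日晚7点", alarm)
second, f2 = normalize("2020年7月25日6时49分", alarm, f1)
third, f3 = normalize("7月25日8时10分", alarm, f2)
```

Deriving the relative day from a full date, rather than subtracting from the day number alone, keeps the result valid across month boundaries.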
In an alert text, the incident reported by the caller may involve several time points, and the intervals between them vary: some are days apart and others only hours apart. If each time point is treated as a separate case time, incidents that are close together in time lose their connection, which makes the case harder for the police to solve. Therefore, in a further embodiment, the standardized case times are merged: case time points separated by a short interval, or case time points that are clearly related, are merged, and the merged case times are labeled, which makes it easier for the police to analyze the case. The specific process of step 5 is:
step 51: judging whether two adjacent time elements appear in the same text clause; when they do and the former time is earlier than the latter, merging the standard case times corresponding to the two time elements to form a case time period; otherwise, executing the next step;
step 52: calculating the difference in hours between two adjacent time elements; when the difference is less than 24 hours and the former time is earlier than the latter, merging the standard case times corresponding to the two time elements to form a case time period; otherwise, executing the next step;
step 53: searching for keywords in the text clauses; when keywords meaning "start" exist in the text clause corresponding to the former of two adjacent time elements, keywords meaning "end" exist in the text clause corresponding to the latter, and the former time is earlier than the latter, merging the standard case times corresponding to the two time elements to form a case time period; otherwise, executing the next step;
step 54: designating the standard case times corresponding to the remaining time elements as case time points;
step 55: labeling the case time periods and case time points in chronological order, for example as "first case time", "second case time", "third case time", and so on.
In the alert text, the case time "7 o'clock in the evening of July 21, 2020" is a case time point, while "6:49 on July 25, 2020" and "8:10 on July 25" are merged into a case time period; they can be labeled as the "first case time", 2020072119, and the "second case time", 2020072506 to 2020072508. This approach not only facilitates digital archiving but also labels the case times, so that police officers do not have to memorize them and can see from the labels the order in which the incidents occurred. This makes handling the case considerably more convenient and helps officers analyze it quickly and accurately.
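The merging and labeling of steps 51 to 55 can be sketched in Python as follows. This is a sketch under assumptions: each item carries its standard case time and the text clause it came from, the keyword tuples `START_WORDS` and `END_WORDS` are hypothetical placeholders, adjacent items are merged greedily in pairs, and the labels are written as "case time 1", "case time 2" for brevity.

```python
from datetime import datetime

START_WORDS = ("start",)  # hypothetical keyword lists
END_WORDS = ("end",)


def parse(standard):
    """Parse a 10-digit yyyymmddhh standard case time."""
    return datetime.strptime(standard, "%Y%m%d%H")


def should_merge(a, b):
    """Apply the tests of steps 51 to 53 to two adjacent items."""
    ta, tb = parse(a["time"]), parse(b["time"])
    if ta >= tb:                                   # former must be earlier
        return False
    if a["clause"] == b["clause"]:                 # step 51: same clause
        return True
    if (tb - ta).total_seconds() < 24 * 3600:      # step 52: under 24 hours
        return True
    return (any(w in a["clause"] for w in START_WORDS)      # step 53
            and any(w in b["clause"] for w in END_WORDS))


def merge_and_label(items):
    """Return labels mapped to a time point (1-tuple) or period (2-tuple)."""
    groups, i = [], 0
    while i < len(items):
        if i + 1 < len(items) and should_merge(items[i], items[i + 1]):
            groups.append((items[i]["time"], items[i + 1]["time"]))
            i += 2
        else:
            groups.append((items[i]["time"],))     # step 54: time point
            i += 1
    groups.sort(key=lambda g: g[0])                # step 55: chronological
    return {f"case time {n}": g for n, g in enumerate(groups, 1)}


items = [
    {"time": "2020072119", "clause": "clause A"},
    {"time": "2020072506", "clause": "clause B"},
    {"time": "2020072508", "clause": "clause C"},
]
labels = merge_and_label(items)
```

Because the standard format is fixed-width, sorting the strings directly gives chronological order.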
Example 2: Embodiment 2 of the invention provides a standard case time extraction system for alert text, comprising a first module, a second module, a third module, a fourth module and a fifth module;
the first module is used for sequentially extracting the time elements in the alert text by named entity recognition;
the second module is used for dividing the alert text into a plurality of text clauses and constructing key-value pairs of the text clauses and the time elements;
the third module is used for establishing and training a case time recognition model, and recognizing the content expressed in the text clauses through the case time recognition model to determine the case time;
the fourth module is used for standardizing the determined case time;
the fifth module is used for merging the standardized case times and labeling the merged case times;
the first to fifth modules of the system implement the standard case time extraction method for alert text of embodiment 1, so the system has the same technical effects and advantages as that method.
Example 3: Embodiment 3 of the present invention provides a computer processing system including a storage module; a computer program for implementing the standard case time extraction method for alert text of any embodiment is stored in the storage module; because the computer processing system can implement that method, it also has the technical effects of the method.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (10)
1. A method for extracting the standard case time of an alarm text is characterized by comprising the following steps:
step 1: sequentially extracting time elements in the warning situation text in a named entity identification mode;
step 2: cutting the alert text into a plurality of text clauses, and constructing key value pairs of the text clauses and time elements;
step 3: establishing and training a case time recognition model, and recognizing the expression content in the text clauses through the case time recognition model to determine case time;
step 4: carrying out standardization processing on the determined case time;
step 5: merging the case times after the standardization processing, and labeling the merged case times.
2. The method for extracting the standard case time of the alarm text according to claim 1, wherein the step 1 adopts a regular expression to extract the time elements, and the specific process is as follows:
step 11: firstly, removing the content in parentheses in the warning situation text, and eliminating time element interference information in the parenthesis content;
step 12: then, extracting time elements in the text by using a regular expression, wherein the regular expression is as follows: ([0-9]{4} year)?([0-9]{1,2} month)?([0-9]{1,2} day)?(today | yesterday | day before)?[\\u4E00-\\u9FA5]?(night | morning | am | afternoon | evening)?[\\u4E00-\\u9FA5]?([0-9]{0,2}[hour | point])?([0-9]{0,2} minute);
in the formula:
([0-9]{4} year), representing four digits plus "year", to match the year;
([0-9]{1,2} month), representing one or two digits plus "month", to match the month;
([0-9]{1,2} day), representing one or two digits plus "day", to match the day;
(today | yesterday | day before)[\\u4E00-\\u9FA5], to match the relative date descriptions "today", "yesterday" and "the day before yesterday";
(night | morning | am | afternoon | evening)[\\u4E00-\\u9FA5], to match the period descriptions "night", "morning", "am", "afternoon" and "evening";
([0-9]{1,2}[hour | point]), representing one or two digits plus "hour" or "point", to match the hour;
([0-9]{1,2} minute), representing one or two digits plus "minute", to match the minute.
3. The method for extracting the standard case time of the warning text according to claim 1, wherein the step 2 further comprises:
firstly, arranging the extracted time elements in sequence according to the sequence of the occurrence in the warning situation text, and setting the first time as the warning time;
then, dividing the warning situation text into a plurality of text clauses through punctuation mark regular matching;
finally, determining text clauses where time elements except the alarm time are located; if the text clauses contain time elements and the left clause and the right clause of the text clauses do not contain the time elements, combining the left text clause and the right text clause which do not contain the time elements and the clauses containing the time elements to form new text clauses; and constructing key value pairs of which the time elements correspond to the text clauses one by one.
4. The method for extracting the standard case time of the warning text according to claim 1, wherein the case time recognition model in the step 3 comprises a pre-training model and a discriminant model;
the method comprises the steps that a pre-training model firstly establishes a database, training data in the database are derived from historical warning situation data of artificially marked case sending time, and case sending time in a warning situation text is determined by comparing text clauses containing time elements in a warning situation text with the training data; automatically marking the judged text clause data and then supplementing the text clause data into a database;
the discrimination model comprises an input layer, a hidden layer and an output layer; the input layer is a text clause which is used for segmenting the warning situation text and contains time elements, and the number of nodes is the number of the text clauses; the hidden layer is data newly added into the database in the pre-training process and original data in the database; the output layer determines whether the time elements in the text clauses are the case time or not through comparison, and the number of the nodes of the output layer is equal to the number of the text clauses needing to be judged; aiming at the condition that the data extension exceeds the training database in the judging process, the inputted text clauses are processed manually, the processed data is supplemented into the database, and the data of the hidden layer is gradually increased along with the increase of the training process;
the discrimination model carries out error measurement and calculation on the discrimination result:
in the formula, X_ij is a text clause sample containing a time element, P(X_ij) is the probability that the time element in the text clause is the case time, Q(X_ij) is the probability that the time element in the text clause is not the case time, and P(X_ij) + Q(X_ij) = 1; M is the number of nodes of the hidden layer, and N is the number of text clause samples containing time elements; the smaller the value of H(P, Q), the smaller the error of the discrimination result.
5. The method for extracting the standard case time of the warning text according to claim 1, wherein the step 4 is further as follows:
step 41: directly determining the time elements "year, month, day, hour" by the regular expressions "[0-9]{4} year", "[0-9]{1,2} month", "[0-9]{1,2} day" and "[0-9]{0,2}[hour | point]", and performing the next step;
step 42: if "night", "afternoon" or "evening" appears in the time element text and the hour value matched by "[0-9]{0,2}[hour | point]" is less than 12, adding 12 to the hour value;
step 43: if the "day" element is missing from the time element and the time element contains "today", "yesterday" or "the day before yesterday", deriving the corresponding "day" element by counting back 0, 1 or 2 days from the alarm time;
step 44: if any single element of "year, month, day, hour" is missing from the time element, filling it with the corresponding element of the preceding time element;
step 45: standardizing the time elements to form the standard case time in the 10-digit numeric format "yyyymmddhh".
6. The method for extracting the standard case time of the warning text according to claim 1, wherein the step 5 further comprises:
step 51: judging whether two adjacent time elements appear in the same text clause or not, when the two adjacent time elements appear in the same text clause and the previous time is earlier than the next time, merging the standard case sending time corresponding to the two time elements to form a case sending time period, otherwise, executing the next step;
step 52: calculating the hour difference between two adjacent time elements, and combining the standard case sending times corresponding to the two time elements to form a case sending time period when the difference between the two adjacent time elements is less than 24 hours and the former time is earlier than the latter time, otherwise executing the next step;
step 53: searching for keywords in the text clauses; when keywords meaning "start" exist in the text clause corresponding to the former of two adjacent time elements, keywords meaning "end" exist in the text clause corresponding to the latter, and the former time is earlier than the latter, merging the standard case times corresponding to the two time elements to form a case time period; otherwise, executing the next step;
step 54: designating the standard case times corresponding to the remaining time elements as case time points;
step 55: and marking the case time period and the case time point according to the time sequence.
7. A standard case time extraction system of an alarm text is characterized by comprising:
the first module is used for sequentially extracting the time elements in the warning situation text in a named entity identification mode;
a second module for dividing the warning situation text into a plurality of text clauses and constructing key value pairs of the text clauses and the time elements;
a third module for establishing and training a case time recognition model, and recognizing the expression content in the text clauses through the case time recognition model to determine the case time;
a fourth module for standardizing the determined case time;
and the fifth module is used for merging the case time after the standardization processing and further marking the case time after merging.
8. The standard case time extraction system of an alert text according to claim 7,
the first module extracts time elements by adopting a regular expression, firstly removes the content in brackets in the warning situation text, eliminates the time element interference information in the bracket content, and then extracts the time elements in the text by utilizing the regular expression, wherein the regular expression is as follows:
([0-9]{4} year)?([0-9]{1,2} month)?([0-9]{1,2} day)?(today | yesterday | day before)?[\\u4E00-\\u9FA5]?(night | morning | am | afternoon | evening)?[\\u4E00-\\u9FA5]?([0-9]{0,2}[hour | point])?([0-9]{0,2} minute);
in the formula:
([0-9]{4} year), representing four digits plus "year", to match the year;
([0-9]{1,2} month), representing one or two digits plus "month", to match the month;
([0-9]{1,2} day), representing one or two digits plus "day", to match the day;
(today | yesterday | day before)[\\u4E00-\\u9FA5], to match the relative date descriptions "today", "yesterday" and "the day before yesterday";
(night | morning | am | afternoon | evening)[\\u4E00-\\u9FA5], to match the period descriptions "night", "morning", "am", "afternoon" and "evening";
([0-9]{1,2}[hour | point]), representing one or two digits plus "hour" or "point", to match the hour;
([0-9]{1,2} minute), representing one or two digits plus "minute", to match the minute;
the second module firstly arranges the extracted time elements in sequence according to the sequence of the alarm situation texts, and sets the first time as the alarm time; then, dividing the warning situation text into a plurality of text clauses through punctuation mark regular matching; finally, determining text clauses where time elements except the alarm time are located; if the text clauses contain time elements and the left clause and the right clause of the text clauses do not contain the time elements, combining the left text clause and the right text clause which do not contain the time elements and the clauses containing the time elements to form new text clauses; constructing key value pairs of which the time elements correspond to the text clauses one by one;
the third module establishes and trains a case time recognition model, and the case time recognition model comprises a pre-training model and a discrimination model;
firstly, establishing a database by a pre-training model, wherein training data in the database is derived from historical alarm condition data of artificially marked case sending time, and determining the case sending time in the alarm condition text by comparing text clauses containing time elements in the alarm condition text with the training data; automatically marking the judged text clause data and then supplementing the text clause data into a database;
the discrimination model comprises an input layer, a hidden layer and an output layer; the input layer is a text clause which is used for segmenting the warning situation text and contains time elements, and the number of nodes is the number of the text clauses; the hidden layer is data newly added into the database in the pre-training process and original data in the database; the output layer determines whether the time elements in the text clauses are the case time or not through comparison, and the number of the nodes of the output layer is equal to the number of the text clauses needing to be judged; aiming at the condition that the data extension exceeds the training database in the judging process, the inputted text clauses are processed manually, the processed data is supplemented into the database, and the data of the hidden layer is gradually increased along with the increase of the training process;
the discrimination model carries out error measurement and calculation on the discrimination result:
in the formula, X_ij is a text clause sample containing a time element, P(X_ij) is the probability that the time element in the text clause is the case time, Q(X_ij) is the probability that the time element in the text clause is not the case time, and P(X_ij) + Q(X_ij) = 1; M is the number of nodes of the hidden layer, and N is the number of text clause samples containing time elements; the smaller the value of H(P, Q), the smaller the error of the discrimination result.
9. The standard case time extraction system of an alert text according to claim 7,
the fourth module first directly determines the time elements "year, month, day, hour" by the regular expressions "[0-9]{4} year", "[0-9]{1,2} month", "[0-9]{1,2} day" and "[0-9]{0,2}[hour | point]"; when "night", "afternoon" or "evening" appears in the time element text and the hour value matched by "[0-9]{0,2}[hour | point]" is less than 12, 12 is added to the hour value; when the "day" element is missing from the time element and the time element contains "today", "yesterday" or "the day before yesterday", the corresponding "day" element is derived by counting back 0, 1 or 2 days from the alarm time; if any single element of "year, month, day, hour" is missing from the time element, it is filled with the corresponding element of the preceding time element; finally, the time elements are standardized to form the standard case time in the 10-digit numeric format "yyyymmddhh";
the fifth module first judges whether two adjacent time elements appear in the same text clause, and when they do and the former time is earlier than the latter, merges the standard case times corresponding to the two time elements to form a case time period; it calculates the difference in hours between two adjacent time elements, and when the difference is less than 24 hours and the former time is earlier than the latter, merges the standard case times corresponding to the two time elements to form a case time period; it searches for keywords in the text clauses, and when keywords meaning "start" exist in the text clause corresponding to the former of two adjacent time elements, keywords meaning "end" exist in the text clause corresponding to the latter, and the former time is earlier than the latter, merges the standard case times corresponding to the two time elements to form a case time period; it designates the standard case times corresponding to the remaining time elements as case time points; and finally it labels the case time periods and case time points in chronological order.
10. A computer processing system comprising a storage module, wherein a computer program for implementing the standard case time extraction method of the alert text according to any one of claims 1 to 6 is stored in the storage module.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011195667.3A CN112541075B (en) | 2020-10-30 | 2020-10-30 | Standard case sending time extraction method and system for alert text |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112541075A (en) | 2021-03-23 |
CN112541075B (en) | 2024-04-05 |
Family
ID=75013660
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011195667.3A Active CN112541075B (en) | 2020-10-30 | 2020-10-30 | Standard case sending time extraction method and system for alert text |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112541075B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116108163A (en) * | 2023-04-04 | 2023-05-12 | 之江实验室 | Text matching method, device, equipment and storage medium |
Citations (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170300565A1 (en) * | 2016-04-14 | 2017-10-19 | Xerox Corporation | System and method for entity extraction from semi-structured text documents |
US20180012462A1 (en) * | 2016-07-11 | 2018-01-11 | Google Inc. | Methods and Systems for Providing Event Alerts |
CN108305050A (en) * | 2018-02-08 | 2018-07-20 | 贵州小爱机器人科技有限公司 | Information of reporting a case to the security authorities and the extracting method of service requirement information, device, equipment and medium |
CN108920461A (en) * | 2018-06-26 | 2018-11-30 | 武大吉奥信息技术有限公司 | A kind of polymorphic type and entity abstracting method and device containing complex relationship |
US20190034499A1 (en) * | 2017-07-29 | 2019-01-31 | Splunk Inc. | Navigating hierarchical components based on an expansion recommendation machine learning model |
CN109472419A (en) * | 2018-11-16 | 2019-03-15 | 中山大学 | Method for building up, device and the storage medium of alert prediction model based on space-time |
CN110287292A (en) * | 2019-07-04 | 2019-09-27 | 科大讯飞股份有限公司 | A kind of judge's measurement of penalty irrelevance prediction technique and device |
CN110941702A (en) * | 2019-11-26 | 2020-03-31 | 北京明略软件系统有限公司 | Retrieval method and device for laws and regulations and laws and readable storage medium |
CN110990562A (en) * | 2019-10-29 | 2020-04-10 | 新智认知数字科技股份有限公司 | Alarm classification method and system |
CN111047092A (en) * | 2019-12-11 | 2020-04-21 | 深圳前海环融联易信息科技服务有限公司 | Dispute case victory rate prediction method and device, computer equipment and storage medium |
US20200126174A1 (en) * | 2018-08-10 | 2020-04-23 | Rapidsos, Inc. | Social media analytics for emergency management |
CN111062834A (en) * | 2019-12-11 | 2020-04-24 | 深圳前海环融联易信息科技服务有限公司 | Dispute case entity identification method and device, computer equipment and storage medium |
CN111260223A (en) * | 2020-01-17 | 2020-06-09 | 山东省计算中心(国家超级计算济南中心) | Intelligent identification and early warning method, system, medium and equipment for trial and judgment risk |
WO2020114373A1 (en) * | 2018-12-07 | 2020-06-11 | 北京国双科技有限公司 | Method and apparatus for realizing element recognition in judicial document |
CN111274804A (en) * | 2020-01-17 | 2020-06-12 | 珠海市新德汇信息技术有限公司 | Case information extraction method based on named entity recognition |
CN111680512A (en) * | 2020-05-11 | 2020-09-18 | 上海阿尔卡特网络支援系统有限公司 | Named entity recognition model, telephone exchange switching extension method and system |
CN111783420A (en) * | 2020-06-19 | 2020-10-16 | 上海交通大学 | Anti-complaint book element extraction method, system, medium and device based on BERT model |
Non-Patent Citations (2)
Title |
---|
JING LI: "A Survey on Deep Learning for Named Entity Recognition", 《IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING》, 30 October 2020 (2020-10-30), pages 50 - 70 * |
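YANG FENG et al.: "Implementation method for emergency intelligence sensing based on scenario similarity", 《情报学报》 (Journal of the China Society for Scientific and Technical Information), 31 May 2019 (2019-05-31), pages 525 - 533 * |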
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116108163A (en) * | 2023-04-04 | 2023-05-12 | 之江实验室 | Text matching method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN112541075B (en) | 2024-04-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3846048A1 (en) | Online log analysis method, system, and electronic terminal device thereof | |
CN107766371B (en) | Text information classification method and device | |
CN110413787B (en) | Text clustering method, device, terminal and storage medium | |
CN110175334B (en) | Text knowledge extraction system and method based on custom knowledge slot structure | |
CN111259160B (en) | Knowledge graph construction method, device, equipment and storage medium | |
CN110941720A (en) | Knowledge base-based specific personnel information error correction method | |
CN111259951A (en) | Case detection method and device, electronic equipment and readable storage medium | |
CN115168345B (en) | Database classification method, system, device and storage medium | |
CN113268615A (en) | Resource label generation method and device, electronic equipment and storage medium | |
CN111859070A (en) | Mass internet news cleaning system | |
CN114090736A (en) | Enterprise industry identification system and method based on text similarity | |
CN112069383A (en) | News text event and time extraction and normalization system for event tracking | |
CN112328792A (en) | Optimization method for recognizing credit events based on DBSCAN clustering algorithm | |
CN113268982A (en) | Network table structure identification method and device, computer device and computer readable storage medium | |
CN115936624A (en) | Basic level data management method and device | |
CN112541075B (en) | Standard case sending time extraction method and system for alert text | |
CN109960707B (en) | College recruitment data acquisition method and system based on artificial intelligence | |
CN109542845B (en) | Text metadata extraction method based on keyword expression | |
CN113761137A (en) | Method and device for extracting address information | |
CN111291535A (en) | Script processing method and device, electronic equipment and computer readable storage medium | |
CN110765107A (en) | Question type identification method and system based on digital coding | |
CN113468315B (en) | Vulnerability vendor name matching method | |
CN115994531A (en) | Multi-dimensional text comprehensive identification method | |
CN114298041A (en) | Network security named entity identification method and identification device | |
CN113609864A (en) | Text semantic recognition processing system and method based on industrial control system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||