CN107729314A - Chinese time recognition method, device, storage medium, and program product - Google Patents

Chinese time recognition method, device, storage medium, and program product

Info

Publication number
CN107729314A
Authority
CN
China
Prior art keywords
time
word
identified
temporal expression
expression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710912117.0A
Other languages
Chinese (zh)
Other versions
CN107729314B (en)
Inventor
刘嘉伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Neusoft Corp
Original Assignee
Neusoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Neusoft Corp
Priority to CN201710912117.0A
Publication of CN107729314A
Application granted
Publication of CN107729314B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

This application discloses a Chinese time recognition method. The method includes: segmenting the text to be recognized to obtain a word segmentation result; matching candidate strings (the character strings to be recognized) against regular rules for recognizing time basis words, and determining each candidate string that matches a regular rule to be a time basis word, where a candidate string consists of one segmented word or several consecutive segmented words from the segmentation result; and labelling an independent time basis word as a temporal expression in the text, or combining multiple time basis words that satisfy a preset condition and labelling the combination as a temporal expression in the text. By first identifying the time basis words that carry time elements and then labelling independent time basis words and/or combinations of time basis words as temporal expressions according to the preset condition, the method recognizes Chinese time expressions despite the flexible structure of Chinese wording. The application also discloses a Chinese time recognition device.

Description

Chinese time recognition method, device, storage medium, and program product
Technical field
This application relates to the field of computer technology, and in particular to a Chinese time recognition method, device, storage medium, and program product.
Background
Natural language processing is an important direction in computer science and artificial intelligence. It studies the theories and methods that enable effective communication between people and computers in natural language. In daily life, time is a very important dimension and plays a key role: understanding the context of many things depends on time, for example using time to place events in the order in which they occurred. Enabling computers to recognize time correctly is therefore of great practical significance.
Because the structure between Chinese words is flexible and sentence patterns are complex, the prior art lacks a scheme that can accurately recognize Chinese time expressions.
Summary of the invention
In view of this, the application provides a Chinese time recognition method, device, storage medium, and program product, to solve the problem that Chinese time expressions cannot be accurately recognized in the prior art.
To solve the above problem, the embodiments of the application provide the following technical solutions:
A Chinese time recognition method, the method including:
Segmenting the text to be recognized to obtain a word segmentation result;
Matching candidate strings against regular rules for recognizing time basis words, and determining each candidate string that matches a regular rule to be a time basis word, where a candidate string includes one segmented word or multiple consecutive segmented words in the segmentation result;
Labelling an independent time basis word as a temporal expression in the text to be recognized, or combining multiple time basis words that satisfy a preset condition and labelling the combination as a temporal expression in the text to be recognized.
Correspondingly, the method further includes:
Classifying each temporal expression as a first temporal expression, a second temporal expression, or a third temporal expression, where a first temporal expression contains a definite time, a second temporal expression contains an indefinite time, and a third temporal expression is any other temporal expression;
Converting the time of the second temporal expression, using the nearest first temporal expression preceding the second temporal expression as the reference time point.
Correspondingly, classifying the temporal expression as a first temporal expression, a second temporal expression, or a third temporal expression includes:
Training a support vector machine (SVM) model on a corpus of temporal expressions labelled as definite time, indefinite time, or other time;
Inputting the temporal expression into the SVM model, which classifies it as a first temporal expression, a second temporal expression, or a third temporal expression.
Correspondingly, converting the time of the second temporal expression, using the nearest first temporal expression preceding it as the reference time point, includes:
Identifying a trigger word in the second temporal expression by means of a preset trigger word list;
Obtaining, from the trigger word list, the time conversion mode corresponding to the trigger word;
Using the nearest first temporal expression preceding the second temporal expression as the reference time point, converting the trigger word in the second temporal expression according to the time conversion mode corresponding to the trigger word.
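The trigger-word conversion described in the three steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the contents of the preset trigger word list are not given in this text, so the trigger words, their day offsets, and the names TRIGGER_OFFSETS and convert_relative are hypothetical, and the reference time point is assumed to have already been parsed into a date.

```python
from datetime import date, timedelta

# Hypothetical trigger word list: trigger word -> conversion (a day offset).
TRIGGER_OFFSETS = {
    "大后天": timedelta(days=3),
    "大前天": timedelta(days=-3),
    "后天": timedelta(days=2),
    "明天": timedelta(days=1),
    "昨天": timedelta(days=-1),
    "前天": timedelta(days=-2),
}

def convert_relative(expression, reference):
    """Resolve the trigger word in `expression` against `reference`.

    Returns the resolved date, or None when no trigger word is present.
    """
    # Try longer trigger words first so "大后天" is not read as "后天".
    for word in sorted(TRIGGER_OFFSETS, key=len, reverse=True):
        if word in expression:
            return reference + TRIGGER_OFFSETS[word]
    return None

# Reference point: the nearest preceding definite expression, e.g. 2017-09-13.
reference = date(2017, 9, 13)
print(convert_relative("明天下午3点", reference))  # 2017-09-14
```

The sketch resolves only the date part; a fuller version would also carry over the time of day ("下午3点") from the second temporal expression.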
Correspondingly, segmenting the text to be recognized to obtain a word segmentation result includes:
Training on a sequence-labelled corpus with the conditional random field (CRF) algorithm to generate a CRF model;
Inputting the text to be recognized into the CRF model for segmentation to obtain the word segmentation result.
Correspondingly, the method further includes:
Obtaining the regular rules for recognizing time basis words that correspond to each time category, the time categories including absolute time, relative time, time period, time frequency, fuzzy time, holiday time, dynasty time, and event time.
Correspondingly, combining multiple time basis words that satisfy a preset condition and labelling the combination as a temporal expression in the text to be recognized includes:
Combining multiple consecutive time basis words and labelling the combination as a temporal expression in the text to be recognized; or, when only structural particles exist between multiple time basis words, combining the time basis words together with the structural particles between them and labelling the combination as a temporal expression in the text to be recognized.
A Chinese time recognition device, the device including:
A segmentation unit, configured to segment the text to be recognized to obtain a word segmentation result;
A determining unit, configured to match candidate strings against regular rules for recognizing time basis words, and to determine each candidate string that matches a regular rule to be a time basis word, where a candidate string includes one segmented word or multiple consecutive segmented words in the segmentation result;
A labelling unit, configured to label an independent time basis word as a temporal expression in the text to be recognized, or to combine multiple time basis words that satisfy a preset condition and label the combination as a temporal expression in the text to be recognized.
Correspondingly, the device further includes:
A classification unit, configured to classify each temporal expression as a first temporal expression, a second temporal expression, or a third temporal expression, where a first temporal expression contains a definite time, a second temporal expression contains an indefinite time, and a third temporal expression is any other temporal expression;
A conversion unit, configured to convert the time of the second temporal expression, using the nearest first temporal expression preceding the second temporal expression as the reference time point.
Correspondingly, the classification unit includes:
A training subunit, configured to train a support vector machine (SVM) model on a corpus of temporal expressions labelled as definite time, indefinite time, or other time;
A classification subunit, configured to input the temporal expression into the SVM model, which classifies it as a first temporal expression, a second temporal expression, or a third temporal expression.
Correspondingly, the conversion unit includes:
An identification subunit, configured to identify a trigger word in the second temporal expression by means of a preset trigger word list;
An obtaining subunit, configured to obtain, from the trigger word list, the time conversion mode corresponding to the trigger word;
A conversion subunit, configured to use the nearest first temporal expression preceding the second temporal expression as the reference time point and to convert the trigger word in the second temporal expression according to the time conversion mode corresponding to the trigger word.
Correspondingly, the segmentation unit includes:
A generation subunit, configured to train on a sequence-labelled corpus with the conditional random field (CRF) algorithm to generate a CRF model;
A segmentation subunit, configured to input the text to be recognized into the CRF model for segmentation to obtain the word segmentation result.
Correspondingly, the device further includes:
An obtaining unit, configured to obtain the regular rules for recognizing time basis words that correspond to each time category, the time categories including absolute time, relative time, time period, time frequency, fuzzy time, holiday time, dynasty time, and event time.
Correspondingly, the labelling unit is specifically configured to:
Combine multiple consecutive time basis words and label the combination as a temporal expression in the text to be recognized; or, when only structural particles exist between multiple time basis words, combine the time basis words together with the structural particles between them and label the combination as a temporal expression in the text to be recognized.
A computer-readable storage medium storing instructions which, when run on a terminal device, cause the terminal device to perform the above Chinese time recognition method.
A computer program product which, when run on a terminal device, causes the terminal device to perform the above Chinese time recognition method.
It can be seen that the embodiments of the application have the following beneficial effects:
The embodiments of the application first segment the text to be recognized, match the segmented words against preset regular rules, and determine the segmented words that match a regular rule to be time basis words. The time basis words that carry time elements can thus be accurately identified in the text. An independent time basis word can then be taken as a temporal expression in the text, or multiple time basis words can be combined into a temporal expression in the text, so that Chinese time expressions are recognized despite the flexible structure of Chinese wording.
Brief description of the drawings
Fig. 1 is a flow chart of a Chinese time recognition method provided by an embodiment of the application;
Fig. 2 is a flow chart of a Chinese time recognition method provided by another embodiment of the application;
Fig. 3 is a schematic diagram of a Chinese time recognition device provided by an embodiment of the application;
Fig. 4 is a schematic diagram of another Chinese time recognition device provided by an embodiment of the application.
Detailed description
To make the above objects, features, and advantages of the application clearer and easier to understand, the embodiments of the application are described in further detail below with reference to the accompanying drawings and specific implementations.
Natural language processing is an important direction in computer science and artificial intelligence. It studies the theories and methods that enable effective communication between people and computers in natural language. In daily life, time is a very important dimension and plays a key role: understanding the context of many things depends on time, for example using time to place events in the order in which they occurred. Enabling computers to recognize time correctly is therefore of great practical significance.
Because the structure between Chinese words is flexible and sentence patterns are complex, the prior art lacks a scheme that can accurately recognize Chinese time expressions. Traditional Chinese time recognition methods can only recognize some simple temporal expressions, such as "tomorrow" or "2017"; for complex temporal expressions their accuracy is low and cannot meet users' needs.
The inventor of this application found that a machine-learning CRF (Conditional Random Field) algorithm can be combined with regular rules: the text to be recognized is segmented, time basis words are identified in it, and each time basis word is either used on its own as a temporal expression or combined with other time basis words according to a preset condition to form a temporal expression. This can greatly improve both the speed and the accuracy of Chinese time recognition.
The embodiments of the application are described in further detail below with reference to the accompanying drawings.
Fig. 1 shows a flow chart of a Chinese time recognition method provided by an embodiment of the application. Referring to Fig. 1, the method includes:
S101: Segment the text to be recognized to obtain a word segmentation result.
The text to be recognized is the text in which time expressions are to be recognized. It may be selected by the user in a document or entered by the user in an input box. Segmentation is the process of dividing the text into words. The word segmentation result is the output of that process; it may include each segmented word of the text and the part of speech of each segmented word.
In one possible implementation of the embodiments of the application, a sequence-labelled corpus can be trained with the CRF algorithm to generate a CRF model, and the text to be recognized is input into the CRF model for segmentation to obtain the word segmentation result.
A CRF model is a discriminative undirected probabilistic graphical model of the conditional probability distribution of one set of output random variables given another set of input random variables. According to the application, a CRF model can overcome the label bias problem that the HMM (Hidden Markov Model) and the MEMM (Maximum Entropy Markov Model) have in sequence labelling.
Sequence labelling is the process of labelling the word-position information of each character in a character string. Word-position information can be word beginning, word middle, word end, or single-character word, denoted B, M, E, and S respectively. As an example, the text to be recognized may be "我喜欢吃青苹果" ("I like eating green apples"); after sequence labelling it becomes "我/S 喜/B 欢/E 吃/S 青/B 苹/M 果/E". The characters from a word beginning to the nearest following word end form one segmented word in the segmentation result, for example "喜欢" ("like") and "青苹果" ("green apple"); a single-character word forms a segmented word on its own, for example "我" ("I") and "吃" ("eat").
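The way B/M/E/S labels determine the segmented words can be sketched in Python as follows. This is a minimal illustration that only decodes labels which are already given; it does not train or run a CRF model, and the function name decode_bmes is an assumption made for this example.

```python
def decode_bmes(chars, tags):
    """Group characters into words from B/M/E/S word-position tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        if tag == "S":        # single-character word
            words.append(ch)
            current = ""
        elif tag == "B":      # word beginning
            current = ch
        elif tag == "M":      # word middle
            current += ch
        elif tag == "E":      # word end: emit the finished word
            words.append(current + ch)
            current = ""
        else:
            raise ValueError(f"unknown tag: {tag!r}")
    return words

print(decode_bmes("我喜欢吃青苹果", "SBESBME"))
# ['我', '喜欢', '吃', '青苹果']
```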
The labelled character strings serve as the corpus for training the CRF model. Training on the sequence-labelled corpus with the CRF algorithm produces a CRF model, which is then used to segment the text to be recognized and obtain the word segmentation result. Training the CRF model on a larger sequence-labelled corpus helps improve the accuracy with which it segments the text to be recognized.
S102: Match candidate strings against the regular rules for recognizing time basis words, and determine each candidate string that matches a regular rule to be a time basis word.
Because Chinese structure is flexible and sentence patterns are complex, time can be expressed in many different ways. That is, there are many kinds of temporal expression; if each kind had to be recognized by its own regular rule, a large number of rules would be needed to process the text, which would reduce recognition speed. The embodiments of the application therefore divide a temporal expression into one or more time basis words for recognition. A time basis word is the smallest constituent unit of a temporal expression. Because there are far fewer kinds of time basis word than kinds of temporal expression, the number of regular rules needed is also greatly reduced, and recognizing time basis words takes much less time than recognizing temporal expressions directly.
In the embodiments of the application, candidate strings are matched against the regular rules for recognizing time basis words, and each candidate string that matches a regular rule is determined to be a time basis word.
A candidate string may include one segmented word or multiple consecutive segmented words from the segmentation result. On the one hand, each segmented word in the segmentation result can itself be a candidate string. On the other hand, a preset number of following segmented words can be appended to a segmented word to form further candidate strings: for example, the segmented word alone, the segmented word plus the next 1 segmented word, the segmented word plus the next 2 segmented words, and so on. One segmented word can therefore correspond to multiple candidate strings.
For ease of understanding, candidate strings are illustrated with a specific example. Suppose the segmentation result is "2017年 9月 13日 之 前 提交" ("submitted before 13 September 2017"). The candidate strings include each segmented word, namely "2017年", "9月", "13日", "之", "前", and "提交", as well as the strings formed by appending a preset number of following segmented words to each segmented word. The candidate strings corresponding to one segmented word are each matched against the regular rules, and the longest candidate string that matches a rule is determined to be the time basis word for that segmented word. Take a preset length of 4 segmented words as an example, and suppose there are regular rules for recognizing a year, a month, a day, and "before a day". The segmented word "2017年" corresponds to the candidate strings "2017年", "2017年9月", "2017年9月13日", and "2017年9月13日之". Only "2017年" matches a rule (the rule for a year), so "2017年" is identified as a time basis word; likewise, "9月" is identified as a time basis word. The segmented word "13日" corresponds to the candidate strings "13日", "13日之", "13日之前", and "13日之前提交". "13日" matches the rule for a day and "13日之前" matches the rule for "before a day". Since both match, the longest matching candidate string for the segmented word "13日", namely "13日之前", is taken as the time basis word.
Matching both a segmented word and the strings formed by appending a preset number of following segmented words avoids misidentifying time basis words such as "13日之前" ("before the 13th") or "500年前" ("500 years ago") as "13日" or "500年", which improves the accuracy of Chinese time recognition.
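The candidate-string matching with a longest-match preference can be sketched in Python as follows. This is an illustration under stated assumptions rather than the application's implementation: the four patterns in RULES, the function name find_basis_words, and the choice to skip past segmented words already consumed by a match are all assumptions made for this example.

```python
import re

# Hypothetical rules for a year, a month, a day, and "before a day".
RULES = [
    re.compile(r"\d{4}年"),
    re.compile(r"(1[0-2]|\d)月"),
    re.compile(r"\d{1,2}日"),
    re.compile(r"\d{1,2}日之前"),
]

def find_basis_words(tokens, max_len=4):
    """Return the longest rule-matching candidate string at each position."""
    found = []
    i = 0
    while i < len(tokens):
        best = 0  # number of tokens in the longest match starting at i
        for n in range(1, max_len + 1):
            if i + n > len(tokens):
                break
            candidate = "".join(tokens[i:i + n])
            if any(rule.fullmatch(candidate) for rule in RULES):
                best = n
        if best:
            found.append("".join(tokens[i:i + best]))
            i += best  # assumption: consumed tokens are not reused
        else:
            i += 1
    return found

tokens = ["2017年", "9月", "13日", "之", "前", "提交"]
print(find_basis_words(tokens))  # ['2017年', '9月', '13日之前']
```

Because the loop keeps the largest matching n, "13日之前" wins over "13日", which is the behaviour the example above describes.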
A regular rule describes a string-matching pattern. It can be used to check whether a string contains a certain substring, or to extract from a string the substrings that satisfy some condition. In the embodiments of the application, a regular rule is specifically a rule for recognizing time basis words. Regular rules can be defined with regular expressions. A regular expression is a text template composed of ordinary characters and metacharacters. Ordinary characters are all printable and non-printable characters that are not explicitly designated as metacharacters, including all uppercase and lowercase letters, digits, punctuation marks, Chinese characters, and so on. Metacharacters are characters with a special meaning. For example, "$" matches the end position of the input string; "\" marks the next character as a special character, a literal character, a backreference, or an escape sequence; "^" matches the start position of the input string; and "|" expresses a logical "or".
For ease of understanding, some examples of regular rules are given below.
The regular rule [\d{4}年] can be used to identify a year written with 4 digits, such as "2017年" (the year 2017).
The regular rule [(\d|10|11|12)月] can be used to identify a number followed by the month character, such as "8月" (August).
In Chinese, the numbers in a time basis word can also be written with Chinese characters, so regular rules that contain Chinese numerals can be added, for example rules in which the digits are replaced by Chinese numerals such as "一" (one) and "二" (two), to identify time basis words such as the year 2017 or the eighth month written in Chinese numerals.
For ambiguous strings, some restrictions can be added to the regular rule. For example, in spoken language "8号" can mean the 8th day of a month or player No. 8. A standalone "xx号" is therefore not judged to be a time; only forms such as "xx月8号" (the 8th of month xx) or "xx号上午" (the morning of day xx) are identified.
The regular rule [(明|大?后|大?前)天] can be used to identify "明天" (tomorrow), "后天" (the day after tomorrow), "大后天" (three days from now), "前天" (the day before yesterday), and "大前天" (three days ago). Here "?" matches the preceding subexpression zero times or once.
The regular rule [(\d+|[一二三四五六七八九十百千万]+)周] can be used to identify time basis words such as "3周" (3 weeks), "22周" (22 weeks), or "三周" (three weeks).
The regular rule [每(一?年|一?月|一?日|一?天|一?周|一?时|一?刻|一?分|一?秒)] can be used to identify time basis words such as "每年" (every year), "每月" (every month), "每分" (every minute), "每一秒" (every second), or "每一刻" (every quarter of an hour).
The regular rule [几(十|百|千|万)?个?小时] can be used to identify time basis words such as "几个小时" or "几小时" (a few hours), "几十小时" (dozens of hours), or "几百个小时" (hundreds of hours).
For holidays, dynasties, and other times, the common holidays around the world, the Chinese dynasties, and other such times can be looked up and regular rules written for them in the same way as above; this is not repeated here.
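The example rules can be checked with Python's standard re module. The patterns below are reconstructions of the examples above written with Chinese characters, and they should be read as assumptions about the intended rules rather than as the application's exact text; the helper name check_rules is also hypothetical.

```python
import re

# Each reconstructed rule is paired with strings it should match in full.
RULES = {
    r"\d{4}年": ["2017年"],
    r"(\d|10|11|12)月": ["8月", "12月"],
    r"(明|大?后|大?前)天": ["明天", "后天", "大后天", "前天", "大前天"],
    r"(\d+|[一二三四五六七八九十百千万]+)周": ["3周", "22周", "三周"],
    r"每(一?年|一?月|一?日|一?天|一?周|一?时|一?刻|一?分|一?秒)":
        ["每年", "每月", "每分", "每一秒", "每一刻"],
    r"几(十|百|千|万)?个?小时": ["几个小时", "几小时", "几十小时", "几百个小时"],
}

def check_rules(rules):
    """Return (pattern, sample) pairs where the sample is NOT fully matched."""
    return [
        (pattern, sample)
        for pattern, samples in rules.items()
        for sample in samples
        if re.fullmatch(pattern, sample) is None
    ]

print(check_rules(RULES))  # [] when every sample matches its rule
```

Using re.fullmatch rather than re.search matters here: a candidate string counts as a time basis word only when the whole string fits the rule.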
In addition, to make the regular rules easier to manage, in some possible implementations of the embodiments of the application, Chinese time can be divided into eight major categories: absolute time, relative time, time period, time frequency, fuzzy time, holiday time, dynasty time, and event time. The regular rules for recognizing time basis words that correspond to each time category can be obtained and matched against the candidate strings to identify the time basis words.
Absolute time is a specific standard time, such as "14:53 on 1 September 2013" or "the 1990s". Relative time is a time expressed relative to some reference time point, such as "tomorrow" or "yesterday". A time period expresses a length of time, such as "700 years" or "two weeks". Time frequency expresses a regularly recurring time, such as "every day" or "every Tuesday". Fuzzy time is a time that cannot be stated specifically, such as "several hundred years" or "a few hours". Holiday time can include international holidays and traditional holidays, such as "Christmas Day", "Labor Day", "May Day", "Mid-Autumn Festival", and "Spring Festival". Dynasty time covers expressions such as "Southern Song Dynasty" or "Spring and Autumn period". Event time covers other nouns that denote the time of an event, such as "the Battle of Red Cliffs" or "the Third Plenary Session of the 11th Central Committee".
Classifying time into the above categories has a practical benefit: because the categories do not overlap or affect one another, the regular rules for each category of time basis word can be managed and maintained separately, which makes the rules easier to integrate and extend later. For example, when a new way of expressing time appears, such as new internet slang, a new regular rule can quickly be added to the corresponding category to recognize it. In addition, after a candidate string has been identified as a time basis word, its part of speech can be labelled as time; the strings labelled as time can then be used as corpus for training the CRF model, which improves the accuracy with which the model recognizes time basis words.
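Keeping the rules grouped by category can be sketched in Python as follows. The dictionary layout, the category keys, the sample patterns, and the function name categorize are assumptions made for this illustration; only some of the eight categories are shown.

```python
import re

# Hypothetical rule sets, one list of patterns per time category.
RULES_BY_CATEGORY = {
    "absolute": [r"\d{4}年", r"(\d|10|11|12)月"],
    "relative": [r"(明|大?后|大?前)天"],
    "period": [r"(\d+|[一二三四五六七八九十百千万]+)周"],
    "frequency": [r"每(一?年|一?月|一?天|一?周)"],
    "fuzzy": [r"几(十|百|千|万)?个?小时"],
}

def categorize(candidate, rules_by_category=RULES_BY_CATEGORY):
    """Return the first category with a rule that fully matches, else None."""
    for category, patterns in rules_by_category.items():
        if any(re.fullmatch(p, candidate) for p in patterns):
            return category
    return None

# Extending one category leaves the rules of the others untouched.
RULES_BY_CATEGORY.setdefault("holiday", []).append(r"(圣诞|中秋|春)节")

print(categorize("大后天"))  # relative
print(categorize("中秋节"))  # holiday
print(categorize("电影"))    # None
```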
S103: Label an independent time basis word as a temporal expression in the text to be identified, or combine multiple time basis words that satisfy a preset condition and label the combination as a temporal expression in the text to be identified.
An independent time basis word can be understood as a single time basis word that generally has no other time basis word at an adjacent position. An adjacent position is the position of the token nearest to the time basis word, not counting structural auxiliary words. An independent time basis word can be identified as a temporal expression.
As an example, in the first text to be identified, "we agreed to go see a film tomorrow", "tomorrow" can be regarded as an independent time basis word, whereas in the second text to be identified, "we agreed to go see a film at 3 o'clock tomorrow afternoon", "tomorrow" is generally not regarded as an independent time basis word.
In some possible implementations of the present application, multiple time basis words that satisfy the preset condition can be combined and labeled in either of two ways. Multiple consecutive time basis words can be combined and labeled as a temporal expression in the text to be identified; or, when only structural auxiliary words exist between multiple time basis words, the time basis words and the structural auxiliary words between them can be combined and labeled as a temporal expression in the text to be identified. At least two time basis words that have nothing between them, or that are separated only by spaces, can be regarded as consecutive time basis words.
As an example, in the second text to be identified, "we agreed to go see a film at 3 o'clock tomorrow afternoon", "tomorrow", "afternoon" and "3 o'clock" can be regarded as consecutive time basis words, so they can be combined and labeled as the temporal expression "3 o'clock tomorrow afternoon". In the third text to be identified, "we agreed to go see a film at 3 o'clock in the afternoon of tomorrow", only the structural auxiliary word "of" exists between the three time basis words "tomorrow", "afternoon" and "3 o'clock", so "tomorrow", "of", "afternoon" and "3 o'clock" can be combined and labeled as the temporal expression "3 o'clock in the afternoon of tomorrow".
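The two combination conditions above can be sketched as a single pass over the segmented tokens. This is a minimal sketch under stated assumptions: `label_expressions`, `STRUCTURAL_AUXILIARIES` and the `is_time_word` callback are hypothetical names, the Chinese tokens are assumed renderings of the translated examples, and the handling of space-separated words is omitted.

```python
STRUCTURAL_AUXILIARIES = {"的"}  # assumed set of structural auxiliary words

def label_expressions(tokens, is_time_word):
    """Merge runs of time basis words into temporal expressions.

    A structural auxiliary word is kept only when it sits between two
    time basis words; a lone time basis word becomes its own expression.
    """
    expressions, current = [], []
    for i, tok in enumerate(tokens):
        next_is_time = i + 1 < len(tokens) and is_time_word(tokens[i + 1])
        if is_time_word(tok):
            current.append(tok)
        elif current and tok in STRUCTURAL_AUXILIARIES and next_is_time:
            current.append(tok)
        elif current:
            expressions.append("".join(current))
            current = []
    if current:
        expressions.append("".join(current))
    return expressions
```

With the time basis words "明天" (tomorrow), "下午" (afternoon) and "3点" (3 o'clock), the three example texts yield one independent expression, one combined expression, and one combined expression that keeps the auxiliary word.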
The time basis word is the smallest constituent unit of a temporal expression. Compared with identifying temporal expressions directly, the regular expressions needed to identify time basis words are simpler and fewer in number, so recognition speed improves significantly. According to test data provided by the inventor, when temporal expressions of the same content are identified, the time required by the "time basis word" approach is 1/70 of the time required by the traditional direct identification approach. The approach based on time basis words can thus greatly increase time recognition speed.
After multiple time basis words have been identified, combining them according to the preset condition and labeling the combination as a temporal expression avoids the identification breakpoint problem of traditional identification methods. An identification breakpoint can be understood as identifying one complete temporal expression as several short temporal expressions, for example identifying "January 1, 2017" as the separate short temporal expressions "2017", "January" and "1st".
Thus, identifying time basis words and then combining and labeling them according to the preset condition improves not only recognition speed but also recognition accuracy. This avoids the loss of time-dimension judgment caused by failing to identify times accurately, and its effect on functions such as text mining.
In summary, the embodiment of the present application segments the text to be identified to obtain a word segmentation result, determines character strings to be identified from the word segmentation result, matches them against the regular expressions used to identify time basis words, and determines the strings that match as time basis words. It then labels an independent time basis word as a temporal expression in the text to be identified, or combines multiple time basis words that satisfy the preset condition and labels the combination as a temporal expression. Compared with traditional identification methods, the method provided by the present application relies on regular expressions for identifying time basis words, which greatly improves recognition speed and accuracy and makes it much more convenient to mine text information along the time dimension.
A temporal expression identified by the method provided in the embodiment of the present application may be an absolute time or a relative time, or may belong to another of the eight major categories of Chinese time. To better understand the context of the text to be identified and mine the text, the embodiment of the present application can also perform time conversion on the identified temporal expressions.
Fig. 2 shows a flowchart of another embodiment of the Chinese time recognition method provided by the present application. This embodiment mainly describes the method for converting temporal expressions. Referring to Fig. 2, the method includes:
S201: Divide each temporal expression into a first temporal expression, a second temporal expression or a third temporal expression.
A first temporal expression is a temporal expression that includes a definite time, a second temporal expression is one that includes an indefinite time, and a third temporal expression is any other temporal expression. A first temporal expression can serve as a reference time point; a second temporal expression is the kind that needs to be converted in the embodiment of the present application.
For ease of understanding, consider some examples. "January 1, 2017" is a definite time and is a first temporal expression. "Tomorrow" is an indefinite time and is a second temporal expression; it can be converted into a definite time according to the order and semantics of the context. "Every year" is a time frequency; it is neither a reference time point nor a time that needs conversion, so it is a third temporal expression.
Temporal expressions can be divided using a support vector machine (SVM) model. An SVM model generally performs linear regression or classification by mapping the sample space into a high-dimensional feature space. The embodiment of the present application mainly uses the SVM model to classify temporal expressions.
As one possible implementation, an SVM model can be trained on a temporal expression corpus labeled with definite time, indefinite time or other time. A temporal expression is then input into the SVM model, which divides it into a first, second or third temporal expression.
The temporal expression corpus can be understood as temporal expressions labeled as definite time, indefinite time or other time for use in training the SVM model. It can be obtained from an existing corpus, or produced by labeling the temporal expressions identified in the above embodiment. Training on this corpus generates the SVM model; because an SVM model can perform classification in a high-dimensional feature space, it can be used to divide temporal expressions into first, second and third temporal expressions.
The more temporal expression corpus is used to train the SVM model, the higher the model's accuracy and the better its classification of temporal expressions.
S202: Convert the second temporal expression, using the nearest first temporal expression before it as the reference time point.
Because the context within a text to be identified is usually related, when the temporal expressions identified by the method provided in the present application include an indefinite time, that is, a second temporal expression, the contextual semantics can be used to convert it into a temporal expression with a definite time, that is, a first temporal expression.
Specifically, the nearest first temporal expression before the second temporal expression can be used as the reference time point for the conversion. For example, if the second temporal expression is "tomorrow" and the nearest first temporal expression before it is "January 1, 2017", then in context the second temporal expression means "the day after January 1, 2017", so it can be converted to "January 2, 2017".
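Choosing the reference time point amounts to remembering the most recent first temporal expression while scanning in document order. The sketch below assumes the expressions have already been classified; the function name `nearest_reference` and the `"first"`/`"second"`/`"third"` labels are hypothetical.

```python
def nearest_reference(expressions):
    """Map each second temporal expression to its reference.

    expressions: list of (text, kind) pairs in document order, where kind
    is "first", "second" or "third". Returns a dict from the index of each
    second expression to the index of the nearest preceding first
    expression, or None when no first expression precedes it.
    """
    references = {}
    last_first = None
    for index, (_, kind) in enumerate(expressions):
        if kind == "first":
            last_first = index
        elif kind == "second":
            references[index] = last_first
    return references
```

Third temporal expressions are skipped: they neither become reference points nor get converted.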
In some possible implementations of the embodiment of the present application, converting the second temporal expression with the nearest preceding first temporal expression as the reference time point can include: identifying a trigger word in the second temporal expression by means of a preset trigger word list; obtaining the time conversion rule corresponding to the trigger word from the trigger word list; and, with the nearest first temporal expression before the second temporal expression as the reference time point, converting the trigger word in the second temporal expression using that time conversion rule.
The trigger word list can include the time basis words that trigger time conversion, which can be understood as trigger words, together with the time conversion rule corresponding to each trigger word. A time basis word that triggers time conversion is generally one that denotes an indefinite time, for example "the day after tomorrow", "yesterday", "one week later" and "last year". The time conversion rule for the trigger word is obtained from the trigger word list; then, with the nearest first temporal expression before the second temporal expression as the reference time point, logical operations such as adding or subtracting time are performed according to that rule, converting the indefinite part of the second temporal expression into a definite time.
For example, the time conversion rule for "one week later" is "add 7 days to the reference time point given by the nearest preceding first temporal expression"; for "yesterday" it is "subtract 1 day from that reference time point"; for "the day after tomorrow" it is "add 2 days to that reference time point"; and for "last year" it is "subtract 1 year from that reference time point".
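A trigger word list of this kind can be sketched as a table from trigger word to conversion rule. The names `TRIGGERS`, `shift_years` and `convert_trigger` are hypothetical, the Chinese keys are assumed renderings of the four translated examples, and returning a full date for "last year" is a simplification of the year-dimension rule.

```python
from datetime import date, timedelta

def shift_years(d, years):
    """Shift by whole years; February 29 falls back to February 28 if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)

# Hypothetical trigger word list: trigger word -> rule applied to the reference date.
TRIGGERS = {
    "一周后": lambda ref: ref + timedelta(days=7),  # one week later: add 7 days
    "昨天": lambda ref: ref - timedelta(days=1),    # yesterday: subtract 1 day
    "后天": lambda ref: ref + timedelta(days=2),    # day after tomorrow: add 2 days
    "去年": lambda ref: shift_years(ref, -1),       # last year: subtract 1 year
}

def convert_trigger(trigger, reference):
    """Apply the conversion rule of a trigger word to the reference time point."""
    return TRIGGERS[trigger](reference)
```

Keeping the rule next to the trigger word means a new trigger word needs only one new table entry.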
In some possible implementations of the embodiment of the present application, a definite time can be divided into several dimensions such as year, month, day, hour, minute and second. For example, expressions such as the morning, noon, afternoon and evening of day xx belong to the "day" dimension, while expressions such as the first ten days, the middle, the last ten days, the beginning and the end of month xx belong to the "month" dimension. When a second temporal expression that contains a trigger word is converted, the conversion can be applied in the time dimension of the trigger word. When a second temporal expression that contains no trigger word is supplemented, it can be completed according to its current dimension.
When the second temporal expression includes a trigger word and other time basis words, the trigger word can be converted while the other time basis words are left unconverted and keep their original form. For example, suppose the second temporal expression is "tomorrow afternoon", where the trigger word is "tomorrow" and the other time basis word is "afternoon", and the nearest first temporal expression before it is "August 1, 2017". Only the trigger word "tomorrow" is converted. Because the dimension of "tomorrow" is "day", one day is added to the first temporal expression, converting "tomorrow" to "August 2, 2017", while "afternoon" stays unchanged; the converted second temporal expression is therefore "the afternoon of August 2, 2017". If the dimension of the trigger word is the year, the conversion is done in units of years: for the trigger word "last year", 1 year is subtracted from the time in the nearest first temporal expression.
When the second temporal expression contains no trigger word, it can be completed from the nearest preceding first temporal expression, based on its current dimension. For example, suppose the second temporal expression is "the 9th" and the nearest first temporal expression before it is "August 1, 2015". The second temporal expression contains only the time basis word "the 9th" and no trigger word, and the dimension of "the 9th" is "day"; so, based on the semantic association between the two expressions, the second temporal expression is completed down to the "day" dimension, giving "August 9, 2015". If the second temporal expression is "September", the dimension of its time basis word is "month", so by the same semantic association it is completed down to the "month" dimension, giving "September 2015".
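Both cases above can be sketched on Chinese-formatted strings. This is an illustration under assumptions, not the patented implementation: `convert_tomorrow` and `supplement` are hypothetical helpers, only the trigger word "明天" (tomorrow) and the "day" and "month" dimensions are handled, and the date format is an assumed rendering of the translated examples.

```python
import re
from datetime import date, timedelta

def convert_tomorrow(expression, reference):
    """Replace only the trigger word "明天"; other time basis words stay as-is."""
    d = reference + timedelta(days=1)
    return expression.replace("明天", f"{d.year}年{d.month}月{d.day}日", 1)

def supplement(expression, reference):
    """Complete a trigger-free expression from the reference date."""
    if re.fullmatch(r"\d{1,2}日", expression):  # "day" dimension: add year and month
        return f"{reference.year}年{reference.month}月{expression}"
    if re.match(r"\d{1,2}月", expression):      # "month" dimension: add year only
        return f"{reference.year}年{expression}"
    return expression                            # nothing recognizable to complete
```

For instance, `convert_tomorrow("明天下午", date(2017, 8, 1))` keeps "下午" (afternoon) untouched, and `supplement("9日", date(2015, 8, 1))` borrows the year and month from the reference.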
To improve the accuracy of time conversion, a calendar function can also be used during conversion. Specifically, when time is added to or subtracted from the reference time point, the arithmetic follows the calendar to obtain the converted time.
For ease of understanding, this is illustrated with examples.
When the reference time point is "February 25, 2012" and the second temporal expression is "one week later", the converted time is "March 3, 2012": 2012 is a leap year, so February has 29 days, and adding 7 days to February 25 uses 4 days in February and 3 days in March. When the reference time point is "May 1, 2017" and the second temporal expression is "three days ago", the converted time is "April 28, 2017": April has 30 days, so the day before May 1 is April 30, and subtracting two more days from April 30 gives April 28.
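Calendar-aware addition and subtraction can be sketched with Python's standard `datetime` module, which rolls over month boundaries automatically. The helper names `shift_days` and `naive_shift` are hypothetical; `naive_shift` is included only to contrast with arithmetic on the day field alone.

```python
from datetime import date, timedelta

def shift_days(reference, days):
    """Add (or, with a negative value, subtract) days following the calendar."""
    return reference + timedelta(days=days)

def naive_shift(reference, days):
    """Adjust only the day field; this can produce a day that does not exist."""
    return (reference.year, reference.month, reference.day + days)
```

Subtracting three days from May 1, 2017 with `shift_days` crosses back into April, whereas `naive_shift(date(2017, 3, 25), 7)` yields a day value of 32, which no calendar month contains.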
Thus, the calendar conversion function avoids erroneous expressions such as "March 32" and improves the accuracy of time conversion. In this further Chinese time recognition method provided by the embodiment of the present application, the identified temporal expressions are divided into first, second and third temporal expressions. A first temporal expression is a definite time and a second temporal expression includes an indefinite time; a second temporal expression can be converted into a definite time using the nearest first temporal expression before it as the reference time point. With this time conversion function, once temporal expressions have been identified, indefinite times can be converted into definite times at the semantic level, which improves the accuracy of Chinese time recognition and is important for text mining.
To make the Chinese time recognition method provided by the present application easier to understand, it is described below with a concrete application scenario. The method is as follows:
First, build a CRF model using a labeled corpus and the CRF algorithm.
The corpus can be the "People's Daily" corpus annotated with word-position information.
Second, segment the text to be identified to obtain a word segmentation result.
The text to be identified is the text that the user enters in an input box, as follows:
"At 2 o'clock on August 1, 1927, event A began ... In the afternoon of the same day, event B occurred. Starting on August 3, 1927, event C occurred ... On arrival in Linchuan on the 7th, event D occurred. In early September, event E occurred."
Third, match against the regular expressions used to identify time basis words to determine the time basis words.
Using the matching method provided in the embodiment of the present application, the time basis words that can be identified in the above text are: "1927", "August", "1st", "2 o'clock", "the same day", "afternoon", "1927", "August", "3rd", "7th" and "early September".
Of these, "1927", "August", "1st" and "2 o'clock" form one group of consecutive time basis words; "the same day" and "afternoon" form another group; "1927", "August" and "3rd" form a third group; "7th" is an independent time basis word; and "early September" is an independent time basis word.
Fourth, label an independent time basis word as a temporal expression in the text to be identified, or combine multiple time basis words that satisfy the preset condition and label the combination as a temporal expression in the text to be identified.
From the independent time basis words and the groups of consecutive time basis words in the above text, the following temporal expressions are obtained:
(1) 2 o'clock on August 1, 1927.
(2) The afternoon of the same day.
(3) August 3, 1927.
(4) The 7th.
(5) Early September.
Fifth, classify the identified temporal expressions using the SVM model.
Temporal expressions (1) and (3) are first temporal expressions, that is, definite times. Temporal expressions (2), (4) and (5) are second temporal expressions; they include indefinite times and need to be converted.
Sixth, convert the second temporal expressions, turning the indefinite times into definite times.
The nearest first temporal expression before temporal expression (2) is temporal expression (1), so (1) can serve as the time reference point for converting (2). In temporal expression (2), "the afternoon of the same day", "the same day" is the trigger word and can be replaced with "August 1, 1927", so (2) becomes "the afternoon of August 1, 1927" after conversion.
The nearest first temporal expression before temporal expressions (4) and (5) is temporal expression (3), so (3) can serve as the time reference point for converting (4) and (5).
For temporal expression (4), with temporal expression (3), "August 3, 1927", as the reference, "the 7th" should share the year and month of (3); the converted temporal expression (4) is therefore "August 7, 1927".
For temporal expression (5), with temporal expression (3), "August 3, 1927", as the reference, "early September" should share the year of (3); the converted temporal expression (5) is therefore "early September 1927".
In the Chinese time recognition method provided by the embodiment of the present application, the text to be identified is segmented with a CRF model, the time basis words in the text are identified with the regular expressions for identifying time basis words, and an independent time basis word is labeled as a temporal expression, or multiple consecutive time basis words are combined and labeled as a temporal expression. Further, the identified expressions are divided into first, second and third temporal expressions; a first temporal expression is a definite time and a second temporal expression includes an indefinite time, which is converted into a definite time using the nearest preceding first temporal expression as the time reference point. Because this method identifies time basis words, the number of regular expressions to be matched is greatly reduced and recognition speed improves; labeling the identified time basis words individually or in combination according to rules avoids identification breakpoints and improves the accuracy of Chinese time recognition. The time conversion function converts identified indefinite times into definite times at the semantic level, which is significant for understanding context and makes it convenient to mine text information along the time dimension.
The above are the embodiments of the Chinese time recognition method provided by the embodiment of the present application. On this basis, the embodiment of the present application also provides a Chinese time recognition device; see the following embodiments for details.
Fig. 3 shows a schematic diagram of a Chinese time recognition device provided by the embodiment of the present application. The device embodiment can include a word segmentation unit 301, a determining unit 302 and a labeling unit 303, wherein: the word segmentation unit 301 is configured to segment the text to be identified to obtain a word segmentation result;
the determining unit 302 is configured to match character strings to be identified against the regular expressions used to identify time basis words and to determine the strings that match as time basis words, where a character string to be identified includes one token or multiple consecutive tokens in the word segmentation result;
the labeling unit 303 is configured to label an independent time basis word as a temporal expression in the text to be identified, or to combine multiple time basis words that satisfy a preset condition and label the combination as a temporal expression in the text to be identified.
Optionally, the word segmentation unit 301 can include a generating subunit and a segmenting subunit:
the generating subunit is configured to train a sequence-labeled corpus with the conditional random field (CRF) algorithm to generate a CRF model;
the segmenting subunit is configured to input the text to be identified into the CRF model for segmentation to obtain the word segmentation result.
Optionally, the device embodiment also includes an acquiring unit configured to obtain the regular expressions for identifying time basis words that correspond to each time category, the time categories including absolute time, relative time, time period, time frequency, fuzzy time, holiday time, dynasty time and event time.
Optionally, the labeling unit 303 can be specifically configured to:
combine multiple consecutive time basis words and label the combination as a temporal expression in the text to be identified;
or, when only structural auxiliary words exist between multiple time basis words, combine the time basis words and the structural auxiliary words between them and label the combination as a temporal expression in the text to be identified.
In the Chinese time recognition device embodiment provided by the present application, the word segmentation unit segments the text to be identified; the determining unit matches the regular expressions for identifying time basis words against the character strings to be identified, which are composed of tokens, and determines the time basis words; and the labeling unit labels an independent time basis word as a temporal expression in the text to be identified, or combines multiple time basis words that satisfy the preset condition and labels the combination as a temporal expression in the text to be identified. This greatly improves recognition speed and accuracy and makes it much more convenient to mine text information along the time dimension. A temporal expression identified by the device provided in the embodiment of the present application may be an absolute time or a relative time, or may belong to another of the eight major categories of Chinese time. To better understand the context of the text to be identified and mine the text, the embodiment of the present application also provides another Chinese time recognition device embodiment, which can perform time conversion on the identified temporal expressions. See the following embodiment for details.
Fig. 4 shows a schematic diagram of a Chinese time recognition device provided by the embodiment of the present application. On the basis of the above device embodiment, this device embodiment can also include a dividing unit 404 and a converting unit 405, wherein: the dividing unit 404 is configured to divide a temporal expression into a first temporal expression, a second temporal expression or a third temporal expression, where a first temporal expression is a temporal expression that includes a definite time, a second temporal expression is one that includes an indefinite time, and a third temporal expression is any other temporal expression;
the converting unit 405 is configured to convert the second temporal expression, using the nearest first temporal expression before it as the reference time point.
Optionally, the dividing unit 404 can include:
a training subunit, configured to train a support vector machine (SVM) model on a temporal expression corpus labeled with definite time, indefinite time or other time;
a dividing subunit, configured to input a temporal expression into the SVM model and divide it into a first temporal expression, a second temporal expression or a third temporal expression.
Optionally, the converting unit 405 can include:
an identifying subunit, configured to identify a trigger word in the second temporal expression by means of a preset trigger word list;
an obtaining subunit, configured to obtain the time conversion rule corresponding to the trigger word from the trigger word list;
a converting subunit, configured to convert the trigger word in the second temporal expression using the time conversion rule corresponding to the trigger word, with the nearest first temporal expression before the second temporal expression as the reference time point. In the Chinese time recognition device provided by the embodiment of the present application, the dividing unit divides the identified temporal expressions into first, second and third temporal expressions, and the converting unit converts a second temporal expression into a definite time using the nearest preceding first temporal expression as the reference time point. With this time conversion function, once temporal expressions have been identified, indefinite times can be converted into definite times at the semantic level, which improves the accuracy of Chinese time recognition and is significant for text mining.
In addition, the embodiment of the present application also provides a storage medium. In one possible implementation, the storage medium can be used to store program code for performing the Chinese time recognition method provided by the embodiment of the present application.
The embodiment of the present application also provides a computer program product which, when run on a terminal device, enables the terminal device to perform the Chinese time recognition method provided by the embodiment of the present application.
It should be noted that the embodiments in this specification are described progressively: each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments can be referred to one another. Because the system or device disclosed in an embodiment corresponds to the method disclosed in an embodiment, its description is relatively brief, and the relevant parts can be found in the description of the method.
It should also be noted that, herein, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise" and any variants of them are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to that process, method, article or device. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element.
The steps of the method or algorithm described in connection with the embodiments disclosed herein can be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module can reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM or any other form of storage medium known in the technical field.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein can be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A Chinese time recognition method, characterized in that the method includes:
segmenting a text to be identified to obtain a word segmentation result;
matching character strings to be identified against regular expressions used to identify time basis words, and determining the character strings to be identified that match the regular expressions as the time basis words, wherein a character string to be identified includes one token or multiple consecutive tokens in the word segmentation result;
labeling an independent time basis word as a temporal expression in the text to be identified, or combining multiple time basis words that satisfy a preset condition and labeling the combination as a temporal expression in the text to be identified.
2. according to the method for claim 1, it is characterised in that methods described also includes:
The temporal expression is divided into very first time expression formula, the second temporal expression or the 3rd temporal expression, institute State very first time expression formula be include determine the time temporal expression, second temporal expression be to include the non-determined time Temporal expression, the 3rd temporal expression is other times expression formula;
During by second temporal expression very first time expression formula nearest using before second temporal expression as benchmark Between point carry out time conversion.
3. The method according to claim 2, characterized in that classifying the temporal expression as a first temporal expression, a second temporal expression or a third temporal expression comprises:
Training a support vector machine (SVM) model on a corpus of temporal expressions labeled as definite time, non-definite time or other time;
Inputting the temporal expression into the SVM model to classify the temporal expression as a first temporal expression, a second temporal expression or a third temporal expression.
4. The method according to claim 2, characterized in that performing time conversion on the second temporal expression, using the nearest first temporal expression preceding the second temporal expression as the reference time point, comprises:
Identifying a trigger word in the second temporal expression by means of a preset trigger word list;
Obtaining, from the trigger word list, the time conversion mode corresponding to the trigger word;
Using the nearest first temporal expression preceding the second temporal expression as the reference time point, converting the trigger word in the second temporal expression according to the time conversion mode corresponding to the trigger word.
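The conversion of claim 4 can be illustrated with a short sketch. It assumes a hypothetical trigger word list `TRIGGER_WORDS` that maps each trigger word to a day offset, and it assumes that first temporal expressions have already been resolved to `datetime.date` values; the claims do not specify the contents of the list or the form of the conversion mode, and `resolve` is an invented helper name.

```python
from datetime import date, timedelta

# Hypothetical trigger word list: trigger word -> conversion mode (a day offset here).
TRIGGER_WORDS = {"前天": -2, "昨天": -1, "次日": 1, "明天": 1, "后天": 2}


def resolve(expressions):
    """Resolve second temporal expressions against the nearest preceding first one.

    `expressions` is a list of (text, known_date) pairs in text order;
    known_date is a date for a first (definite) temporal expression and
    None for a second (non-definite) one.
    """
    reference = None
    resolved = []
    for text, known_date in expressions:
        if known_date is not None:
            reference = known_date  # nearest first temporal expression so far
            resolved.append((text, known_date))
            continue
        trigger = next((w for w in TRIGGER_WORDS if w in text), None)
        if trigger is None or reference is None:
            resolved.append((text, None))  # nothing to convert, or no reference yet
        else:
            offset = timedelta(days=TRIGGER_WORDS[trigger])
            resolved.append((text, reference + offset))
    return resolved


result = resolve([
    ("2017年9月29日", date(2017, 9, 29)),
    ("次日上午", None),
    ("后天", None),
])
for text, value in result:
    print(text, value)
```

The reference time point is updated only by first temporal expressions, so both "次日上午" and "后天" are computed from 29 September 2017 rather than from each other.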
5. The method according to claim 1, characterized in that segmenting the text to be identified to obtain a word segmentation result comprises:
Training a CRF model on a sequence-labeled corpus using the conditional random field (CRF) algorithm;
Inputting the text to be identified into the CRF model for segmentation to obtain the word segmentation result.
6. The method according to claim 1, characterized in that the method further comprises:
Obtaining the regular expression for recognizing time basis words corresponding to each time category, wherein the time categories include absolute time, relative time, time period, time frequency, fuzzy time, holiday time, dynasty time and event time.
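One way to organize a regular expression per time category, as claim 6 describes, is a simple mapping from category name to pattern. The sketch below is illustrative only: the eight category names follow the claim, but every pattern and the helper `category_of` are assumptions made for the example.

```python
import re

# Category names follow claim 6; the patterns themselves are hypothetical.
CATEGORY_PATTERNS = {
    "absolute time": re.compile(r"\d{4}年|\d{1,2}月|\d{1,2}[日号]"),
    "relative time": re.compile(r"[今明昨前后]天|[今明去]年"),
    "time period": re.compile(r"\d+(天|个月|小时|分钟)"),
    "time frequency": re.compile(r"每(天|周|月|年)"),
    "fuzzy time": re.compile(r"最近|不久|近期"),
    "holiday time": re.compile(r"春节|中秋节|国庆节"),
    "dynasty time": re.compile(r"[唐宋元明清][朝代]"),
    "event time": re.compile(r"二战期间|奥运会期间"),
}


def category_of(candidate):
    """Return the first category whose pattern matches the whole candidate, else None."""
    for category, pattern in CATEGORY_PATTERNS.items():
        if pattern.fullmatch(candidate):
            return category
    return None


for word in ["2017年", "每周", "清朝", "桌子"]:
    print(word, category_of(word))
```

Keeping one pattern per category makes it possible to extend or tune a single category without touching the others.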
7. The method according to claim 1, characterized in that combining multiple time basis words that satisfy a preset condition and labeling the combination as a temporal expression in the text to be identified comprises:
Combining multiple consecutive time basis words and labeling the combination as a temporal expression in the text to be identified; or, when only structural auxiliary words exist between multiple time basis words, combining the multiple time basis words together with the structural auxiliary words between them and labeling the combination as a temporal expression in the text to be identified.
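The two combination conditions of claim 7 can be sketched as a single pass over the segmented words. The example assumes each segmented word has already been tagged as a time basis word or not; the set `STRUCTURAL_AUXILIARIES` and the function name `combine` are hypothetical.

```python
# Hypothetical set of structural auxiliary words.
STRUCTURAL_AUXILIARIES = {"的", "之"}


def combine(tagged):
    """Combine time basis words into temporal expressions.

    `tagged` is a list of (word, is_time_basis_word) pairs in text order.
    Consecutive time basis words are joined, as are time basis words
    separated only by structural auxiliary words.
    """
    expressions, current, pending = [], [], []
    for word, is_time in tagged:
        if is_time:
            current.extend(pending)  # keep auxiliaries lying between two time basis words
            current.append(word)
            pending = []
        elif current and word in STRUCTURAL_AUXILIARIES:
            pending.append(word)  # kept only if another time basis word follows
        else:
            if current:
                expressions.append("".join(current))
            current, pending = [], []
    if current:
        expressions.append("".join(current))
    return expressions


tagged = [("他", False), ("去年", True), ("的", False), ("9月", True),
          ("的", False), ("会议", False), ("今天", True)]
print(combine(tagged))
```

The second "的" is followed by an ordinary word, so it is dropped rather than appended to the temporal expression.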
8. A Chinese time recognition device, characterized in that the device comprises:
A segmentation unit, configured to segment a text to be identified to obtain a word segmentation result;
A determining unit, configured to match a character string to be identified against a regular expression for recognizing time basis words, and to determine a character string to be identified that matches the regular expression to be a time basis word, wherein the character string to be identified comprises one segmented word or multiple consecutive segmented words in the word segmentation result;
A labeling unit, configured to label an independent time basis word as a temporal expression in the text to be identified, or to combine multiple time basis words that satisfy a preset condition and label the combination as a temporal expression in the text to be identified.
9. A computer-readable storage medium, characterized in that instructions are stored in the computer-readable storage medium, and when the instructions are run on a terminal device, the terminal device is caused to perform the Chinese time recognition method according to any one of claims 1-7.
10. A computer program product, characterized in that when the computer program product is run on a terminal device, the terminal device is caused to perform the Chinese time recognition method according to any one of claims 1-7.
CN201710912117.0A 2017-09-29 2017-09-29 Chinese time identification method and device, storage medium and program product Active CN107729314B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710912117.0A CN107729314B (en) 2017-09-29 2017-09-29 Chinese time identification method and device, storage medium and program product

Publications (2)

Publication Number Publication Date
CN107729314A true CN107729314A (en) 2018-02-23
CN107729314B CN107729314B (en) 2021-10-26

Family

ID=61209414

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710912117.0A Active CN107729314B (en) 2017-09-29 2017-09-29 Chinese time identification method and device, storage medium and program product

Country Status (1)

Country Link
CN (1) CN107729314B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH11143864A (en) * 1997-11-06 1999-05-28 Nippon Telegr & Teleph Corp <Ntt> Method and device for date expression normalization and storage medium for recording date expression normalization program
CN101609445A (en) * 2009-07-16 2009-12-23 复旦大学 Crucial sub-method for extracting topic based on temporal information
CN103020034A (en) * 2011-09-26 2013-04-03 北京大学 Chinese words segmentation method and device
CN103823859A (en) * 2014-02-21 2014-05-28 安徽博约信息科技有限责任公司 Name recognition algorithm based on combination of decision-making tree rules and multiple statistic models
CN104951508A (en) * 2015-05-21 2015-09-30 腾讯科技(深圳)有限公司 Time information identification method and device
CN105404686A (en) * 2015-12-10 2016-03-16 湖南科技大学 Method for matching place name and address in news event based on geographical feature hierarchical segmented words
CN105786964A (en) * 2016-01-15 2016-07-20 二十世纪空间技术应用股份有限公司 Web mining-based remote sensing product search limited item semantic extension method
CN106776537A (en) * 2016-11-18 2017-05-31 畅捷通信息技术股份有限公司 The abstracting method and system of temporal information and subject information in text
US20170161372A1 (en) * 2015-12-04 2017-06-08 Codeq Llc Method and system for summarizing emails and extracting tasks
CN106970913A (en) * 2017-05-12 2017-07-21 湖南中周至尚信息技术有限公司 The extracting method and device of a kind of time

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
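左亚尧 et al.: "Rule-based recognition and normalization of Chinese time expressions" (基于规则的中文时间表达式识别与规范化), Journal of Guangdong University of Technology (《广东工业大学学报》) *
张绍麒: "Research on Dictionaries and Digitization" (《辞书与数字化研究》), 31 August 2005 *
李君婵 et al.: "Chinese time expression and type recognition" (中文时间表达式及类型识别), Computer Science (《计算机科学》) *
邬桐 et al.: "Chinese time expression recognition with an automatically constructed time primitive rule base" (自动构建时间基元规则库的中文时间表达式识别), Journal of Chinese Information Processing (《中文信息学报》) *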

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108920500B (en) * 2018-05-24 2022-02-11 众安信息技术服务有限公司 Time analysis method
CN108920500A (en) * 2018-05-24 2018-11-30 众安信息技术服务有限公司 A kind of time resolution method
CN111104798A (en) * 2018-10-27 2020-05-05 北京智慧正安科技有限公司 Analysis method, system and computer readable storage medium for criminal plot in legal document
CN111104798B (en) * 2018-10-27 2023-04-21 北京智慧正安科技有限公司 Resolution method, system and computer readable storage medium for sentencing episodes in legal documents
CN109800338A (en) * 2018-12-11 2019-05-24 平安科技(深圳)有限公司 Colloquial style time standard control method, device, computer equipment and storage medium
CN110047489A (en) * 2019-04-04 2019-07-23 科讯嘉联信息技术有限公司 A kind of household electrical appliances apply to install the method and system that the time is applied to install in scene intelligent typing
CN110222346A (en) * 2019-06-20 2019-09-10 贵州电网有限责任公司 A method of extracting effective time from interaction data
CN111027319A (en) * 2019-10-30 2020-04-17 平安科技(深圳)有限公司 Method and device for analyzing natural language time words and computer equipment
CN111104481A (en) * 2019-12-17 2020-05-05 东软集团股份有限公司 Method, device and equipment for identifying matching field
CN111104481B (en) * 2019-12-17 2023-10-10 东软集团股份有限公司 Method, device and equipment for identifying matching field
CN111144127A (en) * 2019-12-25 2020-05-12 科大讯飞股份有限公司 Text semantic recognition method and model acquisition method thereof and related device
CN111222324A (en) * 2019-12-27 2020-06-02 南京医睿科技有限公司 Time identification method and device, computer readable storage medium and electronic equipment
CN111581963B (en) * 2020-03-30 2022-09-20 深圳壹账通智能科技有限公司 Method and device for extracting time character string, computer equipment and storage medium
CN111581963A (en) * 2020-03-30 2020-08-25 深圳壹账通智能科技有限公司 Method and device for extracting time character string, computer equipment and storage medium
CN113988067A (en) * 2021-11-12 2022-01-28 北京嘉和海森健康科技有限公司 Sentence segmentation method and device and electronic equipment
CN113988067B (en) * 2021-11-12 2024-06-25 北京嘉和海森健康科技有限公司 Sentence word segmentation method and device and electronic equipment
CN114943222A (en) * 2022-05-13 2022-08-26 医渡云(北京)技术有限公司 Time entity identification method and device, computer storage medium and electronic equipment
CN116010627A (en) * 2023-03-28 2023-04-25 智慧眼科技股份有限公司 Time extraction method and system

Also Published As

Publication number Publication date
CN107729314B (en) 2021-10-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant