WO2021082424A1 - Parsing method and apparatus for natural language time words, and computer device

Info

Publication number
WO2021082424A1
Authority
WO
WIPO (PCT)
Prior art keywords
time
words
word
time word
arrangement positions
Application number
PCT/CN2020/093111
Other languages
French (fr)
Chinese (zh)
Inventor
查月阅
张骏
Original Assignee
平安科技(深圳)有限公司
Application filed by 平安科技(深圳)有限公司

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G06F16/355: Class or cluster creation or modification

Definitions

  • This application relates to the technical field of semantic parsing, and in particular to a method, device and computer equipment for parsing natural language time words.
  • time information is an indispensable element for a complete analysis of natural language semantics.
  • the existing methods for recognizing time information in natural language are mainly based on fixed rules: the fixed rules are matched against the text to extract time words, for example the time word "September 10th, 2018", which represents a date.
  • the main purpose of this application is to provide a natural language time word parsing method, device and computer equipment, aiming to overcome the drawbacks of existing time word parsing methods, which are too rigid and have low accuracy and completeness.
  • this application provides a natural language time word parsing method, including:
  • this application also provides a natural language time word parsing device, including:
  • a processing module for removing preset characters in the input text to obtain preprocessed text
  • the word segmentation module is used to segment the preprocessed text according to the first preset rule to obtain several time words
  • the encapsulation module is used for data encapsulation of each of the time words to obtain the first time words corresponding to each of the time words;
  • the merging module is used to merge each of the first time words according to a second preset rule to obtain a number of second time words;
  • the parsing module is used to analyze each of the second time words to obtain the time interval corresponding to each of the second time words.
  • the present application also provides a computer device, including a memory and a processor, the memory stores a computer program, and the processor implements the natural language time word parsing method when the processor executes the computer program, wherein:
  • the natural language time word parsing method includes the following steps: obtaining input text; removing preset characters in the input text to obtain a preprocessed text; performing word segmentation on the preprocessed text according to a first preset rule to obtain several time words; performing data encapsulation on each of the time words to obtain the first time words corresponding to each of the time words; merging the first time words according to a second preset rule to obtain a number of second time words; and parsing each of the second time words to obtain the time interval corresponding to each of the second time words.
  • the present application also provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the above-mentioned natural language time word parsing method is realized, wherein the natural language time word parsing method includes the following steps: obtaining input text; removing preset characters in the input text to obtain a preprocessed text; segmenting the preprocessed text according to a first preset rule to obtain several time words; performing data encapsulation on each of the time words to obtain the first time words corresponding to each of the time words; merging the first time words according to a second preset rule to obtain a number of second time words; and parsing each of the second time words to obtain the time interval corresponding to each of the second time words.
  • the natural language time word parsing method, device and computer equipment provided in this application first extract multiple time words from the input text through multiple pre-built recognition rules, then merge the corresponding time words according to the arrangement position of each time word in the input text and the association between the recognition rules, and finally parse the merged time words according to their meaning to obtain the corresponding time intervals, so as to parse all time words in natural language and effectively improve the comprehensiveness and accuracy of time word recognition in the input text.
  • FIG. 1 is a schematic diagram of the steps of a natural language time word parsing method in an embodiment of the present application
  • FIG. 2 is a block diagram of the overall structure of a natural language time word parsing device in an embodiment of the present application
  • FIG. 3 is a schematic block diagram of the structure of a computer device according to an embodiment of the present application.
  • an embodiment of the present application provides a natural language time word parsing method, including:
  • S3 Perform word segmentation on the preprocessed text according to the first preset rule to obtain several time words
  • natural language refers to a language that humans naturally narrate, such as a segment of speech or text.
  • if the parsing system receives the user's voice information, it needs to convert the voice information into text information.
  • after the parsing system receives the natural language input by the user, it converts it into information in text format to obtain the input text.
  • the parsing system needs to preprocess the input text: it identifies the preset characters in the input text, for example by marking sensitive characters, and removes them to obtain the preprocessed text, which reduces the processing complexity of the subsequent word segmentation.
  • for example, the input text is: "The net profit of the last month of the second quarter of 2018"
  • the preprocessed text, after removing the preset characters identified by marking the sensitive character " ⁇ ", is: "Net profit in the last month of the second quarter of 2018".
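  • The preprocessing step can be illustrated with a minimal sketch. The source does not list the preset characters, so the set `PRESET_CHARS`, the function name `remove_preset_characters` and the sample string are assumptions made only for illustration.

```python
from typing import Iterable

# Hypothetical preset characters; the source does not enumerate them.
PRESET_CHARS = {" ", "\t", "#"}


def remove_preset_characters(text: str, preset_chars: Iterable[str]) -> str:
    """Return the text with every preset character removed."""
    blocked = set(preset_chars)
    return "".join(ch for ch in text if ch not in blocked)


# Made-up sample input, not taken from the source.
preprocessed = remove_preset_characters("2018 年 # 6 月", PRESET_CHARS)
```

  • Removing such characters first means the later recognition rules do not have to tolerate stray separators inside a time word.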
  • a rule library is pre-built in the parsing system.
  • the rule library is built from regular expressions.
  • the rule library is composed of multiple recognition rules. Each recognition rule contains multiple different recognition parameters, and one recognition rule is used to recognize one type of time word.
  • after loading the rule library, the parsing system calls each recognition rule in the rule library in turn to segment the preprocessed text, thereby obtaining one or more time words corresponding to each recognition rule.
  • for example, the preprocessed text is: "Net profit in the last month of the second quarter of 2018". The recognition parameters of recognition rule A include "year", so the time word obtained after segmentation of the preprocessed text by recognition rule A is: "year 2018". The recognition parameters of recognition rule B include "quarter", so the time word obtained after segmentation of the preprocessed text by recognition rule B is: "the second quarter".
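  • A rule library of this kind might be sketched as a mapping from rule name to regular expression. The patterns, the name `RULE_LIBRARY` and the Chinese sample sentence below are assumptions for illustration; the source only states that rule A recognizes "year" words and rule B recognizes "quarter" words.

```python
import re

# Hypothetical rule library: one regular expression per recognition rule.
RULE_LIBRARY = {
    "A": re.compile(r"\d{4}年"),          # year words, e.g. 2018年
    "B": re.compile(r"第[一二三四]季度"),  # quarter words, e.g. 第二季度
}


def segment(text):
    """Apply every recognition rule and collect the time words it matches."""
    return {name: pattern.findall(text) for name, pattern in RULE_LIBRARY.items()}


# Assumed sample sentence ("net profit of the last month of the second
# quarter of 2018"); it is not quoted from the source.
time_words = segment("2018年第二季度最后一个月的净利润")
```

  • Keeping each rule separate preserves the link between a time word and the rule that found it, which the later merging steps rely on.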
  • the parsing system encapsulates the data of each time word after word segmentation, so that the format of each time word is unified, and the corresponding first time word is obtained.
  • the first time word carries a time word attribute, which includes information such as the recognition rule corresponding to the first time word and the arrangement position of the first time word in the input text. For example: first time word: 2018; corresponding rule: recognition rule A; start position: 0; end position: 4.
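  • The encapsulation step can be sketched as wrapping each match in a small record. The class name `FirstTimeWord`, the patterns and the sample sentence are assumptions; the end position is treated as inclusive, which is an interpretation chosen to agree with the adjacency example given later in the text (end position 6 next to start position 7).

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class FirstTimeWord:
    text: str   # the matched time word
    rule: str   # name of the recognition rule that matched it
    start: int  # index of the first character in the input text
    end: int    # index of the last character (inclusive, an assumption)


def encapsulate(text, rules):
    """Wrap every match of every rule in a uniform FirstTimeWord record."""
    words = []
    for rule_name, pattern in rules.items():
        for match in re.finditer(pattern, text):
            words.append(
                FirstTimeWord(match.group(), rule_name, match.start(), match.end() - 1)
            )
    return sorted(words, key=lambda w: w.start)


# Hypothetical rules and sample sentence, for illustration only.
RULES = {"A": r"\d{4}年", "B": r"第[一二三四]季度"}
first_words = encapsulate("2018年第二季度最后一个月的净利润", RULES)
```

  • With this sample, the year word occupies positions 0 to 4 and the quarter word positions 5 to 8.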
  • the parsing system first filters out two or more first time words whose arrangement positions are continuous, according to the arrangement position of each first time word in the input text, and merges them to obtain the first merged time words; it marks the first time words whose arrangement positions are not continuous as time words to be merged.
  • the parsing system then groups the time words to be merged whose arrangement positions are within a preset range of each other into several first time word sets; that is, within the same first time word set, the arrangement position of each time word to be merged must be within the preset range of the arrangement position of another time word to be merged.
  • within each first time word set, the parsing system screens and merges the time words to be merged whose recognition rules have an association relationship, to obtain several second merged time words.
  • the parsing system takes the first merged time words and the second merged time words together as the second time words.
  • the parsing system parses each second time word and obtains the corresponding time interval according to the start time and end time corresponding to the second time word.
  • the parsing system outputs the time interval in a preset format; for example, the interval from 0:00 on January 1, 2018 to 24:00 on December 31, 2018 is output as: 2018-01-01-0:00——2018-12-31-24:00.
  • step of segmenting the preprocessed text according to the first preset rule to obtain several time words includes:
  • S302 Filter from the pre-processed text to obtain a plurality of the time words corresponding to the recognition parameters of each recognition rule.
  • a rule library is pre-built in the parsing system, and the rule library is a regular expression.
  • the rule base is composed of multiple identification rules, and each identification rule contains multiple identification parameters.
  • the rule library also includes special recognition parameters such as "before", "after", "current day" and "yesterday", which can be used to recognize time words such as "after 6 days".
  • the parsing system filters out one or more time words from the preprocessed text through the recognition parameters in each recognition rule, and realizes the word segmentation of the preprocessed text. Among them, the time words filtered based on the same recognition rule belong to the same category and correspond to the recognition rule.
  • the first time word carries a time word attribute, and the time word attribute includes the recognition rule corresponding to the first time word and the arrangement position of the first time word in the input text; the step of combining each of the first time words according to the second preset rule to obtain several second time words includes:
  • S501 According to the sequence of the arrangement positions, sequentially filter and merge several of the first time words with continuity in the arrangement positions to obtain the first merged time word, and mark the first time words with no continuity in the arrangement positions as time words to be merged;
  • S502 According to the sequence of the arrangement positions, respectively classify each of the time words to be merged whose arrangement positions are within a preset range into the same set to obtain at least one first time word set;
  • S504 Use the first combined time word and the second combined time word as the second time word.
  • the time word attribute carried by the first time word includes the recognition rule corresponding to the first time word, the start position and the end position of the first time word in the input text, that is, the arrangement position.
  • the parsing system judges whether the end position of a first time word is adjacent to the start position of another first time word; if the end position is adjacent to the start position, it determines that the arrangement positions of the two first time words have continuity.
  • the parsing system screens out several first time words with continuity in the arrangement position according to the above method and merges them, thereby obtaining one or more first merged time words. Further, in the process of merging the first time words according to the continuity of the arrangement position, the parsing system may continuously merge multiple first time words.
  • for example, if the first time word A and the first time word B have continuity, and the first time word B and the first time word C have continuity, the parsing system can combine the first time word A, the first time word B, and the first time word C to obtain a first merged time word.
  • the parsing system marks several first time words whose arrangement positions do not have continuity as time words to be merged, so as to use another rule for merging. Specifically, the parsing system compares the arrangement positions of the time words to be merged in pairs according to the sequence of the arrangement positions of the time words to be merged in the input text.
  • for example, the end position of the time word to be merged A is 5, the start position of the time word to be merged B is 8 and its end position is 10, the start position of the time word to be merged C is 12, and the preset range is 3; the time word to be merged B and the time word to be merged C are then included in the same time word set.
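  • The grouping by preset range can be sketched as a single pass over the time words to be merged, sorted by position. The source only states that B and C end up in the same set; whether a gap exactly equal to the preset range counts as "within range" is not stated, so the strict comparison below is an assumption, as are the function name `group_by_range`, the start position of A and the end position of C.

```python
def group_by_range(words, preset_range):
    """Group (name, start, end) tuples so that each word in a group lies
    within preset_range of the previous word in that group."""
    groups = []
    for word in sorted(words, key=lambda w: w[1]):
        previous_end = groups[-1][-1][2] if groups else None
        # Assumption: the gap must be strictly smaller than the preset range.
        if previous_end is not None and word[1] - previous_end < preset_range:
            groups[-1].append(word)
        else:
            groups.append([word])
    return groups


# A ends at 5, B spans 8..10, C starts at 12 (positions from the example);
# A's start and C's end are made up.
pending = [("A", 3, 5), ("B", 8, 10), ("C", 12, 13)]
first_sets = group_by_range(pending, preset_range=3)
```

  • Under this assumption the gap between A and B (3) starts a new set, while the gap between B and C (2) keeps them together.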
  • the parsing system forms one or more first time word sets in the above-mentioned manner; then, according to the pre-established association relationships between the recognition rules, it filters out, within the same first time word set, the time words to be merged whose recognition rules have an association relationship, and merges them to obtain the second merged time word.
  • the parsing system takes the first merged time word and the second merged time word together as the second time words.
  • the arrangement position includes a start position and an end position, and the step of sequentially filtering and merging, according to the sequence of the arrangement positions, several of the first time words with continuity in the arrangement positions to obtain the first merged time word includes:
  • S5011 Determine whether the end position of one of the first time words is adjacent to the start position of another first time word
  • S5013 According to the sequence of the arrangement positions, sequentially traverse all the first time words, and merge the first time words corresponding to the arrangement positions that have the continuity to obtain the first merged time word.
  • the arrangement position of the first time word in the input text includes a start position and an end position.
  • the parsing system judges whether the end position of a first time word is adjacent to the start position of another first time word. If the end position of a first time word is adjacent to the start position of another first time word, the parsing system determines that the arrangement positions of the two first time words have continuity. For example, the start position of the first time word A is 3 and its end position is 6; the start position of the first time word B is 7 and its end position is 9. Because the end position "6" of the first time word A is adjacent to the start position "7" of the first time word B, the system determines that the arrangement positions of the first time word A and the first time word B have continuity.
  • the parsing system sequentially traverses all the first time words according to the sequence of their arrangement positions in the input text, filters out, according to the above-mentioned judgment method, the first time words whose arrangement positions have continuity, and merges them in sequence according to the arrangement position of each time word to obtain one or more first merged time words.
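  • The continuity check and the first merge can be sketched as follows. Positions are treated as inclusive indices, so "adjacent" means that a start position equals the previous end position plus one, matching the example of 6 and 7 above. The function name `merge_continuous` and the sample words are assumptions; only the positions of the first two words come from the example.

```python
def merge_continuous(words):
    """Split (text, start, end) tuples into merged runs and leftovers.

    Returns (first_merged, to_be_merged): runs of adjacent words joined in
    position order, and the words that are adjacent to nothing.
    """
    runs = []
    for word in sorted(words, key=lambda w: w[1]):
        if runs and word[1] == runs[-1][-1][2] + 1:
            runs[-1].append(word)   # continues the current run
        else:
            runs.append([word])     # starts a new run
    first_merged = [
        ("".join(w[0] for w in run), run[0][1], run[-1][2])
        for run in runs
        if len(run) > 1
    ]
    to_be_merged = [run[0] for run in runs if len(run) == 1]
    return first_merged, to_be_merged


# Positions 3..6 and 7..9 follow the example; the texts and the third word
# are made up for illustration.
sample = [("6月份", 7, 9), ("第二季度", 3, 6), ("下午", 15, 16)]
first_merged, to_be_merged = merge_continuous(sample)
```

  • Sorting by start position before merging keeps the merged text in the order in which the words appear in the input, and a chain such as A, B, C collapses into a single run.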
  • the step of combining each of the first time words corresponding to the plurality of the arrangement positions having the continuity to obtain the first combined time word includes:
  • S50131 Combine a number of the first time words in order according to their corresponding arrangement positions to obtain the first combined time word.
  • when the parsing system merges two or more first time words whose arrangement positions have continuity, it needs to merge them sequentially according to the respective arrangement positions of the first time words in the input text.
  • the parsing system can determine the order of two first time words in the input text from the size relationship between their start positions or end positions. For example, the start position of the first time word A is 5 and the start position of the first time word B is 9; since the start position of the first time word A is smaller than the start position of the first time word B, the first time word A must be ranked before the first time word B. Since the input text is obtained from the natural language input by the user, the time words in the natural language themselves follow a specific logic and sequence. For example, when speaking we normally state the year before the month (the equivalent of "2018, September") rather than the reverse. Therefore, the parsing system needs to merge the two first time words in the order of their arrangement positions to obtain the first merged time word.
  • the step of screening each of the to-be-combined time words corresponding to the recognition rules having an association relationship and performing secondary merging to obtain a second combined time word includes:
  • S5031 Classify each of the time words to be merged according to their corresponding recognition rules to obtain several second time word sets;
  • S5033 Filter and merge at least two of the time words to be merged that are simultaneously included in the third time word set and the first time word set to obtain the second merged time word.
  • the parsing system classifies each time word to be merged according to its corresponding recognition rule, thereby obtaining one or more second time word sets, where all the time words to be merged in the same second time word set were filtered by the same recognition rule.
  • association relationships between the recognition rules in the rule library are pre-built. For example, recognition rule A can recognize the time word "year", recognition rule B can recognize the time word "month", and recognition rule A and recognition rule B are associated with each other so that the time word "year" can subsequently be merged with the time word "month".
  • the parsing system respectively merges two or more second time word sets corresponding to the recognition rules with an association relationship to obtain one or more third time word sets.
  • the parsing system only needs to filter and merge at least two time words to be merged that are contained in the first time word set and the third time word set at the same time, to obtain the second merged time word with logical association.
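  • The second merge can be sketched as an intersection between the position-based first time word sets and the rule-based third time word sets. The function name `merge_by_association`, the rule names and the sample data are assumptions, and the source does not say how the merged words are represented, so the sketch simply returns them as tuples in position order.

```python
from collections import defaultdict


def merge_by_association(first_sets, associations):
    """first_sets: position-based groups of (text, rule, start) tuples.
    associations: tuples of rule names that are associated with each other.
    Returns one tuple of texts per second merged time word."""
    # Second time word sets: group the words to be merged by recognition rule.
    second_sets = defaultdict(list)
    for group in first_sets:
        for word in group:
            second_sets[word[1]].append(word)
    # Third time word sets: union of the second sets of associated rules.
    third_sets = [
        [word for rule in rules for word in second_sets.get(rule, [])]
        for rules in associations
    ]
    merged = []
    for group in first_sets:
        for third in third_sets:
            # Words present in both a first set and a third set.
            common = sorted((w for w in group if w in third), key=lambda w: w[2])
            if len(common) >= 2:
                merged.append(tuple(w[0] for w in common))
    return merged


# Made-up data: the year and the first month are close together, the second
# month is far away in the text.
first_sets = [
    [("2018年", "year", 0), ("6月", "month", 7)],
    [("3月", "month", 30)],
]
second_merged = merge_by_association(first_sets, [("year", "month")])
```

  • Requiring membership in both sets means two words are merged only when they are close in the text and their recognition rules are associated.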
  • the step of separately analyzing each of the second time words to obtain the time interval corresponding to each of the second time words includes:
  • S601 Determine whether the second time word belongs to a pre-built marked time word
  • S604 Calculate the corresponding time interval according to the reference time point and the meaning of the second time word.
  • the second time words obtained after merging by the parsing system take two forms: one is time words with definite semantics, such as "2018", "August" and "the 9th"; the other is semantically ambiguous time words, such as "today", "the day after tomorrow" and "6 days later". Developers set this type of semantically ambiguous time word as marked time words, and the two forms of second time words are processed differently during parsing.
  • the parsing system first judges whether the second time word is a marked time word; if not, it can directly obtain the corresponding time interval according to the start time and end time of the second time word. For example, if the second time word is "June 2018", the corresponding time interval is from 0:00 on June 1, 2018 to 24:00 on June 30, 2018.
  • the corresponding interval is accurate to microseconds, and will not be described in detail here.
  • if the second time word is a marked time word, the parsing system needs to obtain the current reference time point.
  • the reference time point is obtained according to the time zone where the user is currently located, that is, it corresponds to the current time zone of the user.
  • the parsing system calculates the corresponding time interval according to the reference time point and the meaning of the second time word. For example, if the reference time point is June 24, 2018 and the second time word is "3 days later", the corresponding time interval is from 0:00 on June 27, 2018 to 24:00 on June 27, 2018.
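  • The two parsing branches can be sketched with the standard `datetime` module. The patterns for the definite form (`2018年6月`) and the marked form (`3天后`, "3 days later"), the function names and the half-open interval representation are all assumptions; an end of "24:00" on a day is represented here as 0:00 of the following day, because `datetime` cannot express 24:00.

```python
import re
from datetime import datetime, timedelta

# Hypothetical patterns for one definite form and one marked form.
DEFINITE_MONTH = re.compile(r"(\d{4})年(\d{1,2})月")
MARKED_DAYS_LATER = re.compile(r"(\d+)天后")


def parse_interval(word, reference):
    """Return (start, end) for a second time word, with end exclusive."""
    marked = MARKED_DAYS_LATER.fullmatch(word)
    if marked:
        # Marked time word: resolve relative to the reference time point.
        midnight = datetime(reference.year, reference.month, reference.day)
        start = midnight + timedelta(days=int(marked.group(1)))
        return start, start + timedelta(days=1)
    definite = DEFINITE_MONTH.fullmatch(word)
    if definite:
        # Definite time word: the interval follows from the word itself.
        year, month = int(definite.group(1)), int(definite.group(2))
        start = datetime(year, month, 1)
        end = datetime(year + month // 12, month % 12 + 1, 1)
        return start, end
    raise ValueError(f"unsupported time word: {word}")


# Reference time point from the example; the time of day is made up.
reference_point = datetime(2018, 6, 24, 15, 30)
relative = parse_interval("3天后", reference_point)
absolute = parse_interval("2018年6月", reference_point)
```

  • In a fuller version the reference time point would be taken from the user's current time zone, as the text describes; here it is passed in explicitly so the result is reproducible.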
  • the method for parsing natural language time words provided by this embodiment first extracts multiple time words from the input text through a plurality of pre-built recognition rules, then merges the corresponding time words according to the arrangement position of each time word in the input text and the association between the recognition rules, and finally parses the merged time words according to their meaning to obtain the corresponding time intervals, so as to parse all time words in natural language and effectively improve the comprehensiveness and accuracy of time word recognition in the input text.
  • an embodiment of the present application also provides a natural language time word parsing device, including:
  • the processing module 2 is used to remove preset characters in the input text to obtain preprocessed text
  • the word segmentation module 3 is used to segment the preprocessed text according to the first preset rule to obtain several time words;
  • the encapsulation module 4 is used for data encapsulation of each of the time words to obtain the first time words corresponding to each of the time words;
  • the merging module 5 is used to merge each of the first time words according to a second preset rule to obtain a number of second time words;
  • the parsing module 6 is configured to analyze each of the second time words separately to obtain the time interval corresponding to each of the second time words.
  • the analysis device further includes an output module, configured to output each of the time intervals to a display interface in a preset format.
  • the implementation of the functions of the acquisition module 1, the processing module 2, the word segmentation module 3, the encapsulation module 4, the merging module 5 and the parsing module 6 in the above natural language time word parsing device is detailed in the implementation of the corresponding steps S1 to S6 in the above natural language time word parsing method, and will not be repeated here.
  • word segmentation module 3 includes:
  • a loading sub-module for loading a pre-built rule library where the rule library is composed of multiple identification rules, and a single identification rule contains multiple identification parameters;
  • the first screening sub-module is used for screening from the preprocessed text to obtain a plurality of the time words corresponding to the recognition parameters of the recognition rules.
  • the implementation of the functions of the loading sub-module and the first screening sub-module in the above natural language time word parsing device is detailed in the implementation of the corresponding steps S301 to S302 in the above natural language time word parsing method, and will not be repeated here.
  • the merge module 5 includes:
  • the second screening sub-module is used to sequentially filter and merge several of the first time words with continuity in the arrangement positions according to the sequence of the arrangement positions to obtain the first merged time words, and to mark the first time words whose arrangement positions are not continuous as time words to be merged;
  • the classification sub-module is configured to classify each of the time words to be merged with the arrangement positions within a preset range into the same set according to the sequence of the arrangement positions, to obtain at least one first time word set;
  • the merging sub-module is configured to filter each of the to-be-merged time words respectively corresponding to the recognition rules having an association relationship in the same first time word set to perform a second merging to obtain a second merged time word;
  • the marking sub-module is configured to use the first combined time word and the second combined time word as the second time word.
  • the implementation of the functions of the second screening sub-module, the classification sub-module, the merging sub-module, and the marking sub-module in the above natural language time word parsing device is detailed in the implementation of the corresponding steps S501 to S504 in the above natural language time word parsing method, and will not be repeated here.
  • the arrangement position includes a start position and an end position
  • the second screening submodule includes:
  • a judging unit for judging whether the end position of one said first time word is adjacent to the beginning position of another said first time word
  • a determining unit configured to determine that the arrangement positions corresponding to the two first time words have continuity if the end position of one first time word is adjacent to the start position of another first time word;
  • the traversal unit is configured to sequentially traverse all the first time words according to the sequence of the arrangement positions, and merge the first time words corresponding to the arrangement positions that have the continuity, to obtain the first merged time word.
  • the implementation of the functions of the judging unit, the determining unit, and the traversal unit in the above natural language time word parsing device is detailed in the implementation of the corresponding steps S5011 to S5013 in the above natural language time word parsing method, and will not be repeated here.
  • the determining unit includes:
  • the merging subunit is configured to sequentially merge several of the other first time words according to the respective arrangement positions to obtain the first merged time word.
  • the implementation of the functions of the merging subunit in the above natural language time word parsing device is detailed in the implementation of the corresponding step S50131 in the above natural language time word parsing method, and will not be repeated here.
  • the merging sub-module includes:
  • the classification unit is configured to classify each of the time words to be merged according to the corresponding recognition rules to obtain a number of second time word sets;
  • the first merging unit is configured to merge the second time word sets corresponding to each of the recognition rules that have an association relationship to obtain a plurality of third time word sets;
  • the second merging unit is used to filter and merge at least two of the time words to be merged that are simultaneously included in the third time word set and the first time word set to obtain the second merging time word.
  • the implementation of the functions of the classification unit, the first merging unit and the second merging unit in the above natural language time word parsing device is detailed in the implementation of the corresponding steps S5031 to S5033 in the above natural language time word parsing method, and will not be repeated here.
  • the parsing module 6 includes:
  • the judging sub-module is used to judge whether the second time word belongs to a pre-built marked time word
  • the first calculation sub-module is configured to obtain the corresponding time interval according to the start time and end time of the second time word if it does not belong to a pre-built marked time word;
  • the acquiring sub-module is used to acquire the current reference time point if it belongs to a pre-built marked time word
  • the second calculation submodule is configured to calculate the corresponding time interval according to the reference time point and the meaning of the second time word.
  • the implementation of the functions of the judging sub-module, the first calculation sub-module, the acquiring sub-module and the second calculation sub-module in the above natural language time word parsing device is detailed in the implementation of the corresponding steps S601 to S604 in the above natural language time word parsing method, and will not be repeated here.
  • the natural language time word parsing device provided in this embodiment first extracts multiple time words from the input text through multiple pre-built recognition rules, then merges the corresponding time words according to the arrangement position of each time word in the input text and the association between the recognition rules, and finally parses the merged time words according to their meaning to obtain the corresponding time intervals, so as to parse all time words in natural language and effectively improve the comprehensiveness and accuracy of time word recognition in the input text.
  • an embodiment of the present application also provides a computer device.
  • the computer device may be a server, and its internal structure may be as shown in FIG. 3.
  • the computer equipment includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, a computer program, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium.
  • the database of the computer equipment is used to store data such as a rule library.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer program is executed by the processor to realize the function of the natural language time word parsing method in any of the above embodiments.
  • the foregoing processor executes the steps of the foregoing natural language time word parsing method:
  • S3 Perform word segmentation on the preprocessed text according to the first preset rule to obtain several time words
  • An embodiment of the present application also provides a computer-readable storage medium.
  • the storage medium may be a non-volatile storage medium or a volatile storage medium, on which a computer program is stored.
  • When the computer program is executed by a processor, it implements the natural language time word parsing method of any of the above embodiments, specifically as follows:
  • S3 Perform word segmentation on the preprocessed text according to the first preset rule to obtain several time words
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Abstract

The present application relates to the field of semantic parsing, and provides a parsing method and apparatus for natural language time words, a computer device, and a computer readable storage medium. The method comprises: obtaining an input text; removing preset characters in the input text to obtain a preprocessed text; performing word segmentation to obtain a plurality of time words; performing data encapsulation to obtain first time words corresponding to the time words; combining the first time words to obtain a plurality of second time words; and parsing the second time words to obtain time intervals corresponding to the second time words. According to the present application, corresponding time words are extracted from an input text by means of a plurality of recognition rules, then the time words are combined according to the arrangement positions of the time words in the input text and the association between the recognition rules, and finally the combined time words are parsed according to meanings to obtain corresponding time intervals, thereby implementing the parsing of all time words in natural language, and effectively improving the comprehensiveness and accuracy of time word recognition in the input text.

Description

Natural language time word parsing method, apparatus, and computer device
This application claims priority to Chinese patent application No. 201911045300.0, filed with the Chinese Patent Office on October 30, 2019 and entitled "Natural language time word parsing method, apparatus, and computer device", the entire content of which is incorporated herein by reference.
Technical Field
This application relates to the technical field of semantic parsing, and in particular to a natural language time word parsing method, apparatus, and computer device.
Background
When natural language is parsed, time information is an indispensable element for fully understanding its semantics. Existing methods for recognizing time information in natural language rely mainly on fixed rules: the fixed rules are matched against the text to extract time words, for example a date expression such as "2018年9月10号" (September 10, 2018). The inventors realized that this approach requires a large number of rules. On the one hand, it is overly complex and rigid, which makes it hard for developers to understand and modify later; on the other hand, the time words that such fixed rules extract from the text are not comprehensive enough, and the accuracy is low.
Technical Problem
The main purpose of this application is to provide a natural language time word parsing method, apparatus, and computer device, aiming to overcome the rigidity, low accuracy, and low completeness of existing time word parsing methods.
Technical Solution
To achieve the above purpose, in a first aspect, this application provides a natural language time word parsing method, including:
obtaining an input text;
removing preset characters from the input text to obtain a preprocessed text;
performing word segmentation on the preprocessed text according to a first preset rule to obtain several time words;
performing data encapsulation on each time word to obtain a first time word corresponding to each time word;
merging the first time words according to a second preset rule to obtain several second time words;
parsing each second time word to obtain the time interval corresponding to each second time word.
In a second aspect, this application also provides a natural language time word parsing apparatus, including:
an obtaining module, configured to obtain an input text;
a processing module, configured to remove preset characters from the input text to obtain a preprocessed text;
a word segmentation module, configured to perform word segmentation on the preprocessed text according to a first preset rule to obtain several time words;
an encapsulation module, configured to perform data encapsulation on each time word to obtain a first time word corresponding to each time word;
a merging module, configured to merge the first time words according to a second preset rule to obtain several second time words;
a parsing module, configured to parse each second time word to obtain the time interval corresponding to each second time word.
In a third aspect, this application also provides a computer device including a memory and a processor, where the memory stores a computer program and the processor implements the above natural language time word parsing method when executing the computer program. The method includes the following steps: obtaining an input text; removing preset characters from the input text to obtain a preprocessed text; performing word segmentation on the preprocessed text according to a first preset rule to obtain several time words; performing data encapsulation on each time word to obtain a first time word corresponding to each time word; merging the first time words according to a second preset rule to obtain several second time words; and parsing each second time word to obtain the time interval corresponding to each second time word.
In a fourth aspect, this application also provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the above natural language time word parsing method. The method includes the following steps: obtaining an input text; removing preset characters from the input text to obtain a preprocessed text; performing word segmentation on the preprocessed text according to a first preset rule to obtain several time words; performing data encapsulation on each time word to obtain a first time word corresponding to each time word; merging the first time words according to a second preset rule to obtain several second time words; and parsing each second time word to obtain the time interval corresponding to each second time word.
Beneficial Effects
The natural language time word parsing method, apparatus, and computer device provided in this application first extract multiple time words from the input text using multiple pre-built recognition rules, then merge the time words according to their respective positions in the input text and the associations between the recognition rules, and finally parse each merged time word according to its meaning to obtain the corresponding time interval. In this way, all time words in the natural language input are parsed, which effectively improves the comprehensiveness and accuracy of time word recognition in the input text.
Description of the Drawings
FIG. 1 is a schematic diagram of the steps of a natural language time word parsing method in an embodiment of this application;
FIG. 2 is a block diagram of the overall structure of a natural language time word parsing apparatus in an embodiment of this application;
FIG. 3 is a schematic structural block diagram of a computer device in an embodiment of this application.
The realization of the purpose, the functional characteristics, and the advantages of this application are further described below in conjunction with the embodiments and with reference to the accompanying drawings.
Best Mode for Carrying Out the Invention
To make the purpose, technical solutions, and advantages of this application clearer, this application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it.
Referring to FIG. 1, an embodiment of this application provides a natural language time word parsing method, including:
S1: obtaining an input text;
S2: removing preset characters from the input text to obtain a preprocessed text;
S3: performing word segmentation on the preprocessed text according to a first preset rule to obtain several time words;
S4: performing data encapsulation on each time word to obtain a first time word corresponding to each time word;
S5: merging the first time words according to a second preset rule to obtain several second time words;
S6: parsing each second time word to obtain the time interval corresponding to each second time word.
In this embodiment, natural language refers to language as humans naturally express it, such as a passage of speech or text. If the parsing system receives speech from the user, the speech must first be converted into text. After receiving the user's natural language input, the parsing system converts it into text format to obtain the input text. The parsing system then preprocesses the input text: it identifies the preset characters in the input text, for example by marking sensitive characters, and removes them to obtain the preprocessed text, which reduces the complexity of the subsequent word segmentation. For example, if the input text is "2018年的第二季度的最后一个月的净利润" (the net profit of the last month of the second quarter of 2018), marking the sensitive character "的" and removing it yields the preprocessed text "2018年第二季度最后一个月净利润".
A rule library is pre-built in the parsing system. The rule library is implemented with regular expressions and consists of multiple recognition rules; each recognition rule contains multiple different recognition parameters, and one recognition rule recognizes one type of time word. After loading the rule library, the parsing system calls each recognition rule in turn to segment the preprocessed text, obtaining one or more time words for each recognition rule. For example, with the preprocessed text "2018年第二季度最后一个月净利润", the recognition parameters of recognition rule A include "年" (year), so rule A yields the time word "2018年" (2018); the recognition parameters of recognition rule B include "季度" (quarter), so rule B yields the time word "第二季度" (the second quarter). The parsing system then performs data encapsulation on each time word so that all time words share a unified format, obtaining the corresponding first time words. Each first time word carries time word attributes, which include the recognition rule that produced it and its position in the input text, for example: first time word: 2018年; corresponding rule: recognition rule A; start position: 0; end position: 4.
After data encapsulation is completed, the parsing system first uses the positions of the first time words in the input text to select two or more first time words whose positions are contiguous and merges them into a first merged time word; first time words whose positions are not contiguous are marked as time words to be merged. The time words to be merged whose positions lie within a preset range of one another are then grouped into several first time word sets, so that within any first time word set each time word to be merged lies within the preset range of another member of the set. Within each first time word set, the parsing system selects the time words to be merged whose recognition rules are associated with one another and merges them, obtaining several second merged time words. The first merged time words and the second merged time words together form the second time words. The parsing system then parses each second time word and obtains the corresponding time interval from its start time and end time. For example, if the second time word is "2018年" (2018), the corresponding time interval is from 0:00 on January 1, 2018 to 24:00 on December 31, 2018. Further, the parsing system outputs the time interval in a preset format, for example 2018-01-01-0:00 to 2018-12-31-24:00.
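The preprocessing (S2) and data encapsulation (S4) steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the names `PRESET_CHARS`, `FirstTimeWord`, `preprocess`, and `encapsulate` are hypothetical, positions are measured in the preprocessed text, and the end position is treated as inclusive so that "2018年" occupies positions 0 to 4 as in the example.

```python
from dataclasses import dataclass

# Hypothetical preset characters; the description only names "的" as an example.
PRESET_CHARS = {"的"}


@dataclass(frozen=True)
class FirstTimeWord:
    text: str   # the time word itself, e.g. "2018年"
    rule: str   # name of the recognition rule that produced it
    start: int  # index of the first character (inclusive)
    end: int    # index of the last character (inclusive)


def preprocess(input_text: str) -> str:
    """Step S2: drop every preset character from the input text."""
    return "".join(ch for ch in input_text if ch not in PRESET_CHARS)


def encapsulate(text: str, word: str, rule: str) -> FirstTimeWord:
    """Step S4: wrap a time word with its rule and its position in the text."""
    start = text.index(word)  # first occurrence; raises ValueError if absent
    return FirstTimeWord(word, rule, start, start + len(word) - 1)


preprocessed = preprocess("2018年的第二季度的最后一个月的净利润")
year_word = encapsulate(preprocessed, "2018年", "A")
```

Storing the rule name and both positions on every time word is what allows the later merging steps to work purely on these attributes without re-reading the text.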
Further, the step of performing word segmentation on the preprocessed text according to the first preset rule to obtain several time words includes:
S301: loading a pre-built rule library, where the rule library consists of multiple recognition rules and each recognition rule contains multiple recognition parameters;
S302: selecting from the preprocessed text the time words that correspond to the recognition parameters of each recognition rule.
In this embodiment, a rule library is pre-built in the parsing system and is implemented with regular expressions. The rule library consists of multiple recognition rules, and each recognition rule contains multiple recognition parameters. In addition to conventional recognition parameters such as "年" (year) and "季度" (quarter), the rule library also includes special recognition parameters such as "前" (before), "后" (after), "当日" (the current day), and "昨天" (yesterday), which can be used to recognize time words such as "6天后" (6 days later). Using the recognition parameters of each recognition rule, the parsing system selects one or more time words from the preprocessed text, thereby segmenting the preprocessed text. Time words selected by the same recognition rule belong to the same type, which corresponds to that recognition rule.
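A regular-expression rule library of this kind might look like the sketch below. The rule names and patterns in `RULES` are assumptions chosen to cover the examples in the text ("年", "季度", "6天后"); the actual rule library is not disclosed. End positions are inclusive.

```python
import re

# Hypothetical rule library: rule name -> compiled regular expression.
RULES = {
    "A_year": re.compile(r"\d{4}年"),
    "B_quarter": re.compile(r"第[一二三四]季度"),
    "C_relative_day": re.compile(r"\d+天[前后]"),
}


def segment(text):
    """Steps S301-S302: apply every rule and collect (word, rule, start, end)."""
    found = []
    for name, pattern in RULES.items():
        for match in pattern.finditer(text):
            # match.end() is exclusive, so subtract 1 for an inclusive end.
            found.append((match.group(), name, match.start(), match.end() - 1))
    return sorted(found, key=lambda item: item[2])  # order by start position


words = segment("2018年第二季度最后一个月净利润")
```

Because each rule is applied independently, adding support for a new type of time word only requires adding one entry to the rule table.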
Further, the first time word carries time word attributes, and the time word attributes include the recognition rule corresponding to the first time word and the position of the first time word in the input text. The step of merging the first time words according to the second preset rule to obtain several second time words includes:
S501: following the order of the positions, selecting in turn the first time words whose positions are contiguous and merging them to obtain first merged time words, and marking the first time words whose positions are not contiguous as time words to be merged;
S502: following the order of the positions, grouping the time words to be merged whose positions lie within a preset range into the same set, to obtain at least one first time word set;
S503: within the same first time word set, selecting the time words to be merged whose recognition rules are associated with one another and merging them a second time, to obtain second merged time words;
S504: taking the first merged time words and the second merged time words as the second time words.
In this embodiment, the time word attributes carried by a first time word include the recognition rule corresponding to that first time word and its start position and end position in the input text, that is, its arrangement position. The parsing system checks whether the end position of one first time word is adjacent to the start position of another first time word; if so, it determines that the positions of the two first time words are contiguous. Using this test, the parsing system selects the first time words whose positions are contiguous and merges them, obtaining one or more first merged time words. Further, when merging by contiguity the parsing system can merge more than two first time words in a row: if first time word A is contiguous with first time word B, and first time word B is contiguous with first time word C, then A, B, and C are merged into a single first merged time word.
In addition, the parsing system marks the first time words whose positions are not contiguous as time words to be merged, so that they can be merged under a different rule. Specifically, following the order of their positions in the input text, the parsing system compares the positions of the time words to be merged pair by pair and checks whether they lie within a preset range. For example, if the end position of time word A is 5, time word B starts at 8 and ends at 10, time word C starts at 12, and the preset range is 3, then time words A, B, and C are placed in the same time word set. The parsing system forms one or more first time word sets in this way. Then, according to the pre-established associations between recognition rules, it selects within each first time word set the time words to be merged whose recognition rules are associated and merges them, obtaining the second merged time words. The first merged time words and the second merged time words together form the second time words.
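The grouping of time words to be merged by preset range (S502) can be sketched as below. The function name `group_by_range` is hypothetical, the gap is taken as the next word's start position minus the previous word's end position, and the start of A, the end of C, and the extra word D are assumed values added only to complete the example from the text.

```python
def group_by_range(words, preset_range):
    """Step S502: chain words whose gap to the previous word is within range.

    Each word is a (name, start, end) tuple with inclusive positions.
    """
    groups = []
    for word in sorted(words, key=lambda w: w[1]):
        previous_end = groups[-1][-1][2] if groups else None
        if previous_end is not None and word[1] - previous_end <= preset_range:
            groups[-1].append(word)  # close enough: same first time word set
        else:
            groups.append([word])    # too far away: start a new set
    return groups


# A ends at 5, B spans 8-10, C starts at 12, preset range 3 (from the text).
pending = [("A", 3, 5), ("B", 8, 10), ("C", 12, 14), ("D", 30, 32)]
groups = group_by_range(pending, preset_range=3)
```

In this sketch a word with no neighbor in range ends up alone in its own set; such a set simply produces no second merged time word later on.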
Further, the arrangement position includes a start position and an end position, and the step of selecting in turn, following the order of the positions, the first time words whose positions are contiguous and merging them to obtain first merged time words includes:
S5011: determining whether the end position of one first time word is adjacent to the start position of another first time word;
S5012: if it is adjacent to the start position of the other first time word, determining that the positions of the two first time words are contiguous;
S5013: following the order of the positions, traversing all the first time words in turn and merging the first time words whose positions are contiguous, to obtain the first merged time words.
In this embodiment, the position of a first time word in the input text includes a start position and an end position. The parsing system determines whether the end position of one first time word is adjacent to the start position of another first time word; if so, it determines that the positions of the two first time words are contiguous. For example, first time word A starts at 3 and ends at 6, and first time word B starts at 7 and ends at 9. Because the end position "6" of A is adjacent to the start position "7" of B, the system determines that the positions of A and B are contiguous. The parsing system traverses all the first time words in the order of their positions in the input text, uses the above test to select the first time words whose positions are contiguous, and merges them in position order to obtain one or more first merged time words.
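The contiguity test and ordered merge (S5011 to S5013) can be sketched as follows. The function name `merge_contiguous` and the tuple layout are assumptions for illustration; "adjacent" is taken to mean that the next start position equals the previous inclusive end position plus one, as in the 6 and 7 example.

```python
def merge_contiguous(words):
    """Merge runs of position-contiguous words.

    Each word is a (text, start, end) tuple with inclusive positions.
    Returns (merged, pending): merged runs of two or more words, and the
    remaining single words, which are the time words to be merged later.
    """
    runs = []
    for word in sorted(words, key=lambda w: w[1]):
        if runs and word[1] == runs[-1][-1][2] + 1:
            runs[-1].append(word)  # adjacent to the previous word: extend run
        else:
            runs.append([word])    # gap found: start a new run
    merged = [
        ("".join(w[0] for w in run), run[0][1], run[-1][2])
        for run in runs
        if len(run) > 1
    ]
    pending = [run[0] for run in runs if len(run) == 1]
    return merged, pending


merged, pending = merge_contiguous(
    [("第二季度", 5, 8), ("2018年", 0, 4), ("3天后", 15, 17)]
)
```

Sorting by start position before merging is what keeps the merged text in its natural order, so the result reads "2018年第二季度" even when the words arrive out of order.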
Further, the step of merging the first time words whose positions are contiguous to obtain the first merged time words includes:
S50131: merging the first time words in the order of their respective positions to obtain the first merged time word.
In this embodiment, when the parsing system merges two or more first time words whose positions are contiguous, it must merge them in the order of their positions in the input text. Specifically, the parsing system can determine the relative order of two first time words by comparing their start positions or their end positions. For example, if first time word A starts at 5 and first time word B starts at 9, A's start position is smaller than B's start position, so A necessarily comes before B. Because the input text is derived from the user's natural language input, the time words in it follow a particular logic and order: in normal speech we say "2018年9月" (September 2018) and not "9月2018年". The parsing system therefore merges the first time words in the order of their positions to obtain the first merged time word.
Further, the step of selecting, within the same first time word set, the time words to be merged whose recognition rules are associated with one another and merging them a second time to obtain the second merged time words includes:
S5031: classifying the time words to be merged according to their respective recognition rules to obtain several second time word sets;
S5032: merging the second time word sets whose recognition rules are associated with one another to obtain several third time word sets;
S5033: selecting at least two time words to be merged that are contained in both a third time word set and a first time word set and merging them, to obtain the second merged time word.
In this embodiment, the parsing system classifies the time words to be merged according to their recognition rules, obtaining one or more second time word sets; all time words to be merged in the same second time word set were selected by the same recognition rule. Associations between the recognition rules in the rule library are established in advance. For example, recognition rule A recognizes "年" (year) time words and recognition rule B recognizes "月" (month) time words, and rule A is associated with rule B so that a year time word and a month time word can later be merged. The parsing system merges the two or more second time word sets whose recognition rules are associated, obtaining one or more third time word sets. If two time words to be merged are contained in both a third time word set and a first time word set, their positions lie within the preset range and their recognition rules are associated. The parsing system therefore only needs to select at least two time words to be merged that are contained in both a first time word set and a third time word set and merge them to obtain a logically related second merged time word.
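The set-based second merge (S5031 to S5033) can be sketched as follows. This is only one reading of the description: the function name `second_merge`, the rule names, the positions, and the choice to concatenate the word texts in position order are all assumptions.

```python
from collections import defaultdict


def second_merge(first_sets, associations):
    """Merge words that share a first set and have associated rules.

    first_sets: list of lists of (text, rule, start, end) tuples.
    associations: list of sets of rule names that are associated.
    """
    # S5031: one second time word set per recognition rule.
    second_sets = defaultdict(set)
    for first in first_sets:
        for word in first:
            second_sets[word[1]].add(word)

    # S5032: union the second sets of associated rules into third sets.
    third_sets = []
    for rules in associations:
        third = set()
        for rule in rules:
            third |= second_sets.get(rule, set())
        third_sets.append(third)

    # S5033: merge words found in both a first set and a third set.
    merged = []
    for first in first_sets:
        for third in third_sets:
            common = sorted(set(first) & third, key=lambda w: w[2])
            if len(common) >= 2:
                merged.append("".join(w[0] for w in common))
    return merged


first_sets = [
    [("2018年", "year", 0, 4), ("6月", "month", 7, 8)],
    [("3天后", "relative_day", 30, 32)],
]
result = second_merge(first_sets, associations=[{"year", "month"}])
```

The intersection expresses both conditions at once: membership in the first set means the words are close together, and membership in the third set means their rules are associated.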
Further, the step of parsing each second time word to obtain the time interval corresponding to each second time word includes:
S601: determining whether the second time word is a pre-built marked time word;
S602: if it is not a pre-built marked time word, obtaining the corresponding time interval from the start time and end time of the second time word;
S603: if it is a pre-built marked time word, obtaining the current reference time point;
S604: calculating the corresponding time interval from the reference time point and the meaning of the second time word.
In this embodiment, the second time words obtained after merging by the parsing system take two forms. One form is semantically definite time words such as "2018", "August", and "the 9th". The other form is semantically ambiguous time words such as "today", "the day after tomorrow", and "6 days later"; developers designate time words of this kind as marked time words. The two forms are handled differently during parsing. The parsing system first determines whether the second time word is a marked time word. If it is not, the corresponding time interval can be obtained directly from the start time and end time of the second time word. For example, if the second time word is "June 2018", the corresponding time interval is 0:00 on June 1, 2018 to 24:00 on June 30, 2018. In the machine's execution, the interval is accurate to the microsecond, which is not described in detail here. If it is a marked time word, the parsing system needs to acquire the current reference time point. Specifically, the reference time point is obtained according to the time zone in which the user is currently located, that is, it corresponds to the user's current time zone. The parsing system calculates the corresponding time interval from the reference time point and the meaning of the second time word. For example, if the reference time point is June 24, 2018 and the second time word is "3 days later", the corresponding time interval is 0:00 on June 27, 2018 to 24:00 on June 27, 2018.
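Steps S601 to S604 can be sketched with Python's standard `datetime` module. The table of marked time words, the English phrasings, the `YYYY-MM` format for definite time words, and the half-open interval convention (the end is the following midnight, standing in for "24:00") are assumptions made for this example; the application does not specify them. Time zone handling is omitted, so the reference time point is a naive `datetime`.

```python
import re
from datetime import datetime, timedelta

# Hypothetical table of marked time words and their day offsets.
MARKED_OFFSETS = {"today": 0, "tomorrow": 1, "the day after tomorrow": 2}
DAYS_LATER = re.compile(r"(\d+) days later")
YEAR_MONTH = re.compile(r"(\d{4})-(\d{2})")  # assumed format for "June 2018"


def day_interval(moment):
    """Return [midnight, next midnight) for the day containing `moment`."""
    start = datetime(moment.year, moment.month, moment.day)
    return start, start + timedelta(days=1)


def month_interval(year, month):
    """Return [first day of month, first day of next month)."""
    start = datetime(year, month, 1)
    if month == 12:
        end = datetime(year + 1, 1, 1)
    else:
        end = datetime(year, month + 1, 1)
    return start, end


def resolve(word, reference):
    """Map a second time word to a time interval."""
    # S601: is the word a marked time word?
    match = DAYS_LATER.fullmatch(word)
    if match:
        offset = int(match.group(1))
    else:
        offset = MARKED_OFFSETS.get(word)
    if offset is not None:
        # S603/S604: combine the reference time point with the word meaning.
        return day_interval(reference + timedelta(days=offset))
    # S602: a definite time word carries its own start and end.
    match = YEAR_MONTH.fullmatch(word)
    if match:
        return month_interval(int(match.group(1)), int(match.group(2)))
    raise ValueError(f"unsupported time word: {word!r}")


reference = datetime(2018, 6, 24, 15, 30)
relative = resolve("3 days later", reference)
absolute = resolve("2018-06", reference)
```

With the reference time point of June 24, 2018, `relative` covers June 27, 2018 and `absolute` covers the whole of June 2018, matching the two examples in the paragraph above.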
The method for parsing natural language time words provided by this embodiment first extracts multiple time words from the input text using multiple pre-built recognition rules, then merges the time words according to their arrangement positions in the input text and the associations between the recognition rules, and finally parses each merged time word according to its meaning to obtain the corresponding time interval. This enables all time words in natural language to be parsed and effectively improves the coverage and accuracy of time word recognition in the input text.
Referring to FIG. 2, an embodiment of the present application further provides an apparatus for parsing natural language time words, including:
an acquisition module 1, configured to acquire input text;
a processing module 2, configured to remove preset characters from the input text to obtain preprocessed text;
a word segmentation module 3, configured to perform word segmentation on the preprocessed text according to a first preset rule to obtain a plurality of time words;
an encapsulation module 4, configured to encapsulate each of the time words as data to obtain a first time word corresponding to each of the time words;
a merging module 5, configured to merge the first time words according to a second preset rule to obtain a plurality of second time words;
a parsing module 6, configured to parse each of the second time words separately to obtain a time interval corresponding to each of the second time words.
Further, the parsing apparatus also includes an output module, configured to output each of the time intervals to a display interface in a preset format.
In this embodiment, for the implementation of the functions of the acquisition module 1, the processing module 2, the word segmentation module 3, the encapsulation module 4, the merging module 5, and the parsing module 6 in the above parsing apparatus, refer to the implementation of the corresponding steps S1 to S6 in the above method for parsing natural language time words; details are not repeated here.
Further, the word segmentation module 3 includes:
a loading sub-module, configured to load a pre-built rule library, where the rule library consists of multiple recognition rules and a single recognition rule contains multiple recognition parameters;
a first screening sub-module, configured to screen, from the preprocessed text, a plurality of the time words respectively corresponding to the recognition parameters of each of the recognition rules.
In this embodiment, for the implementation of the functions of the loading sub-module and the first screening sub-module in the above parsing apparatus, refer to the implementation of the corresponding steps S301 to S302 in the above method for parsing natural language time words; details are not repeated here.
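As an illustration of how a rule library might drive extraction, and how each match might be encapsulated as a first time word, consider the sketch below. The application does not state how recognition rules or their recognition parameters are represented; the regular expressions, the rule names, and the `FirstTimeWord` fields are assumptions chosen for the example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class FirstTimeWord:
    """Hypothetical encapsulation of a time word and its attributes."""
    text: str
    rule: str   # recognition rule that produced the match
    start: int  # start offset in the text
    end: int    # end offset (exclusive)


# Hypothetical rule library: rule name -> pattern.
RULE_LIBRARY = {
    "year": re.compile(r"\d{4}年"),
    "month": re.compile(r"\d{1,2}月"),
    "day": re.compile(r"\d{1,2}[日号]"),
}


def extract_time_words(text, rules=RULE_LIBRARY):
    """Apply every rule to the text and return matches in position order."""
    words = [
        FirstTimeWord(m.group(), name, m.start(), m.end())
        for name, pattern in rules.items()
        for m in pattern.finditer(text)
    ]
    return sorted(words, key=lambda w: w.start)


words = extract_time_words("2018年6月开会")
```

Recording the rule name and the offsets at extraction time is what later allows merging by position and by rule association.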
Further, the first time word carries a time word attribute, the time word attribute includes the recognition rule corresponding to the first time word and the arrangement position of the first time word in the input text, and the merging module 5 includes:
a second screening sub-module, configured to, following the order of the arrangement positions, successively select and merge the first time words whose arrangement positions are contiguous to obtain first merged time words, and mark the first time words whose arrangement positions are not contiguous as time words to be merged;
a classification sub-module, configured to, following the order of the arrangement positions, classify the time words to be merged whose arrangement positions are within a preset range into the same set, obtaining at least one first time word set;
a merging sub-module, configured to, within the same first time word set, select the time words to be merged that correspond to associated recognition rules and merge them a second time to obtain second merged time words;
a marking sub-module, configured to take the first merged time words and the second merged time words as the second time words.
In this embodiment, for the implementation of the functions of the second screening sub-module, the classification sub-module, the merging sub-module, and the marking sub-module in the above parsing apparatus, refer to the implementation of the corresponding steps S501 to S504 in the above method for parsing natural language time words; details are not repeated here.
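The role of the classification sub-module can be sketched as grouping by position. The application does not define how the preset range is measured; the sketch below assumes it is a maximum gap, in characters, between the end of one time word and the start of the next. The `PendingWord` class and the `max_gap` parameter are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingWord:
    """Hypothetical time word to be merged, with character offsets."""
    text: str
    start: int
    end: int  # exclusive


def group_by_range(pending, max_gap):
    """Split words into first time word sets of nearby words.

    A word joins the current set when it starts at most `max_gap`
    characters after the previous word ends (an assumed reading of
    "within a preset range").
    """
    groups = []
    for word in sorted(pending, key=lambda w: w.start):
        if groups and word.start - groups[-1][-1].end <= max_gap:
            groups[-1].append(word)
        else:
            groups.append([word])
    return groups


a = PendingWord("2018年", 0, 5)
b = PendingWord("6月", 8, 10)
c = PendingWord("3天后", 40, 43)
groups = group_by_range([c, a, b], max_gap=5)
```

Each resulting group plays the part of one first time word set, inside which the rule-association check is then applied.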
Further, the arrangement position includes a start position and an end position, and the second screening sub-module includes:
a judging unit, configured to judge whether the end position of one first time word is adjacent to the start position of another first time word;
a determining unit, configured to determine, if the end position is adjacent to the start position of the other first time word, that the arrangement positions corresponding to the two first time words are contiguous;
a traversal unit, configured to traverse all of the first time words in the order of the arrangement positions and merge the first time words corresponding to the contiguous arrangement positions to obtain the first merged time word.
In this embodiment, for the implementation of the functions of the judging unit, the determining unit, and the traversal unit in the above parsing apparatus, refer to the implementation of the corresponding steps S5011 to S5013 in the above method for parsing natural language time words; details are not repeated here.
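The adjacency test and traversal performed by these units can be sketched as a single pass over the position-sorted words. The sketch assumes that positions are character offsets with an exclusive end, so that "adjacent" means one word's end equals the next word's start; the `PositionedWord` class is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PositionedWord:
    """Hypothetical first time word with character offsets."""
    text: str
    start: int
    end: int  # exclusive


def merge_contiguous(words):
    """Return (first merged time words, time words to be merged)."""
    runs = []
    for word in sorted(words, key=lambda w: w.start):
        # Judging/determining: does the previous word end where this starts?
        if runs and runs[-1][-1].end == word.start:
            runs[-1].append(word)
        else:
            runs.append([word])
    # Runs of two or more contiguous words become first merged time words.
    merged = [
        PositionedWord("".join(w.text for w in run), run[0].start, run[-1].end)
        for run in runs
        if len(run) > 1
    ]
    # Isolated words are marked as time words to be merged.
    pending = [run[0] for run in runs if len(run) == 1]
    return merged, pending


merged, pending = merge_contiguous([
    PositionedWord("6月", 5, 7),
    PositionedWord("2018年", 0, 5),
    PositionedWord("3天后", 12, 15),
])
```

Sorting first means the input order does not matter; only the arrangement positions decide which words are joined.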
Further, the determining unit includes:
a merging subunit, configured to merge the plurality of first time words in sequence according to their respective arrangement positions to obtain the first merged time word.
In this embodiment, for the implementation of the functions of the merging subunit in the above parsing apparatus, refer to the implementation of the corresponding step S50131 in the above method for parsing natural language time words; details are not repeated here.
Further, the merging sub-module includes:
a classification unit, configured to classify the time words to be merged according to their corresponding recognition rules to obtain a plurality of second time word sets;
a first merging unit, configured to merge the second time word sets corresponding to associated recognition rules to obtain a plurality of third time word sets;
a second merging unit, configured to select and merge at least two time words to be merged that are contained in both the third time word set and the first time word set to obtain the second merged time word.
In this embodiment, for the implementation of the functions of the classification unit, the first merging unit, and the second merging unit in the above parsing apparatus, refer to the implementation of the corresponding steps S5031 to S5033 in the above method for parsing natural language time words; details are not repeated here.
Further, the parsing module 6 includes:
a judging sub-module, configured to determine whether the second time word is a pre-built marked time word;
a first calculation sub-module, configured to obtain, if the second time word is not a pre-built marked time word, the corresponding time interval from the start time and end time of the second time word;
an acquisition sub-module, configured to acquire, if the second time word is a pre-built marked time word, the current reference time point;
a second calculation sub-module, configured to calculate the corresponding time interval from the reference time point and the meaning of the second time word.
In this embodiment, for the implementation of the functions of the judging sub-module, the first calculation sub-module, the acquisition sub-module, and the second calculation sub-module in the above parsing apparatus, refer to the implementation of the corresponding steps S601 to S604 in the above method for parsing natural language time words; details are not repeated here.
The apparatus for parsing natural language time words provided by this embodiment first extracts multiple time words from the input text using multiple pre-built recognition rules, then merges the time words according to their arrangement positions in the input text and the associations between the recognition rules, and finally parses each merged time word according to its meaning to obtain the corresponding time interval. This enables all time words in natural language to be parsed and effectively improves the coverage and accuracy of time word recognition in the input text.
Referring to FIG. 3, an embodiment of the present application further provides a computer device. The computer device may be a server, and its internal structure may be as shown in FIG. 3. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores data such as the rule library. The network interface of the computer device is used to communicate with external terminals through a network connection. When executed by the processor, the computer program implements the functions of the method for parsing natural language time words of any of the above embodiments.
The processor executes the steps of the above method for parsing natural language time words:
S1: acquiring input text;
S2: removing preset characters from the input text to obtain preprocessed text;
S3: performing word segmentation on the preprocessed text according to a first preset rule to obtain a plurality of time words;
S4: encapsulating each of the time words as data to obtain a first time word corresponding to each of the time words;
S5: merging the first time words according to a second preset rule to obtain a plurality of second time words;
S6: parsing each of the second time words separately to obtain a time interval corresponding to each of the second time words.
An embodiment of the present application further provides a computer-readable storage medium. The storage medium may be a non-volatile storage medium or a volatile storage medium, and a computer program is stored on it. When executed by a processor, the computer program implements the method for parsing natural language time words of any of the above embodiments, specifically:
S1: acquiring input text;
S2: removing preset characters from the input text to obtain preprocessed text;
S3: performing word segmentation on the preprocessed text according to a first preset rule to obtain a plurality of time words;
S4: encapsulating each of the time words as data to obtain a first time word corresponding to each of the time words;
S5: merging the first time words according to a second preset rule to obtain a plurality of second time words;
S6: parsing each of the second time words separately to obtain a time interval corresponding to each of the second time words.
A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium, and when executed, the computer program may include the processes of the above method embodiments. Any reference to memory, storage, a database, or other media provided in this application and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The above are only preferred embodiments of this application and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of this application.

Claims (20)

  1. A method for parsing natural language time words, comprising:
    acquiring input text;
    removing preset characters from the input text to obtain preprocessed text;
    performing word segmentation on the preprocessed text according to a first preset rule to obtain a plurality of time words;
    encapsulating each of the time words as data to obtain a first time word corresponding to each of the time words;
    merging the first time words according to a second preset rule to obtain a plurality of second time words;
    parsing each of the second time words separately to obtain a time interval corresponding to each of the second time words.
  2. The method for parsing natural language time words according to claim 1, wherein the step of performing word segmentation on the preprocessed text according to the first preset rule to obtain a plurality of time words comprises:
    loading a pre-built rule library, wherein the rule library consists of multiple recognition rules and a single recognition rule contains multiple recognition parameters;
    screening, from the preprocessed text, a plurality of the time words respectively corresponding to the recognition parameters of each of the recognition rules.
  3. The method for parsing natural language time words according to claim 2, wherein the first time word carries a time word attribute, the time word attribute includes the recognition rule corresponding to the first time word and the arrangement position of the first time word in the input text, and the step of merging the first time words according to the second preset rule to obtain a plurality of second time words comprises:
    following the order of the arrangement positions, successively selecting and merging the first time words whose arrangement positions are contiguous to obtain first merged time words, and marking the first time words whose arrangement positions are not contiguous as time words to be merged;
    following the order of the arrangement positions, classifying the time words to be merged whose arrangement positions are within a preset range into the same set to obtain at least one first time word set;
    within the same first time word set, selecting the time words to be merged that correspond to associated recognition rules and merging them a second time to obtain second merged time words;
    taking the first merged time words and the second merged time words as the second time words.
  4. The method for parsing natural language time words according to claim 3, wherein the arrangement position includes a start position and an end position, and the step of, following the order of the arrangement positions, successively selecting and merging the first time words whose arrangement positions are contiguous to obtain first merged time words comprises:
    judging whether the end position of one first time word is adjacent to the start position of another first time word;
    if the end position is adjacent to the start position of the other first time word, determining that the arrangement positions corresponding to the two first time words are contiguous;
    traversing all of the first time words in the order of the arrangement positions, and merging the first time words corresponding to the contiguous arrangement positions to obtain the first merged time word.
  5. The method for parsing natural language time words according to claim 4, wherein the step of merging the first time words corresponding to the contiguous arrangement positions to obtain the first merged time word comprises:
    merging the plurality of first time words in sequence according to their respective arrangement positions to obtain the first merged time word.
  6. The method for parsing natural language time words according to claim 3, wherein the step of, within the same first time word set, selecting the time words to be merged that correspond to associated recognition rules and merging them a second time to obtain second merged time words comprises:
    classifying the time words to be merged according to their corresponding recognition rules to obtain a plurality of second time word sets;
    merging the second time word sets corresponding to associated recognition rules to obtain a plurality of third time word sets;
    selecting and merging at least two time words to be merged that are contained in both the third time word set and the first time word set to obtain the second merged time word.
  7. The method for parsing natural language time words according to claim 1, wherein the step of parsing each of the second time words separately to obtain a time interval corresponding to each of the second time words comprises:
    determining whether the second time word is a pre-built marked time word;
    if it is not a pre-built marked time word, obtaining the corresponding time interval from the start time and end time of the second time word;
    if it is a pre-built marked time word, acquiring the current reference time point;
    calculating the corresponding time interval from the reference time point and the meaning of the second time word.
  8. An apparatus for parsing natural language time words, comprising:
    an acquisition module, configured to acquire input text;
    a processing module, configured to remove preset characters from the input text to obtain preprocessed text;
    a word segmentation module, configured to perform word segmentation on the preprocessed text according to a first preset rule to obtain a plurality of time words;
    an encapsulation module, configured to encapsulate each of the time words as data to obtain a first time word corresponding to each of the time words;
    a merging module, configured to merge the first time words according to a second preset rule to obtain a plurality of second time words;
    a parsing module, configured to parse each of the second time words separately to obtain a time interval corresponding to each of the second time words.
  9. The apparatus for parsing natural language time words according to claim 8, wherein the word segmentation module comprises:
    a loading sub-module, configured to load a pre-built rule library, wherein the rule library consists of multiple recognition rules and a single recognition rule contains multiple recognition parameters;
    a first screening sub-module, configured to screen, from the preprocessed text, a plurality of the time words respectively corresponding to the recognition parameters of each of the recognition rules.
  10. The apparatus for parsing natural language time words according to claim 8, wherein the first time word carries a time word attribute, the time word attribute includes the recognition rule corresponding to the first time word and the arrangement position of the first time word in the input text, and the merging module comprises:
    a second screening sub-module, configured to, following the order of the arrangement positions, successively select and merge the first time words whose arrangement positions are contiguous to obtain first merged time words, and mark the first time words whose arrangement positions are not contiguous as time words to be merged;
    a classification sub-module, configured to, following the order of the arrangement positions, classify the time words to be merged whose arrangement positions are within a preset range into the same set, obtaining at least one first time word set;
    a merging sub-module, configured to, within the same first time word set, select the time words to be merged that correspond to associated recognition rules and merge them a second time to obtain second merged time words;
    a marking sub-module, configured to take the first merged time words and the second merged time words as the second time words.
  11. A computer device, comprising a memory and a processor, wherein a computer program is stored in the memory, and the processor implements a method for parsing natural language time words when executing the computer program;
    wherein the method for parsing natural language time words comprises:
    acquiring input text;
    removing preset characters from the input text to obtain preprocessed text;
    performing word segmentation on the preprocessed text according to a first preset rule to obtain a plurality of time words;
    encapsulating each of the time words as data to obtain a first time word corresponding to each of the time words;
    merging the first time words according to a second preset rule to obtain a plurality of second time words;
    parsing each of the second time words separately to obtain a time interval corresponding to each of the second time words.
  12. The computer device according to claim 11, wherein the step of performing word segmentation on the preprocessed text according to the first preset rule to obtain a plurality of time words comprises:
    loading a pre-built rule library, wherein the rule library consists of multiple recognition rules and a single recognition rule contains multiple recognition parameters;
    screening, from the preprocessed text, a plurality of the time words respectively corresponding to the recognition parameters of each of the recognition rules.
  13. The computer device according to claim 12, wherein the first time word carries time word attributes, the time word attributes comprise the recognition rule corresponding to the first time word and the arrangement position of the first time word in the input text, and the step of merging the first time words according to a second preset rule to obtain a plurality of second time words comprises:
    according to the order of the arrangement positions, sequentially selecting and merging a plurality of the first time words whose arrangement positions are continuous to obtain a first merged time word, and marking the first time words whose arrangement positions are not continuous as time words to be merged;
    according to the order of the arrangement positions, classifying the time words to be merged whose arrangement positions fall within a preset range into the same set, to obtain at least one first time word set;
    within the same first time word set, selecting the time words to be merged whose corresponding recognition rules have an association relationship, and performing a second merge on them to obtain a second merged time word; and
    taking the first merged time word and the second merged time word as the second time words.
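The second merge (grouping the time words to be merged by a preset range, then joining those whose rules are associated) might look like the following Python sketch. Everything concrete here is an assumption made for illustration: the `TimeWord` record, the `ASSOCIATED` table, the `MAX_GAP` value, and the choice to measure the preset range as the character gap between neighbouring words. The claim does not say what happens to words that find no partner, so the sketch simply passes them through unchanged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeWord:
    text: str
    rule: str   # a merged word records its parts as "ruleA+ruleB"
    start: int  # start offset (inclusive)
    end: int    # end offset (exclusive)


# Hypothetical association table between recognition rules.
ASSOCIATED = {
    frozenset({"relative_day", "clock"}),
    frozenset({"date", "clock"}),
}
MAX_GAP = 5  # hypothetical preset range, in characters


def group_by_range(pending, max_gap=MAX_GAP):
    """Put words whose gap to the previous word is within max_gap in one set."""
    groups = []
    for word in sorted(pending, key=lambda w: w.start):
        if groups and word.start - groups[-1][-1].end <= max_gap:
            groups[-1].append(word)
        else:
            groups.append([word])
    return groups


def merge_associated(group):
    """Within one set, join neighbouring words whose rules are associated."""
    out = []
    for word in group:
        prev = out[-1] if out else None
        # Compare against the rule of the last component of the previous word.
        if prev is not None and frozenset({prev.rule.split("+")[-1], word.rule}) in ASSOCIATED:
            out[-1] = TimeWord(prev.text + word.text, prev.rule + "+" + word.rule,
                               prev.start, word.end)
        else:
            out.append(word)
    return out


def second_merge(pending, max_gap=MAX_GAP):
    """Group the pending words, then merge inside each group."""
    result = []
    for group in group_by_range(pending, max_gap):
        result.extend(merge_associated(group))
    return result
```

Note that the merged text drops whatever characters sat between the two words; a real system might keep the original span instead.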
  14. The computer device according to claim 13, wherein the arrangement position comprises a start position and an end position, and the step of sequentially selecting and merging, according to the order of the arrangement positions, a plurality of the first time words whose arrangement positions are continuous to obtain a first merged time word comprises:
    determining whether the end position of one of the first time words is adjacent to the start position of another of the first time words;
    if so, determining that the arrangement positions corresponding to the two first time words are continuous; and
    according to the order of the arrangement positions, traversing all of the first time words in sequence, and merging the first time words corresponding to the plurality of arrangement positions that are continuous, to obtain the first merged time word.
  15. The computer device according to claim 14, wherein the step of merging the first time words corresponding to the plurality of arrangement positions that are continuous to obtain the first merged time word comprises:
    merging the plurality of first time words in order of their respective arrangement positions to obtain the first merged time word.
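The continuity test and the ordered merge described in claims 14 and 15 can be sketched in Python as follows. This is a hedged illustration: the `TimeWord` record is hypothetical, end offsets are assumed to be exclusive so that "adjacent" reduces to `previous.end == current.start`, and recording the merged word's rule as a "+"-joined string is just one possible convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeWord:
    text: str
    rule: str
    start: int  # start offset (inclusive)
    end: int    # end offset (exclusive)


def _flush(run, merged, pending):
    """Close the current run: merge it if it has several words, else defer it."""
    if len(run) > 1:
        merged.append(TimeWord(
            "".join(w.text for w in run),   # concatenate in position order
            "+".join(w.rule for w in run),
            run[0].start,
            run[-1].end,
        ))
    elif run:
        pending.append(run[0])


def merge_contiguous(words):
    """Return (first merged time words, time words still to be merged)."""
    merged, pending, run = [], [], []
    for word in sorted(words, key=lambda w: w.start):
        if run and run[-1].end == word.start:
            run.append(word)        # continuous with the previous word
        else:
            _flush(run, merged, pending)
            run = [word]            # start a new run
    _flush(run, merged, pending)
    return merged, pending
```

Words that have no adjacent neighbour end up in `pending`, which corresponds to the time words marked as to be merged in the later steps.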
  16. A computer-readable storage medium storing a computer program which, when executed by a processor, implements a method for parsing natural language time words, wherein the method comprises the following steps:
    obtaining an input text;
    removing preset characters from the input text to obtain a preprocessed text;
    segmenting the preprocessed text according to a first preset rule to obtain a plurality of time words;
    encapsulating each of the time words as data to obtain a first time word corresponding to each of the time words;
    merging the first time words according to a second preset rule to obtain a plurality of second time words; and
    parsing each of the second time words separately to obtain the time interval corresponding to each of the second time words.
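The final step, turning a second time word into a time interval, is not detailed in the claims. As a rough Python sketch under stated assumptions, the function below handles only absolute dates of the form "YYYY年M月" or "YYYY年M月D日" and returns a half-open interval `[start, end)`; the pattern, the function name `parse_interval`, and the interval convention are all hypothetical.

```python
import re
from datetime import datetime, timedelta

# Hypothetical pattern: year and month are required, the day is optional.
DATE = re.compile(r"(\d{4})年(\d{1,2})月(?:(\d{1,2})日)?")


def parse_interval(word):
    """Map a date-like time word to (start, end), or None if unsupported."""
    match = DATE.fullmatch(word)
    if match is None:
        return None
    year, month = int(match.group(1)), int(match.group(2))
    if match.group(3):
        # A full date covers exactly one day.
        start = datetime(year, month, int(match.group(3)))
        return start, start + timedelta(days=1)
    # A year and month cover the whole month; roll over to January after December.
    start = datetime(year, month, 1)
    end = datetime(year + month // 12, month % 12 + 1, 1)
    return start, end
```

Under these assumptions, "2018年9月10日" maps to the interval from 2018-09-10 00:00 up to, but not including, 2018-09-11 00:00. Relative expressions such as "tomorrow afternoon" would need a reference time, which this sketch leaves out.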
  17. The computer-readable storage medium according to claim 16, wherein the step of segmenting the preprocessed text according to a first preset rule to obtain a plurality of time words comprises:
    loading a pre-built rule library, wherein the rule library consists of a plurality of recognition rules, and each recognition rule contains a plurality of recognition parameters; and
    filtering, from the preprocessed text, a plurality of the time words respectively corresponding to the recognition parameters of each of the recognition rules.
  18. The computer-readable storage medium according to claim 17, wherein the first time word carries time word attributes, the time word attributes comprise the recognition rule corresponding to the first time word and the arrangement position of the first time word in the input text, and the step of merging the first time words according to a second preset rule to obtain a plurality of second time words comprises:
    according to the order of the arrangement positions, sequentially selecting and merging a plurality of the first time words whose arrangement positions are continuous to obtain a first merged time word, and marking the first time words whose arrangement positions are not continuous as time words to be merged;
    according to the order of the arrangement positions, classifying the time words to be merged whose arrangement positions fall within a preset range into the same set, to obtain at least one first time word set;
    within the same first time word set, selecting the time words to be merged whose corresponding recognition rules have an association relationship, and performing a second merge on them to obtain a second merged time word; and
    taking the first merged time word and the second merged time word as the second time words.
  19. The computer-readable storage medium according to claim 18, wherein the arrangement position comprises a start position and an end position, and the step of sequentially selecting and merging, according to the order of the arrangement positions, a plurality of the first time words whose arrangement positions are continuous to obtain a first merged time word comprises:
    determining whether the end position of one of the first time words is adjacent to the start position of another of the first time words;
    if so, determining that the arrangement positions corresponding to the two first time words are continuous; and
    according to the order of the arrangement positions, traversing all of the first time words in sequence, and merging the first time words corresponding to the plurality of arrangement positions that are continuous, to obtain the first merged time word.
  20. The computer-readable storage medium according to claim 19, wherein the step of merging the first time words corresponding to the plurality of arrangement positions that are continuous to obtain the first merged time word comprises:
    merging the plurality of first time words in order of their respective arrangement positions to obtain the first merged time word.
PCT/CN2020/093111 2019-10-30 2020-05-29 Parsing method and apparatus for natural language time words, and computer device WO2021082424A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911045300.0 2019-10-30
CN201911045300.0A CN111027319A (en) 2019-10-30 2019-10-30 Method and device for analyzing natural language time words and computer equipment

Publications (1)

Publication Number Publication Date
WO2021082424A1 true WO2021082424A1 (en) 2021-05-06

Family

ID=70200542

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/093111 WO2021082424A1 (en) 2019-10-30 2020-05-29 Parsing method and apparatus for natural language time words, and computer device

Country Status (2)

Country Link
CN (1) CN111027319A (en)
WO (1) WO2021082424A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111027319A (en) * 2019-10-30 2020-04-17 平安科技(深圳)有限公司 Method and device for analyzing natural language time words and computer equipment
CN113988067B (en) * 2021-11-12 2024-06-25 北京嘉和海森健康科技有限公司 Sentence word segmentation method and device and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6122650A (en) * 1997-04-25 2000-09-19 Sanyo Electric Co., Ltd. Method and apparatus for updating time related data in a modified document
CN108829673A (en) * 2018-06-08 2018-11-16 北京玄科技有限公司 The abstracting method and device of time word
CN109885659A (en) * 2019-02-20 2019-06-14 安徽省泰岳祥升软件有限公司 The normalized method and device of temporal information in a kind of pair of text
CN111027319A (en) * 2019-10-30 2020-04-17 平安科技(深圳)有限公司 Method and device for analyzing natural language time words and computer equipment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107729314B (en) * 2017-09-29 2021-10-26 东软集团股份有限公司 Chinese time identification method and device, storage medium and program product
CN107894978B (en) * 2017-11-14 2021-04-09 鼎富智能科技有限公司 Time word extraction method and device
CN108549694B (en) * 2018-04-16 2021-11-23 南京云问网络技术有限公司 Method for processing time information in text
CN109190119B (en) * 2018-08-22 2020-11-10 腾讯科技(深圳)有限公司 Time extraction method and device, storage medium and electronic device

Also Published As

Publication number Publication date
CN111027319A (en) 2020-04-17

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20883207; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20883207; Country of ref document: EP; Kind code of ref document: A1)