WO2021082424A1

WO2021082424A1 - Parsing method and apparatus for natural language time words, and computer device

Info

Publication number: WO2021082424A1
Application number: PCT/CN2020/093111
Authority: WO
Inventors: 查月阅; 张骏
Original assignee: 平安科技（深圳）有限公司
Priority date: 2019-10-30
Filing date: 2020-05-29
Publication date: 2021-05-06
Also published as: CN111027319A

Abstract

The present application relates to the field of semantic parsing, and provides a parsing method and apparatus for natural language time words, a computer device, and a computer readable storage medium. The method comprises: obtaining an input text; removing preset characters in the input text to obtain a preprocessed text; performing word segmentation to obtain a plurality of time words; performing data encapsulation to obtain first time words corresponding to the time words; combining the first time words to obtain a plurality of second time words; and parsing the second time words to obtain time intervals corresponding to the second time words. According to the present application, corresponding time words are extracted from an input text by means of a plurality of recognition rules, then the time words are combined according to the arrangement positions of the time words in the input text and the association between the recognition rules, and finally the combined time words are parsed according to meanings to obtain corresponding time intervals, thereby implementing the parsing of all time words in natural language, and effectively improving the comprehensiveness and accuracy of time word recognition in the input text.

Description

Natural language time word parsing method, device and computer equipment

This application claims priority to Chinese patent application No. 201911045300.0, filed with the Chinese Patent Office on October 30, 2019 and entitled "Natural language time word parsing method, device and computer equipment", the entire content of which is incorporated herein by reference.

Technical field

This application relates to the technical field of semantic parsing, and in particular to a method, device and computer equipment for parsing natural language time words.

Background

When analyzing natural language, time information is an indispensable element for a complete analysis of its semantics. Existing methods for recognizing time information in natural language mainly rely on fixed rules that are matched against the text to extract time words, for example the time word "September 10th, 2018", which represents a date. The inventor realized that this recognition method requires the construction of a large number of rules. On the one hand, the rules are too complicated and rigid, which makes them hard for later developers to understand and modify; on the other hand, the time words extracted from the text by such fixed rules are not comprehensive enough, and the accuracy is low.

Technical problem

The main purpose of this application is to provide a natural language time word parsing method, device and computer equipment, aiming to overcome the shortcomings of existing time word parsing methods, which are too rigid and low in both accuracy and completeness.

Technical solutions

In order to achieve the above objectives, in the first aspect, this application provides a natural language time word parsing method, including:

Get the input text;

Remove preset characters in the input text to obtain preprocessed text;

Perform word segmentation on the preprocessed text according to the first preset rule to obtain several time words;

Data encapsulation of each of the time words to obtain the first time words corresponding to each of the time words;

Combine each of the first time words according to a second preset rule to obtain a number of second time words;

Analyze each of the second time words respectively to obtain the time intervals corresponding to each of the second time words.

In the second aspect, this application also provides a natural language time word parsing device, including:

An obtaining module, used to obtain input text;

A processing module for removing preset characters in the input text to obtain preprocessed text;

The word segmentation module is used to segment the preprocessed text according to the first preset rule to obtain several time words;

The encapsulation module is used for data encapsulation of each of the time words to obtain the first time words corresponding to each of the time words;

The merging module is used to merge each of the first time words according to a second preset rule to obtain a number of second time words;

The parsing module is used to analyze each of the second time words to obtain the time interval corresponding to each of the second time words.

In a third aspect, the present application also provides a computer device, including a memory and a processor, wherein the memory stores a computer program and the processor implements the natural language time word parsing method when executing the computer program. The natural language time word parsing method includes the following steps: obtaining input text; removing preset characters in the input text to obtain preprocessed text; performing word segmentation on the preprocessed text according to a first preset rule to obtain several time words; performing data encapsulation on each of the time words to obtain the first time word corresponding to each of the time words; combining the first time words according to a second preset rule to obtain a number of second time words; and parsing each of the second time words to obtain the time interval corresponding to each of the second time words.

In a fourth aspect, the present application also provides a computer-readable storage medium on which a computer program is stored, wherein the above natural language time word parsing method is implemented when the computer program is executed by a processor. The natural language time word parsing method includes the following steps: obtaining input text; removing preset characters in the input text to obtain preprocessed text; performing word segmentation on the preprocessed text according to a first preset rule to obtain several time words; performing data encapsulation on each of the time words to obtain the first time word corresponding to each of the time words; combining the first time words according to a second preset rule to obtain a number of second time words; and parsing each of the second time words to obtain the time interval corresponding to each of the second time words.

Beneficial effect

The natural language time word parsing method, device and computer equipment provided in this application first extract multiple time words from the input text through multiple pre-built recognition rules, then merge the time words according to the arrangement position of each time word in the input text and the associations between the recognition rules, and finally parse the merged time words according to their meanings to obtain the corresponding time intervals. This realizes the parsing of all time words in natural language and effectively improves the comprehensiveness and accuracy of time word recognition in the input text.

Description of the drawings

Figure 1 is a schematic diagram of the steps of a natural language time word parsing method in an embodiment of the present application;

Figure 2 is a block diagram of the overall structure of a natural language time word parsing device in an embodiment of the present application;

FIG. 3 is a schematic block diagram of the structure of a computer device according to an embodiment of the present application.

The realization, functional characteristics, and advantages of the purpose of this application will be further described in conjunction with the embodiments and with reference to the accompanying drawings.

The best mode of the present invention

In order to make the purpose, technical solutions, and advantages of this application clearer, the following further describes this application in detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application, and are not used to limit the present application.

Referring to Figure 1, an embodiment of the present application provides a natural language time word parsing method, including:

S1: Get the input text;

S2: Remove preset characters in the input text to obtain preprocessed text;

S3: Perform word segmentation on the preprocessed text according to the first preset rule to obtain several time words;

S4: Data encapsulation of each of the time words to obtain the first time word corresponding to each of the time words;

S5: Combine each of the first time words according to a second preset rule to obtain a number of second time words;

S6: Parse each of the second time words separately to obtain the time interval corresponding to each of the second time words.

In this embodiment, natural language refers to language as humans naturally express it, such as a segment of speech or text. If the parsing system receives voice information from the user, it first converts the voice information into text; in either case, the natural language input by the user is converted into text format to obtain the input text. The parsing system then preprocesses the input text: it identifies preset characters in the input text, for example by marking sensitive characters, and removes them to obtain the preprocessed text, which reduces the complexity of the subsequent word segmentation. For example, if the input text is "the net profit of the last month of the second quarter of 2018", the sensitive character "的" (the Chinese particle rendered as "of" in this translation) is marked and removed, and the preprocessed text is the same sentence without that particle. A rule library is pre-built in the parsing system and is expressed as regular expressions. The rule library is composed of multiple recognition rules; each recognition rule contains multiple different recognition parameters, and one recognition rule is used to recognize one type of time word. After loading the rule library, the parsing system calls each recognition rule in turn to segment the preprocessed text, thereby obtaining one or more time words corresponding to each recognition rule. For example, for the preprocessed text above, the recognition parameters of recognition rule A include "year", so the time word obtained by segmenting the preprocessed text with recognition rule A is "2018"; the recognition parameters of recognition rule B include "quarter", so the time word obtained with recognition rule B is "the second quarter".
The parsing system then encapsulates each time word obtained by segmentation so that all time words share a unified format, which yields the corresponding first time words. Each first time word carries time word attributes, which include the recognition rule corresponding to the first time word and the arrangement position of the first time word in the input text, for example: first time word: 2018; corresponding rule: recognition rule A; start position: 0; end position: 4. After the data encapsulation is completed, the parsing system first selects, according to the arrangement position of each first time word in the input text, two or more first time words whose arrangement positions are continuous and merges them to obtain first merged time words, and marks the first time words whose arrangement positions are not continuous as time words to be merged. It then groups the time words to be merged whose arrangement positions lie within a preset range of one another into first time word sets; that is, within one first time word set, the arrangement position of each time word to be merged is within the preset range of that of another time word to be merged. Within the same first time word set, the parsing system selects and merges the time words to be merged whose recognition rules have an association relationship, obtaining second merged time words. The parsing system takes the first merged time words and the second merged time words together as the second time words. Finally, the parsing system parses each second time word and obtains the corresponding time interval from the start time and end time corresponding to that second time word. For example, if the second time word is "2018", the corresponding time interval is 0:00 on January 1, 2018 to 24:00 on December 31, 2018.
Further, the parsing system outputs the time interval in a preset format; for example, the interval from 0:00 on January 1, 2018 to 24:00 on December 31, 2018 is output as: 2018-01-01-0:00——2018-12-31-24:00.
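The preprocessing step (S2) and the data encapsulation step (S4) described above can be illustrated with a minimal sketch. It is not taken from the application: the names `PRESET_CHARS`, `preprocess` and `FirstTimeWord` are hypothetical, the Chinese sample sentence is an assumed reconstruction of the translated example, and end positions are treated as inclusive, which matches the example of "2018" occupying positions 0 to 4 if the word includes the character "年".

```python
from dataclasses import dataclass

# Hypothetical set of preset ("sensitive") characters; the text only names "的".
PRESET_CHARS = {"的"}


def preprocess(text: str) -> str:
    """Step S2: drop every preset character from the input text."""
    return "".join(ch for ch in text if ch not in PRESET_CHARS)


@dataclass(frozen=True)
class FirstTimeWord:
    """Step S4: a time word in a unified format, carrying its attributes."""
    text: str   # the matched time word
    rule: str   # name of the recognition rule that matched it
    start: int  # index of the first character (inclusive)
    end: int    # index of the last character (inclusive)


# Assumed Chinese form of the translated example sentence.
cleaned = preprocess("2018年的第二季度的最后一个月的净利润")
word = FirstTimeWord(text="2018年", rule="A", start=0, end=4)
```

A frozen dataclass is used here only so that encapsulated time words are hashable and can later be placed in sets; any record type with the same fields would serve.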

Further, the step of segmenting the preprocessed text according to the first preset rule to obtain several time words includes:

S301: Load a pre-built rule library, where the rule library is composed of multiple identification rules, and a single identification rule contains multiple identification parameters;

S302: Filter from the pre-processed text to obtain a plurality of the time words corresponding to the recognition parameters of each recognition rule.

In this embodiment, a rule library is pre-built in the parsing system and is expressed as regular expressions. The rule library is composed of multiple recognition rules, and each recognition rule contains multiple recognition parameters. In addition to conventional recognition parameters such as "year" and "quarter" mentioned above, the rule library also includes special recognition parameters such as "before", "after", "current day" and "yesterday", which can be used to recognize time words such as "after 6 days". Using the recognition parameters in each recognition rule, the parsing system filters one or more time words out of the preprocessed text, which accomplishes the word segmentation of the preprocessed text. Time words filtered out by the same recognition rule belong to the same category and correspond to that recognition rule.
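Steps S301 and S302 can be sketched as follows, assuming a rule library held as a mapping from rule names to compiled regular expressions. The rule names, the English patterns and the function `segment` are illustrative stand-ins and are not the rules of the application; end positions are inclusive.

```python
import re

# Hypothetical rule library: rule name -> compiled pattern. The patterns are
# illustrative English stand-ins, not the actual rules of the application.
RULE_LIBRARY = {
    "year": re.compile(r"\b\d{4}\b"),
    "quarter": re.compile(r"\b(?:first|second|third|fourth) quarter\b"),
    "relative": re.compile(r"\b\d+ days (?:later|ago)\b|\b(?:today|yesterday)\b"),
}


def segment(text):
    """Steps S301-S302: apply every rule and collect (word, rule, start, end)."""
    found = []
    for rule_name, pattern in RULE_LIBRARY.items():
        for m in pattern.finditer(text):
            # m.end() is exclusive, so subtract 1 for an inclusive end position.
            found.append((m.group(), rule_name, m.start(), m.end() - 1))
    # Return the time words in the order in which they appear in the text.
    return sorted(found, key=lambda item: item[2])


words = segment("net profit 2018 second quarter, due 6 days later")
```

Each match keeps the name of the rule that produced it, which is the information the later merging steps rely on.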

Further, the first time word carries a time word attribute, the time word attribute includes the recognition rule corresponding to the first time word and the arrangement position of the first time word in the input text, and the step of combining each of the first time words according to the second preset rule to obtain several second time words includes:

S501: In the order of the arrangement positions, sequentially filter out and merge the first time words whose arrangement positions are continuous to obtain first merged time words, and mark the first time words whose arrangement positions are not continuous as time words to be merged;

S502: According to the sequence of the arrangement positions, respectively classify each of the time words to be merged whose arrangement positions are within a preset range into the same set to obtain at least one first time word set;

S503: In the same first time word set, filter each of the to-be-combined time words respectively corresponding to the recognition rules having an association relationship and perform a second merging to obtain a second combined time word;

S504: Use the first combined time word and the second combined time word as the second time word.

In this embodiment, the time word attributes carried by the first time word include the recognition rule corresponding to the first time word and the start position and end position of the first time word in the input text, that is, its arrangement position. The parsing system judges whether the end position of one first time word is adjacent to the start position of another first time word; if so, it determines that the arrangement positions of the two first time words are continuous. The parsing system filters out the first time words with continuous arrangement positions in this way and merges them, thereby obtaining one or more first merged time words. While merging by continuity, the parsing system may chain more than two first time words: for example, if first time word A is continuous with first time word B, and first time word B is continuous with first time word C, the parsing system can merge first time words A, B and C into a single first merged time word. The parsing system marks the first time words whose arrangement positions are not continuous as time words to be merged, so that they can be merged by another rule. Specifically, the parsing system compares the arrangement positions of the time words to be merged in pairs, in the order in which they appear in the input text, and places those whose arrangement positions are within a preset range in the same first time word set. For example, if the end position of time word to be merged A is 5, the start position of time word to be merged B is 8 and its end position is 10, the start position of time word to be merged C is 12, and the preset range is 3, then time words to be merged A, B and C are included in the same first time word set. Having formed one or more first time word sets in this manner, the parsing system then uses the association relationships pre-established between the recognition rules to select, within the same first time word set, the time words to be merged whose recognition rules are associated, and merges them to obtain second merged time words. The parsing system takes the first merged time words and the second merged time words together as the second time words.
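The grouping of time words to be merged into first time word sets (step S502) can be sketched as below. This is one possible reading of the example, under the assumption that the distance between two words is measured from the end position of the earlier word to the start position of the later one; the function name `group_by_range`, the start position of A, the end position of C and the extra word D are hypothetical.

```python
def group_by_range(pending, preset_range):
    """Step S502: chain pending words whose gap to the previous word is in range.

    Each word is a (text, start, end) tuple with an inclusive end position.
    """
    groups = []
    for word in sorted(pending, key=lambda w: w[1]):
        # Gap is measured from the previous end position to this start position.
        if groups and word[1] - groups[-1][-1][2] <= preset_range:
            groups[-1].append(word)
        else:
            groups.append([word])
    return groups


# A ends at 5, B spans 8 to 10, C starts at 12, preset range 3 (as in the text);
# D is an added far-away word that should form its own set.
groups = group_by_range(
    [("A", 2, 5), ("B", 8, 10), ("C", 12, 14), ("D", 30, 31)], preset_range=3
)
```

With these values A, B and C fall into one set because the gaps are 3 and 2, while D starts a second set.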

Further, the arrangement position includes a start position and an end position, and the step of sequentially filtering out and merging, in the order of the arrangement positions, the first time words whose arrangement positions are continuous to obtain the first merged time words includes:

S5011: Determine whether the end position of one of the first time words is adjacent to the start position of another first time word;

S5012: If it is adjacent to the start position of another first time word, determine that the arrangement positions corresponding to the two first time words have continuity;

S5013: In the order of the arrangement positions, traverse all the first time words in sequence, and merge the first time words whose arrangement positions have the continuity to obtain the first merged time words.

In this embodiment, the arrangement position of the first time word in the input text includes a start position and an end position. The parsing system judges whether the end position of one first time word is adjacent to the start position of another first time word. If it is, the parsing system determines that the arrangement positions of the two first time words have continuity. For example, the start position of first time word A is 3 and its end position is 6; the start position of first time word B is 7 and its end position is 9. Because the end position "6" of first time word A is adjacent to the start position "7" of first time word B, the system determines that the arrangement positions of first time word A and first time word B have continuity. The parsing system traverses all the first time words in the order of their arrangement positions in the input text, filters out the first time words whose arrangement positions have continuity according to the above judgment, and merges them in the order of their arrangement positions to obtain one or more first merged time words.
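Steps S5011 to S5013 can be sketched as a single pass over the first time words sorted by start position. This is an illustrative sketch only: the function names are hypothetical, "adjacent" is assumed to mean that a start position equals the previous end position plus one (as in the example with 6 and 7), and merging is assumed to be simple concatenation of the word texts.

```python
def merge_contiguous(words):
    """Steps S5011-S5013 on (text, start, end) tuples with inclusive ends.

    Returns (merged, pending): runs of two or more adjacent words joined into
    one first merged time word, and the isolated words left to be merged.
    """
    merged, pending = [], []
    run = []
    for word in sorted(words, key=lambda w: w[1]):
        # Adjacent means this word starts right after the previous one ends.
        if run and word[1] == run[-1][2] + 1:
            run.append(word)
            continue
        _flush(run, merged, pending)
        run = [word]
    _flush(run, merged, pending)
    return merged, pending


def _flush(run, merged, pending):
    """Close a run: join it if it has several words, else mark it pending."""
    if len(run) > 1:
        text = "".join(w[0] for w in run)
        merged.append((text, run[0][1], run[-1][2]))
    elif run:
        pending.append(run[0])


# A spans 3 to 6 and B spans 7 to 9 (as in the text); C is an added isolated word.
merged, pending = merge_contiguous(
    [("BBB", 7, 9), ("AAAA", 3, 6), ("CC", 15, 16)]
)
```

Because a run keeps growing while each next word is adjacent, chains such as A, B and C are merged into one word without any special handling.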

Further, the step of combining each of the first time words corresponding to the plurality of the arrangement positions having the continuity to obtain the first combined time word includes:

S50131: Combine a number of the first time words in order according to their corresponding arrangement positions to obtain the first combined time word.

In this embodiment, when the parsing system merges two or more first time words whose arrangement positions are continuous, it merges them in the order of their arrangement positions in the input text. Specifically, the parsing system can determine the relative order of two first time words in the input text by comparing their start positions or their end positions. For example, if the start position of first time word A is 5 and the start position of first time word B is 9, the start position of first time word A is smaller than that of first time word B, so first time word A is placed before first time word B. Because the input text is derived from natural language input by the user, the time words in it follow the logic and order of natural language. For example, in the original Chinese a speaker normally states the year before the month (rendered in English as "September 2018") and not the reverse. The parsing system therefore merges the first time words in the order of their arrangement positions to obtain the first merged time word.

Further, in the same first time word set, the step of screening each of the time words to be merged corresponding to the recognition rules having an association relationship and performing a second merging to obtain a second merged time word includes:

S5031: Classify each of the time words to be merged according to their corresponding recognition rules to obtain several second time word sets;

S5032: Combine the second time word sets corresponding to the recognition rules each having an association relationship to obtain several third time word sets;

S5033: Filter and merge at least two of the time words to be merged that are simultaneously included in the third time word set and the first time word set to obtain the second merged time word.

In this embodiment, the parsing system classifies each time word to be merged according to its corresponding recognition rule, thereby obtaining one or more second time word sets, where all time words to be merged in the same second time word set were filtered out by the same recognition rule. Association relationships between the recognition rules in the rule library are established in advance. For example, recognition rule A can recognize the time word "year", recognition rule B can recognize the time word "month", and recognition rule A and recognition rule B are associated with each other so that the time word "year" can subsequently be merged with the time word "month". The parsing system merges two or more second time word sets whose recognition rules have an association relationship to obtain one or more third time word sets. If two time words to be merged are contained in a third time word set and a first time word set at the same time, their arrangement positions are within the preset range and their respective recognition rules have an association relationship. Therefore, the parsing system only needs to filter out and merge at least two time words to be merged that are contained in both the first time word set and the third time word set to obtain a second merged time word whose parts are logically associated.
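Steps S5031 to S5033 can be sketched with ordinary set operations. The association table `ASSOCIATED_RULES`, the function `second_merge` and the sample words are hypothetical, and merging is again assumed to be concatenation of the word texts in positional order.

```python
from collections import defaultdict

# Hypothetical association table: groups of rule names whose words may be merged.
ASSOCIATED_RULES = [{"year", "month"}]


def second_merge(first_sets, pending):
    """Steps S5031-S5033 on pending words given as (text, rule, start, end)."""
    # S5031: one second time word set per recognition rule.
    second_sets = defaultdict(set)
    for word in pending:
        second_sets[word[1]].add(word)

    # S5032: union the second sets of associated rules into third sets.
    third_sets = [
        set().union(*(second_sets[rule] for rule in rules))
        for rules in ASSOCIATED_RULES
    ]

    # S5033: words shared by a first set and a third set are merged in order.
    merged = []
    for first in first_sets:
        for third in third_sets:
            common = sorted(first & third, key=lambda w: w[2])
            if len(common) >= 2:
                text = "".join(w[0] for w in common)
                merged.append((text, common[0][2], common[-1][3]))
    return merged


year = ("2018年", "year", 0, 4)
month = ("9月", "month", 7, 8)
day = ("3日", "day", 40, 41)
result = second_merge([{year, month}, {day}], [year, month, day])
```

The intersection `first & third` expresses both conditions at once: membership in a first set means the positions are within the preset range, and membership in a third set means the rules are associated.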

Further, the step of separately analyzing each of the second time words to obtain the time interval corresponding to each of the second time words includes:

S601: Determine whether the second time word belongs to a pre-built marked time word;

S602: If it does not belong to a pre-built marked time word, obtain the corresponding time interval according to the start time and end time of the second time word;

S603: If it belongs to a pre-built marked time word, acquire the current reference time point;

S604: Calculate the corresponding time interval according to the reference time point and the meaning of the second time word.

In this embodiment, the second time words obtained after merging take two forms. One form consists of time words with definite semantics, such as "2018", "August" and "the 9th"; the other consists of time words with ambiguous semantics, whose meaning depends on when they are used, such as "today", "the day after tomorrow" and "6 days later". Developers set time words of the latter kind as marked time words, and the two forms are handled differently during parsing. The parsing system first judges whether the second time word is a marked time word. If it is not, the corresponding time interval can be obtained directly from the start time and end time of the second time word. For example, if the second time word is "June 2018", the corresponding time interval is 0:00 on June 1, 2018 to 24:00 on June 30, 2018. In actual machine execution the interval is accurate to microseconds, which is not described in detail here. If the second time word is a marked time word, the parsing system needs to obtain the current reference time point. Specifically, the reference time point is obtained according to the time zone in which the user is currently located. The parsing system then calculates the corresponding time interval from the reference time point and the meaning of the second time word. For example, if the reference time point is June 24, 2018 and the second time word is "3 days later", the corresponding time interval is 0:00 on June 27, 2018 to 24:00 on June 27, 2018.
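Steps S601 to S604 can be sketched for two kinds of time word, a plain year and a marked word of the form "N days later". The pattern `DAYS_LATER` and the function `parse_interval` are hypothetical. Because Python's `datetime` cannot represent 24:00, the sketch assumes a half-open interval whose end is midnight of the following day, and it leaves out the time zone handling mentioned above.

```python
import re
from datetime import datetime, timedelta

# Hypothetical pattern for one kind of marked (reference-dependent) time word.
DAYS_LATER = re.compile(r"(\d+) days later")


def parse_interval(word, reference):
    """Steps S601-S604: return (start, end) as datetimes, end exclusive."""
    m = DAYS_LATER.fullmatch(word)
    if m:
        # Marked time word: offset the reference day by the stated number of days.
        day = datetime(reference.year, reference.month, reference.day)
        start = day + timedelta(days=int(m.group(1)))
        return start, start + timedelta(days=1)
    if re.fullmatch(r"\d{4}", word):
        # Plain year: the interval is fixed by the word itself.
        year = int(word)
        return datetime(year, 1, 1), datetime(year + 1, 1, 1)
    raise ValueError(f"unsupported time word: {word!r}")


relative = parse_interval("3 days later", datetime(2018, 6, 24, 15, 30))
absolute = parse_interval("2018", datetime(2018, 6, 24))
```

With a reference time point on June 24, 2018, "3 days later" resolves to the whole of June 27, 2018, which agrees with the example in the text.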

The method for parsing natural language time words provided by this embodiment first extracts multiple time words from the input text through a plurality of pre-built recognition rules, then merges the time words according to the arrangement position of each time word in the input text and the associations between the recognition rules, and finally parses the merged time words according to their meanings to obtain the corresponding time intervals. This realizes the parsing of all time words in natural language and effectively improves the comprehensiveness and accuracy of time word recognition in the input text.

Referring to Figure 2, an embodiment of the present application also provides a natural language time word parsing device, including:

Obtaining module 1, used to obtain input text;

The processing module 2 is used to remove preset characters in the input text to obtain preprocessed text;

The word segmentation module 3 is used to segment the preprocessed text according to the first preset rule to obtain several time words;

The encapsulation module 4 is used for data encapsulation of each of the time words to obtain the first time words corresponding to each of the time words;

The merging module 5 is used to merge each of the first time words according to a second preset rule to obtain a number of second time words;

The parsing module 6 is configured to analyze each of the second time words separately to obtain the time interval corresponding to each of the second time words.

Further, the parsing device further includes an output module, configured to output each of the time intervals to a display interface in a preset format.

In this embodiment, the implementation of the functions and roles of the obtaining module 1, the processing module 2, the word segmentation module 3, the encapsulation module 4, the merging module 5 and the parsing module 6 in the above natural language time word parsing device is detailed in the implementation of the corresponding steps S1 to S6 in the above natural language time word parsing method, and will not be repeated here.

Further, the word segmentation module 3 includes:

A loading sub-module for loading a pre-built rule library, where the rule library is composed of multiple identification rules, and a single identification rule contains multiple identification parameters;

The first screening sub-module is used for screening from the preprocessed text to obtain a plurality of the time words corresponding to the recognition parameters of the recognition rules.

In this embodiment, the implementation of the functions and roles of the loading sub-module and the first screening sub-module in the above natural language time word parsing device is detailed in the implementation of the corresponding steps S301 to S302 in the above natural language time word parsing method, and will not be repeated here.

Further, the first time word carries a time word attribute, the time word attribute includes the recognition rule corresponding to the first time word and the arrangement position of the first time word in the input text, and the merging module 5 includes:

The second screening sub-module is used to sequentially filter out and merge, in the order of the arrangement positions, the first time words whose arrangement positions are continuous to obtain first merged time words, and to mark the first time words whose arrangement positions are not continuous as time words to be merged;

The classification sub-module is configured to classify each of the time words to be merged with the arrangement positions within a preset range into the same set according to the sequence of the arrangement positions, to obtain at least one first time word set;

The merging sub-module is configured to filter each of the to-be-merged time words respectively corresponding to the recognition rules having an association relationship in the same first time word set to perform a second merging to obtain a second merged time word;

The marking sub-module is configured to use the first combined time word and the second combined time word as the second time word.

In this embodiment, the implementation of the functions and roles of the second screening sub-module, the classification sub-module, the merging sub-module and the marking sub-module in the above natural language time word parsing device is detailed in the implementation of the corresponding steps S501 to S504 in the above natural language time word parsing method, and will not be repeated here.

Further, the arrangement position includes a start position and an end position, and the second screening submodule includes:

A judging unit for judging whether the end position of one said first time word is adjacent to the beginning position of another said first time word;

A determining unit, configured to determine that the arrangement positions corresponding to the two first time words have continuity if it is adjacent to the start position of another first time word;

The traversal unit is configured to sequentially traverse all the first time words according to the sequence of the arrangement positions, and merge each of the first time words corresponding to the plurality of the arrangement positions with the continuity. , Get the first combined time word.

In this embodiment, for the implementation of the functions and roles of the judging unit, the determining unit, and the traversal unit in the above natural language time word parsing device, refer to the implementation of the corresponding steps S5011 to S5013 in the above natural language time word parsing method, which will not be repeated here.
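The adjacency judgment and traversal described above could, for example, be sketched as follows. The `TimeWord` class, its field names, and the convention that `end` is exclusive (so that two words are adjacent when one word's `end` equals the next word's `start`) are assumptions of this sketch, not definitions taken from this application.

```python
from dataclasses import dataclass


@dataclass
class TimeWord:
    text: str
    start: int  # index of the first character in the text
    end: int    # index one past the last character (assumed exclusive)
    rule: str   # name of the matching recognition rule (hypothetical)


def merge_continuous(words):
    """Return (merged, pending): runs of adjacent words are merged,
    isolated words are marked as time words to be merged."""
    ordered = sorted(words, key=lambda w: w.start)
    merged, pending = [], []
    i = 0
    while i < len(ordered):
        run = [ordered[i]]
        # Extend the run while the next word starts where the run ends.
        while i + 1 < len(ordered) and ordered[i + 1].start == run[-1].end:
            i += 1
            run.append(ordered[i])
        if len(run) > 1:
            merged.append(TimeWord(
                "".join(w.text for w in run),
                run[0].start,
                run[-1].end,
                "+".join(w.rule for w in run),
            ))
        else:
            pending.append(run[0])
        i += 1
    return merged, pending


words = [
    TimeWord("2019年", 0, 5, "year"),
    TimeWord("10月", 5, 8, "month"),
    TimeWord("下午", 12, 14, "day_period"),
]
merged, pending = merge_continuous(words)
```

Here the first two words are adjacent and are merged into one first merged time word, while the third is left as a time word to be merged.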

Further, the determining unit includes:

The merging subunit is configured to sequentially merge the first time words in the order of their respective arrangement positions to obtain the first merged time word.

In this embodiment, for the implementation of the functions and roles of the merging subunit in the above natural language time word parsing device, refer to the implementation of the corresponding step S50131 in the above natural language time word parsing method, which will not be repeated here.

Further, the merging sub-module includes:

The classification unit is configured to classify each of the time words to be merged according to the corresponding recognition rules to obtain a number of second time word sets;

The first merging unit is configured to merge the second time word sets corresponding to each of the recognition rules that have an association relationship to obtain a plurality of third time word sets;

The second merging unit is configured to select and merge at least two of the time words to be merged that are included in both the third time word set and the first time word set, to obtain the second merged time word.

In this embodiment, for the implementation of the functions and roles of the classification unit, the first merging unit, and the second merging unit in the above natural language time word parsing device, refer to the implementation of the corresponding steps S5031 to S5033 in the above natural language time word parsing method, which will not be repeated here.
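One possible way to realise the three units above is sketched below. The `Word` type, the rule names, and the representation of the association relationship as tuples of rule names are hypothetical choices made only for illustration.

```python
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    start: int
    end: int
    rule: str  # name of the recognition rule (hypothetical)


def second_merge(pending, first_sets, associations):
    """pending: all time words to be merged.
    first_sets: position-based groups of those words.
    associations: tuples of rule names assumed to be associated."""
    # Classification unit: one second time word set per recognition rule.
    second_sets = {}
    for w in pending:
        second_sets.setdefault(w.rule, set()).add(w)
    # First merging unit: union the second sets of associated rules.
    third_sets = []
    for group in associations:
        third = set()
        for rule in group:
            third |= second_sets.get(rule, set())
        third_sets.append(third)
    # Second merging unit: merge words present in both a third set and
    # a first set, provided at least two such words exist.
    merged = []
    for first in first_sets:
        for third in third_sets:
            common = sorted(third & set(first), key=lambda w: w.start)
            if len(common) >= 2:
                merged.append(Word(
                    "".join(w.text for w in common),
                    common[0].start,
                    common[-1].end,
                    "+".join(w.rule for w in common),
                ))
    return merged


a = Word("明天", 0, 2, "relative_day")
b = Word("下午", 4, 6, "day_period")
c = Word("周五", 30, 32, "weekday")
result = second_merge([a, b, c], [[a, b], [c]],
                      [("relative_day", "day_period")])
```

In this sketch only the first two words are merged, since they share a first time word set and their rules are listed as associated.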

Further, the analysis module 6 includes:

The judging sub-module is used to judge whether the second time word belongs to a pre-built marked time word;

The first calculation sub-module is configured to obtain the corresponding time interval according to the start time and end time of the second time word if it does not belong to a pre-built marked time word;

The acquiring sub-module is used to acquire the current reference time point if it belongs to a pre-built marked time word;

The second calculation submodule is configured to calculate the corresponding time interval according to the reference time point and the meaning of the second time word.

In this embodiment, for the implementation of the functions and roles of the judging sub-module, the first calculation sub-module, the acquiring sub-module, and the second calculation sub-module in the above natural language time word parsing device, refer to the implementation of the corresponding steps S601 to S604 in the above natural language time word parsing method, which will not be repeated here.
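The branch between marked and unmarked time words could be illustrated as follows. The table `MARKED`, its English entries, and the reading of a "marked time word" as a relative expression that is resolved against a reference time are assumptions of this sketch; the application itself does not enumerate the marked time words in this passage.

```python
from datetime import datetime, timedelta

# Hypothetical table of marked time words, mapped to day offsets
# relative to the reference time point.
MARKED = {"yesterday": -1, "today": 0, "tomorrow": 1}


def parse_interval(word, start=None, end=None, now=None):
    """Return a (start, end) interval for a second time word."""
    if word not in MARKED:
        # Unmarked word: the interval follows directly from the start
        # and end times already carried by the time word.
        return (start, end)
    # Marked word: obtain the reference time point, then apply the
    # meaning of the word (here, a whole-day offset).
    ref = now or datetime.now()
    day = datetime(ref.year, ref.month, ref.day) + timedelta(days=MARKED[word])
    return (day, day + timedelta(days=1))


reference = datetime(2019, 10, 30, 15, 0)
interval = parse_interval("tomorrow", now=reference)
```

Passing `now` explicitly keeps the sketch deterministic; a real system would presumably read the current time when no reference is supplied.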

The natural language time word parsing device provided in this embodiment first extracts multiple time words from the input text through multiple pre-built recognition rules, then merges the time words according to the arrangement position of each time word in the input text and the association between the recognition rules, and finally parses the merged time words according to their meanings to obtain the corresponding time intervals, thereby implementing the parsing of all time words in natural language and effectively improving the comprehensiveness and accuracy of time word recognition in the input text.
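The first stage summarised above (removing preset characters, then extracting time words with a rule library and encapsulating them with their attributes) could be sketched as follows. The contents of `RULES`, the use of regular expressions as recognition parameters, and the choice of whitespace as the preset characters are all assumptions for illustration. Positions in this sketch refer to the preprocessed text.

```python
import re

# Hypothetical rule library: each recognition rule holds several
# recognition parameters, expressed here as regular expressions.
RULES = {
    "year": [r"\d{4}年"],
    "month": [r"\d{1,2}月"],
    "relative_day": ["今天", "明天", "昨天"],
}


def preprocess(text, preset_chars=" \t\n"):
    """Remove the preset characters (assumed here to be whitespace)."""
    return "".join(ch for ch in text if ch not in preset_chars)


def extract(text):
    """Extract time words and encapsulate each one with its
    recognition rule and arrangement position."""
    found = []
    for rule, patterns in RULES.items():
        for pattern in patterns:
            for m in re.finditer(pattern, text):
                found.append({
                    "text": m.group(),
                    "start": m.start(),
                    "end": m.end(),  # exclusive
                    "rule": rule,
                })
    return sorted(found, key=lambda w: w["start"])


words = extract(preprocess("2019年 10月 明天"))
```

Each dictionary plays the role of a first time word carrying the time word attribute, which the later merging stages can consume.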

Referring to FIG. 3, an embodiment of the present application also provides a computer device. The computer device may be a server, and its internal structure may be as shown in FIG. 3. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device is used to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store data such as a rule library. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer program, when executed by the processor, implements the natural language time word parsing method in any of the above embodiments.

The foregoing processor executes the steps of the foregoing natural language time word parsing method:

S1: Get the input text;

S2: Remove preset characters in the input text to obtain preprocessed text;

An embodiment of the present application also provides a computer-readable storage medium. The storage medium may be a non-volatile storage medium or a volatile storage medium, on which a computer program is stored. When the computer program is executed by a processor, the natural language time word parsing method in any of the above embodiments is implemented, specifically as follows:

S1: Get the input text;

S2: Remove preset characters in the input text to obtain preprocessed text;

A person of ordinary skill in the art can understand that all or part of the processes in the above-mentioned embodiment methods can be implemented by instructing relevant hardware through a computer program. The computer program can be stored in a non-volatile computer-readable storage medium, and when the computer program is executed, it may include the procedures of the above-mentioned method embodiments. Any reference to memory, storage, database or other media provided in this application and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. As an illustration and not a limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.

The above are only preferred embodiments of this application and do not thereby limit the patent scope of this application. Any equivalent structure or equivalent process transformation made using the content of the description and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of this application.

Claims

A natural language time word parsing method, including:

Get the input text;

Remove preset characters in the input text to obtain preprocessed text;

Perform word segmentation on the preprocessed text according to the first preset rule to obtain several time words;

Data encapsulation of each of the time words to obtain the first time words corresponding to each of the time words;

Combine each of the first time words according to a second preset rule to obtain a number of second time words;

Analyze each of the second time words respectively to obtain the time intervals corresponding to each of the second time words.
The method for parsing natural language time words according to claim 1, wherein the step of segmenting the preprocessed text according to a first preset rule to obtain several time words comprises:

Loading a pre-built rule library, where the rule library is composed of multiple identification rules, and a single identification rule contains multiple identification parameters;

A number of the time words corresponding to the recognition parameters of each recognition rule are obtained by filtering from the preprocessed text.
The natural language time word parsing method according to claim 2, wherein the first time word carries a time word attribute, the time word attribute includes the recognition rule corresponding to the first time word and the arrangement position of the first time word in the input text, and the step of merging each of the first time words according to a second preset rule to obtain a plurality of second time words includes:

According to the sequence of the arrangement positions, sequentially selecting those first time words whose arrangement positions are continuous and merging them to obtain a first merged time word, and marking those first time words whose arrangement positions are not continuous as time words to be merged;

According to the sequence of the arrangement positions, classifying the time words to be merged whose arrangement positions are within a preset range into the same set to obtain at least one first time word set;

Within the same first time word set, selecting the time words to be merged whose corresponding recognition rules have an association relationship and performing a second merging on them to obtain a second merged time word;

Using the first merged time word and the second merged time word as the second time words.
The natural language time word parsing method according to claim 3, wherein the arrangement position includes a start position and an end position, and the step of sequentially selecting, according to the sequence of the arrangement positions, those first time words whose arrangement positions are continuous and merging them to obtain the first merged time word includes:

Judging whether the end position of one of the first time words is adjacent to the start position of another first time word;

If it is adjacent to the start position of the other first time word, determining that the arrangement positions corresponding to the two first time words have continuity;

According to the sequence of the arrangement positions, sequentially traversing all the first time words, and merging the first time words whose arrangement positions have the continuity to obtain the first merged time word.
The natural language time word parsing method according to claim 4, wherein the step of merging the first time words whose arrangement positions have the continuity to obtain the first merged time word includes:

Merging the first time words in sequence according to their respective arrangement positions to obtain the first merged time word.
The natural language time word parsing method according to claim 3, wherein the step of selecting, within the same first time word set, the time words to be merged whose corresponding recognition rules have an association relationship and performing a second merging on them to obtain the second merged time word includes:

Classifying each of the time words to be merged according to its corresponding recognition rule to obtain a number of second time word sets;

Merging the second time word sets corresponding to the recognition rules that have an association relationship to obtain a plurality of third time word sets;

Selecting and merging at least two of the time words to be merged that are included in both the third time word set and the first time word set, to obtain the second merged time word.
The natural language time word parsing method according to claim 1, wherein the step of parsing each of the second time words separately to obtain the time interval corresponding to each of the second time words respectively comprises:

Judging whether the second time word belongs to a pre-built marked time word;

If it does not belong to a pre-built marked time word, obtain the corresponding time interval according to the start time and end time of the second time word;

If it belongs to a pre-built marked time word, get the current reference time point;

The corresponding time interval is calculated according to the reference time point and the meaning of the second time word.
A natural language time word parsing device, including:

Get module, used to get input text;

A processing module for removing preset characters in the input text to obtain preprocessed text;

The word segmentation module is used to segment the preprocessed text according to the first preset rule to obtain several time words;

The encapsulation module is used for data encapsulation of each of the time words to obtain the first time words corresponding to each of the time words;

The merging module is used to merge each of the first time words according to a second preset rule to obtain a number of second time words;

The parsing module is used to analyze each of the second time words to obtain the time interval corresponding to each of the second time words.
The natural language time word parsing device according to claim 8, wherein the word segmentation module comprises:

A loading sub-module for loading a pre-built rule library, where the rule library is composed of multiple identification rules, and a single identification rule contains multiple identification parameters;

The first screening sub-module is used for screening from the preprocessed text to obtain a plurality of the time words corresponding to the recognition parameters of the recognition rules.
The natural language time word parsing device according to claim 8, wherein the first time word carries a time word attribute, the time word attribute includes the recognition rule corresponding to the first time word and the arrangement position of the first time word in the input text, and the merging module includes:

The second screening sub-module is configured to sequentially select, according to the sequence of the arrangement positions, those first time words whose arrangement positions are continuous and merge them to obtain a first merged time word, and to mark those first time words whose arrangement positions are not continuous as time words to be merged;

The classification sub-module is configured to classify the time words to be merged whose arrangement positions are within a preset range into the same set according to the sequence of the arrangement positions, to obtain at least one first time word set;

The merging sub-module is configured to select, within the same first time word set, the time words to be merged whose corresponding recognition rules have an association relationship, and to perform a second merging on them to obtain a second merged time word;

The marking sub-module is configured to use the first merged time word and the second merged time word as the second time words.
A computer device includes a memory and a processor, wherein a computer program is stored in the memory, and the processor implements a natural language time word parsing method when the computer program is executed;

Wherein, the method for parsing the natural language time word includes:

Get the input text;

Remove preset characters in the input text to obtain preprocessed text;

Perform word segmentation on the preprocessed text according to the first preset rule to obtain several time words;

Data encapsulation of each of the time words to obtain the first time words corresponding to each of the time words;

Combine each of the first time words according to a second preset rule to obtain a number of second time words;

Analyze each of the second time words respectively to obtain the time intervals corresponding to each of the second time words.
The computer device according to claim 11, wherein the step of segmenting the preprocessed text according to a first preset rule to obtain several time words comprises:

Loading a pre-built rule library, where the rule library is composed of multiple identification rules, and a single identification rule contains multiple identification parameters;

A number of the time words corresponding to the recognition parameters of each recognition rule are obtained by filtering from the preprocessed text.
The computer device according to claim 12, wherein the first time word carries a time word attribute, the time word attribute includes the recognition rule corresponding to the first time word and the arrangement position of the first time word in the input text, and the step of merging each of the first time words according to a second preset rule to obtain a plurality of second time words includes:

According to the sequence of the arrangement positions, sequentially selecting those first time words whose arrangement positions are continuous and merging them to obtain a first merged time word, and marking those first time words whose arrangement positions are not continuous as time words to be merged;

According to the sequence of the arrangement positions, classifying the time words to be merged whose arrangement positions are within a preset range into the same set to obtain at least one first time word set;

Within the same first time word set, selecting the time words to be merged whose corresponding recognition rules have an association relationship and performing a second merging on them to obtain a second merged time word;

Using the first merged time word and the second merged time word as the second time words.
The computer device according to claim 13, wherein the arrangement position includes a start position and an end position, and the step of sequentially selecting, according to the sequence of the arrangement positions, those first time words whose arrangement positions are continuous and merging them to obtain the first merged time word includes:

Judging whether the end position of one of the first time words is adjacent to the start position of another first time word;

If it is adjacent to the start position of the other first time word, determining that the arrangement positions corresponding to the two first time words have continuity;

According to the sequence of the arrangement positions, sequentially traversing all the first time words, and merging the first time words whose arrangement positions have the continuity to obtain the first merged time word.
The computer device according to claim 14, wherein the step of merging the first time words whose arrangement positions have the continuity to obtain the first merged time word comprises:

Merging the first time words in sequence according to their respective arrangement positions to obtain the first merged time word.
A computer-readable storage medium with a computer program stored thereon, which implements a natural language time word parsing method when the computer program is executed by a processor, wherein the natural language time word parsing method includes the following steps:

Get the input text;

Remove preset characters in the input text to obtain preprocessed text;

Perform word segmentation on the preprocessed text according to the first preset rule to obtain several time words;

Data encapsulation of each of the time words to obtain the first time words corresponding to each of the time words;

Combine each of the first time words according to a second preset rule to obtain a number of second time words;

Analyze each of the second time words respectively to obtain the time intervals corresponding to each of the second time words.
The computer-readable storage medium according to claim 16, wherein the step of segmenting the preprocessed text according to a first preset rule to obtain several time words comprises:

Loading a pre-built rule library, where the rule library is composed of multiple identification rules, and a single identification rule contains multiple identification parameters;

A number of the time words corresponding to the recognition parameters of each recognition rule are obtained by filtering from the preprocessed text.
The computer-readable storage medium according to claim 17, wherein the first time word carries a time word attribute, the time word attribute includes the recognition rule corresponding to the first time word and the arrangement position of the first time word in the input text, and the step of merging each of the first time words according to a second preset rule to obtain a plurality of second time words includes:

According to the sequence of the arrangement positions, sequentially selecting those first time words whose arrangement positions are continuous and merging them to obtain a first merged time word, and marking those first time words whose arrangement positions are not continuous as time words to be merged;

According to the sequence of the arrangement positions, classifying the time words to be merged whose arrangement positions are within a preset range into the same set to obtain at least one first time word set;

Within the same first time word set, selecting the time words to be merged whose corresponding recognition rules have an association relationship and performing a second merging on them to obtain a second merged time word;

Using the first merged time word and the second merged time word as the second time words.
The computer-readable storage medium according to claim 18, wherein the arrangement position includes a start position and an end position, and the step of sequentially selecting, according to the sequence of the arrangement positions, those first time words whose arrangement positions are continuous and merging them to obtain the first merged time word includes:

Judging whether the end position of one of the first time words is adjacent to the start position of another first time word;

If it is adjacent to the start position of the other first time word, determining that the arrangement positions corresponding to the two first time words have continuity;

According to the sequence of the arrangement positions, sequentially traversing all the first time words, and merging the first time words whose arrangement positions have the continuity to obtain the first merged time word.
The computer-readable storage medium according to claim 19, wherein the step of merging the first time words whose arrangement positions have the continuity to obtain the first merged time word includes:

Merging the first time words in sequence according to their respective arrangement positions to obtain the first merged time word.