CN111597302B - Text event acquisition method and device, electronic equipment and storage medium - Google Patents

Text event acquisition method and device, electronic equipment and storage medium

Info

Publication number
CN111597302B
Authority
CN
China
Prior art keywords
event
text
participles
configuration item
element configuration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010350403.4A
Other languages
Chinese (zh)
Other versions
CN111597302A (en
Inventor
岳重阳
冯少辉
李鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Iplus Teck Co ltd
Original Assignee
Beijing Iplus Teck Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Iplus Teck Co ltd
Priority to CN202010350403.4A
Publication of CN111597302A
Application granted
Publication of CN111597302B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

The invention provides a method and a device for acquiring a text event, electronic equipment and a storage medium. The method comprises the following steps: performing word segmentation and labeling on a text to obtain a plurality of word segments corresponding to the text, wherein each word segment corresponds to one label; acquiring, from the plurality of word segments and according to their labels, a plurality of target word segments associated with a plurality of element attributes in a preset DSL sequence, wherein the DSL sequence comprises a plurality of event element configuration items and each event element configuration item comprises an element attribute; and obtaining a text event from the plurality of target word segments. With this method, a customized domain-specific language (DSL) is established according to the information to be acquired from the text, and the required elements are then acquired from the text according to that language, so that the text event is obtained quickly and accurately.

Description

Text event acquisition method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of natural language processing, in particular to a text event acquisition method and device, electronic equipment and a storage medium.
Background
Event extraction acquires the necessary event element information from a text. For example, for the text "On January 1, 2020, Zhang San held a meeting with Li Si in Beijing.", the extracted event elements may include January 1, 2020, Zhang San, Beijing, Li Si, and meeting.
In the related art, event extraction methods include pattern matching and sequence labeling. Pattern matching uses rule templates to match sentence patterns closely related to events in the text; its drawback is that customizing and maintaining the rules is labor-intensive and the manual work is inefficient. Sequence labeling requires a large amount of data to be labeled manually in advance; a machine learning or deep learning algorithm then learns the correspondence between text sequences and labels in order to predict the labels of a text sequence. The large manual labeling effort also reduces working efficiency, and training the model is time-consuming and labor-intensive.
Disclosure of Invention
Based on the problems in the prior art, the invention provides a method and a device for acquiring a text event, an electronic device and a storage medium.
In a first aspect, the present invention provides a method for acquiring a text event, including: performing word segmentation and labeling on a text to obtain a plurality of word segments corresponding to the text, wherein each word segment corresponds to one label; acquiring, from the plurality of word segments and according to their labels, a plurality of target word segments associated with a plurality of element attributes in a preset DSL sequence, wherein the DSL sequence comprises a plurality of event element configuration items and each event element configuration item comprises an element attribute; and obtaining the text event from the target word segments.
According to this method for acquiring a text event, a customized domain-specific language is established according to the information to be acquired from the text, and the required elements are then acquired from the text according to that language, so that the text event is obtained quickly and accurately.
In some examples, the method further comprises: creating the DSL sequence according to the event elements to be acquired.
In some examples, creating the DSL sequence according to the event elements to be acquired comprises: determining an event type configuration item and the event element configuration items of the DSL sequence according to the event elements to be acquired, wherein each event element configuration item further comprises an element name corresponding to its element attribute; and applying a preset DSL grammar rule to create the DSL sequence from the event type configuration item and the plurality of event element configuration items.
In some examples, acquiring the plurality of target word segments associated with the plurality of element attributes in the preset DSL sequence comprises: obtaining, from the plurality of event element configuration items of the DSL sequence, a plurality of element extraction tasks in one-to-one correspondence with those configuration items; and, for the currently executed element extraction task among them, traversing the plurality of word segments in order so as to acquire, according to the labels of the word segments, the target word segment associated with that task.
In some examples, the plurality of element extraction tasks include a subject element extraction task, a trigger action element extraction task, an object element extraction task, a time element extraction task, and a place element extraction task, and acquiring the target word segment associated with the currently executed element extraction task comprises: if the currently executed task is a subject, trigger action, or object element extraction task, the target word segment is the word segment, matched from the plurality of word segments according to their labels, that is associated with the element attribute of the currently executed task; if the currently executed task is a time or place element extraction task, the target word segment is the word segment found among the plurality of word segments, according to their labels, using the search mode that corresponds to the element attribute of the currently executed task, wherein the search mode can be customized as required.
In a second aspect, the present invention further provides an apparatus for acquiring a text event, including: a preprocessing module, configured to perform word segmentation and labeling on a text to obtain a plurality of word segments corresponding to the text, wherein each word segment corresponds to one label; an obtaining module, configured to acquire, from the plurality of word segments and according to their labels, a plurality of target word segments associated with a plurality of element attributes in a preset DSL sequence, wherein the DSL sequence includes a plurality of event element configuration items and each event element configuration item includes an element attribute; and a text event determining module, configured to obtain the text event from the target word segments.
According to this apparatus for acquiring a text event, a customized domain-specific language is established according to the information to be acquired from the text, and the required elements are then acquired from the text according to that language, so that the text event is obtained quickly and accurately.
In some examples, the apparatus further comprises: a DSL sequence creating module, configured to determine an event type configuration item and the event element configuration items of the DSL sequence according to the event elements to be acquired, wherein each event element configuration item further includes an element name corresponding to its element attribute, and to create the DSL sequence from the event type configuration item and the event element configuration items by applying a preset DSL grammar rule.
In some examples, the obtaining module is configured to obtain, from the plurality of event element configuration items of the DSL sequence, a plurality of element extraction tasks in one-to-one correspondence with those configuration items, and, for the currently executed element extraction task among them, to traverse the plurality of word segments in order so as to acquire, according to the labels of the word segments, the target word segment associated with that task.
In a third aspect, the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the processor implements the method for acquiring a text event according to the first aspect.
In a fourth aspect, the present invention also provides a non-transitory computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the method for acquiring a text event according to the first aspect.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a flowchart of a method for acquiring a text event according to an embodiment of the present invention;
Fig. 2 is a structural diagram of a DSL sequence in a text event acquisition method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram illustrating interconversion between DSL sequences and semantics in a text event acquisition method according to an embodiment of the present invention;
Fig. 4 is a schematic diagram illustrating the conversion of a DSL sequence into an extraction task set in a method for acquiring a text event according to an embodiment of the present invention;
Fig. 5 is a schematic diagram illustrating the correspondence between word segments and attributes of a text in the method for acquiring a text event according to an embodiment of the present invention;
Fig. 6 is a detailed flowchart of a method for acquiring a text event according to an embodiment of the present invention;
Fig. 7 is a block diagram of an apparatus for acquiring text events according to an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The following further describes embodiments of the present invention with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
Fig. 1 shows a flowchart of a method for acquiring a text event according to an embodiment of the present invention, and as shown in fig. 1, the method for acquiring a text event according to an embodiment of the present invention includes the following steps:
S101: performing word segmentation and labeling on the text to obtain a plurality of word segments corresponding to the text, wherein each word segment corresponds to one label.
For example, take a piece of text about a meeting or visit: "On January 1, 2020, Zhang San held a meeting with Li Si in Beijing." After word segmentation, the resulting word segments include "January 1, 2020", "Zhang San", "in", "Beijing", "with", "Li Si", "held", a perfective particle, "meeting", and the final "."; together these form a word segment sequence.
In a specific example, labeling the text includes part-of-speech tagging and named entity recognition. Part-of-speech tagging yields a part-of-speech tag for each word segment of the text. For the example sentence above, the resulting part-of-speech tags include t, w, nr, v, ns, p, nr, v, u, v, w, and together they form a part-of-speech tag sequence.
Named entity recognition yields a named entity tag for each word segment of the text. For the example sentence above, the resulting named entity tags include TIM, O, PER, O, LOC, O, PER, O, and so on, and together they form a named entity tag sequence.
That is, because labeling the text comprises part-of-speech tagging and named entity recognition, the label of a word segment comprises a part-of-speech tag and a named entity tag.
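The output of S101 can be pictured as three aligned sequences. The sketch below is only an illustration under stated assumptions: the `Token` class and `build_tokens` helper are hypothetical names, the words are English stand-ins for the original segments, and the named entity tags past the eighth position are assumed to be "O" because the description abbreviates them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    word: str  # word segment (English stand-in for the original segment)
    pos: str   # part-of-speech tag, lowercase
    ner: str   # named entity tag, uppercase; "O" means not a named entity


# Segments and tags of the example sentence; "<particle>" stands for the
# particle tagged "u". NER tags after position 8 are assumed to be "O".
WORDS = ["January 1, 2020", ",", "Zhang San", "in", "Beijing", "with",
         "Li Si", "held", "<particle>", "meeting", "."]
POS_TAGS = ["t", "w", "nr", "v", "ns", "p", "nr", "v", "u", "v", "w"]
NER_TAGS = ["TIM", "O", "PER", "O", "LOC", "O", "PER", "O", "O", "O", "O"]


def build_tokens(words, pos_tags, ner_tags):
    """Align each word segment with its part-of-speech and named entity tag."""
    if not (len(words) == len(pos_tags) == len(ner_tags)):
        raise ValueError("the three sequences must have the same length")
    return [Token(w, p, n) for w, p, n in zip(words, pos_tags, ner_tags)]


TOKENS = build_tokens(WORDS, POS_TAGS, NER_TAGS)
```

Keeping both tags on one record means a later matching step can test either tag without re-aligning the sequences.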
S102: acquiring, from the plurality of word segments and according to their labels, a plurality of target word segments associated with a plurality of element attributes in a preset DSL sequence, wherein the DSL sequence comprises a plurality of event element configuration items and each event element configuration item comprises an element attribute.
DSL stands for domain-specific language; here it may be an application language customized for a specific domain on the basis of an existing programming language.
In an embodiment of the present invention, the DSL sequence may be written in advance in this application language: the required DSL sequence is created according to the event elements to be acquired, and this is the preset DSL sequence.
For example, for texts that follow a fairly regular pattern, such as a class of texts of the same or similar form, the keywords of interest are determined from the text in advance, and the event elements to be acquired are then determined from those keywords. For the meeting or visit text above, the 5 determined keywords are January 1, 2020, Zhang San, Beijing, Li Si, and meeting; based on these 5 keywords, or on keywords similar to them, 5 event element configuration items are determined: subject, trigger action, object, time and place.
Of course, the DSL sequence may also include an event type configuration item that gives the type of the text event to be obtained. Fig. 2 shows the structure of a DSL sequence including 1 event type configuration item and 5 event element configuration items. The first column represents the event type configuration item, and the second to sixth columns represent the event element configuration items. For each configuration item, the first row gives the name: from left to right, the event type element name, subject element name, trigger action element name, object element name, time element name, and place element name. The second row gives the attribute corresponding to each element, from left to right. For example, the attribute of the event type element may be the content of the event type, the attribute of the subject element may be a part-of-speech tag or a named entity tag of the subject, the attribute of the time element may be a position identifier for the time information, and the attribute of the place element may be a position identifier for the place information.
As shown in fig. 2, the element names are expressed by letter combinations, for example, the event type element name is defined as "typ", the subject element name is defined as "sub", the trigger action element name is defined as "act", the object element name is defined as "obj", the time element name is defined as "tim", and the place element name is defined as "loc".
Part-of-speech tags are used to distinguish the part-of-speech to which the current vocabulary belongs, including but not limited to "nr" (names of people), "v" (verbs), "a" (adjectives), etc.
Named entity tags are used to identify which words belong to proper nouns, including but not limited to "PER" (person name), "LOC" (place name), "ORG" (organizational name), "O" (non-named entity), etc.
The action identifier of the trigger action is represented by a word or a phrase, and is used for distinguishing which type of action the verb corresponding to the trigger action belongs to, including but not limited to identifiers such as "goto" (travel), "speak" (expression), "meet" (meet-up).
The position identifier is a character string that specifies how the time and place elements are searched for in the word sequence formed by the word segments. The search mode can be customized according to actual business needs and includes, but is not limited to, "here" (take the information at the current position as the information sought), "lsearch" (search forward from the matched start position), "rsearch" (search backward from the matched end position), and "hereto" (take the stretch of text from the current position up to a specified keyword).
Based on the above-mentioned DSL syntax rules, creating a DSL sequence according to the event elements to be obtained, including: determining an event type configuration item and a plurality of event element configuration items of a DSL sequence according to an event element required to be obtained, wherein the event element configuration item also comprises element names respectively corresponding to a plurality of element attributes; and applying a preset DSL grammar rule to create a DSL sequence according to the event type configuration item and the plurality of event element configuration items.
The expression of a created DSL sequence may be passed to a parser for syntax checking; the DSL sequence may also be produced by the parser.
As shown in Fig. 3, the DSL sequence and its semantics can be converted into each other by the parser's deserialization and serialization operations. Specifically, the parser checks the validity of the DSL expression, parses its semantics, and finally produces a unified structure and format; this is the deserialization process. The parser may also support serialization, that is, conversion from the processed data back to a language expression: the serialization process converts data into a byte sequence, for example into a DSL sequence.
Because the syntax of the DSL is defined by rules, a DSL sequence created under that syntax may be represented in, but is not limited to, plain text, JSON, or XML. For example, take the visit ("access") event in the text "On January 1, 2020, Zhang San held a meeting with Li Si in Beijing.": the subject is "Zhang San", with part-of-speech tag "nr" and named entity tag "PER"; the trigger action is "meeting", with action identifier "meet"; the object is "Li Si", with part-of-speech tag "nr" and named entity tag "PER"; the time is "January 1, 2020", with position identifier "lsearch"; and the place is "Beijing", with position identifier "here".
The grammar rules are defined as follows:
(1) every identifier must belong to its own finite closed set; an identifier that cannot be found in that set is illegal;
(2) the event type and all 5 elements must be configured; otherwise the sequence is illegal;
(3) each element consists of an element name and an element attribute (a mark or an identifier), separated by a slash ("/"); otherwise the element is illegal;
(4) apart from plaintext keywords, a mark or identifier must be one of a part-of-speech type, a named entity type, an action identifier, or a position identifier;
(5) at least 1 mark must be given, and multiple marks are separated by commas; otherwise the element is illegal;
(6) a part-of-speech mark is written entirely in lowercase letters and a named entity mark entirely in uppercase letters; a mark that cannot be found in its closed set is illegal.
Thus, in plain-text format, the DSL sequence is expressed as:
typ/access_sub/PER,ORG_act/meet_loc/here_obj/PER,ORG_tim/lsearch
Here, adjacent elements are separated by at least 1 consecutive underscore; otherwise the sequence is illegal. Apart from the leading event type, each element is written with exactly 1 slash in the form element/mark; otherwise it is illegal.
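The grammar rules and the plain-text layout can be illustrated with a small parser. This is a minimal sketch, not the patented parser: `parse_dsl`, `serialize_dsl`, and `DslError` are hypothetical names, the closed sets contain only the tags and identifiers named in this description, the name and attribute are assumed to be joined by "/", and plaintext keywords (rule 4) are not handled.

```python
import re

# Closed sets (rule 1); members are limited to those named in the description.
POS_TAGS = {"nr", "v", "a"}
NER_TAGS = {"PER", "LOC", "ORG", "O"}
ACTION_IDS = {"goto", "speak", "meet"}
POSITION_IDS = {"here", "lsearch", "rsearch", "hereto"}
REQUIRED = ("typ", "sub", "act", "obj", "tim", "loc")


class DslError(ValueError):
    """Raised when a DSL expression breaks one of the grammar rules."""


def _check_attribute(name, attr):
    if name == "typ":
        ok = True  # event type content is treated as free text in this sketch
    elif name == "act":
        ok = attr in ACTION_IDS
    elif name in ("tim", "loc"):
        ok = attr in POSITION_IDS
    else:  # "sub" or "obj": lowercase POS tag or uppercase NER tag (rule 6)
        ok = (attr.islower() and attr in POS_TAGS) or \
             (attr.isupper() and attr in NER_TAGS)
    if not ok:
        raise DslError(f"illegal attribute {attr!r} for element {name!r}")


def parse_dsl(expr):
    """Deserialize a plain-text DSL sequence into {element name: [attributes]}."""
    config = {}
    for item in filter(None, re.split(r"_+", expr.strip())):
        name, sep, attr_text = item.partition("/")
        if not sep or "/" in attr_text:          # rule 3
            raise DslError(f"{item!r} is not in name/attribute form")
        if name not in REQUIRED or name in config:
            raise DslError(f"unknown or repeated element name {name!r}")
        attrs = attr_text.split(",")
        if not all(attrs):                       # rule 5
            raise DslError(f"element {name!r} has an empty attribute")
        for attr in attrs:
            _check_attribute(name, attr)
        config[name] = attrs
    missing = [n for n in REQUIRED if n not in config]
    if missing:                                  # rule 2
        raise DslError(f"missing configuration items: {missing}")
    return config


def serialize_dsl(config):
    """Serialize a parsed configuration back into the plain-text form."""
    return "_".join(f"{name}/{','.join(attrs)}" for name, attrs in config.items())
```

Under these assumptions, parsing the example sequence and serializing the result gives back the same string, which mirrors the deserialization and serialization roles the description assigns to the parser.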
In JSON format, the DSL sequence can be converted into:
Figure BDA0002471624070000091
The result must conform to JSON syntax; otherwise it is illegal.
In XML format, the DSL sequence can be converted into:
Figure BDA0002471624070000092
The result must conform to XML syntax; otherwise it is illegal.
Once the preset DSL sequence is available, the target word segments associated with the element attributes in the preset DSL sequence can be acquired from the word segments according to their labels. As a specific example, a plurality of element extraction tasks in one-to-one correspondence with the event element configuration items of the DSL sequence are obtained; then, for the currently executed element extraction task, the word segments are traversed in order so as to acquire the target word segment associated with that task.
Taking a DSL sequence with 1 event type configuration item and 5 event element configuration items as an example, the element extraction tasks comprise 5 extraction tasks in one-to-one correspondence with the 5 event element configuration items: a subject element extraction task, a trigger action element extraction task, an object element extraction task, a time element extraction task, and a place element extraction task. In this example, acquiring from the word segments, according to their labels, the target word segment associated with the currently executed element extraction task comprises:
if the currently executed element extraction task is a subject, trigger action, or object element extraction task, the target word segment is the word segment, matched from the plurality of word segments according to their labels, that is associated with the element attribute of the currently executed task;
if the currently executed element extraction task is a time or place element extraction task, the target word segment is the word segment found among the plurality of word segments, according to their labels, using the search mode that corresponds to the element attribute of the currently executed task.
It should be noted that the search mode can be customized as required. For example, a custom search mode may be used to find a text passage made up of several word segments, or to match subject or object elements.
For example, as shown in Fig. 4, a DSL sequence including 1 event type configuration item and 5 event element configuration items is split into a set of extraction tasks. In the set, the first row represents the event type information, corresponding to the event type configuration item, and the second to sixth rows are the subject, trigger action, object, time, and place element extraction tasks, respectively.
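The conversion from configuration items to an extraction task set might look like the following. It is a sketch under assumptions: `ExtractionTask` and `build_task_set` are hypothetical names, the input is a plain dictionary from element name to attribute list, and the "match"/"search" split follows the distinction the description draws between subject, trigger action, and object tasks on one side and time and place tasks on the other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractionTask:
    element: str       # "sub", "act", "obj", "tim" or "loc"
    mode: str          # "match" (by label) or "search" (by position identifier)
    attributes: tuple  # element attributes taken from the configuration item


# Row order of the task set as described for Fig. 4.
TASK_ORDER = ("sub", "act", "obj", "tim", "loc")
SEARCH_ELEMENTS = {"tim", "loc"}


def build_task_set(config):
    """Return (event type, list of extraction tasks), one task per element item."""
    event_type = config["typ"][0]
    tasks = [
        ExtractionTask(
            element=name,
            mode="search" if name in SEARCH_ELEMENTS else "match",
            attributes=tuple(config[name]),
        )
        for name in TASK_ORDER
    ]
    return event_type, tasks


# Configuration corresponding to the plain-text example sequence.
CONFIG = {
    "typ": ["access"], "sub": ["PER", "ORG"], "act": ["meet"],
    "loc": ["here"], "obj": ["PER", "ORG"], "tim": ["lsearch"],
}
EVENT_TYPE, TASKS = build_task_set(CONFIG)
```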
After the extraction tasks are obtained, matching tasks such as those for the subject, trigger action, and object elements scan the input word sequence (the sequence formed by the word segments) in order. If the attribute of the current word segment matches the attribute in the corresponding extraction task, that word segment is taken as the target word segment; otherwise scanning continues to the end of the word sequence. If no match is found, the word sequence is judged not to contain the event described by the DSL sequence.
Each extraction task performs its acquisition separately: once the current extraction task has been matched, processing switches to the next extraction task.
It should be noted that, after all extraction tasks have been executed, the text event can be determined only if at least the word segments corresponding to the subject element and the trigger action element have been matched; otherwise no corresponding text event can be obtained from the text.
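The sequential scan can be sketched as follows. The function names are hypothetical, tokens are `(word, part-of-speech tag, named entity tag)` tuples with English stand-ins for the original segments, and one point is an assumption rather than something the description states: a word segment already taken by an earlier task is skipped, so that subject and object tasks with identical attributes do not select the same segment.

```python
# (word, part-of-speech tag, named entity tag) for the example sentence.
TOKENS = [
    ("January 1, 2020", "t", "TIM"), (",", "w", "O"), ("Zhang San", "nr", "PER"),
    ("in", "v", "O"), ("Beijing", "ns", "LOC"), ("with", "p", "O"),
    ("Li Si", "nr", "PER"), ("held", "v", "O"), ("<particle>", "u", "O"),
    ("meeting", "v", "O"), (".", "w", "O"),
]


def match_element(tokens, attributes, taken=frozenset()):
    """Scan in order; return the index of the first free token whose
    part-of-speech or named entity tag is among the attributes, else None."""
    for i, (_word, pos, ner) in enumerate(tokens):
        if i in taken:
            continue  # assumption: one word segment serves one element only
        if pos in attributes or ner in attributes:
            return i
    return None


def run_match_tasks(tokens, tasks):
    """Run (element name, attributes) tasks one after another."""
    found, taken = {}, set()
    for name, attributes in tasks:
        index = match_element(tokens, attributes, taken)
        found[name] = index
        if index is not None:
            taken.add(index)
    return found


def event_detected(found):
    """An event needs at least a matched subject and a matched trigger action."""
    return found.get("sub") is not None and found.get("act") is not None


FOUND = run_match_tasks(TOKENS, [("sub", {"PER", "ORG"}), ("obj", {"PER", "ORG"})])
```

Here the subject task selects "Zhang San" and the object task, scanning the same sequence, selects "Li Si". Trigger actions are not handled by tag matching in this sketch, because their attribute is an action identifier rather than a tag.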
Time elements and place elements, as well as subject or object elements that must be searched for because of special requirements, are looked up in the word sequence according to the search mode expressed by the attribute in the corresponding extraction task.
For example, in the mode that takes the word segment at the current position, the word segment matched at the current position is taken directly. In the forward or backward search modes, the word sequence is scanned in the specified direction, starting from the current word segment, to determine whether a corresponding word segment can be acquired. The mode that searches from the current position to a specified keyword is customized for specific business requirements; its main idea is to take the position of the current word segment as the scan start and a designated plaintext string as the scan end, and, if the end position is matched during the scan, to take that stretch of text as the result for the element. Of course, further search modes can be customized as required.
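The four search modes named in the description could be sketched as below. Several points are assumptions and not stated in the source: "lsearch" is read as scanning toward the beginning of the sequence and "rsearch" toward the end, the searches look for a named entity tag supplied by the caller (such as "TIM" or "LOC"), and "hereto" excludes the end keyword from its result. Tokens are `(word, part-of-speech tag, named entity tag)` tuples with English stand-ins for the original segments.

```python
TOKENS = [
    ("January 1, 2020", "t", "TIM"), (",", "w", "O"), ("Zhang San", "nr", "PER"),
    ("in", "v", "O"), ("Beijing", "ns", "LOC"), ("with", "p", "O"),
    ("Li Si", "nr", "PER"), ("held", "v", "O"), ("<particle>", "u", "O"),
    ("meeting", "v", "O"), (".", "w", "O"),
]


def here(tokens, anchor):
    """Take the word segment at the current position."""
    return anchor if 0 <= anchor < len(tokens) else None


def lsearch(tokens, anchor, label):
    """Scan from the anchor toward the start of the sequence for a tag."""
    for i in range(anchor - 1, -1, -1):
        if tokens[i][2] == label:
            return i
    return None


def rsearch(tokens, anchor, label):
    """Scan from the anchor toward the end of the sequence for a tag."""
    for i in range(anchor + 1, len(tokens)):
        if tokens[i][2] == label:
            return i
    return None


def hereto(tokens, anchor, keyword):
    """Collect the words from the anchor up to (not including) the keyword;
    return None when the keyword never appears."""
    for i in range(anchor, len(tokens)):
        if tokens[i][0] == keyword:
            return [word for word, _pos, _ner in tokens[anchor:i]]
    return None


SUBJECT_INDEX = 2  # position of "Zhang San"
TIME_INDEX = lsearch(TOKENS, SUBJECT_INDEX, "TIM")
PLACE_INDEX = rsearch(TOKENS, SUBJECT_INDEX, "LOC")
```

With the subject as anchor, the leftward scan reaches "January 1, 2020" and the rightward scan reaches "Beijing". Returning `None` on failure lets the caller decide whether a missing time or place still leaves a valid event.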
In the above example, after word segmentation, part-of-speech tagging, and named entity recognition, the resulting word segments, part-of-speech tags, and named entity tags may be preprocessed in advance so that each word segment is aligned with its part-of-speech tag and/or named entity tag. As shown in Fig. 5, word 1, word 2, word 3, and so on correspond to part-of-speech tag 1, part-of-speech tag 2, part-of-speech tag 3, and so on, and may likewise correspond to named entity tag 1, named entity tag 2, named entity tag 3, and so on. This alignment improves accuracy in the subsequent matching.
In addition, to improve the matching success rate, the verbs corresponding to each trigger action identifier, together with their synonyms, can be prepared in advance. This enlarges the set of candidate matches for verbs in the word sequence and thereby raises the matching success rate.
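A prepared verb and synonym table for trigger action identifiers might be used as in the sketch below. The lexicon entries are invented English examples for illustration only, and `ACTION_LEXICON` and `match_action` are hypothetical names; the description does not list the actual verbs.

```python
# Hypothetical lexicon: action identifier -> verbs and synonyms (illustrative).
ACTION_LEXICON = {
    "meet": {"meeting", "met", "visited"},
    "goto": {"went", "traveled"},
    "speak": {"said", "stated"},
}

# (word, part-of-speech tag, named entity tag) for the example sentence.
TOKENS = [
    ("January 1, 2020", "t", "TIM"), (",", "w", "O"), ("Zhang San", "nr", "PER"),
    ("in", "v", "O"), ("Beijing", "ns", "LOC"), ("with", "p", "O"),
    ("Li Si", "nr", "PER"), ("held", "v", "O"), ("<particle>", "u", "O"),
    ("meeting", "v", "O"), (".", "w", "O"),
]


def match_action(tokens, action_id, lexicon=ACTION_LEXICON):
    """Return the index of the first verb listed for the action identifier."""
    verbs = lexicon.get(action_id, set())
    for i, (word, pos, _ner) in enumerate(tokens):
        if pos == "v" and word in verbs:
            return i
    return None


ACTION_INDEX = match_action(TOKENS, "meet")
```

Adding a synonym to the set for an identifier widens what the trigger action task can match without touching the DSL sequence itself.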
S103: and obtaining a text event according to the plurality of target word segmentation.
Through the above S101 to S103, the corresponding text event can be determined from the text. As shown in fig. 6, the above-mentioned overall execution process of S101 to S103 is to read the current word sequence and the corresponding part-of-speech tag, named entity tag, etc; reading the expression of the DSL sequence, and analyzing the expression into an event element extraction task set; then, extracting a task for each event element, and matching a plurality of participles according to the attributes of the elements, namely: matching the elements; and finally, if time and place elements or subject elements, object elements and the like which need to be searched due to special requirements are searched, searching according to a specified searching mode. And finally, after the target word segmentation corresponding to the extraction task is obtained, the target word segmentation can be formed into a text event.
For example, suppose the participles obtained are: January 1, 2020; Zhang San; Beijing; Li Si; and "meet". Since the event type configuration item of the DSL sequence is the visit type, the text event composed of these participles can be understood as: on January 1, 2020, Zhang San met with (visited) Li Si in Beijing.
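Assembling the target participles into a text event can be sketched as a simple record. The class `TextEvent`, its field names and the output format of `describe` are illustrative assumptions; the patent does not prescribe a data structure for the event.

```python
from dataclasses import dataclass


@dataclass
class TextEvent:
    """A text event assembled from the target participles (hypothetical layout)."""
    event_type: str
    subject: str
    trigger: str
    obj: str
    time: str
    place: str

    def describe(self) -> str:
        """Render the event as a single readable sentence."""
        return (f"[{self.event_type}] {self.subject} {self.trigger} "
                f"{self.obj} in {self.place} on {self.time}.")


event = TextEvent(
    event_type="visit",
    subject="Zhang San",
    trigger="met with",
    obj="Li Si",
    time="January 1, 2020",
    place="Beijing",
)
summary = event.describe()
```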
According to the method for acquiring a text event of the embodiment of the invention, a customized domain-specific language is established according to the information to be acquired from the text, and the required elements are then acquired from the text according to the domain-specific language, so that the text event is acquired quickly and accurately.
Fig. 7 is a block diagram illustrating an apparatus for acquiring a text event according to an embodiment of the present invention. As shown in fig. 7, an apparatus for acquiring a text event according to an embodiment of the present invention includes: a preprocessing module 710, an acquisition module 720, and a text event determination module 730.
The preprocessing module 710 is configured to perform word segmentation and tagging on a text, where the tagging includes, for example, part-of-speech tagging and named entity tagging, so as to obtain a plurality of participles corresponding to the text, where each participle corresponds to a tag. The obtaining module 720 is configured to obtain, from the plurality of participles and according to the tags of the participles, a plurality of target participles associated with a plurality of element attributes in a preset DSL sequence, where the DSL sequence includes a plurality of event element configuration items, and the event element configuration items include the element attributes. The text event determining module 730 is configured to obtain a text event according to the plurality of target participles.
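The division into three modules can be pictured as three callables chained together. The following Python sketch is an assumption-laden illustration: the class `TextEventDevice` and the three stub functions are hypothetical stand-ins, and the stubs use a deliberately trivial rule (capitalized words are treated as person entities) rather than real segmentation or tagging.

```python
from typing import Callable, Dict, List, Tuple

TaggedWord = Tuple[str, str]  # (participle, tag)


class TextEventDevice:
    """Chains a preprocessing step, an obtaining step and an event determination step."""

    def __init__(self,
                 preprocess: Callable[[str], List[TaggedWord]],
                 obtain: Callable[[List[TaggedWord]], List[str]],
                 determine: Callable[[List[str]], Dict[str, str]]) -> None:
        self.preprocess = preprocess
        self.obtain = obtain
        self.determine = determine

    def run(self, text: str) -> Dict[str, str]:
        tagged = self.preprocess(text)    # role of module 710
        targets = self.obtain(tagged)     # role of module 720
        return self.determine(targets)    # role of module 730


# Stub implementations, for illustration only.
def stub_preprocess(text: str) -> List[TaggedWord]:
    return [(w, "PER" if w[0].isupper() else "O") for w in text.split()]


def stub_obtain(tagged: List[TaggedWord]) -> List[str]:
    return [word for word, tag in tagged if tag == "PER"]


def stub_determine(targets: List[str]) -> Dict[str, str]:
    return {"participants": ", ".join(targets)}


device = TextEventDevice(stub_preprocess, stub_obtain, stub_determine)
result = device.run("Alice met Bob")
```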
In one embodiment of the present invention, the apparatus further comprises a DSL sequence creation module (not shown in fig. 7), configured to determine, according to the event elements to be obtained, an event type configuration item and a plurality of event element configuration items of the DSL sequence, where the event element configuration items further include element names respectively corresponding to the plurality of element attributes, and to apply a preset DSL syntax rule to create the DSL sequence according to the event type configuration item and the plurality of event element configuration items.
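Creating a DSL sequence from configuration items can be sketched as follows. The syntax rule used here (`type=...` followed by `name:attribute` pairs separated by semicolons) is purely hypothetical and stands in for the preset DSL syntax rule, which this passage does not restate. The function name `create_dsl` and the dictionary keys are likewise assumptions.

```python
from typing import Dict, List


def create_dsl(event_type: str, element_items: List[Dict[str, str]]) -> str:
    """Serialize an event type item and element items with a hypothetical syntax rule."""
    names = [item["name"] for item in element_items]
    if len(names) != len(set(names)):
        raise ValueError("element names must be unique")
    parts = [f"type={event_type}"]
    for item in element_items:
        # Each element configuration item pairs an element name with its attribute.
        parts.append(f"{item['name']}:{item['attribute']}")
    return "; ".join(parts)


dsl = create_dsl("visit", [
    {"name": "subject", "attribute": "PER"},
    {"name": "trigger", "attribute": "meet"},
    {"name": "object", "attribute": "PER"},
])
```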
In an embodiment of the present invention, the obtaining module 720 is configured to obtain, according to the plurality of event element configuration items of the DSL sequence, a plurality of element extraction tasks in one-to-one correspondence with the plurality of event element configuration items, and to traverse the plurality of participles in sequence according to the currently executed element extraction task among the plurality of element extraction tasks, so as to obtain from the plurality of participles, according to the tags of the participles, a target participle associated with the currently executed element extraction task.
The apparatus for acquiring a text event provided by the embodiment of the invention can establish a customized domain-specific language according to the information to be acquired from the text, and then acquire the required elements from the text according to the domain-specific language, so that the text event is acquired quickly and accurately. It has the advantages of reducing manual operation, acquiring text events quickly and with high accuracy, and improving the text event acquisition experience.
It should be noted that the specific implementation of the apparatus for acquiring a text event in the embodiment of the present invention is similar to that of the method for acquiring a text event in the embodiment of the present invention. Refer to the description of the method for details, which are not repeated here.
Based on the same inventive concept, another embodiment of the present invention provides an electronic device, which specifically includes the following components, with reference to fig. 8: a processor 801, a memory 802, a communication interface 803, and a communication bus 804;
the processor 801, the memory 802 and the communication interface 803 complete mutual communication through the communication bus 804; the communication interface 803 is used for realizing information transmission between devices;
the processor 801 is configured to call a computer program in the memory 802, and the processor implements all the steps of the above text event acquisition method when executing the computer program, for example, the processor implements the following steps when executing the computer program: performing word segmentation processing on a text to obtain a plurality of word segments corresponding to the text; obtaining a plurality of target participles which are associated with a plurality of element attributes in a preset DSL sequence from the plurality of participles, wherein the DSL sequence comprises a plurality of event element configuration items, and the event element configuration items comprise the element attributes; and obtaining the text event according to the target word segmentation.
Based on the same inventive concept, yet another embodiment of the present invention provides a non-transitory computer-readable storage medium, having stored thereon a computer program, which when executed by a processor implements all the steps of the above-mentioned text event acquisition method, for example, the processor implements the following steps when executing the computer program: performing word segmentation processing on the text to obtain a plurality of word segments corresponding to the text; obtaining a plurality of target participles associated with a plurality of element attributes in a preset DSL sequence from a plurality of participles, wherein the DSL sequence comprises a plurality of event element configuration items, and the event element configuration items comprise the element attributes; and obtaining the text event according to the target word segmentation.
In addition, the logic instructions in the memory may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
The apparatus embodiments described above are merely illustrative. The units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; that is, they may be physical or virtual devices, and may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, and can certainly also be implemented by hardware. Based on such understanding, the above technical solutions may be embodied, in whole or in part, in the form of a software product, which may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk or an optical disk, and which includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the embodiments or in some parts of the embodiments.
In addition, in the present invention, terms such as "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Moreover, in the present invention, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Furthermore, in the description herein, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Moreover, the various embodiments or examples described in this specification, and the features of different embodiments or examples, can be combined by one skilled in the art provided they do not contradict one another.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (7)

1. A method for acquiring a text event is characterized by comprising the following steps:
performing word segmentation and tagging processing on a text to obtain a plurality of participles corresponding to the text, wherein each participle corresponds to a tag; the tags comprise part-of-speech tags and named entity tags;
according to the tags of the participles, acquiring, from the plurality of participles, a plurality of target participles associated with a plurality of element attributes in a preset DSL sequence;
obtaining the text event according to the plurality of target participles;
wherein the DSL sequence is created according to the event element required to be acquired;
the creating the DSL sequence according to the event element required to be acquired comprises:
determining an event type configuration item and a plurality of event element configuration items of the DSL sequence according to an event element required to be obtained, wherein each event element configuration item comprises an element attribute and an element name corresponding to the element attribute; the event element configuration items include: a subject element configuration item, a trigger action element configuration item, an object element configuration item, a time element configuration item and a place element configuration item; the element attributes included in the event type configuration item, the subject element configuration item, the trigger action element configuration item, the object element configuration item, the time element configuration item and the place element configuration item are, respectively: the content of an event type; a part-of-speech tag and a named entity tag; an action identifier; a part-of-speech tag and a named entity tag; a position identifier of time information; and a position identifier of place information;
and applying a preset DSL grammar rule to create the DSL sequence according to the event type configuration item and the plurality of event element configuration items.
2. The method for acquiring a text event according to claim 1, wherein the acquiring, from the plurality of participles, a plurality of target participles associated with a plurality of element attributes in a preset DSL sequence according to the tags of the participles comprises:
obtaining a plurality of element extraction tasks which are in one-to-one correspondence with the plurality of event element configuration items according to the plurality of event element configuration items of the DSL sequence;
and traversing the plurality of participles in sequence according to the currently executed element extraction task among the plurality of element extraction tasks, so as to obtain, from the plurality of participles and according to the tags of the participles, a target participle associated with the currently executed element extraction task.
3. The method for acquiring a text event according to claim 2, wherein the plurality of element extraction tasks include a subject element extraction task, a trigger action element extraction task, an object element extraction task, a time element extraction task, and a place element extraction task, and the acquiring a target participle associated with the currently executed element extraction task from the plurality of participles according to the tags of the participles includes:
if the currently executed element extraction task is a subject element extraction task, a trigger action element extraction task or an object element extraction task, the obtained target participle is: a participle that is matched, according to the tags of the participles, from among the plurality of participles and that is associated with the element attribute in the currently executed element extraction task;
if the currently executed element extraction task is a time element extraction task or a place element extraction task, the obtained target participle is: a participle that is found, according to the tags of the participles, among the plurality of participles by using the search mode corresponding to the element attribute in the currently executed element extraction task;
the searching mode can be customized according to requirements.
4. An apparatus for acquiring a text event, comprising:
a preprocessing module, configured to perform word segmentation and tagging processing on a text to obtain a plurality of participles corresponding to the text, wherein each participle corresponds to a tag; the tags comprise part-of-speech tags and named entity tags;
an obtaining module, configured to obtain, from the plurality of participles and according to the tags of the participles, a plurality of target participles associated with a plurality of element attributes in a preset DSL sequence;
a text event determining module, configured to obtain the text event according to the plurality of target participles;
wherein the DSL sequence is created according to the event element required to be acquired;
the device further comprises:
a DSL sequence creation module, configured to determine, according to an event element to be acquired, an event type configuration item and a plurality of event element configuration items of the DSL sequence; each event element configuration item comprises an element attribute and an element name corresponding to the element attribute; the event element configuration items include: a subject element configuration item, a trigger action element configuration item, an object element configuration item, a time element configuration item and a place element configuration item; the element attributes included in the event type configuration item, the subject element configuration item, the trigger action element configuration item, the object element configuration item, the time element configuration item and the place element configuration item are, respectively: the content of an event type; a part-of-speech tag and a named entity tag; an action identifier; a part-of-speech tag and a named entity tag; a position identifier of time information; and a position identifier of place information; and to apply a preset DSL grammar rule to create the DSL sequence according to the event type configuration item and the plurality of event element configuration items.
5. The apparatus according to claim 4, wherein the obtaining module is configured to obtain, according to the plurality of event element configuration items of the DSL sequence, a plurality of element extraction tasks in one-to-one correspondence with the plurality of event element configuration items, and to traverse the plurality of participles in sequence according to the currently executed element extraction task among the plurality of element extraction tasks, so as to obtain, from the plurality of participles and according to the tags of the participles, a target participle associated with the currently executed element extraction task.
6. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method for acquiring text events according to any one of claims 1 to 3 when executing the computer program.
7. A non-transitory computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method for acquiring a text event according to any one of claims 1 to 3.
CN202010350403.4A 2020-04-28 2020-04-28 Text event acquisition method and device, electronic equipment and storage medium Active CN111597302B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010350403.4A CN111597302B (en) 2020-04-28 2020-04-28 Text event acquisition method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111597302A CN111597302A (en) 2020-08-28
CN111597302B true CN111597302B (en) 2022-02-15

Family

ID=72187705

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010350403.4A Active CN111597302B (en) 2020-04-28 2020-04-28 Text event acquisition method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111597302B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111597302B (en) * 2020-04-28 2022-02-15 北京中科智加科技有限公司 Text event acquisition method and device, electronic equipment and storage medium
CN112733544B (en) * 2021-04-02 2021-07-09 中国电子科技网络信息安全有限公司 Target character activity track information extraction method, computer device and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111597302A (en) * 2020-04-28 2020-08-28 北京中科智加科技有限公司 Text event acquisition method and device, electronic equipment and storage medium

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9043206B2 (en) * 2010-04-26 2015-05-26 Cyberpulse, L.L.C. System and methods for matching an utterance to a template hierarchy
US9514281B2 (en) * 2011-05-03 2016-12-06 Graeme John HIRST Method and system of longitudinal detection of dementia through lexical and syntactic changes in writing
CN106959944A (en) * 2017-02-14 2017-07-18 中国电子科技集团公司第二十八研究所 A kind of Event Distillation method and system based on Chinese syntax rule
CN108920447B (en) * 2018-05-07 2022-08-05 国家计算机网络与信息安全管理中心 Chinese event extraction method for specific field
CN109524070B (en) * 2018-11-12 2021-03-23 北京懿医云科技有限公司 Data processing method and device, electronic equipment and storage medium
CN110008463B (en) * 2018-11-15 2023-04-18 创新先进技术有限公司 Method, apparatus and computer readable medium for event extraction
CN110162771B (en) * 2018-11-22 2023-08-29 腾讯科技(深圳)有限公司 Event trigger word recognition method and device and electronic equipment
CN109815481B (en) * 2018-12-17 2023-05-26 北京百度网讯科技有限公司 Method, device, equipment and computer storage medium for extracting event from text
CN110109672B (en) * 2019-04-17 2023-01-10 奇安信科技集团股份有限公司 Analysis processing method and device for expression

Also Published As

Publication number Publication date
CN111597302A (en) 2020-08-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant