CN110674309A

CN110674309A - Method and device for extracting entity information of text data

Info

Publication number: CN110674309A
Application number: CN201910812660.2A
Authority: CN
Inventors: 邱伟豪
Original assignee: Nanjing Yiyi Yunda Data Technology Co Ltd; Nanjing Medical Duyun Medical Technology Co Ltd
Current assignee: Nanjing Yiyi Yunda Data Technology Co Ltd; Nanjing Medical Duyun Medical Technology Co Ltd
Priority date: 2019-08-30
Filing date: 2019-08-30
Publication date: 2020-01-10
Anticipated expiration: 2039-08-30
Also published as: CN110674309B

Abstract

The invention belongs to the technical field of natural language processing and provides a method and a device for extracting entity information of text data. The method comprises the following steps: extracting entity information from the text data to be processed according to a hierarchical entity vocabulary and preset conditions; and splicing the extracted entity information to obtain spliced entity information. The preset conditions for extraction are set according to the relationships among the pieces of entity information, and the entity information is extracted from the text data to be processed according to these conditions, so that missed recalls are effectively avoided, the accuracy of entity information extraction is ensured, and the extraction effect is effectively improved.

Description

Method and device for extracting entity information of text data

Technical Field

The invention belongs to the technical field of natural language processing, and particularly relates to a method and a device for extracting entity information of text data.

Background

Unstructured data is data that has no predefined data model or schema. Typical unstructured data includes natural-language text such as text files, emails, social media content, website data, mobile data, and communication data. Take the electronic medical record as an example: it records a large volume of real and rich clinical data, summarizes the long-term practice and experience of clinicians, and can support clinical decision support, epidemiological statistics, clinical research, drug research and development, and the like. However, electronic medical records contain a large amount of unstructured natural-language text. To extract valuable information from this text, the unstructured text data must first be structured, and entity extraction is an important step in that structuring process.

At present, entity extraction from electronic medical records usually adopts one of the following two methods. The first method matches a complete body-part term as a whole in the text. The second method splits a body-part term according to its characteristics, matches the resulting parts independently, and then combines them. However, the first method is prone to missed recalls, resulting in a low recall rate, while the second method is prone to erroneous recalls, resulting in low accuracy. Existing entity extraction therefore cannot achieve both a high recall rate and high accuracy, and the extraction effect is not ideal.

Disclosure of Invention

In view of this, embodiments of the present invention provide a method and a device for extracting entity information of text data, so as to solve the problem in the prior art that a high recall rate and high accuracy cannot both be achieved during entity extraction, which makes the extraction effect not ideal.

A first aspect of an embodiment of the present invention provides a method for extracting entity information of text data, including:

extracting entity information from the text data to be processed according to the hierarchical entity vocabulary and preset conditions;

and splicing the extracted entity information to obtain spliced entity information.

A second aspect of the embodiments of the present invention provides an entity information extracting apparatus for text data, including:

the extraction module is used for extracting entity information from the text data to be processed according to the hierarchical entity vocabulary and preset conditions;

and the splicing module is used for splicing the extracted entity information to obtain spliced entity information.

A third aspect of the embodiments of the present invention provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the entity information extraction method for text data when executing the computer program.

A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium, which stores a computer program, wherein the computer program, when executed by a processor, implements the steps of the entity information extraction method for text data described above.

Compared with the prior art, the embodiments of the present invention have the following beneficial effects: the preset conditions for extraction are set according to the relationships among the pieces of entity information, and the entities in the text data to be processed are extracted according to these conditions, so that missed recalls are effectively avoided, the accuracy of entity extraction is ensured, the entity extraction effect is effectively improved, and a good data basis is provided for subsequent structured processing.

Drawings

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a first schematic flowchart of an implementation of a method for extracting entity information of text data according to an embodiment of the present invention;

Fig. 2 is a second schematic flowchart of an implementation of the method for extracting entity information of text data according to the embodiment of the present invention;

Fig. 3 is a schematic flowchart of constructing a hierarchical entity vocabulary in the method for extracting entity information of text data according to the embodiment of the present invention;

Fig. 4 is a schematic flowchart of extracting entity information from the text data to be processed in the method for extracting entity information of text data according to the embodiment of the present invention;

Fig. 5 is a schematic flowchart of splicing the extracted entity information in the method for extracting entity information of text data according to the embodiment of the present invention;

Fig. 6 is a first exemplary diagram of an entity information extraction apparatus for text data according to an embodiment of the present invention;

Fig. 7 is a second exemplary diagram of an entity information extraction apparatus for text data according to an embodiment of the present invention;

Fig. 8 is a schematic diagram of a grading module in an entity information extraction apparatus for text data according to an embodiment of the present invention;

Fig. 9 is a schematic diagram of an extraction module in an entity information extraction apparatus for text data according to an embodiment of the present invention;

Fig. 10 is a schematic diagram of a splicing module in an entity information extraction apparatus for text data according to an embodiment of the present invention;

Fig. 11 is a schematic diagram of a terminal device according to an embodiment of the present invention.

Detailed Description

In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.

In order to explain the technical means of the present invention, the following description will be given by way of specific examples.

Referring to Fig. 1, a first aspect of the embodiments of the present invention provides a method for extracting entity information of text data, including:

Step S30: extracting entity information from the text data to be processed according to the hierarchical entity vocabulary and preset conditions.

Because the pieces of entity information to be extracted are related to one another, they need to be extracted according to certain preset conditions. This avoids missed recalls on the one hand and improves the accuracy of entity information extraction on the other.

Referring to Fig. 2, in an embodiment, the method further includes, before step S30:

Step S10: grading the entity information according to positional relationships to construct the hierarchical entity vocabulary.

To extract entity information from the text data to be processed, a corresponding entity vocabulary needs to be constructed for the entities to be extracted. For example, when the entity information to be extracted concerns breast sites, an entity vocabulary for the breast sites needs to be constructed according to the positional relationships among those sites. Of course, the entity information to be extracted may be of other types and is not limited to this case.

Referring to Fig. 3, in this embodiment, the step of constructing the hierarchical entity vocabulary may include:

Step S101: classifying the entity information in a preset manner to obtain an entity classification table.

In this embodiment, the entity information to be extracted may concern a body tissue, in which case the preset manner is classification by the lesion positions of that body tissue as determined from medical knowledge. For example, when the entity information to be extracted concerns breast sites, the relevant medical knowledge is the lesion positions of the breast, which can be classified into laterality, main anatomical position, secondary anatomical position, breast quadrant, breast o'clock position, depth, and the like. These lesion positions and the content corresponding to each of them are arranged to form an entity classification table, as shown in the following table:

Lesion position: content
Laterality: left side, right side, left, right, and the like
Main anatomical position: breast, nipple, mammary gland, areola, breast nipple, nipple-areola, breast areola, mammary gland nipple, and the like
Secondary anatomical position: posterior space, duct, gland, lobule, lactiferous duct, suspensory ligament, lactiferous sinus, and the like
Breast quadrant: outer quadrant, outer upper quadrant, inner upper quadrant, inner lower quadrant, inner middle quadrant, and the like
Breast o'clock position: 9 o'clock, 12 o'clock, 5 o'clock to 7 o'clock, and the like
Depth: posterior, superficial, anterior, middle-posterior, posterior-middle band, middle, and the like

Step S103: sorting the entity information in the entity classification table according to the positional relationship among the entity information, so as to construct the hierarchical entity vocabulary.

The entity classification table is obtained without considering the relationships among the pieces of entity information. After it is obtained, the entity information therefore needs to be sorted according to the positional relationship among the pieces of entity information, which yields the hierarchical relationship between different pieces of entity information. This hierarchical relationship reflects the logical relationship between upper-level and lower-level entities and facilitates the subsequent extraction of entity information.

For example, in this embodiment, the upper-to-lower order among the entities of the breast region in the above table is: laterality, main anatomical position, secondary anatomical position, breast quadrant, breast o'clock position, and depth. The entity classification table can therefore be rearranged in this upper-to-lower order to construct the hierarchical entity vocabulary, with the levels ordered as follows:

Level 1: laterality
Level 2: main anatomical position
Level 3: secondary anatomical position
Level 4: breast quadrant
Level 5: breast o'clock position
Level 6: depth
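Steps S101 and S103 can be illustrated with a minimal Python sketch. It assumes the entity classification table is held as a dictionary and that the hierarchical entity vocabulary is simply that table rearranged in the stated upper-to-lower order. The category keys, the function name `build_hierarchical_vocabulary`, and the reduced term lists are hypothetical choices made for illustration and are not taken from the embodiment.

```python
# Entity classification table (step S101): category -> example terms.
# Only a few terms per category are listed, and the category keys are
# hypothetical names chosen for this sketch.
ENTITY_CLASSIFICATION = {
    "depth": ["posterior", "superficial", "anterior", "middle"],
    "breast_quadrant": ["outer upper quadrant", "inner upper quadrant"],
    "laterality": ["left", "right"],
    "breast_clock_position": ["9 o'clock", "12 o'clock"],
    "main_anatomical_position": ["breast", "nipple", "mammary gland", "areola"],
    "secondary_anatomical_position": ["duct", "gland", "suspensory ligament"],
}

# Upper-to-lower order of the categories as stated in the description (step S103).
LEVEL_ORDER = [
    "laterality",
    "main_anatomical_position",
    "secondary_anatomical_position",
    "breast_quadrant",
    "breast_clock_position",
    "depth",
]


def build_hierarchical_vocabulary(classification, level_order):
    """Return [(level, category, terms), ...] with level 1 as the highest level."""
    missing = [category for category in level_order if category not in classification]
    if missing:
        raise KeyError(f"categories missing from the classification table: {missing}")
    return [
        (level, category, list(classification[category]))
        for level, category in enumerate(level_order, start=1)
    ]


hierarchical_vocabulary = build_hierarchical_vocabulary(ENTITY_CLASSIFICATION, LEVEL_ORDER)
for level, category, terms in hierarchical_vocabulary:
    print(level, category, terms)
```

Keeping the level order separate from the term lists means the same classification table could be regraded for a different body tissue by changing only `LEVEL_ORDER`.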

Referring to Fig. 4, in this embodiment, the step of extracting the entity information from the text data to be processed may include:

Step S301: acquiring the text data to be processed.

The source of the text data to be processed can be set as required. For example, it may be a de-identified (desensitized) electronic medical record from a hospital. Such a record contains rich text data, including information corresponding to the entities in the constructed hierarchical entity vocabulary. For example, in one embodiment, the text data to be processed includes the following text:

"Both breasts are symmetrical and the nipples are level, with no deviation, retraction, or discharge; the local skin shows no redness, swelling, ulceration, or exudation, and no orange-peel sign or dimple sign. Left side mammary gland: one palpable mass, about 5 x 3cm, medium in texture, no tenderness, smooth surface, clear border, movable, no palpable vascular pulsation or fluctuation. Left side mammary gland, upper quadrant (11 o'clock) and lower quadrant (6 o'clock): one palpable mass each, about 5 x 3cm and 1cm x 0.5cm; right side mammary gland, upper quadrant (10 o'clock, 11 o'clock) and lower quadrant (6 o'clock): one palpable mass each, about 4 x 2cm, 2cm x 2cm, and 2 x 1cm; all are medium in texture, with no tenderness, smooth surfaces, clear borders, movable, and no palpable vascular pulsation or fluctuation. Enlarged lymph nodes are palpable on the left side; no obviously enlarged lymph nodes are palpable in the remaining axilla or above and below the clavicles on both sides."

Step S303: determining the entity information to be extracted according to the hierarchical entity vocabulary.

The hierarchical entity vocabulary is obtained in step S10. From the hierarchical entity vocabulary and the text data to be processed, the entity information that needs to be extracted can be determined. For example, from the above text data to be processed, it can be determined that left side, mammary gland, upper quadrant, 11 o'clock, lower quadrant, 6 o'clock, right side, and 10 o'clock are entity information to be extracted.
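One way to picture step S303 is a vocabulary lookup over the text. The sketch below is only an illustration under stated assumptions: it uses a reduced, hypothetical vocabulary, matches o'clock positions with a regular expression rather than an enumerated list, and resolves overlapping matches by keeping the longest term. None of these choices is prescribed by the embodiment.

```python
import re

# Hypothetical, reduced vocabulary: category -> terms.
VOCABULARY = {
    "laterality": ["left side", "right side", "left", "right"],
    "main_anatomical_position": ["mammary gland", "breast", "nipple"],
    "breast_quadrant": ["upper quadrant", "lower quadrant"],
}
# Assumption: o'clock positions are matched by pattern instead of being listed.
CLOCK_PATTERN = r"\b\d{1,2} o'clock\b"


def find_mentions(text, vocabulary=VOCABULARY):
    """Return (start, end, term, category) for each vocabulary term found in text."""
    patterns = [
        (category, rf"\b{re.escape(term)}\b")
        for category, terms in vocabulary.items()
        for term in terms
    ]
    patterns.append(("breast_clock_position", CLOCK_PATTERN))

    found = []
    for category, pattern in patterns:
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            found.append((match.start(), match.end(), match.group(0).lower(), category))

    # Earliest match first; at the same start, the longest match first.
    found.sort(key=lambda mention: (mention[0], -(mention[1] - mention[0])))
    kept, last_end = [], 0
    for mention in found:
        if mention[0] >= last_end:  # skip matches that overlap a kept one
            kept.append(mention)
            last_end = mention[1]
    return kept


text = ("Left side mammary gland, upper quadrant (11 o'clock) "
        "and lower quadrant (6 o'clock): one mass each.")
mentions = find_mentions(text)
for start, end, term, category in mentions:
    print(start, end, term, category)
```

The longest-match rule is what keeps "left side" from also being reported as the shorter term "left".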

Step S305: determining the preset conditions for extracting the entity information according to the logical relationship among the entity information to be extracted.

In this embodiment, the preset conditions specify, according to the logical relationship among the entity information to be extracted, whether pieces of entity information in an upper-lower level relationship must appear in the same clause or in the same whole sentence, and in what order they must appear. Analysis of the text data to be processed shows that laterality and the main anatomical position should be in the same clause, in either order, while the main anatomical position and the breast o'clock position should be in the same whole sentence, with the main anatomical position first. The positions of two or more entities in the text can therefore be distinguished by punctuation marks. For example, left side (laterality) and mammary gland (main anatomical position), or right side (laterality) and mammary gland (main anatomical position), must be within one clause, whereas upper quadrant (breast quadrant), 11 o'clock (breast o'clock position), lower quadrant (breast quadrant), and 6 o'clock (breast o'clock position) of the left side (laterality) mammary gland (main anatomical position) only need to be within the same whole sentence. During extraction, the relationships among the entity information can thus be constrained by punctuation marks: when the text data to be processed is processed, only entity information that satisfies the constraints is extracted, and entity information that does not satisfy them is not extracted even if it appears in the hierarchical entity vocabulary.
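The two conditions described above can be written down as a small rule table plus a punctuation-based check. This is a sketch under assumptions, not the embodiment's implementation: the choice of which punctuation marks end a clause or a whole sentence, the rule-table layout, and the names `PRESET_CONDITIONS`, `unit_index`, and `satisfies` are all hypothetical.

```python
import re

# Assumption: these marks end a whole sentence or a clause. A period followed
# by a digit is skipped so that a size such as "0.5cm" does not end a sentence.
SENTENCE_MARK = re.compile(r"\.(?!\d)|[!?。！？]")
CLAUSE_MARK = re.compile(r"[,;:，；：]|\.(?!\d)|[!?。！？]")

# (upper category, lower category) -> (required scope, upper must come first)
PRESET_CONDITIONS = {
    ("laterality", "main_anatomical_position"): ("clause", False),
    ("main_anatomical_position", "breast_clock_position"): ("sentence", True),
}


def unit_index(boundary, text, position):
    """Index of the clause or sentence that contains the character at position."""
    return sum(1 for match in boundary.finditer(text) if match.start() < position)


def satisfies(text, upper, lower, conditions=PRESET_CONDITIONS):
    """Check one (start offset, category) pair against the rule table."""
    (upper_pos, upper_cat), (lower_pos, lower_cat) = upper, lower
    if (upper_cat, lower_cat) not in conditions:
        return False
    scope, ordered = conditions[(upper_cat, lower_cat)]
    boundary = CLAUSE_MARK if scope == "clause" else SENTENCE_MARK
    same_unit = unit_index(boundary, text, upper_pos) == unit_index(boundary, text, lower_pos)
    return same_unit and (upper_pos < lower_pos or not ordered)


text = ("Left side mammary gland, upper quadrant (11 o'clock): one mass, "
        "about 1cm x 0.5cm. Enlarged lymph nodes are palpable on the left side.")
first_side = (text.find("Left side"), "laterality")
last_side = (text.rfind("left side"), "laterality")
gland = (text.find("mammary gland"), "main_anatomical_position")
clock = (text.find("11 o'clock"), "breast_clock_position")

print(satisfies(text, first_side, gland))  # same clause
print(satisfies(text, gland, clock))       # same whole sentence, gland first
print(satisfies(text, last_side, gland))   # different clauses
```

In this example the laterality in the final sentence fails the same-clause rule, which matches the description's point that a vocabulary term is not extracted when the conditions are not met.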

Step S307: processing the text data to be processed according to the preset conditions, so as to extract the entity information from the text data to be processed.

After the preset conditions are determined, the text data to be processed can be processed. For example, the entity information that satisfies the extraction conditions can be marked, so as to determine which entity information can be extracted and how the pieces of entity information are related. For example, the text data to be processed may be marked as follows:

"Both breasts are symmetrical and the nipples are level, with no deviation, retraction, or discharge; the local skin shows no redness, swelling, ulceration, or exudation, and no orange-peel sign or dimple sign. Left side【1】 mammary gland【2】: one palpable mass, about 5 x 3cm, medium in texture, no tenderness, smooth surface, clear border, movable, no palpable vascular pulsation or fluctuation. Left side【3】 mammary gland【4】, upper quadrant【5】 (11 o'clock【6】) and lower quadrant【7】 (6 o'clock【8】): one palpable mass each, about 5 x 3cm and 1cm x 0.5cm; right side【9】 mammary gland【10】, upper quadrant【11】 (10 o'clock【12】, 11 o'clock【13】) and lower quadrant【14】 (6 o'clock【15】): one palpable mass each, about 4 x 2cm, 2cm x 2cm, and 2 x 1cm; all are medium in texture, with no tenderness, smooth surfaces, clear borders, movable, and no palpable vascular pulsation or fluctuation. Enlarged lymph nodes are palpable on the left side; no obviously enlarged lymph nodes are palpable in the remaining axilla or above and below the clavicles on both sides."

At this point, the entity information determined to be extractable includes: left side【1】 mammary gland【2】; left side【3】 mammary gland【4】 upper quadrant【5】 11 o'clock【6】 lower quadrant【7】 6 o'clock【8】; right side【9】 mammary gland【10】 upper quadrant【11】 10 o'clock【12】 11 o'clock【13】 lower quadrant【14】 6 o'clock【15】.
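The marking in step S307 can be sketched as a filter followed by an annotation pass. The code below is illustrative only. It assumes the candidate mentions are already known as (start, end, category) spans, that a main anatomical position is kept only when a laterality shares its clause, and that lower-level entities are kept only when an extracted main anatomical position precedes them in the same whole sentence. The function names and the punctuation sets are hypothetical.

```python
import re

# Assumed boundary marks; a period before a digit (as in "0.5cm") is not a boundary.
SENTENCE_MARK = re.compile(r"\.(?!\d)|[!?。！？]")
CLAUSE_MARK = re.compile(r"[,;:，；：]|\.(?!\d)|[!?。！？]")
LATERALITY, MAIN = "laterality", "main_anatomical_position"


def unit_index(boundary, text, position):
    """Index of the clause or sentence that contains the character at position."""
    return sum(1 for match in boundary.finditer(text) if match.start() < position)


def same_unit(boundary, text, first, second):
    return unit_index(boundary, text, first) == unit_index(boundary, text, second)


def extract(text, mentions):
    """Keep only the (start, end, category) mentions that meet the conditions."""
    sides = [m for m in mentions if m[2] == LATERALITY]
    mains = [m for m in mentions if m[2] == MAIN]
    # A main anatomical position needs a laterality in the same clause (any order).
    kept_mains = [m for m in mains
                  if any(same_unit(CLAUSE_MARK, text, m[0], s[0]) for s in sides)]
    kept = []
    for mention in mentions:
        start, _, category = mention
        if category == MAIN:
            ok = mention in kept_mains
        elif category == LATERALITY:
            ok = any(same_unit(CLAUSE_MARK, text, start, m[0]) for m in kept_mains)
        else:
            # Lower levels need an extracted main position earlier in the sentence.
            ok = any(m[0] < start and same_unit(SENTENCE_MARK, text, start, m[0])
                     for m in kept_mains)
        if ok:
            kept.append(mention)
    return sorted(kept)


def annotate(text, spans):
    """Insert a numbered marker after each extracted, non-overlapping span."""
    pieces, cursor = [], 0
    for number, (start, end, _) in enumerate(spans, start=1):
        pieces.append(text[cursor:end] + f"【{number}】")
        cursor = end
    return "".join(pieces) + text[cursor:]


def locate(text, term, category):
    start = text.index(term)  # first case-sensitive occurrence
    return (start, start + len(term), category)


text = ("The nipples are level. Left side mammary gland, upper quadrant (11 o'clock): "
        "one mass, about 1cm x 0.5cm. Enlarged lymph nodes are palpable on the left side.")
mentions = [
    locate(text, "nipples", MAIN),
    locate(text, "Left side", LATERALITY),
    locate(text, "mammary gland", MAIN),
    locate(text, "upper quadrant", "breast_quadrant"),
    locate(text, "11 o'clock", "breast_clock_position"),
    locate(text, "left side", LATERALITY),  # lowercase: the one in the last sentence
]
extracted = extract(text, mentions)
marked = annotate(text, extracted)
print(marked)
```

Here "nipples" and the final "left side" are vocabulary terms but are left unmarked, because neither has a partner in its own clause.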

After the entity information is extracted, the extracted entity information needs to be further processed.

Referring to Fig. 1, step S50: splicing the extracted entity information to obtain spliced entity information.

As can be seen from step S30, once the extractable entity information has been obtained, the extracted entity information needs to be spliced according to the relationships among its pieces. Splicing recovers the complete meaning of the extracted entity information and provides more complete entity information for subsequent structured processing. Referring to Fig. 5, the step of splicing the extracted entity information may include:

Step S501: determining the relationships among the extracted entity information according to the logical relationship among the extracted entity information.

For example, among the extracted entity information, left side【1】 (laterality) and mammary gland【2】 (main anatomical position) belong to the same clause, so they can be spliced; left side【3】 (laterality), mammary gland【4】 (main anatomical position), upper quadrant【5】 (breast quadrant), and 11 o'clock【6】 (breast o'clock position) are in the same whole sentence, so they can be spliced; left side【3】 (laterality), mammary gland【4】 (main anatomical position), lower quadrant【7】 (breast quadrant), and 6 o'clock【8】 (breast o'clock position) are in the same whole sentence, so they can be spliced; right side【9】 (laterality), mammary gland【10】 (main anatomical position), upper quadrant【11】 (breast quadrant), and 10 o'clock【12】 (breast o'clock position) are in the same whole sentence, so they can be spliced; right side【9】 (laterality), mammary gland【10】 (main anatomical position), upper quadrant【11】 (breast quadrant), and 11 o'clock【13】 (breast o'clock position) are in the same whole sentence, so they can be spliced; right side【9】 (laterality), mammary gland【10】 (main anatomical position), lower quadrant【14】 (breast quadrant), and 6 o'clock【15】 (breast o'clock position) are in the same whole sentence, so they can be spliced.

Step S503: splicing the extracted entity information according to these relationships, and treating the spliced entity information as a whole to obtain the spliced entity information.

In this embodiment, spliced entity information is obtained by splicing at least two pieces of related entity information. For example, after the extracted entity information is spliced, the complete entity information obtained includes: left side mammary gland; left side mammary gland upper quadrant 11 o'clock; left side mammary gland lower quadrant 6 o'clock; right side mammary gland upper quadrant 10 o'clock; right side mammary gland upper quadrant 11 o'clock; right side mammary gland lower quadrant 6 o'clock.
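The splicing of steps S501 and S503 can be sketched as a single pass over the extracted entities of one whole sentence. The sketch makes several assumptions that the embodiment does not state: the entities arrive in text order as (term, category) pairs, laterality precedes the main anatomical position, each o'clock position is spliced with the most recent quadrant, and a quadrant without an o'clock position is not emitted. The function name `splice` is hypothetical.

```python
LATERALITY = "laterality"
MAIN = "main_anatomical_position"
QUADRANT = "breast_quadrant"
CLOCK = "breast_clock_position"


def splice(mentions):
    """Splice (term, category) pairs, given in text order for one whole sentence."""
    spliced = []
    laterality = main = quadrant = None
    pending_base = False  # True while "laterality + main" has no lower-level entity yet
    for term, category in mentions:
        if category == LATERALITY:
            if pending_base:
                spliced.append(f"{laterality} {main}")
            laterality, main, quadrant, pending_base = term, None, None, False
        elif category == MAIN:
            main, pending_base = term, laterality is not None
        elif category == QUADRANT:
            quadrant = term
        elif category == CLOCK and laterality and main:
            parts = [laterality, main] + ([quadrant] if quadrant else []) + [term]
            spliced.append(" ".join(parts))
            pending_base = False
    if pending_base:
        spliced.append(f"{laterality} {main}")
    return spliced


first_sentence = [("left side", LATERALITY), ("mammary gland", MAIN)]
second_sentence = [
    ("left side", LATERALITY), ("mammary gland", MAIN),
    ("upper quadrant", QUADRANT), ("11 o'clock", CLOCK),
    ("lower quadrant", QUADRANT), ("6 o'clock", CLOCK),
    ("right side", LATERALITY), ("mammary gland", MAIN),
    ("upper quadrant", QUADRANT), ("10 o'clock", CLOCK), ("11 o'clock", CLOCK),
    ("lower quadrant", QUADRANT), ("6 o'clock", CLOCK),
]
results = splice(first_sentence) + splice(second_sentence)
for item in results:
    print(item)
```

Carrying the most recent quadrant forward is what lets both 10 o'clock and 11 o'clock be spliced with the same upper quadrant of the right side.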

Further, the spliced entity information is then related, as a whole, to other subsequent entity information. For example, left side mammary gland upper quadrant 11 o'clock is related to the mass, and left side mammary gland lower quadrant 6 o'clock is related to the mass.

Compared with the prior art, the method for extracting entity information of text data provided by this embodiment has at least the following beneficial effects:

At present, entity information is usually extracted from electronic medical records by one of two methods. The first method matches a complete body-part term as a whole in the text. Although this ensures the accuracy of entity information extraction, missed recalls occur easily because the pieces of entity information are scattered across the text. The second method splits a body-part term according to its characteristics, matches the resulting parts independently, and then combines them. Although this ensures that a large amount of entity information is extracted, erroneous recalls occur easily, resulting in low accuracy.

This embodiment provides a new entity information extraction method that sets the preset conditions for extraction according to the relationships among the pieces of entity information and extracts the entity information from the text data to be processed according to these conditions. It thereby effectively avoids missed recalls while ensuring the accuracy of entity information extraction, effectively improves the extraction effect, and provides a good data basis for subsequent structured processing.

It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.

Referring to fig. 6, a second aspect of the present embodiment provides an entity information extracting apparatus for text data, including an extracting module 63 and a splicing module 65. The extraction module 63 is configured to extract entity information in the text data to be processed according to the hierarchical entity vocabulary and preset conditions; the splicing module 65 is configured to splice the extracted entity information to obtain spliced entity information.

Referring to fig. 7, in an embodiment, the entity information extracting apparatus for text data further includes a grading module 61, where the grading module 61 is configured to grade the entity information according to the position relationship to construct a graded entity vocabulary.

Referring to Fig. 8, in an embodiment, the grading module 61 includes a classification table obtaining unit 611 and a vocabulary obtaining unit 613. The classification table obtaining unit 611 is configured to classify the entity information in a preset manner to obtain an entity classification table; the vocabulary obtaining unit 613 is configured to sort the entity information in the entity classification table according to the positional relationship among the entity information, so as to construct the hierarchical entity vocabulary.

Referring to fig. 9, the extracting module 63 includes a data acquiring unit 631, an extracting entity determining unit 633, a preset condition acquiring unit 635, and an extracting unit 637, where the data acquiring unit 631 is configured to acquire text data to be processed; the extracted entity determining unit 633 is used for determining entity information to be extracted according to the hierarchical entity vocabulary; the preset condition obtaining unit 635 is configured to determine a preset condition for extracting the entity information according to a logical relationship between the entity information that needs to be extracted; the extracting unit 637 is configured to process the text data to be processed according to the preset condition, so as to extract entity information in the text data to be processed.

Referring to fig. 10, the splicing module 65 includes an interrelationship obtaining unit 651 and a splicing unit 653, where the interrelationship obtaining unit 651 is configured to determine the interrelationship between the extracted entity information according to the logical relation between the extracted entity information; the splicing unit 653 is configured to splice the extracted entity information according to the correlation, and take the spliced entity information as a whole to obtain spliced entity information.
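The module structure described above can be pictured as a thin composition of two callables. The sketch below is a hypothetical arrangement, not the apparatus itself: the class name `EntityExtractionApparatus`, its field names, and the toy stand-ins for the extraction module 63 and the splicing module 65 are assumptions made only to show how the modules hand data to each other.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EntityExtractionApparatus:
    # Stand-in for extraction module 63: text -> extracted entity information.
    extraction_module: Callable[[str], List[str]]
    # Stand-in for splicing module 65: extracted -> spliced entity information.
    splicing_module: Callable[[List[str]], List[str]]

    def run(self, text: str) -> List[str]:
        """Extract entity information, then splice it."""
        return self.splicing_module(self.extraction_module(text))


def toy_extraction(text: str) -> List[str]:
    # Toy rule: report two fixed terms if they occur in the text.
    return [term for term in ("left side", "mammary gland") if term in text.lower()]


def toy_splicing(entities: List[str]) -> List[str]:
    # Toy rule: join everything that was extracted into one spliced entity.
    return [" ".join(entities)] if entities else []


apparatus = EntityExtractionApparatus(toy_extraction, toy_splicing)
print(apparatus.run("Left side mammary gland: one mass."))
```

Passing the modules in as callables keeps the division into units a logical one, so the units could equally be merged or split differently.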

Fig. 11 is a schematic diagram of a terminal device according to an embodiment of the present invention. As shown in fig. 11, the terminal device 7 of this embodiment includes: a processor 70, a memory 71 and a computer program 72, such as an entity information extraction method program of text data, stored in said memory 71 and executable on said processor 70. The processor 70, when executing the computer program 72, implements the steps in the above-mentioned embodiments of the entity information extraction method, such as the steps S10 to S50 shown in fig. 1 to 5. Alternatively, the processor 70, when executing the computer program 72, implements the functions of each module/unit in each device embodiment described above, for example, the functions of the modules 61 to 65 shown in fig. 6 to 10.

Illustratively, the computer program 72 may be partitioned into one or more modules/units that are stored in the memory 71 and executed by the processor 70 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program 72 in the terminal device 7.

The terminal device 7 may be a desktop computer, a notebook computer, a palmtop computer, a cloud server, or another computing device. The terminal device 7 may include, but is not limited to, a processor 70 and a memory 71. Those skilled in the art will understand that Fig. 11 is only an example of the terminal device 7 and does not limit it; the terminal device 7 may include more or fewer components than shown, may combine some components, or may use different components. For example, the terminal device 7 may further include an input/output device, a network access device, a bus, and the like.

The Processor 70 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

The memory 71 may be an internal storage unit of the terminal device 7, such as a hard disk or a memory of the terminal device 7. The memory 71 may also be an external storage device of the terminal device 7, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the terminal device 7. Further, the memory 71 may also include both an internal storage unit and an external storage device of the terminal device 7. The memory 71 is used for storing the computer program 72 and other programs and data required by the terminal device. The memory 71 may also be used to temporarily store data that has been output or is to be output.

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

In the embodiments provided in the present invention, it should be understood that the disclosed terminal device and method may be implemented in other ways. For example, the terminal device embodiments described above are merely illustrative: the division into modules or units is only a division by logical function, and other divisions are possible in an actual implementation; a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method embodiments are implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunications signals.

The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims

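1. A method for extracting entity information of text data, characterized by comprising the following steps:

extracting entity information in the text data to be processed according to the hierarchical entity word list and preset conditions;

and splicing the extracted entity information to obtain spliced entity information.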

2. The method for extracting entity information of text data according to claim 1, wherein before the step of extracting entity information in the text data to be processed according to the hierarchical entity word list and the preset condition, the method further comprises:

and grading the entity information according to the position relation to construct the hierarchical entity word list.

3. The method for extracting entity information of text data according to claim 2, wherein the grading the entity information according to the position relation to construct the hierarchical entity word list comprises:

classifying the entity information according to a preset mode to obtain an entity classification table;

and sorting the entity information in the entity classification table according to the position relation among the entity information so as to construct the hierarchical entity word list.

4. The method for extracting entity information of text data according to claim 3, wherein the preset mode includes classification according to the lesion location corresponding to the entity information.
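The word list construction recited in claims 2 to 4 can be illustrated with a minimal Python sketch. It is not taken from the patent: the sample terms, the numeric `level` field used to encode the position relation, and the function name `build_hierarchical_word_list` are all assumptions made for illustration. Entities are first grouped by lesion location (the entity classification table) and each group is then sorted by level.

```python
from collections import defaultdict

# Hypothetical sample entities: (term, lesion_location, level).
# A lower level is assumed to mean a higher position in the hierarchy.
ENTITIES = [
    ("upper lobe", "lung", 1),
    ("lung", "lung", 0),
    ("apical segment", "lung", 2),
    ("liver", "liver", 0),
    ("right lobe", "liver", 1),
]


def build_hierarchical_word_list(entities):
    """Group entities by lesion location, then order each group by level."""
    table = defaultdict(list)  # plays the role of the entity classification table
    for term, location, level in entities:
        table[location].append((level, term))
    # Sorting the (level, term) pairs orders each group from parent to child.
    return {loc: [term for _, term in sorted(items)] for loc, items in table.items()}


WORD_LIST = build_hierarchical_word_list(ENTITIES)
```

In this sketch the position relation is reduced to a single integer per entity; a real word list might instead store explicit parent and child links.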

5. The method for extracting entity information of text data according to any one of claims 1 to 4, wherein the extracting entity information in the text data to be processed according to the hierarchical entity word list and the preset condition comprises:

determining entity information to be extracted according to the hierarchical entity word list;

determining a preset condition for extracting the entity information to be extracted according to the logical relationship among the entity information to be extracted;

and processing the text data to be processed according to the preset condition so as to extract entity information in the text data to be processed.

6. The method for extracting entity information of text data according to claim 5, wherein the preset condition includes: whether entity information in a superior-subordinate (hierarchical) relation must appear in the same clause or in the same whole sentence, and the order in which the entity information in the superior-subordinate relation must appear.
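The preset conditions of claims 5 and 6 can be sketched as follows. This is a hedged illustration rather than the patented implementation: the function names `split_units` and `extract`, the `scope` and `ordered` parameters, the plain substring matching, and the English punctuation used to delimit clauses and sentences are all assumptions. The sketch keeps a match only if every entity of the hierarchy occurs in the same unit (clause or whole sentence) and, optionally, only if the superior entity precedes the subordinate one.

```python
import re

# Hypothetical hierarchy: superior term first, then subordinate term.
HIERARCHY = ["lung", "upper lobe"]


def split_units(text, scope="clause"):
    """Split text into clauses or whole sentences (assumed punctuation sets)."""
    pattern = r"[,;.!?]" if scope == "clause" else r"[.!?]"
    return [u.strip() for u in re.split(pattern, text) if u.strip()]


def extract(text, hierarchy, scope="clause", ordered=True):
    """Return one tuple per unit that satisfies the scope and order conditions."""
    results = []
    for unit in split_units(text, scope):
        positions = [unit.find(term) for term in hierarchy]
        if any(p < 0 for p in positions):
            continue  # not all related entities occur in this unit
        if ordered and positions != sorted(positions):
            continue  # a subordinate entity appears before its superior
        results.append(tuple(hierarchy))
    return results


REPORT = "Nodule in the right lung, upper lobe. The liver is normal."
same_sentence = extract(REPORT, HIERARCHY, scope="sentence")
same_clause = extract(REPORT, HIERARCHY, scope="clause")
```

With the sample report, the two terms fall in the same sentence but in different clauses, so the choice of `scope` decides whether the pair is extracted.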

7. The method for extracting entity information of text data according to claim 6, wherein the splicing the extracted entity information to obtain spliced entity information comprises:

determining the mutual relation between the extracted entity information according to the logical relation between the extracted entity information;

and splicing the extracted entity information according to the mutual relation, and taking the spliced entity information as a whole to obtain the spliced entity information.
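The splicing step of claim 7 can be sketched in the same spirit. The claim does not state how the spliced string is formed, so this example assumes that the mutual relation is a level per entity and that splicing means joining the entities from superior to subordinate with a separator; `splice`, `LEVELS` and `sep` are hypothetical names.

```python
# Hypothetical levels encoding the relation between extracted entities.
LEVELS = {"lung": 0, "upper lobe": 1, "apical segment": 2}


def splice(entities, levels, sep=" "):
    """Join extracted entities into one string, ordered from superior to subordinate."""
    ordered = sorted(entities, key=lambda e: levels[e])
    return sep.join(ordered)


spliced = splice(["apical segment", "lung", "upper lobe"], LEVELS)
```

The spliced string is then treated as a single entity, which is what allows downstream steps to handle "lung upper lobe apical segment" as one unit rather than three separate mentions.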

8. An entity information extraction apparatus of text data, characterized by comprising:

9. A terminal device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the entity information extraction method of text data according to any one of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the entity information extraction method of text data according to any one of claims 1 to 7.