CN112948347A - Text data structuring processing method, device, equipment and storage medium - Google Patents

Info

Publication number
CN112948347A
Authority
CN
China
Prior art keywords
structured
data
text data
pieces
preset
Legal status
Pending
Application number
CN201911265046.5A
Other languages
Chinese (zh)
Inventor
郝东林
Current Assignee
Beijing Yiyiyun Technology Co ltd
Original Assignee
Beijing Yiyiyun Technology Co ltd
Application filed by Beijing Yiyiyun Technology Co ltd
Priority to CN201911265046.5A
Publication of CN112948347A
Legal status: Pending (current)

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
              • G06F16/21 Design, administration or maintenance of databases
              • G06F16/24 Querying
                • G06F16/245 Query processing
                  • G06F16/2455 Query execution
                    • G06F16/24564 Applying rules; Deductive queries
              • G06F16/28 Databases characterised by their database models, e.g. relational or object models
                • G06F16/284 Relational databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a text data structuring processing method, device, equipment, and storage medium. The method comprises the following steps: determining a preset structuring rule; performing structuring processing on a plurality of pieces of text data according to the preset structuring rule to generate a plurality of pieces of structured data corresponding to the text data; determining the accuracy and recall rate of the pieces of structured data according to the structured data and a plurality of pieces of preset reference structured data; and, when the accuracy is greater than or equal to a first preset threshold and the recall rate is greater than or equal to a second preset threshold, determining that the preset structuring rule is to be used for structuring the text data. With the text data structuring processing method provided by the invention, a structuring rule with high accuracy and recall can be identified, which meets requirements such as automatically executing the structuring workflow for massive amounts of data while ensuring the quality and efficiency of data structuring.

Description

Text data structuring processing method, device, equipment and storage medium
Technical Field
The invention relates to the field of text processing, and in particular to a text data structuring processing method and device, an electronic device, and a computer-readable storage medium.
Background
In the information age, as the volume of digital information expands rapidly, raw data grows ever larger and more complex, which makes processing and applying that data directly very difficult. As a result, a very large number of data structuring tasks arise in every field of daily life. Taking the medical field as an example, the medical history records, family history records, and similar records stored in each hospital's database contain massive amounts of text, and data structuring tasks usually involve many processing steps.
At present, data structuring tasks depend heavily on human involvement. Each processing step requires a large number of operators in the corresponding roles, and when one step is finished its operators must notify the operators of the next step orally or by instant message. This causes serious drawbacks: the hand-offs are error-prone, raw data is not updated in time, intermediate data and operation records cannot be preserved, and both the quality and the efficiency of data structuring are low.
The above information disclosed in this Background section is only for enhancing understanding of the background of the invention, and it may therefore contain information that does not constitute prior art already known to a person of ordinary skill in the art.
Disclosure of Invention
In view of problems with current data structuring work such as heavy dependence on manual labor, poor structuring quality, and low efficiency, the invention provides a text data structuring processing method, a text data structuring processing device, an electronic device, and a computer-readable storage medium.
Additional features and advantages of the invention will be set forth in the detailed description which follows, or may be learned by practice of the invention.
According to an aspect of the present invention, there is provided a text data structuring processing method, including: determining a preset structuring rule; performing structuring processing on a plurality of pieces of text data according to the preset structuring rule to generate a plurality of pieces of structured data corresponding to the text data; determining the accuracy and recall rate of the pieces of structured data according to the structured data and a plurality of pieces of preset reference structured data; and, when the accuracy is greater than or equal to a first preset threshold and the recall rate is greater than or equal to a second preset threshold, determining that the preset structuring rule is to be used for structuring the text data.
According to an embodiment of the present invention, when the accuracy is less than the first preset threshold or the recall rate is less than the second preset threshold, the method further includes: adjusting the preset structuring rule so that the pieces of structured data generated according to the adjusted rule have an accuracy greater than or equal to the first preset threshold and a recall rate greater than or equal to the second preset threshold.
According to an embodiment of the present invention, determining that the preset structuring rule is to be used for structuring the text data includes: acquiring a plurality of pieces of text data to be structured; and performing the structuring processing on them according to the preset structuring rule.
According to an embodiment of the present invention, before the structuring processing is performed on the pieces of text data to be structured, the method further includes performing at least one of the following preprocessing operations on them: removing duplicate data, segmenting fields, and counting word frequency information.
According to an embodiment of the present invention, the preset structuring rule includes at least two designated entity objects and a designated relationship between the designated entity objects.
According to an embodiment of the present invention, performing the structuring processing on the pieces of text data according to the preset structuring rule to generate the corresponding pieces of structured data includes performing the following operations for each piece of text data: identifying, according to the preset structuring rule, a matching field corresponding to each designated entity object in the text data; determining whether the matching fields satisfy the designated relationship between the corresponding designated entity objects; and, when they do, generating structured data containing the matching fields. A matching field comprises a field identical to the designated entity object and/or a field identical to a subordinate entity object of the designated entity object.
According to an embodiment of the present invention, determining the accuracy and recall rate of the pieces of structured data includes: determining the number of pieces of structured data that are identical to corresponding pieces of reference structured data; determining the quotient of this number and the total number of pieces of structured data as the accuracy; and determining the quotient of this number and the total number of pieces of reference structured data as the recall rate.
According to another aspect of the present invention, there is provided a text data structuring processing apparatus, comprising: a rule determining module configured to determine a preset structuring rule; a data processing module configured to perform structuring processing on a plurality of pieces of text data according to the preset structuring rule to generate a plurality of pieces of structured data corresponding to the text data; a result comparison module configured to determine the accuracy and recall rate of the pieces of structured data according to the structured data and a plurality of pieces of preset reference structured data; and a rule judging module configured to determine that the preset structuring rule is to be used for structuring the text data when the accuracy is greater than or equal to a first preset threshold and the recall rate is greater than or equal to a second preset threshold.
According to still another aspect of the present invention, there is provided an electronic device, including a memory, a processor, and executable instructions stored in the memory and executable on the processor, wherein the processor implements the text data structuring processing method when executing the executable instructions.
According to yet another aspect of the present invention, there is provided a computer-readable storage medium having stored thereon computer-executable instructions that, when executed by a processor, implement any of the text data structuring methods described above.
With the text data structuring processing method provided by the invention, the results of processing test data with a predefined rule are compared with the results of manually processing the same test data, so that a structuring rule with high accuracy and recall can be identified. The method thereby meets task requirements such as automatically executing the structuring workflow for massive amounts of text data and improving the quality and efficiency of data structuring.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings.
FIG. 1 is a flow diagram illustrating a text data structuring process according to an exemplary embodiment.
FIG. 2 is a flow diagram illustrating a method of processing text data according to preset structured rules in accordance with an exemplary embodiment.
FIG. 3 is a flow diagram illustrating a method of determining structured data recall according to an exemplary embodiment.
Fig. 4 is a block diagram illustrating a text data structuring processing device according to an exemplary embodiment.
Fig. 5 is a schematic structural diagram of an electronic device according to an exemplary embodiment.
FIG. 6 is a schematic diagram illustrating a computer-readable storage medium according to an example embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The drawings are merely schematic illustrations of the invention and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, apparatus, steps, and so forth. In other instances, well-known structures, methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the invention.
Further, in the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically defined otherwise. The symbol "/" generally indicates an "or" relationship between the associated objects before and after it. The terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of the technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature.
As mentioned above, data structuring tasks currently depend heavily on human involvement and maintenance. The invention therefore provides a text data structuring processing method that identifies a structuring rule with high accuracy and recall by comparing the results of processing test data with a predefined rule against the results of manually processing the same test data. The rule is then used for automatic structuring of massive amounts of text data, which significantly reduces the cost of manual operation and maintenance, avoids errors in the hand-offs between processing steps, and ensures the quality and efficiency of data structuring.
FIG. 1 is a flow diagram illustrating a text data structuring process according to an exemplary embodiment. The text data structuring processing method shown in fig. 1 can be implemented in a medical text structuring task management platform, for example.
Referring to fig. 1, a text data structuring processing method 10 includes:
In step S102, a preset structuring rule is determined.
In step S104, structuring processing is performed on a plurality of pieces of text data according to the preset structuring rule, generating a plurality of pieces of structured data corresponding to the text data. Taking the medical field as an example again, the acquired pieces of text data may come from medical history records, family history records, and the like stored in the databases of any of a number of hospitals, and may include, for example, diagnostic information about tumors, masses, cancers, and so on. Medical personnel can create a text intake task to acquire the text data: they first specify where the original text data is located (for example, the databases of several hospitals), then formulate an extraction rule by writing SQL (Structured Query Language) statements and send these statements to the specified hospital databases to extract data, for example, an original record that contains the patient's name and states that the patient was diagnosed with gastric cancer on 2017-01-01 and was hospitalized for treatment. After extraction, sensitive information such as names and identification numbers can be removed from the original text data in various ways, such as table lookup and regular-expression matching; after this desensitization, for example, the record reads "The patient was diagnosed with gastric cancer on 2017-01-01 and was hospitalized for treatment".
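The extraction and desensitization steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the platform's implementation: SQLite stands in for a hospital database, and the table name `medical_history`, the name lookup table `KNOWN_NAMES`, and the 18-character resident ID pattern are all hypothetical, since the text names none of them.

```python
import re
import sqlite3

# Hypothetical lookup table of patient names; in practice it might come from a patient index.
KNOWN_NAMES = {"Zhang San", "Li Si"}

# Assumed pattern: 18-character mainland China resident ID (17 digits + a digit or X).
ID_NUMBER = re.compile(r"\b\d{17}[\dXx]\b")


def extract(conn: sqlite3.Connection, sql: str) -> list[str]:
    """Run an extraction rule written as SQL and return the first column of each row."""
    return [row[0] for row in conn.execute(sql)]


def desensitize(text: str, names: set[str] = KNOWN_NAMES) -> str:
    """Remove identifying details via table lookup and regular-expression matching."""
    # Table lookup: replace longer names first so overlapping names are handled.
    for name in sorted(names, key=len, reverse=True):
        text = text.replace(name, "The patient")
    # Regular matching: mask anything shaped like a resident ID number.
    return ID_NUMBER.sub("[ID]", text)


if __name__ == "__main__":
    # Stand-in for one hospital database; table and column names are hypothetical.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE medical_history (note TEXT)")
    conn.executemany(
        "INSERT INTO medical_history VALUES (?)",
        [
            ("Zhang San was diagnosed with gastric cancer on 2017-01-01 "
             "and was hospitalized for treatment.",),
            ("Li Si, ID 11010119900307123X, has a family history of diabetes.",),
        ],
    )
    rule = "SELECT note FROM medical_history WHERE note LIKE '%cancer%'"
    for note in extract(conn, rule):
        print(desensitize(note))
```

Replacing names with a fixed phrase only reads well when the name opens the sentence; a production desensitizer would need a more careful substitution strategy.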
It should be noted that the present invention is not limited to the medical field, to a particular type of text data, or to a particular data extraction method.
In some embodiments, the preset structuring rule includes at least two designated entity objects and a designated relationship between the designated entity objects.
Correspondingly, in some embodiments, as shown in FIG. 2, step S104 may further include performing the following operations for each piece of text data:
In step S1042, the matching field corresponding to each designated entity object in the text data is identified according to the preset structuring rule.
A matching field includes a field identical to the designated entity object and/or a field identical to a subordinate entity object of the designated entity object. For example, if "tumor" is designated as an entity object in the medical domain, then "lung cancer", "gastric cancer", and the like are subordinate entity objects of the designated entity object "tumor".
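Step S1042, with subordinate entity objects, can be sketched as a vocabulary lookup plus substring search. The vocabulary `SUBORDINATES` and the function `matching_fields` are hypothetical; a real system would presumably draw on a curated medical terminology, which the text does not name.

```python
# Hypothetical vocabulary fragment: designated entity object -> subordinate entity objects.
SUBORDINATES: dict[str, set[str]] = {
    "tumor": {"lung cancer", "gastric cancer", "mass"},
    "relative": {"sister", "brother", "mother", "father"},
}


def matching_fields(text: str, entity: str) -> list[tuple[str, int]]:
    """Return (field, start offset) for every occurrence of the designated entity
    object itself or any of its subordinate entity objects, ordered by position."""
    terms = {entity} | SUBORDINATES.get(entity, set())
    hits = []
    for term in terms:
        start = text.find(term)
        while start != -1:
            hits.append((term, start))
            start = text.find(term, start + 1)
    return sorted(hits, key=lambda hit: hit[1])
```

Because plain substring search is used, overlapping terms (for example "cancer" and "gastric cancer", if both were listed) would both be reported; resolving such overlaps is outside this sketch.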
In step S1044, it is determined whether the matching fields satisfy the designated relationship between the corresponding designated entity objects.
In step S1046, when the matching fields satisfy the designated relationship between the corresponding designated entity objects, structured data containing the matching fields is generated.
For example, suppose the preset structuring rule is as follows. Designated entity object A: a date in the normalized XXXX-YY-ZZ form; designated entity object B: tumor; designated entity object C: words expressing negation, such as "no" or "none"; designated relationship: A and B are located in the same sentence, and C does not occur within the 5 characters preceding B; output: {date: A and/or a subordinate entity object of A, disease: B and/or a subordinate entity object of B}. Structuring the desensitized text data "The patient was diagnosed with gastric cancer on 2017-01-01 and was hospitalized for treatment" with this rule generates the structured data {date: 2017-01-01, disease: gastric cancer}.
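A minimal sketch of this first example rule, assuming English text: sentences are split at `.;!?`, the date pattern is written as `\d{4}-\d{2}-\d{2}`, and the tumor vocabulary and negation cues are hypothetical English stand-ins. The 5-character window is counted in English characters here, whereas the original rule presumably counts Chinese characters.

```python
import re

# Designated entity object A: assumed date pattern XXXX-YY-ZZ (e.g. 2017-01-01).
DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
# Designated entity object B: "tumor" plus hypothetical subordinate entity objects.
TUMOR_TERMS = ("tumor", "lung cancer", "gastric cancer")
# Designated entity object C: hypothetical English negation cues.
NEGATION_CUES = ("no ", "none", "not ")
NEGATION_WINDOW = 5  # C must not occur in the 5 characters before B


def structure(text: str) -> list[dict[str, str]]:
    """Apply the example rule and emit {"date": ..., "disease": ...} records."""
    records = []
    # Relationship, part 1: A and B must be in the same sentence.
    for sentence in re.split(r"[.;!?]", text):
        dates = DATE.findall(sentence)
        if not dates:
            continue
        for term in TUMOR_TERMS:
            pos = sentence.find(term)
            if pos == -1:
                continue
            # Relationship, part 2: no negation cue just before B.
            preceding = sentence[max(0, pos - NEGATION_WINDOW):pos]
            if any(cue in preceding for cue in NEGATION_CUES):
                continue
            records.append({"date": dates[0], "disease": term})
    return records
```

Applied to "The patient was diagnosed with gastric cancer on 2017-01-01 and was hospitalized for treatment.", this returns `[{"date": "2017-01-01", "disease": "gastric cancer"}]`, while "On 2017-01-01 there was no gastric cancer." yields no record because "no " falls inside the window.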
As another example, suppose the preset structuring rule is as follows. Designated entity object A: relative; designated entity object B: disease; designated relationship: B occurs within the 10 characters following A; output: {relative: A and/or a subordinate entity object of A, disease: B and/or a subordinate entity object of B}. Structuring the text data "sister has diabetes" with this rule generates the structured data {relative: sister, disease: diabetes}.
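The second example rule reduces to a proximity check. In this sketch the vocabularies are hypothetical, "within the 10 characters following A" is interpreted as B starting at most 10 characters after the end of A, only the first occurrence of each relative term is examined, and characters are counted in the English text rather than the original Chinese.

```python
# Designated entity objects A (relative) and B (disease); vocabularies are hypothetical.
RELATIVES = ("sister", "brother", "mother", "father")
DISEASES = ("diabetes", "hypertension", "gastric cancer")
MAX_GAP = 10  # B must start within 10 characters after the end of A


def structure_family_history(text: str) -> list[dict[str, str]]:
    """Emit {"relative": ..., "disease": ...} when B follows A closely enough."""
    records = []
    for relative in RELATIVES:
        a = text.find(relative)  # first occurrence only, for brevity
        if a == -1:
            continue
        end_of_a = a + len(relative)
        for disease in DISEASES:
            b = text.find(disease, end_of_a)  # B must come after A
            if b != -1 and b - end_of_a <= MAX_GAP:
                records.append({"relative": relative, "disease": disease})
    return records
```

For "sister has diabetes" the gap between "sister" and "diabetes" is 5 characters, so the call returns `[{"relative": "sister", "disease": "diabetes"}]`.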
In step S106, the accuracy and recall rate of the pieces of structured data are determined according to the structured data and the plurality of pieces of preset reference structured data.
The pieces of reference structured data may be structured data obtained by manually annotating the same text data.
In some embodiments, as shown in FIG. 3, step S106 may further include:
In step S1062, the number of pieces of structured data that are identical to corresponding pieces of reference structured data is determined.
In step S1064, the quotient of this number and the total number of pieces of structured data is determined as the accuracy of the structured data.
In step S1066, the quotient of this number and the total number of pieces of reference structured data is determined as the recall rate of the structured data.
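Steps S1062 to S1066 amount to precision and recall over exact matches: accuracy = |matched| / |structured| and recall = |matched| / |reference|. The sketch below assumes that a piece of structured data counts as "the same" as a reference piece only when every field is equal; the text does not say whether partial matches count.

```python
from collections import Counter

Record = dict[str, str]


def _key(record: Record) -> tuple[tuple[str, str], ...]:
    """Order-independent, hashable form of a record."""
    return tuple(sorted(record.items()))


def accuracy_and_recall(structured: list[Record], reference: list[Record]) -> tuple[float, float]:
    """Steps S1062 to S1066 under an exact-match assumption."""
    # S1062: number of output records that also appear among the references;
    # the Counter intersection counts each duplicate at most min(count) times.
    same = sum((Counter(map(_key, structured)) & Counter(map(_key, reference))).values())
    # S1064: accuracy = same / total number of output records.
    accuracy = same / len(structured) if structured else 0.0
    # S1066: recall = same / total number of reference records.
    recall = same / len(reference) if reference else 0.0
    return accuracy, recall
```

With three rule outputs of which two match four manual references, this gives an accuracy of 2/3 and a recall of 2/4 = 0.5.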
In step S108, when the accuracy is greater than or equal to the first preset threshold and the recall rate is greater than or equal to the second preset threshold, it is determined that the preset structuring rule is to be used for structuring the text data.
In some embodiments, if the accuracy and/or the recall rate determined in steps S1062 to S1066 is less than the corresponding preset threshold, the structuring rule needs to be refined or modified according to the comparison between the structured data and the reference structured data, and steps S102 to S106 are repeated until neither the accuracy nor the recall rate is below its corresponding threshold.
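Step S108 and the refinement loop can be sketched as a driver that receives rule application, evaluation, and refinement as callbacks. Assumptions: the default thresholds 0.9 and 0.8 are placeholders (the text gives no values), refinement is modeled as a callback because in the described workflow people perform it by comparing outputs with references, and `max_rounds` is an added safeguard the text does not mention.

```python
from typing import Callable, TypeVar

Rule = TypeVar("Rule")
Record = dict[str, str]


def passes(accuracy: float, recall: float, t_acc: float, t_rec: float) -> bool:
    """Step S108: adopt the rule only if both metrics reach their thresholds."""
    return accuracy >= t_acc and recall >= t_rec


def validate_rule(
    rule: Rule,
    texts: list[str],
    reference: list[Record],
    apply_rule: Callable[[Rule, str], list[Record]],
    evaluate: Callable[[list[Record], list[Record]], tuple[float, float]],
    refine: Callable[[Rule, list[Record], list[Record]], Rule],
    t_acc: float = 0.9,
    t_rec: float = 0.8,
    max_rounds: int = 10,
) -> tuple[Rule, float, float]:
    """Repeat steps S102 to S106 until the rule passes or max_rounds is reached."""
    for _ in range(max_rounds):
        structured = [rec for text in texts for rec in apply_rule(rule, text)]
        accuracy, recall = evaluate(structured, reference)
        if passes(accuracy, recall, t_acc, t_rec):
            return rule, accuracy, recall
        # Adjust the rule based on the comparison between output and reference data.
        rule = refine(rule, structured, reference)
    raise RuntimeError("rule did not reach the thresholds within max_rounds")
```

In a toy run where the rule is a set of disease terms and refinement adds terms missed in the references, a rule starting as `{"diabetes"}` fails the first round (recall 0.5) and passes the second once "hypertension" has been added.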
Structuring rules that have passed this comparison and verification can continue to be applied on the medical text structuring task management platform, or can be distributed by the platform to designated hospitals so that each hospital's internal system can also produce structured data in batches.
Specifically, in some embodiments, the pieces of text data acquired in step S104 may be text data extracted from the pieces of text data to be structured. Correspondingly, step S108 may further include: acquiring a plurality of pieces of text data to be structured; and performing structuring processing on them according to the preset structuring rule.
In addition, in some embodiments, before the structuring processing is performed on the pieces of text data to be structured, the method 10 may further include performing at least one of the following preprocessing operations on them: removing duplicate data, segmenting fields, and counting word frequency information, so as to generate data overviews (such as the proportion of original text data coming from each source) for medical staff to review.
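The preprocessing operations can be sketched as four small functions. The field delimiters and the English word tokenization are assumptions made for illustration; Chinese records would need a proper word segmenter, which the text does not specify, and `source_ratios` is a hypothetical helper for the source-proportion overview.

```python
import re
from collections import Counter


def deduplicate(texts: list[str]) -> list[str]:
    """Remove repeated records (ignoring surrounding whitespace), keeping the first."""
    seen: set[str] = set()
    unique = []
    for text in texts:
        key = text.strip()
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique


def segment_fields(text: str) -> list[str]:
    """Split a record into fields at assumed delimiters (. ; , and newlines)."""
    return [part.strip() for part in re.split(r"[.;,\n]", text) if part.strip()]


def word_frequencies(texts: list[str]) -> Counter:
    """Count lower-cased word tokens across all records."""
    return Counter(word for text in texts for word in re.findall(r"[a-z0-9-]+", text.lower()))


def source_ratios(sources: list[str]) -> dict[str, float]:
    """Proportion of records from each source, for the overview shown to staff."""
    counts = Counter(sources)
    return {source: n / len(sources) for source, n in counts.items()}
```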
With the text data structuring processing method provided by the embodiments of the present invention, the results of processing test data with a predefined rule are compared with the results of manually processing the same test data, so that a structuring rule with high accuracy and recall can be identified. The method can thus meet task requirements such as automatically executing the structuring workflow for massive amounts of text data and improving the quality and efficiency of data structuring.
It should be clearly understood that the present disclosure describes how to make and use particular examples, but the principles of the present disclosure are not limited to any details of these examples. Rather, these principles can be applied to many other embodiments based on the teachings of the present disclosure.
Those skilled in the art will appreciate that all or part of the steps for implementing the above embodiments may be implemented as computer programs executed by a CPU. When executed by the CPU, the computer program performs the functions defined by the method provided by the present invention. The program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic or optical disk, or the like.
Furthermore, it should be noted that the above-mentioned figures are only schematic illustrations of the processes involved in the method according to exemplary embodiments of the invention, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
The following are embodiments of the apparatus of the present invention that may be used to perform embodiments of the method of the present invention. For details which are not disclosed in the embodiments of the apparatus of the present invention, reference is made to the embodiments of the method of the present invention.
Fig. 4 is a block diagram illustrating a text data structuring processing device according to an exemplary embodiment.
Referring to fig. 4, the text data structuring processing device 20 includes a rule determining module 202, a data processing module 204, a result comparison module 206, and a rule judging module 208.
The rule determining module 202 is configured to determine a preset structuring rule.
The data processing module 204 is configured to perform structuring processing on a plurality of pieces of text data according to the preset structuring rule and to generate a plurality of pieces of structured data corresponding to the text data. The pieces of text data may be text data extracted from a plurality of pieces of text data to be structured.
In some embodiments, the preset structuring rule may include at least two designated entity objects and a designated relationship between the designated entity objects. Correspondingly, the data processing module 204 may further include an entity identification unit, a relation judging unit, and a data generation unit.
The entity identification unit is configured to identify, according to the preset structuring rule, the matching field corresponding to each designated entity object in each piece of text data. A matching field may include a field identical to the designated entity object and/or a field identical to a subordinate entity object of the designated entity object.
The relation judging unit is configured to determine whether the matching fields satisfy the designated relationship between the corresponding designated entity objects.
The data generation unit is configured to generate structured data containing the matching fields when the matching fields satisfy the designated relationship between the corresponding designated entity objects.
The result comparison module 206 is configured to determine the accuracy and recall rate of the pieces of structured data according to the structured data and a plurality of pieces of preset reference structured data.
The pieces of reference structured data may be structured data generated by manually annotating the text data.
In some embodiments, the result comparison module 206 may further include a first determining unit, a second determining unit, and a third determining unit.
The first determining unit is configured to determine the number of pieces of structured data that are identical to corresponding pieces of reference structured data.
The second determining unit is configured to determine the quotient of this number and the total number of pieces of structured data as the accuracy of the structured data.
The third determining unit is configured to determine the quotient of this number and the total number of pieces of reference structured data as the recall rate of the structured data.
The rule judging module 208 is configured to determine that the preset structuring rule is to be used for structuring the text data when the accuracy is greater than or equal to the first preset threshold and the recall rate is greater than or equal to the second preset threshold.
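The functional modules of device 20 can be mapped onto classes, as in the sketch below. It illustrates the module boundaries under stated assumptions and is not the patent's implementation: the `StructuringRule` layout, the ordered-proximity relationship, exact-match comparison, and the default thresholds 0.9 and 0.8 are all hypothetical, and the reference numerals appear only as comments.

```python
from collections import Counter
from dataclasses import dataclass

Record = dict[str, str]


@dataclass
class StructuringRule:
    # Designated entity objects: output key -> accepted terms (entity plus subordinates).
    entities: dict[str, tuple[str, ...]]
    # Assumed designated relationship: each entity starts at most max_gap characters
    # after the end of the previous one, in the listed order.
    max_gap: int


class RuleDeterminingModule:  # module 202
    def __init__(self, rule: StructuringRule):
        self.rule = rule

    def determine(self) -> StructuringRule:
        return self.rule


class DataProcessingModule:  # module 204
    def identify(self, rule: StructuringRule, text: str) -> dict[str, tuple[int, str]]:
        """Entity identification unit: earliest matching field per designated entity."""
        matches = {}
        for key, terms in rule.entities.items():
            hits = [(text.find(term), term) for term in terms if term in text]
            if hits:
                matches[key] = min(hits)
        return matches

    def relation_holds(self, rule: StructuringRule, matches: dict[str, tuple[int, str]]) -> bool:
        """Relation judging unit: every entity found, in order, within max_gap."""
        if len(matches) != len(rule.entities):
            return False
        spans = [matches[key] for key in rule.entities]
        for (start, term), (next_start, _) in zip(spans, spans[1:]):
            gap = next_start - (start + len(term))
            if gap < 0 or gap > rule.max_gap:
                return False
        return True

    def generate(self, matches: dict[str, tuple[int, str]]) -> Record:
        """Data generation unit: keep only the matched fields."""
        return {key: term for key, (_, term) in matches.items()}

    def process(self, rule: StructuringRule, texts: list[str]) -> list[Record]:
        records = []
        for text in texts:
            matches = self.identify(rule, text)
            if self.relation_holds(rule, matches):
                records.append(self.generate(matches))
        return records


class ResultComparisonModule:  # module 206
    def compare(self, structured: list[Record], reference: list[Record]) -> tuple[float, float]:
        """First to third determining units: exact-match accuracy and recall."""
        def key(record: Record) -> tuple:
            return tuple(sorted(record.items()))

        same = sum((Counter(map(key, structured)) & Counter(map(key, reference))).values())
        accuracy = same / len(structured) if structured else 0.0
        recall = same / len(reference) if reference else 0.0
        return accuracy, recall


class RuleJudgingModule:  # module 208
    def __init__(self, t_acc: float, t_rec: float):
        self.t_acc, self.t_rec = t_acc, t_rec

    def judge(self, accuracy: float, recall: float) -> bool:
        return accuracy >= self.t_acc and recall >= self.t_rec


class TextStructuringDevice:  # device 20
    def __init__(self, rule: StructuringRule, t_acc: float = 0.9, t_rec: float = 0.8):
        self.rule_module = RuleDeterminingModule(rule)
        self.processing = DataProcessingModule()
        self.comparison = ResultComparisonModule()
        self.judging = RuleJudgingModule(t_acc, t_rec)

    def validate(self, texts: list[str], reference: list[Record]) -> bool:
        """True if the rule should be adopted for structuring the text data."""
        rule = self.rule_module.determine()
        structured = self.processing.process(rule, texts)
        accuracy, recall = self.comparison.compare(structured, reference)
        return self.judging.judge(accuracy, recall)
```

Keeping each module as a separate object mirrors the block diagram, although, as noted below, the functional blocks need not correspond to physically separate entities.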
With the text data structuring processing device provided by the embodiments of the present invention, the results of processing test data with a predefined rule are compared with the results of manually processing the same test data, so that a structuring rule with high accuracy and recall can be identified. The device can thus meet task requirements such as automatically executing the structuring workflow for massive amounts of text data and improving the quality and efficiency of data structuring.
It is noted that the block diagrams shown in the above figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
Fig. 5 is a schematic structural diagram of an electronic device according to an exemplary embodiment. It should be noted that the electronic device shown in fig. 5 is only an example and imposes no limitation on the functions or scope of use of the embodiments of the present invention.
As shown in fig. 5, the electronic device 600 takes the form of a general-purpose computer device. The components of the electronic device 600 include at least one central processing unit (CPU) 601, which may perform various appropriate actions and processes according to program code stored in a read-only memory (ROM) 602 or loaded from at least one storage unit 608 into a random access memory (RAM) 603.
In particular, according to an embodiment of the present invention, the program code may be executed by the central processing unit 601, such that the central processing unit 601 performs the steps according to various exemplary embodiments of the present invention described in the above-mentioned method embodiment section of the present specification. For example, the central processing unit 601 may perform the steps as shown in fig. 1 to 3.
In the RAM 603, various programs and data necessary for the operation of the electronic apparatus 600 are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
The following components are connected to the I/O interface 605: an input unit 606 including a keyboard, a mouse, and the like; an output unit 607 including a display, such as a cathode ray tube (CRT) or a liquid crystal display (LCD), and a speaker; a storage unit 608 including a hard disk and the like; and a communication unit 609 including a network interface card such as a LAN card or a modem. The communication unit 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage unit 608 as needed.
FIG. 6 is a schematic diagram illustrating a computer-readable storage medium according to an example embodiment.
Referring to fig. 6, a program product 700 configured to implement the above method according to an embodiment of the present invention is described. It may employ a portable compact disc read-only memory (CD-ROM), include program code, and run on a terminal device such as a personal computer. However, the program product of the present invention is not limited in this regard; in the present document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer-readable medium carries one or more programs which, when executed by a device, cause the device to implement the functions shown in figs. 1 to 3.
Exemplary embodiments of the present invention are specifically illustrated and described above. It is to be understood that the invention is not limited to the precise construction, arrangements, or instrumentalities described herein; on the contrary, the invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (10)

1. A text data structuring processing method, characterized by comprising the following steps:
determining a preset structuring rule;
performing structuring processing on a plurality of pieces of text data according to the preset structuring rule to generate a plurality of pieces of structured data corresponding to the plurality of pieces of text data;
determining an accuracy rate and a recall rate of the plurality of pieces of structured data according to the plurality of pieces of structured data and a plurality of pieces of preset reference structured data;
and when the accuracy rate is greater than or equal to a first preset threshold and the recall rate is greater than or equal to a second preset threshold, determining that the preset structuring rule is to be used for structuring the text data.
2. The method of claim 1, wherein when the accuracy rate is less than the first preset threshold or the recall rate is less than the second preset threshold, the method further comprises:
adjusting the preset structuring rule so that the plurality of pieces of structured data generated according to the adjusted structuring rule have an accuracy rate greater than or equal to the first preset threshold and a recall rate greater than or equal to the second preset threshold.
3. The method of claim 1, wherein structuring the text data using the preset structuring rule, once it is determined to be used, comprises:
acquiring a plurality of pieces of text data to be structured;
and performing the structuring processing on the plurality of pieces of text data to be structured according to the preset structuring rule.
4. The method according to claim 3, wherein before the structuring processing is performed on the plurality of pieces of text data to be structured, the method further comprises:
performing at least one of the following preprocessing operations on the text data to be structured: removing duplicate data from the text data to be structured, segmenting fields in the text data to be structured, and counting word-frequency information of the text data to be structured.
5. The method of claim 1, wherein the preset structuring rule comprises: at least two designated entity objects and a designated relationship between the designated entity objects.
6. The method according to claim 5, wherein performing structuring processing on the plurality of pieces of text data according to the preset structuring rule to generate the plurality of pieces of structured data corresponding to the plurality of pieces of text data comprises performing the following operations for each piece of text data:
identifying, in the text data and according to the preset structuring rule, a matching field corresponding to each designated entity object;
determining whether the matching fields satisfy the designated relationship between the corresponding designated entity objects;
when the matching fields satisfy the designated relationship between the corresponding designated entity objects, generating structured data containing the matching fields;
wherein a matching field comprises: a field identical to the designated entity object and/or a field identical to an entity object at a lower level than (subordinate to) the designated entity object.
7. The method of claim 1, wherein determining the accuracy rate and the recall rate of the plurality of pieces of structured data comprises:
determining the number of pieces of structured data that are identical to corresponding pieces of the reference structured data;
determining the quotient of this number and the total number of pieces of structured data as the accuracy rate of the plurality of pieces of structured data;
determining the quotient of this number and the total number of pieces of reference structured data as the recall rate of the plurality of pieces of structured data.
8. A text data structuring processing device, comprising:
a rule determining module, configured to determine a preset structuring rule;
a data processing module, configured to perform structuring processing on a plurality of pieces of text data according to the preset structuring rule to generate a plurality of pieces of structured data corresponding to the plurality of pieces of text data;
a result comparison module, configured to determine an accuracy rate and a recall rate of the plurality of pieces of structured data according to the plurality of pieces of structured data and a plurality of pieces of preset reference structured data;
and a rule judging module, configured to determine that the preset structuring rule is to be used for structuring the text data when the accuracy rate is greater than or equal to a first preset threshold and the recall rate is greater than or equal to a second preset threshold.
9. An electronic device, comprising: a memory, a processor, and executable instructions stored in the memory and executable by the processor, characterized in that the processor implements the method according to any one of claims 1-7 when executing the executable instructions.
10. A computer-readable storage medium having stored thereon computer-executable instructions, which when executed by a processor, implement the method of any one of claims 1-7.
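Claims 4 to 6 can be illustrated with a minimal Python sketch of the preprocessing and per-text matching steps. It assumes a rule with exactly two designated entity objects, a lookup table of lower-level entity objects for each, and a deliberately simple relation test (the tail field must follow the head field within a fixed character window). `StructuringRule`, `max_gap`, the `\w+` tokenizer and the sample texts are all hypothetical, since the patent does not specify how fields are segmented or how the designated relationship is checked.

```python
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class StructuringRule:
    """Hypothetical rule: two designated entity objects and the relation between them."""
    head: str
    tail: str
    relation: str
    max_gap: int = 20  # assumed relation test: tail follows head within max_gap chars
    # Lower-level entity objects that also count as a match for a designated entity.
    hierarchy: Dict[str, Set[str]] = field(default_factory=dict)

    def terms_for(self, entity: str) -> Set[str]:
        return {entity} | self.hierarchy.get(entity, set())


def preprocess(texts: List[str]) -> Tuple[List[str], Counter]:
    """Drop empty and duplicate texts (keeping first-seen order) and count word frequencies."""
    unique = list(dict.fromkeys(t.strip() for t in texts if t.strip()))
    # Assumption: a simple \w+ tokenizer; Chinese text would need a real word segmenter.
    freq = Counter(w for t in unique for w in re.findall(r"\w+", t.lower()))
    return unique, freq


def find_field(text: str, terms: Set[str]) -> Optional[Tuple[str, int]]:
    """Return the earliest-occurring matching term and its start offset, or None."""
    hits = [(text.find(term), term) for term in terms if term in text]
    if not hits:
        return None
    pos, term = min(hits)
    return term, pos


def structure_text(text: str, rule: StructuringRule) -> Optional[Tuple[str, str, str]]:
    """Emit (head_field, relation, tail_field) if both fields match and the relation holds."""
    head = find_field(text, rule.terms_for(rule.head))
    tail = find_field(text, rule.terms_for(rule.tail))
    if head is None or tail is None:
        return None
    (head_term, head_pos), (tail_term, tail_pos) = head, tail
    gap = tail_pos - (head_pos + len(head_term))
    if 0 <= gap <= rule.max_gap:
        return head_term, rule.relation, tail_term
    return None


rule = StructuringRule(head="analgesic", tail="pain", relation="treats",
                       hierarchy={"analgesic": {"aspirin", "ibuprofen"},
                                  "pain": {"headache", "toothache"}})
texts = ["aspirin relieves headache", "aspirin relieves headache",
         "ibuprofen for toothache", "headache after taking aspirin"]
unique, freq = preprocess(texts)
results = [r for r in (structure_text(t, rule) for t in unique) if r is not None]
print(results)  # [('aspirin', 'treats', 'headache'), ('ibuprofen', 'treats', 'toothache')]
```

The duplicate text is removed during preprocessing, "aspirin" and "headache" match through the lower-level entity table, and the last text is rejected because its fields appear in the wrong order for the assumed relation test. Substring matching and a fixed window are crude stand-ins; a real system would more likely use a proper segmenter and dictionary or a trained extractor, but the control flow (match each entity, check the relation, emit a record) mirrors claim 6.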
CN201911265046.5A 2019-12-11 2019-12-11 Text data structuring processing method, device, equipment and storage medium Pending CN112948347A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911265046.5A CN112948347A (en) 2019-12-11 2019-12-11 Text data structuring processing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911265046.5A CN112948347A (en) 2019-12-11 2019-12-11 Text data structuring processing method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112948347A true CN112948347A (en) 2021-06-11

Family

ID=76226292

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911265046.5A Pending CN112948347A (en) 2019-12-11 2019-12-11 Text data structuring processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112948347A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105824801A (en) * 2015-03-16 2016-08-03 国家计算机网络与信息安全管理中心 Entity relationship rapid extraction method based on automaton
CN109582661A (en) * 2018-11-23 2019-04-05 金色熊猫有限公司 Data structuring evaluation method, device, storage medium and electronic equipment
CN109815500A (en) * 2019-01-25 2019-05-28 杭州绿湾网络科技有限公司 Unstructured official document management method, device, computer equipment and storage medium
CN110032648A (en) * 2019-03-19 2019-07-19 微医云(杭州)控股有限公司 Medical record structured parsing method based on medical-domain entities
CN110347564A (en) * 2019-05-24 2019-10-18 平安普惠企业管理有限公司 Data creation method and device, electronic equipment, storage medium

Similar Documents

Publication Publication Date Title
US7580831B2 (en) Dynamic dictionary and term repository system
CN109634941B (en) Medical data processing method and device, electronic equipment and storage medium
CN112883157B (en) Method and device for standardizing multi-source heterogeneous medical data
US11250035B2 (en) Knowledge graph generating apparatus, method, and non-transitory computer readable storage medium thereof
CN112233746A (en) Method for automatically standardizing medical data
CN110737689B (en) Data standard compliance detection method, device, system and storage medium
CN116541752B (en) Metadata management method, device, computer equipment and storage medium
EP2922018A1 (en) Medical information analysis program, medical information analysis device, and medical information analysis method
US20120259661A1 (en) Systems and methods for data mining of DICOM structured reports
CN111061835B (en) Query method and device, electronic equipment and computer readable storage medium
US20140046694A1 (en) Systems and methods for synoptic element structured reporting
CN109299214B (en) Text information extraction method, text information extraction device, text information extraction medium and electronic equipment
CN113488157B (en) Intelligent diagnosis guiding processing method and device, electronic equipment and storage medium
US10192031B1 (en) System for extracting information from DICOM structured reports
US9881004B2 (en) Gender and name translation from a first to a second language
CN111523309A (en) Medicine information normalization method and device, storage medium and electronic equipment
CN111639161A (en) System information processing method, apparatus, computer system and medium
CN111126034A (en) Medical variable relation processing method and device, computer medium and electronic equipment
CN111063447A (en) Query and text processing method and device, electronic equipment and storage medium
CN112948347A (en) Text data structuring processing method, device, equipment and storage medium
CN113988082A (en) Text processing method and device, electronic equipment and storage medium
CN113139498A (en) Medical bill code matching method and device
CN111241834A (en) Medical care quality evaluation obtaining method, device, medium and terminal equipment
AU2016287770B2 (en) Frameworks and methodologies for enabling searching and/or categorisation of digitised information, including clinical report data
CN116089459B (en) Data retrieval method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination