WO2005122002A2

WO2005122002A2 - Structurized document creation method, and device thereof

Info

Publication number: WO2005122002A2
Application number: PCT/JP2005/000102
Authority: WO
Inventors: Masakazu Fujio; Hideyuki Ban
Original assignee: Hitachi Medical Corp; Masakazu Fujio; Hideyuki Ban
Priority date: 2004-06-07
Filing date: 2005-01-07
Publication date: 2005-12-22
Also published as: JP4649405B2; JPWO2005122002A1

Description

Specification

Structured document creation method and apparatus

Technical field

The present invention relates to a structured document creation method and apparatus suitable for use in a system that supports the creation of image interpretation reports in the medical field, for example in mammography.

Background art

[0002] An image interpretation report is a document in which a specialist observes images captured by CT (Computed Tomography), MRI (Magnetic Resonance Imaging), or other modalities and, if there are abnormal findings, describes their characteristics such as shape, size, and type, together with the diagnosis, instructions, and so on. In recent years, ensuring the quality of these interpretation reports has become an issue.

One problem with conventional interpretation report systems was that differences in description style and level of detail among interpreting doctors made the quality and reliability of an interpretation difficult to judge. For example, if a region that should have been diagnosed is not described, it cannot be determined whether this is an oversight or whether there was simply no apparent abnormality. In addition, some companies currently offer remote image interpretation services, and the requester may not know which doctor will perform the interpretation. In such cases, it is very important that the terms and abbreviations used be unified so that the content is conveyed accurately.

[0003] Against this background, standards for information exchange between medical information systems have been developed. Currently, DICOM (Digital Imaging and Communications in Medicine) and HL7 (Health Level 7) are becoming widespread.

Particularly in radiology departments, the DICOM standard for communicating and storing image data captured by devices such as CT and MRI is spreading, and vendors support it in the form of PACS (Picture Archiving and Communication System) and DICOM-compatible modalities. When products are released, connectivity between different vendors' equipment is checked at an event called Connectathon, held every year by IHE (Integrating the Healthcare Enterprise).

[0004] Within the DICOM standard, DICOM-SR is being developed as the standard for describing interpretation reports. As of January 2004, the DICOM standard consists of 16 base parts and 93 supplements (see, for example, Non-Patent Document 1).

Supplements related to SR (Structured Reporting) include mammography CAD (Suppl. 50), chest CAD (Suppl. 65), catheterization lab (Suppl. 66), circulatory system (Suppl. 71), cardiac echo (Suppl. 72), patient history (Suppl. 75), numerical representation of vascular and ventricular imaging (Suppl. 76), vascular ultrasound (Suppl. 77), fetal and pediatric (Suppl. 78), breast cancer diagnosis (Suppl. 79), and MR angiography/CT angiography (Suppl. X).

[0005] To describe an image interpretation report in a standard structured format such as DICOM-SR, it is necessary to determine which template (structured fixed format) the input sentence applies to, and which values fill the blank parts of that template. For this reason, methods have been adopted in which the content of findings is templated in advance and the template is filled in at the time of input (see, for example, Patent Documents 1, 2, and 3).

Non-Patent Document 1: "DICOM Standard", [online], retrieved May 5, 2004, Internet URL: http://medical.nema.org/Dicom/

Patent Document 1: Japanese Patent Application Laid-Open No. 2001-125994 (paragraphs [0018] to [0034], FIG. 1)

Patent Document 2: Japanese Patent Application Laid-Open No. 2001-126007 (paragraph [0010], FIG. 1)

Patent Document 3: Japanese Patent Application Laid-Open No. 2003-288332 (paragraphs [0031] to [0038], FIGS. 1 and 2)

Disclosure of the Invention

[0006] However, in the conventional template-based interpretation input systems described above, the user must select the template to be used, for example for each diagnostic site, and then select or enter each input item. This made such systems very difficult for doctors to use.

[0007] The present invention has been made in view of the above circumstances. An object of the present invention is to provide a structured document creation method and apparatus that can efficiently convert a doctor's finding sentences into the structured fixed format of each medical field and record them, while reducing the burden on the doctor.

[0008] The structured document creation method and apparatus of the present invention are characterized in that the degree of conformity between an input text and the templates, as well as structured documents stored in the past, is calculated using character string information or linguistic information obtained by language analysis, and the template candidates are narrowed down so that the text can be converted efficiently. To this end, the input text is converted into a structured document according to the template by performing a correlation function operation using a word dictionary that holds the category information needed for structuring.
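The specification does not give the formula for the degree of conformity or the correlation function. As a minimal sketch, assuming that conformity can be approximated by how many of a template's keyword character bigrams also appear in the input text, the narrowing-down step might look like the following. The template IDs and keyword lists are hypothetical placeholders, not DICOM definitions, and the real method additionally uses category information from the word dictionary.

```python
from collections import Counter


def char_bigrams(text: str) -> Counter:
    """Multiset of lowercase character bigrams in text."""
    t = text.lower()
    return Counter(t[i:i + 2] for i in range(len(t) - 1))


def containment(keyword: str, text: str) -> float:
    """Fraction of the keyword's bigrams that also occur in text (0.0 to 1.0)."""
    kw = char_bigrams(keyword)
    if not kw:
        return 0.0
    shared = kw & char_bigrams(text)  # multiset intersection keeps min counts
    return sum(shared.values()) / sum(kw.values())


def rank_templates(text: str, templates: dict, top_k: int = 2) -> list:
    """Return the IDs of the top_k templates by mean keyword containment."""
    scores = {
        tid: sum(containment(kw, text) for kw in kws) / len(kws)
        for tid, kws in templates.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical template keyword lists, for illustration only.
TEMPLATES = {
    "TID4006": ["calcification", "mass", "round"],
    "TID4002": ["follow up", "assessment"],
    "CATH-LAB": ["stenosis", "catheter"],
}
candidates = rank_templates("Round calcifications noted in RT. AC area", TEMPLATES)
```

Keeping only the top few candidates is what makes later structuring cheaper: the expensive matching against a template's full structure is then run on a short list rather than on every template in the template DB.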

Brief Description of Drawings

FIG. 1 is a diagram for explaining a system environment in which the structured document creation apparatus of the present invention is used.

FIG. 2 is a diagram showing a template narrowing-down sequence according to the embodiment of the present invention.

FIG. 3 is a block diagram showing the internal configuration of the structured document creation apparatus according to the embodiment of the present invention.

FIG. 4 is a diagram showing an example of a database and a program module according to the embodiment of the present invention.

FIG. 5(a) is a diagram showing an example of a field-specific context group term dictionary, and FIG. 5(b) is a diagram showing an example of field-specific SR templates (template DB).

FIG. 6 is an example of an SR template according to the embodiment of the present invention.

FIG. 7 is a diagram showing an example of a visual display of a structured report according to the embodiment of the present invention.

FIG. 8 is a flowchart according to the embodiment of the present invention.

FIG. 9 is a conceptual diagram illustrating the operation of the structuring processing according to the embodiment of the present invention.

FIG. 10 is a diagram showing an example of a context group defined in DICOM-SR.

FIG. 11 is a diagram showing an example of how an intermediate processing result of an input sentence is expressed according to the embodiment of the present invention.

FIG. 12 is a diagram showing another example of how an intermediate processing result of an input sentence is expressed according to the embodiment of the present invention.

FIG. 13 is a diagram showing an example of an SR tree structure according to the embodiment of the present invention.

FIG. 14 is a diagram showing the concept of an SR tree structure.

FIG. 15 is a diagram showing an example of a Content Item defined in DICOM-SR.

FIG. 16 is a diagram showing an example of a constraint on Relationship in DICOM-SR.

FIG. 17 is a diagram showing one entry of an SR subtree bank according to the embodiment of the present invention.

FIG. 18 is a diagram showing an example of a time-series arrangement of interpretation reports according to the embodiment of the present invention.

BEST MODE FOR CARRYING OUT THE INVENTION

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. First, the structure defined by DICOM-SR will be briefly described. A structure defined by DICOM-SR can contain not only information such as disease names, diagnoses, plans, shapes, counts, sizes, and dates, but also references to target images and to past reports. FIG. 14 shows the concept of a structured document. As shown in FIG. 14, each concept is expressed by information units called "Content Items" (reference numeral 1001) and by "Relationships" (reference numeral 1002), which define the relationships between Content Items. A structured document in DICOM-SR is represented by a tree structure with Content Items 1001 as nodes and Relationships 1002 as arcs.

[0011] As shown in FIG. 15, a Content Item 1001 is represented by a triple consisting of a "value type" (reference numeral 1101), a "concept name" (reference numeral 1102), and the corresponding "value" (reference numeral 1103). Six value types 1101 are illustrated here: "CONTAINER", "TEXT", "CODE", "NUM", "PNAME", and "SCOORD".

"CONTAINER" is used not only as the title of the SR document (corresponding to the root node in the SR tree structure) but also as a container for grouping Content Items and organizing them hierarchically. When the group has a meaning, an appropriate name can be written in the concept name 1102; otherwise, it can be used as a simple container without a name. "CONTAINER" is the only value type 1101 whose Content Item has no actual value 1103.

"TEXT" has a character string as its value 1103 and always requires a concept name 1102 indicating what the string represents. The term used as the concept name 1102 is not arbitrary; it is a standard term to which a code is assigned in DICOM-SR. For "CODE", both the concept name 1102 and the value 1103 are expressed as standard terms. "NUM" has a concept name 1102 for a numerical quantity and a numerical value 1103 with units.

“PNAME” is used to represent the concept name 1102 and its value 1103 relating to a personal name such as “doctor” and “nurse”. “SCOORD” represents the coordinates in the image referenced by the report. In addition, a Content Item is also defined to express the reference to the target image of the report, and to express the reference to the past report.

[0013] The types of Relationship 1002, the other important component of the SR tree structure, are "Contain", "Has Properties", "Inferred From", "By-Reference", "Selected From", "Has Observation Context", "Has Acquisition Context", and "Has Concept Modifier" (see, for example, FIGS. 14 and 16). For the "Contain" relationship, the parent node type is "CONTAINER"; it is used to group child nodes when the grouping itself carries no further meaning. "Has Properties" is used to further refine the content of the parent node, and "Inferred From" is used when the symptoms or other evidence on which a diagnosis was based are included as child nodes. "By-Reference" and "Selected From" are used to reference images. "Has Observation Context" and "Has Acquisition Context" are used to describe attributes such as the observation date and time and the observer. "Has Concept Modifier" is used to add expressions that modify symptoms.
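The Content Item triple and the Relationship arcs described above map naturally onto a small tree type. The following is a hypothetical in-memory model, assuming plain strings for concept names and values; it is not the DICOM-SR encoding, which uses DICOM attributes and coded concepts.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple


@dataclass
class ContentItem:
    """One SR node: the (value type, concept name, value) triple."""
    value_type: str                # e.g. "CONTAINER", "TEXT", "CODE", "NUM"
    concept_name: str              # standard term naming the concept
    value: Optional[Any] = None    # None for CONTAINER, which has no value
    children: List[Tuple[str, "ContentItem"]] = field(default_factory=list)

    def add(self, relationship: str, child: "ContentItem") -> "ContentItem":
        """Attach child under this node via a Relationship arc; return child."""
        self.children.append((relationship, child))
        return child


def outline(item: ContentItem, relationship: str = "", depth: int = 0) -> List[str]:
    """Render the tree as indented lines, one per Content Item."""
    arc = f"{relationship} -> " if relationship else ""
    val = "" if item.value is None else f" = {item.value}"
    lines = ["  " * depth + f"{arc}{item.value_type}: {item.concept_name}{val}"]
    for rel, child in item.children:
        lines.extend(outline(child, rel, depth + 1))
    return lines


# Illustrative tree; the concept names are placeholders, not coded DICOM terms.
root = ContentItem("CONTAINER", "Mammography Report")
finding = root.add("Contain", ContentItem("CODE", "Finding", "Calcification Cluster"))
finding.add("Has Properties", ContentItem("NUM", "Number of calcifications", "3"))
print("\n".join(outline(root)))
```

Storing the relationship on the arc (the tuple in `children`) rather than on the child mirrors the text: the relationship belongs to the parent/child pair, not to either node alone.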

[0014] An SR report according to the present embodiment consists of Content Items 1001, as shown in FIG., connected by Relationships 1002. However, not every such tree structure is allowed; the following restrictions apply.

(1) The content item type of the root node is “CONTAINER”, and the concept name 1102 describes the title of the SR report.

(2) Only the following value types 1101 can be used.

TEXT, CODE, NUM, DATETIME, DATE, TIME, UIDREF, PNAME, SCOORD, TCOORD, COMPOSITE, IMAGE, WAVEFORM, CONTAINER

(3) The Relationships 1002 that can be used for each value type 1101 are limited to those shown in table form in FIG. 16.

(4) In the SR tree structure, references to ancestor Content Items 1001 are prohibited.
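Restrictions (1), (2), and (4) can be checked mechanically on a tree. The sketch below is a hypothetical validator, assuming each node carries an integer ID and an optional ID of a node it references; restriction (3) is left out because the relationship table of FIG. 16 is not reproduced in this text.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional

# Value types permitted by restriction (2).
ALLOWED_VALUE_TYPES = {
    "TEXT", "CODE", "NUM", "DATETIME", "DATE", "TIME", "UIDREF", "PNAME",
    "SCOORD", "TCOORD", "COMPOSITE", "IMAGE", "WAVEFORM", "CONTAINER",
}


@dataclass
class Node:
    node_id: int
    value_type: str
    children: List["Node"] = field(default_factory=list)
    ref_to: Optional[int] = None   # id of a node this item references, if any


def check_sr_tree(root: Node) -> List[str]:
    """Return violations of restrictions (1), (2) and (4); empty if none."""
    errors = []
    if root.value_type != "CONTAINER":
        errors.append("root must be CONTAINER")                      # (1)

    def walk(node: Node, ancestors: FrozenSet[int]) -> None:
        if node.value_type not in ALLOWED_VALUE_TYPES:               # (2)
            errors.append(f"{node.node_id}: value type {node.value_type} not allowed")
        if node.ref_to is not None and node.ref_to in ancestors:     # (4)
            errors.append(f"{node.node_id}: references ancestor {node.ref_to}")
        for child in node.children:
            walk(child, ancestors | {node.node_id})

    walk(root, frozenset())
    return errors
```

Passing the ancestor set down the recursion keeps the check for restriction (4) linear in the tree size; a reference to a sibling or cousin is still allowed.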

[0015] The above restrictions alone are not sufficient to define description patterns for diagnostic content. For that purpose, "templates" exist as detailed definitions of SR descriptions for each medical field, such as mammography and catheter labs.

A template describes a subtree structure of the SR: it specifies a fixed pattern of Content Items 1001 and Relationships 1002 for part of the report. A template can be represented schematically as shown in FIG. 6. Here, "TID4006" is the template ID; it denotes the template "Single Image Finding" (reference numeral 401) in the field of mammography, whose subtree has a CODE-type Content Item 1001 as its root node.

[0016] Reference numeral 402 in FIG. 6 indicates that the value 1103 of the root node's Content Item 1001 must be selected from CID 6014. In DICOM-SR, a CID identifies a context group, that is, an ID assigned to each predefined set of values.

CID 6014 contains values 1103 such as "Calcification Cluster", "Individual Calcification", "Breast Composition", and "Breast Geometry". The template also contains items such as "reliability of finding" (reference numeral 403) and parts that call other templates (reference numerals 404 and 406). A template may even call its own template ID, as at reference numeral 408. In addition, for each node (Content Item 1001), the number of occurrences and whether the item is required are specified, as indicated by reference numerals 405 and 407. In "1, MC" at reference numeral 405, "1" means that the item appears only once if it appears, "M" means "mandatory", and "C" means "if the condition is satisfied". In the actual template definition the conditions are described in words, but they are omitted in FIG. 6.

[0017] In this example, when the value of code 402 is "Calcification Cluster", the node always has template 406 (DTID4009) as a child via the "Has Properties" relationship, and it is recommended that template 408 (DTID4006) be used via the "Inferred From" relationship.
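The occurrence and requirement annotations such as "1, MC", together with the condition from [0017], can be expressed as data and checked against a node's children. This is a hypothetical sketch: the row structure, the `check_rows` helper, and the way a condition is written as a Python predicate are assumptions, since the real conditions are stated in words in the template definition.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class TemplateRow:
    """One row of a template: which child may appear, how often, and when."""
    name: str
    max_count: Optional[int]        # 1 = "appears only once"; None = unbounded
    requirement: str                # "M" (mandatory) or "MC" (mandatory if condition)
    condition: Callable[[dict], bool] = lambda parent: True


def check_rows(parent: dict, children: List[str], rows: List[TemplateRow]) -> List[str]:
    """Check a node's child list against template rows; return violations."""
    errors = []
    for row in rows:
        n = children.count(row.name)
        if row.max_count is not None and n > row.max_count:
            errors.append(f"{row.name}: {n} occurrences, at most {row.max_count}")
        # "M" is always required; "MC" only when its condition holds.
        # Any other requirement code is treated as optional in this sketch.
        required = row.requirement == "M" or (
            row.requirement == "MC" and row.condition(parent))
        if required and n == 0:
            errors.append(f"{row.name}: required but missing")
    return errors


# Rule from [0017]: DTID4009 is required when the value is "Calcification Cluster".
rows = [TemplateRow("DTID4009", 1, "MC",
                    lambda p: p["value"] == "Calcification Cluster")]
```

The recommended "Inferred From" child (DTID4006) is deliberately absent from `rows`: a recommendation is not a constraint, so a validator should not report it as missing.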

The templates called from the DTID4006 template "Single Image Finding" include "tumor description", "calcification description", "numerical expression", and "Single Image Finding" itself. The templates that call "Single Image Finding" include "Impression/Recommendation" and "Composite Feature".

[0018] A context group is a group of standard terms that appear in a specific context. Figure 10 shows an example of the context group “CID6132” for terms related to calcification.

Here, the first column, shown as "SRT", is an abbreviation for the coding scheme of the standard terms; various terminologies such as SNOMED, ICD-10, and HI-BIRD are used. The second column shows the version of the coding scheme used. The third column shows the code value of the term. The fourth column describes the meaning of the code for readability. A context group is used by specifying its ID in a template (SR template) as described in the previous section. In the present embodiment, "template" and "SR template" refer to the same thing.
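Each row of a context group such as the one in FIG. 10 has the four columns just described, so a context group can be held as a list of records, and a template requirement such as "the value must be selected from CID 6014" becomes a membership lookup. In the minimal sketch below, the code values are invented placeholders, not the real CID 6014 codes, and matching on the code meaning is an assumption made for readability.

```python
from typing import Dict, List, NamedTuple, Optional


class CodedTerm(NamedTuple):
    scheme: str    # coding scheme designator (column 1), e.g. "SRT"
    version: str   # coding scheme version (column 2)
    code: str      # code value (column 3)
    meaning: str   # human-readable code meaning (column 4)


# Placeholder rows: the code values are invented, not the real CID 6014 codes.
CONTEXT_GROUPS: Dict[str, List[CodedTerm]] = {
    "CID6014": [
        CodedTerm("SRT", "", "X-0001", "Calcification Cluster"),
        CodedTerm("SRT", "", "X-0002", "Individual Calcification"),
        CodedTerm("SRT", "", "X-0003", "Breast Composition"),
    ],
}


def lookup(cid: str, meaning: str) -> Optional[CodedTerm]:
    """Return the coded term in context group cid whose meaning matches, if any."""
    for term in CONTEXT_GROUPS.get(cid, []):
        if term.meaning.casefold() == meaning.casefold():
            return term
    return None
```

A `None` result signals that the candidate value is not allowed at that position of the template.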

FIG. 13 shows an example of a report described according to DICOM-SR templates. Here, the input text "Dense Breast Seen. Round Calcifications (3) Noted In RT. AC Area, Adenoma Susp. Microcalcifications Seen In RT. AC Area. Malignancy Please follow up after 6 months." (see FIG. 11) is represented as an SR tree structure.

In FIG. 13, reference numeral 900 denotes the root node of the SR document tree, which represents the document title "Mammography Report". Its child nodes are an IMAGE-type Content Item 902 and a Content Item 905 that is the root of the subtree holding the "Processing and Findings Summary" information. Content Item 902 holds a reference to the image for which the report is written, and has a child node 904 connected by the relationship "Has Acq Context" (reference numeral 903).

[0021] Furthermore, "Processing and Findings Summary" corresponds to TID4002 and includes "Assessment Category" (reference numeral 907), "Recommendation Follow-Up" (reference numeral 908), "Recommendation Follow-up Interval" (reference numeral 909), and "Single Image Impression Recommendation" (reference numeral 911), which corresponds to TID4004. Node 911 is composed of "Composite Feature" nodes (reference numerals 913 and 920) built from the TID4005 template and a "Single Image Finding" node (reference numeral 926) built from the TID4006 template, each corresponding to a specific description of a calcification or a tumor.

An embodiment of the present invention will now be described in detail, based on the structured document definitions of DICOM-SR.

FIG. 1 is a diagram for explaining the system environment in which the structured document creation apparatus of the present invention is used. More specifically, it shows the configuration of a system that converts input at the interpretation clients of the interpretation center 10 and the radiology department 11 into DICOM-SR format, and the network environment connecting them. The system shown in FIG. 1 consists of an image interpretation center 10 equipped with image interpretation clients, a radiology department 11 equipped with image interpretation clients, a hospital 12 equipped with a DICOM image viewer, a structured Web server 13 (structured document creation device), image interpretation clients 14, a specialist certification body 15, and an insurance review agency 16.

[0023] In many cases, the doctor who requests imaging such as X-rays and the interpreting doctor who makes the diagnosis by viewing the captured images are different people. Recently, as shown in FIG. 1, a case can be assumed in which an image interpretation center 10 and a radiology department 11 are connected via a network to a hospital 12 where the doctors requesting interpretation work, and remote image diagnosis is performed.

In fact, some companies provide remote interpretation services at interpretation centers that gather certified radiologists in one place. In that case, the requesting doctor first transmits the DICOM images to be interpreted to the interpretation client 14 of the interpreting doctor. DICOM communication is common in diagnostic imaging systems, but various communication forms and protocols can be used depending on the system configuration. At the interpretation client 14, interpretation begins while the DICOM images received via image receiving means (not shown) are viewed.

[0024] IT adoption is progressing in interpretation work, and interpretation report systems with keyboard and voice input interfaces have been commercialized by various vendors. For example, Hitachi Medical's interpretation report system "Natural Report" uses a voice input system based on AmiVoice as its means of entering findings.

Here, the flow of creating an interpretation report by voice input at the interpretation center 10 will be described. First, while viewing the captured images displayed on the computer screen (DICOM image viewer/DICOM-SR report viewer) or films placed on a film viewer (light box), the doctor dictates into a microphone. In a conventional interpretation report system, the dictation is converted into text data by a speech recognition engine and stored as text. In the structuring flow of the present invention, the doctor's input work is no more burdensome than conventional interpretation work; internally, however, the input data is not stored as-is as speech recognition text but is structured and converted into DICOM-SR format before being saved. There are several possible timings for starting the structuring processing. Normally, when text is analyzed linguistically to identify the information it contains and fit it into a fixed format, it is desirable from the viewpoint of analysis accuracy to pass as much of the text as possible to the analysis engine at once. On the other hand, to let the doctor check the structuring result, it is desirable to start structuring during dictation.

Considering the trade-off between accuracy and speed, there are three options: (1) wait until the entire text is completed, (2) wait until each sentence is completed, and (3) do not wait for a sentence to be completed. When converting an input sentence into a fixed format predefined by DICOM-SR or the like, it is necessary to determine which template covers which range of the input sentence, particularly in case (3). In addition, processing efficiency can be improved by starting the search for template candidates step by step as soon as dictation begins.
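The three timing options can be compared by looking at when each one would hand text to the analyzer as recognized chunks arrive. In the sketch below, the policy names, the chunked input, and the crude sentence-end test (a period at the end of the buffer) are all assumptions; it only illustrates the trade-off and does no analysis itself.

```python
import re
from typing import Iterable, Iterator

SENTENCE_END = re.compile(r"[.。]\s*$")  # crude sentence-end test (assumption)


def structuring_triggers(chunks: Iterable[str], policy: str) -> Iterator[str]:
    """Yield the text handed to the analyzer each time structuring would start.

    policy: "document"    -> option (1), wait for the whole dictation
            "sentence"    -> option (2), run whenever a sentence has just ended
            "incremental" -> option (3), run after every recognized chunk
    """
    text, last = "", None
    for chunk in chunks:
        text += chunk
        if policy == "incremental" or (
                policy == "sentence" and SENTENCE_END.search(text)):
            last = text
            yield text
    if policy != "incremental" and text != last:
        yield text  # end of dictation: analyze whatever has not been analyzed yet
```

Option (1) yields once with the most context, option (3) yields most often with the least context per call, and option (2) sits in between, which is exactly the accuracy/latency trade-off described above.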

[0026] To confirm whether the structuring result is what the doctor intended, the result must be presented simply and clearly. Possible presentation methods include displaying the result once structuring is fully complete, and dynamically displaying the partially structured state during dictation. The structured result can be displayed, for example, in the format shown in FIG. 7.

In the example shown in FIG. 7, information on the X-ray irradiation direction extracted from the dictated text is displayed in the field indicated by reference numeral 504, and the display shows that there is no description of follow-up (reference numeral 505). A composite feature (Composite Feature) is described, whose contents are the upper inner and upper outer regions (reference numeral 500) and three round calcifications (reference numerals 501 and 502), with a diagnosis of adenoma (reference numeral 503).

If the structured data is stored in the storage device, more precise interpretation report searches become possible, such as searches by a pair of lesion site and symptom. In addition, if writing reports in a standard format becomes common (see FIG. 1), a specialist certification body 15 could use SR reports created in the past when evaluating specialists, and an insurance review agency 16 could use them when determining insurance points according to the physician's ability.

[0028] In addition, the means for structuring free text can be provided to various medical institutions as an outsourced technology in the form of a Web (World Wide Web) service. In FIG. 1, a structuring Web service is provided at a site somewhere on the network, such as the structured Web server 13. In this case, the structuring program can be used from the interpretation report system in the form of a function call.

FIG. 2 shows the sequence for structuring a DICOM-SR report in this case. The examination site information is entered by voice input at the image interpretation client 14 (S201), and the images are provided from the PACS (Picture Archiving and Communication System) or RIS (Radiology Information System) of the hospital 12; both are supplied to the structured Web server 13. In response, the structured Web server 13 narrows down the template candidates (S203) and also narrows down candidates among the existing structured reports accumulated in the past. It then performs structuring (S205) while narrowing down the SR subtree candidates according to the finding input (S202) (S204), and creates the final target, a DICOM-SR report. The narrowing down of SR subtree candidates (S204) and the structuring process (S205) are described later.

FIG. 3 shows the internal configuration of the structured document creation device mounted on the structured Web server 13. Note that the structured document creation device may be implemented in a computer of the interpretation center 10 or the radiology department 11.

The structured document creation device includes an arithmetic unit 1, an input device 2, a storage device 3, and an output device 4. The arithmetic unit 1 is mainly composed of a CPU (Central Processing Unit) and includes a finding input capturing unit 101, an extended morphological analysis unit 102, a structured document conversion unit 103, a structured document output unit 104, a dictionary information updating unit 105, an edit input capturing unit 106, a search support unit 107, an item value prediction unit 108, and a structured document updating unit 109 (see FIG. 3).

The finding input capturing unit 101 converts speech input by a doctor (for example, finding sentences about a disease) into text with a built-in speech recognition engine and supplies the text to the extended morphological analysis unit 102. The extended morphological analysis unit 102 performs morphological analysis on the converted text with reference to a word dictionary 33 that stores technical terms and the categories to which they belong, and supplies the result to the structured document conversion unit 103.

[0032] The structured document conversion unit 103 extracts the correlation between the words obtained by the morphological analysis and the categories to which they belong, converts the text, based on that correlation, into a structured document (interpretation report) according to a standardized format, and supplies it to the structured document output unit 104. During this conversion, the structured document conversion unit 103 narrows down the templates prepared in the template DB 34 by performing a correlation function operation. The structured document output unit 104 stores the structured document obtained by the conversion in the structured document DB (structured report DB) 312 and outputs it to a display screen provided as the output device 4.

When a doctor viewing the structured document on the display screen makes an editing input, the edit input capturing unit 106 captures the input and supplies it to the dictionary information updating unit 105. The dictionary information updating unit 105 refers to the structured documents stored in the structured document DB 312 and updates the word dictionary 33 (word dictionary 301; see FIG. 4) using the editing history.
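The text says only that the word dictionary is updated "using the editing history". One plausible policy, shown here as a hypothetical sketch, is to register a recognized string as a variant of the term a doctor corrected it to, once the same correction has been seen at least `min_count` times. The threshold, the data shapes, and the function name are assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple


def update_dictionary(dictionary: Dict[str, Tuple[str, str]],
                      edit_history: List[Tuple[str, str]],
                      min_count: int = 2) -> List[str]:
    """Register strings that doctors repeatedly corrected as variants.

    dictionary maps a surface string to (canonical term, category);
    edit_history holds (recognized string, corrected string) pairs.
    Returns the surface strings that were newly added.
    """
    added = []
    for (original, corrected), n in Counter(edit_history).items():
        if n >= min_count and corrected in dictionary and original not in dictionary:
            dictionary[original] = dictionary[corrected]  # inherit term and category
            added.append(original)
    return added
```

Requiring repeated corrections guards against a single slip permanently polluting the dictionary; the new variant inherits the category, so later structuring treats it like the canonical term.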

The diagnostic knowledge DB 320 stores a diagnostic knowledge system created from the structured documents stored in the structured document DB 312. The item value prediction unit 108 predicts the progression of symptoms from the information read out of the diagnostic knowledge DB 320 and from elapsed-time information input from outside, and fills the predictions into the item fields of the standardized format (template). The search support unit 107 searches the structured document DB 312 using an input character string that accompanies a search request supplied from outside, and outputs the matching structured documents. The structured document updating unit 109 displays the structured document obtained by the conversion on the display screen, prompts a doctor or other user to make editing input from the input device 2 such as a keyboard, and, when there is editing input, captures it and updates the contents of the structured document DB 312 in which the structured documents are stored.

[0035] The various DBs, namely the word dictionary 33 (word dictionary 301), the structured document DB 312, the knowledge system DB (diagnostic knowledge DB 320), and the template DB 34, are shown as parts of the storage device 3; in practice, they are built from the various files shown in FIGS. 4 and 5 and used as DBs in the storage device 3. Various program modules that are called and executed by the CPU, such as the sentence boundary determination module 302, are also stored in the storage device 3.

As for the word dictionary 33 in FIG. 3, in addition to the general-purpose word dictionary 301 shown in FIG. 4, an SR subtree bank 314 and, as shown in FIG. 5(a), field-specific context group term dictionaries (which store the categories to which technical terms belong) for fields such as catheter lab and cardiac echo are prepared. As shown in FIG. 5(b), the template DB 34 contains SR templates for each field, such as mammography, catheter lab, and echocardiography. Details are described later.

The following describes, with reference to the flowchart shown in FIG. 8 (and FIGS. 3 and 4), the process of converting an interpretation report entered as free text (free dictation) into the fixed structure defined for each field.

[0038] (S61: Document division processing (S61 to S64: first step))

First, the structured document creation device (arithmetic unit 1) of the present invention (see FIG. 4) divides the input text into combinations of sentence sets according to the character string pattern matching rules registered in the sentence boundary determination module 302 and the paragraph boundary determination module 308.

The division is performed step by step: the text is first divided roughly into sentence sets such as the "diagnosis part", "finding part", and "instruction part", and the division is then repeated recursively for each sentence set type according to the pattern matching rules prepared in the paragraph boundary determination module 308.

[0039] For example, for the "finding part", sentence set types such as "image quality", "imaging direction", "imaging site", "calcification", and "chest structure" can be considered. If a character string pattern such as "Diagnosis:", "IMPRESSION", or "CT SCAN OF ABDOMEN" is found in the input text, sentence sets can be separated before and after it according to the applicable rule; for example, one rule takes as a sentence set the range from where the string "Diagnosis:" appears until the next clue pattern appears or until the end of the text. If, at some stage of the recursive steps, no rule applies in the division by the paragraph boundary determination module 308 and the division cannot proceed further, the sentence boundary determination module 302 performs sentence division; if the sentence boundary determination also fails, the division of that region ends.
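One level of this clue-pattern division, with the fallback to sentence boundaries, can be sketched with regular expressions. The clue list below contains only the example strings from the text, and the period-plus-whitespace sentence rule is a deliberately naive assumption; the actual rule sets of modules 302 and 308 are not given.

```python
import re
from typing import List

# Clue patterns taken from the examples in the text; a real rule set would be
# larger and would come from the paragraph boundary determination module.
CLUES = re.compile(r"Diagnosis:|IMPRESSION|CT SCAN OF ABDOMEN")


def split_sentences(text: str) -> List[str]:
    """Fallback rule: split after a period followed by whitespace."""
    return [s for s in re.split(r"(?<=\.)\s+", text.strip()) if s]


def split_by_clues(text: str) -> List[str]:
    """Cut text at every clue; each set runs to the next clue or the end."""
    starts = [m.start() for m in CLUES.finditer(text)]
    if not starts:
        return split_sentences(text)  # no rule applies: use sentence boundaries
    bounds = ([0] if starts[0] > 0 else []) + starts + [len(text)]
    parts = (text[a:b].strip() for a, b in zip(bounds, bounds[1:]))
    return [p for p in parts if p]
```

Applying `split_by_clues` again to each resulting part, with a clue set chosen for that part's type, would give the recursive, type-specific division the text describes.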

The above division assumes the sentence as the minimum unit. However, in some cases it is better to split a single sentence into separate sentence sets midway, for example a sentence joined by the conjunctive particle "ga" ("but"), such as "No obvious malignant findings are found, but please refer to other examination findings."

[0040] Each sentence set can be expressed by enclosing it in tags such as "<paragraph>" and "</paragraph>". Depending on the clues used to determine the paragraph range, the paragraph type may be determined more specifically. In this case, the paragraph type is expressed using XML attributes, such as "<paragraph type="conclusion">" and "<paragraph type="procedure">". The hierarchical division process can be expressed by enclosing the relevant regions in nested tags, such as "<paragraph><sentence>...</sentence><sentence>...</sentence></paragraph>".

By repeating the above decomposition steps as far as possible, the text is decomposed into a set of regions, each describing specific content at the finest granularity possible. The clue patterns required for the decomposition are recorded in the paragraph boundary determination module 308 and the sentence boundary determination module 302 stored in the storage device 3.

As shown in FIG. 1, the rules used in the sentence segmentation process are divided into rules common to all users and examination types and rules customized individually. Knowledge obtained at each site, such as new data and rules usable for creating structured reports in the standard format, can be synchronized and shared as necessary, so that the sites can improve each other's reporting environments.

(S62: Extended Morphological Analysis Processing)

Normally, a single sentence is assumed as the input. However, since content corresponding to one SR template may be described across multiple sentences, the analysis may also be applied to several sentences at once.

In the extended morphological analysis, two kinds of dictionaries are used: a word dictionary with semantic information such as "disease name" and "part", not limited to the word dictionary 33 for context groups, and the SR subtree bank 314. The SR subtree bank 314 can be regarded as a word dictionary whose semantic information is an SR subtree. Whereas the entries of the word dictionary 301 are words, the entries of the SR subtree bank correspond to character strings that exceed the range of a single word, such as "Many SMALL Calcification", and the equivalent of their meaning is the corresponding SR subtree. Using the word dictionary 301 and the SR subtree bank 314, the input sentence is processed by dictionary lookup and an optimal-path search using dynamic programming (Yuji Matsumoto, Akira Kitauchi, Tatsuo Yamashita, Yoshitaka Hirano, Hiroshi Matsuda, Kazuma Takaoka, Masayuki Asahara, "Japanese Morphological Analysis System ChaSen version 2.2.1 Instruction Manual", Dec. 2000), yielding a word (character string) sequence annotated with semantic (structural) information (FIG. 9, reference numeral 600).
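
The optimal-path search can be sketched as a minimum-cost segmentation over the input characters. This is an illustrative simplification: ChaSen also scores the connection between adjacent parts of speech, which this sketch omits, and the function name `segment`, the costs, and the labels in the example dictionary are invented for illustration. Giving a multi-word SR subtree entry a lower cost than the sum of its parts is one assumed way to let the bank's longer matches win.

```python
def segment(text, dictionary):
    """Minimum-cost segmentation of `text`.

    `dictionary` maps a surface string to (cost, info); multi-word entries such as
    SR subtree bank strings sit in the same table as ordinary words.
    Returns [(surface, info), ...] for the cheapest path, or None if none exists.
    """
    n = len(text)
    inf = float("inf")
    best = [inf] * (n + 1)   # best[i]: lowest cost of covering text[:i]
    back = [None] * (n + 1)  # back[i]: (start, surface, info) of the last entry on that path
    best[0] = 0
    max_len = max((len(w) for w in dictionary), default=0)
    for i in range(n):
        if best[i] == inf:
            continue  # position i is unreachable
        for j in range(i + 1, min(n, i + max_len) + 1):
            entry = dictionary.get(text[i:j])
            if entry is None:
                continue
            cost = best[i] + entry[0]
            if cost < best[j]:
                best[j] = cost
                back[j] = (i, text[i:j], entry[1])
    if best[n] == inf:
        return None
    path, j = [], n
    while j > 0:  # follow back-pointers from the end of the text
        i, surface, info = back[j]
        path.append((surface, info))
        j = i
    return path[::-1]


# Illustrative dictionary: three words plus one SR-subtree-bank string.
dictionary = {
    "微細": (2, "shape"),
    "線状": (2, "distribution"),
    "石灰化": (2, "finding: calcification"),
    "微細線状石灰化": (3, "SR subtree: fine linear calcification"),
}
print(segment("微細線状石灰化", dictionary))
# [('微細線状石灰化', 'SR subtree: fine linear calcification')]
```

Without the bank entry, the same call returns the three single-word entries, each with its own semantic label.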

[0044] Furthermore, because interpretation reports contain many spelling variations, the dictionary lookup in the morphological analysis is extended. DP matching between dictionary entry strings and the input string is performed so that entries can be found even when characters are missing or inserted, which a dictionary lookup that assumes exact matches cannot handle.

For example, if the word "fine linear calcification" is registered in the dictionary, an occurrence in the sentence with one character missing can still be matched to that entry, yielding the corresponding word (string) sequence. Similar processing is possible when the input string and the dictionary entry for "left MRI order" differ slightly in spelling. Dictionaries used for morphological analysis often use data structures designed for fast lookup, such as a suffix tree or suffix array, so one conceivable method is to add insertion, deletion, and replacement operations to the suffix-tree transitions driven by the input characters.

[0045] In this process, adding a constraint such as "the first character is never deleted or replaced" limits the range of spelling variations handled and keeps lookup fast. It is also effective to define character strings that may not be replaced during suffix-tree transitions. For example, a one-character substitution between two variant spellings of "many" may be treated as equivalent, but treating a one-character substitution between "many" and "few" as equivalent could cause a serious error. The above processing is not limited to Japanese; it can also be applied to mixed English and Japanese sentences and to English sentences, which often appear in findings.
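
The constrained approximate lookup can be illustrated with a plain edit-distance (DP matching) check. Note that this sketch swaps in a simpler technique than the suffix-tree transitions the text proposes: it compares the input against each dictionary entry directly, which is fine for illustration but slow for a large dictionary. The two constraints from the text are modeled as assumptions: the first character may never be deleted or replaced, and a caller-supplied set of forbidden character substitutions (the hypothetical `forbidden` parameter) is never allowed.

```python
def constrained_edit_distance(entry, s, forbidden=frozenset()):
    """Levenshtein distance between dictionary `entry` and input `s`, where the
    first characters must match exactly and substitutions listed in `forbidden`
    (pairs of characters, in either order) are disallowed. Returns inf if no
    admissible alignment exists."""
    inf = float("inf")
    if not entry or not s or entry[0] != s[0]:
        return inf  # first character may not be deleted or replaced
    m, n = len(entry), len(s)
    d = [[inf] * (n + 1) for _ in range(m + 1)]  # row/column 0 stay inf
    d[1][1] = 0  # first characters are known to match
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if i == 1 and j == 1:
                continue
            a, b = entry[i - 1], s[j - 1]
            if a == b:
                sub = d[i - 1][j - 1]
            elif (a, b) in forbidden or (b, a) in forbidden:
                sub = inf  # protected substitution
            else:
                sub = d[i - 1][j - 1] + 1
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[m][n]


def fuzzy_lookup(s, dictionary, max_edits=1, forbidden=frozenset()):
    """Dictionary entries within `max_edits` constrained edits of `s`."""
    return [w for w in dictionary
            if constrained_edit_distance(w, s, forbidden) <= max_edits]
```

For instance, `fuzzy_lookup("fine linear calcifcation", ["fine linear calcification"])` finds the entry despite the missing "i". Declaring digit pairs as forbidden stops a stage such as "T2" from being silently matched to a "T1" entry.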

(S63: Phrase Analysis Processing)

Next, the phrase analysis module 305 constructs phrases from the word portions of the word (character string) sequence output by the extended morphological analysis unit 102. A phrase consists of an independent word, typically one with its own meaning such as a verb or noun, followed by a sequence of ancillary words, such as case particles and auxiliary verbs, that have no meaning by themselves. For English, units corresponding to Japanese phrases, namely noun phrases and verb phrases, can likewise be constructed.

For example, words 601, 602, and 603 shown in FIG. 9 form phrase 604 ("to the right breast"), word 605 alone forms phrase 606 ("accompanying"), words 607, 608, and 609 form phrase 610 ("tumor shadow"), and words 611, 612, and 613 form phrase 614 ("is observed.").
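
A simple rule-based sketch of phrase construction is shown below. The part-of-speech labels, the function name `build_phrases`, the rule that consecutive nouns form one compound, and the Japanese word segmentation assumed for the FIG. 9 example (右/乳房/に, 伴う, 腫瘤/影/を, 認め/る/。) are all illustrative assumptions, not taken from the patent.

```python
# Hypothetical part-of-speech classes.
FUNCTION_POS = {"particle", "auxiliary", "symbol"}


def build_phrases(words):
    """words: list of (surface, pos). Function words attach to the current phrase,
    a noun directly after a noun or prefix continues a compound, and any other
    word starts a new phrase."""
    phrases, prev_pos = [], None
    for surface, pos in words:
        attach = bool(phrases) and (
            pos in FUNCTION_POS
            or (pos == "noun" and prev_pos in {"noun", "prefix"})
        )
        if attach:
            phrases[-1] += surface
        else:
            phrases.append(surface)
        prev_pos = pos
    return phrases


# Assumed segmentation of the FIG. 9 sentence (words 601 to 613).
FIG9_WORDS = [
    ("右", "prefix"), ("乳房", "noun"), ("に", "particle"),
    ("伴う", "verb"),
    ("腫瘤", "noun"), ("影", "noun"), ("を", "particle"),
    ("認め", "verb"), ("る", "auxiliary"), ("。", "symbol"),
]
print(build_phrases(FIG9_WORDS))  # ['右乳房に', '伴う', '腫瘤影を', '認める。']
```

The four outputs correspond to phrases 604, 606, 610, and 614.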

[0047] (S64: Phrase Type Determination Processing)

Next, the phrase type determination module 304 performs phrase type determination. According to the semantic information of its constituent words, each phrase can be given semantic information such as "disease name", "part", and "instruction", together with the identification information of the context group to which the word belongs. As an example of a context group defined in SR for mammography CAD, FIG. 10 shows the context group for calcification type terms.

Here, the term "popcorn-like calcification" is assigned the code "F-01761" in version 1.1 of the coding scheme "SRT". The context group in FIG. 10 is assigned the ID "CID6132". In the example shown in FIG. 9, the context group of phrase 604 is CID6022, which denotes examination sites in general; phrase 610 is classified as "Mass", a type of tumor; and phrase 614, a verb phrase, is classified as indicating a degree of certainty.

As a second method of phrase type determination, semantic information such as "part" and "instruction" is attached to a phrase according to separately defined word sequence patterns. The phrase type determination need not produce a unique result; multiple results can be held. In phrase type determination, the IDs of the SR templates that include the phrase as an element are output as the phrase type. For example, an element corresponding to "confidence" appears in the SR templates TID4002, TID4005, TID4006, and others, so phrase 614 has these three SR template IDs as its phrase type. Similarly, the SR subtree 615 has the ID TID4009, and the SR templates containing TID4009 as an element include TID4006 and TID4005. Multiple candidates are thus held at the phrase type determination stage, and the appropriate one is selected in the subsequent structuring process.
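
Holding several template-ID candidates per phrase can be sketched with plain sets. The lookup table `TEMPLATES_CONTAINING` and the function `phrase_type_candidates` are hypothetical; the template IDs are the ones named in the text for phrase 614 and subtree 615.

```python
# Hypothetical lookup: element (semantic label or subtree ID) -> SR templates containing it.
TEMPLATES_CONTAINING = {
    "confidence": {"TID4002", "TID4005", "TID4006"},
    "TID4009": {"TID4005", "TID4006"},
}


def phrase_type_candidates(labels):
    """All template IDs that could contain a phrase with any of `labels`.

    The result is deliberately not reduced to a single ID; the choice is
    deferred to the structuring step.
    """
    candidates = set()
    for label in labels:
        candidates |= TEMPLATES_CONTAINING.get(label, set())
    return candidates
```

Intersecting the candidates of phrase 614 (`{"TID4002", "TID4005", "TID4006"}`) with those of subtree 615 leaves `{"TID4005", "TID4006"}`, which is the kind of narrowing the structuring step can exploit.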

In the word (character string) sequence output by the extended morphological analysis unit 102, string portions that correspond to SR subtrees already carry SR subtree information and therefore do not require phrase type determination.

The analysis results and the intermediate states are represented in XML (Extensible Markup Language) format or the like.

In FIGS. 11 and 12, reference numeral 800 represents an input sentence whose sentence boundaries are still ambiguous. Reference numeral 801 denotes the state in which the sentence ranges have been determined by the sentence division processing described above; each sentence is represented as an area enclosed by <sentence> tags. In the state where the extended morphological analysis has been completed, the range of each word is represented as an area enclosed by morphological-element tags. Reference numeral 803 denotes the state where phrase construction has been completed; each phrase is represented as an area enclosed by <phrase> tags. Reference numeral 804 denotes the state where phrase type determination has been completed; each phrase type is represented by tags such as "calcification", "confidence level", "number", and "pathology".

[0050] (Second Step (S65): Structuring Processing by Template Matching)

Next, the structured document creation unit 103 executes template-based structuring. Each SR template has an ID and a subtree structure, and the subtree carries the context group IDs and SR template IDs described above (at least one of context group identification information and template identification information is attached). By matching this against the phrase type information and SR subtree information described above, the applicable SR template and the corresponding part within it can be identified.

When the phrase type determination in S64 gives a series of nearby phrases or SR subtrees the same template ID as their phrase type (for example, all having the template ID "4005" in the example shown in the figure), a subtree is constructed according to that common template ID. These phrases and SR subtrees need not be contiguous. For the SR templates compared with the input analysis result in the template-matching structuring process (S65 in FIG. 9), the IDs of the templates that contain them as elements can likewise be computed directly. Similarly, when an SR subtree newly constructed by the template-matching structuring process shares template ID candidates with phrases or SR subtrees in the phrase type determination output, these elements are combined into a subtree corresponding to that template. When phrases with the same phrase type candidates appear consecutively, they are grouped into a single subtree. Depending on the meaning of each template, the grouped phrases form SR subtrees representing content such as "description of size", "description of tumor morphology and findings", "description of number", and "description of the interval until the next examination".
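
The grouping of nearby elements that share a template ID candidate can be sketched as a single greedy pass. The function `group_by_template`, the window size, and the example labels are hypothetical; the text also allows newly built subtrees to be combined again, which would mean repeating this pass, and that is omitted here.

```python
def group_by_template(items, window=3):
    """Greedily group items that share at least one candidate template ID.

    items: list of (label, set_of_candidate_template_ids) in text order.
    An item joins the first group that has a common candidate and whose last
    member is at most `window` positions back, so members need not be contiguous.
    Returns [(sorted_common_ids, [labels]), ...].
    """
    groups = []
    for idx, (label, cands) in enumerate(items):
        if not cands:
            continue  # nothing to match on
        for g in groups:
            common = g["ids"] & cands
            if common and idx - g["last"] <= window:
                g["ids"], g["last"] = common, idx  # narrow to the shared IDs
                g["labels"].append(label)
                break
        else:
            groups.append({"ids": set(cands), "last": idx, "labels": [label]})
    return [(sorted(g["ids"]), g["labels"]) for g in groups]
```

With `items = [("phrase 614", {"TID4002", "TID4005", "TID4006"}), ("subtree 615", {"TID4005", "TID4006"}), ("size", {"TID4005"})]`, all three end up in one group whose shared candidate has narrowed to `TID4005`; the `"size"` label and its candidate set are invented for the example.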

When computing, in the phrase type determination process, the IDs of the templates that contain a phrase as an element, the output can be adjusted using the DICOM image header and the examination type and imaging-site information obtained through hospital systems such as RIS and HIS. For example, if the target findings are known to come from mammography, the SR templates used for the content part can be narrowed down to template IDs 4000 to 4100.
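
The narrowing step can be sketched as a filter keyed on the examination type. Here the type is assumed to be taken from the DICOM Modality attribute (0008,0060), where "MG" denotes mammography; the table `ALLOWED_TEMPLATE_RANGES`, the function `narrow_candidates`, and the "TID" plus digits ID format are hypothetical.

```python
# Hypothetical mapping from DICOM Modality (0008,0060) to allowed template-ID ranges.
ALLOWED_TEMPLATE_RANGES = {
    "MG": range(4000, 4101),  # mammography: TID 4000 to 4100, inclusive
}


def narrow_candidates(candidates, modality):
    """Drop template-ID candidates outside the range allowed for this examination type."""
    allowed = ALLOWED_TEMPLATE_RANGES.get(modality)
    if allowed is None:
        return set(candidates)  # no restriction known for this examination type
    # Assumes IDs are written as "TID" followed by digits, e.g. "TID4005".
    return {t for t in candidates if int(t[3:]) in allowed}
```

An unknown examination type leaves the candidates untouched, so this filter can only remove ambiguity and never discard the sole interpretation for other modalities.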

[0051] (Third Step (S66): Output of Structured Document)

The structured document is output from the output device 4, which may be a display screen, a printer, a speaker, a network interface card, or the like.

[0052] (Fourth Step (S67): Accumulation of Structured Report)

After the created structured report (structured document) has been confirmed as correct through the doctor's feedback via the editing and input unit 106, the structured document output unit 104 runs the structured report storage module 318, which stores the report in the structured document DB (318).

The stored structured report is decomposed by the substructure extraction module 313 into reusable subtrees, each of which is registered in the SR subtree bank 314 paired with its corresponding substring. As shown as shared data in FIG. 1, highly useful SR subtrees accumulated at each interpretation site can be synchronized and shared, so that the sites can mutually improve their structured report creation environments.
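
Subtree extraction and bank registration can be sketched with nested dictionaries. The node format (`"concept"`, `"text"`, `"children"`), the class `SubtreeBank`, and its methods are hypothetical; a real implementation would operate on DICOM SR content items and persist the bank.

```python
def extract_subtrees(node):
    """Yield (source_text, subtree) for every node that records its source text."""
    if node.get("text"):
        yield node["text"], node
    for child in node.get("children", []):
        yield from extract_subtrees(child)


class SubtreeBank:
    """Hypothetical in-memory SR subtree bank: substring -> SR subtree."""

    def __init__(self):
        self.entries = {}

    def register_report(self, report):
        # Keep the first subtree seen for a given substring.
        for text, subtree in extract_subtrees(report):
            self.entries.setdefault(text, subtree)

    def lookup(self, text):
        return self.entries.get(text)

    def merge(self, other):
        """Synchronize with another site's bank; local entries take precedence."""
        for text, subtree in other.entries.items():
            self.entries.setdefault(text, subtree)
```

Letting local entries win on merge is one simple conflict policy for synchronizing sites; the text does not specify how conflicts are resolved.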

Next, the use of diagnostic knowledge and time-series information will be described. By combining the diagnostic knowledge DB 320 with structured reports, it is possible to provide alerts for input omissions and missed differential diagnoses, suggestions of examinations, and educational input support for trainees. For example, if the diagnostic knowledge DB 320 holds the knowledge that "ultrasound makes it easy to distinguish cysts from fibroadenomas", then when a finding of "fibroadenoma" or "cyst" is input, the system can ask whether other possibilities have been considered; and if there is knowledge that "a recommendation to proceed with ultrasound examination can be set as the default value", that recommendation can be set in the recommendation part of the SR template.

Other knowledge usable for input support includes: "Spiculated mass → benign: postsurgical scar, radial scar, fat necrosis, sclerosing adenosis, abscess; malignant: scirrhous carcinoma, invasive lobular carcinoma. Detailed examination applies except where the site is consistent with a surgical scar"; "Indistinct margin → it is important to distinguish a problem of breast composition from the nature of the mass"; "Fat density observed within a spiculated mass → has scirrhous carcinoma, which infiltrates and entraps fat, been distinguished from radial scar?"; "Even if there is a proliferative component within a cyst, its X-ray absorption cannot be distinguished from that of the fluid component"; "Mass shape and degree of malignancy: round → polygonal → lobulated → irregular"; "A smooth margin is inconsistent with infiltration"; "Benign calcifications that can be diagnosed on the mammogram alone: skin and vascular calcifications, calcifications associated with fibroadenoma and duct ectasia, round calcifications, lucent-centered calcifications, milk-of-calcium calcifications, etc."; "Diffuse and regional distributions are often benign; for a linear distribution, malignancy must be carefully considered"; "It is important to separate calcifications that require differentiation from those that are clearly benign"; "A mass containing fat as a component → lipoma, hamartoma, galactocele, oil cyst, lymph node"; and "Breast cancer has no fat component, although fat may be entrapped when the cancer infiltrates surrounding tissue." These functions are handled by the item value prediction unit 108.
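
One way to encode such knowledge is as trigger rules that produce alerts and default values. This is a hypothetical sketch: the rule format, the two encoded items, and the function `apply_knowledge` are illustrative and do not describe how the diagnostic knowledge DB 320 or the item value prediction unit 108 are actually organized.

```python
# Hypothetical encoding of two knowledge items; real entries would come from DB 320.
KNOWLEDGE = [
    {
        "if_any": {"fibroadenoma", "cyst"},
        "alert": "Ultrasound distinguishes cysts from fibroadenomas easily; "
                 "have other possibilities been considered?",
        "default_recommendation": "ultrasound examination",
    },
    {
        "if_any": {"fat-containing mass"},
        "alert": "Consider lipoma, hamartoma, oil cyst, lymph node.",
    },
]


def apply_knowledge(findings, report, knowledge=KNOWLEDGE):
    """Return alerts triggered by `findings` and set default recommendations in `report`."""
    alerts = []
    for rule in knowledge:
        if rule["if_any"] & set(findings):
            alerts.append(rule["alert"])
            if "default_recommendation" in rule:
                # Only a default: a recommendation the doctor already wrote is kept.
                report.setdefault("recommendation", rule["default_recommendation"])
    return alerts
```

Using `setdefault` keeps the suggestion a default rather than an override, in line with the text's aim of reducing input effort without replacing the doctor's judgment.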

When creating a structured document with subtree templates, items not described in the input sentence can be given a default value, such as "no abnormality" or the corresponding value from the previous interpretation report, to reduce the doctor's input burden. Conversely, it is also effective to point out inconsistencies between the content of the input sentence and the corresponding values in test results and past interpretation reports.
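
Both ideas, default filling and inconsistency flagging, can be sketched over flat item-to-value dictionaries. The function names, the flat report representation, and the default string are assumptions for illustration.

```python
def fill_defaults(current, previous, items, normal="no abnormality"):
    """Fill items missing from `current` with the previous report's value, else `normal`.

    Returns (filled_report, names_of_defaulted_items) so the UI can mark defaults.
    """
    result, defaulted = dict(current), []
    for item in items:
        if item not in result:
            result[item] = previous.get(item, normal)
            defaulted.append(item)
    return result, defaulted


def find_inconsistencies(current, reference):
    """Items whose stated value differs from a reference (test values, past reports)."""
    return sorted(k for k, v in current.items() if k in reference and reference[k] != v)
```

Returning the list of defaulted items separately lets the display distinguish values the doctor wrote from values the system filled in; a flagged inconsistency is a prompt for review, not necessarily an error, since findings can legitimately change between examinations.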

Next, the display of time-series information will be described. Because the reports are structured as described above, they can be arranged in time series for each item.

Specifically, in FIG. 18, reference numeral 1400 indicates the date and time (time axis) at which the image interpretation report was created. Reference numeral 1401 indicates a site to be reported. Reference numeral 1402 indicates calcification, tumor, and other findings (symptoms) at each site.

Each finding is further classified according to its severity. For example, calcifications are listed in order of increasing severity, starting from clearly benign types such as skin and vascular calcifications, fibroadenoma-type calcifications, round calcifications, and lucent-centered calcifications. Reference numerals 1404 to 1408 denote reports. Placing each report at its corresponding time, site, and finding gives a bird's-eye view of past reports, which is convenient when creating a new report or consulting a relevant past report.
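
The chart can be sketched as rows keyed by site and finding, ordered by a severity rank, with dated report cells. The function `timeline`, the report fields, and the severity ranks are hypothetical; ISO 8601 date strings are assumed so that sorting them as strings is chronological.

```python
from collections import defaultdict


def timeline(reports, severity_rank):
    """Arrange reports as rows of (site, finding) with dated cells.

    reports: iterable of dicts with "id", "date" (ISO 8601 string), "site", "finding".
    severity_rank: finding -> rank, lower meaning more clearly benign (hypothetical).
    Returns [(site, finding, [(date, report_id), ...]), ...].
    """
    cells = defaultdict(list)
    for r in reports:
        cells[(r["site"], r["finding"])].append((r["date"], r["id"]))
    unranked = len(severity_rank)  # findings without a rank go last within a site
    rows = sorted(cells, key=lambda k: (k[0], severity_rank.get(k[1], unranked)))
    return [(site, finding, sorted(cells[(site, finding)])) for site, finding in rows]
```

Each returned row maps directly onto one line of the chart in FIG. 18: site (1401), finding (1402), and reports positioned along the time axis (1400).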

Methods by which the interpreting doctor can actively customize the "structured interpretation report creation support system" include a module for registering entries in the SR subtree bank 314 (the SR block registration module 316; see FIG. 4) and a module for registering synonyms and new terms in the word dictionary 33 for context groups (the context group module 315; see FIG. 3).

With the former module, expressions frequently used by the radiologist and their structured results are registered in the SR subtree bank 314 using the SR block registration module 316. This makes it possible to reuse those expressions and their structured results in the process of pasting them onto the input character string (fifth step). With the latter module, for example in the context group for calcification types, a synonym of "popcorn calcification" is registered by the context group module 315 as an entry with the same (Coding Scheme Designator, Coding Scheme Version, Code Value) = ("SRT", "1.1", "F-01761") but a different Code Meaning expression. If a calcification type is not a synonym of any entry in FIG. 10, it is registered under a new, non-overlapping code such as "USER, 1.0, F-01770".
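
The two registration cases can be sketched with a small context group class. The field names mirror the DICOM code attributes (Coding Scheme Designator, Coding Scheme Version, Code Value, Code Meaning), but the classes `CodedEntry` and `ContextGroup` and their methods are hypothetical and are not the context group module 315 itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedEntry:
    scheme: str   # Coding Scheme Designator, e.g. "SRT"
    version: str  # Coding Scheme Version, e.g. "1.1"
    value: str    # Code Value, e.g. "F-01761"
    meaning: str  # Code Meaning: the expression as written in reports


class ContextGroup:
    def __init__(self, cid, entries):
        self.cid = cid
        self.entries = list(entries)

    def add_synonym(self, expression, synonym_of):
        """Register `expression` with the same code as an existing entry."""
        base = next((e for e in self.entries if e.meaning == synonym_of), None)
        if base is None:
            raise ValueError(f"no entry with meaning {synonym_of!r}")
        entry = CodedEntry(base.scheme, base.version, base.value, expression)
        self.entries.append(entry)
        return entry

    def add_new_term(self, expression, value, scheme="USER", version="1.0"):
        """Register a term that is not a synonym, under a non-overlapping code."""
        if any(e.scheme == scheme and e.value == value for e in self.entries):
            raise ValueError(f"code {scheme}:{value} is already in use")
        entry = CodedEntry(scheme, version, value, expression)
        self.entries.append(entry)
        return entry
```

For example, registering "popcorn calcification" as a synonym of "popcorn-like calcification" reuses SRT, 1.1, F-01761, so downstream processing treats both spellings as the same coded concept.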

[0057] The means by which the interpreting doctor returns feedback on the presented structuring result is the editing and input unit 106. For example, the value of the "Individual Calcification" field in FIG. 7 is "round calcification" (reference numeral 501). If the finding is actually a different element of the context group for calcification types shown in FIG. 10 (defined in DICOM SR as CID No. 61313), the doctor can feed the correction back to the output result by selecting that other item (for example, "lime calcification") or by entering the value directly.

As a method of automatic customization that uses the interpreting doctor's work, the correction history of structured reports can be used. The phrase type determination module 304 is customized based on the structured report correction history DB (311), which is recorded by the structured report editing module 317. For example, when the character string "S7" appears, it matches both a "cancer progression" pattern and a date ("Showa 7") pattern among the pattern matching rules handled by the phrase type determination module 304. If the radiologist entered the finding intending "cancer progression" and the phrase type determination module 304 determined it as "cancer progression", no customization is performed. Conversely, if it was determined as "Showa 7", the radiologist corrects the result through the editing and input unit 106, and the correction is stored in the report correction history DB (311). Using the accumulated correction log, the correction log analysis module can partially adapt to each radiologist's writing style by changing the order in which rules are applied or the weights with which patterns are applied.
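
Weight-based customization from a correction log can be sketched as follows. The class `PhraseTypeRules`, its per-rule integer weights, and the plus or minus one update are hypothetical choices; the text only states that rule order or pattern weights are changed. In practice the two competing patterns would differ, but here both accept "S7" to reproduce the ambiguity.

```python
import re
from collections import Counter


class PhraseTypeRules:
    """Hypothetical weighted pattern rules for phrase type determination."""

    def __init__(self, rules):
        self.rules = [(re.compile(p), t) for p, t in rules]  # (pattern, phrase type)
        self.weights = Counter()  # per-rule weight, keyed by rule index (default 0)

    def classify(self, text):
        hits = [(i, t) for i, (rx, t) in enumerate(self.rules) if rx.fullmatch(text)]
        if not hits:
            return None
        # max() returns the first of several equal maxima, so ties keep rule order.
        return max(hits, key=lambda h: self.weights[h[0]])[1]

    def learn(self, correction_log):
        """correction_log: (text, predicted_type, corrected_type) triples."""
        for text, predicted, corrected in correction_log:
            if predicted == corrected:
                continue  # prediction was right: no customization
            for i, (rx, t) in enumerate(self.rules):
                if rx.fullmatch(text):
                    if t == corrected:
                        self.weights[i] += 1
                    elif t == predicted:
                        self.weights[i] -= 1
```

Before any correction, `classify("S7")` returns the first matching rule, the date reading; after one logged correction to "cancer progression", that rule outranks the date rule for this user.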

[0059] As described above, according to the present invention, findings text input by a doctor can be efficiently converted into a structured standard format for each medical field and recorded, with little burden on the doctor.

In the future, "DICOM SR compatible interpretation reports" are expected to become an essential specification, listed as standard in product brochures, for the input stage of radiology interpretation report systems, and the present invention can be expected to serve as an effective tool for meeting that requirement.

Possibility of industrial use

According to the present invention, it is possible to implement a device that can create a highly convenient interpretation report in a standard format without putting a burden on a doctor and accumulate it as knowledge.

Claims


[1] A structured document creation method in which an arithmetic device processes a text document input through an input device to create a structured document and outputs the created structured document to an output device, wherein

the arithmetic device performs:

A first step of performing a morphological analysis on the input text document by referring to a word dictionary storing technical terms and categories to which the technical terms belong;

A second step of extracting a correlation between a word obtained by the morphological analysis and the category to which the word belongs, and converting the text document into a structured document according to a standardized format based on the correlation; and

A third step of outputting the structured document obtained by the conversion to the output device.

[2] The structured document creation method according to claim 1, wherein the word dictionary is represented by a structured document having a tree structure with context items as nodes and relationships as arcs, and has subtree structures to which at least one of context group identification information and template identification information is assigned according to partial structures of the context items and relationships.

[3] The structured document creation method according to claim 1, wherein the first step comprises:

A sub-step of dividing the input text into a set of sentence sets according to string pattern matching rules prepared in advance; and

A sub-step of performing phrase analysis on the divided document and determining, from the word dictionary, phrase type information having the context group identification information.

[4] The structured document creation method according to claim 3, wherein the second step comprises:

A sub-step of collating each piece of the determined phrase type information with the subtree structures of the word dictionary to extract the corresponding context group identification information, and identifying, based on the extracted context group identification information, the applicable standard-format document and the corresponding part within that document.

[5] The structured document creation method according to claim 4, wherein the arithmetic device further performs:

A fourth step of storing the structured document obtained by the conversion in a storage device and matching structured documents frequently used among those stored in the storage device against an input character string; and

A fifth step of pasting the structured document obtained by the matching onto the input character string.

[6] The structured document creation method according to claim 1, wherein the text document is a findings text on a disease or disorder, the structured document is an interpretation report, and the interpretation reports are output arranged in chronological order for specific items concerning the symptoms or treatment of the disease or disorder.

[7] A structured document creation device that processes a text document input through an input device to create a structured document and outputs the created structured document to an output device, the device comprising:

A morphological analysis unit that performs a morphological analysis on the input text document by referring to a word dictionary storing technical terms and the categories to which the technical terms belong;

A structured document conversion unit that extracts a correlation between a word obtained by the morphological analysis and the category to which the word belongs, and converts the text document into a structured document according to a standardized format based on the correlation; and

A structured document output unit that outputs the structured document obtained by the conversion to the output device.

[8] The structured document creation device according to claim 7, further comprising:

A structured document storage device in which the structured document obtained by the conversion is stored; and

A dictionary information updating unit that updates the word dictionary by referring to the structured documents stored in the structured document storage device and using their editing history.

[9] The structured document creation device according to claim 7, further comprising:

A storage device storing a knowledge dictionary created based on the stored structured documents; and

An item value prediction unit that predicts the progression of a symptom from information read from the knowledge dictionary and input elapsed-time information, and applies the prediction to the item fields of the standardized format.

[10] The structured document creation device according to claim 7, further comprising a search support unit that searches the structured documents stored in the storage device using an input character string attached to a search request, and outputs the corresponding structured documents.

[11] A structured document creation device that processes a text document input through an input device to create a structured document and outputs the created structured document to a display screen of a display device, the device comprising:

A morphological analysis unit that performs a morphological analysis of the input text document by referring to a word dictionary storing technical terms and the categories to which the technical terms belong;

A structured document conversion unit that extracts a correlation between a word obtained by the morphological analysis and the category to which the word belongs, and converts the text document into a structured document according to a standardized format based on the correlation; and

A structured document updating unit that visually displays the structured document obtained by the conversion on the display screen to prompt edit input and, when edit input is received, takes in that information and updates the content of the storage device in which the conversion results of the structured document conversion unit are stored.

[12] The structured document creation device according to claim 7, wherein the input device further comprises a speech recognition engine that creates a text document from free dictation.