WO2018120447A1 - Method, device and equipment for processing medical record information - Google Patents

Method, device and equipment for processing medical record information

Info

Publication number
WO2018120447A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
target
feature
medical
information
Prior art date
Application number
PCT/CN2017/077125
Other languages
French (fr)
Chinese (zh)
Inventor
银磊
李明修
卜海亮
魏世嘉
Original Assignee
北京搜狗科技发展有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京搜狗科技发展有限公司
Publication of WO2018120447A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00ICT specially adapted for the handling or processing of medical references

Definitions

  • The present invention relates to the field of information processing technologies, and in particular to a method, device, and equipment for processing medical record information.
  • Medical record information reflects a patient's medical treatment.
  • Medical record information can be used by doctors and patients to understand the patient's historical conditions, treatment, and so on, and can also be used to analyze data on the conditions and treatment of a large number of patients.
  • However, the medical record information that can be obtained directly is usually disorganized; that is, various information contents are pieced together indiscriminately. On the one hand, when such medical record information is displayed to a user, the user can neither read it smoothly nor quickly find the required information content. On the other hand, such medical record information is not conducive to searching for and identifying information content, so it is also difficult to use for data collation and analysis.
  • The technical problem to be solved by the present invention is to provide a method, device, and equipment for processing medical record information, so that the various information contents in the medical record information can be distinguished according to a certain structural format and the medical record information is thereby structured. This not only makes it easy for users to read and quickly find the required information content, but also facilitates data collation and analysis.
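  • An embodiment of the present invention provides a method for processing medical record information, including: obtaining an original medical text; dividing the original medical text into at least one target text unit; determining a target category corresponding to the text feature of the target text unit; and
  • generating a target medical text, wherein in the target medical text the target text unit is embodied as text information under the target category.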
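  • Determining the target category corresponding to the text feature of the target text unit may include: determining the target category according to a first machine learning model, wherein the first machine learning model is obtained by training on the correspondence between text features of historical medical texts included in a training sample set and preset categories.
  • The target category may be a category for describing patient information, a category for describing a disease name, a category for describing symptom statement information, a category for describing symptom identification information, a category for describing medical order information, or a category for describing prescription information.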
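  • The method may further include: extracting, from the original medical text, a target feature word for describing a first feature item,
  • wherein the target feature word is embodied in the target medical text as text information belonging to the first feature item.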
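  • Extracting the target feature word for describing the first feature item from the original medical text may include: extracting the target feature word from the text information under the target category to which the first feature item belongs in the original medical text.
  • Extracting the target feature word for describing the first feature item from the original medical text may include: analyzing the original medical text to obtain an initial feature word for describing the first feature item; and
  • matching the initial feature word in a standard feature vocabulary to obtain a standard feature word that matches the initial feature word, as the target feature word for describing the first feature item.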
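  • Analyzing the original medical text to obtain the initial feature word for describing the first feature item may include: performing lexical analysis and/or syntactic analysis on the original medical text based on a medical-specific vocabulary.
  • The method may further include: establishing a correspondence between the initial feature word and the target feature word for describing the first feature item, the correspondence being embodied in the target medical text.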
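  • The method may further include: determining an inferred feature word corresponding to the original medical text under a second feature item,
  • wherein the inferred feature word is a feature word for describing the second feature item that is not recorded in the original medical text; and
  • the inferred feature word is embodied in the target medical text as text information belonging to the second feature item.
  • Determining the inferred feature word corresponding to the original medical text under the second feature item may include: determining the inferred feature word according to a second machine learning model, wherein the second machine learning model is obtained by training on the correspondence between historical medical texts included in a training sample set and preset inferred feature words for describing the second feature item.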
  • The method may further include:
  • finding a preset medical text matching the target medical text, wherein the text information of the preset medical text under the target category is the same as or similar to the target text unit, and the target category includes a category for describing a patient's personal information and/or a category for describing a patient's symptoms; and
  • extracting the text information under the category for describing diagnosis information in the preset medical text, which is embodied as reference diagnostic information in the target medical text.
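  • Obtaining the original medical text may include: acquiring medical record information in voice form and performing speech recognition on it to obtain the original medical text; or acquiring medical record information in image form and performing image recognition on it to obtain the original medical text.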
  • An embodiment of the present invention provides a device for processing medical record information, including:
  • a dividing unit configured to divide the original medical text into at least one target text unit;
  • a first determining unit configured to determine a target category corresponding to the text feature of the target text unit; and
  • a generating unit configured to generate the target medical text, wherein the target text unit is embodied as text information under the target category in the target medical text.
  • the first determining unit may include:
  • a target category determining subunit configured to determine, according to a first machine learning model, the target category corresponding to the text feature of the target text unit, wherein the first machine learning model is obtained by training on the correspondence between text features of historical medical texts included in a training sample set and preset categories.
  • The target category is a category for describing patient information, a category for describing a disease name, a category for describing symptom statement information, a category for describing symptom identification information, a category for describing medical order information, or a category for describing prescription information.
  • the device may further include:
  • a first extracting unit configured to extract, from the original medical text, a target feature word for describing the first feature item, wherein, in the target medical text, the target feature word is embodied as text information belonging to the first feature item.
  • the first extracting unit may include:
  • a target feature word extracting sub-unit configured to extract the target feature word for describing the first feature item from the text information under the target category to which the first feature item belongs in the original medical text.
  • The first extracting unit may specifically include an analysis subunit and a matching subunit;
  • the analysis subunit is configured to analyze the original medical text to obtain an initial feature word for describing the first feature item; and
  • the matching subunit is configured to match the initial feature word in a standard feature vocabulary to obtain a standard feature word that matches the initial feature word, as the target feature word for describing the first feature item.
  • the matching subunit may specifically include:
  • an initial feature word extraction subunit configured to perform lexical analysis and/or syntactic analysis on the original medical text based on a medical-specific vocabulary to obtain the initial feature word for describing the first feature item.
  • the device may further include:
  • an establishing unit configured to establish a correspondence between the initial feature word and the target feature word for describing the first feature item, the correspondence being embodied in the target medical text.
  • the device may further include:
  • a second determining unit configured to determine an inferred feature word corresponding to the original medical text under the second feature item, wherein the inferred feature word is a feature word for describing the second feature item that is not recorded in the original medical text;
  • the inferred feature word is embodied in the target medical text as text information belonging to the second feature item.
  • the second determining unit may include:
  • a feature word determining subunit configured to determine, according to a second machine learning model, the inferred feature word corresponding to the original medical text under the second feature item, wherein the second machine learning model is obtained by training on the correspondence between historical medical texts included in a training sample set and preset inferred feature words for describing the second feature item.
  • the device may further include: a searching unit and a second extracting unit;
  • the searching unit is configured to search for a preset medical text matching the target medical text, wherein the text information of the preset medical text under the target category is the same as or similar to the target text unit, and the target category includes a category for describing patient personal information and/or a category for describing patient symptoms;
  • the second extracting unit is configured to extract the text information under the category for describing diagnostic information in the preset medical text, for use in generating the target medical text.
  • The obtaining unit may include a first obtaining subunit and a first recognition subunit;
  • the first obtaining subunit is configured to obtain medical record information in voice form; and
  • the first recognition subunit is configured to perform speech recognition on the medical record information to obtain the original medical text.
  • The obtaining unit may include a second obtaining subunit and a second recognition subunit;
  • the second obtaining subunit is configured to obtain medical record information in image form; and
  • the second recognition subunit is configured to perform image recognition on the medical record information to obtain the original medical text.
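  • An embodiment of the present invention provides an apparatus including a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs include instructions for performing the following operations: obtaining an original medical text; dividing the original medical text into at least one target text unit; determining a target category corresponding to the text feature of the target text unit; and
  • generating a target medical text, wherein in the target medical text the target text unit is embodied as text information under the target category.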
  • the embodiment of the invention has the following advantages:
  • With the method, device, and equipment provided by the embodiments, an unstructured original medical text is divided into at least one target text unit, and the target category corresponding to the text feature of each target text unit is determined, so that a structured target medical text can be generated in which each target text unit is embodied as text information under the target category to which it belongs.
  • Because different information contents are classified into corresponding categories in the structured target medical text, on the one hand, a user who is shown the target medical text can read it more smoothly and find the required information content more quickly; on the other hand, the categorized text information is easier to search and identify, which makes the target medical text better suited to data collation and analysis.
  • FIG. 1 is a schematic diagram of a framework of an exemplary application scenario according to an embodiment of the present invention
  • FIG. 2 is a schematic flow chart of a method for processing medical record information according to an embodiment of the present invention
  • FIG. 3 is a schematic structural diagram of a processing device for processing medical records according to an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a device according to an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a server according to an embodiment of the present invention.
  • Through research, the inventors found that the medical record information that can be obtained directly, such as medical record information input by a user, is usually disorganized: information content describing different features is pieced together indiscriminately.
  • Such disorganized medical record information is not conducive to searching for and identifying information content.
  • In the embodiments of the present invention, the original medical text is divided into at least one target text unit, the target category corresponding to the text feature of each target text unit is determined, and a structured target medical text is generated accordingly, so that each target text unit in the target medical text is embodied as text information under its target category.
  • the embodiment of the present invention can be applied to the scenario shown in FIG. 1 , where the user terminal 102 and the server 101 implement interaction through the network 103 .
  • the server 101 obtains the original medical text transmitted by the user terminal 102.
  • The server 101 divides the original medical text into at least one target text unit, determines a target category corresponding to the text feature of the target text unit, and generates a target medical text, in which the target text unit is embodied as text information under the target category.
  • the server 101 can transmit the target medical text information to the user terminal 102 for display.
  • The user terminal 102 can be any user device, whether existing, under development, or developed in the future, that can interact with the server 101 through any form of wired and/or wireless connection (e.g., Wi-Fi, LAN, cellular, coaxial cable, etc.), including but not limited to smartphones, non-smart phones, tablets, laptop personal computers, desktop personal computers, minicomputers, midrange computers, mainframe computers, and the like.
  • The server 101 is merely one example of a device, whether existing, under development, or developed in the future, that is capable of providing medical record information processing functions to a user.
  • Embodiments of the invention are not subject to any limitation in this regard.
  • Referring to FIG. 2, a schematic flowchart of a method for processing medical record information in an embodiment of the present invention is shown.
  • The method may include, for example, the following steps:
  • the original medical text to be structured can be obtained.
  • medical information can be obtained in a variety of ways.
  • the medical record information may be information input by the user.
  • the medical record information can also be information stored in a database.
  • The originally obtained medical record information may be in the form of text, images, or voice. Because this embodiment structures an original medical text in text form, where the originally obtained medical record information is already text, the original medical text may be that information itself; where the originally obtained medical record information is in a non-text form, the original medical text may be the text obtained by converting that information into text.
  • For example, step 201 may include: acquiring medical record information in voice form, and performing speech recognition on the medical record information to obtain the original medical text.
  • As another example, step 201 may include: acquiring medical record information in image form, and performing image recognition on the medical record information to obtain the original medical text.
  • The originally obtained medical record information may sometimes contain information about multiple visits by a patient.
  • In that case, the information about multiple visits can be divided into several pieces, each relating to a single visit, and each piece is then used as an original medical text for structuring. That is, the original medical text may be the medical record text related to one visit by one patient.
  • For example, if the originally obtained medical record information includes information about a first visit and information about a second visit, it can be divided by visit time into the information about the first visit and the information about the second visit, each of which is used as an original medical text for the subsequent steps.
  • The original medical text can be divided into sentences; that is, each resulting target text unit is a text sentence.
  • Alternatively, the original medical text may be divided into units such as phrases, paragraphs, and the like.
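  • As a rough illustration of sentence-level division, the following sketch splits a text on sentence-ending punctuation. The function name `split_into_target_units` and the delimiter set are assumptions made for this example; the source does not specify how sentence boundaries are detected.

```python
import re
from typing import List

# Assumed delimiter set: common Chinese and Western sentence-ending marks.
# A bare "." also splits decimals such as "3.5"; a real system would need
# a more careful rule for that case.
_SENTENCE_DELIMITERS = re.compile(r"[。！？!?；;.]+")


def split_into_target_units(original_text: str) -> List[str]:
    """Divide the original medical text into sentence-level target text units."""
    pieces = _SENTENCE_DELIMITERS.split(original_text)
    # Drop empty fragments and surrounding whitespace.
    return [piece.strip() for piece in pieces if piece.strip()]
```

  • Dividing by paragraph instead would only require a different delimiter, for example splitting on blank lines.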
  • For each target text unit obtained by dividing the original medical text, a target category matching the text feature of the target text unit may be searched for among a plurality of preset categories applicable to medical record information, thereby determining a corresponding target category for each target text unit. It can be understood that if the text feature of a target text unit matches a target category, that target category is a category for describing the target text unit.
  • The plurality of preset categories applicable to medical record information may include, for example, any of: a category for describing patient information, a category for describing a disease name, a category for describing symptom statement information, a category for describing symptom identification information, a category for describing medical order information, a category for describing prescription information, and the like. That is, for any target text unit, the corresponding target category may be any one of these categories.
  • the patient information may include, for example, a patient name, a patient gender, a patient's age, a visit time, and the like.
  • the symptom statement information may also be referred to as chief complaint information.
  • The symptom identification information may be syndrome differentiation information in the traditional Chinese medicine (TCM) sense, or may be test-result interpretation information in the Western medicine sense.
  • a machine learning model can be employed to determine a corresponding target category for the target text unit.
  • For example, step 203 may specifically be: determining, according to a first machine learning model, the target category corresponding to the text feature of the target text unit, wherein the first machine learning model is obtained by training on the correspondence between the text features of historical medical texts included in a training sample set and preset categories, each historical medical text being text information under a preset category.
  • The training process of the first machine learning model may specifically be: once a historical medical text is determined to be text information under a certain preset category, training the first machine learning model with the text feature of the historical medical text as input and the preset category to which the historical medical text belongs as output.
  • The historical medical texts used for training may include text information under each of the preset categories applicable to medical record information, so that the trained first machine learning model covers all of those preset categories.
  • A historical medical text may be a sentence; that is, each training instance uses the text information of one sentence as a historical medical text.
  • A historical medical text may also be a paragraph; that is, each training instance uses the text information of one paragraph as a historical medical text.
  • After training, the first machine learning model can represent the correspondence between text features and preset categories. Therefore, when the text feature of the target text unit is input to the trained first machine learning model, the target category output by the model is the category to which the target text unit belongs.
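  • The source does not say which kind of model the first machine learning model is. As one possible stand-in, the sketch below trains a very small multinomial naive Bayes classifier on (historical text, preset category) pairs and then predicts a category for a target text unit. The class name `NaiveBayesCategoryModel`, the whitespace tokenization, and the sample data are all assumptions for illustration only.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesCategoryModel:
    """Tiny multinomial naive Bayes over word tokens (an illustrative stand-in)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> number of training texts
        self.vocab = set()

    def train(self, samples):
        """samples: iterable of (historical_text, preset_category) pairs."""
        for text, category in samples:
            words = text.lower().split()
            self.word_counts[category].update(words)
            self.doc_counts[category] += 1
            self.vocab.update(words)

    def predict(self, text):
        """Return the preset category with the highest log-probability."""
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for category in self.doc_counts:
            counts = self.word_counts[category]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[category] / total_docs)
            for word in words:
                score += math.log((counts[word] + 1) / denom)  # Laplace smoothing
            if score > best_score:
                best, best_score = category, score
        return best


# Hypothetical training sample set: historical texts with known categories.
model = NaiveBayesCategoryModel()
model.train([
    ("patient zhang san male 45 years old", "patient information"),
    ("head very painful for three days", "chief complaint"),
    ("dizziness and nausea since yesterday", "chief complaint"),
    ("angelica 10 g twice daily", "prescription"),
    ("amoxicillin 500 mg three times daily", "prescription"),
])
```

  • With this toy data, `model.predict("the head is very painful")` selects the category "chief complaint". A production system would use richer text features and far more training data.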
  • Generating the target medical text, wherein the target text unit is embodied as text information under the target category in the target medical text.
  • Each target text unit obtained by dividing the original medical text can be organized according to the target category to which it belongs, thereby generating the target medical text.
  • The target medical text may be used, for example, for feedback to the user; that is, after step 204, the embodiment may further include: presenting the target medical text.
  • In the target medical text, each target text unit is saved in association with its target category, so that the target medical text reflects which target category each target text unit belongs to. For example, if the target text unit is "the head is very painful" and its target category is "chief complaint", the information reflected in the target medical text can be "chief complaint: the head is very painful".
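  • A minimal sketch of this generation step follows: classified target text units are grouped under their target categories and rendered as "category: text" lines. The function name `generate_target_text` and the plain-text output layout are assumptions; the source only requires that each unit be associated with its category.

```python
from typing import Dict, Iterable, List, Tuple


def generate_target_text(classified_units: Iterable[Tuple[str, str]]) -> str:
    """Group (target_text_unit, target_category) pairs into a structured text."""
    grouped: Dict[str, List[str]] = {}
    for unit, category in classified_units:
        # Units are kept in their original order within each category.
        grouped.setdefault(category, []).append(unit)
    return "\n".join(
        f"{category}: {'; '.join(units)}" for category, units in grouped.items()
    )
```

  • For instance, passing `[("the head is very painful", "chief complaint")]` yields the line `chief complaint: the head is very painful`.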
  • Individual feature items can also be set in the target medical text to reflect important feature words.
  • the embodiment may further include, for example, extracting, from the original medical text, a target feature word for describing the first feature item.
  • the target feature word in the target medical text is embodied as text information belonging to the first feature item.
  • The target feature word is saved in association with its first feature item, so the target medical text can reflect that the target feature word belongs to that first feature item. For example, if the target feature word is "angelica" and the first feature item to which it belongs is "medicinal material", the information reflected in the target medical text may be "medicinal material: angelica".
  • the target feature words under the first feature item are text information recorded in the original medical text.
  • For example, the first feature item may be a feature item for describing a patient name; that is, the target feature word may be information describing a patient name. Assuming the target feature word is "Zhang San", the first feature item and the target feature word can be embodied in the target medical text as "patient name: Zhang San".
  • As another example, the first feature item may be a feature item for describing a medicine; that is, the target feature word may be information describing a medicine. The medicine may be a traditional Chinese medicinal material or a Western medicine product. Assuming the target feature word is "amoxicillin", the first feature item and the target feature word can be embodied in the target medical text as "drug: amoxicillin". Assuming the target feature word is "angelica", they can be embodied as "medicinal material: angelica".
  • As another example, the first feature item may be a feature item for describing a dose; that is, the target feature word may be information describing a dose. Assuming the target feature word is "10 grams", the first feature item and the target feature word can be embodied in the target medical text as "dose: 10 grams".
  • As another example, the first feature item may be a feature item for describing a symptom; that is, the target feature word may be information describing a symptom. Assuming the target feature word is "headache", the first feature item and the target feature word can be embodied in the target medical text as "symptom: headache".
  • the feature words of the same meaning may be normalized so that the same feature word is used in the target medical text to describe the same meaning.
  • The process of extracting the target feature word may include, for example: analyzing the original medical text to obtain an initial feature word for describing the first feature item; and matching the initial feature word in a standard feature vocabulary to obtain a standard feature word that matches the initial feature word, as the target feature word for describing the first feature item.
  • The standard feature vocabulary designates one standard feature word for a plurality of feature words that describe the same meaning, and also records the correspondence between the non-standard feature words and the standard feature word of that meaning. If the initial feature word is a non-standard feature word in the standard feature vocabulary, the standard feature word corresponding to it in the vocabulary can be used as the target feature word. If the initial feature word is itself a standard feature word in the standard feature vocabulary, the initial feature word can be used as the target feature word. For example, synonymous expressions for a headache can all be normalized into "headache"; that is, the variant expressions are non-standard feature words, and "headache" is the standard feature word.
  • The method may further include: establishing a correspondence between the initial feature word and the target feature word for describing the first feature item, the correspondence being embodied in the target medical text. That is, the mutually corresponding initial feature word and target feature word may both be included in the target medical text. For example, if the initial feature word is "headache" and the target feature word is "headache", they can be embodied in the target medical text as "original word: headache; standard symptom: headache". As another example, if the initial feature word is "twenty g" and the target feature word is "20 g", they can be embodied as "original word: twenty g; standard dose: 20 g".
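  • The vocabulary lookup and the recorded correspondence can be sketched as a dictionary mapping each standard feature word to its non-standard variants. The vocabulary entries below (for example "head pain") and the function names `normalize` and `describe` are hypothetical; the source does not list the actual vocabulary contents.

```python
from typing import Optional

# Hypothetical standard feature vocabulary: standard word -> non-standard variants.
STANDARD_FEATURE_VOCABULARY = {
    "headache": ["head pain", "pain in the head"],
    "20 g": ["twenty g", "twenty grams"],
}

# Reverse index from each non-standard feature word to its standard feature word.
_NON_STANDARD_TO_STANDARD = {
    variant: standard
    for standard, variants in STANDARD_FEATURE_VOCABULARY.items()
    for variant in variants
}


def normalize(initial_word: str) -> Optional[str]:
    """Return the standard feature word, or None if the word is not in the vocabulary."""
    word = initial_word.strip().lower()
    if word in STANDARD_FEATURE_VOCABULARY:
        return word  # already a standard feature word
    return _NON_STANDARD_TO_STANDARD.get(word)


def describe(initial_word: str, label: str) -> str:
    """Render the initial/target correspondence as it might appear in the target text."""
    target = normalize(initial_word)
    if target is None:
        return f"original word: {initial_word}"
    return f"original word: {initial_word}; standard {label}: {target}"
```

  • Here `describe("twenty g", "dose")` produces `original word: twenty g; standard dose: 20 g`, matching the form of the dose example in the text.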
  • The original medical text can be analyzed by combining lexical analysis and syntactic analysis with a medical-specific vocabulary, which makes the extraction of feature words more accurate.
  • Specifically, to obtain an initial feature word, lexical analysis and/or syntactic analysis may be performed on the original medical text based on the medical-specific vocabulary to obtain the initial feature word for describing the first feature item. For example, suppose the original medical text records that "the head is very painful". Here "head" is a noun and the subject and denotes a body part, while "pain" is a verb and the predicate and indicates the state of that body part. Based on this, the initial feature word can be determined to be "headache".
  • For some first feature items, the target feature words can be identified based on corresponding specific rules.
  • For example, for a first feature item describing age, the target feature word may be extracted based on an age recognition rule (e.g., the feature word includes "number + year" or "number + ten").
  • As another example, for a first feature item describing time, the target feature word may be extracted based on a time recognition rule (e.g., the feature word includes "year", "month", "day", or a separator such as "." or "/").
  • For other first feature items, the target feature words can be identified by a specific recognition technique. For example, for the first feature item "patient name", the target feature word can be extracted based on named entity recognition from natural language processing.
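  • Rule-based recognition of this kind is commonly written with regular expressions. The patterns below are assumptions chosen for illustration (an age followed by "years old" or "岁", and a date written with "年/月/日" or with ".", "/", or "-" separators); they are not the exact rules of the source.

```python
import re
from typing import Optional

# Assumed age rule: a 1-3 digit number followed by "year(s) old" or "岁".
AGE_RULE = re.compile(r"(?<!\d)(\d{1,3})\s*(?:years?\s*old|岁)")

# Assumed time rule: year, month, day joined by "年/月/日" or by ".", "/", "-".
TIME_RULE = re.compile(
    r"\d{4}\s*(?:[./-]|年)\s*\d{1,2}\s*(?:[./-]|月)\s*\d{1,2}(?:\s*日)?"
)


def extract_age(text: str) -> Optional[str]:
    """Return the first age-like number in the text, or None."""
    match = AGE_RULE.search(text)
    return match.group(1) if match else None


def extract_visit_time(text: str) -> Optional[str]:
    """Return the first date-like expression in the text, or None."""
    match = TIME_RULE.search(text)
    return match.group(0) if match else None
```

  • Patient names, by contrast, are not easily captured by fixed patterns, which is why the text points to named entity recognition for that feature item.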
  • the first feature item is a feature belonging to one or several target categories, that is, the target feature words under the first feature item are present in the text information under the target category.
  • Accordingly, the target feature word for describing the first feature item may be extracted specifically from the text information under the target category to which the first feature item belongs. That is, after step 203, the target feature word for describing the first feature item is extracted from the text information under the target category to which the first feature item belongs in the original medical text.
  • The text information under the target category includes all target text units corresponding to the target category.
  • For example, the first feature item "drug" is a feature item belonging to the target category "prescription"; that is, the information corresponding to the first feature item "drug" exists in the text information belonging to the category "prescription". Therefore, after the text information belonging to the category "prescription" has been determined by classifying the original medical text, the target feature word corresponding to the first feature item "drug" can be searched for and extracted within that text information.
  • the target feature words of the first feature item may also be searched and extracted from all the text information of the original medical text.
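  • The difference between category-scoped extraction and whole-text extraction can be sketched as follows. The drug lexicon `KNOWN_DRUGS`, the function name `extract_drugs`, and the simple token matching are hypothetical; the source does not describe how drug names are recognized.

```python
import re
from typing import Dict, List

# Hypothetical drug lexicon used only for this illustration.
KNOWN_DRUGS = {"angelica", "amoxicillin"}


def extract_drugs(units_by_category: Dict[str, List[str]],
                  category: str = "prescription") -> List[str]:
    """Extract drug feature words only from units under the given target category."""
    found: List[str] = []
    for unit in units_by_category.get(category, []):
        for token in re.findall(r"[a-z]+", unit.lower()):
            if token in KNOWN_DRUGS and token not in found:
                found.append(token)
    return found
```

  • Restricting the search to the "prescription" category avoids picking up a drug that is merely mentioned in the chief complaint, whereas searching all text information would return every mention.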
  • The embodiment may further include: determining an inferred feature word corresponding to the original medical text under a second feature item, wherein the inferred feature word is a feature word for describing the second feature item that is not recorded in the original medical text; the inferred feature word is embodied in the target medical text as text information belonging to the second feature item.
  • In the target medical text, the inferred feature word is saved as text information belonging to the corresponding second feature item. For example, if the original medical text does not record the gender of the patient but it can be inferred from the original medical text that the patient is female, then the inferred feature word is "female", the second feature item is "patient gender", and the information reflected in the target medical text can be "gender: female".
  • the inferred feature words belonging to the second feature item are text information not directly recorded in the original medical text.
  • the second feature item may be a feature item for describing the gender of the patient, that is, the inferred feature word may be a feature word for describing the gender of the patient. Assuming that the inferred feature word is "male”, the second feature item and the corresponding inferred feature word in the target medical text can be embodied as "patient gender: male".
  • the second feature item may be a feature item for describing the age of the patient, that is, the inferred feature word may be a feature word for describing the age of the patient. Assuming that the inferred feature word is "middle age”, the second feature item and the corresponding inferred feature word in the target medical text can be embodied as "patient age: middle age”.
  • determining the inferred feature word may include: determining, based on a second machine learning model, the inferred feature word corresponding to the original medical text under the second feature item, wherein the second machine learning model is obtained by training on the correspondence between the historical medical texts included in a training sample set and preset inferred feature words for describing the second feature item, the preset inferred feature words being those that can be inferred from the historical medical texts.
  • the training process of the second machine learning model may specifically be as follows: for a historical medical text from which a definite feature word is difficult to extract, once the inferred feature word corresponding to that historical medical text has been determined, the second machine learning model is trained with the historical medical text as input and the inferred feature word as output. It can be understood that, after training on a certain number of historical medical texts and their corresponding inferred feature words, the second machine learning model can represent the correspondence between medical text and inferred feature words. Therefore, when the original medical text is input to the trained second machine learning model, the inferred feature word output by the second machine learning model is a feature that the original medical text can reflect.
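  • As an illustration only, training and applying such a model might be sketched in Python as follows. The text does not name a model type, so a small multinomial naive Bayes classifier is used here as a stand-in; the class name `NaiveBayesInferrer`, the whitespace tokenization, and the four training pairs are all assumptions invented for this sketch.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesInferrer:
    """Tiny multinomial naive Bayes; a stand-in for the second machine learning model."""

    def fit(self, texts, labels):
        # Count how often each label occurs and which words occur under each label.
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        words = text.lower().split()
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocabulary)
            score = math.log(doc_count / total_docs)
            for w in words:
                # Laplace smoothing keeps unseen words from zeroing the score.
                score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training pairs: historical medical text -> preset inferred feature word.
history = [
    ("irregular menstruation with abdominal pain", "female"),
    ("pregnancy nausea and fatigue", "female"),
    ("prostate enlargement with frequent urination", "male"),
    ("prostate discomfort and lower back pain", "male"),
]
model = NaiveBayesInferrer().fit(*zip(*history))

# The original medical text does not state the gender; the model infers it.
inferred = model.predict("nausea during pregnancy")
print(f"patient gender: {inferred}")
```

  • In this sketch the inferred feature word is attached to the second feature item "patient gender" when the target medical text is generated; a practical system would need a far larger training sample set and proper word segmentation.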
  • preset medical texts whose symptoms, patient information, and the like are the same as or similar to those in the original medical text provided by the user may be found, and the text content of the diagnostic information in those preset medical texts may be extracted and embodied in the target medical text as reference diagnostic information for the user to consult. In this way, the user can obtain diagnostic information recommended as a reference simply by inputting patient information, thereby realizing a "self-diagnosis" function.
  • the embodiment may further include, for example: searching for a preset medical text that matches the target medical text, wherein the text information of the preset medical text under the target category is the same as or similar to the target text unit, and the target category includes a category for describing patient personal information and/or a category for describing patient symptoms; and extracting the text information under a category for describing diagnostic information in the preset medical text, to be embodied in the target medical text as reference diagnostic information.
  • the category for describing the diagnosis information may be, for example, a category for describing the prescription information, a category for describing the condition discrimination information, and/or a category for describing the medical order information.
  • the preset medical text may be, for example, pre-collected classic medical information or medical information provided by a medical expert.
  • the text information used to match the original medical text and the preset medical text may be text information under a target category, or may be text information under multiple target categories.
  • different matching weights can be set for different target categories to measure the degree of matching between the original medical text and the preset medical text. For example, the text information used to match the original medical text and the preset medical text may be the text information under the four target categories “disease”, “patient age”, “patient gender”, and “visiting time”. Because “visiting time” has a relatively small impact on the diagnostic information, “disease”, “patient age”, and “patient gender” can adopt relatively large matching weights, while “visiting time” can adopt a relatively small matching weight.
  • depending on the weighted comparison, the result of the matching may be that the original medical text matches the preset medical text. If, however, the original medical text and the preset medical text are largely consistent in the text information under “disease” and “visiting time” but inconsistent under “patient gender”, the matching result may be that the original medical text and the preset medical text do not match.
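  • The weighted matching described above can be sketched roughly as follows. The text only states that some categories receive larger weights than others, so the integer weights, the 0.8 threshold, the exact-equality comparison, and the names `WEIGHTS`, `match_score`, and `is_match` are all assumptions made for this illustration.

```python
# Hypothetical weights: larger for categories that strongly affect the diagnosis,
# smaller for "visiting time". The values themselves are invented.
WEIGHTS = {
    "disease": 4,
    "patient age": 3,
    "patient gender": 3,
    "visiting time": 1,
}


def match_score(original, preset, weights=WEIGHTS):
    """Return the weighted share of categories whose text information agrees."""
    total = sum(weights.values())
    agreed = sum(
        weight
        for category, weight in weights.items()
        if category in original and original[category] == preset.get(category)
    )
    return agreed / total


def is_match(original, preset, threshold=0.8):
    """Treat the two texts as matching when the weighted score reaches the threshold."""
    return match_score(original, preset) >= threshold


original = {"disease": "cough", "patient age": "middle age",
            "patient gender": "female", "visiting time": "spring"}
# Differs only in the low-weight category.
preset_a = dict(original, **{"visiting time": "autumn"})
# Agrees on "disease" and "visiting time" but differs in "patient gender".
preset_b = dict(original, **{"patient gender": "male"})

print(is_match(original, preset_a))  # score 10/11
print(is_match(original, preset_b))  # score 8/11
```

  • With these assumed values, a difference in “visiting time” alone still yields a match, whereas a difference in “patient gender” does not, which mirrors the example in the text. A real system would likely use a similarity measure rather than exact equality, since the text allows "same or similar" information.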
  • the original medical text and the target medical text may be medical texts of traditional Chinese medicine, or may be medical texts of Western medicine.
  • the original medical text is divided into at least one target text unit, and the target category corresponding to the text feature of each target text unit is determined, so that a structured target medical text can be generated in which each target text unit is embodied as text information under its target category. Because different information contents are classified into corresponding categories in the structured target medical text, on the one hand, when the target medical text is displayed to the user, the user can not only read it more smoothly but also find the desired information content more quickly; on the other hand, the categorized text content in the target medical text facilitates the search and identification of information content, which also makes the target medical text more suitable for data collation and analysis.
  • the device may specifically include:
  • the obtaining unit 301 is configured to obtain the original medical text
  • a dividing unit 302 configured to divide the original medical text into at least one target text unit
  • a first determining unit 303 configured to determine a target category corresponding to the text feature of the target text unit
  • the generating unit 304 is configured to generate target medical text, wherein the target text unit is embodied as text information under the target category in the target medical text.
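  • A minimal sketch of how the four units could cooperate is given below. It is not the implementation described here: the keyword table stands in for the trained classifier, sentence-level splitting is only one possible way to form target text units, and the names `CATEGORY_KEYWORDS`, `divide`, `determine_category`, and `generate` are hypothetical.

```python
import re

# Hypothetical keyword rules standing in for the first machine learning model.
CATEGORY_KEYWORDS = {
    "patient information": ("patient", "aged", "years old"),
    "symptom statement": ("complains", "cough", "fever", "pain"),
    "prescription": ("mg", "tablet", "decoction"),
}


def divide(original_text):
    """Dividing unit: split the original medical text into sentence-level units."""
    units = re.split(r"[.;]\s*", original_text)
    return [u.strip() for u in units if u.strip()]


def determine_category(unit):
    """First determining unit: map a target text unit to a target category."""
    lowered = unit.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "other"


def generate(original_text):
    """Generating unit: place each target text unit under its target category."""
    target = {}
    for unit in divide(original_text):
        target.setdefault(determine_category(unit), []).append(unit)
    return target


# Obtaining unit: here the original medical text is simply given as a string.
text = ("Patient Wang, aged 45. Complains of cough and fever for three days. "
        "Amoxicillin 500 mg three times daily.")
result = generate(text)
for category, units in result.items():
    print(f"{category}: {'; '.join(units)}")
```

  • The naive splitting on periods would break decimal numbers such as dosages, so a practical dividing unit would need more careful segmentation.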
  • the first determining unit 303 may include:
  • a target category determining subunit configured to determine, based on a first machine learning model, the target category corresponding to the text feature of the target text unit, wherein the first machine learning model is obtained by training on the correspondence between the text features of the historical medical texts included in a training sample set and preset categories.
  • the target category may be a category for describing patient information, a category for describing a disease name, a category for describing symptom statement information, a category for describing symptom identification information, a category for describing medical order information, or a category for describing prescription information.
  • the device may further include:
  • a first extracting unit configured to extract, from the original medical text, a target feature word for describing the first feature item; wherein, in the target medical text, the target feature word is embodied as text information belonging to the first feature item.
  • the first extracting unit may include:
  • a target feature word extracting sub-unit configured to extract the target feature word for describing the first feature item from the text information under the target category to which the first feature item belongs in the original medical text.
  • the first extracting unit may specifically include: an analyzing subunit and a matching subunit;
  • the analysis subunit is configured to analyze the original medical text to obtain an initial feature word for describing the first feature item
  • the matching subunit is configured to match the initial feature word in a standard feature vocabulary to obtain a standard feature word that matches the initial feature word, as the target feature word for describing the first feature item.
  • the analysis subunit may include:
  • an initial feature word extraction subunit configured to perform lexical analysis and/or syntactic analysis on the original medical text based on a medical-specific vocabulary to obtain the initial feature word for describing the first feature item.
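  • For illustration, the two steps (finding initial feature words with a medical-specific vocabulary, then matching them against a standard feature vocabulary) might look like the sketch below. Simple dictionary lookup stands in for lexical and syntactic analysis, and both vocabularies, together with the names `extract_initial_feature_words` and `to_standard`, are invented for this example.

```python
# Hypothetical vocabularies; a real system would load curated medical lexicons.
MEDICAL_VOCABULARY = ("licorice root", "liquorice", "gan cao", "ginseng")
STANDARD_VOCABULARY = {
    "licorice": {"licorice root", "liquorice", "gan cao"},
    "ginseng": {"ginseng", "ren shen"},
}


def extract_initial_feature_words(text, vocabulary=MEDICAL_VOCABULARY):
    """Dictionary-based lookup: find vocabulary terms, longest terms first."""
    lowered = text.lower()
    found = []
    for term in sorted(vocabulary, key=len, reverse=True):
        position = lowered.find(term)
        if position != -1:
            found.append((position, term))
            # Blank out the term so shorter terms cannot match inside it.
            lowered = lowered.replace(term, " " * len(term))
    # Return the terms in the order they appear in the text.
    return [term for _, term in sorted(found)]


def to_standard(initial_word, standard=STANDARD_VOCABULARY):
    """Return the standard feature word matching an initial feature word, if any."""
    for standard_word, variants in standard.items():
        if initial_word == standard_word or initial_word in variants:
            return standard_word
    return None


text = "Prescription: Gan Cao 6 g, ginseng 9 g"
pairs = [(word, to_standard(word)) for word in extract_initial_feature_words(text)]
for initial, target in pairs:
    print(f"drug: {target} (recorded as: {initial})")
```

  • Keeping each pair of initial and standard words also gives a simple way to record the correspondence between the initial feature word and the target feature word in the target medical text.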
  • the device may further include:
  • an establishing unit configured to establish a correspondence between the initial feature word and the target feature word used to describe the first feature item, and to embody the correspondence in the target medical text.
  • the first feature item may be a feature item for describing a patient name, a feature item for describing a medicine, a feature item for describing a dose, or a feature item for describing a symptom.
  • the device may further include:
  • a second determining unit configured to determine an inferred feature word corresponding to the original medical text under the second feature item, wherein the inferred feature word is a feature word for describing the second feature item that is not recorded in the original medical text;
  • in the target medical text, the inferred feature word is embodied as text information belonging to the second feature item.
  • the second determining unit may include:
  • a feature word determining subunit configured to determine, based on a second machine learning model, the inferred feature word corresponding to the original medical text under the second feature item, wherein the second machine learning model is obtained by training on the correspondence between the historical medical texts included in a training sample set and preset inferred feature words for describing the second feature item.
  • the second feature item may be a feature item for describing a gender of the patient or a feature item for describing the age of the patient.
  • the device may further include: a searching unit and a second extracting unit;
  • the searching unit is configured to search for a preset medical text matching the target medical text, wherein the text information of the preset medical text under the target category is the same as or similar to the target text unit, and the target category includes a category for describing patient personal information and/or a category for describing patient symptoms;
  • the second extracting unit is configured to extract text information in a category for describing diagnostic information in the preset medical text for generating the target medical text.
  • the original medical text may be medical text information related to one diagnosis for one patient.
  • the obtaining unit 301 may include: a first acquiring subunit and a first identifying subunit;
  • the first obtaining subunit is configured to obtain medical record information in a voice form
  • the first identification subunit is configured to perform voice recognition on the medical record information to obtain the original medical record text.
  • the obtaining unit 301 may include: a second acquiring subunit and a second identifying subunit;
  • the second obtaining subunit is configured to acquire medical record information in an image form
  • the second identification subunit is configured to perform image recognition on the medical record information to obtain the original medical record text.
  • the device may further include:
  • a presentation unit for presenting the target medical text.
  • the original medical text is divided into at least one target text unit, and the target category corresponding to the text feature of each target text unit is determined, so that a structured target medical text can be generated in which each target text unit is embodied as text information under its target category. Because different information contents are classified into corresponding categories in the structured target medical text, on the one hand, when the target medical text is displayed to the user, the user can not only read it more smoothly but also find the desired information content more quickly; on the other hand, the categorized text content in the target medical text facilitates the search and identification of information content, which also makes the target medical text more suitable for data collation and analysis.
  • apparatus 1800 can include one or more of the following components: processing component 1802, memory 1804, power component 1806, multimedia component 1806, audio component 1810, input/output (I/O) interface 1812, sensor component 1814, and a communication component 1816.
  • Processing component 1802 typically controls the overall operation of device 1800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations.
  • Processing component 1802 can include one or more processors 1820 to execute instructions to perform all or part of the steps described above.
  • processing component 1802 can include one or more modules to facilitate interaction between component 1802 and other components.
  • processing component 1802 can include a multimedia module to facilitate interaction between multimedia component 1806 and processing component 1802.
  • Memory 1804 is configured to store various types of data to support operation at device 1800. Examples of such data include instructions for any application or method operating on device 1800, contact data, phone book data, messages, pictures, videos, and the like. Memory 1804 can be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read only memory (EEPROM), erasable programmable read only memory (EPROM), programmable read only memory (PROM), read only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
  • Power component 1806 provides power to various components of device 1800.
  • Power component 1806 can include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for device 1800.
  • Multimedia component 1806 includes a screen that provides an output interface between the device 1800 and the user.
  • the screen can include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen can be implemented as a touch screen to receive input signals from the user.
  • the touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensor may sense not only the boundary of the touch or sliding action, but also the duration and pressure associated with the touch or slide operation.
  • the multimedia component 1806 includes a front camera and/or a rear camera. When the device 1800 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front and rear camera can be a fixed optical lens system or have focal length and optical zoom capabilities.
  • the audio component 1810 is configured to output and/or input an audio signal.
  • audio component 1810 includes a microphone (MIC) that is configured to receive an external audio signal when device 1800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode.
  • the received audio signal may be further stored in memory 1804 or transmitted via communication component 1816.
  • the audio component 1810 also includes a speaker for outputting an audio signal.
  • the I/O interface 1812 provides an interface between the processing component 1802 and a peripheral interface module, which may be a keyboard, a click wheel, a button, or the like. These buttons may include, but are not limited to, a home button, a volume button, a start button, and a lock button.
  • Sensor assembly 1814 includes one or more sensors for providing device 1800 with a status assessment of various aspects.
  • sensor assembly 1814 can detect an open/closed state of device 1800 and the relative positioning of components, such as the display and keypad of device 1800; sensor assembly 1814 can also detect a change in position of device 1800 or of one component of device 1800, the presence or absence of user contact with device 1800, the orientation or acceleration/deceleration of device 1800, and a temperature change of device 1800.
  • Sensor assembly 1814 can include a proximity sensor configured to detect the presence of nearby objects without any physical contact.
  • Sensor assembly 1814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications.
  • the sensor assembly 1814 can also include an acceleration sensor, a gyro sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
  • Communication component 1816 is configured to facilitate wired or wireless communication between device 1800 and other devices.
  • the device 1800 can access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof.
  • communication component 1816 receives broadcast signals or broadcast associated information from an external broadcast management system via a broadcast channel.
  • the communication component 1816 also includes a near field communication (NFC) module to facilitate short range communication.
  • the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
  • device 1800 may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above methods.
  • FIG. 5 is a schematic structural diagram of a server in an embodiment of the present invention.
  • the server 1900 can vary considerably depending on configuration or performance, and can include one or more central processing units (CPUs) 1922 (e.g., one or more processors), memory 1932, and one or more storage media 1930 (e.g., one or more mass storage devices) storing applications 1942 or data 1944.
  • the memory 1932 and the storage medium 1930 may be short-term storage or persistent storage.
  • the program stored on storage medium 1930 may include one or more modules (not shown), each of which may include a series of instruction operations in the server.
  • central processor 1922 can be configured to communicate with storage medium 1930 and to execute, on the server 1900, the series of instruction operations in storage medium 1930.
  • Server 1900 may also include one or more power sources 1926, one or more wired or wireless network interfaces 1950, one or more input and output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, for example, Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
  • An embodiment of the present invention provides an apparatus.
  • the device includes a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for performing the following operations:
  • in the target medical text, the target text unit is embodied as text information under the target category.
  • the device may be specifically the foregoing device 1800
  • the memory may be specifically the memory 1804 in the foregoing device 1800
  • the processor may be specifically the processor 1820 in the foregoing device 1800.
  • the device may be specifically the foregoing server 1900
  • the processor may be specifically the central processor 1922 in the foregoing server 1900
  • the memory may be specifically the storage medium 1930 in the foregoing server 1900.
  • the processor can specifically execute the following operations:
  • the target category may be a category for describing patient information, a category for describing a disease name, a category for describing symptom statement information, a category for describing symptom identification information, a category for describing medical order information, or a category for describing prescription information.
  • the processor may further execute an instruction of:
  • the target feature word in the target medical text is embodied as text information belonging to the first feature item.
  • the processor may specifically execute an instruction of:
  • the processor may specifically execute an instruction of:
  • the initial feature words are matched in a standard feature vocabulary to obtain a standard feature word that matches the initial feature word as the target feature word for describing the first feature item.
  • the processor may specifically execute an instruction of:
  • the processor may further execute an instruction of:
  • the first feature item may be a feature item for describing a patient name, a feature item for describing a medicine, a feature item for describing a dose, or a feature item for describing a symptom.
  • the processor may further execute an instruction of:
  • the inferred feature word is a feature word for describing the second feature item that is not recorded in the original medical text;
  • in the target medical text, the inferred feature word is embodied as text information belonging to the second feature item.
  • the processor may specifically execute an instruction of:
  • the second feature item may be a feature item for describing a gender of the patient or a feature item for describing the age of the patient.
  • the processor may further execute an instruction of:
  • the target category includes categories for describing patient personal information and/or categories for describing patient symptoms;
  • extracting the text information under the category for describing diagnostic information in the preset medical text, to be embodied as reference diagnostic information in the target medical text.
  • the original medical text may be medical text information related to one diagnosis for one patient.
  • the processor may specifically execute the following operations:
  • the processor may specifically perform the following operations:
  • the processor may further execute an instruction of:
  • Embodiments of the present invention also provide a non-transitory computer readable storage medium including instructions, such as a memory 1804 including instructions that are executable by the processor 1820 of the apparatus 1800 to perform the above method, or a storage medium 1930 including instructions that are executable by the central processor 1922 of the server 1900 to perform the above method.
  • the non-transitory computer readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, or an optical data storage device.
  • a non-transitory computer readable storage medium, wherein the instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform a method for processing medical record information, the method comprising:
  • in the target medical text, the target text unit is embodied as text information under the target category.

Abstract

Disclosed by the present invention is a method for processing medical record information. The method comprises: acquiring an original medical record text and dividing the original medical record text into at least one target text unit; determining a target category corresponding to a text feature of the target text unit; and generating a target medical record text, the target text unit in the target medical record text being embodied as text information under the target category. By means of the method provided by the embodiments of the present invention, different information content in a structured target medical record text is respectively divided into corresponding categories, which not only may enable a user to read more smoothly, but may also enable the user to find desired information content more quickly so that the target medical record text is more favorable for data organization and analysis. In addition, also disclosed by the present invention are a device and equipment for processing medical record information.

Description

Method, device and equipment for processing medical record information
This application claims priority to Chinese Patent Application No. 201611236257.2, filed with the Chinese Patent Office on December 28, 2016 and entitled "Method, device and equipment for processing medical record information", the entire contents of which are incorporated herein by reference.
Technical field
The present invention relates to the field of information processing technologies, and in particular to a method, a device and equipment for processing medical record information.
Background
At present, medical record information has become a very common object of processing in information processing technology. Because medical record information reflects a patient's medical treatment, it can be used by doctors and patients to understand the patient's past conditions and treatment, and it can also be used for data analysis of the conditions and treatment of a large number of patients.
However, the medical record information that can be obtained directly is usually disorganized; that is, different kinds of information content are pieced together without distinction. Therefore, on the one hand, when such medical record information is displayed to a user, the user can neither read it smoothly nor quickly find the required information content. On the other hand, such medical record information is not conducive to the search and identification of information content, and is therefore also difficult to use for data collation and analysis.
Summary of the invention
The technical problem to be solved by the present invention is to provide a method, a device and equipment for processing medical record information, so that the different kinds of information content in medical record information can be separated according to a certain structural format. This structuring of medical record information not only makes it easy for users to read and quickly find the information content they need, but also facilitates data collation and analysis.
In a first aspect, an embodiment of the present invention provides a method for processing medical record information, including:
acquiring an original medical text, and dividing the original medical text into at least one target text unit;
determining a target category corresponding to a text feature of the target text unit;
generating a target medical text, wherein the target text unit is embodied in the target medical text as text information under its target category.
Optionally, determining the target category corresponding to the text feature of the target text unit may include:
determining, based on a first machine learning model, the target category corresponding to the text feature of the target text unit, wherein the first machine learning model is obtained by training on the correspondence between the text features of the historical medical texts included in a training sample set and preset categories.
Optionally, the target category may be a category for describing patient information, a category for describing a disease name, a category for describing symptom statement information, a category for describing symptom identification information, a category for describing medical order information, or a category for describing prescription information.
Optionally, the method may further include:
extracting, from the original medical record text, a target feature word for describing a first feature item;
wherein, in the target medical record text, the target feature word is presented as text information belonging to the first feature item.
Optionally, extracting from the original medical record text the target feature word for describing the first feature item may include:
extracting the target feature word for describing the first feature item from the text information that, in the original medical record text, falls under the target category to which the first feature item belongs.
Optionally, extracting from the original medical record text the target feature word for describing the first feature item may include:
analyzing the original medical record text to obtain an initial feature word for describing the first feature item;
matching the initial feature word against a standard feature lexicon to obtain a standard feature word that matches the initial feature word, and using that standard feature word as the target feature word for describing the first feature item.
Optionally, analyzing the original medical record text to obtain the initial feature word for describing the first feature item may include:
performing lexical analysis and/or syntactic analysis on the original medical record text based on a medical-specific lexicon, to obtain the initial feature word for describing the first feature item.
Optionally, the method may further include:
establishing a correspondence between the initial feature word and the target feature word for describing the first feature item, and reflecting the correspondence in the target medical record text.
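The matching of initial feature words against a standard feature lexicon, together with the recorded correspondence, can be sketched as follows. This is a minimal illustration only: the lexicon contents, the name `STANDARD_LEXICON`, and the function `normalize_feature_words` are assumptions introduced here and are not taken from the embodiments.

```python
# Hypothetical standard feature lexicon: each standard feature word maps to
# the surface variants that are assumed to match it.
STANDARD_LEXICON = {
    "headache": {"headache", "head pain", "head hurts", "cephalalgia"},
    "fever": {"fever", "high temperature", "pyrexia"},
}


def normalize_feature_words(initial_words):
    """Return {initial feature word: standard (target) feature word}.

    Initial feature words with no match in the lexicon are left out.
    """
    mapping = {}
    for initial in initial_words:
        key = initial.strip().lower()
        for standard, variants in STANDARD_LEXICON.items():
            if key in variants:
                mapping[initial] = standard
                break
    return mapping


if __name__ == "__main__":
    # The returned mapping is the correspondence that could be shown
    # alongside the target feature words in the target medical record text.
    print(normalize_feature_words(["Head pain", "pyrexia", "dizzy"]))
```

Exact lookup is used here for simplicity; an actual implementation might instead use fuzzy or synonym-based matching.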
Optionally, the method may further include:
determining an inferred feature word corresponding to the original medical record text under a second feature item, wherein the inferred feature word is a feature word for describing the second feature item that is not recorded in the original medical record text;
wherein, in the target medical record text, the inferred feature word is presented as text information belonging to the second feature item.
Optionally, determining the inferred feature word corresponding to the original medical record text under the second feature item may include:
determining, based on a second machine learning model, the inferred feature word corresponding to the original medical record text under the second feature item, wherein the second machine learning model is obtained by training on the correspondence between historical medical record texts included in a training sample set and preset inferred feature words for describing the second feature item.
Optionally, after generating the target medical record text, the method may further include:
searching for a preset medical record text that matches the target medical record text, wherein the text information of the preset medical record text under the target category is the same as or similar to the target text unit, and the target category includes a category for describing the patient's personal information and/or a category for describing the patient's symptoms;
extracting the text information under the category for describing diagnostic information in the preset medical record text, and presenting it in the target medical record text as reference diagnostic information.
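One way to picture the lookup of a matching preset medical record text is a similarity comparison over the text held under the relevant categories. The sketch below assumes records are plain dictionaries keyed by category name and uses Jaccard similarity over word tokens with a threshold of 0.5; the similarity measure, the threshold, the sample data, and the names `jaccard` and `find_reference_diagnosis` are all illustrative assumptions, since the text only requires the information to be "the same or similar".

```python
import re


def _tokens(text):
    """Lower-cased word tokens of a text, as a set."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between the token sets of two texts."""
    ta, tb = _tokens(a), _tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def find_reference_diagnosis(target, presets,
                             categories=("Chief complaint",), threshold=0.5):
    """Return the diagnosis text of the most similar preset record, or None."""
    best, best_score = None, threshold
    for preset in presets:
        # A preset must be similar under every compared category.
        score = min(jaccard(target.get(c, ""), preset.get(c, ""))
                    for c in categories)
        if score >= best_score:
            best, best_score = preset, score
    return best.get("Diagnosis") if best else None


if __name__ == "__main__":
    # Hypothetical sample records, invented for illustration.
    presets = [
        {"Chief complaint": "cough and fever", "Diagnosis": "common cold"},
        {"Chief complaint": "severe headache for two days", "Diagnosis": "migraine"},
    ]
    target = {"Chief complaint": "severe headache for three days"}
    print(find_reference_diagnosis(target, presets))
```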
Optionally, acquiring the original medical record text may include:
acquiring medical record information in speech form, and performing speech recognition on the medical record information to obtain the original medical record text;
or,
acquiring medical record information in image form, and performing image recognition on the medical record information to obtain the original medical record text.
In a second aspect, an embodiment of the present invention provides a device for processing medical record information, including:
an acquisition unit, configured to acquire an original medical record text;
a dividing unit, configured to divide the original medical record text into at least one target text unit;
a first determining unit, configured to determine the target category corresponding to the text feature of the target text unit;
a generating unit, configured to generate a target medical record text, wherein, in the target medical record text, the target text unit is presented as text information under the target category to which it belongs.
Optionally, the first determining unit may include:
a target category determining subunit, configured to determine, based on a first machine learning model, the target category corresponding to the text feature of the target text unit, wherein the first machine learning model is obtained by training on the correspondence between the text features of historical medical record texts included in a training sample set and preset categories.
Optionally, the target category is a category for describing patient information, a category for describing a disease name, a category for describing symptom statement information, a category for describing symptom differentiation information, a category for describing doctor's order information, or a category for describing prescription information.
Optionally, the device may further include:
a first extraction unit, configured to extract, from the original medical record text, a target feature word for describing a first feature item; wherein, in the target medical record text, the target feature word is presented as text information belonging to the first feature item.
Optionally, the first extraction unit may include:
a target feature word extraction subunit, configured to extract the target feature word for describing the first feature item from the text information that, in the original medical record text, falls under the target category to which the first feature item belongs.
Optionally, the first extraction unit may specifically include an analysis subunit and a matching subunit;
the analysis subunit is configured to analyze the original medical record text to obtain an initial feature word for describing the first feature item;
the matching subunit is configured to match the initial feature word against a standard feature lexicon to obtain a standard feature word that matches the initial feature word, and to use that standard feature word as the target feature word for describing the first feature item.
Optionally, the matching subunit may specifically include:
an initial feature word extraction subunit, configured to perform lexical analysis and/or syntactic analysis on the original medical record text based on a medical-specific lexicon, to obtain the initial feature word for describing the first feature item.
Optionally, the device may further include:
an establishing unit, configured to establish a correspondence between the initial feature word and the target feature word for describing the first feature item, and to reflect the correspondence in the target medical record text.
Optionally, the device may further include:
a second determining unit, configured to determine an inferred feature word corresponding to the original medical record text under a second feature item, wherein the inferred feature word is a feature word for describing the second feature item that is not recorded in the original medical record text;
wherein, in the target medical record text, the inferred feature word is presented as text information belonging to the second feature item.
Optionally, the second determining unit may include:
an inferred feature word determining subunit, configured to determine, based on a second machine learning model, the inferred feature word corresponding to the original medical record text under the second feature item, wherein the second machine learning model is obtained by training on the correspondence between historical medical record texts included in a training sample set and preset inferred feature words for describing the second feature item.
Optionally, the device may further include a search unit and a second extraction unit;
the search unit is configured to search for a preset medical record text that matches the target medical record text, wherein the text information of the preset medical record text under the target category is the same as or similar to the target text unit, and the target category includes a category for describing the patient's personal information and/or a category for describing the patient's symptoms;
the second extraction unit is configured to extract the text information under the category for describing diagnostic information in the preset medical record text, for use in generating the target medical record text.
Optionally, the acquisition unit may include a first acquisition subunit and a first recognition subunit;
the first acquisition subunit is configured to acquire medical record information in speech form;
the first recognition subunit is configured to perform speech recognition on the medical record information to obtain the original medical record text.
Optionally, the acquisition unit may include a second acquisition subunit and a second recognition subunit;
the second acquisition subunit is configured to acquire medical record information in image form;
the second recognition subunit is configured to perform image recognition on the medical record information to obtain the original medical record text.
In a third aspect, an embodiment of the present invention provides equipment including a memory and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by one or more processors, the one or more programs containing instructions for performing the following operations:
acquiring an original medical record text, and dividing the original medical record text into at least one target text unit;
determining the target category corresponding to the text feature of the target text unit;
generating a target medical record text, wherein, in the target medical record text, the target text unit is presented as text information under the target category to which it belongs.
Compared with the prior art, the embodiments of the present invention have the following advantages:
According to the method, device and equipment provided by the embodiments of the present invention, an unstructured original medical record text is divided into at least one target text unit, and for each target text unit the target category corresponding to its text feature is determined. A structured target medical record text can thus be generated in which every target text unit is presented as text information under the target category to which it belongs. Because the different pieces of information in the structured target medical record text are each placed under the corresponding category, the user can, on the one hand, read the target medical record text more smoothly when it is displayed and find the required information more quickly. On the other hand, presenting the text content by category makes the information easier to search for and identify, which also makes the target medical record text better suited to data collation and analysis.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely some of the embodiments recorded in the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic framework diagram of an exemplary application scenario according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a method for processing medical record information according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a device for processing medical record information according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a device according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
To help those skilled in the art better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Through research, the inventors found that medical record information that can be obtained directly, such as medical record information entered by a user, is usually disorganized: information describing different features is lumped together without distinction. On the one hand, it is difficult for a user to read such disorganized medical record information smoothly; on the other hand, disorganized medical record information makes it hard for the user to search for and identify specific content.
To solve the above problems, in the embodiments of the present invention, the original medical record text is divided into at least one target text unit, the target category corresponding to the text feature of each target text unit is determined, and a structured target medical record text is generated accordingly, so that every target text unit in the target medical record text is presented as text information under the target category to which it belongs. Because the different pieces of information in the structured target medical record text are each placed under the corresponding category, the user can, on the one hand, read the target medical record text more smoothly when it is displayed and find the required information more easily and quickly. On the other hand, presenting the text content by category makes the information easier to search for and identify, which also makes the target medical record text better suited to data collation and analysis.
By way of example, the embodiments of the present invention may be applied to the scenario shown in FIG. 1, in which a user terminal 102 and a server 101 interact through a network 103. In this scenario, the server 101 acquires the original medical record text sent by the user terminal 102. The server 101 then divides the original medical record text into at least one target text unit, determines the target category corresponding to the text feature of the target text unit, and generates a target medical record text in which the target text unit is presented as text information under the target category. The server 101 may then send the target medical record text to the user terminal 102 for display.
It can be understood that the user terminal 102 may be any user device, whether existing, under development, or developed in the future, that can interact with the server 101 through any form of wired and/or wireless connection (for example, Wi-Fi, LAN, cellular, or coaxial cable), including but not limited to smartphones, non-smart phones, tablet computers, laptop personal computers, desktop personal computers, minicomputers, midrange computers, and mainframe computers, whether existing, under development, or developed in the future.
In addition, the server 101 is merely one example of a device, whether existing, under development, or developed in the future, that can provide a medical record information processing function to users. The embodiments of the present invention are not limited in this respect.
It can be understood that, although the actions of the embodiments of the present invention are described in the above scenario as being performed by the server 101, these actions may also be performed partly by the user terminal 102 and partly by the server 101, or entirely by the user terminal 102. The present invention places no limitation on the executing entity, as long as the actions disclosed in the embodiments of the present invention are performed.
It should be noted that the above application scenario is shown only to facilitate understanding of the present invention, and the embodiments of the present invention are not limited in this respect. Rather, the embodiments of the present invention may be applied to any applicable scenario.
Various non-limiting embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Exemplary Method
Referring to FIG. 2, a schematic flowchart of a method for processing medical record information according to an embodiment of the present invention is shown. In this embodiment, the method may include, for example, the following steps:
201. Acquire the original medical record text.
In a specific implementation, the original medical record text to be structured can be obtained from the acquired medical record information. The medical record information may be acquired in several ways. For example, it may be information entered by a user, or it may be information stored in a database.
It can be understood that the originally acquired medical record information may take several forms: text, image, or speech. Because this embodiment performs structuring on an original medical record text in text form, when the originally acquired medical record information is already text, the original medical record text may be the medical record information itself; when it is not text, the original medical record text is the result of converting the medical record information into text. For example, when the medical record information is in speech form, step 201 may include: acquiring the medical record information in speech form, and performing speech recognition on it to obtain the original medical record text. As another example, when the medical record information is in image form, step 201 may include: acquiring the medical record information in image form, and performing image recognition on it to obtain the original medical record text.
It should be noted that the originally acquired medical record information may sometimes contain information about several diagnoses of one patient. To keep the structured medical record information uniform, the information covering several diagnoses can be divided into several pieces, each covering a single diagnosis, and each piece is then structured as an original medical record text. In other words, the original medical record text may be the medical record text relating to a single diagnosis of a single patient. For example, if the originally acquired medical record information contains information about a first visit and information about a second visit, it can be divided by visit time into the first-visit information and the second-visit information, each of which is then used as an original medical record text for the subsequent steps.
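Dividing medical record information that covers several visits can be sketched as follows. The sketch assumes, purely for illustration, that each visit begins on a line starting with an ISO-style date such as `2017-03-01`; actual records may mark visit times quite differently, and the function name `split_by_visit` is hypothetical.

```python
import re

# Assumption: a visit starts on a line that begins with a YYYY-MM-DD date.
_DATE_AT_START = re.compile(r"\s*\d{4}-\d{2}-\d{2}\b")


def split_by_visit(record):
    """Split medical record information into one text per visit."""
    visits, current = [], []
    for line in record.splitlines():
        # A new dated line closes the visit collected so far.
        if _DATE_AT_START.match(line) and current:
            visits.append("\n".join(current))
            current = []
        if line.strip():
            current.append(line.strip())
    if current:
        visits.append("\n".join(current))
    return visits


if __name__ == "__main__":
    record = (
        "2017-03-01 Headache for three days.\n"
        "Prescribed angelica.\n"
        "2017-03-08 Headache improved."
    )
    for visit in split_by_visit(record):
        print(repr(visit))
```

Each returned string would then be treated as a separate original medical record text in the subsequent steps.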
202. Divide the original medical record text into at least one target text unit.
In a specific implementation, the original medical record text may be divided sentence by sentence, in which case each target text unit obtained is a text sentence. In this embodiment, the original medical record text may also be divided in other units, including but not limited to word groups, phrases, and paragraphs.
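Sentence-level division can be sketched with a simple punctuation-based splitter. The set of sentence-ending characters and the function name `split_into_units` are assumptions made for illustration; note that such a naive splitter would also break decimal numbers such as "37.5" at the period.

```python
import re

# Assumption: a unit is a run of text ending at Chinese or Western
# sentence-final punctuation (or at the end of the text).
_UNIT = re.compile(r"[^。！？.!?]+[。！？.!?]?")


def split_into_units(text):
    """Divide an original medical record text into sentence-level units."""
    units = [u.strip() for u in _UNIT.findall(text)]
    return [u for u in units if u]


if __name__ == "__main__":
    sample = "Male, 45 years old. Severe headache for three days! 处方：当归10克。"
    for unit in split_into_units(sample):
        print(unit)
```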
203. Determine the target category corresponding to the text feature of the target text unit.
In a specific implementation, for each target text unit divided from the original medical record text, a target category matching the text feature of that unit can be looked up among a plurality of preset categories that can be used to describe medical record information, so that a corresponding target category is determined for each target text unit. It can be understood that, if the text feature of a target text unit matches a target category, that target category is the category used to describe the target text unit.
It can be understood that the plurality of preset categories that can be used for medical record information may include, for example, any number of the following: a category for describing patient information, a category for describing a disease name, a category for describing symptom statement information, a category for describing symptom differentiation information, a category for describing doctor's order information, and a category for describing prescription information. That is, for any target text unit, the corresponding target category may be, for example, any one of these categories. The patient information may include, for example, the patient's name, gender, and age, and the time of the visit. The symptom statement information may also be called chief complaint information. The symptom differentiation information may be syndrome differentiation information in the sense of traditional Chinese medicine, or laboratory test result information in the sense of Western medicine.
In this embodiment, a machine learning model may be used, for example, to determine the target category for a target text unit. Specifically, step 203 may be: determining, based on a first machine learning model, the target category corresponding to the text feature of the target text unit, wherein the first machine learning model is obtained by training on the correspondence between the text features of historical medical record texts included in a training sample set and preset categories, each historical medical record text being text information under one of the preset categories. The training process may be as follows: once a historical medical record text is known to be text information under a certain preset category, the first machine learning model is trained with the text feature of that historical medical record text as input and the preset category to which it belongs as output. The historical medical record texts used for training may include text information under each of the preset categories that can be used for medical record information, so that the trained first machine learning model accurately covers all of those preset categories.
In addition, a historical medical record text may be a sentence, that is, each training sample uses the text of one sentence as the historical medical record text. Alternatively, a historical medical record text may be a paragraph, that is, each training sample uses the text of one paragraph. It can be understood that, after training on a certain number of historical medical record texts and their corresponding preset categories, the first machine learning model can represent the correspondence between text features and preset categories. Therefore, when the text feature of a target text unit is input into the trained first machine learning model, the target category output by the model is the category to which the target text unit belongs.
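The training and use of the first machine learning model can be illustrated with a very small classifier. The text does not name a model type or a feature representation, so the sketch below assumes a bag-of-words text feature and a multinomial naive Bayes classifier with add-one smoothing; the class name `CategoryClassifier`, the helper `text_features`, and the sample data in `HISTORY` are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def text_features(unit):
    """Assumed text feature: the lower-cased word tokens of the unit."""
    return re.findall(r"\w+", unit.lower())


class CategoryClassifier:
    """Tiny multinomial naive Bayes standing in for the first model."""

    def fit(self, samples):
        # samples: iterable of (historical medical record text, preset category)
        self.word_counts = defaultdict(Counter)
        self.category_counts = Counter()
        self.vocab = set()
        for text, category in samples:
            tokens = text_features(text)
            self.word_counts[category].update(tokens)
            self.category_counts[category] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, unit):
        tokens = text_features(unit)
        total = sum(self.category_counts.values())
        best, best_score = None, -math.inf
        for category, n in self.category_counts.items():
            counts = self.word_counts[category]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus add-one smoothed log likelihood of each token.
            score = math.log(n / total)
            for token in tokens:
                score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best, best_score = category, score
        return best


# Hypothetical training sample set, invented for illustration.
HISTORY = [
    ("patient zhang san male 45 years old", "Patient information"),
    ("patient li si female 30 years old", "Patient information"),
    ("severe headache for three days", "Chief complaint"),
    ("cough and fever since yesterday", "Chief complaint"),
    ("angelica 10 grams twice daily", "Prescription"),
    ("amoxicillin 500 mg three times daily", "Prescription"),
]

if __name__ == "__main__":
    model = CategoryClassifier().fit(HISTORY)
    print(model.predict("headache and fever for two days"))
```

A real system would train on a far larger sample set and would likely segment Chinese text into words before extracting features.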
204. Generate a target medical record text, wherein, in the target medical record text, the target text unit is presented as text information under the target category to which it belongs.
In a specific implementation, the target text units divided from the original medical record text can be organized according to the target categories to which they belong, so as to generate the target medical record text. The target medical record text may be used, for example, as feedback to the user; that is, after step 204 this embodiment may further include, for example, presenting the target medical record text.
It can be understood that the target medical record text contains all the target text units divided from the original medical record text. In addition, each target text unit is stored in the target medical record text together with its target category, so the target medical record text shows which target category each target text unit belongs to. For example, if the target text unit is "head hurts badly" and its target category is "Chief complaint", the information presented in the target medical record text may be "Chief complaint: head hurts badly".
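Organizing the classified target text units into a target medical record text can be sketched as below. The "Category: text" line layout follows the "Chief complaint" example given in the text; the category display order, the joining of several units with a space, and the function name `build_target_record` are assumptions.

```python
def build_target_record(classified_units, category_order):
    """Group (unit, category) pairs by category and render one line per category."""
    grouped = {category: [] for category in category_order}
    for unit, category in classified_units:
        # Categories missing from category_order are appended at the end.
        grouped.setdefault(category, []).append(unit)
    lines = []
    for category, units in grouped.items():
        if units:
            lines.append(category + ": " + " ".join(units))
    return "\n".join(lines)


if __name__ == "__main__":
    classified = [
        ("Severe headache.", "Chief complaint"),
        ("Zhang San, male, 45.", "Patient information"),
        ("Nausea since yesterday.", "Chief complaint"),
    ]
    order = ["Patient information", "Chief complaint", "Prescription"]
    print(build_target_record(classified, order))
```

Categories with no units are omitted from the output, so every target text unit appears exactly once under its own category.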
需要说明的是,在原始医案文本中可以记载了一些用于描述重要特征的特征词。考虑到这些重要的特征词在目标类别下与其他文本内容是混在一起的,为了使得用户能够更明显地识别出这些重要的特征词,在本实施例的一些实施方式中,可以在目标医案文本中可以设置单独的特征项,用于体现这些重要的特征词。具体地,在204之前,本实施例例如还可以包括:从所述原始医案文本中提取用于描述第一特征项的目标特征词。其中,在所述目标医案文本中所述目标特征词体现为属于所述第一特征项的文本信息。在目标医案文本中目标特征词与其相应的第一特征项是对应保存的,故在目标医案文本能够体现出目标特征词是属于相应的第一特征项的文本信息。例如,假设目标特征词为“当归”、所属的第一特征项为“药材”,则在目标医案文本中体现出来的信息可以是“药材:当归”。It should be noted that some characteristic words for describing important features can be recorded in the original medical text. Considering that these important feature words are mixed with other text content under the target category, in order to enable the user to more clearly identify these important feature words, in some embodiments of the present embodiment, the target medical record can be Individual features can be set in the text to reflect these important feature words. Specifically, before 204, the embodiment may further include, for example, extracting, from the original medical text, a target feature word for describing the first feature item. The target feature word in the target medical text is embodied as text information belonging to the first feature item. In the target medical text, the target feature word is correspondingly saved with the corresponding first feature item, so the target medical record text can reflect that the target feature word belongs to the corresponding first feature item. For example, if the target feature word is “angelica” and the first feature item belongs to “medicine material”, the information reflected in the target medical text may be “medicine material: angelica”.
It can be understood that the target feature word under the first feature item is text information recorded in the original medical record text. For example, the first feature item may be a feature item describing the patient's name, in which case the target feature word is information describing the patient's name. If the target feature word is "张三" (Zhang San), the first feature item and the target feature word may be embodied in the target medical record text as "Patient name: 张三". As another example, the first feature item may be a feature item describing a medicine, in which case the target feature word is information describing a medicine. The medicine may be a traditional Chinese medicinal material or a Western medicine. If the target feature word is "阿莫西林" (amoxicillin), the first feature item and the target feature word may be embodied as "Drug: 阿莫西林". If the target feature word is "当归" (angelica), they may be embodied as "Medicinal material: 当归". As a further example, the first feature item may be a feature item describing a dose, in which case the target feature word is information describing a dose. If the target feature word is "10克" (10 grams), the first feature item and the target feature word may be embodied as "Dose: 10克". As yet another example, the first feature item may be a feature item describing a symptom, in which case the target feature word is information describing a symptom. If the target feature word is "头疼" (headache), the first feature item and the target feature word may be embodied as "Symptom: 头疼".
It can be understood that different medical record texts sometimes use different feature words to describe the same meaning, which hinders statistical analysis of medical record information. For this reason, in some implementations of this embodiment, feature words with the same meaning may be normalized so that the target medical record text uses the same feature word for the same meaning. Specifically, the extraction of the target feature word may include, for example: analyzing the original medical record text to obtain an initial feature word used to describe the first feature item; and matching the initial feature word in a standard feature lexicon to obtain the standard feature word that matches the initial feature word, which serves as the target feature word used to describe the first feature item. The standard feature lexicon designates one standard feature word for the several feature words that describe the same meaning, and it also records the correspondence between the non-standard feature words and the standard feature word of that meaning. If the initial feature word is a non-standard feature word in the standard feature lexicon, the standard feature word corresponding to it in the lexicon may be used as the target feature word. If the initial feature word is itself a standard feature word in the lexicon, the initial feature word may be used directly as the target feature word. For example, "头疼" and "头痛" (two Chinese words for "headache") may both be normalized to "头痛"; that is, "头疼" is a non-standard feature word and "头痛" is the standard feature word.
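The lookup described above can be sketched as a small dictionary-based normalizer. This is only an illustration under stated assumptions: the lexicon contents, the names `STANDARD_LEXICON` and `normalize`, and the fallback for words missing from the lexicon (returning the word unchanged) are hypothetical and are not specified by the source.

```python
# Hypothetical standard feature lexicon: non-standard feature word -> standard feature word.
# The entries are taken from the examples in the description; a real lexicon would be far larger.
STANDARD_LEXICON = {
    "头疼": "头痛",    # colloquial "headache" -> standard "headache"
    "二十g": "20克",   # "twenty g" -> "20 grams"
}
STANDARD_WORDS = set(STANDARD_LEXICON.values())


def normalize(initial_word):
    """Map an initial feature word to its target (standard) feature word."""
    if initial_word in STANDARD_WORDS:
        # The initial word is already a standard feature word.
        return initial_word
    # Non-standard word: look up its standard counterpart.
    # Assumption: a word absent from the lexicon is kept as-is.
    return STANDARD_LEXICON.get(initial_word, initial_word)
```

For instance, `normalize("头疼")` and `normalize("头痛")` both yield "头痛", so records that word the symptom differently can be counted together.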
So that the user can follow the normalization of feature words, and to avoid cases where the user does not understand a standard feature word appearing in the target medical record text, some implementations of this embodiment may further include, for example: establishing a correspondence between the initial feature word used to describe the first feature item and the target feature word used to describe the first feature item, and embodying that correspondence in the target medical record text. In other words, the target medical record text may also present the initial feature word and the target feature word alongside each other. For example, if the initial feature word is "头疼" and the target feature word is "头痛", they may be embodied in the target medical record text as "原词:头疼;标准症状:头痛" ("Original word: 头疼; standard symptom: 头痛"). As another example, if the initial feature word is "二十g" ("twenty g") and the target feature word is "20克" ("20 grams"), they may be embodied as "原词:二十g;标准剂量:20克" ("Original word: 二十g; standard dose: 20克").
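As a minimal sketch of how such a correspondence might be rendered, the helper below formats an initial/target pair in the same shape as the examples above. The function name `render_correspondence` and its parameters are hypothetical; the source does not prescribe any particular output format beyond the examples.

```python
def render_correspondence(initial_word, target_word, feature_label):
    """Render an (initial word, target word) pair as one display string.

    feature_label is the name of the first feature item, e.g. "症状" (symptom)
    or "剂量" (dose). Full-width punctuation follows the examples in the text.
    """
    return f"原词:{initial_word};标准{feature_label}:{target_word}"
```

Calling `render_correspondence("头疼", "头痛", "症状")` produces the string used in the first example.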
Because feature words in a medical record text may be specific to the medical domain, the original medical record text may be analyzed with the help of a medical lexicon combined with lexical analysis and syntactic analysis, which makes feature word extraction more accurate. Specifically, in some implementations of this embodiment, the initial feature word may be obtained by performing lexical analysis and/or syntactic analysis on the original medical record text based on the medical lexicon, yielding the initial feature word used to describe the first feature item. For example, suppose the original medical record text states "头非常疼" ("the head hurts badly"). Lexical and syntactic analysis can identify that "头" (head) is a noun and the subject and denotes a body part, and that "疼" (hurts) is a verb and the predicate and denotes the state of that body part. On this basis the initial feature word can be determined to be "头疼" (headache).
In addition, for first feature items that follow specific rules, the target feature word may be identified based on the corresponding rule. For example, for the first feature item "patient age", the target feature word may be extracted based on an age recognition rule (for instance, the feature word consists of a number followed by "岁" (years old) or by "旬" (decade of age)). As another example, for the first feature item "visit time", the target feature word may be extracted based on a time recognition rule (for instance, the feature word contains "年" (year), "月" (month) or "日" (day), or uses a separator such as "." or "/"). Furthermore, for certain first feature items the target feature word may be identified by a dedicated recognition technique. For example, for the first feature item "patient name", the target feature word may be extracted using named entity recognition from natural language processing.
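The rule-based recognition described above could be approximated with regular expressions. The patterns below are an assumed, simplified reading of the rules (a number followed by "岁" or "旬" for age; a year-month-day expression or a "."/"/"-separated date for visit time); the names `AGE_RULE`, `DATE_RULE`, `extract_age` and `extract_visit_time` are hypothetical.

```python
import re

# Age rule: Arabic or simple Chinese numerals followed by "岁" or "旬".
AGE_RULE = re.compile(r"[0-9一二三四五六七八九十百]+\s*(?:岁|旬)")

# Time rule: "YYYY年M月D日", or a date written with "." or "/" separators.
DATE_RULE = re.compile(
    r"\d{4}\s*年\s*\d{1,2}\s*月\s*\d{1,2}\s*日"
    r"|\d{4}[./]\d{1,2}[./]\d{1,2}"
)


def extract_age(text):
    """Return the first age expression found in text, or None."""
    match = AGE_RULE.search(text)
    return match.group(0) if match else None


def extract_visit_time(text):
    """Return the first date expression found in text, or None."""
    match = DATE_RULE.search(text)
    return match.group(0) if match else None
```

These patterns cover only the forms named in the text; partial dates, relative dates and other age phrasings would need further rules.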
Sometimes the first feature item is a feature belonging to one or several particular target categories; that is, the target feature words under the first feature item all occur in the text information under those target categories. On this basis, in some implementations of this embodiment, the target feature word used to describe the first feature item may be extracted from the text information under the target category to which the first feature item belongs. That is, after 203, the target feature word used to describe the first feature item is extracted from the text information of the original medical record text that falls under the target category to which the first feature item belongs. Here, the text information under a target category comprises all target text units corresponding to that category. For example, the first feature item "drug" is a feature under the target category "prescription"; that is, the information relevant to the first feature item "drug" is found in the text information belonging to the category "prescription". Therefore, once the original medical record text has been classified and the text information belonging to the category "prescription" has been determined, the target feature word for the first feature item "drug" can be searched for and extracted within that text information. Of course, the target feature word of the first feature item may also be searched for and extracted from all the text information of the original medical record text.
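A minimal sketch of category-scoped extraction follows. It assumes the classified text is held as a mapping from target category to a list of target text units, and that feature words are found by simple substring lookup in a small lexicon; the data structure, the lexicon `DRUG_LEXICON` and the function `extract_in_category` are all hypothetical.

```python
# Hypothetical drug lexicon; a list keeps the result order deterministic.
DRUG_LEXICON = ["当归", "阿莫西林"]


def extract_in_category(classified_units, category, lexicon):
    """Extract feature words only from the units under one target category.

    classified_units maps a target category to its list of target text units.
    """
    found = []
    for unit in classified_units.get(category, []):
        for word in lexicon:
            if word in unit and word not in found:
                found.append(word)
    return found


classified = {
    "主诉": ["头非常疼", "曾自服阿莫西林无效"],  # chief complaint
    "处方": ["当归10克"],                        # prescription
}
prescribed = extract_in_category(classified, "处方", DRUG_LEXICON)
```

Restricting the search to "处方" (prescription) means a drug that is merely mentioned in the chief complaint is not reported as prescribed.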
It should be noted that some feature words that are not directly recorded in the original medical record text can sometimes be inferred from the text information that is recorded there. In some implementations of this embodiment, a separate feature item may be set in the target medical record text to present these inferred feature words. Specifically, before 204, this embodiment may further include, for example: determining the inferred feature word corresponding to the original medical record text under a second feature item, wherein the inferred feature word is a feature word that describes the second feature item and is not recorded in the original medical record text; in the target medical record text, the inferred feature word is embodied as text information belonging to the second feature item. In the target medical record text, the inferred feature word is stored in correspondence with its second feature item, so the target medical record text shows that the inferred feature word is text information belonging to that second feature item. For example, where the original medical record text does not record the patient's gender but it can be inferred from the text that the patient is female, the inferred feature word is "女" (female), the second feature item to which it belongs is "patient gender", and the information shown in the target medical record text may be "Gender: 女".
It can be understood that an inferred feature word belonging to the second feature item is text information that is not directly recorded in the original medical record text. For example, the second feature item may be a feature item describing the patient's gender, in which case the inferred feature word is a feature word describing the patient's gender. If the inferred feature word is "男" (male), the second feature item and the corresponding inferred feature word may be embodied in the target medical record text as "Patient gender: 男". As another example, the second feature item may be a feature item describing the patient's age, in which case the inferred feature word is a feature word describing the patient's age. If the inferred feature word is "中年" (middle-aged), the second feature item and the corresponding inferred feature word may be embodied as "Patient age: 中年".
It should be noted that the inferred feature word may be inferred, for example, by a machine learning model. Specifically, determining the inferred feature word may include, for example: determining, based on a second machine learning model, the inferred feature word corresponding to the original medical record text under the second feature item, wherein the second machine learning model is obtained by training on the correspondence between historical medical record texts included in a training sample set and preset inferred feature words used to describe the second feature item, the inferred feature word being inferable from the historical medical record text. The training process of the second machine learning model may specifically be as follows: for a historical medical record text from which a definite feature word is difficult to extract, once the inferred feature word corresponding to that text has been determined, the second machine learning model is trained with the historical medical record text as input and the inferred feature word as output. It can be understood that, after training on a sufficient number of historical medical record texts and their corresponding inferred feature words, the second machine learning model can represent the correspondence between medical record texts and inferred feature words. Therefore, when the original medical record text to be structured is input into the trained second machine learning model, the inferred feature word output by the model is the feature that the original medical record text reflects.
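The source does not say what kind of model the second machine learning model is. Purely as an illustration of the training and inference flow (historical text as input, inferred feature word as output), the sketch below uses a character-level naive Bayes classifier written with the standard library. The class name `InferredFeatureModel` and the four training texts are invented toy data, not taken from the source.

```python
import math
from collections import Counter, defaultdict


class InferredFeatureModel:
    """Character-level multinomial naive Bayes; a stand-in for the second model."""

    def fit(self, texts, labels):
        # Count how often each character occurs under each inferred feature word.
        self.label_counts = Counter(labels)
        self.char_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            self.char_counts[label].update(text)
            self.vocab.update(text)
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.char_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior
            for ch in text:
                if ch in self.vocab:  # ignore characters never seen in training
                    score += math.log((counts[ch] + 1) / denom)  # add-one smoothing
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy historical medical record texts with preset inferred feature words (gender).
history = ["月经不调,经期腹痛", "产后乳汁不足", "前列腺增生,夜尿频多", "阳痿早泄,腰膝酸软"]
genders = ["女", "女", "男", "男"]
model = InferredFeatureModel().fit(history, genders)
inferred = model.predict("经期腹痛,乳房胀痛")
```

With this toy data the text mentioning menstrual pain is assigned "女" (female). A practical system would need a far larger training set and, most likely, a stronger model.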
In some implementations of this embodiment, when the user provides an original medical record text containing text content such as symptoms and patient information, the text content of the diagnosis information may be extracted from a preset medical record text whose symptoms, patient information and other text content are the same as or similar to those of the user's original medical record text, and this content may be embodied in the target medical record text as reference diagnosis information for the user to consult. The user can thus obtain recommended reference diagnosis information by entering patient information, which provides a "self-diagnosis" function. Specifically, after 203, this embodiment may further include, for example: searching for a preset medical record text that matches the target medical record text, wherein the text information of the preset medical record text under the target category is the same as or similar to the target text unit, and the target category includes a category describing the patient's personal information and/or a category describing the patient's symptoms; and extracting the text information under the category describing diagnosis information from the preset medical record text, to be embodied in the target medical record text as reference diagnosis information. The category describing diagnosis information may be, for example, a category describing prescription information, a category describing syndrome differentiation information and/or a category describing medical advice information. In addition, the preset medical record text may be, for example, classic medical record information collected in advance or medical record information provided by medical experts.
It can be understood that the text information used to match the original medical record text against a preset medical record text may be the text information under one target category or under several target categories. When text information under several target categories is used for the matching, different matching weights may be set for different target categories to measure the degree of matching between the original medical record text and the preset medical record text. For example, the text information used for matching may be the text information under the four target categories "condition", "patient age", "patient gender" and "visit time". Because "visit time" has relatively little influence on the diagnosis information, "condition", "patient age" and "patient gender" may be given relatively large matching weights and "visit time" a relatively small one. In that case, if the original medical record text and the preset medical record text largely agree on "condition", "patient age" and "patient gender" but disagree on "visit time", the matching result may be that the two texts match. If they largely agree on "condition" and "visit time" but disagree on "patient gender", the matching result may be that the two texts do not match.
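The weighted matching and the retrieval of reference diagnosis information can be sketched as follows. The weights, the threshold of 0.8, the exact-equality test used in place of "the same or similar", and the names `match_score` and `find_reference_diagnosis` are assumptions chosen so that the two cases described above come out as stated; the source gives no concrete values.

```python
# Hypothetical matching weights per target category ("visit time" weighs least).
WEIGHTS = {"病症": 40, "患者年龄": 25, "患者性别": 25, "就诊时间": 10}
THRESHOLD = 0.8  # assumed minimum score for two records to count as matching


def match_score(target, preset, weights=WEIGHTS):
    """Weighted share of categories on which the two records agree (0.0 to 1.0)."""
    agreed = sum(
        weight
        for category, weight in weights.items()
        if category in target and target[category] == preset.get(category)
    )
    return agreed / sum(weights.values())


def find_reference_diagnosis(target, presets, threshold=THRESHOLD):
    """Return the prescription text of the best-matching preset record, or None."""
    best = max(presets, key=lambda p: match_score(target, p), default=None)
    if best is None or match_score(target, best) < threshold:
        return None
    return best.get("处方")


target = {"病症": "头痛", "患者年龄": "中年", "患者性别": "女", "就诊时间": "2016.12.30"}
preset_a = {"病症": "头痛", "患者年龄": "中年", "患者性别": "女",
            "就诊时间": "2015.01.01", "处方": "当归10克"}
preset_b = {"病症": "头痛", "患者年龄": "中年", "患者性别": "男",
            "就诊时间": "2016.12.30", "处方": "阿莫西林"}
reference = find_reference_diagnosis(target, [preset_a, preset_b])
```

Here `preset_a` differs from the target only in visit time and scores 0.9, so its prescription is returned; `preset_b` differs in patient gender and scores 0.75, below the assumed threshold.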
In this embodiment, the original medical record text and the target medical record text may be any kind of medical record text, for example a medical record text of traditional Chinese medicine or a medical record text of Western medicine.
In this embodiment, for an unstructured original medical record text, a structured target medical record text can be generated by dividing the original medical record text into at least one target text unit and determining, for each target text unit, the target category corresponding to its text features, so that each target text unit is embodied in the target medical record text as text information under its target category. Because the different pieces of information in the structured target medical record text are assigned to their respective categories, on the one hand a user shown the target medical record text can read it more smoothly and locate the needed information more quickly; on the other hand, the categorized text content is easier to search and identify, which makes the target medical record text better suited to data collation and analysis.
Exemplary device
Referring to FIG. 3, a schematic structural diagram of an apparatus for processing medical record information in an embodiment of the present invention is shown. In this embodiment, the apparatus may specifically include, for example:
an obtaining unit 301, configured to obtain an original medical record text;
a dividing unit 302, configured to divide the original medical record text into at least one target text unit;
a first determining unit 303, configured to determine the target category corresponding to the text features of the target text unit;
a generating unit 304, configured to generate a target medical record text, wherein in the target medical record text the target text unit is embodied as text information under the target category.
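The four units listed above can be pictured as a small pipeline. The sketch below is only an illustration: splitting on sentence punctuation, the keyword table that stands in for the first machine learning model, the fallback category "其他" (other) and all function names are assumptions and do not come from the source.

```python
import re

# Hypothetical keyword table standing in for the first machine learning model.
CATEGORY_KEYWORDS = {
    "主诉": ["疼", "痛"],    # chief complaint
    "处方": ["克", "当归"],  # prescription
}


def obtain(raw_text):
    """Obtaining unit 301: here the text is simply passed in and trimmed."""
    return raw_text.strip()


def divide(text):
    """Dividing unit 302: split into target text units at sentence punctuation."""
    return [u.strip() for u in re.split(r"[。;\n]+", text) if u.strip()]


def determine_category(unit):
    """First determining unit 303: pick the target category for one unit."""
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in unit for keyword in keywords):
            return category
    return "其他"  # assumed fallback category


def generate(units):
    """Generating unit 304: group the units under their target categories."""
    target = {}
    for unit in units:
        target.setdefault(determine_category(unit), []).append(unit)
    return target


def render(target):
    """Present each category and its units as "category:unit;unit"."""
    return "\n".join(f"{c}:{';'.join(us)}" for c, us in target.items())


structured = generate(divide(obtain("头非常疼。当归10克。")))
```

Rendering `structured` gives one line per category, matching the "chief complaint: ..." form used in the method description.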
Optionally, the first determining unit 303 may include:
a target category determining subunit, configured to determine, based on a first machine learning model, the target category corresponding to the text features of the target text unit, wherein the first machine learning model is obtained by training on the correspondence between the text features of historical medical record texts included in a training sample set and preset categories.
Optionally, the target category is a category describing patient information, a category describing a disease name, a category describing symptom statement information, a category describing syndrome differentiation information, a category describing medical advice information, or a category describing prescription information.
Optionally, the apparatus may further include:
a first extracting unit, configured to extract, from the original medical record text, a target feature word used to describe a first feature item, wherein in the target medical record text the target feature word is embodied as text information belonging to the first feature item.
Optionally, the first extracting unit may include:
a target feature word extracting subunit, configured to extract the target feature word used to describe the first feature item from the text information of the original medical record text that falls under the target category to which the first feature item belongs.
Optionally, the first extracting unit may specifically include an analyzing subunit and a matching subunit;
the analyzing subunit is configured to analyze the original medical record text to obtain an initial feature word used to describe the first feature item;
the matching subunit is configured to match the initial feature word in a standard feature lexicon to obtain the standard feature word that matches the initial feature word, as the target feature word used to describe the first feature item.
Optionally, the matching subunit may include:
an initial feature word extracting subunit, configured to perform lexical analysis and/or syntactic analysis on the original medical record text based on a medical lexicon to obtain the initial feature word used to describe the first feature item.
Optionally, the apparatus may further include:
an establishing unit, configured to establish a correspondence between the initial feature word and the target feature word used to describe the first feature item, and to embody the correspondence in the target medical record text.
Optionally, the first feature item may be a feature item describing a patient name, a feature item describing a medicine, a feature item describing a dose, or a feature item describing a symptom.
Optionally, the apparatus may further include:
a second determining unit, configured to determine the inferred feature word corresponding to the original medical record text under a second feature item, wherein the inferred feature word is a feature word that describes the second feature item and is not recorded in the original medical record text;
in the target medical record text, the inferred feature word is embodied as text information belonging to the second feature item.
Optionally, the second determining unit may include:
an inferred feature word determining subunit, configured to determine, based on a second machine learning model, the inferred feature word corresponding to the original medical record text under the second feature item, wherein the second machine learning model is obtained by training on the correspondence between historical medical record texts included in a training sample set and preset inferred feature words used to describe the second feature item.
Optionally, the second feature item may be a feature item describing the patient's gender or a feature item describing the patient's age.
Optionally, the apparatus may further include a searching unit and a second extracting unit;
the searching unit is configured to search for a preset medical record text that matches the target medical record text, wherein the text information of the preset medical record text under the target category is the same as or similar to the target text unit, and the target category includes a category describing the patient's personal information and/or a category describing the patient's symptoms;
the second extracting unit is configured to extract the text information under the category describing diagnosis information from the preset medical record text, for use in generating the target medical record text.
Optionally, the original medical record text may be the medical record text information relating to a single diagnosis of a single patient.
Optionally, the obtaining unit 301 may include a first obtaining subunit and a first recognizing subunit;
the first obtaining subunit is configured to obtain medical record information in voice form;
the first recognizing subunit is configured to perform speech recognition on the medical record information to obtain the original medical record text.
Optionally, the obtaining unit 301 may include a second obtaining subunit and a second recognizing subunit;
the second obtaining subunit is configured to obtain medical record information in image form;
the second recognizing subunit is configured to perform image recognition on the medical record information to obtain the original medical record text.
Optionally, the apparatus may further include:
a presenting unit, configured to present the target medical record text.
In this embodiment, for an unstructured original medical record text, a structured target medical record text can be generated by dividing the original medical record text into at least one target text unit and determining, for each target text unit, the target category corresponding to its text features, so that each target text unit is embodied in the target medical record text as text information under its target category. Because the different pieces of information in the structured target medical record text are assigned to their respective categories, on the one hand a user shown the target medical record text can read it more smoothly and locate the needed information more quickly; on the other hand, the categorized text content is easier to search and identify, which makes the target medical record text better suited to data collation and analysis.
Referring to FIG. 4, the apparatus 1800 may include one or more of the following components: a processing component 1802, a memory 1804, a power component 1806, a multimedia component 1806, an audio component 1810, an input/output (I/O) interface 1812, a sensor component 1814, and a communication component 1816.
The processing component 1802 typically controls the overall operation of the apparatus 1800, such as operations associated with display, telephone calls, data communication, camera operation and recording. The processing component 1802 may include one or more processors 1820 to execute instructions so as to perform all or some of the steps of the method described above. In addition, the processing component 1802 may include one or more modules that facilitate interaction between the processing component 1802 and other components. For example, the processing component 1802 may include a multimedia module to facilitate interaction between the multimedia component 1806 and the processing component 1802.
The memory 1804 is configured to store various types of data to support operation of the apparatus 1800. Examples of such data include instructions for any application or method operated on the apparatus 1800, contact data, phone book data, messages, pictures, videos and the like. The memory 1804 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk or an optical disc.
电源组件1806为装置1800的各种组件提供电力。电源组件1806可以包括电源管理系统,一个或多个电源,及其他与为装置1800生成、管理和分配电力相关联的组件。 Power component 1806 provides power to various components of device 1800. Power component 1806 can include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for device 1800.
The multimedia component 1806 includes a screen that provides an output interface between the apparatus 1800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 1806 includes a front camera and/or a rear camera. When the apparatus 1800 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each of the front camera and the rear camera may be a fixed optical lens system or may have focal length and optical zoom capability.
The audio component 1810 is configured to output and/or input audio signals. For example, the audio component 1810 includes a microphone (MIC) that is configured to receive external audio signals when the apparatus 1800 is in an operating mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signals may be further stored in the memory 1804 or transmitted via the communication component 1816. In some embodiments, the audio component 1810 also includes a speaker for outputting audio signals.
The I/O interface 1812 provides an interface between the processing component 1802 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. The buttons may include, but are not limited to, a home button, a volume button, a start button, and a lock button.
The sensor component 1814 includes one or more sensors that provide the apparatus 1800 with status assessments of various aspects. For example, the sensor component 1814 can detect the open/closed state of the apparatus 1800 and the relative positioning of components, such as the display and keypad of the apparatus 1800. The sensor component 1814 can also detect a change in position of the apparatus 1800 or of one of its components, the presence or absence of user contact with the apparatus 1800, the orientation or acceleration/deceleration of the apparatus 1800, and a change in temperature of the apparatus 1800. The sensor component 1814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 1814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 1814 may further include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 1816 is configured to facilitate wired or wireless communication between the apparatus 1800 and other devices. The apparatus 1800 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 1816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 1816 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 1800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the method described above.
FIG. 5 is a schematic structural diagram of a server in an embodiment of the present invention. The server 1900 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 1922 (for example, one or more processors), a memory 1932, and one or more storage media 1930 (for example, one or more mass storage devices) that store application programs 1942 or data 1944. The memory 1932 and the storage medium 1930 may provide transient or persistent storage. A program stored in the storage medium 1930 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the server. Furthermore, the central processing unit 1922 may be configured to communicate with the storage medium 1930 and to execute, on the server 1900, the series of instruction operations in the storage medium 1930.
The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input/output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
An embodiment of the present invention provides a device. The device includes a memory and one or more programs, where the one or more programs are stored in the memory and are configured to be executed by one or more processors, and the one or more programs contain instructions for performing the following operations:
obtaining an original medical record text, and dividing the original medical record text into at least one target text unit;
determining a target category corresponding to a text feature of the target text unit;
generating a target medical record text, wherein, in the target medical record text, the target text unit is presented as text information under the target category to which it belongs.
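By way of non-limiting illustration, the three operations above may be sketched in Python as follows. This is only a minimal sketch under stated assumptions: the category names, the cue words, and the function names are hypothetical and do not appear in the embodiments, sentence splitting stands in for the division into target text units, and simple keyword matching stands in for the determination of the target category.

```python
import re

# Hypothetical cue words per category (assumption for illustration only);
# the embodiments determine the category from text features instead.
CATEGORY_CUES = {
    "patient_info": ("patient", "male", "female", "aged"),
    "prescription": ("mg", "tablet", "decoction", "dose"),
    "symptom_statement": ("complains", "cough", "fever", "pain"),
}
DEFAULT_CATEGORY = "other"


def split_into_units(raw_text):
    """Divide the original text into target text units (here: sentences).

    Naive splitting on sentence punctuation; decimals such as "37.5"
    would also be split, which a real segmenter must avoid.
    """
    parts = re.split(r"[.;!?。；！？\n]+", raw_text)
    return [p.strip() for p in parts if p.strip()]


def classify_unit(unit):
    """Return the first category whose cue words occur in the unit."""
    lowered = unit.lower()
    for category, cues in CATEGORY_CUES.items():
        if any(cue in lowered for cue in cues):
            return category
    return DEFAULT_CATEGORY


def build_target_text(raw_text):
    """Group every unit under its category to form the target text."""
    target = {}
    for unit in split_into_units(raw_text):
        target.setdefault(classify_unit(unit), []).append(unit)
    return target
```

In this sketch the target medical record text is represented as a dictionary that maps each category to the text units placed under it; any other structured representation would serve equally well.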
In some implementations of this embodiment, the device may specifically be the aforementioned apparatus 1800, the memory may specifically be the memory 1804 in the apparatus 1800, and the processor may specifically be the processor 1820 in the apparatus 1800.
In other implementations of this embodiment, the device may specifically be the aforementioned server 1900, the processor may specifically be the central processing unit 1922 in the server 1900, and the memory may specifically be the storage medium 1930 in the server 1900.
Optionally, in order to determine the target category corresponding to the text feature of the target text unit, the processor may specifically execute instructions for the following operation:
determining, based on a first machine learning model, the target category corresponding to the text feature of the target text unit, wherein the first machine learning model is obtained by training on the correspondence between text features of historical medical record texts included in a training sample set and preset categories.
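For illustration only, training such a model on pairs of historical text units and preset categories might be sketched as follows. The embodiments do not name a particular learning algorithm; the multinomial naive Bayes classifier, the word-token features, and all class and function names below are assumptions chosen to keep the sketch short and self-contained.

```python
import math
from collections import Counter, defaultdict


def text_features(unit):
    """Text features of a unit: lowercase word tokens (assumed feature choice)."""
    return unit.lower().split()


class NaiveBayesCategoryModel:
    """Multinomial naive Bayes with add-one smoothing.

    Stands in for the 'first machine learning model'; the algorithm
    itself is an assumption, not something the embodiments specify.
    """

    def fit(self, samples):
        # samples: iterable of (historical text unit, preset category) pairs
        self.word_counts = defaultdict(Counter)
        self.class_counts = Counter()
        self.vocabulary = set()
        for unit, category in samples:
            tokens = text_features(unit)
            self.class_counts[category] += 1
            self.word_counts[category].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, unit):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_category, best_score = None, -math.inf
        for category, doc_count in self.class_counts.items():
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            # Log prior plus smoothed log likelihood of every token.
            score = math.log(doc_count / total_docs)
            for token in text_features(unit):
                score += math.log((counts[token] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_category, best_score = category, score
        return best_category
```

A model fitted this way returns, for a new target text unit, the preset category whose training texts share the most vocabulary with it; a production system would likely use richer features and a larger training sample set.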
Optionally, the target category is a category describing patient information, a category describing a disease name, a category describing symptom statement information, a category describing symptom identification information, a category describing medical advice information, or a category describing prescription information.
Optionally, the processor may further execute instructions for the following operation:
extracting, from the original medical record text, a target feature word describing a first feature item;
wherein, in the target medical record text, the target feature word is presented as text information belonging to the first feature item.
Optionally, in order to extract the target feature word describing the first feature item from the original medical record text, the processor may specifically execute instructions for the following operation:
extracting the target feature word describing the first feature item from the text information that, in the original medical record text, falls under the target category to which the first feature item belongs.
Optionally, in order to extract the target feature word describing the first feature item from the original medical record text, the processor may specifically execute instructions for the following operations:
analyzing the original medical record text to obtain an initial feature word describing the first feature item;
matching the initial feature word against a standard feature lexicon to obtain a standard feature word that matches the initial feature word, the standard feature word serving as the target feature word describing the first feature item.
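The matching of an initial feature word against a standard feature lexicon may, purely as an example, be sketched as follows. The lexicon contents, the variant lists, the similarity cutoff, and the function names are hypothetical; exact lookup followed by fuzzy string matching with the standard-library `difflib` module is one possible matching strategy, not one prescribed by the embodiments.

```python
import difflib

# Hypothetical standard feature lexicon: standard term -> known variants.
STANDARD_LEXICON = {
    "amoxicillin": ["amoxil", "amoxycillin"],
    "ibuprofen": ["brufen", "advil"],
    "headache": ["head pain", "cephalalgia"],
}


def normalize_feature_word(initial_word, lexicon=STANDARD_LEXICON, cutoff=0.8):
    """Map an initial feature word to a standard feature word, or None."""
    word = initial_word.strip().lower()
    # 1. Exact hit on a standard term or on one of its listed variants.
    for standard, variants in lexicon.items():
        if word == standard or word in variants:
            return standard
    # 2. Otherwise fall back to fuzzy matching against the standard terms.
    close = difflib.get_close_matches(word, list(lexicon), n=1, cutoff=cutoff)
    return close[0] if close else None


def map_feature_words(initial_words):
    """Return {initial word: standard word} for every word that matched."""
    mapping = {}
    for word in initial_words:
        standard = normalize_feature_word(word)
        if standard is not None:
            mapping[word] = standard
    return mapping
```

The dictionary returned by `map_feature_words` keeps each initial feature word alongside the standard feature word it was matched to, so the pairing can be carried into the generated text if desired.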
Optionally, in order to analyze the original medical record text and obtain the initial feature word describing the first feature item, the processor may specifically execute instructions for the following operation:
performing lexical analysis and/or syntactic analysis on the original medical record text based on a medical-specific lexicon to obtain the initial feature word describing the first feature item.
Optionally, the processor may further execute instructions for the following operation:
establishing a correspondence between the initial feature word and the target feature word describing the first feature item, and presenting the correspondence in the target medical record text.
Optionally, the first feature item may be a feature item describing a patient name, a feature item describing a drug, a feature item describing a dose, or a feature item describing a symptom.
Optionally, the processor may further execute instructions for the following operation:
determining an inferred feature word corresponding to the original medical record text under a second feature item, wherein the inferred feature word is a feature word that describes the second feature item and is not recorded in the original medical record text;
wherein, in the target medical record text, the inferred feature word is presented as text information belonging to the second feature item.
Optionally, in order to determine the inferred feature word corresponding to the original medical record text under the second feature item, the processor may specifically execute instructions for the following operation:
determining, based on a second machine learning model, the inferred feature word corresponding to the original medical record text under the second feature item, wherein the second machine learning model is obtained by training on the correspondence between historical medical record texts included in a training sample set and preset inferred feature words describing the second feature item.
Optionally, the second feature item may be a feature item describing the patient's gender or a feature item describing the patient's age.
Optionally, the processor may further execute instructions for the following operations:
after the target medical record text is generated, searching for a preset medical record text that matches the target medical record text, wherein the text information of the preset medical record text under the target category is the same as or similar to the target text unit, and the target category includes a category describing patient personal information and/or a category describing patient symptoms;
extracting, from the preset medical record text, the text information under the category describing diagnostic information, and presenting it in the target medical record text as reference diagnostic information.
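As a hedged illustration of the two operations above, the search for a matching preset medical record text and the extraction of its diagnostic information might look as follows. The embodiments do not define how "the same or similar" is measured; the Jaccard similarity over word tokens, the threshold value, the dictionary layout, the category keys `symptoms` and `diagnosis`, and the function names are all assumptions, and the sample values used with it are invented for demonstration only.

```python
def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tokens(text):
    """Lowercase word tokens of a text, as a set."""
    return set(text.lower().split())


def find_reference_diagnosis(target, preset_records,
                             categories=("symptoms",), threshold=0.5):
    """Return the diagnosis text of the most similar preset record, or None.

    `target` and every preset record are dicts mapping a category name to
    text. Similarity is averaged over the compared categories, and a record
    only qualifies if its average reaches the (assumed) threshold.
    """
    best_record, best_score = None, 0.0
    for record in preset_records:
        scores = [
            jaccard(tokens(target.get(c, "")), tokens(record.get(c, "")))
            for c in categories
        ]
        score = sum(scores) / len(scores)
        if score >= threshold and score > best_score:
            best_record, best_score = record, score
    return best_record.get("diagnosis") if best_record is not None else None
```

The returned text would then be attached to the target medical record text as reference diagnostic information; when no preset record is similar enough, nothing is attached.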
Optionally, the original medical record text may be the medical record text information relating to a single diagnosis of a single patient.
Optionally, in order to obtain the original medical record text, the processor may specifically execute instructions for the following operations:
obtaining medical record information in speech form;
performing speech recognition on the medical record information to obtain the original medical record text.
Optionally, in order to obtain the original medical record text, the processor may specifically execute instructions for the following operations:
obtaining medical record information in image form;
performing image recognition on the medical record information to obtain the original medical record text.
Optionally, the processor may further execute instructions for the following operation:
presenting the target medical record text.
An embodiment of the present invention further provides a non-transitory computer-readable storage medium including instructions, for example the memory 1804 including instructions, where the instructions are executable by the processor 1820 of the apparatus 1800 to perform the above method; or, as another example, the storage medium 1930 including instructions, where the instructions are executable by the central processing unit 1922 of the server 1900 to perform the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
A non-transitory computer-readable storage medium, wherein, when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform a method, the method comprising:
obtaining an original medical record text, and dividing the original medical record text into at least one target text unit;
determining a target category corresponding to a text feature of the target text unit;
generating a target medical record text, wherein, in the target medical record text, the target text unit is presented as text information under the target category to which it belongs.
Other embodiments of the present invention will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. The present invention is intended to cover any variations, uses, or adaptations of the invention that follow its general principles and include common general knowledge or customary technical means in the art that are not disclosed in this disclosure. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the invention being indicated by the following claims.
It should be understood that the present invention is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the invention is limited only by the appended claims.
The above are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (14)

  1. A method for processing medical record information, characterized by comprising:
    obtaining an original medical record text, and dividing the original medical record text into at least one target text unit;
    determining a target category corresponding to a text feature of the target text unit;
    generating a target medical record text, wherein, in the target medical record text, the target text unit is presented as text information under the target category to which it belongs.
  2. The method according to claim 1, characterized in that determining the target category corresponding to the text feature of the target text unit comprises:
    determining, based on a first machine learning model, the target category corresponding to the text feature of the target text unit, wherein the first machine learning model is obtained by training on the correspondence between text features of historical medical record texts included in a training sample set and preset categories.
  3. The method according to claim 1, characterized in that the target category is a category describing patient information, a category describing a disease name, a category describing symptom statement information, a category describing symptom identification information, a category describing medical advice information, or a category describing prescription information.
  4. The method according to claim 1, characterized by further comprising:
    extracting, from the original medical record text, a target feature word describing a first feature item;
    wherein, in the target medical record text, the target feature word is presented as text information belonging to the first feature item.
  5. The method according to claim 4, characterized in that extracting the target feature word describing the first feature item from the original medical record text comprises:
    extracting the target feature word describing the first feature item from the text information that, in the original medical record text, falls under the target category to which the first feature item belongs.
  6. The method according to claim 4, characterized in that extracting the target feature word describing the first feature item from the original medical record text comprises:
    analyzing the original medical record text to obtain an initial feature word describing the first feature item;
    matching the initial feature word against a standard feature lexicon to obtain a standard feature word that matches the initial feature word, the standard feature word serving as the target feature word describing the first feature item.
  7. The method according to claim 6, characterized in that analyzing the original medical record text to obtain the initial feature word describing the first feature item comprises:
    performing lexical analysis and/or syntactic analysis on the original medical record text based on a medical-specific lexicon to obtain the initial feature word describing the first feature item.
  8. The method according to claim 6, characterized by further comprising:
    establishing a correspondence between the initial feature word and the target feature word describing the first feature item, and presenting the correspondence in the target medical record text.
  9. The method according to claim 1, characterized by further comprising:
    determining an inferred feature word corresponding to the original medical record text under a second feature item, wherein the inferred feature word is a feature word that describes the second feature item and is not recorded in the original medical record text;
    wherein, in the target medical record text, the inferred feature word is presented as text information belonging to the second feature item.
  10. The method according to claim 9, characterized in that determining the inferred feature word corresponding to the original medical record text under the second feature item comprises:
    determining, based on a second machine learning model, the inferred feature word corresponding to the original medical record text under the second feature item, wherein the second machine learning model is obtained by training on the correspondence between historical medical record texts included in a training sample set and preset inferred feature words describing the second feature item.
  11. The method according to claim 1, characterized in that, after generating the target medical record text, the method further comprises:
    searching for a preset medical record text that matches the target medical record text, wherein the text information of the preset medical record text under the target category is the same as or similar to the target text unit, and the target category includes a category describing patient personal information and/or a category describing patient symptoms;
    extracting, from the preset medical record text, the text information under the category describing diagnostic information, and presenting it in the target medical record text as reference diagnostic information.
  12. The method according to claim 1, characterized in that obtaining the original medical record text comprises:
    obtaining medical record information in speech form, and performing speech recognition on the medical record information to obtain the original medical record text;
    or,
    obtaining medical record information in image form, and performing image recognition on the medical record information to obtain the original medical record text.
  13. An apparatus for processing medical record information, characterized by comprising:
    an obtaining unit, configured to obtain an original medical record text;
    a dividing unit, configured to divide the original medical record text into at least one target text unit;
    a first determining unit, configured to determine a target category corresponding to a text feature of the target text unit;
    a generating unit, configured to generate a target medical record text, wherein, in the target medical record text, the target text unit is presented as text information under the target category to which it belongs.
  14. A device, characterized by comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by one or more processors, and the one or more programs contain instructions for performing the following operations:
    obtaining an original medical record text, and dividing the original medical record text into at least one target text unit;
    determining a target category corresponding to a text feature of the target text unit;
    generating a target medical record text, wherein, in the target medical record text, the target text unit is presented as text information under the target category to which it belongs.
PCT/CN2017/077125 2016-12-28 2017-03-17 Method, device and equipment for processing medical record information WO2018120447A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611236257.2A CN108257676B (en) 2016-12-28 2016-12-28 Medical case information processing method, device and equipment
CN201611236257.2 2016-12-28

Publications (1)

Publication Number Publication Date
WO2018120447A1 true WO2018120447A1 (en) 2018-07-05

Family

ID=62707727

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/077125 WO2018120447A1 (en) 2016-12-28 2017-03-17 Method, device and equipment for processing medical record information

Country Status (2)

Country Link
CN (1) CN108257676B (en)
WO (1) WO2018120447A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109284353A (en) * 2018-09-10 2019-01-29 平安科技(深圳)有限公司 Case search method, device, computer equipment and storage medium
CN111177117A (en) * 2019-12-17 2020-05-19 山东中医药大学第二附属医院 Traditional Chinese medicine medical record data processing method
CN111209924A (en) * 2018-11-19 2020-05-29 零氪科技(北京)有限公司 System for automatically extracting medical advice and application
CN116646046A (en) * 2023-07-27 2023-08-25 中日友好医院(中日友好临床医学研究所) Electronic medical record processing method and system based on Internet diagnosis and treatment

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111125100A (en) * 2019-12-12 2020-05-08 东软集团股份有限公司 Data storage method and device, storage medium and electronic equipment
CN112131862B (en) * 2020-07-20 2021-12-03 中国中医科学院中医药信息研究所 Traditional Chinese medicine medical record data processing method and device and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103020453A (en) * 2012-12-15 2013-04-03 中国科学院深圳先进技术研究院 Generation method of structured electronic medical record based on ontology technology
CN103678281A (en) * 2013-12-31 2014-03-26 北京百度网讯科技有限公司 Method and device for automatically labeling text
CN103886034A (en) * 2014-03-05 2014-06-25 北京百度网讯科技有限公司 Method and equipment for building indexes and matching inquiry input information of user
CN104899260A (en) * 2015-05-20 2015-09-09 东华大学 Method for structured processing of Chinese pathological text
CN105808712A (en) * 2016-03-07 2016-07-27 陈宽 Intelligent system and method for converting text type medical reports into structured data

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109284353A (en) * 2018-09-10 2019-01-29 平安科技(深圳)有限公司 Case search method, device, computer equipment and storage medium
CN109284353B (en) * 2018-09-10 2023-10-03 平安科技(深圳)有限公司 Medical case retrieval method, device, computer equipment and storage medium
CN111209924A (en) * 2018-11-19 2020-05-29 零氪科技(北京)有限公司 System for automatically extracting medical advice and application
CN111209924B (en) * 2018-11-19 2023-04-18 零氪科技(北京)有限公司 System for automatically extracting medical advice and application
CN111177117A (en) * 2019-12-17 2020-05-19 山东中医药大学第二附属医院 Traditional Chinese medicine medical record data processing method
CN111177117B (en) * 2019-12-17 2023-06-16 山东中医药大学第二附属医院 Data processing method for traditional Chinese medicine medical records
CN116646046A (en) * 2023-07-27 2023-08-25 中日友好医院(中日友好临床医学研究所) Electronic medical record processing method and system based on Internet diagnosis and treatment
CN116646046B (en) * 2023-07-27 2023-11-17 中日友好医院(中日友好临床医学研究所) Electronic medical record processing method and system based on Internet diagnosis and treatment

Also Published As

Publication number Publication date
CN108257676A (en) 2018-07-06
CN108257676B (en) 2020-03-03

Similar Documents

Publication Publication Date Title
WO2018120447A1 (en) Method, device and equipment for processing medical record information
US20170052947A1 (en) Methods and devices for training a classifier and recognizing a type of information
CN109522419B (en) Session information completion method and device
WO2017084541A1 (en) Method and apparatus for sending expression image during call session
KR102544453B1 (en) Method and device for processing information, and storage medium
JP6167245B2 (en) COMMUNICATION MESSAGE IDENTIFICATION METHOD, COMMUNICATION MESSAGE IDENTIFICATION DEVICE, PROGRAM, AND RECORDING MEDIUM
CN109471919B (en) Zero pronoun resolution method and device
CN105550643A (en) Medical term recognition method and device
RU2733816C1 (en) Method of processing voice information, apparatus and storage medium
CN108733718B (en) Search result display method and device and display device for search results
CN111898382A (en) Named entity recognition method and device for named entity recognition
CN108255939A (en) Cross-language search method and apparatus, and device for cross-language search
JP2022510660A (en) Data processing methods and their devices, electronic devices, and storage media
WO2022116527A1 (en) Data processing method and device
WO2018214663A1 (en) Voice-based data processing method and apparatus, and electronic device
WO2018006629A1 (en) Prescription matching method and device, and device for prescription matching
CN110634570A (en) Diagnostic simulation method and related device
CN111241844A (en) Information recommendation method and device
CN112948665A (en) Searching method, device and medium
US20170039874A1 (en) Assisting a user in term identification
CN109145151B (en) Video emotion classification acquisition method and device
CN111324214A (en) Statement error correction method and device
CN114822753A (en) Prescription auditing method and device, electronic equipment and storage medium
US11238863B2 (en) Query disambiguation using environmental audio
WO2017035985A1 (en) String storing method and device

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 17887578
Country of ref document: EP
Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 EP: PCT application non-entry in European phase
Ref document number: 17887578
Country of ref document: EP
Kind code of ref document: A1