CN109830272A - Data normalization method, apparatus, computer device and storage medium
- Publication number: CN109830272A (CN201910011828.XA)
- Authority: CN (China)
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Landscapes
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiments of the present application provide a data normalization method, an apparatus, a computer device and a storage medium. The method includes: obtaining an item of data to be standardized from a physical examination report; determining the data type corresponding to the specific value of the item of data; and standardizing the item of data according to the determined data type, where different data types are standardized in different ways. Because the embodiments apply a different standardization approach to each data type, a physical examination report can be standardized comprehensively, which improves the precision and accuracy of the standardization. The standardized data can also be used for model learning, which improves the consistency and accuracy of the data used for model learning.
Description
Technical field
The present application relates to the technical field of data processing, and in particular to a data normalization method, an apparatus, a computer device and a storage medium.
Background
An electronic physical examination report generally contains a large amount of information and covers many different examination items, which makes it hard to process. As a result, the recognition methods commonly used for electronic physical examination reports are still rather coarse: most of them match examination results directly by data format, keep only the data they can recognize, and then store and standardize that data for later model learning. However, across different reports the results of the same item of data may express the same meaning in entirely different wording, and the results of different items differ even more. Such a coarse recognition method cannot recognize all of them, so the examination items are not identified comprehensively and the recognized data is poorly suited to later model learning.
Summary of the invention
The embodiments of the present application provide a data normalization method, an apparatus, a computer device and a storage medium that can improve the precision and accuracy of data standardization.
In a first aspect, an embodiment of the present application provides a data normalization method, including: obtaining an item of data to be standardized from a physical examination report; determining the data type corresponding to the specific value of the item of data; and standardizing the item of data according to the determined data type, where different data types are standardized in different ways.
In a second aspect, an embodiment of the present application provides a data normalization apparatus, which includes units for performing the method of the first aspect.
In a third aspect, an embodiment of the present application provides a computer device, which includes a memory and a processor connected to the memory. The memory stores a computer program, and the processor runs the computer program stored in the memory to perform the method of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of the first aspect.
The embodiments of the present application identify the data type corresponding to the specific value of an item of data and standardize that value differently depending on the type. Applying a different standardization approach to each data type allows a physical examination report to be standardized comprehensively, avoids omitting important examination indicators or textual features, and improves the precision and accuracy of the standardization. The standardized data can also be used for model learning, which improves the consistency and accuracy of the data used for model learning.
Brief description of the drawings
To explain the technical solutions of the embodiments more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of the data normalization method provided by an embodiment of the present application;
Fig. 2 is a schematic sub-flowchart of the data normalization method provided by an embodiment of the present application;
Fig. 3 is a schematic sub-flowchart of the data normalization method provided by an embodiment of the present application;
Fig. 4 is a schematic sub-flowchart of Fig. 3 provided by an embodiment of the present application;
Fig. 5 is a schematic sub-flowchart of Fig. 3 provided by an embodiment of the present application;
Fig. 6 is a schematic block diagram of the data normalization apparatus provided by an embodiment of the present application;
Fig. 7 is a schematic block diagram of the type determination unit provided by an embodiment of the present application;
Fig. 8 is a schematic block diagram of the standardization unit provided by an embodiment of the present application;
Fig. 9 is a schematic block diagram of the regular expression matching unit provided by an embodiment of the present application;
Fig. 10 is a schematic block diagram of the natural language processing unit provided by an embodiment of the present application;
Fig. 11 is a schematic block diagram of the computer device provided by an embodiment of the present application.
Detailed description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present application.
The embodiments are described using data from physical examination reports as an example. The solution can also be applied to other scenarios and to data other than physical examination reports.
Fig. 1 is a schematic flowchart of the data normalization method provided by an embodiment of the present application. As shown in Fig. 1, the method includes S101-S103.
S101: obtain an item of data to be standardized from a physical examination report.
There may be one physical examination report or several; in this embodiment there are several. A report contains multiple items of data to be standardized, such as weight, heart rate, liver color ultrasound and eyesight. Each item of data consists of a data item and a specific value, for example data item: height, specific value: 176cm. If there are several reports, one item of data to be standardized, such as height and its specific values, is obtained from all of them; if there is only one report, the item of data is obtained from that report. Note that the examination results of different people may be written by different doctors. Because each doctor has different habits, the results for the same item of data may take several different values that all express the same meaning, so the results need to be standardized.
S102: determine the data type corresponding to the specific value of the item of data, where the data types include numeric, enumeration, simple mixed and complex mixed.
In this embodiment, the data in a physical examination report is divided into four types: numeric, enumeration, simple mixed and complex mixed. Together these four types cover essentially all examination results in a report.
Numeric: the specific value is a number, such as 175cm or 50kg. Enumeration: values such as "negative", "normal", "not detected", "+" or "++". Simple mixed: values such as "> 100 beats/min, sinus tachycardia", which are mainly numeric. Complex mixed: values such as "multiple hypoechoic nodules are seen in the thyroid gland, the largest located in the right lobe and measuring about 14mm × 8mm, with a vascular ring around the nodule". A value of this kind may be pure text or a mixture of text and numbers; it is more complex and may itself contain enumeration and simple mixed content.
In one embodiment, as shown in Fig. 2, step S102 includes the following steps S201-S206.
S201: obtain the specific value of the item of data and inspect it.
For example, if the data item is height and the specific value is 176cm, the value 176cm is obtained. The specific value is then inspected to determine its data type.
S202: if the specific value is a number, or a combination of a number and a unit, determine that its data type is numeric.
For example, for the data item age with the specific value 28, the value is a number, so the data type is numeric. For the data item hemoglobin with the specific value 135g/L, the value is a combination of a number and a unit, so the data type is also numeric.
S203: if the specific value is one of the preset enumerated values, determine that its data type is enumeration.
The preset enumerated values include "normal", "no obvious abnormality", "no obvious abnormality seen", "negative", "not detected", "no congestion", "no enlargement", "nothing special", "positive", "abnormal" and so on. Graded values are also treated as enumerations: the preset enumerated values also include "-", "+", "++", "+++" (for example, urine glucose grades) and "grade I", "grade II", "grade III" (for example, cleanliness grades).
S204: if the specific value contains both text and numbers, judge whether the number of text characters does not exceed a first preset quantity and whether the number of numbers does not exceed a second preset quantity.
For example, the specific value of an item of data may be: "shape, size and position of both kidneys; strong echo with acoustic shadow visible in the left kidney, about 4*3mm in size". This value contains both text and numbers. The number of text characters and the number of numbers in the value are counted, and it is judged whether the character count exceeds the first preset quantity and whether the number count exceeds the second preset quantity. The first preset quantity may be 20 and the second preset quantity may be 2; other values may also be used.
S205: if the number of text characters in the specific value does not exceed the first preset quantity and the number of numbers does not exceed the second preset quantity, determine that the data type is simple mixed.
S206: if the number of text characters exceeds the first preset quantity, or the number of numbers exceeds the second preset quantity, or both, determine that the data type is complex mixed.
Note that the scheme for determining the data type is not limited to the one above; other embodiments may determine the data type in other ways.
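Steps S201-S206 can be illustrated with a minimal Python sketch. It is not taken from the application: the function name `classify_value`, the English enumerated values, the regular expressions and the choice to count alphabetic characters as text characters are all assumptions made for illustration. Only the thresholds 20 and 2 come from the example values given above. Values that contain text but no number and are not enumerated are treated here as complex mixed, which is also an assumption.

```python
import re

# Assumed subset of the preset enumerated values (English stand-ins).
ENUM_VALUES = {
    "normal", "no obvious abnormality", "negative", "not detected",
    "positive", "abnormal", "-", "+", "++", "+++",
}
MAX_TEXT_CHARS = 20  # first preset quantity (example value from the text)
MAX_NUMBERS = 2      # second preset quantity (example value from the text)

NUMBER = r"\d+(?:\.\d+)?"
# A number alone, or a number followed by a unit such as "cm" or "g/L".
NUMERIC_RE = re.compile(rf"{NUMBER}\s*[A-Za-z%/]*")


def classify_value(value: str) -> str:
    """Return the data type of a specific value (sketch of S201-S206)."""
    value = value.strip()
    if NUMERIC_RE.fullmatch(value):          # S202: number, or number + unit
        return "numeric"
    if value.lower() in ENUM_VALUES:         # S203: preset enumerated value
        return "enumeration"
    numbers = re.findall(NUMBER, value)
    text_chars = sum(ch.isalpha() for ch in value)
    if not numbers:
        # Assumption: non-enumerated text without numbers is complex mixed.
        return "complex_mixed"
    if text_chars <= MAX_TEXT_CHARS and len(numbers) <= MAX_NUMBERS:
        return "simple_mixed"                # S205
    return "complex_mixed"                   # S206
```

Under these assumptions, `classify_value("135g/L")` yields `"numeric"` and `classify_value("> 100/min, tachycardia")` yields `"simple_mixed"`.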
S103: standardize the item of data according to the determined data type, where different data types are standardized in different ways.
That is, the item of data is processed differently depending on its data type.
In one embodiment, as shown in Fig. 3, step S103 includes the following steps S301-S305.
S301: obtain the determined data type corresponding to the specific value of the item of data.
S302: if the data type is numeric, process the specific values of the item of data in the physical examination reports so that they use a unified unit.
For example, if height is recorded as 168cm in one place and 1.68m in another, the values are all converted to centimeters (168cm, 168cm) or all converted to meters (1.68m, 1.68m), so that the item of data uses a single unit. If there are several reports, the specific values of the item of data in all of them are converted.
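The unit unification of S302 can be sketched as follows for the height example. The function name `to_centimeters`, the conversion table and the choice of centimeters as the target unit are assumptions for illustration; the application does not prescribe them.

```python
import re

# Assumed conversion factors from each length unit to centimeters.
TO_CM = {"mm": 0.1, "cm": 1.0, "m": 100.0}
VALUE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*([A-Za-z]+)")


def to_centimeters(value: str) -> str:
    """Convert a height such as '1.68m' to a string in centimeters."""
    match = VALUE_RE.fullmatch(value.strip())
    if match is None or match.group(2).lower() not in TO_CM:
        raise ValueError(f"unsupported height value: {value!r}")
    centimeters = float(match.group(1)) * TO_CM[match.group(2).lower()]
    # round() hides floating-point noise before formatting.
    return f"{round(centimeters, 2):g}cm"
```

Applied to every report, this maps both "168cm" and "1.68m" to the same string, so later processing sees a single unit.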
S303: if the data type is enumeration, unify the text of the specific value, or map the specific value to a preset numerical value.
For example, "normal", "no obvious abnormality", "no obvious abnormality seen", "negative", "not detected" and "nothing special" all express the same meaning, so they are all unified as "normal". As an example of mapping to preset numerical values, "normal" and "abnormal" for an examination item are mapped to 0 and 1 respectively, where 0 and 1 are the preset values for that item of data; graded values such as the urine glucose grades "-", "+", "++", "+++" are mapped to 0, 1, 2, 3 respectively, where 0, 1, 2, 3 are the preset values for that item of data.
S304: if the data type is simple mixed, standardize the value by regular expression matching.
A regular expression describes a pattern or rule for matching strings: a predefined combination of specific characters (a rule) is used to match text. Standardization by regular expression matching first matches the text with a regular expression and then standardizes the matched text.
S305: if the data type is complex mixed, standardize the value using natural language processing.
Natural language processing (NLP) standardizes the value by "understanding" the natural language.
The embodiment shown in Fig. 3 thus applies a different standardization method to each data type: numeric, enumeration, simple mixed and complex mixed.
In one embodiment, as shown in Fig. 4, step S304 includes the following steps S401-S405.
S401: obtain the preset regular expression corresponding to the specific value of the item of data according to its data item.
For example, for the data item heart rate, the preset regular expression is: sinus. The specific values in different reports may be "heart rate 80 beats/min, sinus rhythm"; "sinus rhythm, 80 beats/min"; "80 beats/min, sinus heart rate" and so on. Although the descriptions differ between reports, "sinus" appears in all of them, so matching with the preset regular expression "sinus" easily finds the data item. Note that the same data item may have several preset regular expressions.
S402: judge whether the preset regular expression matches the specific value of the item of data.
For example, if the preset regular expression is "sinus" and "sinus" appears in the specific value, the preset regular expression is determined to match the value; otherwise it is determined not to match, and a prompt is issued.
S403: if the preset regular expression matches the specific value, judge whether the value contains a symbol and a number.
The specific values of some data items contain a symbol, such as "< 30 times".
S404: if the specific value contains a symbol and a number, extract the features of the value according to a preset format to obtain the standardization result.
The preset format may be: number, symbol, unit. For "< 30 times", the extracted features are: 30, <, times; for "< 3.12mmol/L", they are: 3.12, <, mmol/L. The features extracted according to the preset format are taken as the standardization result.
S405: if the specific value contains no symbol but contains numbers, extract the numbers and take them as the standardization result.
If the specific value contains no symbol but contains numbers, extraction is either single-number or multi-number, depending on whether there is one number or several (the count of numbers never exceeds the second preset quantity). If there is only one number, that number is extracted, for example 80 from "heart rate 80 beats/min, sinus rhythm". If there are several numbers, all of them are extracted as multiple result values of the item of data; for example, from the eyesight result "left eye vision 4.1, right eye vision 4.3", 4.1 and 4.3 are extracted, corresponding to the left eye and the right eye respectively.
This embodiment defines how data of the simple mixed type is standardized.
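The flow of S401-S405 can be sketched with Python's `re` module. The table `PRESET_PATTERNS`, the two function names and the exact patterns are assumptions for illustration; English sample strings stand in for the report text.

```python
import re
from typing import List, Optional

# Assumed preset regular expressions per data item (S401).
PRESET_PATTERNS = {"heart rate": [r"sinus"]}

NUMBER = r"\d+(?:\.\d+)?"
# Comparison symbol, number, then an optional unit up to the next separator.
SYMBOL_RE = re.compile(rf"([<>≤≥])\s*({NUMBER})\s*([^\s,;]*)")


def extract_features(value: str) -> List[str]:
    """Sketch of S403-S405: extract features from a matched value."""
    match = SYMBOL_RE.search(value)
    if match:                                  # S404: number, symbol, unit
        symbol, number, unit = match.groups()
        return [number, symbol, unit]
    return re.findall(NUMBER, value)           # S405: one or more numbers


def standardize_simple_mixed(item: str, value: str) -> Optional[List[str]]:
    """Return extracted features, or None when no preset pattern matches."""
    patterns = PRESET_PATTERNS.get(item, [])
    if not any(re.search(p, value) for p in patterns):   # S402
        return None  # the caller would issue a prompt here
    return extract_features(value)
```

For instance, `extract_features("< 3.12mmol/L")` returns the number, symbol and unit in the preset order, while a value with several numbers and no symbol returns each number in order of appearance.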
In one embodiment, as shown in Fig. 5, step S305 includes the following steps S501-S506.
S501: call a recursive grouping interface to split the text of the specific value into groups at sentence boundaries.
The recursive grouping interface may be an interface provided by the Chinese lexical analysis toolkit THULAC, used to split the text of the specific value into groups. THULAC is a Chinese lexical analysis toolkit developed by the Natural Language Processing and Computational Social Science Lab at Tsinghua University; it provides Chinese word segmentation and part-of-speech tagging. The text of a specific value may contain long sentences, short clauses nested inside long sentences, content inside and outside brackets, and so on. Calling the recursive grouping interface splits the text into nested groups: a large group (paragraph) contains middle groups (sentences), and a middle group contains small groups (short clauses or words).
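The nested grouping described above can be pictured with a small stand-in. The sketch below does not use THULAC and makes no claim about its interfaces; `group_text` and the two delimiter levels are hypothetical, chosen only to show how a value can be split recursively from coarse groups down to short clauses.

```python
import re

# Assumed delimiter levels, from coarse to fine (ASCII and full-width forms).
LEVELS = [r"[。；;]", r"[，,、]"]


def group_text(text: str, level: int = 0):
    """Recursively split text; returns a string or a nested list of strings."""
    text = text.strip()
    if level == len(LEVELS):
        return text
    parts = [p for p in re.split(LEVELS[level], text) if p.strip()]
    if not parts:
        return ""
    if len(parts) == 1:
        # Nothing to split at this level; try the next, finer level.
        return group_text(parts[0], level + 1)
    return [group_text(p, level + 1) for p in parts]
```

Each leaf of the returned structure is a short clause whose data type can then be judged on its own.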
S502: judge whether the data type of each piece of text obtained by splitting is numeric, enumeration or simple mixed. That is, the data type of each short clause obtained by splitting is determined.
S503: if the data type of the split text is numeric, enumeration or simple mixed, standardize it using the standardization approach for that type.
S504: if the data type of the split text is not numeric, enumeration or simple mixed, call a word segmentation and part-of-speech tagging interface to segment and tag the split text, and analyze it to obtain a first result.
Specifically, the short clauses obtained by splitting are segmented by calling the word segmentation and part-of-speech tagging interface, and the part of speech of each word is determined. Based on the parts of speech, each short clause is analyzed according to certain rules to obtain the first result. Parts of speech include nouns, adjectives and so on; nouns generally serve as the core words. The segmentation and tagging interface may be the one provided by the Chinese lexical analysis toolkit THULAC, used for word segmentation, part-of-speech tagging, syntactic analysis and the like; interfaces provided by other segmentation tools may also be used. As an example of rule-based analysis, a short clause can be viewed as three parts: 1) which organ, 2) what is wrong, and 3) the specific number, for example 1) thyroid, 2) nodule, 3) 2cm. Note that when the segmentation and tagging interface is called in this step, the analysis mainly targets short clauses that contain numbers, extracting the numerical features associated with the core words. If a clause contains no numerical features, the first result is empty.
S505: call a keyword extraction algorithm to compute statistics on the short clauses obtained by splitting, yielding a first frequency with which each candidate keyword occurs and a second frequency with which it occurs across the multiple physical examination reports containing the item of data; extract a group of keywords for the specific value from the candidate keywords according to the first and second frequencies, and take the extracted keywords as a second result.
The keyword extraction algorithm may be TF-IDF. TF (term frequency) is the frequency with which a keyword occurs; it is used as the first frequency, that is, the frequency with which a (candidate) keyword occurs in the specific value of the item of data. IDF (inverse document frequency) is based on how often a word occurs across the whole corpus; it is used as the second frequency, that is, it reflects how often the (candidate) keyword occurs across the multiple physical examination reports containing the item of data. A group of keywords for the specific value is extracted from the candidate keywords as follows: multiply the first frequency of each candidate keyword by its second frequency; sort the products in descending order; take the first group of candidate keywords after sorting; and use this group as the keywords of the specific value. This group of keywords is the feature corresponding to the item of data and is taken as the second result.
For example, if the data item (examination item) is lung and the extracted keywords (features) are inflammation and calcification, this indicates that the lung shows inflammation and calcification.
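The multiply, sort and take-the-top-group procedure of S505 can be sketched in plain Python. This is an illustration under assumptions: the tokens are taken as already segmented, each report is represented as a set of its tokens, and the smoothed IDF formula `log(N / (1 + df)) + 1` is one common variant chosen here, not a formula given in the application. The name `top_keywords` and the parameter `k` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Set


def top_keywords(tokens: List[str], reports: List[Set[str]], k: int = 2) -> List[str]:
    """Rank candidates by first frequency (TF) times second frequency (IDF)."""
    counts = Counter(tokens)
    scores = {}
    for term, count in counts.items():
        tf = count / len(tokens)
        doc_freq = sum(term in report for report in reports)
        # Smoothed IDF (an assumed variant) avoids division by zero.
        idf = math.log(len(reports) / (1 + doc_freq)) + 1
        scores[term] = tf * idf
    # Descending score; ties are broken alphabetically for a stable order.
    return sorted(scores, key=lambda term: (-scores[term], term))[:k]
```

A word such as "lung" that appears in every report gets a low second frequency and drops out, while rarer findings rise to the top of the ranking.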
S506: take the first result and the second result as the standardization result for the specific value of the item of data.
In one embodiment, before the word segmentation and part-of-speech tagging interface is called, the method further includes S503a.
S503a: detect whether the split text contains a number. If it does, perform the step of calling the word segmentation and part-of-speech tagging interface; if it does not, perform step S505.
In this embodiment, the step "call the word segmentation and part-of-speech tagging interface to segment and tag the split text, and analyze it" is aimed mainly at text that contains numbers. If the split text contains no number, that step is skipped, which reduces the amount of computation and saves standardization time.
In one embodiment, after step S506, the method further includes S506a, S506b and S506c.
S506a: obtain the preset features and feature identifiers corresponding to the specific value of the item of data.
For example, for the data item lung, the preset features are whether the result is "normal", whether there is "inflammation", whether there is "calcification", and so on. The corresponding feature identifiers are "0, 1" (0 means normal; 1 means abnormal), "0, 1" (0 means the feature is absent, i.e. no inflammation; 1 means it is present, i.e. inflammation) and "0, 1" (0 means the feature is absent, i.e. no calcification; 1 means it is present, i.e. calcification).
S506b: match the standardization result of the specific value against the preset features to obtain a matching result.
If the standardization result is "abnormal", "inflammation", the matching result obtained after matching against the preset features is "abnormal", "inflammation".
S506c: mark the standardization result with the corresponding feature identifiers according to the matching result.
If the matching result is "abnormal", "inflammation", the result after marking with the corresponding feature identifiers is "1", "1", "0"; if the matching result is "abnormal", "inflammation", "calcification", the result is "1", "1", "1".
This embodiment further marks the standardization result so that it is expressed numerically, which facilitates analysis and statistics by a model.
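The marking in S506a-S506c amounts to turning the matched features into a fixed-length vector of 0/1 flags. A minimal sketch, assuming a hypothetical table `PRESET_FEATURES` and function `mark_features`; the first flag is modeled as "abnormal" so that 0 means normal and 1 means abnormal, in line with the identifiers described above.

```python
from typing import Dict, List

# Assumed preset features per data item. The first flag records overall
# status (0 = normal, 1 = abnormal); the others record presence of a finding.
PRESET_FEATURES: Dict[str, List[str]] = {
    "lung": ["abnormal", "inflammation", "calcification"],
}


def mark_features(item: str, results: List[str]) -> List[int]:
    """Return one 0/1 flag per preset feature of the data item."""
    found = set(results)
    return [int(feature in found) for feature in PRESET_FEATURES[item]]
```

With the lung example, the result "abnormal", "inflammation" becomes the flags 1, 1, 0, which is the form a downstream model can consume directly.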
The method embodiments above classify the data in a physical examination report, for example into four data types, and standardize the data differently according to the classification result. This allows the report to be standardized comprehensively, avoids omitting important examination indicators or textual features, and improves the precision and accuracy of the standardization. The standardized data can also be used for model learning, which improves the consistency and accuracy of the data used for model learning.
Fig. 6 is a schematic block diagram of the data normalization apparatus provided by an embodiment of the present application. As shown in Fig. 6, the apparatus includes units for performing the data normalization method described above. Specifically, the apparatus 60 includes an obtaining unit 601, a type determination unit 602 and a standardization unit 603.
The obtaining unit 601 is configured to obtain an item of data to be standardized from a physical examination report.
The type determination unit 602 is configured to determine the data type corresponding to the specific value of the item of data, where the data types include numeric, enumeration, simple mixed and complex mixed.
In one embodiment, as shown in Fig. 7, the type determination unit 602 includes an obtaining and detection unit 701, a numeric type determination unit 702, an enumeration type determination unit 703, a quantity judgment unit 704 and a mixed type determination unit 705. The obtaining and detection unit 701 is configured to obtain the specific value of the item of data and inspect it. The numeric type determination unit 702 is configured to determine that the data type is numeric if the specific value is a number, or a combination of a number and a unit. The enumeration type determination unit 703 is configured to determine that the data type is enumeration if the specific value is one of the preset enumerated values. The quantity judgment unit 704 is configured to judge, if the specific value contains both text and numbers, whether the number of text characters does not exceed the first preset quantity and whether the number of numbers does not exceed the second preset quantity. The mixed type determination unit 705 is configured to determine that the data type is simple mixed if the number of text characters does not exceed the first preset quantity and the number of numbers does not exceed the second preset quantity, and otherwise to determine that the data type is complex mixed.
The standardization unit 603 is configured to standardize the item of data according to the determined data type, where different data types are standardized in different ways.
In one embodiment, as shown in Figure 8, the standardization unit 603 includes a type obtaining unit 801, a numeric processing unit 802, an enumeration processing unit 803, a regular expression matching unit 804 and a natural language processing unit 805. The type obtaining unit 801 is configured to obtain the determined data type corresponding to the specific value of the item of data. The numeric processing unit 802 is configured to process the specific value of the item of data in the physical examination report, if the data type corresponding to the specific value is the numeric type, so as to unify the unit of the item of data. The enumeration processing unit 803 is configured to unify the text in the specific value of the item of data, or to match and map the specific value to a preset numerical value, if the data type corresponding to the specific value is the enumeration type. The regular expression matching unit 804 is configured to standardize by means of regular expression matching if the data type corresponding to the specific value is the simple mixed type. The natural language processing unit 805 is configured to standardize by means of natural language processing if the data type corresponding to the specific value is the complex mixed type.
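The per-type handling for the numeric and enumeration types can be sketched as a small dispatch table. The conversion factors, the synonym table, the numeric codes and all function names below are hypothetical; the application only states that units are unified and that enumerated text is unified or mapped to preset numerical values.

```python
import re
from typing import Callable, Dict

# Hypothetical factors converting each unit to an assumed target unit (g/L).
UNIT_FACTORS = {"g/L": 1.0, "g/dL": 10.0}
# Hypothetical synonym table and numeric codes for enumerated values.
ENUM_SYNONYMS = {"neg": "negative", "(-)": "negative", "pos": "positive", "(+)": "positive"}
ENUM_CODES = {"negative": 0, "positive": 1}

NUMBER_WITH_UNIT = re.compile(r"([-+]?\d+(?:\.\d+)?)\s*(\S*)$")


def standardize_numeric(value: str) -> float:
    """Unify the unit of a numeric value; a missing unit is taken as the target unit."""
    m = NUMBER_WITH_UNIT.match(value.strip())
    if m is None:
        raise ValueError(f"not a numeric value: {value!r}")
    number, unit = float(m.group(1)), m.group(2)
    if unit and unit not in UNIT_FACTORS:
        raise ValueError(f"unknown unit: {unit!r}")
    return number * UNIT_FACTORS.get(unit, 1.0)


def standardize_enum(value: str) -> int:
    """Unify the wording of an enumerated value, then map it to a preset code."""
    text = value.strip().lower()
    text = ENUM_SYNONYMS.get(text, text)
    return ENUM_CODES[text]


# One handler per data type; the two mixed types are omitted from this sketch.
HANDLERS: Dict[str, Callable[[str], object]] = {
    "numeric": standardize_numeric,
    "enum": standardize_enum,
}


def standardize(value: str, data_type: str) -> object:
    handler = HANDLERS.get(data_type)
    if handler is None:
        raise NotImplementedError(f"no handler in this sketch for {data_type!r}")
    return handler(value)
```

Keeping one handler per data type mirrors the statement that different data types correspond to different standardization modes, and lets a new type be added without touching the existing handlers.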
In one embodiment, as shown in Figure 9, the regular expression matching unit 804 includes an expression obtaining unit 901, a matching judging unit 902, a symbol and number judging unit 903, a first extraction unit 904 and a second extraction unit 905. The expression obtaining unit 901 is configured to obtain, according to the data item of the item of data, the preset regular expression corresponding to the specific value of the item of data. The matching judging unit 902 is configured to judge whether the preset regular expression matches the specific value of the item of data. The symbol and number judging unit 903 is configured to judge, if the preset regular expression matches the specific value, whether the specific value contains a symbol and a number. The first extraction unit 904 is configured to extract, if the specific value contains a symbol and a number, the feature corresponding to the specific value according to a preset format, so as to obtain the standardization result. The second extraction unit 905 is configured to extract, if the specific value contains no symbol but contains a number, the number from the specific value and to use the extracted number as the standardization result.
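A minimal sketch of the regular expression branch follows. The data item name `tumor_marker`, the pattern and the dictionary used as the "preset format" are assumptions made for illustration; the application only states that a preset expression is looked up by data item and that a symbol plus number, or a number alone, is extracted.

```python
import re
from typing import Dict, Optional, Union

# Hypothetical preset expressions, keyed by data item.
PRESET_PATTERNS = {
    "tumor_marker": re.compile(r"(?P<symbol>[<>≤≥])?\s*(?P<number>\d+(?:\.\d+)?)"),
}

Result = Union[Dict[str, object], float]


def standardize_simple_mixed(item: str, value: str) -> Optional[Result]:
    pattern = PRESET_PATTERNS.get(item)
    if pattern is None:
        return None                      # no preset expression for this data item
    m = pattern.search(value)
    if m is None:
        return None                      # the expression does not match the value
    number = float(m.group("number"))
    symbol = m.group("symbol")
    if symbol:
        # Symbol and number present: emit them in an assumed preset format.
        return {"symbol": symbol, "value": number}
    # No symbol but a number: the number itself is the result.
    return number
```

For example, under these assumptions `standardize_simple_mixed("tumor_marker", "result <0.5 ng/mL")` keeps both the comparison symbol and the number, while `"about 3.2"` reduces to the number alone.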
In one embodiment, as shown in Figure 10, the natural language processing unit 805 includes a sentence segmentation unit 101, a text type judging unit 102, a part-of-speech analysis unit 103, a keyword extraction unit 104 and a result determination unit 105. The sentence segmentation unit 101 is configured to call a recursive grouping interface to segment the text corresponding to the specific value of the item of data into sentences and group them. The text type judging unit 102 is configured to judge whether the data type of the segmented text belongs to the numeric type, the enumeration type or the simple mixed type. If the data type of the segmented text belongs to the numeric type, the enumeration type or the simple mixed type, the numeric processing unit, the enumeration processing unit or the regular expression matching unit is triggered accordingly. The part-of-speech analysis unit 103 is configured to call a word segmentation and part-of-speech tagging interface, if the data type of the segmented text does not belong to the numeric type, the enumeration type or the simple mixed type, to segment the text into words, tag their parts of speech and analyze them, so as to obtain a first result. The keyword extraction unit 104 is configured to call a keyword extraction algorithm to compute statistics over the short sentences obtained by segmentation, so as to obtain a first frequency with which candidate keywords occur and a second frequency with which the candidate keywords occur in the multiple physical examination report files containing the item of data, to extract a group of keywords for the specific value of the item of data from the candidate keywords according to the first frequency and the second frequency, and to use the extracted keywords as a second result. The result determination unit 105 is configured to use the first result and the second result as the standardization result corresponding to the specific value of the item of data.
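The keyword extraction step can be illustrated with a TF-IDF style score. This is an assumption: the application names a first frequency (occurrences within the segmented short sentences) and a second frequency (occurrences across the report files) but does not state how they are combined. The sketch below uses a common smoothed inverse document frequency, whitespace tokenization in place of a real word segmenter, and a hypothetical function name.

```python
import math
from collections import Counter
from typing import List


def extract_keywords(phrases: List[str], reports: List[str], top_k: int = 2) -> List[str]:
    """Rank candidate keywords by term frequency times smoothed inverse document frequency."""
    tokens = [t for p in phrases for t in p.lower().split()]
    if not tokens:
        return []
    first_freq = Counter(tokens)  # occurrences within the short sentences
    report_tokens = [set(r.lower().split()) for r in reports]
    scores = {}
    for term, count in first_freq.items():
        # Second frequency: number of report files containing the term.
        second_freq = sum(term in toks for toks in report_tokens)
        idf = math.log((1 + len(reports)) / (1 + second_freq)) + 1.0
        scores[term] = (count / len(tokens)) * idf
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]
```

Terms that are frequent in the value being standardized but rare across the other reports rank highest, which is one plausible reading of selecting keywords "according to the first frequency and the second frequency".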
In one embodiment, the natural language processing unit 805 further includes a number detection unit 102a. The number detection unit 102a is configured to detect, if the data type of the segmented text does not belong to the numeric type, the enumeration type or the simple mixed type, whether the segmented text contains a number. If a number is present, the part-of-speech analysis unit 103 is triggered. If no number is present, the keyword extraction unit 104 is triggered.
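The routing performed by the natural language processing unit can be sketched as below. The splitting rule, the function names and the injected callables are assumptions; in particular, a single-level split on punctuation stands in for the recursive grouping interface, whose behavior the application does not detail.

```python
import re
from typing import Callable, List, Tuple

SIMPLE_TYPES = ("numeric", "enum", "simple_mixed")


def split_clauses(text: str) -> List[str]:
    # Split on common Chinese and Western clause punctuation; "." is
    # deliberately excluded so that decimals such as "5.6" stay intact.
    return [c.strip() for c in re.split(r"[，,。；;！!？?\n]+", text) if c.strip()]


def route(
    text: str,
    classify: Callable[[str], str],
    analyze_pos: Callable[[str], object],
    extract_keywords: Callable[[str], object],
) -> List[Tuple[str, object]]:
    """Send each clause to the handler suggested by its type and digit content."""
    results: List[Tuple[str, object]] = []
    for clause in split_clauses(text):
        kind = classify(clause)
        if kind in SIMPLE_TYPES:
            # A real system would call the matching simple-type handler here.
            results.append((kind, clause))
        elif re.search(r"\d", clause):
            results.append(("pos_analysis", analyze_pos(clause)))
        else:
            results.append(("keywords", extract_keywords(clause)))
    return results
```

Passing the classifier and the two analyzers in as callables keeps the sketch independent of any particular word segmentation or tagging library.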
In one embodiment, the natural language processing unit 805 further includes a feature identifier obtaining unit 105a, a feature matching unit 105b and a marking unit 105c. The feature identifier obtaining unit 105a is configured to obtain the preset features and feature identifiers corresponding to the specific value of the item of data. The feature matching unit 105b is configured to match the standardization result corresponding to the specific value of the item of data against the preset features to obtain a matching result. The marking unit 105c is configured to mark the standardization result with the corresponding feature identifier according to the matching result.
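Marking with feature identifiers amounts to a lookup of the standardization result in a preset table. The feature names, the identifiers and the function name below are hypothetical and serve only as an illustration of the matching and marking steps.

```python
from typing import Dict, List

# Hypothetical preset features and their feature identifiers.
FEATURES = {"cyst": "F001", "nodule": "F002", "calcification": "F003"}


def mark_result(keywords: List[str], features: Dict[str, str] = FEATURES) -> Dict[str, str]:
    """Keep the keywords that match a preset feature and attach its identifier."""
    return {kw: features[kw] for kw in keywords if kw in features}
```

Keywords without a matching preset feature are dropped in this sketch; a real system might instead keep them unmarked.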
It should be noted that, as those skilled in the art will clearly understand, for the specific implementation of the above apparatus and of each unit, reference may be made to the corresponding description in the foregoing method embodiments; for convenience and brevity of description, it is not repeated here.
The above apparatus may be implemented in the form of a computer program, and the computer program can run on a computer device as shown in Figure 11.
Figure 11 is a schematic block diagram of a computer device provided by an embodiment of the present application. The device is a terminal or similar device, such as a mobile terminal, a PC terminal or an iPad. The device 110 includes a processor 112, a memory and a network interface 113 connected via a system bus 111, wherein the memory may include a non-volatile storage medium 114 and an internal memory 115.
The non-volatile storage medium 114 can store an operating system 1141 and a computer program 1142. When the computer program 1142 stored in the non-volatile storage medium is executed by the processor 112, the data normalization method described above can be implemented. The processor 112 provides computing and control capabilities and supports the operation of the entire device. The internal memory 115 provides an environment for running the computer program stored in the non-volatile storage medium; when the computer program is executed by the processor 112, it causes the processor 112 to perform the data normalization method described above. The network interface 113 is used for network communication. Those skilled in the art will understand that the structure shown in Figure 11 is only a block diagram of the part of the structure relevant to the solution of the present application and does not limit the device to which the solution is applied; a specific device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
The processor 112 is configured to run the computer program stored in the memory so as to implement the following steps:
obtaining an item of data to be normalized from a physical examination report; determining the data type corresponding to the specific value of the item of data; and standardizing the item of data according to the determined data type, wherein different data types correspond to different standardization modes.
In one embodiment, the data type includes the numeric type, the enumeration type, the simple mixed type and the complex mixed type, and when executing the step of determining the data type corresponding to the specific value of the item of data, the processor 112 specifically implements the following steps:
obtaining the specific value of the item of data and inspecting the obtained specific value; if the specific value of the item of data is a number, or a combination of a number and a unit, determining that the data type corresponding to the specific value is the numeric type; if the specific value is one of the preset enumerated values, determining that the data type corresponding to the specific value is the enumeration type; if the specific value contains both text and numbers, judging whether the number of characters of the text is less than a first preset quantity and whether the number of occurrences of numbers is less than a second preset quantity; if the number of characters of the text in the specific value is less than the first preset quantity and the number of occurrences of numbers is less than the second preset quantity, determining that the data type corresponding to the specific value is the simple mixed type; otherwise, determining that the data type corresponding to the specific value is the complex mixed type.
In one embodiment, the data type includes the numeric type, the enumeration type, the simple mixed type and the complex mixed type, and when executing the step of standardizing the item of data according to the determined data type, the processor 112 specifically implements the following steps:
if the data type corresponding to the specific value of the item of data is the numeric type, processing the specific value of the item of data in the physical examination report so as to unify the unit of the item of data; if the data type corresponding to the specific value is the enumeration type, unifying the text in the specific value, or matching and mapping the specific value to a preset numerical value; if the data type corresponding to the specific value is the simple mixed type, standardizing by means of regular expression matching; if the data type corresponding to the specific value is the complex mixed type, standardizing by means of natural language processing.
In one embodiment, when executing the step of standardizing by means of regular expression matching if the data type corresponding to the specific value of the item of data is the simple mixed type, the processor 112 specifically implements the following steps:
obtaining, according to the data item of the item of data, the preset regular expression corresponding to the specific value of the item of data; judging whether the preset regular expression matches the specific value of the item of data; if the preset regular expression matches the specific value, judging whether the specific value contains a symbol and a number; if the specific value contains a symbol and a number, extracting the feature corresponding to the specific value according to a preset format, so as to obtain the standardization result; if the specific value contains no symbol but contains a number, extracting the number from the specific value and using the extracted number as the standardization result.
In one embodiment, when executing the step of standardizing by means of natural language processing if the data type corresponding to the specific value of the item of data is the complex mixed type, the processor 112 specifically implements the following steps:
calling a recursive grouping interface to segment the text corresponding to the specific value of the item of data into sentences and group them; judging whether the data type of the segmented text belongs to the numeric type, the enumeration type or the simple mixed type; if the data type of the segmented text belongs to the numeric type, the enumeration type or the simple mixed type, standardizing it with the standardization mode corresponding to the numeric type, the enumeration type or the simple mixed type; if the data type of the segmented text does not belong to the numeric type, the enumeration type or the simple mixed type, calling a word segmentation and part-of-speech tagging interface to segment the text into words, tag their parts of speech and analyze them, so as to obtain a first result; calling a keyword extraction algorithm to compute statistics over the short sentences obtained by segmentation, so as to obtain a first frequency with which candidate keywords occur and a second frequency with which the candidate keywords occur in the multiple physical examination report files containing the item of data, extracting a group of keywords for the specific value of the item of data from the candidate keywords according to the first frequency and the second frequency, and using the extracted keywords as a second result; and using the first result and the second result as the standardization result corresponding to the specific value of the item of data.
In one embodiment, after executing the step of using the first result and the second result as the standardization result corresponding to the specific value of the item of data, the processor 112 further implements the following steps:
obtaining the preset features and feature identifiers corresponding to the specific value of the item of data; matching the standardization result corresponding to the specific value of the item of data against the preset features to obtain a matching result; and marking the standardization result with the corresponding feature identifier according to the matching result.
In one embodiment, before executing the step of calling the word segmentation and part-of-speech tagging interface to segment the segmented text into words, tag their parts of speech and analyze them so as to obtain the first result, the processor 112 further implements the following steps:
detecting whether the segmented text contains a number; if the segmented text contains a number, executing the step of calling the word segmentation and part-of-speech tagging interface to segment the segmented text into words, tag their parts of speech and analyze them so as to obtain the first result.
It should be understood that, in the embodiments of the present application, the processor 112 may be a central processing unit (Central Processing Unit, CPU), or may be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The computer program can be stored in a storage medium, and the storage medium can be a computer-readable storage medium. The computer program is executed by at least one processor in the computer system to implement the process steps of the above method embodiments.
Therefore, the present application also provides a storage medium. The storage medium may be a computer-readable storage medium. The storage medium stores a computer program which, when executed by a processor, implements the following steps:
obtaining an item of data to be normalized from a physical examination report; determining the data type corresponding to the specific value of the item of data; and standardizing the item of data according to the determined data type, wherein different data types correspond to different standardization modes.
In one embodiment, the data type includes the numeric type, the enumeration type, the simple mixed type and the complex mixed type, and when executing the step of determining the data type corresponding to the specific value of the item of data, the processor specifically implements the following steps:
obtaining the specific value of the item of data and inspecting the obtained specific value; if the specific value of the item of data is a number, or a combination of a number and a unit, determining that the data type corresponding to the specific value is the numeric type; if the specific value is one of the preset enumerated values, determining that the data type corresponding to the specific value is the enumeration type; if the specific value contains both text and numbers, judging whether the number of characters of the text is less than a first preset quantity and whether the number of occurrences of numbers is less than a second preset quantity; if the number of characters of the text in the specific value is less than the first preset quantity and the number of occurrences of numbers is less than the second preset quantity, determining that the data type corresponding to the specific value is the simple mixed type; otherwise, determining that the data type corresponding to the specific value is the complex mixed type.
In one embodiment, the data type includes the numeric type, the enumeration type, the simple mixed type and the complex mixed type, and when executing the step of standardizing the item of data according to the determined data type, the processor specifically implements the following steps:
if the data type corresponding to the specific value of the item of data is the numeric type, processing the specific value of the item of data in the physical examination report so as to unify the unit of the item of data; if the data type corresponding to the specific value is the enumeration type, unifying the text in the specific value, or matching and mapping the specific value to a preset numerical value; if the data type corresponding to the specific value is the simple mixed type, standardizing by means of regular expression matching; if the data type corresponding to the specific value is the complex mixed type, standardizing by means of natural language processing.
In one embodiment, when executing the step of standardizing by means of regular expression matching if the data type corresponding to the specific value of the item of data is the simple mixed type, the processor specifically implements the following steps:
obtaining, according to the data item of the item of data, the preset regular expression corresponding to the specific value of the item of data; judging whether the preset regular expression matches the specific value of the item of data; if the preset regular expression matches the specific value, judging whether the specific value contains a symbol and a number; if the specific value contains a symbol and a number, extracting the feature corresponding to the specific value according to a preset format, so as to obtain the standardization result; if the specific value contains no symbol but contains a number, extracting the number from the specific value and using the extracted number as the standardization result.
In one embodiment, when executing the step of standardizing by means of natural language processing if the data type corresponding to the specific value of the item of data is the complex mixed type, the processor specifically implements the following steps:
calling a recursive grouping interface to segment the text corresponding to the specific value of the item of data into sentences and group them; judging whether the data type of the segmented text belongs to the numeric type, the enumeration type or the simple mixed type; if the data type of the segmented text belongs to the numeric type, the enumeration type or the simple mixed type, standardizing it with the standardization mode corresponding to the numeric type, the enumeration type or the simple mixed type; if the data type of the segmented text does not belong to the numeric type, the enumeration type or the simple mixed type, calling a word segmentation and part-of-speech tagging interface to segment the text into words, tag their parts of speech and analyze them, so as to obtain a first result; calling a keyword extraction algorithm to compute statistics over the short sentences obtained by segmentation, so as to obtain a first frequency with which candidate keywords occur and a second frequency with which the candidate keywords occur in the multiple physical examination report files containing the item of data, extracting a group of keywords for the specific value of the item of data from the candidate keywords according to the first frequency and the second frequency, and using the extracted keywords as a second result; and using the first result and the second result as the standardization result corresponding to the specific value of the item of data.
In one embodiment, after executing the step of using the first result and the second result as the standardization result corresponding to the specific value of the item of data, the processor further implements the following steps:
obtaining the preset features and feature identifiers corresponding to the specific value of the item of data; matching the standardization result corresponding to the specific value of the item of data against the preset features to obtain a matching result; and marking the standardization result with the corresponding feature identifier according to the matching result.
In one embodiment, before executing the step of calling the word segmentation and part-of-speech tagging interface to segment the segmented text into words, tag their parts of speech and analyze them so as to obtain the first result, the processor further implements the following steps:
detecting whether the segmented text contains a number; if the segmented text contains a number, executing the step of calling the word segmentation and part-of-speech tagging interface to segment the segmented text into words, tag their parts of speech and analyze them so as to obtain the first result.
The storage medium may be a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a magnetic disk, an optical disc, or any other computer-readable storage medium capable of storing program code.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus, device and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into units is only a division by logical function, and other divisions are possible in an actual implementation. Those skilled in the art will clearly understand that, for convenience and brevity of description, for the specific working processes of the apparatus, device and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here. The above are only specific embodiments of the present application, but the protection scope of the present application is not limited thereto; any person skilled in the art can readily conceive of various equivalent modifications or substitutions within the technical scope disclosed in the present application, and these modifications or substitutions shall all fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (10)
1. A data normalization method, characterized in that the method comprises:
obtaining an item of data to be normalized from a physical examination report;
determining the data type corresponding to the specific value of the item of data;
standardizing the item of data according to the determined data type, wherein different data types correspond to different standardization modes.
2. The method according to claim 1, characterized in that the data type includes the numeric type, the enumeration type, the simple mixed type and the complex mixed type, and determining the data type corresponding to the specific value of the item of data comprises:
obtaining the specific value of the item of data and inspecting the obtained specific value;
if the specific value of the item of data is a number, or a combination of a number and a unit, determining that the data type corresponding to the specific value of the item of data is the numeric type;
if the specific value of the item of data is one of the preset enumerated values, determining that the data type corresponding to the specific value of the item of data is the enumeration type;
if the specific value of the item of data contains both text and numbers, judging whether the number of characters of the text is less than a first preset quantity and whether the number of occurrences of numbers is less than a second preset quantity;
if the number of characters of the text in the specific value of the item of data is less than the first preset quantity and the number of occurrences of numbers is less than the second preset quantity, determining that the data type corresponding to the specific value of the item of data is the simple mixed type; otherwise, determining that the data type corresponding to the specific value of the item of data is the complex mixed type.
3. The method according to claim 1, characterized in that the data type includes the numeric type, the enumeration type, the simple mixed type and the complex mixed type, and standardizing the item of data according to the determined data type comprises:
if the data type corresponding to the specific value of the item of data is the numeric type, processing the specific value of the item of data in the physical examination report so as to unify the unit of the item of data;
if the data type corresponding to the specific value of the item of data is the enumeration type, unifying the text in the specific value of the item of data, or matching and mapping the specific value of the item of data to a preset numerical value;
if the data type corresponding to the specific value of the item of data is the simple mixed type, standardizing by means of regular expression matching;
if the data type corresponding to the specific value of the item of data is the complex mixed type, standardizing by means of natural language processing.
4. The method according to claim 3, characterized in that, if the data type corresponding to the specific value of the item of data is the simple mixed type, standardizing by means of regular expression matching comprises:
obtaining, according to the data item of the item of data, the preset regular expression corresponding to the specific value of the item of data;
judging whether the preset regular expression matches the specific value of the item of data;
if the preset regular expression matches the specific value of the item of data, judging whether the specific value of the item of data contains a symbol and a number;
if the specific value of the item of data contains a symbol and a number, extracting the feature corresponding to the specific value of the item of data according to a preset format, so as to obtain the standardization result;
if the specific value of the item of data contains no symbol but contains a number, extracting the number from the specific value of the item of data and using the extracted number as the standardization result.
5. The method according to claim 3, characterized in that, if the data type corresponding to the specific value of the item of data is the complex mixed type, standardizing by means of natural language processing comprises:
calling a recursive grouping interface to segment the text corresponding to the specific value of the item of data into sentences and group them;
judging whether the data type of the segmented text belongs to the numeric type, the enumeration type or the simple mixed type;
if the data type of the segmented text belongs to the numeric type, the enumeration type or the simple mixed type, standardizing it with the standardization mode corresponding to the numeric type, the enumeration type or the simple mixed type;
if the data type of the segmented text does not belong to the numeric type, the enumeration type or the simple mixed type, calling a word segmentation and part-of-speech tagging interface to segment the text into words, tag their parts of speech and analyze them, so as to obtain a first result;
calling a keyword extraction algorithm to compute statistics over the short sentences obtained by segmentation, so as to obtain a first frequency with which candidate keywords occur and a second frequency with which the candidate keywords occur in the multiple physical examination report files containing the item of data, extracting a group of keywords for the specific value of the item of data from the candidate keywords according to the first frequency and the second frequency, and using the extracted keywords as a second result;
using the first result and the second result as the standardization result corresponding to the specific value of the item of data.
6. The method according to claim 5, characterized in that the method further comprises:
obtaining the preset features and feature identifiers corresponding to the specific value of the item of data;
matching the standardization result corresponding to the specific value of the item of data against the preset features to obtain a matching result;
marking the standardization result with the corresponding feature identifier according to the matching result.
7. The method according to claim 5, characterized in that, before calling the word segmentation and part-of-speech tagging interface to segment the segmented text into words, tag their parts of speech and analyze them so as to obtain the first result, the method further comprises:
detecting whether the segmented text contains a number;
if a number is present, executing the step of calling the word segmentation and part-of-speech tagging interface to segment the segmented text into words, tag their parts of speech and analyze them so as to obtain the first result.
8. A data normalization apparatus, characterized in that the data normalization apparatus comprises:
an obtaining unit, configured to obtain an item of data to be normalized from a physical examination report;
a type determining unit, configured to determine the data type corresponding to the specific value of the item of data;
a standardization unit, configured to standardize the item of data according to the determined data type, wherein different data types correspond to different standardization modes.
9. A computer device, characterized in that the computer device comprises a memory and a processor connected to the memory;
the memory is configured to store a computer program; the processor is configured to run the computer program stored in the memory so as to perform the method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the method according to any one of claims 1 to 7 is implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910011828.XA CN109830272B (en) | 2019-01-07 | 2019-01-07 | Data standardization method and device, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109830272A true CN109830272A (en) | 2019-05-31 |
CN109830272B CN109830272B (en) | 2022-08-30 |
Family
ID=66860174
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910011828.XA Active CN109830272B (en) | 2019-01-07 | 2019-01-07 | Data standardization method and device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109830272B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110957016A (en) * | 2019-11-21 | 2020-04-03 | 山东鲁能软件技术有限公司 | Physical examination data intelligent recognition system and method based on health cloud management platform |
CN113392939A (en) * | 2021-08-16 | 2021-09-14 | 江苏苏宁银行股份有限公司 | Industrial code standardization method and device, electronic equipment and storage medium |
CN115064237A (en) * | 2022-06-09 | 2022-09-16 | 山东浪潮智慧医疗科技有限公司 | Method for realizing standardization of hospital physical examination summary data |
WO2024067442A1 (en) * | 2022-09-27 | 2024-04-04 | 华为技术有限公司 | Data management method and related apparatus |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030105638A1 (en) * | 2001-11-27 | 2003-06-05 | Taira Rick K. | Method and system for creating computer-understandable structured medical data from natural language reports |
CN107545934A (en) * | 2017-05-11 | 2018-01-05 | 新华三大数据技术有限公司 | Method and device for extracting numerical indicators |
CN108733837A (en) * | 2018-05-28 | 2018-11-02 | 杭州依图医疗技术有限公司 | Natural language structuring method and device for medical record text |
Non-Patent Citations (1)
Title |
---|
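Lü Xudong (吕旭东): "An electronic medical record system architecture and its key technologies" (一种电子病历系统体系结构及其关键技术), Chinese Journal of Biomedical Engineering (《中国生物医学工程学报》) * |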
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110957016A (en) * | 2019-11-21 | 2020-04-03 | 山东鲁能软件技术有限公司 | Physical examination data intelligent recognition system and method based on health cloud management platform |
CN110957016B (en) * | 2019-11-21 | 2023-08-08 | 山东鲁能软件技术有限公司 | Physical examination data intelligent identification system and method based on health cloud management platform |
CN113392939A (en) * | 2021-08-16 | 2021-09-14 | 江苏苏宁银行股份有限公司 | Industrial code standardization method and device, electronic equipment and storage medium |
CN115064237A (en) * | 2022-06-09 | 2022-09-16 | 山东浪潮智慧医疗科技有限公司 | Method for realizing standardization of hospital physical examination summary data |
WO2024067442A1 (en) * | 2022-09-27 | 2024-04-04 | 华为技术有限公司 | Data management method and related apparatus |
Also Published As
Publication number | Publication date |
---|---|
CN109830272B (en) | 2022-08-30 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN109830272A (en) | Data normalization method, apparatus, computer equipment and storage medium | |
US9665565B2 (en) | Semantic similarity evaluation method, apparatus, and system | |
CN107644011B (en) | System and method for fine-grained medical entity extraction | |
US20220044812A1 (en) | Automated generation of structured patient data record | |
CN111898366B (en) | Document subject word aggregation method and device, computer equipment and readable storage medium | |
US8935155B2 (en) | Method for processing medical reports | |
US11755661B2 (en) | Text entry assistance and conversion to structured medical data | |
Friedlin et al. | A natural language processing system to extract and code concepts relating to congestive heart failure from chest radiology reports | |
CN112541066B (en) | Text-structured-based medical and technical report detection method and related equipment | |
CN112562807A (en) | Medical data analysis method, apparatus, device, storage medium, and program product | |
CN110263155B (en) | Data classification method, and training method and system of data classification model | |
CN109471950B (en) | Method for constructing structured knowledge network of abdominal ultrasonic text data | |
Litkowski | Pattern dictionary of English prepositions | |
CN108363691A (en) | Domain term recognition system and method for electric power 95598 work orders | |
CN110741376A (en) | Automatic document analysis for different natural languages | |
CN109299467B (en) | Medical text recognition method and device and sentence recognition model training method and device | |
CN111177356B (en) | Acid-base index medical big data analysis method and system | |
CN115359799A (en) | Speech recognition method, training method, device, electronic equipment and storage medium | |
CN110347805A (en) | Petroleum industry security risk key element extracting method, device, server and storage medium | |
CN114077832A (en) | Chinese text error correction method and device, electronic equipment and readable storage medium | |
CN110060749B (en) | Intelligent electronic medical record diagnosis method based on SEV-SDG-CNN | |
CN111859032A (en) | Method and device for detecting character-breaking sensitive words of short message and computer storage medium | |
WO2020229348A1 (en) | Correcting an examination report | |
Derczynski et al. | Using signals to improve automatic classification of temporal relations | |
CN114416977A (en) | Text difficulty grading evaluation method and device, equipment and storage medium |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |