CN116226362A - Word segmentation method for improving accuracy of searching hospital names - Google Patents
- Publication number
- CN116226362A (application CN202310500980.0A)
- Authority
- CN
- China
- Prior art keywords
- word segmentation
- word
- matching
- segmentation
- node
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/374—Thesaurus
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/90335—Query processing
- G06F16/90344—Query processing by using string matching techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/242—Dictionaries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a word segmentation method for improving the accuracy of searching hospital names, belonging to the technical field of hospital information. The method comprises: decomposing the characters of a text set, built from a set of common names of target hospitals, one by one to form a set of single characters; combining the characters in the set forward and backward to form word segments, matching the word segments against a dictionary in a database, and outputting the successfully matched segmentation results; and displaying the matching results in order of the matching degree of the segmentation results. The invention checks and matches the characters entered by the user one by one and resolves ambiguity in segmentation, thereby improving search accuracy and efficiency and the user experience.
Description
Technical Field
The invention belongs to the technical field of hospital information, and in particular relates to a word segmentation method for improving the accuracy of searching hospital names.
Background
With the spread of intelligent and information technologies, users can learn about the outside world without leaving home and can obtain different kinds of information by searching on Internet-connected devices, so that people's information stays synchronized. With the advent of the information age, the Internet plays a growing role in all aspects of production and daily life, and for a country whose native language is Chinese, Chinese information processing technology occupies a very important position in national informatization.
When a user searches for a hospital name, the name is usually long. If the full name is not entered, several different hospital names may appear in the search box, and several hospitals may exist in the same city, so the user cannot be sure which hospital is the right one, which degrades the user experience.
Chinese patent publication No. CN112199494A discloses a medical information searching method, apparatus, electronic device and storage medium. The method determines a medical query sentence and preprocesses it to obtain a word segmentation sequence comprising a plurality of medical words. A pre-built inverted index table is obtained and the initial text field of each medical word is determined; the medical words in the initial text fields are taken as boundary words, and target text fields are determined from the initial text fields, each target text field corresponding to one query dimension. A search library corresponding to the search request is determined according to the query dimension, and the medical words are searched in that library to obtain the search result of the request.
Chinese patent publication No. CN109543178A discloses a method and system for constructing a judicial text label system. Judicial vocabulary texts are obtained through a word segmentation tool; a primary tag system is constructed according to word frequency statistics; tags with similar semantics in the primary tag system are merged and harsh tags are expanded to obtain an expanded tag system; the accuracy with which the expanded tag system retrieves texts is measured on a text test set to verify whether construction of the current expanded tag system is complete, and if not, the tag system is further optimized.
Chinese patent publication No. CN111950283A discloses a Chinese word segmentation and named entity recognition system for large-scale medical text mining. Word vectors are obtained from word2vec and the segmented text and are fed into a stacked BiLSTM-CRF model; the first layer of the model labels entities on the word vectors; part-of-speech features are added to the labeled word vectors to form an input feature set; and the second layer of the model performs complex named entity recognition on that feature set.
The prior art has the following problems: when the target information is segmented, it is not decomposed into single characters that are then recombined, so information is missed and search and matching are not accurate enough; no homophone replacement search is performed, so tolerance for mistyped search terms is insufficient; words are not disambiguated; and segmentation based on a semantic model is computationally complex and demanding, which puts heavy computational and operational pressure on the system at Internet-scale search loads.
Disclosure of Invention
The invention aims to provide a word segmentation method for improving the accuracy of searching hospital names that overcomes the shortcomings described in the Background.
In order to achieve the above purpose, the present invention provides the following technical solutions: a word segmentation method for improving accuracy of searching for hospital names, the word segmentation method comprising the following steps:
S1, establishing a word segmentation set formed from a single-character set, based on a set of common names of target hospitals, specifically comprising the following sub-steps:
S12, decomposing the words and phrases in the common name set one by one to form a single-character set, each element of which is a single character;
S2, combining the single characters in the single-character set forward and backward to form word segments, and matching the word segments against the dictionary in the database, comprising the following sub-steps:
S21, combining all single characters of the single-character set in forward order and in reverse order to obtain the word segmentation set, which comprises a two-character phrase set, a three-character phrase set and a four-character phrase set, where each word segment consists of an initial character and the characters that follow it [formula not reproduced];
S221, if the matching succeeds, taking the matched phrase out of the word segmentation set and using the remainder as a new word segmentation set for repeated combination and matching;
S222, if the matching fails, intercepting one or more single characters from the word segmentation set in the forward or reverse direction to form a character string to be matched, and matching that string against the search field, until all phrases in the word segmentation set have been matched or the interception reaches the last character;
S3, outputting a word segmentation result which is successfully matched;
and S4, displaying the matching results in sequence according to the matching degree of the word segmentation results.
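Steps S1 to S3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `decompose`, `combine` and `match_candidates` and the sample dictionary are hypothetical, and "reverse order" is read here as sliding the same window over the reversed character sequence.

```python
def decompose(names):
    """Split every common name into its single characters, keeping their order."""
    return [list(name) for name in names]


def combine(chars, sizes=(2, 3, 4)):
    """Build 2-, 3- and 4-character candidates from adjacent characters.

    Each candidate is an initial character plus the characters that follow it.
    The forward pass runs over `chars`; the reverse pass (an assumption about
    what "reverse order" means) runs over the reversed sequence.
    """
    candidates = []
    for seq in (chars, chars[::-1]):
        for n in sizes:
            for i in range(len(seq) - n + 1):
                cand = "".join(seq[i:i + n])
                if cand not in candidates:  # keep first occurrence only
                    candidates.append(cand)
    return candidates


def match_candidates(candidates, dictionary):
    """Return the candidates found in the dictionary, in candidate order."""
    return [c for c in candidates if c in dictionary]


# Hypothetical dictionary, for illustration only.
DICTIONARY = {"北京", "协和", "医院", "协和医院"}

chars = decompose(["北京协和医院"])[0]
matched = match_candidates(combine(chars), DICTIONARY)
```

Limiting the window to two, three and four characters keeps the candidate count linear in the length of the name, which is consistent with the stated goal of avoiding a heavy semantic model.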
Further, combined text that cannot be successfully matched is segmented and its ambiguity eliminated, with the following specific steps:
S5, taking the text that cannot be successfully matched as the Chinese text Y to be segmented, and segmenting it with the forward maximum matching method, the reverse maximum matching method and an HMM, each of which yields its own segmentation result;
S6, comparing the three segmentation results and marking the parts on which they do not all agree as the ambiguous parts;
S7, determining which ambiguity case the ambiguous part belongs to and disambiguating accordingly:
S71, first case: if any two of the three segmentation results are identical, the result on which they agree is taken as the final segmentation;
S72, second case: if the three segmentation results all differ from one another, the final segmentation is selected from among them [symbol not reproduced];
when the ambiguity falls under the second case, a second disambiguation is performed on the basis of the first: the parts of speech of the three segmentation results are tagged with the HMM, the ambiguous parts that differ between the results are screened out, the segmentation that maximizes the evaluation function is obtained, and that segmentation is taken as the final segmentation.
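The first disambiguation in S5 to S7 amounts to a two-out-of-three vote. The sketch below is illustrative only: the forward and reverse maximum matching functions are textbook versions, the HMM result is assumed to be supplied by a separate segmenter and is passed in as a plain list, and the names, the `max_len=4` window and the sample dictionary are assumptions rather than details taken from the patent.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy left-to-right segmentation, preferring the longest dictionary word."""
    result, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in dictionary:  # single characters always pass
                result.append(piece)
                i += n
                break
    return result


def backward_max_match(text, dictionary, max_len=4):
    """Greedy right-to-left segmentation, preferring the longest dictionary word."""
    result, j = [], len(text)
    while j > 0:
        for n in range(min(max_len, j), 0, -1):
            piece = text[j - n:j]
            if n == 1 or piece in dictionary:
                result.append(piece)
                j -= n
                break
    return result[::-1]


def vote(fmm, bmm, hmm):
    """Return the segmentation shared by at least two methods, else None.

    None corresponds to the second case, where a further disambiguation
    step (not sketched here) has to choose between the three results.
    """
    if fmm == bmm or fmm == hmm:
        return fmm
    if bmm == hmm:
        return bmm
    return None


# Hypothetical dictionary and a classic ambiguous sentence, for illustration.
WORDS = {"研究", "研究生", "生命", "起源"}
fmm = forward_max_match("研究生命起源", WORDS)
bmm = backward_max_match("研究生命起源", WORDS)
final = vote(fmm, bmm, ["研究", "生命", "起源"])  # third list stands in for the HMM output
```

Here the forward pass greedily takes 研究生 and strands 命, while the reverse pass and the stand-in HMM output agree, so the vote returns their shared segmentation.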
Further, before the common name set is segmented, the common name set is preprocessed: Chinese and English numbers, domain names and other items with obvious features are recognized; the text of the common name set is filtered; word frequencies are counted and candidate words are selected; and Chinese and English numbers and domain names are screened out, with filtering repeated until no Chinese or English numbers or domain names remain to be selected.
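A minimal sketch of this repeated filtering is shown below. The regular expressions are assumptions chosen for illustration (the source does not give the recognition rules), only domain names and runs of Arabic digits are handled, and `preprocess` is a hypothetical name.

```python
import re

# Illustrative patterns only; domains are listed first so that digits inside
# a domain name are removed together with the domain rather than separately.
PATTERNS = {
    "domain": re.compile(r"(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}"),
    "digits": re.compile(r"[0-9]+"),
}


def preprocess(text):
    """Strip recognizable items until a full pass removes nothing more.

    Returns the filtered text and the list of (label, item) pairs removed.
    """
    found = []
    while True:
        before = text
        for label, pattern in PATTERNS.items():
            found.extend((label, m.group()) for m in pattern.finditer(text))
            text = pattern.sub("", text)
        if text == before:  # nothing left to filter
            return text.strip(), found


cleaned, removed = preprocess("北京协和医院www.example.com电话12345")
```

Returning the removed items alongside the cleaned text keeps them available for separate handling instead of discarding them.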
Further, when the search field is matched against the word segmentation set, characters are inserted into, indexed in and stored in the word segmentation set;
wherein the word segmentation set comprises an initial node, a plurality of intermediate nodes and an end node; the initial node is located at the successfully matched phrase with the smallest sequence number in the history record, the intermediate nodes are located at the phrases successfully matched each time in the history record, and the end node is located at the successfully matched phrase with the largest sequence number in the history record; each match has a path from the initial node to the end node, and each path contains a plurality of intermediate nodes;
when checking whether a word is stored in the word segmentation set, the search starts from the initial node and traverses along one branch until the last character of the word segment, which completes the query:
acquiring the first character of the search field, finding the initial node corresponding to that character, and jumping to the intermediate node of the next character to await the next query;
acquiring the second character of the character string to be queried from the intermediate node, and jumping again to the intermediate node of the next character to await the next query;
repeating this operation until the last character of the word is reached, which serves as the end node;
and reading the information of the last character's node, returning all characters on the path traversed, and completing the query.
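The character-by-character lookup described above behaves like a prefix tree (trie). The following sketch uses a standard trie as a stand-in; the class and method names are hypothetical, the history-record numbering of nodes is not modeled, and the empty root used here sits one level above what the text calls the initial node.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # next character -> child node
        self.is_end = False  # True if a stored word ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        """Store a word, creating one node per character along its path."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True

    def search(self, word):
        """Walk one branch character by character.

        Returns the characters on the path if the walk ends on a stored
        word, otherwise None.
        """
        node, path = self.root, []
        for ch in word:
            node = node.children.get(ch)
            if node is None:  # no branch for this character
                return None
            path.append(ch)
        return "".join(path) if node.is_end else None


trie = Trie()
for w in ("协和", "协和医院"):
    trie.insert(w)
```

Lookup cost grows with the length of the query rather than with the number of stored words, which suits long hospital names that share prefixes.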
Further, when a word segment still cannot be successfully matched, pinyin matching is performed on all characters in the word segment: the pinyin of each character is obtained, and its initials and finals are matched in combination against the initials and finals of the pinyin in the search field.
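The pinyin fallback can be sketched as splitting each syllable into an initial and a final and comparing the pairs. Everything named here is an assumption for illustration: the small `PINYIN` table stands in for the database lookup, tones are ignored, and `y` and `w` are treated as initials for simplicity.

```python
# Two-letter initials come first so that "zh" is not read as "z" + "h...".
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]

# Hypothetical toneless pinyin table; a real system would query a database.
PINYIN = {"协": "xie", "谐": "xie", "和": "he", "合": "he",
          "医": "yi", "院": "yuan"}


def split_pinyin(syllable):
    """Split a toneless syllable into (initial, final); the initial may be empty."""
    for ini in INITIALS:
        if syllable.startswith(ini):
            return ini, syllable[len(ini):]
    return "", syllable


def same_pinyin(word, query, pinyin_of=PINYIN):
    """True if both strings have the same length and identical initials and finals.

    Every character is assumed to be present in the pinyin table.
    """
    if len(word) != len(query):
        return False
    return all(split_pinyin(pinyin_of[a]) == split_pinyin(pinyin_of[b])
               for a, b in zip(word, query))
```

With this table, a mistyped query such as 谐合 would still match the dictionary entry 协和, because both read "xie he".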
Beneficial effects: the invention discloses a word segmentation method belonging to the technical field of hospital information, and in particular a word segmentation method for improving the accuracy of searching hospital names. The characters in the text set are combined forward and backward to form words, the words are matched against the vocabulary in a database, and the successfully matched segmentation results are output; the matching results are displayed in order of the matching degree of the segmentation results. The invention checks and matches the characters entered by the user one by one and resolves ambiguity in segmentation, thereby improving search accuracy and efficiency and the user experience.
Drawings
Fig. 1 is a schematic diagram of the operation of the present invention.
Fig. 2 is a flow chart of the operation of the present invention.
Fig. 3 is a diagram of the disambiguation step of the present invention.
FIG. 4 is a word segmentation matching flow diagram of the present invention.
FIG. 5 is a schematic diagram of word segmentation matching of the present invention.
Detailed Description
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The embodiments described are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of the invention.
A word segmentation method for improving accuracy of searching hospital names comprises the following steps:
establishing a corresponding text set from the input target text, and decomposing the characters in the text set one by one to form a set of single characters;
combining the characters in the set forward and backward to form word segments, matching the word segments against a dictionary in a database, and outputting the successfully matched segmentation results;
and displaying the matching results in sequence according to the matching degree of the word segmentation results.
In one embodiment, a common name set is established from the entered common names of the target hospitals, and the words and phrases in the common name set are decomposed one by one to form a single-character set, each element of which is a single character.
In one embodiment, combining the single characters in the single-character set forward and backward to form word segments and matching the word segments against the dictionary in the database comprises the following steps:
combining all single characters of the single-character set in forward order and in reverse order to obtain the word segmentation set, which comprises a two-character phrase set, a three-character phrase set and a four-character phrase set, where each word segment consists of an initial character and the characters that follow it [formula not reproduced];
if the matching succeeds, taking the matched phrase out of the word segmentation set and using the remainder as a new word segmentation set for repeated combination and matching;
if the matching fails, intercepting one or more single characters from the word segmentation set in the forward or reverse direction to form a character string to be matched, and matching that string against the search field, until all phrases in the word segmentation set have been matched or the interception reaches the last character;
Outputting a word segmentation result which is successfully matched;
and displaying the matching results in sequence according to the matching degree of the word segmentation results.
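The source does not define how the matching degree is computed. As one possible reading, the sketch below scores each candidate hospital name by the share of its characters covered by matched segments and sorts the names by that score. The scoring rule and the names `matching_degree` and `rank` are assumptions made for illustration.

```python
def matching_degree(segments, name):
    """Share of the name's characters covered by matched segments found in it."""
    covered = [False] * len(name)
    for seg in segments:
        start = name.find(seg)
        while start != -1:
            for k in range(start, start + len(seg)):
                covered[k] = True
            start = name.find(seg, start + 1)
    return sum(covered) / len(name) if name else 0.0


def rank(segments, names):
    """Order candidate names from highest to lowest matching degree."""
    return sorted(names, key=lambda n: matching_degree(segments, n), reverse=True)


ranked = rank(["协和", "医院"], ["北京人民医院", "北京协和医院", "协和医院"])
```

Because Python's sort is stable, names with equal scores keep their original relative order, so an upstream ordering (for example by distance) would survive ties.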
In one embodiment, combined text that cannot be successfully matched needs to be segmented to eliminate ambiguity, with the following specific steps:
taking the text that cannot be successfully matched as the Chinese text Y to be segmented, and segmenting it with the forward maximum matching method, the reverse maximum matching method and an HMM, each of which yields its own segmentation result;
comparing the three segmentation results and marking the parts on which they do not all agree as the ambiguous parts;
determining which ambiguity case the ambiguous part belongs to and disambiguating accordingly:
first case: if any two of the three segmentation results are identical, the result on which they agree is taken as the final segmentation;
second case: if the three segmentation results all differ from one another, the final segmentation is selected from among them [symbol not reproduced];
when the ambiguity falls under the second case, a second disambiguation is performed on the basis of the first: the parts of speech of the three segmentation results are tagged with the HMM, the ambiguous parts that differ between the results are screened out, the segmentation that maximizes the evaluation function is obtained, and that segmentation is taken as the final segmentation.
In one embodiment, before the common name set is segmented, the common name set needs to be preprocessed: Chinese and English numbers, domain names and other items with obvious features are recognized; the text of the common name set is filtered; word frequencies are counted and candidate words are selected; and Chinese and English numbers, domain names and the like are screened out, with screening and filtering repeated until no Chinese or English numbers or domain names remain to be selected. In this way domain names can be distinguished, which improves accuracy and recognition efficiency.
In one embodiment, when the search field is matched against the word segmentation set, characters are inserted into, indexed in and stored in the word segmentation set;
wherein the word segmentation set comprises an initial node, a plurality of intermediate nodes and an end node; the initial node is located at the successfully matched phrase with the smallest sequence number in the history record, the intermediate nodes are located at the phrases successfully matched each time in the history record, and the end node is located at the successfully matched phrase with the largest sequence number in the history record; each match has a path from the initial node to the end node, and each path contains a plurality of intermediate nodes;
when checking whether a word is stored in the word segmentation set, the search starts from the initial node and traverses along one branch until the last character of the word segment, which completes the query:
acquiring the first character of the search field, finding the initial node corresponding to that character, and jumping to the intermediate node of the next character to await the next query;
acquiring the second character of the character string to be queried from the intermediate node, and jumping again to the intermediate node of the next character to await the next query;
repeating this operation until the last character of the word is reached, which serves as the end node;
and reading the information of the last character's node, returning all characters on the path traversed, and completing the query.
In one embodiment, when a word segment still cannot be successfully matched, pinyin matching is performed on all characters in the word segment: the database is used to look up the pinyin of each character in the text set so that characters with the same pinyin are matched. Specifically, the pinyin of each character is obtained, and its initials and finals are matched in combination against the initials and finals of the pinyin in the search field.
It is apparent that the above examples are given only for clarity of illustration and do not limit the embodiments. Those of ordinary skill in the art can make other variations or modifications on the basis of the above description. It is neither necessary nor possible to enumerate all embodiments here. Obvious variations or modifications derived from them remain within the scope of the invention.
Claims (6)
1. A word segmentation method for improving the accuracy of searching hospital names, characterized by comprising the following steps:
S1, establishing a word segmentation set formed from a single-character set, based on a set of common names of target hospitals, specifically comprising the following sub-steps:
S12, decomposing the words and phrases in the common name set one by one to form a single-character set, each element of which is a single character;
S2, combining the single characters in the single-character set forward and backward to form word segments, and matching the word segments against the dictionary in the database, comprising the following sub-steps:
S21, combining all single characters of the single-character set in forward order and in reverse order to obtain the word segmentation set, which comprises a two-character phrase set, a three-character phrase set and a four-character phrase set, where each word segment consists of an initial character and the characters that follow it [formula not reproduced];
S221, if the matching succeeds, taking the matched phrase out of the word segmentation set and using the remainder as a new word segmentation set for repeated combination and matching;
S222, if the matching fails, intercepting one or more single characters from the word segmentation set in the forward or reverse direction to form a character string to be matched, and matching that string against the search field, until all phrases in the word segmentation set have been matched or the interception reaches the last character;
S3, outputting a word segmentation result which is successfully matched;
and S4, displaying the matching results in sequence according to the matching degree of the word segmentation results.
2. The word segmentation method for improving the accuracy of searching hospital names according to claim 1, characterized in that combined text that cannot be successfully matched is segmented so as to eliminate ambiguity, with the following specific steps:
S5, taking the text that cannot be successfully matched as the Chinese text Y to be segmented, and segmenting it with the forward maximum matching method, the reverse maximum matching method and an HMM, each of which yields its own segmentation result;
S6, comparing the three segmentation results and marking the parts on which they do not all agree as the ambiguous parts;
S7, determining which ambiguity case the ambiguous part belongs to and disambiguating accordingly:
S71, first case: if any two of the three segmentation results are identical, the result on which they agree is taken as the final segmentation;
S72, second case: if the three segmentation results all differ from one another, the final segmentation is selected from among them [symbol not reproduced];
when the ambiguity falls under the second case, a second disambiguation is performed on the basis of the first: the parts of speech of the three segmentation results are tagged with the HMM, the ambiguous parts that differ between the results are screened out, the segmentation that maximizes the evaluation function is obtained, and that segmentation is taken as the final segmentation.
3. The word segmentation method for improving the accuracy of searching hospital names according to claim 2, characterized in that, before the common name set is segmented, the common name set is preprocessed: Chinese and English numbers, domain names and other items with obvious features are recognized; the text of the common name set is filtered; word frequencies are counted and candidate words are selected; and Chinese and English numbers and domain names are screened out, with filtering repeated until no Chinese or English numbers or domain names remain to be selected.
4. The word segmentation method for improving the accuracy of searching hospital names according to claim 3, characterized in that, when the search field is matched against the word segmentation set, characters are inserted into, indexed in and stored in the word segmentation set;
wherein the word segmentation set comprises an initial node, a plurality of intermediate nodes and an end node; the initial node is located at the successfully matched phrase with the smallest sequence number in the history record, the intermediate nodes are located at the phrases successfully matched each time in the history record, and the end node is located at the successfully matched phrase with the largest sequence number in the history record; each match has a path from the initial node to the end node, and each path contains a plurality of intermediate nodes.
5. The word segmentation method for improving the accuracy of searching hospital names according to claim 4, characterized in that the word segmentation set is matched as follows:
acquiring the first character of the search field, finding the initial node corresponding to that character, and jumping to the intermediate node of the next character to await the next query;
acquiring the second character of the character string to be queried from the intermediate node, and jumping again to the intermediate node of the next character to await the next query;
repeating this operation until the last character of the word is reached, which serves as the end node;
and reading the information of the last character's node, returning all characters on the path traversed, and completing the query.
6. The word segmentation method for improving the accuracy of searching hospital names according to claim 5, characterized in that, when a word segment still cannot be successfully matched, pinyin matching is performed on all characters in the word segment: the pinyin of each character is obtained, and its initials and finals are matched in combination against the initials and finals of the pinyin in the search field.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310500980.0A CN116226362B (en) | 2023-05-06 | 2023-05-06 | Word segmentation method for improving accuracy of searching hospital names |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310500980.0A CN116226362B (en) | 2023-05-06 | 2023-05-06 | Word segmentation method for improving accuracy of searching hospital names |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116226362A (en) | 2023-06-06 |
CN116226362B (en) | 2023-07-18 |
Family
ID=86571606
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310500980.0A Active CN116226362B (en) | 2023-05-06 | 2023-05-06 | Word segmentation method for improving accuracy of searching hospital names |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116226362B (en) |
Citations (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2000004459A1 (en) * | 1998-07-15 | 2000-01-27 | Microsoft Corporation | Proper name identification in chinese |
JP2000200291A (en) * | 1998-12-29 | 2000-07-18 | Xerox Corp | Method for automatically detecting selected character string in text |
JP2001043221A (en) * | 1999-07-29 | 2001-02-16 | Matsushita Electric Ind Co Ltd | Chinese word dividing device |
CN101071420A (en) * | 2007-06-22 | 2007-11-14 | 腾讯科技(深圳)有限公司 | Method and system for cutting index participle |
CN101655841A (en) * | 2009-09-28 | 2010-02-24 | 章森 | Recursion method for word omni-segmentation of Chinese text |
CN101882163A (en) * | 2010-06-30 | 2010-11-10 | 中国科学院地理科学与资源研究所 | Fuzzy Chinese address geographic evaluation method based on matching rule |
AU2013219188A1 (en) * | 2007-01-04 | 2013-09-12 | Thinking Solutions Pty Ltd | Linguistic Analysis |
CN103678684A (en) * | 2013-12-25 | 2014-03-26 | 沈阳美行科技有限公司 | Chinese word segmentation method based on navigation information retrieval |
CN107918604A (en) * | 2017-11-13 | 2018-04-17 | 彩讯科技股份有限公司 | A kind of Chinese segmenting method and device |
CN108538395A (en) * | 2018-04-02 | 2018-09-14 | 上海市儿童医院 | A kind of construction method of general medical disease that calls for specialized treatment data system |
WO2018201600A1 (en) * | 2017-05-05 | 2018-11-08 | 平安科技(深圳)有限公司 | Information mining method and system, electronic device and readable storage medium |
JP2018206261A (en) * | 2017-06-08 | 2018-12-27 | 日本電信電話株式会社 | Word division estimation model learning device, word division device, method and program |
CN109753516A (en) * | 2019-01-31 | 2019-05-14 | 北京嘉和美康信息技术有限公司 | A kind of sort method and relevant apparatus of case history search result |
CN110287488A (en) * | 2019-06-18 | 2019-09-27 | 上海晏鼠计算机技术股份有限公司 | A kind of Chinese text segmenting method based on big data and Chinese feature |
CN112988753A (en) * | 2021-03-31 | 2021-06-18 | 建信金融科技有限责任公司 | Data searching method and device |
CN113065350A (en) * | 2021-04-13 | 2021-07-02 | 哈尔滨理工大学 | Biomedical text word sense disambiguation method based on attention neural network |
WO2021135910A1 (en) * | 2020-06-24 | 2021-07-08 | 平安科技(深圳)有限公司 | Machine reading comprehension-based information extraction method and related device |
CN113392189A (en) * | 2021-08-17 | 2021-09-14 | 东华理工大学南昌校区 | News text processing method based on automatic word segmentation |
CN114154494A (en) * | 2021-11-24 | 2022-03-08 | 南方电网数字电网研究院有限公司 | Disambiguation word segmentation method, system, device and storage medium |
US11520989B1 (en) * | 2018-05-17 | 2022-12-06 | Workday, Inc. | Natural language processing with keywords |
Non-Patent Citations (1)
Title |
---|
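Tang Tao (唐涛): "Research on Chinese Word Segmentation Technology for Specific Domains" (面向特定领域的中文分词技术的研究), China Excellent Master's Theses Electronic Journal Network (中国优秀硕士论文电子期刊网), pages 1-56 * |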
Also Published As
Publication number | Publication date |
---|---|
CN116226362B (en) | 2023-07-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11475209B2 (en) | Device, system, and method for extracting named entities from sectioned documents | |
CN105718586B (en) | The method and device of participle | |
US8447588B2 (en) | Region-matching transducers for natural language processing | |
US9195646B2 (en) | Training data generation apparatus, characteristic expression extraction system, training data generation method, and computer-readable storage medium | |
US8266169B2 (en) | Complex queries for corpus indexing and search | |
US5794177A (en) | Method and apparatus for morphological analysis and generation of natural language text | |
Kumar et al. | Part of speech taggers for morphologically rich indian languages: a survey | |
US8510097B2 (en) | Region-matching transducers for text-characterization | |
CN112035730B (en) | Semantic retrieval method and device and electronic equipment | |
WO2008107305A2 (en) | Search-based word segmentation method and device for language without word boundary tag | |
Zhikov et al. | An efficient algorithm for unsupervised word segmentation with branching entropy and MDL | |
Bellare et al. | Learning extractors from unlabeled text using relevant databases | |
CN112417891B (en) | Text relation automatic labeling method based on open type information extraction | |
CN106383814A (en) | Word segmentation method of English social media short text | |
CN109213998A (en) | Chinese wrongly written character detection method and system | |
CN112447172B (en) | Quality improvement method and device for voice recognition text | |
Shafi et al. | UNLT: Urdu natural language toolkit | |
CN114298010A (en) | Text generation method integrating dual-language model and sentence detection | |
CN116628173B (en) | Intelligent customer service information generation system and method based on keyword extraction | |
CN112765977A (en) | Word segmentation method and device based on cross-language data enhancement | |
Hirpassa | Information extraction system for Amharic text | |
CN116226362B (en) | Word segmentation method for improving accuracy of searching hospital names | |
CN115983233A (en) | Electronic medical record duplication rate estimation method based on data stream matching | |
CN115858733A (en) | Cross-language entity word retrieval method, device, equipment and storage medium | |
CN115618883A (en) | Business semantic recognition method and device | |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |