WO2023165122A1 - Matching method, device, equipment and storage medium for inquiry templates - Google Patents

Matching method, device, equipment and storage medium for inquiry templates

Info

Publication number
WO2023165122A1
WO2023165122A1 PCT/CN2022/121720 CN2022121720W WO2023165122A1 WO 2023165122 A1 WO2023165122 A1 WO 2023165122A1 CN 2022121720 W CN2022121720 W CN 2022121720W WO 2023165122 A1 WO2023165122 A1 WO 2023165122A1
Authority
WO
WIPO (PCT)
Prior art keywords
template
expression
sample set
text words
occurrence
Prior art date
Application number
PCT/CN2022/121720
Other languages
English (en)
French (fr)
Inventor
赵建双
Original Assignee
康键信息技术(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 康键信息技术(深圳)有限公司
Publication of WO2023165122A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H80/00 ICT specially adapted for facilitating communication between medical practitioners or patients, e.g. for collaborative diagnosis, therapy or health monitoring
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Definitions

  • The present application relates to the technical field of artificial intelligence, and in particular to a matching method, device, equipment and storage medium for an inquiry template.
  • This application provides a matching method, device, equipment and storage medium for an inquiry template, so that operators can clearly understand the classification rules of each inquiry template and the classification process is easy to control.
  • When a fault or business change occurs, the classification rules can be modified promptly, which helps operators resolve faults and adapt to business changes faster.
  • The first aspect of this application provides a method for matching consultation templates, including: obtaining the chief complaint information provided by the user, where the chief complaint information includes the user's physical condition information and discomfort symptom information; matching the chief complaint information against the regular expression of each consultation template, and determining the consultation template whose regular expression matches successfully as the target consultation template; and providing the target consultation template to the user, where the target consultation template is used by the user to fill in symptom presentation information.
  • The regular expression of each consultation template is obtained as follows: obtain the sample set corresponding to the consultation template; obtain an initial expression based on the sample attribute information of the sample set; and update the initial expression and the sample set based on a preset update algorithm until the first specified condition is met, yielding the regular expression of the consultation template.
  • The second aspect of the present application provides a matching device for consultation templates.
  • The device includes: an information acquisition module, used to obtain the chief complaint information provided by the user, where the chief complaint information includes the user's physical condition information and discomfort symptom information; a matching module, used to match the chief complaint information against the regular expression of each consultation template and determine the consultation template whose regular expression matches successfully as the target consultation template; and a target providing module, used to provide the target consultation template to the user.
  • The target consultation template is used by the user to fill in symptom presentation information. The regular expression of each consultation template is obtained as follows: obtain the sample set corresponding to the consultation template; obtain an initial expression based on the sample attribute information of the sample set; and update the initial expression and the sample set based on a preset update algorithm until the first specified condition is met, yielding the regular expression of the consultation template.
  • The third aspect of the present application provides a matching device for consultation templates.
  • The matching device includes a memory and at least one processor. Instructions are stored in the memory, and the at least one processor invokes the instructions in the memory so that the matching device performs the following steps of the consultation template matching method.
  • Obtain the chief complaint information provided by the user, where the chief complaint information includes the user's physical condition information and discomfort symptom information; match the chief complaint information against the regular expression of each consultation template, and determine the consultation template whose regular expression matches successfully as the target consultation template; provide the target consultation template to the user, where the target consultation template is used by the user to fill in symptom presentation information. The regular expression of each consultation template is obtained as follows: obtain the sample set corresponding to the consultation template; obtain an initial expression based on the sample attribute information of the sample set; and update the initial expression and the sample set based on a preset update algorithm until the first specified condition is met, yielding the regular expression of the consultation template.
  • The fourth aspect of the present application provides a computer-readable storage medium on which instructions are stored. When the instructions are executed by a processor, the following steps of the consultation template matching method are implemented: obtain the chief complaint information provided by the user, where the chief complaint information includes the user's physical condition information and discomfort symptom information; match the chief complaint information against the regular expression of each consultation template, and determine the consultation template whose regular expression matches successfully as the target consultation template; provide the target consultation template to the user, where the target consultation template is used by the user to fill in symptom presentation information. The regular expression of each consultation template is obtained as follows: obtain the sample set corresponding to the consultation template; obtain an initial expression based on the sample attribute information of the sample set; and update the initial expression and the sample set based on a preset update algorithm until the first specified condition is met, yielding the regular expression of the consultation template.
  • In the technical solution provided by this application, the chief complaint information provided by the user is obtained, where the chief complaint information includes the user's physical condition information and discomfort symptom information; the chief complaint information is matched against the regular expression of each consultation template, and the consultation template whose regular expression matches successfully is determined as the target consultation template.
  • The target consultation template is provided to the user and is used by the user to fill in symptom presentation information. The regular expression of each consultation template is obtained as follows: obtain the sample set corresponding to the consultation template; obtain an initial expression based on the sample attribute information of the sample set; and update the initial expression and the sample set based on a preset update algorithm until the first specified condition is met, yielding the regular expression of the consultation template.
  • In this approach, regular expressions are used to match consultation templates for users.
  • The regular expression of each consultation template is highly readable, so operators can clearly understand the classification rules of each consultation template and the classification process is easy to control. When a fault or business change occurs, the classification rules can be modified promptly, which helps operators resolve faults and adapt to business changes faster.
  • Fig. 1 is a schematic diagram of an embodiment of a matching method of an inquiry template in the embodiment of the present application
  • FIG. 2 is a schematic diagram of another embodiment of the matching method of the inquiry template in the embodiment of the present application.
  • FIG. 3 is a schematic diagram of an embodiment of a matching device for an inquiry template in the embodiment of the present application
  • Fig. 4 is a schematic diagram of an embodiment of a device for matching a medical inquiry template in the embodiment of the present application.
  • The embodiment of the present application provides a matching method, device, equipment and storage medium for an inquiry template, which are used to extract and analyze data information of different types of forms without customizing the identification module for different form formats, reducing the extraction cost while ensuring accuracy.
  • The embodiments of the present application may acquire and process relevant data based on artificial intelligence (AI) technology.
  • Artificial intelligence is the theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
  • Artificial intelligence basic technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision technology, robotics technology, biometrics technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
  • The present application provides a matching method, device, equipment and storage medium for an inquiry template, which match an inquiry template for a user through regular expressions. The regular expression of each inquiry template is highly readable, so operators can clearly understand the classification rules of each inquiry template and the classification process is easy to control. When a fault or business change occurs, the classification rules can be modified promptly, which helps operators resolve faults and adapt to business changes faster.
  • An embodiment of a method for matching a medical inquiry template in the embodiment of the present application includes:
  • Obtain the chief complaint information provided by the user; the chief complaint information includes the user's physical condition information and discomfort symptom information.
  • The user can submit the chief complaint information online through a terminal device.
  • The chief complaint information mainly covers the user's current physical condition, discomfort symptoms, and the like.
  • The chief complaint information is usually text. The user can also provide voice input, from which text is extracted to obtain the chief complaint information in text form.
  • For example, the chief complaint information may be: left foot tendonitis, a small amount of effusion in the left ankle joint cavity.
  • The regular expression of each consultation template is obtained as follows: obtain the sample set corresponding to the consultation template; obtain an initial expression based on the sample attribute information of the sample set; and update the initial expression and the sample set based on a preset update algorithm until the first specified condition is met, yielding the regular expression of the consultation template.
  • The above sample set may include positive samples and negative samples. A positive sample matches the inquiry template; a negative sample does not match the inquiry template but matches other inquiry templates.
  • the above initial expression can be obtained based on the sample attribute information of the positive sample.
  • the sample attribute information can be the weight of some keywords in the positive sample, or the positional relationship between keywords and other information.
  • the obtained initial expression can match at least a part of the positive samples.
  • The initial expression is then deformed, mutated, and so on, while the sample set is updated, so that the expression can match more positive samples.
  • At the same time, the expression should as far as possible match none of the negative samples, or match them at a rate below a certain threshold. In other words, the final regular expression should discriminate clearly between positive samples and negative samples.
  • the target medical inquiry template is used for the user to fill in the symptom presentation information.
  • the target inquiry template here usually matches the user's chief complaint information.
  • the target consultation template is used for the user to fill in the symptoms information, which can describe the user's symptoms more specifically and in detail than the above-mentioned main complaint information, which is helpful for doctors to diagnose the user's disease based on the symptoms information and provide a treatment plan.
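  • The matching step described above can be sketched in a few lines of Python. This is only an illustration: the template names come from the examples in this document, but the English patterns, the `TEMPLATE_PATTERNS` table and the `match_template` function are assumptions made for the sketch, not the expressions the method would actually learn.

```python
import re
from typing import Optional

# Hypothetical template table: patterns are hand-written for illustration only.
TEMPLATE_PATTERNS = {
    "Leg Pain Consultation": re.compile(r"(foot|ankle|leg|knee).*(pain|tendonitis|effusion)"),
    "Adult Diarrhea Consultation": re.compile(r"(diarrhea|enteritis)"),
}

def match_template(chief_complaint: str) -> Optional[str]:
    """Return the name of the first template whose regular expression matches."""
    text = chief_complaint.lower()
    for name, pattern in TEMPLATE_PATTERNS.items():
        if pattern.search(text):
            return name
    return None  # no template matched

target = match_template(
    "Left foot tendonitis, a small amount of effusion in the left ankle joint cavity"
)
```

  • Because each rule is a plain regular expression, an operator can read the table directly to see why a chief complaint was routed to a given template.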
  • In the above matching method, the chief complaint information provided by the user is obtained, where the chief complaint information includes the user's physical condition information and discomfort symptom information; the chief complaint information is matched against the regular expression of each consultation template, and the consultation template whose regular expression matches successfully is determined as the target consultation template.
  • The target consultation template is provided to the user and is used by the user to fill in symptom presentation information. The regular expression of each consultation template is obtained as follows: obtain the sample set corresponding to the consultation template; obtain an initial expression based on the sample attribute information of the sample set; and update the initial expression and the sample set based on a preset update algorithm until the first specified condition is met, yielding the regular expression of the consultation template.
  • In this approach, regular expressions are used to match consultation templates for users.
  • The regular expression of each consultation template is highly readable, so operators can clearly understand the classification rules of each consultation template and the classification process is easy to control. When a fault or business change occurs, the classification rules can be modified promptly, which helps operators resolve faults and adapt to business changes faster.
  • an embodiment of obtaining the regular expression of each consultation template includes:
  • The above training corpus includes multiple training corpus entries, each in the format: chief complaint information + template name + department.
  • For example, the chief complaint information is "left foot tendonitis, a small amount of effusion in the left ankle joint cavity", the template name is "Leg Pain Consultation", and the department is "Orthopedics".
  • The template name may also include a broad template category, for example, Comprehensive or External.
  • In that case, the template name "Leg Pain Consultation" above is written as "Leg Pain Consultation-Comprehensive".
  • As another example, the chief complaint information is "Enteritis and diarrhea for more than 20 days, no improvement with medication", the template name is "Adult Diarrhea Consultation-External", and the department is "Gastroenterology".
  • Target templates belonging to the same department as the consultation template are obtained from the training corpus, and the training corpus entries belonging to target templates other than the consultation template are used as negative samples.
  • For example, for the leg pain consultation template, the training corpus entries belonging to that template are positive samples. The entries corresponding to other consultation templates are negative samples; or, among the entries whose department is orthopedics, those that do not belong to the leg pain consultation template are negative samples.
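  • The positive/negative split described above can be sketched as follows, assuming each training corpus entry is a (chief complaint, template name, department) triple. The `Corpus` type, the `build_sample_set` function and the "Neck Pain Consultation-Comprehensive" entry are hypothetical and serve only to illustrate the same-department variant.

```python
from typing import List, NamedTuple, Tuple

class Corpus(NamedTuple):
    chief_complaint: str
    template_name: str
    department: str

def build_sample_set(
    corpora: List[Corpus], template_name: str, department: str
) -> Tuple[List[Corpus], List[Corpus]]:
    """Split the training corpus into positive and negative samples for one template."""
    # Positive samples: entries labelled with this template.
    positives = [c for c in corpora if c.template_name == template_name]
    # Negative samples: entries from the same department that belong to other templates.
    negatives = [
        c for c in corpora
        if c.department == department and c.template_name != template_name
    ]
    return positives, negatives

corpora = [
    Corpus("left foot tendonitis, effusion in the left ankle joint cavity",
           "Leg Pain Consultation-Comprehensive", "Orthopedics"),
    Corpus("stiff neck for three days",  # hypothetical entry
           "Neck Pain Consultation-Comprehensive", "Orthopedics"),
    Corpus("enteritis and diarrhea for more than 20 days",
           "Adult Diarrhea Consultation-External", "Gastroenterology"),
]
positives, negatives = build_sample_set(
    corpora, "Leg Pain Consultation-Comprehensive", "Orthopedics"
)
```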
  • the above-mentioned step of determining the weight of the text word based on the meaning of the text word and the frequency of occurrence of the text word in the sample set can be realized by the following sub-steps:
  • The weight of each text word is obtained through the statistical method of term frequency and inverse document frequency.
  • Term frequency and inverse document frequency is also called TF-IDF (Term Frequency-Inverse Document Frequency).
  • With TF-IDF, the importance of a text word to a training corpus entry can be evaluated. The importance of a text word increases in proportion to the number of times it appears in that entry, and decreases as the word appears more frequently across the sample set.
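  • A minimal TF-IDF sketch follows. It uses one common textbook variant (relative term frequency multiplied by log(N / document frequency)); the document does not state which variant is used, so the exact formula and the `tf_idf` function name are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List

def tf_idf(documents: List[List[str]]) -> List[Dict[str, float]]:
    """documents: one list of text words per training corpus entry.
    Returns one {word: weight} dictionary per entry."""
    n_docs = len(documents)
    # Document frequency: the number of entries in which each word appears.
    doc_freq = Counter(word for doc in documents for word in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            # Term frequency within the entry times inverse document frequency.
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights

weights = tf_idf([["leg", "pain"], ["leg", "swelling"]])
```

  • In this toy input, "leg" appears in every entry and therefore receives weight 0, while "pain" and "swelling" receive positive weights.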
  • The co-occurrence parameters include co-occurrence frequency, average distance, minimum distance and maximum distance.
  • The co-occurrence frequency can be understood as the frequency with which two text words appear in the same training corpus entry.
  • The average distance is obtained by calculating the distance between the two text words in each training corpus entry where both appear, which can be measured in number of characters, and then averaging these distances over the entries.
  • The minimum distance is the smallest of these per-entry distances between the two text words, and the maximum distance is the largest.
  • Based on the co-occurrence parameters between any two text words, a co-occurrence matrix between the multiple text words is generated. In the co-occurrence matrix, each matrix position holds the co-occurrence parameters between the two text words corresponding to that position.
  • The association relationship between the occurrence positions of any two text words is given by the co-occurrence parameters between them. The association relationship between the occurrence positions of multiple text words comprises the co-occurrence parameters between every pair of text words. The co-occurrence matrix therefore captures the association relationship between the occurrence positions of the multiple text words.
  • The association relationship between the occurrence positions of two text words can be understood as the distance between the two text words in a text, the probability that the two text words appear in the same text, and so on.
  • The association relationship between occurrence positions can be obtained through co-occurrence statistics.
  • The co-occurrence matrix can also be a three-dimensional matrix: the horizontal axis lists the text words in sequence, the vertical axis lists the same text words in sequence, and the depth axis holds the parameter vector composed of the above co-occurrence parameters.
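  • The co-occurrence statistics can be sketched as below. Several details are assumptions of this sketch: the distance is taken between the first occurrences of each word and measured in characters, the co-occurrence frequency is stored as a raw count, and the matrix is represented sparsely as a dictionary keyed by word pair rather than as a dense three-dimensional array. The `cooccurrence` function name is hypothetical.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, List, Tuple

def cooccurrence(texts: List[str], words: List[str]) -> Dict[Tuple[str, str], dict]:
    """Compute co-occurrence parameters for every pair of text words."""
    matrix = {}
    for a, b in combinations(sorted(words), 2):
        # Character distance between the first occurrences, per text containing both words.
        distances = [abs(t.find(a) - t.find(b)) for t in texts if a in t and b in t]
        if distances:
            matrix[(a, b)] = {
                "freq": len(distances),   # number of texts containing both words
                "avg": mean(distances),
                "min": min(distances),
                "max": max(distances),
            }
    return matrix

matrix = cooccurrence(
    ["left foot pain", "pain in foot", "foot swelling"],
    ["foot", "pain"],
)
```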
  • the initial expression may be generated according to the text words with higher weight among the multiple text words and the text words with high co-occurrence frequency and small co-occurrence distance in the co-occurrence matrix.
  • Text words with higher weights have a higher matching degree with the inquiry template.
  • Likewise, text words with a higher co-occurrence frequency and a smaller co-occurrence distance in the co-occurrence matrix have a higher matching degree with the inquiry template.
  • This initial expression is a regular expression. Most of the positive samples in the above sample set match the initial expression, but some do not. To obtain the most suitable regular expression for the inquiry template, the initial expression needs to be adjusted and filtered through the following steps.
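  • One possible way to assemble an initial expression from the word weights and the co-occurrence parameters is sketched below. The document does not specify the construction, so everything here is an assumption: the `initial_expression` function, the `top_k` and `max_gap` parameters, and the choice to join high-weight words by alternation and to express close word pairs as a bounded-gap pattern in one fixed order.

```python
import re
from typing import Dict, Tuple

def initial_expression(
    weights: Dict[str, float],
    pairs: Dict[Tuple[str, str], dict],
    top_k: int = 2,
    max_gap: int = 10,
) -> str:
    """Build a seed regular expression from word weights and co-occurrence parameters."""
    # Keep the top_k highest-weight words as plain alternatives.
    top_words = sorted(weights, key=weights.get, reverse=True)[:top_k]
    parts = [re.escape(w) for w in top_words]
    # Add word pairs that always occur close together as "a, short gap, b".
    for (a, b), params in pairs.items():
        if params["max"] <= max_gap:
            parts.append(f"{re.escape(a)}.{{0,{params['max']}}}{re.escape(b)}")
    return "|".join(parts)

expr = initial_expression(
    {"tendonitis": 0.9, "effusion": 0.7, "left": 0.1},
    {("ankle", "effusion"): {"freq": 3, "avg": 4, "min": 2, "max": 8}},
)
```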
  • Ways of crossover variation may include word addition, word replacement, word deletion, negative addition, negative deletion, negative substitution, cross breeding, and the like.
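  • The crossover-mutation operators listed above can be sketched on a simplified expression model. The model is an assumption of this sketch: an expression is a set of alternative keywords plus a set of negated keywords (rendered as a negative lookahead), and the `Expr`, `mutate` and `crossover` names are hypothetical. Negative substitution is omitted for brevity.

```python
import random
import re
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Expr:
    words: Tuple[str, ...]            # alternatives, at least one must appear
    negatives: Tuple[str, ...] = ()   # words that must not appear

    def to_regex(self) -> str:
        neg = ""
        if self.negatives:
            neg = f"(?!.*(?:{'|'.join(map(re.escape, self.negatives))}))"
        return f"^{neg}.*(?:{'|'.join(map(re.escape, self.words))})"

def mutate(expr: Expr, vocab: List[str], rng: random.Random) -> Expr:
    """Apply one randomly chosen mutation operator."""
    op = rng.choice(["add_word", "delete_word", "replace_word",
                     "add_negative", "delete_negative"])
    words, negs = list(expr.words), list(expr.negatives)
    if op == "add_word":
        words.append(rng.choice(vocab))
    elif op == "delete_word" and len(words) > 1:   # never delete the last word
        words.remove(rng.choice(words))
    elif op == "replace_word":
        words[rng.randrange(len(words))] = rng.choice(vocab)
    elif op == "add_negative":
        negs.append(rng.choice(vocab))
    elif op == "delete_negative" and negs:
        negs.remove(rng.choice(negs))
    # dict.fromkeys removes duplicates while preserving order.
    return Expr(tuple(dict.fromkeys(words)), tuple(dict.fromkeys(negs)))

def crossover(a: Expr, b: Expr, rng: random.Random) -> Expr:
    """Combine a prefix of a's words with a suffix of b's words."""
    cut_a = rng.randint(1, len(a.words))
    cut_b = rng.randint(0, len(b.words))
    return Expr(tuple(dict.fromkeys(a.words[:cut_a] + b.words[cut_b:])), a.negatives)
```

  • A mutated expression may be worse than its parent (for example, a word may end up both required and negated); the screening step is what discards such candidates.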
  • the optimal expression can be screened through various conditions such as accuracy rate, recall rate, and the logic of words.
  • the above-mentioned step of selecting the optimal expression from multiple variant expressions based on preset conditions can be realized through the principle of genetic algorithm, specifically through the following sub-steps:
  • The matching relationship includes: the first matching rate, between the variant expression and the positive samples in the sample set, and the second matching rate, between the variant expression and the negative samples in the sample set.
  • The higher the first matching rate and the lower the second matching rate, the better the variant expression matches the inquiry template. Such a variant expression can identify the chief complaint information that matches the inquiry template, so that the inquiry template can be recommended to the patient who provided that chief complaint information.
  • A matching rate threshold can be set for the first matching rate and another for the second matching rate. Only when both matching rates meet their corresponding thresholds is the matching relationship determined to satisfy the second specified condition.
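  • The two matching rates and the threshold check can be sketched as follows. The function names and the default thresholds (`min_first`, `max_second`) are illustrative assumptions; the document does not give concrete threshold values.

```python
import re
from typing import List, Tuple

def match_rates(pattern: str, positives: List[str], negatives: List[str]) -> Tuple[float, float]:
    """Return (first matching rate on positives, second matching rate on negatives)."""
    rx = re.compile(pattern)
    first = sum(bool(rx.search(s)) for s in positives) / len(positives) if positives else 0.0
    second = sum(bool(rx.search(s)) for s in negatives) / len(negatives) if negatives else 0.0
    return first, second

def satisfies_second_condition(
    pattern: str,
    positives: List[str],
    negatives: List[str],
    min_first: float = 0.8,    # assumed threshold for the first matching rate
    max_second: float = 0.1,   # assumed threshold for the second matching rate
) -> bool:
    first, second = match_rates(pattern, positives, negatives)
    return first >= min_first and second <= max_second

positives = ["leg pain", "ankle pain", "knee swelling"]
negatives = ["neck stiffness", "back pain"]
good = satisfies_second_condition("leg|ankle|knee", positives, negatives)
bad = satisfies_second_condition("pain", positives, negatives)
```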
  • In addition to performing cross-mutation on the initial expression, cross-mutation can also be performed on the current variant expressions to obtain more diverse variant expressions.
  • The first specified condition includes: the number of loops reaches a count threshold, or the proportion of positive samples in the sample set satisfies a preset proportion threshold.
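  • The outer update loop and its stopping rule can be sketched as below. The sketch makes several assumptions: the proportion condition is read as "the fraction of positive samples still unmatched falls to a threshold", the per-round expressions are joined by alternation to form the final regular expression, and `propose` is a hypothetical callback standing in for the initial-expression, cross-mutation and screening steps.

```python
import re
from typing import Callable, List, Optional

def learn_expression(
    positives: List[str],
    negatives: List[str],
    propose: Callable[[List[str], List[str]], Optional[str]],
    max_rounds: int = 10,               # assumed loop-count threshold
    min_remaining_ratio: float = 0.05,  # assumed proportion threshold
) -> str:
    if not positives:
        return ""
    remaining = list(positives)
    parts = []
    for _ in range(max_rounds):
        # Stop once few enough positive samples remain unmatched.
        if len(remaining) / len(positives) <= min_remaining_ratio:
            break
        best = propose(remaining, negatives)
        if best is None:
            break
        rx = re.compile(best)
        uncovered = [s for s in remaining if not rx.search(s)]
        if len(uncovered) == len(remaining):
            break  # no progress this round
        parts.append(best)
        remaining = uncovered  # updated sample set: drop positives already matched
    return "|".join(f"(?:{p})" for p in parts)
```

  • Each round removes the positive samples that the chosen expression already matches, so later rounds focus on the samples that are still unmatched.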
  • In this way, the regular expression of each inquiry template can be obtained.
  • the regular expression has good readability, explanatory ability and easy control, which is convenient for medical operations to quickly understand and repair, and adapt to business changes.
  • An embodiment of the matching device of the medical questioning template in the embodiment of the present application includes:
  • An information acquisition module 301, used to obtain the chief complaint information provided by the user; the chief complaint information includes the user's physical condition information and discomfort symptom information.
  • A matching module 302, used to match the chief complaint information against the regular expression of each consultation template, and determine the consultation template whose regular expression matches successfully as the target consultation template. The regular expression of each consultation template is obtained as follows: obtain the sample set corresponding to the consultation template; obtain an initial expression based on the sample attribute information of the sample set; and update the initial expression and the sample set based on a preset update algorithm until the first specified condition is met, yielding the regular expression of the consultation template.
  • A target providing module 303, used to provide the target consultation template to the user; the target consultation template is used by the user to fill in symptom presentation information.
  • The above matching module is also used to: obtain the training corpus, which includes multiple training corpus entries, each including chief complaint information, a template name and a department; obtain from the training corpus the first training corpus corresponding to the template name of the inquiry template, and use the first training corpus as positive samples; obtain from the training corpus the second training corpus corresponding to template names other than that of the inquiry template, and use the second training corpus as negative samples; and determine the positive samples and the negative samples as the sample set corresponding to the inquiry template.
  • The above matching module is also used to: perform word segmentation on the chief complaint information in the sample set to obtain multiple text words; for each text word, determine the weight of the text word based on its meaning and its frequency of occurrence in the sample set; determine the association relationship between the occurrence positions of the multiple text words; and generate the initial expression based on the weights and the association relationship.
  • The above matching module is also used to: normalize the multiple text words based on a preset synonym table, grouping synonymous text words into the same category to obtain the meaning type of each text word; and, based on the meaning type of each text word, obtain the weight of each text word through the statistical method of term frequency and inverse document frequency.
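  • Synonym normalization before weighting can be sketched as a table lookup. The `SYNONYMS` table and the `normalize` function are hypothetical examples; the document only states that a preset synonym table is used.

```python
from typing import Dict, List

# Hypothetical synonym table: each variant maps to one canonical meaning type.
SYNONYMS: Dict[str, str] = {
    "tummy": "abdomen",
    "belly": "abdomen",
    "ache": "pain",
}

def normalize(tokens: List[str], table: Dict[str, str] = SYNONYMS) -> List[str]:
    """Replace each text word with its canonical form; unknown words pass through."""
    return [table.get(token, token) for token in tokens]

normalized = normalize(["tummy", "ache", "since", "yesterday"])
```

  • Normalizing first means that synonymous words are counted together when term frequency and inverse document frequency are computed.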
  • the above matching module is also used to: for any two text words between multiple text words, count the co-occurrence parameters between any two text words; the co-occurrence parameters include co-occurrence frequency, average distance, minimum distance and maximum distance information; based on the co-occurrence parameters between any two text words, a co-occurrence matrix between multiple text words is generated; in the co-occurrence matrix, each matrix position includes a co-occurrence parameter, and the co-occurrence parameter is the matrix Co-occurrence parameters between two text words corresponding to positions.
  • The above matching module is also used to: perform cross-mutation on the initial expression to obtain multiple variant expressions; screen the optimal expression from the multiple variant expressions based on preset conditions; delete the positive samples in the sample set that conform to the optimal expression, so that the remaining positive samples are those that do not conform to the optimal expression, and obtain an updated sample set; and continue to execute the step of obtaining the initial expression based on the sample attribute information of the sample set until the first specified condition is met, then stop the loop and obtain the regular expression of the consultation template. The first specified condition includes: the number of loops reaches a count threshold, or the proportion of positive samples in the sample set satisfies a preset proportion threshold.
  • The above matching module is also used to: for each variant expression, determine the matching relationship between the variant expression and the sample set, where the matching relationship includes the first matching rate, between the variant expression and the positive samples in the sample set, and the second matching rate, between the variant expression and the negative samples in the sample set; judge whether any of the multiple variant expressions has a matching relationship that satisfies the second specified condition; if so, take that variant expression as the optimal expression; if not, continue to perform the step of cross-mutating the initial expression to obtain multiple variant expressions until a variant expression whose matching relationship satisfies the second specified condition appears.
  • The above matching device obtains the chief complaint information provided by the user, where the chief complaint information includes the user's physical condition information and discomfort symptom information; matches the chief complaint information against the regular expression of each consultation template; and determines the consultation template whose regular expression matches successfully as the target consultation template.
  • The target consultation template is provided to the user and is used by the user to fill in symptom presentation information. The regular expression of each consultation template is obtained as follows: obtain the sample set corresponding to the consultation template; obtain an initial expression based on the sample attribute information of the sample set; and update the initial expression and the sample set based on a preset update algorithm until the first specified condition is met, yielding the regular expression of the consultation template.
  • In this approach, regular expressions are used to match consultation templates for users.
  • The regular expression of each consultation template is highly readable, so operators can clearly understand the classification rules of each consultation template and the classification process is easy to control. When a fault or business change occurs, the classification rules can be modified promptly, which helps operators resolve faults and adapt to business changes faster.
  • Figure 3 above describes in detail the device for matching medical inquiry templates in the embodiment of the present application from the perspective of unitization, and the following describes the matching equipment for medical inquiry templates in the embodiment of the present application in detail from the perspective of hardware processing.
  • The matching device for a consultation template includes a memory and at least one processor. Instructions are stored in the memory, and the at least one processor calls the instructions in the memory so that the matching device executes the above method for matching consultation templates.
  • the matching device 400 of the inquiry template may have relatively large differences due to different configurations or performances, and may include one or more processors (central processing units, CPU) 410 (for example, one or more processors) and memory 420 , one or more storage media 430 (such as one or more mass storage devices) for storing application programs 433 or data 432 .
  • the memory 420 and the storage medium 430 may be temporary storage or persistent storage.
  • the program stored in the storage medium 430 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations in the matching device 400 for the consultation template.
  • the processor 410 may be configured to communicate with the storage medium 430, and execute a series of instruction operations in the storage medium 430 on the device 400 for matching the consultation template.
  • the matching device 400 of the consultation template may also include one or more power sources 440, one or more wired or wireless network interfaces 450, one or more input and output interfaces 460, and/or, one or more operating systems 431, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, etc.
  • the present application also provides a computer-readable storage medium
  • the computer-readable storage medium may be a non-volatile computer-readable storage medium
  • the computer-readable storage medium may also be a volatile computer-readable storage medium
  • instructions are stored in the computer-readable storage medium, and when the instructions are run on a computer, the computer is caused to execute the steps of the consultation template matching method.
  • the present application also provides a consultation template matching device.
  • the consultation template matching device includes a memory and a processor; instructions are stored in the memory, and when the instructions are executed by the processor, the processor executes the steps of the consultation template matching method in the above embodiments.
  • the computer-readable storage medium may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function, etc.; the data storage area may store data created through the use of blockchain nodes, etc.
  • Blockchain, essentially a decentralized database, is a chain of data blocks linked to one another by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of that information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
  • if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods in the embodiments of the present application.
  • the aforementioned storage medium includes: a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Machine Translation (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

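This application relates to the field of artificial intelligence and discloses a consultation template matching method, apparatus, device, and storage medium. The method includes: obtaining chief complaint information provided by a user; matching the chief complaint information against the regular expressions of the consultation templates, and determining the consultation template corresponding to the successfully matched regular expression as the target consultation template; and providing the target consultation template to the user. The regular expression of each consultation template is obtained by: obtaining a sample set corresponding to the consultation template; obtaining an initial expression based on sample attribute information of the sample set; and updating the initial expression and the sample set based on a preset update algorithm until a first specified condition is met, thereby obtaining the regular expression of the consultation template. In this way, operations staff can clearly see the classification rule of each consultation template and the classification process is easy to control; when a fault occurs or the business changes, staff can resolve the fault and adapt to the change more quickly.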

Description

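Consultation Template Matching Method, Apparatus, Device, and Storage Medium
This application claims priority to Chinese patent application No. 202210212308.7, filed with the China Patent Office on March 4, 2022 and entitled "Consultation Template Matching Method, Apparatus, Device, and Storage Medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of artificial intelligence, and in particular to a consultation template matching method, apparatus, device, and storage medium.
Background
During a patient consultation, the patient must first be given a suitable consultation template, chosen according to the patient's chief complaint, in which to describe the specific symptoms. Different conditions generally call for different consultation templates, so the template that best matches the patient's chief complaint has to be selected. In the related art, a classification model is trained by machine learning; the patient's chief complaint is fed into the model, which outputs the matching consultation template. The inventor found, however, that a classification model trained this way is a black box: it lacks interpretability, is hard to understand, and does not expose how it reaches a classification. When a fault occurs or the business changes, medical operations staff can rarely update the model quickly, which raises the cost of fixing faults or changing the business.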
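Summary
This application provides a consultation template matching method, apparatus, device, and storage medium, so that operations staff can clearly see the classification rule of each consultation template and the classification process is easy to control; when a fault occurs or the business changes, the classification rules can be modified promptly, making it easier for staff to resolve faults and adapt to business changes more quickly.
To achieve the above objective, a first aspect of this application provides a consultation template matching method, including: obtaining chief complaint information provided by a user, the chief complaint information including the user's physical condition information and discomfort symptom information; matching the chief complaint information against the regular expression of each consultation template, and determining the consultation template corresponding to the successfully matched regular expression as the target consultation template; and providing the target consultation template to the user, the target consultation template being used by the user to fill in symptom presentation information; wherein the regular expression of each consultation template is obtained by: obtaining a sample set corresponding to the consultation template; obtaining an initial expression based on sample attribute information of the sample set; and updating the initial expression and the sample set based on a preset update algorithm until a first specified condition is met, thereby obtaining the regular expression of the consultation template.
A second aspect of this application provides a consultation template matching apparatus, including: an information obtaining module, configured to obtain chief complaint information provided by a user, the chief complaint information including the user's physical condition information and discomfort symptom information; a matching module, configured to match the chief complaint information against the regular expression of each consultation template, and to determine the consultation template corresponding to the successfully matched regular expression as the target consultation template; and a target providing module, configured to provide the target consultation template to the user, the target consultation template being used by the user to fill in symptom presentation information; wherein the regular expression of each consultation template is obtained by: obtaining a sample set corresponding to the consultation template; obtaining an initial expression based on sample attribute information of the sample set; and updating the initial expression and the sample set based on a preset update algorithm until a first specified condition is met, thereby obtaining the regular expression of the consultation template.
A third aspect of this application provides a consultation template matching device, including a memory and at least one processor, the memory storing instructions; the at least one processor calls the instructions in the memory so that the consultation template matching device performs the following steps of the consultation template matching method: obtaining chief complaint information provided by a user, the chief complaint information including the user's physical condition information and discomfort symptom information; matching the chief complaint information against the regular expression of each consultation template, and determining the consultation template corresponding to the successfully matched regular expression as the target consultation template; and providing the target consultation template to the user, the target consultation template being used by the user to fill in symptom presentation information; wherein the regular expression of each consultation template is obtained by: obtaining a sample set corresponding to the consultation template; obtaining an initial expression based on sample attribute information of the sample set; and updating the initial expression and the sample set based on a preset update algorithm until a first specified condition is met, thereby obtaining the regular expression of the consultation template.
A fourth aspect of this application provides a computer-readable storage medium storing instructions which, when executed by a processor, implement the following steps of the consultation template matching method: obtaining chief complaint information provided by a user, the chief complaint information including the user's physical condition information and discomfort symptom information; matching the chief complaint information against the regular expression of each consultation template, and determining the consultation template corresponding to the successfully matched regular expression as the target consultation template; and providing the target consultation template to the user, the target consultation template being used by the user to fill in symptom presentation information; wherein the regular expression of each consultation template is obtained by: obtaining a sample set corresponding to the consultation template; obtaining an initial expression based on sample attribute information of the sample set; and updating the initial expression and the sample set based on a preset update algorithm until a first specified condition is met, thereby obtaining the regular expression of the consultation template.
In the technical solution provided by this application, chief complaint information provided by a user is obtained, the chief complaint information including the user's physical condition information and discomfort symptom information; the chief complaint information is matched against the regular expression of each consultation template, and the consultation template corresponding to the successfully matched regular expression is determined as the target consultation template; the target consultation template is provided to the user and is used by the user to fill in symptom presentation information; wherein the regular expression of each consultation template is obtained by: obtaining a sample set corresponding to the consultation template; obtaining an initial expression based on sample attribute information of the sample set; and updating the initial expression and the sample set based on a preset update algorithm until a first specified condition is met, thereby obtaining the regular expression of the consultation template. In this approach, consultation templates are matched to users by means of regular expressions. The regular expression of each consultation template is readable, so operations staff can clearly see the classification rule of each template and the classification process is easy to control; when a fault occurs or the business changes, the classification rules can be modified promptly, making it easier for staff to resolve faults and adapt to business changes more quickly.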
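Brief Description of the Drawings
Figure 1 is a schematic diagram of one embodiment of the consultation template matching method in the embodiments of this application;
Figure 2 is a schematic diagram of another embodiment of the consultation template matching method in the embodiments of this application;
Figure 3 is a schematic diagram of one embodiment of the consultation template matching apparatus in the embodiments of this application;
Figure 4 is a schematic diagram of one embodiment of the consultation template matching device in the embodiments of this application.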
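Detailed Description
The embodiments of this application provide a consultation template matching method, apparatus, device, and storage medium, which match consultation templates to users by means of readable regular expressions, so that operations staff can clearly see the classification rule of each consultation template and can adjust it promptly when a fault occurs or the business changes.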
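The terms "first", "second", "third", "fourth", and so on (if any) in the specification, claims, and drawings of this application are used to distinguish similar objects and do not necessarily describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described here can be implemented in an order other than that illustrated or described. In addition, the terms "include" and "have", and any variants of them, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
The embodiments of this application may obtain and process the relevant data using artificial intelligence technology. Artificial intelligence (AI) is the theory, methods, technology, and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
Basic AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operating/interaction systems, and mechatronics. AI software technologies mainly include computer vision, robotics, biometrics, speech processing, natural language processing, and machine learning/deep learning.
In the related art, chief complaint information is classified directly by deep learning methods, for example a Bayesian network model, a BERT (Bidirectional Encoder Representations from Transformers) network model, or a feedforward neural network model. Such classification has low readability: if a fault occurs during operation, maintenance staff may need a great deal of time to resolve it, and adjusting to a business change is likewise time-consuming, which hampers the hospital's operations management.
On this basis, this application provides a consultation template matching method, apparatus, device, and storage medium that match consultation templates to users by means of regular expressions. The regular expression of each consultation template is readable, so operations staff can clearly see the classification rule of each template and the classification process is easy to control; when a fault occurs or the business changes, the classification rules can be modified promptly, making it easier for staff to resolve faults and adapt to business changes more quickly.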
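For ease of understanding, the specific flow of an embodiment of this application is described below. Referring to Figure 1, one embodiment of the consultation template matching method includes:
101. Obtain chief complaint information provided by a user; the chief complaint information includes the user's physical condition information and discomfort symptom information.
The user may submit the chief complaint information online through a terminal device. It mainly covers the user's current physical condition and discomfort symptoms and is usually text; the user may also provide speech input, from which text is extracted to obtain the chief complaint information in text form. For example, the chief complaint information may be: "左脚肌腱炎,左踝关节腔少量积液" (tendinitis of the left foot, a small amount of fluid in the left ankle joint cavity).
102. Match the chief complaint information against the regular expression of each consultation template, and determine the consultation template corresponding to the successfully matched regular expression as the target consultation template.
The regular expression of each consultation template is obtained as follows: obtain a sample set corresponding to the consultation template; obtain an initial expression based on sample attribute information of the sample set; update the initial expression and the sample set based on a preset update algorithm until a first specified condition is met, thereby obtaining the regular expression of the consultation template.
For a given consultation template, the sample set may include positive samples and negative samples. Positive samples match the consultation template; negative samples do not match it and instead match other consultation templates. In practice, the initial expression may be obtained from the sample attribute information of the positive samples, which may be the weights of certain keywords in the positive samples, the positional relationships between keywords, and similar information. The resulting initial expression matches at least some of the positive samples. Starting from the initial expression, preset rules or algorithms are used to transform and mutate it while the sample set is updated, so that the expression matches more positive samples and, as far as possible, matches no negative samples, or matches negative samples to a degree below a certain threshold. In other words, the final regular expression separates positive samples from negative samples clearly in terms of match degree.
103. Provide the target consultation template to the user; the target consultation template is used by the user to fill in symptom presentation information.
The target consultation template here generally matches the user's chief complaint information. The user fills in symptom presentation information in the target consultation template; this information can describe the user's condition more specifically and in more detail than the chief complaint information, which helps the doctor diagnose the user's illness from it and propose a treatment plan.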
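Steps 101 to 103 can be sketched as a lookup over per-template regular expressions. This is a minimal illustration, not the applicant's implementation: the two patterns in `TEMPLATE_PATTERNS` are hand-written placeholders (in the described method they would be produced by the learning procedure), and the choice to return the first matching template is an assumption, since the text does not say how multiple matches are resolved.

```python
import re
from typing import Optional

# Hypothetical, hand-written patterns keyed by template name.
# In the described method these would be learned from samples.
TEMPLATE_PATTERNS = {
    "腿痛问诊": re.compile(r"(腿|脚|踝|膝).{0,10}(痛|炎|积液)"),
    "成人腹泻问诊": re.compile(r"拉肚子|腹泻|肠炎"),
}


def match_template(chief_complaint: str) -> Optional[str]:
    """Return the name of the first template whose regex matches, else None."""
    for name, pattern in TEMPLATE_PATTERNS.items():
        if pattern.search(chief_complaint):
            return name
    return None


print(match_template("左脚肌腱炎,左踝关节腔少量积液"))
```

Because each rule is a plain pattern string, an operator can read and edit it directly, which is the readability benefit the text emphasizes.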
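In the above consultation template matching method, chief complaint information provided by a user is obtained, the chief complaint information including the user's physical condition information and discomfort symptom information; the chief complaint information is matched against the regular expression of each consultation template, and the consultation template corresponding to the successfully matched regular expression is determined as the target consultation template; the target consultation template is provided to the user and is used by the user to fill in symptom presentation information; wherein the regular expression of each consultation template is obtained by: obtaining a sample set corresponding to the consultation template; obtaining an initial expression based on sample attribute information of the sample set; and updating the initial expression and the sample set based on a preset update algorithm until a first specified condition is met, thereby obtaining the regular expression of the consultation template. In this approach, consultation templates are matched to users by means of regular expressions. The regular expression of each consultation template is readable, so operations staff can clearly see the classification rule of each template and the classification process is easy to control; when a fault occurs or the business changes, the classification rules can be modified promptly, making it easier for staff to resolve faults and adapt to business changes more quickly.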
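Referring to Figure 2, in the consultation template matching method of the embodiments of this application, one embodiment of obtaining the regular expression of each consultation template includes:
201. Obtain a training corpus set; the training corpus set includes multiple training corpus entries, each of which includes chief complaint information, a template name, and a department. From the training corpus set, obtain the first training corpus entries corresponding to the template name of the consultation template and use them as positive samples; obtain the second training corpus entries corresponding to template names other than that of the consultation template and use them as negative samples; use the positive samples and negative samples together as the sample set corresponding to the consultation template.
The training corpus set includes multiple training corpus entries, each in the format: chief complaint information + template name + department. For example, the chief complaint information is "左脚肌腱炎,左踝关节腔少量积液" (tendinitis of the left foot, a small amount of fluid in the left ankle joint cavity), the template name is "腿痛问诊" (leg pain consultation), and the department is "骨科" (orthopedics). The template name may also carry the template's broad category, such as 综合 (general) or 外部 (external), in which case the template name "腿痛问诊" becomes "腿痛问诊-综合". As another example, the chief complaint information is "肠炎拉肚子二十多天了,吃了药也没见好" (enteritis with diarrhea for more than twenty days, no improvement after medication), the template name is "成人腹泻问诊-外部" (adult diarrhea consultation, external), and the department is "消化内科" (gastroenterology).
In another implementation, the target templates belonging to the same department as the consultation template are obtained from the training corpus set, and the training corpus entries of the target templates other than the consultation template are used as negative samples.
For example, when the consultation template is the leg pain consultation template, the training corpus entries of the leg pain consultation template are the positive samples and the training corpus entries of the other consultation templates are the negative samples; alternatively, among the training corpus entries whose department is orthopedics, the entries that do not belong to the leg pain consultation template are the negative samples.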
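The two ways of building the sample set described in step 201 can be sketched as follows. The `CorpusEntry` type, the `build_sample_set` function, and the `same_department_only` flag are hypothetical names introduced for illustration only; the text specifies only that each entry holds chief complaint information, a template name, and a department.

```python
from typing import List, NamedTuple, Tuple


class CorpusEntry(NamedTuple):
    """One training corpus entry: complaint text, template name, department."""
    complaint: str
    template: str
    department: str


def build_sample_set(
    corpus: List[CorpusEntry],
    template: str,
    same_department_only: bool = False,
) -> Tuple[List[str], List[str]]:
    """Split the corpus into positive and negative complaints for one template."""
    positives = [e.complaint for e in corpus if e.template == template]
    others = [e for e in corpus if e.template != template]
    if same_department_only:
        # Alternative from the text: negatives come only from the other
        # templates of the department(s) the target template belongs to.
        departments = {e.department for e in corpus if e.template == template}
        others = [e for e in others if e.department in departments]
    negatives = [e.complaint for e in others]
    return positives, negatives
```

Restricting negatives to the same department yields a smaller, harder negative set, since complaints from the same department tend to share vocabulary with the positives.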
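202. Perform word segmentation on the chief complaint information in the sample set to obtain multiple text words; for each text word, determine its weight based on the meaning of the text word and the frequency with which it occurs in the sample set; compute statistics on the positional association between the multiple text words.
The step of determining the weight of a text word based on its meaning and its frequency of occurrence in the sample set can be implemented through the following sub-steps:
(1) Normalize the multiple text words based on a preset synonym table, grouping synonymous text words into the same class, to obtain the meaning type of each text word. As an example, if there are several synonymous text words, say three, one of them may be used as the meaning type of that group of synonyms.
(2) Based on the meaning type of each text word, obtain the weight of each text word by term frequency and inverse document frequency statistics, also known as TF-IDF (Term Frequency-Inverse Document Frequency). TF-IDF evaluates how important a text word is to a training corpus entry: the importance of a text word increases in proportion to the number of times it appears in the training corpus entry, and decreases in inverse proportion to how frequently it appears across the sample set.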
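Sub-steps (1) and (2) can be sketched as synonym normalization followed by a TF-IDF computation. Several things here are assumptions rather than details from the text: the input is taken to be already segmented into words, `SYNONYMS` is a toy synonym table, the IDF uses a common smoothed form, and a word's overall weight is taken to be the sum of its per-entry TF-IDF scores.

```python
import math
from collections import Counter
from typing import Dict, List

# Hypothetical synonym table: each variant maps to one representative word,
# which serves as the "meaning type" of the whole group.
SYNONYMS = {"拉肚子": "腹泻", "闹肚子": "腹泻"}


def normalize(tokens: List[str]) -> List[str]:
    """Replace each word by the representative of its synonym group."""
    return [SYNONYMS.get(t, t) for t in tokens]


def tfidf_weights(docs: List[List[str]]) -> Dict[str, float]:
    """Sum of per-document TF-IDF scores for every normalized word."""
    docs = [normalize(d) for d in docs]
    n = len(docs)
    doc_freq = Counter(t for d in docs for t in set(d))
    weights: Counter = Counter()
    for d in docs:
        if not d:
            continue  # skip empty entries to avoid dividing by zero
        for term, count in Counter(d).items():
            idf = math.log((1 + n) / (1 + doc_freq[term])) + 1.0  # smoothed IDF
            weights[term] += (count / len(d)) * idf
    return dict(weights)
```

Normalizing before counting means that "拉肚子" and "腹泻" contribute to a single weight instead of splitting it between them.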
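The step of computing statistics on the positional association between the multiple text words can be implemented through the following sub-steps:
(1) For any two text words among the multiple text words, compute the co-occurrence parameters between them; the co-occurrence parameters include co-occurrence frequency, average distance, minimum distance, and maximum distance.
The co-occurrence frequency can be understood as how often the two text words appear in the same training corpus entry. The average distance is obtained by measuring the distance between the two text words within each training corpus entry, which can be counted in characters, and then averaging those distances over the training corpus entries. The minimum distance is the smallest such distance over the training corpus entries, and the maximum distance is the largest.
(2) Generate a co-occurrence matrix of the multiple text words based on the co-occurrence parameters between any two text words; each position in the co-occurrence matrix holds the co-occurrence parameters between the two text words corresponding to that position.
The positional association between any two text words is the co-occurrence parameters between them; the positional association between multiple text words comprises the co-occurrence parameters between every pair of them. The co-occurrence matrix therefore captures the positional association between the multiple text words.
Taking two text words as an example, their positional association can be understood as the distance at which they appear in a text, the probability that they appear in the same text, and so on. In practice, the positional association can be obtained through co-occurrence statistics.
Alternatively, the co-occurrence matrix may be a three-dimensional matrix whose horizontal axis is the sequence of text words, whose vertical axis is the same sequence of text words, and whose depth axis is the parameter vector formed by the co-occurrence parameters.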
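The co-occurrence parameters can be sketched with a dictionary keyed by word pair, which plays the role of the co-occurrence matrix. The function name and the result keys are hypothetical, and two simplifications are assumed: the distance between two words is the character offset between their first occurrences in an entry, and the co-occurrence frequency is the share of entries that contain both words.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Tuple


def cooccurrence_params(
    texts: List[str], vocab: Iterable[str]
) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Co-occurrence frequency and gap statistics for every word pair."""
    stats: Dict[Tuple[str, str], Dict[str, float]] = {}
    for a, b in combinations(sorted(vocab), 2):
        gaps = []
        for text in texts:
            ia, ib = text.find(a), text.find(b)
            if ia >= 0 and ib >= 0:  # both words occur in this entry
                gaps.append(abs(ia - ib))  # distance in characters
        if gaps:
            stats[(a, b)] = {
                "freq": len(gaps) / len(texts),
                "mean_gap": sum(gaps) / len(gaps),
                "min_gap": min(gaps),
                "max_gap": max(gaps),
            }
    return stats
```

Each value holds the four parameters for one pair, so the dictionary corresponds to the three-dimensional variant in which the depth axis is the parameter vector.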
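203. Generate the initial expression from the multiple text words in the sample set, the weight of each text word, and the positional association between the multiple text words.
In one specific approach, the initial expression may be generated from the text words with higher weights and from the text words that, according to the co-occurrence matrix, co-occur frequently and at a small distance. Text words with higher weights match the consultation template more closely; likewise, terms with a higher co-occurrence frequency and a smaller co-occurrence distance in the co-occurrence matrix match the consultation template more closely.
The initial expression is a regular expression. Most of the positive samples in the sample set match it, but some do not. To obtain the most suitable regular expression for the consultation template, the initial expression is adjusted and filtered through the following steps.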
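One way to turn weights and co-occurrence statistics into an initial expression is sketched below. The text does not give a construction rule, so everything here is an assumption: frequent pairs become "word, bounded gap, word" alternatives in both orders, the `top_k` highest-weight words are added as stand-alone alternatives, and `max_gap` is used as a loose upper bound on the number of characters allowed between the two words.

```python
import re
from typing import Dict, Tuple


def initial_expression(
    weights: Dict[str, float],
    pair_stats: Dict[Tuple[str, str], Dict[str, float]],
    top_k: int = 2,
    min_freq: float = 0.5,
) -> str:
    """Build a regex from frequent word pairs and the highest-weight words."""
    parts = []
    for (a, b), s in pair_stats.items():
        if s["freq"] >= min_freq:
            ea, eb, gap = re.escape(a), re.escape(b), int(s["max_gap"])
            # Allow either order, with at most `gap` characters in between.
            parts.append(f"{ea}.{{0,{gap}}}{eb}|{eb}.{{0,{gap}}}{ea}")
    top_words = sorted(weights, key=weights.get, reverse=True)[:top_k]
    parts.extend(re.escape(w) for w in top_words)
    return "|".join(parts)
```

The result is deliberately broad; in the described method it is only a starting point that later steps narrow by mutation and selection.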
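204. Perform crossover and mutation on the initial expression to obtain multiple variant expressions; select an optimal expression from the multiple variant expressions based on a preset condition; delete from the sample set the positive samples that match the optimal expression, and generate new positive samples that do not match the optimal expression, to obtain the updated sample set.
Crossover and mutation may include word addition, word replacement, word deletion, negative addition, negative deletion, negative replacement, crossover breeding, and so on. The optimal expression may be selected on several criteria, such as precision, recall, and the logical coherence of the words.
Specifically, the step of selecting an optimal expression from the multiple variant expressions based on a preset condition can follow the principle of a genetic algorithm and can be implemented through the following sub-steps:
(1) For each variant expression, determine the matching relationship between the variant expression and the sample set; the matching relationship includes a first match rate of the variant expression against the positive samples in the sample set and a second match rate against the negative samples in the sample set.
(2) Determine whether any of the multiple variant expressions has a matching relationship that satisfies a second specified condition.
Note that in the matching relationship, a higher first match rate together with a lower second match rate indicates that the variant expression matches the consultation template more closely: the variant expression can recognize chief complaint information that fits the consultation template, so the template can be recommended to the patient who provided that chief complaint information.
In the second specified condition, one match rate threshold may be set for the first match rate and another for the second match rate; the matching relationship satisfies the second specified condition only when both match rates meet their respective thresholds.
(3) If such a variant expression exists, use the variant expression whose matching relationship satisfies the second specified condition as the optimal expression; if not, continue to perform the step of performing crossover and mutation on the initial expression to obtain multiple variant expressions, until a variant expression whose matching relationship satisfies the second specified condition appears.
Alternatively, in addition to performing crossover and mutation on the initial expression, crossover and mutation may also be performed on the current variant expressions to obtain more diverse variant expressions.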
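The selection in step 204 can be sketched under simplifying assumptions. An expression is modeled here as a set of alternative words, and instead of random mutation the sketch enumerates every expression that is one word addition, replacement, or deletion away, which keeps the example deterministic; negative operations and crossover are omitted. The thresholds `min_pos` and `max_neg` stand in for the second specified condition, and all function names are hypothetical.

```python
import re
from typing import FrozenSet, Iterable, List, Optional, Set, Tuple


def to_regex(terms: Iterable[str]) -> str:
    """Render a set of alternative words as a regex alternation."""
    return "|".join(re.escape(t) for t in sorted(terms))


def match_rates(pattern: str, positives: List[str],
                negatives: List[str]) -> Tuple[float, float]:
    """First match rate (positives) and second match rate (negatives)."""
    rx = re.compile(pattern)
    pos = sum(1 for s in positives if rx.search(s)) / max(len(positives), 1)
    neg = sum(1 for s in negatives if rx.search(s)) / max(len(negatives), 1)
    return pos, neg


def variants_of(terms: Iterable[str], vocab: Iterable[str]) -> Set[FrozenSet[str]]:
    """All expressions one word addition, replacement, or deletion away."""
    base = frozenset(terms)
    out: Set[FrozenSet[str]] = set()
    for w in vocab:
        out.add(base | {w})                      # word addition
        for t in base:
            out.add((base - {t}) | {w})          # word replacement
    if len(base) > 1:
        for t in base:
            out.add(base - {t})                  # word deletion
    out.discard(base)
    return out


def select_best(initial_terms, vocab, positives, negatives,
                min_pos=0.6, max_neg=0.1, max_rounds=5) -> Optional[str]:
    """Return a variant meeting both thresholds, or None if none is found."""
    frontier = {frozenset(initial_terms)}
    for _ in range(max_rounds):
        candidates = set().union(*(variants_of(t, vocab) for t in frontier))
        passing = []
        for cand in candidates:
            pattern = to_regex(cand)
            pos, neg = match_rates(pattern, positives, negatives)
            if pos >= min_pos and neg <= max_neg:
                passing.append((pos - neg, pattern))
        if passing:
            return max(passing)[1]
        frontier = candidates  # mutate the current variants in the next round
    return None
```

Carrying the candidates into the next round corresponds to the alternative the text mentions, in which the current variant expressions are themselves mutated further.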
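205. Return to step 202 and repeat until the first specified condition is met, then stop looping to obtain the regular expression of the consultation template; the first specified condition includes: the number of loops reaches a count threshold, or the proportion of positive samples in the sample set meets a preset proportion threshold.
Through steps 201 to 205, the regular expression of each consultation template can be obtained. The regular expression is readable, interpretable, and easy to control, so medical operations staff can quickly understand and repair it and adapt it to business changes.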
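The outer loop of steps 202 to 205 can be sketched as follows. The text leaves several points open, so the sketch makes explicit assumptions: the updated positive samples are simply the positives that the optimal expression did not match, the proportion threshold is read as "the share of remaining positives has fallen to or below a threshold", and the expressions found in successive rounds are combined by alternation. The `learn_one` callback is hypothetical and stands for steps 202 to 204.

```python
import re
from typing import Callable, List, Optional


def learn_template_regex(
    positives: List[str],
    negatives: List[str],
    learn_one: Callable[[List[str], List[str]], Optional[str]],
    max_loops: int = 10,
    min_positive_share: float = 0.05,
) -> str:
    """Repeat learn-then-remove until a stop condition is reached."""
    remaining = list(positives)
    learned: List[str] = []
    for _ in range(max_loops):  # stop condition: loop count threshold
        total = len(remaining) + len(negatives)
        if not remaining or len(remaining) / total <= min_positive_share:
            break  # stop condition: share of positives is small enough
        pattern = learn_one(remaining, negatives)
        if pattern is None:
            break
        rx = re.compile(pattern)
        unmatched = [s for s in remaining if not rx.search(s)]
        if len(unmatched) == len(remaining):
            break  # no progress, avoid repeating the same round
        learned.append(pattern)
        remaining = unmatched  # drop the positives that are now covered
    return "|".join(f"(?:{p})" for p in learned)
```

Each round only has to explain the positives that earlier rounds missed, which keeps the individual sub-expressions short and easy to read.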
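The consultation template matching apparatus in the embodiments of this application is described below. Referring to Figure 3, one embodiment of the consultation template matching apparatus includes:
an information obtaining module 301, configured to obtain chief complaint information provided by a user; the chief complaint information includes the user's physical condition information and discomfort symptom information;
a matching module 302, configured to match the chief complaint information against the regular expression of each consultation template and determine the consultation template corresponding to the successfully matched regular expression as the target consultation template; wherein the regular expression of each consultation template is obtained by: obtaining a sample set corresponding to the consultation template; obtaining an initial expression based on sample attribute information of the sample set; and updating the initial expression and the sample set based on a preset update algorithm until a first specified condition is met, thereby obtaining the regular expression of the consultation template;
a target providing module 303, configured to provide the target consultation template to the user; the target consultation template is used by the user to fill in symptom presentation information.
The matching module is further configured to: obtain a training corpus set, the training corpus set including multiple training corpus entries, each of which includes chief complaint information, a template name, and a department; obtain from the training corpus set the first training corpus entries corresponding to the template name of the consultation template and use them as positive samples; obtain from the training corpus set the second training corpus entries corresponding to template names other than that of the consultation template and use them as negative samples; and use the positive samples and negative samples as the sample set corresponding to the consultation template.
The matching module is further configured to: perform word segmentation on the chief complaint information in the sample set to obtain multiple text words; for each text word, determine its weight based on the meaning of the text word and the frequency with which it occurs in the sample set; compute statistics on the positional association between the multiple text words; and generate the initial expression from the multiple text words in the sample set, the weight of each text word, and the positional association between the multiple text words.
The matching module is further configured to: normalize the multiple text words based on a preset synonym table, grouping synonymous text words into the same class, to obtain the meaning type of each text word; and obtain the weight of each text word from its meaning type by term frequency and inverse document frequency statistics.
The matching module is further configured to: for any two text words among the multiple text words, compute the co-occurrence parameters between them, the co-occurrence parameters including co-occurrence frequency, average distance, minimum distance, and maximum distance; and generate a co-occurrence matrix of the multiple text words based on the co-occurrence parameters between any two text words, each position in the co-occurrence matrix holding the co-occurrence parameters between the two text words corresponding to that position.
The matching module is further configured to: perform crossover and mutation on the initial expression to obtain multiple variant expressions; select an optimal expression from the multiple variant expressions based on a preset condition; delete from the sample set the positive samples that match the optimal expression, and generate new positive samples that do not match the optimal expression, to obtain the updated sample set; and continue to perform the step of obtaining an initial expression based on sample attribute information of the sample set until the first specified condition is met, then stop looping to obtain the regular expression of the consultation template; wherein the first specified condition includes: the number of loops reaches a count threshold, or the proportion of positive samples in the sample set meets a preset proportion threshold.
The matching module is further configured to: for each variant expression, determine the matching relationship between the variant expression and the sample set, the matching relationship including a first match rate of the variant expression against the positive samples in the sample set and a second match rate against the negative samples in the sample set; determine whether any of the multiple variant expressions has a matching relationship that satisfies a second specified condition; if so, use the variant expression whose matching relationship satisfies the second specified condition as the optimal expression; if not, continue to perform the step of performing crossover and mutation on the initial expression to obtain multiple variant expressions, until a variant expression whose matching relationship satisfies the second specified condition appears.
The above consultation template matching apparatus obtains chief complaint information provided by a user, the chief complaint information including the user's physical condition information and discomfort symptom information; matches the chief complaint information against the regular expression of each consultation template, and determines the consultation template corresponding to the successfully matched regular expression as the target consultation template; and provides the target consultation template to the user, the target consultation template being used by the user to fill in symptom presentation information; wherein the regular expression of each consultation template is obtained by: obtaining a sample set corresponding to the consultation template; obtaining an initial expression based on sample attribute information of the sample set; and updating the initial expression and the sample set based on a preset update algorithm until a first specified condition is met, thereby obtaining the regular expression of the consultation template. In this approach, consultation templates are matched to users by means of regular expressions. The regular expression of each consultation template is readable, so operations staff can clearly see the classification rule of each template and the classification process is easy to control; when a fault occurs or the business changes, the classification rules can be modified promptly, making it easier for staff to resolve faults and adapt to business changes more quickly.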
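Figure 3 above describes the consultation template matching apparatus of the embodiments of this application in detail from the perspective of modular units; the consultation template matching device of the embodiments of this application is described in detail below from the perspective of hardware processing.
Figure 4 is a schematic structural diagram of a consultation template matching device provided by an embodiment of this application. The consultation template matching device includes a memory and at least one processor, the memory storing instructions; the at least one processor calls the instructions in the memory so that the consultation template matching device executes the above consultation template matching method.
The consultation template matching device 400 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPU) 410, a memory 420, and one or more storage media 430 (for example, one or more mass storage devices) storing application programs 433 or data 432. The memory 420 and the storage medium 430 may provide transient or persistent storage. The program stored in the storage medium 430 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the consultation template matching device 400. Further, the processor 410 may be configured to communicate with the storage medium 430 and to execute, on the consultation template matching device 400, the series of instruction operations in the storage medium 430.
The consultation template matching device 400 may also include one or more power supplies 440, one or more wired or wireless network interfaces 450, one or more input/output interfaces 460, and/or one or more operating systems 431, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and so on. Those skilled in the art will understand that the device structure shown in Figure 4 does not limit the consultation template matching device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
This application also provides a computer-readable storage medium, which may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium. Instructions are stored in the computer-readable storage medium, and when the instructions are run on a computer, the computer is caused to execute the steps of the consultation template matching method.
This application also provides a consultation template matching device, which includes a memory and a processor. Instructions are stored in the memory, and when the instructions are executed by the processor, the processor executes the steps of the consultation template matching method in the above embodiments.
Further, the computer-readable storage medium may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function, and the like; the data storage area may store data created through the use of blockchain nodes, and the like.
The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked to one another by cryptographic methods, each of which contains a batch of network transaction information used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and so on.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
If an integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. On this understanding, the technical solution of this application, in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods of the embodiments of this application. The aforementioned storage medium includes media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
The above embodiments are intended only to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, and some of their technical features may be replaced by equivalents, without causing the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application.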

Claims (20)

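  1. A consultation template matching method, wherein the method comprises:
    obtaining chief complaint information provided by a user, the chief complaint information comprising physical condition information and discomfort symptom information of the user;
    matching the chief complaint information against the regular expression of each consultation template, and determining the consultation template corresponding to the successfully matched regular expression as a target consultation template;
    providing the target consultation template to the user, the target consultation template being used by the user to fill in symptom presentation information;
    wherein the regular expression of each consultation template is obtained by: obtaining a sample set corresponding to the consultation template; obtaining an initial expression based on sample attribute information of the sample set; and updating the initial expression and the sample set based on a preset update algorithm until a first specified condition is met, to obtain the regular expression of the consultation template.
  2. The method according to claim 1, wherein the step of obtaining the sample set corresponding to the consultation template comprises:
    obtaining a training corpus set, the training corpus set comprising multiple training corpus entries, each training corpus entry comprising chief complaint information, a template name, and a department;
    obtaining, from the training corpus set, first training corpus entries corresponding to the template name of the consultation template, and using the first training corpus entries as positive samples;
    obtaining, from the training corpus set, second training corpus entries corresponding to template names other than the template name of the consultation template, and using the second training corpus entries as negative samples;
    using the positive samples and the negative samples as the sample set corresponding to the consultation template.
  3. The method according to claim 1, wherein the step of obtaining an initial expression based on sample attribute information of the sample set comprises:
    performing word segmentation on the chief complaint information in the sample set to obtain multiple text words;
    for each text word, determining a weight of the text word based on the meaning of the text word and the frequency with which the text word occurs in the sample set;
    computing statistics on the positional association between the multiple text words;
    generating the initial expression from the multiple text words in the sample set, the weight of each text word, and the positional association between the multiple text words.
  4. The method according to claim 3, wherein the step of determining the weight of the text word based on the meaning of the text word and the frequency with which the text word occurs in the sample set comprises:
    normalizing the multiple text words based on a preset synonym table, grouping synonymous text words into the same class, to obtain a meaning type of each text word;
    obtaining the weight of each text word from the meaning type of each text word by term frequency and inverse document frequency statistics.
  5. The method according to claim 3, wherein the step of computing statistics on the positional association between the multiple text words comprises:
    for any two text words among the multiple text words, computing co-occurrence parameters between the two text words, the co-occurrence parameters comprising co-occurrence frequency, average distance, minimum distance, and maximum distance;
    generating a co-occurrence matrix of the multiple text words based on the co-occurrence parameters between any two text words, wherein each position in the co-occurrence matrix holds the co-occurrence parameters between the two text words corresponding to that position.
  6. The method according to claim 1, wherein the step of updating the initial expression and the sample set based on a preset update algorithm until a first specified condition is met, to obtain the regular expression of the consultation template, comprises:
    performing crossover and mutation on the initial expression to obtain multiple variant expressions;
    selecting an optimal expression from the multiple variant expressions based on a preset condition;
    deleting from the sample set the positive samples that match the optimal expression, and generating new positive samples that do not match the optimal expression, to obtain the updated sample set;
    continuing to perform the step of obtaining an initial expression based on sample attribute information of the sample set until the first specified condition is met, then stopping the loop to obtain the regular expression of the consultation template; wherein the first specified condition comprises: the number of loops reaches a count threshold, or the proportion of the positive samples in the sample set meets a preset proportion threshold.
  7. The method according to claim 6, wherein the step of selecting an optimal expression from the multiple variant expressions based on a preset condition comprises:
    for each variant expression, determining a matching relationship between the variant expression and the sample set, the matching relationship comprising a first match rate of the variant expression against the positive samples in the sample set and a second match rate against the negative samples in the sample set;
    determining whether any of the multiple variant expressions has a matching relationship that satisfies a second specified condition;
    if so, using the variant expression whose matching relationship satisfies the second specified condition as the optimal expression; if not, continuing to perform the step of performing crossover and mutation on the initial expression to obtain multiple variant expressions, until a variant expression whose matching relationship satisfies the second specified condition appears.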
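  8. A consultation template matching apparatus, wherein the apparatus comprises:
    an information obtaining module, configured to obtain chief complaint information provided by a user, the chief complaint information comprising physical condition information and discomfort symptom information of the user;
    a matching module, configured to match the chief complaint information against the regular expression of each consultation template, and to determine the consultation template corresponding to the successfully matched regular expression as a target consultation template;
    a target providing module, configured to provide the target consultation template to the user, the target consultation template being used by the user to fill in symptom presentation information.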
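  9. A consultation template matching device, wherein the consultation template matching device comprises a memory and at least one processor, the memory storing instructions;
    the at least one processor calls the instructions in the memory so that the consultation template matching device performs the following steps of a consultation template matching method:
    obtaining chief complaint information provided by a user, the chief complaint information comprising physical condition information and discomfort symptom information of the user;
    matching the chief complaint information against the regular expression of each consultation template, and determining the consultation template corresponding to the successfully matched regular expression as a target consultation template;
    providing the target consultation template to the user, the target consultation template being used by the user to fill in symptom presentation information;
    wherein the regular expression of each consultation template is obtained by: obtaining a sample set corresponding to the consultation template; obtaining an initial expression based on sample attribute information of the sample set; and updating the initial expression and the sample set based on a preset update algorithm until a first specified condition is met, to obtain the regular expression of the consultation template.
  10. The consultation template matching device according to claim 9, wherein the step of obtaining the sample set corresponding to the consultation template comprises:
    obtaining a training corpus set, the training corpus set comprising multiple training corpus entries, each training corpus entry comprising chief complaint information, a template name, and a department;
    obtaining, from the training corpus set, first training corpus entries corresponding to the template name of the consultation template, and using the first training corpus entries as positive samples;
    obtaining, from the training corpus set, second training corpus entries corresponding to template names other than the template name of the consultation template, and using the second training corpus entries as negative samples;
    using the positive samples and the negative samples as the sample set corresponding to the consultation template.
  11. The consultation template matching device according to claim 9, wherein the step of obtaining an initial expression based on sample attribute information of the sample set comprises:
    performing word segmentation on the chief complaint information in the sample set to obtain multiple text words;
    for each text word, determining a weight of the text word based on the meaning of the text word and the frequency with which the text word occurs in the sample set;
    computing statistics on the positional association between the multiple text words;
    generating the initial expression from the multiple text words in the sample set, the weight of each text word, and the positional association between the multiple text words.
  12. The consultation template matching device according to claim 11, wherein the step of determining the weight of the text word based on the meaning of the text word and the frequency with which the text word occurs in the sample set comprises:
    normalizing the multiple text words based on a preset synonym table, grouping synonymous text words into the same class, to obtain a meaning type of each text word;
    obtaining the weight of each text word from the meaning type of each text word by term frequency and inverse document frequency statistics.
  13. The consultation template matching device according to claim 11, wherein the step of computing statistics on the positional association between the multiple text words comprises:
    for any two text words among the multiple text words, computing co-occurrence parameters between the two text words, the co-occurrence parameters comprising co-occurrence frequency, average distance, minimum distance, and maximum distance;
    generating a co-occurrence matrix of the multiple text words based on the co-occurrence parameters between any two text words, wherein each position in the co-occurrence matrix holds the co-occurrence parameters between the two text words corresponding to that position.
  14. The consultation template matching device according to claim 9, wherein the step of updating the initial expression and the sample set based on a preset update algorithm until a first specified condition is met, to obtain the regular expression of the consultation template, comprises:
    performing crossover and mutation on the initial expression to obtain multiple variant expressions;
    selecting an optimal expression from the multiple variant expressions based on a preset condition;
    deleting from the sample set the positive samples that match the optimal expression, and generating new positive samples that do not match the optimal expression, to obtain the updated sample set;
    continuing to perform the step of obtaining an initial expression based on sample attribute information of the sample set until the first specified condition is met, then stopping the loop to obtain the regular expression of the consultation template; wherein the first specified condition comprises: the number of loops reaches a count threshold, or the proportion of the positive samples in the sample set meets a preset proportion threshold.
  15. The consultation template matching device according to claim 14, wherein the step of selecting an optimal expression from the multiple variant expressions based on a preset condition comprises:
    for each variant expression, determining a matching relationship between the variant expression and the sample set, the matching relationship comprising a first match rate of the variant expression against the positive samples in the sample set and a second match rate against the negative samples in the sample set;
    determining whether any of the multiple variant expressions has a matching relationship that satisfies a second specified condition;
    if so, using the variant expression whose matching relationship satisfies the second specified condition as the optimal expression; if not, continuing to perform the step of performing crossover and mutation on the initial expression to obtain multiple variant expressions, until a variant expression whose matching relationship satisfies the second specified condition appears.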
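  16. A computer-readable storage medium storing instructions, wherein the instructions, when executed by a processor, implement the following steps of a consultation template matching method:
    obtaining chief complaint information provided by a user, the chief complaint information comprising physical condition information and discomfort symptom information of the user;
    matching the chief complaint information against the regular expression of each consultation template, and determining the consultation template corresponding to the successfully matched regular expression as a target consultation template;
    providing the target consultation template to the user, the target consultation template being used by the user to fill in symptom presentation information;
    wherein the regular expression of each consultation template is obtained by: obtaining a sample set corresponding to the consultation template; obtaining an initial expression based on sample attribute information of the sample set; and updating the initial expression and the sample set based on a preset update algorithm until a first specified condition is met, to obtain the regular expression of the consultation template.
  17. The computer-readable storage medium according to claim 16, wherein the step of obtaining the sample set corresponding to the consultation template comprises:
    obtaining a training corpus set, the training corpus set comprising multiple training corpus entries, each training corpus entry comprising chief complaint information, a template name, and a department;
    obtaining, from the training corpus set, first training corpus entries corresponding to the template name of the consultation template, and using the first training corpus entries as positive samples;
    obtaining, from the training corpus set, second training corpus entries corresponding to template names other than the template name of the consultation template, and using the second training corpus entries as negative samples;
    using the positive samples and the negative samples as the sample set corresponding to the consultation template.
  18. The computer-readable storage medium according to claim 16, wherein the step of obtaining an initial expression based on sample attribute information of the sample set comprises:
    performing word segmentation on the chief complaint information in the sample set to obtain multiple text words;
    for each text word, determining a weight of the text word based on the meaning of the text word and the frequency with which the text word occurs in the sample set;
    computing statistics on the positional association between the multiple text words;
    generating the initial expression from the multiple text words in the sample set, the weight of each text word, and the positional association between the multiple text words.
  19. The computer-readable storage medium according to claim 18, wherein the step of determining the weight of the text word based on the meaning of the text word and the frequency with which the text word occurs in the sample set comprises:
    normalizing the multiple text words based on a preset synonym table, grouping synonymous text words into the same class, to obtain a meaning type of each text word;
    obtaining the weight of each text word from the meaning type of each text word by term frequency and inverse document frequency statistics.
  20. The computer-readable storage medium according to claim 18, wherein the step of computing statistics on the positional association between the multiple text words comprises:
    for any two text words among the multiple text words, computing co-occurrence parameters between the two text words, the co-occurrence parameters comprising co-occurrence frequency, average distance, minimum distance, and maximum distance;
    generating a co-occurrence matrix of the multiple text words based on the co-occurrence parameters between any two text words, wherein each position in the co-occurrence matrix holds the co-occurrence parameters between the two text words corresponding to that position.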
PCT/CN2022/121720 2022-03-04 2022-09-27 Consultation template matching method, apparatus, device, and storage medium WO2023165122A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210212308.7 2022-03-04
CN202210212308.7A CN114566294A (zh) 2022-03-04 2022-03-04 Consultation template matching method, apparatus, device, and storage medium

Publications (1)

Publication Number Publication Date
WO2023165122A1 true WO2023165122A1 (zh) 2023-09-07

Family

ID=81717991

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/121720 WO2023165122A1 (zh) 2022-03-04 2022-09-27 Consultation template matching method, apparatus, device, and storage medium

Country Status (2)

Country Link
CN (1) CN114566294A (zh)
WO (1) WO2023165122A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114566294A (zh) * 2022-03-04 2022-05-31 康键信息技术(深圳)有限公司 Consultation template matching method, apparatus, device, and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
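CN1908935A (zh) * 2006-08-01 2007-02-07 华为技术有限公司 Natural language search method and system
CN111415740A (zh) * 2020-02-12 2020-07-14 东北大学 Consultation information processing method, apparatus, storage medium, and computer device
CN112397197A (zh) * 2020-11-16 2021-02-23 康键信息技术(深圳)有限公司 Artificial-intelligence-based consultation data processing method and apparatus
CN112509682A (zh) * 2020-12-15 2021-03-16 康键信息技术(深圳)有限公司 Text-recognition-based consultation method, apparatus, device, and storage medium
WO2021068683A1 (zh) * 2019-10-11 2021-04-15 平安科技(深圳)有限公司 Regular expression generation method, apparatus, server, and computer-readable storage medium
CN114566294A (zh) * 2022-03-04 2022-05-31 康键信息技术(深圳)有限公司 Consultation template matching method, apparatus, device, and storage medium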

Also Published As

Publication number Publication date
CN114566294A (zh) 2022-05-31

Similar Documents

Publication Publication Date Title
US20190251084A1 (en) Search method and apparatus
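WO2020181805A1 (zh) Diabetes prediction method and apparatus, storage medium, and computer device
CN107103048B (zh) Drug information matching method and system
Häggström Data‐driven confounder selection via Markov and Bayesian networks
Grannis et al. Real world performance of approximate string comparators for use in patient matching
CN104731774B (zh) Personalized translation method and apparatus for general-purpose machine translation engines
WO2021135449A1 (zh) Data classification method, apparatus, device, and medium based on deep reinforcement learning
WO2023029513A1 (zh) Artificial-intelligence-based search intent recognition method, apparatus, device, and medium
CN111986792A (zh) Medical institution scoring method, apparatus, device, and storage medium
CN110032631B (zh) Information feedback method, apparatus, and storage medium
WO2023165122A1 (zh) Consultation template matching method, apparatus, device, and storage medium
CN111883251A (zh) Medical misdiagnosis detection method, apparatus, electronic device, and storage medium
US11705231B2 (en) System and method for computerized synthesis of simulated health data
Geng et al. A model-free Bayesian classifier
WO2021120587A1 (zh) OCT-based retina classification method, apparatus, computer device, and storage medium
WO2023178970A1 (zh) Medical data processing method, apparatus, device, and storage medium
JP2021179859A (ja) Learning model generation system and learning model generation method
CN117198446A (zh) Method for automatically generating ward-round reports based on medical intercom devices
CN116468043A (zh) Nested entity recognition method, apparatus, device, and storage medium
CN108009157B (zh) Sentence classification method and apparatus
Rahimi et al. A web‐based high‐performance multicriteria decision support system for medical diagnosis
Janani et al. Dengue prediction using (MLP) multilayer perceptron—A machine learning approach
WO2022079593A1 (en) A system and a way to automatically monitor clinical trials - virtual monitor (vm) and a way to record medical history
Dankar et al. A new PCA-based utility measure for synthetic data evaluation
CN111986815A (zh) Co-occurrence-based item combination mining method and related device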

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22929552

Country of ref document: EP

Kind code of ref document: A1