CN113779954A - Similar text recommendation method and device and electronic equipment - Google Patents

Similar text recommendation method and device and electronic equipment

Publication number
CN113779954A
Authority
CN
China
Prior art keywords
text
texts
historical
similar
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110130557.7A
Other languages
Chinese (zh)
Inventor
康西龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Tuoxian Technology Co Ltd
Original Assignee
Beijing Jingdong Tuoxian Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Tuoxian Technology Co Ltd filed Critical Beijing Jingdong Tuoxian Technology Co Ltd
Priority to CN202110130557.7A priority Critical patent/CN113779954A/en
Publication of CN113779954A publication Critical patent/CN113779954A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/194Calculation of difference between files
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/355Class or cluster creation or modification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure relates to a method and device for recommending similar texts and to electronic equipment, in the technical field of artificial intelligence. The recommendation method comprises: determining a similarity parameter between a user's main complaint text and each historical text; inputting the characteristic information of each historical text into a machine learning model, which outputs a quality evaluation parameter for each historical text; and recommending, from among the historical texts, similar texts of the main complaint text according to the similarity parameters and the quality evaluation parameters.

Description

Similar text recommendation method and device and electronic equipment
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to a method and an apparatus for recommending similar texts, a method and an apparatus for recommending similar case texts, an electronic device, and a non-volatile computer-readable storage medium.
Background
A recommendation system mainly refers to a technology that applies collaborative intelligence to make recommendations, and it can effectively mitigate information overload. According to the preferences and constraints of a user, a recommendation system can provide a ranked recommendation list containing personalized items and their related information.
A text recommendation system can determine similar texts according to the main complaint text provided by a user. An accurate text recommendation system improves the user experience in its application field and thereby the benefit to that field. For example, applied to the medical field, a text recommendation system can recommend historical cases similar to a patient's condition as reference texts for medical diagnosis.
In the related art, similar texts are recommended based on the similarity of the text contents.
Disclosure of Invention
The inventors of the present disclosure found the following problem in the related art described above: text similarity is determined on a single basis, so the accuracy of text recommendation is low.
In view of this, the present disclosure provides a recommendation technical solution of similar texts, which can improve the accuracy of text recommendation.
According to some embodiments of the present disclosure, there is provided a recommendation method of similar text, including: determining similarity parameters between the main complaint texts of the users and the historical texts; inputting the characteristic information of each historical text into a machine learning model, and outputting the quality evaluation parameters of each historical text; and recommending similar texts of the main complaint texts in each historical text according to the similarity parameter and the quality evaluation parameter.
In some embodiments, determining similarity parameters between the user's main complaint text and the respective historical texts comprises: determining the similarity parameters according to the search terms in the main complaint text and the index information of each historical text.
In some embodiments, determining the similarity parameters according to the search terms in the main complaint text and the index information of each historical text includes: determining the search terms according to the entities contained in the obtained main complaint text; and determining the similarity parameters according to the search terms and the index information, wherein the index information is established according to the entities contained in the historical texts.
In some embodiments, inputting the feature information of each historical text into the machine learning model and outputting the quality evaluation parameter of each historical text comprises: determining, among the historical texts, the relevant texts of the main complaint text according to the similarity parameters; and inputting the characteristic information of each relevant text into the machine learning model, which outputs the quality evaluation parameters.
In some embodiments, recommending similar texts of the main complaint text among the historical texts includes: recommending similar texts of the main complaint text among the relevant texts according to the similarity parameters and the quality evaluation parameters.
In some embodiments, determining the relevant texts of the main complaint text among the historical texts according to the similarity parameters includes: sorting the historical texts in descending order of similarity parameter; and determining the historical texts ranked above a threshold as the relevant texts.
In some embodiments, the similarity parameter is positively correlated with the inverse document frequency of each search term, positively correlated with the number of occurrences of each search term in each historical text, and negatively correlated with the length parameter of each historical text; the length parameter is negatively correlated with the average length of all historical texts and positively correlated with the length of the corresponding historical text.
In some embodiments, inputting the characteristic information of each historical text into the machine learning model and outputting the quality evaluation parameters of each historical text comprises: inputting the word vectors of the feature information and the index information of each historical text into the machine learning model, and outputting the quality evaluation parameters of each historical text.
In some embodiments, the complaint text is patient complaint text, the historical text is historical case text, and the similar text is similar case text to the patient complaint text.
In some embodiments, the characteristic information includes at least one of case content characteristic information and doctor portrait characteristics. The case content characteristic information includes at least one of the number of turns in the doctor's conversation with the patient and the number of medical terms contained in the historical case text; the doctor portrait characteristics include at least one of the doctor's job title, hospital grade, positive rating, and number of consultations received.
In some embodiments, the recommendation method further comprises: determining recommended texts among the similar texts according to the user's screening result for the similar texts, wherein the screening result is determined according to at least one of the patient information, doctor portrait, and inquiry process information corresponding to each similar text.
In some embodiments, the search terms and the index information are determined according to a Chinese word segmentation method, and the similarity parameter is determined according to the Elasticsearch (ES) framework and the BM25 (Best Matching 25) method.
In some embodiments, the entity is a medical domain entity including at least one of a disease class entity, a symptom class entity, a pharmaceutical class entity, a medical examination class entity.
According to other embodiments of the present disclosure, there is provided a recommendation method of similar case texts, including: determining similarity parameters between the patient complaint texts of the users and the historical case texts; inputting the characteristic information of each historical case text into a machine learning model, and outputting the quality evaluation parameters of each historical case text; and recommending similar case texts of the patient complaint texts in the historical case texts according to the similarity parameters and the quality evaluation parameters.
According to still other embodiments of the present disclosure, there is provided a recommendation apparatus for similar texts, including: a similarity determining unit for determining similarity parameters between the user's main complaint text and the historical texts; a quality evaluation unit for inputting the characteristic information of each historical text into the machine learning model and outputting the quality evaluation parameters of each historical text; and a recommending unit for recommending similar texts of the main complaint text among the historical texts according to the similarity parameters and the quality evaluation parameters.
In some embodiments, the similarity determining unit determines the similarity parameter according to the search term in the main complaint text and the index information of each history text.
In some embodiments, the similarity determining unit determines a search term according to an entity included in the obtained main complaint text, determines a similarity parameter according to the search term and index information, and the index information is established according to the entity included in each history text.
In some embodiments, the quality evaluation unit determines each relevant text of the main complaint text in each historical text according to the similarity parameter, inputs the feature information of each relevant text into the machine learning model, and outputs the quality evaluation parameter.
In some embodiments, the recommending unit recommends similar texts of the main complaint texts in each relevant text according to the similarity parameter and the quality evaluation parameter.
In some embodiments, the quality evaluation unit sorts the historical texts in descending order of similarity parameter and determines the historical texts ranked above a threshold as the relevant texts.
In some embodiments, the similarity parameter is positively correlated with the inverse document frequency of each search term, positively correlated with the number of occurrences of each search term in each historical text, and negatively correlated with the length parameter of each historical text; the length parameter is negatively correlated with the average length of all historical texts and positively correlated with the length of the corresponding historical text.
In some embodiments, the quality evaluation unit inputs the word vectors of the feature information and the index information of each historical text into the machine learning model and outputs the quality evaluation parameters of each historical text.
In some embodiments, the complaint text is patient complaint text, the historical text is historical case text, and the similar text is similar case text to the patient complaint text.
In some embodiments, the characteristic information includes at least one of case content characteristic information and doctor portrait characteristics. The case content characteristic information includes at least one of the number of turns in the doctor's conversation with the patient and the number of medical terms contained in the historical case text; the doctor portrait characteristics include at least one of the doctor's job title, hospital grade, positive rating, and number of consultations received.
In some embodiments, the recommending unit determines the recommended text in each similar text according to the screening result of the user on each similar text, and the screening result is determined according to at least one of the patient information, the doctor portrait and the inquiry process information corresponding to each similar text.
In some embodiments, the search terms and the index information are determined according to a Chinese word segmentation method, and the similarity parameter is determined according to the Elasticsearch (ES) framework and the BM25 (Best Matching 25) method.
In some embodiments, the entity is a medical domain entity including at least one of a disease class entity, a symptom class entity, a pharmaceutical class entity, a medical examination class entity.
According to still further embodiments of the present disclosure, there is provided a recommendation apparatus for similar case texts, including: the similarity determining unit is used for determining similarity parameters between the patient chief complaint texts of the users and the historical case texts; the quality evaluation unit is used for inputting the characteristic information of each historical case text into the machine learning model and outputting the quality evaluation parameters of each historical case text; and the recommending unit is used for recommending the similar case texts of the patient main complaint texts in the historical case texts according to the similarity parameters and the quality evaluation parameters.
According to still further embodiments of the present disclosure, there is provided an electronic device including: a memory; and a processor coupled to the memory, the processor configured to execute, based on instructions stored in the memory, the method for recommending similar texts or the method for recommending similar case texts in any of the above embodiments.
According to still further embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method of recommending similar text or a method of recommending similar case text in any of the above embodiments.
In the above embodiments, the recommended similar texts are determined by combining the similarity between the texts with the quality evaluation result of each historical text. In this way, similar texts are determined along two dimensions, a relative evaluation of each text against the main complaint text and an evaluation of the text itself, so that the accuracy of text recommendation is improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.
The present disclosure can be more clearly understood from the following detailed description with reference to the accompanying drawings, in which:
FIG. 1 illustrates a flow diagram of some embodiments of a recommendation method of similar text of the present disclosure;
FIG. 2 illustrates a schematic diagram of some embodiments of recommendation methods of similar text of the present disclosure;
FIG. 3 shows a schematic diagram of further embodiments of a recommendation method of similar text of the present disclosure;
FIG. 4 illustrates a block diagram of some embodiments of a recommendation device of similar text of the present disclosure;
FIG. 5 illustrates a block diagram of some embodiments of an electronic device of the present disclosure;
FIG. 6 shows a block diagram of further embodiments of the electronic device of the present disclosure.
Detailed Description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail, but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
As described above, a recommendation system based on text content recommends similar texts only according to the ranking of the correlation between texts. For example, the ranking may be based only on the similarity between the patient's main complaint text and the historical medical record texts.
The technical problems of such a content-based recommendation system include: the ranking relies on a single reference feature and does not evaluate the content of the historical medical records; and there is no support for user-defined screening and filtering of the recommended texts.
To address these technical problems, the present disclosure ranks the historical texts based on multi-dimensional features, so as to recall similar historical texts and recommend similar texts. The multi-dimensional features may include medical record quality, doctor portrait, similarity, and the like.
For example, considering patient user experience, recall accuracy, and the need to process a large amount of inquiry dialogue content, the present disclosure identifies medical entities in text using word segmentation based on a medical entity dictionary, and uses a search engine to recall similar historical medical records more accurately.
After similar historical medical records are recalled, multi-dimensional features are extracted to judge the quality of the medical record texts; the recall similarity and the quality are combined into a comprehensive recommendation parameter, by which the records are ranked; and the patient can apply custom filtering and screening to the recommendation results to output the final similar medical records. For example, the technical solution of the present disclosure can be realized by the following embodiments.
Fig. 1 illustrates a flow diagram of some embodiments of a recommendation method of similar text of the present disclosure.
As shown in FIG. 1, in step 110, similarity parameters between the user's complaint text and the historical texts are determined.
In some embodiments, the similarity parameter is determined according to the search terms in the main complaint text and the index information of each historical text. For example, the search terms are determined according to the entities contained in the obtained main complaint text, and the similarity parameter is determined according to the search terms and the index information, wherein the index information is established according to the entities contained in the historical texts.
In some embodiments, the complaint text is patient complaint text, the historical text is historical case text, and the similar text is similar case text to the patient complaint text. For example, the entity may be a medical domain entity including at least one of a disease class entity, a symptom class entity, a pharmaceutical class entity, and a medical examination class entity.
In some embodiments, the text may be segmented using a Chinese word segmentation algorithm and a custom dictionary technique, based on a medical entity dictionary collated offline. In this way, medical field entities of various types, such as diseases, symptoms, medicines, and examinations, are identified from the text.
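The dictionary-driven recognition described above can be illustrated with a small sketch. It is only a stand-in: it uses forward maximum matching over English tokens rather than a Chinese word segmenter, and the dictionary contents, the `MEDICAL_DICT` name, and the `recognize_entities` function are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical miniature medical entity dictionary: surface form -> entity class.
MEDICAL_DICT: Dict[str, str] = {
    "headache": "symptom",
    "fever": "symptom",
    "influenza": "disease",
    "ibuprofen": "drug",
    "blood test": "examination",
}

def recognize_entities(text: str, dictionary: Dict[str, str]) -> List[Tuple[str, str]]:
    """Find dictionary entities by forward maximum matching over word tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    max_len = max(len(key.split()) for key in dictionary)
    entities: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate phrase first so "blood test" wins over "blood".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in dictionary:
                entities.append((phrase, dictionary[phrase]))
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return entities

found = recognize_entities(
    "I have a headache and fever, and took ibuprofen after a blood test.",
    MEDICAL_DICT,
)
```

The recognized entities would then serve as search terms for a main complaint text, or as index entries for a historical text.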
In some embodiments, the search term and the index information are determined according to a Chinese word segmentation method. The similarity parameter is determined according to the ES framework and BM25 method. For example, entity recognition can be performed by using a segmentation method based on a search engine framework, and similar historical medical record texts can be recalled by using a BM25 algorithm.
For example, the search engine framework may be the ES framework, a distributed, open-source, unified search engine framework. With the ES search engine, the word segmentation, indexing, ranking, and other processing steps can each be configured as custom components.
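As a hedged illustration of such configuration, the sketch below builds an index-creation request body that declares a custom BM25 similarity and applies it to one text field. It only constructs and prints the body; no client call is made. The index name, the similarity name `case_bm25`, the `entities` field, the choice of the built-in `whitespace` analyzer (assuming entities are stored pre-segmented and space-separated), and the `k1` and `b` values are all assumptions.

```python
import json

# Hypothetical index name for the historical medical record texts.
INDEX_NAME = "historical_case_texts"

index_body = {
    "settings": {
        "index": {
            "similarity": {
                # Custom similarity based on BM25; k1 and b are illustrative values.
                "case_bm25": {"type": "BM25", "k1": 1.2, "b": 0.75}
            }
        }
    },
    "mappings": {
        "properties": {
            # Assumed field holding the recognized entities, separated by spaces.
            "entities": {
                "type": "text",
                "analyzer": "whitespace",
                "similarity": "case_bm25",
            }
        }
    },
}

print(json.dumps(index_body, indent=2))
```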
In some embodiments, an entity-based term-frequency document linked list, which associates each entity (a noun with practical meaning) with the historical medical record texts containing it, can be constructed in advance over the historical medical record texts. A search of text content is thereby converted into a search for entities that then link to the corresponding texts, which improves document retrieval efficiency. For example, the linked list associating entities, term frequencies, and documents can be constructed using ES by configuring a document.
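The structure described here behaves like an inverted index. The following minimal in-memory sketch assumes the entities of each record have already been extracted; the function names and sample records are hypothetical and it does not reproduce how ES stores its index.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set

def build_entity_index(doc_entities: Dict[str, List[str]]) -> Dict[str, Dict[str, int]]:
    """Map each entity to a posting list {text_id: term frequency}."""
    index: Dict[str, Dict[str, int]] = defaultdict(dict)
    for doc_id, entities in doc_entities.items():
        for entity, freq in Counter(entities).items():
            index[entity][doc_id] = freq
    return dict(index)

def candidate_texts(index: Dict[str, Dict[str, int]], search_terms: Iterable[str]) -> Set[str]:
    """Return the ids of texts that contain at least one search term."""
    hits: Set[str] = set()
    for term in search_terms:
        hits.update(index.get(term, {}))
    return hits

# Hypothetical pre-extracted entities for three historical medical record texts.
records = {
    "case-1": ["fever", "cough", "fever"],
    "case-2": ["headache", "ibuprofen"],
    "case-3": ["cough", "influenza"],
}
index = build_entity_index(records)
candidates = candidate_texts(index, ["cough", "rash"])
```

Looking up entities in the index avoids scanning every text, which is the efficiency gain the paragraph refers to.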
In some embodiments, the similarity parameter is positively correlated with the inverse document frequency of each search term, is positively correlated with the occurrence frequency of each search term in each history text, and is negatively correlated with the length parameter of each history text. The length parameter is negatively correlated with the average length of all the historical texts and positively correlated with the length of the corresponding historical text.
For example, the similarity parameter score(D|Q) may be determined by the following formula, which takes the standard BM25 form:
$$\mathrm{score}(D \mid Q) = \sum_{i=1}^{I} \mathrm{IDF}(q_i) \cdot \frac{f(q_i, D)\,(k_1 + 1)}{f(q_i, D) + k_1 \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)}$$
D is a historical text, Q is the set of I search terms q_i, f(q_i, D) is the number of occurrences of the search term q_i in the historical text D, |D| is the number of words in the historical text D, and avgdl is the average text length over the entire historical text library.
k_1 is a parameter set as required that represents term frequency saturation; it adjusts how much more a search term counts when it occurs many times than when it occurs few times. For example, k_1 ∈ [1.2, 2.0].
b is a parameter set as required that represents field-length normalization; it controls how strongly text length penalizes the weight. For example, b = 0.75.
IDF(q_i) is the inverse document frequency of the search term q_i:
$$\mathrm{IDF}(q_i) = \log \frac{N - n(q_i) + a}{n(q_i) + a}$$
n(q_i) is the number of historical texts containing q_i, N is the total number of historical texts, and a is a parameter set as required, greater than 0 and less than 1. A large IDF means that the word appears in fewer texts and is therefore more specific.
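A minimal sketch of this scoring follows. It assumes the standard BM25 form, with IDF(q) = ln((N - n(q) + a) / (n(q) + a)) and length parameter 1 - b + b·|D|/avgdl; the function name, the default values k1 = 1.2, b = 0.75, a = 0.5, and the sample texts are assumptions rather than values fixed by the disclosure.

```python
import math
from typing import Dict, List

def bm25_scores(
    query_terms: List[str],
    docs: Dict[str, List[str]],
    k1: float = 1.2,
    b: float = 0.75,
    a: float = 0.5,
) -> Dict[str, float]:
    """Score every tokenized historical text against the search terms."""
    n_docs = len(docs)
    avgdl = sum(len(tokens) for tokens in docs.values()) / n_docs
    # Inverse document frequency per search term (assumed form, see lead-in).
    idf = {}
    for q in set(query_terms):
        n_q = sum(1 for tokens in docs.values() if q in tokens)
        idf[q] = math.log((n_docs - n_q + a) / (n_q + a))
    scores = {}
    for doc_id, tokens in docs.items():
        # Length parameter: grows with |D| and shrinks as avgdl grows.
        length_param = 1 - b + b * len(tokens) / avgdl
        score = 0.0
        for q in query_terms:
            f = tokens.count(q)
            score += idf[q] * f * (k1 + 1) / (f + k1 * length_param)
        scores[doc_id] = score
    return scores

# Hypothetical entity lists for three historical texts.
docs = {
    "d1": ["fever", "cough", "fever"],
    "d2": ["headache"],
    "d3": ["cough", "rash", "itch", "rash"],
}
scores = bm25_scores(["fever", "rash"], docs)
```

In this example d1 and d3 each contain one query term twice, but d3 is longer than average and so receives the lower score, which shows the length penalty at work. With this IDF form, a term that occurs in more than half of the texts gets a negative weight.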
In some embodiments, the relevant texts recalled by the BM25 algorithm on the ES search engine indicate only how similar the main complaint text is to the historical texts; the quality of the historical texts is not evaluated. Therefore, steps 120 and 130 rank the historical texts comprehensively by combining text quality assessment with the degree of similarity.
In step 120, the feature information of each historical text is input into the machine learning model, and the quality evaluation parameters of each historical text are output.
In some embodiments, each relevant text of the complaint text is determined in each historical text according to the similarity parameter. For example, according to the size of the similarity parameter, sorting each historical text; and determining each relevant text according to the comparison result of the sorting result and the sorting threshold.
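The selection of relevant texts can be sketched as a simple top-K cut over the sorted similarity parameters. The function name and the reading of the sorting threshold as a maximum rank are assumptions.

```python
from typing import Dict, List

def select_relevant(similarity: Dict[str, float], rank_threshold: int) -> List[str]:
    """Keep the ids of the rank_threshold texts with the largest similarity."""
    ranked = sorted(similarity, key=lambda text_id: similarity[text_id], reverse=True)
    return ranked[:rank_threshold]

# Hypothetical similarity parameters for three historical texts.
relevant = select_relevant({"case-1": 0.2, "case-2": 0.9, "case-3": 0.5}, rank_threshold=2)
```

Only these relevant texts need to be passed to the quality model, which limits the number of model evaluations per query.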
In some embodiments, the feature information of each relevant text is input into a machine learning model, and the quality assessment parameters are output. For example, the word vectors of the feature information and the index information of each historical text are input into the machine learning model, and the quality evaluation parameters of each historical text are output.
In some embodiments, the characteristic information includes at least one of case content characteristic information and doctor portrait characteristics. For example, the case content characteristic information includes at least one of the number of turns in the doctor's conversation with the patient and the number of medical terms contained in the historical case text; the doctor portrait characteristics include at least one of the doctor's job title, hospital grade, positive rating, and number of consultations received.
In some embodiments, the text quality assessment problem may be modeled as a binary classification problem that distinguishes high-quality text from low-quality text. For example, the multi-dimensional features configured for text quality classification may include medical record content features, doctor portrait features, and the like.
For example, the medical record content features may include the number of doctor-patient dialogue rounds and the amount of information contained in the doctor's responses. The amount of information may be the amount of medical knowledge contained in the responses, such as the number of medical domain entities (diseases, symptoms, drugs, or treatments).
For example, the doctor portrait features may include the doctor's job title, the grade of the hospital where the doctor is employed, the positive rating, the number of consultations received, and the like.
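The feature dimensions listed above can be gathered into a numeric vector along the following lines. The `CaseRecord` structure, the ordinal encodings of job title and hospital grade, and the rule of counting one dialogue round per doctor reply are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Set, Tuple

# Hypothetical ordinal encodings; actual job titles and hospital grades would differ.
TITLE_RANK = {"resident": 1, "attending": 2, "associate chief": 3, "chief": 4}
HOSPITAL_GRADE = {"primary": 1, "secondary": 2, "tertiary": 3}

@dataclass
class CaseRecord:
    dialogue: List[Tuple[str, str]]  # (speaker, utterance); speaker is "doctor" or "patient"
    doctor_title: str
    hospital_grade: str
    positive_rating: float           # share of positive reviews, 0.0 to 1.0
    consultations: int               # number of consultations the doctor has received

def quality_features(case: CaseRecord, medical_entities: Set[str]) -> List[float]:
    """Build the feature vector: content features first, doctor portrait features after."""
    doctor_replies = [text for speaker, text in case.dialogue if speaker == "doctor"]
    # Assumption: one dialogue round is counted per doctor reply.
    rounds = len(doctor_replies)
    # Information amount: medical entity mentions across the doctor's replies.
    entity_count = sum(
        1
        for reply in doctor_replies
        for token in re.findall(r"[a-z]+", reply.lower())
        if token in medical_entities
    )
    return [
        float(rounds),
        float(entity_count),
        float(TITLE_RANK[case.doctor_title]),
        float(HOSPITAL_GRADE[case.hospital_grade]),
        case.positive_rating,
        float(case.consultations),
    ]

case = CaseRecord(
    dialogue=[
        ("patient", "I have had a fever since yesterday."),
        ("doctor", "Do you also have a cough or headache?"),
        ("patient", "Yes, a cough."),
        ("doctor", "It may be influenza; ibuprofen can ease the fever."),
    ],
    doctor_title="chief",
    hospital_grade="tertiary",
    positive_rating=0.98,
    consultations=1200,
)
features = quality_features(case, {"fever", "cough", "headache", "influenza", "ibuprofen"})
```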
In step 130, according to the similarity parameter and the quality evaluation parameter, a similar text of the main complaint text is recommended in each history text.
In some embodiments, similar texts of the main complaint texts are recommended in the relevant texts according to the similarity parameter and the quality evaluation parameter.
In some embodiments, the recommended text is determined among the similar texts according to the filtering result of the user on the similar texts. And the screening result is determined according to at least one item of patient information, doctor portrait and inquiry process information corresponding to each similar text.
In some embodiments, after the set of similar texts is recommended, the user may filter the similar texts based on multi-dimensional screening logic. For example, the screening logic may include a patient user dimension, a doctor dimension, an inquiry process dimension, and the like.
For example, the patient user dimension may include information about the user, such as patient gender, age, and medical history; the doctor dimension may include information about the doctor, such as job title, hospital grade, positive rating, and number of consultations received; the inquiry process dimension may include information about the inquiry, such as whether it was a follow-up visit and whether the condition was cured.
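The screening logic can be sketched as optional criteria applied to the recommended set, one per dimension. The `SimilarCase` fields and the `filter_cases` signature are hypothetical; a criterion left as `None` is simply not applied.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class SimilarCase:
    case_id: str
    patient_gender: str   # patient user dimension
    patient_age: int      # patient user dimension
    doctor_title: str     # doctor dimension
    cured: bool           # inquiry process dimension

def filter_cases(
    cases: List[SimilarCase],
    gender: Optional[str] = None,
    age_range: Optional[Tuple[int, int]] = None,
    doctor_title: Optional[str] = None,
    cured: Optional[bool] = None,
) -> List[SimilarCase]:
    """Keep the cases that satisfy every criterion the user actually set."""
    kept = []
    for case in cases:
        if gender is not None and case.patient_gender != gender:
            continue
        if age_range is not None and not (age_range[0] <= case.patient_age <= age_range[1]):
            continue
        if doctor_title is not None and case.doctor_title != doctor_title:
            continue
        if cured is not None and case.cured != cured:
            continue
        kept.append(case)
    return kept

# Hypothetical recommended set, already in recommendation order.
similar = [
    SimilarCase("case-1", "female", 34, "chief", True),
    SimilarCase("case-2", "male", 61, "attending", False),
    SimilarCase("case-3", "female", 29, "attending", True),
]
selected = filter_cases(similar, gender="female", age_range=(25, 40), cured=True)
```

Because the filter preserves input order, the ranking produced by the recommendation step carries over to the screened result.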
In some embodiments, the machine learning model may be configured using a deep learning algorithm based on TextCNN (Text Convolutional Neural Networks). For example, the embodiment in FIG. 2 can be used to perform multi-dimensional feature fusion learning on unstructured medical record text data and structured doctor portrait data.
Fig. 2 shows a schematic diagram of some embodiments of a recommendation method of similar text of the present disclosure.
As shown in fig. 2, the machine learning model includes a convolutional layer, a pooling layer, a Dropout layer, and a fully-connected layer. The word vector of the historical case text, the word vector based on the index of the medical field entity and the doctor portrait information can be used as training data, and the machine learning model can be trained by using the quality evaluation parameter of the historical case text as a labeling result.
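To make the layer sequence concrete, the sketch below runs a TextCNN-style forward pass in plain Python with hand-set weights. It is not a trained model and not the architecture of FIG. 2 in detail: the single kernel, the point at which the structured doctor portrait features are concatenated (after pooling), the sigmoid output, and all numeric values are assumptions.

```python
import math
from typing import List, Sequence

Vector = List[float]

def conv1d_relu(seq: Sequence[Vector], kernel: Sequence[Vector], bias: float) -> List[float]:
    """Slide one kernel spanning len(kernel) words over the word-vector sequence."""
    height = len(kernel)
    outputs = []
    for start in range(len(seq) - height + 1):
        total = bias
        for offset in range(height):
            total += sum(w * x for w, x in zip(kernel[offset], seq[start + offset]))
        outputs.append(max(0.0, total))  # ReLU activation
    return outputs

def quality_score(
    word_vectors: Sequence[Vector],
    structured: Sequence[float],
    kernels: Sequence[Sequence[Vector]],
    biases: Sequence[float],
    fc_weights: Sequence[float],
    fc_bias: float,
) -> float:
    """Convolution, max pooling, feature fusion, and a fully connected output."""
    # Max-over-time pooling keeps the strongest response of each kernel.
    pooled = [max(conv1d_relu(word_vectors, k, b), default=0.0) for k, b in zip(kernels, biases)]
    # Dropout is active only during training, so inference skips it.
    fused = pooled + list(structured)
    logit = fc_bias + sum(w * x for w, x in zip(fc_weights, fused))
    return 1.0 / (1.0 + math.exp(-logit))  # probability of "high quality"

# Hypothetical two-dimensional word vectors for a three-word text.
word_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
kernel = [[1.0, 0.0], [0.0, 1.0]]  # one kernel covering two consecutive words
conv_out = conv1d_relu(word_vectors, kernel, 0.0)
score = quality_score(word_vectors, [1.0], [kernel], [0.0], [0.5, -1.0], 0.0)
```

In practice the weights would be learned from texts labeled with quality evaluation parameters, typically with a deep learning library rather than hand-written loops.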
In the online evaluation process, word vectors of historical case texts, indexed word vectors, doctor portrait information and the like can be input into a trained machine learning model, and quality evaluation parameters are output.
In the text recommendation process, the quality evaluation parameters output by the machine learning model are used as quality evaluation results, and similar text recommendation is performed by combining with the similarity degree determination results.
The essence of similar text recommendation lies in that meaningful high-quality text is recommended for a user by combining the similarity between texts and the text quality. For example, a comprehensive evaluation parameter may be calculated in combination with the similarity degree and quality evaluation result of the relevant history text of the ES recall; similar text is returned based on this composite evaluation parameter.
In some embodiments, the composite evaluation parameter $R$ may be determined as a weighted average of the similarity parameter $S_1$ and the quality evaluation parameter $S_2$:
$$R = \alpha S_1 + (1 - \alpha) S_2$$
where $\alpha$ is a weight greater than 0 and less than 1, set as required, that adjusts the relative influence of the similarity parameter and the quality evaluation parameter on the composite evaluation result.
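As a small illustration of the weighted average above, the following sketch computes the composite parameter and ranks candidates by it. The function names and the tuple layout are hypothetical, and the sketch assumes the similarity parameter has already been scaled to a range comparable with the quality parameter (for example 0 to 1), which the disclosure does not state.

```python
def composite_score(similarity, quality, alpha=0.5):
    """R = alpha * S1 + (1 - alpha) * S2, with 0 < alpha < 1."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    return alpha * similarity + (1.0 - alpha) * quality

def rank_by_composite(candidates, alpha=0.5):
    """Sort (text_id, S1, S2) tuples by descending composite score."""
    scored = [(text_id, composite_score(s1, s2, alpha))
              for text_id, s1, s2 in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

A larger alpha favors texts that closely match the chief complaint; a smaller alpha favors texts that the quality model rates highly.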
FIG. 3 shows schematic diagrams of further embodiments of recommendation methods of similar text of the present disclosure.
As shown in fig. 3, according to the technical solution of the present disclosure, a similar medical record text recommendation system based on multidimensional feature sorting can be implemented in an online inquiry platform system. The system mainly comprises a similar medical record recall module, a multidimensional characteristic sorting module, a user filtering module and the like.
In some embodiments, the similar medical record recall module recalls similar medical record texts mainly through a search engine framework, applying an entity recognition algorithm based on jieba Chinese word segmentation together with the BM25 algorithm. For example, the search engine framework may be the ES (Elasticsearch) framework. ES is a distributed, open-source search engine framework that supports custom, component-based configuration of modules such as word segmentation, index building, and multidimensional feature ranking.
In some embodiments, recalling similar medical record texts with the search framework mainly involves building an index of the historical medical record texts, querying the index based on the patient's chief complaint text, and recalling the similar medical record texts.
In the index-building step, a linked list is built that associates each entity word with the medical record texts containing it and with its term frequency in each of those texts. A search can thus be converted into an index query: the texts linked to the entities found in the query are looked up directly, which improves document retrieval efficiency.
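The index structure described above is essentially an inverted index. A minimal in-memory sketch is shown below; the function names are hypothetical, and in the described system this structure would be maintained by the search engine rather than by hand-written code.

```python
from collections import Counter, defaultdict

def build_index(entity_lists):
    """Build an inverted index from entity word to {record_id: term frequency}.

    entity_lists -- dict mapping record_id to the list of entity words
                    recognized in that historical record
    """
    index = defaultdict(dict)
    for record_id, entities in entity_lists.items():
        for entity, freq in Counter(entities).items():
            index[entity][record_id] = freq
    return dict(index)

def lookup(index, query_entities):
    """Return the ids of all records linked to at least one query entity."""
    hits = set()
    for entity in query_entities:
        hits.update(index.get(entity, {}))
    return hits
```

Only the records returned by `lookup` need to be scored, which is why the index query is cheaper than scanning every historical text.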
In the entity recognition step, medical field entities are recognized mainly by using a medical entity dictionary compiled offline by doctors, together with jieba word segmentation and its custom-dictionary feature, so that various types of medical field entities such as diseases, symptoms, medicines, and examinations are identified. The linked list associating entities, term frequencies, and texts can be built through ES by configuring documents.
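Dictionary-driven entity recognition of this kind can be sketched with forward maximum matching, shown below as a simplified stand-in for a full Chinese word segmenter loaded with a custom dictionary. The function name, the sample dictionary, and its type labels are hypothetical.

```python
def recognize_entities(text, entity_dict):
    """Scan `text` left to right, taking the longest dictionary entry at each
    position (forward maximum matching).

    entity_dict -- dict mapping an entity word to its type,
                   e.g. "disease", "symptom", "medicine", "examination"
    Returns a list of (entity, entity_type) in order of appearance.
    """
    max_len = max((len(word) for word in entity_dict), default=0)
    found = []
    pos = 0
    while pos < len(text):
        match = None
        # Try the longest possible candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + length]
            if candidate in entity_dict:
                match = candidate
                break
        if match is not None:
            found.append((match, entity_dict[match]))
            pos += len(match)
        else:
            pos += 1  # no entity starts here; move on one character
    return found
```

For example, with a dictionary containing both "头痛" (symptom) and "偏头痛" (disease), the text "偏头痛三天" yields the longer entry "偏头痛" rather than the shorter one.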
In the similar medical record recall step, the BM25 algorithm can be applied under the ES framework to calculate the similarity parameter between the patient's chief complaint text and each historical medical record text; this parameter is one of the bases for recall.
In some embodiments, the similar texts finally recalled may be determined by combining the similarity parameter and the quality evaluation parameter. Therefore, the similar medical record text recall may also include a medical record quality assessment step.
For example, in the medical record quality assessment step, the task is modeled as a binary classification problem: whether a medical record text is of high quality or low quality. The multidimensional features that influence medical record quality include medical record content features and doctor portrait features.
The medical record content features include the number of doctor-patient conversation rounds and the amount of information contained in the doctor's answers, that is, the amount of medical knowledge they contain, such as the number of terms for diseases, symptoms, medicines, or treatments. The doctor portrait features include the doctor's professional title, the grade of the hospital where the doctor works, the favorable-review rate, the number of consultations handled, and the like.
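The two content features named above can be computed with a few lines of code. The sketch below is illustrative only: the dialogue layout, the function name, and the definition of a "round" as one patient turn followed by a doctor turn are assumptions, not details taken from the disclosure.

```python
def content_features(dialogue, medical_terms):
    """Compute simple content features for one historical record.

    dialogue      -- list of (speaker, utterance), speaker is "patient" or "doctor"
    medical_terms -- iterable of medical term strings to count
    """
    rounds = 0
    previous = None
    for speaker, _ in dialogue:
        # Assumption: a round is a doctor turn that directly follows a patient turn.
        if speaker == "doctor" and previous == "patient":
            rounds += 1
        previous = speaker
    doctor_text = " ".join(u for s, u in dialogue if s == "doctor")
    # Counts non-overlapping occurrences of each term in the doctor's answers;
    # terms that contain one another would be counted more than once.
    term_count = sum(doctor_text.count(term) for term in medical_terms)
    return {"rounds": rounds, "term_count": term_count}
```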
In order to extract medical record content features better and more comprehensively, multidimensional feature fusion learning can be carried out on unstructured medical record text data and structured doctor portrait data based on a TextCNN deep learning algorithm.
The essence of similar medical record recommendation is to combine the similarity between user questions with the quality of the doctor's answering process, so that meaningful, high-quality medical records are recommended to the user. A composite evaluation parameter is calculated from the similarity of the historical medical records recalled by ES together with the medical record quality evaluation, and the records are finally sorted and returned based on this composite evaluation parameter.
In the index query step, an index query is performed according to the patient's chief complaint text. The medical field entities in the chief complaint are first identified through word segmentation analysis; the historical texts corresponding to those entities are then looked up in the index information; finally, the similarity parameter of each medical record text is calculated by the BM25 algorithm, and the texts are sorted and returned.
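The scoring part of the query step can be sketched in plain Python as follows. This is not the search engine's implementation: the function name is hypothetical, the values k1 = 1.2 and b = 0.75 are commonly used defaults chosen here as assumptions, and the IDF term uses a commonly used non-negative variant.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each document against the query terms with BM25.

    docs -- dict mapping record_id to its list of entity tokens
    Returns a dict mapping record_id to a non-negative score.
    """
    if not docs:
        return {}
    n_docs = len(docs)
    # Fall back to 1.0 if every document is empty, to avoid dividing by zero.
    avg_len = (sum(len(tokens) for tokens in docs.values()) / n_docs) or 1.0
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))  # number of documents containing each term
    scores = {}
    for record_id, tokens in docs.items():
        tf = Counter(tokens)
        # Length parameter: grows with this document's length,
        # shrinks as the average length grows.
        length_norm = 1.0 - b + b * len(tokens) / avg_len
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1.0 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            score += idf * tf[term] * (k1 + 1.0) / (tf[term] + k1 * length_norm)
        scores[record_id] = score
    return scores
```

Sorting the returned dictionary by value in descending order gives the recall order for the chief complaint.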
In the user filtering step for similar medical records, the set of recommended similar texts is provided to the user so that the user can filter it based on multidimensional screening logic. The screening logic covers the patient (user) dimension, the doctor dimension, the inquiry process dimension, and the like.
The user dimension includes patient gender, age, medical history, etc.; the doctor dimension includes the doctor's professional title, hospital grade, favorable-review rate, number of consultations handled, etc.; the inquiry process dimension includes whether there was a follow-up visit, whether the patient was cured, etc.
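Multidimensional screening of this kind amounts to applying a set of predicates to the metadata of each recommended record. A minimal sketch follows; the function name and all field names (such as `patient_gender` or `favorable_rate`) are hypothetical.

```python
def filter_records(records, criteria):
    """Return the records whose fields satisfy every predicate in `criteria`.

    records  -- list of dicts describing recommended similar records
    criteria -- dict mapping a field name to a predicate (callable returning bool)
    """
    kept = []
    for record in records:
        # A record that lacks a filtered field is treated as not matching.
        if all(field in record and test(record[field])
               for field, test in criteria.items()):
            kept.append(record)
    return kept
```

For instance, passing `{"patient_gender": lambda g: g == "F", "cured": lambda c: c}` keeps only records of female patients whose inquiry ended in a cure.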
In the embodiment, a system for recommending high-quality, desensitized and similar historical case history texts according to the chief complaint texts, user portrait information and the like input by the user is established based on the online inquiry platform. The system can also provide multi-dimensional filtering and screening functions for users.
The recommendation of similar medical record texts has important significance. On one hand, the high-quality knowledge resources of the inquiry process accumulated by the platform are utilized to provide reference for the patient to see a doctor; and on the other hand, the user experience is improved. The user can directly select the doctor corresponding to the recommended medical record to perform on-line inquiry consultation, and therefore income is increased for the platform.
Fig. 4 illustrates a block diagram of some embodiments of recommendation devices of similar text of the present disclosure.
As shown in fig. 4, the recommendation apparatus 4 for similar text includes a similarity determination unit 41, a quality evaluation unit 42, and a recommendation unit 43.
The similarity determination unit 41 determines similarity parameters between the user's complaint text and each history text.
In some embodiments, the similarity determining unit 41 determines the similarity parameter according to the index information of each history text and the search term in the main complaint text.
In some embodiments, the similarity determining unit 41 determines a search term according to the entities included in the obtained main complaint text, and determines the similarity parameter according to the search term and the index information, where the index information is established according to the entities included in each history text.
In some embodiments, the similarity parameter is positively correlated with the inverse document frequency of each search term, positively correlated with the number of occurrences of each search term in each history text, and negatively correlated with the length parameter of each history text; the length parameter is negatively correlated with the average length of all history texts and positively correlated with the length of the corresponding history text.
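These monotonic relationships are consistent with the standard Okapi BM25 scoring function, written out below for reference. The symbols $k_1$, $b$, $K_d$, and $\mathrm{avgdl}$ are the conventional BM25 notation and are assumptions here; this passage does not confirm that the disclosure uses exactly this form.

```latex
\mathrm{score}(Q, d) = \sum_{i=1}^{n} \mathrm{IDF}(q_i)\,
  \frac{f(q_i, d)\,(k_1 + 1)}{f(q_i, d) + k_1 K_d},
\qquad
K_d = 1 - b + b\,\frac{|d|}{\mathrm{avgdl}}
```

Here $q_i$ is the $i$-th search term of the query $Q$, $f(q_i, d)$ is the number of occurrences of $q_i$ in history text $d$, $|d|$ is the length of $d$, $\mathrm{avgdl}$ is the average length of all history texts, and $k_1 > 0$ and $0 \le b \le 1$ are tuning constants. The quantity $K_d$ plays the role of the length parameter: it increases with $|d|$, decreases as $\mathrm{avgdl}$ increases, and a larger $K_d$ lowers the score.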
The quality evaluation unit 42 inputs the feature information of each history text into the machine learning model, and outputs the quality evaluation parameter of each history text.
In some embodiments, the quality evaluation unit 42 determines each relevant text of the main complaint text in each historical text according to the similarity parameter, inputs the feature information of each relevant text into the machine learning model, and outputs the quality evaluation parameter.
In some embodiments, the quality evaluation unit 42 sorts the historical texts in descending order of the similarity parameter, and determines the historical texts ranked above the threshold as the relevant texts.
In some embodiments, the quality evaluation unit 42 inputs the word vectors of the feature information and the index information of each historical text into the machine learning model, and outputs the quality evaluation parameter of each historical text.
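The selection of relevant texts by rank can be sketched as follows. The function name is hypothetical, and the sketch assumes the threshold is a rank cutoff K, so that the K texts with the highest similarity parameter are kept.

```python
def select_relevant(similarities, rank_threshold):
    """Return the ids of the `rank_threshold` texts with the highest similarity.

    similarities -- dict mapping text_id to its similarity parameter
    """
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    return [text_id for text_id, _ in ranked[:rank_threshold]]
```

Restricting the quality model to these texts avoids running it over every historical text.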
The recommending unit 43 recommends similar texts of the main complaint texts in each history text according to the similarity parameter and the quality evaluation parameter.
In some embodiments, the recommending unit 43 recommends similar texts of the main complaint texts in each relevant text according to the similarity parameter and the quality evaluation parameter.
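The cooperation of the three units can be sketched as a small class that chains them. Everything here is illustrative: the class name, its parameters, and the use of caller-supplied scoring functions are assumptions, and the composite score is taken to be the weighted average of the similarity and quality parameters.

```python
class SimilarTextRecommender:
    """Chains similarity determination, quality evaluation, and recommendation."""

    def __init__(self, similarity_fn, quality_fn, alpha=0.5, top_k=10):
        self.similarity_fn = similarity_fn  # (complaint, text) -> similarity
        self.quality_fn = quality_fn        # (text) -> quality in [0, 1]
        self.alpha = alpha                  # weight of similarity, 0 < alpha < 1
        self.top_k = top_k                  # rank cutoff for relevant texts

    def recommend(self, complaint, history, limit=3):
        """history -- dict mapping text_id to historical text."""
        # Role of the similarity determination unit: score every historical text.
        sims = {tid: self.similarity_fn(complaint, text)
                for tid, text in history.items()}
        # Role of the quality evaluation unit: evaluate only the relevant texts.
        relevant = sorted(sims, key=sims.get, reverse=True)[:self.top_k]
        quality = {tid: self.quality_fn(history[tid]) for tid in relevant}
        # Role of the recommending unit: combine both parameters and rank.
        combined = [(tid, self.alpha * sims[tid] + (1.0 - self.alpha) * quality[tid])
                    for tid in relevant]
        combined.sort(key=lambda pair: pair[1], reverse=True)
        return combined[:limit]
```

Passing the scoring functions in as arguments keeps the sketch independent of any particular search engine or model.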
In some embodiments, the complaint text is patient complaint text, the historical text is historical case text, and the similar text is similar case text to the patient complaint text.
In some embodiments, the characteristic information includes at least one of case content characteristic information and doctor portrait characteristics. The case content characteristic information includes at least one of the number of turns of the doctor's conversation with the patient and the number of medical terms included in the historical case text. The doctor portrait characteristics include at least one of the doctor's job title, hospital grade, favorable-review rate, and number of consultations handled.
In some embodiments, the recommending unit 43 determines the recommended text in each similar text according to the filtering result of the user on each similar text, and the filtering result is determined according to at least one of the patient information, the doctor portrait and the inquiry process information corresponding to each similar text.
In some embodiments, the search terms and the index information are determined according to the jieba Chinese word segmentation method, and the similarity parameter is determined according to the Elasticsearch (ES) framework and the binary-independence-model-based BM25 method.
In some embodiments, the entity is a medical domain entity including at least one of a disease class entity, a symptom class entity, a pharmaceutical class entity, a medical examination class entity.
Fig. 5 illustrates a block diagram of some embodiments of an electronic device of the present disclosure.
As shown in fig. 5, the electronic apparatus 5 of this embodiment includes: a memory 51 and a processor 52 coupled to the memory 51, the processor 52 being configured to execute a method of recommending similar texts or a method of recommending similar case texts in any one of the embodiments of the present disclosure based on instructions stored in the memory 51.
The memory 51 may include, for example, a system memory, a fixed nonvolatile storage medium, and the like. The system memory stores, for example, an operating system, an application program, a Boot Loader, a database, and other programs.
Fig. 6 shows a block diagram of further embodiments of the electronic device of the present disclosure.
As shown in fig. 6, the electronic apparatus 6 of this embodiment includes: a memory 610 and a processor 620 coupled to the memory 610, the processor 620 being configured to execute the method for recommending similar texts or the method for recommending similar case texts in any of the above embodiments based on the instructions stored in the memory 610.
The memory 610 may include, for example, system memory, fixed non-volatile storage media, and the like. The system memory stores, for example, an operating system, an application program, a Boot Loader, and other programs.
The electronic device 6 may also include an input/output interface 630, a network interface 640, a storage interface 650, and the like. These interfaces 630, 640, 650, the memory 610, and the processor 620 may be connected, for example, through a bus 660. The input/output interface 630 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, a touch screen, a microphone, and a speaker. The network interface 640 provides a connection interface for various networking devices. The storage interface 650 provides a connection interface for external storage devices such as an SD card and a USB flash drive.
As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media having computer-usable program code embodied therein, including but not limited to disk storage, CD-ROM, optical storage, and the like.
So far, a recommendation method of similar texts, a recommendation apparatus of similar texts, a recommendation method of similar case texts, a recommendation apparatus of similar case texts, an electronic device, and a nonvolatile computer-readable storage medium according to the present disclosure have been described in detail. Some details that are well known in the art have not been described in order to avoid obscuring the concepts of the present disclosure. It will be fully apparent to those skilled in the art from the foregoing description how to practice the presently disclosed embodiments.
The method and system of the present disclosure may be implemented in a number of ways. For example, the methods and systems of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
Although some specific embodiments of the present disclosure have been described in detail by way of example, it should be understood by those skilled in the art that the foregoing examples are for purposes of illustration only and are not intended to limit the scope of the present disclosure. It will be appreciated by those skilled in the art that modifications may be made to the above embodiments without departing from the scope and spirit of the present disclosure. The scope of the present disclosure is defined by the appended claims.

Claims (17)

1. A method for recommending similar texts, comprising:
determining similarity parameters between the main complaint texts of the users and the historical texts;
inputting the characteristic information of each historical text into a machine learning model, and outputting the quality evaluation parameters of each historical text;
and recommending similar texts of the main complaint texts in the historical texts according to the similarity parameters and the quality evaluation parameters.
2. The recommendation method of claim 1, wherein the determining similarity parameters between the user's complaint text and the respective historical texts comprises:
and determining the similarity parameter according to the search terms in the main complaint text and the index information of each historical text.
3. The recommendation method according to claim 2, wherein the determining the similarity parameter according to the search term in the main complaint text and the index information of each historical text comprises:
determining the search terms according to the entities contained in the obtained main complaint text;
and determining the similarity parameter according to the search terms and the index information, wherein the index information is established according to the entities contained in the historical texts.
4. The recommendation method according to claim 1, wherein the inputting feature information of each of the historical texts into a machine learning model and the outputting quality evaluation parameters of each of the historical texts comprises:
determining each relevant text of the main complaint text in each historical text according to the similarity parameter;
and inputting the characteristic information of each relevant text into the machine learning model, and outputting the quality evaluation parameters.
wherein the recommending similar texts of the main complaint texts in the historical texts comprises:
and recommending the similar texts of the main complaint texts in the relevant texts according to the similarity parameters and the quality evaluation parameters.
5. The recommendation method of claim 4, wherein the determining, according to the similarity parameter, each relevant text of the complaint text in each historical text comprises:
sequencing the historical texts according to the size of the similarity parameter;
and determining each relevant text according to the comparison result of the sorting result and the sorting threshold.
6. The recommendation method of claim 1, wherein,
the similarity parameter is positively correlated with the inverse document frequency of each search term, is positively correlated with the occurrence frequency of each search term in each historical text, is negatively correlated with the length parameter of each historical text,
the length parameter is negatively correlated with the average length of all the historical texts and positively correlated with the length of the corresponding historical text.
7. The recommendation method according to claim 2, wherein the inputting the characteristic information of each historical text into a machine learning model and outputting the quality evaluation parameter of each historical text comprises:
inputting the word vectors of the feature information and the index information of each historical text into the machine learning model, and outputting the quality evaluation parameters of each historical text.
8. The recommendation method according to any one of claims 1-7,
the chief complaint text is a patient chief complaint text, the historical text is a historical case text, and the similar text is a similar case text of the patient chief complaint text.
9. The recommendation method of claim 8, wherein,
the characteristic information comprises at least one item of case content characteristic information and doctor portrait characteristics,
the case content characteristic information includes at least one of the number of turns of the doctor's conversation with the patient and the number of medical terms contained in the historical case text,
the doctor portrait characteristics comprise at least one of the doctor's job title, hospital grade, favorable-review rate, and number of consultations handled.
10. The recommendation method of claim 8, further comprising:
and determining a recommended text in each similar text according to the screening result of the user on each similar text, wherein the screening result is determined according to at least one item of patient information, doctor portrait and inquiry process information corresponding to each similar text.
11. The recommendation method according to any one of claims 1-7,
the search terms and the index information are determined according to the jieba word segmentation method, and the similarity parameters are determined according to the Elasticsearch (ES) framework and the binary-independence-model-based BM25 method.
12. The recommendation method of claim 3, wherein,
the entity is at least one of a disease entity, a symptom entity, a medicine entity and a medical examination entity.
13. A recommendation method of similar case texts comprises the following steps:
determining similarity parameters between the patient complaint texts of the users and the historical case texts;
inputting the characteristic information of each historical case text into a machine learning model, and outputting the quality evaluation parameters of each historical case text;
and recommending the similar case texts of the patient complaint texts in the historical case texts according to the similarity parameters and the quality evaluation parameters.
14. A recommendation apparatus for similar texts, comprising:
the similarity determining unit is used for determining similarity parameters between the main complaint texts of the users and the historical texts;
the quality evaluation unit is used for inputting the characteristic information of each historical text into a machine learning model and outputting the quality evaluation parameters of each historical text;
and the recommending unit is used for recommending the similar texts of the main complaint texts in the historical texts according to the similarity parameters and the quality evaluation parameters.
15. A recommendation apparatus for similar case texts, comprising:
the similarity determining unit is used for determining similarity parameters between the patient chief complaint texts of the users and the historical case texts;
the quality evaluation unit is used for inputting the characteristic information of each historical case text into a machine learning model and outputting the quality evaluation parameters of each historical case text;
and the recommending unit is used for recommending the similar case texts of the patient main complaint texts in the historical case texts according to the similarity parameters and the quality evaluation parameters.
16. An electronic device, comprising:
a memory; and
a processor coupled to the memory, the processor configured to execute the method for recommending similar text of any of claims 1-12 or the method for recommending similar case text of claim 13 based on instructions stored in the memory.
17. A non-transitory computer-readable storage medium on which a computer program is stored, which when executed by a processor, implements the method of recommending similar text according to any one of claims 1 to 12, or the method of recommending similar case text according to claim 13.
CN202110130557.7A 2021-01-29 2021-01-29 Similar text recommendation method and device and electronic equipment Pending CN113779954A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110130557.7A CN113779954A (en) 2021-01-29 2021-01-29 Similar text recommendation method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110130557.7A CN113779954A (en) 2021-01-29 2021-01-29 Similar text recommendation method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN113779954A true CN113779954A (en) 2021-12-10

Family

ID=78835581

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110130557.7A Pending CN113779954A (en) 2021-01-29 2021-01-29 Similar text recommendation method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN113779954A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114842985A (en) * 2022-06-30 2022-08-02 北京超数时代科技有限公司 Virtual reality diagnosis and treatment system under meta-universe scene
CN114842985B (en) * 2022-06-30 2023-12-19 北京超数时代科技有限公司 Virtual reality diagnosis and treatment system under meta-universe scene
CN116562271A (en) * 2023-07-10 2023-08-08 之江实验室 Quality control method and device for electronic medical record, storage medium and electronic equipment
CN116562271B (en) * 2023-07-10 2023-10-10 之江实验室 Quality control method and device for electronic medical record, storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
CN110993081B (en) Doctor online recommendation method and system
CN111414393B (en) Semantic similar case retrieval method and equipment based on medical knowledge graph
CN112632385A (en) Course recommendation method and device, computer equipment and medium
Amir et al. Quantifying mental health from social media with neural user embeddings
WO2020147395A1 (en) Emotion-based text classification method and device, and computer apparatus
CN116994709B (en) Personalized diet and exercise recommendation method and system and electronic equipment
CN112395500A (en) Content data recommendation method and device, computer equipment and storage medium
Almagro et al. ICD-10 coding of Spanish electronic discharge summaries: An extreme classification problem
CN113779954A (en) Similar text recommendation method and device and electronic equipment
CN112016450A (en) Training method and device of machine learning model and electronic equipment
CN109829154B (en) Personality prediction method based on semantics, user equipment, storage medium and device
CN109472292A (en) A kind of sensibility classification method of image, storage medium and server
CN112017744A (en) Electronic case automatic generation method, device, equipment and storage medium
CN115050442B (en) Disease category data reporting method and device based on mining clustering algorithm and storage medium
Seth et al. A comparative overview of hybrid recommender systems: Review, challenges, and prospects
CN113241193B (en) Drug recommendation model training method, recommendation method, device, equipment and medium
CN113934937A (en) Intelligent content recommendation method and device, terminal and storage medium
CN109344232A (en) A kind of public feelings information search method and terminal device
GB2603318A (en) Workshop assistance system and workshop assistance method
US12019635B2 (en) Methods and systems for arranging and displaying guided recommendations via a graphical user interface based on biological extraction
WO2023178970A1 (en) Medical data processing method, apparatus and device, and storage medium
KR102437661B1 (en) A System and Method for Classifying diseases using Electronic Medical Record for Animal Hospitals
CN106446696A (en) Information processing method and electronic device
CN114693949A (en) Multi-modal evaluation object extraction method based on regional perception alignment network
Vala et al. Analytical review and study on emotion recognition strategies using multimodal signals

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination