CN117095836A - Intestinal tract preparation guiding method and system - Google Patents

Intestinal tract preparation guiding method and system

Info

Publication number
CN117095836A
Authority
CN
China
Prior art keywords
user
voice
current time
intestinal tract
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311338056.3A
Other languages
Chinese (zh)
Other versions
CN117095836B (en)
Inventor
徐静茹
乔洪图
凌文
王义芬
刘红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
People's Hospital Of Qingbaijiang District Chengdu
Original Assignee
People's Hospital Of Qingbaijiang District Chengdu
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by People's Hospital Of Qingbaijiang District Chengdu filed Critical People's Hospital Of Qingbaijiang District Chengdu
Priority to CN202311338056.3A priority Critical patent/CN117095836B/en
Publication of CN117095836A publication Critical patent/CN117095836A/en
Application granted granted Critical
Publication of CN117095836B publication Critical patent/CN117095836B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/26 - Speech to text systems
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/044 - Recurrent networks, e.g. Hopfield networks
    • G06N 3/0442 - Recurrent networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/0464 - Convolutional networks [CNN, ConvNet]
    • G - PHYSICS
    • G08 - SIGNALLING
    • G08B - SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B 21/00 - Alarms responsive to a single specified undesired or abnormal condition and not otherwise provided for
    • G08B 21/18 - Status alarms
    • G08B 21/24 - Reminder alarms, e.g. anti-loss alarms
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 80/00 - ICT specially adapted for facilitating communication between medical practitioners or patients, e.g. for collaborative diagnosis, therapy or health monitoring

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Public Health (AREA)
  • Primary Health Care (AREA)
  • Epidemiology (AREA)
  • Pathology (AREA)
  • Emergency Management (AREA)
  • Business, Economics & Management (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

The application relates to the technical field of recognition, and in particular to an intestinal tract preparation guiding method and system. The method obtains the user's voice, recognizes it to obtain a voice text, and determines a target inspection template according to the current time at which the voice was obtained and a preset inspection template. Whether the user's intestinal tract preparation work at the current time is qualified is then judged from the voice text and the target inspection template. If it is qualified, the preparation work at the current time is scored and the scoring result is output; if not, the user is reminded and the target inspection template is played. Intestinal tract preparation guidance based on voice recognition, voice interaction and user feedback is thereby realized: timely voice interaction and feedback produce a better guiding effect, deviations in the user's description of the preparation work are reduced, the user's intestinal tract preparation becomes more thorough, and the accuracy of colonoscopy results is further improved.

Description

Intestinal tract preparation guiding method and system
Technical Field
The application relates to the technical field of recognition, and in particular to an intestinal tract preparation guiding method and system.
Background
Intestinal tract preparation refers to cleaning the intestinal tract before colonoscopy to remove fecal matter, food residues and the like. Qualified bowel preparation provides a clear view for colonoscopy, improves the accuracy of colonoscopy results, and reduces the risk of infection during the procedure, while insufficient bowel preparation leads to a high colonoscopy missed-diagnosis rate. Patients therefore need bowel preparation guidance to improve the accuracy of colonoscopy results.
Existing intestinal tract preparation guidance mainly takes two forms: printed instruction booklets distributed offline, and teaching videos played and forms filled out online. Booklets distributed offline cannot provide guidance tailored to individual differences and cannot resolve the difficulties some patients encounter during intestinal tract preparation. With online teaching videos and forms, a patient can report his or her own preparation status in time by filling out the form and thereby obtain a personalized guidance scheme. However, some patients find it difficult to fill out the form independently, and even when they can, their descriptions of the intestinal tract preparation may deviate from reality, resulting in insufficient preparation. Existing guidance methods therefore have poor applicability and a poor guiding effect for some patients, leaving their intestinal tract preparation insufficient and reducing the accuracy of colonoscopy results.
Disclosure of Invention
In order to solve the problems in the prior art, the application provides an intestinal tract preparation guiding method and system that achieve a good intestinal tract preparation guiding effect.
In order to achieve the technical purpose, the technical scheme adopted by the embodiment of the application comprises the following steps:
in a first aspect, an embodiment of the present application provides a method for intestinal tract preparation guidance, including:
acquiring user voice and current time, wherein the current time is the time when the user voice is acquired;
recognizing the voice of the user to obtain a voice text;
determining a target inspection template according to the current time and a preset inspection template, wherein the inspection template comprises different time periods and intestinal tract preparation works corresponding to the time periods, and the target inspection template is a text description of the intestinal tract preparation works corresponding to the time periods where the current time is located;
judging whether the intestinal tract preparation work of the user at the current time is qualified or not according to the voice text and the target inspection template;
if yes, grading the intestinal tract preparation work of the user at the current time according to a preset grading rule, and outputting a grading result; if not, reminding the user that the intestinal tract preparation work at the current time is unqualified, and playing the target inspection template.
In addition, the intestinal tract preparation guiding method according to the above embodiment of the present application may further have the following additional technical features:
further, in the intestinal tract preparation guiding method according to the embodiment of the present application, recognizing the voice of the user, obtaining the voice text includes:
and inputting the voice of the user into the pre-trained voice recognition model, and outputting voice text.
Further, in one embodiment of the present application, inputting the user's speech into the pre-trained speech recognition model, outputting the speech text includes:
converting the user speech into a plurality of vector sequences;
extracting features of each vector sequence to obtain a plurality of feature maps;
extracting spatio-temporal features of each feature map, wherein the spatio-temporal features comprise a temporal context relation and a spatial context relation between the vector sequences;
extracting frequency features of each feature map;
and adding the spatio-temporal features of the feature maps and the frequency features of the feature maps, and outputting the voice text after passing through the full connection layer of the voice recognition model.
Further, in one embodiment of the present application, converting the user speech into a plurality of vector sequences includes:
user speech is converted into a plurality of vector sequences using framing, short-time fourier transform, and mel-scale filter banks.
Further, in one embodiment of the present application, extracting spatio-temporal features of each feature map includes:
spatially encoding the feature map to obtain spatially encoded features;
inputting the spatial coding features into a time attention unit of a voice recognition model, and outputting a time context relation;
and performing spatial decoding on the time context relationship to obtain the spatial context relationship.
Further, in one embodiment of the present application, extracting the frequency features of each feature map includes:
inputting the feature maps into a classification network of the voice recognition model, and outputting correlation information, wherein the correlation information is the relation among the feature maps, and two fully connected layers are arranged on a residual branch of the classification network;
extracting the characteristics of two full-connection layers of the classification network to obtain a first frequency characteristic and a second frequency characteristic;
and jointly scaling the results of the discrete cosine transform of the first frequency characteristic, the second frequency characteristic and the correlation information to output the frequency characteristic.
Further, in one embodiment of the present application, determining whether the user's bowel preparation at the current time is acceptable based on the phonetic text and the target inspection template comprises:
calculating the similarity between the voice text and the target inspection template and a first matching number, wherein the first matching number is the number of matching of the voice text and the keywords of the target inspection template;
if the similarity is smaller than a preset first threshold, the intestinal tract preparation work of the user at the current time is unqualified;
if the similarity is larger than a first threshold value and the first matching number is larger than a preset second threshold value, the intestinal tract preparation work of the user at the current time is qualified;
if the similarity is larger than a first threshold value and the first matching number is smaller than a second threshold value, describing and completing the voice text to obtain a complete voice text, and calculating a second matching number, wherein the second matching number is the number of matching of the complete voice text and the keywords of the target inspection template;
if the second matching number is larger than a second threshold value, the intestinal tract preparation work of the user at the current time is qualified; if the second matching number is smaller than the second threshold value, the intestinal tract preparation work of the user at the current time is not qualified.
Further, in one embodiment of the present application, performing description completion on the voice text to obtain a complete voice text includes:
extracting keywords of the voice text;
searching nodes matched with keywords of the voice text in a pre-constructed knowledge graph to obtain a plurality of complement words;
and inserting each complement word into the keyword to obtain the complete voice text.
Further, in one embodiment of the present application, the intestinal tract preparation guiding method further includes:
and displaying the grading result.
In a second aspect, an embodiment of the present application provides an intestinal tract preparation guidance system, including:
the acquisition module is used for acquiring user voice and current time, wherein the current time is the time when the user voice is acquired;
the recognition module is used for recognizing the voice of the user to obtain a voice text;
the determining module is used for determining a target inspection template according to the current time and a preset inspection template, wherein the inspection template comprises different time periods and intestinal tract preparation works corresponding to the time periods, and the target inspection template is a text description of the intestinal tract preparation works corresponding to the time period where the current time is located;
the judging module is used for judging whether the intestinal tract preparation work of the user at the current time is qualified or not according to the voice text and the target inspection template;
the evaluation module is used for scoring the intestinal tract preparation work of the user at the current time according to a preset scoring rule if yes, and outputting a scoring result; if not, reminding the user that the intestinal tract preparation work at the current time is unqualified, and playing the target inspection template.
The beneficial effects of the application are as follows: the user's voice is obtained and recognized to produce a voice text; a target inspection template is determined according to the current time at which the user's voice was obtained and a preset inspection template; whether the user's intestinal tract preparation work at the current time is qualified is judged according to the voice text and the target inspection template; if so, the preparation work at the current time is scored and the scoring result is output, and if not, the user is reminded and the target inspection template is played. Intestinal tract preparation guidance based on voice recognition, voice interaction and user feedback is thereby realized; timely voice interaction and feedback yield a better guiding effect, reduce deviations in the user's description of the preparation work, make the user's intestinal tract preparation more thorough, and further improve the accuracy of colonoscopy results.
Drawings
Fig. 1 is a schematic flow chart of an embodiment of an intestinal tract preparation guiding method according to the present application;
FIG. 2 is a schematic diagram of a speech recognition model of an embodiment of an intestinal tract preparation guidance method according to the present application;
fig. 3 is a schematic structural diagram of SENet according to an embodiment of the intestinal tract preparation guiding method provided by the present application;
fig. 4 is a schematic structural diagram of an embodiment of an intestinal tract preparation guidance system according to the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Referring to fig. 1, a method for guiding intestinal tract preparation according to an embodiment of the present application includes:
s101, acquiring user voice and current time.
The current time is the time when the user voice is acquired.
Optionally, in some embodiments, a front-end application is employed to obtain the user's voice and the current time. It will be appreciated that the user's voice is a spoken description of the intestinal tract preparation the user has just completed, such as "I have just taken two morpholine tablets". When the front-end application receives the user's voice through the audio acquisition module of the device, the current system time is recorded synchronously as the current time.
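As a minimal sketch of this acquisition step, assuming a plain Python handler stands in for the WeChat applet back end (the function name and interface are illustrative, not taken from the patent):

```python
from datetime import datetime

def acquire_user_voice(audio_bytes: bytes) -> tuple[bytes, datetime]:
    # Pair the received recording with the system time of receipt,
    # which step S101 treats as the "current time".
    current_time = datetime.now()
    return audio_bytes, current_time
```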
Optionally, in some embodiments, the application of the front end is a WeChat applet.
S102, recognizing the voice of the user to obtain a voice text.
Specifically, in the embodiment of the application, feature extraction is performed on the acquired user voice so as to better understand the signals it contains. Optionally, the feature extraction methods include short-time energy, zero-crossing rate, mel-frequency cepstral coefficients, and the like. The extracted features are then input into a pre-trained, deep-learning-based recognition model, which maps them to the corresponding voice text and outputs it.
Optionally, in some embodiments, the user voice is preprocessed before recognizing the user voice to obtain the voice text, including operations of removing noise, adjusting volume, reducing echo, and the like, so as to reduce noise interference and improve user voice quality and accuracy of voice recognition.
Optionally, in some embodiments, after recognizing the user's voice to obtain the voice text, post-processing is performed on the recognized voice text, including operations such as error correction, grammar correction, punctuation addition, and the like, so as to improve the readability and accuracy of the voice text.
Optionally, in some embodiments, step S102 includes: and inputting the voice of the user into the pre-trained voice recognition model, and outputting voice text.
Further, referring to fig. 2, the speech recognition model of the embodiment of the present application comprises a backbone network MobileNet, a spatio-temporal feature extraction subunit, and a frequency feature extraction subunit. Inputting the user's voice into the pre-trained speech recognition model and outputting the voice text specifically includes the following steps:
step S1021, converting the user voice into a plurality of vector sequences.
Each vector sequence X is represented as X = [x_1, x_2, ..., x_N], comprising N vectors, each of dimension 1×C. It will be appreciated that the user's speech is made up of a plurality of such vector sequences. The minimum input unit of the speech recognition model in the embodiment of the application is a single vector sequence, so the vector sequences are input into the speech recognition model one after another.
It will be appreciated that, as shown in fig. 2, since each vector sequence X comprises N vectors and each vector has dimension 1×C, the input dimension of a single vector sequence fed to the speech recognition model is N×C.
Optionally, in some embodiments, the user speech is converted into a plurality of vector sequences using framing, the short-time Fourier transform, and mel-scale filter banks.
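As an illustrative sketch of this conversion, librosa's mel-spectrogram can stand in for the chain of framing, short-time Fourier transform and mel-scale filtering; the sampling rate, frame parameters and sequence length N below are placeholder choices, not values from the patent:

```python
import librosa
import numpy as np

def speech_to_vector_sequences(path: str, n_mels: int = 80,
                               seq_len: int = 100) -> list[np.ndarray]:
    y, sr = librosa.load(path, sr=16000)               # mono waveform
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=400,
                                         hop_length=160, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel).T               # (frames, C): one 1xC vector per frame
    # Split the frame stream into vector sequences X of N vectors each.
    return [log_mel[i:i + seq_len]
            for i in range(0, len(log_mel) - seq_len + 1, seq_len)]
```

Each returned array is one vector sequence X of shape N×C, matching the model's single-sequence input unit described above.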
And step S1022, extracting the characteristics of each vector sequence to obtain a plurality of characteristic diagrams.
Specifically, with continued reference to fig. 2, each vector sequence is input into a backbone network MobileNet of the speech recognition model, and feature extraction is performed on each vector sequence, so as to obtain a plurality of feature maps.
Optionally, the non-learning layer is removed from the backbone network MobileNet.
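One possible reading of this stage, assuming that "removing the non-learning layer" means keeping only MobileNet's convolutional feature extractor and dropping its pooling and classifier head; the replication of the N×C input to three channels is our own adaptation, since torchvision's MobileNet expects image-like input:

```python
import torch
import torchvision

# Convolutional feature extractor only; the pooling/classifier head is dropped.
backbone = torchvision.models.mobilenet_v2(weights=None).features

def extract_feature_maps(x: torch.Tensor) -> torch.Tensor:
    # x: (batch, N, C) vector sequences, viewed as a single-channel "image"
    # and replicated to the 3 input channels MobileNet expects.
    x = x.unsqueeze(1).repeat(1, 3, 1, 1)
    return backbone(x)  # (batch, 1280, N/32, C/32) feature maps
```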
Step S1023, extracting the space-time characteristics of each characteristic diagram.
Wherein the spatio-temporal features include temporal and spatial context between respective vector sequences.
It can be understood that, because the plurality of vector sequences are input into the voice recognition model sequentially, extracting the temporal and spatial context relations between the vector sequences makes the associations between them more accurate, thereby improving the accuracy of voice recognition.
Optionally, the feature map output by MobileNet is input into the spatio-temporal feature extraction subunit of the speech recognition model, and the spatio-temporal features of each feature map are extracted. With continued reference to fig. 2, the spatio-temporal feature extraction subunit comprises a spatial encoder, a temporal attention unit, and a spatial decoder, wherein the spatial encoder and the spatial decoder extract the spatial context relations between the vector sequences, and the temporal attention unit extracts the temporal context relations between the vector sequences. Extracting the spatio-temporal features of each feature map with the subunit specifically includes the following steps (an illustrative sketch follows the list):
1) Spatially encoding the feature map to obtain spatially encoded features;
2) Inputting the spatial coding features into a time attention unit of a voice recognition model, and outputting a time context relation;
3) And performing spatial decoding on the time context relationship to obtain the spatial context relationship.
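The patent names these three components but not their internal structure, so the sketch below makes concrete assumptions: 1×1 convolutions serve as the spatial encoder and decoder, and multi-head self-attention over the time-frequency positions serves as the temporal attention unit; all sizes are illustrative.

```python
import torch
import torch.nn as nn

class SpatioTemporalUnit(nn.Module):
    def __init__(self, channels: int = 1280, hidden: int = 256):
        super().__init__()
        self.encoder = nn.Conv2d(channels, hidden, kernel_size=1)  # spatial encoding
        self.attn = nn.MultiheadAttention(hidden, num_heads=4,
                                          batch_first=True)        # temporal attention unit
        self.decoder = nn.Conv2d(hidden, channels, kernel_size=1)  # spatial decoding

    def forward(self, fmap: torch.Tensor) -> torch.Tensor:
        z = self.encoder(fmap)                            # spatial coding features
        b, c, t, f = z.shape
        seq = z.permute(0, 2, 3, 1).reshape(b, t * f, c)  # one token per position
        ctx, _ = self.attn(seq, seq, seq)                 # temporal context relation
        ctx = ctx.reshape(b, t, f, c).permute(0, 3, 1, 2)
        return self.decoder(ctx)                          # spatial context relation
```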
And step S1024, extracting the frequency characteristics of each characteristic diagram.
Specifically, in the frequency feature extraction subunit of the speech recognition model, the frequency features of each feature map are extracted by using the discrete cosine transform to combine features taken both without and with nonlinearity.
It can be understood that some users speak in dialect, which existing voice recognition methods find difficult to recognize accurately, so the recognition result is inaccurate and the intestinal tract preparation guidance suffers. The embodiment of the application combines the spatio-temporal features extracted in step S1023 with the frequency features extracted in step S1024, comprehensively considering the spatial context relation, the temporal context relation, and the relation between dialect and Mandarin, thereby realizing accurate dialect recognition, improving the accuracy of the voice recognition result, and achieving a better intestinal tract preparation guiding effect.
Optionally, with continued reference to fig. 2, in some embodiments, step S1024 specifically includes:
1) Inputting the feature map into the classification network of the voice recognition model, and outputting the correlation information.
The correlation information is the relation among the feature maps; two fully connected layers are arranged on the residual branch of the classification network, which is one of the networks in the frequency feature extraction subunit.
Optionally, in some embodiments, the classification network is SENet. It will be appreciated that SENet is essentially the extraction of the correlation between channels, i.e. the relationship between the feature maps to which the vector sequence corresponds.
2) Extracting the features of the two fully connected layers of the classification network to obtain a first frequency feature and a second frequency feature.
Specifically, the embodiment of the application extracts the features of the two fully connected layers of the classification network as the first frequency feature and the second frequency feature. As shown in fig. 3, for SENet there is a ReLU layer (nonlinear activation layer) between the two fully connected layers, so the difference between the first and second frequency features is whether nonlinearity has been applied. Referring to fig. 3, SENet comprises a global average pooling layer, two fully connected layers, a ReLU layer, and a Sigmoid activation function layer. X is a vector sequence input to the backbone network; each vector sequence X comprises N vectors, each of dimension 1×C, i.e., the dimension of X is N×C. Feature extraction by the backbone network turns the vector sequence X into a feature map U of size h×w×c, where h×w is the spatial size of the feature map and c is the number of channels. Global average pooling of the feature map U yields a vector z of dimension c. The first fully connected layer then outputs z1 of dimension c/r, where r is the dimension-reduction parameter of the fully connected layer; reducing the dimension from c to c/r reduces the amount of computation in the ReLU layer. After activation by the ReLU layer, the dimension is raised back to c through the second fully connected layer, giving z2. The output z2 of the second fully connected layer is processed by the Sigmoid activation function and used for scaling, yielding the channel-weight vector s.
3) Jointly scaling the results of applying the discrete cosine transform to the first frequency feature, the second frequency feature and the correlation information, and outputting the frequency feature.
It will be appreciated that nonlinearity is introduced so that the speech recognition model of the embodiment of the application can learn better, but some original properties of the data are changed when passing through the nonlinear layer, so that subtle differences between dialect and standard Mandarin are erased. Therefore, the embodiment of the application jointly scales the results obtained after applying the discrete cosine transform to the first frequency feature, the second frequency feature and the correlation information, so that the learning ability of the model is improved without erasing the differences between dialect and standard Mandarin, thereby improving recognition accuracy when the user's voice is dialect.
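The following sketch puts steps 1)-3) together under stated assumptions: the classification network is a standard squeeze-and-excitation block, the first and second frequency features are tapped before and after its ReLU, the first feature is zero-padded up to the channel dimension, and a plain sum stands in for the unspecified joint-scaling rule.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def dct_ii(x: torch.Tensor) -> torch.Tensor:
    # Orthonormal DCT-II along the last dimension via an explicit cosine basis.
    n = x.shape[-1]
    idx = torch.arange(n, dtype=x.dtype, device=x.device)
    basis = torch.cos(torch.pi * (idx[:, None] + 0.5) * idx[None, :] / n)
    basis[:, 0] *= 0.5 ** 0.5
    return (x @ basis) * (2.0 / n) ** 0.5

class FrequencyFeatureUnit(nn.Module):
    def __init__(self, channels: int = 1280, reduction: int = 16):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // reduction)  # dimension reduction
        self.fc2 = nn.Linear(channels // reduction, channels)  # dimension restoration

    def forward(self, fmap: torch.Tensor) -> torch.Tensor:
        z = fmap.mean(dim=(2, 3))   # squeeze: global average pooling -> (batch, C)
        f1 = self.fc1(z)            # first frequency feature (before the ReLU)
        f2 = self.fc2(F.relu(f1))   # second frequency feature (after the ReLU)
        corr = torch.sigmoid(f2)    # correlation information between feature maps
        f1_full = F.pad(f1, (0, f2.shape[1] - f1.shape[1]))
        # Joint scaling of the three DCT results; a sum is our placeholder rule.
        return dct_ii(f1_full) + dct_ii(f2) + dct_ii(corr)
```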
Step S1025, adding the space-time features of the feature map and the frequency features of the feature map, and outputting the voice text after passing through the full connection layer of the voice recognition model.
The voice text obtained through recognition is text in standard Mandarin.
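Step S1025 can then be a small fusion head. Here the spatio-temporal map is pooled to match the frequency vector before the addition; the vocabulary size and the single linear layer mapping to token logits are assumptions, since the patent only states that the sum passes through a full connection layer.

```python
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    def __init__(self, dim: int = 1280, vocab_size: int = 5000):
        super().__init__()
        self.fc = nn.Linear(dim, vocab_size)  # the model's full connection layer

    def forward(self, st_map: torch.Tensor, freq_vec: torch.Tensor) -> torch.Tensor:
        fused = st_map.mean(dim=(2, 3)) + freq_vec  # add spatio-temporal and frequency features
        return self.fc(fused)                       # logits over output text tokens
```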
S103, determining a target inspection template according to the current time and a preset inspection template.
The inspection template comprises different time periods and the intestinal tract preparation work corresponding to each time period, and the target inspection template is a text description of the intestinal tract preparation work corresponding to the time period in which the current time falls.
It will be appreciated that the bowel preparation is divided into strict time periods, for example: at 8 o'clock in the morning, take two morpholine tablets; at 9 o'clock, dissolve one box of compound polyethylene glycol electrolyte powder (three bags per box) in 800 milliliters of warm boiled water (1 jin 6 liang) and drink it in one sitting; at 9:15, dissolve another box of compound polyethylene glycol electrolyte powder in 1000 milliliters of warm boiled water (2 jin) and drink it in four portions. Therefore, according to the time period in which the current time falls, the embodiment of the application can determine, in combination with the inspection template, the intestinal tract preparation work required of the user at the current time; that is, the intestinal tract preparation work corresponding to the time period containing the current time is taken from the inspection template as the target inspection template. Illustratively, if the current time is 8 o'clock, the corresponding target inspection template is to take two morpholine tablets.
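A minimal sketch of this lookup, echoing the example schedule above; the times, task texts and period boundaries are illustrative only:

```python
from datetime import datetime, time

# Inspection template: (period start, period end, bowel-preparation task text).
INSPECTION_TEMPLATE = [
    (time(8, 0), time(9, 0), "Take two morpholine tablets."),
    (time(9, 0), time(9, 15), "Dissolve one box of compound polyethylene glycol "
                              "electrolyte powder in 800 ml of warm boiled water "
                              "and drink it in one sitting."),
    (time(9, 15), time(12, 0), "Dissolve another box in 1000 ml of warm boiled "
                               "water and drink it in four portions."),
]

def target_template(now: datetime) -> str | None:
    t = now.time()
    for start, end, text in INSPECTION_TEMPLATE:
        if start <= t < end:  # the period containing the current time
            return text
    return None               # no preparation work due in this period
```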
S104, judging whether the intestinal tract preparation work of the user at the current time is qualified or not according to the voice text and the target inspection template.
It will be appreciated that by calculating the similarity between the target inspection template determined in step S103 and the speech text obtained based on the speech recognition of the user, it is possible to determine whether the user' S bowel preparation at the present time is acceptable.
Optionally, in some embodiments, step S104 specifically includes:
step S1041, calculating the similarity and the first matching number between the voice text and the target inspection template.
The first matching number is the number of matching of the voice text and the keywords of the target inspection template.
Optionally, in some embodiments, the similarity is a cosine similarity.
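Step S1041 might be sketched as follows, assuming whitespace tokenization and bag-of-words vectors for the cosine similarity (a deployed system for Chinese text would use a proper tokenizer and keyword list):

```python
import math
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

def first_matching_number(voice_text: str, template_keywords: list[str]) -> int:
    # Count how many template keywords appear in the recognized text.
    return sum(1 for kw in template_keywords if kw in voice_text)
```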
In step S1042, if the similarity is smaller than a preset first threshold, the user's intestinal tract preparation work at the current time is judged unqualified.
In step S1043, if the similarity is greater than the first threshold and the first matching number is greater than a preset second threshold, the user's intestinal tract preparation work at the current time is judged qualified.
Step S1044, if the similarity is greater than the first threshold and the first matching number is less than the second threshold, performing description completion on the voice text to obtain a complete voice text, and calculating the second matching number.
The second matching number is the number of matching of the complete voice text and the keywords of the target inspection template.
Optionally, in some embodiments, the description complement is performed on the voice text, and the specific steps for obtaining the complete voice text are as follows:
1) Extracting keywords of the voice text;
2) Searching nodes matched with keywords of the voice text in a pre-constructed knowledge graph to obtain a plurality of complement words;
3) Inserting each complement word beside its keyword to obtain the complete voice text.
Optionally, a predetermined extraction model, such as Bi-LSTM+CRF, is used to extract a series of keywords k_i from the voice text; nodes matching each k_i are then searched for in the knowledge graph, and the nodes of the maximum connected subgraph are used as description complement words. It will be appreciated that these complement words are related to the keywords and conform to the context of intestinal tract preparation.
Optionally, the complement words are stored in the form of one keyword mapped to a plurality of complement words.
Alternatively, in some embodiments, each complement word is inserted to the right of the keyword, resulting in complete phonetic text.
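A toy sketch of this completion procedure, assuming the knowledge graph has already been flattened into the "one keyword mapped to a plurality of complement words" form just described, and reducing keyword extraction to dictionary lookup (standing in for the Bi-LSTM+CRF extractor and the maximum-connected-subgraph search):

```python
# Illustrative keyword -> complement-word mapping derived from the knowledge graph.
KNOWLEDGE_GRAPH = {
    "powder": ["dissolved", "800 ml", "warm water"],
}

def complete_text(voice_text: str) -> str:
    out = []
    for word in voice_text.split():
        out.append(word)
        # Insert each complement word to the right of its keyword.
        out.extend(KNOWLEDGE_GRAPH.get(word, []))
    return " ".join(out)
```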
Step S1045, if the second matching number is greater than the second threshold, the intestinal tract preparation work of the user at the current time is qualified; if the second matching number is smaller than the second threshold value, the intestinal tract preparation work of the user at the current time is not qualified.
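Steps S1042-S1045 then reduce to straight-line logic, reusing cosine_similarity and first_matching_number from the similarity sketch above and complete_text from the completion sketch; the threshold values are placeholders:

```python
def is_preparation_qualified(voice_text: str, template: str,
                             keywords: list[str],
                             t1: float = 0.6, t2: int = 3) -> bool:
    sim = cosine_similarity(voice_text, template)
    if sim < t1:
        return False                                    # S1042: description too dissimilar
    if first_matching_number(voice_text, keywords) > t2:
        return True                                     # S1043: enough keywords matched
    full = complete_text(voice_text)                    # S1044: description completion
    return first_matching_number(full, keywords) > t2   # S1045: re-check after completion
```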
S105, if yes, grading the intestinal tract preparation work of the user at the current time according to a preset grading rule, and outputting a grading result; if not, reminding the user that the intestinal tract preparation work at the current time is unqualified, and playing the target inspection template.
Optionally, in some embodiments, when the intestinal tract preparation work at the current time is unqualified, feedback information of "step unqualified" is generated for the user and stored in the user data table for later reference; when the work is qualified, the scoring result is stored in the user data table for later reference.
Optionally, in some embodiments, the bowel preparation guidance method further comprises:
and displaying the grading result.
It can be appreciated that by displaying the scoring results, the user can more intuitively receive feedback of the intestinal tract preparation work at the current time.
Further, in some embodiments, the scoring result is broadcast by voice at the same time as it is displayed.
Further, in some embodiments, after step S105, the complete intestinal tract preparation video is played in response to the user clicking a preset button of the front-end application.
The existing intestinal tract preparation guiding methods have poor applicability and a poor guiding effect for some patients, so that their intestinal tract preparation is insufficient and the accuracy of colonoscopy results is reduced. The embodiment of the application therefore adopts the intestinal tract preparation guiding method of steps S101-S105: the user's voice is obtained and recognized to produce a voice text, a target inspection template is determined according to the current time at which the voice was obtained and a preset inspection template, and whether the user's intestinal tract preparation work at the current time is qualified is judged according to the voice text and the target inspection template; if so, the work is scored and the scoring result output, and if not, the user is reminded and the target inspection template is played. Intestinal tract preparation guidance based on voice recognition, voice interaction and user feedback is thereby realized; timely voice interaction and feedback yield a better guiding effect, reduce deviations in the user's description of the preparation work, make the user's intestinal tract preparation more thorough, and further improve the accuracy of colonoscopy results.
Next, an intestinal tract preparation guidance system according to an embodiment of the present application will be described with reference to the accompanying drawings.
Fig. 4 is a schematic diagram of an intestinal tract preparation guidance system according to an embodiment of the present application.
The system specifically comprises:
an obtaining module 401, configured to obtain a user voice and a current time, where the current time is a time when the user voice is obtained;
the recognition module 402 is configured to recognize a user's voice to obtain a voice text;
the determining module 403 is configured to determine a target inspection template according to the current time and a preset inspection template, where the inspection template includes different time periods and intestinal preparation works corresponding to the time periods, and the target inspection template is a text description of the intestinal preparation works corresponding to the time period where the current time is located;
a judging module 404, configured to judge whether the intestinal tract preparation work of the user at the current time is qualified according to the voice text and the target inspection template;
the judging module 405 is configured to score the intestinal tract preparation work of the user at the current time according to a preset scoring rule if yes, and output a scoring result; if not, reminding the user that the intestinal tract preparation work at the current time is unqualified, and playing the target inspection template.
It can be seen that the content in the above method embodiment is applicable to the system embodiment, and the functions specifically implemented by the system embodiment are the same as those of the method embodiment, and the beneficial effects achieved by the method embodiment are the same as those achieved by the method embodiment.
In some alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flowcharts of the present application are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed, and in which sub-operations described as part of a larger operation are performed independently.
Furthermore, while the application is described in the context of functional modules, it should be appreciated that, unless otherwise indicated, one or more of the functions and/or features may be integrated in a single physical device and/or software module or may be implemented in separate physical devices or software modules. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary to an understanding of the present application. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be apparent to those skilled in the art from consideration of their attributes, functions and internal relationships. Accordingly, one of ordinary skill in the art can implement the application as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative and are not intended to be limiting upon the scope of the application, which is to be defined in the appended claims and their full scope of equivalents.
It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable program execution system. For example, if implemented in hardware, as in another embodiment, may be implemented using any one or combination of the following techniques, as is well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application specific integrated circuits having suitable combinational logic gates, programmable Gate Arrays (PGAs), field Programmable Gate Arrays (FPGAs), and the like.
In describing embodiments of the present application, it should be understood that the terms "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "center", "top", "bottom", "inner", "outer", "inside", "outside", etc. indicate orientations or positional relationships based on the drawings are merely for convenience in describing the present application and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present application. Wherein "inside" refers to an interior or enclosed area or space. "peripheral" refers to the area surrounding a particular component or region.
In the description of embodiments of the present application, the terms "first", "second", "third" and "fourth" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined by "first", "second", "third" or "fourth" may explicitly or implicitly include one or more such features. In the description of the present application, unless otherwise indicated, "a plurality" means two or more. Furthermore, the terms "comprise" and "have", as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
In describing embodiments of the present application, it should be noted that the terms "mounted," "connected," and "assembled" are to be construed broadly, as they may be fixedly connected, detachably connected, or integrally connected, unless otherwise specifically indicated and defined; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the above terms in the present application will be understood in specific cases by those of ordinary skill in the art.
In the description of embodiments of the application, a particular feature, structure, material, or characteristic may be combined in any suitable manner in one or more embodiments or examples.
In describing embodiments of the present application, it will be understood that a range written as "A-B" is inclusive of both endpoints, i.e., it means a range greater than or equal to A and less than or equal to B.
In the description of embodiments of the present application, the term "and/or" is merely an association relationship describing an association object, meaning that three relationships may exist, e.g., a and/or B, may represent: a exists alone, A and B exist together, and B exists alone. In addition, the character "/" herein generally indicates that the front and rear associated objects are an "or" relationship.
Although embodiments of the present application have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the application, the scope of which is defined in the appended claims and their equivalents.

Claims (10)

1. A method of gut preparation guidance comprising:
acquiring user voice and current time, wherein the current time is the time when the user voice is acquired;
recognizing the user voice to obtain a voice text;
determining a target inspection template according to the current time and a preset inspection template, wherein the inspection template comprises different time periods and intestinal tract preparation works corresponding to the time periods, and the target inspection template is a text description of the intestinal tract preparation works corresponding to the time period of the current time;
judging whether the intestinal tract preparation work of the user at the current time is qualified or not according to the voice text and the target inspection template;
if yes, grading the intestinal tract preparation work of the user at the current time according to a preset grading rule, and outputting a grading result; if not, reminding the user that the intestinal tract preparation work at the current time is unqualified, and playing the target inspection template.
2. The method of claim 1, wherein said recognizing said user's voice to obtain voice text comprises:
and inputting the user voice into a pre-trained voice recognition model, and outputting the voice text.
3. The method of claim 2, wherein said inputting said user's voice into a pre-trained voice recognition model and outputting said voice text comprises:
converting the user speech into a plurality of vector sequences;
extracting the features of each vector sequence to obtain a plurality of feature maps;
extracting spatio-temporal features of each feature map, wherein the spatio-temporal features comprise a temporal context relation and a spatial context relation between the vector sequences;
extracting frequency features of each feature map;
and adding the spatio-temporal features of the feature maps and the frequency features of the feature maps, and outputting the voice text after passing through a full connection layer of the voice recognition model.
4. A method of gut preparation guidance according to claim 3, wherein said converting said user speech into a plurality of vector sequences comprises:
the user speech is converted into a plurality of the vector sequences using framing, the short-time Fourier transform and mel-scale filter banks.
5. A method of gut preparation guidance according to claim 3, wherein said extracting spatiotemporal features of each of said feature maps comprises:
performing spatial coding on the feature map to obtain spatial coding features;
inputting the spatial coding features into a time attention unit of the voice recognition model, and outputting the time context relation;
and performing spatial decoding on the time context relation to obtain the spatial context relation.
6. A method of gut preparation guidance according to claim 3, wherein said extracting frequency features of each of said feature maps comprises:
inputting the feature maps into a classification network of the voice recognition model, and outputting correlation information, wherein the correlation information is the relation among the feature maps, and two fully connected layers are arranged on a residual branch of the classification network;
extracting the characteristics of two full-connection layers of the classification network to obtain a first frequency characteristic and a second frequency characteristic;
and jointly scaling the first frequency characteristic, the second frequency characteristic and the result of discrete cosine transform of the correlation information, and outputting the frequency characteristic.
7. The method according to claim 1, wherein determining whether the user's bowel preparation at the current time is acceptable based on the voice text and the target inspection template comprises:
calculating the similarity between the voice text and the target inspection template and a first matching number, wherein the first matching number is the number of matching keywords between the voice text and the target inspection template;
if the similarity is smaller than a preset first threshold, the intestinal tract preparation work of the user at the current time is unqualified;
if the similarity is larger than the first threshold value and the first matching number is larger than a preset second threshold value, the intestinal tract preparation work of the user at the current time is qualified;
if the similarity is larger than the first threshold value and the first matching number is smaller than the second threshold value, describing and completing the voice text to obtain a complete voice text, and calculating a second matching number, wherein the second matching number is the number of matching of the complete voice text and the keywords of the target inspection template;
if the second matching number is larger than the second threshold value, the intestinal tract preparation work of the user at the current time is qualified; if the second matching number is smaller than the second threshold value, the intestinal tract preparation work of the user at the current time is not qualified.
8. The method of claim 7, wherein said completing said voice text to obtain complete voice text comprises:
extracting keywords of the voice text;
searching nodes matched with the keywords of the voice text in a pre-constructed knowledge graph to obtain a plurality of complement words;
and inserting each complement word into the keyword to obtain the complete voice text.
9. The method of claim 1, further comprising:
and displaying the scoring result.
10. An intestinal tract preparation guidance system, comprising:
the system comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring user voice and current time, and the current time is the time when the user voice is acquired;
the recognition module is used for recognizing the user voice to obtain a voice text;
the determining module is used for determining a target inspection template according to the current time and a preset inspection template, wherein the inspection template comprises different time periods and intestinal tract preparation work corresponding to each time period, and the target inspection template is a text description of the intestinal tract preparation work corresponding to the time period where the current time is located;
the judging module is used for judging whether the intestinal tract preparation work of the user at the current time is qualified or not according to the voice text and the target inspection template;
the evaluation module is used for scoring the intestinal tract preparation work of the user at the current time according to a preset scoring rule if yes, and outputting a scoring result; if not, reminding the user that the intestinal tract preparation work at the current time is unqualified, and playing the target inspection template.
CN202311338056.3A 2023-10-17 2023-10-17 Intestinal tract preparation guiding method and system Active CN117095836B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311338056.3A CN117095836B (en) 2023-10-17 2023-10-17 Intestinal tract preparation guiding method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311338056.3A CN117095836B (en) 2023-10-17 2023-10-17 Intestinal tract preparation guiding method and system

Publications (2)

Publication Number Publication Date
CN117095836A true CN117095836A (en) 2023-11-21
CN117095836B CN117095836B (en) 2023-12-22

Family

ID=88770263

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311338056.3A Active CN117095836B (en) 2023-10-17 2023-10-17 Intestinal tract preparation guiding method and system

Country Status (1)

Country Link
CN (1) CN117095836B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190279647A1 (en) * 2018-03-08 2019-09-12 Frontive, Inc. Methods and systems for speech signal processing
CN111834019A (en) * 2020-06-03 2020-10-27 四川大学华西医院 Standardized patient training method and device based on voice recognition technology
KR20220146873A (en) * 2021-04-26 2022-11-02 한국로봇융합연구원 Device for asking and diagnosing a patient for surgery, and a method for collecting patient status information through the device
CN115083557A (en) * 2022-05-18 2022-09-20 四川声达创新科技有限公司 Intelligent generation system and method for medical record
CN116013301A (en) * 2022-12-10 2023-04-25 云知声智能科技股份有限公司 Method, device, electronic equipment and medium for assisting inquiry dialogue and medical record writing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
杨丽清; 杨可婷; 林益平; 郑娇娇; 陈凤: "Smartphone multifunctional education platform for guiding bowel preparation before colonoscopy in young and middle-aged adults" (in Chinese), 现代肿瘤医学 (Journal of Modern Oncology), vol. 31, no. 9, pages 1022-1026 *

Also Published As

Publication number Publication date
CN117095836B (en) 2023-12-22

Similar Documents

Publication Publication Date Title
Fernandez-Lopez et al. Survey on automatic lip-reading in the era of deep learning
Nachmani et al. Fitting new speakers based on a short untranscribed sample
CN110600047B (en) Perceptual STARGAN-based multi-to-multi speaker conversion method
DE112017003563B4 (en) METHOD AND SYSTEM OF AUTOMATIC LANGUAGE RECOGNITION USING POSTERIORI TRUST POINT NUMBERS
CN106056207B (en) A kind of robot depth interaction and inference method and device based on natural language
Ramanarayanan et al. An investigation of articulatory setting using real-time magnetic resonance imaging
CN106782603B (en) Intelligent voice evaluation method and system
CN110675853B (en) Emotion voice synthesis method and device based on deep learning
JP6206960B2 (en) Pronunciation operation visualization device and pronunciation learning device
CN107818797A (en) Voice quality assessment method, apparatus and its system
Palaskar et al. Learned in speech recognition: Contextual acoustic word embeddings
CN108538283B (en) Method for converting lip image characteristics into voice coding parameters
WO2023035969A1 (en) Speech and image synchronization measurement method and apparatus, and model training method and apparatus
CN116312469B (en) Pathological voice restoration method based on voice conversion
CN111128211A (en) Voice separation method and device
CN108648745B (en) Method for converting lip image sequence into voice coding parameter
CN112735404A (en) Ironic detection method, system, terminal device and storage medium
JP6637332B2 (en) Spoken language corpus generation device and program thereof
Beckmann et al. Word-level embeddings for cross-task transfer learning in speech processing
CN110111778A (en) A kind of method of speech processing, device, storage medium and electronic equipment
CN117095836B (en) Intestinal tract preparation guiding method and system
Provost et al. Using emotional noise to uncloud audio-visual emotion perceptual evaluation
CN114613387A (en) Voice separation method and device, electronic equipment and storage medium
CN110310620B (en) Speech fusion method based on native pronunciation reinforcement learning
CN113096667A (en) Wrongly-written character recognition detection method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant