WO2020007027A1 - Online question and answer method, apparatus, computer device, and storage medium - Google Patents

Info

Publication number
WO2020007027A1
Authority
WO
WIPO (PCT)
Prior art keywords
word
ontology
escaped
words
current
Prior art date
Application number
PCT/CN2019/071524
Other languages
English (en)
French (fr)
Inventor
朱姬渊
孙行智
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020007027A1

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/237: Lexical tools
    • G06F40/247: Thesauruses; Synonyms
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Definitions

  • The present application relates to an online question and answer method, apparatus, computer device, and storage medium.
  • AI technology mainly learns the content of text entered by a user and returns the answer corresponding to that content. For example, in the medical scenario of online department recommendation, a machine learning system needs to recommend the appropriate department based on the text entered by the user.
  • The inventors realized that current machine learning captures only a single semantic reading of the text, so little content is learned and the answer output for that content is inaccurate.
  • An online question and answer method is provided.
  • An online question and answer method including:
  • An online question and answer device, including:
  • a text acquisition module configured to receive text input by a user and obtained by a terminal, and to clean the text;
  • a first word segmentation module configured to perform word segmentation on the cleaned text to obtain segmented words;
  • an escaped word library acquisition module configured to identify a current scene acquired by the terminal and load the escaped word library corresponding to the current scene;
  • a derivation module configured to derive the segmented words through the escaped word library to obtain ontology words of different dimensions; and
  • an output module configured to find an answer corresponding to the ontology words and output the answer.
  • A computer device includes a memory and one or more processors.
  • The memory stores computer-readable instructions.
  • When the computer-readable instructions are executed by the one or more processors, the one or more processors perform the following steps: receiving text input by a user and obtained by a terminal, and cleaning the text; performing word segmentation on the cleaned text to obtain segmented words; identifying the current scene acquired by the terminal and loading the escaped word library corresponding to the current scene; deriving the segmented words through the escaped word library to obtain ontology words of different dimensions; and finding an answer corresponding to the ontology words and outputting the answer.
  • One or more non-transitory computer-readable storage media store computer-readable instructions.
  • When the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps: receiving text input by a user and obtained by a terminal, and cleaning the text; performing word segmentation on the cleaned text to obtain segmented words; identifying the current scene acquired by the terminal and loading the escaped word library corresponding to the current scene; deriving the segmented words through the escaped word library to obtain ontology words of different dimensions; and finding an answer corresponding to the ontology words and outputting the answer.
  • FIG. 1 is an application scenario diagram of an online question and answer method according to one or more embodiments.
  • FIG. 2 is a schematic flowchart of an online question answering method according to one or more embodiments.
  • FIG. 3 is a schematic diagram of an escaped word library according to one or more embodiments.
  • FIG. 4 is a flowchart of step S208 in the embodiment shown in FIG. 2.
  • FIG. 5 is a schematic diagram of a prefix tree according to one or more embodiments.
  • FIG. 6 is a schematic diagram of a directed acyclic graph according to one or more embodiments.
  • FIG. 7 is a block diagram of an online question answering apparatus according to one or more embodiments.
  • FIG. 8 is a block diagram of a computer device according to one or more embodiments.
  • the online question and answer method provided in this application can be applied to the application environment shown in FIG. 1.
  • the terminal communicates with the server through the network.
  • The terminal obtains the text entered by the user and sends it to the server. The server receives the text, performs word segmentation on it to obtain segmented words, and identifies the current scene acquired by the terminal: the terminal obtains the current operation position and sends it to the server, and the server determines the current scene from that position. The server can then load the corresponding escaped word library and derive the segmented words through it.
  • the terminal may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices.
  • the server may be implemented by an independent server or a server cluster composed of multiple servers.
  • an online question and answer method is provided.
  • Taking the method as applied to the server in FIG. 1 as an example, it includes the following steps:
  • S202: Receive the text input by the user and obtained by the terminal, and clean the text.
  • The user inputs text through the client on the terminal; the terminal obtains the text and sends it to the server, which receives the text and cleans it.
  • The text may be typed by the user on the terminal's virtual keyboard, or entered by voice, in which case the terminal converts the speech into the corresponding text before sending it to the server. In either case the server receives the text obtained by the terminal and cleans it.
  • Cleaning the text means deleting the invalid text in it, such as greetings and modal particles.
  • The invalid text can be stored in advance. After the text is obtained, it is first matched against the stored invalid text, and any matches are deleted. This avoids interference from invalid text, which improves both the accuracy and the efficiency of word segmentation.
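  • A minimal sketch of the cleaning step, assuming whitespace-delimited English text for readability. The function name `clean_text` and the pre-stored set `INVALID_WORDS` are hypothetical and do not come from the application; real input would be Chinese text matched against the stored invalid phrases.

```python
import re

# Hypothetical pre-stored invalid text: greetings and modal particles.
INVALID_WORDS = {"hello", "hi", "um", "ah", "please"}

def clean_text(text, invalid_words=INVALID_WORDS):
    """Drop every token that matches the pre-stored invalid text."""
    # Lower-case and keep only word tokens, so punctuation is discarded too.
    tokens = re.findall(r"\w+", text.lower())
    return " ".join(t for t in tokens if t not in invalid_words)

print(clean_text("Hello, um, my stomach hurts"))
```

  • Because the invalid tokens are removed before segmentation, the later stages never see them.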
  • When the server receives text sent by multiple terminals, the text may first be placed in a receiving thread. The current processing volume of each server in the server cluster is then obtained, and the text from the different terminals in the receiving thread is distributed according to each server's current processing volume, which helps keep the servers stable.
  • The server may first obtain the user ID corresponding to each text, that is, the ID of the user of the terminal, and query whether the received user IDs are associated with one another, for example by determining from the user IDs whether the users are relatives.
  • The kinship relationship can be determined from the user IDs of associated users that the user has previously entered on the terminal.
  • When two user IDs are associated, the two texts are sent to the same server for processing. The texts sent by the two users' terminals may be similar, so the resulting segmented words may be the same and can be merged in the subsequent derivation, which reduces the processing load.
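  • The distribution rule above can be sketched as follows. This is an illustration under stated assumptions, not the application's implementation: `dispatch`, the server names, and the use of a simple counter as the "processing volume" are all hypothetical.

```python
def dispatch(texts, server_load, related_users):
    """Assign each (user_id, text) pair to a server.

    server_load:   dict of server name -> current processing volume (assumed to be a count)
    related_users: dict of user_id -> associated user_id (e.g. a relative)
    """
    load = dict(server_load)          # work on a copy
    assignment = {}
    for user_id, _text in texts:
        partner = related_users.get(user_id)
        if partner in assignment:
            # Associated users go to the same server so identical segmented words can be merged.
            server = assignment[partner]
        else:
            # Otherwise pick the server with the smallest current processing volume.
            server = min(load, key=load.get)
        assignment[user_id] = server
        load[server] += 1
    return assignment

texts = [("u1", "stomach pain"), ("u2", "fever"), ("u3", "stomach ache")]
print(dispatch(texts, {"s1": 0, "s2": 0}, {"u1": "u3", "u3": "u1"}))
```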
  • Word segmentation is performed on the cleaned text to obtain segmented words.
  • The text is generally obtained as a sentence, which is segmented into a plurality of words that each carry an independent meaning.
  • The current scene is the scene of the client in which the user is operating while using the terminal.
  • The scenes are preset when the client is designed and may include, for example, a department recommendation scene, a drug recommendation scene, and a doctor recommendation scene.
  • The terminal may determine the current scene from the position in the client where the user is currently operating, or from a flag.
  • The escaped word library is a thesaurus that converts segmented words into ontology words of different dimensions. It stores the escape relations between segmented words and ontology words of different dimensions.
  • For example, text such as "stomach pain" may be converted through the library to {part: abdomen, symptom: pain}.
  • The dimensions of the ontology words in the escaped word library can include: population, system division, parts and organs, symptoms, etiology, examination, medicine, and clinical treatment.
  • The ontology escape relationship is a mapping in the escaped word library through which a segmented word can be escaped directly to ontology words of different dimensions.
  • The approximate escape relationship is a mapping in the escaped word library that converts one segmented word into another segmented word. For details, refer to the escaped word library shown in FIG. 3.
  • Different scenes correspond to different escaped word libraries, because the same segmented word may correspond to different ontology words in different scenes. For example, in the department recommendation scene, fever may correspond to internal medicine, whereas in the drug recommendation scene, fever may correspond to a cold. Therefore, after obtaining the scene, the server first loads the escaped word library corresponding to that scene, laying the foundation for the subsequent derivation.
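  • One possible in-memory shape for per-scene escaped word libraries is sketched below. The scene keys, the dictionary layout, and the dimension labels attached to "fever" are assumptions made for illustration; the actual library is the one shown in FIG. 3.

```python
# Hypothetical contents. Each library holds two relations:
#   "ontology":    segmented word -> (dimension, ontology word)   (ontology escape)
#   "approximate": segmented word -> list of other segmented words (approximate escape)
ESCAPED_WORD_LIBRARIES = {
    "department_recommendation": {
        "ontology": {
            "stomach": ("part", "abdomen"),
            "pain": ("symptom", "pain"),
            "fever": ("department", "internal medicine"),  # dimension label is a guess
        },
        "approximate": {"tummy": ["stomach"], "ache": ["pain"]},
    },
    "drug_recommendation": {
        "ontology": {"fever": ("etiology", "cold")},       # dimension label is a guess
        "approximate": {},
    },
}

def load_escaped_word_library(scene):
    """Return the library for the given scene; raises KeyError for an unknown scene."""
    return ESCAPED_WORD_LIBRARIES[scene]

library = load_escaped_word_library("department_recommendation")
print(library["ontology"]["fever"])
```

  • Keeping one library per scene is what lets the same word, here "fever", resolve to different ontology words.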
  • The server matches the segmented words against the words in the escaped word library to obtain ontology words of different dimensions. That is, the server matches each segmented word against the different words in the escaped word library; when a match succeeds, the dimension corresponding to the matched word is obtained, and the dimension is output together with the word. For example, when "abdomen" is matched, its dimension is obtained as part, and "part: abdomen" is output; when "pain" is matched, its dimension is obtained as symptom, and "symptom: pain" is output.
  • The matching can use fuzzy matching, which improves the match success rate.
  • The answers are obtained by performing logical operations on the ontology words, for example by matching the ontology words against the corresponding question and answer knowledge base. The answers obtained can be sorted by matching rate so that the top-ranked answers are pushed first.
  • The matching rate may be the ratio of the number of ontology words that match the answer to the total number of ontology words.
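  • A small sketch of ranking answers by the matching rate defined above. The knowledge base layout (answer -> set of ontology words) and the function names are hypothetical; the application does not specify how the knowledge base is stored.

```python
def matching_rate(ontology_words, answer_words):
    """Number of ontology words that match the answer, divided by all ontology words."""
    if not ontology_words:
        return 0.0
    matched = sum(1 for w in ontology_words if w in answer_words)
    return matched / len(ontology_words)

def rank_answers(ontology_words, knowledge_base):
    """Return answers with a non-zero matching rate, highest rate first."""
    scored = [(matching_rate(ontology_words, words), answer)
              for answer, words in knowledge_base.items()]
    scored = [pair for pair in scored if pair[0] > 0]
    return [answer for _rate, answer in sorted(scored, key=lambda pair: -pair[0])]

# Hypothetical knowledge base for a department recommendation scene.
KNOWLEDGE_BASE = {
    "Orthopedics": {("part", "knee"), ("symptom", "pain")},
    "Gastroenterology": {("part", "abdomen"), ("symptom", "pain")},
}
query = [("part", "abdomen"), ("symptom", "pain")]
print(rank_answers(query, KNOWLEDGE_BASE))
```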
  • Word segmentation is performed first, and the escaped word library corresponding to the current scene is then loaded, so that ontology words of different dimensions corresponding to the segmented words can be derived through the escaped word library and the answers corresponding to those ontology words can be obtained. Expanding the segmented words through the escaped word library enriches the semantics of the user's text and extracts more user information, thereby improving the accuracy of the answers.
  • FIG. 4 is a flowchart of step S208 in the embodiment shown in FIG. 2.
  • Step S208, that is, deriving the segmented words through the escaped word library to obtain ontology words of different dimensions, can include:
  • the escaped word library can be specifically shown in FIG. 3 above.
  • The server first obtains the current segmented word and then checks whether the escaped word library contains an ontology word that matches it; fuzzy matching may be used. Optionally, to improve matching efficiency, matching may be performed concurrently in different threads, that is, the segmented words are distributed evenly across threads and matched in parallel.
  • When such an ontology word exists, the server outputs the ontology words of the different dimensions.
  • Synonyms are words that stand in an approximate relationship to the current segmented word.
  • When no ontology word is found, the server searches the escaped word library for a synonym corresponding to the current segmented word.
  • Specifically, the approximate relation database, which stores the approximate relationships between segmented words and synonyms, is searched first for a synonym corresponding to the current segmented word; the ontology word for that synonym is then retrieved from the escape relation database.
  • When a synonym is retrieved, the search continues in the escape relation database using the synonym, that is, the ontology word corresponding to the synonym is obtained, so that the ontology words of the different dimensions corresponding to the synonym can be output. When no synonym is retrieved, the server returns to the terminal a processing result indicating that nothing was retrieved.
  • In practice, the server first obtains the words produced by segmentation and then performs the ontology relation search.
  • If an ontology word corresponding to the word exists, the ontology word and its part of speech (that is, its dimension) are output.
  • If no ontology word exists, the synonym search continues, that is, the search proceeds through the approximate relationship. When no synonym is retrieved, no result is output.
  • When a synonym is retrieved, the synonym is fed back as the word for ontology relation retrieval, until an ontology word is output or no further synonyms remain.
  • Retrieval is performed first through the ontology relationship in the escaped word library, and only when that fails through the approximate relationship, which improves the accuracy of the retrieval result.
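  • The ontology-first, synonym-second retrieval can be sketched as a short loop. This uses exact dictionary lookup rather than the fuzzy matching mentioned above, and the names `derive`, `ontology`, and `approximate` are illustrative assumptions.

```python
from collections import deque

def derive(word, ontology, approximate):
    """Return (dimension, ontology word) for `word`, or None if nothing is retrieved.

    ontology:    segmented word -> (dimension, ontology word)
    approximate: segmented word -> list of synonyms
    """
    queue = deque([word])
    seen = {word}                     # guards against cyclic synonym chains
    while queue:
        current = queue.popleft()
        if current in ontology:       # ontology relation search succeeds
            return ontology[current]
        for synonym in approximate.get(current, []):
            if synonym not in seen:   # otherwise retry with each synonym
                seen.add(synonym)
                queue.append(synonym)
    return None                       # no ontology word and no synonyms left

ontology = {"stomach": ("part", "abdomen")}
approximate = {"tummy": ["belly"], "belly": ["stomach"]}
print(derive("tummy", ontology, approximate))
```

  • Tracking the words already tried means a pair of mutually synonymous words cannot cause an endless loop.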
  • acquiring the current scene may include: receiving a current operation position acquired by the terminal, and identifying the current scene according to the current operation position.
  • After word segmentation, the method may further include: selecting core keywords from the segmented words.
  • Deriving the segmented words through the escaped word library may then include: deriving the core keywords through the escaped word library to obtain the ontology words of different dimensions.
  • To acquire the scene, the terminal first obtains the operation position where the user is located by means of pre-embedded tracking points ("buried points") and sends the operation position to the server, so that the server obtains the current operation position and can determine the scene to which it belongs. The scenes are preset when the client is designed: at design time, a mapping between tracking points and scenes, that is, between operation positions and scenes, is established.
  • After the server obtains the current operation position, it looks up the scene in the preset mapping between operation positions and scenes, for example a department recommendation scene, a drug recommendation scene, or a doctor recommendation scene.
  • The server can then obtain the escaped word library corresponding to that scene. Because escaped word libraries differ between scenes, this avoids ontology word mismatches caused by the same word having different meanings in different scenes, which improves matching accuracy; and matching against only one selected escaped word library reduces the number of matches and improves matching efficiency.
  • Core keywords may be set according to the specific scene and selected manually. That is, after segmentation, not every segmented word is expanded through the escaped word library; instead, the segmented words are output, the core keywords are manually selected and marked, and only the core keywords are expanded through the escaped word library.
  • The core keywords may also be selected and marked automatically by the server.
  • For example, a core keyword database may be preset; after word segmentation, the segmented words are matched against the core keyword database, and the successfully matched words are marked as core keywords.
  • Because core keywords are first selected from the segmented words, not all segmented words need to be matched against the escaped word library, which improves matching efficiency.
  • In addition, the server can obtain the current scene returned by the terminal and select the corresponding escaped word library accordingly, which further narrows the matching range and improves matching efficiency; selecting the correct escaped word library also improves matching accuracy.
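  • Both steps of this embodiment reduce to simple lookups, sketched below. The tracking point identifiers, scene names, and keyword set are invented for illustration only.

```python
# Hypothetical mapping established at client design time: operation position -> scene.
POSITION_TO_SCENE = {
    "page_department_search": "department_recommendation",
    "page_drug_search": "drug_recommendation",
    "page_doctor_search": "doctor_recommendation",
}

# Hypothetical preset core keyword database.
CORE_KEYWORDS = {"stomach", "pain", "fever"}

def identify_scene(operation_position):
    """Return the scene for an operation position, or None if the position is unknown."""
    return POSITION_TO_SCENE.get(operation_position)

def select_core_keywords(segmented_words, core_keywords=CORE_KEYWORDS):
    """Keep only the segmented words found in the core keyword database, in order."""
    return [w for w in segmented_words if w in core_keywords]

print(identify_scene("page_drug_search"))
print(select_core_keywords(["my", "stomach", "has", "pain"]))
```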
  • Performing word segmentation on the obtained text to obtain segmented words may include: loading a preset dictionary and generating a prefix tree from the loaded dictionary; generating a directed acyclic graph from the prefix tree and the characters in the text, the directed acyclic graph representing the ways in which the characters in the text can form words; and using dynamic programming to find the maximum probability path in the directed acyclic graph and obtain the segmented words corresponding to that path.
  • The following processing may also be performed: selecting from the text the words that do not appear in the directed acyclic graph; obtaining a preset hidden Markov model; and segmenting the selected words using the hidden Markov model.
  • the server may first load a pre-stored dictionary.
  • the dictionary may be a dictionary downloaded from the Internet, or a dictionary generated according to various medical websites, or a user-defined dictionary.
  • The server generates a prefix tree from the dictionary.
  • The basic properties of the prefix tree are as follows: the root node contains no character, and every node other than the root contains exactly one character; concatenating the characters along the path from the root to a node gives the string corresponding to that node; and the children of any one node all contain different characters. Words that share leading characters share the corresponding nodes: for example, for "to" and "ten" in FIG. 5, the shared character "t" occupies only one node.
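  • These properties can be seen in a minimal prefix tree sketch. The class and function names are illustrative, and the two-word dictionary mirrors the "to" / "ten" example rather than any real preset dictionary.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # character -> TrieNode; keys are distinct by construction
        self.is_word = False  # True if the path from the root to here spells a dictionary word

def build_prefix_tree(words):
    root = TrieNode()         # the root holds no character
    for word in words:
        node = root
        for ch in word:
            # Reuse the existing child for a shared prefix, otherwise create one.
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root

def count_nodes(node):
    """Count this node plus all of its descendants."""
    return 1 + sum(count_nodes(child) for child in node.children.values())

root = build_prefix_tree(["to", "ten"])
print(count_nodes(root))
```

  • The shared "t" is stored once, so the tree has a root plus four character nodes.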
  • The server generates a directed acyclic graph from the prefix tree and the characters in the text.
  • The directed acyclic graph represents the ways in which the characters in the text can form words; see FIG. 6, which shows one embodiment.
  • The portions of the prefix tree that match the characters in the text are obtained, and the corresponding directed acyclic graph is generated starting from the root node of the prefix tree.
  • The server uses dynamic programming to find the maximum probability path in the directed acyclic graph and obtains the segmented words corresponding to that path.
  • The dynamic programming operates on the directed acyclic graph. For each candidate word in the text to be segmented, the frequency of the word is looked up (number of occurrences divided by the total count; the dictionary gives the frequency and part of speech of each word). If a word is not in the dictionary, the frequency of the least frequent word in the dictionary is used as its frequency. The maximum probability path is then computed from right to left, that is, the frequencies are multiplied from right to left and the path with the highest probability is selected. As shown in FIG. 6, the path "have / opinion / disagreement" has the highest probability, so the resulting segmented words are "have", "opinion", and "disagreement".
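  • A compact sketch of the directed acyclic graph and the right-to-left dynamic programming. Several things here are assumptions: the frequencies are invented, the example string 肚子痛 is an assumed rendering of "stomach pain", the graph is built by direct dictionary lookup instead of walking a prefix tree, and log-probabilities are summed instead of multiplying frequencies (which selects the same path).

```python
import math

# Hypothetical word counts; a real preset dictionary would supply these.
FREQ = {"肚": 20, "子": 50, "痛": 30, "肚子": 40}
TOTAL = sum(FREQ.values())
MIN_FREQ = min(FREQ.values())   # used for words that are not in the dictionary

def build_dag(text, freq):
    """Map each start index to the end indices where text[start:end+1] is a dictionary word."""
    dag = {}
    for start in range(len(text)):
        ends = [end for end in range(start, len(text)) if text[start:end + 1] in freq]
        dag[start] = ends or [start]   # an unknown character stands alone
    return dag

def segment(text, freq=FREQ):
    dag = build_dag(text, freq)
    n = len(text)
    # best[i] = (log-probability of the best path from i to the end, end index of its first word)
    best = {n: (0.0, 0)}
    for start in range(n - 1, -1, -1):          # right to left
        best[start] = max(
            (math.log(freq.get(text[start:end + 1], MIN_FREQ) / TOTAL) + best[end + 1][0], end)
            for end in dag[start]
        )
    words, i = [], 0
    while i < n:                                # walk the best path left to right
        end = best[i][1]
        words.append(text[i:end + 1])
        i = end + 1
    return words

print(segment("肚子痛"))
```

  • With these invented counts the two-word path beats the three-character path, because it multiplies fewer and larger probabilities.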
  • The server selects from the text the words that do not appear in the directed acyclic graph, obtains a preset hidden Markov model, and segments the selected words using the hidden Markov model.
  • Chinese words are labeled with four states, BEMS: B (begin) marks the starting position of a word, E (end) the ending position, M (middle) a middle position, and S (single) a character that forms a word on its own, with nothing before or after it.
  • For example, Beijing (北京) can be labeled BE, that is, 北/B 京/E, meaning that 北 is the starting position and 京 is the ending position.
  • The Chinese nation (中华民族) can be labeled BMME, that is, begin, middle, middle, end. From the start and end positions, the server obtains the segmentation of the words that do not appear in the directed acyclic graph.
  • Multi-level word segmentation is thus performed on the obtained text through the directed acyclic graph, dynamic programming, and the hidden Markov model, making the segmentation result more reliable and accurate.
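  • The sketch below shows only the last step of the hidden Markov model stage: turning a B/E/M/S tag sequence into words. Producing the tags (for example with Viterbi decoding over the preset model) is not shown, and the function name is hypothetical.

```python
def tags_to_words(chars, tags):
    """Group characters into words according to their B/M/E/S tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        if tag == "S":            # single-character word
            words.append(ch)
        elif tag == "B":          # start a new multi-character word
            current = ch
        elif tag == "M":          # continue the current word
            current += ch
        elif tag == "E":          # close the current word
            words.append(current + ch)
            current = ""
    return words

print(tags_to_words("北京", "BE"))
print(tags_to_words("中华民族", "BMME"))
```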
  • the above online question answering method may further include: receiving a management instruction for the ontology word; and modifying the corresponding ontology word according to the management instruction.
  • The user can add, import, or export ontology words through the ontology management tool on the server. For example, to add an ontology word, the user enters the corresponding synonyms, ontology word, part of speech, and so on, and saves the entry. The user can also select ontology words to export, so that they can be imported into other escaped word libraries and fine-tuned as needed, which reduces the workload.
  • Managing the ontology words allows them, and hence the escaped word library, to be updated in real time; derivation then uses the updated escaped word library, making the derivation result more accurate.
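  • The add, export, and import operations might look like the following sketch. The library layout, the JSON exchange format, and all function names are assumptions; the application does not describe how the ontology management tool stores or transfers entries.

```python
import json

def add_ontology_word(library, word, dimension, ontology_word, synonyms=()):
    """Register an ontology escape for `word` and link each synonym to it."""
    library["ontology"][word] = (dimension, ontology_word)
    for synonym in synonyms:
        library["approximate"].setdefault(synonym, []).append(word)

def export_ontology_words(library, words):
    """Serialize the selected entries as JSON (an assumed exchange format)."""
    return json.dumps({w: list(library["ontology"][w]) for w in words}, ensure_ascii=False)

def import_ontology_words(library, payload):
    """Load exported entries into another library; they can be adjusted afterwards."""
    for word, (dimension, ontology_word) in json.loads(payload).items():
        library["ontology"][word] = (dimension, ontology_word)

department_library = {"ontology": {}, "approximate": {}}
drug_library = {"ontology": {}, "approximate": {}}

add_ontology_word(department_library, "stomach", "part", "abdomen", synonyms=["tummy"])
import_ontology_words(drug_library, export_ontology_words(department_library, ["stomach"]))
print(drug_library["ontology"])
```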
  • The user enters text in the client provided by the server through the terminal; the terminal then packages and encrypts the text and sends it to the server.
  • The server decrypts and decompresses the received text to obtain the original text.
  • The server can also clean the original text as described above, for example by removing modal particles.
  • the user can enter "stomach pain" in the client, so that the server can obtain the text "stomach pain” entered by the user.
  • the terminal can also set a limit on the number of words the user can enter, for example, at least n words must be entered, where n can be 3, 10, etc., and there is no specific limitation here.
  • The text is then segmented.
  • The preset dictionary is loaded first and a prefix tree is generated. The characters of the input text, "belly", "child", and "pain", are matched in turn against the characters in the prefix tree, the directed acyclic graph is built, and the maximum probability path in it is found to obtain the corresponding segmented words.
  • "Stomach pain" has two paths in the directed acyclic graph. The first is "belly / child / pain", with each character standing alone; the second is "stomach / pain", in which the characters for "belly" and "child" join into the single word "stomach". The probability of the second path is greater than that of the first, so the second path is selected.
  • The text is therefore segmented into two words: "stomach" and "pain".
  • The server also needs to obtain the current scene: when the terminal packages and sends the text, it also sends the current buried point position.
  • The server obtains the current operation position from the buried point position and then looks up the current scene in the pre-stored mapping between operation positions and scenes, so that the escaped word library corresponding to the current scene can be loaded and the ontology words obtained by escaping are accurate.
  • The server may also need to process the segmented words, for example by extracting core keywords.
  • After loading the escaped word library, the server feeds the segmented words into it for derivation to obtain ontology words of different dimensions.
  • For example, "stomach" is derived through the escaped word library: assuming an ontology word exists for "stomach", the ontology escape yields "abdomen". The server then obtains the dimension corresponding to "abdomen" and outputs "part: abdomen".
  • Finally, the server searches for answers according to the ontology words of different dimensions it has obtained.
  • For example, the corresponding answers are obtained according to "part: abdomen" and "symptom: pain" above.
  • Word segmentation is performed first, and the escaped word library corresponding to the current scene is then loaded, so that the ontology words of different dimensions corresponding to the segmented words can be derived through the escaped word library and the answers corresponding to those ontology words can be obtained, which improves the accuracy of the answers.
  • It should be understood that although the steps in the flowcharts of FIG. 2 and FIG. 4 are displayed sequentially as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict restriction on the order of execution, and the steps may be performed in other orders. Moreover, at least some of the steps in FIG. 2 and FIG. 4 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily performed at the same moment and may be performed at different times; nor are they necessarily performed in sequence, and they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
  • An online question and answer device is provided, including: a text acquisition module 100, a first word segmentation module 200, an escaped word library acquisition module 300, a derivation module 400, and an output module 500, wherein:
  • the text acquisition module 100 is configured to receive text input by a user and obtained by the terminal, and to clean the text.
  • the first word segmentation module 200 is configured to perform word segmentation on the cleaned text to obtain segmented words.
  • the escaped word library acquisition module 300 is configured to identify the current scene acquired by the terminal and load the escaped word library corresponding to the current scene.
  • the derivation module 400 is configured to derive the segmented words through the escaped word library to obtain ontology words of different dimensions.
  • the output module 500 is configured to find an answer corresponding to the ontology words and output the answer.
  • the derivation module 400 includes:
  • the first retrieval unit is configured to retrieve whether an ontology word corresponding to the current segmented word exists in the escaped word library.
  • the first output unit is configured to: when an ontology word corresponding to the current segmented word exists in the escaped word library, perform dimension processing on the ontology word to obtain ontology words of different dimensions, and output the ontology words of different dimensions.
  • the second retrieval unit is configured to retrieve whether a synonym corresponding to the current segmented word exists in the escaped word library when no ontology word corresponding to the current segmented word exists.
  • the second output unit is configured to: when a synonym corresponding to the current segmented word exists in the escaped word library, replace the current segmented word with the synonym and continue to retrieve whether an ontology word corresponding to the current segmented word exists in the escaped word library.
  • the escaped word library acquisition module 300 is further configured to receive the current operation position acquired by the terminal and identify the current scene according to the current operation position.
  • the apparatus further includes:
  • the first selection module is configured to select core keywords from the segmented words.
  • the derivation module 400 is further configured to derive the core keywords through the escaped word library to obtain ontology words of different dimensions.
  • the first word segmentation module 200 includes:
  • a loading unit configured to load a preset dictionary and generate a prefix tree according to the loaded preset dictionary.
  • a directed acyclic graph generating unit is configured to generate a directed acyclic graph from the prefix tree and the characters in the text; the graph represents the ways in which the characters in the text can form words.
  • a word segmentation unit is configured to find the maximum-probability path in the directed acyclic graph by dynamic programming and obtain the word segments corresponding to that path.
  • the apparatus further includes:
  • the second selection module is configured to select from the text the characters that do not appear in the directed acyclic graph.
  • a model acquisition module is configured to acquire a preset hidden Markov model.
  • the second word segmentation module is configured to segment the selected characters with the hidden Markov model to obtain word segments.
  • the apparatus further includes:
  • a receiving module for receiving a management instruction for an ontology word.
  • the modification module is configured to modify the corresponding ontology word according to the management instruction.
  • Each module in the above-mentioned online question answering device may be implemented in whole or in part by software, hardware, and a combination thereof.
  • the above-mentioned modules may be embedded in the hardware in or independent of the processor in the computer device, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 8.
  • the computer device includes a processor, a memory, a network interface, and a database connected through a system bus.
  • the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer-readable instructions, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer-readable instructions in a non-volatile storage medium.
  • the database of the computer equipment is used to store the data of the escaped thesaurus.
  • the computer device's network interface is used to communicate with external terminals via a network connection.
  • the computer-readable instructions are executed by a processor to implement an online question and answer method.
  • FIG. 8 is only a block diagram of a part of the structure related to the scheme of the present application, and does not constitute a limitation on the computer equipment to which the scheme of the present application is applied.
  • the specific computer device may include more or fewer components than shown in the figure, combine certain components, or arrange the components differently.
  • a computer device includes a memory and one or more processors.
  • Computer-readable instructions are stored in the memory.
  • when the computer-readable instructions are executed, the one or more processors perform the following steps: receiving text that a user has input and a terminal has acquired, and cleaning the text; performing word segmentation on the cleaned text to obtain word segments; identifying the current scene acquired by the terminal, and loading the escaped-word lexicon corresponding to the current scene; deriving ontology words of different dimensions from the word segments by means of the escaped-word lexicon; and looking up the answer corresponding to the ontology words and outputting the answer.
  • the derivation of ontology words of different dimensions, as implemented when the processor executes the computer-readable instructions, may include: checking whether the escaped-word lexicon contains an ontology word corresponding to the current segment; when it does, performing dimension processing on the ontology word to obtain ontology words of different dimensions and outputting them; when it does not, checking whether the lexicon contains a synonym corresponding to the current segment; and when such a synonym exists, replacing the current segment with the synonym and continuing to check whether the lexicon contains an ontology word corresponding to the current segment.
  • identifying the current scene acquired by the terminal, as implemented when the processor executes the computer-readable instructions, may include: receiving the current operation position acquired by the terminal, and identifying the current scene from that position.
  • after the word segmentation, the steps implemented when the processor executes the computer-readable instructions may further include selecting core keywords from the word segments; the derivation may then include deriving ontology words of different dimensions from the core keywords by means of the escaped-word lexicon.
  • the word segmentation implemented when the processor executes the computer-readable instructions may include: loading a preset dictionary and generating a prefix tree from it; generating a directed acyclic graph from the prefix tree and the characters in the text, the graph representing the ways in which the characters in the text can form words; and finding the maximum-probability path in the graph by dynamic programming and obtaining the word segments corresponding to that path.
  • when the processor executes the computer-readable instructions, it may also implement the following steps: selecting from the text the characters that do not appear in the directed acyclic graph; obtaining a preset hidden Markov model; and segmenting the selected characters with the hidden Markov model to obtain word segments.
  • the processor when the processor executes the computer-readable instructions, the processor further implements the following steps: receiving a management instruction for the ontology word; and modifying the corresponding ontology word according to the management instruction.
  • One or more non-transitory computer-readable storage media storing computer-readable instructions.
  • when the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps: receiving text that a user has input and a terminal has acquired, and cleaning the text; performing word segmentation on the cleaned text to obtain word segments; identifying the current scene acquired by the terminal, and loading the escaped-word lexicon corresponding to the current scene; deriving ontology words of different dimensions from the word segments by means of the escaped-word lexicon; and looking up the answer corresponding to the ontology words and outputting the answer.
  • the derivation of ontology words of different dimensions, as implemented when the computer-readable instructions are executed by the processor, may include: checking whether the escaped-word lexicon contains an ontology word corresponding to the current segment; when it does, performing dimension processing on the ontology word to obtain ontology words of different dimensions and outputting them; when it does not, checking whether the lexicon contains a synonym corresponding to the current segment; and when such a synonym exists, replacing the current segment with the synonym and continuing to check whether the lexicon contains an ontology word corresponding to the current segment.
  • identifying the current scene acquired by the terminal, as implemented when the computer-readable instructions are executed by the processor, may include: receiving the current operation position acquired by the terminal, and identifying the current scene from that position.
  • after the word segmentation, the steps implemented when the computer-readable instructions are executed by the processor may further include selecting core keywords from the word segments; the derivation may then include deriving ontology words of different dimensions from the core keywords by means of the escaped-word lexicon.
  • the word segmentation implemented when the computer-readable instructions are executed by the processor may include: loading a preset dictionary and generating a prefix tree from it; generating a directed acyclic graph from the prefix tree and the characters in the text, the graph representing the ways in which the characters in the text can form words; and finding the maximum-probability path in the graph by dynamic programming and obtaining the word segments corresponding to that path.
  • when the computer-readable instructions are executed by the processor, the following steps may also be implemented: selecting from the text the characters that do not appear in the directed acyclic graph; obtaining a preset hidden Markov model; and segmenting the selected characters with the hidden Markov model to obtain word segments.
  • the following steps are further implemented: receiving a management instruction for the ontology word; and modifying the corresponding ontology word according to the management instruction.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Medical Informatics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Public Health (AREA)
  • Data Mining & Analysis (AREA)
  • Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

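An online question-answering method, comprising: receiving text that a user has input and a terminal has acquired, and cleaning the text; performing word segmentation on the cleaned text to obtain word segments; identifying the current scene acquired by the terminal, and loading the escaped-word lexicon corresponding to the current scene; deriving ontology words of different dimensions from the word segments by means of the escaped-word lexicon; and looking up the answer corresponding to the ontology words and outputting the answer.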

Description

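Online question-answering method and apparatus, computer device, and storage medium
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 2018107246123, filed with the China Patent Office on July 4, 2018 and entitled "线上问答方法、装置、计算机设备和存储介质" (Online question-answering method and apparatus, computer device, and storage medium), the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to an online question-answering method and apparatus, a computer device, and a storage medium.
Background
AI technology mainly relies on machine learning to learn the content of text entered by a user and to return an answer corresponding to that content. For example, in a medical scene in which a hospital department is recommended online, machine learning needs to recommend the appropriate department based on the text entered by the user.
However, the inventors realized that current machine learning captures only a single sense of the text, so little content is extracted and the answer output for that content is inaccurate.
Summary
According to various embodiments disclosed in this application, an online question-answering method and apparatus, a computer device, and a storage medium are provided.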
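An online question-answering method, comprising:
receiving text that a user has input and a terminal has acquired, and cleaning the text;
performing word segmentation on the cleaned text to obtain word segments;
identifying the current scene acquired by the terminal, and loading the escaped-word lexicon corresponding to the current scene;
deriving ontology words of different dimensions from the word segments by means of the escaped-word lexicon; and
looking up the answer corresponding to the ontology words, and outputting the answer.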
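An online question-answering apparatus, comprising:
a text acquisition module, configured to receive text that a user has input and a terminal has acquired, and to clean the text;
a first word segmentation module, configured to perform word segmentation on the cleaned text to obtain word segments;
an escaped-word lexicon acquisition module, configured to identify the current scene acquired by the terminal and to load the escaped-word lexicon corresponding to the current scene;
a derivation module, configured to derive ontology words of different dimensions from the word segments by means of the escaped-word lexicon; and
an output module, configured to look up the answer corresponding to the ontology words and to output the answer.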
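A computer device, comprising a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the processors, cause the one or more processors to perform the following steps: receiving text that a user has input and a terminal has acquired, and cleaning the text; performing word segmentation on the cleaned text to obtain word segments; identifying the current scene acquired by the terminal, and loading the escaped-word lexicon corresponding to the current scene; deriving ontology words of different dimensions from the word segments by means of the escaped-word lexicon; and looking up the answer corresponding to the ontology words, and outputting the answer.
One or more non-volatile computer-readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps: receiving text that a user has input and a terminal has acquired, and cleaning the text; performing word segmentation on the cleaned text to obtain word segments; identifying the current scene acquired by the terminal, and loading the escaped-word lexicon corresponding to the current scene; deriving ontology words of different dimensions from the word segments by means of the escaped-word lexicon; and looking up the answer corresponding to the ontology words, and outputting the answer.
Details of one or more embodiments of this application are set forth in the drawings and description below. Other features and advantages of this application will become apparent from the specification, the drawings, and the claims.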
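Brief Description of the Drawings
To explain the technical solutions in the embodiments of this application more clearly, the drawings needed for the embodiments are briefly introduced below. The drawings described below are merely some embodiments of this application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a diagram of an application scenario of the online question-answering method according to one or more embodiments.
FIG. 2 is a schematic flowchart of the online question-answering method according to one or more embodiments.
FIG. 3 is a schematic diagram of the escaped-word lexicon according to one or more embodiments.
FIG. 4 is a flowchart of step S208 in the embodiment shown in FIG. 2.
FIG. 5 is a schematic diagram of a prefix tree according to one or more embodiments.
FIG. 6 is a schematic diagram of a directed acyclic graph according to one or more embodiments.
FIG. 7 is a block diagram of the online question-answering apparatus according to one or more embodiments.
FIG. 8 is a block diagram of a computer device according to one or more embodiments.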
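Detailed Description
To make the technical solutions and advantages of this application clearer, the application is described in further detail below with reference to the drawings and embodiments. The specific embodiments described here serve only to explain this application and do not limit it.
The online question-answering method provided in this application can be applied in the application environment shown in FIG. 1. The terminal communicates with the server over a network. The terminal acquires the text entered by the user and sends it to the server. The server receives the text and performs word segmentation on it to obtain word segments. The server then identifies the current scene acquired by the terminal: the terminal obtains the current operation position and sends it to the server, and the server determines the current scene from that position. The server then loads the corresponding escaped-word lexicon, derives ontology words of different dimensions from the word segments by means of the lexicon, looks up the answer corresponding to those ontology words, and outputs the answer to the terminal, completing the online question-answering process. The terminal may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet, or portable wearable device; the server may be implemented as a standalone server or as a cluster of multiple servers.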
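In one embodiment, as shown in FIG. 2, an online question-answering method is provided. The method is described as applied to the server in FIG. 1 and includes the following steps:
S202: Receive text that a user has input and a terminal has acquired, and clean the text.
Specifically, the user enters text through the client on the terminal, the terminal acquires the text and sends it to the server, and the server receives the text and cleans it.
The text may be typed on the terminal's virtual keyboard, or it may be speech entered through the terminal, which the terminal converts into text before sending it to the server.
Optionally, cleaning the text means deleting invalid text from it, such as "您好" (hello) and modal particles. The invalid text can be stored in advance; once the input text is obtained, it is first matched against the stored invalid text, and the matches are deleted. This keeps invalid text from interfering with word segmentation, which improves both the accuracy and the efficiency of segmentation.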
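The cleaning step described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the phrase list `INVALID_PHRASES`, the trimmed characters in `EDGE_CHARS`, and the function name `clean_text` are all assumptions introduced here.

```python
# Hypothetical stored list of invalid phrases; the description names only
# greetings such as "您好" and modal particles as examples.
INVALID_PHRASES = ["您好", "你好", "请问", "啊", "呢", "呀"]

# Punctuation and whitespace trimmed from both ends after removal (an assumption).
EDGE_CHARS = " ,,。.!!??"


def clean_text(text, invalid_phrases=INVALID_PHRASES):
    """Delete every stored invalid phrase, longest first, then trim the edges."""
    for phrase in sorted(invalid_phrases, key=len, reverse=True):
        text = text.replace(phrase, "")
    return text.strip(EDGE_CHARS)


print(clean_text("您好,肚子疼啊"))
```

Plain substring removal is the simplest way to match against a stored list, but it can damage words that happen to contain a listed character, so a production cleaner would likely need word-aware matching.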
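Optionally, when the server receives text from multiple terminals, it may first place the text in a receiving thread, obtain the current processing load of each server in the cluster, and distribute the queued text from the different terminals according to those loads, which helps keep the servers stable. Optionally, after receiving text from the terminals, the server may also first obtain the user identifier associated with each text (the identifier of the user operating the terminal) and check whether the received identifiers are related, for example whether the users are relatives. This relationship can be determined from the identifiers of associated users that a user has previously entered on the terminal. When such a relationship exists, both texts are sent to the same server for processing: the texts from the two terminals are likely to be similar, so the resulting word segments may coincide and can be merged during the subsequent derivation, which reduces the processing load.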
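S204: Perform word segmentation on the cleaned text to obtain word segments.
Specifically, performing word segmentation on the cleaned text means splitting the acquired text, which is usually a sentence, into multiple word segments that each carry an independent meaning.
S206: Identify the current scene acquired by the terminal, and load the escaped-word lexicon corresponding to the current scene.
Specifically, the current scene is the scene of the client in which the user is operating the terminal. Scenes are preset when the client is designed and may include, for example, a department recommendation scene, a drug recommendation scene, and a doctor recommendation scene. The terminal may obtain the current scene from the position in the client at which the user is currently operating, or from a flag.
The escaped-word lexicon (转义词库, a lexicon that transfers a word to its underlying meaning) is used to convert a word segment into multiple ontology words of different dimensions. It stores the escape relations between word segments and ontology words of different dimensions; for example, the segment "肚子疼" (stomachache) may be converted by the lexicon into {部位 (body part): 腹部 (abdomen), 症状 (symptom): 疼痛 (pain)}. The dimensions of ontology words in the lexicon may include: population, system division, body part and organ, symptom, cause, examination, drug, and clinical treatment. The lexicon holds two kinds of escape relations: ontology escape relations and approximate escape relations. An ontology escape relation is a mapping that directly converts a segment into ontology words of different dimensions; an approximate escape relation is a mapping that converts one segment into another segment. See the escaped-word lexicon shown in FIG. 3.
On the server, different scenes correspond to different escaped-word lexicons, because the same segment may correspond to different ontology words in different scenes. For example, in the department recommendation scene "发烧" (fever) may correspond to 内科 (internal medicine), whereas in the drug recommendation scene it may correspond to 感冒 (common cold). Therefore, once the scene is known, the server first loads the lexicon corresponding to that scene, laying the groundwork for the derivation that follows.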
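One way to picture a per-scene escaped-word lexicon with its two relation types is the sketch below. The class name `EscapedLexicon`, the scene keys, and the dimension labels attached to 内科 and 感冒 are illustrative assumptions; the description does not specify a storage layout or say which dimension those two words belong to.

```python
from dataclasses import dataclass, field


@dataclass
class EscapedLexicon:
    # Ontology escape relations: segment -> list of (dimension, ontology word).
    ontology: dict = field(default_factory=dict)
    # Approximate escape relations: segment -> another segment (a synonym).
    synonyms: dict = field(default_factory=dict)


# Hypothetical scene names; one lexicon per scene.
LEXICONS = {
    "department_recommendation": EscapedLexicon(
        ontology={
            "肚子": [("部位", "腹部")],
            "疼痛": [("症状", "疼痛")],
            "发烧": [("科室", "内科")],  # dimension label is an assumption
        },
        synonyms={"疼": "疼痛"},
    ),
    "drug_recommendation": EscapedLexicon(
        ontology={"发烧": [("病因", "感冒")]},  # dimension label is an assumption
    ),
}


def load_lexicon(scene):
    """Return the lexicon registered for the given scene."""
    return LEXICONS[scene]
```

Keeping the two relation types in separate mappings makes the later lookup order (ontology relation first, approximate relation second) straightforward to express.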
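S208: Derive ontology words of different dimensions from the word segments by means of the escaped-word lexicon.
Specifically, with reference to FIG. 3, after loading the corresponding escaped-word lexicon, the server matches the word segments against the words in the lexicon to obtain ontology words of different dimensions. When a match succeeds, the server obtains the dimension of the matched word and outputs the dimension together with the word. For example, when 腹部 (abdomen) is matched, its dimension is 部位 (body part), so "部位: 腹部" is output; when 疼痛 (pain) is matched, its dimension is 症状 (symptom), so "症状: 疼痛" is output. Fuzzy matching may be used to raise the match success rate.
S210: Look up the answer corresponding to the ontology words, and output the answer.
Specifically, after obtaining the ontology words of different dimensions, the server performs logical operations on them to obtain the corresponding answer, for example by matching the ontology words against the corresponding question-answering knowledge base. Optionally, the answers obtained can be sorted by match rate so that higher-ranked answers are pushed first. The match rate may be the ratio of the number of ontology words that match the answer to the total number of ontology words.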
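The match-rate ranking described above can be sketched as follows, assuming a knowledge base that maps each answer to the ontology words it covers. The knowledge-base layout, the entry for 骨科 (orthopedics), and the function name `rank_answers` are hypothetical.

```python
def rank_answers(ontology_words, knowledge_base):
    """Rank answers by match rate = matched ontology words / all ontology words."""
    query = set(ontology_words)
    if not query:
        return []
    ranked = []
    for answer, covered in knowledge_base.items():
        rate = len(query & set(covered)) / len(query)
        if rate > 0:
            ranked.append((answer, rate))
    # Highest match rate first, so the best answer is pushed first.
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Hypothetical knowledge base for a department recommendation scene.
KNOWLEDGE_BASE = {
    "骨科": [("部位", "腿部"), ("症状", "疼痛")],
    "内科": [("部位", "腹部"), ("症状", "疼痛")],
}

print(rank_answers([("部位", "腹部"), ("症状", "疼痛")], KNOWLEDGE_BASE))
```

Here 内科 matches both ontology words and 骨科 only one, so 内科 ranks first.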
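In the online question-answering method above, after the text entered by the user is obtained, it is first segmented; the escaped-word lexicon for the current scene is then loaded, so that the ontology words of different dimensions corresponding to the word segments can be derived from the lexicon and the answer corresponding to those ontology words can be obtained. Expanding the word segments through the lexicon in this way enriches the semantics of the user's input and extracts more information about the user, which improves the accuracy of the answer.
In one embodiment, referring to FIG. 4, which is a flowchart of step S208 in the embodiment shown in FIG. 2, step S208, deriving ontology words of different dimensions from the word segments by means of the escaped-word lexicon, may include:
S402: Check whether the escaped-word lexicon contains an ontology word corresponding to the current segment.
Specifically, the escaped-word lexicon is as shown in FIG. 3 above. The server first obtains the current segment and then checks whether the lexicon contains an ontology word that matches it; fuzzy matching may be used. Optionally, to improve efficiency, the matching can be parallelized across threads: the segments are distributed evenly over several threads that run the matching concurrently.
S404: When the escaped-word lexicon contains an ontology word corresponding to the current segment, perform dimension processing on the ontology word to obtain ontology words of different dimensions, and output them.
Specifically, when the lexicon contains an ontology word corresponding to the current segment, the server obtains the dimension of that ontology word, such as the population, system division, body part and organ, symptom, cause, examination, drug, or clinical treatment dimensions listed above, for example {部位 (body part): 腹部 (abdomen), 症状 (symptom): 疼痛 (pain)}, and outputs the ontology words of different dimensions.
S406: When the escaped-word lexicon contains no ontology word corresponding to the current segment, check whether it contains a synonym corresponding to the current segment.
Specifically, a synonym is a word that has an approximate relation to the current segment. When the server finds no ontology word for the current segment, it checks whether the lexicon contains a synonym for it. The search may use a store of approximate relations between segments and synonyms: the server first checks whether the approximate-relation store contains a candidate synonym for the current segment, and then obtains the synonym corresponding to that candidate from the escape-relation store.
S408: When the escaped-word lexicon contains a synonym corresponding to the current segment, replace the current segment with the synonym and continue checking whether the lexicon contains an ontology word corresponding to the current segment.
Specifically, when the escape-relation store contains a synonym for the current segment, the search continues in the escape-relation store with that synonym; that is, the ontology word corresponding to the synonym is obtained, so that the ontology words of different dimensions corresponding to the synonym can be output. When no synonym is found, the server returns a no-result response to the terminal.
In practice, the server first obtains a segmented word and performs an ontology-relation search. When an ontology word corresponding to the segmented word exists, the server outputs the ontology word and its part of speech (that is, its dimension). When none exists, the server continues with a synonym search using the approximate relations. If no synonym is found, nothing is output; if a synonym is found, the server takes the synonym as the new word and repeats the ontology-relation search, until an ontology word is output or no further synonym exists.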
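The lookup loop of steps S402 to S408 can be sketched as below: try the ontology relation first, fall back to the approximate (synonym) relation, and repeat. The dictionaries and the function name `derive` are illustrative assumptions, and the `seen` set is an added safeguard against circular synonym chains that the description does not mention.

```python
# Hypothetical relations for one scene.
ONTOLOGY = {"肚子": [("部位", "腹部")], "疼痛": [("症状", "疼痛")]}
SYNONYMS = {"疼": "疼痛"}


def derive(segment, ontology=ONTOLOGY, synonyms=SYNONYMS):
    """Return the (dimension, ontology word) pairs for a segment, or [] if none."""
    seen = set()
    current = segment
    while current not in seen:
        seen.add(current)
        if current in ontology:       # S402/S404: ontology relation found
            return ontology[current]
        if current not in synonyms:   # S406: no synonym either, no result
            return []
        current = synonyms[current]   # S408: continue with the synonym
    return []                         # synonym chain looped back on itself


print(derive("疼"))
```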
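In the embodiment above, the search first uses the ontology relations in the escaped-word lexicon and, only when that fails, the approximate relations, which improves the accuracy of the search results.
In one embodiment, obtaining the current scene may include: receiving the current operation position acquired by the terminal, and identifying the current scene from that position. In one embodiment, after the acquired text is segmented into word segments, the method may further include selecting core keywords from the word segments; deriving ontology words of different dimensions from the word segments by means of the escaped-word lexicon may then include deriving ontology words of different dimensions from the core keywords by means of the lexicon.
Specifically, the terminal first obtains the user's operation position from preset tracking points (埋点) and sends that position to the server, so that the server can determine which scene the position belongs to. Because scenes are preset when the client is designed, a mapping between tracking points and scenes (that is, between operation positions and scenes) is established at design time. When the server receives the current operation position, it looks up the scene in this preset mapping, for example the department recommendation, drug recommendation, or doctor recommendation scene, and then obtains the escaped-word lexicon for that scene. This avoids ontology-word mismatches that would arise because lexicons differ between scenes and the same word can mean different things in different scenes, which improves match accuracy. Matching against only one selected lexicon also reduces the number of match operations and improves efficiency.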
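The preset mapping from operation position to scene amounts to a table lookup, sketched here. The position strings and scene names are made up for illustration; the description does not say how tracking points are identified.

```python
# Hypothetical mapping established when the client is designed.
POSITION_TO_SCENE = {
    "home/triage": "department_recommendation",
    "home/pharmacy": "drug_recommendation",
    "home/doctors": "doctor_recommendation",
}


def identify_scene(operation_position):
    """Map the operation position reported by the terminal to its preset scene."""
    try:
        return POSITION_TO_SCENE[operation_position]
    except KeyError:
        raise ValueError(
            f"no scene registered for position {operation_position!r}"
        ) from None


print(identify_scene("home/triage"))
```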
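Specifically, core keywords are configured per scene and selected manually. After segmentation, not every segment is expanded through the lexicon; instead, the segmented words are output, a person selects and marks the core keywords, and only the core keywords are expanded through the lexicon. Optionally, the server may select and mark core keywords automatically: for example, a core-keyword store can be preset, the segments matched against it after segmentation, and each segment that matches marked as a core keyword.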
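The automatic variant, matching segments against a preset core-keyword store, can be sketched as a simple filter. The contents of `CORE_KEYWORDS` and the function name are assumptions for illustration.

```python
# Hypothetical preset core-keyword store for one scene.
CORE_KEYWORDS = {"肚子", "疼", "发烧"}


def select_core_keywords(segments, core_keywords=CORE_KEYWORDS):
    """Keep only the segments found in the core-keyword store, in input order."""
    return [segment for segment in segments if segment in core_keywords]


print(select_core_keywords(["我", "肚子", "疼"]))
```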
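In the embodiment above, core keywords are first extracted from the word segments, so not every segment has to be matched against the escaped-word lexicon, which improves matching efficiency. In addition, the server obtains the current scene returned by the terminal and selects the corresponding lexicon, which further narrows the matching range and improves efficiency; choosing the correct lexicon also improves match accuracy.
In one embodiment, performing word segmentation on the acquired text to obtain word segments may include: loading a preset dictionary and generating a prefix tree from it; generating a directed acyclic graph from the prefix tree and the characters in the text, the graph representing the ways in which the characters in the text can form words; and finding the maximum-probability path in the graph by dynamic programming and obtaining the word segments corresponding to that path. Optionally, characters that do not appear in the directed acyclic graph may be handled as follows: selecting from the text the characters that do not appear in the graph; obtaining a preset hidden Markov model; and segmenting the selected characters with the hidden Markov model to obtain word segments.
Specifically, the server may first load a prestored dictionary, which may be a dictionary downloaded from the Internet, a dictionary generated from medical websites and the like, or a user-defined dictionary, and generate a prefix tree from it, as shown in FIG. 5. The basic properties of the prefix tree are: the root node contains no character, and every node other than the root contains exactly one character; concatenating the characters on the path from the root to a node yields the string for that node; and the children of any node all contain different characters. Strings that share a prefix share the nodes for that prefix; in FIG. 5, for example, the letter t shared by "to" and "ten" occupies only one node.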
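A prefix tree with the properties listed above can be sketched in a few lines. The names `TrieNode`, `build_trie`, and `contains` are introduced here for illustration, and the word list is a stand-in for the preset dictionary.

```python
class TrieNode:
    """One node of the prefix tree; the root holds no character."""

    def __init__(self):
        self.children = {}    # character -> child node, all keys distinct
        self.is_word = False  # True when the path from the root spells a word


def build_trie(words):
    """Build a prefix tree in which words with a common prefix share nodes."""
    root = TrieNode()
    for word in words:
        node = root
        for char in word:
            node = node.children.setdefault(char, TrieNode())
        node.is_word = True
    return root


def contains(root, word):
    """Return True if the exact word was inserted into the tree."""
    node = root
    for char in word:
        node = node.children.get(char)
        if node is None:
            return False
    return node.is_word


trie = build_trie(["to", "tea", "ten"])
print(len(trie.children))  # the shared first letter occupies a single node
```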
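Next, the server generates a directed acyclic graph from the prefix tree and the characters in the text; the graph represents the ways in which the characters in the text can form words. As shown in FIG. 6, which is a schematic diagram of a directed acyclic graph in one embodiment, the graph is generated from the root nodes of the prefix tree: the server first obtains the prefix tree corresponding to the characters in the text, and then generates the directed acyclic graph from the root node of that prefix tree.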
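A common way to represent such a graph is a mapping from each start index to the end indices of every dictionary word beginning there. The sketch below follows that layout; for brevity it looks words up in a plain set rather than walking a prefix tree, which is a simplification of what the description states, and the dictionary contents are assumed.

```python
def build_dag(text, dictionary):
    """Map each start index i to the end indices j where text[i:j+1] is a word.

    A character that starts no dictionary word maps to itself, so every
    position still has one outgoing edge.
    """
    dag = {}
    for i in range(len(text)):
        ends = [j for j in range(i, len(text)) if text[i:j + 1] in dictionary]
        dag[i] = ends or [i]
    return dag


# Hypothetical dictionary entries.
DICTIONARY = {"有", "有意", "意", "意见", "见", "分", "分歧", "歧"}
print(build_dag("有意见分歧", DICTIONARY))
```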
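Third, the server uses dynamic programming to find the maximum-probability path in the directed acyclic graph and obtains the word segments corresponding to that path. Specifically, the dynamic programming operates on the directed acyclic graph. For each candidate word already delimited in the text to be segmented, the server looks up the word's frequency (count divided by total; the dictionary gives the frequency and part of speech of every word). If the word is not in the dictionary, the frequency of the least frequent word in the dictionary is used as its frequency. The maximum-probability path is then computed from right to left, that is, the path whose product of frequencies, accumulated from right to left, is largest. In FIG. 6, the path 有-意见-分歧 has the highest probability, so the resulting segments are "有", "意见", and "分歧".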
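The right-to-left dynamic programming can be sketched as follows. The word counts in `FREQ` are invented for illustration, and log-probabilities are summed instead of multiplying raw frequencies, which selects the same path while avoiding numeric underflow on long texts.

```python
import math


def build_dag(text, freq):
    """Start index -> end indices of dictionary words beginning at that index."""
    dag = {}
    for i in range(len(text)):
        ends = [j for j in range(i, len(text)) if text[i:j + 1] in freq]
        dag[i] = ends or [i]
    return dag


def best_path(text, freq):
    """Segment text along the maximum-probability path through the graph."""
    total = sum(freq.values())
    min_freq = min(freq.values())  # fallback for words missing from the dictionary
    n = len(text)
    dag = build_dag(text, freq)
    route = {n: (0.0, n)}          # index -> (best log-probability, chosen end)
    for i in range(n - 1, -1, -1):  # right to left
        route[i] = max(
            (math.log(freq.get(text[i:j + 1], min_freq) / total) + route[j + 1][0], j)
            for j in dag[i]
        )
    segments, i = [], 0
    while i < n:
        j = route[i][1]
        segments.append(text[i:j + 1])
        i = j + 1
    return segments


# Hypothetical word counts.
FREQ = {"有": 100, "有意": 10, "意": 5, "意见": 50, "见": 5, "分": 5, "分歧": 30, "歧": 1}
print(best_path("有意见分歧", FREQ))
```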
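Fourth, the server selects from the text the characters that do not appear in the directed acyclic graph, obtains a preset hidden Markov model, and segments the selected characters with the hidden Markov model to obtain word segments. Chinese words are labeled with the four states B, E, M, and S: B (begin) marks the start of a word, E (end) marks the end, M (middle) marks a middle position, and S (single) marks a character that forms a word on its own, with nothing before or after it. For example, 北京 is labeled BE, that is, 北/B 京/E, meaning 北 is the start and 京 is the end; 中华民族 is labeled BMME, that is, begin, middle, middle, end. From the begin and end positions the server obtains the segmentation of the characters that do not appear in the directed acyclic graph.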
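Once the hidden Markov model has assigned a B/M/E/S tag to every character, turning the tags into word segments is mechanical. The sketch below covers only that final decoding step and assumes the tag sequence is already available and well formed; the tagging itself (typically done with the Viterbi algorithm over trained model parameters) is not shown, and the function name is hypothetical.

```python
def tags_to_segments(chars, tags):
    """Cut a character sequence into words using B/M/E/S tags."""
    segments, start = [], 0
    for i, tag in enumerate(tags):
        if tag == "B":    # a multi-character word begins here
            start = i
        elif tag == "E":  # the word that began at `start` ends here
            segments.append(chars[start:i + 1])
        elif tag == "S":  # a single character forms a word on its own
            segments.append(chars[i])
        # "M" marks a middle character and needs no action
    return segments


print(tags_to_segments("中华民族", "BMME"))
```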
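In the embodiment above, the acquired text is segmented at several levels, using the directed acyclic graph, the dynamic-programming path search, and the hidden Markov model, which makes the segmentation result more reliable and accurate.
In one embodiment, the online question-answering method may further include: receiving a management instruction for an ontology word; and modifying the corresponding ontology word according to the management instruction.
Specifically, a user can add, import, or export ontology words on the server through an ontology-word management tool. When adding an ontology word, for example, the user enters the corresponding synonym, ontology word, part of speech, and so on, and saves the entry. The user can also select ontology words to export, import the exported words into another escaped-word lexicon, and fine-tune them as needed before import, which reduces the workload.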
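The add, export, and import operations can be sketched over a dictionary-based lexicon as below. The lexicon layout and all function names are assumptions made for illustration; the description does not define the management tool's interface.

```python
import copy


def new_lexicon():
    """Create an empty lexicon with ontology and synonym relations."""
    return {"ontology": {}, "synonyms": {}}


def add_ontology_word(lexicon, segment, dimension, ontology_word, synonyms=()):
    """Add an ontology relation and, optionally, synonyms that point to the segment."""
    lexicon["ontology"].setdefault(segment, []).append((dimension, ontology_word))
    for synonym in synonyms:
        lexicon["synonyms"][synonym] = segment


def export_entries(lexicon, segments):
    """Copy the selected ontology entries so they can be edited before import."""
    return {
        s: copy.deepcopy(lexicon["ontology"][s])
        for s in segments
        if s in lexicon["ontology"]
    }


def import_entries(lexicon, entries):
    """Merge exported entries into another scene's lexicon."""
    lexicon["ontology"].update(copy.deepcopy(entries))


source = new_lexicon()
add_ontology_word(source, "疼痛", "症状", "疼痛", synonyms=["疼"])
target = new_lexicon()
import_entries(target, export_entries(source, ["疼痛"]))
```

Copying on export and import keeps later edits in one scene's lexicon from leaking into another.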
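The embodiment above also covers the management of ontology words, so that ontology words, and therefore the escaped-word lexicon, can be updated in real time; deriving from the updated lexicon makes the derivation results more accurate.
Specifically, so that those skilled in the art can fully understand the online question-answering method in this technical solution, it is now described in detail for the department recommendation scene:
First, the user enters text in the client provided by the server, through the terminal. The terminal packages and encrypts the text and sends it to the server, and the server decrypts and unpacks it to recover the original text. The server may also clean the original text as described above, for example by removing modal particles. For example, the user may enter "肚子疼" (stomachache) in the client, and the server obtains the text "肚子疼". Optionally, the terminal may also limit the length of the input, for example requiring at least n characters, where n may be 3, 10, and so on; no specific limit is imposed here.
Second, after obtaining the text entered by the user, the server segments it. For example, it first loads the preset dictionary and generates the prefix tree, matches the input characters "肚", "子", and "疼" in turn against the characters in the prefix tree, finds the maximum-probability path in the directed acyclic graph, and obtains the word segments corresponding to that path. For example, "肚子疼" has two paths in the directed acyclic graph: the first is "肚-子疼" and the second is "肚子-疼". The second has a higher probability than the first, so the second path is chosen and the text is segmented into "肚子" and "疼".
Third, the server also needs to obtain the current scene. When the terminal packages and sends the text, it also sends the current tracking-point position; the server derives the current operation position from it and obtains the current scene from the prestored mapping between operation positions and scenes. The server can then load the escaped-word lexicon for the current scene, which ensures that the ontology words obtained are accurate.
Fourth, the server also processes the word segments, for example by extracting core keywords. After loading the escaped-word lexicon, the server feeds the segments into it to derive ontology words of different dimensions. For example, "肚子" is looked up in the lexicon; assuming it has a corresponding ontology word, an ontology escape yields 腹部 (abdomen), the server obtains its dimension 部位 (body part), and "部位: 腹部" is output. Assuming "疼" has no corresponding ontology word, an approximate escape through the lexicon first yields its synonym "疼痛" (pain); an ontology escape on "疼痛" then yields "疼痛", the server obtains its dimension "症状" (symptom), and "症状: 疼痛" is output.
Fifth, the server looks up the answer from the ontology words of different dimensions, here from "部位: 腹部" and "症状: 疼痛". In the department recommendation scene, "部位: 腹部" and "症状: 疼痛" correspond to 内科 (internal medicine). "肚子疼" is thus converted semantically into multi-dimensional ontology words before answer matching, which makes the output answer more accurate. During matching, the server may use all of the ontology words obtained, and only candidates whose match rate exceeds a preset value are accepted as answers, for example requiring a full match or a match rate above 90%.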
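The derivation and answer-matching parts of this walkthrough can be tied together in a compact sketch. It assumes segmentation has already produced "肚子" and "疼", and that the knowledge base lists the ontology words each answer covers; every name and data value below is illustrative rather than taken from the application.

```python
# Hypothetical lexicon for the department recommendation scene.
ONTOLOGY = {"肚子": ("部位", "腹部"), "疼痛": ("症状", "疼痛")}
SYNONYMS = {"疼": "疼痛"}
# Hypothetical knowledge base: answer -> ontology words it covers.
ANSWERS = {
    "内科": {("部位", "腹部"), ("症状", "疼痛")},
    "骨科": {("部位", "腿部"), ("症状", "疼痛")},
}


def derive(segment):
    """Ontology escape first, then approximate escape via synonyms."""
    seen = set()
    while segment not in seen:
        seen.add(segment)
        if segment in ONTOLOGY:
            return ONTOLOGY[segment]
        if segment not in SYNONYMS:
            return None
        segment = SYNONYMS[segment]
    return None


def answer(segments, threshold=0.9):
    """Return the answers whose match rate exceeds the preset threshold."""
    words = {w for w in map(derive, segments) if w is not None}
    if not words:
        return []
    return [a for a, covered in ANSWERS.items()
            if len(words & covered) / len(words) > threshold]


print(answer(["肚子", "疼"]))
```

With a 90% threshold, 骨科 (match rate 0.5) is rejected and only 内科 (match rate 1.0) is returned.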
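In the embodiment above, after the text entered by the user is obtained, it is first segmented; the escaped-word lexicon for the current scene is then loaded, so that the ontology words of different dimensions corresponding to the word segments can be derived from the lexicon and the answer corresponding to those ontology words can be obtained, which improves the accuracy of the answer.
It should be understood that although the steps in the flowcharts of FIG. 2 and FIG. 4 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the order of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in FIG. 2 and FIG. 4 may include multiple sub-steps or stages, which need not be completed at the same moment but may be performed at different times, and which need not be performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.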
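In one embodiment, as shown in FIG. 7, an online question-answering apparatus is provided, including a text acquisition module 100, a first word segmentation module 200, an escaped-word lexicon acquisition module 300, a derivation module 400, and an output module 500, where:
the text acquisition module 100 is configured to receive text that a user has input and a terminal has acquired, and to clean the text;
the first word segmentation module 200 is configured to perform word segmentation on the cleaned text to obtain word segments;
the escaped-word lexicon acquisition module 300 is configured to identify the current scene acquired by the terminal and to load the escaped-word lexicon corresponding to the current scene;
the derivation module 400 is configured to derive ontology words of different dimensions from the word segments by means of the escaped-word lexicon; and
the output module 500 is configured to look up the answer corresponding to the ontology words and to output the answer.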
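In one embodiment, the derivation module 400 includes:
a first retrieval unit, configured to check whether the escaped-word lexicon contains an ontology word corresponding to the current segment;
a first output unit, configured to, when the escaped-word lexicon contains an ontology word corresponding to the current segment, perform dimension processing on the ontology word to obtain ontology words of different dimensions and output them;
a second retrieval unit, configured to, when the escaped-word lexicon contains no ontology word corresponding to the current segment, check whether it contains a synonym corresponding to the current segment; and
a second output unit, configured to, when the escaped-word lexicon contains a synonym corresponding to the current segment, replace the current segment with the synonym and continue checking whether the lexicon contains an ontology word corresponding to the current segment.
In one embodiment, the escaped-word lexicon acquisition module 300 is further configured to receive the current operation position acquired by the terminal and to identify the current scene from that position.
In one embodiment, the apparatus further includes:
a first selection module, configured to select core keywords from the word segments; and
the derivation module 400 is further configured to derive ontology words of different dimensions from the core keywords by means of the escaped-word lexicon.
In one embodiment, the first word segmentation module 200 includes:
a loading unit, configured to load a preset dictionary and generate a prefix tree from it;
a directed acyclic graph generating unit, configured to generate a directed acyclic graph from the prefix tree and the characters in the text, the graph representing the ways in which the characters in the text can form words; and
a word segmentation unit, configured to find the maximum-probability path in the directed acyclic graph by dynamic programming and obtain the word segments corresponding to that path.
In one embodiment, the apparatus further includes:
a second selection module, configured to select from the text the characters that do not appear in the directed acyclic graph;
a model acquisition module, configured to obtain a preset hidden Markov model; and
a second word segmentation module, configured to segment the selected characters with the hidden Markov model to obtain word segments.
In one embodiment, the apparatus further includes:
a receiving module, configured to receive a management instruction for an ontology word; and
a modification module, configured to modify the corresponding ontology word according to the management instruction.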
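For the specific limitations of the online question-answering apparatus, see the limitations of the online question-answering method above; they are not repeated here. Each module of the apparatus may be implemented wholly or partly in software, hardware, or a combination of the two. The modules may be embedded in, or independent of, the processor of the computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can invoke them and perform the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 8. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions held in the non-volatile storage medium. The database stores the data of the escaped-word lexicon. The network interface is used to communicate with external terminals over a network connection. The computer-readable instructions, when executed by the processor, implement an online question-answering method.
Those skilled in the art will understand that the structure shown in FIG. 8 is merely a block diagram of the part of the structure relevant to the solution of this application and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or arrange the components differently.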
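A computer device includes a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the processors, cause the one or more processors to perform the following steps: receiving text that a user has input and a terminal has acquired, and cleaning the text; performing word segmentation on the cleaned text to obtain word segments; identifying the current scene acquired by the terminal, and loading the escaped-word lexicon corresponding to the current scene; deriving ontology words of different dimensions from the word segments by means of the escaped-word lexicon; and looking up the answer corresponding to the ontology words, and outputting the answer.
In one embodiment, deriving ontology words of different dimensions from the word segments by means of the escaped-word lexicon, as implemented when the processor executes the computer-readable instructions, may include: checking whether the escaped-word lexicon contains an ontology word corresponding to the current segment; when it does, performing dimension processing on the ontology word to obtain ontology words of different dimensions and outputting them; when it does not, checking whether the lexicon contains a synonym corresponding to the current segment; and when such a synonym exists, replacing the current segment with the synonym and continuing to check whether the lexicon contains an ontology word corresponding to the current segment.
In one embodiment, identifying the current scene acquired by the terminal, as implemented when the processor executes the computer-readable instructions, may include: receiving the current operation position acquired by the terminal, and identifying the current scene from that position.
In one embodiment, after the word segmentation of the acquired text implemented when the processor executes the computer-readable instructions, the steps may further include selecting core keywords from the word segments; and the derivation implemented when the processor executes the computer-readable instructions may include deriving ontology words of different dimensions from the core keywords by means of the escaped-word lexicon.
In one embodiment, performing word segmentation on the acquired text to obtain word segments, as implemented when the processor executes the computer-readable instructions, may include: loading a preset dictionary and generating a prefix tree from it; generating a directed acyclic graph from the prefix tree and the characters in the text, the graph representing the ways in which the characters in the text can form words; and finding the maximum-probability path in the graph by dynamic programming and obtaining the word segments corresponding to that path.
In one embodiment, the processor, when executing the computer-readable instructions, further implements the following steps: selecting from the text the characters that do not appear in the directed acyclic graph; obtaining a preset hidden Markov model; and segmenting the selected characters with the hidden Markov model to obtain word segments.
In one embodiment, the processor, when executing the computer-readable instructions, further implements the following steps: receiving a management instruction for an ontology word; and modifying the corresponding ontology word according to the management instruction.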
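One or more non-volatile computer-readable storage media store computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps: receiving text that a user has input and a terminal has acquired, and cleaning the text; performing word segmentation on the cleaned text to obtain word segments; identifying the current scene acquired by the terminal, and loading the escaped-word lexicon corresponding to the current scene; deriving ontology words of different dimensions from the word segments by means of the escaped-word lexicon; and looking up the answer corresponding to the ontology words, and outputting the answer.
In one embodiment, deriving ontology words of different dimensions from the word segments by means of the escaped-word lexicon, as implemented when the computer-readable instructions are executed by the processor, may include: checking whether the escaped-word lexicon contains an ontology word corresponding to the current segment; when it does, performing dimension processing on the ontology word to obtain ontology words of different dimensions and outputting them; when it does not, checking whether the lexicon contains a synonym corresponding to the current segment; and when such a synonym exists, replacing the current segment with the synonym and continuing to check whether the lexicon contains an ontology word corresponding to the current segment.
In one embodiment, identifying the current scene acquired by the terminal, as implemented when the computer-readable instructions are executed by the processor, may include: receiving the current operation position acquired by the terminal, and identifying the current scene from that position.
In one embodiment, after the word segmentation of the acquired text implemented when the computer-readable instructions are executed by the processor, the steps may further include selecting core keywords from the word segments; and the derivation implemented when the computer-readable instructions are executed by the processor may include deriving ontology words of different dimensions from the core keywords by means of the escaped-word lexicon.
In one embodiment, performing word segmentation on the acquired text to obtain word segments, as implemented when the computer-readable instructions are executed by the processor, may include: loading a preset dictionary and generating a prefix tree from it; generating a directed acyclic graph from the prefix tree and the characters in the text, the graph representing the ways in which the characters in the text can form words; and finding the maximum-probability path in the graph by dynamic programming and obtaining the word segments corresponding to that path.
In one embodiment, the computer-readable instructions, when executed by the processor, further implement the following steps: selecting from the text the characters that do not appear in the directed acyclic graph; obtaining a preset hidden Markov model; and segmenting the selected characters with the hidden Markov model to obtain word segments.
In one embodiment, the computer-readable instructions, when executed by the processor, further implement the following steps: receiving a management instruction for an ontology word; and modifying the corresponding ontology word according to the management instruction.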
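A person of ordinary skill in the art will understand that all or part of the processes of the method embodiments above can be carried out by computer-readable instructions directing the relevant hardware. The computer-readable instructions may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments above. Any reference to memory, storage, a database, or another medium used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the embodiments above may be combined in any way. For brevity, not every possible combination of these features is described; however, any combination of these features that involves no contradiction should be considered within the scope of this specification.
The embodiments above express only several implementations of this application. Their description is relatively specific and detailed, but it should not be understood as limiting the scope of the invention patent. It should be noted that a person of ordinary skill in the art can make a number of variations and improvements without departing from the concept of this application, all of which fall within the scope of protection of this application. The scope of protection of this application is therefore defined by the appended claims.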

Claims (20)

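  1. An online question-answering method, comprising:
    receiving text that a user has input and a terminal has acquired, and cleaning the text;
    performing word segmentation on the cleaned text to obtain word segments;
    identifying the current scene acquired by the terminal, and loading the escaped-word lexicon corresponding to the current scene;
    deriving ontology words of different dimensions from the word segments by means of the escaped-word lexicon; and
    looking up the answer corresponding to the ontology words, and outputting the answer.
  2. The method according to claim 1, wherein deriving ontology words of different dimensions from the word segments by means of the escaped-word lexicon comprises:
    checking whether the escaped-word lexicon contains an ontology word corresponding to a current segment;
    when the escaped-word lexicon contains an ontology word corresponding to the current segment, performing dimension processing on the ontology word to obtain ontology words of different dimensions, and outputting the ontology words of different dimensions;
    when the escaped-word lexicon contains no ontology word corresponding to the current segment, checking whether the escaped-word lexicon contains a synonym corresponding to the current segment; and
    when the escaped-word lexicon contains a synonym corresponding to the current segment, replacing the current segment with the synonym, and continuing to check whether the escaped-word lexicon contains an ontology word corresponding to the current segment.
  3. The method according to claim 1, wherein identifying the current scene acquired by the terminal comprises:
    receiving the current operation position acquired by the terminal, and identifying the current scene from the current operation position.
  4. The method according to any one of claims 1 to 3, wherein after performing word segmentation on the acquired text to obtain word segments, the method further comprises:
    selecting core keywords from the word segments; and
    deriving ontology words of different dimensions from the word segments by means of the escaped-word lexicon comprises:
    deriving ontology words of different dimensions from the core keywords by means of the escaped-word lexicon.
  5. The method according to claim 4, wherein performing word segmentation on the acquired text to obtain word segments comprises:
    loading a preset dictionary, and generating a prefix tree from the loaded preset dictionary;
    generating a directed acyclic graph from the prefix tree and the characters in the text, the directed acyclic graph representing the ways in which the characters in the text can form words; and
    finding the maximum-probability path in the directed acyclic graph by dynamic programming, and obtaining the word segments corresponding to the maximum-probability path.
  6. The method according to claim 5, further comprising:
    selecting from the text the characters that do not appear in the directed acyclic graph;
    obtaining a preset hidden Markov model; and
    segmenting the selected characters with the hidden Markov model to obtain word segments.
  7. The method according to claim 4, further comprising:
    receiving a management instruction for the ontology word; and
    modifying the corresponding ontology word according to the management instruction.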
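  8. An online question-answering apparatus, comprising:
    a text acquisition module, configured to receive text that a user has input and a terminal has acquired, and to clean the text;
    a first word segmentation module, configured to perform word segmentation on the cleaned text to obtain word segments;
    an escaped-word lexicon acquisition module, configured to identify the current scene acquired by the terminal and to load the escaped-word lexicon corresponding to the current scene;
    a derivation module, configured to derive ontology words of different dimensions from the word segments by means of the escaped-word lexicon;
    an output module, configured to look up the answer corresponding to the ontology words and to output the answer.
  9. The apparatus according to claim 8, wherein the derivation module comprises:
    a first retrieval unit, configured to check whether the escaped-word lexicon contains an ontology word corresponding to a current segment;
    a first output unit, configured to, when the escaped-word lexicon contains an ontology word corresponding to the current segment, perform dimension processing on the ontology word to obtain ontology words of different dimensions, and output the ontology words of different dimensions;
    a second retrieval unit, configured to, when the escaped-word lexicon contains no ontology word corresponding to the current segment, check whether the escaped-word lexicon contains a synonym corresponding to the current segment; and
    a second output unit, configured to, when the escaped-word lexicon contains a synonym corresponding to the current segment, replace the current segment with the synonym, and continue checking whether the escaped-word lexicon contains an ontology word corresponding to the current segment.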
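  10. A computer device, comprising a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps: receiving text that a user has input and a terminal has acquired, and cleaning the text; performing word segmentation on the cleaned text to obtain word segments; identifying the current scene acquired by the terminal, and loading the escaped-word lexicon corresponding to the current scene; deriving ontology words of different dimensions from the word segments by means of the escaped-word lexicon; and looking up the answer corresponding to the ontology words, and outputting the answer.
  11. The computer device according to claim 10, wherein deriving ontology words of different dimensions from the word segments by means of the escaped-word lexicon, as implemented when the processor executes the computer-readable instructions, comprises: checking whether the escaped-word lexicon contains an ontology word corresponding to a current segment; when the escaped-word lexicon contains an ontology word corresponding to the current segment, performing dimension processing on the ontology word to obtain ontology words of different dimensions, and outputting the ontology words of different dimensions; when the escaped-word lexicon contains no ontology word corresponding to the current segment, checking whether the escaped-word lexicon contains a synonym corresponding to the current segment; and when the escaped-word lexicon contains a synonym corresponding to the current segment, replacing the current segment with the synonym, and continuing to check whether the escaped-word lexicon contains an ontology word corresponding to the current segment.
  12. The computer device according to claim 10, wherein identifying the current scene acquired by the terminal, as implemented when the processor executes the computer-readable instructions, comprises: receiving the current operation position acquired by the terminal, and identifying the current scene from the current operation position.
  13. The computer device according to any one of claims 10 to 12, wherein after the word segmentation performed on the acquired text to obtain word segments, as implemented when the processor executes the computer-readable instructions, the steps further comprise: selecting core keywords from the word segments; and
    deriving ontology words of different dimensions from the word segments by means of the escaped-word lexicon, as implemented when the processor executes the computer-readable instructions, comprises: deriving ontology words of different dimensions from the core keywords by means of the escaped-word lexicon.
  14. The computer device according to claim 13, wherein performing word segmentation on the acquired text to obtain word segments, as implemented when the processor executes the computer-readable instructions, comprises: loading a preset dictionary, and generating a prefix tree from the loaded preset dictionary; generating a directed acyclic graph from the prefix tree and the characters in the text, the directed acyclic graph representing the ways in which the characters in the text can form words; and finding the maximum-probability path in the directed acyclic graph by dynamic programming, and obtaining the word segments corresponding to the maximum-probability path.
  15. The computer device according to claim 14, wherein the processor, when executing the computer-readable instructions, further performs the following steps: selecting from the text the characters that do not appear in the directed acyclic graph; obtaining a preset hidden Markov model; and segmenting the selected characters with the hidden Markov model to obtain word segments.
  16. The computer device according to claim 13, wherein the processor, when executing the computer-readable instructions, further performs the following steps: receiving a management instruction for the ontology word; and modifying the corresponding ontology word according to the management instruction.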
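  17. One or more non-volatile computer-readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps: receiving text that a user has input and a terminal has acquired, and cleaning the text; performing word segmentation on the cleaned text to obtain word segments; identifying the current scene acquired by the terminal, and loading the escaped-word lexicon corresponding to the current scene; deriving ontology words of different dimensions from the word segments by means of the escaped-word lexicon; and looking up the answer corresponding to the ontology words, and outputting the answer.
  18. The storage medium according to claim 17, wherein deriving ontology words of different dimensions from the word segments by means of the escaped-word lexicon, as implemented when the computer-readable instructions are executed by the processor, comprises: checking whether the escaped-word lexicon contains an ontology word corresponding to a current segment; when the escaped-word lexicon contains an ontology word corresponding to the current segment, performing dimension processing on the ontology word to obtain ontology words of different dimensions, and outputting the ontology words of different dimensions; when the escaped-word lexicon contains no ontology word corresponding to the current segment, checking whether the escaped-word lexicon contains a synonym corresponding to the current segment; and when the escaped-word lexicon contains a synonym corresponding to the current segment, replacing the current segment with the synonym, and continuing to check whether the escaped-word lexicon contains an ontology word corresponding to the current segment.
  19. The storage medium according to claim 17, wherein identifying the current scene acquired by the terminal, as implemented when the computer-readable instructions are executed by the processor, comprises: receiving the current operation position acquired by the terminal, and identifying the current scene from the current operation position.
  20. The storage medium according to any one of claims 17 to 19, wherein after the word segmentation performed on the acquired text to obtain word segments, as implemented when the computer-readable instructions are executed by the processor, the steps further comprise: selecting core keywords from the word segments; and
    deriving ontology words of different dimensions from the word segments by means of the escaped-word lexicon, as implemented when the computer-readable instructions are executed by the processor, comprises: deriving ontology words of different dimensions from the core keywords by means of the escaped-word lexicon.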
PCT/CN2019/071524 2018-07-04 2019-01-14 Online question-answering method and apparatus, computer device, and storage medium WO2020007027A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810724612.3 2018-07-04
CN201810724612.3A CN108986910B (zh) 2018-07-04 2018-07-04 Online question-answering method and apparatus, computer device, and storage medium

Publications (1)

Publication Number Publication Date
WO2020007027A1 true WO2020007027A1 (zh) 2020-01-09

Family

ID=64536215

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/071524 WO2020007027A1 (zh) 2018-07-04 2019-01-14 Online question-answering method and apparatus, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN108986910B (zh)
WO (1) WO2020007027A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112307759A (zh) * 2020-11-09 2021-02-02 西安交通大学 一种面向社交网络不规则短文本的粤语分词方法
CN112765963A (zh) * 2020-12-31 2021-05-07 北京锐安科技有限公司 语句分词方法、装置、计算机设备及存储介质
CN113033193A (zh) * 2021-01-20 2021-06-25 山谷网安科技股份有限公司 一种基于c++语言的混合型中文文本分词方法

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109710087B (zh) * 2018-12-28 2023-01-13 北京金山安全软件有限公司 输入法模型生成方法及装置
CN109992776B (zh) * 2019-03-26 2023-07-25 北京博瑞彤芸文化传播股份有限公司 一种中文分词方法
CN110110133B (zh) * 2019-04-18 2020-08-11 贝壳找房(北京)科技有限公司 一种智能语音数据生成方法及装置
CN110388933A (zh) * 2019-07-22 2019-10-29 上海图聚智能科技股份有限公司 兴趣点搜索方法、装置、服务器及存储介质
CN110751234B (zh) * 2019-10-09 2024-04-16 科大讯飞股份有限公司 Ocr识别纠错方法、装置及设备
CN111291195B (zh) * 2020-01-21 2021-08-10 腾讯科技(深圳)有限公司 一种数据处理方法、装置、终端及可读存储介质
CN112559865B (zh) * 2020-12-15 2023-12-08 泰康保险集团股份有限公司 信息处理系统、计算机可读存储介质及电子设备

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104317846A (zh) * 2014-10-13 2015-01-28 安徽华贞信息科技有限公司 一种语义分析与标注方法及系统
CN106528540A (zh) * 2016-12-16 2017-03-22 广州索答信息科技有限公司 一种种子问句的分词方法和分词系统
CN106599215A (zh) * 2016-12-16 2017-04-26 广州索答信息科技有限公司 一种基于深度学习的问句生成方法和问句生成系统
CN107783957A (zh) * 2016-08-30 2018-03-09 中国电信股份有限公司 本体创建方法和装置
CN107993724A (zh) * 2017-11-09 2018-05-04 易保互联医疗信息科技(北京)有限公司 一种医学智能问答数据处理的方法及装置

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101183281B (zh) * 2007-12-26 2011-04-13 腾讯科技(深圳)有限公司 一种输入法中候选词的相关词输入的方法及系统
CN103902652A (zh) * 2014-02-27 2014-07-02 深圳市智搜信息技术有限公司 自动问答系统
CN106844647A (zh) * 2017-01-22 2017-06-13 南方科技大学 一种搜索关键词获取的方法及装置
CN107220380A (zh) * 2017-06-27 2017-09-29 北京百度网讯科技有限公司 基于人工智能的问答推荐方法、装置和计算机设备
CN107688608A (zh) * 2017-07-28 2018-02-13 合肥美的智能科技有限公司 智能语音问答方法、装置、计算机设备和可读存储介质
CN107590124B (zh) * 2017-09-06 2020-12-04 耀灵人工智能(浙江)有限公司 按场景对同义词替换并根据按场景归类的标准词组比对的方法
CN107766511A (zh) * 2017-10-23 2018-03-06 深圳市前海众兴电子商务有限公司 智能问答方法、终端及存储介质

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104317846A (zh) * 2014-10-13 2015-01-28 安徽华贞信息科技有限公司 一种语义分析与标注方法及系统
CN107783957A (zh) * 2016-08-30 2018-03-09 中国电信股份有限公司 本体创建方法和装置
CN106528540A (zh) * 2016-12-16 2017-03-22 广州索答信息科技有限公司 一种种子问句的分词方法和分词系统
CN106599215A (zh) * 2016-12-16 2017-04-26 广州索答信息科技有限公司 一种基于深度学习的问句生成方法和问句生成系统
CN107993724A (zh) * 2017-11-09 2018-05-04 易保互联医疗信息科技(北京)有限公司 一种医学智能问答数据处理的方法及装置

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
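CN112307759A (zh) * 2020-11-09 2021-02-02 西安交通大学 Cantonese word segmentation method for irregular short texts on social networks
CN112307759B (zh) * 2020-11-09 2024-04-12 西安交通大学 Cantonese word segmentation method for irregular short texts on social networks
CN112765963A (zh) * 2020-12-31 2021-05-07 北京锐安科技有限公司 Sentence word segmentation method, apparatus, computer device and storage medium
CN113033193A (zh) * 2021-01-20 2021-06-25 山谷网安科技股份有限公司 Hybrid Chinese text word segmentation method based on the C++ language
CN113033193B (zh) * 2021-01-20 2024-04-16 山谷网安科技股份有限公司 Hybrid Chinese text word segmentation method based on the C++ language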

Also Published As

Publication number Publication date
CN108986910A (zh) 2018-12-11
CN108986910B (zh) 2023-09-05

Similar Documents

Publication Publication Date Title
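WO2020007027A1 (zh) Online question-answering method, apparatus, computer device and storage medium
WO2020057022A1 (zh) Association recommendation method, apparatus, computer device and storage medium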
US10585924B2 (en) Processing natural-language documents and queries
US20160041986A1 (en) Smart Search Engine
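CN111984851B (zh) Medical data search method, apparatus, electronic device and storage medium
WO2021120627A1 (zh) Data search and matching method, apparatus, computer device and storage medium
CN109933785A (zh) Method, apparatus, device and medium for entity association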
US11295861B2 (en) Extracted concept normalization using external evidence
US9798776B2 (en) Systems and methods for parsing search queries
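CN108874773B (zh) Keyword addition method, apparatus, computer device and storage medium
CN112115232A (zh) Data error correction method, apparatus and server
CN112883165B (zh) Intelligent full-text retrieval method and system based on semantic understanding
KR102292040B1 (ko) System and method for knowledge extraction based on machine reading comprehension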
US12008473B2 (en) Augmenting machine learning language models using search engine results
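CN111859950A (zh) Method for automatically generating lecture scripts
CN113343692B (zh) Search intent recognition method, model training method, apparatus, medium and device
KR20120042562A (ko) Method for building a named-entity dictionary using an online dictionary and apparatus for executing the same
JP2019082860A (ja) Generation program, generation method and generation device
CN117076636A (zh) Information query method, system and device for intelligent customer service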
Ahmed et al. Developing an ontology of concepts in the Qur'an
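WO2019148797A1 (zh) Natural language processing method, apparatus, computer device and storage medium
CN115114420A (zh) Knowledge graph question-answering method, terminal device and storage medium
CN113297854A (zh) Method, apparatus, device and storage medium for mapping text to knowledge graph entities
CN111859926A (zh) Synonymous sentence pair generation method, apparatus, computer device and storage medium
JP2004318381A (ja) Synonymity calculation method, synonymity calculation program, and computer-readable recording medium storing the synonymity calculation program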

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 19830167; Country of ref document: EP; Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

32PN EP: public notification in the EP bulletin as address of the addressee cannot be established
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 15/04/2021)

122 EP: PCT application non-entry in European phase
Ref document number: 19830167; Country of ref document: EP; Kind code of ref document: A1