WO2021174923A1 - Concept word sequence generation method and apparatus, computer device, and storage medium - Google Patents

Concept word sequence generation method and apparatus, computer device, and storage medium

Info

Publication number
WO2021174923A1
Authority
WO
WIPO (PCT)
Prior art keywords
concept
word
knowledge base
keyword
words
Prior art date
Application number
PCT/CN2020/131954
Other languages
English (en)
French (fr)
Inventor
蒙元
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2021174923A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/279 - Recognition of textual entities
    • G06F40/289 - Phrasal analysis, e.g. finite state techniques or chunking
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 - Indexing; Data structures therefor; Storage structures
    • G06F16/316 - Indexing structures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/3331 - Query processing
    • G06F16/334 - Query execution
    • G06F16/3344 - Query execution using natural language analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/205 - Parsing
    • G06F40/211 - Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/30 - Semantic analysis

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to a method, device, computer equipment, and storage medium for generating a concept word sequence.
  • In the field of artificial intelligence, intelligent customer service and remote consultation are important application areas.
  • Intelligent customer service or remote consultation requires question-and-answer matching capabilities and recommendation capabilities.
  • Concept word sequence is the basis for intelligent customer service to have the ability to match question and answer and recommend.
  • the concept word sequence is a concept index corresponding to the keywords in the question sentence.
  • the inventor realizes that when generating a concept word sequence, considerable abstraction time is required to generate keywords, and considerable matching time is required to match the concept words according to the keywords.
  • the first aspect of the present application provides a concept word sequence generation method, and the concept word sequence generation method includes:
  • acquiring a question sentence and a concept knowledge base, the concept knowledge base including a plurality of sample sentences, where each sample sentence corresponds to a plurality of keywords, and each keyword of the sample sentence corresponds to a concept word; extracting the keywords of the question sentence from the question sentence according to the keywords of the concept knowledge base; and determining the concept word corresponding to each keyword of the question sentence according to the corresponding relationship between the keywords and the concept words in the concept knowledge base;
  • the concept words corresponding to the keywords of the question sentence are combined into a sequence of concept words.
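  • The steps of the first aspect can be sketched as a minimal pipeline. All names below (`generate_concept_word_sequence`, the dict-based knowledge base) are illustrative assumptions for exposition, not part of the claimed method, and keyword extraction is reduced to naive whitespace tokenization:

```python
def generate_concept_word_sequence(question, knowledge_base):
    """Sketch of the claimed steps; `knowledge_base` maps keywords to concept words."""
    # Steps 1-2: the question sentence and the concept knowledge base are the inputs.
    # Step 3: extract the question's keywords using the knowledge base's keywords
    # (reduced here to whitespace tokenization plus membership lookup).
    keywords = [w for w in question.split() if w in knowledge_base]
    # Step 4: determine the concept word corresponding to each extracted keyword.
    # Step 5: combine the concept words in the keywords' original word order.
    return [knowledge_base[w] for w in keywords]

kb = {"programmer": "computer practitioner", "code farmer": "computer practitioner"}
seq = generate_concept_word_sequence("the programmer wrote code", kb)
assert seq == ["computer practitioner"]
```

In practice the extraction and determination steps use the segmentation scoring and joint-probability scoring described later in this publication; the lookup here is a placeholder for those steps.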
  • a second aspect of the present application provides a concept word sequence generating device, the concept word sequence generating device includes:
  • the first acquisition module is used to acquire question sentences
  • the second acquisition module is used to acquire a conceptual knowledge base, the conceptual knowledge base includes a plurality of sample sentences, each sample sentence corresponds to a plurality of keywords, and each keyword of the sample sentence corresponds to a concept word;
  • An extraction module for extracting keywords of the question sentence from the question sentence according to the keywords of the concept knowledge base
  • the determining module is used to determine the concept word corresponding to the keyword of the question sentence according to the corresponding relationship between the keyword in the concept knowledge base and the concept word;
  • the combination module is used to combine the concept words corresponding to the keywords of the question sentence into a sequence of concept words according to the word order of the keywords of the question sentence.
  • a third aspect of the present application provides a computer device that includes a processor, and the processor is configured to implement the following steps when executing computer-readable instructions stored in a memory:
  • acquiring a question sentence and a concept knowledge base, the concept knowledge base including a plurality of sample sentences, where each sample sentence corresponds to a plurality of keywords, and each keyword of the sample sentence corresponds to a concept word; extracting the keywords of the question sentence from the question sentence according to the keywords of the concept knowledge base; and determining the concept word corresponding to each keyword of the question sentence according to the corresponding relationship between the keywords and the concept words in the concept knowledge base;
  • the concept words corresponding to the keywords of the question sentence are combined into a sequence of concept words.
  • a fourth aspect of the present application provides a computer-readable storage medium having computer-readable instructions stored thereon, and when the computer-readable instructions are executed by a processor, the following steps are implemented:
  • acquiring a question sentence and a concept knowledge base, the concept knowledge base including a plurality of sample sentences, where each sample sentence corresponds to a plurality of keywords, and each keyword of the sample sentence corresponds to a concept word; extracting the keywords of the question sentence from the question sentence according to the keywords of the concept knowledge base; and determining the concept word corresponding to each keyword of the question sentence according to the corresponding relationship between the keywords and the concept words in the concept knowledge base;
  • the concept words corresponding to the keywords of the question sentence are combined into a sequence of concept words.
  • the present application determines the concept word corresponding to the keyword of the question sentence according to the corresponding relationship between the keyword in the concept knowledge base and the concept word, and generates the concept word sequence of the sentence, so as to improve the efficiency of generating the concept word sequence.
  • Fig. 1 is a flowchart of a method for generating a concept word sequence provided by an embodiment of the present application.
  • FIG. 2 is a flowchart of determining the conceptual words corresponding to the keywords of the question sentence provided by the embodiment of the present application.
  • Fig. 3 is a flowchart of calculating the probability score of a concept word combination provided by an embodiment of the present application.
  • Fig. 4 is a structural diagram of a conceptual word sequence generating device provided by an embodiment of the present application.
  • Fig. 5 is a schematic diagram of a computer device provided by an embodiment of the present application.
  • the concept word sequence generation method of the present application is applied to one or more computer devices.
  • the computer device is a device that can automatically perform numerical calculation and/or information processing in accordance with pre-set or stored instructions.
  • Its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), embedded devices, etc.
  • the computer device may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • the computer device can interact with the user through a keyboard, a mouse, a remote control, a touch panel, or a voice control device.
  • FIG. 1 is a flowchart of a method for generating a concept word sequence provided in Embodiment 1 of the present application.
  • the concept word sequence generation method is applied to a computer device to generate the concept word sequence of a sentence and improve the efficiency of generating the concept word sequence.
  • the concept word sequence generation method includes:
  • the obtaining of the question sentence may include: pulling the question sentence from cloud storage; or receiving the question sentence input by the user; or collecting an image including the question sentence through a camera and recognizing the question sentence in the image through a character recognition method. This application does not specifically limit the manner of obtaining the question sentence.
  • the question sentence may be a question sentence related to medical insurance, for example, the question sentence is "Which diseases are covered by a million doctors".
  • the concept knowledge base includes a plurality of sample sentences, each sample sentence corresponds to a plurality of keywords, and each keyword of the sample sentence corresponds to a concept word.
  • for example, the sample sentence is "I ordered 5 catties of apples on the iPhone", and the multiple keywords corresponding to the sample sentence are the first "apple" and the second "apple": the first "apple" corresponds to one concept word, and the second "apple" corresponds to another concept word, "fruit apple".
  • in another example, the sample sentence is "This programmer is often called a code farmer by others", and the keywords corresponding to the sample sentence are "programmer" and "code farmer": "programmer" corresponds to the concept word "computer practitioner", and "code farmer" also corresponds to the concept word "computer practitioner".
  • the multiple keywords corresponding to each sample sentence may be the entity object in the sample sentence, or may be the key intent of the sample sentence.
  • on this basis, human knowledge needs to be added, that is, the corresponding relationship between the keywords and the concept words corresponding to the keywords. For example, an apple can be considered a fruit or a mobile phone.
  • Indexing, matching, and recommending through the correspondence between concept words and keywords can reduce the calculation delay of the question-and-answer matching model.
  • the connection between concept words can also be used to recommend insurance products.
  • the conceptual knowledge base includes a plurality of sample sentences, and each sample sentence corresponds to a plurality of keywords. That is, the concept knowledge base contains multiple sample sentences marked with keywords, which can be used as historical data with reference value, and keywords can be extracted from the question sentence based on statistical methods.
  • the extracting the keywords of the question sentence from the question sentences according to the keywords of the concept knowledge base includes:
  • the question sentence is segmented into words, and the word segmentation result includes K words.
  • for any word in the word segmentation result, the shorter the word, the lower its length score; the most similar keyword to the word is obtained from the concept knowledge base, and the reciprocal of the similarity between the word and the most similar keyword is determined as the word's similarity score, so that the higher the similarity between the word and the most similar keyword, the lower the word's similarity score.
  • the word segmentation result with the lowest keyword score is selected as the keywords: keywords = argmin(K, keyword) Σ_{k=1}^{K} (cost(keyword[k]) + len(keyword[k])), where argmin(K, keyword) represents the values of the variables K and keyword at which the function that follows reaches its minimum, cost(keyword[k]) represents the similarity score of the k-th word among the K words, and len(keyword[k]) represents the length score of the k-th word among the K words.
  • the obtaining the most similar keywords of any term from the concept knowledge base may include:
  • the keyword with the smallest Euclidean distance to the word is determined as the most similar keyword.
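  • The segmentation-scoring step above can be sketched as follows. This is a toy sketch, not the claimed implementation: a character-overlap similarity stands in for the word-vector similarity, segmentations are enumerated exhaustively, and all function names are hypothetical:

```python
def segmentations(s):
    """Enumerate every way to split string s into contiguous non-empty words."""
    if not s:
        yield []
        return
    for i in range(1, len(s) + 1):
        head = s[:i]
        for rest in segmentations(s[i:]):
            yield [head] + rest

def similarity(a, b):
    """Toy stand-in for word-vector similarity: character-set Jaccard overlap."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 0.0

def extract_keywords(sentence, kb_keywords):
    """argmin over (K, keyword) of the summed similarity score + length score."""
    best, best_score = None, float("inf")
    for seg in segmentations(sentence):
        score = 0.0
        for word in seg:
            sim = max(similarity(word, k) for k in kb_keywords)
            # similarity score: reciprocal of similarity to the most similar keyword
            score += (1.0 / sim) if sim > 0 else 1e9
            # length score: shorter words get a lower length score
            score += len(word)
        if score < best_score:
            best, best_score = seg, score
    return best

words = extract_keywords("applephone", ["apple", "phone"])
assert "".join(words) == "applephone"  # the chosen segmentation covers the sentence
```

Exhaustive enumeration is exponential in sentence length; a real implementation would use dynamic programming over segmentation positions.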
  • the concept knowledge base includes a plurality of sample sentences, each sample sentence corresponds to a plurality of keywords, and each keyword of the sample sentence corresponds to a concept word. That is, there is a corresponding relationship between keywords and concept words in the concept knowledge base.
  • a statistical method may be used to extract keywords from the question sentence based on the corresponding relationship between the keywords in the concept knowledge base and the concept words.
  • the determining the concept word corresponding to the keyword of the question sentence according to the corresponding relationship between the keyword in the concept knowledge base and the concept word includes:
  • the question sentence includes Keyword 1 and Keyword 2.
  • keyword 1 in sample sentence 1 corresponds to concept word 11
  • keyword 1 in sample sentence 2 corresponds to concept word 12
  • keyword 2 in sample sentence 3 corresponds to concept word 21, and keyword 2 in sample sentence 4 corresponds to concept word 22.
  • the multiple concept words corresponding to the keyword 1 of the question sentence are concept words 11 and 12
  • the multiple concept words corresponding to the keyword 2 of the question sentence are concept words 21 and 22.
  • the concept word combination of the question sentence may include "concept word 11-concept word 21", "concept word 11-concept word 22", "concept word 12-concept word 21", and "concept word 12-concept word 22".
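  • The four combinations above correspond to a Cartesian product over each keyword's candidate concept words, which can be sketched as follows (the names `candidates` and `combinations` are illustrative):

```python
from itertools import product

# Candidate concept words per keyword, in the question's keyword order.
candidates = {
    "keyword1": ["concept11", "concept12"],
    "keyword2": ["concept21", "concept22"],
}
# Each combination picks one concept word per keyword, preserving keyword order.
combinations = [dict(zip(candidates, chosen)) for chosen in product(*candidates.values())]
assert len(combinations) == 4
assert combinations[0] == {"keyword1": "concept11", "keyword2": "concept21"}
```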
  • the highest probability score of the concept word combination may be calculated based on the joint probability.
  • the optimized objective function is: (e_1, e_2, ..., e_N) = argmax P(e_1, e_2, ..., e_N | w_1, w_2, ..., w_N), where w_n is the n-th keyword of the question sentence, e_n is the concept word corresponding to the n-th keyword of the question sentence, 1 ≤ n ≤ N, and N is the number of keywords of the question sentence.
  • that is, the probability score of a concept word combination is calculated based on the joint probability P(e_1, e_2, ..., e_N | w_1, w_2, ..., w_N).
  • the calculating the probability score of the concept word combination further includes:
  • a keyword is randomly selected from the concept knowledge base, and the second probability that the selected keyword is consistent with each keyword of the question sentence is calculated according to the keywords and concept words of the concept knowledge base, to obtain multiple second probabilities;
  • the product of the plurality of first probabilities and the plurality of second probabilities is the probability score: P(e_1, e_2, ..., e_N | w_1, w_2, ..., w_N) ≈ ∏_{i,j} P(e_i, e_j) · ∏_n P(w_n | e_n), where ∏_{i,j} P(e_i, e_j) is a simplified form of the global joint information of the concept word combination, and P(w_n | e_n) is the probability that the keyword w_n corresponds to the concept word e_n.
  • two target concept words are randomly selected from the concept knowledge base, and the first probability that the two target concept words are consistent with any two concept words of the concept word combination is calculated according to the concept words in the concept knowledge base, to obtain multiple first probabilities, including:
  • (d1) Calculate the ratio of the first number of the first concept word pair to the second number of the plurality of second concept word pairs, and use this ratio as the first probability of any two concept words in the concept word combination, to obtain multiple first probabilities.
  • the ratio of the first number of the first concept word pair to the second number of the plurality of second concept word pairs is proportional to P(e_i, e_j): P(e_i, e_j) = count[e_i, e_j] / Σ count[e_i', e_j'], where count[e_i, e_j] represents the first number of the first concept word pair, and Σ count[e_i', e_j'] represents the second number of the plurality of second concept word pairs.
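  • Under the count-ratio definition above, the first probability P(e_i, e_j) can be estimated from pair counts over the sample sentences. This sketch assumes each sample sentence is given as its list of annotated concept words; the function name is illustrative:

```python
from collections import Counter

def first_probability(pair, sample_concepts):
    """P(e_i, e_j): count of this concept-word pair over the count of all pairs."""
    counts = Counter()
    for concepts in sample_concepts:  # one list of concept words per sample sentence
        for i in range(len(concepts)):
            for j in range(i + 1, len(concepts)):
                counts[frozenset((concepts[i], concepts[j]))] += 1
    total = sum(counts.values())  # second number: all concept word pairs
    return counts[frozenset(pair)] / total if total else 0.0

samples = [["A", "B"], ["A", "B"], ["A", "C"]]
p = first_probability(("A", "B"), samples)
assert abs(p - 2 / 3) < 1e-9  # pair (A, B) accounts for 2 of the 3 counted pairs
```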
  • two target concept words are randomly selected from the concept knowledge base, and the first probability that the two target concept words are consistent with any two concept words of the concept word combination is calculated according to the concept words in the concept knowledge base, to obtain multiple first probabilities, including:
  • (b2) Obtain a target keyword pair corresponding to the third concept word pair from the keywords of the question sentence, obtain multiple fourth concept word pairs corresponding to the target keyword pair from the concept knowledge base, search for the plurality of fourth concept word pairs in each sample sentence in the concept knowledge base, and count the fourth number of the plurality of fourth concept word pairs found in the concept knowledge base.
  • a keyword is randomly selected from the concept knowledge base, and the second probability that the selected keyword is consistent with each keyword of the question sentence is calculated according to the keywords and concept words of the concept knowledge base, to obtain multiple second probabilities, including:
  • the ratio of the sixth number to the fifth number is P(w_n | e_n): P(w_n | e_n) = count[w_n, e_n] / count[e_n], where count[w_n, e_n] represents the sixth number of the first target word pairs, and count[e_n] represents the fifth number, i.e., the number of occurrences of the concept word e_n.
  • a keyword is randomly selected from the concept knowledge base, and the second probability that the selected keyword is consistent with each keyword of the question sentence is calculated according to the keywords and concept words of the concept knowledge base, to obtain multiple second probabilities, including:
  • P(w_n | e_n, w_{n-1}, w_{n-2}, w_{n+1}, w_{n+2}) = count[w_n, e_n, w_{n-1}, w_{n-2}, w_{n+1}, w_{n+2}] / count[e_n, w_{n-1}, w_{n-2}, w_{n+1}, w_{n+2}], where count[w_n, e_n, w_{n-1}, w_{n-2}, w_{n+1}, w_{n+2}] represents the eighth number of the fourth target word pairs, and count[e_n, w_{n-1}, w_{n-2}, w_{n+1}, w_{n+2}] represents the seventh number of the third target word pairs.
  • w n-1 , w n-2 , w n+1 , and w n+2 represent the context information, such as two words before and after the designated keyword.
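  • The context-conditioned second probability can be sketched as a ratio of counts over annotated samples. Representing the context window as a single hashable tuple, and the `annotated` tuple layout, are simplifying assumptions for illustration:

```python
def second_probability(w, e, context, annotated):
    """P(w_n | e_n, context) = count[w, e, context] / count[e, context]."""
    num = sum(1 for kw, c, ctx in annotated if kw == w and c == e and ctx == context)
    den = sum(1 for _kw, c, ctx in annotated if c == e and ctx == context)
    return num / den if den else 0.0

# Each annotated sample: (keyword, concept word, context window as a tuple).
annotated = [
    ("apple", "fruit", ("ate", "an")),
    ("apple", "fruit", ("ate", "an")),
    ("pear", "fruit", ("ate", "an")),
]
p = second_probability("apple", "fruit", ("ate", "an"), annotated)
assert abs(p - 2 / 3) < 1e-9  # "apple" emits "fruit" in 2 of 3 matching contexts
```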
  • the question sentence is "Which diseases are insured for a million doctors", and the keywords corresponding to the concept words of the question sentence are "eshengsafe million doctors, what, diseases, protection".
  • a preset word queue is used to convert the keywords of the question sentence into word vectors; the multiple word vectors obtained by the conversion are combined into a concept word sequence according to the word order of the keywords of the question sentence and stored in the preset word queue.
  • the concept word sequence is an abstract representation of the question sentence, and can be used as intermediate data for further natural language processing of the question sentence.
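  • A minimal sketch of assembling the sequence, assuming the "preset word queue" behaves like a vocabulary list and using a word's queue index as a stand-in one-dimensional word vector (both are illustrative assumptions):

```python
# The "preset word queue" is modeled as a vocabulary list; a word's index in the
# queue serves as a stand-in one-dimensional word vector.
word_queue = ["computer practitioner", "fruit apple", "mobile phone"]

def to_concept_sequence(concept_words):
    """Convert concept words to vectors, preserving the keywords' word order."""
    return [word_queue.index(w) for w in concept_words]

seq = to_concept_sequence(["computer practitioner", "fruit apple"])
assert seq == [0, 1]
```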
  • the concept word sequence generation method further includes:
  • the answer corresponding to the question sentence is matched according to the concept word sequence.
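  • Matching an answer by concept word sequence can be sketched as an index lookup keyed by the sequence; the `qa_index` structure and the sample answer are assumptions, not the patent's data model:

```python
def match_answer(concept_sequence, qa_index):
    """Return the answer whose index key equals the concept word sequence."""
    return qa_index.get(tuple(concept_sequence), None)

# Hypothetical index mapping concept word sequences to canned answers.
qa_index = {("computer practitioner",): "sample answer"}
assert match_answer(["computer practitioner"], qa_index) == "sample answer"
```

Because many surface phrasings map to the same concept word sequence, indexing answers by the sequence rather than the raw question reduces the matching work described in the background section.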
  • the efficiency of generating the concept word sequence is improved, thereby increasing the accuracy of question-and-answer matching based on the concept word sequence of the sentence.
  • the concept word sequence may also be stored in a node of a blockchain.
  • the concept word sequence generation method of the first embodiment determines the concept word corresponding to each keyword of the question sentence according to the corresponding relationship between the keywords and the concept words in the concept knowledge base, and generates the concept word sequence of the sentence, so as to improve the efficiency of generating the concept word sequence.
  • in this way, the efficiency of generating concept word sequences in remote medical consultation can be improved, and the accuracy of question answering in remote consultation can be increased, which is beneficial to the development of remote medical services.
  • Fig. 4 is a structural diagram of a conceptual word sequence generating apparatus provided in the second embodiment of the present application.
  • the concept word sequence generating device 20 is applied to computer equipment.
  • the concept word sequence generating device 20 is used to generate the concept word sequence of the sentence, and improve the efficiency of generating the concept word sequence.
  • the concept word sequence generating device 20 may include a first acquisition module 201, a second acquisition module 202, an extraction module 203, a determination module 204, and a combination module 205.
  • the first obtaining module 201 is used to obtain question sentences.
  • the obtaining of question sentences may include: pulling question sentences from cloud storage; or receiving question sentences input by users; or collecting images including question sentences through a camera and recognizing the question sentences in the images through a character recognition method. This application does not specifically limit the manner of obtaining question sentences.
  • the question sentence may be a question sentence related to medical insurance, for example, the question sentence is "Which diseases are covered by a million doctors".
  • the second acquisition module 202 is configured to acquire a concept knowledge base, the concept knowledge base including a plurality of sample sentences, each sample sentence corresponds to a plurality of keywords, and each keyword of the sample sentence corresponds to a concept word.
  • for example, the sample sentence is "I ordered 5 catties of apples on the iPhone", and the multiple keywords corresponding to the sample sentence are the first "apple" and the second "apple": the first "apple" corresponds to one concept word, and the second "apple" corresponds to another concept word, "fruit apple".
  • in another example, the sample sentence is "This programmer is often called a code farmer by others", and the keywords corresponding to the sample sentence are "programmer" and "code farmer": "programmer" corresponds to the concept word "computer practitioner", and "code farmer" also corresponds to the concept word "computer practitioner".
  • the multiple keywords corresponding to each sample sentence may be the entity object in the sample sentence, or may be the key intent of the sample sentence.
  • on this basis, human knowledge needs to be added, that is, the corresponding relationship between the keywords and the concept words corresponding to the keywords. For example, an apple can be considered a fruit or a mobile phone.
  • Indexing, matching, and recommending through the correspondence between concept words and keywords can reduce the calculation delay of the question-and-answer matching model.
  • the connection between concept words can also be used to recommend insurance products.
  • the extraction module 203 is configured to extract keywords of the question sentence from the question sentence according to the keywords of the concept knowledge base.
  • the conceptual knowledge base includes a plurality of sample sentences, and each sample sentence corresponds to a plurality of keywords. That is, the concept knowledge base contains multiple sample sentences marked with keywords, which can be used as historical data with reference value, and keywords can be extracted from the question sentence based on statistical methods.
  • the extracting the keywords of the question sentence from the question sentences according to the keywords of the concept knowledge base includes:
  • the question sentence is segmented into words, and the word segmentation result includes K words.
  • for any word in the word segmentation result, the shorter the word, the lower its length score; the most similar keyword to the word is obtained from the concept knowledge base, and the reciprocal of the similarity between the word and the most similar keyword is determined as the word's similarity score, so that the higher the similarity between the word and the most similar keyword, the lower the word's similarity score.
  • the word segmentation result with the lowest keyword score is selected as the keywords: keywords = argmin(K, keyword) Σ_{k=1}^{K} (cost(keyword[k]) + len(keyword[k])), where argmin(K, keyword) represents the values of the variables K and keyword at which the function that follows reaches its minimum, cost(keyword[k]) represents the similarity score of the k-th word among the K words, and len(keyword[k]) represents the length score of the k-th word among the K words.
  • the obtaining the most similar keywords of any term from the concept knowledge base may include:
  • the keyword with the smallest Euclidean distance to the word is determined as the most similar keyword.
  • the determining module 204 is configured to determine the concept word corresponding to the keyword of the question sentence according to the corresponding relationship between the keyword in the concept knowledge base and the concept word.
  • the concept knowledge base includes a plurality of sample sentences, each sample sentence corresponds to a plurality of keywords, and each keyword of the sample sentence corresponds to a concept word. That is, there is a corresponding relationship between keywords and concept words in the concept knowledge base.
  • a statistical method may be used to extract keywords from the question sentence based on the corresponding relationship between the keywords in the concept knowledge base and the concept words.
  • the determining the concept word corresponding to the keyword of the question sentence according to the corresponding relationship between the keyword in the concept knowledge base and the concept word includes:
  • the question sentence includes Keyword 1 and Keyword 2.
  • keyword 1 in sample sentence 1 corresponds to concept word 11
  • keyword 1 in sample sentence 2 corresponds to concept word 12
  • keyword 2 in sample sentence 3 corresponds to concept word 21, and keyword 2 in sample sentence 4 corresponds to concept word 22.
  • the multiple concept words corresponding to the keyword 1 of the question sentence are concept words 11 and 12
  • the multiple concept words corresponding to the keyword 2 of the question sentence are concept words 21 and 22.
  • the concept word combination of the question sentence may include "concept word 11-concept word 21", "concept word 11-concept word 22", "concept word 12-concept word 21", and "concept word 12-concept word 22".
  • the highest probability score of the concept word combination may be calculated based on the joint probability.
  • the optimized objective function is: (e_1, e_2, ..., e_N) = argmax P(e_1, e_2, ..., e_N | w_1, w_2, ..., w_N), where w_n is the n-th keyword of the question sentence, e_n is the concept word corresponding to the n-th keyword of the question sentence, 1 ≤ n ≤ N, and N is the number of keywords of the question sentence.
  • that is, the probability score of a concept word combination is calculated based on the joint probability P(e_1, e_2, ..., e_N | w_1, w_2, ..., w_N).
  • the calculating the probability score of the concept word combination further includes:
  • a keyword is randomly selected from the concept knowledge base, and the second probability that the selected keyword is consistent with each keyword of the question sentence is calculated according to the keywords and concept words of the concept knowledge base, to obtain multiple second probabilities;
  • the product of the plurality of first probabilities and the plurality of second probabilities is the probability score: P(e_1, e_2, ..., e_N | w_1, w_2, ..., w_N) ≈ ∏_{i,j} P(e_i, e_j) · ∏_n P(w_n | e_n), where ∏_{i,j} P(e_i, e_j) is a simplified form of the global joint information of the concept word combination, and P(w_n | e_n) is the probability that the keyword w_n corresponds to the concept word e_n.
  • two target concept words are randomly selected from the concept knowledge base, and the first probability that the two target concept words are consistent with any two concept words of the concept word combination is calculated according to the concept words in the concept knowledge base, to obtain multiple first probabilities, including:
  • (d1) Calculate the ratio of the first number of the first concept word pair to the second number of the plurality of second concept word pairs, and use this ratio as the first probability of any two concept words in the concept word combination, to obtain multiple first probabilities.
  • the ratio of the first number of the first concept word pair to the second number of the plurality of second concept word pairs is proportional to P(e_i, e_j): P(e_i, e_j) = count[e_i, e_j] / Σ count[e_i', e_j'], where count[e_i, e_j] represents the first number of the first concept word pair, and Σ count[e_i', e_j'] represents the second number of the plurality of second concept word pairs.
  • two target concept words are randomly selected from the concept knowledge base, and the first probability that the two target concept words are consistent with any two concept words of the concept word combination is calculated according to the concept words in the concept knowledge base, to obtain multiple first probabilities, including:
  • (b2) Obtain a target keyword pair corresponding to the third concept word pair from the keywords of the question sentence, obtain multiple fourth concept word pairs corresponding to the target keyword pair from the concept knowledge base, search for the plurality of fourth concept word pairs in each sample sentence in the concept knowledge base, and count the fourth number of the plurality of fourth concept word pairs found in the concept knowledge base.
  • a keyword is randomly selected from the concept knowledge base, and the second probability that the selected keyword is consistent with each keyword of the question sentence is calculated according to the keywords and concept words of the concept knowledge base, to obtain multiple second probabilities, including:
  • the ratio of the sixth number to the fifth number is P(w_n | e_n): P(w_n | e_n) = count[w_n, e_n] / count[e_n], where count[w_n, e_n] represents the sixth number of the first target word pairs, and count[e_n] represents the fifth number, i.e., the number of occurrences of the concept word e_n.
  • a keyword is randomly selected from the concept knowledge base, and the second probability that the selected keyword is consistent with each keyword of the question sentence is calculated according to the keywords and concept words of the concept knowledge base, to obtain multiple second probabilities, including:
  • P(w_n | e_n, w_{n-1}, w_{n-2}, w_{n+1}, w_{n+2}) = count[w_n, e_n, w_{n-1}, w_{n-2}, w_{n+1}, w_{n+2}] / count[e_n, w_{n-1}, w_{n-2}, w_{n+1}, w_{n+2}], where count[w_n, e_n, w_{n-1}, w_{n-2}, w_{n+1}, w_{n+2}] represents the eighth number of the fourth target word pairs, and count[e_n, w_{n-1}, w_{n-2}, w_{n+1}, w_{n+2}] represents the seventh number of the third target word pairs.
  • w n-1 , w n-2 , w n+1 , and w n+2 represent the context information, such as two words before and after the designated keyword.
  • the question sentence is "Which diseases are insured for a million doctors", and the keywords corresponding to the concept words of the question sentence are "eshengsafe million doctors, what, diseases, protection".
  • the combination module 205 is configured to combine concept words corresponding to the keywords of the question sentence into a sequence of concept words according to the word order of the keywords of the question sentence.
  • a preset word queue is used to convert the keywords of the question sentence into word vectors; the multiple word vectors obtained by the conversion are combined into a concept word sequence according to the word order of the keywords of the question sentence and stored in the preset word queue.
  • the concept word sequence is an abstract representation of the question sentence, and can be used as intermediate data for further natural language processing of the question sentence.
  • the concept word sequence generation device further includes a matching module, configured to match the answer corresponding to the question sentence according to the concept word sequence after the concept words corresponding to the keywords of the question sentence have been combined into a concept word sequence according to the word order of those keywords.
  • the efficiency of generating the concept word sequence is improved, thereby increasing the accuracy of question-and-answer matching performed through the concept word sequence of the sentence.
  • the concept word sequence may also be stored in a node of a blockchain.
  • the concept word sequence generating device 20 of the second embodiment determines the concept words corresponding to the keywords of the question sentence according to the correspondence between keywords and concept words in the concept knowledge base, and generates the concept word sequence of the sentence, improving the efficiency of generating concept word sequences.
  • This embodiment provides a storage medium that stores computer-readable instructions.
  • the storage medium may be non-volatile or volatile.
  • when executed by a processor, the computer-readable instructions implement the steps in the above embodiment of the concept word sequence generation method, for example, steps 101-105 shown in FIG. 1:
  • the concept knowledge base includes a plurality of sample sentences, each sample sentence corresponds to a plurality of keywords, and each keyword of the sample sentence corresponds to a concept word;
  • the first obtaining module 201 is used to obtain question sentences
  • the second acquisition module 202 is configured to acquire a concept knowledge base, the concept knowledge base including a plurality of sample sentences, each sample sentence corresponds to a plurality of keywords, and each keyword of the sample sentence corresponds to a concept word;
  • the extraction module 203 is configured to extract the keywords of the question sentences from the question sentences according to the keywords of the concept knowledge base;
  • the determining module 204 is configured to determine the concept word corresponding to the keyword of the question sentence according to the corresponding relationship between the keywords in the concept knowledge base and the concept words;
  • the combination module 205 is configured to combine concept words corresponding to the keywords of the question sentence into a sequence of concept words according to the word order of the keywords of the question sentence.
  • FIG. 5 is a schematic diagram of a computer device provided in Embodiment 3 of this application.
  • the computer device 30 includes a memory 301, a processor 302, and computer readable instructions stored in the memory 301 and running on the processor 302, such as a concept word sequence generating program.
  • when the processor 302 executes the computer-readable instructions, the steps in the above embodiment of the method for generating a concept word sequence are implemented, for example, steps 101-105 shown in FIG. 1:
  • the concept knowledge base includes a plurality of sample sentences, each sample sentence corresponds to a plurality of keywords, and each keyword of the sample sentence corresponds to a concept word;
  • the computer-readable instruction realizes the functions of the modules in the foregoing device embodiment when executed by the processor, for example, modules 201-205 in Figure 4:
  • the first obtaining module 201 is used to obtain question sentences
  • the second acquisition module 202 is configured to acquire a concept knowledge base, the concept knowledge base including a plurality of sample sentences, each sample sentence corresponds to a plurality of keywords, and each keyword of the sample sentence corresponds to a concept word;
  • the extraction module 203 is configured to extract the keywords of the question sentences from the question sentences according to the keywords of the concept knowledge base;
  • the determining module 204 is configured to determine the concept word corresponding to the keyword of the question sentence according to the corresponding relationship between the keywords in the concept knowledge base and the concept words;
  • the combination module 205 is configured to combine concept words corresponding to the keywords of the question sentence into a sequence of concept words according to the word order of the keywords of the question sentence.
  • the computer-readable instructions may be divided into one or more modules, and the one or more modules are stored in the memory 301 and executed by the processor 302 to complete the method.
  • the one or more modules may be a series of computer-readable instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer-readable instructions in the computer device 30.
  • the computer-readable instructions can be divided into the first acquisition module 201, the second acquisition module 202, the extraction module 203, the determination module 204, and the combination module 205 in FIG. 4; for the specific functions of each module, see Embodiment 2.
  • the schematic diagram in FIG. 5 is only an example of the computer device 30 and does not constitute a limitation on the computer device 30; it may include more or fewer components than shown, combine certain components, or use different components.
  • the computer device 30 may also include input and output devices, network access devices, buses, and so on.
  • the so-called processor 302 may be a central processing unit (Central Processing Unit, CPU), other general processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components, etc.
  • the general-purpose processor can be a microprocessor or the processor 302 can also be any conventional processor, etc.
  • the processor 302 is the control center of the computer device 30, using various interfaces and lines to connect the various parts of the entire computer device 30.
  • the memory 301 may be used to store the computer-readable instructions; the processor 302 implements the various functions of the computer device 30 by running or executing the computer-readable instructions or modules stored in the memory 301 and calling the data stored in the memory 301.
  • the memory 301 may mainly include a storage program area and a storage data area, where the storage program area may store an operating system, an application program required by at least one function (such as a sound playback function, an image playback function, etc.), etc.; the storage data area may store data created according to the use of the computer device 30, and the like.
  • the memory 301 may include a hard disk, a memory, a plug-in hard disk, a smart memory card (Smart Media Card, SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), at least one disk storage device, flash memory Devices, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), or other non-volatile/volatile storage devices.
  • if the integrated module of the computer device 30 is implemented in the form of a software function module and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • all or part of the processes in the methods of the above embodiments of this application may also be completed by instructing relevant hardware through computer-readable instructions.
  • the computer-readable instructions can be stored in a storage medium. When the computer-readable instructions are executed by the processor, they can implement the steps of the foregoing method embodiments.
  • the computer-readable instruction includes computer-readable instruction code, and the computer-readable instruction code may be in the form of source code, object code, executable file, or some intermediate form.
  • the computer-readable medium may include: any entity or device capable of carrying the computer-readable instruction code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), and so on.
  • the computer-readable storage medium may mainly include a storage program area and a storage data area, where the storage program area may store an operating system, an application program required by at least one function, etc.; the storage data area may store Data created by the use of nodes, etc.
  • the blockchain referred to in this application is a new application mode of computer technology such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • a blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another using cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
  • modules described as separate components may or may not be physically separated, and the components displayed as modules may or may not be physical modules, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional modules in the various embodiments of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.
  • the above-mentioned integrated modules can be implemented in the form of hardware, or in the form of hardware plus software functional modules.
  • the above-mentioned integrated modules implemented in the form of software functional modules may be stored in a computer readable storage medium.
  • the above-mentioned software function modules are stored in a storage medium and include a number of instructions to make a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor execute part of the steps of the concept word sequence generation method described in the various embodiments of this application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

本申请涉及人工智能技术领域,提供一种概念词序列生成方法、装置、计算机设备及存储介质。该方法获取问题语句;获取概念知识库;根据概念知识库的关键词从问题语句中提取问题语句的关键词;根据概念知识库中的关键词和概念词的对应关系确定问题语句的关键词对应的概念词;按照问题语句的关键词的词序将问题语句的关键词对应的概念词组合为概念词序列。本申请根据概念知识库中的关键词和概念词的对应关系确定问题语句的关键词对应的概念词,并生成语句的概念词序列,提升生成概念词序列的效率。本申请还涉及医疗科技领域,提高医疗智能问答的准确度。同时,本申请还涉及区块链。

Description

概念词序列生成方法、装置、计算机设备及存储介质
本申请要求于2020年09月30日提交中国专利局、申请号为202011064339.X、申请名称为“概念词序列生成方法、装置、计算机设备及存储介质”的中国专利申请的优先权，其全部内容通过引用结合在本申请中。
技术领域
本申请涉及人工智能技术领域,具体涉及一种概念词序列生成方法、装置、计算机设备及存储介质。
背景技术
人工智能技术领域的自然语言处理中,智能客服、远程问诊等是一个重要的版块。智能客服或远程问诊需要拥有问答匹配能力与推荐能力。概念词序列是智能客服具有问答匹配能力与推荐能力的基础。
概念词序列是与问题语句中的关键词对应的概念索引。发明人意识到,在生成概念词序列的时候,需要较大的抽象耗时,以生成关键词;同时需要较大的匹配耗时,以根据关键词匹配概念词。
如何根据问题语句生成问题语句的概念索引,提升生成概念索引的效率,成为待解决的问题。
发明内容
鉴于以上内容,有必要提出一种概念词序列生成方法、装置、计算机设备及存储介质,其可以生成语句的概念词序列,提升生成概念词序列的效率。
本申请的第一方面提供一种概念词序列生成方法,所述概念词序列生成方法包括:
获取问题语句;
获取概念知识库,所述概念知识库包括多个样本语句,每个样本语句对应多个关键词,所述样本语句的每个关键词对应一个概念词;
根据所述概念知识库的关键词从所述问题语句中提取所述问题语句的关键词;
根据所述概念知识库中的关键词和概念词的对应关系确定所述问题语句的关键词对应的概念词;
按照所述问题语句的关键词的词序将所述问题语句的关键词对应的概念词组合为概念词序列。
本申请的第二方面提供一种概念词序列生成装置,所述概念词序列生成装置包括:
第一获取模块,用于获取问题语句;
第二获取模块,用于获取概念知识库,所述概念知识库包括多个样本语句,每个样本语句对应多个关键词,所述样本语句的每个关键词对应一个概念词;
提取模块,用于根据所述概念知识库的关键词从所述问题语句中提取所述问题语句的关键词;
确定模块,用于根据所述概念知识库中的关键词和概念词的对应关系确定所述问题语句的关键词对应的概念词;
组合模块,用于按照所述问题语句的关键词的词序将所述问题语句的关键词对应的概念词组合为概念词序列。
本申请的第三方面提供一种计算机设备,所述计算机设备包括处理器,所述处理器用于 执行存储器中存储的计算机可读指令时实现以下步骤:
获取问题语句;
获取概念知识库,所述概念知识库包括多个样本语句,每个样本语句对应多个关键词,所述样本语句的每个关键词对应一个概念词;
根据所述概念知识库的关键词从所述问题语句中提取所述问题语句的关键词;
根据所述概念知识库中的关键词和概念词的对应关系确定所述问题语句的关键词对应的概念词;
按照所述问题语句的关键词的词序将所述问题语句的关键词对应的概念词组合为概念词序列。
本申请的第四方面提供一种计算机可读存储介质,其上存储有计算机可读指令,所述计算机可读指令被处理器执行时实现以下步骤:
获取问题语句;
获取概念知识库,所述概念知识库包括多个样本语句,每个样本语句对应多个关键词,所述样本语句的每个关键词对应一个概念词;
根据所述概念知识库的关键词从所述问题语句中提取所述问题语句的关键词;
根据所述概念知识库中的关键词和概念词的对应关系确定所述问题语句的关键词对应的概念词;
按照所述问题语句的关键词的词序将所述问题语句的关键词对应的概念词组合为概念词序列。
本申请根据所述概念知识库中的关键词和概念词的对应关系确定所述问题语句的关键词对应的概念词,并生成语句的概念词序列,提升生成概念词序列的效率。
附图说明
图1是本申请实施例提供的概念词序列生成方法的流程图。
图2是本申请实施例提供的确定问题语句的关键词对应的概念词的流程图。
图3是本申请实施例提供的计算概念词组合的概率得分的流程图。
图4是本申请实施例提供的概念词序列生成装置的结构图。
图5是本申请实施例提供的计算机设备的示意图。
具体实施方式
为了能够更清楚地理解本申请的上述目的、特征和优点,下面结合附图和具体实施例对本申请进行详细描述。需要说明的是,在不冲突的情况下,本申请的实施例及实施例中的特征可以相互组合。
在下面的描述中阐述了很多具体细节以便于充分理解本申请,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。
除非另有定义,本文所使用的所有的技术和科学术语与属于本申请的技术领域的技术人员通常理解的含义相同。本文中在本申请的说明书中所使用的术语只是为了描述具体的实施例的目的,不是旨在于限制本申请。
优选地,本申请的概念词序列生成方法应用在一个或者多个计算机设备中。所述计算机设备是一种能够按照事先设定或存储的指令,自动进行数值计算和/或信息处理的设备,其硬件包括但不限于微处理器、专用集成电路(Application Specific Integrated Circuit,ASIC)、可编程门阵列(Field-Programmable Gate Array,FPGA)、数字处理器(Digital Signal Processor,DSP)、嵌入式设备等。
所述计算机设备可以是桌上型计算机、笔记本、掌上电脑及云端服务器等计算设备。所述计算机设备可以与用户通过键盘、鼠标、遥控器、触摸板或声控设备等方式进行人机交互。
实施例一
图1是本申请实施例一提供的概念词序列生成方法的流程图。所述概念词序列生成方法应用于计算机设备,用于生成语句的概念词序列,提升生成概念词序列的效率。
如图1所示,所述概念词序列生成方法包括:
101,获取问题语句。
在一具体实施例中,所述获取问题语句可包括:从云存储中拉取问题语句;或接收用户输入的问题语句;或通过摄像头采集包括问题语句的图像,通过文字符识别方法识别所述图像中的问题语句。本申请不做具体限制。本申请中,问题语句可以为与医疗保险相关的提问语句,例如,问题语句为“百万医哪些病可保”。
102,获取概念知识库,所述概念知识库包括多个样本语句,每个样本语句对应多个关键词,所述样本语句的每个关键词对应一个概念词。
例如,样本语句为“我在苹果手机上定了5斤苹果”,样本语句对应的多个关键词为第一个“苹果”、第二个“苹果”,第一个“苹果”对应一个概念词“苹果手机”,第二个“苹果”对应另一个概念词“水果苹果”。
再如,样本语句为“这个程序员经常被别人叫做码农”,样本语句对应的多个关键词为“程序员”、“码农”,“程序员”对应概念词“计算机从业者”,“码农”对应概念词“计算机从业者”。
所述多个样本语句中可以存在多个相同关键词,所述多个相同关键词对应的概念词可以相同、也可以不同。
每个样本语句对应的多个关键词可以是所述样本语句中的实体对象,也可以是所述样本语句的关键意图。在基于知识体系的问答匹配模型中,需要加入人对知识的认知,即关键词和关键词对应的概念词之间的对应关系。如苹果可以被认为是水果,也可以被认为是手机。
通过概念词和关键词之间的对应关系进行索引、匹配、推荐,能降低问答匹配模型的计算延迟。概念词之间的联系还可以用于保险产品的推荐。
103,根据所述概念知识库的关键词从所述问题语句中提取所述问题语句的关键词。
所述概念知识库包括多个样本语句,每个样本语句对应多个关键词。即概念库中包含了标注有关键词的多个样本语句,可以作为有参考价值的历史数据,可以基于统计学方法从所述问题语句中提取关键词。
在一具体实施例中,所述根据所述概念知识库的关键词从所述问题语句中提取所述问题语句的关键词包括:
对所述问题语句进行多次随机分词,得到多个分词结果;
对于每个分词结果,根据所述概念知识库的关键词计算所述分词结果中的每个词语的相似度得分和长度得分;
根据所述分词结果中的每个词语的相似度得分和长度得分计算所述分词结果的关键词得分;
从关键词得分最低的分词结果中提取词语作为所述问题语句的关键词。
具体的,对于分词结果的详细说明如下。对于多个分词结果中的任意一个分词结果,该分词结果中包括K个词语。对于所述分词结果中的任一词语,所述任一词语的长度越短,所述任一词语的长度得分越低;从所述概念知识库中获取所述任一词语的最相似关键词,将所述任一词语与所述最相似关键词的相似度的倒数确定为所述任一词语的相似度得分,所述任一词语与所述最相似关键词的相似度越高,所述任一词语的相似度得分越低。
具体地,所述分词结果的最低关键词得分为keywords,
keywords = argmin (K,keyword) ∑ k=1 K cost(keyword[k])·len(keyword[k])
其中,argmin (K,keyword)表示使本符号后接函数达到最小值时的变量K和keyword的取值,cost(keyword[k])表示K 个词语中的第k个词语的相似度得分,len(keyword[k])表示K个词语中的第k个词语的长度得分。
具体地,所述从所述概念知识库中获取所述任一词语的最相似关键词可以包括:
获取所述概念知识库中的每个词语的向量表示;
基于向量表示计算所述任一词语与所述概念知识库中的每个词语的欧式距离;
在所述概念知识库中,将与所述任一词语的欧式距离最小的词语确定为最相似关键词。
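上述提取流程可用如下极简示意代码表达（其中相似度函数sim、两项得分的组合方式以及各函数名均为示意性假设，并非本申请限定的实现）：

```python
def length_score(word):
    # 词语越短，长度得分越低（此处直接以词长示意）
    return len(word)

def similarity_score(word, kb_keywords, sim):
    # 取概念知识库中与该词语最相似的关键词，
    # 相似度越高，相似度得分越低（取相似度的倒数）
    best = max(sim(word, k) for k in kb_keywords)
    return 1.0 / best if best > 0 else float("inf")

def keyword_score(segmentation, kb_keywords, sim):
    # 分词结果的关键词得分：各词语相似度得分与长度得分之和（组合方式为示意）
    return sum(similarity_score(w, kb_keywords, sim) + length_score(w)
               for w in segmentation)

def extract_keywords(segmentations, kb_keywords, sim):
    # 从关键词得分最低的分词结果中提取词语作为问题语句的关键词
    return min(segmentations,
               key=lambda s: keyword_score(s, kb_keywords, sim))
```

例如，当某一分词结果中的词语均能在知识库中找到高相似度关键词时，其关键词得分最低，即被选中。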
104,根据所述概念知识库中的关键词和概念词的对应关系确定所述问题语句的关键词对应的概念词。
所述概念知识库包括多个样本语句,每个样本语句对应多个关键词,所述样本语句的每个关键词对应一个概念词。即所述概念知识库中存在关键词与概念词的对应关系。可以用统计学方法基于所述概念知识库中的关键词与概念词的对应关系从所述问题语句中提取关键词。
如图2所示,所述根据所述概念知识库中的关键词和概念词的对应关系确定所述问题语句的关键词对应的概念词,包括:
41,根据所述概念知识库中的关键词和概念词的对应关系从所述概念知识库中获取所述问题语句的每个关键词的多个概念词;
42,将所述问题语句的每个关键词的任一概念词组合为所述问题语句的一个概念词组合,得到所述问题语句的多个概念词组合;
43,对于所述问题语句的每个概念词组合,计算所述概念词组合的概率得分;
44,匹配概率得分最高的概念词组合中的概念词,得到所述问题语句的关键词对应的概念词。
根据所述概念知识库中的关键词和概念词的对应关系从所述概念知识库中获取所述问题语句的每个关键词的多个概念词，以得到所述问题语句的概念词组合。例如，问题语句包括关键词1和关键词2。在概念知识库中，样本语句1中的关键词1对应概念词11，样本语句2中的关键词1对应概念词12；样本语句3中的关键词2对应概念词21，样本语句4中的关键词2对应概念词22。得到问题语句的关键词1对应的多个概念词为，概念词11和概念词12；问题语句的关键词2对应的多个概念词为，概念词21和概念词22。问题语句的概念词组合可以包括“概念词11-概念词21”、“概念词11-概念词22”、“概念词12-概念词21”、“概念词12-概念词22”。
可以基于联合概率计算所述概念词组合的最高概率得分。优化的目标函数是:
argmax e 1 ,e 2 ,…,e n P(e 1,e 2,…,e n|w 1,w 2,…,w n)
其中，argmax e 1 ,e 2 ,…,e n 表示使本符号后接函数达到最大值时的变量e 1,e 2,…,e n的取值。
其中,w n是所述问题语句中的第n个关键词,e n是所述问题语句中的第n个关键词对应的概念词,1≤n≤N,N是所述问题语句中的关键词数量。
在一种可选实施例中,基于联合概率计算所述概念词组合的概率得分为P(e 1,e 2,…,e n|w 1,w 2,…,w n),
P(e 1,e 2,…,e n|w 1,w 2,…,w n)=P(e 1,e 2,…,e n,w 1,w 2,…,w n)/P(w 1,w 2,…,w n)
其中,P(w 1,w 2,…,w n)是基于最大团构成的全局联合信息。
如图3所示,所述计算所述概念词组合的概率得分进一步包括:
431,从所述概念知识库中随机抽取两个目标概念词,根据所述概念知识库的概念词计算所述两个目标概念词与所述概念词组合中的任意两个概念词一致的第一概率,得到多个第一概率;
432,从所述概念知识库中随机抽取一个关键词,根据所述概念知识库的关键词和概念词计算抽取的所述一个关键词与所述问题语句的每个关键词一致的第二概率,得到多个第二概率;
433,计算所述多个第一概率和所述多个第二概率的乘积,将得到的乘积结果作为所 述概念词组合的概率得分。
具体地,所述多个第一概率和所述多个第二概率的乘积与P(e 1,e 2,…,e n|w 1,w 2,…,w n)成正比,可以将得到的乘积结果作为所述概念词组合的概率得分,P(e 1,e 2,…,e n|w 1,w 2,…,w n)∝∏ nP(w n|e n)∏ i,jP(e i,e j)。其中,∏ i,jP(e i,e j)为最大团简化后的全局联合信息,P(w n|e n)表示一个第二概率,P(e i,e j)表示一个第一概率。1≤i≤N,1≤j≤N。
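该乘积计算可示意如下（其中第二概率表p_w_given_e与第一概率表p_pair假定已由知识库统计得到，函数名与数据结构均为示意性假设）：

```python
from itertools import combinations

def combo_score(keywords, concepts, p_w_given_e, p_pair):
    # 概率得分 ∝ ∏n P(wn|en) · ∏i,j P(ei,ej)
    score = 1.0
    for w, e in zip(keywords, concepts):
        score *= p_w_given_e.get((w, e), 0.0)          # 第二概率 P(wn|en)
    for ei, ej in combinations(concepts, 2):
        score *= p_pair.get(frozenset((ei, ej)), 0.0)  # 第一概率 P(ei,ej)
    return score

def best_combo(keywords, candidate_combos, p_w_given_e, p_pair):
    # 取概率得分最高的概念词组合
    return max(candidate_combos,
               key=lambda c: combo_score(keywords, c, p_w_given_e, p_pair))
```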
在一具体实施例中,所述从所述概念知识库中随机抽取两个目标概念词,根据所述概念知识库的概念词计算所述两个目标概念词与所述概念词组合中的任意两个概念词一致的第一概率,得到多个第一概率,包括:
(a1)将所述概念词组中的任意两个概念词记为第一概念词对,在所述概念知识库的每一个样本语句中查找所述第一概念词对,统计在所述概念知识库中查找到的所述第一概念词对的第一数量;
(b1)获取所述概念知识库中的多个概念词,对所述概念知识库中的多个概念词进行去重处理,将所述概念知识库的去重概念词中的任意两个概念词记为第二概念词对,得到多个第二概念词对;
(c1)计算在所述概念知识库中的所述多个第二概念词对的第二数量;
(d1)计算所述第一概念词对的第一数量与所述多个第二概念词对的第二数量的比值,将所述第一概念词对的第一数量与所述多个第二概念词对的第二数量的比值作为所述概念词组合中的任意两个概念词的第一概率,得到多个第一概率。
具体地,所述第一概念词对的第一数量与所述多个第二概念词对的第二数量的比值与P(e i,e j)成正比,可以将所述第一概念词对的第一数量与所述多个第二概念词对的第二数量的比值作为所述概念词组合中的任意两个概念词的第一概率,
P(e i,e j)=count[e i,e j]/∑count[e′ i,e′ j]
其中，count[e i,e j]表示所述第一概念词对的第一数量，∑count[e′ i,e′ j]表示所述多个第二概念词对的第二数量。
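上述第一数量与第二数量的统计可示意如下（样本语句以其概念词集合表示，属示意性假设）：

```python
from itertools import combinations

def first_prob(sample_concept_sets, ei, ej):
    # 第一数量：在知识库的每个样本语句中查找到概念词对 (ei, ej) 的次数
    first = sum(1 for s in sample_concept_sets if ei in s and ej in s)
    # 第二数量：知识库去重概念词两两组成的第二概念词对在样本语句中出现的总次数
    all_concepts = sorted(set().union(*sample_concept_sets))
    second = sum(1 for a, b in combinations(all_concepts, 2)
                 for s in sample_concept_sets if a in s and b in s)
    # 第一概率取两者的比值
    return first / second if second else 0.0
```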
在另一可选实施例中,所述从所述概念知识库中随机抽取两个目标概念词,根据所述概念知识库的概念词计算所述两个目标概念词与所述概念词组合中的任意两个概念词一致的第一概率,得到多个第一概率,包括:
(a2)将所述概念词组中的任意两个概念词记为第三概念词对,在所述概念知识库的每一个样本语句中查找所述第三概念词对,统计在所述概念知识库中查找到的所述第三概念词对的第三数量;
(b2)从所述问题语句的关键词中获取与所述第三概念词对对应的目标关键词对,从所述概念知识库中获取与所述目标关键词对对应的多个第四概念词对,在所述概念知识库的每一个样本语句中查找所述多个第四概念词对,统计在所述概念知识库中查找到的所述多个第四概念词对的第四数量。
(c2)计算所述第三概念词对的第三数量与所述多个第四概念词对的第四数量的比值,得到所述概念词组中的任意两个概念词的第一概率,得到多个第一概率。
在一具体实施例中,所述从所述概念知识库中随机抽取一个关键词,根据所述概念知识库的关键词和概念词计算抽取的所述一个关键词与所述问题语句的每个关键词一致的第二概率,得到多个第二概率,包括:
(a3)将所述问题语句的每个关键词记为给定关键词,从所述概念词组合中查找所述给定关键词对应的概念词,记为给定概念词,将所述给定关键词和所述给定概念词组合为第一目标词对;
(b3)在所述概念知识库的概念词中统计所述给定概念词的数量,记为第五数量;
(c3)在所述概念知识库的关键词-概念词词对中统计所述第一目标词对的数量,记为第六数量;
(d3)计算所述第六数量与所述第五数量的比值,得到所述给定关键词的第二概率,得到多个第二概率。
具体地,所述第六数量与所述第五数量的比值与P(w n|e n)成正比,可以将所述第六数量与所述第五数量的比值作为所述给定关键词的第二概率,
P(w n|e n)=count[w n,e n]/count[e n]
其中，count[w n,e n]表示所述第一目标词对的第六数量，count[e n]表示所述给定概念词的第五数量。
在另一实施例中,所述从所述概念知识库中随机抽取一个关键词,根据所述概念知识库的关键词和概念词计算抽取的所述一个关键词与所述问题语句的每个关键词一致的第二概率,得到多个第二概率,包括:
(a4)将所述问题语句的每个关键词记为指定关键词,从所述概念词组合中查找所述指定关键词对应的概念词,记为指定概念词;
(b4)从所述问题语句中获取所述指定关键词的上下文信息,将所述给定关键词、所述上下文信息和所述给定概念词组合为第二目标词对,将所述上下文信息和所述给定概念词组合为第三目标词对;
(c4)获取所述概念知识库的上下文信息-概念词词对,和所述概念知识库的关键词-上下文信息-概念词词对;
(d4)在所述概念知识库的上下文信息-概念词词对中统计所述第三目标词对的数量,记为第七数量;
(e4)在所述概念知识库的关键词-上下文信息-概念词词对中统计所述第四目标词对的数量,记为第八数量;
(f4)计算所述第八数量与所述第七数量的比值,得到所述指定关键词的第二概率,进而得到多个第二概率。
具体地,所述第八数量与所述第七数量的比值与P(w n|e n)成正比,所述第八数量与所述第七数量的比值作为所述指定关键词的第二概率,
P(w n|e n)=count[w n,e n,w n-1,w n-2,w n+1,w n+2]/count[e n,w n-1,w n-2,w n+1,w n+2]
其中，count[w n,e n,w n-1,w n-2,w n+1,w n+2]表示所述第四目标词对的第八数量，count[e n,w n-1,w n-2,w n+1,w n+2]表示所述第三目标词对的第七数量。w n-1,w n-2,w n+1,w n+2表示所述上下文信息，如所述指定关键词的前后两个词。
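带上下文信息的第二概率统计可示意如下（知识库记录以（关键词, 概念词, 上下文）三元组表示，为示意性假设）：

```python
def second_prob_with_context(kb_records, w, e, ctx):
    # kb_records: (关键词, 概念词, 上下文) 三元组列表；ctx 为指定关键词前后的词
    # 第八数量：关键词-上下文信息-概念词词对中第四目标词对的数量
    eighth = sum(1 for kw, c, cx in kb_records
                 if kw == w and c == e and cx == ctx)
    # 第七数量：上下文信息-概念词词对中第三目标词对的数量
    seventh = sum(1 for kw, c, cx in kb_records if c == e and cx == ctx)
    # 第二概率取第八数量与第七数量的比值
    return eighth / seventh if seventh else 0.0
```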
如上例,问题语句为“百万医哪些病可保”,问题语句的关键词对应的概念词为“e生平安百万医,what,疾病,保障”。
105,按照所述问题语句的关键词的词序将所述问题语句的关键词对应的概念词组合为概念词序列。
例如,可以获取预设词队列,将问题语句的关键词转化为词向量,将转化得到的多个词向量按照问题语句的关键词的词序组合为概念词序列,并存储所述预设词队列。
所述概念词序列是所述问题语句的抽象表示,可以作为中间数据,用于进一步对所述问题语句进行自然语言处理。
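按词序组合概念词序列的过程可示意如下（词向量转换此处以查表word_vec代替，属示意性假设）：

```python
def to_concept_sequence(keywords_in_order, keyword_to_concept, word_vec=None):
    # 按问题语句关键词的词序，将各关键词对应的概念词组合为概念词序列
    concepts = [keyword_to_concept[w] for w in keywords_in_order]
    if word_vec is not None:
        # 可选：将概念词转化为词向量后再按词序组合
        return [word_vec[c] for c in concepts]
    return concepts
```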
在另一实施例中,在所述按照所述问题语句的关键词的词序将所述问题语句的关键词对应的概念词组合为概念词序列之后,所述概念词序列生成方法还包括:
根据所述概念词序列匹配与所述问题语句对应的答案。
通过根据所述概念知识库中的关键词和概念词的对应关系确定所述问题语句的关键词对应的概念词,提升生成概念词序列的效率,从而增加通过语句的概念词序列进行问答匹配的准确率。
需要强调的是,为进一步保证所述概念词序列的私密和安全性,所述概念词序列还可以存储于一区块链的节点中。
实施例一的概念词序列生成方法根据所述概念知识库中的关键词和概念词的对应关系确定所述问题语句的关键词对应的概念词,并生成语句的概念词序列,提升生成概 念词序列的效率。
通过本申请上述概念词序列生成方法,能够在医疗科技的远程问诊中提升概念词序列的效率,进而提高远程问诊的问答准确度,有利于远程医疗服务的发展。
实施例二
图4是本申请实施例二提供的概念词序列生成装置的结构图。所述概念词序列生成装置20应用于计算机设备。所述概念词序列生成装置20用于生成语句的概念词序列,提升生成概念词序列的效率。
如图4所示,所述概念词序列生成装置20可以包括第一获取模块201、第二获取模块202、提取模块203、确定模块204、组合模块205。
第一获取模块201,用于获取问题语句。
在一具体实施例中,所述获取问题语句可包括:从云存储中拉取问题语句;或接收用户输入的问题语句;或通过摄像头采集包括问题语句的图像,通过文字符识别方法识别所述图像中的问题语句。本申请不做具体限制。本申请中,问题语句可以为与医疗保险相关的提问语句,例如,问题语句为“百万医哪些病可保”。
第二获取模块202,用于获取概念知识库,所述概念知识库包括多个样本语句,每个样本语句对应多个关键词,所述样本语句的每个关键词对应一个概念词。
例如,样本语句为“我在苹果手机上定了5斤苹果”,样本语句对应的多个关键词为第一个“苹果”、第二个“苹果”,第一个“苹果”对应一个概念词“苹果手机”,第二个“苹果”对应另一个概念词“水果苹果”。
再如,样本语句为“这个程序员经常被别人叫做码农”,样本语句对应的多个关键词为“程序员”、“码农”,“程序员”对应概念词“计算机从业者”,“码农”对应概念词“计算机从业者”。
所述多个样本语句中可以存在多个相同关键词,所述多个相同关键词对应的概念词可以相同、也可以不同。
每个样本语句对应的多个关键词可以是所述样本语句中的实体对象,也可以是所述样本语句的关键意图。在基于知识体系的问答匹配模型中,需要加入人对知识的认知,即关键词和关键词对应的概念词之间的对应关系。如苹果可以被认为是水果,也可以被认为是手机。
通过概念词和关键词之间的对应关系进行索引、匹配、推荐,能降低问答匹配模型的计算延迟。概念词之间的联系还可以用于保险产品的推荐。
提取模块203,用于根据所述概念知识库的关键词从所述问题语句中提取所述问题语句的关键词。
所述概念知识库包括多个样本语句,每个样本语句对应多个关键词。即概念库中包含了标注有关键词的多个样本语句,可以作为有参考价值的历史数据,可以基于统计学方法从所述问题语句中提取关键词。
在一具体实施例中,所述根据所述概念知识库的关键词从所述问题语句中提取所述问题语句的关键词包括:
对所述问题语句进行多次随机分词,得到多个分词结果;
对于每个分词结果,根据所述概念知识库的关键词计算所述分词结果中的每个词语的相似度得分和长度得分;
根据所述分词结果中的每个词语的相似度得分和长度得分计算所述分词结果的关键词得分;
从关键词得分最低的分词结果中提取词语作为所述问题语句的关键词。
具体的,对于分词结果的详细说明如下。对于多个分词结果中的任意一个分词结果,该分词结果中包括K个词语。对于所述分词结果中的任一词语,所述任一词语的长度越 短,所述任一词语的长度得分越低;从所述概念知识库中获取所述任一词语的最相似关键词,将所述任一词语与所述最相似关键词的相似度的倒数确定为所述任一词语的相似度得分,所述任一词语与所述最相似关键词的相似度越高,所述任一词语的相似度得分越低。
具体地,所述分词结果的最低关键词得分为keywords,
keywords = argmin (K,keyword) ∑ k=1 K cost(keyword[k])·len(keyword[k])
其中,argmin (K,keyword)表示使本符号后接函数达到最小值时的变量K和keyword的取值,cost(keyword[k])表示K个词语中的第k个词语的相似度得分,len(keyword[k])表示K个词语中的第k个词语的长度得分。
具体地,所述从所述概念知识库中获取所述任一词语的最相似关键词可以包括:
获取所述概念知识库中的每个词语的向量表示;
基于向量表示计算所述任一词语与所述概念知识库中的每个词语的欧式距离;
在所述概念知识库中,将与所述任一词语的欧式距离最小的词语确定为最相似关键词。
确定模块204,用于根据所述概念知识库中的关键词和概念词的对应关系确定所述问题语句的关键词对应的概念词。
所述概念知识库包括多个样本语句,每个样本语句对应多个关键词,所述样本语句的每个关键词对应一个概念词。即所述概念知识库中存在关键词与概念词的对应关系。可以用统计学方法基于所述概念知识库中的关键词与概念词的对应关系从所述问题语句中提取关键词。
在一具体实施例中,所述根据所述概念知识库中的关键词和概念词的对应关系确定所述问题语句的关键词对应的概念词包括:
41,根据所述概念知识库中的关键词和概念词的对应关系从所述概念知识库中获取所述问题语句的每个关键词的多个概念词;
42,将所述问题语句的每个关键词的任一概念词组合为所述问题语句的一个概念词组合,得到所述问题语句的多个概念词组合;
43,对于所述问题语句的每个概念词组合,计算所述概念词组合的概率得分;
44,匹配概率得分最高的概念词组合中的概念词,得到所述问题语句的关键词对应的概念词。
根据所述概念知识库中的关键词和概念词的对应关系从所述概念知识库中获取所述问题语句的每个关键词的多个概念词，以得到所述问题语句的概念词组合。例如，问题语句包括关键词1和关键词2。在概念知识库中，样本语句1中的关键词1对应概念词11，样本语句2中的关键词1对应概念词12；样本语句3中的关键词2对应概念词21，样本语句4中的关键词2对应概念词22。得到问题语句的关键词1对应的多个概念词为，概念词11和概念词12；问题语句的关键词2对应的多个概念词为，概念词21和概念词22。问题语句的概念词组合可以包括“概念词11-概念词21”、“概念词11-概念词22”、“概念词12-概念词21”、“概念词12-概念词22”。
可以基于联合概率计算所述概念词组合的最高概率得分。优化的目标函数是:
argmax e 1 ,e 2 ,…,e n P(e 1,e 2,…,e n|w 1,w 2,…,w n)
其中，argmax e 1 ,e 2 ,…,e n 表示使本符号后接函数达到最大值时的变量e 1,e 2,…,e n的取值。
其中,w n是所述问题语句中的第n个关键词,e n是所述问题语句中的第n个关键词对应的概念词,1≤n≤N,N是所述问题语句中的关键词数量。
在一种可选实施例中,基于联合概率计算所述概念词组合的概率得分为P(e 1,e 2,…,e n|w 1,w 2,…,w n),
P(e 1,e 2,…,e n|w 1,w 2,…,w n)=P(e 1,e 2,…,e n,w 1,w 2,…,w n)/P(w 1,w 2,…,w n)
其中,P(w 1,w 2,…,w n)是基于最大团构成的全局联合信息。
在一具体实施例中,所述计算所述概念词组合的概率得分进一步包括:
431,从所述概念知识库中随机抽取两个目标概念词,根据所述概念知识库的概念词计算所述两个目标概念词与所述概念词组合中的任意两个概念词一致的第一概率,得到多个第一概率;
432,从所述概念知识库中随机抽取一个关键词,根据所述概念知识库的关键词和概念词计算抽取的所述一个关键词与所述问题语句的每个关键词一致的第二概率,得到多个第二概率;
433,计算所述多个第一概率和所述多个第二概率的乘积,将得到的乘积结果作为所述概念词组合的概率得分。
具体地,所述多个第一概率和所述多个第二概率的乘积与P(e 1,e 2,…,e n|w 1,w 2,…,w n)成正比,可以将得到的乘积结果作为所述概念词组合的概率得分,P(e 1,e 2,…,e n|w 1,w 2,…,w n)∝∏ nP(w n|e n)∏ i,jP(e i,e j)。其中,∏ i,jP(e i,e j)为最大团简化后的全局联合信息,P(w n|e n)表示一个第二概率,P(e i,e j)表示一个第一概率。1≤i≤N,1≤j≤N。
在一具体实施例中,所述从所述概念知识库中随机抽取两个目标概念词,根据所述概念知识库的概念词计算所述两个目标概念词与所述概念词组合中的任意两个概念词一致的第一概率,得到多个第一概率,包括:
(a1)将所述概念词组中的任意两个概念词记为第一概念词对,在所述概念知识库的每一个样本语句中查找所述第一概念词对,统计在所述概念知识库中查找到的所述第一概念词对的第一数量;
(b1)获取所述概念知识库中的多个概念词,对所述概念知识库中的多个概念词进行去重处理,将所述概念知识库的去重概念词中的任意两个概念词记为第二概念词对,得到多个第二概念词对;
(c1)计算在所述概念知识库中的所述多个第二概念词对的第二数量;
(d1)计算所述第一概念词对的第一数量与所述多个第二概念词对的第二数量的比值,将所述第一概念词对的第一数量与所述多个第二概念词对的第二数量的比值作为所述概念词组合中的任意两个概念词的第一概率,得到多个第一概率。
具体地,所述第一概念词对的第一数量与所述多个第二概念词对的第二数量的比值与P(e i,e j)成正比,可以将所述第一概念词对的第一数量与所述多个第二概念词对的第二数量的比值作为所述概念词组合中的任意两个概念词的第一概率,
P(e i,e j)=count[e i,e j]/∑count[e′ i,e′ j]
其中，count[e i,e j]表示所述第一概念词对的第一数量，∑count[e′ i,e′ j]表示所述多个第二概念词对的第二数量。
在另一可选实施例中,所述从所述概念知识库中随机抽取两个目标概念词,根据所述概念知识库的概念词计算所述两个目标概念词与所述概念词组合中的任意两个概念词一致的第一概率,得到多个第一概率,包括:
(a2)将所述概念词组中的任意两个概念词记为第三概念词对,在所述概念知识库的每一个样本语句中查找所述第三概念词对,统计在所述概念知识库中查找到的所述第三概念词对的第三数量;
(b2)从所述问题语句的关键词中获取与所述第三概念词对对应的目标关键词对,从所述概念知识库中获取与所述目标关键词对对应的多个第四概念词对,在所述概念知识库的每一个样本语句中查找所述多个第四概念词对,统计在所述概念知识库中查找到的所述多个第四概念词对的第四数量。
(c2)计算所述第三概念词对的第三数量与所述多个第四概念词对的第四数量的比值,得到所述概念词组中的任意两个概念词的第一概率,得到多个第一概率。
在一具体实施例中,所述从所述概念知识库中随机抽取一个关键词,根据所述概念知识库的关键词和概念词计算抽取的所述一个关键词与所述问题语句的每个关键词一致 的第二概率,得到多个第二概率,包括:
(a3)将所述问题语句的每个关键词记为给定关键词,从所述概念词组合中查找所述给定关键词对应的概念词,记为给定概念词,将所述给定关键词和所述给定概念词组合为第一目标词对;
(b3)在所述概念知识库的概念词中统计所述给定概念词的数量,记为第五数量;
(c3)在所述概念知识库的关键词-概念词词对中统计所述第一目标词对的数量,记为第六数量;
(d3)计算所述第六数量与所述第五数量的比值,得到所述给定关键词的第二概率,得到多个第二概率。
具体地,所述第六数量与所述第五数量的比值与P(w n|e n)成正比,可以将所述第六数量与所述第五数量的比值作为所述给定关键词的第二概率,
P(w n|e n)=count[w n,e n]/count[e n]
其中，count[w n,e n]表示所述第一目标词对的第六数量，count[e n]表示所述给定概念词的第五数量。
在另一实施例中,所述从所述概念知识库中随机抽取一个关键词,根据所述概念知识库的关键词和概念词计算抽取的所述一个关键词与所述问题语句的每个关键词一致的第二概率,得到多个第二概率,包括:
(a4)将所述问题语句的每个关键词记为指定关键词,从所述概念词组合中查找所述指定关键词对应的概念词,记为指定概念词;
(b4)从所述问题语句中获取所述指定关键词的上下文信息,将所述给定关键词、所述上下文信息和所述给定概念词组合为第二目标词对,将所述上下文信息和所述给定概念词组合为第三目标词对;
(c4)获取所述概念知识库的上下文信息-概念词词对,和所述概念知识库的关键词-上下文信息-概念词词对;
(d4)在所述概念知识库的上下文信息-概念词词对中统计所述第三目标词对的数量,记为第七数量;
(e4)在所述概念知识库的关键词-上下文信息-概念词词对中统计所述第四目标词对的数量,记为第八数量;
(f4)计算所述第八数量与所述第七数量的比值,得到所述指定关键词的第二概率,进而得到多个第二概率。
具体地,所述第八数量与所述第七数量的比值与P(w n|e n)成正比,所述第八数量与所述第七数量的比值作为所述指定关键词的第二概率,
P(w n|e n)=count[w n,e n,w n-1,w n-2,w n+1,w n+2]/count[e n,w n-1,w n-2,w n+1,w n+2]
其中，count[w n,e n,w n-1,w n-2,w n+1,w n+2]表示所述第四目标词对的第八数量，count[e n,w n-1,w n-2,w n+1,w n+2]表示所述第三目标词对的第七数量。w n-1,w n-2,w n+1,w n+2表示所述上下文信息，如所述指定关键词的前后两个词。
如上例,问题语句为“百万医哪些病可保”,问题语句的关键词对应的概念词为“e生平安百万医,what,疾病,保障”。
组合模块205,用于按照所述问题语句的关键词的词序将所述问题语句的关键词对应的概念词组合为概念词序列。
例如,可以获取预设词队列,将问题语句的关键词转化为词向量,将转化得到的多个词向量按照问题语句的关键词的词序组合为概念词序列,并存储所述预设词队列。
所述概念词序列是所述问题语句的抽象表示,可以作为中间数据,用于进一步对所述问题语句进行自然语言处理。
在另一实施例中,概念词序列生成装置还包括匹配模块,用于在所述按照所述问题语句的关键词的词序将所述问题语句的关键词对应的概念词组合为概念词序列之后,根据所述概念词序列匹配与所述问题语句对应的答案。
通过根据所述概念知识库中的关键词和概念词的对应关系确定所述问题语句的关键词对应的概念词,提升生成概念词序列的效率,从而增加通过语句的概念词序列进行问答匹配的准确率。
需要强调的是,为进一步保证所述概念词序列的私密和安全性,所述概念词序列还可以存储于一区块链的节点中。
实施例二的概念词序列生成装置20根据所述概念知识库中的关键词和概念词的对应关系确定所述问题语句的关键词对应的概念词,并生成语句的概念词序列,提升生成概念词序列的效率。
实施例三
本实施例提供一种存储介质,该存储介质上存储有计算机可读指令,该存储介质可以是非易失性,也可以是易失性,该计算机可读指令被处理器执行时实现上述概念词序列生成方法实施例中的步骤,例如图1所示的步骤101-105:
101,获取问题语句;
102,获取概念知识库,所述概念知识库包括多个样本语句,每个样本语句对应多个关键词,所述样本语句的每个关键词对应一个概念词;
103,根据所述概念知识库的关键词从所述问题语句中提取所述问题语句的关键词;
104,根据所述概念知识库中的关键词和概念词的对应关系确定所述问题语句的关键词对应的概念词;
105,按照所述问题语句的关键词的词序将所述问题语句的关键词对应的概念词组合为概念词序列。
或者,该计算机可读指令被处理器执行时实现上述装置实施例中各模块的功能,例如图4中的模块201-205:
第一获取模块201,用于获取问题语句;
第二获取模块202,用于获取概念知识库,所述概念知识库包括多个样本语句,每个样本语句对应多个关键词,所述样本语句的每个关键词对应一个概念词;
提取模块203,用于根据所述概念知识库的关键词从所述问题语句中提取所述问题语句的关键词;
确定模块204,用于根据所述概念知识库中的关键词和概念词的对应关系确定所述问题语句的关键词对应的概念词;
组合模块205,用于按照所述问题语句的关键词的词序将所述问题语句的关键词对应的概念词组合为概念词序列。
实施例四
图5为本申请实施例三提供的计算机设备的示意图。所述计算机设备30包括存储器301、处理器302以及存储在所述存储器301中并可在所述处理器302上运行的计算机可读指令,例如概念词序列生成程序。所述处理器302执行所述计算机可读指令时实现上述概念词序列生成方法实施例中的步骤,例如图1所示的101-105:
101,获取问题语句;
102,获取概念知识库,所述概念知识库包括多个样本语句,每个样本语句对应多个关键词,所述样本语句的每个关键词对应一个概念词;
103,根据所述概念知识库的关键词从所述问题语句中提取所述问题语句的关键词;
104,根据所述概念知识库中的关键词和概念词的对应关系确定所述问题语句的关键词对应的概念词;
105,按照所述问题语句的关键词的词序将所述问题语句的关键词对应的概念词组合为概念词序列。
或者,该计算机可读指令被处理器执行时实现上述装置实施例中各模块的功能,例 如图4中的模块201-205:
第一获取模块201,用于获取问题语句;
第二获取模块202,用于获取概念知识库,所述概念知识库包括多个样本语句,每个样本语句对应多个关键词,所述样本语句的每个关键词对应一个概念词;
提取模块203,用于根据所述概念知识库的关键词从所述问题语句中提取所述问题语句的关键词;
确定模块204,用于根据所述概念知识库中的关键词和概念词的对应关系确定所述问题语句的关键词对应的概念词;
组合模块205,用于按照所述问题语句的关键词的词序将所述问题语句的关键词对应的概念词组合为概念词序列。
示例性的,所述计算机可读指令可以被分割成一个或多个模块,所述一个或者多个模块被存储在所述存储器301中,并由所述处理器302执行,以完成本方法。所述一个或多个模块可以是能够完成特定功能的一系列计算机可读指令段,该指令段用于描述所述计算机可读指令在所述计算机设备30中的执行过程。例如,所述计算机可读指令可以被分割成图4中的第一获取模块201、第二获取模块202、提取模块203、确定模块204、组合模块205,各模块具体功能参见实施例二。
本领域技术人员可以理解,所述示意图5仅仅是计算机设备30的示例,并不构成对计算机设备30的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件,例如所述计算机设备30还可以包括输入输出设备、网络接入设备、总线等。
所称处理器302可以是中央处理单元(Central Processing Unit,CPU),还可以是其他通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现场可编程门阵列(Field-Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器302也可以是任何常规的处理器等,所述处理器302是所述计算机设备30的控制中心,利用各种接口和线路连接整个计算机设备30的各个部分。
所述存储器301可用于存储所述计算机可读指令,所述处理器302通过运行或执行存储在所述存储器301内的计算机可读指令或模块,以及调用存储在存储器301内的数据,实现所述计算机设备30的各种功能。所述存储器301可主要包括存储程序区和存储数据区,其中,存储程序区可存储操作系统、至少一个功能所需的应用程序(比如声音播放功能、图像播放功能等)等;存储数据区可存储根据计算机设备30的使用所创建的数据等。此外,存储器301可以包括硬盘、内存、插接式硬盘,智能存储卡(Smart Media Card,SMC),安全数字(Secure Digital,SD)卡,闪存卡(Flash Card)、至少一个磁盘存储器件、闪存器件、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)或其他非易失性/易失性存储器件。
所述计算机设备30集成的模块如果以软件功能模块的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请实现上述实施例方法中的全部或部分流程,也可以通过计算机可读指令来指令相关的硬件来完成,所述的计算机可读指令可存储于一存储介质中,该计算机可读指令在被处理器执行时,可实现上述各个方法实施例的步骤。其中,所述计算机可读指令包括计算机可读指令代码,所述计算机可读指令代码可以为源代码形式、对象代码形式、可执行文件或某些中间形式等。所述计算机可读介质可以包括:能够携带所述计算机可读指令代码的任何实体或装置、记录介质、U盘、移动硬盘、磁碟、光盘、计算机存储器、只读存储器(ROM)、随机存取存储器(RAM)等。
进一步地,所述计算机可读存储介质可主要包括存储程序区和存储数据区,其中,存储程序区可存储操作系统、至少一个功能所需的应用程序等;存储数据区可存储根据 区块链节点的使用所创建的数据等。
本申请所指区块链是分布式数据存储、点对点传输、共识机制、加密算法等计算机技术的新型应用模式。区块链(Blockchain),本质上是一个去中心化的数据库,是一串使用密码学方法相关联产生的数据块,每一个数据块中包含了一批次网络交易的信息,用于验证其信息的有效性(防伪)和生成下一个区块。区块链可以包括区块链底层平台、平台产品服务层以及应用服务层等。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统,装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述模块的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。
所述作为分离部件说明的模块可以是或者也可以不是物理上分开的,作为模块显示的部件可以是或者也可以不是物理模块,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能模块可以集成在一个处理模块中,也可以是各个模块单独物理存在,也可以两个或两个以上模块集成在一个模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用硬件加软件功能模块的形式实现。
上述以软件功能模块的形式实现的集成的模块,可以存储在一个计算机可读取存储介质中。上述软件功能模块存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)或处理器(processor)执行本申请各个实施例所述概念词序列生成方法的部分步骤。
对于本领域技术人员而言,显然本申请不限于上述示范性实施例的细节,而且在不背离本申请的精神或基本特征的情况下,能够以其他的具体形式实现本申请。因此,无论从哪一点来看,均应将实施例看作是示范性的,而且是非限制性的,本申请的范围由所附权利要求而不是上述说明限定,因此旨在将落在权利要求的等同要件的含义和范围内的所有变化涵括在本申请内。不应将权利要求中的任何附关联图标记视为限制所涉及的权利要求。此外,显然“包括”一词不排除其他模块或步骤,单数不排除复数。本申请中陈述的多个模块或装置也可以由一个模块或装置通过软件或者硬件来实现。第一,第二等词语用来表示名称,而并不表示任何特定的顺序。
最后应说明的是,以上实施例仅用以说明本申请的技术方案而非限制,尽管参照较佳实施例对本申请进行了详细说明,本领域的普通技术人员应当理解,可以对本申请的技术方案进行修改或等同替换,而不脱离本申请技术方案的精神和范围。

Claims (20)

  1. 一种概念词序列生成方法,其中,所述概念词序列生成方法包括:
    获取问题语句;
    获取概念知识库,所述概念知识库包括多个样本语句,每个样本语句对应多个关键词,所述样本语句的每个关键词对应一个概念词;
    根据所述概念知识库的关键词从所述问题语句中提取所述问题语句的关键词;
    根据所述概念知识库中的关键词和概念词的对应关系确定所述问题语句的关键词对应的概念词;
    按照所述问题语句的关键词的词序将所述问题语句的关键词对应的概念词组合为概念词序列。
  2. 如权利要求1所述的概念词序列生成方法,其中,所述根据所述概念知识库的关键词从所述问题语句中提取所述问题语句的关键词,包括:
    对所述问题语句进行多次随机分词,得到多个分词结果;
    对于每个分词结果,根据所述概念知识库的关键词计算所述分词结果中的每个词语的相似度得分和长度得分;
    根据所述分词结果中的每个词语的相似度得分和长度得分计算所述分词结果的关键词得分;
    从关键词得分最低的分词结果中提取词语作为所述问题语句的关键词。
  3. 如权利要求1所述的概念词序列生成方法,其中,所述根据所述概念知识库中的关键词和概念词的对应关系确定所述问题语句的关键词对应的概念词,包括:
    根据所述概念知识库中的关键词和概念词的对应关系从所述概念知识库中获取所述问题语句的每个关键词的多个概念词;
    将所述问题语句的每个关键词的任一概念词组合为所述问题语句的一个概念词组合,得到所述问题语句的多个概念词组合;
    对于所述问题语句的每个概念词组合,计算所述概念词组合的概率得分;
    匹配概率得分最高的概念词组合中的概念词,得到所述问题语句的关键词对应的概念词。
  4. 如权利要求3所述的概念词序列生成方法,其中,所述计算所述概念词组合的概率得分,包括:
    从所述概念知识库中随机抽取两个目标概念词,根据所述概念知识库的概念词计算所述两个目标概念词与所述概念词组合中的任意两个概念词一致的第一概率,得到多个第一概率;
    从所述概念知识库中随机抽取一个关键词,根据所述概念知识库的关键词和概念词计算抽取的所述一个关键词与所述问题语句的每个关键词一致的第二概率,得到多个第二概率;
    计算所述多个第一概率和所述多个第二概率的乘积,将得到的乘积结果作为所述概念词组合的概率得分。
  5. 如权利要求4所述的概念词序列生成方法,其中,所述从所述概念知识库中随机抽取两个目标概念词,根据所述概念知识库的概念词计算所述两个目标概念词与所述概念词组合中的任意两个概念词一致的第一概率,得到多个第一概率,包括:
    将所述概念词组中的任意两个概念词记为第一概念词对,在所述概念知识库的每一个样本语句中查找所述第一概念词对,统计在所述概念知识库中查找到的所述第一概念词对的第一数量;
    获取所述概念知识库中的多个概念词,对所述概念知识库中的多个概念词进行去重处理,将所述概念知识库的去重概念词中的任意两个概念词记为第二概念词对,得到多 个第二概念词对;
    计算在所述概念知识库中的所述多个第二概念词对的第二数量;
    计算所述第一概念词对的第一数量与所述多个第二概念词对的第二数量的比值,将所述第一概念词对的第一数量与所述多个第二概念词对的第二数量的比值作为所述概念词组合中的任意两个概念词的第一概率,得到多个第一概率。
  6. 如权利要求4所述的概念词序列生成方法,其中,所述从所述概念知识库中随机抽取一个关键词,根据所述概念知识库的关键词和概念词计算抽取的所述一个关键词与所述问题语句的每个关键词一致的第二概率,得到多个第二概率,包括:
    将所述问题语句的每个关键词记为给定关键词,从所述概念词组合中查找所述给定关键词对应的概念词,记为给定概念词,将所述给定关键词和所述给定概念词组合为第一目标词对;
    在所述概念知识库的概念词中统计所述给定概念词的数量,记为第五数量;
    在所述概念知识库的关键词-概念词词对中统计所述第一目标词对的数量,记为第六数量;
    计算所述第六数量与所述第五数量的比值,得到所述给定关键词的第二概率,得到多个第二概率。
  7. The concept word sequence generation method of claim 4, wherein the randomly sampling one keyword from the concept knowledge base, and calculating, from the keywords and concept words of the concept knowledge base, a second probability that the sampled keyword coincides with each keyword of the question sentence, to obtain multiple second probabilities, comprises:
    recording each keyword of the question sentence as a designated keyword, and looking up the concept word corresponding to the designated keyword in the concept word combination, recorded as a designated concept word;
    obtaining context information of the designated keyword from the question sentence;
    combining the designated keyword, the context information, and the designated concept word into a second target word pair, and combining the context information and the designated concept word into a third target word pair;
    obtaining the context information-concept word pairs of the concept knowledge base, and the keyword-context information-concept word pairs of the concept knowledge base;
    counting the quantity of the third target word pair among the context information-concept word pairs of the concept knowledge base, recorded as a seventh quantity;
    counting the quantity of the second target word pair among the keyword-context information-concept word pairs of the concept knowledge base, recorded as an eighth quantity;
    calculating the ratio of the eighth quantity to the seventh quantity to obtain the second probability of the designated keyword, thereby obtaining multiple second probabilities.
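Claim 7 refines the second probability by conditioning on the keyword's context: a ratio of the (keyword, context, concept) count to the (context, concept) count. A sketch with hypothetical list-based inputs (`kb_kcc_pairs` and `kb_cc_pairs` are assumed representations of the knowledge base's triple and pair tables):

```python
def contextual_second_probability(keyword, context, concept,
                                  kb_kcc_pairs, kb_cc_pairs):
    """Ratio of the eighth quantity (occurrences of the (keyword, context,
    concept) second target word pair among the knowledge base's
    keyword-context information-concept word pairs) to the seventh
    quantity (occurrences of the (context, concept) third target word
    pair among its context information-concept word pairs), per claim 7."""
    eighth_qty = kb_kcc_pairs.count((keyword, context, concept))
    seventh_qty = kb_cc_pairs.count((context, concept))
    return eighth_qty / seventh_qty if seventh_qty else 0.0
```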
  8. A concept word sequence generation apparatus, wherein the concept word sequence generation apparatus comprises:
    a first obtaining module, configured to obtain a question sentence;
    a second obtaining module, configured to obtain a concept knowledge base, the concept knowledge base comprising multiple sample sentences, each sample sentence corresponding to multiple keywords, and each keyword of a sample sentence corresponding to one concept word;
    an extraction module, configured to extract keywords of the question sentence from the question sentence according to the keywords of the concept knowledge base;
    a determination module, configured to determine, according to the correspondence between keywords and concept words in the concept knowledge base, the concept words corresponding to the keywords of the question sentence;
    a combination module, configured to combine, in the word order of the keywords of the question sentence, the concept words corresponding to the keywords of the question sentence into a concept word sequence.
  9. A computer device, wherein the computer device comprises a processor and a memory, the processor being configured to execute computer-readable instructions stored in the memory to implement the following steps:
    obtaining a question sentence;
    obtaining a concept knowledge base, the concept knowledge base comprising multiple sample sentences, each sample sentence corresponding to multiple keywords, and each keyword of a sample sentence corresponding to one concept word;
    extracting keywords of the question sentence from the question sentence according to the keywords of the concept knowledge base;
    determining, according to the correspondence between keywords and concept words in the concept knowledge base, the concept words corresponding to the keywords of the question sentence;
    combining, in the word order of the keywords of the question sentence, the concept words corresponding to the keywords of the question sentence into a concept word sequence.
  10. The computer device of claim 9, wherein when the processor executes the computer-readable instructions stored in the memory to implement the extracting the keywords of the question sentence from the question sentence according to the keywords of the concept knowledge base, the steps comprise:
    performing random word segmentation on the question sentence multiple times to obtain multiple segmentation results;
    for each segmentation result, calculating a similarity score and a length score of each word in the segmentation result according to the keywords of the concept knowledge base;
    calculating a keyword score of the segmentation result from the similarity score and the length score of each word in the segmentation result;
    extracting words from the segmentation result with the lowest keyword score as the keywords of the question sentence.
  11. The computer device of claim 9, wherein when the processor executes the computer-readable instructions stored in the memory to implement the determining, according to the correspondence between keywords and concept words in the concept knowledge base, the concept words corresponding to the keywords of the question sentence, the steps comprise:
    obtaining, from the concept knowledge base according to the correspondence between keywords and concept words in the concept knowledge base, multiple concept words for each keyword of the question sentence;
    combining one concept word of each keyword of the question sentence into one concept word combination of the question sentence, to obtain multiple concept word combinations of the question sentence;
    for each concept word combination of the question sentence, calculating a probability score of the concept word combination;
    matching the concept words in the concept word combination with the highest probability score, to obtain the concept words corresponding to the keywords of the question sentence.
  12. The computer device of claim 11, wherein when the processor executes the computer-readable instructions stored in the memory to implement the calculating the probability score of the concept word combination, the steps comprise:
    randomly sampling two target concept words from the concept knowledge base, and calculating, from the concept words of the concept knowledge base, a first probability that the two target concept words coincide with any two concept words of the concept word combination, to obtain multiple first probabilities;
    randomly sampling one keyword from the concept knowledge base, and calculating, from the keywords and concept words of the concept knowledge base, a second probability that the sampled keyword coincides with each keyword of the question sentence, to obtain multiple second probabilities;
    calculating the product of the multiple first probabilities and the multiple second probabilities, and taking the resulting product as the probability score of the concept word combination.
  13. The computer device of claim 12, wherein when the processor executes the computer-readable instructions stored in the memory to implement the randomly sampling two target concept words from the concept knowledge base, and calculating, from the concept words of the concept knowledge base, a first probability that the two target concept words coincide with any two concept words of the concept word combination, to obtain multiple first probabilities, the steps comprise:
    recording any two concept words of the concept word combination as a first concept word pair, searching each sample sentence of the concept knowledge base for the first concept word pair, and counting a first quantity of the first concept word pair found in the concept knowledge base;
    obtaining multiple concept words from the concept knowledge base, deduplicating the multiple concept words of the concept knowledge base, and recording any two of the deduplicated concept words of the concept knowledge base as a second concept word pair, to obtain multiple second concept word pairs;
    calculating a second quantity of the multiple second concept word pairs in the concept knowledge base;
    calculating the ratio of the first quantity of the first concept word pair to the second quantity of the multiple second concept word pairs, and taking that ratio as the first probability of any two concept words of the concept word combination, to obtain multiple first probabilities.
  14. The computer device of claim 12, wherein when the processor executes the computer-readable instructions stored in the memory to implement the randomly sampling one keyword from the concept knowledge base, and calculating, from the keywords and concept words of the concept knowledge base, a second probability that the sampled keyword coincides with each keyword of the question sentence, to obtain multiple second probabilities, the steps comprise:
    recording each keyword of the question sentence as a given keyword, looking up the concept word corresponding to the given keyword in the concept word combination, recording it as a given concept word, and combining the given keyword and the given concept word into a first target word pair;
    counting the quantity of the given concept word among the concept words of the concept knowledge base, recorded as a fifth quantity;
    counting the quantity of the first target word pair among the keyword-concept word pairs of the concept knowledge base, recorded as a sixth quantity;
    calculating the ratio of the sixth quantity to the fifth quantity to obtain the second probability of the given keyword, thereby obtaining multiple second probabilities.
  15. The computer device of claim 12, wherein when the processor executes the computer-readable instructions stored in the memory to implement the randomly sampling one keyword from the concept knowledge base, and calculating, from the keywords and concept words of the concept knowledge base, a second probability that the sampled keyword coincides with each keyword of the question sentence, to obtain multiple second probabilities, the steps comprise:
    recording each keyword of the question sentence as a designated keyword, and looking up the concept word corresponding to the designated keyword in the concept word combination, recorded as a designated concept word;
    obtaining context information of the designated keyword from the question sentence;
    combining the designated keyword, the context information, and the designated concept word into a second target word pair, and combining the context information and the designated concept word into a third target word pair;
    obtaining the context information-concept word pairs of the concept knowledge base, and the keyword-context information-concept word pairs of the concept knowledge base;
    counting the quantity of the third target word pair among the context information-concept word pairs of the concept knowledge base, recorded as a seventh quantity;
    counting the quantity of the second target word pair among the keyword-context information-concept word pairs of the concept knowledge base, recorded as an eighth quantity;
    calculating the ratio of the eighth quantity to the seventh quantity to obtain the second probability of the designated keyword, thereby obtaining multiple second probabilities.
  16. A storage medium having computer-readable instructions stored thereon, wherein the computer-readable instructions, when executed by a processor, implement the following steps:
    obtaining a question sentence;
    obtaining a concept knowledge base, the concept knowledge base comprising multiple sample sentences, each sample sentence corresponding to multiple keywords, and each keyword of a sample sentence corresponding to one concept word;
    extracting keywords of the question sentence from the question sentence according to the keywords of the concept knowledge base;
    determining, according to the correspondence between keywords and concept words in the concept knowledge base, the concept words corresponding to the keywords of the question sentence;
    combining, in the word order of the keywords of the question sentence, the concept words corresponding to the keywords of the question sentence into a concept word sequence.
  17. The storage medium of claim 16, wherein when the computer-readable instructions are executed by the processor to implement the extracting the keywords of the question sentence from the question sentence according to the keywords of the concept knowledge base, the steps comprise:
    performing random word segmentation on the question sentence multiple times to obtain multiple segmentation results;
    for each segmentation result, calculating a similarity score and a length score of each word in the segmentation result according to the keywords of the concept knowledge base;
    calculating a keyword score of the segmentation result from the similarity score and the length score of each word in the segmentation result;
    extracting words from the segmentation result with the lowest keyword score as the keywords of the question sentence.
  18. The storage medium of claim 16, wherein when the computer-readable instructions are executed by the processor to implement the determining, according to the correspondence between keywords and concept words in the concept knowledge base, the concept words corresponding to the keywords of the question sentence, the steps comprise:
    obtaining, from the concept knowledge base according to the correspondence between keywords and concept words in the concept knowledge base, multiple concept words for each keyword of the question sentence;
    combining one concept word of each keyword of the question sentence into one concept word combination of the question sentence, to obtain multiple concept word combinations of the question sentence;
    for each concept word combination of the question sentence, calculating a probability score of the concept word combination;
    matching the concept words in the concept word combination with the highest probability score, to obtain the concept words corresponding to the keywords of the question sentence.
  19. The storage medium of claim 18, wherein when the computer-readable instructions are executed by the processor to implement the calculating the probability score of the concept word combination, the steps comprise:
    randomly sampling two target concept words from the concept knowledge base, and calculating, from the concept words of the concept knowledge base, a first probability that the two target concept words coincide with any two concept words of the concept word combination, to obtain multiple first probabilities;
    randomly sampling one keyword from the concept knowledge base, and calculating, from the keywords and concept words of the concept knowledge base, a second probability that the sampled keyword coincides with each keyword of the question sentence, to obtain multiple second probabilities;
    calculating the product of the multiple first probabilities and the multiple second probabilities, and taking the resulting product as the probability score of the concept word combination.
  20. The storage medium of claim 19, wherein when the computer-readable instructions are executed by the processor to implement the randomly sampling two target concept words from the concept knowledge base, and calculating, from the concept words of the concept knowledge base, a first probability that the two target concept words coincide with any two concept words of the concept word combination, to obtain multiple first probabilities, the steps comprise:
    recording any two concept words of the concept word combination as a first concept word pair, searching each sample sentence of the concept knowledge base for the first concept word pair, and counting a first quantity of the first concept word pair found in the concept knowledge base;
    obtaining multiple concept words from the concept knowledge base, deduplicating the multiple concept words of the concept knowledge base, and recording any two of the deduplicated concept words of the concept knowledge base as a second concept word pair, to obtain multiple second concept word pairs;
    calculating a second quantity of the multiple second concept word pairs in the concept knowledge base;
    calculating the ratio of the first quantity of the first concept word pair to the second quantity of the multiple second concept word pairs, and taking that ratio as the first probability of any two concept words of the concept word combination, to obtain multiple first probabilities.
PCT/CN2020/131954 2020-09-30 2020-11-26 Concept word sequence generation method and apparatus, computer device, and storage medium WO2021174923A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011064339.X 2020-09-30
CN202011064339.XA CN112199958A (zh) 2020-09-30 2020-09-30 Concept word sequence generation method and apparatus, computer device, and storage medium

Publications (1)

Publication Number Publication Date
WO2021174923A1 true WO2021174923A1 (zh) 2021-09-10

Family

ID=74013140

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/131954 WO2021174923A1 (zh) 2020-09-30 2020-11-26 Concept word sequence generation method and apparatus, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN112199958A (zh)
WO (1) WO2021174923A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113361272B (zh) * 2021-06-22 2023-03-21 Hisense Visual Technology Co., Ltd. Method and apparatus for extracting concept words from media asset titles
CN113255351B (zh) * 2021-06-22 2023-02-03 Ping An Property & Casualty Insurance Company of China, Ltd. Sentence intent recognition method and apparatus, computer device, and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020042711A1 (en) * 2000-08-11 2002-04-11 Yi-Chung Lin Method for probabilistic error-tolerant natural language understanding
CN101097573A (zh) * 2006-06-28 2008-01-02 Tencent Technology (Shenzhen) Co., Ltd. Automatic question answering system and method
CN105279252A (zh) * 2015-10-12 2016-01-27 Guangzhou Shenma Mobile Information Technology Co., Ltd. Method for mining related words, search method, and search system
CN107832291A (zh) * 2017-10-26 2018-03-23 Ping An Technology (Shenzhen) Co., Ltd. Human-machine collaborative customer service method, electronic apparatus, and storage medium
CN108509476A (zh) * 2017-09-30 2018-09-07 Ping An Technology (Shenzhen) Co., Ltd. Question association pushing method, electronic apparatus, and computer-readable storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103150382B (zh) * 2013-03-14 2015-04-01 Institute of Computing Technology, Chinese Academy of Sciences Method and system for automatic semantic concept expansion of short texts based on an open knowledge base
CN108460011B (zh) * 2018-02-01 2022-03-25 Beijing Baidu Netcom Science and Technology Co., Ltd. Entity concept annotation method and system
CN109492222B (zh) * 2018-10-31 2023-04-07 Ping An Technology (Shenzhen) Co., Ltd. Concept tree-based intent recognition method, apparatus, and computer device
CN110866089B (zh) * 2019-11-14 2023-04-28 State Grid Corporation of China System and method for building a robot knowledge base based on synonymous multi-context analysis
CN111639164A (zh) * 2020-04-30 2020-09-08 Ping An Property & Casualty Insurance Company of China, Ltd. Question-answer matching method and apparatus for a question answering system, computer device, and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020042711A1 (en) * 2000-08-11 2002-04-11 Yi-Chung Lin Method for probabilistic error-tolerant natural language understanding
CN101097573A (zh) * 2006-06-28 2008-01-02 Tencent Technology (Shenzhen) Co., Ltd. Automatic question answering system and method
CN105279252A (zh) * 2015-10-12 2016-01-27 Guangzhou Shenma Mobile Information Technology Co., Ltd. Method for mining related words, search method, and search system
CN108509476A (zh) * 2017-09-30 2018-09-07 Ping An Technology (Shenzhen) Co., Ltd. Question association pushing method, electronic apparatus, and computer-readable storage medium
CN107832291A (zh) * 2017-10-26 2018-03-23 Ping An Technology (Shenzhen) Co., Ltd. Human-machine collaborative customer service method, electronic apparatus, and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHAO, WENJING: "Research on Adding Semantic Concepts to Micro-blog Posts Based on Wikipedia", MASTER THESIS, 15 December 2013 (2013-12-15), China, pages 1 - 55, XP009530132 *

Also Published As

Publication number Publication date
CN112199958A (zh) 2021-01-08

Similar Documents

Publication Publication Date Title
CN110929125B (zh) Search recall method, apparatus, device, and storage medium
CN113032528B (zh) Case analysis method, apparatus, device, and storage medium
WO2023178971A1 (zh) Internet-based registration method, apparatus, device, and storage medium for seeking medical treatment
CN111985241B (zh) Medical information query method, apparatus, electronic device, and medium
CN111984851A (zh) Medical data search method, apparatus, electronic apparatus, and storage medium
WO2022222942A1 (zh) Question-answer record generation method, apparatus, electronic device, and storage medium
WO2021174923A1 (zh) Concept word sequence generation method and apparatus, computer device, and storage medium
CN113094478B (zh) Emoticon reply method, apparatus, device, and storage medium
WO2022016995A1 (zh) Question-answer library construction method, apparatus, electronic device, and storage medium
CN116992007B (zh) Restricted question answering system based on question intent understanding
CN113722512A (zh) Language model-based text retrieval method, apparatus, device, and storage medium
CN112214515A (zh) Automatic data matching method, apparatus, electronic device, and storage medium
CN114387061A (zh) Product pushing method, apparatus, electronic device, and readable storage medium
CN117520503A (zh) Financial customer service dialogue generation method, apparatus, device, and medium based on an LLM model
CN115222443A (zh) Customer group segmentation method, apparatus, device, and storage medium
CN113268597B (zh) Text classification method, apparatus, device, and storage medium
WO2022227171A1 (zh) Key information extraction method, apparatus, electronic device, and medium
CN111104481B (zh) Method, apparatus, and device for identifying matching fields
CN116402166B (zh) Training method and apparatus for a prediction model, electronic device, and storage medium
US20210117448A1 (en) Iterative sampling based dataset clustering
CN116757207A (zh) Artificial intelligence-based automatic ICD coding method and related device
CN116468043A (zh) Nested entity recognition method, apparatus, device, and storage medium
CN116469526A (zh) Traditional Chinese medicine diagnosis model training method, apparatus, device, and storage medium
CN113627186B (zh) Artificial intelligence-based entity relation detection method and related device
CN113420545B (zh) Abstract generation method, apparatus, device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20923374

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20923374

Country of ref document: EP

Kind code of ref document: A1