CN112199958A - Concept word sequence generation method and device, computer equipment and storage medium - Google Patents

Concept word sequence generation method and device, computer equipment and storage medium

Info

Publication number
CN112199958A
Authority
CN
China
Prior art keywords
concept
word
keyword
knowledge base
words
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202011064339.XA
Other languages
Chinese (zh)
Inventor
蒙元
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202011064339.XA
Priority to PCT/CN2020/131954 (WO2021174923A1)
Publication of CN112199958A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

The invention relates to the technical field of artificial intelligence and provides a concept word sequence generation method and device, computer equipment and a storage medium. The method comprises: acquiring a question sentence; acquiring a concept knowledge base; extracting keywords of the question sentence from the question sentence according to the keywords of the concept knowledge base; determining the concept words corresponding to the keywords of the question sentence according to the correspondence between keywords and concept words in the concept knowledge base; and combining the concept words corresponding to the keywords of the question sentence into a concept word sequence according to the word order of the keywords in the question sentence. By determining the concept words from the keyword-concept word correspondence in the concept knowledge base, the method generates the concept word sequence of the sentence and improves the efficiency of generating it. The application also relates to the field of medical technology, where it improves the accuracy of intelligent medical question answering.

Description

Concept word sequence generation method and device, computer equipment and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a method and a device for generating a concept word sequence, computer equipment and a storage medium.
Background
In natural language processing, a branch of artificial intelligence, intelligent customer service and remote inquiry are important applications. Both require question-answer matching capability and recommendation capability, and the concept word sequence is the basis of these capabilities.
The concept word sequence is the concept index corresponding to the keywords in a question sentence. However, generating a concept word sequence currently takes considerable time to extract the keywords, and further considerable time to match concept words to those keywords.
How to generate the concept index of a question sentence from the question sentence while improving the efficiency of generation has therefore become a problem to be solved.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a method, an apparatus, a computer device and a storage medium for generating a sequence of concept words of a sentence, which can generate the sequence of concept words of the sentence and improve the efficiency of generating the sequence of concept words.
A first aspect of the present application provides a method for generating a sequence of concept words, where the method for generating the sequence of concept words includes:
acquiring question sentences;
acquiring a concept knowledge base, wherein the concept knowledge base comprises a plurality of sample sentences, each sample sentence corresponds to a plurality of keywords, and each keyword of the sample sentences corresponds to a concept word;
extracting keywords of the question sentences from the question sentences according to the keywords of the concept knowledge base;
determining concept words corresponding to the keywords of the question sentences according to the corresponding relation between the keywords and the concept words in the concept knowledge base;
and combining the concept words corresponding to the keywords of the question sentences into a concept word sequence according to the word sequence of the keywords of the question sentences.
In another possible implementation manner, the extracting the keywords of the question sentences from the question sentences according to the keywords of the concept knowledge base includes:
performing multiple random word segmentation on the question sentence to obtain multiple word segmentation results;
for each word segmentation result, calculating the similarity score and the length score of each word in the word segmentation result according to the keywords of the concept knowledge base;
calculating the keyword score of the word segmentation result according to the similarity score and the length score of each word in the word segmentation result;
and extracting words from the word segmentation result with the lowest keyword score as the keywords of the question sentences.
In another possible implementation manner, the determining, according to the correspondence between the keyword and the concept word in the concept knowledge base, the concept word corresponding to the keyword of the question sentence includes:
acquiring a plurality of concept words of each keyword of the question sentence from the concept knowledge base according to the corresponding relation between the keyword and the concept word in the concept knowledge base;
combining any concept word of each keyword of the question sentence into one concept word combination of the question sentence to obtain a plurality of concept word combinations of the question sentence;
calculating a probability score of each concept word combination of the question sentence;
and matching the concept words in the concept word combination with the highest probability score to obtain the concept words corresponding to the keywords of the question sentences.
In another possible implementation manner, the calculating the probability score of the concept word combination includes:
randomly extracting two target concept words from the concept knowledge base, and calculating a first probability that the two target concept words are consistent with any two concept words in the concept word combination according to the concept words of the concept knowledge base to obtain a plurality of first probabilities;
randomly extracting a keyword from the concept knowledge base, and calculating a second probability that the extracted keyword is consistent with each keyword of the question sentence according to the keyword and the concept word of the concept knowledge base to obtain a plurality of second probabilities;
and calculating the product of the plurality of first probabilities and the plurality of second probabilities, and taking the obtained product result as the probability score of the concept word combination.
In another possible implementation manner, the randomly extracting two target concept words from the concept knowledge base, and calculating, according to the concept words of the concept knowledge base, first probabilities that the two target concept words are consistent with any two concept words in the concept word combination to obtain a plurality of first probabilities includes:
recording any two concept words in the concept word combination as a first concept word pair, searching the first concept word pair in each sample sentence of the concept knowledge base, and counting a first number of the first concept word pairs searched in the concept knowledge base;
acquiring a plurality of concept words in the concept knowledge base, performing duplication elimination processing on the plurality of concept words in the concept knowledge base, and marking any two concept words in the duplication elimination concept words in the concept knowledge base as second concept word pairs to obtain a plurality of second concept word pairs;
calculating a second number of the plurality of second concept word pairs in the concept knowledge base;
and calculating the ratio of the first number of the first concept word pairs to the second number of the plurality of second concept word pairs, and taking the ratio of the first number of the first concept word pairs to the second number of the plurality of second concept word pairs as the first probability of any two concept words in the concept word combination to obtain a plurality of first probabilities.
In another possible implementation manner, the randomly extracting a keyword from the concept knowledge base, and calculating a second probability that the extracted keyword is consistent with each keyword of the question sentence according to the keyword and the concept word of the concept knowledge base to obtain a plurality of second probabilities includes:
recording each keyword of the question sentence as a given keyword, searching a concept word corresponding to the given keyword from the concept word combination, recording the concept word as a given concept word, and combining the given keyword and the given concept word into a first target word pair;
counting the number of the given concept words in the concept words of the concept knowledge base and recording as a fifth number;
counting the number of the first target word pairs in the keyword-concept word pairs of the concept knowledge base, and recording as a sixth number;
and calculating the ratio of the sixth quantity to the fifth quantity to obtain a second probability of the given keyword and obtain a plurality of second probabilities.
In another possible implementation manner, the randomly extracting a keyword from the concept knowledge base, and calculating a second probability that the extracted keyword is consistent with each keyword of the question sentence according to the keyword and the concept word of the concept knowledge base to obtain a plurality of second probabilities includes:
recording each keyword of the question sentence as a specified keyword, searching the concept word combination for the concept word corresponding to the specified keyword, and recording it as a specified concept word;
acquiring context information of the specified keyword from the question sentence;
combining the specified keyword, the context information and the specified concept word into a second target word pair, and combining the context information and the specified concept word into a third target word pair;
acquiring context information-concept word pairs of the concept knowledge base and keyword-context information-concept word pairs of the concept knowledge base;
counting the number of the third target word pairs in the context information-concept word pairs of the concept knowledge base, and recording as a seventh number;
counting the number of the second target word pairs in the keyword-context information-concept word pairs of the concept knowledge base, and recording as an eighth number;
and calculating the ratio of the eighth number to the seventh number to obtain a second probability of the specified keyword, and further obtaining a plurality of second probabilities.
A second aspect of the present application provides a concept word sequence generating apparatus including:
the first acquisition module is used for acquiring question sentences;
the second acquisition module is used for acquiring a concept knowledge base, wherein the concept knowledge base comprises a plurality of sample sentences, each sample sentence corresponds to a plurality of keywords, and each keyword of each sample sentence corresponds to a concept word;
the extraction module is used for extracting the keywords of the question sentences from the question sentences according to the keywords of the concept knowledge base;
the determining module is used for determining the concept words corresponding to the keywords of the question sentences according to the corresponding relation between the keywords and the concept words in the concept knowledge base;
and the combination module is used for combining the concept words corresponding to the keywords of the question sentences into concept word sequences according to the word sequences of the keywords of the question sentences.
A third aspect of the application provides a computer device comprising a processor for implementing the concept word sequence generation method when executing computer readable instructions stored in a memory.
A fourth aspect of the present application provides a computer-readable storage medium having stored thereon computer-readable instructions which, when executed by a processor, implement the concept word sequence generating method.
The method determines the concept words corresponding to the keywords of the question sentences according to the corresponding relation between the keywords and the concept words in the concept knowledge base, generates the concept word sequence of the sentences, and improves the efficiency of generating the concept word sequence.
Drawings
Fig. 1 is a flowchart of a concept word sequence generating method according to an embodiment of the present invention.
Fig. 2 is a flowchart for determining concept words corresponding to keywords of a question sentence according to an embodiment of the present invention.
Fig. 3 is a flowchart for calculating a probability score of a concept word combination according to an embodiment of the present invention.
Fig. 4 is a block diagram of a concept word sequence generating apparatus according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of a computer device provided by an embodiment of the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a detailed description of the present invention will be given below with reference to the accompanying drawings and specific embodiments. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth to provide a thorough understanding of the present invention. The described embodiments are merely some, rather than all, of the embodiments of the present invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
Preferably, the concept word sequence generating method of the present invention is applied to one or more computer devices. The computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
Embodiment one
Fig. 1 is a flowchart of a concept word sequence generating method according to an embodiment of the present invention. The method for generating the concept word sequence is applied to computer equipment and used for generating the concept word sequence of the sentence and improving the efficiency of generating the concept word sequence.
As shown in fig. 1, the method for generating a sequence of concept words includes:
101, obtaining a question statement.
In a specific embodiment, the obtaining the question statement may include: pulling question sentences from the cloud storage; or receiving question sentences input by a user; or acquiring an image comprising question sentences through a camera, and identifying the question sentences in the image through a character recognition method. The present application is not particularly limited. In the present application, the question sentence may be a question sentence related to medical insurance, for example, the question sentence is "what diseases can be protected by million doctors".
102, obtaining a concept knowledge base, wherein the concept knowledge base comprises a plurality of sample sentences, each sample sentence corresponds to a plurality of keywords, and each keyword of the sample sentences corresponds to a concept word.
For example, the sample sentence is "On my Apple phone, I ordered 5 jin of apples"; the plurality of keywords corresponding to the sample sentence are the first "apple" and the second "apple"; the first "apple" corresponds to the concept word "apple phone", and the second "apple" corresponds to another concept word, "fruit apple".
For another example, the sample sentence is "this programmer is often called a 'code man' by others"; the plurality of keywords corresponding to the sample sentence are "programmer" and "code man"; "programmer" corresponds to the concept word "computer practitioner", and "code man" also corresponds to the concept word "computer practitioner".
The plurality of sample sentences may have a plurality of identical keywords, and the concept words corresponding to the identical keywords may be identical or different.
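The structure described above can be pictured with a small in-memory sketch. The record layout, the field names `sentence` and `annotations`, and the helper `build_keyword_index` are illustrative assumptions; the source does not specify how the concept knowledge base is stored.

```python
from collections import defaultdict

# Hypothetical in-memory layout: one record per sample sentence, holding its
# (keyword, concept word) annotations in the order the keywords occur.
KNOWLEDGE_BASE = [
    {
        "sentence": "On my Apple phone, I ordered 5 jin of apples",
        "annotations": [("apple", "apple phone"), ("apple", "fruit apple")],
    },
    {
        "sentence": "this programmer is often called a 'code man' by others",
        "annotations": [
            ("programmer", "computer practitioner"),
            ("code man", "computer practitioner"),
        ],
    },
]


def build_keyword_index(knowledge_base):
    """Map each keyword to the distinct concept words it is annotated with."""
    index = defaultdict(list)
    for sample in knowledge_base:
        for keyword, concept in sample["annotations"]:
            if concept not in index[keyword]:
                index[keyword].append(concept)
    return dict(index)
```

With such an index, one keyword ("apple") can map to several concept words, while several keywords ("programmer", "code man") can map to the same concept word.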
The plurality of keywords corresponding to each sample sentence may be entity objects in the sample sentence, or may be key intents of the sample sentence. A question-answer matching model based on a knowledge system needs to incorporate human knowledge, namely the correspondence between keywords and their concept words. For example, "apple" can be understood as a fruit or as a mobile phone.
Indexing, matching and recommending on the basis of the correspondence between concept words and keywords can reduce the computing delay of the question-answer matching model. The association between concept words may also be used for recommending insurance products.
103, extracting the keywords of the question sentences from the question sentences according to the keywords of the concept knowledge base.
The concept knowledge base comprises a plurality of sample sentences, and each sample sentence corresponds to a plurality of keywords. That is, the concept knowledge base contains a plurality of sample sentences labeled with their keywords, which can serve as historical reference data, so that keywords can be extracted from the question sentence using a statistical method.
In a specific embodiment, the extracting the keywords of the question sentences from the question sentences according to the keywords of the concept knowledge base includes:
performing multiple random word segmentation on the question sentence to obtain multiple word segmentation results;
for each word segmentation result, calculating the similarity score and the length score of each word in the word segmentation result according to the keywords of the concept knowledge base;
calculating the keyword score of the word segmentation result according to the similarity score and the length score of each word in the word segmentation result;
and extracting words from the word segmentation result with the lowest keyword score as the keywords of the question sentences.
Specifically, each word segmentation result is scored as follows. Any one of the plurality of word segmentation results comprises K words. For any word in the word segmentation result, the shorter the word, the lower its length score. The keyword most similar to the word is acquired from the concept knowledge base, and the reciprocal of the similarity between the word and that most similar keyword is taken as the similarity score of the word; the higher the similarity, the lower the similarity score.
Specifically, the keywords are taken from the word segmentation result with the lowest keyword score:
Figure BDA0002713309400000051
Figure BDA0002713309400000052
where $\arg\min_{(K,\,\mathrm{keyword})}$ denotes the values of the variables $K$ and $\mathrm{keyword}$ at which the function following the symbol reaches its minimum, $\mathrm{cost}(\mathrm{keyword}[k])$ denotes the similarity score of the $k$-th word of the $K$ words, and $\mathrm{len}(\mathrm{keyword}[k])$ denotes the length score of the $k$-th word of the $K$ words.
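The segmentation-and-scoring procedure can be sketched as follows. This is only an illustration under stated assumptions: the formula images are not reproduced in this text, so adding the similarity score and the length score over all words is an assumed combination; `difflib.SequenceMatcher` stands in for an unspecified similarity measure; the raw word length is an assumed length score; and the 0.5 cut probability and all function names are hypothetical.

```python
import random
from difflib import SequenceMatcher


def random_segmentation(sentence, rng):
    """Split the sentence into words by cutting at randomly chosen positions."""
    words, start = [], 0
    for pos in range(1, len(sentence)):
        if rng.random() < 0.5:  # assumed cut probability
            words.append(sentence[start:pos])
            start = pos
    words.append(sentence[start:])
    return words


def similarity_score(word, kb_keywords):
    """Reciprocal of the similarity to the most similar knowledge-base keyword."""
    best = max(SequenceMatcher(None, word, kw).ratio() for kw in kb_keywords)
    return float("inf") if best == 0 else 1.0 / best


def length_score(word):
    """Shorter words get lower scores; the raw length is an assumption."""
    return len(word)


def keyword_score(words, kb_keywords):
    # Assumed combination: sum of both scores over every word.
    return sum(similarity_score(w, kb_keywords) + length_score(w) for w in words)


def extract_keywords(sentence, kb_keywords, trials=200, seed=0):
    """Return the words of the random segmentation with the lowest keyword score."""
    rng = random.Random(seed)
    candidates = [random_segmentation(sentence, rng) for _ in range(trials)]
    return min(candidates, key=lambda ws: keyword_score(ws, kb_keywords))
```

Because the segmentations are sampled rather than enumerated, the number of trials trades running time against the chance of hitting the best split.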
Specifically, the obtaining of the most similar keyword of any term from the concept knowledge base may include:
obtaining a vector representation of each term in the concept knowledge base;
calculating Euclidean distances of the any term from each term in the concept knowledge base based on vector representations;
and determining the word with the minimum Euclidean distance to any word in the concept knowledge base as the most similar keyword.
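The nearest-keyword lookup just described can be sketched in a few lines. The dictionary `keyword_vectors` and the function name are hypothetical, and how the vector representations are produced is left open, as in the source.

```python
import math


def most_similar_keyword(word_vector, keyword_vectors):
    """Return the knowledge-base keyword whose vector is closest to word_vector."""
    return min(
        keyword_vectors,
        key=lambda kw: math.dist(word_vector, keyword_vectors[kw]),
    )
```

For example, with toy two-dimensional vectors `{"apple": (1.0, 0.0), "phone": (0.0, 1.0)}`, a word represented by `(0.9, 0.2)` is matched to "apple".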
104, determining the concept words corresponding to the keywords of the question sentences according to the corresponding relation between the keywords and the concept words in the concept knowledge base.
The concept knowledge base comprises a plurality of sample sentences, each sample sentence corresponds to a plurality of keywords, and each keyword of the sample sentences corresponds to a concept word. That is, the concept knowledge base records the correspondence between keywords and concept words. Based on this correspondence, the concept words corresponding to the keywords of the question sentence can be determined using a statistical method.
As shown in fig. 2, the determining the concept word corresponding to the keyword of the question sentence according to the correspondence between the keyword and the concept word in the concept knowledge base includes:
41, obtaining a plurality of concept words of each keyword of the question sentence from the concept knowledge base according to the corresponding relation between the keyword and the concept word in the concept knowledge base;
42, combining any concept word of each keyword of the question sentence into one concept word combination of the question sentence to obtain a plurality of concept word combinations of the question sentence;
43, calculating a probability score of each concept word combination of the question sentence;
and 44, matching the concept words in the concept word combination with the highest probability score to obtain the concept words corresponding to the keywords of the question sentences.
A plurality of concept words of each keyword of the question sentence are acquired from the concept knowledge base according to the correspondence between keywords and concept words in the concept knowledge base, so as to obtain the concept word combinations of the question sentence. For example, the question sentence includes keyword 1 and keyword 2. In the concept knowledge base, keyword 1 in sample sentence 1 corresponds to concept word 11, and keyword 1 in sample sentence 2 corresponds to concept word 12; keyword 2 in sample sentence 3 corresponds to concept word 21, and keyword 2 in sample sentence 3 corresponds to concept word 22. The plurality of concept words corresponding to keyword 1 of the question sentence are therefore concept word 11 and concept word 12, and the plurality of concept words corresponding to keyword 2 of the question sentence are concept word 21 and concept word 22. The concept word combinations of the question sentence may include "concept word 11-concept word 21", "concept word 11-concept word 22", "concept word 12-concept word 21" and "concept word 12-concept word 22".
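Enumerating the concept word combinations amounts to a Cartesian product over the candidate concept words of each keyword. A minimal sketch, assuming a hypothetical `keyword_index` dictionary that maps each keyword to its candidate concept words:

```python
from itertools import product


def concept_combinations(keywords, keyword_index):
    """List every way of picking one candidate concept word per keyword."""
    candidate_lists = [keyword_index[kw] for kw in keywords]
    return list(product(*candidate_lists))
```

The number of combinations is the product of the candidate counts, so it grows quickly with the number of ambiguous keywords in the question sentence.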
The concept word combination with the highest probability score may be found based on the joint probability. The optimization objective is:
$$\arg\max_{e_1, e_2, \dots, e_n} P(e_1, e_2, \dots, e_n \mid w_1, w_2, \dots, w_n)$$
where $\arg\max_{e_1, e_2, \dots, e_n}$ denotes the values of the variables $e_1, e_2, \dots, e_n$ at which the function following the symbol reaches its maximum, $w_n$ is the $n$-th keyword in the question sentence, $e_n$ is the concept word corresponding to the $n$-th keyword, $1 \le n \le N$, and $N$ is the number of keywords in the question sentence.
In an alternative embodiment, the probability score of the concept word combination is calculated based on the joint probability:
Figure BDA0002713309400000071
Figure BDA0002713309400000073
where $P(w_1, w_2, \dots, w_n)$ is the global joint information formed on the basis of the maximum clique.
As shown in fig. 3, the calculating the probability score of the concept word combination further includes:
431, randomly extracting two target concept words from the concept knowledge base, and calculating a first probability that the two target concept words are consistent with any two concept words in the concept word combination according to the concept words of the concept knowledge base to obtain a plurality of first probabilities;
432, randomly extracting a keyword from the concept knowledge base, and calculating a second probability that the extracted keyword is consistent with each keyword of the question sentence according to the keyword and the concept word of the concept knowledge base to obtain a plurality of second probabilities;
and 433, calculating the product of the plurality of first probabilities and the plurality of second probabilities, and taking the obtained product result as the probability score of the concept word combination.
Specifically, the product of the plurality of first probabilities and the plurality of second probabilities is directly proportional to $P(e_1, e_2, \dots, e_n \mid w_1, w_2, \dots, w_n)$, so the resulting product can be used as the probability score of the concept word combination:
$$P(e_1, e_2, \dots, e_n \mid w_1, w_2, \dots, w_n) \propto \prod_{n} P(w_n \mid e_n) \prod_{i,j} P(e_i, e_j)$$
where $\prod_{i,j} P(e_i, e_j)$ is the global joint information simplified by the maximum clique, $P(w_n \mid e_n)$ denotes a second probability, $P(e_i, e_j)$ denotes a first probability, $1 \le i \le N$ and $1 \le j \le N$.
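The scoring and selection step can be sketched as below. The callables `first_prob` and `second_prob` are hypothetical stand-ins for the first and second probabilities, and taking the pair product over unordered pairs with i < j is an assumption, since the source only bounds i and j between 1 and N.

```python
from itertools import combinations
from math import prod


def probability_score(keywords, concepts, first_prob, second_prob):
    """Product of the second probabilities P(w_n | e_n) and first probabilities P(e_i, e_j)."""
    score = prod(second_prob(w, e) for w, e in zip(keywords, concepts))
    # Assumption: each unordered concept word pair (i < j) contributes once.
    score *= prod(first_prob(ei, ej) for ei, ej in combinations(concepts, 2))
    return score


def best_combination(keywords, candidates, first_prob, second_prob):
    """Pick the concept word combination with the highest probability score."""
    return max(
        candidates,
        key=lambda c: probability_score(keywords, c, first_prob, second_prob),
    )
```

With a single keyword there are no pairs, so the score reduces to the second probability alone.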
In a specific embodiment, the randomly extracting two target concept words from the concept knowledge base, and calculating, according to the concept words of the concept knowledge base, first probabilities that the two target concept words are consistent with any two concept words in the concept word combinations to obtain a plurality of first probabilities, includes:
(a) recording any two concept words in the concept word combination as a first concept word pair, searching the first concept word pair in each sample sentence of the concept knowledge base, and counting a first number of the first concept word pairs searched in the concept knowledge base;
(b) acquiring a plurality of concept words in the concept knowledge base, performing duplication elimination processing on the plurality of concept words in the concept knowledge base, and marking any two concept words in the duplication elimination concept words in the concept knowledge base as second concept word pairs to obtain a plurality of second concept word pairs;
(c) calculating a second number of the plurality of second concept word pairs in the concept knowledge base;
(d) and calculating the ratio of the first number of the first concept word pairs to the second number of the plurality of second concept word pairs, and taking the ratio of the first number of the first concept word pairs to the second number of the plurality of second concept word pairs as the first probability of any two concept words in the concept word combination to obtain a plurality of first probabilities.
Specifically, the ratio of the first number of the first concept word pairs to the second number of the plurality of second concept word pairs is directly proportional to $P(e_i, e_j)$, so this ratio may be used as the first probability of any two concept words in the concept word combination:
Figure BDA0002713309400000074
where $\mathrm{count}[e_i, e_j]$ denotes the first number of the first concept word pairs, and
Figure BDA0002713309400000075
denotes the second number of the plurality of second concept word pairs.
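One possible reading of steps (a) to (d) is sketched below. It assumes that a pair is "found" in a sample sentence when both concept words are annotated in it, and that the second number is the total of such occurrences over all pairs of deduplicated concept words; the formula image for the denominator is not reproduced here, so this interpretation, the input layout (one list of concept words per sample sentence) and the function names are assumptions. The two concept words of a pair are assumed to be distinct.

```python
from itertools import combinations


def pair_count(pair, kb_concepts):
    """Number of sample sentences annotated with both concept words of the pair."""
    a, b = pair
    return sum(1 for concepts in kb_concepts if a in concepts and b in concepts)


def first_probability(pair, kb_concepts):
    """First number divided by the total count over all deduplicated concept pairs."""
    distinct = sorted({c for concepts in kb_concepts for c in concepts})
    second_number = sum(
        pair_count(p, kb_concepts) for p in combinations(distinct, 2)
    )
    return pair_count(pair, kb_concepts) / second_number if second_number else 0.0
```

Under this reading the first probabilities of all distinct pairs sum to one whenever at least one pair co-occurs in the knowledge base.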
In another optional embodiment, the randomly extracting two target concept words from the concept knowledge base according to the concept word calculation of the concept knowledge base, wherein the first probabilities that the two target concept words are consistent with any two concept words in the concept word combination result in a plurality of first probabilities, includes:
(a) recording any two concept words in the concept word combination as a third concept word pair, searching for the third concept word pair in each sample sentence of the concept knowledge base, and counting a third number of the third concept word pairs found in the concept knowledge base;
(b) acquiring a target keyword pair corresponding to the third concept word pair from the keywords of the question sentence, acquiring a plurality of fourth concept word pairs corresponding to the target keyword pair from the concept knowledge base, searching for the plurality of fourth concept word pairs in each sample sentence of the concept knowledge base, and counting a fourth number of the plurality of fourth concept word pairs found in the concept knowledge base;
(c) and calculating the ratio of the third number of the third concept word pairs to the fourth number of the plurality of fourth concept word pairs to obtain a first probability of any two concept words in the concept word combination, thereby obtaining a plurality of first probabilities.
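This alternative normalizes the pair count by the counts of all candidate concept word pairs of the same keyword pair. A minimal Python sketch follows, assuming (hypothetically) that each sample sentence is given as the list of its concept words and that `candidates` maps each keyword to the concept words it can correspond to in the knowledge base. The keyword "jin" and the concept word "weight unit" are invented for illustration.

```python
from itertools import product


def cooccurrence(e_a, e_b, sample_concepts):
    """Number of sample sentences in which both concept words appear."""
    return sum(1 for concepts in sample_concepts
               if e_a in concepts and e_b in concepts)


def first_probability_local(e_i, e_j, w_i, w_j, candidates, sample_concepts):
    """Third number divided by fourth number: the count of (e_i, e_j) over the
    summed counts of every candidate concept pair of keywords (w_i, w_j)."""
    third = cooccurrence(e_i, e_j, sample_concepts)
    fourth = sum(cooccurrence(a, b, sample_concepts)
                 for a, b in product(candidates[w_i], candidates[w_j]))
    return third / fourth if fourth else 0.0


# Hypothetical toy data.
candidates = {"apple": ["apple phone", "fruit apple"], "jin": ["weight unit"]}
samples = [
    ["fruit apple", "weight unit"],
    ["fruit apple", "weight unit"],
    ["apple phone", "weight unit"],
    ["apple phone"],
]
p_local = first_probability_local(
    "fruit apple", "weight unit", "apple", "jin", candidates, samples)
```

Here the third number is 2 and the fourth number is 3, so the sketch yields 2/3.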
In a specific embodiment, the randomly extracting a keyword from the concept knowledge base, and calculating a second probability that the extracted keyword is consistent with each keyword of the question sentence according to the keyword and the concept word of the concept knowledge base to obtain a plurality of second probabilities, includes:
(a) recording each keyword of the question sentence as a given keyword, searching a concept word corresponding to the given keyword from the concept word combination, recording the concept word as a given concept word, and combining the given keyword and the given concept word into a first target word pair;
(b) counting the number of the given concept words in the concept words of the concept knowledge base and recording as a fifth number;
(c) counting the number of the first target word pairs in the keyword-concept word pairs of the concept knowledge base, and recording as a sixth number;
(d) and calculating the ratio of the sixth quantity to the fifth quantity to obtain a second probability of the given keyword and obtain a plurality of second probabilities.
In particular, the ratio of the sixth number to the fifth number is directly proportional to $P(w_n \mid e_n)$, so this ratio may be used as the second probability of the given keyword,
$$P(w_n \mid e_n) = \frac{\mathrm{count}[w_n, e_n]}{\mathrm{count}[e_n]}$$
wherein $\mathrm{count}[w_n, e_n]$ represents the sixth number of the first target word pair, and $\mathrm{count}[e_n]$ represents the fifth number of the given concept word.
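The ratio of the sixth number to the fifth number can be sketched in Python as follows. The sketch assumes, hypothetically, that the keyword-concept word pairs of the concept knowledge base are available as a flat list of `(keyword, concept_word)` tuples, one per annotated keyword occurrence.

```python
from collections import Counter


def second_probability(w, e, kb_pairs):
    """Estimate P(w | e) as count[w, e] / count[e]."""
    pair_count = Counter(kb_pairs)                       # count[w, e]
    concept_count = Counter(c for _, c in kb_pairs)      # count[e]
    if concept_count[e] == 0:
        return 0.0
    return pair_count[(w, e)] / concept_count[e]


# Hypothetical toy keyword-concept word pairs.
kb_pairs = [
    ("programmer", "computer practitioner"),
    ("code man", "computer practitioner"),
    ("programmer", "computer practitioner"),
    ("apple", "fruit apple"),
]
p_w_given_e = second_probability("programmer", "computer practitioner", kb_pairs)
```

In the toy data the concept word "computer practitioner" occurs three times and is paired with "programmer" twice, giving 2/3.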
In another embodiment, the randomly extracting a keyword from the concept knowledge base, and calculating a second probability that the extracted keyword is consistent with each keyword of the question sentence according to the keyword and the concept word of the concept knowledge base to obtain a plurality of second probabilities, includes:
(a) recording each keyword of the question sentence as a specified keyword, searching the concept word combination for the concept word corresponding to the specified keyword, and recording it as a specified concept word;
(b) acquiring context information of the specified keyword from the question sentence, combining the specified keyword, the context information and the specified concept word into a second target word pair, and combining the context information and the specified concept word into a third target word pair;
(c) acquiring context information-concept word pairs of the concept knowledge base and keyword-context information-concept word pairs of the concept knowledge base;
(d) counting the number of the third target word pairs in the context information-concept word pairs of the concept knowledge base, and recording as a seventh number;
(e) counting the number of the second target word pairs in the keyword-context information-concept word pairs of the concept knowledge base, and recording as an eighth number;
(f) and calculating the ratio of the eighth number to the seventh number to obtain a second probability of the specified keyword, and further obtaining a plurality of second probabilities.
In particular, the ratio of the eighth number to the seventh number is directly proportional to $P(w_n \mid e_n)$, so this ratio may be used as the second probability of the specified keyword,
$$P(w_n \mid e_n) = \frac{\mathrm{count}[w_n, e_n, w_{n-1}, w_{n-2}, w_{n+1}, w_{n+2}]}{\mathrm{count}[e_n, w_{n-1}, w_{n-2}, w_{n+1}, w_{n+2}]}$$
wherein $\mathrm{count}[w_n, e_n, w_{n-1}, w_{n-2}, w_{n+1}, w_{n+2}]$ represents the eighth number of the second target word pair, and $\mathrm{count}[e_n, w_{n-1}, w_{n-2}, w_{n+1}, w_{n+2}]$ represents the seventh number of the third target word pair. $w_{n-1}, w_{n-2}, w_{n+1}, w_{n+2}$ represent the context information, such as the two words before and the two words after the specified keyword.
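The context-conditioned ratio can be sketched in Python as follows. This is a minimal sketch under assumptions: the knowledge base is taken to provide `(keyword, context, concept_word)` triples (a hypothetical layout), the context is the two words on each side of the keyword, and sentence boundaries are padded with `None`. The order of the words inside the context tuple does not matter as long as it is used consistently.

```python
from collections import Counter


def context_of(words, n, window=2):
    """Words within `window` positions on each side of index n, padded with
    None at sentence boundaries so every context has the same shape."""
    padded = [None] * window + list(words) + [None] * window
    c = n + window
    return tuple(padded[c - window:c] + padded[c + 1:c + 1 + window])


def second_probability_ctx(w, e, ctx, kb_triples):
    """count[w, e, context] (eighth number) / count[e, context] (seventh number)."""
    eighth = Counter(kb_triples)[(w, ctx, e)]
    seventh = Counter((c, ce) for _, c, ce in kb_triples)[(ctx, e)]
    return eighth / seventh if seventh else 0.0


# Hypothetical toy data.
words = ["I", "ordered", "apples", "on", "phone"]
ctx = context_of(words, 2)
kb_triples = [
    ("apples", ctx, "fruit apple"),
    ("apple", ctx, "fruit apple"),
    ("apples", ctx, "apple phone"),
]
p_ctx = second_probability_ctx("apples", "fruit apple", ctx, kb_triples)
```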
As in the above example, the question sentence is "what diseases can be guaranteed by million doctors", and the concept words corresponding to the keywords of the question sentence are "e live and safe million doctors", "what", "diseases", and "guarantee".
105, combining the concept words corresponding to the keywords of the question sentence into a concept word sequence according to the word order of the keywords of the question sentence.
For example, a preset word queue may be obtained, a keyword of a question sentence may be converted into a word vector, a plurality of word vectors obtained by the conversion may be combined into a concept word sequence according to a word order of the keyword of the question sentence, and the preset word queue may be stored.
The concept word sequence is an abstract representation of the question statement, and can be used as intermediate data for further performing natural language processing on the question statement.
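Step 105 can be illustrated with a short Python sketch. It assumes, for illustration only, that the question sentence is a plain string, that each keyword occurs in it exactly once, and that the keyword-to-concept-word mapping determined in step 104 is available as a dictionary. The function name and the toy sentence are hypothetical.

```python
def concept_word_sequence(question, keyword_to_concept):
    """Order the concept words by the position of their keywords in the question."""
    ordered_keywords = sorted(keyword_to_concept, key=question.index)
    return [keyword_to_concept[k] for k in ordered_keywords]


# Hypothetical toy input.
question = "can the programmer order an apple"
mapping = {"apple": "fruit apple", "programmer": "computer practitioner"}
sequence = concept_word_sequence(question, mapping)
```

The resulting list keeps the word order of the question sentence regardless of the order in which the mapping was built.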
In another embodiment, after combining the concept words corresponding to the keywords of the question sentence into a concept word sequence according to the word sequence of the keywords of the question sentence, the method for generating the concept word sequence further includes:
and matching answers corresponding to the question sentences according to the concept word sequences.
The concept words corresponding to the keywords of the question sentence are determined according to the correspondence between the keywords and the concept words in the concept knowledge base, which improves the efficiency of generating the concept word sequence and increases the accuracy of question-answer matching performed through the concept word sequence of the sentence.
The method for generating the concept word sequence according to the first embodiment determines the concept words corresponding to the keywords of the question sentence according to the correspondence between the keywords and the concept words in the concept knowledge base, and generates the concept word sequence of the sentence, thereby improving the efficiency of generating the concept word sequence.
With this method for generating the concept word sequence, the efficiency of generating concept word sequences can be improved in remote consultation in medical technology, which further improves the question-answering accuracy of remote consultation and facilitates the development of telemedicine services.
Example two
Fig. 4 is a block diagram of a concept word sequence generating apparatus according to a second embodiment of the present invention. The concept word sequence generating apparatus 20 is applied to a computer device. The concept word sequence generating device 20 is configured to generate a concept word sequence of a sentence, and improve efficiency of generating the concept word sequence.
As shown in fig. 4, the concept word sequence generating apparatus 20 may include a first obtaining module 201, a second obtaining module 202, an extracting module 203, a determining module 204, and a combining module 205.
The first obtaining module 201 is configured to obtain a question statement.
In a specific embodiment, the obtaining the question statement may include: pulling question sentences from the cloud storage; or receiving question sentences input by a user; or acquiring an image comprising question sentences through a camera, and identifying the question sentences in the image through a character recognition method. The present application is not particularly limited. In the present application, the question sentence may be a question sentence related to medical insurance, for example, the question sentence is "what diseases can be protected by million doctors".
A second obtaining module 202, configured to obtain a concept knowledge base, where the concept knowledge base includes a plurality of sample sentences, each sample sentence corresponds to a plurality of keywords, and each keyword of the sample sentence corresponds to a concept word.
For example, the sample sentence is "on my apple phone I ordered 5 jin of apples"; the plurality of keywords corresponding to the sample sentence are the first "apple" and the second "apple", where the first "apple" corresponds to one concept word "apple phone", and the second "apple" corresponds to another concept word "fruit apple".
For another example, the sample sentence is "this programmer is often called a 'code man' by others"; the plurality of keywords corresponding to the sample sentence are "programmer" and "code man", where "programmer" corresponds to the concept word "computer practitioner", and "code man" also corresponds to the concept word "computer practitioner".
The plurality of sample sentences may have a plurality of identical keywords, and the concept words corresponding to the identical keywords may be identical or different.
The multiple keywords corresponding to each sample sentence may be entity objects in the sample sentence, or may be key intents of the sample sentence. In a question-answer matching model based on a knowledge system, human knowledge, namely the correspondence between keywords and their concept words, needs to be added. For example, an apple can be regarded as a fruit and also as a mobile phone.
The computing delay of the question-answer matching model can be reduced by indexing, matching and recommending the corresponding relation between the concept words and the keywords. The association between concept words may also be used for recommendation of insurance products.
An extracting module 203, configured to extract the keywords of the question sentences from the question sentences according to the keywords of the concept knowledge base.
The concept knowledge base comprises a plurality of sample sentences, and each sample sentence corresponds to a plurality of keywords. That is, the concept knowledge base contains a plurality of sample sentences labeled with relevant keywords, which can serve as historical data with reference value, so that keywords can be extracted from the question sentence based on a statistical method.
In a specific embodiment, the extracting the keywords of the question sentences from the question sentences according to the keywords of the concept knowledge base includes:
performing multiple random word segmentation on the question sentence to obtain multiple word segmentation results;
for each word segmentation result, calculating the similarity score and the length score of each word in the word segmentation result according to the keywords of the concept knowledge base;
calculating the keyword score of the word segmentation result according to the similarity score and the length score of each word in the word segmentation result;
and extracting words from the word segmentation result with the lowest keyword score as the keywords of the question sentences.
Specifically, the scoring of a word segmentation result is described in detail as follows. Any one of the plurality of word segmentation results comprises K words. For any word in the word segmentation result, the shorter the word, the lower its length score. The most similar keyword of the word is acquired from the concept knowledge base, and the reciprocal of the similarity between the word and its most similar keyword is determined as the similarity score of the word, so that the higher the similarity between the word and its most similar keyword, the lower the similarity score of the word.
Specifically, the keywords are the words of the word segmentation result with the lowest keyword score,
Figure BDA0002713309400000111
Figure BDA0002713309400000112
wherein $\arg\min_{(K,\,keyword)}$ represents the values of the variables K and keyword at which the function following the symbol reaches its minimum value, $cost(keyword[k])$ represents the similarity score of the k-th word of the K words, and $len(keyword[k])$ represents the length score of the k-th word of the K words.
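The selection of the lowest-scoring word segmentation result can be sketched in Python as follows. Several points are assumptions made only for illustration: the keyword score is taken to be the sum over words of the similarity score and the length score, the length score is taken to be the word length, the similarity is a stand-in (1.0 for an exact match with a knowledge-base keyword, 0.1 otherwise), and the candidate segmentations are supplied directly instead of being generated by random segmentation.

```python
def similarity_score(word, kb_keywords):
    """Reciprocal of the similarity to the most similar keyword.
    Stand-in similarity: 1.0 for an exact match, 0.1 otherwise."""
    best_similarity = 1.0 if word in kb_keywords else 0.1
    return 1.0 / best_similarity


def keyword_score(segmentation, kb_keywords):
    """Assumed combination: sum over words of similarity score + length score,
    with the length score taken to be the word length."""
    return sum(similarity_score(w, kb_keywords) + len(w) for w in segmentation)


def best_segmentation(segmentations, kb_keywords):
    """Return the segmentation with the lowest keyword score."""
    return min(segmentations, key=lambda s: keyword_score(s, kb_keywords))


# Hypothetical toy data.
kb_keywords = {"apple", "phone"}
segmentations = [["apple", "phone"], ["app", "lephone"], ["applephone"]]
best = best_segmentation(segmentations, kb_keywords)
```

With these stand-in scores the segmentation whose words all match knowledge-base keywords has the lowest keyword score and is selected.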
Specifically, the obtaining of the most similar keyword of any term from the concept knowledge base may include:
obtaining a vector representation of each term in the concept knowledge base;
calculating Euclidean distances of the any term from each term in the concept knowledge base based on vector representations;
and determining the word with the minimum Euclidean distance to any word in the concept knowledge base as the most similar keyword.
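The nearest-keyword lookup can be sketched in Python as follows, assuming the vector representations are already available as a dictionary from word to vector. The two-dimensional vectors below are invented for illustration; how the vectors are obtained is not specified here.

```python
import math


def most_similar_keyword(word_vec, kb_vectors):
    """Return the knowledge-base word whose vector has the smallest
    Euclidean distance to word_vec."""
    return min(kb_vectors, key=lambda k: math.dist(word_vec, kb_vectors[k]))


# Hypothetical toy vectors.
kb_vectors = {"apple": (1.0, 0.0), "phone": (0.0, 1.0)}
nearest = most_similar_keyword((0.9, 0.2), kb_vectors)
```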
A determining module 204, configured to determine, according to the correspondence between the keyword and the concept word in the concept knowledge base, the concept word corresponding to the keyword of the question sentence.
The concept knowledge base comprises a plurality of sample sentences, each sample sentence corresponds to a plurality of keywords, and each keyword of the sample sentences corresponds to a concept word. Namely, the concept knowledge base has the corresponding relation between the key words and the concept words. Keywords may be extracted from the question sentences using a statistical method based on the correspondence between the keywords and the concept words in the concept knowledge base.
In a specific embodiment, the determining, according to the correspondence between the keyword and the concept word in the concept knowledge base, the concept word corresponding to the keyword of the question sentence includes:
41, obtaining a plurality of concept words of each keyword of the question sentence from the concept knowledge base according to the corresponding relation between the keyword and the concept word in the concept knowledge base;
42, combining any concept word of each keyword of the question sentence into one concept word combination of the question sentence to obtain a plurality of concept word combinations of the question sentence;
43, calculating a probability score of each concept word combination of the question sentence;
and 44, matching the concept words in the concept word combination with the highest probability score to obtain the concept words corresponding to the keywords of the question sentences.
A plurality of concept words of each keyword of the question sentence are acquired from the concept knowledge base according to the correspondence between the keywords and the concept words in the concept knowledge base, so as to obtain the concept word combinations of the question sentence. For example, the question sentence includes keyword 1 and keyword 2. In the concept knowledge base, keyword 1 in sample sentence 1 corresponds to concept word 11, and keyword 1 in sample sentence 2 corresponds to concept word 12; keyword 2 in sample sentence 3 corresponds to concept word 21, and keyword 2 in sample sentence 4 corresponds to concept word 22. The plurality of concept words corresponding to keyword 1 of the question sentence are thus concept word 11 and concept word 12, and the plurality of concept words corresponding to keyword 2 of the question sentence are concept word 21 and concept word 22. The concept word combinations of the question sentence may include "concept word 11-concept word 21", "concept word 11-concept word 22", "concept word 12-concept word 21", and "concept word 12-concept word 22".
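The enumeration of concept word combinations in this example amounts to a Cartesian product over the candidate concept words of each keyword. A minimal Python sketch, with a hypothetical dictionary `candidates` standing in for the lookup in the concept knowledge base:

```python
from itertools import product


def concept_word_combinations(keywords, candidates):
    """One concept word per keyword, in keyword order, for every possible choice."""
    return list(product(*(candidates[k] for k in keywords)))


keywords = ["keyword 1", "keyword 2"]
candidates = {
    "keyword 1": ["concept word 11", "concept word 12"],
    "keyword 2": ["concept word 21", "concept word 22"],
}
combos = concept_word_combinations(keywords, candidates)
```

The number of combinations grows as the product of the candidate counts, so for long question sentences an exhaustive enumeration like this one may need pruning.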
The concept word combination with the highest probability score may be determined based on the joint probability. The optimized objective function is:
$$(e_1, e_2, \dots, e_n) = \arg\max_{e_1, e_2, \dots, e_n} P(e_1, e_2, \dots, e_n \mid w_1, w_2, \dots, w_n)$$
wherein $\arg\max_{e_1, e_2, \dots, e_n}$ represents the values of the variables $e_1, e_2, \dots, e_n$ at which the function following the symbol reaches its maximum value.
Wherein, $w_n$ is the n-th keyword in the question sentence, $e_n$ is the concept word corresponding to the n-th keyword, $1 \le n \le N$, and N is the number of keywords in the question sentence.
In an alternative embodiment, the probability score of the concept word combination is calculated based on the joint probability as $P(e_1, e_2, \dots, e_n \mid w_1, w_2, \dots, w_n)$,
Figure BDA0002713309400000123
Figure BDA0002713309400000124
Wherein, $P(w_1, w_2, \dots, w_n)$ is the global joint information formed based on the maximum clique.
In a specific embodiment, the calculating the probability score of the concept word combination further includes:
431, randomly extracting two target concept words from the concept knowledge base, and calculating a first probability that the two target concept words are consistent with any two concept words in the concept word combination according to the concept words of the concept knowledge base to obtain a plurality of first probabilities;
432, randomly extracting a keyword from the concept knowledge base, and calculating a second probability that the extracted keyword is consistent with each keyword of the question sentence according to the keyword and the concept word of the concept knowledge base to obtain a plurality of second probabilities;
and 433, calculating the product of the plurality of first probabilities and the plurality of second probabilities, and taking the obtained product result as the probability score of the concept word combination.
Specifically, the product of the plurality of first probabilities and the plurality of second probabilities is directly proportional to $P(e_1, e_2, \dots, e_n \mid w_1, w_2, \dots, w_n)$, so the resulting product may be taken as the probability score of the concept word combination,
$$P(e_1, e_2, \dots, e_n \mid w_1, w_2, \dots, w_n) \propto \prod_{n} P(w_n \mid e_n) \prod_{i,j} P(e_i, e_j)$$
wherein $\prod_{i,j} P(e_i, e_j)$ is the global joint information simplified by the maximum clique, $P(w_n \mid e_n)$ represents a second probability, and $P(e_i, e_j)$ represents a first probability, with $1 \le i \le N$ and $1 \le j \le N$.
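The product score and the selection of the highest-scoring combination can be sketched in Python as follows. The sketch assumes that the first and second probabilities are supplied as callables (here backed by invented lookup tables) and that the pairwise product runs over unordered pairs with i < j, which is one possible reading of the index range. All names and numbers are hypothetical.

```python
from itertools import combinations, product
from math import prod


def probability_score(keywords, combo, p_second, p_first):
    """Product of P(w_n | e_n) over keywords and P(e_i, e_j) over pairs i < j."""
    emission = prod(p_second(w, e) for w, e in zip(keywords, combo))
    pairwise = prod(p_first(a, b) for a, b in combinations(combo, 2))
    return emission * pairwise


def best_combination(keywords, candidates, p_second, p_first):
    """Concept word combination with the highest probability score."""
    combos = product(*(candidates[k] for k in keywords))
    return max(combos,
               key=lambda c: probability_score(keywords, c, p_second, p_first))


# Hypothetical toy probabilities.
second = {("apple", "apple phone"): 0.5, ("apple", "fruit apple"): 0.5,
          ("jin", "weight unit"): 1.0}
first = {frozenset(("apple phone", "weight unit")): 0.1,
         frozenset(("fruit apple", "weight unit")): 0.4}


def p_second(w, e):
    return second.get((w, e), 0.0)


def p_first(a, b):
    return first.get(frozenset((a, b)), 0.0)


keywords = ["apple", "jin"]
candidates = {"apple": ["apple phone", "fruit apple"], "jin": ["weight unit"]}
best = best_combination(keywords, candidates, p_second, p_first)
score = probability_score(keywords, best, p_second, p_first)
```

In the toy data the two candidate concept words of "apple" have equal second probabilities, so the pairwise first probability decides in favor of "fruit apple".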
In a specific embodiment, the randomly extracting two target concept words from the concept knowledge base, and calculating, according to the concept words of the concept knowledge base, first probabilities that the two target concept words are consistent with any two concept words in the concept word combinations to obtain a plurality of first probabilities, includes:
(a) recording any two concept words in the concept word combination as a first concept word pair, searching for the first concept word pair in each sample sentence of the concept knowledge base, and counting a first number of the first concept word pairs found in the concept knowledge base;
(b) acquiring a plurality of concept words in the concept knowledge base, deduplicating the plurality of concept words, and recording any two of the deduplicated concept words as a second concept word pair to obtain a plurality of second concept word pairs;
(c) calculating a second number of the plurality of second concept word pairs in the concept knowledge base;
(d) and calculating the ratio of the first number of the first concept word pairs to the second number of the plurality of second concept word pairs, and taking this ratio as the first probability of any two concept words in the concept word combination to obtain a plurality of first probabilities.
Specifically, the ratio of the first number of the first concept word pairs to the second number of the plurality of second concept word pairs is directly proportional to $P(e_i, e_j)$, so this ratio may be used as the first probability of any two concept words in the concept word combination,
$$P(e_i, e_j) = \frac{\mathrm{count}[e_i, e_j]}{\sum_{i', j'} \mathrm{count}[e_{i'}, e_{j'}]}$$
wherein $\mathrm{count}[e_i, e_j]$ represents the first number of the first concept word pairs, and $\sum_{i', j'} \mathrm{count}[e_{i'}, e_{j'}]$ represents the second number of the plurality of second concept word pairs.
In another optional embodiment, the randomly extracting two target concept words from the concept knowledge base, and calculating, according to the concept words of the concept knowledge base, first probabilities that the two target concept words are consistent with any two concept words in the concept word combination to obtain a plurality of first probabilities, includes:
(a) recording any two concept words in the concept word combination as a third concept word pair, searching for the third concept word pair in each sample sentence of the concept knowledge base, and counting a third number of the third concept word pairs found in the concept knowledge base;
(b) acquiring a target keyword pair corresponding to the third concept word pair from the keywords of the question sentence, acquiring a plurality of fourth concept word pairs corresponding to the target keyword pair from the concept knowledge base, searching for the plurality of fourth concept word pairs in each sample sentence of the concept knowledge base, and counting a fourth number of the plurality of fourth concept word pairs found in the concept knowledge base;
(c) and calculating the ratio of the third number of the third concept word pairs to the fourth number of the plurality of fourth concept word pairs to obtain a first probability of any two concept words in the concept word combination, thereby obtaining a plurality of first probabilities.
In a specific embodiment, the randomly extracting a keyword from the concept knowledge base, and calculating a second probability that the extracted keyword is consistent with each keyword of the question sentence according to the keyword and the concept word of the concept knowledge base to obtain a plurality of second probabilities, includes:
(a) recording each keyword of the question sentence as a given keyword, searching a concept word corresponding to the given keyword from the concept word combination, recording the concept word as a given concept word, and combining the given keyword and the given concept word into a first target word pair;
(b) counting the number of the given concept words in the concept words of the concept knowledge base and recording as a fifth number;
(c) counting the number of the first target word pairs in the keyword-concept word pairs of the concept knowledge base, and recording as a sixth number;
(d) and calculating the ratio of the sixth quantity to the fifth quantity to obtain a second probability of the given keyword and obtain a plurality of second probabilities.
In particular, the ratio of the sixth number to the fifth number is directly proportional to $P(w_n \mid e_n)$, so this ratio may be used as the second probability of the given keyword,
$$P(w_n \mid e_n) = \frac{\mathrm{count}[w_n, e_n]}{\mathrm{count}[e_n]}$$
wherein $\mathrm{count}[w_n, e_n]$ represents the sixth number of the first target word pair, and $\mathrm{count}[e_n]$ represents the fifth number of the given concept word.
In another embodiment, the randomly extracting a keyword from the concept knowledge base, and calculating a second probability that the extracted keyword is consistent with each keyword of the question sentence according to the keyword and the concept word of the concept knowledge base to obtain a plurality of second probabilities, includes:
(a) recording each keyword of the question sentence as a specified keyword, searching the concept word combination for the concept word corresponding to the specified keyword, and recording it as a specified concept word;
(b) acquiring context information of the specified keyword from the question sentence, combining the specified keyword, the context information and the specified concept word into a second target word pair, and combining the context information and the specified concept word into a third target word pair;
(c) acquiring context information-concept word pairs of the concept knowledge base and keyword-context information-concept word pairs of the concept knowledge base;
(d) counting the number of the third target word pairs in the context information-concept word pairs of the concept knowledge base, and recording as a seventh number;
(e) counting the number of the second target word pairs in the keyword-context information-concept word pairs of the concept knowledge base, and recording as an eighth number;
(f) and calculating the ratio of the eighth number to the seventh number to obtain a second probability of the specified keyword, and further obtaining a plurality of second probabilities.
In particular, the ratio of the eighth number to the seventh number is directly proportional to $P(w_n \mid e_n)$, so this ratio may be used as the second probability of the specified keyword,
$$P(w_n \mid e_n) = \frac{\mathrm{count}[w_n, e_n, w_{n-1}, w_{n-2}, w_{n+1}, w_{n+2}]}{\mathrm{count}[e_n, w_{n-1}, w_{n-2}, w_{n+1}, w_{n+2}]}$$
wherein $\mathrm{count}[w_n, e_n, w_{n-1}, w_{n-2}, w_{n+1}, w_{n+2}]$ represents the eighth number of the second target word pair, and $\mathrm{count}[e_n, w_{n-1}, w_{n-2}, w_{n+1}, w_{n+2}]$ represents the seventh number of the third target word pair. $w_{n-1}, w_{n-2}, w_{n+1}, w_{n+2}$ represent the context information, such as the two words before and the two words after the specified keyword.
As in the above example, the question sentence is "what diseases can be guaranteed by million doctors", and the concept words corresponding to the keywords of the question sentence are "e live and safe million doctors", "what", "diseases", and "guarantee".
And the combining module 205 is configured to combine the concept words corresponding to the keywords of the question sentences into a concept word sequence according to the word order of the keywords of the question sentences.
For example, a preset word queue may be obtained, a keyword of a question sentence may be converted into a word vector, a plurality of word vectors obtained by the conversion may be combined into a concept word sequence according to a word order of the keyword of the question sentence, and the preset word queue may be stored.
The concept word sequence is an abstract representation of the question statement, and can be used as intermediate data for further performing natural language processing on the question statement.
In another embodiment, the concept word sequence generating device further includes a matching module, configured to, after combining the concept words corresponding to the keywords of the question sentence into the concept word sequence according to the word sequence of the keywords of the question sentence, match an answer corresponding to the question sentence according to the concept word sequence.
The concept words corresponding to the keywords of the question sentence are determined according to the correspondence between the keywords and the concept words in the concept knowledge base, which improves the efficiency of generating the concept word sequence and increases the accuracy of question-answer matching performed through the concept word sequence of the sentence.
The concept word sequence generating apparatus 20 according to the second embodiment determines the concept words corresponding to the keywords of the question sentences according to the correspondence between the keywords and the concept words in the concept knowledge base, generates the concept word sequence of the sentences, and improves the efficiency of generating the concept word sequence.
Example three
The present embodiment provides a storage medium storing computer readable instructions which, when executed by a processor, implement the steps in the concept word sequence generating method described above, for example, steps 101-105 shown in fig. 1:
101, acquiring a question statement;
102, obtaining a concept knowledge base, wherein the concept knowledge base comprises a plurality of sample sentences, each sample sentence corresponds to a plurality of keywords, and each keyword of the sample sentences corresponds to a concept word;
103, extracting the keywords of the question sentences from the question sentences according to the keywords of the concept knowledge base;
104, determining the concept words corresponding to the keywords of the question sentences according to the corresponding relation between the keywords and the concept words in the concept knowledge base;
and 105, combining the concept words corresponding to the keywords of the question sentences into a concept word sequence according to the word sequence of the keywords of the question sentences.
Alternatively, the computer readable instructions, when executed by the processor, implement the functions of the modules in the above device embodiments, for example, the modules 201-205 in fig. 4:
a first obtaining module 201, configured to obtain a question statement;
a second obtaining module 202, configured to obtain a concept knowledge base, where the concept knowledge base includes a plurality of sample sentences, each sample sentence corresponds to a plurality of keywords, and each keyword of the sample sentence corresponds to a concept word;
an extraction module 203, configured to extract keywords of the question sentences from the question sentences according to the keywords of the concept knowledge base;
a determining module 204, configured to determine, according to a correspondence between the keyword and the concept word in the concept knowledge base, a concept word corresponding to the keyword of the question sentence;
and the combining module 205 is configured to combine the concept words corresponding to the keywords of the question sentences into a concept word sequence according to the word order of the keywords of the question sentences.
Example four
Fig. 5 is a schematic diagram of a computer device according to a fourth embodiment of the present invention. The computer device 30 comprises a memory 301, a processor 302, and computer readable instructions, such as a concept word sequence generating program, stored in the memory 301 and executable on the processor 302. The processor 302, when executing the computer readable instructions, implements the steps in the above-mentioned method for generating a concept word sequence, for example, steps 101-105 shown in fig. 1:
101, acquiring a question statement;
102, obtaining a concept knowledge base, wherein the concept knowledge base comprises a plurality of sample sentences, each sample sentence corresponds to a plurality of keywords, and each keyword of the sample sentences corresponds to a concept word;
103, extracting the keywords of the question sentences from the question sentences according to the keywords of the concept knowledge base;
104, determining the concept words corresponding to the keywords of the question sentences according to the corresponding relation between the keywords and the concept words in the concept knowledge base;
and 105, combining the concept words corresponding to the keywords of the question sentences into a concept word sequence according to the word sequence of the keywords of the question sentences.
Alternatively, the computer readable instructions, when executed by the processor, implement the functions of the modules in the above device embodiments, for example, the modules 201-205 in fig. 4:
a first obtaining module 201, configured to obtain a question statement;
a second obtaining module 202, configured to obtain a concept knowledge base, where the concept knowledge base includes a plurality of sample sentences, each sample sentence corresponds to a plurality of keywords, and each keyword of the sample sentence corresponds to a concept word;
an extraction module 203, configured to extract keywords of the question sentences from the question sentences according to the keywords of the concept knowledge base;
a determining module 204, configured to determine, according to a correspondence between the keyword and the concept word in the concept knowledge base, a concept word corresponding to the keyword of the question sentence;
and the combining module 205 is configured to combine the concept words corresponding to the keywords of the question sentences into a concept word sequence according to the word order of the keywords of the question sentences.
Illustratively, the computer readable instructions may be partitioned into one or more modules that are stored in the memory 301 and executed by the processor 302 to perform the present method. The one or more modules may be a series of computer-readable instruction segments capable of performing specific functions, and these segments describe the execution of the computer-readable instructions in the computer device 30. For example, the computer readable instructions can be divided into the first obtaining module 201, the second obtaining module 202, the extraction module 203, the determining module 204 and the combining module 205 in Fig. 4; the specific functions of the modules are described in embodiment two.
Those skilled in the art will appreciate that Fig. 5 is merely an example of the computer device 30 and does not constitute a limitation of the computer device 30. The computer device 30 may include more or fewer components than those shown, combine certain components, or use different components; for example, the computer device 30 may also include input and output devices, network access devices, buses, etc.
The processor 302 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor or any conventional processor. The processor 302 is the control center of the computer device 30 and connects the various parts of the computer device 30 using various interfaces and lines.
The memory 301 may be used to store the computer readable instructions, and the processor 302 implements the various functions of the computer device 30 by running or executing the computer readable instructions or modules stored in the memory 301 and invoking the data stored in the memory 301. The memory 301 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created according to the use of the computer device 30, and the like. In addition, the memory 301 may include a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), at least one disk storage device, a flash memory device, a Read-Only Memory (ROM), a Random Access Memory (RAM), or other non-volatile or volatile storage devices.
The modules integrated by the computer device 30 may be stored in a computer-readable storage medium if they are implemented in the form of software functional modules and sold or used as separate products. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by computer readable instructions that instruct the relevant hardware. The computer readable instructions may be stored in a storage medium, and when they are executed by a processor, the steps of the method embodiments are implemented. The computer readable instructions comprise computer readable instruction code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer readable instruction code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), etc.
In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative: the division of the modules is only one logical functional division, and other divisions are possible in an actual implementation.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing module, or each of the modules may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware form, and can also be realized in a form of hardware and a software functional module.
The integrated module implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions that enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some steps of the concept word sequence generating method according to the embodiments of the present invention.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned. Furthermore, it is to be understood that the word "comprising" does not exclude other modules or steps, and the singular does not exclude the plural. A plurality of modules or means recited in the system claims may also be implemented by one module or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A method for generating a sequence of concept words, the method comprising:
acquiring question sentences;
acquiring a concept knowledge base, wherein the concept knowledge base comprises a plurality of sample sentences, each sample sentence corresponds to a plurality of keywords, and each keyword of the sample sentences corresponds to a concept word;
extracting keywords of the question sentences from the question sentences according to the keywords of the concept knowledge base;
determining concept words corresponding to the keywords of the question sentences according to the corresponding relation between the keywords and the concept words in the concept knowledge base;
and combining the concept words corresponding to the keywords of the question sentences into a concept word sequence according to the word sequence of the keywords of the question sentences.
2. The method for generating a sequence of concept words according to claim 1, wherein the extracting the keywords of the question sentences from the question sentences according to the keywords of the concept knowledge base comprises:
performing multiple random word segmentation on the question sentence to obtain multiple word segmentation results;
for each word segmentation result, calculating the similarity score and the length score of each word in the word segmentation result according to the keywords of the concept knowledge base;
calculating the keyword score of the word segmentation result according to the similarity score and the length score of each word in the word segmentation result;
and extracting words from the word segmentation result with the lowest keyword score as the keywords of the question sentences.
3. The method for generating a sequence of concept words according to claim 1, wherein the determining the concept words corresponding to the keywords of the question sentence according to the correspondence between the keywords and the concept words in the concept knowledge base comprises:
acquiring a plurality of concept words of each keyword of the question sentence from the concept knowledge base according to the corresponding relation between the keyword and the concept word in the concept knowledge base;
combining any concept word of each keyword of the question sentence into one concept word combination of the question sentence to obtain a plurality of concept word combinations of the question sentence;
calculating a probability score of each concept word combination of the question sentence;
and matching the concept words in the concept word combination with the highest probability score to obtain the concept words corresponding to the keywords of the question sentences.
4. The method of generating a sequence of concept words according to claim 3, wherein the calculating a probability score of the combination of concept words includes:
randomly extracting two target concept words from the concept knowledge base, and calculating a first probability that the two target concept words are consistent with any two concept words in the concept word combination according to the concept words of the concept knowledge base to obtain a plurality of first probabilities;
randomly extracting a keyword from the concept knowledge base, and calculating a second probability that the extracted keyword is consistent with each keyword of the question sentence according to the keyword and the concept word of the concept knowledge base to obtain a plurality of second probabilities;
and calculating the product of the plurality of first probabilities and the plurality of second probabilities, and taking the obtained product result as the probability score of the concept word combination.
5. The method for generating a sequence of concept words according to claim 4, wherein the randomly extracting two target concept words from the concept knowledge base, and calculating a first probability that the two target concept words are consistent with any two concept words in the combination of concept words according to the concept words of the concept knowledge base to obtain a plurality of first probabilities includes:
recording any two concept words in the concept word combination as a first concept word pair, searching the first concept word pair in each sample sentence of the concept knowledge base, and counting a first number of the first concept word pairs searched in the concept knowledge base;
acquiring a plurality of concept words in the concept knowledge base, performing duplication elimination processing on the plurality of concept words in the concept knowledge base, and marking any two concept words in the duplication elimination concept words in the concept knowledge base as second concept word pairs to obtain a plurality of second concept word pairs;
calculating a second number of the plurality of second concept word pairs in the concept knowledge base;
and calculating the ratio of the first number of the first concept word pairs to the second number of the plurality of second concept word pairs, and taking the ratio of the first number of the first concept word pairs to the second number of the plurality of second concept word pairs as the first probability of any two concept words in the concept word combination to obtain a plurality of first probabilities.
6. The method for generating a sequence of concept words according to claim 4, wherein said randomly extracting a keyword from said concept knowledge base, calculating a second probability that said extracted keyword is consistent with each keyword of said question sentence according to said keyword and said concept word of said concept knowledge base, and obtaining a plurality of second probabilities comprises:
recording each keyword of the question sentence as a given keyword, searching a concept word corresponding to the given keyword from the concept word combination, recording the concept word as a given concept word, and combining the given keyword and the given concept word into a first target word pair;
counting the number of the given concept words in the concept words of the concept knowledge base and recording as a fifth number;
counting the number of the first target word pairs in the keyword-concept word pairs of the concept knowledge base, and recording as a sixth number;
and calculating the ratio of the sixth quantity to the fifth quantity to obtain a second probability of the given keyword and obtain a plurality of second probabilities.
7. The method for generating a sequence of concept words according to claim 4, wherein said randomly extracting a keyword from said concept knowledge base, calculating a second probability that said extracted keyword is consistent with each keyword of said question sentence according to said keyword and said concept word of said concept knowledge base, and obtaining a plurality of second probabilities comprises:
recording each keyword of the question sentence as a specified keyword, searching a concept word corresponding to the specified keyword from the concept word combination, and recording it as a specified concept word;
acquiring context information of the specified keyword from the question sentence;
combining the specified keyword, the context information and the specified concept word into a second target word pair, and combining the context information and the specified concept word into a third target word pair;
acquiring context information-concept word pairs of the concept knowledge base and keyword-context information-concept word pairs of the concept knowledge base;
counting the number of the third target word pairs in the context information-concept word pairs of the concept knowledge base, and recording as a seventh number;
counting the number of the second target word pairs in the keyword-context information-concept word pairs of the concept knowledge base, and recording as an eighth number;
and calculating the ratio of the eighth quantity to the seventh quantity to obtain a second probability of the specified keyword, and further obtaining a plurality of second probabilities.
8. A concept word sequence generating apparatus, characterized in that the concept word sequence generating apparatus comprises:
the first acquisition module is used for acquiring question sentences;
the second acquisition module is used for acquiring a concept knowledge base, wherein the concept knowledge base comprises a plurality of sample sentences, each sample sentence corresponds to a plurality of keywords, and each keyword of each sample sentence corresponds to a concept word;
the extraction module is used for extracting the keywords of the question sentences from the question sentences according to the keywords of the concept knowledge base;
the determining module is used for determining the concept words corresponding to the keywords of the question sentences according to the corresponding relation between the keywords and the concept words in the concept knowledge base;
and the combination module is used for combining the concept words corresponding to the keywords of the question sentences into concept word sequences according to the word sequences of the keywords of the question sentences.
9. A computer device, characterized in that the computer device comprises a processor and a memory, the processor being configured to execute computer-readable instructions stored in the memory to implement the concept word sequence generation method according to any of claims 1 to 7.
10. A storage medium having stored thereon computer-readable instructions, which when executed by a processor implement the concept word sequence generating method according to any one of claims 1 to 7.
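The probability score of claims 3 to 6 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the knowledge base `KB` and its contents are hypothetical; "any two concept words" is read as every unordered pair in the combination; the "second number" of claim 5 is read as the total number of concept word pair occurrences across all sample sentences; and all function names are illustrative.

```python
from collections import Counter
from itertools import combinations, product
from math import prod

# Hypothetical toy knowledge base: one list of (keyword, concept word) pairs
# per sample sentence.
KB = [
    [("apple", "FRUIT"), ("eat", "ACTION")],
    [("apple", "COMPANY"), ("buy", "ACTION")],
    [("apple", "FRUIT"), ("buy", "ACTION")],
    [("banana", "FRUIT"), ("eat", "ACTION")],
]

# Occurrences of each unordered concept word pair within a sample sentence.
pair_counts = Counter(
    tuple(sorted(pair))
    for sample in KB
    for pair in combinations([concept for _, concept in sample], 2)
)
total_pairs = sum(pair_counts.values())  # "second number" (assumed reading)
concept_counts = Counter(c for sample in KB for _, c in sample)
kw_concept_counts = Counter(kc for sample in KB for kc in sample)


def first_probability(c1, c2):
    """Claim 5: share of pair occurrences that match this concept word pair."""
    return pair_counts[tuple(sorted((c1, c2)))] / total_pairs


def second_probability(keyword, concept):
    """Claim 6: sixth number / fifth number, i.e. count(keyword, concept) / count(concept)."""
    return kw_concept_counts[(keyword, concept)] / concept_counts[concept]


def score(keywords, concepts):
    """Claim 4: product of all first and all second probabilities."""
    firsts = [first_probability(a, b) for a, b in combinations(concepts, 2)]
    seconds = [second_probability(k, c) for k, c in zip(keywords, concepts)]
    return prod(firsts) * prod(seconds)


def best_concepts(keywords):
    """Claim 3: enumerate concept word combinations, keep the highest score."""
    candidates = [
        sorted({c for sample in KB for k, c in sample if k == kw})
        for kw in keywords
    ]
    return max(product(*candidates), key=lambda combo: score(keywords, combo))
```

With this toy data, `best_concepts(["eat", "apple"])` prefers `("ACTION", "FRUIT")` over `("ACTION", "COMPANY")`, because the ACTION/FRUIT pair co-occurs in three of the four sample sentences. A keyword that does not appear in `KB` leaves no candidate combinations, so `max` raises `ValueError`; a fuller version would need to handle that case.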
CN202011064339.XA 2020-09-30 2020-09-30 Concept word sequence generation method and device, computer equipment and storage medium Withdrawn CN112199958A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011064339.XA CN112199958A (en) 2020-09-30 2020-09-30 Concept word sequence generation method and device, computer equipment and storage medium
PCT/CN2020/131954 WO2021174923A1 (en) 2020-09-30 2020-11-26 Concept word sequence generation method, apparatus, computer device, and storage medium

Publications (1)

Publication Number Publication Date
CN112199958A true CN112199958A (en) 2021-01-08

Family

ID=74013140

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011064339.XA Withdrawn CN112199958A (en) 2020-09-30 2020-09-30 Concept word sequence generation method and device, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN112199958A (en)
WO (1) WO2021174923A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103150382A (en) * 2013-03-14 2013-06-12 中国科学院计算技术研究所 Automatic short text semantic concept expansion method and system based on open knowledge base
CN107832291A (en) * 2017-10-26 2018-03-23 平安科技(深圳)有限公司 Client service method, electronic installation and the storage medium of man-machine collaboration
CN108460011A (en) * 2018-02-01 2018-08-28 北京百度网讯科技有限公司 A kind of entitative concept mask method and system
CN109492222A (en) * 2018-10-31 2019-03-19 平安科技(深圳)有限公司 Intension recognizing method, device and computer equipment based on conceptional tree
CN110866089A (en) * 2019-11-14 2020-03-06 国家电网有限公司 Robot knowledge base construction system and method based on synonymous multi-language environment analysis
CN111639164A (en) * 2020-04-30 2020-09-08 中国平安财产保险股份有限公司 Question-answer matching method and device of question-answer system, computer equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW472232B (en) * 2000-08-11 2002-01-11 Ind Tech Res Inst Probability-base fault-tolerance natural language understanding method
CN101097573B (en) * 2006-06-28 2010-06-09 腾讯科技(深圳)有限公司 Automatically request-answering system and method
CN105279252B (en) * 2015-10-12 2017-12-26 广州神马移动信息科技有限公司 Excavate method, searching method, the search system of related term
CN108509476A (en) * 2017-09-30 2018-09-07 平安科技(深圳)有限公司 Problem associates method for pushing, electronic device and computer readable storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255351A (en) * 2021-06-22 2021-08-13 中国平安财产保险股份有限公司 Sentence intention recognition method and device, computer equipment and storage medium
CN113361272A (en) * 2021-06-22 2021-09-07 海信视像科技股份有限公司 Method and device for extracting concept words of media asset title
CN113361272B (en) * 2021-06-22 2023-03-21 海信视像科技股份有限公司 Method and device for extracting concept words of media asset title

Also Published As

Publication number Publication date
WO2021174923A1 (en) 2021-09-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210108
